RE: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board support

2015-07-16 Thread Priyanka Jain


> -Original Message-
> From: Wood Scott-B07421
> Sent: Friday, July 17, 2015 1:06 AM
> To: Jain Priyanka-B32167
> Cc: linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB
> board support
> 
> On Thu, 2015-07-16 at 04:34 -0500, Jain Priyanka-B32167 wrote:
> >
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Wednesday, July 15, 2015 11:17 PM
> > To: Jain Priyanka-B32167
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB
> > board support
> >
> > On Wed, 2015-07-15 at 15:00 +0530, Priyanka Jain wrote:
> > > > T1040D4RDB/T1042D4RDB are Freescale Reference Design Boards which can
> > > > support the T1040/T1042 QorIQ Power Architecture™ processors respectively.
> > >
> > > T1040D4RDB/T1042D4RDB board Overview
> > > -
> > > - SERDES Connections, 8 lanes supporting:
> > > - PCI
> > > - SGMII
> > > - SATA 2.0
> > > - QSGMII(only for T1040D4RDB)
> > > - DDR Controller
> > > - Supports rates of up to 1600 MHz data-rate
> > > - Supports one DDR4 UDIMM
> > > > - IFC/Local Bus
> > > - NAND flash: 1GB 8-bit NAND flash
> > > - NOR: 128MB 16-bit NOR Flash
> > > - Ethernet
> > > - Two on-board RGMII 10/100/1G ethernet ports.
> > > - PHY #0 remains powered up during deep-sleep
> > > - CPLD
> > > - Clocks
> > > - System and DDR clock (SYSCLK, “DDRCLK”)
> > > - SERDES clocks
> > > - Power Supplies
> > > - USB
> > > - Supports two USB 2.0 ports with integrated PHYs
> > > > - Two type A ports with 5V@1.5A per port.
> > > - SDHC
> > > - SDHC/SDXC connector
> > > - SPI
> > > - On-board 64MB SPI flash
> > > - I2C
> > > - Devices connected: EEPROM, thermal monitor, VID controller
> > > - Other IO
> > > - Two Serial ports
> > > - ProfiBus port
> > >
> > > Add support for T1040/T1042D4RDB board:
> > > -add device tree
> > > -Add entry in corenet_generic.c
> > >
> > > Signed-off-by: Priyanka Jain 
> > > ---
> > >  Changes for v2:
> > >   Incorporated Scott's comments on device tree
> >
> > You didn't respond to the comments on the CPLD node.
> > [Priyanka]
> > T1042D4RDB and T1040D4RDB are derivatives of the same board; the CPLD is
> > the same for both.
> > So I have moved the node below, with its compatible and reg fields, into
> > t104xd4rdb.dtsi.
> > Is this fine?
> >   cpld@3,0 {
> >   compatible = "fsl,t1040d4rdb-cpld";
> >   reg = <3 0 0x300>;
> >   };
> 
> If the CPLD image is exactly the same on both, this is fine.
> 
> > > +i2c@118100{
> > > +  mux@77{
> > > + compatible = "nxp,pca9546";
> > > + reg = <0x77>;
> > > + #address-cells = <1>;
> > > + #size-cells = <0>;
> > > + };
> > > + };
> >
> > A mux with no nodes under it (and yet it has #address-cells/#size-cells)?
> > What is it multiplexing?
> > [Priyanka]: PCA9546 is an i2c mux device, to which other i2c devices
> > (up to 8) can be further connected on its output channels. On T104xD4RDB,
> > the channel 0, 1 and 3 lines are connected to PEX devices and channel 2 to
> > the hdmi interface (initialization is done in u-boot only); the other
> > channels are grounded.
> > So Linux is not using the second-level I2C devices connected on this mux
> > device, and I have not shown the next level of the hierarchy.
> > Should I replace 'mux' with some other name? Please suggest.
> 
> The device tree describes the hardware, not just what Linux uses... but what
> I don't understand is why you describe the mux at all if you're not going to
> describe what goes underneath it.
> 
[Jain Priyanka-B32167]: Does the below look OK?
i2c@118100{
 +  i2c@77{
 + compatible = "nxp,pca9546";
 + reg = <0x77>;
 + #address-cells = <1>;
 + #size-cells = <0>;
 + };
 + };
> -Scott

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V9 10/11] powerpc/eeh: Support error recovery for VF PE

2015-07-16 Thread Wei Yang
Unlike a PCI-bus-dependent PE, a PE for VFs doesn't have a primary
bus, on which PCI hotplug is implemented. The patch supports error
recovery, especially PCI hotplug, for VF PEs. Hotplug on a VF PE is
implemented on the basis of VFs instead of the PCI bus.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
---
 arch/powerpc/include/asm/eeh.h   |1 +
 arch/powerpc/kernel/eeh.c|8 +++
 arch/powerpc/kernel/eeh_driver.c |  100 ++
 arch/powerpc/kernel/eeh_pe.c |3 +-
 4 files changed, 90 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 331c856..ea1f13c4 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -142,6 +142,7 @@ struct eeh_dev {
struct pci_controller *phb; /* Associated PHB   */
struct pci_dn *pdn; /* Associated PCI device node   */
struct pci_dev *pdev;   /* Associated PCI device*/
+   intin_error;/* Error flag for eeh_dev   */
struct pci_dev *physfn; /* Associated PF PORT   */
struct pci_bus *bus;/* PCI bus for partial hotplug  */
 };
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index af9b597..28e4d73 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1227,6 +1227,14 @@ void eeh_remove_device(struct pci_dev *dev)
 * from the parent PE during the BAR resotre.
 */
edev->pdev = NULL;
+
+   /*
+* The flag "in_error" is used to trace EEH devices for VFs
+* in error state or not. It's set in eeh_report_error(). If
+* it's not set, eeh_report_{reset,resume}() won't be called
+* for the VF EEH device.
+*/
+   edev->in_error = 0;
dev->dev.archdata.edev = NULL;
if (!(edev->pe->state & EEH_PE_KEEP))
eeh_rmv_from_parent_pe(edev);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 89eb4bc..99868e2 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -211,6 +211,7 @@ static void *eeh_report_error(void *data, void *userdata)
if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
if (*res == PCI_ERS_RESULT_NONE) *res = rc;
 
+   edev->in_error = 1;
eeh_pcid_put(dev);
return NULL;
 }
@@ -282,7 +283,8 @@ static void *eeh_report_reset(void *data, void *userdata)
 
if (!driver->err_handler ||
!driver->err_handler->slot_reset ||
-   (edev->mode & EEH_DEV_NO_HANDLER)) {
+   (edev->mode & EEH_DEV_NO_HANDLER) ||
+   (!edev->in_error)) {
eeh_pcid_put(dev);
return NULL;
}
@@ -339,14 +341,16 @@ static void *eeh_report_resume(void *data, void *userdata)
 
if (!driver->err_handler ||
!driver->err_handler->resume ||
-   (edev->mode & EEH_DEV_NO_HANDLER)) {
+   (edev->mode & EEH_DEV_NO_HANDLER) ||
+   (!edev->in_error)) {
edev->mode &= ~EEH_DEV_NO_HANDLER;
-   eeh_pcid_put(dev);
-   return NULL;
+   goto out;
}
 
driver->err_handler->resume(dev);
 
+out:
+   edev->in_error = 0;
eeh_pcid_put(dev);
return NULL;
 }
@@ -386,12 +390,38 @@ static void *eeh_report_failure(void *data, void *userdata)
return NULL;
 }
 
+static void *eeh_add_virt_device(void *data, void *userdata)
+{
+   struct pci_driver *driver;
+   struct eeh_dev *edev = (struct eeh_dev *)data;
+   struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
+   struct pci_dn *pdn = eeh_dev_to_pdn(edev);
+
+   if (!(edev->physfn)) {
+   pr_warn("%s: EEH dev %04x:%02x:%02x.%01x not for VF\n",
+   __func__, edev->phb->global_number, pdn->busno,
+   PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+   return NULL;
+   }
+
+   driver = eeh_pcid_get(dev);
+   if (driver) {
+   eeh_pcid_put(dev);
+   if (driver->err_handler)
+   return NULL;
+   }
+
+   pci_iov_virtfn_add(edev->physfn, pdn->vf_index, 0);
+   return NULL;
+}
+
 static void *eeh_rmv_device(void *data, void *userdata)
 {
struct pci_driver *driver;
struct eeh_dev *edev = (struct eeh_dev *)data;
struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
int *removed = (int *)userdata;
+   struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 
/*
 * Actually, we should remove the PCI bridges as well.
@@ -416,7 +446,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
driver = eeh_pcid_get(dev);
if (driver) {
eeh_pcid_put(dev);
-   if (driver->err_handler)
+   if (removed && driver
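
The in_error bookkeeping that this patch adds can be looked at in isolation. The following is a minimal user-space sketch, not kernel code: struct sketch_edev and the sketch_* helpers are hypothetical names, and the counters only stand in for the driver's slot_reset and resume callbacks. It assumes the flag behaves as the comment in eeh_remove_device() describes.

```c
/* Hypothetical stand-in for struct eeh_dev; only the fields needed here. */
struct sketch_edev {
	int in_error;		/* set once the error callback has run */
	int resets_seen;	/* simulated slot_reset() invocations */
	int resumes_seen;	/* simulated resume() invocations */
};

/* Mirrors eeh_report_error(): remember that this device saw the error. */
static void sketch_report_error(struct sketch_edev *e)
{
	e->in_error = 1;
}

/* Mirrors eeh_report_reset(): skip devices that never saw the error. */
static void sketch_report_reset(struct sketch_edev *e)
{
	if (!e->in_error)
		return;
	e->resets_seen++;
}

/* Mirrors eeh_report_resume(): the flag is cleared on every path. */
static void sketch_report_resume(struct sketch_edev *e)
{
	if (e->in_error)
		e->resumes_seen++;
	e->in_error = 0;
}
```

A device that was re-created by hotplug during recovery starts with in_error == 0, so in this model it receives neither the reset nor the resume callback, while a device that went through the error callback receives both exactly once.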

[PATCH V9 11/11] powerpc/powernv: compound PE for VFs

2015-07-16 Thread Wei Yang
When VF BAR size is larger than 64MB, we group VFs in terms of M64 BAR,
which means those VFs in a group should form a compound PE.

This patch links those VF PEs into a compound PE in this case.

[gwshan: code refactoring for a bit]
Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   46 +
 arch/powerpc/platforms/powernv/pci.c  |   17 +--
 2 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5738d31..d1530cb 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1359,9 +1359,20 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev, u16 num_vfs)
}
 
list_for_each_entry_safe(pe, pe_n, &phb->ioda.pe_list, list) {
+   struct pnv_ioda_pe *s, *sn;
if (pe->parent_dev != pdev)
continue;
 
+   if ((pe->flags & PNV_IODA_PE_MASTER) &&
+   (pe->flags & PNV_IODA_PE_VF)) {
+   list_for_each_entry_safe(s, sn, &pe->slaves, list) {
+   pnv_pci_ioda2_release_dma_pe(pdev, s);
+   list_del(&s->list);
+   pnv_ioda_deconfigure_pe(phb, s);
+   pnv_ioda_free_pe(phb, s->pe_number);
+   }
+   }
+
pnv_pci_ioda2_release_dma_pe(pdev, pe);
 
/* Remove from list */
@@ -1414,7 +1425,7 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
struct pci_bus*bus;
struct pci_controller *hose;
struct pnv_phb*phb;
-   struct pnv_ioda_pe*pe;
+   struct pnv_ioda_pe*pe, *master_pe;
intpe_num;
u16vf_index;
struct pci_dn *pdn;
@@ -1456,10 +1467,13 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
continue;
}
 
-   /* Put PE to the list */
-   mutex_lock(&phb->ioda.pe_list_mutex);
-   list_add_tail(&pe->list, &phb->ioda.pe_list);
-   mutex_unlock(&phb->ioda.pe_list_mutex);
+   /* Put PE to the list, or postpone it for compound PEs */
+   if ((pdn->m64_per_iov != M64_PER_IOV) ||
+   (num_vfs <= M64_PER_IOV)) {
+   mutex_lock(&phb->ioda.pe_list_mutex);
+   list_add_tail(&pe->list, &phb->ioda.pe_list);
+   mutex_unlock(&phb->ioda.pe_list_mutex);
+   }
 
pnv_pci_ioda2_setup_dma_pe(phb, pe);
}
@@ -1472,10 +1486,32 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
vf_per_group = roundup_pow_of_two(num_vfs) / pdn->m64_per_iov;
 
for (vf_group = 0; vf_group < M64_PER_IOV; vf_group++) {
+   master_pe = NULL;
+
for (vf_index = vf_group * vf_per_group;
 vf_index < (vf_group + 1) * vf_per_group &&
 vf_index < num_vfs;
 vf_index++) {
+
+   /*
+* Figure out the master PE and put all slave
+* PEs to master PE's list.
+*/
+   pe = &phb->ioda.pe_array[pdn->offset + vf_index];
+   if (!master_pe) {
+   pe->flags |= PNV_IODA_PE_MASTER;
+   INIT_LIST_HEAD(&pe->slaves);
+   master_pe = pe;
+   mutex_lock(&phb->ioda.pe_list_mutex);
+   list_add_tail(&pe->list, &phb->ioda.pe_list);
+   mutex_unlock(&phb->ioda.pe_list_mutex);
+   } else {
+   pe->flags |= PNV_IODA_PE_SLAVE;
+   pe->master = master_pe;
+   list_add_tail(&pe->list,
+   &master_pe->slaves);
+   }
+
for (vf_index1 = vf_group * vf_per_group;
 vf_index1 < (vf_group + 1) * vf_per_group &&
 vf_index1 < num_vfs;
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 0e4f42e..f3aead0 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -739,7 +739,7 @@ void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
struct pci_controller *hose = pci_bus_to_
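
The grouping that pnv_ioda_setup_vf_PE() performs for compound PEs can be worked through with plain integers. The sketch below is a user-space model only: the sketch_* names are hypothetical, and the value 4 for M64_PER_IOV is an assumption, since the definition does not appear in the diff. It follows the patch in treating the first VF of each group as the master.

```c
#define SKETCH_M64_PER_IOV 4u	/* assumed value of M64_PER_IOV */

/* Smallest power of two that is >= n, for n > 0. */
static unsigned int sketch_roundup_pow_of_two(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Number of VFs sharing one M64 BAR, computed as in the patch. */
static unsigned int sketch_vf_per_group(unsigned int num_vfs)
{
	return sketch_roundup_pow_of_two(num_vfs) / SKETCH_M64_PER_IOV;
}

/*
 * Index of the VF whose PE would be the master for vf_index, i.e. the
 * first VF of its group. Only meaningful when num_vfs is larger than
 * SKETCH_M64_PER_IOV, which is the case the patch handles.
 */
static unsigned int sketch_master_vf(unsigned int vf_index,
				     unsigned int num_vfs)
{
	unsigned int per_group = sketch_vf_per_group(num_vfs);

	if (per_group == 0)
		return vf_index;	/* too few VFs to form groups */
	return (vf_index / per_group) * per_group;
}
```

Under these assumptions, 6 VFs are rounded up to 8 and split into groups of 2, so the groups are {0,1}, {2,3} and {4,5}, with VFs 0, 2 and 4 as masters and the fourth group left empty.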

[PATCH V9 04/11] powerpc/pci: Remove VFs prior to PF

2015-07-16 Thread Wei Yang
As commit ac205b7bb72f ("PCI: make sriov work with hotplug remove") indicates,
VFs, which might be hooked to the same PCI bus as their PF, should be removed
before the PF. Otherwise, PCI hot unplug on that PCI bus would cause a
kernel crash.

The patch applies the above pattern to PowerPC PCI hotplug path.

[gwshan: changelog]
Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
---
 arch/powerpc/kernel/pci-hotplug.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 7f9ed0c..59c4361 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -55,7 +55,7 @@ void pcibios_remove_pci_devices(struct pci_bus *bus)
 
pr_debug("PCI: Removing devices on bus %04x:%02x\n",
 pci_domain_nr(bus),  bus->number);
-   list_for_each_entry_safe(dev, tmp, &bus->devices, bus_list) {
+   list_for_each_entry_safe_reverse(dev, tmp, &bus->devices, bus_list) {
pr_debug("   Removing %s...\n", pci_name(dev));
pci_stop_and_remove_bus_device(dev);
}
-- 
1.7.9.5
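
Why walking the list in reverse helps can be shown with a small model. This is a sketch under one assumption that the patch does not state explicitly: VFs are appended to the bus list after their PF, so they sit later in the list. The struct and the sketch_* helpers are hypothetical and replace the kernel list macros with a plain array.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device on a bus list. */
struct sketch_dev {
	int is_vf;	/* 1 for a VF, 0 for a PF or ordinary device */
	int removed;
};

/* Count VFs that are still present. */
static int sketch_live_vfs(const struct sketch_dev *devs, size_t n)
{
	int live = 0;

	for (size_t i = 0; i < n; i++)
		if (devs[i].is_vf && !devs[i].removed)
			live++;
	return live;
}

/*
 * Remove every device, walking forward or in reverse. Returns how many
 * VFs were still present at the moment a non-VF device was removed.
 */
static int sketch_remove_all(struct sketch_dev *devs, size_t n, int reverse)
{
	int orphaned = 0;

	for (size_t step = 0; step < n; step++) {
		size_t i = reverse ? n - 1 - step : step;

		if (!devs[i].is_vf)
			orphaned += sketch_live_vfs(devs, n);
		devs[i].removed = 1;
	}
	return orphaned;
}
```

For a list holding one PF followed by two VFs, the forward walk removes the PF while both VFs are still present, whereas the reverse walk has already removed them by the time it reaches the PF.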

[PATCH V9 09/11] powerpc/powernv: Support PCI config restore for VFs

2015-07-16 Thread Wei Yang
After PE reset, OPAL API opal_pci_reinit() is called on all devices
contained in the PE to reinitialize them. However, VFs can't be seen
from skiboot firmware. We have to implement functions, similar to
those in skiboot firmware, to reinitialize VFs after a reset on a PE
for VFs.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
---
 arch/powerpc/include/asm/pci-bridge.h|1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c |   70 +-
 arch/powerpc/platforms/powernv/pci.c |   18 +++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 7a72f68..c927d5b 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -220,6 +220,7 @@ struct pci_dn {
 #define IODA_INVALID_M64(-1)
int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
 #endif /* CONFIG_PCI_IOV */
+   int mps;
 #endif
struct list_head child_list;
struct list_head list;
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 8d88be1..b09c0d1 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1616,6 +1616,67 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
return ret;
 }
 
+static int pnv_eeh_restore_vf_config(struct pci_dn *pdn)
+{
+   struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+   u32 devctl, cmd, cap2, aer_capctl;
+   int old_mps;
+
+   /* Restore MPS */
+   if (edev->pcie_cap) {
+   old_mps = (ffs(pdn->mps) - 8) << 5;
+   eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+2, &devctl);
+   devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
+   devctl |= old_mps;
+   eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+ 2, devctl);
+   }
+
+   /* Disable Completion Timeout */
+   if (edev->pcie_cap) {
+   eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP2,
+4, &cap2);
+   if (cap2 & 0x10) {
+   eeh_ops->read_config(pdn,
+   edev->pcie_cap + PCI_EXP_DEVCTL2,
+   4, &cap2);
+   cap2 |= 0x10;
+   eeh_ops->write_config(pdn,
+   edev->pcie_cap + PCI_EXP_DEVCTL2,
+   4, cap2);
+   }
+   }
+
+   /* Enable SERR and parity checking */
+   eeh_ops->read_config(pdn, PCI_COMMAND, 2, &cmd);
+   cmd |= (PCI_COMMAND_PARITY | PCI_COMMAND_SERR);
+   eeh_ops->write_config(pdn, PCI_COMMAND, 2, cmd);
+
+   /* Enable report various errors */
+   if (edev->pcie_cap) {
+   eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+   2, &devctl);
+   devctl &= ~PCI_EXP_DEVCTL_CERE;
+   devctl |= (PCI_EXP_DEVCTL_NFERE |
+  PCI_EXP_DEVCTL_FERE |
+  PCI_EXP_DEVCTL_URRE);
+   eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+   2, devctl);
+   }
+
+   /* Enable ECRC generation and check */
+   if (edev->pcie_cap && edev->aer_cap) {
+   eeh_ops->read_config(pdn, edev->aer_cap + PCI_ERR_CAP,
+   4, &aer_capctl);
+   aer_capctl |= (PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE);
+   eeh_ops->write_config(pdn, edev->aer_cap + PCI_ERR_CAP,
+   4, aer_capctl);
+   }
+
+   return 0;
+}
+
 static int pnv_eeh_restore_config(struct pci_dn *pdn)
 {
struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
@@ -1626,7 +1687,14 @@ static int pnv_eeh_restore_config(struct pci_dn *pdn)
return -EEXIST;
 
phb = edev->phb->private_data;
-   ret = opal_pci_reinit(phb->opal_id,
+   /*
+* We have to restore the PCI config space after reset since the
+* firmware can't see SRIOV VFs.
+*/
+   if (edev->physfn)
+   ret = pnv_eeh_restore_vf_config(pdn);
+   else
+   ret = opal_pci_reinit(phb->opal_id,
  OPAL_REINIT_PCI_DEV, edev->config_addr);
if (ret) {
pr_warn("%s: Can't reinit PCI dev 0x%x (%lld)\n",
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 765d8ed..0e4f42e 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -788,6 +788,24 @@ static void pnv_p7ioc_rc_quirk(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IBM, 0x3b9, pnv_p7ioc_rc_quirk);
 
+#ifdef
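
The MPS restore step in pnv_eeh_restore_vf_config() relies on the expression (ffs(pdn->mps) - 8) << 5. The arithmetic can be checked in user space. The sketch below assumes that pdn->mps holds the payload size in bytes as a power of two between 128 and 4096, which the truncated diff does not confirm; the sketch_* names are hypothetical, and 0x00e0 is the Max_Payload_Size field of the PCIe Device Control register (bits 7:5).

```c
#define SKETCH_DEVCTL_PAYLOAD 0x00e0u	/* Max_Payload_Size, bits 7:5 */

/* Position of the lowest set bit, counting from 1; 0 if v is 0. */
static int sketch_ffs(unsigned int v)
{
	int pos = 1;

	if (v == 0)
		return 0;
	while (!(v & 1u)) {
		v >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Encode a payload size in bytes (128..4096, power of two) as the
 * Device Control field value: 128 -> 0, 256 -> 1, ... shifted to bit 5.
 */
static unsigned int sketch_mps_to_devctl(unsigned int mps_bytes)
{
	return (unsigned int)(sketch_ffs(mps_bytes) - 8) << 5;
}

/* Clear the old field and put the saved size back, as the patch does. */
static unsigned int sketch_restore_mps(unsigned int devctl,
				       unsigned int mps_bytes)
{
	devctl &= ~SKETCH_DEVCTL_PAYLOAD;
	devctl |= sketch_mps_to_devctl(mps_bytes);
	return devctl;
}
```

With these assumptions, 128 bytes encodes as 0x00, 256 bytes as 0x20 and 4096 bytes as 0xa0, and bits outside the field are left untouched.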

[PATCH V9 08/11] powerpc/powernv: Support EEH reset for VF PE

2015-07-16 Thread Wei Yang
PEs for VFs don't have a primary bus, so they need their own reset
backend, which is used during EEH recovery. The patch implements the reset
backend for VF PEs by issuing FLR or AF FLR to the VFs contained
in the PE.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
---
 arch/powerpc/include/asm/eeh.h   |1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c |  134 +-
 2 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index ec21f8f..331c856 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -136,6 +136,7 @@ struct eeh_dev {
int pcix_cap;   /* Saved PCIx capability*/
int pcie_cap;   /* Saved PCIe capability*/
int aer_cap;/* Saved AER capability */
+   int af_cap; /* Saved AF capability  */
struct eeh_pe *pe;  /* Associated PE*/
struct list_head list;  /* Form link list in the PE */
struct pci_controller *phb; /* Associated PHB   */
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index e9aec1d..8d88be1 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -404,6 +404,7 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
edev->pcix_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_PCIX);
edev->pcie_cap = pnv_eeh_find_cap(pdn, PCI_CAP_ID_EXP);
edev->aer_cap  = pnv_eeh_find_ecap(pdn, PCI_EXT_CAP_ID_ERR);
+   edev->af_cap   = pnv_eeh_find_cap(pdn, PCI_CAP_ID_AF);
if ((edev->class_code >> 8) == PCI_CLASS_BRIDGE_PCI) {
edev->mode |= EEH_DEV_BRIDGE;
if (edev->pcie_cap) {
@@ -893,6 +894,127 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
return 0;
 }
 
+static void pnv_eeh_wait_for_pending(struct pci_dn *pdn, int pos,
+u16 mask, bool af_flr_rst)
+{
+   struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+   int status, i;
+
+   /* Wait for Transaction Pending bit to be cleared */
+   for (i = 0; i < 4; i++) {
+   eeh_ops->read_config(pdn, pos, 2, &status);
+   if (!(status & mask))
+   return;
+
+   msleep((1 << i) * 100);
+   }
+
+   pr_warn("%s: Pending transaction while issuing %s FLR to "
+   "%04x:%02x:%02x.%01x\n",
+   __func__, af_flr_rst ? "AF" : "",
+   edev->phb->global_number, pdn->busno,
+   PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
+}
+
+static int pnv_eeh_do_flr(struct pci_dn *pdn, int option)
+{
+   struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+   u32 reg;
+
+   if (!edev->pcie_cap)
+   return -ENOTTY;
+
+   eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCAP, 4, &reg);
+   if (!(reg & PCI_EXP_DEVCAP_FLR))
+   return -ENOTTY;
+
+   switch (option) {
+   case EEH_RESET_HOT:
+   case EEH_RESET_FUNDAMENTAL:
+   pnv_eeh_wait_for_pending(pdn, edev->pcie_cap + PCI_EXP_DEVSTA,
+PCI_EXP_DEVSTA_TRPND, false);
+   eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+4, &reg);
+   reg |= PCI_EXP_DEVCTL_BCR_FLR;
+   eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+ 4, reg);
+   msleep(EEH_PE_RST_HOLD_TIME);
+   break;
+   case EEH_RESET_DEACTIVATE:
+   eeh_ops->read_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+4, &reg);
+   reg &= ~PCI_EXP_DEVCTL_BCR_FLR;
+   eeh_ops->write_config(pdn, edev->pcie_cap + PCI_EXP_DEVCTL,
+ 4, reg);
+   msleep(EEH_PE_RST_SETTLE_TIME);
+   break;
+   }
+
+   return 0;
+}
+
+static int pnv_eeh_do_af_flr(struct pci_dn *pdn, int option)
+{
+   struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
+   u32 cap;
+
+   if (!edev->af_cap)
+   return -ENOTTY;
+
+   eeh_ops->read_config(pdn, edev->af_cap + PCI_AF_CAP, 1, &cap);
+   if (!(cap & PCI_AF_CAP_TP) || !(cap & PCI_AF_CAP_FLR))
+   return -ENOTTY;
+
+   switch (option) {
+   case EEH_RESET_HOT:
+   case EEH_RESET_FUNDAMENTAL:
+   /*
+* Wait for Transaction Pending bit to clear. A word-aligned
+* test is used, so we use the control offset rather than status
+* and shift the test bit to match.
+*/
+   pnv_eeh_wait_for_pending(pdn, edev->af_cap 
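
The polling loop in pnv_eeh_wait_for_pending() checks the pending bit up to four times and sleeps (1 << i) * 100 ms after each failed check. A user-space sketch of that schedule follows. It is an illustration only: the callback type, the two sample callbacks and the return value are hypothetical (the function in the patch returns void and only prints a warning), and the delay is accumulated in a counter instead of calling msleep().

```c
/* Hypothetical poll callback: non-zero while a transaction is pending. */
typedef int (*sketch_pending_fn)(void *ctx);

/* Sample callback: the device never becomes idle. */
static int sketch_always_pending(void *ctx)
{
	(void)ctx;
	return 1;
}

/* Sample callback: pending for the first *ctx polls, idle afterwards. */
static int sketch_countdown(void *ctx)
{
	int *left = ctx;

	if (*left > 0) {
		(*left)--;
		return 1;
	}
	return 0;
}

/*
 * Poll up to 4 times with delays of 100, 200, 400 and 800 ms.
 * Returns 0 when the device went idle, -1 when the budget ran out.
 * *waited_ms receives the total delay that would have been slept.
 */
static int sketch_wait_for_pending(sketch_pending_fn pending, void *ctx,
				   unsigned int *waited_ms)
{
	*waited_ms = 0;
	for (int i = 0; i < 4; i++) {
		if (!pending(ctx))
			return 0;
		*waited_ms += (1u << i) * 100;	/* msleep() in the patch */
	}
	return -1;
}
```

In this model the worst case waits 100 + 200 + 400 + 800 = 1500 ms before giving up, and no further check is made after the final sleep, which matches the loop shown in the patch.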

[PATCH V9 07/11] powerpc/eeh: Create PE for VFs

2015-07-16 Thread Wei Yang
The current EEH recovery code assumes that the PE has a primary bus.
Unfortunately, that's not true for VF PEs, which generally contain
one or multiple VFs (for the VF group case).

The patch creates PEs for VFs in the weak function
pcibios_bus_add_device(). Those PEs for VFs are identified with newly
introduced flag EEH_PE_VF so that we handle them differently during EEH
recovery.

[gwshan: changelog and code refactoring]
Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
---
 arch/powerpc/include/asm/eeh.h   |1 +
 arch/powerpc/kernel/eeh_pe.c |   10 --
 arch/powerpc/platforms/powernv/eeh-powernv.c |   16 
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 6c383ad..ec21f8f 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -72,6 +72,7 @@ struct pci_dn;
 #define EEH_PE_PHB (1 << 1)/* PHB PE*/
 #define EEH_PE_DEVICE  (1 << 2)/* Device PE */
 #define EEH_PE_BUS (1 << 3)/* Bus PE*/
+#define EEH_PE_VF  (1 << 4)/* VF PE */
 
 #define EEH_PE_ISOLATED(1 << 0)/* Isolated PE  */
 #define EEH_PE_RECOVERING  (1 << 1)/* Recovering PE*/
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 35f0b62..260a701 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev *edev)
 * EEH device already having associated PE, but
 * the direct parent EEH device doesn't have yet.
 */
-   pdn = pdn ? pdn->parent : NULL;
+   if (edev->physfn)
+   pdn = pci_get_pdn(edev->physfn);
+   else
+   pdn = pdn ? pdn->parent : NULL;
while (pdn) {
/* We're poking out of PCI territory */
parent = pdn_to_eeh_dev(pdn);
@@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
}
 
/* Create a new EEH PE */
-   pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
+   if (edev->physfn)
+   pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
+   else
+   pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
if (!pe) {
pr_err("%s: out of memory!\n", __func__);
return -ENOMEM;
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 5cf5e6e..e9aec1d 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1524,6 +1524,22 @@ static struct eeh_ops pnv_eeh_ops = {
.restore_config = pnv_eeh_restore_config
 };
 
+void pcibios_bus_add_device(struct pci_dev *pdev)
+{
+   struct pci_dn *pdn = pci_get_pdn(pdev);
+
+   if (!pdev->is_virtfn)
+   return;
+
+   /*
+* The following operations will fail if VF's sysfs files
+* aren't created or its resources aren't finalized.
+*/
+   eeh_add_device_early(pdn);
+   eeh_add_device_late(pdev);
+   eeh_sysfs_add_device(pdev);
+}
+
 /**
  * eeh_powernv_init - Register platform dependent EEH operations
  *
-- 
1.7.9.5

[PATCH V9 06/11] powerpc/powernv: EEH device for VF

2015-07-16 Thread Wei Yang
VFs and their corresponding pci_dn instances are created and released
dynamically as their PF's SRIOV capability is enabled and disabled.
The patch creates and releases EEH devices for VFs when creating and
releasing their pci_dn instances, which means EEH devices and pci_dn
instances have the same life cycle. Also, a VF's EEH device is identified
by (struct eeh_dev::physfn).

[gwshan: changelog and removed CONFIG_PCI_IOV]
Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
---
 arch/powerpc/include/asm/eeh.h |1 +
 arch/powerpc/kernel/pci_dn.c   |   12 
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index c5eb86f..6c383ad 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -140,6 +140,7 @@ struct eeh_dev {
struct pci_controller *phb; /* Associated PHB   */
struct pci_dn *pdn; /* Associated PCI device node   */
struct pci_dev *pdev;   /* Associated PCI device*/
+   struct pci_dev *physfn; /* Associated PF PORT   */
struct pci_bus *bus;/* PCI bus for partial hotplug  */
 };
 
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index f771130..f0ddde7 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -180,7 +180,9 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
 struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 {
 #ifdef CONFIG_PCI_IOV
+   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
struct pci_dn *parent, *pdn;
+   struct eeh_dev *edev;
int i;
 
/* Only support IOV for now */
@@ -206,6 +208,9 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
 __func__, i);
return NULL;
}
+   eeh_dev_init(pdn, hose);
+   edev = pdn_to_eeh_dev(pdn);
+   edev->physfn = pdev;
}
 #endif /* CONFIG_PCI_IOV */
 
@@ -254,10 +259,17 @@ void remove_dev_pci_data(struct pci_dev *pdev)
for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
list_for_each_entry_safe(pdn, tmp,
&parent->child_list, list) {
+   struct eeh_dev *edev;
if (pdn->busno != pci_iov_virtfn_bus(pdev, i) ||
pdn->devfn != pci_iov_virtfn_devfn(pdev, i))
continue;
 
+   edev = pdn_to_eeh_dev(pdn);
+   if (edev) {
+   pdn->edev = NULL;
+   kfree(edev);
+   }
+
if (!list_empty(&pdn->list))
list_del(&pdn->list);
 
-- 
1.7.9.5

[PATCH V9 05/11] powerpc/eeh: Cache only BARs, not windows or IOV BARs

2015-07-16 Thread Wei Yang
EEH address cache, which helps to locate the PCI device according to
the given (physical) MMIO address, didn't cover PCI bridges. Also, it
shouldn't return PF with address in PF's IOV BARs. Instead, the VFs
should be returned.

Also, by doing so, it removes the type check in
eeh_addr_cache_insert_dev(), since bridge's window would not be cached.

The patch restricts the address cache to cover the first 7 BARs for the
above purposes.

[gwshan: changelog]
Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
---
 arch/powerpc/kernel/eeh_cache.c |6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index a1e86e1..e6887f0 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -196,7 +196,7 @@ static void __eeh_addr_cache_insert_dev(struct pci_dev *dev)
}
 
/* Walk resources on this device, poke them into the tree */
-   for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+   for (i = 0; i <= PCI_ROM_RESOURCE; i++) {
resource_size_t start = pci_resource_start(dev,i);
resource_size_t end = pci_resource_end(dev,i);
unsigned long flags = pci_resource_flags(dev,i);
@@ -222,10 +222,6 @@ void eeh_addr_cache_insert_dev(struct pci_dev *dev)
 {
unsigned long flags;
 
-   /* Ignore PCI bridges */
-   if ((dev->class >> 16) == PCI_BASE_CLASS_BRIDGE)
-   return;
-
spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
__eeh_addr_cache_insert_dev(dev);
spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
-- 
1.7.9.5

[PATCH V9 03/11] powerpc/pci: Cache VF index in pci_dn

2015-07-16 Thread Wei Yang
The patch caches the VF index in pci_dn, which can be used to calculate
the VF's bus, device and function number. This information helps to locate
the VF's PCI device instance when doing hotplug during EEH recovery if
necessary.

Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
---
 arch/powerpc/include/asm/pci-bridge.h |1 +
 arch/powerpc/kernel/pci_dn.c  |4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 712add5..7a72f68 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -210,6 +210,7 @@ struct pci_dn {
 #define IODA_INVALID_PE(-1)
 #ifdef CONFIG_PPC_POWERNV
int pe_number;
+   int vf_index;   /* VF index in the PF */
 #ifdef CONFIG_PCI_IOV
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
u16 num_vfs;/* number of VFs enabled*/
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index b3b4df9..f771130 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -139,6 +139,7 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
 #ifdef CONFIG_PCI_IOV
 static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
   struct pci_dev *pdev,
+  int vf_index,
   int busno, int devfn)
 {
struct pci_dn *pdn;
@@ -157,6 +158,7 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
pdn->parent = parent;
pdn->busno = busno;
pdn->devfn = devfn;
+   pdn->vf_index = vf_index;
 #ifdef CONFIG_PPC_POWERNV
pdn->pe_number = IODA_INVALID_PE;
 #endif
@@ -196,7 +198,7 @@ struct pci_dn *add_dev_pci_data(struct pci_dev *pdev)
return NULL;
 
for (i = 0; i < pci_sriov_get_totalvfs(pdev); i++) {
-   pdn = add_one_dev_pci_data(parent, NULL,
+   pdn = add_one_dev_pci_data(parent, NULL, i,
   pci_iov_virtfn_bus(pdev, i),
   pci_iov_virtfn_devfn(pdev, i));
if (!pdn) {
-- 
1.7.9.5
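
How a cached VF index can yield the VF's bus, device and function number is sketched below. The diff only shows calls to pci_iov_virtfn_bus() and pci_iov_virtfn_devfn(), not their bodies, so the arithmetic here is an assumption based on the usual SR-IOV rule that a VF's routing ID is the PF's routing ID plus First VF Offset plus VF Stride times the VF index. All sketch_* names and the example offset and stride values are hypothetical.

```c
/* Hypothetical result type: bus number plus 8-bit device/function. */
struct sketch_bdf {
	unsigned int bus;
	unsigned int devfn;
};

/* Device (slot) and function parts of an 8-bit devfn value. */
static unsigned int sketch_slot(unsigned int devfn)
{
	return (devfn >> 3) & 0x1f;
}

static unsigned int sketch_func(unsigned int devfn)
{
	return devfn & 0x07;
}

/*
 * Locate VF number vf_index of a PF at pf_bus/pf_devfn. offset and
 * stride stand for the First VF Offset and VF Stride values of the
 * PF's SR-IOV capability. A carry out of the low 8 bits moves the VF
 * to a higher bus number.
 */
static struct sketch_bdf sketch_vf_bdf(unsigned int pf_bus,
				       unsigned int pf_devfn,
				       unsigned int offset,
				       unsigned int stride,
				       unsigned int vf_index)
{
	unsigned int rid = pf_devfn + offset + stride * vf_index;
	struct sketch_bdf bdf = { pf_bus + (rid >> 8), rid & 0xff };

	return bdf;
}
```

For a PF at 01:00.0 with an assumed offset of 128 and stride of 2, VF 3 lands at 01:10.6 in hexadecimal notation, and VF 64 spills over to bus 02 at device 0, function 0.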

[PATCH V9 01/11] PCI/IOV: Rename and export virtfn_add/virtfn_remove

2015-07-16 Thread Wei Yang
During EEH recovery, hotplug is applied to the devices which don't
have drivers or whose drivers don't support EEH. However, the hotplug,
which was implemented based on PCI bus, can't be applied to VFs directly.

The patch renames virtfn_{add,remove}() and exports them so that they
can be used in PCI hotplug during EEH recovery.

[gwshan: changelog]
Signed-off-by: Wei Yang 
Reviewed-by: Gavin Shan 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/iov.c   |   10 +-
 include/linux/pci.h |8 
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..cc941dd 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -108,7 +108,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
 }
 
-static int virtfn_add(struct pci_dev *dev, int id, int reset)
+int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
 {
int i;
int rc = -ENOMEM;
@@ -183,7 +183,7 @@ failed:
return rc;
 }
 
-static void virtfn_remove(struct pci_dev *dev, int id, int reset)
+void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
 {
char buf[VIRTFN_ID_LEN];
struct pci_dev *virtfn;
@@ -320,7 +320,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
}
 
for (i = 0; i < initial; i++) {
-   rc = virtfn_add(dev, i, 0);
+   rc = pci_iov_virtfn_add(dev, i, 0);
if (rc)
goto failed;
}
@@ -332,7 +332,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
 
 failed:
for (j = 0; j < i; j++)
-   virtfn_remove(dev, j, 0);
+   pci_iov_virtfn_remove(dev, j, 0);
 
iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
pci_cfg_access_lock(dev);
@@ -361,7 +361,7 @@ static void sriov_disable(struct pci_dev *dev)
return;
 
for (i = 0; i < iov->num_VFs; i++)
-   virtfn_remove(dev, i, 0);
+   pci_iov_virtfn_remove(dev, i, 0);
 
pcibios_sriov_disable(dev);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8a0321a..3fed437 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1668,6 +1668,8 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
 
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
+int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset);
+void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset);
 int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
@@ -1685,6 +1687,12 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }
 static inline void pci_disable_sriov(struct pci_dev *dev) { }
+static inline int pci_iov_virtfn_add(struct pci_dev *dev, int id, int reset)
+{
+   return -ENOSYS;
+}
+static inline void pci_iov_virtfn_remove(struct pci_dev *dev, int id, int reset)
+{ }
 static inline int pci_num_vf(struct pci_dev *dev) { return 0; }
 static inline int pci_vfs_assigned(struct pci_dev *dev)
 { return 0; }
-- 
1.7.9.5
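
The include/linux/pci.h hunk above pairs real declarations with inline stubs
for builds where SR-IOV support is configured out. The following is only a
user-space sketch of that pattern, not the kernel code: SKETCH_HAVE_IOV,
struct sketch_dev and the sketch_* functions are hypothetical stand-ins for
CONFIG_PCI_IOV, struct pci_dev and the pci_iov_virtfn_* functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev. */
struct sketch_dev { int id; };

#ifdef SKETCH_HAVE_IOV	/* stand-in for CONFIG_PCI_IOV */
/* Real implementations would live in another file (iov.c in the patch). */
int sketch_virtfn_add(struct sketch_dev *dev, int id, int reset);
void sketch_virtfn_remove(struct sketch_dev *dev, int id, int reset);
#else
/* Stubs keep callers compiling when the feature is configured out. */
static inline int sketch_virtfn_add(struct sketch_dev *dev, int id, int reset)
{
	(void)dev; (void)id; (void)reset;
	return -ENOSYS;
}
static inline void sketch_virtfn_remove(struct sketch_dev *dev, int id, int reset)
{
	(void)dev; (void)id; (void)reset;
}
#endif

/* A caller needs no #ifdef of its own; it only checks the return value. */
int sketch_recover_vf(struct sketch_dev *pf, int vf_index)
{
	assert(pf != NULL);
	sketch_virtfn_remove(pf, vf_index, 0);
	return sketch_virtfn_add(pf, vf_index, 0);
}
```

Built without SKETCH_HAVE_IOV, sketch_recover_vf() simply reports -ENOSYS,
which is the behaviour the stubs in the patch give to callers.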


[PATCH V9 02/11] PCI: Add pcibios_bus_add_device() weak function

2015-07-16 Thread Wei Yang
This patch adds a weak function pcibios_bus_add_device() so that arch-dependent
code can do proper setup. For example, powerpc could set up EEH-related
resources.

Signed-off-by: Wei Yang 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/bus.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 6fbd3f2..b7e30a7 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -267,6 +267,7 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx)
 
 void __weak pcibios_resource_survey_bus(struct pci_bus *bus) { }
 
+void __weak pcibios_bus_add_device(struct pci_dev *dev) { }
 /**
  * pci_bus_add_device - start driver for a single device
  * @dev: device to add
@@ -277,6 +278,8 @@ void pci_bus_add_device(struct pci_dev *dev)
 {
int retval;
 
+   pcibios_bus_add_device(dev);
+
/*
 * Can not put in pci_device_add yet because resources
 * are not assigned yet for some devices.
-- 
1.7.9.5
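
As an illustration of the weak-hook pattern this patch relies on, here is a
minimal user-space sketch. It assumes GCC or Clang, where
__attribute__((weak)) is available; struct sketch_dev and the sketch_* names
are hypothetical and only mirror the shape of pcibios_bus_add_device() and
pci_bus_add_device().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev. */
struct sketch_dev { int arch_setup_done; };

/*
 * Weak default: does nothing. An architecture can provide a strong
 * definition with the same name in another object file, and the linker
 * will pick that one instead.
 */
__attribute__((weak)) void sketch_bus_add_device_hook(struct sketch_dev *dev)
{
	(void)dev;
}

/* Generic code calls the hook unconditionally before its own work. */
void sketch_bus_add_device(struct sketch_dev *dev)
{
	assert(dev != NULL);
	sketch_bus_add_device_hook(dev);
	/* ... generic device setup would follow here ... */
}
```

With only the weak default linked in, the call is a no-op, so platforms that
do not care about the hook are unaffected.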


[PATCH V9 00/11] VF EEH on Power8

2015-07-16 Thread Wei Yang
This patchset enables EEH on SRIOV VFs. The general idea is to create proper
VF edev and VF PE and handle them properly.

Unlike a Bus PE, a VF PE contains just one VF. This changes how EEH errors
are handled on a VF PE in several ways.

First, removal and re-enumeration of a VF rely on its PF. A VF is tightly
coupled to its PF, so it is not proper to enumerate a VF with the usual scan
procedure. That's why virtfn_add/virtfn_remove are exported in this patch
set.

Second, the reset/restore of a VF is done in kernel space. FW is not aware of
the VF, which means the usual reset function done in FW will not work. One of
the patches imitates the reset/restore function in kernel space.

Third, the VF may be removed during the PF's error_detected function. In this
case, the original error_detected->slot_reset->resume sequence is not proper
for those removed VFs, since they are re-created by the PF in a fresh state. A
flag in eeh_dev is introduced to mark that the eeh_dev is in an error state.
By doing so, we track whether this device needs to be reset or not.

This has been tested both on host and in guest on Power8 with the latest
kernel version.

v9:
   * split pcibios_bus_add_device() into a separate patch
   * Bjorn acked the PCI part and agreed this patch set to be merged from ppc
 tree
   * rebased on mpe/linux.git next branch
v8:
   * fix on checking the return value of pnv_eeh_do_flr()
   * introduced a weak function pcibios_bus_add_device() to create PE for VFs
v7:
   * fix compile error when PCI_IOV is not set
v6:
   * code / commit log refactor by Gavin
v5:
   * remove the compound field, iterate on Master VF PE instead
   * some code refine on PCI config restore and reset on VF
 the wait time for assert and deassert
 PCI device address format
 check on edev->pcie_cap and edev->aer_cap before access them
v4:
   * refine the change logs, comment and code style
   * change pnv_pci_fixup_vf_eeh() to pnv_eeh_vf_final_fixup() and remove the
 CONFIG_PCI_IOV macro
   * reorder patch 5/6 to make the logic more reasonable
   * remove remove_dev_pci_data()
   * remove the EEH_DEV_VF flag, use edev->physfn to identify a VF EEH DEV and
 remove related CONFIG_PCI_IOV macro
   * add the option for VF reset
   * fix the pnv_eeh_cfg_blocked() logic
   * replace pnv_pci_cfg_{read,write} with eeh_ops->{read,write}_config in
 pnv_eeh_vf_restore_config()
   * rename pnv_eeh_vf_restore_config() to pnv_eeh_restore_vf_config()
   * rename pnv_pci_fixup_vf_caps() to pnv_pci_vf_header_fixup() and move it
 to arch/powerpc/platforms/powernv/pci.c
   * add a field compound in pnv_ioda_pe to link compound PEs
   * handle compound PE for VF PEs
v3:
   * add back vf_index in pci_dn to track the VF's index
   * rename ppdev in eeh_dev to physfn for consistency
   * move edev->physfn assignment before dev->dev.archdata.edev is set
   * move pnv_pci_fixup_vf_eeh() and pnv_pci_fixup_vf_caps() to eeh-powernv.c
   * more clear and detail in commit log and comment in code
   * merge eeh_rmv_virt_device() with eeh_rmv_device()
   * move the cfg_blocked check logic from pnv_eeh_read/write_config() to
 pnv_eeh_cfg_blocked()
   * move the vf reset/restore logic into its own patch, two patches are
 created.
 powerpc/powernv: Support PCI config restore for VFs
 powerpc/powernv: Support EEH reset for VFs
   * simplify the vf reset logic
v2:
   * add prefix pci_iov_ to virtfn_add/virtfn_remove
   * use EEH_DEV_VF as a flag for a VF's eeh_dev
   * use eeh_dev instead of edev in change log
   * remove vf_index in eeh_dev, calculate it from pdn->busno and devfn
   * do eeh_add_device_late() and eeh_sysfs_add_device() both after pci_dev is
 well initialized
   * do FLR to reset a VF PE
   * imitate the restore function in FW for VF
   * remove the reverse order patch, since it is still under discussion

Wei Yang (11):
  PCI/IOV: Rename and export virtfn_add/virtfn_remove
  PCI: Add pcibios_bus_add_device() weak function
  powerpc/pci: Cache VF index in pci_dn
  powerpc/pci: Remove VFs prior to PF
  powerpc/eeh: Cache only BARs, not windows or IOV BARs
  powerpc/powernv: EEH device for VF
  powerpc/eeh: Create PE for VFs
  powerpc/powernv: Support EEH reset for VF PE
  powerpc/powernv: Support PCI config restore for VFs
  powerpc/eeh: Support error recovery for VF PE
  powerpc/powernv: compound PE for VFs

 arch/powerpc/include/asm/eeh.h   |4 +
 arch/powerpc/include/asm/pci-bridge.h|2 +
 arch/powerpc/kernel/eeh.c|8 +
 arch/powerpc/kernel/eeh_cache.c  |6 +-
 arch/powerpc/kernel/eeh_driver.c |  100 +---
 arch/powerpc/kernel/eeh_pe.c |   13 +-
 arch/powerpc/kernel/pci-hotplug.c|2 +-
 arch/powerpc/kernel/pci_dn.c |   16 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c |  220 +-
 arch/powerpc/platforms/powernv/pci-ioda.c 

Re: [PATCH v5 3/3] leds/powernv: Add driver for PowerNV platform

2015-07-16 Thread Vasant Hegde
On 07/16/2015 02:17 PM, Michael Ellerman wrote:
> On Thu, 2015-07-16 at 10:27 +0200, Jacek Anaszewski wrote:
>> On 07/16/2015 08:54 AM, Vasant Hegde wrote:
> +static enum led_brightness powernv_led_get(struct led_classdev *led_cdev)
> +{
> +char *loc_code;
> +int rc, led_type;
> +__be64 led_mask, led_value, max_led_type;
> +
> +led_type = powernv_get_led_type(led_cdev);
> +if (led_type == -1)
> +return LED_OFF;
> +
> +loc_code = powernv_get_location_code(led_cdev);
> +if (!loc_code)
> +return LED_OFF;
> +
> +/* Fetch all LED status */
> +led_mask = cpu_to_be64(0);
> +led_value = cpu_to_be64(0);
> +max_led_type = cpu_to_be64(OPAL_SLOT_LED_TYPE_MAX);
> +
> +rc = opal_leds_get_ind(loc_code, &led_mask, &led_value, 
> &max_led_type);
> +if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) {
> +dev_err(led_cdev->dev,
> +"%s: OPAL get led call failed [rc=%d]\n",
> +__func__, rc);
> +goto led_fail;
> +}
> +
> +led_mask = be64_to_cpu(led_mask);
> +led_value = be64_to_cpu(led_value);

 be64_to_cpu result should be assigned to the variable of u64/s64 type.
>>>
>>> PowerNV platform is capable of running both big/little endian mode.. But
>>> presently our firmware is big endian. These variable contains big endian 
>>> values.
>>> Hence I have created as __be64 .. (This is the convention we follow in other
>>> places as well).
>>
>> It is correct that the argument is of __be64 type, but be64_to_cpu
>> returns u64 type, whereas you assign it to  __be64.
> 
> Yeah that's wrong. You are using led_mask etc as __be64 when you pass them to
> firmware, which is correct, but then you're also using them as the lvalue of
> be64_to_cpu() which returns a u64.
> 

Yep. Got it.
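
To make the type issue concrete, here is a small user-space sketch, assuming
glibc's <endian.h>. sketch_fw_get() is a hypothetical stand-in for a firmware
call that returns big-endian values; in the kernel the first pair of
variables would be __be64 and the second pair u64.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <endian.h>	/* htobe64(), be64toh(): glibc/Linux */
#include <stdint.h>

/* Hypothetical stand-in for a firmware call that fills big-endian outputs. */
static void sketch_fw_get(uint64_t *mask_be, uint64_t *value_be)
{
	*mask_be = htobe64(0x3);
	*value_be = htobe64(0x1);
}

/* Returns 1 if the bit for led_type is set in the value from "firmware". */
int sketch_led_is_on(int led_type)
{
	uint64_t mask_be = 0, value_be = 0;	/* would be __be64 in the kernel */
	uint64_t mask, value;			/* CPU-endian copies (u64) */

	assert(led_type >= 0 && led_type < 64);
	sketch_fw_get(&mask_be, &value_be);

	/* Convert into separate variables instead of overwriting the BE ones. */
	mask = be64toh(mask_be);
	value = be64toh(value_be);

	if (!((mask >> led_type) & 1))
		return 0;	/* status not available */
	return (int)((value >> led_type) & 1);
}
```

Keeping the big-endian and CPU-endian values in differently typed variables
is what lets a checker such as sparse flag a missing or doubled conversion.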


> Sparse should warn you about that if you use it, please do.
> 
> $ apt-get install sparse
> $ cd kernel
> $ make C=2 CF=-D__CHECK_ENDIAN__
> 

Thanks!

-Vasant


Re: [PATCH v5 3/3] leds/powernv: Add driver for PowerNV platform

2015-07-16 Thread Vasant Hegde
On 07/16/2015 01:57 PM, Jacek Anaszewski wrote:
> Hi Vasan,
> 

Hello Jacek,

.../...

>>
>> I have added as
>> -  compatible : "ibm,opal-v3-led".
> 
> Please retain "Should be :".
> 

Done.

.../...

>>>
>>> Please parse the led type once upon initialization and add related
>>> property to the struct powernv_led_data that will hold the value.
>>
>> I thought we can get location code and type using class dev name itself. 
>> Hence I
>> didn't add these two properties to structure..
> 
> This way you are doing extra work for parsing the name each time
> the brightness is set.

Agreed. I have added them to structure now.

> 
>> Do you want me to add them to structure itself?
> 
> Yes, please add them.

Done.

> 
>>>
 +loc_code = powernv_get_location_code(led_cdev);
 +if (!loc_code)
 +return;
>>>
>>> The same situation as in case of led type.
>>>
 +/* Prepare for the OPAL call */
 +max_led_type = cpu_to_be64(OPAL_SLOT_LED_TYPE_MAX);
>>>
>>> This value could be also calculated only once.
>>
>> Yeah. May be I can move this to powernv_leds_priv structure.
>>
>>>
 +led_mask = OPAL_SLOT_LED_STATE_ON << led_type;
 +if (value)
 +led_value = led_mask;
 +
 +/* OPAL async call */
 +token = opal_async_get_token_interruptible();
 +if (token < 0) {
 +if (token != -ERESTARTSYS)
 +dev_err(led_cdev->dev,
 +"%s: Couldn't get OPAL async token\n",
 +__func__);
 +goto out_loc;
 +}
 +
 +rc = opal_leds_set_ind(token, loc_code,
 +   led_mask, led_value, &max_led_type);
 +if (rc != OPAL_ASYNC_COMPLETION) {
 +dev_err(led_cdev->dev,
 +"%s: OPAL set LED call failed for %s [rc=%d]\n",
 +__func__, loc_code, rc);
 +goto out_token;
 +}
 +
 +rc = opal_async_wait_response(token, &msg);
 +if (rc) {
 +dev_err(led_cdev->dev,
 +"%s: Failed to wait for the async response [rc=%d]\n",
 +__func__, rc);
 +goto out_token;
 +}
 +
 +rc = be64_to_cpu(msg.params[1]);
 +if (rc != OPAL_SUCCESS)
 +dev_err(led_cdev->dev,
 +"%s : OAPL async call returned failed [rc=%d]\n",
 +__func__, rc);
 +
 +out_token:
 +opal_async_release_token(token);
 +
 +out_loc:
 +kfree(loc_code);
 +}
 +
 +/*
 + * This function fetches the LED state for a given LED type for
 + * mentioned LED classdev structure.
 + */
 +static enum led_brightness powernv_led_get(struct led_classdev *led_cdev)
 +{
 +char *loc_code;
 +int rc, led_type;
 +__be64 led_mask, led_value, max_led_type;
 +
 +led_type = powernv_get_led_type(led_cdev);
 +if (led_type == -1)
 +return LED_OFF;
 +
 +loc_code = powernv_get_location_code(led_cdev);
 +if (!loc_code)
 +return LED_OFF;
 +
 +/* Fetch all LED status */
 +led_mask = cpu_to_be64(0);
 +led_value = cpu_to_be64(0);
 +max_led_type = cpu_to_be64(OPAL_SLOT_LED_TYPE_MAX);
 +
 +rc = opal_leds_get_ind(loc_code, &led_mask, &led_value, 
 &max_led_type);
 +if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) {
 +dev_err(led_cdev->dev,
 +"%s: OPAL get led call failed [rc=%d]\n",
 +__func__, rc);
 +goto led_fail;
 +}
 +
 +led_mask = be64_to_cpu(led_mask);
 +led_value = be64_to_cpu(led_value);
>>>
>>> be64_to_cpu result should be assigned to the variable of u64/s64 type.
>>
>> PowerNV platform is capable of running both big/little endian mode.. But
>> presently our firmware is big endian. These variable contains big endian 
>> values.
>> Hence I have created as __be64 .. (This is the convention we follow in other
>> places as well).
> 
> It is correct that the argument is of __be64 type, but be64_to_cpu
> returns u64 type, whereas you assign it to  __be64.
> 

Got it .. Fixed.

>>>
 +/* LED status available */
 +if (!((led_mask >> led_type) & OPAL_SLOT_LED_STATE_ON)) {
 +dev_err(led_cdev->dev,
 +"%s: LED status not available for %s\n",
 +__func__, led_cdev->name);
 +goto led_fail;
 +}
 +
 +/* LED status value */
 +if ((led_value >> led_type) & OPAL_SLOT_LED_STATE_ON) {
 +kfree(loc_code);
 +return LED_FULL;
 +}
 +
 +led_fail:
 +kfree(loc_code);
 +return LED_OFF;
 +}
 +
 +/* Execute LED set task for given led classdev */
 +static void powernv_deferred_led_set(struct work_struct *work)
 +{
 +struct powernv_led_data *powernv_led =
 +container_of(work, struct powernv_led_

Re: BUG: perf error on syscalls for powerpc64.

2015-07-16 Thread Zumeng Chen

On 2015-07-17 12:07, Michael Ellerman wrote:

On Fri, 2015-07-17 at 09:27 +0800, Zumeng Chen wrote:

On 2015-07-16 17:04, Michael Ellerman wrote:

On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:

Hi All,

1028ccf5 did a change for sys_call_table from a pointer to an array of
unsigned long, I think it's not proper, here is my reason:

sys_call_table defined as a label in assembler should be pointer array
rather than an array as described in 1028ccf5. If we defined it as an
array, then arch_syscall_addr will return the address of sys_call_table[],
actually the content of sys_call_table[] is demanded by arch_syscall_addr.
so 'perf list' will ignore all syscalls since find_syscall_meta will
return null
in init_ftrace_syscalls because of the wrong arch_syscall_addr.

Did I miss something, or Gcc compiler has done something newer ?

Hi Zumeng,

It works for me with the code as it is in mainline.

I don't quite follow your explanation, so if you're seeing a bug please send
some information about what you're actually seeing. And include the disassembly
of arch_syscall_addr() and your compiler version etc.

Hi Michael,

Hi Zumeng,


Yeah, it seems it was not a good explanation, I'll explain more this time:

1. However we declare sys_call_table at the C level, at the assembly level it
  is actually a pointer to sys_call_table rather than sys_call_table itself.

No it's not a pointer.


Then what is the second one in the following:

zchen@pek-yocto-build2:$ cat  System.map |grep sys_call_table
c0009590 T .sys_call_table  <-this is a real sys_call_table.
c14e1b48 D sys_call_table  <-this should be referred by arch_syscall_addr


The c14e1b48[0] = c0009590



A pointer is a location in memory that contains the address of another location
in memory.


Yeah, this definition is right.




  arch/powerpc/kernel/systbl.S
  47 .globl sys_call_table   <--- see here
  48 sys_call_table:

Which gives us a .o that looks like:

    :
0: R_PPC64_ADDR64   sys_restart_syscall
8: R_PPC64_ADDR64   sys_restart_syscall
10: R_PPC64_ADDR64  sys_exit
18: R_PPC64_ADDR64  sys_exit

ie. at the location in memory called sys_call_table we have *the contents of
the syscall table*.

We do not have *the address* of the syscall table.

You can also see in the System.map:

   c0bb0798 R sys_call_table
   c0bb1e58 r cache_type_info


Please refer to `cat  System.map` above



ie. sys_call_table occupies 5824 bytes. If it was a pointer it would only
occupy 8 bytes.

Compare to SYS_CALL_TABLE, which *is* a pointer.

   c1172bf8 d SYS_CALL_TABLE
   c1172c00 d exception_marker

Note, 8 bytes.


Finally if you look at a running system using xmon:

   0:mon> d $sys_call_table
   c08f0798 c00a85a0 c00a85a0  ||
   c08f07a8 c0099b40 c0099b40  |...@...@|


This is the right sys_call_table, but not what I'm talking about. What I'm
talking about is that the definition of sys_call_table by that commit will
incur the following result:


sys_call_table[0]= 0xc14e1b48[0] = c0009590

   0:mon> la c00a85a0
   c00a85a0: .sys_restart_syscall+0x0/0x40
   0:mon> la c0099b40
   c0099b40: .SyS_exit+0x0/0x20

   0:mon> d $SYS_CALL_TABLE
   c0ec68f8 c08f0798 7265677368657265  |regshere|
^
 this is the address of sys_call_table


As another example, see hcall_real_table, which is basically identical, and is
also declared as an array in C.



3. What I have seen in 3.14.x kernel,
==
And so far, no more difference to 4.x kernel from me about this part if
I'm right.

*) With 1028ccf5

perf list|grep -i syscall got me nothing.


*) Without 1028ccf5
root@localhost:~# perf list|grep -i syscall
syscalls:sys_enter_socket  [Tracepoint event]
syscalls:sys_exit_socket   [Tracepoint event]
syscalls:sys_enter_socketpair  [Tracepoint event]
syscalls:sys_exit_socketpair   [Tracepoint event]
syscalls:sys_enter_bind[Tracepoint event]
syscalls:sys_exit_bind [Tracepoint event]
syscalls:sys_enter_listen  [Tracepoint event]
syscalls:sys_exit_listen   [Tracepoint event]
... ...

I don't know why that

Re: [PATCH] powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*

2015-07-16 Thread Michael Ellerman
On Thu, 2015-07-16 at 22:12 +1000, Paul Mackerras wrote:
> The hardware RNG on POWER8 and POWER7+ can be relatively slow, since
> it can only supply one 64-bit value per microsecond.  Currently we
> read it in arch_get_random_long(), but that slows down reading from
> /dev/urandom since the code in random.c calls arch_get_random_long()
> for every longword read from /dev/urandom.
> 
> Since the hardware RNG supplies high-quality entropy on every read, it
> matches the semantics of arch_get_random_seed_long() better than those
> of arch_get_random_long().  Therefore this commit makes the code use
> the hardware RNG only for arch_get_random_seed_{long,int} and not for
> arch_get_random_{long,int}.
> 
> Signed-off-by: Paul Mackerras 

Yep seems sensible.

Can you resend and CC some of the random folks, just in case they care.

eg: ty...@mit.edu, keesc...@chromium.org, h...@linux.intel.com.

cheers



Re: [RFC PATCH 01/12] powerpc/kernel: Get pt_regs from r9 before calling do_syscall_trace_enter()

2015-07-16 Thread Michael Ellerman
On Fri, 2015-07-17 at 08:40 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2015-07-15 at 17:37 +1000, Michael Ellerman wrote:
> > To call do_syscall_trace_enter() we need pt_regs in r3, but we don't need
> > to recalculate it based on r1, it's already in r9.
> > 
> > Signed-off-by: Michael Ellerman 
> 
> Is there any performance difference ?

No.

I'm not going to bother measuring it :)

> I find the addi a bit more robust in case the code gets moved around or
> the "previous" code gets changed to either not use r9 or clobber it,
> which would have the potential to
> introduce a subtle bug ...

Yeah true.

There is an "invariant" in that entry code that r9 contains pt_regs, you can
see for example the DTL code goes to pains to ensure it puts pt_regs back in r9
after it clobbers it. As does the current syscall_dotrace.

But looking closer I don't see where we actually use that (prior to this
patch).

So yeah I'll drop this and send a clean up to just get rid of all the r9
reloading.

cheers




Re: BUG: perf error on syscalls for powerpc64.

2015-07-16 Thread Michael Ellerman
On Fri, 2015-07-17 at 09:27 +0800, Zumeng Chen wrote:
> On 2015年07月16日 17:04, Michael Ellerman wrote:
> > On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
> >> Hi All,
> >>
> >> 1028ccf5 did a change for sys_call_table from a pointer to an array of
> >> unsigned long, I think it's not proper, here is my reason:
> >>
> >> sys_call_table defined as a label in assembler should be pointer array
> >> rather than an array as described in 1028ccf5. If we defined it as an
> >> array, then arch_syscall_addr will return the address of sys_call_table[],
> >> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
> >> so 'perf list' will ignore all syscalls since find_syscall_meta will
> >> return null
> >> in init_ftrace_syscalls because of the wrong arch_syscall_addr.
> >>
> >> Did I miss something, or Gcc compiler has done something newer ?
> > Hi Zumeng,
> >
> > It works for me with the code as it is in mainline.
> >
> > I don't quite follow your explanation, so if you're seeing a bug please send
> > some information about what you're actually seeing. And include the 
> > disassembly
> > of arch_syscall_addr() and your compiler version etc.
> 
> Hi Michael,

Hi Zumeng,

> Yeah, it seems it was not a good explanation, I'll explain more this time:
> 
> 1. Whatever we exclaim sys_call_table in C level, actually it is a pointer
>  to sys_call_table rather than sys_call_table self in assemble level.

No it's not a pointer.

A pointer is a location in memory that contains the address of another location
in memory.

>  arch/powerpc/kernel/systbl.S
>  47 .globl sys_call_table   <--- see here
>  48 sys_call_table:

Which gives us a .o that looks like:

   :
   0: R_PPC64_ADDR64   sys_restart_syscall
   8: R_PPC64_ADDR64   sys_restart_syscall
   10: R_PPC64_ADDR64  sys_exit
   18: R_PPC64_ADDR64  sys_exit

ie. at the location in memory called sys_call_table we have *the contents of
the syscall table*.

We do not have *the address* of the syscall table.

You can also see in the System.map:

  c0bb0798 R sys_call_table
  c0bb1e58 r cache_type_info

ie. sys_call_table occupies 5824 bytes. If it was a pointer it would only
occupy 8 bytes.

Compare to SYS_CALL_TABLE, which *is* a pointer.

  c1172bf8 d SYS_CALL_TABLE
  c1172c00 d exception_marker

Note, 8 bytes.


Finally if you look at a running system using xmon:

  0:mon> d $sys_call_table
  c08f0798 c00a85a0 c00a85a0  ||
  c08f07a8 c0099b40 c0099b40  |...@...@|

  0:mon> la c00a85a0
  c00a85a0: .sys_restart_syscall+0x0/0x40
  0:mon> la c0099b40
  c0099b40: .SyS_exit+0x0/0x20

  0:mon> d $SYS_CALL_TABLE
  c0ec68f8 c08f0798 7265677368657265  |regshere|
   ^
 this is the address of sys_call_table


As another example, see hcall_real_table, which is basically identical, and is
also declared as an array in C.
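
The size argument above can be checked with a few lines of arithmetic. This
is a sketch only: it treats the gap to the next System.map symbol as the
symbol's size and uses the addresses exactly as they appear in this message.
The sketch_* helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size of a symbol, estimated as the gap to the next symbol in System.map. */
uint64_t sketch_symbol_span(uint64_t addr, uint64_t next_addr)
{
	assert(next_addr >= addr);
	return next_addr - addr;
}

/* Number of 8-byte entries that fit in a span (one 64-bit address each). */
uint64_t sketch_entry_count(uint64_t span)
{
	return span / sizeof(uint64_t);
}
```

For sys_call_table, sketch_symbol_span(0xc0bb0798, 0xc0bb1e58) gives 5824
bytes, i.e. room for 728 eight-byte entries, while for SYS_CALL_TABLE
sketch_symbol_span(0xc1172bf8, 0xc1172c00) gives 8 bytes, the size of a
single pointer.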


> 3. What I have seen in 3.14.x kernel,
> ==
> And so far, no more difference to 4.x kernel from me about this part if
> I'm right.
> 
> *) With 1028ccf5
> 
> perf list|grep -i syscall got me nothing.
> 
> 
> *) Without 1028ccf5
> root@localhost:~# perf list|grep -i syscall
>syscalls:sys_enter_socket  [Tracepoint event]
>syscalls:sys_exit_socket   [Tracepoint event]
>syscalls:sys_enter_socketpair  [Tracepoint event]
>syscalls:sys_exit_socketpair   [Tracepoint event]
>syscalls:sys_enter_bind[Tracepoint event]
>syscalls:sys_exit_bind [Tracepoint event]
>syscalls:sys_enter_listen  [Tracepoint event]
>syscalls:sys_exit_listen   [Tracepoint event]
>... ...

I don't know why that's happening.

Please just test 4.2-rc2 for now, so that there are not too many variables.

Assuming you have CONFIG_FTRACE_SYSCALLS=y, you can see the tracepoints in
debugfs with:

  $ ls -la /sys/kernel/debug/tracing/events/syscalls
  total 0
  drwxr-xr-x 596 root root 0 Jul 17 13:11 .
  drwxr-xr-x  45 root root 0 Jul 17 13:11 ..
  -rw-r--r--   1 root root 0 Jul 17 13:33 enable
  -rw-r--r--   1 root root 0 Jul 17 13:11 filter
  drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_accept
  drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_accept4
  drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_access
  drwxr-xr-x   2 root root 0 Jul 17 13:11 sys_enter_add_key
  ...


cheers




Re: [PATCH V3 2/2] powerpc/kexec: Reset HILE before entering target kernel

2015-07-16 Thread Benjamin Herrenschmidt
On Fri, 2015-07-17 at 11:53 +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2015-07-10 at 15:19 +1000, Samuel Mendoza-Jonas wrote:
> > +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_POWERNV)
> > +   li  r3,(FW_FEATURE_OPAL >> 16)
> > +   rldicr  r3,r3,16,63
> > +   and.r3,r3,r26
> > +   cmpwi   r3,0
> > +   beq 99f
> 
> If FW_FEATRURE_OPAL is 0x8000 then the li will sign extend.
> 
> The rldicr has a mask of all F's so it will keep all the bits you
> don't care about.

../..

Even better, you should be able to just do it all in C in
pnv_kexec_cpu_down(), after we wait for secondaries to
be in OPAL. At that point interrupts are already off, so
it should be all good.



Re: BUG: perf error on syscalls for powerpc64.

2015-07-16 Thread Ian Munsie
Excerpts from Sukadev Bhattiprolu's message of 2015-07-17 11:51:04 +1000:
> Are you seeing this on big-endian or little-endian system?
> 
> IIRC, I saw the opposite behavior on an LE system a few months ago.
> i.e. without 1028ccf5, 'perf listf|grep syscall' failed.
> 
> Applying 1028ccf5, seemed to fix it.

You could be on to something there - IIRC the ABI was changed for LE to
remove the dot symbols. Might be worth testing on both.

Cheers,
-Ian


Re: [PATCH V3 2/2] powerpc/kexec: Reset HILE before entering target kernel

2015-07-16 Thread Benjamin Herrenschmidt
On Fri, 2015-07-10 at 15:19 +1000, Samuel Mendoza-Jonas wrote:
> +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_PPC_POWERNV)
> +   li  r3,(FW_FEATURE_OPAL >> 16)
> +   rldicr  r3,r3,16,63
> +   and.r3,r3,r26
> +   cmpwi   r3,0
> +   beq 99f

If FW_FEATURE_OPAL is 0x8000 then the li will sign extend.

The rldicr has a mask of all F's so it will keep all the bits you
don't care about.

So together, you'll get compares happening on bits above the 16 you care
about that might change the result of your comparison incorrectly.

Since FW_FEATURE_* bits aren't ABI, they can change, so we don't want
to impose a constraint on them.

Thus I would recommend using an rldicl r3,r3,16,48 (aka srdi r3,r3,48)
instead which is going to clear all bits above 0x.

Now, that being said, FW_FEATURE_* can be 64-bit and this isn't perf
critical so why not just load the full 64-bit constant into r3 and
be done with it ? There's a macro to do that:

LOAD_REG_IMMEDIATE(r3,FW_FEATURE_OPAL)
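
The sign-extension hazard can be modelled in plain C. This is a sketch under
assumptions: the two helpers only imitate "li" with a 16-bit immediate and
"rldicr rD,rS,16,63" (a rotate with an all-ones mask), and the immediates
0x4000 and 0x8000 are hypothetical values chosen to show the two cases, not
the actual FW_FEATURE_OPAL constant.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "li rD,imm": the 16-bit immediate is sign-extended to 64 bits. */
uint64_t sketch_li(uint16_t imm)
{
	if (imm & 0x8000)
		return (uint64_t)imm | ~UINT64_C(0xFFFF);
	return imm;
}

/* Model of "rldicr rD,rS,sh,63": rotate left by sh, keeping every bit. */
uint64_t sketch_rotl64(uint64_t v, unsigned sh)
{
	assert(sh > 0 && sh < 64);
	return (v << sh) | (v >> (64 - sh));
}
```

With the top bit of the immediate clear, sketch_rotl64(sketch_li(0x4000), 16)
is 0x40000000 as intended. With the top bit set,
sketch_rotl64(sketch_li(0x8000), 16) is 0xFFFFFFFF8000FFFF, so the following
"and." would also test bits that were never meant to be part of the mask.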

Cheers,
Ben.




Re: BUG: perf error on syscalls for powerpc64.

2015-07-16 Thread Sukadev Bhattiprolu
Zumeng Chen [zumeng.c...@gmail.com] wrote:
| 3. What I have seen in 3.14.x kernel,
| ==
| And so far, no more difference to 4.x kernel from me about this part if
| I'm right.
| 
| *) With 1028ccf5
| 
| perf list|grep -i syscall got me nothing.
| 
| 
| *) Without 1028ccf5
| root@localhost:~# perf list|grep -i syscall
|syscalls:sys_enter_socket  [Tracepoint event]
|syscalls:sys_exit_socket   [Tracepoint event]
|syscalls:sys_enter_socketpair  [Tracepoint event]
|syscalls:sys_exit_socketpair   [Tracepoint event]
|syscalls:sys_enter_bind[Tracepoint event]
|syscalls:sys_exit_bind [Tracepoint event]
|syscalls:sys_enter_listen  [Tracepoint event]
|syscalls:sys_exit_listen   [Tracepoint event]
|... ...

Are you seeing this on big-endian or little-endian system?

IIRC, I saw the opposite behavior on an LE system a few months ago.
i.e. without 1028ccf5, 'perf list|grep syscall' failed.

Applying 1028ccf5 seemed to fix it.

Sukadev


Re: BUG: perf error on syscalls for powerpc64.

2015-07-16 Thread Zumeng Chen
On 2015-07-16 17:04, Michael Ellerman wrote:
> On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
>> Hi All,
>>
>> 1028ccf5 did a change for sys_call_table from a pointer to an array of
>> unsigned long, I think it's not proper, here is my reason:
>>
>> sys_call_table defined as a label in assembler should be pointer array
>> rather than an array as described in 1028ccf5. If we defined it as an
>> array, then arch_syscall_addr will return the address of sys_call_table[],
>> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
>> so 'perf list' will ignore all syscalls since find_syscall_meta will
>> return null
>> in init_ftrace_syscalls because of the wrong arch_syscall_addr.
>>
>> Did I miss something, or Gcc compiler has done something newer ?
> Hi Zumeng,
>
> It works for me with the code as it is in mainline.
>
> I don't quite follow your explanation, so if you're seeing a bug please send
> some information about what you're actually seeing. And include the 
> disassembly
> of arch_syscall_addr() and your compiler version etc.

Hi Michael,

Yeah, it seems it was not a good explanation, I'll explain more this time:

1. However we declare sys_call_table at the C level, at the assembly level it
 is actually a pointer to sys_call_table rather than sys_call_table itself.

 arch/powerpc/kernel/systbl.S
 47 .globl sys_call_table   <--- see here
 48 sys_call_table:

 So if you want to declare sys_call_table as an array, then I think it's very
 clear what we'll get when we do sys_call_table[i].

2. Difference in the disassembly of arch_syscall_addr with and without
   1028ccf5


 *) With 1028ccf5
   -
Dump of assembler code for function arch_syscall_addr:
522{
523return (unsigned long)sys_call_table[nr];
0xc0df53d4 <+0>:addis   r10,r2,-13
0xc0df53d8 <+4>:addir9,r10,3488
0xc0df53dc <+8>:rldicr  r3,r3,3,60
524}
0xc0df53e0 <+12>:ldx r3,r9,r3
0xc0df53e4 <+16>:blr


 *) Without 1028ccf5
 ---
Dump of assembler code for function arch_syscall_addr:
522{
523return (unsigned long)sys_call_table[nr];
0xc0df53d0 <+0>:addis   r10,r2,-13
0xc0df53d4 <+4>:addir9,r10,3488
0xc0df53d8 <+8>:rldicr  r3,r3,3,60
0xc0df53dc <+12>:ld  r9,0(r9) <--only this is different
524}
0xc0df53e0 <+16>:ldx r3,r9,r3
0xc0df53e4 <+20>:blr
End of assembler dump.

3. What I have seen in 3.14.x kernel,
==
And so far, no more difference to 4.x kernel from me about this part if
I'm right.

*) With 1028ccf5

perf list|grep -i syscall got me nothing.


*) Without 1028ccf5
root@localhost:~# perf list|grep -i syscall
   syscalls:sys_enter_socket  [Tracepoint event]
   syscalls:sys_exit_socket   [Tracepoint event]
   syscalls:sys_enter_socketpair  [Tracepoint event]
   syscalls:sys_exit_socketpair   [Tracepoint event]
   syscalls:sys_enter_bind[Tracepoint event]
   syscalls:sys_exit_bind [Tracepoint event]
   syscalls:sys_enter_listen  [Tracepoint event]
   syscalls:sys_exit_listen   [Tracepoint event]
   ... ...
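
The one-instruction difference between the two listings in point 2 can be
modelled in user-space C. This is a sketch, not the kernel code: sketch_table
and sketch_table_ptr are hypothetical objects, and the memcpy() stands in for
the extra "ld r9,0(r9)" that the compiler emits when the symbol is declared
as a pointer rather than as an array.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table contents; real entries would be handler addresses. */
static const unsigned long sketch_table[] = { 0x1000, 0x1000, 0x2000, 0x2000 };

/* A separate word that holds the table's address (a pointer to the table). */
static const unsigned long *const sketch_table_ptr = sketch_table;

/* Array view: index straight off the symbol's address (one load). */
unsigned long sketch_lookup_array(const unsigned long sym[], int nr)
{
	assert(nr >= 0);
	return sym[nr];
}

/* Pointer view: first load the word stored at the symbol, then index it. */
unsigned long sketch_lookup_pointer(const void *sym, int nr)
{
	const unsigned long *base;

	assert(nr >= 0);
	memcpy(&base, sym, sizeof(base));	/* models the extra ld r9,0(r9) */
	return base[nr];
}
```

Both views return the same entry only when each is applied to the matching
kind of object: sketch_lookup_array(sketch_table, 2) and
sketch_lookup_pointer(&sketch_table_ptr, 2) both yield 0x2000. Which view is
right for sys_call_table therefore depends on what the memory at that symbol
actually holds.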

Cheers,
Zumeng

>
> cheers
>
>

Re: [PATCH 2/2] powerpc/powernv: Double VF BAR size for compound PE

2015-07-16 Thread Gavin Shan
On Fri, Jul 17, 2015 at 10:14:43AM +1000, Gavin Shan wrote:
>When the VF BAR size is 128MB or bigger, we extend the corresponding
>PF's IOV BAR to cover the total number of VFs supported by the PF.
>Otherwise, we extend the PF's IOV BAR to cover 256 VFs. In the former
>case, we have to create a compound PE, which includes 4 VFs. Those 4
>VFs can't be passed through to different guests, which isn't good.
>
>The gate (128MB) was chosen based on the assumption that each PHB
>supports 64GB of M64 space and one PF's IOV BAR can be extended to
>1/4 of that, which is 16GB. However, the IOV BAR can be extended to
>half of the PHB's M64 window when the PF sits behind the root port.
>In that case, the gate can be enlarged to 256MB to avoid compound
>PEs where we can.
>
>Signed-off-by: Gavin Shan 
>---
> arch/powerpc/platforms/powernv/pci-ioda.c | 21 -
> 1 file changed, 16 insertions(+), 5 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 6ec62b9..5b2e88f 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -2721,6 +2721,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>pci_dev *pdev)
>   struct resource *res;
>   int i;
>   resource_size_t size;
>+  resource_size_t limit;
>   struct pci_dn *pdn;
>   int mul, total_vfs;
>
>@@ -2730,6 +2731,18 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>pci_dev *pdev)
>   hose = pci_bus_to_host(pdev->bus);
>   phb = hose->private_data;
>
>+  /*
>+   * When the PF seats behind root port, the IOV BAR can
>+   * consume half of the PHB's M64 window. Otherwise,
>+   * 1/4 of the PHB's M64 window can be consumed to the
>+   * maximal degree.
>+   */
>+  if (!pci_is_root_bus(pdev->bus) &&
>+  pci_is_root_bus(pdev->bus->self->bus))
>+  limit = 128;
>+  else
>+  limit = 256;
>+

I sent it too fast. The limit should be reversed: 256 when the PF sits behind
the root port; otherwise, it should be 128. I will send a follow-up v2 after
waiting a couple of days in case there are comments on this revision.

>   pdn = pci_get_pdn(pdev);
>   pdn->vfs_expanded = 0;
>
>@@ -2748,11 +2761,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
>pci_dev *pdev)
>   }
>
>   size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
>-
>-  /* bigger than 64M */
>-  if (size > (1 << 26)) {
>-  dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size 
>is bigger than 64M, roundup power2\n",
>-   i, res);
>+  if (size >= (limit * 0x10)) {
>+  dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size 
>is bigger than %lldMB, roundup power2\n",
>+   i, res, limit);
>   pdn->m64_per_iov = M64_PER_IOV;
>   mul = roundup_pow_of_two(total_vfs);
>   break;

Thanks,
Gavin
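
The corrected gate selection described in the reply above can be sketched
in plain C as follows. This is only an illustration: the helper names are
hypothetical, the "PF sits directly behind the root port" test is reduced
to a boolean, and the MB-to-bytes conversion (1 << 20) is an assumption
rather than something taken from the posted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MIB (UINT64_C(1) << 20)	/* assumed MB-to-bytes factor */

/* Gate in MB: 256 when the PF sits behind the root port, else 128. */
uint64_t vf_bar_gate_mb(bool pf_behind_root_port)
{
	return pf_behind_root_port ? 256 : 128;
}

/* A VF BAR at or above the gate is the case that needs a compound PE. */
bool vf_bar_needs_compound_pe(uint64_t vf_bar_size, bool pf_behind_root_port)
{
	return vf_bar_size >= vf_bar_gate_mb(pf_behind_root_port) * DEMO_MIB;
}

/* Usage example: a 128MB VF BAR is only "big" when the gate is 128MB. */
void vf_bar_gate_example(void)
{
	assert(vf_bar_needs_compound_pe(128 * DEMO_MIB, false));
	assert(!vf_bar_needs_compound_pe(128 * DEMO_MIB, true));
}
```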


[PATCH 0/2] powerpc/powernv: Avoid compound PE for VF

2015-07-16 Thread Gavin Shan
When the VF BAR size is 128MB or bigger, the IOV BAR is extended to cover
the maximal number of VFs supported by the PF, not 256. Also, one of the
PHB's M64 BARs is picked to cover the VF BARs for 4 contiguous VFs, but that
M64 BAR is configured as being owned by a single PE. As a result, those 4 VFs
have 4 separate PEs from the perspective of PCI config or DMA, but a single
shared PE from MMIO's perspective. Once we have a compound PE, the 4 VFs
included in it can't be passed to separate guests with the VFIO
infrastructure.

The above gate (128MB) was chosen based on the assumption that one IOV BAR
can consume 1/4 of the PHB's M64 window, which is 16GB. However, it can
consume as much as half of the window (32GB) when the PF sits behind the root
port. Accordingly, the gate can be doubled to 256MB in order to avoid
compound PEs where we can.


Gavin Shan (2):
  powerpc/powernv: Fix alignment for IOV BAR
  powerpc/powernv: Double VF BAR size for compound PE

 arch/powerpc/platforms/powernv/pci-ioda.c | 56 +--
 1 file changed, 45 insertions(+), 11 deletions(-)

-- 
2.1.0


[PATCH 2/2] powerpc/powernv: Double VF BAR size for compound PE

2015-07-16 Thread Gavin Shan
When the VF BAR size is 128MB or bigger, we extend the corresponding
PF's IOV BAR to cover the total number of VFs supported by the PF.
Otherwise, we extend the PF's IOV BAR to cover 256 VFs. In the former
case, we have to create a compound PE, which includes 4 VFs. Those 4
VFs can't be passed through to different guests, which isn't good.

The gate (128MB) was chosen based on the assumption that each PHB
supports 64GB of M64 space and one PF's IOV BAR can be extended to
1/4 of that, which is 16GB. However, the IOV BAR can be extended to
half of the PHB's M64 window when the PF sits behind the root port.
In that case, the gate can be enlarged to 256MB to avoid compound
PEs where we can.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 6ec62b9..5b2e88f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2721,6 +2721,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
struct resource *res;
int i;
resource_size_t size;
+   resource_size_t limit;
struct pci_dn *pdn;
int mul, total_vfs;
 
@@ -2730,6 +2731,18 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
hose = pci_bus_to_host(pdev->bus);
phb = hose->private_data;
 
+   /*
+* When the PF seats behind root port, the IOV BAR can
+* consume half of the PHB's M64 window. Otherwise,
+* 1/4 of the PHB's M64 window can be consumed to the
+* maximal degree.
+*/
+   if (!pci_is_root_bus(pdev->bus) &&
+   pci_is_root_bus(pdev->bus->self->bus))
+   limit = 128;
+   else
+   limit = 256;
+
pdn = pci_get_pdn(pdev);
pdn->vfs_expanded = 0;
 
@@ -2748,11 +2761,9 @@ static void pnv_pci_ioda_fixup_iov_resources(struct 
pci_dev *pdev)
}
 
size = pci_iov_resource_size(pdev, i + PCI_IOV_RESOURCES);
-
-   /* bigger than 64M */
-   if (size > (1 << 26)) {
-   dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size 
is bigger than 64M, roundup power2\n",
-i, res);
+   if (size >= (limit * 0x10)) {
+   dev_info(&pdev->dev, "PowerNV: VF BAR%d: %pR IOV size 
is bigger than %lldMB, roundup power2\n",
+i, res, limit);
pdn->m64_per_iov = M64_PER_IOV;
mul = roundup_pow_of_two(total_vfs);
break;
-- 
2.1.0


[PATCH 1/2] powerpc/powernv: Fix alignment for IOV BAR

2015-07-16 Thread Gavin Shan
The IOV BAR is extended to cover 256 VFs or the number of supported VFs,
and its alignment is the IOV BAR size, which is usually huge and bigger
than the M64 segment size (256MB). That means the IOV BAR is expected
to be assigned to the beginning of the PHB's M64 window, prior to the
other M64 BARs of PCI devices that are hooked to the PCI bus behind the
root port. The other M64 BARs actually need the M64 segment size, rather
than the huge IOV BAR size, as the required alignment.

The patch returns the M64 segment size if the IOV BAR size is bigger than
it when the PF sits behind the root port. Otherwise, the IOV BAR size
is returned as before. This saves a lot of M64 space, which would be
16GB in some cases I observed.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 35 +--
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index fdafbac..6ec62b9 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2961,16 +2961,39 @@ static resource_size_t pnv_pci_window_alignment(struct 
pci_bus *bus,
 static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
  int resno)
 {
+   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+   struct pnv_phb *phb = hose->private_data;
struct pci_dn *pdn = pci_get_pdn(pdev);
-   resource_size_t align, iov_align;
+   resource_size_t align;
+   resource_size_t m64_segsz = phb->ioda.m64_segsize;
 
-   iov_align = resource_size(&pdev->resource[resno]);
-   if (iov_align)
-   return iov_align;
+   /*
+* When PF is the only one adapter under the PHB, the IOV BAR
+* is expected to be assigned prior to any other M64 BARs. To
+* have M64 segment size, which is usually smaller than IOV
+* BAR size, as the alignment to avoid wasting M64 space to
+* satisfy the alignment required by other M64 BARs.
+*/
+   align = resource_size(&pdev->resource[resno]);
+   if (align) {
+   if (!pci_bus_is_root(pdev->bus) &&
+   pci_bus_is_root(pdev->bus->self->bus))
+   align = min(align, m64_segsz);
+   else
+   align = max(align, m64_segsz);
+
+   return align;
+   }
 
align = pci_iov_resource_size(pdev, resno);
-   if (pdn->vfs_expanded)
-   return pdn->vfs_expanded * align;
+   if (pdn->vfs_expanded) {
+   align = pdn->vfs_expanded * align;
+   if (!pci_bus_is_root(pdev->bus) &&
+   pci_bus_is_root(pdev->bus->self->bus))
+   align = min(align, m64_segsz);
+   else
+   align = max(align, m64_segsz);
+   }
 
return align;
 }
-- 
2.1.0
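
The alignment choice made by the diff above can be summarised in a small
standalone sketch. It mirrors the min()/max() selection in the patch but is
not the kernel code: the function name is hypothetical and the bus topology
check is reduced to a boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MIB (UINT64_C(1) << 20)
#define DEMO_GIB (UINT64_C(1) << 30)

/*
 * Pick the alignment for an IOV BAR:
 *  - PF directly behind the root port: cap it at the M64 segment size,
 *  - otherwise: use at least the M64 segment size.
 */
uint64_t iov_bar_alignment(uint64_t iov_bar_size, uint64_t m64_segsz,
			   bool pf_behind_root_port)
{
	if (pf_behind_root_port)
		return iov_bar_size < m64_segsz ? iov_bar_size : m64_segsz;

	return iov_bar_size > m64_segsz ? iov_bar_size : m64_segsz;
}

/* Usage example: a 16GB IOV BAR with 256MB M64 segments. */
void iov_bar_alignment_example(void)
{
	assert(iov_bar_alignment(16 * DEMO_GIB, 256 * DEMO_MIB, true) ==
	       256 * DEMO_MIB);
	assert(iov_bar_alignment(16 * DEMO_GIB, 256 * DEMO_MIB, false) ==
	       16 * DEMO_GIB);
}
```

With the smaller alignment, the allocator no longer has to round the
placement of the IOV BAR up to its own (huge) size, which is where the
saved M64 space mentioned in the commit message comes from.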


[PATCH] powerpc/fsl-booke-64: Allow booting from the secondary thread

2015-07-16 Thread Scott Wood
According to Yuantian, this is needed for forthcoming power management
patches -- IIRC, for resuming from certain deep sleep states.

This also allows SMP kernels to work as kdump crash kernels.  While
crash kernels don't really need to be SMP, this prevents things from
breaking if a user does it anyway (which is not something you want to
only find out once the main kernel has crashed in the field, especially
if whether it works or not depends on which cpu crashed).

Signed-off-by: Scott Wood 
Cc: Tang Yuantian 
---
I'm sending this before the rest of the kexec patches, since Yuantian
needs it as a prerequisite.  Yuantian, if you explain the issue more I
can improve the commit message of this patch.
---
 arch/powerpc/platforms/85xx/smp.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/smp.c 
b/arch/powerpc/platforms/85xx/smp.c
index b8b8216..c2ded03 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -173,15 +173,22 @@ static inline u32 read_spin_table_addr_l(void *spin_table)
 static void wake_hw_thread(void *info)
 {
void fsl_secondary_thread_init(void);
-   unsigned long imsr1, inia1;
+   unsigned long imsr, inia;
int nr = *(const int *)info;
 
-   imsr1 = MSR_KERNEL;
-   inia1 = *(unsigned long *)fsl_secondary_thread_init;
-
-   mttmr(TMRN_IMSR1, imsr1);
-   mttmr(TMRN_INIA1, inia1);
-   mtspr(SPRN_TENS, TEN_THREAD(1));
+   imsr = MSR_KERNEL;
+   inia = *(unsigned long *)fsl_secondary_thread_init;
+
+   if (cpu_thread_in_core(nr) == 0) {
+   /* For when we boot on a secondary thread with kdump */
+   mttmr(TMRN_IMSR0, imsr);
+   mttmr(TMRN_INIA0, inia);
+   mtspr(SPRN_TENS, TEN_THREAD(0));
+   } else {
+   mttmr(TMRN_IMSR1, imsr);
+   mttmr(TMRN_INIA1, inia);
+   mtspr(SPRN_TENS, TEN_THREAD(1));
+   }
 
smp_generic_kick_cpu(nr);
 }
@@ -224,6 +231,12 @@ static int smp_85xx_kick_cpu(int nr)
 
smp_call_function_single(primary, wake_hw_thread, &nr, 0);
return 0;
+   } else if (cpu_thread_in_core(boot_cpuid) != 0 &&
+  cpu_first_thread_sibling(boot_cpuid) == nr) {
+   if (WARN_ON_ONCE(!cpu_has_feature(CPU_FTR_SMT)))
+   return -ENOENT;
+
+   smp_call_function_single(boot_cpuid, wake_hw_thread, &nr, 0);
}
 #endif
 
-- 
2.1.4


Re: [RFC PATCH 02/12] powerpc/kernel: Switch to using MAX_ERRNO

2015-07-16 Thread Benjamin Herrenschmidt
On Wed, 2015-07-15 at 17:37 +1000, Michael Ellerman wrote:
> Currently on powerpc we have our own #define for the highest (negative)
> errno value, called _LAST_ERRNO. This is defined to be 516, for reasons
> which are not clear.
> 
> The generic code, and x86, use MAX_ERRNO, which is defined to be 4095.
> 
> In particular seccomp uses MAX_ERRNO to restrict the value that a
> seccomp filter can return.
> 
> Currently with the mismatch between _LAST_ERRNO and MAX_ERRNO, a seccomp
> tracer wanting to return 600, expecting it to be seen as an error, would
> instead find on powerpc that userspace sees a successful syscall with a
> return value of 600.
> 
> To avoid this inconsistency, switch powerpc to use MAX_ERRNO.
> 
> We are somewhat confident that generic syscalls that can return a
> non-error value above negative MAX_ERRNO have already been updated to
> use force_successful_syscall_return().
> 
> I have also checked all the powerpc specific syscalls, and believe that
> none of them expect to return a non-error value between -MAX_ERRNO and
> -516. So this change should be safe ...
> 
> Signed-off-by: Michael Ellerman 

Acked-by: Benjamin Herrenschmidt 

> ---
>  arch/powerpc/include/uapi/asm/errno.h | 2 --
>  arch/powerpc/kernel/entry_32.S| 3 ++-
>  arch/powerpc/kernel/entry_64.S| 5 +++--
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/uapi/asm/errno.h 
> b/arch/powerpc/include/uapi/asm/errno.h
> index 8c145fd17d86..e8b6b5f7de7c 100644
> --- a/arch/powerpc/include/uapi/asm/errno.h
> +++ b/arch/powerpc/include/uapi/asm/errno.h
> @@ -6,6 +6,4 @@
>  #undef   EDEADLOCK
>  #define  EDEADLOCK   58  /* File locking deadlock error */
>  
> -#define _LAST_ERRNO  516
> -
>  #endif   /* _ASM_POWERPC_ERRNO_H */
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 46fc0f4d8982..67ecdf61f4e3 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -20,6 +20,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -354,7 +355,7 @@ ret_from_syscall:
>   SYNC
>   MTMSRD(r10)
>   lwz r9,TI_FLAGS(r12)
> - li  r8,-_LAST_ERRNO
> + li  r8,-MAX_ERRNO
>   andi.   
> r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>   bne-syscall_exit_work
>   cmplw   0,r3,r8
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 0796c487d3db..8292581a42f1 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -19,6 +19,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -207,7 +208,7 @@ system_call:  /* label this so stack 
> traces look sane */
>  #endif /* CONFIG_PPC_BOOK3E */
>  
>   ld  r9,TI_FLAGS(r12)
> - li  r11,-_LAST_ERRNO
> + li  r11,-MAX_ERRNO
>   andi.   
> r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>   bne-syscall_exit_work
>   cmpld   r3,r11
> @@ -279,7 +280,7 @@ syscall_exit_work:
>   beq+0f
>   REST_NVGPRS(r1)
>   b   2f
> -0:   cmpld   r3,r11  /* r10 is -LAST_ERRNO */
> +0:   cmpld   r3,r11  /* r11 is -MAX_ERRNO */
>   blt+1f
>   andi.   r0,r9,_TIF_NOERROR
>   bne-1f
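
The effect of the limit on the error check can be shown with a short
userspace sketch. It models the unsigned comparison against the negated
limit that the entry code performs; the names are hypothetical and only the
two constants (516 and 4095) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_LAST_ERRNO	516UL	/* old powerpc-specific limit */
#define DEMO_MAX_ERRNO	4095UL	/* generic limit */

/*
 * A syscall return value is treated as an error when, viewed as an
 * unsigned number, it lies in the top "limit" values of the range,
 * i.e. it is one of -1 ... -limit.
 */
bool ret_is_error(unsigned long ret, unsigned long limit)
{
	return ret >= 0UL - limit;
}

/* Usage example: -600 is only recognised as an error with MAX_ERRNO. */
void ret_is_error_example(void)
{
	unsigned long ret = 0UL - 600UL;	/* -600 as an unsigned value */

	assert(!ret_is_error(ret, DEMO_LAST_ERRNO));
	assert(ret_is_error(ret, DEMO_MAX_ERRNO));
}
```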



Re: [RFC PATCH 01/12] powerpc/kernel: Get pt_regs from r9 before calling do_syscall_trace_enter()

2015-07-16 Thread Benjamin Herrenschmidt
On Wed, 2015-07-15 at 17:37 +1000, Michael Ellerman wrote:
> To call do_syscall_trace_enter() we need pt_regs in r3, but we don't need
> to recalculate it based on r1, it's already in r9.
> 
> Signed-off-by: Michael Ellerman 

Is there any performance difference ?

I find the addi a bit more robust in case the code gets moved around or
the "previous" code gets changed to either not use r9 or clobber it,
which would have the potential to introduce a subtle bug ...

Ben.

> ---
>  arch/powerpc/kernel/entry_64.S | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 579e0f9a2d57..0796c487d3db 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -243,7 +243,9 @@ syscall_error:
>  /* Traced system call support */
>  syscall_dotrace:
>   bl  save_nvgprs
> - addir3,r1,STACK_FRAME_OVERHEAD
> +
> + /* Get pt_regs into r3 */
> + mr  r3, r9
>   bl  do_syscall_trace_enter
>   /*
>* Restore argument registers possibly just changed.



Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface

2015-07-16 Thread Peter Zijlstra
On Tue, Jul 14, 2015 at 08:01:54PM -0700, Sukadev Bhattiprolu wrote:
> +/*
> + * Use the transaction interface to read the group of events in @leader.
> + * PMUs like the 24x7 counters in Power, can use this to queue the events
> + * in the ->read() operation and perform the actual read in ->commit_txn.
> + *
> + * Other PMUs can ignore the ->start_txn and ->commit_txn and read each
> + * PMU directly in the ->read() operation.
> + */
> +static int perf_event_read_group(struct perf_event *leader)
> +{
> + int ret;
> + struct perf_event *sub;
> + struct pmu *pmu;
> +
> + pmu = leader->pmu;
> +
> + pmu->start_txn(pmu, PERF_PMU_TXN_READ);
> +
> + perf_event_read(leader);

There should be a lockdep assert with that list iteration.

> + list_for_each_entry(sub, &leader->sibling_list, group_entry)
> + perf_event_read(sub);
> +
> + ret = pmu->commit_txn(pmu);
> +
> + return ret;
> +}

Re: [PATCH v3 5/8] perf: Split perf_event_read_value()

2015-07-16 Thread Sukadev Bhattiprolu
Peter Zijlstra [pet...@infradead.org] wrote:
| On Tue, Jul 14, 2015 at 08:01:52PM -0700, Sukadev Bhattiprolu wrote:
| > Move the part of perf_event_read_value() that computes the event
| > counts and event times into a new function, perf_event_compute().
| > 
| > This would allow us to call perf_event_compute() independently.
| > 
| > Signed-off-by: Sukadev Bhattiprolu 
| > 
| > Changelog[v3]
| > Rather than move perf_event_read() into callers and then
| > rename, just move the computations into a separate function
| > (redesign to address comment from Peter Zijlstra).
| > ---
| 
| Changelog[] bits go here, below the '---' where they get discarded.

Sorry. Will fix it.

| 
| >  kernel/events/core.c |   37 -
| >  1 file changed, 24 insertions(+), 13 deletions(-)
| > 
| > diff --git a/kernel/events/core.c b/kernel/events/core.c
| > index 44fb89d..b1e9a42 100644
| > --- a/kernel/events/core.c
| > +++ b/kernel/events/core.c
| > @@ -3704,6 +3704,29 @@ static int perf_release(struct inode *inode, struct 
file *file)
| > return 0;
| >  }
| >  
| > +static u64 perf_event_compute(struct perf_event *event, u64 *enabled,
| > + u64 *running)
| 
| This is a horrible name, 'compute' what?

We are aggregating event counts and time for children.

Would perf_event_aggregate() or perf_event_aggregate_children()
be better?

| 
| > +{
| > +   struct perf_event *child;
| > +   u64 total;
| > +
| > +   total = perf_event_count(event);
| > +
| > +   *enabled += event->total_time_enabled +
| > +   atomic64_read(&event->child_total_time_enabled);
| > +   *running += event->total_time_running +
| > +   atomic64_read(&event->child_total_time_running);
| > +
| 
|   lockdep_assert_held(&event->child_mutex);

OK. Thanks for the comments.

Sukadev
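
For readers following the naming discussion, the aggregation that the new
helper performs can be sketched in userspace like this. The struct and
function names are simplified, hypothetical stand-ins: the real code walks
a kernel list, also adds the accumulated times of exited children, and (per
the review comment) should assert that child_mutex is held.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for struct perf_event. */
struct demo_event {
	uint64_t count;
	uint64_t total_time_enabled;
	uint64_t total_time_running;
	const struct demo_event *children;	/* array of child events */
	size_t nr_children;
};

/*
 * Return the event count summed over the event and its children, and
 * add the corresponding enabled/running times to *enabled and *running.
 */
uint64_t demo_event_aggregate(const struct demo_event *event,
			      uint64_t *enabled, uint64_t *running)
{
	uint64_t total = event->count;
	size_t i;

	assert(enabled && running);

	*enabled += event->total_time_enabled;
	*running += event->total_time_running;

	/* In the kernel, this walk is where child_mutex must be held. */
	for (i = 0; i < event->nr_children; i++) {
		const struct demo_event *child = &event->children[i];

		total += child->count;
		*enabled += child->total_time_enabled;
		*running += child->total_time_running;
	}

	return total;
}
```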


Re: [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces

2015-07-16 Thread Sukadev Bhattiprolu
Peter Zijlstra [pet...@infradead.org] wrote:
| On Tue, Jul 14, 2015 at 08:01:50PM -0700, Sukadev Bhattiprolu wrote:
| > @@ -1604,6 +1613,12 @@ static void power_pmu_start_txn(struct pmu *pmu)
| >  static void power_pmu_cancel_txn(struct pmu *pmu)
| >  {
| > struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
| > +   int txn_flags;
| > +
| > +   txn_flags = cpuhw->txn_flags;
| > +   cpuhw->txn_flags = 0;
| > +   if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD)
| > +   return;
| 
| That seems, unintentional? ;-)

Argh. Thanks for catching it.

Sukadev



Re: [PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-07-16 Thread Thomas Huth
On 05/27/2015 01:56 AM, Sam Bobroff wrote:
> In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
> bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
> accessed as such.
> 
> This patch corrects places where it is accessed as a 32 bit field by a
> 64 bit kernel.  In some cases this is via a 32 bit load or store
> instruction which, depending on endianness, will cause either the
> lower or upper 32 bits to be missed.  In another case it is cast as a
> u32, causing the upper 32 bits to be cleared.
> 
> This patch corrects those places by extending the access methods to
> 64 bits.
> 
> Signed-off-by: Sam Bobroff 

Reviewed-by: Thomas Huth 

Actually this patch also fixes a bug that SLOF sometimes crashes when a
vCPU gets kicked out of kernel mode (see the following URL for details:
https://bugzilla.redhat.com/show_bug.cgi?id=1178502 ), and I've just
tested that this bug does not occur with this patch anymore, so also:

Tested-by: Thomas Huth 
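
The two failure modes named in the commit message (a u32 cast and a 32 bit
load of a 64 bit field) can be reproduced in a few lines of portable C.
This is an illustrative sketch with hypothetical helper names, not the KVM
code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Passing the value through a u32 clears the upper 32 bits. */
uint64_t xer_via_u32(uint64_t xer)
{
	return (uint32_t)xer;
}

/*
 * A 32 bit load at the address of a 64 bit field reads the low half on
 * little endian and the high half on big endian.
 */
uint32_t xer_load32(const uint64_t *xer)
{
	uint32_t word;

	memcpy(&word, xer, sizeof(word));
	return word;
}

/* Usage example with a value that has bits set in both halves. */
void xer_width_example(void)
{
	uint64_t xer = UINT64_C(0x1234567800000020);
	uint32_t word = xer_load32(&xer);

	assert(xer_via_u32(xer) == UINT64_C(0x20));
	assert(word == UINT32_C(0x20) || word == UINT32_C(0x12345678));
}
```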


Re: [PATCH v3 5/8] perf: Split perf_event_read_value()

2015-07-16 Thread Peter Zijlstra
On Tue, Jul 14, 2015 at 08:01:52PM -0700, Sukadev Bhattiprolu wrote:
> Move the part of perf_event_read_value() that computes the event
> counts and event times into a new function, perf_event_compute().
> 
> This would allow us to call perf_event_compute() independently.
> 
> Signed-off-by: Sukadev Bhattiprolu 
> 
> Changelog[v3]
>   Rather than move perf_event_read() into callers and then
>   rename, just move the computations into a separate function
>   (redesign to address comment from Peter Zijlstra).
> ---

Changelog[] bits go here, below the '---' where they get discarded.

>  kernel/events/core.c |   37 -
>  1 file changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 44fb89d..b1e9a42 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3704,6 +3704,29 @@ static int perf_release(struct inode *inode, struct 
> file *file)
>   return 0;
>  }
>  
> +static u64 perf_event_compute(struct perf_event *event, u64 *enabled,
> +   u64 *running)

This is a horrible name, 'compute' what?

> +{
> + struct perf_event *child;
> + u64 total;
> +
> + total = perf_event_count(event);
> +
> + *enabled += event->total_time_enabled +
> + atomic64_read(&event->child_total_time_enabled);
> + *running += event->total_time_running +
> + atomic64_read(&event->child_total_time_running);
> +

lockdep_assert_held(&event->child_mutex);

> + list_for_each_entry(child, &event->child_list, child_list) {
> + perf_event_read(child);
> + total += perf_event_count(child);
> + *enabled += child->total_time_enabled;
> + *running += child->total_time_running;
> + }
> +
> + return total;
> +}
> +
>  /*
>   * Remove all orphanes events from the context.
>   */

Re: [PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-07-16 Thread Laurent Vivier
On 27/05/2015 01:56, Sam Bobroff wrote:
> In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
> bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
> accessed as such.
> 
> This patch corrects places where it is accessed as a 32 bit field by a
> 64 bit kernel.  In some cases this is via a 32 bit load or store
> instruction which, depending on endianness, will cause either the
> lower or upper 32 bits to be missed.  In another case it is cast as a
> u32, causing the upper 32 bits to be cleared.
> 
> This patch corrects those places by extending the access methods to
> 64 bits.
> 
> Signed-off-by: Sam Bobroff 
> ---
> 
> v3:
> Adjust booke set/get xer to match book3s.
> 
> v2:
> 
> Also extend kvmppc_book3s_shadow_vcpu.xer to 64 bit.
> 
>  arch/powerpc/include/asm/kvm_book3s.h |4 ++--
>  arch/powerpc/include/asm/kvm_book3s_asm.h |2 +-
>  arch/powerpc/include/asm/kvm_booke.h  |4 ++--
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   |6 +++---
>  arch/powerpc/kvm/book3s_segment.S |4 ++--
>  5 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
> b/arch/powerpc/include/asm/kvm_book3s.h
> index b91e74a..05a875a 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -225,12 +225,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
>   return vcpu->arch.cr;
>  }
>  
> -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
> +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
>  {
>   vcpu->arch.xer = val;
>  }
>  
> -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
> +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
>  {
>   return vcpu->arch.xer;
>  }
> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
> b/arch/powerpc/include/asm/kvm_book3s_asm.h
> index 5bdfb5d..c4ccd2d 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> @@ -112,7 +112,7 @@ struct kvmppc_book3s_shadow_vcpu {
>   bool in_use;
>   ulong gpr[14];
>   u32 cr;
> - u32 xer;
> + ulong xer;
>   ulong ctr;
>   ulong lr;
>   ulong pc;
> diff --git a/arch/powerpc/include/asm/kvm_booke.h 
> b/arch/powerpc/include/asm/kvm_booke.h
> index 3286f0d..bc6e29e 100644
> --- a/arch/powerpc/include/asm/kvm_booke.h
> +++ b/arch/powerpc/include/asm/kvm_booke.h
> @@ -54,12 +54,12 @@ static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
>   return vcpu->arch.cr;
>  }
>  
> -static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
> +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, ulong val)
>  {
>   vcpu->arch.xer = val;
>  }
>  
> -static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
> +static inline ulong kvmppc_get_xer(struct kvm_vcpu *vcpu)
>  {
>   return vcpu->arch.xer;
>  }
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 4d70df2..d75be59 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -870,7 +870,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>   blt hdec_soon
>  
>   ld  r6, VCPU_CTR(r4)
> - lwz r7, VCPU_XER(r4)
> + ld  r7, VCPU_XER(r4)
>  
>   mtctr   r6
>   mtxer   r7
> @@ -1103,7 +1103,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
>   mfctr   r3
>   mfxer   r4
>   std r3, VCPU_CTR(r9)
> - stw r4, VCPU_XER(r9)
> + std r4, VCPU_XER(r9)
>  
>   /* If this is a page table miss then see if it's theirs or ours */
>   cmpwi   r12, BOOK3S_INTERRUPT_H_DATA_STORAGE
> @@ -1675,7 +1675,7 @@ kvmppc_hdsi:
>   bl  kvmppc_msr_interrupt
>  fast_interrupt_c_return:
>  6:   ld  r7, VCPU_CTR(r9)
> - lwz r8, VCPU_XER(r9)
> + ld  r8, VCPU_XER(r9)
>   mtctr   r7
>   mtxer   r8
>   mr  r4, r9
> diff --git a/arch/powerpc/kvm/book3s_segment.S 
> b/arch/powerpc/kvm/book3s_segment.S
> index acee37c..ca8f174 100644
> --- a/arch/powerpc/kvm/book3s_segment.S
> +++ b/arch/powerpc/kvm/book3s_segment.S
> @@ -123,7 +123,7 @@ no_dcbz32_on:
>   PPC_LL  r8, SVCPU_CTR(r3)
>   PPC_LL  r9, SVCPU_LR(r3)
>   lwz r10, SVCPU_CR(r3)
> - lwz r11, SVCPU_XER(r3)
> + PPC_LL  r11, SVCPU_XER(r3)
>  
>   mtctr   r8
>   mtlrr9
> @@ -237,7 +237,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>   mfctr   r8
>   mflrr9
>  
> - stw r5, SVCPU_XER(r13)
> + PPC_STL r5, SVCPU_XER(r13)
>   PPC_STL r6, SVCPU_FAULT_DAR(r13)
>   stw r7, SVCPU_FAULT_DSISR(r13)
>   PPC_STL r8, SVCPU_CTR(r13)
> 


Reviewed-by: Laurent Vivier 

Re: [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces

2015-07-16 Thread Peter Zijlstra
On Tue, Jul 14, 2015 at 08:01:50PM -0700, Sukadev Bhattiprolu wrote:
> +DEFINE_PER_CPU(int, nop_txn_flags);
> +
> +static int nop_txn_flags_get_and_clear(void)
> +{
> + int *flagsp;
> + int flags;
> + 
> + flagsp = &get_cpu_var(nop_txn_flags);
> +
> + flags = *flagsp;
> + *flagsp = 0;
> +
> + put_cpu_var(nop_txn_flags);
> +
> + return flags;
> +}
> +
> +static void nop_txn_flags_set(int flags)
> +{
> + int *flagsp;
> + 
> + flagsp = &get_cpu_var(nop_txn_flags);
> + *flagsp = flags;
> + put_cpu_var(nop_txn_flags);
> +}

That's really horrible, see below:

> +static void perf_pmu_start_txn(struct pmu *pmu, int flags)
>  {
__this_cpu_write(nop_txn_flags, flags);
> +
> + if (flags & ~PERF_PMU_TXN_ADD)
> + return;
> +
>   perf_pmu_disable(pmu);
>  }
>  
>  static int perf_pmu_commit_txn(struct pmu *pmu)
>  {
int flags = __this_cpu_read(nop_txn_flags);
__this_cpu_write(nop_txn_flags, 0);
> +
> + if (flags & ~PERF_PMU_TXN_ADD)
> + return 0;
> +
>   perf_pmu_enable(pmu);
>   return 0;
>  }
>  
>  static void perf_pmu_cancel_txn(struct pmu *pmu)
>  {
int flags = __this_cpu_read(nop_txn_flags);
__this_cpu_write(nop_txn_flags, 0);
> +
> + if (flags & ~PERF_PMU_TXN_ADD)
> + return;
> +
>   perf_pmu_enable(pmu);
>  }

Re: [PATCH v2] powerpc/dts: Add and fix 1588 timer node for eTSEC

2015-07-16 Thread Scott Wood
On Wed, 2015-07-15 at 21:37 -0500, Lu Yangbo-B47093 wrote:
> Any comments?
> Thanks.

Sorry, I must have missed this on my last time through the patch queue.  I
see you've decimalized the fiper and max-adj properties, which is good... but
does it really make sense for tmr-add?  I'm not familiar with what this value
represents, but the numbers look more natural as hex (e.g. 0xaaaaaaab versus
2863311531).

> > diff --git a/arch/powerpc/boot/dts/p2020rdb-pc.dtsi
> > b/arch/powerpc/boot/dts/p2020rdb-pc.dtsi
> > index c21d1c7..363172d 100644
> > --- a/arch/powerpc/boot/dts/p2020rdb-pc.dtsi
> > +++ b/arch/powerpc/boot/dts/p2020rdb-pc.dtsi
> > @@ -215,12 +215,12 @@
> > };
> > 
> >  ptp_clock@24e00{
> > -   fsl,tclk-period = <5>;
> > -   fsl,tmr-prsc = <200>;
> > -   fsl,tmr-add = <0xCCCD>;
> > -   fsl,tmr-fiper1 = <0x3B9AC9FB>;
> > -   fsl,tmr-fiper2 = <0x0001869B>;
> > -   fsl,max-adj = <24999>;
> > +   fsl,tclk-period = <5>;
> > +   fsl,tmr-prsc= <2>;
> > +   fsl,tmr-add = <2863311531>;
> > +   fsl,tmr-fiper1  = <5>;
> > +   fsl,tmr-fiper2  = <0>;
> > +   fsl,max-adj = <2>;
> > };

And here, you're changing the value of fsl,tmr-add and fsl,max-adj.  Why?

-Scott


Re: powerpc/corenet: enable eSDHC

2015-07-16 Thread Scott Wood
OK, thanks.  Assuming no similar issues when testing, I'll apply this patch 
the next time I do a batch of patch application.

Any thoughts regarding better error handling?

-Scott

On Wed, 2015-07-15 at 21:37 -0500, Lu Yangbo-B47093 wrote:
> Hi Scott,
> 
> Now the patch below has been merged on 
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git.
> This issue should have been resolved. 
> Thanks.
> 
> 
> commit 5fd26c7ecb32082745b0bd33c8e35badd1cb5a91
> Author: Ulf Hansson 
> Date:   Fri Jun 5 11:40:08 2015 +0200
> 
> mmc: sdhci: Restore behavior while creating OCR mask
> 
> 
> > -Original Message-
> > From: Wood Scott-B07421
> > Sent: Tuesday, June 09, 2015 4:31 AM
> > To: Lu Yangbo-B47093
> > Cc: linuxppc-dev@lists.ozlabs.org; linux-...@vger.kernel.org
> > Subject: Re: powerpc/corenet: enable eSDHC
> > 
> > On Mon, 2015-06-08 at 05:12 -0500, Lu Yangbo-B47093 wrote:
> > > Thanks a lot, Scott.
> > > And now a patch was merged on
> > > git://git.linaro.org/people/ulf.hansson/mmc.git next branch to fix
> > > this issue.
> > > It should be no problem.
> > 
> > Assuming that patch fixes it and gets pulled for 4.2, this config patch
> > can go in for 4.3.
> > 
> > That said, it would be nice if, apart from fixing the problem itself, the
> > MMC code failed more gracefully instead of endlessly repeating and
> > filling up the log/console.
> > 
> > -Scott
> 

Re: [PATCH v3 3/8] perf: Add a flags parameter to pmu txn interfaces

2015-07-16 Thread Peter Zijlstra
On Tue, Jul 14, 2015 at 08:01:50PM -0700, Sukadev Bhattiprolu wrote:
> @@ -1604,6 +1613,12 @@ static void power_pmu_start_txn(struct pmu *pmu)
>  static void power_pmu_cancel_txn(struct pmu *pmu)
>  {
>   struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
> + int txn_flags;
> +
> + txn_flags = cpuhw->txn_flags;
> + cpuhw->txn_flags = 0;
> + if (cpuhw->txn_flags & ~PERF_PMU_TXN_ADD)
> + return;

That seems, unintentional? ;-)
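
For illustration, the quoted hunk clears cpuhw->txn_flags and then tests that same field, so the early return can never trigger. A minimal user-space sketch of what was presumably intended is below. It tests the saved local copy instead. The struct, the flag values and the SKETCH_TXN_OTHER name are stand-ins invented for this example, not the kernel's definitions.

```c
#include <stdbool.h>

/* Stand-in flag values for illustration; not the kernel's definitions. */
#define PERF_PMU_TXN_ADD  0x1u
#define SKETCH_TXN_OTHER  0x2u	/* hypothetical non-ADD transaction type */

/* Hypothetical, trimmed-down stand-in for struct cpu_hw_events. */
struct cpu_hw_events_sketch {
	unsigned int txn_flags;
	bool cancelled;		/* set when the ADD cancel path runs */
};

static void cancel_txn_sketch(struct cpu_hw_events_sketch *cpuhw)
{
	unsigned int txn_flags;

	/* Save the flags first, then clear the per-CPU copy. */
	txn_flags = cpuhw->txn_flags;
	cpuhw->txn_flags = 0;

	/* Test the saved copy; the per-CPU field is already zero here. */
	if (txn_flags & ~PERF_PMU_TXN_ADD)
		return;

	cpuhw->cancelled = true;
}
```

With the test on the local variable, a transaction started with any flag other than PERF_PMU_TXN_ADD skips the ADD-specific cancel work.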

Re: [PATCH v5 1/2] perf,kvm/ppc: Add kvm_perf.h for powerpc

2015-07-16 Thread Scott Wood
On Thu, 2015-07-16 at 21:18 +0530, Hemant Kumar wrote:
> To analyze the exit events with perf, we need kvm_perf.h to be added in
> the arch/powerpc directory, where the kvm tracepoints needed to trace
> the KVM exit events are defined.
> 
> This patch adds "kvm_perf_book3s.h" to indicate that the tracepoints are
> book3s specific. Generic "kvm_perf.h" then can just include
> "kvm_perf_book3s.h".
> 
> Signed-off-by: Hemant Kumar 
> ---
> Changes:
> - Not exporting the exit reasons compared to previous patchset (suggested 
> by Paul)
> 
>  arch/powerpc/include/uapi/asm/kvm_perf.h|  6 ++
>  arch/powerpc/include/uapi/asm/kvm_perf_book3s.h | 14 ++
>  2 files changed, 20 insertions(+)
>  create mode 100644 arch/powerpc/include/uapi/asm/kvm_perf.h
>  create mode 100644 arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
> 
> diff --git a/arch/powerpc/include/uapi/asm/kvm_perf.h 
> b/arch/powerpc/include/uapi/asm/kvm_perf.h
> new file mode 100644
> index 000..5ed2ff3
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/kvm_perf.h
> @@ -0,0 +1,6 @@
> +#ifndef _ASM_POWERPC_KVM_PERF_H
> +#define _ASM_POWERPC_KVM_PERF_H
> +
> +#include 
> +
> +#endif
> diff --git a/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h 
> b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
> new file mode 100644
> index 000..8c8d8c2
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
> @@ -0,0 +1,14 @@
> +#ifndef _ASM_POWERPC_KVM_PERF_BOOK3S_H
> +#define _ASM_POWERPC_KVM_PERF_BOOK3S_H
> +
> +#include 
> +
> +#define DECODE_STR_LEN 20
> +
> +#define VCPU_ID "vcpu_id"
> +
> +#define KVM_ENTRY_TRACE "kvm_hv:kvm_guest_enter"
> +#define KVM_EXIT_TRACE "kvm_hv:kvm_guest_exit"
> +#define KVM_EXIT_REASON "trap"
> +
> +#endif /* _ASM_POWERPC_KVM_PERF_BOOK3S_H */

Again, why is book3s stuff being presented via uapi as generic 
 with generic symbol names?

-Scott


Re: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board support

2015-07-16 Thread Scott Wood
On Thu, 2015-07-16 at 04:34 -0500, Jain Priyanka-B32167 wrote:
> 
> -Original Message-
> From: Wood Scott-B07421 
> Sent: Wednesday, July 15, 2015 11:17 PM
> To: Jain Priyanka-B32167
> Cc: linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board 
> support
> 
> On Wed, 2015-07-15 at 15:00 +0530, Priyanka Jain wrote:
> > T1040D4RDB/T1042D4RDB are Freescale Reference Design Board which can 
> > support T1040/T1042 QorIQ Power Architecture™ processor respectively
> > 
> > T1040D4RDB/T1042D4RDB board Overview
> > -
> > - SERDES Connections, 8 lanes supporting:
> > - PCI
> > - SGMII
> > - SATA 2.0
> > - QSGMII(only for T1040D4RDB)
> > - DDR Controller
> > - Supports rates of up to 1600 MHz data-rate
> > - Supports one DDR4 UDIMM
> > -IFC/Local Bus
> > - NAND flash: 1GB 8-bit NAND flash
> > - NOR: 128MB 16-bit NOR Flash
> > - Ethernet
> > - Two on-board RGMII 10/100/1G ethernet ports.
> > - PHY #0 remains powered up during deep-sleep
> > - CPLD
> > - Clocks
> > - System and DDR clock (SYSCLK, “DDRCLK”)
> > - SERDES clocks
> > - Power Supplies
> > - USB
> > - Supports two USB 2.0 ports with integrated PHYs
> > > - Two type A ports with 5V@1.5A per port.
> > - SDHC
> > - SDHC/SDXC connector
> > - SPI
> > - On-board 64MB SPI flash
> > - I2C
> > - Devices connected: EEPROM, thermal monitor, VID controller
> > - Other IO
> > - Two Serial ports
> > - ProfiBus port
> > 
> > Add support for T1040/T1042D4RDB board:
> > -add device tree
> > -Add entry in corenet_generic.c
> > 
> > Signed-off-by: Priyanka Jain 
> > ---
> >  Changes for v2:
> >   Incorporated Scott's comments on device tree
> 
> You didn't respond to the comments on the CPLD node.
> [Priyanka]
> T1042D4RDB and T1040D4RDB are derivatives of the same board, and the CPLD is
> the same for both.
> So I have moved the node below, with its compatible and reg fields, into
> t104xd4rdb.dtsi.
> Is this fine?
>   cpld@3,0 {
>   compatible = "fsl,t1040d4rdb-cpld";
>   reg = <3 0 0x300>;
>   };

If the CPLD image is exactly the same on both, this is fine.

> > +i2c@118100{
> > +  mux@77{
> > + compatible = "nxp,pca9546";
> > + reg = <0x77>;
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > + };
> > + };
> 
> A mux with no nodes under it (and yet it has #address-cells/#size-cells)?  
> What is it multiplexing?
> [Priyanka]: PCA9546 is an I2C mux device, to which other I2C devices (up to 8)
> can be connected on its output channels.
> On T104xD4RDB, channels 0, 1 and 3 are connected to the PEX device and channel 2
> to the HDMI interface (initialization is done in u-boot only); the other
> channels are grounded. Linux therefore does not use the second-level I2C
> devices connected to this mux, so I have not shown the next level of hierarchy.
> Should I replace 'mux' with some other name? Please suggest.

The device tree describes the hardware, not just what Linux uses... but what I
don't understand is why you describe the mux at all if you're not going to 
describe what goes underneath it.

-Scott


[PATCH] dt-bindings: powerpc: adapt mpc5121-psc document to reality

2015-07-16 Thread Uwe Kleine-König
The drivers support the MPC5125 in addition to the MPC5121, and there is an
SPI mode that is also supported. Some minor corrections are made as
well.

Signed-off-by: Uwe Kleine-König 
---
Hello,

I sent a patch adding mpc5125 support to the mpc512x driver and Mark
requested the new compatible to be documented. While at it I updated the
document a bit more, and obviously the spi support for mpc5125 depends
on my patch that isn't mainline yet.

Best regards
Uwe

 .../bindings/powerpc/fsl/mpc5121-psc.txt   | 24 --
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt 
b/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt
index 8832e8798912..647817527c88 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt
@@ -6,14 +6,14 @@ PSC in UART mode
 For PSC in UART mode the needed PSC serial devices
 are specified by fsl,mpc5121-psc-uart nodes in the
 fsl,mpc5121-immr SoC node. Additionally the PSC FIFO
-Controller node fsl,mpc5121-psc-fifo is requered there:
+Controller node fsl,mpc5121-psc-fifo is required there:
 
-fsl,mpc5121-psc-uart nodes
+fsl,mpc512x-psc-uart nodes
 --
 
 Required properties :
- - compatible : Should contain "fsl,mpc5121-psc-uart" and "fsl,mpc5121-psc"
- - cell-index : Index of the PSC in hardware
+ - compatible : Should contain "fsl,-psc-uart" and "fsl,-psc"
+   Supported s: mpc5121, mpc5125
  - reg : Offset and length of the register set for the PSC device
  - interrupts :  where a is the interrupt number of the
PSC FIFO Controller and b is a field that represents an
@@ -25,12 +25,21 @@ Recommended properties :
  - fsl,rx-fifo-size : the size of the RX fifo slice (a multiple of 4)
  - fsl,tx-fifo-size : the size of the TX fifo slice (a multiple of 4)
 
+PSC in SPI mode
+---
 
-fsl,mpc5121-psc-fifo node
+Similar to the UART mode a PSC can be operated in SPI mode. The compatible used
+for that is fsl,mpc5121-psc-spi. It requires a fsl,mpc5121-psc-fifo as well.
+The required and recommended properties are identical to the
+fsl,mpc5121-psc-uart nodes, just use spi instead of uart in the compatible
+string.
+
+fsl,mpc512x-psc-fifo node
 -
 
 Required properties :
- - compatible : Should be "fsl,mpc5121-psc-fifo"
+ - compatible : Should be "fsl,-psc-fifo"
+   Supported s: mpc5121, mpc5125
  - reg : Offset and length of the register set for the PSC
  FIFO Controller
  - interrupts :  where a is the interrupt number of the
@@ -39,6 +48,9 @@ Required properties :
  - interrupt-parent : the phandle for the interrupt controller that
services interrupts for this device.
 
+Recommended properties :
+ - clocks : specifies the clock needed to operate the fifo controller
+ - clock-names : name(s) for the clock(s) listed in clocks
 
 Example for a board using PSC0 and PSC1 devices in serial mode:
 
-- 
2.1.4


Re: BUG: sleeping function called from ras_epow_interrupt context

2015-07-16 Thread Nathan Fontenot
On 07/16/2015 01:23 AM, Thomas Huth wrote:
> On 07/15/2015 09:58 PM, Nathan Fontenot wrote:
>> On 07/15/2015 09:35 AM, Thomas Huth wrote:
>>> On 07/14/2015 11:22 PM, Benjamin Herrenschmidt wrote:
>>>> On Tue, 2015-07-14 at 20:43 +0200, Thomas Huth wrote:
>>>>> Any suggestions how to fix this? Simply revert 587f83e8dd50d? Use
>>>>> mdelay() instead of msleep() in rtas_busy_delay()? Something more
>>>>> fancy?
>>>>
>>>> A proper fix would be more fancy, the get_sensor should happen in a
>>>> kernel thread instead.
>>>
>>> I'm not very familiar with this stuff, but isn't the EPOW interrupt
>>> something that is very time-critical? Moving parts of the handler into a
>>> kernel thread then does not sound like a very good idea to me...
>>>
>>> Another question: Can it happen at all that this get-sensor call results
>>> in a sleep condition? Looking at commit ID
>>> 81b73dd92b97423b8f5324a59044da478c04f4c4 ("Fix might-sleep warning on
>>> removing cpus"), which apparently fixed a similar issue for CPU
>>> hot-plugging, indicates that at least some of the rtas calls are never
>>> returning the busy code? In that case we could fix this by introducing a
>>> similar rtas_get_sensor_fast() function? (or simply revert 587f83e8dd50d
>>> which would be quite similar, I think)
>>>
>>
>> Looking at the PAPR, the get-sensor-state rtas call for the EPOW sensor
>> is listed as a fast call and should not return a busy indication.
> 
> Great, good to know, thanks for looking that up! So IMHO we should
> either introduce a rtas_get_sensor_fast() function or revert
> 587f83e8dd50d ... any preferences? Shall I come up with a patch?
>
From a quick look at the kernel, I find only three places where
rtas_get_sensor is called. The instance you point out here for the EPOW
sensor is the only one I find where it is called for a sensor that should
not return a busy indication.

Reverting commit 587f83e8dd50d would solve the issue but not fix any future
users of a fast get-sensor call. I don't have an issue with a patch for a
rtas_get_sensor_fast().

-Nathan
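
As a rough idea of the shape such a helper could take, here is a user-space sketch. It makes a single call and never sleeps, and it treats an unexpected busy status as an error. The status values, the function-pointer type and the mock firmware calls are all assumptions made for this example; none of this is the kernel's RTAS API.

```c
#include <stddef.h>

/* Illustrative status values only; assumed for this sketch. */
#define SKETCH_RTAS_BUSY (-2)
#define SKETCH_EIO       (-5)

/* Hypothetical stand-in for the firmware get-sensor-state call. */
typedef int (*sensor_call_fn)(int sensor, int index, int *state);

/*
 * Fast variant: exactly one firmware call and no sleeping, so it is
 * usable from interrupt context. A busy status is not expected for a
 * "fast" sensor, so it is reported as an I/O error instead of retried.
 */
static int get_sensor_fast_sketch(sensor_call_fn call, int sensor,
				  int index, int *state)
{
	int rc = call(sensor, index, state);

	if (rc == SKETCH_RTAS_BUSY)
		return SKETCH_EIO;
	return rc;
}

/* Mock firmware that answers immediately with state 1. */
static int mock_ready(int sensor, int index, int *state)
{
	(void)sensor;
	(void)index;
	*state = 1;
	return 0;
}

/* Mock firmware that always reports busy and leaves *state untouched. */
static int mock_busy(int sensor, int index, int *state)
{
	(void)sensor;
	(void)index;
	(void)state;
	return SKETCH_RTAS_BUSY;
}
```

The point of the design is that the fast path contains no retry loop at all, so there is nothing for a might_sleep() annotation to object to.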
 
>> I'm curious as to why we're getting a busy return indication when
>> making this call.
> 
> Looking at the code again, rtas_busy_delay() likely never slept ... it's
> likely just the "might_sleep()" annotation in that function that causes
> the BUG.
> 
>  Thomas
> 
> 


[PATCH v5 2/2] perf/kvm: Support HCALL events

2015-07-16 Thread Hemant Kumar
Powerpc provides hcall events that also provide insight into guest
behaviour. Enhance perf kvm stat to record and analyze hcall events.

 - To trace hcall events :
  perf kvm stat record

 - To show the results :
  perf kvm stat report --event=hcall

The result shows the number of hypervisor calls from the guest grouped
by their respective reasons displayed with the frequency.

This patch makes use of two additional tracepoints
"kvm_hv:kvm_hcall_enter" and "kvm_hv:kvm_hcall_exit". To map the hcall
codes to their respective names, it needs a mapping. Such mapping is
added in this patch in book3s_hcalls.h.

Note that this patch has a dependency on 
"perf,kvm/ppc: Add hcall related info to kvm_perf.h" which adds the
hcall related tracepoints to kvm_perf.h to let "perf kvm stat" know
about these tracepoints.
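
The code-to-name mapping can be pictured as a simple lookup table. The sketch below is standalone C using a handful of {code, name} pairs taken from the kvm_trace_symbol_hcall list in this patch. The struct, the array and the lookup helper are hypothetical names for illustration and are not how perf itself consumes the table.

```c
#include <stddef.h>

/* Hypothetical pair type for this sketch. */
struct hcall_name_sketch {
	unsigned long code;
	const char *name;
};

/* A few entries copied from kvm_trace_symbol_hcall in the patch. */
static const struct hcall_name_sketch hcall_names[] = {
	{0x4,  "H_REMOVE"},
	{0x54, "H_GET_TERM_CHAR"},
	{0x58, "H_PUT_TERM_CHAR"},
	{0x64, "H_EOI"},
	{0x6c, "H_IPI"},
	{0xe0, "H_CEDE"},
};

/* Linear search; returns "UNKNOWN" for codes not in the table. */
static const char *hcall_name_lookup_sketch(unsigned long code)
{
	size_t i;

	for (i = 0; i < sizeof(hcall_names) / sizeof(hcall_names[0]); i++)
		if (hcall_names[i].code == code)
			return hcall_names[i].name;
	return "UNKNOWN";
}
```

A linear scan is enough here since the full list has on the order of a hundred entries and lookups happen at report time, not on a hot path.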

A sample output :
 # pgrep qemu
19378
60515

2 VMs running.

 # perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624 samples) ]

 # perf kvm stat report -p 60515 --event=hcall
Analyze events for pid(s) 60515, all VCPUs:

     HCALL-EVENT   Samples  Samples%     Time%    Min Time    Max Time    Avg time

    H_VIO_SIGNAL      1034    38.44%    15.77%      0.36us      1.59us      0.44us ( +-   0.66% )
      H_SEND_CRQ       652    24.24%    10.97%      0.39us      1.84us      0.49us ( +-   1.20% )
           H_IPI       523    19.44%    62.05%      1.35us     19.70us      3.44us ( +-   2.88% )
 H_PUT_TERM_CHAR       411    15.28%     8.03%      0.38us      3.77us      0.57us ( +-   1.61% )
 H_GET_TERM_CHAR        50     1.86%     0.99%      0.40us      0.98us      0.57us ( +-   3.37% )
           H_EOI        20     0.74%     2.19%      2.22us      4.72us      3.17us ( +-   5.96% )

Total Samples:2690, Total events handled time:2896.94us.

Signed-off-by: Hemant Kumar 
---
This patch has a direct dependency on :
http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg91605.html

Changes:
- Added definitions for hcall code to hcall reason mapping in the userspace 
side.

 tools/perf/arch/powerpc/util/book3s_hcalls.h | 123 +++
 tools/perf/arch/powerpc/util/kvm-stat.c  |  64 ++
 2 files changed, 187 insertions(+)
 create mode 100644 tools/perf/arch/powerpc/util/book3s_hcalls.h

diff --git a/tools/perf/arch/powerpc/util/book3s_hcalls.h 
b/tools/perf/arch/powerpc/util/book3s_hcalls.h
new file mode 100644
index 000..3d50def
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/book3s_hcalls.h
@@ -0,0 +1,123 @@
+#ifndef ARCH_PERF_BOOK3S_HCALLS_H
+#define ARCH_PERF_BOOK3S_HCALLS_H
+
+/*
+ * PowerPC HCALL codes : hcall name to reason mapping
+ */
+#define kvm_trace_symbol_hcall \
+   {0x4,"H_REMOVE"},   \
+   {0x8,"H_ENTER"},\
+   {0xc,"H_READ"}, \
+   {0x10,"H_CLEAR_MOD"},   \
+   {0x14,"H_CLEAR_REF"},   \
+   {0x18,"H_PROTECT"}, \
+   {0x1c,"H_GET_TCE"}, \
+   {0x20,"H_PUT_TCE"}, \
+   {0x24,"H_SET_SPRG0"},   \
+   {0x28,"H_SET_DABR"},\
+   {0x2c,"H_PAGE_INIT"},   \
+   {0x30,"H_SET_ASR"}, \
+   {0x34,"H_ASR_ON"},  \
+   {0x38,"H_ASR_OFF"}, \
+   {0x3c,"H_LOGICAL_CI_LOAD"}, \
+   {0x40,"H_LOGICAL_CI_STORE"},\
+   {0x44,"H_LOGICAL_CACHE_LOAD"},  \
+   {0x48,"H_LOGICAL_CACHE_STORE"}, \
+   {0x4c,"H_LOGICAL_ICBI"},\
+   {0x50,"H_LOGICAL_DCBF"},\
+   {0x54,"H_GET_TERM_CHAR"},   \
+   {0x58,"H_PUT_TERM_CHAR"},   \
+   {0x5c,"H_REAL_TO_LOGICAL"}, \
+   {0x60,"H_HYPERVISOR_DATA"}, \
+   {0x64,"H_EOI"}, \
+   {0x68,"H_CPPR"},\
+   {0x6c,"H_IPI"}, \
+   {0x70,"H_IPOLL"},   \
+   {0x74,"H_XIRR"},\
+   {0x78,"H_MIGRATE_DMA"}, \
+   {0x7c,"H_PERFMON"}, \
+   {0xdc,"H_REGISTER_VPA"},\
+   {0xe0,"H_CEDE"},\
+   {0xe4,"H_CONFER"},  \
+   {0xe8,"H_PROD"},\
+   {0xec,"H_GET_PPP"}, \
+   {0xf0,"H_SET_PPP"}, \
+   {0xf4,"H_PURR"},\

[PATCH v5 1/2] perf/kvm: Port perf kvm stat to powerpc

2015-07-16 Thread Hemant Kumar
From: Srikar Dronamraju 

perf kvm can be used to analyze guest exit reasons. This support already
exists in x86. Hence, porting it to powerpc.

 - To trace KVM events :
  perf kvm stat record
  If many guests are running, we can track for a specific guest by using
  --pid as in : perf kvm stat record --pid 

 - To see the results :
  perf kvm stat report

The result shows the number of exits (from the guest context to
host/hypervisor context) grouped by their respective exit reasons with
their frequency.

To analyze the different exits, group them and present them (in a
slightly descriptive way) to the user, we need a mapping between the
"exit code" (dumped in the kvm_guest_exit tracepoint data) and its
related Interrupt vector description (exit reason). This patch adds this
mapping in book3s_exits.h.

It records on two available KVM tracepoints :
"kvm_hv:kvm_guest_exit" and "kvm_hv:kvm_guest_enter".

Note that this patch has a direct dependency on
"perf,kvm/ppc: Add kvm_perf.h for powerpc" which adds kvm_perf.h, where
the required kvm tracepoints are defined for "perf kvm stat" to be used.

Here is a sample o/p:
 # pgrep qemu
19378
60515

2 Guests are running on the host.

 # perf kvm stat record -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.153 MB perf.data.guest (39624 samples) ]

 # perf kvm stat report -p 60515
Analyze events for pid(s) 60515, all VCPUs:

         VM-EXIT   Samples  Samples%     Time%    Min Time      Max Time      Avg time

  H_DATA_STORAGE      5006    35.30%     0.13%      1.94us       49.46us       12.37us ( +-   0.52% )
  HV_DECREMENTER      4457    31.43%     0.02%      0.72us       16.14us        1.91us ( +-   0.96% )
         SYSCALL      2690    18.97%     0.10%      2.84us      528.24us       18.29us ( +-   3.75% )
  RETURN_TO_HOST      1789    12.61%    99.76%      1.58us   672791.91us    27470.23us ( +-   3.00% )
        EXTERNAL       240     1.69%     0.00%      0.69us       10.67us        1.33us ( +-   5.34% )

Total Samples:14182, Total events handled time:49264158.30us.
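
As a reading aid, the Samples% column appears to be each row's sample count divided by the total sample count. The helper below is a hypothetical sketch of that arithmetic and is not taken from perf.

```c
/*
 * Share of one exit reason in the total, as a percentage.
 * Returns 0.0 when there are no samples at all.
 */
static double sample_share_percent_sketch(unsigned long samples,
					  unsigned long total)
{
	if (total == 0)
		return 0.0;
	return 100.0 * (double)samples / (double)total;
}
```

For example, 5006 H_DATA_STORAGE exits out of 14182 samples comes to roughly 35.30%, which matches the report above.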

Signed-off-by: Srikar Dronamraju 
Signed-off-by: Hemant Kumar 
---
This patch has a direct dependency on:
http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg91603.html

Changes :
- Added exit reasons definitions(unlikely to change) in the userspace side.

 tools/perf/arch/powerpc/Makefile|  1 +
 tools/perf/arch/powerpc/util/Build  |  1 +
 tools/perf/arch/powerpc/util/book3s_exits.h | 33 +
 tools/perf/arch/powerpc/util/kvm-stat.c | 33 +
 4 files changed, 68 insertions(+)
 create mode 100644 tools/perf/arch/powerpc/util/book3s_exits.h
 create mode 100644 tools/perf/arch/powerpc/util/kvm-stat.c

diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile
index 7fbca17..21322e0 100644
--- a/tools/perf/arch/powerpc/Makefile
+++ b/tools/perf/arch/powerpc/Makefile
@@ -1,3 +1,4 @@
 ifndef NO_DWARF
 PERF_HAVE_DWARF_REGS := 1
 endif
+HAVE_KVM_STAT_SUPPORT := 1
diff --git a/tools/perf/arch/powerpc/util/Build 
b/tools/perf/arch/powerpc/util/Build
index 7b8b0d1..c8fe207 100644
--- a/tools/perf/arch/powerpc/util/Build
+++ b/tools/perf/arch/powerpc/util/Build
@@ -1,5 +1,6 @@
 libperf-y += header.o
 libperf-y += sym-handling.o
+libperf-y += kvm-stat.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_DWARF) += skip-callchain-idx.o
diff --git a/tools/perf/arch/powerpc/util/book3s_exits.h 
b/tools/perf/arch/powerpc/util/book3s_exits.h
new file mode 100644
index 000..94c58f4
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/book3s_exits.h
@@ -0,0 +1,33 @@
+#ifndef ARCH_PERF_BOOK3S_EXITS_H
+#define ARCH_PERF_BOOK3S_EXITS_H
+
+/*
+ * PowerPC Interrupt vectors : exit code to name mapping
+ */
+
+#define kvm_trace_symbol_exit \
+   {0x0,   "RETURN_TO_HOST"}, \
+   {0x100, "SYSTEM_RESET"}, \
+   {0x200, "MACHINE_CHECK"}, \
+   {0x300, "DATA_STORAGE"}, \
+   {0x380, "DATA_SEGMENT"}, \
+   {0x400, "INST_STORAGE"}, \
+   {0x480, "INST_SEGMENT"}, \
+   {0x500, "EXTERNAL"}, \
+   {0x501, "EXTERNAL_LEVEL"}, \
+   {0x502, "EXTERNAL_HV"}, \
+   {0x600, "ALIGNMENT"}, \
+   {0x700, "PROGRAM"}, \
+   {0x800, "FP_UNAVAIL"}, \
+   {0x900, "DECREMENTER"}, \
+   {0x980, "HV_DECREMENTER"}, \
+   {0xc00, "SYSCALL"}, \
+   {0xd00, "TRACE"}, \
+   {0xe00, "H_DATA_STORAGE"}, \
+   {0xe20, "H_INST_STORAGE"}, \
+   {0xe40, "H_EMUL_ASSIST"}, \
+   {0xf00, "PERFMON"}, \
+   {0xf20, "ALTIVEC"}, \
+   {0xf40, "VSX"}
+
+#endif
diff --git a/tools/perf/arch/powerpc/util/kvm-stat.c 
b/tools/perf/arch/powerpc/util/kvm-stat.c
new file mode 100644
index 000..d0e1930
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/kvm-stat.c
@@ -0,0 +1,33 @@
+#include "../../util/kvm-stat.h"
+#include "book3s_exits.h"
+
+define_exit_reasons_table(hv_exit_reasons, kvm_trace_symbol_exit);
+
+static struct kvm_events_ops 

[PATCH v5 2/2] perf,kvm/ppc: Add hcall related info to kvm_perf.h

2015-07-16 Thread Hemant Kumar
To analyze the hcalls with perf, we need the hcall related tracepoints
information to be exported.

This patch adds hcall tracepoints "kvm_hv:kvm_hcall_enter" and
"kvm_hv:kvm_hcall_exit" to kvm_perf.h, so perf now knows which
tracepoints to look for when "perf kvm stat record" is used to
collect guest hcall statistics.

Signed-off-by: Hemant Kumar 
---
Changes:
- Not exporting the hcall related codes and names through uapi compared to
  previous patch.

 arch/powerpc/include/uapi/asm/kvm_perf_book3s.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h 
b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
index 8c8d8c2..1378a8d 100644
--- a/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
+++ b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
@@ -11,4 +11,8 @@
 #define KVM_EXIT_TRACE "kvm_hv:kvm_guest_exit"
 #define KVM_EXIT_REASON "trap"
 
+#define KVM_HCALL_ENTRY_TRACE "kvm_hv:kvm_hcall_enter"
+#define KVM_HCALL_EXIT_TRACE "kvm_hv:kvm_hcall_exit"
+#define KVM_HCALL_REASON "req"
+
 #endif /* _ASM_POWERPC_KVM_PERF_BOOK3S_H */
-- 
1.9.3


Re: [PATCH V8 06/10] powerpc/eeh: Create PE for VFs

2015-07-16 Thread Bjorn Helgaas
On Thu, Jun 18, 2015 at 04:06:41PM +0800, Wei Yang wrote:
> The current EEH recovery code assumes that the PE has a primary
> bus. Unfortunately, that's not true for VF PEs, which generally contain
> one or multiple VFs (for the VF group case).
> 
> The patch introduces a weak function pcibios_bus_add_device() which is
> called by pci_bus_add_device(). In this function, we create PEs for VFs.
> Those PEs are identified with the newly introduced flag EEH_PE_VF so
> that we handle them differently during EEH recovery.
> 
> [gwshan: changelog and code refactoring]
> Signed-off-by: Wei Yang 
> Acked-by: Gavin Shan 
> ---
>  arch/powerpc/include/asm/eeh.h   |1 +
>  arch/powerpc/kernel/eeh_pe.c |   10 --
>  arch/powerpc/platforms/powernv/eeh-powernv.c |   16 
>  drivers/pci/bus.c|2 ++
>  4 files changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 1b3614d..c1fde48 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -70,6 +70,7 @@ struct pci_dn;
>  #define EEH_PE_PHB   (1 << 1)/* PHB PE*/
>  #define EEH_PE_DEVICE(1 << 2)/* Device PE */
>  #define EEH_PE_BUS   (1 << 3)/* Bus PE*/
> +#define EEH_PE_VF(1 << 4)/* VF PE */
>  
>  #define EEH_PE_ISOLATED  (1 << 0)/* Isolated PE  
> */
>  #define EEH_PE_RECOVERING(1 << 1)/* Recovering PE*/
> diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
> index 35f0b62..260a701 100644
> --- a/arch/powerpc/kernel/eeh_pe.c
> +++ b/arch/powerpc/kernel/eeh_pe.c
> @@ -299,7 +299,10 @@ static struct eeh_pe *eeh_pe_get_parent(struct eeh_dev 
> *edev)
>* EEH device already having associated PE, but
>* the direct parent EEH device doesn't have yet.
>*/
> - pdn = pdn ? pdn->parent : NULL;
> + if (edev->physfn)
> + pdn = pci_get_pdn(edev->physfn);
> + else
> + pdn = pdn ? pdn->parent : NULL;
>   while (pdn) {
>   /* We're poking out of PCI territory */
>   parent = pdn_to_eeh_dev(pdn);
> @@ -382,7 +385,10 @@ int eeh_add_to_parent_pe(struct eeh_dev *edev)
>   }
>  
>   /* Create a new EEH PE */
> - pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
> + if (edev->physfn)
> + pe = eeh_pe_alloc(edev->phb, EEH_PE_VF);
> + else
> + pe = eeh_pe_alloc(edev->phb, EEH_PE_DEVICE);
>   if (!pe) {
>   pr_err("%s: out of memory!\n", __func__);
>   return -ENOMEM;
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
> b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index ce738ab..4ec1d2e 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -1520,6 +1520,22 @@ static struct eeh_ops pnv_eeh_ops = {
>   .restore_config = pnv_eeh_restore_config
>  };
>  
> +void pcibios_bus_add_device(struct pci_dev *pdev)
> +{
> + struct pci_dn *pdn = pci_get_pdn(pdev);
> +
> + if (!pdev->is_virtfn)
> + return;
> +
> + /*
> +  * The following operations will fail if VF's sysfs files
> +  * aren't created or its resources aren't finalized.
> +  */
> + eeh_add_device_early(pdn);
> + eeh_add_device_late(pdev);
> + eeh_sysfs_add_device(pdev);
> +}
> +
>  /**
>   * eeh_powernv_init - Register platform dependent EEH operations
>   *
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 90fa3a7..960577f 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -267,6 +267,7 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx)
>  
>  void __weak pcibios_resource_survey_bus(struct pci_bus *bus) { }
>  
> +void __weak pcibios_bus_add_device(struct pci_dev *dev) { }
>  /**
>   * pci_bus_add_device - start driver for a single device
>   * @dev: device to add
> @@ -277,6 +278,7 @@ void pci_bus_add_device(struct pci_dev *dev)
>  {
>   int retval;
>  
> + pcibios_bus_add_device(dev);

Add a blank line here.

>   /*
>* Can not put in pci_device_add yet because resources
>* are not assigned yet for some devices.

Please put the drivers/pci/bus.c change in a separate patch so it's clear
that we're changing the PCI core here, not just the powerpc code.  That
will also make it possible to revert the powerpc change if necessary
without breaking any other pcibios_bus_add_device() users that may be
added.

You can add my Acked-by: to the new drivers/pci/bus.c patch.

Bjorn

[PATCH v5 1/2] perf,kvm/ppc: Add kvm_perf.h for powerpc

2015-07-16 Thread Hemant Kumar
To analyze the exit events with perf, we need kvm_perf.h to be added in
the arch/powerpc directory, where the kvm tracepoints needed to trace
the KVM exit events are defined.

This patch adds "kvm_perf_book3s.h" to indicate that the tracepoints are
book3s specific. Generic "kvm_perf.h" then can just include
"kvm_perf_book3s.h".

Signed-off-by: Hemant Kumar 
---
Changes:
- Not exporting the exit reasons compared to previous patchset (suggested by 
Paul)

 arch/powerpc/include/uapi/asm/kvm_perf.h|  6 ++
 arch/powerpc/include/uapi/asm/kvm_perf_book3s.h | 14 ++
 2 files changed, 20 insertions(+)
 create mode 100644 arch/powerpc/include/uapi/asm/kvm_perf.h
 create mode 100644 arch/powerpc/include/uapi/asm/kvm_perf_book3s.h

diff --git a/arch/powerpc/include/uapi/asm/kvm_perf.h 
b/arch/powerpc/include/uapi/asm/kvm_perf.h
new file mode 100644
index 000..5ed2ff3
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/kvm_perf.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_POWERPC_KVM_PERF_H
+#define _ASM_POWERPC_KVM_PERF_H
+
+#include 
+
+#endif
diff --git a/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h 
b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
new file mode 100644
index 000..8c8d8c2
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/kvm_perf_book3s.h
@@ -0,0 +1,14 @@
+#ifndef _ASM_POWERPC_KVM_PERF_BOOK3S_H
+#define _ASM_POWERPC_KVM_PERF_BOOK3S_H
+
+#include 
+
+#define DECODE_STR_LEN 20
+
+#define VCPU_ID "vcpu_id"
+
+#define KVM_ENTRY_TRACE "kvm_hv:kvm_guest_enter"
+#define KVM_EXIT_TRACE "kvm_hv:kvm_guest_exit"
+#define KVM_EXIT_REASON "trap"
+
+#endif /* _ASM_POWERPC_KVM_PERF_BOOK3S_H */
-- 
1.9.3


Re: [PATCH v2 11/20] tty/hvc: xen: Use xen page definition

2015-07-16 Thread Stefano Stabellini
On Thu, 9 Jul 2015, Julien Grall wrote:
> The console ring is always based on the page granularity of Xen.
> 
> Signed-off-by: Julien Grall 
> Cc: Greg Kroah-Hartman 
> Cc: Jiri Slaby 
> Cc: David Vrabel 
> Cc: Stefano Stabellini 
> Cc: Boris Ostrovsky 
> Cc: linuxppc-dev@lists.ozlabs.org

Reviewed-by: Stefano Stabellini 

>  drivers/tty/hvc/hvc_xen.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
> index a9d837f..2135944 100644
> --- a/drivers/tty/hvc/hvc_xen.c
> +++ b/drivers/tty/hvc/hvc_xen.c
> @@ -230,7 +230,7 @@ static int xen_hvm_console_init(void)
>   if (r < 0 || v == 0)
>   goto err;
>   mfn = v;
> - info->intf = xen_remap(mfn << PAGE_SHIFT, PAGE_SIZE);
> + info->intf = xen_remap(mfn << XEN_PAGE_SHIFT, XEN_PAGE_SIZE);
>   if (info->intf == NULL)
>   goto err;
>   info->vtermno = HVC_COOKIE;
> @@ -392,7 +392,7 @@ static int xencons_connect_backend(struct xenbus_device 
> *dev,
>   if (xen_pv_domain())
>   mfn = virt_to_mfn(info->intf);
>   else
> - mfn = __pa(info->intf) >> PAGE_SHIFT;
> + mfn = __pa(info->intf) >> XEN_PAGE_SHIFT;
>   ret = gnttab_alloc_grant_references(1, &gref_head);
>   if (ret < 0)
>   return ret;
> @@ -476,7 +476,7 @@ static int xencons_resume(struct xenbus_device *dev)
>   struct xencons_info *info = dev_get_drvdata(&dev->dev);
>  
>   xencons_disconnect_backend(info);
> - memset(info->intf, 0, PAGE_SIZE);
> + memset(info->intf, 0, XEN_PAGE_SIZE);
>   return xencons_connect_backend(dev, info);
>  }
>  
> -- 
> 2.1.4
> 
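
To illustrate why the shift matters: Xen frames are 4 KiB, so on a kernel built with larger pages (64 KiB is assumed below purely for illustration) shifting a Xen frame number by the kernel's page shift gives the wrong address. The constants and the helper are stand-ins for this sketch, not the kernel's definitions.

```c
#include <stdint.h>

#define SKETCH_XEN_PAGE_SHIFT    12	/* Xen frames are 4 KiB */
#define SKETCH_KERNEL_PAGE_SHIFT 16	/* assumed 64 KiB kernel pages */

/* Convert a frame number to a byte address for a given page shift. */
static uint64_t frame_to_addr_sketch(uint64_t frame, unsigned int shift)
{
	return frame << shift;
}
```

With frame number 0x100, the Xen shift gives address 0x100000 while the assumed 64 KiB kernel shift gives 0x1000000, sixteen times too far.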

Re: [PATCH] ipmi/powernv: Fix potential invalid pointer dereference

2015-07-16 Thread Corey Minyard
Ok, this looks fine.  A couple of questions...

Do I need to send this upstream right now?  How well has this been tested?

Do you want this backported to 4.0 stable?

-corey

On 07/16/2015 06:16 AM, Neelesh Gupta wrote:
> If the OPAL call to receive the IPMI message fails, we free the
> smi message and return. But the driver still holds a reference to the
> old smi message in 'cur_msg', which can potentially be accessed later
> and freed again, leading to a kernel oops.
>
> To fix this, the kernel driver should reset 'cur_msg' and send a reply to
> the user in addition to freeing the message.
>
> Signed-off-by: Neelesh Gupta 
> ---
>  drivers/char/ipmi/ipmi_powernv.c |   13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/char/ipmi/ipmi_powernv.c 
> b/drivers/char/ipmi/ipmi_powernv.c
> index 9b409c0..637486d 100644
> --- a/drivers/char/ipmi/ipmi_powernv.c
> +++ b/drivers/char/ipmi/ipmi_powernv.c
> @@ -143,9 +143,16 @@ static int ipmi_powernv_recv(struct ipmi_smi_powernv 
> *smi)
>   pr_devel("%s:   -> %d (size %lld)\n", __func__,
>   rc, rc == 0 ? size : 0);
>   if (rc) {
> - spin_unlock_irqrestore(&smi->msg_lock, flags);
> - ipmi_free_smi_msg(msg);
> - return 0;
> + /* If came via the poll, and response was not yet ready */
> + if (rc == OPAL_EMPTY) {
> + spin_unlock_irqrestore(&smi->msg_lock, flags);
> + return 0;
> + } else {
> + smi->cur_msg = NULL;
> + spin_unlock_irqrestore(&smi->msg_lock, flags);
> + send_error_reply(smi, msg, IPMI_ERR_UNSPECIFIED);
> + return 0;
> + }
>   }
>  
>   if (size < sizeof(*opal_msg)) {
>


Re: [PATCH] spi: mpc512x-psc: add support for Freescale MPC5125

2015-07-16 Thread Mark Brown
On Wed, Jul 15, 2015 at 09:40:19AM +0200, Uwe Kleine-König wrote:
> On Tue, Jul 14, 2015 at 10:54:42AM +0100, Mark Brown wrote:

> > >  static const struct of_device_id mpc512x_psc_spi_of_match[] = {
> > > - { .compatible = "fsl,mpc5121-psc-spi", },
> > > + { .compatible = "fsl,mpc5121-psc-spi", .data = (void *)TYPE_MPC5121 },
> > > + { .compatible = "fsl,mpc5125-psc-spi", .data = (void *)TYPE_MPC5125 },
> > >   {},

> > The code seems fine but this should update the binding document to
> > include the new compatible string.

> I don't find fsl,mpc5121-psc-spi documented either. The best I found is
> Documentation/devicetree/bindings/powerpc/fsl/mpc5121-psc.txt which
> describes fsl,mpc5121-psc-uart and fsl,mpc5121-psc.

OK, then please add a basic binding document.  The point is that new
bindings should be documented; if people have been lax about this in
the past, that does involve a bit of cleanup.



[PATCH] powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*

2015-07-16 Thread Paul Mackerras
The hardware RNG on POWER8 and POWER7+ can be relatively slow, since
it can only supply one 64-bit value per microsecond.  Currently we
read it in arch_get_random_long(), but that slows down reading from
/dev/urandom since the code in random.c calls arch_get_random_long()
for every longword read from /dev/urandom.

Since the hardware RNG supplies high-quality entropy on every read, it
matches the semantics of arch_get_random_seed_long() better than those
of arch_get_random_long().  Therefore this commit makes the code use
the hardware RNG only for arch_get_random_seed_{long,int} and not for
arch_get_random_{long,int}.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/archrandom.h | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/archrandom.h b/arch/powerpc/include/asm/archrandom.h
index 0cc6eed..a4c3f54 100644
--- a/arch/powerpc/include/asm/archrandom.h
+++ b/arch/powerpc/include/asm/archrandom.h
@@ -7,13 +7,22 @@
 
 static inline int arch_get_random_long(unsigned long *v)
 {
+   return 0;
+}
+
+static inline int arch_get_random_int(unsigned int *v)
+{
+   return 0;
+}
+
+static inline int arch_get_random_seed_long(unsigned long *v)
+{
if (ppc_md.get_random_long)
return ppc_md.get_random_long(v);
 
return 0;
 }
-
-static inline int arch_get_random_int(unsigned int *v)
+static inline int arch_get_random_seed_int(unsigned int *v)
 {
unsigned long val;
int rc;
@@ -27,22 +36,13 @@ static inline int arch_get_random_int(unsigned int *v)
 
 static inline int arch_has_random(void)
 {
-   return !!ppc_md.get_random_long;
-}
-
-static inline int arch_get_random_seed_long(unsigned long *v)
-{
-   return 0;
-}
-static inline int arch_get_random_seed_int(unsigned int *v)
-{
return 0;
 }
+
 static inline int arch_has_random_seed(void)
 {
-   return 0;
+   return !!ppc_md.get_random_long;
 }
-
 #endif /* CONFIG_ARCH_RANDOM */
 
 #ifdef CONFIG_PPC_POWERNV
-- 
2.1.4
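
The split described in the commit message can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: the `sketch_` and `mock_` names are hypothetical, the consumer loop is a toy and not how random.c is structured, and the generator is not cryptographically secure. It only shows that the slow hardware source is read once on the seed path while the per-word path never touches it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ppc_md.get_random_long; NULL means no hardware RNG. */
static int (*mock_get_random_long)(unsigned long *v);
static unsigned int hw_reads;   /* counts (slow) hardware reads */

/* Mock hardware read: returns a placeholder value, not real entropy. */
static int mock_hwrng(unsigned long *v)
{
    hw_reads++;
    *v = 0x12345678UL ^ hw_reads;
    return 1;
}

/* Per-word path: after the patch it reports "not available" unconditionally. */
static int sketch_get_random_long(unsigned long *v)
{
    (void)v;
    return 0;
}

/* Seed path: one hardware read per call, if the platform provides an RNG. */
static int sketch_get_random_seed_long(unsigned long *v)
{
    if (mock_get_random_long)
        return mock_get_random_long(v);
    return 0;
}

/* Toy consumer: seed once, then produce n words in software. */
static void sketch_fill(unsigned long *out, size_t n)
{
    unsigned long state = 1, hw;
    size_t i;

    assert(out != NULL);
    if (sketch_get_random_seed_long(&hw))
        state ^= hw;

    for (i = 0; i < n; i++) {
        if (sketch_get_random_long(&hw))   /* never true in this sketch */
            state ^= hw;
        /* toy xorshift step, for illustration only */
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        out[i] = state;
    }
}
```

With the mock installed, filling any number of words costs a single hardware read, which is the behaviour the patch aims for on the /dev/urandom path.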


[PATCH 13/23] powerpc/time: Migrate to new 'set-state' interface

2015-07-16 Thread Viresh Kumar
Migrate the powerpc driver to the new 'set-state' interface provided by
the clockevents core; the earlier 'set-mode' interface is now marked
obsolete.

This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.

We weren't doing anything in ->set_mode(ONESHOT) and so
set_state_oneshot() isn't implemented.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Viresh Kumar 
---
 arch/powerpc/kernel/time.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 43922509a483..1be1092c7204 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -99,16 +99,17 @@ static struct clocksource clocksource_timebase = {
 
 static int decrementer_set_next_event(unsigned long evt,
  struct clock_event_device *dev);
-static void decrementer_set_mode(enum clock_event_mode mode,
-struct clock_event_device *dev);
+static int decrementer_shutdown(struct clock_event_device *evt);
 
 struct clock_event_device decrementer_clockevent = {
-   .name   = "decrementer",
-   .rating = 200,
-   .irq= 0,
-   .set_next_event = decrementer_set_next_event,
-   .set_mode   = decrementer_set_mode,
-   .features   = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
+   .name   = "decrementer",
+   .rating = 200,
+   .irq= 0,
+   .set_next_event = decrementer_set_next_event,
+   .set_state_shutdown = decrementer_shutdown,
+   .tick_resume= decrementer_shutdown,
+   .features   = CLOCK_EVT_FEAT_ONESHOT |
+ CLOCK_EVT_FEAT_C3STOP,
 };
 EXPORT_SYMBOL(decrementer_clockevent);
 
@@ -862,11 +863,10 @@ static int decrementer_set_next_event(unsigned long evt,
return 0;
 }
 
-static void decrementer_set_mode(enum clock_event_mode mode,
-struct clock_event_device *dev)
+static int decrementer_shutdown(struct clock_event_device *dev)
 {
-   if (mode != CLOCK_EVT_MODE_ONESHOT)
-   decrementer_set_next_event(DECREMENTER_MAX, dev);
+   decrementer_set_next_event(DECREMENTER_MAX, dev);
+   return 0;
 }
 
 /* Interrupt handler for the timer broadcast IPI */
-- 
2.4.0
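
As a rough illustration of the per-state callback model this patch moves to, here is a minimal userspace sketch. The structure, enum and dispatcher are hypothetical mocks (all `sketch_`/`SKETCH_` names are assumptions, not the clockevents API), and the maximum value is a placeholder. It shows why a driver that did nothing for a given mode can simply leave that callback unset.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for the "far in the future" decrementer value. */
#define SKETCH_DECREMENTER_MAX 0x7fffffffUL

enum sketch_state { SKETCH_SHUTDOWN, SKETCH_ONESHOT };

/* Mock device: one optional callback per state instead of one set_mode(). */
struct sketch_clockevent {
    int (*set_state_shutdown)(struct sketch_clockevent *dev);
    int (*set_state_oneshot)(struct sketch_clockevent *dev);
    unsigned long next_event;
};

/* Shutdown parks the timer by programming the maximum value. */
static int sketch_decrementer_shutdown(struct sketch_clockevent *dev)
{
    dev->next_event = SKETCH_DECREMENTER_MAX;
    return 0;
}

/* Mock core dispatcher: a missing callback is treated as "nothing to do". */
static int sketch_switch_state(struct sketch_clockevent *dev,
                               enum sketch_state state)
{
    assert(dev != NULL);
    switch (state) {
    case SKETCH_SHUTDOWN:
        return dev->set_state_shutdown ? dev->set_state_shutdown(dev) : 0;
    case SKETCH_ONESHOT:
        return dev->set_state_oneshot ? dev->set_state_oneshot(dev) : 0;
    }
    return -1;
}
```

In this model the old `if (mode != CLOCK_EVT_MODE_ONESHOT)` test disappears: the oneshot transition has no callback, so only the shutdown-like transitions reprogram the timer.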


[PATCH] ipmi/powernv: Fix potential invalid pointer dereference

2015-07-16 Thread Neelesh Gupta
If the OPAL call to receive the ipmi message fails, then we free up the
smi message and return. But, the driver still holds the reference to
old smi message in the 'cur_msg' which can potentially be accessed later
and freed again, leading to a kernel oops.

To fix this, the kernel driver should reset 'cur_msg' and send a reply to
the user in addition to freeing the message.

Signed-off-by: Neelesh Gupta 
---
 drivers/char/ipmi/ipmi_powernv.c |   13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_powernv.c b/drivers/char/ipmi/ipmi_powernv.c
index 9b409c0..637486d 100644
--- a/drivers/char/ipmi/ipmi_powernv.c
+++ b/drivers/char/ipmi/ipmi_powernv.c
@@ -143,9 +143,16 @@ static int ipmi_powernv_recv(struct ipmi_smi_powernv *smi)
pr_devel("%s:   -> %d (size %lld)\n", __func__,
rc, rc == 0 ? size : 0);
if (rc) {
-   spin_unlock_irqrestore(&smi->msg_lock, flags);
-   ipmi_free_smi_msg(msg);
-   return 0;
+   /* If came via the poll, and response was not yet ready */
+   if (rc == OPAL_EMPTY) {
+   spin_unlock_irqrestore(&smi->msg_lock, flags);
+   return 0;
+   } else {
+   smi->cur_msg = NULL;
+   spin_unlock_irqrestore(&smi->msg_lock, flags);
+   send_error_reply(smi, msg, IPMI_ERR_UNSPECIFIED);
+   return 0;
+   }
}
 
if (size < sizeof(*opal_msg)) {
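
The ownership rule behind this fix can be sketched in plain userspace C. This is a simplified model under stated assumptions: the types, return codes and `sketch_` functions are hypothetical, locking is omitted, and the error reply is reduced to freeing the message and counting it. The point is only the ordering: the driver drops its reference before the message is handed away, so a later call cannot touch or free it again.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical return codes standing in for the firmware call result. */
#define SKETCH_EMPTY (-1)   /* response not ready yet, keep waiting */
#define SKETCH_ERROR (-2)   /* any other failure */

struct sketch_msg { int data; };

struct sketch_smi {
    struct sketch_msg *cur_msg;   /* message currently in flight */
    int replies;                  /* error replies delivered so far */
};

/* Stand-in for the error reply: ownership of msg passes to the caller's side. */
static void sketch_send_error_reply(struct sketch_smi *smi,
                                    struct sketch_msg *msg)
{
    smi->replies++;
    free(msg);
}

/* Receive path for a failed firmware call (rc != 0). */
static int sketch_recv(struct sketch_smi *smi, int rc)
{
    struct sketch_msg *msg;

    assert(smi != NULL);
    msg = smi->cur_msg;
    if (!msg)
        return 0;                 /* nothing in flight */

    if (rc == SKETCH_EMPTY)
        return 0;                 /* polled too early: message stays pending */

    smi->cur_msg = NULL;          /* drop the reference first */
    sketch_send_error_reply(smi, msg);
    return 0;
}
```

Calling the receive path twice with an error code releases the message exactly once, which is the double-free the commit message describes.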


[PATCH v5 3/7] powerpc/powernv: Nest PMU detection and device tree parser

2015-07-16 Thread Madhavan Srinivasan
Create a file "nest-pmu.c" to contain Nest PMU related functions: code
to detect Nest PMU support and a parser to collect per-chip reserved memory
region information from the device tree (DT).

The detection mechanism looks for the specific property "ibm,ima-chip" in
the DT. For the Nest PMU, the device tree carries two sets of information:
1) Per-chip reserved memory region for nest pmu counter collection area.
2) Supported Nest PMUs and events

The device tree layout for the Nest PMU is as follows.

  /                          -- DT root folder
  |
  -nest-ima                  -- Nest PMU folder
   |
   -ima-chip@                -- Per-chip folder for reserved region information
    |
    -ibm,chip-id             -- Chip id
    -ibm,ima-chip
    -reg                     -- HOMER PORE Nest Counter collection Address (RA)
    -size                    -- size to map in kernel space

   -Alink_BW                 -- Nest PMU folder
    |
    -Alink0                  -- Nest PMU Alink Event file
    -scale.Alink0.scale      -- Event scale file
    -unit.Alink0.unit        -- Event unit file
    -device_type             -- "nest-ima-unit" marker

Subsequent patch will parse the next part of the DT to find various
Nest PMUs and their events.
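
The per-chip part of the layout above boils down to decoding three big-endian property values and storing them by chip id. The following is a minimal userspace sketch of that step, assuming raw property bytes are already in hand; the `sketch_` names and the table size are hypothetical, and the range check on the chip id is an addition of this sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_CHIPS 32   /* hypothetical table size */

struct sketch_chip_info {
    uint64_t pbase;   /* physical base of the counter area */
    uint64_t size;    /* length of the area to map */
};

/* Decode a big-endian 32-bit cell from raw property bytes. */
static uint32_t sketch_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a big-endian 64-bit value (two consecutive cells). */
static uint64_t sketch_be64(const uint8_t *p)
{
    return ((uint64_t)sketch_be32(p) << 32) | sketch_be32(p + 4);
}

/* Record one chip; returns 0 on success, -1 on a missing or bad property. */
static int sketch_record_chip(struct sketch_chip_info *table,
                              const uint8_t *chip_id,
                              const uint8_t *reg,
                              const uint8_t *size)
{
    uint32_t idx;

    assert(table != NULL);
    if (!chip_id || !reg || !size)
        return -1;

    idx = sketch_be32(chip_id);
    if (idx >= SKETCH_MAX_CHIPS)   /* sketch-only guard against overrun */
        return -1;

    table[idx].pbase = sketch_be64(reg);
    table[idx].size = sketch_be64(size);
    return 0;
}
```

Indexing the table directly by chip id keeps lookups trivial, at the cost of needing the id to stay within the table bounds.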

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/Makefile   |  2 +-
 arch/powerpc/perf/nest-pmu.c | 85 
 2 files changed, 86 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/perf/nest-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f9c083a5652a..6da656b50e3c 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_PERF_EVENTS)   += callchain.o
 obj-$(CONFIG_PPC_PERF_CTRS)+= core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)  += power4-pmu.o ppc970-pmu.o power5-pmu.o \
   power5+-pmu.o power6-pmu.o power7-pmu.o \
-  power8-pmu.o
+  power8-pmu.o nest-pmu.o
 obj32-$(CONFIG_PPC_PERF_CTRS)  += mpc7450-pmu.o
 
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
new file mode 100644
index ..e7d45ed4922d
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -0,0 +1,85 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "nest-pmu.h"
+
+static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+
+static int nest_ima_dt_parser(void)
+{
+   const __be32 *gcid;
+   const __be64 *chip_ima_reg;
+   const __be64 *chip_ima_size;
+   struct device_node *dev;
+   struct perchip_nest_info *p8ni;
+   int idx;
+
+   /*
+* "nest-ima" folder contains two things,
+* a) per-chip reserved memory region for Nest PMU Counter data
+* b) Support Nest PMU units and their event files
+*/
+   for_each_node_with_property(dev, "ibm,ima-chip") {
+   gcid = of_get_property(dev, "ibm,chip-id", NULL);
+   chip_ima_reg = of_get_property(dev, "reg", NULL);
+   chip_ima_size = of_get_property(dev, "size", NULL);
+
+   if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
+   pr_err("Nest_PMU: device %s missing property\n",
+   dev->full_name);
+   return -ENODEV;
+   }
+
+   /* chip id to save reserve memory region */
+   idx = (uint32_t)be32_to_cpup(gcid);
+
+   /*
+* Using a local variable to make it compact and
+* easier to read
+*/
+   p8ni = &p8_nest_perchip_info[idx];
+   p8ni->pbase = be64_to_cpup(chip_ima_reg);
+   p8ni->size = be64_to_cpup(chip_ima_size);
+   p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
+   }
+
+   return 0;
+}
+
+static int __init nest_pmu_init(void)
+{
+   int ret = -ENODEV;
+
+   /*
+* Lets do this only if we are hypervisor
+*/
+   if (!cur_cpu_spec->oprofile_cpu_type ||
+   !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0) ||
+   !cpu_has_feature(CPU_FTR_HVMODE))
+   return ret;
+
+   /*
+* Nest PMU information is grouped under "nest-ima" node
+* of the top-level device-tree directory. Detect Nest PMU
+* by the "ibm,ima-chip" property.
+*/
+   if (!of_find_node_with_property(NULL, "ibm,ima-chip"))
+   

[PATCH v5 4/7] powerpc/powernv: detect supported nest pmus and its events

2015-07-16 Thread Madhavan Srinivasan
Parse device tree to detect supported nest pmu units. Traverse
through each nest pmu unit folder to find supported events and
corresponding unit/scale files (if any).

The nest unit event file from the DT contains the offset into the
reserved memory region at which the counter data for a given event is found.
Kernel code uses this offset as the event configuration value.

Device tree parser code also looks for scale/unit in the file name and
passes on the file as an event attr for perf tool to use in the post
processing.
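
The name-based dispatch described above can be sketched as a small classifier. This is an illustrative userspace version only: the enum and the function name are hypothetical, and it merely mirrors the prefix checks ("unit.", "scale.") and the skipped properties visible in the diff below.

```c
#include <assert.h>
#include <string.h>

enum sketch_kind {
    SKETCH_SKIP,       /* bookkeeping property, ignored */
    SKETCH_PMU_NAME,   /* "name": used to build the PMU name */
    SKETCH_UNIT,       /* "unit.<event>.unit": string content */
    SKETCH_SCALE,      /* "scale.<event>.scale": string content */
    SKETCH_EVENT       /* anything else: numeric event offset */
};

/*
 * Classify a property name and return, via event_name, the part of the
 * name that becomes the attribute file name.
 */
static enum sketch_kind sketch_classify(const char *prop,
                                        const char **event_name)
{
    assert(prop != NULL && event_name != NULL);
    *event_name = prop;

    if (!strcmp(prop, "name"))
        return SKETCH_PMU_NAME;

    if (!strcmp(prop, "phandle") ||
        !strcmp(prop, "device_type") ||
        !strcmp(prop, "linux,phandle"))
        return SKETCH_SKIP;

    if (!strncmp(prop, "unit.", 5)) {
        *event_name = prop + 5;    /* skip the "unit." prefix */
        return SKETCH_UNIT;
    }
    if (!strncmp(prop, "scale.", 6)) {
        *event_name = prop + 6;    /* skip the "scale." prefix */
        return SKETCH_SCALE;
    }
    return SKETCH_EVENT;
}
```

For example, "unit.Alink0.unit" is classified as a unit file and exposed as "Alink0.unit", while a plain "Alink0" is treated as an event whose value is the counter offset.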

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/nest-pmu.c | 126 ++-
 1 file changed, 125 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index e7d45ed4922d..c4c08e4dee55 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -11,6 +11,121 @@
 #include "nest-pmu.h"
 
 static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+
+static int nest_event_info(struct property *pp, char *name,
+   struct nest_ima_events *p8_events, int string, u32 val)
+{
+   char *buf;
+
+   /* memory for event name */
+   buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   strncpy(buf, name, strlen(name));
+   p8_events->ev_name = buf;
+
+   /* memory for content */
+   buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   if (string) {
+   /* string content*/
+   if (!pp->value ||
+  (strnlen(pp->value, pp->length) == pp->length) ||
+  (pp->length > P8_NEST_MAX_PMU_NAME_LEN))
+   return -EINVAL;
+
+   strncpy(buf, (const char *)pp->value, pp->length);
+   } else
+   sprintf(buf, "event=0x%x", val);
+
+   p8_events->ev_value = buf;
+   return 0;
+}
+
+static int nest_pmu_create(struct device_node *dev, int pmu_index)
+{
+   struct nest_ima_events **p8_events_arr, *p8_events;
+   struct nest_pmu *pmu_ptr;
+   struct property *pp;
+   char *buf, *start;
+   const __be32 *lval;
+   u32 val;
+   int idx = 0, ret;
+
+   if (!dev)
+   return -EINVAL;
+
+   /* memory for nest pmus */
+   pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
+   if (!pmu_ptr)
+   return -ENOMEM;
+
+   /* Needed for hotplug/migration */
+   per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+   /* memory for nest pmu events */
+   p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
+   GFP_KERNEL);
+   if (!p8_events_arr)
+   return -ENOMEM;
+   p8_events = (struct nest_ima_events *)p8_events_arr;
+
+   /*
+* Loop through each property
+*/
+   for_each_property_of_node(dev, pp) {
+   start = pp->name;
+
+   if (!strcmp(pp->name, "name")) {
+   if (!pp->value ||
+  (strnlen(pp->value, pp->length) == pp->length) ||
+  (pp->length > P8_NEST_MAX_PMU_NAME_LEN))
+   return -EINVAL;
+
+   buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+   if (!buf)
+   return -ENOMEM;
+
+   /* Save the name to register it later */
+   sprintf(buf, "Nest_%s", (char *)pp->value);
+   pmu_ptr->pmu.name = (char *)buf;
+   continue;
+   }
+
+   /* Skip these, we dont need it */
+   if (!strcmp(pp->name, "phandle") ||
+   !strcmp(pp->name, "device_type") ||
+   !strcmp(pp->name, "linux,phandle"))
+   continue;
+
+   if (strncmp(pp->name, "unit.", 5) == 0) {
+   /* Skip first few chars in the name */
+   start += 5;
+   ret = nest_event_info(pp, start, p8_events++, 1, 0);
+   } else if (strncmp(pp->name, "scale.", 6) == 0) {
+   /* Skip first few chars in the name */
+   start += 6;
+   ret = nest_event_info(pp, start, p8_events++, 1, 0);
+   } else {
+   lval = of_get_property(dev, pp->name, NULL);
+   val = (uint32_t)be32_to_cpup(lval);
+
+   ret = nest_event_info(pp, start, p8_events++, 0, val);
+   }
+
+   if (ret)
+   return ret;
+
+   

[PATCH v5 6/7] powerpc/powernv: generic nest pmu event functions

2015-07-16 Thread Madhavan Srinivasan
Add set of generic nest pmu related event functions to be used by
each nest pmu. Add code to register nest pmus.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/nest-pmu.c | 105 +++
 1 file changed, 105 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index f3418bdec1cd..2ebd0508e9b3 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -24,6 +24,101 @@ static struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
 };
 
+static int p8_nest_event_init(struct perf_event *event)
+{
+   int chip_id;
+
+   if (event->attr.type != event->pmu->type)
+   return -ENOENT;
+
+   /* Sampling not supported yet */
+   if (event->hw.sample_period)
+   return -EINVAL;
+
+   /* unsupported modes and filters */
+   if (event->attr.exclude_user   ||
+   event->attr.exclude_kernel ||
+   event->attr.exclude_hv ||
+   event->attr.exclude_idle   ||
+   event->attr.exclude_host   ||
+   event->attr.exclude_guest)
+   return -EINVAL;
+
+   if (event->cpu < 0)
+   return -EINVAL;
+
+   chip_id = topology_physical_package_id(event->cpu);
+   event->hw.event_base = event->attr.config +
+   p8_nest_perchip_info[chip_id].vbase;
+
+   return 0;
+}
+
+static void p8_nest_read_counter(struct perf_event *event)
+{
+   uint64_t *addr;
+   u64 data = 0;
+
+   addr = (u64 *)event->hw.event_base;
+   data = __be64_to_cpu(*addr);
+   local64_set(&event->hw.prev_count, data);
+}
+
+static void p8_nest_perf_event_update(struct perf_event *event)
+{
+   u64 counter_prev, counter_new, final_count;
+   uint64_t *addr;
+
+   addr = (uint64_t *)event->hw.event_base;
+   counter_prev = local64_read(&event->hw.prev_count);
+   counter_new = __be64_to_cpu(*addr);
+   final_count = counter_new - counter_prev;
+
+   local64_set(&event->hw.prev_count, counter_new);
+   local64_add(final_count, &event->count);
+}
+
+static void p8_nest_event_start(struct perf_event *event, int flags)
+{
+   event->hw.state = 0;
+   p8_nest_read_counter(event);
+}
+
+static void p8_nest_event_stop(struct perf_event *event, int flags)
+{
+   if (flags & PERF_EF_UPDATE)
+   p8_nest_perf_event_update(event);
+}
+
+static int p8_nest_event_add(struct perf_event *event, int flags)
+{
+   if (flags & PERF_EF_START)
+   p8_nest_event_start(event, flags);
+
+   return 0;
+}
+
+/*
+ * Populate pmu ops in the structure
+ */
+static int update_pmu_ops(struct nest_pmu *pmu)
+{
+   if (!pmu)
+   return -EINVAL;
+
+   pmu->pmu.task_ctx_nr = perf_invalid_context;
+   pmu->pmu.event_init = p8_nest_event_init;
+   pmu->pmu.add = p8_nest_event_add;
+   pmu->pmu.del = p8_nest_event_stop;
+   pmu->pmu.start = p8_nest_event_start;
+   pmu->pmu.stop = p8_nest_event_stop;
+   pmu->pmu.read = p8_nest_perf_event_update;
+   pmu->pmu.attr_groups = pmu->attr_groups;
+
+   return 0;
+}
+
+
 static int nest_event_info(struct property *pp, char *name,
struct nest_ima_events *p8_events, int string, u32 val)
 {
@@ -189,6 +284,16 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
update_events_in_group(
(struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
 
+   update_pmu_ops(pmu_ptr);
+   /* Register the pmu */
+   ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+   if (ret) {
+   pr_err("Nest PMU %s Register failed\n", pmu_ptr->pmu.name);
+   return ret;
+   }
+
+   pr_info("%s performance monitor hardware support registered\n",
+   pmu_ptr->pmu.name);
return 0;
 }
 
-- 
1.9.1


[PATCH v5 5/7] powerpc/powernv: add event attribute and group to nest pmu

2015-07-16 Thread Madhavan Srinivasan
Add code to create event/format attributes and attribute groups for
each nest pmu.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/nest-pmu.c | 65 
 1 file changed, 65 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index c4c08e4dee55..f3418bdec1cd 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -13,6 +13,17 @@
 static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
 static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
 
+PMU_FORMAT_ATTR(event, "config:0-20");
+static struct attribute *p8_nest_format_attrs[] = {
+   &format_attr_event.attr,
+   NULL,
+};
+
+static struct attribute_group p8_nest_format_group = {
+   .name = "format",
+   .attrs = p8_nest_format_attrs,
+};
+
 static int nest_event_info(struct property *pp, char *name,
struct nest_ima_events *p8_events, int string, u32 val)
 {
@@ -46,6 +57,56 @@ static int nest_event_info(struct property *pp, char *name,
return 0;
 }
 
+/*
+ * Populate event name and string in attribute
+ */
+static struct attribute *dev_str_attr(const char *name, const char *str)
+{
+   struct perf_pmu_events_attr *attr;
+
+   attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+   sysfs_attr_init(&attr->attr.attr);
+
+   attr->event_str = str;
+   attr->attr.attr.name = name;
+   attr->attr.attr.mode = 0444;
+   attr->attr.show = perf_event_sysfs_show;
+
+   return &attr->attr.attr;
+}
+
+static int update_events_in_group(
+   struct nest_ima_events *p8_events, int idx, struct nest_pmu *pmu)
+{
+   struct attribute_group *attr_group;
+   struct attribute **attrs;
+   int i;
+
+   /*
+* Allocate memory for both event attribute group and for
+* event attributes array.
+*/
+   attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
+   sizeof(*attr_group)), GFP_KERNEL);
+   if (!attr_group)
+   return -ENOMEM;
+
+   /*
+* Assign memory for event attribute array
+*/
+   attrs = (struct attribute **)(attr_group + 1);
+   attr_group->name = "events";
+   attr_group->attrs = attrs;
+
+   for (i = 0; i < idx; i++, p8_events++)
+   attrs[i] = dev_str_attr((char *)p8_events->ev_name,
+   (char *)p8_events->ev_value);
+
+   pmu->attr_groups[0] = attr_group;
+   return 0;
+}
+
 static int nest_pmu_create(struct device_node *dev, int pmu_index)
 {
struct nest_ima_events **p8_events_arr, *p8_events;
@@ -93,6 +154,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
/* Save the name to register it later */
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
+   pmu_ptr->attr_groups[1] = &p8_nest_format_group;
continue;
}
 
@@ -124,6 +186,9 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
idx++;
}
 
+   update_events_in_group(
+   (struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
+
return 0;
 }
 
-- 
1.9.1
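
update_events_in_group() allocates the group header and its NULL-terminated pointer array in one block and points the array just past the header. A userspace sketch of that layout follows; the structures are simplified mocks with hypothetical `sketch_` names, and the attribute objects are supplied by the caller here instead of being allocated one by one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_attr {
    const char *name;
    const char *value;
};

struct sketch_group {
    const char *name;
    struct sketch_attr **attrs;   /* NULL-terminated array */
};

/*
 * Build a group over n caller-owned attributes. Header and pointer array
 * share one allocation, so a single free() releases both.
 */
static struct sketch_group *sketch_make_group(struct sketch_attr *events,
                                              size_t n)
{
    struct sketch_group *group;
    struct sketch_attr **attrs;
    size_t i;

    assert(events != NULL || n == 0);

    /* n entries plus one terminator, zero-filled by calloc() */
    group = calloc(1, sizeof(*group) + (n + 1) * sizeof(*attrs));
    if (!group)
        return NULL;

    attrs = (struct sketch_attr **)(group + 1);   /* array follows header */
    group->name = "events";
    group->attrs = attrs;

    for (i = 0; i < n; i++)
        attrs[i] = &events[i];
    /* attrs[n] is already NULL and terminates the array */

    return group;
}
```

Sizing the array as n + 1 is what provides the terminating NULL entry that consumers of such arrays rely on.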


[PATCH v5 1/7] powerpc/powernv: Data structure and macros definition

2015-07-16 Thread Madhavan Srinivasan
Create new header file "nest-pmu.h" to add the data structures
and macros needed for the nest pmu support.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/nest-pmu.h | 54 
 1 file changed, 54 insertions(+)
 create mode 100644 arch/powerpc/perf/nest-pmu.h

diff --git a/arch/powerpc/perf/nest-pmu.h b/arch/powerpc/perf/nest-pmu.h
new file mode 100644
index ..28e3c6e024a6
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.h
@@ -0,0 +1,54 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define P8_NEST_MAX_CHIPS  32
+#define P8_NEST_MAX_PMUS   32
+#define P8_NEST_MAX_PMU_NAME_LEN   256
+#define P8_NEST_MAX_EVENTS_SUPPORTED   256
+#define P8_NEST_ENGINE_START   1
+#define P8_NEST_ENGINE_STOP0
+#define P8_NEST_MODE_PRODUCTION1
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+   uint32_t chip_id;
+   uint64_t pbase;
+   uint64_t vbase;
+   uint32_t size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct nest_ima_events {
+   const char *ev_name;
+   const char *ev_value;
+};
+
+/*
+ * Device tree parser code detects nest pmu support and
+ * registers new nest pmus. This structure will
+ * hold the pmu functions and attrs for each nest pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct nest_pmu {
+   struct pmu pmu;
+   const struct attribute_group *attr_groups[4];
+};
-- 
1.9.1


[PATCH v5 0/7] powerpc/powernv: Nest Instrumentation support

2015-07-16 Thread Madhavan Srinivasan
This patchset enables Nest Instrumentation support on powerpc.
POWER8 has per-chip Nest Instrumentation which provides various
per-chip metrics like memory, powerbus, Xlink and Alink
bandwidth.

Nest Instrumentation provides an interface (via PORE Engine)
to configure and move the nest counter data to memory. From the
kernel side, the OPAL call interface is used to activate/deactivate
the PORE Engine for nest data collection.

At boot, OPAL detects the feature, initializes it and passes
the nest units and other related information, such as the memory
region and the supported events, to the kernel via the device tree.

Kernel code then parses the device tree for nest pmu support
and registers the nest pmus with the events available. The PORE Engine
collects and accumulates nest counter data in a per-chip reserved memory
region, hence the device tree also exports the per-chip nest accumulation
memory region. Individual event offsets are used as event configuration
values.

Here is sample perf usage to explain the interface.

#./perf list

  iTLB-load-misses   [Hardware cache event]

  Nest_Alink_BW/Alink0/  [Kernel PMU event]
  Nest_Alink_BW/Alink1/  [Kernel PMU event]
  Nest_Alink_BW/Alink2/  [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_00/   [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_01/   [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_02/   [Kernel PMU event]
  Nest_MCS_Read_BW/MCS_03/   [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_00/  [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_01/  [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_02/  [Kernel PMU event]
  Nest_MCS_Write_BW/MCS_03/  [Kernel PMU event]
  Nest_PowerBus_BW/External/ [Kernel PMU event]
  Nest_PowerBus_BW/Internal/ [Kernel PMU event]
  Nest_Xlink_BW/Xlink0/  [Kernel PMU event]
  Nest_Xlink_BW/Xlink1/  [Kernel PMU event]
  Nest_Xlink_BW/Xlink2/  [Kernel PMU event]

  rNNN                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier  [Raw hardware event descriptor]
.

# ./perf stat -e 'Nest_Xlink_BW/Xlink1/' -a -A sleep 1

 Performance counter stats for 'system wide':

CPU0   15,913.18 MiB  Nest_Xlink_BW/Xlink1/
CPU32  11,955.88 MiB  Nest_Xlink_BW/Xlink1/
CPU64  11,042.43 MiB  Nest_Xlink_BW/Xlink1/
CPU96  14,065.27 MiB  Nest_Xlink_BW/Xlink1/

   1.001062038 seconds time elapsed

# ./perf stat -e 'Nest_Alink_BW/Alink0/,Nest_Alink_BW/Alink1/,Nest_Alink_BW/Alink2/' -a -A -I 1000 sleep 5

 Performance counter stats for 'system wide':

CPU0  0.00 MiB  Nest_Alink_BW/Alink0/   (100.00%)
CPU32 0.00 MiB  Nest_Alink_BW/Alink0/   (100.00%)
CPU64 0.00 MiB  Nest_Alink_BW/Alink0/   (100.00%)
CPU96 0.00 MiB  Nest_Alink_BW/Alink0/   (100.00%)
CPU0  1,430.43 MiB  Nest_Alink_BW/Alink1/   (100.00%)
CPU32   320.99 MiB  Nest_Alink_BW/Alink1/   (100.00%)
CPU64 3,443.83 MiB  Nest_Alink_BW/Alink1/   (100.00%)
CPU96 1,904.41 MiB  Nest_Alink_BW/Alink1/   (100.00%)
CPU0  2,856.85 MiB  Nest_Alink_BW/Alink2/
CPU32 7.50 MiB  Nest_Alink_BW/Alink2/
CPU64 4,034.29 MiB  Nest_Alink_BW/Alink2/
CPU96   288.49 MiB  Nest_Alink_BW/Alink2/
.

OPAL side patches are posted in the skiboot mailing list.

Changelog from v4:

1) Variable name changes for consistency and added more comments
2) Added sysfs_attr_init to keep lockdep happy
3) Updated OPAL Call interface changes and added code to handle
   failure case.
4) Added new macro "P8_NEST_MODE_PRODUCTION" to specify PORE Engine mode
5) Modified nest_pmu_cpumask_init function to return value to
   nest pmu init function in case of OPAL call failure.

Changelog from v3:

No logic change, just a rebase to latest upstream kernel.

Changelog from v2:

1) Changed variable and macro names to be consistent.
2) Made changes to commit message and code comment messages
3) Moved "format attribute" related code from patch 6 to 5
4) Added check for pmu register function
5) Changed cpu_init and cpu_exit functions to use first online
   cpu of the chip, thereby making the code a lot simpler.

Changelog from v1:

1) No logic changes, re-ordered patches to make each patch compile
   without error

[PATCH v5 7/7] powerpc/powernv: nest pmu cpumask and cpu hotplug support

2015-07-16 Thread Madhavan Srinivasan
Adds cpumask attribute to be used by each nest pmu since nest
units are per-chip. Only one cpu (first online cpu) from each node/chip
is designated to read counters.

On cpu hotplug, dying cpu is checked to see whether it is one of the
designated cpus, if yes, next online cpu from the same node/chip is
designated as new cpu to read counters.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Cc: Preeti U Murthy 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/nest-pmu.c | 172 +++
 1 file changed, 172 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 2ebd0508e9b3..d3a2fd746cf9 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -12,6 +12,7 @@
 
 static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
 static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+static cpumask_t nest_pmu_cpu_mask;
 
 PMU_FORMAT_ATTR(event, "config:0-20");
 static struct attribute *p8_nest_format_attrs[] = {
@@ -24,6 +25,172 @@ static struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
 };
 
+static ssize_t nest_pmu_cpumask_get_attr(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   return cpumap_print_to_pagebuf(true, buf, &nest_pmu_cpu_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, nest_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *nest_pmu_cpumask_attrs[] = {
+   &dev_attr_cpumask.attr,
+   NULL,
+};
+
+static struct attribute_group nest_pmu_cpumask_attr_group = {
+   .attrs = nest_pmu_cpumask_attrs,
+};
+
+static void nest_init(int *loc)
+{
+   int rc;
+
+   rc = opal_nest_ima_control(
+   P8_NEST_MODE_PRODUCTION, P8_NEST_ENGINE_START);
+   if (rc)
+   loc[smp_processor_id()] = 1;
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+   int i;
+
+   for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
+   perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+   old_cpu, new_cpu);
+}
+
+static void nest_exit_cpu(int cpu)
+{
+   int nid, target = -1;
+   struct cpumask *l_cpumask;
+
+   /*
+* Check in the designated list for this cpu. Dont bother
+* if not one of them.
+*/
+   if (!cpumask_test_and_clear_cpu(cpu, &nest_pmu_cpu_mask))
+   return;
+
+   /*
+* Now that this cpu is one of the designated,
+* find a next cpu a) which is online and b) in same chip.
+*/
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+   target = cpumask_next(cpu, l_cpumask);
+
+   /*
+* Update the cpumask with the target cpu and
+* migrate the context if needed
+*/
+   if (target >= 0 && target <= nr_cpu_ids) {
+   cpumask_set_cpu(target, &nest_pmu_cpu_mask);
+   nest_change_cpu_context(cpu, target);
+   }
+}
+
+static void nest_init_cpu(int cpu)
+{
+   int nid, fcpu, ncpu;
+   struct cpumask *l_cpumask, tmp_mask;
+
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+
+   /*
+* if empty cpumask, just add incoming cpu and move on.
+*/
+   if (!cpumask_and(&tmp_mask, l_cpumask, &nest_pmu_cpu_mask)) {
+   cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+   return;
+   }
+
+   /*
+* Alway have the first online cpu of a chip as designated one.
+*/
+   fcpu = cpumask_first(l_cpumask);
+   ncpu = cpumask_next(cpu, l_cpumask);
+   if (cpu == fcpu) {
+   if (cpumask_test_and_clear_cpu(ncpu, &nest_pmu_cpu_mask)) {
+   cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+   nest_change_cpu_context(ncpu, cpu);
+   }
+   }
+}
+
+static int nest_pmu_cpu_notifier(struct notifier_block *self,
+   unsigned long action, void *hcpu)
+{
+   long cpu = (long)hcpu;
+
+   switch (action & ~CPU_TASKS_FROZEN) {
+   case CPU_ONLINE:
+   nest_init_cpu(cpu);
+   break;
+   case CPU_DOWN_PREPARE:
+  nest_exit_cpu(cpu);
+  break;
+   default:
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block nest_pmu_cpu_nb = {
+   .notifier_call  = nest_pmu_cpu_notifier,
+   .priority   = CPU_PRI_PERF + 1,
+};
+
+static int nest_pmu_cpumask_init(void)
+{
+   const struct cpumask *l_cpumask;
+   int cpu, nid;
+   int *cpus_opal_rc;
+
+   cpu_notifier_register_begin();
+
+   /*
+* Nest PMUs are per-chip counters. So designate a cpu
+* from each chip for count

[PATCH v5 2/7] powerpc/powernv: Add OPAL support for Nest PMU

2015-07-16 Thread Madhavan Srinivasan
Nest Counters can be configured via PORE Engine and OPAL
provides an interface to start/stop it.

OPAL side patches are posted on the skiboot mailing list.

Cc: Stewart Smith 
Cc: Jeremy Kerr 
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Anton Blanchard 
Cc: Sukadev Bhattiprolu 
Cc: Anshuman Khandual 
Cc: Stephane Eranian 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/opal-api.h| 3 ++-
 arch/powerpc/include/asm/opal.h| 1 +
 arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index e9e4c52f3685..4cd8128c6ebc 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -154,7 +154,8 @@
 #define OPAL_FLASH_WRITE   111
 #define OPAL_FLASH_ERASE   112
 #define OPAL_PRD_MSG   113
-#define OPAL_LAST  113
+#define OPAL_NEST_IMA_CONTROL  116
+#define OPAL_LAST  116
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 958e941c0cda..7c813ed52ab4 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -202,6 +202,7 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, 
uint64_t buf,
uint64_t size, uint64_t token);
 int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);
+int64_t opal_nest_ima_control(uint64_t mode, uint64_t value);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index d6a7b8252e4d..c475c04468fb 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -297,3 +297,4 @@ OPAL_CALL(opal_flash_read,  
OPAL_FLASH_READ);
 OPAL_CALL(opal_flash_write,OPAL_FLASH_WRITE);
 OPAL_CALL(opal_flash_erase,OPAL_FLASH_ERASE);
 OPAL_CALL(opal_prd_msg,OPAL_PRD_MSG);
+OPAL_CALL(opal_nest_ima_control,   OPAL_NEST_IMA_CONTROL);
-- 
1.9.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH] kprobes: Mark OPTPROBES n/a for powerpc

2015-07-16 Thread Ananth N Mavinakayanahalli
Kprobes uses a breakpoint instruction to trap into execution flow
and the probed instruction is single-stepped from an alternate location.

On some architectures like x86, under certain conditions, the OPTPROBES
feature enables replacing the probed instruction with a jump instead,
resulting in a significant performance boost (one single-step exception
is bypassed for each kprobe).

Powerpc has an in-kernel instruction emulator. Kprobes on powerpc uses
this emulator already and bypasses the single-step exception, with a
lot less complexity.

Hence, mark OPTPROBES n/a for powerpc.

Signed-off-by: Ananth N Mavinakayanahalli 
---
 .../features/debug/optprobes/arch-support.txt  |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/features/debug/optprobes/arch-support.txt 
b/Documentation/features/debug/optprobes/arch-support.txt
index b8999d8..0a3ca33 100644
--- a/Documentation/features/debug/optprobes/arch-support.txt
+++ b/Documentation/features/debug/optprobes/arch-support.txt
@@ -27,7 +27,7 @@
 |   nios2: | TODO |
 |openrisc: | TODO |
 |  parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: | n/a  |
 |s390: | TODO |
 |   score: | TODO |
 |  sh: | TODO |


Re: [PATCH 6/6] cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()

2015-07-16 Thread Thomas Gleixner
On Thu, 16 Jul 2015, Baolin Wang wrote:
> On 15 July 2015 at 19:55, Thomas Gleixner  wrote:
> > On Wed, 15 Jul 2015, Baolin Wang wrote:
> >
> >> On 15 July 2015 at 18:31, Thomas Gleixner  wrote:
> >> > On Wed, 15 Jul 2015, Baolin Wang wrote:
> >> >
> >> >> The cputime_to_timespec() and timespec_to_cputime() functions are
> >> >> not year 2038 safe on 32bit systems because struct timespec
> >> >> will overflow in the year 2038.
> >> >
> >> > And how is this relevant? cputime is not based on wall clock time at
> >> > all. So what has 2038 to do with cputime?
> >> >
> >> > We want proper explanations WHY we need such a change.
> >>
> >> When converting the posix-cpu-timers, it call the
> >> cputime_to_timespec() function. Thus it need a conversion for this
> >> function.
> >
> > There is no requirement to convert posix-cpu-timers on their own. We
> > need to adopt the posix cpu timers code because it shares syscalls
> > with the other posix timers, but that still does not explain why we
> > need these functions.
> >
> 
> In posix-cpu-timers, it also defined some 'k_clock struct' variables,
> and we need to convert the callbacks of the 'k_clock struct' which are
> not year 2038 safe on 32bit systems. Some callbacks which need to
> convert call the cputime_to_timespec() function, thus we also want to
> convert the cputime_to_timespec() function to a year 2038 safe
> function to make all them ready for the year 2038 issue.

You are not getting it at all.

1) We need to change k_clock callbacks due to 2038 issues

2) posix cpu timers implement affected callbacks

3) posix cpu timers themselves and cputime are NOT affected by 2038

So we have 2 options to change the code in posix cpu timers:

   A) Do the timespec/timespec64 conversion in the posix cpu timer
  callbacks and leave the cputime functions untouched.

   B) Implement cputime/timespec64 functions to avoid #A

   If you go for #B, you need to provide a reasonable explanation why
   it is better than #A. And that explanation has absolutely nothing
   to do with 2038 safety.

Not everything is a 2038 issue, just because the only tool you have is
a timespec64.

Thanks,

tglx


   

RE: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board support

2015-07-16 Thread Priyanka Jain


-----Original Message-----
From: Wood Scott-B07421 
Sent: Wednesday, July 15, 2015 11:17 PM
To: Jain Priyanka-B32167
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH][v2] powerpc/fsl-booke: Add T1040D4RDB/T1042D4RDB board 
support

On Wed, 2015-07-15 at 15:00 +0530, Priyanka Jain wrote:
> T1040D4RDB/T1042D4RDB are Freescale Reference Design Board which can 
> support T1040/T1042 QorIQ Power Architecture™ processor respectively
> 
> T1040D4RDB/T1042D4RDB board Overview
> -
> - SERDES Connections, 8 lanes supporting:
> - PCI
> - SGMII
> - SATA 2.0
> - QSGMII(only for T1040D4RDB)
> - DDR Controller
> - Supports rates of up to 1600 MHz data-rate
> - Supports one DDR4 UDIMM
> -IFC/Local Bus
> - NAND flash: 1GB 8-bit NAND flash
> - NOR: 128MB 16-bit NOR Flash
> - Ethernet
> - Two on-board RGMII 10/100/1G ethernet ports.
> - PHY #0 remains powered up during deep-sleep
> - CPLD
> - Clocks
> - System and DDR clock (SYSCLK, “DDRCLK”)
> - SERDES clocks
> - Power Supplies
> - USB
> - Supports two USB 2.0 ports with integrated PHYs
> - Two type A ports with 5V@1.5A per port.
> - SDHC
> - SDHC/SDXC connector
> - SPI
> - On-board 64MB SPI flash
> - I2C
> - Devices connected: EEPROM, thermal monitor, VID controller
> - Other IO
> - Two Serial ports
> - ProfiBus port
> 
> Add support for T1040/T1042D4RDB board:
> -add device tree
> -Add entry in corenet_generic.c
> 
> Signed-off-by: Priyanka Jain 
> ---
>  Changes for v2:
>   Incorporated Scott's comments on device tree

You didn't respond to the comments on the CPLD node.
[Priyanka]
T1042D4RDB and T1040D4RDB are derivatives of the same board, and the CPLD is
the same for both. So I have moved the node below, with its compatible and reg
fields together, into t104xd4rdb.dtsi.
Is this fine?
cpld@3,0 {
compatible = "fsl,t1040d4rdb-cpld";
reg = <3 0 0x300>;
};


+i2c@118100{
> +  mux@77{
> + compatible = "nxp,pca9546";
> + reg = <0x77>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + };

A mux with no nodes under it (and yet it has #address-cells/#size-cells)?  
What is it multiplexing?
[Priyanka]: PCA9546 is an I2C mux device, to which other I2C devices (up to 8)
can be connected on its output channels.
On T104xD4RDB, channels 0, 1 and 3 are connected to the PEX device and channel 2
to the HDMI interface (initialization is done in U-Boot only); the other
channels are grounded. Linux therefore does not use the second-level I2C
devices connected to this mux, so I have not shown the next level of hierarchy.
Should I replace 'mux' with some other name? Please suggest.


-Scott


Re: [1/3] powerpc/iommu: Remove dma_data union

2015-07-16 Thread Michael Ellerman
On Wed, 2015-24-06 at 05:25:22 UTC, Benjamin Herrenschmidt wrote:
> To support "hybrid" DMA ops in a subsequent patch, we will need both
> a direct DMA offset and an iommu pointer. Those are currently exclusive
> (a union), so change them to be separate fields.
> 
> While there, also type iommu_table_base properly and make it exist only
> on CONFIG_PPC64 since it's not referenced on 32-bit at all.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2db4928bb559f8b43ca7

cheers

Re: [v3,1/2] cxl: Add explicit precision specifiers

2015-07-16 Thread Michael Ellerman
On Thu, 2015-11-06 at 11:27:51 UTC, Rasmus Villemoes wrote:
> C99 says that a precision given as simply '.' with no following digits
> or * should be interpreted as 0. The kernel's printf implementation,
> however, treats this case as if the precision was omitted. C99 also
> says that if both the precision and value are 0, no digits should be
> printed. Even if the kernel followed C99 to the letter, I don't think
> that would be particularly useful in these cases. For consistency with
> most other format strings in the file, use an explicit precision of 16
> and add a 0x prefix.
> 
> Signed-off-by: Rasmus Villemoes 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/80c394fab89649585089

cheers
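
The C99 behaviour described in the commit message can be checked with a hosted C library. The sketch below is userspace only and says nothing about the kernel's own vsnprintf(); the two helper names are made up for illustration. It contrasts a bare '.' precision with the explicit "0x%.16llx" form the patch switches to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format `value` with a bare '.' precision, which C99 defines as
 * precision 0. Returns the number of characters produced.
 */
static int format_bare_precision(char *buf, size_t len,
				 unsigned long long value)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%.llx", value);
}

/* Format with an explicit precision of 16 and a 0x prefix. */
static int format_explicit_precision(char *buf, size_t len,
				     unsigned long long value)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "0x%.16llx", value);
}
```

In a C99-conforming libc the bare form yields an empty string for a zero value, which is rarely what a debug message wants; the explicit form always yields 18 characters.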

Re: powerpc/powernv: Unfreeze VF PE on releasing it

2015-07-16 Thread Michael Ellerman
On Tue, 2015-23-06 at 07:01:13 UTC, Gavin Shan wrote:
> When releasing PE for SRIOV VF, the PE is forced to be frozen
> wrongly. When the same PE is picked for another VF, it won't
> work anyhow. The patch fixes the issue by unfreezing, not
> freezing the VF PE when releasing it.
> 
> Signed-off-by: Gavin Shan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f951e51003860705fc9f

cheers

Re: [1/4] powerpc/powernv: Allow to reserve one PE for multiple times

2015-07-16 Thread Michael Ellerman
On Fri, 2015-19-06 at 02:26:16 UTC, Gavin Shan wrote:
> The PE numbers are reserved according to root port's M64 window,
> which is aligned to M64 segment finely. So one PE shouldn't be
> reserved for multiple times. We will reserve PE numbers according
> to the M64 BARs of PCI device in subsequent patches, which aren't
> aligned to M64 segment size finely. It means one particular PE
> could be reserved for multiple times.
> 
> The patch allows one PE to be reserved for multiple times and we
> print the warning message at debugging level.
> 
> Signed-off-by: Gavin Shan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e9dc4d7f72a375020ecb

cheers

Re: cxl: Destroy afu->contexts_idr on release of an afu

2015-07-16 Thread Michael Ellerman
On Thu, 2015-09-07 at 07:39:42 UTC, Johannes Thumshirn wrote:
> Destroy afu->contexts_idr on release of an afu, reclaiming the allocated
> memory.
> 
> Signed-off-by: Johannes Thumshirn 
> Acked-by: Ian Munsie 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bd664f892e3e2b01c791

cheers

Re: cxl: Destroy cxl_adapter_idr on module_exit

2015-07-16 Thread Michael Ellerman
On Wed, 2015-08-07 at 15:14:36 UTC, Johannes Thumshirn wrote:
> Destroy cxl_adapter_idr on module exit, reclaiming the allocated memory.
> 
> Signed-off-by: Johannes Thumshirn 
> Acked-by: Ian Munsie 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b2a02ac65e40fb3900d1

cheers

Re: [v9] powerpc/powernv: Add poweroff (EPOW, DPO) events support for PowerNV platform

2015-07-16 Thread Michael Ellerman
On Wed, 2015-08-07 at 11:06:01 UTC, Vipin K Parashar wrote:
> This patch adds support for OPAL EPOW (Environmental and Power Warnings)
> and DPO (Delayed Power Off) events for the PowerNV platform. These events
> are generated on FSP (Flexible Service Processor) based systems. EPOW
> events are generated due to various critical system conditions that
> require system shutdown. A few examples of these conditions are high
> ambient temperature or system running on UPS power with low UPS battery.
> DPO event is generated in response to admin initiated system shutdown
> request. Upon receipt of EPOW and DPO events the host kernel invokes
> orderly_poweroff() for performing graceful system shutdown.
> 
> Signed-off-by: Vipin K Parashar 
> Acked-by: Vaibhav Jain 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/3b476aadbc1409fef6be

cheers

Re: powerpc/powernv: Include VF PE in PELTV of PF PE

2015-07-16 Thread Michael Ellerman
On Mon, 2015-22-06 at 03:45:47 UTC, Gavin Shan wrote:
> The PELTV of PF PE should include VF PE, which is missed by current
> code, so that the VF PE is frozen automatically when freezing PF PE.
> The patch fixes the PELTV of PF PE to include VF PE.
> 
> Signed-off-by: Gavin Shan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/283e2d8a594bc902d0c8

cheers

Re: powerpc: Remove mtmsrd(), use existing mtmsr()

2015-07-16 Thread Michael Ellerman
On Tue, 2015-07-07 at 03:56:59 UTC, Anton Blanchard wrote:
> mtmsr() does the right thing on 32bit and 64bit, so use it everywhere.
> 
> Signed-off-by: Anton Blanchard 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1c53973172f84fafa8ad

cheers

Re: BUG: perf error on syscalls for powerpc64.

2015-07-16 Thread Michael Ellerman
On Thu, 2015-07-16 at 13:57 +0800, Zumeng Chen wrote:
> Hi All,
> 
> 1028ccf5 did a change for sys_call_table from a pointer to an array of
> unsigned long, I think it's not proper, here is my reason:
> 
> sys_call_table defined as a label in assembler should be pointer array
> rather than an array as described in 1028ccf5. If we defined it as an
> array, then arch_syscall_addr will return the address of sys_call_table[],
> actually the content of sys_call_table[] is demanded by arch_syscall_addr.
> so 'perf list' will ignore all syscalls since find_syscall_meta will
> return null in init_ftrace_syscalls because of the wrong arch_syscall_addr.
> 
> Did I miss something, or Gcc compiler has done something newer ?

Hi Zumeng,

It works for me with the code as it is in mainline.

I don't quite follow your explanation, so if you're seeing a bug please send
some information about what you're actually seeing. And include the disassembly
of arch_syscall_addr() and your compiler version etc.

cheers



Re: [PATCH v5 3/3] leds/powernv: Add driver for PowerNV platform

2015-07-16 Thread Michael Ellerman
On Thu, 2015-07-16 at 10:27 +0200, Jacek Anaszewski wrote:
> On 07/16/2015 08:54 AM, Vasant Hegde wrote:
> >>> +static enum led_brightness powernv_led_get(struct led_classdev *led_cdev)
> >>> +{
> >>> +char *loc_code;
> >>> +int rc, led_type;
> >>> +__be64 led_mask, led_value, max_led_type;
> >>> +
> >>> +led_type = powernv_get_led_type(led_cdev);
> >>> +if (led_type == -1)
> >>> +return LED_OFF;
> >>> +
> >>> +loc_code = powernv_get_location_code(led_cdev);
> >>> +if (!loc_code)
> >>> +return LED_OFF;
> >>> +
> >>> +/* Fetch all LED status */
> >>> +led_mask = cpu_to_be64(0);
> >>> +led_value = cpu_to_be64(0);
> >>> +max_led_type = cpu_to_be64(OPAL_SLOT_LED_TYPE_MAX);
> >>> +
> >>> +rc = opal_leds_get_ind(loc_code, &led_mask, &led_value, 
> >>> &max_led_type);
> >>> +if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) {
> >>> +dev_err(led_cdev->dev,
> >>> +"%s: OPAL get led call failed [rc=%d]\n",
> >>> +__func__, rc);
> >>> +goto led_fail;
> >>> +}
> >>> +
> >>> +led_mask = be64_to_cpu(led_mask);
> >>> +led_value = be64_to_cpu(led_value);
> >>
> >> be64_to_cpu result should be assigned to the variable of u64/s64 type.
> >
> > PowerNV platform is capable of running both big/little endian mode.. But
> > presently our firmware is big endian. These variable contains big endian 
> > values.
> > Hence I have created as __be64 .. (This is the convention we follow in other
> > places as well).
> 
> It is correct that the argument is of __be64 type, but be64_to_cpu
> returns u64 type, whereas you assign it to  __be64.

Yeah that's wrong. You are using led_mask etc as __be64 when you pass them to
firmware, which is correct, but then you're also using them as the lvalue of
be64_to_cpu() which returns a u64.

Sparse should warn you about that if you use it, please do.

$ apt-get install sparse
$ cd kernel
$ make C=2 CF=-D__CHECK_ENDIAN__


Whether the kernel or OPAL is running big or little endian is irrelevant to all
of this. The OPAL API defines that parameters to OPAL calls are big endian, and
that's all that matters:

  https://github.com/open-power/skiboot/blob/master/doc/opal-spec.txt#L142


Thanks for the review Jacek.

cheers




Re: [PATCH v5 3/3] leds/powernv: Add driver for PowerNV platform

2015-07-16 Thread Jacek Anaszewski

Hi Vasant,

On 07/16/2015 08:54 AM, Vasant Hegde wrote:

On 07/14/2015 02:30 PM, Jacek Anaszewski wrote:

Hi Vasant,


Jacek,



Thanks for the update. I think that we have still room
for improvements, please look at my comments below.


Thanks for the detailed review.


You're welcome.


.../...


@@ -0,0 +1,24 @@
+Device Tree binding for LEDs on IBM Power Systems
+-
+


Please start with:

-

Required properties:
- compatible : Should be "ibm,opal-v3-led".

Each location code of FRU/Enclosure must be expressed in the
form of a sub-node.

Required properties for the sub nodes:
- led-types : Supported LED types (attention/identify/fault) provided
   in the form of string array.

-

or something of this flavour. The example should be at the end.



Fixed.





+The 'leds' node under '/ibm,opal' lists service indicators available in the
+system and their capabilities.
+
+leds {
+compatible = "ibm,opal-v3-led";
+led-mode = "lightpath";


What about led-mode property? If it is generated by firmware I think,
that this should be mentioned somehow.


Yes.. Its generated by firmware. Added this property to documentation file.




+
+U78C9.001.RST0027-P1-C1 {
+led-types = "identify", "fault";
+};
+...
+...
+};
+
+Each node under 'leds' node describes location code of FRU/Enclosure.
+
+compatible : should be : "ibm,opal-v3-led".


Second colon was redundant here.



I have added as
-  compatible : "ibm,opal-v3-led".


Please retain "Should be :".




+
+The properties under each node:
+
+  led-types : Supported LED types (attention/identify/fault).
diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
index 4191614..4f56c7a 100644
--- a/drivers/leds/Kconfig
+++ b/drivers/leds/Kconfig
@@ -505,6 +505,17 @@ config LEDS_BLINKM
 This option enables support for the BlinkM RGB LED connected
 through I2C. Say Y to enable support for the BlinkM LED.

+config LEDS_POWERNV
+tristate "LED support for PowerNV Platform"
+depends on LEDS_CLASS
+depends on PPC_POWERNV
+depends on OF
+help
+  This option enables support for the system LEDs present on
+  PowerNV platforms. Say 'y' to enable this support in kernel.
+  To compile this driver as a module, choose 'm' here: the module
+  will be called leds-powernv.
+
   config LEDS_SYSCON
   bool "LED support for LEDs on system controllers"
   depends on LEDS_CLASS=y
diff --git a/drivers/leds/Makefile b/drivers/leds/Makefile
index bf46093..480814a 100644
--- a/drivers/leds/Makefile
+++ b/drivers/leds/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_LEDS_SYSCON)+= leds-syscon.o
   obj-$(CONFIG_LEDS_VERSATILE)+= leds-versatile.o
   obj-$(CONFIG_LEDS_MENF21BMC)+= leds-menf21bmc.o
   obj-$(CONFIG_LEDS_PM8941_WLED)+= leds-pm8941-wled.o
+obj-$(CONFIG_LEDS_POWERNV)+= leds-powernv.o

   # LED SPI Drivers
   obj-$(CONFIG_LEDS_DAC124S085)+= leds-dac124s085.o
diff --git a/drivers/leds/leds-powernv.c b/drivers/leds/leds-powernv.c
new file mode 100644
index 000..b5a307c
--- /dev/null
+++ b/drivers/leds/leds-powernv.c
@@ -0,0 +1,463 @@
+/*
+ * PowerNV LED Driver
+ *
+ * Copyright IBM Corp. 2015
+ *
+ * Author: Vasant Hegde 
+ * Author: Anshuman Khandual 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/*
+ * By default unload path resets all the LEDs. But on PowerNV platform
+ * we want to retain LED state across reboot as these are controlled by
+ * firmware. Also service processor can modify the LEDs independent of
+ * OS. Hence avoid resetting LEDs in unload path.
+ */
+static bool led_disabled;
+
+/* Map LED type to description. */
+struct led_type_map {
+	const int type;
+	const char *desc;
+};
+static const struct led_type_map led_type_map[] = {
+	{OPAL_SLOT_LED_TYPE_ID, POWERNV_LED_TYPE_IDENTIFY},
+	{OPAL_SLOT_LED_TYPE_FAULT, POWERNV_LED_TYPE_FAULT},
+	{OPAL_SLOT_LED_TYPE_ATTN, POWERNV_LED_TYPE_ATTENTION},
+	{-1, NULL},
+};
+
+/*
+ * LED set routines have been implemented as work queue tasks scheduled
+ * on the global work queue. Individual task calls OPAL interface to set
+ * the LED state which might sleep for some time.
+ */
+struct powernv_led_data {
+	struct led_classdev cdev;
+	enum led_brightness value; /* Brightness value */
+	struct mutex lock;
+	struct work_struct work_led; /* LED update workqueue */
+};
+
+struct powernv_leds_priv {
+int num_leds;
+struct powernv_led_data powernv_leds[];
+};
+
+
+static inline int sizeof_powernv_leds_priv(int num_leds)
+{
+return sizeof(struct powernv_le

[PATCH v5 6/6] cpufreq: powernv: Restore cpu frequency to policy->cur on unthrottling

2015-07-16 Thread Shilpasri G Bhat
If frequency is throttled due to OCC reset then cpus will be in Psafe
frequency, so restore the frequency on all cpus to policy->cur when
OCCs are active again. And if frequency is throttled due to Pmax
capping then restore the frequency of all the cpus in the chip on
unthrottling.

Signed-off-by: Shilpasri G Bhat 
Acked-by: Viresh Kumar 
---
No changes from v4

Changes from v3:
- Refer to the members of 'struct opal_occ_msg' in the patch.
  Replace 'reason' with 'omsg.throttle_status'

 drivers/cpufreq/powernv-cpufreq.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 90b4293..546e056 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -48,6 +48,7 @@ static struct chip {
bool throttled;
cpumask_t mask;
struct work_struct throttle;
+   bool restore;
 } *chips;
 
 static int nr_chips;
@@ -415,9 +416,29 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
 void powernv_cpufreq_work_fn(struct work_struct *work)
 {
struct chip *chip = container_of(work, struct chip, throttle);
+   unsigned int cpu;
+   cpumask_var_t mask;
 
smp_call_function_any(&chip->mask,
  powernv_cpufreq_throttle_check, NULL, 0);
+
+   if (!chip->restore)
+   return;
+
+   chip->restore = false;
+   cpumask_copy(mask, &chip->mask);
+   for_each_cpu_and(cpu, mask, cpu_online_mask) {
+   int index, tcpu;
+   struct cpufreq_policy policy;
+
+   cpufreq_get_policy(&policy, cpu);
+   cpufreq_frequency_table_target(&policy, policy.freq_table,
+  policy.cur,
+  CPUFREQ_RELATION_C, &index);
+   powernv_cpufreq_target_index(&policy, index);
+   for_each_cpu(tcpu, policy.cpus)
+   cpumask_clear_cpu(tcpu, mask);
+   }
 }
 
 static char throttle_reason[][30] = {
@@ -469,8 +490,10 @@ static int powernv_cpufreq_occ_msg(struct notifier_block 
*nb,
throttled = false;
pr_info("OCC: Active\n");
 
-   for (i = 0; i < nr_chips; i++)
+   for (i = 0; i < nr_chips; i++) {
+   chips[i].restore = true;
schedule_work(&chips[i].throttle);
+   }
 
return 0;
}
@@ -487,8 +510,11 @@ static int powernv_cpufreq_occ_msg(struct notifier_block 
*nb,
return 0;
 
for (i = 0; i < nr_chips; i++)
-   if (chips[i].id == omsg.chip)
+   if (chips[i].id == omsg.chip) {
+   if (!omsg.throttle_status)
+   chips[i].restore = true;
schedule_work(&chips[i].throttle);
+   }
}
return 0;
 }
@@ -542,6 +568,7 @@ static int init_chip_info(void)
chips[i].throttled = false;
cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
+   chips[i].restore = false;
}
 
return 0;
-- 
1.9.3
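
The restore loop in this patch visits each policy once by clearing that policy's cpus from a working mask. A small userspace model of that idea follows; it is a sketch under stated assumptions, not the driver code. Cpumasks are modelled as 64-bit words, THREADS_PER_POLICY and policy_cpus() are hypothetical, and the counter stands in for the call to powernv_cpufreq_target_index().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: number of cpus that share one cpufreq policy. */
#define THREADS_PER_POLICY 8

/* Mask of all cpus sharing a policy with `cpu` (illustrative layout). */
static uint64_t policy_cpus(unsigned int cpu)
{
	assert(cpu < 64);
	return 0xffULL << (cpu / THREADS_PER_POLICY * THREADS_PER_POLICY);
}

/*
 * Walk the online cpus of one chip and "restore" each policy exactly once.
 * Returns how many policies were restored.
 */
static unsigned int restore_chip(uint64_t chip_mask, uint64_t online_mask)
{
	uint64_t pending = chip_mask & online_mask;
	unsigned int cpu, restored = 0;

	for (cpu = 0; cpu < 64; cpu++) {
		if (!(pending & (1ULL << cpu)))
			continue;
		restored++;	/* stands in for setting policy->cur again */
		/* Drop every sibling so this policy is not visited twice. */
		pending &= ~policy_cpus(cpu);
	}
	return restored;
}
```

For a chip with 16 online cpus and 8 cpus per policy this performs two restores rather than sixteen.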


[PATCH v5 0/6] powernv: cpufreq: Report frequency throttle by OCC

2015-07-16 Thread Shilpasri G Bhat
This patchset adds a frequency throttle reporting mechanism to the
powernv-cpufreq driver for when the OCC throttles the frequency. The OCC
is an On-Chip-Controller which takes care of the power and thermal safety
of the chip. The CPU frequency can be throttled during an OCC reset or
when the OCC tries to limit the max allowed frequency. The patchset
reports such conditions so as to keep the user informed about the reason
for the drop in performance of workloads when the frequency is throttled.

Changes from v4:
- Taken care of Joel Stanley's comment, modification in patch[3].
  This replaces memcpy() with be64_to_cpu() and no change in 
  functionality of the patch

Changes from v3:
- Rebased on top of 4.2-rc1
- Minor changes in patch 2,3,4,6 this does not change the
  functionality of the code
- 594fcb9ec9e powerpc/powernv: Expose OPAL APIs required by PRD
  interface , this patch fixes the build error due to which this
  series was initially dropped
  ERROR: ".opal_message_notifier_register"
  [drivers/cpufreq/powernv-cpufreq.ko] undefined!

Changes from v2:
- Split into multiple patches
- Semantic fixes

Shilpasri G Bhat (6):
  cpufreq: powernv: Handle throttling due to Pmax capping at chip level
  powerpc/powernv: Add definition of OPAL_MSG_OCC message type
  cpufreq: powernv: Register for OCC related opal_message notification
  cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLE
  cpufreq: powernv: Report Psafe only if PMSR.psafe_mode_active bit is
set
  cpufreq: powernv: Restore cpu frequency to policy->cur on unthrottling

 arch/powerpc/include/asm/opal-api.h |  12 +++
 drivers/cpufreq/powernv-cpufreq.c   | 198 +---
 2 files changed, 195 insertions(+), 15 deletions(-)

-- 
1.9.3


[PATCH v5 2/6] powerpc/powernv: Add definition of OPAL_MSG_OCC message type

2015-07-16 Thread Shilpasri G Bhat
Add OPAL_MSG_OCC message definition to opal_message_type to receive
OCC events like reset, load and throttled. Host performance can be
affected when OCC is reset or OCC throttles the max Pstate.
We can register to opal_message_notifier to receive OPAL_MSG_OCC type
of message and report it to the userspace so as to keep the user
informed about the reason for a performance drop in workloads.

The reset and load OCC events are notified to kernel when FSP sends
OCC_RESET and OCC_LOAD commands.  Both reset and load messages are
sent to kernel on successful completion of reset and load operation
respectively.

The throttle OCC event indicates that the Pmax of the chip is reduced.
The chip_id and throttle reason for reducing Pmax is also queued along
with the message.

CC: Stewart Smith 
Signed-off-by: Shilpasri G Bhat 
Acked-by: Viresh Kumar 
---
No change from v4

Changes from v3:
- '0d7cd8550d3 powerpc/powernv: Add opal-prd channel' this patch adds
  the definition of OPAL_MSG_PRD, so remove it and update the
  changelog.
- Move the definitions of OCC_RESET, OCC_LOAD and OCC_THROTTLE from 
  drivers/cpufreq/powernv-cpufreq.c to arch/powerpc/include/asm/opal-api.h
- Define OCC_MAX_THROTTLE_STATUS 
- Add a wrapper structure 'opal_occ_msg' to copy 'struct opal_msg.params[0..2]'
  This structure will define the parameters received from firmware to
  maintain compatibility for any future additions.

No change from v2

Change from v1:
- Update the commit changelog

 arch/powerpc/include/asm/opal-api.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index e9e4c52..64dc9f5 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -361,6 +361,7 @@ enum opal_msg_type {
OPAL_MSG_HMI_EVT,
OPAL_MSG_DPO,
OPAL_MSG_PRD,
+   OPAL_MSG_OCC,
OPAL_MSG_TYPE_MAX,
 };
 
@@ -700,6 +701,17 @@ struct opal_prd_msg_header {
 
 struct opal_prd_msg;
 
+#define OCC_RESET   0
+#define OCC_LOAD1
+#define OCC_THROTTLE2
+#define OCC_MAX_THROTTLE_STATUS 5
+
+struct opal_occ_msg {
+   __be64 type;
+   __be64 chip;
+   __be64 throttle_status;
+};
+
 /*
  * SG entries
  *
-- 
1.9.3


[PATCH v5 1/6] cpufreq: powernv: Handle throttling due to Pmax capping at chip level

2015-07-16 Thread Shilpasri G Bhat
The On-Chip-Controller (OCC) can throttle cpu frequency by reducing the
max allowed frequency for that chip if the chip exceeds its power or
temperature limits. As Pmax capping is a chip-level condition, report
this throttling behavior at chip level. Also, do not set the global
'throttled' on Pmax capping; instead, set the per-chip throttled
variable. Report unthrottling if Pmax is restored after throttling.

This patch adds a structure to store chip id and throttled state of
the chip.

Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Preeti U Murthy 
Acked-by: Viresh Kumar 
---
No change from v4

 drivers/cpufreq/powernv-cpufreq.c | 59 ---
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index ebef0d8..d0c18c9 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -42,6 +43,13 @@
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled;
 
+static struct chip {
+	unsigned int id;
+	bool throttled;
+} *chips;
+
+static int nr_chips;
+
 /*
  * Note: The set of pstates consists of contiguous integers, the
  * smallest of which is indicated by powernv_pstate_info.min, the
@@ -301,22 +309,33 @@ static inline unsigned int get_nominal_index(void)
 static void powernv_cpufreq_throttle_check(unsigned int cpu)
 {
 	unsigned long pmsr;
-	int pmsr_pmax, pmsr_lp;
+	int pmsr_pmax, pmsr_lp, i;
 
 	pmsr = get_pmspr(SPRN_PMSR);
 
+	for (i = 0; i < nr_chips; i++)
+		if (chips[i].id == cpu_to_chip_id(cpu))
+			break;
+
 	/* Check for Pmax Capping */
 	pmsr_pmax = (s8)PMSR_MAX(pmsr);
 	if (pmsr_pmax != powernv_pstate_info.max) {
-		throttled = true;
-		pr_info("CPU %d Pmax is reduced to %d\n", cpu, pmsr_pmax);
-		pr_info("Max allowed Pstate is capped\n");
+		if (chips[i].throttled)
+			goto next;
+		chips[i].throttled = true;
+		pr_info("CPU %d on Chip %u has Pmax reduced to %d\n", cpu,
+			chips[i].id, pmsr_pmax);
+	} else if (chips[i].throttled) {
+		chips[i].throttled = false;
+		pr_info("CPU %d on Chip %u has Pmax restored to %d\n", cpu,
+			chips[i].id, pmsr_pmax);
 	}
 
 	/*
 	 * Check for Psafe by reading LocalPstate
 	 * or check if Psafe_mode_active is set in PMSR.
 	 */
+next:
 	pmsr_lp = (s8)PMSR_LP(pmsr);
 	if ((pmsr_lp < powernv_pstate_info.min) ||
 				(pmsr & PMSR_PSAFE_ENABLE)) {
@@ -414,6 +433,33 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 	.attr		= powernv_cpu_freq_attr,
 };
 
+static int init_chip_info(void)
+{
+	unsigned int chip[256];
+	unsigned int cpu, i;
+	unsigned int prev_chip_id = UINT_MAX;
+
+	for_each_possible_cpu(cpu) {
+		unsigned int id = cpu_to_chip_id(cpu);
+
+		if (prev_chip_id != id) {
+			prev_chip_id = id;
+			chip[nr_chips++] = id;
+		}
+	}
+
+	chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
+	if (!chips)
+		return -ENOMEM;
+
+	for (i = 0; i < nr_chips; i++) {
+		chips[i].id = chip[i];
+		chips[i].throttled = false;
+	}
+
+	return 0;
+}
+
 static int __init powernv_cpufreq_init(void)
 {
 	int rc = 0;
@@ -429,6 +475,11 @@ static int __init powernv_cpufreq_init(void)
 		return rc;
 	}
 
+	/* Populate chip info */
+	rc = init_chip_info();
+	if (rc)
+		return rc;
+
 	register_reboot_notifier(&powernv_cpufreq_reboot_nb);
 	return cpufreq_register_driver(&powernv_cpufreq_driver);
 }
-- 
1.9.3
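
As a reading aid, the per-chip bookkeeping described above can be modelled
in plain userspace C. This is only a sketch under stated assumptions: the
names chip_state, collect_chips(), find_chip() and update_pmax_state() are
hypothetical, the cpu-to-chip map is passed in as an array instead of
coming from cpu_to_chip_id(), and the lookup returns NULL for an unknown
chip id, which is a choice made for this sketch and not something the
patch does.

```c
#include <stdbool.h>
#include <stddef.h>

struct chip_state {
	unsigned int id;
	bool throttled;
};

/*
 * Collect unique chip ids from a cpu -> chip id map. Like the patch,
 * this assumes cpus of one chip are numbered contiguously, so comparing
 * with the previously stored id is enough. Stores at most 'max' chips
 * and returns how many were stored.
 */
static size_t collect_chips(const unsigned int *cpu_chip, size_t nr_cpus,
			    struct chip_state *chips, size_t max)
{
	size_t n = 0, cpu;

	for (cpu = 0; cpu < nr_cpus && n < max; cpu++) {
		if (n == 0 || chips[n - 1].id != cpu_chip[cpu]) {
			chips[n].id = cpu_chip[cpu];
			chips[n].throttled = false;
			n++;
		}
	}
	return n;
}

/* Return the entry for chip 'id', or NULL when the id is unknown. */
static struct chip_state *find_chip(struct chip_state *chips, size_t n,
				    unsigned int id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (chips[i].id == id)
			return &chips[i];
	return NULL;
}

/*
 * Update the per-chip flag from the observed Pmax.
 * Returns 1 on a new throttle event, -1 when Pmax is restored,
 * and 0 when the state did not change (nothing to report).
 */
static int update_pmax_state(struct chip_state *chip, int pmsr_pmax, int pmax)
{
	if (pmsr_pmax != pmax) {
		if (chip->throttled)
			return 0;
		chip->throttled = true;
		return 1;
	}
	if (chip->throttled) {
		chip->throttled = false;
		return -1;
	}
	return 0;
}
```

The return value of update_pmax_state() corresponds to the two pr_info()
calls in the patch: a message is printed only on a state transition, so
repeated checks of an already throttled chip stay silent.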


[PATCH v5 5/6] cpufreq: powernv: Report Psafe only if PMSR.psafe_mode_active bit is set

2015-07-16 Thread Shilpasri G Bhat
On an OCC reset cycle, the local pstate is not restored to Pmin or to
the last requested pstate even after the system leaves the safe
frequency state. If the cpufreq governor then initiates a pstate
change, the local pstate will still be at Psafe and we will report a
false positive although we are not throttled.

So in powernv_cpufreq_throttle_check(), remove the condition that
checks whether the local pstate is less than Pmin when checking for the
Psafe frequency. If the CPUs are forced to Psafe, the
PMSR.psafe_mode_active bit is set, and it is cleared when the OCCs
become active again. Rely on this bit alone for reporting throttling.

Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Preeti U Murthy 
Acked-by: Viresh Kumar 
---
No changes from v4

 drivers/cpufreq/powernv-cpufreq.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 22f33ff..90b4293 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -39,7 +39,6 @@
 #define PMSR_PSAFE_ENABLE   (1UL << 30)
 #define PMSR_SPR_EM_DISABLE (1UL << 31)
 #define PMSR_MAX(x)         ((x >> 32) & 0xFF)
-#define PMSR_LP(x)          ((x >> 48) & 0xFF)
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
@@ -313,7 +312,7 @@ static void powernv_cpufreq_throttle_check(void *data)
 {
 	unsigned int cpu = smp_processor_id();
 	unsigned long pmsr;
-	int pmsr_pmax, pmsr_lp, i;
+	int pmsr_pmax, i;
 
 	pmsr = get_pmspr(SPRN_PMSR);
 
@@ -335,14 +334,9 @@ static void powernv_cpufreq_throttle_check(void *data)
 			chips[i].id, pmsr_pmax);
 	}
 
-	/*
-	 * Check for Psafe by reading LocalPstate
-	 * or check if Psafe_mode_active is set in PMSR.
-	 */
+	/* Check if Psafe_mode_active is set in PMSR. */
 next:
-	pmsr_lp = (s8)PMSR_LP(pmsr);
-	if ((pmsr_lp < powernv_pstate_info.min) ||
-				(pmsr & PMSR_PSAFE_ENABLE)) {
+	if (pmsr & PMSR_PSAFE_ENABLE) {
 		throttled = true;
 		pr_info("Pstate set to safe frequency\n");
 	}
-- 
1.9.3
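
To make the bit-level reasoning concrete, here is a small userspace sketch
of the two PMSR fields the check relies on after this patch. The bit
positions are taken from the macros in the diff; the helper names
pmsr_pmax() and pmsr_psafe_active() are hypothetical, and uint64_t stands
in for the kernel's unsigned long on 64-bit powerpc.

```c
#include <stdbool.h>
#include <stdint.h>

#define PMSR_PSAFE_ENABLE	(1ULL << 30)
#define PMSR_MAX(x)		(((x) >> 32) & 0xFF)

/*
 * Pmax field of the PMSR as a signed value, mirroring the driver's
 * (s8) cast. Pstates can be negative, so the byte is sign-extended.
 */
static int pmsr_pmax(uint64_t pmsr)
{
	return (int8_t)PMSR_MAX(pmsr);
}

/* True only when the Psafe_mode_active bit (bit 30) is set. */
static bool pmsr_psafe_active(uint64_t pmsr)
{
	return (pmsr & PMSR_PSAFE_ENABLE) != 0;
}
```

With only the bit test left, a stale local pstate value in bits 48..55
no longer influences the result, which is the false positive the commit
message describes.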


[PATCH v5 4/6] cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLE

2015-07-16 Thread Shilpasri G Bhat
Re-evaluate the chip's throttled state on receiving an OCC_THROTTLE
notification by executing *throttle_check() on any one of the CPUs on
the chip. This is a sanity check to verify that we were indeed
throttled/unthrottled after receiving the OCC_THROTTLE notification.

We cannot call *throttle_check() directly from the notification
handler because we could be handling chip1's notification on chip2. So
initiate an smp_call to execute *throttle_check(). We are irq-disabled
in the notification handler, so use a worker thread to smp_call
throttle_check() on any of the CPUs in the chip's mask.

Signed-off-by: Shilpasri G Bhat 
Acked-by: Viresh Kumar 
---
No changes from v4

Changes from v3:
- Refer to the members of 'struct opal_occ_msg' in the patch.
  Replace 'chip_id' with 'omsg.chip'

 drivers/cpufreq/powernv-cpufreq.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index a634199..22f33ff 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -47,6 +47,8 @@ static bool rebooting, throttled, occ_reset;
 static struct chip {
 	unsigned int id;
 	bool throttled;
+	cpumask_t mask;
+	struct work_struct throttle;
 } *chips;
 
 static int nr_chips;
@@ -307,8 +309,9 @@ static inline unsigned int get_nominal_index(void)
 	return powernv_pstate_info.max - powernv_pstate_info.nominal;
 }
 
-static void powernv_cpufreq_throttle_check(unsigned int cpu)
+static void powernv_cpufreq_throttle_check(void *data)
 {
+	unsigned int cpu = smp_processor_id();
 	unsigned long pmsr;
 	int pmsr_pmax, pmsr_lp, i;
 
@@ -370,7 +373,7 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
 		return 0;
 
 	if (!throttled)
-		powernv_cpufreq_throttle_check(smp_processor_id());
+		powernv_cpufreq_throttle_check(NULL);
 
 	freq_data.pstate_id = powernv_freqs[new_index].driver_data;
 
@@ -415,6 +418,14 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
 	.notifier_call = powernv_cpufreq_reboot_notifier,
 };
 
+void powernv_cpufreq_work_fn(struct work_struct *work)
+{
+	struct chip *chip = container_of(work, struct chip, throttle);
+
+	smp_call_function_any(&chip->mask,
+			      powernv_cpufreq_throttle_check, NULL, 0);
+}
+
 static char throttle_reason[][30] = {
 	"No throttling",
 	"Power Cap",
@@ -429,6 +440,7 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 {
 	struct opal_msg *msg = _msg;
 	struct opal_occ_msg omsg;
+	int i;
 
 	if (msg_type != OPAL_MSG_OCC)
 		return 0;
@@ -462,6 +474,10 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 			occ_reset = false;
 			throttled = false;
 			pr_info("OCC: Active\n");
+
+			for (i = 0; i < nr_chips; i++)
+				schedule_work(&chips[i].throttle);
+
 			return 0;
 		}
 
@@ -473,6 +489,12 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 		else if (!omsg.throttle_status)
 			pr_info("OCC: Chip %u %s\n", (unsigned int)omsg.chip,
 				throttle_reason[omsg.throttle_status]);
+		else
+			return 0;
+
+		for (i = 0; i < nr_chips; i++)
+			if (chips[i].id == omsg.chip)
+				schedule_work(&chips[i].throttle);
 	}
 	return 0;
 }
@@ -524,6 +546,8 @@ static int init_chip_info(void)
 	for (i = 0; i < nr_chips; i++) {
 		chips[i].id = chip[i];
 		chips[i].throttled = false;
+		cpumask_copy(&chips[i].mask, cpumask_of_node(chip[i]));
+		INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
 	}
 
 	return 0;
-- 
1.9.3
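
The split between the notification handler and the deferred check can be
modelled in userspace roughly as follows. This is a sketch, not the
driver code: the 'pending' flag stands in for a queued work item, a
64-bit mask stands in for cpumask_t, and the names chip_work,
occ_throttle_notify(), pick_cpu() and run_pending() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OCC_MAX_THROTTLE_STATUS 5

struct chip_work {
	unsigned int id;
	uint64_t cpu_mask;	/* bit n set: cpu n belongs to this chip */
	bool pending;		/* stands in for a queued work item */
};

/*
 * Notification side: runs in a restricted context, so it only marks
 * the matching chip and returns. An out-of-range status queues nothing.
 * Returns the number of chips marked.
 */
static int occ_throttle_notify(struct chip_work *chips, size_t n,
			       unsigned int chip_id, uint64_t status)
{
	size_t i;
	int queued = 0;

	if (status > OCC_MAX_THROTTLE_STATUS)
		return 0;

	for (i = 0; i < n; i++) {
		if (chips[i].id == chip_id) {
			chips[i].pending = true;
			queued++;
		}
	}
	return queued;
}

/* Pick any cpu of the chip: here the lowest set bit, or -1 if none. */
static int pick_cpu(uint64_t mask)
{
	int cpu;

	for (cpu = 0; cpu < 64; cpu++)
		if (mask & (1ULL << cpu))
			return cpu;
	return -1;
}

/*
 * Worker side: for every pending chip, record the cpu on which the
 * throttle check would run, then clear the flag.
 * Returns the number of entries written to cpus_out.
 */
static int run_pending(struct chip_work *chips, size_t n, int *cpus_out)
{
	size_t i;
	int ran = 0;

	for (i = 0; i < n; i++) {
		if (!chips[i].pending)
			continue;
		cpus_out[ran++] = pick_cpu(chips[i].cpu_mask);
		chips[i].pending = false;
	}
	return ran;
}
```

The point of the two stages is that the check has to read the PMSR on a
CPU of the affected chip, which may not be the CPU that received the
notification.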


[PATCH v5 3/6] cpufreq: powernv: Register for OCC related opal_message notification

2015-07-16 Thread Shilpasri G Bhat
OCC is an On-Chip-Controller which takes care of the power and thermal
safety of the chip. At runtime, due to power failure or
overtemperature, the OCC may throttle the frequencies of the CPUs to
remain within the power budget.

We want the cpufreq driver to be aware of such situations so that it
can report the reason to the user. Register with opal_message_notifier
to receive OCC messages from OPAL.

powernv_cpufreq_throttle_check() reports any frequency throttling and
this patch will report the reason or event that caused throttling. We
can be throttled if OCC is reset or OCC limits Pmax due to power or
thermal reasons. We are also notified of unthrottling after an OCC
reset or if OCC restores Pmax on the chip.

Signed-off-by: Shilpasri G Bhat 
Acked-by: Viresh Kumar 
---
Changes from v4:
- Replace memcpy() with be64_to_cpu() to copy the msg->params[]

Changes from v3:
- Move the macro definitions of OCC_RESET, OCC_LOAD, OCC_THROTTLE to
  arch/powerpc/include/asm/opal-api.h
- Use 'struct opal_occ_msg' to copy the 'opal_msg->params[]' and refer
  to the members of this structure in the code; replace 'chip_id',
  'token' and 'reason' with omsg.chip, omsg.type, omsg.throttle_status
- Use OCC_MAX_THROTTLE_STATUS instead of the magic number.
- Add opal_message_notifier_unregister()

Changes from v2:
- Patch split into multiple patches.
- This patch contains only the opal_message notification handler

Changes from v1:
- Add macros to define OCC_RESET, OCC_LOAD and OCC_THROTTLE
- Define a structure to store chip id, chip mask which has bits set
  for cpus present in the chip, throttled state and a work_struct.
- Modify powernv_cpufreq_throttle_check() to be called via smp_call()
- On Pmax throttling/unthrottling update 'chip.throttled' and not the
  global 'throttled' as Pmax capping is local to the chip.
- Remove the condition which checks if local pstate is less than Pmin
  while checking for Psafe frequency. When OCC becomes active after
  reset we update 'throttled' to false, and when the cpufreq governor
  initiates a pstate change, the local pstate will be in Psafe and we
  would report a false positive although we are not throttled.
- Schedule a kworker on receiving throttling/unthrottling OCC message
  for that chip and schedule on all chips after receiving active.
- After an OCC reset all the cpus will be in Psafe frequency. So call
  target() and restore the frequency to policy->cur after OCC_ACTIVE
  and Pmax unthrottling
- Address comments from Viresh and Preeti.

 drivers/cpufreq/powernv-cpufreq.c | 74 ++-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index d0c18c9..a634199 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include  /* Required for cpu_sibling_mask() in UP configs */
+#include 
 
 #define POWERNV_MAX_PSTATES 256
 #define PMSR_PSAFE_ENABLE  (1UL << 30)
@@ -41,7 +42,7 @@
 #define PMSR_LP(x) ((x >> 48) & 0xFF)
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
-static bool rebooting, throttled;
+static bool rebooting, throttled, occ_reset;
 
 static struct chip {
 	unsigned int id;
@@ -414,6 +415,74 @@ static struct notifier_block powernv_cpufreq_reboot_nb = {
 	.notifier_call = powernv_cpufreq_reboot_notifier,
 };
 
+static char throttle_reason[][30] = {
+	"No throttling",
+	"Power Cap",
+	"Processor Over Temperature",
+	"Power Supply Failure",
+	"Over Current",
+	"OCC Reset"
+};
+
+static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
+				   unsigned long msg_type, void *_msg)
+{
+	struct opal_msg *msg = _msg;
+	struct opal_occ_msg omsg;
+
+	if (msg_type != OPAL_MSG_OCC)
+		return 0;
+
+	omsg.type = be64_to_cpu(msg->params[0]);
+
+	switch (omsg.type) {
+	case OCC_RESET:
+		occ_reset = true;
+		/*
+		 * powernv_cpufreq_throttle_check() is called in
+		 * target() callback which can detect the throttle state
+		 * for governors like ondemand.
+		 * But static governors will not call target() often thus
+		 * report throttling here.
+		 */
+		if (!throttled) {
+			throttled = true;
+			pr_crit("CPU Frequency is throttled\n");
+		}
+		pr_info("OCC: Reset\n");
+		break;
+	case OCC_LOAD:
+		pr_info("OCC: Loaded\n");
+		break;
+	case OCC_THROTTLE:
+   o

BUG: perf error on syscalls for powerpc64.

2015-07-16 Thread Zumeng Chen
Hi All,

Commit 1028ccf5 changed sys_call_table from a pointer to an array of
unsigned long. I think that is not correct; here is my reasoning:

sys_call_table is defined as a label in assembler, so it should be a
pointer array rather than an array as described in 1028ccf5. If we
define it as an array, arch_syscall_addr() returns the address of
sys_call_table[], whereas what arch_syscall_addr() needs is the content
of sys_call_table[]. As a result, 'perf list' ignores all syscalls,
because find_syscall_meta() returns NULL in init_ftrace_syscalls() due
to the wrong arch_syscall_addr().

Did I miss something, or has a newer GCC changed something?

Cheers,
Zumeng
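
To make the distinction in the report concrete, the following userspace
sketch contrasts the content of a table slot with the address of the slot
itself. It uses a stand-in table with made-up values; the names
demo_call_table, slot_content() and slot_address() are hypothetical, and
the sketch makes no statement about what the kernel code does after
1028ccf5.

```c
#include <stdint.h>

/* Stand-in table: each slot holds the address of a handler. */
static const uintptr_t demo_call_table[] = { 0x1000, 0x2000, 0x3000 };

/* Content of slot nr: the handler address stored in the table. */
static uintptr_t slot_content(int nr)
{
	return demo_call_table[nr];
}

/* Address of slot nr itself: where the table entry lives in memory. */
static uintptr_t slot_address(int nr)
{
	return (uintptr_t)&demo_call_table[nr];
}
```

A lookup that matches handler addresses against symbol information needs
the first value; the second only identifies a location inside the table.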