Re: [PATCH] powerpc: use device_online/offline() instead of cpu_up/down()

2014-11-02 Thread Bharata B Rao
On Fri, Oct 31, 2014 at 03:41:34PM -0400, Dan Streetman wrote:
> In powerpc pseries platform dlpar operations, use device_online() and
> device_offline() instead of cpu_up() and cpu_down().
> 
> Calling cpu_up/down directly does not update the cpu device offline
> field, which is used to online/offline a cpu from sysfs.  Calling
> device_online/offline instead keeps the sysfs cpu online value correct.
> The hotplug lock, which is required to be held when calling
> device_online/offline, is already held when dlpar_online/offline_cpu
> are called, since they are called only from cpu_probe|release_store.
> 
> This patch fixes errors on PowerVM systems that have cpu(s) added/removed
> using dlpar operations; without this patch, the
> /sys/devices/system/cpu/cpuN/online nodes do not correctly show the
> online state of added/removed cpus.

Verified that the patch works as expected when onlining and offlining
CPUs of a PowerKVM guest using QEMU (plus my RFC hotplug patchset for
QEMU).

Regards,
Bharata.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH V2] powerpc/TM: Disable/Enable TM looking at the ibm,pa-features device tree entry

2014-11-02 Thread Aneesh Kumar K.V
Disable the transactional memory (TM) feature at runtime based on the
pa-features device tree entry. We need this so that a kernel built with
TM support can run in PR mode. For a PR guest we provide a device tree
entry with the TM feature disabled in pa-features.

Signed-off-by: Aneesh Kumar K.V 
---
Changes from V1:
* rebase to latest linus

 arch/powerpc/kernel/prom.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 099f27e6d1b0..3e22930f15d1 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -160,6 +160,11 @@ static struct ibm_pa_feature {
{CPU_FTR_NODSISRALIGN, 0, 0,1, 1, 1},
{0, MMU_FTR_CI_LARGE_PAGE, 0,   1, 2, 0},
{CPU_FTR_REAL_LE, PPC_FEATURE_TRUE_LE, 5, 0, 0},
+   /*
+* We should use CPU_FTR_TM_COMP so that if we disable TM, it won't get
+* enabled via device tree
+*/
+   {CPU_FTR_TM_COMP, 0, 0, 22, 0, 0},
 };
 
 static void __init scan_features(unsigned long node, const unsigned char *ftrs,
-- 
2.1.0
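
For readers unfamiliar with the feature table, here is a rough userspace
sketch of how an entry such as {CPU_FTR_TM_COMP, 0, 0, 22, 0, 0} could select
one bit of the pa-features property. It assumes that the last three fields are
a byte index, a bit index (bit 0 being the most significant bit) and an invert
flag, and it ignores any header bytes the real property carries. The struct
and function names are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of one feature-table entry. */
struct pa_feature_sketch {
	unsigned long cpu_feature;	/* feature flag controlled by this bit */
	uint8_t pabyte;			/* byte index into the attribute bytes */
	uint8_t pabit;			/* bit index, 0 = most significant bit */
	uint8_t invert;			/* 1 if a set bit means "feature absent" */
};

/* Return the feature mask after applying one table entry to it. */
unsigned long apply_pa_feature(unsigned long features,
			       const uint8_t *attrs, size_t nattrs,
			       const struct pa_feature_sketch *fp)
{
	unsigned int bit;

	/* Assumption: a property that is too short leaves the mask untouched. */
	if (fp->pabyte >= nattrs)
		return features;

	bit = (attrs[fp->pabyte] >> (7 - fp->pabit)) & 1;
	if (bit ^ fp->invert)
		return features | fp->cpu_feature;

	return features & ~fp->cpu_feature;
}
```

Under these assumptions, a device tree whose byte 22 has bit 0 cleared would
clear the TM flag even when the kernel was built with TM support, which is the
behaviour the patch description asks for.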


[PATCH V9 00/18] Enable SRIOV on PowerNV

2014-11-02 Thread Wei Yang
This patchset enables SRIOV on POWER8.

The general idea is to put each VF into its own PE and allocate the required
resources like MMIO/DMA/MSI. The major difficulty comes from the MMIO
allocation and adjustment for the PF's IOV BAR.

On P8, we use M64BT to cover a PF's IOV BAR, which lets each individual VF
sit in its own PE. This gives more flexibility, but at the same time it
imposes some restrictions on the PF's IOV BAR size and alignment.

To achieve this, we need to adjust the PCI devices' resources:
1. Expand the IOV BAR properly.
   Done by pnv_pci_ioda_fixup_iov_resources().
2. Shift the IOV BAR properly.
   Done by pnv_pci_vf_resource_shift().
3. IOV BAR alignment is calculated by arch dependent function instead of an
   individual VF BAR size.
   Done by pnv_pcibios_sriov_resource_alignment().
4. Take the IOV BAR alignment into consideration in the sizing and assigning.
   This is achieved by commit: "PCI: Take additional IOV BAR alignment in
   sizing and assigning"

Test Environment:
   The SRIOV device tested is Emulex Lancer(10df:e220) and
   Mellanox ConnectX-3(15b3:1003) on POWER8.

Example of passing a VF through to a guest via vfio:
1. unbind the original driver and bind to vfio-pci driver
   echo :06:0d.0 > /sys/bus/pci/devices/:06:0d.0/driver/unbind
   echo  1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
   Note: this should be done for each device in the same iommu_group
2. Start qemu and pass device through vfio
   /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \
   -M pseries -m 2048 -enable-kvm -nographic \
   -drive file=/home/ywywyang/kvm/fc19.img \
   -monitor telnet:localhost:5435,server,nowait -boot cd \
   -device "spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6"

Verify that the response comes from this exact VF:
1. ping from a machine in the same subnet(the broadcast domain)
2. run arp -n on this machine
   9.115.251.20 ether   00:00:c9:df:ed:bf   C eth0
3. ifconfig in the guest
   # ifconfig eth1
   eth1: flags=4163  mtu 1500
inet 9.115.251.20  netmask 255.255.255.0  broadcast 9.115.251.255
inet6 fe80::200:c9ff:fedf:edbf  prefixlen 64  scopeid 0x20
ether 00:00:c9:df:ed:bf  txqueuelen 1000 (Ethernet)
RX packets 175  bytes 13278 (12.9 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 58  bytes 9276 (9.0 KiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
4. They have the same MAC address

Note: make sure you shut down the other network interfaces in the guest.

---
v9:
   * make the change log consistent in the terminology
 PF's IOV BAR -> the SRIOV BAR in PF
 VF's BAR -> the normal BAR in VF's view
   * rename all newly introduced functions from _sriov_ to _iov_
   * rename the document to
 Documentation/powerpc/pci_iov_resource_on_powernv.txt
   * add the vendor id and device id of the tested devices
   * change return value from EINVAL to ENOSYS for pci_iov_virtfn_bus() and
 pci_iov_virtfn_devfn() when it is called on PF or SRIOV is not configured
   * rebase on 3.18-rc2 and tested
v8:
   * use weak function pcibios_sriov_resource_size() instead of some flag to
 retrieve the IOV BAR size.
   * add a document Documentation/powerpc/pci_resource.txt to explain the
 design.
   * make pci_iov_virtfn_bus()/pci_iov_virtfn_devfn() not inline.
   * extract a function res_to_dev_res(), so that it is more general to get
 additional size and alignment
   * fix one contention which is introduced in "powerpc/pci: Refactor pci_dn".
 the root cause is that pci_get_slot() takes pci_bus_sem, which leads to a
 deadlock.
v7:
   * add IORESOURCE_ARCH flag for IOV BAR on powernv platform.
   * when IOV BAR has IORESOURCE_ARCH flag, the size is retrieved from
 hardware directly. If not, calculate as usual.
   * reorder the patch set, group them by subsystem:
 PCI, powerpc, powernv
   * rebase it on 3.16-rc6
v6:
   * remove pcibios_enable_sriov()/pcibios_disable_sriov() weak function
 similar function is moved to
 pnv_pci_enable_device_hook()/pnv_pci_disable_device_hook(). When PF is
 enabled, platform will try best to allocate resources for VFs.
   * remove pcibios_sriov_resource_size weak function
   * VF BAR size is retrieved from hardware directly in virtfn_add()
v5:
   * merge those SRIOV related platform functions in machdep_calls
 wrap them in one CONFIG_PCI_IOV macro
   * define IODA_INVALID_M64 to replace (-1)
 use this value to represent the m64_wins is not used
   * rename pnv_pci_release_dev_dma() to pnv_pci_ioda2_release_dma_pe()
 this function is a counterpart to pnv_pci_ioda2_setup_dma_pe()
   * change dev_info() to dev_dbg() in pnv_pci_ioda_fixup_iov_resources()
 redu

[PATCH V9 01/18] PCI/IOV: Export interface for retrieve VF's BDF

2014-11-02 Thread Wei Yang
When implementing SR-IOV on the PowerNV platform, some resources must be
reserved for VFs that do not exist yet at the bootup stage. To match
resources to VFs, the code needs to get each VF's BDF in advance.

This patch exports the interfaces to retrieve a VF's BDF:
   * Make virtfn_bus() an exported interface
   * Make virtfn_devfn() an exported interface
   * Rename them with more specific names
   * Code cleanup in pci_sriov_resource_alignment()

Signed-off-by: Wei Yang 
---
 drivers/pci/iov.c   |   22 +-
 include/linux/pci.h |   11 +++
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 4d109c0..5e8091b 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -19,14 +19,18 @@
 
 #define VIRTFN_ID_LEN  16
 
-static inline u8 virtfn_bus(struct pci_dev *dev, int id)
+int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
 {
+   if (!dev->is_physfn)
+   return -EINVAL;
return dev->bus->number + ((dev->devfn + dev->sriov->offset +
dev->sriov->stride * id) >> 8);
 }
 
-static inline u8 virtfn_devfn(struct pci_dev *dev, int id)
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
 {
+   if (!dev->is_physfn)
+   return -EINVAL;
return (dev->devfn + dev->sriov->offset +
dev->sriov->stride * id) & 0xff;
 }
@@ -69,7 +73,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
struct pci_bus *bus;
 
mutex_lock(&iov->dev->sriov->lock);
-   bus = virtfn_add_bus(dev->bus, virtfn_bus(dev, id));
+   bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
if (!bus)
goto failed;
 
@@ -77,7 +81,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
if (!virtfn)
goto failed0;
 
-   virtfn->devfn = virtfn_devfn(dev, id);
+   virtfn->devfn = pci_iov_virtfn_devfn(dev, id);
virtfn->vendor = dev->vendor;
pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_DID, &virtfn->device);
pci_setup_device(virtfn);
@@ -140,8 +144,8 @@ static void virtfn_remove(struct pci_dev *dev, int id, int reset)
struct pci_sriov *iov = dev->sriov;
 
virtfn = pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
-virtfn_bus(dev, id),
-virtfn_devfn(dev, id));
+pci_iov_virtfn_bus(dev, id),
+pci_iov_virtfn_devfn(dev, id));
if (!virtfn)
return;
 
@@ -216,7 +220,7 @@ static int sriov_enable(struct pci_dev *dev, int nr_virtfn)
iov->offset = offset;
iov->stride = stride;
 
-   if (virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) {
+   if (pci_iov_virtfn_bus(dev, nr_virtfn - 1) > dev->bus->busn_res.end) {
dev_err(&dev->dev, "SR-IOV: bus number out of range\n");
return -ENOMEM;
}
@@ -516,7 +520,7 @@ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
if (!reg)
return 0;
 
-__pci_read_base(dev, type, &tmp, reg);
+   __pci_read_base(dev, type, &tmp, reg);
return resource_alignment(&tmp);
 }
 
@@ -546,7 +550,7 @@ int pci_iov_bus_range(struct pci_bus *bus)
list_for_each_entry(dev, &bus->devices, bus_list) {
if (!dev->is_physfn)
continue;
-   busnr = virtfn_bus(dev, dev->sriov->total_VFs - 1);
+   busnr = pci_iov_virtfn_bus(dev, dev->sriov->total_VFs - 1);
if (busnr > max)
max = busnr;
}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5be8db4..3ed7c66 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1654,6 +1654,9 @@ int pci_ext_cfg_avail(void);
 void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
 
 #ifdef CONFIG_PCI_IOV
+int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
+int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
+
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
 int pci_num_vf(struct pci_dev *dev);
@@ -1661,6 +1664,14 @@ int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 #else
+static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
+{
+   return -ENOSYS;
+}
+static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
+{
+   return -ENOSYS;
+}
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }
 static inline void pci_disable_sriov(struct pci_dev *dev) { }
-- 
1.7.9.5
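
The bus/devfn arithmetic in pci_iov_virtfn_bus() and pci_iov_virtfn_devfn()
can be tried out in userspace. The sketch below is only an illustration of the
same formula: the struct and function names are hypothetical stand-ins for the
pci_dev fields used in the patch, and the sample numbers in the note that
follows are made up.

```c
#include <stdint.h>

/* Illustrative stand-in for the PF fields used by the calculation. */
struct pf_sketch {
	uint8_t  bus;		/* bus number the PF sits on */
	uint8_t  devfn;		/* PF device/function number */
	uint16_t offset;	/* first VF offset, as in dev->sriov->offset */
	uint16_t stride;	/* VF stride, as in dev->sriov->stride */
};

/* Distance of VF number 'id' (0-based) from devfn 0 on the PF's bus. */
static unsigned int vf_rid(const struct pf_sketch *pf, int id)
{
	return pf->devfn + pf->offset + pf->stride * id;
}

/* Bus number of the VF: every 256 devfn values spill onto the next bus. */
int vf_bus(const struct pf_sketch *pf, int id)
{
	return pf->bus + (vf_rid(pf, id) >> 8);
}

/* Device/function number of the VF within that bus. */
int vf_devfn(const struct pf_sketch *pf, int id)
{
	return vf_rid(pf, id) & 0xff;
}
```

With a PF at bus 6, devfn 0, offset 0x80 and stride 2, VF 0 lands on bus 6
devfn 0x80, while VF 64 crosses over to bus 7 devfn 0. That spill-over is why
sriov_enable() checks the result against busn_res.end.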


[PATCH V9 02/18] PCI: Add weak pcibios_iov_resource_alignment() interface

2014-11-02 Thread Wei Yang
The alignment of a PF's IOV BAR is designed to be the size of an individual
VF's BAR. This works fine for many platforms, but the PowerNV platform needs
something different.

The original alignment works because, at the sizing and assigning stage, the
requirement comes from an individual VF's BAR size rather than from the PF's
IOV BAR. This is why the original code simply retrieves the individual VF BAR
size as the alignment.

On the PowerNV platform, the whole PF IOV BAR must be aligned to a hardware
segment. For this reason, the alignment of the PF's IOV BAR should be
calculated separately.

This patch introduces a weak pcibios_iov_resource_alignment() interface, which
gives a platform the chance to implement its own method of calculating the
PF's IOV BAR alignment.

Signed-off-by: Wei Yang 
---
 drivers/pci/iov.c   |   11 ++-
 include/linux/pci.h |3 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 5e8091b..4d1685d 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -501,6 +501,12 @@ int pci_iov_resource_bar(struct pci_dev *dev, int resno,
4 * (resno - PCI_IOV_RESOURCES);
 }
 
+resource_size_t __weak pcibios_iov_resource_alignment(struct pci_dev *dev,
+   int resno, resource_size_t align)
+{
+   return align;
+}
+
 /**
  * pci_sriov_resource_alignment - get resource alignment for VF BAR
  * @dev: the PCI device
@@ -515,13 +521,16 @@ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
 {
struct resource tmp;
enum pci_bar_type type;
+   resource_size_t align;
int reg = pci_iov_resource_bar(dev, resno, &type);
 
if (!reg)
return 0;
 
__pci_read_base(dev, type, &tmp, reg);
-   return resource_alignment(&tmp);
+   align = resource_alignment(&tmp);
+
+   return pcibios_iov_resource_alignment(dev, resno, align);
 }
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3ed7c66..bbf8058 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1159,6 +1159,9 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 unsigned long type);
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev,
+int resno,
+resource_size_t align);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
-- 
1.7.9.5
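
The weak-default pattern used here can be sketched outside the kernel as
follows. This is a minimal illustration that assumes a GCC or Clang toolchain
(for the weak attribute). The names arch_iov_alignment() and iov_alignment()
are hypothetical and only mirror the shape of the patch.

```c
#include <stdint.h>

typedef uint64_t resource_size_sketch_t;

/*
 * Default hook: keep the alignment the generic code computed.
 * A platform object file may provide a non-weak definition with the
 * same name, and the linker will then pick that one instead.
 */
__attribute__((weak))
resource_size_sketch_t arch_iov_alignment(int resno,
					  resource_size_sketch_t align)
{
	(void)resno;
	return align;
}

/* Generic path: compute the usual alignment, then let the platform adjust it. */
resource_size_sketch_t iov_alignment(int resno,
				     resource_size_sketch_t vf_bar_size)
{
	resource_size_sketch_t align = vf_bar_size;	/* one VF BAR, as before */

	return arch_iov_alignment(resno, align);
}
```

Because the default simply returns its argument, platforms that do not
override the hook should see no change in behaviour. Only a platform that
links in its own strong definition gets a different alignment.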


[PATCH V9 03/18] PCI: Add weak pcibios_iov_resource_size() interface

2014-11-02 Thread Wei Yang
When retrieving a VF's BAR in virtfn_add(), the code divides the PF's total IOV
BAR size by the total number of VFs. This is correct in most cases, but may not
be on some platforms.

For example, on the PowerNV platform, the PF's IOV BAR size is expanded in
order to fit the PF's IOV BAR to a hardware alignment. This means the original
method no longer works.

This patch introduces a weak pcibios_iov_resource_size() interface, which
gives a platform the chance to implement its own method of calculating the VF
BAR resource size.

Signed-off-by: Wei Yang 
---
 drivers/pci/iov.c   |   27 +--
 include/linux/pci.h |5 +
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 4d1685d..6866830 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -61,6 +61,30 @@ static void virtfn_remove_bus(struct pci_bus *physbus, struct pci_bus *virtbus)
pci_remove_bus(virtbus);
 }
 
+resource_size_t __weak pcibios_iov_resource_size(struct pci_dev *dev, int resno)
+{
+   return 0;
+}
+
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
+{
+   resource_size_t size;
+   struct pci_sriov *iov;
+
+   if (!dev->is_physfn)
+   return 0;
+
+   size = pcibios_iov_resource_size(dev, resno);
+   if (size != 0)
+   return size;
+
+   iov = dev->sriov;
+   size = resource_size(dev->resource + resno);
+   do_div(size, iov->total_VFs);
+
+   return size;
+}
+
 static int virtfn_add(struct pci_dev *dev, int id, int reset)
 {
int i;
@@ -96,8 +120,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
continue;
virtfn->resource[i].name = pci_name(virtfn);
virtfn->resource[i].flags = res->flags;
-   size = resource_size(res);
-   do_div(size, iov->total_VFs);
+   size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
virtfn->resource[i].start = res->start + size * id;
virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
rc = request_resource(res, &virtfn->resource[i]);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index bbf8058..2f5b454 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1162,6 +1162,8 @@ resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 resource_size_t pcibios_iov_resource_alignment(struct pci_dev *dev,
 int resno,
 resource_size_t align);
+resource_size_t pcibios_iov_resource_size(struct pci_dev *dev,
+   int resno);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
@@ -1666,6 +1668,7 @@ int pci_num_vf(struct pci_dev *dev);
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
+resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 #else
 static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
 {
@@ -1685,6 +1688,8 @@ static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
 { return 0; }
 static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
 { return 0; }
+static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
+{ return 0; }
 #endif
 
 #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
-- 
1.7.9.5
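
The fallback logic in pci_iov_resource_size() (ask the platform first, divide
only if it answers 0) can be sketched in plain C. To keep the example in one
file it uses a function pointer where the patch uses a weak function. The
names and the 1 MiB example hook are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform hook; returning 0 means "no platform-specific size". */
typedef uint64_t (*iov_size_hook_t)(int resno);

/* Example hook for a platform whose VF BAR is known to be 1 MiB. */
uint64_t fixed_1m_hook(int resno)
{
	(void)resno;
	return 1ULL << 20;
}

/* Per-VF BAR size: platform answer if there is one, else total / total_vfs. */
uint64_t vf_bar_size(uint64_t iov_bar_size, unsigned int total_vfs,
		     iov_size_hook_t hook, int resno)
{
	uint64_t size = hook ? hook(resno) : 0;

	if (size != 0)
		return size;		/* platform knows the real per-VF size */
	if (total_vfs == 0)
		return 0;		/* nothing to divide by */

	return iov_bar_size / total_vfs;
}
```

For example, if a 16-VF device had its IOV BAR expanded to 256 MiB, plain
division would report 16 MiB per VF, whereas the hook can still report the
real 1 MiB size. That mismatch is the case the commit message describes.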


[PATCH V9 04/18] PCI: Take additional PF's IOV BAR alignment in sizing and assigning

2014-11-02 Thread Wei Yang
At the resource sizing/assigning stage, resources are divided into two lists,
the requested list and the additional list, but the alignment of the additional
IOV BAR is not taken into account in the sizing and assigning procedure.

This is reasonable in the original implementation, since the IOV BAR's
alignment is usually the same as the PF BAR's alignment, so it has already been
taken into consideration. This rule may be violated on some platforms, e.g. the
PowerNV platform.

This patch takes the additional IOV BAR alignment into account explicitly in
the sizing and assigning stage. When system MMIO space is not sufficient, the
PF's IOV BAR alignment will not contribute to the bridge. When system MMIO
space is sufficient, the additional alignment will contribute to the bridge.

It also takes advantage of pci_dev_resource::min_align to store this additional
alignment.

Signed-off-by: Wei Yang 
---
 drivers/pci/setup-bus.c |   85 +++
 1 file changed, 71 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 0482235..05c7df0 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -99,8 +99,8 @@ static void remove_from_list(struct list_head *head,
}
 }
 
-static resource_size_t get_res_add_size(struct list_head *head,
-   struct resource *res)
+static struct pci_dev_resource *res_to_dev_res(struct list_head *head,
+  struct resource *res)
 {
struct pci_dev_resource *dev_res;
 
@@ -109,17 +109,37 @@ static resource_size_t get_res_add_size(struct list_head *head,
int idx = res - &dev_res->dev->resource[0];
 
dev_printk(KERN_DEBUG, &dev_res->dev->dev,
-"res[%d]=%pR get_res_add_size add_size %llx\n",
+"res[%d]=%pR res_to_dev_res add_size %llx min_align %llx\n",
 idx, dev_res->res,
-(unsigned long long)dev_res->add_size);
+(unsigned long long)dev_res->add_size,
+(unsigned long long)dev_res->min_align);
 
-   return dev_res->add_size;
+   return dev_res;
}
}
 
-   return 0;
+   return NULL;
+}
+
+static resource_size_t get_res_add_size(struct list_head *head,
+   struct resource *res)
+{
+   struct pci_dev_resource *dev_res;
+
+   dev_res = res_to_dev_res(head, res);
+   return dev_res ? dev_res->add_size : 0;
+}
+
+static resource_size_t get_res_add_align(struct list_head *head,
+   struct resource *res)
+{
+   struct pci_dev_resource *dev_res;
+
+   dev_res = res_to_dev_res(head, res);
+   return dev_res ? dev_res->min_align : 0;
 }
 
+
 /* Sort resources by alignment */
 static void pdev_sort_resources(struct pci_dev *dev, struct list_head *head)
 {
@@ -368,8 +388,9 @@ static void __assign_resources_sorted(struct list_head *head,
LIST_HEAD(save_head);
LIST_HEAD(local_fail_head);
struct pci_dev_resource *save_res;
-   struct pci_dev_resource *dev_res, *tmp_res;
+   struct pci_dev_resource *dev_res, *tmp_res, *dev_res2;
unsigned long fail_type;
+   resource_size_t add_align, align;
 
/* Check if optional add_size is there */
if (!realloc_head || list_empty(realloc_head))
@@ -384,10 +405,38 @@ static void __assign_resources_sorted(struct list_head *head,
}
 
/* Update res in head list with add_size in realloc_head list */
-   list_for_each_entry(dev_res, head, list)
+   list_for_each_entry_safe(dev_res, tmp_res, head, list) {
dev_res->res->end += get_res_add_size(realloc_head,
dev_res->res);
 
+   /* 
+* There are two kinds additional resources in the list:
+* 1. bridge resource  -- IORESOURCE_STARTALIGN
+* 2. SRIOV resource   -- IORESOURCE_SIZEALIGN
+* Here just fix the additional alignment for bridge
+*/
+   if (!(dev_res->res->flags & IORESOURCE_STARTALIGN))
+   continue;
+
+   add_align = get_res_add_align(realloc_head, dev_res->res);
+
+   /* Reorder the list by their alignment */
+   if (add_align > dev_res->res->start) {
+   dev_res->res->start = add_align;
+   dev_res->res->end = add_align +
+   resource_size(dev_res->res);
+
+   list_for_each_entry(dev_res2, head, list) {
+   align = pci_resource_alignment(dev_res2->dev,
+  dev_res2->res);
+   if (ad

[PATCH V9 05/18] powerpc/pci: Add PCI resource alignment documentation

2014-11-02 Thread Wei Yang
In order to enable SRIOV on the PowerNV platform, the PF's IOV BAR needs to be
adjusted:
1. size expanded
2. aligned to M64BT size

This patch documents why this change is needed and how it is done.

Signed-off-by: Wei Yang 
---
 .../powerpc/pci_iov_resource_on_powernv.txt|   75 
 1 file changed, 75 insertions(+)
 create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt

diff --git a/Documentation/powerpc/pci_iov_resource_on_powernv.txt b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
new file mode 100644
index 000..8b3f346
--- /dev/null
+++ b/Documentation/powerpc/pci_iov_resource_on_powernv.txt
@@ -0,0 +1,75 @@
+Wei Yang 
+26 Aug 2014
+
+This document describes the hardware requirements for PCI MMIO resource
+sizing and assignment on the PowerNV platform and how generic PCI code handles
+these requirements.
+
+1. Hardware requirement on PowerNV platform
+On the PowerNV platform, IODA2 version, there are 16 M64 BARs, which are used
+to map MMIO ranges to PE#. Each M64 BAR covers one MMIO range, and this range
+is divided evenly into *total_pe* pieces, each corresponding to one PE.
+
+We decided to use the M64 BARs to map VFs to their individual PEs, since
+the BARs of all SRIOV VFs share the same size.
+
+Doing so introduces another problem. The *total_pe* number is usually
+bigger than total_VFs. If we map one IOV BAR directly to one M64 BAR, some
+part of the M64 BAR will map to another device's MMIO range.
+
+ 0  1 total_VFs - 1
+ +--+--+- -+--+--+
+ |  |  |  ...  |  |  |
+ +--+--+- -+--+--+
+
+   IOV BAR
+ 0  1 total_VFs - 1  total_pe - 1
+ +--+--+- -+--+--+-  -+--+--+
+ |  |  |  ...  |  |  |   ...  |  |  |
+ +--+--+- -+--+--+-  -+--+--+
+
+   M64 BAR
+
+   Figure 1.0 Direct map IOV BAR
+
+As Figure 1.0 indicates, the range [total_VFs, total_pe - 1] in the M64 BAR
+may map to some MMIO range on another device.
+
+The current solution is to expand the IOV BAR to *total_pe* entries.
+
+ 0  1 total_VFs - 1  total_pe - 1
+ +--+--+- -+--+--+-  -+--+--+
+ |  |  |  ...  |  |  |   ...  |  |  |
+ +--+--+- -+--+--+-  -+--+--+
+
+   IOV BAR
+ 0  1 total_VFs - 1  total_pe - 1
+ +--+--+- -+--+--+-  -+--+--+
+ |  |  |  ...  |  |  |   ...  |  |  |
+ +--+--+- -+--+--+-  -+--+--+
+
+   M64 BAR
+
+   Figure 1.1 Map expanded IOV BAR
+
+Expanding the IOV BAR ensures that the whole M64 range will not affect
+other devices.
+
+2. How generic PCI code handles it
+This makes the mapping work, but it raises another problem. The M64
+BAR start address needs to be size aligned, while the original generic PCI
+code assigns the IOV BAR aligned to the individual VF BAR size.
+
+Since one SRIOV VF BAR is usually the same size as its PF BAR, the original
+generic PCI code does not take the IOV BAR alignment into account. (The
+alignment is the same as its PF.) With the PowerNV change, this no longer
+holds: the alignment of the IOV BAR is now its total size, so we must count it.
+
+From:
+   alignment(IOV BAR) = size(VF BAR) = size(PF BAR)
+To:
+   alignment(IOV BAR) = size(IOV BAR)
+
+The commit "PCI: Take additional IOV BAR alignment in sizing and assigning"
+introduces add_align to track the alignment of the IOV BAR and uses it to meet
+this requirement.
-- 
1.7.9.5
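
The arithmetic described in the document can be made concrete with a small
sketch. It assumes that both the VF BAR size and the expanded IOV BAR size are
powers of two, so that "aligned to its own size" can be done with a mask. The
function names and the sample numbers are illustrative, not taken from the
patch.

```c
#include <stdint.h>

/* Size of the IOV BAR after expansion: one VF-sized slot per PE. */
uint64_t expanded_iov_bar_size(uint64_t vf_bar_size, unsigned int total_pe)
{
	return vf_bar_size * total_pe;
}

/* Round 'addr' up to 'align', which must be a power of two. */
uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}
```

With a 1 MiB VF BAR and an assumed total_pe of 256, the expanded IOV BAR is
256 MiB, and its start must also be 256 MiB aligned. A candidate start of
0x90100000 would therefore be pushed up to 0xa0000000, whereas under the old
rule (alignment equal to one VF BAR) it would already have been acceptable.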


[PATCH V9 06/18] powerpc/pci: Don't unset pci resources for VFs

2014-11-02 Thread Wei Yang
If we're going to reassign resources with flag PCI_REASSIGN_ALL_RSRC, all
resources will be cleaned out during device header fixup time and then get
reassigned by PCI core. However, the VF resources won't be reassigned and
thus, we shouldn't clean them out.

This patch adds a condition. If the pci_dev is a VF, skip the resource
unset process.

Signed-off-by: Wei Yang 
---
 arch/powerpc/kernel/pci-common.c |4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index e5dad9a..399d813 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -789,6 +789,10 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
   pci_name(dev));
return;
}
+
+   if (dev->is_virtfn)
+   return;
+
for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
struct resource *res = dev->resource + i;
struct pci_bus_region reg;
-- 
1.7.9.5


[PATCH V9 07/18] powerpc/pci: Define pcibios_disable_device() on powerpc

2014-11-02 Thread Wei Yang
When a driver removes a pci_dev, it calls pcibios_disable_device(), which is
platform dependent. This gives platforms flexibility.

This patch defines this weak function on the powerpc architecture.

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/machdep.h |5 -
 arch/powerpc/kernel/pci-common.c   |8 
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 307347f..8242262 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -240,7 +240,10 @@ struct machdep_calls {
 
/* Called when pci_enable_device() is called. Returns 0 to
 * allow assignment/enabling of the device. */
-   int  (*pcibios_enable_device_hook)(struct pci_dev *);
+   int (*pcibios_enable_device_hook)(struct pci_dev *);
+
+   /* Called when pci_disable_device() is called. */
+   void (*pcibios_disable_device_hook)(struct pci_dev *);
 
/* Called after scan and before resource survey */
void (*pcibios_fixup_phb)(struct pci_controller *hose);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 399d813..17acfa7 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1452,6 +1452,14 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
return pci_enable_resources(dev, mask);
 }
 
+void pcibios_disable_device(struct pci_dev *dev)
+{
+   if (ppc_md.pcibios_disable_device_hook)
+   ppc_md.pcibios_disable_device_hook(dev);
+
+   return;
+}
+
 resource_size_t pcibios_io_space_offset(struct pci_controller *hose)
 {
return (unsigned long) hose->io_base_virt - _IO_BASE;
-- 
1.7.9.5
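
The optional-hook dispatch added here (call the platform callback only if one
was registered) looks roughly like this in isolation. The struct and function
names below are hypothetical, cut-down analogues of ppc_md and
pcibios_disable_device(), not the kernel definitions.

```c
#include <stddef.h>

/* Minimal stand-in for a device; 'released' is set by the example hook. */
struct dev_sketch {
	int released;
};

/* Hypothetical, cut-down analogue of a per-platform callback table. */
struct platform_ops_sketch {
	void (*disable_device_hook)(struct dev_sketch *dev);	/* may be NULL */
};

/* Example platform hook: records that it ran. */
void example_disable_hook(struct dev_sketch *dev)
{
	dev->released = 1;
}

/* Arch-level entry point: forward to the platform only if a hook exists. */
void arch_disable_device(const struct platform_ops_sketch *ops,
			 struct dev_sketch *dev)
{
	if (ops->disable_device_hook)
		ops->disable_device_hook(dev);
}
```

Platforms that leave the pointer NULL pay nothing beyond the check, which is
presumably why the hook is optional rather than mandatory.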


[PATCH V9 09/18] powerpc/pci: remove pci_dn->pcidev field

2014-11-02 Thread Wei Yang
The field pci_dn->pcidev is assigned but not used.

This patch removes this field.

Signed-off-by: Wei Yang 
Acked-by: Gavin Shan 
---
 arch/powerpc/include/asm/pci-bridge.h |1 -
 arch/powerpc/platforms/powernv/pci-ioda.c |1 -
 2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 757d7bb..063d79d 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -166,7 +166,6 @@ struct pci_dn {
 
boolforce_32bit_msi;
 
-   struct  pci_dev *pcidev;/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
struct eeh_dev *edev;   /* eeh device */
 #endif
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 468a0f2..df49dc6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -727,7 +727,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, struct pnv_ioda_pe *pe)
pci_name(dev));
continue;
}
-   pdn->pcidev = dev;
pdn->pe_number = pe->pe_number;
pe->dma_weight += pnv_ioda_dma_weight(dev);
if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
-- 
1.7.9.5


[PATCH V9 08/18] powerpc/pci: Refactor pci_dn

2014-11-02 Thread Wei Yang
From: Gavin Shan 

pci_dn is the extension of a PCI device node and is created from the
device node. Unfortunately, VFs are enabled dynamically by the PF's
driver, so they have no corresponding device nodes and no pci_dn.
This patch refactors pci_dn to support VFs:

   * pci_dn is organized as a hierarchy tree. A VF's pci_dn is put
 on the child list of the pci_dn of the PF's bridge. The pci_dn of any
 other device is put on the child list of the pci_dn of its upstream
 bridge.

   * A VF's pci_dn is expected to be created dynamically when applying
 the final fixup to the PF. A VF's pci_dn will be destroyed when
 releasing the PF's pci_dev instance. The pci_dn of any other device is
 still created from the device node as before.

   * For one particular PCI device (VF or not), its pci_dn can be
 found from pdev->dev.archdata.firmware_data, PCI_DN(devnode),
 or the parent's list. The fast path (fetching pci_dn through the PCI
 device instance) is populated during early fixup time.

Signed-off-by: Gavin Shan 
---
 arch/powerpc/include/asm/device.h |3 +
 arch/powerpc/include/asm/pci-bridge.h |   14 +-
 arch/powerpc/kernel/pci-hotplug.c |3 +
 arch/powerpc/kernel/pci_dn.c  |  246 -
 4 files changed, 261 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 38faede..29992cd 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -34,6 +34,9 @@ struct dev_archdata {
 #ifdef CONFIG_SWIOTLB
dma_addr_t  max_direct_dma_addr;
 #endif
+#ifdef CONFIG_PPC64
+   void*firmware_data;
+#endif
 #ifdef CONFIG_EEH
struct eeh_dev  *edev;
 #endif
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 4ca90a3..757d7bb 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -89,6 +89,7 @@ struct pci_controller {
 
 #ifdef CONFIG_PPC64
unsigned long buid;
+   void *firmware_data;
 #endif /* CONFIG_PPC64 */
 
void *private_data;
@@ -150,9 +151,13 @@ static inline int isa_vaddr_is_ioport(void __iomem *address)
 struct iommu_table;
 
 struct pci_dn {
+   int flags;
+#define PCI_DN_FLAG_IOV_VF 0x01
+
int busno;  /* pci bus number */
int devfn;  /* pci device and function number */
 
+   struct  pci_dn *parent;
struct  pci_controller *phb;/* for pci devices */
struct  iommu_table *iommu_table;   /* for phb's or bridges */
struct  device_node *node;  /* back-pointer to the device_node */
@@ -169,14 +174,19 @@ struct pci_dn {
 #ifdef CONFIG_PPC_POWERNV
int pe_number;
 #endif
+   struct list_head child_list;
+   struct list_head list;
 };
 
 /* Get the pointer to a device_node's pci_dn */
 #define PCI_DN(dn) ((struct pci_dn *) (dn)->data)
 
+extern struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
+  int devfn);
 extern struct pci_dn *pci_get_pdn(struct pci_dev *pdev);
-
-extern void * update_dn_pci_info(struct device_node *dn, void *data);
+extern struct pci_dn *add_dev_pci_info(struct pci_dev *pdev);
+extern void remove_dev_pci_info(struct pci_dev *pdev);
+extern void *update_dn_pci_info(struct device_node *dn, void *data);
 
 static inline int pci_device_from_OF_node(struct device_node *np,
  u8 *bus, u8 *devfn)
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 5b78917..af60efe 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -30,6 +30,9 @@
 void pcibios_release_device(struct pci_dev *dev)
 {
eeh_remove_device(dev);
+
+   /* Release firmware data */
+   remove_dev_pci_info(dev);
 }
 
 /**
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 1f61fab..fa966ae 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -32,12 +32,222 @@
 #include 
 #include 
 
+/*
+ * The function is used to find the firmware data of one
+ * specific PCI device, which is attached to the indicated
+ * PCI bus. For VFs, their firmware data is linked to that
+ * one of PF's bridge. For other devices, their firmware
+ * data is linked to that of their bridge.
+ */
+static struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus)
+{
+   struct pci_bus *pbus;
+   struct device_node *dn;
+   struct pci_dn *pdn;
+
+   /*
+* We probably have virtual bus which doesn't
+* have associated bridge.
+*/
+   pbus = bus;
+   while (pbus) {
+   if (pci_is_root_bus(pbus) || pbus->self)
+   break;
+
+   pbus = pbus->parent;
+   }
+
+   /*
+* Except virtual bus, all PCI buses should
+* have device nodes.
+*/
+   dn = pc

[PATCH V9 10/18] powerpc/powernv: Use pci_dn in PCI config accessor

2014-11-02 Thread Wei Yang
The PCI config accessors rely on device nodes. Unfortunately, VFs
don't have corresponding device nodes, so we have to switch to
pci_dn for PCI config access.
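
As a rough illustration of why pci_dn is sufficient here: the accessors in
the diff below build the bus/device/function number purely from pci_dn
fields. The following is a minimal user-space sketch, not kernel code; struct
pdn_sketch and pdn_to_bdfn() are hypothetical names used only for this
example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two pci_dn fields used by the accessors. */
struct pdn_sketch {
	int busno;	/* PCI bus number */
	int devfn;	/* PCI device and function number */
};

/* Same composition as "(pdn->busno << 8) | pdn->devfn" in the diff. */
static uint32_t pdn_to_bdfn(const struct pdn_sketch *pdn)
{
	return ((uint32_t)pdn->busno << 8) | (uint32_t)pdn->devfn;
}
```

Since neither field comes from a device node, the same computation works for
a VF as long as its pci_dn has been populated.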

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c |   14 +-
 arch/powerpc/platforms/powernv/pci.c |   69 ++
 arch/powerpc/platforms/powernv/pci.h |4 +-
 3 files changed, 40 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 1d19e79..c63b6c1 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -419,21 +419,31 @@ static inline bool powernv_eeh_cfg_blocked(struct device_node *dn)
 static int powernv_eeh_read_config(struct device_node *dn,
   int where, int size, u32 *val)
 {
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   if (!pdn)
+   return PCIBIOS_DEVICE_NOT_FOUND;
+
if (powernv_eeh_cfg_blocked(dn)) {
 *val = 0xFFFFFFFF;
return PCIBIOS_SET_FAILED;
}
 
-   return pnv_pci_cfg_read(dn, where, size, val);
+   return pnv_pci_cfg_read(pdn, where, size, val);
 }
 
 static int powernv_eeh_write_config(struct device_node *dn,
int where, int size, u32 val)
 {
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   if (!pdn)
+   return PCIBIOS_DEVICE_NOT_FOUND;
+
if (powernv_eeh_cfg_blocked(dn))
return PCIBIOS_SET_FAILED;
 
-   return pnv_pci_cfg_write(dn, where, size, val);
+   return pnv_pci_cfg_write(pdn, where, size, val);
 }
 
 /**
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index b2187d0..f8dbb3f 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -368,9 +368,9 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
spin_unlock_irqrestore(&phb->lock, flags);
 }
 
-static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
-struct device_node *dn)
+static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
 {
+   struct pnv_phb *phb = pdn->phb->private_data;
u8  fstate;
__be16  pcierr;
int pe_no;
@@ -381,7 +381,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
 * setup that yet. So all ER errors should be mapped to
 * reserved PE.
 */
-   pe_no = PCI_DN(dn)->pe_number;
+   pe_no = pdn->pe_number;
if (pe_no == IODA_INVALID_PE) {
if (phb->type == PNV_PHB_P5IOC2)
pe_no = 0;
@@ -409,8 +409,7 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
}
 
cfg_dbg(" -> EEH check, bdfn=%04x PE#%d fstate=%x\n",
-   (PCI_DN(dn)->busno << 8) | (PCI_DN(dn)->devfn),
-   pe_no, fstate);
+   (pdn->busno << 8) | (pdn->devfn), pe_no, fstate);
 
/* Clear the frozen state if applicable */
if (fstate == OPAL_EEH_STOPPED_MMIO_FREEZE ||
@@ -427,10 +426,9 @@ static void pnv_pci_config_check_eeh(struct pnv_phb *phb,
}
 }
 
-int pnv_pci_cfg_read(struct device_node *dn,
+int pnv_pci_cfg_read(struct pci_dn *pdn,
 int where, int size, u32 *val)
 {
-   struct pci_dn *pdn = PCI_DN(dn);
struct pnv_phb *phb = pdn->phb->private_data;
u32 bdfn = (pdn->busno << 8) | pdn->devfn;
s64 rc;
@@ -464,10 +462,9 @@ int pnv_pci_cfg_read(struct device_node *dn,
return PCIBIOS_SUCCESSFUL;
 }
 
-int pnv_pci_cfg_write(struct device_node *dn,
+int pnv_pci_cfg_write(struct pci_dn *pdn,
  int where, int size, u32 val)
 {
-   struct pci_dn *pdn = PCI_DN(dn);
struct pnv_phb *phb = pdn->phb->private_data;
u32 bdfn = (pdn->busno << 8) | pdn->devfn;
 
@@ -491,18 +488,17 @@ int pnv_pci_cfg_write(struct device_node *dn,
 }
 
 #if CONFIG_EEH
-static bool pnv_pci_cfg_check(struct pci_controller *hose,
- struct device_node *dn)
+static bool pnv_pci_cfg_check(struct pci_dn *pdn)
 {
struct eeh_dev *edev = NULL;
-   struct pnv_phb *phb = hose->private_data;
+   struct pnv_phb *phb = pdn->phb->private_data;
 
/* EEH not enabled ? */
if (!(phb->flags & PNV_PHB_FLAG_EEH))
return true;
 
/* PE reset or device removed ? */
-   edev = of_node_to_eeh_dev(dn);
+   edev = pdn->edev;
if (edev) {
if (edev->pe &&
(edev->pe->state & EEH_PE_CFG_BLOCKED))
@@ -515,8 +511,7 @@ static bool pnv_pci_cfg_check(struct pci_controller *hose,
return true;
 }
 #else
-static inline pnv_pci_cfg_check(struct pci_controller *hose,
-   struct device_node *dn)
+static inline bool pnv_pci_cfg_check(struct pci_dn *pdn)
 {
return true;
 }
@@ 

[PATCH V9 11/18] powerpc/powernv: Allocate pe->iommu_table dynamically

2014-11-02 Thread Wei Yang
Currently, the iommu_table of a PE is a static field, which causes a problem
when iommu_free_table() is called.

This patch allocates the iommu_table dynamically.
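
The change of ownership can be sketched in user space as follows. This is
only an illustration under assumptions: struct iommu_table_sketch, struct
pe_sketch and the three helpers are hypothetical names, and calloc()/free()
stand in for the kernel allocator. Unlike the posted hunk, the sketch checks
the allocation result before storing the back-pointer.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct iommu_table and struct pnv_ioda_pe. */
struct iommu_table_sketch {
	void *data;	/* back-pointer to the owning PE */
};

struct pe_sketch {
	int pe_number;
	struct iommu_table_sketch *tce32_table;	/* was an embedded struct */
};

/* Allocate the table and record its owner; returns 0 on success. */
static int pe_alloc_table(struct pe_sketch *pe)
{
	pe->tce32_table = calloc(1, sizeof(*pe->tce32_table));
	if (!pe->tce32_table)
		return -1;

	pe->tce32_table->data = pe;
	return 0;
}

/* Replaces the container_of() lookup, which needs an embedded member. */
static struct pe_sketch *table_to_pe(struct iommu_table_sketch *tbl)
{
	return tbl->data;
}

/* The table can now be released independently of the PE. */
static void pe_free_table(struct pe_sketch *pe)
{
	free(pe->tce32_table);
	pe->tce32_table = NULL;
}
```

The back-pointer is what makes the container_of() removals in the diff
possible: once the table is no longer embedded in the PE, pointer arithmetic
from the table address cannot recover the PE.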

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/iommu.h  |3 +++
 arch/powerpc/platforms/powernv/pci-ioda.c |   26 ++
 arch/powerpc/platforms/powernv/pci.h  |2 +-
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 42632c7..0fedacb 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -78,6 +78,9 @@ struct iommu_table {
struct iommu_group *it_group;
 #endif
void (*set_bypass)(struct iommu_table *tbl, bool enable);
+#ifdef CONFIG_PPC_POWERNV
+   void   *data;
+#endif
 };
 
 /* Pure 2^n version of get_order */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index df49dc6..6d35ed9 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -785,6 +785,10 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, int all)
return;
}
 
+   pe->tce32_table = kzalloc_node(sizeof(struct iommu_table),
+   GFP_KERNEL, hose->node);
+   pe->tce32_table->data = pe;
+
/* Associate it with all child devices */
pnv_ioda_setup_same_PE(bus, pe);
 
@@ -858,7 +862,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev
 
pe = &phb->ioda.pe_array[pdn->pe_number];
WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
-   set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base_and_group(&pdev->dev, pe->tce32_table);
 }
 
 static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
@@ -885,7 +889,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
} else {
dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
set_dma_ops(&pdev->dev, &dma_iommu_ops);
-   set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base(&pdev->dev, pe->tce32_table);
}
*pdev->dev.dma_mask = dma_mask;
return 0;
@@ -922,9 +926,9 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
list_for_each_entry(dev, &bus->devices, bus_list) {
if (add_to_iommu_group)
set_iommu_table_base_and_group(&dev->dev,
-  &pe->tce32_table);
+  pe->tce32_table);
else
-   set_iommu_table_base(&dev->dev, &pe->tce32_table);
+   set_iommu_table_base(&dev->dev, pe->tce32_table);
 
if (dev->subordinate)
pnv_ioda_setup_bus_dma(pe, dev->subordinate,
@@ -1014,8 +1018,7 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
 void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
 __be64 *startp, __be64 *endp, bool rm)
 {
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+   struct pnv_ioda_pe *pe = tbl->data;
struct pnv_phb *phb = pe->phb;
 
if (phb->type == PNV_PHB_IODA1)
@@ -1081,7 +1084,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
}
 
/* Setup linux iommu table */
-   tbl = &pe->tce32_table;
+   tbl = pe->tce32_table;
pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
  base << 28, IOMMU_PAGE_SHIFT_4K);
 
@@ -1119,8 +1122,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 
 static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
 {
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+   struct pnv_ioda_pe *pe = tbl->data;
uint16_t window_id = (pe->pe_number << 1 ) + 1;
int64_t rc;
 
@@ -1165,10 +1167,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
pe->tce_bypass_base = 1ull << 59;
 
/* Install set_bypass callback for VFIO */
-   pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass;
+   pe->tce32_table->set_bypass = pnv_pci_ioda2_set_bypass;
 
/* Enable bypass by default */
-   pnv_pci_ioda2_set_bypass(&pe->tce32_table, true);
+   pnv_pci_ioda2_set_bypass(pe->tce32_table, true);
 }
 
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
@@ -1216,7 +1218,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
}
 
/* Setup linux iommu table */
-   tbl = &pe->tce32_table;
+   tbl = pe->tce32_table;
pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
  

[PATCH V9 12/18] powerpc/powernv: Expand VF resources according to the number of total_pe

2014-11-02 Thread Wei Yang
On PHB3, the PF IOV BAR is covered by an M64 BAR to get better PE isolation.
In most cases total_pe differs from total_VFs, which leads to a conflict
between the MMIO space and the PE number.

For example, if total_VFs is 128 and total_pe is 256, the second half of the
M64 BAR space will belong to another PCI device, which may already belong to
other PEs.

This patch expands the PF IOV BAR to reserve total_pe times the VF BAR size,
which prevents the conflict.
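
The expansion arithmetic can be sketched as below. This is a user-space
illustration only; struct bar_range, expand_iov_bar() and bar_range_size()
are hypothetical names standing in for struct resource and the loop body in
pnv_pci_ioda_fixup_iov_resources().

```c
#include <stdint.h>

/* Hypothetical stand-in for struct resource; 'end' is inclusive. */
struct bar_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Reserve one VF-BAR-sized slot per PE, mirroring
 * "res->end = res->start + size * phb->ioda.total_pe - 1" in the diff.
 */
static void expand_iov_bar(struct bar_range *res, uint64_t vf_bar_size,
			   unsigned int total_pe)
{
	res->end = res->start + vf_bar_size * total_pe - 1;
}

/* Size of an inclusive range. */
static uint64_t bar_range_size(const struct bar_range *res)
{
	return res->end - res->start + 1;
}
```

With the numbers from the example above (128 VFs, 256 PEs), the BAR doubles
in size, so every M64 segment, and therefore every PE number, maps to space
owned by this PF.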

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/machdep.h|4 +++
 arch/powerpc/include/asm/pci-bridge.h |3 ++
 arch/powerpc/kernel/pci-common.c  |5 +++
 arch/powerpc/platforms/powernv/pci-ioda.c |   56 +
 4 files changed, 68 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 8242262..86d47ca 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -254,6 +254,10 @@ struct machdep_calls {
/* Reset the secondary bus of bridge */
void  (*pcibios_reset_secondary_bus)(struct pci_dev *dev);
 
+#ifdef CONFIG_PCI_IOV
+   void (*pcibios_fixup_sriov)(struct pci_bus *bus);
+#endif /* CONFIG_PCI_IOV */
+
/* Called to shutdown machine specific hardware not already controlled
 * by other drivers.
 */
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 063d79d..ff26045 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -172,6 +172,9 @@ struct pci_dn {
 #define IODA_INVALID_PE(-1)
 #ifdef CONFIG_PPC_POWERNV
int pe_number;
+#ifdef CONFIG_PCI_IOV
+   u16 max_vfs;/* number of VFs IOV BAR expended */
+#endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
struct list_head list;
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 17acfa7..4e3d87d 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1645,6 +1645,11 @@ void pcibios_scan_phb(struct pci_controller *hose)
if (ppc_md.pcibios_fixup_phb)
ppc_md.pcibios_fixup_phb(hose);
 
+#ifdef CONFIG_PCI_IOV
+   if (ppc_md.pcibios_fixup_sriov)
+   ppc_md.pcibios_fixup_sriov(bus);
+#endif /* CONFIG_PCI_IOV */
+
/* Configure PCI Express settings */
if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
struct pci_bus *child;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 6d35ed9..ef279d3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1601,6 +1601,59 @@ static void pnv_pci_init_ioda_msis(struct pnv_phb *phb)
 static void pnv_pci_init_ioda_msis(struct pnv_phb *phb) { }
 #endif /* CONFIG_PCI_MSI */
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+   struct resource *res;
+   int i;
+   resource_size_t size;
+   struct pci_dn *pdn;
+
+   if (!pdev->is_physfn || pdev->is_added)
+   return;
+
+   hose = pci_bus_to_host(pdev->bus);
+   phb = hose->private_data;
+
+   pdn = pci_get_pdn(pdev);
+   pdn->max_vfs = 0;
+
+   for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
+   res = &pdev->resource[i];
+   if (!res->flags || res->parent)
+   continue;
+   if (!pnv_pci_is_mem_pref_64(res->flags)) {
+   dev_warn(&pdev->dev, " Skipping expanding IOV BAR %pR on %s\n",
+res, pci_name(pdev));
+   continue;
+   }
+
+   dev_dbg(&pdev->dev, " Fixing VF BAR[%d] %pR to\n", i, res);
+   size = pci_iov_resource_size(pdev, i);
+   res->end = res->start + size * phb->ioda.total_pe - 1;
+   dev_dbg(&pdev->dev, "   %pR\n", res);
+   }
+   pdn->max_vfs = phb->ioda.total_pe;
+}
+
+static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus)
+{
+   struct pci_dev *pdev;
+   struct pci_bus *b;
+
+   list_for_each_entry(pdev, &bus->devices, bus_list) {
+   b = pdev->subordinate;
+
+   if (b)
+   pnv_pci_ioda_fixup_sriov(b);
+
+   pnv_pci_ioda_fixup_iov_resources(pdev);
+   }
+}
+#endif /* CONFIG_PCI_IOV */
+
 /*
  * This function is supposed to be called on basis of PE from top
  * to bottom style. So the the I/O or MMIO segment assigned to
@@ -1983,6 +2036,9 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
ppc_md.pcibios_enable_device_hook = pnv_pci_enable_device_hook;
ppc_md.pcibios_window_alignment = pnv_pci_window_alignment;
ppc_md

[PATCH V9 14/18] powerpc/powernv: Implement pcibios_iov_resource_size() on powernv

2014-11-02 Thread Wei Yang
On the PowerNV platform, the PF's IOV BAR size is expanded, which differs
from the normal case.

This patch retrieves the VF BAR size by dividing the total size by the
expanded number of VFs.
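
In other words, the per-VF size is recovered from the expanded BAR roughly
as sketched here. This is a plain C illustration; vf_bar_size() is a
hypothetical helper, and ordinary 64-bit division stands in for the kernel's
do_div().

```c
#include <stdint.h>

/*
 * Per-VF BAR size = expanded IOV BAR size / number of VF slots reserved.
 * Returns 0 when the BAR has not been expanded (max_vfs == 0), matching
 * the early return in pnv_pci_iov_resource_size().
 */
static uint64_t vf_bar_size(uint64_t iov_bar_size, uint16_t max_vfs)
{
	if (!max_vfs)
		return 0;

	return iov_bar_size / max_vfs;
}
```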

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/machdep.h|1 +
 arch/powerpc/kernel/pci-common.c  |8 
 arch/powerpc/platforms/powernv/pci-ioda.c |   15 +++
 3 files changed, 24 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 15a13e6..d971874 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -259,6 +259,7 @@ struct machdep_calls {
resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *,
int resno,
 resource_size_t align);
+   resource_size_t (*pcibios_iov_resource_size)(struct pci_dev *, int resno);
 #endif /* CONFIG_PCI_IOV */
 
/* Called to shutdown machine specific hardware not already controlled
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 581e67b..a2a96d3 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -143,6 +143,14 @@ resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev,
 
return 0;
 }
+
+resource_size_t pcibios_iov_resource_size(struct pci_dev *pdev, int resno)
+{
+   if (ppc_md.pcibios_iov_resource_size)
+   return ppc_md.pcibios_iov_resource_size(pdev, resno);
+
+   return 0;
+}
 #endif /* CONFIG_PCI_IOV */
 
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 880d76d..f7abba3 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1847,6 +1847,20 @@ static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
 
return align;
 }
+
+static resource_size_t pnv_pci_iov_resource_size(struct pci_dev *pdev, int resno)
+{
+   struct pci_dn *pdn = pci_get_pdn(pdev);
+   resource_size_t size = 0;
+
+   if (!pdn->max_vfs)
+   return size;
+
+   size = resource_size(pdev->resource + resno);
+   do_div(size, pdn->max_vfs);
+
+   return size;
+}
 #endif /* CONFIG_PCI_IOV */
 
 /* Prevent enabling devices for which we couldn't properly
@@ -2058,6 +2072,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 #ifdef CONFIG_PCI_IOV
ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_sriov;
ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment;
+   ppc_md.pcibios_iov_resource_size = pnv_pci_iov_resource_size;
 #endif /* CONFIG_PCI_IOV */
pci_add_flags(PCI_REASSIGN_ALL_RSRC);
 
-- 
1.7.9.5

[PATCH V9 13/18] powerpc/powernv: Implement pcibios_iov_resource_alignment() on powernv

2014-11-02 Thread Wei Yang
This patch implements pcibios_iov_resource_alignment() on the powernv
platform.

On the PowerNV platform, there are 3 cases for the IOV BAR:
1. initial state: the IOV BAR size is a multiple of the VF BAR size
2. after expansion: the IOV BAR size is expanded to meet the M64 segment size
3. sizing stage: the IOV BAR is truncated to 0

pnv_pci_iov_resource_alignment() handles these three cases respectively.
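
The branch order in the diff below can be sketched in plain C as follows.
iov_alignment() is a hypothetical user-space helper; its parameters stand in
for resource_size(&pdev->resource[resno]), pdn->max_vfs and the incoming
align argument.

```c
#include <stdint.h>

/*
 * Mirrors the three branches of pnv_pci_iov_resource_alignment():
 *   - the BAR currently has a non-zero size: align to that size
 *   - the BAR is sized to 0 but was expanded: align to max_vfs * align
 *   - otherwise: keep the alignment passed in by the caller
 */
static uint64_t iov_alignment(uint64_t iov_bar_size, uint16_t max_vfs,
			      uint64_t align)
{
	if (iov_bar_size)
		return iov_bar_size;

	if (max_vfs)
		return (uint64_t)max_vfs * align;

	return align;
}
```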

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/machdep.h|3 +++
 arch/powerpc/kernel/pci-common.c  |   14 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   20 
 3 files changed, 37 insertions(+)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 86d47ca..15a13e6 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -256,6 +256,9 @@ struct machdep_calls {
 
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_bus *bus);
+   resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *,
+   int resno,
+   resource_size_t align);
 #endif /* CONFIG_PCI_IOV */
 
/* Called to shutdown machine specific hardware not already controlled
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 4e3d87d..581e67b 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -131,6 +131,20 @@ void pcibios_reset_secondary_bus(struct pci_dev *dev)
pci_reset_secondary_bus(dev);
 }
 
+#ifdef CONFIG_PCI_IOV
+resource_size_t pcibios_iov_resource_alignment(struct pci_dev *pdev,
+int resno,
+resource_size_t align)
+{
+   if (ppc_md.pcibios_iov_resource_alignment)
+   return ppc_md.pcibios_iov_resource_alignment(pdev,
+  resno,
+  align);
+
+   return 0;
+}
+#endif /* CONFIG_PCI_IOV */
+
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
 {
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index ef279d3..880d76d 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1830,6 +1830,25 @@ static resource_size_t pnv_pci_window_alignment(struct pci_bus *bus,
return phb->ioda.io_segsize;
 }
 
+#ifdef CONFIG_PCI_IOV
+static resource_size_t pnv_pci_iov_resource_alignment(struct pci_dev *pdev,
+   int resno,
+   resource_size_t align)
+{
+   struct pci_dn *pdn = pci_get_pdn(pdev);
+   resource_size_t iov_align;
+
+   iov_align = resource_size(&pdev->resource[resno]);
+   if (iov_align)
+   return iov_align;
+
+   if (pdn->max_vfs)
+   return pdn->max_vfs * align;
+
+   return align;
+}
+#endif /* CONFIG_PCI_IOV */
+
 /* Prevent enabling devices for which we couldn't properly
  * assign a PE
  */
@@ -2038,6 +2057,7 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
ppc_md.pcibios_reset_secondary_bus = pnv_pci_reset_secondary_bus;
 #ifdef CONFIG_PCI_IOV
ppc_md.pcibios_fixup_sriov = pnv_pci_ioda_fixup_sriov;
+   ppc_md.pcibios_iov_resource_alignment = pnv_pci_iov_resource_alignment;
 #endif /* CONFIG_PCI_IOV */
pci_add_flags(PCI_REASSIGN_ALL_RSRC);
 
-- 
1.7.9.5

[PATCH V9 15/18] powerpc/powernv: Shift VF resource with an offset

2014-11-02 Thread Wei Yang
On the PowerNV platform, a resource's position in M64 implies the PE# the
resource belongs to. In some particular cases, a resource has to be adjusted
to place it at the correct position in M64.

This patch introduces a function to shift the 'real' PF IOV BAR address
according to an offset.
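
A user-space sketch of the shift is shown here. struct pf_iov_sketch and
shift_iov_bar() are hypothetical names; the fields stand in for res->start,
res->end and pdn->max_vfs in the diff below.

```c
#include <stdint.h>

/* Hypothetical stand-in for the PF's IOV BAR plus its slot count. */
struct pf_iov_sketch {
	uint64_t start;		/* first address of the IOV BAR */
	uint64_t end;		/* last address (inclusive), not moved */
	unsigned int max_vfs;	/* VF-sized slots left in the BAR */
};

/*
 * Move the BAR start forward by 'offset' VF-sized slots so that the first
 * VF lands in the M64 segment of PE number 'offset'. As in the diff, only
 * the start moves and the remaining slot count shrinks accordingly.
 */
static void shift_iov_bar(struct pf_iov_sketch *pf, uint64_t vf_bar_size,
			  unsigned int offset)
{
	pf->start += vf_bar_size * offset;
	pf->max_vfs -= offset;
}
```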

Signed-off-by: Wei Yang 
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   31 +
 1 file changed, 31 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f7abba3..5034a3c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -644,6 +645,36 @@ static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
return 10;
 }
 
+#ifdef CONFIG_PCI_IOV
+static void pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
+{
+   struct pci_dn *pdn = pci_get_pdn(dev);
+   int i;
+   struct resource *res;
+   resource_size_t size;
+
+   if (!dev->is_physfn)
+   return;
+
+   for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
+   res = &dev->resource[i];
+   if (!res->flags || !res->parent)
+   continue;
+
+   if (!pnv_pci_is_mem_pref_64(res->flags))
+   continue;
+
+   dev_info(&dev->dev, " Shifting VF BAR %pR to\n", res);
+   size = pci_iov_resource_size(dev, i);
+   res->start += size*offset;
+
+   dev_info(&dev->dev, " %pR\n", res);
+   pci_update_resource(dev, i);
+   }
+   pdn->max_vfs -= offset;
+}
+#endif /* CONFIG_PCI_IOV */
+
 #if 0
 static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
 {
-- 
1.7.9.5

[PATCH V9 16/18] powerpc/powernv: Allocate VF PE

2014-11-02 Thread Wei Yang
VFs are created when the PCI device is enabled.

This patch tries its best to assign the maximum resources and PEs to VFs when
the PCI device is enabled. Enough M64 is assigned to cover the IOV BAR, and the
IOV BAR is shifted to meet the PE# indicated by M64. The VF's pdn->pdev and
pdn->pe_number are fixed up.

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/pci-bridge.h |4 +
 arch/powerpc/kernel/pci_dn.c  |   11 +
 arch/powerpc/platforms/powernv/pci-ioda.c |  460 -
 arch/powerpc/platforms/powernv/pci.c  |   18 ++
 arch/powerpc/platforms/powernv/pci.h  |7 +
 5 files changed, 487 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index ff26045..8d8d40a 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -174,6 +174,10 @@ struct pci_dn {
int pe_number;
 #ifdef CONFIG_PCI_IOV
u16 max_vfs;/* number of VFs IOV BAR expended */
+   u16 vf_pes; /* VF PE# under this PF */
+   int offset; /* PE# for the first VF PE */
+#define IODA_INVALID_M64(-1)
+   int m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index fa966ae..dbc2f55 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -216,6 +216,17 @@ void remove_dev_pci_info(struct pci_dev *pdev)
struct pci_dn *pdn, *tmp;
int i;
 
+   /*
+* VF and VF PE are created/released dynamically, so we need to
+* bind/unbind them. Otherwise, when SRIOV is re-enabled, the VF
+* and VF PE would be mismatched.
+*/
+   if (pdev->is_virtfn) {
+   pdn = pci_get_pdn(pdev);
+   pdn->pe_number = IODA_INVALID_PE;
+   return;
+   }
+
/* Only support IOV PF for now */
if (!pdev->is_physfn)
return;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5034a3c..649f49d4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -45,6 +45,9 @@
 #include "powernv.h"
 #include "pci.h"
 
+/* 256M DMA window, 4K TCE pages, 8 bytes TCE */
+#define TCE32_TABLE_SIZE   ((0x10000000 / 0x1000) * 8)
+
 static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
const char *fmt, ...)
 {
@@ -57,11 +60,18 @@ static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
vaf.fmt = fmt;
vaf.va = &args;
 
-   if (pe->pdev)
+   if (pe->flags & PNV_IODA_PE_DEV)
strlcpy(pfix, dev_name(&pe->pdev->dev), sizeof(pfix));
-   else
+   else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
sprintf(pfix, "%04x:%02x ",
pci_domain_nr(pe->pbus), pe->pbus->number);
+#ifdef CONFIG_PCI_IOV
+   else if (pe->flags & PNV_IODA_PE_VF)
+   sprintf(pfix, "%04x:%02x:%2x.%d",
+   pci_domain_nr(pe->parent_dev->bus),
+   (pe->rid & 0xff00) >> 8,
+   PCI_SLOT(pe->rid), PCI_FUNC(pe->rid));
+#endif /* CONFIG_PCI_IOV*/
 
printk("%spci %s: [PE# %.3d] %pV",
   level, pfix, pe->pe_number, &vaf);
@@ -508,6 +518,89 @@ static struct pnv_ioda_pe *pnv_ioda_get_pe(struct pci_dev *dev)
 }
 #endif /* CONFIG_PCI_MSI */
 
+#ifdef CONFIG_PCI_IOV
+static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
+{
+   struct pci_dev *parent;
+   uint8_t bcomp, dcomp, fcomp;
+   int64_t rc;
+   long rid_end, rid;
+
+   /* Currently, we just deconfigure VF PE. Bus PE will always be there. */
+   if (pe->pbus) {
+   int count;
+
+   dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
+   fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
+   parent = pe->pbus->self;
+   if (pe->flags & PNV_IODA_PE_BUS_ALL)
+   count = pe->pbus->busn_res.end - pe->pbus->busn_res.start + 1;
+   else
+   count = 1;
+
+   switch(count) {
+   case  1: bcomp = OpalPciBusAll; break;
+   case  2: bcomp = OpalPciBus7Bits;   break;
+   case  4: bcomp = OpalPciBus6Bits;   break;
+   case  8: bcomp = OpalPciBus5Bits;   break;
+   case 16: bcomp = OpalPciBus4Bits;   break;
+   case 32: bcomp = OpalPciBus3Bits;   break;
+   default:
+   pr_err("%s: Number of subordinate busses %d"
+  " unsupported\n",
+  pci_is_root_bus(pe->pbus)?"root bus":pci_name(pe->pbus->self),
+ 

[PATCH V9 17/18] powerpc/powernv: Expanding IOV BAR, with m64_per_iov supported

2014-11-02 Thread Wei Yang
The M64 aperture size is limited on PHB3. When the IOV BAR is too big, it
exceeds this limit and fails to be assigned.

This patch introduces a different expansion based on the IOV BAR size:

IOV BAR size is smaller than 64M: expand to total_pe.
IOV BAR size is bigger than 64M: expand to total_VFs rounded up to a power of 2.
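
The selection of the multiplier can be sketched as below. This is a
user-space illustration under assumptions: struct expand_choice,
choose_expansion(), roundup_pow2() and SKETCH_M64_PER_IOV are hypothetical
names, and roundup_pow2() is a simple loop standing in for the kernel's
__roundup_pow_of_two().

```c
#include <stdint.h>

#define SKETCH_M64_PER_IOV 4	/* same value as M64_PER_IOV in the diff */

/* Smallest power of two >= n; intended for small n such as a VF count. */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

struct expand_choice {
	unsigned int mul;		/* slots reserved in the IOV BAR */
	unsigned int m64_per_iov;	/* M64 windows used per IOV BAR */
};

/*
 * Default: expand to total_pe with one M64 window per IOV BAR.
 * If the size passed in is strictly bigger than 64M (1 << 26), fall back
 * to the rounded-up VF count and SKETCH_M64_PER_IOV windows.
 */
static struct expand_choice choose_expansion(uint64_t bar_size,
					     unsigned int total_vfs,
					     unsigned int total_pe)
{
	struct expand_choice c = { total_pe, 1 };

	if (bar_size > (1ULL << 26)) {
		c.mul = roundup_pow2(total_vfs);
		c.m64_per_iov = SKETCH_M64_PER_IOV;
	}
	return c;
}
```

Note that the comparison is strict, so a size of exactly 64M still takes the
total_pe path, matching "size > (1 << 26)" in the diff.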

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/pci-bridge.h |2 ++
 arch/powerpc/platforms/powernv/pci-ioda.c |   31 +++--
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index 8d8d40a..a336c7a 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -176,6 +176,8 @@ struct pci_dn {
u16 max_vfs;/* number of VFs IOV BAR expended */
u16 vf_pes; /* VF PE# under this PF */
int offset; /* PE# for the first VF PE */
+#define M64_PER_IOV 4
+   int m64_per_iov;
 #define IODA_INVALID_M64(-1)
int m64_wins[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 649f49d4..a4e78ab 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2059,6 +2059,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
int i;
resource_size_t size;
struct pci_dn *pdn;
+   int mul, total_vfs;
 
if (!pdev->is_physfn || pdev->is_added)
return;
@@ -2069,6 +2070,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
pdn = pci_get_pdn(pdev);
pdn->max_vfs = 0;
 
+   total_vfs = pci_sriov_get_totalvfs(pdev);
+   pdn->m64_per_iov = 1;
+   mul = phb->ioda.total_pe;
+
+   for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
+   res = &pdev->resource[i];
+   if (!res->flags || res->parent)
+   continue;
+   if (!pnv_pci_is_mem_pref_64(res->flags)) {
+   dev_warn(&pdev->dev, " non M64 IOV BAR %pR on %s\n",
+   res, pci_name(pdev));
+   continue;
+   }
+
+   size = pci_iov_resource_size(pdev, i);
+
+   /* bigger than 64M */
+   if (size > (1 << 26)) {
+   dev_info(&pdev->dev, "PowerNV: VF BAR[%d] size "
+   "is bigger than 64M, roundup power2\n", i);
+   pdn->m64_per_iov = M64_PER_IOV;
+   mul = __roundup_pow_of_two(total_vfs);
+   break;
+   }
+   }
+
for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
res = &pdev->resource[i];
if (!res->flags || res->parent)
@@ -2081,10 +2108,10 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
 
dev_dbg(&pdev->dev, " Fixing VF BAR[%d] %pR to\n", i, res);
size = pci_iov_resource_size(pdev, i);
-   res->end = res->start + size * phb->ioda.total_pe - 1;
+   res->end = res->start + size * mul - 1;
dev_dbg(&pdev->dev, "   %pR\n", res);
}
-   pdn->max_vfs = phb->ioda.total_pe;
+   pdn->max_vfs = mul;
 }
 
 static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus)
-- 
1.7.9.5

[PATCH V9 18/18] powerpc/powernv: Group VF PE when IOV BAR is big on PHB3

2014-11-02 Thread Wei Yang
When the IOV BAR is big, each IOV BAR is covered by 4 M64 windows. As a
result, several VF PEs sit in one PE in terms of M64.

This patch groups VF PEs according to the M64 allocation.
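
The grouping computed at the top of pnv_pci_vf_assign_m64() can be sketched
in plain C as follows. struct vf_grouping, group_vfs(), roundup_pow2() and
SKETCH_M64_PER_IOV are hypothetical names for this example; roundup_pow2()
is a simple loop standing in for __roundup_pow_of_two().

```c
#define SKETCH_M64_PER_IOV 4	/* same value as M64_PER_IOV in the diff */

/* Smallest power of two >= n; intended for small n such as a VF count. */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

struct vf_grouping {
	unsigned int groups;	/* number of M64 windows used per IOV BAR */
	unsigned int per_group;	/* VFs sharing one window */
};

/*
 * With the big-BAR layout (m64_per_iov == SKETCH_M64_PER_IOV), up to four
 * VFs get one window each; more VFs are spread over four windows. In the
 * small-BAR layout both values are 1, as in the diff.
 */
static struct vf_grouping group_vfs(unsigned int vf_num,
				    unsigned int m64_per_iov)
{
	struct vf_grouping g = { 1, 1 };

	if (m64_per_iov != SKETCH_M64_PER_IOV)
		return g;

	if (vf_num <= SKETCH_M64_PER_IOV) {
		g.groups = vf_num;
		g.per_group = 1;
	} else {
		g.groups = SKETCH_M64_PER_IOV;
		g.per_group = roundup_pow2(vf_num) / m64_per_iov;
	}
	return g;
}
```

For instance, enabling 10 VFs with the big-BAR layout yields 4 groups of 4
VF slots each, since 10 is rounded up to 16 before being divided by 4.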

Signed-off-by: Wei Yang 
---
 arch/powerpc/include/asm/pci-bridge.h |2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  188 +++--
 2 files changed, 149 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index a336c7a..aaf5a31 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -179,7 +179,7 @@ struct pci_dn {
 #define M64_PER_IOV 4
int m64_per_iov;
 #define IODA_INVALID_M64(-1)
-   int m64_wins[PCI_SRIOV_NUM_BARS];
+   int m64_wins[PCI_SRIOV_NUM_BARS][M64_PER_IOV];
 #endif /* CONFIG_PCI_IOV */
 #endif
struct list_head child_list;
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index a4e78ab..ea02131 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -984,26 +984,27 @@ static int pnv_pci_vf_release_m64(struct pci_dev *pdev)
struct pci_controller *hose;
struct pnv_phb*phb;
struct pci_dn *pdn;
-   inti;
+   inti, j;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);
 
-   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
-   if (pdn->m64_wins[i] == IODA_INVALID_M64)
-   continue;
-   opal_pci_phb_mmio_enable(phb->opal_id,
-   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i], 0);
-   clear_bit(pdn->m64_wins[i], &phb->ioda.m64_bar_alloc);
-   pdn->m64_wins[i] = IODA_INVALID_M64;
-   }
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+   for (j = 0; j < M64_PER_IOV; j++) {
+   if (pdn->m64_wins[i][j] == IODA_INVALID_M64)
+   continue;
+   opal_pci_phb_mmio_enable(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, pdn->m64_wins[i][j], 0);
+   clear_bit(pdn->m64_wins[i][j], &phb->ioda.m64_bar_alloc);
+   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+   }
 
return 0;
 }
 
-static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
+static int pnv_pci_vf_assign_m64(struct pci_dev *pdev, u16 vf_num)
 {
struct pci_bus*bus;
struct pci_controller *hose;
@@ -1011,17 +1012,33 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
struct pci_dn *pdn;
unsigned int   win;
struct resource   *res;
-   int i;
+   int i, j;
int64_t rc;
+   int total_vfs;
+   resource_size_t size, start;
+   int pe_num;
+   int vf_groups;
+   int vf_per_group;
 
bus = pdev->bus;
hose = pci_bus_to_host(bus);
phb = hose->private_data;
pdn = pci_get_pdn(pdev);
+   total_vfs = pci_sriov_get_totalvfs(pdev);
 
/* Initialize the m64_wins to IODA_INVALID_M64 */
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
-   pdn->m64_wins[i] = IODA_INVALID_M64;
+   for (j = 0; j < M64_PER_IOV; j++)
+   pdn->m64_wins[i][j] = IODA_INVALID_M64;
+
+   if (pdn->m64_per_iov == M64_PER_IOV) {
+   vf_groups = (vf_num <= M64_PER_IOV) ? vf_num: M64_PER_IOV;
+   vf_per_group = (vf_num <= M64_PER_IOV)? 1:
+   __roundup_pow_of_two(vf_num) / pdn->m64_per_iov;
+   } else {
+   vf_groups = 1;
+   vf_per_group = 1;
+   }
 
for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
res = pdev->resource + PCI_IOV_RESOURCES + i;
@@ -1031,33 +1048,61 @@ static int pnv_pci_vf_assign_m64(struct pci_dev *pdev)
if (!pnv_pci_is_mem_pref_64(res->flags))
continue;
 
-   do {
-   win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
-   phb->ioda.m64_bar_idx + 1, 0);
-
-   if (win >= phb->ioda.m64_bar_idx + 1)
-   goto m64_failed;
-   } while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
+   for (j = 0; j < vf_groups; j++) {
+   do {
+   win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
+   phb->ioda.m64_bar_idx + 1, 0);
+
+   if (win >= phb->ioda.m64_bar_idx + 1)
+   goto m64_faile

[PATCH V3 1/2] powerpc/mm/thp: Remove code duplication

2014-11-02 Thread Aneesh Kumar K.V
Rename invalidate_old_hpte to flush_hash_hugepage and use that in
other places.
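
As background for the moved function, here is a minimal userspace sketch of
how one byte of the per-hugepage slot array is decoded and turned into a
hash table slot number, following the layout comment in the code
("secondary group (one bit) | hidx (3 bits) | valid bit"). The macro values
and the helper names slot_entry_valid(), slot_entry_hidx() and
hpte_slot_sketch() are assumptions chosen to match that comment, not copies
of the kernel definitions.

```c
/* Assumed values, chosen to match the layout comment in the patch. */
#define SLOT_VALID		0x1UL	/* bit 0: entry is valid */
#define PTEIDX_SECONDARY	0x8UL	/* hidx bit 3: secondary hash group */
#define PTEIDX_GROUP_IX		0x7UL	/* hidx bits 0-2: index in the group */
#define HPTES_PER_GROUP		8UL

/* Non-zero if entry i of the slot array describes a valid HPTE. */
static int slot_entry_valid(const unsigned char *slot_array, int i)
{
	return slot_array[i] & SLOT_VALID;
}

/* The 4-bit hidx (secondary bit plus group index) of entry i. */
static unsigned long slot_entry_hidx(const unsigned char *slot_array, int i)
{
	return slot_array[i] >> 1;
}

/* Global slot number: group base plus the index within the group. */
static unsigned long hpte_slot_sketch(unsigned long hash, unsigned long hidx,
				      unsigned long hash_mask)
{
	unsigned long slot;

	/* Entries in the secondary group use the inverted hash. */
	if (hidx & PTEIDX_SECONDARY)
		hash = ~hash;

	slot = (hash & hash_mask) * HPTES_PER_GROUP;
	slot += hidx & PTEIDX_GROUP_IX;
	return slot;
}
```

The per-entry loop in flush_hash_hugepage() performs these steps for every
valid entry before calling ppc_md.hpte_invalidate() on the resulting slot.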

Signed-off-by: Aneesh Kumar K.V 
---
Changes from V2:
* split the patch for easier review
* Rebase to latest linus tree

 arch/powerpc/include/asm/tlbflush.h |  3 +-
 arch/powerpc/mm/hash_utils_64.c | 52 ++
 arch/powerpc/mm/hugepage-hash64.c   | 54 ++-
 arch/powerpc/mm/pgtable_64.c| 64 ++---
 4 files changed, 65 insertions(+), 108 deletions(-)

diff --git a/arch/powerpc/include/asm/tlbflush.h b/arch/powerpc/include/asm/tlbflush.h
index 2def01ed0cb2..afe57427ef8e 100644
--- a/arch/powerpc/include/asm/tlbflush.h
+++ b/arch/powerpc/include/asm/tlbflush.h
@@ -127,7 +127,8 @@ static inline void arch_leave_lazy_mmu_mode(void)
 extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize,
int ssize, int local);
 extern void flush_hash_range(unsigned long number, int local);
-
+extern void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
+   pmd_t *pmdp, unsigned int psize, int ssize);
 
 static inline void local_flush_tlb_mm(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index d5339a3b9945..26517ea34970 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1315,6 +1315,58 @@ void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
 #endif
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
+pmd_t *pmdp, unsigned int psize, int ssize)
+{
+   int i, max_hpte_count, valid;
+   unsigned long s_addr;
+   unsigned char *hpte_slot_array;
+   unsigned long hidx, shift, vpn, hash, slot;
+
+   s_addr = addr & HPAGE_PMD_MASK;
+   hpte_slot_array = get_hpte_slot_array(pmdp);
+   /*
+* IF we try to do a HUGE PTE update after a withdraw is done.
+* we will find the below NULL. This happens when we do
+* split_huge_page_pmd
+*/
+   if (!hpte_slot_array)
+   return;
+
+   if (ppc_md.hugepage_invalidate)
+   return ppc_md.hugepage_invalidate(vsid, s_addr, hpte_slot_array,
+ psize, ssize);
+   /*
+* No bluk hpte removal support, invalidate each entry
+*/
+   shift = mmu_psize_defs[psize].shift;
+   max_hpte_count = HPAGE_PMD_SIZE >> shift;
+   for (i = 0; i < max_hpte_count; i++) {
+   /*
+* 8 bits per each hpte entries
+* 000| [ secondary group (one bit) | hidx (3 bits) | valid bit]
+*/
+   valid = hpte_valid(hpte_slot_array, i);
+   if (!valid)
+   continue;
+   hidx =  hpte_hash_index(hpte_slot_array, i);
+
+   /* get the vpn */
+   addr = s_addr + (i * (1ul << shift));
+   vpn = hpt_vpn(addr, vsid, ssize);
+   hash = hpt_hash(vpn, shift, ssize);
+   if (hidx & _PTEIDX_SECONDARY)
+   hash = ~hash;
+
+   slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
+   slot += hidx & _PTEIDX_GROUP_IX;
+   ppc_md.hpte_invalidate(slot, vpn, psize,
+  MMU_PAGE_16M, ssize, 0);
+   }
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
 void flush_hash_range(unsigned long number, int local)
 {
if (ppc_md.flush_hash_range)
diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
index 5f5e6328c21c..1b3ad46a71b5 100644
--- a/arch/powerpc/mm/hugepage-hash64.c
+++ b/arch/powerpc/mm/hugepage-hash64.c
@@ -18,57 +18,6 @@
 #include 
 #include 
 
-static void invalidate_old_hpte(unsigned long vsid, unsigned long addr,
-   pmd_t *pmdp, unsigned int psize, int ssize)
-{
-   int i, max_hpte_count, valid;
-   unsigned long s_addr;
-   unsigned char *hpte_slot_array;
-   unsigned long hidx, shift, vpn, hash, slot;
-
-   s_addr = addr & HPAGE_PMD_MASK;
-   hpte_slot_array = get_hpte_slot_array(pmdp);
-   /*
-* IF we try to do a HUGE PTE update after a withdraw is done.
-* we will find the below NULL. This happens when we do
-* split_huge_page_pmd
-*/
-   if (!hpte_slot_array)
-   return;
-
-   if (ppc_md.hugepage_invalidate)
-   return ppc_md.hugepage_invalidate(vsid, s_addr, hpte_slot_array,
- psize, ssize);
-   /*
-* No bluk hpte removal support, invalidate each entry
-*/
-   shift = mmu_psize_defs[psize].shift;
-   max_hpte_count = HPAGE_PMD_SIZE >> shift;
-   for (i = 0; i < max_hpte_count; i++) {
-   /*
-* 8 bits per each hp

[PATCH V3 2/2] powerpc/mm/thp: Use tlbiel if possible

2014-11-02 Thread Aneesh Kumar K.V
If we know that a user address space has never executed on other CPUs,
we can use tlbiel.
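
To illustrate the condition in the sentence above, here is a minimal
userspace sketch of the "never executed on other CPUs" test. How the patch
itself derives the local flag is not visible in this excerpt; the bitmask
representation and the name can_flush_locally() are assumptions made purely
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: mm_cpu_mask has one bit set for every CPU the
 * address space has ever run on. A local-only flush is safe only when
 * that set is exactly the current CPU.
 */
static int can_flush_locally(uint64_t mm_cpu_mask, unsigned int this_cpu)
{
	assert(this_cpu < 64);
	return mm_cpu_mask == (UINT64_C(1) << this_cpu);
}
```

An empty mask or a mask with any other CPU set returns 0, which corresponds
to passing local = 0 and using the broadcast tlbie.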

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/machdep.h|  2 +-
 arch/powerpc/include/asm/tlbflush.h   |  3 ++-
 arch/powerpc/mm/hash_native_64.c  |  4 ++--
 arch/powerpc/mm/hash_utils_64.c   | 28 +++-
 arch/powerpc/mm/hugepage-hash64.c |  2 +-
 arch/powerpc/mm/pgtable_64.c  |  9 +++--
 arch/powerpc/platforms/pseries/lpar.c |  2 +-
 7 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 307347f8ddbd..ccc9f9bd1605 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -60,7 +60,7 @@ struct machdep_calls {
void    (*hugepage_invalidate)(unsigned long vsid,
   unsigned long addr,
   unsigned char *hpte_slot_array,
-  int psize, int ssize);
+  int psize, int ssize, int local);
/* special for kexec, to be called in real mode, linear mapping is
 * destroyed as well */
void    (*hpte_clear_all)(void);
diff --git a/arch/powerpc/include/asm/tlbflush.h b/arch/powerpc/include/asm/tlbflush.h
index afe57427ef8e..6a5c1774b32c 100644
--- a/arch/powerpc/include/asm/tlbflush.h
+++ b/arch/powerpc/include/asm/tlbflush.h
@@ -128,7 +128,8 @@ extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize,
int ssize, int local);
 extern void flush_hash_range(unsigned long number, int local);
 extern void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
-   pmd_t *pmdp, unsigned int psize, int ssize);
+   pmd_t *pmdp, unsigned int psize, int ssize,
+   int local);
 
 static inline void local_flush_tlb_mm(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index ae4962a06476..459840d9b7d3 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -419,7 +419,7 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
 static void native_hugepage_invalidate(unsigned long vsid,
   unsigned long addr,
   unsigned char *hpte_slot_array,
-  int psize, int ssize)
+  int psize, int ssize, int local)
 {
int i;
struct hash_pte *hptep;
@@ -465,7 +465,7 @@ static void native_hugepage_invalidate(unsigned long vsid,
 * instruction compares entry_VA in tlb with the VA specified
 * here
 */
-   tlbie(vpn, psize, actual_psize, ssize, 0);
+   tlbie(vpn, psize, actual_psize, ssize, local);
}
local_irq_restore(flags);
 }
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 26517ea34970..7a6fa267d1f4 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1317,7 +1317,7 @@ void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, int ssize,
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
-pmd_t *pmdp, unsigned int psize, int ssize)
+pmd_t *pmdp, unsigned int psize, int ssize, int local)
 {
int i, max_hpte_count, valid;
unsigned long s_addr;
@@ -1334,9 +1334,11 @@ void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
if (!hpte_slot_array)
return;
 
-   if (ppc_md.hugepage_invalidate)
-   return ppc_md.hugepage_invalidate(vsid, s_addr, hpte_slot_array,
- psize, ssize);
+   if (ppc_md.hugepage_invalidate) {
+   ppc_md.hugepage_invalidate(vsid, s_addr, hpte_slot_array,
+  psize, ssize, local);
+   goto tm_abort;
+   }
/*
 * No bluk hpte removal support, invalidate each entry
 */
@@ -1362,8 +1364,24 @@ void flush_hash_hugepage(unsigned long vsid, unsigned long addr,
slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
slot += hidx & _PTEIDX_GROUP_IX;
ppc_md.hpte_invalidate(slot, vpn, psize,
-  MMU_PAGE_16M, ssize, 0);
+  MMU_PAGE_16M, ssize, local);
+   }
+tm_abort:
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+   /* Transactions are not aborted by tlbiel, only tlbie.
+* Without, syncing a page back to a block device w/ PIO could pick up
+* transactional data (bad!) s

Re: [PATCH] powerpc: Replace cc constraint in inline assembly with cr0

2014-11-02 Thread Anton Blanchard
Hi Segher,

> > Our inline assembly only clobbers the first condition register
> > field, but we mark all of them as being clobbered.
> 
> No, we don't.  "cc" has been an alias for cr0 for over twenty two and
> a half years now; it has never changed meaning.  This is an LLVM bug.

Thanks! I opened http://llvm.org/bugs/show_bug.cgi?id=21451 to track it.

Anton

[PATCH] powerpc/pseries: Quieten ibm,pcie-link-speed-stats warning

2014-11-02 Thread Anton Blanchard
The ibm,pcie-link-speed-stats property isn't mandatory, so we shouldn't
print a high-priority error message when it is missing. One example
where we see this is QEMU.

Reduce it to pr_debug.

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/platforms/pseries/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
index 67e4859..fe16a50 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -134,7 +134,7 @@ int pseries_root_bridge_prepare(struct pci_host_bridge *bridge)
of_node_put(pdn);
 
if (rc) {
-   pr_err("no ibm,pcie-link-speed-stats property\n");
+   pr_debug("no ibm,pcie-link-speed-stats property\n");
return 0;
}
 
-- 
1.9.1


[PATCH] powerpc/pseries: Quieten relocation on exceptions warning

2014-11-02 Thread Anton Blanchard
The H_SET_MODE hcall returns H_P2 if a function is not implemented,
and all callers should handle this case.

The call to enable relocation on exceptions currently prints an error
message if the feature is not implemented. While H_SET_MODE was
first introduced on POWER8 (which has relocation on exceptions), it
has now been added on some POWER7 configurations (which do not).

Check for H_P2 and print an informational message instead.
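
The resulting three-way decision can be sketched in isolation as below.
This is a userspace illustration only: the numeric values given to
H_SUCCESS and H_P2, the reloc_log enum and classify_reloc_rc() are
assumptions for the example and are not taken from this patch.

```c
/* Assumed hcall return values, for illustration only. */
#define H_SUCCESS	0L
#define H_P2		(-55L)

/* Hypothetical log levels standing in for nothing/pr_info/pr_warn. */
enum reloc_log { RELOC_LOG_NONE, RELOC_LOG_INFO, RELOC_LOG_WARN };

static enum reloc_log classify_reloc_rc(long rc)
{
	/* Function not implemented by the hypervisor: informational only. */
	if (rc == H_P2)
		return RELOC_LOG_INFO;

	/* Any other failure is unexpected and still deserves a warning. */
	if (rc != H_SUCCESS)
		return RELOC_LOG_WARN;

	return RELOC_LOG_NONE;
}
```

Checking H_P2 first keeps the "not supported" case out of the generic
failure path, which is the ordering used in the diff below.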

Signed-off-by: Anton Blanchard 
---
 arch/powerpc/platforms/pseries/setup.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 125c589..fcc9227 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -499,7 +499,11 @@ static void __init pSeries_setup_arch(void)
 
if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
long rc;
-   if ((rc = pSeries_enable_reloc_on_exc()) != H_SUCCESS) {
+
+   rc = pSeries_enable_reloc_on_exc();
+   if (rc == H_P2) {
+   pr_info("Relocation on exceptions not supported\n");
+   } else if (rc != H_SUCCESS) {
pr_warn("Unable to enable relocation on exceptions: "
"%ld\n", rc);
}
-- 
1.9.1


[PATCH 2/2] powerpc: Fix compilation of emulate_step()

2014-11-02 Thread Paul Mackerras
Commit be96f63375a1 ("powerpc: Split out instruction analysis
part of emulate_step()") added some calls to do_fp_load()
and do_fp_store(), which fail to compile on configs with
CONFIG_PPC_FPU=n and CONFIG_PPC_EMULATE_SSTEP=y.  This fixes
the compile by adding #ifdef CONFIG_PPC_FPU around the code
that calls these functions.
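
The pattern is easy to show in a stripped-down form. In the sketch below,
SKETCH_HAVE_FPU is a hypothetical stand-in for CONFIG_PPC_FPU, and
emulate_sketch(), do_fp_load_sketch() and the op_type values are invented
for the example; the point is only that a case label and the helper it
calls must sit under the same #ifdef.

```c
/* SKETCH_HAVE_FPU is a hypothetical stand-in for CONFIG_PPC_FPU. */
enum op_type { OP_LOAD, OP_LOAD_FP, OP_STORE };

#ifdef SKETCH_HAVE_FPU
/* Only exists when FP support is configured in. */
static int do_fp_load_sketch(void)
{
	return 1;
}
#endif

/* Returns 1 if the operation was emulated, 0 if it was not handled. */
static int emulate_sketch(enum op_type type)
{
	switch (type) {
	case OP_LOAD:
	case OP_STORE:
		return 1;
#ifdef SKETCH_HAVE_FPU
	case OP_LOAD_FP:
		/* Guarded together with the helper it depends on. */
		return do_fp_load_sketch();
#endif
	default:
		return 0;
	}
}
```

Built without SKETCH_HAVE_FPU, the file still compiles and OP_LOAD_FP
simply takes the default path in this sketch.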

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/lib/sstep.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 54651fc..dc885b3 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1865,6 +1865,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr)
}
goto ldst_done;
 
+#ifdef CONFIG_PPC_FPU
case LOAD_FP:
if (regs->msr & MSR_LE)
return 0;
@@ -1873,7 +1874,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr)
else
err = do_fp_load(op.reg, do_lfd, op.ea, size, regs);
goto ldst_done;
-
+#endif
 #ifdef CONFIG_ALTIVEC
case LOAD_VMX:
if (regs->msr & MSR_LE)
@@ -1919,6 +1920,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr)
err = write_mem(op.val, op.ea, size, regs);
goto ldst_done;
 
+#ifdef CONFIG_PPC_FPU
case STORE_FP:
if (regs->msr & MSR_LE)
return 0;
@@ -1927,7 +1929,7 @@ int __kprobes emulate_step(struct pt_regs *regs, unsigned int instr)
else
err = do_fp_store(op.reg, do_stfd, op.ea, size, regs);
goto ldst_done;
-
+#endif
 #ifdef CONFIG_ALTIVEC
case STORE_VMX:
if (regs->msr & MSR_LE)
-- 
2.1.1


[PATCH 1/2] powerpc: Save/restore PPR for KVM hypercalls

2014-11-02 Thread Paul Mackerras
From: "Suresh E. Warrier" 

The system call FLIH (first-level interrupt handler) at 0xc00
unconditionally sets the hardware priority to medium. For hypercalls,
this means we lose the guest OS priority. The front end (do_kvm_0x**)
to the KVM interrupt handler always assumes that the PPR priority is
saved in the PACA exception save area, so it copies this to the
kvm_hstate structure. For hypercalls, this would be the saved priority
from any previous exception. Eventually, the guest gets resumed with an
incorrect priority.

The fix is to save the PPR priority in the PACA exception save area
before switching HMT priorities in the FLIH, so that the existing code
in the KVM interrupt handler described above can copy it from there
into the VCPU's saved context.

Signed-off-by: Suresh Warrier 
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kernel/exceptions-64s.S | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 72e783e..f67d909 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -292,15 +292,26 @@ decrementer_pSeries:
. = 0xc00
.globl  system_call_pSeries
 system_call_pSeries:
-   HMT_MEDIUM
+   /*
+* Switch to HMT medium priority on systems where we don't support
+* saving/restoring PPR or if CONFIG_KVM_BOOK3S_64_HANDLER is not
+* set. Otherwise, we save PPR in the CONFIG_KVM_BOOK3S_64_HANDLER
+* path before switching priority.
+*/
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+   HMT_MEDIUM_PPR_DISCARD
SET_SCRATCH0(r13)
GET_PACA(r13)
std r9,PACA_EXGEN+EX_R9(r13)
+   OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR);
+   HMT_MEDIUM;
std r10,PACA_EXGEN+EX_R10(r13)
+   OPT_SAVE_REG_TO_PACA(PACA_EXGEN+EX_PPR, r9, CPU_FTR_HAS_PPR);
mfcr    r9
KVMTEST(0xc00)
GET_SCRATCH0(r13)
+#else
+   HMT_MEDIUM;
 #endif
SYSCALL_PSERIES_1
SYSCALL_PSERIES_2_RFID
-- 
2.1.1
