[PATCH powerpc/next RESEND] powerpc: spinlock: Fix spin_unlock_wait()

2016-04-19 Thread Boqun Feng
There is an ordering issue with spin_unlock_wait() on powerpc, because
the spin_lock primitive is an ACQUIRE and an ACQUIRE is only ordering
the load part of the operation with memory operations following it.
Therefore the following event sequence can happen:

CPU 1                    CPU 2                                  CPU 3
=======================  =====================================  ==============
                                                                spin_unlock();
                         spin_lock():
                           r1 = *lock; // r1 == 0;
o = object;              o = READ_ONCE(object); // reordered here
object = NULL;
smp_mb();
spin_unlock_wait();
                           *lock = 1;
smp_mb();
o->dead = true;          < o = READ_ONCE(object); > // reordered upwards
                         if (o) // true
                           BUG_ON(o->dead); // true!!

To fix this, we add a "nop" ll/sc loop in arch_spin_unlock_wait() on
ppc (arch_spin_is_locked_sync()). The "nop" ll/sc loop reads the lock
value and writes it back atomically, which synchronizes the
view of the lock on CPU1 with that on CPU2. Therefore, in the scenario
above, either CPU2 will fail to get the lock at first or CPU1 will see
the lock acquired by CPU2; both cases eliminate this bug. This is
similar to what Will Deacon did for ARM64 in:

"arm64: spinlock: serialise spin_unlock_wait against concurrent lockers"

Furthermore, if arch_spin_is_locked_sync() finds that the lock is
locked, we don't need to do the "nop" ll/sc trick again; we can
just do a normal load+check loop waiting for the lock to be released. In
that case, spin_unlock_wait() is called while someone is holding the
lock, and the store part of arch_spin_is_locked_sync() happens before
the unlock by the current lock holder, which means
arch_spin_is_locked_sync() happens before the next lock acquisition.
With the smp_mb() preceding spin_unlock_wait(), the store to object is
guaranteed to be observed by the next lock holder.

Please note spin_unlock_wait() on powerpc is still not an ACQUIRE after
this fix; callers should add the necessary barriers if they want to
promote it, as all the current callers do.
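
As a rough userspace analogy (not the kernel implementation), the "nop"
ll/sc loop can be pictured as an atomic read-modify-write that stores back
the value it read. The sketch below uses C11 atomics; toy_lock_t,
toy_is_locked_sync() and toy_unlock_wait() are hypothetical names, and
atomic_fetch_add() with an addend of 0 stands in for the lwarx/stwcx. pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock: 0 means unlocked, non-zero means held. */
typedef struct { atomic_uint slock; } toy_lock_t;

/*
 * Read the lock word with an atomic read-modify-write that adds 0, so the
 * value is unchanged but the access is still a store to the lock word,
 * unlike a plain load.
 */
static bool toy_is_locked_sync(toy_lock_t *lock)
{
	unsigned int val = atomic_fetch_add_explicit(&lock->slock, 0,
						     memory_order_relaxed);
	return val != 0;
}

static void toy_unlock_wait(toy_lock_t *lock)
{
	/* Order the caller's earlier accesses before the lock access. */
	atomic_thread_fence(memory_order_seq_cst);

	if (!toy_is_locked_sync(lock))
		return;

	/* Someone holds the lock: plain loads suffice while waiting. */
	while (atomic_load_explicit(&lock->slock, memory_order_relaxed) != 0)
		;

	/* No barrier here; as in the patch, callers add one if needed. */
}
```

The sketch keeps the same shape as the patch: one full barrier, one
synchronizing read of the lock, then an ordinary spin only if the lock was
seen held.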

This patch therefore fixes the issue and also cleans the
arch_spin_unlock_wait() a little bit by removing superfluous memory
barriers in loops and consolidating the implementations for PPC32 and
PPC64 into one.

Suggested-by: "Paul E. McKenney" 
Signed-off-by: Boqun Feng 
Reviewed-by: "Paul E. McKenney" 
---
 arch/powerpc/include/asm/spinlock.h | 48 -
 arch/powerpc/lib/locks.c| 16 -
 2 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 523673d7583c..0a517c1a751e 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -64,6 +64,25 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 }
 
 /*
+ * Use a ll/sc loop to read the lock value, the STORE part of this operation is
+ * used for making later lock operation observe it.
+ */
+static inline bool arch_spin_is_locked_sync(arch_spinlock_t *lock)
+{
+   arch_spinlock_t tmp;
+
+   __asm__ __volatile__(
+"1:" PPC_LWARX(%0, 0, %2, 1) "\n"
+"  stwcx. %0, 0, %2\n"
+"  bne- 1b\n"
+   : "=&r" (tmp), "+m" (*lock)
+   : "r" (lock)
+   : "cr0", "xer");
+
+   return !arch_spin_value_unlocked(tmp);
+}
+
+/*
  * This returns the old value in the lock, so we succeeded
  * in getting the lock if the return value is 0.
  */
@@ -162,12 +181,29 @@ static inline void arch_spin_unlock(arch_spinlock_t *lock)
lock->slock = 0;
 }
 
-#ifdef CONFIG_PPC64
-extern void arch_spin_unlock_wait(arch_spinlock_t *lock);
-#else
-#define arch_spin_unlock_wait(lock) \
-   do { while (arch_spin_is_locked(lock)) cpu_relax(); } while (0)
-#endif
+static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
+{
+   /*
+* Make sure previous loads and stores are observed by other cpu, this
+* pairs with the ACQUIRE barrier in lock.
+*/
+   smp_mb();
+
+   if (!arch_spin_is_locked_sync(lock))
+   return;
+
+   while (!arch_spin_value_unlocked(*lock)) {
+   HMT_low();
+   if (SHARED_PROCESSOR)
+   __spin_yield(lock);
+   }
+   HMT_medium();
+
+   /*
+* No barrier here, caller either relies on the control dependency or
+* should add a necessary barrier afterwards.
+*/
+}
 
 /*
  * Read-write spinlocks, allowing multiple readers
diff --git a/arch/powerpc/lib/locks.c b/arch/powerpc/lib/locks.c
index f7deebdf3365..b7b1237d4aa6 100644
--- a/arch/powerpc/lib/locks.c
+++ b/arch/powerpc/lib/locks.c
@@ -68,19 +68,3 @@ void __rw_yield(arch_rwlock_t *rw)
get_hard_smp_processor_id(holder_cpu), 

Re: [PATCH v8 17/45] powerpc/powernv/ioda1: Improve DMA32 segment track

2016-04-19 Thread Alexey Kardashevskiy

On 04/20/2016 10:49 AM, Gavin Shan wrote:

On Tue, Apr 19, 2016 at 11:50:10AM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:44 PM, Gavin Shan wrote:

In the current implementation, the DMA32 segments required by one specific
PE aren't calculated from the information held in the PE independently.
This conflicts with the PCI hotplug design, which is PE-centric: the
PE's DMA32 segments should be calculated from the information held in
the PE independently.

This introduces an array (@dma32_segmap) for every PHB to track
DMA32 segment usage. Besides, this moves the logic calculating a PE's
consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe() so that the PE's
DMA32 segments are calculated/allocated from the information held in
the PE (DMA32 weight). The logic is also improved: we try to allocate
as many DMA32 segments as we can. It's acceptable for fewer DMA32
segments than expected to be allocated.
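
The allocation policy described above can be sketched in isolation as
follows. This is only an illustration under assumptions, not the patch's
code: the toy_* names and TOY_INVALID_PE are hypothetical stand-ins for the
PHB's dma32_segmap and IODA_INVALID_PE, and total_weight is assumed to be
non-zero.

```c
#include <assert.h>

#define TOY_INVALID_PE  (-1)   /* stands in for IODA_INVALID_PE */

/* Expected segment count from the PE's share of the total DMA weight. */
static unsigned int toy_expected_segs(unsigned int weight,
				      unsigned int total_weight,
				      unsigned int count)
{
	unsigned int segs = (weight * count) / total_weight;

	return segs ? segs : 1;	/* every PE asks for at least one */
}

/*
 * Find the lowest run of `segs` free entries in segmap[0..count-1].
 * Returns the base index, or -1 if no such run exists.
 */
static int toy_find_free_run(const int *segmap, unsigned int count,
			     unsigned int segs)
{
	unsigned int base, i;

	if (segs == 0 || segs > count)
		return -1;

	for (base = 0; base <= count - segs; base++) {
		for (i = base; i < base + segs; i++) {
			if (segmap[i] != TOY_INVALID_PE)
				break;
		}
		if (i >= base + segs)
			return (int)base;
	}
	return -1;
}

/*
 * Claim up to `want` contiguous segments for `pe`, retrying with one
 * segment fewer after each failed attempt. Returns the number of segments
 * claimed (0 if none) and stores the base index in *base_out.
 */
static unsigned int toy_alloc_segs(int *segmap, unsigned int count,
				   unsigned int want, int pe, int *base_out)
{
	unsigned int segs, i;

	for (segs = want; segs > 0; segs--) {
		int base = toy_find_free_run(segmap, count, segs);

		if (base < 0)
			continue;
		for (i = 0; i < segs; i++)
			segmap[base + i] = pe;
		*base_out = base;
		return segs;
	}
	return 0;
}
```

With a fragmented map, a request for four segments may end up with three,
which is the "fewer than expected" case the commit message accepts.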

Signed-off-by: Gavin Shan 



This DMA segments business was the reason why I have not even tried
implementing DDW for POWER7 - it is way too different from POWER8 and there
is no chance that anyone outside Ozlabs will ever try using this in practice;
the same applies to PCI hotplug on POWER7.

I suggest ditching all IODA1 changes from this patchset, as this code
will hang around (unused) for maybe a year or so and then will be gone, like
p5ioc2.



As far as I know, some P7 boxes outside Ozlabs run this software stack. At
least, I was relying heavily on a P7 box + PowerNV-based Linux until
September of last year.


And yet you have not replaced a single physical device on any of our power7 
boxes ;)



My original thoughts are as below. If they're
convincing, I can drop some of IODA1 changes, but not all of them obviously:

- In case a customer wants to use this combo (P7 box + PowerNV) for any reason.


I have serious doubts we have any customer like this. Or a developer who 
would want this. And OPAL on P7 does not support this either.



- In case developers want to use this combo (P7 box + PowerNV) for any reason.
   For example, no P8 boxes can be found for one particular project, but an
   available P7 box is still OK for that.


Testing POWER8 PCI hotplug on POWER7 machine is kind of pointless anyway.



- EEH support on P7/P8 needs hotplug in some cases: when hitting excessive
   failures, PCI devices and their platform resources (PE, DMA, M32/M64
   mapping etc.) should be purged.


EEH recovery should not require resource reallocation, no?


- The current implementation has P7/P8 mixed up to some extent, which isn't so
   good, as Ben pointed out a long time ago. It's impossible not to affect the
   P7IOC piece if the P8 piece is changed in order to support hotplug.


This is understandable.


I'll leave it to Ben.





---
  arch/powerpc/platforms/powernv/pci-ioda.c | 111 +-
  arch/powerpc/platforms/powernv/pci.h  |   7 +-
  2 files changed, 66 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 0fc2309..59782fba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2007,20 +2007,54 @@ static unsigned int pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)
  }

  static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
-  struct pnv_ioda_pe *pe,
-  unsigned int base,
-  unsigned int segs)
+  struct pnv_ioda_pe *pe)
  {

struct page *tce_mem = NULL;
struct iommu_table *tbl;
-   unsigned int tce32_segsz, i;
+   unsigned int weight, total_weight;
+   unsigned int tce32_segsz, base, segs, i;
int64_t rc;
void *addr;

/* XXX FIXME: Handle 64-bit only DMA devices */
/* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
/* XXX FIXME: Allocate multi-level tables on PHB3 */
+   total_weight = pnv_pci_ioda_total_dma_weight(phb);
+   weight = pnv_pci_ioda_pe_dma_weight(pe);
+
+   segs = (weight * phb->ioda.dma32_count) / total_weight;
+   if (!segs)
+   segs = 1;
+
+   /*
+* Allocate contiguous DMA32 segments. We begin with the expected
+* number of segments. With one more attempt, the number of DMA32
+* segments to be allocated is decreased by one until one segment
+* is allocated successfully.
+*/
+   while (segs) {
+   for (base = 0; base <= phb->ioda.dma32_count - segs; base++) {
+   for (i = base; i < base + segs; i++) {
+   if (phb->ioda.dma32_segmap[i] !=
+   IODA_INVALID_PE)
+   break;
+   }
+
+   if (i >= base + segs)
+   

Re: [PATCH V11 0/4]perf/powerpc: Add ability to sample intr machine state in powerpc

2016-04-19 Thread Michael Ellerman
On Wed, 2016-04-20 at 00:57 -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Apr 18, 2016 at 03:17:11PM +0530, Anju T escreveu:
> > On Saturday 20 February 2016 10:32 AM, Anju T wrote:
> > > 
> > >  arch/powerpc/Kconfig|  1 +
> > >  arch/powerpc/include/uapi/asm/perf_regs.h   | 50 
> > >  arch/powerpc/perf/Makefile  |  1 +
> > >  arch/powerpc/perf/perf_regs.c   | 91 
> > > +
> > >  tools/perf/arch/powerpc/include/perf_regs.h | 69 ++
> > >  tools/perf/arch/powerpc/util/Build  |  1 +
> > >  tools/perf/arch/powerpc/util/perf_regs.c| 49 
> > >  tools/perf/config/Makefile  |  5 ++
> > >  8 files changed, 267 insertions(+)
> > >  create mode 100644 arch/powerpc/include/uapi/asm/perf_regs.h
> > >  create mode 100644 arch/powerpc/perf/perf_regs.c
> > >  create mode 100644 tools/perf/arch/powerpc/include/perf_regs.h
> > >  create mode 100644 tools/perf/arch/powerpc/util/perf_regs.c
> > > 
> > 
> > Hi,
> > 
> > Can this be taken into the next tree?
> 
> Even the bits in tools/perf/ are arch specific, so I guess this goes via
> the powerpc tree? Michael?

Yeah if that's OK with you.

It doesn't look like it will generate much in the way of merge conflicts.

Do you want to send an ack?

cheers

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 38/45] powerpc/powernv: Functions to get/set PCI slot status

2016-04-19 Thread Alexey Kardashevskiy

On 04/20/2016 12:36 PM, Gavin Shan wrote:

On Tue, Apr 19, 2016 at 07:39:34PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:44 PM, Gavin Shan wrote:

This exports 4 functins, which base on the corresponding OPAL



s/functins/functions/



Thanks.


APIs to get/set PCI slot status. Those functions are going to
be used by PowerNV PCI hotplug driver:

pnv_pci_get_device_tree()     opal_get_device_tree()
pnv_pci_get_presence_state()  opal_pci_get_presence_state()
pnv_pci_get_power_state()     opal_pci_get_power_state()
pnv_pci_set_power_state()     opal_pci_set_power_state()

Besides, the patch also exports pnv_pci_hotplug_notifier_{register,
unregister}() to allow registration and unregistration of PCI hotplug
notifier, which will be used to receive PCI hotplug message from
skiboot firmware in PowerNV PCI hotplug driver.

Signed-off-by: Gavin Shan 
---
  arch/powerpc/include/asm/opal-api.h| 17 ++-
  arch/powerpc/include/asm/opal.h|  4 ++
  arch/powerpc/include/asm/pnv-pci.h |  7 +++
  arch/powerpc/platforms/powernv/opal-wrappers.S |  4 ++
  arch/powerpc/platforms/powernv/pci.c   | 66 ++
  5 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index f8faaae..a6af338 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -158,7 +158,11 @@
  #define OPAL_LEDS_SET_INDICATOR   115
  #define OPAL_CEC_REBOOT2  116
  #define OPAL_CONSOLE_FLUSH117
-#define OPAL_LAST  117
+#define OPAL_GET_DEVICE_TREE   118
+#define OPAL_PCI_GET_PRESENCE_STATE119
+#define OPAL_PCI_GET_POWER_STATE   120
+#define OPAL_PCI_SET_POWER_STATE   121
+#define OPAL_LAST  121

  /* Device tree flags */

@@ -344,6 +348,16 @@ enum OpalPciResetState {
OPAL_ASSERT_RESET   = 1
  };

+enum OpalPciSlotPresentenceState {
+   OPAL_PCI_SLOT_EMPTY = 0,
+   OPAL_PCI_SLOT_PRESENT   = 1
+};
+
+enum OpalPciSlotPowerState {
+   OPAL_PCI_SLOT_POWER_OFF = 0,
+   OPAL_PCI_SLOT_POWER_ON  = 1
+};
+
  enum OpalSlotLedType {
OPAL_SLOT_LED_TYPE_ID = 0,  /* IDENTIFY LED */
OPAL_SLOT_LED_TYPE_FAULT = 1,   /* FAULT LED */
@@ -378,6 +392,7 @@ enum opal_msg_type {
OPAL_MSG_DPO,
OPAL_MSG_PRD,
OPAL_MSG_OCC,
+   OPAL_MSG_PCI_HOTPLUG,
OPAL_MSG_TYPE_MAX,
  };

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9e0039f..899bcb941 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -209,6 +209,10 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
uint64_t size, uint64_t token);
  int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);
+int64_t opal_get_device_tree(uint32_t phandle, uint64_t buf, uint64_t len);
+int64_t opal_pci_get_presence_state(uint64_t id, uint8_t *state);
+int64_t opal_pci_get_power_state(uint64_t id, uint8_t *state);
+int64_t opal_pci_set_power_state(uint64_t id, uint8_t state);

  /* Internal functions */
  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index 6f77f71..d9d095b 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -13,6 +13,13 @@
  #include 
  #include 

+extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len);
+extern int pnv_pci_get_presence_state(uint64_t id, uint8_t *state);
+extern int pnv_pci_get_power_state(uint64_t id, uint8_t *state);
+extern int pnv_pci_set_power_state(uint64_t id, uint8_t state);
+extern int pnv_pci_hotplug_notifier_register(struct notifier_block *nb);
+extern int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb);
+
  int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
  int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
   unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index e45b88a..3ea1a855 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -302,3 +302,7 @@ OPAL_CALL(opal_prd_msg, OPAL_PRD_MSG);
  OPAL_CALL(opal_leds_get_ind,  OPAL_LEDS_GET_INDICATOR);
  OPAL_CALL(opal_leds_set_ind,  OPAL_LEDS_SET_INDICATOR);
  OPAL_CALL(opal_console_flush, OPAL_CONSOLE_FLUSH);
+OPAL_CALL(opal_get_device_tree,OPAL_GET_DEVICE_TREE);
+OPAL_CALL(opal_pci_get_presence_state, OPAL_PCI_GET_PRESENCE_STATE);

Re: [PATCH v8 37/45] powerpc/powernv: Use firmware PCI slot reset infrastructure

2016-04-19 Thread Alexey Kardashevskiy

On 04/20/2016 12:33 PM, Gavin Shan wrote:

On Tue, Apr 19, 2016 at 07:34:55PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:44 PM, Gavin Shan wrote:

The skiboot firmware might provide the PCI slot reset capability
which is identified by property "ibm,reset-by-firmware" on the
PCI slot associated device node.

This checks the property. If it exists, the reset request is routed
to firmware. Otherwise, the reset is done by kernel as before.

Signed-off-by: Gavin Shan 
---
  arch/powerpc/platforms/powernv/eeh-powernv.c | 41 +++-
  1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index e23b063..c8a5217 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -789,7 +789,7 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
return ret;
  }

-static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
+static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
  {
struct pci_dn *pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
@@ -840,6 +840,45 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
return 0;
  }

+static int pnv_eeh_bridge_reset(struct pci_dev *pdev, int option)
+{
+   struct pci_controller *hose;
+   struct pnv_phb *phb;
+   struct device_node *dn = pdev ? pci_device_to_OF_node(pdev) : NULL;
+   uint64_t id = (0x1ul << 60);



What is this 1<<60 for?




As you replied in other threads, it's worth having some macros for this
piece of business. This bit indicates that the ID is for a slot behind a
switch port. If this bit is cleared, the ID represents a PHB slot.
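
The macros being discussed might look roughly like the sketch below. All
names here are hypothetical (the patch open-codes the shifts); the bit
positions are taken from the patch: bit 60 marks a slot behind a switch
port, the bus number sits at bit 24, devfn at bit 16, and the PHB's OPAL ID
occupies the low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the "slot behind a switch port" flag (bit 60). */
#define TOY_SLOT_ID_SWITCH_PORT   (UINT64_C(1) << 60)

/* A PHB slot is identified by the PHB's OPAL ID alone (flag cleared). */
static uint64_t toy_phb_slot_id(uint64_t phb_opal_id)
{
	return phb_opal_id;
}

/* A slot behind a switch port also encodes the bus number and devfn. */
static uint64_t toy_switch_slot_id(uint64_t phb_opal_id, uint8_t bus,
				   uint8_t devfn)
{
	return TOY_SLOT_ID_SWITCH_PORT |
	       ((uint64_t)bus << 24) |
	       ((uint64_t)devfn << 16) |
	       phb_opal_id;
}

static int toy_slot_is_behind_switch(uint64_t id)
{
	return (id & TOY_SLOT_ID_SWITCH_PORT) != 0;
}
```

The sketch widens bus and devfn to 64 bits before shifting, so the result
does not depend on integer promotion of the narrower types.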


+   uint8_t scope;
+   int64_t rc;
+
+   /*
+* If the firmware can't handle it, we will issue hot reset
+* on the secondary bus despite the requested reset type.
+*/
+   if (!dn || !of_get_property(dn, "ibm,reset-by-firmware", NULL))
+   return __pnv_eeh_bridge_reset(pdev, option);
+
+   /* The firmware can handle the request */
+   switch (option) {
+   case EEH_RESET_HOT:
+   scope = OPAL_RESET_PCI_HOT;
+   break;
+   case EEH_RESET_FUNDAMENTAL:
+   scope = OPAL_RESET_PCI_FUNDAMENTAL;
+   break;
+   case EEH_RESET_DEACTIVATE:
+   return 0;
+   default:
+   dev_warn(&pdev->dev, "%s: Unsupported reset %d\n",
+__func__, option);



Can the userspace trigger this case (via VFIO-EEH) and flood dmesg?



It depends on how you define message flooding, actually. It's an abnormal
path caused by an internal program error, not by external users.



Can QEMU be changed to do something special (cause reset with a wrong 
option) via VFIO/EEH interface in a loop to make this message appear? Or 
the call with a wrong option will never reach this point?









+   return -EINVAL;
+   }
+
+   hose = pci_bus_to_host(pdev->bus);
+   phb = hose->private_data;
+   id |= (pdev->bus->number << 24) | (pdev->devfn << 16) | phb->opal_id;
+   rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
+   return pnv_pci_poll(id, rc, NULL);
+}
+
  static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
  {
int *freset = data;




--
Alexey






--
Alexey

Re: [PATCH v8 36/45] powerpc/powernv: Support PCI slot ID

2016-04-19 Thread Alexey Kardashevskiy

On 04/20/2016 12:28 PM, Gavin Shan wrote:

On Tue, Apr 19, 2016 at 07:28:20PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:44 PM, Gavin Shan wrote:

PowerNV platforms run on top of skiboot firmware that includes
changes to support PCI slots. PCI slots are identified by the PHB's
ID or the combination of that and the PCI slot ID.

This changes the EEH PowerNV backend to support PCI slots:

* Rename arguments of opal_pci_reset() and opal_pci_poll().
* One more argument (PCI slot's state) added to opal_pci_poll().
* Drop pnv_eeh_phb_poll() and introduce a similar, enhanced
  function pnv_pci_poll() that will be used by PowerNV hotplug
  backends.
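
For readers without the full patch at hand, the polling pattern can be
sketched as below. This is an assumption-laden illustration, not the body
of pnv_pci_poll(): it follows the removed pnv_eeh_phb_poll() loop, where a
positive return value means "poll again after that many milliseconds", and
it takes the poll and delay routines as hypothetical callbacks so that it
runs outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the firmware poll call. */
typedef int64_t (*toy_poll_fn)(uint64_t id, uint8_t *state);

/*
 * `rc` is the return value of the request that started the operation:
 * negative means error, zero means done, positive means "poll again after
 * rc milliseconds". Returns 0 on success and -1 on failure (kernel code
 * would return an errno value instead).
 */
static int toy_pci_poll(uint64_t id, int64_t rc, uint8_t *state,
			toy_poll_fn poll, void (*delay_ms)(int64_t))
{
	while (rc > 0) {
		delay_ms(rc);
		rc = poll(id, state);
	}
	return rc == 0 ? 0 : -1;
}

/* Toy firmware: reports "busy, retry in 10 ms" twice, then success. */
static int toy_polls_left = 2;
static int64_t toy_total_delay;

static int64_t toy_fw_poll(uint64_t id, uint8_t *state)
{
	(void)id;
	if (toy_polls_left > 0) {
		toy_polls_left--;
		return 10;
	}
	if (state)
		*state = 1;
	return 0;
}

static void toy_delay(int64_t ms)
{
	toy_total_delay += ms;	/* a real caller would sleep here */
}
```

Passing the initial return code in lets the caller skip its own error check
after issuing the request, which matches how the patch removes the
"if (rc < 0) goto out;" lines from the callers.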

Signed-off-by: Gavin Shan 
---
  arch/powerpc/include/asm/opal.h  |  4 +--
  arch/powerpc/platforms/powernv/eeh-powernv.c | 42 ++--
  arch/powerpc/platforms/powernv/pci.c | 21 ++
  arch/powerpc/platforms/powernv/pci.h |  1 +
  4 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 07a99e6..9e0039f 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -131,7 +131,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
  int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
uint16_t dma_window_number, uint64_t pci_start_addr,
uint64_t pci_mem_size);
-int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
+int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);

  int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
   uint64_t diag_buffer_len);
@@ -148,7 +148,7 @@ int64_t opal_get_dpo_status(__be64 *dpo_timeout);
  int64_t opal_set_system_attention_led(uint8_t led_action);
  int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
__be16 *pci_error_type, __be16 *severity);
-int64_t opal_pci_poll(uint64_t phb_id);
+int64_t opal_pci_poll(uint64_t id, uint8_t *state);
  int64_t opal_return_cpu(void);
  int64_t opal_check_token(uint64_t token);
  int64_t opal_reinit_cpus(uint64_t flags);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index c7454ba..e23b063 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -717,28 +717,11 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
return ret;
  }

-static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
-{
-   s64 rc = OPAL_HARDWARE;
-
-   while (1) {
-   rc = opal_pci_poll(phb->opal_id);
-   if (rc <= 0)
-   break;
-
-   if (system_state < SYSTEM_RUNNING)
-   udelay(1000 * rc);
-   else
-   msleep(rc);
-   }
-
-   return rc;
-}
-
  int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
  {
struct pnv_phb *phb = hose->private_data;
s64 rc = OPAL_HARDWARE;
+   int ret;

pr_debug("%s: Reset PHB#%x, option=%d\n",
 __func__, hose->global_number, option);
@@ -753,8 +736,6 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
rc = opal_pci_reset(phb->opal_id,
OPAL_RESET_PHB_COMPLETE,
OPAL_DEASSERT_RESET);
-   if (rc < 0)
-   goto out;

/*
 * Poll state of the PHB until the request is done
@@ -762,24 +743,22 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 * reset followed by hot reset on root bus. So we also
 * need the PCI bus settlement delay.
 */
-   rc = pnv_eeh_phb_poll(phb);
-   if (option == EEH_RESET_DEACTIVATE) {
+   ret = pnv_pci_poll(phb->opal_id, rc, NULL);
+   if (option == EEH_RESET_DEACTIVATE && !ret) {
if (system_state < SYSTEM_RUNNING)
udelay(1000 * EEH_PE_RST_SETTLE_TIME);
else
msleep(EEH_PE_RST_SETTLE_TIME);
}
-out:
-   if (rc != OPAL_SUCCESS)
-   return -EIO;

-   return 0;
+   return ret;
  }

  static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
  {
struct pnv_phb *phb = hose->private_data;
s64 rc = OPAL_HARDWARE;
+   int ret;

pr_debug("%s: Reset PHB#%x, option=%d\n",
 __func__, hose->global_number, option);
@@ -801,18 +780,13 @@ static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
rc = opal_pci_reset(phb->opal_id,
OPAL_RESET_PCI_HOT,

Re: [V2, 02/68] powerpc/mm/nohash: Return correctly from flush_tlb_page

2016-04-19 Thread Michael Ellerman
On Sat, 2016-09-04 at 06:12:58 UTC, "Aneesh Kumar K.V" wrote:
> if it is a hugetlb address return without calling __flush_tlb_page.
 
Why?

cheers

Re: [PATCH V11 0/4]perf/powerpc: Add ability to sample intr machine state in powerpc

2016-04-19 Thread Arnaldo Carvalho de Melo
Em Mon, Apr 18, 2016 at 03:17:11PM +0530, Anju T escreveu:
> On Saturday 20 February 2016 10:32 AM, Anju T wrote:
> >This short patch series adds the ability to sample the interrupted
> >machine state for each hardware sample.
> >
> >To test this patchset,
> >Eg:
> >
> >$ perf record -I?   # list supported registers
> >
> >output:
> >available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 
> >r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 nip msr 
> >orig_r3 ctr link xer ccr softe trap dar dsisr
> >
> >  usage: perf record [] []
> > or: perf record [] --  []
> >
> > -I, --intr-regs[=]
> >   sample selected machine registers on interrupt, 
> > use -I ? to list register names
> >
> >
> >$ perf record -I ls   # record machine state at interrupt
> >$ perf script -D  # read the perf.data file
> >
> >Sample output obtained for this patchset looks as follows:
> >
> >496768515470 0x1988 [0x188]: PERF_RECORD_SAMPLE(IP, 0x1): 4522/4522: 0xc01e538c period: 1 addr: 0
> >... intr regs: mask 0x7ff ABI 64-bit
> > > r0    0xc01e5e34
> > > r1    0xc00fe733f9a0
> > > r2    0xc1523100
> > > r3    0xc00ffaadeb60
> > > r4    0xc3456800
> > > r5    0x73a9b5e000
> > > r6    0x1e00
> > > r7    0x0
> > > r8    0x0
> > > r9    0x0
> > r10   0x1
> > r11   0x0
> > r12   0x24022822
> > r13   0xcfeec180
> > r14   0x0
> > r15   0xc01e4be18800
> > r16   0x0
> > r17   0xc00ffaac5000
> > r18   0xc00fe733f8a0
> > r19   0xc1523100
> > r20   0xc009fd1c
> > r21   0xc00fcaa69000
> > r22   0xc01e4968
> > r23   0xc1523100
> > r24   0xc00fe733f850
> > r25   0xc00fcaa69000
> > r26   0xc3b8fcf0
> > r27   0xfead
> > r28   0x0
> > r29   0xc00fcaa69000
> > r30   0x1
> > r31   0x0
> > nip   0xc01dd320
> > msr   0x90009032
> > orig_r3 0xc01e538c
> > ctr   0xc009d550
> > link  0xc01e5e34
> > xer   0x0
> > ccr   0x84022882
> > softe 0x0
> > trap  0xf01
> > dar   0x0
> > dsisr 0xf0004006004
> >  ... thread: :4522:4522
> >  .. dso: 
> > /root/.debug/.build-id/b0/ef11b1a1629e62ac9de75199117ee5ef9469e9
> >:4522  4522   496.768515:  1 cycles:  c01e538c 
> > .perf_event_context_sched_in (/boot/vmlinux)
> >
> >
> >
> >Changes from v10:
> >
> >- Included SOFTE as suggested by mpe
> >- The name of registers displayed is  changed from
> >   gpr* to r* also the macro names changed from
> >   PERF_REG_POWERPC_GPR* to PERF_REG_POWERPC_R*.
> >- The conflict in returning the ABI is resolved.
> >- #define PERF_REG_SP  is again changed to  PERF_REG_POWERPC_R1
> >- Comment in tools/perf/config/Makefile is updated.
> >- removed the "Reviewed-By" tag as the patch has logic changes.
> >
> >
> >Changes from V9:
> >
> >- Changed the name displayed for link register from "lnk" to "link" in
> >   tools/perf/arch/powerpc/include/perf_regs.h
> >
> >changes from V8:
> >
> >- Corrected the indentation issue in the Makefile mentioned in 3rd patch
> >
> >Changes from V7:
> >
> >- Addressed the new line issue in 3rd patch.
> >
> >Changes from V6:
> >
> >- Corrected the typo in patch "tools/perf: Map the ID values with register
> >   names", i.e. #define PERF_REG_SP PERF_REG_POWERPC_R1 should be
> >   #define PERF_REG_SP PERF_REG_POWERPC_GPR1
> >
> >
> >Changes from V5:
> >
> >- Enabled perf_sample_regs_user also in this patch set.Functions added in
> >arch/powerpc/perf/perf_regs.c
> >- Added Maddy's patch to this patchset for enabling -I? option which will
> >   list the supported register names.
> >
> >
> >Changes from V4:
> >
> >- Removed the softe and MQ from all patches
> >- Switch case is replaced with an array in the 3rd patch
> >
> >Changes from V3:
> >
> >- Addressed the comments by Sukadev regarding the nits in the descriptions.
> >- Modified the subject of first patch.
> >- Included the sample output in the 3rd patch also.
> >
> >Changes from V2:
> >
> >- tools/perf/config/Makefile is moved to the patch tools/perf.
> >- The patchset is reordered.
> >- perf_regs_load() function is used for the dwarf unwind test.Since it is 
> >not required here,
> >   it is removed from tools/perf/arch/powerpc/include/perf_regs.h
> >- PERF_REGS_POWERPC_RESULT is removed.
> >
> >Changes from V1:
> >
> >- Solved the name mismatch issue in the From and Signed-off-by fields of the
> >   patch series.
> >- Added necessary comments in the 3rd patch, i.e. perf/powerpc, as suggested
> >   by Maddy.
> >
> >
> >
> >Anju T (3):
> >   perf/powerpc: assign an id to each powerpc register
> >   perf/powerpc: add support for sampling intr machine state
> >   tools/perf: Map the ID values with register names
> >
> >Madhavan Srinivasan (1):
> >   tool/perf: Add 

Re: [PATCH v8 30/45] powerpc/pci: Delay populating pdn

2016-04-19 Thread Alexey Kardashevskiy

On 04/20/2016 12:13 PM, Gavin Shan wrote:

On Tue, Apr 19, 2016 at 06:19:20PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:44 PM, Gavin Shan wrote:

The pdn (struct pci_dn) instances are allocated from memblock or
bootmem when creating PCI controllers (hoses) in setup_arch(). PCI
hotplug, which will be supported by subsequent patches, releases
PCI device nodes and their corresponding pdn on an unplug event.
The memory chunks for pdn instances allocated from memblock or
bootmem are hard to reuse after being released.

This delays creating pdn by pci_devs_phb_init() from setup_arch()
to core_initcall() so that they are allocated from slab. The memory
consumed by pdn can then be released to the system without problem at
PCI unplug time. It implies that pci_dn is unavailable in
setup_arch() and the fixup on pdn (like AGP's) can't be carried
out at that time. We have to do that in ppc_md.pcibios_root_bridge_prepare()
on maple/pasemi/powermac platforms, where/when the pdn is available.

Meanwhile, the EEH device is created when pdn is populated,
meaning pdn and the EEH device have the same life cycle. In turn, we needn't
call eeh_dev_init() to create the EEH device explicitly.

Signed-off-by: Gavin Shan 



Uff. It would not hurt to mention that pcibios_root_bridge_prepare() is called
from subsys_initcall(), which is executed after core_initcall(), so the code
flow does not change.



Yes, will do in next revision.


Have you checked if there is anything in between
core_initcall(pci_devs_phb_init) and subsys_initcall(pcibios_init) which
might need device tree nodes? For example, subsys_initcall(pcibios_init)
calls (eventually) pnv_pci_ioda_fixup(), if we are unlucky and pcibios_init()
(and therefore pnv_pci_ioda_fixup() or what pseries/others do) is called
before pcibios_init() - won't we crash or something?



I don't catch what you were asking. device-tree nodes (struct device_node)
are always there. This patch doesn't affect them. Perhaps you were talking
about pdn (PCI_DN). If it's the case, this patch delays creating pdn from
setup_arch() to core_initcall(pci_devs_phb_init).



While thinking of explaining what I wanted to ask, I found my answer :)

pcibios_init() calls ppc_md.pcibios_root_bridge_prepare() first, then 
ppc_md.pcibios_fixup() so we are fine here with ordering.




I don't see anything that needs pdn between setup_arch() and core_initcall().
The change introduced to powermac/pasemi platforms is: move fixing the child
pdns of the specific PHB's pdn from setup_arch() to subsys_initcall(pcibios_init).
I don't see anything between them that needs the fixed pdns.

I don't understand how pcibios_init() is called before pcibios_init() in your


pcibios_init() is used twice in the sentence above :)

Anyway,


Reviewed-by: Alexey Kardashevskiy 





context. Sorry for my bad English. Perhaps you're asking about the calling order
of core_initcall() and subsys_initcall()? If so, they're defined as below:

#define core_initcall(fn)   __define_initcall(fn, 1)
#define subsys_initcall(fn) __define_initcall(fn, 4)
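
To make the ordering concrete, here is a small standalone model of
level-ordered initcalls. It is only a sketch: struct toy_initcall and
toy_do_initcalls() are hypothetical, and the real kernel collects initcalls
into per-level linker sections rather than scanning an array. The two stub
functions merely borrow the names discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_initcall_t)(void);

struct toy_initcall {
	int level;		/* 1 = core, 4 = subsys, as in the defines above */
	toy_initcall_t fn;
};

/* Run registered calls level by level, lowest level first. */
static void toy_do_initcalls(const struct toy_initcall *calls, size_t n)
{
	int level;
	size_t i;

	for (level = 0; level <= 7; level++)
		for (i = 0; i < n; i++)
			if (calls[i].level == level)
				calls[i].fn();
}

/* Record the order in which the stubs run. */
static char toy_trace[8];
static size_t toy_trace_len;

static int toy_pci_devs_phb_init(void)	/* stands in for the core_initcall */
{
	toy_trace[toy_trace_len++] = 'C';
	return 0;
}

static int toy_pcibios_init(void)	/* stands in for the subsys_initcall */
{
	toy_trace[toy_trace_len++] = 'S';
	return 0;
}
```

Whatever order the two calls are registered in, the level-1 call runs before
the level-4 call, which is why pdn is populated by the time pcibios_init()
runs.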

>




--
Alexey







--
Alexey

Re: [PATCH V2] powerpc: Implement {cmp}xchg for u8 and u16

2016-04-19 Thread Pan Xinhui

Hello, boqun

On 2016-04-19 17:18, Boqun Feng wrote:
> Hi Xinhui,
> 
> On Tue, Apr 19, 2016 at 02:29:34PM +0800, Pan Xinhui wrote:
>> From: Pan Xinhui 
>>
>> Implement xchg{u8,u16}{local,relaxed}, and
>> cmpxchg{u8,u16}{,local,acquire,relaxed}.
>>
>> It works on all ppc.
>>
> 
> Nice work!
> 
thank you.

> AFAICT, your work doesn't depend on anything that ppc-specific, right?
> So maybe we can use it as a general approach for a fallback
> implementation on the archs without u8/u16 atomics. ;-)
> 
>> Suggested-by: Peter Zijlstra (Intel) 
>> Signed-off-by: Pan Xinhui 
>> ---
>> change from V1:
>>  rework totally.
>> ---
>>  arch/powerpc/include/asm/cmpxchg.h | 83 
>> ++
>>  1 file changed, 83 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/cmpxchg.h 
>> b/arch/powerpc/include/asm/cmpxchg.h
>> index 44efe73..79a1f45 100644
>> --- a/arch/powerpc/include/asm/cmpxchg.h
>> +++ b/arch/powerpc/include/asm/cmpxchg.h
>> @@ -7,6 +7,37 @@
>>  #include 
>>  #include 
>>  
>> +#ifdef __BIG_ENDIAN
>> +#define BITOFF_CAL(size, off)   ((sizeof(u32) - size - off) * 
>> BITS_PER_BYTE)
>> +#else
>> +#define BITOFF_CAL(size, off)   (off * BITS_PER_BYTE)
>> +#endif
>> +
>> +static __always_inline unsigned long
>> +__cmpxchg_u32_local(volatile unsigned int *p, unsigned long old,
>> +unsigned long new);
>> +
>> +#define __XCHG_GEN(cmp, type, sfx, u32sfx, skip, v) \
>> +static __always_inline u32  \
>> +__##cmp##xchg_##type##sfx(v void *ptr, u32 old, u32 new)\
>> +{   \
>> +int size = sizeof (type);   \
>> +int off = (unsigned long)ptr % sizeof(u32); \
>> +volatile u32 *p = ptr - off;\
>> +int bitoff = BITOFF_CAL(size, off); \
>> +u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;\
>> +u32 oldv, newv; \
>> +u32 ret;\
>> +do {\
>> +oldv = READ_ONCE(*p);   \
>> +ret = (oldv & bitmask) >> bitoff;   \
>> +if (skip && ret != old) \
>> +break;  \
>> +newv = (oldv & ~bitmask) | (new << bitoff); \
>> +} while (__cmpxchg_u32##u32sfx((v void*)p, oldv, newv) != oldv);\
> 
> Forgive me if this is too paranoid, but I think we can save the
> READ_ONCE() in the loop if we change the code into the following,
> because cmpxchg will return the "new" value, if the cmp part fails.
> 
>   newv = READ_ONCE(*p);
> 
>   do {
>   oldv = newv;
>   ret = (oldv & bitmask) >> bitoff;
>   if (skip && ret != old)
>   break;
>   newv = (oldv & ~bitmask) | (new << bitoff);
>   newv = __cmpxchg_u32##u32sfx((void *)p, oldv, newv);
>   } while(newv != oldv);
> 
>> +return ret; \
>> +}
a little optimization. Patch V3 will include your code, thanks.
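
For reference, the loop shape suggested above can be sketched in portable
C11. This is only a userspace illustration assuming <stdatomic.h> and the
GCC/Clang __BYTE_ORDER__ macros; cmpxchg_u8_sketch() is a hypothetical name,
not the helper from the patch. It relies on atomic_compare_exchange_weak()
writing the current value back into its "expected" argument on failure,
which is what allows the reload inside the loop to be dropped.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Bit offset of a `size`-byte field at byte offset `off` inside a u32. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define BITOFF_CAL(size, off)	((sizeof(uint32_t) - (size) - (off)) * 8)
#else
#define BITOFF_CAL(size, off)	((off) * 8)
#endif

/*
 * Emulate a one-byte compare-and-exchange with a 32-bit CAS on the
 * containing word. Returns the byte value observed before the operation.
 */
static uint8_t cmpxchg_u8_sketch(_Atomic uint32_t *word, unsigned int off,
				 uint8_t old, uint8_t new)
{
	unsigned int bitoff = BITOFF_CAL(sizeof(uint8_t), off);
	uint32_t mask = (uint32_t)0xff << bitoff;
	uint32_t oldv = atomic_load(word);	/* single load up front */
	uint32_t newv;
	uint8_t ret;

	for (;;) {
		ret = (uint8_t)((oldv & mask) >> bitoff);
		if (ret != old)
			break;		/* compare failed, nothing stored */
		newv = (oldv & ~mask) | ((uint32_t)new << bitoff);
		/* On failure oldv is refreshed with the current word. */
		if (atomic_compare_exchange_weak(word, &oldv, newv))
			break;
	}
	return ret;
}
```

The neighbouring bytes of the word are carried over unchanged through oldv,
so a concurrent update to them only causes a retry, never a lost write.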

>> +
>>  /*
>>   * Atomic exchange
>>   *
>> @@ -14,6 +45,19 @@
>>   * the previous value stored there.
>>   */
>>  
>> +#define XCHG_GEN(type, sfx, v)  
>> \
>> +__XCHG_GEN(_, type, sfx, _local, 0, v)  \
>  ^^^
> 
> This should be sfx, right? Otherwise, all the newly added xchg will
> call __cmpxchg_u32_local, this will result in wrong ordering guarantees.
> 
I mean that. But I will think of the ordering issue for a while. :)
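
A tiny standalone illustration of the point about the suffix, using
made-up names (inner_local(), inner_relaxed(), GEN_BUGGY, GEN_FIXED) rather
than the macros from the patch: when the generator hard-codes one suffix,
every generated variant ends up calling the same inner function.

```c
/* Two stand-ins for primitives with different ordering guarantees. */
static const char *inner_local(void)   { return "local"; }
static const char *inner_relaxed(void) { return "relaxed"; }

/* Hard-coded suffix: every generated function calls inner_local(). */
#define GEN_BUGGY(sfx) \
static const char *buggy##sfx(void) { return inner_local(); }

/* Forwarded suffix: the generated function calls the matching variant. */
#define GEN_FIXED(sfx) \
static const char *fixed##sfx(void) { return inner##sfx(); }

GEN_BUGGY(_relaxed)
GEN_FIXED(_relaxed)
```

Here buggy_relaxed() reports "local" while fixed_relaxed() reports
"relaxed", which is the mismatch being pointed out.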

>> +static __always_inline u32 __xchg_##type##sfx(v void *p, u32 n) \
>> +{   \
>> +return ___xchg_##type##sfx(p, 0, n);\
>> +}
>> +
>> +XCHG_GEN(u8, _local, volatile);
> 
> I don't think we need the "volatile" modifier here, because READ_ONCE()
> and __cmpxchg_u32_* all have "volatile" semantics IIUC, so maybe we can
> save a parameter for the __XCHG_GEN macro.
> 
Such cleanup work can be done in a separate patch. Here I am just keeping the
compiler happy.

thanks
xinhui
> Regards,
> Boqun
> 
>> +XCHG_GEN(u8, _relaxed, );
>> +XCHG_GEN(u16, _local, volatile);
>> +XCHG_GEN(u16, _relaxed, );
>> +#undef XCHG_GEN
>> +
>>  static __always_inline unsigned long
>>  __xchg_u32_local(volatile void *p, unsigned long val)
>>  {
>> @@ -88,6 +132,10 @@ static __always_inline unsigned long
>>  __xchg_local(volatile void *ptr, unsigned long x, 

Re: [PATCH v8 29/45] powerpc/pci: Export pci_traverse_device_nodes()

2016-04-19 Thread Alexey Kardashevskiy

On 04/20/2016 11:27 AM, Gavin Shan wrote:

On Tue, Apr 19, 2016 at 03:51:03PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:44 PM, Gavin Shan wrote:

This renames traverse_pci_devices() to pci_traverse_device_nodes().
The function traverses all subordinate device nodes of the specified
one. Also, below cleanup applied to the function. No logical changes
introduced.

* Rename "pre" to "fn".
* Avoid assignment in if condition reported from checkpatch.pl.

Signed-off-by: Gavin Shan 
---
  arch/powerpc/include/asm/ppc-pci.h   |  6 +++---
  arch/powerpc/kernel/pci_dn.c | 15 ++-
  arch/powerpc/platforms/pseries/msi.c |  4 ++--
  3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h 
b/arch/powerpc/include/asm/ppc-pci.h
index ca0c5bf..8753e4e 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev; /* may be NULL 
if no ISA bus */
  struct device_node;
  struct pci_dn;

-typedef void *(*traverse_func)(struct device_node *me, void *data);




Why removing this typedef? Typedef's are good.

Anyway,



Could you please provide more details why it's good? I removed it
because it was used for only once.



I have some thoughts but never mind, nobody seems to care about this and 
typedefs are considered bad by the CodingStyle.
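
For readers following along, a simplified sketch of the two cleanups being
discussed: the callback is spelled out as a function pointer parameter
instead of a typedef, and the result is assigned before it is tested. The
struct node type and the recursive walk are assumptions made for this
example only; the real function walks device nodes iteratively and applies
PCI-specific checks.

```c
#include <stddef.h>

/* Hypothetical tree node, standing in for struct device_node. */
struct node {
	int id;
	struct node *child;
	struct node *sibling;
};

/* Pre-order walk; stops at the first callback that returns non-NULL. */
static void *traverse_nodes(struct node *start,
			    void *(*fn)(struct node *, void *),
			    void *data)
{
	struct node *n;
	void *ret;

	for (n = start->child; n; n = n->sibling) {
		if (fn) {
			ret = fn(n, data);	/* assign first, test after */
			if (ret)
				return ret;
		}

		ret = traverse_nodes(n, fn, data);
		if (ret)
			return ret;
	}
	return NULL;
}

/* Example callback: return the node whose id matches *(int *)data. */
static void *match_id(struct node *n, void *data)
{
	return n->id == *(int *)data ? n : NULL;
}
```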








Reviewed-by: Alexey Kardashevskiy 





-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-   void *data);
+void *pci_traverse_device_nodes(struct device_node *start,
+   void *(*fn)(struct device_node *, void *),
+   void *data);
  void *traverse_pci_dn(struct pci_dn *root,
  void *(*fn)(struct pci_dn *, void *),
  void *data);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index ce10281..ecdccce 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -372,8 +372,9 @@ EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
   * one of these nodes we also assume its siblings are non-pci for
   * performance.
   */
-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
-   void *data)
+void *pci_traverse_device_nodes(struct device_node *start,
+   void *(*fn)(struct device_node *, void *),
+   void *data)
  {
struct device_node *dn, *nextdn;
void *ret;
@@ -388,8 +389,11 @@ void *traverse_pci_devices(struct device_node *start, 
traverse_func pre,
if (classp)
class = of_read_number(classp, 1);

-   if (pre && ((ret = pre(dn, data)) != NULL))
-   return ret;
+   if (fn) {
+   ret = fn(dn, data);
+   if (ret)
+   return ret;
+   }

/* If we are a PCI bridge, go down */
if (dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI ||
@@ -411,6 +415,7 @@ void *traverse_pci_devices(struct device_node *start, 
traverse_func pre,
}
return NULL;
  }
+EXPORT_SYMBOL_GPL(pci_traverse_device_nodes);

  static struct pci_dn *pci_dn_next_one(struct pci_dn *root,
  struct pci_dn *pdn)
@@ -487,7 +492,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
}

/* Update dn->phb ptrs for new phb and children devices */
-   traverse_pci_devices(dn, add_pdn, phb);
+   pci_traverse_device_nodes(dn, add_pdn, phb);
  }

  /**
diff --git a/arch/powerpc/platforms/pseries/msi.c 
b/arch/powerpc/platforms/pseries/msi.c
index 272e9ec..543a638 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -305,7 +305,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int 
request)
memset(&counts, 0, sizeof(struct msi_counts));

/* Work out how many devices we have below this PE */
-   traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
+   pci_traverse_device_nodes(pe_dn, count_non_bridge_devices, &counts);

if (counts.num_devices == 0) {
pr_err("rtas_msi: found 0 devices under PE for %s\n",
@@ -320,7 +320,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int 
request)
/* else, we have some more calculating to do */
counts.requestor = pci_device_to_OF_node(dev);
counts.request = request;
-   traverse_pci_devices(pe_dn, count_spare_msis, &counts);
+   pci_traverse_device_nodes(pe_dn, count_spare_msis, &counts);

/* If the quota isn't an integer multiple of the total, we can
 * use the remainder as spare MSIs for anyone that wants them. */




--
Alexey






--
Alexey

Re: [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time

2016-04-19 Thread Gavin Shan
On Wed, Apr 20, 2016 at 01:00:38PM +1000, Alexey Kardashevskiy wrote:
>On 04/20/2016 11:12 AM, Gavin Shan wrote:
>>On Tue, Apr 19, 2016 at 02:16:42PM +1000, Alexey Kardashevskiy wrote:
>>>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>>>Currently, the PEs and their associated resources are assigned
>>>>in ppc_md.pcibios_fixup() except those used by SRIOV VFs.
>>>
>>>But this new code does not affect IOV and VF's PEs will still be created
>>>somewhere else rather than pnv_pci_setup_bridge()?
>>>
>>
>>Correct. VF PEs cannot be created in pnv_pci_setup_bridge() as the PF's
>>IOV capability isn't enabled at that point.
>>
>>>
>>>>The function is called for once after PCI probing and resources
>>>>assignment is completed. So it isn't hotplug friendly.
>>>>
>>>>This creates PEs dynamically by ppc_md.pcibios_setup_bridge(), which
>>>>is called on the event during system bootup and PCI hotplug: updating
>>>>PCI bridge's windows after resource assignment/reassignment are done.
>>>>For partial hotplug case, where not all PCI devices belonging to the
>>>>PE are unplugged and plugged again, we just need unbinding/binding
>>>>the affected PCI devices with the corresponding PE without creating
>>>>new one.
>>>>
>>>>As there is no upstream bridge for root bus that needs to be covered
>>>>by PE, we have to create PE for root bus in ppc_md.pcibios_setup_bridge()
>>>>before any other PEs can be created, as PE for root bus is the ancestor
>>>>to anyone else.
>>>
>>>We did not need a root bus PE before? What is the other PE reserved for?
>>>Comments only say "reserved"...
>>>
>>
>>No, A PE for root bus is needed before.
>
>Ok. We needed a PE for the root bus and we need it now. What changed? Why do
>you reserve another PE?
>

Originally, all PEs (including the one for the root bus) were created at
PHB fixup time in pnv_pci_ioda_fixup(). With this patch, all PEs are created
in pnv_pci_setup_bridge(). pnv_pci_setup_bridge() is called for every PCI bus
other than the root bus, which means it isn't called for the root bus. So we
have to create the PE for the root bus before the remaining PEs are created
there. The PE# for the root bus is reserved in advance and used in
pnv_pci_setup_bridge() at that point.
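
Roughly, the idea can be sketched as below. This is a toy model only:
struct phb_sketch, alloc_pe(), phb_init() and setup_bridge() are
hypothetical names, and only root_pe_idx/root_pe_populated echo the fields
mentioned in the review. The PE# for the root bus is set aside at init time,
and the root bus PE itself is populated on the first bridge setup call.

```c
#include <stdbool.h>

#define SKETCH_MAX_PE	8
#define SKETCH_NO_PE	(-1)

struct phb_sketch {
	bool pe_used[SKETCH_MAX_PE];	/* PE# allocation map */
	int  root_pe_idx;		/* PE# set aside for the root bus */
	bool root_pe_populated;		/* root bus PE created yet? */
};

/* Hand out the lowest free PE#, or SKETCH_NO_PE if none is left. */
static int alloc_pe(struct phb_sketch *phb)
{
	for (int i = 0; i < SKETCH_MAX_PE; i++) {
		if (!phb->pe_used[i]) {
			phb->pe_used[i] = true;
			return i;
		}
	}
	return SKETCH_NO_PE;
}

/* PHB init: reserve the root bus PE# in advance, create nothing yet. */
static void phb_init(struct phb_sketch *phb)
{
	for (int i = 0; i < SKETCH_MAX_PE; i++)
		phb->pe_used[i] = false;
	phb->root_pe_populated = false;
	phb->root_pe_idx = alloc_pe(phb);
}

/* Called for each non-root bus; returns the PE# picked for that bus. */
static int setup_bridge(struct phb_sketch *phb)
{
	/* First call: populate the root bus PE using the reserved PE#. */
	if (!phb->root_pe_populated)
		phb->root_pe_populated = true;

	return alloc_pe(phb);
}
```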

>
>>
>other PEs can be for the PCI bus
>>originated from root port and the subordinate domains.
>>

>>>>Also, the windows of root port or the upstream port of PCIe switch behind
>>>>root port are extended to be PHB's apertures to accommodate the additional
>>>>resources needed by newly plugged devices based on the fact: hotpluggable
>>>>slot is behind root port or downstream port of the PCIe switch behind
>>>>root port. The extension for those PCI bridges' windows is done in
>>>>ppc_md.pcibios_setup_bridge() as well.
>>>
>>>
>>>This patch seems to be doing way too many things, hard to follow.
>>>
>>>Could you please split the patch into smaller chunks? For example (you can do
>>>it totally different):
>>>- move pnv_pci_ioda_setup_opal_tce_kill()
>>>- move PE creation from pnv_pci_ioda_fixup() to pnv_pci_setup_bridge();
>>>- add pnv_pci_fixup_bridge_resources()
>>>- add an extra reserved PE for the root bus (and all this magic with
>>>root_pe_idx/root_pe_populated)
>>>- ...
>>>
>>
>>I'll evaluate it later. It's always nice to have small patches. Thanks
>>for the comments.
>>
>>>
>>>
>>>
>>>--
>>>Alexey
>>>
>>
>>
>
>
>-- 
>Alexey
>


Re: [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add,remove}_pci_devices()

2016-04-19 Thread Alexey Kardashevskiy

On 04/20/2016 11:23 AM, Gavin Shan wrote:

On Tue, Apr 19, 2016 at 03:28:36PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:44 PM, Gavin Shan wrote:

This renames pcibios_{add,remove}_pci_devices() to avoid conflicts
with names of the weak functions in PCI subsystem, which have the
prefix "pcibios". No logical changes introduced.

Signed-off-by: Gavin Shan 
---
  arch/powerpc/include/asm/pci-bridge.h |  4 ++--
  arch/powerpc/kernel/eeh_driver.c  | 12 ++--
  arch/powerpc/kernel/pci-hotplug.c | 15 +++
  drivers/pci/hotplug/rpadlpar_core.c   |  2 +-
  drivers/pci/hotplug/rpaphp_core.c |  4 ++--
  drivers/pci/hotplug/rpaphp_pci.c  |  2 +-
  6 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 4dd6ef4..c817f38 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -263,10 +263,10 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct 
pci_dn *pdn)
  extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);

  /** Remove all of the PCI devices under this bus */
-extern void pcibios_remove_pci_devices(struct pci_bus *bus);
+extern void pci_remove_pci_devices(struct pci_bus *bus);



pci_lala_pci_lala() ("pci" is used twice) looks weird, if the prefix is
"pci", what other device types can they handle?...

May be pcihp_add_devices(), pcihp_remove_devices() as these as defined in
pci-hotplug.c?



I assume you're talking about drivers/pci/hotplug/pci_hotplug_core.c.


No, the helpers you are renaming are in pci-hotplug.c which uses "pci_" as 
a prefix even though the file is supposed to be about hotplug.




pci_hotplug_core.c uses pci_hp_ prefix rather than pcihp_. I will
rename them to pci_hp_*() in next revision.


Anyway, this will work too.




gwshan@gwshan:~/sandbox/linux$ find . -name pci-hotplug.c
./arch/powerpc/kernel/pci-hotplug.c
gwshan@gwshan:~/sandbox/linux$ grep pci*hp arch/powerpc/kernel/pci-hotplug.c





  /** Discover new pci devices under this bus, and add them */
-extern void pcibios_add_pci_devices(struct pci_bus *bus);
+extern void pci_add_pci_devices(struct pci_bus *bus);


  extern void isa_bridge_find_early(struct pci_controller *hose);
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index fb6207d..59e53fe 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -621,7 +621,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
 * We don't remove the corresponding PE instances because
 * we need the information afterwords. The attached EEH
 * devices are expected to be attached soon when calling
-* into pcibios_add_pci_devices().
+* into pci_add_pci_devices().
 */
eeh_pe_state_mark(pe, EEH_PE_KEEP);
if (bus) {
@@ -630,7 +630,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
} else {
eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
pci_lock_rescan_remove();
-   pcibios_remove_pci_devices(bus);
+   pci_remove_pci_devices(bus);
pci_unlock_rescan_remove();
}
} else if (frozen_bus) {
@@ -681,7 +681,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
if (pe->type & EEH_PE_VF)
eeh_add_virt_device(edev, NULL);
else
-   pcibios_add_pci_devices(bus);
+   pci_add_pci_devices(bus);
} else if (frozen_bus && rmv_data->removed) {
pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
ssleep(5);
@@ -691,7 +691,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
if (pe->type & EEH_PE_VF)
eeh_add_virt_device(edev, NULL);
else
-   pcibios_add_pci_devices(frozen_bus);
+   pci_add_pci_devices(frozen_bus);
}
eeh_pe_state_clear(pe, EEH_PE_KEEP);

@@ -896,7 +896,7 @@ perm_error:
eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);

pci_lock_rescan_remove();
-   pcibios_remove_pci_devices(frozen_bus);
+   pci_remove_pci_devices(frozen_bus);
pci_unlock_rescan_remove();
}
}
@@ -981,7 +981,7 @@ static void eeh_handle_special_event(void)
bus = eeh_pe_bus_get(phb_pe);
eeh_pe_dev_traverse(pe,
eeh_report_failure, NULL);
-   pcibios_remove_pci_devices(bus);
+   pci_remove_pci_devices(bus);
}
 

Re: [PATCH v8 22/45] powerpc/powernv/ioda1: Support releasing IODA1 TCE table

2016-04-19 Thread Alexey Kardashevskiy

On 04/20/2016 11:15 AM, Gavin Shan wrote:

On Tue, Apr 19, 2016 at 02:28:51PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:44 PM, Gavin Shan wrote:

pnv_pci_ioda_table_free_pages() can be reused to release the IODA1
TCE table when releasing IODA1 PE in subsequent patches.

This renames the following functions to support releasing IODA1 TCE
table: pnv_pci_ioda2_table_free_pages() to pnv_pci_ioda_table_free_pages(),
pnv_pci_ioda2_table_do_free_pages() to pnv_pci_ioda_table_do_free_pages().
No logical changes introduced.


I can only see renaming here but it seems (from
IODA_architecture_04-14-2008.pdf) that IODA1 does not support multi-level TCE
tables in the way IODA2 does.



Note that the change was proposed by you in last round.


Hm. I do not recall proposing exactly that :-/


Yes, TVE on P7IOC
doesn't support multiple levels of TCE tables.


I thought it supports 2 levels.


In this case, we will always
have "tbl->it_indirect_levels" to 1, right?


Nope, it will be 0. But it is still ugly to use release function but not to 
use its allocating counterpart which is pnv_pci_ioda2_table_alloc_pages().


I suggest having pnv_pci_ioda1_table_free_pages() which will be just a 
single free_pages() call. If you need some ioda*-common code to free a 
table, then define pnv_ioda1_iommu_ops::free().
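
To make the level question concrete, here is a userspace sketch of a
recursive free over indirect levels. Everything in it is assumed for
illustration (table_do_free(), ENTRY_VALID, the freed_tables counter, and
malloc-style allocation instead of pages). With level == 0 the function
degenerates to a single free() call, which is the case described above for
a table without indirect levels.

```c
#include <stdint.h>
#include <stdlib.h>

/* Low bit marks an entry that points at a lower-level table. */
#define ENTRY_VALID	((uintptr_t)1)

/* Demo counter: how many tables have been released. */
static unsigned long freed_tables;

/* Free a table and, for level > 0, every lower-level table it refers to. */
static void table_do_free(uintptr_t *tbl, size_t nents, unsigned int level)
{
	if (level) {
		for (size_t i = 0; i < nents; i++) {
			if (!(tbl[i] & ENTRY_VALID))
				continue;	/* slot was never populated */

			table_do_free((uintptr_t *)(tbl[i] & ~ENTRY_VALID),
				      nents, level - 1);
		}
	}

	free(tbl);
	freed_tables++;
}
```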






Signed-off-by: Gavin Shan 
---
  arch/powerpc/platforms/powernv/pci-ioda.c | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index d360607..077f9db 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -51,7 +51,7 @@
  #define POWERNV_IOMMU_DEFAULT_LEVELS  1
  #define POWERNV_IOMMU_MAX_LEVELS  5

-static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
+static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl);

  static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
const char *fmt, ...)
@@ -1352,7 +1352,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev 
*dev, struct pnv_ioda_pe
iommu_group_put(pe->table_group.group);
BUG_ON(pe->table_group.group);
}
-   pnv_pci_ioda2_table_free_pages(tbl);
+   pnv_pci_ioda_table_free_pages(tbl);
iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
  }

@@ -1946,7 +1946,7 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, 
long index,

  static void pnv_ioda2_table_free(struct iommu_table *tbl)
  {
-   pnv_pci_ioda2_table_free_pages(tbl);
+   pnv_pci_ioda_table_free_pages(tbl);
iommu_free_table(tbl, "pnv");
  }

@@ -2448,7 +2448,7 @@ static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int 
nid, unsigned shift,
return addr;
  }

-static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
unsigned long size, unsigned level);

  static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
@@ -2487,7 +2487,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, 
__u64 bus_offset,
 * release partially allocated table.
 */
if (offset < tce_table_size) {
-   pnv_pci_ioda2_table_do_free_pages(addr,
+   pnv_pci_ioda_table_do_free_pages(addr,
1ULL << (level_shift - 3), levels - 1);
return -ENOMEM;
}
@@ -2505,7 +2505,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, 
__u64 bus_offset,
return 0;
  }

-static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
unsigned long size, unsigned level)
  {
const unsigned long addr_ul = (unsigned long) addr &
@@ -2521,7 +2521,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 
*addr,
if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
continue;

-   pnv_pci_ioda2_table_do_free_pages(__va(hpa), size,
+   pnv_pci_ioda_table_do_free_pages(__va(hpa), size,
level - 1);
}
}
@@ -2529,7 +2529,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 
*addr,
free_pages(addr_ul, get_order(size << 3));
  }

-static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
+static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl)
  {
const unsigned long size = tbl->it_indirect_levels ?
tbl->it_level_size : tbl->it_size;
@@ -2537,7 +2537,7 @@ static void pnv_pci_ioda2_table_free_pages(struct 
iommu_table *tbl)
if (!tbl->it_size)
return;

-   pnv_pci_ioda2_table_do_free_pages((__be64 *)tbl->it_base, size,
+   

Re: [PATCH v8 39/45] powerpc/powernv: Select OF_DYNAMIC

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 07:42:01PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>The device tree will change dynamically in PowerNV PCI hotplug
>>driver. This enables CONFIG_OF_DYNAMIC to support that.
>>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/platforms/powernv/Kconfig | 1 +
>>  1 file changed, 1 insertion(+)
>>
>>diff --git a/arch/powerpc/platforms/powernv/Kconfig 
>>b/arch/powerpc/platforms/powernv/Kconfig
>>index 604190c..e7b1ad7 100644
>>--- a/arch/powerpc/platforms/powernv/Kconfig
>>+++ b/arch/powerpc/platforms/powernv/Kconfig
>>@@ -18,6 +18,7 @@ config PPC_POWERNV
>>  select CPU_FREQ_GOV_ONDEMAND
>>  select CPU_FREQ_GOV_CONSERVATIVE
>>  select PPC_DOORBELL
>>+ select OF_DYNAMIC
>
>
>Why not to enable it in 45/45 under config HOTPLUG_PCI_POWERNV? Is there any
>benefit of having it always on if HOTPLUG_PCI_POWERNV is not enabled?
>

Agree, I will move accordingly in next revision. Note that we have to move
it back here once something else depends on OF_DYNAMIC in future.

>>  default y
>>
>>  config OPAL_PRD
>>
>
>
>-- 
>Alexey
>


Re: [PATCH V2 01/68] powerpc/cxl: Use REGION_ID instead of opencoding

2016-04-19 Thread Michael Ellerman
On Wed, 2016-04-13 at 08:12 +0530, Aneesh Kumar K.V wrote:
> "Aneesh Kumar K.V"  writes:
> > Also note that the `~` operation is wrong.
> > 
> > Cc: Frederic Barrat 
> > Cc: Andrew Donnellan 
> > Acked-by: Ian Munsie 
> > Signed-off-by: Aneesh Kumar K.V 
> > ---
> >  drivers/misc/cxl/fault.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/misc/cxl/fault.c b/drivers/misc/cxl/fault.c
> > index 9a8650bcb042..9a236543da23 100644
> > --- a/drivers/misc/cxl/fault.c
> > +++ b/drivers/misc/cxl/fault.c
> > @@ -152,7 +152,7 @@ static void cxl_handle_page_fault(struct cxl_context 
> > *ctx,
> > access = _PAGE_PRESENT;
> > if (dsisr & CXL_PSL_DSISR_An_S)
> > access |= _PAGE_RW;
> > -   if ((!ctx->kernel) || ~(dar & (1ULL << 63)))
> > +   if ((!ctx->kernel) || (REGION_ID(dar) == USER_REGION_ID))
> > access |= _PAGE_USER;
> > 
> > if (dsisr & DSISR_NOHPTE)
> 
> Posted an updated version of this patch alone with improved commit
> message here
> 
> http://mid.gmane.org/1460482475-20782-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com

I never saw it. And that link is empty?
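
On the `~` remark in the quoted commit message, a minimal standalone check
shows why the old condition could never be false. The helper names are made
up, and intended_is_user() is only a guess at what the old test was meant to
express (logical negation of the top address bit), not the REGION_ID()
version from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old form: bitwise NOT of a value that is either 0 or 1ULL << 63. */
static bool buggy_is_user(uint64_t dar)
{
	/* ~0 and ~(1ULL << 63) are both non-zero, so this is always true. */
	return ~(dar & (1ULL << 63));
}

/* Presumably intended form: logical NOT of the top-bit test. */
static bool intended_is_user(uint64_t dar)
{
	return !(dar & (1ULL << 63));
}
```

For an address with the top bit set, buggy_is_user() still returns true
while intended_is_user() returns false.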

cheers


Re: [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time

2016-04-19 Thread Alexey Kardashevskiy

On 04/20/2016 11:12 AM, Gavin Shan wrote:

On Tue, Apr 19, 2016 at 02:16:42PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:44 PM, Gavin Shan wrote:

Currently, the PEs and their associated resources are assigned
in ppc_md.pcibios_fixup() except those used by SRIOV VFs.


But this new code does not affect IOV and VF's PEs will still be created
somewhere else rather than pnv_pci_setup_bridge()?



Correct. VF PEs cannot be created in pnv_pci_setup_bridge() as the PF's
IOV capability isn't enabled at that point.




The
function is called for once after PCI probing and resources
assignment is completed. So it isn't hotplug friendly.

This creates PEs dynamically by ppc_md.pcibios_setup_bridge(), which
is called on the event during system bootup and PCI hotplug: updating
PCI bridge's windows after resource assignment/reassignment are done.
For partial hotplug case, where not all PCI devices belonging to the
PE are unplugged and plugged again, we just need unbinding/binding
the affected PCI devices with the corresponding PE without creating
new one.

As there is no upstream bridge for root bus that needs to be covered
by PE, we have to create PE for root bus in ppc_md.pcibios_setup_bridge()
before any other PEs can be created, as PE for root bus is the ancestor
to anyone else.


We did not need a root bus PE before? What is the other PE reserved for?
Comments only say "reserved"...



No, A PE for root bus is needed before.


Ok. We needed a PE for the root bus and we need it now. What changed? Why 
do you reserve another PE?






other PEs can be for the PCI bus

originated from root port and the subordinate domains.



Also, the windows of root port or the upstream port of PCIe switch behind
root port are extended to be PHB's apertures to accommodate the additional
resources needed by newly plugged devices based on the fact: hotpluggable
slot is behind root port or downstream port of the PCIe switch behind
root port. The extension for those PCI bridges' windows is done in
ppc_md.pcibios_setup_bridge() as well.



This patch seems to be doing way too many things, hard to follow.

Could you please split the patch into smaller chunks? For example (you can do
it totally different):
- move pnv_pci_ioda_setup_opal_tce_kill()
- move PE creation from pnv_pci_ioda_fixup() to pnv_pci_setup_bridge();
- add pnv_pci_fixup_bridge_resources()
- add an extra reserved PE for the root bus (and all this magic with
root_pe_idx/root_pe_populated)
- ...



I'll evaluate it later. It's always nice to have small patches. Thanks
for the comments.





--
Alexey







--
Alexey

Re: [V2, 68/68] powerpc/mm/radix: Use firmware feature to disable radix

2016-04-19 Thread Michael Ellerman
On Sat, 2016-09-04 at 06:14:04 UTC, "Aneesh Kumar K.V" wrote:
> We can depend on ibm,pa-features to enable/disable radix. This gives us
> a nice way to test p9 hash config, by changing device tree property.

I think we might want to be more careful here.

You set MMU_FTR_RADIX in the cputable entry. So it's on by default on P9 cpus.

Then if there is an ibm,pa-features property *and* it is >= 41 bytes long, the
below feature entry will hit. In that case the firmware controls whether it's on
or off.

I think it would be clearer if we removed RADIX from the cputable, and the below
became the only way to turn it on. Would that break anything?

> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 7030b035905d..a4d1f44364b8 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -165,6 +165,7 @@ static struct ibm_pa_feature {
>* which is 0 if the kernel doesn't support TM.
>*/
>   {CPU_FTR_TM_COMP, 0, 0, 22, 0, 0},
> + {0, MMU_FTR_RADIX, 0,   40, 0, 0},

So that says bit 0 of byte 40 enables MMU_FTR_RADIX. Where is that documented?
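
To spell out the length and bit check described above, a small sketch
follows. pa_feature_bit() is a hypothetical helper; the sketch treats the
buffer as the raw feature bytes (ignoring any header bytes the real property
may carry) and assumes IBM bit numbering, where bit 0 is the most
significant bit of the byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Look up one feature bit. Returns false if the buffer is too short to
 * contain the byte (firmware says nothing about it); otherwise stores the
 * bit value in *on and returns true.
 */
static bool pa_feature_bit(const uint8_t *ftrs, size_t len,
			   unsigned int pabyte, unsigned int pabit, bool *on)
{
	if (pabyte >= len)
		return false;

	/* Bit 0 is assumed to be the most significant bit of the byte. */
	*on = (ftrs[pabyte] >> (7 - pabit)) & 1;
	return true;
}
```

With byte 40 and bit 0, a 40-byte buffer yields "no answer", while a buffer
of at least 41 bytes lets the firmware-provided bit decide either way.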

cheers

Re: [PATCH v8 40/45] drivers/of: Split unflatten_dt_node()

2016-04-19 Thread Gavin Shan
On Wed, Feb 17, 2016 at 08:30:42AM -0600, Rob Herring wrote:
>On Tue, Feb 16, 2016 at 9:44 PM, Gavin Shan  wrote:
>> The function unflatten_dt_node() is called recursively to unflatten
>> device nodes and properties in the FDT blob. It looks complicated
>> and hard to be understood.
>>
>> This splits the function into 3 functions: populate_properties(),
>> populate_node() and unflatten_dt_node(). populate_properties(),
>> which is called by populate_node(), creates properties for the
>> indicated device node. The later one creates the device nodes
>> from FDT blob. unflatten_dt_node() gets the offset in FDT blob for
>> next device nodes and then calls populate_node(). No logical
>> changes introduced.
>>
>> Signed-off-by: Gavin Shan 
>> ---
>>  drivers/of/fdt.c | 249 
>> ---
>>  1 file changed, 147 insertions(+), 102 deletions(-)
>
>One nit, otherwise:
>
>Acked-by: Rob Herring 
>
>[...]
>
>> +   /* And we process the "ibm,phandle" property
>> +* used in pSeries dynamic device tree
>> +* stuff
>> +*/
>> +   if (!strcmp(pname, "ibm,phandle"))
>> +   np->phandle = be32_to_cpup(val);
>> +
>> +   pp->name   = (char *)pname;
>> +   pp->length = sz;
>> +   pp->value  = (__be32 *)val;
>
>This cast should not be needed.
>

Rob, very sorry for responding so late. I will fix it up in the next revision.

>> +   *pprev = pp;
>> +   pprev  = &pp->next;
>> +   }
>


Re: [PATCH v8 13/45] powerpc/powernv/ioda1: M64 support on P7IOC

2016-04-19 Thread Alexey Kardashevskiy

On 04/20/2016 10:22 AM, Gavin Shan wrote:

On Wed, Apr 13, 2016 at 05:47:59PM +1000, Alexey Kardashevskiy wrote:

On 02/17/2016 02:43 PM, Gavin Shan wrote:

This enables M64 window on P7IOC, which has been enabled on PHB3.
Different from PHB3 where 16 M64 BARs are supported and each of
them can be owned by one particular PE# exclusively or divided
evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
of them are divided to 8 segments. So every P7IOC PHB supports
128 M64 segments in total. P7IOC has M64DT, which helps mapping
one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
M64DT, indicating that one M64 segment can only be pinned to the
fixed PE#. In order to have same code to support M64 on P7IOC and
PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
of them is pinned to the fixed PE# by bypassing the function of
M64DT. In turn, we just need different phb->init_m64() for P7IOC
and PHB3 to support M64.


The comment is not quite correct - in addition to pnv_ioda1_init_m64(), you
also need to hack pnv_ioda_pick_m64_pe().



Right, will talk about the changes to pnv_ioda_pick_m64_pe() in the
commit log of next revision.
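
For anyone checking the numbers in the commit log, the arithmetic can be
sketched as below. The SKETCH_* constants and helper names are made up for
this example, and the segment-to-PE rule (segment n pinned to PE# n) is the
fixed mapping described in the commit message, taken here as an assumption.

```c
#include <stdint.h>

#define SKETCH_M64_NUM	16	/* M64 BARs per P7IOC PHB */
#define SKETCH_M64_SEGS	8	/* segments per M64 BAR */

/* Start address of BAR `bar`: each BAR spans SKETCH_M64_SEGS segments. */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int bar)
{
	return m64_base + (uint64_t)bar * SKETCH_M64_SEGS * segsz;
}

/* With segment# == PE#, the BAR holding a PE's segment and its slot. */
static unsigned int m64_bar_of_pe(unsigned int pe)
{
	return pe / SKETCH_M64_SEGS;
}

static unsigned int m64_seg_in_bar(unsigned int pe)
{
	return pe % SKETCH_M64_SEGS;
}

/* Start address of the segment pinned to PE# `pe`. */
static uint64_t m64_seg_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int pe)
{
	return m64_base + (uint64_t)pe * segsz;
}
```

16 BARs times 8 segments gives the 128 segments mentioned above, one per PE#.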





Signed-off-by: Gavin Shan 
---
  arch/powerpc/platforms/powernv/pci-ioda.c | 86 +--
  arch/powerpc/platforms/powernv/pci.h  |  3 ++
  2 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1dc663a..8488238 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -246,6 +246,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev 
*pdev,
}
  }

+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
+{
+   struct resource *r;
+   int index;
+
+   /*
+* There are 16 M64 BARs, each of which has 8 segments. So
+* there are as many M64 segments as the maximum number of
+* PEs, which is 128.
+*/
+   for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
+   unsigned long base, segsz = phb->ioda.m64_segsize;
+   int64_t rc;
+
+   base = phb->ioda.m64_base +
+  index * PNV_IODA1_M64_SEGS * segsz;
+   rc = opal_pci_set_phb_mem_window(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, index, base, 0,
+   PNV_IODA1_M64_SEGS * segsz);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
+   rc, phb->hose->global_number, index);
+   goto fail;
+   }
+
+   rc = opal_pci_phb_mmio_enable(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, index,
+   OPAL_ENABLE_M64_SPLIT);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
+   rc, phb->hose->global_number, index);
+   goto fail;
+   }
+   }
+
+   /*
+* Exclude the segment used by the reserved PE, which
+* is expected to be 0 or last supported PE#.
+*/
+   r = &phb->hose->mem_resources[1];
+   if (phb->ioda.reserved_pe_idx == 0)
+   r->start += phb->ioda.m64_segsize;
+   else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
+   r->end -= phb->ioda.m64_segsize;
+   else
+   pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
+   phb->ioda.reserved_pe_idx);
+
+   return 0;
+
+fail:
+   for ( ; index >= 0; index--)
+   opal_pci_phb_mmio_enable(phb->opal_id,
+   OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
+
+   return -EIO;
+}
+
  static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
unsigned long *pe_bitmap,
bool all)
@@ -315,6 +373,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, bool 
all)
pe->master = master_pe;
list_add_tail(&pe->list, &master_pe->slaves);
}
+
+   /*
+* P7IOC supports M64DT, which helps mapping M64 segment
+* to one particular PE#. However, PHB3 has fixed mapping
+* between M64 segment and PE#. In order to have same logic
+* for P7IOC and PHB3, we enforce fixed mapping between M64
+* segment and PE# on P7IOC.
+*/
+   if (phb->type == PNV_PHB_IODA1) {
+   int64_t rc;
+
+   rc = opal_pci_map_pe_mmio_window(phb->opal_id,
+   pe->pe_number, OPAL_M64_WINDOW_TYPE,
+   pe->pe_number / 
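
For readers following the arithmetic in the hunk above, here is a minimal
sketch of the M64 layout it describes: 16 BARs of 8 segments each, with a
fixed one-to-one mapping between segment and PE#. The macro and function
names are illustrative stand-ins, not the kernel's, and the BAR/segment
split of a PE# (division and remainder by 8) is an assumption inferred
from the comment, since the hunk is cut off at that point.

```c
#include <stdint.h>

/* Values taken from the comment in the patch: 16 BARs x 8 segments = 128. */
#define M64_NUM_BARS      16  /* stands in for PNV_IODA1_M64_NUM  */
#define M64_SEGS_PER_BAR   8  /* stands in for PNV_IODA1_M64_SEGS */

/* Start address of M64 BAR `index`, mirroring the base computation above. */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int index)
{
	return m64_base + (uint64_t)index * M64_SEGS_PER_BAR * segsz;
}

/* Assumed fixed mapping: PE# n owns segment n, so BAR = n / 8. */
static unsigned int m64_bar_of_pe(unsigned int pe_number)
{
	return pe_number / M64_SEGS_PER_BAR;
}

/* Segment index inside that BAR under the same assumption. */
static unsigned int m64_seg_of_pe(unsigned int pe_number)
{
	return pe_number % M64_SEGS_PER_BAR;
}

/* Address of the single segment assigned to a PE under that mapping. */
static uint64_t m64_seg_base_of_pe(uint64_t m64_base, uint64_t segsz,
				   unsigned int pe_number)
{
	return m64_base + (uint64_t)pe_number * segsz;
}
```

With this layout the last PE (127) lands in BAR 15, segment 7, which is
why excluding the reserved PE only needs to trim one segment from either
end of the M64 resource.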

RE: [PATCH 5/5] drivers/net: support hdlc function for QE-UCC

2016-04-19 Thread Qiang Zhao
On 20/04/2016 12:22AM, Christophe Leroy  wrote
> -Original Message-
> From: Christophe Leroy [mailto:christophe.le...@c-s.fr]
> Sent: Wednesday, April 20, 2016 12:22 AM
> To: Qiang Zhao ; da...@davemloft.net
> Cc: gre...@linuxfoundation.org; Xiaobo Xie ; linux-
> ker...@vger.kernel.org; o...@buserror.net; net...@vger.kernel.org;
> a...@linux-foundation.org; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH 5/5] drivers/net: support hdlc function for QE-UCC
> 
> Le 30/03/2016 10:50, Zhao Qiang a écrit :
> > The driver add hdlc support for Freescale QUICC Engine.
> > It support NMSI and TSA mode.
> When using TSA, how does the TSA gets configured ? Especially how do you
> describe which Timeslot is switched to HDLC channels ?

The TSA is configured statically according to the device tree node.
As for which timeslots are switched to HDLC channels, the device tree
property "fsl,tx-timeslot-mask" describes that.

> Is it possible to route some Timeslots to one UCC for HDLC, and route some
> others to another UCC for an ALSA sound driver ?

The feature you describe is not supported at present.
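
To make the timeslot mask mentioned above concrete, here is a small
sketch of how such a mask could be interpreted. It assumes a 32-bit mask
in which bit n selects timeslot n; the actual bit order used by the
binding is not stated in this thread, and the helper names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit n of the mask set means timeslot n carries HDLC data. */
static bool timeslot_selected(uint32_t mask, unsigned int slot)
{
	return slot < 32 && (mask & (UINT32_C(1) << slot)) != 0;
}

/* Number of timeslots selected by the mask. */
static unsigned int timeslot_count(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit each pass */
		n++;
	return n;
}
```

Under that assumption, a mask of 0x6 would route timeslots 1 and 2 to the
HDLC channel and leave the others untouched.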

> The QE also have a QMC which allows to split all timeslots to a given UCC into
> independant channels that can either be used with HDLC or transparents (for
> audio for instance). Do you intent to also support QMC ?

Newer QE versions use UMCC instead of the QMC found in older QE. We have
started developing UMCC support.
 
> According to the compatible property, it looks like your driver is for 
> freescale
> T1040. The MPC83xx also has a Quick Engine, would it work on it too ?

The driver is common, but it has only been tested on T1040. If you want
to test on MPC83xx, you need to add the corresponding node for it.

-Zhao Qiang
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 00/45] powerpc/powernv: PCI hotplug support

2016-04-19 Thread Gavin Shan
On Fri, Apr 15, 2016 at 11:10:21AM -0500, Rob Herring wrote:
>On Wed, Apr 13, 2016 at 8:30 PM, Gavin Shan  wrote:
>> On Thu, Apr 14, 2016 at 09:57:32AM +1000, Alistair Popple wrote:
>>>Hi Gavin,
>>>
>>>
>>>
 >Why exactly cannot EEH reset changes go to a smaller separate patchset
 >(before hotplug)?
 >

 As I explained before, the patchset's order is: PCI generic part,
 PowerNV PCI related, EEH related, device-tree part and hotplug driver.

 The EEH reset change is included in PATCH[37/45]. There is no point
 to reorder the patches.
>>>
>>>I don't understand all of the dependencies but if possible splitting the
>>>series up into a set of smaller self-contained patch series makes things
>>>easier to review and may make it easier for you to get this functionality
>>>reviewed and accepted into upstream.
>>>
>>
>> Thanks, Alistair. I will move those cleanup/refactor related patches
>> to form a separate series which is expected to be merged first. That
>> will helps the reviewers to focus on the patches with complicated
>> changes as you suggested. Alexey, please let me know if that way is
>> you like to see or not.
>
>As I said last cycle, I'll happily take the DT refactoring patches
>separately, but you have to tell me if you want me to apply them and
>it has to be well before the merge window.
>

Thanks, Rob. I hope to post the next revision (v9) soon, and the
device-tree related cleanup patches in it should be ready for the next
merge window.

>Rob
>

Re: [PATCH v8 38/45] powerpc/powernv: Functions to get/set PCI slot status

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 07:39:34PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>This exports 4 functins, which base on the corresponding OPAL
>
>
>s/functins/functions/
>

Thanks.

>>APIs to get/set PCI slot status. Those functions are going to
>>be used by PowerNV PCI hotplug driver:
>>
>>pnv_pci_get_device_tree()opal_get_device_tree()
>>pnv_pci_get_presence_state() opal_pci_get_presence_state()
>>pnv_pci_get_power_state()opal_pci_get_power_state()
>>pnv_pci_set_power_state()opal_pci_set_power_state()
>>
>>Besides, the patch also exports pnv_pci_hotplug_notifier_{register,
>>unregister}() to allow registration and unregistration of PCI hotplug
>>notifier, which will be used to receive PCI hotplug message from
>>skiboot firmware in PowerNV PCI hotplug driver.
>>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/include/asm/opal-api.h| 17 ++-
>>  arch/powerpc/include/asm/opal.h|  4 ++
>>  arch/powerpc/include/asm/pnv-pci.h |  7 +++
>>  arch/powerpc/platforms/powernv/opal-wrappers.S |  4 ++
>>  arch/powerpc/platforms/powernv/pci.c   | 66 
>> ++
>>  5 files changed, 97 insertions(+), 1 deletion(-)
>>
>>diff --git a/arch/powerpc/include/asm/opal-api.h 
>>b/arch/powerpc/include/asm/opal-api.h
>>index f8faaae..a6af338 100644
>>--- a/arch/powerpc/include/asm/opal-api.h
>>+++ b/arch/powerpc/include/asm/opal-api.h
>>@@ -158,7 +158,11 @@
>>  #define OPAL_LEDS_SET_INDICATOR 115
>>  #define OPAL_CEC_REBOOT2116
>>  #define OPAL_CONSOLE_FLUSH  117
>>-#define OPAL_LAST117
>>+#define OPAL_GET_DEVICE_TREE 118
>>+#define OPAL_PCI_GET_PRESENCE_STATE  119
>>+#define OPAL_PCI_GET_POWER_STATE 120
>>+#define OPAL_PCI_SET_POWER_STATE 121
>>+#define OPAL_LAST121
>>
>>  /* Device tree flags */
>>
>>@@ -344,6 +348,16 @@ enum OpalPciResetState {
>>  OPAL_ASSERT_RESET   = 1
>>  };
>>
>>+enum OpalPciSlotPresentenceState {
>>+ OPAL_PCI_SLOT_EMPTY = 0,
>>+ OPAL_PCI_SLOT_PRESENT   = 1
>>+};
>>+
>>+enum OpalPciSlotPowerState {
>>+ OPAL_PCI_SLOT_POWER_OFF = 0,
>>+ OPAL_PCI_SLOT_POWER_ON  = 1
>>+};
>>+
>>  enum OpalSlotLedType {
>>  OPAL_SLOT_LED_TYPE_ID = 0,  /* IDENTIFY LED */
>>  OPAL_SLOT_LED_TYPE_FAULT = 1,   /* FAULT LED */
>>@@ -378,6 +392,7 @@ enum opal_msg_type {
>>  OPAL_MSG_DPO,
>>  OPAL_MSG_PRD,
>>  OPAL_MSG_OCC,
>>+ OPAL_MSG_PCI_HOTPLUG,
>>  OPAL_MSG_TYPE_MAX,
>>  };
>>
>>diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>>index 9e0039f..899bcb941 100644
>>--- a/arch/powerpc/include/asm/opal.h
>>+++ b/arch/powerpc/include/asm/opal.h
>>@@ -209,6 +209,10 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, 
>>uint64_t buf,
>>  uint64_t size, uint64_t token);
>>  int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
>>  uint64_t token);
>>+int64_t opal_get_device_tree(uint32_t phandle, uint64_t buf, uint64_t len);
>>+int64_t opal_pci_get_presence_state(uint64_t id, uint8_t *state);
>>+int64_t opal_pci_get_power_state(uint64_t id, uint8_t *state);
>>+int64_t opal_pci_set_power_state(uint64_t id, uint8_t state);
>>
>>  /* Internal functions */
>>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
>>diff --git a/arch/powerpc/include/asm/pnv-pci.h 
>>b/arch/powerpc/include/asm/pnv-pci.h
>>index 6f77f71..d9d095b 100644
>>--- a/arch/powerpc/include/asm/pnv-pci.h
>>+++ b/arch/powerpc/include/asm/pnv-pci.h
>>@@ -13,6 +13,13 @@
>>  #include 
>>  #include 
>>
>>+extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t 
>>len);
>>+extern int pnv_pci_get_presence_state(uint64_t id, uint8_t *state);
>>+extern int pnv_pci_get_power_state(uint64_t id, uint8_t *state);
>>+extern int pnv_pci_set_power_state(uint64_t id, uint8_t state);
>>+extern int pnv_pci_hotplug_notifier_register(struct notifier_block *nb);
>>+extern int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb);
>>+
>>  int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
>>  int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
>> unsigned int virq);
>>diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
>>b/arch/powerpc/platforms/powernv/opal-wrappers.S
>>index e45b88a..3ea1a855 100644
>>--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
>>+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
>>@@ -302,3 +302,7 @@ OPAL_CALL(opal_prd_msg,   
>>OPAL_PRD_MSG);
>>  OPAL_CALL(opal_leds_get_ind,
>> OPAL_LEDS_GET_INDICATOR);
>>  OPAL_CALL(opal_leds_set_ind,
>> OPAL_LEDS_SET_INDICATOR);
>>  OPAL_CALL(opal_console_flush,   OPAL_CONSOLE_FLUSH);

Re: [PATCH v8 37/45] powerpc/powernv: Use firmware PCI slot reset infrastructure

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 07:34:55PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>The skiboot firmware might provide the PCI slot reset capability
>>which is identified by property "ibm,reset-by-firmware" on the
>>PCI slot associated device node.
>>
>>This checks the property. If it exists, the reset request is routed
>>to firmware. Otherwise, the reset is done by kernel as before.
>>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 41 
>> +++-
>>  1 file changed, 40 insertions(+), 1 deletion(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
>>b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>index e23b063..c8a5217 100644
>>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>@@ -789,7 +789,7 @@ static int pnv_eeh_root_reset(struct pci_controller 
>>*hose, int option)
>>  return ret;
>>  }
>>
>>-static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>+static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
>>  {
>>  struct pci_dn *pdn = pci_get_pdn_by_devfn(dev->bus, dev->devfn);
>>  struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>@@ -840,6 +840,45 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int 
>>option)
>>  return 0;
>>  }
>>
>>+static int pnv_eeh_bridge_reset(struct pci_dev *pdev, int option)
>>+{
>>+ struct pci_controller *hose;
>>+ struct pnv_phb *phb;
>>+ struct device_node *dn = pdev ? pci_device_to_OF_node(pdev) : NULL;
>>+ uint64_t id = (0x1ul << 60);
>
>
>What is this 1<<60 for?
>
>

As you suggested in other threads, it's worth having some macros for
this. The bit indicates that the ID refers to a slot behind a switch
port. If the bit is cleared, the ID represents a PHB slot.
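
As a sketch of what such macros might look like, the following builds the
slot ID with the same field layout as the quoted code (bit 60, bus number
at bit 24, devfn at bit 16, PHB OPAL ID in the low bits). The macro and
function names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical macro names; the field layout follows the quoted code. */
#define SLOT_ID_SWITCH_PORT	(UINT64_C(1) << 60)	/* slot behind a switch port */
#define SLOT_ID_BUS_SHIFT	24
#define SLOT_ID_DEVFN_SHIFT	16

/* Compose the ID of a slot behind a switch port. */
static uint64_t slot_id_make(uint64_t phb_opal_id, uint8_t bus, uint8_t devfn)
{
	return SLOT_ID_SWITCH_PORT |
	       ((uint64_t)bus << SLOT_ID_BUS_SHIFT) |
	       ((uint64_t)devfn << SLOT_ID_DEVFN_SHIFT) |
	       phb_opal_id;
}

/* With the bit cleared, the ID names the PHB slot itself. */
static bool slot_id_is_phb(uint64_t id)
{
	return (id & SLOT_ID_SWITCH_PORT) == 0;
}
```

The casts widen the 8-bit fields before shifting, so a bus number of 0x80
or above does not shift into the sign bit of an int.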

>>+ uint8_t scope;
>>+ int64_t rc;
>>+
>>+ /*
>>+  * If the firmware can't handle it, we will issue hot reset
>>+  * on the secondary bus despite the requested reset type.
>>+  */
>>+ if (!dn || !of_get_property(dn, "ibm,reset-by-firmware", NULL))
>>+ return __pnv_eeh_bridge_reset(pdev, option);
>>+
>>+ /* The firmware can handle the request */
>>+ switch (option) {
>>+ case EEH_RESET_HOT:
>>+ scope = OPAL_RESET_PCI_HOT;
>>+ break;
>>+ case EEH_RESET_FUNDAMENTAL:
>>+ scope = OPAL_RESET_PCI_FUNDAMENTAL;
>>+ break;
>>+ case EEH_RESET_DEACTIVATE:
>>+ return 0;
>>+ default:
>>+ dev_warn(&pdev->dev, "%s: Unsupported reset %d\n",
>>+  __func__, option);
>
>
>Can the userspace trigger this case (via VFIO-EEH) and flood dmesg?
>

It depends on how you define message flooding. This is an abnormal path
caused by an internal program error, not by external users.

>
>
>>+ return -EINVAL;
>>+ }
>>+
>>+ hose = pci_bus_to_host(pdev->bus);
>>+ phb = hose->private_data;
>>+ id |= (pdev->bus->number << 24) | (pdev->devfn << 16) | phb->opal_id;
>>+ rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
>>+ return pnv_pci_poll(id, rc, NULL);
>>+}
>>+
>>  static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
>>  {
>>  int *freset = data;
>>
>
>
>-- 
>Alexey
>

Re: [PATCH v8 36/45] powerpc/powernv: Support PCI slot ID

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 07:28:20PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>PowerNV platforms runs on top of skiboot firmware that includes
>>changes to support PCI slots. PCI slots are identified by PHB's
>>ID or the combo of that and PCI slot ID.
>>
>>This changes the EEH PowerNV backend to support PCI slots:
>>
>>* Rename arguments of opal_pci_reset() and opal_pci_poll().
>>* One more argument (PCI slot's state) added to opal_pci_poll().
>>* Drop pnv_eeh_phb_poll() and introduce a enhanced similar
>>  function pnv_pci_poll() that will be used by PowerNV hotplug
>>  backends.
>>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/include/asm/opal.h  |  4 +--
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 42 
>> ++--
>>  arch/powerpc/platforms/powernv/pci.c | 21 ++
>>  arch/powerpc/platforms/powernv/pci.h |  1 +
>>  4 files changed, 32 insertions(+), 36 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>>index 07a99e6..9e0039f 100644
>>--- a/arch/powerpc/include/asm/opal.h
>>+++ b/arch/powerpc/include/asm/opal.h
>>@@ -131,7 +131,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, 
>>uint16_t pe_number, uint16_t
>>  int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
>>  uint16_t dma_window_number, uint64_t 
>> pci_start_addr,
>>  uint64_t pci_mem_size);
>>-int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t 
>>assert_state);
>>+int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t 
>>assert_state);
>>
>>  int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
>> uint64_t diag_buffer_len);
>>@@ -148,7 +148,7 @@ int64_t opal_get_dpo_status(__be64 *dpo_timeout);
>>  int64_t opal_set_system_attention_led(uint8_t led_action);
>>  int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
>>  __be16 *pci_error_type, __be16 *severity);
>>-int64_t opal_pci_poll(uint64_t phb_id);
>>+int64_t opal_pci_poll(uint64_t id, uint8_t *state);
>>  int64_t opal_return_cpu(void);
>>  int64_t opal_check_token(uint64_t token);
>>  int64_t opal_reinit_cpus(uint64_t flags);
>>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
>>b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>index c7454ba..e23b063 100644
>>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>@@ -717,28 +717,11 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int 
>>*delay)
>>  return ret;
>>  }
>>
>>-static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
>>-{
>>- s64 rc = OPAL_HARDWARE;
>>-
>>- while (1) {
>>- rc = opal_pci_poll(phb->opal_id);
>>- if (rc <= 0)
>>- break;
>>-
>>- if (system_state < SYSTEM_RUNNING)
>>- udelay(1000 * rc);
>>- else
>>- msleep(rc);
>>- }
>>-
>>- return rc;
>>-}
>>-
>>  int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
>>  {
>>  struct pnv_phb *phb = hose->private_data;
>>  s64 rc = OPAL_HARDWARE;
>>+ int ret;
>>
>>  pr_debug("%s: Reset PHB#%x, option=%d\n",
>>   __func__, hose->global_number, option);
>>@@ -753,8 +736,6 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int 
>>option)
>>  rc = opal_pci_reset(phb->opal_id,
>>  OPAL_RESET_PHB_COMPLETE,
>>  OPAL_DEASSERT_RESET);
>>- if (rc < 0)
>>- goto out;
>>
>>  /*
>>   * Poll state of the PHB until the request is done
>>@@ -762,24 +743,22 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int 
>>option)
>>   * reset followed by hot reset on root bus. So we also
>>   * need the PCI bus settlement delay.
>>   */
>>- rc = pnv_eeh_phb_poll(phb);
>>- if (option == EEH_RESET_DEACTIVATE) {
>>+ ret = pnv_pci_poll(phb->opal_id, rc, NULL);
>>+ if (option == EEH_RESET_DEACTIVATE && !ret) {
>>  if (system_state < SYSTEM_RUNNING)
>>  udelay(1000 * EEH_PE_RST_SETTLE_TIME);
>>  else
>>  msleep(EEH_PE_RST_SETTLE_TIME);
>>  }
>>-out:
>>- if (rc != OPAL_SUCCESS)
>>- return -EIO;
>>
>>- return 0;
>>+ return ret;
>>  }
>>
>>  static int pnv_eeh_root_reset(struct pci_controller *hose, int option)
>>  {
>>  struct pnv_phb *phb = hose->private_data;
>>  s64 rc = OPAL_HARDWARE;
>>+ int ret;
>>
>>  pr_debug("%s: Reset PHB#%x, option=%d\n",
>>   __func__, hose->global_number, option);
>>@@ -801,18 +780,13 @@ static int pnv_eeh_root_reset(struct pci_controller 
>>*hose, int option)
>>  rc = opal_pci_reset(phb->opal_id,

Re: [PATCH v8 30/45] powerpc/pci: Delay populating pdn

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 06:19:20PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>The pdn (struct pci_dn) instances are allocated from memblock or
>>bootmem when creating PCI controller (hoses) in setup_arch(). PCI
>>hotplug, which will be supported by proceeding patches, releases
>>PCI device nodes and their corresponding pdn on unplugging event.
>>The memory chunks for pdn instances allocated from memblock or
>>bootmem are hard to reused after being released.
>>
>>This delays creating pdn by pci_devs_phb_init() from setup_arch()
>>to core_initcall() so that they are allocated from slab. The memory
>>consumed by pdn can be released to system without problem during
>>PCI unplugging time. It indicates that pci_dn is unavailable in
>>setup_arch() and the the fixup on pdn (like AGP's) can't be carried
>>out that time. We have to do that in ppc_md.pcibios_root_bridge_prepare()
>>on maple/pasemi/powermac platforms where/when the pdn is available.
>>
>>At the mean while, the EEH device is created when pdn is populated,
>>meaning pdn and EEH device have same life cycle. In turn, we needn't
>>call eeh_dev_init() to create EEH device explicitly.
>>
>>Signed-off-by: Gavin Shan 
>
>
>Uff. It would not hurt to mention that  pcibios_root_bridge_prepare is called
>from subsys_initcall() which is executed after core_initcall() so the code
>flow does not change.
>

Yes, will do in next revision.

>Have you checked if there is anything in between
>core_initcall(pci_devs_phb_init) and subsys_initcall(pcibios_init) which
>might need device tree nodes? For example, subsys_initcall(pcibios_init)
>calls (eventually) pnv_pci_ioda_fixup(), if we are unlucky and pcibios_init()
>(and therefore pnv_pci_ioda_fixup() or what pseries/others do) is called
>before pcibios_init() - won't we crash or something?
>

I don't follow what you are asking. Device-tree nodes (struct device_node)
are always there, and this patch doesn't affect them. Perhaps you were
talking about pdn (PCI_DN). If so, this patch delays creating pdn from
setup_arch() to core_initcall(pci_devs_phb_init). I don't see anything
that needs pdn between setup_arch() and core_initcall().

The change introduced to the powermac/pasemi platforms is to move fixing
up the child pdns of the specific PHB's pdn from setup_arch() to
subsys_initcall(pcibios_init). I don't see anything between them that
needs the fixed pdns.

I don't understand how pcibios_init() could be called before pcibios_init()
in your context. Sorry for my bad English. Perhaps you're asking about the
calling order of core_initcall() and subsys_initcall()? If so, they're
defined as below:

#define core_initcall(fn)   __define_initcall(fn, 1)
#define subsys_initcall(fn) __define_initcall(fn, 4)
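
To illustrate what the two level numbers above imply, here is a toy
userspace model of initcall ordering: entries run in ascending level
order regardless of the order in which they were registered. The kernel
implements this with linker sections rather than a table scan, so this is
only a model, and the demo_* names are hypothetical.

```c
#include <stddef.h>

struct initcall {
	int level;		/* 1 = core_initcall, 4 = subsys_initcall */
	void (*fn)(void);
};

/* Run every entry in ascending level order; ties keep table order. */
static void run_initcalls(const struct initcall *calls, size_t n)
{
	for (int level = 0; level <= 7; level++)
		for (size_t i = 0; i < n; i++)
			if (calls[i].level == level)
				calls[i].fn();
}

/* Records the order in which the demo functions ran. */
static char order_log[8];
static size_t order_len;

/* Stands in for pci_devs_phb_init(). */
static void demo_pdn_init(void)
{
	order_log[order_len++] = 'P';
}

/* Stands in for pcibios_init(). */
static void demo_pcibios_init(void)
{
	order_log[order_len++] = 'B';
}

static const struct initcall demo_calls[] = {
	{ 4, demo_pcibios_init },	/* listed first, still runs second */
	{ 1, demo_pdn_init },
};
```

In this model the level 1 entry always runs before the level 4 entry, so
the pdn would already exist by the time the pcibios stand-in executes.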

>
>-- 
>Alexey
>

Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver

2016-04-19 Thread Alistair Popple
On Tue, 19 Apr 2016 20:36:48 Alexey Kardashevskiy wrote:
> On 02/17/2016 02:44 PM, Gavin Shan wrote:
> > This adds standalone driver to support PCI hotplug for PowerPC PowerNV
> > platform that runs on top of skiboot firmware. The firmware identifies
> > hotpluggable slots and marked their device tree node with proper
> > "ibm,slot-pluggable" and "ibm,reset-by-firmware". The driver scans
> > device tree nodes to create/register PCI hotplug slot accordingly.
> >
> > The PCI slots are organized in fashion of tree, which means one
> > PCI slot might have parent PCI slot and parent PCI slot possibly
> > contains multiple child PCI slots. At the plugging time, the parent
> > PCI slot is populated before its children. The child PCI slots are
> > removed before their parent PCI slot can be removed from the system.
> >
> > If the skiboot firmware doesn't support slot status retrieval, the PCI
> > slot device node shouldn't have property "ibm,reset-by-firmware". In
> > that case, none of valid PCI slots will be detected from device tree.
> > The skiboot firmware doesn't export the capability to access attention
> > LEDs yet and it's something for TBD.
> >
> > Signed-off-by: Gavin Shan 
> > Acked-by: Bjorn Helgaas 
> > ---
> >   drivers/pci/hotplug/Kconfig   |  12 +
> >   drivers/pci/hotplug/Makefile  |   3 +
> >   drivers/pci/hotplug/pnv_php.c | 870 
> > ++
> >   3 files changed, 885 insertions(+)
> >   create mode 100644 drivers/pci/hotplug/pnv_php.c
> >
> > diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
> > index df8caec..167c8ce 100644
> > --- a/drivers/pci/hotplug/Kconfig
> > +++ b/drivers/pci/hotplug/Kconfig
> > @@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
> >
> >   When in doubt, say N.
> >
> > +config HOTPLUG_PCI_POWERNV
> > +   tristate "PowerPC PowerNV PCI Hotplug driver"
> > +   depends on PPC_POWERNV && EEH
> > +   help
> > + Say Y here if you run PowerPC PowerNV platform that supports
> > + PCI Hotplug
> > +
> > + To compile this driver as a module, choose M here: the
> > + module will be called pnv-php.
> > +
> > + When in doubt, say N.
> > +
> >   config HOTPLUG_PCI_RPA
> > tristate "RPA PCI Hotplug driver"
> > depends on PPC_PSERIES && EEH
> > diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
> > index b616e75..e33cdda 100644
> > --- a/drivers/pci/hotplug/Makefile
> > +++ b/drivers/pci/hotplug/Makefile
> > @@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)+= pciehp.o
> >   obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550) += cpcihp_zt5550.o
> >   obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)+= cpcihp_generic.o
> >   obj-$(CONFIG_HOTPLUG_PCI_SHPC)+= shpchp.o
> > +obj-$(CONFIG_HOTPLUG_PCI_POWERNV)  += pnv-php.o
> >   obj-$(CONFIG_HOTPLUG_PCI_RPA) += rpaphp.o
> >   obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)   += rpadlpar_io.o
> >   obj-$(CONFIG_HOTPLUG_PCI_SGI) += sgi_hotplug.o
> > @@ -50,6 +51,8 @@ ibmphp-objs   :=  ibmphp_core.o   \
> >   acpiphp-objs  :=  acpiphp_core.o  \
> > acpiphp_glue.o
> >
> > +pnv-php-objs   :=  pnv_php.o
> > +
> >   rpaphp-objs   :=  rpaphp_core.o   \
> > rpaphp_pci.o\
> > rpaphp_slot.o
> > diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> > new file mode 100644
> > index 000..364ec36
> > --- /dev/null
> > +++ b/drivers/pci/hotplug/pnv_php.c
> > @@ -0,0 +1,870 @@
> > +/*
> > + * PCI Hotplug Driver for PowerPC PowerNV platform.
> > + *
> > + * Copyright Gavin Shan, IBM Corporation 2015.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define DRIVER_VERSION "0.1"
> > +#define DRIVER_AUTHOR  "Gavin Shan, IBM Corporation"
> > +#define DRIVER_DESC"PowerPC PowerNV PCI Hotplug Driver"
> > +
> > +struct pnv_php_slot {
> > +   struct hotplug_slot slot;
> > +   struct hotplug_slot_infoslot_info;
> > +   uint64_tid;
> > +   char*name;
> > +   int slot_no;
> > +   struct kref kref;
> > +#define PNV_PHP_STATE_INITIALIZED  0
> > +#define PNV_PHP_STATE_REGISTERED   1
> > +#define PNV_PHP_STATE_POPULATED2
> > +   int state;
> > +   struct device_node  *dn;
> > +   struct pci_dev  *pdev;
> > +   struct pci_bus  *bus;
> > +   bool   

Re: [PATCH v8 35/45] powerpc/powernv: Fundamental reset in pnv_pci_reset_secondary_bus()

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 07:04:19PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>In pnv_pci_reset_secondary_bus(), we should issue fundamental reset
>>if any one subordinate device of the specified bus is requesting that.
>>Otherwise, the device might not come up after the reset.
>>
>>Signed-off-by: Gavin Shan 
>
>
>Reviewed-by: Alexey Kardashevskiy 
>
>
>Out of curiosity - what does "fundamental" reset actually do?
>

It powers the target slot off and then on again. Please refer to the
skiboot patches for details.

>
>>---
>>  arch/powerpc/platforms/powernv/eeh-powernv.c | 21 -
>>  1 file changed, 20 insertions(+), 1 deletion(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
>>b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>index 593b8dc..c7454ba 100644
>>--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
>>+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
>>@@ -866,9 +866,28 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int 
>>option)
>>  return 0;
>>  }
>>
>>+static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
>>+{
>>+ int *freset = data;
>>+
>>+ /*
>>+  * Stop the iteration immediately if there has any one
>>+  * PCI device requesting fundamental reset.
>>+  */
>>+ *freset |= pdev->needs_freset;
>>+ return *freset;
>>+}
>>+
>>  void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
>>  {
>>- pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
>>+ int option, freset = 0;
>>+
>>+ if (dev->subordinate)
>>+ pci_walk_bus(dev->subordinate,
>>+  pnv_pci_dev_reset_type, &freset);
>>+
>>+ option = freset ? EEH_RESET_FUNDAMENTAL : EEH_RESET_HOT;
>>+ pnv_eeh_bridge_reset(dev, option);
>>  pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
>>  }
>>
>>
>
>
>-- 
>Alexey
>
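
The reset-type selection in the patch above relies on a bus walk that
stops as soon as the callback returns nonzero. A minimal standalone
sketch of that pattern follows. It walks a flat array instead of a PCI
bus hierarchy, and struct demo_dev and walk_devs() are hypothetical
stand-ins for struct pci_dev and pci_walk_bus().

```c
#include <stddef.h>

struct demo_dev {
	int needs_freset;	/* mirrors pdev->needs_freset */
};

/* Callback: fold the flag into *data; a nonzero return stops the walk. */
static int dev_reset_type(struct demo_dev *dev, void *data)
{
	int *freset = data;

	*freset |= dev->needs_freset;
	return *freset;
}

/* Simplified walk over a flat array; returns how many devices it visited. */
static size_t walk_devs(struct demo_dev *devs, size_t n,
			int (*cb)(struct demo_dev *, void *), void *data)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		visited++;
		if (cb(&devs[i], data))
			break;
	}
	return visited;
}
```

Once any device asks for a fundamental reset, the answer cannot change,
so stopping early only saves work.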

Re: [PATCH] cxl: Increase timeout for detection of AFU mmio hang

2016-04-19 Thread Ian Munsie
Acked-by: Ian Munsie 

[PATCH] powerpc/512x: clk: Remove CLK_IS_ROOT

2016-04-19 Thread Stephen Boyd
This flag is a no-op now (see commit 47b0eeb3dc8a "clk: Deprecate
CLK_IS_ROOT", 2016-02-02) so remove it.

Cc: Gerhard Sittig 
Signed-off-by: Stephen Boyd 
---
 arch/powerpc/platforms/512x/clock-commonclk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/512x/clock-commonclk.c 
b/arch/powerpc/platforms/512x/clock-commonclk.c
index c50ea76ba66c..6081fbd75330 100644
--- a/arch/powerpc/platforms/512x/clock-commonclk.c
+++ b/arch/powerpc/platforms/512x/clock-commonclk.c
@@ -221,7 +221,7 @@ static bool soc_has_mclk_mux0_canin(void)
 /* convenience wrappers around the common clk API */
 static inline struct clk *mpc512x_clk_fixed(const char *name, int rate)
 {
-   return clk_register_fixed_rate(NULL, name, NULL, CLK_IS_ROOT, rate);
+   return clk_register_fixed_rate(NULL, name, NULL, 0, rate);
 }
 
 static inline struct clk *mpc512x_clk_factor(
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH v8 29/45] powerpc/pci: Export pci_traverse_device_nodes()

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 03:51:03PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>This renames traverse_pci_devices() to pci_traverse_device_nodes().
>>The function traverses all subordinate device nodes of the specified
>>one. Also, below cleanup applied to the function. No logical changes
>>introduced.
>>
>>* Rename "pre" to "fn".
>>* Avoid assignment in if condition reported from checkpatch.pl.
>>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/include/asm/ppc-pci.h   |  6 +++---
>>  arch/powerpc/kernel/pci_dn.c | 15 ++-
>>  arch/powerpc/platforms/pseries/msi.c |  4 ++--
>>  3 files changed, 15 insertions(+), 10 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/ppc-pci.h 
>>b/arch/powerpc/include/asm/ppc-pci.h
>>index ca0c5bf..8753e4e 100644
>>--- a/arch/powerpc/include/asm/ppc-pci.h
>>+++ b/arch/powerpc/include/asm/ppc-pci.h
>>@@ -33,9 +33,9 @@ extern struct pci_dev *isa_bridge_pcidev;   /* may be NULL 
>>if no ISA bus */
>>  struct device_node;
>>  struct pci_dn;
>>
>>-typedef void *(*traverse_func)(struct device_node *me, void *data);
>
>
>
>Why removing this typedef? Typedef's are good.
>
>Anyway,
>

Could you please explain in more detail why it's good? I removed it
because it was used only once.
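
For comparison, the sketch below shows both spellings of the callback
type side by side. The typedef and the spelled-out function pointer
denote the same type, so callers are unaffected either way. The walker is
a simplified recursive stand-in over a hypothetical struct demo_node. The
real function is iterative and descends only into PCI bridges.

```c
#include <stddef.h>

struct demo_node {
	int id;
	struct demo_node *child;	/* first child */
	struct demo_node *sibling;	/* next sibling */
};

/* The removed style: a named callback type. */
typedef void *(*traverse_func)(struct demo_node *, void *);

/* The new style: the same type spelled out in the parameter list. */
static void *traverse_nodes(struct demo_node *start,
			    void *(*fn)(struct demo_node *, void *),
			    void *data)
{
	for (struct demo_node *dn = start->child; dn; dn = dn->sibling) {
		void *ret;

		/* No assignment inside the condition, as in the cleanup. */
		if (fn) {
			ret = fn(dn, data);
			if (ret)
				return ret;
		}

		ret = traverse_nodes(dn, fn, data);
		if (ret)
			return ret;
	}
	return NULL;
}

/* Callback: return the node whose id matches *data, else NULL to go on. */
static void *match_id(struct demo_node *dn, void *data)
{
	return dn->id == *(int *)data ? dn : NULL;
}
```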


>
>Reviewed-by: Alexey Kardashevskiy 
>
>
>
>
>>-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>- void *data);
>>+void *pci_traverse_device_nodes(struct device_node *start,
>>+ void *(*fn)(struct device_node *, void *),
>>+ void *data);
>>  void *traverse_pci_dn(struct pci_dn *root,
>>void *(*fn)(struct pci_dn *, void *),
>>void *data);
>>diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>index ce10281..ecdccce 100644
>>--- a/arch/powerpc/kernel/pci_dn.c
>>+++ b/arch/powerpc/kernel/pci_dn.c
>>@@ -372,8 +372,9 @@ EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
>>   * one of these nodes we also assume its siblings are non-pci for
>>   * performance.
>>   */
>>-void *traverse_pci_devices(struct device_node *start, traverse_func pre,
>>- void *data)
>>+void *pci_traverse_device_nodes(struct device_node *start,
>>+ void *(*fn)(struct device_node *, void *),
>>+ void *data)
>>  {
>>  struct device_node *dn, *nextdn;
>>  void *ret;
>>@@ -388,8 +389,11 @@ void *traverse_pci_devices(struct device_node *start, 
>>traverse_func pre,
>>  if (classp)
>>  class = of_read_number(classp, 1);
>>
>>- if (pre && ((ret = pre(dn, data)) != NULL))
>>- return ret;
>>+ if (fn) {
>>+ ret = fn(dn, data);
>>+ if (ret)
>>+ return ret;
>>+ }
>>
>>  /* If we are a PCI bridge, go down */
>>  if (dn->child && ((class >> 8) == PCI_CLASS_BRIDGE_PCI ||
>>@@ -411,6 +415,7 @@ void *traverse_pci_devices(struct device_node *start, 
>>traverse_func pre,
>>  }
>>  return NULL;
>>  }
>>+EXPORT_SYMBOL_GPL(pci_traverse_device_nodes);
>>
>>  static struct pci_dn *pci_dn_next_one(struct pci_dn *root,
>>struct pci_dn *pdn)
>>@@ -487,7 +492,7 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
>>  }
>>
>>  /* Update dn->phb ptrs for new phb and children devices */
>>- traverse_pci_devices(dn, add_pdn, phb);
>>+ pci_traverse_device_nodes(dn, add_pdn, phb);
>>  }
>>
>>  /**
>>diff --git a/arch/powerpc/platforms/pseries/msi.c 
>>b/arch/powerpc/platforms/pseries/msi.c
>>index 272e9ec..543a638 100644
>>--- a/arch/powerpc/platforms/pseries/msi.c
>>+++ b/arch/powerpc/platforms/pseries/msi.c
>>@@ -305,7 +305,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int 
>>request)
>>  memset(&counts, 0, sizeof(struct msi_counts));
>>
>>  /* Work out how many devices we have below this PE */
>>- traverse_pci_devices(pe_dn, count_non_bridge_devices, &counts);
>>+ pci_traverse_device_nodes(pe_dn, count_non_bridge_devices, &counts);
>>
>>  if (counts.num_devices == 0) {
>>  pr_err("rtas_msi: found 0 devices under PE for %s\n",
>>@@ -320,7 +320,7 @@ static int msi_quota_for_device(struct pci_dev *dev, int 
>>request)
>>  /* else, we have some more calculating to do */
>>  counts.requestor = pci_device_to_OF_node(dev);
>>  counts.request = request;
>>- traverse_pci_devices(pe_dn, count_spare_msis, &counts);
>>+ pci_traverse_device_nodes(pe_dn, count_spare_msis, &counts);
>>
>>  /* If the quota isn't an integer multiple of the total, we can
>>   * use the remainder as spare MSIs for anyone that wants them. */
>>
>
>
>-- 
>Alexey
>

Re: [PATCH v8 28/45] powerpc/pci: Introduce pci_remove_device_node_info()

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 03:48:26PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>This implements and exports pci_remove_device_node_info(). It's
>>used to remove the pdn (struct pci_dn) for the indicated device
>>node. The function is going to be used by PowerNV PCI hotplug
>>driver.
>>
>>Signed-off-by: Gavin Shan 
>
>Kind of strange that there is no such helper for pseries, is there?
>

I couldn't find one, actually. If you find one, please let me know. Thanks!

>
>Reviewed-by: Alexey Kardashevskiy 
>
>
>>---
>>  arch/powerpc/include/asm/pci-bridge.h |  1 +
>>  arch/powerpc/kernel/pci_dn.c  | 23 +++
>>  2 files changed, 24 insertions(+)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>>b/arch/powerpc/include/asm/pci-bridge.h
>>index 72a9d4e..c6310e2 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -240,6 +240,7 @@ extern struct pci_dn *add_dev_pci_data(struct pci_dev 
>>*pdev);
>>  extern void remove_dev_pci_data(struct pci_dev *pdev);
>>  extern struct pci_dn *pci_add_device_node_info(struct pci_controller *hose,
>> struct device_node *dn);
>>+extern void pci_remove_device_node_info(struct device_node *dn);
>>
>>  static inline int pci_device_from_OF_node(struct device_node *np,
>>u8 *bus, u8 *devfn)
>>diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
>>index 0a249ff..ce10281 100644
>>--- a/arch/powerpc/kernel/pci_dn.c
>>+++ b/arch/powerpc/kernel/pci_dn.c
>>@@ -331,6 +331,29 @@ struct pci_dn *pci_add_device_node_info(struct 
>>pci_controller *hose,
>>  }
>>  EXPORT_SYMBOL_GPL(pci_add_device_node_info);
>>
>>+void pci_remove_device_node_info(struct device_node *dn)
>>+{
>>+ struct pci_dn *pdn = dn ? PCI_DN(dn) : NULL;
>>+#ifdef CONFIG_EEH
>>+ struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>>+
>>+ if (edev)
>>+ edev->pdn = NULL;
>>+#endif
>>+
>>+ if (!pdn)
>>+ return;
>>+
>>+ WARN_ON(!list_empty(&pdn->child_list));
>>+ list_del(&pdn->list);
>>+ if (pdn->parent)
>>+ of_node_put(pdn->parent->node);
>>+
>>+ dn->data = NULL;
>>+ kfree(pdn);
>>+}
>>+EXPORT_SYMBOL_GPL(pci_remove_device_node_info);
>>+
>>  /*
>>   * Traverse a device tree stopping each PCI device in the tree.
>>   * This is done depth first.  As each node is processed, a "pre"
>>
>
>
>-- 
>Alexey
>


Re: [PATCH v8 24/45] powerpc/pci: Rename pcibios_{add,remove}_pci_devices()

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 03:28:36PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>This renames pcibios_{add,remove}_pci_devices() to avoid conflicts
>>with names of the weak functions in PCI subsystem, which have the
>>prefix "pcibios". No logical changes introduced.
>>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/include/asm/pci-bridge.h |  4 ++--
>>  arch/powerpc/kernel/eeh_driver.c  | 12 ++--
>>  arch/powerpc/kernel/pci-hotplug.c | 15 +++
>>  drivers/pci/hotplug/rpadlpar_core.c   |  2 +-
>>  drivers/pci/hotplug/rpaphp_core.c |  4 ++--
>>  drivers/pci/hotplug/rpaphp_pci.c  |  2 +-
>>  6 files changed, 19 insertions(+), 20 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>>b/arch/powerpc/include/asm/pci-bridge.h
>>index 4dd6ef4..c817f38 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -263,10 +263,10 @@ static inline struct eeh_dev *pdn_to_eeh_dev(struct 
>>pci_dn *pdn)
>>  extern struct pci_bus *pcibios_find_pci_bus(struct device_node *dn);
>>
>>  /** Remove all of the PCI devices under this bus */
>>-extern void pcibios_remove_pci_devices(struct pci_bus *bus);
>>+extern void pci_remove_pci_devices(struct pci_bus *bus);
>
>
>pci_lala_pci_lala() ("pci" is used twice) looks weird, if the prefix is
>"pci", what other device types can they handle?...
>
>Maybe pcihp_add_devices(), pcihp_remove_devices() as these are defined in
>pci-hotplug.c?
>

I assume you're talking about drivers/pci/hotplug/pci_hotplug_core.c.
pci_hotplug_core.c uses the pci_hp_ prefix rather than pcihp_. I will
rename them to pci_hp_*() in the next revision.

gwshan@gwshan:~/sandbox/linux$ find . -name pci-hotplug.c
./arch/powerpc/kernel/pci-hotplug.c
gwshan@gwshan:~/sandbox/linux$ grep pci*hp arch/powerpc/kernel/pci-hotplug.c 

>
>>
>>  /** Discover new pci devices under this bus, and add them */
>>-extern void pcibios_add_pci_devices(struct pci_bus *bus);
>>+extern void pci_add_pci_devices(struct pci_bus *bus);
>>
>>
>>  extern void isa_bridge_find_early(struct pci_controller *hose);
>>diff --git a/arch/powerpc/kernel/eeh_driver.c 
>>b/arch/powerpc/kernel/eeh_driver.c
>>index fb6207d..59e53fe 100644
>>--- a/arch/powerpc/kernel/eeh_driver.c
>>+++ b/arch/powerpc/kernel/eeh_driver.c
>>@@ -621,7 +621,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
>>pci_bus *bus,
>>   * We don't remove the corresponding PE instances because
>>   * we need the information afterwords. The attached EEH
>>   * devices are expected to be attached soon when calling
>>-  * into pcibios_add_pci_devices().
>>+  * into pci_add_pci_devices().
>>   */
>>  eeh_pe_state_mark(pe, EEH_PE_KEEP);
>>  if (bus) {
>>@@ -630,7 +630,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
>>pci_bus *bus,
>>  } else {
>>  eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
>>  pci_lock_rescan_remove();
>>- pcibios_remove_pci_devices(bus);
>>+ pci_remove_pci_devices(bus);
>>  pci_unlock_rescan_remove();
>>  }
>>  } else if (frozen_bus) {
>>@@ -681,7 +681,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
>>pci_bus *bus,
>>  if (pe->type & EEH_PE_VF)
>>  eeh_add_virt_device(edev, NULL);
>>  else
>>- pcibios_add_pci_devices(bus);
>>+ pci_add_pci_devices(bus);
>>  } else if (frozen_bus && rmv_data->removed) {
>>  pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
>>  ssleep(5);
>>@@ -691,7 +691,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
>>pci_bus *bus,
>>  if (pe->type & EEH_PE_VF)
>>  eeh_add_virt_device(edev, NULL);
>>  else
>>- pcibios_add_pci_devices(frozen_bus);
>>+ pci_add_pci_devices(frozen_bus);
>>  }
>>  eeh_pe_state_clear(pe, EEH_PE_KEEP);
>>
>>@@ -896,7 +896,7 @@ perm_error:
>>  eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
>>
>>  pci_lock_rescan_remove();
>>- pcibios_remove_pci_devices(frozen_bus);
>>+ pci_remove_pci_devices(frozen_bus);
>>  pci_unlock_rescan_remove();
>>  }
>>  }
>>@@ -981,7 +981,7 @@ static void eeh_handle_special_event(void)
>>  bus = eeh_pe_bus_get(phb_pe);
>>  eeh_pe_dev_traverse(pe,
>>  eeh_report_failure, NULL);
>>- pcibios_remove_pci_devices(bus);
>>+ pci_remove_pci_devices(bus);
>>  }
>>  pci_unlock_rescan_remove();
>>  }
>>diff --git a/arch/powerpc/kernel/pci-hotplug.c 

Re: [PATCH v8 22/45] powerpc/powernv/ioda1: Support releasing IODA1 TCE table

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 02:28:51PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>pnv_pci_ioda_table_free_pages() can be reused to release the IODA1
>>TCE table when releasing IODA1 PE in subsequent patches.
>>
>>This renames the following functions to support releasing IODA1 TCE
>>table: pnv_pci_ioda2_table_free_pages() to pnv_pci_ioda_table_free_pages(),
>>pnv_pci_ioda2_table_do_free_pages() to pnv_pci_ioda_table_do_free_pages().
>>No logical changes introduced.
>
>I can only see renaming here but it seems (from
>IODA_architecture_04-14-2008.pdf) that IODA1 does not support multi-level TCE
>tables in the way IODA2 does.
>

Note that you proposed this change in the last round. Yes, the TVE on P7IOC
doesn't support multiple levels of TCE tables. In that case, we will always
have "tbl->it_indirect_levels" set to 1, right?

>>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 18 +-
>>  1 file changed, 9 insertions(+), 9 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index d360607..077f9db 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -51,7 +51,7 @@
>>  #define POWERNV_IOMMU_DEFAULT_LEVELS 1
>>  #define POWERNV_IOMMU_MAX_LEVELS 5
>>
>>-static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl);
>>+static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl);
>>
>>  static void pe_level_printk(const struct pnv_ioda_pe *pe, const char *level,
>>  const char *fmt, ...)
>>@@ -1352,7 +1352,7 @@ static void pnv_pci_ioda2_release_dma_pe(struct pci_dev 
>>*dev, struct pnv_ioda_pe
>>  iommu_group_put(pe->table_group.group);
>>  BUG_ON(pe->table_group.group);
>>  }
>>- pnv_pci_ioda2_table_free_pages(tbl);
>>+ pnv_pci_ioda_table_free_pages(tbl);
>>  iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>  }
>>
>>@@ -1946,7 +1946,7 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, 
>>long index,
>>
>>  static void pnv_ioda2_table_free(struct iommu_table *tbl)
>>  {
>>- pnv_pci_ioda2_table_free_pages(tbl);
>>+ pnv_pci_ioda_table_free_pages(tbl);
>>  iommu_free_table(tbl, "pnv");
>>  }
>>
>>@@ -2448,7 +2448,7 @@ static __be64 *pnv_pci_ioda2_table_do_alloc_pages(int 
>>nid, unsigned shift,
>>  return addr;
>>  }
>>
>>-static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>>+static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
>>  unsigned long size, unsigned level);
>>
>>  static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
>>@@ -2487,7 +2487,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, 
>>__u64 bus_offset,
>>   * release partially allocated table.
>>   */
>>  if (offset < tce_table_size) {
>>- pnv_pci_ioda2_table_do_free_pages(addr,
>>+ pnv_pci_ioda_table_do_free_pages(addr,
>>  1ULL << (level_shift - 3), levels - 1);
>>  return -ENOMEM;
>>  }
>>@@ -2505,7 +2505,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, 
>>__u64 bus_offset,
>>  return 0;
>>  }
>>
>>-static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
>>+static void pnv_pci_ioda_table_do_free_pages(__be64 *addr,
>>  unsigned long size, unsigned level)
>>  {
>>  const unsigned long addr_ul = (unsigned long) addr &
>>@@ -2521,7 +2521,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 
>>*addr,
>>  if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
>>  continue;
>>
>>- pnv_pci_ioda2_table_do_free_pages(__va(hpa), size,
>>+ pnv_pci_ioda_table_do_free_pages(__va(hpa), size,
>>  level - 1);
>>  }
>>  }
>>@@ -2529,7 +2529,7 @@ static void pnv_pci_ioda2_table_do_free_pages(__be64 
>>*addr,
>>  free_pages(addr_ul, get_order(size << 3));
>>  }
>>
>>-static void pnv_pci_ioda2_table_free_pages(struct iommu_table *tbl)
>>+static void pnv_pci_ioda_table_free_pages(struct iommu_table *tbl)
>>  {
>>  const unsigned long size = tbl->it_indirect_levels ?
>>  tbl->it_level_size : tbl->it_size;
>>@@ -2537,7 +2537,7 @@ static void pnv_pci_ioda2_table_free_pages(struct 
>>iommu_table *tbl)
>>  if (!tbl->it_size)
>>  return;
>>
>>- pnv_pci_ioda2_table_do_free_pages((__be64 *)tbl->it_base, size,
>>+ pnv_pci_ioda_table_do_free_pages((__be64 *)tbl->it_base, size,
>>  tbl->it_indirect_levels);
>>  }
>>
>>
>
>
>-- 
>Alexey
>


Re: [PATCH v8 21/45] powerpc/powernv: Create PEs at PCI hot plugging time

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 02:16:42PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>Currently, the PEs and their associated resources are assigned
>>in ppc_md.pcibios_fixup() except those used by SRIOV VFs.
>
>But this new code does not affect IOV and VF's PEs will still be created
>somewhere else rather than pnv_pci_setup_bridge()?
>

Correct. VF PEs cannot be created in pnv_pci_setup_bridge() as the PF's
IOV capability isn't enabled at that point.

>
>>The
>>function is called for once after PCI probing and resources
>>assignment is completed. So it isn't hotplug friendly.
>>
>>This creates PEs dynamically by ppc_md.pcibios_setup_bridge(), which
>>is called on the event during system bootup and PCI hotplug: updating
>>PCI bridge's windows after resource assignment/reassignment are done.
>>For partial hotplug case, where not all PCI devices belonging to the
>>PE are unplugged and plugged again, we just need unbinding/binding
>>the affected PCI devices with the corresponding PE without creating
>>new one.
>>
>>As there is no upstream bridge for root bus that needs to be covered
>>by PE, we have to create PE for root bus in ppc_md.pcibios_setup_bridge()
>>before any other PEs can be created, as PE for root bus is the ancestor
>>to anyone else.
>
>We did not need a root bus PE before? What is the other PE reserved for?
>Comments only say "reserved"...
>

No, a PE for the root bus was needed before as well. The other PEs can be
for the PCI buses originating from the root port and their subordinate domains.
 
>>
>>Also, the windows of root port or the upstream port of PCIe switch behind
>>root port are extended to be PHB's apertures to accommodate the additional
>>resources needed by newly plugged devices based on the fact: hotpluggable
>>slot is behind root port or downstream port of the PCIe switch behind
>>root port. The extension for those PCI bridges' windows is done in
>>ppc_md.pcibios_setup_bridge() as well.
>
>
>This patch seems to be doing way too many things, hard to follow.
>
>Could you please split the patch into smaller chunks? For example (you can do
>it totally different):
>- move pnv_pci_ioda_setup_opal_tce_kill()
>- move PE creation from pnv_pci_ioda_fixup() to pnv_pci_setup_bridge();
>- add pnv_pci_fixup_bridge_resources()
>- add an extra reserved PE for the root bus (and all this magic with
>root_pe_idx/root_pe_populated)
>- ...
>

I'll evaluate it later. It's always nice to have small patches. Thanks
for the comments.

>
>
>
>-- 
>Alexey
>


Re: [PATCH v8 20/45] powerpc/powernv: Allocate PE# in reverse order

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 01:07:59PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>PE number for one particular PE can be allocated dynamically or
>>reserved according to the consumed M64 (64-bits prefetchable)
>>segments of the PE. The M64 resources, and hence their segments
>>and PE number are assigned/reserved in ascending order. The PE
>>numbers are allocated dynamically in ascending order as well.
>>It's not a problem as the PE numbers are reserved and then
>>allocated all at once in fine order. However, it will introduce
>>conflicts when PCI hotplug is supported: the PE number to be
>>reserved for newly added PE might have been assigned.
>>
>>To resolve above conflicts, this forces the PE number to be
>>allocated dynamically in reverse order. With this patch applied,
>>the PE numbers are reserved in ascending order, but allocated
>>dynamically in reverse order.
>
>
>The patch is probably ok, but the commit log is not - I do not follow it. Some
>PEs are reserved (for what? why does the absolute PE number matter? put it in
>the commit log), that means that the corresponding bits in pe_alloc[] should
>be set so when you will be allocating PEs for a just plugged device, you
>won't pick them and you will pick free ones, and the order should not matter.
>I would think that "reservation" happens once at the boot time so you set
>"used" bits for the reserved PEs then and after that the dynamic allocator
>will skip them.
>

I will enhance the commit log in the next revision, perhaps picking up part
of the following: On PHB3, there are 16 M64 BARs in hardware. The last one
is split evenly into 256 segments. Each segment is associated with a fixed
PE# (segment#x <-> PE#x), which is how the hardware was designed. If a
plugged PE has M64 (64-bit prefetchable memory) resources, its PE# is equal
to the segment#. Otherwise, if the PE doesn't contain any M64 resource, its
PE# is allocated dynamically.

The M64 resources are assigned from the low end to the high end, meaning the
reserved PE#s (which follow the M64 segments) grow from the low end to the
high end as well. With dynamic allocation also starting from the low end,
it's very likely to hand out a PE# that has to be reserved later because of
its M64 segment. That is the conflict the patch tries to resolve.

The PE# reservation doesn't happen just once at boot time because it's
unknown how many PEs and how much M64 resource will be hot-added.
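
To illustrate the scheme described above, here is a minimal user-space
sketch, not the kernel code. The struct pe_map type, the helper names and
the 256-entry size are assumptions made for the example. Reserved PE#s
follow their M64 segment from the low end, while dynamic allocation scans
from the high end. The loop index is signed so that the "pe >= 0" test can
become false.

```c
#include <assert.h>
#include <stdbool.h>

#define TOTAL_PE_NUM 256	/* assumed: one PE# per segment of the split M64 BAR */

/* Hypothetical stand-in for the phb->ioda.pe_alloc bitmap. */
struct pe_map {
	bool used[TOTAL_PE_NUM];
};

/* Reserve the PE# pinned to M64 segment 'seg' (segment#x <-> PE#x). */
static int pe_reserve(struct pe_map *map, int seg)
{
	assert(seg >= 0 && seg < TOTAL_PE_NUM);
	if (map->used[seg])
		return -1;	/* conflict: this PE# was already handed out */
	map->used[seg] = true;
	return seg;
}

/* Allocate a PE# dynamically, scanning from the high end downwards. */
static int pe_alloc_dynamic(struct pe_map *map)
{
	/* Signed index, so that "pe >= 0" eventually becomes false. */
	for (int pe = TOTAL_PE_NUM - 1; pe >= 0; pe--) {
		if (!map->used[pe]) {
			map->used[pe] = true;
			return pe;
		}
	}
	return -1;	/* no free PE# left */
}
```

With this layout, two dynamic allocations take the two highest PE#s, and a
device hot-added later with M64 resources in segments 0 and 1 can still
reserve PE#0 and PE#1.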

>
>>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 14 ++
>>  1 file changed, 6 insertions(+), 8 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index f182ca7..565725b 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -144,16 +144,14 @@ static void pnv_ioda_reserve_pe(struct pnv_phb *phb, 
>>int pe_no)
>>
>>  static struct pnv_ioda_pe *pnv_ioda_alloc_pe(struct pnv_phb *phb)
>>  {
>>- unsigned long pe;
>>+ unsigned long pe = phb->ioda.total_pe_num - 1;
>>
>>- do {
>>- pe = find_next_zero_bit(phb->ioda.pe_alloc,
>>- phb->ioda.total_pe_num, 0);
>>- if (pe >= phb->ioda.total_pe_num)
>>- return NULL;
>>- } while(test_and_set_bit(pe, phb->ioda.pe_alloc));
>>+ for (pe = phb->ioda.total_pe_num - 1; pe >= 0; pe--) {
>>+ if (!test_and_set_bit(pe, phb->ioda.pe_alloc))
>>+ return pnv_ioda_init_pe(phb, pe);
>>+ }
>>
>>- return pnv_ioda_init_pe(phb, pe);
>>+ return NULL;
>>  }
>>
>>  static void pnv_ioda_free_pe(struct pnv_ioda_pe *pe)
>>
>
>
>-- 
>Alexey
>


Re: [PATCH v8 18/45] powerpc/powernv: Increase PE# capacity

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 12:02:23PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>Each PHB maintains an array helping to translate 2-bytes Request
>>ID (RID) to PE# with the assumption that PE# takes one byte, meaning
>>that we can't have more than 256 PEs. However, pci_dn->pe_number
>>already had 4-bytes for the PE#.
>>
>>This extends the PE# capacity for every PHB. After that, the PE number
>>is represented by 4-bytes value. Then we can reuse IODA_INVALID_PE to
>>check the PE# in phb->pe_rmap[] is valid or not.
>
>
>This should be merged into "[PATCH v8 21/45] powerpc/powernv: Create PEs at
>PCI hot plugging time" as it does not make sense alone (this patch does the
>initialization but only 3 patches apart this default value is analyzed ->
>hard to review).
>

Indeed, will move accordingly in next revision.
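
As background for the reverse map discussed in this patch, the following
user-space sketch shows why a wider entry type plus an explicit "invalid"
value helps. With one byte per entry and 0 used for cleared entries, PE#0
cannot be told apart from "no PE". The names rmap_init(), rid_of() and
rmap_lookup() are hypothetical, and INVALID_PE is only a stand-in for
IODA_INVALID_PE.

```c
#include <assert.h>
#include <stdint.h>

#define RID_COUNT	0x10000	/* every 16-bit RID: bus << 8 | devfn */
#define INVALID_PE	(-1)	/* assumed stand-in for IODA_INVALID_PE */

/* Mark every RID as unmapped, as the patch does at PHB init time. */
static void rmap_init(int *rmap)
{
	assert(rmap);
	for (unsigned int rid = 0; rid < RID_COUNT; rid++)
		rmap[rid] = INVALID_PE;
}

/* Build the 2-byte Request ID from bus number and devfn. */
static uint16_t rid_of(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/* Translate {bus, devfn} to a PE#, or INVALID_PE if nothing is mapped. */
static int rmap_lookup(const int *rmap, uint8_t bus, uint8_t devfn)
{
	return rmap[rid_of(bus, devfn)];
}
```

An entry holding PE#0 is now distinguishable from an unmapped RID, and PE
numbers above 255 fit in an entry.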

>>Signed-off-by: Gavin Shan 
>>Reviewed-by: Daniel Axtens 
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 6 +-
>>  arch/powerpc/platforms/powernv/pci.h  | 7 ++-
>>  2 files changed, 7 insertions(+), 6 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 59782fba..7800897 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -757,7 +757,7 @@ static int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, 
>>struct pnv_ioda_pe *pe)
>>
>>  /* Clear the reverse map */
>>  for (rid = pe->rid; rid < rid_end; rid++)
>>- phb->ioda.pe_rmap[rid] = 0;
>>+ phb->ioda.pe_rmap[rid] = IODA_INVALID_PE;
>>
>>  /* Release from all parents PELT-V */
>>  while (parent) {
>>@@ -3387,6 +3387,10 @@ static void __init pnv_pci_init_ioda_phb(struct 
>>device_node *np,
>>  if (prop32)
>>  phb->ioda.reserved_pe_idx = be32_to_cpup(prop32);
>>
>>+ /* Invalidate RID to PE# mapping */
>>+ for (i = 0; i < ARRAY_SIZE(phb->ioda.pe_rmap); ++i)
>>+ phb->ioda.pe_rmap[i] = IODA_INVALID_PE;
>>+
>>  /* Parse 64-bit MMIO range */
>>  pnv_ioda_parse_m64_window(phb);
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci.h 
>>b/arch/powerpc/platforms/powernv/pci.h
>>index 350e630..928cf81 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -160,11 +160,8 @@ struct pnv_phb {
>>  struct list_head pe_list;
>>  struct mutex pe_list_mutex;
>>
>>- /* Reverse map of PEs, will have to extend if
>>-  * we are to support more than 256 PEs, indexed
>>-  * bus { bus, devfn }
>>-  */
>>- unsigned char   pe_rmap[0x10000];
>>+ /* Reverse map of PEs, indexed by {bus, devfn} */
>>+ int pe_rmap[0x10000];
>>
>>  /* TCE cache invalidate registers (physical and
>>   * remapped)
>>
>
>
>-- 
>Alexey
>


Re: [PATCH v8 17/45] powerpc/powernv/ioda1: Improve DMA32 segment track

2016-04-19 Thread Gavin Shan
On Tue, Apr 19, 2016 at 11:50:10AM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:44 PM, Gavin Shan wrote:
>>In current implementation, the DMA32 segments required by one specific
>>PE isn't calculated with the information hold in the PE independently.
>>It conflicts with the PCI hotplug design: PE centralized, meaning the
>>PE's DMA32 segments should be calculated from the information hold in
>>the PE independently.
>>
>>This introduces an array (@dma32_segmap) for every PHB to track the
>>DMA32 segment usage. Besides, this moves the logic calculating PE's
>>consumed DMA32 segments to pnv_pci_ioda1_setup_dma_pe() so that PE's
>>DMA32 segments are calculated/allocated from the information hold in
>>the PE (DMA32 weight). Also the logic is improved: we try to allocate
>>as much DMA32 segments as we can. It's acceptable that number of DMA32
>>segments less than the expected number are allocated.
>>
>>Signed-off-by: Gavin Shan 
>
>
>This DMA segments business was the reason why I have not even tried
>implementing DDW for POWER7 - it is way too different from POWER8 and there
>is no chance that anyone outside Ozlabs will ever try using this in practice;
>the same applies to PCI hotplug on POWER7.
>
>I am suggesting to ditch all IODA1 changes from this patchset as this code
>will hang around (unused) for may be a year or so and then will be gone as
>p5ioc2.
>

As far as I know, some P7 boxes outside Ozlabs run this software stack. At
least, I was relying heavily on a P7 box + PowerNV based Linux until
September of last year. My original thoughts are below. If they're
convincing, I can drop some of the IODA1 changes, but obviously not all of them:

- In case customers want to use this combo (P7 box + PowerNV) for any reason.
- In case developers want to use this combo (P7 box + PowerNV) for any reason.
  For example, no P8 box can be found for one particular project, but an
  available P7 box is still OK for it.
- EEH, which is supported on P7/P8, needs hotplug in some cases: when hitting
  excessive failures, the PCI devices and their platform resources (PE, DMA,
  M32/M64 mapping etc.) should be purged.
- The current implementation has P7/P8 mixed up to some extent, which isn't
  so good, as Ben pointed out a long time ago. It's impossible not to affect
  the P7IOC piece if the P8 piece is changed in order to support hotplug.
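
For readers following the patch quoted below, the DMA32 segment search it
adds to pnv_pci_ioda1_setup_dma_pe() can be sketched on its own as follows.
This is a simplified user-space version under stated assumptions: the
function names are hypothetical, INVALID_PE stands in for IODA_INVALID_PE,
and the requested count is clamped to the map size, which the quoted patch
does not do explicitly.

```c
#include <assert.h>

#define INVALID_PE (-1)	/* assumed stand-in for IODA_INVALID_PE */

/* Expected segment count for a PE: its share of the weight, at least 1. */
static unsigned int dma32_expected_segs(unsigned int weight,
					unsigned int total_weight,
					unsigned int count)
{
	unsigned int segs;

	assert(total_weight > 0);
	segs = (weight * count) / total_weight;
	return segs ? segs : 1;
}

/*
 * Find 'want' contiguous free entries in segmap[0..count-1], retrying with
 * one segment fewer each time the search fails. Returns the number of
 * segments found and stores the first index in *base, or returns 0 if not
 * even a single segment is free.
 */
static unsigned int dma32_find_segs(const int *segmap, unsigned int count,
				    unsigned int want, unsigned int *base)
{
	unsigned int segs, b, i;

	assert(segmap && base);
	if (want > count)
		want = count;

	for (segs = want; segs > 0; segs--) {
		for (b = 0; b + segs <= count; b++) {
			/* Stop at the first segment that is already owned. */
			for (i = b; i < b + segs; i++) {
				if (segmap[i] != INVALID_PE)
					break;
			}
			if (i == b + segs) {
				*base = b;
				return segs;
			}
		}
	}
	return 0;
}
```

A caller would then record the PE# in segmap[base .. base + segs - 1] to mark
the range as consumed.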

>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 111 +-
>>  arch/powerpc/platforms/powernv/pci.h  |   7 +-
>>  2 files changed, 66 insertions(+), 52 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 0fc2309..59782fba 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2007,20 +2007,54 @@ static unsigned int 
>>pnv_pci_ioda_total_dma_weight(struct pnv_phb *phb)
>>  }
>>
>>  static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>>-struct pnv_ioda_pe *pe,
>>-unsigned int base,
>>-unsigned int segs)
>>+struct pnv_ioda_pe *pe)
>>  {
>>
>>  struct page *tce_mem = NULL;
>>  struct iommu_table *tbl;
>>- unsigned int tce32_segsz, i;
>>+ unsigned int weight, total_weight;
>>+ unsigned int tce32_segsz, base, segs, i;
>>  int64_t rc;
>>  void *addr;
>>
>>  /* XXX FIXME: Handle 64-bit only DMA devices */
>>  /* XXX FIXME: Provide 64-bit DMA facilities & non-4K TCE tables etc.. */
>>  /* XXX FIXME: Allocate multi-level tables on PHB3 */
>>+ total_weight = pnv_pci_ioda_total_dma_weight(phb);
>>+ weight = pnv_pci_ioda_pe_dma_weight(pe);
>>+
>>+ segs = (weight * phb->ioda.dma32_count) / total_weight;
>>+ if (!segs)
>>+ segs = 1;
>>+
>>+ /*
>>+  * Allocate contiguous DMA32 segments. We begin with the expected
>>+  * number of segments. With one more attempt, the number of DMA32
>>+  * segments to be allocated is decreased by one until one segment
>>+  * is allocated successfully.
>>+  */
>>+ while (segs) {
>>+ for (base = 0; base <= phb->ioda.dma32_count - segs; base++) {
>>+ for (i = base; i < base + segs; i++) {
>>+ if (phb->ioda.dma32_segmap[i] !=
>>+ IODA_INVALID_PE)
>>+ break;
>>+ }
>>+
>>+ if (i >= base + segs)
>>+ break;
>>+ }
>>+
>>+ if (i >= base + segs)
>>+ break;
>>+
>>+ segs--;
>>+ }
>>+
>>+ if (!segs) {
>>+ pe_warn(pe, "No available DMA32 segments\n");
>>+ return;
>>+ }
>>
>>  tbl = pnv_pci_table_alloc(phb->hose->node);
>>  iommu_register_group(&pe->table_group, phb->hose->global_number,

Re: [PATCH v8 16/45] powerpc/powernv: Remove DMA32 PE list

2016-04-19 Thread Gavin Shan
On Wed, Apr 13, 2016 at 06:59:40PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>PEs are put into PHB DMA32 list (phb->ioda.pe_dma_list) according
>>to their DMA32 weight. The PEs on the list are iterated to setup
>>their TCE32 tables at system booting time. The list is used for
>>once and there is for keep having it.
>
>"there is no need to keep it" may be?
>

Sorry, I should have fixed it in an earlier revision. Will fix it
up in the next revision.

>>
>>This moves the logic calculating DMA32 weight of PHB and PE to
>>pnv_ioda_setup_dma() to drop PHB's DMA32 list. Also, every PE
>>traces the consumed DMA32 segment by @tce32_seg and @tce32_segcount
>>are useless and they're removed.
>>
>>Signed-off-by: Gavin Shan 
>
>
>Reviewed-by: Alexey Kardashevskiy 
>
>with few comments below...
>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 168 +-
>>  arch/powerpc/platforms/powernv/pci.h  |  19 
>>  2 files changed, 75 insertions(+), 112 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index e60cff6..0fc2309 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -886,44 +886,6 @@ out:
>>  return 0;
>>  }
>>
>>-static void pnv_ioda_link_pe_by_weight(struct pnv_phb *phb,
>>-struct pnv_ioda_pe *pe)
>>-{
>>- struct pnv_ioda_pe *lpe;
>>-
>>- list_for_each_entry(lpe, &phb->ioda.pe_dma_list, dma_link) {
>>- if (lpe->dma_weight < pe->dma_weight) {
>>- list_add_tail(&pe->dma_link, &lpe->dma_link);
>>- return;
>>- }
>>- }
>>- list_add_tail(&pe->dma_link, &phb->ioda.pe_dma_list);
>>-}
>>-
>>-static unsigned int pnv_ioda_dma_weight(struct pci_dev *dev)
>>-{
>>- /* This is quite simplistic. The "base" weight of a device
>>-  * is 10. 0 means no DMA is to be accounted for it.
>>-  */
>>-
>>- /* If it's a bridge, no DMA */
>>- if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL)
>>- return 0;
>>-
>>- /* Reduce the weight of slow USB controllers */
>>- if (dev->class == PCI_CLASS_SERIAL_USB_UHCI ||
>>- dev->class == PCI_CLASS_SERIAL_USB_OHCI ||
>>- dev->class == PCI_CLASS_SERIAL_USB_EHCI)
>>- return 3;
>>-
>>- /* Increase the weight of RAID (includes Obsidian) */
>>- if ((dev->class >> 8) == PCI_CLASS_STORAGE_RAID)
>>- return 15;
>>-
>>- /* Default */
>>- return 10;
>>-}
>>-
>>  #ifdef CONFIG_PCI_IOV
>>  static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
>>  {
>>@@ -1028,7 +990,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct 
>>pci_dev *dev)
>>  pe->flags = PNV_IODA_PE_DEV;
>>  pe->pdev = dev;
>>  pe->pbus = NULL;
>>- pe->tce32_seg = -1;
>>  pe->mve_number = -1;
>>  pe->rid = dev->bus->number << 8 | pdn->devfn;
>>
>>@@ -1044,16 +1005,6 @@ static struct pnv_ioda_pe 
>>*pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>>  return NULL;
>>  }
>>
>>- /* Assign a DMA weight to the device */
>>- pe->dma_weight = pnv_ioda_dma_weight(dev);
>>- if (pe->dma_weight != 0) {
>>- phb->ioda.dma_weight += pe->dma_weight;
>>- phb->ioda.dma_pe_count++;
>>- }
>>-
>>- /* Link the PE */
>>- pnv_ioda_link_pe_by_weight(phb, pe);
>>-
>>  return pe;
>>  }
>>
>>@@ -1071,7 +1022,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, 
>>struct pnv_ioda_pe *pe)
>>  }
>>  pdn->pcidev = dev;
>>  pdn->pe_number = pe->pe_number;
>>- pe->dma_weight += pnv_ioda_dma_weight(dev);
>>  if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
>>  pnv_ioda_setup_same_PE(dev->subordinate, pe);
>>  }
>>@@ -1108,10 +1058,8 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, 
>>bool all)
>>  pe->flags |= (all ? PNV_IODA_PE_BUS_ALL : PNV_IODA_PE_BUS);
>>  pe->pbus = bus;
>>  pe->pdev = NULL;
>>- pe->tce32_seg = -1;
>>  pe->mve_number = -1;
>>  pe->rid = bus->busn_res.start << 8;
>>- pe->dma_weight = 0;
>>
>>  if (all)
>>  pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
>>@@ -1133,17 +1081,6 @@ static void pnv_ioda_setup_bus_PE(struct pci_bus *bus, 
>>bool all)
>>
>>  /* Put PE to the list */
>>  list_add_tail(&pe->list, &phb->ioda.pe_list);
>>-
>>- /* Account for one DMA PE if at least one DMA capable device exist
>>-  * below the bridge
>>-  */
>>- if (pe->dma_weight != 0) {
>>- phb->ioda.dma_weight += pe->dma_weight;
>>- phb->ioda.dma_pe_count++;
>>- }
>>-
>>- /* Link the PE */
>>- pnv_ioda_link_pe_by_weight(phb, pe);
>>  }
>>
>>  static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
>>@@ -1184,7 +1121,6 @@ static 

Re: [PATCH v8 13/45] powerpc/powernv/ioda1: M64 support on P7IOC

2016-04-19 Thread Gavin Shan
On Wed, Apr 13, 2016 at 05:47:59PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>This enables M64 window on P7IOC, which has been enabled on PHB3.
>>Different from PHB3 where 16 M64 BARs are supported and each of
>>them can be owned by one particular PE# exclusively or divided
>>evenly to 256 segments, every P7IOC PHB has 16 M64 BARs and each
>>of them are divided to 8 segments. So every P7IOC PHB supports
>>128 M64 segments in total. P7IOC has M64DT, which helps mapping
>>one particular M64 segment# to arbitrary PE#. PHB3 doesn't have
>>M64DT, indicating that one M64 segment can only be pinned to the
>>fixed PE#. In order to have same code to support M64 on P7IOC and
>>PHB3, we just provide 128 M64 segments on every P7IOC PHB and each
>>of them is pinned to the fixed PE# by bypassing the function of
>>M64DT. In turn, we just need different phb->init_m64() for P7IOC
>>and PHB3 to support M64.
>
>The comment is not quite correct - in addition to pnv_ioda1_init_m64(), you
>also need to hack pnv_ioda_pick_m64_pe().
>

Right, will talk about the changes to pnv_ioda_pick_m64_pe() in the
commit log of next revision.
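
The fixed mapping described in the commit log can be sketched as plain
address arithmetic. This is only an illustration: the helper names are
hypothetical, and it assumes M64 segment n is pinned to PE#n, with 16 BARs
of 8 segments each as stated in the commit log.

```c
#include <assert.h>
#include <stdint.h>

#define M64_BARS	16	/* M64 BARs per P7IOC PHB, per the commit log */
#define M64_SEGS	8	/* segments per M64 BAR, per the commit log */

/* Hypothetical helper: BAR that contains the segment pinned to PE# 'pe'. */
static unsigned int m64_bar_of_pe(unsigned int pe)
{
	assert(pe < M64_BARS * M64_SEGS);
	return pe / M64_SEGS;
}

/* Start of BAR 'index': same arithmetic as in pnv_ioda1_init_m64(). */
static uint64_t m64_bar_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int index)
{
	assert(index < M64_BARS);
	return m64_base + (uint64_t)index * M64_SEGS * segsz;
}

/* Start of the segment pinned to PE# 'pe' under the fixed mapping. */
static uint64_t m64_seg_base(uint64_t m64_base, uint64_t segsz,
			     unsigned int pe)
{
	assert(pe < M64_BARS * M64_SEGS);
	return m64_base + (uint64_t)pe * segsz;
}
```

For example, PE#9 lands in BAR#1, one segment past the start of that BAR.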

>
>>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 86 +--
>>  arch/powerpc/platforms/powernv/pci.h  |  3 ++
>>  2 files changed, 86 insertions(+), 3 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 1dc663a..8488238 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -246,6 +246,64 @@ static void pnv_ioda_reserve_dev_m64_pe(struct pci_dev 
>>*pdev,
>>  }
>>  }
>>
>>+static int pnv_ioda1_init_m64(struct pnv_phb *phb)
>>+{
>>+ struct resource *r;
>>+ int index;
>>+
>>+ /*
>>+  * There are 16 M64 BARs, each of which has 8 segments. So
>>+  * there are as many M64 segments as the maximum number of
>>+  * PEs, which is 128.
>>+  */
>>+ for (index = 0; index < PNV_IODA1_M64_NUM; index++) {
>>+ unsigned long base, segsz = phb->ioda.m64_segsize;
>>+ int64_t rc;
>>+
>>+ base = phb->ioda.m64_base +
>>+index * PNV_IODA1_M64_SEGS * segsz;
>>+ rc = opal_pci_set_phb_mem_window(phb->opal_id,
>>+ OPAL_M64_WINDOW_TYPE, index, base, 0,
>>+ PNV_IODA1_M64_SEGS * segsz);
>>+ if (rc != OPAL_SUCCESS) {
>>+ pr_warn("  Error %lld setting M64 PHB#%d-BAR#%d\n",
>>+ rc, phb->hose->global_number, index);
>>+ goto fail;
>>+ }
>>+
>>+ rc = opal_pci_phb_mmio_enable(phb->opal_id,
>>+ OPAL_M64_WINDOW_TYPE, index,
>>+ OPAL_ENABLE_M64_SPLIT);
>>+ if (rc != OPAL_SUCCESS) {
>>+ pr_warn("  Error %lld enabling M64 PHB#%d-BAR#%d\n",
>>+ rc, phb->hose->global_number, index);
>>+ goto fail;
>>+ }
>>+ }
>>+
>>+ /*
>>+  * Exclude the segment used by the reserved PE, which
>>+  * is expected to be 0 or last supported PE#.
>>+  */
>>+ r = &phb->hose->mem_resources[1];
>>+ if (phb->ioda.reserved_pe_idx == 0)
>>+ r->start += phb->ioda.m64_segsize;
>>+ else if (phb->ioda.reserved_pe_idx == (phb->ioda.total_pe_num - 1))
>>+ r->end -= phb->ioda.m64_segsize;
>>+ else
>>+ pr_warn("  Cannot cut M64 segment for reserved PE#%d\n",
>>+ phb->ioda.reserved_pe_idx);
>>+
>>+ return 0;
>>+
>>+fail:
>>+ for ( ; index >= 0; index--)
>>+ opal_pci_phb_mmio_enable(phb->opal_id,
>>+ OPAL_M64_WINDOW_TYPE, index, OPAL_DISABLE_M64);
>>+
>>+ return -EIO;
>>+}
>>+
>>  static void pnv_ioda_reserve_m64_pe(struct pci_bus *bus,
>>  unsigned long *pe_bitmap,
>>  bool all)
>>@@ -315,6 +373,26 @@ static int pnv_ioda_pick_m64_pe(struct pci_bus *bus, 
>>bool all)
>>  pe->master = master_pe;
>>  list_add_tail(&pe->list, &master_pe->slaves);
>>  }
>>+
>>+ /*
>>+  * P7IOC supports M64DT, which helps mapping M64 segment
>>+  * to one particular PE#. However, PHB3 has fixed mapping
>>+  * between M64 segment and PE#. In order to have same logic
>>+  * for P7IOC and PHB3, we enforce fixed mapping between M64
>>+  * segment and PE# on P7IOC.
>>+  */
>>+ if (phb->type == PNV_PHB_IODA1) {
>>+ int64_t rc;
>>+
>>+ rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>+ pe->pe_number, OPAL_M64_WINDOW_TYPE,
>>+ 

Re: [PATCH v8 11/45] powerpc/powernv: Track M64 segment consumption

2016-04-19 Thread Gavin Shan
On Wed, Apr 13, 2016 at 05:09:45PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>When PCI devices are unplugged, their parent PEs might go offline.
>>The M64 resources consumed by those PEs should be released at that
>>time. As we do for M32 segment consumption, this introduces an
>>array in the PHB to track the mapping between M64 segment and
>>PE number.
>>
>>Signed-off-by: Gavin Shan 
>
>
>Reviewed-by: Alexey Kardashevskiy 
>
>but it would not hurt to mention in the commit log why M64 segment is not
>tracked/setup by the existing (at this point, at least)
>pnv_ioda_setup_one_res().
>

Right, I'll add something about it to the commit log in the next revision, thanks!
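
The quoted hunk carves the new M64 map out of the same auxiliary allocation as the PE bitmap and the M32 map. A small userspace sketch of that offset bookkeeping follows, assuming only the pieces visible in the hunk (bitmap, then M64 map, then M32 map); the IO map and PE array that the real function also places there are left out, and the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'x' up to a multiple of 'a'; 'a' must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

struct aux_layout {
	size_t m64map_off;	/* offset of the M64 segment map */
	size_t m32map_off;	/* offset of the M32 segment map */
	size_t total;		/* bytes needed for the parts modelled here */
};

/* PE bitmap first, then one int per PE for each segment map. */
static struct aux_layout aux_layout_for(size_t total_pe_num)
{
	struct aux_layout l;
	size_t size = align_up(total_pe_num / 8, sizeof(unsigned long));

	l.m64map_off = size;
	size += total_pe_num * sizeof(int);
	l.m32map_off = size;
	size += total_pe_num * sizeof(int);
	l.total = size;
	return l;
}
```

For 128 PEs the bitmap takes 16 bytes, so the M64 map starts at offset 16 and the M32 map follows immediately after its 128 entries.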

>
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 --
>>  arch/powerpc/platforms/powernv/pci.h  |  1 +
>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 7330a73..fc0374a 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -305,6 +305,7 @@ static int pnv_ioda2_pick_m64_pe(struct pci_bus *bus, 
>>bool all)
>>  phb->ioda.total_pe_num) {
>>  pe = &phb->ioda.pe_array[i];
>>
>>+ phb->ioda.m64_segmap[pe->pe_number] = pe->pe_number;
>>  if (!master_pe) {
>>  pe->flags |= PNV_IODA_PE_MASTER;
>>  INIT_LIST_HEAD(&pe->slaves);
>>@@ -3245,7 +3246,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
>>device_node *np,
>>  {
>>  struct pci_controller *hose;
>>  struct pnv_phb *phb;
>>- unsigned long size, m32map_off, pemap_off, iomap_off = 0;
>>+ unsigned long size, m64map_off, m32map_off, pemap_off, iomap_off = 0;
>>  const __be64 *prop64;
>>  const __be32 *prop32;
>>  int i, len;
>>@@ -3332,6 +3333,8 @@ static void __init pnv_pci_init_ioda_phb(struct 
>>device_node *np,
>>
>>  /* Allocate aux data & arrays. We don't have IO ports on PHB3 */
>>  size = _ALIGN_UP(phb->ioda.total_pe_num / 8, sizeof(unsigned long));
>>+ m64map_off = size;
>>+ size += phb->ioda.total_pe_num * sizeof(phb->ioda.m64_segmap[0]);
>>  m32map_off = size;
>>  size += phb->ioda.total_pe_num * sizeof(phb->ioda.m32_segmap[0]);
>>  if (phb->type == PNV_PHB_IODA1) {
>>@@ -3342,9 +3345,12 @@ static void __init pnv_pci_init_ioda_phb(struct 
>>device_node *np,
>>  size += phb->ioda.total_pe_num * sizeof(struct pnv_ioda_pe);
>>  aux = memblock_virt_alloc(size, 0);
>>  phb->ioda.pe_alloc = aux;
>>+ phb->ioda.m64_segmap = aux + m64map_off;
>>  phb->ioda.m32_segmap = aux + m32map_off;
>>- for (i = 0; i < phb->ioda.total_pe_num; i++)
>>+ for (i = 0; i < phb->ioda.total_pe_num; i++) {
>>+ phb->ioda.m64_segmap[i] = IODA_INVALID_PE;
>>  phb->ioda.m32_segmap[i] = IODA_INVALID_PE;
>>+ }
>>  if (phb->type == PNV_PHB_IODA1) {
>>  phb->ioda.io_segmap = aux + iomap_off;
>>  for (i = 0; i < phb->ioda.total_pe_num; i++)
>>diff --git a/arch/powerpc/platforms/powernv/pci.h 
>>b/arch/powerpc/platforms/powernv/pci.h
>>index 36c4965..866a5ea 100644
>>--- a/arch/powerpc/platforms/powernv/pci.h
>>+++ b/arch/powerpc/platforms/powernv/pci.h
>>@@ -146,6 +146,7 @@ struct pnv_phb {
>>  struct pnv_ioda_pe  *pe_array;
>>
>>  /* M32 & IO segment maps */
>>+ int *m64_segmap;
>>  int *m32_segmap;
>>  int *io_segmap;
>>
>>
>
>
>-- 
>Alexey
>

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v8 09/45] powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()

2016-04-19 Thread Gavin Shan
On Wed, Apr 13, 2016 at 04:45:39PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>The original implementation of pnv_ioda_setup_pe_seg() configures
>>IO and M32 segments with separate logic, which can be merged
>>by caching @segmap, @seg_size, @win in advance. This shouldn't
>>cause any behavioural changes.
>>
>>Signed-off-by: Gavin Shan 
>>---
>>  arch/powerpc/platforms/powernv/pci-ioda.c | 62 
>> ++-
>>  1 file changed, 28 insertions(+), 34 deletions(-)
>>
>>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>>b/arch/powerpc/platforms/powernv/pci-ioda.c
>>index 44cc5f3..fd7d382 100644
>>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>@@ -2940,8 +2940,10 @@ static void pnv_ioda_setup_pe_seg(struct 
>>pci_controller *hose,
>>  struct pnv_phb *phb = hose->private_data;
>>  struct pci_bus_region region;
>>  struct resource *res;
>>- int i, index;
>>- int rc;
>>+ unsigned int segsize;
>>+ int *segmap, index, i;
>>+ uint16_t win;
>>+ int64_t rc;
>>
>>  /*
>>   * NOTE: We only care PCI bus based PE for now. For PCI
>>@@ -2958,23 +2960,9 @@ static void pnv_ioda_setup_pe_seg(struct 
>>pci_controller *hose,
>>  if (res->flags & IORESOURCE_IO) {
>>  region.start = res->start - phb->ioda.io_pci_base;
>>  region.end   = res->end - phb->ioda.io_pci_base;
>>- index = region.start / phb->ioda.io_segsize;
>>-
>>- while (index < phb->ioda.total_pe_num &&
>>-region.start <= region.end) {
>>- phb->ioda.io_segmap[index] = pe->pe_number;
>>- rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>- pe->pe_number, OPAL_IO_WINDOW_TYPE, 0, 
>>index);
>>- if (rc != OPAL_SUCCESS) {
>>- pr_err("%s: OPAL error %d when mapping 
>>IO "
>>-"segment #%d to PE#%d\n",
>>-__func__, rc, index, 
>>pe->pe_number);
>>- break;
>>- }
>>-
>>- region.start += phb->ioda.io_segsize;
>>- index++;
>>- }
>>+ segsize  = phb->ioda.io_segsize;
>>+ segmap   = phb->ioda.io_segmap;
>>+ win  = OPAL_IO_WINDOW_TYPE;
>>  } else if ((res->flags & IORESOURCE_MEM) &&
>> !pnv_pci_is_mem_pref_64(res->flags)) {
>>  region.start = res->start -
>>@@ -2983,23 +2971,29 @@ static void pnv_ioda_setup_pe_seg(struct 
>>pci_controller *hose,
>>  region.end   = res->end -
>> hose->mem_offset[0] -
>> phb->ioda.m32_pci_base;
>>- index = region.start / phb->ioda.m32_segsize;
>>-
>>- while (index < phb->ioda.total_pe_num &&
>>-region.start <= region.end) {
>>- phb->ioda.m32_segmap[index] = pe->pe_number;
>>- rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>- pe->pe_number, OPAL_M32_WINDOW_TYPE, 0, 
>>index);
>>- if (rc != OPAL_SUCCESS) {
>>- pr_err("%s: OPAL error %d when mapping 
>>M32 "
>>-"segment#%d to PE#%d",
>>-__func__, rc, index, 
>>pe->pe_number);
>>- break;
>>- }
>>+ segsize  = phb->ioda.m32_segsize;
>>+ segmap   = phb->ioda.m32_segmap;
>>+ win  = OPAL_M32_WINDOW_TYPE;
>>+ } else {
>>+ continue;
>>+ }
>>
>>- region.start += phb->ioda.m32_segsize;
>>- index++;
>>+ index = region.start / segsize;
>>+ while (index < phb->ioda.total_pe_num &&
>>+region.start <= region.end) {
>>+ segmap[index] = pe->pe_number;
>>+ rc = opal_pci_map_pe_mmio_window(phb->opal_id,
>>+ pe->pe_number, win, 0, index);
>>+ if (rc != OPAL_SUCCESS) {
>>+ pr_warn("%s: Error %lld mapping (%d) seg#%d to 
>>PHB#%d-PE#%d\n",
>>+ __func__, rc, win, index,
>>+ pe->phb->hose->global_number,
>>+ 
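
The merge works because the IO and M32 paths differ only in which segment map, segment size and window type they use; once those are cached, a single loop covers both. Below is a minimal userspace sketch of that shared loop. The OPAL call is omitted and the function name is hypothetical, so this models only the segment map bookkeeping.

```c
#include <assert.h>

/*
 * Assign every segment touched by [start, end] to 'pe_number', following
 * the loop structure of the quoted patch. Returns the number of segments
 * that were mapped.
 */
static int map_region_to_pe(int *segmap, int total_segs, unsigned int segsize,
			    unsigned long start, unsigned long end,
			    int pe_number)
{
	int index, mapped = 0;

	assert(segsize != 0);
	index = (int)(start / segsize);
	while (index < total_segs && start <= end) {
		segmap[index] = pe_number;
		start += segsize;
		index++;
		mapped++;
	}
	return mapped;
}
```

A caller would pick the IO or M32 map and segment size first and then call the same helper, which is the shape the patch gives pnv_ioda_setup_pe_seg().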

Re: [PATCH v8 03/45] powerpc/pci: Cleanup on struct pci_controller_ops

2016-04-19 Thread Gavin Shan
On Wed, Apr 13, 2016 at 03:52:25PM +1000, Alexey Kardashevskiy wrote:
>On 02/17/2016 02:43 PM, Gavin Shan wrote:
>>Each PHB has one instance of "struct pci_controller_ops", which
>>includes various callbacks called by the PCI subsystem. In the definition
>>of this struct, some callbacks have explicit names for their arguments,
>>but the rest do not.
>>
>>This adds all explicit names of the arguments to the callbacks in
>>"struct pci_controller_ops" so that the code looks consistent.
>>
>>Signed-off-by: Gavin Shan 
>>Reviewed-by: Daniel Axtens 
>
>With tiny nit below,
>
>Reviewed-by: Alexey Kardashevskiy 
>
>
>
>>---
>>  arch/powerpc/include/asm/pci-bridge.h | 13 +++--
>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>
>>diff --git a/arch/powerpc/include/asm/pci-bridge.h 
>>b/arch/powerpc/include/asm/pci-bridge.h
>>index b688d04..4dd6ef4 100644
>>--- a/arch/powerpc/include/asm/pci-bridge.h
>>+++ b/arch/powerpc/include/asm/pci-bridge.h
>>@@ -21,18 +21,19 @@ struct pci_controller_ops {
>>  void(*dma_dev_setup)(struct pci_dev *dev);
>>  void(*dma_bus_setup)(struct pci_bus *bus);
>>
>>- int (*probe_mode)(struct pci_bus *);
>>+ int (*probe_mode)(struct pci_bus *bus);
>>
>>  /* Called when pci_enable_device() is called. Returns true to
>>   * allow assignment/enabling of the device. */
>>- bool(*enable_device_hook)(struct pci_dev *);
>>+ bool(*enable_device_hook)(struct pci_dev *dev);
>
>
>"pdev" is slightly better as it is of the "pci_dev" type (4130 occurrences of
>"pci_dev *pdev" and just 2833 of "pci_dev *dev" in the current kernel), "dev"
>is for "struct device".
>

Thanks for your review. I'm not sure "dev" is for "struct device" only.
In practice, "dev" and "pdev" are used interchangeably for "struct pci_dev";
older code in particular uses "dev" for "struct pci_dev" heavily.

That said, I agree "pdev" is better than "dev" in this case and I'm going to
fix this up in the next revision.

>>
>>- void(*disable_device)(struct pci_dev *);
>>+ void(*disable_device)(struct pci_dev *dev);
>>
>>- void(*release_device)(struct pci_dev *);
>>+ void(*release_device)(struct pci_dev *dev);
>>
>>  /* Called during PCI resource reassignment */
>>- resource_size_t (*window_alignment)(struct pci_bus *, unsigned long 
>>type);
>>+ resource_size_t (*window_alignment)(struct pci_bus *bus,
>>+ unsigned long type);
>>  void(*setup_bridge)(struct pci_bus *bus,
>>  unsigned long type);
>>  void(*reset_secondary_bus)(struct pci_dev *dev);
>>@@ -46,7 +47,7 @@ struct pci_controller_ops {
>>  int (*dma_set_mask)(struct pci_dev *dev, u64 dma_mask);
>>  u64 (*dma_get_required_mask)(struct pci_dev *dev);
>>
>>- void(*shutdown)(struct pci_controller *);
>>+ void(*shutdown)(struct pci_controller *hose);
>>  };
>>
>>  /*
>>
>
>
>-- 
>Alexey
>


Re: [V2] powerpc/Kconfig: Update config option based on page size.

2016-04-19 Thread Balbir Singh


On 20/04/16 00:59, Aneesh Kumar K.V wrote:
> Michael Ellerman  writes:
> 
>> On Fri, 2016-19-02 at 05:38:47 UTC, Rashmica Gupta wrote:
>>> Currently on PPC64 changing kernel pagesize from 4K to 64K leaves
>>> FORCE_MAX_ZONEORDER set to 13 - which produces a compile error.
>>>
>> ...
>>> So, update the range of FORCE_MAX_ZONEORDER from 9-64 to 8-9 for 64K pages
>>> and from 13-64 to 9-13 for 4K pages.
>>>
>>> Signed-off-by: Rashmica Gupta 
>>> Reviewed-by: Balbir Singh 
>>
>> Applied to powerpc next, thanks.
>>
>> https://git.kernel.org/powerpc/c/a7ee539584acf4a565b7439cea
>>
> 
> HPAGE_PMD_ORDER is not something we should check w.r.t 4k linux page
> size. We do have the below constraint w.r.t hugetlb pages
> 
> static inline bool hstate_is_gigantic(struct hstate *h)
> {
>   return huge_page_order(h) >= MAX_ORDER;
> }
> 
> That requires MAX_ORDER to be greater than 12.
> 

The build will fail for MAX_ZONEORDER beyond the specified limits.
MAX_ORDER > 12 for what page size?

My understanding is this:

1. "gigantic" refers to the fact that the regular allocators cannot allocate
this page
2. Use alloc_contig_range() with CONFIG_CMA for gigantic pages

I could be wrong.

> Did we test hugetlbfs 4k config with this patch ? Will it work if we
> start marking hugepage as gigantic page ?

Nope.. I did not

Thanks for the review!
Balbir Singh

Re: [PATCH 0/5] Live patching for powerpc

2016-04-19 Thread Jiri Kosina
On Wed, 20 Apr 2016, Balbir Singh wrote:

> Thanks, do we have a summary of what the relocation changes look like?

This work is queued in 
livepatching.git#for-4.7/arch-independent-klp-relocations

-- 
Jiri Kosina
SUSE Labs


Re: [PATCH 0/5] Live patching for powerpc

2016-04-19 Thread Balbir Singh


On 16/04/16 01:07, Jiri Kosina wrote:
> On Thu, 14 Apr 2016, Michael Ellerman wrote:
> 
>> Topic branch here:
>>
>>   
>> https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git/log/?h=topic/livepatch
>>
>> I will merge that before Monday (my time) if I don't hear any objections.
> 
> I've now pulled this into livepatching.git#for-4.7/livepatching-ppc64 and 
> merged that branch into for-next as well.
> 
> That branch already contains all the relocation changes queued for 4.7, so 
> as much testing of the merged result as possible on ppc64 would be 
> appreciated.
Thanks, do we have a summary of what the relocation changes look like?

Balbir Singh.

Re: [PATCH v2 1/1] powerpc/86xx: Add support for Emerson/Artesyn MVME7100

2016-04-19 Thread Scott Wood
On Tue, 2016-04-19 at 10:33 +0200, Alessio Igor Bogani wrote:
> Hi Scott,
> 
> Thanks for reviewing it!
> 
> On 19 April 2016 at 06:26, Scott Wood  wrote:
> > On Mon, 2016-04-18 at 09:57 +0200, Alessio Igor Bogani wrote:
> > > + pci0: pcie@f1008000 {
> > > + reg = <0xf1008000 0x1000>;
> > > + ranges = <0x0200 0x0 0x8000 0x8000 0x0
> > > 0x5000
> > > +   0x0100 0x0 0x 0xf000 0x0
> > > 0x0080>;
> [...]
> > > +
> > > + pci1: pcie@f1009000 {
> > > + compatible = "fsl,mpc8641-pcie";
> > > + device_type = "pci";
> > > + #size-cells = <2>;
> > > + #address-cells = <3>;
> > > + reg = <0xf1009000 0x1000>;
> > > + bus-range = <0 0xff>;
> > 
> > Why are pci0 and pci1 so different? Why does mpc8641si-post.dtsi not have
> > pci1?
> 
> You are right. The MPC8641 processor offers two pci so
> mpc8641si-post.dtsi should be the right place where to define both.
> What about the boards which don't use the pci1? Will 'status =
> "disabled"' be enough?

Yes.

-Scott


Re: Trouble with DMA on PPC linux question

2016-04-19 Thread Bruce_Leonard
Ben,

Benjamin Herrenschmidt  wrote on 04/19/2016 01:45:40 AM:

> From: Benjamin Herrenschmidt 
> To: bruce_leon...@selinc.com, linuxppc-dev@lists.ozlabs.org
> Date: 04/19/2016 01:46 AM
> Subject: Re: Trouble with DMA on PPC linux question
> 
> On Mon, 2016-04-18 at 14:54 -0700, bruce_leon...@selinc.com wrote:
> > 
> > On the DMA transactions that work, the virtual address I hand to 
> > dma_map_single() is something like 0xe084 and the dma_addr_t 
result is 
> > 0x1084 which is less than my 512Mb limit.  On the transactions 
that 
> > don't work, the virtual address is 0xd539 with the mapped result 
being 
> > 0x2539, which is past my upper bound on my RAM.  In fact it's not 
even 
> > in my memory map, there's a hole there. 
> 
> Where does this virtual address come from ?
> 
> The kernel has two types of virtual addresses. Those coming from the
> linear mapping (the stuff you get from kmalloc() for example, or
> get_pages()) which can be translated using that simple subtraction.
> 
> The other is the vmalloc space, and that is a non-linear mapping of
> random pages.
> 
> If your vaddr comes from the latter it can't be passed to
> dma_map_single as-is, you need to get to the underlying pages first.
> 
> Ben.
> 

That's a good question.  I'm not sure where the addresses come from right 
now (they're handed to me from the MTD layer), but I'll certainly dig into 
that and see.

Thanks for the help!  I appreciate the pointer.

Bruce
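
The distinction Ben draws can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the function name is invented, and the linear-mapping base and RAM size are parameters rather than values taken from the thread. It shows that the simple subtraction is valid only for addresses inside the linear mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the linear mapping: only addresses inside
 * [page_offset, page_offset + ram_size) translate by subtraction.
 * Returns false for anything else, such as a vmalloc address, which
 * needs a per-page lookup instead.
 */
static bool linear_virt_to_phys(uint64_t vaddr, uint64_t page_offset,
				uint64_t ram_size, uint64_t *paddr)
{
	assert(paddr != NULL);
	if (vaddr < page_offset || vaddr - page_offset >= ram_size)
		return false;
	*paddr = vaddr - page_offset;
	return true;
}
```

In the kernel itself, is_vmalloc_addr() and vmalloc_to_page() are the usual helpers for detecting a vmalloc address and getting at its underlying pages before mapping them for DMA.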

[PATCH] cxl: Increase timeout for detection of AFU mmio hang

2016-04-19 Thread Frederic Barrat
PSL designers recommend a larger value for the mmio hang pulse, 256 us
instead of 1 us. The CAIA architecture states that it needs to be
smaller than 1/2 of the RTOS timeout set in the PHB for outbound
non-posted transactions, which is still (easily) the case here.

Signed-off-by: Frederic Barrat 
---
Needs to be applied on top of http://patchwork.ozlabs.org/patch/604029/


 drivers/misc/cxl/pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index 94fd3f7..0a9c15b 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -375,8 +375,10 @@ static int init_implementation_adapter_regs(struct cxl 
*adapter, struct pci_dev
return -ENODEV;
}
 
+   psl_dsnctl = 0x9000ULL; /* pteupd ttype, scdone */
+   psl_dsnctl |= (0x2ULL << (63-38)); /* MMIO hang pulse: 256 us */
/* Tell PSL where to route data to */
-   psl_dsnctl = 0x9200ULL | (chipid << (63-5));
+   psl_dsnctl |= (chipid << (63-5));
psl_dsnctl |= (capp_unit_id << (63-13));
 
cxl_p1_write(adapter, CXL_PSL_DSNDCTL, psl_dsnctl);
-- 
1.9.1
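
The (63-38) and (63-5) shifts in the patch follow IBM bit numbering, where bit 0 is the most significant bit of the 64-bit register. A small userspace sketch of that convention and of the timeout constraint from the commit log follows; the helper names are made up, and any PHB timeout value passed in is hypothetical since the thread does not state one.

```c
#include <assert.h>
#include <stdint.h>

/* IBM numbering: bit 0 is the most significant bit of a 64-bit register. */
static unsigned int ibm_bit_to_shift(unsigned int ibm_bit)
{
	assert(ibm_bit < 64);
	return 63 - ibm_bit;
}

/* Place 'val' so that its least significant bit lands on IBM bit 'ibm_bit'. */
static uint64_t field_at_ibm_bit(uint64_t val, unsigned int ibm_bit)
{
	return val << ibm_bit_to_shift(ibm_bit);
}

/* Constraint from the commit log: pulse below half of the PHB timeout. */
static int hang_pulse_ok(uint64_t pulse_us, uint64_t phb_timeout_us)
{
	return pulse_us < phb_timeout_us / 2;
}
```

With these helpers, the value 0x2 placed at IBM bit 38 is a left shift by 25, which is what the new "MMIO hang pulse" line in the patch does.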


Re: [PATCH 5/5] drivers/net: support hdlc function for QE-UCC

2016-04-19 Thread Christophe Leroy

On 30/03/2016 10:50, Zhao Qiang wrote:

The driver adds HDLC support for the Freescale QUICC Engine.
It supports NMSI and TSA modes.

When using TSA, how does the TSA get configured? In particular, how do you
describe which timeslots are switched to the HDLC channels?
Is it possible to route some timeslots to one UCC for HDLC and route
others to another UCC for an ALSA sound driver?


The QE also has a QMC, which allows splitting all timeslots routed to a given
UCC into independent channels that can be used either for HDLC or in
transparent mode (for audio, for instance). Do you intend to also support QMC?


According to the compatible property, it looks like your driver is for the
Freescale T1040. The MPC83xx also has a QUICC Engine; would it work on
it too?


Christophe



Signed-off-by: Zhao Qiang 
---
  MAINTAINERS|6 +
  drivers/net/wan/Kconfig|   12 +
  drivers/net/wan/Makefile   |1 +
  drivers/net/wan/fsl_ucc_hdlc.c | 1339 
  drivers/net/wan/fsl_ucc_hdlc.h |  140 +
  include/soc/fsl/qe/ucc_fast.h  |4 +
  6 files changed, 1502 insertions(+)
  create mode 100644 drivers/net/wan/fsl_ucc_hdlc.c
  create mode 100644 drivers/net/wan/fsl_ucc_hdlc.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 74bbff3..428d6ed 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4572,6 +4572,12 @@ F:   drivers/net/ethernet/freescale/gianfar*
  X:drivers/net/ethernet/freescale/gianfar_ptp.c
  F:Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
  
+FREESCALE QUICC ENGINE UCC HDLC DRIVER

+M: Zhao Qiang 
+L: linuxppc-dev@lists.ozlabs.org
+S: Maintained
+F: drivers/net/wan/fsl_ucc_hdlc*
+
  FREESCALE QUICC ENGINE UCC UART DRIVER
  M:Timur Tabi 
  L:linuxppc-dev@lists.ozlabs.org
diff --git a/drivers/net/wan/Kconfig b/drivers/net/wan/Kconfig
index a2fdd15..cc424b2 100644
--- a/drivers/net/wan/Kconfig
+++ b/drivers/net/wan/Kconfig
@@ -280,6 +280,18 @@ config DSCC4
  To compile this driver as a module, choose M here: the
  module will be called dscc4.
  
+config FSL_UCC_HDLC

+   tristate "Freescale QUICC Engine HDLC support"
+   depends on HDLC
+   select QE_TDM
+   select QUICC_ENGINE
+   help
+ Driver for Freescale QUICC Engine HDLC controller. The driver
+ support HDLC run on NMSI and TDM mode.
+
+ To compile this driver as a module, choose M here: the
+ module will be called fsl_ucc_hdlc.
+
  config DSCC4_PCISYNC
bool "Etinc PCISYNC features"
depends on DSCC4
diff --git a/drivers/net/wan/Makefile b/drivers/net/wan/Makefile
index c135ef4..25fec40 100644
--- a/drivers/net/wan/Makefile
+++ b/drivers/net/wan/Makefile
@@ -32,6 +32,7 @@ obj-$(CONFIG_WANXL)   += wanxl.o
  obj-$(CONFIG_PCI200SYN)   += pci200syn.o
  obj-$(CONFIG_PC300TOO)+= pc300too.o
  obj-$(CONFIG_IXP4XX_HSS)  += ixp4xx_hss.o
+obj-$(CONFIG_FSL_UCC_HDLC) += fsl_ucc_hdlc.o
  
  clean-files := wanxlfw.inc

  $(obj)/wanxl.o:   $(obj)/wanxlfw.inc
diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
new file mode 100644
index 0000000..9958ec1
--- /dev/null
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -0,0 +1,1339 @@
+/* Freescale QUICC Engine HDLC Device Driver
+ *
+ * Copyright 2014 Freescale Semiconductor Inc.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "fsl_ucc_hdlc.h"
+
+#define DRV_DESC "Freescale QE UCC HDLC Driver"
+#define DRV_NAME "ucc_hdlc"
+
+#define TDM_PPPOHT_SLIC_MAXIN
+/* #define DEBUG */
+/* #define QE_HDLC_TEST */
+#define BROKEN_FRAME_INFO
+
+static struct ucc_tdm_info utdm_primary_info = {
+   .uf_info = {
+   .tsa = 0,
+   .cdp = 0,
+   .cds = 1,
+   .ctsp = 1,
+   .ctss = 1,
+   .revd = 0,
+   .urfs = 256,
+   .utfs = 256,
+   .urfet = 128,
+   .urfset = 192,
+   .utfet = 128,
+   .utftt = 0x40,
+   .ufpt = 256,
+   .mode = UCC_FAST_PROTOCOL_MODE_HDLC,
+   .ttx_trx = UCC_FAST_GUMR_TRANSPARENT_TTX_TRX_NORMAL,
+   .tenc = UCC_FAST_TX_ENCODING_NRZ,
+   .renc = UCC_FAST_RX_ENCODING_NRZ,
+   .tcrc = UCC_FAST_16_BIT_CRC,
+   .synl = UCC_FAST_SYNC_LEN_NOT_USED,
+   },
+
+   .si_info = {
+#ifdef CONFIG_FSL_PQ_MDS_T1
+   .simr_rfsd = 1,  

Re: [V2] powerpc/Kconfig: Update config option based on page size.

2016-04-19 Thread Aneesh Kumar K.V
Michael Ellerman  writes:

> On Fri, 2016-19-02 at 05:38:47 UTC, Rashmica Gupta wrote:
>> Currently on PPC64 changing kernel pagesize from 4K to 64K leaves
>> FORCE_MAX_ZONEORDER set to 13 - which produces a compile error.
>> 
> ...
>> So, update the range of FORCE_MAX_ZONEORDER from 9-64 to 8-9 for 64K pages
>> and from 13-64 to 9-13 for 4K pages.
>> 
>> Signed-off-by: Rashmica Gupta 
>> Reviewed-by: Balbir Singh 
>
> Applied to powerpc next, thanks.
>
> https://git.kernel.org/powerpc/c/a7ee539584acf4a565b7439cea
>

HPAGE_PMD_ORDER is not something we should check w.r.t 4k linux page
size. We do have the below constraint w.r.t hugetlb pages

static inline bool hstate_is_gigantic(struct hstate *h)
{
return huge_page_order(h) >= MAX_ORDER;
}

That requires MAX_ORDER to be greater than 12.

Did we test a hugetlbfs 4k config with this patch? Will it work if we
start marking hugepages as gigantic pages?

-aneesh
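
The "greater than 12" figure can be checked with a little arithmetic. The sketch below assumes a 16MB huge page on a 4K base page size, which is not stated in this mail but matches the order of 12 it implies; the function names are invented for the example and only the final comparison is taken from the quoted helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Order of a huge page: log2(huge_size / page_size), rounded up. */
static unsigned int huge_page_order_for(unsigned long huge_size,
					unsigned long page_size)
{
	unsigned int order = 0;

	assert(page_size != 0 && huge_size >= page_size);
	while ((page_size << order) < huge_size)
		order++;
	return order;
}

/* Same comparison as the hstate_is_gigantic() helper quoted above. */
static bool is_gigantic(unsigned int order, unsigned int max_order)
{
	return order >= max_order;
}
```

Under that assumption the huge page has order 12, so it is treated as an ordinary huge page with MAX_ORDER at 13 but becomes "gigantic" as soon as MAX_ORDER is 12 or lower.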


Re: [RFC] powerpc/devtree: Parse new DRC mem/cpu/dev device, tree elements

2016-04-19 Thread Michael Bringmann


On 03/15/2016 12:15 AM, linuxppc-dev-requ...@lists.ozlabs.org wrote:


> Documentation/devicetree/bindings ? or link to PAPR where it's specified?
>
> -- 
> Stewart Smith
> OPAL Architect, IBM.

Here's the link to the Notes PAPR database's issue: 
notes://D01DBR12/86256680004635D2/565907e362ce41e28625636a000fba97/b2fa2e426b3222fd85257e810050443c

In case you don't have access to the database, here are the document headings 
attached to the issue:

10/21/15 : Updated Section:  C.6.6.2 ibm,dynamic-reconfiguration-memory to 
remove lmb-size from ibm,dynamic-memory-v2; corrected typo in C.6.6.2.
(See attached file: C PAPR Binding.doc)


10/21/15 : changed ibm,drc-index to drc-index in Section 7.3.28 and 
R1-7.3.28-14; added ibm,drc-info in the root and vdevice nodes in the Partition 
Migration/Hibernation section of Table 129.
(See attached file: Chapter 7 .doc)


10/21/15 : Updated reference " ... the drc-xxx or ibm,drc-info property ..." in 
Section 13.5.3.2. Added  "See 13.5.2.8 for additional information." to the 
first paragraph of sections 13.5.2.2, 13.5.2.4, 13.5.2.5, 13.5.2.6 : In the 
first paragraph of each section, add
(See attached file: Chapter 13.doc)

(See attached file: Chapter 17 and 18.doc)

(See attached file: Chapter 5 .doc)

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com


Re: [PATCH v5] powerpc/pci: Assign fixed PHB number based on device-tree properties

2016-04-19 Thread Guilherme G. Piccoli

On 04/19/2016 04:27 AM, Ian Munsie wrote:

Thanks for addressing my feedback :)

Reviewed-by: Ian Munsie 


Thanks very much for reviewing, Ian =)

Cheers,


Guilherme


Re: [RFC v6 06/10] PCI: Add a new PCI_BUS_FLAGS_MSI_REMAP flag

2016-04-19 Thread Yongji Xie

On 2016/4/18 19:30, David Laight wrote:

From: Yongji Xie

Sent: 18 April 2016 11:59
We introduce a new pci_bus_flags, PCI_BUS_FLAGS_MSI_REMAP
which indicates all devices on the bus are protected by the
hardware which supports IRQ remapping(intel naming).

This flag will be used to know whether it's safe to expose
MSI-X tables of PCI BARs to userspace. Because the capability
of IRQ remapping can guarantee the PCI device cannot trigger
MSIs that correspond to interrupt IDs of other devices.

I'm worried that this entire series is going to break drivers
for existing hardware.

I understand some of the reasoning for 'vm pass through' configurations,
but there will be PCIe devices out there that have the MSI-X tables
in the same BAR as other device registers.
If you are lucky nothing else is in the same 4k area, but I wouldn't
assume it.


Thanks for your comments, but I didn't get your point here.
Why would exposing the MSI-X table to userspace break the driver
for hardware which has the MSI-X table in the same BAR as
other device registers? Could you give me more details?

The reason why we want to mmap the MSI-X table is that there
may be some other critical device registers in the same page
as the MSI-X table. We prefer to handle the MMIO access to
these registers in the guest rather than in QEMU. So we actually
expect that there may be something else in the same 4k/64k area.


In any case, if the hardware can't police the card's master transfers
there is nothing to stop a different bus master block on the card
from raising MSI-X interrupts - they are just a PCIe write.
So all you are doing is raising the bar slightly and giving a very false
sense of security.


Do you mean a device can issue a DMA write to the target address
range that raises MSI-X interrupts? On PPC64 with an IODA
bridge, such an invalid PCIe write is blocked at the PHB before
it can raise an MSI-X interrupt. And I think interrupt
remapping or an ITS can do the same thing. If the hardware doesn't
support this, my patch does not expose the MSI-X table.

Thanks,
Yongji


Re: [PATCH v8 45/45] PCI/hotplug: PowerPC PowerNV PCI hotplug driver

2016-04-19 Thread Alexey Kardashevskiy

On 02/17/2016 02:44 PM, Gavin Shan wrote:

This adds a standalone driver to support PCI hotplug for the PowerPC PowerNV
platform, which runs on top of skiboot firmware. The firmware identifies
hotpluggable slots and marks their device tree nodes with the
"ibm,slot-pluggable" and "ibm,reset-by-firmware" properties. The driver scans
the device tree nodes to create and register PCI hotplug slots accordingly.

The PCI slots are organized as a tree, which means one
PCI slot might have a parent PCI slot, and a parent PCI slot may
contain multiple child PCI slots. At plugging time, the parent
PCI slot is populated before its children. The child PCI slots are
removed before their parent PCI slot can be removed from the system.

If the skiboot firmware doesn't support slot status retrieval, the PCI
slot device node shouldn't have the property "ibm,reset-by-firmware". In
that case, no valid PCI slots will be detected from the device tree.
The skiboot firmware doesn't export the capability to access attention
LEDs yet; that is left as future work.

Signed-off-by: Gavin Shan 
Acked-by: Bjorn Helgaas 
---
  drivers/pci/hotplug/Kconfig   |  12 +
  drivers/pci/hotplug/Makefile  |   3 +
  drivers/pci/hotplug/pnv_php.c | 870 ++
  3 files changed, 885 insertions(+)
  create mode 100644 drivers/pci/hotplug/pnv_php.c

diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index df8caec..167c8ce 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC

  When in doubt, say N.

+config HOTPLUG_PCI_POWERNV
+   tristate "PowerPC PowerNV PCI Hotplug driver"
+   depends on PPC_POWERNV && EEH
+   help
+ Say Y here if you run PowerPC PowerNV platform that supports
+ PCI Hotplug
+
+ To compile this driver as a module, choose M here: the
+ module will be called pnv-php.
+
+ When in doubt, say N.
+
  config HOTPLUG_PCI_RPA
tristate "RPA PCI Hotplug driver"
depends on PPC_PSERIES && EEH
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index b616e75..e33cdda 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)+= pciehp.o
  obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550) += cpcihp_zt5550.o
  obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC)+= cpcihp_generic.o
  obj-$(CONFIG_HOTPLUG_PCI_SHPC)+= shpchp.o
+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)  += pnv-php.o
  obj-$(CONFIG_HOTPLUG_PCI_RPA) += rpaphp.o
  obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)   += rpadlpar_io.o
  obj-$(CONFIG_HOTPLUG_PCI_SGI) += sgi_hotplug.o
@@ -50,6 +51,8 @@ ibmphp-objs   :=  ibmphp_core.o   \
  acpiphp-objs  :=  acpiphp_core.o  \
acpiphp_glue.o

+pnv-php-objs   :=  pnv_php.o
+
  rpaphp-objs   :=  rpaphp_core.o   \
rpaphp_pci.o\
rpaphp_slot.o
diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
new file mode 100644
index 0000000..364ec36
--- /dev/null
+++ b/drivers/pci/hotplug/pnv_php.c
@@ -0,0 +1,870 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define DRIVER_VERSION "0.1"
+#define DRIVER_AUTHOR  "Gavin Shan, IBM Corporation"
+#define DRIVER_DESC    "PowerPC PowerNV PCI Hotplug Driver"
+
+struct pnv_php_slot {
+   struct hotplug_slot         slot;
+   struct hotplug_slot_info    slot_info;
+   uint64_t                    id;
+   char                        *name;
+   int                         slot_no;
+   struct kref                 kref;
+#define PNV_PHP_STATE_INITIALIZED  0
+#define PNV_PHP_STATE_REGISTERED   1
+#define PNV_PHP_STATE_POPULATED    2
+   int                         state;
+   struct device_node          *dn;
+   struct pci_dev              *pdev;
+   struct pci_bus              *bus;
+   bool                        power_state_check;
+   int                         power_state_confirmed;
+#define PNV_PHP_POWER_CONFIRMED_INVALID    0
+#define PNV_PHP_POWER_CONFIRMED_SUCCESS    1
+#define PNV_PHP_POWER_CONFIRMED_FAIL       2
+   struct opal_msg             *msg;
+   void                        *fdt;
+   void                        *dt;
+   struct of_changeset         ocs;
+   

Re: [5/5] powerpc/livepatch: Add live patching support on ppc64le

2016-04-19 Thread Michael Ellerman
On Wed, 2016-13-04 at 12:53:23 UTC, Michael Ellerman wrote:
> Add the kconfig logic & assembly support for handling live patched
> functions. This depends on DYNAMIC_FTRACE_WITH_REGS, which in turn
> depends on the new -mprofile-kernel ftrace ABI, which is only supported
> currently on ppc64le.
...
> 
> Signed-off-by: Michael Ellerman 
> Reviewed-by: Torsten Duwe 
> Reviewed-by: Balbir Singh 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/85baa095497f3e590df9f6c893

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [4/5] powerpc/livepatch: Add livepatch stack to struct thread_info

2016-04-19 Thread Michael Ellerman
On Wed, 2016-13-04 at 12:53:22 UTC, Michael Ellerman wrote:
> In order to support live patching we need to maintain an alternate
> stack of TOC & LR values. We use the base of the stack for this, and
> store the "live patch stack pointer" in struct thread_info.
> 
> Unlike the other fields of thread_info, we can not statically initialise
> that value, so it must be done at run time.
> 
> This patch just adds the code to support that, it is not enabled until
> the next patch which actually adds live patch support.
> 
> Signed-off-by: Michael Ellerman 
> Acked-by: Balbir Singh 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/5d31a96e6c0187f2c5d7004e00

cheers
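
The layout described in the quoted changelog can be modelled in user space. The sketch below is only an illustration: `union thread_union_sketch`, `init_livepatch_sp()`, `livepatch_push()` and `livepatch_pop()` are hypothetical names, the stack size is arbitrary, and the single word skipped after thread_info is an assumption standing in for an end-of-stack marker. The real kernel code may differ.

```c
#include <stddef.h>

#define THREAD_SIZE_SKETCH 16384  /* hypothetical stack size in bytes */

/* Hypothetical stand-in for the kernel's struct thread_info. */
struct thread_info_sketch {
    int cpu;
    unsigned long *livepatch_sp;  /* next free slot of the TOC/LR stack */
};

/* A kernel stack with thread_info placed at its base (lowest address). */
union thread_union_sketch {
    struct thread_info_sketch ti;
    unsigned long stack[THREAD_SIZE_SKETCH / sizeof(unsigned long)];
};

/*
 * The changelog says this value is set at run time rather than statically;
 * here that is modelled by an explicit init call.
 * Assumption: one word after thread_info is reserved as an end marker.
 */
static void init_livepatch_sp(union thread_union_sketch *tu)
{
    size_t first_free = sizeof(tu->ti) / sizeof(unsigned long) + 1;

    tu->ti.livepatch_sp = &tu->stack[first_free];
}

/* Push a (TOC, LR) pair; the alternate stack grows up from the base. */
static void livepatch_push(struct thread_info_sketch *ti,
                           unsigned long toc, unsigned long lr)
{
    *ti->livepatch_sp++ = toc;
    *ti->livepatch_sp++ = lr;
}

/* Pop the most recently pushed (TOC, LR) pair. */
static void livepatch_pop(struct thread_info_sketch *ti,
                          unsigned long *toc, unsigned long *lr)
{
    *lr = *--ti->livepatch_sp;
    *toc = *--ti->livepatch_sp;
}
```

Because the regular stack grows down from the top of the same region, the two only collide if the thread is close to overflowing its stack anyway.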

Re: [3/5] powerpc/livepatch: Add livepatch header

2016-04-19 Thread Michael Ellerman
On Wed, 2016-13-04 at 12:53:21 UTC, Michael Ellerman wrote:
> Add the powerpc specific livepatch definitions. In particular we provide
> a non-default implementation of klp_get_ftrace_location().
> 
> This is required because the location of the mcount call is not constant
> when using -mprofile-kernel (which we always do for live patching).
> 
> Signed-off-by: Torsten Duwe 
> Signed-off-by: Balbir Singh 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/f63e6d89876034c21ecd18bb1c

cheers

Re: [2/5] livepatch: Allow architectures to specify an alternate ftrace location

2016-04-19 Thread Michael Ellerman
On Wed, 2016-13-04 at 12:53:20 UTC, Michael Ellerman wrote:
> When livepatch tries to patch a function it takes the function address
> and asks ftrace to install the livepatch handler at that location.
> ftrace will look for an mcount call site at that exact address.
> 
> On powerpc the mcount location is not the first instruction of the
> function, and in fact it's not at a constant offset from the start of
> the function. To accommodate this add a hook which arch code can
> override to customise the behaviour.
> 
> Signed-off-by: Torsten Duwe 
> Signed-off-by: Balbir Singh 
> Signed-off-by: Petr Mladek 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/28e7cbd3e0f5fefec892842d13

cheers
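
The hook described in the quoted changelog can be pictured with a small user-space model. Everything here is a sketch under assumptions: the call-site table, the function names and the 16-byte search window are hypothetical and are not taken from this thread.

```c
#include <stddef.h>

/* Hypothetical table of mcount call-site addresses, sorted ascending. */
static const unsigned long mcount_sites[] = { 0x1008, 0x2010, 0x300c };
#define N_SITES (sizeof(mcount_sites) / sizeof(mcount_sites[0]))

/* Return the first recorded call site in [start, end], or 0 if none. */
static unsigned long ftrace_location_range_sketch(unsigned long start,
                                                  unsigned long end)
{
    for (size_t i = 0; i < N_SITES; i++)
        if (mcount_sites[i] >= start && mcount_sites[i] <= end)
            return mcount_sites[i];
    return 0;
}

/* Generic behaviour: expect the call site exactly at the function address. */
static unsigned long klp_location_default(unsigned long faddr)
{
    return ftrace_location_range_sketch(faddr, faddr);
}

/*
 * Arch override: search a small window after the function entry.
 * The 16-byte window is an assumption made for this sketch.
 */
static unsigned long klp_location_arch(unsigned long faddr)
{
    return ftrace_location_range_sketch(faddr, faddr + 16);
}
```

With a function at 0x1000 whose call site sits at 0x1008, the default lookup finds nothing while the windowed lookup returns 0x1008, which is the situation the changelog describes for powerpc.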

Re: [1/5] ftrace: Make ftrace_location_range() global

2016-04-19 Thread Michael Ellerman
On Wed, 2016-13-04 at 12:53:19 UTC, Michael Ellerman wrote:
> In order to support live patching on powerpc we would like to call
> ftrace_location_range(), so make it global.
> 
> Signed-off-by: Torsten Duwe 
> Signed-off-by: Balbir Singh 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/04cf31a759ef575f750a63777c

cheers

Re: [3/3] powerpc: Update TM user feature bits in scan_features()

2016-04-19 Thread Michael Ellerman
On Fri, 2016-15-04 at 02:08:19 UTC, Unknown sender due to SPF wrote:
> We need to update the user TM feature bits (PPC_FEATURE2_HTM and
> PPC_FEATURE2_HTM_NOSC) to mirror what we do with the kernel TM feature
> bit.
> 
> At the moment, if firmware reports TM is not available we turn off
> the kernel TM feature bit but leave the userspace ones on. Userspace
> thinks it can execute TM instructions and it dies trying.
> 
> This (together with a QEMU patch) fixes PR KVM, which doesn't currently
> support TM.
> 
> Signed-off-by: Anton Blanchard 
> Cc: sta...@vger.kernel.org

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/4705e02498d6d5a7ab98dfee95

cheers

Re: [2/3] powerpc: Update cpu_user_features2 in scan_features()

2016-04-19 Thread Michael Ellerman
On Fri, 2016-15-04 at 02:07:24 UTC, Unknown sender due to SPF wrote:
> scan_features() updates cpu_user_features but not cpu_user_features2.
> 
> Amongst other things, cpu_user_features2 contains the user TM feature
> bits which we must keep in sync with the kernel TM feature bit.
> 
> Signed-off-by: Anton Blanchard 
> Cc: sta...@vger.kernel.org

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/beff82374b259d726e2625ec6c

cheers

Re: [v2, 1/3] powerpc: scan_features() updates incorrect bits for REAL_LE

2016-04-19 Thread Michael Ellerman
On Mon, 2016-18-04 at 10:36:07 UTC, Michael Ellerman wrote:
> From: Anton Blanchard 
> 
> The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
> feature value, meaning all the remaining elements initialise the wrong
> values.
...
> 
> Fix the code by adding the missing initialisation of the MMU feature.
> 
> Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
> would be unsafe to start using it as old kernels incorrectly set it.
> 
> Fixes: 44ae3ab3358e ("powerpc: Free up some CPU feature bits by moving out 
> MMU-related features")
> Signed-off-by: Anton Blanchard 
> Cc: sta...@vger.kernel.org
> [mpe: Flesh out changelog, add comment reserving 0x4]
> Signed-off-by: Michael Ellerman 

Applied to powerpc fixes.

https://git.kernel.org/powerpc/c/6997e57d693b07289694239e52

cheers
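
The class of bug described in the quoted changelog, a positional initialiser with one value missing, can be reproduced in a few lines. This is a sketch only: the struct, its field names and the `USER_FTR_REAL_LE` value are simplified, hypothetical stand-ins rather than the kernel's definitions.

```c
/* Hypothetical, simplified version of a firmware feature-table entry. */
struct pa_feature_sketch {
    unsigned long cpu_features;   /* kernel CPU feature bits */
    unsigned long mmu_features;   /* kernel MMU feature bits */
    unsigned int  cpu_user_ftrs;  /* user-visible feature bits */
    unsigned char pabyte;         /* byte index in the firmware property */
    unsigned char pabit;          /* bit index within that byte */
};

#define USER_FTR_REAL_LE 0x00000002u  /* illustrative value */

/*
 * Buggy: the mmu_features slot is omitted, so every later value shifts
 * one field to the left. The user-feature value lands in mmu_features
 * and the byte index (5) lands in cpu_user_ftrs.
 */
static const struct pa_feature_sketch buggy = { 0, USER_FTR_REAL_LE, 5, 0 };

/* Fixed: an explicit 0 for mmu_features keeps each value in its slot. */
static const struct pa_feature_sketch fixed = { 0, 0, USER_FTR_REAL_LE, 5, 0 };
```

In this model the stray 5 in the user-feature field sets bit 0x4 among others, which fits the changelog's remark that 0x4 should be treated as reserved because old kernels set it by mistake.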

[PATCH v3 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-19 Thread Akshay Adiga
The frequency transition latency from pmin to pmax is observed to be on the
order of a few milliseconds, and it usually incurs a performance penalty
during sudden frequency ramp-up requests.

This patch set solves this problem by using an entity called "global
pstates". The global pstate is a chip-level entity, so the global entity
(voltage) is managed across the cores. The local pstate is a core-level
entity, so the local entity (frequency) is managed across threads.

This patch brings down the global pstate at a slower rate than the local
pstate. Holding the global pstate higher than the local pstate makes
the subsequent ramp-ups faster.

A per-policy structure is maintained to keep track of the global and
local pstate changes. The global pstate is brought down using a parabolic
equation. The ramp-down time to pmin is set to ~5 seconds. To make sure
that the global pstate is dropped at regular intervals, a timer is
queued every 2 seconds during the ramp-down phase, which eventually brings
the global pstate down to the local pstate.
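
The parabolic ramp-down described above can be sketched as follows. The exact equation is not shown in this excerpt, so the `(ms * ms) >> 18` form, the 5120 ms window, and the names `ramp_down_percent()` and `calc_global_pstate()` are assumptions chosen for illustration; pstates are modelled as plain integers where a larger number means higher performance.

```c
#define MAX_RAMP_DOWN_MS 5120u  /* assumed ramp-down window, roughly 5 s */

/*
 * Percentage of the gap between the held global pstate and the local
 * pstate that should have been given up after elapsed_ms.
 * (ms * ms) >> 18 is a parabola: 0 at 0 ms and 100 at 5120 ms, so the
 * ramp-down starts slowly and accelerates towards the end.
 */
static unsigned int ramp_down_percent(unsigned int elapsed_ms)
{
    if (elapsed_ms >= MAX_RAMP_DOWN_MS)
        return 100;
    return (elapsed_ms * elapsed_ms) >> 18;
}

/* Global pstate to request elapsed_ms after the ramp-down started. */
static int calc_global_pstate(unsigned int elapsed_ms, int highest_gpstate,
                              int local_pstate)
{
    int gap = highest_gpstate - local_pstate;
    int drop;

    if (gap <= 0)
        return local_pstate;  /* never hold the global pstate below local */
    drop = (int)ramp_down_percent(elapsed_ms) * gap / 100;
    return highest_gpstate - drop;
}
```

With a timer firing every 2 seconds, this model gives up about 15% of the gap at 2 s, 61% at 4 s, and the remainder by the third tick, at which point the global pstate equals the local one.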

Iozone results show fairly consistent performance boost.
YCSB on redis shows improved Max latencies in most cases.

Iozone write/rewrite tests were run with file sizes of 200704 KB and
401408 KB and different record sizes. The following table shows I/O
operations/sec with and without the patch.

Iozone results (in op/sec, mean over 3 iterations)
---------------------------------------------------------
file size-              with        without     %
recordsize-IOtype       patch       patch       change
---------------------------------------------------------
200704-1-SeqWrite       1616532     1615425      0.06
200704-1-Rewrite        2423195     2303130      5.21
200704-2-SeqWrite       1628577     1602620      1.61
200704-2-Rewrite        2428264     2312154      5.02
200704-4-SeqWrite       1617605     1617182      0.02
200704-4-Rewrite        2430524     2351238      3.37
200704-8-SeqWrite       1629478     1600436      1.81
200704-8-Rewrite        2415308     2298136      5.09
200704-16-SeqWrite      1619632     1618250      0.08
200704-16-Rewrite       2396650     2352591      1.87
200704-32-SeqWrite      1632544     1598083      2.15
200704-32-Rewrite       2425119     2329743      4.09
200704-64-SeqWrite      1617812     1617235      0.03
200704-64-Rewrite       2402021     2321080      3.48
200704-128-SeqWrite     1631998     1600256      1.98
200704-128-Rewrite      2422389     2304954      5.09
200704-256-SeqWrite     1617065     1616962      0.00
200704-256-Rewrite      2432539     2301980      5.67
200704-512-SeqWrite     1632599     1598656      2.12
200704-512-Rewrite      2429270     2323676      4.54
200704-1024-SeqWrite    1618758     1616156      0.16
200704-1024-Rewrite     2431631     2315889      4.99
401408-1-SeqWrite       1631479     1608132      1.45
401408-1-Rewrite        2501550     2459409      1.71
401408-2-SeqWrite       1617095     1626069     -0.55
401408-2-Rewrite        2507557     2443621      2.61
401408-4-SeqWrite       1629601     1611869      1.10
401408-4-Rewrite        2505909     2462098      1.77
401408-8-SeqWrite       1617110     1626968     -0.60
401408-8-Rewrite        2512244     2456827      2.25
401408-16-SeqWrite      1632609     1609603      1.42
401408-16-Rewrite       2500792     2451405      2.01
401408-32-SeqWrite      1619294     1628167     -0.54
401408-32-Rewrite       2510115     2451292      2.39
401408-64-SeqWrite      1632709     1603746      1.80
401408-64-Rewrite       2506692     2433186      3.02
401408-128-SeqWrite     1619284     1627461     -0.50
401408-128-Rewrite      2518698     2453361      2.66
401408-256-SeqWrite     1634022     1610681      1.44
401408-256-Rewrite      2509987     2446328      2.60
401408-512-SeqWrite     1617524     1628016     -0.64
401408-512-Rewrite      2504409     2442899      2.51
401408-1024-SeqWrite    1629812     1611566      1.13
401408-1024-Rewrite     2507620     2442968      2.64

Tested with the YCSB workload (50% update + 50% read) over redis for 1 million
records and 1 million operations. Each test was carried out with target
operations per second and persistence disabled.

Max-latency (in us)( mean over 5 iterations )

[PATCH v3 0/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-19 Thread Akshay Adiga
The frequency transition latency from pmin to pmax is observed to be on the
order of a few milliseconds, and it usually incurs a performance penalty
during sudden frequency ramp-up requests.

This patch set solves this problem by using a chip-level entity called "global
pstates". Global pstate manages elements across other dependent core chiplets.
Typically, the element that needs to be managed is the voltage setting.
So by holding the global pstate higher than the local pstate for some amount
of time (~5 seconds), subsequent ramp-ups can be made faster.

(1/2) patch removes the flag from cpufreq_policy->driver_data, so that it can
be used for tracking global pstates.

(2/2) patch adds code for global pstate management.
- The iozone results with this patch set show improvements in almost all cases.
- The YCSB workload on redis with various target operations per second shows
  better MaxLatency with this patch.

Changes from v1:
- Fixed coding style
- Added a routine to reset global_pstate_info instead of hacky memset
- Handled case where cpufreq_table_validate_and_show() fails
- changed int queue_gpstate_timer() to void queue_gpstate_timer()

Changes from v2:
- Dropped the unrelated change.

Akshay Adiga (1):
  cpufreq: powernv: Ramp-down global pstate slower than local-pstate

Shilpasri G Bhat (1):
  cpufreq: powernv: Remove flag use-case of policy->driver_data

 drivers/cpufreq/powernv-cpufreq.c | 269 --
 1 file changed, 256 insertions(+), 13 deletions(-)

-- 
2.5.5


[PATCH v3 1/2] cpufreq: powernv: Remove flag use-case of policy->driver_data

2016-04-19 Thread Akshay Adiga
From: Shilpasri G Bhat 

commit 1b0289848d5d ("cpufreq: powernv: Add sysfs attributes to show
throttle stats") used policy->driver_data as a flag for one-time creation
of throttle sysfs files. Instead of this use 'kernfs_find_and_get()' to
check if the attribute already exists. This is required as
policy->driver_data is used for other purposes in the later patch.

Signed-off-by: Shilpasri G Bhat 
Signed-off-by: Akshay Adiga 
Acked-by: Viresh Kumar 
---
 drivers/cpufreq/powernv-cpufreq.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 39ac78c..e2e2219 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -455,13 +455,15 @@ static int powernv_cpufreq_target_index(struct cpufreq_policy *policy,
 static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
int base, i;
+   struct kernfs_node *kn;
 
base = cpu_first_thread_sibling(policy->cpu);
 
for (i = 0; i < threads_per_core; i++)
cpumask_set_cpu(base + i, policy->cpus);
 
-   if (!policy->driver_data) {
+   kn = kernfs_find_and_get(policy->kobj.sd, throttle_attr_grp.name);
+   if (!kn) {
int ret;
 
 ret = sysfs_create_group(&policy->kobj, &throttle_attr_grp);
@@ -470,11 +472,8 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
policy->cpu);
return ret;
}
-   /*
-* policy->driver_data is used as a flag for one-time
-* creation of throttle sysfs files.
-*/
-   policy->driver_data = policy;
+   } else {
+   kernfs_put(kn);
}
return cpufreq_table_validate_and_show(policy, powernv_freqs);
 }
-- 
2.5.5


Re: [PATCH v2 2/2] cpufreq: powernv: Ramp-down global pstate slower than local-pstate

2016-04-19 Thread Akshay Adiga

Hi Viresh,

On 04/18/2016 03:48 PM, Viresh Kumar wrote:

On 15-04-16, 11:58, Akshay Adiga wrote:

  static int powernv_cpufreq_reboot_notifier(struct notifier_block *nb,
-   unsigned long action, void *unused)
+  unsigned long action, void *unused)

Unrelated change.. better don't add such changes..


Posting v3 without this unrelated change.


  {
int cpu;
struct cpufreq_policy cpu_policy;
@@ -603,15 +843,18 @@ static struct notifier_block powernv_cpufreq_opal_nb = {
  static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
  {
struct powernv_smp_call_data freq_data;
-
+   struct global_pstate_info *gpstates = policy->driver_data;

You removed a blank line here and I feel the code looks better with
that.


freq_data.pstate_id = powernv_pstate_info.min;
+   freq_data.gpstate_id = powernv_pstate_info.min;
    smp_call_function_single(policy->cpu, set_pstate, &freq_data, 1);
+   del_timer_sync(&gpstates->timer);
  }
  
  static struct cpufreq_driver powernv_cpufreq_driver = {

.name   = "powernv-cpufreq",
.flags  = CPUFREQ_CONST_LOOPS,
.init   = powernv_cpufreq_cpu_init,
+   .exit   = powernv_cpufreq_cpu_exit,
.verify = cpufreq_generic_frequency_table_verify,
.target_index   = powernv_cpufreq_target_index,
.get= powernv_cpufreq_get,

None of the above comments are mandatory for you to fix..

Acked-by: Viresh Kumar 


Thanks for Ack  :)


Re: [PATCH] cxl: Add a kernel thread to check the coherent platform function's state

2016-04-19 Thread Michael Ellerman
On Mon, 2016-04-18 at 15:05 +0200, Christophe Lombard wrote:

> In the POWERVM environement, the PHYP CoherentAccel component manages

PowerVM is correct I think.

> the state of the Coherant Accelerator Processor Interface adapter and
   ^
   (CAPI)
> virtualizes CAPI resources, handles CAPP, PSL, PSL Slice errors - and
> interrupts - and provides a new set of HCALLs for the OS APIs to utilize
 ^
 hcall (as below?)
> AFUs.

AFUs ? (you define it below)

> During the course of operation, a coherent platform function can
> encounter errors. Some possible reason for errors are:
> • Hardware recoverable and unrecoverable errors
> • Transient and over-threshold correctable errors
> 
> PHYP implements its own state model for the coherent platform function.
> The current state of this Acclerator Fonction Unit (AFU) is available
> through a hcall.
> 
> In case of low-level troubles (or error injection), The PHYP component
> may reset the card and change the AFU state. The PHYP interface doesn't
> provide any way to be notified when that happens.

Ugh.

> The current implementation of the cxl driver, for the POWERVM
> environment, follows the general error recovery procedures required to

What are "the general error recovery procedures" ?

> reset operation of the coherent platform function. The platform firmware
> resets and reconfigures hardware when an external action is required -
> attach/detach a process, link ok, 

Platform firmware does that at our request or by itself?

> The purpose of this patch is to interact with the external driver

What's an external driver?

> (where the AFU is shown) even if no action is required. A kernel thread

But no action is required, so why do we need to do anything?

> is needed to check every x seconds the current state of the AFU to see
> if we need to enter an error recovery path.


I don't really understand what this is doing and why we want it. It sounds like
we're waking the cpu up every 3 seconds and having it poll the hypervisor, for
each AFU?

As far as the implementation, I can't see any reason why you need your own
kthreads, can't you just use queue_work() ?

cheers


[PATCH V3 1/2] cpufreq: qoriq: Remove __exit macro from .exit callback

2016-04-19 Thread Jia Hongtao
The .exit callback (qoriq_cpufreq_cpu_exit()) is also used during suspend,
so the __exit macro should be removed, or the function will be discarded.

Signed-off-by: Jia Hongtao 
---
 drivers/cpufreq/qoriq-cpufreq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/qoriq-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c
index b23e525..3a3fe39 100644
--- a/drivers/cpufreq/qoriq-cpufreq.c
+++ b/drivers/cpufreq/qoriq-cpufreq.c
@@ -301,7 +301,7 @@ err_np:
return -ENODEV;
 }
 
-static int __exit qoriq_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+static int qoriq_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 {
struct cpu_data *data = policy->driver_data;
 
@@ -348,7 +348,7 @@ static struct cpufreq_driver qoriq_cpufreq_driver = {
.name   = "qoriq_cpufreq",
.flags  = CPUFREQ_CONST_LOOPS,
.init   = qoriq_cpufreq_cpu_init,
-   .exit   = __exit_p(qoriq_cpufreq_cpu_exit),
+   .exit   = qoriq_cpufreq_cpu_exit,
.verify = cpufreq_generic_frequency_table_verify,
.target_index   = qoriq_cpufreq_target,
.get= cpufreq_generic_get,
-- 
2.1.0.27.g96db324
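
The effect the changelog describes can be modelled in user space. This is a simplified sketch, not kernel code: `EXIT_P_SKETCH`, `MODULE_BUILD_SKETCH` and the `*_sketch` types and functions are hypothetical stand-ins, and the model only assumes that exit-only functions are discarded in built-in code so that a pointer wrapped this way becomes NULL.

```c
#include <stddef.h>

/*
 * Define MODULE_BUILD_SKETCH to emulate a loadable-module build; leave it
 * undefined to emulate built-in code, where exit-only functions are
 * dropped and pointers to them must become NULL.
 */
#ifdef MODULE_BUILD_SKETCH
#define EXIT_P_SKETCH(fn) (fn)
#else
#define EXIT_P_SKETCH(fn) NULL
#endif

struct policy_sketch { int cpu; };

struct driver_sketch {
    int (*exit)(struct policy_sketch *policy);
};

static int cpu_exit_sketch(struct policy_sketch *policy)
{
    policy->cpu = -1;  /* pretend to release per-policy resources */
    return 0;
}

/* Before the patch: the callback disappears in a built-in build. */
static const struct driver_sketch driver_before = {
    .exit = EXIT_P_SKETCH(cpu_exit_sketch),
};

/* After the patch: the callback is always present. */
static const struct driver_sketch driver_after = {
    .exit = cpu_exit_sketch,
};

/* What a suspend path might do: call .exit only if it is set. */
static int suspend_sketch(const struct driver_sketch *drv,
                          struct policy_sketch *policy)
{
    if (!drv->exit)
        return -1;  /* cleanup is skipped */
    return drv->exit(policy);
}
```

In the built-in configuration of this model, `driver_before` skips the cleanup on suspend while `driver_after` runs it, which is the behaviour the patch is after.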


Re: [PATCH v8 39/45] powerpc/powernv: Select OF_DYNAMIC

2016-04-19 Thread Alexey Kardashevskiy

On 02/17/2016 02:44 PM, Gavin Shan wrote:

The device tree will change dynamically in PowerNV PCI hotplug
driver. This enables CONFIG_OF_DYNAMIC to support that.

Signed-off-by: Gavin Shan 
---
  arch/powerpc/platforms/powernv/Kconfig | 1 +
  1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 604190c..e7b1ad7 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -18,6 +18,7 @@ config PPC_POWERNV
select CPU_FREQ_GOV_ONDEMAND
select CPU_FREQ_GOV_CONSERVATIVE
select PPC_DOORBELL
+   select OF_DYNAMIC



Why not enable it in 45/45 under config HOTPLUG_PCI_POWERNV? Is there
any benefit in having it always on if HOTPLUG_PCI_POWERNV is not enabled?




default y

  config OPAL_PRD




--
Alexey

Re: [PATCH v8 38/45] powerpc/powernv: Functions to get/set PCI slot status

2016-04-19 Thread Alexey Kardashevskiy

On 02/17/2016 02:44 PM, Gavin Shan wrote:

This exports 4 functins, which base on the corresponding OPAL



s/functins/functions/



APIs to get/set PCI slot status. Those functions are going to
be used by PowerNV PCI hotplug driver:

pnv_pci_get_device_tree()       opal_get_device_tree()
pnv_pci_get_presence_state()    opal_pci_get_presence_state()
pnv_pci_get_power_state()       opal_pci_get_power_state()
pnv_pci_set_power_state()       opal_pci_set_power_state()

Besides, the patch also exports pnv_pci_hotplug_notifier_{register,
unregister}() to allow registration and unregistration of PCI hotplug
notifier, which will be used to receive PCI hotplug message from
skiboot firmware in PowerNV PCI hotplug driver.

Signed-off-by: Gavin Shan 
---
  arch/powerpc/include/asm/opal-api.h| 17 ++-
  arch/powerpc/include/asm/opal.h|  4 ++
  arch/powerpc/include/asm/pnv-pci.h |  7 +++
  arch/powerpc/platforms/powernv/opal-wrappers.S |  4 ++
  arch/powerpc/platforms/powernv/pci.c   | 66 ++
  5 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index f8faaae..a6af338 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -158,7 +158,11 @@
  #define OPAL_LEDS_SET_INDICATOR   115
  #define OPAL_CEC_REBOOT2  116
  #define OPAL_CONSOLE_FLUSH117
-#define OPAL_LAST  117
+#define OPAL_GET_DEVICE_TREE   118
+#define OPAL_PCI_GET_PRESENCE_STATE    119
+#define OPAL_PCI_GET_POWER_STATE   120
+#define OPAL_PCI_SET_POWER_STATE   121
+#define OPAL_LAST  121

  /* Device tree flags */

@@ -344,6 +348,16 @@ enum OpalPciResetState {
OPAL_ASSERT_RESET   = 1
  };

+enum OpalPciSlotPresentenceState {
+   OPAL_PCI_SLOT_EMPTY = 0,
+   OPAL_PCI_SLOT_PRESENT   = 1
+};
+
+enum OpalPciSlotPowerState {
+   OPAL_PCI_SLOT_POWER_OFF = 0,
+   OPAL_PCI_SLOT_POWER_ON  = 1
+};
+
  enum OpalSlotLedType {
OPAL_SLOT_LED_TYPE_ID = 0,  /* IDENTIFY LED */
OPAL_SLOT_LED_TYPE_FAULT = 1,   /* FAULT LED */
@@ -378,6 +392,7 @@ enum opal_msg_type {
OPAL_MSG_DPO,
OPAL_MSG_PRD,
OPAL_MSG_OCC,
+   OPAL_MSG_PCI_HOTPLUG,
OPAL_MSG_TYPE_MAX,
  };

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9e0039f..899bcb941 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -209,6 +209,10 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
uint64_t size, uint64_t token);
  int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);
+int64_t opal_get_device_tree(uint32_t phandle, uint64_t buf, uint64_t len);
+int64_t opal_pci_get_presence_state(uint64_t id, uint8_t *state);
+int64_t opal_pci_get_power_state(uint64_t id, uint8_t *state);
+int64_t opal_pci_set_power_state(uint64_t id, uint8_t state);

  /* Internal functions */
  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index 6f77f71..d9d095b 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -13,6 +13,13 @@
  #include 
  #include 

+extern int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len);
+extern int pnv_pci_get_presence_state(uint64_t id, uint8_t *state);
+extern int pnv_pci_get_power_state(uint64_t id, uint8_t *state);
+extern int pnv_pci_set_power_state(uint64_t id, uint8_t state);
+extern int pnv_pci_hotplug_notifier_register(struct notifier_block *nb);
+extern int pnv_pci_hotplug_notifier_unregister(struct notifier_block *nb);
+
  int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
  int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
   unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index e45b88a..3ea1a855 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -302,3 +302,7 @@ OPAL_CALL(opal_prd_msg, OPAL_PRD_MSG);
  OPAL_CALL(opal_leds_get_ind,  OPAL_LEDS_GET_INDICATOR);
  OPAL_CALL(opal_leds_set_ind,  OPAL_LEDS_SET_INDICATOR);
  OPAL_CALL(opal_console_flush, OPAL_CONSOLE_FLUSH);
+OPAL_CALL(opal_get_device_tree,        OPAL_GET_DEVICE_TREE);
+OPAL_CALL(opal_pci_get_presence_state, OPAL_PCI_GET_PRESENCE_STATE);
+OPAL_CALL(opal_pci_get_power_state,    OPAL_PCI_GET_POWER_STATE);
+OPAL_CALL(opal_pci_set_power_state,

[PATCH v2] powerpc: define the fman node for the kmcoge4 DTS

2016-04-19 Thread Valentin Longchamp
Now that the FMAN mac driver has been merged the fman node is relevant.

The kmcoge4 board implements 3 Ethernet interfaces: 1 with an RGMII PHY
and 2 with fixed 1 Gbit/s SGMII links.

Signed-off-by: Valentin Longchamp 
---
 arch/powerpc/boot/dts/fsl/kmcoge4.dts | 37 +++
 1 file changed, 37 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/kmcoge4.dts b/arch/powerpc/boot/dts/fsl/kmcoge4.dts
index 6858ec9..67bfcec 100644
--- a/arch/powerpc/boot/dts/fsl/kmcoge4.dts
+++ b/arch/powerpc/boot/dts/fsl/kmcoge4.dts
@@ -106,6 +106,43 @@
sata@221000 {
status = "disabled";
};
+
+   fman0: fman@40 {
+   enet0: ethernet@e {
+   phy-connection-type = "sgmii";
+   fixed-link {
+   speed = <1000>;
+   full-duplex;
+   };
+   };
+   mdio0: mdio@e1120 {
+   front_phy: ethernet-phy@11 {
+   reg = <0x11>;
+   };
+   };
+
+   enet1: ethernet@e2000 {
+   phy-connection-type = "sgmii";
+   fixed-link {
+   speed = <1000>;
+   full-duplex;
+   };
+   };
+   enet2: ethernet@e4000 {
+   status = "disabled";
+   };
+
+   enet3: ethernet@e6000 {
+   status = "disabled";
+   };
+   enet4: ethernet@e8000 {
+   phy-handle = <&front_phy>;
+   phy-connection-type = "rgmii";
+   };
+   enet5: ethernet@f {
+   status = "disabled";
+   };
+   };
};
 
rio: rapidio@ffe0c {
-- 
1.8.3.1


Re: [PATCH] cxl: Add a kernel thread to check the coherent platform function's state

2016-04-19 Thread christophe lombard

On 19/04/2016 04:40, Andrew Donnellan wrote:

On 18/04/16 23:05, Christophe Lombard wrote:

In the POWERVM environement, the PHYP CoherentAccel component manages


environment


the state of the Coherant Accelerator Processor Interface adapter and


Coherent


virtualizes CAPI resources, handles CAPP, PSL, PSL Slice errors - and
interrupts - and provides a new set of HCALLs for the OS APIs to utilize
AFUs.

During the course of operation, a coherent platform function can
encounter errors. Some possible reason for errors are:
• Hardware recoverable and unrecoverable errors
• Transient and over-threshold correctable errors

PHYP implements its own state model for the coherent platform function.
The current state of this Acclerator Fonction Unit (AFU) is available


Accelerator Function Unit


through a hcall.

In case of low-level troubles (or error injection), The PHYP component


the


may reset the card and change the AFU state. The PHYP interface doesn't
provide any way to be notified when that happens.

The current implementation of the cxl driver, for the POWERVM
environment, follows the general error recovery procedures required to
reset operation of the coherent platform function. The platform firmware
resets and reconfigures hardware when an external action is required -
attach/detach a process, link ok, 

The purpose of this patch is to interact with the external driver
(where the AFU is shown) even if no action is required. A kernel thread
is needed to check every x seconds the current state of the AFU to see
if we need to enter an error recovery path.

Signed-off-by: Christophe Lombard 


A few minor issues below.


diff --git a/drivers/misc/cxl/guest.c b/drivers/misc/cxl/guest.c
index 8213372..06dfe7f 100644
--- a/drivers/misc/cxl/guest.c
+++ b/drivers/misc/cxl/guest.c
@@ -19,6 +19,10 @@
  #define CXL_SLOT_RESET_EVENT2
  #define CXL_RESUME_EVENT3

+#define CXL_KTHREAD "cxl_kthread"
+
+void stop_state_thread(struct cxl_afu *afu);


static?

[...]


-static int afu_do_recovery(struct cxl_afu *afu)
+static int handle_state_thread(void *data)
  {
-int rc;
+struct cxl_afu *afu;
+int rc = 0;


It looks like we don't use rc (see also comment below).



-/* many threads can arrive here, in case of detach_all for example.
- * Only one needs to drive the recovery
- */
-if (mutex_trylock(&afu->guest->recovery_lock)) {
-rc = afu_update_state(afu);
-mutex_unlock(&afu->guest->recovery_lock);
-return rc;
+pr_devel("in %s\n", __func__);
+
+afu = (struct cxl_afu*)data;


CodingStyle: space between cxl_afu and *


+do {
+set_current_state(TASK_INTERRUPTIBLE);
+
+if (afu) {
+afu_update_state(afu);


Should we be checking the retval here?


Right, we have to check the retval here. Thanks.





+if (afu->guest->previous_state == H_STATE_PERM_UNAVAILABLE)
+goto out;
+} else
+return -ENODEV;
+schedule_timeout(msecs_to_jiffies(3000));
+} while(!kthread_should_stop());


CodingStyle: space between while and (


+
+out:
+afu->guest->kthread_tsk = NULL;
+return rc;
+}
+
+void start_state_thread(struct cxl_afu *afu)


static?


+{
+if (afu->guest->kthread_tsk)
+return;
+
+/* start kernel thread to handle the state of the afu */
+afu->guest->kthread_tsk = kthread_run(&handle_state_thread,
+  (void *)afu, CXL_KTHREAD);
+if (IS_ERR(afu->guest->kthread_tsk)) {
+pr_devel("cannot start state kthread\n");
+afu->guest->kthread_tsk = NULL;
  }
-return 0;
+}
+
+void stop_state_thread(struct cxl_afu *afu)


static?



Thanks for the review. I will send a patch update.


Re: [PATCH V3 1/2] cpufreq: qoriq: Remove __exit macro from .exit callback

2016-04-19 Thread Viresh Kumar
On 19-04-16, 17:00, Jia Hongtao wrote:
> .exit callback (qoriq_cpufreq_cpu_exit()) is also used during suspend.
> So __exit macro should be removed or the function will be discarded.
> 
> Signed-off-by: Jia Hongtao 
> ---
>  drivers/cpufreq/qoriq-cpufreq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Acked-by: Viresh Kumar 

-- 
viresh

Re: [PATCH v8 33/45] powerpc/powernv: Simplify pnv_eeh_reset()

2016-04-19 Thread Alexey Kardashevskiy

On 02/17/2016 02:44 PM, Gavin Shan wrote:

This drops unnecessary nested if statements in pnv_eeh_reset() to
improve the code readability. After the changes, the unused local
variable "ret" is dropped as well. No logical changes introduced.

Signed-off-by: Gavin Shan 




Reviewed-by: Alexey Kardashevskiy 




---
  arch/powerpc/platforms/powernv/eeh-powernv.c | 67 +---
  1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 69e41ce..9226df1 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1009,8 +1009,9 @@ static int pnv_eeh_reset_vf_pe(struct eeh_pe *pe, int option)
  static int pnv_eeh_reset(struct eeh_pe *pe, int option)
  {
struct pci_controller *hose = pe->phb;
+   struct pnv_phb *phb;
struct pci_bus *bus;
-   int ret;
+   int64_t rc;

/*
 * For PHB reset, we always have complete reset. For those PEs whose
@@ -1026,45 +1027,39 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
 * reset. The side effect is that EEH core has to clear the frozen
 * state explicitly after BAR restore.
 */
-   if (pe->type & EEH_PE_PHB) {
-   ret = pnv_eeh_phb_reset(hose, option);
-   } else {
-   struct pnv_phb *phb;
-   s64 rc;
+   if (pe->type & EEH_PE_PHB)
+   return pnv_eeh_phb_reset(hose, option);

-   /*
-* The frozen PE might be caused by PAPR error injection
-* registers, which are expected to be cleared after hitting
-* frozen PE as stated in the hardware spec. Unfortunately,
-* that's not true on P7IOC. So we have to clear it manually
-* to avoid recursive EEH errors during recovery.
-*/
-   phb = hose->private_data;
-   if (phb->model == PNV_PHB_MODEL_P7IOC &&
-   (option == EEH_RESET_HOT ||
-   option == EEH_RESET_FUNDAMENTAL)) {
-   rc = opal_pci_reset(phb->opal_id,
-   OPAL_RESET_PHB_ERROR,
-   OPAL_ASSERT_RESET);
-   if (rc != OPAL_SUCCESS) {
-   pr_warn("%s: Failure %lld clearing "
-   "error injection registers\n",
-   __func__, rc);
-   return -EIO;
-   }
+   /*
+* The frozen PE might be caused by PAPR error injection
+* registers, which are expected to be cleared after hitting
+* frozen PE as stated in the hardware spec. Unfortunately,
+* that's not true on P7IOC. So we have to clear it manually
+* to avoid recursive EEH errors during recovery.
+*/
+   phb = hose->private_data;
+   if (phb->model == PNV_PHB_MODEL_P7IOC &&
+   (option == EEH_RESET_HOT ||
+option == EEH_RESET_FUNDAMENTAL)) {
+   rc = opal_pci_reset(phb->opal_id,
+   OPAL_RESET_PHB_ERROR,
+   OPAL_ASSERT_RESET);
+   if (rc != OPAL_SUCCESS) {
+   pr_warn("%s: Failure %lld clearing error injection registers\n",
+   __func__, rc);
+   return -EIO;
}
-
-   bus = eeh_pe_bus_get(pe);
-   if (pe->type & EEH_PE_VF)
-   ret = pnv_eeh_reset_vf_pe(pe, option);
-   else if (pci_is_root_bus(bus) ||
-   pci_is_root_bus(bus->parent))
-   ret = pnv_eeh_root_reset(hose, option);
-   else
-   ret = pnv_eeh_bridge_reset(bus->self, option);
}

-   return ret;
+   bus = eeh_pe_bus_get(pe);
+   if (pe->type & EEH_PE_VF)
+   return pnv_eeh_reset_vf_pe(pe, option);
+
+   if (pci_is_root_bus(bus) ||
+   pci_is_root_bus(bus->parent))
+   return pnv_eeh_root_reset(hose, option);
+
+   return pnv_eeh_bridge_reset(bus->self, option);
  }

  /**




--
Alexey

Re: Trouble with DMA on PPC linux question

2016-04-19 Thread Benjamin Herrenschmidt
On Mon, 2016-04-18 at 14:54 -0700, bruce_leon...@selinc.com wrote:
> 
> On the DMA transactions that work, the virtual address I hand to 
> dma_map_single() is something like 0xe084 and the dma_addr_t result is 
> 0x1084 which is less than my 512Mb limit.  On the transactions that 
> don't work, the virtual address is 0xd539 with the mapped result being 
> 0x2539, which is past my upper bound on my RAM.  In fact it's not even 
> in my memory map, there's a hole there. 

Where does this virtual address come from ?

The kernel has two types of virtual addresses. Those coming from the
linear mapping (the stuff you get from kmalloc() for example, or
get_pages()) can be translated using that simple subtraction.

The other is the vmalloc space, and that is a non-linear mapping of
random pages.

If your vaddr comes from the latter it can't be passed to
dma_map_single as-is, you need to get to the underlying pages first.
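
The difference between the two kinds of address can be modelled in user
space. The sketch below is only an illustration, not kernel code:
MODEL_PAGE_OFFSET, MODEL_RAM_SIZE and all three helper names are
hypothetical and are not taken from the kernel or from the original
report. It shows that the subtraction is only meaningful for addresses
inside the linear mapping; applied to an address above it, the result
lands past the end of RAM, which matches the symptom described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: linear map starts here and covers 512 MB of RAM. */
#define MODEL_PAGE_OFFSET 0xc0000000u
#define MODEL_RAM_SIZE    0x20000000u

/* True if vaddr lies inside the modelled linear mapping. */
static bool in_linear_map(uint32_t vaddr)
{
	return vaddr >= MODEL_PAGE_OFFSET &&
	       vaddr - MODEL_PAGE_OFFSET < MODEL_RAM_SIZE;
}

/* Blind subtraction, performed without checking where vaddr comes from. */
static uint32_t naive_subtract(uint32_t vaddr)
{
	return vaddr - MODEL_PAGE_OFFSET;
}

/* Translation that refuses anything outside the linear mapping. */
static uint32_t linear_virt_to_phys(uint32_t vaddr)
{
	assert(in_linear_map(vaddr));
	return naive_subtract(vaddr);
}
```

In real kernel code, is_vmalloc_addr() tells the two cases apart, and
vmalloc_to_page() returns the page behind a vmalloc address, which can
then be mapped with dma_map_page().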

Ben.


Re: [PATCH v8 30/45] powerpc/pci: Delay populating pdn

2016-04-19 Thread Alexey Kardashevskiy

On 02/17/2016 02:44 PM, Gavin Shan wrote:

The pdn (struct pci_dn) instances are allocated from memblock or
bootmem when creating PCI controller (hoses) in setup_arch(). PCI
hotplug, which will be supported by subsequent patches, releases
PCI device nodes and their corresponding pdn on unplugging event.
The memory chunks for pdn instances allocated from memblock or
bootmem are hard to reuse after being released.

This delays creating pdn by pci_devs_phb_init() from setup_arch()
to core_initcall() so that they are allocated from slab. The memory
consumed by pdn can be released to system without problem during
PCI unplugging time. It indicates that pci_dn is unavailable in
setup_arch() and the fixup on pdn (like AGP's) can't be carried
out at that time. We have to do that in ppc_md.pcibios_root_bridge_prepare()
on maple/pasemi/powermac platforms where/when the pdn is available.

Meanwhile, the EEH device is created when pdn is populated,
meaning pdn and EEH device have the same life cycle. In turn, we needn't
call eeh_dev_init() to create EEH device explicitly.

Signed-off-by: Gavin Shan 



Uff. It would not hurt to mention that  pcibios_root_bridge_prepare is 
called from subsys_initcall() which is executed after core_initcall() so 
the code flow does not change.


Have you checked if there is anything in between
core_initcall(pci_devs_phb_init) and subsys_initcall(pcibios_init) which
might need device tree nodes? For example, subsys_initcall(pcibios_init)
(eventually) calls pnv_pci_ioda_fixup(); if we are unlucky and
pcibios_init() (and therefore pnv_pci_ioda_fixup(), or whatever
pseries/others do) is called before pci_devs_phb_init(), won't we crash
or something?
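
The ordering in question can be illustrated with a small user-space
model. This is a sketch under assumptions, not kernel code: the level
numbers mirror the kernel's core (1) and subsys (4) initcall levels, but
the table, run_initcalls() and the model_* functions are hypothetical
stand-ins. It shows that a subsys-level call observes state set up by a
core-level call no matter in which order the two were registered.

```c
#include <assert.h>
#include <stddef.h>

/* Level numbers as used by the kernel's core_initcall/subsys_initcall. */
enum level { LEVEL_CORE = 1, LEVEL_SUBSYS = 4 };

struct initcall {
	enum level level;
	void (*fn)(void);
};

static int pdn_ready;      /* set once the model "pdn" has been created */
static int fixup_saw_pdn;  /* what the subsys-level fixup observed */

/* Stand-in for pci_devs_phb_init(): creates the pdn. */
static void model_pci_devs_phb_init(void) { pdn_ready = 1; }

/* Stand-in for pcibios_init(): its fixups need the pdn to exist. */
static void model_pcibios_init(void) { fixup_saw_pdn = pdn_ready; }

/* Run all calls level by level, lowest level first (levels 0..7). */
static void run_initcalls(const struct initcall *calls, size_t n)
{
	assert(calls != NULL || n == 0);
	for (int level = 0; level <= 7; level++)
		for (size_t i = 0; i < n; i++)
			if ((int)calls[i].level == level)
				calls[i].fn();
}
```

Anything registered at a level between the two would run after the pdn
is created but before pcibios_init(); calls at the same level are not
ordered by this model beyond their position in the table.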





--
Alexey

Re: [PATCH 1/2] cpufreq: qoriq: Fix cooling device registration issue during suspend

2016-04-19 Thread Hongtao Jia


> -----Original Message-----
> From: Viresh Kumar [mailto:viresh.ku...@linaro.org]
> Sent: Monday, April 18, 2016 6:33 PM
> To: Hongtao Jia 
> Cc: linux...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Scott Wood
> ; Yuantian Tang 
> Subject: Re: [PATCH 1/2] cpufreq: qoriq: Fix cooling device registration issue
> during suspend
> 
> On 18-04-16, 15:59, Jia Hongtao wrote:
> > The cooling device is registered by the ready callback, which is also
> > invoked while the system resumes from sleep (enabling non-boot CPUs).
> > Thus the cooling device may be registered multiple times. The stop_cpu
> > callback is invoked during suspend (disabling non-boot CPUs), so a
> > matching unregistration is added there to fix this issue.
> >
> > Signed-off-by: Jia Hongtao 
> > ---
> >  drivers/cpufreq/qoriq-cpufreq.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/cpufreq/qoriq-cpufreq.c
> > b/drivers/cpufreq/qoriq-cpufreq.c index b23e525..1c2fdc1 100644
> > --- a/drivers/cpufreq/qoriq-cpufreq.c
> > +++ b/drivers/cpufreq/qoriq-cpufreq.c
> > @@ -305,6 +305,7 @@ static int __exit qoriq_cpufreq_cpu_exit(struct
> > cpufreq_policy *policy)  {
> > struct cpu_data *data = policy->driver_data;
> >
> > +   cpufreq_cooling_unregister(data->cdev);
> > kfree(data->pclk);
> > kfree(data->table);
> > kfree(data);
> > @@ -323,6 +324,12 @@ static int qoriq_cpufreq_target(struct cpufreq_policy
> *policy,
> > return clk_set_parent(policy->clk, parent);  }
> >
> > +static void qoriq_cpufreq_stop_cpu(struct cpufreq_policy *policy) {
> > +   struct cpu_data *cpud = policy->driver_data;
> > +
> > +   cpufreq_cooling_unregister(cpud->cdev);
> > +}
> >
> >  static void qoriq_cpufreq_ready(struct cpufreq_policy *policy)  { @@
> > -352,6 +359,7 @@ static struct cpufreq_driver qoriq_cpufreq_driver = {
> > .verify = cpufreq_generic_frequency_table_verify,
> > .target_index   = qoriq_cpufreq_target,
> > .get= cpufreq_generic_get,
> > +   .stop_cpu   = qoriq_cpufreq_stop_cpu,
> > .ready  = qoriq_cpufreq_ready,
> > .attr   = cpufreq_generic_attr,
> >  };
> 
> You don't need to do it from stop_cpu(), please use
> qoriq_cpufreq_cpu_exit() for this.

Thanks. The new patch will be submitted soon.

-Hongtao.

> 
> --
> viresh

[PATCH V2] cpufreq: qoriq: Fix cooling device registration issue during suspend

2016-04-19 Thread Jia Hongtao
The cooling device is registered by the ready callback, which is also
invoked while the system resumes from sleep (enabling non-boot CPUs). Thus
the cooling device may be registered multiple times. A matching
unregistration is added to the exit callback to fix this issue.

Signed-off-by: Jia Hongtao 
---
Changes for V2:
* Using qoriq_cpufreq_cpu_exit() callback instead of adding stop_cpu().

 drivers/cpufreq/qoriq-cpufreq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/qoriq-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c
index b23e525..0b85f90 100644
--- a/drivers/cpufreq/qoriq-cpufreq.c
+++ b/drivers/cpufreq/qoriq-cpufreq.c
@@ -301,10 +301,11 @@ err_np:
return -ENODEV;
 }
 
-static int __exit qoriq_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+static int qoriq_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 {
struct cpu_data *data = policy->driver_data;
 
+   cpufreq_cooling_unregister(data->cdev);
kfree(data->pclk);
kfree(data->table);
kfree(data);
@@ -348,7 +349,7 @@ static struct cpufreq_driver qoriq_cpufreq_driver = {
.name   = "qoriq_cpufreq",
.flags  = CPUFREQ_CONST_LOOPS,
.init   = qoriq_cpufreq_cpu_init,
-   .exit   = __exit_p(qoriq_cpufreq_cpu_exit),
+   .exit   = qoriq_cpufreq_cpu_exit,
.verify = cpufreq_generic_frequency_table_verify,
.target_index   = qoriq_cpufreq_target,
.get= cpufreq_generic_get,
-- 
2.1.0.27.g96db324


[PATCH V2] powerpc: Implement {cmp}xchg for u8 and u16

2016-04-19 Thread Pan Xinhui
From: Pan Xinhui 

Implement xchg{u8,u16}{local,relaxed}, and
cmpxchg{u8,u16}{,local,acquire,relaxed}.

It works on all ppc.

Suggested-by: Peter Zijlstra (Intel) 
Signed-off-by: Pan Xinhui 
---
change from V1:
rework totally.
---
 arch/powerpc/include/asm/cmpxchg.h | 83 ++
 1 file changed, 83 insertions(+)

diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index 44efe73..79a1f45 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -7,6 +7,37 @@
 #include 
 #include 
 
+#ifdef __BIG_ENDIAN
+#define BITOFF_CAL(size, off)  ((sizeof(u32) - size - off) * BITS_PER_BYTE)
+#else
+#define BITOFF_CAL(size, off)  (off * BITS_PER_BYTE)
+#endif
+
+static __always_inline unsigned long
+__cmpxchg_u32_local(volatile unsigned int *p, unsigned long old,
+   unsigned long new);
+
+#define __XCHG_GEN(cmp, type, sfx, u32sfx, skip, v)\
+static __always_inline u32 \
+__##cmp##xchg_##type##sfx(v void *ptr, u32 old, u32 new)   \
+{  \
+   int size = sizeof (type);   \
+   int off = (unsigned long)ptr % sizeof(u32); \
+   volatile u32 *p = ptr - off;\
+   int bitoff = BITOFF_CAL(size, off); \
+   u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;\
+   u32 oldv, newv; \
+   u32 ret;\
+   do {\
+   oldv = READ_ONCE(*p);   \
+   ret = (oldv & bitmask) >> bitoff;   \
+   if (skip && ret != old) \
+   break;  \
+   newv = (oldv & ~bitmask) | (new << bitoff); \
+   } while (__cmpxchg_u32##u32sfx((v void*)p, oldv, newv) != oldv);\
+   return ret; \
+}
+
 /*
  * Atomic exchange
  *
@@ -14,6 +45,19 @@
  * the previous value stored there.
  */
 
+#define XCHG_GEN(type, sfx, v) \
+   __XCHG_GEN(_, type, sfx, _local, 0, v)  \
+static __always_inline u32 __xchg_##type##sfx(v void *p, u32 n)\
+{  \
+   return ___xchg_##type##sfx(p, 0, n);\
+}
+
+XCHG_GEN(u8, _local, volatile);
+XCHG_GEN(u8, _relaxed, );
+XCHG_GEN(u16, _local, volatile);
+XCHG_GEN(u16, _relaxed, );
+#undef XCHG_GEN
+
 static __always_inline unsigned long
 __xchg_u32_local(volatile void *p, unsigned long val)
 {
@@ -88,6 +132,10 @@ static __always_inline unsigned long
 __xchg_local(volatile void *ptr, unsigned long x, unsigned int size)
 {
switch (size) {
+   case 1:
+   return __xchg_u8_local(ptr, x);
+   case 2:
+   return __xchg_u16_local(ptr, x);
case 4:
return __xchg_u32_local(ptr, x);
 #ifdef CONFIG_PPC64
@@ -103,6 +151,10 @@ static __always_inline unsigned long
 __xchg_relaxed(void *ptr, unsigned long x, unsigned int size)
 {
switch (size) {
+   case 1:
+   return __xchg_u8_relaxed(ptr, x);
+   case 2:
+   return __xchg_u16_relaxed(ptr, x);
case 4:
return __xchg_u32_relaxed(ptr, x);
 #ifdef CONFIG_PPC64
@@ -226,6 +278,21 @@ __cmpxchg_u32_acquire(u32 *p, unsigned long old, unsigned long new)
return prev;
 }
 
+
+#define CMPXCHG_GEN(type, sfx, v)  \
+   __XCHG_GEN(cmp, type, sfx, sfx, 1, v)
+
+CMPXCHG_GEN(u8, , volatile);
+CMPXCHG_GEN(u8, _local, volatile);
+CMPXCHG_GEN(u8, _relaxed, );
+CMPXCHG_GEN(u8, _acquire, );
+CMPXCHG_GEN(u16, , volatile);
+CMPXCHG_GEN(u16, _local, volatile);
+CMPXCHG_GEN(u16, _relaxed, );
+CMPXCHG_GEN(u16, _acquire, );
+#undef CMPXCHG_GEN
+#undef __XCHG_GEN
+
 #ifdef CONFIG_PPC64
 static __always_inline unsigned long
 __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
@@ -316,6 +383,10 @@ __cmpxchg(volatile void *ptr, unsigned long old, unsigned long new,
  unsigned int size)
 {
switch (size) {
+   case 1:
+   return __cmpxchg_u8(ptr, old, new);
+   case 2:
+   return __cmpxchg_u16(ptr, old, new);
case 4:
return __cmpxchg_u32(ptr, old, new);
 #ifdef CONFIG_PPC64
@@ -332,6 +403,10 @@ __cmpxchg_local(volatile void *ptr, unsigned long old, 
unsigned long new,
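
The generic helper in this patch builds the 8- and 16-bit operations on
top of the 32-bit cmpxchg: it reads the aligned word containing the
target, substitutes the new value under a mask, and retries until the
word-sized cmpxchg succeeds. A user-space sketch of the same idea with
C11 atomics follows. It is an illustration under assumptions, not the
kernel implementation: byte_shift(), load_u8() and cmpxchg_u8() are
hypothetical names, the caller passes the aligned word plus a byte
offset instead of a byte pointer, and the default sequentially
consistent ordering is used rather than the patch's _local, _relaxed
and _acquire variants.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Bit position of the byte at offset `off` within its aligned 32-bit word. */
static unsigned byte_shift(unsigned off)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (unsigned)(sizeof(uint32_t) - 1 - off) * 8;
#else
	return off * 8;
#endif
}

/* Read one byte out of the word. */
static uint8_t load_u8(_Atomic uint32_t *word, unsigned off)
{
	assert(off < sizeof(uint32_t));
	return (uint8_t)(atomic_load(word) >> byte_shift(off));
}

/*
 * Compare-and-exchange one byte using a 32-bit compare-and-exchange.
 * Returns the byte value observed; the store happened iff it equals `old`.
 */
static uint8_t cmpxchg_u8(_Atomic uint32_t *word, unsigned off,
			  uint8_t old, uint8_t new_val)
{
	unsigned shift;
	uint32_t mask, oldv, newv;
	uint8_t cur;

	assert(off < sizeof(uint32_t));
	shift = byte_shift(off);
	mask = (uint32_t)0xff << shift;

	oldv = atomic_load(word);
	do {
		cur = (uint8_t)((oldv & mask) >> shift);
		if (cur != old)
			break;	/* byte mismatch: fail without storing */
		newv = (oldv & ~mask) | ((uint32_t)new_val << shift);
		/* On failure oldv is refreshed with the current word; retry. */
	} while (!atomic_compare_exchange_weak(word, &oldv, newv));

	return cur;
}
```

The retry loop matters because a neighbouring byte in the same word may
change between the load and the word-sized compare-and-exchange; in that
case the exchange fails even though the target byte still matches, and
the operation is attempted again with the fresh word.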
  

Re: [PATCH] cxl: static-ify variables to fix sparse warnings

2016-04-19 Thread Ian Munsie
Acked-by: Ian Munsie 
