Re: [PATCH 1/6] dump_stack: Support adding to the dump stack arch description
On Tue, 2015-05-05 at 14:16 -0700, Andrew Morton wrote:
> On Tue, 5 May 2015 21:12:12 +1000 Michael Ellerman <m...@ellerman.id.au> wrote:
>
> > Arch code can set a dump stack arch description string which is
> > displayed with oops output to describe the hardware platform.
> >
> > +	len = strnlen(dump_stack_arch_desc_str, sizeof(dump_stack_arch_desc_str));
> > +	pos = len;
> > +
> > +	if (len)
> > +		pos++;
> > +
> > +	if (pos >= sizeof(dump_stack_arch_desc_str))
> > +		return; /* Ran out of space */
> > +
> > +	p = &dump_stack_arch_desc_str[pos];
> > +
> > +	va_start(args, fmt);
> > +	vsnprintf(p, sizeof(dump_stack_arch_desc_str) - pos, fmt, args);
> > +	va_end(args);
>
> This code is almost race-free. A (documented) smp_wmb() in here would
> make that 100%?
>
> > +	if (len)
> > +		dump_stack_arch_desc_str[len] = ' ';
> > +}

On second thoughts I don't think it would. It would order the stores in
vsnprintf() vs the store of the space, the idea being that you never see a
partially printed string.

But for that to actually work you need a barrier on the read side, and where
do you put it? The CPU printing the buffer could speculate the load of the
tail of the buffer, seeing something half printed from vsnprintf(), and then
load the head of the buffer and see the space, unless you order those loads.

So I don't think we can prevent a crashing CPU from seeing a semi-printed
buffer without a lock, and we don't want to add a lock.

The other issue would be that a reader could miss the trailing NUL from
vsnprintf() but see the space, meaning it would wander off the end of the
buffer. But the buffer is in BSS to start with, and we're careful not to
print off the end of it, so it should always be NUL terminated.

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
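The append logic under discussion can be modelled in userspace. This is a
sketch, not the kernel code: `desc_str` and `desc_append()` are stand-ins for
`dump_stack_arch_desc_str` and the patch's append helper, with the bounds
check and deliberate store ordering (space written last) reproduced.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Userspace model of the append helper discussed above.  The real
 * kernel buffer lives in BSS (so it starts zeroed, hence always NUL
 * terminated); a static array models that. */
static char desc_str[128];

static void desc_append(const char *fmt, ...)
{
	va_list args;
	size_t len, pos;

	len = strnlen(desc_str, sizeof(desc_str));
	pos = len;

	if (len)		/* leave room for a separating space */
		pos++;

	if (pos >= sizeof(desc_str))
		return;		/* ran out of space */

	va_start(args, fmt);
	vsnprintf(&desc_str[pos], sizeof(desc_str) - pos, fmt, args);
	va_end(args);

	/* Written last so a reader ideally never sees the space without
	 * a NUL-terminated tail -- though, as the thread notes, without
	 * read-side ordering this is not an actual guarantee. */
	if (len)
		desc_str[len] = ' ';
}
```

Single-threaded, the helper behaves as intended; the race discussed above only
matters for a concurrent reader on another CPU.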
Re: [PATCH v3 1/2] perf/kvm: Port perf kvm to powerpc
On 05/08/2015 09:58 AM, Ingo Molnar wrote:
> * Hemant Kumar <hem...@linux.vnet.ibm.com> wrote:
>
> > # perf kvm stat report -p 60515
> >
> > Analyze events for pid(s) 60515, all VCPUs:
> >
> > VM-EXIT         Samples  Samples%  Time%   Min Time  Max Time      Avg time
> > H_DATA_STORAGE  5006     35.30%    0.13%   1.94us    49.46us       12.37us ( +- 0.52% )
> > HV_DECREMENTER  4457     31.43%    0.02%   0.72us    16.14us       1.91us ( +- 0.96% )
> > SYSCALL         2690     18.97%    0.10%   2.84us    528.24us      18.29us ( +- 3.75% )
> > RETURN_TO_HOST  1789     12.61%    99.76%  1.58us    672791.91us   27470.23us ( +- 3.00% )
> > EXTERNAL 240 1.69% 0.00% 0.69us 10.67us 1.33us ( +- 5.34% )
>
> Where is the last line misaligned? Copy paste error or does perf kvm
> produce it in such a way?

It's a copy-paste error. Thanks for pointing this out. Shall I resend the
patches with the correct alignment of the o/p?

> Thanks,
>	Ingo

--
Thanks,
Hemant Kumar
Re: [PATCH v3 1/2] perf/kvm: Port perf kvm to powerpc
* Hemant Kumar <hem...@linux.vnet.ibm.com> wrote:

> On 05/08/2015 09:58 AM, Ingo Molnar wrote:
> > * Hemant Kumar <hem...@linux.vnet.ibm.com> wrote:
> >
> > > # perf kvm stat report -p 60515
> > >
> > > Analyze events for pid(s) 60515, all VCPUs:
> > >
> > > VM-EXIT         Samples  Samples%  Time%   Min Time  Max Time      Avg time
> > > H_DATA_STORAGE  5006     35.30%    0.13%   1.94us    49.46us       12.37us ( +- 0.52% )
> > > HV_DECREMENTER  4457     31.43%    0.02%   0.72us    16.14us       1.91us ( +- 0.96% )
> > > SYSCALL         2690     18.97%    0.10%   2.84us    528.24us      18.29us ( +- 3.75% )
> > > RETURN_TO_HOST  1789     12.61%    99.76%  1.58us    672791.91us   27470.23us ( +- 3.00% )
> > > EXTERNAL 240 1.69% 0.00% 0.69us 10.67us 1.33us ( +- 5.34% )
> >
> > Where is the last line misaligned? Copy paste error or does perf kvm
> > produce it in such a way?
>
> Its a copy-paste error. Thanks for pointing this out. Shall I resend
> the patches with the correct alignment of the o/p?

I don't think that's necessary, as long as the code is fine.

Thanks,
	Ingo
[PATCH Part3 v11 8/9] PCI: Remove platform specific pci_domain_nr()
Now pci_host_bridge holds the domain number, so we could eliminate all platform specific pci_domain_nr(). Signed-off-by: Yijing Wang wangyij...@huawei.com --- arch/alpha/include/asm/pci.h |2 -- arch/ia64/include/asm/pci.h |1 - arch/microblaze/pci/pci-common.c | 11 --- arch/mips/include/asm/pci.h |2 -- arch/powerpc/kernel/pci-common.c | 11 --- arch/s390/pci/pci.c |6 -- arch/sh/include/asm/pci.h|2 -- arch/sparc/kernel/pci.c | 17 - arch/tile/include/asm/pci.h |2 -- arch/x86/include/asm/pci.h |6 -- drivers/pci/pci.c|8 include/linux/pci.h |7 +-- 12 files changed, 9 insertions(+), 66 deletions(-) diff --git a/arch/alpha/include/asm/pci.h b/arch/alpha/include/asm/pci.h index f7f680f..63a9a1e 100644 --- a/arch/alpha/include/asm/pci.h +++ b/arch/alpha/include/asm/pci.h @@ -95,8 +95,6 @@ static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel) return channel ? 15 : 14; } -#define pci_domain_nr(bus) ((struct pci_controller *)(bus)-sysdata)-index - static inline int pci_proc_domain(struct pci_bus *bus) { struct pci_controller *hose = bus-sysdata; diff --git a/arch/ia64/include/asm/pci.h b/arch/ia64/include/asm/pci.h index 52af5ed..1dcea49 100644 --- a/arch/ia64/include/asm/pci.h +++ b/arch/ia64/include/asm/pci.h @@ -99,7 +99,6 @@ struct pci_controller { #define PCI_CONTROLLER(busdev) ((struct pci_controller *) busdev-sysdata) -#define pci_domain_nr(busdev)(PCI_CONTROLLER(busdev)-segment) extern struct pci_ops pci_root_ops; diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c index d232c8a..6f64908 100644 --- a/arch/microblaze/pci/pci-common.c +++ b/arch/microblaze/pci/pci-common.c @@ -123,17 +123,6 @@ unsigned long pci_address_to_pio(phys_addr_t address) } EXPORT_SYMBOL_GPL(pci_address_to_pio); -/* - * Return the domain number for this bus. 
- */ -int pci_domain_nr(struct pci_bus *bus) -{ - struct pci_controller *hose = pci_bus_to_host(bus); - - return hose-global_number; -} -EXPORT_SYMBOL(pci_domain_nr); - /* This routine is meant to be used early during boot, when the * PCI bus numbers have not yet been assigned, and you need to * issue PCI config cycles to an OF device. diff --git a/arch/mips/include/asm/pci.h b/arch/mips/include/asm/pci.h index d969299..f5e96d4 100644 --- a/arch/mips/include/asm/pci.h +++ b/arch/mips/include/asm/pci.h @@ -124,8 +124,6 @@ static inline void pci_dma_burst_advice(struct pci_dev *pdev, #endif #ifdef CONFIG_PCI_DOMAINS -#define pci_domain_nr(bus) ((struct pci_controller *)(bus)-sysdata)-index - static inline int pci_proc_domain(struct pci_bus *bus) { struct pci_controller *hose = bus-sysdata; diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 5754367..b787d89 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -195,17 +195,6 @@ unsigned long pci_address_to_pio(phys_addr_t address) } EXPORT_SYMBOL_GPL(pci_address_to_pio); -/* - * Return the domain number for this bus. - */ -int pci_domain_nr(struct pci_bus *bus) -{ - struct pci_controller *hose = pci_bus_to_host(bus); - - return hose-global_number; -} -EXPORT_SYMBOL(pci_domain_nr); - /* This routine is meant to be used early during boot, when the * PCI bus numbers have not yet been assigned, and you need to * issue PCI config cycles to an OF device. diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index b9ac2f5..86acba4 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -101,12 +101,6 @@ static struct zpci_dev *get_zdev_by_bus(struct pci_bus *bus) return (bus bus-sysdata) ? 
(struct zpci_dev *) bus-sysdata : NULL; } -int pci_domain_nr(struct pci_bus *bus) -{ - return ((struct zpci_dev *) bus-sysdata)-domain; -} -EXPORT_SYMBOL_GPL(pci_domain_nr); - int pci_proc_domain(struct pci_bus *bus) { return pci_domain_nr(bus); diff --git a/arch/sh/include/asm/pci.h b/arch/sh/include/asm/pci.h index 5b45115..4dc3ad6 100644 --- a/arch/sh/include/asm/pci.h +++ b/arch/sh/include/asm/pci.h @@ -109,8 +109,6 @@ static inline void pci_dma_burst_advice(struct pci_dev *pdev, /* Board-specific fixup routines. */ int pcibios_map_platform_irq(const struct pci_dev *dev, u8 slot, u8 pin); -#define pci_domain_nr(bus) ((struct pci_channel *)(bus)-sysdata)-index - static inline int pci_proc_domain(struct pci_bus *bus) { struct pci_channel *hose = bus-sysdata; diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c index dc74202..b38eba5 100644 --- a/arch/sparc/kernel/pci.c +++ b/arch/sparc/kernel/pci.c @@ -886,23 +886,6 @@ int pcibus_to_node(struct pci_bus *pbus) EXPORT_SYMBOL(pcibus_to_node); #endif -/* Return the domain number for this pci bus */ - -int pci_domain_nr(struct pci_bus *pbus) -{ -
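With the domain number stored in `pci_host_bridge`, the per-arch
`pci_domain_nr()` implementations removed above can collapse into one generic
lookup. A minimal userspace model of that lookup (structures are simplified
stand-ins, not the kernel's actual types):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; the real ones carry
 * far more state. */
struct pci_host_bridge {
	int domain;
};

struct pci_bus {
	struct pci_bus *parent;		/* NULL for a root bus */
	struct pci_host_bridge *bridge;	/* set only on the root bus */
};

/* Walk up to the root bus, then read the domain from its host bridge:
 * the generic shape that replaces each arch's pci_domain_nr(). */
static int pci_domain_nr(struct pci_bus *bus)
{
	while (bus->parent)
		bus = bus->parent;
	return bus->bridge->domain;
}
```

Any bus in the hierarchy resolves to its root's host bridge, so sysdata-based
per-arch variants become unnecessary.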
[PATCH Part3 v11 5/9] powerpc/PCI: Rename pcibios_root_bridge_prepare() to pcibios_root_bus_prepare()
pcibios_root_bridge_prepare() on powerpc sets the root bus speed; it is not
preparation of the PCI host bridge. For better separation of host bridge and
root bus creation, rename it to another weak function.

Signed-off-by: Yijing Wang <wangyij...@huawei.com>
---
 arch/powerpc/include/asm/machdep.h       |    2 +-
 arch/powerpc/kernel/pci-common.c         |    6 +++---
 arch/powerpc/platforms/pseries/pci.c     |    2 +-
 arch/powerpc/platforms/pseries/pseries.h |    2 +-
 arch/powerpc/platforms/pseries/setup.c   |    2 +-
 drivers/pci/probe.c                      |    9 +++++++++
 6 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index ef88994..f236660 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -125,7 +125,7 @@ struct machdep_calls {
 	/* Called after allocating resources */
 	void		(*pcibios_fixup)(void);
 	void		(*pci_irq_fixup)(struct pci_dev *dev);
-	int		(*pcibios_root_bridge_prepare)(struct pci_host_bridge
+	int		(*pcibios_root_bus_prepare)(struct pci_host_bridge
 				*bridge);

 	/* To setup PHBs when using automatic OF platform driver for PCI */
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index e9506d5..5754367 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -781,10 +781,10 @@ int pci_proc_domain(struct pci_bus *bus)
 	return 1;
 }

-int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
+int pcibios_root_bus_prepare(struct pci_host_bridge *bridge)
 {
-	if (ppc_md.pcibios_root_bridge_prepare)
-		return ppc_md.pcibios_root_bridge_prepare(bridge);
+	if (ppc_md.pcibios_root_bus_prepare)
+		return ppc_md.pcibios_root_bus_prepare(bridge);

 	return 0;
 }
diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
index fe16a50..885f9ff 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -110,7 +110,7 @@ static void fixup_winbond_82c105(struct pci_dev* dev)
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_WINBOND, PCI_DEVICE_ID_WINBOND_82C105,
 			 fixup_winbond_82c105);

-int pseries_root_bridge_prepare(struct pci_host_bridge *bridge)
+int pseries_root_bus_prepare(struct pci_host_bridge *bridge)
 {
 	struct device_node *dn, *pdn;
 	struct pci_bus *bus;
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 8411c27..41310dc 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -75,7 +75,7 @@ static inline int dlpar_memory(struct pseries_hp_errorlog *hp_elog)

 /* PCI root bridge prepare function override for pseries */
 struct pci_host_bridge;
-int pseries_root_bridge_prepare(struct pci_host_bridge *bridge);
+int pseries_root_bus_prepare(struct pci_host_bridge *bridge);

 extern struct pci_controller_ops pseries_pci_controller_ops;
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index df6a704..2815309 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -537,7 +537,7 @@ static void __init pSeries_setup_arch(void)
 		ppc_md.enable_pmcs = power4_enable_pmcs;
 	}

-	ppc_md.pcibios_root_bridge_prepare = pseries_root_bridge_prepare;
+	ppc_md.pcibios_root_bus_prepare = pseries_root_bus_prepare;

 	if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
 		long rc;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 9f9445e..f5f5de6 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1883,6 +1883,11 @@ int __weak pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
 	return 0;
 }

+int __weak pcibios_root_bus_prepare(struct pci_host_bridge *bridge)
+{
+	return 0;
+}
+
 void __weak pcibios_add_bus(struct pci_bus *bus)
 {
 }
@@ -1948,6 +1953,10 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int domain,
 	b->dev.class = pcibus_class;
 	b->dev.parent = b->bridge;
 	dev_set_name(&b->dev, "%04x:%02x", pci_domain_nr(b), bus);
+	error = pcibios_root_bus_prepare(bridge);
+	if (error)
+		goto class_dev_reg_err;
+
 	error = device_register(&b->dev);
 	if (error)
 		goto class_dev_reg_err;
--
1.7.1
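The patch hinges on the `__weak` default-plus-arch-override pattern. A
standalone sketch of that linkage behaviour (the struct is opaque here and the
call is a model, not the kernel's): the generic file provides a weak no-op,
and a non-weak definition in arch code, when present, silently replaces it at
link time.

```c
#include <assert.h>

struct pci_host_bridge;		/* opaque for this sketch */

/* Generic weak default.  If another translation unit (arch code, as
 * powerpc does above) defines a non-weak pcibios_root_bus_prepare(),
 * the linker picks that one instead; with no override, callers get
 * this no-op. */
int __attribute__((weak)) pcibios_root_bus_prepare(struct pci_host_bridge *bridge)
{
	(void)bridge;
	return 0;
}
```

This is why core code can call the hook unconditionally: every architecture
gets either its own implementation or the harmless default.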
[PATCH Part3 v11 2/9] PCI: Move pci_bus_assign_domain_nr() declaration into drivers/pci/pci.h
pci_bus_assign_domain_nr() is only called in probe.c, so move its declaration
into drivers/pci/pci.h.

Signed-off-by: Yijing Wang <wangyij...@huawei.com>
---
 drivers/pci/pci.h   |    9 +++++++++
 include/linux/pci.h |    6 ------
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9bd762c..bc3e79a 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -325,4 +325,13 @@ static inline int pci_dev_specific_reset(struct pci_dev *dev, int probe)

 struct pci_host_bridge *pci_find_host_bridge(struct pci_bus *bus);

+#ifdef CONFIG_PCI_DOMAINS_GENERIC
+void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent);
+#else
+static inline void pci_bus_assign_domain_nr(struct pci_bus *bus,
+		struct device *parent)
+{
+}
+#endif
+
 #endif /* DRIVERS_PCI_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 720fdbb..5ff35cb 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1332,12 +1332,6 @@ static inline int pci_domain_nr(struct pci_bus *bus)
 {
 	return bus->domain_nr;
 }
-void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent);
-#else
-static inline void pci_bus_assign_domain_nr(struct pci_bus *bus,
-		struct device *parent)
-{
-}
 #endif

 /* some architectures require additional setup to direct VGA traffic */
--
1.7.1
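The moved declaration keeps the kernel's usual header idiom: a real prototype
when the feature is configured, a `static inline` no-op otherwise, so callers
never need an `#ifdef`. A standalone illustration (`FEATURE_X` is a made-up
stand-in for `CONFIG_PCI_DOMAINS_GENERIC`):

```c
#include <assert.h>

/* FEATURE_X is a hypothetical config symbol used only for this sketch. */
#ifdef FEATURE_X
int feature_get_value(void);		/* real implementation lives elsewhere */
#else
static inline int feature_get_value(void)
{
	return 0;			/* harmless default when not configured */
}
#endif
```

Compiled without `FEATURE_X`, callers transparently get the stub and the
compiler can discard the call entirely.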
[PATCH V3] cpuidle: Handle tick_broadcast_enter() failure gracefully
When a CPU has to enter an idle state where tick stops, it makes a call to tick_broadcast_enter(). The call will fail if this CPU is the broadcast CPU. Today, under such a circumstance, the arch cpuidle code handles this CPU. This is not convincing because not only do we not know what the arch cpuidle code does, but we also do not account for the idle state residency time and usage of such a CPU. This scenario can be handled better by simply choosing an idle state where in ticks do not stop. To accommodate this change move the setting of runqueue idle state from the core to the cpuidle driver, else the rq-idle_state will be set wrong. Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com --- Changes from V2: https://lkml.org/lkml/2015/5/7/78 Introduce a function in cpuidle core to select an idle state where ticks do not stop rather than going through the governors. Changes from V1: https://lkml.org/lkml/2015/5/7/24 Rebased on the latest linux-pm/bleeding-edge branch drivers/cpuidle/cpuidle.c | 45 +++-- include/linux/sched.h | 16 kernel/sched/core.c | 17 + kernel/sched/fair.c |2 +- kernel/sched/idle.c |6 -- kernel/sched/sched.h | 24 6 files changed, 77 insertions(+), 33 deletions(-) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 8c24f95..d1af760 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -21,6 +21,7 @@ #include linux/module.h #include linux/suspend.h #include linux/tick.h +#include linux/sched.h #include trace/events/power.h #include cpuidle.h @@ -146,6 +147,36 @@ int cpuidle_enter_freeze(struct cpuidle_driver *drv, struct cpuidle_device *dev) return index; } +/* + * find_tick_valid_state - select a state where tick does not stop + * @dev: cpuidle device for this cpu + * @drv: cpuidle driver for this cpu + */ +static int find_tick_valid_state(struct cpuidle_device *dev, + struct cpuidle_driver *drv) +{ + int i, ret = -1; + + for (i = CPUIDLE_DRIVER_STATE_START; i drv-state_count; i++) { + struct 
cpuidle_state *s = drv-states[i]; + struct cpuidle_state_usage *su = dev-states_usage[i]; + + /* +* We do not explicitly check for latency requirement +* since it is safe to assume that only shallower idle +* states will have the CPUIDLE_FLAG_TIMER_STOP bit +* cleared and they will invariably meet the latency +* requirement. +*/ + if (s-disabled || su-disable || + (s-flags CPUIDLE_FLAG_TIMER_STOP)) + continue; + + ret = i; + } + return ret; +} + /** * cpuidle_enter_state - enter the state and update stats * @dev: cpuidle device for this cpu @@ -168,10 +199,17 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, * CPU as a broadcast timer, this call may fail if it is not available. */ if (broadcast tick_broadcast_enter()) { - default_idle_call(); - return -EBUSY; + index = find_tick_valid_state(dev, drv); + if (index 0) { + default_idle_call(); + return -EBUSY; + } + target_state = drv-states[index]; } + /* Take note of the planned idle state. */ + idle_set_state(smp_processor_id(), target_state); + trace_cpu_idle_rcuidle(index, dev-cpu); time_start = ktime_get(); @@ -180,6 +218,9 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, time_end = ktime_get(); trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev-cpu); + /* The cpu is no longer idle or about to enter idle. 
*/ + idle_set_state(smp_processor_id(), NULL); + if (broadcast) { if (WARN_ON_ONCE(!irqs_disabled())) local_irq_disable(); diff --git a/include/linux/sched.h b/include/linux/sched.h index 26a2e61..fef8359 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -45,6 +45,7 @@ struct sched_param { #include linux/rcupdate.h #include linux/rculist.h #include linux/rtmutex.h +#include linux/cpuidle.h #include linux/time.h #include linux/param.h @@ -893,6 +894,21 @@ enum cpu_idle_type { CPU_MAX_IDLE_TYPES }; +#ifdef CONFIG_CPU_IDLE +extern void idle_set_state(int cpu, struct cpuidle_state *idle_state); +extern struct cpuidle_state *idle_get_state(int cpu); +#else +static inline void idle_set_state(int cpu, + struct cpuidle_state *idle_state) +{ +} + +static inline struct cpuidle_state *idle_get_state(int cpu) +{ + return NULL; +} +#endif + /* * Increase resolution of
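Note that find_tick_valid_state() keeps overwriting `ret` as it scans, so it
returns the deepest enabled state whose timer keeps ticking. A userspace model
of that selection loop (the flag value and struct layout are simplified for
the sketch):

```c
#include <assert.h>

#define FLAG_TIMER_STOP	0x1	/* stand-in for CPUIDLE_FLAG_TIMER_STOP */

struct state {
	unsigned int flags;
	int disabled;
};

/* Mirror of find_tick_valid_state(): scan all states and remember the
 * last (i.e. deepest) usable one that does not stop the tick. */
static int find_tick_valid_state(const struct state *states, int count)
{
	int i, ret = -1;

	for (i = 0; i < count; i++) {
		if (states[i].disabled ||
		    (states[i].flags & FLAG_TIMER_STOP))
			continue;
		ret = i;	/* keep scanning: prefer the deepest match */
	}
	return ret;
}
```

If every state stops the tick, the function returns -1 and the caller falls
back to default_idle_call(), exactly as in the patch.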
[PATCH Part3 v11 4/9] PCI: Introduce pci_host_assign_domain_nr() to assign domain
Introduce pci_host_assign_domain_nr() to save the domain number in
pci_host_bridge.

Signed-off-by: Yijing Wang <wangyij...@huawei.com>
---
 drivers/pci/pci.c |   24 +++++++++++++++++++-----
 drivers/pci/pci.h |    1 +
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7bf27e8..46a0240 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4506,10 +4506,10 @@ static int pci_get_new_domain_nr(void)
 	return atomic_inc_return(&__domain_nr);
 }

-void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent)
+static int pci_assign_domain_nr(struct device *dev)
 {
 	static int use_dt_domains = -1;
-	int domain = of_get_pci_domain_nr(parent->of_node);
+	int domain = of_get_pci_domain_nr(dev->of_node);

 	/*
 	 * Check DT domain and use_dt_domains values.
@@ -4543,16 +4543,30 @@ void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent)
 		use_dt_domains = 0;
 		domain = pci_get_new_domain_nr();
 	} else {
-		dev_err(parent, "Node %s has inconsistent \"linux,pci-domain\" property in DT\n",
-			parent->of_node->full_name);
+		dev_err(dev, "Node %s has inconsistent \"linux,pci-domain\" property in DT\n",
+			dev->of_node->full_name);
 		domain = -1;
 	}

-	bus->domain_nr = domain;
+	return domain;
+}
+
+void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent)
+{
+	bus->domain_nr = pci_assign_domain_nr(parent);
 }
 #endif
 #endif

+void pci_host_assign_domain_nr(struct pci_host_bridge *host, int domain)
+{
+#ifdef CONFIG_PCI_DOMAINS_GENERIC
+	host->domain = pci_assign_domain_nr(host->dev.parent);
+#else
+	host->domain = domain;
+#endif
+}
+
 /**
  * pci_ext_cfg_avail - can we access extended PCI config space?
  *
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index bc3e79a..c2e1a6b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -334,4 +334,5 @@ static inline void pci_bus_assign_domain_nr(struct pci_bus *bus,
 }
 #endif

+void pci_host_assign_domain_nr(struct pci_host_bridge *host, int domain);
 #endif /* DRIVERS_PCI_H */
--
1.7.1
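The policy inside pci_assign_domain_nr() is all-or-nothing: either every host
bridge gets its domain from the device tree's "linux,pci-domain" property, or
none does; mixing the two is rejected. A userspace model of that tri-state
logic (state is held in a struct here, rather than a function-local static, so
independent scenarios can be exercised):

```c
#include <assert.h>

/* Model of the DT-vs-generated numbering policy discussed above. */
struct domain_alloc {
	int use_dt_domains;	/* -1: undecided, 1: DT, 0: generated */
	int next_domain;	/* counter standing in for pci_get_new_domain_nr() */
};

static int assign_domain_nr(struct domain_alloc *da, int dt_domain)
{
	if (dt_domain >= 0 && da->use_dt_domains) {
		da->use_dt_domains = 1;		/* commit to DT numbering */
		return dt_domain;
	} else if (dt_domain < 0 && da->use_dt_domains != 1) {
		da->use_dt_domains = 0;		/* commit to generated numbering */
		return da->next_domain++;
	}
	return -1;				/* inconsistent DT properties */
}
```

The initial -1 ("undecided") is truthy, which is why the first caller can
steer the system either way; every later caller must be consistent with it.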
Re: [PATCH V2] cpuidle: Handle tick_broadcast_enter() failure gracefully
On 05/08/2015 02:20 AM, Rafael J. Wysocki wrote: On Thursday, May 07, 2015 11:17:21 PM Preeti U Murthy wrote: When a CPU has to enter an idle state where tick stops, it makes a call to tick_broadcast_enter(). The call will fail if this CPU is the broadcast CPU. Today, under such a circumstance, the arch cpuidle code handles this CPU. This is not convincing because not only are we not aware what the arch cpuidle code does, but we also do not account for the idle state residency time and usage of such a CPU. This scenario can be handled better by simply asking the cpuidle governor to choose an idle state where in ticks do not stop. To accommodate this change move the setting of runqueue idle state from the core to the cpuidle driver, else the rq-idle_state will be set wrong. Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com --- Changes from V1: https://lkml.org/lkml/2015/5/7/24 Rebased on the latest linux-pm/bleeding-edge drivers/cpuidle/cpuidle.c | 21 + drivers/cpuidle/governors/ladder.c | 13 ++--- drivers/cpuidle/governors/menu.c |6 +- include/linux/cpuidle.h|6 +++--- include/linux/sched.h | 16 kernel/sched/core.c| 17 + kernel/sched/fair.c|2 +- kernel/sched/idle.c|8 +--- kernel/sched/sched.h | 24 9 files changed, 70 insertions(+), 43 deletions(-) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 8c24f95..b7e86f4 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -21,6 +21,7 @@ #include linux/module.h #include linux/suspend.h #include linux/tick.h +#include linux/sched.h #include trace/events/power.h #include cpuidle.h @@ -168,10 +169,17 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, * CPU as a broadcast timer, this call may fail if it is not available. */ if (broadcast tick_broadcast_enter()) { -default_idle_call(); -return -EBUSY; +index = cpuidle_select(drv, dev, !broadcast); No, you can't do that. 
> This code path may be used by suspend-to-idle and that should not call
> cpuidle_select().
>
> What's needed here seems to be a fallback mechanism like "choose the
> deepest state shallower than X and such that it won't stop the tick".
> You don't really need to run a full governor for that.

Agreed. Makes the patch a lot simpler as well. I have sent out V3 doing
this.

Thank you

Regards
Preeti U Murthy
Re: [PATCH v6 1/2] arm64: dts: Add the arasan sdhci nodes in apm-storm.dtsi.
On Wed, May 6, 2015 at 7:12 PM, Suman Tripathi stripa...@apm.com wrote: This patch adds the arasan sdhci nodes to reuse the of-arasan driver for APM X-Gene SoC. Signed-off-by: Suman Tripathi stripa...@apm.com --- arch/arm64/boot/dts/apm/apm-mustang.dts | 4 +++ arch/arm64/boot/dts/apm/apm-storm.dtsi | 43 + 2 files changed, 47 insertions(+) diff --git a/arch/arm64/boot/dts/apm/apm-mustang.dts b/arch/arm64/boot/dts/apm/apm-mustang.dts index 83578e7..7ccd517 100644 --- a/arch/arm64/boot/dts/apm/apm-mustang.dts +++ b/arch/arm64/boot/dts/apm/apm-mustang.dts @@ -52,3 +52,7 @@ xgenet { status = ok; }; + +sdhci0 { + status = ok; +}; diff --git a/arch/arm64/boot/dts/apm/apm-storm.dtsi b/arch/arm64/boot/dts/apm/apm-storm.dtsi index c8d3e0e..b5d2698 100644 --- a/arch/arm64/boot/dts/apm/apm-storm.dtsi +++ b/arch/arm64/boot/dts/apm/apm-storm.dtsi @@ -145,6 +145,40 @@ clock-output-names = socplldiv2; }; + ahbclk: ahbclk@1f2ac000 { + compatible = apm,xgene-device-clock; + #clock-cells = 1; + clocks = socplldiv2 0; + reg = 0x0 0x1f2ac000 0x0 0x1000 + 0x0 0x1700 0x0 0x2000; + reg-names = csr-reg, div-reg; + csr-offset = 0x0; + csr-mask = 0x1; + enable-offset = 0x8; + enable-mask = 0x1; + divider-offset = 0x164; + divider-width = 0x5; + divider-shift = 0x0; + clock-output-names = ahbclk; + }; + + sdioclk: sdioclk@1f2ac000 { + compatible = apm,xgene-device-clock; + #clock-cells = 1; + clocks = socplldiv2 0; + reg = 0x0 0x1f2ac000 0x0 0x1000 + 0x0 0x1700 0x0 0x2000; + reg-names = csr-reg, div-reg; + csr-offset = 0x0; + csr-mask = 0x2; + enable-offset = 0x8; + enable-mask = 0x2; + divider-offset = 0x178; + divider-width = 0x8; + divider-shift = 0x0; + clock-output-names = sdioclk; + }; + qmlclk: qmlclk { compatible = apm,xgene-device-clock; #clock-cells = 1; @@ -533,6 +567,15 @@ interrupts = 0x0 0x4f 0x4; }; + sdhci0: sdhci@1c00 { + compatible = arasan,sdhci-4.9a; + reg = 0x0 0x1c00 0x0 0x100; + interrupts = 0x0 0x49 0x4; + dma-coherent; + clock-names = clk_xin, clk_ahb; + clocks = 
sdioclk 0, ahbclk 0; + }; + phy1: phy@1f21a000 { compatible = apm,xgene-phy; reg = 0x0 0x1f21a000 0x0 0x100; -- 1.8.2.1 Any comments on this patch ?? -- Thanks, with regards, Suman Tripathi ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
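The clock nodes above describe dividers by register location: a
`divider-offset` into the CSR block, plus `divider-shift` and `divider-width`
selecting a bit field. A sketch of how such a field is typically evaluated to
a clock rate; the register value below is invented for illustration, and this
is not the xgene clock driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Extract a divider field of 'width' bits at 'shift' from a divider
 * register value, then divide the parent rate by it.  A zero divider is
 * treated as divide-by-one here (real drivers vary on this). */
static uint64_t divider_rate(uint64_t parent_rate, uint32_t reg_val,
			     unsigned int shift, unsigned int width)
{
	uint32_t div = (reg_val >> shift) & ((1u << width) - 1);

	return div ? parent_rate / div : parent_rate;
}
```

For the `sdioclk` node above the parameters would be shift 0 and width 8,
matching `divider-shift = 0x0` and `divider-width = 0x8`.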
[PATCH Part3 v11 7/9] PCI: Create pci host bridge prior to root bus
pci_host_bridge holds the domain number, so we need to assign the domain
number prior to root bus creation, because the root bus needs the domain
number to check whether it already exists.

Signed-off-by: Yijing Wang <wangyij...@huawei.com>
---
 drivers/pci/probe.c |   60 ++++++++++++++++++++++++++------------------------
 1 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 9ed8ab7..e4ef791 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -515,7 +515,7 @@ static void pci_release_host_bridge_dev(struct device *dev)
 	kfree(bridge);
 }

-static struct pci_host_bridge *pci_alloc_host_bridge(struct pci_bus *b)
+static struct pci_host_bridge *pci_alloc_host_bridge(void)
 {
 	struct pci_host_bridge *bridge;

@@ -524,7 +524,6 @@ static struct pci_host_bridge *pci_alloc_host_bridge(struct pci_bus *b)
 		return NULL;

 	INIT_LIST_HEAD(&bridge->windows);
-	bridge->bus = b;

 	return bridge;
 }
@@ -1902,48 +1901,51 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int domain,
 {
 	int error;
 	struct pci_host_bridge *bridge;
-	struct pci_bus *b, *b2;
+	struct pci_bus *b;
 	struct resource_entry *window, *n;
 	struct resource *res;
 	resource_size_t offset;
 	char bus_addr[64];
 	char *fmt;

-	b = pci_alloc_bus(NULL);
-	if (!b)
-		return NULL;
-
-	b->sysdata = sysdata;
-	b->ops = ops;
-	b->number = b->busn_res.start = bus;
-	pci_bus_assign_domain_nr(b, parent);
-	b2 = pci_find_bus(pci_domain_nr(b), bus);
-	if (b2) {
-		/* If we already got to this bus through a different bridge, ignore it */
-		dev_dbg(&b2->dev, "bus already known\n");
-		goto err_out;
-	}
-
-	bridge = pci_alloc_host_bridge(b);
+	bridge = pci_alloc_host_bridge();
 	if (!bridge)
-		goto err_out;
+		return NULL;

-	bridge->domain = domain;
 	bridge->dev.parent = parent;
+	pci_host_assign_domain_nr(bridge, domain);
 	bridge->dev.release = pci_release_host_bridge_dev;
 	dev_set_drvdata(&bridge->dev, sysdata);
-	dev_set_name(&bridge->dev, "pci%04x:%02x", pci_domain_nr(b), bus);
+	dev_set_name(&bridge->dev, "pci%04x:%02x", bridge->domain, bus);
 	error = pcibios_root_bridge_prepare(bridge);
 	if (error) {
 		kfree(bridge);
-		goto err_out;
+		return NULL;
 	}

 	error = device_register(&bridge->dev);
 	if (error) {
 		put_device(&bridge->dev);
-		goto err_out;
+		return NULL;
 	}
+
+	b = pci_find_bus(bridge->domain, bus);
+	if (b) {
+		/* If we already got to this bus through a different bridge, ignore it */
+		dev_dbg(&b->dev, "bus already known\n");
+		goto unregister_host;
+	}
+
+	b = pci_alloc_bus(NULL);
+	if (!b)
+		goto unregister_host;
+
+	bridge->bus = b;
+	b->sysdata = sysdata;
+	b->ops = ops;
+	b->number = b->busn_res.start = bus;
+	pci_bus_assign_domain_nr(b, parent);

 	b->bridge = get_device(&bridge->dev);
 	device_enable_async_suspend(b->bridge);
 	pci_set_bus_of_node(b);
@@ -1956,11 +1958,11 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int domain,
 	dev_set_name(&b->dev, "%04x:%02x", pci_domain_nr(b), bus);
 	error = pcibios_root_bus_prepare(bridge);
 	if (error)
-		goto class_dev_reg_err;
+		goto free_bus;

 	error = device_register(&b->dev);
 	if (error)
-		goto class_dev_reg_err;
+		goto free_bus;

 	pcibios_add_bus(b);

@@ -2000,11 +2002,11 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int domain,

 	return b;

-class_dev_reg_err:
+free_bus:
+	kfree(b);
 	put_device(&bridge->dev);
+unregister_host:
 	device_unregister(&bridge->dev);
-err_out:
-	kfree(b);
 	return NULL;
 }
 EXPORT_SYMBOL_GPL(pci_create_root_bus);
--
1.7.1
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
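The reordered error path above follows the kernel's goto-unwind convention:
each failure jumps to a label that releases exactly what was built so far, in
reverse order. A generic standalone sketch of that shape (plain allocations
stand in for the bridge and bus objects; names are invented):

```c
#include <assert.h>
#include <stdlib.h>

/* Build two resources in order; on failure of the second, unwind the
 * first via the labelled cleanup path, mirroring how
 * pci_create_root_bus() unwinds the host bridge when bus creation
 * fails.  'fail_second' lets the test force the error path. */
static int build_pair(int fail_second, void **a_out, void **b_out)
{
	void *a, *b;

	a = malloc(16);
	if (!a)
		goto err;

	b = fail_second ? NULL : malloc(16);
	if (!b)
		goto free_a;

	*a_out = a;
	*b_out = b;
	return 0;

free_a:
	free(a);
err:
	return -1;
}
```

Because each label only undoes the steps that preceded it, reordering the
construction (bridge before bus) just means reordering the labels, which is
exactly the class_dev_reg_err to free_bus/unregister_host change above.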
[PATCH Part3 v11 6/9] PCI: Make pci_host_bridge hold sysdata in drvdata
Now platform specific sysdata is saved in pci_bus, and
pcibios_root_bridge_prepare() needs to know the sysdata. Later, we will move
pcibios_root_bridge_prepare() prior to root bus creation, so we need to make
pci_host_bridge hold the sysdata.

Signed-off-by: Yijing Wang <wangyij...@huawei.com>
---
 arch/ia64/pci/pci.c |    2 +-
 arch/x86/pci/acpi.c |    2 +-
 drivers/pci/probe.c |    1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 33803f7..c82d666 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -478,7 +478,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)

 int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
 {
-	struct pci_controller *controller = bridge->bus->sysdata;
+	struct pci_controller *controller = dev_get_drvdata(&bridge->dev);

 	ACPI_COMPANION_SET(&bridge->dev, controller->companion);
 	return 0;
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 7563855..948b675 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -462,7 +462,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)

 int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
 {
-	struct pci_sysdata *sd = bridge->bus->sysdata;
+	struct pci_sysdata *sd = dev_get_drvdata(&bridge->dev);

 	ACPI_COMPANION_SET(&bridge->dev, sd->companion);
 	return 0;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index f5f5de6..9ed8ab7 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1931,6 +1931,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int domain,
 	bridge->domain = domain;
 	bridge->dev.parent = parent;
 	bridge->dev.release = pci_release_host_bridge_dev;
+	dev_set_drvdata(&bridge->dev, sysdata);
 	dev_set_name(&bridge->dev, "pci%04x:%02x", pci_domain_nr(b), bus);
 	error = pcibios_root_bridge_prepare(bridge);
 	if (error) {
--
1.7.1
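The patch leans on the driver-data mechanism: a device carries one opaque
pointer that the producer sets and the consumer reads back, with neither side's
concrete type visible at the device layer. A minimal model of that mechanism
(the real `struct device` carries much more, and the kernel accessors go
through `dev->driver_data` equivalently):

```c
#include <assert.h>

/* Stripped-down device with just the opaque driver-data slot. */
struct device {
	void *driver_data;
};

static void dev_set_drvdata(struct device *dev, void *data)
{
	dev->driver_data = data;
}

static void *dev_get_drvdata(const struct device *dev)
{
	return dev->driver_data;
}
```

This is why pci_create_root_bus() can stash arch-specific `sysdata` on the
bridge device and each arch's pcibios_root_bridge_prepare() can cast it back
to its own controller type.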
Re: [PATCH v2 1/2] powerpc/powernv: Add poweroff (EPOW, DPO) events support for PowerNV platform
Hi Vipin, These comments are in addition to what Joel has said in his review. On Thu, May 7, 2015 at 3:00 PM, Vipin K Parashar vi...@linux.vnet.ibm.com wrote: This patch adds support for FSP EPOW (Early Power Off Warning) and DPO (Delayed Power Off) events support for PowerNV platform. EPOW events are generated by SPCN/FSP due to various critical system conditions that need system shutdown. Few examples of these conditions are high ambient temperature or system running on UPS power with low UPS battery. DPO event is generated in response to admin initiated system shutdown request. This patch enables host kernel on PowerNV platform to handle OPAL notifications for these events and initiate system poweroff. Since EPOW notifications are sent in advance of impending shutdown event and thus this patch also adds functionality to wait for EPOW condition to return to normal. Host allows MAX_POWEROFF_SYS_TIME (600 seconds) as system poweroff time (time for host + guests shutdown) and waits for remaining time for EPOW condition to return to normal. If EPOW condition doesn't return to normal in calculated time it proceeds with graceful system shutdown. For EPOW events with smaller timeouts values than MAX_POWEROFF_SYS_TIME it proceeds with system shutdown without any wait for EPOW condition to return to normal. System admin can also add systemd service shutdown scripts to perform any specific actions like graceful guest shutdown upon system poweroff. libvirt-guests is systemd service available on recent distros for management of guests at system stat/shutdown time. 
Signed-off-by: Vipin K Parashar vi...@linux.vnet.ibm.com --- arch/powerpc/include/asm/opal-api.h| 30 ++ arch/powerpc/include/asm/opal.h| 3 +- arch/powerpc/platforms/powernv/opal-power.c| 379 +++-- arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + 4 files changed, 391 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h index 0321a90..03b3cef 100644 --- a/arch/powerpc/include/asm/opal-api.h +++ b/arch/powerpc/include/asm/opal-api.h @@ -730,6 +730,36 @@ struct opal_i2c_request { __be64 buffer_ra; /* Buffer real address */ }; +/* + * EPOW status sharing (OPAL and the host) + * + * The host will pass on OPAL, a buffer of length OPAL_EPOW_MAX_CLASSES + * to fetch system wide EPOW status. Each element in the returned buffer + * will contain bitwise EPOW status for each EPOW sub class. + */ + +/* EPOW types */ +enum OpalEpow { + OPAL_EPOW_POWER = 0,/* Power EPOW */ + OPAL_EPOW_TEMP = 1,/* Temperature EPOW */ + OPAL_EPOW_COOLING = 2,/* Cooling EPOW */ + OPAL_MAX_EPOW_CLASSES = 3,/* Max EPOW categories */ +}; Dont explicitly assign sequential numbers in an enum. Its taken care of by the compiler. 
+ +/* Power EPOW events */ +enum OpalEpowPower { + OPAL_EPOW_POWER_UPS = 0x1, /* System on UPS power */ + OPAL_EPOW_POWER_UPS_LOW = 0x2, /* System on UPS power with low battery*/ +}; + +/* Temperature EPOW events */ +enum OpalEpowTemp { + OPAL_EPOW_TEMP_HIGH_AMB = 0x1, /* High ambient temperature */ + OPAL_EPOW_TEMP_CRIT_AMB = 0x2, /* Critical ambient temperature */ + OPAL_EPOW_TEMP_HIGH_INT = 0x4, /* High internal temperature */ + OPAL_EPOW_TEMP_CRIT_INT = 0x8, /* Critical internal temperature */ +}; + #endif /* __ASSEMBLY__ */ #endif /* __OPAL_API_H */ diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 042af1a..0777864 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -141,7 +141,6 @@ int64_t opal_pci_fence_phb(uint64_t phb_id); int64_t opal_pci_reinit(uint64_t phb_id, uint64_t reinit_scope, uint64_t data); int64_t opal_pci_mask_pe_error(uint64_t phb_id, uint16_t pe_number, uint8_t error_type, uint8_t mask_action); int64_t opal_set_slot_led_status(uint64_t phb_id, uint64_t slot_id, uint8_t led_type, uint8_t led_action); -int64_t opal_get_epow_status(__be64 *status); int64_t opal_set_system_attention_led(uint8_t led_action); int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe, __be16 *pci_error_type, __be16 *severity); @@ -200,6 +199,8 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf, uint64_t size, uint64_t token); int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size, uint64_t token); +int32_t opal_get_epow_status(__be32 *status, __be32 *num_classes); +int32_t opal_get_dpo_status(__be32 *timeout); /* Internal functions */ extern int early_init_dt_scan_opal(unsigned long node, const char *uname, diff --git a/arch/powerpc/platforms/powernv/opal-power.c b/arch/powerpc/platforms/powernv/opal-power.c
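The review comment about the OpalEpow enum is that C already auto-numbers enumerators: each one without an explicit value is the previous value plus one, starting at zero, so a trailing MAX entry tracks the count by itself. A small illustrative enum (not the OPAL one):

```c
#include <assert.h>

/* Without explicit values, C numbers enumerators 0, 1, 2, ... and a
 * trailing MAX entry always equals the number of real entries. */
enum epow_class {
    EPOW_POWER,        /* 0 */
    EPOW_TEMP,         /* 1 */
    EPOW_COOLING,      /* 2 */
    EPOW_MAX_CLASSES   /* 3 -- stays correct if entries are added above */
};
```

Note this only applies to sequential indices like the class list; the bit-flag enums (0x1, 0x2, 0x4, ...) in the patch do need explicit values.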
[PATCH Part3 v11 3/9] PCI: Remove declaration for pci_get_new_domain_nr()
pci_get_new_domain_nr() is only used in drivers/pci/pci.c, so remove the declaration in include/linux/pci.h. Signed-off-by: Yijing Wang wangyij...@huawei.com --- drivers/pci/pci.c |4 ++-- include/linux/pci.h |3 --- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index acc4b6e..7bf27e8 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4498,14 +4498,14 @@ static void pci_no_domains(void) } #ifdef CONFIG_PCI_DOMAINS +#ifdef CONFIG_PCI_DOMAINS_GENERIC static atomic_t __domain_nr = ATOMIC_INIT(-1); -int pci_get_new_domain_nr(void) +static int pci_get_new_domain_nr(void) { return atomic_inc_return(&__domain_nr); } -#ifdef CONFIG_PCI_DOMAINS_GENERIC void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent) { static int use_dt_domains = -1; diff --git a/include/linux/pci.h b/include/linux/pci.h index 5ff35cb..636c0a9 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1314,12 +1314,10 @@ void pci_cfg_access_unlock(struct pci_dev *dev); */ #ifdef CONFIG_PCI_DOMAINS extern int pci_domains_supported; -int pci_get_new_domain_nr(void); #else enum { pci_domains_supported = 0 }; static inline int pci_domain_nr(struct pci_bus *bus) { return 0; } static inline int pci_proc_domain(struct pci_bus *bus) { return 0; } -static inline int pci_get_new_domain_nr(void) { return -ENOSYS; } #endif /* CONFIG_PCI_DOMAINS */ /* @@ -1442,7 +1440,6 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus, static inline int pci_domain_nr(struct pci_bus *bus) { return 0; } static inline struct pci_dev *pci_dev_get(struct pci_dev *dev) { return NULL; } -static inline int pci_get_new_domain_nr(void) { return -ENOSYS; } #define dev_is_pci(d) (false) #define dev_is_pf(d) (false) -- 1.7.1
Re: [PATCH v6 2/2] mmc: sdhci: Add support to disable SDR104/SDR50/DDR50 based on capability register 0.
On Wed, May 6, 2015 at 7:12 PM, Suman Tripathi stripa...@apm.com wrote: The sdhci framework disables SDR104/SDR50/DDR50 based only on a quirk. This patch adds support to disable SDR104/SDR50/DDR50 based on reading capability register 0. Signed-off-by: Suman Tripathi stripa...@apm.com --- drivers/mmc/host/sdhci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index c80287a..e024c64 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -3199,7 +3199,8 @@ int sdhci_add_host(struct sdhci_host *host) } } - if (host->quirks2 & SDHCI_QUIRK2_NO_1_8_V) + if (host->quirks2 & SDHCI_QUIRK2_NO_1_8_V || + !(caps[0] & SDHCI_CAN_VDD_180)) caps[1] &= ~(SDHCI_SUPPORT_SDR104 | SDHCI_SUPPORT_SDR50 | SDHCI_SUPPORT_DDR50); -- 1.8.2.1 Any comments on this patch? -- Thanks, with regards, Suman Tripathi
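The added logic is a plain bit test against capability register 0: if the controller cannot supply 1.8 V signalling, the 1.8 V-only UHS modes are masked out of capability register 1. A host-independent sketch; the bit positions below are invented for illustration (the real SDHCI_* masks live in drivers/mmc/host/sdhci.h):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions, not the real sdhci.h values. */
#define CAN_VDD_180     (1u << 26)  /* caps[0]: 1.8 V supply supported */
#define SUPPORT_SDR50   (1u << 0)   /* caps[1] UHS mode bits */
#define SUPPORT_SDR104  (1u << 1)
#define SUPPORT_DDR50   (1u << 2)
#define QUIRK2_NO_1_8_V (1u << 0)

/* Clear the 1.8 V-only modes when either the quirk is set or the
 * capability register says 1.8 V is unavailable. */
static uint32_t mask_uhs_modes(uint32_t caps0, uint32_t caps1,
                               uint32_t quirks2)
{
    if ((quirks2 & QUIRK2_NO_1_8_V) || !(caps0 & CAN_VDD_180))
        caps1 &= ~(SUPPORT_SDR104 | SUPPORT_SDR50 | SUPPORT_DDR50);
    return caps1;
}
```

The patch's point is that the hardware capability bit gates the modes even on controllers that never set the quirk.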
[PATCH Part3 v11 9/9] PCI: Remove pci_bus_assign_domain_nr()
Now we save the domain number in pci_host_bridge, we could remove pci_bus_assign_domain_nr() and clean the domain member in pci_bus. Signed-off-by: Yijing Wang wangyij...@huawei.com --- drivers/pci/pci.c |5 - drivers/pci/pci.h |9 - drivers/pci/probe.c | 11 +++ include/linux/pci.h |3 --- 4 files changed, 3 insertions(+), 25 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 2e2f429..a3cb571 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4558,11 +4558,6 @@ static int pci_assign_domain_nr(struct device *dev) return domain; } - -void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent) -{ - bus-domain_nr = pci_assign_domain_nr(parent); -} #endif #endif diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index c2e1a6b..d8a4238 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -325,14 +325,5 @@ static inline int pci_dev_specific_reset(struct pci_dev *dev, int probe) struct pci_host_bridge *pci_find_host_bridge(struct pci_bus *bus); -#ifdef CONFIG_PCI_DOMAINS_GENERIC -void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent); -#else -static inline void pci_bus_assign_domain_nr(struct pci_bus *bus, - struct device *parent) -{ -} -#endif - void pci_host_assign_domain_nr(struct pci_host_bridge *host, int domain); #endif /* DRIVERS_PCI_H */ diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index e4ef791..be60074 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -481,7 +481,7 @@ void pci_read_bridge_bases(struct pci_bus *child) } } -static struct pci_bus *pci_alloc_bus(struct pci_bus *parent) +static struct pci_bus *pci_alloc_bus(void) { struct pci_bus *b; @@ -496,10 +496,6 @@ static struct pci_bus *pci_alloc_bus(struct pci_bus *parent) INIT_LIST_HEAD(b-resources); b-max_bus_speed = PCI_SPEED_UNKNOWN; b-cur_bus_speed = PCI_SPEED_UNKNOWN; -#ifdef CONFIG_PCI_DOMAINS_GENERIC - if (parent) - b-domain_nr = parent-domain_nr; -#endif return b; } @@ -670,7 +666,7 @@ static struct pci_bus 
*pci_alloc_child_bus(struct pci_bus *parent, /* * Allocate a new bus, and inherit stuff from the parent.. */ - child = pci_alloc_bus(parent); + child = pci_alloc_bus(); if (!child) return NULL; @@ -1936,7 +1932,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int domain, goto unregister_host; } - b = pci_alloc_bus(NULL); + b = pci_alloc_bus(); if (!b) goto unregister_host; @@ -1944,7 +1940,6 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int domain, b-sysdata = sysdata; b-ops = ops; b-number = b-busn_res.start = bus; - pci_bus_assign_domain_nr(b, parent); b-bridge = get_device(bridge-dev); device_enable_async_suspend(b-bridge); diff --git a/include/linux/pci.h b/include/linux/pci.h index 13ed681..f010042 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -460,9 +460,6 @@ struct pci_bus { unsigned char primary;/* number of primary bridge */ unsigned char max_bus_speed; /* enum pci_bus_speed */ unsigned char cur_bus_speed; /* enum pci_bus_speed */ -#ifdef CONFIG_PCI_DOMAINS_GENERIC - int domain_nr; -#endif charname[48]; -- 1.7.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH Part3 v11 1/9] PCI: Save domain in pci_host_bridge
Save domain in pci_host_bridge, so we could get domain from pci_host_bridge, and at the end of series, we could clean up the arch specific pci_domain_nr(). For arm/arm64, the domain argument is pointless, because they enable CONFIG_PCI_DOMAINS_GENERIC, PCI core would assign domain number for them, so we pass meaningless -1 as the domain number. Tested-by: Gregory CLEMENT gregory.clem...@free-electrons.com #mvebu part Signed-off-by: Yijing Wang wangyij...@huawei.com --- arch/alpha/kernel/pci.c|4 ++-- arch/alpha/kernel/sys_nautilus.c |2 +- arch/arm/kernel/bios32.c |2 +- arch/arm/mach-dove/pcie.c |2 +- arch/arm/mach-iop13xx/pci.c|4 ++-- arch/arm/mach-mv78xx0/pcie.c |2 +- arch/arm/mach-orion5x/pci.c|4 ++-- arch/frv/mb93090-mb00/pci-vdk.c|3 ++- arch/ia64/pci/pci.c|4 ++-- arch/ia64/sn/kernel/io_init.c |4 ++-- arch/m68k/coldfire/pci.c |2 +- arch/microblaze/pci/pci-common.c |4 ++-- arch/mips/pci/pci.c|4 ++-- arch/mn10300/unit-asb2305/pci.c|3 ++- arch/powerpc/kernel/pci-common.c |4 ++-- arch/s390/pci/pci.c|4 ++-- arch/sh/drivers/pci/pci.c |4 ++-- arch/sparc/kernel/leon_pci.c |2 +- arch/sparc/kernel/pci.c|4 ++-- arch/sparc/kernel/pcic.c |2 +- arch/tile/kernel/pci.c |4 ++-- arch/tile/kernel/pci_gx.c |4 ++-- arch/unicore32/kernel/pci.c|2 +- arch/x86/pci/acpi.c|4 ++-- arch/x86/pci/common.c |2 +- arch/xtensa/kernel/pci.c |2 +- drivers/parisc/dino.c |2 +- drivers/parisc/lba_pci.c |2 +- drivers/pci/host/pci-versatile.c |3 ++- drivers/pci/host/pci-xgene.c |2 +- drivers/pci/host/pcie-designware.c |2 +- drivers/pci/host/pcie-iproc.c |2 +- drivers/pci/host/pcie-xilinx.c |2 +- drivers/pci/hotplug/ibmphp_core.c |2 +- drivers/pci/probe.c| 21 + drivers/pci/xen-pcifront.c |2 +- include/linux/pci.h|8 +--- 37 files changed, 70 insertions(+), 60 deletions(-) diff --git a/arch/alpha/kernel/pci.c b/arch/alpha/kernel/pci.c index 82f738e..2b0bce9 100644 --- a/arch/alpha/kernel/pci.c +++ b/arch/alpha/kernel/pci.c @@ -336,8 +336,8 @@ common_init_pci(void) pci_add_resource_offset(resources, 
hose-mem_space, hose-mem_space-start); - bus = pci_scan_root_bus(NULL, next_busno, alpha_mv.pci_ops, - hose, resources); + bus = pci_scan_root_bus(NULL, hose-index, next_busno, + alpha_mv.pci_ops, hose, resources); if (!bus) continue; hose-bus = bus; diff --git a/arch/alpha/kernel/sys_nautilus.c b/arch/alpha/kernel/sys_nautilus.c index 700686d..9614e4e 100644 --- a/arch/alpha/kernel/sys_nautilus.c +++ b/arch/alpha/kernel/sys_nautilus.c @@ -206,7 +206,7 @@ nautilus_init_pci(void) unsigned long memtop = max_low_pfn PAGE_SHIFT; /* Scan our single hose. */ - bus = pci_scan_bus(0, alpha_mv.pci_ops, hose); + bus = pci_scan_bus(hose-index, 0, alpha_mv.pci_ops, hose); if (!bus) return; diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c index fc1..5c5a9bd 100644 --- a/arch/arm/kernel/bios32.c +++ b/arch/arm/kernel/bios32.c @@ -486,7 +486,7 @@ static void pcibios_init_hw(struct device *parent, struct hw_pci *hw, if (hw-scan) sys-bus = hw-scan(nr, sys); else - sys-bus = pci_scan_root_bus(parent, sys-busnr, + sys-bus = pci_scan_root_bus(parent, -1, sys-busnr, hw-ops, sys, sys-resources); if (!sys-bus) diff --git a/arch/arm/mach-dove/pcie.c b/arch/arm/mach-dove/pcie.c index 91fe971..a379287 100644 --- a/arch/arm/mach-dove/pcie.c +++ b/arch/arm/mach-dove/pcie.c @@ -160,7 +160,7 @@ dove_pcie_scan_bus(int nr, struct pci_sys_data *sys) return NULL; } - return pci_scan_root_bus(NULL, sys-busnr, pcie_ops, sys, + return pci_scan_root_bus(NULL, -1, sys-busnr, pcie_ops, sys, sys-resources); } diff --git a/arch/arm/mach-iop13xx/pci.c b/arch/arm/mach-iop13xx/pci.c index 9082b84..bc4ba7e 100644 --- a/arch/arm/mach-iop13xx/pci.c +++ b/arch/arm/mach-iop13xx/pci.c @@ -535,12 +535,12 @@ struct pci_bus *iop13xx_scan_bus(int nr, struct pci_sys_data *sys) while(time_before(jiffies, atux_trhfa_timeout)) udelay(100); - bus =
[PATCH Part3 v11 0/9] Remove platform pci_domain_nr()
This series is splitted out from previous patchset Refine PCI scan interfaces and make generic pci host bridge. It try to clean up all platform pci_domain_nr(), save domain in pci_host_bridge, so we could get domain number from the common interface. You could pull it from https://github.com/YijingWang/linux-pci.git enumer11 Yijing Wang (9): PCI: Save domain in pci_host_bridge PCI: Move pci_bus_assign_domain_nr() declaration into drivers/pci/pci.h PCI: Remove declaration for pci_get_new_domain_nr() PCI: Introduce pci_host_assign_domain_nr() to assign domain powerpc/PCI: Rename pcibios_root_bridge_prepare() to pcibios_root_bus_prepare() PCI: Make pci_host_bridge hold sysdata in drvdata PCI: Create pci host bridge prior to root bus PCI: Remove platform specific pci_domain_nr() PCI: Remove pci_bus_assign_domain_nr() arch/alpha/include/asm/pci.h |2 - arch/alpha/kernel/pci.c |4 +- arch/alpha/kernel/sys_nautilus.c |2 +- arch/arm/kernel/bios32.c |2 +- arch/arm/mach-dove/pcie.c|2 +- arch/arm/mach-iop13xx/pci.c |4 +- arch/arm/mach-mv78xx0/pcie.c |2 +- arch/arm/mach-orion5x/pci.c |4 +- arch/frv/mb93090-mb00/pci-vdk.c |3 +- arch/ia64/include/asm/pci.h |1 - arch/ia64/pci/pci.c |6 +- arch/ia64/sn/kernel/io_init.c|4 +- arch/m68k/coldfire/pci.c |2 +- arch/microblaze/pci/pci-common.c | 15 + arch/mips/include/asm/pci.h |2 - arch/mips/pci/pci.c |4 +- arch/mn10300/unit-asb2305/pci.c |3 +- arch/powerpc/include/asm/machdep.h |2 +- arch/powerpc/kernel/pci-common.c | 21 ++- arch/powerpc/platforms/pseries/pci.c |2 +- arch/powerpc/platforms/pseries/pseries.h |2 +- arch/powerpc/platforms/pseries/setup.c |2 +- arch/s390/pci/pci.c | 10 +--- arch/sh/drivers/pci/pci.c|4 +- arch/sh/include/asm/pci.h|2 - arch/sparc/kernel/leon_pci.c |2 +- arch/sparc/kernel/pci.c | 21 +-- arch/sparc/kernel/pcic.c |2 +- arch/tile/include/asm/pci.h |2 - arch/tile/kernel/pci.c |4 +- arch/tile/kernel/pci_gx.c|4 +- arch/unicore32/kernel/pci.c |2 +- arch/x86/include/asm/pci.h |6 -- arch/x86/pci/acpi.c |6 +- 
arch/x86/pci/common.c|2 +- arch/xtensa/kernel/pci.c |2 +- drivers/parisc/dino.c|2 +- drivers/parisc/lba_pci.c |2 +- drivers/pci/host/pci-versatile.c |3 +- drivers/pci/host/pci-xgene.c |2 +- drivers/pci/host/pcie-designware.c |2 +- drivers/pci/host/pcie-iproc.c|2 +- drivers/pci/host/pcie-xilinx.c |2 +- drivers/pci/hotplug/ibmphp_core.c|2 +- drivers/pci/pci.c| 31 -- drivers/pci/pci.h|1 + drivers/pci/probe.c | 94 +- drivers/pci/xen-pcifront.c |2 +- include/linux/pci.h | 27 ++--- 49 files changed, 145 insertions(+), 187 deletions(-) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v3 02/12] KVM: define common __KVM_GUESTDBG_USE_SW/HW_BP values
On Wed, May 06, 2015 at 05:23:17PM +0100, Alex Bennée wrote: Currently x86, powerpc and soon arm64 use the same two architecture specific bits for guest debug support for software and hardware breakpoints. This makes the shared values explicit while leaving the gate open for another architecture to use some other value if they really really want to. Signed-off-by: Alex Bennée alex.ben...@linaro.org Reviewed-by: Andrew Jones drjo...@redhat.com diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index ab4d473..1731569 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -310,8 +310,8 @@ struct kvm_guest_debug_arch { * and upper 16 bits are architecture specific. Architecture specific defines * that ioctl is for setting hardware breakpoint or software breakpoint. */ -#define KVM_GUESTDBG_USE_SW_BP 0x00010000 -#define KVM_GUESTDBG_USE_HW_BP 0x00020000 +#define KVM_GUESTDBG_USE_SW_BP __KVM_GUESTDBG_USE_SW_BP +#define KVM_GUESTDBG_USE_HW_BP __KVM_GUESTDBG_USE_HW_BP /* definition of registers in kvm_run */ struct kvm_sync_regs { diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index d7dcef5..1438202 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -250,8 +250,8 @@ struct kvm_debug_exit_arch { __u64 dr7; }; -#define KVM_GUESTDBG_USE_SW_BP 0x00010000 -#define KVM_GUESTDBG_USE_HW_BP 0x00020000 +#define KVM_GUESTDBG_USE_SW_BP __KVM_GUESTDBG_USE_SW_BP +#define KVM_GUESTDBG_USE_HW_BP __KVM_GUESTDBG_USE_HW_BP #define KVM_GUESTDBG_INJECT_DB 0x00040000 #define KVM_GUESTDBG_INJECT_BP 0x00080000 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 70ac641..3b6252e 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -570,8 +570,16 @@ struct kvm_s390_irq_state { /* for KVM_SET_GUEST_DEBUG */ -#define KVM_GUESTDBG_ENABLE 0x00000001 -#define KVM_GUESTDBG_SINGLESTEP 0x00000002 +#define KVM_GUESTDBG_ENABLE (1 << 0) +#define KVM_GUESTDBG_SINGLESTEP (1 << 1) + +/* + * Architecture specific stuff uses the top 16 bits of the field, s/stuff/something more specific/ + * however there is some shared commonality for the common cases + */ +#define __KVM_GUESTDBG_USE_SW_BP (1 << 16) +#define __KVM_GUESTDBG_USE_HW_BP (1 << 17) + struct kvm_guest_debug { __u32 control; We sort of left this discussion hanging with me expressing slight concern about the usefulness of these defines. Paolo, what are your thoughts? -Christoffer
Re: [PATCH v3 02/12] KVM: define common __KVM_GUESTDBG_USE_SW/HW_BP values
On 08/05/2015 11:23, Christoffer Dall wrote: On Wed, May 06, 2015 at 05:23:17PM +0100, Alex Bennée wrote: Currently x86, powerpc and soon arm64 use the same two architecture specific bits for guest debug support for software and hardware breakpoints. This makes the shared values explicit while leaving the gate open for another architecture to use some other value if they really really want to. Signed-off-by: Alex Bennée alex.ben...@linaro.org Reviewed-by: Andrew Jones drjo...@redhat.com diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index ab4d473..1731569 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -310,8 +310,8 @@ struct kvm_guest_debug_arch { * and upper 16 bits are architecture specific. Architecture specific defines * that ioctl is for setting hardware breakpoint or software breakpoint. */ -#define KVM_GUESTDBG_USE_SW_BP 0x0001 -#define KVM_GUESTDBG_USE_HW_BP 0x0002 +#define KVM_GUESTDBG_USE_SW_BP __KVM_GUESTDBG_USE_SW_BP +#define KVM_GUESTDBG_USE_HW_BP __KVM_GUESTDBG_USE_HW_BP /* definition of registers in kvm_run */ struct kvm_sync_regs { diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index d7dcef5..1438202 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -250,8 +250,8 @@ struct kvm_debug_exit_arch { __u64 dr7; }; -#define KVM_GUESTDBG_USE_SW_BP 0x0001 -#define KVM_GUESTDBG_USE_HW_BP 0x0002 +#define KVM_GUESTDBG_USE_SW_BP __KVM_GUESTDBG_USE_SW_BP +#define KVM_GUESTDBG_USE_HW_BP __KVM_GUESTDBG_USE_HW_BP #define KVM_GUESTDBG_INJECT_DB 0x0004 #define KVM_GUESTDBG_INJECT_BP 0x0008 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 70ac641..3b6252e 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -570,8 +570,16 @@ struct kvm_s390_irq_state { /* for KVM_SET_GUEST_DEBUG */ -#define KVM_GUESTDBG_ENABLE 0x0001 -#define KVM_GUESTDBG_SINGLESTEP 0x0002 +#define 
KVM_GUESTDBG_ENABLE (1 << 0) +#define KVM_GUESTDBG_SINGLESTEP (1 << 1) + +/* + * Architecture specific stuff uses the top 16 bits of the field, s/stuff/something more specific/ + * however there is some shared commonality for the common cases + */ +#define __KVM_GUESTDBG_USE_SW_BP (1 << 16) +#define __KVM_GUESTDBG_USE_HW_BP (1 << 17) + struct kvm_guest_debug { __u32 control; We sort of left this discussion hanging with me expressing slight concern about the usefulness of these defines. Paolo, what are your thoughts? I would just lift these two KVM_GUESTDBG_* defines to include/uapi/linux/kvm.h and say that architecture specific stuff uses the top 14 bits of the field. :) Paolo
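Whichever way the defines finally land, the layout under discussion is fixed: generic flags occupy the low 16 bits of kvm_guest_debug.control, and architecture-specific flags (including the two shared breakpoint values) occupy the top 16. A minimal sketch of that split; the macro names are shortened and the has_arch_flags() helper is illustrative, not part of the KVM uapi:

```c
#include <assert.h>
#include <stdint.h>

/* Generic flags: low 16 bits of the control word. */
#define GUESTDBG_ENABLE     (1u << 0)
#define GUESTDBG_SINGLESTEP (1u << 1)

/* Shared arch values: the bottom of the arch-specific top 16 bits. */
#define GUESTDBG_USE_SW_BP  (1u << 16)
#define GUESTDBG_USE_HW_BP  (1u << 17)
#define GUESTDBG_ARCH_MASK  0xffff0000u

/* True when the control word carries any architecture-specific flag. */
static int has_arch_flags(uint32_t control)
{
    return (control & GUESTDBG_ARCH_MASK) != 0;
}
```

Writing the values as shifts makes it obvious why (1 << 16) and 0x00010000 are the same flag, which is the whole point of the patch.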
Re: [PATCH V3] cpuidle: Handle tick_broadcast_enter() failure gracefully
On Friday, May 08, 2015 01:05:32 PM Preeti U Murthy wrote: When a CPU has to enter an idle state where tick stops, it makes a call to tick_broadcast_enter(). The call will fail if this CPU is the broadcast CPU. Today, under such a circumstance, the arch cpuidle code handles this CPU. This is not convincing because not only do we not know what the arch cpuidle code does, but we also do not account for the idle state residency time and usage of such a CPU. This scenario can be handled better by simply choosing an idle state where in ticks do not stop. To accommodate this change move the setting of runqueue idle state from the core to the cpuidle driver, else the rq-idle_state will be set wrong. Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com --- Changes from V2: https://lkml.org/lkml/2015/5/7/78 Introduce a function in cpuidle core to select an idle state where ticks do not stop rather than going through the governors. Changes from V1: https://lkml.org/lkml/2015/5/7/24 Rebased on the latest linux-pm/bleeding-edge branch drivers/cpuidle/cpuidle.c | 45 +++-- include/linux/sched.h | 16 kernel/sched/core.c | 17 + kernel/sched/fair.c |2 +- kernel/sched/idle.c |6 -- kernel/sched/sched.h | 24 6 files changed, 77 insertions(+), 33 deletions(-) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 8c24f95..d1af760 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -21,6 +21,7 @@ #include linux/module.h #include linux/suspend.h #include linux/tick.h +#include linux/sched.h #include trace/events/power.h #include cpuidle.h @@ -146,6 +147,36 @@ int cpuidle_enter_freeze(struct cpuidle_driver *drv, struct cpuidle_device *dev) return index; } +/* + * find_tick_valid_state - select a state where tick does not stop + * @dev: cpuidle device for this cpu + * @drv: cpuidle driver for this cpu + */ +static int find_tick_valid_state(struct cpuidle_device *dev, + struct cpuidle_driver *drv) +{ + int i, ret = -1; + + for (i = 
CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) { + struct cpuidle_state *s = &drv->states[i]; + struct cpuidle_state_usage *su = &dev->states_usage[i]; + + /* + * We do not explicitly check for latency requirement + * since it is safe to assume that only shallower idle + * states will have the CPUIDLE_FLAG_TIMER_STOP bit + * cleared and they will invariably meet the latency + * requirement. + */ + if (s->disabled || su->disable || + (s->flags & CPUIDLE_FLAG_TIMER_STOP)) + continue; + + ret = i; + } + return ret; +} + /** * cpuidle_enter_state - enter the state and update stats * @dev: cpuidle device for this cpu @@ -168,10 +199,17 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, * CPU as a broadcast timer, this call may fail if it is not available. */ if (broadcast && tick_broadcast_enter()) { - default_idle_call(); - return -EBUSY; + index = find_tick_valid_state(dev, drv); Well, the new state needs to be deeper than the old one or you may violate the governor's choice and this doesn't guarantee that. Also I don't quite see a reason to duplicate the find_deepest_state() functionality here. + if (index < 0) { + default_idle_call(); + return -EBUSY; + } + target_state = &drv->states[index]; } + /* Take note of the planned idle state. */ + idle_set_state(smp_processor_id(), target_state); And I wouldn't do this either. The behavior here is pretty much as though the driver demoted the state chosen by the governor and we don't call idle_set_state() again in those cases. + trace_cpu_idle_rcuidle(index, dev-cpu); time_start = ktime_get(); Overall, something like the patch below (untested) should work I suppose?
--- drivers/cpuidle/cpuidle.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) Index: linux-pm/drivers/cpuidle/cpuidle.c === --- linux-pm.orig/drivers/cpuidle/cpuidle.c +++ linux-pm/drivers/cpuidle/cpuidle.c @@ -73,17 +73,19 @@ int cpuidle_play_dead(void) } static int find_deepest_state(struct cpuidle_driver *drv, - struct cpuidle_device *dev, bool freeze) + struct cpuidle_device *dev, bool freeze, + int limit, unsigned int flags_to_avoid) { unsigned int latency_req = 0;
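Rafael's counter-proposal is to generalize find_deepest_state() with a flag mask instead of duplicating its loop: walk the states shallow to deep, skip disabled ones and any whose flags intersect the mask, and remember the last survivor, which is the deepest. A stripped-down userspace sketch of that selection; field names loosely follow the kernel structs, but the types and table are invented for illustration:

```c
#include <assert.h>

#define FLAG_TIMER_STOP 0x1  /* stand-in for CPUIDLE_FLAG_TIMER_STOP */

struct state {
    int disabled;
    unsigned int flags;
};

/* Return the index of the deepest usable state whose flags avoid
 * 'avoid', or -1 if none qualifies. States are ordered shallow->deep. */
static int find_deepest_state(const struct state *s, int count,
                              unsigned int avoid)
{
    int i, ret = -1;

    for (i = 0; i < count; i++) {
        if (s[i].disabled || (s[i].flags & avoid))
            continue;
        ret = i;  /* keep overwriting: the last hit is the deepest */
    }
    return ret;
}
```

With avoid = FLAG_TIMER_STOP this yields the deepest state whose tick keeps running, which is what the failed tick_broadcast_enter() path needs.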
[tip:perf/core] perf_event: Don't allow vmalloc() backed perf on powerpc
Commit-ID: cb307113746b4d184155d2c412e8069aeaa60d42 Gitweb: http://git.kernel.org/tip/cb307113746b4d184155d2c412e8069aeaa60d42 Author: Michael Ellerman m...@ellerman.id.au AuthorDate: Mon, 4 May 2015 16:26:39 +1000 Committer: Ingo Molnar mi...@kernel.org CommitDate: Fri, 8 May 2015 12:26:01 +0200 perf_event: Don't allow vmalloc() backed perf on powerpc On powerpc the perf event interrupt is not masked when interrupts are disabled, allowing it to function as an NMI. This causes problems if perf is using vmalloc. If we take a page fault on the vmalloc region the fault handler will fail the page fault because it detects we are coming in from an NMI (see do_hash_page()). We don't actually need or want vmalloc backed perf so just disable it on powerpc. Signed-off-by: Michael Ellerman m...@ellerman.id.au Signed-off-by: Peter Zijlstra (Intel) pet...@infradead.org Cc: linuxppc-...@ozlabs.org Cc: Andrew Morton a...@osdl.org Cc: Anton Blanchard an...@samba.org Cc: Borislav Petkov b...@alien8.de Cc: H. Peter Anvin h...@zytor.com Cc: Paul Mackerras pau...@samba.org Cc: Thomas Gleixner t...@linutronix.de Cc: a...@ghostprotocols.net Cc: suka...@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1430720799-18426-1-git-send-email-...@ellerman.id.au Signed-off-by: Ingo Molnar mi...@kernel.org --- init/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/init/Kconfig b/init/Kconfig index dc24dec..81050e4 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1637,7 +1637,7 @@ config PERF_EVENTS config DEBUG_PERF_USE_VMALLOC default n bool "Debug: use vmalloc to back perf mmap() buffers" - depends on PERF_EVENTS && DEBUG_KERNEL + depends on PERF_EVENTS && DEBUG_KERNEL && !PPC select PERF_USE_VMALLOC help Use vmalloc memory to back perf mmap() buffers.
Re: [PATCH v3 4/6] cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLE
On Friday, May 08, 2015 09:16:44 AM Preeti U Murthy wrote: On 05/08/2015 02:29 AM, Rafael J. Wysocki wrote: On Thursday, May 07, 2015 05:49:22 PM Preeti U Murthy wrote: On 05/05/2015 02:11 PM, Preeti U Murthy wrote: On 05/05/2015 12:03 PM, Shilpasri G Bhat wrote: Hi Preeti, On 05/05/2015 09:30 AM, Preeti U Murthy wrote: Hi Shilpa, On 05/04/2015 02:24 PM, Shilpasri G Bhat wrote: Re-evaluate the chip's throttled state on recieving OCC_THROTTLE notification by executing *throttle_check() on any one of the cpu on the chip. This is a sanity check to verify if we were indeed throttled/unthrottled after receiving OCC_THROTTLE notification. We cannot call *throttle_check() directly from the notification handler because we could be handling chip1's notification in chip2. So initiate an smp_call to execute *throttle_check(). We are irq-disabled in the notification handler, so use a worker thread to smp_call throttle_check() on any of the cpu in the chipmask. I see that the first patch takes care of reporting *per-chip* throttling for pmax capping condition. But where are we taking care of reporting pstate set to safe and freq control disabled scenarios per-chip ? IMO let us not have psafe and freq control disabled states managed per-chip. Because when the above two conditions occur it is likely to happen across all chips during an OCC reset cycle. So I am setting 'throttled' to false on OCC_ACTIVE and re-verifying if it actually is the case by invoking *throttle_check(). Alright like I pointed in the previous reply, a comment to indicate that psafe and freq control disabled conditions will fail when occ is inactive and that all chips face the consequence of this will help. From your explanation on the thread of the first patch of this series, this will not be required. So, Reviewed-by: Preeti U Murthy pre...@linux.vnet.ibm.com OK, so is the whole series reviewed now? Yes the whole series has been reviewed. OK, I'll queue it up for 4.2, then, thanks! 
-- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Build Failure with allyesconfig for PowerPc on latest verison of Linus's tree
Greetings Benjamin,Paul,Michael and others, I am reporting the below error message: drivers/built-in.o: In function `.i40e_vc_process_vflr_event': (.text+0x1ffaea0): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `._mcount' defined in .text section in arch/powerpc/kernel/entry_64.o drivers/built-in.o: In function `.i40e_vc_process_vflr_event': (.text+0x1ffafa0): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.eeh_check_failure' defined in .text section in arch/powerpc/kernel/built-in.o drivers/built-in.o: In function `.i40e_vc_process_vflr_event': (.text+0x1ffb120): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.eeh_check_failure' defined in .text section in arch/powerpc/kernel/built-in.o drivers/built-in.o: In function `.i40e_vc_process_vflr_event': (.text+0x1ffb254): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.eeh_check_failure' defined in .text section in arch/powerpc/kernel/built-in.o drivers/built-in.o: In function `.i40e_vc_process_vflr_event': (.text+0x1ffb358): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `_restgpr0_23' defined in .text.save.restore section in arch/powerpc/lib/built-in.o drivers/built-in.o: In function `.i40e_ndo_set_vf_mac': (.text+0x1ffb360): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `_savegpr0_24' defined in .text.save.restore section in arch/powerpc/lib/built-in.o drivers/built-in.o: In function `.i40e_ndo_set_vf_mac': (.text+0x1ffb374): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `._mcount' defined in .text section in arch/powerpc/kernel/entry_64.o drivers/built-in.o: In function `.i40e_ndo_set_vf_mac': (.text+0x1ffb6e4): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `.eeh_check_failure' defined in .text section in arch/powerpc/kernel/built-in.o drivers/built-in.o: In function `.i40e_ndo_set_vf_mac': (.text+0x1ffb870): relocation truncated to fit: R_PPC64_REL24 (stub) against 
symbol `.eeh_check_failure' defined in .text section in arch/powerpc/kernel/built-in.o This breaks the build on powerpc on the latest version of Linus's tree. Unfortunately my understanding of the powerpc code is rather limited, so I felt it best just to report it. Please let me know if there is anything else I can do to help solve this build breakage. Cheers, Nick
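The "relocation truncated to fit: R_PPC64_REL24" errors mean a relative branch target landed outside the reach of the ppc64 branch instruction: the b/bl opcode stores a signed 24-bit word displacement, implicitly shifted left by two bits, so a direct branch can only reach about +/-32 MiB. An allyesconfig kernel image is far larger than that, so calls from drivers into symbols such as `_mcount` or `eeh_check_failure` fall out of range unless the linker inserts stubs. The reachability arithmetic, as a small hedged sketch (the helper is illustrative, not a linker API):

```c
#include <assert.h>
#include <stdint.h>

/* R_PPC64_REL24 patches the 24-bit LI field of a b/bl instruction.
 * The field is a signed word offset (implicitly shifted left 2), so a
 * direct branch reaches at most +/-32 MiB from the branch site. */
static int rel24_reaches(int64_t displacement)
{
    const int64_t lo = -(1LL << 25);     /* -32 MiB */
    const int64_t hi = (1LL << 25) - 4;  /* +32 MiB minus one instruction */

    return (displacement & 3) == 0 &&    /* must be word aligned */
           displacement >= lo && displacement <= hi;
}
```

When a displacement fails this check and no stub is generated, the linker reports exactly the error quoted above.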
Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson emun...@akamai.com wrote: mlock() allows a user to control page out of program memory, but this comes at the cost of faulting in the entire mapping when it is allocated. For large mappings where the entire area is not necessary this is not ideal. This series introduces new flags for mmap() and mlockall() that allow a user to specify that the covered area should not be paged out, but only after the memory has been used the first time. Please tell us much much more about the value of these changes: the use cases, the behavioural improvements and performance results which the patchset brings to those use cases, etc.
Re: [PATCH V3] cpuidle: Handle tick_broadcast_enter() failure gracefully
On Friday, May 08, 2015 04:18:02 PM Rafael J. Wysocki wrote: On Friday, May 08, 2015 01:05:32 PM Preeti U Murthy wrote: When a CPU has to enter an idle state where tick stops, it makes a call to tick_broadcast_enter(). The call will fail if this CPU is the broadcast CPU. Today, under such a circumstance, the arch cpuidle code handles this CPU. This is not convincing because not only do we not know what the arch cpuidle code does, but we also do not account for the idle state residency time and usage of such a CPU. This scenario can be handled better by simply choosing an idle state where in ticks do not stop. To accommodate this change move the setting of runqueue idle state from the core to the cpuidle driver, else the rq->idle_state will be set wrong. Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com --- Changes from V2: https://lkml.org/lkml/2015/5/7/78 Introduce a function in cpuidle core to select an idle state where ticks do not stop rather than going through the governors. Changes from V1: https://lkml.org/lkml/2015/5/7/24 Rebased on the latest linux-pm/bleeding-edge branch drivers/cpuidle/cpuidle.c | 45 +++-- include/linux/sched.h | 16 kernel/sched/core.c | 17 + kernel/sched/fair.c | 2 +- kernel/sched/idle.c | 6 -- kernel/sched/sched.h | 24 6 files changed, 77 insertions(+), 33 deletions(-) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 8c24f95..d1af760 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -21,6 +21,7 @@ #include <linux/module.h> #include <linux/suspend.h> #include <linux/tick.h> +#include <linux/sched.h> #include <trace/events/power.h> #include "cpuidle.h" @@ -146,6 +147,36 @@ int cpuidle_enter_freeze(struct cpuidle_driver *drv, struct cpuidle_device *dev) return index; } +/* + * find_tick_valid_state - select a state where tick does not stop + * @dev: cpuidle device for this cpu + * @drv: cpuidle driver for this cpu + */ +static int find_tick_valid_state(struct cpuidle_device *dev, + struct
cpuidle_driver *drv) +{ + int i, ret = -1; + + for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) { + struct cpuidle_state *s = drv->states[i]; + struct cpuidle_state_usage *su = dev->states_usage[i]; + + /* +* We do not explicitly check for latency requirement +* since it is safe to assume that only shallower idle +* states will have the CPUIDLE_FLAG_TIMER_STOP bit +* cleared and they will invariably meet the latency +* requirement. +*/ + if (s->disabled || su->disable || + (s->flags & CPUIDLE_FLAG_TIMER_STOP)) + continue; + + ret = i; + } + return ret; +} + /** * cpuidle_enter_state - enter the state and update stats * @dev: cpuidle device for this cpu @@ -168,10 +199,17 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, * CPU as a broadcast timer, this call may fail if it is not available. */ if (broadcast && tick_broadcast_enter()) { - default_idle_call(); - return -EBUSY; + index = find_tick_valid_state(dev, drv); Well, the new state needs to be deeper (I should have said shallower, sorry about that. The state chosen by the governor satisfies certain latency requirements and we can't violate those by choosing a deeper state here. But the patch I sent actually did the right thing. :-)) than the old one or you may violate the governor's choice and this doesn't guarantee that. Also I don't quite see a reason to duplicate the find_deepest_state() functionality here. + if (index < 0) { + default_idle_call(); + return -EBUSY; + } + target_state = drv->states[index]; } + /* Take note of the planned idle state. */ + idle_set_state(smp_processor_id(), target_state); And I wouldn't do this either. The behavior here is pretty much as though the driver demoted the state chosen by the governor and we don't call idle_set_state() again in those cases. + trace_cpu_idle_rcuidle(index, dev->cpu); time_start = ktime_get(); Overall, something like the patch below (untested) should work I suppose?
--- drivers/cpuidle/cpuidle.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) Index: linux-pm/drivers/cpuidle/cpuidle.c === --- linux-pm.orig/drivers/cpuidle/cpuidle.c +++ linux-pm/drivers/cpuidle/cpuidle.c @@ -73,17 +73,19
Re: [PATCH] powerpc/mpc85xx: Fix EDAC address capture
On Fri, 2015-05-08 at 16:34 -0500, Scott Wood wrote: On Thu, 2015-05-07 at 17:04 +0800, songwenbin wrote: From: York Sun york...@freescale.com Extend err_addr to cover 64 bits for DDR errors. Signed-off-by: York Sun york...@freescale.com Change-Id: Idb112c4a106416a9cad9933c415e6f62de5cf07b Reviewed-on: http://git.am.freescale.net:8181/553 Tested-by: Schmitt Richard-B43082 b43...@freescale.com Reviewed-by: Fleming Andrew-AFLEMING aflem...@freescale.com Tested-by: Fleming Andrew-AFLEMING aflem...@freescale.com Signed-off-by: songwenbin wenbin.s...@freescale.com Please don't include gerrit stuff in upstream submissions. Definitely don't include Reviewed-by/Tested-by from gerrit as those approvals are from an entirely different context. Never mind, I see you fixed that in v2. :-) That said, these patches should go via the edac tree (see MAINTAINERS). -Scott
Re: [PATCH] powerpc/mpc85xx: Fix EDAC address capture
On Thu, 2015-05-07 at 17:04 +0800, songwenbin wrote: From: York Sun york...@freescale.com Extend err_addr to cover 64 bits for DDR errors. Signed-off-by: York Sun york...@freescale.com Change-Id: Idb112c4a106416a9cad9933c415e6f62de5cf07b Reviewed-on: http://git.am.freescale.net:8181/553 Tested-by: Schmitt Richard-B43082 b43...@freescale.com Reviewed-by: Fleming Andrew-AFLEMING aflem...@freescale.com Tested-by: Fleming Andrew-AFLEMING aflem...@freescale.com Signed-off-by: songwenbin wenbin.s...@freescale.com Please don't include gerrit stuff in upstream submissions. Definitely don't include Reviewed-by/Tested-by from gerrit as those approvals are from an entirely different context. -Scott
[PATCH] cxl: Use call_rcu to reduce latency when releasing the afu fd
From: Ian Munsie imun...@au1.ibm.com The afu fd release path was identified as a significant bottleneck in the overall performance of cxl. While an optimal AFU design would minimise the need to close & reopen the AFU fd, it is not always practical to avoid. The bottleneck seems to be down to the call to synchronize_rcu(), which will block until every other thread is guaranteed to be out of an RCU critical section. Replace it with call_rcu() to free the context structures later so we can return to the application sooner. This reduces the time spent in the fd release path from 13356 usec to 13.3 usec, about a 1000x speed up. Reported-by: Fei K Chen uc...@cn.ibm.com Signed-off-by: Ian Munsie imun...@au1.ibm.com --- drivers/misc/cxl/context.c | 15 ++- drivers/misc/cxl/cxl.h | 2 ++ 2 files changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c index 22eb338..cea299e 100644 --- a/drivers/misc/cxl/context.c +++ b/drivers/misc/cxl/context.c @@ -243,12 +243,9 @@ void cxl_context_detach_all(struct cxl_afu *afu) mutex_unlock(&afu->contexts_lock); } -void cxl_context_free(struct cxl_context *ctx) +static void reclaim_ctx(struct rcu_head *rcu) { - mutex_lock(&ctx->afu->contexts_lock); - idr_remove(&ctx->afu->contexts_idr, ctx->pe); - mutex_unlock(&ctx->afu->contexts_lock); - synchronize_rcu(); + struct cxl_context *ctx = container_of(rcu, struct cxl_context, rcu); free_page((u64)ctx->sstp); ctx->sstp = NULL; @@ -256,3 +253,11 @@ void cxl_context_free(struct cxl_context *ctx) put_pid(ctx->pid); kfree(ctx); } + +void cxl_context_free(struct cxl_context *ctx) +{ + mutex_lock(&ctx->afu->contexts_lock); + idr_remove(&ctx->afu->contexts_idr, ctx->pe); + mutex_unlock(&ctx->afu->contexts_lock); + call_rcu(&ctx->rcu, reclaim_ctx); +} diff --git a/drivers/misc/cxl/cxl.h b/drivers/misc/cxl/cxl.h index 47f655f..ebd2e0d 100644 --- a/drivers/misc/cxl/cxl.h +++ b/drivers/misc/cxl/cxl.h @@ -460,6 +460,8 @@ struct cxl_context { bool pending_irq; bool pending_fault; bool pending_afu_err; + + struct rcu_head rcu; }; struct cxl { -- 2.1.4
Re: [PATCH V2 2/2] powerpc/thp: Serialize pmd clear against a linux page table walk.
On Thu, 7 May 2015 12:53:28 +0530 Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: Serialize against find_linux_pte_or_hugepte which does lock-less lookup in page tables with local interrupts disabled. For huge pages it casts pmd_t to pte_t. Since format of pte_t is different from pmd_t we want to prevent transit from pmd pointing to page table to pmd pointing to huge page (and back) while interrupts are disabled. We clear pmd to possibly replace it with page table pointer in different code paths. So make sure we wait for the parallel find_linux_pte_or_hugepage to finish. I'm not seeing here any description of the problem which is being fixed. Does the patch make the machine faster? Does the machine crash?
Re: [PATCH V2 1/2] mm/thp: Split out pmd collpase flush into a seperate functions
On Thu, 7 May 2015 12:53:27 +0530 Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com wrote: After this patch pmdp_* functions operate only on hugepage pte, and not on regular pmd_t values pointing to page table. The patch looks like a pretty safe no-op for non-powerpc? --- a/arch/powerpc/include/asm/pgtable-ppc64.h +++ b/arch/powerpc/include/asm/pgtable-ppc64.h @@ -576,6 +576,10 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long addr, extern void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp); +#define __HAVE_ARCH_PMDP_COLLAPSE_FLUSH +extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, + unsigned long address, pmd_t *pmdp); + The fashionable way of doing this is extern pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp); #define pmdp_collapse_flush pmdp_collapse_flush then, elsewhere, #ifndef pmdp_collapse_flush static inline pmd_t pmdp_collapse_flush(...) {} #define pmdp_collapse_flush pmdp_collapse_flush #endif It avoids introducing a second (ugly) symbol into the kernel.
Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages
On Wed, 06 May 2015 11:28:12 +0200 Vlastimil Babka vba...@suse.cz wrote: On 05/06/2015 12:09 AM, Nishanth Aravamudan wrote: On 03.04.2015 [10:45:56 -0700], Nishanth Aravamudan wrote: What I find somewhat worrying though is that we could potentially break the pfmemalloc_watermark_ok() test in situations where zone_reclaimable_pages(zone) == 0 is a transient situation (and not a permanently allocated hugepage). In that case, the throttling is supposed to help system recover, and we might be breaking that ability with this patch, no? Well, if it's transient, we'll skip it this time through, and once there are reclaimable pages, we should notice it again. I'm not familiar enough with this logic, so I'll read through the code again soon to see if your concern is valid, as best I can. In reviewing the code, I think that transiently unreclaimable zones will lead to some higher direct reclaim rates and possible contention, but shouldn't cause any major harm. The likelihood of that situation, as well, in a non-reserved memory setup like the one I described, seems exceedingly low. OK, I guess when a reasonably configured system has nothing to reclaim, it's already busted and throttling won't change much. Consider the patch Acked-by: Vlastimil Babka vba...@suse.cz OK, thanks, I'll move this patch into the queue for 4.2-rc1. Or is it important enough to merge into 4.1? From: Nishanth Aravamudan n...@linux.vnet.ibm.com Subject: mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages Based upon 675becce15 (mm: vmscan: do not throttle based on pfmemalloc reserves if node has no ZONE_NORMAL) from Mel. 
We have a system with the following topology:

# numactl -H
available: 3 nodes (0,2-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 0 size: 28273 MB
node 0 free: 27323 MB
node 2 cpus:
node 2 size: 16384 MB
node 2 free: 0 MB
node 3 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 3 size: 30533 MB
node 3 free: 13273 MB
node distances:
node   0   2   3
  0:  10  20  20
  2:  20  10  20
  3:  20  20  10

Node 2 has no free memory, because:
# cat /sys/devices/system/node/node2/hugepages/hugepages-16777216kB/nr_hugepages
1

This leads to the following zoneinfo:

Node 2, zone      DMA
  pages free     0
        min      1840
        low      2300
        high     2760
        scanned  0
        spanned  262144
        present  262144
        managed  262144
...
  all_unreclaimable: 1

If one then attempts to allocate some normal 16M hugepages via

echo 37 > /proc/sys/vm/nr_hugepages

the echo never returns and kswapd2 consumes CPU cycles. This is because throttle_direct_reclaim ends up calling wait_event(pfmemalloc_wait, pfmemalloc_watermark_ok...). pfmemalloc_watermark_ok() in turn checks all zones on the node if there are any reserves, and if so, then indicates the watermarks are ok, by seeing if there are sufficient free pages. 675becce15 added a condition already for memoryless nodes. In this case, though, the node has memory, it is just all consumed (and not reclaimable). Effectively, though, the result is the same on this call to pfmemalloc_watermark_ok() and thus seems like a reasonable additional condition. With this change, the afore-mentioned 16M hugepage allocation attempt succeeds and correctly round-robins between Nodes 1 and 3.
Signed-off-by: Nishanth Aravamudan n...@linux.vnet.ibm.com Reviewed-by: Michal Hocko mho...@suse.cz Acked-by: Vlastimil Babka vba...@suse.cz Cc: Dave Hansen dave.han...@intel.com Cc: Mel Gorman mgor...@suse.de Cc: Anton Blanchard an...@samba.org Cc: Johannes Weiner han...@cmpxchg.org Cc: Michal Hocko mho...@suse.cz Cc: Rik van Riel r...@redhat.com Cc: Dan Streetman ddstr...@ieee.org Signed-off-by: Andrew Morton a...@linux-foundation.org --- mm/vmscan.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff -puN mm/vmscan.c~mm-vmscan-do-not-throttle-based-on-pfmemalloc-reserves-if-node-has-no-reclaimable-pages mm/vmscan.c --- a/mm/vmscan.c~mm-vmscan-do-not-throttle-based-on-pfmemalloc-reserves-if-node-has-no-reclaimable-pages +++ a/mm/vmscan.c @@ -2646,7 +2646,8 @@ static bool pfmemalloc_watermark_ok(pg_d for (i = 0; i <= ZONE_NORMAL; i++) { zone = &pgdat->node_zones[i]; - if (!populated_zone(zone)) + if (!populated_zone(zone) || + zone_reclaimable_pages(zone) == 0) continue; pfmemalloc_reserve += min_wmark_pages(zone); _
Re: [PATCH v2] mm: vmscan: do not throttle based on pfmemalloc reserves if node has no reclaimable pages
On 08.05.2015 [15:47:26 -0700], Andrew Morton wrote: On Wed, 06 May 2015 11:28:12 +0200 Vlastimil Babka vba...@suse.cz wrote: On 05/06/2015 12:09 AM, Nishanth Aravamudan wrote: On 03.04.2015 [10:45:56 -0700], Nishanth Aravamudan wrote: What I find somewhat worrying though is that we could potentially break the pfmemalloc_watermark_ok() test in situations where zone_reclaimable_pages(zone) == 0 is a transient situation (and not a permanently allocated hugepage). In that case, the throttling is supposed to help system recover, and we might be breaking that ability with this patch, no? Well, if it's transient, we'll skip it this time through, and once there are reclaimable pages, we should notice it again. I'm not familiar enough with this logic, so I'll read through the code again soon to see if your concern is valid, as best I can. In reviewing the code, I think that transiently unreclaimable zones will lead to some higher direct reclaim rates and possible contention, but shouldn't cause any major harm. The likelihood of that situation, as well, in a non-reserved memory setup like the one I described, seems exceedingly low. OK, I guess when a reasonably configured system has nothing to reclaim, it's already busted and throttling won't change much. Consider the patch Acked-by: Vlastimil Babka vba...@suse.cz OK, thanks, I'll move this patch into the queue for 4.2-rc1. Thank you! Or is it important enough to merge into 4.1? I think 4.2 is sufficient, but I wonder now if I should have included a stable tag? The issue has been around for a while and there's a relatively easy workaround (use the per-node sysfs files to manually round-robin around the exhausted node) in older kernels, so I had decided against it before. Thanks, Nish
Re: [PATCH V3] cpuidle: Handle tick_broadcast_enter() failure gracefully
On 08/05/15 08:35, Preeti U Murthy wrote: When a CPU has to enter an idle state where tick stops, it makes a call to tick_broadcast_enter(). The call will fail if this CPU is the broadcast CPU. Today, under such a circumstance, the arch cpuidle code handles this CPU. This is not convincing because not only do we not know what the arch cpuidle code does, but we also do not account for the idle state residency time and usage of such a CPU. This scenario can be handled better by simply choosing an idle state where in ticks do not stop. To accommodate this change move the setting of runqueue idle state from the core to the cpuidle driver, else the rq->idle_state will be set wrong. Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com I gave it a spin on ARM64 Juno platform with one of the CPU in broadcast mode and Vexpress TC2 with broadcast timer. I found no issues in both the cases. So, you can add: Tested-by: Sudeep Holla sudeep.ho...@arm.com Regards, Sudeep
Re: [PATCH V3] cpuidle: Handle tick_broadcast_enter() failure gracefully
Hi Rafael, On 05/08/2015 07:48 PM, Rafael J. Wysocki wrote: +/* + * find_tick_valid_state - select a state where tick does not stop + * @dev: cpuidle device for this cpu + * @drv: cpuidle driver for this cpu + */ +static int find_tick_valid_state(struct cpuidle_device *dev, +struct cpuidle_driver *drv) +{ +int i, ret = -1; + +for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) { +struct cpuidle_state *s = drv->states[i]; +struct cpuidle_state_usage *su = dev->states_usage[i]; + +/* + * We do not explicitly check for latency requirement + * since it is safe to assume that only shallower idle + * states will have the CPUIDLE_FLAG_TIMER_STOP bit + * cleared and they will invariably meet the latency + * requirement. + */ +if (s->disabled || su->disable || +(s->flags & CPUIDLE_FLAG_TIMER_STOP)) +continue; + +ret = i; +} +return ret; +} + /** * cpuidle_enter_state - enter the state and update stats * @dev: cpuidle device for this cpu @@ -168,10 +199,17 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, * CPU as a broadcast timer, this call may fail if it is not available. */ if (broadcast && tick_broadcast_enter()) { -default_idle_call(); -return -EBUSY; +index = find_tick_valid_state(dev, drv); Well, the new state needs to be deeper than the old one or you may violate the governor's choice and this doesn't guarantee that. The comment above in find_tick_valid_state() explains why we are bound to choose a shallow idle state. I think it's safe to assume that any state deeper than this one would have the CPUIDLE_FLAG_TIMER_STOP flag set and hence would be skipped. Your patch relies on the assumption that the idle states are arranged in the increasing order of exit_latency/in the order of shallow to deep. This is not guaranteed, is it? Also I don't quite see a reason to duplicate the find_deepest_state() functionality here. Agreed. We could club them like in your patch.
+if (index < 0) { +default_idle_call(); +return -EBUSY; +} +target_state = drv->states[index]; } +/* Take note of the planned idle state. */ +idle_set_state(smp_processor_id(), target_state); And I wouldn't do this either. The behavior here is pretty much as though the driver demoted the state chosen by the governor and we don't call idle_set_state() again in those cases. Why is this wrong? The idea here is to set the idle state of the runqueue to the one that it is more likely to enter into. It is true that the state has been demoted, but I don't see any code that requires rq->idle_state to be only a governor-chosen state or nothing at all. This is a more important chunk of this patch because it allows us to track the idle states of the broadcast CPU. Else the system idle time is bound to be higher than the residency time in different idle states of all the CPUs. This shows up starkly as an anomaly if we are profiling cpuidle state entry/exit. + trace_cpu_idle_rcuidle(index, dev->cpu); time_start = ktime_get(); Overall, something like the patch below (untested) should work I suppose? With the exception of the above two points, yes this should work. --- drivers/cpuidle/cpuidle.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) Index: linux-pm/drivers/cpuidle/cpuidle.c === --- linux-pm.orig/drivers/cpuidle/cpuidle.c +++ linux-pm/drivers/cpuidle/cpuidle.c @@ -73,17 +73,19 @@ int cpuidle_play_dead(void) } static int find_deepest_state(struct cpuidle_driver *drv, - struct cpuidle_device *dev, bool freeze) + struct cpuidle_device *dev, bool freeze, + int limit, unsigned int flags_to_avoid) { unsigned int latency_req = 0; int i, ret = freeze ? -1 : CPUIDLE_DRIVER_STATE_START - 1; - for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) { + for (i = CPUIDLE_DRIVER_STATE_START; i < limit; i++) { struct cpuidle_state *s = drv->states[i]; struct cpuidle_state_usage *su = dev->states_usage[i]; if (s->disabled || su->disable || s->exit_latency <= latency_req - || (freeze && !s->enter_freeze)) + || (freeze && !s->enter_freeze) + || (s->flags & flags_to_avoid)) continue; latency_req = s->exit_latency; @@ -100,7 +102,7 @@ static int find_deepest_state(struct cpu int cpuidle_find_deepest_state(struct cpuidle_driver *drv,
[PATCH 0/3] Allow user to request memory to be locked on page fault
mlock() allows a user to control page out of program memory, but this comes at the cost of faulting in the entire mapping when it is allocated. For large mappings where the entire area is not necessary this is not ideal. This series introduces new flags for mmap() and mlockall() that allow a user to specify that the covered area should not be paged out, but only after the memory has been used the first time. The performance cost of these patches is minimal on the two benchmarks I have tested (stream and kernbench).

Avg throughput in MB/s from stream using 100 element arrays
Test              4.1-rc2      4.1-rc2+lock-on-fault
Copy:             10,979.08    10,917.34
Scale:            11,094.45    11,023.01
Add:              12,487.29    12,388.65
Triad:            12,505.77    12,418.78

Kernbench optimal load
                  4.1-rc2      4.1-rc2+lock-on-fault
Elapsed Time      71.046       71.324
User Time         62.117       62.352
System Time       8.926        8.969
Context Switches  14531.9      14542.5
Sleeps            14935.9      14939

Eric B Munson (3):
  Add flag to request pages are locked after page fault
  Add mlockall flag for locking pages on fault
  Add tests for lock on fault

 arch/alpha/include/uapi/asm/mman.h          |   2 +
 arch/mips/include/uapi/asm/mman.h           |   2 +
 arch/parisc/include/uapi/asm/mman.h         |   2 +
 arch/powerpc/include/uapi/asm/mman.h        |   2 +
 arch/sparc/include/uapi/asm/mman.h          |   2 +
 arch/tile/include/uapi/asm/mman.h           |   2 +
 arch/xtensa/include/uapi/asm/mman.h         |   2 +
 include/linux/mm.h                          |   1 +
 include/linux/mman.h                        |   3 +-
 include/uapi/asm-generic/mman.h             |   2 +
 mm/mlock.c                                  |  13 ++-
 mm/mmap.c                                   |   4 +-
 mm/swap.c                                   |   3 +-
 tools/testing/selftests/vm/Makefile         |   8 +-
 tools/testing/selftests/vm/lock-on-fault.c  | 145
 tools/testing/selftests/vm/on-fault-limit.c |  47 +
 tools/testing/selftests/vm/run_vmtests      |  23 +
 17 files changed, 254 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/vm/lock-on-fault.c
 create mode 100644 tools/testing/selftests/vm/on-fault-limit.c

Cc: Shuah Khan shua...@osg.samsung.com Cc: linux-al...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: linux-m...@linux-mips.org Cc:
linux-par...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparcli...@vger.kernel.org Cc: linux-xte...@linux-xtensa.org Cc: linux...@kvack.org Cc: linux-a...@vger.kernel.org Cc: linux-...@vger.kernel.org -- 1.9.1
[PATCH 2/3] Add mlockall flag for locking pages on fault
Building on the previous patch, extend mlockall() to give a process a way to specify that pages should be locked when they are faulted in, but that pre-faulting is not needed. Signed-off-by: Eric B Munson emun...@akamai.com Cc: linux-al...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: linux-m...@linux-mips.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparcli...@vger.kernel.org Cc: linux-xte...@linux-xtensa.org Cc: linux-a...@vger.kernel.org Cc: linux-...@vger.kernel.org Cc: linux...@kvack.org --- arch/alpha/include/uapi/asm/mman.h | 1 + arch/mips/include/uapi/asm/mman.h| 1 + arch/parisc/include/uapi/asm/mman.h | 1 + arch/powerpc/include/uapi/asm/mman.h | 1 + arch/sparc/include/uapi/asm/mman.h | 1 + arch/tile/include/uapi/asm/mman.h| 1 + arch/xtensa/include/uapi/asm/mman.h | 1 + include/uapi/asm-generic/mman.h | 1 + mm/mlock.c | 13 + 9 files changed, 17 insertions(+), 4 deletions(-) diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h index 15e96e1..3120dfb 100644 --- a/arch/alpha/include/uapi/asm/mman.h +++ b/arch/alpha/include/uapi/asm/mman.h @@ -38,6 +38,7 @@ #define MCL_CURRENT 8192 /* lock all currently mapped pages */ #define MCL_FUTURE 16384 /* lock all additions to address space */ +#define MCL_ON_FAULT 32768 /* lock all pages that are faulted in */ #define MADV_NORMAL0 /* no further special treatment */ #define MADV_RANDOM1 /* expect random page references */ diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h index 47846a5..82aec3c 100644 --- a/arch/mips/include/uapi/asm/mman.h +++ b/arch/mips/include/uapi/asm/mman.h @@ -62,6 +62,7 @@ */ #define MCL_CURRENT1 /* lock all current mappings */ #define MCL_FUTURE 2 /* lock all future mappings */ +#define MCL_ON_FAULT 4 /* lock all pages that are faulted in */ #define MADV_NORMAL0 /* no further special treatment */ #define MADV_RANDOM1 /* expect random page references */ diff --git 
a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h index 1514cd7..f4601f3 100644 --- a/arch/parisc/include/uapi/asm/mman.h +++ b/arch/parisc/include/uapi/asm/mman.h @@ -32,6 +32,7 @@ #define MCL_CURRENT1 /* lock all current mappings */ #define MCL_FUTURE 2 /* lock all future mappings */ +#define MCL_ON_FAULT 4 /* lock all pages that are faulted in */ #define MADV_NORMAL 0 /* no further special treatment */ #define MADV_RANDOM 1 /* expect random page references */ diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h index fce74fe..0a28efc 100644 --- a/arch/powerpc/include/uapi/asm/mman.h +++ b/arch/powerpc/include/uapi/asm/mman.h @@ -22,6 +22,7 @@ #define MCL_CURRENT 0x2000 /* lock all currently mapped pages */ #define MCL_FUTURE 0x4000 /* lock all additions to address space */ +#define MCL_ON_FAULT 0x8 /* lock all pages that are faulted in */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h index 12425d8..119be80 100644 --- a/arch/sparc/include/uapi/asm/mman.h +++ b/arch/sparc/include/uapi/asm/mman.h @@ -17,6 +17,7 @@ #define MCL_CURRENT 0x2000 /* lock all currently mapped pages */ #define MCL_FUTURE 0x4000 /* lock all additions to address space */ +#define MCL_ON_FAULT 0x8 /* lock all pages that are faulted in */ #define MAP_POPULATE 0x8000 /* populate (prefault) pagetables */ #define MAP_NONBLOCK 0x1 /* do not block on IO */ diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h index ec04eaf..66ea935 100644 --- a/arch/tile/include/uapi/asm/mman.h +++ b/arch/tile/include/uapi/asm/mman.h @@ -37,6 +37,7 @@ */ #define MCL_CURRENT1 /* lock all current mappings */ #define MCL_FUTURE 2 /* lock all future mappings */ +#define MCL_ON_FAULT 4 /* lock all pages that are faulted in */ #endif /* _ASM_TILE_MMAN_H */ diff --git 
a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h index 42d43cc..9abcc29 100644 --- a/arch/xtensa/include/uapi/asm/mman.h +++ b/arch/xtensa/include/uapi/asm/mman.h @@ -75,6 +75,7 @@ */ #define MCL_CURRENT1 /* lock all current mappings */ #define MCL_FUTURE 2 /* lock all future mappings */ +#define MCL_ON_FAULT 4 /* lock all pages that are faulted in */ #define MADV_NORMAL0
Re: [PATCH v4 02/21] powerpc/powernv: Enable M64 on P7IOC
On 05/01/2015 04:02 PM, Gavin Shan wrote: The patch enables M64 window on P7IOC, which has been enabled on PHB3. Comparing to PHB3, there are 16 M64 BARs and each of them are divided to 8 segments. compared to something means you will tell about PHB3 too :) Do I understand correctly that IODA==IODA1==P7IOC and P7IOC != IODA2? The code does not use PHB3 or P7IOC acronym so it is a bit confusing. So each PHB can support 128 M64 segments. Also, P7IOC has M64DT, which helps mapping one particular M64 segment# to arbitrary PE#. However, we just provide 128 M64 (16 BARs) segments and fixed mapping between PE# and M64 segment# in order to keep same logic to support M64 for PHB3 and P7IOC. In turn, we just need different phb-init_m64() hooks for P7IOC and PHB3. Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com --- arch/powerpc/platforms/powernv/pci-ioda.c | 115 ++ 1 file changed, 103 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index f8bc950..646962f 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -165,6 +165,67 @@ static void pnv_ioda_free_pe(struct pnv_phb *phb, int pe) clear_bit(pe, phb-ioda.pe_alloc); } +static int pnv_ioda1_init_m64(struct pnv_phb *phb) +{ + struct resource *r; + int seg; + s64 rc; Here @rc is of the s64 type. + + /* Each PHB supports 16 separate M64 BARs, each of which are +* divided into 8 segments. So there are number of M64 segments +* as total PE#, which is 128. +*/ there are as many M64 segments as a maximum number of PEs which is 128? 
+ for (seg = 0; seg < phb->ioda.total_pe; seg += 8) { + unsigned long base; + + base = phb->ioda.m64_base + seg * phb->ioda.m64_segsize; + rc = opal_pci_set_phb_mem_window(phb->opal_id, +OPAL_M64_WINDOW_TYPE, +seg / 8, +base, +0, /* unused */ +8 * phb->ioda.m64_segsize); + if (rc != OPAL_SUCCESS) { + pr_warn("  Failure %lld configuring M64 BAR#%d on PHB#%d\n", + rc, seg / 8, phb->hose->global_number); + goto fail; + } + + rc = opal_pci_phb_mmio_enable(phb->opal_id, + OPAL_M64_WINDOW_TYPE, + seg / 8, + OPAL_ENABLE_M64_SPLIT); + if (rc != OPAL_SUCCESS) { + pr_warn("  Failure %lld enabling M64 BAR#%d on PHB#%d\n", + rc, seg / 8, phb->hose->global_number); + goto fail; + } + } + + /* Strip of the segment used by the reserved PE, which +* is expected to be 0 or last supported PE# +*/ + r = &phb->hose->mem_resources[1]; mem_resources[0] is IO, mem_resources[1] is MMIO, mem_resources[2] is for what? Would be nice to have this commented somewhere. + if (phb->ioda.reserved_pe == 0) + r->start += phb->ioda.m64_segsize; + else if (phb->ioda.reserved_pe == (phb->ioda.total_pe - 1)) + r->end -= phb->ioda.m64_segsize; + else + pr_warn("  Cannot strip M64 segment for reserved PE#%d\n", + phb->ioda.reserved_pe); + + return 0; + +fail: + for ( ; seg >= 0; seg -= 8) + opal_pci_phb_mmio_enable(phb->opal_id, +OPAL_M64_WINDOW_TYPE, +seg / 8, +OPAL_DISABLE_M64); Out of curiosity - is not there a counterpart for opal_pci_set_phb_mem_window() for cleanup?
+
+	return -EIO;
+}
+
 /* The default M64 BAR is shared by all PEs */
 static int pnv_ioda2_init_m64(struct pnv_phb *phb)
 {
@@ -222,7 +283,7 @@ fail:
 	return -EIO;
 }
 
-static void pnv_ioda2_reserve_m64_pe(struct pnv_phb *phb)
+static void pnv_ioda_reserve_m64_pe(struct pnv_phb *phb)
 {
 	resource_size_t sgsz = phb->ioda.m64_segsize;
 	struct pci_dev *pdev;
@@ -248,8 +309,8 @@ static void pnv_ioda2_reserve_m64_pe(struct pnv_phb *phb)
 	}
 }
 
-static int pnv_ioda2_pick_m64_pe(struct pnv_phb *phb,
-				 struct pci_bus *bus, int all)
+static int pnv_ioda_pick_m64_pe(struct pnv_phb *phb,
+				struct pci_bus *bus, int all)
 {
 	resource_size_t segsz = phb->ioda.m64_segsize;
 	struct pci_dev *pdev;
@@ -346,6 +407,28 @@ done:
 	pe->master = master_pe;
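[Editorial aside: the fixed window layout the patch programs (16 BARs x 8 segments = 128 segments, segment# == PE#, each BAR covering 8 consecutive segments) can be modelled outside the kernel. The sketch below is an illustrative user-space calculation, not kernel code; the base address and segment size are hypothetical placeholders, and bar_window_base() is a made-up name.]

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define M64_BARS      16                        /* per PHB, from the commit message */
#define SEGS_PER_BAR  8
#define TOTAL_PE      (M64_BARS * SEGS_PER_BAR) /* 128 */

/* For a given M64 segment number, return the window base its BAR must
 * be programmed with, mirroring base = m64_base + seg * m64_segsize
 * in pnv_ioda1_init_m64(). */
static uint64_t bar_window_base(uint64_t m64_base, uint64_t segsize, int seg)
{
	return m64_base + (uint64_t)seg * segsize;
}

int main(void)
{
	uint64_t m64_base = 0x3fc000000000ull;  /* hypothetical PHB M64 base    */
	uint64_t segsize  = 0x10000000ull;      /* hypothetical 256MB segment   */
	int seg;

	for (seg = 0; seg < TOTAL_PE; seg += SEGS_PER_BAR) {
		int bar = seg / SEGS_PER_BAR;   /* same BAR index math as the loop above */
		uint64_t base = bar_window_base(m64_base, segsize, seg);
		uint64_t size = SEGS_PER_BAR * segsize;

		printf("BAR#%-2d covers PE#%3d..%3d: [%#llx, %#llx)\n",
		       bar, seg, seg + SEGS_PER_BAR - 1,
		       (unsigned long long)base,
		       (unsigned long long)(base + size));
	}

	/* Consecutive BAR windows tile the M64 space with no gaps. */
	assert(bar_window_base(m64_base, segsize, SEGS_PER_BAR) ==
	       m64_base + SEGS_PER_BAR * segsize);
	return 0;
}

This also makes the error path visible: if programming BAR seg/8 fails, every previously enabled BAR (seg-8, seg-16, ...) must be walked back, which is exactly what the `for ( ; seg >= 0; seg -= 8)` cleanup loop does.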
Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
On Fri, 08 May 2015, Andrew Morton wrote: On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson emun...@akamai.com wrote: mlock() allows a user to control page out of program memory, but this comes at the cost of faulting in the entire mapping when it is allocated. For large mappings where the entire area is not necessary this is not ideal. This series introduces new flags for mmap() and mlockall() that allow a user to specify that the covered area should not be paged out, but only after the memory has been used the first time. Please tell us much much more about the value of these changes: the use cases, the behavioural improvements and performance results which the patchset brings to those use cases, etc. The primary use case is for mmapping large files read only. The process knows that some of the data is necessary, but it is unlikely that the entire file will be needed. The developer only wants to pay the cost to read the data in once. Unfortunately, the developer must choose between allowing the kernel to page in the memory as needed and guaranteeing that the data will only be read from disk once. The first option runs the risk of having the memory reclaimed if the system is under memory pressure; the second forces the memory usage and startup delay of faulting in the entire file. I am working on getting startup times with and without this change for an application; I will post them as soon as I have them. Eric
Re: [PATCH v4 00/21] PowerPC/PowerNV: PCI Slot Management
On 05/01/2015 04:02 PM, Gavin Shan wrote: The series of patches intends to support PCI slots for the PowerPC PowerNV platform, which runs on top of the skiboot firmware. The patchset requires corresponding changes in the skiboot firmware, which were sent to skib...@lists.ozlabs.org for review. The PCI slots are exposed by skiboot with device node properties, and the kernel utilizes those properties to populate PCI slots accordingly. The original PCI infrastructure on the PowerNV platform can't support hotplug because the PE is assigned during PHB fixup time, which is called only once, during system boot. For this, the PCI infrastructure on the PowerNV platform has been reworked substantially. After that, the PE and its corresponding resources (IODT, M32DT, M64 segments, DMA32 and bypass window) are assigned upon updating a PCI bridge's resources, which might decide the PE# assigned to the PE (e.g. M64 resources, on P8 strictly speaking). Out of curiosity - does this PCI scan happen when the memory subsystem is initialized? More precisely, after these changes, won't pnv_pci_ioda2_setup_dma_pe() be called too early after boot, so that I won't be able to use kmalloc() to allocate iommu_tables? Also, checkpatch.pl failed multiple times on the series. Please fix. Each PE will maintain a reference count, which is (number of child PCI devices + 1). That indicates that when the last child PCI device leaves the PE, the PE and its included resources will be released and put back into the free pool again. With this design, the PE will be released when the EEH PE is released. PATCH[1 - 8] are related to this part. From the skiboot perspective, the PCI slot provides (hot/fundamental/complete) resets to EEH. The kernel gets to know whether skiboot supports the various resets on one particular PCI slot through its device-tree node. If it does, EEH will utilize the functionality provided by skiboot. Besides, the device-tree nodes have to change in order to support PCI hotplug.
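[Editorial aside: the PE lifecycle the cover letter describes - a refcount of (number of child PCI devices + 1), with resources returned to the free pool on the last put - can be sketched in a few lines. This is an illustrative user-space model under stated assumptions, not the patchset's actual code; the names pnv_pe, pe_get() and pe_put() are made up for the example.]

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of the PE lifecycle: the PE starts with refcount 1 (its
 * own reference) plus one per child device; when the count hits zero,
 * the PE# and its segments/windows would go back to the free pool. */
struct pnv_pe {
	int refcount;
	bool released;
};

static void pe_get(struct pnv_pe *pe)
{
	pe->refcount++;          /* a child PCI device joins the PE */
}

static void pe_put(struct pnv_pe *pe)
{
	if (--pe->refcount == 0) {
		/* would free PE#, IODT/M32DT entries, M64 segments, DMA windows */
		pe->released = true;
		printf("PE released back to free pool\n");
	}
}

int main(void)
{
	struct pnv_pe pe = { .refcount = 1, .released = false };

	pe_get(&pe);               /* first child device added  */
	pe_get(&pe);               /* second child device added */
	assert(pe.refcount == 3);  /* children + 1, as in the cover letter */

	pe_put(&pe);               /* hot-remove first child  */
	pe_put(&pe);               /* hot-remove second child */
	assert(!pe.released);      /* PE's own reference is still held */

	pe_put(&pe);               /* PE itself torn down, e.g. on EEH PE release */
	assert(pe.released);
	return 0;
}

The "+ 1" is what lets the PE outlive a transient state where all children are gone but the PE is still being torn down, matching the claim that the PE is released together with the EEH PE.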
For example, when one PCI adapter is inserted into a slot, its device-tree node should be added to the system dynamically. Conversely, the device-tree node should be removed from the system when the PCI adapter goes offline. Since pci_dn and eeh_dev have the same life cycle as PCI device nodes, they should be added/removed accordingly during PCI hotplug. PATCH[9 - 20] are doing the related work. The last patch is the standalone PCI hotplug driver for the PowerNV platform. When removing a PCI adapter from a PCI slot, which is invoked by a command in userland, skiboot will power off the slot to save power and remove the device-tree nodes for all PCI devices behind the slot. Conversely, when power to the slot is turned on, the PCI devices behind the slot are rescanned, and the device-tree nodes for the newly detected PCI devices will be built in skiboot. In both cases, a message is sent to the kernel by skiboot so that the kernel can adjust the device-tree accordingly. At the same time, the kernel also has to deallocate or allocate the PE# and its related resources for the removed/added PCI devices.

Changelog
=========
v4:
* Rebased to 4.1.RC1
* Added API to unflatten FDT blob to a device node sub-tree, which is attached to the indicated parent device node. The original mechanism based on a formatted string stream has been dropped.
* The PATCH[v3 09/21] (powerpc/eeh: Delay probing EEH device during hotplug) was picked up and sent to linux-ppc@ separately for review as Richard's VF EEH Support depends on that.
v3:
* Rebased to 4.1.RC0
* PowerNV PCI infrastructure is totally refactored in order to support PCI hotplug. The PowerNV hotplug driver is also reworked a lot because of the changes in skiboot in order to support PCI hotplug.
Gavin Shan (21): pci: Add pcibios_setup_bridge() powerpc/powernv: Enable M64 on P7IOC powerpc/powernv: M64 support improvement powerpc/powernv: Improve IO and M32 mapping powerpc/powernv: Improve DMA32 segment assignment powerpc/powernv: Create PEs dynamically powerpc/powernv: Release PEs dynamically powerpc/powernv: Drop pnv_ioda_setup_dev_PE() powerpc/powernv: Use PCI slot reset infrastructure powerpc/powernv: Fundamental reset for PCI bus reset powerpc/pci: Don't scan empty slot powerpc/pci: Move pcibios_find_pci_bus() around powerpc/powernv: Introduce pnv_pci_poll() powerpc/powernv: Functions to get/reset PCI slot status powerpc/pci: Delay creating pci_dn powerpc/pci: Create eeh_dev while creating pci_dn powerpc/pci: Export traverse_pci_device_nodes() powerpc/pci: Update bridge windows on PCI plugging drivers/of: Support adding sub-tree powerpc/powernv: Select OF_DYNAMIC pci/hotplug: PowerPC PowerNV PCI hotplug driver arch/powerpc/include/asm/eeh.h |7 +- arch/powerpc/include/asm/opal-api.h|7 +- arch/powerpc/include/asm/opal.h|7 +- arch/powerpc/include/asm/pci-bridge.h |
[PATCH 1/3] Add flag to request pages are locked after page fault
The cost of faulting in all memory to be locked can be very high when working with large mappings. If only portions of the mapping will be used this can incur a high penalty for locking. This patch introduces the ability to request that pages are not pre-faulted, but are placed on the unevictable LRU when they are finally faulted in. To keep accounting checks out of the page fault path, users are billed for the entire mapping lock as if MAP_LOCKED was used.

Signed-off-by: Eric B Munson emun...@akamai.com
Cc: linux-al...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@linux-mips.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
---
 arch/alpha/include/uapi/asm/mman.h   | 1 +
 arch/mips/include/uapi/asm/mman.h    | 1 +
 arch/parisc/include/uapi/asm/mman.h  | 1 +
 arch/powerpc/include/uapi/asm/mman.h | 1 +
 arch/sparc/include/uapi/asm/mman.h   | 1 +
 arch/tile/include/uapi/asm/mman.h    | 1 +
 arch/xtensa/include/uapi/asm/mman.h  | 1 +
 include/linux/mm.h                   | 1 +
 include/linux/mman.h                 | 3 ++-
 include/uapi/asm-generic/mman.h      | 1 +
 mm/mmap.c                            | 4 ++--
 mm/swap.c                            | 3 ++-
 12 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 0086b47..15e96e1 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -30,6 +30,7 @@
 #define MAP_NONBLOCK	0x40000		/* do not block on IO */
 #define MAP_STACK	0x80000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x100000	/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x200000	/* Lock pages after they are faulted in, do not prefault */
 
 #define MS_ASYNC	1		/* sync memory asynchronously */
 #define MS_SYNC	2		/* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index cfcb876..47846a5 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -48,6 +48,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x100000	/* Lock pages after they are faulted in, do not prefault */
 
 /*
  * Flags for msync
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 294d251..1514cd7 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -24,6 +24,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x100000	/* Lock pages after they are faulted in, do not prefault */
 
 #define MS_SYNC	1		/* synchronous memory sync */
 #define MS_ASYNC	2		/* sync memory asynchronously */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 6ea26df..fce74fe 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -27,5 +27,6 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x80000		/* Lock pages after they are faulted in, do not prefault */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 0b14df3..12425d8 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -22,6 +22,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_LOCKONFAULT	0x80000		/* Lock pages after they are faulted in, do not prefault */
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index 81b8fc3..ec04eaf 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -29,6 +29,7 @@
 #define MAP_DENYWRITE	0x0800	/* ETXTBSY */
 #define
Re: [PATCH 0/3] Allow user to request memory to be locked on page fault
On Fri, 8 May 2015 16:06:10 -0400 Eric B Munson emun...@akamai.com wrote: On Fri, 08 May 2015, Andrew Morton wrote: On Fri, 8 May 2015 15:33:43 -0400 Eric B Munson emun...@akamai.com wrote: mlock() allows a user to control page out of program memory, but this comes at the cost of faulting in the entire mapping when it is allocated. For large mappings where the entire area is not necessary this is not ideal. This series introduces new flags for mmap() and mlockall() that allow a user to specify that the covered area should not be paged out, but only after the memory has been used the first time. Please tell us much much more about the value of these changes: the use cases, the behavioural improvements and performance results which the patchset brings to those use cases, etc. The primary use case is for mmapping large files read only. The process knows that some of the data is necessary, but it is unlikely that the entire file will be needed. The developer only wants to pay the cost to read the data in once. Unfortunately, the developer must choose between allowing the kernel to page in the memory as needed and guaranteeing that the data will only be read from disk once. The first option runs the risk of having the memory reclaimed if the system is under memory pressure; the second forces the memory usage and startup delay of faulting in the entire file. Why can't the application mmap only those parts of the file which it wants and mlock those? I am working on getting startup times with and without this change for an application, I will post them as soon as I have them.