Re: [PATCH v10 0/8] arm64, numa: Add numa support for arm64 platforms

2016-02-02 Thread Robert Richter
On 02.02.16 15:39:15, Ganapatrao Kulkarni wrote:

> Ganapatrao Kulkarni (8):
>   arm64, numa: adding numa support for arm64 platforms.
>   Documentation, dt, numa: dt bindings for numa.
>   dt, numa: adding numa dt binding implementation.
>   arm64, numa : Enable numa dt for arm64 platforms.
>   arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node
> topology.
>   arm64, mm, numa: Adding numa balancing support for arm64.
>   topology, cleanup: Avoid redefinition of cpumask_of_pcibus in asm
> header files.
>   numa, mm, cleanup: remove redundant NODE_DATA macro from asm header
> files.

I have tested the whole series on single and dual node systems for
devicetree and acpi (with Hanjun's acpi numa v3 patches ported on
top).

Tested-by: Robert Richter 

-Robert

> 
>  Documentation/devicetree/bindings/numa.txt  | 272 
>  arch/arm64/Kconfig  |  26 +
>  arch/arm64/boot/dts/cavium/Makefile |   2 +-
>  arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts  |  83 +++
>  arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi | 806 
> 
>  arch/arm64/include/asm/mmzone.h |  10 +
>  arch/arm64/include/asm/numa.h   |  45 ++
>  arch/arm64/include/asm/pgtable.h|  18 +
>  arch/arm64/include/asm/topology.h   |   7 +
>  arch/arm64/kernel/pci.c |  10 +
>  arch/arm64/kernel/setup.c   |   4 +
>  arch/arm64/kernel/smp.c |   4 +
>  arch/arm64/mm/Makefile  |   1 +
>  arch/arm64/mm/init.c|  34 +-
>  arch/arm64/mm/mmu.c |   1 +
>  arch/arm64/mm/numa.c| 404 
>  arch/ia64/include/asm/topology.h|   4 -
>  arch/m32r/include/asm/mmzone.h  |   4 +-
>  arch/metag/include/asm/mmzone.h |   4 +-
>  arch/metag/include/asm/topology.h   |   3 -
>  arch/powerpc/include/asm/mmzone.h   |   8 +-
>  arch/powerpc/include/asm/topology.h |   4 -
>  arch/s390/include/asm/mmzone.h  |   6 +-
>  arch/s390/include/asm/pci.h |   2 +-
>  arch/s390/include/asm/topology.h|   1 +
>  arch/sh/include/asm/mmzone.h|   4 +-
>  arch/sh/include/asm/topology.h  |   3 -
>  arch/sparc/include/asm/mmzone.h |   6 +-
>  arch/tile/include/asm/pci.h |   2 -
>  arch/tile/include/asm/topology.h|   3 +
>  arch/x86/include/asm/mmzone.h   |   3 +-
>  arch/x86/include/asm/mmzone_32.h|   5 -
>  arch/x86/include/asm/mmzone_64.h|  17 -
>  arch/x86/include/asm/pci.h  |   2 +-
>  arch/x86/include/asm/topology.h |   1 +
>  drivers/of/Kconfig  |  11 +
>  drivers/of/Makefile |   1 +
>  drivers/of/of_numa.c| 207 ++
>  include/asm-generic/mmzone.h|  24 +
>  include/asm-generic/topology.h  |   4 +-
>  include/linux/of.h  |   4 +
>  41 files changed, 1986 insertions(+), 74 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/numa.txt
>  create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
>  create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
>  create mode 100644 arch/arm64/include/asm/mmzone.h
>  create mode 100644 arch/arm64/include/asm/numa.h
>  create mode 100644 arch/arm64/mm/numa.c
>  delete mode 100644 arch/x86/include/asm/mmzone_64.h
>  create mode 100644 drivers/of/of_numa.c
>  create mode 100644 include/asm-generic/mmzone.h
> 
> -- 
> 1.8.1.4
> 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/2] powerpc/perf/hv-24x7: Display change in counter values

2016-02-02 Thread Madhavan Srinivasan


On Saturday 30 January 2016 08:37 AM, Sukadev Bhattiprolu wrote:
> From a1aa992fb25fb8e98a5c5724376ae8cc91463de3 Mon Sep 17 00:00:00 2001
> From: Sukadev Bhattiprolu 
> Date: Mon, 25 Jan 2016 23:05:36 -0500
> Subject: [PATCH 2/2] powerpc/perf/hv-24x7: Display change in counter values
>
> For 24x7 counters, perf displays the raw value of the 24x7 counter, which
> is a monotonically increasing value.
>
>   perf stat -C 0 -e \
>   'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
>   sleep 1
>
>  Performance counter stats for 'CPU(s) 0':
>
>  9,105,403,170  hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/
>
>0.000425751 seconds time elapsed
>
> In the typical usage of 'perf stat' this counter value is not as useful
> as the _change_ in the counter value over the duration of the application.

Won't this break applications that use this interface? Until now the
counter output has been the raw value, so an application may already
post-process it to calculate the difference; with this patch such an
application would need to change. Also, shouldn't this be documented
somewhere?

Maddy

> Have h_24x7_event_init() set the event's prev_count to the raw value of
> the 24x7 counter at the time of initialization. When the application
> terminates, h_24x7_event_read() will compute the change in value and
> report to the perf tool. Similarly, for the transaction interface, clear
> the event count to 0 at the beginning of the transaction.
>
>   perf stat -C 0 -e \
>   'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
>   sleep 1
>
>  Performance counter stats for 'CPU(s) 0':
>
>245,758  hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/
>
>1.006366383 seconds time elapsed
>
> Signed-off-by: Sukadev Bhattiprolu 
> ---
>  arch/powerpc/perf/hv-24x7.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
> index b7a9a03..77b958f 100644
> --- a/arch/powerpc/perf/hv-24x7.c
> +++ b/arch/powerpc/perf/hv-24x7.c
> @@ -1222,11 +1222,12 @@ static int h_24x7_event_init(struct perf_event *event)
>   return -EACCES;
>   }
>  
> - /* see if the event complains */
> + /* Get the initial value of the counter for this event */
>   if (single_24x7_request(event, &ct)) {
>   pr_devel("test hcall failed\n");
>   return -EIO;
>   }
> + (void)local64_xchg(&event->hw.prev_count, ct);
>  
>   return 0;
>  }
> @@ -1289,6 +1290,16 @@ static void h_24x7_event_read(struct perf_event *event)
>   h24x7hw = &get_cpu_var(hv_24x7_hw);
>   h24x7hw->events[i] = event;
>   put_cpu_var(h24x7hw);
> + /*
> +  * Clear the event count so we can compute the _change_
> +  * in the 24x7 raw counter value at the end of the txn.
> +  *
> +  * Note that we could alternatively read the 24x7 value
> +  * now and save its value in event->hw.prev_count. But
> +  * that would require issuing a hcall, which would then
> +  * defeat the purpose of using the txn interface.
> +  */
> + local64_set(&event->count, 0);
>   }
>  
>   put_cpu_var(hv_24x7_reqb);


[PATCH v8 6/6] cpufreq: powernv: Add sysfs attributes to show throttle stats

2016-02-02 Thread Shilpasri G Bhat
Create sysfs attributes to export throttle information in
/sys/devices/system/cpu/cpufreq/chipX. The newly added sysfs files are as
follows:

1) /sys/devices/system/cpu/cpufreq/chipX/throttle_table
This table details how many times Pmax was limited to each frequency
and for which throttle reason. Frequencies are listed in rows and
throttle reasons in columns. Each cell holds the number of times Pmax
was limited to the frequency in its row for the reason in its column.
The 'Unthrottle' column gives the number of times the chip was
unthrottled back to Pmax after being throttled to that frequency.
# cat /sys/devices/system/cpu/cpufreq/chip0/throttle_table
Frequency   Unthrottle  PowerCap    OverTemp    ...
4322000     0           0           0
4289000     0           0           0
4256000     0           0           0
4222000     0           0           0
4189000     0           0           0
4156000     3           0           3
4123000     4           0           4
...

2) /sys/devices/system/cpu/cpufreq/chipX/throttle_stat
This gives the total number of times the maximum frequency was throttled
to a lower frequency, counted separately for the turbo range and the
sub-turbo (at and below nominal) range of frequencies.
# cat /sys/devices/system/cpu/cpufreq/chip0/throttle_stat
turbo 7
sub-turbo 0

3) /sys/devices/system/cpu/cpufreq/chipX/chip_mask
This gives the list of cpus present in the chip.
# cat /sys/devices/system/cpu/cpufreq/chip0/chip_mask
0-31
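
As a rough illustration of the throttle_table layout described above, a
minimal userspace sketch could keep one counter per (frequency, reason)
pair. This is not the driver code: struct throttle_table_sketch,
record_throttle(), the enum and the table size are hypothetical names and
values chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical column indices and row count, chosen only for this sketch. */
enum throttle_reason_idx { UNTHROTTLE, POWERCAP, OVERTEMP, NR_REASONS };
#define NR_FREQS 3

struct throttle_table_sketch {
	unsigned int freq_khz[NR_FREQS];          /* one row per frequency */
	unsigned int count[NR_FREQS][NR_REASONS]; /* one column per reason */
};

/* Bump the cell for (frequency row, reason column); ignore bad indices. */
static void record_throttle(struct throttle_table_sketch *t,
			    unsigned int freq_idx, unsigned int reason)
{
	if (freq_idx < NR_FREQS && reason < NR_REASONS)
		t->count[freq_idx][reason]++;
}
```

Each throttle event increments exactly one cell, so reading a row shows
how often Pmax was limited to that frequency and why.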

Signed-off-by: Shilpasri G Bhat 
Cc: linux-...@vger.kernel.org
---
Changes from v7:
- Replace throttle_frequencies and throttle_reasons/ 
  sysfs attributes with a 2d table 'throttle_table' which lists all
  frequencies in rows and throttle reasons in columns.
- Add 'chip_mask' attribute to show the list of cpus in the chip.
- Replace the kobject pointer with the variable in struct chip.
- Add 'pstate' member to struct chip to store last throttled pstate index.
- Fixes in the error-out-paths 'free_*' in init_chip_info() to avoid
  freeing unallocated pointers.
- Explicitly call 'sysfs_remove_group()' while cleaning up before 
  kobject_put()
- Replacements with snprintf(), __ATTR_RO() and container_of()
- Modified commit message and Documentation.

Changes from v6:
- Rename struct chip members 'throt_{nominal/turbo}' to throttle_*
- Rename sysfs throttle_reason attribute 'throttle_reset' to
  'unthrottle_count'
- Add sysfs attribute details in
  Documentation/ABI/testing/sysfs-devices-system-cpu
- Add helper routine get_chip_index_from_kobj() for throttle sysfs
  attribute show() to get chip index from kobject.
- Add the chip id in the pr_warn_once

No changes from v5.

Changes from v4:
- Taken care of Gautham's comments to use inline get_chip_index()

Changes from v3:
- Separate the patch to contain only the throttle sysfs attribute changes.
- Add helper inline function get_chip_index()

Changes from v2:
- Fixed kbuild test warning.
drivers/cpufreq/powernv-cpufreq.c:609:2: warning: ignoring return
value of 'kstrtoint', declared with attribute warn_unused_result
[-Wunused-result]

Changes from v1:
- Added a kobject to struct chip
- Grouped the throttle reasons under a separate attribute_group and
  exported each reason as individual file.
- Moved the sysfs files from /sys/devices/system/node/nodeN to
  /sys/devices/system/cpu/cpufreq/chipN
- As suggested by Paul Clarke replaced 'Nominal' with 'sub-turbo'.

 Documentation/ABI/testing/sysfs-devices-system-cpu |  66 +++
 drivers/cpufreq/powernv-cpufreq.c  | 197 +++--
 2 files changed, 253 insertions(+), 10 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index b683e8e..84ff57a 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -271,3 +271,69 @@ Description:   Parameters for the CPU cache attributes
- WriteBack: data is written only to the cache line and
 the modified cache line is written to main
 memory only when it is replaced
+
+What:  /sys/devices/system/cpu/cpufreq/chipX/
+Date:  Feb 2016
+Contact:   Linux kernel mailing list 
+   Linux for PowerPC mailing list 
+Description:   POWERNV CPUFreq driver's frequency throttle stats directory for
+   the chip
+
+   This directory contains the CPU frequency throttle attributes
+   for the chip. It is named using the hardware chip-id in the
+   format of 'chip'. 

[PATCH v8 5/6] cpufreq: powernv: Replace pr_info with trace print for throttle event

2016-02-02 Thread Shilpasri G Bhat
Currently we use a printk message to notify the throttle event, but this
can flood the console if the cpu is throttled frequently. So replace the
printk with a tracepoint to notify the throttle event. Also reduce events
like throttling below nominal frequency and OCC_RESET to
pr_warn/pr_warn_once, as pointed out by MFG, so that they are not marked
as critical messages. This patch adds 'throttle_reason' to struct chip to
store the throttle reason.
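
The "warn once" behaviour mentioned above can be sketched in userspace C.
This is only an illustration of the idea, assuming a hypothetical
warn_once_sketch() macro as a stand-in for the kernel's pr_warn_once();
report_pmax() and warnings_printed are likewise invented for the example.

```c
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed; /* counts messages that reached stderr */

/*
 * Hypothetical userspace stand-in for pr_warn_once(): each call site
 * owns one static flag, so its message is printed at most once.
 */
#define warn_once_sketch(...)                         \
	do {                                          \
		static bool warned_;                  \
		if (!warned_) {                       \
			warned_ = true;               \
			warnings_printed++;           \
			fprintf(stderr, __VA_ARGS__); \
		}                                     \
	} while (0)

/* Called on every throttle check; only the first sub-nominal event logs. */
static void report_pmax(unsigned int chip_id, int pmax, int nominal)
{
	if (pmax < nominal)
		warn_once_sketch("Chip %u: Pmax below nominal (%d < %d)\n",
				 chip_id, pmax, nominal);
}
```

However often report_pmax() sees a sub-nominal Pmax, the console gets a
single line; the per-event detail is what the tracepoint is meant to carry.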

Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Gautham R. Shenoy 
Acked-by: Viresh Kumar 
---
No changes from v7.

 drivers/cpufreq/powernv-cpufreq.c | 73 ++-
 1 file changed, 34 insertions(+), 39 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index c670314..1bbc10a 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -45,12 +46,22 @@ static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
 static unsigned int *core_to_chip_map;
 
+static const char * const throttle_reason[] = {
+   "No throttling",
+   "Power Cap",
+   "Processor Over Temperature",
+   "Power Supply Failure",
+   "Over Current",
+   "OCC Reset"
+};
+
 static struct chip {
unsigned int id;
bool throttled;
+   bool restore;
+   u8 throttle_reason;
cpumask_t mask;
struct work_struct throttle;
-   bool restore;
 } *chips;
 
 static int nr_chips;
@@ -331,17 +342,17 @@ static void powernv_cpufreq_throttle_check(void *data)
goto next;
chips[i].throttled = true;
if (pmsr_pmax < powernv_pstate_info.nominal)
-   pr_crit("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
-   cpu, chips[i].id, pmsr_pmax,
-   powernv_pstate_info.nominal);
-   else
-   pr_info("CPU %d on Chip %u has Pmax reduced below turbo frequency (%d < %d)\n",
-   cpu, chips[i].id, pmsr_pmax,
-   powernv_pstate_info.max);
+   pr_warn_once("CPU %d on Chip %u has Pmax reduced below nominal frequency (%d < %d)\n",
+cpu, chips[i].id, pmsr_pmax,
+powernv_pstate_info.nominal);
+   trace_powernv_throttle(chips[i].id,
+ throttle_reason[chips[i].throttle_reason],
+ pmsr_pmax);
} else if (chips[i].throttled) {
chips[i].throttled = false;
-   pr_info("CPU %d on Chip %u has Pmax restored to %d\n", cpu,
-   chips[i].id, pmsr_pmax);
+   trace_powernv_throttle(chips[i].id,
+ throttle_reason[chips[i].throttle_reason],
+ pmsr_pmax);
}
 
/* Check if Psafe_mode_active is set in PMSR. */
@@ -359,7 +370,7 @@ next:
 
if (throttled) {
pr_info("PMSR = %16lx\n", pmsr);
-   pr_crit("CPU Frequency could be throttled\n");
+   pr_warn("CPU Frequency could be throttled\n");
}
 }
 
@@ -452,15 +463,6 @@ out:
put_online_cpus();
 }
 
-static char throttle_reason[][30] = {
-   "No throttling",
-   "Power Cap",
-   "Processor Over Temperature",
-   "Power Supply Failure",
-   "Over Current",
-   "OCC Reset"
-};
-
 static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
   unsigned long msg_type, void *_msg)
 {
@@ -486,7 +488,7 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
 */
if (!throttled) {
throttled = true;
-   pr_crit("CPU frequency is throttled for duration\n");
+   pr_warn("CPU frequency is throttled for duration\n");
}
 
break;
@@ -510,23 +512,18 @@ static int powernv_cpufreq_occ_msg(struct notifier_block *nb,
return 0;
}
 
-   if (omsg.throttle_status &&
+   for (i = 0; i < nr_chips; i++)
+   if (chips[i].id == omsg.chip)
+   break;
+
+   if (omsg.throttle_status >= 0 &&
omsg.throttle_status <= OCC_MAX_THROTTLE_STATUS)
-   

[PATCH v8 2/6] cpufreq: powernv: Hot-plug safe the kworker thread

2016-02-02 Thread Shilpasri G Bhat
In the kworker thread powernv_cpufreq_work_fn(), we can end up
sending an IPI to a cpu going offline. This is a rare corner case
which is fixed using {get/put}_online_cpus(). Along with this fix,
this patch does the cpumask_{clear/and} operations in one shot.
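
The one-shot mask handling described above can be sketched with a plain
64-bit bitmask. This is a userspace illustration under assumptions, not
the driver code: cpumask_sketch_t and restore_policies() are hypothetical,
and the hotplug locking done by {get/put}_online_cpus() is left out.

```c
#include <stdint.h>

/* Hypothetical 64-CPU bitmask standing in for the kernel's cpumask_t. */
typedef uint64_t cpumask_sketch_t;

/*
 * Visit one CPU per policy among the chip's online CPUs.
 * policy_cpus[cpu] is the mask of CPUs sharing a policy with cpu
 * (assumed to include cpu itself). Returns the number of policies visited.
 */
static int restore_policies(cpumask_sketch_t chip_mask,
			    cpumask_sketch_t online_mask,
			    const cpumask_sketch_t policy_cpus[64])
{
	/* One AND up front restricts the walk to online CPUs of this chip. */
	cpumask_sketch_t mask = chip_mask & online_mask;
	int visited = 0;

	for (int cpu = 0; cpu < 64; cpu++) {
		if (!(mask & (UINT64_C(1) << cpu)))
			continue;
		visited++; /* the driver would retarget the policy here */
		/* One AND-NOT drops every CPU covered by this policy. */
		mask &= ~policy_cpus[cpu];
	}
	return visited;
}
```

With 8 CPUs per policy and a chip covering CPUs 0-15, the loop touches
two CPUs in total rather than clearing sixteen bits one at a time.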

Suggested-by: Shreyas B Prabhu 
Suggested-by: Gautham R Shenoy 
Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Gautham R. Shenoy 
Acked-by: Viresh Kumar 
---
No changes from v7.

 drivers/cpufreq/powernv-cpufreq.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 53f980b..a271b0f 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -423,18 +424,19 @@ void powernv_cpufreq_work_fn(struct work_struct *work)
 {
struct chip *chip = container_of(work, struct chip, throttle);
unsigned int cpu;
-   cpumask_var_t mask;
+   cpumask_t mask;
 
-   smp_call_function_any(&chip->mask,
+   get_online_cpus();
+   cpumask_and(&mask, &chip->mask, cpu_online_mask);
+   smp_call_function_any(&mask,
  powernv_cpufreq_throttle_check, NULL, 0);
 
if (!chip->restore)
-   return;
+   goto out;
 
chip->restore = false;
-   cpumask_copy(mask, &chip->mask);
-   for_each_cpu_and(cpu, mask, cpu_online_mask) {
-   int index, tcpu;
+   for_each_cpu(cpu, &mask) {
+   int index;
struct cpufreq_policy policy;
 
cpufreq_get_policy(&policy, cpu);
@@ -442,9 +444,10 @@ void powernv_cpufreq_work_fn(struct work_struct *work)
   policy.cur,
   CPUFREQ_RELATION_C, &index);
powernv_cpufreq_target_index(&policy, index);
-   for_each_cpu(tcpu, policy.cpus)
-   cpumask_clear_cpu(tcpu, mask);
+   cpumask_andnot(&mask, &mask, policy.cpus);
}
+out:
+   put_online_cpus();
 }
 
 static char throttle_reason[][30] = {
-- 
1.9.3


Re: [PATCH 2/2] powerpc/perf/hv-24x7: Display change in counter values

2016-02-02 Thread Sukadev Bhattiprolu
Madhavan Srinivasan [ma...@linux.vnet.ibm.com] wrote:
> 
> 
> On Saturday 30 January 2016 08:37 AM, Sukadev Bhattiprolu wrote:
> > From a1aa992fb25fb8e98a5c5724376ae8cc91463de3 Mon Sep 17 00:00:00 2001
> > From: Sukadev Bhattiprolu 
> > Date: Mon, 25 Jan 2016 23:05:36 -0500
> > Subject: [PATCH 2/2] powerpc/perf/hv-24x7: Display change in counter values
> >
> > For 24x7 counters, perf displays the raw value of the 24x7 counter, which
> > is a monotonically increasing value.
> >
> > perf stat -C 0 -e \
> > 'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
> > sleep 1
> >
> >  Performance counter stats for 'CPU(s) 0':
> >
> >  9,105,403,170  hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/
> >
> >0.000425751 seconds time elapsed
> >
> > In the typical usage of 'perf stat' this counter value is not as useful
> > as the _change_ in the counter value over the duration of the application.
> 
> Won't this break applications that use this interface? Until now the
> counter output has been the raw value, so an application may already
> post-process it to calculate the difference; with this patch such an
> application would need to change. Also, shouldn't this be documented
> somewhere?

Agree that it does change the behavior. I am checking to see if it
was explicitly documented that the values would be raw. But current
behavior seems counter-intuitive and inconsistent with 'perf stat'.

If we run something like:

perf stat -C 0 -e <24x7-event> make

we see the large number (raw value of the counter) when the application
terminates. The raw value is not very useful. To effectively use the counter
in this scenario, the user would have to run:

perf stat -C 0 -e <24x7-event> sleep 1
# note raw value 1

perf stat -C 0 -e <24x7-event> make
# note raw value 2

# compute diff of value2 and value1.

Reporting the change in value seems to be consistent with normal usage of
perf stat with events like cycles or instructions.

Thanks,

Sukadev
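
The behaviour argued for in this message can be sketched in plain C. This
is a userspace illustration under assumptions, not the hv-24x7 code:
struct event_sketch, event_init() and event_read() are hypothetical names,
and the raw counter value is passed in rather than fetched with an hcall.

```c
#include <stdint.h>

/* Hypothetical event state; field names loosely mirror the patch. */
struct event_sketch {
	uint64_t prev_count; /* raw counter value at the last snapshot */
	uint64_t count;      /* accumulated change reported to the user */
};

/* Snapshot the raw counter when the event is created. */
static void event_init(struct event_sketch *ev, uint64_t raw_now)
{
	ev->prev_count = raw_now;
	ev->count = 0;
}

/* On read, accumulate only the change since the previous snapshot. */
static uint64_t event_read(struct event_sketch *ev, uint64_t raw_now)
{
	ev->count += raw_now - ev->prev_count;
	ev->prev_count = raw_now;
	return ev->count;
}
```

An event initialised at a raw value of 1000 and read at 1250 reports 250,
which is the difference the user would otherwise compute by hand from two
separate runs.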


[PATCH v8 1/6] cpufreq: powernv: Free 'chips' on module exit

2016-02-02 Thread Shilpasri G Bhat
This will free the dynamically allocated memory of 'chips' on
module exit.

Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Gautham R. Shenoy 
Acked-by: Viresh Kumar 
---
Changes from v7:
- Minor typo fix in the commit message

 drivers/cpufreq/powernv-cpufreq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 547890f..53f980b 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -612,6 +612,7 @@ static void __exit powernv_cpufreq_exit(void)
unregister_reboot_notifier(&powernv_cpufreq_reboot_nb);
opal_message_notifier_unregister(OPAL_MSG_OCC,
 &powernv_cpufreq_opal_nb);
+   kfree(chips);
cpufreq_unregister_driver(&powernv_cpufreq_driver);
 }
 module_exit(powernv_cpufreq_exit);
-- 
1.9.3


Re: Failure on latest GIT - implicit declaration of function ‘pte_swp_clear_soft_dirty’

2016-02-02 Thread Mike
Agreed, I raised an eyebrow initially when selecting ppc64 and 32 :D

I'll report back on the trackpad issue later; I can't remember seeing any
changes that ought to affect it. I guess the compile will be done in an
hour or so; I took the time to slim it down to something reasonable.

Thanks man
On 2 Feb 2016 18:14, "Pranith Kumar"  wrote:

> On Tue, Feb 2, 2016 at 1:48 AM, Aneesh Kumar K.V
>  wrote:
> >
> > This patch didn't work for you ?
> >
> >
> http://mid.gmane.org/1454086969-21074-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com
> >
>
> This actually is a better patch. I didn't realize that we have the _64
> version.
>
> Thanks!
> --
> Pranith
>

[PATCH v8 3/6] cpufreq: powernv: Remove cpu_to_chip_id() from hot-path

2016-02-02 Thread Shilpasri G Bhat
cpu_to_chip_id() walks the device tree to find the chip id, taking a
contended device tree lock. This adds unnecessary overhead in a hot
path. So instead of calling cpu_to_chip_id() every time, cache the
chip ids for all cores in the array 'core_to_chip_map' and use it
in the hot path.
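
The caching idea above can be sketched in userspace C. This is a minimal
illustration under assumptions, not the driver code: slow_cpu_to_chip_id()
is a hypothetical stand-in for cpu_to_chip_id(), and the 8 threads per core
and 32 threads per chip are made-up values for the example.

```c
#include <stdlib.h>

#define THREADS_PER_CORE 8 /* assumption for this sketch */

static unsigned int slow_lookups; /* counts simulated device-tree walks */

/* Hypothetical stand-in for cpu_to_chip_id(): pretend this is expensive. */
static unsigned int slow_cpu_to_chip_id(unsigned int cpu)
{
	slow_lookups++;
	return cpu / 32; /* assumed layout: 32 hardware threads per chip */
}

static unsigned int *core_to_chip_map;

/* Fill the cache once, one slow lookup per core. Returns 0 or -1. */
static int init_core_to_chip_map(unsigned int nr_cpus)
{
	unsigned int nr_cores = nr_cpus / THREADS_PER_CORE;

	core_to_chip_map = calloc(nr_cores, sizeof(*core_to_chip_map));
	if (!core_to_chip_map)
		return -1;
	for (unsigned int core = 0; core < nr_cores; core++)
		core_to_chip_map[core] =
			slow_cpu_to_chip_id(core * THREADS_PER_CORE);
	return 0;
}

/* Hot path: a plain array read, no device-tree walk. */
static unsigned int cached_chip_id(unsigned int cpu)
{
	return core_to_chip_map[cpu / THREADS_PER_CORE];
}
```

After initialisation the slow lookup is never called again; every later
query costs one division and one array read.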

Reported-by: Anton Blanchard 
Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Gautham R. Shenoy 
Acked-by: Viresh Kumar 
---
No changes from v7.

 drivers/cpufreq/powernv-cpufreq.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index a271b0f..c670314 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -43,6 +43,7 @@
 
 static struct cpufreq_frequency_table powernv_freqs[POWERNV_MAX_PSTATES+1];
 static bool rebooting, throttled, occ_reset;
+static unsigned int *core_to_chip_map;
 
 static struct chip {
unsigned int id;
@@ -313,13 +314,14 @@ static inline unsigned int get_nominal_index(void)
 static void powernv_cpufreq_throttle_check(void *data)
 {
unsigned int cpu = smp_processor_id();
+   unsigned int chip_id = core_to_chip_map[cpu_core_index_of_thread(cpu)];
unsigned long pmsr;
int pmsr_pmax, i;
 
pmsr = get_pmspr(SPRN_PMSR);
 
for (i = 0; i < nr_chips; i++)
-   if (chips[i].id == cpu_to_chip_id(cpu))
+   if (chips[i].id == chip_id)
break;
 
/* Check for Pmax Capping */
@@ -559,19 +561,29 @@ static int init_chip_info(void)
unsigned int chip[256];
unsigned int cpu, i;
unsigned int prev_chip_id = UINT_MAX;
+   cpumask_t cpu_mask;
+   int ret = -ENOMEM;
 
-   for_each_possible_cpu(cpu) {
+   core_to_chip_map = kcalloc(cpu_nr_cores(), sizeof(unsigned int),
+  GFP_KERNEL);
+   if (!core_to_chip_map)
+   goto out;
+
+   cpumask_copy(&cpu_mask, cpu_possible_mask);
+   for_each_cpu(cpu, &cpu_mask) {
unsigned int id = cpu_to_chip_id(cpu);
 
if (prev_chip_id != id) {
prev_chip_id = id;
chip[nr_chips++] = id;
}
+   core_to_chip_map[cpu_core_index_of_thread(cpu)] = id;
+   cpumask_andnot(&cpu_mask, &cpu_mask, cpu_sibling_mask(cpu));
}
 
chips = kmalloc_array(nr_chips, sizeof(struct chip), GFP_KERNEL);
if (!chips)
-   return -ENOMEM;
+   goto free_chip_map;
 
for (i = 0; i < nr_chips; i++) {
chips[i].id = chip[i];
@@ -582,6 +594,10 @@ static int init_chip_info(void)
}
 
return 0;
+free_chip_map:
+   kfree(core_to_chip_map);
+out:
+   return ret;
 }
 
 static int __init powernv_cpufreq_init(void)
@@ -616,6 +632,7 @@ static void __exit powernv_cpufreq_exit(void)
opal_message_notifier_unregister(OPAL_MSG_OCC,
 &powernv_cpufreq_opal_nb);
kfree(chips);
+   kfree(core_to_chip_map);
cpufreq_unregister_driver(&powernv_cpufreq_driver);
 }
 module_exit(powernv_cpufreq_exit);
-- 
1.9.3


[PATCH v8 0/6] cpufreq: powernv: Redesign the presentation of throttle notification and solve bug-fixes in the driver

2016-02-02 Thread Shilpasri G Bhat
In POWER8, OCC(On-Chip-Controller) can throttle the frequency of the
CPU when the chip crosses its thermal and power limits. Currently,
powernv-cpufreq driver detects and reports this event as a console
message. Some machines may not sustain the max turbo frequency in all
conditions and can be throttled frequently. This can lead to the
flooding of console with throttle messages. So this patchset aims to
redesign the presentation of this event via sysfs counters and
tracepoints. It also fixes a couple of bugs reported in the driver.

- Patch [1] fixes a memory leak bug
- Patch [2] fixes the cpu hot-plug bug in powernv_cpufreq_work_fn().
- Patch [3] solves a bug in powernv_cpufreq_throttle_check(), which
  calls cpu_to_chip_id() in a hot path and so reads the DT every time
  to find the chip id.
- Patches [4] to [6] will add a perf trace point
  "power:powernv_throttle" and sysfs throttle counter stats in
  /sys/devices/system/cpu/cpufreq/chipN.

Changes from v7:
- Changes in patch[6] involve adding a table to represent the
  throttle stats in a frequency X reason layout. Detailed version log
  in the patch.

Changes from v6:
- Changes wrt comments from Balbir Singh and Viresh Kumar. Details in
  the version log of the patches.

Changes from v5:
- Fix kbuild error:
drivers/cpufreq/powernv-cpufreq.c:428:2: error: implicit declaration of
function 'get_online_cpus' [-Werror=implicit-function-declaration]

Changes from v4:
- Fix a hot-plug bug in powernv_cpufreq_work_fn()
- Changes wrt Gautham's and Shreyas's comments 

Changes from v3:
- Add a fix to replace cpu_to_chip_id() with simpler PIR shift to 
  obtain the chip id.
- Break patch2 in to two patches separating the tracepoint and sysfs
  attribute changes.

Changes from v2:
- Fixed kbuild test warning.
drivers/cpufreq/powernv-cpufreq.c:609:2: warning: ignoring return
value of 'kstrtoint', declared with attribute warn_unused_result
[-Wunused-result]

Shilpasri G Bhat (6):
  cpufreq: powernv: Free 'chips' on module exit
  cpufreq: powernv: Hot-plug safe the kworker thread
  cpufreq: powernv: Remove cpu_to_chip_id() from hot-path
  cpufreq: powernv/tracing: Add powernv_throttle tracepoint
  cpufreq: powernv: Replace pr_info with trace print for throttle event
  cpufreq: powernv: Add sysfs attributes to show throttle stats

 Documentation/ABI/testing/sysfs-devices-system-cpu |  66 +
 drivers/cpufreq/powernv-cpufreq.c  | 303 +
 include/trace/events/power.h   |  22 ++
 kernel/trace/power-traces.c|   1 +
 4 files changed, 337 insertions(+), 55 deletions(-)

-- 
1.9.3


[no subject]

2016-02-02 Thread David Rientjes via Linuxppc-dev
--- Begin Message ---
On Thu, 28 Jan 2016, David Rientjes wrote:

> On Thu, 28 Jan 2016, Christian Borntraeger wrote:
> 
> > Indeed, I only touched the identity mapping and dump stack.
> > The question is do we really want to change free_init_pages as well?
> > The unmapping during runtime causes significant overhead, but the
> > unmapping after init imposes almost no runtime overhead. Of course,
> > things get fishy now as what is enabled and what not.
> > 
> > Kconfig after my patch "mm/debug_pagealloc: Ask users for default setting 
> > of debug_pagealloc"
> > (in mm) now states
> > snip
> > By default this option will have a small overhead, e.g. by not
> > allowing the kernel mapping to be backed by large pages on some
> > architectures. Even bigger overhead comes when the debugging is
> > enabled by DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc
> > command line parameter.
> > snip
> > 
> > So I am tempted to NOT change free_init_pages, but the x86 maintainers
> > can certainly decide differently. Ingo, Thomas, H. Peter, please advise.
> > 
> 
> I'm sorry, but I thought the discussion of the previous version of the 
> patchset led to deciding that all CONFIG_DEBUG_PAGEALLOC behavior would be 
> controlled by being enabled on the commandline and checked with 
> debug_pagealloc_enabled().
> 
> I don't think we should have a CONFIG_DEBUG_PAGEALLOC that does some stuff 
> and then a commandline parameter or CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT 
> to enable more stuff.  It should either be all enabled by the commandline 
> (or config option) or split into a separate entity.  
> CONFIG_DEBUG_PAGEALLOC_LIGHT and CONFIG_DEBUG_PAGEALLOC would be fine, but 
> the current state is very confusing about what is being done and what 
> isn't.
> 

Ping?
--- End Message ---

Re: [PATCH v3 2/3] x86: query dynamic DEBUG_PAGEALLOC setting

2016-02-02 Thread Christian Borntraeger
On 02/02/2016 10:51 PM, David Rientjes wrote:
> On Thu, 28 Jan 2016, David Rientjes wrote:
> 
>> On Thu, 28 Jan 2016, Christian Borntraeger wrote:
>>
>>> Indeed, I only touched the identity mapping and dump stack.
>>> The question is do we really want to change free_init_pages as well?
>>> The unmapping during runtime causes significant overhead, but the
>>> unmapping after init imposes almost no runtime overhead. Of course,
>>> things get fishy now as what is enabled and what not.
>>>
>>> Kconfig after my patch "mm/debug_pagealloc: Ask users for default setting 
>>> of debug_pagealloc"
>>> (in mm) now states
>>> snip
>>> By default this option will have a small overhead, e.g. by not
>>> allowing the kernel mapping to be backed by large pages on some
>>> architectures. Even bigger overhead comes when the debugging is
>>> enabled by DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc
>>> command line parameter.
>>> snip
>>>
>>> So I am tempted to NOT change free_init_pages, but the x86 maintainers
>>> can certainly decide differently. Ingo, Thomas, H. Peter, please advise.
>>>
>>
>> I'm sorry, but I thought the discussion of the previous version of the 
>> patchset led to deciding that all CONFIG_DEBUG_PAGEALLOC behavior would be 
>> controlled by being enabled on the commandline and checked with 
>> debug_pagealloc_enabled().
>>
>> I don't think we should have a CONFIG_DEBUG_PAGEALLOC that does some stuff 
>> and then a commandline parameter or CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT 
>> to enable more stuff.  It should either be all enabled by the commandline 
>> (or config option) or split into a separate entity.  
>> CONFIG_DEBUG_PAGEALLOC_LIGHT and CONFIG_DEBUG_PAGEALLOC would be fine, but 
>> the current state is very confusing about what is being done and what 
>> isn't.
>>
> 
> Ping?
> 
https://lkml.org/lkml/2016/1/29/266 
?


Re: [PATCH v6 8/9] Implement kernel live patching for ppc64le (ABIv2)

2016-02-02 Thread Jiri Kosina
On Tue, 2 Feb 2016, Petr Mladek wrote:

> Note that TOC is not set only when the problematic functions are 
> compiled with -mprofile-kernel. I still see the TOC stuff when 
> compiling only with -pg.

I don't see how this wouldn't be a gcc bug.

No matter whether it's a plain profiling call (-pg) or a kernel profiling
call (-mprofile-kernel), gcc must always assume that a global function
(that will typically have just one instance for the whole address space)
will be called.

-- 
Jiri Kosina
SUSE Labs


Re: [PATCH v3 2/3] x86: query dynamic DEBUG_PAGEALLOC setting

2016-02-02 Thread Andrew Morton
On Tue, 2 Feb 2016 22:53:36 +0100 Christian Borntraeger 
 wrote:

> >> I don't think we should have a CONFIG_DEBUG_PAGEALLOC that does some stuff 
> >> and then a commandline parameter or CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT 
> >> to enable more stuff.  It should either be all enabled by the commandline 
> >> (or config option) or split into a separate entity.  
> >> CONFIG_DEBUG_PAGEALLOC_LIGHT and CONFIG_DEBUG_PAGEALLOC would be fine, but 
> >> the current state is very confusing about what is being done and what 
> >> isn't.
> >>
> > 
> > Ping?
> > 
> https://lkml.org/lkml/2016/1/29/266 

That's already in linux-next so I can't apply it.

Well, I can, but it's a hassle.  What's happening here?

Re: [PATCH 1/2] powerpc/eeh: Check for EEH availability in eeh_add_device_early()

2016-02-02 Thread Gavin Shan
On Tue, Jan 19, 2016 at 06:18:19PM -0200, Guilherme G. Piccoli wrote:
>The function eeh_add_device_early() is used to perform EEH initialization in
>devices added later on the system, like in hotplug/DLPAR scenarios. Since the
>commit 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
>a new check was introduced in this function - Cell has no EEH capabilities
>which led to kernel oops if hotplug was performed, so checking for
>eeh_enabled() was introduced to avoid the issue.
>
>However, in architectures that EEH is present like pSeries or PowerNV, we might
>reach a case in which no PCI devices are present on boot and so EEH is not
>initialized. Then, if a device is added via DLPAR for example,
>eeh_add_device_early() fails because eeh_enabled() is false.
>
>Also, we can hit a kernel oops on pSeries arch if eeh_add_device_early() fails:
>if we have no PCI devices on machine at boot time, and then we add a PCI device
>via DLPAR operation, the function query_ddw() triggers the oops on NULL pointer
>dereference in the line "cfg_addr = edev->config_addr;". It happens because
>config_addr in edev is NULL, since the function eeh_add_device_early() was not
>completed successfully.
>
>This patch just changes the way the arch checking is done in function
>eeh_add_device_early(): we don't use eeh_enabled() anymore, but instead we
>introduce the function eeh_available() that checks the running architecture
>by using the macro machine_is(). If we are running on pSeries or PowerNV, the
>EEH mechanism is available (even if not initialized yet). This way, we don't
>try to enable EEH on Cell and we don't hit the oops on DLPAR either.
>
>Fixes: 89a51df5ab1d ("powerpc/eeh: Fix crash in eeh_add_device_early() on Cell")
>Signed-off-by: Guilherme G. Piccoli 
>---
> arch/powerpc/kernel/eeh.c | 19 ++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
>diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>index 40e4d4a..69031d7 100644
>--- a/arch/powerpc/kernel/eeh.c
>+++ b/arch/powerpc/kernel/eeh.c
>@@ -1056,6 +1056,23 @@ int eeh_init(void)
> core_initcall_sync(eeh_init);
>
> /**
>+ * eeh_available - Checks for the availability of EEH based on running
>+ * architecture.
>+ *
>+ * This routine should be used in case we need to check if EEH is
>+ * available in some situation, regardless if EEH is enabled or not.
>+ * For example, if we hotplug-add a PCI device on a machine with no
>+ * other PCI device, EEH won't be enabled, yet it's available if the
>+ * arch supports it.
>+ */
>+static inline bool eeh_available(void)
>+{
>+  if (machine_is(pseries) || machine_is(powernv))
>+          return true;
>+  return false;
>+}
>+

As I was told by somebody else before, the comments for a static function
needn't be exported.

>+/**
>  * eeh_add_device_early - Enable EEH for the indicated device node
>  * @pdn: PCI device node for which to set up EEH
>  *
>@@ -1072,7 +1089,7 @@ void eeh_add_device_early(struct pci_dn *pdn)
>   struct pci_controller *phb;
>   struct eeh_dev *edev = pdn_to_eeh_dev(pdn);
>
>-  if (!edev || !eeh_enabled())
>+  if (!edev || !eeh_available())
>   return;
>
>   if (!eeh_has_flag(EEH_PROBE_MODE_DEVTREE))

The change here doesn't seem quite right. Before the change,
eeh_add_device_early() does nothing if EEH is disabled on pSeries. With the
change applied, the EEH device (edev) will be scanned even if EEH is disabled
on pSeries.

From the code changes, I couldn't figure out the real problem you are trying
to fix. The Cell platform doesn't have the EEH_PROBE_MODE_DEVTREE flag, so the
function does nothing on Cell except call pdn_to_eeh_dev(). Did the kernel
crash in pdn_to_eeh_dev() on the Cell platform?

Thanks,
Gavin
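
To make the distinction discussed in this review concrete, here is a minimal
userspace sketch, not kernel code. The enum, the struct, and all model_*
names are hypothetical, and the way model_eeh_enabled() combines an
"initialized" flag with an "eeh=off" flag is an assumption based only on the
descriptions quoted above. It covers the two cases at issue: a pSeries
machine that booted without PCI devices, and a pSeries machine booted with
eeh=off.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */
enum platform { PLAT_PSERIES, PLAT_POWERNV, PLAT_CELL };

struct eeh_state {
	enum platform plat;
	bool force_disabled;	/* models booting with "eeh=off" */
	bool initialized;	/* models EEH having been set up at boot */
};

/* Models eeh_enabled(): EEH was set up and has not been switched off. */
static bool model_eeh_enabled(const struct eeh_state *s)
{
	return s->initialized && !s->force_disabled;
}

/* Models the proposed eeh_available(): a platform check only. */
static bool model_eeh_available(const struct eeh_state *s)
{
	return s->plat == PLAT_PSERIES || s->plat == PLAT_POWERNV;
}

/*
 * Assumption, not from the thread: one way to combine both conditions,
 * so that a platform with EEH support still honours "eeh=off".
 */
static bool model_eeh_usable(const struct eeh_state *s)
{
	return model_eeh_available(s) && !s->force_disabled;
}
```

In this model the DLPAR case (pSeries, not initialized) fails the enabled
check but passes the platform check, which is what the patch wants. The
eeh=off case also passes the platform check, which is the behaviour change
questioned above. model_eeh_usable() is only an illustration of how the two
conditions could be combined; nothing in the thread proposes it.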


Re: [PATCH v3 2/3] x86: query dynamic DEBUG_PAGEALLOC setting

2016-02-02 Thread Christian Borntraeger
On 02/02/2016 11:21 PM, Andrew Morton wrote:
> On Tue, 2 Feb 2016 22:53:36 +0100 Christian Borntraeger 
>  wrote:
> 
>>>> I don't think we should have a CONFIG_DEBUG_PAGEALLOC that does some stuff
>>>> and then a commandline parameter or CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
>>>> to enable more stuff.  It should either be all enabled by the commandline
>>>> (or config option) or split into a separate entity.
>>>> CONFIG_DEBUG_PAGEALLOC_LIGHT and CONFIG_DEBUG_PAGEALLOC would be fine, but
>>>> the current state is very confusing about what is being done and what
>>>> isn't.
>>>>
>>>
>>> Ping?
>>>
>> https://lkml.org/lkml/2016/1/29/266 
> 
> That's already in linux-next so I can't apply it.
> 
> Well, I can, but it's a hassle.  What's happening here?

I pushed it on my tree for kbuild testing purposes some days ago. 
Will drop so that it can go via mm.



Re: [PATCH v10 3/8] dt, numa: adding numa dt binding implementation.

2016-02-02 Thread Rob Herring
On Tue, Feb 02, 2016 at 03:39:18PM +0530, Ganapatrao Kulkarni wrote:
> dt node parsing for numa topology is done using device property
> numa-node-id and device node distance-map.
> 
> Reviewed-by: Robert Richter 
> Signed-off-by: Ganapatrao Kulkarni 
> ---
>  drivers/of/Kconfig   |  11 +++
>  drivers/of/Makefile  |   1 +
>  drivers/of/of_numa.c | 207 +++
>  include/linux/of.h   |   4 +
>  4 files changed, 223 insertions(+)
>  create mode 100644 drivers/of/of_numa.c
> 
> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
> index e2a4841..8f9cc3a 100644
> --- a/drivers/of/Kconfig
> +++ b/drivers/of/Kconfig
> @@ -112,4 +112,15 @@ config OF_OVERLAY
> While this option is selected automatically when needed, you can
> enable it manually to improve device tree unit test coverage.
>  
> +config OF_NUMA
> + bool "Device Tree NUMA support"

Does this need to be user visible?

> + depends on NUMA
> + depends on OF
> + depends on ARM64

Drop this (and make sure it compiles on other arches). It will fail 
because you also have a dependency on FDT.

> + default y
> + help
> +   Enable Device Tree NUMA support.
> +   This enables the numa mapping of cpu, memory, io and
> +   inter node distances using dt bindings.
> +
>  endif # OF
> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
> index 156c072..bee3fa9 100644
> --- a/drivers/of/Makefile
> +++ b/drivers/of/Makefile
> @@ -14,5 +14,6 @@ obj-$(CONFIG_OF_MTD)+= of_mtd.o
>  obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
>  obj-$(CONFIG_OF_RESOLVE)  += resolver.o
>  obj-$(CONFIG_OF_OVERLAY) += overlay.o
> +obj-$(CONFIG_OF_NUMA) += of_numa.o
>  
>  obj-$(CONFIG_OF_UNITTEST) += unittest-data/
> diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
> new file mode 100644
> index 000..1142cdb
> --- /dev/null
> +++ b/drivers/of/of_numa.c
> @@ -0,0 +1,207 @@
> +/*
> + * OF NUMA Parsing support.
> + *
> + * Copyright (C) 2015 Cavium Inc.
> + * Author: Ganapatrao Kulkarni 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include 
> +#include 

Surely you need some numa related includes.

> +
> +/* define default numa node to 0 */
> +#define DEFAULT_NODE 0
> +
> +/* Returns nid in the range [0..MAX_NUMNODES-1],
> + * or NUMA_NO_NODE if no valid numa-node-id entry found
> + * or DEFAULT_NODE if no numa-node-id entry exists
> + */
> +static int of_numa_prop_to_nid(const __be32 *of_numa_prop, int length)
> +{
> + int nid;
> +
> + if (!of_numa_prop)
> + return DEFAULT_NODE;
> +
> + if (length != sizeof(*of_numa_prop)) {
> + pr_warn("NUMA: Invalid of_numa_prop length %d found.\n",
> + length);
> + return NUMA_NO_NODE;
> + }
> +
> + nid = of_read_number(of_numa_prop, 1);
> + if (nid >= MAX_NUMNODES) {
> + pr_warn("NUMA: Invalid numa node %d found.\n", nid);
> + return NUMA_NO_NODE;
> + }
> +
> + return nid;
> +}
> +
> +static int __init early_init_of_node_to_nid(unsigned long node)
> +{
> + int length;
> + const __be32 *of_numa_prop;
> +
> + of_numa_prop = of_get_flat_dt_prop(node, "numa-node-id", &length);
> +
> + return of_numa_prop_to_nid(of_numa_prop, length);
> +}
> +
> +/*
> + * Even though we connect cpus to numa domains later in SMP
> + * init, we need to know the node ids now for all cpus.
> +*/
> +static int __init early_init_parse_cpu_node(unsigned long node)
> +{
> + int nid;
> + const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> +
> + if (type == NULL)
> + return 0;
> +
> + if (strcmp(type, "cpu") != 0)
> + return 0;
> +
> + nid = early_init_of_node_to_nid(node);
> + if (nid == NUMA_NO_NODE)
> + return -EINVAL;
> +
> + node_set(nid, numa_nodes_parsed);
> + return 0;
> +}
> +
> +static int __init early_init_parse_memory_node(unsigned long node)
> +{
> + const __be32 *reg, *endp;
> + int length;
> + int nid;
> + const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> +
> + if (type == NULL)
> + return 0;
> +
> + if (strcmp(type, "memory") != 0)
> + return 0;
> +
> + nid = early_init_of_node_to_nid(node);
> + if (nid == 

Re: [PATCH v3 2/3] x86: query dynamic DEBUG_PAGEALLOC setting

2016-02-02 Thread Andrew Morton
On Tue, 2 Feb 2016 23:37:50 +0100 Christian Borntraeger wrote:

> On 02/02/2016 11:21 PM, Andrew Morton wrote:
> > On Tue, 2 Feb 2016 22:53:36 +0100 Christian Borntraeger 
> >  wrote:
> > 
> >>>> I don't think we should have a CONFIG_DEBUG_PAGEALLOC that does some stuff
> >>>> and then a commandline parameter or CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
> >>>> to enable more stuff.  It should either be all enabled by the commandline
> >>>> (or config option) or split into a separate entity.
> >>>> CONFIG_DEBUG_PAGEALLOC_LIGHT and CONFIG_DEBUG_PAGEALLOC would be fine, but
> >>>> the current state is very confusing about what is being done and what
> >>>> isn't.
> >>>>
> >>>
> >>> Ping?
> >>>
> >> https://lkml.org/lkml/2016/1/29/266 
> > 
> > That's already in linux-next so I can't apply it.
> > 
> > Well, I can, but it's a hassle.  What's happening here?
> 
> I pushed it on my tree for kbuild testing purposes some days ago. 
> Will drop so that it can go via mm.

There are other patches that I haven't merged because they were already
in -next.  In fact I think I dropped them because they later popped up
in -next.

Some or all of:

lib-spinlock_debugc-prevent-an-infinite-recursive-cycle-in-spin_dump.patch
mm-provide-debug_pagealloc_enabled-without-config_debug_pagealloc.patch
x86-query-dynamic-debug_pagealloc-setting.patch
s390-query-dynamic-debug_pagealloc-setting.patch
mm-provide-debug_pagealloc_enabled-without-config_debug_pagealloc.patch
x86-query-dynamic-debug_pagealloc-setting.patch
s390-query-dynamic-debug_pagealloc-setting.patch

So please resend everything which you think is needed.

Re: [PATCH 2/2] powerpc/pseries: Check if EEH is enabled on DDW mechanism code

2016-02-02 Thread Gavin Shan
On Tue, Jan 19, 2016 at 06:18:20PM -0200, Guilherme G. Piccoli wrote:
>The Dynamic DMA Window (DDW) mechanism relies on EEH to obtain the
>configuration address of devices. For example, the functions query_ddw()
>and create_ddw() make use of eeh_dev struct. So, the dependency is
>intrinsic - DDW mechanism will fail if EEH is not enabled.
>
>Despite this dependency, no check for EEH availability is performed in DDW
>code. This patch adds a check based on eeh_enabled() function, so if EEH is
>not enabled before eeh_dev struct use, DDW will fallback to default iommu
>mechanism and won't fail.
>
>One use case for this patch is when we disable EEH globally via kernel
>command-line ("eeh=off") - without the patch, a device probe can hit a kernel
>oops because EEH is disabled but DDW will try to use it.
>
>Signed-off-by: Guilherme G. Piccoli 
>---
> arch/powerpc/platforms/pseries/iommu.c | 6 --
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
>diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>index bd98ce2..1ff55cc 100644
>--- a/arch/powerpc/platforms/pseries/iommu.c
>+++ b/arch/powerpc/platforms/pseries/iommu.c
>@@ -1224,8 +1224,10 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
>
>   pdev = to_pci_dev(dev);
>
>-  /* only attempt to use a new window if 64-bit DMA is requested */
>-  if (!disable_ddw && dma_mask == DMA_BIT_MASK(64)) {
>+  /* We should check if EEH is enabled here, since DDW mechanism has
>+   * an intrinsic dependency of EEH config addr information. Also, we
>+   * only attempt to use a new window if 64-bit DMA is requested */
>+  if (eeh_enabled() && !disable_ddw && dma_mask == DMA_BIT_MASK(64)) {
>   dn = pci_device_to_OF_node(pdev);
>   dev_dbg(dev, "node is %s\n", dn->full_name);
>

There are two types of addresses: (1) the PCI config address and (2) the PE
config address. (1) identifies one PCI device that is included in the PE.
(2) is the PCI config address of the PE's primary bus in pHyp. Either of them
can be used to identify the PE. This means the PCI config address (1), which
is retrieved from pci_dn, can be passed to the hypervisor, so we don't have
to disable DDW when EEH is disabled.

Guilherme, did you hit the crash on pHyp or PowerKVM?

Thanks,
Gavin 

>-- 
>2.1.0
>


Re: Failure on latest GIT - appletouch fails to register

2016-02-02 Thread Mike
As far as I can tell, it should normally register as an input device, and
that's not happening. I verified that the config entry
CONFIG_MOUSE_APPLETOUCH is selected and compiled in for both tests I've
done, -rc1 and -rc2.

syslog.1:Feb  1 21:58:40 PowerBook-G4 kernel: [   14.744538] usbcore: registered new interface driver appletouch
syslog.1:Feb  1 22:28:05 PowerBook-G4 kernel: [   14.904449] input: appletouch as /devices/pci0001:10/0001:10:1a.0/usb2/2-2/2-2:1.1/input/input6
syslog.1:Feb  1 22:28:05 PowerBook-G4 kernel: [   14.905904] usbcore: registered new interface driver appletouch
syslog.1:Feb  2 03:39:38 PowerBook-G4 kernel: [   15.423289] usbcore: registered new interface driver appletouch

Any clues? It worked in 4.3.3 at least. Can anyone point me to where I could
begin to look? Have any structures changed that could account for this?


On 2 February 2016 at 17:55, Mike  wrote:

> Agreed, raised an eyebrow initially when select ppc64 and 32 :D
>
> I'll give a word on the trackpad issue later; I can't remember seeing any
> changes that ought to affect it really. I guess the compile will be done in
> a good hour or so; I took the time to slim it down to something reasonable
>
> Thanks man
> On 2 Feb 2016 18:14, "Pranith Kumar"  wrote:
>
>> On Tue, Feb 2, 2016 at 1:48 AM, Aneesh Kumar K.V
>>  wrote:
>> >
>> > This patch didn't work for you ?
>> >
>> >
>> http://mid.gmane.org/1454086969-21074-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com
>> >
>>
>> This actually is a better patch. I didn't realize that we have the _64
>> version.
>>
>> Thanks!
>> --
>> Pranith
>>
>

Re: [PATCH v3 2/3] x86: query dynamic DEBUG_PAGEALLOC setting

2016-02-02 Thread Stephen Rothwell
Hi Andrew,

On Tue, 2 Feb 2016 15:04:35 -0800 Andrew Morton wrote:
>
> On Tue, 2 Feb 2016 23:37:50 +0100 Christian Borntraeger 
>  wrote:
> 
> > 
> > I pushed it on my tree for kbuild testing purposes some days ago. 
> > Will drop so that it can go via mm.  
> 
> There are other patches that I haven't merged because they were already
> in -next.  In fact I think I dropped them because they later popped up
> in -next.
> 
> Some or all of:
> 
> lib-spinlock_debugc-prevent-an-infinite-recursive-cycle-in-spin_dump.patch
> mm-provide-debug_pagealloc_enabled-without-config_debug_pagealloc.patch
> x86-query-dynamic-debug_pagealloc-setting.patch
> s390-query-dynamic-debug_pagealloc-setting.patch
> mm-provide-debug_pagealloc_enabled-without-config_debug_pagealloc.patch
> x86-query-dynamic-debug_pagealloc-setting.patch
> s390-query-dynamic-debug_pagealloc-setting.patch
> 
> So please resend everything which you think is needed.

Christian's tree will be empty in today's linux-next (I just refetched it).

-- 
Cheers,
Stephen Rothwell

[PATCH v8 4/6] cpufreq: powernv/tracing: Add powernv_throttle tracepoint

2016-02-02 Thread Shilpasri G Bhat
This patch adds the powernv_throttle tracepoint to trace the CPU
frequency throttling event, which is used by the powernv-cpufreq
driver in POWER8.

Signed-off-by: Shilpasri G Bhat 
Reviewed-by: Gautham R. Shenoy 
---
No changes from v7.

 include/trace/events/power.h | 22 ++
 kernel/trace/power-traces.c  |  1 +
 2 files changed, 23 insertions(+)

diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 284244e..19e5030 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -38,6 +38,28 @@ DEFINE_EVENT(cpu, cpu_idle,
TP_ARGS(state, cpu_id)
 );
 
+TRACE_EVENT(powernv_throttle,
+
+   TP_PROTO(int chip_id, const char *reason, int pmax),
+
+   TP_ARGS(chip_id, reason, pmax),
+
+   TP_STRUCT__entry(
+   __field(int, chip_id)
+   __string(reason, reason)
+   __field(int, pmax)
+   ),
+
+   TP_fast_assign(
+   __entry->chip_id = chip_id;
+   __assign_str(reason, reason);
+   __entry->pmax = pmax;
+   ),
+
+   TP_printk("Chip %d Pmax %d %s", __entry->chip_id,
+ __entry->pmax, __get_str(reason))
+);
+
 TRACE_EVENT(pstate_sample,
 
TP_PROTO(u32 core_busy,
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index eb4220a..81b8745 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -15,4 +15,5 @@
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(suspend_resume);
 EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
+EXPORT_TRACEPOINT_SYMBOL_GPL(powernv_throttle);
 
-- 
1.9.3
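
For readers unfamiliar with TRACE_EVENT(): the macro generates a
trace_powernv_throttle(chip_id, reason, pmax) function that the driver can
call, and TP_printk() defines how a recorded event is rendered. The driver
call site is not part of this patch, so the following is only a userspace
sketch of the rendering step. The function name format_powernv_throttle() is
hypothetical; the format string is the one in the patch above.

```c
#include <stdio.h>

/*
 * Userspace stand-in for the TP_printk() line in the patch:
 * renders one throttle event into buf, returning what snprintf() returns.
 * Note the argument order of the output: chip id, then Pmax, then reason.
 */
static int format_powernv_throttle(char *buf, size_t len,
				   int chip_id, const char *reason, int pmax)
{
	return snprintf(buf, len, "Chip %d Pmax %d %s", chip_id, pmax, reason);
}
```

A caller would pass a buffer and the same three values the tracepoint takes,
for example format_powernv_throttle(buf, sizeof(buf), 0, "some reason", -5).
The reason string here is a placeholder; the actual strings come from the
powernv-cpufreq driver, which is not shown in this message.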


Re: [RFCv2 4/9] arch/powerpc: Clean up memory hotplug failure paths

2016-02-02 Thread David Gibson
On Tue, Feb 02, 2016 at 09:04:23AM -0600, Nathan Fontenot wrote:
> On 01/28/2016 11:23 PM, David Gibson wrote:
> > This makes a number of cleanups to handling of mapping failures during
> > memory hotplug on Power:
> > 
> > For errors creating the linear mapping for the hot-added region:
> >   * This is now reported with EFAULT which is more appropriate than the
> > previous EINVAL (the failure is unlikely to be related to the
> > function's parameters)
> >   * An error in this path now prints a warning message, rather than just
> > silently failing to add the extra memory.
> >   * Previously a failure here could result in the region being partially
> > mapped.  We now clean up any partial mapping before failing.
> > 
> > For errors creating the vmemmap for the hot-added region:
> >* This is now reported with EFAULT instead of causing a BUG() - this
> >  could happen for external reason (e.g. full hash table) so it's better
> >  to handle this non-fatally
> >* An error message is also printed, so the failure won't be silent
> >* As above a failure could cause a partially mapped region, we now
> >  clean this up.
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  arch/powerpc/mm/hash_utils_64.c | 13 ++---
> >  arch/powerpc/mm/init_64.c   | 38 ++
> >  arch/powerpc/mm/mem.c   | 10 --
> >  3 files changed, 44 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> > index 0737eae..e88a86e 100644
> > --- a/arch/powerpc/mm/hash_utils_64.c
> > +++ b/arch/powerpc/mm/hash_utils_64.c
> > @@ -635,9 +635,16 @@ static unsigned long __init htab_get_table_size(void)
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> >  int create_section_mapping(unsigned long start, unsigned long end)
> >  {
> > -   return htab_bolt_mapping(start, end, __pa(start),
> > -pgprot_val(PAGE_KERNEL), mmu_linear_psize,
> > -mmu_kernel_ssize);
> > +   int rc = htab_bolt_mapping(start, end, __pa(start),
> > +  pgprot_val(PAGE_KERNEL), mmu_linear_psize,
> > +  mmu_kernel_ssize);
> > +
> > +   if (rc < 0) {
> > +   int rc2 = htab_remove_mapping(start, end, mmu_linear_psize,
> > + mmu_kernel_ssize);
> > +   BUG_ON(rc2 && (rc2 != -ENOENT));
> > +   }
> > +   return rc;
> >  }
> >  
> 
> <-- snip -->
> 
> >  #ifdef CONFIG_MEMORY_HOTPLUG
> > @@ -217,15 +219,20 @@ static void vmemmap_remove_mapping(unsigned long start,
> >  }
> >  #endif
> >  #else /* CONFIG_PPC_BOOK3E */
> > -static void __meminit vmemmap_create_mapping(unsigned long start,
> > -unsigned long page_size,
> > -unsigned long phys)
> > +static int __meminit vmemmap_create_mapping(unsigned long start,
> > +   unsigned long page_size,
> > +   unsigned long phys)
> >  {
> > -   int  mapped = htab_bolt_mapping(start, start + page_size, phys,
> > -   pgprot_val(PAGE_KERNEL),
> > -   mmu_vmemmap_psize,
> > -   mmu_kernel_ssize);
> > -   BUG_ON(mapped < 0);
> > +   int rc = htab_bolt_mapping(start, start + page_size, phys,
> > +  pgprot_val(PAGE_KERNEL),
> > +  mmu_vmemmap_psize, mmu_kernel_ssize);
> > +   if (rc < 0) {
> > +   int rc2 = htab_remove_mapping(start, start + page_size,
> > + mmu_vmemmap_psize,
> > + mmu_kernel_ssize);
> > +   BUG_ON(rc2 && (rc2 != -ENOENT));
> > +   }
> > +   return rc;
> >  }
> >  
> 
> If I'm reading this correctly, it appears that create_section_mapping() and
> vmemmap_create_mapping() for !PPC_BOOK3E are identical. Any reason not to
> have one routine, perhaps by having vmemmap_create_mapping() call
> create_section_mapping()?

Not really, apart from documenting what they're used for.  They're
both fairly trivial wrappers around htab_bolt_mapping().  I think
cleaning this up is outside the scope of this series though.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
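
The pattern shared by both wrappers in the quoted patch (try to bolt the
mapping, and on failure remove whatever was partially mapped) can be sketched
in userspace as below. This is a model under stated assumptions, not the
kernel code: struct fake_htab, fake_bolt(), fake_remove() and
map_or_rollback() are hypothetical stand-ins for the hash table,
htab_bolt_mapping() and htab_remove_mapping(). As quoted, the two real
wrappers differ in the page size they pass (mmu_linear_psize versus
mmu_vmemmap_psize) and in how the physical address is obtained, so a shared
helper would need those as parameters.

```c
#include <errno.h>

/* Hypothetical stand-in for the hash table: a simple page counter. */
struct fake_htab {
	unsigned long capacity;	/* pages that can be mapped before failing */
	unsigned long mapped;	/* pages currently mapped */
};

/* Maps [start, end) page by page; may fail part way through. */
static int fake_bolt(struct fake_htab *h, unsigned long start,
		     unsigned long end)
{
	for (unsigned long a = start; a < end; a++) {
		if (h->mapped == h->capacity)
			return -ENOSPC;	/* a partial mapping is left behind */
		h->mapped++;
	}
	return 0;
}

/* Removes up to (end - start) pages; -ENOENT if nothing was mapped. */
static int fake_remove(struct fake_htab *h, unsigned long start,
		       unsigned long end)
{
	unsigned long n = end - start;

	if (h->mapped == 0)
		return -ENOENT;
	h->mapped -= (h->mapped < n) ? h->mapped : n;
	return 0;
}

/* Map the range; on failure, undo any partial mapping before returning. */
static int map_or_rollback(struct fake_htab *h, unsigned long start,
			   unsigned long end)
{
	int rc = fake_bolt(h, start, end);

	if (rc < 0) {
		int rc2 = fake_remove(h, start, end);

		/* The patch uses BUG_ON(rc2 && rc2 != -ENOENT) at this point. */
		if (rc2 && rc2 != -ENOENT)
			return rc2;
	}
	return rc;
}
```

The original error code is what the caller sees, and -ENOENT from the
cleanup is tolerated because it only means there was nothing to undo.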



Re: [PATCH] ppc64 boot: Wait for boot cpu to show up if nr_cpus limit is about to hit.

2016-02-02 Thread Mahesh Jagannath Salgaonkar
On 02/02/2016 07:28 PM, Denis Kirjanov wrote:
> On 2/1/16, Mahesh J Salgaonkar  wrote:
>> From: Mahesh Salgaonkar 
>>
>> The kernel boot parameter 'nr_cpus=' allows one to specify number of
>> possible cpus in the system. In the normal scenario the first cpu (cpu0)
>> that shows up is the boot cpu and hence it gets covered under nr_cpus
>> limit.
>>
>> But this assumption will be broken in kdump scenario where kdump kernel
>> after a crash can boot up on a non-zero boot cpu. The paca structure
>> allocation depends on value of nr_cpus and is indexed using logical cpu
>> ids. This definitely will be an issue if boot cpu id > nr_cpus
> And what happened in this case? Have you tried it out?

Yes I have. It results in memory corruption when
set_hard_smp_processor_id(boot_cpu_id,..) is called from
early_init_dt_scan_cpus(), and then the kernel fails to boot. Nothing shows
up on the console and the system hangs forever.

You can easily reproduce this by configuring the kdump service to use
'nr_cpus=1' instead of 'maxcpus=1' and then triggering a system crash from
any cpu other than 0,

e.g.  $ taskset -c 10 echo c > /proc/sysrq-trigger

Thanks,
-Mahesh.
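
The patch itself is not quoted in this message, so the following is only a
userspace sketch of the idea in the subject line, under my own assumptions:
logical CPU ids are handed out in scan order, but the last free slot is kept
back until the boot CPU has been seen, so the boot CPU always gets a logical
id below nr_cpus. The function name and parameters are hypothetical.

```c
/*
 * Hypothetical model: hand out logical CPU ids in scan order, but keep
 * the last free slot for the boot CPU until it has been seen.
 * logical_to_hw must have room for nr_cpus entries.
 * Returns the boot CPU's logical id, or -1 if it never got a slot.
 */
static int assign_logical_ids(const int *hw_ids, int n_hw, int boot_hw_id,
			      int nr_cpus, int *logical_to_hw)
{
	int assigned = 0;
	int boot_logical = -1;

	for (int i = 0; i < n_hw && assigned < nr_cpus; i++) {
		int is_boot = (hw_ids[i] == boot_hw_id);

		/* The last slot stays reserved until the boot CPU shows up. */
		if (!is_boot && boot_logical < 0 && assigned == nr_cpus - 1)
			continue;

		if (is_boot)
			boot_logical = assigned;
		logical_to_hw[assigned++] = hw_ids[i];
	}
	return boot_logical;
}
```

With 16 CPUs scanned in order 0..15, a crash on CPU 10 and nr_cpus=1, this
model gives CPU 10 logical id 0, so any array sized by nr_cpus and indexed
by logical id stays in bounds.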


[PATCH v4] perf/probe: Search both .eh_frame and .debug_frame sections for probe location

2016-02-02 Thread Hemant Kumar
perf probe, through debuginfo__find_probes() in util/probe-finder.c,
checks for the functions' frame descriptions in either the .eh_frame
section of an ELF or the .debug_frame. The check is based on whether either
one of these sections is present. Depending on distro, toolchain defaults,
architecture, build flags, etc., CFI might be found in either .eh_frame
and/or .debug_frame. Sometimes .eh_frame, even if present, may not be
complete and may miss some descriptions. Therefore, to be sure to find the
CFI covering an address, we always have to investigate both if available.

For example, on powerpc, this may happen:
 $ gcc -g bin.c -o bin

 $ objdump --dwarf ./bin
 <1><145>: Abbrev Number: 7 (DW_TAG_subprogram)
<146>   DW_AT_external: 1
<146>   DW_AT_name: (indirect string, offset: 0x9e): main
<14a>   DW_AT_decl_file   : 1
<14b>   DW_AT_decl_line   : 39
<14c>   DW_AT_prototyped  : 1
<14c>   DW_AT_type: <0x57>
<150>   DW_AT_low_pc  : 0x17b8

If the .eh_frame and .debug_frame are checked for the same binary, we
will find that, .eh_frame (although present) doesn't contain a
description for "main" function.
But, .debug_frame has a description :

00d8 0024  FDE cie= pc=17b8..1838
  DW_CFA_advance_loc: 16 to 17c8
  DW_CFA_def_cfa_offset: 144
  DW_CFA_offset_extended_sf: r65 at cfa+16
...

Due to this (since, perf checks whether .eh_frame is present and goes on
searching for that address inside that frame), perf is unable to process
the probes :
 # perf probe -x ./bin main
Failed to get call frame on 0x17b8
  Error: Failed to add events.

To avoid this issue, we need to check both the sections (.eh_frame and
.debug_frame), which is done in this patch.

Note that, we can always force everything into both .eh_frame and
.debug_frame by :
 $ gcc bin.c -fasynchronous-unwind-tables  -fno-dwarf2-cfi-asm -g -o bin
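
The lookup order described above can be sketched in userspace without libdw.
This is a model, not the patch: struct fde, struct cfi_table, cfi_lookup()
and find_frame() are hypothetical names, and each table is just a list of
address ranges standing in for the frame descriptions of one section. The
address range 0x17b8..0x1838 is the one from the objdump output above.

```c
#include <stddef.h>
#include <stdint.h>

/* One frame description, covering addresses in [start, end). */
struct fde {
	uint64_t start, end;
};

/* Stand-in for the CFI of one section (.eh_frame or .debug_frame). */
struct cfi_table {
	const struct fde *fdes;
	size_t n;
};

/* Returns the entry covering addr, or NULL if the table has none. */
static const struct fde *cfi_lookup(const struct cfi_table *t, uint64_t addr)
{
	if (t == NULL)
		return NULL;
	for (size_t i = 0; i < t->n; i++)
		if (addr >= t->fdes[i].start && addr < t->fdes[i].end)
			return &t->fdes[i];
	return NULL;
}

/* Try .eh_frame first, then fall back to .debug_frame. */
static const struct fde *find_frame(const struct cfi_table *eh,
				    const struct cfi_table *dbg,
				    uint64_t addr)
{
	const struct fde *f = cfi_lookup(eh, addr);

	return f ? f : cfi_lookup(dbg, addr);
}
```

The point of the fallback is that a section being present is not enough; the
lookup has to fail for the specific address before the other section is
ruled out.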

Acked-by: Masami Hiramatsu 
Signed-off-by: Hemant Kumar 
---
Changes since v3:
- Rebased it to v4.5-rc2.
Changes since v2:
- Fixed an issue related to filling up both the CFIs (Suggested by Masami).
Changes since v1:
- pf->cfi is now cached as pf->cfi_eh and pf->cfi_dbg depending on the source of CFI
  (Suggested by Mark Wielard).

 tools/perf/util/probe-finder.c | 62 +-
 tools/perf/util/probe-finder.h |  5 +++-
 2 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 2be10fb..4ce5c5e 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -686,8 +686,9 @@ static int call_probe_finder(Dwarf_Die *sc_die, struct probe_finder *pf)
pf->fb_ops = NULL;
 #if _ELFUTILS_PREREQ(0, 142)
} else if (nops == 1 && pf->fb_ops[0].atom == DW_OP_call_frame_cfa &&
-  pf->cfi != NULL) {
-   if (dwarf_cfi_addrframe(pf->cfi, pf->addr, &frame) != 0 ||
+  (pf->cfi_eh != NULL || pf->cfi_dbg != NULL)) {
+   if ((dwarf_cfi_addrframe(pf->cfi_eh, pf->addr, &frame) != 0 &&
+(dwarf_cfi_addrframe(pf->cfi_dbg, pf->addr, &frame) != 0)) ||
dwarf_frame_cfa(frame, &pf->fb_ops, &nops) != 0) {
pr_warning("Failed to get call frame on 0x%jx\n",
   (uintmax_t)pf->addr);
@@ -1015,8 +1016,7 @@ static int pubname_search_cb(Dwarf *dbg, Dwarf_Global *gl, void *data)
return DWARF_CB_OK;
 }
 
-/* Find probe points from debuginfo */
-static int debuginfo__find_probes(struct debuginfo *dbg,
+static int debuginfo__find_probe_location(struct debuginfo *dbg,
  struct probe_finder *pf)
 {
struct perf_probe_point *pp = &pf->pev->point;
@@ -1025,27 +1025,6 @@ static int debuginfo__find_probes(struct debuginfo *dbg,
Dwarf_Die *diep;
int ret = 0;
 
-#if _ELFUTILS_PREREQ(0, 142)
-   Elf *elf;
-   GElf_Ehdr ehdr;
-   GElf_Shdr shdr;
-
-   /* Get the call frame information from this dwarf */
-   elf = dwarf_getelf(dbg->dbg);
-   if (elf == NULL)
-   return -EINVAL;
-
-   if (gelf_getehdr(elf, &ehdr) == NULL)
-   return -EINVAL;
-
-   if (elf_section_by_name(elf, &ehdr, &shdr, ".eh_frame", NULL) &&
-   shdr.sh_type == SHT_PROGBITS) {
-   pf->cfi = dwarf_getcfi_elf(elf);
-   } else {
-   pf->cfi = dwarf_getcfi(dbg->dbg);
-   }
-#endif
-
off = 0;
pf->lcache = intlist__new(NULL);
if (!pf->lcache)
@@ -1108,6 +1087,39 @@ found:
return ret;
 }
 
+/* Find probe points from debuginfo */
+static int debuginfo__find_probes(struct debuginfo *dbg,
+ struct probe_finder *pf)
+{
+   int ret = 0;
+
+#if _ELFUTILS_PREREQ(0, 142)
+   Elf *elf;
+   GElf_Ehdr ehdr;
+   GElf_Shdr shdr;
+
+   if (pf->cfi_eh 

Re: [RFCv2 4/9] arch/powerpc: Clean up memory hotplug failure paths

2016-02-02 Thread Nathan Fontenot
On 01/28/2016 11:23 PM, David Gibson wrote:
> This makes a number of cleanups to handling of mapping failures during
> memory hotplug on Power:
> 
> For errors creating the linear mapping for the hot-added region:
>   * This is now reported with EFAULT which is more appropriate than the
> previous EINVAL (the failure is unlikely to be related to the
> function's parameters)
>   * An error in this path now prints a warning message, rather than just
> silently failing to add the extra memory.
>   * Previously a failure here could result in the region being partially
> mapped.  We now clean up any partial mapping before failing.
> 
> For errors creating the vmemmap for the hot-added region:
>* This is now reported with EFAULT instead of causing a BUG() - this
>  could happen for external reason (e.g. full hash table) so it's better
>  to handle this non-fatally
>* An error message is also printed, so the failure won't be silent
>* As above a failure could cause a partially mapped region, we now
>  clean this up.
> 
> Signed-off-by: David Gibson 
> ---
>  arch/powerpc/mm/hash_utils_64.c | 13 ++---
>  arch/powerpc/mm/init_64.c   | 38 ++
>  arch/powerpc/mm/mem.c   | 10 --
>  3 files changed, 44 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 0737eae..e88a86e 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -635,9 +635,16 @@ static unsigned long __init htab_get_table_size(void)
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  int create_section_mapping(unsigned long start, unsigned long end)
>  {
> - return htab_bolt_mapping(start, end, __pa(start),
> -  pgprot_val(PAGE_KERNEL), mmu_linear_psize,
> -  mmu_kernel_ssize);
> + int rc = htab_bolt_mapping(start, end, __pa(start),
> +pgprot_val(PAGE_KERNEL), mmu_linear_psize,
> +mmu_kernel_ssize);
> +
> + if (rc < 0) {
> + int rc2 = htab_remove_mapping(start, end, mmu_linear_psize,
> +   mmu_kernel_ssize);
> + BUG_ON(rc2 && (rc2 != -ENOENT));
> + }
> + return rc;
>  }
>  

<-- snip -->

>  #ifdef CONFIG_MEMORY_HOTPLUG
> @@ -217,15 +219,20 @@ static void vmemmap_remove_mapping(unsigned long start,
>  }
>  #endif
>  #else /* CONFIG_PPC_BOOK3E */
> -static void __meminit vmemmap_create_mapping(unsigned long start,
> -  unsigned long page_size,
> -  unsigned long phys)
> +static int __meminit vmemmap_create_mapping(unsigned long start,
> + unsigned long page_size,
> + unsigned long phys)
>  {
> - int  mapped = htab_bolt_mapping(start, start + page_size, phys,
> - pgprot_val(PAGE_KERNEL),
> - mmu_vmemmap_psize,
> - mmu_kernel_ssize);
> - BUG_ON(mapped < 0);
> + int rc = htab_bolt_mapping(start, start + page_size, phys,
> +pgprot_val(PAGE_KERNEL),
> +mmu_vmemmap_psize, mmu_kernel_ssize);
> + if (rc < 0) {
> + int rc2 = htab_remove_mapping(start, start + page_size,
> +   mmu_vmemmap_psize,
> +   mmu_kernel_ssize);
> + BUG_ON(rc2 && (rc2 != -ENOENT));
> + }
> + return rc;
>  }
>  

If I'm reading this correctly, it appears that create_section_mapping() and
vmemmap_create_mapping() for !PPC_BOOK3E are identical. Any reason not to
have one routine, perhaps by having vmemmap_create_mapping() call
create_section_mapping()?

-Nathan
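
To make the suggestion concrete, here is a minimal userspace sketch of one shared helper. It is not taken from the patch: stub_bolt_mapping() and stub_remove_mapping() are hypothetical stand-ins for htab_bolt_mapping() and htab_remove_mapping(), the psize constants are placeholders, and assert() takes the place of BUG_ON(). The two quoted hunks pass different page sizes (mmu_linear_psize vs. mmu_vmemmap_psize) and different physical addresses, so the sketch takes both as parameters rather than having one routine call the other directly.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical test knobs; the real functions touch the hash page table. */
static int stub_bolt_result;   /* value the fake bolt step returns */
static int stub_remove_calls;  /* number of times the rollback ran */

/* Stand-in for htab_bolt_mapping(); arguments are ignored. */
static int stub_bolt_mapping(unsigned long start, unsigned long end,
			     unsigned long phys, int psize)
{
	(void)start; (void)end; (void)phys; (void)psize;
	return stub_bolt_result;
}

/* Stand-in for htab_remove_mapping(); reports "nothing was mapped". */
static int stub_remove_mapping(unsigned long start, unsigned long end,
			       int psize)
{
	(void)start; (void)end; (void)psize;
	stub_remove_calls++;
	return -ENOENT;
}

/* Shared helper: bolt the range, undo it on failure, return the error. */
static int bolt_or_rollback(unsigned long start, unsigned long end,
			    unsigned long phys, int psize)
{
	int rc = stub_bolt_mapping(start, end, phys, psize);

	if (rc < 0) {
		int rc2 = stub_remove_mapping(start, end, psize);

		/* Same condition the patch guards with BUG_ON(). */
		assert(rc2 == 0 || rc2 == -ENOENT);
	}
	return rc;
}

enum { SKETCH_LINEAR_PSIZE = 0, SKETCH_VMEMMAP_PSIZE = 1 }; /* placeholders */

int sketch_create_section_mapping(unsigned long start, unsigned long end)
{
	/* "start" stands in for __pa(start) in this userspace model. */
	return bolt_or_rollback(start, end, start, SKETCH_LINEAR_PSIZE);
}

int sketch_vmemmap_create_mapping(unsigned long start,
				  unsigned long page_size,
				  unsigned long phys)
{
	return bolt_or_rollback(start, start + page_size, phys,
				SKETCH_VMEMMAP_PSIZE);
}
```

With the helper in place, both callers reduce to a single line each and the rollback policy lives in one spot.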

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v10 7/8] topology, cleanup: Avoid redefinition of cpumask_of_pcibus in asm header files.

2016-02-02 Thread Ganapatrao Kulkarni
At present cpumask_of_pcibus is defined only for !CONFIG_NUMA. Moving it
to the common header allows it to be used for NUMA as well, and avoids
redefining the macro in each architecture's header files.
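
For illustration, the pattern can be modelled in userspace as below. This is a sketch under assumptions, not the contents of include/asm-generic/topology.h (that hunk is not shown here): the #ifndef guard, the simplified struct cpumask and struct pci_bus, and the sketch_* names are all hypothetical. The macro body itself follows the definition removed from the architecture headers in this patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_NODES 2

struct cpumask { unsigned long bits; };  /* simplified model of a CPU mask */
struct pci_bus { int node; };            /* only the field this sketch needs */

static const struct cpumask sketch_cpu_all_mask = { ~0UL };
static const struct cpumask sketch_node_masks[SKETCH_MAX_NODES] = {
	{ 0x0fUL },  /* node 0: CPUs 0-3 */
	{ 0xf0UL },  /* node 1: CPUs 4-7 */
};

/* Returns the bus's node id, or -1 when the bus has no node affinity. */
static int pcibus_to_node(const struct pci_bus *bus)
{
	return bus->node;
}

static const struct cpumask *cpumask_of_node(int node)
{
	assert(node >= 0 && node < SKETCH_MAX_NODES);
	return &sketch_node_masks[node];
}

/*
 * Generic fallback. An architecture that defines cpumask_of_pcibus
 * before this point (as the s390 hunk in this patch does) keeps its
 * own version because of the guard.
 */
#ifndef cpumask_of_pcibus
#define cpumask_of_pcibus(bus)	(pcibus_to_node(bus) == -1 ?	\
				 &sketch_cpu_all_mask :		\
				 cpumask_of_node(pcibus_to_node(bus)))
#endif
```

A bus with no node affinity resolves to the all-CPUs mask; any other bus resolves to the mask of its node.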

Reviewed-by: Robert Richter 
Signed-off-by: Ganapatrao Kulkarni 
---
 arch/arm64/include/asm/topology.h   | 3 ---
 arch/ia64/include/asm/topology.h| 4 
 arch/metag/include/asm/topology.h   | 3 ---
 arch/powerpc/include/asm/topology.h | 4 
 arch/s390/include/asm/pci.h | 2 +-
 arch/s390/include/asm/topology.h| 1 +
 arch/sh/include/asm/topology.h  | 3 ---
 arch/tile/include/asm/pci.h | 2 --
 arch/tile/include/asm/topology.h| 3 +++
 arch/x86/include/asm/pci.h  | 2 +-
 arch/x86/include/asm/topology.h | 1 +
 include/asm-generic/topology.h  | 4 ++--
 12 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 8b57339..6e1f62c 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -26,9 +26,6 @@ const struct cpumask *cpu_coregroup_mask(int cpu);
 
 struct pci_bus;
 int pcibus_to_node(struct pci_bus *bus);
-#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ?\
-cpu_all_mask : \
-cpumask_of_node(pcibus_to_node(bus)))
 
 #endif /* CONFIG_NUMA */
 
diff --git a/arch/ia64/include/asm/topology.h b/arch/ia64/include/asm/topology.h
index 3ad8f69..2778eb6 100644
--- a/arch/ia64/include/asm/topology.h
+++ b/arch/ia64/include/asm/topology.h
@@ -58,10 +58,6 @@ void build_cpu_to_node_map(void);
 
 extern void arch_fix_phys_package_id(int num, u32 slot);
 
-#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ?\
-cpu_all_mask : \
-cpumask_of_node(pcibus_to_node(bus)))
-
 #include 
 
 #endif /* _ASM_IA64_TOPOLOGY_H */
diff --git a/arch/metag/include/asm/topology.h b/arch/metag/include/asm/topology.h
index e95f874..b285196 100644
--- a/arch/metag/include/asm/topology.h
+++ b/arch/metag/include/asm/topology.h
@@ -9,9 +9,6 @@
 #define cpumask_of_node(node)  ((void)node, cpu_online_mask)
 
 #define pcibus_to_node(bus)((void)(bus), -1)
-#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ? \
-   cpu_all_mask : \
-   cpumask_of_node(pcibus_to_node(bus)))
 
 #endif
 
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 8b3b46b..eee025d 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -32,10 +32,6 @@ static inline int pcibus_to_node(struct pci_bus *bus)
 }
 #endif
 
-#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ?\
-cpu_all_mask : \
-cpumask_of_node(pcibus_to_node(bus)))
-
 extern int __node_distance(int, int);
 #define node_distance(a, b) __node_distance(a, b)
 
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index c873e68..539fb2d 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -205,7 +205,7 @@ static inline int __pcibus_to_node(const struct pci_bus *bus)
 }
 
 static inline const struct cpumask *
-cpumask_of_pcibus(const struct pci_bus *bus)
+__cpumask_of_pcibus(const struct pci_bus *bus)
 {
return cpu_online_mask;
 }
diff --git a/arch/s390/include/asm/topology.h b/arch/s390/include/asm/topology.h
index 6b53962..ac2f88f 100644
--- a/arch/s390/include/asm/topology.h
+++ b/arch/s390/include/asm/topology.h
@@ -78,6 +78,7 @@ static inline const struct cpumask *cpumask_of_node(int node)
 #define parent_node(node) (node)
 
 #define pcibus_to_node(bus) __pcibus_to_node(bus)
+#define cpumask_of_pcibus(bus) __cpumask_of_pcibus(bus)
 
 #define node_distance(a, b) __node_distance(a, b)
 
diff --git a/arch/sh/include/asm/topology.h b/arch/sh/include/asm/topology.h
index b0a282d..357983d 100644
--- a/arch/sh/include/asm/topology.h
+++ b/arch/sh/include/asm/topology.h
@@ -9,9 +9,6 @@
 #define cpumask_of_node(node)  ((void)node, cpu_online_mask)
 
 #define pcibus_to_node(bus)((void)(bus), -1)
-#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ? \
-   cpu_all_mask : \
-   cpumask_of_node(pcibus_to_node(bus)))
 
 #endif
 
diff --git a/arch/tile/include/asm/pci.h b/arch/tile/include/asm/pci.h
index dfedd7a..473ed46 100644
--- a/arch/tile/include/asm/pci.h
+++ b/arch/tile/include/asm/pci.h
@@ -223,8 +223,6 @@ static inline int pcibios_assign_all_busses(void)
 /* Minimum PCI I/O address, starting at the page boundary. */
 #define PCIBIOS_MIN_IO PAGE_SIZE
 
-/* Use any cpu for PCI. */
-#define 

[PATCH v10 1/8] arm64, numa: adding numa support for arm64 platforms.

2016-02-02 Thread Ganapatrao Kulkarni
Add NUMA support for arm64-based platforms.
By default, this patch adds a dummy NUMA node and
maps all memory and CPUs to node 0.
With this patch, NUMA can be simulated on single-node arm64 platforms.

Tested-by: Shannon Zhao 
Reviewed-by: Robert Richter 
Signed-off-by: Ganapatrao Kulkarni 
---
 arch/arm64/Kconfig|  25 +++
 arch/arm64/include/asm/mmzone.h   |  12 ++
 arch/arm64/include/asm/numa.h |  43 +
 arch/arm64/include/asm/topology.h |  10 +
 arch/arm64/kernel/pci.c   |  10 +
 arch/arm64/kernel/setup.c |   4 +
 arch/arm64/kernel/smp.c   |   2 +
 arch/arm64/mm/Makefile|   1 +
 arch/arm64/mm/init.c  |  34 +++-
 arch/arm64/mm/mmu.c   |   1 +
 arch/arm64/mm/numa.c  | 387 ++
 11 files changed, 524 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm64/include/asm/mmzone.h
 create mode 100644 arch/arm64/include/asm/numa.h
 create mode 100644 arch/arm64/mm/numa.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8cc6228..fcf3950 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -74,6 +74,7 @@ config ARM64
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IRQ_TIME_ACCOUNTING
select HAVE_MEMBLOCK
+   select HAVE_MEMBLOCK_NODE_MAP if NUMA
select HAVE_PATA_PLATFORM
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
@@ -534,6 +535,30 @@ config HOTPLUG_CPU
  Say Y here to experiment with turning CPUs off and on.  CPUs
  can be controlled through /sys/devices/system/cpu.
 
+# Common NUMA Features
+config NUMA
+   bool "Numa Memory Allocation and Scheduler Support"
+   depends on SMP
+   help
+ Enable NUMA (Non Uniform Memory Access) support.
+
+ The kernel will try to allocate memory used by a CPU on the
+ local memory of the CPU and add some more
+ NUMA awareness to the kernel.
+
+config NODES_SHIFT
+   int "Maximum NUMA Nodes (as a power of 2)"
+   range 1 10
+   default "2"
+   depends on NEED_MULTIPLE_NODES
+   help
+ Specify the maximum number of NUMA Nodes available on the target
+ system.  Increases memory reserved to accommodate various tables.
+
+config USE_PERCPU_NUMA_NODE_ID
+   def_bool y
+   depends on NUMA
+
 source kernel/Kconfig.preempt
 source kernel/Kconfig.hz
 
diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
new file mode 100644
index 000..a0de9e6
--- /dev/null
+++ b/arch/arm64/include/asm/mmzone.h
@@ -0,0 +1,12 @@
+#ifndef __ASM_MMZONE_H
+#define __ASM_MMZONE_H
+
+#ifdef CONFIG_NUMA
+
+#include 
+
+extern struct pglist_data *node_data[];
+#define NODE_DATA(nid) (node_data[(nid)])
+
+#endif /* CONFIG_NUMA */
+#endif /* __ASM_MMZONE_H */
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
new file mode 100644
index 000..574267f
--- /dev/null
+++ b/arch/arm64/include/asm/numa.h
@@ -0,0 +1,43 @@
+#ifndef __ASM_NUMA_H
+#define __ASM_NUMA_H
+
+#include 
+
+#ifdef CONFIG_NUMA
+
+/* currently, arm64 implements flat NUMA topology */
+#define parent_node(node)  (node)
+
+int __node_distance(int from, int to);
+#define node_distance(a, b) __node_distance(a, b)
+
+extern nodemask_t numa_nodes_parsed __initdata;
+
+/* Mappings between node number and cpus on that node. */
+extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+void numa_clear_node(unsigned int cpu);
+
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+const struct cpumask *cpumask_of_node(int node);
+#else
+/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
+static inline const struct cpumask *cpumask_of_node(int node)
+{
+   return node_to_cpumask_map[node];
+}
+#endif
+
+void __init arm64_numa_init(void);
+int __init numa_add_memblk(int nodeid, u64 start, u64 end);
+void __init numa_set_distance(int from, int to, int distance);
+void __init numa_free_distance(void);
+void numa_store_cpu_info(unsigned int cpu);
+
+#else  /* CONFIG_NUMA */
+
+static inline void numa_store_cpu_info(unsigned int cpu) { }
+static inline void arm64_numa_init(void) { }
+
+#endif /* CONFIG_NUMA */
+
+#endif /* __ASM_NUMA_H */
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index a3e9d6f..8b57339 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -22,6 +22,16 @@ void init_cpu_topology(void);
 void store_cpu_topology(unsigned int cpuid);
 const struct cpumask *cpu_coregroup_mask(int cpu);
 
+#ifdef CONFIG_NUMA
+
+struct pci_bus;
+int pcibus_to_node(struct pci_bus *bus);
+#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ?\
+cpu_all_mask : \
+cpumask_of_node(pcibus_to_node(bus)))
+
+#endif /* CONFIG_NUMA */
+
 #include 
 
 

[PATCH v10 3/8] dt, numa: adding numa dt binding implementation.

2016-02-02 Thread Ganapatrao Kulkarni
DT node parsing for NUMA topology is done using the device property
numa-node-id and the device node distance-map.
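
The numa-node-id lookup can be sketched in userspace as follows. This is a simplified model, not the kernel code: the sketch_* names, SKETCH_MAX_NUMNODES and the byte-array representation of the property are assumptions made for illustration. It mirrors the three outcomes described in the patch: a missing property falls back to the default node, a property of the wrong length or with an out-of-range value is rejected, and a valid single big-endian cell yields the node id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NUMNODES  4     /* assumed; the kernel derives it from NODES_SHIFT */
#define SKETCH_NO_NODE       (-1)  /* plays the role of NUMA_NO_NODE */
#define SKETCH_DEFAULT_NODE  0     /* plays the role of DEFAULT_NODE */

/* Decode one big-endian 32-bit device tree cell. */
static uint32_t sketch_be32_cell(const unsigned char *p)
{
	assert(p != NULL);
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * prop:   raw bytes of the numa-node-id property, or NULL if absent
 * length: property length in bytes
 */
int sketch_prop_to_nid(const unsigned char *prop, int length)
{
	uint32_t nid;

	if (!prop)
		return SKETCH_DEFAULT_NODE;  /* no property: default node */

	if (length != 4)
		return SKETCH_NO_NODE;       /* must be exactly one cell */

	nid = sketch_be32_cell(prop);
	if (nid >= SKETCH_MAX_NUMNODES)
		return SKETCH_NO_NODE;       /* node id out of range */

	return (int)nid;
}
```

The sketch compares the cell as an unsigned value, so a cell with the top bit set is rejected by the range check instead of turning into a negative node id.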

Reviewed-by: Robert Richter 
Signed-off-by: Ganapatrao Kulkarni 
---
 drivers/of/Kconfig   |  11 +++
 drivers/of/Makefile  |   1 +
 drivers/of/of_numa.c | 207 +++
 include/linux/of.h   |   4 +
 4 files changed, 223 insertions(+)
 create mode 100644 drivers/of/of_numa.c

diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
index e2a4841..8f9cc3a 100644
--- a/drivers/of/Kconfig
+++ b/drivers/of/Kconfig
@@ -112,4 +112,15 @@ config OF_OVERLAY
  While this option is selected automatically when needed, you can
  enable it manually to improve device tree unit test coverage.
 
+config OF_NUMA
+   bool "Device Tree NUMA support"
+   depends on NUMA
+   depends on OF
+   depends on ARM64
+   default y
+   help
+ Enable Device Tree NUMA support.
+ This enables the numa mapping of cpu, memory, io and
+ inter node distances using dt bindings.
+
 endif # OF
diff --git a/drivers/of/Makefile b/drivers/of/Makefile
index 156c072..bee3fa9 100644
--- a/drivers/of/Makefile
+++ b/drivers/of/Makefile
@@ -14,5 +14,6 @@ obj-$(CONFIG_OF_MTD)  += of_mtd.o
 obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
 obj-$(CONFIG_OF_RESOLVE)  += resolver.o
 obj-$(CONFIG_OF_OVERLAY) += overlay.o
+obj-$(CONFIG_OF_NUMA) += of_numa.o
 
 obj-$(CONFIG_OF_UNITTEST) += unittest-data/
diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
new file mode 100644
index 000..1142cdb
--- /dev/null
+++ b/drivers/of/of_numa.c
@@ -0,0 +1,207 @@
+/*
+ * OF NUMA Parsing support.
+ *
+ * Copyright (C) 2015 Cavium Inc.
+ * Author: Ganapatrao Kulkarni 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+
+/* define default numa node to 0 */
+#define DEFAULT_NODE 0
+
+/* Returns nid in the range [0..MAX_NUMNODES-1],
+ * or NUMA_NO_NODE if no valid numa-node-id entry found
+ * or DEFAULT_NODE if no numa-node-id entry exists
+ */
+static int of_numa_prop_to_nid(const __be32 *of_numa_prop, int length)
+{
+   int nid;
+
+   if (!of_numa_prop)
+   return DEFAULT_NODE;
+
+   if (length != sizeof(*of_numa_prop)) {
+   pr_warn("NUMA: Invalid of_numa_prop length %d found.\n",
+   length);
+   return NUMA_NO_NODE;
+   }
+
+   nid = of_read_number(of_numa_prop, 1);
+   if (nid >= MAX_NUMNODES) {
+   pr_warn("NUMA: Invalid numa node %d found.\n", nid);
+   return NUMA_NO_NODE;
+   }
+
+   return nid;
+}
+
+static int __init early_init_of_node_to_nid(unsigned long node)
+{
+   int length;
+   const __be32 *of_numa_prop;
+
+   of_numa_prop = of_get_flat_dt_prop(node, "numa-node-id", &length);
+
+   return of_numa_prop_to_nid(of_numa_prop, length);
+}
+
+/*
+ * Even though we connect cpus to numa domains later in SMP
+ * init, we need to know the node ids now for all cpus.
+*/
+static int __init early_init_parse_cpu_node(unsigned long node)
+{
+   int nid;
+   const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+
+   if (type == NULL)
+   return 0;
+
+   if (strcmp(type, "cpu") != 0)
+   return 0;
+
+   nid = early_init_of_node_to_nid(node);
+   if (nid == NUMA_NO_NODE)
+   return -EINVAL;
+
+   node_set(nid, numa_nodes_parsed);
+   return 0;
+}
+
+static int __init early_init_parse_memory_node(unsigned long node)
+{
+   const __be32 *reg, *endp;
+   int length;
+   int nid;
+   const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+
+   if (type == NULL)
+   return 0;
+
+   if (strcmp(type, "memory") != 0)
+   return 0;
+
+   nid = early_init_of_node_to_nid(node);
+   if (nid == NUMA_NO_NODE)
+   return -EINVAL;
+
+   reg = of_get_flat_dt_prop(node, "reg", &length);
+   endp = reg + (length / sizeof(__be32));
+
+   while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
+   u64 base, size;
+
+   base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+   size = dt_mem_next_cell(dt_root_size_cells, &reg);
+   pr_debug("NUMA:  base = %llx , node = %u\n",
+   

[PATCH v10 4/8] arm64, numa : Enable numa dt for arm64 platforms.

2016-02-02 Thread Ganapatrao Kulkarni
Adding numa dt binding support for arm64 based platforms.

Reviewed-by: Robert Richter 
Signed-off-by: Ganapatrao Kulkarni 
---
 arch/arm64/include/asm/numa.h |  2 ++
 arch/arm64/kernel/smp.c   |  2 ++
 arch/arm64/mm/numa.c  | 17 +
 3 files changed, 21 insertions(+)

diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
index 574267f..e9b4f29 100644
--- a/arch/arm64/include/asm/numa.h
+++ b/arch/arm64/include/asm/numa.h
@@ -31,12 +31,14 @@ void __init arm64_numa_init(void);
 int __init numa_add_memblk(int nodeid, u64 start, u64 end);
 void __init numa_set_distance(int from, int to, int distance);
 void __init numa_free_distance(void);
+void __init early_map_cpu_to_node(unsigned int cpu, int nid);
 void numa_store_cpu_info(unsigned int cpu);
 
 #else  /* CONFIG_NUMA */
 
 static inline void numa_store_cpu_info(unsigned int cpu) { }
 static inline void arm64_numa_init(void) { }
+static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { }
 
 #endif /* CONFIG_NUMA */
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d6e7d6a..46c45c8 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -520,6 +520,8 @@ static void __init of_parse_and_init_cpus(void)
 
pr_debug("cpu logical map 0x%llx\n", hwid);
cpu_logical_map(cpu_count) = hwid;
+
+   early_map_cpu_to_node(cpu_count, of_node_to_nid(dn));
 next:
cpu_count++;
}
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 60f19a2..7aad5d4 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
@@ -122,6 +123,15 @@ void numa_store_cpu_info(unsigned int cpu)
map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
 }
 
+void __init early_map_cpu_to_node(unsigned int cpu, int nid)
+{
+   /* fallback to node 0 */
+   if (nid < 0 || nid >= MAX_NUMNODES)
+   nid = 0;
+
+   cpu_to_node_map[cpu] = nid;
+}
+
 /**
  * numa_add_memblk - Set node id to memblk
  * @nid: NUMA node ID of the new memblk
@@ -383,5 +393,12 @@ static int __init dummy_numa_init(void)
  */
 void __init arm64_numa_init(void)
 {
+   if (!numa_off) {
+#ifdef CONFIG_OF_NUMA
+   if (!numa_init(of_numa_init))
+   return;
+#endif
+   }
+
numa_init(dummy_numa_init);
 }
-- 
1.8.1.4


[PATCH v10 5/8] arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node topology.

2016-02-02 Thread Ganapatrao Kulkarni
Adding dt file for Cavium's Thunderx dual socket platform.

Signed-off-by: Ganapatrao Kulkarni 
---
 arch/arm64/boot/dts/cavium/Makefile |   2 +-
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts  |  83 +++
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi | 806 
 3 files changed, 890 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
 create mode 100644 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi

diff --git a/arch/arm64/boot/dts/cavium/Makefile b/arch/arm64/boot/dts/cavium/Makefile
index e34f89d..7fe7067 100644
--- a/arch/arm64/boot/dts/cavium/Makefile
+++ b/arch/arm64/boot/dts/cavium/Makefile
@@ -1,4 +1,4 @@
-dtb-$(CONFIG_ARCH_THUNDER) += thunder-88xx.dtb
+dtb-$(CONFIG_ARCH_THUNDER) += thunder-88xx.dtb thunder-88xx-2n.dtb
 
 always := $(dtb-y)
 subdir-y   := $(dts-dirs)
diff --git a/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
new file mode 100644
index 000..5601e87
--- /dev/null
+++ b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts
@@ -0,0 +1,83 @@
+/*
+ * Cavium Thunder DTS file - Thunder board description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this library; if not, write to the Free
+ * Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ * MA 02110-1301 USA
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+
+/include/ "thunder-88xx-2n.dtsi"
+
+/ {
+   model = "Cavium ThunderX CN88XX board";
+   compatible = "cavium,thunder-88xx";
+
+   aliases {
+   serial0 = 
+   serial1 = 
+   };
+
+   memory@ {
+   device_type = "memory";
+   reg = <0x0 0x0140 0x3 0xFEC0>;
+   /* socket 0 */
+   numa-node-id = <0>;
+   };
+
+   memory@100 {
+   device_type = "memory";
+   reg = <0x100 0x0040 0x3 0xFFC0>;
+/* socket 1 */
+   numa-node-id = <1>;
+   };
+
+   distance-map {
+   compatible = "numa-distance-map-v1";
+   distance-matrix = <0 0  10>,
+ <0 1  20>,
+ <1 1  10>;
+   };
+};
diff --git a/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
new file mode 100644
index 000..b58e5c7
--- /dev/null
+++ b/arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi
@@ -0,0 +1,806 @@
+/*
+ * Cavium Thunder DTS file - Thunder SoC description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ * modify it under the 

[PATCH v10 2/8] Documentation, dt, numa: dt bindings for numa.

2016-02-02 Thread Ganapatrao Kulkarni
DT bindings for numa mapping of memory, cores and IOs.

Reviewed-by: Robert Richter 
Signed-off-by: Ganapatrao Kulkarni 
---
 Documentation/devicetree/bindings/numa.txt | 272 +
 1 file changed, 272 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/numa.txt

diff --git a/Documentation/devicetree/bindings/numa.txt b/Documentation/devicetree/bindings/numa.txt
new file mode 100644
index 000..ec5ed7c
--- /dev/null
+++ b/Documentation/devicetree/bindings/numa.txt
@@ -0,0 +1,272 @@
+==
+NUMA binding description.
+==
+
+==
+1 - Introduction
+==
+
+Systems employing a Non Uniform Memory Access (NUMA) architecture contain
+collections of hardware resources including processors, memory, and I/O buses,
+that comprise what is commonly known as a NUMA node.
+Processor accesses to memory within the local NUMA node is generally faster
+than processor accesses to memory outside of the local NUMA node.
+DT defines interfaces that allow the platform to convey NUMA node
+topology information to OS.
+
+==
+2 - numa-node-id
+==
+
+For the purpose of identification, each NUMA node is associated with a unique
+token known as a node id. For the purpose of this binding
+a node id is a 32-bit integer.
+
+A device node is associated with a NUMA node by the presence of a
+numa-node-id property which contains the node id of the device.
+
+Example:
+   /* numa node 0 */
+   numa-node-id = <0>;
+
+   /* numa node 1 */
+   numa-node-id = <1>;
+
+==
+3 - distance-map
+==
+
+The device tree node distance-map describes the relative
+distance (memory latency) between all numa nodes.
+
+- compatible : Should at least contain "numa-distance-map-v1".
+
+- distance-matrix
+  This property defines a matrix to describe the relative distances
+  between all numa nodes.
+  It is represented as a list of node pairs and their relative distance.
+
+  Note:
+   1. Each entry represents distance from first node to second node.
+   The distances are equal in either direction.
+   2. The distance from a node to self (local distance) is represented
+   with value 10 and all internode distance should be represented with
+   a value greater than 10.
+   3. distance-matrix should have entries in lexicographical ascending
+   order of nodes.
+   4. There must be only one device node distance-map which must reside in the root node.
+
+Example:
+   4 nodes connected in mesh/ring topology as below,
+
+   0___20__1
+   |   |
+   |   |
+   20 20
+   |   |
+   |   |
+   |___|
+   3   20  2
+
+   if relative distance for each hop is 20,
+   then internode distance would be,
+ 0 -> 1 = 20
+ 1 -> 2 = 20
+ 2 -> 3 = 20
+ 3 -> 0 = 20
+ 0 -> 2 = 40
+ 1 -> 3 = 40
+
+ and dt presentation for this distance matrix is,
+
+   distance-map {
+compatible = "numa-distance-map-v1";
+distance-matrix = <0 0  10>,
+  <0 1  20>,
+  <0 2  40>,
+  <0 3  20>,
+  <1 0  20>,
+  <1 1  10>,
+  <1 2  20>,
+  <1 3  40>,
+  <2 0  40>,
+  <2 1  20>,
+  <2 2  10>,
+  <2 3  20>,
+  <3 0  20>,
+  <3 1  40>,
+  <3 2  20>,
+  <3 3  10>;
+   };
+
+==
+4 - Example dts
+==
+
+Dual socket system consists of 2 boards connected through ccn bus and
+each board having one socket/soc of 8 

[PATCH v10 0/8] arm64, numa: Add numa support for arm64 platforms

2016-02-02 Thread Ganapatrao Kulkarni
v10:
- Incorporated review comments from Rob Herring.
- Moved numa binding and implementation to devicetree core.
- Added cleanup patch to remove redundant NODE_DATA macro from asm header files.
- Include numa balancing support for arm64 patch in this series.
- Fix tile build issue reported by the kbuild robot(patch 7)

v9:
- Added cleanup patch to reuse and avoid redefinition of cpumask_of_pcibus,
  as suggested by Will Deacon and Bjorn Helgaas.
- Including patch to make pci-host-generic driver numa aware.
- Incorporated comment from Shannon Zhao.
v8:
- Incorporated review comments of Mark Rutland and Will Deacon.
- Added pci helper function and macro for numa.

v7:
- managing numa memory mapping using memblock.
- Incorporated review comments of Mark Rutland.

v6:
- defined and implemented the numa dt binding using
node property proximity and device node distance-map.
- renamed dt_numa to of_numa

v5:
- Created base version of numa.c, which creates a dummy numa node without
  using dt on single socket platforms. Then added patches for dt support.
- Incorporated review comments from Hanjun Guo.

v4:
Made changes as per Arnd's review comments.

v3:
Added changes to support numa on arm64 based platforms.
Tested these patches on cavium's multinode(2 node topology) platform.
In this patchset, defined and implemented dt bindings for numa mapping
for core and memory using device node property arm,associativity.

v2:
Defined and implemented numa map for memory, cores to node and
proximity distance matrix of nodes.

v1:
Initial patchset to support numa on arm64 platforms.

Note:
1. This patchset is tested for numa with dt on
   thunderx single socket and dual socket boards.
2. Numa DT booting needs the dt memory nodes, which are deleted in the
   current efi-stub; hence, to try numa with dt, you need to rebase on
   Ard's patchset:
   http://git.linaro.org/people/ard.biesheuvel/linux-arm.git/shortlog/refs/heads/arm64-uefi-early-fdt-handling
3. PATCH[7,8] are not tested for other architectures.

Ganapatrao Kulkarni (8):
  arm64, numa: adding numa support for arm64 platforms.
  Documentation, dt, numa: dt bindings for numa.
  dt, numa: adding numa dt binding implementation.
  arm64, numa : Enable numa dt for arm64 platforms.
  arm64, dt, thunderx: Add initial dts for Cavium Thunderx in 2 node
topology.
  arm64, mm, numa: Adding numa balancing support for arm64.
  topology, cleanup: Avoid redefinition of cpumask_of_pcibus in asm
header files.
  numa, mm, cleanup: remove redundant NODE_DATA macro from asm header
files.

 Documentation/devicetree/bindings/numa.txt  | 272 
 arch/arm64/Kconfig  |  26 +
 arch/arm64/boot/dts/cavium/Makefile |   2 +-
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dts  |  83 +++
 arch/arm64/boot/dts/cavium/thunder-88xx-2n.dtsi | 806 
 arch/arm64/include/asm/mmzone.h |  10 +
 arch/arm64/include/asm/numa.h   |  45 ++
 arch/arm64/include/asm/pgtable.h|  18 +
 arch/arm64/include/asm/topology.h   |   7 +
 arch/arm64/kernel/pci.c |  10 +
 arch/arm64/kernel/setup.c   |   4 +
 arch/arm64/kernel/smp.c |   4 +
 arch/arm64/mm/Makefile  |   1 +
 arch/arm64/mm/init.c|  34 +-
 arch/arm64/mm/mmu.c |   1 +
 arch/arm64/mm/numa.c| 404 
 arch/ia64/include/asm/topology.h|   4 -
 arch/m32r/include/asm/mmzone.h  |   4 +-
 arch/metag/include/asm/mmzone.h |   4 +-
 arch/metag/include/asm/topology.h   |   3 -
 arch/powerpc/include/asm/mmzone.h   |   8 +-
 arch/powerpc/include/asm/topology.h |   4 -
 arch/s390/include/asm/mmzone.h  |   6 +-
 arch/s390/include/asm/pci.h |   2 +-
 arch/s390/include/asm/topology.h|   1 +
 arch/sh/include/asm/mmzone.h|   4 +-
 arch/sh/include/asm/topology.h  |   3 -
 arch/sparc/include/asm/mmzone.h |   6 +-
 arch/tile/include/asm/pci.h |   2 -
 arch/tile/include/asm/topology.h|   3 +
 arch/x86/include/asm/mmzone.h   |   3 +-
 arch/x86/include/asm/mmzone_32.h|   5 -
 arch/x86/include/asm/mmzone_64.h|  17 -
 arch/x86/include/asm/pci.h  |   2 +-
 arch/x86/include/asm/topology.h |   1 +
 drivers/of/Kconfig  |  11 +
 drivers/of/Makefile |   1 +
 drivers/of/of_numa.c| 207 ++
 

[PATCH v10 6/8] arm64, mm, numa: Adding numa balancing support for arm64.

2016-02-02 Thread Ganapatrao Kulkarni
Enable NUMA balancing for arm64 platforms.
Add pte and pmd protnone helpers for use by automatic NUMA balancing.
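
As a rough illustration of the protnone test, the userspace sketch below checks that an entry has the PROT_NONE marker set while the valid bit is clear. The bit positions (valid at bit 0, PROT_NONE at bit 58) and the sketch_* names are assumptions for this example only; the authoritative definitions are in the arm64 page table headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_pte_t;

/* Assumed bit positions, for illustration only. */
#define SKETCH_PTE_VALID      (UINT64_C(1) << 0)
#define SKETCH_PTE_PROT_NONE  (UINT64_C(1) << 58)

/*
 * An entry counts as "protnone" when the PROT_NONE marker is set and
 * the valid bit is clear, so that any access through it faults.
 */
int sketch_pte_protnone(sketch_pte_t pte)
{
	return (pte & (SKETCH_PTE_VALID | SKETCH_PTE_PROT_NONE)) ==
	       SKETCH_PTE_PROT_NONE;
}
```

Masking both bits and comparing against the marker alone rejects entries that are valid, as well as entries that have neither bit set.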

Reviewed-by: Robert Richter 
Signed-off-by: Ganapatrao Kulkarni 
---
 arch/arm64/Kconfig   |  1 +
 arch/arm64/include/asm/pgtable.h | 18 ++
 2 files changed, 19 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fcf3950..8a7c02a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -11,6 +11,7 @@ config ARM64
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_SUPPORTS_ATOMIC_RMW
+   select ARCH_SUPPORTS_NUMA_BALANCING if NUMA
select ARCH_WANT_OPTIONAL_GPIOLIB
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
select ARCH_WANT_FRAME_POINTERS
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2d545d7a..24ce546 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -347,6 +347,24 @@ static inline pgprot_t mk_sect_prot(pgprot_t prot)
return __pgprot(pgprot_val(prot) & ~PTE_TABLE_BIT);
 }
 
+#ifdef CONFIG_NUMA_BALANCING
+
+/*
+ * These work without NUMA balancing but the kernel does not care. See the
+ * comment in include/asm-generic/pgtable.h
+ */
+static inline int pte_protnone(pte_t pte)
+{
+   return ((pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE);
+}
+
+static inline int pmd_protnone(pmd_t pmd)
+{
+   return pte_protnone(pmd_pte(pmd));
+}
+
+#endif /* CONFIG_NUMA_BALANCING */
+
 /*
  * THP definitions.
  */
-- 
1.8.1.4


[PATCH v10 8/8] numa, mm, cleanup: remove redundant NODE_DATA macro from asm header files.

2016-02-02 Thread Ganapatrao Kulkarni
NODE_DATA is defined across multiple asm header files.
Move the generic definition to asm-generic/mmzone.h to
remove the redundant definitions.

Reviewed-by: Robert Richter 
Signed-off-by: Ganapatrao Kulkarni 
---
 arch/arm64/include/asm/mmzone.h   |  4 +---
 arch/m32r/include/asm/mmzone.h|  4 +---
 arch/metag/include/asm/mmzone.h   |  4 +---
 arch/powerpc/include/asm/mmzone.h |  8 ++--
 arch/s390/include/asm/mmzone.h|  6 +-
 arch/sh/include/asm/mmzone.h  |  4 +---
 arch/sparc/include/asm/mmzone.h   |  6 ++
 arch/x86/include/asm/mmzone.h |  3 +--
 arch/x86/include/asm/mmzone_32.h  |  5 -
 arch/x86/include/asm/mmzone_64.h  | 17 -
 include/asm-generic/mmzone.h  | 24 
 11 files changed, 34 insertions(+), 51 deletions(-)
 delete mode 100644 arch/x86/include/asm/mmzone_64.h
 create mode 100644 include/asm-generic/mmzone.h

diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
index a0de9e6..611a1cf 100644
--- a/arch/arm64/include/asm/mmzone.h
+++ b/arch/arm64/include/asm/mmzone.h
@@ -4,9 +4,7 @@
 #ifdef CONFIG_NUMA
 
 #include 
-
-extern struct pglist_data *node_data[];
-#define NODE_DATA(nid) (node_data[(nid)])
+#include  
 
 #endif /* CONFIG_NUMA */
 #endif /* __ASM_MMZONE_H */
diff --git a/arch/m32r/include/asm/mmzone.h b/arch/m32r/include/asm/mmzone.h
index 115ced3..e3d66a0 100644
--- a/arch/m32r/include/asm/mmzone.h
+++ b/arch/m32r/include/asm/mmzone.h
@@ -7,12 +7,10 @@
 #define _ASM_MMZONE_H_
 
 #include 
+#include  
 
 #ifdef CONFIG_DISCONTIGMEM
 
-extern struct pglist_data *node_data[];
-#define NODE_DATA(nid) (node_data[nid])
-
 #define node_localnr(pfn, nid) ((pfn) - NODE_DATA(nid)->node_start_pfn)
 
 #define pmd_page(pmd)  (pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT))
diff --git a/arch/metag/include/asm/mmzone.h b/arch/metag/include/asm/mmzone.h
index 9c88a9c..b1e95b3 100644
--- a/arch/metag/include/asm/mmzone.h
+++ b/arch/metag/include/asm/mmzone.h
@@ -3,9 +3,7 @@
 
 #ifdef CONFIG_NEED_MULTIPLE_NODES
 #include 
-
-extern struct pglist_data *node_data[];
-#define NODE_DATA(nid) (node_data[nid])
+#include  
 
 static inline int pfn_to_nid(unsigned long pfn)
 {
diff --git a/arch/powerpc/include/asm/mmzone.h b/arch/powerpc/include/asm/mmzone.h
index 7b58917..da0c5ba 100644
--- a/arch/powerpc/include/asm/mmzone.h
+++ b/arch/powerpc/include/asm/mmzone.h
@@ -19,12 +19,6 @@
 
 #ifdef CONFIG_NEED_MULTIPLE_NODES
 
-extern struct pglist_data *node_data[];
-/*
- * Return a pointer to the node data for node n.
- */
-#define NODE_DATA(nid) (node_data[nid])
-
 /*
  * Following are specific to this numa platform.
  */
@@ -42,5 +36,7 @@ u64 memory_hotplug_max(void);
 #define memory_hotplug_max() memblock_end_of_DRAM()
 #endif /* CONFIG_NEED_MULTIPLE_NODES */
 
+#include  
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_MMZONE_H_ */
diff --git a/arch/s390/include/asm/mmzone.h b/arch/s390/include/asm/mmzone.h
index a9e834e..91f1fcc 100644
--- a/arch/s390/include/asm/mmzone.h
+++ b/arch/s390/include/asm/mmzone.h
@@ -7,10 +7,6 @@
 #ifndef _ASM_S390_MMZONE_H
 #define _ASM_S390_MMZONE_H
 
-#ifdef CONFIG_NUMA
+#include  
 
-extern struct pglist_data *node_data[];
-#define NODE_DATA(nid) (node_data[nid])
-
-#endif /* CONFIG_NUMA */
 #endif /* _ASM_S390_MMZONE_H */
diff --git a/arch/sh/include/asm/mmzone.h b/arch/sh/include/asm/mmzone.h
index 15a8496..c070d00 100644
--- a/arch/sh/include/asm/mmzone.h
+++ b/arch/sh/include/asm/mmzone.h
@@ -5,9 +5,7 @@
 
 #ifdef CONFIG_NEED_MULTIPLE_NODES
 #include 
-
-extern struct pglist_data *node_data[];
-#define NODE_DATA(nid) (node_data[nid])
+#include  
 
 static inline int pfn_to_nid(unsigned long pfn)
 {
diff --git a/arch/sparc/include/asm/mmzone.h b/arch/sparc/include/asm/mmzone.h
index 99d9b9f..ef1365b 100644
--- a/arch/sparc/include/asm/mmzone.h
+++ b/arch/sparc/include/asm/mmzone.h
@@ -5,13 +5,11 @@
 
 #include 
 
-extern struct pglist_data *node_data[];
-
-#define NODE_DATA(nid) (node_data[nid])
-
 extern int numa_cpu_lookup_table[];
 extern cpumask_t numa_cpumask_lookup_table[];
 
 #endif /* CONFIG_NEED_MULTIPLE_NODES */
 
+#include  
+
 #endif /* _SPARC64_MMZONE_H */
diff --git a/arch/x86/include/asm/mmzone.h b/arch/x86/include/asm/mmzone.h
index d497bc4..5a52815 100644
--- a/arch/x86/include/asm/mmzone.h
+++ b/arch/x86/include/asm/mmzone.h
@@ -1,5 +1,4 @@
 #ifdef CONFIG_X86_32
 # include 
-#else
-# include 
 #endif
+#include  
diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index 1ec990b..09f7cfb 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -8,11 +8,6 @@
 
 #include 
 
-#ifdef CONFIG_NUMA
-extern struct pglist_data *node_data[];
-#define NODE_DATA(nid) (node_data[nid])
-#endif /* CONFIG_NUMA */
-
 #ifdef CONFIG_DISCONTIGMEM
 
 /*
diff --git a/arch/x86/include/asm/mmzone_64.h b/arch/x86/include/asm/mmzone_64.h
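
As a rough illustration of the consolidation this patch describes, the sketch below keeps one shared definition of the node lookup macro instead of a copy per architecture. It is not the content of include/asm-generic/mmzone.h, which is not shown here; every name ending in _sketch, the array size, and the demo nodes are assumptions made so the example stands alone in userspace.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pglist_data. */
struct pglist_data_sketch {
    unsigned long node_start_pfn;   /* first page frame owned by the node */
};

/* Assumed upper bound on nodes, for this sketch only. */
#define MAX_NODES_SKETCH 4

/* One shared table of per-node data, indexed by node id. */
struct pglist_data_sketch *node_data_sketch[MAX_NODES_SKETCH];

/* Single definition of the lookup macro; the argument is parenthesized. */
#define NODE_DATA_SKETCH(nid) (node_data_sketch[(nid)])

/* A user of the macro, modeled on the node_localnr() line in the m32r hunk. */
#define node_localnr_sketch(pfn, nid) \
    ((pfn) - NODE_DATA_SKETCH(nid)->node_start_pfn)

/* Demo nodes so the macros can be exercised. */
struct pglist_data_sketch demo_node0 = { .node_start_pfn = 0x0UL };
struct pglist_data_sketch demo_node1 = { .node_start_pfn = 0x100000UL };
```

With the table filled in, node_localnr_sketch(0x100010UL, 1) yields the page frame number relative to the start of node 1.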

Re: [PATCH v6 8/9] Implement kernel live patching for ppc64le (ABIv2)

2016-02-02 Thread Petr Mladek
On Tue 2016-02-02 16:45:23, Torsten Duwe wrote:
> On Tue, Feb 02, 2016 at 01:12:24PM +0100, Petr Mladek wrote:
> > 
> > Hmm, the size of the offset is not a constant. In particular, leaf
> > functions do not set TOC before the mcount location.
> 
> To be slightly more precise, a leaf function that additionally uses
> no global data. No global function calls, no global data access =>
> no need to load the TOC.

Thanks for the explanation.
 
> > The result is that kernel crashes when trying to trace leaf function
> 
> The trampoline *requires* a proper TOC pointer to find the remote function
> entry point. If you jump onto the trampoline with the TOC from the caller's
> caller you'll grab some address from somewhere and jump into nirvana.

The dmesg messages suggested something like this.


> > By other words, it seems that the code generated with -mprofile-kernel
> > option has been buggy in all gcc versions.
> 
> Either that or we need bigger trampolines for everybody.
> 
> Michael, should we grow every module trampoline to always load R2,
> or fix GCC to recognise the generated bl _mcount as a global function call?
> Anton, what do you think?

BTW: Is the trampoline also used for classic probes? If not, we might need
a trampoline for them as well.

Note that the TOC setup is missing only when the problematic functions are
compiled with -mprofile-kernel. I still see the TOC setup when
compiling only with -pg.


Best Regards,
Petr

Re: [PATCH v6 8/9] Implement kernel live patching for ppc64le (ABIv2)

2016-02-02 Thread Petr Mladek
On Tue 2016-01-26 13:48:53, Petr Mladek wrote:
> On Tue 2016-01-26 11:50:25, Miroslav Benes wrote:
> > 
> > [ added Petr to CC list ]
> > 
> > On Mon, 25 Jan 2016, Torsten Duwe wrote:
> > 
> > >   * create the appropriate files+functions
> > > arch/powerpc/include/asm/livepatch.h
> > > klp_check_compiler_support,
> > > klp_arch_set_pc
> > > arch/powerpc/kernel/livepatch.c with a stub for
> > > klp_write_module_reloc
> > > This is architecture-independent work in progress.
> > >   * introduce a fixup in arch/powerpc/kernel/entry_64.S
> > > for local calls that are becoming global due to live patching.
> > > And of course do the main KLP thing: return to a maybe different
> > > address, possibly altered by the live patching ftrace op.
> > > 
> > > Signed-off-by: Torsten Duwe 
> > 
> > Hi,
> > 
> > I have a few questions...
> > 
> > We still need Petr's patch from [1] to make livepatch work, right? Could 
> > you, please, add it to this patch set to make it self-sufficient?
> > 
> > Second, what is the situation with mcount prologue between gcc < 6 and 
> > gcc-6? Are there only 12 bytes in gcc-6 prologue? If yes, we need to 
> > change Petr's patch to make it more general and to be able to cope with 
> > different prologues. This is unfortunate. Either way, please mention it 
> > somewhere in a changelog.
> 
> I am going to update the extra patch. There is an idea to detect the
> offset during build by scripts/recordmcount. This tool looks for the
> ftrace locations. The offset should always be a constant that depends
> on the used architecture, compiler, and compiler flags.
> 
> The tool is called post build. We might need to pass the constant
> as a symbol added to the binary. The tool already adds some symbols.

Hmm, the size of the offset is not a constant. In particular, leaf
functions do not set TOC before the mcount location.

For example, the code generated for int_to_scsilun() looks like:


00000000000002d0 <int_to_scsilun>:
 2d0:   a6 02 08 7c     mflr    r0
 2d4:   10 00 01 f8     std     r0,16(r1)
 2d8:   01 00 00 48     bl      2d8 <int_to_scsilun+0x8>
                        2d8: R_PPC64_REL24  _mcount
 2dc:   a6 02 08 7c     mflr    r0
 2e0:   10 00 01 f8     std     r0,16(r1)
 2e4:   e1 ff 21 f8     stdu    r1,-32(r1)
 2e8:   00 00 20 39     li      r9,0
 2ec:   00 00 24 f9     std     r9,0(r4)
 2f0:   04 00 20 39     li      r9,4
 2f4:   a6 03 29 7d     mtctr   r9
 2f8:   00 00 40 39     li      r10,0
 2fc:   02 c2 68 78     rldicl  r8,r3,56,8
 300:   78 23 89 7c     mr      r9,r4
 304:   ee 51 09 7d     stbux   r8,r9,r10
 308:   02 00 4a 39     addi    r10,r10,2
 30c:   01 00 69 98     stb     r3,1(r9)
 310:   02 84 63 78     rldicl  r3,r3,48,16
 314:   e8 ff 00 42     bdnz    2fc <int_to_scsilun+0x2c>
 318:   20 00 21 38     addi    r1,r1,32
 31c:   10 00 01 e8     ld      r0,16(r1)
 320:   a6 03 08 7c     mtlr    r0
 324:   20 00 80 4e     blr
 328:   00 00 00 60     nop
 32c:   00 00 42 60     ori     r2,r2,0


Note that non-leaf functions start with

0000000000000330 :
 330:   00 00 4c 3c     addis   r2,r12,0
                        330: R_PPC64_REL16_HA   .TOC.
 334:   00 00 42 38     addi    r2,r2,0
                        334: R_PPC64_REL16_LO   .TOC.+0x4
 338:   a6 02 08 7c     mflr    r0
 33c:   10 00 01 f8     std     r0,16(r1)
 340:   01 00 00 48     bl      340
                        340: R_PPC64_REL24  _mcount


The above code is generated from kernel-4.5-rc1 sources using

$> gcc --version
gcc (SUSE Linux) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


But I get similar code also with

$> gcc-6 --version
gcc-6 (SUSE Linux) 6.0.0 20160121 (experimental) [trunk revision 232670]
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



The result is that the kernel crashes when trying to trace leaf functions
from modules. The mcount location is replaced with a call (branch)
that does not work without the TOC stuff.

In other words, it seems that the code generated with the -mprofile-kernel
option has been buggy in all gcc versions.
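
To make the varying offset concrete, here is a small userspace sketch that scans the first instruction words of a function and reports where the bl sits and whether the two TOC setup instructions precede it. The instruction words are the ones from the two listings above, read as little-endian 32-bit values; the function names, macro names, and arrays are made up for this illustration and are not part of recordmcount or the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* bl: primary opcode 18 with AA=0, LK=1; the branch offset bits are masked out. */
#define SKETCH_BL_MASK        0xfc000003u
#define SKETCH_BL             0x48000001u
/* addis r2,r12,imm and addi r2,r2,imm with the 16-bit immediate masked out. */
#define SKETCH_IMM16_MASK     0xffff0000u
#define SKETCH_ADDIS_R2_R12   0x3c4c0000u
#define SKETCH_ADDI_R2_R2     0x38420000u

/* Returns 1 if the function starts with the TOC setup pair, else 0. */
int prologue_sets_toc(const uint32_t *insn, size_t n)
{
    return n >= 2 &&
           (insn[0] & SKETCH_IMM16_MASK) == SKETCH_ADDIS_R2_R12 &&
           (insn[1] & SKETCH_IMM16_MASK) == SKETCH_ADDI_R2_R2;
}

/* Returns the byte offset of the first bl from function start, or -1. */
long first_bl_offset(const uint32_t *insn, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((insn[i] & SKETCH_BL_MASK) == SKETCH_BL)
            return (long)(i * sizeof(uint32_t));
    }
    return -1;
}

/* Leaf function from the first listing: mflr, std, bl. */
const uint32_t leaf_prologue[] = {
    0x7c0802a6u, 0xf8010010u, 0x48000001u,
};

/* Second listing: addis, addi, mflr, std, bl. */
const uint32_t toc_prologue[] = {
    0x3c4c0000u, 0x38420000u, 0x7c0802a6u, 0xf8010010u, 0x48000001u,
};
```

For the leaf prologue the call sits 8 bytes into the function, for the other one 16 bytes, which is the difference a fixed build-time constant cannot capture.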

I am surprised that nobody found this earlier. Am I doing something
wrong?


Best Regards,
Petr

Re: [PATCH v6 8/9] Implement kernel live patching for ppc64le (ABIv2)

2016-02-02 Thread Denis Kirjanov
On 1/25/16, Torsten Duwe  wrote:
>   * create the appropriate files+functions
> arch/powerpc/include/asm/livepatch.h
> klp_check_compiler_support,
> klp_arch_set_pc
> arch/powerpc/kernel/livepatch.c with a stub for
> klp_write_module_reloc
> This is architecture-independent work in progress.
>   * introduce a fixup in arch/powerpc/kernel/entry_64.S
> for local calls that are becoming global due to live patching.
> And of course do the main KLP thing: return to a maybe different
> address, possibly altered by the live patching ftrace op.
>
> Signed-off-by: Torsten Duwe 
> ---
>  arch/powerpc/include/asm/livepatch.h | 45 +++
>  arch/powerpc/kernel/entry_64.S   | 51
> +---
>  arch/powerpc/kernel/livepatch.c  | 38 +++
>  3 files changed, 130 insertions(+), 4 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/livepatch.h
>  create mode 100644 arch/powerpc/kernel/livepatch.c
>
> diff --git a/arch/powerpc/include/asm/livepatch.h
> b/arch/powerpc/include/asm/livepatch.h
> new file mode 100644
> index 000..44e8a2d
> --- /dev/null
> +++ b/arch/powerpc/include/asm/livepatch.h
> @@ -0,0 +1,45 @@
> +/*
> + * livepatch.h - powerpc-specific Kernel Live Patching Core
> + *
> + * Copyright (C) 2015 SUSE
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef _ASM_POWERPC64_LIVEPATCH_H
> +#define _ASM_POWERPC64_LIVEPATCH_H
> +
> +#include 
> +#include 
> +
> +#ifdef CONFIG_LIVEPATCH
> +static inline int klp_check_compiler_support(void)
> +{
> +#if !defined(_CALL_ELF) || _CALL_ELF != 2 ||
> !defined(CC_USING_MPROFILE_KERNEL)
> + return 1;
> +#endif
> + return 0;
> +}
This function can be boolean.
> +
> +extern int klp_write_module_reloc(struct module *mod, unsigned long type,
> +unsigned long loc, unsigned long value);
> +
> +static inline void klp_arch_set_pc(struct pt_regs *regs, unsigned long ip)
> +{
> + regs->nip = ip;
> +}
> +#else
> +#error Live patching support is disabled; check CONFIG_LIVEPATCH
> +#endif
> +
> +#endif /* _ASM_POWERPC64_LIVEPATCH_H */
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 9e98aa1..f6e3ee7 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -1267,6 +1267,9 @@ _GLOBAL(ftrace_caller)
>   mflr    r3
>   std r3, _NIP(r1)
>   std r3, 16(r1)
> +#ifdef CONFIG_LIVEPATCH
> + mr  r14,r3  /* remember old NIP */
> +#endif
>   subi    r3, r3, MCOUNT_INSN_SIZE
>   mfmsr   r4
>   std r4, _MSR(r1)
> @@ -1283,7 +1286,10 @@ ftrace_call:
>   nop
>
>   ld  r3, _NIP(r1)
> - mtlr    r3
> + mtctr   r3  /* prepare to jump there */
> +#ifdef CONFIG_LIVEPATCH
> + cmpd    r14,r3  /* has NIP been altered? */
> +#endif
>
>   REST_8GPRS(0,r1)
>   REST_8GPRS(8,r1)
> @@ -1296,6 +1302,27 @@ ftrace_call:
>   mtlr    r12
>   mr  r2,r0   /* restore callee's TOC */
>
> +#ifdef CONFIG_LIVEPATCH
> + beq+    4f  /* likely(old_NIP == new_NIP) */
> +
> + /* For a local call, restore this TOC after calling the patch function.
> +  * For a global call, it does not matter what we restore here,
> +  * since the global caller does its own restore right afterwards,
> +  * anyway. Just insert a KLP_return_helper frame in any case,
> +  * so a patch function can always count on the changed stack offsets.
> +  */
> + stdu    r1,-32(r1)  /* open new mini stack frame */
> + std r0,24(r1)   /* save TOC now, unconditionally. */
> + bl  5f
> +5:   mflr    r12
> + addi    r12,r12,(KLP_return_helper+4-.)@l
> + std r12,LRSAVE(r1)
> + mtlr    r12
> + mfctr   r12 /* allow for TOC calculation in newfunc */
> + bctr
> +4:
> +#endif
> +
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>   stdu    r1, -112(r1)
>  .globl ftrace_graph_call
> @@ -1305,15 +1332,31 @@ _GLOBAL(ftrace_graph_stub)
>   addi    r1, r1, 112
>  #endif
>
> - mflr    r0  /* move this LR to CTR */
> - mtctr   r0
> -
>   ld  r0,LRSAVE(r1)   /* restore callee's lr at _mcount site */
>   mtlr    r0
>

Re: Failure on latest GIT - implicit declaration of function ‘pte_swp_clear_soft_dirty’

2016-02-02 Thread Pranith Kumar
On Tue, Feb 2, 2016 at 1:48 AM, Aneesh Kumar K.V
 wrote:
>
> This patch didn't work for you ?
>
> http://mid.gmane.org/1454086969-21074-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com
>

This actually is a better patch. I didn't realize that we have the _64 version.

Thanks!
-- 
Pranith

Re: [PATCH v6 8/9] Implement kernel live patching for ppc64le (ABIv2)

2016-02-02 Thread Torsten Duwe
On Tue, Feb 02, 2016 at 01:12:24PM +0100, Petr Mladek wrote:
> 
> Hmm, the size of the offset is not a constant. In particular, leaf
> functions do not set TOC before the mcount location.

To be slightly more precise, a leaf function that additionally uses
no global data. No global function calls, no global data access =>
no need to load the TOC.

> For example, the code generated for int_to_scsilun() looks like:
> 
> 
> 00000000000002d0 <int_to_scsilun>:
>  2d0:   a6 02 08 7c     mflr    r0
>  2d4:   10 00 01 f8     std     r0,16(r1)
>  2d8:   01 00 00 48     bl      2d8 <int_to_scsilun+0x8>
>                         2d8: R_PPC64_REL24  _mcount
[...]
> The above code is generated from kernel-4.5-rc1 sources using
> 
> $> gcc --version
> gcc (SUSE Linux) 4.8.5
> 
> But I get similar code also with
> 
> $> gcc-6 --version
> gcc-6 (SUSE Linux) 6.0.0 20160121 (experimental) [trunk revision 232670]
> 
> 
> The result is that kernel crashes when trying to trace leaf function

The trampoline *requires* a proper TOC pointer to find the remote function
entry point. If you jump onto the trampoline with the TOC from the caller's
caller you'll grab some address from somewhere and jump into nirvana.

> By other words, it seems that the code generated with -mprofile-kernel
> option has been buggy in all gcc versions.

Either that or we need bigger trampolines for everybody.

Michael, should we grow every module trampoline to always load R2,
or fix GCC to recognise the generated bl _mcount as a global function call?
Anton, what do you think?

Torsten


[PATCH] mtd/ifc: Add support for IFC controller version 2.0

2016-02-02 Thread Raghav Dogra
IFC controller version 2.0 uses a different memory map page size:
up to IFC 1.4 the page size is 4 KB, and from IFC 2.0 onward it is 64 KB.
This patch splits the IFC global and runtime registers so that each is
accessed at the offset matching the page size in use.

Signed-off-by: Jaiprakash Singh 
Signed-off-by: Raghav Dogra 
---
This patch is a new version of the following patch, with a changed title:
https://patchwork.ozlabs.org/patch/557391/

This patch is dependent on the 
"drivers/memory: Add deep sleep support for IFC" patch:
https://patchwork.ozlabs.org/patch/564785/

 drivers/memory/fsl_ifc.c| 250 +---
 drivers/mtd/nand/fsl_ifc_nand.c |  72 ++--
 include/linux/fsl_ifc.h |  48 +---
 3 files changed, 202 insertions(+), 168 deletions(-)

diff --git a/drivers/memory/fsl_ifc.c b/drivers/memory/fsl_ifc.c
index f82a245..d00076b 100644
--- a/drivers/memory/fsl_ifc.c
+++ b/drivers/memory/fsl_ifc.c
@@ -63,11 +63,11 @@ int fsl_ifc_find(phys_addr_t addr_base)
 {
int i = 0;
 
-   if (!fsl_ifc_ctrl_dev || !fsl_ifc_ctrl_dev->regs)
+   if (!fsl_ifc_ctrl_dev || !fsl_ifc_ctrl_dev->gregs)
return -ENODEV;
 
for (i = 0; i < fsl_ifc_ctrl_dev->banks; i++) {
-   u32 cspr = ifc_in32(&fsl_ifc_ctrl_dev->regs->cspr_cs[i].cspr);
+   u32 cspr = ifc_in32(&fsl_ifc_ctrl_dev->gregs->cspr_cs[i].cspr);
if (cspr & CSPR_V && (cspr & CSPR_BA) ==
convert_ifc_address(addr_base))
return i;
@@ -79,7 +79,7 @@ EXPORT_SYMBOL(fsl_ifc_find);
 
 static int fsl_ifc_ctrl_init(struct fsl_ifc_ctrl *ctrl)
 {
-   struct fsl_ifc_regs __iomem *ifc = ctrl->regs;
+   struct fsl_ifc_global __iomem *ifc = ctrl->gregs;
 
/*
 * Clear all the common status and event registers
@@ -108,7 +108,7 @@ static int fsl_ifc_ctrl_remove(struct platform_device *dev)
irq_dispose_mapping(ctrl->nand_irq);
irq_dispose_mapping(ctrl->irq);
 
-   iounmap(ctrl->regs);
+   iounmap(ctrl->gregs);
 
dev_set_drvdata(&dev->dev, NULL);
kfree(ctrl);
@@ -126,7 +126,7 @@ static DEFINE_SPINLOCK(nand_irq_lock);
 
 static u32 check_nand_stat(struct fsl_ifc_ctrl *ctrl)
 {
-   struct fsl_ifc_regs __iomem *ifc = ctrl->regs;
+   struct fsl_ifc_runtime __iomem *ifc = ctrl->rregs;
unsigned long flags;
u32 stat;
 
@@ -161,7 +161,7 @@ static irqreturn_t fsl_ifc_nand_irq(int irqno, void *data)
 static irqreturn_t fsl_ifc_ctrl_irq(int irqno, void *data)
 {
struct fsl_ifc_ctrl *ctrl = data;
-   struct fsl_ifc_regs __iomem *ifc = ctrl->regs;
+   struct fsl_ifc_global __iomem *ifc = ctrl->gregs;
u32 err_axiid, err_srcid, status, cs_err, err_addr;
irqreturn_t ret = IRQ_NONE;
 
@@ -219,6 +219,7 @@ static int fsl_ifc_ctrl_probe(struct platform_device *dev)
 {
int ret = 0;
int version, banks;
+   void __iomem *addr;
 
dev_info(&dev->dev, "Freescale Integrated Flash Controller\n");
 
@@ -229,22 +230,13 @@ static int fsl_ifc_ctrl_probe(struct platform_device *dev)
dev_set_drvdata(&dev->dev, fsl_ifc_ctrl_dev);
 
/* IOMAP the entire IFC region */
-   fsl_ifc_ctrl_dev->regs = of_iomap(dev->dev.of_node, 0);
-   if (!fsl_ifc_ctrl_dev->regs) {
+   fsl_ifc_ctrl_dev->gregs = of_iomap(dev->dev.of_node, 0);
+   if (!fsl_ifc_ctrl_dev->gregs) {
dev_err(&dev->dev, "failed to get memory region\n");
ret = -ENODEV;
goto err;
}
 
-   version = ifc_in32(&fsl_ifc_ctrl_dev->regs->ifc_rev) &
-   FSL_IFC_VERSION_MASK;
-   banks = (version == FSL_IFC_VERSION_1_0_0) ? 4 : 8;
-   dev_info(&dev->dev, "IFC version %d.%d, %d banks\n",
-   version >> 24, (version >> 16) & 0xf, banks);
-
-   fsl_ifc_ctrl_dev->version = version;
-   fsl_ifc_ctrl_dev->banks = banks;
-
if (of_property_read_bool(dev->dev.of_node, "little-endian")) {
fsl_ifc_ctrl_dev->little_endian = true;
dev_dbg(&dev->dev, "IFC REGISTERS are LITTLE endian\n");
@@ -253,8 +245,9 @@ static int fsl_ifc_ctrl_probe(struct platform_device *dev)
dev_dbg(&dev->dev, "IFC REGISTERS are BIG endian\n");
}
 
-   version = ioread32be(&fsl_ifc_ctrl_dev->regs->ifc_rev) &
+   version = ifc_in32(&fsl_ifc_ctrl_dev->gregs->ifc_rev) &
FSL_IFC_VERSION_MASK;
+
banks = (version == FSL_IFC_VERSION_1_0_0) ? 4 : 8;
dev_info(&dev->dev, "IFC version %d.%d, %d banks\n",
version >> 24, (version >> 16) & 0xf, banks);
@@ -262,6 +255,13 @@ static int fsl_ifc_ctrl_probe(struct platform_device *dev)
fsl_ifc_ctrl_dev->version = version;
fsl_ifc_ctrl_dev->banks = banks;
 
+   addr = fsl_ifc_ctrl_dev->gregs;
+   if (version >= FSL_IFC_VERSION_2_0_0)
+   addr += PGOFFSET_64K;
+   else
+   addr += PGOFFSET_4K;
+   
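
The version-dependent offset chosen in the hunk above can be sketched in isolation as follows. This is only an illustration: the 4 KB and 64 KB sizes come from the commit message, while the numeric encoding of version 2.0 (0x02000000) and every name ending in _SKETCH or starting with ifc_ are assumptions of this example, not values copied from include/linux/fsl_ifc.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Page sizes taken from the commit message. */
#define PGOFFSET_4K_SKETCH     (4u * 1024u)
#define PGOFFSET_64K_SKETCH    (64u * 1024u)

/* Assumed encoding: major version in bits 31..24, minor in bits 19..16. */
#define IFC_VERSION_2_0_0_SKETCH  0x02000000u

/* Offset of the runtime registers from the start of the global registers. */
size_t ifc_runtime_offset(uint32_t version)
{
    if (version >= IFC_VERSION_2_0_0_SKETCH)
        return PGOFFSET_64K_SKETCH;
    return PGOFFSET_4K_SKETCH;
}

/* Same decoding as the dev_info() call in the patch. */
unsigned int ifc_major(uint32_t version)
{
    return version >> 24;
}

unsigned int ifc_minor(uint32_t version)
{
    return (version >> 16) & 0xf;
}
```

Under these assumptions a 1.4 controller (0x01040000) places the runtime registers 4096 bytes after the global block, and a 2.0 controller places them 65536 bytes after it.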

Re: [PATCH v6 1/9] ppc64 (le): prepare for -mprofile-kernel

2016-02-02 Thread AKASHI Takahiro

Hi,

On 01/26/2016 12:26 AM, Torsten Duwe wrote:

The gcc switch -mprofile-kernel, available for ppc64 on gcc > 4.8.5,
allows to call _mcount very early in the function, which low-level
ASM code and code patching functions need to consider.
Especially the link register and the parameter registers are still
alive and not yet saved into a new stack frame.


I'm thinking of implementing live patch support *for arm64*, and as part of
that effort, we are proposing[1] a new *generic* gcc option, -fprolog-add=N.
This option will insert N nop instructions at the beginning of each function.
We would then have to initialize that code at boot time in order to use
it later for FTRACE_WITH_REGS. Other than that, it would work similarly
to -mfentry on x86 (and -mprofile-kernel?).

I'm totally unfamiliar with the ppc architecture, but I wonder
whether this option would also be useful for other architectures.

I would really appreciate it if you could share your thoughts with me.

[1]  https://gcc.gnu.org/ml/gcc/2015-05/msg00267.html, and
 https://gcc.gnu.org/ml/gcc/2015-10/msg00090.html

Thanks,
-Takahiro AKASHI


Signed-off-by: Torsten Duwe 
---
  arch/powerpc/kernel/entry_64.S  | 45 +++--
  arch/powerpc/kernel/ftrace.c| 12 +--
  arch/powerpc/kernel/module_64.c | 14 +
  3 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index a94f155..e7cd043 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -1206,7 +1206,12 @@ _GLOBAL(enter_prom)
  #ifdef CONFIG_DYNAMIC_FTRACE
  _GLOBAL(mcount)
  _GLOBAL(_mcount)
-   blr
+   std r0,LRSAVE(r1) /* gcc6 does this _after_ this call _only_ */
+   mflr    r0
+   mtctr   r0
+   ld  r0,LRSAVE(r1)
+   mtlr    r0
+   bctr

  _GLOBAL_TOC(ftrace_caller)
/* Taken from output of objdump from lib64/glibc */
@@ -1262,13 +1267,28 @@ _GLOBAL(ftrace_stub)

  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
  _GLOBAL(ftrace_graph_caller)
+#ifdef CC_USING_MPROFILE_KERNEL
+   /* with -mprofile-kernel, parameter regs are still alive at _mcount */
+   std r10, 104(r1)
+   std r9, 96(r1)
+   std r8, 88(r1)
+   std r7, 80(r1)
+   std r6, 72(r1)
+   std r5, 64(r1)
+   std r4, 56(r1)
+   std r3, 48(r1)
+   mfctr   r4  /* ftrace_caller has moved local addr here */
+   std r4, 40(r1)
+   mflr    r3  /* ftrace_caller has restored LR from stack */
+#else
/* load r4 with local address */
ld  r4, 128(r1)
-   subi    r4, r4, MCOUNT_INSN_SIZE

/* Grab the LR out of the caller stack frame */
ld  r11, 112(r1)
ld  r3, 16(r11)
+#endif
+   subi    r4, r4, MCOUNT_INSN_SIZE

bl  prepare_ftrace_return
nop
@@ -1277,6 +1297,26 @@ _GLOBAL(ftrace_graph_caller)
 * prepare_ftrace_return gives us the address we divert to.
 * Change the LR in the callers stack frame to this.
 */
+
+#ifdef CC_USING_MPROFILE_KERNEL
+   mtlr    r3
+
+   ld  r0, 40(r1)
+   mtctr   r0
+   ld  r10, 104(r1)
+   ld  r9, 96(r1)
+   ld  r8, 88(r1)
+   ld  r7, 80(r1)
+   ld  r6, 72(r1)
+   ld  r5, 64(r1)
+   ld  r4, 56(r1)
+   ld  r3, 48(r1)
+
+   addi    r1, r1, 112
+   mflr    r0
+   std r0, LRSAVE(r1)
+   bctr
+#else
ld  r11, 112(r1)
std r3, 16(r11)

@@ -1284,6 +1324,7 @@ _GLOBAL(ftrace_graph_caller)
mtlr    r0
addi    r1, r1, 112
blr
+#endif

  _GLOBAL(return_to_handler)
/* need to save return values */
diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index 44d4d8e..080c525 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -306,11 +306,19 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 * The load offset is different depending on the ABI. For simplicity
 * just mask it out when doing the compare.
 */
+#ifndef CC_USING_MPROFILE_KERNEL
if ((op[0] != 0x48000008) || ((op[1] & 0xffff0000) != 0xe8410000)) {
-   pr_err("Unexpected call sequence: %x %x\n", op[0], op[1]);
+   pr_err("Unexpected call sequence at %p: %x %x\n",
+   ip, op[0], op[1]);
return -EINVAL;
}
-
+#else
+   /* look for patched "NOP" on ppc64 with -mprofile-kernel */
+   if (op[0] != 0x60000000) {
+   pr_err("Unexpected call at %p: %x\n", ip, op[0]);
+   return -EINVAL;
+   }
+#endif
/* If we never set up a trampoline to ftrace_caller, then bail */
if (!rec->arch.mod->arch.tramp) {
pr_err("No ftrace trampoline\n");
diff --git a/arch/powerpc/kernel/module_64.c 

Re: Failure on latest GIT - appletouch fails to register

2016-02-02 Thread Mike
This is not made any less confusing by the presence of the evdev event4
node and the correct /dev/input/mouse device: cat on the device shows that
no input is passed. Xorg correctly finds and assigns the pad and its
functions, and evdev sets it up, but no input is registered. I cannot spot
any changes compared to 4.3.3 that would account for this.

On 3 February 2016 at 00:05, Mike  wrote:

> Normally, as far as I can tell, it should register as an input device, and
> that's not happening. I verified that the config entry CONFIG_MOUSE_APPLETOUCH
> is selected and compiled in for both tests I've done, for rc-1 and rc-2.
>
> syslog.1:Feb  1 21:58:40 PowerBook-G4 kernel: [   14.744538] usbcore:
> registered new interface driver appletouch
> syslog.1:Feb  1 22:28:05 PowerBook-G4 kernel: [   14.904449] input:
> appletouch as /devices/pci0001:10/0001:10:1a.0/usb2/2-2/2-2:1.1/input/input6
> syslog.1:Feb  1 22:28:05 PowerBook-G4 kernel: [   14.905904] usbcore:
> registered new interface driver appletouch
> syslog.1:Feb  2 03:39:38 PowerBook-G4 kernel: [   15.423289] usbcore:
> registered new interface driver appletouch
>
> Any clues? It worked in 4.3.3 at least. Can anyone point me to where I could
> begin to look? Have any structures changed that could account for this?
>
>
> On 2 February 2016 at 17:55, Mike  wrote:
>
>> Agreed, raised an eyebrow initially when select ppc64 and 32 :D
>>
>> I'll report back on the trackpad issue later; I can't remember seeing any
>> changes that ought to affect it. I guess the compile will be done in a good
>> hour or so; I took the time to slim it down to something reasonable.
>>
>> Thanks man
>> On 2 Feb 2016 18:14, "Pranith Kumar"  wrote:
>>
>>> On Tue, Feb 2, 2016 at 1:48 AM, Aneesh Kumar K.V
>>>  wrote:
>>> >
>>> > This patch didn't work for you ?
>>> >
>>> >
>>> http://mid.gmane.org/1454086969-21074-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com
>>> >
>>>
>>> This actually is a better patch. I didn't realize that we have the _64
>>> version.
>>>
>>> Thanks!
>>> --
>>> Pranith
>>>
>>
>

Re: [RFCv2 3/9] arch/powerpc: Handle removing maybe-present bolted HPTEs

2016-02-02 Thread Denis Kirjanov
On 1/29/16, David Gibson  wrote:
> At the moment the hpte_removebolted callback in ppc_md returns void and
> will BUG_ON() if the hpte it's asked to remove doesn't exist in the first
> place.  This is awkward for the case of cleaning up a mapping which was
> partially made before failing.
>
> So, we add a return value to hpte_removebolted, and have it return ENOENT
> in the case that the HPTE to remove didn't exist in the first place.
>
> In the (sole) caller, we propagate errors in hpte_removebolted to its
> caller to handle.  However, we handle ENOENT specially, continuing to
> complete the unmapping over the specified range before returning the error
> to the caller.
>
> This means that htab_remove_mapping() will work sanely on a partially
> present mapping, removing any HPTEs which are present, while also returning
> ENOENT to its caller in case it's important there.
>
> There are two callers of htab_remove_mapping():
>- In remove_section_mapping() we already WARN_ON() any error return,
>  which is reasonable - in this case the mapping should be fully
>  present
>- In vmemmap_remove_mapping() we BUG_ON() any error.  We change that to
>  just a WARN_ON() in the case of ENOENT, since failing to remove a
>  mapping that wasn't there in the first place probably shouldn't be
>  fatal.
>
> Signed-off-by: David Gibson 
> ---
>  arch/powerpc/include/asm/machdep.h|  2 +-
>  arch/powerpc/mm/hash_utils_64.c   | 10 +++---
>  arch/powerpc/mm/init_64.c |  9 +
>  arch/powerpc/platforms/pseries/lpar.c |  7 +--
>  4 files changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/machdep.h
> b/arch/powerpc/include/asm/machdep.h
> index 3f191f5..a7d3f66 100644
> --- a/arch/powerpc/include/asm/machdep.h
> +++ b/arch/powerpc/include/asm/machdep.h
> @@ -54,7 +54,7 @@ struct machdep_calls {
>  int psize, int apsize,
>  int ssize);
>   long(*hpte_remove)(unsigned long hpte_group);
> - void(*hpte_removebolted)(unsigned long ea,
> + long(*hpte_removebolted)(unsigned long ea,
>int psize, int ssize);
>   void(*flush_hash_range)(unsigned long number, int local);
>   void(*hugepage_invalidate)(unsigned long vsid,
> diff --git a/arch/powerpc/mm/hash_utils_64.c
> b/arch/powerpc/mm/hash_utils_64.c
> index 9f7d727..0737eae 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -269,6 +269,7 @@ int htab_remove_mapping(unsigned long vstart, unsigned
> long vend,
>  {
>   unsigned long vaddr;
>   unsigned int step, shift;
> + int rc = 0;
>
>   shift = mmu_psize_defs[psize].shift;
>   step = 1 << shift;
> @@ -276,10 +277,13 @@ int htab_remove_mapping(unsigned long vstart, unsigned
> long vend,
>   if (!ppc_md.hpte_removebolted)
>   return -ENODEV;
>
> - for (vaddr = vstart; vaddr < vend; vaddr += step)
> - ppc_md.hpte_removebolted(vaddr, psize, ssize);
> + for (vaddr = vstart; vaddr < vend; vaddr += step) {
> + rc = ppc_md.hpte_removebolted(vaddr, psize, ssize);
But the function prototype's return type is long.

> + if ((rc < 0) && (rc != -ENOENT))
> + return rc;
> + }
>
> - return 0;
> + return rc;
>  }
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index 379a6a9..baa1a23 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -232,10 +232,11 @@ static void __meminit vmemmap_create_mapping(unsigned
> long start,
>  static void vmemmap_remove_mapping(unsigned long start,
>  unsigned long page_size)
>  {
> - int mapped = htab_remove_mapping(start, start + page_size,
> -  mmu_vmemmap_psize,
> -  mmu_kernel_ssize);
> - BUG_ON(mapped < 0);
> + int rc = htab_remove_mapping(start, start + page_size,
> +  mmu_vmemmap_psize,
> +  mmu_kernel_ssize);
> + BUG_ON((rc < 0) && (rc != -ENOENT));
> + WARN_ON(rc == -ENOENT);
>  }
>  #endif
>
> diff --git a/arch/powerpc/platforms/pseries/lpar.c
> b/arch/powerpc/platforms/pseries/lpar.c
> index 477290a..92d472d 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -505,7 +505,7 @@ static void pSeries_lpar_hugepage_invalidate(unsigned
> long vsid,
>  }
>  #endif
>
> -static void pSeries_lpar_hpte_removebolted(unsigned long ea,
> +static long pSeries_lpar_hpte_removebolted(unsigned long ea,
>  int psize, int ssize)
>  {
>   unsigned long vpn;
> @@ -515,11 +515,14 @@ 
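
The error handling described in the commit message can be sketched in userspace like this. It is a sketch under stated assumptions, not the patch itself: the callback type, the demo callback, and the names ending in _sketch are invented here; rc is declared long to match the callback's return type, as the review remark suggests; and keeping -ENOENT "sticky" across later successful iterations is this sketch's reading of the commit message rather than something taken from the diff.

```c
#include <errno.h>

/* Hypothetical callback type standing in for ppc_md.hpte_removebolted. */
typedef long (*removebolted_fn)(unsigned long ea);

/* Counts callback invocations so the loop behaviour can be observed. */
int demo_calls;

/* Demo callback: one address has no HPTE, one fails hard. */
long demo_removebolted(unsigned long ea)
{
    demo_calls++;
    if (ea == 0x2000UL)
        return -ENOENT;     /* nothing was mapped here */
    if (ea == 0x5000UL)
        return -EIO;        /* stand-in for any other failure */
    return 0;
}

/*
 * Walk [vstart, vend) in steps. A missing entry (-ENOENT) is remembered
 * but does not stop the walk; any other error aborts immediately.
 */
long remove_mapping_sketch(unsigned long vstart, unsigned long vend,
                           unsigned long step, removebolted_fn remove)
{
    long ret = 0;
    unsigned long vaddr;

    for (vaddr = vstart; vaddr < vend; vaddr += step) {
        long rc = remove(vaddr);

        if (rc == -ENOENT) {
            ret = rc;       /* report it later, keep unmapping */
            continue;
        }
        if (rc < 0)
            return rc;      /* hard failure: stop here */
    }
    return ret;
}
```

Calling remove_mapping_sketch(0x1000, 0x4000, 0x1000, demo_removebolted) visits all three addresses and still reports -ENOENT, while a range containing 0x5000 stops at that address.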

Re: [PATCH] ppc64 boot: Wait for boot cpu to show up if nr_cpus limit is about to hit.

2016-02-02 Thread Denis Kirjanov
On 2/1/16, Mahesh J Salgaonkar  wrote:
> From: Mahesh Salgaonkar 
>
> The kernel boot parameter 'nr_cpus=' allows one to specify number of
> possible cpus in the system. In the normal scenario the first cpu (cpu0)
> that shows up is the boot cpu and hence it gets covered under nr_cpus
> limit.
>
> But this assumption will be broken in the kdump scenario, where the kdump kernel
> after a crash can boot up on a non-zero boot cpu. The paca structure
> allocation depends on the value of nr_cpus and is indexed using logical cpu
> ids. This will definitely be an issue if boot cpu id > nr_cpus
And what happens in this case? Have you tried it out?
>
> This patch modifies allocate_pacas() and smp_setup_cpu_maps() to
> accommodate boot cpu for the case where boot_cpuid > nr_cpu_ids.
>
> This change would help to reduce the memory reservation requirement for
> kdump on ppc64.
>
> Signed-off-by: Mahesh Salgaonkar 
> ---
>  arch/powerpc/include/asm/paca.h|3 +++
>  arch/powerpc/include/asm/smp.h |1 +
>  arch/powerpc/kernel/paca.c |   23 ++
>  arch/powerpc/kernel/prom.c |   37
> +++-
>  arch/powerpc/kernel/setup-common.c |   25 
>  5 files changed, 83 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/paca.h
> b/arch/powerpc/include/asm/paca.h
> index 70bd438..9be48b4 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -41,6 +41,9 @@ extern unsigned int debug_smp_processor_id(void); /* from
> linux/smp.h */
>  #define get_lppaca() (get_paca()->lppaca_ptr)
>  #define get_slb_shadow() (get_paca()->slb_shadow_ptr)
>
> +/* Maximum number of threads per core. */
> +#define  MAX_SMT 8
> +
>  struct task_struct;
>
>  /*
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index 825663c..0a5b99f 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -30,6 +30,7 @@
>  #include 
>
>  extern int boot_cpuid;
> +extern int boot_hw_cpuid;
>  extern int spinning_secondaries;
>
>  extern void cpu_die(void);
> diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
> index 01ea0ed..96e5715 100644
> --- a/arch/powerpc/kernel/paca.c
> +++ b/arch/powerpc/kernel/paca.c
> @@ -206,6 +206,7 @@ void __init allocate_pacas(void)
>  {
>   u64 limit;
>   int cpu;
> + int nr_cpus;
>
>   limit = ppc64_rma_size;
>
> @@ -218,20 +219,32 @@ void __init allocate_pacas(void)
>   limit = min(0x1000ULL, limit);
>  #endif
>
> - paca_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpu_ids);
> + /*
> +  * Always align up the nr_cpu_ids to SMT threads and allocate
> +  * the paca. This will help us to prepare for a situation where
> +  * boot cpu id > nr_cpu_ids. We will use the last nthreads
> +  * slots (nthreads == threads per core) to accommodate a core
> +  * that contains boot cpu thread.
> +  *
> +  * Do not change nr_cpu_ids value here. Let us do that in
> +  * early_init_dt_scan_cpus() where we know exact value
> +  * of threads per core.
> +  */
> + nr_cpus = _ALIGN_UP(nr_cpu_ids, MAX_SMT);
> + paca_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpus);
>
>   paca = __va(memblock_alloc_base(paca_size, PAGE_SIZE, limit));
>   memset(paca, 0, paca_size);
>
>   printk(KERN_DEBUG "Allocated %u bytes for %d pacas at %p\n",
> - paca_size, nr_cpu_ids, paca);
> + paca_size, nr_cpus, paca);
>
> - allocate_lppacas(nr_cpu_ids, limit);
> + allocate_lppacas(nr_cpus, limit);
>
> - allocate_slb_shadows(nr_cpu_ids, limit);
> + allocate_slb_shadows(nr_cpus, limit);
>
>   /* Can't use for_each_*_cpu, as they aren't functional yet */
> - for (cpu = 0; cpu < nr_cpu_ids; cpu++)
> + for (cpu = 0; cpu < nr_cpus; cpu++)
>   initialise_paca(&paca[cpu], cpu);
>  }
>
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 7030b03..9d1568f 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -291,6 +291,29 @@ static void __init
> check_cpu_feature_properties(unsigned long node)
>   }
>  }
>
> +/*
> + * Adjust the logical id of a boot cpu to fall under nr_cpu_ids. Map it to
> + * last core slot in the allocated paca array.
> + *
> + * e.g. on SMT=8 system, kernel booted with nr_cpus=1 and boot cpu = 33,
> + * align nr_cpu_ids to MAX_SMT value 8. Allocate paca array to hold up-to
> + * MAX_SMT=8 cpus. Since boot cpu 33 is greater than nr_cpus (8), adjust
> + * its logical id so that new id becomes less than nr_cpu_ids. Make sure
> + * that boot cpu's new logical id is aligned to its thread id and falls
> + * under last nthreads slots available in paca array. In this case the
> + * boot cpu 33 is adjusted to new boot cpu id 1.
> + *
> + */
> +static 

[PATCH v2] ppc64 boot: Wait for boot cpu to show up if nr_cpus limit is about to hit.

2016-02-02 Thread Mahesh J Salgaonkar
From: Mahesh Salgaonkar 

The kernel boot parameter 'nr_cpus=' allows one to specify number of
possible cpus in the system. In the normal scenario the first cpu (cpu0)
that shows up is the boot cpu and hence it gets covered under nr_cpus
limit.

But this assumption will be broken in kdump scenario where kdump kernel
after a crash can boot up on a non-zero boot cpu. The paca structure
allocation depends on value of nr_cpus and is indexed using logical cpu
ids. This definitely will be an issue if boot cpu id > nr_cpus.

This patch modifies allocate_pacas() and smp_setup_cpu_maps() to
accommodate boot cpu for the case where boot_cpuid > nr_cpu_ids.

This change would help to reduce the memory reservation requirement for
kdump on ppc64.

Signed-off-by: Mahesh Salgaonkar 
---
Changes in V2:
- Fixed error reported by auto build test at
  https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-February/138620.html
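
The slot arithmetic described in the commit message can be sketched in
standalone C as below. This is only an illustration, not the code in the
patch: align_up(), paca_slots() and remap_boot_cpu() are hypothetical
helper names, and the remapping rule (last core's slots, same thread
index) is an assumption inferred from the comment above
adjust_boot_cpuid() rather than from its body.

```c
/* Assumed upper bound on threads per core, mirroring MAX_SMT in the patch. */
#define MAX_SMT 8

/* Round n up to the next multiple of align (align must be a power of two). */
static unsigned int align_up(unsigned int n, unsigned int align)
{
	return (n + align - 1) & ~(align - 1);
}

/* Number of paca slots to allocate for a given nr_cpus= value. */
static unsigned int paca_slots(unsigned int nr_cpu_ids)
{
	return align_up(nr_cpu_ids, MAX_SMT);
}

/*
 * Hypothetical remapping of an out-of-range boot cpu id: place it in the
 * last nthreads slots of the allocated array, keeping its thread index
 * within the core. Ids already below nr_cpu_ids are returned unchanged.
 */
static unsigned int remap_boot_cpu(unsigned int boot_cpu,
				   unsigned int nr_cpu_ids,
				   unsigned int nthreads)
{
	unsigned int slots = paca_slots(nr_cpu_ids);

	if (boot_cpu < nr_cpu_ids)
		return boot_cpu;

	return (slots - nthreads) + (boot_cpu % nthreads);
}
```

With nr_cpus=1 on an SMT=8 system, paca_slots(1) gives 8 slots and
remap_boot_cpu(33, 1, 8) gives 1, which agrees with the example in the
comment above adjust_boot_cpuid().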
---
 arch/powerpc/include/asm/paca.h|3 +++
 arch/powerpc/include/asm/smp.h |1 +
 arch/powerpc/kernel/paca.c |   23 +
 arch/powerpc/kernel/prom.c |   39 +++-
 arch/powerpc/kernel/setup-common.c |   25 +++
 5 files changed, 85 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 70bd438..9be48b4 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -41,6 +41,9 @@ extern unsigned int debug_smp_processor_id(void); /* from 
linux/smp.h */
 #define get_lppaca()   (get_paca()->lppaca_ptr)
 #define get_slb_shadow()   (get_paca()->slb_shadow_ptr)
 
+/* Maximum number of threads per core. */
+#define MAX_SMT 8
+
 struct task_struct;
 
 /*
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 825663c..0a5b99f 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -30,6 +30,7 @@
 #include 
 
 extern int boot_cpuid;
+extern int boot_hw_cpuid;
 extern int spinning_secondaries;
 
 extern void cpu_die(void);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 01ea0ed..96e5715 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -206,6 +206,7 @@ void __init allocate_pacas(void)
 {
u64 limit;
int cpu;
+   int nr_cpus;
 
limit = ppc64_rma_size;
 
@@ -218,20 +219,32 @@ void __init allocate_pacas(void)
limit = min(0x1000ULL, limit);
 #endif
 
-   paca_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpu_ids);
+   /*
+* Always align up the nr_cpu_ids to SMT threads and allocate
+* the paca. This will help us to prepare for a situation where
+* boot cpu id > nr_cpu_ids. We will use the last nthreads
+* slots (nthreads == threads per core) to accommodate a core
+* that contains boot cpu thread.
+*
+* Do not change nr_cpu_ids value here. Let us do that in
+* early_init_dt_scan_cpus() where we know exact value
+* of threads per core.
+*/
+   nr_cpus = _ALIGN_UP(nr_cpu_ids, MAX_SMT);
+   paca_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpus);
 
paca = __va(memblock_alloc_base(paca_size, PAGE_SIZE, limit));
memset(paca, 0, paca_size);
 
printk(KERN_DEBUG "Allocated %u bytes for %d pacas at %p\n",
-   paca_size, nr_cpu_ids, paca);
+   paca_size, nr_cpus, paca);
 
-   allocate_lppacas(nr_cpu_ids, limit);
+   allocate_lppacas(nr_cpus, limit);
 
-   allocate_slb_shadows(nr_cpu_ids, limit);
+   allocate_slb_shadows(nr_cpus, limit);
 
/* Can't use for_each_*_cpu, as they aren't functional yet */
-   for (cpu = 0; cpu < nr_cpu_ids; cpu++)
+   for (cpu = 0; cpu < nr_cpus; cpu++)
initialise_paca(&paca[cpu], cpu);
 }
 
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 7030b03..cbe7a7c 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -291,6 +291,29 @@ static void __init check_cpu_feature_properties(unsigned 
long node)
}
 }
 
+/*
+ * Adjust the logical id of a boot cpu to fall under nr_cpu_ids. Map it to
+ * last core slot in the allocated paca array.
+ *
+ * e.g. on SMT=8 system, kernel booted with nr_cpus=1 and boot cpu = 33,
+ * align nr_cpu_ids to MAX_SMT value 8. Allocate paca array to hold up-to
+ * MAX_SMT=8 cpus. Since boot cpu 33 is greater than nr_cpus (8), adjust
+ * its logical id so that new id becomes less than nr_cpu_ids. Make sure
+ * that boot cpu's new logical id is aligned to its thread id and falls
+ * under last nthreads slots available in paca array. In this case the
+ * boot cpu 33 is adjusted to new boot cpu id 1.
+ *
+ */
+static inline void adjust_boot_cpuid(int nthreads, int phys_id)
+{
+   boot_hw_cpuid = phys_id;
+   if (boot_cpuid >= nr_cpu_ids) {
+