Re: [PATCH] powerpc/pseries/cmm: fix wrong managed page count when migrating between zones

2019-12-04 Thread David Hildenbrand



> On 05.12.2019 at 03:59, Michael Ellerman wrote:
> 
> David Hildenbrand  writes:
>> Forgot to rename the subject to
>> 
>> "powerpc/pseries/cmm: fix managed page counts when migrating pages
>> between zones"
>> 
>> If I don't have to resend, would be great if that could be adjusted when
>> applying.
> 
> I can do that.
> 
> I'm inclined to wait until the virtio_balloon.c change is committed, in
> case there are any changes to it during review, and so we can refer to
> its SHA in the change log of this commit.
> 
> Do you want to ping me when that happens?

Sounds like a good idea; we have time until 5.5. I'll ping/resend.

Cheers!

> 
> cheers
> 



Re: [RESEND PATCH v2] powerpc/kernel/sysfs: Add PMU_SYSFS config option to enable PMU SPRs sysfs file creation

2019-12-04 Thread Christophe Leroy




On 05/12/2019 at 06:25, Kajol Jain wrote:

Many of the performance monitoring unit (PMU) SPRs are
exposed in sysfs. The "perf" API is the primary interface to program
the PMU and collect counter data in the system. So expose these
PMU SPRs only in the absence of CONFIG_PERF_EVENTS.

This patch adds a new config option, 'CONFIG_PMU_SYSFS'. The new option
is used in kernel/sysfs.c for PMU SPR sysfs file creation, and it is
enabled only if the 'CONFIG_PERF_EVENTS' option is disabled.


Not sure this new unselectable option is worth it. See below.
By the way, I also find the subject misleading, as it suggests there is
something to select.




Tested this patch with the CONFIG_PERF_EVENTS option enabled/disabled
on powernv and pseries machines.
Also did compile testing for different architectures, including
x86, mips, mips64, alpha and arm, and with the book3s_32.config option.


How do you use book3s_32.config exactly? That's a portion of a config, 
not a config by itself. You should use pmac32_defconfig, I guess.




Signed-off-by: Kajol Jain 

Reviewed-by: Madhavan Srinivasan 
Tested-by: Nageswara R Sastry 

Tested using the following scenarios:
1. CONFIG_PERF_EVENTS enabled, CONFIG_PMU_SYSFS disabled:
RESULT: no sysfs files (mmcr*, pmc*) under /sys/bus/cpu/devices/cpu?/
2. CONFIG_PERF_EVENTS disabled, CONFIG_PMU_SYSFS enabled:
RESULT: sysfs files (mmcr*, pmc*) present under /sys/bus/cpu/devices/cpu?/
3. CONFIG_PERF_EVENTS disabled, CONFIG_PMU_SYSFS disabled:
RESULT: not possible; one of the two config options must be enabled.
4. CONFIG_PERF_EVENTS enabled, CONFIG_PMU_SYSFS enabled:
RESULT: not possible; one of the two config options must be enabled.
---
  arch/powerpc/kernel/sysfs.c| 21 +++++++++++++++++++++
  arch/powerpc/platforms/Kconfig.cputype |  8 ++++++++
  2 files changed, 29 insertions(+)

---
Changelog:
Resend v2
Added 'Reviewed-by' and 'Tested-by' tags along with test scenarios.

v1 -> v2
- Added a new config option, 'PMU_SYSFS', for PMU SPR file creation
   rather than using the PERF_EVENTS config option directly, and made
   sure the SPR files are created only if 'CONFIG_PERF_EVENTS' is disabled.
---
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 80a676da11cb..b7c01f1ef236 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -457,16 +457,21 @@ static ssize_t __used \
  
  #if defined(CONFIG_PPC64)

  #define HAS_PPC_PMC_CLASSIC   1
+#ifdef CONFIG_PMU_SYSFS
  #define HAS_PPC_PMC_IBM   1
+#endif
  #define HAS_PPC_PMC_PA6T  1
  #elif defined(CONFIG_PPC_BOOK3S_32)
  #define HAS_PPC_PMC_CLASSIC   1
+#ifdef CONFIG_PMU_SYSFS
  #define HAS_PPC_PMC_IBM   1
  #define HAS_PPC_PMC_G41
  #endif
+#endif
  
  
  #ifdef HAS_PPC_PMC_CLASSIC

+#ifdef CONFIG_PMU_SYSFS


You don't need this big forest of #ifdefs (this one and all the ones 
after). All the objects you are protecting with this are indeed static, 
so the only thing you have to do is register them only when relevant, 
and GCC will get rid of the objects by itself when the config option is 
not enabled. See below.


And the advantage of doing it that way is that you don't need to build 
with both options to check the build. That's recommended by the kernel 
coding style (refer to 
https://www.kernel.org/doc/html/latest/process/coding-style.html#conditional-compilation).


[...]


@@ -787,8 +804,10 @@ static int register_cpu_online(unsigned int cpu)
device_create_file(s, &pmc_attrs[i]);
  
  #ifdef CONFIG_PPC64

+#ifdef CONFIG_PMU_SYSFS


Don't use #ifdef here, just do instead:

if (IS_ENABLED(CONFIG_PMU_SYSFS) && cpu_has_feature(CPU_FTR_MMCRA))


if (cpu_has_feature(CPU_FTR_MMCRA))
device_create_file(s, &dev_attr_mmcra);
+#endif /* CONFIG_PMU_SYSFS */
  
  	if (cpu_has_feature(CPU_FTR_PURR)) {

if (!firmware_has_feature(FW_FEATURE_LPAR))
@@ -876,8 +895,10 @@ static int unregister_cpu_online(unsigned int cpu)
device_remove_file(s, &pmc_attrs[i]);
  
  #ifdef CONFIG_PPC64

+#ifdef CONFIG_PMU_SYSFS


Same, use IS_ENABLED() here as well.


if (cpu_has_feature(CPU_FTR_MMCRA))
device_remove_file(s, &dev_attr_mmcra);
+#endif /* CONFIG_PMU_SYSFS */
  
  	if (cpu_has_feature(CPU_FTR_PURR))

device_remove_file(s, &dev_attr_purr);
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 12543e53fa96..f3ad579c559f 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -417,6 +417,14 @@ config PPC_MM_SLICES
  config PPC_HAVE_PMU_SUPPORT
 bool
  
+config PMU_SYSFS

+   bool
+   default y if !PERF_EVENTS
+   help
+ This option enables PMU SPR sysfs file creation. Since PMU SPRs are
+ intended to be used via the "perf" interface, this option is enabled
+ only when CONFIG_PERF_EVENTS is disabled.

[RESEND PATCH v2] powerpc/kernel/sysfs: Add PMU_SYSFS config option to enable PMU SPRs sysfs file creation

2019-12-04 Thread Kajol Jain
Many of the performance monitoring unit (PMU) SPRs are
exposed in sysfs. The "perf" API is the primary interface to program
the PMU and collect counter data in the system. So expose these
PMU SPRs only in the absence of CONFIG_PERF_EVENTS.

This patch adds a new config option, 'CONFIG_PMU_SYSFS'. The new option
is used in kernel/sysfs.c for PMU SPR sysfs file creation, and it is
enabled only if the 'CONFIG_PERF_EVENTS' option is disabled.

Tested this patch with the CONFIG_PERF_EVENTS option enabled/disabled
on powernv and pseries machines.
Also did compile testing for different architectures, including
x86, mips, mips64, alpha and arm, and with the book3s_32.config option.

Signed-off-by: Kajol Jain 

Reviewed-by: Madhavan Srinivasan 
Tested-by: Nageswara R Sastry 

Tested using the following scenarios:
1. CONFIG_PERF_EVENTS enabled, CONFIG_PMU_SYSFS disabled:
RESULT: no sysfs files (mmcr*, pmc*) under /sys/bus/cpu/devices/cpu?/
2. CONFIG_PERF_EVENTS disabled, CONFIG_PMU_SYSFS enabled:
RESULT: sysfs files (mmcr*, pmc*) present under /sys/bus/cpu/devices/cpu?/
3. CONFIG_PERF_EVENTS disabled, CONFIG_PMU_SYSFS disabled:
RESULT: not possible; one of the two config options must be enabled.
4. CONFIG_PERF_EVENTS enabled, CONFIG_PMU_SYSFS enabled:
RESULT: not possible; one of the two config options must be enabled.
---
 arch/powerpc/kernel/sysfs.c| 21 +++++++++++++++++++++
 arch/powerpc/platforms/Kconfig.cputype |  8 ++++++++
 2 files changed, 29 insertions(+)

---
Changelog:
Resend v2
Added 'Reviewed-by' and 'Tested-by' tags along with test scenarios.

v1 -> v2
- Added a new config option, 'PMU_SYSFS', for PMU SPR file creation
  rather than using the PERF_EVENTS config option directly, and made
  sure the SPR files are created only if 'CONFIG_PERF_EVENTS' is disabled.
---
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 80a676da11cb..b7c01f1ef236 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -457,16 +457,21 @@ static ssize_t __used \
 
 #if defined(CONFIG_PPC64)
 #define HAS_PPC_PMC_CLASSIC	1
+#ifdef CONFIG_PMU_SYSFS
 #define HAS_PPC_PMC_IBM	1
+#endif
 #define HAS_PPC_PMC_PA6T	1
 #elif defined(CONFIG_PPC_BOOK3S_32)
 #define HAS_PPC_PMC_CLASSIC	1
+#ifdef CONFIG_PMU_SYSFS
 #define HAS_PPC_PMC_IBM	1
 #define HAS_PPC_PMC_G4	1
 #endif
+#endif
 
 
 #ifdef HAS_PPC_PMC_CLASSIC
+#ifdef CONFIG_PMU_SYSFS
 SYSFS_PMCSETUP(mmcr0, SPRN_MMCR0);
 SYSFS_PMCSETUP(mmcr1, SPRN_MMCR1);
 SYSFS_PMCSETUP(pmc1, SPRN_PMC1);
@@ -485,6 +490,10 @@ SYSFS_PMCSETUP(pmc7, SPRN_PMC7);
 SYSFS_PMCSETUP(pmc8, SPRN_PMC8);
 
 SYSFS_PMCSETUP(mmcra, SPRN_MMCRA);
+#endif /* CONFIG_PPC64 */
+#endif /* CONFIG_PMU_SYSFS */
+
+#ifdef CONFIG_PPC64
 SYSFS_SPRSETUP(purr, SPRN_PURR);
 SYSFS_SPRSETUP(spurr, SPRN_SPURR);
 SYSFS_SPRSETUP(pir, SPRN_PIR);
@@ -495,7 +504,9 @@ SYSFS_SPRSETUP(tscr, SPRN_TSCR);
   enable write when needed with a separate function.
   Lets be conservative and default to pseries.
 */
+#ifdef CONFIG_PMU_SYSFS
 static DEVICE_ATTR(mmcra, 0600, show_mmcra, store_mmcra);
+#endif /* CONFIG_PMU_SYSFS */
 static DEVICE_ATTR(spurr, 0400, show_spurr, NULL);
 static DEVICE_ATTR(purr, 0400, show_purr, store_purr);
 static DEVICE_ATTR(pir, 0400, show_pir, NULL);
@@ -606,12 +617,14 @@ static void sysfs_create_dscr_default(void)
 #endif /* CONFIG_PPC64 */
 
 #ifdef HAS_PPC_PMC_PA6T
+#ifdef CONFIG_PMU_SYSFS
 SYSFS_PMCSETUP(pa6t_pmc0, SPRN_PA6T_PMC0);
 SYSFS_PMCSETUP(pa6t_pmc1, SPRN_PA6T_PMC1);
 SYSFS_PMCSETUP(pa6t_pmc2, SPRN_PA6T_PMC2);
 SYSFS_PMCSETUP(pa6t_pmc3, SPRN_PA6T_PMC3);
 SYSFS_PMCSETUP(pa6t_pmc4, SPRN_PA6T_PMC4);
 SYSFS_PMCSETUP(pa6t_pmc5, SPRN_PA6T_PMC5);
+#endif /* CONFIG_PMU_SYSFS */
 #ifdef CONFIG_DEBUG_MISC
 SYSFS_SPRSETUP(hid0, SPRN_HID0);
 SYSFS_SPRSETUP(hid1, SPRN_HID1);
@@ -644,6 +657,7 @@ SYSFS_SPRSETUP(tsr3, SPRN_PA6T_TSR3);
 #endif /* CONFIG_DEBUG_MISC */
 #endif /* HAS_PPC_PMC_PA6T */
 
+#ifdef CONFIG_PMU_SYSFS
 #ifdef HAS_PPC_PMC_IBM
 static struct device_attribute ibm_common_attrs[] = {
__ATTR(mmcr0, 0600, show_mmcr0, store_mmcr0),
@@ -671,9 +685,11 @@ static struct device_attribute classic_pmc_attrs[] = {
__ATTR(pmc8, 0600, show_pmc8, store_pmc8),
 #endif
 };
+#endif /* CONFIG_PMU_SYSFS */
 
 #ifdef HAS_PPC_PMC_PA6T
 static struct device_attribute pa6t_attrs[] = {
+#ifdef CONFIG_PMU_SYSFS
__ATTR(mmcr0, 0600, show_mmcr0, store_mmcr0),
__ATTR(mmcr1, 0600, show_mmcr1, store_mmcr1),
__ATTR(pmc0, 0600, show_pa6t_pmc0, store_pa6t_pmc0),
@@ -682,6 +698,7 @@ static struct device_attribute pa6t_attrs[] = {
__ATTR(pmc3, 0600, show_pa6t_pmc3, store_pa6t_pmc3),
__ATTR(pmc4, 0600, show_pa6t_pmc4, store_pa6t_pmc4),
__ATTR(pmc5, 0600, show_pa6t_pmc5, store_pa6t_pmc5),
+#endif /* CONFIG_PMU_SYSFS */
 #ifdef CONFIG_DEBUG_MISC
__ATTR(hid0, 0600, show_hid0, store_hid0),
__ATTR(hid1, 0600, show_hid1, store_hid1),
@@ -787,8 +804,10 @@ 

Re: [PATCH 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt

2019-12-04 Thread Gautham R Shenoy
Hi Srikar,

On Wed, Dec 04, 2019 at 07:14:58PM +0530, Srikar Dronamraju wrote:
> With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted
> vCPUs"), the scheduler avoids scheduling tasks on preempted vCPUs at wakeup.
> This leads to a wrong choice of CPU, which in turn leads to larger wakeup
> latencies. Eventually, it leads to performance regressions in latency
> sensitive benchmarks like soltp, schbench etc.

The regression in the latency sensitive benchmarks is due to
preferring potentially busy vCPUs over vCPUs in the CEDE state.


> 
> On PowerPC, vcpu_is_preempted only looks at yield_count. If the
> yield_count is odd, the vCPU is assumed to be preempted. However,
> yield_count is incremented whenever the LPAR enters CEDE state. So any CPU
> that has entered CEDE state is assumed to be preempted.
> 
> Even if a vCPU of a dedicated LPAR is preempted/donated, it should have
> the right of first use since it is supposed to own the vCPU.
> 
> On a Power9 System with 32 cores
>  # lscpu
> Architecture:ppc64le
> Byte Order:  Little Endian
> CPU(s):  128
> On-line CPU(s) list: 0-127
> Thread(s) per core:  8
> Core(s) per socket:  1
> Socket(s):   16
> NUMA node(s):2
> Model:   2.2 (pvr 004e 0202)
> Model name:  POWER9 (architected), altivec supported
> Hypervisor vendor:   pHyp
> Virtualization type: para
> L1d cache:   32K
> L1i cache:   32K
> L2 cache:512K
> L3 cache:10240K
> NUMA node0 CPU(s):   0-63
> NUMA node1 CPU(s):   64-127
> 
> 
>   # perf stat -a -r 5 ./schbench
> v5.4  v5.4 + patch
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 47   50.0000th: 33
>   74.0000th: 64   75.0000th: 44
>   89.0000th: 76   90.0000th: 50
>   94.0000th: 83   95.0000th: 53
>   *98.0000th: 103 *99.0000th: 57
>   98.5000th: 2124 99.5000th: 59
>   98.9000th: 7976 99.9000th: 83
>   min=-1, max=10519   min=0, max=117
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 45   50.0000th: 34
>   74.0000th: 61   75.0000th: 45
>   89.0000th: 70   90.0000th: 52
>   94.0000th: 77   95.0000th: 56
>   *98.0000th: 504 *99.0000th: 62
>   98.5000th: 4012 99.5000th: 64
>   98.9000th: 8168 99.9000th: 79
>   min=-1, max=14500   min=0, max=123
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 48   50.0000th: 35
>   74.0000th: 65   75.0000th: 47
>   89.0000th: 76   90.0000th: 55
>   94.0000th: 82   95.0000th: 59
>   *98.0000th: 1098    *99.0000th: 67
>   98.5000th: 3988 99.5000th: 71
>   98.9000th: 9360 99.9000th: 98
>   min=-1, max=19283   min=0, max=137
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 46   50.0000th: 35
>   74.0000th: 63   75.0000th: 46
>   89.0000th: 73   90.0000th: 53
>   94.0000th: 78   95.0000th: 57
>   *98.0000th: 113 *99.0000th: 63
>   98.5000th: 2316 99.5000th: 65
>   98.9000th: 7704 99.9000th: 83
>   min=-1, max=17976   min=0, max=139
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 46   50.0000th: 34
>   74.0000th: 62   75.0000th: 46
>   89.0000th: 73   90.0000th: 53
>   94.0000th: 79   95.0000th: 57
>   *98.0000th: 97  *99.0000th: 64
>   98.5000th: 1398 99.5000th: 70
>   98.9000th: 8136 99.9000th: 100
>   min=-1, max=10008   min=0, max=142
> 
> Performance counter stats for 'system wide' (4 runs):
> 
> context-switches   42,604 ( +-  0.87% )   45,397 ( +-  0.25% )
> 

Re: [PATCH v2 16/27] nvdimm/ocxl: Implement the Read Error Log command

2019-12-04 Thread Alastair D'Silva
On Tue, 2019-12-03 at 14:46 +1100, Alastair D'Silva wrote:
> From: Alastair D'Silva 
> 
> The read error log command extracts information from the controller's
> internal error log.
> 
> This patch exposes this information in 2 ways:
> - During probe, if an error occurs & a log is available, print it to
> the
>   console
> - After probe, make the error log available to userspace via an
> IOCTL.
>   Userspace is notified of pending error logs in a later patch
>   ("nvdimm/ocxl: Forward events to userspace")
> 
> Signed-off-by: Alastair D'Silva 
> ---
>  drivers/nvdimm/ocxl/scm.c  | 270
> +
>  drivers/nvdimm/ocxl/scm_internal.h |   1 +
>  include/uapi/nvdimm/ocxl-scm.h |  46 +
>  3 files changed, 317 insertions(+)
>  create mode 100644 include/uapi/nvdimm/ocxl-scm.h
> 
> diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> index c313a473a28e..0bbe1a14291e 100644
> --- a/drivers/nvdimm/ocxl/scm.c
> +++ b/drivers/nvdimm/ocxl/scm.c
> @@ -495,10 +495,220 @@ static int scm_file_release(struct inode
> *inode, struct file *file)
>   return 0;
>  }
>  
> +/**
> + * scm_error_log_header_parse() - Parse the first 64 bits of the
> error log command response
> + * @scm_data: the SCM metadata
> + * @length: out, returns the number of bytes in the response
> (excluding the 64 bit header)
> + */
> +static int scm_error_log_header_parse(struct scm_data *scm_data, u16
> *length)
> +{
> + int rc;
> + u64 val;
> +
> + u16 data_identifier;
> + u32 data_length;
> +
> + rc = ocxl_global_mmio_read64(scm_data->ocxl_afu,
> +  scm_data-
> >admin_command.data_offset,
> +  OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + return rc;
> +
> + data_identifier = val >> 48;
> + data_length = val & 0x;
> +
> + if (data_identifier != 0x454C) {
> + dev_err(&scm_data->dev,
> + "Bad data identifier for error log data,
> expected 'EL', got '%2s' (%#x), data_length=%u\n",
> + (char *)&data_identifier,
> + (unsigned int)data_identifier, data_length);
> + return -EINVAL;
> + }
> +
> + *length = data_length;
> + return 0;
> +}
> +
> +static int scm_error_log_offset_0x08(struct scm_data *scm_data,
> +  u32 *log_identifier, u32
> *program_ref_code)
> +{
> + int rc;
> + u64 val;
> +
> + rc = ocxl_global_mmio_read64(scm_data->ocxl_afu,
> +  scm_data-
> >admin_command.data_offset + 0x08,
> +  OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + return rc;
> +
> + *log_identifier = val >> 32;
> + *program_ref_code = val & 0x;
> +
> + return 0;
> +}
> +
> +static int scm_read_error_log(struct scm_data *scm_data,
> +   struct scm_ioctl_error_log *log, bool
> buf_is_user)
> +{
> + u64 val;
> + u16 user_buf_length;
> + u16 buf_length;
> + u16 i;
> + int rc;
> +
> + if (log->buf_size % 8)
> + return -EINVAL;
> +
> + rc = scm_chi(scm_data, &val);
> + if (rc)
> + goto out;
> +
> + if (!(val & GLOBAL_MMIO_CHI_ELA))
> + return -EAGAIN;
> +
> + user_buf_length = log->buf_size;
> +
> + mutex_lock(&scm_data->admin_command.lock);
> +
> + rc = scm_admin_command_request(scm_data, ADMIN_COMMAND_ERRLOG);
> + if (rc)
> + goto out;
> +
> + rc = scm_admin_command_execute(scm_data);
> + if (rc)
> + goto out;
> +
> + rc = scm_admin_command_complete_timeout(scm_data,
> ADMIN_COMMAND_ERRLOG);
> + if (rc < 0) {
> + dev_warn(&scm_data->dev, "Read error log timed out\n");
> + goto out;
> + }
> +
> + rc = scm_admin_response(scm_data);
> + if (rc < 0)
> + goto out;
> + if (rc != STATUS_SUCCESS) {
> + scm_warn_status(scm_data, "Unexpected status from
> retrieve error log", rc);
> + goto out;
> + }
> +
> +
> + rc = scm_error_log_header_parse(scm_data, &log->buf_size);
> + if (rc)
> + goto out;
> + // log->buf_size now contains the scm buffer size, not the user
> size
> +
> + rc = scm_error_log_offset_0x08(scm_data, &log->log_identifier,
> + &log->program_reference_code);
> + if (rc)
> + goto out;
> +
> + rc = ocxl_global_mmio_read64(scm_data->ocxl_afu,
> +  scm_data-
> >admin_command.data_offset + 0x10,
> +  OCXL_LITTLE_ENDIAN, &val);
> + if (rc)
> + goto out;
> +
> + log->error_log_type = val >> 56;
> + log->action_flags = (log->error_log_type ==
> SCM_ERROR_LOG_TYPE_GENERAL) ?
> + (val >> 32) & 0xFF : 0;
> + log->power_on_seconds = val & 0x;
> +
> + rc = ocxl_global_mmio_read64(scm_data->ocxl_afu,

Re: [PATCH] powerpc/pseries/cmm: fix wrong managed page count when migrating between zones

2019-12-04 Thread Michael Ellerman
David Hildenbrand  writes:
> Forgot to rename the subject to
>
> "powerpc/pseries/cmm: fix managed page counts when migrating pages
> between zones"
>
> If I don't have to resend, would be great if that could be adjusted when
> applying.

I can do that.

I'm inclined to wait until the virtio_balloon.c change is committed, in
case there are any changes to it during review, and so we can refer to
its SHA in the change log of this commit.

Do you want to ping me when that happens?

cheers


Re: [PATCH resend] ASoC: fsl_audmix: add missed pm_runtime_disable

2019-12-04 Thread Nicolin Chen
On Tue, Dec 03, 2019 at 07:13:03PM +0800, Chuhong Yuan wrote:
> The driver forgets to call pm_runtime_disable in probe failure
> and remove.
> Add the missed calls to fix it.
> 
> Signed-off-by: Chuhong Yuan 

Acked-by: Nicolin Chen 

Thanks

> ---
>  sound/soc/fsl/fsl_audmix.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/sound/soc/fsl/fsl_audmix.c b/sound/soc/fsl/fsl_audmix.c
> index a1db1bce330f..5faecbeb5497 100644
> --- a/sound/soc/fsl/fsl_audmix.c
> +++ b/sound/soc/fsl/fsl_audmix.c
> @@ -505,15 +505,20 @@ static int fsl_audmix_probe(struct platform_device 
> *pdev)
> ARRAY_SIZE(fsl_audmix_dai));
>   if (ret) {
>   dev_err(dev, "failed to register ASoC DAI\n");
> - return ret;
> + goto err_disable_pm;
>   }
>  
>   priv->pdev = platform_device_register_data(dev, mdrv, 0, NULL, 0);
>   if (IS_ERR(priv->pdev)) {
>   ret = PTR_ERR(priv->pdev);
>   dev_err(dev, "failed to register platform %s: %d\n", mdrv, ret);
> + goto err_disable_pm;
>   }
>  
> + return 0;
> +
> +err_disable_pm:
> + pm_runtime_disable(dev);
>   return ret;
>  }
>  
> @@ -521,6 +526,8 @@ static int fsl_audmix_remove(struct platform_device *pdev)
>  {
>   struct fsl_audmix *priv = dev_get_drvdata(&pdev->dev);
>  
> + pm_runtime_disable(&pdev->dev);
> +
>   if (priv->pdev)
>   platform_device_unregister(priv->pdev);
>  
> -- 
> 2.24.0
> 


Re: [PATCH] ASoC: fsl_sai: add IRQF_SHARED

2019-12-04 Thread Nicolin Chen
On Thu, Nov 28, 2019 at 11:38:02PM +0100, Michael Walle wrote:
> The LS1028A SoC uses the same interrupt line for adjacent SAIs. Use
> IRQF_SHARED to be able to use these SAIs simultaneously.
> 
> Signed-off-by: Michael Walle 

Acked-by: Nicolin Chen 

Thanks

> ---
>  sound/soc/fsl/fsl_sai.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
> index b517e4bc1b87..8c3ea7300972 100644
> --- a/sound/soc/fsl/fsl_sai.c
> +++ b/sound/soc/fsl/fsl_sai.c
> @@ -958,7 +958,8 @@ static int fsl_sai_probe(struct platform_device *pdev)
>   if (irq < 0)
>   return irq;
>  
> - ret = devm_request_irq(&pdev->dev, irq, fsl_sai_isr, 0, np->name, sai);
> + ret = devm_request_irq(&pdev->dev, irq, fsl_sai_isr, IRQF_SHARED,
> +np->name, sai);
>   if (ret) {
> dev_err(&pdev->dev, "failed to claim irq %u\n", irq);
>   return ret;
> -- 
> 2.20.1
> 


RE: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.y

2019-12-04 Thread Ram Pai
On Thu, Dec 05, 2019 at 09:26:14AM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 05/12/2019 07:42, Ram Pai wrote:
> > On Wed, Dec 04, 2019 at 02:36:18PM +1100, David Gibson wrote:
> >> On Wed, Dec 04, 2019 at 12:08:09PM +1100, Alexey Kardashevskiy wrote:
> >>>
> >>>
> >>> On 04/12/2019 11:49, Ram Pai wrote:
>  On Wed, Dec 04, 2019 at 11:04:04AM +1100, Alexey Kardashevskiy wrote:
> >
> >
> > On 04/12/2019 03:52, Ram Pai wrote:
> >> On Tue, Dec 03, 2019 at 03:24:37PM +1100, Alexey Kardashevskiy wrote:
> >>>
> >>>
> >>> On 03/12/2019 15:05, Ram Pai wrote:
>  On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
> >
> >
> > On 03/12/2019 13:08, Ram Pai wrote:
> >> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy 
> >> wrote:
> >>>
> >>>
> >>> On 02/12/2019 17:45, Ram Pai wrote:
>  H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as 
>  one of
>  its parameters. One page is dedicated per cpu, for the lifetime 
>  of the
>  kernel for this purpose. On secure VMs, contents of this page, 
>  when
>  accessed by the hypervisor, retrieves encrypted TCE entries.  
>  Hypervisor
>  needs to know the unencrypted entries, to update the TCE table
>  accordingly.  There is nothing secret or sensitive about these 
>  entries.
>  Hence share the page with the hypervisor.
> >>>
> >>> This unsecures a page in the guest in a random place which 
> >>> creates an
> >>> additional attack surface which is hard to exploit indeed but
> >>> nevertheless it is there.
> >>> A safer option would be not to use the
> >>> hcall-multi-tce hypertas option (which translates to 
> >>> FW_FEATURE_MULTITCE
> >>> in the guest).
> >>
> >>
> >> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets 
> >> invoked
> >> automatically when IOMMU option is enabled.
> >
> > It is advertised by QEMU but the guest does not have to use it.
> 
>  Are you suggesting that even normal-guest, not use hcall-multi-tce?
>  or just secure-guest?  
> >>>
> >>>
> >>> Just secure.
> >>
> >> hmm..  how are the TCE entries communicated to the hypervisor, if
> >> hcall-multi-tce is disabled?
> >
> > Via H_PUT_TCE which updates 1 entry at once (sets or clears).
> > hcall-multi-tce  enables H_PUT_TCE_INDIRECT (512 entries at once) and
> > H_STUFF_TCE (clearing, up to 4bln at once? many), these are simply an
> > optimization.
> 
>  Do you still think, secure-VM should use H_PUT_TCE and not
>  H_PUT_TCE_INDIRECT?  And normal VM should use H_PUT_TCE_INDIRECT?
>  Is there any advantage of special casing it for secure-VMs.
> >>>
> >>>
> >>> Reducing the amount of insecure memory at random location.
> >>
> >> The other approach we could use for that - which would still allow
> >> H_PUT_TCE_INDIRECT, would be to allocate the TCE buffer page from the
> >> same pool that we use for the bounce buffers.  I assume there must
> >> already be some sort of allocator for that?
> > 
> > The allocator for swiotlb is buried deep in the swiotlb code. It is 
> > not exposed to the outside-swiotlb world. Will have to do major surgery
> > to expose it.
> > 
> > I was thinking, maybe we share the page, finish the INDIRECT_TCE call,
> > and unshare the page.  This will address Alexey's concern of having
> > shared pages at random location, and will also give me my performance
> > optimization.  Alexey: ok?
> 
> 
> I really do not see the point. I really think we should do 1:1 mapping
> of swiotlb buffers using the default 32bit window using H_PUT_TCE and
> this should be more than enough. I do not think the amount of code will
> be dramatically different compared to unsecuring and securing a page or
> using one of the swiotlb pages for this purpose. Thanks,

Ok. I will address your major concern -- "do not create new shared pages
at random location" -- in my next version of the patch. Using the 32bit
DMA window just to map the SWIOTLB buffers will be some effort. Hope 
we can stage it that way.

RP



[PATCH v2] powerpc/mm: Remove kvm radix prefetch workaround for Power9 DD2.2

2019-12-04 Thread Jordan Niethe
Commit a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with
KVM") introduced a number of workarounds, as coming out of a guest with
the MMU enabled would make the cpu start running in hypervisor
state with the PID value from the guest. The cpu would then start
prefetching for the hypervisor with that PID value.

In Power9 DD2.2 the cpu behaviour was modified to fix this. When
accessing Quadrant 0 in hypervisor mode with LPID != 0 prefetching will
not be performed. This means that we can get rid of the workarounds for
Power9 DD2.2 and later revisions. Add a new cpu feature
CPU_FTR_P9_RADIX_PREFETCH_BUG to indicate if the workarounds are needed.

Signed-off-by: Jordan Niethe 
---
v2: Use a cpu feature instead of open coding the PVR check
---
 arch/powerpc/include/asm/cputable.h  |  6 --
 arch/powerpc/kernel/dt_cpu_ftrs.c| 13 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |  2 ++
 arch/powerpc/mm/book3s64/radix_pgtable.c |  6 +-
 arch/powerpc/mm/book3s64/radix_tlb.c |  3 +++
 5 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index cf00ff0d121d..944a39c4c3a0 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -212,6 +212,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_P9_TLBIE_STQ_BUG	LONG_ASM_CONST(0x0000400000000000)
 #define CPU_FTR_P9_TIDR		LONG_ASM_CONST(0x0000800000000000)
 #define CPU_FTR_P9_TLBIE_ERAT_BUG	LONG_ASM_CONST(0x0001000000000000)
+#define CPU_FTR_P9_RADIX_PREFETCH_BUG	LONG_ASM_CONST(0x0002000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -459,8 +460,9 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
CPU_FTR_P9_TLBIE_STQ_BUG | CPU_FTR_P9_TLBIE_ERAT_BUG | 
CPU_FTR_P9_TIDR)
-#define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
-#define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1)
+#define CPU_FTRS_POWER9_DD2_0 (CPU_FTRS_POWER9 | CPU_FTR_P9_RADIX_PREFETCH_BUG)
+#define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | CPU_FTR_P9_RADIX_PREFETCH_BUG | \
+			       CPU_FTR_POWER9_DD2_1)
 #define CPU_FTRS_POWER9_DD2_2 (CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD2_1 | \
   CPU_FTR_P9_TM_HV_ASSIST | \
   CPU_FTR_P9_TM_XER_SO_BUG)
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 180b3a5d1001..182b4047c1ef 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -727,17 +727,20 @@ static __init void cpufeatures_cpu_quirks(void)
/*
 * Not all quirks can be derived from the cpufeatures device tree.
 */
-   if ((version & 0xefffffff) == 0x004e0200)
-   ; /* DD2.0 has no feature flag */
-   else if ((version & 0xefffffff) == 0x004e0201)
+   if ((version & 0xefffffff) == 0x004e0200) {
+   /* DD2.0 has no feature flag */
+   cur_cpu_spec->cpu_features |= CPU_FTR_P9_RADIX_PREFETCH_BUG;
+   } else if ((version & 0xefffffff) == 0x004e0201) {
cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD2_1;
-   else if ((version & 0xefffffff) == 0x004e0202) {
+   cur_cpu_spec->cpu_features |= CPU_FTR_P9_RADIX_PREFETCH_BUG;
+   } else if ((version & 0xefffffff) == 0x004e0202) {
cur_cpu_spec->cpu_features |= CPU_FTR_P9_TM_HV_ASSIST;
cur_cpu_spec->cpu_features |= CPU_FTR_P9_TM_XER_SO_BUG;
cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD2_1;
-   } else if ((version & 0xffff0000) == 0x004e0000)
+   } else if ((version & 0xffff0000) == 0x004e0000) {
/* DD2.1 and up have DD2_1 */
cur_cpu_spec->cpu_features |= CPU_FTR_POWER9_DD2_1;
+   }
 
if ((version & 0xffff0000) == 0x004e0000) {
cur_cpu_spec->cpu_features &= ~(CPU_FTR_DAWR);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index faebcbb8c4db..72b08bb17200 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1793,6 +1793,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
tlbsync
ptesync
 
+BEGIN_FTR_SECTION
/* Radix: Handle the case where the guest used an illegal PID */
LOAD_REG_ADDR(r4, mmu_base_pid)
lwz r3, VCPU_GUEST_PID(r9)
@@ -1822,6 +1823,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
addir7,r7,0x1000
bdnz1b
ptesync
+END_FTR_SECTION_IFSET(CPU_FTR_P9_RADIX_PREFETCH_BUG)
 
 2:
 #endif /* CONFIG_PPC_RADIX_MMU */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 6ee17d09649c..25cd2a5a6f9f 100644
--- 

Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.

2019-12-04 Thread Alexey Kardashevskiy



On 05/12/2019 07:42, Ram Pai wrote:
> On Wed, Dec 04, 2019 at 02:36:18PM +1100, David Gibson wrote:
>> On Wed, Dec 04, 2019 at 12:08:09PM +1100, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 04/12/2019 11:49, Ram Pai wrote:
 On Wed, Dec 04, 2019 at 11:04:04AM +1100, Alexey Kardashevskiy wrote:
>
>
> On 04/12/2019 03:52, Ram Pai wrote:
>> On Tue, Dec 03, 2019 at 03:24:37PM +1100, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 03/12/2019 15:05, Ram Pai wrote:
 On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
>
>
> On 03/12/2019 13:08, Ram Pai wrote:
>> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 02/12/2019 17:45, Ram Pai wrote:
 H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as 
 one of
 its parameters. One page is dedicated per cpu, for the lifetime of 
 the
 kernel for this purpose. On secure VMs, contents of this page, when
 accessed by the hypervisor, retrieves encrypted TCE entries.  
 Hypervisor
 needs to know the unencrypted entries, to update the TCE table
 accordingly.  There is nothing secret or sensitive about these 
 entries.
 Hence share the page with the hypervisor.
>>>
>>> This unsecures a page in the guest in a random place which creates 
>>> an
>>> additional attack surface which is hard to exploit indeed but
>>> nevertheless it is there.
>>> A safer option would be not to use the
>>> hcall-multi-tce hypertas option (which translates to 
>>> FW_FEATURE_MULTITCE
>>> in the guest).
>>
>>
>> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets 
>> invoked
>> automatically when IOMMU option is enabled.
>
> It is advertised by QEMU but the guest does not have to use it.

 Are you suggesting that even normal-guest, not use hcall-multi-tce?
 or just secure-guest?  
>>>
>>>
>>> Just secure.
>>
>> hmm..  how are the TCE entries communicated to the hypervisor, if
>> hcall-multi-tce is disabled?
>
> Via H_PUT_TCE which updates 1 entry at once (sets or clears).
> hcall-multi-tce  enables H_PUT_TCE_INDIRECT (512 entries at once) and
> H_STUFF_TCE (clearing, up to 4bln at once? many), these are simply an
> optimization.

 Do you still think, secure-VM should use H_PUT_TCE and not
 H_PUT_TCE_INDIRECT?  And normal VM should use H_PUT_TCE_INDIRECT?
 Is there any advantage of special casing it for secure-VMs.
>>>
>>>
>>> Reducing the amount of insecure memory at random location.
>>
>> The other approach we could use for that - which would still allow
>> H_PUT_TCE_INDIRECT, would be to allocate the TCE buffer page from the
>> same pool that we use for the bounce buffers.  I assume there must
>> already be some sort of allocator for that?
> 
> The allocator for swiotlb is buried deep in the swiotlb code. It is 
> not exposed to the outside-swiotlb world. Will have to do major surgery
> to expose it.
> 
> I was thinking, maybe we share the page, finish the INDIRECT_TCE call,
> and unshare the page.  This will address Alexey's concern of having
> shared pages at random location, and will also give me my performance
> optimization.  Alexey: ok?


I really do not see the point. I really think we should do 1:1 mapping
of swiotlb buffers using the default 32bit window using H_PUT_TCE and
this should be more than enough; I do not think the amount of code will
be dramatically different compared to unsecuring and securing a page or
using one of the swiotlb pages for this purpose. Thanks,


-- 
Alexey


Re: [PATCH 3/3] Documentation: Document sysfs interfaces purr, spurr, idle_purr, idle_spurr

2019-12-04 Thread Nathan Lynch
"Gautham R. Shenoy"  writes:
> +
> +What:/sys/devices/system/cpu/cpuX/idle_purr
> +Date:Nov 2019
> +Contact: Linux for PowerPC mailing list 
> +Description: PURR ticks for cpuX when it was idle.
> +
> + This sysfs interface exposes the number of PURR ticks
> + for cpuX when it was idle.
> +
> +What:/sys/devices/system/cpu/cpuX/spurr
/sys/devices/system/cpu/cpuX/idle_spurr


> +Date:Nov 2019
> +Contact: Linux for PowerPC mailing list 
> +Description: SPURR ticks for cpuX when it was idle.
> +
> + This sysfs interface exposes the number of SPURR ticks
> + for cpuX when it was idle.


Re: [PATCH 1/3] powerpc/pseries: Account for SPURR ticks on idle CPUs

2019-12-04 Thread Nathan Lynch
"Gautham R. Shenoy"  writes:
> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> index a36fd05..708ec68 100644
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -33,6 +33,8 @@
>  unsigned long cpuidle_disable = IDLE_NO_OVERRIDE;
>  EXPORT_SYMBOL(cpuidle_disable);
>  
> +DEFINE_PER_CPU(u64, idle_spurr_cycles);
> +

Does idle_spurr_cycles need any special treatment for CPU
online/offline?

>  static int __init powersave_off(char *arg)
>  {
>   ppc_md.power_save = NULL;
> diff --git a/drivers/cpuidle/cpuidle-pseries.c 
> b/drivers/cpuidle/cpuidle-pseries.c
> index 74c2479..45e2be4 100644
> --- a/drivers/cpuidle/cpuidle-pseries.c
> +++ b/drivers/cpuidle/cpuidle-pseries.c
> @@ -30,11 +30,14 @@ struct cpuidle_driver pseries_idle_driver = {
>  static struct cpuidle_state *cpuidle_state_table __read_mostly;
>  static u64 snooze_timeout __read_mostly;
>  static bool snooze_timeout_en __read_mostly;
> +DECLARE_PER_CPU(u64, idle_spurr_cycles);

This belongs in a header... 


> -static inline void idle_loop_prolog(unsigned long *in_purr)
> +static inline void idle_loop_prolog(unsigned long *in_purr,
> + unsigned long *in_spurr)
>  {
>   ppc64_runlatch_off();
>   *in_purr = mfspr(SPRN_PURR);
> + *in_spurr = mfspr(SPRN_SPURR);
>   /*
>* Indicate to the HV that we are idle. Now would be
>* a good time to find other work to dispatch.
> @@ -42,13 +45,16 @@ static inline void idle_loop_prolog(unsigned long 
> *in_purr)
>   get_lppaca()->idle = 1;
>  }
>  
> -static inline void idle_loop_epilog(unsigned long in_purr)
> +static inline void idle_loop_epilog(unsigned long in_purr,
> + unsigned long in_spurr)
>  {
>   u64 wait_cycles;
> + u64 *idle_spurr_cycles_ptr = this_cpu_ptr(&idle_spurr_cycles);
>  
>   wait_cycles = be64_to_cpu(get_lppaca()->wait_state_cycles);
>   wait_cycles += mfspr(SPRN_PURR) - in_purr;
>   get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
> + *idle_spurr_cycles_ptr += mfspr(SPRN_SPURR) - in_spurr;

... and the sampling and increment logic probably should be further
encapsulated in accessor functions that can be used in both the cpuidle
driver and the default/generic idle implementation. Or is there some
reason this is specific to the pseries cpuidle driver?


Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

2019-12-04 Thread Nathan Lynch
"Gautham R. Shenoy"  writes:
> @@ -1067,6 +1097,8 @@ static int __init topology_init(void)
>   register_cpu(c, cpu);
>  
>   device_create_file(>dev, _attr_physical_id);
> + if (firmware_has_feature(FW_FEATURE_SPLPAR))
> + create_idle_purr_spurr_sysfs_entry(>dev);

Architecturally speaking PURR/SPURR aren't strongly linked to the PAPR
SPLPAR option, are they? I'm not sure it's right for these attributes to
be absent if the platform does not support shared processor mode.


Re: [PATCH 0/3] pseries: Track and expose idle PURR and SPURR ticks

2019-12-04 Thread Nathan Lynch
"Gautham R. Shenoy"  writes:
> From: "Gautham R. Shenoy" 
>
> On PSeries LPARs, data center planners desire a more accurate
> view of system utilization per resource such as CPU to plan the system
> capacity requirements better. Such accuracy can be obtained by reading
> PURR/SPURR registers for CPU resource utilization.
>
> Tools such as lparstat which are used to compute the utilization need
> to know [S]PURR ticks when the cpu was busy or idle. The [S]PURR
> counters are already exposed through sysfs.  We already account for
> PURR ticks when we go to idle so that we can update the VPA area. This
> patchset extends support to account for SPURR ticks when idle, and
> expose both via per-cpu sysfs files.

Does anything really want to use PURR instead of SPURR? Seems like we
should expose only SPURR idle values if possible.


Re: [PATCH] powerpc/xive: skip ioremap() of ESB pages for LSI interrupts

2019-12-04 Thread Michael Ellerman
Greg Kurz  writes:
> On Thu,  5 Dec 2019 00:30:56 +1100 (AEDT)
> Michael Ellerman  wrote:
>> On Tue, 2019-12-03 at 16:36:42 UTC, Cédric Le Goater 
>> wrote:
>> > The PCI INTx interrupts and other LSI interrupts are handled differently
>> > under a sPAPR platform. When the interrupt source characteristics are
>> > queried, the hypervisor returns an H_INT_ESB flag to inform the OS
>> > that it should be using the H_INT_ESB hcall for interrupt management
>> > and not loads and stores on the interrupt ESB pages.
...
>> > 
>> > Cc: sta...@vger.kernel.org # v4.14+
>> > Fixes: bed81ee181dd ("powerpc/xive: introduce H_INT_ESB hcall")
>> > Signed-off-by: Cédric Le Goater 
>> 
>> Applied to powerpc fixes, thanks.
>> 
>> https://git.kernel.org/powerpc/c/b67a95f2abff0c34e5667c15ab8900de73d8d087
>> 
>
> My R-b tag is missing... I guess I didn't review it quick enough :)

Yeah sorry, your tag arrived after I'd applied it but before I'd pushed
it out.

Thanks for reviewing it anyway. You can redeem lost R-b tags for a free
beer at any conference we're both attending :)

cheers


Re: [PATCH 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt

2019-12-04 Thread Waiman Long
On 12/4/19 8:44 AM, Srikar Dronamraju wrote:
> With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted
> vCPUs"), the scheduler avoids scheduling tasks on preempted vCPUs at wakeup.
> This leads to a wrong choice of CPU, which in turn leads to larger wakeup
> latencies. Eventually, it leads to performance regression in latency
> sensitive benchmarks like soltp, schbench etc.
>
> On Powerpc, vcpu_is_preempted only looks at yield_count. If the
> yield_count is odd, the vCPU is assumed to be preempted. However
> yield_count is increased whenever LPAR enters CEDE state. So any CPU
> that has entered CEDE state is assumed to be preempted.
>
> Even if vCPU of dedicated LPAR is preempted/donated, it should have
> right of first-use since they are suppose to own the vCPU.
>
> On a Power9 System with 32 cores
>  # lscpu
> Architecture:ppc64le
> Byte Order:  Little Endian
> CPU(s):  128
> On-line CPU(s) list: 0-127
> Thread(s) per core:  8
> Core(s) per socket:  1
> Socket(s):   16
> NUMA node(s):2
> Model:   2.2 (pvr 004e 0202)
> Model name:  POWER9 (architected), altivec supported
> Hypervisor vendor:   pHyp
> Virtualization type: para
> L1d cache:   32K
> L1i cache:   32K
> L2 cache:512K
> L3 cache:10240K
> NUMA node0 CPU(s):   0-63
> NUMA node1 CPU(s):   64-127
>  
>
>   # perf stat -a -r 5 ./schbench
> v5.4  v5.4 + patch
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 47   50.0000th: 33
>   74.0000th: 64   75.0000th: 44
>   89.0000th: 76   90.0000th: 50
>   94.0000th: 83   95.0000th: 53
>   *98.0000th: 103 *99.0000th: 57
>   98.5000th: 2124 99.5000th: 59
>   98.9000th: 7976 99.9000th: 83
>   min=-1, max=10519   min=0, max=117
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 45   50.0000th: 34
>   74.0000th: 61   75.0000th: 45
>   89.0000th: 70   90.0000th: 52
>   94.0000th: 77   95.0000th: 56
>   *98.0000th: 504 *99.0000th: 62
>   98.5000th: 4012 99.5000th: 64
>   98.9000th: 8168 99.9000th: 79
>   min=-1, max=14500   min=0, max=123
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 48   50.0000th: 35
>   74.0000th: 65   75.0000th: 47
>   89.0000th: 76   90.0000th: 55
>   94.0000th: 82   95.0000th: 59
>   *98.0000th: 1098*99.0000th: 67
>   98.5000th: 3988 99.5000th: 71
>   98.9000th: 9360 99.9000th: 98
>   min=-1, max=19283   min=0, max=137
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 46   50.0000th: 35
>   74.0000th: 63   75.0000th: 46
>   89.0000th: 73   90.0000th: 53
>   94.0000th: 78   95.0000th: 57
>   *98.0000th: 113 *99.0000th: 63
>   98.5000th: 2316 99.5000th: 65
>   98.9000th: 7704 99.9000th: 83
>   min=-1, max=17976   min=0, max=139
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 46   50.0000th: 34
>   74.0000th: 62   75.0000th: 46
>   89.0000th: 73   90.0000th: 53
>   94.0000th: 79   95.0000th: 57
>   *98.0000th: 97  *99.0000th: 64
>   98.5000th: 1398 99.5000th: 70
>   98.9000th: 8136 99.9000th: 100
>   min=-1, max=10008   min=0, max=142
>
> Performance counter stats for 'system wide' (4 runs):
>
> context-switches   42,604 ( +-  0.87% )   45,397 ( +-  0.25% )
> cpu-migrations  0,195 ( +-  2.70% )  230 ( +-  7.23% )
> page-faults16,783 ( +- 14.87% )   16,781 ( +-  9.77% )
>
> Waiman Long suggested using static_keys.

Re: [PATCH] powerpc/pseries/cmm: fix wrong managed page count when migrating between zones

2019-12-04 Thread David Hildenbrand
Forgot to rename the subject to

"powerpc/pseries/cmm: fix managed page counts when migrating pages
between zones"

If I don't have to resend, would be great if that could be adjusted when
applying.


-- 
Thanks,

David / dhildenb



[PATCH] powerpc/pseries/cmm: fix wrong managed page count when migrating between zones

2019-12-04 Thread David Hildenbrand
In case we have to migrate a balloon page to a newpage of another zone, the
managed page count of both zones is wrong. Paired with memory offlining
(which will adjust the managed page count), we can trigger kernel crashes
and all kinds of different symptoms.

Fix it by properly adjusting the managed page count when migrating.

I did not try to reproduce on powerpc; however, I just resolved a
long-known issue when ballooning+offlining in virtio-balloon. The same should
apply to powerpc/cmm since it started using the balloon compaction
infrastructure (luckily just recently).

Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction")
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Andrew Morton 
Cc: Richard Fontana 
Cc: Greg Kroah-Hartman 
Cc: Arun KS 
Cc: Thomas Gleixner 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: David Hildenbrand 
---

virtio-balloon fix with more details:

https://lkml.kernel.org/r/20191204204807.8025-1-da...@redhat.com/

---
 arch/powerpc/platforms/pseries/cmm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/cmm.c 
b/arch/powerpc/platforms/pseries/cmm.c
index 91571841df8a..665298fe2990 100644
--- a/arch/powerpc/platforms/pseries/cmm.c
+++ b/arch/powerpc/platforms/pseries/cmm.c
@@ -551,6 +551,10 @@ static int cmm_migratepage(struct balloon_dev_info 
*b_dev_info,
 */
plpar_page_set_active(page);
 
+   /* fixup the managed page count (esp. of the zone) */
+   adjust_managed_page_count(page, 1);
+   adjust_managed_page_count(newpage, -1);
+
/* balloon page list reference */
put_page(page);
 
-- 
2.21.0



RE: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.

2019-12-04 Thread Ram Pai
On Wed, Dec 04, 2019 at 02:36:18PM +1100, David Gibson wrote:
> On Wed, Dec 04, 2019 at 12:08:09PM +1100, Alexey Kardashevskiy wrote:
> > 
> > 
> > On 04/12/2019 11:49, Ram Pai wrote:
> > > On Wed, Dec 04, 2019 at 11:04:04AM +1100, Alexey Kardashevskiy wrote:
> > >>
> > >>
> > >> On 04/12/2019 03:52, Ram Pai wrote:
> > >>> On Tue, Dec 03, 2019 at 03:24:37PM +1100, Alexey Kardashevskiy wrote:
> > 
> > 
> >  On 03/12/2019 15:05, Ram Pai wrote:
> > > On Tue, Dec 03, 2019 at 01:15:04PM +1100, Alexey Kardashevskiy wrote:
> > >>
> > >>
> > >> On 03/12/2019 13:08, Ram Pai wrote:
> > >>> On Tue, Dec 03, 2019 at 11:56:43AM +1100, Alexey Kardashevskiy 
> > >>> wrote:
> > 
> > 
> >  On 02/12/2019 17:45, Ram Pai wrote:
> > > H_PUT_TCE_INDIRECT hcall uses a page filled with TCE entries, as 
> > > one of
> > > its parameters. One page is dedicated per cpu, for the lifetime 
> > > of the
> > > kernel for this purpose. On secure VMs, contents of this page, 
> > > when
> > > accessed by the hypervisor, retrieves encrypted TCE entries.  
> > > Hypervisor
> > > needs to know the unencrypted entries, to update the TCE table
> > > accordingly.  There is nothing secret or sensitive about these 
> > > entries.
> > > Hence share the page with the hypervisor.
> > 
> >  This unsecures a page in the guest in a random place which creates 
> >  an
> >  additional attack surface which is hard to exploit indeed but
> >  nevertheless it is there.
> >  A safer option would be not to use the
>  hcall-multi-tce hypertas option (which translates 
> >  FW_FEATURE_MULTITCE
> >  in the guest).
> > >>>
> > >>>
> > >>> Hmm... How do we not use it?  AFAICT hcall-multi-tce option gets 
> > >>> invoked
> > >>> automatically when IOMMU option is enabled.
> > >>
> > >> It is advertised by QEMU but the guest does not have to use it.
> > >
> > > Are you suggesting that even normal-guest, not use hcall-multi-tce?
> > > or just secure-guest?  
> > 
> > 
> >  Just secure.
> > >>>
> > >>> hmm..  how are the TCE entries communicated to the hypervisor, if
> > >>> hcall-multi-tce is disabled?
> > >>
> > >> Via H_PUT_TCE which updates 1 entry at once (sets or clears).
> > >> hcall-multi-tce  enables H_PUT_TCE_INDIRECT (512 entries at once) and
> > >> H_STUFF_TCE (clearing, up to 4bln at once? many), these are simply an
> > >> optimization.
> > > 
> > > Do you still think, secure-VM should use H_PUT_TCE and not
> > > H_PUT_TCE_INDIRECT?  And normal VM should use H_PUT_TCE_INDIRECT?
> > > Is there any advantage of special casing it for secure-VMs.
> > 
> > 
> > Reducing the amount of insecure memory at random location.
> 
> The other approach we could use for that - which would still allow
> H_PUT_TCE_INDIRECT, would be to allocate the TCE buffer page from the
> same pool that we use for the bounce buffers.  I assume there must
> already be some sort of allocator for that?

The allocator for swiotlb is buried deep in the swiotlb code. It is 
not exposed to the outside-swiotlb world. Will have to do major surgery
to expose it.

I was thinking, maybe we share the page, finish the INDIRECT_TCE call,
and unshare the page.  This will address Alexey's concern of having
shared pages at random location, and will also give me my performance
optimization.  Alexey: ok?

RP



Re: [PATCH] powerpc/xive: skip ioremap() of ESB pages for LSI interrupts

2019-12-04 Thread Greg Kurz
On Thu,  5 Dec 2019 00:30:56 +1100 (AEDT)
Michael Ellerman  wrote:

> On Tue, 2019-12-03 at 16:36:42 UTC, Cédric Le Goater 
> wrote:
> > The PCI INTx interrupts and other LSI interrupts are handled differently
> > under a sPAPR platform. When the interrupt source characteristics are
> > queried, the hypervisor returns an H_INT_ESB flag to inform the OS
> > that it should be using the H_INT_ESB hcall for interrupt management
> > and not loads and stores on the interrupt ESB pages.
> > 
> > A default -1 value is returned for the addresses of the ESB pages. The
> > driver ignores this condition today and performs a bogus IO mapping.
> > Recent changes and the DEBUG_VM configuration option make the bug
> > visible with :
> > 
> > [0.015518] kernel BUG at 
> > arch/powerpc/include/asm/book3s/64/pgtable.h:612!
> > [0.015578] Oops: Exception in kernel mode, sig: 5 [#1]
> > [0.015627] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=1024 NUMA 
> > pSeries
> > [0.015697] Modules linked in:
> > [0.015739] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> > 5.4.0-0.rc6.git0.1.fc32.ppc64le #1
> > [0.015812] NIP:  c0f63294 LR: c0f62e44 CTR: 
> > 
> > [0.015889] REGS: c000fa45f0d0 TRAP: 0700   Not tainted  
> > (5.4.0-0.rc6.git0.1.fc32.ppc64le)
> > [0.015971] MSR:  82029033   CR: 
> > 44000424  XER: 
> > [0.016050] CFAR: c0f63128 IRQMASK: 0
> > [0.016050] GPR00: c0f62e44 c000fa45f360 c1be5400 
> > 
> > [0.016050] GPR04: c19c7d38 c000fa340030 fa330009 
> > c1c15e18
> > [0.016050] GPR08: 0040 ffe0  
> > 8418dd352dbd190f
> > [0.016050] GPR12:  c1e0 c00a8006 
> > c00a8006
> > [0.016050] GPR16:  81ae c1c24d98 
> > 
> > [0.016050] GPR20: c00a8007 c1cafca0 c00a8007 
> > 
> > [0.016050] GPR24: c00a8008 c00a8008 c1cafca8 
> > c00a8008
> > [0.016050] GPR28: c000fa32e010 c00a8006  
> > c000fa33
> > [0.016711] NIP [c0f63294] ioremap_page_range+0x4c4/0x6e0
> > [0.016778] LR [c0f62e44] ioremap_page_range+0x74/0x6e0
> > [0.016846] Call Trace:
> > [0.016876] [c000fa45f360] [c0f62e44] 
> > ioremap_page_range+0x74/0x6e0 (unreliable)
> > [0.016969] [c000fa45f460] [c00934bc] do_ioremap+0x8c/0x120
> > [0.017037] [c000fa45f4b0] [c00938e8] 
> > __ioremap_caller+0x128/0x140
> > [0.017116] [c000fa45f500] [c00931a0] ioremap+0x30/0x50
> > [0.017184] [c000fa45f520] [c00d1380] 
> > xive_spapr_populate_irq_data+0x170/0x260
> > [0.017263] [c000fa45f5c0] [c00cc90c] 
> > xive_irq_domain_map+0x8c/0x170
> > [0.017344] [c000fa45f600] [c0219124] 
> > irq_domain_associate+0xb4/0x2d0
> > [0.017424] [c000fa45f690] [c0219fe0] 
> > irq_create_mapping+0x1e0/0x3b0
> > [0.017506] [c000fa45f730] [c021ad6c] 
> > irq_create_fwspec_mapping+0x27c/0x3e0
> > [0.017586] [c000fa45f7c0] [c021af68] 
> > irq_create_of_mapping+0x98/0xb0
> > [0.017666] [c000fa45f830] [c08d4e48] 
> > of_irq_parse_and_map_pci+0x168/0x230
> > [0.017746] [c000fa45f910] [c0075428] 
> > pcibios_setup_device+0x88/0x250
> > [0.017826] [c000fa45f9a0] [c0077b84] 
> > pcibios_setup_bus_devices+0x54/0x100
> > [0.017906] [c000fa45fa10] [c00793f0] 
> > __of_scan_bus+0x160/0x310
> > [0.017973] [c000fa45faf0] [c0075fc0] 
> > pcibios_scan_phb+0x330/0x390
> > [0.018054] [c000fa45fba0] [c139217c] pcibios_init+0x8c/0x128
> > [0.018121] [c000fa45fc20] [c00107b0] 
> > do_one_initcall+0x60/0x2c0
> > [0.018201] [c000fa45fcf0] [c1384624] 
> > kernel_init_freeable+0x290/0x378
> > [0.018280] [c000fa45fdb0] [c0010d24] kernel_init+0x2c/0x148
> > [0.018348] [c000fa45fe20] [c000bdbc] 
> > ret_from_kernel_thread+0x5c/0x80
> > [0.018427] Instruction dump:
> > [0.018468] 41820014 3920fe7f 7d494838 7d290074 7929d182 f8e10038 
> > 69290001 0b09
> > [0.018552] 7a098420 0b09 7bc95960 7929a802 <0b09> 7fc68b78 
> > e8610048 7dc47378
> > 
> > Cc: sta...@vger.kernel.org # v4.14+
> > Fixes: bed81ee181dd ("powerpc/xive: introduce H_INT_ESB hcall")
> > Signed-off-by: Cédric Le Goater 
> 
> Applied to powerpc fixes, thanks.
> 
> https://git.kernel.org/powerpc/c/b67a95f2abff0c34e5667c15ab8900de73d8d087
> 

My R-b tag is missing... I guess I didn't review it quick enough :)

> cheers



Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.

2019-12-04 Thread Ram Pai
On Wed, Dec 04, 2019 at 03:26:50PM -0300, Leonardo Bras wrote:
> On Sun, 2019-12-01 at 22:45 -0800, Ram Pai wrote:
> > @@ -206,8 +224,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
> > *tbl, long tcenum,
> >  * from iommu_alloc{,_sg}()
> >  */
> > if (!tcep) {
> > -   tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> > -   /* If allocation fails, fall back to the loop 
> > implementation */
> > +   tcep = alloc_tce_page();
> > if (!tcep) {
> > local_irq_restore(flags);
> > return tce_build_pSeriesLP(tbl, tcenum, npages, 
> > uaddr,
> 
> The comment about failing allocation was removed, but I see no change of
> behaviour here. 
> 
> Can you please explain what/where it changes? 

You observed it right. The comment should stay put.  Will have it fixed
in my next version.

Thanks,
RP



Re: [PATCH v4 1/2] powerpc/pseries/iommu: Share the per-cpu TCE page with the hypervisor.

2019-12-04 Thread Leonardo Bras
On Sun, 2019-12-01 at 22:45 -0800, Ram Pai wrote:
> @@ -206,8 +224,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table 
> *tbl, long tcenum,
>  * from iommu_alloc{,_sg}()
>  */
> if (!tcep) {
> -   tcep = (__be64 *)__get_free_page(GFP_ATOMIC);
> -   /* If allocation fails, fall back to the loop implementation 
> */
> +   tcep = alloc_tce_page();
> if (!tcep) {
> local_irq_restore(flags);
> return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,

The comment about failing allocation was removed, but I see no change of
behaviour here. 

Can you please explain what/where it changes? 

Best regards,

Leonardo




[PATCH 4.9 080/125] net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe()

2019-12-04 Thread Greg Kroah-Hartman
From: Wen Yang 

[ Upstream commit 40752b3eae29f8ca2378e978a02bd6dbeeb06d16 ]

This patch fixes potential double frees if register_hdlc_device() fails.

Signed-off-by: Wen Yang 
Reviewed-by: Peng Hao 
CC: Zhao Qiang 
CC: "David S. Miller" 
CC: net...@vger.kernel.org
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-ker...@vger.kernel.org
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/wan/fsl_ucc_hdlc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index 7a62316c570d2..b2c1e872d5ed5 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -1117,7 +1117,6 @@ static int ucc_hdlc_probe(struct platform_device *pdev)
if (register_hdlc_device(dev)) {
ret = -ENOBUFS;
pr_err("ucc_hdlc: unable to register hdlc device\n");
-   free_netdev(dev);
goto free_dev;
}
 
-- 
2.20.1





[PATCH 4.14 118/209] net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe()

2019-12-04 Thread Greg Kroah-Hartman
From: Wen Yang 

[ Upstream commit 40752b3eae29f8ca2378e978a02bd6dbeeb06d16 ]

This patch fixes potential double frees if register_hdlc_device() fails.

Signed-off-by: Wen Yang 
Reviewed-by: Peng Hao 
CC: Zhao Qiang 
CC: "David S. Miller" 
CC: net...@vger.kernel.org
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-ker...@vger.kernel.org
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 drivers/net/wan/fsl_ucc_hdlc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
index 18b648648adb2..289dff262948d 100644
--- a/drivers/net/wan/fsl_ucc_hdlc.c
+++ b/drivers/net/wan/fsl_ucc_hdlc.c
@@ -1114,7 +1114,6 @@ static int ucc_hdlc_probe(struct platform_device *pdev)
if (register_hdlc_device(dev)) {
ret = -ENOBUFS;
pr_err("ucc_hdlc: unable to register hdlc device\n");
-   free_netdev(dev);
goto free_dev;
}
 
-- 
2.20.1





[PATCH v2 2/2] powerpc/shared: Use static key to detect shared processor

2019-12-04 Thread Srikar Dronamraju
With the shared_processor static key available, is_shared_processor()
can return without having to query the lppaca structure.

Signed-off-by: Srikar Dronamraju 
---
Changelog v1->v2:
Now that we no more refer to lppaca, remove the comment.

 arch/powerpc/include/asm/spinlock.h | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/spinlock.h 
b/arch/powerpc/include/asm/spinlock.h
index 866f6ca0427a..251fe6e47471 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -111,13 +111,8 @@ static inline void splpar_rw_yield(arch_rwlock_t *lock) {};
 
 static inline bool is_shared_processor(void)
 {
-/*
- * LPPACA is only available on Pseries so guard anything LPPACA related to
- * allow other platforms (which include this common header) to compile.
- */
-#ifdef CONFIG_PPC_PSERIES
-   return (IS_ENABLED(CONFIG_PPC_SPLPAR) &&
-   lppaca_shared_proc(local_paca->lppaca_ptr));
+#if defined(CONFIG_PPC_PSERIES) && defined(CONFIG_PPC_SPLPAR)
+   return static_branch_unlikely(&shared_processor);
 #else
return false;
 #endif
-- 
2.18.1



Re: [PATCH 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt

2019-12-04 Thread Srikar Dronamraju
* Srikar Dronamraju  [2019-12-04 19:14:58]:

> 
> 
>   # perf stat -a -r 5 ./schbench
> v5.4  v5.4 + patch
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 47   50.0000th: 33
>   74.0000th: 64   75.0000th: 44
>   89.0000th: 76   90.0000th: 50
>   94.0000th: 83   95.0000th: 53
>   *98.0000th: 103 *99.0000th: 57
>   98.5000th: 2124 99.5000th: 59
>   98.9000th: 7976 99.9000th: 83
>   min=-1, max=10519   min=0, max=117
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 45   50.0000th: 34
>   74.0000th: 61   75.0000th: 45
>   89.0000th: 70   90.0000th: 52
>   94.0000th: 77   95.0000th: 56
>   *98.0000th: 504 *99.0000th: 62
>   98.5000th: 4012 99.5000th: 64
>   98.9000th: 8168 99.9000th: 79
>   min=-1, max=14500   min=0, max=123
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 48   50.0000th: 35
>   74.0000th: 65   75.0000th: 47
>   89.0000th: 76   90.0000th: 55
>   94.0000th: 82   95.0000th: 59
>   *98.0000th: 1098*99.0000th: 67
>   98.5000th: 3988 99.5000th: 71
>   98.9000th: 9360 99.9000th: 98
>   min=-1, max=19283   min=0, max=137
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 46   50.0000th: 35
>   74.0000th: 63   75.0000th: 46
>   89.0000th: 73   90.0000th: 53
>   94.0000th: 78   95.0000th: 57
>   *98.0000th: 113 *99.0000th: 63
>   98.5000th: 2316 99.5000th: 65
>   98.9000th: 7704 99.9000th: 83
>   min=-1, max=17976   min=0, max=139
> Latency percentiles (usec)  Latency percentiles (usec)
>   49.0000th: 46   50.0000th: 34
>   74.0000th: 62   75.0000th: 46
>   89.0000th: 73   90.0000th: 53
>   94.0000th: 79   95.0000th: 57
>   *98.0000th: 97  *99.0000th: 64
>   98.5000th: 1398 99.5000th: 70
>   98.9000th: 8136 99.9000th: 100
>   min=-1, max=10008   min=0, max=142
> 

Just to be clear: since these are latency values, lower numbers are better.

-- 
Thanks and Regards
Srikar Dronamraju



Re: [bug] userspace hitting sporadic SIGBUS on xfs (Power9, ppc64le), v4.19 and later

2019-12-04 Thread Jan Stancek


- Original Message -
> Please try the patch below:

I ran the reproducer for 18 hours on 2 systems where it previously reproduced;
there were no crashes / SIGBUS.



Re: [PATCH] powerpc: ensure that swiotlb buffer is allocated from low memory

2019-12-04 Thread Christoph Hellwig
On Wed, Dec 04, 2019 at 02:35:24PM +0200, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> Some powerpc platforms (e.g. 85xx) limit DMA-able memory way below 4G. If a
> system has more physical memory than this limit, the swiotlb buffer is not
> addressable because it is allocated from memblock using top-down mode.
> 
> Force memblock to bottom-up mode before calling swiotlb_init() to ensure
> that the swiotlb buffer is DMA-able.
> 
> Link: 
> https://lkml.kernel.org/r/f1ebb706-73df-430e-9020-c214ec8ed...@xenosoft.de
> Reported-by: Christian Zigotzky 
> Signed-off-by: Mike Rapoport 

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH] powerpc/pmem: Fix kernel crash due to wrong range value usage in flush_dcache_range

2019-12-04 Thread Sachin Sant


> 
> Fixes: 076265907cf9 ("powerpc: Chunk calls to flush_dcache_range in 
> arch_*_memory")
> Reported-by: Sachin Sant 
> Signed-off-by: Aneesh Kumar K.V 
> ---
> arch/powerpc/mm/mem.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 

It took a while to set up the environment on a replacement machine.
I was able to test the fix. With this fix applied I no longer see the problem.

Tested-by: Sachin Sant 

Thanks
-Sachin
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index ad299e72ec30..9488b63dfc87 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -121,7 +121,7 @@ static void flush_dcache_range_chunked(unsigned long 
> start, unsigned long stop,
>   unsigned long i;
> 
>   for (i = start; i < stop; i += chunk) {
> - flush_dcache_range(i, min(stop, start + chunk));
> + flush_dcache_range(i, min(stop, i + chunk));
>   cond_resched();
>   }
> }
> -- 
> 2.23.0
> 



[PATCH 2/2] powerpc/shared: Use static key to detect shared processor

2019-12-04 Thread Srikar Dronamraju
With the shared_processor static key available, is_shared_processor()
can return without having to query the lppaca structure.

Signed-off-by: Srikar Dronamraju 
---
 arch/powerpc/include/asm/spinlock.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/spinlock.h 
b/arch/powerpc/include/asm/spinlock.h
index 866f6ca0427a..699eb306a835 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -115,9 +115,8 @@ static inline bool is_shared_processor(void)
  * LPPACA is only available on Pseries so guard anything LPPACA related to
  * allow other platforms (which include this common header) to compile.
  */
-#ifdef CONFIG_PPC_PSERIES
-   return (IS_ENABLED(CONFIG_PPC_SPLPAR) &&
-   lppaca_shared_proc(local_paca->lppaca_ptr));
+#if defined(CONFIG_PPC_PSERIES) && defined(CONFIG_PPC_SPLPAR)
+   return static_branch_unlikely(&shared_processor);
 #else
return false;
 #endif
-- 
2.18.1



[PATCH 1/2] powerpc/vcpu: Assume dedicated processors as non-preempt

2019-12-04 Thread Srikar Dronamraju
With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted
vCPUs"), scheduler avoids preempted vCPUs to schedule tasks on wakeup.
This leads to wrong choice of CPU, which in-turn leads to larger wakeup
latencies. Eventually, it leads to performance regression in latency
sensitive benchmarks like soltp, schbench etc.

On Powerpc, vcpu_is_preempted only looks at yield_count. If the
yield_count is odd, the vCPU is assumed to be preempted. However
yield_count is increased whenever LPAR enters CEDE state. So any CPU
that has entered CEDE state is assumed to be preempted.

Even if a vCPU of a dedicated LPAR is preempted/donated, it should have
right of first-use since it is supposed to own the vCPU.

On a Power9 System with 32 cores
 # lscpu
Architecture:ppc64le
Byte Order:  Little Endian
CPU(s):  128
On-line CPU(s) list: 0-127
Thread(s) per core:  8
Core(s) per socket:  1
Socket(s):   16
NUMA node(s):2
Model:   2.2 (pvr 004e 0202)
Model name:  POWER9 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:   32K
L1i cache:   32K
L2 cache:512K
L3 cache:10240K
NUMA node0 CPU(s):   0-63
NUMA node1 CPU(s):   64-127
 

  # perf stat -a -r 5 ./schbench
v5.4v5.4 + patch
Latency percentiles (usec)  Latency percentiles (usec)
49.0000th: 47   50.0000th: 33
74.0000th: 64   75.0000th: 44
89.0000th: 76   90.0000th: 50
94.0000th: 83   95.0000th: 53
*98.0000th: 103 *99.0000th: 57
98.5000th: 2124 99.5000th: 59
98.9000th: 7976 99.9000th: 83
min=-1, max=10519   min=0, max=117
Latency percentiles (usec)  Latency percentiles (usec)
49.0000th: 45   50.0000th: 34
74.0000th: 61   75.0000th: 45
89.0000th: 70   90.0000th: 52
94.0000th: 77   95.0000th: 56
*98.0000th: 504 *99.0000th: 62
98.5000th: 4012 99.5000th: 64
98.9000th: 8168 99.9000th: 79
min=-1, max=14500   min=0, max=123
Latency percentiles (usec)  Latency percentiles (usec)
49.0000th: 48   50.0000th: 35
74.0000th: 65   75.0000th: 47
89.0000th: 76   90.0000th: 55
94.0000th: 82   95.0000th: 59
*98.0000th: 1098*99.0000th: 67
98.5000th: 3988 99.5000th: 71
98.9000th: 9360 99.9000th: 98
min=-1, max=19283   min=0, max=137
Latency percentiles (usec)  Latency percentiles (usec)
49.0000th: 46   50.0000th: 35
74.0000th: 63   75.0000th: 46
89.0000th: 73   90.0000th: 53
94.0000th: 78   95.0000th: 57
*98.0000th: 113 *99.0000th: 63
98.5000th: 2316 99.5000th: 65
98.9000th: 7704 99.9000th: 83
min=-1, max=17976   min=0, max=139
Latency percentiles (usec)  Latency percentiles (usec)
49.0000th: 46   50.0000th: 34
74.0000th: 62   75.0000th: 46
89.0000th: 73   90.0000th: 53
94.0000th: 79   95.0000th: 57
*98.0000th: 97  *99.0000th: 64
98.5000th: 1398 99.5000th: 70
98.9000th: 8136 99.9000th: 100
min=-1, max=10008   min=0, max=142

Performance counter stats for 'system wide' (4 runs):

context-switches   42,604 ( +-  0.87% )   45,397 ( +-  0.25% )
cpu-migrations  0,195 ( +-  2.70% )  230 ( +-  7.23% )
page-faults16,783 ( +- 14.87% )   16,781 ( +-  9.77% )

Waiman Long suggested using static_keys.

Reported-by: Parth Shah 
Reported-by: Ihor Pasichnyk 
Cc: Parth Shah 
Cc: Ihor Pasichnyk 
Cc: Juri Lelli 
Cc: Waiman Long 

Re: [PATCH] powerpc/pmem: Fix kernel crash due to wrong range value usage in flush_dcache_range

2019-12-04 Thread Michael Ellerman
On Wed, 2019-12-04 at 05:29:09 UTC, "Aneesh Kumar K.V" wrote:
> This patch fixes the below kernel crash.
> 
>  BUG: Unable to handle kernel data access on read at 0xc0038000
>  Faulting instruction address: 0xc008b6f0
> cpu 0x5: Vector: 300 (Data Access) at [c000d8587790]
> pc: c008b6f0: arch_remove_memory+0x150/0x210
> lr: c008b720: arch_remove_memory+0x180/0x210
> sp: c000d8587a20
>msr: 8280b033
>dar: c0038000
>  dsisr: 4000
>   current = 0xc000d8558600
>   paca= 0xcfff8f00   irqmask: 0x03   irq_happened: 0x01
> pid   = 1220, comm = ndctl
> enter ? for help
>  memunmap_pages+0x33c/0x410
>  devm_action_release+0x30/0x50
>  release_nodes+0x30c/0x3a0
>  device_release_driver_internal+0x178/0x240
>  unbind_store+0x74/0x190
>  drv_attr_store+0x44/0x60
>  sysfs_kf_write+0x74/0xa0
>  kernfs_fop_write+0x1b0/0x260
>  __vfs_write+0x3c/0x70
>  vfs_write+0xe4/0x200
>  ksys_write+0x7c/0x140
>  system_call+0x5c/0x68
> 
> Fixes: 076265907cf9 ("powerpc: Chunk calls to flush_dcache_range in 
> arch_*_memory")
> Reported-by: Sachin Sant 
> Signed-off-by: Aneesh Kumar K.V 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/6f4679b956741d2da6ad3ebb738cbe1264ac8781

cheers


Re: [PATCH] powerpc/xive: skip ioremap() of ESB pages for LSI interrupts

2019-12-04 Thread Michael Ellerman
On Tue, 2019-12-03 at 16:36:42 UTC, Cédric Le Goater wrote:
> The PCI INTx interrupts and other LSI interrupts are handled differently
> under a sPAPR platform. When the interrupt source characteristics are
> queried, the hypervisor returns an H_INT_ESB flag to inform the OS
> that it should be using the H_INT_ESB hcall for interrupt management
> and not loads and stores on the interrupt ESB pages.
> 
> A default -1 value is returned for the addresses of the ESB pages. The
> driver ignores this condition today and performs a bogus IO mapping.
> Recent changes and the DEBUG_VM configuration option make the bug
> visible with :
> 
> [0.015518] kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612!
> [0.015578] Oops: Exception in kernel mode, sig: 5 [#1]
> [0.015627] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=1024 NUMA 
> pSeries
> [0.015697] Modules linked in:
> [0.015739] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.4.0-0.rc6.git0.1.fc32.ppc64le #1
> [0.015812] NIP:  c0f63294 LR: c0f62e44 CTR: 
> 
> [0.015889] REGS: c000fa45f0d0 TRAP: 0700   Not tainted  
> (5.4.0-0.rc6.git0.1.fc32.ppc64le)
> [0.015971] MSR:  82029033   CR: 
> 44000424  XER: 
> [0.016050] CFAR: c0f63128 IRQMASK: 0
> [0.016050] GPR00: c0f62e44 c000fa45f360 c1be5400 
> 
> [0.016050] GPR04: c19c7d38 c000fa340030 fa330009 
> c1c15e18
> [0.016050] GPR08: 0040 ffe0  
> 8418dd352dbd190f
> [0.016050] GPR12:  c1e0 c00a8006 
> c00a8006
> [0.016050] GPR16:  81ae c1c24d98 
> 
> [0.016050] GPR20: c00a8007 c1cafca0 c00a8007 
> 
> [0.016050] GPR24: c00a8008 c00a8008 c1cafca8 
> c00a8008
> [0.016050] GPR28: c000fa32e010 c00a8006  
> c000fa33
> [0.016711] NIP [c0f63294] ioremap_page_range+0x4c4/0x6e0
> [0.016778] LR [c0f62e44] ioremap_page_range+0x74/0x6e0
> [0.016846] Call Trace:
> [0.016876] [c000fa45f360] [c0f62e44] 
> ioremap_page_range+0x74/0x6e0 (unreliable)
> [0.016969] [c000fa45f460] [c00934bc] do_ioremap+0x8c/0x120
> [0.017037] [c000fa45f4b0] [c00938e8] 
> __ioremap_caller+0x128/0x140
> [0.017116] [c000fa45f500] [c00931a0] ioremap+0x30/0x50
> [0.017184] [c000fa45f520] [c00d1380] 
> xive_spapr_populate_irq_data+0x170/0x260
> [0.017263] [c000fa45f5c0] [c00cc90c] 
> xive_irq_domain_map+0x8c/0x170
> [0.017344] [c000fa45f600] [c0219124] 
> irq_domain_associate+0xb4/0x2d0
> [0.017424] [c000fa45f690] [c0219fe0] 
> irq_create_mapping+0x1e0/0x3b0
> [0.017506] [c000fa45f730] [c021ad6c] 
> irq_create_fwspec_mapping+0x27c/0x3e0
> [0.017586] [c000fa45f7c0] [c021af68] 
> irq_create_of_mapping+0x98/0xb0
> [0.017666] [c000fa45f830] [c08d4e48] 
> of_irq_parse_and_map_pci+0x168/0x230
> [0.017746] [c000fa45f910] [c0075428] 
> pcibios_setup_device+0x88/0x250
> [0.017826] [c000fa45f9a0] [c0077b84] 
> pcibios_setup_bus_devices+0x54/0x100
> [0.017906] [c000fa45fa10] [c00793f0] __of_scan_bus+0x160/0x310
> [0.017973] [c000fa45faf0] [c0075fc0] 
> pcibios_scan_phb+0x330/0x390
> [0.018054] [c000fa45fba0] [c139217c] pcibios_init+0x8c/0x128
> [0.018121] [c000fa45fc20] [c00107b0] 
> do_one_initcall+0x60/0x2c0
> [0.018201] [c000fa45fcf0] [c1384624] 
> kernel_init_freeable+0x290/0x378
> [0.018280] [c000fa45fdb0] [c0010d24] kernel_init+0x2c/0x148
> [0.018348] [c000fa45fe20] [c000bdbc] 
> ret_from_kernel_thread+0x5c/0x80
> [0.018427] Instruction dump:
> [0.018468] 41820014 3920fe7f 7d494838 7d290074 7929d182 f8e10038 69290001 
> 0b09
> [0.018552] 7a098420 0b09 7bc95960 7929a802 <0b09> 7fc68b78 
> e8610048 7dc47378
> 
> Cc: sta...@vger.kernel.org # v4.14+
> Fixes: bed81ee181dd ("powerpc/xive: introduce H_INT_ESB hcall")
> Signed-off-by: Cédric Le Goater 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/b67a95f2abff0c34e5667c15ab8900de73d8d087

cheers


Re: [PATCH v4 3/8] powerpc: Fix vDSO clock_getres()

2019-12-04 Thread Michael Ellerman
On Mon, 2019-12-02 at 07:57:29 UTC, Christophe Leroy wrote:
> From: Vincenzo Frascino 
> 
> clock_getres in the vDSO library has to preserve the same behaviour
> of posix_get_hrtimer_res().
> 
> In particular, posix_get_hrtimer_res() does:
> sec = 0;
> ns = hrtimer_resolution;
> and hrtimer_resolution depends on the enablement of the high
> resolution timers that can happen either at compile or at run time.
> 
> Fix the powerpc vdso implementation of clock_getres keeping a copy of
> hrtimer_resolution in vdso data and using that directly.
> 
> Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support
> to 32 bits kernel")
> Cc: sta...@vger.kernel.org
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Signed-off-by: Vincenzo Frascino 
> Reviewed-by: Christophe Leroy 
> Acked-by: Shuah Khan 
> [chleroy: changed CLOCK_REALTIME_RES to CLOCK_HRTIMER_RES]
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/552263456215ada7ee8700ce022d12b0cffe4802

cheers


Re: [PATCH] powerpc/kasan: fix boot failure with RELOCATABLE && FSL_BOOKE

2019-12-04 Thread Michael Ellerman
On Fri, 2019-11-29 at 14:26:41 UTC, Christophe Leroy wrote:
> When enabling CONFIG_RELOCATABLE and CONFIG_KASAN on FSL_BOOKE,
> the kernel doesn't boot.
> 
> relocate_init() requires KASAN early shadow area to be set up because
> it needs access to the device tree through generic functions.
> 
> Call kasan_early_init() before calling relocate_init()
> 
> Reported-by: Lexi Shao 
> Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
> Signed-off-by: Christophe Leroy 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/71eb40fc53371bc247c8066ae76ad9e22ae1e18d

cheers


[PATCH] powerpc/archrandom: fix arch_get_random_seed_int()

2019-12-04 Thread Ard Biesheuvel
Commit 01c9348c7620ec65

  powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*

updated arch_get_random_[int|long]() to be NOPs, and moved the hardware
RNG backing to arch_get_random_seed_[int|long]() instead. However, it
failed to take into account that arch_get_random_int() was implemented
in terms of arch_get_random_long(), and so we ended up with a version
of the former that is essentially a NOP as well.

Fix this by calling arch_get_random_seed_long() from
arch_get_random_seed_int() instead.

Fixes: 01c9348c7620ec65 ("powerpc: Use hardware RNG for arch_get_random_seed_* 
not arch_get_random_*")
Signed-off-by: Ard Biesheuvel 
---
 arch/powerpc/include/asm/archrandom.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/archrandom.h 
b/arch/powerpc/include/asm/archrandom.h
index 9c63b596e6ce..a09595f00cab 100644
--- a/arch/powerpc/include/asm/archrandom.h
+++ b/arch/powerpc/include/asm/archrandom.h
@@ -28,7 +28,7 @@ static inline int arch_get_random_seed_int(unsigned int *v)
unsigned long val;
int rc;
 
-   rc = arch_get_random_long();
+   rc = arch_get_random_seed_long();
if (rc)
*v = val;
 
-- 
2.17.1



Re: [PATCH 2/3] powerpc/sysfs: Show idle_purr and idle_spurr for every CPU

2019-12-04 Thread Gautham R Shenoy
Hi Kamalesh,

On Tue, Dec 03, 2019 at 07:07:53PM +0530, Kamalesh Babulal wrote:
> On 11/27/19 5:31 PM, Gautham R. Shenoy wrote:
> > From: "Gautham R. Shenoy" 
> > 
> > On Pseries LPARs, to calculate utilization, we need to know the
> > [S]PURR ticks when the CPUs were busy or idle.
> > 
> > The total PURR and SPURR ticks are already exposed via the per-cpu
> > sysfs files /sys/devices/system/cpu/cpuX/purr and
> > /sys/devices/system/cpu/cpuX/spurr.
> > 
> > This patch adds support for exposing the idle PURR and SPURR ticks via
> > /sys/devices/system/cpu/cpuX/idle_purr and
> > /sys/devices/system/cpu/cpuX/idle_spurr.
> 
> The patch looks good to me, with a minor file mode nit pick mentioned below.
> 
> > 
> > Signed-off-by: Gautham R. Shenoy 
> > ---
> >  arch/powerpc/kernel/sysfs.c | 32 
> >  1 file changed, 32 insertions(+)
> > 
> > diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> > index 80a676d..42ade55 100644
> > --- a/arch/powerpc/kernel/sysfs.c
> > +++ b/arch/powerpc/kernel/sysfs.c
> > @@ -1044,6 +1044,36 @@ static ssize_t show_physical_id(struct device *dev,
> >  }
> >  static DEVICE_ATTR(physical_id, 0444, show_physical_id, NULL);
> > 
> > +static ssize_t idle_purr_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > +   struct cpu *cpu = container_of(dev, struct cpu, dev);
> > +   unsigned int cpuid = cpu->dev.id;
> > +   struct lppaca *cpu_lppaca_ptr = paca_ptrs[cpuid]->lppaca_ptr;
> > +   u64 idle_purr_cycles = be64_to_cpu(cpu_lppaca_ptr->wait_state_cycles);
> > +
> > +   return sprintf(buf, "%llx\n", idle_purr_cycles);
> > +}
> > +static DEVICE_ATTR_RO(idle_purr);
> 
> per cpu purr/spurr sysfs file is created with file mode 0400. Using
> DEVICE_ATTR_RO for their idle_* variants will create sysfs files with 0444 as
> their file mode, you should probably use DEVICE_ATTR() with file mode 0400 to
> have consist permission for both variants.

Thanks for catching this. I missed checking the permissions of purr
and spurr. Will send another version.


> 
> -- 
> Kamalesh


[PATCH] powerpc: ensure that swiotlb buffer is allocated from low memory

2019-12-04 Thread Mike Rapoport
From: Mike Rapoport 

Some powerpc platforms (e.g. 85xx) limit DMA-able memory way below 4G. If a
system has more physical memory than this limit, the swiotlb buffer is not
addressable because it is allocated from memblock using top-down mode.

Force memblock to bottom-up mode before calling swiotlb_init() to ensure
that the swiotlb buffer is DMA-able.

Link: https://lkml.kernel.org/r/f1ebb706-73df-430e-9020-c214ec8ed...@xenosoft.de
Reported-by: Christian Zigotzky 
Signed-off-by: Mike Rapoport 
Cc: Benjamin Herrenschmidt 
Cc: Christoph Hellwig 
Cc: Darren Stevens 
Cc: mad skateman 
Cc: Michael Ellerman 
Cc: Nicolas Saenz Julienne 
Cc: Paul Mackerras 
Cc: Robin Murphy 
Cc: Rob Herring 
---
 arch/powerpc/mm/mem.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index be941d382c8d..14c2c53e3f9e 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -260,6 +260,14 @@ void __init mem_init(void)
BUILD_BUG_ON(MMU_PAGE_COUNT > 16);
 
 #ifdef CONFIG_SWIOTLB
+   /*
+* Some platforms (e.g. 85xx) limit DMA-able memory way below
+* 4G. We force memblock to bottom-up mode to ensure that the
+* memory allocated in swiotlb_init() is DMA-able.
+* As it's the last memblock allocation, no need to reset it
+* back to top-down.
+*/
+   memblock_set_bottom_up(true);
swiotlb_init(0);
 #endif
 
-- 
2.24.0



Re: [PATCH] powerpc/xive: skip ioremap() of ESB pages for LSI interrupts

2019-12-04 Thread Greg Kurz
On Tue,  3 Dec 2019 17:36:42 +0100
Cédric Le Goater  wrote:

> The PCI INTx interrupts and other LSI interrupts are handled differently
> under a sPAPR platform. When the interrupt source characteristics are
> queried, the hypervisor returns an H_INT_ESB flag to inform the OS
> that it should be using the H_INT_ESB hcall for interrupt management
> and not loads and stores on the interrupt ESB pages.
> 
> A default -1 value is returned for the addresses of the ESB pages. The
> driver ignores this condition today and performs a bogus IO mapping.
> Recent changes and the DEBUG_VM configuration option make the bug
> visible with :
> 
> [0.015518] kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612!
> [0.015578] Oops: Exception in kernel mode, sig: 5 [#1]
> [0.015627] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=1024 NUMA 
> pSeries
> [0.015697] Modules linked in:
> [0.015739] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.4.0-0.rc6.git0.1.fc32.ppc64le #1
> [0.015812] NIP:  c0f63294 LR: c0f62e44 CTR: 
> 
> [0.015889] REGS: c000fa45f0d0 TRAP: 0700   Not tainted  
> (5.4.0-0.rc6.git0.1.fc32.ppc64le)
> [0.015971] MSR:  82029033   CR: 
> 44000424  XER: 
> [0.016050] CFAR: c0f63128 IRQMASK: 0
> [0.016050] GPR00: c0f62e44 c000fa45f360 c1be5400 
> 
> [0.016050] GPR04: c19c7d38 c000fa340030 fa330009 
> c1c15e18
> [0.016050] GPR08: 0040 ffe0  
> 8418dd352dbd190f
> [0.016050] GPR12:  c1e0 c00a8006 
> c00a8006
> [0.016050] GPR16:  81ae c1c24d98 
> 
> [0.016050] GPR20: c00a8007 c1cafca0 c00a8007 
> 
> [0.016050] GPR24: c00a8008 c00a8008 c1cafca8 
> c00a8008
> [0.016050] GPR28: c000fa32e010 c00a8006  
> c000fa33
> [0.016711] NIP [c0f63294] ioremap_page_range+0x4c4/0x6e0
> [0.016778] LR [c0f62e44] ioremap_page_range+0x74/0x6e0
> [0.016846] Call Trace:
> [0.016876] [c000fa45f360] [c0f62e44] 
> ioremap_page_range+0x74/0x6e0 (unreliable)
> [0.016969] [c000fa45f460] [c00934bc] do_ioremap+0x8c/0x120
> [0.017037] [c000fa45f4b0] [c00938e8] 
> __ioremap_caller+0x128/0x140
> [0.017116] [c000fa45f500] [c00931a0] ioremap+0x30/0x50
> [0.017184] [c000fa45f520] [c00d1380] 
> xive_spapr_populate_irq_data+0x170/0x260
> [0.017263] [c000fa45f5c0] [c00cc90c] 
> xive_irq_domain_map+0x8c/0x170
> [0.017344] [c000fa45f600] [c0219124] 
> irq_domain_associate+0xb4/0x2d0
> [0.017424] [c000fa45f690] [c0219fe0] 
> irq_create_mapping+0x1e0/0x3b0
> [0.017506] [c000fa45f730] [c021ad6c] 
> irq_create_fwspec_mapping+0x27c/0x3e0
> [0.017586] [c000fa45f7c0] [c021af68] 
> irq_create_of_mapping+0x98/0xb0
> [0.017666] [c000fa45f830] [c08d4e48] 
> of_irq_parse_and_map_pci+0x168/0x230
> [0.017746] [c000fa45f910] [c0075428] 
> pcibios_setup_device+0x88/0x250
> [0.017826] [c000fa45f9a0] [c0077b84] 
> pcibios_setup_bus_devices+0x54/0x100
> [0.017906] [c000fa45fa10] [c00793f0] __of_scan_bus+0x160/0x310
> [0.017973] [c000fa45faf0] [c0075fc0] 
> pcibios_scan_phb+0x330/0x390
> [0.018054] [c000fa45fba0] [c139217c] pcibios_init+0x8c/0x128
> [0.018121] [c000fa45fc20] [c00107b0] 
> do_one_initcall+0x60/0x2c0
> [0.018201] [c000fa45fcf0] [c1384624] 
> kernel_init_freeable+0x290/0x378
> [0.018280] [c000fa45fdb0] [c0010d24] kernel_init+0x2c/0x148
> [0.018348] [c000fa45fe20] [c000bdbc] 
> ret_from_kernel_thread+0x5c/0x80
> [0.018427] Instruction dump:
> [0.018468] 41820014 3920fe7f 7d494838 7d290074 7929d182 f8e10038 69290001 
> 0b09
> [0.018552] 7a098420 0b09 7bc95960 7929a802 <0b09> 7fc68b78 
> e8610048 7dc47378
> 
> Cc: sta...@vger.kernel.org # v4.14+
> Fixes: bed81ee181dd ("powerpc/xive: introduce H_INT_ESB hcall")
> Signed-off-by: Cédric Le Goater 
> ---
>  arch/powerpc/sysdev/xive/spapr.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/sysdev/xive/spapr.c 
> b/arch/powerpc/sysdev/xive/spapr.c
> index 33c10749edec..55dc61cb4867 100644
> --- a/arch/powerpc/sysdev/xive/spapr.c
> +++ b/arch/powerpc/sysdev/xive/spapr.c
> @@ -392,20 +392,28 @@ static int xive_spapr_populate_irq_data(u32 hw_irq, 
> struct xive_irq_data *data)
>   data->esb_shift = esb_shift;
>   data->trig_page = trig_page;
>  
> + data->hw_irq = hw_irq;
> +

This is a side effect in the case where the XIVE_IRQ_FLAG_H_INT_ESB flag
isn't set 

Re: Bug 205201 - Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M

2019-12-04 Thread Christian Zigotzky
I think we have to wait for Roland’s test results with his SCSI PCI card.

Christian

Sent from my iPhone

> On 4. Dec 2019, at 09:56, Christoph Hellwig  wrote:
> 
>> On Wed, Nov 27, 2019 at 08:56:25AM +0200, Mike Rapoport wrote:
>>> On Tue, Nov 26, 2019 at 05:40:26PM +0100, Christoph Hellwig wrote:
 On Tue, Nov 26, 2019 at 12:26:38PM +0100, Christian Zigotzky wrote:
 Hello Christoph,
 
 The PCI TV card works with your patch! I was able to patch your Git kernel 
 with the patch above.
 
 I haven't found any error messages in the dmesg yet.
>>> 
>>> Thanks.  Unfortunately this is a bit of a hack as we need to set
>>> the mask based on runtime information like the magic FSL PCIe window.
>>> Let me try to draft something better up, and thanks already for testing
>>> this one!
>> 
>> Maybe we'll simply force bottom up allocation before calling
>> swiotlb_init()? Anyway, it's the last memblock allocation.
> 
> So I think we should go with this fix (plus a source code comment) for
> now.  Revamping the whole memory initialization is going to take a
> while, and this fix also is easily backportable.


[PATCH v5 2/2] ASoC: fsl_asrc: Add support for imx8qm & imx8qxp

2019-12-04 Thread Shengjiu Wang
There are two asrc modules in imx8qm & imx8qxp; each module has a
different clock configuration, and the DMA type is EDMA.

So in this patch, we define the new clocks, refine the clock map,
and include struct fsl_asrc_soc_data for different soc usage.

The EDMA channel is fixed for each DMA request: one DMA request
corresponds to one DMA channel. So we need to request the DMA
channel with the DMA request of the asrc module.

Signed-off-by: Shengjiu Wang 
Acked-by: Nicolin Chen 
---
changes in v2
- use !use_edma to wrap code in fsl_asrc_dma
- add Acked-by: Nicolin Chen

changes in v3
- remove the acked-by for commit is updated
- read "fsl,asrc-clk-map" property, and update table
  clk_map_imx8qm.

changes in v4
- Add table clk_map_imx8qxp
- add Acked-by: Nicolin Chen

changes in v5
- none

 sound/soc/fsl/fsl_asrc.c | 125 +--
 sound/soc/fsl/fsl_asrc.h |  64 +-
 sound/soc/fsl/fsl_asrc_dma.c |  39 +++
 3 files changed, 193 insertions(+), 35 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index a3cfceea7d2f..0dcebc24c312 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -41,26 +41,65 @@ static struct snd_pcm_hw_constraint_list fsl_asrc_rate_constraints = {
  * The following tables map the relationship between asrc_inclk/asrc_outclk in
  * fsl_asrc.h and the registers of ASRCSR
  */
-static unsigned char input_clk_map_imx35[] = {
+static unsigned char input_clk_map_imx35[ASRC_CLK_MAP_LEN] = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf,
+   3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+   3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
 };
 
-static unsigned char output_clk_map_imx35[] = {
+static unsigned char output_clk_map_imx35[ASRC_CLK_MAP_LEN] = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf,
+   3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+   3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
 };
 
 /* i.MX53 uses the same map for input and output */
-static unsigned char input_clk_map_imx53[] = {
+static unsigned char input_clk_map_imx53[ASRC_CLK_MAP_LEN] = {
/* 0x0  0x1  0x2  0x3  0x4  0x5  0x6  0x7  0x8  0x9  0xa  0xb  0xc  0xd  0xe  0xf */
0x0, 0x1, 0x2, 0x7, 0x4, 0x5, 0x6, 0x3, 0x8, 0x9, 0xa, 0xb, 0xc, 0xf, 0xe, 0xd,
+   0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7,
+   0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7,
 };
 
-static unsigned char output_clk_map_imx53[] = {
+static unsigned char output_clk_map_imx53[ASRC_CLK_MAP_LEN] = {
/* 0x0  0x1  0x2  0x3  0x4  0x5  0x6  0x7  0x8  0x9  0xa  0xb  0xc  0xd  0xe  0xf */
0x8, 0x9, 0xa, 0x7, 0xc, 0x5, 0x6, 0xb, 0x0, 0x1, 0x2, 0x3, 0x4, 0xf, 0xe, 0xd,
+   0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7,
+   0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7, 0x7,
 };
 
-static unsigned char *clk_map[2];
+/**
+ * i.MX8QM/i.MX8QXP uses the same map for input and output.
+ * clk_map_imx8qm[0] is for i.MX8QM asrc0
+ * clk_map_imx8qm[1] is for i.MX8QM asrc1
+ * clk_map_imx8qxp[0] is for i.MX8QXP asrc0
+ * clk_map_imx8qxp[1] is for i.MX8QXP asrc1
+ */
+static unsigned char clk_map_imx8qm[2][ASRC_CLK_MAP_LEN] = {
+   {
+   0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0x0,
+   0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf,
+   0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+   },
+   {
+   0xf, 0xf, 0xf, 0xf, 0xf, 0x7, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0x0,
+   0x0, 0x1, 0x2, 0x3, 0xb, 0xc, 0xf, 0xf, 0xd, 0xe, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+   0x4, 0x5, 0x6, 0xf, 0x8, 0x9, 0xa, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+   },
+};
+
+static unsigned char clk_map_imx8qxp[2][ASRC_CLK_MAP_LEN] = {
+   {
+   0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0x0,
+   0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0xf, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xf, 0xf,
+   0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+   },
+   {
+   0xf, 0xf, 0xf, 0xf, 0xf, 0x7, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0x0,
+   0x0, 0x1, 0x2, 0x3, 0x7, 0x8, 0xf, 0xf, 0x9, 0xa, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+   0xf, 0xf, 0x6, 0xf, 0xf, 0xf, 0xa, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf, 0xf,
+   },
+};
 
 /**
  * Select the pre-processing and post-processing options
@@ -353,8 +392,8 @@ static int fsl_asrc_config_pair(struct fsl_asrc_pair *pair, bool use_ideal_rate)
}
 
/* Validate input and output clock sources */
-   clk_index[IN] = clk_map[IN][config->inclk];
-   clk_index[OUT] = clk_map[OUT][config->outclk];
+   clk_index[IN] = asrc_priv->clk_map[IN][config->inclk];
+   

[PATCH v5 1/2] ASoC: dt-bindings: fsl_asrc: add compatible string for imx8qm & imx8qxp

2019-12-04 Thread Shengjiu Wang
Add compatible string "fsl,imx8qm-asrc" for imx8qm platform,
"fsl,imx8qxp-asrc" for imx8qxp platform.

There are two asrc modules in imx8qm & imx8qxp, and the clock mapping
differs between them, so add a new property "fsl,asrc-clk-map"
to distinguish them.

Signed-off-by: Shengjiu Wang 
---
changes in v2
-none

changes in v3
-use only one compatible string "fsl,imx8qm-asrc",
-add new property "fsl,asrc-clk-map".

changes in v4
-add "fsl,imx8qxp-asrc"

changes in v5
-refine the comments for compatible

 Documentation/devicetree/bindings/sound/fsl,asrc.txt | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/sound/fsl,asrc.txt 
b/Documentation/devicetree/bindings/sound/fsl,asrc.txt
index 1d4d9f938689..cb9a25165503 100644
--- a/Documentation/devicetree/bindings/sound/fsl,asrc.txt
+++ b/Documentation/devicetree/bindings/sound/fsl,asrc.txt
@@ -8,7 +8,12 @@ three substreams within totally 10 channels.
 
 Required properties:
 
-  - compatible : Contains "fsl,imx35-asrc" or "fsl,imx53-asrc".
+  - compatible : Compatible list, should contain one of the following
+ compatibles:
+ "fsl,imx35-asrc",
+ "fsl,imx53-asrc",
+ "fsl,imx8qm-asrc",
+ "fsl,imx8qxp-asrc",
 
  - reg: Offset and length of the register set for the device.
 
@@ -35,6 +40,11 @@ Required properties:
 
- fsl,asrc-width: Defines a mutual sample width used by DPCM Back Ends.
 
+   - fsl,asrc-clk-map   : Defines clock map used in driver, which is required by imx8qm/imx8qxp platform
+ <0> - select the map for asrc0 in imx8qm/imx8qxp
+ <1> - select the map for asrc1 in imx8qm/imx8qxp
+
 Optional properties:
 
- big-endian: If this property is absent, the little endian mode
-- 
2.21.0
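For reference, a node using the new compatible and property might look
roughly like this (a sketch only — the register address and other values
below are invented for illustration, not taken from a real board file):

```
asrc0: asrc@59000000 {
	compatible = "fsl,imx8qm-asrc";
	reg = <0x59000000 0x10000>;
	fsl,asrc-width = <16>;
	fsl,asrc-clk-map = <0>;	/* clock map for asrc0 on i.MX8QM */
};
```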



[PATCH] docs/core-api: Remove possibly confusing sub-headings from Bit Operations

2019-12-04 Thread Michael Ellerman
The recent commit 81d2c6f81996 ("kasan: support instrumented bitops
combined with generic bitops") split the KASAN instrumented bitops
into separate headers for atomic, non-atomic and locking operations.

This was done to allow arches to include just the instrumented bitops
they need, while also using some of the generic bitops in
asm-generic/bitops (which are automatically instrumented). The generic
bitops are already split into atomic, non-atomic and locking headers.

This split required an update to kernel-api.rst because it included
include/asm-generic/bitops-instrumented.h, which no longer exists. So
now kernel-api.rst includes all three instrumented headers to get the
definitions for all the bitops.

When adding the three headers it seemed sensible to add sub-headings
for each, ie. "Atomic", "Non-atomic" and "Locking".

The confusion is that test_bit() is (and always has been) in
non-atomic.h, but is documented elsewhere (atomic_bitops.txt) as being
atomic. So having it appear under the "Non-atomic" heading is possibly
confusing.

Probably test_bit() should move from bitops/non-atomic.h to atomic.h,
but that has flow on effects. For now just remove the newly added
sub-headings in the documentation, so we at least aren't adding to the
confusion about whether test_bit() is atomic or not.

Signed-off-by: Michael Ellerman 
---
 Documentation/core-api/kernel-api.rst | 9 -
 1 file changed, 9 deletions(-)

Just FYI. I've applied this to my topic/kasan-bitops branch which I plan to ask
Linus to pull before the end of the merge window.

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/log/?h=topic/kasan-bitops

cheers

diff --git a/Documentation/core-api/kernel-api.rst 
b/Documentation/core-api/kernel-api.rst
index 2caaeb55e8dd..4ac53a1363f6 100644
--- a/Documentation/core-api/kernel-api.rst
+++ b/Documentation/core-api/kernel-api.rst
@@ -57,21 +57,12 @@ The Linux kernel provides more basic utility functions.
 Bit Operations
--------------
 
-Atomic Operations
-~~~~~~~~~~~~~~~~~
-
 .. kernel-doc:: include/asm-generic/bitops/instrumented-atomic.h
:internal:
 
-Non-atomic Operations
-~~~~~~~~~~~~~~~~~~~~~
-
 .. kernel-doc:: include/asm-generic/bitops/instrumented-non-atomic.h
:internal:
 
-Locking Operations
-~~~~~~~~~~~~~~~~~~
-
 .. kernel-doc:: include/asm-generic/bitops/instrumented-lock.h
:internal:
 
-- 
2.21.0



[RFC 3/3] powerpc/powernv: Parse device tree, population of SPR support

2019-12-04 Thread Pratik Rajesh Sampat
Parse the device tree for the self-save and self-restore nodes and
populate support for the preferred SPRs based on what was advertised
by the device tree.

Signed-off-by: Pratik Rajesh Sampat 
---
 arch/powerpc/platforms/powernv/idle.c | 104 ++
 1 file changed, 104 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index e33bb3fd88ac..b86d5da4561d 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -1427,6 +1427,107 @@ static void __init pnv_probe_idle_states(void)
supported_cpuidle_states |= pnv_idle_states[i].flags;
 }
 
+/*
+ * Extracts and populates the self save or restore capabilities
+ * passed from the device tree node
+ */
+static int extract_save_restore_state_dt(struct device_node *np, int type)
+{
+   int nr_sprns = 0, i, bitmask_index;
+   int rc = 0;
+   u64 *temp_u64;
+   const char *state_prop;
+   u64 bit_pos;
+
+   state_prop = of_get_property(np, "status", NULL);
+   if (!state_prop) {
+   pr_warn("opal: failed to find the active value for self save/restore node");
+   return -EINVAL;
+   }
+   if (strncmp(state_prop, "disabled", 8) == 0) {
+   /*
+* if the feature is not active, strip the preferred_sprs from
+* that capability.
+*/
+   if (type == SELF_RESTORE_TYPE) {
+   for (i = 0; i < nr_preferred_sprs; i++) {
+   preferred_sprs[i].supported_mode &=
+   ~SELF_RESTORE_STRICT;
+   }
+   } else {
+   for (i = 0; i < nr_preferred_sprs; i++) {
+   preferred_sprs[i].supported_mode &=
+   ~SELF_SAVE_STRICT;
+   }
+   }
+   return 0;
+   }
+   nr_sprns = of_property_count_u64_elems(np, "sprn-bitmask");
+   if (nr_sprns <= 0)
+   return rc;
+   temp_u64 = kcalloc(nr_sprns, sizeof(u64), GFP_KERNEL);
+   if (of_property_read_u64_array(np, "sprn-bitmask",
+  temp_u64, nr_sprns)) {
+   pr_warn("cpuidle-powernv: failed to find registers in DT\n");
+   kfree(temp_u64);
+   return -EINVAL;
+   }
+   /*
+* Populate acknowledgment of support for the sprs in the global vector
+* gotten by the registers supplied by the firmware.
+* The registers are in a bitmask, bit index within
+* that specifies the SPR
+*/
+   for (i = 0; i < nr_preferred_sprs; i++) {
+   bitmask_index = preferred_sprs[i].spr / 64;
+   bit_pos = preferred_sprs[i].spr % 64;
+   if ((temp_u64[bitmask_index] & (1UL << bit_pos)) == 0) {
+   if (type == SELF_RESTORE_TYPE)
+   preferred_sprs[i].supported_mode &=
+   ~SELF_RESTORE_STRICT;
+   else
+   preferred_sprs[i].supported_mode &=
+   ~SELF_SAVE_STRICT;
+   continue;
+   }
+   if (type == SELF_RESTORE_TYPE) {
+   preferred_sprs[i].supported_mode |=
+   SELF_RESTORE_STRICT;
+   } else {
+   preferred_sprs[i].supported_mode |=
+   SELF_SAVE_STRICT;
+   }
+   }
+
+   kfree(temp_u64);
+   return rc;
+}
+
+static int pnv_parse_deepstate_dt(void)
+{
+   struct device_node *np, *np1;
+   int rc = 0;
+
+   /* Self restore register population */
+   np = of_find_node_by_path("/ibm,opal/power-mgt/self-restore");
+   if (!np) {
+   pr_warn("opal: self restore Node not found");
+   } else {
+   rc = extract_save_restore_state_dt(np, SELF_RESTORE_TYPE);
+   if (rc != 0)
+   return rc;
+   }
+   /* Self save register population */
+   np1 = of_find_node_by_path("/ibm,opal/power-mgt/self-save");
+   if (!np1) {
+   pr_warn("opal: self save Node not found");
pr_warn("Legacy firmware. Assuming default self-restore support");
+   } else {
+   rc = extract_save_restore_state_dt(np1, SELF_SAVE_TYPE);
+   }
+   return rc;
+}
+
 /*
  * This function parses device-tree and populates all the information
  * into pnv_idle_states structure. It also sets up nr_pnv_idle_states
@@ -1575,6 +1676,9 @@ static int __init pnv_init_idle_states(void)
return rc;
pnv_probe_idle_states();
 
+   rc = pnv_parse_deepstate_dt();
+   if (rc)
+   return rc;
if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
   

[RFC 0/3] Integrate Support for self-save and determine

2019-12-04 Thread Pratik Rajesh Sampat
Currently the stop-api supports a mechanism called self-restore,
which allows us to restore the values of certain SPRs on wakeup from a
deep-stop state to a desired value. To use this, the kernel makes an
OPAL call passing the PIR of the CPU, the SPR number and the value to
which the SPR should be restored when that CPU wakes up from a deep
stop state.

Recently a new feature, named self-save, has been enabled in the
stop-api. It is an alternative mechanism to do the same, except that
self-save saves the current content of the SPR before entering a deep
stop state and restores that content on waking up from a deep stop
state.

This patch series aims at introducing and leveraging the self-save feature in
the kernel.

Now, as the kernel has a choice to prefer one mode over the other, and
there can be registers in both the save/restore SPR lists which are sent
from the device tree, a new interface has been defined for the seamless
handling of the modes for each SPR.

A list of preferred SPRs is maintained in the kernel which contains two
properties:
1. supported_mode: Helps in identifying if the SPR strictly supports self
   save, self restore, or both.
2. preferred_mode: Calls out which mode is preferred for each SPR. It
   could be strictly self save or restore, or it can also
   determine the preference of one mode over the other if both
   are present, by encapsulating the other in a bitmask from
   LSB to MSB.
Below is a table showing the Scenario::Consequence when the self save and
self restore modes are available or disabled in different combinations as
perceived from the device tree, thus giving complete backwards compatibility
regardless of an older firmware running a newer kernel or vice-versa.

SR = Self restore; SS = Self save

.-----------------------------------.----------------------------------------.
| Scenario                          | Consequence                            |
:-----------------------------------+----------------------------------------:
| Legacy Firmware. No SS or SR node | Self restore is called for all         |
|                                   | supported SPRs                         |
:-----------------------------------+----------------------------------------:
| SR: !active SS: !active           | Deep stop states disabled              |
:-----------------------------------+----------------------------------------:
| SR: active SS: !active            | Self restore is called for all         |
|                                   | supported SPRs                         |
:-----------------------------------+----------------------------------------:
| SR: active SS: active             | Goes through the preferences for each  |
|                                   | SPR and executes the modes accordingly.|
|                                   | Currently, self restore is called for  |
|                                   | all the SPRs except PSSCR, which is    |
|                                   | self saved                             |
:-----------------------------------+----------------------------------------:
| SR: active (only HID0) SS: active | Self save is called for all supported  |
|                                   | registers except HID0 (as HID0 cannot  |
|                                   | be self saved currently)               |
:-----------------------------------+----------------------------------------:
| SR: !active SS: active            | Currently this will disable deep states|
|                                   | as HID0 needs to be self restored and  |
|                                   | cannot be self saved                   |
'-----------------------------------'----------------------------------------'

Pratik Rajesh Sampat (3):
  powerpc/powernv: Interface to define support and preference for a SPR
  powerpc/powernv: Introduce Self save support
  powerpc/powernv: Parse device tree, population of SPR support

 arch/powerpc/include/asm/opal-api.h|   3 +-
 arch/powerpc/include/asm/opal.h|   1 +
 arch/powerpc/platforms/powernv/idle.c  | 431 ++---
 arch/powerpc/platforms/powernv/opal-call.c |   1 +
 4 files changed, 379 insertions(+), 57 deletions(-)

-- 
2.21.0



[RFC 2/3] powerpc/powernv: Introduce Self save support

2019-12-04 Thread Pratik Rajesh Sampat
This commit introduces and leverages the Self save API which OPAL now
supports.

Add the new Self Save OPAL API call to the list of OPAL calls.
Implement self saving of the SPRs based on the populated support,
while respecting their preferences.

This implementation allows mixing of modes across the SPRs, which means
that one SPR can be self restored while another is self saved, if they
support and prefer it to be so.

Signed-off-by: Pratik Rajesh Sampat 
---
 arch/powerpc/include/asm/opal-api.h| 3 ++-
 arch/powerpc/include/asm/opal.h| 1 +
 arch/powerpc/platforms/powernv/idle.c  | 2 ++
 arch/powerpc/platforms/powernv/opal-call.c | 1 +
 4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index c1f25a760eb1..89b7c44124e6 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -214,7 +214,8 @@
 #define OPAL_SECVAR_GET176
 #define OPAL_SECVAR_GET_NEXT   177
 #define OPAL_SECVAR_ENQUEUE_UPDATE 178
-#define OPAL_LAST  178
+#define OPAL_SLW_SELF_SAVE_REG 179
+#define OPAL_LAST  179
 
 #define QUIESCE_HOLD   1 /* Spin all calls at entry */
 #define QUIESCE_REJECT 2 /* Fail all calls with OPAL_BUSY */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9986ac34b8e2..389a85b63805 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -203,6 +203,7 @@ int64_t opal_handle_hmi(void);
 int64_t opal_handle_hmi2(__be64 *out_flags);
 int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end);
 int64_t opal_unregister_dump_region(uint32_t id);
+int64_t opal_slw_self_save_reg(uint64_t cpu_pir, uint64_t sprn);
 int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val);
 int64_t opal_config_cpu_idle_state(uint64_t state, uint64_t flag);
int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number);
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index d38b8b6dcbce..e33bb3fd88ac 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -1170,6 +1170,8 @@ void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val)
if (!is_lpcr_self_save)
opal_slw_set_reg(pir, SPRN_LPCR,
 lpcr_val);
+   else
+   opal_slw_self_save_reg(pir, SPRN_LPCR);
}
 }
 
diff --git a/arch/powerpc/platforms/powernv/opal-call.c b/arch/powerpc/platforms/powernv/opal-call.c
index 5cd0f52d258f..11e0ceb90de0 100644
--- a/arch/powerpc/platforms/powernv/opal-call.c
+++ b/arch/powerpc/platforms/powernv/opal-call.c
@@ -223,6 +223,7 @@ OPAL_CALL(opal_handle_hmi,  OPAL_HANDLE_HMI);
 OPAL_CALL(opal_handle_hmi2,OPAL_HANDLE_HMI2);
 OPAL_CALL(opal_config_cpu_idle_state,  OPAL_CONFIG_CPU_IDLE_STATE);
 OPAL_CALL(opal_slw_set_reg,OPAL_SLW_SET_REG);
+OPAL_CALL(opal_slw_self_save_reg,  OPAL_SLW_SELF_SAVE_REG);
 OPAL_CALL(opal_register_dump_region,   OPAL_REGISTER_DUMP_REGION);
 OPAL_CALL(opal_unregister_dump_region, OPAL_UNREGISTER_DUMP_REGION);
 OPAL_CALL(opal_pci_set_phb_cxl_mode,   OPAL_PCI_SET_PHB_CAPI_MODE);
-- 
2.21.0



[RFC 1/3] powerpc/powernv: Interface to define support and preference for a SPR

2019-12-04 Thread Pratik Rajesh Sampat
Define a bitmask interface to determine support for the Self Restore,
Self Save or both.

Also define an interface to determine the preference for that SPR: to be
strictly self saved or self restored, or both, encapsulated with an order
of preference.

The preference bitmask is shown as below:

----------------------------
|... | 2nd pref | 1st pref |
----------------------------
MSB                      LSB

The preference from highest to lowest runs from LSB to MSB, with a shift
of 8 bits per entry.
Example:
Prefer self save first; if it is not available, then prefer self
restore.
The preference mask for this scenario will be seen as below.
((SELF_RESTORE_STRICT << PREFERENCE_SHIFT) | SELF_SAVE_STRICT)
---------------------------------
|... | Self restore | Self save |
---------------------------------
MSB                           LSB

Finally, declare a list of preferred SPRs which encapsulates the bitmasks
for the preferred and supported modes, with defaults for both set to
support legacy firmware.

This commit also implements the use of the above interface and retains the
legacy functionality of self restore.

Signed-off-by: Pratik Rajesh Sampat 
---
 arch/powerpc/platforms/powernv/idle.c | 325 +-
 1 file changed, 269 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 78599bca66c2..d38b8b6dcbce 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -32,9 +32,106 @@
 #define P9_STOP_SPR_MSR 2000
 #define P9_STOP_SPR_PSSCR  855
 
+/* Interface for the stop state supported and preference */
+#define SELF_RESTORE_TYPE0
+#define SELF_SAVE_TYPE   1
+
+#define NR_PREFERENCES2
+#define PREFERENCE_SHIFT  8
+#define PREFERENCE_MASK   0xff
+
+#define UNSUPPORTED 0x0
+#define SELF_RESTORE_STRICT 0x01
+#define SELF_SAVE_STRICT0x10
+
+/*
+ * Bitmask defining the kind of preferences available.
+ * Note : The higher to lower preference is from LSB to MSB, with a shift of
+ * 8 bits.
+ * ----------------------------
+ * |... | 2nd pref | 1st pref |
+ * ----------------------------
+ * MSB                      LSB
+ */
+/* Prefer Restore if available, otherwise unsupported */
+#define PREFER_SELF_RESTORE_ONLY   SELF_RESTORE_STRICT
+/* Prefer Save if available, otherwise unsupported */
+#define PREFER_SELF_SAVE_ONLY  SELF_SAVE_STRICT
+/* Prefer Restore when available, otherwise prefer Save */
+#define PREFER_RESTORE_SAVE((SELF_SAVE_STRICT << \
+ PREFERENCE_SHIFT)\
+ | SELF_RESTORE_STRICT)
+/* Prefer Save when available, otherwise prefer Restore*/
+#define PREFER_SAVE_RESTORE((SELF_RESTORE_STRICT <<\
+ PREFERENCE_SHIFT)\
+ | SELF_SAVE_STRICT)
 static u32 supported_cpuidle_states;
 struct pnv_idle_states_t *pnv_idle_states;
 int nr_pnv_idle_states;
+/* Caching the lpcr & ptcr support to use later */
+static bool is_lpcr_self_save;
+static bool is_ptcr_self_save;
+
+struct preferred_sprs {
+   u64 spr;
+   u32 preferred_mode;
+   u32 supported_mode;
+};
+
+struct preferred_sprs preferred_sprs[] = {
+   {
+   .spr = SPRN_HSPRG0,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_LPCR,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_PTCR,
+   .preferred_mode = PREFER_SAVE_RESTORE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_HMEER,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_HID0,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = P9_STOP_SPR_MSR,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = P9_STOP_SPR_PSSCR,
+   .preferred_mode = PREFER_SAVE_RESTORE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_HID1,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_HID4,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   },
+   {
+   .spr = SPRN_HID5,
+   .preferred_mode = PREFER_RESTORE_SAVE,
+   .supported_mode = SELF_RESTORE_STRICT,
+   }
+};
+
+const int nr_preferred_sprs = ARRAY_SIZE(preferred_sprs);
 

Re: Bug 205201 - Booting halts if Dawicontrol DC-2976 UW SCSI board installed, unless RAM size limited to 3500M

2019-12-04 Thread Christoph Hellwig
On Wed, Nov 27, 2019 at 08:56:25AM +0200, Mike Rapoport wrote:
> On Tue, Nov 26, 2019 at 05:40:26PM +0100, Christoph Hellwig wrote:
> > On Tue, Nov 26, 2019 at 12:26:38PM +0100, Christian Zigotzky wrote:
> > > Hello Christoph,
> > >
> > > The PCI TV card works with your patch! I was able to patch your Git 
> > > kernel 
> > > with the patch above.
> > >
> > > I haven't found any error messages in the dmesg yet.
> > 
> > Thanks.  Unfortunately this is a bit of a hack as we need to set
> > the mask based on runtime information like the magic FSL PCIe window.
> > Let me try to draft something better up, and thanks already for testing
> > this one!
> 
> Maybe we'll simply force bottom up allocation before calling
> swiotlb_init()? Anyway, it's the last memblock allocation.

So I think we should go with this fix (plus a source code comment) for
now.  Revamping the whole memory initialization is going to take a
while, and this fix is also easily backportable.


Re: [alsa-devel] [PATCH v4 1/2] ASoC: dt-bindings: fsl_asrc: add compatible string for imx8qm & imx8qxp

2019-12-04 Thread Shengjiu Wang
Hi

On Mon, Dec 2, 2019 at 8:58 PM Fabio Estevam  wrote:
>
> On Mon, Dec 2, 2019 at 8:56 AM Shengjiu Wang  wrote:
>
> > -  - compatible : Contains "fsl,imx35-asrc" or "fsl,imx53-asrc".
> > +  - compatible : Contains "fsl,imx35-asrc", "fsl,imx53-asrc",
> > + "fsl,imx8qm-asrc", "fsl,imx8qxp-asrc"
>
> You missed the word "or" as in the original binding.

will update it in v5.

Best regards
Wang Shengjiu