Re: [PATCH v7 01/12] capabilities: introduce CAP_PERFMON to kernel and user space

2020-02-18 Thread Alexey Budankov


On 18.02.2020 22:21, James Morris wrote:
> On Mon, 17 Feb 2020, Alexey Budankov wrote:
> 
>>
>> Introduce CAP_PERFMON capability designed to secure system performance
>> monitoring and observability operations so that CAP_PERFMON would assist
>> CAP_SYS_ADMIN capability in its governing role for performance
>> monitoring and observability subsystems.
> 
> 
> Acked-by: James Morris 

Thanks James! 
I appreciate your involvement and collaboration 
w.r.t to the whole patch set.

Gratefully,
Alexey


Re: [PATCH 1/2] powerpc/kprobes: Remove redundant code

2020-02-18 Thread Christophe Leroy




Le 18/02/2020 à 15:39, Naveen N. Rao a écrit :

Christophe Leroy wrote:

At the time being we have something like

if (something) {
    p = get();
    if (p) {
    if (something_wrong)
    goto out;
    ...
    return;
    } else if (a != b) {
    if (some_error)
    goto out;
    ...
    }
    goto out;
}
p = get();
if (!p) {
    if (a != b) {
    if (some_error)
    goto out;
    ...
    }
    goto out;
}

This is similar to

p = get();
if (something) {
    if (p) {
    if (something_wrong)
    goto out;
    ...
    return;
    }
}
if (!p) {
    if (a != b) {
    if (some_error)
    goto out;
    ...
    }
    goto out;
}

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/kprobes.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)


Good cleanup, thanks.



diff --git a/arch/powerpc/kernel/kprobes.c 
b/arch/powerpc/kernel/kprobes.c

index f8b848aa65bd..7a925eb76ec0 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -276,8 +276,8 @@ int kprobe_handler(struct pt_regs *regs)
 kcb = get_kprobe_ctlblk();

 /* Check we're not actually recursing */
+    p = get_kprobe(addr);
 if (kprobe_running()) {
-    p = get_kprobe(addr);
 if (p) {
 kprobe_opcode_t insn = *p->ainsn.insn;
 if (kcb->kprobe_status == KPROBE_HIT_SS &&
@@ -308,22 +308,9 @@ int kprobe_handler(struct pt_regs *regs)
 }
 prepare_singlestep(p, regs);
 return 1;
-    } else if (*addr != BREAKPOINT_INSTRUCTION) {
-    /* If trap variant, then it belongs not to us */
-    kprobe_opcode_t cur_insn = *addr;
-
-    if (is_trap(cur_insn))
-    goto no_kprobe;
-    /* The breakpoint instruction was removed by
- * another cpu right after we hit, no further
- * handling of this interrupt is appropriate
- */
-    ret = 1;
 }
-    goto no_kprobe;


A minot nit -- removing the above goto makes a slight change to the 
logic. But, see my comments for the next patch.


All legs of the (p) case are have either a return or a goto, so that 
goto no_kprobe is limited to the !p case, we have to fall_through now.


Christophe


RE: [PATCH v2 24/27] nvdimm/ocxl: Implement Overwrite

2020-02-18 Thread Alastair D'Silva
On Mon, 2020-02-03 at 15:10 +, Jonathan Cameron wrote:
> On Tue, 3 Dec 2019 14:46:52 +1100
> Alastair D'Silva  wrote:
> 
> > From: Alastair D'Silva 
> > 
> > The near storage command 'Secure Erase' overwrites all data on the
> > media.
> > 
> > This patch hooks it up to the security function 'overwrite'.
> > 
> > Signed-off-by: Alastair D'Silva 
> 
> A few things to tidy up in here.
> 
> Thanks,
> 
> Jonathan
> 
> 
> > ---
> >  drivers/nvdimm/ocxl/scm.c  | 164
> > -
> >  drivers/nvdimm/ocxl/scm_internal.c |   1 +
> >  drivers/nvdimm/ocxl/scm_internal.h |  17 +++
> >  3 files changed, 180 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> > index a81eb5916eb3..8deb7862793c 100644
> > --- a/drivers/nvdimm/ocxl/scm.c
> > +++ b/drivers/nvdimm/ocxl/scm.c
> > @@ -169,6 +169,86 @@ static int scm_reserve_metadata(struct
> > scm_data *scm_data,
> > return 0;
> >  }
> >  
> > +/**
> > + * scm_overwrite() - Overwrite all data on the card
> > + * @scm_data: The SCM device data
> 
> I would mention in here that this exists with the lock held and
> where that is unlocked again.

Ok

> 
> > + * Return: 0 on success
> > + */
> > +int scm_overwrite(struct scm_data *scm_data)
> > +{
> > +   int rc;
> > +
> > +   mutex_lock(_data->ns_command.lock);
> > +
> > +   rc = scm_ns_command_request(scm_data, NS_COMMAND_SECURE_ERASE);
> > +   if (rc)
> 
> Perhaps change that goto label to reflect it is the error path rather
> than a shared exit route.
> 

Ok

> > +   goto out;
> > +
> > +   rc = scm_ns_command_execute(scm_data);
> > +   if (rc)
> > +   goto out;
> > +
> > +   scm_data->overwrite_state = SCM_OVERWRITE_BUSY;
> > +
> > +   return 0;
> > +
> > +out:
> > +   mutex_unlock(_data->ns_command.lock);
> > +   return rc;
> > +}
> > +
> > +/**
> > + * scm_secop_overwrite() - Overwrite all data on the card
> > + * @nvdimm: The nvdimm representation of the SCM device to start
> > the overwrite on
> > + * @key_data: Unused (no security key implementation)
> > + * Return: 0 on success
> > + */
> > +static int scm_secop_overwrite(struct nvdimm *nvdimm,
> > +  const struct nvdimm_key_data *key_data)
> > +{
> > +   struct scm_data *scm_data = nvdimm_provider_data(nvdimm);
> > +
> > +   return scm_overwrite(scm_data);
> > +}
> > +
> > +/**
> > + * scm_secop_query_overwrite() - Get the current overwrite state
> > + * @nvdimm: The nvdimm representation of the SCM device to start
> > the overwrite on
> > + * Return: 0 if successful or idle, -EBUSY if busy, -EFAULT if
> > failed
> > + */
> > +static int scm_secop_query_overwrite(struct nvdimm *nvdimm)
> > +{
> > +   struct scm_data *scm_data = nvdimm_provider_data(nvdimm);
> > +
> > +   if (scm_data->overwrite_state == SCM_OVERWRITE_BUSY)
> > +   return -EBUSY;
> > +
> > +   if (scm_data->overwrite_state == SCM_OVERWRITE_FAILED)
> > +   return -EFAULT;
> > +
> > +   return 0;
> > +}
> > +
> > +/**
> > + * scm_secop_get_flags() - return the security flags for the SCM
> > device
> 
> All params need to documented in kernel-doc comments.
Ok

> 
> > + */
> > +static unsigned long scm_secop_get_flags(struct nvdimm *nvdimm,
> > +   enum nvdimm_passphrase_type ptype)
> > +{
> > +   struct scm_data *scm_data = nvdimm_provider_data(nvdimm);
> > +
> > +   if (scm_data->overwrite_state == SCM_OVERWRITE_BUSY)
> > +   return BIT(NVDIMM_SECURITY_OVERWRITE);
> > +
> > +   return BIT(NVDIMM_SECURITY_DISABLED);
> > +}
> > +
> > +static const struct nvdimm_security_ops sec_ops  = {
> > +   .get_flags = scm_secop_get_flags,
> > +   .overwrite = scm_secop_overwrite,
> > +   .query_overwrite = scm_secop_query_overwrite,
> > +};
> > +
> >  /**
> >   * scm_register_lpc_mem() - Discover persistent memory on a device
> > and register it with the NVDIMM subsystem
> >   * @scm_data: The SCM device data
> > @@ -224,10 +304,10 @@ static int scm_register_lpc_mem(struct
> > scm_data *scm_data)
> > set_bit(NDD_ALIASING, _flags);
> >  
> > snprintf(serial, sizeof(serial), "%llx", fn_config->serial);
> > -   nd_mapping_desc.nvdimm = nvdimm_create(scm_data->nvdimm_bus,
> > scm_data,
> > +   nd_mapping_desc.nvdimm = __nvdimm_create(scm_data->nvdimm_bus,
> > scm_data,
> >  scm_dimm_attribute_groups,
> >  nvdimm_flags, nvdimm_cmd_mask,
> > -0, NULL);
> > +0, NULL, serial, _ops);
> > if (!nd_mapping_desc.nvdimm)
> > return -ENOMEM;
> >  
> > @@ -1530,6 +1610,83 @@ static void scm_dump_error_log(struct
> > scm_data *scm_data)
> > kfree(buf);
> >  }
> >  
> > +static void scm_handle_nscra_doorbell(struct scm_data *scm_data)
> > +{
> > +   int rc;
> > +
> > +   if (scm_data->ns_command.op_code == NS_COMMAND_SECURE_ERASE) {
> 
> Feels likely that we are going to end up with quite a few blocks like
> this as
> the driver is 

RE: [PATCH v2 22/27] nvdimm/ocxl: Implement the heartbeat command

2020-02-18 Thread Alastair D'Silva
On Mon, 2020-02-03 at 15:11 +, Jonathan Cameron wrote:
> On Tue, 3 Dec 2019 14:46:50 +1100
> Alastair D'Silva  wrote:
> 
> > From: Alastair D'Silva 
> > 
> > The heartbeat admin command is a simple admin command that
> > exercises
> > the communication mechanisms within the controller.
> > 
> > This patch issues a heartbeat command to the card during init to
> > ensure
> > we can communicate with the card's crontroller.
> 
> controller

That's a perfectly cromulent misspelling ;)

> 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >  drivers/nvdimm/ocxl/scm.c | 43
> > +++
> >  1 file changed, 43 insertions(+)
> > 
> > diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> > index 8a30c887b5ed..e8b34262f397 100644
> > --- a/drivers/nvdimm/ocxl/scm.c
> > +++ b/drivers/nvdimm/ocxl/scm.c
> > @@ -353,6 +353,44 @@ static bool scm_is_usable(const struct
> > scm_data *scm_data)
> > return true;
> >  }
> >  
> > +/**
> > + * scm_heartbeat() - Issue a heartbeat command to the controller
> > + * @scm_data: a pointer to the SCM device data
> > + * Return: 0 if the controller responded correctly, negative on
> > error
> > + */
> > +static int scm_heartbeat(struct scm_data *scm_data)
> > +{
> > +   int rc;
> > +
> > +   mutex_lock(_data->admin_command.lock);
> > +
> > +   rc = scm_admin_command_request(scm_data,
> > ADMIN_COMMAND_HEARTBEAT);
> > +   if (rc)
> > +   goto out;
> > +
> > +   rc = scm_admin_command_execute(scm_data);
> > +   if (rc)
> > +   goto out;
> > +
> > +   rc = scm_admin_command_complete_timeout(scm_data,
> > ADMIN_COMMAND_HEARTBEAT);
> > +   if (rc < 0) {
> > +   dev_err(_data->dev, "Heartbeat timeout\n");
> > +   goto out;
> > +   }
> > +
> > +   rc = scm_admin_response(scm_data);
> > +   if (rc < 0)
> > +   goto out;
> > +   if (rc != STATUS_SUCCESS)
> > +   scm_warn_status(scm_data, "Unexpected status from
> > heartbeat", rc);
> > +
> > +   rc = scm_admin_response_handled(scm_data);
> > +
> > +out:
> > +   mutex_unlock(_data->admin_command.lock);
> > +   return rc;
> > +}
> > +
> >  /**
> >   * allocate_scm_minor() - Allocate a minor number to use for an
> > SCM device
> >   * @scm_data: The SCM device to associate the minor with
> > @@ -1508,6 +1546,11 @@ static int scm_probe(struct pci_dev *pdev,
> > const struct pci_device_id *ent)
> > goto err;
> > }
> >  
> > +   if (scm_heartbeat(scm_data)) {
> > +   dev_err(>dev, "SCM Heartbeat failed\n");
> > +   goto err;
> > +   }
> > +
> > elapsed = 0;
> > timeout = scm_data->readiness_timeout + scm_data-
> > >memory_available_timeout;
> > while (!scm_is_usable(scm_data)) {
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



RE: [PATCH v2 13/27] nvdimm/ocxl: Add support for Admin commands

2020-02-18 Thread Alastair D'Silva
On Mon, 2020-02-03 at 14:18 +, Jonathan Cameron wrote:
> On Tue, 3 Dec 2019 14:46:41 +1100
> Alastair D'Silva  wrote:
> 
> > From: Alastair D'Silva 
> > 
> > This patch requests the metadata required to issue admin commands,
> > as well
> > as some helper functions to construct and check the completion of
> > the
> > commands.
> > 
> > Signed-off-by: Alastair D'Silva 
> 
> A few trivial bits inline.
> 
> Jonathan
> 
> > ---
> >  drivers/nvdimm/ocxl/scm.c  |  67 +
> >  drivers/nvdimm/ocxl/scm_internal.c | 152
> > +
> >  drivers/nvdimm/ocxl/scm_internal.h |  62 
> >  3 files changed, 281 insertions(+)
> > 
> > diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> > index 8088f65c289e..1e175f3c3cf2 100644
> > --- a/drivers/nvdimm/ocxl/scm.c
> > +++ b/drivers/nvdimm/ocxl/scm.c
> > @@ -267,6 +267,58 @@ static int scm_register_lpc_mem(struct
> > scm_data *scm_data)
> > return 0;
> >  }
> >  
> > +/**
> > + * scm_extract_command_metadata() - Extract command data from MMIO
> > & save it for further use
> > + * @scm_data: a pointer to the SCM device data
> > + * @offset: The base address of the command data structures
> > (address of CREQO)
> > + * @command_metadata: A pointer to the command metadata to
> > populate
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int scm_extract_command_metadata(struct scm_data *scm_data,
> > u32 offset,
> > +   struct command_metadata
> > *command_metadata)
> > +{
> > +   int rc;
> > +   u64 tmp;
> > +
> > +   rc = ocxl_global_mmio_read64(scm_data->ocxl_afu, offset,
> > OCXL_LITTLE_ENDIAN,
> > +);
> > +   if (rc)
> > +   return rc;
> > +
> > +   command_metadata->request_offset = tmp >> 32;
> > +   command_metadata->response_offset = tmp & 0x;
> > +
> > +   rc = ocxl_global_mmio_read64(scm_data->ocxl_afu, offset + 8,
> > OCXL_LITTLE_ENDIAN,
> > +);
> > +   if (rc)
> > +   return rc;
> > +
> > +   command_metadata->data_offset = tmp >> 32;
> > +   command_metadata->data_size = tmp & 0x;
> > +
> > +   command_metadata->id = 0;
> > +
> > +   return 0;
> > +}
> > +
> > +/**
> > + * scm_setup_command_metadata() - Set up the command metadata
> > + * @scm_data: a pointer to the SCM device data
> > + */
> > +static int scm_setup_command_metadata(struct scm_data *scm_data)
> > +{
> > +   int rc;
> > +
> > +   mutex_init(_data->admin_command.lock);
> > +
> > +   rc = scm_extract_command_metadata(scm_data,
> > GLOBAL_MMIO_ACMA_CREQO,
> > + _data->admin_command);
> > +   if (rc)
> > +   return rc;
> 
> Unless you are adding to this later in the series.
> 

Ignored

>   return scm_extract_command_metadata(scm_data,...)
> 
> > +
> > +   return 0;
> > +}
> > +
> >  /**
> >   * scm_is_usable() - Is a controller usable?
> >   * @scm_data: a pointer to the SCM device data
> > @@ -276,6 +328,8 @@ static bool scm_is_usable(const struct scm_data
> > *scm_data)
> >  {
> > u64 chi = 0;
> > int rc = scm_chi(scm_data, );
> > +   if (rc)
> > +   return false;
> >  
> > if (!(chi & GLOBAL_MMIO_CHI_CRDY)) {
> > dev_err(_data->dev, "SCM controller is not
> > ready.\n");
> > @@ -502,6 +556,14 @@ static int scm_probe(struct pci_dev *pdev,
> > const struct pci_device_id *ent)
> > }
> > scm_data->pdev = pdev;
> >  
> > +   scm_data->timeouts[ADMIN_COMMAND_ERRLOG] = 2000; // ms
> > +   scm_data->timeouts[ADMIN_COMMAND_HEARTBEAT] = 100; // ms
> > +   scm_data->timeouts[ADMIN_COMMAND_SMART] = 100; // ms
> > +   scm_data->timeouts[ADMIN_COMMAND_CONTROLLER_DUMP] = 1000; // ms
> > +   scm_data->timeouts[ADMIN_COMMAND_CONTROLLER_STATS] = 100; // ms
> > +   scm_data->timeouts[ADMIN_COMMAND_SHUTDOWN] = 1000; // ms
> > +   scm_data->timeouts[ADMIN_COMMAND_FW_UPDATE] = 16000; // ms
> > +
> > pci_set_drvdata(pdev, scm_data);
> >  
> > scm_data->ocxl_fn = ocxl_function_open(pdev);
> > @@ -543,6 +605,11 @@ static int scm_probe(struct pci_dev *pdev,
> > const struct pci_device_id *ent)
> > goto err;
> > }
> >  
> > +   if (scm_setup_command_metadata(scm_data)) {
> > +   dev_err(>dev, "Could not read OCXL command
> > matada\n");
> > +   goto err;
> > +   }
> > +
> > elapsed = 0;
> > timeout = scm_data->readiness_timeout + scm_data-
> > >memory_available_timeout;
> > while (!scm_is_usable(scm_data)) {
> > diff --git a/drivers/nvdimm/ocxl/scm_internal.c
> > b/drivers/nvdimm/ocxl/scm_internal.c
> > index 72d3c0e7d846..7b11b56863fb 100644
> > --- a/drivers/nvdimm/ocxl/scm_internal.c
> > +++ b/drivers/nvdimm/ocxl/scm_internal.c
> > @@ -17,3 +17,155 @@ int scm_chi(const struct scm_data *scm_data,
> > u64 *chi)
> >  
> > return 0;
> >  }
> > +
> > +static int scm_command_request(const struct scm_data *scm_data,
> > +  struct 

RE: [PATCH v2 14/27] nvdimm/ocxl: Add support for near storage commands

2020-02-18 Thread Alastair D'Silva
On Mon, 2020-02-03 at 14:22 +, Jonathan Cameron wrote:
> On Tue, 3 Dec 2019 14:46:42 +1100
> Alastair D'Silva  wrote:
> 
> > From: Alastair D'Silva 
> > 
> > Similar to the previous patch, this adds support for near storage
> > commands.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >  drivers/nvdimm/ocxl/scm.c  |  6 +
> >  drivers/nvdimm/ocxl/scm_internal.c | 41
> > ++
> >  drivers/nvdimm/ocxl/scm_internal.h | 38
> > +++
> >  3 files changed, 85 insertions(+)
> > 
> > diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> > index 1e175f3c3cf2..6c16ca7fabfa 100644
> > --- a/drivers/nvdimm/ocxl/scm.c
> > +++ b/drivers/nvdimm/ocxl/scm.c
> > @@ -310,12 +310,18 @@ static int scm_setup_command_metadata(struct
> > scm_data *scm_data)
> > int rc;
> >  
> > mutex_init(_data->admin_command.lock);
> > +   mutex_init(_data->ns_command.lock);
> >  
> > rc = scm_extract_command_metadata(scm_data,
> > GLOBAL_MMIO_ACMA_CREQO,
> >   _data->admin_command);
> > if (rc)
> > return rc;
> >  
> > +   rc = scm_extract_command_metadata(scm_data,
> > GLOBAL_MMIO_NSCMA_CREQO,
> > + _data->ns_command);
> > +   if (rc)
> > +   return rc;
> > +
> 
> Ah. So much for my comment in previous patch.  Ignore that...
> 
> > return 0;
> >  }
> >  
> > diff --git a/drivers/nvdimm/ocxl/scm_internal.c
> > b/drivers/nvdimm/ocxl/scm_internal.c
> > index 7b11b56863fb..c405f1d8afb8 100644
> > --- a/drivers/nvdimm/ocxl/scm_internal.c
> > +++ b/drivers/nvdimm/ocxl/scm_internal.c
> > @@ -132,6 +132,47 @@ int scm_admin_response_handled(const struct
> > scm_data *scm_data)
> >   OCXL_LITTLE_ENDIAN,
> > GLOBAL_MMIO_CHI_ACRA);
> >  }
> >  
> > +int scm_ns_command_request(struct scm_data *scm_data, u8 op_code)
> > +{
> > +   u64 val;
> > +   int rc = ocxl_global_mmio_read64(scm_data->ocxl_afu,
> > GLOBAL_MMIO_CHI,
> > +OCXL_LITTLE_ENDIAN, );
> > +   if (rc)
> > +   return rc;
> > +
> > +   if (!(val & GLOBAL_MMIO_CHI_NSCRA))
> > +   return -EBUSY;
> > +
> > +   return scm_command_request(scm_data, _data->ns_command,
> > op_code);
> > +}
> > +
> > +int scm_ns_response(const struct scm_data *scm_data)
> > +{
> > +   return scm_command_response(scm_data, _data->ns_command);
> > +}
> > +
> > +int scm_ns_command_execute(const struct scm_data *scm_data)
> > +{
> > +   return ocxl_global_mmio_set64(scm_data->ocxl_afu,
> > GLOBAL_MMIO_HCI,
> > + OCXL_LITTLE_ENDIAN,
> > GLOBAL_MMIO_HCI_NSCRW);
> > +}
> > +
> > +bool scm_ns_command_complete(const struct scm_data *scm_data)
> > +{
> > +   u64 val = 0;
> > +   int rc = scm_chi(scm_data, );
> > +
> > +   WARN_ON(rc);
> > +
> > +   return (val & GLOBAL_MMIO_CHI_NSCRA) != 0;
> > +}
> > +
> > +int scm_ns_response_handled(const struct scm_data *scm_data)
> > +{
> > +   return ocxl_global_mmio_set64(scm_data->ocxl_afu,
> > GLOBAL_MMIO_CHIC,
> > + OCXL_LITTLE_ENDIAN,
> > GLOBAL_MMIO_CHI_NSCRA);
> > +}
> > +
> >  void scm_warn_status(const struct scm_data *scm_data, const char
> > *message,
> >  u8 status)
> >  {
> > diff --git a/drivers/nvdimm/ocxl/scm_internal.h
> > b/drivers/nvdimm/ocxl/scm_internal.h
> > index 9bff684cd069..9575996a89e7 100644
> > --- a/drivers/nvdimm/ocxl/scm_internal.h
> > +++ b/drivers/nvdimm/ocxl/scm_internal.h
> > @@ -108,6 +108,7 @@ struct scm_data {
> > struct ocxl_context *ocxl_context;
> > void *metadata_addr;
> > struct command_metadata admin_command;
> > +   struct command_metadata ns_command;
> > struct resource scm_res;
> > struct nd_region *nd_region;
> > char fw_version[8+1];
> > @@ -176,6 +177,42 @@ int scm_admin_command_complete_timeout(const
> > struct scm_data *scm_data,
> >   */
> >  int scm_admin_response_handled(const struct scm_data *scm_data);
> >  
> > +/**
> > + * scm_ns_command_request() - Issue a near storage command request
> > + * @scm_data: a pointer to the SCM device data
> > + * @op_code: The op-code for the command
> > + * Returns an identifier for the command, or negative on error
> > + */
> > +int scm_ns_command_request(struct scm_data *scm_data, u8 op_code);
> > +
> > +/**
> > + * scm_ns_response() - Validate a near storage response
> > + * @scm_data: a pointer to the SCM device data
> > + * Returns the status code of the command, or negative on error
> > + */
> > +int scm_ns_response(const struct scm_data *scm_data);
> > +
> > +/**
> > + * scm_ns_command_execute() - Notify the controller to start
> > processing a pending near storage command
> > + * @scm_data: a pointer to the SCM device data
> > + * Returns 0 on success, negative on error
> > + */
> > +int scm_ns_command_execute(const struct scm_data *scm_data);
> > +
> > +/**
> > + * scm_ns_command_complete() - Is a near storage 

RE: [PATCH v2 12/27] nvdimm/ocxl: Read the capability registers & wait for device ready

2020-02-18 Thread Alastair D'Silva
On Mon, 2020-02-03 at 13:23 +, Jonathan Cameron wrote:
> On Tue, 3 Dec 2019 14:46:40 +1100
> Alastair D'Silva  wrote:
> 
> > From: Alastair D'Silva 
> > 
> > This patch reads timeouts & firmware version from the controller,
> > and
> > uses those timeouts to wait for the controller to report that it is
> > ready
> > before handing the memory over to libnvdimm.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >  drivers/nvdimm/ocxl/Makefile   |  2 +-
> >  drivers/nvdimm/ocxl/scm.c  | 84
> > ++
> >  drivers/nvdimm/ocxl/scm_internal.c | 19 +++
> >  drivers/nvdimm/ocxl/scm_internal.h | 24 +
> >  4 files changed, 128 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/nvdimm/ocxl/scm_internal.c
> > 
> > diff --git a/drivers/nvdimm/ocxl/Makefile
> > b/drivers/nvdimm/ocxl/Makefile
> > index 74a1bd98848e..9b6e31f0eb3e 100644
> > --- a/drivers/nvdimm/ocxl/Makefile
> > +++ b/drivers/nvdimm/ocxl/Makefile
> > @@ -4,4 +4,4 @@ ccflags-$(CONFIG_PPC_WERROR)+= -Werror
> >  
> >  obj-$(CONFIG_OCXL_SCM) += ocxlscm.o
> >  
> > -ocxlscm-y := scm.o
> > +ocxlscm-y := scm.o scm_internal.o
> > diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> > index 571058a9e7b8..8088f65c289e 100644
> > --- a/drivers/nvdimm/ocxl/scm.c
> > +++ b/drivers/nvdimm/ocxl/scm.c
> > @@ -7,6 +7,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -266,6 +267,30 @@ static int scm_register_lpc_mem(struct
> > scm_data *scm_data)
> > return 0;
> >  }
> >  
> > +/**
> > + * scm_is_usable() - Is a controller usable?
> > + * @scm_data: a pointer to the SCM device data
> > + * Return: true if the controller is usable
> > + */
> > +static bool scm_is_usable(const struct scm_data *scm_data)
> > +{
> > +   u64 chi = 0;
> > +   int rc = scm_chi(scm_data, );
> > +
> > +   if (!(chi & GLOBAL_MMIO_CHI_CRDY)) {
> > +   dev_err(_data->dev, "SCM controller is not
> > ready.\n");
> > +   return false;
> > +   }
> > +
> > +   if (!(chi & GLOBAL_MMIO_CHI_MA)) {
> > +   dev_err(_data->dev,
> > +   "SCM controller does not have memory
> > available.\n");
> > +   return false;
> > +   }
> > +
> > +   return true;
> > +}
> > +
> >  /**
> >   * allocate_scm_minor() - Allocate a minor number to use for an
> > SCM device
> >   * @scm_data: The SCM device to associate the minor with
> > @@ -380,6 +405,48 @@ static void scm_remove(struct pci_dev *pdev)
> > }
> >  }
> >  
> > +/**
> > + * read_device_metadata() - Retrieve config information from the
> > AFU and save it for future use
> > + * @scm_data: the SCM metadata
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int read_device_metadata(struct scm_data *scm_data)
> > +{
> > +   u64 val;
> > +   int rc;
> > +
> > +   rc = ocxl_global_mmio_read64(scm_data->ocxl_afu,
> > GLOBAL_MMIO_CCAP0,
> > +OCXL_LITTLE_ENDIAN, );
> > +   if (rc)
> > +   return rc;
> > +
> > +   scm_data->scm_revision = val & 0x;
> > +   scm_data->read_latency = (val >> 32) & 0xFF;
> > +   scm_data->readiness_timeout = (val >> 48) & 0xff;
> > +   scm_data->memory_available_timeout = val >> 52;
> 
> This overlaps with the masked region for readiness_timeout.  I'll
> guess the maks
> on that should be 0xF.
> 
Good catch, you're correct.

> > +
> > +   rc = ocxl_global_mmio_read64(scm_data->ocxl_afu,
> > GLOBAL_MMIO_CCAP1,
> > +OCXL_LITTLE_ENDIAN, );
> > +   if (rc)
> > +   return rc;
> > +
> > +   scm_data->max_controller_dump_size = val & 0x;
> > +
> > +   // Extract firmware version text
> > +   rc = ocxl_global_mmio_read64(scm_data->ocxl_afu,
> > GLOBAL_MMIO_FWVER,
> > +OCXL_HOST_ENDIAN, (u64 *)scm_data-
> > >fw_version);
> > +   if (rc)
> > +   return rc;
> > +
> > +   scm_data->fw_version[8] = '\0';
> > +
> > +   dev_info(_data->dev,
> > +"Firmware version '%s' SCM revision %d:%d\n",
> > scm_data->fw_version,
> > +scm_data->scm_revision >> 4, scm_data->scm_revision &
> > 0x0F);
> > +
> > +   return 0;
> > +}
> > +
> >  /**
> >   * scm_probe_function_0 - Set up function 0 for an OpenCAPI
> > Storage Class Memory device
> >   * This is important as it enables templates higher than 0 across
> > all other functions,
> > @@ -420,6 +487,8 @@ static int scm_probe_function_0(struct pci_dev
> > *pdev)
> >  static int scm_probe(struct pci_dev *pdev, const struct
> > pci_device_id *ent)
> >  {
> > struct scm_data *scm_data = NULL;
> > +   int elapsed;
> > +   u16 timeout;
> >  
> > if (PCI_FUNC(pdev->devfn) == 0)
> > return scm_probe_function_0(pdev);
> > @@ -469,6 +538,21 @@ static int scm_probe(struct pci_dev *pdev,
> > const struct pci_device_id *ent)
> > goto err;
> > }
> >  
> > +   if (read_device_metadata(scm_data)) {
> > +   dev_err(>dev, 

RE: [PATCH v2 10/27] nvdimm: Add driver for OpenCAPI Storage Class Memory

2020-02-18 Thread Alastair D'Silva
On Mon, 2020-02-03 at 13:20 +, Jonathan Cameron wrote:
> On Tue, 3 Dec 2019 14:46:38 +1100
> Alastair D'Silva  wrote:
> 
> > From: Alastair D'Silva 
> > 
> > This driver exposes LPC memory on OpenCAPI SCM cards
> > as an NVDIMM, allowing the existing nvram infrastructure
> > to be used.
> > 
> > Namespace metadata is stored on the media itself, so
> > scm_reserve_metadata() maps 1 section's worth of PMEM storage
> > at the start to hold this. The rest of the PMEM range is registered
> > with libnvdimm as an nvdimm. scm_ndctl_config_read/write/size()
> > provide
> > callbacks to libnvdimm to access the metadata.
> > 
> > Signed-off-by: Alastair D'Silva 
> Hi Alastair,
> 
> A few bits and bobs inline.
> 
> Thanks,
> 
> Jonathan
> 
> > ---
> >  drivers/nvdimm/Kconfig |   2 +
> >  drivers/nvdimm/Makefile|   2 +-
> >  drivers/nvdimm/ocxl/Kconfig|  15 +
> >  drivers/nvdimm/ocxl/Makefile   |   7 +
> >  drivers/nvdimm/ocxl/scm.c  | 519
> > +
> >  drivers/nvdimm/ocxl/scm_internal.h |  28 ++
> >  6 files changed, 572 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/nvdimm/ocxl/Kconfig
> >  create mode 100644 drivers/nvdimm/ocxl/Makefile
> >  create mode 100644 drivers/nvdimm/ocxl/scm.c
> >  create mode 100644 drivers/nvdimm/ocxl/scm_internal.h
> > 
> > diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> > index 36af7af6b7cf..d1bab36da61c 100644
> > --- a/drivers/nvdimm/Kconfig
> > +++ b/drivers/nvdimm/Kconfig
> > @@ -130,4 +130,6 @@ config NVDIMM_TEST_BUILD
> >   core devm_memremap_pages() implementation and other
> >   infrastructure.
> >  
> > +source "drivers/nvdimm/ocxl/Kconfig"
> > +
> >  endif
> > diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
> > index 29203f3d3069..e33492128042 100644
> > --- a/drivers/nvdimm/Makefile
> > +++ b/drivers/nvdimm/Makefile
> > @@ -1,5 +1,5 @@
> >  # SPDX-License-Identifier: GPL-2.0
> > -obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
> > +obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o ocxl/
> >  obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
> >  obj-$(CONFIG_ND_BTT) += nd_btt.o
> >  obj-$(CONFIG_ND_BLK) += nd_blk.o
> > diff --git a/drivers/nvdimm/ocxl/Kconfig
> > b/drivers/nvdimm/ocxl/Kconfig
> > new file mode 100644
> > index ..24099b300f5e
> > --- /dev/null
> > +++ b/drivers/nvdimm/ocxl/Kconfig
> > @@ -0,0 +1,15 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +if LIBNVDIMM
> > +
> > +config OCXL_SCM
> > +   tristate "OpenCAPI Storage Class Memory"
> > +   depends on LIBNVDIMM && PPC_POWERNV && PCI && EEH
> > +   select ZONE_DEVICE
> > +   select OCXL
> > +   help
> > + Exposes devices that implement the OpenCAPI Storage Class
> > Memory
> > + specification as persistent memory regions.
> > +
> > + Select N if unsure.
> > +
> > +endif
> > diff --git a/drivers/nvdimm/ocxl/Makefile
> > b/drivers/nvdimm/ocxl/Makefile
> > new file mode 100644
> > index ..74a1bd98848e
> > --- /dev/null
> > +++ b/drivers/nvdimm/ocxl/Makefile
> > @@ -0,0 +1,7 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +ccflags-$(CONFIG_PPC_WERROR)   += -Werror
> > +
> > +obj-$(CONFIG_OCXL_SCM) += ocxlscm.o
> > +
> > +ocxlscm-y := scm.o
> > diff --git a/drivers/nvdimm/ocxl/scm.c b/drivers/nvdimm/ocxl/scm.c
> > new file mode 100644
> > index ..571058a9e7b8
> > --- /dev/null
> > +++ b/drivers/nvdimm/ocxl/scm.c
> > @@ -0,0 +1,519 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +// Copyright 2019 IBM Corp.
> > +
> > +/*
> > + * A driver for Storage Class Memory, connected via OpenCAPI
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include "scm_internal.h"
> > +
> > +
> > +static const struct pci_device_id scm_pci_tbl[] = {
> > +   { PCI_DEVICE(PCI_VENDOR_ID_IBM, 0x0625), },
> > +   { }
> > +};
> > +
> > +MODULE_DEVICE_TABLE(pci, scm_pci_tbl);
> > +
> > +#define SCM_NUM_MINORS 256 // Total to reserve
> > +
> > +static dev_t scm_dev;
> > +static struct class *scm_class;
> > +static struct mutex minors_idr_lock;
> > +static struct idr minors_idr;
> > +
> > +static const struct attribute_group *scm_pmem_attribute_groups[] =
> > {
> > +   _bus_attribute_group,
> > +   NULL,
> > +};
> > +
> > +static const struct attribute_group
> > *scm_pmem_region_attribute_groups[] = {
> > +   _region_attribute_group,
> > +   _device_attribute_group,
> > +   _mapping_attribute_group,
> > +   _numa_attribute_group,
> > +   NULL,
> > +};
> > +
> > +/**
> > + * scm_ndctl_config_write() - Handle a ND_CMD_SET_CONFIG_DATA
> > command from ndctl
> > + * @scm_data: the SCM metadata
> > + * @command: the incoming data to write
> > + * Return: 0 on success, negative on failure
> > + */
> > +static int scm_ndctl_config_write(struct scm_data *scm_data,
> > + struct nd_cmd_set_config_hdr
> > *command)
> > +{
> > +   if (command->in_offset + command->in_length >
> > SCM_LABEL_AREA_SIZE)
> > +   return 

RE: [PATCH v2 08/27] ocxl: Save the device serial number in ocxl_fn

2020-02-18 Thread Alastair D'Silva
On Mon, 2020-02-03 at 12:53 +, Jonathan Cameron wrote:
> On Tue, 3 Dec 2019 14:46:36 +1100
> Alastair D'Silva  wrote:
> 
> > From: Alastair D'Silva 
> > 
> > This patch retrieves the serial number of the card and makes it
> > available
> > to consumers of the ocxl driver via the ocxl_fn struct.
> > 
> > Signed-off-by: Alastair D'Silva 
> > Acked-by: Frederic Barrat 
> > Acked-by: Andrew Donnellan 
> > ---
> >  drivers/misc/ocxl/config.c | 46
> > ++
> >  include/misc/ocxl.h|  1 +
> >  2 files changed, 47 insertions(+)
> > 
> > diff --git a/drivers/misc/ocxl/config.c
> > b/drivers/misc/ocxl/config.c
> > index fb0c3b6f8312..a9203c309365 100644
> > --- a/drivers/misc/ocxl/config.c
> > +++ b/drivers/misc/ocxl/config.c
> > @@ -71,6 +71,51 @@ static int find_dvsec_afu_ctrl(struct pci_dev
> > *dev, u8 afu_idx)
> > return 0;
> >  }
> >  
> > +/**
> 
> Make sure anything you mark as kernel doc with /** is valid
> kernel-doc.
> 

Ok

> > + * Find a related PCI device (function 0)
> > + * @device: PCI device to match
> > + *
> > + * Returns a pointer to the related device, or null if not found
> > + */
> > +static struct pci_dev *get_function_0(struct pci_dev *dev)
> > +{
> > +   unsigned int devfn = PCI_DEVFN(PCI_SLOT(dev->devfn), 0); //
> > Look for function 0
> 
> Not sure the trailing comment adds much.
> 
> I'd personally not bother with this wrapper at all and just call
> the pci functions directly where needed.
> 

I'm not that familiar with the macros, so its not immediately obvious
to me what it's doing, so my preference is to leave it.
> > +
> > +   return pci_get_domain_bus_and_slot(pci_domain_nr(dev->bus),
> > +   dev->bus->number, devfn);
> > +}
> > +
> > +static void read_serial(struct pci_dev *dev, struct ocxl_fn_config
> > *fn)
> > +{
> > +   u32 low, high;
> > +   int pos;
> > +
> > +   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DSN);
> > +   if (pos) {
> > +   pci_read_config_dword(dev, pos + 0x04, );
> > +   pci_read_config_dword(dev, pos + 0x08, );
> > +
> > +   fn->serial = low | ((u64)high) << 32;
> > +
> > +   return;
> > +   }
> > +
> > +   if (PCI_FUNC(dev->devfn) != 0) {
> > +   struct pci_dev *related = get_function_0(dev);
> > +
> > +   if (!related) {
> > +   fn->serial = 0;
> > +   return;
> > +   }
> > +
> > +   read_serial(related, fn);
> > +   pci_dev_put(related);
> > +   return;
> > +   }
> > +
> > +   fn->serial = 0;
> > +}
> > +
> >  static void read_pasid(struct pci_dev *dev, struct ocxl_fn_config
> > *fn)
> >  {
> > u16 val;
> > @@ -208,6 +253,7 @@ int ocxl_config_read_function(struct pci_dev
> > *dev, struct ocxl_fn_config *fn)
> > int rc;
> >  
> > read_pasid(dev, fn);
> > +   read_serial(dev, fn);
> >  
> > rc = read_dvsec_tl(dev, fn);
> > if (rc) {
> > diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> > index 6f7c02f0d5e3..9843051c3c5b 100644
> > --- a/include/misc/ocxl.h
> > +++ b/include/misc/ocxl.h
> > @@ -46,6 +46,7 @@ struct ocxl_fn_config {
> > int dvsec_afu_info_pos; /* offset of the AFU information DVSEC
> > */
> > s8 max_pasid_log;
> > s8 max_afu_index;
> > +   u64 serial;
> >  };
> >  
> >  enum ocxl_endian {
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



RE: [PATCH v2 07/27] ocxl: Add functions to map/unmap LPC memory

2020-02-18 Thread Alastair D'Silva
On Mon, 2020-02-03 at 12:49 +, Jonathan Cameron wrote:
> On Tue, 3 Dec 2019 14:46:35 +1100
> Alastair D'Silva  wrote:
> 
> > From: Alastair D'Silva 
> > 
> > Add functions to map/unmap LPC memory
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >  drivers/misc/ocxl/config.c|  4 +++
> >  drivers/misc/ocxl/core.c  | 50
> > +++
> >  drivers/misc/ocxl/ocxl_internal.h |  3 ++
> >  include/misc/ocxl.h   | 18 +++
> >  4 files changed, 75 insertions(+)
> > 
> > diff --git a/drivers/misc/ocxl/config.c
> > b/drivers/misc/ocxl/config.c
> > index c8e19bfb5ef9..fb0c3b6f8312 100644
> > --- a/drivers/misc/ocxl/config.c
> > +++ b/drivers/misc/ocxl/config.c
> > @@ -568,6 +568,10 @@ static int read_afu_lpc_memory_info(struct
> > pci_dev *dev,
> > afu->special_purpose_mem_size =
> > total_mem_size - lpc_mem_size;
> > }
> > +
> > +   dev_info(>dev, "Probed LPC memory of %#llx bytes and
> > special purpose memory of %#llx bytes\n",
> > +   afu->lpc_mem_size, afu->special_purpose_mem_size);
> > +
> 
> If we are being fussy, this block has nothing todo with the rest of
> the patch
> so we should be seeing it here.

Agreed

> 
> > return 0;
> >  }
> >  
> > diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
> > index 2531c6cf19a0..98611faea219 100644
> > --- a/drivers/misc/ocxl/core.c
> > +++ b/drivers/misc/ocxl/core.c
> > @@ -210,6 +210,55 @@ static void unmap_mmio_areas(struct ocxl_afu
> > *afu)
> > release_fn_bar(afu->fn, afu->config.global_mmio_bar);
> >  }
> >  
> > +int ocxl_afu_map_lpc_mem(struct ocxl_afu *afu)
> > +{
> > +   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
> > +
> > +   if ((afu->config.lpc_mem_size + afu-
> > >config.special_purpose_mem_size) == 0)
> > +   return 0;
> > +
> > +   afu->lpc_base_addr = ocxl_link_lpc_map(afu->fn->link, dev);
> > +   if (afu->lpc_base_addr == 0)
> > +   return -EINVAL;
> > +
> > +   if (afu->config.lpc_mem_size) {
> 
> I was happy with the explicit check on 0 above, but we should be
> consistent.  Either
> we make use of 0 == false, or we don't and explicitly check vs 0.
> 
> Hence
> 
> if (afu->config.pc_mem_size != 0) { 
> 
> here or
> 
> if (!(afu->config.pc_mem_size + afu-
> >config.special_purpose_mem_size))
>   return 0;
> 
> above.

This feels a bit niggly, but sure, changed to a '> 0' check.

> 
> > +   afu->lpc_res.start = afu->lpc_base_addr + afu-
> > >config.lpc_mem_offset;
> > +   afu->lpc_res.end = afu->lpc_res.start + afu-
> > >config.lpc_mem_size - 1;
> > +   }
> > +
> > +   if (afu->config.special_purpose_mem_size) {
> > +   afu->special_purpose_res.start = afu->lpc_base_addr +
> > +afu-
> > >config.special_purpose_mem_offset;
> > +   afu->special_purpose_res.end = afu-
> > >special_purpose_res.start +
> > +  afu-
> > >config.special_purpose_mem_size - 1;
> > +   }
> > +
> > +   return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(ocxl_afu_map_lpc_mem);
> > +
> > +struct resource *ocxl_afu_lpc_mem(struct ocxl_afu *afu)
> > +{
> > +   return >lpc_res;
> > +}
> > +EXPORT_SYMBOL_GPL(ocxl_afu_lpc_mem);
> > +
> > +static void unmap_lpc_mem(struct ocxl_afu *afu)
> > +{
> > +   struct pci_dev *dev = to_pci_dev(afu->fn->dev.parent);
> > +
> > +   if (afu->lpc_res.start || afu->special_purpose_res.start) {
> > +   void *link = afu->fn->link;
> > +
> > +   ocxl_link_lpc_release(link, dev);
> > +
> > +   afu->lpc_res.start = 0;
> > +   afu->lpc_res.end = 0;
> > +   afu->special_purpose_res.start = 0;
> > +   afu->special_purpose_res.end = 0;
> > +   }
> > +}
> > +
> >  static int configure_afu(struct ocxl_afu *afu, u8 afu_idx, struct
> > pci_dev *dev)
> >  {
> > int rc;
> > @@ -251,6 +300,7 @@ static int configure_afu(struct ocxl_afu *afu,
> > u8 afu_idx, struct pci_dev *dev)
> >  
> >  static void deconfigure_afu(struct ocxl_afu *afu)
> >  {
> > +   unmap_lpc_mem(afu);
> 
> Hmm. This breaks the existing balance between configure_afu and
> deconfigure_afu.
> 
> Given comments below on why we don't do map_lpc_mem in the afu bring
> up
> (as it's a shared operation) it seems to me that we should be doing
> this
> outside of the afu deconfigure.  Perhaps ocxl_function_close is
> appropriate?
> I don't know this infrastructure well enough to be sure.
> 
> If it does need to be here, then a comment to give more info on
> why would be great!
> 

Sure, I've added a comment in unmap_lpc_mem explaining that lpc_release
only releases the memory on the link when the last consumer calls
release.

It's in deconfigure_afu as the LPC memory is registered and reported
per-AFU (even though it has to be allocated all at once across the
link).

> > unmap_mmio_areas(afu);
> > reclaim_afu_pasid(afu);
> > reclaim_afu_actag(afu);
> > diff --git 

Re: [PATCH v2] powerpc/Makefile: Mark phony targets as PHONY

2020-02-18 Thread Masahiro Yamada
On Wed, Feb 19, 2020 at 9:04 AM Michael Ellerman  wrote:
>
> Some of our phony targets are not marked as such. This can lead to
> confusing errors, eg:
>
>   $ make clean
>   $ touch install
>   $ make install
>   make: 'install' is up to date.
>   $
>
> Fix it by adding them to the PHONY variable which is marked phony in
> the top-level Makefile, or in scripts/Makefile.build for the boot
> Makefile.
>
> Suggested-by: Masahiro Yamada 
> Signed-off-by: Michael Ellerman 
> ---

Reviewed-by: Masahiro Yamada 



-- 
Best Regards
Masahiro Yamada


Re: [PATCH v2 3/3] ASoC: fsl_easrc: Add EASRC ASoC CPU DAI and platform drivers

2020-02-18 Thread Nicolin Chen
On Tue, Feb 18, 2020 at 02:39:37PM +0800, Shengjiu Wang wrote:
> EASRC (Enhanced Asynchronous Sample Rate Converter) is a new IP module
> found on i.MX8MN. It is different with old ASRC module.
> 
> The primary features for the EASRC are as follows:
> - 4 Contexts - groups of channels with an independent time base
> - Fully independent and concurrent context control
> - Simultaneous processing of up to 32 audio channels
> - Programmable filter charachteristics for each context
> - 32, 24, 20, and 16-bit fixed point audio sample support
> - 32-bit floating point audio sample support
> - 8kHz to 384kHz sample rate
> - 1/16 to 8x sample rate conversion ratio
> 
> Signed-off-by: Shengjiu Wang 
> ---
>  sound/soc/fsl/Kconfig   |   10 +
>  sound/soc/fsl/Makefile  |2 +
>  sound/soc/fsl/fsl_asrc_common.h |1 +
>  sound/soc/fsl/fsl_easrc.c   | 2265 +++
>  sound/soc/fsl/fsl_easrc.h   |  668 +
>  sound/soc/fsl/fsl_easrc_dma.c   |  440 ++

I see a 90% similarity between fsl_asrc_dma and fsl_easrc_dma files.
Would it be possible reuse the existing code? Could share structures
from my point of view, just like it reuses "enum asrc_pair_index", I
know differentiating "pair" and "context" is a big point here though.

A possible quick solution for that, off the top of my head, could be:

1) in fsl_asrc_common.h

struct fsl_asrc {

};

struct fsl_asrc_pair {

};

2) in fsl_easrc.h

/* Renaming shared structures */
#define fsl_easrc fsl_asrc
#define fsl_easrc_context fsl_asrc_pair

May be a good idea to see if others have some opinion too.

> diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c
> new file mode 100644
> index ..6fe2953317f2
> --- /dev/null
> +++ b/sound/soc/fsl/fsl_easrc.c
> +
> +/* set_rs_ratio
> + *
> + * According to the resample taps, calculate the resample ratio
> + */
> +static int set_rs_ratio(struct fsl_easrc_context *ctx)

"fsl_easrc_" prefix? Would be nice to have a formula in the comments.

> +/* resets the pointer of the coeff memory pointers */
> +static int fsl_coeff_mem_ptr_reset(struct fsl_easrc *easrc,
> +unsigned int ctx_id, int mem_type)
> +{
> + /*  To reset the write pointer back to zero, the register field
> +  *  ASRC_CTX_CTRL_EXT1x[PF_COEFF_MEM_RST] can be toggled from
> +  *  0x0 to 0x1 to 0x0.
> +  */

Please use the style:
/*
 * xxx
 */

> +static int fsl_easrc_resampler_config(struct fsl_easrc *easrc)
> +{
> + for (i = 0; i < hdr->interp_scen; i++) {
> + if ((interp[i].num_taps - 1) ==
> + bits_taps_to_val(easrc->rs_num_taps)) {

Could do below to save some indentations from the rest of the routine:
+   if ((interp[i].num_taps - 1) !=
+   bits_taps_to_val(easrc->rs_num_taps))
+   continue;

> + arr = interp[i].coeff;
> + selected_interp = [i];
> + dev_dbg(dev, "Selected interp_filter: %u taps - %u 
> phases\n",
> + selected_interp->num_taps,
> + selected_interp->num_phases);
> + break;

> +/*
> + *  Scale filter coefficients (64 bits float)
> + *  For input float32 normalized range (1.0,-1.0) -> output int[16,24,32]:
> + *  scale it by multiplying filter coefficients by 2^31
> + *  For input int[16, 24, 32] -> output float32
> + *  scale it by multiplying filter coefficients by 2^-15, 2^-23, 2^-31
> + *  input:
> + *  easrc:  Structure pointer of fsl_easrc
> + *  infilter : Pointer to non-scaled input filter
> + *  shift:  The multiply factor
> + *  output:
> + *  outfilter: scaled filter
> + 
> */
> +static int NormalizedFilterForFloat32InIntOut(struct fsl_easrc *easrc,
> +   u64 *infilter,
> +   u64 *outfilter,
> +   int shift)

Coding style looks very different, at comments and function naming.

> +{
> + struct device *dev = >pdev->dev;
> + u64 coef = *infilter;
> + s64 exp  = (coef & 0x7ff0ll) >> 52;
> + u64 outcoef;
> +
> + /*
> +  * If exponent is zero (value == 0), or 7ff (value == NaNs)
> +  * dont touch the content
> +  */
> + if (((coef & 0x7ff0ll) == 0) ||
> + ((coef & 0x7ff0ll) == ((u64)0x7ff << 52))) {
> + *outfilter = coef;
> + } else {
> + if ((shift > 0 && (shift + exp) >= 2047) ||
> + (shift < 0 && (exp + shift) <= 0)) {
> + dev_err(dev, "coef 

Re: [PATCH v3] powerpc/kprobes: Ignore traps that happened in real mode

2020-02-18 Thread Masami Hiramatsu
On Tue, 18 Feb 2020 19:38:27 + (UTC)
Christophe Leroy  wrote:

> When a program check exception happens while MMU translation is
> disabled, following Oops happens in kprobe_handler() in the following
> code:
> 
>   } else if (*addr != BREAKPOINT_INSTRUCTION) {
> 
> [   33.098554] BUG: Unable to handle kernel data access on read at 0xe268
> [   33.105091] Faulting instruction address: 0xc000ec34
> [   33.110010] Oops: Kernel access of bad area, sig: 11 [#1]
> [   33.115348] BE PAGE_SIZE=16K PREEMPT CMPC885
> [   33.119540] Modules linked in:
> [   33.122591] CPU: 0 PID: 429 Comm: cat Not tainted 
> 5.6.0-rc1-s3k-dev-00824-g84195dc6c58a #3267
> [   33.131005] NIP:  c000ec34 LR: c000ecd8 CTR: c019cab8
> [   33.136002] REGS: ca4d3b58 TRAP: 0300   Not tainted  
> (5.6.0-rc1-s3k-dev-00824-g84195dc6c58a)
> [   33.144324] MSR:  1032   CR: 2a4d3c52  XER: 
> [   33.150699] DAR: e268 DSISR: c000
> [   33.150699] GPR00: c000b09c ca4d3c10 c66d0620  ca4d3c60  
> 9032 
> [   33.150699] GPR08: 0002  c087de44 c000afe0 c66d0ad0 100d3dd6 
> fff3 
> [   33.150699] GPR16:  0041  ca4d3d70   
> 416d 
> [   33.150699] GPR24: 0004 c53b6128  e268  c07c 
> c07bb6fc ca4d3c60
> [   33.188015] NIP [c000ec34] kprobe_handler+0x128/0x290
> [   33.192989] LR [c000ecd8] kprobe_handler+0x1cc/0x290
> [   33.197854] Call Trace:
> [   33.200340] [ca4d3c30] [c000b09c] program_check_exception+0xbc/0x6fc
> [   33.206590] [ca4d3c50] [c000e43c] ret_from_except_full+0x0/0x4
> [   33.212392] --- interrupt: 700 at 0xe268
> [   33.270401] Instruction dump:
> [   33.273335] 913e0008 8122 3861 3929 9122 80010024 bb410008 
> 7c0803a6
> [   33.280992] 38210020 4e800020 3860 4e800020 <813b> 6d2a7fe0 
> 2f8a0008 419e0154
> [   33.288841] ---[ end trace 5b9152d4cdadd06d ]---
> 
> kprobe is not prepared to handle events in real mode and functions
> running in real mode should have been blacklisted, so kprobe_handler()
> can safely bail out telling 'this trap is not mine' for any trap that
> happened while in real-mode.
> 
> If the trap happened with MSR_IR or MSR_DR cleared, return 0 immediately.
> 

Looks good to me.

Reviewed-by: Masami Hiramatsu 

Thank you!


> Reported-by: Larry Finger 
> Fixes: 6cc89bad60a6 ("powerpc/kprobes: Invoke handlers directly")
> Cc: sta...@vger.kernel.org
> Cc: Naveen N. Rao 
> Cc: Masami Hiramatsu 
> Signed-off-by: Christophe Leroy 
> 
> ---
> v3: Also bail out if MSR_DR is cleared.
> 
> Resending v2 with a more appropriate name
> 
> v2: bailing out instead of converting real-time address to virtual and 
> continuing.
> 
> The bug might have existed even before that commit from Naveen.
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/kprobes.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 2d27ec4feee4..9b340af02c38 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -264,6 +264,9 @@ int kprobe_handler(struct pt_regs *regs)
>   if (user_mode(regs))
>   return 0;
>  
> + if (!(regs->msr & MSR_IR) || !(regs->msr & MSR_DR))
> + return 0;
> +
>   /*
>* We don't want to be preempted for the entire
>* duration of kprobe processing
> -- 
> 2.25.0
> 


-- 
Masami Hiramatsu 


[PATCH v2] powerpc/Makefile: Mark phony targets as PHONY

2020-02-18 Thread Michael Ellerman
Some of our phony targets are not marked as such. This can lead to
confusing errors, eg:

  $ make clean
  $ touch install
  $ make install
  make: 'install' is up to date.
  $

Fix it by adding them to the PHONY variable which is marked phony in
the top-level Makefile, or in scripts/Makefile.build for the boot
Makefile.

Suggested-by: Masahiro Yamada 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Makefile  | 6 ++
 arch/powerpc/boot/Makefile | 2 ++
 2 files changed, 8 insertions(+)

v2: Use PHONY variable in boot/Makefile as well.

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f35730548e42..cbe5ca4f0ee5 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -298,6 +298,7 @@ $(BOOT_TARGETS2): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
 
 
+PHONY += bootwrapper_install
 bootwrapper_install:
$(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
 
@@ -403,9 +404,11 @@ define archhelp
   @echo '  (minus the .dts extension).'
 endef
 
+PHONY += install
 install:
$(Q)$(MAKE) $(build)=$(boot) install
 
+PHONY += vdso_install
 vdso_install:
 ifdef CONFIG_PPC64
$(Q)$(MAKE) $(build)=arch/$(ARCH)/kernel/vdso64 $@
@@ -425,6 +428,7 @@ archprepare: checkbin
 ifdef CONFIG_STACKPROTECTOR
 prepare: stack_protector_prepare
 
+PHONY += stack_protector_prepare
 stack_protector_prepare: prepare0
 ifdef CONFIG_PPC64
$(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk '{if 
($$2 == "PACA_CANARY") print $$3;}' include/generated/asm-offsets.h))
@@ -436,10 +440,12 @@ endif
 ifdef CONFIG_SMP
 prepare: task_cpu_prepare
 
+PHONY += task_cpu_prepare
 task_cpu_prepare: prepare0
$(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == 
"TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
 endif
 
+PHONY += checkbin
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
 checkbin:
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 0556bf4fc9e9..c53a1b8bba8b 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -445,6 +445,8 @@ install: $(CONFIGURE) $(addprefix $(obj)/, $(image-y))
 zInstall: $(CONFIGURE) $(addprefix $(obj)/, $(image-y))
sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" vmlinux 
System.map "$(INSTALL_PATH)" $^
 
+PHONY += install zInstall
+
 # anything not in $(targets)
 clean-files += $(image-) $(initrd-) cuImage.* dtbImage.* treeImage.* \
zImage zImage.initrd zImage.chrp zImage.coff zImage.holly \
-- 
2.21.1



Re: [PATCH v2 06/27] ocxl: Tally up the LPC memory on a link & allow it to be mapped

2020-02-18 Thread Alastair D'Silva
On Mon, 2020-02-03 at 12:37 +, Jonathan Cameron wrote:
> On Tue, 3 Dec 2019 14:46:34 +1100
> Alastair D'Silva  wrote:
> 
> > From: Alastair D'Silva 
> > 
> > Tally up the LPC memory on an OpenCAPI link & allow it to be mapped
> > 
> > Signed-off-by: Alastair D'Silva 
> Hi Alastair,
> 
> A few trivial comments inline.
> 
> Jonathan
> 
> > ---
> >  drivers/misc/ocxl/core.c  | 10 ++
> >  drivers/misc/ocxl/link.c  | 60
> > +++
> >  drivers/misc/ocxl/ocxl_internal.h | 33 +
> >  3 files changed, 103 insertions(+)
> > 
> > diff --git a/drivers/misc/ocxl/core.c b/drivers/misc/ocxl/core.c
> > index b7a09b21ab36..2531c6cf19a0 100644
> > --- a/drivers/misc/ocxl/core.c
> > +++ b/drivers/misc/ocxl/core.c
> > @@ -230,8 +230,18 @@ static int configure_afu(struct ocxl_afu *afu,
> > u8 afu_idx, struct pci_dev *dev)
> > if (rc)
> > goto err_free_pasid;
> >  
> > +   if (afu->config.lpc_mem_size || afu-
> > >config.special_purpose_mem_size) {
> > +   rc = ocxl_link_add_lpc_mem(afu->fn->link, afu-
> > >config.lpc_mem_offset,
> > +  afu->config.lpc_mem_size +
> > +  afu-
> > >config.special_purpose_mem_size);
> > +   if (rc)
> > +   goto err_free_mmio;
> > +   }
> > +
> > return 0;
> >  
> > +err_free_mmio:
> > +   unmap_mmio_areas(afu);
> >  err_free_pasid:
> > reclaim_afu_pasid(afu);
> >  err_free_actag:
> > diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> > index 58d111afd9f6..d8503f0dc6ec 100644
> > --- a/drivers/misc/ocxl/link.c
> > +++ b/drivers/misc/ocxl/link.c
> > @@ -84,6 +84,11 @@ struct ocxl_link {
> > int dev;
> > atomic_t irq_available;
> > struct spa *spa;
> > +   struct mutex lpc_mem_lock;
> 
> Always a good idea to explicitly document what a lock is intended to
> protect.
> 
Ok

> > +   u64 lpc_mem_sz; /* Total amount of LPC memory presented on the
> > link */
> > +   u64 lpc_mem;
> > +   int lpc_consumers;
> > +
> > void *platform_data;
> >  };
> >  static struct list_head links_list = LIST_HEAD_INIT(links_list);
> > @@ -396,6 +401,8 @@ static int alloc_link(struct pci_dev *dev, int
> > PE_mask, struct ocxl_link **out_l
> > if (rc)
> > goto err_spa;
> >  
> > +   mutex_init(>lpc_mem_lock);
> > +
> > /* platform specific hook */
> > rc = pnv_ocxl_spa_setup(dev, link->spa->spa_mem, PE_mask,
> > >platform_data);
> > @@ -711,3 +718,56 @@ void ocxl_link_free_irq(void *link_handle, int
> > hw_irq)
> > atomic_inc(>irq_available);
> >  }
> >  EXPORT_SYMBOL_GPL(ocxl_link_free_irq);
> > +
> > +int ocxl_link_add_lpc_mem(void *link_handle, u64 offset, u64 size)
> > +{
> > +   struct ocxl_link *link = (struct ocxl_link *) link_handle;
> > +
> > +   // Check for overflow
> 
> Stray c++ style comment.
> 
This is permitted in powerpc.

> > +   if (offset > (offset + size))
> > +   return -EINVAL;
> > +
> > +   mutex_lock(>lpc_mem_lock);
> > +   link->lpc_mem_sz = max(link->lpc_mem_sz, offset + size);
> > +
> > +   mutex_unlock(>lpc_mem_lock);
> > +
> > +   return 0;
> > +}
> > +
> > +u64 ocxl_link_lpc_map(void *link_handle, struct pci_dev *pdev)
> > +{
> > +   struct ocxl_link *link = (struct ocxl_link *) link_handle;
> > +   u64 lpc_mem;
> > +
> > +   mutex_lock(>lpc_mem_lock);
> > +   if (link->lpc_mem) {
> 
> If you don't modify this later in the series (I haven't read it all
> yet :),
> it rather feels like it would be more compact and just as readable as
> something like...
> 
>   if (!link->lpc_mem)
>   link->lpc_mem = pnv_ocxl...
> 
>   if (link->lpc_mem)
>   link->lpc_consumers++;
>   mutex_unlock(>lpc_mem_lock);
>   
>   return link->lpc_mem;
> 

Agreed, thanks.

> > +   lpc_mem = link->lpc_mem;
> > +
> > +   link->lpc_consumers++;
> > +   mutex_unlock(>lpc_mem_lock);
> > +   return lpc_mem;
> > +   }
> > +
> > +   link->lpc_mem = pnv_ocxl_platform_lpc_setup(pdev, link-
> > >lpc_mem_sz);
> > +   if (link->lpc_mem)
> > +   link->lpc_consumers++;
> > +   lpc_mem = link->lpc_mem;
> > +   mutex_unlock(>lpc_mem_lock);
> > +
> > +   return lpc_mem;
> > +}
> > +
> > +void ocxl_link_lpc_release(void *link_handle, struct pci_dev
> > *pdev)
> > +{
> > +   struct ocxl_link *link = (struct ocxl_link *) link_handle;
> > +
> > +   mutex_lock(>lpc_mem_lock);
> > +   WARN_ON(--link->lpc_consumers < 0);
> > +   if (link->lpc_consumers == 0) {
> > +   pnv_ocxl_platform_lpc_release(pdev);
> > +   link->lpc_mem = 0;
> > +   }
> > +
> > +   mutex_unlock(>lpc_mem_lock);
> > +}
> > diff --git a/drivers/misc/ocxl/ocxl_internal.h
> > b/drivers/misc/ocxl/ocxl_internal.h
> > index 97415afd79f3..20b417e00949 100644
> > --- a/drivers/misc/ocxl/ocxl_internal.h
> > +++ b/drivers/misc/ocxl/ocxl_internal.h
> > @@ -141,4 +141,37 @@ int 

Re: [PATCH v2 05/27] powerpc: Map & release OpenCAPI LPC memory

2020-02-18 Thread Alastair D'Silva
On Fri, 2020-02-14 at 12:09 +0100, Frederic Barrat wrote:
> 
> Le 03/12/2019 à 04:46, Alastair D'Silva a écrit :
> > From: Alastair D'Silva 
> > 
> > This patch adds platform support to map & release LPC memory.
> > 
> > Signed-off-by: Alastair D'Silva 
> > ---
> >   arch/powerpc/include/asm/pnv-ocxl.h   |  2 ++
> >   arch/powerpc/platforms/powernv/ocxl.c | 42
> > +++
> >   2 files changed, 44 insertions(+)
> > 
> > diff --git a/arch/powerpc/include/asm/pnv-ocxl.h
> > b/arch/powerpc/include/asm/pnv-ocxl.h
> > index 7de82647e761..f8f8ffb48aa8 100644
> > --- a/arch/powerpc/include/asm/pnv-ocxl.h
> > +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> > @@ -32,5 +32,7 @@ extern int pnv_ocxl_spa_remove_pe_from_cache(void
> > *platform_data, int pe_handle)
> >   
> >   extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
> >   extern void pnv_ocxl_free_xive_irq(u32 irq);
> > +extern u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64
> > size);
> > +extern void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev);
> >   
> >   #endif /* _ASM_PNV_OCXL_H */
> > diff --git a/arch/powerpc/platforms/powernv/ocxl.c
> > b/arch/powerpc/platforms/powernv/ocxl.c
> > index 8c65aacda9c8..b56a48daf48c 100644
> > --- a/arch/powerpc/platforms/powernv/ocxl.c
> > +++ b/arch/powerpc/platforms/powernv/ocxl.c
> > @@ -475,6 +475,48 @@ void pnv_ocxl_spa_release(void *platform_data)
> >   }
> >   EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
> >   
> > +u64 pnv_ocxl_platform_lpc_setup(struct pci_dev *pdev, u64 size)
> > +{
> > +   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> > +   struct pnv_phb *phb = hose->private_data;
> > +   u32 bdfn = pci_dev_id(pdev);
> > +   __be64 base_addr_be64;
> > +   u64 base_addr;
> > +   int rc;
> > +
> > +   rc = opal_npu_mem_alloc(phb->opal_id, bdfn, size,
> > _addr_be64);
> > +   if (rc) {
> > +   dev_warn(>dev,
> > +"OPAL could not allocate LPC memory, rc=%d\n",
> > rc);
> > +   return 0;
> > +   }
> > +
> > +   base_addr = be64_to_cpu(base_addr_be64);
> > +
> > +   rc = check_hotplug_memory_addressable(base_addr >> PAGE_SHIFT,
> > + size >> PAGE_SHIFT);
> 
> check_hotplug_memory_addressable() is only declared if 
> CONFIG_MEMORY_HOTPLUG_SPARSE is selected.
> I think we also need a #ifdef here.
> 

Agreed. I think that since any actual use of the memory is going to be
dependant on both hotplug & sparse, moving the ifdef to wrap the
functions & declarations makes sense.


>Fred
> 
> 
> > +   if (rc)
> > +   return 0;
> > +
> > +   return base_addr;
> > +}
> > +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_setup);
> > +
> > +void pnv_ocxl_platform_lpc_release(struct pci_dev *pdev)
> > +{
> > +   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
> > +   struct pnv_phb *phb = hose->private_data;
> > +   u32 bdfn = pci_dev_id(pdev);
> > +   int rc;
> > +
> > +   rc = opal_npu_mem_release(phb->opal_id, bdfn);
> > +   if (rc)
> > +   dev_warn(>dev,
> > +"OPAL reported rc=%d when releasing LPC
> > memory\n", rc);
> > +}
> > +EXPORT_SYMBOL_GPL(pnv_ocxl_platform_lpc_release);
> > +
> > +
> >   int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int
> > pe_handle)
> >   {
> > struct spa_data *data = (struct spa_data *) platform_data;
> > 
-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



Re: [PATCH] powerpc/Makefile: Mark phony targets as PHONY

2020-02-18 Thread Masahiro Yamada
On Tue, Feb 18, 2020 at 8:19 PM Michael Ellerman  wrote:
>
> Some of our phony targets are not marked as such. This can lead to
> confusing errors, eg:
>
>   $ make clean
>   $ touch install
>   $ make install
>   make: 'install' is up to date.
>   $
>
> Fix it by adding them to the PHONY variable which is marked phony in
> the top-level Makefile. In arch/powerpc/boot/Makefile we do it
> manually.


You can do likewise in arch/powerpc/boot/Makefile
because it is marked phony in scripts/Makefile.build





-- 
Best Regards
Masahiro Yamada


Re: [PATCH v2 2/3] ASoC: dt-bindings: fsl_easrc: Add document for EASRC

2020-02-18 Thread Nicolin Chen
On Tue, Feb 18, 2020 at 02:39:36PM +0800, Shengjiu Wang wrote:
> EASRC (Enhanced Asynchronous Sample Rate Converter) is a new
> IP module found on i.MX8MN.
> 
> Signed-off-by: Shengjiu Wang 
> ---
>  .../devicetree/bindings/sound/fsl,easrc.txt   | 57 +++
>  1 file changed, 57 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/sound/fsl,easrc.txt
> 
> diff --git a/Documentation/devicetree/bindings/sound/fsl,easrc.txt 
> b/Documentation/devicetree/bindings/sound/fsl,easrc.txt
> new file mode 100644
> index ..0e8153165e3b
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/sound/fsl,easrc.txt
> @@ -0,0 +1,57 @@
> +NXP Asynchronous Sample Rate Converter (ASRC) Controller

Missing "Enhanced", I guess.

And "ASRC" => "EASRC"

> +The Asynchronous Sample Rate Converter (ASRC) converts the sampling rate of a

Ditto

> +signal associated with an input clock into a signal associated with a 
> different
> +output clock. The driver currently works as a Front End of DPCM with other 
> Back
> +Ends Audio controller such as ESAI, SSI and SAI. It has four context to 
> support

"context" => "contexts"

Btw, what's the definition of this "context"?

And, is SSI still available on imx8mn?


Re: [PATCH] KVM: PPC: Book3S HV: Treat unrecognized TM instructions as illegal

2020-02-18 Thread Gustavo Romero

Hi,

On 02/17/2020 04:37 AM, Segher Boessenkool wrote:

On Mon, Feb 17, 2020 at 05:23:07PM +1100, Michael Neuling wrote:

Hence, we should NOP this, not generate an illegal.


It is not a reserved bit.

The IMC entry for it matches op1=01 op2=101110 presumably, which
catches all TM instructions and nothing else (bits 0..5 and bits 21..30).
That does not look at bit 31, the softpatch handler has to deal with this.

Some TM insns have bit 31 as 1 and some have it as /.  All instructions
with a "." in the mnemonic have bit 31 is 1, all other have it reserved.
The tables in appendices D, E, F show tend. and tsr. as having it
reserved, which contradicts the individual instruction description (and
does not make much sense).  (Only tcheck has /, everything else has 1;
everything else has a mnemonic with a dot, and does write CR0 always).


Wow, interesting.

P8 seems to be treating 31 as a reserved bit (with the table definition rather
than the individual instruction description). I'm inclined to match P8 even
though it's inconsistent with the dot mnemonic as you say.


"The POWER8 core ignores the state of reserved bits in the instructions
(denoted by “///” in the instruction definition) and executes the
instruction normally. Software should set these bits to ‘0’ per the
Power ISA." (p8 UM, 3.1.1.3; same in the p9 UM).


For the records, I've sent a v2 addressing Mikey's comments:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-February/204502.html

or

https://marc.info/?l=kvm-ppc=158206045520038=2

Thanks for the review.


Best regards,
Gustavo


[PATCH] KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones

2020-02-18 Thread Gustavo Romero
On P9 DD2.2 due to a CPU defect some TM instructions need to be emulated by
KVM. This is handled at first by the hardware raising a softpatch interrupt
when certain TM instructions that need KVM assistance are executed in the
guest. Althought some TM instructions per Power ISA are invalid forms they
can raise a softpatch interrupt too. For instance, 'tresume.' instruction
as defined in the ISA must have bit 31 set (1), but an instruction that
matches 'tresume.' PO and XO opcode fields but has bit 31 not set (0), like
0x7cfe9ddc, also raises a softpatch interrupt. Similarly for 'treclaim.'
and 'trechkpt.' instructions with bit 31 = 0, i.e. 0x7c00075c and
0x7c0007dc, respectively. Hence, if a code like the following is executed
in the guest it will raise a softpatch interrupt just like a 'tresume.'
when the TM facility is enabled ('tabort. 0' in the example is used only
to enable the TM facility):

int main() { asm("tabort. 0; .long 0x7cfe9ddc;"); }

Currently in such a case KVM throws a complete trace like:

[345523.705984] WARNING: CPU: 24 PID: 64413 at 
arch/powerpc/kvm/book3s_hv_tm.c:211 kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
[345523.705985] Modules linked in: kvm_hv(E) xt_conntrack ipt_REJECT 
nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat
iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter bridge stp llc sch_fq_codel ipmi_powernv at24 
vmx_crypto ipmi_devintf ipmi_msghandler
ibmpowernv uio_pdrv_genirq kvm opal_prd uio leds_powernv ib_iser rdma_cm iw_cm 
ib_cm ib_core iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs blake2b_generic 
zstd_compress raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor 
raid6_pq raid1 raid0 multipath linear tg3
crct10dif_vpmsum crc32c_vpmsum ipr [last unloaded: kvm_hv]
[345523.706030] CPU: 24 PID: 64413 Comm: CPU 0/KVM Tainted: GW   E 
5.5.0+ #1
[345523.706031] NIP:  c008072cb9c0 LR: c008072b5e80 CTR: 
c008085c7850
[345523.706034] REGS: c00399467680 TRAP: 0700   Tainted: GW   E 
 (5.5.0+)
[345523.706034] MSR:  90010282b033 
  CR: 24022428  XER: 
[345523.706042] CFAR: c008072b5e7c IRQMASK: 0
GPR00: c008072b5e80 c00399467910 c008072db500 
c00375ccc720
GPR04: c00375ccc720 0003fbec a10395dda5a6 

GPR08: 7cfe9ddc 7cfe9ddc05dc 7cfe9ddc7c0005dc 
c008072cd530
GPR12: c008085c7850 c003fffeb800 0001 
7dfb737f
GPR16: c0002001edcca558   
0001
GPR20: c1b21258 c0002001edcca558 0018 

GPR24: 0100  0001 
1500
GPR28: c0002001edcc4278 c0037dd8 80050280f033 
c00375ccc720
[345523.706062] NIP [c008072cb9c0] kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
[345523.706065] LR [c008072b5e80] kvmppc_handle_exit_hv.isra.53+0x3e8/0x798 
[kvm_hv]
[345523.706066] Call Trace:
[345523.706069] [c00399467910] [c00399467940] 0xc00399467940 
(unreliable)
[345523.706071] [c00399467950] [c00399467980] 0xc00399467980
[345523.706075] [c003994679f0] [c008072bd1c4] 
kvmhv_run_single_vcpu+0xa1c/0xb80 [kvm_hv]
[345523.706079] [c00399467ac0] [c008072bd8e0] 
kvmppc_vcpu_run_hv+0x5b8/0xb00 [kvm_hv]
[345523.706087] [c00399467b90] [c008085c93cc] kvmppc_vcpu_run+0x34/0x48 
[kvm]
[345523.706095] [c00399467bb0] [c008085c582c] 
kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
[345523.706101] [c00399467c40] [c008085b7498] 
kvm_vcpu_ioctl+0x3d0/0x7b0 [kvm]
[345523.706105] [c00399467db0] [c04adf9c] ksys_ioctl+0x13c/0x170
[345523.706107] [c00399467e00] [c04adff8] sys_ioctl+0x28/0x80
[345523.706111] [c00399467e20] [c000b278] system_call+0x5c/0x68
[345523.706112] Instruction dump:
[345523.706114] 419e0390 7f8a4840 409d0048 6d497c00 2f89075d 419e021c 6d497c00 
2f8907dd
[345523.706119] 419e01c0 6d497c00 2f8905dd 419e00a4 <0fe0> 38210040 
3860 ebc1fff0

and then treats the executed instruction as a 'nop'.

However the POWER9 User's Manual, in section "4.6.10 Book II Invalid
Forms", informs that for TM instructions bit 31 is in fact ignored, thus
for the TM-related invalid forms ignoring bit 31 and handling them like the
valid forms is an acceptable way to handle them. POWER8 behaves the same
way too.

This commit changes the handling of the cases here described by treating
the TM-related invalid forms that can generate a softpatch interrupt
just like their valid forms (w/ bit 31 = 1) instead of as a 'nop' and by
gently reporting any other unrecognized case to the host and treating it as
illegal instruction instead of throwing a trace and treating 

Re: [PATCH] libnvdimm/bus: return the outvar 'cmd_rc' error code in __nd_ioctl()

2020-02-18 Thread Dan Williams
On Tue, Feb 18, 2020 at 1:00 PM Jeff Moyer  wrote:
>
> Vaibhav Jain  writes:
>
> > Presently the error code returned via out variable 'cmd_rc' from the
> > nvdimm-bus controller function is ignored when called from
> > __nd_ioctl() and never communicated back to user-space code that called
> > an ioctl on dimm/bus.
> >
> > This minor patch updates __nd_ioctl() to propagate the value of out
> > variable 'cmd_rc' back to user-space in case it reports an error.
> >
> > Signed-off-by: Vaibhav Jain 
> > ---
> >  drivers/nvdimm/bus.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> > index a8b515968569..5b687a27fdf2 100644
> > --- a/drivers/nvdimm/bus.c
> > +++ b/drivers/nvdimm/bus.c
> > @@ -1153,6 +1153,11 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, 
> > struct nvdimm *nvdimm,
> >   if (rc < 0)
> >   goto out_unlock;
> >
> > + if (cmd_rc < 0) {
> > + rc = cmd_rc;
> > + goto out_unlock;
> > + }
> > +
> >   if (!nvdimm && cmd == ND_CMD_CLEAR_ERROR && cmd_rc >= 0) {
> >   struct nd_cmd_clear_error *clear_err = buf;
>
> Looks good to me.
>
> Reviewed-by: Jeff Moyer 

Applied.


Re: [PATCH] libnvdimm/of_pmem: Fix leaking bus_desc.provider_name in some paths

2020-02-18 Thread Jeff Moyer
Vaibhav Jain  writes:

> String 'bus_desc.provider_name' allocated inside
> of_pmem_region_probe() will leak in case call to nvdimm_bus_register()
> fails or when of_pmem_region_remove() is called.
>
> This minor patch ensures that 'bus_desc.provider_name' is freed in
> error path for of_pmem_region_probe() as well as in
> of_pmem_region_remove().
>
> Cc: sta...@vger.kernel.org
> Fixes:49bddc73d15c2("libnvdimm/of_pmem: Provide a unique name for bus 
> provider")
> Signed-off-by: Vaibhav Jain 
> ---
>  drivers/nvdimm/of_pmem.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/nvdimm/of_pmem.c b/drivers/nvdimm/of_pmem.c
> index 8224d1431ea9..9cb76f9837ad 100644
> --- a/drivers/nvdimm/of_pmem.c
> +++ b/drivers/nvdimm/of_pmem.c
> @@ -36,6 +36,7 @@ static int of_pmem_region_probe(struct platform_device 
> *pdev)
>  
>   priv->bus = bus = nvdimm_bus_register(>dev, >bus_desc);
>   if (!bus) {
> + kfree(priv->bus_desc.provider_name);
>   kfree(priv);
>   return -ENODEV;
>   }
> @@ -81,6 +82,7 @@ static int of_pmem_region_remove(struct platform_device 
> *pdev)
>   struct of_pmem_private *priv = platform_get_drvdata(pdev);
>  
>   nvdimm_bus_unregister(priv->bus);
> + kfree(priv->bus_desc.provider_name);
>   kfree(priv);
>  
>   return 0;

Reviewed-by: Jeff Moyer 



Re: [PATCH] libnvdimm/bus: return the outvar 'cmd_rc' error code in __nd_ioctl()

2020-02-18 Thread Jeff Moyer
Vaibhav Jain  writes:

> Presently the error code returned via out variable 'cmd_rc' from the
> nvdimm-bus controller function is ignored when called from
> __nd_ioctl() and never communicated back to user-space code that called
> an ioctl on dimm/bus.
>
> This minor patch updates __nd_ioctl() to propagate the value of out
> variable 'cmd_rc' back to user-space in case it reports an error.
>
> Signed-off-by: Vaibhav Jain 
> ---
>  drivers/nvdimm/bus.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> index a8b515968569..5b687a27fdf2 100644
> --- a/drivers/nvdimm/bus.c
> +++ b/drivers/nvdimm/bus.c
> @@ -1153,6 +1153,11 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, 
> struct nvdimm *nvdimm,
>   if (rc < 0)
>   goto out_unlock;
>  
> + if (cmd_rc < 0) {
> + rc = cmd_rc;
> + goto out_unlock;
> + }
> +
>   if (!nvdimm && cmd == ND_CMD_CLEAR_ERROR && cmd_rc >= 0) {
>   struct nd_cmd_clear_error *clear_err = buf;

Looks good to me.

Reviewed-by: Jeff Moyer 



Re: [PATCH v7 10/12] drivers/oprofile: open access for CAP_PERFMON privileged process

2020-02-18 Thread James Morris
On Mon, 17 Feb 2020, Alexey Budankov wrote:

> For backward compatibility reasons access to the monitoring remains
> open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
> for secure monitoring is discouraged with respect to CAP_PERFMON
> capability.
> 
> Signed-off-by: Alexey Budankov 


Reviewed-by: James Morris 


-- 
James Morris




Re: [PATCH v7 09/12] drivers/perf: open access for CAP_PERFMON privileged process

2020-02-18 Thread James Morris
On Mon, 17 Feb 2020, Alexey Budankov wrote:

> For backward compatibility reasons access to the monitoring remains
> open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
> for secure monitoring is discouraged with respect to CAP_PERFMON
> capability.
> 
> Signed-off-by: Alexey Budankov 


Reviewed-by: James Morris 


-- 
James Morris




[PATCH v3] powerpc/kprobes: Ignore traps that happened in real mode

2020-02-18 Thread Christophe Leroy
When a program check exception happens while MMU translation is
disabled, following Oops happens in kprobe_handler() in the following
code:

} else if (*addr != BREAKPOINT_INSTRUCTION) {

[   33.098554] BUG: Unable to handle kernel data access on read at 0xe268
[   33.105091] Faulting instruction address: 0xc000ec34
[   33.110010] Oops: Kernel access of bad area, sig: 11 [#1]
[   33.115348] BE PAGE_SIZE=16K PREEMPT CMPC885
[   33.119540] Modules linked in:
[   33.122591] CPU: 0 PID: 429 Comm: cat Not tainted 
5.6.0-rc1-s3k-dev-00824-g84195dc6c58a #3267
[   33.131005] NIP:  c000ec34 LR: c000ecd8 CTR: c019cab8
[   33.136002] REGS: ca4d3b58 TRAP: 0300   Not tainted  
(5.6.0-rc1-s3k-dev-00824-g84195dc6c58a)
[   33.144324] MSR:  1032   CR: 2a4d3c52  XER: 
[   33.150699] DAR: e268 DSISR: c000
[   33.150699] GPR00: c000b09c ca4d3c10 c66d0620  ca4d3c60  
9032 
[   33.150699] GPR08: 0002  c087de44 c000afe0 c66d0ad0 100d3dd6 
fff3 
[   33.150699] GPR16:  0041  ca4d3d70   
416d 
[   33.150699] GPR24: 0004 c53b6128  e268  c07c 
c07bb6fc ca4d3c60
[   33.188015] NIP [c000ec34] kprobe_handler+0x128/0x290
[   33.192989] LR [c000ecd8] kprobe_handler+0x1cc/0x290
[   33.197854] Call Trace:
[   33.200340] [ca4d3c30] [c000b09c] program_check_exception+0xbc/0x6fc
[   33.206590] [ca4d3c50] [c000e43c] ret_from_except_full+0x0/0x4
[   33.212392] --- interrupt: 700 at 0xe268
[   33.270401] Instruction dump:
[   33.273335] 913e0008 8122 3861 3929 9122 80010024 bb410008 
7c0803a6
[   33.280992] 38210020 4e800020 3860 4e800020 <813b> 6d2a7fe0 2f8a0008 
419e0154
[   33.288841] ---[ end trace 5b9152d4cdadd06d ]---

kprobe is not prepared to handle events in real mode and functions
running in real mode should have been blacklisted, so kprobe_handler()
can safely bail out telling 'this trap is not mine' for any trap that
happened while in real-mode.

If the trap happened with MSR_IR or MSR_DR cleared, return 0 immediately.

Reported-by: Larry Finger 
Fixes: 6cc89bad60a6 ("powerpc/kprobes: Invoke handlers directly")
Cc: sta...@vger.kernel.org
Cc: Naveen N. Rao 
Cc: Masami Hiramatsu 
Signed-off-by: Christophe Leroy 

---
v3: Also bail out if MSR_DR is cleared.

Resending v2 with a more appropriate name

v2: bailing out instead of converting real-time address to virtual and 
continuing.

The bug might have existed even before that commit from Naveen.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/kprobes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 2d27ec4feee4..9b340af02c38 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -264,6 +264,9 @@ int kprobe_handler(struct pt_regs *regs)
if (user_mode(regs))
return 0;
 
+   if (!(regs->msr & MSR_IR) || !(regs->msr & MSR_DR))
+   return 0;
+
/*
 * We don't want to be preempted for the entire
 * duration of kprobe processing
-- 
2.25.0



Re: [PATCH v2] powerpc/kprobes: Fix trap address when trap happened in real mode

2020-02-18 Thread Christophe Leroy




Le 18/02/2020 à 15:55, Naveen N. Rao a écrit :

Christophe Leroy wrote:

When a program check exception happens while MMU translation is
disabled, following Oops happens in kprobe_handler() in the following
code:

    } else if (*addr != BREAKPOINT_INSTRUCTION) {

[   33.098554] BUG: Unable to handle kernel data access on read at 
0xe268

[   33.105091] Faulting instruction address: 0xc000ec34
[   33.110010] Oops: Kernel access of bad area, sig: 11 [#1]
[   33.115348] BE PAGE_SIZE=16K PREEMPT CMPC885
[   33.119540] Modules linked in:
[   33.122591] CPU: 0 PID: 429 Comm: cat Not tainted 
5.6.0-rc1-s3k-dev-00824-g84195dc6c58a #3267

[   33.131005] NIP:  c000ec34 LR: c000ecd8 CTR: c019cab8
[   33.136002] REGS: ca4d3b58 TRAP: 0300   Not tainted  
(5.6.0-rc1-s3k-dev-00824-g84195dc6c58a)

[   33.144324] MSR:  1032   CR: 2a4d3c52  XER: 
[   33.150699] DAR: e268 DSISR: c000
[   33.150699] GPR00: c000b09c ca4d3c10 c66d0620  ca4d3c60 
 9032 
[   33.150699] GPR08: 0002  c087de44 c000afe0 c66d0ad0 
100d3dd6 fff3 
[   33.150699] GPR16:  0041  ca4d3d70  
 416d 
[   33.150699] GPR24: 0004 c53b6128  e268  
c07c c07bb6fc ca4d3c60

[   33.188015] NIP [c000ec34] kprobe_handler+0x128/0x290
[   33.192989] LR [c000ecd8] kprobe_handler+0x1cc/0x290
[   33.197854] Call Trace:
[   33.200340] [ca4d3c30] [c000b09c] program_check_exception+0xbc/0x6fc
[   33.206590] [ca4d3c50] [c000e43c] ret_from_except_full+0x0/0x4
[   33.212392] --- interrupt: 700 at 0xe268
[   33.270401] Instruction dump:
[   33.273335] 913e0008 8122 3861 3929 9122 80010024 
bb410008 7c0803a6
[   33.280992] 38210020 4e800020 3860 4e800020 <813b> 6d2a7fe0 
2f8a0008 419e0154

[   33.288841] ---[ end trace 5b9152d4cdadd06d ]---

kprobe is not prepared to handle events in real mode and functions
running in real mode should have been blacklisted, so kprobe_handler()
can safely bail out telling 'this trap is not mine' for any trap that
happened while in real-mode.

If the trap happened with MSR_IR cleared, return 0 immediately.

Reported-by: Larry Finger 
Fixes: 6cc89bad60a6 ("powerpc/kprobes: Invoke handlers directly")
Cc: sta...@vger.kernel.org
Cc: Naveen N. Rao 
Cc: Masami Hiramatsu 
Signed-off-by: Christophe Leroy 

---
v2: bailing out instead of converting real-time address to virtual and 
continuing.


The bug might have existed even before that commit from Naveen.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/kprobes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/kprobes.c 
b/arch/powerpc/kernel/kprobes.c

index 2d27ec4feee4..673f349662e8 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -264,6 +264,9 @@ int kprobe_handler(struct pt_regs *regs)
 if (user_mode(regs))
 return 0;

+    if (!(regs->msr & MSR_IR))
+    return 0;
+


Should we also check for MSR_DR? Are there scenarios with ppc32 where 
MSR_IR is on, but MSR_DR is off?


Yes indeed.

Christophe


Re: [PATCH v7 08/12] parisc/perf: open access for CAP_PERFMON privileged process

2020-02-18 Thread James Morris
On Mon, 17 Feb 2020, Alexey Budankov wrote:

> For backward compatibility reasons access to the monitoring remains
> open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
> for secure monitoring is discouraged with respect to CAP_PERFMON
> capability.
> 
> Signed-off-by: Alexey Budankov 


Reviewed-by: James Morris 

-- 
James Morris




Re: [PATCH v7 07/12] powerpc/perf: open access for CAP_PERFMON privileged process

2020-02-18 Thread James Morris
On Mon, 17 Feb 2020, Alexey Budankov wrote:

> For backward compatibility reasons access to the monitoring remains
> open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage
> for secure monitoring is discouraged with respect to CAP_PERFMON
> capability.
> 
> Signed-off-by: Alexey Budankov 


Reviewed-by: James Morris 


-- 
James Morris




Re: [PATCH v7 06/12] trace/bpf_trace: open access for CAP_PERFMON privileged process

2020-02-18 Thread James Morris
On Mon, 17 Feb 2020, Alexey Budankov wrote:

> 
> Open access to bpf_trace monitoring for CAP_PERFMON privileged process.
> Providing the access under CAP_PERFMON capability singly, without the
> rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
> credentials and makes operation more secure.
> 
> CAP_PERFMON implements the principal of least privilege for performance
> monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
> principle of least privilege: A security design principle that states
> that a process or program be granted only those privileges (e.g.,
> capabilities) necessary to accomplish its legitimate function, and only
> for the time that such privileges are actually required)
> 
> For backward compatibility reasons access to bpf_trace monitoring
> remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
> usage for secure bpf_trace monitoring is discouraged with respect to
> CAP_PERFMON capability.
> 
> Signed-off-by: Alexey Budankov 


Reviewed-by: James Morris 


-- 
James Morris




Re: [PATCH v7 05/12] drm/i915/perf: open access for CAP_PERFMON privileged process

2020-02-18 Thread James Morris
On Mon, 17 Feb 2020, Alexey Budankov wrote:

> 
> Open access to i915_perf monitoring for CAP_PERFMON privileged process.
> Providing the access under CAP_PERFMON capability singly, without the
> rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
> credentials and makes operation more secure.
> 
> CAP_PERFMON implements the principal of least privilege for performance
> monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
> principle of least privilege: A security design principle that states
> that a process or program be granted only those privileges (e.g.,
> capabilities) necessary to accomplish its legitimate function, and only
> for the time that such privileges are actually required)
> 
> For backward compatibility reasons access to i915_events subsystem
> remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
> usage for secure i915_events monitoring is discouraged with respect to
> CAP_PERFMON capability.
> 
> Signed-off-by: Alexey Budankov 


Reviewed-by: James Morris 


-- 
James Morris




Re: [PATCH v7 04/12] perf tool: extend Perf tool with CAP_PERFMON capability support

2020-02-18 Thread James Morris
On Mon, 17 Feb 2020, Alexey Budankov wrote:

> 
> Extend error messages to mention CAP_PERFMON capability as an option
> to substitute CAP_SYS_ADMIN capability for secure system performance
> monitoring and observability. Make perf_event_paranoid_check() and
> __cmd_ftrace() to be aware of CAP_PERFMON capability.
> 
> CAP_PERFMON implements the principal of least privilege for performance
> monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
> principle of least privilege: A security design principle that states
> that a process or program be granted only those privileges (e.g.,
> capabilities) necessary to accomplish its legitimate function, and only
> for the time that such privileges are actually required)
> 
> For backward compatibility reasons access to perf_events subsystem
> remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
> usage for secure perf_events monitoring is discouraged with respect to
> CAP_PERFMON capability.
> 
> Signed-off-by: Alexey Budankov 


Reviewed-by: James Morris 


-- 
James Morris




Re: [PATCH v7 03/12] perf/core: open access to probes for CAP_PERFMON privileged process

2020-02-18 Thread James Morris
On Mon, 17 Feb 2020, Alexey Budankov wrote:

> 
> Open access to monitoring via kprobes and uprobes and eBPF tracing for
> CAP_PERFMON privileged process. Providing the access under CAP_PERFMON
> capability singly, without the rest of CAP_SYS_ADMIN credentials,
> excludes chances to misuse the credentials and makes operation more
> secure.
> 
> perf kprobes and uprobes are used by ftrace and eBPF. perf probe uses
> ftrace to define new kprobe events, and those events are treated as
> tracepoint events. eBPF defines new probes via perf_event_open interface
> and then the probes are used in eBPF tracing.
> 
> CAP_PERFMON implements the principal of least privilege for performance
> monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
> principle of least privilege: A security design principle that states
> that a process or program be granted only those privileges (e.g.,
> capabilities) necessary to accomplish its legitimate function, and only
> for the time that such privileges are actually required)
> 
> For backward compatibility reasons access to perf_events subsystem
> remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
> usage for secure perf_events monitoring is discouraged with respect to
> CAP_PERFMON capability.
> 
> Signed-off-by: Alexey Budankov 


Reviewed-by: James Morris 


-- 
James Morris




Re: [PATCH v7 02/12] perf/core: open access to the core for CAP_PERFMON privileged process

2020-02-18 Thread James Morris
On Mon, 17 Feb 2020, Alexey Budankov wrote:

> 
> Open access to monitoring of kernel code, cpus, tracepoints and
> namespaces data for a CAP_PERFMON privileged process. Providing the
> access under CAP_PERFMON capability singly, without the rest of
> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials
> and makes operation more secure.
> 
> CAP_PERFMON implements the principal of least privilege for performance
> monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
> principle of least privilege: A security design principle that states
> that a process or program be granted only those privileges (e.g.,
> capabilities) necessary to accomplish its legitimate function, and only
> for the time that such privileges are actually required)
> 
> For backward compatibility reasons access to perf_events subsystem
> remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
> usage for secure perf_events monitoring is discouraged with respect to
> CAP_PERFMON capability.
> 
> Signed-off-by: Alexey Budankov 


Reviewed-by: James Morris 


-- 
James Morris




Re: [PATCH v7 01/12] capabilities: introduce CAP_PERFMON to kernel and user space

2020-02-18 Thread James Morris
On Mon, 17 Feb 2020, Alexey Budankov wrote:

> 
> Introduce CAP_PERFMON capability designed to secure system performance
> monitoring and observability operations so that CAP_PERFMON would assist
> CAP_SYS_ADMIN capability in its governing role for performance
> monitoring and observability subsystems.


Acked-by: James Morris 


-- 
James Morris




Re: [PATCH v7 01/12] capabilities: introduce CAP_PERFMON to kernel and user space

2020-02-18 Thread Stephen Smalley

On 2/17/20 3:06 AM, Alexey Budankov wrote:


Introduce CAP_PERFMON capability designed to secure system performance
monitoring and observability operations so that CAP_PERFMON would assist
CAP_SYS_ADMIN capability in its governing role for performance
monitoring and observability subsystems.

CAP_PERFMON hardens system security and integrity during performance
monitoring and observability operations by decreasing attack surface
that is available to a CAP_SYS_ADMIN privileged process [2]. Providing
the access to system performance monitoring and observability operations
under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN
credentials, excludes chances to misuse the credentials and makes the
operation more secure. Thus, CAP_PERFMON implements the principal of
least privilege for performance monitoring and observability operations
(POSIX IEEE 1003.1e: 2.2.2.39 principle of least privilege: A security
design principle that states that a process or program be granted only
those privileges (e.g., capabilities) necessary to accomplish its
legitimate function, and only for the time that such privileges are
actually required)

CAP_PERFMON meets the demand to secure system performance monitoring and
observability operations for adoption in security sensitive, restricted,
multiuser production environments (e.g. HPC clusters, cloud and virtual
compute environments), where root or CAP_SYS_ADMIN credentials are not
available to mass users of a system, and securely unblocks accessibility
of system performance monitoring and observability operations beyond
the root and CAP_SYS_ADMIN use cases.

CAP_PERFMON takes over CAP_SYS_ADMIN credentials related to system
performance monitoring and observability operations and balances amount
of CAP_SYS_ADMIN credentials following the recommendations in the
capabilities man page [1] for CAP_SYS_ADMIN: "Note: this capability is
overloaded; see Notes to kernel developers, below." For backward
compatibility reasons access to system performance monitoring and
observability subsystems of the kernel remains open for CAP_SYS_ADMIN
privileged processes but CAP_SYS_ADMIN usage for secure system
performance monitoring and observability operations is discouraged with
respect to the designed CAP_PERFMON capability.

Although the software running under CAP_PERFMON can not ensure avoidance
of related hardware issues, the software can still mitigate these issues
following the official hardware issues mitigation procedure [2].
The bugs in the software itself can be fixed following the standard
kernel development process [3] to maintain and harden security of system
performance monitoring and observability operations.

[1] http://man7.org/linux/man-pages/man7/capabilities.7.html
[2] 
https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html
[3] https://www.kernel.org/doc/html/latest/admin-guide/security-bugs.html

Signed-off-by: Alexey Budankov 


Acked-by: Stephen Smalley 

[...]


Re: MCE handler gets NIP wrong on MPC8378

2020-02-18 Thread Christophe Leroy




Le 18/02/2020 à 18:07, Radu Rendec a écrit :

Hi Everyone,

The saved NIP seems to be broken inside machine_check_exception() on
MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
but I have seen other weird values.

I've been able to track down the entry code to head_32.S (vector 0x200),
but I'm not sure where/how the NIP value (where the exception occurred)
is captured.


NIP value is supposed to come from SRR0, loaded in r12 in PROLOG_2 and 
saved into _NIP(r11) in transfer_to_handler in entry_32.S


Can something clobber r12 at some point ?

Maybe add the following at some place to trap when it happens ?

tweqi r12, 0x900

If you put it just after reading SRR0, and just before writing into 
NIP(r11), you'll see if its wrong from the begining or if it is 
overwriten later.


Christophe



I also noticed most of the code has moved to head_32.h in newer kernel
versions, but EXCEPTION_PROLOG_1 and EXCEPTION_PROLOG_2 look pretty much
the same. I guess the same thing happens on a recent kernel, even though
I don't have an easy way to test it.

The original MCE that I see is triggered by a failed PCIe transaction,
but I was able to reproduce it by just reading from a (physically)
unmapped memory area. Sample code and kernel crash dump are included
below.

Can anyone please provide any suggestion as to what to look at next?

Thanks,
Radu

8<

#include 
#include 
#include 

static void __iomem *bad_addr_base;

static int __init test_mce_init(void)
{
 unsigned int x;

 bad_addr_base = ioremap(0xf000, 0x100);

 if (bad_addr_base) {
 __asm__ __volatile__ ("isync");
 x = ioread32(bad_addr_base);
 pr_info("Test: %#0x\n", x);
 } else
 pr_err("Cannot map\n");

 return 0;
}

static void __exit test_mce_exit(void)
{
 iounmap(bad_addr_base);
}

module_init(test_mce_init);
module_exit(test_mce_exit);

MODULE_LICENSE("GPL");

8<

[   14.977053] mce: loading out-of-tree module taints kernel.
[   15.004285] Disabling lock debugging due to kernel taint
[   15.026151] Machine check in kernel mode.
[   15.030153] Caused by (from SRR1=41000): [   15.033982] Transfer
error ack signal
[   15.037652] Oops: Machine check 1, sig: 7 [#1]
[   15.042088] PREEMPT [   15.044091] MPC8378_CUSTOM
[   15.046967] Modules linked in: mce(O+) iptable_filter ip_tables
x_tables ipv6 mpc8xxx_wdt yaffs spidev spi_fsl_spi spi_fsl_lib
spi_fsl_cpm fsl_mph_dr_of ehci_fsl ehci_hcd
[   15.067486] CPU: 0 PID: 1213 Comm: insmod Tainted: G   M C O
4.9.191-default-mpc8378-p3c692a64ae1d #31
[   15.078175] task: 9e83e550 task.stack: 9ea2e000
[   15.082699] NIP: 0900 LR: b147e030 CTR: 80015d50
[   15.087659] REGS: 9ea2fca0 TRAP: 0200   Tainted: G   M C O
(4.9.191-default-mpc8378-p3c692a64ae1d)
[   15.098084] MSR: 00041000 [   15.100973]   CR: 42002228  XER: 
[   15.104976] DAR: 80017414 DSISR: 
GPR00: b147e030 9ea2fd50 9e83e550  b148 9c652200 9ea2fd18 
GPR08: 9c652200  b148 1032 80015d50 100b93d0 b147e308 805eb3d8
GPR16: 003a 0550 b1473b5c b147c2a4 8048e444 80082b08  b147c0e8
GPR24: 0124 0578   b147c0a0 b147e000 9eb7c280 b147c0a0
NIP [0900] 0x900
[   15.139310] LR [b147e030] test_mce_init+0x30/0xa8 [mce]
[   15.144528] Call Trace:
[   15.146973] [9ea2fd50] [b147e000] test_mce_init+0x0/0xa8 [mce] (unreliable)
[   15.153940] [9ea2fd60] [b147e030] test_mce_init+0x30/0xa8 [mce]
[   15.159864] [9ea2fd70] [80003ac4] do_one_initcall+0xbc/0x184
[   15.165527] [9ea2fde0] [804857e8] do_init_module+0x64/0x1e4
[   15.171107] [9ea2fe00] [80086014] load_module+0x1c78/0x2268
[   15.176680] [9ea2fec0] [80086780] SyS_init_module+0x17c/0x190
[   15.182433] [9ea2ff40] [80010acc] ret_from_syscall+0x0/0x38
[   15.188005] --- interrupt: c01 at 0xfdfdb64
[   15.188005] LR = 0x10013c64
[   15.195309] Instruction dump:
[   15.198274]      
 
[   15.206047]     7d5043a6 
7d400026 
[   15.213824] ---[ end trace d38922938e009d45 ]---
[   15.218434]
[   16.219951] Kernel panic - not syncing: Fatal exception
[   16.225174] Rebooting in 1 seconds..



MCE handler gets NIP wrong on MPC8378

2020-02-18 Thread Radu Rendec
Hi Everyone,

The saved NIP seems to be broken inside machine_check_exception() on
MPC8378, running Linux 4.9.191. The value is 0x900 most of the times,
but I have seen other weird values.

I've been able to track down the entry code to head_32.S (vector 0x200),
but I'm not sure where/how the NIP value (where the exception occurred)
is captured.

I also noticed most of the code has moved to head_32.h in newer kernel
versions, but EXCEPTION_PROLOG_1 and EXCEPTION_PROLOG_2 look pretty much
the same. I guess the same thing happens on a recent kernel, even though
I don't have an easy way to test it.

The original MCE that I see is triggered by a failed PCIe transaction,
but I was able to reproduce it by just reading from a (physically)
unmapped memory area. Sample code and kernel crash dump are included
below.

Can anyone please provide any suggestion as to what to look at next?

Thanks,
Radu

8<

#include 
#include 
#include 

static void __iomem *bad_addr_base;

static int __init test_mce_init(void)
{
unsigned int x;

bad_addr_base = ioremap(0xf000, 0x100);

if (bad_addr_base) {
__asm__ __volatile__ ("isync");
x = ioread32(bad_addr_base);
pr_info("Test: %#0x\n", x);
} else
pr_err("Cannot map\n");

return 0;
}

static void __exit test_mce_exit(void)
{
iounmap(bad_addr_base);
}

module_init(test_mce_init);
module_exit(test_mce_exit);

MODULE_LICENSE("GPL");

8<

[   14.977053] mce: loading out-of-tree module taints kernel.
[   15.004285] Disabling lock debugging due to kernel taint
[   15.026151] Machine check in kernel mode.
[   15.030153] Caused by (from SRR1=41000): [   15.033982] Transfer
error ack signal
[   15.037652] Oops: Machine check 1, sig: 7 [#1]
[   15.042088] PREEMPT [   15.044091] MPC8378_CUSTOM
[   15.046967] Modules linked in: mce(O+) iptable_filter ip_tables
x_tables ipv6 mpc8xxx_wdt yaffs spidev spi_fsl_spi spi_fsl_lib
spi_fsl_cpm fsl_mph_dr_of ehci_fsl ehci_hcd
[   15.067486] CPU: 0 PID: 1213 Comm: insmod Tainted: G   M C O
4.9.191-default-mpc8378-p3c692a64ae1d #31
[   15.078175] task: 9e83e550 task.stack: 9ea2e000
[   15.082699] NIP: 0900 LR: b147e030 CTR: 80015d50
[   15.087659] REGS: 9ea2fca0 TRAP: 0200   Tainted: G   M C O
(4.9.191-default-mpc8378-p3c692a64ae1d)
[   15.098084] MSR: 00041000 [   15.100973]   CR: 42002228  XER: 
[   15.104976] DAR: 80017414 DSISR: 
GPR00: b147e030 9ea2fd50 9e83e550  b148 9c652200 9ea2fd18 
GPR08: 9c652200  b148 1032 80015d50 100b93d0 b147e308 805eb3d8
GPR16: 003a 0550 b1473b5c b147c2a4 8048e444 80082b08  b147c0e8
GPR24: 0124 0578   b147c0a0 b147e000 9eb7c280 b147c0a0
NIP [0900] 0x900
[   15.139310] LR [b147e030] test_mce_init+0x30/0xa8 [mce]
[   15.144528] Call Trace:
[   15.146973] [9ea2fd50] [b147e000] test_mce_init+0x0/0xa8 [mce] (unreliable)
[   15.153940] [9ea2fd60] [b147e030] test_mce_init+0x30/0xa8 [mce]
[   15.159864] [9ea2fd70] [80003ac4] do_one_initcall+0xbc/0x184
[   15.165527] [9ea2fde0] [804857e8] do_init_module+0x64/0x1e4
[   15.171107] [9ea2fe00] [80086014] load_module+0x1c78/0x2268
[   15.176680] [9ea2fec0] [80086780] SyS_init_module+0x17c/0x190
[   15.182433] [9ea2ff40] [80010acc] ret_from_syscall+0x0/0x38
[   15.188005] --- interrupt: c01 at 0xfdfdb64
[   15.188005] LR = 0x10013c64
[   15.195309] Instruction dump:
[   15.198274]      
 
[   15.206047]     7d5043a6 
7d400026 
[   15.213824] ---[ end trace d38922938e009d45 ]---
[   15.218434]
[   16.219951] Kernel panic - not syncing: Fatal exception
[   16.225174] Rebooting in 1 seconds..


Re: [PATCH 2/2] powerpc: avoid adjusting memory_limit for capture kernel memory reservation

2020-02-18 Thread Michal Suchánek
On Fri, Jun 28, 2019 at 12:51:19AM +0530, Hari Bathini wrote:
> Currently, if memory_limit is specified and it overlaps with memory to
> be reserved for capture kernel, memory_limit is adjusted to accommodate
> capture kernel. With memory reservation for capture kernel moved later
> (after enforcing memory limit), this adjustment no longer holds water.
> So, avoid adjusting memory_limit and error out instead.

The adjustment of memory limit does not look quite sound
 - There is no code to undo the adjustment in case reservation fails
 - I don't think reservation is still forced to the end of memory
   causing the kernel to use memory it was supposed not to
 - The CMA reservation again causes teh reserved memory to be used
 - Finally the CMA reservation makes this obsolete because the reserved
   memory is can be used by the system

> 
> Signed-off-by: Hari Bathini 
Reviewed-by: Michal Suchanek 
> ---
>  arch/powerpc/kernel/fadump.c|   16 
>  arch/powerpc/kernel/machine_kexec.c |   22 +++---
>  2 files changed, 11 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 4eab972..a784695 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -476,22 +476,6 @@ int __init fadump_reserve_mem(void)
>  #endif
>   }
>  
> - /*
> -  * Calculate the memory boundary.
> -  * If memory_limit is less than actual memory boundary then reserve
> -  * the memory for fadump beyond the memory_limit and adjust the
> -  * memory_limit accordingly, so that the running kernel can run with
> -  * specified memory_limit.
> -  */
> - if (memory_limit && memory_limit < memblock_end_of_DRAM()) {
> - size = get_fadump_area_size();
> - if ((memory_limit + size) < memblock_end_of_DRAM())
> - memory_limit += size;
> - else
> - memory_limit = memblock_end_of_DRAM();
> - printk(KERN_INFO "Adjusted memory_limit for firmware-assisted"
> - " dump, now %#016llx\n", memory_limit);
> - }
>   if (memory_limit)
>   memory_boundary = memory_limit;
>   else
> diff --git a/arch/powerpc/kernel/machine_kexec.c 
> b/arch/powerpc/kernel/machine_kexec.c
> index c4ed328..fc5533b 100644
> --- a/arch/powerpc/kernel/machine_kexec.c
> +++ b/arch/powerpc/kernel/machine_kexec.c
> @@ -125,10 +125,8 @@ void __init reserve_crashkernel(void)
>   crashk_res.end = crash_base + crash_size - 1;
>   }
>  
> - if (crashk_res.end == crashk_res.start) {
> - crashk_res.start = crashk_res.end = 0;
> - return;
> - }
> + if (crashk_res.end == crashk_res.start)
> + goto error_out;
>  
>   /* We might have got these values via the command line or the
>* device tree, either way sanitise them now. */
> @@ -170,15 +168,13 @@ void __init reserve_crashkernel(void)
>   if (overlaps_crashkernel(__pa(_stext), _end - _stext)) {
>   printk(KERN_WARNING
>   "Crash kernel can not overlap current kernel\n");
> - crashk_res.start = crashk_res.end = 0;
> - return;
> + goto error_out;
>   }
>  
>   /* Crash kernel trumps memory limit */
>   if (memory_limit && memory_limit <= crashk_res.end) {
> - memory_limit = crashk_res.end + 1;
> - printk("Adjusted memory limit for crashkernel, now 0x%llx\n",
> -memory_limit);
> + pr_err("Crash kernel size can't exceed memory_limit\n");
> + goto error_out;
>   }
>  
>   printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
> @@ -190,9 +186,13 @@ void __init reserve_crashkernel(void)
>   if (!memblock_is_region_memory(crashk_res.start, crash_size) ||
>   memblock_reserve(crashk_res.start, crash_size)) {
>   pr_err("Failed to reserve memory for crashkernel!\n");
> - crashk_res.start = crashk_res.end = 0;
> - return;
> + goto error_out;
>   }
> +
> + return;
> +error_out:
> + crashk_res.start = crashk_res.end = 0;
> + return;
>  }
>  
>  int overlaps_crashkernel(unsigned long start, unsigned long size)
> 


[PATCH rebased 2/2] powerpc: avoid adjusting memory_limit for capture kernel memory reservation

2020-02-18 Thread Michal Suchanek
From: Hari Bathini 

Currently, if memory_limit is specified and it overlaps with memory to
be reserved for capture kernel, memory_limit is adjusted to accommodate
capture kernel. With memory reservation for capture kernel moved later
(after enforcing memory limit), this adjustment no longer holds water.
So, avoid adjusting memory_limit and error out instead.

Signed-off-by: Hari Bathini 
Reviewed-by: Michal Suchanek 
---
 arch/powerpc/kernel/fadump.c | 16 
 arch/powerpc/kexec/core.c| 22 +++---
 2 files changed, 11 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index ff0114aeba9b..54d56ecadf1a 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -472,22 +472,6 @@ int __init fadump_reserve_mem(void)
}
}
 
-   /*
-* Calculate the memory boundary.
-* If memory_limit is less than actual memory boundary then reserve
-* the memory for fadump beyond the memory_limit and adjust the
-* memory_limit accordingly, so that the running kernel can run with
-* specified memory_limit.
-*/
-   if (memory_limit && memory_limit < memblock_end_of_DRAM()) {
-   size = get_fadump_area_size();
-   if ((memory_limit + size) < memblock_end_of_DRAM())
-   memory_limit += size;
-   else
-   memory_limit = memblock_end_of_DRAM();
-   printk(KERN_INFO "Adjusted memory_limit for firmware-assisted"
-   " dump, now %#016llx\n", memory_limit);
-   }
if (memory_limit)
mem_boundary = memory_limit;
else
diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c
index 078fe3d76feb..491e08d98c1b 100644
--- a/arch/powerpc/kexec/core.c
+++ b/arch/powerpc/kexec/core.c
@@ -126,10 +126,8 @@ void __init reserve_crashkernel(void)
crashk_res.end = crash_base + crash_size - 1;
}
 
-   if (crashk_res.end == crashk_res.start) {
-   crashk_res.start = crashk_res.end = 0;
-   return;
-   }
+   if (crashk_res.end == crashk_res.start)
+   goto error_out;
 
/* We might have got these values via the command line or the
 * device tree, either way sanitise them now. */
@@ -171,15 +169,13 @@ void __init reserve_crashkernel(void)
if (overlaps_crashkernel(__pa(_stext), _end - _stext)) {
printk(KERN_WARNING
"Crash kernel can not overlap current kernel\n");
-   crashk_res.start = crashk_res.end = 0;
-   return;
+   goto error_out;
}
 
/* Crash kernel trumps memory limit */
if (memory_limit && memory_limit <= crashk_res.end) {
-   memory_limit = crashk_res.end + 1;
-   printk("Adjusted memory limit for crashkernel, now 0x%llx\n",
-  memory_limit);
+   pr_err("Crash kernel size can't exceed memory_limit\n");
+   goto error_out;
}
 
printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
@@ -191,9 +187,13 @@ void __init reserve_crashkernel(void)
if (!memblock_is_region_memory(crashk_res.start, crash_size) ||
memblock_reserve(crashk_res.start, crash_size)) {
pr_err("Failed to reserve memory for crashkernel!\n");
-   crashk_res.start = crashk_res.end = 0;
-   return;
+   goto error_out;
}
+
+   return;
+error_out:
+   crashk_res.start = crashk_res.end = 0;
+   return;
 }
 
 int overlaps_crashkernel(unsigned long start, unsigned long size)
-- 
2.23.0



[PATCH rebased 1/2] powerpc: reserve memory for capture kernel after hugepages init

2020-02-18 Thread Michal Suchanek
From: Hari Bathini 

Sometimes, memory reservation for KDump/FADump can overlap with memory
marked for hugepages. This overlap leads to error, hang in KDump case
and copy error reported by f/w in case of FADump, while trying to
capture dump. Report error while setting up memory for the capture
kernel instead of running into issues while capturing dump, by moving
KDump/FADump reservation below MMU early init and failing gracefully
when hugepages memory overlaps with capture kernel memory.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/kernel/prom.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 6620f37abe73..0f14dc9c4dab 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -735,14 +735,6 @@ void __init early_init_devtree(void *params)
if (PHYSICAL_START > MEMORY_START)
memblock_reserve(MEMORY_START, 0x8000);
reserve_kdump_trampoline();
-#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
-   /*
-* If we fail to reserve memory for firmware-assisted dump then
-* fallback to kexec based kdump.
-*/
-   if (fadump_reserve_mem() == 0)
-#endif
-   reserve_crashkernel();
early_reserve_mem();
 
/* Ensure that total memory size is page-aligned. */
@@ -781,6 +773,14 @@ void __init early_init_devtree(void *params)
 #endif
 
mmu_early_init_devtree();
+#if defined(CONFIG_FA_DUMP) || defined(CONFIG_PRESERVE_FA_DUMP)
+   /*
+* If we fail to reserve memory for firmware-assisted dump then
+* fallback to kexec based kdump.
+*/
+   if (fadump_reserve_mem() == 0)
+#endif
+   reserve_crashkernel();
 
 #ifdef CONFIG_PPC_POWERNV
/* Scan and build the list of machine check recoverable ranges */
-- 
2.23.0



Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Michal Hocko
On Tue 18-02-20 20:41:12, Sachin Sant wrote:
> 
> >> Yes, I can recreate the same problem with the patch applied on top of
> >> 5.6.0-rc2. 
> > 
> > And just to make sure. This was with 
> > http://lkml.kernel.org/r/fff0e636-4c36-ed10-281c-8cdb0687c...@virtuozzo.com
> > right?
> > 
> Yes, the same patch.
> 
> > If yes, is it possible that the specific node is somehow crippled (e.g.
> > some nodes don't have any memory and thus the allocator blows up)? In
> > other words what is the numa topology? (numactl -H)
> > 
> 
> Here is the o/p of numactl
> 
> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB

OK, so what I expected. The node0 is memory less or simply not present
at all. Fun!

Anyway, I do not think it is expected that kmalloc_node just blows up
on those nodes. The page allocator simply falls back to the closest
node. Something for kmalloc maintainers I believe.

A short summary. kmalloc_node blows up when trying to allocate from a
memory less node.

> node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
> 25 26 27 28 29 30 31
> node 1 size: 35247 MB
> node 1 free: 30907 MB
> node distances:
> node   0   1 
>   0:  10  40 
>   1:  40  10 
> # 
> 
> Thanks
> -Sachin

-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Sachin Sant


>> Yes, I can recreate the same problem with the patch applied on top of
>> 5.6.0-rc2. 
> 
> And just to make sure. This was with 
> http://lkml.kernel.org/r/fff0e636-4c36-ed10-281c-8cdb0687c...@virtuozzo.com
> right?
> 
Yes, the same patch.

> If yes, is it possible that the specific node is somehow crippled (e.g.
> some nodes don't have any memory and thus the allocator blows up)? In
> other words what is the numa topology? (numactl -H)
> 

Here is the o/p of numactl

# numactl -H
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31
node 1 size: 35247 MB
node 1 free: 30907 MB
node distances:
node   0   1 
  0:  10  40 
  1:  40  10 
# 

Thanks
-Sachin


Re: [PATCH v2] powerpc/kprobes: Fix trap address when trap happened in real mode

2020-02-18 Thread Naveen N. Rao

Christophe Leroy wrote:

When a program check exception happens while MMU translation is
disabled, following Oops happens in kprobe_handler() in the following
code:

} else if (*addr != BREAKPOINT_INSTRUCTION) {

[   33.098554] BUG: Unable to handle kernel data access on read at 0xe268
[   33.105091] Faulting instruction address: 0xc000ec34
[   33.110010] Oops: Kernel access of bad area, sig: 11 [#1]
[   33.115348] BE PAGE_SIZE=16K PREEMPT CMPC885
[   33.119540] Modules linked in:
[   33.122591] CPU: 0 PID: 429 Comm: cat Not tainted 
5.6.0-rc1-s3k-dev-00824-g84195dc6c58a #3267
[   33.131005] NIP:  c000ec34 LR: c000ecd8 CTR: c019cab8
[   33.136002] REGS: ca4d3b58 TRAP: 0300   Not tainted  
(5.6.0-rc1-s3k-dev-00824-g84195dc6c58a)
[   33.144324] MSR:  1032   CR: 2a4d3c52  XER: 
[   33.150699] DAR: e268 DSISR: c000
[   33.150699] GPR00: c000b09c ca4d3c10 c66d0620  ca4d3c60  
9032 
[   33.150699] GPR08: 0002  c087de44 c000afe0 c66d0ad0 100d3dd6 
fff3 
[   33.150699] GPR16:  0041  ca4d3d70   
416d 
[   33.150699] GPR24: 0004 c53b6128  e268  c07c 
c07bb6fc ca4d3c60
[   33.188015] NIP [c000ec34] kprobe_handler+0x128/0x290
[   33.192989] LR [c000ecd8] kprobe_handler+0x1cc/0x290
[   33.197854] Call Trace:
[   33.200340] [ca4d3c30] [c000b09c] program_check_exception+0xbc/0x6fc
[   33.206590] [ca4d3c50] [c000e43c] ret_from_except_full+0x0/0x4
[   33.212392] --- interrupt: 700 at 0xe268
[   33.270401] Instruction dump:
[   33.273335] 913e0008 8122 3861 3929 9122 80010024 bb410008 
7c0803a6
[   33.280992] 38210020 4e800020 3860 4e800020 <813b> 6d2a7fe0 2f8a0008 
419e0154
[   33.288841] ---[ end trace 5b9152d4cdadd06d ]---

kprobe is not prepared to handle events in real mode and functions
running in real mode should have been blacklisted, so kprobe_handler()
can safely bail out telling 'this trap is not mine' for any trap that
happened while in real-mode.

If the trap happened with MSR_IR cleared, return 0 immediately.

Reported-by: Larry Finger 
Fixes: 6cc89bad60a6 ("powerpc/kprobes: Invoke handlers directly")
Cc: sta...@vger.kernel.org
Cc: Naveen N. Rao 
Cc: Masami Hiramatsu 
Signed-off-by: Christophe Leroy 

---
v2: bailing out instead of converting real-time address to virtual and 
continuing.

The bug might have existed even before that commit from Naveen.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/kprobes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 2d27ec4feee4..673f349662e8 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -264,6 +264,9 @@ int kprobe_handler(struct pt_regs *regs)
if (user_mode(regs))
return 0;

+   if (!(regs->msr & MSR_IR))
+   return 0;
+


Should we also check for MSR_DR? Are there scenarios with ppc32 where 
MSR_IR is on, but MSR_DR is off?



- Naveen



Re: [PATCH 2/2] powerpc/kprobes: Reduce depth of a test

2020-02-18 Thread Naveen N. Rao

Christophe Leroy wrote:

if (a) {
if (b)
do_something();
}

Is equivalent to

if (a & b)
do_something();

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/kprobes.c | 58 +--
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 7a925eb76ec0..d7c80a078c1e 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -277,38 +277,36 @@ int kprobe_handler(struct pt_regs *regs)
 
 	/* Check we're not actually recursing */

p = get_kprobe(addr);
-   if (kprobe_running()) {
-   if (p) {
-   kprobe_opcode_t insn = *p->ainsn.insn;
-   if (kcb->kprobe_status == KPROBE_HIT_SS &&
-   is_trap(insn)) {
-   /* Turn off 'trace' bits */
-   regs->msr &= ~MSR_SINGLESTEP;
-   regs->msr |= kcb->kprobe_saved_msr;
-   goto no_kprobe;
-   }
-   /* We have reentered the kprobe_handler(), since
-* another probe was hit while within the handler.
-* We here save the original kprobes variables and
-* just single step on the instruction of the new probe
-* without calling any user handlers.
-*/
-   save_previous_kprobe(kcb);
-   set_current_kprobe(p, regs, kcb);
-   kprobes_inc_nmissed_count(p);
-   kcb->kprobe_status = KPROBE_REENTER;
-   if (p->ainsn.boostable >= 0) {
-   ret = try_to_emulate(p, regs);
-
-   if (ret > 0) {
-   restore_previous_kprobe(kcb);
-   preempt_enable_no_resched();
-   return 1;
-   }
+   if (kprobe_running() && p) {
+   kprobe_opcode_t insn = *p->ainsn.insn;
+
+   if (kcb->kprobe_status == KPROBE_HIT_SS && is_trap(insn)) {
+   /* Turn off 'trace' bits */
+   regs->msr &= ~MSR_SINGLESTEP;
+   regs->msr |= kcb->kprobe_saved_msr;
+   goto no_kprobe;
+   }
+   /* We have reentered the kprobe_handler(), since
+* another probe was hit while within the handler.
+* We here save the original kprobes variables and
+* just single step on the instruction of the new probe
+* without calling any user handlers.
+*/
+   save_previous_kprobe(kcb);
+   set_current_kprobe(p, regs, kcb);
+   kprobes_inc_nmissed_count(p);
+   kcb->kprobe_status = KPROBE_REENTER;
+   if (p->ainsn.boostable >= 0) {
+   ret = try_to_emulate(p, regs);
+
+   if (ret > 0) {
+   restore_previous_kprobe(kcb);
+   preempt_enable_no_resched();
+   return 1;
}
-   prepare_singlestep(p, regs);
-   return 1;
}
+   prepare_singlestep(p, regs);
+   return 1;
}
 


If we move the below !p case before the check for kprobe_running() right 
after get_kprobe(), we won't need to check for (p) above and we won't 
have any change in logic from Patch 1.



if (!p) {



- Naveen



Re: [PATCH 1/2] powerpc/kprobes: Remove redundant code

2020-02-18 Thread Naveen N. Rao

Christophe Leroy wrote:

At the time being we have something like

if (something) {
p = get();
if (p) {
if (something_wrong)
goto out;
...
return;
} else if (a != b) {
if (some_error)
goto out;
...
}
goto out;
}
p = get();
if (!p) {
if (a != b) {
if (some_error)
goto out;
...
}
goto out;
}

This is similar to

p = get();
if (something) {
if (p) {
if (something_wrong)
goto out;
...
return;
}
}
if (!p) {
if (a != b) {
if (some_error)
goto out;
...
}
goto out;
}

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/kprobes.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)


Good cleanup, thanks.



diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index f8b848aa65bd..7a925eb76ec0 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -276,8 +276,8 @@ int kprobe_handler(struct pt_regs *regs)
kcb = get_kprobe_ctlblk();
 
 	/* Check we're not actually recursing */

+   p = get_kprobe(addr);
if (kprobe_running()) {
-   p = get_kprobe(addr);
if (p) {
kprobe_opcode_t insn = *p->ainsn.insn;
if (kcb->kprobe_status == KPROBE_HIT_SS &&
@@ -308,22 +308,9 @@ int kprobe_handler(struct pt_regs *regs)
}
prepare_singlestep(p, regs);
return 1;
-   } else if (*addr != BREAKPOINT_INSTRUCTION) {
-   /* If trap variant, then it belongs not to us */
-   kprobe_opcode_t cur_insn = *addr;
-
-   if (is_trap(cur_insn))
-   goto no_kprobe;
-   /* The breakpoint instruction was removed by
-* another cpu right after we hit, no further
-* handling of this interrupt is appropriate
-*/
-   ret = 1;
}
-   goto no_kprobe;


A minot nit -- removing the above goto makes a slight change to the 
logic. But, see my comments for the next patch.


- Naveen


}
 
-	p = get_kprobe(addr);

if (!p) {
if (*addr != BREAKPOINT_INSTRUCTION) {
/*
--
2.25.0






[PATCH v2] powerpc/kprobes: Ignore traps that happened in real mode

2020-02-18 Thread Christophe Leroy
When a program check exception happens while MMU translation is
disabled, following Oops happens in kprobe_handler() in the following
code:

} else if (*addr != BREAKPOINT_INSTRUCTION) {

[   33.098554] BUG: Unable to handle kernel data access on read at 0xe268
[   33.105091] Faulting instruction address: 0xc000ec34
[   33.110010] Oops: Kernel access of bad area, sig: 11 [#1]
[   33.115348] BE PAGE_SIZE=16K PREEMPT CMPC885
[   33.119540] Modules linked in:
[   33.122591] CPU: 0 PID: 429 Comm: cat Not tainted 
5.6.0-rc1-s3k-dev-00824-g84195dc6c58a #3267
[   33.131005] NIP:  c000ec34 LR: c000ecd8 CTR: c019cab8
[   33.136002] REGS: ca4d3b58 TRAP: 0300   Not tainted  
(5.6.0-rc1-s3k-dev-00824-g84195dc6c58a)
[   33.144324] MSR:  1032   CR: 2a4d3c52  XER: 
[   33.150699] DAR: e268 DSISR: c000
[   33.150699] GPR00: c000b09c ca4d3c10 c66d0620  ca4d3c60  
9032 
[   33.150699] GPR08: 0002  c087de44 c000afe0 c66d0ad0 100d3dd6 
fff3 
[   33.150699] GPR16:  0041  ca4d3d70   
416d 
[   33.150699] GPR24: 0004 c53b6128  e268  c07c 
c07bb6fc ca4d3c60
[   33.188015] NIP [c000ec34] kprobe_handler+0x128/0x290
[   33.192989] LR [c000ecd8] kprobe_handler+0x1cc/0x290
[   33.197854] Call Trace:
[   33.200340] [ca4d3c30] [c000b09c] program_check_exception+0xbc/0x6fc
[   33.206590] [ca4d3c50] [c000e43c] ret_from_except_full+0x0/0x4
[   33.212392] --- interrupt: 700 at 0xe268
[   33.270401] Instruction dump:
[   33.273335] 913e0008 8122 3861 3929 9122 80010024 bb410008 
7c0803a6
[   33.280992] 38210020 4e800020 3860 4e800020 <813b> 6d2a7fe0 2f8a0008 
419e0154
[   33.288841] ---[ end trace 5b9152d4cdadd06d ]---

kprobe is not prepared to handle events in real mode and functions
running in real mode should have been blacklisted, so kprobe_handler()
can safely bail out telling 'this trap is not mine' for any trap that
happened while in real-mode.

If the trap happened with MSR_IR cleared, return 0 immediately.

Reported-by: Larry Finger 
Fixes: 6cc89bad60a6 ("powerpc/kprobes: Invoke handlers directly")
Cc: sta...@vger.kernel.org
Cc: Naveen N. Rao 
Cc: Masami Hiramatsu 
Signed-off-by: Christophe Leroy 

---
Resending v2 with a more appropriate name

v2: bailing out instead of converting real-time address to virtual and 
continuing.

The bug might have existed even before that commit from Naveen.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/kprobes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 2d27ec4feee4..673f349662e8 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -264,6 +264,9 @@ int kprobe_handler(struct pt_regs *regs)
if (user_mode(regs))
return 0;
 
+   if (!(regs->msr & MSR_IR))
+   return 0;
+
/*
 * We don't want to be preempted for the entire
 * duration of kprobe processing
-- 
2.25.0



[PATCH v2] powerpc/kprobes: Fix trap address when trap happened in real mode

2020-02-18 Thread Christophe Leroy
When a program check exception happens while MMU translation is
disabled, following Oops happens in kprobe_handler() in the following
code:

} else if (*addr != BREAKPOINT_INSTRUCTION) {

[   33.098554] BUG: Unable to handle kernel data access on read at 0xe268
[   33.105091] Faulting instruction address: 0xc000ec34
[   33.110010] Oops: Kernel access of bad area, sig: 11 [#1]
[   33.115348] BE PAGE_SIZE=16K PREEMPT CMPC885
[   33.119540] Modules linked in:
[   33.122591] CPU: 0 PID: 429 Comm: cat Not tainted 
5.6.0-rc1-s3k-dev-00824-g84195dc6c58a #3267
[   33.131005] NIP:  c000ec34 LR: c000ecd8 CTR: c019cab8
[   33.136002] REGS: ca4d3b58 TRAP: 0300   Not tainted  
(5.6.0-rc1-s3k-dev-00824-g84195dc6c58a)
[   33.144324] MSR:  1032   CR: 2a4d3c52  XER: 
[   33.150699] DAR: e268 DSISR: c000
[   33.150699] GPR00: c000b09c ca4d3c10 c66d0620  ca4d3c60  
9032 
[   33.150699] GPR08: 0002  c087de44 c000afe0 c66d0ad0 100d3dd6 
fff3 
[   33.150699] GPR16:  0041  ca4d3d70   
416d 
[   33.150699] GPR24: 0004 c53b6128  e268  c07c 
c07bb6fc ca4d3c60
[   33.188015] NIP [c000ec34] kprobe_handler+0x128/0x290
[   33.192989] LR [c000ecd8] kprobe_handler+0x1cc/0x290
[   33.197854] Call Trace:
[   33.200340] [ca4d3c30] [c000b09c] program_check_exception+0xbc/0x6fc
[   33.206590] [ca4d3c50] [c000e43c] ret_from_except_full+0x0/0x4
[   33.212392] --- interrupt: 700 at 0xe268
[   33.270401] Instruction dump:
[   33.273335] 913e0008 8122 3861 3929 9122 80010024 bb410008 
7c0803a6
[   33.280992] 38210020 4e800020 3860 4e800020 <813b> 6d2a7fe0 2f8a0008 
419e0154
[   33.288841] ---[ end trace 5b9152d4cdadd06d ]---

kprobe is not prepared to handle events in real mode and functions
running in real mode should have been blacklisted, so kprobe_handler()
can safely bail out telling 'this trap is not mine' for any trap that
happened while in real-mode.

If the trap happened with MSR_IR cleared, return 0 immediately.

Reported-by: Larry Finger 
Fixes: 6cc89bad60a6 ("powerpc/kprobes: Invoke handlers directly")
Cc: sta...@vger.kernel.org
Cc: Naveen N. Rao 
Cc: Masami Hiramatsu 
Signed-off-by: Christophe Leroy 

---
v2: bailing out instead of converting real-time address to virtual and 
continuing.

The bug might have existed even before that commit from Naveen.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/kprobes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 2d27ec4feee4..673f349662e8 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -264,6 +264,9 @@ int kprobe_handler(struct pt_regs *regs)
if (user_mode(regs))
return 0;
 
+   if (!(regs->msr & MSR_IR))
+   return 0;
+
/*
 * We don't want to be preempted for the entire
 * duration of kprobe processing
-- 
2.25.0



Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Michal Hocko
On Tue 18-02-20 19:30:33, Sachin Sant wrote:
> 
> 
> > On 18-Feb-2020, at 5:25 PM, Michal Hocko  wrote:
> > 
> > On Tue 18-02-20 17:10:47, Sachin Sant wrote:
> >> 
>  could you please test your boot with original patch from here:
>  
>  https://patchwork.kernel.org/patch/11360007/
> >>> 
> >>> After you tried the above patch instead of the problem patch,
> >>> do one more test and apply the below on current linux-next.
> >>> Please, say which of the patches makes your kernel bootable again.
> >>> 
> >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>> index 63bb6a2aab81..7b9b48dcbc60 100644
> >>> --- a/mm/memcontrol.c
> >>> +++ b/mm/memcontrol.c
> >>> @@ -334,7 +334,7 @@ static int memcg_expand_one_shrinker_map(struct 
> >>> mem_cgroup *memcg,
> >>>   if (!old)
> >>>   return 0;
> >>> 
> >>> - new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> >>> + new = kmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> >>>   if (!new)
> >>>   return -ENOMEM;
> >>> 
> >>> @@ -378,7 +378,7 @@ static int memcg_alloc_shrinker_maps(struct 
> >>> mem_cgroup *memcg)
> >>>   mutex_lock(_shrinker_map_mutex);
> >>>   size = memcg_shrinker_map_size;
> >>>   for_each_node(nid) {
> >>> - map = kvzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
> >>> + map = kzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
> >>>   if (!map) {
> >>>   memcg_free_shrinker_maps(memcg);
> >>>   ret = -ENOMEM;
> >> 
> >> With this incremental patch applied on top of current linux-next, machine 
> >> fails to boot
> > 
> > Your calltrace points to a standard system call path. I do not see any
> > reason why that commit should cause any problems. Do you see the
> > same when applying the patch you managed to bisect to on top of Linus
> > tree? Just to rule out any other potential problems in linux-next?
> 
> Yes, I can recreate the same problem with the patch applied on top of
> 5.6.0-rc2. 

And just to make sure. This was with 
http://lkml.kernel.org/r/fff0e636-4c36-ed10-281c-8cdb0687c...@virtuozzo.com
right?

If yes, is it possible that the specific node is somehow crippled (e.g.
some nodes don't have any memory and thus the allocator blows up)? In
other words what is the numa topology? (numactl -H)

> CONFIG_SLUB is enabled in my case. I have attached the .config.
> The LPAR has 34GB of memory allocated.
> 
> [8.766078] BUG: Kernel NULL pointer dereference on read at 0x73b0
> [8.766083] Faulting instruction address: 0xc03d38a4
> [8.766089] Oops: Kernel access of bad area, sig: 11 [#1]
> [8.766093] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [8.766098] Modules linked in:
> [8.766103] CPU: 12 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-autotest+ #2
> [8.766107] NIP:  c03d38a4 LR: c03d3e44 CTR: 
> 
> [8.766113] REGS: c008b37836e0 TRAP: 0300   Not tainted  
> (5.6.0-rc2-autotest+)
> [8.766118] MSR:  80009033   CR: 24004844  
> XER: 
> [8.766125] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
> IRQMASK: 1 
> [8.766125] GPR00: c03d3e44 c008b3783970 c155d500 
> c008b301f500 
> [8.766125] GPR04: 0dc0 0002 c03443f8 
> c008bac98620 
> [8.766125] GPR08: 0008b9bf 0001  
>  
> [8.766125] GPR12: 24004844 c0001ec5d200  
>  
> [8.766125] GPR16: c7be2048 c1595818 c1750c98 
> 0002 
> [8.766125] GPR20: c1750ca8 c1624470 000fffe0 
> 5deadbeef122 
> [8.766125] GPR24: 0001 0dc0 0002 
> c03443f8 
> [8.766125] GPR28: c008b301f500 c008bac98620  
> c00c02286fc0 
> [8.766172] NIP [c03d38a4] ___slab_alloc+0x1f4/0x760
> [8.766177] LR [c03d3e44] __slab_alloc+0x34/0x60
> [8.766181] Call Trace:
> [8.766184] [c008b3783970] [c03d39e4] 
> ___slab_alloc+0x334/0x760 (unreliable)
> [8.766191] [c008b3783a50] [c03d3e44] __slab_alloc+0x34/0x60
> [8.766196] [c008b3783a80] [c03d5250] 
> __kmalloc_node+0x110/0x490
> [8.766203] [c008b3783b00] [c03443f8] kvmalloc_node+0x58/0x110
> [8.766208] [c008b3783b40] [c03fcf58] 
> mem_cgroup_css_online+0x108/0x270
> [8.766215] [c008b3783ba0] [c0236078] online_css+0x48/0xd0
> [8.766220] [c008b3783bd0] [c023eebc] 
> cgroup_apply_control_enable+0x2ec/0x4d0
> [8.766226] [c008b3783cb0] [c0242728] cgroup_mkdir+0x228/0x5f0
> [8.766232] [c008b3783d20] [c051ab48] 
> kernfs_iop_mkdir+0xb8/0x170
> [8.766238] [c008b3783d50] [c043a7c0] vfs_mkdir+0x110/0x230
> [8.766243] 

[PATCH] powerpc/entry: Fix a #if which should be a #ifdef in entry_32.S

2020-02-18 Thread Christophe Leroy
Fixes: 12c3f1fd87bf ("powerpc/32s: get rid of CPU_FTR_601 feature")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 0713daa651d9..b38e6b549d48 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -783,7 +783,7 @@ fast_exception_return:
 1: lis r3,exc_exit_restart_end@ha
addir3,r3,exc_exit_restart_end@l
cmplw   r12,r3
-#if CONFIG_PPC_BOOK3S_601
+#ifdef CONFIG_PPC_BOOK3S_601
bge 2b
 #else
bge 3f
@@ -791,7 +791,7 @@ fast_exception_return:
lis r4,exc_exit_restart@ha
addir4,r4,exc_exit_restart@l
cmplw   r12,r4
-#if CONFIG_PPC_BOOK3S_601
+#ifdef CONFIG_PPC_BOOK3S_601
blt 2b
 #else
blt 3f
-- 
2.25.0



Re: [PATCH] powerpc/kprobes: Fix trap address when trap happened in real mode

2020-02-18 Thread Naveen N. Rao

Masami, Christophe,
Apologies for pitching in late here...

Masami Hiramatsu wrote:

On Tue, 18 Feb 2020 12:04:41 +0100
Christophe Leroy  wrote:


>> Nevertheless, if one symbol has been forgotten in the blacklist, I think
>> it is a problem if it generate Oopses.
> 
> There is a long history also on x86 to make a blacklist. Anyway, how did

> you get this error on PPC32? Somewhere would you like to probe and
> it is a real mode function? Or, it happened unexpectedly?

The first Oops I got was triggered by a WARN_ON() kind of trap in real 
mode. The trap exception handler called kprobe_handler() which tried to 
read the instruction at the trap address (which was a real-mode address) 
so it triggered a Bad Access Fault.


This was initially the purpose of my patch.


OK, then filtering the trap reason in kprobe handler is a bit strange.
It should be done in the previous stage (maybe in trap.c)
Can we filter it by exception flag or only by checking the instruction
which causes the exception, or needs get_kprobe()...?


I think Masami's earlier patch proposal to bail out early from 
kprobe_handler() is appropriate here. We don't support kprobe in real 
mode since we don't have a way to ensure that the pre/post handlers work 
properly.


We will obviously also have to blacklist some of the real mode code from 
being probed to begin with. In addition, we will also have to blacklist 
any location where we can't take a trap (MSR_RI being unset, as an 
example)


Christophe,
See some of the below patch series:
https://patchwork.ozlabs.org/patch/752336/
https://patchwork.ozlabs.org/patch/752333/
https://patchwork.ozlabs.org/patch/782399/


- Naveen



Re: [PATCH] powerpc/kprobes: Fix trap address when trap happened in real mode

2020-02-18 Thread Christophe Leroy




Le 18/02/2020 à 13:33, Masami Hiramatsu a écrit :

On Tue, 18 Feb 2020 12:04:41 +0100
Christophe Leroy  wrote:


Nevertheless, if one symbol has been forgotten in the blacklist, I think
it is a problem if it generate Oopses.


There is a long history also on x86 to make a blacklist. Anyway, how did
you get this error on PPC32? Somewhere would you like to probe and
it is a real mode function? Or, it happened unexpectedly?


The first Oops I got was triggered by a WARN_ON() kind of trap in real
mode. The trap exception handler called kprobe_handler() which tried to
read the instruction at the trap address (which was a real-mode address)
so it triggered a Bad Access Fault.

This was initially the purpose of my patch.


OK, then filtering the trap reason in kprobe handler is a bit strange.
It should be done in the previous stage (maybe in trap.c)


See commit 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.6-rc2=6cc89bad60a673a24386f1ada83de8a068a78909



Can we filter it by exception flag or only by checking the instruction
which causes the exception, or needs get_kprobe()...?


The trap instruction used by kprobe is also used for other purposes like 
BUG_ON() or WARN_ON(), so needs get_kprobe()







After discussion with you, I started looking at what would be the effect
of setting a kprobe event in a function which runs in real mode.


If the kprobe single-stepping (or emulation) works in real mode, just
ignore the kprobes pre/post_handlers and increment nmissed count.

If that doesn't work, we have to call a BUG_ON, because we can not
continue the code execution. And also, you have to find a way to make
a blacklist for real mode code.


Yes, it has to be done function by function (hoppefully there's not more 
than a dozen).
But I'd like something which can fails gracefully for the functions we 
will forget to mark noprobe.


But as a first step I'd really like a bug fix in 5.6 to avoid Oopsing in 
kprobe_handler() at a non-kprobe trap.


Christophe


[PATCH v2 2/2] powerpc/perf: Check pmus_inuse flag in perf_event_print_debug()

2020-02-18 Thread Madhavan Srinivasan
pmu_inuse flag is part of lppaca struct which notifies the hypervisor
whether guest/partition is using PMUs. This provides a hint for
save/restore of PMU registers. Currently perf_event_print_debug()
does not check for pmu_inuse flag and it is not safe to use it to
dump PMU SPRs in a CONFIG_PSERIES.

Patch adds two things here. 1) An inline ppc_get_pmu_inuse() to get
the pmu_inuse value and 2)check in perf_event_print_debug() before
dumping the PMU SPRs.

ppc_get_pmu_inuse() is based on ppc_set_pmu_inuse() and includes same
CONFIG_ checks.

Signed-off-by: Madhavan Srinivasan 
---
Changelog v1:
- Fixed pmac32_deconfig build break 
- Fixed errors reported by checkpatch.pl

 arch/powerpc/include/asm/pmc.h  | 15 +++
 arch/powerpc/perf/core-book3s.c |  9 +
 2 files changed, 24 insertions(+)

diff --git a/arch/powerpc/include/asm/pmc.h b/arch/powerpc/include/asm/pmc.h
index c6bbe9778d3c..600c133b49cd 100644
--- a/arch/powerpc/include/asm/pmc.h
+++ b/arch/powerpc/include/asm/pmc.h
@@ -34,11 +34,26 @@ static inline void ppc_set_pmu_inuse(int inuse)
 #endif
 }
 
+static inline u8 ppc_get_pmu_inuse(void)
+{
+#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+   if (firmware_has_feature(FW_FEATURE_LPAR)) {
+#ifdef CONFIG_PPC_PSERIES
+   return get_lppaca()->pmcregs_in_use;
+#endif
+   }
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   return get_paca()->pmcregs_in_use;
+#endif
+#endif
+}
+
 extern void power4_enable_pmcs(void);
 
 #else /* CONFIG_PPC64 */
 
 static inline void ppc_set_pmu_inuse(int inuse) { }
+static inline u8 ppc_get_pmu_inuse(void) { return 0; }
 
 #endif
 
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 6e35bf9ff80a..61d4a290b336 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -808,6 +808,15 @@ void perf_event_print_debug(void)
if (!ppmu->n_counter)
return;
 
+   /*
+* Check pmu_inuse flag. As per PAPR spec, hypersivor
+* will save/restore the PMU regs only if pmu_inuse is
+* set. If its not enable, values dumped from these SPRs
+* may not be valid or useful.
+*/
+   if (!ppc_get_pmu_inuse())
+   return;
+
local_irq_save(flags);
 
pr_info("CPU: %d PMU registers, ppmu = %s n_counters = %d",
-- 
2.21.1



[PATCH 1/2] powerpc/perf: Add mtmmcr0(FC) after ppc_set_pmu_inuse(1)

2020-02-18 Thread Madhavan Srinivasan
pmu_inuse flag is part of lppaca struct which notifies the hypervisor
whether guest/partition is using PMUs. This provides a hint incase of
save/restore of PMU registers. And in power_pmu_enable(), linux sets
the pmu_inuse flag and then updates the PMU registers. Current sequence
in power_pmu_enable() is 1) update pmc_inuse flag 2)update MMCRA, MMCR1,
MMCR0 and so on. But with this sequence, there is a window where when
updating MMCRA, hypersior could load stale value to MMCR0 which could
cause a PMI exception. Patch add a mtmmcr0 with freeze counter bit set
right after updating the pmu_inuse flag to avoid any overflow scenarios.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/core-book3s.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a934e8c8a9b8..6e35bf9ff80a 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1343,6 +1343,7 @@ static void power_pmu_enable(struct pmu *pmu)
 * Then unfreeze the events.
 */
ppc_set_pmu_inuse(1);
+   mtspr(SPRN_MMCR0, MMCR0_FC);
mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
mtspr(SPRN_MMCR1, cpuhw->mmcr[1]);
mtspr(SPRN_MMCR0, (cpuhw->mmcr[0] & ~(MMCR0_PMC1CE | MMCR0_PMCjCE))
-- 
2.21.1



Re: [PATCH] powerpc/kprobes: Fix trap address when trap happened in real mode

2020-02-18 Thread Masami Hiramatsu
On Tue, 18 Feb 2020 12:04:41 +0100
Christophe Leroy  wrote:

> >> Nevertheless, if one symbol has been forgotten in the blacklist, I think
> >> it is a problem if it generate Oopses.
> > 
> > There is a long history also on x86 to make a blacklist. Anyway, how did
> > you get this error on PPC32? Somewhere would you like to probe and
> > it is a real mode function? Or, it happened unexpectedly?
> 
> The first Oops I got was triggered by a WARN_ON() kind of trap in real 
> mode. The trap exception handler called kprobe_handler() which tried to 
> read the instruction at the trap address (which was a real-mode address) 
> so it triggered a Bad Access Fault.
> 
> This was initially the purpose of my patch.

OK, then filtering the trap reason in kprobe handler is a bit strange.
It should be done in the previous stage (maybe in trap.c)
Can we filter it by exception flag or only by checking the instruction
which causes the exception, or needs get_kprobe()...?

> After discussion with you, I started looking at what would be the effect 
> of setting a kprobe event in a function which runs in real mode.

If the kprobe single-stepping (or emulation) works in real mode, just
ignore the kprobes pre/post_handlers and increment nmissed count.

If that doesn't work, we have to call a BUG_ON, because we can not
continue the code execution. And also, you have to find a way to make
a blacklist for real mode code.

> >>
> >>> Or, some parts are possble to run under both real mode and kernel mode?
> >>
> >> I don't think so, at least on PPC32
> > 
> > OK, that's a good news. Also, are there any independent section where such
> > real mode functions are stored? (I can see start_real_trampolines in
> > sections.h) If that kind of sections are defined, it is easy to make
> > a blacklist in arch_populate_kprobe_blacklist(). See 
> > arch/arm64/kernel/probes/kprobes.c.
> 
> Part of them are in .head.text, and this section is already blacklisted 
> throught function arch_within_kprobe_blacklist()

Then, those are OK.

> 
> But there are several other functions which are not there. For instance, 
> many things within entry_32.S, and also things in hash_low.S
> On PPC64 (ie in entry_64.S) they were explicitely blacklisted with 
> _ASM_NOKPROBE_SYMBOL(). We have to do the same on PPC64

Agreed. Some of such unstable state code must not be probed.

>  So the 'program check' exception handler doesn't find the owner of the
>  trap hence generate an Oops.
> 
>  Even if we don't want kprobe() to proceed with the event entirely
>  (allthough it works at least for simple events), I'd expect it to fail
>  gracefully.
> >>>
> >>> Agreed. I thought it was easy to identify real mode code. But if it is
> >>> hard, we should apply your first patch and also skip user handlers
> >>> if we are in the real mode (and increment missed count).
> >>
> >> user handlers are already skipped.
> > 
> > Yes, if you don't put a kprobes on real mode code. However, if user
> > (accidentally) puts a probe on real mode code, it might call a
> > user handler?
> 
> Are we talking about the same thing ?

Ah, sorry about that. "user handler" here I meant was "kprobe pre/post_handler
function defined by the user of kprobes".

> 
> Only kernel code can run in real mode, so the following code at the 
> beginning of kprobe_handler() does the job ?
> 
>   if (user_mode(regs))
>   return 0;

Yes, you're right.

> >> What do you think about my latest proposal below ? If a trap is
> >> encoutered in real mode, if checks if the matching virtual address
> >> corresponds to a valid kprobe. If it is, it skips it. If not, it returns
> >> 0 to tell "it's no me". You are also talking about incrementing the
> >> missed count. Who do we do that ?
> > 
> > I rather like your first patch. If there is a kprobes, we can not skip
> > the instruction, because there is an instruction which must be executed.
> > (or single-skipped, but I'm not sure the emulator works correctly on
> > real mode)
> 
> Oops, yes of course.

Thank you,

-- 
Masami Hiramatsu 


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Michal Hocko
On Tue 18-02-20 17:10:47, Sachin Sant wrote:
> 
> >> could you please test your boot with original patch from here:
> >> 
> >> https://patchwork.kernel.org/patch/11360007/
> > 
> > After you tried the above patch instead of the problem patch,
> > do one more test and apply the below on current linux-next.
> > Please, say which of the patches makes your kernel bootable again.
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 63bb6a2aab81..7b9b48dcbc60 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -334,7 +334,7 @@ static int memcg_expand_one_shrinker_map(struct 
> > mem_cgroup *memcg,
> > if (!old)
> > return 0;
> > 
> > -   new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> > +   new = kmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> > if (!new)
> > return -ENOMEM;
> > 
> > @@ -378,7 +378,7 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup 
> > *memcg)
> > mutex_lock(_shrinker_map_mutex);
> > size = memcg_shrinker_map_size;
> > for_each_node(nid) {
> > -   map = kvzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
> > +   map = kzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
> > if (!map) {
> > memcg_free_shrinker_maps(memcg);
> > ret = -ENOMEM;
> 
> With this incremental patch applied on top of current linux-next, machine 
> fails to boot

Your calltrace points to a standard system call path. I do not see any
reason why that commit should cause any problems. Do you see the
same when applying the patch you managed to bisect to on top of Linus
tree? Just to rule out any other potential problems in linux-next?
This all smells like a corrupted slab allocator. Which allocator do
you use?

> [8.868433] BUG: Kernel NULL pointer dereference on read at 0x73b0
> [8.868439] Faulting instruction address: 0xc03d55f4
> [8.868444] Oops: Kernel access of bad area, sig: 11 [#1]
> [8.868449] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [8.868453] Modules linked in:
> [8.868458] CPU: 18 PID: 1 Comm: systemd Not tainted 
> 5.6.0-rc2-next-20200218-autotest+ #4
> [8.868463] NIP:  c03d55f4 LR: c03d5b94 CTR: 
> 
> [8.868468] REGS: c008b3783710 TRAP: 0300   Not tainted  
> (5.6.0-rc2-next-20200218-autotest+)
> [8.868474] MSR:  80009033   CR: 24004844  
> XER: 
> [8.868481] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
> IRQMASK: 1 
> [8.868481] GPR00: c03d5b94 c008b37839a0 c155d400 
> c008b301f500 
> [8.868481] GPR04: 0dc0 0002 c03fee38 
> c008bb298620 
> [8.868481] GPR08: 0008ba1f 0001  
>  
> [8.868481] GPR12: 24004844 c0001ec54200  
>  
> [8.868481] GPR16: c008a1a60048 c1595898 c1750c18 
> 0002 
> [8.868481] GPR20: c1750c28 c1624470 000fffe0 
> 5deadbeef122 
> [8.868481] GPR24: 0001 0dc0 0002 
> c03fee38 
> [8.868481] GPR28: c008b301f500 c008bb298620  
> c00c02286d00 
> [8.868529] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
> [8.868534] LR [c03d5b94] __slab_alloc+0x34/0x60
> [8.868538] Call Trace:
> [8.868541] [c008b37839a0] [c03d5734] 
> ___slab_alloc+0x334/0x760 (unreliable)
> [8.868547] [c008b3783a80] [c03d5b94] __slab_alloc+0x34/0x60
> [8.868553] [c008b3783ab0] [c03d6fa0] 
> __kmalloc_node+0x110/0x490
> [8.868559] [c008b3783b30] [c03fee38] 
> mem_cgroup_css_online+0x108/0x270
> [8.868565] [c008b3783b90] [c0235aa8] online_css+0x48/0xd0
> [8.868571] [c008b3783bc0] [c023eaec] 
> cgroup_apply_control_enable+0x2ec/0x4d0
> [8.868577] [c008b3783ca0] [c0242318] cgroup_mkdir+0x228/0x5f0
> [8.868583] [c008b3783d10] [c051e170] 
> kernfs_iop_mkdir+0x90/0xf0
> [8.868589] [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
> [8.868594] [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
> [8.868601] [c008b3783e20] [c000b278] system_call+0x5c/0x68
> [8.868605] Instruction dump:
> [8.868608] 7c421378 e95f 714a0001 4082fff0 4b64 6000 6000 
> faa10088 
> [8.868615] 3ea2000c 3ab57070 7b4a1f24 7d55502a  2faa 
> 409e0394 3d02002a 
> [8.868623] ---[ end trace f9b8e3c36493f430 ]---
> [8.870690] 
> [9.870701] Kernel panic - not syncing: Fatal exception
> 
> Thanks
> -Sachin

-- 
Michal Hocko
SUSE Labs


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Kirill Tkhai
On 18.02.2020 14:38, Sachin Sant wrote:
> 
> 
>> On 18-Feb-2020, at 4:20 PM, Kirill Tkhai  wrote:
>>
>> Hi, Sachin,
>>
>> On 18.02.2020 13:45, Sachin Sant wrote:
>>>
>>> commit a75056fc1e7c 
>>> mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node
>>>
>>> I can boot the kernel successfully if the patch is reverted. 
>>
>>
>> could you please test your boot with original patch from here:
>>
>> https://patchwork.kernel.org/patch/11360007/
>>
>> ?
> With this original patch I can boot the machine successfully.

Ok, thanks.

I think, there is no a problem in the commited patch, since 
mem_cgroup_css_alloc()
is called from the place, where any memory allocations have to be allowed. This
is one of the reason, memory_cgrp_subsys.early_init is 0, and all nodes 
allocations
should be availeble there.

The problem is not in vmalloc() itself, since the second patch with 
kmalloc_node()
also fails on your setup. Maybe, the reproduction depends on amount of allocated
memory. For me this looks like a problem in powerpc, but it would be interesting
to hear some comments from powerpc guys.

For now we may replace the commited patch with v2 
(https://patchwork.kernel.org/patch/11360007/)
containing workaround, which we have in another 
alloc_mem_cgroup_per_node_info() allocations.

Kirill


Re: [PATCH kernel v3] powerpc/book3s64: Fix error handling in mm_iommu_do_alloc()

2020-02-18 Thread Michael Ellerman
Alexey Kardashevskiy  writes:
> On 24/12/2019 10:32, Alexey Kardashevskiy wrote:
...
>> 
>> I could rearrange the code even more but since there is no NVLink3
>> coming ever, I'd avoid changing it more than necessary. Thanks,
>
> Repost? Rework?

I'll just take v3.

cheers


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Sachin Sant


>> could you please test your boot with original patch from here:
>> 
>> https://patchwork.kernel.org/patch/11360007/
> 
> After you tried the above patch instead of the problem patch,
> do one more test and apply the below on current linux-next.
> Please, say which of the patches makes your kernel bootable again.
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 63bb6a2aab81..7b9b48dcbc60 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -334,7 +334,7 @@ static int memcg_expand_one_shrinker_map(struct 
> mem_cgroup *memcg,
>   if (!old)
>   return 0;
> 
> - new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
> + new = kmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
>   if (!new)
>   return -ENOMEM;
> 
> @@ -378,7 +378,7 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup 
> *memcg)
>   mutex_lock(_shrinker_map_mutex);
>   size = memcg_shrinker_map_size;
>   for_each_node(nid) {
> - map = kvzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
> + map = kzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
>   if (!map) {
>   memcg_free_shrinker_maps(memcg);
>   ret = -ENOMEM;

With this incremental patch applied on top of current linux-next, machine fails 
to boot

[8.868433] BUG: Kernel NULL pointer dereference on read at 0x73b0
[8.868439] Faulting instruction address: 0xc03d55f4
[8.868444] Oops: Kernel access of bad area, sig: 11 [#1]
[8.868449] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[8.868453] Modules linked in:
[8.868458] CPU: 18 PID: 1 Comm: systemd Not tainted 
5.6.0-rc2-next-20200218-autotest+ #4
[8.868463] NIP:  c03d55f4 LR: c03d5b94 CTR: 0000
[8.868468] REGS: c008b3783710 TRAP: 0300   Not tainted  
(5.6.0-rc2-next-20200218-autotest+)
[8.868474] MSR:  80009033   CR: 24004844  
XER: 
[8.868481] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
IRQMASK: 1 
[8.868481] GPR00: c03d5b94 c008b37839a0 c155d400 
c008b301f500 
[8.868481] GPR04: 0dc0 0002 c03fee38 
c008bb298620 
[8.868481] GPR08: 0008ba1f 0001  
 
[8.868481] GPR12: 24004844 c0001ec54200  
 
[8.868481] GPR16: c008a1a60048 c1595898 c1750c18 
0002 
[8.868481] GPR20: c1750c28 c1624470 000fffe0 
5deadbeef122 
[8.868481] GPR24: 0001 0dc0 0002 
c03fee38 
[8.868481] GPR28: c008b301f500 c008bb298620  
c00c02286d00 
[8.868529] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
[8.868534] LR [c03d5b94] __slab_alloc+0x34/0x60
[8.868538] Call Trace:
[8.868541] [c008b37839a0] [c03d5734] ___slab_alloc+0x334/0x760 
(unreliable)
[8.868547] [c008b3783a80] [c03d5b94] __slab_alloc+0x34/0x60
[8.868553] [c008b3783ab0] [c03d6fa0] __kmalloc_node+0x110/0x490
[8.868559] [c008b3783b30] [c03fee38] 
mem_cgroup_css_online+0x108/0x270
[8.868565] [c008b3783b90] [c0235aa8] online_css+0x48/0xd0
[8.868571] [c008b3783bc0] [c023eaec] 
cgroup_apply_control_enable+0x2ec/0x4d0
[8.868577] [c008b3783ca0] [c0242318] cgroup_mkdir+0x228/0x5f0
[8.868583] [c008b3783d10] [c051e170] kernfs_iop_mkdir+0x90/0xf0
[8.868589] [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
[8.868594] [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
[8.868601] [c008b3783e20] [c000b278] system_call+0x5c/0x68
[8.868605] Instruction dump:
[8.868608] 7c421378 e95f 714a0001 4082fff0 4b64 6000 6000 
faa10088 
[8.868615] 3ea2000c 3ab57070 7b4a1f24 7d55502a  2faa 409e0394 
3d02002a 
[8.868623] ---[ end trace f9b8e3c36493f430 ]---
[8.870690] 
[9.870701] Kernel panic - not syncing: Fatal exception

Thanks
-Sachin



Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Sachin Sant



> On 18-Feb-2020, at 4:20 PM, Kirill Tkhai  wrote:
> 
> Hi, Sachin,
> 
> On 18.02.2020 13:45, Sachin Sant wrote:
>> 
>> commit a75056fc1e7c 
>> mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node
>> 
>> I can boot the kernel successfully if the patch is reverted. 
> 
> 
> could you please test your boot with original patch from here:
> 
> https://patchwork.kernel.org/patch/11360007/
> 
> ?
With this original patch I can boot the machine successfully.

Thanks
-Sachin



Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Kirill Tkhai
On 18.02.2020 14:01, Kirill Tkhai wrote:
> On 18.02.2020 13:50, Kirill Tkhai wrote:
>> Hi, Sachin,
>>
>> On 18.02.2020 13:45, Sachin Sant wrote:
>>> Todays next fails to boot on a POWER9 PowerVM logical partition
>>> with following trace:
>>>
>>> [8.767660] random: systemd: uninitialized urandom read (16 bytes read)
>>> [8.768629] BUG: Kernel NULL pointer dereference on read at 0x73b0
>>> [8.768635] Faulting instruction address: 0xc03d55f4
>>> [8.768641] Oops: Kernel access of bad area, sig: 11 [#1]
>>> [8.768645] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>>> [8.768650] Modules linked in:
>>> [8.768655] CPU: 19 PID: 1 Comm: systemd Not tainted 
>>> 5.6.0-rc2-next-20200218-autotest #1
>>> [8.768660] NIP:  c03d55f4 LR: c000003d5b94 CTR: 
>>> 
>>> [8.768666] REGS: c008b37836d0 TRAP: 0300   Not tainted  
>>> (5.6.0-rc2-next-20200218-autotest)
>>> [8.768671] MSR:  80009033   CR: 24004844  
>>> XER: 
>>> [8.768679] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
>>> IRQMASK: 1
>>> [8.768679] GPR00: c03d5b94 c008b3783960 c155d400 
>>> c008b301f500
>>> [8.768679] GPR04: 0dc0 0002 c03443d8 
>>> c008bb398620
>>> [8.768679] GPR08: 0008ba2f 0001  
>>> 
>>> [8.768679] GPR12: 24004844 c0001ec52a00  
>>> 
>>> [8.768679] GPR16: c008a1b20048 c1595898 c1750c18 
>>> 0002
>>> [8.768679] GPR20: c1750c28 c1624470 000fffe0 
>>> 5deadbeef122
>>> [8.768679] GPR24: 0001 0dc0 0002 
>>> c03443d8
>>> [8.768679] GPR28: c008b301f500 c008bb398620  
>>> c00c02287180
>>> [8.768727] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
>>> [8.768732] LR [c03d5b94] __slab_alloc+0x34/0x60
>>> [8.768735] Call Trace:
>>> [8.768739] [c008b3783960] [c03d5734] 
>>> ___slab_alloc+0x334/0x760 (unreliable)
>>> [8.768745] [c008b3783a40] [c03d5b94] __slab_alloc+0x34/0x60
>>> [8.768751] [c008b3783a70] [c03d6fa0] 
>>> __kmalloc_node+0x110/0x490
>>> [8.768757] [c008b3783af0] [c03443d8] 
>>> kvmalloc_node+0x58/0x110
>>> [8.768763] [c008b3783b30] [c03fee38] 
>>> mem_cgroup_css_online+0x108/0x270
>>> [8.768769] [c008b3783b90] [c0235aa8] online_css+0x48/0xd0
>>> [8.768775] [c008b3783bc0] [c023eaec] 
>>> cgroup_apply_control_enable+0x2ec/0x4d0
>>> [8.768781] [c008b3783ca0] [c0242318] 
>>> cgroup_mkdir+0x228/0x5f0
>>> [8.768786] [c008b3783d10] [c051e170] 
>>> kernfs_iop_mkdir+0x90/0xf0
>>> [8.768792] [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
>>> [8.768797] [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
>>> [8.768804] [c008b3783e20] [c000b278] system_call+0x5c/0x68
>>> [8.768808] Instruction dump:
>>> [8.768811] 7c421378 e95f 714a0001 4082fff0 4b64 6000 
>>> 6000 faa10088
>>> [8.768818] 3ea2000c 3ab57070 7b4a1f24 7d55502a  2faa 
>>> 409e0394 3d02002a
>>> [8.768826] ---[ end trace 631af2cb73507891 ]---
>>> [8.770876]
>>> [9.770887] Kernel panic - not syncing: Fatal exception
>>>
>>> Bisect reveals the problem was introduced in next-20200217 by following 
>>> commit 
>>>
>>> commit a75056fc1e7c 
>>> mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node
>>>
>>> I can boot the kernel successfully if the patch is reverted. 
>>
>>
>> could you please test your boot with original patch from here:
>>
>> https://patchwork.kernel.org/patch/11360007/
> 
> After you tried the above patch instead of the problem patch,
> do one more test and apply the below on current linux-next.
> Please, say which of the patches makes your kernel bootable again.
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 63bb6a2aab81..7b9b48dcbc60 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -33

[5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Sachin Sant
Todays next fails to boot on a POWER9 PowerVM logical partition
with following trace:

[8.767660] random: systemd: uninitialized urandom read (16 bytes read)
[8.768629] BUG: Kernel NULL pointer dereference on read at 0x73b0
[8.768635] Faulting instruction address: 0xc03d55f4
[8.768641] Oops: Kernel access of bad area, sig: 11 [#1]
[8.768645] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[8.768650] Modules linked in:
[8.768655] CPU: 19 PID: 1 Comm: systemd Not tainted 
5.6.0-rc2-next-20200218-autotest #1
[8.768660] NIP:  c03d55f4 LR: c03d5b94 CTR: 
[8.768666] REGS: c008b37836d0 TRAP: 0300   Not tainted  
(5.6.0-rc2-next-20200218-autotest)
[8.768671] MSR:  80009033   CR: 24004844  
XER: 
[8.768679] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
IRQMASK: 1
[8.768679] GPR00: c03d5b94 c008b3783960 c155d400 
c008b301f500
[8.768679] GPR04: 0dc0 0002 c03443d8 
c008bb398620
[8.768679] GPR08: 0008ba2f 0001  

[8.768679] GPR12: 24004844 c0001ec52a00  

[8.768679] GPR16: c008a1b20048 c1595898 c1750c18 
0002
[8.768679] GPR20: c1750c28 c1624470 000fffe0 
5deadbeef122
[8.768679] GPR24: 0001 0dc0 0002 
c03443d8
[8.768679] GPR28: c008b301f500 c008bb398620  
c00c02287180
[8.768727] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
[8.768732] LR [c03d5b94] __slab_alloc+0x34/0x60
[8.768735] Call Trace:
[8.768739] [c008b3783960] [c03d5734] ___slab_alloc+0x334/0x760 
(unreliable)
[8.768745] [c008b3783a40] [c03d5b94] __slab_alloc+0x34/0x60
[8.768751] [c008b3783a70] [c03d6fa0] __kmalloc_node+0x110/0x490
[8.768757] [c008b3783af0] [c03443d8] kvmalloc_node+0x58/0x110
[8.768763] [c008b3783b30] [c03fee38] 
mem_cgroup_css_online+0x108/0x270
[8.768769] [c008b3783b90] [c0235aa8] online_css+0x48/0xd0
[8.768775] [c008b3783bc0] [c023eaec] 
cgroup_apply_control_enable+0x2ec/0x4d0
[8.768781] [c008b3783ca0] [c0242318] cgroup_mkdir+0x228/0x5f0
[8.768786] [c008b3783d10] [c051e170] kernfs_iop_mkdir+0x90/0xf0
[8.768792] [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
[8.768797] [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
[8.768804] [c008b3783e20] [c000b278] system_call+0x5c/0x68
[8.768808] Instruction dump:
[8.768811] 7c421378 e95f 714a0001 4082fff0 4b64 6000 6000 
faa10088
[8.768818] 3ea2000c 3ab57070 7b4a1f24 7d55502a  2faa 409e0394 
3d02002a
[8.768826] ---[ end trace 631af2cb73507891 ]---
[8.770876]
[9.770887] Kernel panic - not syncing: Fatal exception

Bisect reveals the problem was introduced in next-20200217 by following commit 

commit a75056fc1e7c 
mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node

I can boot the kernel successfully if the patch is reverted. 

Boot log attached.

Thanks
-Sachin



next-20200218.log
Description: Binary data


Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Kirill Tkhai
Hi, Sachin,

On 18.02.2020 13:45, Sachin Sant wrote:
> Todays next fails to boot on a POWER9 PowerVM logical partition
> with following trace:
> 
> [8.767660] random: systemd: uninitialized urandom read (16 bytes read)
> [8.768629] BUG: Kernel NULL pointer dereference on read at 0x73b0
> [8.768635] Faulting instruction address: 0xc03d55f4
> [8.768641] Oops: Kernel access of bad area, sig: 11 [#1]
> [8.768645] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [8.768650] Modules linked in:
> [8.768655] CPU: 19 PID: 1 Comm: systemd Not tainted 
> 5.6.0-rc2-next-20200218-autotest #1
> [8.768660] NIP:  c03d55f4 LR: c03d5b94 CTR: 
> 
> [8.768666] REGS: c008b37836d0 TRAP: 0300   Not tainted  
> (5.6.0-rc2-next-20200218-autotest)
> [8.768671] MSR:  80009033   CR: 24004844  
> XER: 
> [8.768679] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
> IRQMASK: 1
> [8.768679] GPR00: c03d5b94 c008b3783960 c155d400 
> c008b301f500
> [8.768679] GPR04: 0dc0 0002 c03443d8 
> c008bb398620
> [8.768679] GPR08: 0008ba2f 0001  
> 
> [8.768679] GPR12: 24004844 c0001ec52a00  
> 
> [8.768679] GPR16: c008a1b20048 c1595898 c1750c18 
> 0002
> [8.768679] GPR20: c1750c28 c1624470 000fffe0 
> 5deadbeef122
> [8.768679] GPR24: 0001 0dc0 0002 
> c03443d8
> [8.768679] GPR28: c008b301f500 c008bb398620  
> c00c02287180
> [8.768727] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
> [8.768732] LR [c03d5b94] __slab_alloc+0x34/0x60
> [8.768735] Call Trace:
> [8.768739] [c008b3783960] [c03d5734] 
> ___slab_alloc+0x334/0x760 (unreliable)
> [8.768745] [c008b3783a40] [c03d5b94] __slab_alloc+0x34/0x60
> [8.768751] [c008b3783a70] [c03d6fa0] 
> __kmalloc_node+0x110/0x490
> [8.768757] [c008b3783af0] [c03443d8] kvmalloc_node+0x58/0x110
> [8.768763] [c008b3783b30] [c03fee38] 
> mem_cgroup_css_online+0x108/0x270
> [8.768769] [c008b3783b90] [c0235aa8] online_css+0x48/0xd0
> [8.768775] [c008b3783bc0] [c023eaec] 
> cgroup_apply_control_enable+0x2ec/0x4d0
> [8.768781] [c008b3783ca0] [c0242318] cgroup_mkdir+0x228/0x5f0
> [8.768786] [c008b3783d10] [c051e170] 
> kernfs_iop_mkdir+0x90/0xf0
> [8.768792] [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
> [8.768797] [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
> [8.768804] [c008b3783e20] [c000b278] system_call+0x5c/0x68
> [8.768808] Instruction dump:
> [8.768811] 7c421378 e95f 714a0001 4082fff0 4b64 6000 6000 
> faa10088
> [8.768818] 3ea2000c 3ab57070 7b4a1f24 7d55502a  2faa 
> 409e0394 3d02002a
> [8.768826] ---[ end trace 631af2cb73507891 ]---
> [8.770876]
> [9.770887] Kernel panic - not syncing: Fatal exception
> 
> Bisect reveals the problem was introduced in next-20200217 by following 
> commit 
> 
> commit a75056fc1e7c 
> mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node
> 
> I can boot the kernel successfully if the patch is reverted. 


could you please test your boot with original patch from here:

https://patchwork.kernel.org/patch/11360007/

?


[PATCH] powerpc/Makefile: Mark phony targets as PHONY

2020-02-18 Thread Michael Ellerman
Some of our phony targets are not marked as such. This can lead to
confusing errors, eg:

  $ make clean
  $ touch install
  $ make install
  make: 'install' is up to date.
  $

Fix it by adding them to the PHONY variable which is marked phony in
the top-level Makefile. In arch/powerpc/boot/Makefile we do it
manually.

Suggested-by: Masahiro Yamada 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Makefile  | 6 ++
 arch/powerpc/boot/Makefile | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f35730548e42..cbe5ca4f0ee5 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -298,6 +298,7 @@ $(BOOT_TARGETS2): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
 
 
+PHONY += bootwrapper_install
 bootwrapper_install:
$(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
 
@@ -403,9 +404,11 @@ define archhelp
   @echo '  (minus the .dts extension).'
 endef
 
+PHONY += install
 install:
$(Q)$(MAKE) $(build)=$(boot) install
 
+PHONY += vdso_install
 vdso_install:
 ifdef CONFIG_PPC64
$(Q)$(MAKE) $(build)=arch/$(ARCH)/kernel/vdso64 $@
@@ -425,6 +428,7 @@ archprepare: checkbin
 ifdef CONFIG_STACKPROTECTOR
 prepare: stack_protector_prepare
 
+PHONY += stack_protector_prepare
 stack_protector_prepare: prepare0
 ifdef CONFIG_PPC64
$(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk '{if 
($$2 == "PACA_CANARY") print $$3;}' include/generated/asm-offsets.h))
@@ -436,10 +440,12 @@ endif
 ifdef CONFIG_SMP
 prepare: task_cpu_prepare
 
+PHONY += task_cpu_prepare
 task_cpu_prepare: prepare0
$(eval KBUILD_CFLAGS += -D_TASK_CPU=$(shell awk '{if ($$2 == 
"TASK_CPU") print $$3;}' include/generated/asm-offsets.h))
 endif
 
+PHONY += checkbin
 # Check toolchain versions:
 # - gcc-4.6 is the minimum kernel-wide version so nothing required.
 checkbin:
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 0556bf4fc9e9..58d4b4667206 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -445,6 +445,8 @@ install: $(CONFIGURE) $(addprefix $(obj)/, $(image-y))
 zInstall: $(CONFIGURE) $(addprefix $(obj)/, $(image-y))
sh -x $(srctree)/$(src)/install.sh "$(KERNELRELEASE)" vmlinux 
System.map "$(INSTALL_PATH)" $^
 
+.PHONY: install zInstall
+
 # anything not in $(targets)
 clean-files += $(image-) $(initrd-) cuImage.* dtbImage.* treeImage.* \
zImage zImage.initrd zImage.chrp zImage.coff zImage.holly \
-- 
2.21.1



Re: [PATCH] powerpc/kprobes: Fix trap address when trap happened in real mode

2020-02-18 Thread Christophe Leroy




Le 18/02/2020 à 11:29, Masami Hiramatsu a écrit :

On Tue, 18 Feb 2020 06:58:06 +0100
Christophe Leroy  wrote:



What do you mean by 'there' ? At the entry of kprobe_handler() ?

That's what my patch does, it checks whether MMU is disabled or not. If
it is, it converts the address to a virtual address.

Do you mean kprobe_handler() should bail out early as it does when the
trap happens in user mode ?


Yes, that is what I meant.


Of course we can do that, I don't know
enough about kprobe to know if kprobe_handler() should manage events
that happened in real-mode or just ignore them. But I tested adding an
event on a function that runs in real-mode, and it (now) works.

So, what should we do really ?


I'm not sure how the powerpc kernel runs in real mode.
But clearly, at least kprobe event can not handle that case because
it tries to access memory by probe_kernel_read(). Unless that function
correctly handles the address translation, I want to prohibit kprobes
on such address.

So what I would like to see is, something like below.

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 2d27ec4feee4..4771be152416 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -261,7 +261,7 @@ int kprobe_handler(struct pt_regs *regs)
   unsigned int *addr = (unsigned int *)regs->nip;
   struct kprobe_ctlblk *kcb;

-   if (user_mode(regs))

+   if (user_mode(regs) || !(regs->msr & MSR_IR))
   return 0;

   /*





With this instead change of my patch, I get an Oops everytime a kprobe
event occurs in real-mode.

This is because kprobe_handler() is now saying 'this trap doesn't belong
to me' for a trap that has been installed by it.


Hmm, on powerpc, kprobes is allowed to probe on the code which runs
in the real mode? I think we should also prohibit it by blacklisting.
(It is easy to add blacklist by NOKPROBE_SYMBOL(func))


Yes, I see a lot of them tagged with _ASM_NOKPROBE_SYMBOL() on PPC64,
but none on PPC32. I suppose that's missing and have to be added.


Ah, you are using PPC32.


Nevertheless, if one symbol has been forgotten in the blacklist, I think
it is a problem if it generate Oopses.


There is a long history also on x86 to make a blacklist. Anyway, how did
you get this error on PPC32? Somewhere would you like to probe and
it is a real mode function? Or, it happened unexpectedly?


The first Oops I got was triggered by a WARN_ON() kind of trap in real 
mode. The trap exception handler called kprobe_handler() which tried to 
read the instruction at the trap address (which was a real-mode address) 
so it triggered a Bad Access Fault.


This was initially the purpose of my patch.

After discussion with you, I started looking at what would be the effect 
of setting a kprobe event in a function which runs in real mode.







Or, some parts are possble to run under both real mode and kernel mode?


I don't think so, at least on PPC32


OK, that's a good news. Also, are there any independent section where such
real mode functions are stored? (I can see start_real_trampolines in
sections.h) If that kind of sections are defined, it is easy to make
a blacklist in arch_populate_kprobe_blacklist(). See 
arch/arm64/kernel/probes/kprobes.c.


Part of them are in .head.text, and this section is already blacklisted 
throught function arch_within_kprobe_blacklist()


But there are several other functions which are not there. For instance, 
many things within entry_32.S, and also things in hash_low.S
On PPC64 (ie in entry_64.S) they were explicitely blacklisted with 
_ASM_NOKPROBE_SYMBOL(). We have to do the same on PPC64






So the 'program check' exception handler doesn't find the owner of the
trap hence generate an Oops.

Even if we don't want kprobe() to proceed with the event entirely
(allthough it works at least for simple events), I'd expect it to fail
gracefully.


Agreed. I thought it was easy to identify real mode code. But if it is
hard, we should apply your first patch and also skip user handlers
if we are in the real mode (and increment missed count).


user handlers are already skipped.


Yes, if you don't put a kprobes on real mode code. However, if user
(accidentally) puts a probe on real mode code, it might call a
user handler?


Are we talking about the same thing ?

Only kernel code can run in real mode, so the following code at the 
beginning of kprobe_handler() does the job ?


if (user_mode(regs))
return 0;






What do you think about my latest proposal below ? If a trap is
encoutered in real mode, if checks if the matching virtual address
corresponds to a valid kprobe. If it is, it skips it. If not, it returns
0 to tell "it's no me". You are also talking about incrementing the
missed count. Who do we do that ?


I rather like your first patch. If there is a kprobes, we can not skip
the instruction, because there is an instruction which must be executed.
(or 

Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

2020-02-18 Thread Kirill Tkhai
On 18.02.2020 13:50, Kirill Tkhai wrote:
> Hi, Sachin,
> 
> On 18.02.2020 13:45, Sachin Sant wrote:
>> Todays next fails to boot on a POWER9 PowerVM logical partition
>> with following trace:
>>
>> [8.767660] random: systemd: uninitialized urandom read (16 bytes read)
>> [8.768629] BUG: Kernel NULL pointer dereference on read at 0x73b0
>> [8.768635] Faulting instruction address: 0xc03d55f4
>> [8.768641] Oops: Kernel access of bad area, sig: 11 [#1]
>> [8.768645] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> [8.768650] Modules linked in:
>> [8.768655] CPU: 19 PID: 1 Comm: systemd Not tainted 
>> 5.6.0-rc2-next-20200218-autotest #1
>> [8.768660] NIP:  c03d55f4 LR: c03d5b94 CTR: 
>> 
>> [8.768666] REGS: c008b37836d0 TRAP: 0300   Not tainted  
>> (5.6.0-rc2-next-20200218-autotest)
>> [8.768671] MSR:  80009033   CR: 24004844  
>> XER: 
>> [8.768679] CFAR: c000dec4 DAR: 73b0 DSISR: 4000 
>> IRQMASK: 1
>> [8.768679] GPR00: c03d5b94 c008b3783960 c155d400 
>> c008b301f500
>> [8.768679] GPR04: 0dc0 0002 c03443d8 
>> c008bb398620
>> [8.768679] GPR08: 0008ba2f 0001  
>> 
>> [8.768679] GPR12: 24004844 c0001ec52a00  
>> 
>> [8.768679] GPR16: c008a1b20048 c1595898 c1750c18 
>> 0002
>> [8.768679] GPR20: c1750c28 c1624470 000fffe0 
>> 5deadbeef122
>> [8.768679] GPR24: 0001 0dc0 0002 
>> c03443d8
>> [8.768679] GPR28: c008b301f500 c008bb398620  
>> c00c02287180
>> [8.768727] NIP [c03d55f4] ___slab_alloc+0x1f4/0x760
>> [8.768732] LR [c03d5b94] __slab_alloc+0x34/0x60
>> [8.768735] Call Trace:
>> [8.768739] [c008b3783960] [c03d5734] 
>> ___slab_alloc+0x334/0x760 (unreliable)
>> [8.768745] [c008b3783a40] [c03d5b94] __slab_alloc+0x34/0x60
>> [8.768751] [c008b3783a70] [c03d6fa0] 
>> __kmalloc_node+0x110/0x490
>> [8.768757] [c008b3783af0] [c03443d8] kvmalloc_node+0x58/0x110
>> [8.768763] [c008b3783b30] [c03fee38] 
>> mem_cgroup_css_online+0x108/0x270
>> [8.768769] [c008b3783b90] [c0235aa8] online_css+0x48/0xd0
>> [8.768775] [c008b3783bc0] [c023eaec] 
>> cgroup_apply_control_enable+0x2ec/0x4d0
>> [8.768781] [c008b3783ca0] [c0242318] cgroup_mkdir+0x228/0x5f0
>> [8.768786] [c008b3783d10] [c051e170] 
>> kernfs_iop_mkdir+0x90/0xf0
>> [8.768792] [c008b3783d50] [c043dc00] vfs_mkdir+0x110/0x230
>> [8.768797] [c008b3783da0] [c0441c90] do_mkdirat+0xb0/0x1a0
>> [8.768804] [c008b3783e20] [c000b278] system_call+0x5c/0x68
>> [8.768808] Instruction dump:
>> [8.768811] 7c421378 e95f 714a0001 4082fff0 4b64 6000 
>> 6000 faa10088
>> [8.768818] 3ea2000c 3ab57070 7b4a1f24 7d55502a  2faa 
>> 409e0394 3d02002a
>> [8.768826] ---[ end trace 631af2cb73507891 ]---
>> [8.770876]
>> [9.770887] Kernel panic - not syncing: Fatal exception
>>
>> Bisect reveals the problem was introduced in next-20200217 by following 
>> commit 
>>
>> commit a75056fc1e7c 
>> mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node
>>
>> I can boot the kernel successfully if the patch is reverted. 
> 
> 
> could you please test your boot with original patch from here:
> 
> https://patchwork.kernel.org/patch/11360007/

After you tried the above patch instead of the problem patch,
do one more test and apply the below on current linux-next.
Please, say which of the patches makes your kernel bootable again.

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 63bb6a2aab81..7b9b48dcbc60 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -334,7 +334,7 @@ static int memcg_expand_one_shrinker_map(struct mem_cgroup 
*memcg,
if (!old)
return 0;
 
-   new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
+   new = kmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
if (!new)
return -ENOMEM;
 
@@ -378,7 +378,7 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup 
*memcg)
mutex_lock(_shrinker_map_mutex);
size = memcg_shrinker_map_size;
for_each_node(nid) {
-   map = kvzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
+   map = kzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
if (!map) {
memcg_free_shrinker_maps(memcg);
ret = -ENOMEM;




Re: [PATCH v2 07/13] powerpc: add support for folded p4d page tables

2020-02-18 Thread Mike Rapoport
On Sun, Feb 16, 2020 at 11:41:07AM +0100, Christophe Leroy wrote:
> 
> 
> Le 16/02/2020 à 09:18, Mike Rapoport a écrit :
> > From: Mike Rapoport 
> > 
> > Implement primitives necessary for the 4th level folding, add walks of p4d
> > level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h.
> 
> I don't think it is worth adding all this additionnals walks of p4d, this
> patch could be limited to changes like:
> 
> - pud = pud_offset(pgd, gpa);
> + pud = pud_offset(p4d_offset(pgd, gpa), gpa);
> 
> The additionnal walks should be added through another patch the day powerpc
> need them.

Ok, I'll update the patch to reduce walking the p4d.
 
> See below for more comments.
> 
> > 
> > Signed-off-by: Mike Rapoport 
> > Tested-by: Christophe Leroy  # 8xx and 83xx
> > ---

...

> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> > b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index 201a69e6a355..bafff0ab 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -2,7 +2,7 @@
> >   #ifndef _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> >   #define _ASM_POWERPC_BOOK3S_64_PGTABLE_H_
> > -#include 
> > +#include 
> >   #ifndef __ASSEMBLY__
> >   #include 
> > @@ -251,7 +251,7 @@ extern unsigned long __pmd_frag_size_shift;
> >   /* Bits to mask out from a PUD to get to the PMD page */
> >   #define PUD_MASKED_BITS   0xc0ffUL
> >   /* Bits to mask out from a PGD to get to the PUD page */
> > -#define PGD_MASKED_BITS0xc0ffUL
> > +#define P4D_MASKED_BITS0xc0ffUL
> >   /*
> >* Used as an indicator for rcu callback functions
> > @@ -949,54 +949,60 @@ static inline bool pud_access_permitted(pud_t pud, 
> > bool write)
> > return pte_access_permitted(pud_pte(pud), write);
> >   }
> > -#define pgd_write(pgd) pte_write(pgd_pte(pgd))
> > +#define __p4d_raw(x)   ((p4d_t) { __pgd_raw(x) })
> > +static inline __be64 p4d_raw(p4d_t x)
> > +{
> > +   return pgd_raw(x.pgd);
> > +}
> > +
> 
> Shouldn't this be defined in asm/pgtable-be-types.h, just like other
> __pxx_raw() ?

Ideally yes, but this creates weird header file dependencies and untangling
them would generate way too much churn.
 
> > +#define p4d_write(p4d) pte_write(p4d_pte(p4d))
> > -static inline void pgd_clear(pgd_t *pgdp)
> > +static inline void p4d_clear(p4d_t *p4dp)
> >   {
> > -   *pgdp = __pgd(0);
> > +   *p4dp = __p4d(0);
> >   }

...

> > @@ -573,9 +596,15 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t *pgtable, 
> > pte_t pte,
> > /* Traverse the guest's 2nd-level tree, allocate new levels needed */
> > pgd = pgtable + pgd_index(gpa);
> > -   pud = NULL;
> > +   p4d = NULL;
> > if (pgd_present(*pgd))
> > -   pud = pud_offset(pgd, gpa);
> > +   p4d = p4d_offset(pgd, gpa);
> > +   else
> > +   new_p4d = p4d_alloc_one(kvm->mm, gpa);
> > +
> > +   pud = NULL;
> > +   if (p4d_present(*p4d))
> > +   pud = pud_offset(p4d, gpa);
> 
> Is it worth adding all this new code ?
> 
> My understanding is that the series objective is to get rid of
> __ARCH_HAS_5LEVEL_HACK, to to add support for 5 levels to an architecture
> that not need it (at least for now).
> If we want to add support for 5 levels, it can be done later in another
> patch.
> 
> Here I think your change could be limited to:
> 
> - pud = pud_offset(pgd, gpa);
> + pud = pud_offset(p4d_offset(pgd, gpa), gpa);

This won't work. Without __ARCH_USE_5LEVEL_HACK defined pgd_present() is
hardwired to 1 and the actual check for the top level is performed with
p4d_present(). The 'else' clause that allocates p4d will never be taken and
it could be removed, but I prefer to keep it for consistency.
 
> > else
> > new_pud = pud_alloc_one(kvm->mm, gpa);
> > @@ -597,12 +626,18 @@ int kvmppc_create_pte(struct kvm *kvm, pgd_t 
> > *pgtable, pte_t pte,
> > /* Now traverse again under the lock and change the tree */
> > ret = -ENOMEM;
> > if (pgd_none(*pgd)) {
> > +   if (!new_p4d)
> > +   goto out_unlock;
> > +   pgd_populate(kvm->mm, pgd, new_p4d);
> > +   new_p4d = NULL;
> > +   }
> > +   if (p4d_none(*p4d)) {
> > if (!new_pud)
> > goto out_unlock;
> > -   pgd_populate(kvm->mm, pgd, new_pud);
> > +   p4d_populate(kvm->mm, p4d, new_pud);
> > new_pud = NULL;
> > }
> > -   pud = pud_offset(pgd, gpa);
> > +   pud = pud_offset(p4d, gpa);
> > if (pud_is_leaf(*pud)) {
> > unsigned long hgpa = gpa & PUD_MASK;
> > @@ -1220,6 +1255,7 @@ static ssize_t debugfs_radix_read(struct file *file, 
> > char __user *buf,
> > pgd_t *pgt;
> > struct kvm_nested_guest *nested;
> > pgd_t pgd, *pgdp;
> > +   p4d_t p4d, *p4dp;
> > pud_t pud, *pudp;
> > pmd_t pmd, *pmdp;
> > pte_t *ptep;
> > @@ -1298,7 

Re:Re: [PATCH] powerpc/sysdev: fix compile errors

2020-02-18 Thread 王文虎

From: Christophe Leroy 
 Date: 2020-01-21 16:37:07
To:"王文虎" ,Andrew Donnellan 
 cc: Kate Stewart ,Richard Fontana 
,Greg Kroah-Hartman 
,linux-ker...@vger.kernel.org,wangwenhu 
,Paul Mackerras 
,triv...@kernel.org,Thomas Gleixner 
,linuxppc-dev@lists.ozlabs.org,loneh...@hotmail.com
Subject: Re: [PATCH] powerpc/sysdev: fix compile errors>
>
>Le 21/01/2020 à 07:59, 王文虎 a écrit :
>> 发件人:Andrew Donnellan 
>> 发送日期:2020-01-21 14:13:07
>> 收件人:wangwenhu ,Benjamin Herrenschmidt 
>> ,Paul Mackerras ,Michael 
>> Ellerman ,Kate Stewart 
>> ,Greg Kroah-Hartman 
>> ,Richard Fontana ,Thomas 
>> Gleixner 
>> ,linuxppc-dev@lists.ozlabs.org,linux-ker...@vger.kernel.org
>> 抄送人:triv...@kernel.org,loneh...@hotmail.com,wenhu.w...@vivo.com
>> 主题:Re: [PATCH] powerpc/sysdev: fix compile errors>On 21/1/20 4:31 pm, 
>> wangwenhu wrote:
 From: wangwenhu 

 Include arch/powerpc/include/asm/io.h into fsl_85xx_cache_sram.c to
 fix the implicit declaration compile errors when building Cache-Sram.

 arch/powerpc/sysdev/fsl_85xx_cache_sram.c: In function 
 ‘instantiate_cache_sram’:
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:26: error: implicit 
 declaration of function ‘ioremap_coherent’; did you mean 
 ‘bitmap_complement’? [-Werror=implicit-function-declaration]
 cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
 ^~~~
 bitmap_complement
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c:97:24: error: assignment makes 
 pointer from integer without a cast [-Werror=int-conversion]
 cache_sram->base_virt = ioremap_coherent(cache_sram->base_phys,
   ^
 arch/powerpc/sysdev/fsl_85xx_cache_sram.c:123:2: error: implicit 
 declaration of function ‘iounmap’; did you mean ‘roundup’? 
 [-Werror=implicit-function-declaration]
 iounmap(cache_sram->base_virt);
 ^~~
 roundup
 cc1: all warnings being treated as errors

 Signed-off-by: wangwenhu 
>>>
>>> How long has this code been broken for?
>> 
>> It's been broken almost 15 months since the commit below:
>> "commit aa91796ec46339f2ed53da311bd3ea77a3e4dfe1
>> Author: Christophe Leroy 
>> Date:   Tue Oct 9 13:51:41 2018 +
>> 
>>  powerpc: don't use ioremap_prot() nor __ioremap() unless really needed."
>> 
>> And we are working on it now for further development.
>
>That's pretty surprising. That commit didn't change the iounmap(). It 
>only replaced ioremap_prot() by ioremap_coherent(). Both are defined in io.h
>
>Christophe
>

The compile error exists since the uploading of the driver
Details below:
1. "ioremap_flags" defined in "asm/io.h" was used primarily(Wed Oct 13 17:30:56 
2010):
(6db92cc9d07d: powerpc/85xx: add cache-sram support);
2. "ioremap_prot" was used to replace "ioremap_flags"
(40f1ce7fb7e8: powerpc: Remove ioremap_flags);
3. "ioremap_coherent" was used to replace "ioremap_prot":
(aa91796ec463: powerpc: don't use ioremap_prot() nor __ioremap() unless really 
needed.)

So I will do the re-patch with a "Fixed" tag and "" include 
modification.
Which commit should be referenced to append to "Fixed" tag? (No.1 or No.3 ?)

Wenhu

>> 
>>>
 ---
arch/powerpc/sysdev/fsl_85xx_cache_sram.c | 1 +
1 file changed, 1 insertion(+)

 diff --git a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c 
 b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
 index f6c665dac725..29b6868eff7d 100644
 --- a/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
 +++ b/arch/powerpc/sysdev/fsl_85xx_cache_sram.c
 @@ -17,6 +17,7 @@
#include 
#include 
#include 
 +#include 

#include "fsl_85xx_cache_ctlr.h"

>>>
>>> -- 
>>> Andrew Donnellan  OzLabs, ADL Canberra
>>> a...@linux.ibm.com IBM Australia Limited
>>>
>> 
>> Wenhu
>> 




Re: [PATCH] powerpc/kprobes: Fix trap address when trap happened in real mode

2020-02-18 Thread Masami Hiramatsu
On Tue, 18 Feb 2020 06:58:06 +0100
Christophe Leroy  wrote:

> 
>  What do you mean by 'there' ? At the entry of kprobe_handler() ?
> 
>  That's what my patch does, it checks whether MMU is disabled or not. If
>  it is, it converts the address to a virtual address.
> 
>  Do you mean kprobe_handler() should bail out early as it does when the
>  trap happens in user mode ?
> >>>
> >>> Yes, that is what I meant.
> >>>
>  Of course we can do that, I don't know
>  enough about kprobe to know if kprobe_handler() should manage events
>  that happened in real-mode or just ignore them. But I tested adding an
>  event on a function that runs in real-mode, and it (now) works.
> 
>  So, what should we do really ?
> >>>
> >>> I'm not sure how the powerpc kernel runs in real mode.
> >>> But clearly, at least kprobe event can not handle that case because
> >>> it tries to access memory by probe_kernel_read(). Unless that function
> >>> correctly handles the address translation, I want to prohibit kprobes
> >>> on such address.
> >>>
> >>> So what I would like to see is, something like below.
> >>>
> >>> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> >>> index 2d27ec4feee4..4771be152416 100644
> >>> --- a/arch/powerpc/kernel/kprobes.c
> >>> +++ b/arch/powerpc/kernel/kprobes.c
> >>> @@ -261,7 +261,7 @@ int kprobe_handler(struct pt_regs *regs)
> >>>   unsigned int *addr = (unsigned int *)regs->nip;
> >>>   struct kprobe_ctlblk *kcb;
> >>>
> >>> -   if (user_mode(regs))
> >>> +   if (user_mode(regs) || !(regs->msr & MSR_IR))
> >>>   return 0;
> >>>
> >>>   /*
> >>>
> >>>
> >>
> >> With this instead change of my patch, I get an Oops everytime a kprobe
> >> event occurs in real-mode.
> >>
> >> This is because kprobe_handler() is now saying 'this trap doesn't belong
> >> to me' for a trap that has been installed by it.
> > 
> > Hmm, on powerpc, kprobes is allowed to probe on the code which runs
> > in the real mode? I think we should also prohibit it by blacklisting.
> > (It is easy to add blacklist by NOKPROBE_SYMBOL(func))
> 
> Yes, I see a lot of them tagged with _ASM_NOKPROBE_SYMBOL() on PPC64, 
> but none on PPC32. I suppose that's missing and have to be added. 

Ah, you are using PPC32. 

> Nevertheless, if one symbol has been forgotten in the blacklist, I think 
> it is a problem if it generate Oopses.

There is a long history also on x86 to make a blacklist. Anyway, how did
you get this error on PPC32? Somewhere would you like to probe and
it is a real mode function? Or, it happened unexpectedly?

> 
> > Or, some parts are possble to run under both real mode and kernel mode?
> 
> I don't think so, at least on PPC32

OK, that's a good news. Also, are there any independent section where such
real mode functions are stored? (I can see start_real_trampolines in
sections.h) If that kind of sections are defined, it is easy to make
a blacklist in arch_populate_kprobe_blacklist(). See 
arch/arm64/kernel/probes/kprobes.c.


> >> So the 'program check' exception handler doesn't find the owner of the
> >> trap hence generate an Oops.
> >>
> >> Even if we don't want kprobe() to proceed with the event entirely
> >> (allthough it works at least for simple events), I'd expect it to fail
> >> gracefully.
> > 
> > Agreed. I thought it was easy to identify real mode code. But if it is
> > hard, we should apply your first patch and also skip user handlers
> > if we are in the real mode (and increment missed count).
> 
> user handlers are already skipped.

Yes, if you don't put a kprobes on real mode code. However, if user
(accidentally) puts a probe on real mode code, it might call a
user handler?

> 
> What do you think about my latest proposal below ? If a trap is 
> encoutered in real mode, if checks if the matching virtual address 
> corresponds to a valid kprobe. If it is, it skips it. If not, it returns 
> 0 to tell "it's no me". You are also talking about incrementing the 
> missed count. Who do we do that ?

I rather like your first patch. If there is a kprobes, we can not skip
the instruction, because there is an instruction which must be executed.
(or single-skipped, but I'm not sure the emulator works correctly on
real mode)

Thank you,

> 
> 
> 
> @@ -264,6 +265,13 @@ int kprobe_handler(struct pt_regs *regs)
>   if (user_mode(regs))
>   return 0;
> 
> +if (!(regs->msr & MSR_IR)) {
> +if (!get_kprobe(phys_to_virt(regs->nip)))
> +return 0;
> +regs->nip += 4;
> +return 1;
> +}
> +
>   /*
>* We don't want to be preempted for the entire
>* duration of kprobe processing
> 
> 
> > 
> > BTW, can the emulater handle the real mode code correctly?
> 
> I don't know, how do I test that ?
> 
> Christophe


-- 
Masami Hiramatsu 


Re: [PATCH 2/2] powerpc/kernel/sysfs: Add new config option PMU_SYSFS to enable PMU SPRs sysfs file creation

2020-02-18 Thread Nageswara R Sastry

"Kajol Jain"  wrote on 14/02/2020 01:36:06 PM:

> From: "Kajol Jain" 
> To: linuxppc-dev@lists.ozlabs.org, m...@ellerman.id.au
> Cc: ma...@linux.vnet.ibm.com, a...@linux.vnet.ibm.com, Nageswara R
> Sastry/India/IBM@IBMIN, kj...@linux.ibm.com
> Date: 14/02/2020 01:36 PM
> Subject: [PATCH 2/2] powerpc/kernel/sysfs: Add new config option
> PMU_SYSFS to enable PMU SPRs sysfs file creation
>
> Many of the performance monitoring unit (PMU) SPRs are
> exposed in the sysfs. This may not be a desirable since
> "perf" API is the primary interface to program PMU and
> collect counter data in the system. But that said, we
> cant remove these sysfs files since we dont whether
> anyone/anything is using them.
>
> So the patch adds a new CONFIG option 'CONFIG_PMU_SYSFS'
> (user selectable) to be used in sysfs file creation for
> PMU SPRs. New option by default is disabled, but can be
> enabled if user needs it.
>
> Tested this patch behaviour in powernv and pseries machines.
> Patch is also tested for pmac32_defconfig.
>
> Signed-off-by: Kajol Jain 

Tested-by: Nageswara R Sastry 

Tested on PowerVM, PowerNV machines with and with out CONFIG_PMU_SYSFS
With CONFIG_PMU_SYSFS=y
For online CPUs can see the SPR files namely mmcr*, pmc*
With CONFIG_PMU_SYSFS=n
Not seeing the SPR files


> ---
>  arch/powerpc/kernel/sysfs.c| 6 ++
>  arch/powerpc/platforms/Kconfig.cputype | 6 ++
>  2 files changed, 12 insertions(+)
>
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 74da5eb..cd807e8 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -562,6 +562,7 @@ void ppc_enable_pmcs(void)
>   * that are implemented on the current processor
>   */
>
> +#ifdef CONFIG_PMU_SYSFS
>  #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_BOOK3S_32)
>  #define HAS_PPC_PMC_CLASSIC   1
>  #define HAS_PPC_PMC_IBM  1
> @@ -575,6 +576,7 @@ void ppc_enable_pmcs(void)
>  #ifdef CONFIG_PPC_BOOK3S_32
>  #define HAS_PPC_PMC_G4  1
>  #endif
> +#endif /* CONFIG_PMU_SYSFS */
>
>  #if defined(CONFIG_PPC64) && defined(CONFIG_DEBUG_MISC)
>  #define HAS_PPC_PA6T
> @@ -812,8 +814,10 @@ static int register_cpu_online(unsigned int cpu)
>   device_create_file(s, _attrs[i]);
>
>  #ifdef CONFIG_PPC64
> +#ifdef   CONFIG_PMU_SYSFS
> if (cpu_has_feature(CPU_FTR_MMCRA))
>device_create_file(s, _attr_mmcra);
> +#endif /* CONFIG_PMU_SYSFS */
>
> if (cpu_has_feature(CPU_FTR_PURR)) {
>if (!firmware_has_feature(FW_FEATURE_LPAR))
> @@ -901,8 +905,10 @@ static int unregister_cpu_online(unsigned int cpu)
>   device_remove_file(s, _attrs[i]);
>
>  #ifdef CONFIG_PPC64
> +#ifdef CONFIG_PMU_SYSFS
> if (cpu_has_feature(CPU_FTR_MMCRA))
>device_remove_file(s, _attr_mmcra);
> +#endif /* CONFIG_PMU_SYSFS */
>
> if (cpu_has_feature(CPU_FTR_PURR))
>device_remove_file(s, _attr_purr);
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/
> platforms/Kconfig.cputype
> index 8d7f9c3..e58b48d 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -417,6 +417,12 @@ config PPC_MM_SLICES
>  config PPC_HAVE_PMU_SUPPORT
> bool
>
> +config PMU_SYSFS
> +   bool "Create PMU SPRs sysfs file"
> +   default n
> +   help
> + This option enables sysfs file creation for PMU SPRs like
> MMCR* and PMC*.
> +
>  config PPC_PERF_CTRS
> def_bool y
> depends on PERF_EVENTS && PPC_HAVE_PMU_SUPPORT
> --
> 1.8.3.1
>


[PATCH v2 00/24] Manually convert thermal, crypto and misc devices to ReST

2020-02-18 Thread Mauro Carvalho Chehab


Manually convert some files from thermal, crypto and misc-devices
to ReST format.

This patch is against linux-next 20200217 tag.

v2: 

- a small change at patch 2 to avoid uneeded whitespace changes;
- added 13 new patches at the end

Mauro Carvalho Chehab (24):
  docs: thermal: convert cpu-idle-cooling.rst to ReST
  docs: crypto: convert asymmetric-keys.txt to ReST
  docs: crypto: convert api-intro.txt to ReST format
  docs: crypto: convert async-tx-api.txt to ReST format
  docs: crypto: descore-readme.txt: convert to ReST format
  docs: misc-devices/spear-pcie-gadget.txt: convert to ReST
  docs: misc-devices/pci-endpoint-test.txt: convert to ReST
  docs: misc-devices/pci-endpoint-test.txt: convert to ReST
  docs: misc-devices/c2port.txt: convert to ReST format
  docs: misc-devices/bh1770glc.txt: convert to ReST
  docs: misc-devices/apds990x.txt: convert to ReST format
  docs: pci: endpoint/function/binding/pci-test.txt convert to ReST
  docs: arm64: convert perf.txt to ReST format
  docs: cpu-freq: convert index.txt to ReST
  docs: cpu-freq: convert amd-powernow.txt to ReST
  docs: cpu-freq: convert core.txt to ReST
  docs: cpu-freq: convert cpu-drivers.txt to ReST
  docs: cpu-freq: convert cpufreq-nforce2.txt to ReST
  docs: cpu-freq: convert cpufreq-stats.txt to ReST
  docs: cpu-freq: convert pcc-cpufreq.txt to ReST
  docs: powerpc: convert vcpudispatch_stats.txt to ReST
  docs: sh: convert new-machine.txt to ReST
  docs: sh: convert register-banks.txt to ReST
  docs: trace: ring-buffer-design.txt: convert to ReST format

 .../endpoint/function/binding/pci-test.rst|  26 +
 .../endpoint/function/binding/pci-test.txt|  19 -
 Documentation/PCI/endpoint/index.rst  |   2 +
 Documentation/arm64/index.rst |   1 +
 Documentation/arm64/{perf.txt => perf.rst}|   7 +-
 .../{amd-powernow.txt => amd-powernow.rst}|  12 +-
 Documentation/cpu-freq/{core.txt => core.rst} |  65 +-
 .../{cpu-drivers.txt => cpu-drivers.rst}  | 129 ++-
 ...pufreq-nforce2.txt => cpufreq-nforce2.rst} |  18 +-
 .../{cpufreq-stats.txt => cpufreq-stats.rst}  | 121 +--
 Documentation/cpu-freq/index.rst  |  42 +
 Documentation/cpu-freq/index.txt  |  56 --
 .../{pcc-cpufreq.txt => pcc-cpufreq.rst}  |  86 +-
 .../crypto/{api-intro.txt => api-intro.rst}   | 186 ++--
 ...symmetric-keys.txt => asymmetric-keys.rst} |  91 +-
 .../{async-tx-api.txt => async-tx-api.rst}| 253 +++---
 ...{descore-readme.txt => descore-readme.rst} | 152 +++-
 Documentation/crypto/index.rst|   5 +
 .../driver-api/thermal/cpu-idle-cooling.rst   |  18 +-
 Documentation/driver-api/thermal/index.rst|   1 +
 Documentation/index.rst   |   1 +
 .../{ad525x_dpot.txt => ad525x_dpot.rst}  |  24 +-
 .../{apds990x.txt => apds990x.rst}|  31 +-
 .../{bh1770glc.txt => bh1770glc.rst}  |  45 +-
 .../misc-devices/{c2port.txt => c2port.rst}   |  58 +-
 Documentation/misc-devices/index.rst  |   6 +
 .../misc-devices/pci-endpoint-test.rst|  56 ++
 .../misc-devices/pci-endpoint-test.txt|  41 -
 .../misc-devices/spear-pcie-gadget.rst| 170 
 .../misc-devices/spear-pcie-gadget.txt| 130 ---
 Documentation/powerpc/index.rst   |   1 +
 ...patch_stats.txt => vcpudispatch_stats.rst} |  17 +-
 Documentation/sh/index.rst|   6 +
 .../sh/{new-machine.txt => new-machine.rst}   | 193 +++--
 ...{register-banks.txt => register-banks.rst} |  12 +-
 Documentation/trace/index.rst |   1 +
 ...ffer-design.txt => ring-buffer-design.rst} | 802 ++
 37 files changed, 1603 insertions(+), 1281 deletions(-)
 create mode 100644 Documentation/PCI/endpoint/function/binding/pci-test.rst
 delete mode 100644 Documentation/PCI/endpoint/function/binding/pci-test.txt
 rename Documentation/arm64/{perf.txt => perf.rst} (95%)
 rename Documentation/cpu-freq/{amd-powernow.txt => amd-powernow.rst} (89%)
 rename Documentation/cpu-freq/{core.txt => core.rst} (69%)
 rename Documentation/cpu-freq/{cpu-drivers.txt => cpu-drivers.rst} (72%)
 rename Documentation/cpu-freq/{cpufreq-nforce2.txt => cpufreq-nforce2.rst} 
(55%)
 rename Documentation/cpu-freq/{cpufreq-stats.txt => cpufreq-stats.rst} (53%)
 create mode 100644 Documentation/cpu-freq/index.rst
 delete mode 100644 Documentation/cpu-freq/index.txt
 rename Documentation/cpu-freq/{pcc-cpufreq.txt => pcc-cpufreq.rst} (80%)
 rename Documentation/crypto/{api-intro.txt => api-intro.rst} (70%)
 rename Documentation/crypto/{asymmetric-keys.txt => asymmetric-keys.rst} (91%)
 rename Documentation/crypto/{async-tx-api.txt => async-tx-api.rst} (55%)
 rename Documentation/crypto/{descore-readme.txt => descore-readme.rst} (81%)
 rename Documentation/misc-devices/{ad525x_dpot.txt => ad525x_dpot.rst} (85%)
 rename Documentation/misc-devices/{apds990x.txt => apds990x.rst} (86%)
 rename Documentation/misc-devices/{bh1770glc.txt => bh1770glc.rst} (83%)
 rename