Re: [PATCH v1 09/10] powerpc/pseries/iommu: Make use of DDW even if it does not map the partition

2020-08-23 Thread Alexey Kardashevskiy



On 18/08/2020 09:40, Leonardo Bras wrote:
> As of today, if the biggest DDW that can be created can't map the whole
> partition, it's creation is skipped and the default DMA window
> "ibm,dma-window" is used instead.
> 
> DDW is 16x bigger than the default DMA window,

16x only under very specific circumstances which are
1. phyp
2. sriov
3. device class in hmc (or what that priority number is in the lpar config).

> having the same amount of
> pages, but increasing the page size to 64k.
> Besides larger DMA window,

"Besides being larger"?

> it performs better for allocations over 4k,

Better how?

> so it would be nice to use it instead.


I'd rather say something like:
===
So far we assumed we can map the guest RAM 1:1 to the bus which worked
with a small number of devices. SRIOV changes it as the user can
configure hundreds VFs and since phyp preallocates TCEs and does not
allow IOMMU pages bigger than 64K, it has to limit the number of TCEs
per a PE to limit waste of physical pages.
===


> 
> The DDW created will be used for direct mapping by default.
> If it's not available, indirect mapping will be used instead.
> 
> For indirect mapping, it's necessary to update the iommu_table so
> iommu_alloc() can use the DDW created. For this,
> iommu_table_update_window() is called when everything else succeeds
> at enable_ddw().
> 
> Removing the default DMA window for using DDW with indirect mapping
> is only allowed if there is no current IOMMU memory allocated in
> the iommu_table. enable_ddw() is aborted otherwise.
> 
> As there will never have both direct and indirect mappings at the same
> time, the same property name can be used for the created DDW.
> 
> So renaming
> define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> to
> define DMA64_PROPNAME "linux,dma64-ddr-window-info"
> looks the right thing to do.

I know I suggested this but this does not look so good anymore as I
suspect it breaks kexec (from older kernel to this one) so you either
need to check for both DT names or just keep the old one. Changing the
macro name is fine.


> 
> To make sure the property differentiates both cases, a new u32 for flags
> was added at the end of the property, where BIT(0) set means direct
> mapping.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 108 +++--
>  1 file changed, 84 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 3a1ef02ad9d5..9544e3c91ced 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -350,8 +350,11 @@ struct dynamic_dma_window_prop {
>   __be64  dma_base;   /* address hi,lo */
>   __be32  tce_shift;  /* ilog2(tce_page_size) */
>   __be32  window_shift;   /* ilog2(tce_window_size) */
> + __be32  flags;  /* DDW properties, see bellow */
>  };
>  
> +#define DDW_FLAGS_DIRECT 0x01

This is set if ((1<= ddw_memory_hotplug_max()), you
could simply check window_shift and drop the flags.


> +
>  struct direct_window {
>   struct device_node *device;
>   const struct dynamic_dma_window_prop *prop;
> @@ -377,7 +380,7 @@ static LIST_HEAD(direct_window_list);
>  static DEFINE_SPINLOCK(direct_window_list_lock);
>  /* protects initializing window twice for same device */
>  static DEFINE_MUTEX(direct_window_init_mutex);
> -#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> +#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
>  
>  static int tce_clearrange_multi_pSeriesLP(unsigned long start_pfn,
>   unsigned long num_pfn, const void *arg)
> @@ -836,7 +839,7 @@ static void remove_ddw(struct device_node *np, bool 
> remove_prop)
>   if (ret)
>   return;
>  
> - win = of_find_property(np, DIRECT64_PROPNAME, NULL);
> + win = of_find_property(np, DMA64_PROPNAME, NULL);
>   if (!win)
>   return;
>  
> @@ -852,7 +855,7 @@ static void remove_ddw(struct device_node *np, bool 
> remove_prop)
>   np, ret);
>  }
>  
> -static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
> +static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr, bool 
> *direct_mapping)
>  {
>   struct direct_window *window;
>   const struct dynamic_dma_window_prop *direct64;
> @@ -864,6 +867,7 @@ static bool find_existing_ddw(struct device_node *pdn, 
> u64 *dma_addr)
>   if (window->device == pdn) {
>   direct64 = window->prop;
>   *dma_addr = be64_to_cpu(direct64->dma_base);
> + *direct_mapping = be32_to_cpu(direct64->flags) & 
> DDW_FLAGS_DIRECT;
>   found = true;
>   break;
>   }
> @@ -901,8 +905,8 @@ static int find_existing_ddw_windows(void)
>   if (!firmware_has_feature(FW_FEATURE_LPAR))
>   return 0;
>  
> 

Re: kernel since 5.6 do not boot anymore on Apple PowerBook

2020-08-23 Thread Christophe Leroy

Hello Giuseppe,

Le 22/08/2020 à 10:28, Giuseppe Sacco a écrit :

Hello Christophe,

Il giorno ven, 21/08/2020 alle 16.03 +0200, Christophe Leroy ha
scritto:
[...]


You also said in a previous mail that your original issue also
happens
when CONFIG_VMAP_STACK is not selected. The above bug being linked
to
CONFIG_VMAP_STACK, maybe it would be easier to bisect with
CONFIG_VMAP_STACK unselected.


I was wrong. Disabling CONFIG_VMAP_STACK led me to all "good" compile
and bisect ended without finding the culprit commit.

So, I started from scratch: I rebuilt HEAD and found that it does show
the original problem I am facing, then I rebuilt it without
CONFIG_VMAP_STACK and found that it does pass (fix?) the problem, since
kernel continue booting, but then it stops with three Oops related to
command systemd-udevd.

You may find a video that displays the complete boot, vmlinux, config,
and system.map files here:



The Oopses in the video are fixed in 5.9-rc2, see my response to your 
other mail.


So now we know that your kernel doesn't boot when CONFIG_VMAP_STACK is set.
Can you remind the exact problem ?

One common problem with CONFIG_VMAP_STACK is when some drivers are 
invalidly using buffers in stack for DMA.


Couldn't try with CONFIG_DEBUG_VIRTUAL (without CONFIG_VMAP_STACK) and 
see if it triggers some warnings ?


Thanks
Christophe


Re: Oops decoding help request

2020-08-23 Thread Christophe Leroy

Hi,

Le 23/08/2020 à 19:26, Giuseppe Sacco a écrit :

Hello,
I am not a kernel developer and I need much help in order to understand
a kernel Oops (the first of a series of three Oops). It is:


This is a bug in the kernel. It is fixed in 5.9-rc2.

See the following commit: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/powerpc/mm/book3s32/mmu.c?h=v5.9-rc2=541cebb51f3422d4f2c6cb95c1e5cc3dcc9e5021


Christophe



kernel tried to execute exec-protected page (f102) - exploit attempt? (uid: 
0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xf102
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash PowerMac
Modules linked in: crct10dif_generic (+) crct10dif_common 
drm_panel_orientation_quirks
CPU: 0 PID: 71 Comm: systemd-udevd Not tainted 5.9.0-rc1+ #298
NIP:  f102 LR: c00053a4 CTR: f102
REGS: c1c6dd50 TRAP: 0400   Not tainted  (5.9.0-rc1+)
MSR:  10009032   CR: 2284  XER: 

GPR00: c0005390 c1c6de08 c1c6b400  0cc0 0008 ef6db038 0001
GPR08: 002e    2284 00b6fb58  0005
GPR16: bff0d768 bff0d770   01032cc0  00b0b31f 01020960
GPR24:  00b70954 010206c0 ef39f4a0  00a28380 f102 f10193a0
NIP [f102] crct10dif_mod_init+0x0/0x60 [crct10dif_generic]
LR [c00053a4] do_one_initcall+0x50/0x1f4
Call Trace:
[c1c6de08] [c0005390] do_one_initcall+0x3c/0x1f4 (unreliable)
[c1c6de78] [c0102068] do_init_module+0x6c/0x27c
[c1c6dea8] [c01053cc] sys_finit_module+0xc0/0x12c
[c1c6df38] [c001c11c] ret_from_syscall+0x0/0x34
--- interrupt: c01 at 0x7a7780
 LR = 0xa1bf64
Instruction dump:
       
    <7c0802a6> 90010004 6000 9421fff0
---[ end trace 257a4bbda691894e ]---

 From what I understand, this is a problem in the init function of
module crct10dif_generic jumping at address f102.

I think I understand that f102 is an address for data and not for
code. In fact it belongs to "vmalloc & ioremap" area of the virtual
memory layout:
   * 0xffbee000..0xf000  : fixmap
   * 0xff40..0xff80  : highmem PTEs
   * 0xfda27000..0xff40  : early ioremap
   * 0xf100..0xfda27000  : vmalloc & ioremap

The init function is:

 :
0:  7c 08 02 a6 mflrr0
4:  90 01 00 04 stw r0,4(r1)
8:  48 00 00 01 bl  8 
c:  94 21 ff f0 stwur1,-16(r1)
   10:  7c 08 02 a6 mflrr0
   14:  3c 60 00 00 lis r3,0
   18:  90 01 00 14 stw r0,20(r1)
   1c:  38 63 00 00 addir3,r3,0
   20:  80 01 00 14 lwz r0,20(r1)
   24:  38 21 00 10 addir1,r1,16
   28:  7c 08 03 a6 mtlrr0
   2c:  48 00 00 00 b   2c 

and its source code is:

static int __init crct10dif_mod_init(void)
{
 return crypto_register_shash();
}

This is what I am not understanding. The error message seems to imply
that code jumps to an invalid address, so the problem would be that
address of function crypto_register_shash is calculated wrongly.

About stack addresses, please note that CONFIG_VMAP_STACK is not set.

Is this a correct reasoning?

Thank you very much,
Giuseppe



Re: [PATCH v1 08/10] powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()

2020-08-23 Thread Alexey Kardashevskiy



On 18/08/2020 09:40, Leonardo Bras wrote:
> Code used to create a ddw property that was previously scattered in
> enable_ddw() is now gathered in ddw_property_create(), which deals with
> allocation and filling the property, letting it ready for
> of_property_add(), which now occurs in sequence.
> 
> This created an opportunity to reorganize the second part of enable_ddw():
> 
> Without this patch enable_ddw() does, in order:
> kzalloc() property & members, create_ddw(), fill ddwprop inside property,
> ddw_list_add(), do tce_setrange_multi_pSeriesLP_walk in all memory,
> of_add_property().
> 
> With this patch enable_ddw() does, in order:
> create_ddw(), ddw_property_create(), of_add_property(), ddw_list_add(),
> do tce_setrange_multi_pSeriesLP_walk in all memory.
> 
> This change requires of_remove_property() in case anything fails after
> of_add_property(), but we get to do tce_setrange_multi_pSeriesLP_walk
> in all memory, which looks the most expensive operation, only if
> everything else succeeds.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 97 +++---
>  1 file changed, 57 insertions(+), 40 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 4031127c9537..3a1ef02ad9d5 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1123,6 +1123,31 @@ static void reset_dma_window(struct pci_dev *dev, 
> struct device_node *par_dn)
>ret);
>  }
>  
> +static int ddw_property_create(struct property **ddw_win, const char 
> *propname,

@propname is always the same, do you really want to pass it every time?

> +u32 liobn, u64 dma_addr, u32 page_shift, u32 
> window_shift)
> +{
> + struct dynamic_dma_window_prop *ddwprop;
> + struct property *win64;
> +
> + *ddw_win = win64 = kzalloc(sizeof(*win64), GFP_KERNEL);
> + if (!win64)
> + return -ENOMEM;
> +
> + win64->name = kstrdup(propname, GFP_KERNEL);

Not clear why "win64->name = DIRECT64_PROPNAME" would not work here, the
generic OF code does not try kfree() it but it is probably out of scope
here.


> + ddwprop = kzalloc(sizeof(*ddwprop), GFP_KERNEL);
> + win64->value = ddwprop;
> + win64->length = sizeof(*ddwprop);
> + if (!win64->name || !win64->value)
> + return -ENOMEM;


Up to 2 memory leaks here. I see the cleanup at "out_free_prop:" but
still looks fragile. Instead you could simply return win64 as the only
error possible here is -ENOMEM and returning NULL is equally good.


> +
> + ddwprop->liobn = cpu_to_be32(liobn);
> + ddwprop->dma_base = cpu_to_be64(dma_addr);
> + ddwprop->tce_shift = cpu_to_be32(page_shift);
> + ddwprop->window_shift = cpu_to_be32(window_shift);
> +
> + return 0;
> +}
> +
>  /*
>   * If the PE supports dynamic dma windows, and there is space for a table
>   * that can map all pages in a linear offset, then setup such a table,
> @@ -1140,12 +1165,11 @@ static bool enable_ddw(struct pci_dev *dev, struct 
> device_node *pdn)
>   struct ddw_query_response query;
>   struct ddw_create_response create;
>   int page_shift;
> - u64 max_addr;
> + u64 max_addr, win_addr;
>   struct device_node *dn;
>   u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   struct direct_window *window;
> - struct property *win64;
> - struct dynamic_dma_window_prop *ddwprop;
> + struct property *win64 = NULL;
>   struct failed_ddw_pdn *fpdn;
>   bool default_win_removed = false;
>  
> @@ -1244,38 +1268,34 @@ static bool enable_ddw(struct pci_dev *dev, struct 
> device_node *pdn)
>   goto out_failed;
>   }
>   len = order_base_2(max_addr);
> - win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
> - if (!win64) {
> - dev_info(>dev,
> - "couldn't allocate property for 64bit dma window\n");
> +
> + ret = create_ddw(dev, ddw_avail, , page_shift, len);
> + if (ret != 0)

It is usually just "if (ret)"


>   goto out_failed;
> - }
> - win64->name = kstrdup(DIRECT64_PROPNAME, GFP_KERNEL);
> - win64->value = ddwprop = kmalloc(sizeof(*ddwprop), GFP_KERNEL);
> - win64->length = sizeof(*ddwprop);
> - if (!win64->name || !win64->value) {
> +
> + dev_dbg(>dev, "created tce table LIOBN 0x%x for %pOF\n",
> + create.liobn, dn);
> +
> + win_addr = ((u64)create.addr_hi << 32) | create.addr_lo;
> + ret = ddw_property_create(, DIRECT64_PROPNAME, create.liobn, 
> win_addr,
> +   page_shift, len);
> + if (ret) {
>   dev_info(>dev,
> - "couldn't allocate property name and value\n");
> +  "couldn't allocate property, property name, or 
> value\n");
>   goto out_free_prop;
>   }
>  
> - ret = create_ddw(dev, ddw_avail, 

Re: [PATCH v1 06/10] powerpc/pseries/iommu: Add ddw_list_add() helper

2020-08-23 Thread Alexey Kardashevskiy



On 18/08/2020 09:40, Leonardo Bras wrote:
> There are two functions adding DDW to the direct_window_list in a
> similar way, so create a ddw_list_add() to avoid duplicity and
> simplify those functions.
> 
> Also, on enable_ddw(), add list_del() on out_free_window to allow
> removing the window from list if any error occurs.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 42 --
>  1 file changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 39617ce0ec83..fcdefcc0f365 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -872,6 +872,24 @@ static u64 find_existing_ddw(struct device_node *pdn)
>   return dma_addr;
>  }
>  
> +static struct direct_window *ddw_list_add(struct device_node *pdn,
> +   const struct dynamic_dma_window_prop 
> *dma64)
> +{
> + struct direct_window *window;
> +
> + window = kzalloc(sizeof(*window), GFP_KERNEL);
> + if (!window)
> + return NULL;
> +
> + window->device = pdn;
> + window->prop = dma64;
> + spin_lock(_window_list_lock);
> + list_add(>list, _window_list);
> + spin_unlock(_window_list_lock);
> +
> + return window;
> +}
> +
>  static int find_existing_ddw_windows(void)
>  {
>   int len;
> @@ -887,18 +905,11 @@ static int find_existing_ddw_windows(void)
>   if (!direct64)
>   continue;
>  
> - window = kzalloc(sizeof(*window), GFP_KERNEL);
> - if (!window || len < sizeof(struct dynamic_dma_window_prop)) {
> + window = ddw_list_add(pdn, direct64);
> + if (!window || len < sizeof(*direct64)) {


Since you are touching this code, it looks like the "len <
sizeof(*direct64)" part should go above to "if (!direct64)".



>   kfree(window);
>   remove_ddw(pdn, true);
> - continue;
>   }
> -
> - window->device = pdn;
> - window->prop = direct64;
> - spin_lock(_window_list_lock);
> - list_add(>list, _window_list);
> - spin_unlock(_window_list_lock);
>   }
>  
>   return 0;
> @@ -1261,7 +1272,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
> device_node *pdn)
>   dev_dbg(>dev, "created tce table LIOBN 0x%x for %pOF\n",
> create.liobn, dn);
>  
> - window = kzalloc(sizeof(*window), GFP_KERNEL);
> + /* Add new window to existing DDW list */

The comment seems to duplicate what the ddw_list_add name already suggests.


> + window = ddw_list_add(pdn, ddwprop);
>   if (!window)
>   goto out_clear_window;
>  
> @@ -1280,16 +1292,14 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
> device_node *pdn)
>   goto out_free_window;
>   }
>  
> - window->device = pdn;
> - window->prop = ddwprop;
> - spin_lock(_window_list_lock);
> - list_add(>list, _window_list);
> - spin_unlock(_window_list_lock);

I'd leave these 3 lines here and in find_existing_ddw_windows() (which
would make  ddw_list_add -> ddw_prop_alloc). In general you want to have
less stuff to do on the failure path. kmalloc may fail and needs kfree
but you can safely delay list_add (which cannot fail) and avoid having
the lock help twice in the same function (one of them is hidden inside
ddw_list_add).

Not sure if this change is really needed after all. Thanks,

> -
>   dma_addr = be64_to_cpu(ddwprop->dma_base);
>   goto out_unlock;
>  
>  out_free_window:
> + spin_lock(_window_list_lock);
> + list_del(>list);
> + spin_unlock(_window_list_lock);
> +
>   kfree(window);
>  
>  out_clear_window:
> 

-- 
Alexey


Re: [PATCH v1 07/10] powerpc/pseries/iommu: Allow DDW windows starting at 0x00

2020-08-23 Thread Alexey Kardashevskiy



On 18/08/2020 09:40, Leonardo Bras wrote:
> enable_ddw() currently returns the address of the DMA window, which is
> considered invalid if has the value 0x00.
> 
> Also, it only considers valid an address returned from find_existing_ddw
> if it's not 0x00.
> 
> Changing this behavior makes sense, given the users of enable_ddw() only
> need to know if direct mapping is possible. It can also allow a DMA window
> starting at 0x00 to be used.
> 
> This will be helpful for using a DDW with indirect mapping, as the window
> address will be different than 0x00, but it will not map the whole
> partition.
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 30 --
>  1 file changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index fcdefcc0f365..4031127c9537 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -852,24 +852,25 @@ static void remove_ddw(struct device_node *np, bool 
> remove_prop)
>   np, ret);
>  }
>  
> -static u64 find_existing_ddw(struct device_node *pdn)
> +static bool find_existing_ddw(struct device_node *pdn, u64 *dma_addr)
>  {
>   struct direct_window *window;
>   const struct dynamic_dma_window_prop *direct64;
> - u64 dma_addr = 0;
> + bool found = false;
>  
>   spin_lock(_window_list_lock);
>   /* check if we already created a window and dupe that config if so */
>   list_for_each_entry(window, _window_list, list) {
>   if (window->device == pdn) {
>   direct64 = window->prop;
> - dma_addr = be64_to_cpu(direct64->dma_base);
> + *dma_addr = be64_to_cpu(direct64->dma_base);
> + found = true;
>   break;
>   }
>   }
>   spin_unlock(_window_list_lock);
>  
> - return dma_addr;
> + return found;
>  }
>  
>  static struct direct_window *ddw_list_add(struct device_node *pdn,
> @@ -1131,15 +1132,15 @@ static void reset_dma_window(struct pci_dev *dev, 
> struct device_node *par_dn)
>   * pdn: the parent pe node with the ibm,dma_window property
>   * Future: also check if we can remap the base window for our base page size
>   *
> - * returns the dma offset for use by the direct mapped DMA code.
> + * returns true if can map all pages (direct mapping), false otherwise..
>   */
> -static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> +static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>  {
>   int len, ret;
>   struct ddw_query_response query;
>   struct ddw_create_response create;
>   int page_shift;
> - u64 dma_addr, max_addr;
> + u64 max_addr;
>   struct device_node *dn;
>   u32 ddw_avail[DDW_APPLICABLE_SIZE];
>   struct direct_window *window;
> @@ -1150,8 +1151,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
> device_node *pdn)
>  
>   mutex_lock(_window_init_mutex);
>  
> - dma_addr = find_existing_ddw(pdn);
> - if (dma_addr != 0)
> + if (find_existing_ddw(pdn, >dev.archdata.dma_offset))
>   goto out_unlock;
>  
>   /*
> @@ -1292,7 +1292,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
> device_node *pdn)
>   goto out_free_window;
>   }
>  
> - dma_addr = be64_to_cpu(ddwprop->dma_base);
> + dev->dev.archdata.dma_offset = be64_to_cpu(ddwprop->dma_base);


Do not you need the same chunk in the find_existing_ddw() case above as
well? Thanks,


>   goto out_unlock;
>  
>  out_free_window:
> @@ -1309,6 +1309,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
> device_node *pdn)
>   kfree(win64->name);
>   kfree(win64->value);
>   kfree(win64);
> + win64 = NULL;
>  
>  out_failed:
>   if (default_win_removed)
> @@ -1322,7 +1323,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
> device_node *pdn)
>  
>  out_unlock:
>   mutex_unlock(_window_init_mutex);
> - return dma_addr;
> + return win64;
>  }
>  
>  static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
> @@ -1401,11 +1402,8 @@ static bool iommu_bypass_supported_pSeriesLP(struct 
> pci_dev *pdev, u64 dma_mask)
>   break;
>   }
>  
> - if (pdn && PCI_DN(pdn)) {
> - pdev->dev.archdata.dma_offset = enable_ddw(pdev, pdn);
> - if (pdev->dev.archdata.dma_offset)
> - return true;
> - }
> + if (pdn && PCI_DN(pdn))
> + return enable_ddw(pdev, pdn);
>  
>   return false;
>  }
> 

-- 
Alexey


Re: [PATCH] powerpc/prom_init: Check display props exist before enabling btext

2020-08-23 Thread Michael Ellerman
Alexey Kardashevskiy  writes:
> On 21/08/2020 20:34, Michael Ellerman wrote:
>> It's possible to enable CONFIG_PPC_EARLY_DEBUG_BOOTX for a pseries
>> kernel (maybe it shouldn't be), which is then booted with qemu/slof.
>
>
> CONFIG_BOOTX_TEXT=y
> CONFIG_PPC_EARLY_DEBUG=y
> CONFIG_PPC_EARLY_DEBUG_BOOTX=y
>
> this does not crash my VM. The changed chunk is sitting under "if
> (prom_getprop(node, "linux,boot-display", NULL, 0)" and I cannot find
> what creates this property - it is neither slof/grub/qemu, unlikely that
> it is phyp so it must be this one:
>
> arch/powerpc/platforms/powermac/bootx_init.c|244|
> bootx_dt_add_string("linux,boot-display", mem_end);

It's in prom_init.c:

static void __init prom_init_stdout(void)
{
...
stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
if (stdout_node != PROM_ERROR) {
val = cpu_to_be32(stdout_node);

/* If it's a display, note it */
memset(type, 0, sizeof(type));
prom_getprop(stdout_node, "device_type", type, sizeof(type));
if (prom_strcmp(type, "display") == 0)
prom_setprop(stdout_node, path, "linux,boot-display", 
NULL, 0);
}
}

> which is powermac and not pseries. Or may be that pmac firmware.
>
> Where did you see this crash?

Qemu pseries either TCG or KVM with eg:

$ qemu-system-ppc64 -M pseries -cpu POWER8 -m 1G -kernel build~/vmlinux


>> But if you do that the kernel crashes in draw_byte(), with a DAR
>> pointing somewhere near INT_MAX.
>> 
>> Adding some debug to prom_init we see that we're not able to read the
>> "address" property from OF, so we're just using whatever junk value
>> was on the stack.
>> 
>> So check the properties can be read properly from OF, if not we bail
>> out before initialising btext, which avoids the crash.
>
> This is a right thing any way, just the commit log is confusing.
>
> Reviewed-by: Alexey Kardashevskiy 

Thanks.

cheers


[powerpc:merge] BUILD SUCCESS 3829b105d464789815a333dd34ceac63cfe9bd22

2020-08-23 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
merge
branch HEAD: 3829b105d464789815a333dd34ceac63cfe9bd22  Automatic merge of 
'master', 'next' and 'fixes' (2020-08-23 21:44)

elapsed time: 846m

configs tested: 107
configs skipped: 8

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
m68k   m5475evb_defconfig
m68k   bvme6000_defconfig
m68k   sun3_defconfig
powerpc  chrp32_defconfig
armqcom_defconfig
arm mv78xx0_defconfig
c6xevmc6457_defconfig
powerpc  pasemi_defconfig
armhisi_defconfig
armspear3xx_defconfig
nds32alldefconfig
powerpc powernv_defconfig
shapsh4ad0a_defconfig
arm  colibri_pxa270_defconfig
mips   sb1250_swarm_defconfig
arm shannon_defconfig
m68kq40_defconfig
ia64 bigsur_defconfig
powerpc  pmac32_defconfig
powerpc skiroot_defconfig
archsdk_defconfig
csky alldefconfig
riscvnommu_virt_defconfig
armneponset_defconfig
h8300allyesconfig
sh  polaris_defconfig
sh  urquell_defconfig
powerpc defconfig
m68k apollo_defconfig
powerpc  alldefconfig
armpleb_defconfig
armdove_defconfig
powerpcamigaone_defconfig
arm  tango4_defconfig
shshmin_defconfig
arm   tegra_defconfig
armmps2_defconfig
arm64alldefconfig
arc nps_defconfig
arm  ep93xx_defconfig
um   x86_64_defconfig
arm   cns3420vb_defconfig
armmulti_v5_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
c6x  allyesconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a002-20200823
x86_64   randconfig-a003-20200823
x86_64   randconfig-a005-20200823
x86_64   randconfig-a001-20200823
x86_64   randconfig-a006-20200823
x86_64   randconfig-a004-20200823
i386 randconfig-a002-20200823
i386 randconfig-a004-20200823
i386 randconfig-a003-20200823
i386 randconfig-a005-20200823
i386 randconfig-a006-20200823
i386 randconfig-a001-20200823
i386 randconfig-a013-20200823
i386 randconfig-a012-20200823
i386 randconfig-a011-20200823
i386 randconfig-a016-20200823
i386 randconfig-a014-20200823
i386 randconfig-a015-20200823
riscvallyesconfig
riscv allnoconfig
riscv   defconfig
riscv

[powerpc:next-test] BUILD SUCCESS 02a4db8feb0e35a215b6b803bce0ab8a1fd32838

2020-08-23 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
next-test
branch HEAD: 02a4db8feb0e35a215b6b803bce0ab8a1fd32838  powerpc/64: Remove 
unused generic_secondary_thread_init()

elapsed time: 845m

configs tested: 112
configs skipped: 8

The following configs have been built successfully.
More configs may be tested in the coming days.

arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
m68k   m5475evb_defconfig
m68k   bvme6000_defconfig
m68k   sun3_defconfig
powerpc  chrp32_defconfig
armqcom_defconfig
arm mv78xx0_defconfig
c6xevmc6457_defconfig
powerpc  pasemi_defconfig
armhisi_defconfig
armspear3xx_defconfig
nds32alldefconfig
powerpc powernv_defconfig
shapsh4ad0a_defconfig
arm  colibri_pxa270_defconfig
mips   sb1250_swarm_defconfig
arm shannon_defconfig
m68kq40_defconfig
ia64 bigsur_defconfig
powerpc  pmac32_defconfig
powerpc skiroot_defconfig
archsdk_defconfig
csky alldefconfig
riscvnommu_virt_defconfig
armneponset_defconfig
h8300allyesconfig
sh  polaris_defconfig
sh  urquell_defconfig
powerpc defconfig
m68k apollo_defconfig
powerpc  alldefconfig
armpleb_defconfig
armdove_defconfig
mips  malta_kvm_defconfig
powerpc  g5_defconfig
x86_64   alldefconfig
arm  ep93xx_defconfig
arm lpc32xx_defconfig
sh shx3_defconfig
powerpcamigaone_defconfig
arm  tango4_defconfig
shshmin_defconfig
arm   tegra_defconfig
armmps2_defconfig
arm64alldefconfig
arc nps_defconfig
um   x86_64_defconfig
arm   cns3420vb_defconfig
armmulti_v5_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
c6x  allyesconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
parisc   allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a002-20200823
x86_64   randconfig-a003-20200823
x86_64   randconfig-a005-20200823
x86_64   randconfig-a001-20200823
x86_64   randconfig-a006-20200823
x86_64   randconfig-a004-20200823
i386 randconfig-a002-20200823
i386 randconfig-a004-20200823
i386 randconfig-a003-20200823
i386 randconfig-a005-20200823
i386 randconfig-a006-20200823
i386 randconfig-a001-20200823
i386 randconfig-a013-20200823
i386 randconfig-a012-20200823
i386 randconfig-a011-20200823
i386 randconfig-a016-20200823
i386 randconfig

Re: [PATCH v1 05/10] powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper

2020-08-23 Thread Alexey Kardashevskiy



On 18/08/2020 09:40, Leonardo Bras wrote:
> Creates a helper to allow allocating a new iommu_table without the need
> to reallocate the iommu_group.
> 
> This will be helpful for replacing the iommu_table for the new DMA window,
> after we remove the old one with iommu_tce_table_put().
> 
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 25 ++---
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
> b/arch/powerpc/platforms/pseries/iommu.c
> index 8fe23b7dff3a..39617ce0ec83 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -53,28 +53,31 @@ enum {
>   DDW_EXT_QUERY_OUT_SIZE = 2
>  };
>  
> -static struct iommu_table_group *iommu_pseries_alloc_group(int node)
> +static struct iommu_table *iommu_pseries_alloc_table(int node)
>  {
> - struct iommu_table_group *table_group;
>   struct iommu_table *tbl;
>  
> - table_group = kzalloc_node(sizeof(struct iommu_table_group), GFP_KERNEL,
> -node);
> - if (!table_group)
> - return NULL;
> -
>   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, node);
>   if (!tbl)
> - goto free_group;
> + return NULL;
>  
>   INIT_LIST_HEAD_RCU(>it_group_list);
>   kref_init(>it_kref);
> + return tbl;
> +}
>  
> - table_group->tables[0] = tbl;
> +static struct iommu_table_group *iommu_pseries_alloc_group(int node)
> +{
> + struct iommu_table_group *table_group;
> +
> + table_group = kzalloc_node(sizeof(*table_group), GFP_KERNEL, node);


I'd prefer you did not make unrelated changes (sizeof(struct
iommu_table_group) -> sizeof(*table_group)) so the diff stays shorter
and easier to follow. You changed  sizeof(struct iommu_table_group) but
not sizeof(struct iommu_table) and this confused me enough to spend more
time than this straight forward change deserves.

Not important in this case though so

Reviewed-by: Alexey Kardashevskiy 




> + if (!table_group)
> + return NULL;
>  
> - return table_group;
> + table_group->tables[0] = iommu_pseries_alloc_table(node);
> + if (table_group->tables[0])
> + return table_group;
>  
> -free_group:
>   kfree(table_group);
>   return NULL;
>  }
> 

-- 
Alexey


[PATCH] Documentation/powerpc: fix malformed table in syscall64-abi

2020-08-23 Thread Randy Dunlap
From: Randy Dunlap 

Fix malformed table warning in powerpc/syscall64-abi.rst by making
two tables and moving the headings.

Documentation/powerpc/syscall64-abi.rst:53: WARNING: Malformed table.
Text in column margin in table line 2.

=== = 
--- For the sc instruction, differences with the ELF ABI ---
r0  Volatile  (System call number.)
r3  Volatile  (Parameter 1, and return value.)
r4-r8   Volatile  (Parameters 2-6.)
cr0 Volatile  (cr0.SO is the return error condition.)
cr1, cr5-7  Nonvolatile
lr  Nonvolatile

--- For the scv 0 instruction, differences with the ELF ABI ---
r0  Volatile  (System call number.)
r3  Volatile  (Parameter 1, and return value.)
r4-r8   Volatile  (Parameters 2-6.)
=== = 

Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv 
instructions")
Signed-off-by: Randy Dunlap 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
---
 Documentation/powerpc/syscall64-abi.rst |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- lnx-59-rc2.orig/Documentation/powerpc/syscall64-abi.rst
+++ lnx-59-rc2/Documentation/powerpc/syscall64-abi.rst
@@ -49,16 +49,18 @@ Register preservation rules
 Register preservation rules match the ELF ABI calling sequence with the
 following differences:
 
-=== = 
 --- For the sc instruction, differences with the ELF ABI ---
+=== = 
 r0  Volatile  (System call number.)
 r3  Volatile  (Parameter 1, and return value.)
 r4-r8   Volatile  (Parameters 2-6.)
 cr0 Volatile  (cr0.SO is the return error condition.)
 cr1, cr5-7  Nonvolatile
 lr  Nonvolatile
+=== = 
 
 --- For the scv 0 instruction, differences with the ELF ABI ---
+=== = 
 r0  Volatile  (System call number.)
 r3  Volatile  (Parameter 1, and return value.)
 r4-r8   Volatile  (Parameters 2-6.)




Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.9-3 tag

2020-08-23 Thread pr-tracker-bot
The pull request you sent on Sun, 23 Aug 2020 22:50:13 +1000:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> tags/powerpc-5.9-3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/cb95712138ec5e480db5160b41172bbc6f6494cc

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Oops decoding help request

2020-08-23 Thread Giuseppe Sacco
Hello,
I am not a kernel developer and I need much help in order to understand
a kernel Oops (the first of a series of three Oops). It is:

kernel tried to execute exec-protected page (f102) - exploit attempt? (uid: 
0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xf102
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash PowerMac
Modules linked in: crct10dif_generic (+) crct10dif_common 
drm_panel_orientation_quirks
CPU: 0 PID: 71 Comm: systemd-udevd Not tainted 5.9.0-rc1+ #298
NIP:  f102 LR: c00053a4 CTR: f102
REGS: c1c6dd50 TRAP: 0400   Not tainted  (5.9.0-rc1+)
MSR:  10009032   CR: 2284  XER: 

GPR00: c0005390 c1c6de08 c1c6b400  0cc0 0008 ef6db038 0001
GPR08: 002e    2284 00b6fb58  0005
GPR16: bff0d768 bff0d770   01032cc0  00b0b31f 01020960
GPR24:  00b70954 010206c0 ef39f4a0  00a28380 f102 f10193a0
NIP [f102] crct10dif_mod_init+0x0/0x60 [crct10dif_generic]
LR [c00053a4] do_one_initcall+0x50/0x1f4
Call Trace:
[c1c6de08] [c0005390] do_one_initcall+0x3c/0x1f4 (unreliable)
[c1c6de78] [c0102068] do_init_module+0x6c/0x27c
[c1c6dea8] [c01053cc] sys_finit_module+0xc0/0x12c
[c1c6df38] [c001c11c] ret_from_syscall+0x0/0x34
--- interrupt: c01 at 0x7a7780
LR = 0xa1bf64
Instruction dump:
       
    <7c0802a6> 90010004 6000 9421fff0
---[ end trace 257a4bbda691894e ]---

>From what I understand, this is a problem in the init function of
module crct10dif_generic jumping at address f102.

I think I understand that f102 is an address for data and not for
code. In fact it belongs to "vmalloc & ioremap" area of the virtual
memory layout:
  * 0xffbee000..0xf000  : fixmap
  * 0xff40..0xff80  : highmem PTEs
  * 0xfda27000..0xff40  : early ioremap
  * 0xf100..0xfda27000  : vmalloc & ioremap

The init function is:

 :
   0:   7c 08 02 a6 mflrr0
   4:   90 01 00 04 stw r0,4(r1)
   8:   48 00 00 01 bl  8 
   c:   94 21 ff f0 stwur1,-16(r1)
  10:   7c 08 02 a6 mflrr0
  14:   3c 60 00 00 lis r3,0
  18:   90 01 00 14 stw r0,20(r1)
  1c:   38 63 00 00 addir3,r3,0
  20:   80 01 00 14 lwz r0,20(r1)
  24:   38 21 00 10 addir1,r1,16
  28:   7c 08 03 a6 mtlrr0
  2c:   48 00 00 00 b   2c 

and its source code is:

static int __init crct10dif_mod_init(void)
{
return crypto_register_shash();
}

This is what I am not understanding. The error message seems to imply
that code jumps to an invalid address, so the problem would be that
address of function crypto_register_shash is calculated wrongly.

About stack addresses, please note that CONFIG_VMAP_STACK is not set.

Is this a correct reasoning?

Thank you very much,
Giuseppe



[GIT PULL] Please pull powerpc/linux.git powerpc-5.9-3 tag

2020-08-23 Thread Michael Ellerman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Hi Linus,

Please pull some more powerpc fixes for 5.9.

There's one non-fix, which is the perf extended regs support. That was posted
way back but I waited for the tools/perf part to land, which it now has.

cheers


The following changes since commit 9123e3a74ec7b934a4a099e98af6a61c2f80bbf5:

  Linux 5.9-rc1 (2020-08-16 13:04:57 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-5.9-3

for you to fetch changes up to 64ef8f2c4791940d7f3945507b6a45c20d959260:

  powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver 
(2020-08-21 23:35:27 +1000)

- --
powerpc fixes for 5.9 #3

Add perf support for emitting extended registers for power10.

A fix for CPU hotplug on pseries, where on large/loaded systems we may not wait
long enough for the CPU to be offlined, leading to crashes.

Addition of a raw cputable entry for Power10, which is not required to boot, but
is required to make our PMU setup work correctly in guests.

Three fixes for the recent changes on 32-bit Book3S to move modules into their
own segment for strict RWX.

A fix for a recent change in our powernv PCI code that could lead to crashes.

A change to our perf interrupt accounting to avoid soft lockups when using some
events, found by syzkaller.

A change in the way we handle power loss events from the hypervisor on pseries.
We no longer immediately shut down if we're told we're running on a UPS.

A few other minor fixes.

Thanks to:
  Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T Sudhakar,
  Athira Rajeev, Christophe Leroy, Frederic Barrat, Greg Kurz, Kajol Jain,
  Madhavan Srinivasan, Michael Neuling, Michael Roth, Nageswara R Sastry, Oliver
  O'Halloran, Thiago Jung Bauermann, Vaidyanathan Srinivasan, Vasant Hegde.

- --
Aneesh Kumar K.V (1):
  powerpc/pkeys: Fix build error with PPC_MEM_KEYS disabled

Anju T Sudhakar (1):
  powerpc/perf: Add support for outputting extended regs in perf intr_regs

Athira Rajeev (2):
  powerpc/perf: Add extended regs support for power10 platform
  powerpc/perf: Fix soft lockups due to missed interrupt accounting

Christophe Leroy (4):
  powerpc/fixmap: Fix the size of the early debug area
  powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
  powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
  powerpc/32s: Fix module loading failure when VMALLOC_END is over 
0xf000

Frederic Barrat (1):
  powerpc/powernv/pci: Fix possible crash when releasing DMA resources

Kajol Jain (1):
  powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver

Madhavan Srinivasan (2):
  powerpc: Add POWER10 raw mode cputable entry
  powerpc/kernel: Cleanup machine check function declarations

Michael Neuling (1):
  powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 cores

Michael Roth (1):
  powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death

Vasant Hegde (1):
  powerpc/pseries: Do not initiate shutdown when system is running on UPS


 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 |  2 +-
 arch/powerpc/include/asm/cputable.h  |  5 +++
 arch/powerpc/include/asm/fixmap.h|  2 +-
 arch/powerpc/include/asm/kasan.h |  9 +++-
 arch/powerpc/include/asm/mce.h   |  7 
 arch/powerpc/include/asm/perf_event.h|  3 ++
 arch/powerpc/include/asm/perf_event_server.h |  5 +++
 arch/powerpc/include/uapi/asm/perf_regs.h| 20 -
 arch/powerpc/kernel/cputable.c   | 22 
--
 arch/powerpc/kernel/dt_cpu_ftrs.c|  4 --
 arch/powerpc/kernel/setup-common.c   |  1 +
 arch/powerpc/mm/book3s32/mmu.c   |  9 +++-
 arch/powerpc/mm/book3s64/hash_utils.c|  4 +-
 arch/powerpc/perf/core-book3s.c  |  5 +++
 arch/powerpc/perf/hv-24x7.c  | 11 -
 arch/powerpc/perf/perf_regs.c| 44 
++--
 arch/powerpc/perf/power10-pmu.c  |  6 +++
 arch/powerpc/perf/power9-pmu.c   |  6 +++
 arch/powerpc/platforms/powernv/pci-ioda.c|  2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 18 +---
 arch/powerpc/platforms/pseries/ras.c |  1 -
 21 files changed, 161 insertions(+), 25 deletions(-)
-BEGIN PGP SIGNATURE-