Re: [PATCH 1/1] iommu/arm-smmu: Log CBFRSYNRA register on context fault

2019-04-18 Thread Vivek Gautam
On Fri, Apr 19, 2019 at 5:55 AM Bjorn Andersson
 wrote:
>
> On Mon 15 Apr 10:37 PDT 2019, Vivek Gautam wrote:
>
> > Bits[15:0] in CBFRSYNRA register contain information about
> > StreamID of the incoming transaction that generated the
> > fault. Dump CBFRSYNRA register to get this info.
> > This is especially useful in a distributed SMMU architecture
> > where multiple masters are connected to the SMMU.
> > SID information helps to quickly identify the faulting
> > master device.
> >
> > Signed-off-by: Vivek Gautam 
> > ---
> >
> > V1 of the patch available @
> > https://lore.kernel.org/patchwork/patch/1061615/
> >
> > Changes from v1:
> >  - Dump the raw register value of CBFRSYNRA register in the
> >    context fault log rather than extracting the SID information
> >and dumping that.
> >
> >  drivers/iommu/arm-smmu-regs.h | 2 ++
> >  drivers/iommu/arm-smmu.c  | 8 ++--
> >  2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
> > index a1226e4ab5f8..e9132a926761 100644
> > --- a/drivers/iommu/arm-smmu-regs.h
> > +++ b/drivers/iommu/arm-smmu-regs.h
> > @@ -147,6 +147,8 @@ enum arm_smmu_s2cr_privcfg {
> >  #define CBAR_IRPTNDX_SHIFT   24
> >  #define CBAR_IRPTNDX_MASK0xff
> >
> > +#define ARM_SMMU_GR1_CBFRSYNRA(n)(0x400 + ((n) << 2))
> > +
> >  #define ARM_SMMU_GR1_CBA2R(n)(0x800 + ((n) << 2))
> >  #define CBA2R_RW64_32BIT (0 << 0)
> >  #define CBA2R_RW64_64BIT (1 << 0)
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 045d93884164..a4773e8c6b0e 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -575,7 +575,9 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> >   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >   struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> >   struct arm_smmu_device *smmu = smmu_domain->smmu;
> > + void __iomem *gr1_base = ARM_SMMU_GR1(smmu);
> >   void __iomem *cb_base;
> > + u32 cbfrsynra;
> >
> >   cb_base = ARM_SMMU_CB(smmu, cfg->cbndx);
> >   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
> > @@ -585,10 +587,12 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> >
> >   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
> >   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
> > + cbfrsynra = readl_relaxed(gr1_base +
> > +   ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx));
>
> The 80 char limit is more like a guideline anyways...please don't wrap
> this.
>
> >
> >   dev_err_ratelimited(smmu->dev,
> > - "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cb=%d\n",
> > - fsr, iova, fsynr, cfg->cbndx);
> > + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra = 0x%x, cb=%d\n",
>
> Drop the spaces around '='.
>
> With those addressed, you have my
>
> Reviewed-by: Bjorn Andersson 

Thanks for the review Bjorn. Will address above comments and respin.
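
For what it's worth, addressing both comments would leave the respun hunk looking roughly like this (an illustrative sketch only, not the actual respin):

    /* Fragment of arm_smmu_context_fault(): readl_relaxed() left unwrapped
     * and the spaces around '=' dropped in the format string.
     */
    cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx));

    dev_err_ratelimited(smmu->dev,
            "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
            fsr, iova, fsynr, cbfrsynra, cfg->cbndx);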

Best regards
Vivek

>
> Regards,
> Bjorn
>
> > + fsr, iova, fsynr, cbfrsynra, cfg->cbndx);
> >
> >   writel(fsr, cb_base + ARM_SMMU_CB_FSR);
> >   return IRQ_HANDLED;
> > --
> > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> > of Code Aurora Forum, hosted by The Linux Foundation
> >



-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: [PATCH 10/18] iommu/vt-d: Add custom allocator for IOASID

2019-04-18 Thread Jacob Pan
On Thu, 18 Apr 2019 16:36:02 +0100
Jean-Philippe Brucker  wrote:

> On 16/04/2019 00:10, Jacob Pan wrote:[...]
> >> > +   /*
> >> > +    * Register a custom ASID allocator if we are running
> >> > +    * in a guest, the purpose is to have a system wide PASID
> >> > +    * namespace among all PASID users.
> >> > +    * Note that only one vIOMMU in each guest is supported.
> >> 
> >> Why one vIOMMU per guest?  This would prevent guests with multiple
> >> PCI domains aiui.
> >>   
> > This is mainly for simplicity reasons. These are all virtual BDFs
> > anyway. As long as guest BDF can be mapped to a host BDF, it should
> > be sufficient, am I missing anything?
> > 
> > From PASID allocation perspective, it is not tied to any PCI device
> > until bind call. We only need to track PASID ownership per guest.
> > 
> > virtio-IOMMU spec does support multiple PCI domains but I am not
> > sure if that applies to all assigned devices, whether all assigned
> > devices are under the same domain. Perhaps Jean can help to clarify
> > how PASID allocation API looks like on virtio IOMMU.  
> 
> [Ugh, this is much longer than I hoped. In short I don't think
> multiple vIOMMUs is a problem, because the host uses the same
> allocator for all of them.]
> 
I agreed, it is not an issue as far as PASID allocation is concerned.
> Yes there can be a single virtio-iommu instance for multiple PCI
> domains, or multiple instances each managing assigned devices. It's
> up to the hypervisor to decide on the topology.
> 
> For Linux and QEMU I was assuming that choosing the vIOMMU used for
> PASID allocation isn't a big deal, since in the end they all use the
> same allocator in the host. It gets complicated when some vIOMMUs can
> be removed at runtime (unload the virtio-iommu module that was
> providing the PASID allocator, and then you can't allocate PASIDs for
> the VT-d instance anymore), so maybe limiting to one type of vIOMMU
> (don't mix VT-d and virtio-iommu in the same VM) is more reasonable.
> 
I think you can deal with the hot removal of a vIOMMU by keeping multiple
allocators in a list, i.e. when the second vIOMMU registers an allocator,
instead of returning -EBUSY we just keep it in a back-pocket list. If the
first vIOMMU is removed, the second one can be popped out into action (and
vice versa). Then we always have an allocator.
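
A minimal sketch of that idea (all of these names are hypothetical, not an existing kernel API):

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <linux/types.h>

    /* Hypothetical sketch: keep spare PASID allocators on a list so that a
     * surviving vIOMMU's allocator can take over when the active one goes
     * away on hot removal.
     */
    struct pasid_allocator {
            u32 (*alloc)(u32 min, u32 max, void *pdata);
            void (*free)(u32 pasid, void *pdata);
            void *pdata;
            struct list_head list;
    };

    static LIST_HEAD(pasid_allocators);        /* first entry is the active one */
    static DEFINE_SPINLOCK(pasid_allocators_lock);

    static void pasid_register_allocator(struct pasid_allocator *a)
    {
            spin_lock(&pasid_allocators_lock);
            /* Later registrations simply queue up instead of returning -EBUSY. */
            list_add_tail(&a->list, &pasid_allocators);
            spin_unlock(&pasid_allocators_lock);
    }

    static void pasid_unregister_allocator(struct pasid_allocator *a)
    {
            spin_lock(&pasid_allocators_lock);
            list_del(&a->list);
            /* Whatever is now first on the list implicitly becomes active,
             * so there is always an allocator as long as one vIOMMU remains.
             */
            spin_unlock(&pasid_allocators_lock);
    }
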
> It's a bit more delicate from the virtio-iommu perspective. The
> interface is portable and I can't tie it down to the choices we're
> making for Linux and KVM. Having a system-wide PASID space is what we
> picked for Linux but the PCIe architecture allows for each device to
> have their own PASID space, and I suppose some hypervisors and guests
> might prefer implementing it that way.
> 
> My plan for the moment is to implement global PASID allocation using
> one feature bit and two new requests, but leave space for a
> per-device PASID allocation, introduced with another feature bit if
> necessary. If it ever gets added, I expect the per-device allocation
> to be done during the bind request rather than with a separate
> PASID_ALLOC request.
> 
> So currently I have a new feature bit and two commands:
> 
> #define VIRTIO_IOMMU_F_PASID_ALLOC
> #define VIRTIO_IOMMU_T_ALLOC_PASID
> #define VIRTIO_IOMMU_T_FREE_PASID
> 
> struct virtio_iommu_req_alloc_pasid {
> struct virtio_iommu_req_head head;
> u32 reserved;
> 
> /* Device-writeable */
> le32 pasid;
> struct virtio_iommu_req_tail tail;
> };
> 
> struct virtio_iommu_req_free_pasid {
> struct virtio_iommu_req_head head;
> u32 reserved;
> le32 pasid;
> 
> /* Device-writeable */
> struct virtio_iommu_req_tail tail;
> };
> 
> If the feature bit is offered it must be used, and the guest can only
> use PASIDs allocated via VIRTIO_IOMMU_T_ALLOC_PASID in its bind
> requests.
> 
> The PASID space is global at the host scope. If multiple virtio-iommu
> devices in the VM offer the feature bit, then using either of their
> command queue to issue a VIRTIO_IOMMU_F_ALLOC_PASID and
> VIRTIO_IOMMU_F_FREE_PASID is equivalent. Another possibility is to
> require that only one of the virtio-iommu instances per VM offers the
> feature bit. I do prefer this option, but there is the vIOMMU removal
> problem mentioned above - which, with the first option, could be
> solved by keeping a list of PASID allocator functions rather than a
> single one.
> 
> I'm considering adding max_pasid field to
> virtio_iommu_req_alloc_pasid. If VIRTIO_IOMMU_T_ALLOC_PASID returns a
> random 20-bit value then a lot of space might be needed for storing
> PASID contexts (is that a real concern though? For internal data it
> can use a binary tree, and the guest is not in charge of hardware
> PASID tables here). If the guest is short on memory then it could
> benefit from a smaller number of PASID bits. That could 

Re: [patch V2 01/29] tracing: Cleanup stack trace code

2019-04-18 Thread Steven Rostedt
On Fri, 19 Apr 2019 00:44:17 +0200 (CEST)
Thomas Gleixner  wrote:

> On Thu, 18 Apr 2019, Steven Rostedt wrote:
> > On Thu, 18 Apr 2019 10:41:20 +0200
> > Thomas Gleixner  wrote:
> >   
> > > @@ -412,23 +404,20 @@ stack_trace_sysctl(struct ctl_table *tab
> > >  void __user *buffer, size_t *lenp,
> > >  loff_t *ppos)
> > >  {
> > > - int ret;
> > > + int ret, was_enabled;  
> > 
> > One small nit. Could this be:
> > 
> > int was_enabled;
> > int ret;
> > 
> > I prefer only joining variables that are related on the same line.
> > Makes it look cleaner IMO.  
> 
> If you wish so. To me it's a waste of screen space :)

At least you didn't say it helps the compiler ;-)

> 
> > >  
> > >   mutex_lock(&stack_sysctl_mutex);
> > > + was_enabled = !!stack_tracer_enabled;
> > >
> > 
> > Bah, not sure why I didn't do it this way to begin with. I think I
> > copied something else that couldn't do it this way for some reason and
> > didn't put any brain power behind the copy. :-/ But that was back in
> > 2008 so I blame it on being "young and stupid" ;-)  
> 
> The young part is gone for sure :)

I purposely set you up for that response.

> 
> > Other than the above nit and removing the unneeded +1 in max_entries:
> 
> s/+1/-1/

That was an ode to G+

-- Steve


Re: [PATCH 1/1] iommu/arm-smmu: Log CBFRSYNRA register on context fault

2019-04-18 Thread Bjorn Andersson
On Mon 15 Apr 10:37 PDT 2019, Vivek Gautam wrote:

> Bits[15:0] in CBFRSYNRA register contain information about
> StreamID of the incoming transaction that generated the
> fault. Dump CBFRSYNRA register to get this info.
> This is especially useful in a distributed SMMU architecture
> where multiple masters are connected to the SMMU.
> SID information helps to quickly identify the faulting
> master device.
> 
> Signed-off-by: Vivek Gautam 
> ---
> 
> V1 of the patch available @
> https://lore.kernel.org/patchwork/patch/1061615/
> 
> Changes from v1:
>  - Dump the raw register value of CBFRSYNRA register in the
>    context fault log rather than extracting the SID information
>and dumping that.
> 
>  drivers/iommu/arm-smmu-regs.h | 2 ++
>  drivers/iommu/arm-smmu.c  | 8 ++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
> index a1226e4ab5f8..e9132a926761 100644
> --- a/drivers/iommu/arm-smmu-regs.h
> +++ b/drivers/iommu/arm-smmu-regs.h
> @@ -147,6 +147,8 @@ enum arm_smmu_s2cr_privcfg {
>  #define CBAR_IRPTNDX_SHIFT   24
>  #define CBAR_IRPTNDX_MASK0xff
>  
> +#define ARM_SMMU_GR1_CBFRSYNRA(n)(0x400 + ((n) << 2))
> +
>  #define ARM_SMMU_GR1_CBA2R(n)(0x800 + ((n) << 2))
>  #define CBA2R_RW64_32BIT (0 << 0)
>  #define CBA2R_RW64_64BIT (1 << 0)
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 045d93884164..a4773e8c6b0e 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -575,7 +575,9 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>   struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> + void __iomem *gr1_base = ARM_SMMU_GR1(smmu);
>   void __iomem *cb_base;
> + u32 cbfrsynra;
>  
>   cb_base = ARM_SMMU_CB(smmu, cfg->cbndx);
>   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
> @@ -585,10 +587,12 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>  
>   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
>   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
> + cbfrsynra = readl_relaxed(gr1_base +
> +   ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx));

The 80 char limit is more like a guideline anyways...please don't wrap
this.

>  
>   dev_err_ratelimited(smmu->dev,
> - "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cb=%d\n",
> - fsr, iova, fsynr, cfg->cbndx);
> + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra = 0x%x, cb=%d\n",

Drop the spaces around '='.

With those addressed, you have my

Reviewed-by: Bjorn Andersson 

Regards,
Bjorn

> + fsr, iova, fsynr, cbfrsynra, cfg->cbndx);
>  
>   writel(fsr, cb_base + ARM_SMMU_CB_FSR);
>   return IRQ_HANDLED;
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
> 


Re: [PATCH v4 1/3] PCI: Add dma_ranges window list

2019-04-18 Thread Bjorn Helgaas
On Fri, Apr 12, 2019 at 08:43:33AM +0530, Srinath Mannam wrote:
> Add a dma_ranges field in PCI host bridge structure to hold resource
> entries list of memory regions in sorted order given through dma-ranges
> DT property.
> 
> While initializing IOMMU domain of PCI EPs connected to that host bridge
> This list of resources will be processed and IOVAs for the address holes
> will be reserved.

s/bridge This list/bridge, this list/

> Signed-off-by: Srinath Mannam 
> Based-on-patch-by: Oza Pawandeep 
> Reviewed-by: Oza Pawandeep 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/probe.c | 3 +++
>  include/linux/pci.h | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 257b9f6..ce5505f 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -544,6 +544,7 @@ struct pci_host_bridge *pci_alloc_host_bridge(size_t priv)
>   return NULL;
>  
>   INIT_LIST_HEAD(&bridge->windows);
> + INIT_LIST_HEAD(&bridge->dma_ranges);
>   bridge->dev.release = pci_release_host_bridge_dev;
>  
>   /*
> @@ -572,6 +573,7 @@ struct pci_host_bridge *devm_pci_alloc_host_bridge(struct device *dev,
>   return NULL;
>  
>   INIT_LIST_HEAD(&bridge->windows);
> + INIT_LIST_HEAD(&bridge->dma_ranges);
>   bridge->dev.release = devm_pci_release_host_bridge_dev;
>  
>   return bridge;
> @@ -581,6 +583,7 @@ EXPORT_SYMBOL(devm_pci_alloc_host_bridge);
>  void pci_free_host_bridge(struct pci_host_bridge *bridge)
>  {
>   pci_free_resource_list(&bridge->windows);
> + pci_free_resource_list(&bridge->dma_ranges);
>  
>   kfree(bridge);
>  }
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 65f1d8c..016a044 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -487,6 +487,7 @@ struct pci_host_bridge {
>   void*sysdata;
>   int busnr;
>   struct list_head windows;   /* resource_entry */
> + struct list_head dma_ranges;/* dma ranges resource list */
>   u8 (*swizzle_irq)(struct pci_dev *, u8 *); /* Platform IRQ swizzler */
>   int (*map_irq)(const struct pci_dev *, u8, u8);
>   void (*release_fn)(struct pci_host_bridge *);
> -- 
> 2.7.4
> 
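
As a rough illustration of how a host bridge driver is expected to feed this list (a hypothetical sketch loosely modelled on the pcie-iproc change elsewhere in this series, not code from this patch):

    #include <linux/pci.h>
    #include <linux/resource_ext.h>

    /* Add one allowed inbound window to the bridge's dma_ranges list. A real
     * driver would do this for every dma-ranges entry, keeping the list
     * sorted by address so the IOMMU code can reserve the holes in between.
     */
    static int example_add_dma_range(struct pci_host_bridge *bridge,
                                     struct resource *res)
    {
            struct resource_entry *entry;

            entry = resource_list_create_entry(res, 0);
            if (!entry)
                    return -ENOMEM;

            resource_list_add_tail(entry, &bridge->dma_ranges);
            return 0;
    }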


Re: [PATCH v4 0/3] PCIe Host request to reserve IOVA

2019-04-18 Thread Bjorn Helgaas
[+cc Scott]

On Fri, Apr 12, 2019 at 08:43:32AM +0530, Srinath Mannam wrote:
> A few SoCs have a limitation that their PCIe host can't allow certain inbound
> address ranges. The allowed inbound address ranges are listed in the dma-ranges
> DT property, and these address ranges are required to do IOVA mapping.
> The remaining address ranges have to be reserved in IOVA mapping.
> 
> The PCIe host driver of those SoCs has to list resource entries of the allowed
> address ranges given in the dma-ranges DT property in sorted order. This
> sorted list of resources will be processed, and IOVA addresses for the
> inaccessible address holes will be reserved while initializing the IOMMU domain.
> 
> This patch set is based on Linux-5.0-rc2.
> 
> Changes from v3:
>   - Addressed Robin Murphy review comments.
> - pcie-iproc: parse dma-ranges and make sorted resource list.
> - dma-iommu: process list and reserve gaps between entries
> 
> Changes from v2:
>   - Patch set rebased to Linux-5.0-rc2
> 
> Changes from v1:
>   - Addressed Oza review comments.
> 
> Srinath Mannam (3):
>   PCI: Add dma_ranges window list
>   iommu/dma: Reserve IOVA for PCIe inaccessible DMA address
>   PCI: iproc: Add sorted dma ranges resource entries to host bridge
> 
>  drivers/iommu/dma-iommu.c   | 19 
>  drivers/pci/controller/pcie-iproc.c | 44 
> -
>  drivers/pci/probe.c |  3 +++
>  include/linux/pci.h |  1 +
>  4 files changed, 66 insertions(+), 1 deletion(-)

To make progress on this, I think we need an ack from Joerg for the
dma-iommu.c part, an ack from Ray or Scott for the pcie-iproc.c part,
and an ack from Robin for the thing as a whole.

Then I would say Lorenzo should take a look and merge if he approves, since
pcie-iproc.c contains the most changes.

Bjorn
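
For readers following along, the dma-iommu side of the series boils down to something like the sketch below (illustrative names only; alignment details and the region above the last window are deliberately glossed over):

    #include <linux/iova.h>
    #include <linux/pci.h>
    #include <linux/resource_ext.h>

    /* Walk the sorted dma_ranges list added in patch 1/3 and reserve the IOVA
     * holes between consecutive allowed windows, so the allocator never hands
     * out addresses the host bridge cannot accept.
     */
    static void example_reserve_dma_range_holes(struct pci_host_bridge *bridge,
                                                struct iova_domain *iovad)
    {
            struct resource_entry *entry;
            dma_addr_t start = 0;        /* start of the current candidate hole */

            resource_list_for_each_entry(entry, &bridge->dma_ranges) {
                    if (start < entry->res->start)
                            reserve_iova(iovad, iova_pfn(iovad, start),
                                         iova_pfn(iovad, entry->res->start - 1));
                    start = entry->res->end + 1;
            }
    }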


Re: [PATCH v4 0/3] PCIe Host request to reserve IOVA

2019-04-18 Thread Bjorn Helgaas
On Tue, Apr 16, 2019 at 05:28:36PM +0530, Srinath Mannam wrote:
> On Sat, Apr 13, 2019 at 4:04 AM Bjorn Helgaas  wrote:
> > On Fri, Apr 12, 2019 at 08:43:32AM +0530, Srinath Mannam wrote:
> > > A few SoCs have a limitation that their PCIe host can't allow certain inbound
> > > address ranges. The allowed inbound address ranges are listed in the dma-ranges
> > > DT property, and these address ranges are required to do IOVA mapping.
> > > The remaining address ranges have to be reserved in IOVA mapping.
>
> > If I understand correctly, devices below these PCIe host bridges can
> > DMA only to the listed address ranges, and you prevent devices from
> > doing DMA to the holes between the listed ranges by reserving the
> > holes in dma-iommu.
>
> Yes, devices below these PCIe host bridges can DMA only to the listed
> address ranges, and this patch prevents allocating DMA (IOVA) addresses
> in the holes of the listed ranges.
>
> > Apparently there's something that makes sure driver dma_map_*() always
> > goes through dma-iommu?  I traced as far as seeing that dma-iommu
> > depends on CONFIG_IOMMU_DMA, and that arm64 selects CONFIG_IOMMU_DMA
> > if CONFIG_IOMMU_SUPPORT, but then the trail got cold.  I didn't see
> > what selects CONFIG_IOMMU_SUPPORT.
>
> IOMMU_SUPPORT depends on MMU.

Yes, I see that IOMMU_SUPPORT depends on MMU (in
drivers/iommu/Kconfig).  But that doesn't *select* IOMMU_SUPPORT; it
only means you *can't* select it unless MMU has already been selected.

I think you only get dma-iommu if you choose to select IOMMU_SUPPORT
via menuconfig or whatever, and the current config rules allow you to
turn that off.  Maybe that's OK, I dunno.  If you do turn it off, I
guess we'll ignore the holes in "dma-ranges" and devices will be able
to DMA to the holes.

> > This does look like what Robin suggested, as far as I can tell.
> > Hopefully he'll take a look and give his reviewed-by.  Thanks for
> > persevering!
> Thank you.
> 
> Regards,
> Srinath.
> >
> > > The PCIe host driver of those SoCs has to list resource entries of the allowed
> > > address ranges given in the dma-ranges DT property in sorted order. This
> > > sorted list of resources will be processed, and IOVA addresses for the
> > > inaccessible address holes will be reserved while initializing the IOMMU domain.
> > >
> > > This patch set is based on Linux-5.0-rc2.
> > >
> > > Changes from v3:
> > >   - Addressed Robin Murphy review comments.
> > > - pcie-iproc: parse dma-ranges and make sorted resource list.
> > > - dma-iommu: process list and reserve gaps between entries
> > >
> > > Changes from v2:
> > >   - Patch set rebased to Linux-5.0-rc2
> > >
> > > Changes from v1:
> > >   - Addressed Oza review comments.
> > >
> > > Srinath Mannam (3):
> > >   PCI: Add dma_ranges window list
> > >   iommu/dma: Reserve IOVA for PCIe inaccessible DMA address
> > >   PCI: iproc: Add sorted dma ranges resource entries to host bridge
> > >
> > >  drivers/iommu/dma-iommu.c   | 19 
> > >  drivers/pci/controller/pcie-iproc.c | 44 
> > > -
> > >  drivers/pci/probe.c |  3 +++
> > >  include/linux/pci.h |  1 +
> > >  4 files changed, 66 insertions(+), 1 deletion(-)
> > >
> > > --
> > > 2.7.4
> > >


Re: [patch V2 01/29] tracing: Cleanup stack trace code

2019-04-18 Thread Thomas Gleixner
On Thu, 18 Apr 2019, Steven Rostedt wrote:
> On Thu, 18 Apr 2019 10:41:20 +0200
> Thomas Gleixner  wrote:
> 
> > @@ -412,23 +404,20 @@ stack_trace_sysctl(struct ctl_table *tab
> >void __user *buffer, size_t *lenp,
> >loff_t *ppos)
> >  {
> > -   int ret;
> > +   int ret, was_enabled;
> 
> One small nit. Could this be:
> 
>   int was_enabled;
>   int ret;
> 
> I prefer only joining variables that are related on the same line.
> Makes it look cleaner IMO.

If you wish so. To me it's a waste of screen space :)

> >  
> > mutex_lock(&stack_sysctl_mutex);
> > +   was_enabled = !!stack_tracer_enabled;
> >  
> 
> Bah, not sure why I didn't do it this way to begin with. I think I
> copied something else that couldn't do it this way for some reason and
> didn't put any brain power behind the copy. :-/ But that was back in
> 2008 so I blame it on being "young and stupid" ;-)

The young part is gone for sure :)

> Other than the above nit and removing the unneeded +1 in max_entries:

s/+1/-1/

Thanks,

tglx


Re: [patch V2 20/29] tracing: Simplify stacktrace retrieval in histograms

2019-04-18 Thread Tom Zanussi
Hi Steve,

On Thu, 2019-04-18 at 16:13 -0400, Steven Rostedt wrote:
> On Thu, 18 Apr 2019 14:58:55 -0500
> Tom Zanussi  wrote:
> 
> > > Tom,
> > > 
> > > Can you review this too?  
> > 
> > Looks good to me too!
> > 
> > Acked-by: Tom Zanussi 
> > 
> 
> Would you be OK to upgrade this to a Reviewed-by tag?
> 

Yeah, I did review it and even tested it, so:

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 

Tom


> Thanks!
> 
> -- Steve



Re: [patch V2 20/29] tracing: Simplify stacktrace retrieval in histograms

2019-04-18 Thread Tom Zanussi
On Thu, 2019-04-18 at 09:40 -0400, Steven Rostedt wrote:
> [ Added Tom Zanussi ]
> 
> On Thu, 18 Apr 2019 10:41:39 +0200
> Thomas Gleixner  wrote:
> 
> > The indirection through struct stack_trace is not necessary at all.
> > Use the
> > storage array based interface.
> > 
> > Signed-off-by: Thomas Gleixner 
> > Cc: Steven Rostedt 
> 
> Looks fine to me
> 
> Acked-by: Steven Rostedt (VMware) 
> 
>  But...
> 
> Tom,
> 
> Can you review this too?

Looks good to me too!

Acked-by: Tom Zanussi 


> 
> Patch series starts here:
> 
>   http://lkml.kernel.org/r/20190418084119.056416...@linutronix.de
> 
> Thanks,
> 
> -- Steve
> 
> > ---
> >  kernel/trace/trace_events_hist.c |   12 +++-
> >  1 file changed, 3 insertions(+), 9 deletions(-)
> > 
> > --- a/kernel/trace/trace_events_hist.c
> > +++ b/kernel/trace/trace_events_hist.c
> > @@ -5186,7 +5186,6 @@ static void event_hist_trigger(struct ev
> > u64 var_ref_vals[TRACING_MAP_VARS_MAX];
> > char compound_key[HIST_KEY_SIZE_MAX];
> > struct tracing_map_elt *elt = NULL;
> > -   struct stack_trace stacktrace;
> > struct hist_field *key_field;
> > u64 field_contents;
> > void *key = NULL;
> > @@ -5198,14 +5197,9 @@ static void event_hist_trigger(struct ev
> > key_field = hist_data->fields[i];
> >  
> > if (key_field->flags & HIST_FIELD_FL_STACKTRACE) {
> > -   stacktrace.max_entries = HIST_STACKTRACE_DEPTH;
> > -   stacktrace.entries = entries;
> > -   stacktrace.nr_entries = 0;
> > -   stacktrace.skip = HIST_STACKTRACE_SKIP;
> > -
> > -   memset(stacktrace.entries, 0, HIST_STACKTRACE_SIZE);
> > -   save_stack_trace(&stacktrace);
> > -
> > +   memset(entries, 0, HIST_STACKTRACE_SIZE);
> > +   stack_trace_save(entries, HIST_STACKTRACE_DEPTH, HIST_STACKTRACE_SKIP);
> > key = entries;
> > } else {
> > field_contents = key_field->fn(key_field, elt, rbe, rec);
> > 
> 
> 
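
For context, the storage-array interface referred to above has roughly the following shape in that series (the prototype is stated from memory here, so treat it as an assumption rather than a quote):

    #include <linux/kernel.h>
    #include <linux/stacktrace.h>

    /* unsigned int stack_trace_save(unsigned long *store, unsigned int size,
     *                               unsigned int skipnr);
     * fills a plain array of return addresses and returns how many were
     * saved, with no struct stack_trace bookkeeping on the caller's side.
     */
    static void example_snapshot(void)
    {
            unsigned long entries[16];
            unsigned int nr;

            nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
            pr_info("captured %u stack entries\n", nr);
    }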



[PATCH] iommu/dmar: Use struct_size() helper

2019-04-18 Thread Gustavo A. R. Silva
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes, in particular in the
context in which this code is being used.

So, replace code of the following form:

size = sizeof(*info) + level * sizeof(info->path[0]);

with:

size = struct_size(info, path, level);
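
A minimal illustration of the equivalence, using a made-up structure with a flexible array member rather than the actual dmar_pci_notify_info:

    #include <linux/overflow.h>
    #include <linux/slab.h>
    #include <linux/types.h>

    struct example_info {
            int level;
            struct example_step {
                    u8 bus;
                    u8 devfn;
            } path[];                /* flexible array member */
    };

    static struct example_info *example_alloc(int level)
    {
            struct example_info *info;

            /* struct_size(info, path, level) computes
             * sizeof(*info) + level * sizeof(info->path[0])
             * with integer-overflow checking folded in.
             */
            info = kzalloc(struct_size(info, path, level), GFP_KERNEL);
            if (info)
                    info->level = level;
            return info;
    }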

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/iommu/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 9c49300e9fb7..6d969a172fbb 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -145,7 +145,7 @@ dmar_alloc_pci_notify_info(struct pci_dev *dev, unsigned long event)
for (tmp = dev; tmp; tmp = tmp->bus->self)
level++;
 
-   size = sizeof(*info) + level * sizeof(info->path[0]);
+   size = struct_size(info, path, level);
if (size <= sizeof(dmar_pci_notify_info_buf)) {
info = (struct dmar_pci_notify_info *)dmar_pci_notify_info_buf;
} else {
-- 
2.21.0



Re: [patch V2 01/29] tracing: Cleanup stack trace code

2019-04-18 Thread Steven Rostedt
On Thu, 18 Apr 2019 10:41:20 +0200
Thomas Gleixner  wrote:


> @@ -412,23 +404,20 @@ stack_trace_sysctl(struct ctl_table *tab
>  void __user *buffer, size_t *lenp,
>  loff_t *ppos)
>  {
> - int ret;
> + int ret, was_enabled;

One small nit. Could this be:

int was_enabled;
int ret;

I prefer only joining variables that are related on the same line.
Makes it look cleaner IMO.

>  
>   mutex_lock(&stack_sysctl_mutex);
> + was_enabled = !!stack_tracer_enabled;
>  

Bah, not sure why I didn't do it this way to begin with. I think I
copied something else that couldn't do it this way for some reason and
didn't put any brain power behind the copy. :-/ But that was back in
2008 so I blame it on being "young and stupid" ;-)

Other than the above nit and removing the unneeded +1 in max_entries:

Reviewed-by: Steven Rostedt (VMware) 

-- Steve


>   ret = proc_dointvec(table, write, buffer, lenp, ppos);
>  
> - if (ret || !write ||
> - (last_stack_tracer_enabled == !!stack_tracer_enabled))
> + if (ret || !write || (was_enabled == !!stack_tracer_enabled))
>   goto out;
>  
> - last_stack_tracer_enabled = !!stack_tracer_enabled;
> -
>   if (stack_tracer_enabled)
>   register_ftrace_function(&trace_ops);
>   else
>   unregister_ftrace_function(&trace_ops);
> -
>   out:
>   mutex_unlock(&stack_sysctl_mutex);
>   return ret;
> @@ -444,7 +433,6 @@ static __init int enable_stacktrace(char
>   strncpy(stack_trace_filter_buf, str + len, COMMAND_LINE_SIZE);
>  
>   stack_tracer_enabled = 1;
> - last_stack_tracer_enabled = 1;
>   return 1;
>  }
>  __setup("stacktrace", enable_stacktrace);
> 
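
Putting the cleanup and the declaration nit together, the handler would read roughly as follows (a sketch assembled from the hunks quoted above, not necessarily the final committed version):

    int
    stack_trace_sysctl(struct ctl_table *table, int write,
                       void __user *buffer, size_t *lenp,
                       loff_t *ppos)
    {
            int was_enabled;
            int ret;

            mutex_lock(&stack_sysctl_mutex);
            was_enabled = !!stack_tracer_enabled;

            ret = proc_dointvec(table, write, buffer, lenp, ppos);

            /* Nothing to do unless the write actually flipped the knob. */
            if (ret || !write || (was_enabled == !!stack_tracer_enabled))
                    goto out;

            if (stack_tracer_enabled)
                    register_ftrace_function(&trace_ops);
            else
                    unregister_ftrace_function(&trace_ops);
     out:
            mutex_unlock(&stack_sysctl_mutex);
            return ret;
    }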



Re: [patch V2 01/29] tracing: Cleanup stack trace code

2019-04-18 Thread Steven Rostedt
On Thu, 18 Apr 2019 17:24:43 -0400
Steven Rostedt  wrote:

> I believe it was for historical leftovers (there was a time it was
> required), and left there for "paranoid" sake. But let me apply the
> patch and see if it is really needed.

I removed the +1 on the max_entries and set STACK_TRACE_ENTRIES to 5 (a
bit extreme). Then I ran the stack tracing with KASAN enabled and it
never complained.

As stated, it was there for historical reasons and I felt 500 was way
more than enough and left the buffer there just out of laziness and
paranoia.

Feel free to remove that if you want.

-- Steve


Re: [patch V2 01/29] tracing: Cleanup stack trace code

2019-04-18 Thread Steven Rostedt
On Thu, 18 Apr 2019 23:14:45 +0200 (CEST)
Thomas Gleixner  wrote:

> On Thu, 18 Apr 2019, Josh Poimboeuf wrote:
> 
> > On Thu, Apr 18, 2019 at 10:41:20AM +0200, Thomas Gleixner wrote:  
> > > - Remove the extra array member of stack_dump_trace[]. It's not required 
> > > as
> > >   the stack tracer stores at max array size - 1 entries so there is still
> > >   an empty slot.  
> > 
> > What is the empty slot used for?  
> 
> I was trying to find an answer but failed. Maybe it's just historical
> leftovers or Steven knows where the magic is in this maze.
>

I believe it was for historical leftovers (there was a time it was
required), and left there for "paranoid" sake. But let me apply the
patch and see if it is really needed.

-- Steve


Re: [patch V2 01/29] tracing: Cleanup stack trace code

2019-04-18 Thread Thomas Gleixner
On Thu, 18 Apr 2019, Josh Poimboeuf wrote:

> On Thu, Apr 18, 2019 at 10:41:20AM +0200, Thomas Gleixner wrote:
> > - Remove the extra array member of stack_dump_trace[]. It's not required as
> >   the stack tracer stores at max array size - 1 entries so there is still
> >   an empty slot.
> 
> What is the empty slot used for?

I was trying to find an answer but failed. Maybe it's just historical
leftovers or Steven knows where the magic is in this maze.

Thanks,

tglx


Re: [patch V2 20/29] tracing: Simplify stacktrace retrieval in histograms

2019-04-18 Thread Steven Rostedt
On Thu, 18 Apr 2019 14:58:55 -0500
Tom Zanussi  wrote:

> > Tom,
> > 
> > Can you review this too?  
> 
> Looks good to me too!
> 
> Acked-by: Tom Zanussi 
> 

Would you be OK to upgrade this to a Reviewed-by tag?

Thanks!

-- Steve


Re: [PATCH 1/7] genirq/msi: Add a new field in msi_desc to store an IOMMU cookie

2019-04-18 Thread Thomas Gleixner
On Thu, 18 Apr 2019, Julien Grall wrote:

> When an MSI doorbell is located downstream of an IOMMU, it is required
> to swizzle the physical address with an appropriately-mapped IOVA for any
> device attached to one of our DMA ops domain.
> 
> At the moment, the allocation of the mapping may be done when composing
> the message. However, the composing may be done in non-preemptible
> context while the allocation requires to be called from preemptible
> context.
> 
> A follow-up patch will split the current logic in two functions
> requiring to keep an IOMMU cookie per MSI.
> 
> This patch introduces a new field in msi_desc to store an IOMMU cookie
> when CONFIG_IOMMU_DMA is selected.

# git grep 'This patch' Documentation/process/

Applied to the whole series.

Thanks

tglx


[PATCH 2/3] iommu/amd: move gart fallback to amd_iommu_init

2019-04-18 Thread Kevin Mitchell via iommu
The fallback to the GART driver in the case amd_iommu doesn't work was
executed in a function called free_iommu_resources, which didn't really
make sense. This was even being called twice if amd_iommu=off was
specified on the command line.

The only complication is that it needs to be verified that amd_iommu has
fully relinquished control by calling free_iommu_resources and emptying
the amd_iommu_list.

Signed-off-by: Kevin Mitchell 
---
 drivers/iommu/amd_iommu_init.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 3798d7303c99..5f3df5ae6ba8 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2345,15 +2345,6 @@ static void __init free_iommu_resources(void)
amd_iommu_dev_table = NULL;
 
free_iommu_all();
-
-#ifdef CONFIG_GART_IOMMU
-   /*
-* We failed to initialize the AMD IOMMU - try fallback to GART
-* if possible.
-*/
-   gart_iommu_init();
-
-#endif
 }
 
 /* SB IOAPIC is always on this device in AMD systems */
@@ -2774,6 +2765,16 @@ static int __init amd_iommu_init(void)
}
}
 
+#ifdef CONFIG_GART_IOMMU
+   if (ret && list_empty(&amd_iommu_list)) {
+   /*
+* We failed to initialize the AMD IOMMU - try fallback
+* to GART if possible.
+*/
+   gart_iommu_init();
+   }
+#endif
+
for_each_iommu(iommu)
amd_iommu_debugfs_setup(iommu);
 
-- 
2.20.1



[PATCH 1/3] iommu/amd: make iommu_disable safer

2019-04-18 Thread Kevin Mitchell via iommu
Make it safe to call iommu_disable during early init error conditions
before mmio_base is set, but after the struct amd_iommu has been added
to the amd_iommu_list. For example, this happens if firmware fails to
fill in mmio_phys in the ACPI table leading to a NULL pointer
dereference in iommu_feature_disable.

Signed-off-by: Kevin Mitchell 
---
 drivers/iommu/amd_iommu_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index f773792d77fd..3798d7303c99 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -424,6 +424,9 @@ static void iommu_enable(struct amd_iommu *iommu)
 
 static void iommu_disable(struct amd_iommu *iommu)
 {
+   if (!iommu->mmio_base)
+   return;
+
/* Disable command buffer */
iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
 
-- 
2.20.1



[PATCH 3/3] iommu/amd: only free resources once on init error

2019-04-18 Thread Kevin Mitchell via iommu
When amd_iommu=off was specified on the command line, free_X_resources
functions were called immediately after early_amd_iommu_init. They were
then called again when amd_iommu_init also failed (as expected).

Instead, call them only once: at the end of state_next() whenever
there's an error. These functions should be safe to call any time and
any number of times. However, since state_next is never called again in
an error state, the cleanup will only ever be run once.

This also ensures that cleanup code is run as soon as possible after an
error is detected rather than waiting for amd_iommu_init() to be called.

Signed-off-by: Kevin Mitchell 
---
 drivers/iommu/amd_iommu_init.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5f3df5ae6ba8..24fc060fe596 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2638,8 +2638,6 @@ static int __init state_next(void)
init_state = ret ? IOMMU_INIT_ERROR : IOMMU_ACPI_FINISHED;
if (init_state == IOMMU_ACPI_FINISHED && amd_iommu_disabled) {
pr_info("AMD IOMMU disabled on kernel command-line\n");
-   free_dma_resources();
-   free_iommu_resources();
init_state = IOMMU_CMDLINE_DISABLED;
ret = -EINVAL;
}
@@ -2680,6 +2678,19 @@ static int __init state_next(void)
BUG();
}
 
+   if (ret) {
+   free_dma_resources();
+   if (!irq_remapping_enabled) {
+   disable_iommus();
+   free_iommu_resources();
+   } else {
+   struct amd_iommu *iommu;
+
+   uninit_device_table_dma();
+   for_each_iommu(iommu)
+   iommu_flush_all_caches(iommu);
+   }
+   }
return ret;
 }
 
@@ -2753,18 +2764,6 @@ static int __init amd_iommu_init(void)
int ret;
 
ret = iommu_go_to_state(IOMMU_INITIALIZED);
-   if (ret) {
-   free_dma_resources();
-   if (!irq_remapping_enabled) {
-   disable_iommus();
-   free_iommu_resources();
-   } else {
-   uninit_device_table_dma();
-   for_each_iommu(iommu)
-   iommu_flush_all_caches(iommu);
-   }
-   }
-
 #ifdef CONFIG_GART_IOMMU
if (ret && list_empty(&amd_iommu_list)) {
/*
-- 
2.20.1



[PATCH 0/3] handle init errors more gracefully in amd_iommu

2019-04-18 Thread Kevin Mitchell via iommu
This series makes error handling more robust in the amd_iommu init
code. It was initially motivated by problematic firmware that does not
set up the physical address of the iommu. This led to a NULL dereference
panic when iommu_disable was called during cleanup.

While the first patch is sufficient to avoid the panic, the subsequent
two move the cleanup closer to the actual error and avoid calling the
cleanup code twice when amd_iommu=off is specified on the command line.

I have tested this series on a variety of AMD CPUs with firmware
exhibiting the issue. I have additionally tested on platforms where the
firmware has been fixed. I tried both with and without amd_iommu=off. I
have also tested on older CPUs where no IOMMU is detected and even one
where the GART driver ends up running.

Thanks,

Kevin

Kevin Mitchell (3):
  iommu/amd: make iommu_disable safer
  iommu/amd: move gart fallback to amd_iommu_init
  iommu/amd: only free resources once on init error

 drivers/iommu/amd_iommu_init.c | 45 ++
 1 file changed, 24 insertions(+), 21 deletions(-)

-- 
2.20.1



Re: [PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers

2019-04-18 Thread Robin Murphy

On 18/04/2019 17:35, Christoph Hellwig wrote:

On Thu, Apr 18, 2019 at 04:06:56PM +0100, Robin Murphy wrote:

OK, I'm still looking at mmap and get_sgtable, but for now I've pushed out
a partial branch that consolidates alloc and free in a way which makes
sense to me:

   git://linux-arm.org/linux-rm  dma/rework

Please let me know what you think.


From a very high level POV this looks ok, but sometimes a bit too
convoluted to me.  The major issue why I went with the version I posted
is that I can cleanly ifdef out the remap code in just a few sections.
In this version it is spread out a lot more, and the use of IS_ENABLED
means that we'd need a lot more stubs for functionality that won't
ever be called but needs to be compilable.


OK, for some reason I'd convinced myself that mmap and get_sgtable would 
need changes to properly handle the new non-remapped CMA case - not sure 
how I reached that conclusion. On inspection they do appear to be broken 
for the non-coherent atomic case, but AFAICS that's been so forever...


Still, I've worked in the vm_map_pages() stuff pending in MM and given 
them the same treatment to finish the picture. Both x86_64_defconfig and 
i386_defconfig do indeed compile and link fine as I expected, so I 
really would like to understand the concern around #ifdefs better.


Robin.
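
To make the trade-off concrete, the two styles being weighed look roughly like this (CONFIG_FOO_REMAP and the foo_* helpers are placeholders, not real symbols):

    #include <linux/gfp.h>
    #include <linux/kconfig.h>
    #include <linux/slab.h>

    void *foo_remap_alloc(size_t size, gfp_t gfp);        /* remap-only path */

    /* Style A: the remap-only call site is compiled out entirely, so nothing
     * from the remap path has to build when CONFIG_FOO_REMAP is disabled.
     */
    static void *foo_alloc_ifdef(size_t size, gfp_t gfp)
    {
    #ifdef CONFIG_FOO_REMAP
            if (gfpflags_allow_blocking(gfp))
                    return foo_remap_alloc(size, gfp);
    #endif
            return kmalloc(size, gfp);
    }

    /* Style B: IS_ENABLED() keeps a single flow of control and relies on dead
     * code elimination, but everything behind it must still be compilable (or
     * stubbed out) even when the option is off.
     */
    static void *foo_alloc_is_enabled(size_t size, gfp_t gfp)
    {
            if (IS_ENABLED(CONFIG_FOO_REMAP) && gfpflags_allow_blocking(gfp))
                    return foo_remap_alloc(size, gfp);
            return kmalloc(size, gfp);
    }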


[PATCH 6/7] irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg

2019-04-18 Thread Julien Grall
The functions mbi_compose_m{b, s}i_msg may be called from non-preemptible
context. However, on RT, iommu_dma_map_msi_msg requires to be called
from a preemptible context.

A recent patch split the function iommu_dma_map_msi_msg in 2 functions:
one that should be called in preemptible context, the other does
not have any requirement.

This patch reworks the GICv3 MBI driver to avoid executing preemptible
code in non-preemptible context by preparing the MSI mappings when
allocating the MSI interrupt.

Signed-off-by: Julien Grall 
---
 drivers/irqchip/irq-gic-v3-mbi.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-mbi.c b/drivers/irqchip/irq-gic-v3-mbi.c
index fbfa7ff6deb1..c812b80e3ce9 100644
--- a/drivers/irqchip/irq-gic-v3-mbi.c
+++ b/drivers/irqchip/irq-gic-v3-mbi.c
@@ -84,6 +84,7 @@ static void mbi_free_msi(struct mbi_range *mbi, unsigned int 
hwirq,
 static int mbi_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
   unsigned int nr_irqs, void *args)
 {
+   msi_alloc_info_t *info = args;
struct mbi_range *mbi = NULL;
int hwirq, offset, i, err = 0;
 
@@ -104,6 +105,16 @@ static int mbi_irq_domain_alloc(struct irq_domain *domain, 
unsigned int virq,
 
hwirq = mbi->spi_start + offset;
 
+   err = iommu_dma_prepare_msi(info->desc,
+   mbi_phys_base + GICD_CLRSPI_NSR);
+   if (err)
+   return err;
+
+   err = iommu_dma_prepare_msi(info->desc,
+   mbi_phys_base + GICD_SETSPI_NSR);
+   if (err)
+   return err;
+
for (i = 0; i < nr_irqs; i++) {
err = mbi_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
if (err)
@@ -142,7 +153,7 @@ static void mbi_compose_msi_msg(struct irq_data *data, 
struct msi_msg *msg)
msg[0].address_lo = lower_32_bits(mbi_phys_base + GICD_SETSPI_NSR);
msg[0].data = data->parent_data->hwirq;
 
-   iommu_dma_map_msi_msg(data->irq, msg);
+   iommu_dma_compose_msi_msg(data->irq, msg);
 }
 
 #ifdef CONFIG_PCI_MSI
@@ -202,7 +213,7 @@ static void mbi_compose_mbi_msg(struct irq_data *data, 
struct msi_msg *msg)
msg[1].address_lo = lower_32_bits(mbi_phys_base + GICD_CLRSPI_NSR);
msg[1].data = data->parent_data->hwirq;
 
-   iommu_dma_map_msi_msg(data->irq, &msg[1]);
+   iommu_dma_compose_msi_msg(data->irq, &msg[1]);
 }
 
 /* Platform-MSI specific irqchip */
-- 
2.11.0



[PATCH 7/7] iommu/dma-iommu: Remove iommu_dma_map_msi_msg()

2019-04-18 Thread Julien Grall
A recent patch introduced two new functions to replace
iommu_dma_map_msi_msg() to avoid executing preemptible code in
non-preemptible context.

All the existings callers are now using the two new functions, so
iommu_dma_map_msi_msg() can be removed.

Signed-off-by: Julien Grall 
---
 drivers/iommu/dma-iommu.c | 20 
 include/linux/dma-iommu.h |  5 -
 2 files changed, 25 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f5c1f1685095..fdc8ded62e87 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -928,23 +928,3 @@ void iommu_dma_compose_msi_msg(int irq, struct msi_msg 
*msg)
msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1;
msg->address_lo += lower_32_bits(msi_page->iova);
 }
-
-void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
-{
-   struct msi_desc *desc = irq_get_msi_desc(irq);
-   phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo;
-
-   if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) {
-   /*
-* We're called from a void callback, so the best we can do is
-* 'fail' by filling the message with obviously bogus values.
-* Since we got this far due to an IOMMU being present, it's
-* not like the existing address would have worked anyway...
-*/
-   msg->address_hi = ~0U;
-   msg->address_lo = ~0U;
-   msg->data = ~0U;
-   } else {
-   iommu_dma_compose_msi_msg(irq, msg);
-   }
-}
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 2f4b2c2cc859..4fe2b2fb19bf 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -81,7 +81,6 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t 
msi_addr);
 /* Update the MSI message if required. */
 void iommu_dma_compose_msi_msg(int irq, struct msi_msg *msg);
 
-void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
 
 #else
@@ -120,10 +119,6 @@ static inline void iommu_dma_compose_msi_msg(int irq, 
struct msi_msg *msg)
 {
 }
 
-static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
-{
-}
-
 static inline void iommu_dma_get_resv_regions(struct device *dev, struct 
list_head *list)
 {
 }
-- 
2.11.0



[PATCH 3/7] irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg

2019-04-18 Thread Julien Grall
The function gicv2m_compose_msi_msg may be called from non-preemptible
context. However, on RT, iommu_dma_map_msi_msg requires to be called
from a preemptible context.

A recent patch split the function iommu_dma_map_msi_msg in 2 functions:
one that should be called in preemptible context, the other does
not have any requirement.

This patch reworks the gicv2m driver to avoid executing preemptible code
in non-preemptible context by preparing the MSI mapping when allocating
the MSI interrupt.

Signed-off-by: Julien Grall 
---
 drivers/irqchip/irq-gic-v2m.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v2m.c b/drivers/irqchip/irq-gic-v2m.c
index f5fe0100f9ff..e5372acd92c9 100644
--- a/drivers/irqchip/irq-gic-v2m.c
+++ b/drivers/irqchip/irq-gic-v2m.c
@@ -110,7 +110,7 @@ static void gicv2m_compose_msi_msg(struct irq_data *data, 
struct msi_msg *msg)
if (v2m->flags & GICV2M_NEEDS_SPI_OFFSET)
msg->data -= v2m->spi_offset;
 
-   iommu_dma_map_msi_msg(data->irq, msg);
+   iommu_dma_compose_msi_msg(data->irq, msg);
 }
 
 static struct irq_chip gicv2m_irq_chip = {
@@ -167,6 +167,7 @@ static void gicv2m_unalloc_msi(struct v2m_data *v2m, 
unsigned int hwirq,
 static int gicv2m_irq_domain_alloc(struct irq_domain *domain, unsigned int 
virq,
   unsigned int nr_irqs, void *args)
 {
+   msi_alloc_info_t *info = args;
struct v2m_data *v2m = NULL, *tmp;
int hwirq, offset, i, err = 0;
 
@@ -186,6 +187,11 @@ static int gicv2m_irq_domain_alloc(struct irq_domain 
*domain, unsigned int virq,
 
hwirq = v2m->spi_start + offset;
 
+   err = iommu_dma_prepare_msi(info->desc,
+   v2m->res.start + V2M_MSI_SETSPI_NS);
+   if (err)
+   return err;
+
for (i = 0; i < nr_irqs; i++) {
err = gicv2m_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
if (err)
-- 
2.11.0



[PATCH 2/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts

2019-04-18 Thread Julien Grall
On RT, the function iommu_dma_map_msi_msg may be called from
non-preemptible context. This will lead to a splat with
CONFIG_DEBUG_ATOMIC_SLEEP as the function is using spin_lock
(they can sleep on RT).

The function iommu_dma_map_msi_msg is used to map the MSI page in the
IOMMU PT and update the MSI message with the IOVA.

Only the part to lookup for the MSI page requires to be called in
preemptible context. As the MSI page cannot change over the lifecycle
of the MSI interrupt, the lookup can be cached and re-used later on.

This patch splits the function iommu_dma_map_msi_msg into two new
functions:
- iommu_dma_prepare_msi: This function will prepare the mapping in
the IOMMU and store the cookie in the structure msi_desc. This
function should be called in preemptible context.
- iommu_dma_compose_msi_msg: This function will update the MSI
message with the IOVA when the device is behind an IOMMU.
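
The intended calling pattern for an irqchip driver then becomes roughly the following (doorbell_phys_addr and the example_* names are placeholders; the per-driver patches later in this series are the authoritative versions):

    #include <linux/dma-iommu.h>
    #include <linux/irq.h>
    #include <linux/irqdomain.h>
    #include <linux/kernel.h>
    #include <linux/msi.h>

    static phys_addr_t doorbell_phys_addr;        /* placeholder doorbell address */

    /* Allocation time, preemptible context: map the doorbell through the
     * IOMMU and stash the resulting cookie in the msi_desc.
     */
    static int example_irq_domain_alloc(struct irq_domain *domain,
                                        unsigned int virq,
                                        unsigned int nr_irqs, void *args)
    {
            msi_alloc_info_t *info = args;
            int err;

            err = iommu_dma_prepare_msi(info->desc, doorbell_phys_addr);
            if (err)
                    return err;

            /* ...the usual hwirq selection and per-irq setup follows... */
            return 0;
    }

    /* Composition time, possibly non-preemptible: only patch the already
     * mapped IOVA into the message, no allocation involved.
     */
    static void example_compose_msi_msg(struct irq_data *data,
                                        struct msi_msg *msg)
    {
            msg->address_hi = upper_32_bits(doorbell_phys_addr);
            msg->address_lo = lower_32_bits(doorbell_phys_addr);
            msg->data = data->hwirq;

            iommu_dma_compose_msi_msg(data->irq, msg);
    }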

Signed-off-by: Julien Grall 
---
 drivers/iommu/dma-iommu.c | 43 ---
 include/linux/dma-iommu.h | 21 +
 2 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 77aabe637a60..f5c1f1685095 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -888,17 +888,17 @@ static struct iommu_dma_msi_page 
*iommu_dma_get_msi_page(struct device *dev,
return NULL;
 }
 
-void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
+int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
 {
-   struct device *dev = msi_desc_to_dev(irq_get_msi_desc(irq));
+   struct device *dev = msi_desc_to_dev(desc);
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct iommu_dma_cookie *cookie;
-   struct iommu_dma_msi_page *msi_page;
-   phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo;
unsigned long flags;
 
-   if (!domain || !domain->iova_cookie)
-   return;
+   if (!domain || !domain->iova_cookie) {
+   desc->iommu_cookie = NULL;
+   return 0;
+   }
 
cookie = domain->iova_cookie;
 
@@ -908,10 +908,33 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
 * of an MSI from within an IPI handler.
 */
spin_lock_irqsave(&cookie->msi_lock, flags);
-   msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain);
+   desc->iommu_cookie = iommu_dma_get_msi_page(dev, msi_addr, domain);
spin_unlock_irqrestore(&cookie->msi_lock, flags);
 
-   if (WARN_ON(!msi_page)) {
+   return (desc->iommu_cookie) ? 0 : -ENOMEM;
+}
+
+void iommu_dma_compose_msi_msg(int irq, struct msi_msg *msg)
+{
+   struct msi_desc *desc = irq_get_msi_desc(irq);
+   struct device *dev = msi_desc_to_dev(desc);
+   const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+   const struct iommu_dma_msi_page *msi_page = desc->iommu_cookie;
+
+   if (!domain || !domain->iova_cookie || WARN_ON(!msi_page))
+   return;
+
+   msg->address_hi = upper_32_bits(msi_page->iova);
+   msg->address_lo &= cookie_msi_granule(domain->iova_cookie) - 1;
+   msg->address_lo += lower_32_bits(msi_page->iova);
+}
+
+void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
+{
+   struct msi_desc *desc = irq_get_msi_desc(irq);
+   phys_addr_t msi_addr = (u64)msg->address_hi << 32 | msg->address_lo;
+
+   if (WARN_ON(iommu_dma_prepare_msi(desc, msi_addr))) {
/*
 * We're called from a void callback, so the best we can do is
 * 'fail' by filling the message with obviously bogus values.
@@ -922,8 +945,6 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
msg->address_lo = ~0U;
msg->data = ~0U;
} else {
-   msg->address_hi = upper_32_bits(msi_page->iova);
-   msg->address_lo &= cookie_msi_granule(cookie) - 1;
-   msg->address_lo += lower_32_bits(msi_page->iova);
+   iommu_dma_compose_msi_msg(irq, msg);
}
 }
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index e760dc5d1fa8..2f4b2c2cc859 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -71,12 +71,23 @@ void iommu_dma_unmap_resource(struct device *dev, 
dma_addr_t handle,
size_t size, enum dma_data_direction dir, unsigned long attrs);
 
 /* The DMA API isn't _quite_ the whole story, though... */
+/*
+ * Map the MSI page in the IOMMU device and store it in @desc
+ *
+ * Return 0 if succeeded, otherwise an error if the preparation has failed.
+ */
+int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr);
+
+/* Update the MSI message if required. */
+void iommu_dma_compose_msi_msg(int irq, struct msi_msg *msg);
+
 void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head 

[PATCH 4/7] irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg

2019-04-18 Thread Julien Grall
The function its_irq_compose_msi_msg may be called from non-preemptible
context. However, on RT, iommu_dma_map_msi_msg requires to be called
from a preemptible context.

A recent patch split the function iommu_dma_map_msi_msg in 2 functions:
one that should be called in preemptible context, the other does
not have any requirement.

This patch reworks the GICv3 ITS driver to avoid executing preemptible
code in non-preemptible context by preparing the MSI mapping when
allocating the MSI interrupt.

Signed-off-by: Julien Grall 
---
 drivers/irqchip/irq-gic-v3-its.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 7577755bdcf4..1e8e01797d9b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1179,7 +1179,7 @@ static void its_irq_compose_msi_msg(struct irq_data *d, 
struct msi_msg *msg)
msg->address_hi = upper_32_bits(addr);
msg->data   = its_get_event_id(d);
 
-   iommu_dma_map_msi_msg(d->irq, msg);
+   iommu_dma_compose_msi_msg(d->irq, msg);
 }
 
 static int its_irq_set_irqchip_state(struct irq_data *d,
@@ -2566,6 +2566,7 @@ static int its_irq_domain_alloc(struct irq_domain 
*domain, unsigned int virq,
 {
msi_alloc_info_t *info = args;
struct its_device *its_dev = info->scratchpad[0].ptr;
+   struct its_node *its = its_dev->its;
irq_hw_number_t hwirq;
int err;
int i;
@@ -2574,6 +2575,8 @@ static int its_irq_domain_alloc(struct irq_domain 
*domain, unsigned int virq,
if (err)
return err;
 
+   err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
+
for (i = 0; i < nr_irqs; i++) {
err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
if (err)
-- 
2.11.0



[PATCH 5/7] irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg

2019-04-18 Thread Julien Grall
The function ls_scfg_msi_compose_msg may be called from non-preemptible
context. However, on RT, iommu_dma_map_msi_msg requires to be called
from a preemptible context.

A recent patch split the function iommu_dma_map_msi_msg in 2 functions:
one that should be called in preemptible context, the other does
not have any requirement.

This patch reworks the FreeScale SCFG MSI driver to avoid executing
preemptible code in non-preemptible context by preparing the MSI mapping
when allocating the MSI interrupt.

Signed-off-by: Julien Grall 
---
 drivers/irqchip/irq-ls-scfg-msi.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-ls-scfg-msi.c 
b/drivers/irqchip/irq-ls-scfg-msi.c
index c671b3212010..8099c5b1fcb5 100644
--- a/drivers/irqchip/irq-ls-scfg-msi.c
+++ b/drivers/irqchip/irq-ls-scfg-msi.c
@@ -100,7 +100,7 @@ static void ls_scfg_msi_compose_msg(struct irq_data *data, 
struct msi_msg *msg)
msg->data |= cpumask_first(mask);
}
 
-   iommu_dma_map_msi_msg(data->irq, msg);
+   iommu_dma_compose_msi_msg(data->irq, msg);
 }
 
 static int ls_scfg_msi_set_affinity(struct irq_data *irq_data,
@@ -141,6 +141,7 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain 
*domain,
unsigned int nr_irqs,
void *args)
 {
+   msi_alloc_info_t *info = args;
struct ls_scfg_msi *msi_data = domain->host_data;
int pos, err = 0;
 
@@ -157,6 +158,10 @@ static int ls_scfg_msi_domain_irq_alloc(struct irq_domain 
*domain,
if (err)
return err;
 
+   err = iommu_dma_prepare_msi(info->desc, msi_data->msiir_addr);
+   if (err)
+   return err;
+
irq_domain_set_info(domain, virq, pos,
&ls_scfg_msi_parent_chip, msi_data,
handle_simple_irq, NULL, NULL);
-- 
2.11.0



[PATCH 0/7] iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts

2019-04-18 Thread Julien Grall
Hi all,

On RT, the function iommu_dma_map_msi_msg expects to be called from preemptible
context. However, this is not always the case, resulting in a splat with
!CONFIG_DEBUG_ATOMIC_SLEEP:

[   48.875777] BUG: sleeping function called from invalid context at 
kernel/locking/rtmutex.c:974
[   48.875779] in_atomic(): 1, irqs_disabled(): 128, pid: 2103, name: ip
[   48.875782] INFO: lockdep is turned off.
[   48.875784] irq event stamp: 10684
[   48.875786] hardirqs last  enabled at (10683): [] 
_raw_spin_unlock_irqrestore+0x88/0x90
[   48.875791] hardirqs last disabled at (10684): [] 
_raw_spin_lock_irqsave+0x24/0x68
[   48.875796] softirqs last  enabled at (0): [] 
copy_process.isra.1.part.2+0x8d8/0x1970
[   48.875801] softirqs last disabled at (0): [<>]   
(null)
[   48.875805] Preemption disabled at:
[   48.875805] [] __setup_irq+0xd8/0x6c0
[   48.875811] CPU: 2 PID: 2103 Comm: ip Not tainted 
5.0.3-rt1-7-g42ede9a0fed6 #45
[   48.875815] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno 
Development Platform, BIOS EDK II Jan 23 2017
[   48.875817] Call trace:
[   48.875818]  dump_backtrace+0x0/0x140
[   48.875821]  show_stack+0x14/0x20
[   48.875823]  dump_stack+0xa0/0xd4
[   48.875827]  ___might_sleep+0x16c/0x1f8
[   48.875831]  rt_spin_lock+0x5c/0x70
[   48.875835]  iommu_dma_map_msi_msg+0x5c/0x1d8
[   48.875839]  gicv2m_compose_msi_msg+0x3c/0x48
[   48.875843]  irq_chip_compose_msi_msg+0x40/0x58
[   48.875846]  msi_domain_activate+0x38/0x98
[   48.875849]  __irq_domain_activate_irq+0x58/0xa0
[   48.875852]  irq_domain_activate_irq+0x34/0x58
[   48.875855]  irq_activate+0x28/0x30
[   48.875858]  __setup_irq+0x2b0/0x6c0
[   48.875861]  request_threaded_irq+0xdc/0x188
[   48.875865]  sky2_setup_irq+0x44/0xf8
[   48.875868]  sky2_open+0x1a4/0x240
[   48.875871]  __dev_open+0xd8/0x188
[   48.875874]  __dev_change_flags+0x164/0x1f0
[   48.875877]  dev_change_flags+0x20/0x60
[   48.875879]  do_setlink+0x2a0/0xd30
[   48.875882]  __rtnl_newlink+0x5b4/0x6d8
[   48.875885]  rtnl_newlink+0x50/0x78
[   48.875888]  rtnetlink_rcv_msg+0x178/0x640
[   48.875891]  netlink_rcv_skb+0x58/0x118
[   48.875893]  rtnetlink_rcv+0x14/0x20
[   48.875896]  netlink_unicast+0x188/0x200
[   48.875898]  netlink_sendmsg+0x248/0x3d8
[   48.875900]  sock_sendmsg+0x18/0x40
[   48.875904]  ___sys_sendmsg+0x294/0x2d0
[   48.875908]  __sys_sendmsg+0x68/0xb8
[   48.875911]  __arm64_sys_sendmsg+0x20/0x28
[   48.875914]  el0_svc_common+0x90/0x118
[   48.875918]  el0_svc_handler+0x2c/0x80
[   48.875922]  el0_svc+0x8/0xc

This series is a first attempt to rework how MSIs are mapped and composed
when an IOMMU is present.
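
The shape of the change is the same for each irqchip driver: reserve and map
the doorbell IOVA when the MSI is allocated (preemptible context), and only
fill in the already-mapped address when the message is composed. Roughly, as
in the irqchip/ls-scfg-msi patch of this series (error handling trimmed, the
doorbell address name is taken from that driver):

	/* in the irq_domain .alloc callback, preemptible */
	msi_alloc_info_t *info = args;

	err = iommu_dma_prepare_msi(info->desc, msi_data->msiir_addr);
	if (err)
		return err;

	/* later, when composing the message, possibly in atomic context */
	iommu_dma_compose_msi_msg(data->irq, msg);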

I was able to test the changes in GICv2m and GICv3 ITS. I don't have
hardware for the other interrupt controllers.

Cheers,

Julien Grall (7):
  genirq/msi: Add a new field in msi_desc to store an IOMMU cookie
  iommu/dma-iommu: Split iommu_dma_map_msi_msg in two parts
  irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg
  irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg
  irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg
  irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg
  iommu/dma-iommu: Remove iommu_dma_map_msi_msg()

 drivers/iommu/dma-iommu.c | 45 ---
 drivers/irqchip/irq-gic-v2m.c |  8 ++-
 drivers/irqchip/irq-gic-v3-its.c  |  5 -
 drivers/irqchip/irq-gic-v3-mbi.c  | 15 +++--
 drivers/irqchip/irq-ls-scfg-msi.c |  7 +-
 include/linux/dma-iommu.h | 20 +++--
 include/linux/msi.h   |  3 +++
 7 files changed, 74 insertions(+), 29 deletions(-)

-- 
2.11.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/7] genirq/msi: Add a new field in msi_desc to store an IOMMU cookie

2019-04-18 Thread Julien Grall
When an MSI doorbell is located downstream of an IOMMU, it is required
to swizzle the physical address with an appropriately-mapped IOVA for any
device attached to one of our DMA ops domains.

At the moment, the allocation of the mapping may be done when composing
the message. However, composing may happen in non-preemptible context,
while the allocation must be called from preemptible context.

A follow-up patch will split the current logic into two functions, which
requires keeping an IOMMU cookie per MSI.

This patch introduces a new field in msi_desc to store an IOMMU cookie
when CONFIG_IOMMU_DMA is selected.

Signed-off-by: Julien Grall 
---
 include/linux/msi.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index 7e9b81c3b50d..d7907feef1bb 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -77,6 +77,9 @@ struct msi_desc {
struct device   *dev;
struct msi_msg  msg;
struct irq_affinity_desc*affinity;
+#ifdef CONFIG_IOMMU_DMA
+   const void  *iommu_cookie;
+#endif
 
union {
/* PCI MSI/X specific data */
-- 
2.11.0
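
Not part of this patch, but for illustration, the accessors that the rest of
the series could use to manipulate the cookie might look like the sketch
below (the helper names are assumptions, not taken from the series):

#ifdef CONFIG_IOMMU_DMA
static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc,
					     const void *iommu_cookie)
{
	desc->iommu_cookie = iommu_cookie;
}

static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
{
	return desc->iommu_cookie;
}
#else
static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc,
					     const void *iommu_cookie)
{
}

static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
{
	return NULL;
}
#endif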

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers

2019-04-18 Thread Robin Murphy

On 18/04/2019 17:35, Christoph Hellwig wrote:

On Thu, Apr 18, 2019 at 04:06:56PM +0100, Robin Murphy wrote:

OK, I'm still looking at mmap and get_sgtable, but for now I've pushed out
a partial branch that consolidates alloc and free in a way which makes
sense to me:

   git://linux-arm.org/linux-rm  dma/rework

Please let me know what you think.


 From a very high level POV this looks ok, but sometimes a bit too
convoluted to me.  The major issue why I went with the version I posted
is that I can cleanly ifdef out the remap code in just a few sections.
In this version it is spread out a lot more, and the use of IS_ENABLED
means that we'd need a lot more stubs for functionality that won't
ever be called but needs to be compilable.


What functionality do you have planned in that regard? I did do a quick 
build test of my arm64 config with DMA_DIRECT_REMAP hacked out, and 
dma-iommu.o appeared to link OK (although other bits of arm64 and 
dma-direct didn't, as expected). I will try x86 with IOMMU_DMA to make 
sure, though.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers

2019-04-18 Thread Christoph Hellwig
On Thu, Apr 18, 2019 at 04:06:56PM +0100, Robin Murphy wrote:
> OK, I'm still looking at mmap and get_sgtable, but for now I've pushed out 
> a partial branch that consolidates alloc and free in a way which makes 
> sense to me:
>
>   git://linux-arm.org/linux-rm  dma/rework
>
> Please let me know what you think.

>From a very high level POV this looks ok, but sometimes a bit too
convoluted to me.  The major issue why I went with the version I posted
is that I can cleanly ifdef out the remap code in just a few sections.
In this version it is spread out a lot more, and the use of IS_ENABLED
means that we'd need a lot more stubs for functionality that won't
ever be called but needs to be compilable.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 21/29] tracing: Use percpu stack trace buffer more intelligently

2019-04-18 Thread Steven Rostedt
On Thu, 18 Apr 2019 17:43:59 +0200 (CEST)
Thomas Gleixner  wrote:

> On Thu, 18 Apr 2019, Steven Rostedt wrote:
> > On Thu, 18 Apr 2019 10:41:40 +0200
> > Thomas Gleixner  wrote:  
> > > -static DEFINE_PER_CPU(struct ftrace_stack, ftrace_stack);
> > > +/* This allows 8 level nesting which is plenty */  
> > 
> > Can we make this 4 level nesting and increase the size? (I can see us
> > going more than 64 deep, kernel developers never cease to amaze me ;-)
> > That's all we need:
> > 
> >  Context: Normal, softirq, irq, NMI
> > 
> > Is there any other way to nest?  
> 
> Not that I know, but you are the tracer dude :)
>

There are other places where I only test 4 deep, so it should be fine to limit
it to 4 then.
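
For illustration, limiting the nesting to those four contexts and growing
each chunk accordingly could look like this (a sketch of the suggestion, not
the actual patch):

/* One chunk per context: normal, softirq, irq, NMI */
#define FTRACE_KSTACK_NESTING	4

#define FTRACE_KSTACK_ENTRIES	\
	(PAGE_SIZE / FTRACE_KSTACK_NESTING / sizeof(unsigned long))

struct ftrace_stack {
	unsigned long	calls[FTRACE_KSTACK_ENTRIES];
};

struct ftrace_stacks {
	struct ftrace_stack	stacks[FTRACE_KSTACK_NESTING];
};

With 4K pages and 8-byte entries that gives 128 entries per context instead
of 64.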

Thanks!

-- Steve
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 21/29] tracing: Use percpu stack trace buffer more intelligently

2019-04-18 Thread Thomas Gleixner
On Thu, 18 Apr 2019, Steven Rostedt wrote:
> On Thu, 18 Apr 2019 10:41:40 +0200
> Thomas Gleixner  wrote:
> > -static DEFINE_PER_CPU(struct ftrace_stack, ftrace_stack);
> > +/* This allows 8 level nesting which is plenty */
> 
> Can we make this 4 level nesting and increase the size? (I can see us
> going more than 64 deep, kernel developers never cease to amaze me ;-)
> That's all we need:
> 
>  Context: Normal, softirq, irq, NMI
> 
> Is there any other way to nest?

Not that I know, but you are the tracer dude :)

Thanks,

tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-18 Thread Thomas Gleixner
On Thu, 18 Apr 2019, Josh Poimboeuf wrote:

> On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> > All architectures which support stacktrace carry duplicated code and
> > do the stack storage and filtering at the architecture side.
> > 
> > Provide a consolidated interface with a callback function for consuming the
> > stack entries provided by the architecture specific stack walker. This
> > removes lots of duplicated code and allows to implement better filtering
> > than 'skip number of entries' in the future without touching any
> > architecture specific code.
> > 
> > Signed-off-by: Thomas Gleixner 
> > Cc: linux-a...@vger.kernel.org
> 
> This is a step in the right direction, especially if it allows us to get
> rid of the 'skip' stuff.  But I'm not crazy about the callbacks.
> 
> Another idea I had (but never got a chance to work on) was to extend the
> x86 unwind interface to all arches.  So instead of the callbacks, each
> arch would implement something like this API:
> 
> 
> struct unwind_state state;
> 
> void unwind_start(struct unwind_state *state, struct task_struct *task,
> struct pt_regs *regs, unsigned long *first_frame);
> 
> bool unwind_next_frame(struct unwind_state *state);
> 
> inline bool unwind_done(struct unwind_state *state);
> 
> 
> Not only would it avoid the callbacks (which is a nice benefit already),
> it would also allow the interfaces to be used outside of the
> stack_trace_*() interfaces.  That would come in handy in cases like the
> ftrace stack tracer code, which needs more than the stack_trace_*() API
> can give.

I surely thought about that, but after staring at all incarnations of
arch/*/stacktrace.c I just gave up.

Aside from that, quite a few archs already have callback based unwinders
because they use them for more than stacktracing and just have a single
implementation of that loop.

I'm fine either way. We can start with x86 and then let archs convert over
their stuff, but I wouldn't hold my breath that this will be completed in
the foreseeable future.

> Of course, this may be more work than what you thought you signed up for
> ;-)

I did not sign up for anything. I tripped over that mess by accident and, me
being me, hated it strongly enough to give it at least an initial steam blast.

Thanks,

tglx

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 10/18] iommu/vt-d: Add custom allocator for IOASID

2019-04-18 Thread Jean-Philippe Brucker
On 16/04/2019 00:10, Jacob Pan wrote:[...]
>> > +   /*
>> > +    * Register a custom ASID allocator if we
>> > are running
>> > +    * in a guest, the purpose is to have a
>> > system wide PASID
>> > +    * namespace among all PASID users.
>> > +    * Note that only one vIOMMU in each guest
>> > is supported.  
>> 
>> Why one vIOMMU per guest?  This would prevent guests with multiple PCI
>> domains aiui.
>> 
> This is mainly for simplicity reasons. These are all virtual BDFs
> anyway. As long as guest BDF can be mapped to a host BDF, it should be
> sufficient, am I missing anything?
> 
> From PASID allocation perspective, it is not tied to any PCI device
> until bind call. We only need to track PASID ownership per guest.
> 
> virtio-IOMMU spec does support multiple PCI domains but I am not sure
> if that applies to all assigned devices, whether all assigned devices
> are under the same domain. Perhaps Jean can help to clarify how PASID
> allocation API looks like on virtio IOMMU.

[Ugh, this is much longer than I hoped. In short I don't think multiple
vIOMMUs is a problem, because the host uses the same allocator for all of
them.]

Yes there can be a single virtio-iommu instance for multiple PCI
domains, or multiple instances each managing assigned devices. It's up to
the hypervisor to decide on the topology.

For Linux and QEMU I was assuming that choosing the vIOMMU used for PASID
allocation isn't a big deal, since in the end they all use the same
allocator in the host. It gets complicated when some vIOMMUs can be
removed at runtime (unload the virtio-iommu module that was providing the
PASID allocator, and then you can't allocate PASIDs for the VT-d instance
anymore), so maybe limiting to one type of vIOMMU (don't mix VT-d and
virtio-iommu in the same VM) is more reasonable.

It's a bit more delicate from the virtio-iommu perspective. The
interface is portable and I can't tie it down to the choices we're making
for Linux and KVM. Having a system-wide PASID space is what we picked for
Linux but the PCIe architecture allows for each device to have
their own PASID space, and I suppose some hypervisors and guests might
prefer implementing it that way.

My plan for the moment is to implement global PASID allocation using one
feature bit and two new requests, but leave space for a per-device PASID
allocation, introduced with another feature bit if necessary. If it ever
gets added, I expect the per-device allocation to be done during the bind
request rather than with a separate PASID_ALLOC request.

So currently I have a new feature bit and two commands:

#define VIRTIO_IOMMU_F_PASID_ALLOC
#define VIRTIO_IOMMU_T_ALLOC_PASID
#define VIRTIO_IOMMU_T_FREE_PASID

struct virtio_iommu_req_alloc_pasid {
struct virtio_iommu_req_head head;
u32 reserved;

/* Device-writeable */
le32 pasid;
struct virtio_iommu_req_tail tail;
};

struct virtio_iommu_req_free_pasid {
struct virtio_iommu_req_head head;
u32 reserved;
le32 pasid;

/* Device-writeable */
struct virtio_iommu_req_tail tail;
};

If the feature bit is offered it must be used, and the guest can only use
PASIDs allocated via VIRTIO_IOMMU_T_ALLOC_PASID in its bind requests.

The PASID space is global at the host scope. If multiple virtio-iommu
devices in the VM offer the feature bit, then using either of their
command queues to issue a VIRTIO_IOMMU_F_ALLOC_PASID and
VIRTIO_IOMMU_F_FREE_PASID is equivalent. Another possibility is to require
that only one of the virtio-iommu instances per VM offers the feature bit.
I do prefer this option, but there is the vIOMMU removal problem mentioned
above - which, with the first option, could be solved by keeping a list of
PASID allocator functions rather than a single one.

I'm considering adding a max_pasid field to virtio_iommu_req_alloc_pasid. If
VIRTIO_IOMMU_T_ALLOC_PASID returns a random 20-bit value then a lot of
space might be needed for storing PASID contexts (is that a real concern
though? For internal data it can use a binary tree, and the guest is not
in charge of hardware PASID tables here). If the guest is short on memory
then it could benefit from a smaller number of PASID bits. That could
either be globally configurable in the virtio-iommu config space, or using
a max_pasid field in the VIRTIO_IOMMU_T_ALLOC_PASID request. The latter
would allow supporting devices with fewer than 20 PASID bits, though we're
hoping that no one will implement that.
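
For illustration only, the ALLOC request with such a limit might become
(max_pasid is merely under consideration here, and its exact placement is an
assumption):

struct virtio_iommu_req_alloc_pasid {
	struct virtio_iommu_req_head head;
	u32 reserved;
	/* Driver-written: highest PASID value the guest can handle */
	le32 max_pasid;

	/* Device-writeable */
	le32 pasid;
	struct virtio_iommu_req_tail tail;
};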

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 08/29] mm/kmemleak: Simplify stacktrace handling

2019-04-18 Thread Catalin Marinas
On Thu, Apr 18, 2019 at 10:41:27AM +0200, Thomas Gleixner wrote:
> Replace the indirection through struct stack_trace by using the storage
> array based interfaces.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Catalin Marinas 
> Cc: linux...@kvack.org

Acked-by: Catalin Marinas 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers

2019-04-18 Thread Robin Murphy

On 17/04/2019 07:33, Christoph Hellwig wrote:

On Wed, Apr 10, 2019 at 08:11:57AM +0200, Christoph Hellwig wrote:

On Tue, Apr 09, 2019 at 06:59:32PM +0100, Robin Murphy wrote:

On 27/03/2019 08:04, Christoph Hellwig wrote:

This keeps the code together and will simplify compiling the code
out on architectures that are always dma coherent.


And this is where things take a turn in the direction I just can't get on
with - I'm looking at the final result and the twisty maze of little
disjoint helpers all overlapping each other in functionality is really
difficult to follow. And I would *much* rather have things rely on
compile-time constant optimisation than spend the future having to fix the
#ifdefed parts for arm64 whenever x86-centric changes fail to test them.


Can you draft up a patch on top of my series to show me what you
want?  I can take care of finishing it up and moving the changes
into the right patches in the series.


Any chance to make some progress on this?  Or at least a better
description of what you want?


OK, I'm still looking at mmap and get_sgtable, but for now I've pushed 
out a partial branch that consolidates alloc and free in a way which 
makes sense to me:


  git://linux-arm.org/linux-rm  dma/rework

Please let me know what you think.

Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 21/29] tracing: Use percpu stack trace buffer more intelligently

2019-04-18 Thread Steven Rostedt
On Thu, 18 Apr 2019 10:41:40 +0200
Thomas Gleixner  wrote:

> The per cpu stack trace buffer usage pattern is odd at best. The buffer has
> place for 512 stack trace entries on 64-bit and 1024 on 32-bit. When
> interrupts or exceptions nest after the per cpu buffer was acquired the
> stacktrace length is hardcoded to 8 entries. 512/1024 stack trace entries
> in kernel stacks are unrealistic so the buffer is a complete waste.
> 
> Split the buffer into chunks of 64 stack entries which is plenty. This
> allows nesting contexts (interrupts, exceptions) to utilize the cpu buffer
> for stack retrieval and avoids the fixed length allocation along with the
> conditional execution paths.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Steven Rostedt 
> ---
>  kernel/trace/trace.c |   77 
> +--
>  1 file changed, 39 insertions(+), 38 deletions(-)
> 
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2749,12 +2749,21 @@ trace_function(struct trace_array *tr,
>  
>  #ifdef CONFIG_STACKTRACE
>  
> -#define FTRACE_STACK_MAX_ENTRIES (PAGE_SIZE / sizeof(unsigned long))
> +/* 64 entries for kernel stacks are plenty */
> +#define FTRACE_KSTACK_ENTRIES	64
> +
>  struct ftrace_stack {
> - unsigned long   calls[FTRACE_STACK_MAX_ENTRIES];
> + unsigned long   calls[FTRACE_KSTACK_ENTRIES];
>  };
>  
> -static DEFINE_PER_CPU(struct ftrace_stack, ftrace_stack);
> +/* This allows 8 level nesting which is plenty */

Can we make this 4 level nesting and increase the size? (I can see us
going more than 64 deep, kernel developers never cease to amaze me ;-)
That's all we need:

 Context: Normal, softirq, irq, NMI

Is there any other way to nest?

-- Steve

> +#define FTRACE_KSTACK_NESTING	(PAGE_SIZE / sizeof(struct ftrace_stack))
> +
> +struct ftrace_stacks {
> + struct ftrace_stack stacks[FTRACE_KSTACK_NESTING];
> +};
> +
> +static DEFINE_PER_CPU(struct ftrace_stacks, ftrace_stacks);
>  static DEFINE_PER_CPU(int, ftrace_stack_reserve);
>  
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-18 Thread Josh Poimboeuf
On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> All architectures which support stacktrace carry duplicated code and
> do the stack storage and filtering at the architecture side.
> 
> Provide a consolidated interface with a callback function for consuming the
> stack entries provided by the architecture specific stack walker. This
> removes lots of duplicated code and allows to implement better filtering
> than 'skip number of entries' in the future without touching any
> architecture specific code.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: linux-a...@vger.kernel.org

This is a step in the right direction, especially if it allows us to get
rid of the 'skip' stuff.  But I'm not crazy about the callbacks.

Another idea I had (but never got a chance to work on) was to extend the
x86 unwind interface to all arches.  So instead of the callbacks, each
arch would implement something like this API:


struct unwind_state state;

void unwind_start(struct unwind_state *state, struct task_struct *task,
  struct pt_regs *regs, unsigned long *first_frame);

bool unwind_next_frame(struct unwind_state *state);

inline bool unwind_done(struct unwind_state *state);


Not only would it avoid the callbacks (which is a nice benefit already),
it would also allow the interfaces to be used outside of the
stack_trace_*() interfaces.  That would come in handy in cases like the
ftrace stack tracer code, which needs more than the stack_trace_*() API
can give.
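
For comparison, a consumer of that kind of iterator API would look roughly
like the sketch below (unwind_get_return_address() is assumed to be part of
the interface, as it is on x86 today, and MAX_ENTRIES is a placeholder):

	struct unwind_state state;
	unsigned long addr, entries[MAX_ENTRIES];
	unsigned int nr = 0;

	for (unwind_start(&state, task, regs, NULL);
	     !unwind_done(&state) && nr < MAX_ENTRIES;
	     unwind_next_frame(&state)) {
		addr = unwind_get_return_address(&state);
		if (!addr)
			break;
		entries[nr++] = addr;
	}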

Of course, this may be more work than what you thought you signed up for
;-)

-- 
Josh
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-18 Thread Khalid Aziz
On 4/17/19 11:41 PM, Kees Cook wrote:
> On Wed, Apr 17, 2019 at 11:41 PM Andy Lutomirski  wrote:
>> I don't think this type of NX goof was ever the argument for XPFO.
>> The main argument I've heard is that a malicious user program writes a
>> ROP payload into user memory (regular anonymous user memory) and then
>> gets the kernel to erroneously set RSP (*not* RIP) to point there.
> 
> Well, more than just ROP. Any of the various attack primitives. The NX
> stuff is about moving RIP: SMEP-bypassing. But there is still basic
> SMAP-bypassing for putting a malicious structure in userspace and
> having the kernel access it via the linear mapping, etc.
> 
>> I find this argument fairly weak for a couple reasons.  First, if
>> we're worried about this, let's do in-kernel CFI, not XPFO, to
> 
> CFI is getting much closer. Getting the kernel happy under Clang, LTO,
> and CFI is under active development. (It's functional for arm64
> already, and pieces have been getting upstreamed.)
> 

CFI theoretically offers protection with fairly low overhead. I have not
played much with CFI in clang. I agree with Linus that the probability of
bugs in the XPFO implementation itself is a cause for concern. If CFI in
Clang can provide us the same level of protection as XPFO does, I
wouldn't want to push for an expensive change like XPFO.

If Clang/CFI can't get us there for extended period of time, does it
make sense to continue to poke at XPFO?

Thanks,
Khalid

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 1/1] iommu: enhance IOMMU dma mode build options

2019-04-18 Thread Zhen Lei
First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
opportunity to set {lazy|strict} mode as the default at build time. Then put
the three config options in a choice, so that people can only choose one of
the three at a time.

The default IOMMU DMA mode on each architecture is unchanged.

Signed-off-by: Zhen Lei 
---
 arch/ia64/kernel/pci-dma.c|  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  3 ++-
 arch/s390/pci/pci_dma.c   |  2 +-
 arch/x86/kernel/pci-dma.c |  7 ++---
 drivers/iommu/Kconfig | 44 ++-
 drivers/iommu/amd_iommu_init.c|  3 ++-
 drivers/iommu/intel-iommu.c   |  2 +-
 drivers/iommu/iommu.c |  3 ++-
 8 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/arch/ia64/kernel/pci-dma.c b/arch/ia64/kernel/pci-dma.c
index fe988c49f01ce6a..655511dbf3c3b34 100644
--- a/arch/ia64/kernel/pci-dma.c
+++ b/arch/ia64/kernel/pci-dma.c
@@ -22,7 +22,7 @@
 int force_iommu __read_mostly;
 #endif

-int iommu_pass_through;
+int iommu_pass_through = IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);

 static int __init pci_iommu_init(void)
 {
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 3ead4c237ed0ec9..383e082a9bb985c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -85,7 +85,8 @@ void pe_level_printk(const struct pnv_ioda_pe *pe, const char 
*level,
va_end(args);
 }

-static bool pnv_iommu_bypass_disabled __read_mostly;
+static bool pnv_iommu_bypass_disabled __read_mostly =
+   !IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
 static bool pci_reset_phbs __read_mostly;

 static int __init iommu_setup(char *str)
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 9e52d1527f71495..784ad1e0acecfb1 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -17,7 +17,7 @@

 static struct kmem_cache *dma_region_table_cache;
 static struct kmem_cache *dma_page_table_cache;
-static int s390_iommu_strict;
+static int s390_iommu_strict = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);

 static int zpci_refresh_global(struct zpci_dev *zdev)
 {
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d460998ae828514..fb2bab42a0a3173 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -43,11 +43,8 @@
  * It is also possible to disable by default in kernel config, and enable with
  * iommu=nopt at boot time.
  */
-#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
-int iommu_pass_through __read_mostly = 1;
-#else
-int iommu_pass_through __read_mostly;
-#endif
+int iommu_pass_through __read_mostly =
+   IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);

 extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6f07f3b21816c64..8a1f1793cde76b4 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -74,17 +74,47 @@ config IOMMU_DEBUGFS
  debug/iommu directory, and then populate a subdirectory with
  entries as required.

-config IOMMU_DEFAULT_PASSTHROUGH
-   bool "IOMMU passthrough by default"
+choice
+   prompt "IOMMU dma mode"
depends on IOMMU_API
-help
- Enable passthrough by default, removing the need to pass in
- iommu.passthrough=on or iommu=pt through command line. If this
- is enabled, you can still disable with iommu.passthrough=off
- or iommu=nopt depending on the architecture.
+   default IOMMU_DEFAULT_PASSTHROUGH if (PPC_POWERNV && PCI)
+   default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU || S390_IOMMU)
+   default IOMMU_DEFAULT_STRICT
+   help
+ This option allows IOMMU dma mode to be chose at build time, to
+ override the default dma mode of each ARCHs, removing the need to
+ pass in kernel parameters through command line. You can still use
+ ARCHs specific boot options to override this option again.
+
+config IOMMU_DEFAULT_PASSTHROUGH
+   bool "passthrough"
+   help
+ In this mode, the dma access through IOMMU without any addresses
+ transformation. That means, the wrong or illegal dma access can not
+ be caught, no error information will be reported.

  If unsure, say N here.

+config IOMMU_DEFAULT_LAZY
+   bool "lazy"
+   help
+ Support lazy mode, where for every IOMMU DMA unmap operation, the
+ flush operation of IOTLB and the free operation of IOVA are deferred.
+ They are only guaranteed to be done before the related IOVA will be
+ reused.
+
+config IOMMU_DEFAULT_STRICT
+   bool "strict"
+   help
+ For every IOMMU DMA unmap operation, the flush operation of IOTLB and
+ the free operation of IOVA are guaranteed to be done in the unmap
+   

[PATCH v6 0/1] iommu: enhance IOMMU dma mode build options

2019-04-18 Thread Zhen Lei
v5 --> v6:
1. give up adding boot option iommu.dma_mode

v4 --> v5:
As Hanjun and Thomas Gleixner's suggestion:
1. Keep the old ARCH specific boot options no change.
2. Keep build option CONFIG_IOMMU_DEFAULT_PASSTHROUGH no change.

v4:
As Robin Murphy's suggestion:
"It's also not necessarily obvious to the user how this interacts with
IOMMU_DEFAULT_PASSTHROUGH, so if we really do go down this route, maybe it
would be better to refactor the whole lot into a single selection of something
like IOMMU_DEFAULT_MODE anyway."

In this version, I tried to normalize the IOMMU DMA mode boot options for all
architectures. When an IOMMU is enabled, there are 3 DMA modes: passthrough
(bypass), lazy (mapping, but deferring the IOTLB invalidation), and strict.
But currently each architecture defines its own private boot options,
different from the others. For example, to enable/disable "passthrough",
ARM64 uses iommu.passthrough=1/0, X86 uses iommu=pt/nopt, and PPC/POWERNV
uses iommu=nobypass.
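
With the three options in a choice, picking the default at build time is
then just a matter of setting one symbol in the defconfig, e.g.
(illustrative):

CONFIG_IOMMU_DEFAULT_LAZY=y
# CONFIG_IOMMU_DEFAULT_PASSTHROUGH is not set
# CONFIG_IOMMU_DEFAULT_STRICT is not set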

Zhen Lei (1):
  iommu: enhance IOMMU dma mode build options

 arch/ia64/kernel/pci-dma.c|  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  3 ++-
 arch/s390/pci/pci_dma.c   |  2 +-
 arch/x86/kernel/pci-dma.c |  7 ++---
 drivers/iommu/Kconfig | 44 ++-
 drivers/iommu/amd_iommu_init.c|  3 ++-
 drivers/iommu/intel-iommu.c   |  2 +-
 drivers/iommu/iommu.c |  3 ++-
 8 files changed, 48 insertions(+), 18 deletions(-)

-- 
1.8.3


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 01/29] tracing: Cleanup stack trace code

2019-04-18 Thread Josh Poimboeuf
On Thu, Apr 18, 2019 at 10:41:20AM +0200, Thomas Gleixner wrote:
> - Remove the extra array member of stack_dump_trace[]. It's not required as
>   the stack tracer stores at max array size - 1 entries so there is still
>   an empty slot.

What is the empty slot used for?

-- 
Josh
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 20/29] tracing: Simplify stacktrace retrieval in histograms

2019-04-18 Thread Steven Rostedt


[ Added Tom Zanussi ]

On Thu, 18 Apr 2019 10:41:39 +0200
Thomas Gleixner  wrote:

> The indirection through struct stack_trace is not necessary at all. Use the
> storage array based interface.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Steven Rostedt 

Looks fine to me

Acked-by: Steven Rostedt (VMware) 

 But...

Tom,

Can you review this too?

Patch series starts here:

  http://lkml.kernel.org/r/20190418084119.056416...@linutronix.de

Thanks,

-- Steve

> ---
>  kernel/trace/trace_events_hist.c |   12 +++-
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -5186,7 +5186,6 @@ static void event_hist_trigger(struct ev
>   u64 var_ref_vals[TRACING_MAP_VARS_MAX];
>   char compound_key[HIST_KEY_SIZE_MAX];
>   struct tracing_map_elt *elt = NULL;
> - struct stack_trace stacktrace;
>   struct hist_field *key_field;
>   u64 field_contents;
>   void *key = NULL;
> @@ -5198,14 +5197,9 @@ static void event_hist_trigger(struct ev
>   key_field = hist_data->fields[i];
>  
>   if (key_field->flags & HIST_FIELD_FL_STACKTRACE) {
> - stacktrace.max_entries = HIST_STACKTRACE_DEPTH;
> - stacktrace.entries = entries;
> - stacktrace.nr_entries = 0;
> - stacktrace.skip = HIST_STACKTRACE_SKIP;
> -
> - memset(stacktrace.entries, 0, HIST_STACKTRACE_SIZE);
> > - save_stack_trace(&stacktrace);
> -
> + memset(entries, 0, HIST_STACKTRACE_SIZE);
> + stack_trace_save(entries, HIST_STACKTRACE_DEPTH,
> +  HIST_STACKTRACE_SKIP);
>   key = entries;
>   } else {
>   field_contents = key_field->fn(key_field, elt, rbe, 
> rec);
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 14/29] dm bufio: Simplify stack trace retrieval

2019-04-18 Thread Steven Rostedt
On Thu, 18 Apr 2019 14:11:44 +0200
Alexander Potapenko  wrote:

> On Thu, Apr 18, 2019 at 1:54 PM Thomas Gleixner  wrote:
> >
> > On Thu, 18 Apr 2019, Alexander Potapenko wrote:  
> > > On Thu, Apr 18, 2019 at 11:06 AM Thomas Gleixner  
> > > wrote:  
> > > > -   save_stack_trace(&b->stack_trace);
> > > > +   b->stack_len = stack_trace_save(b->stack_entries, MAX_STACK, 
> > > > 2);  
> > > As noted in one of similar patches before, can we have an inline
> > > comment to indicate what does this "2" stand for?  
> >
> > Come on. We have gazillion of functions which take numerical constant
> > arguments. Should we add comments to all of them?  
> Ok, sorry. I might not be familiar enough with the kernel style guide.

It is a legitimate complaint but not for this series. I only complain
about hard coded constants when they are added. That "2" was not
added by this series. This patch set is a clean up of the stack tracing
code, not a clean up of removing hard coded constants, or commenting
them.

The hard coded "2" was there without a comment before this patch series,
and Thomas is correct to leave it as is for these changes. This patch
series should not modify what was already there; that is out of scope
for the purpose of these changes.

A separate clean up patch to the maintainer of the subsystem (dm bufio
in this case) is fine. But it's not Thomas's responsibility.
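
If someone does send that follow-up, it could be as small as a one-line
comment; for example (which two frames the skip covers is an assumption
here):

	/* Skip the two innermost frames (this helper and its caller - assumed) */
	b->stack_len = stack_trace_save(b->stack_entries, MAX_STACK, 2);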

-- Steve
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 14/29] dm bufio: Simplify stack trace retrieval

2019-04-18 Thread Alexander Potapenko via iommu
On Thu, Apr 18, 2019 at 11:06 AM Thomas Gleixner  wrote:
>
> Replace the indirection through struct stack_trace with an invocation of
> the storage array based interface.
>
> Signed-off-by: Thomas Gleixner 
> Cc: dm-de...@redhat.com
> Cc: Mike Snitzer 
> Cc: Alasdair Kergon 
> ---
>  drivers/md/dm-bufio.c |   15 ++-
>  1 file changed, 6 insertions(+), 9 deletions(-)
>
> --- a/drivers/md/dm-bufio.c
> +++ b/drivers/md/dm-bufio.c
> @@ -150,7 +150,7 @@ struct dm_buffer {
> void (*end_io)(struct dm_buffer *, blk_status_t);
>  #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
>  #define MAX_STACK 10
> -   struct stack_trace stack_trace;
> +   unsigned int stack_len;
> unsigned long stack_entries[MAX_STACK];
>  #endif
>  };
> @@ -232,11 +232,7 @@ static DEFINE_MUTEX(dm_bufio_clients_loc
>  #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
>  static void buffer_record_stack(struct dm_buffer *b)
>  {
> -   b->stack_trace.nr_entries = 0;
> -   b->stack_trace.max_entries = MAX_STACK;
> -   b->stack_trace.entries = b->stack_entries;
> -   b->stack_trace.skip = 2;
> > -   save_stack_trace(&b->stack_trace);
> +   b->stack_len = stack_trace_save(b->stack_entries, MAX_STACK, 2);
As noted in one of similar patches before, can we have an inline
comment to indicate what does this "2" stand for?
>  }
>  #endif
>
> @@ -438,7 +434,7 @@ static struct dm_buffer *alloc_buffer(st
> adjust_total_allocated(b->data_mode, (long)c->block_size);
>
>  #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
> > -   memset(&b->stack_trace, 0, sizeof(b->stack_trace));
> +   b->stack_len = 0;
>  #endif
> return b;
>  }
> @@ -1520,8 +1516,9 @@ static void drop_buffers(struct dm_bufio
> DMERR("leaked buffer %llx, hold count %u, list %d",
>   (unsigned long long)b->block, b->hold_count, i);
>  #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
> > -   print_stack_trace(&b->stack_trace, 1);
> -   b->hold_count = 0; /* mark unclaimed to avoid BUG_ON 
> below */
> +   stack_trace_print(b->stack_entries, b->stack_len, 1);
> +   /* mark unclaimed to avoid BUG_ON below */
> +   b->hold_count = 0;
>  #endif
> }
>
>
>


-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing directors: Paul Manicle, Halimah DeLaine Prado
Register court and number: Hamburg, HRB 86891
Registered office: Hamburg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [patch V2 09/29] mm/kasan: Simplify stacktrace handling

2019-04-18 Thread Andrey Ryabinin
On 4/18/19 11:41 AM, Thomas Gleixner wrote:
> Replace the indirection through struct stack_trace by using the storage
> array based interfaces.
> 
> Signed-off-by: Thomas Gleixner 
> Acked-by: Dmitry Vyukov 
> Cc: Andrey Ryabinin 
> Cc: Alexander Potapenko 
> Cc: kasan-...@googlegroups.com
> Cc: linux...@kvack.org

Acked-by: Andrey Ryabinin 

>  
>  static inline depot_stack_handle_t save_stack(gfp_t flags)
>  {
>   unsigned long entries[KASAN_STACK_DEPTH];
> - struct stack_trace trace = {
> - .nr_entries = 0,
> - .entries = entries,
> - .max_entries = KASAN_STACK_DEPTH,
> - .skip = 0
> - };
> + unsigned int nr_entries;
>  
> > - save_stack_trace(&trace);
> > - filter_irq_stacks(&trace);
> > -
> > - return depot_save_stack(&trace, flags);
> + nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
> + nr_entries = filter_irq_stacks(entries, nr_entries);
> + return stack_depot_save(entries, nr_entries, flags);

Suggestion for further improvement:

stack_trace_save() shouldn't unwind beyond the irq entry point, so we
wouldn't need filter_irq_stacks(). Probably no call site cares about the
random stack above the irq entry point, so it doesn't make sense to spend
resources on unwinding the non-irq stack from an interrupt first and
filtering it out later.

It would improve the performance of stack_trace_save() called from
interrupt context and fix page_owner, which feeds an unfiltered stack to
stack_depot_save(). The random non-irq part kills the benefit of using
stack_depot_save().
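
For reference, filter_irq_stacks() conceptually just truncates the saved
trace at the first irq entry it finds, roughly:

	unsigned int i;

	for (i = 0; i < nr_entries; i++) {
		if (in_irqentry_text(entries[i]))
			return i + 1;	/* keep the irq entry, drop the interrupted task's frames */
	}
	return nr_entries;

so with the current scheme the unwinder still walks all the way down first
and the result is only trimmed afterwards.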
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 14/29] dm bufio: Simplify stack trace retrieval

2019-04-18 Thread Alexander Potapenko via iommu
On Thu, Apr 18, 2019 at 1:54 PM Thomas Gleixner  wrote:
>
> On Thu, 18 Apr 2019, Alexander Potapenko wrote:
> > On Thu, Apr 18, 2019 at 11:06 AM Thomas Gleixner  wrote:
> > > -   save_stack_trace(&b->stack_trace);
> > > +   b->stack_len = stack_trace_save(b->stack_entries, MAX_STACK, 2);
> > As noted in one of similar patches before, can we have an inline
> > comment to indicate what does this "2" stand for?
>
> Come on. We have gazillion of functions which take numerical constant
> arguments. Should we add comments to all of them?
Ok, sorry. I might not be familiar enough with the kernel style guide.
> Thanks,
>
> tglx



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing directors: Paul Manicle, Halimah DeLaine Prado
Register court and number: Hamburg, HRB 86891
Registered office: Hamburg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [patch V2 03/29] lib/stackdepot: Provide functions which operate on plain storage arrays

2019-04-18 Thread Thomas Gleixner
On Thu, 18 Apr 2019, Mike Rapoport wrote:
> On Thu, Apr 18, 2019 at 10:41:22AM +0200, Thomas Gleixner wrote:
> > 
> > -void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace 
> > *trace)
> > +/**
> > + * stack_depot_fetch - Fetch stack entries from a depot
> > + *
> 
> Nit: kernel-doc will complain about missing description of @handle.

Duh.

> > + * @entries:   Pointer to store the entries address
> > + */
> 
> Can you please s/Returns/Return:/ so that kernel-doc will recognize this as
> return section.

Sure thing.
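
Folding both nits in, the comment for stack_depot_fetch() would then read
something like:

/**
 * stack_depot_fetch - Fetch stack entries from a depot
 *
 * @handle:	Stack depot handle returned from stack_depot_save()
 * @entries:	Pointer to store the entries address
 *
 * Return: The number of trace entries for this depot
 */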

Thanks,

tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-18 Thread Thomas Gleixner
On Thu, 18 Apr 2019, Mike Rapoport wrote:
> On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> > +/**
> > + * arch_stack_walk - Architecture specific function to walk the stack
> > +
> 
> Nit: no '*' at line beginning makes kernel-doc unhappy

Oops.

> > + * @consume_entry: Callback which is invoked by the architecture code for
> > + * each entry.
> > + * @cookie:Caller supplied pointer which is handed back to
> > + * @consume_entry
> > + * @task:  Pointer to a task struct, can be NULL
> > + * @regs:  Pointer to registers, can be NULL
> > + *
> > + * @task   @regs:
> > + * NULLNULLStack trace from current
> > + * taskNULLStack trace from task (can be current)
> > + * NULLregsStack trace starting on regs->stackpointer
> 
> This will render as a single line with 'make *docs'.
> Adding line separators makes this actually a table in the generated docs:
> 
>  *  === 
>  * task   regs
>  *  === 
>  * NULL   NULLStack trace from current
>  * task   NULLStack trace from task (can be current)
>  * NULL   regsStack trace starting on regs->stackpointer
>  *  === 

Cute.

> > + * Returns number of entries stored.
> 
> Can you please s/Returns/Return:/ so that kernel-doc will recognize this as
> return section.
> 
> This is relevant for other comments below as well.

Sure.

Thanks for the free kernel doc course! :)

tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 14/29] dm bufio: Simplify stack trace retrieval

2019-04-18 Thread Thomas Gleixner
On Thu, 18 Apr 2019, Alexander Potapenko wrote:
> On Thu, Apr 18, 2019 at 11:06 AM Thomas Gleixner  wrote:
> > -   save_stack_trace(&b->stack_trace);
> > +   b->stack_len = stack_trace_save(b->stack_entries, MAX_STACK, 2);
> As noted in one of similar patches before, can we have an inline
> comment to indicate what does this "2" stand for?

Come on. We have gazillion of functions which take numerical constant
arguments. Should we add comments to all of them?

Thanks,

tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 09/29] mm/kasan: Simplify stacktrace handling

2019-04-18 Thread Thomas Gleixner
On Thu, 18 Apr 2019, Andrey Ryabinin wrote:
> On 4/18/19 11:41 AM, Thomas Gleixner wrote:
> > Replace the indirection through struct stack_trace by using the storage
> > array based interfaces.
> > 
> > Signed-off-by: Thomas Gleixner 
> > Acked-by: Dmitry Vyukov 
> > Cc: Andrey Ryabinin 
> > Cc: Alexander Potapenko 
> > Cc: kasan-...@googlegroups.com
> > Cc: linux...@kvack.org
> 
> Acked-by: Andrey Ryabinin 
> 
> >  
> >  static inline depot_stack_handle_t save_stack(gfp_t flags)
> >  {
> > unsigned long entries[KASAN_STACK_DEPTH];
> > -   struct stack_trace trace = {
> > -   .nr_entries = 0,
> > -   .entries = entries,
> > -   .max_entries = KASAN_STACK_DEPTH,
> > -   .skip = 0
> > -   };
> > +   unsigned int nr_entries;
> >  
> > > -   save_stack_trace(&trace);
> > > -   filter_irq_stacks(&trace);
> > > -
> > > -   return depot_save_stack(&trace, flags);
> > +   nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
> > +   nr_entries = filter_irq_stacks(entries, nr_entries);
> > +   return stack_depot_save(entries, nr_entries, flags);
> 
> Suggestion for further improvement:
> 
> stack_trace_save() shouldn't unwind beyond the irq entry point, so we
> wouldn't need filter_irq_stacks().  Probably no call site cares about the
> random stack above the irq entry point, so it doesn't make sense to spend
> resources on unwinding the non-irq stack from an interrupt first and
> filtering it out later.

There are users which care about the full trace.

Once we have cleaned up the whole architecture side, we can add core side
filtering which allows us to

   1) replace the 'skip number of entries at the beginning'

   2) stop the trace when it reaches a certain point

Right now, I don't want to change any of this until the whole mess is
consolidated.

Thanks,

tglx



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 03/29] lib/stackdepot: Provide functions which operate on plain storage arrays

2019-04-18 Thread Mike Rapoport
On Thu, Apr 18, 2019 at 10:41:22AM +0200, Thomas Gleixner wrote:
> The struct stack_trace indirection in the stack depot functions is a truly
> pointless excercise which requires horrible code at the callsites.
> 
> Provide interfaces based on plain storage arrays.
> 
> Signed-off-by: Thomas Gleixner 
> Acked-by: Alexander Potapenko 
> ---
>  include/linux/stackdepot.h |4 ++
>  lib/stackdepot.c   |   66 
> -
>  2 files changed, 51 insertions(+), 19 deletions(-)
> 
> --- a/include/linux/stackdepot.h
> +++ b/include/linux/stackdepot.h
> @@ -26,7 +26,11 @@ typedef u32 depot_stack_handle_t;
>  struct stack_trace;
> 
>  depot_stack_handle_t depot_save_stack(struct stack_trace *trace, gfp_t 
> flags);
> +depot_stack_handle_t stack_depot_save(unsigned long *entries,
> +   unsigned int nr_entries, gfp_t gfp_flags);
> 
>  void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace 
> *trace);
> +unsigned int stack_depot_fetch(depot_stack_handle_t handle,
> +unsigned long **entries);
> 
>  #endif
> --- a/lib/stackdepot.c
> +++ b/lib/stackdepot.c
> @@ -194,40 +194,56 @@ static inline struct stack_record *find_
>   return NULL;
>  }
> 
> -void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace 
> *trace)
> +/**
> + * stack_depot_fetch - Fetch stack entries from a depot
> + *

Nit: kernel-doc will complain about missing description of @handle.

> + * @entries: Pointer to store the entries address
> + */
> +unsigned int stack_depot_fetch(depot_stack_handle_t handle,
> +unsigned long **entries)
>  {
>   union handle_parts parts = { .handle = handle };
>   void *slab = stack_slabs[parts.slabindex];
>   size_t offset = parts.offset << STACK_ALLOC_ALIGN;
>   struct stack_record *stack = slab + offset;
> 
> - trace->nr_entries = trace->max_entries = stack->size;
> - trace->entries = stack->entries;
> - trace->skip = 0;
> + *entries = stack->entries;
> + return stack->size;
> +}
> +EXPORT_SYMBOL_GPL(stack_depot_fetch);
> +
> +void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace 
> *trace)
> +{
> + unsigned int nent = stack_depot_fetch(handle, &trace->entries);
> +
> + trace->max_entries = trace->nr_entries = nent;
>  }
>  EXPORT_SYMBOL_GPL(depot_fetch_stack);
> 
>  /**
> - * depot_save_stack - save stack in a stack depot.
> - * @trace - the stacktrace to save.
> - * @alloc_flags - flags for allocating additional memory if required.
> + * stack_depot_save - Save a stack trace from an array
>   *
> - * Returns the handle of the stack struct stored in depot.
> + * @entries: Pointer to storage array
> + * @nr_entries:  Size of the storage array
> + * @alloc_flags: Allocation gfp flags
> + *
> + * Returns the handle of the stack struct stored in depot

Can you please s/Returns/Return:/ so that kernel-doc will recognize this as
return section.

>   */
> -depot_stack_handle_t depot_save_stack(struct stack_trace *trace,
> - gfp_t alloc_flags)
> +depot_stack_handle_t stack_depot_save(unsigned long *entries,
> +   unsigned int nr_entries,
> +   gfp_t alloc_flags)
>  {
> - u32 hash;
> - depot_stack_handle_t retval = 0;
>   struct stack_record *found = NULL, **bucket;
> - unsigned long flags;
> + depot_stack_handle_t retval = 0;
>   struct page *page = NULL;
>   void *prealloc = NULL;
> + unsigned long flags;
> + u32 hash;
> 
> - if (unlikely(trace->nr_entries == 0))
> + if (unlikely(nr_entries == 0))
>   goto fast_exit;
> 
> - hash = hash_stack(trace->entries, trace->nr_entries);
> + hash = hash_stack(entries, nr_entries);
> bucket = &stack_table[hash & STACK_HASH_MASK];
> 
>   /*
> @@ -235,8 +251,8 @@ depot_stack_handle_t depot_save_stack(st
>* The smp_load_acquire() here pairs with smp_store_release() to
>* |bucket| below.
>*/
> - found = find_stack(smp_load_acquire(bucket), trace->entries,
> -trace->nr_entries, hash);
> + found = find_stack(smp_load_acquire(bucket), entries,
> +nr_entries, hash);
>   if (found)
>   goto exit;
> 
> @@ -264,10 +280,10 @@ depot_stack_handle_t depot_save_stack(st
> 
> spin_lock_irqsave(&depot_lock, flags);
> 
> - found = find_stack(*bucket, trace->entries, trace->nr_entries, hash);
> + found = find_stack(*bucket, entries, nr_entries, hash);
>   if (!found) {
>   struct stack_record *new =
> - depot_alloc_stack(trace->entries, trace->nr_entries,
> + depot_alloc_stack(entries, nr_entries,
> hash, , alloc_flags);
>   if (new) {
>   new->next = *bucket;
> @@ 

Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-18 Thread Mike Rapoport
On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> All architectures which support stacktrace carry duplicated code and
> do the stack storage and filtering at the architecture side.
> 
> Provide a consolidated interface with a callback function for consuming the
> stack entries provided by the architecture specific stack walker. This
> removes lots of duplicated code and allows to implement better filtering
> than 'skip number of entries' in the future without touching any
> architecture specific code.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: linux-a...@vger.kernel.org
> ---
>  include/linux/stacktrace.h |   38 +
>  kernel/stacktrace.c|  173 
> +
>  lib/Kconfig|4 +
>  3 files changed, 215 insertions(+)
> 
> --- a/include/linux/stacktrace.h
> +++ b/include/linux/stacktrace.h
> @@ -23,6 +23,43 @@ unsigned int stack_trace_save_regs(struc
>  unsigned int stack_trace_save_user(unsigned long *store, unsigned int size);
> 
>  /* Internal interfaces. Do not use in generic code */
> +#ifdef CONFIG_ARCH_STACKWALK
> +
> +/**
> + * stack_trace_consume_fn - Callback for arch_stack_walk()
> + * @cookie:  Caller supplied pointer handed back by arch_stack_walk()
> + * @addr:The stack entry address to consume
> + * @reliable:True when the stack entry is reliable. Required by
> + *   some printk based consumers.
> + *
> + * Returns:  True, if the entry was consumed or skipped
> + *   False, if there is no space left to store
> + */
> +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> +bool reliable);
> +/**
> + * arch_stack_walk - Architecture specific function to walk the stack
> +

Nit: no '*' at line beginning makes kernel-doc unhappy

> + * @consume_entry:   Callback which is invoked by the architecture code for
> + *   each entry.
> + * @cookie:  Caller supplied pointer which is handed back to
> + *   @consume_entry
> + * @task:Pointer to a task struct, can be NULL
> + * @regs:Pointer to registers, can be NULL
> + *
> + * @task @regs:
> + * NULL  NULLStack trace from current
> + * task  NULLStack trace from task (can be current)
> + * NULL  regsStack trace starting on regs->stackpointer

This will render as a single line with 'make *docs'.
Adding line separators makes this actually a table in the generated docs:

 *  === 
 * task regs
 *  === 
 * NULL NULLStack trace from current
 * task NULLStack trace from task (can be current)
 * NULL regsStack trace starting on regs->stackpointer
 *  === 


> + */
> +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> +  struct task_struct *task, struct pt_regs *regs);
> +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void 
> *cookie,
> +  struct task_struct *task);
> +void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie,
> +   const struct pt_regs *regs);
> +
> +#else /* CONFIG_ARCH_STACKWALK */
>  struct stack_trace {
>   unsigned int nr_entries, max_entries;
>   unsigned long *entries;
> @@ -37,6 +74,7 @@ extern void save_stack_trace_tsk(struct
>  extern int save_stack_trace_tsk_reliable(struct task_struct *tsk,
>struct stack_trace *trace);
>  extern void save_stack_trace_user(struct stack_trace *trace);
> +#endif /* !CONFIG_ARCH_STACKWALK */
>  #endif /* CONFIG_STACKTRACE */
> 
>  #if defined(CONFIG_STACKTRACE) && defined(CONFIG_HAVE_RELIABLE_STACKTRACE)
> --- a/kernel/stacktrace.c
> +++ b/kernel/stacktrace.c
> @@ -5,6 +5,8 @@
>   *
>   *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar 
>   */
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -64,6 +66,175 @@ int stack_trace_snprint(char *buf, size_
>  }
>  EXPORT_SYMBOL_GPL(stack_trace_snprint);
> 
> +#ifdef CONFIG_ARCH_STACKWALK
> +
> +struct stacktrace_cookie {
> + unsigned long   *store;
> + unsigned intsize;
> + unsigned intskip;
> + unsigned intlen;
> +};
> +
> +static bool stack_trace_consume_entry(void *cookie, unsigned long addr,
> +   bool reliable)
> +{
> + struct stacktrace_cookie *c = cookie;
> +
> + if (c->len >= c->size)
> + return false;
> +
> + if (c->skip > 0) {
> + c->skip--;
> + return true;
> + }
> + c->store[c->len++] = addr;
> + return c->len < c->size;
> +}
> +
> +static bool stack_trace_consume_entry_nosched(void *cookie, unsigned long 
> addr,
> +  

Re: [PATCH v3 3/9] ACPI/IORT: Check ATS capability in root complex nodes

2019-04-18 Thread Lorenzo Pieralisi
On Wed, Apr 17, 2019 at 07:24:42PM +0100, Jean-Philippe Brucker wrote:
> Root complex node in IORT has a bit telling whether it supports ATS or
> not. Store this bit in the IOMMU fwspec when setting up a device, so it
> can be accessed later by an IOMMU driver. In the future we'll probably
> want to store this bit at the host bridge or SMMU rather than in each
> endpoint.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/acpi/arm64/iort.c | 11 +++
>  include/linux/iommu.h |  4 
>  2 files changed, 15 insertions(+)

For the IORT portion:

Acked-by: Lorenzo Pieralisi 

> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index e48894e002ba..4000902e57f0 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1028,6 +1028,14 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, 
> u64 *dma_size)
>   dev_dbg(dev, "dma_pfn_offset(%#08llx)\n", offset);
>  }
>  
> +static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
> +{
> + struct acpi_iort_root_complex *pci_rc;
> +
> + pci_rc = (struct acpi_iort_root_complex *)node->node_data;
> + return pci_rc->ats_attribute & ACPI_IORT_ATS_SUPPORTED;
> +}
> +
>  /**
>   * iort_iommu_configure - Set-up IOMMU configuration for a device.
>   *
> @@ -1063,6 +1071,9 @@ const struct iommu_ops *iort_iommu_configure(struct 
> device *dev)
>   info.node = node;
>   err = pci_for_each_dma_alias(to_pci_dev(dev),
>iort_pci_iommu_init, );
> +
> + if (!err && iort_pci_rc_supports_ats(node))
> + dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS;
>   } else {
>   int i = 0;
>  
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 480921dfbadf..51ab006d348e 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -446,6 +446,7 @@ struct iommu_fwspec {
>   const struct iommu_ops  *ops;
>   struct fwnode_handle*iommu_fwnode;
>   void*iommu_priv;
> + u32 flags;
>   unsigned intnum_ids;
>   u32 ids[1];
>  };
> @@ -458,6 +459,9 @@ struct iommu_sva {
>   const struct iommu_sva_ops  *ops;
>  };
>  
> +/* ATS is supported */
> +#define IOMMU_FWSPEC_PCI_RC_ATS  (1 << 0)
> +
>  int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
> const struct iommu_ops *ops);
>  void iommu_fwspec_free(struct device *dev);
> -- 
> 2.21.0
> 
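
For reference, an IOMMU driver consuming the new flag later only needs
something along these lines (a sketch, not part of this patch):

static bool dev_pci_rc_supports_ats(struct device *dev)
{
	struct iommu_fwspec *fwspec = dev->iommu_fwspec;

	return fwspec && (fwspec->flags & IOMMU_FWSPEC_PCI_RC_ATS);
}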
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[patch V2 22/29] tracing: Make ftrace_trace_userstack() static and conditional

2019-04-18 Thread Thomas Gleixner
It's only used in trace.c and there is absolutely no point in compiling it
in when user space stack traces are not supported.

Signed-off-by: Thomas Gleixner 
Cc: Steven Rostedt 
---
 kernel/trace/trace.c |   14 --
 kernel/trace/trace.h |8 
 2 files changed, 8 insertions(+), 14 deletions(-)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -159,6 +159,8 @@ static union trace_eval_map_item *trace_
 #endif /* CONFIG_TRACE_EVAL_MAP_FILE */
 
 static int tracing_set_tracer(struct trace_array *tr, const char *buf);
+static void ftrace_trace_userstack(struct ring_buffer *buffer,
+  unsigned long flags, int pc);
 
 #define MAX_TRACER_SIZE		100
 static char bootup_tracer_buf[MAX_TRACER_SIZE] __initdata;
@@ -2905,9 +2907,10 @@ void trace_dump_stack(int skip)
 }
 EXPORT_SYMBOL_GPL(trace_dump_stack);
 
+#ifdef CONFIG_USER_STACKTRACE_SUPPORT
 static DEFINE_PER_CPU(int, user_stack_count);
 
-void
+static void
 ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, int pc)
 {
struct trace_event_call *call = &event_user_stack;
@@ -2958,13 +2961,12 @@ ftrace_trace_userstack(struct ring_buffe
  out:
preempt_enable();
 }
-
-#ifdef UNUSED
-static void __trace_userstack(struct trace_array *tr, unsigned long flags)
+#else /* CONFIG_USER_STACKTRACE_SUPPORT */
+static void ftrace_trace_userstack(struct ring_buffer *buffer,
+  unsigned long flags, int pc)
 {
-   ftrace_trace_userstack(tr, flags, preempt_count());
 }
-#endif /* UNUSED */
+#endif /* !CONFIG_USER_STACKTRACE_SUPPORT */
 
 #endif /* CONFIG_STACKTRACE */
 
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -782,17 +782,9 @@ void update_max_tr_single(struct trace_a
 #endif /* CONFIG_TRACER_MAX_TRACE */
 
 #ifdef CONFIG_STACKTRACE
-void ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags,
-   int pc);
-
 void __trace_stack(struct trace_array *tr, unsigned long flags, int skip,
   int pc);
 #else
-static inline void ftrace_trace_userstack(struct ring_buffer *buffer,
- unsigned long flags, int pc)
-{
-}
-
 static inline void __trace_stack(struct trace_array *tr, unsigned long flags,
 int skip, int pc)
 {


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[patch V2 25/29] livepatch: Simplify stack trace retrieval

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace by using the storage
array based interfaces.
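
Concretely, callers move from filling in a struct stack_trace to a single
call that returns the number of entries saved (or a negative error for the
_reliable variant); roughly:

	/* old */
	struct stack_trace trace = {
		.entries	= entries,
		.max_entries	= ARRAY_SIZE(entries),
	};
	ret = save_stack_trace_tsk_reliable(task, &trace);

	/* new */
	ret = stack_trace_save_tsk_reliable(task, entries, ARRAY_SIZE(entries));
	if (ret < 0)
		return ret;
	nr_entries = ret;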

Signed-off-by: Thomas Gleixner 
---
 kernel/livepatch/transition.c |   22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -202,15 +202,15 @@ void klp_update_patch_state(struct task_
  * Determine whether the given stack trace includes any references to a
  * to-be-patched or to-be-unpatched function.
  */
-static int klp_check_stack_func(struct klp_func *func,
-   struct stack_trace *trace)
+static int klp_check_stack_func(struct klp_func *func, unsigned long *entries,
+   unsigned int nr_entries)
 {
unsigned long func_addr, func_size, address;
struct klp_ops *ops;
int i;
 
-   for (i = 0; i < trace->nr_entries; i++) {
-   address = trace->entries[i];
+   for (i = 0; i < nr_entries; i++) {
+   address = entries[i];
 
if (klp_target_state == KLP_UNPATCHED) {
 /*
@@ -254,29 +254,25 @@ static int klp_check_stack_func(struct k
 static int klp_check_stack(struct task_struct *task, char *err_buf)
 {
static unsigned long entries[MAX_STACK_ENTRIES];
-   struct stack_trace trace;
struct klp_object *obj;
struct klp_func *func;
-   int ret;
+   int ret, nr_entries;
 
-   trace.skip = 0;
-   trace.nr_entries = 0;
-   trace.max_entries = MAX_STACK_ENTRIES;
-   trace.entries = entries;
-   ret = save_stack_trace_tsk_reliable(task, &trace);
+   ret = stack_trace_save_tsk_reliable(task, entries, ARRAY_SIZE(entries));
WARN_ON_ONCE(ret == -ENOSYS);
-   if (ret) {
+   if (ret < 0) {
snprintf(err_buf, STACK_ERR_BUF_SIZE,
 "%s: %s:%d has an unreliable stack\n",
 __func__, task->comm, task->pid);
return ret;
}
+   nr_entries = ret;
 
klp_for_each_object(klp_transition_patch, obj) {
if (!obj->patched)
continue;
klp_for_each_func(obj, func) {
-   ret = klp_check_stack_func(func, &trace);
+   ret = klp_check_stack_func(func, entries, nr_entries);
if (ret) {
snprintf(err_buf, STACK_ERR_BUF_SIZE,
 "%s: %s:%d is sleeping on function 
%s\n",




[patch V2 26/29] stacktrace: Remove obsolete functions

2019-04-18 Thread Thomas Gleixner
No more users of the struct stack_trace based interfaces. Remove them.

Remove the macro stubs for !CONFIG_STACKTRACE as well; they are pointless
because the storage at the call sites is already conditional on
CONFIG_STACKTRACE. No point in being 'smart'.
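
For illustration, a typical call site already guards both the storage and
the calls, so an empty stub for the !CONFIG_STACKTRACE case buys nothing
(a minimal sketch of such a call site, not taken from any particular file):

  #ifdef CONFIG_STACKTRACE
  	unsigned long entries[16];	/* storage only exists with STACKTRACE */
  	unsigned int nr;

  	nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
  	stack_trace_print(entries, nr, 0);
  #endif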

Signed-off-by: Thomas Gleixner 
---
 include/linux/stacktrace.h |   17 -
 kernel/stacktrace.c|   14 --
 2 files changed, 31 deletions(-)

--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -36,24 +36,7 @@ extern void save_stack_trace_tsk(struct
struct stack_trace *trace);
 extern int save_stack_trace_tsk_reliable(struct task_struct *tsk,
 struct stack_trace *trace);
-
-extern void print_stack_trace(struct stack_trace *trace, int spaces);
-extern int snprint_stack_trace(char *buf, size_t size,
-   struct stack_trace *trace, int spaces);
-
-#ifdef CONFIG_USER_STACKTRACE_SUPPORT
 extern void save_stack_trace_user(struct stack_trace *trace);
-#else
-# define save_stack_trace_user(trace)  do { } while (0)
-#endif
-
-#else /* !CONFIG_STACKTRACE */
-# define save_stack_trace(trace)   do { } while (0)
-# define save_stack_trace_tsk(tsk, trace)  do { } while (0)
-# define save_stack_trace_user(trace)  do { } while (0)
-# define print_stack_trace(trace, spaces)  do { } while (0)
-# define snprint_stack_trace(buf, size, trace, spaces) do { } while (0)
-# define save_stack_trace_tsk_reliable(tsk, trace) ({ -ENOSYS; })
 #endif /* CONFIG_STACKTRACE */
 
 #if defined(CONFIG_STACKTRACE) && defined(CONFIG_HAVE_RELIABLE_STACKTRACE)
--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@@ -30,12 +30,6 @@ void stack_trace_print(unsigned long *en
 }
 EXPORT_SYMBOL_GPL(stack_trace_print);
 
-void print_stack_trace(struct stack_trace *trace, int spaces)
-{
-   stack_trace_print(trace->entries, trace->nr_entries, spaces);
-}
-EXPORT_SYMBOL_GPL(print_stack_trace);
-
 /**
  * stack_trace_snprint - Print the entries in the stack trace into a buffer
  * @buf:   Pointer to the print buffer
@@ -70,14 +64,6 @@ int stack_trace_snprint(char *buf, size_
 }
 EXPORT_SYMBOL_GPL(stack_trace_snprint);
 
-int snprint_stack_trace(char *buf, size_t size,
-   struct stack_trace *trace, int spaces)
-{
-   return stack_trace_snprint(buf, size, trace->entries,
-  trace->nr_entries, spaces);
-}
-EXPORT_SYMBOL_GPL(snprint_stack_trace);
-
 /*
  * Architectures that do not implement save_stack_trace_*()
  * get these weak aliases and once-per-bootup warnings




[patch V2 13/29] btrfs: ref-verify: Simplify stack trace retrieval

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace with an invocation of
the storage array based interface.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Johannes Thumshirn 
Acked-by: David Sterba 
Cc: Chris Mason 
Cc: Josef Bacik 
Cc: linux-bt...@vger.kernel.org
---
 fs/btrfs/ref-verify.c |   15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

--- a/fs/btrfs/ref-verify.c
+++ b/fs/btrfs/ref-verify.c
@@ -205,28 +205,17 @@ static struct root_entry *lookup_root_en
 #ifdef CONFIG_STACKTRACE
 static void __save_stack_trace(struct ref_action *ra)
 {
-   struct stack_trace stack_trace;
-
-   stack_trace.max_entries = MAX_TRACE;
-   stack_trace.nr_entries = 0;
-   stack_trace.entries = ra->trace;
-   stack_trace.skip = 2;
-   save_stack_trace(&stack_trace);
-   ra->trace_len = stack_trace.nr_entries;
+   ra->trace_len = stack_trace_save(ra->trace, MAX_TRACE, 2);
 }
 
 static void __print_stack_trace(struct btrfs_fs_info *fs_info,
struct ref_action *ra)
 {
-   struct stack_trace trace;
-
if (ra->trace_len == 0) {
btrfs_err(fs_info, "  ref-verify: no stacktrace");
return;
}
-   trace.nr_entries = ra->trace_len;
-   trace.entries = ra->trace;
-   print_stack_trace(&trace, 2);
+   stack_trace_print(ra->trace, ra->trace_len, 2);
 }
 #else
 static void inline __save_stack_trace(struct ref_action *ra)




[patch V2 09/29] mm/kasan: Simplify stacktrace handling

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace by using the storage
array based interfaces.

Signed-off-by: Thomas Gleixner 
Acked-by: Dmitry Vyukov 
Cc: Andrey Ryabinin 
Cc: Alexander Potapenko 
Cc: kasan-...@googlegroups.com
Cc: linux...@kvack.org
---
 mm/kasan/common.c |   30 --
 mm/kasan/report.c |7 ---
 2 files changed, 16 insertions(+), 21 deletions(-)

--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -48,34 +48,28 @@ static inline int in_irqentry_text(unsig
 ptr < (unsigned long)&__softirqentry_text_end);
 }
 
-static inline void filter_irq_stacks(struct stack_trace *trace)
+static inline unsigned int filter_irq_stacks(unsigned long *entries,
+unsigned int nr_entries)
 {
-   int i;
+   unsigned int i;
 
-   if (!trace->nr_entries)
-   return;
-   for (i = 0; i < trace->nr_entries; i++)
-   if (in_irqentry_text(trace->entries[i])) {
+   for (i = 0; i < nr_entries; i++) {
+   if (in_irqentry_text(entries[i])) {
/* Include the irqentry function into the stack. */
-   trace->nr_entries = i + 1;
-   break;
+   return i + 1;
}
+   }
+   return nr_entries;
 }
 
 static inline depot_stack_handle_t save_stack(gfp_t flags)
 {
unsigned long entries[KASAN_STACK_DEPTH];
-   struct stack_trace trace = {
-   .nr_entries = 0,
-   .entries = entries,
-   .max_entries = KASAN_STACK_DEPTH,
-   .skip = 0
-   };
+   unsigned int nr_entries;
 
-   save_stack_trace(&trace);
-   filter_irq_stacks(&trace);
-
-   return depot_save_stack(&trace, flags);
+   nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
+   nr_entries = filter_irq_stacks(entries, nr_entries);
+   return stack_depot_save(entries, nr_entries, flags);
 }
 
 static inline void set_track(struct kasan_track *track, gfp_t flags)
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -100,10 +100,11 @@ static void print_track(struct kasan_tra
 {
pr_err("%s by task %u:\n", prefix, track->pid);
if (track->stack) {
-   struct stack_trace trace;
+   unsigned long *entries;
+   unsigned int nr_entries;
 
-   depot_fetch_stack(track->stack, &trace);
-   print_stack_trace(&trace, 0);
+   nr_entries = stack_depot_fetch(track->stack, &entries);
+   stack_trace_print(entries, nr_entries, 0);
} else {
pr_err("(stack is not available)\n");
}




[patch V2 21/29] tracing: Use percpu stack trace buffer more intelligently

2019-04-18 Thread Thomas Gleixner
The per-cpu stack trace buffer usage pattern is odd at best. The buffer has
room for 512 stack trace entries on 64-bit and 1024 on 32-bit. When
interrupts or exceptions nest after the per-cpu buffer has been acquired,
the stacktrace length is hardcoded to 8 entries. 512/1024 stack trace
entries for a kernel stack are unrealistic, so the buffer is a complete
waste.

Split the buffer into chunks of 64 stack entries, which is plenty. This
allows nesting contexts (interrupts, exceptions) to utilize the per-cpu
buffer for stack retrieval and avoids the fixed length allocation along
with the conditional execution paths.
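
Condensed, the reservation scheme added below boils down to the following
idiom (a sketch of the code in __ftrace_trace_stack(), with the barrier and
error handling trimmed):

  /* 64 entries for kernel stacks are plenty */
  #define FTRACE_KSTACK_ENTRIES	64
  /* This allows 8 levels of nesting on 64-bit, which is plenty */
  #define FTRACE_KSTACK_NESTING	(PAGE_SIZE / sizeof(struct ftrace_stack))

  struct ftrace_stack {
  	unsigned long		calls[FTRACE_KSTACK_ENTRIES];
  };
  struct ftrace_stacks {
  	struct ftrace_stack	stacks[FTRACE_KSTACK_NESTING];
  };
  static DEFINE_PER_CPU(struct ftrace_stacks, ftrace_stacks);
  static DEFINE_PER_CPU(int, ftrace_stack_reserve);

  	/* Claim the next free chunk; a nesting context claims the next one. */
  	stackidx = __this_cpu_inc_return(ftrace_stack_reserve);
  	if (WARN_ON_ONCE(stackidx >= FTRACE_KSTACK_NESTING))
  		goto out;
  	fstack = this_cpu_ptr(ftrace_stacks.stacks) + (stackidx - 1);
  	/* ... save the trace into fstack->calls ... */
   out:
  	__this_cpu_dec(ftrace_stack_reserve);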

Signed-off-by: Thomas Gleixner 
Cc: Steven Rostedt 
---
 kernel/trace/trace.c |   77 +--
 1 file changed, 39 insertions(+), 38 deletions(-)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2749,12 +2749,21 @@ trace_function(struct trace_array *tr,
 
 #ifdef CONFIG_STACKTRACE
 
-#define FTRACE_STACK_MAX_ENTRIES (PAGE_SIZE / sizeof(unsigned long))
+/* 64 entries for kernel stacks are plenty */
+#define FTRACE_KSTACK_ENTRIES  64
+
 struct ftrace_stack {
-   unsigned long   calls[FTRACE_STACK_MAX_ENTRIES];
+   unsigned long   calls[FTRACE_KSTACK_ENTRIES];
 };
 
-static DEFINE_PER_CPU(struct ftrace_stack, ftrace_stack);
+/* This allows 8 level nesting which is plenty */
+#define FTRACE_KSTACK_NESTING  (PAGE_SIZE / sizeof(struct ftrace_stack))
+
+struct ftrace_stacks {
+   struct ftrace_stack stacks[FTRACE_KSTACK_NESTING];
+};
+
+static DEFINE_PER_CPU(struct ftrace_stacks, ftrace_stacks);
 static DEFINE_PER_CPU(int, ftrace_stack_reserve);
 
 static void __ftrace_trace_stack(struct ring_buffer *buffer,
@@ -2763,10 +2772,11 @@ static void __ftrace_trace_stack(struct
 {
	struct trace_event_call *call = &event_kernel_stack;
struct ring_buffer_event *event;
+   struct ftrace_stack *fstack;
struct stack_entry *entry;
struct stack_trace trace;
-   int use_stack;
-   int size = FTRACE_STACK_ENTRIES;
+   int size = FTRACE_KSTACK_ENTRIES;
+   int stackidx;
 
trace.nr_entries= 0;
trace.skip  = skip;
@@ -2788,29 +2798,32 @@ static void __ftrace_trace_stack(struct
 */
preempt_disable_notrace();
 
-   use_stack = __this_cpu_inc_return(ftrace_stack_reserve);
+   stackidx = __this_cpu_inc_return(ftrace_stack_reserve);
+
+   /* This should never happen. If it does, yell once and skip */
+   if (WARN_ON_ONCE(stackidx >= FTRACE_KSTACK_NESTING))
+   goto out;
+
/*
-* We don't need any atomic variables, just a barrier.
-* If an interrupt comes in, we don't care, because it would
-* have exited and put the counter back to what we want.
-* We just need a barrier to keep gcc from moving things
-* around.
+* The above __this_cpu_inc_return() is 'atomic' cpu local. An
+* interrupt will either see the value pre increment or post
+* increment. If the interrupt happens pre increment it will have
+* restored the counter when it returns.  We just need a barrier to
+* keep gcc from moving things around.
 */
barrier();
-   if (use_stack == 1) {
-   trace.entries   = this_cpu_ptr(ftrace_stack.calls);
-   trace.max_entries   = FTRACE_STACK_MAX_ENTRIES;
-
-   if (regs)
-   save_stack_trace_regs(regs, &trace);
-   else
-   save_stack_trace(&trace);
-
-   if (trace.nr_entries > size)
-   size = trace.nr_entries;
-   } else
-   /* From now on, use_stack is a boolean */
-   use_stack = 0;
+
+   fstack = this_cpu_ptr(ftrace_stacks.stacks) + (stackidx - 1);
+   trace.entries   = fstack->calls;
+   trace.max_entries   = FTRACE_KSTACK_ENTRIES;
+
+   if (regs)
+   save_stack_trace_regs(regs, &trace);
+   else
+   save_stack_trace(&trace);
+
+   if (trace.nr_entries > size)
+   size = trace.nr_entries;
 
size *= sizeof(unsigned long);
 
@@ -2820,19 +2833,7 @@ static void __ftrace_trace_stack(struct
goto out;
entry = ring_buffer_event_data(event);
 
-   memset(&entry->caller, 0, size);
-
-   if (use_stack)
-   memcpy(&entry->caller, trace.entries,
-  trace.nr_entries * sizeof(unsigned long));
-   else {
-   trace.max_entries   = FTRACE_STACK_ENTRIES;
-   trace.entries   = entry->caller;
-   if (regs)
-   save_stack_trace_regs(regs, );
-   else
-   save_stack_trace();
-   }
+   memcpy(&entry->caller, trace.entries, size);
 
entry->size = trace.nr_entries;
 



[patch V2 29/29] x86/stacktrace: Use common infrastructure

2019-04-18 Thread Thomas Gleixner
Replace the stack_trace_save*() functions with the new arch_stack_walk()
interfaces.

Signed-off-by: Thomas Gleixner 
Cc: linux-a...@vger.kernel.org
---
 arch/x86/Kconfig |1 
 arch/x86/kernel/stacktrace.c |  116 +++
 2 files changed, 20 insertions(+), 97 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -74,6 +74,7 @@ config X86
select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
+   select ARCH_STACKWALK
select ARCH_SUPPORTS_ACPI
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -12,75 +12,31 @@
 #include 
 #include 
 
-static int save_stack_address(struct stack_trace *trace, unsigned long addr,
- bool nosched)
-{
-   if (nosched && in_sched_functions(addr))
-   return 0;
-
-   if (trace->skip > 0) {
-   trace->skip--;
-   return 0;
-   }
-
-   if (trace->nr_entries >= trace->max_entries)
-   return -1;
-
-   trace->entries[trace->nr_entries++] = addr;
-   return 0;
-}
-
-static void noinline __save_stack_trace(struct stack_trace *trace,
-  struct task_struct *task, struct pt_regs *regs,
-  bool nosched)
+void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
+struct task_struct *task, struct pt_regs *regs)
 {
struct unwind_state state;
unsigned long addr;
 
-   if (regs)
-   save_stack_address(trace, regs->ip, nosched);
+   if (regs && !consume_entry(cookie, regs->ip, false))
+   return;
 
	for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
	     unwind_next_frame(&state)) {
		addr = unwind_get_return_address(&state);
-   if (!addr || save_stack_address(trace, addr, nosched))
+   if (!addr || !consume_entry(cookie, addr, false))
break;
}
 }
 
 /*
- * Save stack-backtrace addresses into a stack_trace buffer.
+ * This function returns an error if it detects any unreliable features of the
+ * stack.  Otherwise it guarantees that the stack trace is reliable.
+ *
+ * If the task is not 'current', the caller *must* ensure the task is inactive.
  */
-void save_stack_trace(struct stack_trace *trace)
-{
-   trace->skip++;
-   __save_stack_trace(trace, current, NULL, false);
-}
-EXPORT_SYMBOL_GPL(save_stack_trace);
-
-void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
-{
-   __save_stack_trace(trace, current, regs, false);
-}
-
-void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
-{
-   if (!try_get_task_stack(tsk))
-   return;
-
-   if (tsk == current)
-   trace->skip++;
-   __save_stack_trace(trace, tsk, NULL, true);
-
-   put_task_stack(tsk);
-}
-EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
-
-#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE
-
-static int __always_inline
-__save_stack_trace_reliable(struct stack_trace *trace,
-   struct task_struct *task)
+int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
+void *cookie, struct task_struct *task)
 {
struct unwind_state state;
struct pt_regs *regs;
@@ -117,7 +73,7 @@ static int __always_inline
if (!addr)
return -EINVAL;
 
-   if (save_stack_address(trace, addr, false))
+   if (!consume_entry(cookie, addr, false))
return -EINVAL;
}
 
@@ -132,32 +88,6 @@ static int __always_inline
return 0;
 }
 
-/*
- * This function returns an error if it detects any unreliable features of the
- * stack.  Otherwise it guarantees that the stack trace is reliable.
- *
- * If the task is not 'current', the caller *must* ensure the task is inactive.
- */
-int save_stack_trace_tsk_reliable(struct task_struct *tsk,
- struct stack_trace *trace)
-{
-   int ret;
-
-   /*
-* If the task doesn't have a stack (e.g., a zombie), the stack is
-* "reliably" empty.
-*/
-   if (!try_get_task_stack(tsk))
-   return 0;
-
-   ret = __save_stack_trace_reliable(trace, tsk);
-
-   put_task_stack(tsk);
-
-   return ret;
-}
-#endif /* CONFIG_HAVE_RELIABLE_STACKTRACE */
-
 /* Userspace stacktrace - based on kernel/trace/trace_sysprof.c */
 
 struct stack_frame_user {
@@ -182,15 +112,15 @@ copy_stack_frame(const void __user *fp,
return ret;
 }
 
-static inline void __save_stack_trace_user(struct stack_trace *trace)
+void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie,
+ const struct pt_regs 

[patch V2 18/29] lockdep: Move stack trace logic into check_prev_add()

2019-04-18 Thread Thomas Gleixner
There is only one caller of check_prev_add(), and it hands in a zeroed
struct stack_trace and a function pointer to save_trace(). Inside
check_prev_add() the stack_trace struct is checked for being empty, which
is always true. Based on that, one code path stores a stack trace which is
never used. The comment there does not make sense either. It's all
leftovers from historical lockdep code (cross release).

Move the variable into check_prev_add() itself and clean up the nonsensical
checks and the pointless stack trace recording.

Signed-off-by: Thomas Gleixner 
---
 kernel/locking/lockdep.c |   30 --
 1 file changed, 8 insertions(+), 22 deletions(-)

--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2158,10 +2158,10 @@ check_deadlock(struct task_struct *curr,
  */
 static int
 check_prev_add(struct task_struct *curr, struct held_lock *prev,
-  struct held_lock *next, int distance, struct stack_trace *trace,
-  int (*save)(struct stack_trace *trace))
+  struct held_lock *next, int distance)
 {
struct lock_list *uninitialized_var(target_entry);
+   struct stack_trace trace;
struct lock_list *entry;
struct lock_list this;
int ret;
@@ -2196,17 +2196,8 @@ check_prev_add(struct task_struct *curr,
this.class = hlock_class(next);
this.parent = NULL;
	ret = check_noncircular(&this, hlock_class(prev), &target_entry);
-   if (unlikely(!ret)) {
-   if (!trace->entries) {
-   /*
-* If @save fails here, the printing might trigger
-* a WARN but because of the !nr_entries it should
-* not do bad things.
-*/
-   save(trace);
-   }
+   if (unlikely(!ret))
		return print_circular_bug(&this, target_entry, next, prev);
-   }
else if (unlikely(ret < 0))
return print_bfs_bug(ret);
 
@@ -2253,7 +2244,7 @@ check_prev_add(struct task_struct *curr,
return print_bfs_bug(ret);
 
 
-   if (!trace->entries && !save(trace))
+   if (!save_trace(&trace))
return 0;
 
/*
@@ -2262,14 +2253,14 @@ check_prev_add(struct task_struct *curr,
 */
ret = add_lock_to_list(hlock_class(next), hlock_class(prev),
   _class(prev)->locks_after,
-  next->acquire_ip, distance, trace);
+  next->acquire_ip, distance, &trace);
 
if (!ret)
return 0;
 
ret = add_lock_to_list(hlock_class(prev), hlock_class(next),
   _class(next)->locks_before,
-  next->acquire_ip, distance, trace);
+  next->acquire_ip, distance, &trace);
if (!ret)
return 0;
 
@@ -2287,12 +2278,6 @@ check_prevs_add(struct task_struct *curr
 {
int depth = curr->lockdep_depth;
struct held_lock *hlock;
-   struct stack_trace trace = {
-   .nr_entries = 0,
-   .max_entries = 0,
-   .entries = NULL,
-   .skip = 0,
-   };
 
/*
 * Debugging checks.
@@ -2318,7 +2303,8 @@ check_prevs_add(struct task_struct *curr
 * added:
 */
if (hlock->read != 2 && hlock->check) {
-   int ret = check_prev_add(curr, hlock, next, distance, &trace, save_trace);
+   int ret = check_prev_add(curr, hlock, next, distance);
+
if (!ret)
return 0;
 




[patch V2 19/29] lockdep: Simplify stack trace handling

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace by using the storage
array based interfaces and storing the information in a small lockdep
specific data structure.
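
In short, lockdep keeps all entries in its existing static stack_trace[]
array and records only an offset and a count per trace (a sketch of the
scheme implemented below):

  struct lock_trace {
  	unsigned int	nr_entries;
  	unsigned int	offset;		/* start index in the shared stack_trace[] array */
  };

  /* Saving: append to the shared array and remember where this trace starts. */
  	trace->offset = nr_stack_trace_entries;
  	trace->nr_entries = stack_trace_save(stack_trace + nr_stack_trace_entries,
  					     MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries, 3);
  	nr_stack_trace_entries += trace->nr_entries;

  /* Printing: resolve the offset back into the shared array. */
  	stack_trace_print(stack_trace + trace->offset, trace->nr_entries, spaces);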

Signed-off-by: Thomas Gleixner 
---
 include/linux/lockdep.h  |9 +++--
 kernel/locking/lockdep.c |   44 +---
 2 files changed, 32 insertions(+), 21 deletions(-)

--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -66,6 +66,11 @@ struct lock_class_key {
 
 extern struct lock_class_key __lockdep_no_validate__;
 
+struct lock_trace {
+   unsigned intnr_entries;
+   unsigned intoffset;
+};
+
 #define LOCKSTAT_POINTS4
 
 /*
@@ -100,7 +105,7 @@ struct lock_class {
 * IRQ/softirq usage tracking bits:
 */
unsigned long   usage_mask;
-   struct stack_trace  usage_traces[XXX_LOCK_USAGE_STATES];
+   struct lock_trace   usage_traces[XXX_LOCK_USAGE_STATES];
 
/*
 * Generation counter, when doing certain classes of graph walking,
@@ -188,7 +193,7 @@ struct lock_list {
struct list_headentry;
struct lock_class   *class;
struct lock_class   *links_to;
-   struct stack_trace  trace;
+   struct lock_trace   trace;
int distance;
 
/*
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -434,18 +434,14 @@ static void print_lockdep_off(const char
 #endif
 }
 
-static int save_trace(struct stack_trace *trace)
+static int save_trace(struct lock_trace *trace)
 {
-   trace->nr_entries = 0;
-   trace->max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
-   trace->entries = stack_trace + nr_stack_trace_entries;
-
-   trace->skip = 3;
-
-   save_stack_trace(trace);
-
-   trace->max_entries = trace->nr_entries;
+   unsigned long *entries = stack_trace + nr_stack_trace_entries;
+   unsigned int max_entries;
 
+   trace->offset = nr_stack_trace_entries;
+   max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
+   trace->nr_entries = stack_trace_save(entries, max_entries, 3);
nr_stack_trace_entries += trace->nr_entries;
 
if (nr_stack_trace_entries >= MAX_STACK_TRACE_ENTRIES-1) {
@@ -1196,7 +1192,7 @@ static struct lock_list *alloc_list_entr
 static int add_lock_to_list(struct lock_class *this,
struct lock_class *links_to, struct list_head *head,
unsigned long ip, int distance,
-   struct stack_trace *trace)
+   struct lock_trace *trace)
 {
struct lock_list *entry;
/*
@@ -1415,6 +1411,13 @@ static inline int __bfs_backwards(struct
  * checking.
  */
 
+static void print_lock_trace(struct lock_trace *trace, unsigned int spaces)
+{
+   unsigned long *entries = stack_trace + trace->offset;
+
+   stack_trace_print(entries, trace->nr_entries, spaces);
+}
+
 /*
  * Print a dependency chain entry (this is only done when a deadlock
  * has been detected):
@@ -1427,8 +1430,7 @@ print_circular_bug_entry(struct lock_lis
printk("\n-> #%u", depth);
print_lock_name(target->class);
printk(KERN_CONT ":\n");
-   print_stack_trace(&target->trace, 6);
-
+   print_lock_trace(&target->trace, 6);
return 0;
 }
 
@@ -1740,7 +1742,7 @@ static void print_lock_class_header(stru
 
len += printk("%*s   %s", depth, "", usage_str[bit]);
len += printk(KERN_CONT " at:\n");
-   print_stack_trace(class->usage_traces + bit, len);
+   print_lock_trace(class->usage_traces + bit, len);
}
}
printk("%*s }\n", depth, "");
@@ -1765,7 +1767,7 @@ print_shortest_lock_dependencies(struct
do {
print_lock_class_header(entry->class, depth);
printk("%*s ... acquired at:\n", depth, "");
-   print_stack_trace(&entry->trace, 2);
+   print_lock_trace(&entry->trace, 2);
printk("\n");
 
if (depth == 0 && (entry != root)) {
@@ -1878,14 +1880,14 @@ print_bad_irq_dependency(struct task_str
print_lock_name(backwards_entry->class);
pr_warn("\n... which became %s-irq-safe at:\n", irqclass);
 
-   print_stack_trace(backwards_entry->class->usage_traces + bit1, 1);
+   print_lock_trace(backwards_entry->class->usage_traces + bit1, 1);
 
pr_warn("\nto a %s-irq-unsafe lock:\n", irqclass);
print_lock_name(forwards_entry->class);
pr_warn("\n... which became %s-irq-unsafe at:\n", irqclass);
pr_warn("...");
 
-   print_stack_trace(forwards_entry->class->usage_traces + bit2, 1);
+   print_lock_trace(forwards_entry->class->usage_traces + bit2, 1);
 
pr_warn("\nother info that might help us debug 

[patch V2 16/29] drm: Simplify stacktrace handling

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace by using the storage
array based interfaces.

The original code in all printing functions is really wrong. It allocates a
storage array on the stack which is never used because depot_fetch_stack()
does not store anything in it; instead it overwrites the entries pointer in
the stack_trace struct so that it points to the depot storage.
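
In other words, the old pattern allocated an on-stack array that was never
written to, while the replacement simply borrows the depot's storage (a
condensed before/after sketch of the conversion below):

  /* Old: 'entries' is dead weight; depot_fetch_stack() repoints
   * trace.entries at the depot storage instead of filling the local array. */
  unsigned long entries[STACKDEPTH];
  struct stack_trace trace = {
  	.entries	= entries,
  	.max_entries	= STACKDEPTH,
  };

  depot_fetch_stack(handle, &trace);
  snprint_stack_trace(buf, sz, &trace, 0);

  /* New: no on-stack copy at all, just a pointer into the depot. */
  unsigned long *depot_entries;
  unsigned int nr_entries;

  nr_entries = stack_depot_fetch(handle, &depot_entries);
  stack_trace_snprint(buf, sz, depot_entries, nr_entries, 0);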

Signed-off-by: Thomas Gleixner 
Cc: intel-...@lists.freedesktop.org
Cc: Joonas Lahtinen 
Cc: Maarten Lankhorst 
Cc: dri-de...@lists.freedesktop.org
Cc: David Airlie 
Cc: Jani Nikula 
Cc: Daniel Vetter 
Cc: Rodrigo Vivi 
---
 drivers/gpu/drm/drm_mm.c|   22 +++---
 drivers/gpu/drm/i915/i915_vma.c |   11 ---
 drivers/gpu/drm/i915/intel_runtime_pm.c |   21 +++--
 3 files changed, 18 insertions(+), 36 deletions(-)

--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -106,22 +106,19 @@
 static noinline void save_stack(struct drm_mm_node *node)
 {
unsigned long entries[STACKDEPTH];
-   struct stack_trace trace = {
-   .entries = entries,
-   .max_entries = STACKDEPTH,
-   .skip = 1
-   };
+   unsigned int n;
 
-   save_stack_trace(&trace);
+   n = stack_trace_save(entries, ARRAY_SIZE(entries), 1);
 
/* May be called under spinlock, so avoid sleeping */
-   node->stack = depot_save_stack(&trace, GFP_NOWAIT);
+   node->stack = stack_depot_save(entries, n, GFP_NOWAIT);
 }
 
 static void show_leaks(struct drm_mm *mm)
 {
struct drm_mm_node *node;
-   unsigned long entries[STACKDEPTH];
+   unsigned long *entries;
+   unsigned int nr_entries;
char *buf;
 
buf = kmalloc(BUFSZ, GFP_KERNEL);
@@ -129,19 +126,14 @@ static void show_leaks(struct drm_mm *mm
return;
 
list_for_each_entry(node, drm_mm_nodes(mm), node_list) {
-   struct stack_trace trace = {
-   .entries = entries,
-   .max_entries = STACKDEPTH
-   };
-
if (!node->stack) {
DRM_ERROR("node [%08llx + %08llx]: unknown owner\n",
  node->start, node->size);
continue;
}
 
-   depot_fetch_stack(node->stack, &trace);
-   snprint_stack_trace(buf, BUFSZ, &trace, 0);
+   nr_entries = stack_depot_fetch(node->stack, &entries);
+   stack_trace_snprint(buf, BUFSZ, entries, nr_entries, 0);
DRM_ERROR("node [%08llx + %08llx]: inserted at\n%s",
  node->start, node->size, buf);
}
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -36,11 +36,8 @@
 
 static void vma_print_allocator(struct i915_vma *vma, const char *reason)
 {
-   unsigned long entries[12];
-   struct stack_trace trace = {
-   .entries = entries,
-   .max_entries = ARRAY_SIZE(entries),
-   };
+   unsigned long *entries;
+   unsigned int nr_entries;
char buf[512];
 
if (!vma->node.stack) {
@@ -49,8 +46,8 @@ static void vma_print_allocator(struct i
return;
}
 
-   depot_fetch_stack(vma->node.stack, &trace);
-   snprint_stack_trace(buf, sizeof(buf), &trace, 0);
+   nr_entries = stack_depot_fetch(vma->node.stack, &entries);
+   stack_trace_snprint(buf, sizeof(buf), entries, nr_entries, 0);
DRM_DEBUG_DRIVER("vma.node [%08llx + %08llx] %s: inserted at %s\n",
 vma->node.start, vma->node.size, reason, buf);
 }
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -60,27 +60,20 @@
 static noinline depot_stack_handle_t __save_depot_stack(void)
 {
unsigned long entries[STACKDEPTH];
-   struct stack_trace trace = {
-   .entries = entries,
-   .max_entries = ARRAY_SIZE(entries),
-   .skip = 1,
-   };
+   unsigned int n;
 
-   save_stack_trace(&trace);
-   return depot_save_stack(&trace, GFP_NOWAIT | __GFP_NOWARN);
+   n = stack_trace_save(entries, ARRAY_SIZE(entries), 1);
+   return stack_depot_save(entries, n, GFP_NOWAIT | __GFP_NOWARN);
 }
 
 static void __print_depot_stack(depot_stack_handle_t stack,
char *buf, int sz, int indent)
 {
-   unsigned long entries[STACKDEPTH];
-   struct stack_trace trace = {
-   .entries = entries,
-   .max_entries = ARRAY_SIZE(entries),
-   };
+   unsigned long *entries;
+   unsigned int nr_entries;
 
-   depot_fetch_stack(stack, &trace);
-   snprint_stack_trace(buf, sz, &trace, indent);
+   nr_entries = stack_depot_fetch(stack, &entries);
+   stack_trace_snprint(buf, sz, entries, nr_entries, indent);
 }
 
 static void init_intel_runtime_pm_wakeref(struct drm_i915_private *i915)



[patch V2 27/29] lib/stackdepot: Remove obsolete functions

2019-04-18 Thread Thomas Gleixner
No more users of the struct stack_trace based interfaces.

Signed-off-by: Thomas Gleixner 
Acked-by: Alexander Potapenko 
---
 include/linux/stackdepot.h |4 
 lib/stackdepot.c   |   20 
 2 files changed, 24 deletions(-)

--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -23,13 +23,9 @@
 
 typedef u32 depot_stack_handle_t;
 
-struct stack_trace;
-
-depot_stack_handle_t depot_save_stack(struct stack_trace *trace, gfp_t flags);
 depot_stack_handle_t stack_depot_save(unsigned long *entries,
  unsigned int nr_entries, gfp_t gfp_flags);
 
-void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace);
 unsigned int stack_depot_fetch(depot_stack_handle_t handle,
   unsigned long **entries);
 
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -212,14 +212,6 @@ unsigned int stack_depot_fetch(depot_sta
 }
 EXPORT_SYMBOL_GPL(stack_depot_fetch);
 
-void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace)
-{
-   unsigned int nent = stack_depot_fetch(handle, &trace->entries);
-
-   trace->max_entries = trace->nr_entries = nent;
-}
-EXPORT_SYMBOL_GPL(depot_fetch_stack);
-
 /**
  * stack_depot_save - Save a stack trace from an array
  *
@@ -314,15 +306,3 @@ depot_stack_handle_t stack_depot_save(un
return retval;
 }
 EXPORT_SYMBOL_GPL(stack_depot_save);
-
-/**
- * depot_save_stack - save stack in a stack depot.
- * @trace - the stacktrace to save.
- * @alloc_flags - flags for allocating additional memory if required.
- */
-depot_stack_handle_t depot_save_stack(struct stack_trace *trace,
- gfp_t alloc_flags)
-{
-   return stack_depot_save(trace->entries, trace->nr_entries, alloc_flags);
-}
-EXPORT_SYMBOL_GPL(depot_save_stack);




[patch V2 23/29] tracing: Simplify stack trace retrieval

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace by using the storage
array based interfaces.

Signed-off-by: Thomas Gleixner 
Cc: Steven Rostedt 
---
 kernel/trace/trace.c |   40 +---
 1 file changed, 13 insertions(+), 27 deletions(-)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2774,22 +2774,18 @@ static void __ftrace_trace_stack(struct
 {
	struct trace_event_call *call = &event_kernel_stack;
struct ring_buffer_event *event;
+   unsigned int size, nr_entries;
struct ftrace_stack *fstack;
struct stack_entry *entry;
-   struct stack_trace trace;
-   int size = FTRACE_KSTACK_ENTRIES;
int stackidx;
 
-   trace.nr_entries= 0;
-   trace.skip  = skip;
-
/*
 * Add one, for this function and the call to save_stack_trace()
 * If regs is set, then these functions will not be in the way.
 */
 #ifndef CONFIG_UNWINDER_ORC
if (!regs)
-   trace.skip++;
+   skip++;
 #endif
 
/*
@@ -2816,28 +2812,24 @@ static void __ftrace_trace_stack(struct
barrier();
 
fstack = this_cpu_ptr(ftrace_stacks.stacks) + (stackidx - 1);
-   trace.entries   = fstack->calls;
-   trace.max_entries   = FTRACE_KSTACK_ENTRIES;
-
-   if (regs)
-   save_stack_trace_regs(regs, &trace);
-   else
-   save_stack_trace(&trace);
-
-   if (trace.nr_entries > size)
-   size = trace.nr_entries;
+   size = ARRAY_SIZE(fstack->calls);
 
-   size *= sizeof(unsigned long);
+   if (regs) {
+   nr_entries = stack_trace_save_regs(regs, fstack->calls,
+  size, skip);
+   } else {
+   nr_entries = stack_trace_save(fstack->calls, size, skip);
+   }
 
+   size = nr_entries * sizeof(unsigned long);
event = __trace_buffer_lock_reserve(buffer, TRACE_STACK,
sizeof(*entry) + size, flags, pc);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
 
-   memcpy(&entry->caller, trace.entries, size);
-
-   entry->size = trace.nr_entries;
+   memcpy(&entry->caller, fstack->calls, size);
+   entry->size = nr_entries;
 
if (!call_filter_check_discard(call, entry, buffer, event))
__buffer_unlock_commit(buffer, event);
@@ -2916,7 +2908,6 @@ ftrace_trace_userstack(struct ring_buffe
	struct trace_event_call *call = &event_user_stack;
struct ring_buffer_event *event;
struct userstack_entry *entry;
-   struct stack_trace trace;
 
if (!(global_trace.trace_flags & TRACE_ITER_USERSTACKTRACE))
return;
@@ -2947,12 +2938,7 @@ ftrace_trace_userstack(struct ring_buffe
entry->tgid = current->tgid;
memset(>caller, 0, sizeof(entry->caller));
 
-   trace.nr_entries= 0;
-   trace.max_entries   = FTRACE_STACK_ENTRIES;
-   trace.skip  = 0;
-   trace.entries   = entry->caller;
-
-   save_stack_trace_user(&trace);
+   stack_trace_save_user(entry->caller, FTRACE_STACK_ENTRIES);
if (!call_filter_check_discard(call, entry, buffer, event))
__buffer_unlock_commit(buffer, event);
 




[patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-18 Thread Thomas Gleixner
All architectures which support stacktrace carry duplicated code and do the
stack storage and filtering on the architecture side.

Provide a consolidated interface with a callback function for consuming the
stack entries provided by the architecture specific stack walker. This
removes lots of duplicated code and allows better filtering than 'skip
number of entries' to be implemented in the future without touching any
architecture specific code.
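
Roughly, the split between generic and architecture code looks like this (a
sketch based on the interfaces added below; for_each_frame() is a
placeholder for whatever unwinder loop an architecture actually has):

  /* Generic side: a callback that stores entries into a plain array. */
  static bool stack_trace_consume_entry(void *cookie, unsigned long addr,
  				      bool reliable)
  {
  	struct stacktrace_cookie *c = cookie;

  	if (c->len >= c->size)
  		return false;
  	c->store[c->len++] = addr;
  	return c->len < c->size;
  }

  /* Architecture side: walk the frames, hand each address to the callback. */
  void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
  		     struct task_struct *task, struct pt_regs *regs)
  {
  	unsigned long addr;

  	for_each_frame(task, regs, addr) {	/* placeholder, not a real macro */
  		if (!consume_entry(cookie, addr, false))
  			break;
  	}
  }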

Signed-off-by: Thomas Gleixner 
Cc: linux-a...@vger.kernel.org
---
 include/linux/stacktrace.h |   38 +
 kernel/stacktrace.c|  173 +
 lib/Kconfig|4 +
 3 files changed, 215 insertions(+)

--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -23,6 +23,43 @@ unsigned int stack_trace_save_regs(struc
 unsigned int stack_trace_save_user(unsigned long *store, unsigned int size);
 
 /* Internal interfaces. Do not use in generic code */
+#ifdef CONFIG_ARCH_STACKWALK
+
+/**
+ * stack_trace_consume_fn - Callback for arch_stack_walk()
+ * @cookie:Caller supplied pointer handed back by arch_stack_walk()
+ * @addr:  The stack entry address to consume
+ * @reliable:  True when the stack entry is reliable. Required by
+ * some printk based consumers.
+ *
+ * Returns:True, if the entry was consumed or skipped
+ * False, if there is no space left to store
+ */
+typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
+  bool reliable);
+/**
+ * arch_stack_walk - Architecture specific function to walk the stack
+
+ * @consume_entry: Callback which is invoked by the architecture code for
+ * each entry.
+ * @cookie:Caller supplied pointer which is handed back to
+ * @consume_entry
+ * @task:  Pointer to a task struct, can be NULL
+ * @regs:  Pointer to registers, can be NULL
+ *
+ * @task   @regs:
+ * NULL    NULL    Stack trace from current
+ * task    NULL    Stack trace from task (can be current)
+ * NULL    regs    Stack trace starting on regs->stackpointer
+ */
+void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
+struct task_struct *task, struct pt_regs *regs);
+int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void 
*cookie,
+struct task_struct *task);
+void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie,
+ const struct pt_regs *regs);
+
+#else /* CONFIG_ARCH_STACKWALK */
 struct stack_trace {
unsigned int nr_entries, max_entries;
unsigned long *entries;
@@ -37,6 +74,7 @@ extern void save_stack_trace_tsk(struct
 extern int save_stack_trace_tsk_reliable(struct task_struct *tsk,
 struct stack_trace *trace);
 extern void save_stack_trace_user(struct stack_trace *trace);
+#endif /* !CONFIG_ARCH_STACKWALK */
 #endif /* CONFIG_STACKTRACE */
 
 #if defined(CONFIG_STACKTRACE) && defined(CONFIG_HAVE_RELIABLE_STACKTRACE)
--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@@ -5,6 +5,8 @@
  *
  *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar 
  */
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -64,6 +66,175 @@ int stack_trace_snprint(char *buf, size_
 }
 EXPORT_SYMBOL_GPL(stack_trace_snprint);
 
+#ifdef CONFIG_ARCH_STACKWALK
+
+struct stacktrace_cookie {
+   unsigned long   *store;
+   unsigned intsize;
+   unsigned intskip;
+   unsigned intlen;
+};
+
+static bool stack_trace_consume_entry(void *cookie, unsigned long addr,
+ bool reliable)
+{
+   struct stacktrace_cookie *c = cookie;
+
+   if (c->len >= c->size)
+   return false;
+
+   if (c->skip > 0) {
+   c->skip--;
+   return true;
+   }
+   c->store[c->len++] = addr;
+   return c->len < c->size;
+}
+
+static bool stack_trace_consume_entry_nosched(void *cookie, unsigned long addr,
+ bool reliable)
+{
+   if (in_sched_functions(addr))
+   return true;
+   return stack_trace_consume_entry(cookie, addr, reliable);
+}
+
+/**
+ * stack_trace_save - Save a stack trace into a storage array
+ * @store: Pointer to storage array
+ * @size:  Size of the storage array
+ * @skipnr:Number of entries to skip at the start of the stack trace
+ *
+ * Returns number of entries stored.
+ */
+unsigned int stack_trace_save(unsigned long *store, unsigned int size,
+ unsigned int skipnr)
+{
+   stack_trace_consume_fn consume_entry = stack_trace_consume_entry;
+   struct stacktrace_cookie c = {
+   .store  = store,
+   .size   = size,
+   .skip   = skipnr + 1,
+   };
+
+   

[patch V2 20/29] tracing: Simplify stacktrace retrieval in histograms

2019-04-18 Thread Thomas Gleixner
The indirection through struct stack_trace is not necessary at all. Use the
storage array based interface.

Signed-off-by: Thomas Gleixner 
Cc: Steven Rostedt 
---
 kernel/trace/trace_events_hist.c |   12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -5186,7 +5186,6 @@ static void event_hist_trigger(struct ev
u64 var_ref_vals[TRACING_MAP_VARS_MAX];
char compound_key[HIST_KEY_SIZE_MAX];
struct tracing_map_elt *elt = NULL;
-   struct stack_trace stacktrace;
struct hist_field *key_field;
u64 field_contents;
void *key = NULL;
@@ -5198,14 +5197,9 @@ static void event_hist_trigger(struct ev
key_field = hist_data->fields[i];
 
if (key_field->flags & HIST_FIELD_FL_STACKTRACE) {
-   stacktrace.max_entries = HIST_STACKTRACE_DEPTH;
-   stacktrace.entries = entries;
-   stacktrace.nr_entries = 0;
-   stacktrace.skip = HIST_STACKTRACE_SKIP;
-
-   memset(stacktrace.entries, 0, HIST_STACKTRACE_SIZE);
-   save_stack_trace(&stacktrace);
-
+   memset(entries, 0, HIST_STACKTRACE_SIZE);
+   stack_trace_save(entries, HIST_STACKTRACE_DEPTH,
+HIST_STACKTRACE_SKIP);
key = entries;
} else {
field_contents = key_field->fn(key_field, elt, rbe, 
rec);




[patch V2 24/29] tracing: Remove the last struct stack_trace usage

2019-04-18 Thread Thomas Gleixner
Simplify the stack retrieval code by using the storage array based
interface.

Signed-off-by: Thomas Gleixner 
---
 kernel/trace/trace_stack.c |   42 --
 1 file changed, 16 insertions(+), 26 deletions(-)

--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -23,16 +23,7 @@
 static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES];
 static unsigned stack_trace_index[STACK_TRACE_ENTRIES];
 
-/*
- * Reserve one entry for the passed in ip. This will allow
- * us to remove most or all of the stack size overhead
- * added by the stack tracer itself.
- */
-struct stack_trace stack_trace_max = {
-   .max_entries= STACK_TRACE_ENTRIES - 1,
-   .entries= &stack_dump_trace[0],
-};
-
+static unsigned int stack_trace_entries;
 static unsigned long stack_trace_max_size;
 static arch_spinlock_t stack_trace_max_lock =
(arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
@@ -49,10 +40,10 @@ static void print_max_stack(void)
 
pr_emerg("DepthSize   Location(%d entries)\n"
   "-   \n",
-  stack_trace_max.nr_entries);
+  stack_trace_entries);
 
-   for (i = 0; i < stack_trace_max.nr_entries; i++) {
-   if (i + 1 == stack_trace_max.nr_entries)
+   for (i = 0; i < stack_trace_entries; i++) {
+   if (i + 1 == stack_trace_entries)
size = stack_trace_index[i];
else
size = stack_trace_index[i] - stack_trace_index[i+1];
@@ -98,13 +89,12 @@ static void check_stack(unsigned long ip
 
stack_trace_max_size = this_size;
 
-   stack_trace_max.nr_entries = 0;
-   stack_trace_max.skip = 0;
-
-   save_stack_trace(&stack_trace_max);
+   stack_trace_entries = stack_trace_save(stack_dump_trace,
+  ARRAY_SIZE(stack_dump_trace) - 1,
+  0);
 
/* Skip over the overhead of the stack tracer itself */
-   for (i = 0; i < stack_trace_max.nr_entries; i++) {
+   for (i = 0; i < stack_trace_entries; i++) {
if (stack_dump_trace[i] == ip)
break;
}
@@ -113,7 +103,7 @@ static void check_stack(unsigned long ip
 * Some archs may not have the passed in ip in the dump.
 * If that happens, we need to show everything.
 */
-   if (i == stack_trace_max.nr_entries)
+   if (i == stack_trace_entries)
i = 0;
 
/*
@@ -131,13 +121,13 @@ static void check_stack(unsigned long ip
 * loop will only happen once. This code only takes place
 * on a new max, so it is far from a fast path.
 */
-   while (i < stack_trace_max.nr_entries) {
+   while (i < stack_trace_entries) {
int found = 0;
 
stack_trace_index[x] = this_size;
p = start;
 
-   for (; p < top && i < stack_trace_max.nr_entries; p++) {
+   for (; p < top && i < stack_trace_entries; p++) {
/*
 * The READ_ONCE_NOCHECK is used to let KASAN know that
 * this is not a stack-out-of-bounds error.
@@ -168,7 +158,7 @@ static void check_stack(unsigned long ip
i++;
}
 
-   stack_trace_max.nr_entries = x;
+   stack_trace_entries = x;
 
if (task_stack_end_corrupted(current)) {
print_max_stack();
@@ -270,7 +260,7 @@ static void *
 {
long n = *pos - 1;
 
-   if (n >= stack_trace_max.nr_entries)
+   if (n >= stack_trace_entries)
return NULL;
 
m->private = (void *)n;
@@ -334,7 +324,7 @@ static int t_show(struct seq_file *m, vo
seq_printf(m, "DepthSize   Location"
   "(%d entries)\n"
   "-   \n",
-  stack_trace_max.nr_entries);
+  stack_trace_entries);
 
if (!stack_tracer_enabled && !stack_trace_max_size)
print_disabled(m);
@@ -344,10 +334,10 @@ static int t_show(struct seq_file *m, vo
 
i = *(long *)v;
 
-   if (i >= stack_trace_max.nr_entries)
+   if (i >= stack_trace_entries)
return 0;
 
-   if (i + 1 == stack_trace_max.nr_entries)
+   if (i + 1 == stack_trace_entries)
size = stack_trace_index[i];
else
size = stack_trace_index[i] - stack_trace_index[i+1];




Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-18 Thread Kees Cook via iommu
On Wed, Apr 17, 2019 at 11:41 PM Andy Lutomirski  wrote:
> I don't think this type of NX goof was ever the argument for XPFO.
> The main argument I've heard is that a malicious user program writes a
> ROP payload into user memory (regular anonymous user memory) and then
> gets the kernel to erroneously set RSP (*not* RIP) to point there.

Well, more than just ROP. Any of the various attack primitives. The NX
stuff is about moving RIP: SMEP-bypassing. But there is still basic
SMAP-bypassing for putting a malicious structure in userspace and
having the kernel access it via the linear mapping, etc.

> I find this argument fairly weak for a couple reasons.  First, if
> we're worried about this, let's do in-kernel CFI, not XPFO, to

CFI is getting much closer. Getting the kernel happy under Clang, LTO,
and CFI is under active development. (It's functional for arm64
already, and pieces have been getting upstreamed.)

> mitigate it.  Second, I don't see why the exact same attack can't be
> done using, say, page cache, and unless I'm missing something, XPFO
> doesn't protect page cache.  Or network buffers, or pipe buffers, etc.

My understanding is that it's much easier to feel out the linear
mapping address than for the others. But yes, all of those same attack
primitives are possible in other memory areas (though most are NX),
and plenty of exploits have done such things.

-- 
Kees Cook


[patch V2 10/29] mm/page_owner: Simplify stack trace handling

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace by using the storage
array based interfaces.

The original code in all printing functions is really wrong. It allocates a
storage array on the stack which is never used because depot_fetch_stack()
does not store anything in it; instead it overwrites the entries pointer in
the stack_trace struct so that it points to the depot storage.

Signed-off-by: Thomas Gleixner 
Cc: linux...@kvack.org
Cc: Mike Rapoport 
Cc: David Rientjes 
Cc: Andrew Morton 
---
 mm/page_owner.c |   79 +++-
 1 file changed, 28 insertions(+), 51 deletions(-)

--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -58,15 +58,10 @@ static bool need_page_owner(void)
 static __always_inline depot_stack_handle_t create_dummy_stack(void)
 {
unsigned long entries[4];
-   struct stack_trace dummy;
+   unsigned int nr_entries;
 
-   dummy.nr_entries = 0;
-   dummy.max_entries = ARRAY_SIZE(entries);
-   dummy.entries = &entries[0];
-   dummy.skip = 0;
-
-   save_stack_trace(&dummy);
-   return depot_save_stack(&dummy, GFP_KERNEL);
+   nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
+   return stack_depot_save(entries, nr_entries, GFP_KERNEL);
 }
 
 static noinline void register_dummy_stack(void)
@@ -120,46 +115,39 @@ void __reset_page_owner(struct page *pag
}
 }
 
-static inline bool check_recursive_alloc(struct stack_trace *trace,
-   unsigned long ip)
+static inline bool check_recursive_alloc(unsigned long *entries,
+unsigned int nr_entries,
+unsigned long ip)
 {
-   int i;
+   unsigned int i;
 
-   if (!trace->nr_entries)
-   return false;
-
-   for (i = 0; i < trace->nr_entries; i++) {
-   if (trace->entries[i] == ip)
+   for (i = 0; i < nr_entries; i++) {
+   if (entries[i] == ip)
return true;
}
-
return false;
 }
 
 static noinline depot_stack_handle_t save_stack(gfp_t flags)
 {
unsigned long entries[PAGE_OWNER_STACK_DEPTH];
-   struct stack_trace trace = {
-   .nr_entries = 0,
-   .entries = entries,
-   .max_entries = PAGE_OWNER_STACK_DEPTH,
-   .skip = 2
-   };
depot_stack_handle_t handle;
+   unsigned int nr_entries;
 
-   save_stack_trace(&trace);
+   nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2);
 
/*
-* We need to check recursion here because our request to stackdepot
-* could trigger memory allocation to save new entry. New memory
-* allocation would reach here and call depot_save_stack() again
-* if we don't catch it. There is still not enough memory in stackdepot
-* so it would try to allocate memory again and loop forever.
+* We need to check recursion here because our request to
+* stackdepot could trigger memory allocation to save new
+* entry. New memory allocation would reach here and call
+* stack_depot_save_entries() again if we don't catch it. There is
+* still not enough memory in stackdepot so it would try to
+* allocate memory again and loop forever.
 */
-   if (check_recursive_alloc(&trace, _RET_IP_))
+   if (check_recursive_alloc(entries, nr_entries, _RET_IP_))
return dummy_handle;
 
-   handle = depot_save_stack(&trace, flags);
+   handle = stack_depot_save(entries, nr_entries, flags);
if (!handle)
handle = failure_handle;
 
@@ -337,16 +325,10 @@ print_page_owner(char __user *buf, size_
struct page *page, struct page_owner *page_owner,
depot_stack_handle_t handle)
 {
-   int ret;
-   int pageblock_mt, page_mt;
+   int ret, pageblock_mt, page_mt;
+   unsigned long *entries;
+   unsigned int nr_entries;
char *kbuf;
-   unsigned long entries[PAGE_OWNER_STACK_DEPTH];
-   struct stack_trace trace = {
-   .nr_entries = 0,
-   .entries = entries,
-   .max_entries = PAGE_OWNER_STACK_DEPTH,
-   .skip = 0
-   };
 
count = min_t(size_t, count, PAGE_SIZE);
kbuf = kmalloc(count, GFP_KERNEL);
@@ -375,8 +357,8 @@ print_page_owner(char __user *buf, size_
if (ret >= count)
goto err;
 
-   depot_fetch_stack(handle, &trace);
-   ret += snprint_stack_trace(kbuf + ret, count - ret, &trace, 0);
+   nr_entries = stack_depot_fetch(handle, );
+   ret += stack_trace_snprint(kbuf + ret, count - ret, entries, 
nr_entries, 0);
if (ret >= count)
goto err;
 
@@ -407,14 +389,9 @@ void __dump_page_owner(struct page *page
 {
struct page_ext *page_ext = lookup_page_ext(page);
struct page_owner *page_owner;
-   unsigned long entries[PAGE_OWNER_STACK_DEPTH];
-   struct 

[patch V2 17/29] lockdep: Remove unused trace argument from print_circular_bug()

2019-04-18 Thread Thomas Gleixner
Signed-off-by: Thomas Gleixner 
---
 kernel/locking/lockdep.c |9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1522,10 +1522,9 @@ static inline int class_equal(struct loc
 }
 
 static noinline int print_circular_bug(struct lock_list *this,
-   struct lock_list *target,
-   struct held_lock *check_src,
-   struct held_lock *check_tgt,
-   struct stack_trace *trace)
+  struct lock_list *target,
+  struct held_lock *check_src,
+  struct held_lock *check_tgt)
 {
struct task_struct *curr = current;
struct lock_list *parent;
@@ -2206,7 +2205,7 @@ check_prev_add(struct task_struct *curr,
 */
save(trace);
}
-   return print_circular_bug(&this, target_entry, next, prev, trace);
+   return print_circular_bug(, target_entry, next, prev);
}
else if (unlikely(ret < 0))
return print_bfs_bug(ret);




[patch V2 15/29] dm persistent data: Simplify stack trace handling

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace with an invocation of
the storage array based interface. This results in less storage space and
indirection.

Signed-off-by: Thomas Gleixner 
Cc: dm-de...@redhat.com
Cc: Mike Snitzer 
Cc: Alasdair Kergon 
---
 drivers/md/persistent-data/dm-block-manager.c |   19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

--- a/drivers/md/persistent-data/dm-block-manager.c
+++ b/drivers/md/persistent-data/dm-block-manager.c
@@ -35,7 +35,10 @@
 #define MAX_HOLDERS 4
 #define MAX_STACK 10
 
-typedef unsigned long stack_entries[MAX_STACK];
+struct stack_store {
+   unsigned intnr_entries;
+   unsigned long   entries[MAX_STACK];
+};
 
 struct block_lock {
spinlock_t lock;
@@ -44,8 +47,7 @@ struct block_lock {
struct task_struct *holders[MAX_HOLDERS];
 
 #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
-   struct stack_trace traces[MAX_HOLDERS];
-   stack_entries entries[MAX_HOLDERS];
+   struct stack_store traces[MAX_HOLDERS];
 #endif
 };
 
@@ -73,7 +75,7 @@ static void __add_holder(struct block_lo
 {
unsigned h = __find_holder(lock, NULL);
 #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
-   struct stack_trace *t;
+   struct stack_store *t;
 #endif
 
get_task_struct(task);
@@ -81,11 +83,7 @@ static void __add_holder(struct block_lo
 
 #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
t = lock->traces + h;
-   t->nr_entries = 0;
-   t->max_entries = MAX_STACK;
-   t->entries = lock->entries[h];
-   t->skip = 2;
-   save_stack_trace(t);
+   t->nr_entries = stack_trace_save(t->entries, MAX_STACK, 2);
 #endif
 }
 
@@ -106,7 +104,8 @@ static int __check_holder(struct block_l
DMERR("recursive lock detected in metadata");
 #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
DMERR("previously held here:");
-   print_stack_trace(lock->traces + i, 4);
+   stack_trace_print(lock->traces[i].entries,
+ lock->traces[i].nr_entries, 4);
 
DMERR("subsequent acquisition attempted here:");
dump_stack();




[patch V2 06/29] latency_top: Simplify stack trace handling

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace with an invocation of
the storage array based interface.

Signed-off-by: Thomas Gleixner 
---
 kernel/latencytop.c |   17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

--- a/kernel/latencytop.c
+++ b/kernel/latencytop.c
@@ -141,20 +141,6 @@ account_global_scheduler_latency(struct
	memcpy(&latency_record[i], lat, sizeof(struct latency_record));
 }
 
-/*
- * Iterator to store a backtrace into a latency record entry
- */
-static inline void store_stacktrace(struct task_struct *tsk,
-   struct latency_record *lat)
-{
-   struct stack_trace trace;
-
-   memset(, 0, sizeof(trace));
-   trace.max_entries = LT_BACKTRACEDEPTH;
-   trace.entries = &lat->backtrace[0];
-   save_stack_trace_tsk(tsk, &trace);
-}
-
 /**
  * __account_scheduler_latency - record an occurred latency
  * @tsk - the task struct of the task hitting the latency
@@ -191,7 +177,8 @@ void __sched
lat.count = 1;
lat.time = usecs;
lat.max = usecs;
-   store_stacktrace(tsk, &lat);
+
+   stack_trace_save_tsk(tsk, lat.backtrace, LT_BACKTRACEDEPTH, 0);
 
	raw_spin_lock_irqsave(&latency_lock, flags);
 




[patch V2 12/29] dma/debug: Simplify stacktrace retrieval

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace with an invocation of
the storage array based interface.

Signed-off-by: Thomas Gleixner 
Cc: iommu@lists.linux-foundation.org
Cc: Robin Murphy 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
---
 kernel/dma/debug.c |   13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -89,8 +89,8 @@ struct dma_debug_entry {
int  sg_mapped_ents;
enum map_err_types  map_err_type;
 #ifdef CONFIG_STACKTRACE
-   struct   stack_trace stacktrace;
-   unsigned longst_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
+   unsigned intstack_len;
+   unsigned long   stack_entries[DMA_DEBUG_STACKTRACE_ENTRIES];
 #endif
 };
 
@@ -174,7 +174,7 @@ static inline void dump_entry_trace(stru
 #ifdef CONFIG_STACKTRACE
if (entry) {
pr_warning("Mapped at:\n");
-   print_stack_trace(&entry->stacktrace, 0);
+   stack_trace_print(entry->stack_entries, entry->stack_len, 0);
}
 #endif
 }
@@ -704,12 +704,9 @@ static struct dma_debug_entry *dma_entry
	spin_unlock_irqrestore(&free_entries_lock, flags);
 
 #ifdef CONFIG_STACKTRACE
-   entry->stacktrace.max_entries = DMA_DEBUG_STACKTRACE_ENTRIES;
-   entry->stacktrace.entries = entry->st_entries;
-   entry->stacktrace.skip = 1;
-   save_stack_trace(&entry->stacktrace);
+   entry->stack_len = stack_trace_save(entry->stack_entries,
+   ARRAY_SIZE(entry->stack_entries), 
1);
 #endif
-
return entry;
 }
 




[patch V2 07/29] mm/slub: Simplify stack trace retrieval

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace with an invocation of
the storage array based interface.

Signed-off-by: Thomas Gleixner 
Cc: Andrew Morton 
Cc: Pekka Enberg 
Cc: linux...@kvack.org
Cc: David Rientjes 
Cc: Christoph Lameter 
---
 mm/slub.c |   12 
 1 file changed, 4 insertions(+), 8 deletions(-)

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -552,18 +552,14 @@ static void set_track(struct kmem_cache
 
if (addr) {
 #ifdef CONFIG_STACKTRACE
-   struct stack_trace trace;
+   unsigned int nr_entries;
 
-   trace.nr_entries = 0;
-   trace.max_entries = TRACK_ADDRS_COUNT;
-   trace.entries = p->addrs;
-   trace.skip = 3;
metadata_access_enable();
-   save_stack_trace(&trace);
+   nr_entries = stack_trace_save(p->addrs, TRACK_ADDRS_COUNT, 3);
metadata_access_disable();
 
-   if (trace.nr_entries < TRACK_ADDRS_COUNT)
-   p->addrs[trace.nr_entries] = 0;
+   if (nr_entries < TRACK_ADDRS_COUNT)
+   p->addrs[nr_entries] = 0;
 #endif
p->addr = addr;
p->cpu = smp_processor_id();




[patch V2 11/29] fault-inject: Simplify stacktrace retrieval

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace with an invocation of
the storage array based interface.

Signed-off-by: Thomas Gleixner 
Cc: Akinobu Mita 
---
 lib/fault-inject.c |   12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

--- a/lib/fault-inject.c
+++ b/lib/fault-inject.c
@@ -65,22 +65,16 @@ static bool fail_task(struct fault_attr
 
 static bool fail_stacktrace(struct fault_attr *attr)
 {
-   struct stack_trace trace;
int depth = attr->stacktrace_depth;
unsigned long entries[MAX_STACK_TRACE_DEPTH];
-   int n;
+   int n, nr_entries;
bool found = (attr->require_start == 0 && attr->require_end == 
ULONG_MAX);
 
if (depth == 0)
return found;
 
-   trace.nr_entries = 0;
-   trace.entries = entries;
-   trace.max_entries = depth;
-   trace.skip = 1;
-
-   save_stack_trace(&trace);
-   for (n = 0; n < trace.nr_entries; n++) {
+   nr_entries = stack_trace_save(entries, depth, 1);
+   for (n = 0; n < nr_entries; n++) {
if (attr->reject_start <= entries[n] &&
   entries[n] < attr->reject_end)
return false;




[patch V2 04/29] backtrace-test: Simplify stack trace handling

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace by using the storage
array based interfaces.

Signed-off-by: Thomas Gleixner 
---
 kernel/backtracetest.c |   11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

--- a/kernel/backtracetest.c
+++ b/kernel/backtracetest.c
@@ -48,19 +48,14 @@ static void backtrace_test_irq(void)
 #ifdef CONFIG_STACKTRACE
 static void backtrace_test_saved(void)
 {
-   struct stack_trace trace;
unsigned long entries[8];
+   unsigned int nr_entries;
 
pr_info("Testing a saved backtrace.\n");
pr_info("The following trace is a kernel self test and not a bug!\n");
 
-   trace.nr_entries = 0;
-   trace.max_entries = ARRAY_SIZE(entries);
-   trace.entries = entries;
-   trace.skip = 0;
-
-   save_stack_trace(&trace);
-   print_stack_trace(&trace, 0);
+   nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
+   stack_trace_print(entries, nr_entries, 0);
 }
 #else
 static void backtrace_test_saved(void)




[patch V2 00/29] stacktrace: Consolidate stack trace usage

2019-04-18 Thread Thomas Gleixner
This is an update to V1:

 https://lkml.kernel.org/r/20190410102754.387743...@linutronix.de

Struct stack_trace is a sinkhole for input and output parameters which is
largely pointless for most usage sites. In fact if embedded into other data
structures it creates indirections and extra storage overhead for no
benefit.

Looking at all usage sites makes it clear that they just require an
interface which is based on a storage array. That array is either on stack,
global or embedded into some other data structure.
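
As a minimal sketch of the difference (the record_stack_old()/record_stack_new()
helpers below are made up purely for illustration), a typical usage site goes
from filling in a struct stack_trace that merely wraps an array to a single
call:

#include <linux/stacktrace.h>

#define NR_ENTRIES	16

static unsigned long entries[NR_ENTRIES];
static unsigned int nr_entries;

/* Old style: a struct stack_trace that only wraps the storage array. */
static void record_stack_old(void)
{
	struct stack_trace trace = {
		.entries	= entries,
		.max_entries	= NR_ENTRIES,
		.skip		= 1,	/* drop this helper from the trace */
	};

	save_stack_trace(&trace);
	nr_entries = trace.nr_entries;
}

/* New style: the array and the number of valid entries are all that is needed. */
static void record_stack_new(void)
{
	nr_entries = stack_trace_save(entries, NR_ENTRIES, 1);
}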

Some of the stack depot usage sites are outright wrong, but fortunately the
wrongness just causes more stack to be used for nothing and does not have
functional impact.

Fix this up by:

  1) Providing plain storage array based interfaces for stacktrace and
 stackdepot.

  2) Cleaning up the mess at the callsites including some related
 cleanups.

  3) Removing the struct stack_trace based interfaces

  This is not yet changing the struct stack_trace interfaces at the
  architecture level, but it removes the exposure to the usage sites.

The last two patches are extending the cleanup to the architecture level by
replacing the various save_stack_trace.* architecture interfaces with a
more unified arch_stack_walk() interface. x86 is converted, but I have
worked through all architectures already and it removes lots of duplicated
code and allows consolidation across the board. The rest of the
architecture patches are not included in this posting as I want to get
feedback on the approach itself. The diffstat of cleaning up the remaining
architectures, on top of the current lot, is currently:

   47 files changed, 402 insertions(+), 1196 deletions(-)

Once this has settled, the core interfaces can be improved by adding
features which allow getting rid of the imprecise 'skip number of entries'
approach that tries to remove the stack tracer and the callsites themselves
from the trace. That's error prone due to inlining and other issues. Having
e.g. a _RET_IP_ based filter allows doing that far more reliably.
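
To make the problem concrete, a small made-up example (capture() is
hypothetical, not part of the series): the skip count is supposed to drop the
wrapper's own frame, but whether that frame exists at all depends on inlining
decisions.

#include <linux/stacktrace.h>

static noinline unsigned int capture(unsigned long *store, unsigned int size)
{
	/*
	 * skipnr == 1 is meant to drop capture() itself from the trace.
	 * Without the noinline annotation the compiler may inline capture()
	 * into its caller; the frame to be skipped then does not exist and
	 * a useful entry of the caller's stack is dropped instead.
	 */
	return stack_trace_save(store, size, 1);
}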

The series is based on:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/stacktrace

which contains the removal of the inconsistent and pointless ULONG_MAX
termination of stacktraces.

It's also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.core/stacktrace

   up to:  131038eb3e2f ("x86/stacktrace: Use common infrastructure")

Changes vs. V1:

   - Applied the ULONG_MAX termination cleanup in tip

   - Addressed the review comments

   - Fixed up the last users of struct stack_trace outside the stacktrace
 core and architecture code (livepatch, tracing)

   - Added the new arch_stack_walk() model and converted x86 to it

Thanks,

tglx

---
 arch/x86/Kconfig  |1 
 arch/x86/kernel/stacktrace.c  |  116 +
 drivers/gpu/drm/drm_mm.c  |   22 -
 drivers/gpu/drm/i915/i915_vma.c   |   11 
 drivers/gpu/drm/i915/intel_runtime_pm.c   |   21 -
 drivers/md/dm-bufio.c |   15 -
 drivers/md/persistent-data/dm-block-manager.c |   19 -
 fs/btrfs/ref-verify.c |   15 -
 fs/proc/base.c|   14 -
 include/linux/ftrace.h|   18 -
 include/linux/lockdep.h   |9 
 include/linux/stackdepot.h|8 
 include/linux/stacktrace.h|   80 +-
 kernel/backtracetest.c|   11 
 kernel/dma/debug.c|   13 -
 kernel/latencytop.c   |   17 -
 kernel/livepatch/transition.c |   22 -
 kernel/locking/lockdep.c  |   81 ++
 kernel/stacktrace.c   |  323 --
 kernel/trace/trace.c  |  105 +++-
 kernel/trace/trace.h  |8 
 kernel/trace/trace_events_hist.c  |   12 
 kernel/trace/trace_stack.c|   76 ++
 lib/Kconfig   |4 
 lib/fault-inject.c|   12 
 lib/stackdepot.c  |   50 ++--
 mm/kasan/common.c |   30 --
 mm/kasan/report.c |7 
 mm/kmemleak.c |   24 -
 mm/page_owner.c   |   79 ++
 mm/slub.c |   12 
 31 files changed, 664 insertions(+), 571 deletions(-)





[patch V2 14/29] dm bufio: Simplify stack trace retrieval

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace with an invocation of
the storage array based interface.

Signed-off-by: Thomas Gleixner 
Cc: dm-de...@redhat.com
Cc: Mike Snitzer 
Cc: Alasdair Kergon 
---
 drivers/md/dm-bufio.c |   15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -150,7 +150,7 @@ struct dm_buffer {
void (*end_io)(struct dm_buffer *, blk_status_t);
 #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
 #define MAX_STACK 10
-   struct stack_trace stack_trace;
+   unsigned int stack_len;
unsigned long stack_entries[MAX_STACK];
 #endif
 };
@@ -232,11 +232,7 @@ static DEFINE_MUTEX(dm_bufio_clients_loc
 #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
 static void buffer_record_stack(struct dm_buffer *b)
 {
-   b->stack_trace.nr_entries = 0;
-   b->stack_trace.max_entries = MAX_STACK;
-   b->stack_trace.entries = b->stack_entries;
-   b->stack_trace.skip = 2;
-   save_stack_trace(&b->stack_trace);
+   b->stack_len = stack_trace_save(b->stack_entries, MAX_STACK, 2);
 }
 #endif
 
@@ -438,7 +434,7 @@ static struct dm_buffer *alloc_buffer(st
adjust_total_allocated(b->data_mode, (long)c->block_size);
 
 #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
-   memset(&b->stack_trace, 0, sizeof(b->stack_trace));
+   b->stack_len = 0;
 #endif
return b;
 }
@@ -1520,8 +1516,9 @@ static void drop_buffers(struct dm_bufio
DMERR("leaked buffer %llx, hold count %u, list %d",
  (unsigned long long)b->block, b->hold_count, i);
 #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING
-   print_stack_trace(&b->stack_trace, 1);
-   b->hold_count = 0; /* mark unclaimed to avoid BUG_ON below */
+   stack_trace_print(b->stack_entries, b->stack_len, 1);
+   /* mark unclaimed to avoid BUG_ON below */
+   b->hold_count = 0;
 #endif
}
 




[patch V2 08/29] mm/kmemleak: Simplify stacktrace handling

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace by using the storage
array based interfaces.

Signed-off-by: Thomas Gleixner 
Cc: Catalin Marinas 
Cc: linux...@kvack.org
---
 mm/kmemleak.c |   24 +++-
 1 file changed, 3 insertions(+), 21 deletions(-)

--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -410,11 +410,6 @@ static void print_unreferenced(struct se
  */
 static void dump_object_info(struct kmemleak_object *object)
 {
-   struct stack_trace trace;
-
-   trace.nr_entries = object->trace_len;
-   trace.entries = object->trace;
-
pr_notice("Object 0x%08lx (size %zu):\n",
  object->pointer, object->size);
pr_notice("  comm \"%s\", pid %d, jiffies %lu\n",
@@ -424,7 +419,7 @@ static void dump_object_info(struct kmem
pr_notice("  flags = 0x%x\n", object->flags);
pr_notice("  checksum = %u\n", object->checksum);
pr_notice("  backtrace:\n");
-   print_stack_trace(&trace, 4);
+   stack_trace_print(object->trace, object->trace_len, 4);
 }
 
 /*
@@ -553,15 +548,7 @@ static struct kmemleak_object *find_and_
  */
 static int __save_stack_trace(unsigned long *trace)
 {
-   struct stack_trace stack_trace;
-
-   stack_trace.max_entries = MAX_TRACE;
-   stack_trace.nr_entries = 0;
-   stack_trace.entries = trace;
-   stack_trace.skip = 2;
-   save_stack_trace(&stack_trace);
-
-   return stack_trace.nr_entries;
+   return stack_trace_save(trace, MAX_TRACE, 2);
 }
 
 /*
@@ -2019,13 +2006,8 @@ early_param("kmemleak", kmemleak_boot_co
 
 static void __init print_log_trace(struct early_log *log)
 {
-   struct stack_trace trace;
-
-   trace.nr_entries = log->trace_len;
-   trace.entries = log->trace;
-
pr_notice("Early log backtrace:\n");
-   print_stack_trace(&trace, 2);
+   stack_trace_print(log->trace, log->trace_len, 2);
 }
 
 /*




[patch V2 05/29] proc: Simplify task stack retrieval

2019-04-18 Thread Thomas Gleixner
Replace the indirection through struct stack_trace with an invocation of
the storage array based interface.

Signed-off-by: Thomas Gleixner 
Reviewed-by: Alexey Dobriyan 
Cc: Andrew Morton 
---
 fs/proc/base.c |   14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -407,7 +407,6 @@ static void unlock_trace(struct task_str
 static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
  struct pid *pid, struct task_struct *task)
 {
-   struct stack_trace trace;
unsigned long *entries;
int err;
 
@@ -430,20 +429,17 @@ static int proc_pid_stack(struct seq_fil
if (!entries)
return -ENOMEM;
 
-   trace.nr_entries= 0;
-   trace.max_entries   = MAX_STACK_TRACE_DEPTH;
-   trace.entries   = entries;
-   trace.skip  = 0;
-
err = lock_trace(task);
if (!err) {
-   unsigned int i;
+   unsigned int i, nr_entries;
 
-   save_stack_trace_tsk(task, &trace);
+   nr_entries = stack_trace_save_tsk(task, entries,
+ MAX_STACK_TRACE_DEPTH, 0);
 
-   for (i = 0; i < trace.nr_entries; i++) {
+   for (i = 0; i < nr_entries; i++) {
seq_printf(m, "[<0>] %pB\n", (void *)entries[i]);
}
+
unlock_trace(task);
}
kfree(entries);




[patch V2 01/29] tracing: Cleanup stack trace code

2019-04-18 Thread Thomas Gleixner
- Remove the extra array member of stack_dump_trace[]. It's not required as
  the stack tracer stores at max array size - 1 entries so there is still
  an empty slot.

- Make variables which are only used in trace_stack.c static.

- Simplify the enable/disable logic.

- Rename stack_trace_print() as it's using the stack_trace_ namespace. Free
  the name up for stack trace related functions.

Signed-off-by: Thomas Gleixner 
Cc: Steven Rostedt 
---
V2: Add more cleanups and use print_max_stack() as requested by Steven.
---
 include/linux/ftrace.h |   18 --
 kernel/trace/trace_stack.c |   36 
 2 files changed, 16 insertions(+), 38 deletions(-)

--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -241,21 +241,11 @@ static inline void ftrace_free_mem(struc
 
 #ifdef CONFIG_STACK_TRACER
 
-#define STACK_TRACE_ENTRIES 500
-
-struct stack_trace;
-
-extern unsigned stack_trace_index[];
-extern struct stack_trace stack_trace_max;
-extern unsigned long stack_trace_max_size;
-extern arch_spinlock_t stack_trace_max_lock;
-
 extern int stack_tracer_enabled;
-void stack_trace_print(void);
-int
-stack_trace_sysctl(struct ctl_table *table, int write,
-  void __user *buffer, size_t *lenp,
-  loff_t *ppos);
+
+int stack_trace_sysctl(struct ctl_table *table, int write,
+  void __user *buffer, size_t *lenp,
+  loff_t *ppos);
 
 /* DO NOT MODIFY THIS VARIABLE DIRECTLY! */
 DECLARE_PER_CPU(int, disable_stack_tracer);
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -18,8 +18,10 @@
 
 #include "trace.h"
 
-static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES + 1];
-unsigned stack_trace_index[STACK_TRACE_ENTRIES];
+#define STACK_TRACE_ENTRIES 500
+
+static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES];
+static unsigned stack_trace_index[STACK_TRACE_ENTRIES];
 
 /*
  * Reserve one entry for the passed in ip. This will allow
@@ -31,17 +33,16 @@ struct stack_trace stack_trace_max = {
.entries	= &stack_dump_trace[0],
 };
 
-unsigned long stack_trace_max_size;
-arch_spinlock_t stack_trace_max_lock =
+static unsigned long stack_trace_max_size;
+static arch_spinlock_t stack_trace_max_lock =
(arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
 
 DEFINE_PER_CPU(int, disable_stack_tracer);
 static DEFINE_MUTEX(stack_sysctl_mutex);
 
 int stack_tracer_enabled;
-static int last_stack_tracer_enabled;
 
-void stack_trace_print(void)
+static void print_max_stack(void)
 {
long i;
int size;
@@ -61,16 +62,7 @@ void stack_trace_print(void)
}
 }
 
-/*
- * When arch-specific code overrides this function, the following
- * data should be filled up, assuming stack_trace_max_lock is held to
- * prevent concurrent updates.
- * stack_trace_index[]
- * stack_trace_max
- * stack_trace_max_size
- */
-void __weak
-check_stack(unsigned long ip, unsigned long *stack)
+static void check_stack(unsigned long ip, unsigned long *stack)
 {
unsigned long this_size, flags; unsigned long *p, *top, *start;
static int tracer_frame;
@@ -179,7 +171,7 @@ check_stack(unsigned long ip, unsigned l
stack_trace_max.nr_entries = x;
 
if (task_stack_end_corrupted(current)) {
-   stack_trace_print();
+   print_max_stack();
BUG();
}
 
@@ -412,23 +404,20 @@ stack_trace_sysctl(struct ctl_table *tab
   void __user *buffer, size_t *lenp,
   loff_t *ppos)
 {
-   int ret;
+   int ret, was_enabled;
 
mutex_lock(&stack_sysctl_mutex);
+   was_enabled = !!stack_tracer_enabled;
 
ret = proc_dointvec(table, write, buffer, lenp, ppos);
 
-   if (ret || !write ||
-   (last_stack_tracer_enabled == !!stack_tracer_enabled))
+   if (ret || !write || (was_enabled == !!stack_tracer_enabled))
goto out;
 
-   last_stack_tracer_enabled = !!stack_tracer_enabled;
-
if (stack_tracer_enabled)
register_ftrace_function(&trace_ops);
else
unregister_ftrace_function(&trace_ops);
-
  out:
mutex_unlock(&stack_sysctl_mutex);
return ret;
@@ -444,7 +433,6 @@ static __init int enable_stacktrace(char
strncpy(stack_trace_filter_buf, str + len, COMMAND_LINE_SIZE);
 
stack_tracer_enabled = 1;
-   last_stack_tracer_enabled = 1;
return 1;
 }
 __setup("stacktrace", enable_stacktrace);




[patch V2 02/29] stacktrace: Provide helpers for common stack trace operations

2019-04-18 Thread Thomas Gleixner
All operations with stack traces are based on struct stack_trace. That's a
horrible construct as the struct is a kitchen sink for input and
output. Quite some usage sites embed it into their own data structures
which creates weird indirections.

There is absolutely no point in doing so. For all use cases a storage array
and the number of valid stack trace entries in the array are sufficient.

Provide helper functions which avoid the struct stack_trace indirection so
the usage sites can be cleaned up.
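
For example, a debug dump that previously had to construct a struct
stack_trace can work directly on the bare array; a sketch only, the
dump_current_stack() helper is not part of this patch:

#include <linux/kernel.h>
#include <linux/printk.h>
#include <linux/stacktrace.h>

static void dump_current_stack(void)
{
	unsigned long entries[32];
	unsigned int nr;

	/* Save up to 32 entries, skipping this function's own frame. */
	nr = stack_trace_save(entries, ARRAY_SIZE(entries), 1);

	pr_info("call chain:\n");
	/* Print each saved entry with two leading spaces. */
	stack_trace_print(entries, nr, 2);
}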

Signed-off-by: Thomas Gleixner 
---
 include/linux/stacktrace.h |   27 +++
 kernel/stacktrace.c|  160 -
 2 files changed, 172 insertions(+), 15 deletions(-)

--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -3,11 +3,26 @@
 #define __LINUX_STACKTRACE_H
 
 #include <linux/types.h>
+#include <asm/errno.h>
 
 struct task_struct;
 struct pt_regs;
 
 #ifdef CONFIG_STACKTRACE
+void stack_trace_print(unsigned long *trace, unsigned int nr_entries,
+  int spaces);
+int stack_trace_snprint(char *buf, size_t size, unsigned long *entries,
+   unsigned int nr_entries, int spaces);
+unsigned int stack_trace_save(unsigned long *store, unsigned int size,
+ unsigned int skipnr);
+unsigned int stack_trace_save_tsk(struct task_struct *task,
+ unsigned long *store, unsigned int size,
+ unsigned int skipnr);
+unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
+  unsigned int size, unsigned int skipnr);
+unsigned int stack_trace_save_user(unsigned long *store, unsigned int size);
+
+/* Internal interfaces. Do not use in generic code */
 struct stack_trace {
unsigned int nr_entries, max_entries;
unsigned long *entries;
@@ -41,4 +56,16 @@ extern void save_stack_trace_user(struct
 # define save_stack_trace_tsk_reliable(tsk, trace) ({ -ENOSYS; })
 #endif /* CONFIG_STACKTRACE */
 
+#if defined(CONFIG_STACKTRACE) && defined(CONFIG_HAVE_RELIABLE_STACKTRACE)
+int stack_trace_save_tsk_reliable(struct task_struct *tsk, unsigned long *store,
+ unsigned int size);
+#else
+static inline int stack_trace_save_tsk_reliable(struct task_struct *tsk,
+   unsigned long *store,
+   unsigned int size)
+{
+   return -ENOSYS;
+}
+#endif
+
 #endif /* __LINUX_STACKTRACE_H */
--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@@ -11,35 +11,52 @@
 #include 
 #include 
 
-void print_stack_trace(struct stack_trace *trace, int spaces)
+/**
+ * stack_trace_print - Print the entries in the stack trace
+ * @entries:   Pointer to storage array
+ * @nr_entries:Number of entries in the storage array
+ * @spaces:Number of leading spaces to print
+ */
+void stack_trace_print(unsigned long *entries, unsigned int nr_entries,
+  int spaces)
 {
-   int i;
+   unsigned int i;
 
-   if (WARN_ON(!trace->entries))
+   if (WARN_ON(!entries))
return;
 
-   for (i = 0; i < trace->nr_entries; i++)
-   printk("%*c%pS\n", 1 + spaces, ' ', (void *)trace->entries[i]);
+   for (i = 0; i < nr_entries; i++)
+   printk("%*c%pS\n", 1 + spaces, ' ', (void *)entries[i]);
+}
+EXPORT_SYMBOL_GPL(stack_trace_print);
+
+void print_stack_trace(struct stack_trace *trace, int spaces)
+{
+   stack_trace_print(trace->entries, trace->nr_entries, spaces);
 }
 EXPORT_SYMBOL_GPL(print_stack_trace);
 
-int snprint_stack_trace(char *buf, size_t size,
-   struct stack_trace *trace, int spaces)
+/**
+ * stack_trace_snprint - Print the entries in the stack trace into a buffer
+ * @buf:   Pointer to the print buffer
+ * @size:  Size of the print buffer
+ * @entries:   Pointer to storage array
+ * @nr_entries:Number of entries in the storage array
+ * @spaces:Number of leading spaces to print
+ */
+int stack_trace_snprint(char *buf, size_t size, unsigned long *entries,
+   unsigned int nr_entries, int spaces)
 {
-   int i;
-   int generated;
-   int total = 0;
+   unsigned int generated, i, total = 0;
 
-   if (WARN_ON(!trace->entries))
+   if (WARN_ON(!entries))
return 0;
 
-   for (i = 0; i < trace->nr_entries; i++) {
+   for (i = 0; i < nr_entries; i++) {
generated = snprintf(buf, size, "%*c%pS\n", 1 + spaces, ' ',
-(void *)trace->entries[i]);
+(void *)entries[i]);
 
total += generated;
-
-   /* Assume that generated isn't a negative number */
if (generated >= size) {
buf += size;
size = 0;
@@ -51,6 +68,14 @@ int snprint_stack_trace(char *buf, size_
 

[patch V2 03/29] lib/stackdepot: Provide functions which operate on plain storage arrays

2019-04-18 Thread Thomas Gleixner
The struct stack_trace indirection in the stack depot functions is a truly
pointless exercise which requires horrible code at the callsites.

Provide interfaces based on plain storage arrays.
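
The expected calling convention looks roughly like this (a sketch; the
record_to_depot()/print_from_depot() helpers are made up for illustration):

#include <linux/gfp.h>
#include <linux/kernel.h>
#include <linux/stackdepot.h>
#include <linux/stacktrace.h>

static depot_stack_handle_t record_to_depot(void)
{
	unsigned long entries[16];
	unsigned int nr;

	/* Capture the current stack into a plain storage array ... */
	nr = stack_trace_save(entries, ARRAY_SIZE(entries), 1);

	/* ... and deduplicate/store it in the depot. */
	return stack_depot_save(entries, nr, GFP_NOWAIT);
}

static void print_from_depot(depot_stack_handle_t handle)
{
	unsigned long *entries;
	unsigned int nr;

	/* The depot hands back a pointer into its own storage. */
	nr = stack_depot_fetch(handle, &entries);
	stack_trace_print(entries, nr, 0);
}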

Signed-off-by: Thomas Gleixner 
Acked-by: Alexander Potapenko 
---
 include/linux/stackdepot.h |4 ++
 lib/stackdepot.c   |   66 -
 2 files changed, 51 insertions(+), 19 deletions(-)

--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -26,7 +26,11 @@ typedef u32 depot_stack_handle_t;
 struct stack_trace;
 
 depot_stack_handle_t depot_save_stack(struct stack_trace *trace, gfp_t flags);
+depot_stack_handle_t stack_depot_save(unsigned long *entries,
+ unsigned int nr_entries, gfp_t gfp_flags);
 
 void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace);
+unsigned int stack_depot_fetch(depot_stack_handle_t handle,
+  unsigned long **entries);
 
 #endif
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -194,40 +194,56 @@ static inline struct stack_record *find_
return NULL;
 }
 
-void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace)
+/**
+ * stack_depot_fetch - Fetch stack entries from a depot
+ *
+ * @entries:   Pointer to store the entries address
+ */
+unsigned int stack_depot_fetch(depot_stack_handle_t handle,
+  unsigned long **entries)
 {
union handle_parts parts = { .handle = handle };
void *slab = stack_slabs[parts.slabindex];
size_t offset = parts.offset << STACK_ALLOC_ALIGN;
struct stack_record *stack = slab + offset;
 
-   trace->nr_entries = trace->max_entries = stack->size;
-   trace->entries = stack->entries;
-   trace->skip = 0;
+   *entries = stack->entries;
+   return stack->size;
+}
+EXPORT_SYMBOL_GPL(stack_depot_fetch);
+
+void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace)
+{
+   unsigned int nent = stack_depot_fetch(handle, &trace->entries);
+
+   trace->max_entries = trace->nr_entries = nent;
 }
 EXPORT_SYMBOL_GPL(depot_fetch_stack);
 
 /**
- * depot_save_stack - save stack in a stack depot.
- * @trace - the stacktrace to save.
- * @alloc_flags - flags for allocating additional memory if required.
+ * stack_depot_save - Save a stack trace from an array
  *
- * Returns the handle of the stack struct stored in depot.
+ * @entries:   Pointer to storage array
+ * @nr_entries:Size of the storage array
+ * @alloc_flags:   Allocation gfp flags
+ *
+ * Returns the handle of the stack struct stored in depot
  */
-depot_stack_handle_t depot_save_stack(struct stack_trace *trace,
-   gfp_t alloc_flags)
+depot_stack_handle_t stack_depot_save(unsigned long *entries,
+ unsigned int nr_entries,
+ gfp_t alloc_flags)
 {
-   u32 hash;
-   depot_stack_handle_t retval = 0;
struct stack_record *found = NULL, **bucket;
-   unsigned long flags;
+   depot_stack_handle_t retval = 0;
struct page *page = NULL;
void *prealloc = NULL;
+   unsigned long flags;
+   u32 hash;
 
-   if (unlikely(trace->nr_entries == 0))
+   if (unlikely(nr_entries == 0))
goto fast_exit;
 
-   hash = hash_stack(trace->entries, trace->nr_entries);
+   hash = hash_stack(entries, nr_entries);
bucket = &stack_table[hash & STACK_HASH_MASK];
 
/*
@@ -235,8 +251,8 @@ depot_stack_handle_t depot_save_stack(st
 * The smp_load_acquire() here pairs with smp_store_release() to
 * |bucket| below.
 */
-   found = find_stack(smp_load_acquire(bucket), trace->entries,
-  trace->nr_entries, hash);
+   found = find_stack(smp_load_acquire(bucket), entries,
+  nr_entries, hash);
if (found)
goto exit;
 
@@ -264,10 +280,10 @@ depot_stack_handle_t depot_save_stack(st
 
spin_lock_irqsave(&depot_lock, flags);
 
-   found = find_stack(*bucket, trace->entries, trace->nr_entries, hash);
+   found = find_stack(*bucket, entries, nr_entries, hash);
if (!found) {
struct stack_record *new =
-   depot_alloc_stack(trace->entries, trace->nr_entries,
+   depot_alloc_stack(entries, nr_entries,
  hash, &prealloc, alloc_flags);
if (new) {
new->next = *bucket;
@@ -297,4 +313,16 @@ depot_stack_handle_t depot_save_stack(st
 fast_exit:
return retval;
 }
+EXPORT_SYMBOL_GPL(stack_depot_save);
+
+/**
+ * depot_save_stack - save stack in a stack depot.
+ * @trace - the stacktrace to save.
+ * @alloc_flags - flags for allocating additional memory if required.
+ */
+depot_stack_handle_t depot_save_stack(struct stack_trace *trace,

Re: [RFC PATCH v9 03/13] mm: Add support for eXclusive Page Frame Ownership (XPFO)

2019-04-18 Thread Thomas Gleixner
On Wed, 17 Apr 2019, Linus Torvalds wrote:
> On Wed, Apr 17, 2019 at 4:42 PM Thomas Gleixner  wrote:
> > On Wed, 17 Apr 2019, Linus Torvalds wrote:
> > > With SMEP, user space pages are always NX.
> >
> > We talk past each other. The user space page in the ring3 valid virtual
> > address space (non negative) is of course protected by SMEP.
> >
> > The attack utilizes the kernel linear mapping of the physical
> > memory. I.e. user space address 0x43210 has a kernel equivalent at
> > 0xfxx. So if the attack manages to trick the kernel to that valid
> > kernel address and that is mapped X --> game over. SMEP does not help
> > there.
> 
> Oh, agreed.
> 
> But that would simply be a kernel bug. We should only map kernel pages
> executable when we have kernel code in them, and we should certainly
> not allow those pages to be mapped writably in user space.
> 
> That kind of "executable in kernel, writable in user" would be a
> horrendous and major bug.
> 
> So i think it's a non-issue.

Pretty much.

> > From the top of my head I'd say this is a non issue as those kernel address
> > space mappings _should_ be NX, but we got bitten by _should_ in the past:)
> 
> I do agree that bugs can happen, obviously, and we might have missed 
> something.
>
> But in the context of XPFO, I would argue (*very* strongly) that the
> likelihood of the above kind of bug is absolutely *miniscule* compared
> to the likelihood that we'd have something wrong in the software
> implementation of XPFO.
> 
> So if the argument is "we might have bugs in software", then I think
> that's an argument _against_ XPFO rather than for it.

No argument from my side. We better spend time to make sure that a bogus
kernel side X mapping is caught, like we catch other things.

Thanks,

tglx
