Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-04-19 Thread Michael S. Tsirkin
On Wed, Apr 17, 2019 at 06:42:00PM -0300, Thiago Jung Bauermann wrote:
> 
> Michael S. Tsirkin  writes:
> 
> > On Thu, Mar 21, 2019 at 09:05:04PM -0300, Thiago Jung Bauermann wrote:
> >>
> >> Michael S. Tsirkin  writes:
> >>
> >> > On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
> >> >> From what I understand of the ACCESS_PLATFORM definition, the host will
> >> >> only ever try to access memory addresses that are supplied to it by the
> >> >> guest, so all of the secure guest memory that the host cares about is
> >> >> accessible:
> >> >>
> >> >> If this feature bit is set to 0, then the device has same access to
> >> >> memory addresses supplied to it as the driver has. In particular,
> >> >> the device will always use physical addresses matching addresses
> >> >> used by the driver (typically meaning physical addresses used by the
> >> >> CPU) and not translated further, and can access any address supplied
> >> >> to it by the driver. When clear, this overrides any
> >> >> platform-specific description of whether device access is limited or
> >> >> translated in any way, e.g. whether an IOMMU may be present.
> >> >>
> >> >> All of the above is true for POWER guests, whether they are secure
> >> >> guests or not.
> >> >>
> >> >> Or are you saying that a virtio device may want to access memory
> >> >> addresses that weren't supplied to it by the driver?
> >> >
> >> > Your logic would apply to IOMMUs as well.  For your mode, there are
> >> > specific encrypted memory regions that the driver has access to but the
> >> > device does not. That seems to violate the constraint.
> >>
> >> Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
> >> the device can ignore the IOMMU for all practical purposes I would
> >> indeed say that the logic would apply to IOMMUs as well. :-)
> >>
> >> I guess I'm still struggling with the purpose of signalling to the
> >> driver that the host may not have access to memory addresses that it
> >> will never try to access.
> >
> > For example, one of the benefits is to signal to the host that the driver
> > does not expect the ability to access all memory. If it does, the host can
> > fail initialization gracefully.
> 
> But why would the ability to access all memory be necessary or even
> useful? When would the host access memory that the driver didn't tell it
> to access?

When I say all memory I mean even memory not allowed by the IOMMU.


> >> >> >> > But the name "sev_active" makes me scared because at least AMD
> >> >> >> > guys who were doing the sensible thing and setting ACCESS_PLATFORM
> >> >> >>
> >> >> >> My understanding is, AMD guest-platform knows in advance that their
> >> >> >> guest will run in secure mode and hence sets the flag at the time of
> >> >> >> VM instantiation. Unfortunately we don't have that luxury on our
> >> >> >> platforms.
> >> >> >
> >> >> > Well you do have that luxury. It looks like there are existing
> >> >> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
> >> >> > with how slow that path is. So you are trying to optimize for
> >> >> > them by clearing ACCESS_PLATFORM, and then you have lost the ability
> >> >> > to invoke the DMA API.
> >> >> >
> >> >> > For example if there was another flag just like ACCESS_PLATFORM
> >> >> > just not yet used by anyone, you would be all fine using that right?
> >> >>
> >> >> Yes, a new flag sounds like a great idea. What about the definition
> >> >> below?
> >> >>
> >> >> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
> >> >> VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
> >> >> exception that the IOMMU is explicitly defined to be off or bypassed
> >> >> when accessing memory addresses supplied to the device by the
> >> >> driver. This flag should be set by the guest if offered, but to
> >> >> allow for backward compatibility, device implementations allow for it
> >> >> to be left unset by the guest. It is an error to set both this flag
> >> >> and VIRTIO_F_ACCESS_PLATFORM.
> >> >
> >> > It looks kind of narrow but it's an option.
> >>
> >> Great!
> >>
> >> > I wonder how we'll define what's an iommu though.
> >>
> >> Hm, it didn't occur to me it could be an issue. I'll try.
> 
> I rephrased it in terms of address translation. What do you think of
> this version? The flag name is slightly different too:
> 
> 
> VIRTIO_F_ACCESS_PLATFORM_NO_TRANSLATION This feature has the same
> meaning as VIRTIO_F_ACCESS_PLATFORM both when set and when not set,
> with the exception that address translation is guaranteed to be
> unnecessary when accessing memory addresses supplied to the device
> by the driver. Which is to say, the device will always use physical
> addresses matching addresses used by the driver (typically meaning
> physical addresses used by the CPU) and not translated further. This
> flag should be set by the guest if offered.
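
For context, the mechanical change being debated reduces to a small check in
virtio_ring. A minimal sketch, assuming the helper names mentioned in the
thread (virtio_has_iommu_quirk(), sev_active()); this illustrates the RFC's
intent and is not the literal patch:

/* Sketch: use the DMA API (and thus swiotlb bouncing) when guest memory
 * is encrypted, since the host can then only touch memory the driver
 * explicitly shares with it. */
static bool vring_use_dma_api(struct virtio_device *vdev)
{
	/* VIRTIO_F_ACCESS_PLATFORM negotiated: always use the DMA API */
	if (!virtio_has_iommu_quirk(vdev))
		return true;

	/* New in the RFC: guest memory is encrypted */
	if (sev_active())
		return true;

	/* Xen grant mappings also need the DMA API */
	if (xen_domain())
		return true;

	return false;
}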

Re: [patch V2 23/29] tracing: Simplify stack trace retrieval

2019-04-19 Thread Steven Rostedt
On Thu, 18 Apr 2019 10:41:42 +0200
Thomas Gleixner  wrote:

> Replace the indirection through struct stack_trace by using the storage
> array based interfaces.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Steven Rostedt 

Reviewed-by: Steven Rostedt (VMware) 

-- Steve


Re: [patch V2 24/29] tracing: Remove the last struct stack_trace usage

2019-04-19 Thread Steven Rostedt
On Thu, 18 Apr 2019 10:41:43 +0200
Thomas Gleixner  wrote:

> Simplify the stack retrieval code by using the storage array based
> interface.
> 
> Signed-off-by: Thomas Gleixner 


Reviewed-by: Steven Rostedt (VMware) 

-- Steve


Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Josh Poimboeuf
On Fri, Apr 19, 2019 at 11:07:17AM +0200, Peter Zijlstra wrote:
> On Fri, Apr 19, 2019 at 10:32:30AM +0200, Thomas Gleixner wrote:
> > On Fri, 19 Apr 2019, Peter Zijlstra wrote:
> > > On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> > > 
> > > > +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> > > > +					bool reliable);
> > > 
> > > > +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> > > > +		     struct task_struct *task, struct pt_regs *regs);
> > > > +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie,
> > > > +			     struct task_struct *task);
> > > 
> > > This bugs me a little; ideally the _reliable() thing would not exists.
> > > 
> > > Thomas said that the existing __save_stack_trace_reliable() is different
> > > enough for the unification to be non-trivial, but maybe Josh can help
> > > out?
> > > 
> > > From what I can see the biggest significant differences are:
> > > 
> > >  - it looks at the regs sets on the stack and for FP bails early
> > >  - bails for kthreads and idle (after it does all the hard work!?!)

That's done for a reason, see the "Success path" comments.

> > > The first (FP checking for exceptions) should probably be reflected in
> > > consume_fn(.reliable) anyway -- although that would mean a lot of extra
> > > '?' entries where there are none today.
> > > 
> > > And the second (KTHREAD/IDLE) is something that the generic code can
> > > easily do before calling into the arch unwinder.
> > 
> > And looking at the powerpc version of it, that has even more interesting
> > extra checks in that function.
> 
> Right, but not fundamentally different from determining @reliable I
> think.
> 
> Anyway, it would be good if someone knowledgable could have a look at
> this.

Yeah, we could probably do that.

The flow would need to be changed a bit -- some of the errors are soft
errors which most users don't care about because they just want a best
effort.  The soft errors can be remembered without breaking out of the
loop, and just returned at the end.  Most users could just ignore the
return code.

The only thing I'd be worried about is performance for the non-livepatch
users, but I guess those checks don't look very expensive.  And the x86
unwinders are already pretty slow anyway...

-- 
Josh
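
To make the soft-error flow concrete, the generic wrapper could take roughly
the shape below. A hypothetical sketch (struct stacktrace_cookie and
stack_trace_consume_entry are assumed names loosely following the series):
the kthread/idle filtering is hoisted into generic code as Peter suggests,
and soft errors are returned instead of aborting the walk.

struct stacktrace_cookie {
	unsigned long	*store;
	unsigned int	size;
	unsigned int	len;
};

int stack_trace_save_tsk_reliable(struct task_struct *tsk,
				  unsigned long *store, unsigned int size)
{
	struct stacktrace_cookie c = { .store = store, .size = size };
	int ret;

	/* Generic-side filtering: bail for kthreads and idle tasks up
	 * front, instead of after the arch unwinder has done the work. */
	if (tsk->flags & PF_KTHREAD || is_idle_task(tsk))
		return -EINVAL;

	ret = arch_stack_walk_reliable(stack_trace_consume_entry, &c, tsk);

	/* Soft errors come back in ret; best-effort callers can ignore a
	 * negative value and still use the c.len entries in the storage. */
	return ret ? ret : c.len;
}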


Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Josh Poimboeuf
On Fri, Apr 19, 2019 at 09:02:11AM +0200, Peter Zijlstra wrote:
> On Thu, Apr 18, 2019 at 05:42:55PM +0200, Thomas Gleixner wrote:
> > On Thu, 18 Apr 2019, Josh Poimboeuf wrote:
> 
> > > Another idea I had (but never got a chance to work on) was to extend the
> > > x86 unwind interface to all arches.  So instead of the callbacks, each
> > > arch would implement something like this API:
> 
> > I surely thought about that, but after staring at all incarnations of
> > arch/*/stacktrace.c I just gave up.
> > 
> > Aside from that, quite some archs already have callback-based unwinders
> > because they use them for more than stacktracing and just have a single
> > implementation of that loop.
> > 
> > I'm fine either way. We can start with x86 and then let archs convert over
> > their stuff, but I wouldn't hold my breath that this will be completed in
> > the foreseeable future.
> 
> I suggested the same to Thomas early on, and I even spent the time to
> convert some $random arch to the iterator interface, and while it is
> indeed entirely feasible, it is _far_ more work.
> 
> The callback thing OTOH is flexible enough to do what we want to do now,
> and allows converting most archs to it without too much pain (as Thomas
> said, many archs are already in this form and only need minor API
> adjustments), which gets us in a far better place than we are now.
> 
> And we can always go to iterators later on. But I think getting the
> generic unwinder improved across all archs is a really important first
> step here.

Fair enough.

-- 
Josh


Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled

2019-04-19 Thread Leizhen (ThunderTown)

On 2019/4/17 9:39, Leizhen (ThunderTown) wrote:
> 
> 
> On 2019/4/16 17:14, Will Deacon wrote:
>> On Mon, Apr 08, 2019 at 10:31:47AM +0800, Leizhen (ThunderTown) wrote:
>>> On 2019/4/4 23:30, Will Deacon wrote:
 On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote:
> v1 --> v2:
> 1. Drop part2. Now, we only use the SMMUv3 hardware feature STE.config=0b000
> (Report abort to device, no event recorded) to suppress the event messages
> caused by the unexpected devices.
> 2. Rewrite the patch description.

 This issue came up a while back:

 https://lore.kernel.org/linux-pci/20180302103032.gb19...@arm.com/

 and I'd still prefer to solve it using the disable_bypass logic which we
 already have. Something along the lines of the diff below?
>>>
>>> Yes, my patches also use disable_bypass=1 (set ste.config=0b000). If
>>> SMMU_IDR0.ST_LEVEL=0 (Linear Stream Table supported), then all STE entries
>>> are allocated and initialized (set ste.config=0b000). But if
>>> SMMU_IDR0.ST_LEVEL=1 (2-level Stream Table), we only allocate and
>>> initialize the first-level tables, and leave the level-2 tables to be
>>> allocated dynamically. That means C_BAD_STREAMID (eventid=0x2) will be
>>> reported if an unexpected device accesses memory without being
>>> reinitialized in the kdump kernel.
>>
>> So is your problem just that the C_BAD_STREAMID events are noisy? If so,
>> perhaps we should be disabling fault reporting entirely in the kdump kernel.
>>
>> How about the update diff below? I'm keen to have this as simple as
>> possible, so we don't end up introducing rarely tested, complex code on
>> the crash path.
> In theory, it can solve the problem; let me test it.
Hi Will,
  I have tested your patch on my board today. It works well.

> 
> But then again, the patch below will also disable fault reporting from the
> expected devices which are used in the kdump kernel. In fact, my patches
> have been merged into our internal version for more than 2 months, and no
> bugs have been found yet.
> 
> However, my patches do not support the case where the hardware does not
> support the "STE bypass" feature; I think your patch can also resolve that.
> 
>>
>> Will
>>
>> --->8
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index d3880010c6cf..d8b73da6447d 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>  /* Clear CR0 and sync (disables SMMU and queue processing) */
>>  reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
>>  if (reg & CR0_SMMUEN) {
>> -if (is_kdump_kernel()) {
>> -arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
>> -arm_smmu_device_disable(smmu);
>> -return -EBUSY;
>> -}
>> -
>>  dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>> +WARN_ON(is_kdump_kernel() && !disable_bypass);
>> +arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
>>  }
>>  
>>  ret = arm_smmu_device_disable(smmu);
>> @@ -2553,6 +2549,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>  return ret;
>>  }
>>  
>> +if (is_kdump_kernel())
>> +enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
>>  
>>  /* Enable the SMMU interface, or ensure bypass */
>>  if (!bypass || disable_bypass) {
>>
> 

-- 
Thanks!
Best Regards



RE: [PATCH 02/13] soc/fsl/bman: map FBPR area in the iommu

2019-04-19 Thread Laurentiu Tudor
Hi Robin,

> -Original Message-
> From: Robin Murphy 
> Sent: Friday, March 29, 2019 4:51 PM
> 
> On 29/03/2019 14:00, laurentiu.tu...@nxp.com wrote:
> > From: Laurentiu Tudor 
> >
> > Add a one-to-one iommu mapping for bman private data memory (FBPR).
> > This is required for BMAN to work without faults behind an iommu.
> >
> > Signed-off-by: Laurentiu Tudor 
> > ---
> >   drivers/soc/fsl/qbman/bman_ccsr.c | 11 +++
> >   1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c b/drivers/soc/fsl/qbman/bman_ccsr.c
> > index 7c3cc968053c..b209c79511bb 100644
> > --- a/drivers/soc/fsl/qbman/bman_ccsr.c
> > +++ b/drivers/soc/fsl/qbman/bman_ccsr.c
> > @@ -29,6 +29,7 @@
> >*/
> >
> >   #include "bman_priv.h"
> > +#include <linux/iommu.h>
> >
> >   u16 bman_ip_rev;
> >   EXPORT_SYMBOL(bman_ip_rev);
> > @@ -178,6 +179,7 @@ static int fsl_bman_probe(struct platform_device *pdev)
> > int ret, err_irq;
> > struct device *dev = &pdev->dev;
> > struct device_node *node = dev->of_node;
> > +   struct iommu_domain *domain;
> > struct resource *res;
> > u16 id, bm_pool_cnt;
> > u8 major, minor;
> > @@ -225,6 +227,15 @@ static int fsl_bman_probe(struct platform_device *pdev)
> >
> > dev_dbg(dev, "Allocated FBPR 0x%llx 0x%zx\n", fbpr_a, fbpr_sz);
> >
> > +   /* Create a 1-to-1 iommu mapping for FBPR area */
> > +   domain = iommu_get_domain_for_dev(dev);
> 
> If that's expected to be the default domain that you're grabbing, then
> this is *incredibly* fragile. There's nothing to stop the IOVA that you
> forcibly map from being automatically allocated later and causing some
> other DMA mapping to fail noisily and unexpectedly. Furthermore, have
> you tried this with "iommu.passthrough=1"?
> 
> That said, I really don't understand what's going on here anyway :/
> 
> As far as I can tell from qbman_init_private_mem(), fbpr_a comes from
> dma_alloc_coherent() and thus would already be a mapped IOVA - isn't
> this the stuff that Roy converted to nicely use shared-dma-pool regions
> a while ago?
> 

Finally found some time to look into this, sorry for the delay. It seems that
on the code path taken in our case (dma_alloc_coherent() -> dma_alloc_attrs()
-> dma_alloc_from_dev_coherent() -> __dma_alloc_from_coherent()) there's no
call into the iommu layer, thus no mapping in the smmu. I plan to come up with
an RFC patch early next week so we have something concrete to discuss.

---
Best Regards, Laurentiu
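
For reference, the explicit mapping under discussion (which the planned RFC
will presumably rework) looks roughly like the sketch below, using the
generic IOMMU API with error handling elided. As Robin points out, forcing
an IOVA into the default DMA domain this way can later collide with
addresses the IOVA allocator hands out, so treat this as illustration only:

	/* Sketch: identity-map the FBPR area in the device's domain */
	domain = iommu_get_domain_for_dev(dev);
	if (domain) {
		ret = iommu_map(domain, fbpr_a, fbpr_a, fbpr_sz,
				IOMMU_READ | IOMMU_WRITE);
		if (ret)
			dev_warn(dev, "failed to iommu_map() FBPR: %d\n", ret);
	}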


Re: [patch V2 22/29] tracing: Make ftrace_trace_userstack() static and conditional

2019-04-19 Thread Steven Rostedt
On Thu, 18 Apr 2019 10:41:41 +0200
Thomas Gleixner  wrote:

> It's only used in trace.c and there is absolutely no point in compiling it
> in when user space stack traces are not supported.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Steven Rostedt 

Funny, these were moved out to global functions along with
ftrace_trace_stack(), but I guess they were never used.

This basically just does a partial revert of:

 c0a0d0d3f6528 ("tracing/core: Make the stack entry helpers global")


> ---
>  kernel/trace/trace.c |   14 --
>  kernel/trace/trace.h |8 
>  2 files changed, 8 insertions(+), 14 deletions(-)
> 
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -159,6 +159,8 @@ static union trace_eval_map_item *trace_
>  #endif /* CONFIG_TRACE_EVAL_MAP_FILE */
>  
>  static int tracing_set_tracer(struct trace_array *tr, const char *buf);
> +static void ftrace_trace_userstack(struct ring_buffer *buffer,
> +unsigned long flags, int pc);
>  
>  #define MAX_TRACER_SIZE  100
>  static char bootup_tracer_buf[MAX_TRACER_SIZE] __initdata;
> @@ -2905,9 +2907,10 @@ void trace_dump_stack(int skip)
>  }
>  EXPORT_SYMBOL_GPL(trace_dump_stack);
>  
> +#ifdef CONFIG_USER_STACKTRACE_SUPPORT
>  static DEFINE_PER_CPU(int, user_stack_count);
>  
> -void
> +static void
>  ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, int pc)
>  {
>   struct trace_event_call *call = &event_user_stack;
> @@ -2958,13 +2961,12 @@ ftrace_trace_userstack(struct ring_buffe
>   out:
>   preempt_enable();
>  }
> -
> -#ifdef UNUSED

Strange, I never knew about this ifdef. I would have nuked it when I
saw it.

Anyway,

Reviewed-by: Steven Rostedt (VMware) 

-- Steve


> -static void __trace_userstack(struct trace_array *tr, unsigned long flags)
> +#else /* CONFIG_USER_STACKTRACE_SUPPORT */
> +static void ftrace_trace_userstack(struct ring_buffer *buffer,
> +unsigned long flags, int pc)
>  {
> - ftrace_trace_userstack(tr, flags, preempt_count());
>  }
> -#endif /* UNUSED */
> +#endif /* !CONFIG_USER_STACKTRACE_SUPPORT */
>  
>  #endif /* CONFIG_STACKTRACE */
>  
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -782,17 +782,9 @@ void update_max_tr_single(struct trace_a
>  #endif /* CONFIG_TRACER_MAX_TRACE */
>  
>  #ifdef CONFIG_STACKTRACE
> -void ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags,
> - int pc);
> -
>  void __trace_stack(struct trace_array *tr, unsigned long flags, int skip,
>  int pc);
>  #else
> -static inline void ftrace_trace_userstack(struct ring_buffer *buffer,
> -   unsigned long flags, int pc)
> -{
> -}
> -
>  static inline void __trace_stack(struct trace_array *tr, unsigned long flags,
>int skip, int pc)
>  {
> 



Re: [PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers

2019-04-19 Thread Christoph Hellwig
On Thu, Apr 18, 2019 at 05:41:00PM +0100, Robin Murphy wrote:
>>  From a very high level POV this looks ok, but sometimes a bit too
>> convoluted to me.  The major issue why I went with the version I posted
>> is that I can cleanly ifdef out the remap code in just a few sections.
>> In this version it is spread out a lot more, and the use of IS_ENABLED
>> means that we'd need a lot more stubs for functionality that won't
>> ever be called but needs to be compilable.
>
> What functionality do you have planned in that regard? I did do a quick 
> build test of my arm64 config with DMA_DIRECT_REMAP hacked out, and 
> dma-iommu.o appeared to link OK (although other bits of arm64 and 
> dma-direct didn't, as expected). I will try x86 with IOMMU_DMA to make 
> sure, though.

Yeah, this seems to actually work, there just is a huge chunk of
remapping that is hopefully discarded by the compiler even without the
ifdefs.
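
The trade-off in miniature (a generic illustration, not taken from either
patch; remap_this() is a made-up stand-in):

	/* #ifdef: the gated code need not even compile when the option
	 * is off, so its callees need no stubs. */
#ifdef CONFIG_DMA_DIRECT_REMAP
	addr = remap_this(page, size);
#endif

	/* IS_ENABLED(): always compiled, with the dead branch discarded
	 * by constant folding -- but remap_this() must exist (at least
	 * as a stub) in every configuration. */
	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP))
		addr = remap_this(page, size);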


Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Peter Zijlstra
On Fri, Apr 19, 2019 at 10:32:30AM +0200, Thomas Gleixner wrote:
> On Fri, 19 Apr 2019, Peter Zijlstra wrote:
> > On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> > 
> > > +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> > > +  bool reliable);
> > 
> > > +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> > > +  struct task_struct *task, struct pt_regs *regs);
> > > +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie,
> > > +			     struct task_struct *task);
> > 
> > This bugs me a little; ideally the _reliable() thing would not exists.
> > 
> > Thomas said that the existing __save_stack_trace_reliable() is different
> > enough for the unification to be non-trivial, but maybe Josh can help
> > out?
> > 
> > From what I can see the biggest significant differences are:
> > 
> >  - it looks at the regs sets on the stack and for FP bails early
> >  - bails for kthreads and idle (after it does all the hard work!?!)
> > 
> > The first (FP checking for exceptions) should probably be reflected in
> > consume_fn(.reliable) anyway -- although that would mean a lot of extra
> > '?' entries where there are none today.
> > 
> > And the second (KTHREAD/IDLE) is something that the generic code can
> > easily do before calling into the arch unwinder.
> 
> And looking at the powerpc version of it, that has even more interesting
> extra checks in that function.

Right, but not fundamentally different from determining @reliable I
think.

Anyway, it would be good if someone knowledgable could have a look at
this.


Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Thomas Gleixner
On Fri, 19 Apr 2019, Peter Zijlstra wrote:
> On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:
> 
> > +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> > +  bool reliable);
> 
> > +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> > +struct task_struct *task, struct pt_regs *regs);
> > +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie,
> > +			     struct task_struct *task);
> 
> This bugs me a little; ideally the _reliable() thing would not exists.
> 
> Thomas said that the existing __save_stack_trace_reliable() is different
> enough for the unification to be non-trivial, but maybe Josh can help
> out?
> 
> From what I can see the biggest significant differences are:
> 
>  - it looks at the regs sets on the stack and for FP bails early
>  - bails for kthreads and idle (after it does all the hard work!?!)
> 
> The first (FP checking for exceptions) should probably be reflected in
> consume_fn(.reliable) anyway -- although that would mean a lot of extra
> '?' entries where there are none today.
> 
> And the second (KTHREAD/IDLE) is something that the generic code can
> easily do before calling into the arch unwinder.

And looking at the powerpc version of it, that has even more interesting
extra checks in that function.

Thanks,

tglx


Re: [PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers

2019-04-19 Thread Christoph Hellwig
On Thu, Apr 18, 2019 at 07:15:00PM +0100, Robin Murphy wrote:
> Still, I've worked in the vm_map_pages() stuff pending in MM and given them 
> the same treatment to finish the picture. Both x86_64_defconfig and 
> i386_defconfig do indeed compile and link fine as I expected, so I really 
> would like to understand the concern around #ifdefs better.

This looks generally fine to me.  One thing I'd like to do is to
generally make use of the fact that __iommu_dma_get_pages returns NULL
for the force-contiguous case, as that cleans up a few things.  Also,
for the !DMA_REMAP case we need to try the page allocator when
dma_alloc_from_contiguous does not return a page.  What do you think
of the following incremental diff?  If that is fine with you I can
fold that in, add back the remaining patches from my series
not obsoleted by your patches, and resend.


diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 1bc8d1de1a1d..50b44e220de3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -894,7 +894,7 @@ static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
 
 static void __iommu_dma_free(struct device *dev, void *cpu_addr, size_t size)
 {
-   struct page *page, **pages;
+   struct page *page = NULL;
int count = size >> PAGE_SHIFT;
 
/* Non-coherent atomic allocation? Easy */
@@ -902,24 +902,26 @@ static void __iommu_dma_free(struct device *dev, void *cpu_addr, size_t size)
dma_free_from_pool(cpu_addr, size))
return;
 
-   /* Lowmem means a coherent atomic or CMA allocation */
-   if (!IS_ENABLED(CONFIG_DMA_REMAP) || !is_vmalloc_addr(cpu_addr)) {
-   page = virt_to_page(cpu_addr);
-   if (!dma_release_from_contiguous(dev, page, count))
-   __free_pages(page, get_order(size));
-   return;
-   }
-   /*
-* If it's remapped, then it's either non-coherent or highmem CMA, or
-* an iommu_dma_alloc_remap() construction.
-*/
-   page = vmalloc_to_page(cpu_addr);
-   if (!dma_release_from_contiguous(dev, page, count)) {
-   pages = __iommu_dma_get_pages(cpu_addr);
+   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && is_vmalloc_addr(cpu_addr)) {
+   /*
+* If the address is remapped, then it's either non-coherent
+* or highmem CMA, or an iommu_dma_alloc_remap() construction.
+*/
+   struct page **pages = __iommu_dma_get_pages(cpu_addr);
+
if (pages)
__iommu_dma_free_pages(pages, count);
+   else
+   page = vmalloc_to_page(cpu_addr);
+
+   dma_common_free_remap(cpu_addr, size, VM_USERMAP);
+   } else {
+   /* Lowmem means a coherent atomic or CMA allocation */
+   page = virt_to_page(cpu_addr);
}
-   dma_common_free_remap(cpu_addr, size, VM_USERMAP);
+
+   if (page && !dma_release_from_contiguous(dev, page, count))
+   __free_pages(page, get_order(size));
 }
 
 static void *iommu_dma_alloc(struct device *dev, size_t size,
@@ -935,25 +937,26 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
 
gfp |= __GFP_ZERO;
 
+   if (IS_ENABLED(CONFIG_DMA_REMAP) && gfpflags_allow_blocking(gfp) &&
+   !(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
+   return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs);
+
if (!gfpflags_allow_blocking(gfp)) {
-   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !coherent)
+   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !coherent) {
cpu_addr = dma_alloc_from_pool(alloc_size, &page, gfp);
-   else
-   page = alloc_pages(gfp, page_order);
-   } else if (!IS_ENABLED(CONFIG_DMA_REMAP) ||
-  (attrs & DMA_ATTR_FORCE_CONTIGUOUS)) {
+   if (!cpu_addr)
+   return NULL;
+   goto do_iommu_map;
+   }
+   } else {
page = dma_alloc_from_contiguous(dev, count, page_order,
 gfp & __GFP_NOWARN);
-   } else {
-   return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs);
}
-
+   if (!page)
+   page = alloc_pages(gfp, page_order);
if (!page)
return NULL;
 
-   if (cpu_addr)
-   goto do_iommu_map;
-
if (IS_ENABLED(CONFIG_DMA_REMAP) && (!coherent || PageHighMem(page))) {
pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
 
@@ -1007,16 +1010,14 @@ static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
if (off >= nr_pages || vma_pages(vma) > nr_pages - off)
return -ENXIO;
 
-   if (!is_vmalloc_addr(cpu_addr)) {
-   pfn = page_to

Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Peter Zijlstra
On Thu, Apr 18, 2019 at 10:41:47AM +0200, Thomas Gleixner wrote:

> +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr,
> +  bool reliable);

> +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> +  struct task_struct *task, struct pt_regs *regs);
> +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie,
> +			     struct task_struct *task);

This bugs me a little; ideally the _reliable() thing would not exists.

Thomas said that the existing __save_stack_trace_reliable() is different
enough for the unification to be non-trivial, but maybe Josh can help
out?

From what I can see the biggest significant differences are:

 - it looks at the regs sets on the stack and for FP bails early
 - bails for kthreads and idle (after it does all the hard work!?!)

The first (FP checking for exceptions) should probably be reflected in
consume_fn(.reliable) anyway -- although that would mean a lot of extra
'?' entries where there are none today.

And the second (KTHREAD/IDLE) is something that the generic code can
easily do before calling into the arch unwinder.

Hmm?
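
As a usage sketch of the proposed interface (hypothetical caller-side code,
not part of the series): the callback sees each entry plus its reliability
flag, and returning false stops the walk early.

struct walk_state {
	unsigned int	entries;
	bool		all_reliable;
};

static bool count_entry(void *cookie, unsigned long addr, bool reliable)
{
	struct walk_state *s = cookie;

	s->entries++;
	s->all_reliable &= reliable;
	return true;	/* false would terminate the walk */
}

	/* ... then, to walk the current task's stack: */
	struct walk_state s = { .entries = 0, .all_reliable = true };

	arch_stack_walk(count_entry, &s, current, NULL);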



Re: [patch V2 12/29] dma/debug: Simplify stracktrace retrieval

2019-04-19 Thread Christoph Hellwig
Please fix up the > 80 char line.  Otherwise:

Reviewed-by: Christoph Hellwig 


Re: [patch V2 28/29] stacktrace: Provide common infrastructure

2019-04-19 Thread Peter Zijlstra
On Thu, Apr 18, 2019 at 05:42:55PM +0200, Thomas Gleixner wrote:
> On Thu, 18 Apr 2019, Josh Poimboeuf wrote:

> > Another idea I had (but never got a chance to work on) was to extend the
> > x86 unwind interface to all arches.  So instead of the callbacks, each
> > arch would implement something like this API:

> I surely thought about that, but after staring at all incarnations of
> arch/*/stacktrace.c I just gave up.
> 
> Aside from that, quite some archs already have callback-based unwinders
> because they use them for more than stacktracing and just have a single
> implementation of that loop.
> 
> I'm fine either way. We can start with x86 and then let archs convert over
> their stuff, but I wouldn't hold my breath that this will be completed in
> the foreseeable future.

I suggested the same to Thomas early on, and I even spent the time to
convert some $random arch to the iterator interface, and while it is
indeed entirely feasible, it is _far_ more work.

The callback thing OTOH is flexible enough to do what we want to do now,
and allows converting most archs to it without too much pain (as Thomas
said, many archs are already in this form and only need minor API
adjustments), which gets us in a far better place than we are now.

And we can always go to iterators later on. But I think getting the
generic unwinder improved across all archs is a really important first
step here.
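
For reference, the existing x86 iterator-style interface alluded to above
(arch/x86/include/asm/unwind.h) is used roughly like this sketch of a
typical walk loop:

static void walk_task_stack(struct task_struct *task, struct pt_regs *regs)
{
	struct unwind_state state;
	unsigned long addr;

	for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
	     unwind_next_frame(&state)) {
		addr = unwind_get_return_address(&state);
		if (!addr)
			break;
		/* consume addr, e.g. copy it into a trace buffer */
	}
}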