Re: [PATCH 0/3] drm: panfrost: Coherency support

2020-09-17 Thread Tomeu Vizoso

On 9/17/20 12:38 PM, Steven Price wrote:
> On 16/09/2020 18:46, Rob Herring wrote:
>> On Wed, Sep 16, 2020 at 11:04 AM Alyssa Rosenzweig wrote:
>>>> So I get a performance regression with the dma-coherent approach,
>>>> even though it's clearly the cleaner one.
>>>
>>> That's bizarre -- this should really be the faster of the two.
>>
>> Coherency may not be free. Cortex-A9 had something like 4x slower
>> memcpy if SMP was enabled, as an example. I don't know if anything
>> like that is going on here specifically. If there are never any
>> CPU accesses mixed in with kmscube, then there would be no benefit to
>> coherency.
>
> The DDK blob has the ability to mark only certain areas of memory as
> coherent for performance reasons. For simple things like kmscube I would
> expect that it's basically write-only from the CPU and almost all memory
> the GPU touches isn't touched by the CPU. I.e. coherency isn't helping
> and the coherency traffic is probably expensive. Whether the complexity
> is worth it for "real" content I don't know - it may just be silly
> benchmarks that benefit.

Or maybe it's only a problem for applications that do silly things? I
don't think kmscube was ever optimized for performance.
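As a rough sketch of the trade-off under discussion (illustrative only,
not code from either driver): a coherent mapping makes every CPU store
snoop the GPU side, while a streaming mapping pays cache maintenance
once per hand-off:

  /* coherent: no explicit sync needed, but all CPU traffic snoops */
  void *cpu_va = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);

  /* streaming: cacheable CPU writes, flushed at explicit sync points */
  dma_addr_t da = dma_map_single(dev, buf, size, DMA_TO_DEVICE);
  /* ...later, before the CPU writes the buffer again... */
  dma_sync_single_for_cpu(dev, da, size, DMA_TO_DEVICE);
  /* ...CPU fills buf, then hands it back to the device... */
  dma_sync_single_for_device(dev, da, size, DMA_TO_DEVICE);

For a write-mostly workload like kmscube, the second scheme only touches
the interconnect at the sync points, which may be why full coherency
shows up as a regression.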


Regards,

Tomeu


Re: [PATCH 0/3] iommu/io-pgtable-arm: Mali LPAE improvements

2019-09-23 Thread Tomeu Vizoso
On Thu, 19 Sep 2019 at 10:31, Will Deacon wrote:
>
> On Wed, Sep 11, 2019 at 06:19:40PM +0100, Robin Murphy wrote:
> > On 2019-09-11 5:20 pm, Will Deacon wrote:
> > > On Wed, Sep 11, 2019 at 06:19:04PM +0200, Neil Armstrong wrote:
> > > > On 11/09/2019 16:42, Robin Murphy wrote:
> > > > > Here's the eagerly-awaited fix to unblock T720/T820, plus a couple of
> > > > > other bits that I've collected so far. I'm not considering this as
> > > > > 5.3 fixes material, but it would be nice if there's any chance still
> > > > > to sneak it into 5.4.
> > > > >
> > > > > Robin.
> > > > >
> > > > >
> > > > > Robin Murphy (3):
> > > > >iommu/io-pgtable-arm: Correct Mali attributes
> > > > >iommu/io-pgtable-arm: Support more Mali configurations
> > > > >iommu/io-pgtable-arm: Allow coherent walks for Mali
> > > > >
> > > > >   drivers/iommu/io-pgtable-arm.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++++-------------
> > > > >   1 file changed, 48 insertions(+), 13 deletions(-)
> > > > >
> > > >
> > > > Tested-by: Neil Armstrong 
> > > >
> > > > On Khadas VIM2 (Amlogic S912) with T820 Mali GPU
> > > >
> > > > I hope this will be part of v5.4 so we can run panfrost on vanilla v5.4!
> > >
> > > Not a chance -- the merge window opens on Monday and -next isn't being
> > > rolled out at the moment due to LPC. Let's shoot for 5.5 and get this
> > > queued up in a few weeks.
> >
> > Fair enough, that was certainly more extreme optimism than realistic
> > expectation on my part :)
> >
> > There is some argument for taking #1 and #2 as 5.4 fixes, though - the
> > upcoming Mesa 19.2 release will enable T820 support on the userspace side -
> > so let's pick that discussion up again in a few weeks.
>
> Ok, I'll include those two in my fixes pull to Joerg at -rc1.

Hi Will,

Looks like this didn't end up happening?

Thanks,

Tomeu


Re: [PATCH v6 0/6] Allwinner H6 Mali GPU support

2019-06-10 Thread Tomeu Vizoso
On Wed, 29 May 2019 at 19:38, Robin Murphy wrote:
>
> On 29/05/2019 16:09, Tomeu Vizoso wrote:
> > On Tue, 21 May 2019 at 18:11, Clément Péron wrote:
> >>
> > [snip]
> >> [  345.204813] panfrost 180.gpu: mmu irq status=1
> >> [  345.209617] panfrost 180.gpu: Unhandled Page fault in AS0 at VA
> >> 0x02400400
> >
> > From what I can see here, 0x02400400 points to the first byte
> > of the first submitted job descriptor.
> >
> > So mapping buffers for the GPU doesn't seem to be working at all on
> > 64-bit T-760.
> >
> > Steven, Robin, do you have any idea of why this could be?
>
> I tried rolling back to the old panfrost/nondrm shim, and it works fine
> with kbase, and I also found that T-820 falls over in the exact same
> manner, so the fact that it seemed to be common to the smaller 33-bit
> designs rather than anything to do with the other
> job_descriptor_size/v4/v5 complication turned out to be telling.
>
> [ as an aside, are 64-bit jobs actually known not to work on v4 GPUs, or
> is it just that nobody's yet observed a 64-bit blob driving one? ]

Do you know if 64-bit descriptors work on v4 GPUs, not with our kernel
driver but with the DDK?

I wonder if there's something else to be fixed in the kernel for that
scenario.

Thanks,

Tomeu

> Long story short, it appears that 'Mali LPAE' is also lacking the start
> level notion of VMSA, and expects a full 4-level table even for <40 bits
> when level 0 is effectively redundant. Thus walking the 3-level table that
> io-pgtable comes back with ends up going wildly wrong. The hack below
> seems to do the job for me; if Clément can confirm (on T-720 you'll
> still need the userspace hack to force 32-bit jobs as well) then I think
> I'll cook up a proper refactoring of the allocator to put things right.
>
> Robin.
>
>
> ->8-
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 546968d8a349..f29da6e8dc08 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -1023,12 +1023,14 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
> if (iop) {
> u64 mair, ttbr;
> +   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(&iop->ops);
>
> +   data->levels = 4;
> /* Copy values as union fields overlap */
> mair = cfg->arm_lpae_s1_cfg.mair[0];
> ttbr = cfg->arm_lpae_s1_cfg.ttbr[0];
>
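For reference, the table depth io-pgtable picks follows from the usual
LPAE arithmetic of 9 translation bits per 4K granule level, which is why
a 33-bit input address comes back as a 3-level table (a sketch of the
sizing logic, not the exact code):

  va_bits = ias - pg_shift;                        /* 33 - 12 = 21 */
  levels  = DIV_ROUND_UP(va_bits, bits_per_level); /* 21 / 9  -> 3  */
  /* Mali LPAE has no start-level notion, so the GPU walks 4 anyway */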

Re: [PATCH v6 0/6] Allwinner H6 Mali GPU support

2019-05-31 Thread Tomeu Vizoso
On Wed, 29 May 2019 at 19:38, Robin Murphy wrote:
>
> On 29/05/2019 16:09, Tomeu Vizoso wrote:
> > On Tue, 21 May 2019 at 18:11, Clément Péron wrote:
> >>
> > [snip]
> >> [  345.204813] panfrost 180.gpu: mmu irq status=1
> >> [  345.209617] panfrost 180.gpu: Unhandled Page fault in AS0 at VA
> >> 0x02400400
> >
> > From what I can see here, 0x02400400 points to the first byte
> > of the first submitted job descriptor.
> >
> > So mapping buffers for the GPU doesn't seem to be working at all on
> > 64-bit T-760.
> >
> > Steven, Robin, do you have any idea of why this could be?
>
> I tried rolling back to the old panfrost/nondrm shim, and it works fine
> with kbase, and I also found that T-820 falls over in the exact same
> manner, so the fact that it seemed to be common to the smaller 33-bit
> designs rather than anything to do with the other
> job_descriptor_size/v4/v5 complication turned out to be telling.

Is this complication something you can explain? I don't know what v4
and v5 refer to here.

> [ as an aside, are 64-bit jobs actually known not to work on v4 GPUs, or
> is it just that nobody's yet observed a 64-bit blob driving one? ]

I'm looking right now at getting Panfrost working on T720 with 64-bit
descriptors, with the ultimate goal of making Panfrost
64-bit-descriptor only so we can have a single build of Mesa in
distros.

> Long story short, it appears that 'Mali LPAE' is also lacking the start
> level notion of VMSA, and expects a full 4-level table even for <40 bits
> when level 0 is effectively redundant. Thus walking the 3-level table that
> io-pgtable comes back with ends up going wildly wrong. The hack below
> seems to do the job for me; if Clément can confirm (on T-720 you'll
> still need the userspace hack to force 32-bit jobs as well) then I think
> I'll cook up a proper refactoring of the allocator to put things right.

Mmaps seem to work with this patch, thanks.

The main complication I'm facing right now is that the SFBD
descriptor on T720 seems to differ from the one we already had
(tested on T6xx?).

Cheers,

Tomeu

> Robin.
>
>
> ->8-
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 546968d8a349..f29da6e8dc08 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -1023,12 +1023,14 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
> if (iop) {
> u64 mair, ttbr;
> +   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(&iop->ops);
>
> +   data->levels = 4;
> /* Copy values as union fields overlap */
> mair = cfg->arm_lpae_s1_cfg.mair[0];
> ttbr = cfg->arm_lpae_s1_cfg.ttbr[0];
>

Re: [PATCH v6 0/6] Allwinner H6 Mali GPU support

2019-05-29 Thread Tomeu Vizoso
On Tue, 21 May 2019 at 18:11, Clément Péron wrote:
>
[snip]
> [  345.204813] panfrost 180.gpu: mmu irq status=1
> [  345.209617] panfrost 180.gpu: Unhandled Page fault in AS0 at VA
> 0x02400400

From what I can see here, 0x02400400 points to the first byte
of the first submitted job descriptor.

So mapping buffers for the GPU doesn't seem to be working at all on
64-bit T-760.

Steven, Robin, do you have any idea of why this could be?

Thanks,

Tomeu

Re: [PATCH v2 3/3] drm/panfrost: Add initial panfrost driver

2019-04-10 Thread Tomeu Vizoso
On Wed, 10 Apr 2019 at 12:20, Steven Price wrote:
>
> On 08/04/2019 22:04, Rob Herring wrote:
> > On Fri, Apr 5, 2019 at 7:30 AM Steven Price wrote:
> >>
> >> On 01/04/2019 08:47, Rob Herring wrote:
> >>> This adds the initial driver for panfrost which supports Arm Mali
> >>> Midgard and Bifrost family of GPUs. Currently, only the T860 and
> >>> T760 Midgard GPUs have been tested.
> >
> > [...]
> >
> >>> + case 0xD3: return "TRANSTAB_BUS_FAULT_LEVEL3";
> >>> + case 0xD4: return "TRANSTAB_BUS_FAULT_LEVEL4";
> >>> + case 0xD8: return "ACCESS_FLAG";
> >>> + }
> >>> +
> >>> + if (panfrost_has_hw_feature(pfdev, HW_FEATURE_AARCH64_MMU)) {
> >>
> >> There's not a great deal of point in this conditional - you won't get
> >> the below exception codes from hardware which doesn't support AArch64
> >> page tables.
> >
> > I wasn't sure if "UNKNOWN" types could happen or not.
> >
> >>
> >>> + switch (exception_code) {
> >>> + case 0xC9 ... 0xCF: return "PERMISSION_FAULT";
> >>> + case 0xD9 ... 0xDF: return "ACCESS_FLAG";
> >>> + case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
> >>> + case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
> >>> + }
> >>> + }
> >>> +
> >>> + return "UNKNOWN";
> >>> +}
> >
> >>> +static inline int panfrost_model_cmp(struct panfrost_device *pfdev, s32 id)
> >>> +{
> >>> + s32 match_id = pfdev->features.id;
> >>> +
> >>> + if (match_id & 0xf000)
> >>> + match_id &= 0xf00f;
> >>
> >> I'm puzzled by this, and really not sure if it's going to work out
> >> ignoring the ARCH_MINOR/ARCH_REV fields. But for now there's no real
> >> Bifrost support.
> >
> > That seemed to be enough for kbase to select features/issues.
>
> I can't deny that it seems to work for now, and indeed looking more
> closely at kbase that does seem to be the effect of the current code.
>
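For illustration, with a hypothetical Bifrost-style id of 0x6221 the
masking keeps only the ARCH_MAJOR and PRODUCT_MAJOR fields:

  0x6221 & 0xf000 == 0x6000  /* nonzero, so the new-style encoding */
  0x6221 & 0xf00f == 0x6001  /* ARCH_MINOR and ARCH_REV discarded  */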
> >>> + switch (param->param) {
> >>> + case DRM_PANFROST_PARAM_GPU_ID:
> >>> + param->value = pfdev->features.id;
> >>
> >> This is unfortunate naming - this parameter is *not* the GPU_ID. It is
> >> the top half of the GPU_ID which kbase calls the PRODUCT_ID.
> >
> > I can rename it.
>
> It would help prevent confusion, thanks!
>
> >> I'm also somewhat surprised that you don't need loads of other
> >> properties from the GPU - in particular knowing the number of shader
> >> cores is useful for allocating the right amount of memory for TLS (and
> >> can't be obtained purely from the GPU_ID).
> >
> > We'll add them as userspace needs them.
>
> Fair enough. I'm not sure how much you want to provide "forward
> compatibility" by providing them early - the basic definitions are
> already in kbase. I found it a bit surprising that some feature
> registers are printed on probe, but not available to be queried.
>
> >>> +static int
> >>> +panfrost_lookup_bos(struct drm_device *dev,
> >>> +   struct drm_file *file_priv,
> >>> +   struct drm_panfrost_submit *args,
> >>> +   struct panfrost_job *job)
> >>> +{
> >>> + u32 *handles;
> >>> + int ret = 0;
> >>> +
> >>> + job->bo_count = args->bo_handle_count;
> >>> +
> >>> + if (!job->bo_count)
> >>> + return 0;
> >>> +
> >>> + job->bos = kvmalloc_array(job->bo_count,
> >>> +   sizeof(struct drm_gem_object *),
> >>> +   GFP_KERNEL | __GFP_ZERO);
> >>> + if (!job->bos)
> >>
> >> If we return here then job->bo_count has been set, but job->bos is
> >> invalid - this means that panfrost_job_cleanup() will then dereference
> >> NULL. Although maybe this is better fixed in panfrost_job_cleanup().
> >
> > Good catch. The fence arrays have the same problem. As does V3D which we 
> > copied.
> >
> >>> + return -ENOMEM;
> >>> +
> >>> + job->implicit_fences = kvmalloc_array(job->bo_count,
> >>> +   sizeof(struct dma_fence *),
> >>> +   GFP_KERNEL | __GFP_ZERO);
> >>> + if (!job->implicit_fences)
> >>> + return -ENOMEM;
> >
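A cleanup guard along the lines Steven suggests might look like this
(a sketch; panfrost_job_cleanup()'s body isn't shown in this thread):

  /* Tolerate partially-initialised jobs: bo_count may already be set
   * even though one of the arrays failed to allocate. */
  if (job->implicit_fences) {
          for (i = 0; i < job->bo_count; i++)
                  dma_fence_put(job->implicit_fences[i]);
          kvfree(job->implicit_fences);
  }
  if (job->bos) {
          for (i = 0; i < job->bo_count; i++)
                  drm_gem_object_put_unlocked(job->bos[i]);
          kvfree(job->bos);
  }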
> >>> +static irqreturn_t panfrost_gpu_irq_handler(int irq, void *data)
> >>> +{
> >>> + struct panfrost_device *pfdev = data;
> >>> + u32 state = gpu_read(pfdev, GPU_INT_STAT);
> >>> + u32 fault_status = gpu_read(pfdev, GPU_FAULT_STATUS);
> >>> + u64 address;
> >>> + bool done = false;
> >>> +
> >>> + if (!state)
> >>> + return IRQ_NONE;
> >>> +
> >>> + if (state & GPU_IRQ_MASK_ERROR) {
> >>> + address = (u64) gpu_read(pfdev, GPU_FAULT_ADDRESS_HI) << 32;
> >>> + address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO);
> >>> +
> >>> + dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at 0x%016llx\n",
> >>> +  fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status),
> >>> +  address);
> >>> +
> >>> + if 

Re: [PATCH v2 3/3] drm/panfrost: Add initial panfrost driver

2019-04-09 Thread Tomeu Vizoso
On Mon, 8 Apr 2019 at 23:04, Rob Herring wrote:
>
> On Fri, Apr 5, 2019 at 7:30 AM Steven Price wrote:
> >
> > On 01/04/2019 08:47, Rob Herring wrote:
> > > This adds the initial driver for panfrost which supports Arm Mali
> > > Midgard and Bifrost family of GPUs. Currently, only the T860 and
> > > T760 Midgard GPUs have been tested.
>
> [...]
> > > +
> > > + if (status & JOB_INT_MASK_ERR(j)) {
> > > + job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);
> > > + job_write(pfdev, JS_COMMAND(j), JS_COMMAND_HARD_STOP_0);
> >
> > Hard-stopping an already completed job isn't likely to do very much :)
> > Also you are using the "_0" version which is only valid when "job chain
> > disambiguation" is present.

Yeah, guess that can be removed.

> > I suspect in this case you should also be signalling the fence? At the
> > moment you rely on the GPU timeout recovering from the fault.
>
> I'll defer to Tomeu who wrote this (IIRC).

Yes, that would be an improvement.

> > One issue that I haven't got to the bottom of is that I can trigger a
> > lockdep splat:
> >
> > -8<--
> > panfrost ffa3.gpu: js fault, js=1, status=JOB_CONFIG_FAULT,
> > head=0x0, tail=0x0
> > root@debian:~/ddk_panfrost# panfrost ffa3.gpu: gpu sched timeout,
> > js=1, status=0x40, head=0x0, tail=0x0, sched_job=12a94ba6
> >
> > ==
> > WARNING: possible circular locking dependency detected
> > 5.0.0+ #32 Not tainted
> > --
> > kworker/1:0/608 is trying to acquire lock:
> > 89b1e2d8 (&(>job_lock)->rlock){-.-.}, at:
> > dma_fence_remove_callback+0x14/0x50
> >
> > but task is already holding lock:
> > a887e4b2 (&(>job_list_lock)->rlock){-.-.}, at:
> > drm_sched_stop+0x24/0x10c
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #1 (&(>job_list_lock)->rlock){-.-.}:
> >drm_sched_process_job+0x60/0x208
> >dma_fence_signal+0x1dc/0x1fc
> >panfrost_job_irq_handler+0x160/0x194
> >__handle_irq_event_percpu+0x80/0x388
> >handle_irq_event_percpu+0x24/0x78
> >handle_irq_event+0x38/0x5c
> >handle_fasteoi_irq+0xb4/0x128
> >generic_handle_irq+0x18/0x28
> >__handle_domain_irq+0xa0/0xb4
> >gic_handle_irq+0x4c/0x78
> >__irq_svc+0x70/0x98
> >arch_cpu_idle+0x20/0x3c
> >arch_cpu_idle+0x20/0x3c
> >do_idle+0x11c/0x22c
> >cpu_startup_entry+0x18/0x20
> >start_kernel+0x398/0x420
> >
> > -> #0 (&(>job_lock)->rlock){-.-.}:
> >_raw_spin_lock_irqsave+0x50/0x64
> >dma_fence_remove_callback+0x14/0x50
> >drm_sched_stop+0x5c/0x10c
> >panfrost_job_timedout+0xd0/0x180
> >drm_sched_job_timedout+0x34/0x5c
> >process_one_work+0x2ac/0x6a4
> >worker_thread+0x28c/0x3fc
> >kthread+0x13c/0x158
> >ret_from_fork+0x14/0x20
> >  (null)
> >
> > other info that might help us debug this:
> >
> >  Possible unsafe locking scenario:
> >
> >CPU0CPU1
> >
> >   lock(&(>job_list_lock)->rlock);
> >lock(&(>job_lock)->rlock);
> >lock(&(>job_list_lock)->rlock);
> >   lock(&(>job_lock)->rlock);
> >
> >  *** DEADLOCK ***
> >
> > 3 locks held by kworker/1:0/608:
> >  #0: 9b350627 ((wq_completion)"events"){+.+.}, at:
> > process_one_work+0x1f8/0x6a4
> >  #1: a802aa2d ((work_completion)(&(>work_tdr)->work)){+.+.}, at:
> > process_one_work+0x1f8/0x6a4
> >  #2: a887e4b2 (&(>job_list_lock)->rlock){-.-.}, at:
> > drm_sched_stop+0x24/0x10c
> >
> > stack backtrace:
> > CPU: 1 PID: 608 Comm: kworker/1:0 Not tainted 5.0.0+ #32
> > Hardware name: Rockchip (Device Tree)
> > Workqueue: events drm_sched_job_timedout
> > [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> > [] (show_stack) from [] (dump_stack+0x9c/0xd4)
> > [] (dump_stack) from []
> > (print_circular_bug.constprop.15+0x1fc/0x2cc)
> > [] (print_circular_bug.constprop.15) from []
> > (__lock_acquire+0xe5c/0x167c)
> > [] (__lock_acquire) from [] (lock_acquire+0xc4/0x210)
> > [] (lock_acquire) from []
> > (_raw_spin_lock_irqsave+0x50/0x64)
> > [] (_raw_spin_lock_irqsave) from []
> > (dma_fence_remove_callback+0x14/0x50)
> > [] (dma_fence_remove_callback) from []
> > (drm_sched_stop+0x5c/0x10c)
> > [] (drm_sched_stop) from []
> > (panfrost_job_timedout+0xd0/0x180)
> > [] (panfrost_job_timedout) from []
> > (drm_sched_job_timedout+0x34/0x5c)
> > [] (drm_sched_job_timedout) from []
> > (process_one_work+0x2ac/0x6a4)
> > [] (process_one_work) from []
> > (worker_thread+0x28c/0x3fc)
> > [] (worker_thread) from [] (kthread+0x13c/0x158)
> > [] (kthread) from [] (ret_from_fork+0x14/0x20)
> > Exception stack(0xeebd7fb0 to 0xeebd7ff8)
> > 7fa0: 

Re: [PATCH] iommu/rockchip: Fix "is stall active" check

2016-04-06 Thread Tomeu Vizoso
On 5 April 2016 at 21:10, Heiko Stuebner <he...@sntech.de> wrote:
> Am Dienstag, 5. April 2016, 18:28:58 schrieb John Keeping:
>> On Tue, 05 Apr 2016 10:21:27 -0700, Heiko Stuebner wrote:
>> > Am Dienstag, 5. April 2016, 15:05:46 schrieb John Keeping:
>> > > Since commit cd6438c5f844 ("iommu/rockchip: Reconstruct to support
>> > > multi slaves") rk_iommu_is_stall_active() always returns false because the
>> > > bitwise AND operates on the boolean flag promoted to an integer and a
>> > > value that is either zero or BIT(2).
>> > >
>> > > Explicitly convert the right-hand value to a boolean so that both
>> > > sides are guaranteed to be either zero or one.
>> > >
>> > > rk_iommu_is_paging_enabled() does not suffer from the same problem
>> > > since RK_MMU_STATUS_PAGING_ENABLED is BIT(0), but let's apply the same
>> > > change for consistency and to make it clear that it's correct without
>> > > needing to look up the value.
>> > >
>> > > Fixes: cd6438c5f844 ("iommu/rockchip: Reconstruct to support multi slaves")
>> > > Signed-off-by: John Keeping <j...@metanate.com>
>> >
>> > Nice catch, thanks.
>> >
>> > Reviewed-by: Heiko Stuebner <he...@sntech.de>
>> >
>> > John, out of curiosity, how did you find that problem? Excess stall
>> > error messages or something else?
>>
>> I was testing the drm iommu patches I posted today [1] and noticed an
>> "Enable stall request timed out" error when unbinding the
>> display-subsystem, but the RK_MMU_STATUS value printed in the error
>> message indicated that stall active was set.
>>
>> [1] http://article.gmane.org/gmane.comp.video.dri.devel/150721
>
> So it seems both you and Tomeu were working to resolve the same issue [2].
>
> And John's patch seems to actually fix the underlying problem.

Indeed, I don't get spurious messages any more with this patch.

Tested-by: Tomeu Vizoso <tomeu.viz...@collabora.com>

Thanks,

Tomeu

>
> [2] https://lkml.org/lkml/2016/3/22/493
>
>
>> > > ---
>> > >
>> > >  drivers/iommu/rockchip-iommu.c | 8 ++++----
>> > >  1 file changed, 4 insertions(+), 4 deletions(-)
>> > >
>> > > diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
>> > > index a6f593a0a29e..5710a06c3049 100644
>> > > --- a/drivers/iommu/rockchip-iommu.c
>> > > +++ b/drivers/iommu/rockchip-iommu.c
>> > > @@ -315,8 +315,8 @@ static bool rk_iommu_is_stall_active(struct rk_iommu *iommu)
>> > >   int i;
>> > >
>> > >   for (i = 0; i < iommu->num_mmu; i++)
>> > >
>> > > - active &= rk_iommu_read(iommu->bases[i], RK_MMU_STATUS) &
>> > > - RK_MMU_STATUS_STALL_ACTIVE;
>> > > + active &= !!(rk_iommu_read(iommu->bases[i], RK_MMU_STATUS) &
>> > > +RK_MMU_STATUS_STALL_ACTIVE);
>> > >
>> > >   return active;
>> > >
>> > >  }
>> > >
>> > > @@ -327,8 +327,8 @@ static bool rk_iommu_is_paging_enabled(struct rk_iommu *iommu)
>> > >   int i;
>> > >
>> > >   for (i = 0; i < iommu->num_mmu; i++)
>> > >
>> > > - enable &= rk_iommu_read(iommu->bases[i], RK_MMU_STATUS) &
>> > > - RK_MMU_STATUS_PAGING_ENABLED;
>> > > + enable &= !!(rk_iommu_read(iommu->bases[i], RK_MMU_STATUS) &
>> > > +RK_MMU_STATUS_PAGING_ENABLED);
>> > >
>> > >   return enable;
>> > >
>> > >  }
>
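The integer-promotion pitfall John's patch fixes, in isolation
(an illustrative sketch, not code from the driver):

  bool active = true;
  u32 status = BIT(2);   /* RK_MMU_STATUS_STALL_ACTIVE */
  active &= status;      /* (int)true & 4 == 0: always false */
  active &= !!status;    /* 1 & 1 == 1: what was intended    */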


[PATCH] iommu/rockchip: Only log stall errors when attaching

2016-03-22 Thread Tomeu Vizoso
Move the logging of timeouts when stalling the MMU to
rk_iommu_attach_device, as it's expected that sometimes the MMU won't
get stalled when detaching a device, and it's not a real problem that
would need to be communicated to the user.

Signed-off-by: Tomeu Vizoso <tomeu.viz...@collabora.com>
---
 drivers/iommu/rockchip-iommu.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 0253ab35c33b..65325b6742e6 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -346,13 +346,7 @@ static int rk_iommu_enable_stall(struct rk_iommu *iommu)
 
rk_iommu_command(iommu, RK_MMU_CMD_ENABLE_STALL);
 
-   ret = rk_wait_for(rk_iommu_is_stall_active(iommu), 1);
-   if (ret)
-   for (i = 0; i < iommu->num_mmu; i++)
-   dev_err(iommu->dev, "Enable stall request timed out, status: %#08x\n",
-   rk_iommu_read(iommu->bases[i], RK_MMU_STATUS));
-
-   return ret;
+   return rk_wait_for(rk_iommu_is_stall_active(iommu), 1);
 }
 
 static int rk_iommu_disable_stall(struct rk_iommu *iommu)
@@ -798,8 +792,12 @@ static int rk_iommu_attach_device(struct iommu_domain *domain,
return 0;
 
ret = rk_iommu_enable_stall(iommu);
-   if (ret)
+   if (ret) {
+   for (i = 0; i < iommu->num_mmu; i++)
+   dev_err(iommu->dev, "Enable stall request timed out, status: %#08x\n",
+   rk_iommu_read(iommu->bases[i], RK_MMU_STATUS));
return ret;
+   }
 
ret = rk_iommu_force_reset(iommu);
if (ret)
-- 
2.5.0



[PATCH] iommu/rockchip: Don't feed NULL res pointers to devres

2016-03-21 Thread Tomeu Vizoso
If we do, devres prints an "invalid resource" message at the error
log level.

Signed-off-by: Tomeu Vizoso <tomeu.viz...@collabora.com>
---
 drivers/iommu/rockchip-iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index a6f593a0a29e..0253ab35c33b 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1049,6 +1049,8 @@ static int rk_iommu_probe(struct platform_device *pdev)
 
for (i = 0; i < pdev->num_resources; i++) {
res = platform_get_resource(pdev, IORESOURCE_MEM, i);
+   if (!res)
+   continue;
iommu->bases[i] = devm_ioremap_resource(&pdev->dev, res);
if (IS_ERR(iommu->bases[i]))
continue;
-- 
2.5.0



Re: [PATCH v3 01/12] clk: tegra: Implement memory-controller clock

2014-10-20 Thread Tomeu Vizoso
On 13 October 2014 12:33, Thierry Reding thierry.red...@gmail.com wrote:
 From: Thierry Reding tred...@nvidia.com

 The memory controller clock runs either at half or the same frequency as
 the EMC clock.

 Signed-off-by: Thierry Reding tred...@nvidia.com
 ---
 Changes in v3:
 - split registration into a separate function that can be reused for all
   SoC generations, but pass in the name and parent parameters for
   clarity as well as the register address (in case it ever changes) and
   the EMC spin-lock since it isn't globally available

  drivers/clk/tegra/clk-divider.c  | 13 +
  drivers/clk/tegra/clk-tegra114.c |  7 ++-
  drivers/clk/tegra/clk-tegra124.c |  7 ++-
  drivers/clk/tegra/clk-tegra20.c  |  8 +++-
  drivers/clk/tegra/clk-tegra30.c  |  7 ++-
  drivers/clk/tegra/clk.h  |  2 ++
  include/dt-bindings/clock/tegra114-car.h |  2 +-
  include/dt-bindings/clock/tegra124-car.h |  2 +-
  include/dt-bindings/clock/tegra20-car.h  |  2 +-
  9 files changed, 43 insertions(+), 7 deletions(-)

 diff --git a/drivers/clk/tegra/clk-divider.c b/drivers/clk/tegra/clk-divider.c
 index 290f9c1a3749..84e1b3c1fb2a 100644
 --- a/drivers/clk/tegra/clk-divider.c
 +++ b/drivers/clk/tegra/clk-divider.c
 @@ -185,3 +185,16 @@ struct clk *tegra_clk_register_divider(const char *name,

 return clk;
  }
 +
 +static const struct clk_div_table mc_div_table[] = {
 +   { .val = 0, .div = 2 },
 +   { .val = 1, .div = 1 },
 +   { .val = 0, .div = 0 },
 +};
 +
 +struct clk *tegra_clk_register_mc(const char *name, const char *parent_name,
 + void __iomem *reg, spinlock_t *lock)
 +{
 +   return clk_register_divider_table(NULL, "mc", "emc_mux", 0, reg,

Should take name and parent_name from the passed args?

With that change:

Reviewed-By: Tomeu Vizoso tomeu.viz...@collabora.com

Cheers,

Tomeu
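Presumably something along these lines (a sketch of the suggested change):

 +struct clk *tegra_clk_register_mc(const char *name, const char *parent_name,
 +				   void __iomem *reg, spinlock_t *lock)
 +{
 +	return clk_register_divider_table(NULL, name, parent_name, 0, reg,
 +					  16, 1, 0, mc_div_table, lock);
 +}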

 + 16, 1, 0, mc_div_table, lock);
 +}
 diff --git a/drivers/clk/tegra/clk-tegra114.c b/drivers/clk/tegra/clk-tegra114.c
 index f760f31d05c4..0b03d2cf7264 100644
 --- a/drivers/clk/tegra/clk-tegra114.c
 +++ b/drivers/clk/tegra/clk-tegra114.c
 @@ -173,6 +173,7 @@ static DEFINE_SPINLOCK(pll_d_lock);
  static DEFINE_SPINLOCK(pll_d2_lock);
  static DEFINE_SPINLOCK(pll_u_lock);
  static DEFINE_SPINLOCK(pll_re_lock);
 +static DEFINE_SPINLOCK(emc_lock);

  static struct div_nmp pllxc_nmp = {
 .divm_shift = 0,
 @@ -1228,7 +1229,11 @@ static __init void tegra114_periph_clk_init(void __iomem *clk_base,
ARRAY_SIZE(mux_pllmcp_clkm),
CLK_SET_RATE_NO_REPARENT,
clk_base + CLK_SOURCE_EMC,
 -  29, 3, 0, NULL);
 +  29, 3, 0, emc_lock);
 +
 +   clk = tegra_clk_register_mc("mc", "emc_mux", clk_base + CLK_SOURCE_EMC,
 +   emc_lock);
 +   clks[TEGRA114_CLK_MC] = clk;

 for (i = 0; i  ARRAY_SIZE(tegra_periph_clk_list); i++) {
 data = tegra_periph_clk_list[i];
 diff --git a/drivers/clk/tegra/clk-tegra124.c b/drivers/clk/tegra/clk-tegra124.c
 index e3a85842ce0c..f5f9baca7bb6 100644
 --- a/drivers/clk/tegra/clk-tegra124.c
 +++ b/drivers/clk/tegra/clk-tegra124.c
 @@ -132,6 +132,7 @@ static DEFINE_SPINLOCK(pll_d2_lock);
  static DEFINE_SPINLOCK(pll_e_lock);
  static DEFINE_SPINLOCK(pll_re_lock);
  static DEFINE_SPINLOCK(pll_u_lock);
 +static DEFINE_SPINLOCK(emc_lock);

  /* possible OSC frequencies in Hz */
  static unsigned long tegra124_input_freq[] = {
 @@ -1127,7 +1128,11 @@ static __init void tegra124_periph_clk_init(void __iomem *clk_base,
 clk = clk_register_mux(NULL, "emc_mux", mux_pllmcp_clkm,
ARRAY_SIZE(mux_pllmcp_clkm), 0,
clk_base + CLK_SOURCE_EMC,
 -  29, 3, 0, NULL);
 +  29, 3, 0, emc_lock);
 +
 +   clk = tegra_clk_register_mc("mc", "emc_mux", clk_base + CLK_SOURCE_EMC,
 +   emc_lock);
 +   clks[TEGRA124_CLK_MC] = clk;

 /* cml0 */
 clk = clk_register_gate(NULL, "cml0", "pll_e", 0, clk_base + PLLE_AUX,
 diff --git a/drivers/clk/tegra/clk-tegra20.c b/drivers/clk/tegra/clk-tegra20.c
 index dace2b1b5ae6..41272dcc9e22 100644
 --- a/drivers/clk/tegra/clk-tegra20.c
 +++ b/drivers/clk/tegra/clk-tegra20.c
 @@ -140,6 +140,8 @@ static struct cpu_clk_suspend_context {
  static void __iomem *clk_base;
  static void __iomem *pmc_base;

 +static DEFINE_SPINLOCK(emc_lock);
 +
  #define TEGRA_INIT_DATA_MUX(_name, _parents, _offset,  \
 _clk_num, _gate_flags, _clk_id) \
 TEGRA_INIT_DATA(_name, NULL, NULL, _parents, _offset,   \
 @@ -819,11 +821,15 @@ static void __init tegra20_periph_clk_init(void)
ARRAY_SIZE(mux_pllmcp_clkm