Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

2018-10-18 Thread Daniel Vetter
On Fri, Mar 16, 2018 at 12:26 AM Byungchul Park  wrote:
>
> On 3/15/2018 9:41 PM, Peter Zijlstra wrote:
> > On Thu, Mar 15, 2018 at 11:31:57AM +0100, Daniel Vetter wrote:
> >> Is there any progress on getting cross-release enabled again?
> >
> > Not yet, I'm still fighting the meltdown/spectre induced backlog.
> >
> > We'll get to it eventually.
>
> Please let me know when you get available so that I can start.

Hi Byungchul,

Do you have a branch somewhere with latest/rebased cross-release
patches? I'd be interested in resurrecting them, and least for local
testing in drm. I'm still carrying around some patches to improve
annotations here.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

2018-03-16 Thread Byungchul Park

On 3/15/2018 9:41 PM, Peter Zijlstra wrote:

On Thu, Mar 15, 2018 at 11:31:57AM +0100, Daniel Vetter wrote:

Is there any progress on getting cross-release enabled again?


Not yet, I'm still fighting the meltdown/spectre induced backlog.

We'll get to it eventually.


Please let me know when you get available so that I can start.

--
Thanks,
Byungchul
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

2018-03-15 Thread Peter Zijlstra
On Thu, Mar 15, 2018 at 11:31:57AM +0100, Daniel Vetter wrote:
> Is there any progress on getting cross-release enabled again?

Not yet, I'm still fighting the meltdown/spectre induced backlog.

We'll get to it eventually.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

2018-03-15 Thread Daniel Vetter
On Wed, Dec 20, 2017 at 2:14 AM, Byungchul Park  wrote:
> On 12/19/2017 6:59 PM, Daniel Vetter wrote:
>>
>> On Mon, Dec 18, 2017 at 09:42:13AM -0800, Linus Torvalds wrote:
>>>
>>> On Sun, Dec 17, 2017 at 11:11 PM, Daniel Vetter  wrote:


 This didn't seem to have made it into -rc4. Anything needed to get it
 going?
>>>
>>>
>>> Do you actually see the problem in -rc4?
>>>
>>> Because we ended up removing the cross-release checking due to other
>>> developers complaining. It seemed to need a lot more work before it
>>> was ready.
>>>
>>> So I suspect the patch is simply not relevant any more (even if it's
>>> not necessarily wrong either).
>>
>>
>> Awww ... I like the cross release stuff, it's catching lots of good bugs
>> for us - writing a gpu memory manager that's supposed to interface with
>> the core one ain't the simplest thing to get right. That's also why we're
>> going around and fixing up fallout (we've worked on the worker annotations
>> for 4.14 too). I guess next release, hopefully.
>>
>> I think between 4.14 and 4.15 attemps cross-release spotted around 5 or so
>> genuine deadlocks in i915 code. And at least right now, with the current
>> set of fixups our CI runs are splat-free. So at least a Kconfig option
>> would be nice, to be able to enable it easily when you want to.
>>
>> Wrt us not noticing: Getting the patches in through normal means takes too
>> long, so we constantly carry arounda  bunch of fixup patches to address
>> serious issues that block CI (no lockdep is no good CI). That's why we
>> won't immediately notice when an issue is resolved some other way.
>
>
> Hello Ingo and Linus,
>
> IMO, taking it off by default is enough. What fs folk actually
> wanted is not to remove whole stuff but make it off by default.
>
> Cross-release logic itself makes sense. Please consider it back
> and apply my patch making it off by default.
>
> I will do what I can do for the classification in layered fs.

Is there any progress on getting cross-release enabled again?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-20 Thread Byungchul Park

On 12/19/2017 6:59 PM, Daniel Vetter wrote:

On Mon, Dec 18, 2017 at 09:42:13AM -0800, Linus Torvalds wrote:

On Sun, Dec 17, 2017 at 11:11 PM, Daniel Vetter  wrote:


This didn't seem to have made it into -rc4. Anything needed to get it
going?


Do you actually see the problem in -rc4?

Because we ended up removing the cross-release checking due to other
developers complaining. It seemed to need a lot more work before it
was ready.

So I suspect the patch is simply not relevant any more (even if it's
not necessarily wrong either).


Awww ... I like the cross release stuff, it's catching lots of good bugs
for us - writing a gpu memory manager that's supposed to interface with
the core one ain't the simplest thing to get right. That's also why we're
going around and fixing up fallout (we've worked on the worker annotations
for 4.14 too). I guess next release, hopefully.

I think between 4.14 and 4.15 attemps cross-release spotted around 5 or so
genuine deadlocks in i915 code. And at least right now, with the current
set of fixups our CI runs are splat-free. So at least a Kconfig option
would be nice, to be able to enable it easily when you want to.

Wrt us not noticing: Getting the patches in through normal means takes too
long, so we constantly carry arounda  bunch of fixup patches to address
serious issues that block CI (no lockdep is no good CI). That's why we
won't immediately notice when an issue is resolved some other way.


Hello Ingo and Linus,

IMO, taking it off by default is enough. What fs folk actually
wanted is not to remove whole stuff but make it off by default.

Cross-release logic itself makes sense. Please consider it back
and apply my patch making it off by default.

I will do what I can do for the classification in layered fs.

--
Thanks,
Byungchul
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-19 Thread Daniel Vetter
On Mon, Dec 18, 2017 at 09:42:13AM -0800, Linus Torvalds wrote:
> On Sun, Dec 17, 2017 at 11:11 PM, Daniel Vetter  wrote:
> >
> > This didn't seem to have made it into -rc4. Anything needed to get it
> > going?
> 
> Do you actually see the problem in -rc4?
> 
> Because we ended up removing the cross-release checking due to other
> developers complaining. It seemed to need a lot more work before it
> was ready.
> 
> So I suspect the patch is simply not relevant any more (even if it's
> not necessarily wrong either).

Awww ... I like the cross release stuff, it's catching lots of good bugs
for us - writing a gpu memory manager that's supposed to interface with
the core one ain't the simplest thing to get right. That's also why we're
going around and fixing up fallout (we've worked on the worker annotations
for 4.14 too). I guess next release, hopefully.

I think between 4.14 and 4.15 attemps cross-release spotted around 5 or so
genuine deadlocks in i915 code. And at least right now, with the current
set of fixups our CI runs are splat-free. So at least a Kconfig option
would be nice, to be able to enable it easily when you want to.

Wrt us not noticing: Getting the patches in through normal means takes too
long, so we constantly carry arounda  bunch of fixup patches to address
serious issues that block CI (no lockdep is no good CI). That's why we
won't immediately notice when an issue is resolved some other way.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-18 Thread Linus Torvalds
On Sun, Dec 17, 2017 at 11:11 PM, Daniel Vetter  wrote:
>
> This didn't seem to have made it into -rc4. Anything needed to get it
> going?

Do you actually see the problem in -rc4?

Because we ended up removing the cross-release checking due to other
developers complaining. It seemed to need a lot more work before it
was ready.

So I suspect the patch is simply not relevant any more (even if it's
not necessarily wrong either).

   Linus
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-17 Thread Daniel Vetter
On Mon, Dec 11, 2017 at 10:19:28AM +0100, Daniel Vetter wrote:
> On Fri, Dec 08, 2017 at 11:54:19AM +0100, Peter Zijlstra wrote:
> > On Thu, Dec 07, 2017 at 11:08:49AM +0100, Daniel Vetter wrote:
> > > Since -rc1 we're hitting a bunch of lockdep splats using the new
> > > cross-release stuff around the 2 kthread completions. In all cases
> > > they are because totally independent uses of kthread are mixed up by
> > > lockdep into the same locking class, creating artificial deadlocks.
> > > 
> > > Fix this by converting kthread code in the same way as e.g.
> > > alloc_workqueue already works: Use macros for the public api so we can
> > > have a callsite specific lockdep key, then pass that through the
> > > entire callchain. Due to the many entry points this is slightly
> > > tedious.
> > > 
> > > Cc: Tvrtko Ursulin 
> > > Cc: Marta Lofstedt 
> > > Cc: Byungchul Park 
> > > Cc: Ingo Molnar 
> > > Cc: Peter Zijlstra 
> > > Cc: Tejun Heo 
> > > Cc: Kees Cook 
> > > Cc: Thomas Gleixner 
> > > Cc: Shaohua Li 
> > > Cc: Andrew Morton 
> > > Cc: Jens Axboe 
> > > Cc: Daniel Vetter 
> > > Cc: Greg Kroah-Hartman 
> > > Cc: Jonathan Corbet 
> > > Cc: Oleg Nesterov 
> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=103950
> > > Signed-off-by: Daniel Vetter 
> > 
> > Acked-by: Peter Zijlstra (Intel) 
> 
> Who's going to pick this up? Ingo, Andrew?

This didn't seem to have made it into -rc4. Anything needed to get it
going?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-11 Thread Daniel Vetter
On Fri, Dec 08, 2017 at 11:54:19AM +0100, Peter Zijlstra wrote:
> On Thu, Dec 07, 2017 at 11:08:49AM +0100, Daniel Vetter wrote:
> > Since -rc1 we're hitting a bunch of lockdep splats using the new
> > cross-release stuff around the 2 kthread completions. In all cases
> > they are because totally independent uses of kthread are mixed up by
> > lockdep into the same locking class, creating artificial deadlocks.
> > 
> > Fix this by converting kthread code in the same way as e.g.
> > alloc_workqueue already works: Use macros for the public api so we can
> > have a callsite specific lockdep key, then pass that through the
> > entire callchain. Due to the many entry points this is slightly
> > tedious.
> > 
> > Cc: Tvrtko Ursulin 
> > Cc: Marta Lofstedt 
> > Cc: Byungchul Park 
> > Cc: Ingo Molnar 
> > Cc: Peter Zijlstra 
> > Cc: Tejun Heo 
> > Cc: Kees Cook 
> > Cc: Thomas Gleixner 
> > Cc: Shaohua Li 
> > Cc: Andrew Morton 
> > Cc: Jens Axboe 
> > Cc: Daniel Vetter 
> > Cc: Greg Kroah-Hartman 
> > Cc: Jonathan Corbet 
> > Cc: Oleg Nesterov 
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=103950
> > Signed-off-by: Daniel Vetter 
> 
> Acked-by: Peter Zijlstra (Intel) 

Who's going to pick this up? Ingo, Andrew?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-07 Thread Peter Zijlstra
On Thu, Dec 07, 2017 at 03:58:28PM +0100, Daniel Vetter wrote:

> [   85.069417] gem_exec_captur/2810 is trying to acquire lock:
> [   85.069419]  ((completion)>parked){+.+.}, at: [] 
> kthread_park+0x3d/0x50
> [   85.069426] 
>but task is already holding lock:
> [   85.069428]  (>struct_mutex){+.+.}, at: [] 
> i915_reset_device+0x1bd/0x230 [i915]
> [   85.069448] 
>which lock already depends on the new lock.
> 
> [   85.069451] 
>the existing dependency chain (in reverse order) is:
> [   85.069454] 
>-> #3 (>struct_mutex){+.+.}:
> [   85.069460]__mutex_lock+0x81/0x9b0
> [   85.069481]i915_mutex_lock_interruptible+0x47/0x130 [i915]
> [   85.069502]i915_gem_fault+0x201/0x760 [i915]
> [   85.069507]__do_fault+0x15/0x70
> [   85.069509]__handle_mm_fault+0x7bf/0xda0
> [   85.069512]handle_mm_fault+0x14f/0x2f0
> [   85.069515]__do_page_fault+0x2d1/0x560
> [   85.069518]page_fault+0x22/0x30
> [   85.069520] 
>-> #2 (>mmap_sem){}:
> [   85.069525]__might_fault+0x63/0x90
> [   85.069529]_copy_to_user+0x1e/0x70
> [   85.069532]perf_read+0x21d/0x290
> [   85.069534]__vfs_read+0x1e/0x120
> [   85.069536]vfs_read+0xa1/0x150
> [   85.069539]SyS_read+0x40/0xa0
> [   85.069541]entry_SYSCALL_64_fastpath+0x1c/0x89

>-> #0 ((completion)>parked){+.+.}:

> [   85.069692] Chain exists of:
>  (completion)>parked --> >mmap_sem --> 
> >struct_mutex

> [   85.069718] 3 locks held by gem_exec_captur/2810:

> [   85.069732]  #2:  (>struct_mutex){+.+.}, at: [] 
> i915_reset_device+0x1bd/0x230 [i915]

>stack backtrace:

> [   85.069788]  lock_acquire+0xaf/0x200
> [   85.069793]  wait_for_common+0x54/0x210
> [   85.069807]  kthread_park+0x3d/0x50
> [   85.069827]  i915_gem_reset_prepare_engine+0x1d/0x90 [i915]
> [   85.069849]  i915_gem_reset_prepare+0x2c/0x60 [i915]
> [   85.069865]  i915_reset+0x66/0x230 [i915]
> [   85.069881]  i915_reset_device+0x1cb/0x230 [i915]
> [   85.069919]  i915_handle_error+0x2d3/0x430 [i915]
> [   85.069951]  i915_wedged_set+0x79/0xc0 [i915]
> [   85.069955]  simple_attr_write+0xab/0xc0
> [   85.069959]  full_proxy_write+0x4b/0x70
> [   85.069961]  __vfs_write+0x1e/0x130
> [   85.069976]  vfs_write+0xc0/0x1b0
> [   85.069979]  SyS_write+0x40/0xa0
> [   85.069982]  entry_SYSCALL_64_fastpath+0x1c/0x89


Thread-Ak-Thread

i915_reset_device
#3mutex_lock(>struct_mutex)
  i915_reset()
i915_gem_reset_prepare()
  i915_gem_reset_prepare_engine()
kthread_park()

__do_page_fault()
#2down_read(>mmap_sem);
  handle_mm_fault()
__handle_mm_fault()
  __do_fault()
i915_gem_fault()
  
i915_mutex_lock_interruptible()
#3  
mutex_lock(>struct_mutex)

/* twiddles thumbs 
forever more */

#0wait_for_common()

#0  complete()


Is what it says I suppose. Now I don't know enough about that i915 code
to say if that breadcrumbs_signal thread can ever trigger a fault or
not. I got properly lost in that dma_fence callback maze.

You're saying not?


(also, that comment near need_resched() doesn't make sense to me)
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH] kthread: finer-grained lockdep/cross-release completion

2017-12-07 Thread Peter Zijlstra
On Thu, Dec 07, 2017 at 11:08:49AM +0100, Daniel Vetter wrote:
> Since -rc1 we're hitting a bunch of lockdep splats using the new
> cross-release stuff around the 2 kthread completions. In all cases
> they are because totally independent uses of kthread are mixed up by
> lockdep into the same locking class, creating artificial deadlocks.
> 
> Fix this by converting kthread code in the same way as e.g.
> alloc_workqueue already works: Use macros for the public api so we can
> have a callsite specific lockdep key, then pass that through the
> entire callchain. Due to the many entry points this is slightly
> tedious.

Do you have a splat somewhere? I'm having trouble seeing how all this
comes together. That is, I've no real idea what you're actual problem is
and if this is the right solution.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel