Re: [Intel-gfx] Oops with i915

2018-06-18 Thread Ville Syrjälä
On Mon, Jun 18, 2018 at 01:29:02PM +0100, Sudip Mukherjee wrote:
> Hi Ville,
> 
> On Mon, Jun 18, 2018 at 03:09:15PM +0300, Ville Syrjälä wrote:
> > On Thu, Jun 07, 2018 at 11:06:33AM +0100, Sudip Mukherjee wrote:
> > > Hi All,
> > > 
> > > We are running the v4.14.47 kernel, and recently in one of our test cycles
> > > we saw the trace below. I know this is not the usual way to raise a
> > > BUG report, but since this was seen only once in one of the automated
> > > test cycles, I do not have anything else apart from this trace.
> > > Is this a known issue? I would appreciate any help in understanding what
> > > the problem might be.
> > > 
> > > [ 1176.909543] BUG: unable to handle kernel paging request at 8298fb0a
> > > [ 1176.916565] IP: queued_spin_lock_slowpath+0xfc/0x142
> > > [ 1176.922111] *pdpt = 3367a001 *pde = 
> > > [ 1176.928534] Oops: 0002 [#1] PREEMPT SMP
> > > [ 1177.002434] CPU: 2 PID: 24688 Comm: kworker/u8:4 Tainted: G U O 4.14.47-20180606-a6b8390e8cc1de032b8314d1a5b193fe9e21f325 #1
> > > [ 1177.024120] Workqueue: events_unbound intel_atomic_commit_work
> > > [ 1177.030630] task: ef2ee200 task.stack: efbf4000
> > > [ 1177.035685] EIP: queued_spin_lock_slowpath+0xfc/0x142
> > > [ 1177.041327] EFLAGS: 00010087 CPU: 2
> > > [ 1177.045212] EAX: 8298fb0a EBX: 3ba0 ECX: ee82489c EDX: f4656fc0
> > > [ 1177.052215] ESI: 000c EDI: 0001 EBP: efbf5e88 ESP: efbf5e78
> > > [ 1177.059217]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> > > [ 1177.065239] CR0: 80050033 CR2: 8298fb0a CR3: 2e8ed320 CR4: 001006f0
> > > [ 1177.072240] Call Trace:
> > > [ 1177.074973]  _raw_spin_lock_irqsave+0x28/0x2d
> > > [ 1177.079840]  complete_all+0x12/0x36
> > > [ 1177.083737]  drm_atomic_helper_commit_hw_done+0x3c/0x43
> > > [ 1177.089576]  intel_atomic_commit_tail+0xa5f/0xbd9
> > > [ 1177.094832]  ? wait_woken+0x5a/0x5a
> > > [ 1177.098727]  ? wait_woken+0x5a/0x5a
> > > [ 1177.102622]  intel_atomic_commit_work+0xb/0xd
> > > [ 1177.107489]  ? intel_atomic_commit_work+0xb/0xd
> > > [ 1177.112551]  process_one_work+0x109/0x1ee
> > > [ 1177.117029]  worker_thread+0x1a4/0x257
> > > [ 1177.121215]  kthread+0xee/0xf3
> > > [ 1177.124625]  ? rescuer_thread+0x207/0x207
> > > [ 1177.129103]  ? kthread_create_on_node+0x1a/0x1a
> > > [ 1177.134165]  ret_from_fork+0x2e/0x38
> > > [ 1177.138156] Code: 12 09 de 89 f0 89 75 f0 c1 e8 10 66 87 41 02 89 c3 c1 e3 10 74 51 83 e0 03 c1 eb 12 6b c0 0c 05 c0 1f 7e c1 03 04 9d d8 b1 6c c1 <89> 10 8b 42 04 85 c0 75 04 f3 90 eb f5 8b 1a 85 db 74 03 0f 0d
> > > [ 1177.159204] EIP: queued_spin_lock_slowpath+0xfc/0x142 SS:ESP: 0068:efbf5e78
> > > [ 1177.166983] CR2: 8298fb0a
> > 
> > Presumably a use after free in atomic. Possibly 21a01abbe32a
> > ("drm/atomic: Fix freeing connector/plane state too early by tracking
> > commits, v3.") But there may have been other similar fixes.
> 
> Thanks for your reply. I also thought so, as the stack trace showed it was
> using invalid memory for the old_state. So I applied:
> 21a01abbe32a ("drm/atomic: Fix freeing connector/plane state too early by tracking commits, v3.")
> on top of v4.14.47. It also needed:
> 1) f46640b931e5 ("drm/atomic: Return commit in drm_crtc_commit_get for better annotation")
> 2) 163bcc2c74a2 ("drm/atomic: Move drm_crtc_commit to drm_crtc_state, v4.")
> 
> to apply cleanly. But after that the occurrence rate increased.
> Did I miss something else?

No idea. I suggest a reverse bisect to find out when it got fixed
upstream.

-- 
Ville Syrjälä
Intel
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] Oops with i915

2018-06-18 Thread Sudip Mukherjee
Hi Ville,

On Mon, Jun 18, 2018 at 03:09:15PM +0300, Ville Syrjälä wrote:
> On Thu, Jun 07, 2018 at 11:06:33AM +0100, Sudip Mukherjee wrote:
> > Hi All,
> > 
> > We are running the v4.14.47 kernel, and recently in one of our test cycles
> > we saw the trace below. I know this is not the usual way to raise a
> > BUG report, but since this was seen only once in one of the automated
> > test cycles, I do not have anything else apart from this trace.
> > Is this a known issue? I would appreciate any help in understanding what
> > the problem might be.
> > 
> > [ 1176.909543] BUG: unable to handle kernel paging request at 8298fb0a
> > [ 1176.916565] IP: queued_spin_lock_slowpath+0xfc/0x142
> > [ 1176.922111] *pdpt = 3367a001 *pde = 
> > [ 1176.928534] Oops: 0002 [#1] PREEMPT SMP
> > [ 1177.002434] CPU: 2 PID: 24688 Comm: kworker/u8:4 Tainted: G U O 4.14.47-20180606-a6b8390e8cc1de032b8314d1a5b193fe9e21f325 #1
> > [ 1177.024120] Workqueue: events_unbound intel_atomic_commit_work
> > [ 1177.030630] task: ef2ee200 task.stack: efbf4000
> > [ 1177.035685] EIP: queued_spin_lock_slowpath+0xfc/0x142
> > [ 1177.041327] EFLAGS: 00010087 CPU: 2
> > [ 1177.045212] EAX: 8298fb0a EBX: 3ba0 ECX: ee82489c EDX: f4656fc0
> > [ 1177.052215] ESI: 000c EDI: 0001 EBP: efbf5e88 ESP: efbf5e78
> > [ 1177.059217]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> > [ 1177.065239] CR0: 80050033 CR2: 8298fb0a CR3: 2e8ed320 CR4: 001006f0
> > [ 1177.072240] Call Trace:
> > [ 1177.074973]  _raw_spin_lock_irqsave+0x28/0x2d
> > [ 1177.079840]  complete_all+0x12/0x36
> > [ 1177.083737]  drm_atomic_helper_commit_hw_done+0x3c/0x43
> > [ 1177.089576]  intel_atomic_commit_tail+0xa5f/0xbd9
> > [ 1177.094832]  ? wait_woken+0x5a/0x5a
> > [ 1177.098727]  ? wait_woken+0x5a/0x5a
> > [ 1177.102622]  intel_atomic_commit_work+0xb/0xd
> > [ 1177.107489]  ? intel_atomic_commit_work+0xb/0xd
> > [ 1177.112551]  process_one_work+0x109/0x1ee
> > [ 1177.117029]  worker_thread+0x1a4/0x257
> > [ 1177.121215]  kthread+0xee/0xf3
> > [ 1177.124625]  ? rescuer_thread+0x207/0x207
> > [ 1177.129103]  ? kthread_create_on_node+0x1a/0x1a
> > [ 1177.134165]  ret_from_fork+0x2e/0x38
> > [ 1177.138156] Code: 12 09 de 89 f0 89 75 f0 c1 e8 10 66 87 41 02 89 c3 c1 e3 10 74 51 83 e0 03 c1 eb 12 6b c0 0c 05 c0 1f 7e c1 03 04 9d d8 b1 6c c1 <89> 10 8b 42 04 85 c0 75 04 f3 90 eb f5 8b 1a 85 db 74 03 0f 0d
> > [ 1177.159204] EIP: queued_spin_lock_slowpath+0xfc/0x142 SS:ESP: 0068:efbf5e78
> > [ 1177.166983] CR2: 8298fb0a
> 
> Presumably a use after free in atomic. Possibly 21a01abbe32a
> ("drm/atomic: Fix freeing connector/plane state too early by tracking
> commits, v3.") But there may have been other similar fixes.

Thanks for your reply. I also thought so, as the stack trace showed it was
using invalid memory for the old_state. So I applied:
21a01abbe32a ("drm/atomic: Fix freeing connector/plane state too early by tracking commits, v3.")
on top of v4.14.47. It also needed:
1) f46640b931e5 ("drm/atomic: Return commit in drm_crtc_commit_get for better annotation")
2) 163bcc2c74a2 ("drm/atomic: Move drm_crtc_commit to drm_crtc_state, v4.")

to apply cleanly. But after that the occurrence rate increased.
Did I miss something else?
I would appreciate your help in finding a fix for this.

--
Regards
Sudip


Re: [Intel-gfx] Oops with i915

2018-06-18 Thread Ville Syrjälä
On Thu, Jun 07, 2018 at 11:06:33AM +0100, Sudip Mukherjee wrote:
> Hi All,
> 
> We are running the v4.14.47 kernel, and recently in one of our test cycles
> we saw the trace below. I know this is not the usual way to raise a
> BUG report, but since this was seen only once in one of the automated
> test cycles, I do not have anything else apart from this trace.
> Is this a known issue? I would appreciate any help in understanding what
> the problem might be.
> 
> [ 1176.909543] BUG: unable to handle kernel paging request at 8298fb0a
> [ 1176.916565] IP: queued_spin_lock_slowpath+0xfc/0x142
> [ 1176.922111] *pdpt = 3367a001 *pde = 
> [ 1176.928534] Oops: 0002 [#1] PREEMPT SMP
> [ 1177.002434] CPU: 2 PID: 24688 Comm: kworker/u8:4 Tainted: G U O 4.14.47-20180606-a6b8390e8cc1de032b8314d1a5b193fe9e21f325 #1
> [ 1177.024120] Workqueue: events_unbound intel_atomic_commit_work
> [ 1177.030630] task: ef2ee200 task.stack: efbf4000
> [ 1177.035685] EIP: queued_spin_lock_slowpath+0xfc/0x142
> [ 1177.041327] EFLAGS: 00010087 CPU: 2
> [ 1177.045212] EAX: 8298fb0a EBX: 3ba0 ECX: ee82489c EDX: f4656fc0
> [ 1177.052215] ESI: 000c EDI: 0001 EBP: efbf5e88 ESP: efbf5e78
> [ 1177.059217]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> [ 1177.065239] CR0: 80050033 CR2: 8298fb0a CR3: 2e8ed320 CR4: 001006f0
> [ 1177.072240] Call Trace:
> [ 1177.074973]  _raw_spin_lock_irqsave+0x28/0x2d
> [ 1177.079840]  complete_all+0x12/0x36
> [ 1177.083737]  drm_atomic_helper_commit_hw_done+0x3c/0x43
> [ 1177.089576]  intel_atomic_commit_tail+0xa5f/0xbd9
> [ 1177.094832]  ? wait_woken+0x5a/0x5a
> [ 1177.098727]  ? wait_woken+0x5a/0x5a
> [ 1177.102622]  intel_atomic_commit_work+0xb/0xd
> [ 1177.107489]  ? intel_atomic_commit_work+0xb/0xd
> [ 1177.112551]  process_one_work+0x109/0x1ee
> [ 1177.117029]  worker_thread+0x1a4/0x257
> [ 1177.121215]  kthread+0xee/0xf3
> [ 1177.124625]  ? rescuer_thread+0x207/0x207
> [ 1177.129103]  ? kthread_create_on_node+0x1a/0x1a
> [ 1177.134165]  ret_from_fork+0x2e/0x38
> [ 1177.138156] Code: 12 09 de 89 f0 89 75 f0 c1 e8 10 66 87 41 02 89 c3 c1 e3 10 74 51 83 e0 03 c1 eb 12 6b c0 0c 05 c0 1f 7e c1 03 04 9d d8 b1 6c c1 <89> 10 8b 42 04 85 c0 75 04 f3 90 eb f5 8b 1a 85 db 74 03 0f 0d
> [ 1177.159204] EIP: queued_spin_lock_slowpath+0xfc/0x142 SS:ESP: 0068:efbf5e78
> [ 1177.166983] CR2: 8298fb0a

Presumably a use after free in atomic. Possibly 21a01abbe32a
("drm/atomic: Fix freeing connector/plane state too early by tracking
commits, v3.") But there may have been other similar fixes.

-- 
Ville Syrjälä
Intel


Re: [Intel-gfx] Oops with i915

2018-06-18 Thread Sudip Mukherjee
On Thu, Jun 07, 2018 at 11:06:33AM +0100, Sudip Mukherjee wrote:
> Hi All,
> 
> We are running the v4.14.47 kernel, and recently in one of our test cycles
> we saw the trace below. I know this is not the usual way to raise a
> BUG report, but since this was seen only once in one of the automated
> test cycles, I do not have anything else apart from this trace.
> Is this a known issue? I would appreciate any help in understanding what
> the problem might be.
> 
> [ 1176.909543] BUG: unable to handle kernel paging request at 8298fb0a
> [ 1176.916565] IP: queued_spin_lock_slowpath+0xfc/0x142
> [ 1176.922111] *pdpt = 3367a001 *pde = 
> [ 1176.928534] Oops: 0002 [#1] PREEMPT SMP
> [ 1177.002434] CPU: 2 PID: 24688 Comm: kworker/u8:4 Tainted: G U O 4.14.47-20180606-a6b8390e8cc1de032b8314d1a5b193fe9e21f325 #1
> [ 1177.024120] Workqueue: events_unbound intel_atomic_commit_work
> [ 1177.030630] task: ef2ee200 task.stack: efbf4000
> [ 1177.035685] EIP: queued_spin_lock_slowpath+0xfc/0x142
> [ 1177.041327] EFLAGS: 00010087 CPU: 2
> [ 1177.045212] EAX: 8298fb0a EBX: 3ba0 ECX: ee82489c EDX: f4656fc0
> [ 1177.052215] ESI: 000c EDI: 0001 EBP: efbf5e88 ESP: efbf5e78
> [ 1177.059217]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> [ 1177.065239] CR0: 80050033 CR2: 8298fb0a CR3: 2e8ed320 CR4: 001006f0
> [ 1177.072240] Call Trace:
> [ 1177.074973]  _raw_spin_lock_irqsave+0x28/0x2d
> [ 1177.079840]  complete_all+0x12/0x36
> [ 1177.083737]  drm_atomic_helper_commit_hw_done+0x3c/0x43
> [ 1177.089576]  intel_atomic_commit_tail+0xa5f/0xbd9
> [ 1177.094832]  ? wait_woken+0x5a/0x5a
> [ 1177.098727]  ? wait_woken+0x5a/0x5a
> [ 1177.102622]  intel_atomic_commit_work+0xb/0xd
> [ 1177.107489]  ? intel_atomic_commit_work+0xb/0xd
> [ 1177.112551]  process_one_work+0x109/0x1ee
> [ 1177.117029]  worker_thread+0x1a4/0x257
> [ 1177.121215]  kthread+0xee/0xf3
> [ 1177.124625]  ? rescuer_thread+0x207/0x207
> [ 1177.129103]  ? kthread_create_on_node+0x1a/0x1a
> [ 1177.134165]  ret_from_fork+0x2e/0x38
> [ 1177.138156] Code: 12 09 de 89 f0 89 75 f0 c1 e8 10 66 87 41 02 89 c3 c1 e3 10 74 51 83 e0 03 c1 eb 12 6b c0 0c 05 c0 1f 7e c1 03 04 9d d8 b1 6c c1 <89> 10 8b 42 04 85 c0 75 04 f3 90 eb f5 8b 1a 85 db 74 03 0f 0d
> [ 1177.159204] EIP: queued_spin_lock_slowpath+0xfc/0x142 SS:ESP: 0068:efbf5e78
> [ 1177.166983] CR2: 8298fb0a

A gentle ping on this issue. Can anyone please help me with this?

--
Regards
Sudip


[Intel-gfx] Oops with i915

2018-06-07 Thread Sudip Mukherjee
Hi All,

We are running the v4.14.47 kernel, and recently in one of our test cycles
we saw the trace below. I know this is not the usual way to raise a
BUG report, but since this was seen only once in one of the automated
test cycles, I do not have anything else apart from this trace.
Is this a known issue? I would appreciate any help in understanding what
the problem might be.

[ 1176.909543] BUG: unable to handle kernel paging request at 8298fb0a
[ 1176.916565] IP: queued_spin_lock_slowpath+0xfc/0x142
[ 1176.922111] *pdpt = 3367a001 *pde = 
[ 1176.928534] Oops: 0002 [#1] PREEMPT SMP
[ 1177.002434] CPU: 2 PID: 24688 Comm: kworker/u8:4 Tainted: G U O 4.14.47-20180606-a6b8390e8cc1de032b8314d1a5b193fe9e21f325 #1
[ 1177.024120] Workqueue: events_unbound intel_atomic_commit_work
[ 1177.030630] task: ef2ee200 task.stack: efbf4000
[ 1177.035685] EIP: queued_spin_lock_slowpath+0xfc/0x142
[ 1177.041327] EFLAGS: 00010087 CPU: 2
[ 1177.045212] EAX: 8298fb0a EBX: 3ba0 ECX: ee82489c EDX: f4656fc0
[ 1177.052215] ESI: 000c EDI: 0001 EBP: efbf5e88 ESP: efbf5e78
[ 1177.059217]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
[ 1177.065239] CR0: 80050033 CR2: 8298fb0a CR3: 2e8ed320 CR4: 001006f0
[ 1177.072240] Call Trace:
[ 1177.074973]  _raw_spin_lock_irqsave+0x28/0x2d
[ 1177.079840]  complete_all+0x12/0x36
[ 1177.083737]  drm_atomic_helper_commit_hw_done+0x3c/0x43
[ 1177.089576]  intel_atomic_commit_tail+0xa5f/0xbd9
[ 1177.094832]  ? wait_woken+0x5a/0x5a
[ 1177.098727]  ? wait_woken+0x5a/0x5a
[ 1177.102622]  intel_atomic_commit_work+0xb/0xd
[ 1177.107489]  ? intel_atomic_commit_work+0xb/0xd
[ 1177.112551]  process_one_work+0x109/0x1ee
[ 1177.117029]  worker_thread+0x1a4/0x257
[ 1177.121215]  kthread+0xee/0xf3
[ 1177.124625]  ? rescuer_thread+0x207/0x207
[ 1177.129103]  ? kthread_create_on_node+0x1a/0x1a
[ 1177.134165]  ret_from_fork+0x2e/0x38
[ 1177.138156] Code: 12 09 de 89 f0 89 75 f0 c1 e8 10 66 87 41 02 89 c3 c1 e3 10 74 51 83 e0 03 c1 eb 12 6b c0 0c 05 c0 1f 7e c1 03 04 9d d8 b1 6c c1 <89> 10 8b 42 04 85 c0 75 04 f3 90 eb f5 8b 1a 85 db 74 03 0f 0d
[ 1177.159204] EIP: queued_spin_lock_slowpath+0xfc/0x142 SS:ESP: 0068:efbf5e78
[ 1177.166983] CR2: 8298fb0a
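
For anyone decoding the trace above: the value after "Oops:" is the x86 page-fault error code, and its low bits describe the access that faulted. What follows is a minimal userspace sketch, not kernel code. The names pf_info, decode_pf_error and pf_access_name are made up for illustration; only the bit meanings come from the architecture. For 0002 it reports a write to a not-present page from kernel mode, which agrees with the faulting instruction bytes <89> 10 (a store through EAX) and with EAX matching CR2.

```c
#include <stdbool.h>

/* x86 page-fault error code bits (architectural meaning). */
#define PF_BIT_PROT  (1u << 0) /* 0: page not present, 1: protection fault */
#define PF_BIT_WRITE (1u << 1) /* 0: read access, 1: write access */
#define PF_BIT_USER  (1u << 2) /* 0: kernel mode, 1: user mode */
#define PF_BIT_RSVD  (1u << 3) /* reserved bit set in a paging entry */
#define PF_BIT_INSTR (1u << 4) /* fault happened on an instruction fetch */

/* Hypothetical helper type: one flag per decoded bit. */
struct pf_info {
	bool protection;
	bool write;
	bool user;
	bool reserved;
	bool instr_fetch;
};

/* Split an error code such as the 0x0002 from "Oops: 0002" into flags. */
struct pf_info decode_pf_error(unsigned int code)
{
	struct pf_info info;

	info.protection  = (code & PF_BIT_PROT) != 0;
	info.write       = (code & PF_BIT_WRITE) != 0;
	info.user        = (code & PF_BIT_USER) != 0;
	info.reserved    = (code & PF_BIT_RSVD) != 0;
	info.instr_fetch = (code & PF_BIT_INSTR) != 0;
	return info;
}

/* Short human-readable name for the kind of access that faulted. */
const char *pf_access_name(const struct pf_info *info)
{
	if (info->instr_fetch)
		return "instruction fetch";
	return info->write ? "write" : "read";
}
```

Calling decode_pf_error(0x0002) and passing the result to pf_access_name() gives "write", with the protection and user flags both clear.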


--
Regards
Sudip


Re: [Intel-gfx] [OOPS] drm/i915/execlists: Remove too-early assert

2018-02-16 Thread Chris Wilson
Quoting Chris Wilson (2018-02-16 15:32:10)
> We can't assert that the execlists are active before we set the flag. So
> perform the assert after we are expected to have marked the execlists
> active.
> 
> Fixes: 339ccd35b42c ("drm/i915: Assert that we always complete a submission to guc/execlists")
> Signed-off-by: Chris Wilson 
> Cc: Michał Winiarski 
> Cc: Mika Kuoppala 

From irc,
Acked-by: Tomi Sarvela 
-Chris


[Intel-gfx] [OOPS] drm/i915/execlists: Remove too-early assert

2018-02-16 Thread Chris Wilson
We can't assert that the execlists are active before we set the flag. So
perform the assert after we are expected to have marked the execlists
active.

Fixes: 339ccd35b42c ("drm/i915: Assert that we always complete a submission to guc/execlists")
Signed-off-by: Chris Wilson 
Cc: Michał Winiarski 
Cc: Mika Kuoppala 
---
 drivers/gpu/drm/i915/intel_lrc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6fbe1a8a37ad..9b6d781b22ec 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -644,8 +644,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		port_assign(port, last);
 
 	/* We must always keep the beast fed if we have work piled up */
-	GEM_BUG_ON(port_isset(execlists->port) &&
-		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
 	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
 
 unlock:
@@ -655,6 +653,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		execlists_set_active(execlists, EXECLISTS_ACTIVE_USER);
 		execlists_submit_ports(engine);
 	}
+
+	GEM_BUG_ON(port_isset(execlists->port) &&
+		   !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER));
 }
 
 void
-- 
2.16.1
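
The ordering problem described in the commit message can be modelled outside the driver. The following is a minimal userspace sketch, assuming a two-flag stand-in for the execlists state. struct model_execlists, model_invariant_holds and model_dequeue are hypothetical names and not i915 code. It records whether the invariant held at the old assert location (before the flag is set) and asserts it only at the new location.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the execlists state; not the i915 structures. */
struct model_execlists {
	bool port_set;    /* models port_isset(execlists->port) */
	bool active_user; /* models the EXECLISTS_ACTIVE_USER flag */
};

/* The invariant being asserted: a set port implies the active flag. */
bool model_invariant_holds(const struct model_execlists *el)
{
	return !el->port_set || el->active_user;
}

/*
 * Models the patched ordering. Returns whether the invariant held at the
 * old assert location, i.e. before the active flag had been set.
 */
bool model_dequeue(struct model_execlists *el, bool submit)
{
	bool held_early;

	if (submit)
		el->port_set = true; /* models port_assign() */

	/* Old assert location: the flag has not been set yet. */
	held_early = model_invariant_holds(el);

	if (submit)
		el->active_user = true; /* models execlists_set_active() */

	/* New assert location: after the flag is expected to be set. */
	assert(model_invariant_holds(el));

	return held_early;
}
```

With a zero-initialised state and submit set to true, model_dequeue() returns false: the check would have failed at the old location even though the state is consistent once the function finishes.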
