Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-30 Thread Michael Marineau
On Thu, Nov 27, 2014 at 12:33 AM, Maarten Lankhorst
 wrote:
> Hey,
>
> Op 27-11-14 om 02:18 schreef Tobias Klausmann:
>>
>>
>> On 26.11.2014 21:29, Michael Marineau wrote:
>>> On Mon, Nov 24, 2014 at 11:43 PM, Maarten Lankhorst
>>>  wrote:
 Hey,

 Op 22-11-14 om 21:16 schreef Michael Marineau:
> On Nov 22, 2014 11:45 AM, "Michael Marineau"  wrote:
>> On Nov 22, 2014 8:56 AM, "Maarten Lankhorst" <
> maarten.lankho...@canonical.com> wrote:
>>> Hey,
>>>
>>> Op 22-11-14 om 01:19 schreef Michael Marineau:
 On Thu, Nov 20, 2014 at 12:53 AM, Maarten Lankhorst
  wrote:
> Op 20-11-14 om 05:06 schreef Michael Marineau:
>> On Wed, Nov 19, 2014 at 12:10 AM, Maarten Lankhorst
>>  wrote:
>>> Hey,
>>>
>>> On 19-11-14 07:43, Michael Marineau wrote:
 On 3.18-rc kernels I have been intermittently experiencing GPU
 lockups shortly after startup, accompanied with one or both of the
 following errors:

 nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
 from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
 nouveau E[ DRM] GPU lockup - switching to software fbcon

 I was able to trace the issue with bisect to commit
 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
 fences for readable objects". The lockups appear to have cleared
> up
 since reverting that and a few related followup commits:

 809e9447: "drm/nouveau: use shared fences for readable objects"
 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
 e3be4c23: "drm/nouveau: specify if interruptible wait is desired
> in
 nouveau_fence_sync"
 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
>>> Weird. I'm not sure yet what causes it.
>>>
>>>
> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2
>> Building a kernel from that commit gives me an entirely new
> behavior:
>> X hangs for at least 10-20 seconds at a time with brief moments of
>> responsiveness before hanging again while gitk on the kernel repo
>> loads. Otherwise the system is responsive. The head of that
>> fixed-fences-for-bisect branch (1c6aafb5) which is the "use shared
>> fences for readable objects" commit I originally bisected to does
>> feature the complete lockups I was seeing before.
> Ok for the sake of argument let's just assume they're separate bugs,
> and we should look at xorg
> hanging first.
>
> Is there anything in the dmesg when the hanging happens?
>
> And it's probably 15 seconds, if it's called through
> nouveau_fence_wait.
> Try changing else if (!ret) to else if (WARN_ON(!ret)) in that
> function, and see if you get some dmesg spam. :)
 Adding the WARN_ON to 86be4f21 reports the following:

 [ 1188.676073] [ cut here ]
 [ 1188.676161] WARNING: CPU: 1 PID: 474 at
 drivers/gpu/drm/nouveau/nouveau_fence.c:359
 nouveau_fence_wait.part.9+0x33/0x40 [nouveau]()
 [ 1188.676166] Modules linked in: rndis_host cdc_ether usbnet mii bnep
 ecb btusb bluetooth rfkill bridge stp llc hid_generic usb_storage
 joydev mousedev hid_apple usbhid bcm5974 nls_iso8859_1 nls_cp437 vfat
 fat nouveau snd_hda_codec_hdmi coretemp x86_pkg_temp_thermal
 intel_powerclamp kvm_intel kvm iTCO_wdt crct10dif_pclmul
 iTCO_vendor_support crc32c_intel evdev aesni_intel mac_hid aes_x86_64
 lrw glue_helper ablk_helper applesmc snd_hda_codec_cirrus cryptd
 input_polldev snd_hda_codec_generic mxm_wmi led_class wmi microcode
 hwmon snd_hda_intel ttm snd_hda_controller lpc_ich i2c_i801 mfd_core
 snd_hda_codec i2c_algo_bit snd_hwdep drm_kms_helper snd_pcm sbs drm
 apple_gmux i2ccore snd_timer snd agpgart mei_me soundcore sbshc mei
 video xhci_hcd usbcore usb_common apple_bl button battery ac efivars
 autofs4
 [ 1188.676300]  efivarfs
 [ 1188.676308] CPU: 1 PID: 474 Comm: Xorg Tainted: GW
 3.17.0-rc2-nvtest+ #147
 [ 1188.676313] Hardware name: Apple Inc.
 MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS
 MBP112.88Z.0138.B11.1408291503 08/29/2014
 [ 1188.676316]  0009 88045daebce8 814f0c09
 
 [ 1188.676325]  88045daebd20 8104ea5d 88006a6c1468
 fff0
 [ 1188.676333]    88006a6c1000
 88045daebd30
 [ 1188.676341] Call Trace:
 [ 1188.676356]  [] dump_stack+

Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-28 Thread Ian Kumlien
Hi,

Sorry to butt in like this, but I'm suffering from the same kind of
deadlocks with nouveau...

The really odd thing is that I could boot some -rc6+ kernel without
problems, but it hung while playing video and then refused to start
properly again.


Anyway, to quote Maarten:
> Ok that most likely means the interrupt based wait is borked somehow,
> so let's find out why..
>
> I fear that this happens because of a race in the interface, so my
> first attempt will rule out abuse of the nvif api by nouveau_fence.c
>
> Can you test below patch with the default wait function?
> ---

I tried the patch, straight on Linus' master and it didn't change
anything for me :(
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-27 Thread Maarten Lankhorst
Hey,

Op 27-11-14 om 02:18 schreef Tobias Klausmann:
>
>
> On 26.11.2014 21:29, Michael Marineau wrote:
>> On Mon, Nov 24, 2014 at 11:43 PM, Maarten Lankhorst
>>  wrote:
>>> Hey,
>>>
>>> Op 22-11-14 om 21:16 schreef Michael Marineau:
 On Nov 22, 2014 11:45 AM, "Michael Marineau"  wrote:
> On Nov 22, 2014 8:56 AM, "Maarten Lankhorst" <
 maarten.lankho...@canonical.com> wrote:
>> Hey,
>>
>> Op 22-11-14 om 01:19 schreef Michael Marineau:
>>> On Thu, Nov 20, 2014 at 12:53 AM, Maarten Lankhorst
>>>  wrote:
 Op 20-11-14 om 05:06 schreef Michael Marineau:
> On Wed, Nov 19, 2014 at 12:10 AM, Maarten Lankhorst
>  wrote:
>> Hey,
>>
>> On 19-11-14 07:43, Michael Marineau wrote:
>>> On 3.18-rc kernels I have been intermittently experiencing GPU
>>> lockups shortly after startup, accompanied with one or both of the
>>> following errors:
>>>
>>> nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
>>> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
>>> nouveau E[ DRM] GPU lockup - switching to software fbcon
>>>
>>> I was able to trace the issue with bisect to commit
>>> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
>>> fences for readable objects". The lockups appear to have cleared
 up
>>> since reverting that and a few related followup commits:
>>>
>>> 809e9447: "drm/nouveau: use shared fences for readable objects"
>>> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
>>> e3be4c23: "drm/nouveau: specify if interruptible wait is desired
 in
>>> nouveau_fence_sync"
>>> 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
>> Weird. I'm not sure yet what causes it.
>>
>>
 http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2
> Building a kernel from that commit gives me an entirely new
 behavior:
> X hangs for at least 10-20 seconds at a time with brief moments of
> responsiveness before hanging again while gitk on the kernel repo
> loads. Otherwise the system is responsive. The head of that
> fixed-fences-for-bisect branch (1c6aafb5) which is the "use shared
> fences for readable objects" commit I originally bisected to does
> feature the complete lockups I was seeing before.
 Ok for the sake of argument let's just assume they're separate bugs,
 and we should look at xorg
 hanging first.

 Is there anything in the dmesg when the hanging happens?

 And it's probably 15 seconds, if it's called through
 nouveau_fence_wait.
 Try changing else if (!ret) to else if (WARN_ON(!ret)) in that
 function, and see if you get some dmesg spam. :)
>>> Adding the WARN_ON to 86be4f21 reports the following:
>>>
>>> [ 1188.676073] [ cut here ]
>>> [ 1188.676161] WARNING: CPU: 1 PID: 474 at
>>> drivers/gpu/drm/nouveau/nouveau_fence.c:359
>>> nouveau_fence_wait.part.9+0x33/0x40 [nouveau]()
>>> [ 1188.676166] Modules linked in: rndis_host cdc_ether usbnet mii bnep
>>> ecb btusb bluetooth rfkill bridge stp llc hid_generic usb_storage
>>> joydev mousedev hid_apple usbhid bcm5974 nls_iso8859_1 nls_cp437 vfat
>>> fat nouveau snd_hda_codec_hdmi coretemp x86_pkg_temp_thermal
>>> intel_powerclamp kvm_intel kvm iTCO_wdt crct10dif_pclmul
>>> iTCO_vendor_support crc32c_intel evdev aesni_intel mac_hid aes_x86_64
>>> lrw glue_helper ablk_helper applesmc snd_hda_codec_cirrus cryptd
>>> input_polldev snd_hda_codec_generic mxm_wmi led_class wmi microcode
>>> hwmon snd_hda_intel ttm snd_hda_controller lpc_ich i2c_i801 mfd_core
>>> snd_hda_codec i2c_algo_bit snd_hwdep drm_kms_helper snd_pcm sbs drm
>>> apple_gmux i2ccore snd_timer snd agpgart mei_me soundcore sbshc mei
>>> video xhci_hcd usbcore usb_common apple_bl button battery ac efivars
>>> autofs4
>>> [ 1188.676300]  efivarfs
>>> [ 1188.676308] CPU: 1 PID: 474 Comm: Xorg Tainted: GW
>>> 3.17.0-rc2-nvtest+ #147
>>> [ 1188.676313] Hardware name: Apple Inc.
>>> MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS
>>> MBP112.88Z.0138.B11.1408291503 08/29/2014
>>> [ 1188.676316]  0009 88045daebce8 814f0c09
>>> 
>>> [ 1188.676325]  88045daebd20 8104ea5d 88006a6c1468
>>> fff0
>>> [ 1188.676333]    88006a6c1000
>>> 88045daebd30
>>> [ 1188.676341] Call Trace:
>>> [ 1188.676356]  [] dump_stack+0x4d/0x66
>>> [ 1188.676369]  [] warn_slowpath_common+0x7d/0xa0
>>> [ 1188.676377]  [] warn_slowpath_null+0x1a/0x20
>>> [ 1188.676439]  []
>>> nou

Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-26 Thread Tobias Klausmann



On 26.11.2014 21:29, Michael Marineau wrote:

On Mon, Nov 24, 2014 at 11:43 PM, Maarten Lankhorst
 wrote:

Hey,

Op 22-11-14 om 21:16 schreef Michael Marineau:

On Nov 22, 2014 11:45 AM, "Michael Marineau"  wrote:

On Nov 22, 2014 8:56 AM, "Maarten Lankhorst" <

maarten.lankho...@canonical.com> wrote:

Hey,

Op 22-11-14 om 01:19 schreef Michael Marineau:

On Thu, Nov 20, 2014 at 12:53 AM, Maarten Lankhorst
 wrote:

Op 20-11-14 om 05:06 schreef Michael Marineau:

On Wed, Nov 19, 2014 at 12:10 AM, Maarten Lankhorst
 wrote:

Hey,

On 19-11-14 07:43, Michael Marineau wrote:

On 3.18-rc kernels I have been intermittently experiencing GPU
lockups shortly after startup, accompanied with one or both of the
following errors:

nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
nouveau E[ DRM] GPU lockup - switching to software fbcon

I was able to trace the issue with bisect to commit
809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
fences for readable objects". The lockups appear to have cleared

up

since reverting that and a few related followup commits:

809e9447: "drm/nouveau: use shared fences for readable objects"
055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
e3be4c23: "drm/nouveau: specify if interruptible wait is desired

in

nouveau_fence_sync"
15a996bb: "drm/nouveau: assign fence_chan->name correctly"

Weird. I'm not sure yet what causes it.



http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2

Building a kernel from that commit gives me an entirely new

behavior:

X hangs for at least 10-20 seconds at a time with brief moments of
responsiveness before hanging again while gitk on the kernel repo
loads. Otherwise the system is responsive. The head of that
fixed-fences-for-bisect branch (1c6aafb5) which is the "use shared
fences for readable objects" commit I originally bisected to does
feature the complete lockups I was seeing before.

Ok for the sake of argument let's just assume they're separate bugs,

and we should look at xorg

hanging first.

Is there anything in the dmesg when the hanging happens?

And it's probably 15 seconds, if it's called through

nouveau_fence_wait.

Try changing else if (!ret) to else if (WARN_ON(!ret)) in that

function, and see if you get some dmesg spam. :)

Adding the WARN_ON to 86be4f21 reports the following:

[ 1188.676073] [ cut here ]
[ 1188.676161] WARNING: CPU: 1 PID: 474 at
drivers/gpu/drm/nouveau/nouveau_fence.c:359
nouveau_fence_wait.part.9+0x33/0x40 [nouveau]()
[ 1188.676166] Modules linked in: rndis_host cdc_ether usbnet mii bnep
ecb btusb bluetooth rfkill bridge stp llc hid_generic usb_storage
joydev mousedev hid_apple usbhid bcm5974 nls_iso8859_1 nls_cp437 vfat
fat nouveau snd_hda_codec_hdmi coretemp x86_pkg_temp_thermal
intel_powerclamp kvm_intel kvm iTCO_wdt crct10dif_pclmul
iTCO_vendor_support crc32c_intel evdev aesni_intel mac_hid aes_x86_64
lrw glue_helper ablk_helper applesmc snd_hda_codec_cirrus cryptd
input_polldev snd_hda_codec_generic mxm_wmi led_class wmi microcode
hwmon snd_hda_intel ttm snd_hda_controller lpc_ich i2c_i801 mfd_core
snd_hda_codec i2c_algo_bit snd_hwdep drm_kms_helper snd_pcm sbs drm
apple_gmux i2ccore snd_timer snd agpgart mei_me soundcore sbshc mei
video xhci_hcd usbcore usb_common apple_bl button battery ac efivars
autofs4
[ 1188.676300]  efivarfs
[ 1188.676308] CPU: 1 PID: 474 Comm: Xorg Tainted: GW
3.17.0-rc2-nvtest+ #147
[ 1188.676313] Hardware name: Apple Inc.
MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS
MBP112.88Z.0138.B11.1408291503 08/29/2014
[ 1188.676316]  0009 88045daebce8 814f0c09

[ 1188.676325]  88045daebd20 8104ea5d 88006a6c1468
fff0
[ 1188.676333]    88006a6c1000
88045daebd30
[ 1188.676341] Call Trace:
[ 1188.676356]  [] dump_stack+0x4d/0x66
[ 1188.676369]  [] warn_slowpath_common+0x7d/0xa0
[ 1188.676377]  [] warn_slowpath_null+0x1a/0x20
[ 1188.676439]  []
nouveau_fence_wait.part.9+0x33/0x40 [nouveau]
[ 1188.676496]  [] nouveau_fence_wait+0x16/0x30

[nouveau]

[ 1188.676552]  []
nouveau_gem_ioctl_cpu_prep+0xef/0x1f0 [nouveau]
[ 1188.676578]  [] drm_ioctl+0x1ec/0x660 [drm]
[ 1188.676590]  [] ?

_raw_spin_unlock_irqrestore+0x36/0x70

[ 1188.676600]  [] ? trace_hardirqs_on+0xd/0x10
[ 1188.676655]  [] nouveau_drm_ioctl+0x54/0xc0

[nouveau]

[ 1188.676663]  [] do_vfs_ioctl+0x300/0x520
[ 1188.676671]  [] ? sysret_check+0x22/0x5d
[ 1188.676677]  [] SyS_ioctl+0x41/0x80
[ 1188.676683]  [] system_call_fastpath+0x16/0x1b
[ 1188.676688] ---[ end trace 6f7a510865b4674f ]---

Here are the fence events that fired during that particular

fence_wait:

 Xorg   474 [004]  1173.667645: fence:fence_wait_start:
driver=nouveau timeline=Xorg[474] context=2 seqno=56910
 Xorg   474 [004]  1173.667647: fence:f

Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-26 Thread Michael Marineau
On Mon, Nov 24, 2014 at 11:43 PM, Maarten Lankhorst
 wrote:
> Hey,
>
> Op 22-11-14 om 21:16 schreef Michael Marineau:
>> On Nov 22, 2014 11:45 AM, "Michael Marineau"  wrote:
>>>
>>> On Nov 22, 2014 8:56 AM, "Maarten Lankhorst" <
>> maarten.lankho...@canonical.com> wrote:
 Hey,

 Op 22-11-14 om 01:19 schreef Michael Marineau:
> On Thu, Nov 20, 2014 at 12:53 AM, Maarten Lankhorst
>  wrote:
>> Op 20-11-14 om 05:06 schreef Michael Marineau:
>>> On Wed, Nov 19, 2014 at 12:10 AM, Maarten Lankhorst
>>>  wrote:
 Hey,

 On 19-11-14 07:43, Michael Marineau wrote:
> On 3.18-rc kernels I have been intermittently experiencing GPU
> lockups shortly after startup, accompanied with one or both of the
> following errors:
>
> nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
> nouveau E[ DRM] GPU lockup - switching to software fbcon
>
> I was able to trace the issue with bisect to commit
> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
> fences for readable objects". The lockups appear to have cleared
>> up
> since reverting that and a few related followup commits:
>
> 809e9447: "drm/nouveau: use shared fences for readable objects"
> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
> e3be4c23: "drm/nouveau: specify if interruptible wait is desired
>> in
> nouveau_fence_sync"
> 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
 Weird. I'm not sure yet what causes it.


>> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2
>>> Building a kernel from that commit gives me an entirely new
>> behavior:
>>> X hangs for at least 10-20 seconds at a time with brief moments of
>>> responsiveness before hanging again while gitk on the kernel repo
>>> loads. Otherwise the system is responsive. The head of that
>>> fixed-fences-for-bisect branch (1c6aafb5) which is the "use shared
>>> fences for readable objects" commit I originally bisected to does
>>> feature the complete lockups I was seeing before.
>> Ok for the sake of argument let's just assume they're separate bugs,
>> and we should look at xorg
>> hanging first.
>>
>> Is there anything in the dmesg when the hanging happens?
>>
>> And it's probably 15 seconds, if it's called through
>> nouveau_fence_wait.
>> Try changing else if (!ret) to else if (WARN_ON(!ret)) in that
>> function, and see if you get some dmesg spam. :)
> Adding the WARN_ON to 86be4f21 reports the following:
>
> [ 1188.676073] [ cut here ]
> [ 1188.676161] WARNING: CPU: 1 PID: 474 at
> drivers/gpu/drm/nouveau/nouveau_fence.c:359
> nouveau_fence_wait.part.9+0x33/0x40 [nouveau]()
> [ 1188.676166] Modules linked in: rndis_host cdc_ether usbnet mii bnep
> ecb btusb bluetooth rfkill bridge stp llc hid_generic usb_storage
> joydev mousedev hid_apple usbhid bcm5974 nls_iso8859_1 nls_cp437 vfat
> fat nouveau snd_hda_codec_hdmi coretemp x86_pkg_temp_thermal
> intel_powerclamp kvm_intel kvm iTCO_wdt crct10dif_pclmul
> iTCO_vendor_support crc32c_intel evdev aesni_intel mac_hid aes_x86_64
> lrw glue_helper ablk_helper applesmc snd_hda_codec_cirrus cryptd
> input_polldev snd_hda_codec_generic mxm_wmi led_class wmi microcode
> hwmon snd_hda_intel ttm snd_hda_controller lpc_ich i2c_i801 mfd_core
> snd_hda_codec i2c_algo_bit snd_hwdep drm_kms_helper snd_pcm sbs drm
> apple_gmux i2ccore snd_timer snd agpgart mei_me soundcore sbshc mei
> video xhci_hcd usbcore usb_common apple_bl button battery ac efivars
> autofs4
> [ 1188.676300]  efivarfs
> [ 1188.676308] CPU: 1 PID: 474 Comm: Xorg Tainted: GW
> 3.17.0-rc2-nvtest+ #147
> [ 1188.676313] Hardware name: Apple Inc.
> MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS
> MBP112.88Z.0138.B11.1408291503 08/29/2014
> [ 1188.676316]  0009 88045daebce8 814f0c09
> 
> [ 1188.676325]  88045daebd20 8104ea5d 88006a6c1468
> fff0
> [ 1188.676333]    88006a6c1000
> 88045daebd30
> [ 1188.676341] Call Trace:
> [ 1188.676356]  [] dump_stack+0x4d/0x66
> [ 1188.676369]  [] warn_slowpath_common+0x7d/0xa0
> [ 1188.676377]  [] warn_slowpath_null+0x1a/0x20
> [ 1188.676439]  []
> nouveau_fence_wait.part.9+0x33/0x40 [nouveau]
> [ 1188.676496]  [] nouveau_fence_wait+0x16/0x30
>> [nouveau]
> [ 1188.676552]  []
> nouveau_gem_ioctl_cpu_prep+0xef/0x1f0 [nouveau]
> [ 1188.676578]  [] drm_ioctl+0x1ec/0x660 [drm]
> [ 1188.676590]  [] ?
>> _raw_spin_unlock_irqrestor

Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-24 Thread Maarten Lankhorst
Hey,

Op 22-11-14 om 21:16 schreef Michael Marineau:
> On Nov 22, 2014 11:45 AM, "Michael Marineau"  wrote:
>>
>> On Nov 22, 2014 8:56 AM, "Maarten Lankhorst" <
> maarten.lankho...@canonical.com> wrote:
>>> Hey,
>>>
>>> Op 22-11-14 om 01:19 schreef Michael Marineau:
 On Thu, Nov 20, 2014 at 12:53 AM, Maarten Lankhorst
  wrote:
> Op 20-11-14 om 05:06 schreef Michael Marineau:
>> On Wed, Nov 19, 2014 at 12:10 AM, Maarten Lankhorst
>>  wrote:
>>> Hey,
>>>
>>> On 19-11-14 07:43, Michael Marineau wrote:
 On 3.18-rc kernels I have been intermittently experiencing GPU
 lockups shortly after startup, accompanied with one or both of the
 following errors:

 nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
 from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
 nouveau E[ DRM] GPU lockup - switching to software fbcon

 I was able to trace the issue with bisect to commit
 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
 fences for readable objects". The lockups appear to have cleared
> up
 since reverting that and a few related followup commits:

 809e9447: "drm/nouveau: use shared fences for readable objects"
 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
 e3be4c23: "drm/nouveau: specify if interruptible wait is desired
> in
 nouveau_fence_sync"
 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
>>> Weird. I'm not sure yet what causes it.
>>>
>>>
> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2
>> Building a kernel from that commit gives me an entirely new
> behavior:
>> X hangs for at least 10-20 seconds at a time with brief moments of
>> responsiveness before hanging again while gitk on the kernel repo
>> loads. Otherwise the system is responsive. The head of that
>> fixed-fences-for-bisect branch (1c6aafb5) which is the "use shared
>> fences for readable objects" commit I originally bisected to does
>> feature the complete lockups I was seeing before.
> Ok for the sake of argument let's just assume they're separate bugs,
> and we should look at xorg
> hanging first.
>
> Is there anything in the dmesg when the hanging happens?
>
> And it's probably 15 seconds, if it's called through
> nouveau_fence_wait.
> Try changing else if (!ret) to else if (WARN_ON(!ret)) in that
> function, and see if you get some dmesg spam. :)
 Adding the WARN_ON to 86be4f21 reports the following:

 [ 1188.676073] [ cut here ]
 [ 1188.676161] WARNING: CPU: 1 PID: 474 at
 drivers/gpu/drm/nouveau/nouveau_fence.c:359
 nouveau_fence_wait.part.9+0x33/0x40 [nouveau]()
 [ 1188.676166] Modules linked in: rndis_host cdc_ether usbnet mii bnep
 ecb btusb bluetooth rfkill bridge stp llc hid_generic usb_storage
 joydev mousedev hid_apple usbhid bcm5974 nls_iso8859_1 nls_cp437 vfat
 fat nouveau snd_hda_codec_hdmi coretemp x86_pkg_temp_thermal
 intel_powerclamp kvm_intel kvm iTCO_wdt crct10dif_pclmul
 iTCO_vendor_support crc32c_intel evdev aesni_intel mac_hid aes_x86_64
 lrw glue_helper ablk_helper applesmc snd_hda_codec_cirrus cryptd
 input_polldev snd_hda_codec_generic mxm_wmi led_class wmi microcode
 hwmon snd_hda_intel ttm snd_hda_controller lpc_ich i2c_i801 mfd_core
 snd_hda_codec i2c_algo_bit snd_hwdep drm_kms_helper snd_pcm sbs drm
 apple_gmux i2ccore snd_timer snd agpgart mei_me soundcore sbshc mei
 video xhci_hcd usbcore usb_common apple_bl button battery ac efivars
 autofs4
 [ 1188.676300]  efivarfs
 [ 1188.676308] CPU: 1 PID: 474 Comm: Xorg Tainted: GW
 3.17.0-rc2-nvtest+ #147
 [ 1188.676313] Hardware name: Apple Inc.
 MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS
 MBP112.88Z.0138.B11.1408291503 08/29/2014
 [ 1188.676316]  0009 88045daebce8 814f0c09
 
 [ 1188.676325]  88045daebd20 8104ea5d 88006a6c1468
 fff0
 [ 1188.676333]    88006a6c1000
 88045daebd30
 [ 1188.676341] Call Trace:
 [ 1188.676356]  [] dump_stack+0x4d/0x66
 [ 1188.676369]  [] warn_slowpath_common+0x7d/0xa0
 [ 1188.676377]  [] warn_slowpath_null+0x1a/0x20
 [ 1188.676439]  []
 nouveau_fence_wait.part.9+0x33/0x40 [nouveau]
 [ 1188.676496]  [] nouveau_fence_wait+0x16/0x30
> [nouveau]
 [ 1188.676552]  []
 nouveau_gem_ioctl_cpu_prep+0xef/0x1f0 [nouveau]
 [ 1188.676578]  [] drm_ioctl+0x1ec/0x660 [drm]
 [ 1188.676590]  [] ?
> _raw_spin_unlock_irqrestore+0x36/0x70
 [ 1188.676600]  [] ? trace_hardirqs_on+0xd/0x10
 [ 1188.676655]  [] nouveau_drm_ioctl+0x54/0xc0
> [nouveau]
 [ 1188.676663]  [] do_vfs_ioctl+

Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-22 Thread Maarten Lankhorst
Hey,

Op 22-11-14 om 01:19 schreef Michael Marineau:
> On Thu, Nov 20, 2014 at 12:53 AM, Maarten Lankhorst
>  wrote:
>> Op 20-11-14 om 05:06 schreef Michael Marineau:
>>> On Wed, Nov 19, 2014 at 12:10 AM, Maarten Lankhorst
>>>  wrote:
>>>> Hey,
>>>>
>>>> On 19-11-14 07:43, Michael Marineau wrote:
>>>>> On 3.18-rc kernels I have been intermittently experiencing GPU
>>>>> lockups shortly after startup, accompanied with one or both of the
>>>>> following errors:
>>>>>
>>>>> nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
>>>>> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
>>>>> nouveau E[ DRM] GPU lockup - switching to software fbcon
>>>>>
>>>>> I was able to trace the issue with bisect to commit
>>>>> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
>>>>> fences for readable objects". The lockups appear to have cleared up
>>>>> since reverting that and a few related followup commits:
>>>>>
>>>>> 809e9447: "drm/nouveau: use shared fences for readable objects"
>>>>> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
>>>>> e3be4c23: "drm/nouveau: specify if interruptible wait is desired in
>>>>> nouveau_fence_sync"
>>>>> 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
>>>> Weird. I'm not sure yet what causes it.
>>>>
>>>> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2
>>> Building a kernel from that commit gives me an entirely new behavior:
>>> X hangs for at least 10-20 seconds at a time with brief moments of
>>> responsiveness before hanging again while gitk on the kernel repo
>>> loads. Otherwise the system is responsive. The head of that
>>> fixed-fences-for-bisect branch (1c6aafb5) which is the "use shared
>>> fences for readable objects" commit I originally bisected to does
>>> feature the complete lockups I was seeing before.
>> Ok for the sake of argument let's just assume they're separate bugs, and we
>> should look at xorg
>> hanging first.
>>
>> Is there anything in the dmesg when the hanging happens?
>>
>> And it's probably 15 seconds, if it's called through nouveau_fence_wait.
>>
>> Try changing else if (!ret) to else if (WARN_ON(!ret)) in that function, and 
>> see if you get some dmesg spam. :)
> Adding the WARN_ON to 86be4f21 reports the following:
>
> [ 1188.676073] [ cut here ]
> [ 1188.676161] WARNING: CPU: 1 PID: 474 at
> drivers/gpu/drm/nouveau/nouveau_fence.c:359
> nouveau_fence_wait.part.9+0x33/0x40 [nouveau]()
> [ 1188.676166] Modules linked in: rndis_host cdc_ether usbnet mii bnep
> ecb btusb bluetooth rfkill bridge stp llc hid_generic usb_storage
> joydev mousedev hid_apple usbhid bcm5974 nls_iso8859_1 nls_cp437 vfat
> fat nouveau snd_hda_codec_hdmi coretemp x86_pkg_temp_thermal
> intel_powerclamp kvm_intel kvm iTCO_wdt crct10dif_pclmul
> iTCO_vendor_support crc32c_intel evdev aesni_intel mac_hid aes_x86_64
> lrw glue_helper ablk_helper applesmc snd_hda_codec_cirrus cryptd
> input_polldev snd_hda_codec_generic mxm_wmi led_class wmi microcode
> hwmon snd_hda_intel ttm snd_hda_controller lpc_ich i2c_i801 mfd_core
> snd_hda_codec i2c_algo_bit snd_hwdep drm_kms_helper snd_pcm sbs drm
> apple_gmux i2ccore snd_timer snd agpgart mei_me soundcore sbshc mei
> video xhci_hcd usbcore usb_common apple_bl button battery ac efivars
> autofs4
> [ 1188.676300]  efivarfs
> [ 1188.676308] CPU: 1 PID: 474 Comm: Xorg Tainted: GW
> 3.17.0-rc2-nvtest+ #147
> [ 1188.676313] Hardware name: Apple Inc.
> MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS
> MBP112.88Z.0138.B11.1408291503 08/29/2014
> [ 1188.676316]  0009 88045daebce8 814f0c09
> 
> [ 1188.676325]  88045daebd20 8104ea5d 88006a6c1468
> fff0
> [ 1188.676333]    88006a6c1000
> 88045daebd30
> [ 1188.676341] Call Trace:
> [ 1188.676356]  [] dump_stack+0x4d/0x66
> [ 1188.676369]  [] warn_slowpath_common+0x7d/0xa0
> [ 1188.676377]  [] warn_slowpath_null+0x1a/0x20
> [ 1188.676439]  []
> nouveau_fence_wait.part.9+0x33/0x40 [nouveau]
> [ 1188.676496]  [] nouveau_fence_wait+0x16/0x30 [nouveau]
> [ 1188.676552]  []
> nouveau_gem_ioctl_cpu_prep+0xef/0x1f0 [nouveau]
> [ 1188.676578]  [] drm_ioctl+0x1ec/0x660 [drm]
> [ 1188.676590]  [] ? _raw_spin_unlock_irqrestore+0x36/0x70
> [ 1188.676600]  [] ? trace_hardirqs_on+0xd/0x10
> [ 1188.676655]  [] nouveau_drm_ioctl+0x54/0xc0 [nouveau]
> [ 1188.676663]  [] do_vfs_ioctl+0x300/0x520
> [ 1188.676671]  [] ? sysret_check+0x22/0x5d
> [ 1188.676677]  [] SyS_ioctl+0x41/0x80
> [ 1188.676683]  [] system_call_fastpath+0x16/0x1b
> [ 1188.676688] ---[ end trace 6f7a510865b4674f ]---
>
> Here are the fence events that fired during that particular fence_wait:
> Xorg   474 [004]  1173.667645: fence:fence_wait_start:
> driver=nouveau timeline=Xorg[474] context=2 seqno=56910
> Xorg   474 [004]  1173.667647: fence:fence_enable_signal:
> drive

Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-20 Thread Maarten Lankhorst
Op 20-11-14 om 05:06 schreef Michael Marineau:
> On Wed, Nov 19, 2014 at 12:10 AM, Maarten Lankhorst
>  wrote:
>> Hey,
>>
>> On 19-11-14 07:43, Michael Marineau wrote:
>>> On 3.18-rc kernels I have been intermittently experiencing GPU
>>> lockups shortly after startup, accompanied with one or both of the
>>> following errors:
>>>
>>> nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
>>> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
>>> nouveau E[ DRM] GPU lockup - switching to software fbcon
>>>
>>> I was able to trace the issue with bisect to commit
>>> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
>>> fences for readable objects". The lockups appear to have cleared up
>>> since reverting that and a few related followup commits:
>>>
>>> 809e9447: "drm/nouveau: use shared fences for readable objects"
>>> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
>>> e3be4c23: "drm/nouveau: specify if interruptible wait is desired in
>>> nouveau_fence_sync"
>>> 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
>> Weird. I'm not sure yet what causes it.
>>
>> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2
> Building a kernel from that commit gives me an entirely new behavior:
> X hangs for at least 10-20 seconds at a time with brief moments of
> responsiveness before hanging again while gitk on the kernel repo
> loads. Otherwise the system is responsive. The head of that
> fixed-fences-for-bisect branch (1c6aafb5) which is the "use shared
> fences for readable objects" commit I originally bisected to does
> feature the complete lockups I was seeing before.
Ok for the sake of argument let's just assume they're separate bugs, and we
should look at xorg hanging first.

Is there anything in the dmesg when the hanging happens?

And it's probably 15 seconds, if it's called through nouveau_fence_wait.

Try changing else if (!ret) to else if (WARN_ON(!ret)) in that function, and 
see if you get some dmesg spam. :)
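
For illustration, the suggested change can be sketched in user-space C.
This is only a sketch under assumptions: WARN_ON here is a stand-in macro
rather than the kernel's, fence_wait_result is a hypothetical helper, and
the -EBUSY return on timeout is assumed rather than taken from this thread.
It shows how wrapping the !ret test in WARN_ON makes a timed-out wait
(ret == 0) leave a trace in the log while the branch behaves as before.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Number of warnings emitted so far, so a caller can check the timeout path. */
static int warn_count;

/* User-space stand-in for WARN_ON(): logs when cond is true, returns cond. */
static int warn_on_impl(int cond, const char *expr, const char *file, int line)
{
    if (cond) {
        warn_count++;
        fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
    }
    return cond;
}
#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

/*
 * Hypothetical helper mirroring the branch discussed above.
 * ret models the result of a timed wait:
 *   < 0  error, 0  timed out, > 0  time remaining.
 * The -EBUSY value returned on timeout is an assumption.
 */
static long fence_wait_result(long ret)
{
    if (ret < 0)
        return ret;
    else if (WARN_ON(!ret))   /* was: else if (!ret) */
        return -EBUSY;
    return 0;
}
```

Calling fence_wait_result(0) takes the timeout branch and prints one
warning; error and success results pass through silently.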


>> On the EDITED patch from fixed-fences-for-bisect, can you do the following:
>>
>> In nouveau/nv84_fence.c function nv84_fence_context_new, remove
>>
>> fctx->base.sequence = nv84_fence_read(chan);
>>
>> and add back
>>
>> nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x);
> Making your suggested change on top of each 86be4f21 and 1c6aafb5 made
> no noticeable difference in either of the two behaviors.
>
>> If that fails you should compile your kernel with trace events, to get some 
>> debugging info from the fences. I'll post debugging info if this does not 
>> fix it.
> Happy to gather whatever debug log or tracing data you need :)
>



Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-20 Thread Maarten Lankhorst
Op 20-11-14 om 00:08 schreef Tobias Klausmann:
> On 19.11.2014 09:10, Maarten Lankhorst wrote:
>> ...
>> On the EDITED patch from fixed-fences-for-bisect, can you do the following:
>>
>> In nouveau/nv84_fence.c function nv84_fence_context_new, remove
>>
>> fctx->base.sequence = nv84_fence_read(chan);
>>
>> and add back
>>
>> nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x);
>>
>> ...
>
> Added the above on top of your "fixed-fences-for-bisect" branch and guessed 
> it would work, but did not :/
> Anyway, as this "initializes" the fence to a known state, maybe you should 
> consider pushing that.
Hey,

There is a reason I don't set it to a known state on nv84+.

Channel 2 is created, fence seqno ends up being 100, other channel waits on 
seqno reaching 100.
Channel 2 is destroyed, and immediately recreated. Seqno is reset to 0.
Other channel waits for channel 2's seqno being 100.

The other channel can keep waiting indefinitely.

I guess it might be useful to reset the contents of the bo to zero on 
allocation, but it should not be done in fence_context_new.

~Maarten



Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-19 Thread Michael Marineau
On Wed, Nov 19, 2014 at 12:10 AM, Maarten Lankhorst wrote:
> Hey,
>
> On 19-11-14 07:43, Michael Marineau wrote:
>> On 3.18-rc kernel's I have been intermittently experiencing GPU
>> lockups shortly after startup, accompanied with one or both of the
>> following errors:
>>
>> nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
>> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
>> nouveau E[ DRM] GPU lockup - switching to software fbcon
>>
>> I was able to trace the issue with bisect to commit
>> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
>> fences for readable objects". The lockups appear to have cleared up
>> since reverting that and a few related followup commits:
>>
>> 809e9447: "drm/nouveau: use shared fences for readable objects"
>> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
>> e3be4c23: "drm/nouveau: specify if interruptible wait is desired in
>> nouveau_fence_sync"
>> 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
>
> Weird. I'm not sure yet what causes it.
>
> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2

Building a kernel from that commit gives me an entirely new behavior:
X hangs for at least 10-20 seconds at a time with brief moments of
responsiveness before hanging again while gitk on the kernel repo
loads. Otherwise the system is responsive. The head of that
fixed-fences-for-bisect branch (1c6aafb5) which is the "use shared
fences for readable objects" commit I originally bisected to does
feature the complete lockups I was seeing before.

>
> On the EDITED patch from fixed-fences-for-bisect, can you do the following:
>
> In nouveau/nv84_fence.c function nv84_fence_context_new, remove
>
> fctx->base.sequence = nv84_fence_read(chan);
>
> and add back
>
> nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x);

Making your suggested change on top of each of 86be4f21 and 1c6aafb5 made
no noticeable difference in either of the two behaviors.

>
> If that fails you should compile your kernel with trace events, to get some 
> debugging info from the fences. I'll post debugging info if this does not fix 
> it.

Happy to gather whatever debug log or tracing data you need :)

-- 
Michael Marineau


Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-19 Thread Tobias Klausmann

On 19.11.2014 09:10, Maarten Lankhorst wrote:

> ...
> On the EDITED patch from fixed-fences-for-bisect, can you do the following:
>
> In nouveau/nv84_fence.c function nv84_fence_context_new, remove
>
> fctx->base.sequence = nv84_fence_read(chan);
>
> and add back
>
> nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x);
>
> ...


Added the above on top of your "fixed-fences-for-bisect" branch and
expected it to work, but it did not :/
Anyway, as this "initializes" the fence to a known state, maybe you
should consider pushing that.


Going to compile the kernel with trace events (let's see how) ...

Tobias


Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-19 Thread Tobias Klausmann

On 19.11.2014 09:10, Maarten Lankhorst wrote:

> Hey,
>
> On 19-11-14 07:43, Michael Marineau wrote:
>> On 3.18-rc kernel's I have been intermittently experiencing GPU
>> lockups shortly after startup, accompanied with one or both of the
>> following errors:
>>
>> nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
>> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
>> nouveau E[ DRM] GPU lockup - switching to software fbcon
>>
>> I was able to trace the issue with bisect to commit
>> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
>> fences for readable objects". The lockups appear to have cleared up
>> since reverting that and a few related followup commits:
>>
>> 809e9447: "drm/nouveau: use shared fences for readable objects"
>> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
>> e3be4c23: "drm/nouveau: specify if interruptible wait is desired in
>> nouveau_fence_sync"
>> 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
>
> Weird. I'm not sure yet what causes it.
>
> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2
>
> On the EDITED patch from fixed-fences-for-bisect, can you do the following:
>
> In nouveau/nv84_fence.c function nv84_fence_context_new, remove
>
> fctx->base.sequence = nv84_fence_read(chan);
>
> and add back
>
> nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x);
>
> If that fails you should compile your kernel with trace events, to get some
> debugging info from the fences. I'll post debugging info if this does not fix
> it.
>
> ~Maarten


Hey,
as mentioned on IRC, the new fencing hangs my GPU for a while as well (NVE7).
Bisected back to 86be4f216bbb9ea3339843a5658d4c21162c7ee2, EDITED, from
the fixed-fences-for-bisect branch mentioned above.

The original bisect on Linus' branch brought me to:
29ba89b2371d466ca68973525816cf10debc2655
drm/nouveau: rework to new fence interface

Michael, if you are going to bisect the "fixed-fences-for-bisect" branch,
maybe take a closer look if you come anywhere near that commit, and check
whether or not it triggers the GPU hangs for you!


Tobias


Re: 3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-19 Thread Maarten Lankhorst
Hey,

On 19-11-14 07:43, Michael Marineau wrote:
> On 3.18-rc kernel's I have been intermittently experiencing GPU
> lockups shortly after startup, accompanied with one or both of the
> following errors:
> 
> nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
> nouveau E[ DRM] GPU lockup - switching to software fbcon
> 
> I was able to trace the issue with bisect to commit
> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
> fences for readable objects". The lockups appear to have cleared up
> since reverting that and a few related followup commits:
> 
> 809e9447: "drm/nouveau: use shared fences for readable objects"
> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
> e3be4c23: "drm/nouveau: specify if interruptible wait is desired in
> nouveau_fence_sync"
> 15a996bb: "drm/nouveau: assign fence_chan->name correctly"

Weird. I'm not sure yet what causes it.

http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2

On the EDITED patch from fixed-fences-for-bisect, can you do the following:

In nouveau/nv84_fence.c function nv84_fence_context_new, remove

fctx->base.sequence = nv84_fence_read(chan);

and add back

nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x);

If that fails you should compile your kernel with trace events, to get some 
debugging info from the fences. I'll post debugging info if this does not fix 
it.

~Maarten


3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-18 Thread Michael Marineau
On 3.18-rc kernels I have been intermittently experiencing GPU
lockups shortly after startup, accompanied by one or both of the
following errors:

nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
nouveau E[ DRM] GPU lockup - switching to software fbcon

I was able to trace the issue with bisect to commit
809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
fences for readable objects". The lockups appear to have cleared up
since reverting that and a few related followup commits:

809e9447: "drm/nouveau: use shared fences for readable objects"
055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
e3be4c23: "drm/nouveau: specify if interruptible wait is desired in
nouveau_fence_sync"
15a996bb: "drm/nouveau: assign fence_chan->name correctly"

For reference here is what the driver reports about my hardware:
nouveau :01:00.0: enabling device (0006 -> 0007)
nouveau  [  DEVICE][:01:00.0] BOOT0  : 0x0e7290a2
nouveau  [  DEVICE][:01:00.0] Chipset: GK107 (NVE7)
nouveau  [  DEVICE][:01:00.0] Family : NVE0
nouveau  [   VBIOS][:01:00.0] checking PRAMIN for image...
nouveau  [   VBIOS][:01:00.0] ... appears to be valid
nouveau  [   VBIOS][:01:00.0] using image from PRAMIN
nouveau  [   VBIOS][:01:00.0] BIT signature found
nouveau  [   VBIOS][:01:00.0] version 80.07.c7.04.01
nouveau :01:00.0: irq 39 for MSI/MSI-X
nouveau  [ PMC][:01:00.0] MSI interrupts enabled
nouveau  [ PFB][:01:00.0] RAM type: GDDR5
nouveau  [ PFB][:01:00.0] RAM size: 2048 MiB
nouveau  [ PFB][:01:00.0]ZCOMP: 0 tags
nouveau  [  PTHERM][:01:00.0] FAN control: none / external
nouveau  [  PTHERM][:01:00.0] fan management: automatic
nouveau  [  PTHERM][:01:00.0] internal sensor: yes
nouveau  [ CLK][:01:00.0] 07: core 270-405 MHz memory 838 MHz
nouveau  [ CLK][:01:00.0] 0a: core 270-925 MHz memory 1560 MHz
nouveau  [ CLK][:01:00.0] 0e: core 270-925 MHz memory 4000 MHz
nouveau  [ CLK][:01:00.0] 0f: core 270-925 MHz memory 5016 MHz
nouveau  [ CLK][:01:00.0] --: core 405 MHz memory 680 MHz
nouveau  [ DRM] VRAM: 2048 MiB
nouveau  [ DRM] GART: 1048576 MiB
nouveau  [ DRM] TMDS table version 2.0
nouveau  [ DRM] DCB version 4.0
nouveau  [ DRM] DCB outp 00: 04810fb6 0f230010
nouveau  [ DRM] DCB outp 01: 01821fd6 0f420020
nouveau  [ DRM] DCB outp 02: 01021f12 00020020
nouveau  [ DRM] DCB outp 03: 08832fc6 0f420010
nouveau  [ DRM] DCB outp 04: 08032f02 00020010
nouveau  [ DRM] DCB outp 05: 02843f62 00020010
nouveau  [ DRM] DCB conn 00: 00020047
nouveau  [ DRM] DCB conn 01: 02208146
nouveau  [ DRM] DCB conn 02: 01104246
nouveau  [ DRM] DCB conn 03: 00410361
nouveau  [ DRM] MM: using COPY for buffer copies
nouveau  [ DRM] allocated 2880x1800 fb: 0x8, bo 88046b26f800

-- 
Michael Marineau