Re: [PATCH wq#for-linus] drm: fix a fallout from slow-work -> wq conversion

2018-06-19 Thread Markus Trippelsdorf
On Mon, Aug 09, 2010 at 12:00:49PM +0200, Tejun Heo wrote:
> Commit 991ea75c (drm: use workqueue instead of slow-work), which made
> drm to use wq instead of slow-work, didn't account for the return
> value difference between delayed_slow_work_enqueue() and
> queue_delayed_work().  The former returns 0 on success and -errno on
> failures while the latter never fails and only uses the return value
> to indicate whether the work was already pending or not.
> 
> This misconversion triggered spurious error messages.  Remove the now
> unnecessary return value check and error message.
> 
> Signed-off-by: Tejun Heo 
> Reported-by: Markus Trippelsdorf 
> Cc: David Airlie 
> Cc: dri-devel@lists.freedesktop.org
> ---
> Markus, it's almost trivial but it would be great if you can test this
> one too.

Looks good, but drm_kms_helper_poll_enable needs the same treatment.


diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c
index 4598130..b9e4dbf 100644
--- a/drivers/gpu/drm/drm_crtc_helper.c
+++ b/drivers/gpu/drm/drm_crtc_helper.c
@@ -839,7 +839,6 @@ static void output_poll_execute(struct work_struct *work)
 	struct drm_connector *connector;
 	enum drm_connector_status old_status, status;
 	bool repoll = false, changed = false;
-	int ret;
 
 	mutex_lock(&dev->mode_config.mutex);
 	list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
@@ -874,11 +873,8 @@ static void output_poll_execute(struct work_struct *work)
 		dev->mode_config.funcs->output_poll_changed(dev);
 	}
 
-	if (repoll) {
-		ret = queue_delayed_work(system_nrt_wq, delayed_work, DRM_OUTPUT_POLL_PERIOD);
-		if (ret)
-			DRM_ERROR("delayed enqueue failed %d\n", ret);
-	}
+	if (repoll)
+		queue_delayed_work(system_nrt_wq, delayed_work, DRM_OUTPUT_POLL_PERIOD);
 }
 
 void drm_kms_helper_poll_disable(struct drm_device *dev)
@@ -893,18 +889,14 @@ void drm_kms_helper_poll_enable(struct drm_device *dev)
 {
 	bool poll = false;
 	struct drm_connector *connector;
-	int ret;
 
 	list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
 		if (connector->polled)
 			poll = true;
 	}
 
-	if (poll) {
-		ret = queue_delayed_work(system_nrt_wq, &dev->mode_config.output_poll_work, DRM_OUTPUT_POLL_PERIOD);
-		if (ret)
-			DRM_ERROR("delayed enqueue failed %d\n", ret);
-	}
+	if (poll)
+		queue_delayed_work(system_nrt_wq, &dev->mode_config.output_poll_work, DRM_OUTPUT_POLL_PERIOD);
 }
 EXPORT_SYMBOL(drm_kms_helper_poll_enable);
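
For readers following along, the return-value mismatch described in the commit message can be modelled in plain userspace C. This is only a sketch: the `mock_*` names are hypothetical stand-ins, not kernel APIs, and the "nonzero means newly queued" polarity is an assumption about the workqueue API of that era, consistent with the spurious error reported above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a delayed work item. */
struct mock_work {
	bool pending;
};

/* Slow-work style convention: 0 on success, -errno on failure. */
static int mock_slow_work_enqueue(struct mock_work *w)
{
	if (!w)
		return -EINVAL;
	w->pending = true;
	return 0;
}

/*
 * Workqueue style convention (assumed polarity): never fails,
 * returns nonzero if the work was newly queued and 0 if it was
 * already pending.
 */
static int mock_queue_delayed_work(struct mock_work *w)
{
	if (w->pending)
		return 0;
	w->pending = true;
	return 1;
}

/* The misconverted check: any nonzero return is treated as an error. */
static bool misconverted_check_reports_error(struct mock_work *w)
{
	int ret = mock_queue_delayed_work(w);

	return ret != 0;
}
```

With this model, the first successful enqueue is the call that gets reported as a failure, which is why dropping the check is the right fix rather than inverting it.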
 
-- 
»A man who doesn't know he is in prison can never escape.«
William S. Burroughs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [git pull] drm fixes for 4.10-rc6 (just missed rc5 tagging :-)

2017-01-25 Thread Markus Trippelsdorf
On 2017.01.23 at 09:38 +1000, Dave Airlie wrote:
> 
> Alex Deucher (8):
>   drm/radeon/si: load special ucode for certain MC configs
>   drm/amdgpu/si: load special ucode for certain MC configs
>   drm/amdgpu: drop oland quirks
>   drm/amdgpu: drop the mclk quirk for hainan
>   drm/radeon: drop oland quirks
>   drm/radeon: drop the mclk quirk for hainan
>   drm/radeon: add support for new hainan variants
>   drm/amdgpu: add support for new hainan variants

Since the merge I get the following warning during boot:

[2.463532] [drm] Initialized
[2.463576] [drm] radeon kernel modesetting enabled.
[2.463788] [drm] initializing kernel modesetting (RS780 0x1002:0x9614 
0x1043:0x834D 0x00).
[2.463830] [drm] register mmio base: 0xFBEE
[2.463867] [drm] register mmio size: 65536
[2.464429] ATOM BIOS: 113
[2.464481] radeon :01:05.0: VRAM: 128M 0xC000 - 
0xC7FF (128M used)
[2.464531] radeon :01:05.0: GTT: 512M 0xA000 - 
0xBFFF
[2.464573] [drm] Detected VRAM RAM=128M, BAR=128M
[2.464610] [drm] RAM width 32bits DDR
[2.464698] [TTM] Zone  kernel: Available graphics memory: 4079298 kiB
[2.464736] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[2.464775] [TTM] Initializing pool allocator
[2.464815] [TTM] Initializing DMA pool allocator
[2.464869] [drm] radeon: 128M of VRAM memory ready
[2.464906] [drm] radeon: 512M of GTT memory ready.
[2.464951] [drm] Loading RS780 Microcode
[2.464993] [drm] radeon: power management initialized
[2.465033] [drm] GART: num cpu pages 131072, num gpu pages 131072
[2.476534] [drm] PCIE GART of 512M enabled (table at 0xC004).
[2.476617] radeon :01:05.0: WB enabled
[2.476656] radeon :01:05.0: fence driver on ring 0 use gpu addr 
0xac00 and cpu addr 0x880215c8fc00
[2.476707] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[2.476745] [drm] Driver supports precise vblank timestamp query.
[2.476783] radeon :01:05.0: radeon: MSI limited to 32-bit
[2.476833] [drm] radeon: irq initialized.
[2.509088] [drm] ring test on 0 succeeded in 1 usecs
[2.509395] [drm] ib test on ring 0 succeeded in 0 usecs
[2.509594] [drm] Radeon Display Connectors
[2.509632] [drm] Connector 0:
[2.509669] [drm]   VGA-1
[2.509706] [drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 
0x7e4c
[2.509744] [drm]   Encoders:
[2.509781] [drm] CRT1: INTERNAL_KLDSCP_DAC1
[2.509818] [drm] Connector 1:
[2.509855] [drm]   DVI-D-1
[2.509892] [drm]   HPD3
[2.509929] [drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 
0x7e5c
[2.509967] [drm]   Encoders:
[2.510004] [drm] DFP3: INTERNAL_KLDSCP_LVTMA
[2.556637] [drm] fb mappable at 0xF0141000
[2.556675] [drm] vram apper at 0xF000
[2.556712] [drm] size 8294400
[2.556749] [drm] fb depth is 24
[2.556786] [drm]pitch is 7680
[2.556871] fbcon: radeondrmfb (fb0) is primary device
[2.602802] Console: switching to colour frame buffer device 135x120
[2.610664] radeon :01:05.0: fb0: radeondrmfb frame buffer device
[2.627020] ------------[ cut here ]------------
[2.627043] WARNING: CPU: 0 PID: 1 at ./include/drm/drm_crtc.h:857 drm_kms_helper_poll_init+0x127/0x140
[2.627090] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc5-00107-g883af14e67e8-dirty #84
[2.627130] Hardware name: System manufacturer System Product Name/M4A78T-E, BIOS 3503 04/13/2011
[2.627172] Call Trace:
[2.627181]  ? dump_stack+0x46/0x64
[2.627194]  ? drm_kms_helper_poll_init+0x127/0x140
[2.627214]  ? warn_slowpath_null+0x5a/0xd1
[2.627230]  ? drm_kms_helper_poll_init+0x127/0x140
[2.627250]  ? radeon_modeset_init+0x48d/0x9a0
[2.627268]  ? radeon_driver_load_kms+0x13d/0x220
[2.627286]  ? drm_dev_register+0x31a/0x380
[2.627302]  ? drm_get_pci_dev+0x94/0x1e0
[2.627318]  ? pci_device_probe+0x81/0x100
[2.627334]  ? driver_probe_device+0x2d4/0x480
[2.627352]  ? __driver_attach+0xd1/0xe0
[2.627367]  ? driver_probe_device+0x480/0x480
[2.627384]  ? bus_for_each_dev+0x55/0xa0
[2.627399]  ? bus_add_driver+0x189/0x220
[2.627414]  ? driver_register+0x78/0x100
[2.627430]  ? ttm_init+0x5b/0x5b
[2.627442]  ? do_one_initcall+0x8c/0x122
[2.627457]  ? set_debug_rodata+0xc/0xc
[2.627471]  ? kernel_init_freeable+0x117/0x198
[2.627489]  ? rest_init+0x80/0x80
[2.627501]  ? kernel_init+0x6/0x240
[2.627513]  ? rest_init+0x80/0x80
[2.627526]  ? ret_from_fork+0x23/0x30
[2.627540] ---[ end trace 3cb8d4a331963460 ]---
[2.627608] [drm] Initialized radeon 2.48.0 20080528 for :01:05.0 on minor 0


-- 
Markus

commit a481daa88fd (drm/radeon: always apply pci shutdown callbacks) breaks reboot

2016-10-12 Thread Markus Trippelsdorf
Since:

commit a481daa88fd4d6b54f25348972bba10b5f6a84d0
Author: Alex Deucher 
Date:   Thu Sep 22 14:43:50 2016 -0400

drm/radeon: always apply pci shutdown callbacks

We can't properly detect all hypervisors and we
need this to properly tear down the hardware.

I cannot reboot my machine anymore. Before reboot the monitor goes blank and
the machine stays in that state until I press the reset button.

Hardware is RS780.

-- 
Markus


[no subject]

2014-09-08 Thread Markus Trippelsdorf
On 2014.09.07 at 23:47 -0400, Alex Deucher wrote:
> On Sun, Sep 7, 2014 at 9:24 AM, Markus Trippelsdorf
>  wrote:
> > On 2014.08.25 at 11:10 +0200, Christian König wrote:
> >> Let me know if it works for you, cause we don't really have any hardware
> >> any more to test it.
> >
> > I've tested your patch series today (using drm-next-3.18 from
> > ~agd5f/linux) on a RS780D/Radeon HD 3300 system with a couple of H264
> > videos. While it sometimes works as expected, it stalled the GPU far too
> > often to be usable. The stalls are not recoverable and the machine ends
> > up with a black screen, but still accepts SysRq keyboard inputs.
> 
> 
> Does it work any better if dpm is disabled?

Unfortunately no. The symptoms are unchanged.

-- 
Markus


[no subject]

2014-09-07 Thread Markus Trippelsdorf
On 2014.08.25 at 11:10 +0200, Christian König wrote:
> Let me know if it works for you, cause we don't really have any hardware 
> any more to test it.

I've tested your patch series today (using drm-next-3.18 from
~agd5f/linux) on a RS780D/Radeon HD 3300 system with a couple of H264
videos. While it sometimes works as expected, it stalled the GPU far too
often to be usable. The stalls are not recoverable and the machine ends
up with a black screen, but still accepts SysRq keyboard inputs.

Here are some logs:

vdpauinfo:
display: :0   screen: 0
API version: 1
Information string: G3DVL VDPAU Driver Shared Library version 1.0

Video surface:

name   width height types
-------------------------------------------
420     8192   8192  NV12 YV12
422     8192   8192  UYVY YUYV
444     8192   8192  Y8U8V8A8 V8U8Y8A8

Decoder capabilities:

name               level macbs width height
-------------------------------------------
MPEG1                  0  9216  2048   1152
MPEG2_SIMPLE           3  9216  2048   1152
MPEG2_MAIN             3  9216  2048   1152
H264_BASELINE         41  9216  2048   1152
H264_MAIN             41  9216  2048   1152
H264_HIGH             41  9216  2048   1152
VC1_ADVANCED           4  9216  2048   1152

Output surface:

name              width height nat types
------------------------------------------------------------------------
B8G8R8A8           8192   8192   y NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8
R8G8B8A8           8192   8192   y NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8
R10G10B10A2        8192   8192   y NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8
B10G10R10A2        8192   8192   y NV12 YV12 UYVY YUYV Y8U8V8A8 V8U8Y8A8

Bitmap surface:

name              width height
------------------------------
B8G8R8A8           8192   8192
R8G8B8A8           8192   8192
R10G10B10A2        8192   8192
B10G10R10A2        8192   8192
A8                 8192   8192

Video mixer:

feature name                    sup
------------------------------------
DEINTERLACE_TEMPORAL             y
DEINTERLACE_TEMPORAL_SPATIAL     -
INVERSE_TELECINE                 -
NOISE_REDUCTION                  y
SHARPNESS                        y
LUMA_KEY                         -
HIGH QUALITY SCALING - L1        -
HIGH QUALITY SCALING - L2        -
HIGH QUALITY SCALING - L3        -
HIGH QUALITY SCALING - L4        -
HIGH QUALITY SCALING - L5        -
HIGH QUALITY SCALING - L6        -
HIGH QUALITY SCALING - L7        -
HIGH QUALITY SCALING - L8        -
HIGH QUALITY SCALING - L9        -

parameter name                  sup      min      max
-----------------------------------------------------
VIDEO_SURFACE_WIDTH              y        48     2048
VIDEO_SURFACE_HEIGHT             y        48     1152
CHROMA_TYPE                      y
LAYERS                           y         0        4

attribute name                  sup      min      max
-----------------------------------------------------
BACKGROUND_COLOR                 y
CSC_MATRIX                       y
NOISE_REDUCTION_LEVEL            y      0.00     1.00
SHARPNESS_LEVEL                  y     -1.00     1.00
LUMA_KEY_MIN_LUMA                y
LUMA_KEY_MAX_LUMA                y


Sep  7 14:03:45 x4 kernel: [drm] Initialized drm 1.1.0 20060810
Sep  7 14:03:45 x4 kernel: [drm] radeon kernel modesetting enabled.
Sep  7 14:03:45 x4 kernel: [drm] initializing kernel modesetting (RS780 
0x1002:0x9614 0x1043:0x834D).
Sep  7 14:03:45 x4 kernel: [drm] register mmio base: 0xFBEE
Sep  7 14:03:45 x4 kernel: [drm] register mmio size: 65536
Sep  7 14:03:45 x4 kernel: ATOM BIOS: 113
Sep  7 14:03:45 x4 kernel: radeon :01:05.0: VRAM: 128M 0xC000 - 
0xC7FF (128M used)
Sep  7 14:03:45 x4 kernel: radeon :01:05.0: GTT: 512M 0xA000 - 
0xBFFF
Sep  7 14:03:45 x4 kernel: [drm] Detected VRAM RAM=128M, BAR=128M
Sep  7 14:03:45 x4 kernel: [drm] RAM width 32bits DDR
Sep  7 14:03:45 x4 kernel: [TTM] Zone  kernel: Available graphics memory: 
4083350 kiB
Sep  7 14:03:45 x4 kernel: [TTM] Zone   dma32: Available graphics memory: 
2097152 kiB
Sep  7 14:03:45 x4 kernel: [TTM] Initializing pool allocator
Sep  7 14:03:45 x4 kernel: [TTM] Initializing DMA pool allocator
Sep  7 14:03:45 x4 kernel: [drm] radeon: 128M of VRAM memory ready
Sep  7 14:03:45 x4 kernel: [drm] radeon: 512M of GTT memory ready.
Sep  7 14:03:45 x4 kernel: [drm] Loading RS780 Microcode
Sep  7 14:03:45 x4 kernel: == power state 0 ==
Sep  7 14:03:45 x4 kernel:  ui class: none
Sep  7 14:03:45 x4 kernel:  internal class: boot 
Sep  7 14:03:45 x4 kernel:  caps: video 
Sep  7 14:03:45 x4 kernel:  uvdvclk: 0 dclk: 0
Sep  7 14:03:45 x4 kernel:  power level 0sclk: 5 
vddc_index: 2
Sep  7 14:03:45 x4 kernel:  power level 1sclk: 5 
vddc_index: 2
Sep  7 14:03:45 x4 kernel:  status: c r b 
Sep  7 14:03:45 x4 kernel: == power state 1 ==
Sep  7 14:03:45 x4 kernel:  ui class: performance
Sep  7 14:03:45 x4 kernel:  internal class: none
Sep  7 

[git pull] drm fixes

2014-04-20 Thread Markus Trippelsdorf
On 2014.04.20 at 17:23 +0200, Christian König wrote:
> Dropping Linus and LKML for now.
> 
> Does the attached patch help? If not please open up a bug, tracking all 
> those logs in mails can become quite painful.

Yes, this patch is fine. Everything is working fine again.

Thanks.
-- 
Markus


[git pull] drm fixes

2014-04-20 Thread Markus Trippelsdorf
On 2014.04.20 at 11:56 +0200, Markus Trippelsdorf wrote:
> On 2014.04.20 at 10:27 +0200, Christian König wrote:
> > > I did and it doesn't fix the issue.
> > In this case please open up a new bugreport on 
> > https://bugs.freedesktop.org/enter_bug.cgi?product=DRI and select 
> > DRM/Radeon as component.
> > 
> > Additional to that please provide a dmesg output generated with 
> > drm.debug=0xE for both the good and the bad case.
> 
> Good vs. bad:

Sorry it is actually bad vs. good:

> -[drm:radeon_compute_pll_avivo] 146250 - 14439, pll dividers - fb: 2047.0 
> ref: 29, post 7
> +[drm:radeon_compute_pll_avivo] 146250 - 14955, pll dividers - fb: 1023.5 
> ref: 14, post 7
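
The two debug lines can be cross-checked with the usual PLL relation, output = reference * feedback / (reference divider * post divider). The sketch below is an illustration under stated assumptions, not the driver's code: it assumes the second number in each log line is the resulting dot clock in 10 kHz units, and it assumes a reference clock of 14320 kHz, a value that does not appear in the logs and was picked only because it reproduces both logged results. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Dot clock in 10 kHz units computed from PLL dividers.
 * fb_div_x10 is the feedback divider times ten, so 1023.5 is
 * passed as 10235. ref_khz is the reference clock in kHz
 * (an assumed input, not taken from the logs).
 */
static uint32_t pll_dot_clock_10khz(uint32_t ref_khz, uint32_t fb_div_x10,
				    uint32_t ref_div, uint32_t post_div)
{
	uint64_t khz = (uint64_t)ref_khz * fb_div_x10 /
		       (10ULL * ref_div * post_div);

	return (uint32_t)(khz / 10);
}
```

Under those assumptions, `pll_dot_clock_10khz(14320, 20470, 29, 7)` gives 14439 for the "-" line and `pll_dot_clock_10khz(14320, 10235, 14, 7)` gives 14955 for the "+" line, so the two logs differ only in which divider set was chosen for the same requested 146250 kHz.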

-- 
Markus


[git pull] drm fixes

2014-04-20 Thread Markus Trippelsdorf
On 2014.04.20 at 10:27 +0200, Christian König wrote:
> > I did and it doesn't fix the issue.
> In this case please open up a new bugreport on 
> https://bugs.freedesktop.org/enter_bug.cgi?product=DRI and select 
> DRM/Radeon as component.
> 
> Additional to that please provide a dmesg output generated with 
> drm.debug=0xE for both the good and the bad case.

Good vs. bad:

-[drm:radeon_compute_pll_avivo] 146250 - 14439, pll dividers - fb: 2047.0 ref: 
29, post 7
+[drm:radeon_compute_pll_avivo] 146250 - 14955, pll dividers - fb: 1023.5 ref: 
14, post 7

Full bad dmesg is attached.

-- 
Markus
-- next part --
Linux version 3.15.0-rc1-00421-g6d4596905b65-dirty (markus at x4) (gcc version 
4.9.0 20140417 (prerelease) (GCC) ) #24 SMP Sun Apr 20 11:44:09 CEST 2014
Command line: BOOT_IMAGE=/bzImage 
root=PARTUUID=3ce070b6-a2b4-4127-9478-6ea8ceb69db7 init=/sbin/minit 
fbcon=rotate:3 radeon.dpm=1 radeon.audio=0 drm_kms_helper.poll=0 quiet 
drm.debug=0xE
KERNEL supported cpus:
  AMD AuthenticAMD
e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x-0x0009fbff] usable
BIOS-e820: [mem 0x0009fc00-0x0009] reserved
BIOS-e820: [mem 0x000e6000-0x000f] reserved
BIOS-e820: [mem 0x0010-0xdfe8] usable
BIOS-e820: [mem 0xdfe9-0xdfea7fff] ACPI data
BIOS-e820: [mem 0xdfea8000-0xdfec] ACPI NVS
BIOS-e820: [mem 0xdfed-0xdfef] reserved
BIOS-e820: [mem 0xfff0-0x] reserved
BIOS-e820: [mem 0x0001-0x00021fff] usable
NX (Execute Disable) protection: active
SMBIOS 2.5 present.
DMI: System manufacturer System Product Name/M4A78T-E, BIOS 3503 04/13/2011
e820: update [mem 0x-0x0fff] usable ==> reserved
e820: remove [mem 0x000a-0x000f] usable
e820: last_pfn = 0x22 max_arch_pfn = 0x4
MTRR default type: uncachable
MTRR fixed ranges enabled:
  0-9 write-back
  A-E uncachable
  F-F write-protect
MTRR variable ranges enabled:
  0 base  mask 8000 write-back
  1 base 8000 mask C000 write-back
  2 base C000 mask E000 write-back
  3 disabled
  4 disabled
  5 disabled
  6 disabled
  7 disabled
TOM2: 00022000 aka 8704M
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
e820: update [mem 0xe000-0x] usable ==> reserved
e820: last_pfn = 0xdfe90 max_arch_pfn = 0x4
Base memory trampoline at [88099000] 99000 size 24576
Using GB pages for direct mapping
init_memory_mapping: [mem 0x-0x000f]
 [mem 0x-0x000f] page 4k
BRK [0x019bf000, 0x019b] PGTABLE
BRK [0x019c, 0x019c0fff] PGTABLE
BRK [0x019c1000, 0x019c1fff] PGTABLE
init_memory_mapping: [mem 0x21fe0-0x21fff]
 [mem 0x21fe0-0x21fff] page 2M
BRK [0x019c2000, 0x019c2fff] PGTABLE
init_memory_mapping: [mem 0x21c00-0x21fdf]
 [mem 0x21c00-0x21fdf] page 2M
init_memory_mapping: [mem 0x2-0x21bff]
 [mem 0x2-0x21bff] page 2M
init_memory_mapping: [mem 0x0010-0xdfe8]
 [mem 0x0010-0x001f] page 4k
 [mem 0x0020-0x3fff] page 2M
 [mem 0x4000-0xbfff] page 1G
 [mem 0xc000-0xdfdf] page 2M
 [mem 0xdfe0-0xdfe8] page 4k
init_memory_mapping: [mem 0x1-0x1]
 [mem 0x1-0x1] page 1G
ACPI: RSDP 0x000FB880 24 (v02 ACPIAM)
ACPI: XSDT 0xDFE90100 5C (v01 041311 XSDT1656 20110413 MSFT 
0097)
ACPI: FACP 0xDFE90290 F4 (v03 041311 FACP1656 20110413 MSFT 
0097)
ACPI BIOS Warning (bug): Optional FADT field Pm2ControlBlock has zero address 
or length: 0x/0x1 (20140214/tbfadt-634)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 
(20140214/tbfadt-603)
ACPI: DSDT 0xDFE90450 00E6FE (v01 A1152  A1152000  INTL 
20060113)
ACPI: FACS 0xDFEA8000 40
ACPI: APIC 0xDFE90390 7C (v01 041311 APIC1656 20110413 MSFT 
0097)
ACPI: MCFG 0xDFE90410 3C (v01 041311 OEMMCFG  20110413 MSFT 
0097)
ACPI: OEMB 0xDFEA8040 72 (v01 041311 OEMB1656 20110413 MSFT 
0097)
ACPI: SRAT 0xDFE9F450 E8 (v01 AMDFAM_F_10 0002 AMD  
0001)
ACPI: HPET 0xDFE9F540 38 (v01 041311 OEMHPET  20110413 MSFT 
0097)
ACPI: SSDT 0xDFE9F580 00088C (v01 A M I  POWERNOW 0001 AMD  
0001)
ACPI: Local APIC address 0xfee0
 [ea00-ea00087f] PMD -> [88021760-88021f5f] 
on node 0
Zone ranges:
  DMA  [mem 0x1000-0x00ff]
  DMA32[mem 0x0100-0x]
  Normal   [mem 0x1-0x21fff]
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x1000-0x0009efff]
  node   0: [mem 0x0010-0xdfe8]
  node   0: [mem 0x1-0x21fff]
On node 0 totalpages: 2096686
  DMA 

[git pull] drm fixes

2014-04-20 Thread Markus Trippelsdorf
On 2014.04.19 at 21:21 -0400, Alex Deucher wrote:
> On Sat, Apr 19, 2014 at 3:03 PM, Markus Trippelsdorf
>  wrote:
> > On 2014.04.19 at 08:19 +0100, Dave Airlie wrote:
> >>
> >> Unfortunately this contains no easter eggs, it's a bit larger than I'd
> >> like, but I included a patch that just moves code from one file to another
> >> and I'd like to avoid merge conflicts with that later, so it makes it seem
> >> worse than it is,
> >
> >> Christian König (2):
> >>   drm/radeon: apply more strict limits for PLL params v2
> >>   drm/radeon: improve PLL params if we don't match exactly v2
> >
> > commit f8a2645ecede4eaf90b3d785f2805c8ecb76d43e
> > Author: Christian König
> > Date:   Wed Apr 16 11:54:21 2014 +0200
> >
> > drm/radeon: improve PLL params if we don't match exactly v2
> >
> > The commit above causes my monitor to just stay blank after boot.
> > No framebuffer, no Xorg, no nothing. I'm using a Radeon RS780.
> 
> Can you try the patch on:
> https://bugs.freedesktop.org/show_bug.cgi?id=77673

I did and it doesn't fix the issue.

-- 
Markus


[git pull] drm fixes

2014-04-19 Thread Markus Trippelsdorf
On 2014.04.19 at 08:19 +0100, Dave Airlie wrote:
> 
> Unfortunately this contains no easter eggs, it's a bit larger than I'd 
> like, but I included a patch that just moves code from one file to another 
> and I'd like to avoid merge conflicts with that later, so it makes it seem 
> worse than it is,

> Christian König (2):
>   drm/radeon: apply more strict limits for PLL params v2
>   drm/radeon: improve PLL params if we don't match exactly v2

commit f8a2645ecede4eaf90b3d785f2805c8ecb76d43e
Author: Christian König
Date:   Wed Apr 16 11:54:21 2014 +0200

drm/radeon: improve PLL params if we don't match exactly v2

The commit above causes my monitor to just stay blank after boot.
No framebuffer, no Xorg, no nothing. I'm using a Radeon RS780.


Apr 19 20:55:45 x4 kernel: Serial: 8250/16550 driver, 4 ports, IRQ sharing 
disabled
Apr 19 20:55:45 x4 kernel: [drm] Initialized drm 1.1.0 20060810
Apr 19 20:55:45 x4 kernel: [drm] radeon kernel modesetting enabled.
Apr 19 20:55:45 x4 kernel: [drm] initializing kernel modesetting (RS780 
0x1002:0x9614 0x1043:0x834D).
Apr 19 20:55:45 x4 kernel: [drm] register mmio base: 0xFBEE
Apr 19 20:55:45 x4 kernel: [drm] register mmio size: 65536
Apr 19 20:55:45 x4 kernel: ATOM BIOS: 113
Apr 19 20:55:45 x4 kernel: radeon :01:05.0: VRAM: 128M 0xC000 - 
0xC7FF (128M used)
Apr 19 20:55:45 x4 kernel: radeon :01:05.0: GTT: 512M 0xA000 - 
0xBFFF
Apr 19 20:55:45 x4 kernel: [drm] Detected VRAM RAM=128M, BAR=128M
Apr 19 20:55:45 x4 kernel: [drm] RAM width 32bits DDR
Apr 19 20:55:45 x4 kernel: [TTM] Zone  kernel: Available graphics memory: 
4083362 kiB
Apr 19 20:55:45 x4 kernel: [TTM] Zone   dma32: Available graphics memory: 
2097152 kiB
Apr 19 20:55:45 x4 kernel: [TTM] Initializing pool allocator
Apr 19 20:55:45 x4 kernel: [TTM] Initializing DMA pool allocator
Apr 19 20:55:45 x4 kernel: [drm] radeon: 128M of VRAM memory ready
Apr 19 20:55:45 x4 kernel: [drm] radeon: 512M of GTT memory ready.
Apr 19 20:55:45 x4 kernel: [drm] Loading RS780 Microcode
Apr 19 20:55:45 x4 kernel: == power state 0 ==
Apr 19 20:55:45 x4 kernel:  ui class: none
Apr 19 20:55:45 x4 kernel:  internal class: boot
Apr 19 20:55:45 x4 kernel:  caps: video
Apr 19 20:55:45 x4 kernel:  uvdvclk: 0 dclk: 0
Apr 19 20:55:45 x4 kernel:  power level 0sclk: 5 
vddc_index: 2
Apr 19 20:55:45 x4 kernel:  power level 1sclk: 5 
vddc_index: 2
Apr 19 20:55:45 x4 kernel:  status: c r b
Apr 19 20:55:45 x4 kernel: == power state 1 ==
Apr 19 20:55:45 x4 kernel:  ui class: performance
Apr 19 20:55:45 x4 kernel:  internal class: none
Apr 19 20:55:45 x4 kernel:  caps: video
Apr 19 20:55:45 x4 kernel:  uvdvclk: 0 dclk: 0
Apr 19 20:55:45 x4 kernel:  power level 0sclk: 5 
vddc_index: 1
Apr 19 20:55:45 x4 kernel:  power level 1sclk: 7 
vddc_index: 2
Apr 19 20:55:45 x4 kernel:  status:
Apr 19 20:55:45 x4 kernel: == power state 2 ==
Apr 19 20:55:45 x4 kernel:  ui class: none
Apr 19 20:55:45 x4 kernel:  internal class: uvd
Apr 19 20:55:45 x4 kernel:  caps: video
Apr 19 20:55:45 x4 kernel:  uvdvclk: 53300 dclk: 4
Apr 19 20:55:45 x4 kernel:  power level 0sclk: 5 
vddc_index: 1
Apr 19 20:55:45 x4 kernel:  power level 1sclk: 5 
vddc_index: 1
Apr 19 20:55:45 x4 kernel:  status:
Apr 19 20:55:45 x4 kernel: [drm] radeon: dpm initialized
Apr 19 20:55:45 x4 kernel: [drm] GART: num cpu pages 131072, num gpu pages 
131072
Apr 19 20:55:45 x4 kernel: [drm] PCIE GART of 512M enabled (table at 
0xC004).
Apr 19 20:55:45 x4 kernel: radeon :01:05.0: WB enabled
Apr 19 20:55:45 x4 kernel: radeon :01:05.0: fence driver on ring 0 use gpu 
addr 0xac00 and cpu addr 0x8800db8d0c00
Apr 19 20:55:45 x4 kernel: [drm] Supports vblank timestamp caching Rev 2 
(21.10.2013).
Apr 19 20:55:45 x4 kernel: [drm] Driver supports precise vblank timestamp query.
Apr 19 20:55:45 x4 kernel: [drm] radeon: irq initialized.
Apr 19 20:55:45 x4 kernel: [drm] ring test on 0 succeeded in 1 usecs
Apr 19 20:55:45 x4 kernel: [drm] ib test on ring 0 succeeded in 0 usecs
Apr 19 20:55:45 x4 kernel: [drm] Radeon Display Connectors
Apr 19 20:55:45 x4 kernel: [drm] Connector 0:
Apr 19 20:55:45 x4 kernel: [drm]   VGA-1
Apr 19 20:55:45 x4 kernel: [drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 
0x7e48 0x7e4c 0x7e4c
Apr 19 20:55:45 x4 kernel: [drm]   Encoders:
Apr 19 20:55:45 x4 kernel: [drm] CRT1: INTERNAL_KLDSCP_DAC1
Apr 19 20:55:45 x4 kernel: [drm] Connector 1:
Apr 19 20:55:45 x4 kernel: [drm]   DVI-D-1
Apr 19 20:55:45 x4 kernel: [drm]   HPD3
Apr 19 20:55:45 x4 kernel: [drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 
0x7e58 0x7e5c 0x7e5c
Apr 19 20:55:45 x4 kernel: [drm]   Encoders:
Apr 19 20:55:45 x4 kernel: [drm] DFP3: INTERNAL_KLDSCP_LVTMA
Apr 19 20:55:45 x4 kernel: switching from power state:
Apr 19 

Can no longer shutdown after drm/radeon: Implement radeon_pci_shutdown

2013-12-12 Thread Markus Trippelsdorf
On 2013.12.12 at 03:27 +0000, Deucher, Alexander wrote:
> > On 2013.12.11 at 23:46 +0000, Deucher, Alexander wrote:
> > > > -Original Message-
> > > > From: Peter Chubb [mailto:peter.chubb at nicta.com.au]
> > > > Sent: Wednesday, December 11, 2013 5:11 PM
> > > > To: Markus Trippelsdorf
> > > > Cc: Peter Chubb; Deucher, Alexander; airlied at linux.ie; dri-
> > > > devel at lists.freedesktop.org
> > > > Subject: Re: Can no longer shutdown after drm/radeon: Implement
> > > > radeon_pci_shutdown
> > > >
> > > > >>>>> "Markus" == Markus Trippelsdorf 
> > writes:
> > > >
> > > > Markus> On 2013.12.11 at 11:37 +1100, Peter Chubb wrote:
> > > >
> > > > Markus> It would be interesting to know where exactly it hangs.  Could
> > > > Markus> you comment out the *_fini(rdev) calls in
> > > > Markus> radeon_driver_unload_kms
> > > > (drivers/gpu/drm/radeon/radeon_kms.c)
> > > > Markus> one after the other to find out which one is responsible?
> > > >
> > > > It's radeon_device_fini() that is the problem.
> > >
> > > I think the problem is that the drm subsystem tears down the device
> > > via drm_driver.unload in drm_dev_unregister(), but now that we have a
> > > pci_driver.shutdown callback (which is needed for kexec) that gets
> > > called too so the driver gets torn down twice.
> > 
> > If that is the case the following patch should fix the issue.
> > Can you give it a try, Peter?
(Peter:)
> Thanks that works.  I tested shutdown, kexec, and s2disk --- all work
> correctly.

> That may work, but I think it's just papering over a race which may
> still bite someone else depending on the timing.

This leaves three possibilities:

1) Revert 846ae41ae99d now and come up with a solution with proper
locking for 3.14
2) Add my simple fix now and implement additional locking if the need
arises for 3.14.
3) Implement a fix with proper locking now.

It's your choice Alex.
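
To make the "proper locking" option concrete, here is a minimal userspace sketch of a teardown that can run at most once even if two paths race into it. It uses C11 `atomic_flag` as a stand-in; a kernel implementation would use the kernel's own primitives, and every name here (`mock_device`, `mock_device_fini_once`) is hypothetical. Note the limitation: the losing caller returns immediately and does not wait for the winner to finish, so a real fix might need a mutex instead.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device with a one-shot teardown. */
struct mock_device {
	atomic_flag torn_down;	/* set by the first caller to get here */
	int fini_calls;		/* how often the hardware teardown ran */
};

static void mock_device_init(struct mock_device *d)
{
	atomic_flag_clear(&d->torn_down);
	d->fini_calls = 0;
}

/* Returns true if this call performed the teardown, false if it was already done. */
static bool mock_device_fini_once(struct mock_device *d)
{
	/* test_and_set returns the previous value: true means someone beat us. */
	if (atomic_flag_test_and_set(&d->torn_down))
		return false;

	d->fini_calls++;
	return true;
}
```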

-- 
Markus


Can no longer shutdown after drm/radeon: Implement radeon_pci_shutdown

2013-12-12 Thread Markus Trippelsdorf
On 2013.12.11 at 23:46 +0000, Deucher, Alexander wrote:
> > -Original Message-
> > From: Peter Chubb [mailto:peter.chubb at nicta.com.au]
> > Sent: Wednesday, December 11, 2013 5:11 PM
> > To: Markus Trippelsdorf
> > Cc: Peter Chubb; Deucher, Alexander; airlied at linux.ie; dri-
> > devel at lists.freedesktop.org
> > Subject: Re: Can no longer shutdown after drm/radeon: Implement
> > radeon_pci_shutdown
> > 
> > >>>>> "Markus" == Markus Trippelsdorf  writes:
> > 
> > Markus> On 2013.12.11 at 11:37 +1100, Peter Chubb wrote:
> > 
> > Markus> It would be interesting to know where exactly it hangs.  Could
> > Markus> you comment out the *_fini(rdev) calls in
> > Markus> radeon_driver_unload_kms
> > (drivers/gpu/drm/radeon/radeon_kms.c)
> > Markus> one after the other to find out which one is responsible?
> > 
> > It's radeon_device_fini() that is the problem.
> 
> I think the problem is that the drm subsystem tears down the device
> via drm_driver.unload in drm_dev_unregister(), but now that we have a
> pci_driver.shutdown callback (which is needed for kexec) that gets
> called too so the driver gets torn down twice.

If that is the case the following patch should fix the issue.
Can you give it a try, Peter?

diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index 55d0b474bd37..539e5f1ff5e3 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -59,7 +59,8 @@ int radeon_driver_unload_kms(struct drm_device *dev)
 	radeon_acpi_fini(rdev);
 
 	radeon_modeset_fini(rdev);
-	radeon_device_fini(rdev);
+	if (!rdev->shutdown)
+		radeon_device_fini(rdev);
 
 done_free:
 	kfree(rdev);
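
The effect of the guard can be illustrated with a small userspace model. This is a sketch only: the `mock_*` names are hypothetical, and it assumes the `shutdown` flag is set by the device teardown itself, which the patch does not show.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct mock_rdev {
	bool shutdown;		/* assumed to be set once teardown has run */
	int device_fini_calls;	/* how often the teardown actually ran */
};

static void mock_device_fini(struct mock_rdev *rdev)
{
	rdev->shutdown = true;
	rdev->device_fini_calls++;
}

/* Mirrors the guarded call: skip the teardown if it already happened. */
static void mock_driver_unload(struct mock_rdev *rdev)
{
	if (!rdev->shutdown)
		mock_device_fini(rdev);
}
```

Calling `mock_driver_unload()` twice on the same object runs the teardown once. The flag is a plain `bool`, so this model says nothing about two callers arriving concurrently.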
-- 
Markus


Can no longer shutdown after drm/radeon: Implement radeon_pci_shutdown

2013-12-11 Thread Markus Trippelsdorf
On 2013.12.11 at 11:37 +1100, Peter Chubb wrote:
> On my HP Elitebook 8740w with a Mobility Radeon HD 5870
> commit 846ae41ae99d314bf2a02784152208a6ddf7eddc
> breaks shutdown.  The machine hangs when trying to shutdown, kexec or
> hibernate, before seeing the usual `machine halted' (or whatever) message.
> 
> If I comment out thus:
> 
> index 9f5ff28..40bff3c 100644
> --- a/drivers/gpu/drm/radeon/radeon_drv.c
> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
> @@ -514,7 +514,7 @@ radeon_pci_shutdown(struct pci_dev *pdev)
>  {
>   struct drm_device *dev = pci_get_drvdata(pdev);
>  
> - radeon_driver_unload_kms(dev);
> + /*radeon_driver_unload_kms(dev);*/
>  }
>  
> then everything works again.  This is obviously not the proper fix.

It would be interesting to know where exactly it hangs.
Could you comment out the *_fini(rdev) calls in radeon_driver_unload_kms
(drivers/gpu/drm/radeon/radeon_kms.c) one after the other to find out
which one is responsible? 

-- 
Markus


Re: [PATCH 0/3] drm/radeon kexec fixes

2013-09-11 Thread Markus Trippelsdorf
On 2013.09.10 at 16:40 -0400, Alex Deucher wrote:
> On Tue, Sep 10, 2013 at 2:27 PM, Eric W. Biederman
> ebied...@xmission.com wrote:
> > Alex Deucher alexdeuc...@gmail.com writes:
> >
> > > On Mon, Sep 9, 2013 at 5:21 AM, Markus Trippelsdorf
> > > mar...@trippelsdorf.de wrote:
> > >
> > > > IIRC Alex said the sanity checks are expensive and boot-time could be
> > > > improved by dropping them. Maybe he can chime in?
> > >
> > > They shouldn't be necessary with a proper shutdown, but in this
> > > particular case, they are not very expensive.  What is expensive is
> > > having a separate sanity check functions for all the various hw blocks
> > > to teardown everything on startup prior to starting it up in case
> > > kexec, etc. left the system in a bad state.  It ends up amounting to a
> > > full tear down sequence followed by a full start up sequence every
> > > time you load the driver.
> > >
> > > I can't really comment on the first patch, but the rest seem fine.
> >
> > Let me reask the question just a little bit.
> >
> > Is it the sanity checks that are expensive?  Or is it the
> > reinitialization that is triggered by the sanity checks that is
> > expensive?
> >
> > From what Christian said in the other reply it sounds like this is a
> > game we will never completely win, but it would be nice to have half a
> > chance in the kexec on panic case to have video.  So I am curious to
> > know if the checks are expensive when we are coming at hardware in a
> > clean state.
>
> The particular sanity checks from this patch set are not expensive,
> but we had previously discussed more extensive sanity checks for other
> aspects of the chips in prior conversations.  Prior to this patch set,
> the hw is not torn down properly (might have been in the middle of DMA
> for example) when kexec happens.  That's why the sanity checks were
> added in the first place.  With this patch set, the sanity checks
> shouldn't be necessary.

I think you're talking past each other. 
What Eric worries about is the »kexec on panic« case, where the shutdown
method *isn't* called. The sanity checks are superfluous only when the
hardware was shut down normally during kexec (the default case); in the
panic case they may actually help. And because the checks aren't
expensive, it wouldn't hurt to just leave them in place.
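
A toy model of the two paths may help. It is purely illustrative: the structure, the function names and the idea of a single "engine running" bit are all hypothetical simplifications, not how the radeon driver tracks hardware state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one hardware block. */
struct mock_hw {
	bool engine_running;	/* true if a previous kernel left it active */
	int resets;		/* how often init had to stop the block first */
};

/* Normal kexec path: the shutdown hook stops the hardware. */
static void mock_shutdown(struct mock_hw *hw)
{
	hw->engine_running = false;
}

/* Init with a cheap sanity check: stop the block only if it is not clean. */
static void mock_init(struct mock_hw *hw)
{
	if (hw->engine_running) {
		hw->engine_running = false;
		hw->resets++;
	}
	hw->engine_running = true;
}
```

When `mock_shutdown()` runs before the next `mock_init()`, the check costs one comparison and `resets` stays at 0. When it is skipped, as in the panic case, the same check is what brings the block back to a known state.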

-- 
Markus


Re: [PATCH 0/3] drm/radeon kexec fixes

2013-09-11 Thread Markus Trippelsdorf
On 2013.09.09 at 11:38 +0200, Christian König wrote:
> On 09.09.2013 at 11:21, Markus Trippelsdorf wrote:
> > On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote:
> > > Markus Trippelsdorf mar...@trippelsdorf.de writes:
> > >
> > > > Here are a couple of patches that get kexec working with radeon devices.
> > > > I've tested this on my RS780.
> > > > Comments or flames are welcome.
> > > > Thanks.
> > > A couple of high level comments.
> > >
> > > This looks promising for the usual case.
> > >
> > > Removing the printk at the end of the kexec path seems a little dubious,
> > > what of other cpus, interrupt handlers, etc.  Basically establishing a
> > > new rule on when printk is allowed seems a little dubious at this point,
> > > even if it is a useful debugging trick.
> > OK. I will drop this patch. It doesn't seem to be necessary, because I
> > cannot reproduce the printk related hang anymore.
> >
> > > Having a clean shutdown of the radeon definitely seems worth doing,
> > > because the cases where we care about video are when a person is in
> > > front of the system.
> > Yes. But please note that even with radeon_pci_shutdown implemented, I
> > still get ring test failures on roughly every eighth kexec boot:
> >
> > [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
> > radeon :01:05.0: disabling GPU acceleration
> >
> > That's definitely better than the current state of affairs, with ring
> > test failures on every second boot. But I haven't figured out the reason
> > for these failures yet. It's curious that once a ring test failure
> > occurs, it will reliably fail after each kexec invocation, no matter how
> > often repeated. Only a reboot brings the machine back to normal.
>
> The main problem here is that the AMD gfx hardware doesn't really
> support being reinitialized once booted (for various reasons). It's a
> (intended) limitation of the hardware design that you can only
> initialize certain blocks once every power cycle, so the whole approach
> actually will never work 100% reliably.
>
> All you can hope for is that stopping the hardware while shutting down
> the old kernel and starting it again results in exactly the same
> hardware parameters (offsets, clock etc...) otherwise starting the
> blocks will just fail and you end up with disabled acceleration like above.
>
> Sorry, but there isn't much we can do about this,

I've tested this further, and it turns out that if I revert commit
f5d9b7f0f9 on top of my "drm/radeon: Implement radeon_pci_shutdown"
patch, the initialization failures seem to go away completely.

Any idea what's going on?

Here's the patch:

diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c
index fa0de46..4e8c1988 100644
--- a/drivers/gpu/drm/radeon/r600_dpm.c
+++ b/drivers/gpu/drm/radeon/r600_dpm.c
@@ -296,9 +296,9 @@ bool r600_dynamicpm_enabled(struct radeon_device *rdev)
 void r600_enable_sclk_control(struct radeon_device *rdev, bool enable)
 {
 	if (enable)
-		WREG32_P(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF);
+		WREG32_P(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF);
 	else
-		WREG32_P(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
+		WREG32_P(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
 }
 
 void r600_enable_mclk_control(struct radeon_device *rdev, bool enable)


-- 
Markus
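
For readers following the revert above: the two versions differ only in
which register receives the masked write. Below is a small userspace
sketch of that difference. It assumes that WREG32_P(reg, val, mask) is a
read-modify-write that keeps the bits selected by mask and ORs in
val & ~mask; that is an assumption about the macro, not something stated
in this thread. The register indices and the bit position are invented
for illustration, and all fake_/FAKE_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register indices and bit position, for illustration only. */
enum fake_reg { FAKE_GENERAL_PWRMGT, FAKE_SCLK_PWRMGT_CNTL, FAKE_REG_COUNT };
#define FAKE_SCLK_PWRMGT_OFF (1u << 0)

uint32_t fake_regs[FAKE_REG_COUNT];

/* Keep the bits selected by mask, then OR in the bits of val outside mask. */
static void fake_wreg32_p(enum fake_reg reg, uint32_t val, uint32_t mask)
{
	uint32_t tmp;

	assert(reg < FAKE_REG_COUNT);
	tmp = fake_regs[reg];
	tmp &= mask;
	tmp |= val & ~mask;
	fake_regs[reg] = tmp;
}

/*
 * Same shape as r600_enable_sclk_control(), but with the target register
 * passed in so that both variants of the patch can be compared.
 */
void fake_enable_sclk_control(enum fake_reg reg, int enable)
{
	if (enable)
		fake_wreg32_p(reg, 0, ~FAKE_SCLK_PWRMGT_OFF);
	else
		fake_wreg32_p(reg, FAKE_SCLK_PWRMGT_OFF, ~FAKE_SCLK_PWRMGT_OFF);
}
```

In this model only the single OFF bit of the chosen register changes and
the other register is left alone, so pointing the call at a different
register changes which piece of hardware state is affected.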


Re: [PATCH 0/3] drm/radeon kexec fixes

2013-09-09 Thread Markus Trippelsdorf
On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote:
> Markus Trippelsdorf mar...@trippelsdorf.de writes:
>
> > Here are a couple of patches that get kexec working with radeon devices.
> > I've tested this on my RS780.
> > Comments or flames are welcome.
> > Thanks.
>
> A couple of high level comments.
>
> This looks promising for the usual case.
>
> Removing the printk at the end of the kexec path seems a little dubious,
> what of other cpus, interrupt handlers, etc.  Basically establishing a
> new rule on when printk is allowed seems a little dubious at this point,
> even if it is a useful debugging trick.

OK. I will drop this patch. It doesn't seem to be necessary, because I
cannot reproduce the printk related hang anymore.

> Having a clean shutdown of the radeon definitely seems worth doing,
> because the cases where we care about video are when a person is in
> front of the system.

Yes. But please note that even with radeon_pci_shutdown implemented, I
still get ring test failures on roughly every eighth kexec boot:

 [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
 radeon :01:05.0: disabling GPU acceleration

That's definitely better than the current state of affairs, with ring
test failures on every second boot. But I haven't figured out the reason
for these failures yet. It's curious that once a ring test failure
occurs, it will reliably fail after each kexec invocation, no matter how
often repeated. Only a reboot brings the machine back to normal.
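
The value in the error message is informative in itself. As far as I
understand the test (an assumption, not confirmed in this thread), the
driver seeds a memory location with 0xCAFEDEAD, asks the ring to
overwrite it with 0xDEADBEEF and then polls. A failure that reports
0xCAFEDEAD would therefore mean the engine never executed the write. A
minimal userspace model of that idea, with hypothetical fake_ names:

```c
#include <assert.h>
#include <stdint.h>

#define SEED_VALUE   0xCAFEDEADu	/* written by the CPU before the test */
#define TARGET_VALUE 0xDEADBEEFu	/* what the engine is asked to write */

/* Hypothetical model of a DMA engine: it executes packets only when running. */
struct fake_ring {
	int running;
	uint32_t scratch;
};

static void fake_ring_submit_write(struct fake_ring *ring, uint32_t val)
{
	if (ring->running)
		ring->scratch = val;
}

/*
 * Returns 0 on success. On failure returns -1 and *seen holds the value
 * that was left in the scratch slot.
 */
int fake_ring_test(struct fake_ring *ring, unsigned int max_polls, uint32_t *seen)
{
	unsigned int i;

	assert(ring && seen);
	ring->scratch = SEED_VALUE;
	fake_ring_submit_write(ring, TARGET_VALUE);

	for (i = 0; i < max_polls; i++) {
		*seen = ring->scratch;
		if (*seen == TARGET_VALUE)
			return 0;
	}
	*seen = ring->scratch;
	return -1;
}
```

In this model a stalled ring always leaves the seed value behind, which
would match the (0xCAFEDEAD) shown in the log above.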

> I don't know if you want to remove the sanity checks.  They seem cheap
> and safe regardless.  Are they expensive or ineffective?  Moreover if
> they work a reasonable amount of the time that means that the kexec on
> panic case (where we don't shut anything down) can actually use the
> video, and that in general the driver will be more robust.  I don't
> expect anyone much cares as kexec on panic is mostly used to just write
> a core file to the network, or the local disk.  But if it is easy to
> keep that case working most of the time, why not.

IIRC Alex said the sanity checks are expensive and boot-time could be
improved by dropping them. Maybe he can chime in?

-- 
Markus


[PATCH 0/3] drm/radeon kexec fixes

2013-09-08 Thread Markus Trippelsdorf
Here are a couple of patches that get kexec working with radeon devices.
I've tested this on my RS780. 
Comments or flames are welcome.
Thanks.

Markus Trippelsdorf (3):
  kexec: get rid of late printk
  drm/radeon: Implement radeon_pci_shutdown
  drm/radeon: get rid of r100_restore_sanity hack

 drivers/gpu/drm/radeon/r100.c| 27 ---
 drivers/gpu/drm/radeon/r300.c|  2 --
 drivers/gpu/drm/radeon/r420.c|  2 --
 drivers/gpu/drm/radeon/r520.c|  2 --
 drivers/gpu/drm/radeon/radeon_asic.h |  1 -
 drivers/gpu/drm/radeon/radeon_drv.c  | 10 ++
 drivers/gpu/drm/radeon/rs400.c   |  2 --
 drivers/gpu/drm/radeon/rs600.c   |  2 --
 drivers/gpu/drm/radeon/rs690.c   |  2 --
 drivers/gpu/drm/radeon/rv515.c   |  2 --
 kernel/kexec.c   |  1 -
 11 files changed, 10 insertions(+), 43 deletions(-)

-- 
Markus


[PATCH 1/3] kexec: get rid of late printk

2013-09-08 Thread Markus Trippelsdorf
kexec calls:
 printk(KERN_EMERG "Starting new kernel\n");
late, just before calling machine_shutdown().
However, at this point the underlying fb device may already have been
shut down. This causes the kernel to hang.
Fix by simply getting rid of the printk call.

Signed-off-by: Markus Trippelsdorf mar...@trippelsdorf.de
---
 kernel/kexec.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 59f7b55..f33fa9f 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1679,7 +1679,6 @@ int kernel_kexec(void)
 #endif
 	{
 		kernel_restart_prepare(NULL);
-		printk(KERN_EMERG "Starting new kernel\n");
 		machine_shutdown();
 	}
 
-- 
Markus


[PATCH 2/3] drm/radeon: Implement radeon_pci_shutdown

2013-09-08 Thread Markus Trippelsdorf
Currently radeon devices are not properly shut down during kexec. This
causes a variety of issues, e.g. dpm initialization failures.
Fix this by implementing a radeon_pci_shutdown function that unloads
the driver cleanly.

Signed-off-by: Markus Trippelsdorf mar...@trippelsdorf.de
---
 drivers/gpu/drm/radeon/radeon_drv.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index cb4445f..d71edee 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -380,6 +380,15 @@ static const struct file_operations radeon_driver_kms_fops = {
 #endif
 };
 
+
+static void
+radeon_pci_shutdown(struct pci_dev *pdev)
+{
+	struct drm_device *dev = pci_get_drvdata(pdev);
+
+	radeon_driver_unload_kms(dev);
+}
+
 static struct drm_driver kms_driver = {
 	.driver_features =
 	    DRIVER_USE_AGP |
@@ -453,6 +462,7 @@ static struct pci_driver radeon_kms_pci_driver = {
 	.remove = radeon_pci_remove,
 	.suspend = radeon_pci_suspend,
 	.resume = radeon_pci_resume,
+	.shutdown = radeon_pci_shutdown,
 };
 
 static int __init radeon_init(void)
-- 
Markus
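
The effect of wiring up .shutdown, and the limit of it, can be modelled
in a few lines. The sketch below is a userspace simulation, not kernel
code: a regular kexec calls the hook if one is set, while kexec on panic
skips it, which is the distinction Eric draws elsewhere in this archive.
All fake_ names are hypothetical, and hw_active merely stands in for
"hardware left initialised".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a device and a driver with an optional hook. */
struct fake_device {
	int hw_active;
};

struct fake_driver {
	void (*shutdown)(struct fake_device *dev);
};

static void fake_radeon_shutdown(struct fake_device *dev)
{
	dev->hw_active = 0;	/* stands in for the full driver unload */
}

/* Driver after the patch: the hook is wired up. */
const struct fake_driver fake_radeon_driver = {
	.shutdown = fake_radeon_shutdown,
};

/* Driver before the patch: no hook at all. */
const struct fake_driver fake_driver_without_hook = {
	.shutdown = NULL,
};

/*
 * Model of the two kexec flavours: the regular path calls ->shutdown when
 * one is set, the panic path goes straight to the new kernel.
 */
void fake_kexec(const struct fake_driver *drv, struct fake_device *dev,
		int on_panic)
{
	assert(drv && dev);
	if (!on_panic && drv->shutdown)
		drv->shutdown(dev);
}
```

In this model the next kernel finds the hardware quiesced only on a
regular kexec with the hook present. Without the hook, or on panic, it
inherits whatever state the old kernel left behind.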


[PATCH 3/3] drm/radeon: get rid of r100_restore_sanity hack

2013-09-08 Thread Markus Trippelsdorf
Now that radeon devices are properly shut down during kexec, we can get
rid of r100_restore_sanity.

Signed-off-by: Markus Trippelsdorf mar...@trippelsdorf.de
---
 drivers/gpu/drm/radeon/r100.c| 27 ---
 drivers/gpu/drm/radeon/r300.c|  2 --
 drivers/gpu/drm/radeon/r420.c|  2 --
 drivers/gpu/drm/radeon/r520.c|  2 --
 drivers/gpu/drm/radeon/radeon_asic.h |  1 -
 drivers/gpu/drm/radeon/rs400.c   |  2 --
 drivers/gpu/drm/radeon/rs600.c   |  2 --
 drivers/gpu/drm/radeon/rs690.c   |  2 --
 drivers/gpu/drm/radeon/rv515.c   |  2 --
 9 files changed, 42 deletions(-)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 9fc61dd..d53dcd8 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -3938,31 +3938,6 @@ void r100_fini(struct radeon_device *rdev)
 	rdev->bios = NULL;
 }
 
-/*
- * Due to how kexec works, it can leave the hw fully initialised when it
- * boots the new kernel. However doing our init sequence with the CP and
- * WB stuff setup causes GPU hangs on the RN50 at least. So at startup
- * do some quick sanity checks and restore sane values to avoid this
- * problem.
- */
-void r100_restore_sanity(struct radeon_device *rdev)
-{
-	u32 tmp;
-
-	tmp = RREG32(RADEON_CP_CSQ_CNTL);
-	if (tmp) {
-		WREG32(RADEON_CP_CSQ_CNTL, 0);
-	}
-	tmp = RREG32(RADEON_CP_RB_CNTL);
-	if (tmp) {
-		WREG32(RADEON_CP_RB_CNTL, 0);
-	}
-	tmp = RREG32(RADEON_SCRATCH_UMSK);
-	if (tmp) {
-		WREG32(RADEON_SCRATCH_UMSK, 0);
-	}
-}
-
 int r100_init(struct radeon_device *rdev)
 {
 	int r;
@@ -3975,8 +3950,6 @@ int r100_init(struct radeon_device *rdev)
 	radeon_scratch_init(rdev);
 	/* Initialize surface registers */
 	radeon_surface_init(rdev);
-	/* sanity check some register to avoid hangs like after kexec */
-	r100_restore_sanity(rdev);
 	/* TODO: disable VGA need to use VGA request */
 	/* BIOS*/
 	if (!radeon_get_bios(rdev)) {
diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c
index d8dd269..57ba534 100644
--- a/drivers/gpu/drm/radeon/r300.c
+++ b/drivers/gpu/drm/radeon/r300.c
@@ -1480,8 +1480,6 @@ int r300_init(struct radeon_device *rdev)
 	/* Initialize surface registers */
 	radeon_surface_init(rdev);
 	/* TODO: disable VGA need to use VGA request */
-	/* restore some register to sane defaults */
-	r100_restore_sanity(rdev);
 	/* BIOS*/
 	if (!radeon_get_bios(rdev)) {
 		if (ASIC_IS_AVIVO(rdev))
diff --git a/drivers/gpu/drm/radeon/r420.c b/drivers/gpu/drm/radeon/r420.c
index 4e796ec..9ee3360 100644
--- a/drivers/gpu/drm/radeon/r420.c
+++ b/drivers/gpu/drm/radeon/r420.c
@@ -371,8 +371,6 @@ int r420_init(struct radeon_device *rdev)
 	/* Initialize surface registers */
 	radeon_surface_init(rdev);
 	/* TODO: disable VGA need to use VGA request */
-	/* restore some register to sane defaults */
-	r100_restore_sanity(rdev);
 	/* BIOS*/
 	if (!radeon_get_bios(rdev)) {
 		if (ASIC_IS_AVIVO(rdev))
diff --git a/drivers/gpu/drm/radeon/r520.c b/drivers/gpu/drm/radeon/r520.c
index e1aece7..4709c10 100644
--- a/drivers/gpu/drm/radeon/r520.c
+++ b/drivers/gpu/drm/radeon/r520.c
@@ -256,8 +256,6 @@ int r520_init(struct radeon_device *rdev)
 	radeon_scratch_init(rdev);
 	/* Initialize surface registers */
 	radeon_surface_init(rdev);
-	/* restore some register to sane defaults */
-	r100_restore_sanity(rdev);
 	/* TODO: disable VGA need to use VGA request */
 	/* BIOS*/
 	if (!radeon_get_bios(rdev)) {
diff --git a/drivers/gpu/drm/radeon/radeon_asic.h b/drivers/gpu/drm/radeon/radeon_asic.h
index 818bbe6..6eee9e2 100644
--- a/drivers/gpu/drm/radeon/radeon_asic.h
+++ b/drivers/gpu/drm/radeon/radeon_asic.h
@@ -122,7 +122,6 @@ void r100_mc_resume(struct radeon_device *rdev, struct r100_mc_save *save);
 void r100_vram_init_sizes(struct radeon_device *rdev);
 int r100_cp_reset(struct radeon_device *rdev);
 void r100_vga_render_disable(struct radeon_device *rdev);
-void r100_restore_sanity(struct radeon_device *rdev);
 int r100_cs_track_check_pkt3_indx_buffer(struct radeon_cs_parser *p,
 					 struct radeon_cs_packet *pkt,
 					 struct radeon_bo *robj);
diff --git a/drivers/gpu/drm/radeon/rs400.c b/drivers/gpu/drm/radeon/rs400.c
index b8074a8..23bbf89 100644
--- a/drivers/gpu/drm/radeon/rs400.c
+++ b/drivers/gpu/drm/radeon/rs400.c
@@ -510,8 +510,6 @@ int rs400_init(struct radeon_device *rdev)
 	/* Initialize surface registers */
 	radeon_surface_init(rdev);
 	/* TODO: disable VGA need to use VGA request */
-	/* restore some register to sane defaults */
-	r100_restore_sanity(rdev);
 	/* BIOS

Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-30 Thread Markus Trippelsdorf
On 2013.07.30 at 20:46 +0200, Markus Trippelsdorf wrote:
> On 2013.07.30 at 10:53 -0400, Alex Deucher wrote:
> > On Tue, Jul 30, 2013 at 7:27 AM, Markus Trippelsdorf
> >  wrote:
> > > On 2013.07.29 at 15:53 -0400, Alex Deucher wrote:
> > >> On Mon, Jul 29, 2013 at 2:10 PM, Eric W. Biederman
> > >>  wrote:
> > >> > Alex Deucher  writes:
> > >> >
> > >> >> On Mon, Jul 29, 2013 at 11:50 AM, Eric W. Biederman
> > >> >>  wrote:
> > >> >>>
> > >> >>>
> > >> >>> Alex Deucher  wrote:
> > >> >>>>On Mon, Jul 29, 2013 at 10:09 AM, Markus Trippelsdorf
> > >> >>>> wrote:
> > >> >>>>> On 2013.07.29 at 09:58 -0400, Alex Deucher wrote:
> > >> >>>>>> On Mon, Jul 29, 2013 at 3:51 AM, Markus Trippelsdorf
> > >> >>>>>>  wrote:
> > > >> >>>>>> > On my test machine Xorg doesn't start anymore when I kexec into a
> > > >> >>>>>> > 3.11.0-rc3 kernel.
> > > >> >>>>>>
> > > >> >>>>>> With kexec, dpm doesn't get torn down properly which can result in a
> > > >> >>>>>> bad hardware state when the driver loads again.  Does the attached
> > > >> >>>>>> patch help?  It attempts to disable dpm at startup in case it wasn't
> > > >> >>>>>> torn down properly previously.
> > >> >>>>>
> > >> >>>>> dpm initialization now works, but unfortunately GPU acceleration
> > >> >>>>still gets
> > >> >>>>> disabled:
> > >> >>>>
> > >> >>>>Stupid kexec complicates things.  We need to make sure dpm is torn
> > > >> >>>>down before we init the rest of the GPU, but dpm needs get initialized
> > > >> >>>>later in the init process since it depends on certain other state from
> > >> >>>>the driver.  I need to think about this for a bit.  I'm not sure of a
> > >> >>>>good way to handle this.
> > >> >>>
> > >> >>> You might just want to implement a shutdown method that cleans 
> > >> >>> things up properly.   At least as a first pass until you worry about 
> > >> >>> things like kexec on panic.
> > >> >>>
> > >> >>> Or can you not shutdown the graphics stack on reboot because of the 
> > >> >>> need to display the kernels shutdown progress?
> > >> >>
> > >> >> Does kexec actually call this shutdown method?  The driver implements
> > >> >> appropriate clean-up measures if it's shutdown properly.
> > >> >
> > > >> > Absolutely.  All parts of the reboot path call ->shutdown.  Including
> > >> > kexec.
> > >> >
> > >> > You don't get a device remove/hotunplug but unless this is a kexec on
> > >> > panic ->shutdown is most definitely called.  Now I am talking about the
> > >> > device layer/pci layer shutdown method I don't know how gpu drivers are
> > >> > wired up.  GPU land was a little strange last I looked.  Hopefully it
> > > >> > isn't so strange that there is a method named shutdown that is not wired
> > > >> > up.
> > >>
> > >> It doesn't look like the drm infrastructure has a shutdown callback.
> > >> The drm drivers register a drm_driver callback struct that includes an
> > >> unload callback which takes care of all the device teardown (if you
> > >> unload the module for example).  I don't know that it actually gets
> > >> called at kexec time however.  I don't know enough about how kexec
> > >> works.
> > >
> > > BTW there is r100_restore_sanity() in drm/radeon/r100.c that explicitly
> > > handles the kexec case during init. So maybe an r600_restore_sanity()
> > > function is needed?
> > >
> > > (One of the advantages of using kexec (besides the much quicker boot
> > > time) is that the monitor is not switched off and then on during boot.
> > > I guess that advantage would be lost if the unload callback would be
> > > called.)
> > 
> > r100_restore_sanity() is basically a set of hacks (that gets called at
> > driver startup) to work around the fact that with kexec the drm driver
> > is not torn down correctly.  So we could add a bunch more asic
> > specific tear down sequences to deal with dpm (and all the other
> > engines on the GPU that may potentially cause problems if they are not
> > torn down properly), but that will just turn into a mess.  All of
> > these hacks also add latency to the driver load.  I think the best
> > solution would probably be to figure how to hook up the drm unload
> > callback to the shutdown method that kexec uses.
> 
> FYI the following (ugly) hack works for me. 

No. It still fails, although much less frequently (roughly every 6th to
8th boot).

I begin to wonder if:
 [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
is a simple initialization bug that doesn't directly depend on kexec at
all.

-- 
Markus


Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-30 Thread Markus Trippelsdorf
On 2013.07.30 at 10:53 -0400, Alex Deucher wrote:
> On Tue, Jul 30, 2013 at 7:27 AM, Markus Trippelsdorf
>  wrote:
> > On 2013.07.29 at 15:53 -0400, Alex Deucher wrote:
> >> On Mon, Jul 29, 2013 at 2:10 PM, Eric W. Biederman
> >>  wrote:
> >> > Alex Deucher  writes:
> >> >
> >> >> On Mon, Jul 29, 2013 at 11:50 AM, Eric W. Biederman
> >> >>  wrote:
> >> >>>
> >> >>>
> >> >>> Alex Deucher  wrote:
> >> >>>>On Mon, Jul 29, 2013 at 10:09 AM, Markus Trippelsdorf
> >> >>>> wrote:
> >> >>>>> On 2013.07.29 at 09:58 -0400, Alex Deucher wrote:
> >> >>>>>> On Mon, Jul 29, 2013 at 3:51 AM, Markus Trippelsdorf
> >> >>>>>>  wrote:
> >> >>>>>> > On my test machine Xorg doesn't start anymore when I kexec into a
> >> >>>>>> > 3.11.0-rc3 kernel.
> >> >>>>>>
> >> >>>>>> With kexec, dpm doesn't get torn down properly which can result in a
> >> >>>>>> bad hardware state when the driver loads again.  Does the attached
> >> >>>>>> patch help?  It attempts to disable dpm at startup in case it wasn't
> >> >>>>>> torn down properly previously.
> >> >>>>>
> >> >>>>> dpm initialization now works, but unfortunately GPU acceleration
> >> >>>>still gets
> >> >>>>> disabled:
> >> >>>>
> >> >>>>Stupid kexec complicates things.  We need to make sure dpm is torn
> >> >>>>down before we init the rest of the GPU, but dpm needs get initialized
> >> >>>>later in the init process since it depends on certain other state from
> >> >>>>the driver.  I need to think about this for a bit.  I'm not sure of a
> >> >>>>good way to handle this.
> >> >>>
> >> >>> You might just want to implement a shutdown method that cleans things 
> >> >>> up properly.   At least as a first pass until you worry about things 
> >> >>> like kexec on panic.
> >> >>>
> >> >>> Or can you not shutdown the graphics stack on reboot because of the 
> >> >>> need to display the kernels shutdown progress?
> >> >>
> >> >> Does kexec actually call this shutdown method?  The driver implements
> >> >> appropriate clean-up measures if it's shutdown properly.
> >> >
> >> > Absolutely.  All parts of the reboot path call ->shutdown.  Including
> >> > kexec.
> >> >
> >> > You don't get a device remove/hotunplug but unless this is a kexec on
> >> > panic ->shutdown is most definitely called.  Now I am talking about the
> >> > device layer/pci layer shutdown method I don't know how gpu drivers are
> >> > wired up.  GPU land was a little strange last I looked.  Hopefully it
> >> > isn't so strange that there is a method named shutdown that is not wired
> >> > up.
> >>
> >> It doesn't look like the drm infrastructure has a shutdown callback.
> >> The drm drivers register a drm_driver callback struct that includes an
> >> unload callback which takes care of all the device teardown (if you
> >> unload the module for example).  I don't know that it actually gets
> >> called at kexec time however.  I don't know enough about how kexec
> >> works.
> >
> > BTW there is r100_restore_sanity() in drm/radeon/r100.c that explicitly
> > handles the kexec case during init. So maybe an r600_restore_sanity()
> > function is needed?
> >
> > (One of the advantages of using kexec (besides the much quicker boot
> > time) is that the monitor is not switched off and then on during boot.
> > I guess that advantage would be lost if the unload callback would be
> > called.)
> 
> r100_restore_sanity() is basically a set of hacks (that gets called at
> driver startup) to work around the fact that with kexec the drm driver
> is not torn down correctly.  So we could add a bunch more asic
> specific tear down sequences to deal with dpm (and all the other
> engines on the GPU that may potentially cause problems if they are not
> torn down properly), but that will just turn into a mess.  All of
> these hacks also add latency to the driver load.  I think the best
> solution would probably be to figure how to hook up the drm unload
>

Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-30 Thread Markus Trippelsdorf
On 2013.07.30 at 13:27 +0200, Markus Trippelsdorf wrote:
> On 2013.07.29 at 15:53 -0400, Alex Deucher wrote:
> > On Mon, Jul 29, 2013 at 2:10 PM, Eric W. Biederman
> >  wrote:
> > > Alex Deucher  writes:
> > >
> > >> On Mon, Jul 29, 2013 at 11:50 AM, Eric W. Biederman
> > >>  wrote:
> > >>>
> > >>>
> > >>> Alex Deucher  wrote:
> > >>>>On Mon, Jul 29, 2013 at 10:09 AM, Markus Trippelsdorf
> > >>>> wrote:
> > >>>>> On 2013.07.29 at 09:58 -0400, Alex Deucher wrote:
> > >>>>>> On Mon, Jul 29, 2013 at 3:51 AM, Markus Trippelsdorf
> > >>>>>>  wrote:
> > >>>>>> > On my test machine Xorg doesn't start anymore when I kexec into a
> > >>>>>> > 3.11.0-rc3 kernel.
> > >>>>>>
> > >>>>>> With kexec, dpm doesn't get torn down properly which can result in a
> > >>>>>> bad hardware state when the driver loads again.  Does the attached
> > >>>>>> patch help?  It attempts to disable dpm at startup in case it wasn't
> > >>>>>> torn down properly previously.
> > >>>>>
> > >>>>> dpm initialization now works, but unfortunately GPU acceleration
> > >>>>still gets
> > >>>>> disabled:
> > >>>>
> > >>>>Stupid kexec complicates things.  We need to make sure dpm is torn
> > >>>>down before we init the rest of the GPU, but dpm needs get initialized
> > >>>>later in the init process since it depends on certain other state from
> > >>>>the driver.  I need to think about this for a bit.  I'm not sure of a
> > >>>>good way to handle this.
> > >>>
> > >>> You might just want to implement a shutdown method that cleans things 
> > >>> up properly.   At least as a first pass until you worry about things 
> > >>> like kexec on panic.
> > >>>
> > >>> Or can you not shutdown the graphics stack on reboot because of the 
> > >>> need to display the kernels shutdown progress?
> > >>
> > >> Does kexec actually call this shutdown method?  The driver implements
> > >> appropriate clean-up measures if it's shutdown properly.
> > >
> > > Absolutely.  All parts of the reboot path call ->shutdown.  Including
> > > kexec.
> > >
> > > You don't get a device remove/hotunplug but unless this is a kexec on
> > > panic ->shutdown is most definitely called.  Now I am talking about the
> > > device layer/pci layer shutdown method I don't know how gpu drivers are
> > > wired up.  GPU land was a little strange last I looked.  Hopefully it
> > > isn't so strange that there is a method named shutdown that is not wired
> > > up.
> > 
> > It doesn't look like the drm infrastructure has a shutdown callback.
> > The drm drivers register a drm_driver callback struct that includes an
> > unload callback which takes care of all the device teardown (if you
> > unload the module for example).  I don't know that it actually gets
> > called at kexec time however.  I don't know enough about how kexec
> > works.
> 
> BTW there is r100_restore_sanity() in drm/radeon/r100.c that explicitly
> handles the kexec case during init. So maybe an r600_restore_sanity()
> function is needed?

Oh, I see r100_restore_sanity() also gets called for the other ASICs...

-- 
Markus


Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-30 Thread Markus Trippelsdorf
On 2013.07.29 at 15:53 -0400, Alex Deucher wrote:
> On Mon, Jul 29, 2013 at 2:10 PM, Eric W. Biederman
>  wrote:
> > Alex Deucher  writes:
> >
> >> On Mon, Jul 29, 2013 at 11:50 AM, Eric W. Biederman
> >>  wrote:
> >>>
> >>>
> >>> Alex Deucher  wrote:
> >>>>On Mon, Jul 29, 2013 at 10:09 AM, Markus Trippelsdorf
> >>>> wrote:
> >>>>> On 2013.07.29 at 09:58 -0400, Alex Deucher wrote:
> >>>>>> On Mon, Jul 29, 2013 at 3:51 AM, Markus Trippelsdorf
> >>>>>>  wrote:
> >>>>>> > On my test machine Xorg doesn't start anymore when I kexec into a
> >>>>>> > 3.11.0-rc3 kernel.
> >>>>>>
> >>>>>> With kexec, dpm doesn't get torn down properly which can result in a
> >>>>>> bad hardware state when the driver loads again.  Does the attached
> >>>>>> patch help?  It attempts to disable dpm at startup in case it wasn't
> >>>>>> torn down properly previously.
> >>>>>
> >>>>> dpm initialization now works, but unfortunately GPU acceleration
> >>>>still gets
> >>>>> disabled:
> >>>>
> >>>>Stupid kexec complicates things.  We need to make sure dpm is torn
> >>>>down before we init the rest of the GPU, but dpm needs get initialized
> >>>>later in the init process since it depends on certain other state from
> >>>>the driver.  I need to think about this for a bit.  I'm not sure of a
> >>>>good way to handle this.
> >>>
> >>> You might just want to implement a shutdown method that cleans things up 
> >>> properly.   At least as a first pass until you worry about things like 
> >>> kexec on panic.
> >>>
> >>> Or can you not shutdown the graphics stack on reboot because of the need 
> >>> to display the kernels shutdown progress?
> >>
> >> Does kexec actually call this shutdown method?  The driver implements
> >> appropriate clean-up measures if it's shutdown properly.
> >
> > Absolutely.  All parts of the reboot path call ->shutdown.  Including
> > kexec.
> >
> > You don't get a device remove/hotunplug but unless this is a kexec on
> > panic ->shutdown is most definitely called.  Now I am talking about the
> > device layer/pci layer shutdown method I don't know how gpu drivers are
> > wired up.  GPU land was a little strange last I looked.  Hopefully it
> > isn't so strange that there is a method named shutdown that is not wired
> > up.
> 
> It doesn't look like the drm infrastructure has a shutdown callback.
> The drm drivers register a drm_driver callback struct that includes an
> unload callback which takes care of all the device teardown (if you
> unload the module for example).  I don't know that it actually gets
> called at kexec time however.  I don't know enough about how kexec
> works.

BTW there is r100_restore_sanity() in drm/radeon/r100.c that explicitly
handles the kexec case during init. So maybe an r600_restore_sanity()
function is needed?

(One of the advantages of using kexec (besides the much quicker boot
time) is that the monitor is not switched off and then on during boot.
I guess that advantage would be lost if the unload callback were
called.)

-- 
Markus


Re: Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-30 Thread Markus Trippelsdorf
On 2013.07.29 at 15:53 -0400, Alex Deucher wrote:
 On Mon, Jul 29, 2013 at 2:10 PM, Eric W. Biederman
 ebied...@xmission.com wrote:
  Alex Deucher alexdeuc...@gmail.com writes:
 
  On Mon, Jul 29, 2013 at 11:50 AM, Eric W. Biederman
  ebied...@xmission.com wrote:
 
 
  Alex Deucher alexdeuc...@gmail.com wrote:
 On Mon, Jul 29, 2013 at 10:09 AM, Markus Trippelsdorf
 mar...@trippelsdorf.de wrote:
  On 2013.07.29 at 09:58 -0400, Alex Deucher wrote:
  On Mon, Jul 29, 2013 at 3:51 AM, Markus Trippelsdorf
  mar...@trippelsdorf.de wrote:
   On my test machine Xorg doesn't start anymore when I kexec into a
   3.11.0-rc3 kernel.
 
  With kexec, dpm doesn't get torn down properly which can result in a
  bad hardware state when the driver loads again.  Does the attached
  patch help?  It attempts to disable dpm at startup in case it wasn't
  torn down properly previously.
 
  dpm initialization now works, but unfortunately GPU acceleration
 still gets
  disabled:
 
 Stupid kexec complicates things.  We need to make sure dpm is torn
 down before we init the rest of the GPU, but dpm needs get initialized
 later in the init process since it depends on certain other state from
 the driver.  I need to think about this for a bit.  I'm not sure of a
 good way to handle this.
 
  You might just want to implement a shutdown method that cleans things up 
  properly.   At least as a first pass until you worry about things like 
  kexec on panic.
 
  Or can you not shutdown the graphics stack on reboot because of the need 
  to display the kernels shutdown progress?
 
  Does kexec actually call this shutdown method?  The driver implements
  appropriate clean-up measures if it's shutdown properly.
 
  Absoltuely.  All parts of the reboot path call -shutdown.  Including
  kexec.
 
  You don't get a device remove/hotunplug but unless this is a kexec on
  panic -shutdown is most definitely called.  Now I am talking about the
  device layer/pci layer shutdown method I don't know how gpu drivers are
  wired up.  GPU land was a little strange last I looked.  Hopefully it
  isn't so strange that there is a method named shutdown that is not wired
  up.
 
 It doesn't look like the drm infrastructure has a shutdown callback.
 The drm drivers register a drm_driver callback struct that includes an
 unload callback which takes care of all the device teardown (if you
 unload the module for example).  I don't know that it actually gets
 called at kexec time however.  I don't know enough about how kexec
 works.

BTW there is r100_restore_sanity() in drm/radeon/r100.c that explicitly
handles the kexec case during init. So maybe an r600_restore_sanity()
function is needed?

(One of the advantages of using kexec (besides the much quicker boot
time) is that the monitor is not switched off and then on during boot.
I guess that advantage would be lost if the unload callback would be
called.)

-- 
Markus
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-30 Thread Markus Trippelsdorf
On 2013.07.30 at 13:27 +0200, Markus Trippelsdorf wrote:
> On 2013.07.29 at 15:53 -0400, Alex Deucher wrote:
> > On Mon, Jul 29, 2013 at 2:10 PM, Eric W. Biederman
> > ebied...@xmission.com wrote:
> > > Alex Deucher alexdeuc...@gmail.com writes:
> > >
> > >> On Mon, Jul 29, 2013 at 11:50 AM, Eric W. Biederman
> > >> ebied...@xmission.com wrote:
> > >>>
> > >>>
> > >>> Alex Deucher alexdeuc...@gmail.com wrote:
> > >>>> On Mon, Jul 29, 2013 at 10:09 AM, Markus Trippelsdorf
> > >>>> mar...@trippelsdorf.de wrote:
> > >>>>> On 2013.07.29 at 09:58 -0400, Alex Deucher wrote:
> > >>>>>> On Mon, Jul 29, 2013 at 3:51 AM, Markus Trippelsdorf
> > >>>>>> mar...@trippelsdorf.de wrote:
> > >>>>>> > On my test machine Xorg doesn't start anymore when I kexec into a
> > >>>>>> > 3.11.0-rc3 kernel.
> > >>>>>>
> > >>>>>> With kexec, dpm doesn't get torn down properly which can result in a
> > >>>>>> bad hardware state when the driver loads again.  Does the attached
> > >>>>>> patch help?  It attempts to disable dpm at startup in case it wasn't
> > >>>>>> torn down properly previously.
> > >>>>>
> > >>>>> dpm initialization now works, but unfortunately GPU acceleration still gets
> > >>>>> disabled:
> > >>>>
> > >>>> Stupid kexec complicates things.  We need to make sure dpm is torn
> > >>>> down before we init the rest of the GPU, but dpm needs get initialized
> > >>>> later in the init process since it depends on certain other state from
> > >>>> the driver.  I need to think about this for a bit.  I'm not sure of a
> > >>>> good way to handle this.
> > >>>
> > >>> You might just want to implement a shutdown method that cleans things
> > >>> up properly.   At least as a first pass until you worry about things
> > >>> like kexec on panic.
> > >>>
> > >>> Or can you not shutdown the graphics stack on reboot because of the
> > >>> need to display the kernels shutdown progress?
> > >>
> > >> Does kexec actually call this shutdown method?  The driver implements
> > >> appropriate clean-up measures if it's shutdown properly.
> > >
> > > Absoltuely.  All parts of the reboot path call ->shutdown.  Including
> > > kexec.
> > >
> > > You don't get a device remove/hotunplug but unless this is a kexec on
> > > panic ->shutdown is most definitely called.  Now I am talking about the
> > > device layer/pci layer shutdown method I don't know how gpu drivers are
> > > wired up.  GPU land was a little strange last I looked.  Hopefully it
> > > isn't so strange that there is a method named shutdown that is not wired
> > > up.
> >
> > It doesn't look like the drm infrastructure has a shutdown callback.
> > The drm drivers register a drm_driver callback struct that includes an
> > unload callback which takes care of all the device teardown (if you
> > unload the module for example).  I don't know that it actually gets
> > called at kexec time however.  I don't know enough about how kexec
> > works.
>
> BTW there is r100_restore_sanity() in drm/radeon/r100.c that explicitly
> handles the kexec case during init. So maybe an r600_restore_sanity()
> function is needed?

Oh, I see: r100_restore_sanity() also gets called for the other ASICs...
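
For readers following the thread, the idea behind a "restore sanity" hook can be modelled in plain user-space C: at init time, read each register that a previous kernel may have left programmed and zero it if it is still set. This is only a sketch under stated assumptions. The register file is an ordinary array, fake_rreg32()/fake_wreg32() are hypothetical stand-ins for the driver's register accessors, and the register indices are made up; none of this is the actual radeon code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register indices; real radeon register offsets differ. */
enum { REG_CP_CSQ_CNTL, REG_CP_RB_CNTL, REG_SCRATCH_UMSK, REG_COUNT };

/* A plain array standing in for the MMIO register file (assumption). */
struct fake_gpu {
    uint32_t regs[REG_COUNT];
};

static uint32_t fake_rreg32(const struct fake_gpu *gpu, size_t reg)
{
    return gpu->regs[reg];
}

static void fake_wreg32(struct fake_gpu *gpu, size_t reg, uint32_t val)
{
    gpu->regs[reg] = val;
}

/*
 * Zero every listed register that still holds a non-zero value, as a
 * driver might do at init when the previous kernel was not torn down.
 * Returns how many registers had to be cleared.
 */
static int fake_restore_sanity(struct fake_gpu *gpu)
{
    static const size_t stale_regs[] = {
        REG_CP_CSQ_CNTL, REG_CP_RB_CNTL, REG_SCRATCH_UMSK,
    };
    int cleared = 0;

    for (size_t i = 0; i < sizeof(stale_regs) / sizeof(stale_regs[0]); i++) {
        if (fake_rreg32(gpu, stale_regs[i]) != 0) {
            fake_wreg32(gpu, stale_regs[i], 0);
            cleared++;
        }
    }
    return cleared;
}
```

Calling fake_restore_sanity() on a freshly zeroed struct fake_gpu clears nothing, which mirrors the cold-boot case; only state left over from a previous kernel triggers the writes.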

-- 
Markus


Re: Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-30 Thread Markus Trippelsdorf
On 2013.07.30 at 10:53 -0400, Alex Deucher wrote:
> On Tue, Jul 30, 2013 at 7:27 AM, Markus Trippelsdorf
> mar...@trippelsdorf.de wrote:
> > On 2013.07.29 at 15:53 -0400, Alex Deucher wrote:
> > > On Mon, Jul 29, 2013 at 2:10 PM, Eric W. Biederman
> > > ebied...@xmission.com wrote:
> > > > Alex Deucher alexdeuc...@gmail.com writes:
> > > >
> > > >> On Mon, Jul 29, 2013 at 11:50 AM, Eric W. Biederman
> > > >> ebied...@xmission.com wrote:
> > > >>>
> > > >>>
> > > >>> Alex Deucher alexdeuc...@gmail.com wrote:
> > > >>>> On Mon, Jul 29, 2013 at 10:09 AM, Markus Trippelsdorf
> > > >>>> mar...@trippelsdorf.de wrote:
> > > >>>>> On 2013.07.29 at 09:58 -0400, Alex Deucher wrote:
> > > >>>>>> On Mon, Jul 29, 2013 at 3:51 AM, Markus Trippelsdorf
> > > >>>>>> mar...@trippelsdorf.de wrote:
> > > >>>>>> > On my test machine Xorg doesn't start anymore when I kexec into a
> > > >>>>>> > 3.11.0-rc3 kernel.
> > > >>>>>>
> > > >>>>>> With kexec, dpm doesn't get torn down properly which can result in a
> > > >>>>>> bad hardware state when the driver loads again.  Does the attached
> > > >>>>>> patch help?  It attempts to disable dpm at startup in case it wasn't
> > > >>>>>> torn down properly previously.
> > > >>>>>
> > > >>>>> dpm initialization now works, but unfortunately GPU acceleration still gets
> > > >>>>> disabled:
> > > >>>>
> > > >>>> Stupid kexec complicates things.  We need to make sure dpm is torn
> > > >>>> down before we init the rest of the GPU, but dpm needs get initialized
> > > >>>> later in the init process since it depends on certain other state from
> > > >>>> the driver.  I need to think about this for a bit.  I'm not sure of a
> > > >>>> good way to handle this.
> > > >>>
> > > >>> You might just want to implement a shutdown method that cleans things
> > > >>> up properly.   At least as a first pass until you worry about things
> > > >>> like kexec on panic.
> > > >>>
> > > >>> Or can you not shutdown the graphics stack on reboot because of the
> > > >>> need to display the kernels shutdown progress?
> > > >>
> > > >> Does kexec actually call this shutdown method?  The driver implements
> > > >> appropriate clean-up measures if it's shutdown properly.
> > > >
> > > > Absoltuely.  All parts of the reboot path call ->shutdown.  Including
> > > > kexec.
> > > >
> > > > You don't get a device remove/hotunplug but unless this is a kexec on
> > > > panic ->shutdown is most definitely called.  Now I am talking about the
> > > > device layer/pci layer shutdown method I don't know how gpu drivers are
> > > > wired up.  GPU land was a little strange last I looked.  Hopefully it
> > > > isn't so strange that there is a method named shutdown that is not wired
> > > > up.
> > >
> > > It doesn't look like the drm infrastructure has a shutdown callback.
> > > The drm drivers register a drm_driver callback struct that includes an
> > > unload callback which takes care of all the device teardown (if you
> > > unload the module for example).  I don't know that it actually gets
> > > called at kexec time however.  I don't know enough about how kexec
> > > works.
> >
> > BTW there is r100_restore_sanity() in drm/radeon/r100.c that explicitly
> > handles the kexec case during init. So maybe an r600_restore_sanity()
> > function is needed?
> >
> > (One of the advantages of using kexec (besides the much quicker boot
> > time) is that the monitor is not switched off and then on during boot.
> > I guess that advantage would be lost if the unload callback would be
> > called.)
>
> r100_restore_sanity() is basically a set of hacks (that gets called at
> driver startup) to work around the fact that with kexec the drm driver
> is not torn down correctly.  So we could add a bunch more asic
> specific tear down sequences to deal with dpm (and all the other
> engines on the GPU that may potentially cause problems if they are not
> torn down properly), but that will just turn into a mess.  All of
> these hacks also add latency to the driver load.  I think the best
> solution would probably be to figure how to hook up the drm unload
> callback to the shutdown method that kexec uses.

FYI the following (ugly) hack works for me. 
(If I don't comment out radeon_fbdev_fini(rdev) kexec will hang.)

diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index 75349cd..13e2988 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -3947,20 +3947,6 @@ void r100_fini(struct radeon_device *rdev)
  */
 void r100_restore_sanity(struct radeon_device *rdev)
 {
-   u32 tmp;
-
-   tmp = RREG32(RADEON_CP_CSQ_CNTL);
-   if (tmp) {
-   WREG32(RADEON_CP_CSQ_CNTL, 0);
-   }
-   tmp = RREG32(RADEON_CP_RB_CNTL);
-   if (tmp) {
-   WREG32(RADEON_CP_RB_CNTL, 0);
-   }
-   tmp = RREG32(RADEON_SCRATCH_UMSK);
-   if (tmp) {
-   WREG32(RADEON_SCRATCH_UMSK, 0);
-   }
 }
 
 int r100_init(struct radeon_device *rdev)
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index c2b67b4..79b38e2 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -1405,7 +1405,7 @@ int radeon_modeset_init(struct radeon_device *rdev)
 
 void radeon_modeset_fini(struct radeon_device *rdev)
 {
-   radeon_fbdev_fini(rdev);
+// radeon_fbdev_fini(rdev);
kfree(rdev

Re: Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-30 Thread Markus Trippelsdorf
On 2013.07.30 at 20:46 +0200, Markus Trippelsdorf wrote:
> On 2013.07.30 at 10:53 -0400, Alex Deucher wrote:
> > On Tue, Jul 30, 2013 at 7:27 AM, Markus Trippelsdorf
> > mar...@trippelsdorf.de wrote:
> > > On 2013.07.29 at 15:53 -0400, Alex Deucher wrote:
> > > > On Mon, Jul 29, 2013 at 2:10 PM, Eric W. Biederman
> > > > ebied...@xmission.com wrote:
> > > > > Alex Deucher alexdeuc...@gmail.com writes:
> > > > >
> > > > >> On Mon, Jul 29, 2013 at 11:50 AM, Eric W. Biederman
> > > > >> ebied...@xmission.com wrote:
> > > > >>>
> > > > >>>
> > > > >>> Alex Deucher alexdeuc...@gmail.com wrote:
> > > > >>>> On Mon, Jul 29, 2013 at 10:09 AM, Markus Trippelsdorf
> > > > >>>> mar...@trippelsdorf.de wrote:
> > > > >>>>> On 2013.07.29 at 09:58 -0400, Alex Deucher wrote:
> > > > >>>>>> On Mon, Jul 29, 2013 at 3:51 AM, Markus Trippelsdorf
> > > > >>>>>> mar...@trippelsdorf.de wrote:
> > > > >>>>>> > On my test machine Xorg doesn't start anymore when I kexec into a
> > > > >>>>>> > 3.11.0-rc3 kernel.
> > > > >>>>>>
> > > > >>>>>> With kexec, dpm doesn't get torn down properly which can result in a
> > > > >>>>>> bad hardware state when the driver loads again.  Does the attached
> > > > >>>>>> patch help?  It attempts to disable dpm at startup in case it wasn't
> > > > >>>>>> torn down properly previously.
> > > > >>>>>
> > > > >>>>> dpm initialization now works, but unfortunately GPU acceleration still gets
> > > > >>>>> disabled:
> > > > >>>>
> > > > >>>> Stupid kexec complicates things.  We need to make sure dpm is torn
> > > > >>>> down before we init the rest of the GPU, but dpm needs get initialized
> > > > >>>> later in the init process since it depends on certain other state from
> > > > >>>> the driver.  I need to think about this for a bit.  I'm not sure of a
> > > > >>>> good way to handle this.
> > > > >>>
> > > > >>> You might just want to implement a shutdown method that cleans things
> > > > >>> up properly.   At least as a first pass until you worry about things
> > > > >>> like kexec on panic.
> > > > >>>
> > > > >>> Or can you not shutdown the graphics stack on reboot because of the
> > > > >>> need to display the kernels shutdown progress?
> > > > >>
> > > > >> Does kexec actually call this shutdown method?  The driver implements
> > > > >> appropriate clean-up measures if it's shutdown properly.
> > > > >
> > > > > Absoltuely.  All parts of the reboot path call ->shutdown.  Including
> > > > > kexec.
> > > > >
> > > > > You don't get a device remove/hotunplug but unless this is a kexec on
> > > > > panic ->shutdown is most definitely called.  Now I am talking about the
> > > > > device layer/pci layer shutdown method I don't know how gpu drivers are
> > > > > wired up.  GPU land was a little strange last I looked.  Hopefully it
> > > > > isn't so strange that there is a method named shutdown that is not wired
> > > > > up.
> > > >
> > > > It doesn't look like the drm infrastructure has a shutdown callback.
> > > > The drm drivers register a drm_driver callback struct that includes an
> > > > unload callback which takes care of all the device teardown (if you
> > > > unload the module for example).  I don't know that it actually gets
> > > > called at kexec time however.  I don't know enough about how kexec
> > > > works.
> > >
> > > BTW there is r100_restore_sanity() in drm/radeon/r100.c that explicitly
> > > handles the kexec case during init. So maybe an r600_restore_sanity()
> > > function is needed?
> > >
> > > (One of the advantages of using kexec (besides the much quicker boot
> > > time) is that the monitor is not switched off and then on during boot.
> > > I guess that advantage would be lost if the unload callback would be
> > > called.)
> >
> > r100_restore_sanity() is basically a set of hacks (that gets called at
> > driver startup) to work around the fact that with kexec the drm driver
> > is not torn down correctly.  So we could add a bunch more asic
> > specific tear down sequences to deal with dpm (and all the other
> > engines on the GPU that may potentially cause problems if they are not
> > torn down properly), but that will just turn into a mess.  All of
> > these hacks also add latency to the driver load.  I think the best
> > solution would probably be to figure how to hook up the drm unload
> > callback to the shutdown method that kexec uses.
>
> FYI the following (ugly) hack works for me.

No. It still fails, although much less often (roughly every 6th to 8th boot).

I'm beginning to wonder whether:
 [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
is a simple initialization bug that doesn't directly depend on kexec at
all.
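
For context, a ring test of this kind can be modelled in user-space C. The sketch below assumes the usual sentinel scheme: the CPU seeds a memory word with 0xCAFEDEAD, asks the engine to overwrite it with 0xDEADBEEF, and polls until the new value appears or a timeout expires. The struct fake_engine type, its fields and fake_ring_test() are hypothetical names for illustration only and are not the driver's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEED_VALUE   0xCAFEDEADu  /* written by the CPU before the test */
#define TARGET_VALUE 0xDEADBEEFu  /* the engine is asked to write this */

/* Hypothetical engine model (assumption, not driver code). */
struct fake_engine {
    bool hung;          /* a hung engine never executes the write */
    unsigned latency;   /* polls needed before a healthy engine writes */
    uint32_t word;      /* memory word shared by CPU and engine */
};

/*
 * Seed the word, then poll for the engine's write.
 * Returns the poll index at which the target value was seen,
 * or -1 if the timeout expired. On failure e->word still holds
 * whatever was read last.
 */
static int fake_ring_test(struct fake_engine *e, unsigned timeout)
{
    e->word = SEED_VALUE;

    for (unsigned i = 0; i < timeout; i++) {
        /* Model the engine making progress between polls. */
        if (!e->hung && i >= e->latency)
            e->word = TARGET_VALUE;

        if (e->word == TARGET_VALUE)
            return (int)i;
    }
    return -1;
}
```

In this model a failed test leaves the seed value in place, so a failure that reports 0xCAFEDEAD would mean the engine never executed the write at all, rather than writing a wrong value.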

-- 
Markus


Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-29 Thread Markus Trippelsdorf
On 2013.07.29 at 15:53 -0400, Alex Deucher wrote:
> On Mon, Jul 29, 2013 at 2:10 PM, Eric W. Biederman
>  wrote:
> > Alex Deucher  writes:
> >
> >> On Mon, Jul 29, 2013 at 11:50 AM, Eric W. Biederman
> >>  wrote:
> >>>
> >>>
> >>> Alex Deucher  wrote:
> >>>>On Mon, Jul 29, 2013 at 10:09 AM, Markus Trippelsdorf
> >>>> wrote:
> >>>>> On 2013.07.29 at 09:58 -0400, Alex Deucher wrote:
> >>>>>> On Mon, Jul 29, 2013 at 3:51 AM, Markus Trippelsdorf
> >>>>>>  wrote:
> >>>>>> > On my test machine Xorg doesn't start anymore when I kexec into a
> >>>>>> > 3.11.0-rc3 kernel.
> >>>>>>
> >>>>>> With kexec, dpm doesn't get torn down properly which can result in a
> >>>>>> bad hardware state when the driver loads again.  Does the attached
> >>>>>> patch help?  It attempts to disable dpm at startup in case it wasn't
> >>>>>> torn down properly previously.
> >>>>>
> >>>>> dpm initialization now works, but unfortunately GPU acceleration
> >>>>still gets
> >>>>> disabled:
> >>>>
> >>>>Stupid kexec complicates things.  We need to make sure dpm is torn
> >>>>down before we init the rest of the GPU, but dpm needs get initialized
> >>>>later in the init process since it depends on certain other state from
> >>>>the driver.  I need to think about this for a bit.  I'm not sure of a
> >>>>good way to handle this.
> >>>
> >>> You might just want to implement a shutdown method that cleans things up 
> >>> properly.   At least as a first pass until you worry about things like 
> >>> kexec on panic.
> >>>
> >>> Or can you not shutdown the graphics stack on reboot because of the need 
> >>> to display the kernels shutdown progress?
> >>
> >> Does kexec actually call this shutdown method?  The driver implements
> >> appropriate clean-up measures if it's shutdown properly.
> >
> > Absoltuely.  All parts of the reboot path call ->shutdown.  Including
> > kexec.
> >
> > You don't get a device remove/hotunplug but unless this is a kexec on
> > panic ->shutdown is most definitely called.  Now I am talking about the
> > device layer/pci layer shutdown method I don't know how gpu drivers are
> > wired up.  GPU land was a little strange last I looked.  Hopefully it
> > isn't so strange that there is a method named shutdown that is not wired
> > up.
> 
> It doesn't look like the drm infrastructure has a shutdown callback.
> The drm drivers register a drm_driver callback struct that includes an
> unload callback which takes care of all the device teardown (if you
> unload the module for example).  I don't know that it actually gets
> called at kexec time however.  I don't know enough about how kexec
> works.
> 
> Markus, does everything work ok after a reboot?  Is it just kexec that
> is a problem?

Yes.

-- 
Markus


Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-29 Thread Markus Trippelsdorf
On 2013.07.29 at 18:14 +0200, Joshua C. wrote:
> 
> This error message seems similar to mine "[drm:r600_uvd_ring_test]
> *ERROR* radeon: ring 5 test failed (0xCAFEDEAD)" Bugzilla:
> https://bugs.freedesktop.org/show_bug.cgi?id=67276 In my case I blame
> another commit for this. Are these bugs related?

I guess not, because reverting commit 9cc2e0e9f13 doesn't fix the issue
for me.
Can you check whether reverting commit f5d9b7f0f9 fixes the problem for
you?

-- 
Markus


Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-29 Thread Markus Trippelsdorf
On 2013.07.29 at 09:58 -0400, Alex Deucher wrote:
> On Mon, Jul 29, 2013 at 3:51 AM, Markus Trippelsdorf
>  wrote:
> > On my test machine Xorg doesn't start anymore when I kexec into a
> > 3.11.0-rc3 kernel.
> 
> With kexec, dpm doesn't get torn down properly which can result in a
> bad hardware state when the driver loads again.  Does the attached
> patch help?  It attempts to disable dpm at startup in case it wasn't
> torn down properly previously.

dpm initialization now works, but unfortunately GPU acceleration still gets
disabled:

[drm] Initialized drm 1.1.0 20060810
[drm] radeon kernel modesetting enabled.
[drm] initializing kernel modesetting (RS780 0x1002:0x9614 0x1043:0x834D).
[drm] register mmio base: 0xFBEE
[drm] register mmio size: 65536
ATOM BIOS: 113
radeon :01:05.0: VRAM: 128M 0xC000 - 0xC7FF (128M used)
radeon :01:05.0: GTT: 512M 0xA000 - 0xBFFF
[drm] Detected VRAM RAM=128M, BAR=128M
[drm] RAM width 32bits DDR
[TTM] Zone  kernel: Available graphics memory: 4082356 kiB
[TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
[drm] radeon: 128M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] Loading RS780 Microcode
[drm] PCIE GART of 512M enabled (table at 0xC004).
radeon :01:05.0: WB enabled
radeon :01:05.0: fence driver on ring 0 use gpu addr 0xac00 and cpu addr 0x880215c30c00
radeon :01:05.0: fence driver on ring 3 use gpu addr 0xac0c and cpu addr 0x880215c30c0c
[drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[drm] Driver supports precise vblank timestamp query.
[drm] radeon: irq initialized.
radeon :01:05.0: setting latency timer to 64
[drm] ring test on 0 succeeded in 1 usecs
[drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
radeon :01:05.0: disabling GPU acceleration
radeon :01:05.0: 8802161cfc00 unpin not necessary
[drm] Radeon Display Connectors
[drm] Connector 0:
[drm]   VGA-1
[drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[drm]   Encoders:
[drm] CRT1: INTERNAL_KLDSCP_DAC1
[drm] Connector 1:
[drm]   DVI-D-1
[drm]   HPD3
[drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
[drm]   Encoders:
[drm] DFP3: INTERNAL_KLDSCP_LVTMA
== power state 0 ==
ui class: none
internal class: boot 
caps: video 
uvd vclk: 0 dclk: 0
power level 0 sclk: 5 vddc_index: 2
power level 1 sclk: 5 vddc_index: 2
status: c r b 
== power state 1 ==
ui class: performance
internal class: none
caps: video 
uvd vclk: 0 dclk: 0
power level 0 sclk: 5 vddc_index: 1
power level 1 sclk: 7 vddc_index: 2
status: 
== power state 2 ==
ui class: none
internal class: uvd 
caps: video 
uvd vclk: 53300 dclk: 4
power level 0 sclk: 5 vddc_index: 1
power level 1 sclk: 5 vddc_index: 1
status: 
switching from power state:
ui class: none
internal class: boot 
caps: video 
uvd vclk: 0 dclk: 0
power level 0 sclk: 5 vddc_index: 2
power level 1 sclk: 5 vddc_index: 2
status: c b 
switching to power state:
ui class: performance
internal class: none
caps: video 
uvd vclk: 0 dclk: 0
power level 0 sclk: 5 vddc_index: 1
power level 1 sclk: 7 vddc_index: 2
status: r 
[drm] radeon: dpm initialized
[drm] fb mappable at 0xF0142000
[drm] vram apper at 0xF000
[drm] size 7299072
[drm] fb depth is 24
[drm] pitch is 6912
fbcon: radeondrmfb (fb0)

-- 
Markus


Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-29 Thread Markus Trippelsdorf
On my test machine Xorg doesn't start anymore when I kexec into a
3.11.0-rc3 kernel.
On cold boot everything is fine:

[drm] Initialized drm 1.1.0 20060810
[drm] radeon kernel modesetting enabled.
[drm] initializing kernel modesetting (RS780 0x1002:0x9614 0x1043:0x834D).
[drm] register mmio base: 0xFBEE
[drm] register mmio size: 65536
ATOM BIOS: 113
radeon :01:05.0: VRAM: 128M 0xC000 - 0xC7FF (128M used)
radeon :01:05.0: GTT: 512M 0xA000 - 0xBFFF
[drm] Detected VRAM RAM=128M, BAR=128M
[drm] RAM width 32bits DDR
[TTM] Zone  kernel: Available graphics memory: 4082356 kiB
[TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
[drm] radeon: 128M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] Loading RS780 Microcode
[drm] PCIE GART of 512M enabled (table at 0xC004).
radeon :01:05.0: WB enabled
radeon :01:05.0: fence driver on ring 0 use gpu addr 0xac00 and cpu addr 0x880215c45c00
radeon :01:05.0: fence driver on ring 3 use gpu addr 0xac0c and cpu addr 0x880215c45c0c
[drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[drm] Driver supports precise vblank timestamp query.
[drm] radeon: irq initialized.
radeon :01:05.0: setting latency timer to 64
[drm] ring test on 0 succeeded in 0 usecs
[drm] ring test on 3 succeeded in 1 usecs
[drm] ib test on ring 0 succeeded in 0 usecs
[drm] ib test on ring 3 succeeded in 0 usecs
[drm] Radeon Display Connectors
[drm] Connector 0:
[drm]   VGA-1
[drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[drm]   Encoders:
[drm] CRT1: INTERNAL_KLDSCP_DAC1
[drm] Connector 1:
[drm]   DVI-D-1
[drm]   HPD3
[drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
[drm]   Encoders:
[drm] DFP3: INTERNAL_KLDSCP_LVTMA
== power state 0 ==
ui class: none
internal class: boot 
caps: video 
uvd vclk: 0 dclk: 0
power level 0 sclk: 5 vddc_index: 2
power level 1 sclk: 5 vddc_index: 2
status: c r b 
== power state 1 ==
ui class: performance
internal class: none
caps: video 
uvd vclk: 0 dclk: 0
power level 0 sclk: 5 vddc_index: 1
power level 1 sclk: 7 vddc_index: 2
status: 
== power state 2 ==
ui class: none
internal class: uvd 
caps: video 
uvd vclk: 53300 dclk: 4
power level 0 sclk: 5 vddc_index: 1
power level 1 sclk: 5 vddc_index: 1
status: 
switching from power state:
ui class: none
internal class: boot 
caps: video 
uvd vclk: 0 dclk: 0
power level 0 sclk: 5 vddc_index: 2
power level 1 sclk: 5 vddc_index: 2
status: c b 
switching to power state:
ui class: performance
internal class: none
caps: video 
uvd vclk: 0 dclk: 0
power level 0 sclk: 5 vddc_index: 1
power level 1 sclk: 7 vddc_index: 2
status: r 
[drm] radeon: dpm initialized
[drm] fb mappable at 0xF0142000
[drm] vram apper at 0xF000
[drm] size 7299072
[drm] fb depth is 24
[drm] pitch is 6912

But after I run kexec things go wrong:

[drm] Initialized drm 1.1.0 20060810
[drm] radeon kernel modesetting enabled.
[drm] initializing kernel modesetting (RS780 0x1002:0x9614 0x1043:0x834D).
[drm] register mmio base: 0xFBEE
[drm] register mmio size: 65536
ATOM BIOS: 113
radeon :01:05.0: VRAM: 128M 0xC000 - 0xC7FF (128M used)
radeon :01:05.0: GTT: 512M 0xA000 - 0xBFFF
[drm] Detected VRAM RAM=128M, BAR=128M
[drm] RAM width 32bits DDR
[TTM] Zone  kernel: Available graphics memory: 4082356 kiB
[TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
[drm] radeon: 128M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] Loading RS780 Microcode
[drm] PCIE GART of 512M enabled (table at 0xC004).
radeon :01:05.0: WB enabled
radeon :01:05.0: fence driver on ring 0 use gpu addr 0xac00 and cpu addr 0x880215c45c00
radeon :01:05.0: fence driver on ring 3 use gpu addr 0xac0c and cpu addr 0x880215c45c0c
[drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[drm] Driver supports precise vblank timestamp query.
[drm] radeon: irq initialized.
radeon :01:05.0: setting latency timer to 64
[drm] ring test on 0 succeeded in 0 usecs
[drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
radeon :01:05.0: disabling GPU 

Re: Commit f5d9b7f0f9 (fix r600_enable_sclk_control()) causes kexec issues

2013-07-29 Thread Markus Trippelsdorf
On 2013.07.29 at 15:53 -0400, Alex Deucher wrote:
> On Mon, Jul 29, 2013 at 2:10 PM, Eric W. Biederman
> ebied...@xmission.com wrote:
> > Alex Deucher alexdeuc...@gmail.com writes:
> >
> > > On Mon, Jul 29, 2013 at 11:50 AM, Eric W. Biederman
> > > ebied...@xmission.com wrote:
> > > >
> > > >
> > > > Alex Deucher alexdeuc...@gmail.com wrote:
> > > > > On Mon, Jul 29, 2013 at 10:09 AM, Markus Trippelsdorf
> > > > > mar...@trippelsdorf.de wrote:
> > > > > > On 2013.07.29 at 09:58 -0400, Alex Deucher wrote:
> > > > > > > On Mon, Jul 29, 2013 at 3:51 AM, Markus Trippelsdorf
> > > > > > > mar...@trippelsdorf.de wrote:
> > > > > > > > On my test machine Xorg doesn't start anymore when I kexec into a
> > > > > > > > 3.11.0-rc3 kernel.
> > > > > > >
> > > > > > > With kexec, dpm doesn't get torn down properly which can result in a
> > > > > > > bad hardware state when the driver loads again.  Does the attached
> > > > > > > patch help?  It attempts to disable dpm at startup in case it wasn't
> > > > > > > torn down properly previously.
> > > > > >
> > > > > > dpm initialization now works, but unfortunately GPU acceleration
> > > > > > still gets disabled:
> > > > > >
> > > > > Stupid kexec complicates things.  We need to make sure dpm is torn
> > > > > down before we init the rest of the GPU, but dpm needs to get initialized
> > > > > later in the init process since it depends on certain other state from
> > > > > the driver.  I need to think about this for a bit.  I'm not sure of a
> > > > > good way to handle this.
> > > >
> > > > You might just want to implement a shutdown method that cleans things up
> > > > properly.   At least as a first pass until you worry about things like
> > > > kexec on panic.
> > > >
> > > > Or can you not shutdown the graphics stack on reboot because of the need
> > > > to display the kernel's shutdown progress?
> > >
> > > Does kexec actually call this shutdown method?  The driver implements
> > > appropriate clean-up measures if it's shutdown properly.
> >
> > Absolutely.  All parts of the reboot path call ->shutdown.  Including
> > kexec.
> >
> > You don't get a device remove/hotunplug but unless this is a kexec on
> > panic ->shutdown is most definitely called.  Now I am talking about the
> > device layer/pci layer shutdown method I don't know how gpu drivers are
> > wired up.  GPU land was a little strange last I looked.  Hopefully it
> > isn't so strange that there is a method named shutdown that is not wired
> > up.
>
> It doesn't look like the drm infrastructure has a shutdown callback.
> The drm drivers register a drm_driver callback struct that includes an
> unload callback which takes care of all the device teardown (if you
> unload the module for example).  I don't know that it actually gets
> called at kexec time however.  I don't know enough about how kexec
> works.
>
> Markus, does everything work ok after a reboot?  Is it just kexec that
> is a problem?

Yes.

-- 
Markus


Commit ecff665f5e3f (drm/ttm: make ttm reservation calls...) causes system hang on Radeon RS780

2013-07-10 Thread Markus Trippelsdorf
By simply copy/pasting a big document under LibreOffice my system hangs
itself up. Only a hard reset gets it working again.
see also: https://bugs.freedesktop.org/show_bug.cgi?id=66551

I've bisected the issue to:

commit ecff665f5e3f1c6909353e00b9420e45ae23d995
Author: Maarten Lankhorst m.b.lankho...@gmail.com
Date:   Thu Jun 27 13:48:17 2013 +0200

drm/ttm: make ttm reservation calls behave like reservation calls

This commit converts the source of the val_seq counter to
the ww_mutex api. The reservation objects are converted later,
because there is still a lockdep splat in nouveau that has to
resolved first.

Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
Reviewed-by: Jerome Glisse jgli...@redhat.com
Signed-off-by: Dave Airlie airl...@redhat.com

-- 
Markus


Re: Commit ecff665f5e3f (drm/ttm: make ttm reservation calls...) causes system hang on Radeon RS780

2013-07-10 Thread Markus Trippelsdorf
On 2013.07.10 at 11:29 +0200, Maarten Lankhorst wrote:
> On 10-07-13 11:22, Markus Trippelsdorf wrote:
> > By simply copy/pasting a big document under LibreOffice my system hangs
> > itself up. Only a hard reset gets it working again.
> > see also: https://bugs.freedesktop.org/show_bug.cgi?id=66551
> >
> > I've bisected the issue to:
> >
> > commit ecff665f5e3f1c6909353e00b9420e45ae23d995
> > Author: Maarten Lankhorst m.b.lankho...@gmail.com
> > Date:   Thu Jun 27 13:48:17 2013 +0200
> >
> > drm/ttm: make ttm reservation calls behave like reservation calls
> >
> > This commit converts the source of the val_seq counter to
> > the ww_mutex api. The reservation objects are converted later,
> > because there is still a lockdep splat in nouveau that has to
> > resolved first.
> >
> > Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
> > Reviewed-by: Jerome Glisse jgli...@redhat.com
> > Signed-off-by: Dave Airlie airl...@redhat.com
> Hey,
>
> Can you try current head with CONFIG_PROVE_LOCKING set and post the
> lockdep splat from dmesg, if any? If there is any locking issue
> lockdep should warn about it.  Lockdep will turn itself off after the
> first splat, so if the lockdep splat happens before running the
> affected parts those will have to be fixed first.

There was an unrelated EDAC lockdep splat, so I simply disabled it.

This is what I get:

Jul 10 11:40:44 x4 kernel: 
Jul 10 11:40:44 x4 kernel: [ BUG: lock held when returning to user space! ]
Jul 10 11:40:44 x4 kernel: 3.10.0-08587-g496322b #35 Not tainted
Jul 10 11:40:44 x4 kernel: 
Jul 10 11:40:44 x4 kernel: X/211 is leaving the kernel with locks still held!
Jul 10 11:40:44 x4 kernel: 2 locks held by X/211:
Jul 10 11:40:44 x4 kernel: #0:  (reservation_ww_class_acquire){+.+.+.}, at: 
[813279f0] radeon_bo_list_validate+0x20/0xd0
Jul 10 11:40:44 x4 kernel: #1:  (reservation_ww_class_mutex){+.+.+.}, at: 
[81309306] ttm_eu_reserve_buffers+0x126/0x4b0
Jul 10 11:40:52 x4 kernel: SysRq : Emergency Sync
Jul 10 11:40:53 x4 kernel: Emergency Sync complete

-- 
Markus


Re: Commit ecff665f5e3f (drm/ttm: make ttm reservation calls...) causes system hang on Radeon RS780

2013-07-10 Thread Markus Trippelsdorf
On 2013.07.10 at 11:56 +0200, Maarten Lankhorst wrote:
> On 10-07-13 11:46, Markus Trippelsdorf wrote:
> > On 2013.07.10 at 11:29 +0200, Maarten Lankhorst wrote:
> > > On 10-07-13 11:22, Markus Trippelsdorf wrote:
> > > > By simply copy/pasting a big document under LibreOffice my system hangs
> > > > itself up. Only a hard reset gets it working again.
> > > > see also: https://bugs.freedesktop.org/show_bug.cgi?id=66551
> > > >
> > > > I've bisected the issue to:
> > > >
> > > > commit ecff665f5e3f1c6909353e00b9420e45ae23d995
> > > > Author: Maarten Lankhorst m.b.lankho...@gmail.com
> > > > Date:   Thu Jun 27 13:48:17 2013 +0200
> > > >
> > > > drm/ttm: make ttm reservation calls behave like reservation calls
> > > >
> > > > This commit converts the source of the val_seq counter to
> > > > the ww_mutex api. The reservation objects are converted later,
> > > > because there is still a lockdep splat in nouveau that has to
> > > > resolved first.
> > > >
> > > > Signed-off-by: Maarten Lankhorst maarten.lankho...@canonical.com
> > > > Reviewed-by: Jerome Glisse jgli...@redhat.com
> > > > Signed-off-by: Dave Airlie airl...@redhat.com
> > > Hey,
> > >
> > > Can you try current head with CONFIG_PROVE_LOCKING set and post the
> > > lockdep splat from dmesg, if any? If there is any locking issue
> > > lockdep should warn about it.  Lockdep will turn itself off after the
> > > first splat, so if the lockdep splat happens before running the
> > > affected parts those will have to be fixed first.
> > There was an unrelated EDAC lockdep splat, so I simply disabled it.
> >
> > This is what I get:
> >
> > Jul 10 11:40:44 x4 kernel:
> > Jul 10 11:40:44 x4 kernel: [ BUG: lock held when returning to user space! ]
> > Jul 10 11:40:44 x4 kernel: 3.10.0-08587-g496322b #35 Not tainted
> > Jul 10 11:40:44 x4 kernel:
> > Jul 10 11:40:44 x4 kernel: X/211 is leaving the kernel with locks still held!
> > Jul 10 11:40:44 x4 kernel: 2 locks held by X/211:
> > Jul 10 11:40:44 x4 kernel: #0:  (reservation_ww_class_acquire){+.+.+.}, at:
> > [813279f0] radeon_bo_list_validate+0x20/0xd0
> > Jul 10 11:40:44 x4 kernel: #1:  (reservation_ww_class_mutex){+.+.+.}, at:
> > [81309306] ttm_eu_reserve_buffers+0x126/0x4b0
> > Jul 10 11:40:52 x4 kernel: SysRq : Emergency Sync
> > Jul 10 11:40:53 x4 kernel: Emergency Sync complete
> >
> Thanks, exactly what I thought. I missed a backoff somewhere..
>
> Does the below patch fix it?

Yes. Thank you for your quick reply.
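
For readers following the thread: the lockdep report quoted above says that X returned to user space while still holding reservation locks, and Maarten attributes this to a missing backoff. The fix itself is not quoted in this message. What follows is only a userspace sketch of the general pattern, assuming plain pthread mutexes as stand-ins for the ww_mutex reservations; `struct buffer`, `backoff()` and `reserve_all()` are hypothetical names, not TTM functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not the TTM structure. */
struct buffer {
    pthread_mutex_t lock;
};

/* The "backoff" step: release the first `count` buffers again. */
static void backoff(struct buffer *bufs, size_t count)
{
    assert(bufs != NULL || count == 0);
    while (count > 0)
        pthread_mutex_unlock(&bufs[--count].lock);
}

/*
 * Try to lock all `n` buffers.
 * Returns 0 with every lock held, or -1 with no lock held.
 * If the failure path skipped backoff(), the caller would return
 * with some locks still held, which is what lockdep complained about.
 */
static int reserve_all(struct buffer *bufs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pthread_mutex_trylock(&bufs[i].lock) != 0) {
            backoff(bufs, i);   /* drop locks 0..i-1 before failing */
            return -1;
        }
    }
    return 0;
}
```

The point of the sketch is the invariant on the error path: either all buffers are reserved or none are.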

-- 
Markus


Re: [pull] radeon drm-next-3.11

2013-07-01 Thread Markus Trippelsdorf
On 2013.07.01 at 17:01 -0400, alexdeuc...@gmail.com wrote:
> From: Alex Deucher alexander.deuc...@amd.com
>
> Hi Dave,
>
> A few more patches for 3.11:
> - add debugfs interface to check current DPM state
> - Fix a bug that caused problems with DPM on BTC+ asics.
>
> The following changes since commit f7d452f4fd5d86f764807a1234a407deb5b105ef:
>
>   Merge branch 'drm-nouveau-next' of
> git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next (2013-07-01
> 14:10:20 +1000)
>
> are available in the git repository at:
>
>   git://people.freedesktop.org/~agd5f/linux drm-next-3.11
>
> Alex Deucher (12):
>   drm/radeon: remove sumo dpm/uvd bringup leftovers
>   drm/radeon/atom: fix endian bug in radeon_atom_init_mc_reg_table()
>   drm/radeon: fix typo in radeon_atom_init_mc_reg_table()
>   drm/radeon/dpm: re-enable state transitions for BTC
>   drm/radeon/dpm: re-enable state transitions for Cayman
>   drm/radeon/dpm: add infrastructure to support debugfs info
>   drm/radeon/dpm: add debugfs support for rv6xx
>   drm/radeon/dpm: add debugfs support for 7xx/evergreen/btc

Looks like you forgot to add debugfs support for rs780:

diff --git a/drivers/gpu/drm/radeon/radeon_asic.c b/drivers/gpu/drm/radeon/radeon_asic.c
index a5b244d..ca4f928 100644
--- a/drivers/gpu/drm/radeon/radeon_asic.c
+++ b/drivers/gpu/drm/radeon/radeon_asic.c
@@ -1270,6 +1270,7 @@ static struct radeon_asic rs780_asic = {
 		.get_sclk = &rs780_dpm_get_sclk,
 		.get_mclk = &rs780_dpm_get_mclk,
 		.print_power_state = &rs780_dpm_print_power_state,
+		.debugfs_print_current_performance_level = &rv770_dpm_debugfs_print_current_performance_level,
 	},
 	.pflip = {
 		.pre_page_flip = &rs600_pre_page_flip,
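
Why the missing line matters: in C, a member left out of a designated initializer is zero-initialized, so a callback that is not listed in the table is a NULL pointer and the feature is simply absent for that ASIC. A small self-contained sketch of this, using a hypothetical cut-down `struct dpm_ops` rather than the real `struct radeon_asic`, and assuming the caller checks the pointer before using it:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, cut-down callback table; not the real struct radeon_asic. */
struct dpm_ops {
    int (*get_sclk)(void);
    void (*debugfs_print_current_performance_level)(char *buf, size_t len);
};

/* Made-up clock value, for illustration only. */
static int demo_get_sclk(void)
{
    return 500;
}

static void demo_print_level(char *buf, size_t len)
{
    snprintf(buf, len, "sclk: %d", demo_get_sclk());
}

/* Entry that forgot the debugfs hook: the member is implicitly NULL. */
static const struct dpm_ops incomplete_ops = {
    .get_sclk = demo_get_sclk,
};

/* Entry with the hook wired up, as the one-line patch does for rs780. */
static const struct dpm_ops complete_ops = {
    .get_sclk = demo_get_sclk,
    .debugfs_print_current_performance_level = demo_print_level,
};

/* Caller-side guard: returns 0 if the hook ran, -1 if it was absent. */
static int print_level(const struct dpm_ops *ops, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (!ops->debugfs_print_current_performance_level) {
        snprintf(buf, len, "not supported");
        return -1;
    }
    ops->debugfs_print_current_performance_level(buf, len);
    return 0;
}
```

Calling `print_level()` with `incomplete_ops` takes the fallback branch, while `complete_ops` reaches the callback.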

-- 
Markus


radeon_gem_object_create:69 alloc size 139Mb bigger than 128Mb limit (RS780)

2013-02-28 Thread Markus Trippelsdorf
Running the latest Linus git kernel I occasionally see the following
warning:

 radeon_gem_object_create:69 alloc size 139Mb bigger than 128Mb limit

From dmesg:
[drm] Initialized drm 1.1.0 20060810
[drm] radeon kernel modesetting enabled.
[drm] initializing kernel modesetting (RS780 0x1002:0x9614 0x1043:0x834D).
[drm] register mmio base: 0xFBEE
[drm] register mmio size: 65536
ATOM BIOS: 113
radeon :01:05.0: VRAM: 128M 0xC000 - 0xC7FF (128M 
used)
radeon :01:05.0: GTT: 512M 0xA000 - 0xBFFF
[drm] Detected VRAM RAM=128M, BAR=128M
[drm] RAM width 32bits DDR
[TTM] Zone  kernel: Available graphics memory: 4083398 kiB
[TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
[drm] radeon: 128M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.
[drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[drm] Driver supports precise vblank timestamp query.
[drm] radeon: irq initialized.
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] Loading RS780 Microcode
[drm] PCIE GART of 512M enabled (table at 0xC004).
radeon :01:05.0: WB enabled
radeon :01:05.0: fence driver on ring 0 use gpu addr 0xac00 and 
cpu addr 0x8802163d1c00
radeon :01:05.0: fence driver on ring 3 use gpu addr 0xac0c and 
cpu addr 0x8802163d1c0c
radeon :01:05.0: setting latency timer to 64
[drm] ring test on 0 succeeded in 1 usecs
[drm] ring test on 3 succeeded in 1 usecs
[drm] ib test on ring 0 succeeded in 0 usecs
[drm] ib test on ring 3 succeeded in 0 usecs
[drm] Radeon Display Connectors
[drm] Connector 0:
[drm]   VGA-1
[drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[drm]   Encoders:
[drm] CRT1: INTERNAL_KLDSCP_DAC1
[drm] Connector 1:
[drm]   DVI-D-1
[drm]   HPD3
[drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
[drm]   Encoders:
[drm] DFP3: INTERNAL_KLDSCP_LVTMA
[drm] radeon: power management initialized
[drm] fb mappable at 0xF0142000
[drm] vram apper at 0xF000
[drm] size 7299072
[drm] fb depth is 24
[drm]pitch is 6912
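
The warning is consistent with the memory layout in this log under one assumption that the thread does not state: that the driver caps a single buffer at the smaller of visible VRAM and GTT size. With 128M of VRAM and 512M of GTT that gives the 128Mb limit, so a 139Mb request is refused. A minimal sketch of such a check; `struct mem_config`, `max_alloc_size()` and `check_alloc_size()` are hypothetical names, not the radeon code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MIB (UINT64_C(1) << 20)

/* Hypothetical summary of the two sizes reported in the log. */
struct mem_config {
    uint64_t visible_vram_size;
    uint64_t gtt_size;
};

/* Assumed rule: the largest single allocation is min(VRAM, GTT). */
static uint64_t max_alloc_size(const struct mem_config *cfg)
{
    assert(cfg != NULL);
    return cfg->visible_vram_size < cfg->gtt_size
               ? cfg->visible_vram_size
               : cfg->gtt_size;
}

/* Returns 0 if the request fits, -ENOMEM if it exceeds the limit. */
static int check_alloc_size(const struct mem_config *cfg, uint64_t size)
{
    return size > max_alloc_size(cfg) ? -ENOMEM : 0;
}
```

With `{ 128 * MIB, 512 * MIB }` as the configuration, a request of `139 * MIB` fails the check and one of `64 * MIB` passes.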

-- 
Markus


[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-17 Thread Markus Trippelsdorf
On 2013.01.17 at 13:28 -0500, Jerome Glisse wrote:
> On Thu, Jan 17, 2013 at 11:10 AM, Markus Trippelsdorf
>  wrote:
> > On 2013.01.17 at 10:44 -0500, Jerome Glisse wrote:
> >> On Thu, Jan 17, 2013 at 3:46 AM, Markus Trippelsdorf
> >>  wrote:
> >> > On 2013.01.16 at 19:18 -0500, Jerome Glisse wrote:
> >> >> On Wed, Jan 16, 2013 at 6:10 PM, Markus Trippelsdorf
> >> >>  wrote:
> >> >> > On 2013.01.16 at 17:36 -0500, Alex Deucher wrote:
> >> >> >> On Tue, Jan 15, 2013 at 12:03 PM, Markus Trippelsdorf
> >> >> >>  wrote:
> >> >> >> > On 2013.01.15 at 17:32 +0100, Markus Trippelsdorf wrote:
> >> >> >> >> On 2013.01.15 at 16:26 +0100, Michel Dänzer wrote:
> >> >> >> >> > On Tue, 2013-01-15 at 16:23 +0100, Markus Trippelsdorf wrote:
> >> >> >> >> > > On 2013.01.15 at 15:43 +0100, Michel Dänzer wrote:
> >> >> >> >> > > > On Sat, 2013-01-05 at 11:41 +0100, Markus Trippelsdorf wrote:
> >> >> >> >> > > > > On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > And just in case it got lost in the noise yesterday:
> >> >> >> >> > > > > > The image corruption is caused by Dave's commit:
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> >> >> >> >> > > > > > Author: Dave Airlie 
> >> >> >> >> > > > > > Date:   Fri Dec 14 21:04:46 2012 +1000
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > radeon: fix regression with eviction since evict 
> >> >> >> >> > > > > > caching changes
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > Reverting it 'fixes' the issue.
> >> >> >> >> > > > >
> >> >> >> >> > > > > Ping.
> >> >> >> >> > > > > The issue still happens with todays Linus git tree.
> >> >> >> >> > > >
> >> >> >> >> > > > Does the corruption also occur with
> >> >> >> >> > > > dd54fee7d440c4a9756cce2c24a50c15e4c17ccb applied manually 
> >> >> >> >> > > > on top of
> >> >> >> >> > > > 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d?
> >> >> >> >> > >
> >> >> >> >> > > No.
> >> >> >> >> >
> >> >> >> >> > So, can you bisect which change between those two actually 
> >> >> >> >> > introduced
> >> >> >> >> > the corruption?
> >> >> >> >
> >> >> >> > The real cause of the image corruption is:
> >> >> >> >
> >> >> >> > d025e9e2b890db679f1246037bf65bd4be512627 is the first bad commit
> >> >> >> > commit d025e9e2b890db679f1246037bf65bd4be512627
> >> >> >> > Author: Jerome Glisse 
> >> >> >> > Date:   Thu Nov 29 10:35:41 2012 -0500
> >> >> >> >
> >> >> >> > drm/radeon: do not move bo to different placement at each cs
> >> >> >> >
> >> >> >> > The bo creation placement is where the bo will be. Instead of 
> >> >> >> > trying
> >> >> >> > to move bo at each command stream let this work to another 
> >> >> >> > worker
> >> >> >> > thread that will use more advance heuristic.
> >> >> >> >
> >> >> >> > agd5f: remove leftover unused variable
> >> >> >> >
> >> >> >> > Signed-off-by: Jerome Glisse 
> >> >> >> > Reviewed-by: Alex Deucher 
> >> >> >> >
> >> >> >> > Reverting d025e9e2b890d on top of Linus' tree fixes the issue.
> >> >> >

[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-17 Thread Markus Trippelsdorf
On 2013.01.17 at 12:55 -0500, Jerome Glisse wrote:
> On Thu, Jan 17, 2013 at 11:10 AM, Markus Trippelsdorf
>  wrote:
> > On 2013.01.17 at 10:44 -0500, Jerome Glisse wrote:
> >> On Thu, Jan 17, 2013 at 3:46 AM, Markus Trippelsdorf
> >>  wrote:
> >> > On 2013.01.16 at 19:18 -0500, Jerome Glisse wrote:
> >> >> On Wed, Jan 16, 2013 at 6:10 PM, Markus Trippelsdorf
> >> >>  wrote:
> >> >> > On 2013.01.16 at 17:36 -0500, Alex Deucher wrote:
> >> >> >> On Tue, Jan 15, 2013 at 12:03 PM, Markus Trippelsdorf
> >> >> >>  wrote:
> >> >> >> > On 2013.01.15 at 17:32 +0100, Markus Trippelsdorf wrote:
> >> >> >> >> On 2013.01.15 at 16:26 +0100, Michel Dänzer wrote:
> >> >> >> >> > On Tue, 2013-01-15 at 16:23 +0100, Markus Trippelsdorf wrote:
> >> >> >> >> > > On 2013.01.15 at 15:43 +0100, Michel Dänzer wrote:
> >> >> >> >> > > > On Sat, 2013-01-05 at 11:41 +0100, Markus Trippelsdorf wrote:
> >> >> >> >> > > > > On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > And just in case it got lost in the noise yesterday:
> >> >> >> >> > > > > > The image corruption is caused by Dave's commit:
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> >> >> >> >> > > > > > Author: Dave Airlie 
> >> >> >> >> > > > > > Date:   Fri Dec 14 21:04:46 2012 +1000
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > radeon: fix regression with eviction since evict 
> >> >> >> >> > > > > > caching changes
> >> >> >> >> > > > > >
> >> >> >> >> > > > > > Reverting it 'fixes' the issue.
> >> >> >> >> > > > >
> >> >> >> >> > > > > Ping.
> >> >> >> >> > > > > The issue still happens with todays Linus git tree.
> >> >> >> >> > > >
> >> >> >> >> > > > Does the corruption also occur with
> >> >> >> >> > > > dd54fee7d440c4a9756cce2c24a50c15e4c17ccb applied manually 
> >> >> >> >> > > > on top of
> >> >> >> >> > > > 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d?
> >> >> >> >> > >
> >> >> >> >> > > No.
> >> >> >> >> >
> >> >> >> >> > So, can you bisect which change between those two actually 
> >> >> >> >> > introduced
> >> >> >> >> > the corruption?
> >> >> >> >
> >> >> >> > The real cause of the image corruption is:
> >> >> >> >
> >> >> >> > d025e9e2b890db679f1246037bf65bd4be512627 is the first bad commit
> >> >> >> > commit d025e9e2b890db679f1246037bf65bd4be512627
> >> >> >> > Author: Jerome Glisse 
> >> >> >> > Date:   Thu Nov 29 10:35:41 2012 -0500
> >> >> >> >
> >> >> >> > drm/radeon: do not move bo to different placement at each cs
> >> >> >> >
> >> >> >> > The bo creation placement is where the bo will be. Instead of 
> >> >> >> > trying
> >> >> >> > to move bo at each command stream let this work to another 
> >> >> >> > worker
> >> >> >> > thread that will use more advance heuristic.
> >> >> >> >
> >> >> >> > agd5f: remove leftover unused variable
> >> >> >> >
> >> >> >> > Signed-off-by: Jerome Glisse 
> >> >> >> > Reviewed-by: Alex Deucher 
> >> >> >> >
> >> >> >> > Reverting d025e9e2b890d on top of Linus' tree fixes the issue.
> >> >> >

[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-17 Thread Markus Trippelsdorf
On 2013.01.17 at 10:44 -0500, Jerome Glisse wrote:
> On Thu, Jan 17, 2013 at 3:46 AM, Markus Trippelsdorf
>  wrote:
> > On 2013.01.16 at 19:18 -0500, Jerome Glisse wrote:
> >> On Wed, Jan 16, 2013 at 6:10 PM, Markus Trippelsdorf
> >>  wrote:
> >> > On 2013.01.16 at 17:36 -0500, Alex Deucher wrote:
> >> >> On Tue, Jan 15, 2013 at 12:03 PM, Markus Trippelsdorf
> >> >>  wrote:
> >> >> > On 2013.01.15 at 17:32 +0100, Markus Trippelsdorf wrote:
> >> >> >> On 2013.01.15 at 16:26 +0100, Michel Dänzer wrote:
> >> >> >> > On Tue, 2013-01-15 at 16:23 +0100, Markus Trippelsdorf wrote:
> >> >> >> > > On 2013.01.15 at 15:43 +0100, Michel Dänzer wrote:
> >> >> >> > > > On Sat, 2013-01-05 at 11:41 +0100, Markus Trippelsdorf wrote:
> >> >> >> > > > > On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> >> >> >> > > > > >
> >> >> >> > > > > > And just in case it got lost in the noise yesterday:
> >> >> >> > > > > > The image corruption is caused by Dave's commit:
> >> >> >> > > > > >
> >> >> >> > > > > > commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> >> >> >> > > > > > Author: Dave Airlie 
> >> >> >> > > > > > Date:   Fri Dec 14 21:04:46 2012 +1000
> >> >> >> > > > > >
> >> >> >> > > > > > radeon: fix regression with eviction since evict 
> >> >> >> > > > > > caching changes
> >> >> >> > > > > >
> >> >> >> > > > > > Reverting it 'fixes' the issue.
> >> >> >> > > > >
> >> >> >> > > > > Ping.
> >> >> >> > > > > The issue still happens with todays Linus git tree.
> >> >> >> > > >
> >> >> >> > > > Does the corruption also occur with
> >> >> >> > > > dd54fee7d440c4a9756cce2c24a50c15e4c17ccb applied manually on 
> >> >> >> > > > top of
> >> >> >> > > > 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d?
> >> >> >> > >
> >> >> >> > > No.
> >> >> >> >
> >> >> >> > So, can you bisect which change between those two actually 
> >> >> >> > introduced
> >> >> >> > the corruption?
> >> >> >
> >> >> > The real cause of the image corruption is:
> >> >> >
> >> >> > d025e9e2b890db679f1246037bf65bd4be512627 is the first bad commit
> >> >> > commit d025e9e2b890db679f1246037bf65bd4be512627
> >> >> > Author: Jerome Glisse 
> >> >> > Date:   Thu Nov 29 10:35:41 2012 -0500
> >> >> >
> >> >> > drm/radeon: do not move bo to different placement at each cs
> >> >> >
> >> >> > The bo creation placement is where the bo will be. Instead of 
> >> >> > trying
> >> >> > to move bo at each command stream let this work to another worker
> >> >> > thread that will use more advance heuristic.
> >> >> >
> >> >> > agd5f: remove leftover unused variable
> >> >> >
> >> >> > Signed-off-by: Jerome Glisse 
> >> >> > Reviewed-by: Alex Deucher 
> >> >> >
> >> >> > Reverting d025e9e2b890d on top of Linus' tree fixes the issue.
> >> >>
> >> >> Can you try this patch from Jerome:
> >> >> https://bugzilla.kernel.org/attachment.cgi?id=91421
> >> >
> >> > It fixes the corruption, but it degrades performance so much that it
> >> > takes several seconds to switch virtual desktops under xmonad. And
> >> > sometimes the website used for the scroll test is stuck for several
> >> > seconds and unscrollable during that time.
> >> >
> >> > --
> >> > Markus
> >>
> >> What about this patch instead :
> >> http://people.freedesktop.org/~glisse/0001-drm-radeon-exclude-system-placement-when-validating-.patch
> >
> > This one doesn't work:
> 
> Same address updated patch
> 
> http://people.freedesktop.org/~glisse/0001-drm-radeon-exclude-system-placement-when-validating-.patch

It still doesn't work unfortunately. Can you please just revert
d025e9e2b89 for now? Maybe it's better to wait for the next kernel
release for another solution.

Jan 17 17:05:34 x4 kernel: radeon :01:05.0: GPU lockup CP stall for more 
than 1msec
Jan 17 17:05:34 x4 kernel: radeon :01:05.0: GPU lockup (waiting for 
0x022b last fence id 0x0224)
Jan 17 17:05:34 x4 kernel: [drm:radeon_cs_ioctl] *ERROR* Failed to parse 
relocation -12!
Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to 
allocate GEM object (764, 6, 4096, -12)
Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to 
allocate GEM object (764, 6, 4096, -12)
Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to 
allocate GEM object (764, 6, 4096, -12)
Jan 17 17:05:34 x4 kernel: radeon :01:05.0: couldn't schedule ib
Jan 17 17:05:34 x4 kernel: [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule 
IB !
Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to 
allocate GEM object (7098368, 6, 4096, -12)
Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to 
allocate GEM object (7278592, 2, 4096, -12)

-- 
Markus


[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-17 Thread Markus Trippelsdorf
On 2013.01.16 at 19:18 -0500, Jerome Glisse wrote:
> On Wed, Jan 16, 2013 at 6:10 PM, Markus Trippelsdorf wrote:
> > On 2013.01.16 at 17:36 -0500, Alex Deucher wrote:
> >> On Tue, Jan 15, 2013 at 12:03 PM, Markus Trippelsdorf wrote:
> >> > On 2013.01.15 at 17:32 +0100, Markus Trippelsdorf wrote:
> >> >> On 2013.01.15 at 16:26 +0100, Michel Dänzer wrote:
> >> >> > On Tue, 2013-01-15 at 16:23 +0100, Markus Trippelsdorf wrote:
> >> >> > > On 2013.01.15 at 15:43 +0100, Michel Dänzer wrote:
> >> >> > > > On Sat, 2013-01-05 at 11:41 +0100, Markus Trippelsdorf wrote:
> >> >> > > > > On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> >> >> > > > > >
> >> >> > > > > > And just in case it got lost in the noise yesterday:
> >> >> > > > > > The image corruption is caused by Dave's commit:
> >> >> > > > > >
> >> >> > > > > > commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> >> >> > > > > > Author: Dave Airlie 
> >> >> > > > > > Date:   Fri Dec 14 21:04:46 2012 +1000
> >> >> > > > > >
> >> >> > > > > > radeon: fix regression with eviction since evict caching changes
> >> >> > > > > >
> >> >> > > > > > Reverting it 'fixes' the issue.
> >> >> > > > >
> >> >> > > > > Ping.
> >> >> > > > > The issue still happens with todays Linus git tree.
> >> >> > > >
> >> >> > > > Does the corruption also occur with
> >> >> > > > dd54fee7d440c4a9756cce2c24a50c15e4c17ccb applied manually on top of
> >> >> > > > 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d?
> >> >> > >
> >> >> > > No.
> >> >> >
> >> >> > So, can you bisect which change between those two actually introduced
> >> >> > the corruption?
> >> >
> >> > The real cause of the image corruption is:
> >> >
> >> > d025e9e2b890db679f1246037bf65bd4be512627 is the first bad commit
> >> > commit d025e9e2b890db679f1246037bf65bd4be512627
> >> > Author: Jerome Glisse 
> >> > Date:   Thu Nov 29 10:35:41 2012 -0500
> >> >
> >> > drm/radeon: do not move bo to different placement at each cs
> >> >
> >> > The bo creation placement is where the bo will be. Instead of trying
> >> > to move bo at each command stream let this work to another worker
> >> > thread that will use more advance heuristic.
> >> >
> >> > agd5f: remove leftover unused variable
> >> >
> >> > Signed-off-by: Jerome Glisse 
> >> > Reviewed-by: Alex Deucher 
> >> >
> >> > Reverting d025e9e2b890d on top of Linus' tree fixes the issue.
> >>
> >> Can you try this patch from Jerome:
> >> https://bugzilla.kernel.org/attachment.cgi?id=91421
> >
> > It fixes the corruption, but it degrades performance so much that it
> > takes several seconds to switch virtual desktops under xmonad. And
> > sometimes the website used for the scroll test is stuck for several
> > seconds and unscrollable during that time.
> >
> > --
> > Markus
> 
> What about this patch instead :
> http://people.freedesktop.org/~glisse/0001-drm-radeon-exclude-system-placement-when-validating-.patch

This one doesn't work:

Jan 17 09:40:53 x4 kernel: radeon :01:05.0: GPU lockup CP stall for more than 1msec
Jan 17 09:40:53 x4 kernel: radeon :01:05.0: GPU lockup (waiting for 0x0a63 last fence id 0x0a62)
Jan 17 09:40:53 x4 kernel: [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
Jan 17 09:40:53 x4 kernel: radeon :01:05.0: couldn't schedule ib
Jan 17 09:40:53 x4 kernel: [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
Jan 17 09:40:53 x4 kernel: radeon :01:05.0: couldn't schedule ib
Jan 17 09:40:53 x4 kernel: [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
Jan 17 09:40:53 x4 kernel: radeon :01:05.0: couldn't schedule ib
Jan 17 09:40:53 x4 kernel: [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
Jan 17 09:40:53 x4 kernel: radeon :01:05.0: couldn't schedule ib
Jan 17 09:40:53 x4 kernel: [drm:radeon_cs

[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-17 Thread Markus Trippelsdorf
On 2013.01.16 at 17:36 -0500, Alex Deucher wrote:
> On Tue, Jan 15, 2013 at 12:03 PM, Markus Trippelsdorf wrote:
> > On 2013.01.15 at 17:32 +0100, Markus Trippelsdorf wrote:
> >> On 2013.01.15 at 16:26 +0100, Michel Dänzer wrote:
> >> > On Tue, 2013-01-15 at 16:23 +0100, Markus Trippelsdorf wrote:
> >> > > On 2013.01.15 at 15:43 +0100, Michel Dänzer wrote:
> >> > > > On Sat, 2013-01-05 at 11:41 +0100, Markus Trippelsdorf wrote:
> >> > > > > On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> >> > > > > >
> >> > > > > > And just in case it got lost in the noise yesterday:
> >> > > > > > The image corruption is caused by Dave's commit:
> >> > > > > >
> >> > > > > > commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> >> > > > > > Author: Dave Airlie 
> >> > > > > > Date:   Fri Dec 14 21:04:46 2012 +1000
> >> > > > > >
> >> > > > > > radeon: fix regression with eviction since evict caching changes
> >> > > > > >
> >> > > > > > Reverting it 'fixes' the issue.
> >> > > > >
> >> > > > > Ping.
> >> > > > > The issue still happens with todays Linus git tree.
> >> > > >
> >> > > > Does the corruption also occur with
> >> > > > dd54fee7d440c4a9756cce2c24a50c15e4c17ccb applied manually on top of
> >> > > > 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d?
> >> > >
> >> > > No.
> >> >
> >> > So, can you bisect which change between those two actually introduced
> >> > the corruption?
> >
> > The real cause of the image corruption is:
> >
> > d025e9e2b890db679f1246037bf65bd4be512627 is the first bad commit
> > commit d025e9e2b890db679f1246037bf65bd4be512627
> > Author: Jerome Glisse 
> > Date:   Thu Nov 29 10:35:41 2012 -0500
> >
> > drm/radeon: do not move bo to different placement at each cs
> >
> > The bo creation placement is where the bo will be. Instead of trying
> > to move bo at each command stream let this work to another worker
> > thread that will use more advance heuristic.
> >
> > agd5f: remove leftover unused variable
> >
> > Signed-off-by: Jerome Glisse 
> > Reviewed-by: Alex Deucher 
> >
> > Reverting d025e9e2b890d on top of Linus' tree fixes the issue.
> 
> Can you try this patch from Jerome:
> https://bugzilla.kernel.org/attachment.cgi?id=91421

It fixes the corruption, but it degrades performance so much that it
takes several seconds to switch virtual desktops under xmonad. And
sometimes the website used for the scroll test is stuck for several
seconds and unscrollable during that time. 

-- 
Markus


Re: [PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-17 Thread Markus Trippelsdorf
On 2013.01.17 at 12:55 -0500, Jerome Glisse wrote:
> On Thu, Jan 17, 2013 at 11:10 AM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
> > On 2013.01.17 at 10:44 -0500, Jerome Glisse wrote:
> > > On Thu, Jan 17, 2013 at 3:46 AM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
> > > > On 2013.01.16 at 19:18 -0500, Jerome Glisse wrote:
> > > > > On Wed, Jan 16, 2013 at 6:10 PM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
> > > > > > On 2013.01.16 at 17:36 -0500, Alex Deucher wrote:
> > > > > > > On Tue, Jan 15, 2013 at 12:03 PM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
> > > > > > > > On 2013.01.15 at 17:32 +0100, Markus Trippelsdorf wrote:
> > > > > > > > > On 2013.01.15 at 16:26 +0100, Michel Dänzer wrote:
> > > > > > > > > > On Tue, 2013-01-15 at 16:23 +0100, Markus Trippelsdorf wrote:
> > > > > > > > > > > On 2013.01.15 at 15:43 +0100, Michel Dänzer wrote:
> > > > > > > > > > > > On Sat, 2013-01-05 at 11:41 +0100, Markus Trippelsdorf wrote:
> > > > > > > > > > > > > On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > And just in case it got lost in the noise yesterday:
> > > > > > > > > > > > > > The image corruption is caused by Dave's commit:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> > > > > > > > > > > > > > Author: Dave Airlie airl...@redhat.com
> > > > > > > > > > > > > > Date:   Fri Dec 14 21:04:46 2012 +1000
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > radeon: fix regression with eviction since evict caching changes
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Reverting it 'fixes' the issue.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ping.
> > > > > > > > > > > > > The issue still happens with todays Linus git tree.
> > > > > > > > > > > >
> > > > > > > > > > > > Does the corruption also occur with
> > > > > > > > > > > > dd54fee7d440c4a9756cce2c24a50c15e4c17ccb applied manually on top of
> > > > > > > > > > > > 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d?
> > > > > > > > > > >
> > > > > > > > > > > No.
> > > > > > > > > >
> > > > > > > > > > So, can you bisect which change between those two actually introduced
> > > > > > > > > > the corruption?
> > > > > > > >
> > > > > > > > The real cause of the image corruption is:
> > > > > > > >
> > > > > > > > d025e9e2b890db679f1246037bf65bd4be512627 is the first bad commit
> > > > > > > > commit d025e9e2b890db679f1246037bf65bd4be512627
> > > > > > > > Author: Jerome Glisse jgli...@redhat.com
> > > > > > > > Date:   Thu Nov 29 10:35:41 2012 -0500
> > > > > > > >
> > > > > > > > drm/radeon: do not move bo to different placement at each cs
> > > > > > > >
> > > > > > > > The bo creation placement is where the bo will be. Instead of trying
> > > > > > > > to move bo at each command stream let this work to another worker
> > > > > > > > thread that will use more advance heuristic.
> > > > > > > >
> > > > > > > > agd5f: remove leftover unused variable
> > > > > > > >
> > > > > > > > Signed-off-by: Jerome Glisse jgli...@redhat.com
> > > > > > > > Reviewed-by: Alex Deucher alexander.deuc...@amd.com
> > > > > > > >
> > > > > > > > Reverting d025e9e2b890d on top of Linus' tree fixes the issue.
> > > > > > >
> > > > > > > Can you try this patch from Jerome:
> > > > > > > https://bugzilla.kernel.org/attachment.cgi?id=91421
> > > > > >
> > > > > > It fixes the corruption, but it degrades performance so much that it
> > > > > > takes several seconds to switch virtual desktops under xmonad. And
> > > > > > sometimes the website used for the scroll test is stuck for several
> > > > > > seconds and unscrollable during that time.
> > > > > >
> > > > > > --
> > > > > > Markus
> > > > >
> > > > > What about this patch instead :
> > > > > http://people.freedesktop.org/~glisse/0001-drm-radeon-exclude-system-placement-when-validating-.patch
> > > >
> > > > This one doesn't work:
> > >
> > > Same address updated patch
> > >
> > > http://people.freedesktop.org/~glisse/0001-drm-radeon-exclude-system-placement-when-validating-.patch
> >
> > It still doesn't work unfortunately. Can you please just revert
> > d025e9e2b89 for now? Maybe it's better to wait for the next kernel
> > release for another solution.
> >
> > Jan 17 17:05:34 x4 kernel: radeon :01:05.0: GPU lockup CP stall for more than 1msec
> > Jan 17 17:05:34 x4 kernel: radeon :01:05.0: GPU lockup (waiting for 0x022b last fence id 0x0224)
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (764, 6, 4096, -12)
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (764, 6, 4096, -12)
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (764, 6, 4096, -12)
> > Jan 17 17:05:34 x4 kernel: radeon :01:05.0: couldn't schedule ib
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (7098368, 6, 4096, -12)
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (7278592, 2, 4096, -12)
> >
> > --
> > Markus
>
> I am trying to understand why i can't reproduce, what is your desktop
> (gnome, kde, ...) what browser ? Is your card agp ? How much ram do
> you have ?

The desktop is xmonad and the exact browser doesn't matter, because it
happens both with Firefox and Chromium. I use my monitor in portrait
mode: DVI-0 connected 1050x1680+0+0 left

dmesg:

Linux version 3.8.0-rc3-00352-gdfdebc2

Re: [PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-17 Thread Markus Trippelsdorf
On 2013.01.17 at 13:28 -0500, Jerome Glisse wrote:
> On Thu, Jan 17, 2013 at 11:10 AM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
> > On 2013.01.17 at 10:44 -0500, Jerome Glisse wrote:
> > > On Thu, Jan 17, 2013 at 3:46 AM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
> > > > On 2013.01.16 at 19:18 -0500, Jerome Glisse wrote:
> > > > > On Wed, Jan 16, 2013 at 6:10 PM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
> > > > > > On 2013.01.16 at 17:36 -0500, Alex Deucher wrote:
> > > > > > > On Tue, Jan 15, 2013 at 12:03 PM, Markus Trippelsdorf mar...@trippelsdorf.de wrote:
> > > > > > > > On 2013.01.15 at 17:32 +0100, Markus Trippelsdorf wrote:
> > > > > > > > > On 2013.01.15 at 16:26 +0100, Michel Dänzer wrote:
> > > > > > > > > > On Tue, 2013-01-15 at 16:23 +0100, Markus Trippelsdorf wrote:
> > > > > > > > > > > On 2013.01.15 at 15:43 +0100, Michel Dänzer wrote:
> > > > > > > > > > > > On Sat, 2013-01-05 at 11:41 +0100, Markus Trippelsdorf wrote:
> > > > > > > > > > > > > On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > And just in case it got lost in the noise yesterday:
> > > > > > > > > > > > > > The image corruption is caused by Dave's commit:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> > > > > > > > > > > > > > Author: Dave Airlie airl...@redhat.com
> > > > > > > > > > > > > > Date:   Fri Dec 14 21:04:46 2012 +1000
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > radeon: fix regression with eviction since evict caching changes
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Reverting it 'fixes' the issue.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ping.
> > > > > > > > > > > > > The issue still happens with todays Linus git tree.
> > > > > > > > > > > >
> > > > > > > > > > > > Does the corruption also occur with
> > > > > > > > > > > > dd54fee7d440c4a9756cce2c24a50c15e4c17ccb applied manually on top of
> > > > > > > > > > > > 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d?
> > > > > > > > > > >
> > > > > > > > > > > No.
> > > > > > > > > >
> > > > > > > > > > So, can you bisect which change between those two actually introduced
> > > > > > > > > > the corruption?
> > > > > > > >
> > > > > > > > The real cause of the image corruption is:
> > > > > > > >
> > > > > > > > d025e9e2b890db679f1246037bf65bd4be512627 is the first bad commit
> > > > > > > > commit d025e9e2b890db679f1246037bf65bd4be512627
> > > > > > > > Author: Jerome Glisse jgli...@redhat.com
> > > > > > > > Date:   Thu Nov 29 10:35:41 2012 -0500
> > > > > > > >
> > > > > > > > drm/radeon: do not move bo to different placement at each cs
> > > > > > > >
> > > > > > > > The bo creation placement is where the bo will be. Instead of trying
> > > > > > > > to move bo at each command stream let this work to another worker
> > > > > > > > thread that will use more advance heuristic.
> > > > > > > >
> > > > > > > > agd5f: remove leftover unused variable
> > > > > > > >
> > > > > > > > Signed-off-by: Jerome Glisse jgli...@redhat.com
> > > > > > > > Reviewed-by: Alex Deucher alexander.deuc...@amd.com
> > > > > > > >
> > > > > > > > Reverting d025e9e2b890d on top of Linus' tree fixes the issue.
> > > > > > >
> > > > > > > Can you try this patch from Jerome:
> > > > > > > https://bugzilla.kernel.org/attachment.cgi?id=91421
> > > > > >
> > > > > > It fixes the corruption, but it degrades performance so much that it
> > > > > > takes several seconds to switch virtual desktops under xmonad. And
> > > > > > sometimes the website used for the scroll test is stuck for several
> > > > > > seconds and unscrollable during that time.
> > > > > >
> > > > > > --
> > > > > > Markus
> > > > >
> > > > > What about this patch instead :
> > > > > http://people.freedesktop.org/~glisse/0001-drm-radeon-exclude-system-placement-when-validating-.patch
> > > >
> > > > This one doesn't work:
> > >
> > > Same address updated patch
> > >
> > > http://people.freedesktop.org/~glisse/0001-drm-radeon-exclude-system-placement-when-validating-.patch
> >
> > It still doesn't work unfortunately. Can you please just revert
> > d025e9e2b89 for now? Maybe it's better to wait for the next kernel
> > release for another solution.
> >
> > Jan 17 17:05:34 x4 kernel: radeon :01:05.0: GPU lockup CP stall for more than 1msec
> > Jan 17 17:05:34 x4 kernel: radeon :01:05.0: GPU lockup (waiting for 0x022b last fence id 0x0224)
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -12!
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (764, 6, 4096, -12)
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (764, 6, 4096, -12)
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (764, 6, 4096, -12)
> > Jan 17 17:05:34 x4 kernel: radeon :01:05.0: couldn't schedule ib
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (7098368, 6, 4096, -12)
> > Jan 17 17:05:34 x4 kernel: [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (7278592, 2, 4096, -12)
> >
> > --
> > Markus
>
> For 3.9 sake can you try if
> http://people.freedesktop.org/~glisse/0001-drm-radeon-keep-original-user-requested-placement-ar.patch
>
> on top of revert d025e9e2b89 works

Yes, this combination works just fine.

-- 
Markus
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-15 Thread Markus Trippelsdorf
On 2013.01.15 at 17:32 +0100, Markus Trippelsdorf wrote:
> On 2013.01.15 at 16:26 +0100, Michel Dänzer wrote:
> > On Tue, 2013-01-15 at 16:23 +0100, Markus Trippelsdorf wrote:
> > > On 2013.01.15 at 15:43 +0100, Michel Dänzer wrote:
> > > > On Sat, 2013-01-05 at 11:41 +0100, Markus Trippelsdorf wrote:
> > > > > On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> > > > > > 
> > > > > > And just in case it got lost in the noise yesterday: 
> > > > > > The image corruption is caused by Dave's commit:
> > > > > > 
> > > > > > commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> > > > > > Author: Dave Airlie 
> > > > > > Date:   Fri Dec 14 21:04:46 2012 +1000
> > > > > > 
> > > > > > radeon: fix regression with eviction since evict caching changes
> > > > > > 
> > > > > > Reverting it 'fixes' the issue.
> > > > > 
> > > > > Ping.
> > > > > The issue still happens with todays Linus git tree.
> > > > 
> > > > Does the corruption also occur with
> > > > dd54fee7d440c4a9756cce2c24a50c15e4c17ccb applied manually on top of
> > > > 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d?
> > > 
> > > No.
> > 
> > So, can you bisect which change between those two actually introduced
> > the corruption?
> 
> 86a1881d08f65a42c17071a59c0088dbe2870246 is the first bad commit

Sorry, the bisection above was wrong. Please ignore.

The real cause of the image corruption is:

d025e9e2b890db679f1246037bf65bd4be512627 is the first bad commit
commit d025e9e2b890db679f1246037bf65bd4be512627
Author: Jerome Glisse 
Date:   Thu Nov 29 10:35:41 2012 -0500

drm/radeon: do not move bo to different placement at each cs

The bo creation placement is where the bo will be. Instead of trying
to move bo at each command stream let this work to another worker
thread that will use more advance heuristic.

agd5f: remove leftover unused variable

Signed-off-by: Jerome Glisse 
Reviewed-by: Alex Deucher 

Reverting d025e9e2b890d on top of Linus' tree fixes the issue.

-- 
Markus


[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-15 Thread Markus Trippelsdorf
On 2013.01.15 at 16:26 +0100, Michel Dänzer wrote:
> On Tue, 2013-01-15 at 16:23 +0100, Markus Trippelsdorf wrote:
> > On 2013.01.15 at 15:43 +0100, Michel Dänzer wrote:
> > > On Sat, 2013-01-05 at 11:41 +0100, Markus Trippelsdorf wrote:
> > > > On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> > > > > 
> > > > > And just in case it got lost in the noise yesterday: 
> > > > > The image corruption is caused by Dave's commit:
> > > > > 
> > > > > commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> > > > > Author: Dave Airlie 
> > > > > Date:   Fri Dec 14 21:04:46 2012 +1000
> > > > > 
> > > > > radeon: fix regression with eviction since evict caching changes
> > > > > 
> > > > > Reverting it 'fixes' the issue.
> > > > 
> > > > Ping.
> > > > The issue still happens with todays Linus git tree.
> > > 
> > > Does the corruption also occur with
> > > dd54fee7d440c4a9756cce2c24a50c15e4c17ccb applied manually on top of
> > > 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d?
> > 
> > No.
> 
> So, can you bisect which change between those two actually introduced
> the corruption?

86a1881d08f65a42c17071a59c0088dbe2870246 is the first bad commit
commit 86a1881d08f65a42c17071a59c0088dbe2870246
Author: Jerome Glisse 
Date:   Wed Dec 12 16:43:15 2012 -0500

drm/radeon: fix fence driver for dma ring when wb is disabled

The dma ring can't write to register thus have to write to memory
its fence value. This ensure that it doesn't try to use scratch
register for dma ring fence driver.

Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=58166

Signed-off-by: Jerome Glisse 
Reviewed-by: Alex Deucher 

-- 
Markus


[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-15 Thread Markus Trippelsdorf
On 2013.01.15 at 15:43 +0100, Michel Dänzer wrote:
> On Sat, 2013-01-05 at 11:41 +0100, Markus Trippelsdorf wrote:
> > On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> > > 
> > > And just in case it got lost in the noise yesterday: 
> > > The image corruption is caused by Dave's commit:
> > > 
> > > commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> > > Author: Dave Airlie 
> > > Date:   Fri Dec 14 21:04:46 2012 +1000
> > > 
> > > radeon: fix regression with eviction since evict caching changes
> > > 
> > > Reverting it 'fixes' the issue.
> > 
> > Ping.
> > The issue still happens with todays Linus git tree.
> 
> Does the corruption also occur with
> dd54fee7d440c4a9756cce2c24a50c15e4c17ccb applied manually on top of
> 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d?

No.

-- 
Markus


Re: [PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-15 Thread Markus Trippelsdorf
On 2013.01.15 at 17:32 +0100, Markus Trippelsdorf wrote:
> On 2013.01.15 at 16:26 +0100, Michel Dänzer wrote:
> > On Die, 2013-01-15 at 16:23 +0100, Markus Trippelsdorf wrote:
> > > On 2013.01.15 at 15:43 +0100, Michel Dänzer wrote:
> > > > On Sam, 2013-01-05 at 11:41 +0100, Markus Trippelsdorf wrote:
> > > > > On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> > > > > >
> > > > > > And just in case it got lost in the noise yesterday:
> > > > > > The image corruption is caused by Dave's commit:
> > > > > >
> > > > > > commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> > > > > > Author: Dave Airlie airl...@redhat.com
> > > > > > Date:   Fri Dec 14 21:04:46 2012 +1000
> > > > > >
> > > > > > radeon: fix regression with eviction since evict caching changes
> > > > > >
> > > > > > Reverting it 'fixes' the issue.
> > > > >
> > > > > Ping.
> > > > > The issue still happens with todays Linus git tree.
> > > >
> > > > Does the corruption also occur with
> > > > dd54fee7d440c4a9756cce2c24a50c15e4c17ccb applied manually on top of
> > > > 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d?
> > >
> > > No.
> >
> > So, can you bisect which change between those two actually introduced
> > the corruption?
>
> 86a1881d08f65a42c17071a59c0088dbe2870246 is the first bad commit

Sorry, the bisection above was wrong. Please ignore.

The real cause of the image corruption is:

d025e9e2b890db679f1246037bf65bd4be512627 is the first bad commit
commit d025e9e2b890db679f1246037bf65bd4be512627
Author: Jerome Glisse jgli...@redhat.com
Date:   Thu Nov 29 10:35:41 2012 -0500

drm/radeon: do not move bo to different placement at each cs

The bo creation placement is where the bo will be. Instead of trying
to move bo at each command stream let this work to another worker
thread that will use more advance heuristic.

agd5f: remove leftover unused variable

Signed-off-by: Jerome Glisse jgli...@redhat.com
Reviewed-by: Alex Deucher alexander.deuc...@amd.com

Reverting d025e9e2b890d on top of Linus' tree fixes the issue.

-- 
Markus
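
Since the first bisection result in this thread had to be withdrawn, a short sketch of the search that git bisect performs may be useful. It assumes a linear history indexed from a known-good commit to a known-bad one and a verdict callback; all names below are made up for illustration. The second callback misjudges one good commit as bad, which is one way (not necessarily what happened here) for a run to end on the wrong commit.

```c
#include <assert.h>
#include <stddef.h>

/* Verdict callback: returns nonzero if the commit at this index is bad. */
typedef int (*is_bad_fn)(size_t commit_index);

/*
 * Binary search for the first bad commit on a linear history.
 * good_idx must be known good, bad_idx known bad, good_idx < bad_idx.
 */
static size_t bisect_first_bad(size_t good_idx, size_t bad_idx, is_bad_fn is_bad)
{
	assert(good_idx < bad_idx);

	while (bad_idx - good_idx > 1) {
		size_t mid = good_idx + (bad_idx - good_idx) / 2;

		if (is_bad(mid))
			bad_idx = mid;	/* regression is at mid or earlier */
		else
			good_idx = mid;	/* regression is after mid */
	}
	return bad_idx;
}

/* Hypothetical history: the regression really enters at index 7. */
static int bad_from_7(size_t i)
{
	return i >= 7;
}

/* Same history, but the tester wrongly reports index 5 as bad. */
static int misjudges_5(size_t i)
{
	return i >= 7 || i == 5;
}
```

With correct verdicts the search over indices 0..10 ends at 7; with the single wrong verdict it ends at 5, and every later step looks consistent, so the error only shows up when the reported commit is reverted and tested.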


[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2013-01-05 Thread Markus Trippelsdorf
On 2012.12.20 at 14:58 +0100, Markus Trippelsdorf wrote:
> On 2012.12.20 at 14:45 +0100, Markus Trippelsdorf wrote:
> > On 2012.12.20 at 08:30 -0500, Alex Deucher wrote:
> > > On Wed, Dec 19, 2012 at 9:33 AM, Markus Trippelsdorf
> > >  wrote:
> > > > On 2012.12.19 at 15:18 +0100, Maarten Lankhorst wrote:
> > > >> Fix regression introduced by 85b144f860176
> > > >
> > > > (EE) [mi] EQ overflow continuing.  100 events have been dropped.
> > > > (EE)
> > > > (EE) Backtrace:
> > > > (EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x584f1d]
> > > > (EE) 1: /usr/bin/X (QueuePointerEvents+0x52) [0x44a792]
> > > > (EE) 2: /usr/bin/X (xf86PostButtonEvent+0xd5) [0x4829b5]
> > > > (EE) 3: /usr/lib64/xorg/modules/input/mouse_drv.so 
> > > > (0x7ff8f2501000+0x6b70) [0x7ff8f2507b70]
> > > > (EE) 4: /usr/lib64/xorg/modules/input/mouse_drv.so 
> > > > (0x7ff8f2501000+0x73a0) [0x7ff8f25083a0]
> > > > (EE) 5: /usr/lib64/xorg/modules/input/mouse_drv.so 
> > > > (0x7ff8f2501000+0x428c) [0x7ff8f250528c]
> > > > (EE) 6: /usr/bin/X (0x40+0x71cd8) [0x471cd8]
> > > > (EE) 7: /usr/bin/X (0x40+0x9a2ab) [0x49a2ab]
> > > > (EE) 8: /lib/libpthread.so.0 (0x7ff8f1edc000+0xf260) [0x7ff8f1eeb260]
> > > > (EE) 9: /lib/libc.so.6 (ioctl+0x7) [0x7ff8f19bd127]
> > > > (EE) 10: /usr/lib/libdrm.so.2 (drmIoctl+0x34) [0x7ff8f246a634]
> > > > (EE) 11: /usr/lib/libdrm.so.2 (drmCommandWriteRead+0x1f) 
> > > > [0x7ff8f246cbdf]
> > > > (EE) 12: /usr/lib/libdrm_radeon.so.1 (0x7ff8f250e000+0x27bf) 
> > > > [0x7ff8f25107bf]
> > > > (EE) 13: /usr/lib64/xorg/modules/drivers/radeon_drv.so 
> > > > (0x7ff8f154f000+0x407ec) [0x7ff8f158f7ec]
> > > > (EE) 14: /usr/bin/X (_CallCallbacks+0x34) [0x438894]
> > > > (EE) 15: /usr/bin/X (FlushAllOutput+0x2c) [0x5880ec]
> > > > (EE) 16: /usr/bin/X (0x40+0x33aa1) [0x433aa1]
> > > > (EE) 17: /usr/bin/X (0x40+0x230cd) [0x4230cd]
> > > > (EE) 18: /lib/libc.so.6 (__libc_start_main+0xf5) [0x7ff8f19088b5]
> > > > (EE) 19: /usr/bin/X (0x40+0x22c09) [0x422c09]
> > > > (EE)
> > > > (EE) [mi] EQ overflow continuing.  200 events have been dropped.
> > > >
> > > > And the pictures get distorted on the test-webpage. See attached 
> > > > screenshot.
> > > >
> > > 
> > > Anything in your kernel log that corresponds to the errors in your xorg 
> > > log?
> > 
> > No. But I've found out that the errors in the xorg log are unrelated to
> > the image corruption. 
> > I use one of those Logitech mice with this "hyper" fast scrolling
> > feature. And I guess the Xorg mouse driver just can't keep up with the
> > fast input. So it's just a harmless warning that can be ignored, I
> > guess.
> 
> And just in case it got lost in the noise yesterday: 
> The image corruption is caused by Dave's commit:
> 
> commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
> Author: Dave Airlie 
> Date:   Fri Dec 14 21:04:46 2012 +1000
> 
> radeon: fix regression with eviction since evict caching changes
> 
> Reverting it 'fixes' the issue.

Ping.
The issue still happens with todays Linus git tree.
Can you please have a look at this Dave?
Thanks.

-- 
Markus


radeon 0000:02:00.0: GPU lockup CP stall for more than 10000msec

2013-01-03 Thread Markus Trippelsdorf
On 2013.01.02 at 18:37 -0500, Alex Deucher wrote:
> On Wed, Jan 2, 2013 at 5:38 PM, Markus Trippelsdorf
>  wrote:
> > On 2013.01.02 at 17:31 -0500, Jerome Glisse wrote:
> >> Please affected people can you test if patch :
> >> http://people.freedesktop.org/~glisse/0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch
> >>
> >> Fix the issue, you need to make sure you don't have the patch that
> >> disable dma on r6xx ie that line 977-978 & 1061-1062  in radeon_asic.c
> >> is :
> >>  .copy = &r600_copy_dma,
> >>  .copy_ring_index = R600_RING_TYPE_DMA_INDEX,
> >
> > It fixes the issue for me. Thanks.
> 
> The count is actually the count, not count - 1.  The real fix seems to
> be that r6xx requires 2 dw aligned transfers.  The attached patch
> fixes the issue for me.

Yes, this one also works for me. Thanks.

-- 
Markus
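
For readers following the "2 dw aligned" remark quoted above, here is a minimal sketch of how an odd per-packet limit can break alignment once a large copy is split into chunks. It is not the radeon driver's code: the helper name and the 0xFFFF / 0xFFFE limits used in the note below are assumptions picked purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from radeon): walk a copy of total_dw
 * dwords in chunks of at most max_chunk_dw and report whether every
 * chunk starts at an offset that is a multiple of 2 dwords.
 * Returns 1 if all chunk start offsets are 2-dword aligned, 0 otherwise.
 */
static int chunks_start_2dw_aligned(uint32_t total_dw, uint32_t max_chunk_dw)
{
	uint32_t offset = 0;

	assert(max_chunk_dw > 0);

	while (total_dw > 0) {
		uint32_t cur = total_dw > max_chunk_dw ? max_chunk_dw : total_dw;

		if (offset & 1u)
			return 0;	/* this chunk would start misaligned */

		offset += cur;
		total_dw -= cur;
	}
	return 1;
}
```

Splitting 0x20000 dwords with an even limit such as 0xFFFE keeps every chunk start on an even offset; with an odd limit such as 0xFFFF the second chunk already starts at an odd offset.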


radeon 0000:02:00.0: GPU lockup CP stall for more than 10000msec

2013-01-02 Thread Markus Trippelsdorf
On 2013.01.02 at 17:31 -0500, Jerome Glisse wrote:
> Please affected people can you test if patch :
> http://people.freedesktop.org/~glisse/0003-drm-radeon-fix-dma-copy-on-r6xx-r7xx-evergen-ni-si-g.patch
> 
> Fix the issue, you need to make sure you don't have the patch that
> disable dma on r6xx ie that line 977-978 & 1061-1062  in radeon_asic.c
> is :
>  .copy = &r600_copy_dma,
>  .copy_ring_index = R600_RING_TYPE_DMA_INDEX,

It fixes the issue for me. Thanks.

-- 
Markus


radeon 0000:02:00.0: GPU lockup CP stall for more than 10000msec

2012-12-23 Thread Markus Trippelsdorf
On 2012.12.23 at 12:31 +0100, Borislav Petkov wrote:
> On Sun, Dec 23, 2012 at 11:19:00AM +0000, Andy Furniss wrote:
> > modinfo radeon
> > 
> > will give a list assuming you use modules, I think all of them need =<num>.
> 
> Yep, that is one way of getting that info, thanks. I always go and look
> at Documentation/kernel-parameters.txt and forget about modinfo.
> 
> As you say 'radeon' needs to be module but since this is the case with
> the distros, the majority of Linux installations out there have it this
> way so we're fine.

(If you don't use modules:
 git grep MODULE_PARM_DESC -- drivers/gpu/drm/radeon/
)

You may have hit the same issue as I, see:
http://thread.gmane.org/gmane.comp.video.dri.devel/78328

Reverting commit 2d6cc729 fixes the problem for me, setting
radeon.no_wb=1 doesn't help.

-- 
Markus


GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-23 Thread Markus Trippelsdorf
On 2012.12.23 at 10:09 +0000, Andy Furniss wrote:
> Markus Trippelsdorf wrote:
> 
> >> Does booting with radeon.wb=0 fix the issue?  Please make sure your
> >> kernel has this patch:
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=86a1881d08f65a42c17071a59c0088dbe2870246
> >
> > My kernel has this patch and radeon.wb=0 doesn't help.
> 
> I think that should be no_wb=1

Yes, you're right. But even with radeon.no_wb=1 it still hangs:


...
Dec 23 11:15:02 x4 kernel: radeon 0000:01:05.0: WB disabled
Dec 23 11:15:02 x4 kernel: radeon 0000:01:05.0: fence driver on ring 0 use gpu addr 0xa004 and cpu addr 0x8802163ad004
Dec 23 11:15:02 x4 kernel: radeon 0000:01:05.0: fence driver on ring 3 use gpu addr 0xac0c and cpu addr 0x8802163adc0c
Dec 23 11:15:02 x4 kernel: radeon 0000:01:05.0: setting latency timer to 64
...
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0: GPU lockup CP stall for more than 10000msec
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0: GPU lockup (waiting for 0x089c last fence id 0x089b)
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0: Saved 217 dwords of commands on ring 0.
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0: GPU softreset
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA000B030
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x0003
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20005040
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_008674_CP_STALLED_STAT1 = 0x
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_008678_CP_STALLED_STAT2 = 0x0002
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_00867C_CP_BUSY_STAT = 0xD086
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_008680_CP_STAT  = 0x80098645
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_008020_GRBM_SOFT_RESET=0x7FEE
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x0001
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA000B030
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x0003
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x2000C040
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_008674_CP_STALLED_STAT1 = 0x
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_008678_CP_STALLED_STAT2 = 0x
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_00867C_CP_BUSY_STAT = 0x
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0:   R_008680_CP_STAT  = 0x8010
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0: GPU reset succeeded, trying to resume
Dec 23 11:16:04 x4 kernel: [drm] PCIE GART of 512M enabled (table at 0xC004).
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0: WB disabled
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0: fence driver on ring 0 use gpu addr 0xa004 and cpu addr 0x8802163ad004
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0: fence driver on ring 3 use gpu addr 0xac0c and cpu addr 0x8802163adc0c
Dec 23 11:16:04 x4 kernel: radeon 0000:01:05.0: setting latency timer to 64
Dec 23 11:16:04 x4 kernel: [drm] ring test on 0 succeeded in 1 usecs
Dec 23 11:16:05 x4 kernel: [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
Dec 23 11:16:05 x4 kernel: [drm:r600_resume] *ERROR* r600 startup failed on resume
Dec 23 11:16:09 x4 kernel: SysRq : Emergency Sync
Dec 23 11:16:09 x4 kernel: Emergency Sync complete
Dec 23 11:16:15 x4 kernel: SysRq : Emergency Remount R/O
Dec 23 11:16:15 x4 kernel: EXT4-fs (sdb2): re-mounted. Opts: (null)

-- 
Markus
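
The "ring 3 test failed (0xCAFEDEAD)" line in the log above comes from a write-and-poll style check. As far as I understand it, the test seeds a memory slot with one value, asks the ring to overwrite it with another, and then polls; reading the seed value back means the ring never executed the write. Below is a userspace sketch of that pattern, not the driver's implementation. The ring is replaced by hypothetical callbacks (working_ring, hung_ring), and the expected value 0xDEADBEEF is my assumption, since only 0xCAFEDEAD appears in the log.

```c
#include <assert.h>
#include <stdint.h>

#define RING_TEST_SEED  0xCAFEDEADu	/* value written before the test */
#define RING_TEST_MAGIC 0xDEADBEEFu	/* assumed value the ring should write */

/* Hypothetical stand-in for "submit a write packet to the ring". */
typedef void (*ring_submit_fn)(volatile uint32_t *slot, uint32_t value);

/* A ring that processes the packet: the slot gets overwritten. */
static void working_ring(volatile uint32_t *slot, uint32_t value)
{
	*slot = value;
}

/* A hung ring: the packet is never executed, the slot keeps the seed. */
static void hung_ring(volatile uint32_t *slot, uint32_t value)
{
	(void)slot;
	(void)value;
}

/*
 * Seed the slot, submit the write, poll up to max_polls times.
 * Returns 0 on success and -1 on failure; *last_seen receives the
 * final slot content, which is what an error message would print.
 */
static int ring_test(ring_submit_fn submit, unsigned int max_polls,
		     uint32_t *last_seen)
{
	volatile uint32_t slot = RING_TEST_SEED;
	unsigned int i;

	assert(last_seen != 0);

	submit(&slot, RING_TEST_MAGIC);

	for (i = 0; i < max_polls; i++) {
		if (slot == RING_TEST_MAGIC)
			break;
	}

	*last_seen = slot;
	return slot == RING_TEST_MAGIC ? 0 : -1;
}
```

Run against hung_ring, the sketch fails and reports the untouched seed value, which matches the shape of the error in the log: the DMA ring did not come back after the reset.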


GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-23 Thread Markus Trippelsdorf
On 2012.12.22 at 20:46 -0500, Alex Deucher wrote:
> On Mon, Dec 17, 2012 at 5:25 PM, Markus Trippelsdorf
>  wrote:
> > On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> >> On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> >>  wrote:
> >> > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> >> >> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> >> >>  wrote:
> >> >> > As soon as I open the following website:
> >> >> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> >> >> >
> >> >> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> >> >>
> >> >> Is this a regression?  Most likely a 3D driver bug unless you are only
> >> >> seeing it with specific kernels.  What browser are you using and do
> >> >> you have hw accelerated webgl, etc. enabled?  If so, what version of
> >> >> mesa are you using?
> >> >
> >> > This is a regression, because it is caused by yesterdays merge of
> >> > drm-next by Linus. IOW I only see this bug when running a
> >> > v3.7-9432-g9360b53 kernel.
> >>
> >> Can you bisect?  I'm guessing it may be related to the new DMA rings.  
> >> Possibly:
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> >
> > Yes, the commit above causes the issue.
> >
> 
> Does booting with radeon.wb=0 fix the issue?  Please make sure your
> kernel has this patch:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=86a1881d08f65a42c17071a59c0088dbe2870246

My kernel has this patch and radeon.wb=0 doesn't help. It still freezes
the machine as soon as you scroll on a website with many big images.

-- 
Markus


[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2012-12-20 Thread Markus Trippelsdorf
On 2012.12.20 at 14:45 +0100, Markus Trippelsdorf wrote:
> On 2012.12.20 at 08:30 -0500, Alex Deucher wrote:
> > On Wed, Dec 19, 2012 at 9:33 AM, Markus Trippelsdorf
> >  wrote:
> > > On 2012.12.19 at 15:18 +0100, Maarten Lankhorst wrote:
> > >> Fix regression introduced by 85b144f860176
> > >
> > > (EE) [mi] EQ overflow continuing.  100 events have been dropped.
> > > (EE)
> > > (EE) Backtrace:
> > > (EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x584f1d]
> > > (EE) 1: /usr/bin/X (QueuePointerEvents+0x52) [0x44a792]
> > > (EE) 2: /usr/bin/X (xf86PostButtonEvent+0xd5) [0x4829b5]
> > > (EE) 3: /usr/lib64/xorg/modules/input/mouse_drv.so 
> > > (0x7ff8f2501000+0x6b70) [0x7ff8f2507b70]
> > > (EE) 4: /usr/lib64/xorg/modules/input/mouse_drv.so 
> > > (0x7ff8f2501000+0x73a0) [0x7ff8f25083a0]
> > > (EE) 5: /usr/lib64/xorg/modules/input/mouse_drv.so 
> > > (0x7ff8f2501000+0x428c) [0x7ff8f250528c]
> > > (EE) 6: /usr/bin/X (0x40+0x71cd8) [0x471cd8]
> > > (EE) 7: /usr/bin/X (0x40+0x9a2ab) [0x49a2ab]
> > > (EE) 8: /lib/libpthread.so.0 (0x7ff8f1edc000+0xf260) [0x7ff8f1eeb260]
> > > (EE) 9: /lib/libc.so.6 (ioctl+0x7) [0x7ff8f19bd127]
> > > (EE) 10: /usr/lib/libdrm.so.2 (drmIoctl+0x34) [0x7ff8f246a634]
> > > (EE) 11: /usr/lib/libdrm.so.2 (drmCommandWriteRead+0x1f) [0x7ff8f246cbdf]
> > > (EE) 12: /usr/lib/libdrm_radeon.so.1 (0x7ff8f250e000+0x27bf) 
> > > [0x7ff8f25107bf]
> > > (EE) 13: /usr/lib64/xorg/modules/drivers/radeon_drv.so 
> > > (0x7ff8f154f000+0x407ec) [0x7ff8f158f7ec]
> > > (EE) 14: /usr/bin/X (_CallCallbacks+0x34) [0x438894]
> > > (EE) 15: /usr/bin/X (FlushAllOutput+0x2c) [0x5880ec]
> > > (EE) 16: /usr/bin/X (0x40+0x33aa1) [0x433aa1]
> > > (EE) 17: /usr/bin/X (0x40+0x230cd) [0x4230cd]
> > > (EE) 18: /lib/libc.so.6 (__libc_start_main+0xf5) [0x7ff8f19088b5]
> > > (EE) 19: /usr/bin/X (0x40+0x22c09) [0x422c09]
> > > (EE)
> > > (EE) [mi] EQ overflow continuing.  200 events have been dropped.
> > >
> > > And the pictures get distorted on the test-webpage. See attached 
> > > screenshot.
> > >
> > 
> > Anything in your kernel log that corresponds to the errors in your xorg log?
> 
> No. But I've found out that the errors in the xorg log are unrelated to
> the image corruption. 
> I use one of those Logitech mice with this "hyper" fast scrolling
> feature. And I guess the Xorg mouse driver just can't keep up with the
> fast input. So it's just a harmless warning that can be ignored, I
> guess.

And just in case it got lost in the noise yesterday: 
The image corruption is caused by Dave's commit:

commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
Author: Dave Airlie 
Date:   Fri Dec 14 21:04:46 2012 +1000

radeon: fix regression with eviction since evict caching changes

Reverting it 'fixes' the issue.

-- 
Markus


[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2012-12-20 Thread Markus Trippelsdorf
On 2012.12.20 at 08:30 -0500, Alex Deucher wrote:
> On Wed, Dec 19, 2012 at 9:33 AM, Markus Trippelsdorf
>  wrote:
> > On 2012.12.19 at 15:18 +0100, Maarten Lankhorst wrote:
> >> Fix regression introduced by 85b144f860176
> >
> > Thanks. This fixes the kernel BUG, but now I get this errors in my
> > Xorg.log:
> >
> > [23.092] [mi] Increasing EQ size to 512 to prevent dropped events.
> > (EE) [mi] EQ overflowing.  Additional events will be discarded until 
> > existing events are processed.
> > (EE)
> > (EE) Backtrace:
> > (EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x584f1d]
> > (EE) 1: /usr/bin/X (mieqEnqueue+0x21b) [0x56615b]
> > (EE) 2: /usr/bin/X (QueuePointerEvents+0x52) [0x44a792]
> > (EE) 3: /usr/bin/X (xf86PostButtonEvent+0xd5) [0x4829b5]
> > (EE) 4: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x6b70) 
> > [0x7ff8f2507b70]
> > (EE) 5: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x73a0) 
> > [0x7ff8f25083a0]
> > (EE) 6: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x428c) 
> > [0x7ff8f250528c]
> > (EE) 7: /usr/bin/X (0x40+0x71cd8) [0x471cd8]
> > (EE) 8: /usr/bin/X (0x40+0x9a2ab) [0x49a2ab]
> > (EE) 9: /lib/libpthread.so.0 (0x7ff8f1edc000+0xf260) [0x7ff8f1eeb260]
> > (EE) 10: /lib/libc.so.6 (ioctl+0x7) [0x7ff8f19bd127]
> > (EE) 11: /usr/lib/libdrm.so.2 (drmIoctl+0x34) [0x7ff8f246a634]
> > (EE) 12: /usr/lib/libdrm.so.2 (drmCommandWriteRead+0x1f) [0x7ff8f246cbdf]
> > (EE) 13: /usr/lib/libdrm_radeon.so.1 (0x7ff8f250e000+0x27bf) 
> > [0x7ff8f25107bf]
> > (EE) 14: /usr/lib64/xorg/modules/drivers/radeon_drv.so 
> > (0x7ff8f154f000+0x407ec) [0x7ff8f158f7ec]
> > (EE) 15: /usr/bin/X (_CallCallbacks+0x34) [0x438894]
> > (EE) 16: /usr/bin/X (FlushAllOutput+0x2c) [0x5880ec]
> > (EE) 17: /usr/bin/X (0x40+0x33aa1) [0x433aa1]
> > (EE) 18: /usr/bin/X (0x40+0x230cd) [0x4230cd]
> > (EE) 19: /lib/libc.so.6 (__libc_start_main+0xf5) [0x7ff8f19088b5]
> > (EE) 20: /usr/bin/X (0x40+0x22c09) [0x422c09]
> > (EE)
> > (EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher 
> > up the stack.
> > (EE) [mi] mieq is *NOT* the cause.  It is a victim.
> > (EE) [mi] EQ overflow continuing.  100 events have been dropped.
> > (EE)
> > (EE) Backtrace:
> > (EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x584f1d]
> > (EE) 1: /usr/bin/X (QueuePointerEvents+0x52) [0x44a792]
> > (EE) 2: /usr/bin/X (xf86PostButtonEvent+0xd5) [0x4829b5]
> > (EE) 3: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x6b70) 
> > [0x7ff8f2507b70]
> > (EE) 4: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x73a0) 
> > [0x7ff8f25083a0]
> > (EE) 5: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x428c) 
> > [0x7ff8f250528c]
> > (EE) 6: /usr/bin/X (0x40+0x71cd8) [0x471cd8]
> > (EE) 7: /usr/bin/X (0x40+0x9a2ab) [0x49a2ab]
> > (EE) 8: /lib/libpthread.so.0 (0x7ff8f1edc000+0xf260) [0x7ff8f1eeb260]
> > (EE) 9: /lib/libc.so.6 (ioctl+0x7) [0x7ff8f19bd127]
> > (EE) 10: /usr/lib/libdrm.so.2 (drmIoctl+0x34) [0x7ff8f246a634]
> > (EE) 11: /usr/lib/libdrm.so.2 (drmCommandWriteRead+0x1f) [0x7ff8f246cbdf]
> > (EE) 12: /usr/lib/libdrm_radeon.so.1 (0x7ff8f250e000+0x27bf) 
> > [0x7ff8f25107bf]
> > (EE) 13: /usr/lib64/xorg/modules/drivers/radeon_drv.so 
> > (0x7ff8f154f000+0x407ec) [0x7ff8f158f7ec]
> > (EE) 14: /usr/bin/X (_CallCallbacks+0x34) [0x438894]
> > (EE) 15: /usr/bin/X (FlushAllOutput+0x2c) [0x5880ec]
> > (EE) 16: /usr/bin/X (0x40+0x33aa1) [0x433aa1]
> > (EE) 17: /usr/bin/X (0x40+0x230cd) [0x4230cd]
> > (EE) 18: /lib/libc.so.6 (__libc_start_main+0xf5) [0x7ff8f19088b5]
> > (EE) 19: /usr/bin/X (0x40+0x22c09) [0x422c09]
> > (EE)
> > (EE) [mi] EQ overflow continuing.  200 events have been dropped.
> >
> > And the pictures get distorted on the test-webpage. See attached screenshot.
> >
> 
> Anything in your kernel log that corresponds to the errors in your xorg log?

No. But I've found out that the errors in the xorg log are unrelated to
the image corruption. 
I use one of those Logitech mice with this "hyper" fast scrolling
feature. And I guess the Xorg mouse driver just can't keep up with the
fast input. So it's just a harmless warning that can be ignored, I
guess.

-- 
Markus


Re: [PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2012-12-20 Thread Markus Trippelsdorf
On 2012.12.20 at 14:45 +0100, Markus Trippelsdorf wrote:
> On 2012.12.20 at 08:30 -0500, Alex Deucher wrote:
> > On Wed, Dec 19, 2012 at 9:33 AM, Markus Trippelsdorf
> > <mar...@trippelsdorf.de> wrote:
> > > On 2012.12.19 at 15:18 +0100, Maarten Lankhorst wrote:
> > >> Fix regression introduced by 85b144f860176
> > >
> > > (EE) [mi] EQ overflow continuing.  100 events have been dropped.
> > > (EE)
> > > (EE) Backtrace:
> > > (EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x584f1d]
> > > (EE) 1: /usr/bin/X (QueuePointerEvents+0x52) [0x44a792]
> > > (EE) 2: /usr/bin/X (xf86PostButtonEvent+0xd5) [0x4829b5]
> > > (EE) 3: /usr/lib64/xorg/modules/input/mouse_drv.so
> > > (0x7ff8f2501000+0x6b70) [0x7ff8f2507b70]
> > > (EE) 4: /usr/lib64/xorg/modules/input/mouse_drv.so
> > > (0x7ff8f2501000+0x73a0) [0x7ff8f25083a0]
> > > (EE) 5: /usr/lib64/xorg/modules/input/mouse_drv.so
> > > (0x7ff8f2501000+0x428c) [0x7ff8f250528c]
> > > (EE) 6: /usr/bin/X (0x40+0x71cd8) [0x471cd8]
> > > (EE) 7: /usr/bin/X (0x40+0x9a2ab) [0x49a2ab]
> > > (EE) 8: /lib/libpthread.so.0 (0x7ff8f1edc000+0xf260) [0x7ff8f1eeb260]
> > > (EE) 9: /lib/libc.so.6 (ioctl+0x7) [0x7ff8f19bd127]
> > > (EE) 10: /usr/lib/libdrm.so.2 (drmIoctl+0x34) [0x7ff8f246a634]
> > > (EE) 11: /usr/lib/libdrm.so.2 (drmCommandWriteRead+0x1f) [0x7ff8f246cbdf]
> > > (EE) 12: /usr/lib/libdrm_radeon.so.1 (0x7ff8f250e000+0x27bf)
> > > [0x7ff8f25107bf]
> > > (EE) 13: /usr/lib64/xorg/modules/drivers/radeon_drv.so
> > > (0x7ff8f154f000+0x407ec) [0x7ff8f158f7ec]
> > > (EE) 14: /usr/bin/X (_CallCallbacks+0x34) [0x438894]
> > > (EE) 15: /usr/bin/X (FlushAllOutput+0x2c) [0x5880ec]
> > > (EE) 16: /usr/bin/X (0x40+0x33aa1) [0x433aa1]
> > > (EE) 17: /usr/bin/X (0x40+0x230cd) [0x4230cd]
> > > (EE) 18: /lib/libc.so.6 (__libc_start_main+0xf5) [0x7ff8f19088b5]
> > > (EE) 19: /usr/bin/X (0x40+0x22c09) [0x422c09]
> > > (EE)
> > > (EE) [mi] EQ overflow continuing.  200 events have been dropped.
> > >
> > > And the pictures get distorted on the test-webpage. See attached
> > > screenshot.
> > >
> >
> > Anything in your kernel log that corresponds to the errors in your xorg log?
>
> No. But I've found out that the errors in the xorg log are unrelated to
> the image corruption.
> I use one of those Logitech mice with this "hyper" fast scrolling
> feature. And I guess the Xorg mouse driver just can't keep up with the
> fast input. So it's just a harmless warning that can be ignored, I
> guess.

And just in case it got lost in the noise yesterday: 
The image corruption is caused by Dave's commit:

commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
Author: Dave Airlie <airl...@redhat.com>
Date:   Fri Dec 14 21:04:46 2012 +1000

    radeon: fix regression with eviction since evict caching changes

Reverting it 'fixes' the issue.

-- 
Markus
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2012-12-19 Thread Markus Trippelsdorf
On 2012.12.19 at 15:54 +0100, Markus Trippelsdorf wrote:
> On 2012.12.19 at 09:47 -0500, Alex Deucher wrote:
> 
> And the pictures get distorted on the test-webpage when I scroll up and
> down, see:
> http://trippelsdorf.de/bad.png

The picture distortion issue is caused by:

commit dd54fee7d440c4a9756cce2c24a50c15e4c17ccb
Author: Dave Airlie <airl...@redhat.com>
Date:   Fri Dec 14 21:04:46 2012 +1000

    radeon: fix regression with eviction since evict caching changes

    Since 0d0b3e7443bed6b49cb90fe7ddc4b5578a83a88d
    drm/radeon: use cached memory when evicting for vram on non agp

    evicting from TTM would try and evict to TTM instead of system,
    not so good.

    This should fix:
    https://bugs.freedesktop.org/show_bug.cgi?id=58272

    Signed-off-by: Dave Airlie <airl...@redhat.com>
    Signed-off-by: Maarten Lankhorst <maarten.lankho...@canonical.com>
    Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>

Reverting the commit above "fixes" the problem.

-- 
Markus


[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2012-12-19 Thread Markus Trippelsdorf
On 2012.12.19 at 09:47 -0500, Alex Deucher wrote:
> On Wed, Dec 19, 2012 at 9:41 AM, Paul Menzel
>  wrote:
> > Am Mittwoch, den 19.12.2012, 15:18 +0100 schrieb Maarten Lankhorst:
> >> Fix regression introduced by 85b144f860176
> >
> > Thanks for the catch and patch.
> >
> > Also please add the commit summary to make the commit message self
> > contained?
> >
> > The problem description would also be nice.
> >
> >> Signed-off-by: Maarten Lankhorst 
> >> Reported-by: Markus Trippelsdorf 
> > Message-ID: <20121217182752.GA351@x4>
> >
> >> ---
> >>
> >> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> >> index 0bf66f9..9f85418 100644
> >> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> >> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> >> @@ -579,7 +579,7 @@ static int ttm_bo_cleanup_refs_and_unlock(struct 
> >> ttm_buffer_object *bo,
> >>* at this point the buffer should be dead, so
> >>* no new sync objects can be attached.
> >>*/
> >> - sync_obj = driver->sync_obj_ref(&bo->sync_obj);
> >> + sync_obj = driver->sync_obj_ref(bo->sync_obj);
> >
> > Any idea, why this only had an impact for one person so far?
> 
> There are several radeon bugs from drm-next 3.8 that may be ultimately
> related to this.
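
For readers following the quoted hunk: the fix changes which pointer is handed to the driver's sync_obj_ref hook. Passing the address of the sync_obj field, rather than the pointer stored in it, would make the callee treat part of the buffer object as if it were a fence, which would fit the refcount warning reported in this thread. Assuming the hook takes an opaque void pointer, a C compiler accepts both spellings without complaint. A minimal sketch with hypothetical stand-in types (struct fence, struct buffer_object and all helper names here are assumptions for illustration, not the kernel's definitions):

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct fence {
    int refcount;               /* plays the role of the embedded kref */
};

struct buffer_object {
    void *sync_obj;             /* points at a struct fence, or NULL */
};

/* Take a reference on the fence behind an opaque sync object pointer. */
static struct fence *sync_obj_ref(void *sync_obj)
{
    struct fence *f = sync_obj;

    f->refcount++;
    return f;
}

/* Correct call: pass the pointer stored in the field. */
static struct fence *bo_ref_fence(struct buffer_object *bo)
{
    return sync_obj_ref(bo->sync_obj);
}

/*
 * What the pre-fix call would have passed: the address of the field itself.
 * Returned only for comparison; it is never dereferenced as a fence here.
 */
static void *bo_field_address(struct buffer_object *bo)
{
    return &bo->sync_obj;
}
```

The two values differ: bo_ref_fence() reaches the fence, while bo_field_address() yields a location inside the buffer object, so a reference count read through it would be whatever bytes happen to live there.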

This patch fixes the kernel BUG, but now I get these errors in my
Xorg.log:

[23.092] [mi] Increasing EQ size to 512 to prevent dropped events.
(EE) [mi] EQ overflowing.  Additional events will be discarded until existing 
events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x584f1d]
(EE) 1: /usr/bin/X (mieqEnqueue+0x21b) [0x56615b]
(EE) 2: /usr/bin/X (QueuePointerEvents+0x52) [0x44a792]
(EE) 3: /usr/bin/X (xf86PostButtonEvent+0xd5) [0x4829b5]
(EE) 4: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x6b70) 
[0x7ff8f2507b70]
(EE) 5: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x73a0) 
[0x7ff8f25083a0]
(EE) 6: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x428c) 
[0x7ff8f250528c]
(EE) 7: /usr/bin/X (0x40+0x71cd8) [0x471cd8]
(EE) 8: /usr/bin/X (0x40+0x9a2ab) [0x49a2ab]
(EE) 9: /lib/libpthread.so.0 (0x7ff8f1edc000+0xf260) [0x7ff8f1eeb260]
(EE) 10: /lib/libc.so.6 (ioctl+0x7) [0x7ff8f19bd127]
(EE) 11: /usr/lib/libdrm.so.2 (drmIoctl+0x34) [0x7ff8f246a634]
(EE) 12: /usr/lib/libdrm.so.2 (drmCommandWriteRead+0x1f) [0x7ff8f246cbdf]
(EE) 13: /usr/lib/libdrm_radeon.so.1 (0x7ff8f250e000+0x27bf) [0x7ff8f25107bf]
(EE) 14: /usr/lib64/xorg/modules/drivers/radeon_drv.so (0x7ff8f154f000+0x407ec) 
[0x7ff8f158f7ec]
(EE) 15: /usr/bin/X (_CallCallbacks+0x34) [0x438894]
(EE) 16: /usr/bin/X (FlushAllOutput+0x2c) [0x5880ec]
(EE) 17: /usr/bin/X (0x40+0x33aa1) [0x433aa1]
(EE) 18: /usr/bin/X (0x40+0x230cd) [0x4230cd]
(EE) 19: /lib/libc.so.6 (__libc_start_main+0xf5) [0x7ff8f19088b5]
(EE) 20: /usr/bin/X (0x40+0x22c09) [0x422c09]
(EE)
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up 
the stack.
(EE) [mi] mieq is *NOT* the cause.  It is a victim.
(EE) [mi] EQ overflow continuing.  100 events have been dropped.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x584f1d]
(EE) 1: /usr/bin/X (QueuePointerEvents+0x52) [0x44a792]
(EE) 2: /usr/bin/X (xf86PostButtonEvent+0xd5) [0x4829b5]
(EE) 3: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x6b70) 
[0x7ff8f2507b70]
(EE) 4: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x73a0) 
[0x7ff8f25083a0]
(EE) 5: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x428c) 
[0x7ff8f250528c]
(EE) 6: /usr/bin/X (0x40+0x71cd8) [0x471cd8]
(EE) 7: /usr/bin/X (0x40+0x9a2ab) [0x49a2ab]
(EE) 8: /lib/libpthread.so.0 (0x7ff8f1edc000+0xf260) [0x7ff8f1eeb260]
(EE) 9: /lib/libc.so.6 (ioctl+0x7) [0x7ff8f19bd127]
(EE) 10: /usr/lib/libdrm.so.2 (drmIoctl+0x34) [0x7ff8f246a634]
(EE) 11: /usr/lib/libdrm.so.2 (drmCommandWriteRead+0x1f) [0x7ff8f246cbdf]
(EE) 12: /usr/lib/libdrm_radeon.so.1 (0x7ff8f250e000+0x27bf) [0x7ff8f25107bf]
(EE) 13: /usr/lib64/xorg/modules/drivers/radeon_drv.so (0x7ff8f154f000+0x407ec) 
[0x7ff8f158f7ec]
(EE) 14: /usr/bin/X (_CallCallbacks+0x34) [0x438894]
(EE) 15: /usr/bin/X (FlushAllOutput+0x2c) [0x5880ec]
(EE) 16: /usr/bin/X (0x40+0x33aa1) [0x433aa1]
(EE) 17: /usr/bin/X (0x40+0x230cd) [0x4230cd]
(EE) 18: /lib/libc.so.6 (__libc_start_main+0xf5) [0x7ff8f19088b5]
(EE) 19: /usr/bin/X (0x40+0x22c09) [0x422c09]
(EE)
(EE) [mi] EQ overflow continuing.  200 events have been dropped.

And the pictures get distorted on the test-webpage when I scroll up and
down, see:
http://trippelsdorf.de/bad.png

-- 
Markus


[PATCH] drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

2012-12-19 Thread Markus Trippelsdorf
On 2012.12.19 at 15:18 +0100, Maarten Lankhorst wrote:
> Fix regression introduced by 85b144f860176

Thanks. This fixes the kernel BUG, but now I get these errors in my
Xorg.log:

[23.092] [mi] Increasing EQ size to 512 to prevent dropped events.
(EE) [mi] EQ overflowing.  Additional events will be discarded until existing 
events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x584f1d]
(EE) 1: /usr/bin/X (mieqEnqueue+0x21b) [0x56615b]
(EE) 2: /usr/bin/X (QueuePointerEvents+0x52) [0x44a792]
(EE) 3: /usr/bin/X (xf86PostButtonEvent+0xd5) [0x4829b5]
(EE) 4: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x6b70) 
[0x7ff8f2507b70]
(EE) 5: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x73a0) 
[0x7ff8f25083a0]
(EE) 6: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x428c) 
[0x7ff8f250528c]
(EE) 7: /usr/bin/X (0x40+0x71cd8) [0x471cd8]
(EE) 8: /usr/bin/X (0x40+0x9a2ab) [0x49a2ab]
(EE) 9: /lib/libpthread.so.0 (0x7ff8f1edc000+0xf260) [0x7ff8f1eeb260]
(EE) 10: /lib/libc.so.6 (ioctl+0x7) [0x7ff8f19bd127]
(EE) 11: /usr/lib/libdrm.so.2 (drmIoctl+0x34) [0x7ff8f246a634]
(EE) 12: /usr/lib/libdrm.so.2 (drmCommandWriteRead+0x1f) [0x7ff8f246cbdf]
(EE) 13: /usr/lib/libdrm_radeon.so.1 (0x7ff8f250e000+0x27bf) [0x7ff8f25107bf]
(EE) 14: /usr/lib64/xorg/modules/drivers/radeon_drv.so (0x7ff8f154f000+0x407ec) 
[0x7ff8f158f7ec]
(EE) 15: /usr/bin/X (_CallCallbacks+0x34) [0x438894]
(EE) 16: /usr/bin/X (FlushAllOutput+0x2c) [0x5880ec]
(EE) 17: /usr/bin/X (0x40+0x33aa1) [0x433aa1]
(EE) 18: /usr/bin/X (0x40+0x230cd) [0x4230cd]
(EE) 19: /lib/libc.so.6 (__libc_start_main+0xf5) [0x7ff8f19088b5]
(EE) 20: /usr/bin/X (0x40+0x22c09) [0x422c09]
(EE)
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up 
the stack.
(EE) [mi] mieq is *NOT* the cause.  It is a victim.
(EE) [mi] EQ overflow continuing.  100 events have been dropped.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x3d) [0x584f1d]
(EE) 1: /usr/bin/X (QueuePointerEvents+0x52) [0x44a792]
(EE) 2: /usr/bin/X (xf86PostButtonEvent+0xd5) [0x4829b5]
(EE) 3: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x6b70) 
[0x7ff8f2507b70]
(EE) 4: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x73a0) 
[0x7ff8f25083a0]
(EE) 5: /usr/lib64/xorg/modules/input/mouse_drv.so (0x7ff8f2501000+0x428c) 
[0x7ff8f250528c]
(EE) 6: /usr/bin/X (0x40+0x71cd8) [0x471cd8]
(EE) 7: /usr/bin/X (0x40+0x9a2ab) [0x49a2ab]
(EE) 8: /lib/libpthread.so.0 (0x7ff8f1edc000+0xf260) [0x7ff8f1eeb260]
(EE) 9: /lib/libc.so.6 (ioctl+0x7) [0x7ff8f19bd127]
(EE) 10: /usr/lib/libdrm.so.2 (drmIoctl+0x34) [0x7ff8f246a634]
(EE) 11: /usr/lib/libdrm.so.2 (drmCommandWriteRead+0x1f) [0x7ff8f246cbdf]
(EE) 12: /usr/lib/libdrm_radeon.so.1 (0x7ff8f250e000+0x27bf) [0x7ff8f25107bf]
(EE) 13: /usr/lib64/xorg/modules/drivers/radeon_drv.so (0x7ff8f154f000+0x407ec) 
[0x7ff8f158f7ec]
(EE) 14: /usr/bin/X (_CallCallbacks+0x34) [0x438894]
(EE) 15: /usr/bin/X (FlushAllOutput+0x2c) [0x5880ec]
(EE) 16: /usr/bin/X (0x40+0x33aa1) [0x433aa1]
(EE) 17: /usr/bin/X (0x40+0x230cd) [0x4230cd]
(EE) 18: /lib/libc.so.6 (__libc_start_main+0xf5) [0x7ff8f19088b5]
(EE) 19: /usr/bin/X (0x40+0x22c09) [0x422c09]
(EE)
(EE) [mi] EQ overflow continuing.  200 events have been dropped.

And the pictures get distorted on the test-webpage. See attached screenshot.

-- 
Markus
[Attachment scrubbed by the list archive: bad.jpg, image/jpeg, 163834 bytes]



GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-19 Thread Markus Trippelsdorf
On 2012.12.19 at 14:57 +0100, Maarten Lankhorst wrote:
> Op 18-12-12 17:12, Markus Trippelsdorf schreef:
> > With your supposed debugging BUG_ONs added I still get:
> >
> > Dec 18 17:01:15 x4 kernel: ------------[ cut here ]------------
> > Dec 18 17:01:15 x4 kernel: WARNING: at include/linux/kref.h:42 
> > radeon_fence_ref+0x2c/0x40()
> > Dec 18 17:01:15 x4 kernel: Hardware name: System Product Name
> > Dec 18 17:01:15 x4 kernel: Pid: 157, comm: X Not tainted 
> > 3.7.0-rc7-00520-g85b144f-dirty #174
> > Dec 18 17:01:15 x4 kernel: Call Trace:
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > warn_slowpath_common+0x74/0xb0
> > Dec 18 17:01:15 x4 kernel: [] ? radeon_fence_ref+0x2c/0x40
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > ttm_bo_cleanup_refs_and_unlock+0x18c/0x2d0
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > ttm_mem_evict_first+0x1dc/0x2a0
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > ttm_bo_man_get_node+0x62/0xb0
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > ttm_bo_mem_space+0x28e/0x340
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > ttm_bo_move_buffer+0xfc/0x170
> > Dec 18 17:01:15 x4 kernel: [] ? kmem_cache_alloc+0xb2/0xc0
> > Dec 18 17:01:15 x4 kernel: [] ? ttm_bo_validate+0x95/0x110
> > Dec 18 17:01:15 x4 kernel: [] ? ttm_bo_init+0x2ec/0x3b0
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > radeon_bo_create+0x18a/0x200
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > radeon_bo_clear_va+0x40/0x40
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > radeon_gem_object_create+0x92/0x160
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > radeon_gem_create_ioctl+0x6c/0x150
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > radeon_gem_object_free+0x2f/0x40
> > Dec 18 17:01:15 x4 kernel: [] ? drm_ioctl+0x420/0x4f0
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > radeon_gem_pwrite_ioctl+0x20/0x20
> > Dec 18 17:01:15 x4 kernel: [] ? do_vfs_ioctl+0x2e4/0x4e0
> > Dec 18 17:01:15 x4 kernel: [] ? vfs_read+0x118/0x160
> > Dec 18 17:01:15 x4 kernel: [] ? sys_ioctl+0x4c/0xa0
> > Dec 18 17:01:15 x4 kernel: [] ? sys_read+0x51/0xa0
> > Dec 18 17:01:15 x4 kernel: [] ? 
> > system_call_fastpath+0x16/0x1b
> so the kref to fence is null here. This should be impossible and
> indicates a bug in refcounting somewhere, or possibly memory
> corruption.
> 
> Let's first look where things could go wrong..
> 
> sync_obj member requires fence_lock to be taken, but radeon code in
> general doesn't do that, hm..
> 
> I think radeon_cs_sync_rings needs to take fence_lock during the
> iteration, then taking on a refcount to the fence, and
> radeon_crtc_page_flip and radeon_move_blit are lacking refcount on
> fence_lock as well.
> 
> But that would probably still not explain why it crashes in
> radeon_vm_bo_invalidate shortly after, so it seems just as likely that
> it's operating on freed memory there or something.
> 
> But none of the code touches refcounting for that bo, and I really
> don't see how I messed up anything there.
> 
> I seem to be able to reproduce it if I add a hack though, can you test
> if you get the exact same issues if you apply this patch?
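
The locking rule described in the quoted text (read sync_obj only with fence_lock held, and take a reference before dropping the lock) can be sketched in user-space C. This is only an illustration under stated assumptions: struct fence, struct buffer, the atomic_flag-based lock and the helper names are hypothetical and do not mirror the radeon or TTM code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the radeon or TTM types. */
struct fence {
    int refcount;
};

struct buffer {
    atomic_flag fence_lock;     /* tiny spinlock guarding sync_obj */
    struct fence *sync_obj;     /* may be replaced or cleared by other threads */
};

static void buffer_lock(struct buffer *b)
{
    while (atomic_flag_test_and_set_explicit(&b->fence_lock,
                                             memory_order_acquire))
        ;                       /* spin until the flag was previously clear */
}

static void buffer_unlock(struct buffer *b)
{
    atomic_flag_clear_explicit(&b->fence_lock, memory_order_release);
}

/*
 * Read sync_obj and take a reference while the lock is held, so the fence
 * cannot be dropped between the read and the increment.
 * Returns NULL if no fence is attached.
 */
static struct fence *buffer_get_fence(struct buffer *b)
{
    struct fence *f;

    buffer_lock(b);
    f = b->sync_obj;
    if (f)
        f->refcount++;
    buffer_unlock(b);
    return f;
}
```

In this sketch the plain int counter is only safe because every access is assumed to happen under the same lock; code that drops references elsewhere would need the lock too, or an atomic counter.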

Your patch doesn't apply unfortunately:

markus@x4 linux % patch -p1 --dry-run < ~/maarten.patch
checking file drivers/gpu/drm/ttm/ttm_bo.c
Hunk #1 succeeded at 512 with fuzz 1.
Hunk #6 FAILED at 814.
1 out of 6 hunks FAILED
markus@x4 linux % git describe
v3.7-10833-g752451f
markus@x4 linux % 

-- 
Markus


GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-18 Thread Markus Trippelsdorf
On 2012.12.18 at 16:24 +0100, Maarten Lankhorst wrote:
> Op 18-12-12 14:38, Markus Trippelsdorf schreef:
> > On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
> >> On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote: 
> >>> On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> >>>> On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> >>>>> On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> >>>>>  wrote:
> >>>>>> On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> >>>>>>> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> >>>>>>>  wrote:
> >>>>>>>> As soon as I open the following website:
> >>>>>>>> http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> >>>>>>>>
> >>>>>>>> my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> >>>>>>> Is this a regression?  Most likely a 3D driver bug unless you are only
> >>>>>>> seeing it with specific kernels.  What browser are you using and do
> >>>>>>> you have hw accelerated webgl, etc. enabled?  If so, what version of
> >>>>>>> mesa are you using?
> >>>>>> This is a regression, because it is caused by yesterday's merge of
> >>>>>> drm-next by Linus. IOW I only see this bug when running a
> >>>>>> v3.7-9432-g9360b53 kernel.
> >>>>> Can you bisect?  I'm guessing it may be related to the new DMA rings.  
> >>>>> Possibly:
> >>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> >>>> Yes, the commit above causes the issue. 
> >>>>
> >>>>  2d6cc72  GPU lockups
> >>> With 2d6cc72 reverted I get:
> >>>
> >>> Dec 17 23:09:35 x4 kernel: ------------[ cut here ]------------
> >> Probably a separate issue, can you bisect this one as well?
> > Yes. Git-bisect points to:
> >
> > 85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
> > commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
> > Author: Maarten Lankhorst 
> > Date:   Thu Nov 29 11:36:54 2012 +
> >
> > drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
> > held, v3
> >
> > (Please note that this bug is a little bit harder to reproduce. But
> > when you scroll up and down for ~10 seconds on the webpage mentioned
> > above it will trigger the oops.
> > So while I'm not 100% sure that the issue is caused by exactly this
> > commit, the vicinity should be right)
> >
> Those dmesg warnings sound suspicious, looks like something is going
> very wrong there.
> 
> Can you revert the one before it? "drm/radeon: allow move_notify to be
> called without reservation" Reservation should be held at this point,
> that commit got in accidentally.
> 
> I doubt not holding a reservation is causing it though, I don't really
> see how that commit could cause it however, so can you please double
> check it never happened before that point, and only started at that
> commit?
> 
> also slap in a BUG_ON(!ttm_bo_is_reserved(bo)) in
> ttm_bo_cleanup_refs_and_unlock for good measure, and a
> BUG_ON(spin_trylock(&bdev->fence_lock)); to ttm_bo_wait.
> 
> I really don't see how that specific commit can be wrong though, so
> awaiting your results first before I try to dig more into it.
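
The second check suggested above relies on a small trick: a trylock call reports success when it manages to take the lock, so if it succeeds inside a function that is supposed to be entered with the lock already held, the caller broke that contract. A user-space sketch of the idea follows; struct spinlock and the helpers are hypothetical stand-ins built on C11 atomic_flag, not the kernel primitives, and the check returns a flag instead of halting the way BUG_ON would.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock for illustration; not the kernel's. */
struct spinlock {
    atomic_flag held;
};

static void spin_lock(struct spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                       /* spin until the flag was previously clear */
}

static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Returns true if the lock was free and is now taken by the caller. */
static bool spin_trylock(struct spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/*
 * Debug check for "must be called with the lock held".
 * If trylock succeeds, nobody held the lock, so the contract was violated;
 * the accidental acquisition is undone before reporting it.
 */
static bool lock_was_not_held(struct spinlock *l)
{
    if (spin_trylock(l)) {
        spin_unlock(l);
        return true;
    }
    return false;
}
```

Note that such a check only shows that somebody holds the lock, not that the current caller does.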

I just reran git-bisect just on your commits (from 1a1494def to 97a875cbd)
and I landed on the same commit as above:

commit 85b144f86 (drm/ttm: call ttm_bo_cleanup_refs with reservation and lru 
lock held, v3)

So now I'm pretty sure it's specifically this commit that started the
issue.
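
For context, the search that a bisection over a commit range performs can be sketched as a binary search for the first bad commit. This is a simplified illustration, not how git itself is implemented: it assumes a linear range of commits indexed in chronological order, a known-good start, a known-bad end, and a test that answers reliably (the thread notes the bug takes some scrolling to trigger, so a false "good" answer would mislead the search). The names first_bad, is_bad_fn and bad_from are hypothetical.

```c
#include <stddef.h>

/* Tester-supplied predicate: nonzero if the build at `index` shows the bug. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/*
 * Find the first bad commit in a linear range.
 * Preconditions: commit `good` is good, commit `bad` is bad, good < bad,
 * and every commit after the first bad one is also bad.
 */
static size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;          /* culprit is at mid or earlier */
        else
            good = mid;         /* culprit is after mid */
    }
    return bad;
}

/* Example predicate: everything from index *ctx onward is bad. */
static int bad_from(size_t index, void *ctx)
{
    return index >= *(size_t *)ctx;
}
```

Each test halves the remaining range, which is why restricting the bisection to a short series of commits needs only a handful of builds.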

With your supposed debugging BUG_ONs added I still get:

Dec 18 17:01:15 x4 kernel: ------------[ cut here ]------------
Dec 18 17:01:15 x4 kernel: WARNING: at include/linux/kref.h:42 
radeon_fence_ref+0x2c/0x40()
Dec 18 17:01:15 x4 kernel: Hardware name: System Product Name
Dec 18 17:01:15 x4 kernel: Pid: 157, comm: X Not tainted 
3.7.0-rc7-00520-g85b144f-dirty #174
Dec 18 17:01:15 x4 kernel: Call Trace:
Dec 18 17:01:15 x4 kernel: [] ? warn_slowpath_common+0x74/0xb0
Dec 18 17:01:15 x4 kernel: [] ? radeon_fence_ref+0x2c/0x40
Dec 18 17:01:15 x4 kernel: [] ? 
ttm_bo_cleanup_refs_and_unlock+0x18c/0x2d0
Dec 18 17:01:15 x4 kernel: [] ? 
ttm_mem_evict_first+0x1dc/0x2a0
Dec 18 17:01:15 x4 kernel: [] ? ttm_bo_man_get_node+0x62/0xb0
Dec 18 17:01:15 x4 kernel: [] ? ttm_bo_mem_space+0x28e/0x340
Dec 18 17:01:15 x4 kernel: [] ? t

GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-18 Thread Markus Trippelsdorf
On 2012.12.18 at 14:38 +0100, Markus Trippelsdorf wrote:
> On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
> > On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote: 
> > > On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> > > > On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> > > > > On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> > > > >  wrote:
> > > > > > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > > > > >> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > > > > >>  wrote:
> > > > > >> > As soon as I open the following website:
> > > > > >> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > > > > >> >
> > > > > >> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > > > > >>
> > > > > >> Is this a regression?  Most likely a 3D driver bug unless you are 
> > > > > >> only
> > > > > >> seeing it with specific kernels.  What browser are you using and do
> > > > > >> you have hw accelerated webgl, etc. enabled?  If so, what version 
> > > > > >> of
> > > > > >> mesa are you using?
> > > > > >
> > > > > > This is a regression, because it is caused by yesterdays merge of
> > > > > > drm-next by Linus. IOW I only see this bug when running a
> > > > > > v3.7-9432-g9360b53 kernel.
> > > > > 
> > > > > Can you bisect?  I'm guessing it may be related to the new DMA rings. 
> > > > >  Possibly:
> > > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> > > > 
> > > > Yes, the commit above causes the issue. 
> > > > 
> > > >  2d6cc72  GPU lockups
> > > 
> > > With 2d6cc72 reverted I get:
> > > 
> > > Dec 17 23:09:35 x4 kernel: [ cut here ]
> > 
> > Probably a separate issue, can you bisect this one as well?
> 
> Yes. Git-bisect points to:
> 
> 85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
> commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
> Author: Maarten Lankhorst 
> Date:   Thu Nov 29 11:36:54 2012 +
> 
> drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
> held, v3
> 
> (Please note that this bug is a little bit harder to reproduce. But
> when you scroll up and down for ~10 seconds on the webpage mentioned
> above it will trigger the oops.
> So while I'm not 100% sure that the issue is caused by exactly this
> commit, the vicinity should be right)
> 
> Dec 18 14:29:07 x4 kernel: [ cut here ]
> Dec 18 14:29:07 x4 kernel: WARNING: at include/linux/kref.h:42 
> radeon_fence_ref+0x2c/0x40()
> Dec 18 14:29:07 x4 kernel: Hardware name: System Product Name
> Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Not tainted 
> 3.7.0-rc7-00520-g85b144f #168
> Dec 18 14:29:07 x4 kernel: Call Trace:
> Dec 18 14:29:07 x4 kernel: [] ? 
> warn_slowpath_common+0x74/0xb0
> Dec 18 14:29:07 x4 kernel: [] ? radeon_fence_ref+0x2c/0x40
> Dec 18 14:29:07 x4 kernel: [] ? 
> ttm_bo_cleanup_refs_and_unlock+0x17c/0x2c0
> Dec 18 14:29:07 x4 kernel: [] ? 
> ttm_mem_evict_first+0x1dc/0x2a0
> Dec 18 14:29:07 x4 kernel: [] ? 
> ttm_bo_man_get_node+0x62/0xb0
> Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_mem_space+0x28e/0x340
> Dec 18 14:29:07 x4 kernel: [] ? 
> ttm_bo_move_buffer+0xfc/0x170
> Dec 18 14:29:07 x4 kernel: [] ? kmem_cache_alloc+0xb2/0xc0
> Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_validate+0x95/0x110
> Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_init+0x2ec/0x3b0
> Dec 18 14:29:07 x4 kernel: [] ? radeon_bo_create+0x18a/0x200
> Dec 18 14:29:07 x4 kernel: [] ? radeon_bo_clear_va+0x40/0x40
> Dec 18 14:29:07 x4 kernel: [] ? 
> radeon_gem_object_create+0x92/0x160
> Dec 18 14:29:07 x4 kernel: [] ? 
> radeon_gem_create_ioctl+0x6c/0x150
> Dec 18 14:29:07 x4 kernel: [] ? drm_ioctl+0x420/0x4f0
> Dec 18 14:29:07 x4 kernel: [] ? 
> radeon_gem_pwrite_ioctl+0x20/0x20
> Dec 18 14:29:07 x4 kernel: [] ? do_vfs_ioctl+0x2e4/0x4e0
> Dec 18 14:29:07 x4 kernel: [] ? vfs_read+0x118/0x160
> Dec 18 14:29:07 x4 kernel: [] ? sys_ioctl+0x4c/0xa0
> Dec 18 14:29:07 x4 kernel: [] ? sys_read+0x51/0xa0
> Dec 18 14:29:07 x4 kernel: [] ? 
> system_call_fastpath+0x16/0x1b
> Dec 18 14:29:07 x4 kernel: ---[ end trace c5e6f68fefd3a70b ]---
> Dec 18 14:29:07 x4 kernel: BUG: unable to handle k

GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-18 Thread Markus Trippelsdorf
On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
> On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote: 
> > On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> > > On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> > > > On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> > > >  wrote:
> > > > > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > > > >> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > > > >>  wrote:
> > > > >> > As soon as I open the following website:
> > > > >> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > > > >> >
> > > > >> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > > > >>
> > > > >> Is this a regression?  Most likely a 3D driver bug unless you are 
> > > > >> only
> > > > >> seeing it with specific kernels.  What browser are you using and do
> > > > >> you have hw accelerated webgl, etc. enabled?  If so, what version of
> > > > >> mesa are you using?
> > > > >
> > > > > This is a regression, because it is caused by yesterdays merge of
> > > > > drm-next by Linus. IOW I only see this bug when running a
> > > > > v3.7-9432-g9360b53 kernel.
> > > > 
> > > > Can you bisect?  I'm guessing it may be related to the new DMA rings.  
> > > > Possibly:
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> > > 
> > > Yes, the commit above causes the issue. 
> > > 
> > >  2d6cc72  GPU lockups
> > 
> > With 2d6cc72 reverted I get:
> > 
> > Dec 17 23:09:35 x4 kernel: [ cut here ]
> 
> Probably a separate issue, can you bisect this one as well?

Yes. Git-bisect points to:

85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
Author: Maarten Lankhorst 
Date:   Thu Nov 29 11:36:54 2012 +

drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
held, v3

(Please note that this bug is a little bit harder to reproduce. But
when you scroll up and down for ~10 seconds on the webpage mentioned
above it will trigger the oops.
So while I'm not 100% sure that the issue is caused by exactly this
commit, the vicinity should be right)

Dec 18 14:29:07 x4 kernel: [ cut here ]
Dec 18 14:29:07 x4 kernel: WARNING: at include/linux/kref.h:42 
radeon_fence_ref+0x2c/0x40()
Dec 18 14:29:07 x4 kernel: Hardware name: System Product Name
Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Not tainted 
3.7.0-rc7-00520-g85b144f #168
Dec 18 14:29:07 x4 kernel: Call Trace:
Dec 18 14:29:07 x4 kernel: [] ? warn_slowpath_common+0x74/0xb0
Dec 18 14:29:07 x4 kernel: [] ? radeon_fence_ref+0x2c/0x40
Dec 18 14:29:07 x4 kernel: [] ? 
ttm_bo_cleanup_refs_and_unlock+0x17c/0x2c0
Dec 18 14:29:07 x4 kernel: [] ? 
ttm_mem_evict_first+0x1dc/0x2a0
Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_man_get_node+0x62/0xb0
Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_mem_space+0x28e/0x340
Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_move_buffer+0xfc/0x170
Dec 18 14:29:07 x4 kernel: [] ? kmem_cache_alloc+0xb2/0xc0
Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_validate+0x95/0x110
Dec 18 14:29:07 x4 kernel: [] ? ttm_bo_init+0x2ec/0x3b0
Dec 18 14:29:07 x4 kernel: [] ? radeon_bo_create+0x18a/0x200
Dec 18 14:29:07 x4 kernel: [] ? radeon_bo_clear_va+0x40/0x40
Dec 18 14:29:07 x4 kernel: [] ? 
radeon_gem_object_create+0x92/0x160
Dec 18 14:29:07 x4 kernel: [] ? 
radeon_gem_create_ioctl+0x6c/0x150
Dec 18 14:29:07 x4 kernel: [] ? drm_ioctl+0x420/0x4f0
Dec 18 14:29:07 x4 kernel: [] ? 
radeon_gem_pwrite_ioctl+0x20/0x20
Dec 18 14:29:07 x4 kernel: [] ? do_vfs_ioctl+0x2e4/0x4e0
Dec 18 14:29:07 x4 kernel: [] ? vfs_read+0x118/0x160
Dec 18 14:29:07 x4 kernel: [] ? sys_ioctl+0x4c/0xa0
Dec 18 14:29:07 x4 kernel: [] ? sys_read+0x51/0xa0
Dec 18 14:29:07 x4 kernel: [] ? system_call_fastpath+0x16/0x1b
Dec 18 14:29:07 x4 kernel: ---[ end trace c5e6f68fefd3a70b ]---
Dec 18 14:29:07 x4 kernel: BUG: unable to handle kernel paging request at 
00010077
Dec 18 14:29:07 x4 kernel: IP: [] _raw_spin_lock+0x5/0x30
Dec 18 14:29:07 x4 kernel: PGD 2156c4067 PUD 0
Dec 18 14:29:07 x4 kernel: Oops: 0002 [#1] SMP
Dec 18 14:29:07 x4 kernel: CPU 1
Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Tainted: GW
3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
Dec 18 14:29:07 x4 kernel: RIP: 0010:[]  [] 
_raw_spin_lock+0x5/0x30
Dec 18 14:29:07 x4 kernel: RSP: 0018:880211645d58  EFLAGS: 00010286
Dec 18 14:29:07 x4 kernel: RAX: 

Re: GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-18 Thread Markus Trippelsdorf
On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
> On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote:
> > On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> > > On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> > > > On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> > > > mar...@trippelsdorf.de wrote:
> > > > > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > > > >> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > > > >> mar...@trippelsdorf.de wrote:
> > > > >> > As soon as I open the following website:
> > > > >> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > > > >> >
> > > > >> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > > > >>
> > > > >> Is this a regression?  Most likely a 3D driver bug unless you are
> > > > >> only
> > > > >> seeing it with specific kernels.  What browser are you using and do
> > > > >> you have hw accelerated webgl, etc. enabled?  If so, what version of
> > > > >> mesa are you using?
> > > > >
> > > > > This is a regression, because it is caused by yesterdays merge of
> > > > > drm-next by Linus. IOW I only see this bug when running a
> > > > > v3.7-9432-g9360b53 kernel.
> > > >
> > > > Can you bisect?  I'm guessing it may be related to the new DMA rings.
> > > > Possibly:
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> > >
> > > Yes, the commit above causes the issue.
> > >
> > >  2d6cc72  GPU lockups
> >
> > With 2d6cc72 reverted I get:
> >
> > Dec 17 23:09:35 x4 kernel: [ cut here ]
>
> Probably a separate issue, can you bisect this one as well?

Yes. Git-bisect points to:

85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
Author: Maarten Lankhorst maarten.lankho...@canonical.com
Date:   Thu Nov 29 11:36:54 2012 +

drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
held, v3

(Please note that this bug is a little bit harder to reproduce. But
when you scroll up and down for ~10 seconds on the webpage mentioned
above it will trigger the oops.
So while I'm not 100% sure that the issue is caused by exactly this
commit, the vicinity should be right)

Dec 18 14:29:07 x4 kernel: [ cut here ]
Dec 18 14:29:07 x4 kernel: WARNING: at include/linux/kref.h:42 
radeon_fence_ref+0x2c/0x40()
Dec 18 14:29:07 x4 kernel: Hardware name: System Product Name
Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Not tainted 
3.7.0-rc7-00520-g85b144f #168
Dec 18 14:29:07 x4 kernel: Call Trace:
Dec 18 14:29:07 x4 kernel: [81058c84] ? warn_slowpath_common+0x74/0xb0
Dec 18 14:29:07 x4 kernel: [812926fc] ? radeon_fence_ref+0x2c/0x40
Dec 18 14:29:07 x4 kernel: [8125e91c] ? 
ttm_bo_cleanup_refs_and_unlock+0x17c/0x2c0
Dec 18 14:29:07 x4 kernel: [8125f13c] ? 
ttm_mem_evict_first+0x1dc/0x2a0
Dec 18 14:29:07 x4 kernel: [81264412] ? ttm_bo_man_get_node+0x62/0xb0
Dec 18 14:29:07 x4 kernel: [8125f48e] ? ttm_bo_mem_space+0x28e/0x340
Dec 18 14:29:07 x4 kernel: [8125facc] ? ttm_bo_move_buffer+0xfc/0x170
Dec 18 14:29:07 x4 kernel: [810de172] ? kmem_cache_alloc+0xb2/0xc0
Dec 18 14:29:07 x4 kernel: [8125fbd5] ? ttm_bo_validate+0x95/0x110
Dec 18 14:29:07 x4 kernel: [8125ff3c] ? ttm_bo_init+0x2ec/0x3b0
Dec 18 14:29:07 x4 kernel: [8129415a] ? radeon_bo_create+0x18a/0x200
Dec 18 14:29:07 x4 kernel: [81293e40] ? radeon_bo_clear_va+0x40/0x40
Dec 18 14:29:07 x4 kernel: [812a5302] ? 
radeon_gem_object_create+0x92/0x160
Dec 18 14:29:07 x4 kernel: [812a571c] ? 
radeon_gem_create_ioctl+0x6c/0x150
Dec 18 14:29:07 x4 kernel: [81246b60] ? drm_ioctl+0x420/0x4f0
Dec 18 14:29:07 x4 kernel: [812a56b0] ? 
radeon_gem_pwrite_ioctl+0x20/0x20
Dec 18 14:29:07 x4 kernel: [810f53a4] ? do_vfs_ioctl+0x2e4/0x4e0
Dec 18 14:29:07 x4 kernel: [810e5588] ? vfs_read+0x118/0x160
Dec 18 14:29:07 x4 kernel: [810f55ec] ? sys_ioctl+0x4c/0xa0
Dec 18 14:29:07 x4 kernel: [810e5851] ? sys_read+0x51/0xa0
Dec 18 14:29:07 x4 kernel: [814b05d2] ? system_call_fastpath+0x16/0x1b
Dec 18 14:29:07 x4 kernel: ---[ end trace c5e6f68fefd3a70b ]---
Dec 18 14:29:07 x4 kernel: BUG: unable to handle kernel paging request at 
00010077
Dec 18 14:29:07 x4 kernel: IP: [814afa15] _raw_spin_lock+0x5/0x30
Dec 18 14:29:07 x4 kernel: PGD 2156c4067 PUD 0
Dec 18 14:29:07 x4 kernel: Oops: 0002 [#1] SMP
Dec 18 14:29:07 x4 kernel: CPU 1
Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Tainted: GW
3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
Dec 18 14:29:07 x4 kernel: RIP: 0010:[814afa15]  [814afa15] 
_raw_spin_lock+0x5/0x30
Dec 18 14:29:07 x4 kernel: RSP: 0018:880211645d58  EFLAGS: 00010286
Dec 18 14:29:07 x4 kernel: RAX: 0100 RBX: 8801c0e29448 RCX: 

Dec 18 14:29:07 x4 kernel: RDX: 0001 RSI: 0001 RDI: 
00010077
Dec 18 14:29

Re: GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-18 Thread Markus Trippelsdorf
On 2012.12.18 at 14:38 +0100, Markus Trippelsdorf wrote:
> On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
> > On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote:
> > > On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> > > > On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> > > > > On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> > > > > mar...@trippelsdorf.de wrote:
> > > > > > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > > > > >> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > > > > >> mar...@trippelsdorf.de wrote:
> > > > > >> > As soon as I open the following website:
> > > > > >> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > > > > >> >
> > > > > >> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > > > > >>
> > > > > >> Is this a regression?  Most likely a 3D driver bug unless you are
> > > > > >> only
> > > > > >> seeing it with specific kernels.  What browser are you using and do
> > > > > >> you have hw accelerated webgl, etc. enabled?  If so, what version
> > > > > >> of
> > > > > >> mesa are you using?
> > > > > >
> > > > > > This is a regression, because it is caused by yesterdays merge of
> > > > > > drm-next by Linus. IOW I only see this bug when running a
> > > > > > v3.7-9432-g9360b53 kernel.
> > > > >
> > > > > Can you bisect?  I'm guessing it may be related to the new DMA rings.
> > > > >  Possibly:
> > > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> > > >
> > > > Yes, the commit above causes the issue.
> > > >
> > > >  2d6cc72  GPU lockups
> > >
> > > With 2d6cc72 reverted I get:
> > >
> > > Dec 17 23:09:35 x4 kernel: [ cut here ]
> >
> > Probably a separate issue, can you bisect this one as well?
>
> Yes. Git-bisect points to:
>
> 85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
> commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
> Author: Maarten Lankhorst maarten.lankho...@canonical.com
> Date:   Thu Nov 29 11:36:54 2012 +
>
> drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
> held, v3
>
> (Please note that this bug is a little bit harder to reproduce. But
> when you scroll up and down for ~10 seconds on the webpage mentioned
> above it will trigger the oops.
> So while I'm not 100% sure that the issue is caused by exactly this
> commit, the vicinity should be right)
>
> Dec 18 14:29:07 x4 kernel: [ cut here ]
> Dec 18 14:29:07 x4 kernel: WARNING: at include/linux/kref.h:42
> radeon_fence_ref+0x2c/0x40()
> Dec 18 14:29:07 x4 kernel: Hardware name: System Product Name
> Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Not tainted
> 3.7.0-rc7-00520-g85b144f #168
> Dec 18 14:29:07 x4 kernel: Call Trace:
> Dec 18 14:29:07 x4 kernel: [81058c84] ?
> warn_slowpath_common+0x74/0xb0
> Dec 18 14:29:07 x4 kernel: [812926fc] ? radeon_fence_ref+0x2c/0x40
> Dec 18 14:29:07 x4 kernel: [8125e91c] ?
> ttm_bo_cleanup_refs_and_unlock+0x17c/0x2c0
> Dec 18 14:29:07 x4 kernel: [8125f13c] ?
> ttm_mem_evict_first+0x1dc/0x2a0
> Dec 18 14:29:07 x4 kernel: [81264412] ?
> ttm_bo_man_get_node+0x62/0xb0
> Dec 18 14:29:07 x4 kernel: [8125f48e] ? ttm_bo_mem_space+0x28e/0x340
> Dec 18 14:29:07 x4 kernel: [8125facc] ?
> ttm_bo_move_buffer+0xfc/0x170
> Dec 18 14:29:07 x4 kernel: [810de172] ? kmem_cache_alloc+0xb2/0xc0
> Dec 18 14:29:07 x4 kernel: [8125fbd5] ? ttm_bo_validate+0x95/0x110
> Dec 18 14:29:07 x4 kernel: [8125ff3c] ? ttm_bo_init+0x2ec/0x3b0
> Dec 18 14:29:07 x4 kernel: [8129415a] ? radeon_bo_create+0x18a/0x200
> Dec 18 14:29:07 x4 kernel: [81293e40] ? radeon_bo_clear_va+0x40/0x40
> Dec 18 14:29:07 x4 kernel: [812a5302] ?
> radeon_gem_object_create+0x92/0x160
> Dec 18 14:29:07 x4 kernel: [812a571c] ?
> radeon_gem_create_ioctl+0x6c/0x150
> Dec 18 14:29:07 x4 kernel: [81246b60] ? drm_ioctl+0x420/0x4f0
> Dec 18 14:29:07 x4 kernel: [812a56b0] ?
> radeon_gem_pwrite_ioctl+0x20/0x20
> Dec 18 14:29:07 x4 kernel: [810f53a4] ? do_vfs_ioctl+0x2e4/0x4e0
> Dec 18 14:29:07 x4 kernel: [810e5588] ? vfs_read+0x118/0x160
> Dec 18 14:29:07 x4 kernel: [810f55ec] ? sys_ioctl+0x4c/0xa0
> Dec 18 14:29:07 x4 kernel: [810e5851] ? sys_read+0x51/0xa0
> Dec 18 14:29:07 x4 kernel: [814b05d2] ?
> system_call_fastpath+0x16/0x1b
> Dec 18 14:29:07 x4 kernel: ---[ end trace c5e6f68fefd3a70b ]---
> Dec 18 14:29:07 x4 kernel: BUG: unable to handle kernel paging request at
> 00010077
> Dec 18 14:29:07 x4 kernel: IP: [814afa15] _raw_spin_lock+0x5/0x30
> Dec 18 14:29:07 x4 kernel: PGD 2156c4067 PUD 0
> Dec 18 14:29:07 x4 kernel: Oops: 0002 [#1] SMP
> Dec 18 14:29:07 x4 kernel: CPU 1
> Dec 18 14:29:07 x4 kernel: Pid: 161, comm: X Tainted: GW
> 3.7.0-rc7-00520-g85b144f #168 System manufacturer System Product Name/M4A78T-E
> Dec 18 14:29:07 x4 kernel: RIP: 0010:[814afa15]
> [814afa15] _raw_spin_lock+0x5/0x30
> Dec 18 14:29:07 x4 kernel: RSP: 0018:880211645d58  EFLAGS: 00010286
> Dec 18 14:29:07 x4 kernel

Re: GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-18 Thread Markus Trippelsdorf
On 2012.12.18 at 16:24 +0100, Maarten Lankhorst wrote:
> On 18-12-12 14:38, Markus Trippelsdorf wrote:
> > On 2012.12.18 at 12:20 +0100, Michel Dänzer wrote:
> > > On Mon, 2012-12-17 at 23:55 +0100, Markus Trippelsdorf wrote:
> > > > On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> > > > > On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> > > > > > On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> > > > > > mar...@trippelsdorf.de wrote:
> > > > > > > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > > > > > >> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > > > > > >> mar...@trippelsdorf.de wrote:
> > > > > > >> > As soon as I open the following website:
> > > > > > >> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > > > > > >> >
> > > > > > >> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > > > > > >> Is this a regression?  Most likely a 3D driver bug unless you are only
> > > > > > >> seeing it with specific kernels.  What browser are you using and do
> > > > > > >> you have hw accelerated webgl, etc. enabled?  If so, what version of
> > > > > > >> mesa are you using?
> > > > > > > This is a regression, because it is caused by yesterdays merge of
> > > > > > > drm-next by Linus. IOW I only see this bug when running a
> > > > > > > v3.7-9432-g9360b53 kernel.
> > > > > > Can you bisect?  I'm guessing it may be related to the new DMA rings.
> > > > > > Possibly:
> > > > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> > > > > Yes, the commit above causes the issue.
> > > > >
> > > > >  2d6cc72  GPU lockups
> > > > With 2d6cc72 reverted I get:
> > > >
> > > > Dec 17 23:09:35 x4 kernel: [ cut here ]
> > > Probably a separate issue, can you bisect this one as well?
> > Yes. Git-bisect points to:
> >
> > 85b144f860176ec18db927d6d9ecdfb24d9c6483 is the first bad commit
> > commit 85b144f860176ec18db927d6d9ecdfb24d9c6483
> > Author: Maarten Lankhorst maarten.lankho...@canonical.com
> > Date:   Thu Nov 29 11:36:54 2012 +
> >
> > drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock
> > held, v3
> >
> > (Please note that this bug is a little bit harder to reproduce. But
> > when you scroll up and down for ~10 seconds on the webpage mentioned
> > above it will trigger the oops.
> > So while I'm not 100% sure that the issue is caused by exactly this
> > commit, the vicinity should be right)
>
> Those dmesg warnings sound suspicious, looks like something is going
> very wrong there.
>
> Can you revert the one before it, "drm/radeon: allow move_notify to be
> called without reservation"? Reservation should be held at this point,
> that commit got in accidentally.
>
> I doubt not holding a reservation is causing it, though. I don't really
> see how that commit could cause it, however, so can you please double
> check that it never happened before that point, and only started at
> that commit?
>
> Also slap in a BUG_ON(!ttm_bo_is_reserved(bo)) in
> ttm_bo_cleanup_refs_and_unlock for good measure, and a
> BUG_ON(spin_trylock(bdev->fence_lock)); to ttm_bo_wait.
>
> I really don't see how that specific commit can be wrong though, so
> awaiting your results first before I try to dig more into it.

I just reran git-bisect just on your commits (from 1a1494def to 97a875cbd)
and I landed on the same commit as above:

commit 85b144f86 (drm/ttm: call ttm_bo_cleanup_refs with reservation and lru 
lock held, v3)

So now I'm pretty sure it's specifically this commit that started the
issue.

With your suggested debugging BUG_ONs added I still get:

Dec 18 17:01:15 x4 kernel: [ cut here ]
Dec 18 17:01:15 x4 kernel: WARNING: at include/linux/kref.h:42 
radeon_fence_ref+0x2c/0x40()
Dec 18 17:01:15 x4 kernel: Hardware name: System Product Name
Dec 18 17:01:15 x4 kernel: Pid: 157, comm: X Not tainted 
3.7.0-rc7-00520-g85b144f-dirty #174
Dec 18 17:01:15 x4 kernel: Call Trace:
Dec 18 17:01:15 x4 kernel: [81058c84] ? warn_slowpath_common+0x74/0xb0
Dec 18 17:01:15 x4 kernel: [8129273c] ? radeon_fence_ref+0x2c/0x40
Dec 18 17:01:15 x4 kernel: [8125e95c] ? 
ttm_bo_cleanup_refs_and_unlock+0x18c/0x2d0
Dec 18 17:01:15 x4 kernel: [8125f17c] ? 
ttm_mem_evict_first+0x1dc/0x2a0
Dec 18 17:01:15 x4 kernel: [81264452] ? ttm_bo_man_get_node+0x62/0xb0
Dec 18 17:01:15 x4 kernel: [8125f4ce] ? ttm_bo_mem_space+0x28e/0x340
Dec 18 17:01:15 x4 kernel: [8125fb0c] ? ttm_bo_move_buffer+0xfc/0x170
Dec 18 17:01:15 x4 kernel: [810de172] ? kmem_cache_alloc+0xb2/0xc0
Dec 18 17:01:15 x4 kernel: [8125fc15] ? ttm_bo_validate+0x95/0x110
Dec 18 17:01:15 x4 kernel: [8125ff7c] ? ttm_bo_init+0x2ec/0x3b0
Dec 18 17:01:15 x4 kernel: [8129419a] ? radeon_bo_create+0x18a/0x200
Dec 18 17:01:15 x4 kernel: [81293e80] ? radeon_bo_clear_va+0x40/0x40
Dec 18 17:01:15 x4 kernel: [812a5342] ? 
radeon_gem_object_create+0x92/0x160
Dec 18 17:01:15 x4 kernel: [812a575c] ? 
radeon_gem_create_ioctl+0x6c/0x150
Dec 18 17:01:15 x4 kernel: [812a529f] ? 
radeon_gem_object_free+0x2f/0x40
Dec 18 17:01:15 x4 kernel: [81246b60] ? drm_ioctl+0x420/0x4f0
Dec 18 17:01:15 x4 kernel: [812a56f0
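
The `WARNING: at include/linux/kref.h:42` in the traces above is reached through `radeon_fence_ref()`. In kernels of this era that check appears to be the `WARN_ON` in `kref_get()`, which fires when a reference is taken on an object whose count has already dropped to zero; that would point to a fence being used after its last reference was released. A minimal userspace sketch of such a check, assuming this reading is right; all `demo_*` names are hypothetical and not kernel API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct kref. */
struct demo_kref {
    atomic_int refcount;
};

static void demo_kref_init(struct demo_kref *k)
{
    atomic_store(&k->refcount, 1);
}

/*
 * Takes a reference. Returns false when the count was already zero,
 * which is the situation the kernel warning complains about. The
 * increment still happens, as it would after a warning.
 */
static bool demo_kref_get(struct demo_kref *k)
{
    bool was_live = atomic_load(&k->refcount) != 0;
    atomic_fetch_add(&k->refcount, 1);
    return was_live;
}

/* Drops a reference. Returns true when this was the last one. */
static bool demo_kref_put(struct demo_kref *k)
{
    return atomic_fetch_sub(&k->refcount, 1) == 1;
}

/* One sequence that trips the check: the last put, then a late get. */
static void demo_late_get(void)
{
    struct demo_kref k;

    demo_kref_init(&k);
    assert(demo_kref_put(&k));   /* last reference dropped */
    assert(!demo_kref_get(&k));  /* late get: the kernel would warn here */
}
```

The paging-request oops that follows the warning in the logs would fit this picture, since an object whose count reached zero may already have been freed.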

GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-17 Thread Markus Trippelsdorf
On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> > On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> >  wrote:
> > > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > >> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > >>  wrote:
> > >> > As soon as I open the following website:
> > >> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > >> >
> > >> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > >>
> > >> Is this a regression?  Most likely a 3D driver bug unless you are only
> > >> seeing it with specific kernels.  What browser are you using and do
> > >> you have hw accelerated webgl, etc. enabled?  If so, what version of
> > >> mesa are you using?
> > >
> > > This is a regression, because it is caused by yesterdays merge of
> > > drm-next by Linus. IOW I only see this bug when running a
> > > v3.7-9432-g9360b53 kernel.
> > 
> > Can you bisect?  I'm guessing it may be related to the new DMA rings.  
> > Possibly:
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
> 
> Yes, the commit above causes the issue. 
> 
>  2d6cc72  GPU lockups

With 2d6cc72 reverted I get:

Dec 17 23:09:35 x4 kernel: [ cut here ]
Dec 17 23:09:35 x4 kernel: WARNING: at include/linux/kref.h:42 
radeon_fence_ref+0x2c/0x40()
Dec 17 23:09:35 x4 kernel: Hardware name: System Product Name
Dec 17 23:09:35 x4 kernel: Pid: 182, comm: X Not tainted 3.7.0-09433-ge033059 
#155
Dec 17 23:09:35 x4 kernel: Call Trace:
Dec 17 23:09:35 x4 kernel: [] ? warn_slowpath_common+0x74/0xb0
Dec 17 23:09:35 x4 kernel: [] ? radeon_fence_ref+0x2c/0x40
Dec 17 23:09:35 x4 kernel: [] ? 
ttm_bo_cleanup_refs_and_unlock+0x17c/0x2c0
Dec 17 23:09:35 x4 kernel: [] ? ttm_mem_evict_first+0x94/0x1d0
Dec 17 23:09:35 x4 kernel: [] ? ttm_bo_man_get_node+0x62/0xb0
Dec 17 23:09:35 x4 kernel: [] ? ttm_bo_mem_space+0x271/0x320
Dec 17 23:09:35 x4 kernel: [] ? ttm_bo_move_buffer+0xdd/0x150
Dec 17 23:09:35 x4 kernel: [] ? ttm_bo_validate+0x89/0xf0
Dec 17 23:09:35 x4 kernel: [] ? ttm_bo_init+0x2e9/0x3a0
Dec 17 23:09:35 x4 kernel: [] ? radeon_bo_create+0x18a/0x200
Dec 17 23:09:35 x4 kernel: [] ? radeon_bo_clear_va+0x40/0x40
Dec 17 23:09:35 x4 kernel: [] ? 
radeon_gem_object_create+0x92/0x160
Dec 17 23:09:35 x4 kernel: [] ? 
radeon_gem_create_ioctl+0x6c/0x150
Dec 17 23:09:35 x4 kernel: [] ? drm_ioctl+0x420/0x4f0
Dec 17 23:09:35 x4 kernel: [] ? 
radeon_gem_pwrite_ioctl+0x20/0x20
Dec 17 23:09:35 x4 kernel: [] ? __do_page_fault+0x1a9/0x490
Dec 17 23:09:35 x4 kernel: [] ? mmap_region+0x169/0x560
Dec 17 23:09:35 x4 kernel: [] ? do_vfs_ioctl+0x2e4/0x4e0
Dec 17 23:09:35 x4 kernel: [] ? vm_mmap_pgoff+0x69/0x80
Dec 17 23:09:35 x4 kernel: [] ? sys_ioctl+0x4c/0xa0
Dec 17 23:09:35 x4 kernel: [] ? system_call_fastpath+0x16/0x1b
Dec 17 23:09:35 x4 kernel: ---[ end trace eb6036661a77c177 ]---
Dec 17 23:09:35 x4 kernel: BUG: unable to handle kernel paging request at 
8803d9ee4bd8
Dec 17 23:09:35 x4 kernel: IP: [] 
radeon_fence_wait_seq+0x85/0x440
Dec 17 23:09:35 x4 kernel: PGD 180c063 PUD 0
Dec 17 23:09:35 x4 kernel: Oops:  [#1] SMP
Dec 17 23:09:35 x4 kernel: CPU 3
Dec 17 23:09:35 x4 kernel: Pid: 182, comm: X Tainted: GW
3.7.0-09433-ge033059 #155 System manufacturer System Product Name/M4A78T-E
Dec 17 23:09:35 x4 kernel: RIP: 0010:[]  [] 
radeon_fence_wait_seq+0x85/0x440
Dec 17 23:09:35 x4 kernel: RSP: 0018:880210cc7a38  EFLAGS: 00010282
Dec 17 23:09:35 x4 kernel: RAX: 880210cc7a90 RBX: 88020674c970 RCX: 
0001
Dec 17 23:09:35 x4 kernel: RDX: 0605b580 RSI: 0058 RDI: 
8801c7f7dc80
Dec 17 23:09:35 x4 kernel: RBP: 8803d9ee4bd8 R08: 0001 R09: 
02a9
Dec 17 23:09:35 x4 kernel: R10: 02a8 R11: 0006 R12: 
880210ee6981
Dec 17 23:09:35 x4 kernel: R13: 0605b580 R14: 8801c7f7dc80 R15: 
8802161864f8
Dec 17 23:09:35 x4 kernel: FS:  7f5ee88f4880() 
GS:88021fd8() knlGS:
Dec 17 23:09:35 x4 kernel: CS:  0010 DS:  ES:  CR0: 80050033
Dec 17 23:09:35 x4 kernel: CR2: 8803d9ee4bd8 CR3: 000210c63000 CR4: 
07e0
Dec 17 23:09:35 x4 kernel: DR0:  DR1:  DR2: 

Dec 17 23:09:35 x4 kernel: DR3:  DR6: 0ff0 DR7: 
0400
Dec 17 23:09:35 x4 kernel: Process X (pid: 182, threadinfo 880210cc6000, 
task 880215f45730)
Dec 17 23:09:35 x4 kernel: Stack:
Dec 17 23:09:35 x4 kernel: 8129de0c 0605b580 8803d9ee4080 
0010
Dec 17 23:09:35 x4 kernel: 880210cc

GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-17 Thread Markus Trippelsdorf
On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
>  wrote:
> > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> >> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> >>  wrote:
> >> > As soon as I open the following website:
> >> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> >> >
> >> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> >>
> >> Is this a regression?  Most likely a 3D driver bug unless you are only
> >> seeing it with specific kernels.  What browser are you using and do
> >> you have hw accelerated webgl, etc. enabled?  If so, what version of
> >> mesa are you using?
> >
> > This is a regression, because it is caused by yesterdays merge of
> > drm-next by Linus. IOW I only see this bug when running a
> > v3.7-9432-g9360b53 kernel.
> 
> Can you bisect?  I'm guessing it may be related to the new DMA rings.  
> Possibly:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2

Yes, the commit above causes the issue. 

 2d6cc72  GPU lockups
 009ee7a  runs fine

-- 
Markus


GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-17 Thread Markus Trippelsdorf
On 2012.12.17 at 22:48 +0100, Markus Trippelsdorf wrote:
> On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> >  wrote:
> > > As soon as I open the following website:
> > > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > >
> > > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > 
> > Is this a regression?  Most likely a 3D driver bug unless you are only
> > seeing it with specific kernels.  What browser are you using and do
> > you have hw accelerated webgl, etc. enabled?  If so, what version of
> > mesa are you using?
> 
> This is a regression, because it is caused by yesterdays merge of
> drm-next by Linus. IOW I only see this bug when running a
> v3.7-9432-g9360b53 kernel. 

Forgot to mention that I don't use webgl. Browser is Firefox. And I use
my screen in portrait mode:

 DVI-0 connected 1050x1680+0+0 left (normal left inverted right x axis y axis) 
434mm x 270mm

-- 
Markus


GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-17 Thread Markus Trippelsdorf
On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
>  wrote:
> > As soon as I open the following website:
> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> >
> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> 
> Is this a regression?  Most likely a 3D driver bug unless you are only
> seeing it with specific kernels.  What browser are you using and do
> you have hw accelerated webgl, etc. enabled?  If so, what version of
> mesa are you using?

This is a regression, because it is caused by yesterdays merge of
drm-next by Linus. IOW I only see this bug when running a
v3.7-9432-g9360b53 kernel. 

-- 
Markus


GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-17 Thread Markus Trippelsdorf
As soon as I open the following website:
http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html

my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:


Dec 17 17:41:39 x4 kernel: [drm] Initialized drm 1.1.0 20060810
Dec 17 17:41:39 x4 kernel: [drm] radeon defaulting to kernel modesetting.
Dec 17 17:41:39 x4 kernel: [drm] radeon kernel modesetting enabled.
Dec 17 17:41:39 x4 kernel: [drm] initializing kernel modesetting (RS780 
0x1002:0x9614 0x1043:0x834D).
Dec 17 17:41:39 x4 kernel: [drm] register mmio base: 0xFBEE
Dec 17 17:41:39 x4 kernel: [drm] register mmio size: 65536
Dec 17 17:41:39 x4 kernel: ATOM BIOS: 113
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: VRAM: 128M 0xC000 - 
0xC7FF (128M used)
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: GTT: 512M 0xA000 - 
0xBFFF
Dec 17 17:41:39 x4 kernel: [drm] Detected VRAM RAM=128M, BAR=128M
Dec 17 17:41:39 x4 kernel: [drm] RAM width 32bits DDR
Dec 17 17:41:39 x4 kernel: [TTM] Zone  kernel: Available graphics memory: 
4083532 kiB
Dec 17 17:41:39 x4 kernel: [TTM] Zone   dma32: Available graphics memory: 
2097152 kiB
Dec 17 17:41:39 x4 kernel: [TTM] Initializing pool allocator
Dec 17 17:41:39 x4 kernel: [TTM] Initializing DMA pool allocator
Dec 17 17:41:39 x4 kernel: [drm] radeon: 128M of VRAM memory ready
Dec 17 17:41:39 x4 kernel: [drm] radeon: 512M of GTT memory ready.
Dec 17 17:41:39 x4 kernel: [drm] Supports vblank timestamp caching Rev 1 
(10.10.2010).
Dec 17 17:41:39 x4 kernel: [drm] Driver supports precise vblank timestamp query.
Dec 17 17:41:39 x4 kernel: [drm] radeon: irq initialized.
Dec 17 17:41:39 x4 kernel: [drm] GART: num cpu pages 131072, num gpu pages 
131072
Dec 17 17:41:39 x4 kernel: [drm] Loading RS780 Microcode
Dec 17 17:41:39 x4 kernel: [drm] PCIE GART of 512M enabled (table at 
0xC004).
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: WB enabled
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: fence driver on ring 0 use gpu 
addr 0xac00 and cpu addr 0x8802163acc00
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: fence driver on ring 3 use gpu 
addr 0xac0c and cpu addr 0x8802163acc0c
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: setting latency timer to 64
Dec 17 17:41:39 x4 kernel: [drm] ring test on 0 succeeded in 0 usecs
Dec 17 17:41:39 x4 kernel: [drm] ring test on 3 succeeded in 1 usecs
Dec 17 17:41:39 x4 kernel: [drm] ib test on ring 0 succeeded in 0 usecs
Dec 17 17:41:39 x4 kernel: [drm] ib test on ring 3 succeeded in 0 usecs
Dec 17 17:41:39 x4 kernel: [drm] Radeon Display Connectors
Dec 17 17:41:39 x4 kernel: [drm] Connector 0:
Dec 17 17:41:39 x4 kernel: [drm]   VGA-1
Dec 17 17:41:39 x4 kernel: [drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 
0x7e48 0x7e4c 0x7e4c
Dec 17 17:41:39 x4 kernel: [drm]   Encoders:
Dec 17 17:41:39 x4 kernel: [drm] CRT1: INTERNAL_KLDSCP_DAC1
Dec 17 17:41:39 x4 kernel: [drm] Connector 1:
Dec 17 17:41:39 x4 kernel: [drm]   DVI-D-1
Dec 17 17:41:39 x4 kernel: [drm]   HPD3
Dec 17 17:41:39 x4 kernel: [drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 
0x7e58 0x7e5c 0x7e5c
Dec 17 17:41:39 x4 kernel: [drm]   Encoders:
Dec 17 17:41:39 x4 kernel: [drm] DFP3: INTERNAL_KLDSCP_LVTMA
Dec 17 17:41:39 x4 kernel: [drm] radeon: power management initialized
Dec 17 17:41:39 x4 kernel: [drm] fb mappable at 0xF0142000
Dec 17 17:41:39 x4 kernel: [drm] vram apper at 0xF000
Dec 17 17:41:39 x4 kernel: [drm] size 7299072
Dec 17 17:41:39 x4 kernel: [drm] fb depth is 24
Dec 17 17:41:39 x4 kernel: [drm]pitch is 6912
Dec 17 17:41:39 x4 kernel: fbcon: radeondrmfb (fb0) is primary device
Dec 17 17:41:39 x4 kernel: Console: switching to colour frame buffer device 
131x105
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: fb0: radeondrmfb frame buffer 
device
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: registered panic notifier
Dec 17 17:41:39 x4 kernel: [drm] Initialized radeon 2.27.0 20080528 for 
:01:05.0 on minor 0
...
Dec 17 19:12:33 x4 kernel: radeon :01:05.0: GPU lockup CP stall for more 
than 10000msec
Dec 17 19:12:33 x4 kernel: radeon :01:05.0: GPU lockup (waiting for 
0x00022777 last fence id 0x00022774)

after reboot:

Dec 17 19:14:32 x4 kernel: Adding 4194300k swap on /var/cache/swapfile.img.  
Priority:-1 extents:9 across:629080060k 
Dec 17 19:16:44 x4 kernel: radeon :01:05.0: GPU lockup CP stall for more 
than 10000msec
Dec 17 19:16:44 x4 kernel: radeon :01:05.0: GPU lockup (waiting for 
0x0954 last fence id 0x0952)
Dec 17 19:16:44 x4 kernel: radeon :01:05.0: Saved 89 dwords of commands on 
ring 0.
Dec 17 19:16:44 x4 kernel: radeon :01:05.0: GPU softreset 
Dec 17 19:16:44 x4 kernel: radeon :01:05.0:   
R_008010_GRBM_STATUS=0xA000B030
Dec 17 19:16:44 x4 kernel: radeon :01:05.0:   
R_008014_GRBM_STATUS2=0x0003
Dec 17 19:16:44 x4 kernel: radeon :01:05.0:   
R_000E50_SRBM_STATUS=0x20005040
Dec 17 19:16:44 x4 

GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-17 Thread Markus Trippelsdorf
As soon as I open the following website:
http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html

my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:


Dec 17 17:41:39 x4 kernel: [drm] Initialized drm 1.1.0 20060810
Dec 17 17:41:39 x4 kernel: [drm] radeon defaulting to kernel modesetting.
Dec 17 17:41:39 x4 kernel: [drm] radeon kernel modesetting enabled.
Dec 17 17:41:39 x4 kernel: [drm] initializing kernel modesetting (RS780 
0x1002:0x9614 0x1043:0x834D).
Dec 17 17:41:39 x4 kernel: [drm] register mmio base: 0xFBEE
Dec 17 17:41:39 x4 kernel: [drm] register mmio size: 65536
Dec 17 17:41:39 x4 kernel: ATOM BIOS: 113
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: VRAM: 128M 0xC000 - 
0xC7FF (128M used)
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: GTT: 512M 0xA000 - 
0xBFFF
Dec 17 17:41:39 x4 kernel: [drm] Detected VRAM RAM=128M, BAR=128M
Dec 17 17:41:39 x4 kernel: [drm] RAM width 32bits DDR
Dec 17 17:41:39 x4 kernel: [TTM] Zone  kernel: Available graphics memory: 
4083532 kiB
Dec 17 17:41:39 x4 kernel: [TTM] Zone   dma32: Available graphics memory: 
2097152 kiB
Dec 17 17:41:39 x4 kernel: [TTM] Initializing pool allocator
Dec 17 17:41:39 x4 kernel: [TTM] Initializing DMA pool allocator
Dec 17 17:41:39 x4 kernel: [drm] radeon: 128M of VRAM memory ready
Dec 17 17:41:39 x4 kernel: [drm] radeon: 512M of GTT memory ready.
Dec 17 17:41:39 x4 kernel: [drm] Supports vblank timestamp caching Rev 1 
(10.10.2010).
Dec 17 17:41:39 x4 kernel: [drm] Driver supports precise vblank timestamp query.
Dec 17 17:41:39 x4 kernel: [drm] radeon: irq initialized.
Dec 17 17:41:39 x4 kernel: [drm] GART: num cpu pages 131072, num gpu pages 
131072
Dec 17 17:41:39 x4 kernel: [drm] Loading RS780 Microcode
Dec 17 17:41:39 x4 kernel: [drm] PCIE GART of 512M enabled (table at 
0xC004).
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: WB enabled
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: fence driver on ring 0 use gpu 
addr 0xac00 and cpu addr 0x8802163acc00
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: fence driver on ring 3 use gpu 
addr 0xac0c and cpu addr 0x8802163acc0c
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: setting latency timer to 64
Dec 17 17:41:39 x4 kernel: [drm] ring test on 0 succeeded in 0 usecs
Dec 17 17:41:39 x4 kernel: [drm] ring test on 3 succeeded in 1 usecs
Dec 17 17:41:39 x4 kernel: [drm] ib test on ring 0 succeeded in 0 usecs
Dec 17 17:41:39 x4 kernel: [drm] ib test on ring 3 succeeded in 0 usecs
Dec 17 17:41:39 x4 kernel: [drm] Radeon Display Connectors
Dec 17 17:41:39 x4 kernel: [drm] Connector 0:
Dec 17 17:41:39 x4 kernel: [drm]   VGA-1
Dec 17 17:41:39 x4 kernel: [drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 
0x7e48 0x7e4c 0x7e4c
Dec 17 17:41:39 x4 kernel: [drm]   Encoders:
Dec 17 17:41:39 x4 kernel: [drm] CRT1: INTERNAL_KLDSCP_DAC1
Dec 17 17:41:39 x4 kernel: [drm] Connector 1:
Dec 17 17:41:39 x4 kernel: [drm]   DVI-D-1
Dec 17 17:41:39 x4 kernel: [drm]   HPD3
Dec 17 17:41:39 x4 kernel: [drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 
0x7e58 0x7e5c 0x7e5c
Dec 17 17:41:39 x4 kernel: [drm]   Encoders:
Dec 17 17:41:39 x4 kernel: [drm] DFP3: INTERNAL_KLDSCP_LVTMA
Dec 17 17:41:39 x4 kernel: [drm] radeon: power management initialized
Dec 17 17:41:39 x4 kernel: [drm] fb mappable at 0xF0142000
Dec 17 17:41:39 x4 kernel: [drm] vram apper at 0xF000
Dec 17 17:41:39 x4 kernel: [drm] size 7299072
Dec 17 17:41:39 x4 kernel: [drm] fb depth is 24
Dec 17 17:41:39 x4 kernel: [drm]pitch is 6912
Dec 17 17:41:39 x4 kernel: fbcon: radeondrmfb (fb0) is primary device
Dec 17 17:41:39 x4 kernel: Console: switching to colour frame buffer device 
131x105
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: fb0: radeondrmfb frame buffer 
device
Dec 17 17:41:39 x4 kernel: radeon :01:05.0: registered panic notifier
Dec 17 17:41:39 x4 kernel: [drm] Initialized radeon 2.27.0 20080528 for 
:01:05.0 on minor 0
...
Dec 17 19:12:33 x4 kernel: radeon :01:05.0: GPU lockup CP stall for more 
than 10000msec
Dec 17 19:12:33 x4 kernel: radeon :01:05.0: GPU lockup (waiting for 
0x00022777 last fence id 0x00022774)

after reboot:

Dec 17 19:14:32 x4 kernel: Adding 4194300k swap on /var/cache/swapfile.img.  
Priority:-1 extents:9 across:629080060k 
Dec 17 19:16:44 x4 kernel: radeon :01:05.0: GPU lockup CP stall for more 
than 10000msec
Dec 17 19:16:44 x4 kernel: radeon :01:05.0: GPU lockup (waiting for 
0x0954 last fence id 0x0952)
Dec 17 19:16:44 x4 kernel: radeon :01:05.0: Saved 89 dwords of commands on 
ring 0.
Dec 17 19:16:44 x4 kernel: radeon :01:05.0: GPU softreset 
Dec 17 19:16:44 x4 kernel: radeon :01:05.0:   
R_008010_GRBM_STATUS=0xA000B030
Dec 17 19:16:44 x4 kernel: radeon :01:05.0:   
R_008014_GRBM_STATUS2=0x0003
Dec 17 19:16:44 x4 kernel: radeon :01:05.0:   
R_000E50_SRBM_STATUS=0x20005040
Dec 17 19:16:44 x4 

Re: GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-17 Thread Markus Trippelsdorf
On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> mar...@trippelsdorf.de wrote:
> > As soon as I open the following website:
> > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> >
> > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
>
> Is this a regression?  Most likely a 3D driver bug unless you are only
> seeing it with specific kernels.  What browser are you using and do
> you have hw accelerated webgl, etc. enabled?  If so, what version of
> mesa are you using?

This is a regression, because it is caused by yesterday's merge of
drm-next by Linus. IOW I only see this bug when running a
v3.7-9432-g9360b53 kernel. 

-- 
Markus
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-17 Thread Markus Trippelsdorf
On 2012.12.17 at 22:48 +0100, Markus Trippelsdorf wrote:
> On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > mar...@trippelsdorf.de wrote:
> > > As soon as I open the following website:
> > > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > >
> > > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> >
> > Is this a regression?  Most likely a 3D driver bug unless you are only
> > seeing it with specific kernels.  What browser are you using and do
> > you have hw accelerated webgl, etc. enabled?  If so, what version of
> > mesa are you using?
>
> This is a regression, because it is caused by yesterday's merge of
> drm-next by Linus. IOW I only see this bug when running a
> v3.7-9432-g9360b53 kernel.

Forgot to mention that I don't use webgl. Browser is Firefox. And I use
my screen in portrait mode:

 DVI-0 connected 1050x1680+0+0 left (normal left inverted right x axis y axis) 434mm x 270mm

-- 
Markus


Re: GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-17 Thread Markus Trippelsdorf
On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> mar...@trippelsdorf.de wrote:
> > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > > On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > > mar...@trippelsdorf.de wrote:
> > > > As soon as I open the following website:
> > > > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > > >
> > > > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > >
> > > Is this a regression?  Most likely a 3D driver bug unless you are only
> > > seeing it with specific kernels.  What browser are you using and do
> > > you have hw accelerated webgl, etc. enabled?  If so, what version of
> > > mesa are you using?
> >
> > This is a regression, because it is caused by yesterday's merge of
> > drm-next by Linus. IOW I only see this bug when running a
> > v3.7-9432-g9360b53 kernel.
>
> Can you bisect?  I'm guessing it may be related to the new DMA rings.
> Possibly:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2

Yes, the commit above causes the issue. 

 2d6cc72  GPU lockups
 009ee7a  runs fine

-- 
Markus


Re: GPU lockup CP stall for more than 10000msec on latest vanilla git

2012-12-17 Thread Markus Trippelsdorf
On 2012.12.17 at 23:25 +0100, Markus Trippelsdorf wrote:
> On 2012.12.17 at 17:00 -0500, Alex Deucher wrote:
> > On Mon, Dec 17, 2012 at 4:48 PM, Markus Trippelsdorf
> > mar...@trippelsdorf.de wrote:
> > > On 2012.12.17 at 16:32 -0500, Alex Deucher wrote:
> > > > On Mon, Dec 17, 2012 at 1:27 PM, Markus Trippelsdorf
> > > > mar...@trippelsdorf.de wrote:
> > > > > As soon as I open the following website:
> > > > > http://www.boston.com/bigpicture/2012/12/2012_year_in_pictures_part_i.html
> > > > >
> > > > > my Radeon RS780 stalls (GPU lockup) leaving the machine unusable:
> > > >
> > > > Is this a regression?  Most likely a 3D driver bug unless you are only
> > > > seeing it with specific kernels.  What browser are you using and do
> > > > you have hw accelerated webgl, etc. enabled?  If so, what version of
> > > > mesa are you using?
> > >
> > > This is a regression, because it is caused by yesterday's merge of
> > > drm-next by Linus. IOW I only see this bug when running a
> > > v3.7-9432-g9360b53 kernel.
> >
> > Can you bisect?  I'm guessing it may be related to the new DMA rings.
> > Possibly:
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=2d6cc7296d4ee128ab0fa3b715f0afde511f49c2
>
> Yes, the commit above causes the issue.
>
>  2d6cc72  GPU lockups

With 2d6cc72 reverted I get:

Dec 17 23:09:35 x4 kernel: ------------[ cut here ]------------
Dec 17 23:09:35 x4 kernel: WARNING: at include/linux/kref.h:42 
radeon_fence_ref+0x2c/0x40()
Dec 17 23:09:35 x4 kernel: Hardware name: System Product Name
Dec 17 23:09:35 x4 kernel: Pid: 182, comm: X Not tainted 3.7.0-09433-ge033059 
#155
Dec 17 23:09:35 x4 kernel: Call Trace:
Dec 17 23:09:35 x4 kernel: [81059c94] ? warn_slowpath_common+0x74/0xb0
Dec 17 23:09:35 x4 kernel: [8129de0c] ? radeon_fence_ref+0x2c/0x40
Dec 17 23:09:35 x4 kernel: [8126a02c] ? 
ttm_bo_cleanup_refs_and_unlock+0x17c/0x2c0
Dec 17 23:09:35 x4 kernel: [8126a6f4] ? ttm_mem_evict_first+0x94/0x1d0
Dec 17 23:09:35 x4 kernel: [8126f9c2] ? ttm_bo_man_get_node+0x62/0xb0
Dec 17 23:09:35 x4 kernel: [8126aaa1] ? ttm_bo_mem_space+0x271/0x320
Dec 17 23:09:35 x4 kernel: [8126b0bd] ? ttm_bo_move_buffer+0xdd/0x150
Dec 17 23:09:35 x4 kernel: [8126b1b9] ? ttm_bo_validate+0x89/0xf0
Dec 17 23:09:35 x4 kernel: [8126b509] ? ttm_bo_init+0x2e9/0x3a0
Dec 17 23:09:35 x4 kernel: [8129f84a] ? radeon_bo_create+0x18a/0x200
Dec 17 23:09:35 x4 kernel: [8129f510] ? radeon_bo_clear_va+0x40/0x40
Dec 17 23:09:35 x4 kernel: [812b0d42] ? 
radeon_gem_object_create+0x92/0x160
Dec 17 23:09:35 x4 kernel: [812b113c] ? 
radeon_gem_create_ioctl+0x6c/0x150
Dec 17 23:09:35 x4 kernel: [81252250] ? drm_ioctl+0x420/0x4f0
Dec 17 23:09:35 x4 kernel: [812b10d0] ? 
radeon_gem_pwrite_ioctl+0x20/0x20
Dec 17 23:09:35 x4 kernel: [810521a9] ? __do_page_fault+0x1a9/0x490
Dec 17 23:09:35 x4 kernel: [810d1ac9] ? mmap_region+0x169/0x560
Dec 17 23:09:35 x4 kernel: [810f7f84] ? do_vfs_ioctl+0x2e4/0x4e0
Dec 17 23:09:35 x4 kernel: [810c0e19] ? vm_mmap_pgoff+0x69/0x80
Dec 17 23:09:35 x4 kernel: [810f81cc] ? sys_ioctl+0x4c/0xa0
Dec 17 23:09:35 x4 kernel: [814c2a12] ? system_call_fastpath+0x16/0x1b
Dec 17 23:09:35 x4 kernel: ---[ end trace eb6036661a77c177 ]---
Dec 17 23:09:35 x4 kernel: BUG: unable to handle kernel paging request at 
8803d9ee4bd8
Dec 17 23:09:35 x4 kernel: IP: [8129d395] 
radeon_fence_wait_seq+0x85/0x440
Dec 17 23:09:35 x4 kernel: PGD 180c063 PUD 0
Dec 17 23:09:35 x4 kernel: Oops:  [#1] SMP
Dec 17 23:09:35 x4 kernel: CPU 3
Dec 17 23:09:35 x4 kernel: Pid: 182, comm: X Tainted: GW
3.7.0-09433-ge033059 #155 System manufacturer System Product Name/M4A78T-E
Dec 17 23:09:35 x4 kernel: RIP: 0010:[8129d395]  [8129d395] 
radeon_fence_wait_seq+0x85/0x440
Dec 17 23:09:35 x4 kernel: RSP: 0018:880210cc7a38  EFLAGS: 00010282
Dec 17 23:09:35 x4 kernel: RAX: 880210cc7a90 RBX: 88020674c970 RCX: 
0001
Dec 17 23:09:35 x4 kernel: RDX: 0605b580 RSI: 0058 RDI: 
8801c7f7dc80
Dec 17 23:09:35 x4 kernel: RBP: 8803d9ee4bd8 R08: 0001 R09: 
02a9
Dec 17 23:09:35 x4 kernel: R10: 02a8 R11: 0006 R12: 
880210ee6981
Dec 17 23:09:35 x4 kernel: R13: 0605b580 R14: 8801c7f7dc80 R15: 
8802161864f8
Dec 17 23:09:35 x4 kernel: FS:  7f5ee88f4880() 
GS:88021fd8() knlGS:
Dec 17 23:09:35 x4 kernel: CS:  0010 DS:  ES:  CR0: 80050033
Dec 17 23:09:35 x4 kernel: CR2: 8803d9ee4bd8 CR3: 000210c63000 CR4: 
07e0
Dec 17 23:09:35 x4 kernel: DR0:  DR1:  DR2: 

Dec 17 23:09:35 x4 kernel: DR3:  DR6: 0ff0 DR7: 
0400
Dec 17 23:09:35 x4 kernel: Process X (pid: 182, threadinfo 880210cc6000, 
task 880215f45730)
Dec 17 23:09:35 x4 kernel: Stack:
Dec 17 23:09:35 x4 kernel

Monitor sync out of range with current Linux git tree

2012-10-05 Thread Markus Trippelsdorf
On 2012.10.05 at 10:25 -0400, Alex Deucher wrote:
> On Fri, Oct 5, 2012 at 10:15 AM, Markus Trippelsdorf
>  wrote:
> > On 2012.10.05 at 10:02 -0400, Alex Deucher wrote:
> >> On Fri, Oct 5, 2012 at 9:38 AM, Markus Trippelsdorf
> >>  wrote:
> >> > On 2012.10.05 at 09:14 -0400, Alex Deucher wrote:
> >> >> On Fri, Oct 5, 2012 at 8:37 AM, Markus Trippelsdorf
> >> >>  wrote:
> >> >> > When I cold start my machine I see the following error message on my
> >> >> > monitor:
> >> >> >
> >> >> >  Out of Range
> >> >> >  48.2kHz / 44Hz
> >> >> >
> >> >> > I have to reboot on older kernel and kexec to the current one to get 
> >> >> > it
> >> >> > working again.
> >> >>
> >> >> I don't see anything amiss; can you bisect?
> >> >
> >> > Yes. It's your commit:
> >> >
> >> > commit 9dbbcfc6894957fdbb311ba92c85c026659878b5
> >> > Author: Alex Deucher 
> >> > Date:   Wed Sep 12 17:39:57 2012 -0400
> >> >
> >> > drm/radeon/dce3: use a single PPLL for all DP displays
> >>
> >> Can you apply the attached patch and send me the output?
> >
> > [drm] == start crtc 0 driving DVI-D-1 ==
> > [drm] crtc 0 is not DP
> > [drm] plls in use 0x0
> > [drm] crtc 0 using pll 0x1
> > [drm] == end crtc 0 ==
> 
> Does the attached patch fix the issue?

Yes. Thanks Alex.

-- 
Markus

