Re: i915 build issue

2023-07-11 Thread John Garry

On 11/07/2023 10:58, Jani Nikula wrote:

I didn't notice anything on the dri-devel mailing list about this.

I presume you're using CONFIG_WERROR=y or W=e.


I'm just using whatever vanilla x86_64_defconfig gives.



See [1] and [2]. I'm undecided how we should treat this.


Ok.

Thanks,
John



i915 build issue

2023-07-11 Thread John Garry

Hi guys,

Did anyone else notice this build issue on v6.5-rc1:

drivers/gpu/drm/i915/i915_pci.c:143:15: error: expected expression
before ‘,’ token
  GEN3_FEATURES,
   ^
drivers/gpu/drm/i915/i915_pci.c:151:15: error: expected expression
before ‘,’ token
  GEN3_FEATURES,
   ^
drivers/gpu/drm/i915/i915_pci.c:159:15: error: expected expression
before ‘,’ token
  GEN3_FEATURES,
   ^
drivers/gpu/drm/i915/i915_pci.c:166:15: error: expected expression
before ‘,’ token
  GEN3_FEATURES,
   ^
drivers/gpu/drm/i915/i915_pci.c:174:15: error: expected expression
before ‘,’ token
  GEN3_FEATURES,
   ^
drivers/gpu/drm/i915/i915_pci.c:180:15: error: expected expression
before ‘,’ token
  GEN3_FEATURES,
   ^
drivers/gpu/drm/i915/i915_pci.c:186:15: error: expected expression
before ‘,’ token
  GEN3_FEATURES,
   ^
drivers/gpu/drm/i915/i915_pci.c:209:15: error: initialized field
overwritten [-Werror=override-init]
  .has_snoop = false,
   ^
...

I'm using the following gcc:

john@localhost:~/mnt_sda4/john/linux> gcc --version
gcc (SUSE Linux) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
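For readers who don't have the file in front of them, the second class of error
above comes from -Werror=override-init. A minimal sketch of the pattern
(illustrative names only, not the real i915 macros): a *_FEATURES macro supplies
default designated initializers and the device entry then overrides one of the
fields, which the compiler flags as "initialized field overwritten".

struct example_info {
	int gen;
	int has_snoop;
};

/* hypothetical stand-in for something like GEN3_FEATURES */
#define EXAMPLE_FEATURES \
	.gen = 3, \
	.has_snoop = 1

static const struct example_info example_entry = {
	EXAMPLE_FEATURES,
	.has_snoop = 0,		/* -Woverride-init triggers here */
};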

I didn't notice anything on the dri-devel mailing list about this.

Cheers,
John


Re: Warnings in DRM code when removing/unbinding a driver

2020-01-12 Thread John Garry



Hi Thomas,


drm-tip now contains


I have tested today's linux-next, which includes this:



commit a88248506a2bcfeaef6837a53cde19fe11970e6c
Author: Thomas Zimmermann 
Date:   Tue Dec 3 09:38:15 2019 +0100

 drm/hisilicon/hibmc: Switch to generic fbdev emulation

which removes this entire code and switches hibmc to generic fbdev
emulation. Does that fix the problem?
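(For context, and as I understand the conversion rather than quoting it:
switching to generic fbdev emulation means the driver deletes its hand-rolled
fbdev and unload code and hands the emulated fbdev to the helper, roughly like
the sketch below; the helper then tears it down as part of device
unregistration.)

#include <drm/drm_drv.h>
#include <drm/drm_fb_helper.h>

/* sketch only -- not the actual hibmc conversion */
static int example_probe_tail(struct drm_device *dev)
{
	int ret;

	ret = drm_dev_register(dev, 0);
	if (ret)
		return ret;

	/* 16 bpp to match what the hibmc fbdev historically requested */
	drm_fbdev_generic_setup(dev, 16);
	return 0;
}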



And I see no warning; here's a dmesg snippet:

[   20.672787] pci 0007:90:00.0: can't derive routing for PCI INT A
[   20.678831] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
[   20.686536] pci_bus 0007:90: 2-byte config write to 0007:90:00.0 
offset 0x4 may corrupt adjacent RW1C bits

[   20.696888] [TTM] Zone  kernel: Available graphics memory: 57359458 KiB
[   20.703545] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[   20.710108] [TTM] Initializing pool allocator
[   20.714561] [TTM] Initializing DMA pool allocator
[   20.720212] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   20.726863] [drm] No driver support for vblank timestamp query.
[   20.754777] Console: switching to colour frame buffer device 100x37
[   20.778180] hibmc-drm 0007:91:00.0: fb0: hibmcdrmfb frame buffer device
[   20.786447] [drm] Initialized hibmc 1.0.0 20160828 for 0007:91:00.0 
on minor 0

[   20.794346] Console: switching to colour dummy device 80x25
[   20.801884] pci 0007:90:00.0: can't derive routing for PCI INT A
[   20.807963] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
[   20.813656] [TTM] Finalizing pool allocator
[   20.817905] [TTM] Finalizing DMA pool allocator
[   20.822576] [TTM] Zone  kernel: Used memory at exit: 0 KiB
[   20.828760] [TTM] Zone   dma32: Used memory at exit: 0 KiB
[   20.834978] pci 0007:90:00.0: can't derive routing for PCI INT A
[   20.841021] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
[   20.848858] [TTM] Zone  kernel: Available graphics memory: 57359458 KiB
[   20.855516] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[   20.862079] [TTM] Initializing pool allocator
[   20.866525] [TTM] Initializing DMA pool allocator
[   20.872064] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[   20.878716] [drm] No driver support for vblank timestamp query.
[   20.905996] Console: switching to colour frame buffer device 100x37
[   20.929385] hibmc-drm 0007:91:00.0: fb0: hibmcdrmfb frame buffer device
[   20.937241] [drm] Initialized hibmc 1.0.0 20160828 for 0007:91:00.0 
on minor 0

[   21.171906] loop: module loaded

Thanks,
John


Best regards
Thomas


[   27.965802]  hibmc_unload+0x2c/0xd0
[   27.969281]  hibmc_pci_remove+0x2c/0x40
[   27.973109]  pci_device_remove+0x6c/0x140
[   27.977110]  really_probe+0x174/0x548
[   27.980763]  driver_probe_device+0x7c/0x148
[   27.984936]  device_driver_attach+0x94/0xa0
[   27.989109]  __driver_attach+0xa8/0x110
[   27.992935]  bus_for_each_dev+0xe8/0x158
[   27.996849]  driver_attach+0x30/0x40
[   28.000415]  bus_add_driver+0x234/0x2f0
[   28.004241]  driver_register+0xbc/0x1d0
[   28.008067]  __pci_register_driver+0xbc/0xd0
[   28.012329]  hibmc_pci_driver_init+0x20/0x28
[   28.016590]  do_one_initcall+0xb4/0x254
[   28.020417]  kernel_init_freeable+0x27c/0x328
[   28.024765]  kernel_init+0x10/0x118
[   28.028245]  ret_from_fork+0x10/0x18
[   28.031813] ---[ end trace 35a83b71b657878d ]---
[   28.036503] [ cut here ]
[   28.041115] WARNING: CPU: 24 PID: 1 at
drivers/gpu/drm/drm_gem_vram_helper.c:40
ttm_buffer_object_destroy+0x4c/0x80
[   28.051537] Modules linked in:
[   28.054585] CPU: 24 PID: 1 Comm: swapper/0 Tainted: G    B   W
  5.5.0-rc1-dirty #565
[   28.062924] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
RC0 - V1.16.01 03/15/2019

[snip]

Indeed, simply unbinding the device from the driver causes the same sort
of issue:

root@(none)$ cd ./bus/pci/drivers/hibmc-drm/
root@(none)$ ls
:05:00.0  bind  new_id    remove_id uevent
unbind
root@(none)$ echo \:05\:00.0 > unbind
[  116.074352] [ cut here ]
[  116.078978] WARNING: CPU: 17 PID: 1178 at
drivers/gpu/drm/drm_gem_vram_helper.c:40
ttm_buffer_object_destroy+0x4c/0x80
[  116.089661] Modules linked in:
[  116.092711] CPU: 17 PID: 1178 Comm: sh Tainted: G    B   W
5.5.0-rc1-dirty #565
[  116.100704] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
RC0 - V1.16.01 03/15/2019
[  116.109218] pstate: 2049 (nzCv daif +PAN -UAO)
[  116.114001] pc : ttm_buffer_object_destroy+0x4c/0x80
[  116.118956] lr : ttm_buffer_object_destroy+0x18/0x80
[  116.123910] sp : 0022e6cef8e0
[  116.127215] x29: 0022e6cef8e0 x28: 00231b1fb000
[  116.132519] x27:  x26: 00231b1fb000
[  116.137821] x25: 0022e6cefdc0 x24: 2480
[  116.143124] x23: 0023682b6ab0 x22: 0023682b6800
[  116.148427] x21: 0023682b6800 x20: 
[  116.153730] x19: 0023682b6800 x18: 
[  116.159032] x17: 001
[  116.185545] x7 : 0023682b6b07 x6 

Re: Warnings in DRM code when removing/unbinding a driver

2019-12-23 Thread John Garry

On 19/12/2019 09:54, Daniel Vetter wrote:

On Wed, Dec 18, 2019 at 7:08 PM John Garry  wrote:


+

So the v5.4 kernel does not have this issue.

I have bisected the initial occurrence to:

commit 37a48adfba6cf6e87df9ba8b75ab85d514ed86d8
Author: Thomas Zimmermann 
Date:   Fri Sep 6 14:20:53 2019 +0200

  drm/vram: Add kmap ref-counting to GEM VRAM objects

  The kmap and kunmap operations of GEM VRAM buffers can now be called
  in interleaving pairs. The first call to drm_gem_vram_kmap() maps the
  buffer's memory to kernel address space and the final call to
  drm_gem_vram_kunmap() unmaps the memory. Intermediate calls to these
  functions increment or decrement a reference counter.

So this either exposes or creates the issue.


Yeah that's just shooting the messenger.


OK, so it exposes it.

 Like I said, for most drivers

you can pretty much assume that their unload sequence has been broken
since forever. It's not often tested, and especially the hotunbind
from a device (as opposed to driver unload) stuff wasn't even possible
to get right until just recently.


Do you think it's worth trying to fix this for 5.5 and earlier, or just 
switch to the device-managed interface for 5.6 and forget about 5.5 and 
earlier?
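(As a side note on what "device-managed" buys here, a generic sketch of the
devm pattern rather than the DRM-specific interface under discussion: cleanup
actions are registered against the struct device and run automatically, in
reverse order, when the device is unbound, so there is no hand-written unload
path to get wrong.)

#include <linux/device.h>
#include <linux/slab.h>

struct example_state {
	int dummy;
};

static void example_teardown(void *data)
{
	struct example_state *state = data;

	/* undo whatever the bind path set up */
	state->dummy = 0;
}

static int example_bind(struct device *dev)
{
	struct example_state *state;

	state = devm_kzalloc(dev, sizeof(*state), GFP_KERNEL);
	if (!state)
		return -ENOMEM;

	/* runs automatically when the device is unbound from the driver */
	return devm_add_action_or_reset(dev, example_teardown, state);
}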


Thanks,
John


Re: Warnings in DRM code when removing/unbinding a driver

2019-12-19 Thread John Garry

+

So the v5.4 kernel does not have this issue.

I have bisected the initial occurrence to:

commit 37a48adfba6cf6e87df9ba8b75ab85d514ed86d8
Author: Thomas Zimmermann 
Date:   Fri Sep 6 14:20:53 2019 +0200

drm/vram: Add kmap ref-counting to GEM VRAM objects

The kmap and kunmap operations of GEM VRAM buffers can now be called
in interleaving pairs. The first call to drm_gem_vram_kmap() maps the
buffer's memory to kernel address space and the final call to
drm_gem_vram_kunmap() unmaps the memory. Intermediate calls to these
functions increment or decrement a reference counter.
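(For readers skimming, a rough illustration of the ref-counted map/unmap
pattern the commit message describes -- a sketch, not the drm_gem_vram
implementation:)

#include <stddef.h>

struct example_bo {
	unsigned int kmap_use_count;
	void *vaddr;
};

/* stand-ins for the real mapping primitives */
static void *example_do_map(struct example_bo *bo)   { return (void *)bo; }
static void  example_do_unmap(struct example_bo *bo) { (void)bo; }

static void *example_kmap(struct example_bo *bo)
{
	/* only the first caller actually maps; later callers share it */
	if (bo->kmap_use_count++ == 0)
		bo->vaddr = example_do_map(bo);
	return bo->vaddr;
}

static void example_kunmap(struct example_bo *bo)
{
	/* only the matching final call actually unmaps */
	if (--bo->kmap_use_count == 0) {
		example_do_unmap(bo);
		bo->vaddr = NULL;
	}
}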

So this either exposes or creates the issue.

John


On Mon, 2019-12-16 at 17:23 +, John Garry wrote:

Hi all,

Enabling CONFIG_DEBUG_TEST_DRIVER_REMOVE causes many warns on a system
with the HIBMC hw:

[   27.788806] WARNING: CPU: 24 PID: 1 at
drivers/gpu/drm/drm_gem_vram_helper.c:564 
bo_driver_move_notify+0x8c/0x98


A total shot in the dark. This might make no sense,
but it's worth a try:


Thanks for the suggestion, but still the same splat.

I haven't had a chance to analyze the problem myself. But perhaps we 
should just change over to the device-managed interface, as Daniel mentioned.




diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c 
b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c

index 2fd4ca91a62d..69bb0e29da88 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
@@ -247,9 +247,8 @@ static int hibmc_unload(struct drm_device *dev)
  {
 struct hibmc_drm_private *priv = dev->dev_private;
-   hibmc_fbdev_fini(priv);
-
 drm_atomic_helper_shutdown(dev);
+   hibmc_fbdev_fini(priv);
 if (dev->irq_enabled)
 drm_irq_uninstall(dev);

Hope it helps,
Ezequiel



Thanks,
John

[EOM]


[   27.798969] Modules linked in:
[   27.802018] CPU: 24 PID: 1 Comm: swapper/0 Tainted: G    B
   5.5.0-rc1-dirty #565
[   27.810358] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
RC0 - V1.16.01 03/15/2019
[   27.818872] pstate: 20c9 (nzCv daif +PAN +UAO)
[   27.823654] pc : bo_driver_move_notify+0x8c/0x98
[   27.828262] lr : bo_driver_move_notify+0x40/0x98
[   27.832868] sp : 00236f0677e0
[   27.836173] x29: 00236f0677e0 x28: a0001454e5e0
[   27.841476] x27: 002366e52128 x26: a000149e67b0
[   27.846779] x25: 002366e523e0 x24: 002336936120
[   27.852082] x23: 0023346f4010 x22: 002336936128
[   27.857385] x21: a000149c15c0 x20: 0023369361f8
[   27.862687] x19: 002336936000 x18: 1258
[   27.867989] x17: 1190 x16: 11d0
[   27.873292] x15: 1348 x14: a00012d68190
[   27.878595] x13: 0006 x12: 140003241f91
[   27.883897] x11: 940003241f91 x10: dfffa000
[   27.889200] x9 : 940003241f92 x8 : 0001
[   27.894502] x7 : a0001920fc88 x6 : 940003241f92
[   27.899804] x5 : 940003241f92 x4 : 0023369363a0
[   27.905107] x3 : a00010c104b8 x2 : dfffa000
[   27.910409] x1 : 0003 x0 : 0001
[   27.915712] Call trace:
[   27.918151]  bo_driver_move_notify+0x8c/0x98
[   27.922412]  ttm_bo_cleanup_memtype_use+0x54/0x100
[   27.927194]  ttm_bo_put+0x3a0/0x5d0
[   27.930673]  drm_gem_vram_object_free+0xc/0x18
[   27.935109]  drm_gem_object_free+0x34/0xd0
[   27.939196]  drm_gem_object_put_unlocked+0xc8/0xf0
[   27.943978]  hibmc_user_framebuffer_destroy+0x20/0x40
[   27.949020]  drm_framebuffer_free+0x48/0x58
[   27.953194]  drm_mode_object_put.part.1+0x90/0xe8
[   27.957889]  drm_mode_object_put+0x28/0x38
[   27.961976]  hibmc_fbdev_fini+0x54/0x78
[   27.965802]  hibmc_unload+0x2c/0xd0
[   27.969281]  hibmc_pci_remove+0x2c/0x40
[   27.973109]  pci_device_remove+0x6c/0x140
[   27.977110]  really_probe+0x174/0x548
[   27.980763]  driver_probe_device+0x7c/0x148
[   27.984936]  device_driver_attach+0x94/0xa0
[   27.989109]  __driver_attach+0xa8/0x110
[   27.992935]  bus_for_each_dev+0xe8/0x158
[   27.996849]  driver_attach+0x30/0x40
[   28.000415]  bus_add_driver+0x234/0x2f0
[   28.004241]  driver_register+0xbc/0x1d0
[   28.008067]  __pci_register_driver+0xbc/0xd0
[   28.012329]  hibmc_pci_driver_init+0x20/0x28
[   28.016590]  do_one_initcall+0xb4/0x254
[   28.020417]  kernel_init_freeable+0x27c/0x328
[   28.024765]  kernel_init+0x10/0x118
[   28.028245]  ret_from_fork+0x10/0x18
[   28.031813] ---[ end trace 35a83b71b657878d ]---
[   28.036503] [ cut here ]
[   28.041115] WARNING: CPU: 24 PID: 1 at
drivers/gpu/drm/drm_gem_vram_helper.c:40 
ttm_buffer_object_destroy+0x4c/0x80

[   28.051537] Modules linked in:
[   28.054585] CPU: 24 PID: 1 Comm: swapper/0 Tainted: G    B   W
   5.5.0-rc1-dirty #565
[   28.062924] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
RC0 - V1.16.01 03/15/2019

[snip]

Indeed, simply unbinding the device from the driver causes the same sort
of issue:

root@(non

Re: Warnings in DRM code when removing/unbinding a driver

2019-12-18 Thread John Garry

Hi Ezequiel,


On Mon, 2019-12-16 at 17:23 +, John Garry wrote:

Hi all,

Enabling CONFIG_DEBUG_TEST_DRIVER_REMOVE causes many warns on a system
with the HIBMC hw:

[   27.788806] WARNING: CPU: 24 PID: 1 at
drivers/gpu/drm/drm_gem_vram_helper.c:564 bo_driver_move_notify+0x8c/0x98


A total shot in the dark. This might make no sense,
but it's worth a try:


Thanks for the suggestion, but still the same splat.

I haven't had a chance to analyze the problem myself. But perhaps we 
should just change over to the device-managed interface, as Daniel mentioned.




diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c 
b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
index 2fd4ca91a62d..69bb0e29da88 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
@@ -247,9 +247,8 @@ static int hibmc_unload(struct drm_device *dev)
  {
 struct hibmc_drm_private *priv = dev->dev_private;
  
-   hibmc_fbdev_fini(priv);

-
 drm_atomic_helper_shutdown(dev);
+   hibmc_fbdev_fini(priv);
  
 if (dev->irq_enabled)

 drm_irq_uninstall(dev);

Hope it helps,
Ezequiel



Thanks,
John

[EOM]


[   27.798969] Modules linked in:
[   27.802018] CPU: 24 PID: 1 Comm: swapper/0 Tainted: GB
   5.5.0-rc1-dirty #565
[   27.810358] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
RC0 - V1.16.01 03/15/2019
[   27.818872] pstate: 20c9 (nzCv daif +PAN +UAO)
[   27.823654] pc : bo_driver_move_notify+0x8c/0x98
[   27.828262] lr : bo_driver_move_notify+0x40/0x98
[   27.832868] sp : 00236f0677e0
[   27.836173] x29: 00236f0677e0 x28: a0001454e5e0
[   27.841476] x27: 002366e52128 x26: a000149e67b0
[   27.846779] x25: 002366e523e0 x24: 002336936120
[   27.852082] x23: 0023346f4010 x22: 002336936128
[   27.857385] x21: a000149c15c0 x20: 0023369361f8
[   27.862687] x19: 002336936000 x18: 1258
[   27.867989] x17: 1190 x16: 11d0
[   27.873292] x15: 1348 x14: a00012d68190
[   27.878595] x13: 0006 x12: 140003241f91
[   27.883897] x11: 940003241f91 x10: dfffa000
[   27.889200] x9 : 940003241f92 x8 : 0001
[   27.894502] x7 : a0001920fc88 x6 : 940003241f92
[   27.899804] x5 : 940003241f92 x4 : 0023369363a0
[   27.905107] x3 : a00010c104b8 x2 : dfffa000
[   27.910409] x1 : 0003 x0 : 0001
[   27.915712] Call trace:
[   27.918151]  bo_driver_move_notify+0x8c/0x98
[   27.922412]  ttm_bo_cleanup_memtype_use+0x54/0x100
[   27.927194]  ttm_bo_put+0x3a0/0x5d0
[   27.930673]  drm_gem_vram_object_free+0xc/0x18
[   27.935109]  drm_gem_object_free+0x34/0xd0
[   27.939196]  drm_gem_object_put_unlocked+0xc8/0xf0
[   27.943978]  hibmc_user_framebuffer_destroy+0x20/0x40
[   27.949020]  drm_framebuffer_free+0x48/0x58
[   27.953194]  drm_mode_object_put.part.1+0x90/0xe8
[   27.957889]  drm_mode_object_put+0x28/0x38
[   27.961976]  hibmc_fbdev_fini+0x54/0x78
[   27.965802]  hibmc_unload+0x2c/0xd0
[   27.969281]  hibmc_pci_remove+0x2c/0x40
[   27.973109]  pci_device_remove+0x6c/0x140
[   27.977110]  really_probe+0x174/0x548
[   27.980763]  driver_probe_device+0x7c/0x148
[   27.984936]  device_driver_attach+0x94/0xa0
[   27.989109]  __driver_attach+0xa8/0x110
[   27.992935]  bus_for_each_dev+0xe8/0x158
[   27.996849]  driver_attach+0x30/0x40
[   28.000415]  bus_add_driver+0x234/0x2f0
[   28.004241]  driver_register+0xbc/0x1d0
[   28.008067]  __pci_register_driver+0xbc/0xd0
[   28.012329]  hibmc_pci_driver_init+0x20/0x28
[   28.016590]  do_one_initcall+0xb4/0x254
[   28.020417]  kernel_init_freeable+0x27c/0x328
[   28.024765]  kernel_init+0x10/0x118
[   28.028245]  ret_from_fork+0x10/0x18
[   28.031813] ---[ end trace 35a83b71b657878d ]---
[   28.036503] [ cut here ]
[   28.041115] WARNING: CPU: 24 PID: 1 at
drivers/gpu/drm/drm_gem_vram_helper.c:40 ttm_buffer_object_destroy+0x4c/0x80
[   28.051537] Modules linked in:
[   28.054585] CPU: 24 PID: 1 Comm: swapper/0 Tainted: GB   W
   5.5.0-rc1-dirty #565
[   28.062924] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
RC0 - V1.16.01 03/15/2019

[snip]

Indeed, simply unbinding the device from the driver causes the same sort
of issue:

root@(none)$ cd ./bus/pci/drivers/hibmc-drm/
root@(none)$ ls
:05:00.0  bind  new_id  remove_id  uevent  unbind
root@(none)$ echo \:05\:00.0 > unbind
[  116.074352] [ cut here ]
[  116.078978] WARNING: CPU: 17 PID: 1178 at
drivers/gpu/drm/drm_gem_vram_helper.c:40 ttm_buffer_object_destroy+0x4c/0x80
[  116.089661] Modules linked in:
[  116.092711] CPU: 17 PID: 1178 Comm: sh Tainted: GB   W
5.5.0-rc1-dirty #565
[  116.100704] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
RC0 - V1.16.01 03/15/2019
[  116.109218] pstate: 2049 (nzCv daif +PAN -UAO)
[  116.1

Re: Warnings in DRM code when removing/unbinding a driver

2019-12-18 Thread John Garry

On 16/12/2019 17:23, John Garry wrote:

+, -


Hi all,


xinliang  is bouncing. We need to get his 
new mail address.


John



Enabling CONFIG_DEBUG_TEST_DRIVER_REMOVE causes many warns on a system 
with the HIBMC hw:


[   27.788806] WARNING: CPU: 24 PID: 1 at 
drivers/gpu/drm/drm_gem_vram_helper.c:564 bo_driver_move_notify+0x8c/0x98

[   27.798969] Modules linked in:
[   27.802018] CPU: 24 PID: 1 Comm: swapper/0 Tainted: G    B 
  5.5.0-rc1-dirty #565
[   27.810358] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI 
RC0 - V1.16.01 03/15/2019

[   27.818872] pstate: 20c9 (nzCv daif +PAN +UAO)
[   27.823654] pc : bo_driver_move_notify+0x8c/0x98
[   27.828262] lr : bo_driver_move_notify+0x40/0x98
[   27.832868] sp : 00236f0677e0
[   27.836173] x29: 00236f0677e0 x28: a0001454e5e0
[   27.841476] x27: 002366e52128 x26: a000149e67b0
[   27.846779] x25: 002366e523e0 x24: 002336936120
[   27.852082] x23: 0023346f4010 x22: 002336936128
[   27.857385] x21: a000149c15c0 x20: 0023369361f8
[   27.862687] x19: 002336936000 x18: 1258
[   27.867989] x17: 1190 x16: 11d0
[   27.873292] x15: 1348 x14: a00012d68190
[   27.878595] x13: 0006 x12: 140003241f91
[   27.883897] x11: 940003241f91 x10: dfffa000
[   27.889200] x9 : 940003241f92 x8 : 0001
[   27.894502] x7 : a0001920fc88 x6 : 940003241f92
[   27.899804] x5 : 940003241f92 x4 : 0023369363a0
[   27.905107] x3 : a00010c104b8 x2 : dfffa000
[   27.910409] x1 : 0003 x0 : 0001
[   27.915712] Call trace:
[   27.918151]  bo_driver_move_notify+0x8c/0x98
[   27.922412]  ttm_bo_cleanup_memtype_use+0x54/0x100
[   27.927194]  ttm_bo_put+0x3a0/0x5d0
[   27.930673]  drm_gem_vram_object_free+0xc/0x18
[   27.935109]  drm_gem_object_free+0x34/0xd0
[   27.939196]  drm_gem_object_put_unlocked+0xc8/0xf0
[   27.943978]  hibmc_user_framebuffer_destroy+0x20/0x40
[   27.949020]  drm_framebuffer_free+0x48/0x58
[   27.953194]  drm_mode_object_put.part.1+0x90/0xe8
[   27.957889]  drm_mode_object_put+0x28/0x38
[   27.961976]  hibmc_fbdev_fini+0x54/0x78
[   27.965802]  hibmc_unload+0x2c/0xd0
[   27.969281]  hibmc_pci_remove+0x2c/0x40
[   27.973109]  pci_device_remove+0x6c/0x140
[   27.977110]  really_probe+0x174/0x548
[   27.980763]  driver_probe_device+0x7c/0x148
[   27.984936]  device_driver_attach+0x94/0xa0
[   27.989109]  __driver_attach+0xa8/0x110
[   27.992935]  bus_for_each_dev+0xe8/0x158
[   27.996849]  driver_attach+0x30/0x40
[   28.000415]  bus_add_driver+0x234/0x2f0
[   28.004241]  driver_register+0xbc/0x1d0
[   28.008067]  __pci_register_driver+0xbc/0xd0
[   28.012329]  hibmc_pci_driver_init+0x20/0x28
[   28.016590]  do_one_initcall+0xb4/0x254
[   28.020417]  kernel_init_freeable+0x27c/0x328
[   28.024765]  kernel_init+0x10/0x118
[   28.028245]  ret_from_fork+0x10/0x18
[   28.031813] ---[ end trace 35a83b71b657878d ]---
[   28.036503] [ cut here ]
[   28.041115] WARNING: CPU: 24 PID: 1 at 
drivers/gpu/drm/drm_gem_vram_helper.c:40 
ttm_buffer_object_destroy+0x4c/0x80

[   28.051537] Modules linked in:
[   28.054585] CPU: 24 PID: 1 Comm: swapper/0 Tainted: G    B   W 
  5.5.0-rc1-dirty #565
[   28.062924] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI 
RC0 - V1.16.01 03/15/2019


[snip]

Indeed, simply unbinding the device from the driver causes the same sort 
of issue:


root@(none)$ cd ./bus/pci/drivers/hibmc-drm/
root@(none)$ ls
:05:00.0  bind  new_id    remove_id uevent
unbind

root@(none)$ echo \:05\:00.0 > unbind
[  116.074352] [ cut here ]
[  116.078978] WARNING: CPU: 17 PID: 1178 at 
drivers/gpu/drm/drm_gem_vram_helper.c:40 
ttm_buffer_object_destroy+0x4c/0x80

[  116.089661] Modules linked in:
[  116.092711] CPU: 17 PID: 1178 Comm: sh Tainted: G    B   W 
5.5.0-rc1-dirty #565
[  116.100704] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI 
RC0 - V1.16.01 03/15/2019

[  116.109218] pstate: 2049 (nzCv daif +PAN -UAO)
[  116.114001] pc : ttm_buffer_object_destroy+0x4c/0x80
[  116.118956] lr : ttm_buffer_object_destroy+0x18/0x80
[  116.123910] sp : 0022e6cef8e0
[  116.127215] x29: 0022e6cef8e0 x28: 00231b1fb000
[  116.132519] x27:  x26: 00231b1fb000
[  116.137821] x25: 0022e6cefdc0 x24: 2480
[  116.143124] x23: 0023682b6ab0 x22: 0023682b6800
[  116.148427] x21: 0023682b6800 x20: 
[  116.153730] x19: 0023682b6800 x18: 
[  116.159032] x17: 001
[  116.185545] x7 : 0023682b6b07 x6 : 80046d056d61
[  116.190848] x5 : 80046d056d61 x4 : 0023682b6ba0
[  116.196151] x3 : a00010197338 x2 : dfffa000
[  116.201453] x1 : 0003 x0 : 0001
[  116.206756] Call trace:
[  116.209195]  ttm_buffer_object_destroy+0x4c/0x80
[  116.213

Warnings in DRM code when removing/unbinding a driver

2019-12-17 Thread John Garry

Hi all,

Enabling CONFIG_DEBUG_TEST_DRIVER_REMOVE causes many warns on a system 
with the HIBMC hw:


[   27.788806] WARNING: CPU: 24 PID: 1 at 
drivers/gpu/drm/drm_gem_vram_helper.c:564 bo_driver_move_notify+0x8c/0x98

[   27.798969] Modules linked in:
[   27.802018] CPU: 24 PID: 1 Comm: swapper/0 Tainted: GB 
 5.5.0-rc1-dirty #565
[   27.810358] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI 
RC0 - V1.16.01 03/15/2019

[   27.818872] pstate: 20c9 (nzCv daif +PAN +UAO)
[   27.823654] pc : bo_driver_move_notify+0x8c/0x98
[   27.828262] lr : bo_driver_move_notify+0x40/0x98
[   27.832868] sp : 00236f0677e0
[   27.836173] x29: 00236f0677e0 x28: a0001454e5e0
[   27.841476] x27: 002366e52128 x26: a000149e67b0
[   27.846779] x25: 002366e523e0 x24: 002336936120
[   27.852082] x23: 0023346f4010 x22: 002336936128
[   27.857385] x21: a000149c15c0 x20: 0023369361f8
[   27.862687] x19: 002336936000 x18: 1258
[   27.867989] x17: 1190 x16: 11d0
[   27.873292] x15: 1348 x14: a00012d68190
[   27.878595] x13: 0006 x12: 140003241f91
[   27.883897] x11: 940003241f91 x10: dfffa000
[   27.889200] x9 : 940003241f92 x8 : 0001
[   27.894502] x7 : a0001920fc88 x6 : 940003241f92
[   27.899804] x5 : 940003241f92 x4 : 0023369363a0
[   27.905107] x3 : a00010c104b8 x2 : dfffa000
[   27.910409] x1 : 0003 x0 : 0001
[   27.915712] Call trace:
[   27.918151]  bo_driver_move_notify+0x8c/0x98
[   27.922412]  ttm_bo_cleanup_memtype_use+0x54/0x100
[   27.927194]  ttm_bo_put+0x3a0/0x5d0
[   27.930673]  drm_gem_vram_object_free+0xc/0x18
[   27.935109]  drm_gem_object_free+0x34/0xd0
[   27.939196]  drm_gem_object_put_unlocked+0xc8/0xf0
[   27.943978]  hibmc_user_framebuffer_destroy+0x20/0x40
[   27.949020]  drm_framebuffer_free+0x48/0x58
[   27.953194]  drm_mode_object_put.part.1+0x90/0xe8
[   27.957889]  drm_mode_object_put+0x28/0x38
[   27.961976]  hibmc_fbdev_fini+0x54/0x78
[   27.965802]  hibmc_unload+0x2c/0xd0
[   27.969281]  hibmc_pci_remove+0x2c/0x40
[   27.973109]  pci_device_remove+0x6c/0x140
[   27.977110]  really_probe+0x174/0x548
[   27.980763]  driver_probe_device+0x7c/0x148
[   27.984936]  device_driver_attach+0x94/0xa0
[   27.989109]  __driver_attach+0xa8/0x110
[   27.992935]  bus_for_each_dev+0xe8/0x158
[   27.996849]  driver_attach+0x30/0x40
[   28.000415]  bus_add_driver+0x234/0x2f0
[   28.004241]  driver_register+0xbc/0x1d0
[   28.008067]  __pci_register_driver+0xbc/0xd0
[   28.012329]  hibmc_pci_driver_init+0x20/0x28
[   28.016590]  do_one_initcall+0xb4/0x254
[   28.020417]  kernel_init_freeable+0x27c/0x328
[   28.024765]  kernel_init+0x10/0x118
[   28.028245]  ret_from_fork+0x10/0x18
[   28.031813] ---[ end trace 35a83b71b657878d ]---
[   28.036503] [ cut here ]
[   28.041115] WARNING: CPU: 24 PID: 1 at 
drivers/gpu/drm/drm_gem_vram_helper.c:40 ttm_buffer_object_destroy+0x4c/0x80

[   28.051537] Modules linked in:
[   28.054585] CPU: 24 PID: 1 Comm: swapper/0 Tainted: GB   W 
 5.5.0-rc1-dirty #565
[   28.062924] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI 
RC0 - V1.16.01 03/15/2019


[snip]

Indeed, simply unbinding the device from the driver causes the same sort 
of issue:


root@(none)$ cd ./bus/pci/drivers/hibmc-drm/
root@(none)$ ls
:05:00.0  bind  new_id  remove_id  uevent  unbind
root@(none)$ echo \:05\:00.0 > unbind
[  116.074352] [ cut here ]
[  116.078978] WARNING: CPU: 17 PID: 1178 at 
drivers/gpu/drm/drm_gem_vram_helper.c:40 ttm_buffer_object_destroy+0x4c/0x80

[  116.089661] Modules linked in:
[  116.092711] CPU: 17 PID: 1178 Comm: sh Tainted: GB   W 
5.5.0-rc1-dirty #565
[  116.100704] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI 
RC0 - V1.16.01 03/15/2019

[  116.109218] pstate: 2049 (nzCv daif +PAN -UAO)
[  116.114001] pc : ttm_buffer_object_destroy+0x4c/0x80
[  116.118956] lr : ttm_buffer_object_destroy+0x18/0x80
[  116.123910] sp : 0022e6cef8e0
[  116.127215] x29: 0022e6cef8e0 x28: 00231b1fb000
[  116.132519] x27:  x26: 00231b1fb000
[  116.137821] x25: 0022e6cefdc0 x24: 2480
[  116.143124] x23: 0023682b6ab0 x22: 0023682b6800
[  116.148427] x21: 0023682b6800 x20: 
[  116.153730] x19: 0023682b6800 x18: 
[  116.159032] x17: 001
[  116.185545] x7 : 0023682b6b07 x6 : 80046d056d61
[  116.190848] x5 : 80046d056d61 x4 : 0023682b6ba0
[  116.196151] x3 : a00010197338 x2 : dfffa000
[  116.201453] x1 : 0003 x0 : 0001
[  116.206756] Call trace:
[  116.209195]  ttm_buffer_object_destroy+0x4c/0x80
[  116.213803]  ttm_bo_release_list+0x184/0x220
[  116.218064]  ttm_bo_put+0x410/0x5d0
[  116.221544]  drm_gem_vram_object_free+0xc/0x18
[ 

Re: [PATCH 0/3] HiBMC driver fixes

2018-09-27 Thread John Garry

On 26/09/2018 10:41, Xinliang Liu wrote:

On Wed, 26 Sep 2018 at 16:46, John Garry  wrote:


On 26/09/2018 04:00, Xinliang Liu wrote:

Thanks John, good addressing!
The root cause, as you said, is that our previous hibmc frame buffer format
depth setting is wrong and does not pass the new format sanity
checking in drm_mode_legacy_fb_format().
For this series,  Reviewed-by: Xinliang Liu 
Applied to hisilicon-drm-next.


I can't see this branch in the git associated with this driver from its
MAINTAINERS entry (git://github.com/xin3liang/linux.git), but please

Not a branch, it is a tag: drm-hisilicon-next-2018-09-26


ensure these fixes are included in 4.19


As it doesn't affect 4.19-rcx, I sent a PULL for 4.20.
See mail "[GIT PULL] drm-hisilicon-next-2018-09-26"


When Chris' change goes into 4.20 - which I suspect will be before yours 
- boot-time bisect will be broken.


John





Thanks,
John



Thanks,
Xinliang

On Sun, 23 Sep 2018 at 20:32, John Garry  wrote:


This patchset fixes a couple of issues in probing the HiBMC driver, as
follows:
- fix the probe error path to not carry an error code in the pointer
- don't use invalid legacy fb bpp/depth combination

Another more trivial patch is for using the standard Huawei PCI vendor ID
instead of hard-coding it.

Tested on a Huawei D05 board. I can see Tux on the BMC VGA console.

John Garry (3):
  drm/hisilicon: hibmc: Do not carry error code in HiBMC framebuffer
pointer
  drm/hisilicon: hibmc: Don't overwrite fb helper surface depth
  drm/hisilicon: hibmc: Use HUAWEI PCI vendor ID macro

 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c   | 2 +-
 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--
1.9.1



.






.






Re: [PATCH 0/3] HiBMC driver fixes

2018-09-27 Thread John Garry

On 26/09/2018 04:00, Xinliang Liu wrote:

Thanks John, good addressing!
The root cause, as you said, is that our previous hibmc frame buffer format
depth setting is wrong and does not pass the new format sanity
checking in drm_mode_legacy_fb_format().
For this series,  Reviewed-by: Xinliang Liu 
Applied to hisilicon-drm-next.


I can't see this branch in the git associated with this driver from its 
MAINTAINERS entry (git://github.com/xin3liang/linux.git), but please 
ensure these fixes are included in 4.19


Thanks,
John



Thanks,
Xinliang

On Sun, 23 Sep 2018 at 20:32, John Garry  wrote:


This patchset fixes a couple of issues in probing the HiBMC driver, as
follows:
- fix the probe error path to not carry an error code in the pointer
- don't use invalid legacy fb bpp/depth combination

Another more trivial patch is for using the standard Huawei PCI vendor ID
instead of hard-coding it.

Tested on a Huawei D05 board. I can see Tux on the BMC VGA console.

John Garry (3):
  drm/hisilicon: hibmc: Do not carry error code in HiBMC framebuffer
pointer
  drm/hisilicon: hibmc: Don't overwrite fb helper surface depth
  drm/hisilicon: hibmc: Use HUAWEI PCI vendor ID macro

 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c   | 2 +-
 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--
1.9.1



.






[PATCH 2/3] drm/hisilicon: hibmc: Don't overwrite fb helper surface depth

2018-09-23 Thread John Garry
Currently the driver overwrites the surface depth provided by the fb
helper to give an invalid bpp/surface depth combination.

This has been exposed by commit 70109354fed2 ("drm: Reject unknown legacy
bpp and depth for drm_mode_addfb ioctl"), which now causes the driver to
fail to probe.
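(For reference, a sketch of the bpp/depth pairs the legacy path accepts, as I
read drm_mode_legacy_fb_format() after the above commit -- not a copy of it.
The hibmc fbdev path asks for 16 bpp, so forcing the depth to 32 produces a
pair with no match and the addfb request is rejected.)

/* illustrative only -- my reading of the legacy bpp/depth pairs */
static int example_legacy_format_ok(unsigned int bpp, unsigned int depth)
{
	switch (bpp) {
	case 8:
		return depth == 8;
	case 16:
		return depth == 15 || depth == 16;
	case 24:
		return depth == 24;
	case 32:
		return depth == 24 || depth == 30 || depth == 32;
	default:
		return 0;
	}
}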

Fix by not overwriting the surface depth.

Fixes: d1667b86795a ("drm/hisilicon/hibmc: Add support for frame buffer")
Signed-off-by: John Garry 
---
 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c 
b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
index 8bd2907..edcca17 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
@@ -71,7 +71,6 @@ static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
DRM_DEBUG_DRIVER("surface width(%d), height(%d) and bpp(%d)\n",
 sizes->surface_width, sizes->surface_height,
 sizes->surface_bpp);
-   sizes->surface_depth = 32;
 
bytes_per_pixel = DIV_ROUND_UP(sizes->surface_bpp, 8);
 
-- 
1.9.1



[PATCH 0/3] HiBMC driver fixes

2018-09-23 Thread John Garry
This patchset fixes a couple of issues in probing the HiBMC driver, as
follows:
- fix the probe error path to not carry an error code in the pointer
- don't use invalid legacy fb bpp/depth combination

Another more trivial patch is for using the standard Huawei PCI vendor ID
instead of hard-coding it.

Tested on a Huawei D05 board. I can see Tux on the BMC VGA console.

John Garry (3):
  drm/hisilicon: hibmc: Do not carry error code in HiBMC framebuffer
pointer
  drm/hisilicon: hibmc: Don't overwrite fb helper surface depth
  drm/hisilicon: hibmc: Use HUAWEI PCI vendor ID macro

 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c   | 2 +-
 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-- 
1.9.1



[PATCH 1/3] drm/hisilicon: hibmc: Do not carry error code in HiBMC framebuffer pointer

2018-09-23 Thread John Garry
In hibmc_drm_fb_create(), when the call to hibmc_framebuffer_init() fails
with error, do not store the error code in the HiBMC device frame-buffer
pointer, as this will be later checked for non-zero value in
hibmc_fbdev_destroy() when our intention is to check for a valid framebuffer
pointer.

This fixes the following crash:
[9.699791] Unable to handle kernel NULL pointer dereference at virtual 
address 001a
[9.708672] Mem abort info:
[9.711489]   ESR = 0x9604
[9.714570]   Exception class = DABT (current EL), IL = 32 bits
[9.720551]   SET = 0, FnV = 0
[9.723631]   EA = 0, S1PTW = 0
[9.726799] Data abort info:
[9.729702]   ISV = 0, ISS = 0x0004
[9.733573]   CM = 0, WnR = 0
[9.736566] [001a] user address but active_mm is swapper
[9.742987] Internal error: Oops: 9604 [#1] PREEMPT SMP
[9.748614] Modules linked in:
[9.751694] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: GW 
4.19.0-rc4-next-20180920-1-g9b0012c #322
[9.762681] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 IT21 
Nemo 2.0 RC0 04/18/2018
[9.771915] Workqueue: events work_for_cpu_fn
[9.776312] pstate: 6005 (nZCv daif -PAN -UAO)
[9.781150] pc : drm_mode_object_put+0x0/0x20
[9.785547] lr : hibmc_fbdev_fini+0x40/0x58
[9.789767] sp : 0af1bcf0
[9.793108] x29: 0af1bcf0 x28: 
[9.798473] x27:  x26: 08f66630
[9.803838] x25:  x24: 095abb98
[9.809203] x23: 8017db92fe00 x22: 8017d2b13000
[9.814568] x21: ffea x20: 8017d2f80018
[9.819933] x19: 8017d28a0018 x18: 
[9.825297] x17:  x16: 
[9.830662] x15: 092296c8 x14: 8939970f
[9.836026] x13: 0939971d x12: 09229940
[9.841391] x11: 085f8fc0 x10: 0af1b9a0
[9.846756] x9 : 000d x8 : 6620657a696c6169
[9.852121] x7 : 8017d3340580 x6 : 8017d4168000
[9.857486] x5 :  x4 : 8017db92fb20
[9.862850] x3 : 2690 x2 : 8017d3340480
[9.868214] x1 : 0028 x0 : 0002
[9.873580] Process kworker/16:1 (pid: 293, stack limit = 0x(ptrval))
[9.880788] Call trace:
[9.883252]  drm_mode_object_put+0x0/0x20
[9.887297]  hibmc_unload+0x1c/0x80
[9.890815]  hibmc_pci_probe+0x170/0x3c8
[9.894773]  local_pci_probe+0x3c/0xb0
[9.898555]  work_for_cpu_fn+0x18/0x28
[9.902337]  process_one_work+0x1e0/0x318
[9.906382]  worker_thread+0x228/0x450
[9.910164]  kthread+0x128/0x130
[9.913418]  ret_from_fork+0x10/0x18
[9.917024] Code: a94153f3 a8c27bfd d65f03c0 d503201f (f9400c01)
[9.923180] ---[ end trace 2695ffa0af5be375 ]---

Fixes: d1667b86795a ("drm/hisilicon/hibmc: Add support for frame buffer")
Signed-off-by: John Garry 
---
 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c 
b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
index b92595c..8bd2907 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
@@ -122,6 +122,7 @@ static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
if (IS_ERR(hi_fbdev->fb)) {
ret = PTR_ERR(hi_fbdev->fb);
+   hi_fbdev->fb = NULL;
DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
goto out_release_fbi;
}
-- 
1.9.1



[PATCH 3/3] drm/hisilicon: hibmc: Use HUAWEI PCI vendor ID macro

2018-09-23 Thread John Garry
Switch to use Huawei PCI vendor ID macro from pci_ids.h file.

In addition, switch to use PCI_VDEVICE() instead of open coding.
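(For anyone unfamiliar with the macro, the PCI_VDEVICE(HUAWEI, 0x1711) entry
expands to roughly the open-coded table entry it replaces -- illustrative
expansion, from my reading of include/linux/pci.h:)

#include <linux/pci.h>

static const struct pci_device_id example_expanded[] = {
	{
		.vendor    = PCI_VENDOR_ID_HUAWEI,	/* 0x19e5 */
		.device    = 0x1711,
		.subvendor = PCI_ANY_ID,
		.subdevice = PCI_ANY_ID,
	},
	{ 0, }
};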

Signed-off-by: John Garry 
---
 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c 
b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
index d4f6f1f..79b6bda 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
@@ -402,7 +402,7 @@ static void hibmc_pci_remove(struct pci_dev *pdev)
 }
 
 static struct pci_device_id hibmc_pci_table[] = {
-   {0x19e5, 0x1711, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
+   { PCI_VDEVICE(HUAWEI, 0x1711) },
{0,}
 };
 
-- 
1.9.1



Re: Bug report: HiBMC crash

2018-09-23 Thread John Garry

On 21/09/2018 15:28, Chris Wilson wrote:

Quoting John Garry (2018-09-21 09:11:19)

On 21/09/2018 06:49, Liuxinliang (Matthew Liu) wrote:

Hi John,
Thank you for reporting bug.
I am now using 4.18.7. I haven't found this issue yet.
I will try linux-next and figure out what's wrong with it.

Thanks,
Xinliang




As mentioned in internal mail, the issue may be that the surface
depth/bpp we were using in the driver was previously invalid, but
code has since been added in v4.19 to reject this. Specifically it looks
like this patch:

commit 70109354fed232dfce8fb2c7cadf635acbe03e19
Author: Chris Wilson 
Date:   Wed Sep 5 16:31:16 2018 +0100

 drm: Reject unknown legacy bpp and depth for drm_mode_addfb ioctl



diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c 
b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
index b92595c477ef..f3e7f41e6781 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_fbdev.c
@@ -71,7 +71,6 @@ static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
DRM_DEBUG_DRIVER("surface width(%d), height(%d) and bpp(%d)\n",
 sizes->surface_width, sizes->surface_height,
 sizes->surface_bpp);
-   sizes->surface_depth = 32;

bytes_per_pixel = DIV_ROUND_UP(sizes->surface_bpp, 8);

@@ -192,7 +191,6 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv)
return -ENOMEM;
}

-   priv->fbdev = hifbdev;
drm_fb_helper_prepare(priv->dev, &hifbdev->helper,
  &hibmc_fbdev_helper_funcs);

@@ -246,6 +244,7 @@ int hibmc_fbdev_init(struct hibmc_drm_private *priv)
 fix->ypanstep, fix->ywrapstep, fix->line_length,
 fix->accel, fix->capabilities);

+   priv->fbdev = hifbdev;
return 0;

 fini:

>
> Apply chunks 2&3 first to confirm they fix the GPF.
> -Chris

Hi Chris,

So relocating where priv->fbdev is set does fix the crash.

However then applying chunk #1 introduces another crash:

9.229007] pci 0007:90:00.0: can't derive routing for PCI INT A
[9.235082] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
[9.240457] [TTM] Zone  kernel: Available graphics memory: 16297792 kiB
[9.247147] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[9.253744] [TTM] Initializing pool allocator
[9.258148] [TTM] Initializing DMA pool allocator
[9.262951] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[9.269636] [drm] No driver support for vblank timestamp query.
[9.280967] Unable to handle kernel NULL pointer dereference at 
virtual address 0150

[9.289849] Mem abort info:
[9.292666]   ESR = 0x9644
[9.295747]   Exception class = DABT (current EL), IL = 32 bits
[9.301728]   SET = 0, FnV = 0
[9.304809]   EA = 0, S1PTW = 0
[9.307977] Data abort info:
[9.310882]   ISV = 0, ISS = 0x0044
[9.314754]   CM = 0, WnR = 1
[9.317744] [0150] user address but active_mm is swapper
[9.324166] Internal error: Oops: 9644 [#1] PREEMPT SMP
[9.329793] Modules linked in:
[9.332874] CPU: 16 PID: 293 Comm: kworker/16:1 Not tainted 
4.19.0-rc4-next-20180920-1-g9b0012c-dirty #345
[9.342983] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
D05 IT21 Nemo 2.0 RC0 04/18/2018

[9.352223] Workqueue: events work_for_cpu_fn
[9.356621] pstate: 8005 (Nzcv daif -PAN -UAO)
[9.361461] pc : hibmc_drm_fb_create+0x20c/0x3c0
[9.366122] lr : hibmc_drm_fb_create+0x1e4/0x3c0
[9.370781] sp : 0aeebb50
[9.374123] x29: 0aeebb50 x28: 
[9.379489] x27: 0aeebca0 x26: 8017b3830800
[9.384854] x25: 8017b3828018 x24: 8017b3850018
[9.390219] x23: 8017b3830670 x22: 8017b3830800
[9.395583] x21: 000eb000 x20: 8017b3830a70
[9.400948] x19: 091f9000 x18: 
[9.406313] x17:  x16: 8017d4168000
[9.411678] x15: 091f96c8 x14: 09049000
[9.417042] x13:  x12: 
[9.422407] x11: 8017daf39940 x10: 0040
[9.427772] x9 : 8017b53e02b0 x8 : 8017daf39918
[9.433136] x7 : 8017daf39a60 x6 : 8017b3840800
[9.438500] x5 : 00

Re: Bug report: HiBMC crash

2018-09-23 Thread John Garry

On 21/09/2018 06:49, Liuxinliang (Matthew Liu) wrote:

Hi John,
Thank you for reporting bug.
I am now using 4.18.7. I haven't found this issue yet.
I will try linux-next and figure out what's wrong with it.

Thanks,
Xinliang




As mentioned in internal mail, the issue may be that the surface 
depth/bpp we were using in the driver was previously invalid, but 
code has since been added in v4.19 to reject this. Specifically it looks 
like this patch:


commit 70109354fed232dfce8fb2c7cadf635acbe03e19
Author: Chris Wilson 
Date:   Wed Sep 5 16:31:16 2018 +0100

drm: Reject unknown legacy bpp and depth for drm_mode_addfb ioctl


Thanks,
John


On 2018/9/20 19:23, John Garry wrote:

On 20/09/2018 11:04, John Garry wrote:

Hi,

I am seeing this crash below on linux-next (20 Sept).

This is on an arm64 D05 board, which includes the HiBMC device. D06 was
also crashing for what looked like same reason. I am using standard
defconfig, except DRM and DRM_HISI_HIBMC are built-in.

Is this a known issue? I tested v4.19-rc3 and it had no such crash.

The origin seems to be here, where the info pointer is not checked for NULL
for safety:
static int framebuffer_check(struct drm_device *dev,
 const struct drm_mode_fb_cmd2 *r)
{
...

/* now let the driver pick its own format info */
info = drm_get_format_info(dev, r);

...

for (i = 0; i < info->num_planes; i++) {
unsigned int width = fb_plane_width(r->width, info, i);
unsigned int height = fb_plane_height(r->height, info, i);
unsigned int cpp = info->cpp[i];




Upon closer inspection the crash is actually from the hibmc probe error
handling path, specifically
hibmc_fbdev_destroy()->drm_framebuffer_put() is called with fb holding
the error value from hibmc_framebuffer_init(), as shown:

static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
   struct drm_fb_helper_surface_size *sizes)
{

...

hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
if (IS_ERR(hi_fbdev->fb)) {
ret = PTR_ERR(hi_fbdev->fb);

*** hi_fbdev->fb holds error code ***

DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
goto out_release_fbi;
}


static void hibmc_fbdev_destroy(struct hibmc_fbdev *fbdev)
{
struct hibmc_framebuffer *gfb = fbdev->fb;
struct drm_fb_helper *fbh = &fbdev->helper;

drm_fb_helper_unregister_fbi(fbh);

drm_fb_helper_fini(fbh);

*** &gfb->fb holds error code, not pointer ***

if (gfb)
drm_framebuffer_put(&gfb->fb);
}

This change fixes the crash for me:

hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
if (IS_ERR(hi_fbdev->fb)) {
ret = PTR_ERR(hi_fbdev->fb);
+hi_fbdev->fb = NULL;
DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
goto out_release_fbi;
}
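(A stand-alone illustration of why clearing the pointer matters -- not the
hibmc code itself: IS_ERR()/PTR_ERR() encode an errno inside the pointer
value, so an error pointer left in the struct is non-NULL and sails straight
past a plain NULL check in the cleanup path.)

#include <linux/err.h>
#include <linux/errno.h>

struct example_fb {
	int dummy;
};

struct example_fbdev {
	struct example_fb *fb;
};

static struct example_fb example_fb_instance;

/* hypothetical constructor that can fail */
static struct example_fb *example_fb_init(int fail)
{
	return fail ? ERR_PTR(-EINVAL) : &example_fb_instance;
}

static void example_fb_release(struct example_fb *fb)
{
	fb->dummy = 0;		/* stand-in for the real teardown */
}

static int example_create(struct example_fbdev *fbdev, int fail)
{
	fbdev->fb = example_fb_init(fail);
	if (IS_ERR(fbdev->fb)) {
		int ret = PTR_ERR(fbdev->fb);

		fbdev->fb = NULL;	/* the fix: don't leave the ERR_PTR behind */
		return ret;
	}
	return 0;
}

static void example_destroy(struct example_fbdev *fbdev)
{
	/*
	 * Without the NULL-ing above this check also passes for an ERR_PTR,
	 * and example_fb_release() would then dereference a bogus pointer.
	 */
	if (fbdev->fb)
		example_fb_release(fbdev->fb);
}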

Why we're hitting the error path at all, I don't know.

And, having said all that, the code I pointed out in
framebuffer_check() still does not seem safe for the same reason I
mentioned originally.

John


John

[9.220446] pci 0007:90:00.0: can't derive routing for PCI INT A
[9.226517] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
[9.231847] [TTM] Zone  kernel: Available graphics memory:
16297696 kiB
[9.238536] [TTM] Zone   dma32: Available graphics memory: 2097152
kiB
[9.245133] [TTM] Initializing pool allocator
[9.249536] [TTM] Initializing DMA pool allocator
[9.254340] [drm] Supports vblank timestamp caching Rev 2
(21.10.2013).
[9.261026] [drm] No driver support for vblank timestamp query.
[9.272431] WARNING: CPU: 16 PID: 293 at
drivers/gpu/drm/drm_fourcc.c:221 drm_format_info.part.1+0x0/0x8
[9.282014] Modules linked in:
[9.285095] CPU: 16 PID: 293 Comm: kworker/16:1 Not tainted
4.19.0-rc4-next-20180920-1-g9b0012c #322
[9.294677] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
D05 IT21 Nemo 2.0 RC0 04/18/2018
[9.303915] Workqueue: events work_for_cpu_fn
[9.308314] pstate: 6005 (nZCv daif -PAN -UAO)
[9.313150] pc : drm_format_info.part.1+0x0/0x8
[9.317724] lr : drm_get_format_info+0x90/0x98
[9.322208] sp : 0af1baf0
[9.325549] x29: 0af1baf0 x28: 
[9.330915] x27: 0af1bcb0 x26: 8017d3018800
[9.336279] x25: 8017d28a0018 x24: 8017d2f80018
[9.341644] x23: 8017d3018670 x22: 0af1bbf0
[9.347009] x21: 8017d3018a70 x20: 0af1bbf0
[9.352373] x19: 0af1bbf0 x18: 
[9.357737] x17:  x16: 
[9.363102] x15: 092296c8 x14: 09074000
[9.368466] x13:  x12: 
[9.373831] x11: 8017fbffe008 x10: 8017db9307e8
[9.379195] x9 :  x8 : 8017b517c800
[9.384560] x7 :  x6 : 003f
[9.389924] x5 : 0040 

Re: Bug report: HiBMC crash

2018-09-21 Thread John Garry

On 20/09/2018 11:04, John Garry wrote:

Hi,

I am seeing this crash below on linux-next (20 Sept).

This is on an arm64 D05 board, which includes the HiBMC device. D06 was
also crashing for what looked like same reason. I am using standard
defconfig, except DRM and DRM_HISI_HIBMC are built-in.

Is this a known issue? I tested v4.19-rc3 and it had no such crash.

The origin seems to be here, where the info pointer is not checked for NULL
for safety:
static int framebuffer_check(struct drm_device *dev,
 const struct drm_mode_fb_cmd2 *r)
{
...

/* now let the driver pick its own format info */
info = drm_get_format_info(dev, r);

...

for (i = 0; i < info->num_planes; i++) {
unsigned int width = fb_plane_width(r->width, info, i);
unsigned int height = fb_plane_height(r->height, info, i);
unsigned int cpp = info->cpp[i];




Upon closer inspection the crash is actually from the hibmc probe error 
handling path, specifically hibmc_fbdev_destroy()->drm_framebuffer_put() 
is called with fb holding the error value from hibmc_framebuffer_init(), 
as shown:


static int hibmc_drm_fb_create(struct drm_fb_helper *helper,
   struct drm_fb_helper_surface_size *sizes)
{

...

hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
if (IS_ERR(hi_fbdev->fb)) {
ret = PTR_ERR(hi_fbdev->fb);

*** hi_fbdev->fb holds error code ***

DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
goto out_release_fbi;
}


static void hibmc_fbdev_destroy(struct hibmc_fbdev *fbdev)
{
struct hibmc_framebuffer *gfb = fbdev->fb;
struct drm_fb_helper *fbh = &fbdev->helper;

drm_fb_helper_unregister_fbi(fbh);

drm_fb_helper_fini(fbh);

*** &gfb->fb holds error code, not pointer ***

if (gfb)
drm_framebuffer_put(&gfb->fb);
}

This change fixes the crash for me:

hi_fbdev->fb = hibmc_framebuffer_init(priv->dev, &mode_cmd, gobj);
if (IS_ERR(hi_fbdev->fb)) {
ret = PTR_ERR(hi_fbdev->fb);
+   hi_fbdev->fb = NULL;
DRM_ERROR("failed to initialize framebuffer: %d\n", ret);
goto out_release_fbi;
}

Why we're hitting the error path at all, I don't know.

And, having said all that, the code I pointed out in framebuffer_check() 
still does not seem safe for the same reason I mentioned originally.


John


John

[9.220446] pci 0007:90:00.0: can't derive routing for PCI INT A
[9.226517] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
[9.231847] [TTM] Zone  kernel: Available graphics memory: 16297696 kiB
[9.238536] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[9.245133] [TTM] Initializing pool allocator
[9.249536] [TTM] Initializing DMA pool allocator
[9.254340] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[9.261026] [drm] No driver support for vblank timestamp query.
[9.272431] WARNING: CPU: 16 PID: 293 at
drivers/gpu/drm/drm_fourcc.c:221 drm_format_info.part.1+0x0/0x8
[9.282014] Modules linked in:
[9.285095] CPU: 16 PID: 293 Comm: kworker/16:1 Not tainted
4.19.0-rc4-next-20180920-1-g9b0012c #322
[9.294677] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
D05 IT21 Nemo 2.0 RC0 04/18/2018
[9.303915] Workqueue: events work_for_cpu_fn
[9.308314] pstate: 6005 (nZCv daif -PAN -UAO)
[9.313150] pc : drm_format_info.part.1+0x0/0x8
[9.317724] lr : drm_get_format_info+0x90/0x98
[9.322208] sp : 0af1baf0
[9.325549] x29: 0af1baf0 x28: 
[9.330915] x27: 0af1bcb0 x26: 8017d3018800
[9.336279] x25: 8017d28a0018 x24: 8017d2f80018
[9.341644] x23: 8017d3018670 x22: 0af1bbf0
[9.347009] x21: 8017d3018a70 x20: 0af1bbf0
[9.352373] x19: 0af1bbf0 x18: 
[9.357737] x17:  x16: 
[9.363102] x15: 092296c8 x14: 09074000
[9.368466] x13:  x12: 
[9.373831] x11: 8017fbffe008 x10: 8017db9307e8
[9.379195] x9 :  x8 : 8017b517c800
[9.384560] x7 :  x6 : 003f
[9.389924] x5 : 0040 x4 : 
[9.395289] x3 : 08d04000 x2 : 56555941
[9.400654] x1 : 08d04f70 x0 : 0044
[9.406019] Call trace:
[9.408483]  drm_format_info.part.1+0x0/0x8
[9.412705]  drm_helper_mode_fill_fb_struct+0x20/0x80
[9.417807]  hibmc_framebuffer_init+0x48/0xd0
[9.422204]  hibmc_drm_fb_create+0x1ec/0x3c8
[9.426513]  __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
[9.432756]  drm_fb_helper_initial_config+0x3c/0x48
[9.437681]  hibmc_fbdev_init+0xb4/0x198
[9.441638]  hib

Bug report: HiBMC crash

2018-09-21 Thread John Garry
0
[9.597660] x3 : 0af1bc24 x2 : 08d23f50
[9.603024] x1 : 8017b517c700 x0 : 
[9.608389] Call trace:
[9.610852]  drm_framebuffer_init+0x18/0x110
[9.615161]  hibmc_framebuffer_init+0x60/0xd0
[9.619558]  hibmc_drm_fb_create+0x1ec/0x3c8
[9.623867]  __drm_fb_helper_initial_config_and_unlock+0x1cc/0x418
[9.630110]  drm_fb_helper_initial_config+0x3c/0x48
[9.635034]  hibmc_fbdev_init+0xb4/0x198
[9.638991]  hibmc_pci_probe+0x2f4/0x3c8
[9.642949]  local_pci_probe+0x3c/0xb0
[9.646731]  work_for_cpu_fn+0x18/0x28
[9.650513]  process_one_work+0x1e0/0x318
[9.654558]  worker_thread+0x228/0x450
[9.658339]  kthread+0x128/0x130
[9.661594]  ret_from_fork+0x10/0x18
[9.665199] ---[ end trace 2695ffa0af5be374 ]---
[9.669868] [drm:hibmc_framebuffer_init] *ERROR* drm_framebuffer_init 
failed: -22
[9.677434] [drm:hibmc_drm_fb_create] *ERROR* failed to initialize 
framebuffer: -22
[9.685182] [drm:hibmc_fbdev_init] *ERROR* failed to setup initial 
conn config: -22

[9.692926] [drm:hibmc_pci_probe] *ERROR* failed to initialize fbdev: -22
[9.699791] Unable to handle kernel NULL pointer dereference at 
virtual address 001a

[9.708672] Mem abort info:
[9.711489]   ESR = 0x9604
[9.714570]   Exception class = DABT (current EL), IL = 32 bits
[9.720551]   SET = 0, FnV = 0
[9.723631]   EA = 0, S1PTW = 0
[9.726799] Data abort info:
[9.729702]   ISV = 0, ISS = 0x0004
[9.733573]   CM = 0, WnR = 0
[9.736566] [001a] user address but active_mm is swapper
[9.742987] Internal error: Oops: 9604 [#1] PREEMPT SMP
[9.748614] Modules linked in:
[9.751694] CPU: 16 PID: 293 Comm: kworker/16:1 Tainted: GW 
  4.19.0-rc4-next-20180920-1-g9b0012c #322
[9.762681] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
D05 IT21 Nemo 2.0 RC0 04/18/2018

[9.771915] Workqueue: events work_for_cpu_fn
[9.776312] pstate: 6005 (nZCv daif -PAN -UAO)
[9.781150] pc : drm_mode_object_put+0x0/0x20
[9.785547] lr : hibmc_fbdev_fini+0x40/0x58
[9.789767] sp : 0af1bcf0
[9.793108] x29: 0af1bcf0 x28: 
[9.798473] x27:  x26: 08f66630
[9.803838] x25:  x24: 095abb98
[9.809203] x23: 8017db92fe00 x22: 8017d2b13000
[9.814568] x21: ffea x20: 8017d2f80018
[9.819933] x19: 8017d28a0018 x18: 
[9.825297] x17:  x16: 
[9.830662] x15: 092296c8 x14: 8939970f
[9.836026] x13: 0939971d x12: 09229940
[9.841391] x11: 085f8fc0 x10: 0af1b9a0
[9.846756] x9 : 000d x8 : 6620657a696c6169
[9.852121] x7 : 8017d3340580 x6 : 8017d4168000
[9.857486] x5 :  x4 : 8017db92fb20
[9.862850] x3 : 2690 x2 : 8017d3340480
[9.868214] x1 : 0028 x0 : 0002
[9.873580] Process kworker/16:1 (pid: 293, stack limit = 
0x(ptrval))

[9.880788] Call trace:
[9.883252]  drm_mode_object_put+0x0/0x20
[9.887297]  hibmc_unload+0x1c/0x80
[9.890815]  hibmc_pci_probe+0x170/0x3c8
[9.894773]  local_pci_probe+0x3c/0xb0
[9.898555]  work_for_cpu_fn+0x18/0x28
[9.902337]  process_one_work+0x1e0/0x318
[9.906382]  worker_thread+0x228/0x450
[9.910164]  kthread+0x128/0x130
[9.913418]  ret_from_fork+0x10/0x18
[9.917024] Code: a94153f3 a8c27bfd d65f03c0 d503201f (f9400c01)
[9.923180] ---[ end trace 2695ffa0af5be375 ]---

On Thu, 20 Sep 2018 at 10:06, John Garry  wrote:
[9.196615] arm-smmu-v3 arm-smmu-v3.4.auto: ias 44-bit, oas 44-bit 
(features 0x0f0d)
[9.206296] arm-smmu-v3 arm-smmu-v3.4.auto: no evtq irq - events will 
not be reported!
[9.214302] arm-smmu-v3 arm-smmu-v3.4.auto: no gerr irq - errors will 
not be reported!

[9.222673] pci 0007:90:00.0: can't derive routing for PCI INT A
[9.228746] hibmc-drm 0007:91:00.0: PCI INT A: no GSI
[9.234073] [TTM] Zone  kernel: Available graphics memory: 16297696 kiB
[9.240763] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[9.247361] [TTM] Initializing pool allocator
[9.251763] [TTM] Initializing DMA pool allocator
[9.256565] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[9.263250] [drm] No driver support for vblank timestamp query.
[9.274661] WARNING: CPU: 16 PID: 293 at 
drivers/gpu/drm/drm_fourcc.c:221 drm_format_info.part.1+0x0/0x8

[9.284244] Modules linked in:
[9.287326] CPU: 16 PID: 293 Comm: kworker/16:1 Not tainted 
4.19.0-rc4-next-20180919-1-gcb2f9f4-dirty #321
[9.297435] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon 
D05 IT21 Nemo 2.0 RC0 04/18/2018

[9.306674] Workqueue: events work_for_cpu_fn
[9.311072] pstate: 6005 (nZCv daif -PAN -UAO)
[