[BUG, regression] Dereferencing of NULL pointer in radeon_mn_unregister()

2019-09-01 Thread Petr Cvek
Hi,

kernel: 5.3.0-rc6-next

After starting Xorg and running xrandr, Xorg crashes with the following oops 
(not especially useful, as it is a MIPS dump):

[   28.842553] CPU 0 Unable to handle kernel paging request at virtual address 
001c, epc == 808de6d4, ra == 804d32ec
[   28.853387] Oops[#1]:
[   28.855699] CPU: 0 PID: 692 Comm: Xorg Not tainted 5.3.0-rc6-next-20190826+ 
#59
[   28.863104] $ 0   :  80b6 0011 87f1af00
[   28.868407] $ 4   : 001c 0002 0002 00fe
[   28.873705] $ 8   : 865e9fe0 fc00 0004 
[   28.879003] $12   : 87f1baf0  da9a 0040
[   28.884301] $16   : 86434450 86434400  001c
[   28.889600] $20   : 865e9dbc  80912ee4 865e9dbc
[   28.894898] $24   : 80add220 27cfd6fd  
[   28.900198] $28   : 865e8000 865e9cb8 0009 804d32ec
[   28.905499] Hi: 91bb
[   28.908414] Lo: 6e44
[   28.911350] epc   : 808de6d4 mutex_lock+0x8/0x44
[   28.916045] ra: 804d32ec radeon_mn_unregister+0x3c/0xb0
[   28.921687] Status: 1100fc03 KERNEL EXL IE 
[   28.925929] Cause : 0088 (ExcCode 02)
[   28.929987] BadVA : 001c
[   28.932903] PrId  : 00019655 (MIPS 24KEc)
[   28.936961] Modules linked in: usbhid hid_generic hid evdev
[   28.942635] Process Xorg (pid: 692, threadinfo=68a84c48, task=84477b53, 
tls=77e03da0)
[   28.950566] Stack :  804d32e4 0001  84d7b400 84d7b400 
8784a078 86434450
[   28.959043] 86632600 8663268c 803a4ed4 8041583c  803b6d94 
865e9dbc 86434450
[   28.967519] 86632600 86434400 86632600 803a451c 87912980 879129ac 
80ae 0007
[   28.975996] 0007 86632620 86632600 803a45d0 87ffc718 71a8f000 
71a8f000 87ffc71c
[   28.984472] 71a8efff 800d3c08 865eac00 86632600  803a5bf4 
71a8f000 
[   28.992948] ...
[   28.995425] Call Trace:
[   28.997905] [<808de6d4>] mutex_lock+0x8/0x44
[   29.002239] [<804d32ec>] radeon_mn_unregister+0x3c/0xb0
[   29.007550] [<8041583c>] radeon_gem_object_free+0x18/0x2c
[   29.013031] [<803a451c>] drm_gem_object_release_handle+0x74/0xac
[   29.019122] [<803a45d0>] drm_gem_handle_delete+0x7c/0x128
[   29.024599] [<803a5bf4>] drm_ioctl_kernel+0xb0/0x108
[   29.029633] [<803a5e74>] drm_ioctl+0x200/0x3a8
[   29.034154] [<803e07b4>] radeon_drm_ioctl+0x54/0xc0
[   29.039110] [<801214dc>] do_vfs_ioctl+0x4e8/0x81c
[   29.043880] [<80121864>] ksys_ioctl+0x54/0xb0
[   29.048305] [<8001100c>] syscall_common+0x34/0x58
[   29.053074] Code: 24050002  27bdfff8  8f83  14a5   
 00600825  e081  1020fffa 

but it seems a NULL pointer is dereferenced at this line:


https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/gpu/drm/radeon/radeon_mn.c?h=next-20190830#n237

The code is:

struct radeon_mn *rmn = bo->mn;
...
mutex_lock(&rmn->lock); //<-crash

A quick assert confirms that bo->mn is NULL. The code worked in 4.19-rc, and 
the problematic patch seems to be

drm/radeon: use mmu_notifier_get/put for struct radeon_mn

as it removes the NULL check.

Forcing -ENODEV in the register function (and an immediate return in unregister, 
as in a build without CONFIG_MMU_NOTIFIER) works.
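
For readers wondering why the fault address in the oops is 0x1c rather than 0: taking the address of a member through a NULL base pointer yields the member's offset, and mutex_lock() then faults on that small address. The sketch below only illustrates the arithmetic. The struct layout is hypothetical (it is not the real struct radeon_mn), chosen so that `lock` lands at 0x1c, and the names `fake_rmn`, `fake_mutex` and `lock_address` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mutex; NOT the kernel's struct mutex. */
struct fake_mutex {
    uint32_t owner;
};

/* Hypothetical layout, NOT the real struct radeon_mn:
 * 7 * 4 = 28 = 0x1c bytes of earlier members precede `lock`. */
struct fake_rmn {
    uint32_t leading[7];
    struct fake_mutex lock;
};

/* Compile-time check that the illustrative layout matches the oops. */
static_assert(offsetof(struct fake_rmn, lock) == 0x1c,
              "lock is expected at offset 0x1c in this sketch");

/* Address that &base->lock evaluates to for a given base address.
 * Computed arithmetically, so nothing is dereferenced. */
static uintptr_t lock_address(uintptr_t base)
{
    return base + offsetof(struct fake_rmn, lock);
}
```

With a base of 0 (a NULL rmn), `lock_address(0)` gives 0x1c, which would match both BadVA and the $4 argument register in the dump if the real member offset is the same.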

Petr
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [BUG, regression] Dereferencing of NULL pointer in radeon_mn_unregister()

2019-09-01 Thread Petr Cvek
On 01. 09. 19 at 16:04, Jason Gunthorpe wrote:
> On Sun, Sep 01, 2019 at 11:38:10AM +0200, Petr Cvek wrote:
> 
>> The code is:
>>
>>  struct radeon_mn *rmn = bo->mn;
>>  ...
>>  mutex_lock(&rmn->lock); //<-crash
>>
>> A quick assert proves the bo->mn returns NULL. The code worked in
>> 4.19-rc and it seems the problematic patch is
> 
> Hum, the code went away because the locking protecting that variable
> went away.. It means the caller is not careful to pair register and
> unregister.
>  
>>  drm/radeon: use mmu_notifier_get/put for struct radeon_mn
>>
>> as it removes the NULL check.
>>
>> Forcing -ENODEV in the register function (and immediate return in
>> unregister as without CONFIG_MMU_NOTIFIER) works.
> 
> Is just adding a
> 
>   if (!rmn)
>       return;
> 
> To the top of radeon_mn_unregister enough to fix it?

Yeah it seems to work. A further test with minetest works too.
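
As a user-space illustration of the guard Jason suggests (a sketch only, not the actual patch): the types and helpers below (`mock_bo`, `mock_rmn`, `mock_mutex`, `mock_mn_unregister`) are hypothetical stand-ins for the kernel structures, and the boolean return value exists purely so the behaviour can be checked. It is assumed here that the real function returns nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins, NOT the kernel types. */
struct mock_mutex {
    bool held;
    int lock_count;
};

struct mock_rmn {
    struct mock_mutex lock;
};

struct mock_bo {
    struct mock_rmn *mn;   /* NULL when the BO was never registered */
};

static void mock_mutex_lock(struct mock_mutex *m)
{
    assert(!m->held);
    m->held = true;
    m->lock_count++;
}

static void mock_mutex_unlock(struct mock_mutex *m)
{
    assert(m->held);
    m->held = false;
}

/* Returns true if the BO was registered and has now been detached. */
static bool mock_mn_unregister(struct mock_bo *bo)
{
    struct mock_rmn *rmn = bo->mn;

    if (!rmn)              /* never registered: nothing to undo */
        return false;

    mock_mutex_lock(&rmn->lock);
    bo->mn = NULL;
    mock_mutex_unlock(&rmn->lock);
    return true;
}
```

Calling `mock_mn_unregister()` on a BO whose `mn` is NULL returns early without touching the lock, which is the path that crashed without the check.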

Petr 

> 
> Jason
> 

Re: [RFC] AMD polaris MEM_AP_SIZE location (PCI BAR aperture size)

2019-08-28 Thread Petr Cvek
On 28. 08. 19 at 15:06, Koenig, Christian wrote:
>> Yeah, but sadly it seems it is only possible to increase the BAR size from 
>> its current default of 256MB.
> 
> Well the specification allows to change the BAR size from 1MB up to 
> several TB. The key point is we usually use it to increase the BAR size, 
> but it is perfectly possible to make it smaller as well.
> 

Yeah, but only with the sizes reported in the Resizable BAR Capability Register 
(+0x4), which in my case contains the value 0x0001f000. The lowest set bit is 
bit 12, which means 256 MB, 512 MB, and so on.

So I guess my RX460 doesn't go under 256 MB.
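
The decoding above can be sketched in a few lines of C. This assumes the usual encoding in which bit 4 of the capability register means 1 MB and each higher bit doubles the size, so size index n corresponds to 2^n MB (index 6 is the 64MB value mentioned below, index 8 is 256 MB). The mask and the helper names are assumptions made for this example, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mask: supported-size bits start at bit 4; bits 0..3 are reserved. */
#define REBAR_CAP_SIZES_MASK 0xfffffff0u
#define REBAR_MAX_SIZE_INDEX 27u

/* Size index n stands for a BAR of 2^n MB (0 = 1 MB, 8 = 256 MB). */
static uint64_t rebar_size_mb(unsigned int size_index)
{
    assert(size_index <= REBAR_MAX_SIZE_INDEX);
    return 1ull << size_index;
}

/* Returns 1 if the capability value advertises the given size index. */
static int rebar_size_supported(uint32_t cap, unsigned int size_index)
{
    assert(size_index <= REBAR_MAX_SIZE_INDEX);
    uint32_t sizes = (cap & REBAR_CAP_SIZES_MASK) >> 4;
    return (int)((sizes >> size_index) & 1u);
}

/* Smallest advertised size index, or -1 if no size bit is set. */
static int rebar_min_size_index(uint32_t cap)
{
    uint32_t sizes = (cap & REBAR_CAP_SIZES_MASK) >> 4;

    for (unsigned int i = 0; i <= REBAR_MAX_SIZE_INDEX; i++) {
        if ((sizes >> i) & 1u)
            return (int)i;
    }
    return -1;
}
```

For the value 0x0001f000, `rebar_min_size_index()` returns 8 (256 MB) and `rebar_size_supported(cap, 6)` returns 0, consistent with 64 MB not being offered by this card.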

Anyway, thanks for help.

Petr

> Take a look at the function pci_rebar_set_size() for example. You should 
> be able to use this as something like pci_rebar_set_size(>pdev, 0, 
> 6) to get a 64MB BAR.
> 
> Alternatively you can try to program the PCIe config space with the 
> setpci commandline tool.
> 
> Then you need to force a rescan of the PCIe bus so that the kernel can 
> actually detect the change.
> 
> Regards,
> Christian.
> 
> On 28.08.19 at 14:46, Petr Cvek wrote:
>> On 28. 08. 19 at 10:31, Christian König wrote:
>>> Hi Petr,
>>>
>>> well that is indeed a rather unusual use case.
>>>
>>> I'm not 100% sure how you actually hacked the HD4550 to do what you want to 
>>> do, cause this ASIC generation shouldn't support this.
>> I don't remember exactly how I managed to do that, but I think 
>> the process was:
>>
>> I first compared different BIOSes and found a correlation between aperture 
>> sizes in the ROM images of the older (x1300/R520) generation. While doing 
>> that I found mentions of MEM_AP_SIZE ROM powerup strap registers (in x.org 
>> documents), so I tried flashing the HD4550 setting from 256MB (default) 
>> directly to 64MB, and I was surprised it worked (I had thought it would be 
>> only 128MB). If I google "MEM_AP_SIZE" now, I can find a document [1] which 
>> says (page 56) that the ROM address is 0x78, and its table says 64MB is 
>> possible; both match the experiment.
>>
>> Of course the BIOS now has a broken CRC, but that doesn't matter in Linux 
>> (the ROM code is just x86 anyway).
>>
>> [1] https://dev.xdevs.com/attachments/download/233/AMD_RV710_ds_nda_1.01b.pdf
>>
>>> For a Polaris you can just use the PCIe resizeable BAR extension. For how 
>>> to use it see the pci_resize_resource() function in the linux kernel.
>>>
>>> Please be aware that we usually use the function to increase the BAR size 
>>> to allow the CPU to access more of the on board memory, so making it 
>>> smaller might actually not be tested at all.
>> Yeah, but sadly it seems it is only possible to increase the BAR size from 
>> its current default of 256MB.
>>
>>> Regards,
>>> Christian.
>>>
>>>
>>> On 27.08.19 at 04:36, Petr Cvek wrote:
>>>> Hello,
>>>>
>>>> I'm trying to run AMD GPUs in unusual configurations. I was able to 
>>>> decrease the PCI BAR size in HD4550 by its BIOS strap configuration and 
>>>> change it to 64MB (and I was able to run it on MIPS vocore2 board :-D ). 
>>>> Is there a similar configuration location for AMD polaris 11/RX 460 BIOS?
>>>>
>>>> Petr Cvek
> 

Re: [RFC] AMD polaris MEM_AP_SIZE location (PCI BAR aperture size)

2019-08-28 Thread Petr Cvek
On 28. 08. 19 at 10:31, Christian König wrote:
> Hi Petr,
> 
> well that is indeed a rather unusual use case.
> 
> I'm not 100% sure how you actually hacked the HD4550 to do what you want to 
> do, cause this ASIC generation shouldn't support this.

I don't remember exactly how I managed to do that, but I think the 
process was: 

I first compared different BIOSes and found a correlation between aperture 
sizes in the ROM images of the older (x1300/R520) generation. While doing that 
I found mentions of MEM_AP_SIZE ROM powerup strap registers (in x.org 
documents), so I tried flashing the HD4550 setting from 256MB (default) 
directly to 64MB, and I was surprised it worked (I had thought it would be only 
128MB). If I google "MEM_AP_SIZE" now, I can find a document [1] which says 
(page 56) that the ROM address is 0x78, and its table says 64MB is 
possible; both match the experiment.

Of course the BIOS now has a broken CRC, but that doesn't matter in Linux (the 
ROM code is just x86 anyway).

[1] https://dev.xdevs.com/attachments/download/233/AMD_RV710_ds_nda_1.01b.pdf

> 
> For a Polaris you can just use the PCIe resizeable BAR extension. For how to 
> use it see the pci_resize_resource() function in the linux kernel.
> 
> Please be aware that we usually use the function to increase the BAR size to 
> allow the CPU to access more of the on board memory, so making it smaller 
> might actually not be tested at all.

Yeah, but sadly it seems it is only possible to increase the BAR size from its 
current default of 256MB.

> 
> Regards,
> Christian.
> 
> 
> On 27.08.19 at 04:36, Petr Cvek wrote:
>> Hello,
>>
>> I'm trying to run AMD GPUs in unusual configurations. I was able to decrease 
>> the PCI BAR size in HD4550 by its BIOS strap configuration and change it to 
>> 64MB (and I was able to run it on MIPS vocore2 board :-D ). Is there a 
>> similar configuration location for AMD polaris 11/RX 460 BIOS?
>>
>> Petr Cvek
> 

[RFC] AMD polaris MEM_AP_SIZE location (PCI BAR aperture size)

2019-08-27 Thread Petr Cvek
Hello,

I'm trying to run AMD GPUs in unusual configurations. I was able to decrease 
the PCI BAR size in HD4550 by its BIOS strap configuration and change it to 
64MB (and I was able to run it on MIPS vocore2 board :-D ). Is there a similar 
configuration location for AMD polaris 11/RX 460 BIOS? 

Petr Cvek