Re: drm/imx: Crash in drm_mode_config_cleanup

2018-09-13 Thread Stefan Agner
Hi Philipp,

On 13.09.2018 01:36, Philipp Zabel wrote:
> Hi Stefan,
> 
> thank you for the report. I think what happens is the following:

Thanks for looking into it!

> 
> On Wed, 2018-09-12 at 15:26 -0700, Stefan Agner wrote:
>> Hi,
>>
>> While working on Apalis iMX6 with four displays, I encountered the
>> following crash:
>>
> [...]
>> [3.781163] imx-drm display-subsystem: bound 12.hdmi (ops 
>> dw_hdmi_imx_ops)
>> [3.788441] imx-drm display-subsystem: binding ldb (ops imx_ldb_ops)
>> [3.794902] imx_ldb_bind, imx_ldb ec0da010
> 
> component_bind calls devres_open_group right before component->ops->bind
> 
> The devmem_kzalloc'ed imx_ldb_channel that contains the encoder is
> allocated in this devres group.
> 
>> [3.799818] drm_encoder_init, encoder ec0da388
>> [3.804363] drm_encoder_init, ret 0
> 
> drm_encoder_init adds the encoder in imx_ldb to the
> mode_config.encoder_list.
> 
>> [3.807908] imx_ldb_register, encoder ec0da388
>> [3.812432] imx_ldb_register, ret 0
>> [3.815951] imx_ldb_bind encoder LVDS-46, ec0da388 ->func c0e6371c
>> [3.87] imx_ldb_bind, done
>> [3.825331] imx-drm display-subsystem: bound ldb (ops imx_ldb_ops)
>> [3.831614] imx-drm display-subsystem: binding 
>> parallel-display-controller1 (ops imx_pd_ops)
>> [3.840285] drm_encoder_init, encoder ec8a2b78
>> [3.844931] imx-drm display-subsystem: Linked as a consumer to panel
>> [3.851472] imx-drm display-subsystem: bound parallel-display-controller1 
>> (ops imx_pd_ops)
>> [3.859785] imx-drm display-subsystem: binding 
>> parallel-display-controller0 (ops imx_pd_ops)
>> [3.868561] imx-drm display-subsystem: failed to bind 
>> parallel-display-controller0 (ops imx_pd_ops): -517
>> [3.878225] imx-drm display-subsystem: Dropping the link to panel
>> [3.884679] imx_ldb_unbind
>> [3.887421] imx_ldb_unbind, encoder LVDS-46, ec0da388 ->func c0e6371c
>> [3.893950] imx_ldb_unbind, encoder (null), ec0da838 ->func 0
>> [3.899723] imx_ldb_unbind, done
> 
> component_unbind calls devres_release_group after component->ops-
>>unbind, freeing imx_ldb and with it the encoder that is still in the
> mode_config.encoder_list

Oh I see, I did not realize that component_unbind releases devres...

> 
>> [3.908345] imx_drm_bind, component_bind_all, ret -517
>> [3.913638] drm_mode_config_cleanup, encoder TMDS-44, ec861894 ->funcs 
>> c0e63a18
>> [3.921260] drm_mode_config_cleanup, calling encoder->funcs->destroy!
>> [3.927897] drm_encoder_cleanup, encoder ec861894
>> [3.932717] drm_mode_config_cleanup, encoder (null), ec0da388 ->funcs 0
> 
> Use after free.
> 

That all makes sense now...


>> [3.939356] drm_mode_config_cleanup, calling encoder->funcs->destroy!
>> [3.946050] Unable to handle kernel NULL pointer dereference at virtual 
>> address 0004
> [...]
>> The Apalis iMX6 device tree I work with uses a dumb VGA bridge and a LVDS
>> display directly specified in the ldb node. In the case above I did not
>> compile in the dumb VGA bridge driver, that is the reason why binding
>> parallel-display-controller0 fails.
> 
> I suppose this means that component drivers must not allocate the
> encoder in the component devres group during bind. Not only their own
> failure, but any other component's failure can cause component_unbind to
> be called in the cleanup path of component_bind_all, freeing the memory
> before drm_mode_config_cleanup is called.
> 

Indeed, moving allocation into driver bind fixes the crash!

I wonder, how does devres knows that imx_ldb got allocated in probe and does 
not free on unbind? I guess that is the group mechanism which makes sure of 
that?

ldb would not the only driver affected, also 
drivers/gpu/drm/imx/parallel-display.c allocates encoders in component bind, 
there are probably more.

Actually, when thinking more about it, if we would have a way to call framework 
cleanup functions (drm_mode_config_cleanup() in this case) between the bind and 
unbind loops in component_bind_all(), we would not have that problem.

[adding Russel, since he introduced the component infrastructure]

>> The LVDS encoder (0xec1a0388 in that case) is somehow already cleaned up
>> when calling drm_mode_config_cleanup(), which leads to a null pointer
>> deference when trying to call destroy.
>>
>> I was unable to figure out why the DRM encoder in the second
>> drm_mode_config_cleanup() call is already zeroed out. The component
>> framework calls imx_ldb_unbind() first, but that should not free
>> up the encoder afaict. Any idea?
>>
>> I was also wondering in the deferred probing case, functions like
>> imx_ldb_bind() call devm_kzalloc on every try. Since the underlying
>> device is not removed, doesn't this leads to multiple active allocations?
> 
> See above. To me it looks like we have to allocate imx_ldb in probe and
> deal with possibly partially preinitialzied structure when bind is
> called a second time.
> 

Yeah with the 

Re: drm/imx: Crash in drm_mode_config_cleanup

2018-09-13 Thread Philipp Zabel
Hi Stefan,

thank you for the report. I think what happens is the following:

On Wed, 2018-09-12 at 15:26 -0700, Stefan Agner wrote:
> Hi,
> 
> While working on Apalis iMX6 with four displays, I encountered the
> following crash:
> 
[...]
> [3.781163] imx-drm display-subsystem: bound 12.hdmi (ops 
> dw_hdmi_imx_ops)
> [3.788441] imx-drm display-subsystem: binding ldb (ops imx_ldb_ops)
> [3.794902] imx_ldb_bind, imx_ldb ec0da010

component_bind calls devres_open_group right before component->ops->bind

The devmem_kzalloc'ed imx_ldb_channel that contains the encoder is
allocated in this devres group.

> [3.799818] drm_encoder_init, encoder ec0da388
> [3.804363] drm_encoder_init, ret 0

drm_encoder_init adds the encoder in imx_ldb to the
mode_config.encoder_list.

> [3.807908] imx_ldb_register, encoder ec0da388
> [3.812432] imx_ldb_register, ret 0
> [3.815951] imx_ldb_bind encoder LVDS-46, ec0da388 ->func c0e6371c
> [3.87] imx_ldb_bind, done
> [3.825331] imx-drm display-subsystem: bound ldb (ops imx_ldb_ops)
> [3.831614] imx-drm display-subsystem: binding 
> parallel-display-controller1 (ops imx_pd_ops)
> [3.840285] drm_encoder_init, encoder ec8a2b78
> [3.844931] imx-drm display-subsystem: Linked as a consumer to panel
> [3.851472] imx-drm display-subsystem: bound parallel-display-controller1 
> (ops imx_pd_ops)
> [3.859785] imx-drm display-subsystem: binding 
> parallel-display-controller0 (ops imx_pd_ops)
> [3.868561] imx-drm display-subsystem: failed to bind 
> parallel-display-controller0 (ops imx_pd_ops): -517
> [3.878225] imx-drm display-subsystem: Dropping the link to panel
> [3.884679] imx_ldb_unbind
> [3.887421] imx_ldb_unbind, encoder LVDS-46, ec0da388 ->func c0e6371c
> [3.893950] imx_ldb_unbind, encoder (null), ec0da838 ->func 0
> [3.899723] imx_ldb_unbind, done

component_unbind calls devres_release_group after component->ops-
>unbind, freeing imx_ldb and with it the encoder that is still in the
mode_config.encoder_list

> [3.908345] imx_drm_bind, component_bind_all, ret -517
> [3.913638] drm_mode_config_cleanup, encoder TMDS-44, ec861894 ->funcs 
> c0e63a18
> [3.921260] drm_mode_config_cleanup, calling encoder->funcs->destroy!
> [3.927897] drm_encoder_cleanup, encoder ec861894
> [3.932717] drm_mode_config_cleanup, encoder (null), ec0da388 ->funcs 0

Use after free.

> [3.939356] drm_mode_config_cleanup, calling encoder->funcs->destroy!
> [3.946050] Unable to handle kernel NULL pointer dereference at virtual 
> address 0004
[...]
> The Apalis iMX6 device tree I work with uses a dumb VGA bridge and a LVDS
> display directly specified in the ldb node. In the case above I did not
> compile in the dumb VGA bridge driver, that is the reason why binding
> parallel-display-controller0 fails.

I suppose this means that component drivers must not allocate the
encoder in the component devres group during bind. Not only their own
failure, but any other component's failure can cause component_unbind to
be called in the cleanup path of component_bind_all, freeing the memory
before drm_mode_config_cleanup is called.

> The LVDS encoder (0xec1a0388 in that case) is somehow already cleaned up
> when calling drm_mode_config_cleanup(), which leads to a null pointer
> deference when trying to call destroy.
> 
> I was unable to figure out why the DRM encoder in the second
> drm_mode_config_cleanup() call is already zeroed out. The component
> framework calls imx_ldb_unbind() first, but that should not free
> up the encoder afaict. Any idea?
> 
> I was also wondering in the deferred probing case, functions like
> imx_ldb_bind() call devm_kzalloc on every try. Since the underlying
> device is not removed, doesn't this leads to multiple active allocations?

See above. To me it looks like we have to allocate imx_ldb in probe and
deal with possibly partially preinitialzied structure when bind is
called a second time.

regards
Philipp
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: drm/imx: Crash in drm_mode_config_cleanup

2018-09-12 Thread Stefan Agner
On 12.09.2018 15:26, Stefan Agner wrote:
> Hi,
> 
> While working on Apalis iMX6 with four displays, I encountered the
> following crash:
> 
> ...
> [2.978166] imx-drm display-subsystem: trying to bring up master
> [2.984368] imx-drm display-subsystem: Looking for component 0
> [2.990236] imx-drm display-subsystem: master has incomplete components
> [2.998774] imx-parallel-display parallel-display-controller0: adding 
> component (ops imx_pd_ops)
> [3.007697] imx-drm display-subsystem: trying to bring up master
> [3.013812] imx-drm display-subsystem: Looking for component 0
> [3.019679] imx-drm display-subsystem: master has incomplete components
> [3.027117] imx-parallel-display parallel-display-controller1: adding 
> component (ops imx_pd_ops)
> [3.036016] imx-drm display-subsystem: trying to bring up master
> [3.042128] imx-drm display-subsystem: Looking for component 0
> [3.047990] imx-drm display-subsystem: master has incomplete components
> [3.055667] imx_ldb_probe
> [3.058336] imx-ldb ldb: adding component (ops imx_ldb_ops)
> [3.064021] imx-drm display-subsystem: trying to bring up master
> [3.070061] imx-drm display-subsystem: Looking for component 0
> [3.075996] imx-drm display-subsystem: master has incomplete components
> [3.083712] dwhdmi-imx 12.hdmi: adding component (ops dw_hdmi_imx_ops)
> [3.090709] imx-drm display-subsystem: trying to bring up master
> [3.096747] imx-drm display-subsystem: Looking for component 0
> [3.102682] imx-drm display-subsystem: master has incomplete components
> [3.115134] panel-simple panel: Linked as a consumer to regulator.13
> [3.123207] etnaviv-gpu 13.gpu: adding component (ops gpu_ops)
> [3.129424] imx-drm display-subsystem: trying to bring up master
> [3.135550] imx-drm display-subsystem: Looking for component 0
> [3.141491] imx-drm display-subsystem: master has incomplete components
> [3.149460] etnaviv-gpu 134000.gpu: adding component (ops gpu_ops)
> [3.155772] imx-drm display-subsystem: trying to bring up master
> [3.161886] imx-drm display-subsystem: Looking for component 0
> [3.167751] imx-drm display-subsystem: master has incomplete components
> [3.175988] etnaviv-gpu 2204000.gpu: adding component (ops gpu_ops)
> [3.182386] imx-drm display-subsystem: trying to bring up master
> [3.188426] imx-drm display-subsystem: Looking for component 0
> [3.194367] imx-drm display-subsystem: master has incomplete components
> [3.202629] etnaviv etnaviv: trying to bring up master
> [3.207806] etnaviv etnaviv: Looking for component 0
> [3.212887] etnaviv etnaviv: found component 13.gpu, duplicate 0
> [3.219275] etnaviv etnaviv: Looking for component 1
> [3.224345] etnaviv etnaviv: found component 134000.gpu, duplicate 0
> [3.230796] etnaviv etnaviv: Looking for component 2
> [3.235791] etnaviv etnaviv: found component 2204000.gpu, duplicate 0
> [3.243093] etnaviv etnaviv: binding 13.gpu (ops gpu_ops)
> [3.250918] etnaviv etnaviv: bound 13.gpu (ops gpu_ops)
> [3.256547] etnaviv etnaviv: binding 134000.gpu (ops gpu_ops)
> [3.263052] etnaviv etnaviv: bound 134000.gpu (ops gpu_ops)
> [3.268678] etnaviv etnaviv: binding 2204000.gpu (ops gpu_ops)
> [3.275268] etnaviv etnaviv: bound 2204000.gpu (ops gpu_ops)
> [3.281064] etnaviv-gpu 13.gpu: model: GC2000, revision: 5108
> [3.299883] etnaviv-gpu 134000.gpu: model: GC320, revision: 5007
> [3.319255] etnaviv-gpu 2204000.gpu: model: GC355, revision: 1215
> [3.325491] etnaviv-gpu 2204000.gpu: Ignoring GPU with VG and FE2.0
> [3.334102] [drm] Initialized etnaviv 1.2.0 20151214 for etnaviv on minor 0
> [3.346005] imx-ipuv3-crtc imx-ipuv3-crtc.2: adding component (ops 
> ipu_crtc_ops)
> [3.353520] imx-drm display-subsystem: trying to bring up master
> [3.359596] imx-drm display-subsystem: Looking for component 0
> [3.365518] imx-drm display-subsystem: found component imx-ipuv3-crtc.2, 
> duplicate 0
> [3.373342] imx-drm display-subsystem: Looking for component 1
> [3.379206] imx-drm display-subsystem: master has incomplete components
> [3.386415] imx-ipuv3-crtc imx-ipuv3-crtc.3: adding component (ops 
> ipu_crtc_ops)
> [3.393901] imx-drm display-subsystem: trying to bring up master
> [3.399939] imx-drm display-subsystem: Looking for component 0
> [3.405852] imx-drm display-subsystem: Looking for component 1
> [3.411770] imx-drm display-subsystem: found component imx-ipuv3-crtc.3, 
> duplicate 0
> [3.419542] imx-drm display-subsystem: Looking for component 2
> [3.425456] imx-drm display-subsystem: master has incomplete components
> [3.432224] imx-ipuv3 240.ipu: IPUv3H probed
> [3.438945] imx-ipuv3-crtc imx-ipuv3-crtc.6: adding component (ops 
> ipu_crtc_ops)
> [3.446435] imx-drm display-subsystem: trying to bring up master
> [3.452526]