[DRM] drm_get_connector_name internal static string buffer causes a race

2014-03-25 Thread Daniel Vetter
On Tue, Mar 25, 2014 at 11:23:46AM +, Dmitry Malkin wrote:
> Hi,
> will you accept bash scripts for reloading driver/X/FLR for intel-gpu-tools 
> to uncover exists and future bugs besides those patches?

i-g-t/tests/drv_module_reload we have already, so basic coverage is there.
It it's run in our nightly test runs on drm-intel-nightly.

But that scripts tries very hard not to cause a race so more fancy stuff
is still needed. But until we have a more solid driver I don't think
there's much point for crazier scripts, it'll just cause headaches.
-Daniel


> 
> From: Daniel Vetter  on behalf of Daniel Vetter 
> 
> Sent: Tuesday, March 25, 2014 2:43 PM
> To: Dmitry Malkin
> Cc: Daniel Vetter; dri-devel at lists.freedesktop.org
> Subject: Re: [DRM] drm_get_connector_name internal static string buffer 
> causes a race
> 
> On Tue, Mar 25, 2014 at 08:08:23AM +, Dmitry Malkin wrote:
> > Hello, Daniel,
> >
> > Thank you for response. I've got a couple questions for you:
> >
> > 0. What do you think about making integration test with continuous 
> > reloading i915 driver and X server (with FLR between iteration)?
> > Simplified example for ubuntu (root required):
> 
> Module reloading is know to be horribly racy atm. Don't do that in any
> kind of stress-test situation before fixing up piles of bugs ;-)
> 
> Will mean: You'll uncover at _lot_ more than just this. Patches for all
> this highly welcome though.
> 
> Thanks, Daniel
> 
> >
> > #!/bin/bash
> > service lightdm stop
> > rmmod snd_hda_intel
> > echo -n ":00:02.0" > /sys/bus/pci/devices/\:00\:02.0/driver/unbind
> > rmmod i915
> > echo 1 > /sys/bus/pci/devices/\:00\:02.0/reset
> > modprobe i915
> > service lightdm start
> >
> > This will uncover next problems:
> > * sysfs add/remove i2c connectors (parent/child warning)
> > * drm static buffer races
> > * NX bit violation crash (see dump below)
> > * and bunch of DMAR problems
> >
> >
> > 1. Could you point me out git/branch with FIXME comments?
> >
> > 2. About kfree() problem: what if put string buffer onto stack of caller 
> > for drm_get_connector_name and drm_get_encoder_name?
> > It will be auto-removed and there is will be the patch about adding new 
> > param for these functions.
> > (yes the patch will be big and awful to read)
> >
> > === NX bit violation crash 
> > 
> > Mar 20 21:53:29 haswell01 kernel: [13447.100849] Console: switching to 
> > colour dummy device 80x25
> > Mar 20 21:53:29 haswell01 kernel: [13447.100950] drm_kms_helper: drm: 
> > unregistered panic notifier
> > Mar 20 21:53:29 haswell01 kernel: [13447.117785] waiting module removal not 
> > supported: please upgrade
> > Mar 20 21:53:29 haswell01 kernel: [13447.117880] [drm] Module unloaded
> > Mar 20 21:53:29 haswell01 kernel: [13447.118244] waiting module removal not 
> > supported: please upgrade
> > Mar 20 21:53:29 haswell01 kernel: [13447.118360] waiting module removal not 
> > supported: please upgradewaiting module removal not supported: please 
> > upgrade
> > Mar 20 21:53:39 haswell01 kernel: [13447.118590] waiting module removal not 
> > supported: please upgrade<6>[13457.263808] [drm] Initialized drm 1.1.0 
> > 20060810
> > Mar 20 21:53:39 haswell01 kernel: [13457.333992] kernel tried to execute 
> > NX-protected page - exploit attempt? (uid: 0)
> > Mar 20 21:53:39 haswell01 kernel: [13457.333996] BUG: unable to handle 
> > kernel paging request at 8803eb27fcb0
> > Mar 20 21:53:39 haswell01 kernel: [13457.333998] IP: [] 
> > 0x8803eb27fcb0
> > Mar 20 21:53:39 haswell01 kernel: [13457.334001] PGD 2fd9067 PUD 3e68c3063 
> > PMD 404902063 PTE 8003eb27f163
> > Mar 20 21:53:39 haswell01 kernel: [13457.334004] Oops: 0011 [#1] SMP
> > Mar 20 21:53:39 haswell01 kernel: [13457.334006] Modules linked in: i915(+) 
> > video drm_kms_helper drm i2c_algo_bit snd_hda_codec_hdmi rfcomm bnep 
> > bluetooth nls_iso8859_1 x86_pkg_temp_thermal intel_powerclamp coretemp 
> > kvm_intel snd_hda_codec_realtek kvm snd_hda_codec snd_hwdep snd_pcm 
> > snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq 
> > hid_generic crct10dif_pclmul snd_seq_device usbhid crc32_pclmul snd_timer 
> > ghash_clmulni_intel snd aesni_intel psmouse hid mei_me aes_x86_64 lrw ppdev 
> > gf128mul mei glue_helper parport_pc ablk_helper lp cryptd soundcore parport 
> > serio_raw lpc_ich mac_hid e1000e ahci ptp libahci pps_core [last unloaded: 
>

[DRM] drm_get_connector_name internal static string buffer causes a race

2014-03-25 Thread Jani Nikula
On Mon, 24 Mar 2014, Daniel Vetter  wrote:
> On Mon, Mar 24, 2014 at 12:06:21PM +, Dmitry Malkin wrote:
>> 
>> Hello guys,
>> 
>> I've been playing with reloading intel gfx driver (i915) in a cycle,
>> for a while, and at some point I've found a non-deterministic kernel
>> crash with a highly-variable iteration dependency -- 2 to 200 driver
>> reload iterations.
>> 
>> The apparent race is over the shared internal string buffer in
>> drm_get_connector_name().  It is mostly harmless, due to its results
>> being mostly used for log output, but in at least one case --
>> drm_sysfs_connector_add() -- this leads to a more critical error.
>> 
>> Race scenario:
>> 
>> - drm_sysfs_connector_add()
>>- drm_get_connector_name()
>> vs.
>> - anything that generates log messages involving DRM connectors
>>- drm_get_connector_name()
>> 
>>  (and many other from drm_crtc.c) shares with caller const char* to
>> internal static char buffer.  If something call it from other thread,
>> while main thread strore and use returned pointer it may overwrite
>> connector name.
>> 
>> Here are we go: registering HDMI connecter (drm_sysfs_connector_add
>> store and use pointer from drm_get_connector_name) and the same time
>> got VGA connector name down through the stack. (the second thread is
>> upowerd who watch continuously sysfs)
>
> Yeah, in my recent kerneldoc series I've noticed this too and added FIXME
> comments. There's a lot more than just those you've pointed out. The
> problem is that fixing these will be an awful lot of work since you need
> to add proper cleanup code (calling kfree()) to all the required places.
>
> So a full subsystem wide code audit for every single use-site of these.

It would be much easier and much much less error prone to add a name
field (either a fixed size or kmalloced buffer) in drm_connector and
drm_encoder structs, and initialize them in connector/encoder
init. AFAICT the name does not change during the objects' lifetime. None
of the use-sites would need to be changed.

The question is whether (number of connectors + number of encoders) * 32
bytes is too high a price to pay for the implementation simplicity. I
think not.

Another alternative to returning kmalloced strings is:

diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index 16ca28ed5ee8..85c8ed191daa 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -277,11 +277,10 @@ EXPORT_SYMBOL(drm_get_encoder_name);
  *
  * FIXME: This isn't really multithreading safe.
  */
-const char *drm_get_connector_name(const struct drm_connector *connector)
+const char *drm_get_connector_name(char *buf, size_t bufsize,
+  const struct drm_connector *connector)
 {
-   static char buf[32];
-
-   snprintf(buf, 32, "%s-%d",
+   snprintf(buf, bufsize, "%s-%d",
 drm_connector_enum_list[connector->connector_type].name,
 connector->connector_type_id);
return buf;

So the callers may use the kind of buffers they want. Something like
this may be needed for drm_get_format_name() anyway.


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center


[DRM] drm_get_connector_name internal static string buffer causes a race

2014-03-25 Thread Daniel Vetter
 FS:  7f3f392d0740() 
> GS:88041ea0() knlGS:
> Mar 20 21:53:39 haswell01 kernel: [13457.334046] CS:  0010 DS:  ES:  
> CR0: 80050033
> Mar 20 21:53:39 haswell01 kernel: [13457.334047] CR2: 8803eb27fcb0 CR3: 
> 0003e7358000 CR4: 001407f0
> Mar 20 21:53:39 haswell01 kernel: [13457.334048] DR0:  DR1: 
>  DR2: 
> Mar 20 21:53:39 haswell01 kernel: [13457.334050] DR3:  DR6: 
> fffe0ff0 DR7: 0400
> Mar 20 21:53:39 haswell01 kernel: [13457.334051] Stack:
> Mar 20 21:53:39 haswell01 kernel: [13457.334052]  a0213cb2 
> 8803f6016000 880408199000 880408199098
> Mar 20 21:53:39 haswell01 kernel: [13457.334054]  880406395b60 
> a0215b92 0004 880408199140
> Mar 20 21:53:39 haswell01 kernel: [13457.334057]  a03b9920 
> 880408199000  a03dd000
> Mar 20 21:53:39 haswell01 kernel: [13457.334059] Call Trace:
> Mar 20 21:53:39 haswell01 kernel: [13457.334067]  [] ? 
> drm_dev_register+0xa2/0x1e0 [drm]
> Mar 20 21:53:39 haswell01 kernel: [13457.334073]  [] 
> drm_get_pci_dev+0x92/0x140 [drm]
> Mar 20 21:53:39 haswell01 kernel: [13457.334082]  [] 
> i915_pci_probe+0x3c/0x90 [i915]
> Mar 20 21:53:39 haswell01 kernel: [13457.334086]  [] 
> local_pci_probe+0x45/0xa0
> Mar 20 21:53:39 haswell01 kernel: [13457.334088]  [] ? 
> pci_match_device+0xc5/0xd0
> Mar 20 21:53:39 haswell01 kernel: [13457.334090]  [] 
> pci_device_probe+0xd9/0x130
> Mar 20 21:53:39 haswell01 kernel: [13457.334093]  [] 
> driver_probe_device+0x125/0x3b0
> Mar 20 21:53:39 haswell01 kernel: [13457.334095]  [] 
> __driver_attach+0x93/0xa0
> Mar 20 21:53:39 haswell01 kernel: [13457.334098]  [] ? 
> __device_attach+0x40/0x40
> Mar 20 21:53:39 haswell01 kernel: [13457.334100]  [] 
> bus_for_each_dev+0x63/0xa0
> Mar 20 21:53:39 haswell01 kernel: [13457.334102]  [] 
> driver_attach+0x1e/0x20
> Mar 20 21:53:39 haswell01 kernel: [13457.334104]  [] 
> bus_add_driver+0x180/0x250
> Mar 20 21:53:39 haswell01 kernel: [13457.334106]  [] ? 
> 0xa03fdfff
> Mar 20 21:53:39 haswell01 kernel: [13457.334109]  [] 
> driver_register+0x64/0xf0
> Mar 20 21:53:39 haswell01 kernel: [13457.334110]  [] ? 
> 0xa03fdfff
> Mar 20 21:53:39 haswell01 kernel: [13457.334113]  [] 
> __pci_register_driver+0x4c/0x50
> Mar 20 21:53:39 haswell01 kernel: [13457.334117]  [] 
> drm_pci_init+0x11a/0x130 [drm]
> Mar 20 21:53:39 haswell01 kernel: [13457.334119]  [] ? 
> 0xa03fdfff
> Mar 20 21:53:39 haswell01 kernel: [13457.334127]  [] 
> i915_init+0x66/0x68 [i915]
> Mar 20 21:53:39 haswell01 kernel: [13457.334130]  [] 
> do_one_initcall+0xfa/0x1b0
> Mar 20 21:53:39 haswell01 kernel: [13457.334133]  [] ? 
> set_memory_nx+0x43/0x50
> Mar 20 21:53:39 haswell01 kernel: [13457.334137]  [] 
> load_module+0x2050/0x26f0
> Mar 20 21:53:39 haswell01 kernel: [13457.334139]  [] ? 
> store_uevent+0x40/0x40
> Mar 20 21:53:39 haswell01 kernel: [13457.334141]  [] 
> SyS_finit_module+0x86/0xb0
> Mar 20 21:53:39 haswell01 kernel: [13457.334144]  [] 
> tracesys+0xe1/0xe6
> Mar 20 21:53:39 haswell01 kernel: [13457.334145] Code: 00 00 00 00 00 00 00 
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 
> 00 00 00 00 00 00 00 00 00 00  b1 ca 81 ff ff ff ff b0 f4 27 eb 03 88 ff 
> ff ff ff ff 7f 00 
> Mar 20 21:53:39 haswell01 kernel: [13457.334164] RIP  [] 
> 0x8803eb27fcb0
> Mar 20 21:53:39 haswell01 kernel: [13457.334166]  RSP 
> Mar 20 21:53:39 haswell01 kernel: [13457.334167] CR2: 8803eb27fcb0
> Mar 20 21:53:39 haswell01 kernel: [13457.334168] ---[ end trace 
> e2598b1fc83a65fd ]---
> 
> 
> 
> From: Daniel Vetter  on behalf of Daniel Vetter 
> 
> Sent: Tuesday, March 25, 2014 12:31 AM
> To: Dmitry Malkin
> Cc: dri-devel at lists.freedesktop.org
> Subject: Re: [DRM] drm_get_connector_name internal static string buffer 
> causes a race
> 
> On Mon, Mar 24, 2014 at 12:06:21PM +, Dmitry Malkin wrote:
> >
> > Hello guys,
> >
> > I've been playing with reloading intel gfx driver (i915) in a cycle, for a 
> > while,
> > and at some point I've found a non-deterministic kernel crash with a 
> > highly-variable
> > iteration dependency -- 2 to 200 driver reload iterations.
> >
> > The apparent race is over the shared internal string buffer in 
> > drm_get_connector_name().
> > It is mostly harmless, due to its results being mostly used for log output, 
> > but in at least one
> > case  -- drm_sysfs_connector_add() -- this leads to a more critical error

[DRM] drm_get_connector_name internal static string buffer causes a race

2014-03-25 Thread Dmitry Malkin
Hi,
will you accept bash scripts for reloading driver/X/FLR for intel-gpu-tools to 
uncover exists and future bugs besides those patches?

From: Daniel Vetter  on behalf of Daniel Vetter 
<dan...@ffwll.ch>
Sent: Tuesday, March 25, 2014 2:43 PM
To: Dmitry Malkin
Cc: Daniel Vetter; dri-devel at lists.freedesktop.org
Subject: Re: [DRM] drm_get_connector_name internal static string buffer causes 
a race

On Tue, Mar 25, 2014 at 08:08:23AM +, Dmitry Malkin wrote:
> Hello, Daniel,
>
> Thank you for response. I've got a couple questions for you:
>
> 0. What do you think about making integration test with continuous reloading 
> i915 driver and X server (with FLR between iteration)?
> Simplified example for ubuntu (root required):

Module reloading is know to be horribly racy atm. Don't do that in any
kind of stress-test situation before fixing up piles of bugs ;-)

Will mean: You'll uncover at _lot_ more than just this. Patches for all
this highly welcome though.

Thanks, Daniel

>
> #!/bin/bash
> service lightdm stop
> rmmod snd_hda_intel
> echo -n ":00:02.0" > /sys/bus/pci/devices/\:00\:02.0/driver/unbind
> rmmod i915
> echo 1 > /sys/bus/pci/devices/\:00\:02.0/reset
> modprobe i915
> service lightdm start
>
> This will uncover next problems:
> * sysfs add/remove i2c connectors (parent/child warning)
> * drm static buffer races
> * NX bit violation crash (see dump below)
> * and bunch of DMAR problems
>
>
> 1. Could you point me out git/branch with FIXME comments?
>
> 2. About kfree() problem: what if put string buffer onto stack of caller for 
> drm_get_connector_name and drm_get_encoder_name?
> It will be auto-removed and there is will be the patch about adding new param 
> for these functions.
> (yes the patch will be big and awful to read)
>
> === NX bit violation crash 
> 
> Mar 20 21:53:29 haswell01 kernel: [13447.100849] Console: switching to colour 
> dummy device 80x25
> Mar 20 21:53:29 haswell01 kernel: [13447.100950] drm_kms_helper: drm: 
> unregistered panic notifier
> Mar 20 21:53:29 haswell01 kernel: [13447.117785] waiting module removal not 
> supported: please upgrade
> Mar 20 21:53:29 haswell01 kernel: [13447.117880] [drm] Module unloaded
> Mar 20 21:53:29 haswell01 kernel: [13447.118244] waiting module removal not 
> supported: please upgrade
> Mar 20 21:53:29 haswell01 kernel: [13447.118360] waiting module removal not 
> supported: please upgradewaiting module removal not supported: please upgrade
> Mar 20 21:53:39 haswell01 kernel: [13447.118590] waiting module removal not 
> supported: please upgrade<6>[13457.263808] [drm] Initialized drm 1.1.0 
> 20060810
> Mar 20 21:53:39 haswell01 kernel: [13457.333992] kernel tried to execute 
> NX-protected page - exploit attempt? (uid: 0)
> Mar 20 21:53:39 haswell01 kernel: [13457.333996] BUG: unable to handle kernel 
> paging request at 8803eb27fcb0
> Mar 20 21:53:39 haswell01 kernel: [13457.333998] IP: [] 
> 0x8803eb27fcb0
> Mar 20 21:53:39 haswell01 kernel: [13457.334001] PGD 2fd9067 PUD 3e68c3063 
> PMD 404902063 PTE 8003eb27f163
> Mar 20 21:53:39 haswell01 kernel: [13457.334004] Oops: 0011 [#1] SMP
> Mar 20 21:53:39 haswell01 kernel: [13457.334006] Modules linked in: i915(+) 
> video drm_kms_helper drm i2c_algo_bit snd_hda_codec_hdmi rfcomm bnep 
> bluetooth nls_iso8859_1 x86_pkg_temp_thermal intel_powerclamp coretemp 
> kvm_intel snd_hda_codec_realtek kvm snd_hda_codec snd_hwdep snd_pcm 
> snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq 
> hid_generic crct10dif_pclmul snd_seq_device usbhid crc32_pclmul snd_timer 
> ghash_clmulni_intel snd aesni_intel psmouse hid mei_me aes_x86_64 lrw ppdev 
> gf128mul mei glue_helper parport_pc ablk_helper lp cryptd soundcore parport 
> serio_raw lpc_ich mac_hid e1000e ahci ptp libahci pps_core [last unloaded: 
> video]
> Mar 20 21:53:39 haswell01 kernel: [13457.334031] CPU: 0 PID: 4974 Comm: 
> modprobe Not tainted 3.13.6 #10
> Mar 20 21:53:39 haswell01 kernel: [13457.334032] Hardware name:   
>/DQ87PG, BIOS PGQ8710H.86A.0144.2014.0113.1604 01/13/2014
> Mar 20 21:53:39 haswell01 kernel: [13457.334034] task: 88002e6c5fc0 ti: 
> 880406394000 task.ti: 880406394000
> Mar 20 21:53:39 haswell01 kernel: [13457.334035] RIP: 
> 0010:[]  [] 0x8803eb27fcb0
> Mar 20 21:53:39 haswell01 kernel: [13457.334037] RSP: 0018:880406395af0  
> EFLAGS: 00010282
> Mar 20 21:53:39 haswell01 kernel: [13457.334038] RAX: 8803eb27f4b0 RBX: 
> 8803f6016000 RCX: a0238630
> Mar 20 21:53:39 haswell01 kernel: [13457.334039] RDX: 8803eb27fcb0 RSI: 
> a03ba3c4 RDI: fff

[DRM] drm_get_connector_name internal static string buffer causes a race

2014-03-25 Thread Dmitry Malkin
054]  880406395b60 
a0215b92 0004 880408199140
Mar 20 21:53:39 haswell01 kernel: [13457.334057]  a03b9920 
880408199000  a03dd000
Mar 20 21:53:39 haswell01 kernel: [13457.334059] Call Trace:
Mar 20 21:53:39 haswell01 kernel: [13457.334067]  [] ? 
drm_dev_register+0xa2/0x1e0 [drm]
Mar 20 21:53:39 haswell01 kernel: [13457.334073]  [] 
drm_get_pci_dev+0x92/0x140 [drm]
Mar 20 21:53:39 haswell01 kernel: [13457.334082]  [] 
i915_pci_probe+0x3c/0x90 [i915]
Mar 20 21:53:39 haswell01 kernel: [13457.334086]  [] 
local_pci_probe+0x45/0xa0
Mar 20 21:53:39 haswell01 kernel: [13457.334088]  [] ? 
pci_match_device+0xc5/0xd0
Mar 20 21:53:39 haswell01 kernel: [13457.334090]  [] 
pci_device_probe+0xd9/0x130
Mar 20 21:53:39 haswell01 kernel: [13457.334093]  [] 
driver_probe_device+0x125/0x3b0
Mar 20 21:53:39 haswell01 kernel: [13457.334095]  [] 
__driver_attach+0x93/0xa0
Mar 20 21:53:39 haswell01 kernel: [13457.334098]  [] ? 
__device_attach+0x40/0x40
Mar 20 21:53:39 haswell01 kernel: [13457.334100]  [] 
bus_for_each_dev+0x63/0xa0
Mar 20 21:53:39 haswell01 kernel: [13457.334102]  [] 
driver_attach+0x1e/0x20
Mar 20 21:53:39 haswell01 kernel: [13457.334104]  [] 
bus_add_driver+0x180/0x250
Mar 20 21:53:39 haswell01 kernel: [13457.334106]  [] ? 
0xa03fdfff
Mar 20 21:53:39 haswell01 kernel: [13457.334109]  [] 
driver_register+0x64/0xf0
Mar 20 21:53:39 haswell01 kernel: [13457.334110]  [] ? 
0xa03fdfff
Mar 20 21:53:39 haswell01 kernel: [13457.334113]  [] 
__pci_register_driver+0x4c/0x50
Mar 20 21:53:39 haswell01 kernel: [13457.334117]  [] 
drm_pci_init+0x11a/0x130 [drm]
Mar 20 21:53:39 haswell01 kernel: [13457.334119]  [] ? 
0xa03fdfff
Mar 20 21:53:39 haswell01 kernel: [13457.334127]  [] 
i915_init+0x66/0x68 [i915]
Mar 20 21:53:39 haswell01 kernel: [13457.334130]  [] 
do_one_initcall+0xfa/0x1b0
Mar 20 21:53:39 haswell01 kernel: [13457.334133]  [] ? 
set_memory_nx+0x43/0x50
Mar 20 21:53:39 haswell01 kernel: [13457.334137]  [] 
load_module+0x2050/0x26f0
Mar 20 21:53:39 haswell01 kernel: [13457.334139]  [] ? 
store_uevent+0x40/0x40
Mar 20 21:53:39 haswell01 kernel: [13457.334141]  [] 
SyS_finit_module+0x86/0xb0
Mar 20 21:53:39 haswell01 kernel: [13457.334144]  [] 
tracesys+0xe1/0xe6
Mar 20 21:53:39 haswell01 kernel: [13457.334145] Code: 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 
00 00 00 00 00 00 00 00 00  b1 ca 81 ff ff ff ff b0 f4 27 eb 03 88 ff ff ff 
ff ff 7f 00 
Mar 20 21:53:39 haswell01 kernel: [13457.334164] RIP  [] 
0x8803eb27fcb0
Mar 20 21:53:39 haswell01 kernel: [13457.334166]  RSP 
Mar 20 21:53:39 haswell01 kernel: [13457.334167] CR2: 8803eb27fcb0
Mar 20 21:53:39 haswell01 kernel: [13457.334168] ---[ end trace 
e2598b1fc83a65fd ]---



From: Daniel Vetter  on behalf of Daniel Vetter 
<dan...@ffwll.ch>
Sent: Tuesday, March 25, 2014 12:31 AM
To: Dmitry Malkin
Cc: dri-devel at lists.freedesktop.org
Subject: Re: [DRM] drm_get_connector_name internal static string buffer causes 
a race

On Mon, Mar 24, 2014 at 12:06:21PM +, Dmitry Malkin wrote:
>
> Hello guys,
>
> I've been playing with reloading intel gfx driver (i915) in a cycle, for a 
> while,
> and at some point I've found a non-deterministic kernel crash with a 
> highly-variable
> iteration dependency -- 2 to 200 driver reload iterations.
>
> The apparent race is over the shared internal string buffer in 
> drm_get_connector_name().
> It is mostly harmless, due to its results being mostly used for log output, 
> but in at least one
> case  -- drm_sysfs_connector_add() -- this leads to a more critical error.
>
> Race scenario:
>
> - drm_sysfs_connector_add()
>- drm_get_connector_name()
> vs.
> - anything that generates log messages involving DRM connectors
>- drm_get_connector_name()
>
>  (and many other from drm_crtc.c) shares with caller const char* to internal 
> static char buffer.
> If something call it from other thread, while main thread strore and use 
> returned pointer it may overwrite connector name.
>
> Here are we go: registering HDMI connecter  (drm_sysfs_connector_add store 
> and use pointer from drm_get_connector_name)
> and the same time got VGA connector name down through the stack. (the second 
> thread is upowerd who watch continuously sysfs)

Yeah, in my recent kerneldoc series I've noticed this too and added FIXME
comments. There's a lot more than just those you've pointed out. The
problem is that fixing these will be an awful lot of work since you need
to add proper cleanup code (calling kfree()) to all the required places.

So a full subsystem wide code audit for every single use-site of these.
-Daniel


[DRM] drm_get_connector_name internal static string buffer causes a race

2014-03-24 Thread Daniel Vetter
On Mon, Mar 24, 2014 at 12:06:21PM +, Dmitry Malkin wrote:
> 
> Hello guys,
> 
> I've been playing with reloading intel gfx driver (i915) in a cycle, for a 
> while,
> and at some point I've found a non-deterministic kernel crash with a 
> highly-variable
> iteration dependency -- 2 to 200 driver reload iterations.
> 
> The apparent race is over the shared internal string buffer in 
> drm_get_connector_name().
> It is mostly harmless, due to its results being mostly used for log output, 
> but in at least one
> case  -- drm_sysfs_connector_add() -- this leads to a more critical error.
> 
> Race scenario:
> 
> - drm_sysfs_connector_add()
>- drm_get_connector_name()
> vs.
> - anything that generates log messages involving DRM connectors
>- drm_get_connector_name()
> 
>  (and many other from drm_crtc.c) shares with caller const char* to internal 
> static char buffer.
> If something call it from other thread, while main thread strore and use 
> returned pointer it may overwrite connector name.
> 
> Here are we go: registering HDMI connecter  (drm_sysfs_connector_add store 
> and use pointer from drm_get_connector_name)
> and the same time got VGA connector name down through the stack. (the second 
> thread is upowerd who watch continuously sysfs)

Yeah, in my recent kerneldoc series I've noticed this too and added FIXME
comments. There's a lot more than just those you've pointed out. The
problem is that fixing these will be an awful lot of work since you need
to add proper cleanup code (calling kfree()) to all the required places.

So a full subsystem wide code audit for every single use-site of these.
-Daniel

> 
> Mar 24 14:23:04 haswell01 kernel: [  417.570043] [ cut here 
> ]
> Mar 24 14:23:04 haswell01 kernel: [  417.570045] WARNING: CPU: 0 PID: 12700 
> at /build/buildd/linux-3.13.0/lib/kobject.c:223 
> kobject_add_internal+0x224/0x330()
> Mar 24 14:23:04 haswell01 kernel: [  417.570046] kobject_add_internal failed 
> for card0-VGA-1 with -EEXIST, don't try to register things with the same name 
> in the same directory.
> Mar 24 14:23:04 haswell01 kernel: [  417.570047] Modules linked in: i915(+) 
> video drm_kms_helper drm i2c_algo_bit snd_hda_codec_realtek 
> snd_hda_codec_hdmi bnep rfcomm bluetooth x86_pkg_temp_thermal intel_p
> owerclamp coretemp kvm_intel snd_hda_codec kvm snd_hwdep snd_pcm hid_generic 
> snd_page_alloc crct10dif_pclmul snd_seq_midi crc32_pclmul ghash_clmulni_intel 
> snd_seq_midi_event usbhid snd_rawmidi hid aesni_in
> tel aes_x86_64 lrw gf128mul ppdev glue_helper ablk_helper cryptd snd_seq 
> snd_seq_device snd_timer snd mei_me psmouse lpc_ich soundcore mei mac_hid 
> parport_pc serio_raw nls_iso8859_1 lp parport e1000e ahci
> ptp libahci pps_core [last unloaded: video]
> Mar 24 14:23:04 haswell01 kernel: [  417.570068] CPU: 0 PID: 12700 Comm: 
> modprobe Tainted: GW3.13.0-19-generic #39-Ubuntu
> Mar 24 14:23:04 haswell01 kernel: [  417.570069] Hardware name:   
>/DQ87PG, BIOS PGQ8710H.86A.0144.2014.0113.1604 01/13/2014
> Mar 24 14:23:04 haswell01 kernel: [  417.570069]  0009 
> 8804051295f8 81711075 880405129640
> Mar 24 14:23:04 haswell01 kernel: [  417.570071]  880405129630 
> 810662cd 88040776a410 ffef
> Mar 24 14:23:04 haswell01 kernel: [  417.570074]   
> 8804048dcc10 880407769000 880405129690
> Mar 24 14:23:04 haswell01 kernel: [  417.570076] Call Trace:
> Mar 24 14:23:04 haswell01 kernel: [  417.570078]  [] 
> dump_stack+0x45/0x56
> Mar 24 14:23:04 haswell01 kernel: [  417.570080]  [] 
> warn_slowpath_common+0x7d/0xa0
> Mar 24 14:23:04 haswell01 kernel: [  417.570081]  [] 
> warn_slowpath_fmt+0x4c/0x50
> Mar 24 14:23:04 haswell01 kernel: [  417.570082]  [] ? 
> sysfs_create_dir_ns+0x73/0xc0
> Mar 24 14:23:04 haswell01 kernel: [  417.570084]  [] 
> kobject_add_internal+0x224/0x330
> Mar 24 14:23:04 haswell01 kernel: [  417.570086]  [] 
> kobject_add+0x65/0xb0
> Mar 24 14:23:04 haswell01 kernel: [  417.570088]  [] 
> device_add+0x125/0x640
> Mar 24 14:23:04 haswell01 kernel: [  417.570090]  [] 
> device_create_groups_vargs+0xe0/0x110
> Mar 24 14:23:04 haswell01 kernel: [  417.570092]  [] 
> device_create+0x41/0x50
> Mar 24 14:23:04 haswell01 kernel: [  417.570097]  [] 
> drm_sysfs_connector_add+0x69/0x230 [drm]
> Mar 24 14:23:04 haswell01 kernel: [  417.570110]  [] 
> intel_hdmi_init_connector+0x111/0x260 [i915]
> Mar 24 14:23:04 haswell01 kernel: [  417.570119]  [] 
> intel_ddi_init+0x270/0x2a0 [i915]
> Mar 24 14:23:04 haswell01 kernel: [  417.570130]  [] 
> intel_setup_outputs+0x4c6/0x750 [i915]
> Mar 24 14:23:04 haswell01 kernel: [  417.570139]  [] 
> intel_modeset_init+0x607/0x8f0 [i915]
> Mar 24 14:23:04 haswell01 kernel: [  417.570147]  [] 
> i915_driver_load+0xbb4/0xe70 [i915]
> Mar 24 14:23:04 haswell01 kernel: [  417.570153]  [] 
> drm_dev_register+0xa2/0x1e0 [drm]
> Mar 24 14:23:04 haswell01 kernel: [  417.570158]  [] 

[DRM] drm_get_connector_name internal static string buffer causes a race

2014-03-24 Thread Dmitry Malkin

Hello guys,

I've been playing with reloading intel gfx driver (i915) in a cycle, for a 
while,
and at some point I've found a non-deterministic kernel crash with a 
highly-variable
iteration dependency -- 2 to 200 driver reload iterations.

The apparent race is over the shared internal string buffer in 
drm_get_connector_name().
It is mostly harmless, due to its results being mostly used for log output, but 
in at least one
case  -- drm_sysfs_connector_add() -- this leads to a more critical error.

Race scenario:

- drm_sysfs_connector_add()
   - drm_get_connector_name()
vs.
- anything that generates log messages involving DRM connectors
   - drm_get_connector_name()

 (and many other from drm_crtc.c) shares with caller const char* to internal 
static char buffer.
If something call it from other thread, while main thread strore and use 
returned pointer it may overwrite connector name.

Here are we go: registering HDMI connecter  (drm_sysfs_connector_add store and 
use pointer from drm_get_connector_name)
and the same time got VGA connector name down through the stack. (the second 
thread is upowerd who watch continuously sysfs)

Mar 24 14:23:04 haswell01 kernel: [  417.570043] [ cut here 
]
Mar 24 14:23:04 haswell01 kernel: [  417.570045] WARNING: CPU: 0 PID: 12700 at 
/build/buildd/linux-3.13.0/lib/kobject.c:223 kobject_add_internal+0x224/0x330()
Mar 24 14:23:04 haswell01 kernel: [  417.570046] kobject_add_internal failed 
for card0-VGA-1 with -EEXIST, don't try to register things with the same name 
in the same directory.
Mar 24 14:23:04 haswell01 kernel: [  417.570047] Modules linked in: i915(+) 
video drm_kms_helper drm i2c_algo_bit snd_hda_codec_realtek snd_hda_codec_hdmi 
bnep rfcomm bluetooth x86_pkg_temp_thermal intel_p
owerclamp coretemp kvm_intel snd_hda_codec kvm snd_hwdep snd_pcm hid_generic 
snd_page_alloc crct10dif_pclmul snd_seq_midi crc32_pclmul ghash_clmulni_intel 
snd_seq_midi_event usbhid snd_rawmidi hid aesni_in
tel aes_x86_64 lrw gf128mul ppdev glue_helper ablk_helper cryptd snd_seq 
snd_seq_device snd_timer snd mei_me psmouse lpc_ich soundcore mei mac_hid 
parport_pc serio_raw nls_iso8859_1 lp parport e1000e ahci
ptp libahci pps_core [last unloaded: video]
Mar 24 14:23:04 haswell01 kernel: [  417.570068] CPU: 0 PID: 12700 Comm: 
modprobe Tainted: GW3.13.0-19-generic #39-Ubuntu
Mar 24 14:23:04 haswell01 kernel: [  417.570069] Hardware name: 
 /DQ87PG, BIOS PGQ8710H.86A.0144.2014.0113.1604 01/13/2014
Mar 24 14:23:04 haswell01 kernel: [  417.570069]  0009 
8804051295f8 81711075 880405129640
Mar 24 14:23:04 haswell01 kernel: [  417.570071]  880405129630 
810662cd 88040776a410 ffef
Mar 24 14:23:04 haswell01 kernel: [  417.570074]   
8804048dcc10 880407769000 880405129690
Mar 24 14:23:04 haswell01 kernel: [  417.570076] Call Trace:
Mar 24 14:23:04 haswell01 kernel: [  417.570078]  [] 
dump_stack+0x45/0x56
Mar 24 14:23:04 haswell01 kernel: [  417.570080]  [] 
warn_slowpath_common+0x7d/0xa0
Mar 24 14:23:04 haswell01 kernel: [  417.570081]  [] 
warn_slowpath_fmt+0x4c/0x50
Mar 24 14:23:04 haswell01 kernel: [  417.570082]  [] ? 
sysfs_create_dir_ns+0x73/0xc0
Mar 24 14:23:04 haswell01 kernel: [  417.570084]  [] 
kobject_add_internal+0x224/0x330
Mar 24 14:23:04 haswell01 kernel: [  417.570086]  [] 
kobject_add+0x65/0xb0
Mar 24 14:23:04 haswell01 kernel: [  417.570088]  [] 
device_add+0x125/0x640
Mar 24 14:23:04 haswell01 kernel: [  417.570090]  [] 
device_create_groups_vargs+0xe0/0x110
Mar 24 14:23:04 haswell01 kernel: [  417.570092]  [] 
device_create+0x41/0x50
Mar 24 14:23:04 haswell01 kernel: [  417.570097]  [] 
drm_sysfs_connector_add+0x69/0x230 [drm]
Mar 24 14:23:04 haswell01 kernel: [  417.570110]  [] 
intel_hdmi_init_connector+0x111/0x260 [i915]
Mar 24 14:23:04 haswell01 kernel: [  417.570119]  [] 
intel_ddi_init+0x270/0x2a0 [i915]
Mar 24 14:23:04 haswell01 kernel: [  417.570130]  [] 
intel_setup_outputs+0x4c6/0x750 [i915]
Mar 24 14:23:04 haswell01 kernel: [  417.570139]  [] 
intel_modeset_init+0x607/0x8f0 [i915]
Mar 24 14:23:04 haswell01 kernel: [  417.570147]  [] 
i915_driver_load+0xbb4/0xe70 [i915]
Mar 24 14:23:04 haswell01 kernel: [  417.570153]  [] 
drm_dev_register+0xa2/0x1e0 [drm]
Mar 24 14:23:04 haswell01 kernel: [  417.570158]  [] 
drm_get_pci_dev+0x92/0x140 [drm]
Mar 24 14:23:04 haswell01 kernel: [  417.570166]  [] 
i915_pci_probe+0x3c/0x90 [i915]
Mar 24 14:23:04 haswell01 kernel: [  417.570168]  [] 
local_pci_probe+0x45/0xa0
Mar 24 14:23:04 haswell01 kernel: [  417.570170]  [] ? 
pci_match_device+0xc5/0xd0
Mar 24 14:23:04 haswell01 kernel: [  417.570172]  [] 
pci_device_probe+0xd9/0x130
Mar 24 14:23:04 haswell01 kernel: [  417.570174]  [] 
driver_probe_device+0x125/0x3b0
Mar 24 14:23:04 haswell01 kernel: [  417.570176]  [] 
__driver_attach+0x93/0xa0
Mar 24 14:23:04 haswell01 kernel: [  417.570178]  [] ? 
__device_attach+0x40/0x40
Mar 24