Re: [PATCH] drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind

2024-03-28 Thread Dixit, Ashutosh
On Wed, 27 Mar 2024 13:37:19 -0700, Dixit, Ashutosh wrote:
>
> On Wed, 27 Mar 2024 02:15:27 -0700, Krzysztofik, Janusz wrote:
> >
>
> Hi Janusz,
>
> > For me, that still doesn't explain why you think that i915->hwmon reset to
> > NULL on i915 driver unregister can be the root cause of the reported UAF in
> > hwmon sysfs and this patch is going to fix that UAF issue.  I can see no
> > references to i915->hwmon in that code, and even then, that's not NULL what 
> > is
> > reported here as non-canonical address.  The code is using a no longer valid
> > pointer to an already freed (and poisoned) memory.
>
> Correct, I said basically the same thing in my reply to the patch. That the
> patch doesn't explain that ddat seems to have been freed and poisoned.
>
> > > > I think that instead of dropping i915_hwmon_unregister() we have to 
> > > > actually
> > > > unregister hwmon in the body of that function, which is called from
> > > > i915_driver_unregister() intended for closing user interfaces, then 
> > > > called
> > > > relatively early during driver unbind, so hwmon sysfs entries disappear 
> > > > before
> > > > i915 structures, especially uncore used by hwmon code, are freed.
> > >
> > > You mean uncore are freed before hwmon get unregistered, I don't think
> > > so. uncore is also drm device managed resource so in sequence I think it
> > > should be freed after i915 dev managed resources freed.
> >
> > If both uncore and hwmon are managed resources of i915 device then how can 
> > you
> > predict which of them is released first?
>
> Look at __hwmon_device_register. Here we see:
>
>   hdev->parent = dev
>
> So the hwmon device is a child device of the drm device (against which ddat
> is devm_kzalloc'd). Since a child device holds a reference against the
> parent (device_add() has get_device(dev->parent)), I would expect the hwmon
> device to disappear before the parent drm device. And I am assuming hwmon
> sysfs is linked to the hwmon device, so sysfs should disappear before ddat
> getting freed. But apparently this is not what is happening, so there's
> still something we are missing.

I looked into this a bit more. It doesn't seem to have to do with device
references. Basically, managed resources are maintained in a list and they
are removed in a reverse order from which they are added. See
release_nodes(). I believe (haven't really checked it but it looks to be
the case from the logs below) the call stack during release is:

device_release_driver ->
 __device_release_driver ->
  device_unbind_cleanup ->
   devres_release_all ->
release_nodes

With CONFIG_DEBUG_DEVRES=y and 'echo 1 > /sys/module/devres/parameters/log'
and running the selftests I always see:

Alloc:

[  878.406556] i915 :03:00.0: DEVRES ADD 88815d8d4000 
devm_kzalloc_release (616 bytes)
[  878.407632] i915 :03:00.0: DEVRES ADD 888199e6dd40 
devm_hwmon_release (8 bytes)

Dealloc:

[  879.317635] i915 :03:00.0: DEVRES REL 888199e6dd40 
devm_hwmon_release (8 bytes)
[  879.318261] i915 :03:00.0: DEVRES REL 88815d8d4000 
devm_kzalloc_release (616 bytes)

So from the above the memory is always freed after hwmon is
unregistered. 616 bytes is 'sizeof(struct i915_hwmon)'.

So the code seems to be correct. Unless something else is going on these CI
systems where we see this crash.

If we can repro the crash locally (as I described in my other email) that
would be the best. Then we can verify any fix. If we cannot repro it
locally, then we'll need to think of something else. Basically the
direction would be to see if we eliminate devm_ and if that makes the issue
go away.

Thanks.
--
Ashutosh


Re: [PATCH] drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind

2024-03-28 Thread Krzysztofik, Janusz
Hi Ashutosh,

On Wednesday, 27 March 2024 21:37:19 CET Dixit, Ashutosh wrote:
> On Wed, 27 Mar 2024 02:15:27 -0700, Krzysztofik, Janusz wrote:
> >
> 
> Hi Janusz,
> 
> > For me, that still doesn't explain why you think that i915->hwmon reset to
> > NULL on i915 driver unregister can be the root cause of the reported UAF in
> > hwmon sysfs and this patch is going to fix that UAF issue.  I can see no
> > references to i915->hwmon in that code, and even then, that's not NULL what 
> > is
> > reported here as non-canonical address.  The code is using a no longer valid
> > pointer to an already freed (and poisoned) memory.
> 
> Correct, I said basically the same thing in my reply to the patch. That the
> patch doesn't explain that ddat seems to have been freed and poisoned.
> 
> > > > I think that instead of dropping i915_hwmon_unregister() we have to 
> > > > actually
> > > > unregister hwmon in the body of that function, which is called from
> > > > i915_driver_unregister() intended for closing user interfaces, then 
> > > > called
> > > > relatively early during driver unbind, so hwmon sysfs entries disappear 
> > > > before
> > > > i915 structures, especially uncore used by hwmon code, are freed.
> > >
> > > You mean uncore are freed before hwmon get unregistered, I don't think
> > > so. uncore is also drm device managed resource so in sequence I think it
> > > should be freed after i915 dev managed resources freed.
> >
> > If both uncore and hwmon are managed resources of i915 device then how can 
> > you
> > predict which of them is released first?
> 
> Look at __hwmon_device_register. Here we see:
> 
>   hdev->parent = dev
> 
> So the hwmon device is a child device of the drm device (against which ddat
> is devm_kzalloc'd). Since a child device holds a reference against the
> parent (device_add() has get_device(dev->parent)), I would expect the hwmon
> device to disappear before the parent drm device. 

OK, but the parent device has a lot of managed resources, not only hwmon, and 
for me hwmon apparently depends on at least one of those resources, e.g., a 
structure with pointers to hardware accessors or hardware registers.  Release 
of all those i915 resources is not atomic, then it may happen that a resource 
which hwmon depends on is released while the hwmon resource still active and 
has its sysfs interface exposed, I believe.

> And I am assuming hwmon
> sysfs is linked to the hwmon device, so sysfs should disappear before ddat
> getting freed. But apparently this is not what is happening, 

If that occurred the actual scenario of the failure, i.e., from still user 
reachable code of our sysfs accessors, dev_get_drvdata(dev) pointing to the 
ddat field of an already freed hwmon structure, then I would blame hwmon core 
for that, not our usage of it.

Thanks,
Janusz

> so there's
> still something we are missing.
> 
> Thanks.
> --
> Ashutosh
> 

-
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial 
Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | 
Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z 
dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w transakcjach 
handlowych.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i 
moze zawierac informacje poufne. W razie przypadkowego otrzymania tej 
wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; 
jakiekolwiek przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole 
use of the intended recipient(s). If you are not the intended recipient, please 
contact the sender and delete all copies; any review or distribution by others 
is strictly prohibited.


Re: [PATCH] drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind

2024-03-27 Thread Dixit, Ashutosh
On Wed, 27 Mar 2024 02:15:27 -0700, Krzysztofik, Janusz wrote:
>

Hi Janusz,

> For me, that still doesn't explain why you think that i915->hwmon reset to
> NULL on i915 driver unregister can be the root cause of the reported UAF in
> hwmon sysfs and this patch is going to fix that UAF issue.  I can see no
> references to i915->hwmon in that code, and even then, that's not NULL what is
> reported here as non-canonical address.  The code is using a no longer valid
> pointer to an already freed (and poisoned) memory.

Correct, I said basically the same thing in my reply to the patch. That the
patch doesn't explain that ddat seems to have been freed and poisoned.

> > > I think that instead of dropping i915_hwmon_unregister() we have to 
> > > actually
> > > unregister hwmon in the body of that function, which is called from
> > > i915_driver_unregister() intended for closing user interfaces, then called
> > > relatively early during driver unbind, so hwmon sysfs entries disappear 
> > > before
> > > i915 structures, especially uncore used by hwmon code, are freed.
> >
> > You mean uncore are freed before hwmon get unregistered, I don't think
> > so. uncore is also drm device managed resource so in sequence I think it
> > should be freed after i915 dev managed resources freed.
>
> If both uncore and hwmon are managed resources of i915 device then how can you
> predict which of them is released first?

Look at __hwmon_device_register. Here we see:

hdev->parent = dev

So the hwmon device is a child device of the drm device (against which ddat
is devm_kzalloc'd). Since a child device holds a reference against the
parent (device_add() has get_device(dev->parent)), I would expect the hwmon
device to disappear before the parent drm device. And I am assuming hwmon
sysfs is linked to the hwmon device, so sysfs should disappear before ddat
getting freed. But apparently this is not what is happening, so there's
still something we are missing.

Thanks.
--
Ashutosh


Re: [PATCH] drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind

2024-03-27 Thread Dixit, Ashutosh
On Tue, 26 Mar 2024 05:48:38 -0700, Badal Nilawar wrote:
>

Hi Badal,

> i915_hwmon and its resources are managed resources of i915 dev.
> During i915 driver unregister flow the function i915_hwmon_unregister()
> explicitly makes i915_hwmon resource NULL. This happen before
> hwmon is actually unregistered. Doing so may cause UAF if hwmon
> sysfs is being accessed:

I don't agree with this patch. For even if we remove setting to NULL, it
doesn't explain what we see, which is that the devm_kzalloc'd i915->hwmon
has been *freed* (since ddat (i915->hwmon->ddat) is 0x6b6b6b6b6b6b6dbf):

(gdb) list *hwm_energy+0x2b
0x161f8b is in hwm_energy (drivers/gpu/drm/i915/i915_hwmon.c:134).
129 struct hwm_energy_info *ei = >ei;
130 intel_wakeref_t wakeref;
131 i915_reg_t rgaddr;
132 u32 reg_val;
133
134 if (ddat->gt_n >= 0)
135 rgaddr = hwmon->rg.energy_status_tile;
136 else
137 rgaddr = hwmon->rg.energy_status_all;
138

If we don't have i915_hwmon_unregister equivalent in xe, that's because xe
has not implemented equivalents of i915_hwmon_power_max_disable and
i915_hwmon_power_max_restore, where the check is used.

Also, we should verify any fix before sending a patch out, either via local
repro (say use a script to read hwmon energy file while running the
selftests) or CI. Though in this case CI doesn't show any failures but I am
wondering if that is only by chance.

Fwiu, the issue is somehow being caused by ddat being *freed* before hwmon
sysfs has disappeared, at module unload time (since selftests use module
unload), allowing prometheus-node process to call into the energy read
sysfs and cause the crash.

Thanks.
--
Ashutosh


>
> <7> [518.386591] i915 :03:00.0: [drm] intel_gt_set_wedged called from 
> intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
> <7> [518.471128] i915 :03:00.0: [drm:drm_client_release] drm_fb_helper
> <4> [518.501476] general protection fault, probably for non-canonical address 
> 0x6b6b6b6b6b6b6dbf:  [#1] PREEMPT SMP NOPTI
> <4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G U   
>   6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
> <4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client 
> Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 
> 05/27/2023
> <4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
> <4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 
> f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 
> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
> <4> [518.559746] RSP: 0018:c900077efd00 EFLAGS: 00010202
> <4> [518.564943] RAX:  RBX: 8881e3078428 RCX: 
> 
> <4> [518.572025] RDX: 0001 RSI: c900077efda0 RDI: 
> 6b6b6b6b
> <4> [518.579107] RBP: c900077efd40 R08: c900077efda0 R09: 
> 0001
> <4> [518.586186] R10: 0001 R11: 88810c19aa00 R12: 
> 888243e1a010
> <4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: 
> 8881e3078428
> <4> [518.600343] FS:  7f9def400700() GS:8d10() 
> knlGS:
> <4> [518.608373] CS:  0010 DS:  ES:  CR0: 80050033
> <4> [518.614077] CR2: 564f19bff288 CR3: 000119f94000 CR4: 
> 00f50ef0
> <4> [518.621158] PKRU: 5554
> <4> [518.623858] Call Trace:
> <4> [518.626303]  
> <4> [518.628400]  ? __die_body+0x1a/0x60
> <4> [518.631881]  ? die_addr+0x38/0x60
> <4> [518.635182]  ? exc_general_protection+0x1a1/0x400
> <4> [518.639862]  ? asm_exc_general_protection+0x26/0x30
> <4> [518.644710]  ? hwm_energy+0x2b/0x100 [i915]
> <4> [518.649007]  hwm_read+0x9a/0x310 [i915]
> <4> [518.652953]  hwmon_attr_show+0x36/0x140
> <4> [518.656775]  dev_attr_show+0x15/0x60
>
> Fixes: b3b088e28183 ("drm/i915/hwmon: Add HWMON infrastructure")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10366
> Cc: Ashutosh Dixit 
> Cc: Janusz Krzysztofik 
> Signed-off-by: Badal Nilawar 
> ---
>  drivers/gpu/drm/i915/i915_driver.c | 2 --
>  drivers/gpu/drm/i915/i915_hwmon.c  | 5 -
>  2 files changed, 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_driver.c 
> b/drivers/gpu/drm/i915/i915_driver.c
> index 4b9233c07a22..a95b455964b7 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -660,8 +660,6 @@ static void i915_driver_unregister(struct 
> drm_i915_private *dev_priv)
>   for_each_gt(gt, dev_priv, i)
>   intel_gt_driver_unregister(gt);
>
> - i915_hwmon_unregister(dev_priv);
> -
>   i915_perf_unregister(dev_priv);
>   i915_pmu_unregister(dev_priv);
>
> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c 
> b/drivers/gpu/drm/i915/i915_hwmon.c
> index c9169e56b9a1..91f171752d34 

Re: [PATCH] drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind

2024-03-27 Thread Krzysztofik, Janusz
Hi Badal,

On Wednesday, 27 March 2024 05:30:17 CET Nilawar, Badal wrote:
> 
> On 27-03-2024 02:58, Krzysztofik, Janusz wrote:
> > On Tuesday, 26 March 2024 13:48:38 CET Badal Nilawar wrote:
> >> i915_hwmon and its resources are managed resources of i915 dev.
> >> During i915 driver unregister flow the function i915_hwmon_unregister()
> >> explicitly makes i915_hwmon resource NULL. This happen before
> >> hwmon is actually unregistered. Doing so may cause UAF if hwmon
> >> sysfs is being accessed:
> >>
> >> <7> [518.386591] i915 :03:00.0: [drm] intel_gt_set_wedged called from 
> >> intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
> >> <7> [518.471128] i915 :03:00.0: [drm:drm_client_release] drm_fb_helper
> >> <4> [518.501476] general protection fault, probably for non-canonical 
> >> address 0x6b6b6b6b6b6b6dbf:  [#1] PREEMPT SMP NOPTI
> >> <4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G U
> >>  6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
> >> <4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client 
> >> Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 
> >> 05/27/2023
> >> <4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
> >> <4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 
> >> e4 f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b 
> >> <45> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
> >> <4> [518.559746] RSP: 0018:c900077efd00 EFLAGS: 00010202
> >> <4> [518.564943] RAX:  RBX: 8881e3078428 RCX: 
> >> 
> >> <4> [518.572025] RDX: 0001 RSI: c900077efda0 RDI: 
> >> 6b6b6b6b
> >> <4> [518.579107] RBP: c900077efd40 R08: c900077efda0 R09: 
> >> 0001
> >> <4> [518.586186] R10: 0001 R11: 88810c19aa00 R12: 
> >> 888243e1a010
> >> <4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: 
> >> 8881e3078428
> >> <4> [518.600343] FS:  7f9def400700() GS:8d10() 
> >> knlGS:
> >> <4> [518.608373] CS:  0010 DS:  ES:  CR0: 80050033
> >> <4> [518.614077] CR2: 564f19bff288 CR3: 000119f94000 CR4: 
> >> 00f50ef0
> >> <4> [518.621158] PKRU: 5554
> >> <4> [518.623858] Call Trace:
> >> <4> [518.626303]  
> >> <4> [518.628400]  ? __die_body+0x1a/0x60
> >> <4> [518.631881]  ? die_addr+0x38/0x60
> >> <4> [518.635182]  ? exc_general_protection+0x1a1/0x400
> >> <4> [518.639862]  ? asm_exc_general_protection+0x26/0x30
> >> <4> [518.644710]  ? hwm_energy+0x2b/0x100 [i915]
> >> <4> [518.649007]  hwm_read+0x9a/0x310 [i915]
> >> <4> [518.652953]  hwmon_attr_show+0x36/0x140
> >> <4> [518.656775]  dev_attr_show+0x15/0x60
> > 
> > I don't think that's a good example of what you are talking about in your
> > commit description.  I haven't looked throughout the i915 code to find out 
> > who
> > actually uses that i915->hwmon pointer and when, but while looking at the
> > hwmon code that now fails on sysfs access, I haven't noticed any use of
> > i915->hwmon.
> 
> i915_hwmon is main structure and I think issue is ddat is invalid.
> 
> struct hwm_drvdata {
>  struct i915_hwmon *hwmon;
>  struct intel_uncore *uncore;
>  struct device *hwmon_dev;
>  struct hwm_energy_info ei;  /*  Energy info for energy1_input */
>  char name[12];
>  int gt_n;
>  bool reset_in_progress;
>  wait_queue_head_t waitq;
> };
> 
> struct i915_hwmon {
>  struct hwm_drvdata ddat;
>  struct hwm_drvdata ddat_gt[I915_MAX_GT];
>  struct mutex hwmon_lock;/* counter overflow logic and rmw */
>  struct hwm_reg rg;
>  int scl_shift_power;
>  int scl_shift_energy;
>  int scl_shift_time;
> };
> 
> (gdb) list *hwm_energy+0x2b
> 0x161f8b is in hwm_energy (drivers/gpu/drm/i915/i915_hwmon.c:134).
> 129 struct hwm_energy_info *ei = >ei;
> 130 intel_wakeref_t wakeref;
> 131 i915_reg_t rgaddr;
> 132 u32 reg_val;
> 133
> 134 if (ddat->gt_n >= 0)
> 135 rgaddr = hwmon->rg.energy_status_tile;
> 136 else
> 137 rgaddr = hwmon->rg.energy_status_all;
> 138

For me, that still doesn't explain why you think that i915->hwmon reset to 
NULL on i915 driver unregister can be the root cause of the reported UAF in 
hwmon sysfs and this patch is going to fix that UAF issue.  I can see no 
references to i915->hwmon in that code, and even then, that's not NULL what is 
reported here as non-canonical address.  The code is using a no longer valid 
pointer to an already freed (and poisoned) memory.

> 
> > 
> > I think that instead of dropping i915_hwmon_unregister() we have to actually
> > unregister hwmon in the body of that function, which is called from
> > i915_driver_unregister() intended for closing user interfaces, then called
> > 

Re: [PATCH] drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind

2024-03-26 Thread Nilawar, Badal




On 27-03-2024 02:58, Krzysztofik, Janusz wrote:

On Tuesday, 26 March 2024 13:48:38 CET Badal Nilawar wrote:

i915_hwmon and its resources are managed resources of i915 dev.
During i915 driver unregister flow the function i915_hwmon_unregister()
explicitly makes i915_hwmon resource NULL. This happen before
hwmon is actually unregistered. Doing so may cause UAF if hwmon
sysfs is being accessed:

<7> [518.386591] i915 :03:00.0: [drm] intel_gt_set_wedged called from 
intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
<7> [518.471128] i915 :03:00.0: [drm:drm_client_release] drm_fb_helper
<4> [518.501476] general protection fault, probably for non-canonical address 
0x6b6b6b6b6b6b6dbf:  [#1] PREEMPT SMP NOPTI
<4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G U 
6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
<4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client 
Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
<4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
<4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 f0 48 83 
ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 8b bd 54 02 00 00 
49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
<4> [518.559746] RSP: 0018:c900077efd00 EFLAGS: 00010202
<4> [518.564943] RAX:  RBX: 8881e3078428 RCX: 

<4> [518.572025] RDX: 0001 RSI: c900077efda0 RDI: 
6b6b6b6b
<4> [518.579107] RBP: c900077efd40 R08: c900077efda0 R09: 
0001
<4> [518.586186] R10: 0001 R11: 88810c19aa00 R12: 
888243e1a010
<4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: 
8881e3078428
<4> [518.600343] FS:  7f9def400700() GS:8d10() 
knlGS:
<4> [518.608373] CS:  0010 DS:  ES:  CR0: 80050033
<4> [518.614077] CR2: 564f19bff288 CR3: 000119f94000 CR4: 
00f50ef0
<4> [518.621158] PKRU: 5554
<4> [518.623858] Call Trace:
<4> [518.626303]  
<4> [518.628400]  ? __die_body+0x1a/0x60
<4> [518.631881]  ? die_addr+0x38/0x60
<4> [518.635182]  ? exc_general_protection+0x1a1/0x400
<4> [518.639862]  ? asm_exc_general_protection+0x26/0x30
<4> [518.644710]  ? hwm_energy+0x2b/0x100 [i915]
<4> [518.649007]  hwm_read+0x9a/0x310 [i915]
<4> [518.652953]  hwmon_attr_show+0x36/0x140
<4> [518.656775]  dev_attr_show+0x15/0x60


I don't think that's a good example of what you are talking about in your
commit description.  I haven't looked throughout the i915 code to find out who
actually uses that i915->hwmon pointer and when, but while looking at the
hwmon code that now fails on sysfs access, I haven't noticed any use of
i915->hwmon.


i915_hwmon is main structure and I think issue is ddat is invalid.

struct hwm_drvdata {
struct i915_hwmon *hwmon;
struct intel_uncore *uncore;
struct device *hwmon_dev;
struct hwm_energy_info ei;  /*  Energy info for energy1_input */
char name[12];
int gt_n;
bool reset_in_progress;
wait_queue_head_t waitq;
};

struct i915_hwmon {
struct hwm_drvdata ddat;
struct hwm_drvdata ddat_gt[I915_MAX_GT];
struct mutex hwmon_lock;/* counter overflow logic and rmw */
struct hwm_reg rg;
int scl_shift_power;
int scl_shift_energy;
int scl_shift_time;
};

(gdb) list *hwm_energy+0x2b
0x161f8b is in hwm_energy (drivers/gpu/drm/i915/i915_hwmon.c:134).
129 struct hwm_energy_info *ei = >ei;
130 intel_wakeref_t wakeref;
131 i915_reg_t rgaddr;
132 u32 reg_val;
133
134 if (ddat->gt_n >= 0)
135 rgaddr = hwmon->rg.energy_status_tile;
136 else
137 rgaddr = hwmon->rg.energy_status_all;
138



I think that instead of dropping i915_hwmon_unregister() we have to actually
unregister hwmon in the body of that function, which is called from
i915_driver_unregister() intended for closing user interfaces, then called
relatively early during driver unbind, so hwmon sysfs entries disappear before
i915 structures, especially uncore used by hwmon code, are freed.


You mean uncore are freed before hwmon get unregistered, I don't think 
so. uncore is also drm device managed resource so in sequence I think it 
should be freed after i915 dev managed resources freed.


 871 static int intel_gt_tile_setup(struct intel_gt *gt, phys_addr_t 
phys_addr)

 872 {
 873 int ret;
 874
 875 if (!gt_is_root(gt)) {
 876 struct intel_uncore *uncore;
 877 spinlock_t *irq_lock;
 878
 879 uncore = drmm_kzalloc(>i915->drm, 
sizeof(*uncore), GFP_KERNEL);

 880 if (!uncore)
 881 return -ENOMEM;
 882
 883 irq_lock = drmm_kzalloc(>i915->drm, 
sizeof(*irq_lock), GFP_KERNEL);

 884 if 

Re: [PATCH] drm/i915/hwmon: Remove i915_hwmon_unregister() during driver unbind

2024-03-26 Thread Krzysztofik, Janusz
On Tuesday, 26 March 2024 13:48:38 CET Badal Nilawar wrote:
> i915_hwmon and its resources are managed resources of i915 dev.
> During i915 driver unregister flow the function i915_hwmon_unregister()
> explicitly makes i915_hwmon resource NULL. This happen before
> hwmon is actually unregistered. Doing so may cause UAF if hwmon
> sysfs is being accessed:
> 
> <7> [518.386591] i915 :03:00.0: [drm] intel_gt_set_wedged called from 
> intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
> <7> [518.471128] i915 :03:00.0: [drm:drm_client_release] drm_fb_helper
> <4> [518.501476] general protection fault, probably for non-canonical address 
> 0x6b6b6b6b6b6b6dbf:  [#1] PREEMPT SMP NOPTI
> <4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G U   
>   6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
> <4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client 
> Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 
> 05/27/2023
> <4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
> <4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 
> f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 
> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
> <4> [518.559746] RSP: 0018:c900077efd00 EFLAGS: 00010202
> <4> [518.564943] RAX:  RBX: 8881e3078428 RCX: 
> 
> <4> [518.572025] RDX: 0001 RSI: c900077efda0 RDI: 
> 6b6b6b6b
> <4> [518.579107] RBP: c900077efd40 R08: c900077efda0 R09: 
> 0001
> <4> [518.586186] R10: 0001 R11: 88810c19aa00 R12: 
> 888243e1a010
> <4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: 
> 8881e3078428
> <4> [518.600343] FS:  7f9def400700() GS:8d10() 
> knlGS:
> <4> [518.608373] CS:  0010 DS:  ES:  CR0: 80050033
> <4> [518.614077] CR2: 564f19bff288 CR3: 000119f94000 CR4: 
> 00f50ef0
> <4> [518.621158] PKRU: 5554
> <4> [518.623858] Call Trace:
> <4> [518.626303]  
> <4> [518.628400]  ? __die_body+0x1a/0x60
> <4> [518.631881]  ? die_addr+0x38/0x60
> <4> [518.635182]  ? exc_general_protection+0x1a1/0x400
> <4> [518.639862]  ? asm_exc_general_protection+0x26/0x30
> <4> [518.644710]  ? hwm_energy+0x2b/0x100 [i915]
> <4> [518.649007]  hwm_read+0x9a/0x310 [i915]
> <4> [518.652953]  hwmon_attr_show+0x36/0x140
> <4> [518.656775]  dev_attr_show+0x15/0x60

I don't think that's a good example of what you are talking about in your 
commit description.  I haven't looked throughout the i915 code to find out who 
actually uses that i915->hwmon pointer and when, but while looking at the 
hwmon code that now fails on sysfs access, I haven't noticed any use of 
i915->hwmon.

I think that instead of dropping i915_hwmon_unregister() we have to actually 
unregister hwmon in the body of that function, which is called from 
i915_driver_unregister() intended for closing user interfaces, then called 
relatively early during driver unbind, so hwmon sysfs entries disappear before 
i915 structures, especially uncore used by hwmon code, are freed.

Thanks,
Janusz

> 
> Fixes: b3b088e28183 ("drm/i915/hwmon: Add HWMON infrastructure")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10366
> Cc: Ashutosh Dixit 
> Cc: Janusz Krzysztofik 
> Signed-off-by: Badal Nilawar 
> ---
>  drivers/gpu/drm/i915/i915_driver.c | 2 --
>  drivers/gpu/drm/i915/i915_hwmon.c  | 5 -
>  2 files changed, 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_driver.c 
> b/drivers/gpu/drm/i915/i915_driver.c
> index 4b9233c07a22..a95b455964b7 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -660,8 +660,6 @@ static void i915_driver_unregister(struct 
> drm_i915_private *dev_priv)
>   for_each_gt(gt, dev_priv, i)
>   intel_gt_driver_unregister(gt);
>  
> - i915_hwmon_unregister(dev_priv);
> -
>   i915_perf_unregister(dev_priv);
>   i915_pmu_unregister(dev_priv);
>  
> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c 
> b/drivers/gpu/drm/i915/i915_hwmon.c
> index c9169e56b9a1..91f171752d34 100644
> --- a/drivers/gpu/drm/i915/i915_hwmon.c
> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> @@ -841,8 +841,3 @@ void i915_hwmon_register(struct drm_i915_private *i915)
>   ddat_gt->hwmon_dev = hwmon_dev;
>   }
>  }
> -
> -void i915_hwmon_unregister(struct drm_i915_private *i915)
> -{
> - fetch_and_zero(>hwmon);
> -}
> 

-
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial 
Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | 
Kapital zakladowy 200.000 PLN.
Spolka oswiadcza, ze posiada status duzego przedsiebiorcy w rozumieniu ustawy z 
dnia 8 marca 2013 r. o przeciwdzialaniu nadmiernym opoznieniom w