Re: [RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-02-01 Thread Marco Pagani



On 2024-01-30 21:47, Luis Chamberlain wrote:
> On Tue, Jan 30, 2024 at 08:36:14PM +0100, Marco Pagani wrote:
>> The current implementation of try_module_get() requires the module to
>> exist and be live as a precondition. While this may seem intuitive at
>> first glance, enforcing the precondition can be tricky, considering that
>> modules can be unloaded at any time if not previously taken. For
>> instance, the caller could be preempted just before calling
>> try_module_get(), and while preempted, the module could be unloaded and
>> freed. More subtly, the module could also be unloaded at any point while
>> executing try_module_get() before incrementing the refount with
>> atomic_inc_not_zero().
>>
>> Neglecting the precondition that the module must exist and be live can
>> cause unexpected race conditions that can lead to crashes. However,
>> ensuring that the precondition is met may require additional locking
>> that increases the complexity of the code and can make it more
>> error-prone.
>>
>> This patch adds a slower yet safer implementation of try_module_get()
>> that checks if the module is valid by looking into the mod_tree before
>> taking the module's refcount. This new function can be safely called on
>> stale and invalid module pointers, relieving developers from the burden
>> of ensuring that the module exists and is live before attempting to take
>> it.
>>
>> The tree lookup and refcount increment are executed after taking the
>> module_mutex to prevent the module from being unloaded after looking up
>> the tree.
>>
>> Signed-off-by: Marco Pagani 
> 
> It very much sounds like there is a desire to have this but without a
> user, there is no justification.

I was working on a set of patches to fix an issue in the fpga subsystem
when I came across your commit 557aafac1153 ("kernel/module: add
documentation for try_module_get()") that made me realize we also had a
safety problem. 

To solve this problem for the fpga manager, we had to add a mutex to
ensure the low-level module still exists before calling
try_module_get(). However, having a safer version of try_module_get()
would have simplified the code and made it more robust against changes.

https://lore.kernel.org/linux-fpga/2024060242.149265-1-marpa...@redhat.com/

I suspect there may be other cases where try_module_get() is
inadvertently called without ensuring that the module still exists
that may benefit from a safer implementation.

>> +bool try_module_get_safe(struct module *module)
>> +{
>> +struct module *mod;
>> +bool ret = true;
>> +
>> +if (!module)
>> +goto out;
>> +
>> +mutex_lock(_mutex);
> 
> If a user comes around then this should be mutex_lock_interruptible(),
> and add might_sleep()

Would it be okay to return false if it gets interrupted, or should I
change the return type to int to propagate -EINTR? My concern with
changing the signature is that it would be less straightforward to
use the function in place of try_module_get().

>> +
>> +/*
>> + * Check if the address points to a valid live module and take
>> + * the refcount only if it points to the module struct.
>> + */
>> +mod = __module_address((unsigned long)module);
>> +if (mod && mod == module && module_is_live(mod))
>> +__module_get(mod);
>> +else
>> +ret = false;
>> +
>> +mutex_unlock(_mutex);
>> +
>> +out:
>> +return ret;
>> +}
>> +EXPORT_SYMBOL(try_module_get_safe);
> 
> And EXPORT_SYMBOL_GPL() would need to be used.

Okay, I initially used EXPORT_SYMBOL() to be compatible with
try_module_get().

> 
> I'd also expect selftests to be expanded for this case, but again,
> without a user, this is just trying to resolve a problem which does not
> exist.

I can add selftests in the next versions.
Thanks,
Marco




[RFC PATCH] kernel/module: add a safer implementation of try_module_get()

2024-01-30 Thread Marco Pagani
The current implementation of try_module_get() requires the module to
exist and be live as a precondition. While this may seem intuitive at
first glance, enforcing the precondition can be tricky, considering that
modules can be unloaded at any time if not previously taken. For
instance, the caller could be preempted just before calling
try_module_get(), and while preempted, the module could be unloaded and
freed. More subtly, the module could also be unloaded at any point while
executing try_module_get() before incrementing the refount with
atomic_inc_not_zero().

Neglecting the precondition that the module must exist and be live can
cause unexpected race conditions that can lead to crashes. However,
ensuring that the precondition is met may require additional locking
that increases the complexity of the code and can make it more
error-prone.

This patch adds a slower yet safer implementation of try_module_get()
that checks if the module is valid by looking into the mod_tree before
taking the module's refcount. This new function can be safely called on
stale and invalid module pointers, relieving developers from the burden
of ensuring that the module exists and is live before attempting to take
it.

The tree lookup and refcount increment are executed after taking the
module_mutex to prevent the module from being unloaded after looking up
the tree.

Signed-off-by: Marco Pagani 
---
 include/linux/module.h | 15 +++
 kernel/module/main.c   | 27 +++
 2 files changed, 42 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 08364d5cbc07..86b6ea43d204 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -695,6 +695,19 @@ extern void __module_get(struct module *module);
  */
 extern bool try_module_get(struct module *module);
 
+/**
+ * try_module_get_safe() - safely take the refcount of a module.
+ * @module: address of the module to be taken.
+ *
+ * Safer version of try_module_get(). Check first if the module exists and is 
alive,
+ * and then take its reference count.
+ *
+ * Return:
+ * * %true - module exists and its refcount has been incremented or module is 
NULL.
+ * * %false - module does not exist.
+ */
+extern bool try_module_get_safe(struct module *module);
+
 /**
  * module_put() - release a reference count to a module
  * @module: the module we should release a reference count for
@@ -815,6 +828,8 @@ static inline bool try_module_get(struct module *module)
return true;
 }
 
+#define try_module_get_safe(module) try_module_get(module)
+
 static inline void module_put(struct module *module)
 {
 }
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 98fedfdb8db5..22376b69778c 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -842,6 +842,33 @@ bool try_module_get(struct module *module)
 }
 EXPORT_SYMBOL(try_module_get);
 
+bool try_module_get_safe(struct module *module)
+{
+   struct module *mod;
+   bool ret = true;
+
+   if (!module)
+   goto out;
+
+   mutex_lock(_mutex);
+
+   /*
+* Check if the address points to a valid live module and take
+* the refcount only if it points to the module struct.
+*/
+   mod = __module_address((unsigned long)module);
+   if (mod && mod == module && module_is_live(mod))
+   __module_get(mod);
+   else
+   ret = false;
+
+   mutex_unlock(_mutex);
+
+out:
+   return ret;
+}
+EXPORT_SYMBOL(try_module_get_safe);
+
 void module_put(struct module *module)
 {
int ret;

base-commit: 4515d08a742c76612b65d2f47a87d12860519842
-- 
2.43.0




[PATCH] kernel/module: improve documentation for try_module_get()

2023-12-21 Thread Marco Pagani
The sentence "this call will fail if the module is already being
removed" is potentially confusing and may contradict the rest of the
documentation. If one tries to get a module that has already been
removed using a stale pointer, the kernel will crash.

Signed-off-by: Marco Pagani 
---
 include/linux/module.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index a98e188cf37b..08364d5cbc07 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -668,7 +668,7 @@ extern void __module_get(struct module *module);
  * @module: the module we should check for
  *
  * Only try to get a module reference count if the module is not being removed.
- * This call will fail if the module is already being removed.
+ * This call will fail if the module is in the process of being removed.
  *
  * Care must also be taken to ensure the module exists and is alive prior to
  * usage of this call. This can be gauranteed through two means:
-- 
2.43.0