from:"Rasmus Villemoes"

[PATCH] docs: admin-guide: update description for kernel.modprobe sysctl

2021-04-20 Thread Rasmus Villemoes

When I added CONFIG_MODPROBE_PATH, I neglected to update
Documentation/. It's still true that this defaults to /sbin/modprobe,
but now via a level of indirection. So document that the kernel might
have been built with something other than /sbin/modprobe as the
initial value.

Signed-off-by: Rasmus Villemoes 
---

Andrew, this is a followup or a fixup, however you want to handle it,
to "modules: add CONFIG_MODPROBE_PATH" (in next-20210419 known as
8bc50a36278dbf3e14b25236e3acee25f75d5bd8). I.e., please add this as-is
to your patch queue, or fold this into that one.

 Documentation/admin-guide/sysctl/kernel.rst | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst 
b/Documentation/admin-guide/sysctl/kernel.rst
index 1d56a6b73a4e..7ca8df5451d4 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -483,10 +483,11 @@ modprobe
 
 
 The full path to the usermode helper for autoloading kernel modules,
-by default "/sbin/modprobe".  This binary is executed when the kernel
-requests a module.  For example, if userspace passes an unknown
-filesystem type to mount(), then the kernel will automatically request
-the corresponding filesystem module by executing this usermode helper.
+by default ``CONFIG_MODPROBE_PATH``, which in turn defaults to
+"/sbin/modprobe".  This binary is executed when the kernel requests a
+module.  For example, if userspace passes an unknown filesystem type
+to mount(), then the kernel will automatically request the
+corresponding filesystem module by executing this usermode helper.
 This usermode helper should insert the needed module into the kernel.
 
 This sysctl only affects module autoloading.  It has no effect on the
-- 
2.29.2

[PATCH] docs: admin-guide: update description for kernel.hotplug sysctl

2021-04-20 Thread Rasmus Villemoes

It's been a few releases since this defaulted to /sbin/hotplug. Update
the text, and include pointers to the two CONFIG_UEVENT_HELPER{,_PATH}
config knobs whose help text could provide more info, but also hint
that the user probably doesn't need to care at all.

Fixes: 7934779a69f1 ("Driver-Core: disable /sbin/hotplug by default")
Signed-off-by: Rasmus Villemoes 
---
 Documentation/admin-guide/sysctl/kernel.rst | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst 
b/Documentation/admin-guide/sysctl/kernel.rst
index 1d56a6b73a4e..c24f57f2c782 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -333,7 +333,12 @@ hotplug
 ===
 
 Path for the hotplug policy agent.
-Default value is "``/sbin/hotplug``".
+Default value is ``CONFIG_UEVENT_HELPER_PATH``, which in turn defaults
+to the empty string.
+
+This file only exists when ``CONFIG_UEVENT_HELPER`` is enabled. Most
+modern systems rely exclusively on the netlink-based uevent source and
+don't need this.
 
 
 hung_task_all_cpu_backtrace
-- 
2.29.2

Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-20 Thread Rasmus Villemoes

On 17/04/2021 01.07, Matthew Wilcox (Oracle) wrote:
> 32-bit architectures which expect 8-byte alignment for 8-byte integers
> and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct
> page inadvertently expanded in 2019.  When the dma_addr_t was added,
> it forced the alignment of the union to 8 bytes, which inserted a 4 byte
> gap between 'flags' and the union.
> 
> Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
> This restores the alignment to that of an unsigned long, and also fixes a
> potential problem where (on a big endian platform), the bit used to denote
> PageTail could inadvertently get set, and a racing get_user_pages_fast()
> could dereference a bogus compound_head().
> 
> Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---
>  include/linux/mm_types.h |  4 ++--
>  include/net/page_pool.h  | 12 +++-
>  net/core/page_pool.c | 12 +++-
>  3 files changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6613b26a8894..5aacc1c10a45 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -97,10 +97,10 @@ struct page {
>   };
>   struct {/* page_pool used by netstack */
>   /**
> -  * @dma_addr: might require a 64-bit value even on
> +  * @dma_addr: might require a 64-bit value on
>* 32-bit architectures.
>*/
> - dma_addr_t dma_addr;
> + unsigned long dma_addr[2];

Shouldn't that member get another name (_dma_addr?) to be sure the
buildbots catch every possible (ab)user and get them turned into the new
accessors? Sure, page->dma_addr is now "pointer to unsigned long"
instead of "dma_addr_t", but who knows if there's a
"(long)page->dma_addr" somewhere?

Rasmus

[PATCH] driver core: make device_set_deferred_probe_reason a no-op when !CONFIG_DEBUG_FS

2021-04-19 Thread Rasmus Villemoes

When debugfs is not enabled, the deferred_probe_reason string is never
read. So there's no reason to spend time and memory on recording it.

There's still a bunch of redundant kfree(NULL) calls and NULL
assignments, but this gives most of the benefit (avoiding two
vsnprintf() and a kmalloc()) for the minimal amount of ifdeffery.

Signed-off-by: Rasmus Villemoes 
---
 drivers/base/dd.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 37a5e5f8b221..6a197336c6a4 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -216,9 +216,13 @@ void device_unblock_probing(void)
  * device_set_deferred_probe_reason() - Set defer probe reason message for 
device
  * @dev: the pointer to the struct device
  * @vaf: the pointer to va_format structure with message
+ *
+ * The ->deferred_probe_reason string is only ever read when debugfs
+ * is enabled, so this is a no-op for !CONFIG_DEBUG_FS.
  */
 void device_set_deferred_probe_reason(const struct device *dev, struct 
va_format *vaf)
 {
+#ifdef CONFIG_DEBUG_FS
const char *drv = dev_driver_string(dev);
 
mutex_lock(_probe_mutex);
@@ -227,6 +231,7 @@ void device_set_deferred_probe_reason(const struct device 
*dev, struct va_format
dev->p->deferred_probe_reason = kasprintf(GFP_KERNEL, "%s: %pV", drv, 
vaf);
 
mutex_unlock(_probe_mutex);
+#endif
 }
 
 /*
-- 
2.29.2

Re: [PATCH 05/15] x86: Implement function_nocfi

2021-04-19 Thread Rasmus Villemoes

On 17/04/2021 00.28, Kees Cook wrote:
> On Fri, Apr 16, 2021 at 03:06:17PM -0700, Andy Lutomirski wrote:

>> The
>> foo symbol would point to whatever magic is needed.
> 
> No, the symbol points to the jump table entry. Direct calls get minimal
> overhead and indirect calls can add the "is this function in the right
> table" checking.
> 
> 
> But note that this shouldn't turn into a discussion of "maybe Clang could
> do CFI differently"; this is what Clang has.

Why not? In particular, I'd really like somebody to answer the question
"why not just store a cookie before each address-taken or
external-linkage function?".

(1) direct calls get _no_ overhead, not just "minimal".
(2) address comparison, intra- or inter-module, Just Works, no need to
disable one sanity check to get others.
(3) The overhead and implementation of the signature check is the same
for inter- and intra- module calls.
(4) passing (unsigned long)sym into asm code or stashing it in a table
Just Works.

How much code would be needed on the clang side to provide a
--cfi-mode=inline ?

The only issue, AFAICT, which is also a problem being handled in the
current proposal, is indirect calls to asm code from C. They either have
to be instrumented to not do any checking (which requires callers to
know that the pointer they have might point to an asm-implementation),
or the asm code could be taught to emit those cookies as well - which
would provide even better coverage. Something like

CFI_COOKIE(foo)
foo:
  ...

with CFI_COOKIE expanding to nothing when !CONFIG_CFI, and otherwise to
(something like) ".quad cfi_cookie__foo" ; with some appropriate Kbuild
and compiler magic to generate a table of cfi_cookie__* defines from a
header file with the prototypes.

Rasmus

Re: [PATCH v5] printk: Userspace format enumeration support

2021-04-19 Thread Rasmus Villemoes

On 16/04/2021 15.56, Chris Down wrote:
> Hey Petr, Rasmus,

>> This is great point! There are many other subsystem specific wrappers,
>> e,g, ata_dev_printk(), netdev_printk(), snd_printk(), dprintk().
>> We should make it easy to index them as well.
> 
> These would be nice to have, but we should agree about how we store
> things internally.
> 
> For example, in printk we typically store the level inline as part of
> the format string at compile time. However, for `dev_printk`, it's
> passed entirely separately from the format string after preprocessing is
> already concluded (or at least, not in a way we can easily parse it the
> same way we do for printk()):
> 
> void dev_printk(const char *level, const struct device *dev, const
> char *fmt, ...)

Hm, yeah, for "naked" dev_printk() calls there's no easy way to grab the
level, for dev_err and friends it's somewhat easier as you could just
hook into the definition of the dev_err macro. I'm not saying you need
to handle everything at once, but doing dev_err and netdev_err would get
you a very long way

> One (ugly) way to handle this would be to have a new "level" field in
> the printk index entry, with semantics that if it's some sentinel value,
> look at the format itself for the format, otherwise if it's some other
> value, the level field itself is the level.
>
> This will work, but it's pretty ugly. Any better suggestions? :-)

Well, that was more or less exactly what I suggested when I wrote

> One could also record the function a format is being used with - without
> that, the display probably can't show a reasonable  for those
> dev_* function.

But, I think the real question is, why are we/you interested in the
level at all? Isn't the format string itself enough for the purpose of
tracking which printks have come and gone? IOW, what about, on the
display side, simply skipping over some KERN_* prefix if present?

Rasmus

Re: [PATCH 1/2] workqueue: Have 'alloc_workqueue()' like macros accept a format specifier

2021-04-19 Thread Rasmus Villemoes

On 18/04/2021 23.26, Christophe JAILLET wrote:
> Improve 'create_workqueue', 'create_freezable_workqueue' and
> 'create_singlethread_workqueue' so that they accept a format
> specifier and a variable number of arguments.
> 
> This will put these macros more in line with 'alloc_ordered_workqueue' and
> the underlying 'alloc_workqueue()' function.
> 
> This will also allow further code simplification.
> 
> Suggested-by: Bart Van Assche 
> Signed-off-by: Christophe JAILLET 
> ---
>  include/linux/workqueue.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index d15a7730ee18..145e756ff315 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -425,13 +425,13 @@ struct workqueue_struct *alloc_workqueue(const char 
> *fmt,
>   alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED |\
>   __WQ_ORDERED_EXPLICIT | (flags), 1, ##args)
>  
> -#define create_workqueue(name)   
> \
> - alloc_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, 1, (name))
> +#define create_workqueue(fmt, args...)   
> \
> + alloc_workqueue(fmt, __WQ_LEGACY | WQ_MEM_RECLAIM, 1, ##args)

The changes make sense, but are you sure that no current users of those
macros have some % character in the string they pass? If all users pass
string literals the compiler/0day bot should catch those, but as the
very example you give in 2/2 shows, not everybody passes string literals.

Maybe git grep would quickly tell that there's only 8 callers and they
are all audited quickly or something like that; in that case please
include a note to that effect in the commit log.

Rasmus

Re: [PATCH v1 1/1] kernel.h: Split out panic and oops helpers

2021-04-08 Thread Rasmus Villemoes

On 06/04/2021 15.31, Andy Shevchenko wrote:
> kernel.h is being used as a dump for all kinds of stuff for a long time.
> Here is the attempt to start cleaning it up by splitting out panic and
> oops helpers.

Yay.

Acked-by: Rasmus Villemoes 

> At the same time convert users in header and lib folder to use new header.
> Though for time being include new header back to kernel.h to avoid twisted
> indirected includes for existing users.

I think it would be good to have some place to note that "This #include
is just for backwards compatibility, it will go away RealSoonNow, so if
you rely on something from linux/panic.h, include that explicitly
yourself TYVM. And if you're looking for a janitorial task, write a
script to check that every file that uses some identifier defined in
panic.h actually includes that file. When all offenders are found and
dealt with, remove the #include and this note.".

> +
> +struct taint_flag {
> + char c_true;/* character printed when tainted */
> + char c_false;   /* character printed when not tainted */
> + bool module;/* also show as a per-module taint flag */
> +};
> +
> +extern const struct taint_flag taint_flags[TAINT_FLAGS_COUNT];

While you're doing this, nothing outside of kernel/panic.c cares about
the definition of struct taint_flag or use the taint_flags array, so
could you make the definition private to that file and make the array
static? (Another patch, of course.)

> +enum lockdep_ok {
> + LOCKDEP_STILL_OK,
> + LOCKDEP_NOW_UNRELIABLE,
> +};
> +
> +extern const char *print_tainted(void);
> +extern void add_taint(unsigned flag, enum lockdep_ok);
> +extern int test_taint(unsigned flag);
> +extern unsigned long get_taint(void);

I know you're just moving code, but it would be a nice opportunity to
drop the redundant externs.

Rasmus

Re: [PATCH v6 01/27] mm: Introduce struct folio

2021-04-08 Thread Rasmus Villemoes

On 31/03/2021 20.47, Matthew Wilcox (Oracle) wrote:

> +static inline void folio_build_bug(void)
> +{
> +#define FOLIO_MATCH(pg, fl)  \
> +BUILD_BUG_ON(offsetof(struct page, pg) != offsetof(struct folio, fl));
> +
> + FOLIO_MATCH(flags, flags);
> + FOLIO_MATCH(lru, lru);
> + FOLIO_MATCH(mapping, mapping);
> + FOLIO_MATCH(index, index);
> + FOLIO_MATCH(private, private);
> + FOLIO_MATCH(_mapcount, _mapcount);
> + FOLIO_MATCH(_refcount, _refcount);
> +#ifdef CONFIG_MEMCG
> + FOLIO_MATCH(memcg_data, memcg_data);
> +#endif
> +#undef FOLIO_MATCH
> + BUILD_BUG_ON(sizeof(struct page) != sizeof(struct folio));
> +}
> +

Perhaps do this next to the definition of struct folio instead of hiding
it in some arbitrary TU - hint, we have static_assert() that doesn't
need to be in function context. And consider amending FOLIO_MATCH by a
static_assert(__same_type(typeof_member(...), typeof_member(...))).

Rasmus

Re: [greybus-dev] [PATCH] greybus: remove stray nul byte in apb_log_enable_read output

2021-03-26 Thread Rasmus Villemoes

On 26/03/2021 17.31, Alex Elder wrote:
> On 3/26/21 10:22 AM, Rasmus Villemoes wrote:
>> Including a nul byte in the otherwise human-readable ascii output
>> from this debugfs file is probably not intended.
> 
> Looking only at the comments above simple_read_from_buffer(),
> the last argument is the size of the buffer (tmp_buf in this
> case).  So "3" is proper.

The size of the buffer is 3 because that's what sprintf() is gonna need
to print one digit, '\n' and a nul byte. That doesn't necessarily imply
that the entire buffer is meant to be sent to userspace.

> But looking at callers, it seems that the trailing NUL is
> often excluded this way.
> 
> I don't really have a problem with your patch, but could
> you explain why having the NUL byte included is an actual
> problem?  A short statement about that would provide better
> context than just "probably not intended."

debugfs files are AFAIK intended to be used with simple "cat foo", "echo
1 > foo" in shell (scripts and interactive). Having non-printable
characters returned from that "cat foo" is odd (and can sometimes break
scripts that e.g. "grep 1 foo/*/*/bar" when grep stops because it thinks
one of the files is binary, or when the output of that is further piped
somewhere).

At the very least, it's inconsistent for this one, those in
greybus/svc.c do just return the ascii digits and the newline (and if
one followed your argument above and let those pass 16 instead of desc
one would leak a few bytes of uninitialized kernel stack to userspace).

I said "probably not intended" because for all I know, it might be
intentional. I just doubt it.

Rasmus

[PATCH] greybus: remove stray nul byte in apb_log_enable_read output

2021-03-26 Thread Rasmus Villemoes

Including a nul byte in the otherwise human-readable ascii output
from this debugfs file is probably not intended.

Signed-off-by: Rasmus Villemoes 
---
 drivers/greybus/es2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/greybus/es2.c b/drivers/greybus/es2.c
index 48ad154df3a7..86a7fbc7fe13 100644
--- a/drivers/greybus/es2.c
+++ b/drivers/greybus/es2.c
@@ -1171,7 +1171,7 @@ static ssize_t apb_log_enable_read(struct file *f, char 
__user *buf,
char tmp_buf[3];
 
sprintf(tmp_buf, "%d\n", enable);
-   return simple_read_from_buffer(buf, count, ppos, tmp_buf, 3);
+   return simple_read_from_buffer(buf, count, ppos, tmp_buf, 2);
 }
 
 static ssize_t apb_log_enable_write(struct file *f, const char __user *buf,
-- 
2.29.2

[PATCH] debugfs: drop pointless nul-termination in debugfs_read_file_bool()

2021-03-26 Thread Rasmus Villemoes

simple_read_from_buffer() doesn't care about any bytes in the buffer
beyond "available". Making the buffer nul-terminated is therefore
completely pointless.

Signed-off-by: Rasmus Villemoes 
---
 fs/debugfs/file.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 686e0ad28788..9979d981e9be 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -773,7 +773,7 @@ EXPORT_SYMBOL_GPL(debugfs_create_atomic_t);
 ssize_t debugfs_read_file_bool(struct file *file, char __user *user_buf,
   size_t count, loff_t *ppos)
 {
-   char buf[3];
+   char buf[2];
bool val;
int r;
struct dentry *dentry = F_DENTRY(file);
@@ -789,7 +789,6 @@ ssize_t debugfs_read_file_bool(struct file *file, char 
__user *user_buf,
else
buf[0] = 'N';
buf[1] = '\n';
-   buf[2] = 0x00;
return simple_read_from_buffer(user_buf, count, ppos, buf, 2);
 }
 EXPORT_SYMBOL_GPL(debugfs_read_file_bool);
-- 
2.29.2

Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()

2021-03-26 Thread Rasmus Villemoes

On 26/03/2021 15.22, Peter Zijlstra wrote:
> On Fri, Mar 26, 2021 at 01:53:59PM +0100, Rasmus Villemoes wrote:
>> On 26/03/2021 12.38, Peter Zijlstra wrote:
> 
>>> +
>>> +again:
>>> +   rcu_read_lock();
>>> +   str = rcu_dereference(*(char **)file->private_data);
>>> +   len = strlen(str) + 1;
>>> +
>>> +   if (!copy || copy_len < len) {
>>> +   rcu_read_unlock();
>>> +   kfree(copy);
>>> +   copy = kmalloc(len + 1, GFP_KERNEL);
>>> +   if (!copy) {
>>> +   debugfs_file_put(dentry);
>>> +   return -ENOMEM;
>>> +   }
>>> +   copy_len = len;
>>> +   goto again;
>>> +   }
>>> +
>>> +   strncpy(copy, str, len);
>>> +   copy[len] = '\n';
>>> +   copy[len+1] = '\0';
>>> +   rcu_read_unlock();
>>
>> As noted (accidentally off-list), this is broken. I think you want this
>> on top
>>
>> - len = strlen(str) + 1;
>> + len = strlen(str);
> 
>   kmalloc(len + 2, ...);

No, because nul-terminating the stuff you pass to
simple_read_from_buffer is pointless cargo-culting. Yeah, read_file_bool
does it, but that's just bogus.

>> - strncpy(copy, str, len);
>> + memcpy(copy, str, len);
>>   copy[len] = '\n';
>> - copy[len+1] = '\0';
> 
> I'll go with strscpy() I tihnk, something like:
> 
>   len = strscpy(copy, str, len);
>   if (len < 0)
>   return len;

To what end? The only way that could possibly return -EFOO is if the
nul-terminator in str vanished between the strlen() and here, and in
that case you have bigger problems.

>   copy[len] = '\n';
>   copy[len + 1] = '\0';
> 
>>> +EXPORT_SYMBOL_GPL(debugfs_read_file_str);
>>
>> Why?
> 
> Copy-pasta from debugfs_*_bool(). This thing seems to export everything
> and I figured I'd go along with that.

I thought the convention was not to export anything until the kernel
itself had a (modular) user. But I can't find that stated under
Documentation/ anywhere - but it does say "Every function that is
exported to loadable modules using
``EXPORT_SYMBOL`` or ``EXPORT_SYMBOL_GPL`` should have a kernel-doc
comment.". Anyway, the *_bool stuff doesn't seem to be something to be
copy-pasted.

Rasmus

Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()

2021-03-26 Thread Rasmus Villemoes

On 26/03/2021 13.57, Greg KH wrote:
> On Fri, Mar 26, 2021 at 01:53:59PM +0100, Rasmus Villemoes wrote:
>> On 26/03/2021 12.38, Peter Zijlstra wrote:
>>
>>> Implement debugfs_create_str() to easily display names and such in
>>> debugfs.
>>
>> Patches 7-9 don't seem to add any users of this. What's it for precisely?
> 
> It's used in patch 7, look close :)

Ah, macrology. But the write capability doesn't seem used, and I (still)
fail to see how that could be useful.

>>> +EXPORT_SYMBOL_GPL(debugfs_read_file_str);
>>
>> Why?
> 
> Because there is a user of it?  Yes, it's not in a module, but to make
> it easy, might as well export it now.

No, I was asking why the individual read (and write) methods would need
to be exported. They have even less reason then debugfs_create_str() -
which I also think shouldn't be exported until a modular user shows up.

Rasmus

Re: [PATCH v2 6/9] debugfs: Implement debugfs_create_str()

2021-03-26 Thread Rasmus Villemoes

On 26/03/2021 12.38, Peter Zijlstra wrote:

> Implement debugfs_create_str() to easily display names and such in
> debugfs.

Patches 7-9 don't seem to add any users of this. What's it for precisely?

> +
> +again:
> + rcu_read_lock();
> + str = rcu_dereference(*(char **)file->private_data);
> + len = strlen(str) + 1;
> +
> + if (!copy || copy_len < len) {
> + rcu_read_unlock();
> + kfree(copy);
> + copy = kmalloc(len + 1, GFP_KERNEL);
> + if (!copy) {
> + debugfs_file_put(dentry);
> + return -ENOMEM;
> + }
> + copy_len = len;
> + goto again;
> + }
> +
> + strncpy(copy, str, len);
> + copy[len] = '\n';
> + copy[len+1] = '\0';
> + rcu_read_unlock();

As noted (accidentally off-list), this is broken. I think you want this
on top

- len = strlen(str) + 1;
+ len = strlen(str);

- strncpy(copy, str, len);
+ memcpy(copy, str, len);
  copy[len] = '\n';
- copy[len+1] = '\0';

> +EXPORT_SYMBOL_GPL(debugfs_read_file_str);

Why?

> +
> +ssize_t debugfs_write_file_str(struct file *file, const char __user 
> *user_buf,
> +size_t count, loff_t *ppos)
> +{
> + struct dentry *dentry = F_DENTRY(file);
> + char *old, *new = NULL;
> + int pos = *ppos;
> + int r;
> +
> + r = debugfs_file_get(dentry);
> + if (unlikely(r))
> + return r;
> +
> + old = *(char **)file->private_data;
> +
> + /* only allow strict concattenation */
> + r = -EINVAL;
> + if (pos && pos != strlen(old))
> + goto error;

Other than the synchronize_rcu() below, I don't see any rcu stuff in
here. What prevents one task from kfree'ing old while another computes
its strlen()? Or two tasks from getting the same value of old and both
kfree'ing the same pointer?

And what part of the kernel periodically looks at some string and
decides something needs to be done? Shouldn't a write imply some
notification or callback? I can see the usefulness of the read part to
expose some otherwise-maintained string, but what good does allowing
writes do?

Rasmus

[tip: sched/core] sched/core: Use -EINVAL in sched_dynamic_mode()

2021-03-25 Thread tip-bot2 for Rasmus Villemoes

The following commit has been merged into the sched/core branch of tip:

Commit-ID: c4681f3f1cfcfde0c95ff72f0bdb43f9ffd7f00e
Gitweb:
https://git.kernel.org/tip/c4681f3f1cfcfde0c95ff72f0bdb43f9ffd7f00e
Author:Rasmus Villemoes 
AuthorDate:Thu, 25 Mar 2021 01:45:15 +01:00
Committer: Ingo Molnar 
CommitterDate: Thu, 25 Mar 2021 11:39:13 +01:00

sched/core: Use -EINVAL in sched_dynamic_mode()

-1 is -EPERM which is a somewhat odd error to return from
sched_dynamic_write(). No other callers care about which negative
value is used.

Signed-off-by: Rasmus Villemoes 
Signed-off-by: Ingo Molnar 
Acked-by: Peter Zijlstra 
Link: https://lore.kernel.org/r/20210325004515.531631-2-li...@rasmusvillemoes.dk
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1fe9d3f..95bd6ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5384,7 +5384,7 @@ static int sched_dynamic_mode(const char *str)
if (!strcmp(str, "full"))
return preempt_dynamic_full;
 
-   return -1;
+   return -EINVAL;
 }
 
 static void sched_dynamic_update(int mode)

[tip: sched/core] sched/core: Stop using magic values in sched_dynamic_mode()

2021-03-25 Thread tip-bot2 for Rasmus Villemoes

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 7e1b2eb74928b2478fd0630ce6c664334b480d00
Gitweb:
https://git.kernel.org/tip/7e1b2eb74928b2478fd0630ce6c664334b480d00
Author:Rasmus Villemoes 
AuthorDate:Thu, 25 Mar 2021 01:45:14 +01:00
Committer: Ingo Molnar 
CommitterDate: Thu, 25 Mar 2021 11:39:12 +01:00

sched/core: Stop using magic values in sched_dynamic_mode()

Use the enum names which are also what is used in the switch() in
sched_dynamic_update().

Signed-off-by: Rasmus Villemoes 
Signed-off-by: Ingo Molnar 
Acked-by: Peter Zijlstra 
Link: https://lore.kernel.org/r/20210325004515.531631-1-li...@rasmusvillemoes.dk
---
 kernel/sched/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3384ea7..1fe9d3f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5376,13 +5376,13 @@ static int preempt_dynamic_mode = preempt_dynamic_full;
 static int sched_dynamic_mode(const char *str)
 {
if (!strcmp(str, "none"))
-   return 0;
+   return preempt_dynamic_none;
 
if (!strcmp(str, "voluntary"))
-   return 1;
+   return preempt_dynamic_voluntary;
 
if (!strcmp(str, "full"))
-   return 2;
+   return preempt_dynamic_full;
 
return -1;
 }

Re: [PATCH] static_call: fix function type mismatch

2021-03-25 Thread Rasmus Villemoes

On 25/03/2021 08.42, Peter Zijlstra wrote:
> On Thu, Mar 25, 2021 at 01:42:41AM +0100, Rasmus Villemoes wrote:
>>> Actually, it looks like I can't select PREEMPT_DYNAMIC> and tweaking Kconfig
>>
>> Ah, there's no prompt on the "bool" line, so it doesn't show up. That
>> seems to be a mistake, since there's an elaborate help text which says
>>
>>   The runtime overhead is negligible with
>> HAVE_STATIC_CALL_INLINE enabled
>>   but if runtime patching is not available for the specific
>> architecture
>>   then the potential overhead should be considered.
>>
>> So it seems that it was meant to be "you can enable this if you really
>> want".
>>
>> to force enable it on arm64 results in a build error
> 
> Right, PREEMPT_DYNAMIC really hard relies on HAVE_STATIC_CALL
> 
> There's an implicit dependency in the select:
> 
> config PREEMPT
>   ...
>   select PREEMPT_DYNAMIC if HAVE_PREEMPT_DYNAMIC

That's not a dependency, that's a "force PREEMPT_DYNAMIC on", and users
on x86 can't deselect PREEMPT_DYNAMIC even if they wanted to.

Having a help text but not providing a prompt string is rather unusual.
What's the point of that paragraph I quoted above if PREEMPT_DYNAMIC is
not supposed to be settable by the developer?

>>> ("implicit declaration of function 'static_call_mod'").
>>
>> Seems to be an omission in the last !HAVE_STATIC_CALL branch in
>> static_call_types.h, and there's also no
>> EXPORT_STATIC_CALL_TRAMP{,_GPL} in static_call.h for that case.
> 
> That interface doesn't make sense for !HAVE_STATIC_CALL.

Perhaps, perhaps not. But I think it's silly to have code with such a
random hidden dependency, especially when it's only a defense against
crazy oot modules and not some fundamental requirement.

> It's impossible
> to not export the function pointer itself but still call it for
> !HAVE_STATIC_CALL.

Well, I think there's a way. At this point, the audience is asked to
wear sun glasses.

// foo.h
extern const int foo;
extern int __foo_just_for_vmlinux;

// foo.c
int __foo_just_for_vmlinux;
extern const int foo __attribute__((__alias__("__foo_just_for_vmlinux")));
EXPORT_SYMBOL(foo);

Modules can read foo, but can't do foo = 5. (Yeah, they can take the
address and cast away the const...). Basically, this is a kind of
top-level anonymous union trick a la i_nlink/__i_nlink. And it's more or
less explicitly said in the gcc docs that this is supposed to work:
"Except for top-level qualifiers the alias target must have the same
type as the alias."

Rasmus

[PATCH 2/2] sched: use -EINVAL in sched_dynamic_mode

2021-03-24 Thread Rasmus Villemoes

-1 is -EPERM which is a somewhat odd error to return from
sched_dynamic_write(). No other callers care about which negative
value is used.

Signed-off-by: Rasmus Villemoes 
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3959481261bb..5b29261668a6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5378,7 +5378,7 @@ static int sched_dynamic_mode(const char *str)
if (!strcmp(str, "full"))
return preempt_dynamic_full;
 
-   return -1;
+   return -EINVAL;
 }
 
 static void sched_dynamic_update(int mode)
-- 
2.29.2

[PATCH 1/2] sched: stop using magic values in sched_dynamic_mode

2021-03-24 Thread Rasmus Villemoes

Use the enum names which are also what is used in the switch() in
sched_dynamic_update().

Signed-off-by: Rasmus Villemoes 
---
 kernel/sched/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 98191218d891..3959481261bb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5370,13 +5370,13 @@ static int preempt_dynamic_mode = preempt_dynamic_full;
 static int sched_dynamic_mode(const char *str)
 {
if (!strcmp(str, "none"))
-   return 0;
+   return preempt_dynamic_none;
 
if (!strcmp(str, "voluntary"))
-   return 1;
+   return preempt_dynamic_voluntary;
 
if (!strcmp(str, "full"))
-   return 2;
+   return preempt_dynamic_full;
 
return -1;
 }
-- 
2.29.2

Re: [PATCH] static_call: fix function type mismatch

2021-03-24 Thread Rasmus Villemoes

On 25/03/2021 00.40, Sami Tolvanen wrote:
> On Wed, Mar 24, 2021 at 3:53 PM Rasmus Villemoes
>  wrote:
>>
>> On 24/03/2021 23.34, Sami Tolvanen wrote:
>>> On Wed, Mar 24, 2021 at 2:51 PM Rasmus Villemoes
>>>  wrote:
>>>>
>>>> On 24/03/2021 18.33, Peter Zijlstra wrote:
>>>>> On Wed, Mar 24, 2021 at 05:45:52PM +0100, Rasmus Villemoes wrote:
>>>>>> Sorry, I think I misread the code. The static calls are indeed
>>>>>> initialized with a function with the right prototype. Try adding
>>>>>> "preempt=full" on the command line so that we exercise these lines
>>>>>>
>>>>>>static_call_update(cond_resched,
>>>>>> (typeof(&__cond_resched)) __static_call_return0);
>>>>>> static_call_update(might_resched,
>>>>>> (typeof(&__cond_resched)) __static_call_return0);
>>>>>>
>>>>>> I would expect that to blow up, since we end up calling a long (*)(void)
>>>>>> function using a function pointer of type int (*)(void).
>>>>>
>>>>> Note that on x86 there won't actually be any calling of function
>>>>> pointers. See what arch/x86/kernel/static_call.c does :-)
>>>>
>>>> I know, but so far x86 is the only one with HAVE_STATIC_CALL, so for
>>>> arm64 which is where CFI seems to be targeted initially, static_calls
>>>> are function pointers. And unless CFI ignores the return type, I'd
>>>> really expect the above to fail.
>>>
>>> I think you're correct, this would trip CFI without HAVE_STATIC_CALL.
>>> However, arm64 also doesn't support PREEMPT_DYNAMIC at the moment, so
>>> this isn't currently a problem there.
>>
>> Well, there's PREEMPT_DYNAMIC and HAVE_PREEMPT_DYNAMIC. The former
>> doesn't depend on the latter (and the latter does depend on
>> HAVE_STATIC_CALL, so effectively not for anything but x86). You should
>> be able to select both PREEMPT_DYNAMIC and CFI_CLANG, and test if
>> booting with preempt=full does give the fireworks one expects.
> 
> Actually, it looks like I can't select PREEMPT_DYNAMIC> and tweaking Kconfig

Ah, there's no prompt on the "bool" line, so it doesn't show up. That
seems to be a mistake, since there's an elaborate help text which says

  The runtime overhead is negligible with
HAVE_STATIC_CALL_INLINE enabled
  but if runtime patching is not available for the specific
architecture
  then the potential overhead should be considered.

So it seems that it was meant to be "you can enable this if you really
want".

to force enable it on arm64 results in a build error
> ("implicit declaration of function 'static_call_mod'").

Seems to be an omission in the last !HAVE_STATIC_CALL branch in
static_call_types.h, and there's also no
EXPORT_STATIC_CALL_TRAMP{,_GPL} in static_call.h for that case.

Rasmus

Re: [PATCH] static_call: fix function type mismatch

2021-03-24 Thread Rasmus Villemoes

On 24/03/2021 23.34, Sami Tolvanen wrote:
> On Wed, Mar 24, 2021 at 2:51 PM Rasmus Villemoes
>  wrote:
>>
>> On 24/03/2021 18.33, Peter Zijlstra wrote:
>>> On Wed, Mar 24, 2021 at 05:45:52PM +0100, Rasmus Villemoes wrote:
>>>> Sorry, I think I misread the code. The static calls are indeed
>>>> initialized with a function with the right prototype. Try adding
>>>> "preempt=full" on the command line so that we exercise these lines
>>>>
>>>>static_call_update(cond_resched,
>>>> (typeof(&__cond_resched)) __static_call_return0);
>>>> static_call_update(might_resched,
>>>> (typeof(&__cond_resched)) __static_call_return0);
>>>>
>>>> I would expect that to blow up, since we end up calling a long (*)(void)
>>>> function using a function pointer of type int (*)(void).
>>>
>>> Note that on x86 there won't actually be any calling of function
>>> pointers. See what arch/x86/kernel/static_call.c does :-)
>>
>> I know, but so far x86 is the only one with HAVE_STATIC_CALL, so for
>> arm64 which is where CFI seems to be targeted initially, static_calls
>> are function pointers. And unless CFI ignores the return type, I'd
>> really expect the above to fail.
> 
> I think you're correct, this would trip CFI without HAVE_STATIC_CALL.
> However, arm64 also doesn't support PREEMPT_DYNAMIC at the moment, so
> this isn't currently a problem there.

Well, there's PREEMPT_DYNAMIC and HAVE_PREEMPT_DYNAMIC. The former
doesn't depend on the latter (and the latter does depend on
HAVE_STATIC_CALL, so effectively not for anything but x86). You should
be able to select both PREEMPT_DYNAMIC and CFI_CLANG, and test if
booting with preempt=full does give the fireworks one expects.

Rasmus

Re: [RFC patch] vsprintf: Allow %pe to print non PTR_ERR %pe uses as decimal

2021-03-24 Thread Rasmus Villemoes

On 24/03/2021 23.18, Joe Perches wrote:
> On Wed, 2021-03-24 at 22:27 +0100, Rasmus Villemoes wrote:
>> On 24/03/2021 20.24, Joe Perches wrote:
>>> On Wed, 2021-03-24 at 18:33 +0100, Rasmus Villemoes wrote:
>>>> On 24/03/2021 18.20, Joe Perches wrote:
>>>>
>>>>>
>>>>> Maybe it's better to output non PTR_ERR %pe uses as decimal so this
>>>>> sort of code would work.
>>>>
>>>> No, because that would leak the pointer value when somebody has
>>>> accidentally passed a real kernel pointer to %pe.
>>>
>>> I think it's not really an issue.
>>>
>>> _All_ code that uses %p extensions need inspection anyway.
>>
>> There are now a bunch of sanity checks in place that catch e.g. an
>> ERR_PTR passed to an extension that would derefence the pointer;
>> enforcing that only ERR_PTRs are passed to %pe (or falling back to %p)
>> is another of those safeguards.
>>
>>> It's already possible to intentionally 'leak' the ptr value
>>> by using %pe, -ptr so I think that's not really an issue.
>>>
>>
>> Huh, what? I assume -ptr is shorthand for (void*)-(unsigned long)ptr.
>> How would that leak the value if ptr is an ordinary kernel pointer?
>> That's not an ERR_PTR unless (unsigned long)ptr is < 4095 or so.
> 
> You are confusing ERR_PTR with IS_ERR

No I'm not, I'm just being slightly sloppy - obviously when I say "not
an ERR_PTR" I mean "not the result of ERR_PTR applied to a negative
errno value", or "not the result of a valid invocation of ERR_PTR". But
yes, feel free to read "not an ERR_PTR" as "something for which IS_ERR
is false".

Can you expand on why you think %pe, -ptr  would leak the value of ptr?

>> If you want to print the pointer value just do %px. No need for silly
>> games.
> 
> There's no silly game here.  %pe would either print a string or a value.

A hashed value, that is, never the raw value.

> It already does that in 2 cases.

Yes, if you pass it ERR_PTR(-1234) (where no E symbol exists) or
ERR_PTR(-EINVAL) but CONFIG_SYMBOLIC_ERRNAME=n, it prints the value in
decimal, because people will probably recognize "-22" and values in that
range don't reveal anything about the kernel image. Anything outside
[-4095,0] or so is hashed.

Rasmus

Re: [PATCH v2 04/12] module: Add printk format to add module build ID to stacktraces

2021-03-24 Thread Rasmus Villemoes

On 24/03/2021 20.11, Stephen Boyd wrote:
> Quoting Rasmus Villemoes (2021-03-24 02:57:13)

>>
>> Is there any reason you didn't just make b an optional flag that could
>> be specified with or without R? I suppose the parsing is more difficult
>> with several orthogonal flags (see escaped_string()), but it's a little
>> easier to understand. Dunno, it's not like we're gonna think of 10 other
>> things that could be printed for a symbol, so perhaps it's fine.
>>
> 
> I think I follow. So %pSb or %pSRb? If it's easier to understand then
> sure. I was trying to avoid checking another character beyond fmt[1] but
> it should be fine if fmt[1] is already 'R'.
> 

I don't know. On the one hand, it seems sensible to allow such "flag"
modifiers to appear independently and in any order. Because what if some
day we think of some other property of the symbol we might want to
provide access to via a "z" flag; when to allow all combinations of the
R, b and z functionality we'd have to use four more random letters to
stand for various combinations of those flags. On the other hand,
vsprintf.c is already a complete wild west of odd conventions for
%p, and it's not like symbol_string() gets extended every other
release, and I can certainly understand the desire to keep the parsing
of fmt minimal. So 'r' to mean 'Rb' is ok by me if you prefer that.

Rasmus

Re: [PATCH] static_call: fix function type mismatch

2021-03-24 Thread Rasmus Villemoes

On 24/03/2021 18.33, Peter Zijlstra wrote:
> On Wed, Mar 24, 2021 at 05:45:52PM +0100, Rasmus Villemoes wrote:
>> Sorry, I think I misread the code. The static calls are indeed
>> initialized with a function with the right prototype. Try adding
>> "preempt=full" on the command line so that we exercise these lines
>>
>>static_call_update(cond_resched,
>> (typeof(&__cond_resched)) __static_call_return0);
>> static_call_update(might_resched,
>> (typeof(&__cond_resched)) __static_call_return0);
>>
>> I would expect that to blow up, since we end up calling a long (*)(void)
>> function using a function pointer of type int (*)(void).
> 
> Note that on x86 there won't actually be any calling of function
> pointers. See what arch/x86/kernel/static_call.c does :-)

I know, but so far x86 is the only one with HAVE_STATIC_CALL, so for
arm64 which is where CFI seems to be targeted initially, static_calls
are function pointers. And unless CFI ignores the return type, I'd
really expect the above to fail.

> But I think some of this code might need some __va_function() love when
> combined with CFI.

Well, that was also my first thought when reading through the CFI
patches, I hoped that might salvage my
reduce-boilerplate-and-get-better-type-safety proposal for the
devm_*_action APIs [1]. But I don't think that would help at all;
storing __va_function(__static_call_return0) instead of
&__static_call_return0  (i.e., __static_call_return0 instead of
__static_call_return0.cfi_jt) doesn't help the call sites of that
static_call at all, neither address belongs to the range of jump table
entries corresponding to the prototype "int (*)(void)". So I think it
would be the static_call macro that would somehow need to grow a way to
suppress the cfi checking.

Rasmus

[1]
https://lore.kernel.org/lkml/20210309235917.2134565-1-li...@rasmusvillemoes.dk/

Re: [RFC patch] vsprintf: Allow %pe to print non PTR_ERR %pe uses as decimal

2021-03-24 Thread Rasmus Villemoes

On 24/03/2021 20.24, Joe Perches wrote:
> On Wed, 2021-03-24 at 18:33 +0100, Rasmus Villemoes wrote:
>> On 24/03/2021 18.20, Joe Perches wrote:
>>
>>>
>>> Maybe it's better to output non PTR_ERR %pe uses as decimal so this
>>> sort of code would work.
>>
>> No, because that would leak the pointer value when somebody has
>> accidentally passed a real kernel pointer to %pe.
> 
> I think it's not really an issue.
> 
> _All_ code that uses %p extensions need inspection anyway.

There are now a bunch of sanity checks in place that catch e.g. an
ERR_PTR passed to an extension that would derefence the pointer;
enforcing that only ERR_PTRs are passed to %pe (or falling back to %p)
is another of those safeguards.

> It's already possible to intentionally 'leak' the ptr value
> by using %pe, -ptr so I think that's not really an issue.
> 

Huh, what? I assume -ptr is shorthand for (void*)-(unsigned long)ptr.
How would that leak the value if ptr is an ordinary kernel pointer?
That's not an ERR_PTR unless (unsigned long)ptr is < 4095 or so.

If you want to print the pointer value just do %px. No need for silly
games. What I'm talking about is preventing _un_intentionally leaking a
valid kernel pointer value. So no, a non-ERR_PTR passed to %pe is not
going to be printed as-is, not in decimal or hexadecimal or roman numerals.

>> If the code wants a cute -EFOO string explaining what's wrong, what
>> about "%pe", ERR_PTR(mux < 0 : mux : -ERANGE)? Or two separate error
>> messages
>>
>> if (mux < 0)
>>   ...
>> else if (mux >= ARRAY_SIZE())
>>   ...
> 
> Multiple tests, more unnecessary code, multiple format strings, etc...

Agreed, I'm not really advocating for the latter; the former suggestion
is IMO a pretty concise way of providing useful information in dmesg.

Rasmus

Re: [RFC patch] vsprintf: Allow %pe to print non PTR_ERR %pe uses as decimal

2021-03-24 Thread Rasmus Villemoes

On 24/03/2021 18.20, Joe Perches wrote:
> On Wed, 2021-03-24 at 09:52 -0700, Joe Perches wrote:
>> On Wed, 2021-03-24 at 17:42 +0100, Arnd Bergmann wrote:
>>> On Wed, Mar 24, 2021 at 3:20 PM Joe Perches  wrote:
>> []
> diff --git a/drivers/gpu/drm/imx/imx-ldb.c b/drivers/gpu/drm/imx/imx-ldb.c
 []
> @@ -197,6 +197,12 @@ static void imx_ldb_encoder_enable(struct 
> drm_encoder *encoder)
>   int dual = ldb->ldb_ctrl & LDB_SPLIT_MODE_EN;
>   int mux = drm_of_encoder_active_port_id(imx_ldb_ch->child, encoder);
>
> + if (mux < 0 || mux >= ARRAY_SIZE(ldb->clk_sel)) {
> + dev_warn(ldb->dev, "%s: invalid mux %d\n",
> +  __func__, ERR_PTR(mux));

 This does not compile without warnings.

 drivers/gpu/drm/imx/imx-ldb.c: In function ‘imx_ldb_encoder_enable’:
 drivers/gpu/drm/imx/imx-ldb.c:201:22: warning: format ‘%d’ expects 
 argument of type ‘int’, but argument 4 has type ‘void *’ [-Wformat=]
   201 |   dev_warn(ldb->dev, "%s: invalid mux %d\n",
   |  ^~

 If you want to use ERR_PTR, the %d should be %pe as ERR_PTR
 is converting an int a void * to decode the error type and
 emit it as a string.
>>>
>>> Sorry about that.
>>>
>>> I decided against using ERR_PTR() in order to also check for
>>> positive array overflow, but the version I tested was different from
>>> the version I sent.
>>>
>>> v3 coming.
>>
>> Thanks.  No worries.
>>
>> Up to you, vsprintf would emit the positive mux as a funky hashed
>> hex value by default if you use ERR_PTR with mux > ARRAY_SIZE so
>> perhaps %d without the ERR_PTR use makes the most sense.
>>

> 
> Maybe it's better to output non PTR_ERR %pe uses as decimal so this
> sort of code would work.

No, because that would leak the pointer value when somebody has
accidentally passed a real kernel pointer to %pe.

If the code wants a cute -EFOO string explaining what's wrong, what
about "%pe", ERR_PTR(mux < 0 : mux : -ERANGE)? Or two separate error
messages

if (mux < 0)
  ...
else if (mux >= ARRAY_SIZE())
  ...

Rasmus

Re: [PATCH] static_call: fix function type mismatch

2021-03-24 Thread Rasmus Villemoes

On 24/03/2021 17.01, Sami Tolvanen wrote:
> On Wed, Mar 24, 2021 at 5:46 AM Rasmus Villemoes
>  wrote:
>>
>> On 23/03/2021 08.47, Peter Zijlstra wrote:
>>> On Mon, Mar 22, 2021 at 05:29:21PM -0400, Steven Rostedt wrote:
>>>> On Mon, 22 Mar 2021 22:18:17 +0100
>>>> Arnd Bergmann  wrote:
>>>>
>>>>> I think the code works correctly on all architectures we support because
>>>>> both 'int' and 'long' are returned in a register with any unused bits 
>>>>> cleared.
>>>>> It is however undefined behavior in C because 'int' and 'long' are not
>>>>> compatible types, and the calling conventions don't have to allow this.
>>>>
>>>> Static calls (and so do tracepoints) currently rely on these kind of
>>>> "undefined behavior" in C. This isn't the only UB that it relies on.
>>>
>>> Right, most of the kernel lives in UB. That's what we have -fwrapv
>>> -fno-strict-aliassing and lots of other bits for, to 'fix' the stupid C
>>> standard.
>>>
>>> This is one more of them, so just ignore the warning and make it go
>>> away:
>>>
>>>  -Wno-cast-function-type
>>>
>>> seems to be the magic knob.
>>>
>>
>> That can be done for now, but I think something has to be done if CFI is
>> to become a thing.
>>
>> Sami, what happens if you try to boot a
>> CONFIG_CFI_CLANG+CONFIG_PREEMPT_DYNAMIC kernel?
> 
> Seems to boot just fine. CFI instrumentation is only for
> compiler-generated indirect calls. Casting functions to different
> types is fine as long as you don't end up calling them using an
> incorrect pointer type.

Sorry, I think I misread the code. The static calls are indeed
initialized with a function with the right prototype. Try adding
"preempt=full" on the command line so that we exercise these lines

   static_call_update(cond_resched,
(typeof(&__cond_resched)) __static_call_return0);
static_call_update(might_resched,
(typeof(&__cond_resched)) __static_call_return0);

I would expect that to blow up, since we end up calling a long (*)(void)
function using a function pointer of type int (*)(void).

Rasmus

Re: [PATCH v3 02/17] cfi: add __cficanonical

2021-03-24 Thread Rasmus Villemoes

On 23/03/2021 21.39, Sami Tolvanen wrote:
> With CONFIG_CFI_CLANG, the compiler replaces a function address taken
> in C code with the address of a local jump table entry, which passes
> runtime indirect call checks. However, the compiler won't replace
> addresses taken in assembly code, which will result in a CFI failure
> if we later jump to such an address in instrumented C code. The code
> generated for the non-canonical jump table looks this:
> 
>   : /* In C,  points here */
>   jmp noncanonical
>   ...
>   :/* function body */
>   ...
> 
> This change adds the __cficanonical attribute, which tells the
> compiler to use a canonical jump table for the function instead. This
> means the compiler will rename the actual function to .cfi
> and points the original symbol to the jump table entry instead:
> 
>   :   /* jump table entry */
>   jmp canonical.cfi
>   ...
>   :   /* function body */
>   ...
> 
> As a result, the address taken in assembly, or other non-instrumented
> code always points to the jump table and therefore, can be used for
> indirect calls in instrumented code without tripping CFI checks.

Random ramblings, I'm trying to understand how this CFI stuff works.

First, patch 1 and 2 explain the pros and cons of canonical vs
non-canonical jump tables, in either case, there's problems with stuff
implemented in assembly. But I don't understand why those pros and cons
then end up with using the non-canonical jump tables by default. IIUC,
with canonical jump tables, function pointer equality would keep working
for functions implemented in C, because  would always refer to the
same stub "function" that lives in the same object file as func.cfi,
whereas with the non-canonical version, each TU (or maybe DSO) that
takes the address of func ends up with its own func.cfi_jt.

There are of course lots of direct calls of assembly functions, but
I don't think we take the address of such functions very often. So why
can't we instead equip the declarations of those with a
__cfi_noncanonical attribute?

And now, more directed at the clang folks on cc:

As to how CFI works, I've tried to make sense of the clang docs. So at
place where some int (*)(long, int) function pointer is called, the
compiler computes (roughly) md5sum("int (*)(long, int)") and uses the
first 8 bytes as a cookie representing that type. It then goes to some
global table of jump table ranges indexed by that cookie and checks that
the address it is about to call is within that range. All jump table
entries for one type of function are consecutive in memory (with
complications arising from cross-DSO calls).

What I don't understand about all this is why that indirection through
some hidden global table and magic jump table (whether canonical or not)
is even needed in the simple common case of ordinary C functions. Why
can't the compiler just emit the cookie corresponding to a given
function's prototype immediately prior to the function? Then the inline
check would just be "if (*(u64*)((void*)func - 8) == cookie)" and
function pointer comparison would just work because there's no magic
involved when doing  Cross-DSO calls of C function have no extra
cost to look up a __cfi_check function in the target DSO. An indirect
call doesn't touch at least two extra cache lines (the range table and
the jump table entry). It seems to rely on LTO anyway, so it's not even
that the compiler would have to emit that cookie for every single
function, it knows at link time which functions have their address
taken. Calling functions implemented in assembly through a function
pointer will have the same problem as with the "canonical" jump table
approach, but with a suitable attribute on those surely the compiler
could emit a func.cfi_hoop

  .quad 0x1122334455667788 // cookie
  :
jmp func

and perhaps no such attribute would even be needed (with LTO, the
compiler should be able to see "hey, I don't know that function, it's
probably implemented in assembly, so lemme emit that trampoline with a
cookie in front and redirect address-of to that").

Rasmus

Re: [PATCH] amdgpu: fix gcc -Wrestrict warning

2021-03-24 Thread Rasmus Villemoes

On 24/03/2021 14.33, Arnd Bergmann wrote:
> On Tue, Mar 23, 2021 at 4:57 PM Rasmus Villemoes
>  wrote:
>> On 23/03/2021 14.04, Arnd Bergmann wrote:
>>>   if (securedisplay_cmd->status == 
>>> TA_SECUREDISPLAY_STATUS__SUCCESS) {
>>> + int pos = 0;
>>>   memset(i2c_output,  0, sizeof(i2c_output));
>>>   for (i = 0; i < 
>>> TA_SECUREDISPLAY_I2C_BUFFER_SIZE; i++)
>>> - sprintf(i2c_output, "%s 0x%X", 
>>> i2c_output,
>>> + pos += sprintf(i2c_output + pos, " 
>>> 0x%X",
>>>   
>>> securedisplay_cmd->securedisplay_out_message.send_roi_crc.i2c_buf[i]);
>>>   dev_info(adev->dev, "SECUREDISPLAY: I2C 
>>> buffer out put is :%s\n", i2c_output);
>>
>> Eh, why not get rid of the 256 byte stack allocation and just replace
>> all of this by
>>
>>   dev_info(adev->dev, ""SECUREDISPLAY: I2C buffer out put is: %*ph\n",
>> TA_SECUREDISPLAY_I2C_BUFFER_SIZE,
>> securedisplay_cmd->securedisplay_out_message.send_roi_crc.i2c_buf);
>>
>> That's much less code (both in #LOC and .text), and avoids adding yet
>> another place that will be audited over and over for "hm, yeah, that
>> sprintf() is actually not gonna overflow".
>>
>> Yeah, it'll lose the 0x prefixes for each byte and use lowercase hex chars.
> 
> Ah, I didn't know the kernel's sprintf could do that, that's really nice.

If you're bored, you can "git grep -E -C4 '%[0.]2[xX]'" and find places
that are inside a small loop, many can trivially be converted to %ph,
though often with some small change in formatting. If you're lucky, you
even get to fix real bugs when people pass a "char" to %02x and "know"
that that will produce precisely two bytes of output, so they've sized
their stack buffer accordingly - boom when "char" happens to be signed
and one of the bytes have a value beyond ascii and %02x produces 0xffXX.

%ph has a hard-coded upper bound of 64 bytes, I think that's silly
because people instead do these inefficient and very verbose loops
instead, wasting stack, .text and runtime.

Rasmus

Re: [PATCH] [v2] hinic: avoid gcc -Wrestrict warning

2021-03-24 Thread Rasmus Villemoes

On 24/03/2021 14.07, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> With extra warnings enabled, gcc complains that snprintf should not
> take the same buffer as source and destination:
> 
> drivers/net/ethernet/huawei/hinic/hinic_ethtool.c: In function 
> 'hinic_set_settings_to_hw':
> drivers/net/ethernet/huawei/hinic/hinic_ethtool.c:480:9: error: 'snprintf' 
> argument 4 overlaps destination object 'set_link_str' [-Werror=restrict]
>   480 |   err = snprintf(set_link_str, SET_LINK_STR_MAX_LEN,
>   | ^~~~
>   481 |   "%sspeed %d ", set_link_str, speed);
>   |   ~~~
> drivers/net/ethernet/huawei/hinic/hinic_ethtool.c:464:7: note: destination 
> object referenced by 'restrict'-qualified argument 1 was declared here
>   464 |  char set_link_str[SET_LINK_STR_MAX_LEN] = {0};
> 
> Rewrite this to avoid the nested sprintf and instead use separate
> buffers, which is simpler.
> 

This looks much better. Thanks.

> Cc: Rasmus Villemoes 
> Signed-off-by: Arnd Bergmann 
> ---
> v2: rework according to feedback from Rasmus. This one could
> easily avoid most of the pitfalls
> ---
>  .../net/ethernet/huawei/hinic/hinic_ethtool.c | 25 ---
>  1 file changed, 10 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c 
> b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
> index c340d9acba80..d7e20dab6e48 100644
> --- a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
> @@ -34,7 +34,7 @@
>  #include "hinic_rx.h"
>  #include "hinic_dev.h"
>  
> -#define SET_LINK_STR_MAX_LEN 128
> +#define SET_LINK_STR_MAX_LEN 16
>  
>  #define GET_SUPPORTED_MODE   0
>  #define GET_ADVERTISED_MODE  1
> @@ -462,24 +462,19 @@ static int hinic_set_settings_to_hw(struct hinic_dev 
> *nic_dev,
>  {
>   struct hinic_link_ksettings_info settings = {0};
>   char set_link_str[SET_LINK_STR_MAX_LEN] = {0};
> + const char *autoneg_str;
>   struct net_device *netdev = nic_dev->netdev;
>   enum nic_speed_level speed_level = 0;
>   int err;
>  
> - err = snprintf(set_link_str, SET_LINK_STR_MAX_LEN, "%s",
> -(set_settings & HILINK_LINK_SET_AUTONEG) ?
> -(autoneg ? "autong enable " : "autong disable ") : "");
> - if (err < 0 || err >= SET_LINK_STR_MAX_LEN) {
> - netif_err(nic_dev, drv, netdev, "Failed to snprintf link state, 
> function return(%d) and dest_len(%d)\n",
> -   err, SET_LINK_STR_MAX_LEN);
> - return -EFAULT;
> - }
> + autoneg_str = (set_settings & HILINK_LINK_SET_AUTONEG) ?
> +   (autoneg ? "autong enable " : "autong disable ") : "";
>  
>   if (set_settings & HILINK_LINK_SET_SPEED) {
>   speed_level = hinic_ethtool_to_hw_speed_level(speed);
>   err = snprintf(set_link_str, SET_LINK_STR_MAX_LEN,
> -"%sspeed %d ", set_link_str, speed);
> - if (err <= 0 || err >= SET_LINK_STR_MAX_LEN) {
> +"speed %d ", speed);
> + if (err >= SET_LINK_STR_MAX_LEN) {
>   netif_err(nic_dev, drv, netdev, "Failed to snprintf 
> link speed, function return(%d) and dest_len(%d)\n",
> err, SET_LINK_STR_MAX_LEN);
>   return -EFAULT;

It's not your invention of course, but this both seems needlessly harsh
and EFAULT is a weird error to return. It's just a printk() message that
might be truncated, and now that the format string only has a %d
specifier, it can actually be verified statically that overflow will
never happen (though I don't know or think gcc can do that, perhaps
there's some locale nonsense in the standard that allows using
utf16-encoded sanskrit runes). So probably that test should just be
dropped, but that's a separate thing.

Reviewed-by: Rasmus Villemoes

Re: [PATCH] static_call: fix function type mismatch

2021-03-24 Thread Rasmus Villemoes

On 23/03/2021 08.47, Peter Zijlstra wrote:
> On Mon, Mar 22, 2021 at 05:29:21PM -0400, Steven Rostedt wrote:
>> On Mon, 22 Mar 2021 22:18:17 +0100
>> Arnd Bergmann  wrote:
>>
>>> I think the code works correctly on all architectures we support because
>>> both 'int' and 'long' are returned in a register with any unused bits 
>>> cleared.
>>> It is however undefined behavior in C because 'int' and 'long' are not
>>> compatible types, and the calling conventions don't have to allow this.
>>
>> Static calls (and so do tracepoints) currently rely on these kind of
>> "undefined behavior" in C. This isn't the only UB that it relies on.
> 
> Right, most of the kernel lives in UB. That's what we have -fwrapv
> -fno-strict-aliassing and lots of other bits for, to 'fix' the stupid C
> standard.
> 
> This is one more of them, so just ignore the warning and make it go
> away:
> 
>  -Wno-cast-function-type
> 
> seems to be the magic knob.
> 

That can be done for now, but I think something has to be done if CFI is
to become a thing.

Sami, what happens if you try to boot a
CONFIG_CFI_CLANG+CONFIG_PREEMPT_DYNAMIC kernel?

Rasmus

Re: [PATCH v2] editorconfig: Add automatic editor configuration file

2021-03-24 Thread Rasmus Villemoes

On 03/07/2020 14.29, Jonathan Corbet wrote:

[doing a bit of necromancy here]

> On Fri,  3 Jul 2020 00:31:43 -0700
> Danny Lin  wrote:
> 
>> EditorConfig is a standard for defining basic editor configuration in
>> projects. There is support available for 47 code editors as of writing,
>> including both built-in and extension support. Many notable projects
>> have adopted the standard already, including zsh, htop, and qemu.
>>
>> While this isn't a full-fledged C code style specifier, it does set some
>> basic ground rules that make it more convenient for contributors to use
>> any editor of their choice and not have to worry about indentation, line
>> endings, encoding, final newlines, etc. This should make it
>> significantly easier to conform to the kernel's general code style when
>> used in combination with clang-format.
>>
>> For more information, check the official EditorConfig website:
>> https://editorconfig.org/
>>
>> Signed-off-by: Danny Lin 
>> ---
> 
> So I worry a bit that not everybody will welcome the addition of a dotfile
> that may be magically interpreted by their editor.

I would suppose that one would have to enable editorconfig support
explicitly in one's editor, so this would have no effect for people that
haven't done so (though there are almost certainly exceptions).

> I also worry that the
> file itself could become a battleground for people wanting to argue about
> style issues.

I don't think so, not any more than the coding-style document is, and
that seems to be pretty solid (but as doc maintainer, you'd know better).

> 
> Perhaps I worry a bit too much...?

As someone who regularly needs to submit patches to random upstream
projects to fix bugs or implement minor features, I for one would really
welcome more widespread use of editorconfig. While I mostly work with
the linux kernel (and other projects using the same C style), so my
default C style setting is "linux", even for the kernel this would be
helpful to me when I poke around in none-C files (shell scripts, for
example).

Rasmus

Re: [PATCH v2 04/12] module: Add printk format to add module build ID to stacktraces

2021-03-24 Thread Rasmus Villemoes

On 24/03/2021 03.04, Stephen Boyd wrote:

> @@ -2778,6 +2793,10 @@ static inline void layout_symtab(struct module *mod, 
> struct load_info *info)
>  static void add_kallsyms(struct module *mod, const struct load_info *info)
>  {
>  }
> +
> +static void init_build_id(struct module *mod, const struct load_info *info)
> +{
> +}
>  #endif /* CONFIG_KALLSYMS */
>  
>  static void dynamic_debug_setup(struct module *mod, struct _ddebug *debug, 
> unsigned int num)
> @@ -4004,6 +4023,7 @@ static int load_module(struct load_info *info, const 
> char __user *uargs,
>   goto free_arch_cleanup;
>   }
>  
> + init_build_id(mod, info);
>   dynamic_debug_setup(mod, info->debug, info->num_debug);
>  
>   /* Ftrace init must be called in the MODULE_STATE_UNFORMED state */
> @@ -4235,7 +4255,7 @@ void * __weak 
> dereference_module_function_descriptor(struct module *mod,
>  const char *module_address_lookup(unsigned long addr,
>   unsigned long *size,
>   unsigned long *offset,
> - char **modname,
> + char **modname, unsigned char **modbuildid,

It's an existing defect with modname, but surely this should take a
"const unsigned char **modbuildid", no?

>   char *namebuf)
>  {
>   const char *ret = NULL;
> @@ -4246,6 +4266,8 @@ const char *module_address_lookup(unsigned long addr,
>   if (mod) {
>   if (modname)
>   *modname = mod->name;
> + if (modbuildid)
> + *modbuildid = mod->build_id;
>  
>   ret = find_kallsyms_symbol(mod, addr, size, offset);
>   }
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 41ddc353ebb8..9cd62e84e4aa 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -961,13 +961,15 @@ char *symbol_string(char *buf, char *end, void *ptr,
>   char sym[KSYM_SYMBOL_LEN];
>  #endif
>  
> - if (fmt[1] == 'R')
> + if (fmt[1] == 'R' || fmt[1] == 'r')
>   ptr = __builtin_extract_return_addr(ptr);
>   value = (unsigned long)ptr;
>  
>  #ifdef CONFIG_KALLSYMS
>   if (*fmt == 'B')
>   sprint_backtrace(sym, value);
> + else if (*fmt == 'S' && (fmt[1] == 'b' || fmt[1] == 'r'))
> + sprint_symbol_stacktrace(sym, value);
>   else if (*fmt != 's')
>   sprint_symbol(sym, value);
>   else
> @@ -2129,6 +2131,8 @@ early_param("no_hash_pointers", 
> no_hash_pointers_enable);
>   * - 'S' For symbolic direct pointers (or function descriptors) with offset
>   * - 's' For symbolic direct pointers (or function descriptors) without 
> offset
>   * - '[Ss]R' as above with __builtin_extract_return_addr() translation
> + * - '[Ss]r' as above with __builtin_extract_return_addr() translation and 
> module build ID
> + * - '[Ss]b' as above with module build ID (for use in backtraces)

The code doesn't quite match the comment. The lowercase s is not handled
(i.e., there's no way to say "without offset, with build ID"). You don't
have to fix the code to support that right now, the whole kallsyms
vsprintf code needs to be reworked inside-out anyway (having vsnprint
call sprint_symbol* which builds the output using snprintf() calls makes
me cringe), so please just replace [Ss] by S to make the comment match
the code - I notice that you did only document the S variant in
printk-formats.rst.

Is there any reason you didn't just make b an optional flag that could
be specified with or without R? I suppose the parsing is more difficult
with several orthogonal flags (see escaped_string()), but it's a little
easier to understand. Dunno, it's not like we're gonna think of 10 other
things that could be printed for a symbol, so perhaps it's fine.

Rasmus

Re: [PATCH v2 02/12] buildid: Add method to get running kernel's build ID

2021-03-24 Thread Rasmus Villemoes

On 24/03/2021 03.04, Stephen Boyd wrote:
> Add vmlinux_build_id() so that callers can print a hex format string
> representation of the running kernel's build ID. This will be used in
> the kdump and dump_stack code so that developers can easily locate the
> vmlinux debug symbols for a crash/stacktrace.
> 
> Cc: Jiri Olsa 
> Cc: Alexei Starovoitov 
> Cc: Jessica Yu 
> Cc: Evan Green 
> Cc: Hsin-Yi Wang 
> Cc: Dave Young 
> Cc: Baoquan He 
> Cc: Vivek Goyal 
> Cc: 
> Signed-off-by: Stephen Boyd 
> ---
>  include/linux/buildid.h |  2 ++
>  lib/buildid.c   | 19 +++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/include/linux/buildid.h b/include/linux/buildid.h
> index ebce93f26d06..2ff6b1b7cc9b 100644
> --- a/include/linux/buildid.h
> +++ b/include/linux/buildid.h
> @@ -10,4 +10,6 @@ int build_id_parse(struct vm_area_struct *vma, unsigned 
> char *build_id,
>  __u32 *size);
>  int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 
> buf_size);
>  
> +const unsigned char *vmlinux_build_id(void);
> +
>  #endif
> diff --git a/lib/buildid.c b/lib/buildid.c
> index 010ab0674cb9..fa1b6466b4b8 100644
> --- a/lib/buildid.c
> +++ b/lib/buildid.c
> @@ -4,6 +4,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define BUILD_ID 3
>  
> @@ -171,3 +172,21 @@ int build_id_parse_buf(const void *buf, unsigned char 
> *build_id, u32 buf_size)
>  {
>   return parse_build_id_buf(build_id, NULL, buf, buf_size);
>  }
> +
> +/**
> + * vmlinux_build_id - Get the running kernel's build ID
> + *
> + * Return: Running kernel's build ID
> + */
> +const unsigned char *vmlinux_build_id(void)
> +{
> + extern const void __start_notes __weak;
> + extern const void __stop_notes __weak;
> + unsigned int size = &__stop_notes - &__start_notes;
> + static unsigned char vmlinux_build_id[BUILD_ID_SIZE_MAX];
> +
> + if (!memchr_inv(vmlinux_build_id, 0, BUILD_ID_SIZE_MAX))
> + build_id_parse_buf(&__start_notes, vmlinux_build_id, size);
> +
> + return vmlinux_build_id;
> +}
> 

Hm, is there any reason to do that initialization lazily and thus need
an accessor? If the system is coming down hard, there's a (very very
small) risk that one thread starts finding the build id, is in the
middle of the memcpy, another thread also ends up wanting the vmlinux
build id, sees some non-nul byte, and proceeds to using the partially
written vmlinux_build_id.

Perhaps consider just exposing the vmlinux_build_id[] array itself,
adding a init_vmlinux_build_id() call somewhere early in start_kernel().

It could then also be made __ro_after_init.

In any case, if you decide to keep the current way, please rename the
local variable (just "build_id" is fine) so that it doesn't shadow the
very function it resides in, that's very confusing.

Rasmus

Re: [PATCH net-next] hinic: avoid gcc -Wrestrict warning

2021-03-23 Thread Rasmus Villemoes

On 23/03/2021 13.56, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> With extra warnings enabled, gcc complains that snprintf should not
> take the same buffer as source and destination:
> 
> drivers/net/ethernet/huawei/hinic/hinic_ethtool.c: In function 
> 'hinic_set_settings_to_hw':
> drivers/net/ethernet/huawei/hinic/hinic_ethtool.c:480:9: error: 'snprintf' 
> argument 4 overlaps destination object 'set_link_str' [-Werror=restrict]
>   480 |   err = snprintf(set_link_str, SET_LINK_STR_MAX_LEN,
>   | ^~~~
>   481 |   "%sspeed %d ", set_link_str, speed);
>   |   ~~~
> drivers/net/ethernet/huawei/hinic/hinic_ethtool.c:464:7: note: destination 
> object referenced by 'restrict'-qualified argument 1 was declared here
>   464 |  char set_link_str[SET_LINK_STR_MAX_LEN] = {0};
> 
> Rewrite this to remember the offset of the previous printf output
> instead.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/net/ethernet/huawei/hinic/hinic_ethtool.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c 
> b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
> index c340d9acba80..74aefc8fc4d8 100644
> --- a/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
> +++ b/drivers/net/ethernet/huawei/hinic/hinic_ethtool.c
> @@ -464,7 +464,7 @@ static int hinic_set_settings_to_hw(struct hinic_dev 
> *nic_dev,
>   char set_link_str[SET_LINK_STR_MAX_LEN] = {0};
>   struct net_device *netdev = nic_dev->netdev;
>   enum nic_speed_level speed_level = 0;
> - int err;
> + int err, off;
>  
>   err = snprintf(set_link_str, SET_LINK_STR_MAX_LEN, "%s",
>  (set_settings & HILINK_LINK_SET_AUTONEG) ?
> @@ -475,10 +475,11 @@ static int hinic_set_settings_to_hw(struct hinic_dev 
> *nic_dev,
>   return -EFAULT;
>   }
>  
> + off = err;
>   if (set_settings & HILINK_LINK_SET_SPEED) {
>   speed_level = hinic_ethtool_to_hw_speed_level(speed);
> - err = snprintf(set_link_str, SET_LINK_STR_MAX_LEN,
> -"%sspeed %d ", set_link_str, speed);
> + err = snprintf(set_link_str + off, SET_LINK_STR_MAX_LEN - off,
> +"speed %d ", speed);
>   if (err <= 0 || err >= SET_LINK_STR_MAX_LEN) {

This is broken, the second snprintf has no longer overflown if "err >=
SET_LINK_STR_MAX_LEN", but if "err >= SET_LINK_STR_MAX_LEN - off". (The
existing err <= 0 check is also bogus, but that's a different story).

But, I think these conversions where you use snprintf are all broken,
it's only a matter of time before gcc or another static analyzer also
learns a
"Wusing-return-value-from-snprintf-as-index-to-output-buffer-is-fragile-because,you-know,snprintf-semantics..."
and then we'd have to revisit all these. You should in general, when
building a string by repeatedly printf'ing to a local buffer, use the
"len += scnprintf()" pattern. That doesn't easily provide a "have we
overflown at some point" so is not directly applicable here, but all the
more reason to start making use of seq_buf to wrap a local char buffer
in a nice abstraction that lets you seq_buf_printf() and ask
seq_buf_has_overflowed().

Rasmus

Re: [kbuild-all] Re: include/linux/compiler_types.h:315:38: error: call to '__compiletime_assert_536' declared with attribute error: BUILD_BUG_ON failed: offsetof(struct can_frame, len) != offsetof(st

2021-03-23 Thread Rasmus Villemoes

On 23/03/2021 19.59, Oliver Hartkopp wrote:
> 
> 
> On 23.03.21 15:00, Rasmus Villemoes wrote:

>> Now what CONFIG_* knobs are responsible for putting -mabi=apcs-gnu in
>> CFLAGS is left as an exercise for the reader. Regardless, it is not a
>> bug in the compiler. The error is the assumption that this language
>>
>> "Aggregates and Unions
>>
>> Structures and unions assume the alignment of their most strictly
>> aligned component.
> 
> (parse error in sentence)

It was a direct quote, but I can try to paraphrase with an example. If
you have a struct foo { T1 m1; T2 m2; T3 m3; }, then alignof(struct foo)
= max(alignof(T1), alignof(T2), alignof(T3)). Same for a "union foo".

But this is specifically for x86-64; for (some flavors of) ARM, other
rules apply - namely, alignof(T) is 4 unless T is char or short (or
(un)signed variants), ignoring bitfields which have their own rules.
Note that while

union u {char a; char b;}

has alignment 4 on ARM and 1 on x86-64, other types are less strictly
aligned on ARM; e.g. s64 aka long long is 8-byte aligned on x86-64 but
(still) just 4-byte aligned on ARM. And again, this is just for specific
-mabi= options.

>> Each member is assigned to the lowest available offset with the
>> appropriate
>> alignment. The size of any object is always a multiple of the object‘s
>> alignment."
>>
>> from the x86-64 ABI applies on all other architectures/ABIs.
>>
>>> I'm not a compiler expert but this does not seem to be consistent.
>>>
>>> Especially as we only have byte sizes (inside and outside of the union)
>>> and "A field with a char type is aligned to the next available byte."
>>
>> Yes, and that's exactly what you got before the anon union was
>> introduced.
> 
> Before(!) the union there is nothing to pad.

Just to be clear, my "before" was in the temporal sense, i.e. "prior to
commit ea7800565a128", all the u8s in struct can_frame were placed one
after the other. But after that commit, struct can_frame has a new
member replacing can_dlc which happens to occupy 4 bytes (for some
ABIs), pushing the subsequent members __pad, __res0 and len8_dlc
(formerly known as __res1) ahead.

>>> The union is indeed aligned to the word boundary - but the following
>>> byte is not aligned to the next available byte.
>>
>> Yes it is, because the union occupies 4 bytes. The first byte is shared
>> by the two char members, the remaining three bytes are padding.
> 
> But why is the union 4 bytes long here and adds a padding of three bytes
> at the end?

Essentially, because arrays. It's true for _any_ type T that sizeof(T)
must be a multiple of alignof(T). Take an array "T x[9]". If x[0] is
4-byte aligned, then in order for x[1] to be 4-byte aligned as well,
x[0] must occupy a multiple of 4 bytes.

It doesn't matter at all that this happens to be an anonymous union.
Layout-wise, you could as well have a definition

union uuu { __u8 len; __u8 can_dlc; }

and made struct can_frame

struct can_frame {
   canid_t can_id;
   union uuu u;
   __u8 __pad;
   ...
};

(you lose the anonymity trick so you'd have to do frame->u.can_dlc
instead of just frame->can_dlc). You have a member with alignof()==4 and
 sizeof()==4; that sizeof() cannot magically become 1 just because that
particular instance of the type is not part of an array. Imagine what
would happen if the compiler pulled subsequent char members into
trailing padding of a previous compound member. E.g. consider

struct a { int x; char y; } // alignof==4, sizeof==8, offsetof(y)==4
struct b { struct a a; char z; }

If I have a "struct b *b", I'm allowed to do ">a" and get a "pointer
to struct a". Then I can do memset(>a, 0, sizeof(struct a)). Clearly,
z must not have been placed inside the trailing padding of struct a.

Rasmus

Re: [PATCH] amdgpu: fix gcc -Wrestrict warning

2021-03-23 Thread Rasmus Villemoes

On 23/03/2021 14.04, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> gcc warns about an sprintf() that uses the same buffer as source
> and destination, which is undefined behavior in C99:
> 
> drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c: In function 
> 'amdgpu_securedisplay_debugfs_write':
> drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c:141:6: error: 'sprintf' 
> argument 3 overlaps destination object 'i2c_output' [-Werror=restrict]
>   141 |  sprintf(i2c_output, "%s 0x%X", i2c_output,
>   |  ^~
>   142 |   
> securedisplay_cmd->securedisplay_out_message.send_roi_crc.i2c_buf[i]);
>   |   
> ~
> drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c:97:7: note: destination 
> object referenced by 'restrict'-qualified argument 1 was declared here
>97 |  char i2c_output[256];
>   |   ^~
> 
> Rewrite it to remember the current offset into the buffer instead.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> index 834440ab9ff7..69d7f6bff5d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c
> @@ -136,9 +136,10 @@ static ssize_t amdgpu_securedisplay_debugfs_write(struct 
> file *f, const char __u
>   ret = psp_securedisplay_invoke(psp, 
> TA_SECUREDISPLAY_COMMAND__SEND_ROI_CRC);
>   if (!ret) {
>   if (securedisplay_cmd->status == 
> TA_SECUREDISPLAY_STATUS__SUCCESS) {
> + int pos = 0;
>   memset(i2c_output,  0, sizeof(i2c_output));
>   for (i = 0; i < 
> TA_SECUREDISPLAY_I2C_BUFFER_SIZE; i++)
> - sprintf(i2c_output, "%s 0x%X", 
> i2c_output,
> + pos += sprintf(i2c_output + pos, " 
> 0x%X",
>   
> securedisplay_cmd->securedisplay_out_message.send_roi_crc.i2c_buf[i]);
>   dev_info(adev->dev, "SECUREDISPLAY: I2C buffer 
> out put is :%s\n", i2c_output);

Eh, why not get rid of the 256 byte stack allocation and just replace
all of this by

  dev_info(adev->dev, ""SECUREDISPLAY: I2C buffer out put is: %*ph\n",
TA_SECUREDISPLAY_I2C_BUFFER_SIZE,
securedisplay_cmd->securedisplay_out_message.send_roi_crc.i2c_buf);

That's much less code (both in #LOC and .text), and avoids adding yet
another place that will be audited over and over for "hm, yeah, that
sprintf() is actually not gonna overflow".

Yeah, it'll lose the 0x prefixes for each byte and use lowercase hex chars.

Rasmus

[PATCH] printk: rename vprintk_func to vprintk

2021-03-23 Thread Rasmus Villemoes

The printk code is already hard enough to understand. Remove an
unnecessary indirection by renaming vprintk_func to vprintk (adding
the asmlinkage annotation), and removing the vprintk definition from
printk.c. That way, printk is implemented in terms of vprintk as one
would expect, and there's no "vprintk_func, what's that? Some function
pointer that gets set where?"

The declaration of vprintk in linux/printk.h already has the
__printf(1,0) attribute, there's no point repeating that with the
definition - it's for diagnostics in callers.

linux/printk.h already contains a static inline {return 0;} definition
of vprintk when !CONFIG_PRINTK.

Since the corresponding stub definition of vprintk_func was not marked
"static inline", any translation unit including internal.h would get a
definition of vprintk_func - it just so happens that for
!CONFIG_PRINTK, there is precisely one such TU, namely printk.c. Had
there been more, it would be a link error; now it's just a silly waste
of a few bytes of .text, which one must assume are rather precious to
anyone disabling PRINTK.

$ objdump -dr kernel/printk/printk.o
0330 :
 330:   31 c0   xor%eax,%eax
 332:   c3  ret
 333:   8d b4 26 00 00 00 00lea0x0(%esi,%eiz,1),%esi
 33a:   8d b6 00 00 00 00   lea0x0(%esi),%esi

Signed-off-by: Rasmus Villemoes 
---
 kernel/printk/internal.h| 3 ---
 kernel/printk/printk.c  | 8 +---
 kernel/printk/printk_safe.c | 3 ++-
 3 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 3a8fd491758c..1c7554f0e71b 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -21,7 +21,6 @@ int vprintk_store(int facility, int level,
 
 __printf(1, 0) int vprintk_default(const char *fmt, va_list args);
 __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
-__printf(1, 0) int vprintk_func(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
@@ -56,8 +55,6 @@ void defer_console_output(void);
 
 #else
 
-__printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; }
-
 /*
  * In !PRINTK builds we still export logbuf_lock spin_lock, console_sem
  * semaphore and some of console functions (console_unlock()/etc.), so
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 575a34b88936..458707a06124 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2104,12 +2104,6 @@ asmlinkage int vprintk_emit(int facility, int level,
 }
 EXPORT_SYMBOL(vprintk_emit);
 
-asmlinkage int vprintk(const char *fmt, va_list args)
-{
-   return vprintk_func(fmt, args);
-}
-EXPORT_SYMBOL(vprintk);
-
 int vprintk_default(const char *fmt, va_list args)
 {
return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
@@ -2143,7 +2137,7 @@ asmlinkage __visible int printk(const char *fmt, ...)
int r;
 
va_start(args, fmt);
-   r = vprintk_func(fmt, args);
+   r = vprintk(fmt, args);
va_end(args);
 
return r;
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 2e9e3ed7d63e..87d2e86af122 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -367,7 +367,7 @@ void __printk_safe_exit(void)
this_cpu_dec(printk_context);
 }
 
-__printf(1, 0) int vprintk_func(const char *fmt, va_list args)
+asmlinkage int vprintk(const char *fmt, va_list args)
 {
 #ifdef CONFIG_KGDB_KDB
/* Allow to pass printk() to kdb but avoid a recursion. */
@@ -420,3 +420,4 @@ void __init printk_safe_init(void)
/* Flush pending messages that did not have scheduled IRQ works. */
printk_safe_flush();
 }
+EXPORT_SYMBOL(vprintk);
-- 
2.29.2

Re: [kbuild-all] Re: include/linux/compiler_types.h:315:38: error: call to '__compiletime_assert_536' declared with attribute error: BUILD_BUG_ON failed: offsetof(struct can_frame, len) != offsetof(st

2021-03-23 Thread Rasmus Villemoes

On 23/03/2021 13.49, Oliver Hartkopp wrote:
> 
> 
> On 23.03.21 12:36, Rasmus Villemoes wrote:
>>
>> and more directly from the horse's mouth:
>>
>> https://developer.arm.com/documentation/dui0067/d/arm-compiler-reference/c-and-c---implementation-details/structures--unions--enumerations--and-bitfields
>>
>>
>> Field alignment
>>
>>  Structures are arranged with the first-named component at the lowest
>> address. Fields are aligned as follows:
>>
>>  A field with a char type is aligned to the next available byte.
>>
>>  A field with a short type is aligned to the next even-addressed
>> byte.
>>
>>  Bitfield alignment depends on how the bitfield is declared. See
>> Bitfields in packed structures for more information.
>>
>>  All other types are aligned on word boundaries.
>>
>> That anonymous union falls into the "All other types" bullet.
>>
>> __packed is the documented and standard way to overrule the
>> compiler's/ABI's layout decisions.
> 
> So why is there a difference between
> 
> gcc version 10.2.0
> 
> and
> 
> gcc version 10.2.1 20210110 (Debian 10.2.1-6)

I'm guessing there's no difference between those (in this respect), but
they are invoked differently.

> Would this mean that either STRUCTURE_SIZE_BOUNDARY or the command line
> option -mstructure_size_boundary=
> 
> are set differently?

Yes, though very likely -mstructure_size_boundary is not set explicitly
but via some other option.

gcc has a rather helpful but almost unknown feature that one can
actually query for lots of different parameters and their
default/current values. So on my Ubuntu system (20.04, gcc 9.3), for
example, if I do

$ arm-linux-gnueabihf-gcc -O2 -Q --help=target | grep struct
  -mstructure-size-boundary=8

So that would seem to say that the union should work as expected.
However, when I actually try to compile with the .config that kbuild
reports failing, I do see that BUILD_BUG_ON triggering.

So let us inspect the actual command line used to build some other
random .o file in net/can; look at net/can/.bcm.o.cmd

cmd_net/can/bcm.o := arm-linux-gnueabihf-gcc -Wp,-MMD,net/can/.bcm.o.d
-nostdinc -isystem /usr/lib/gcc-cross/arm-linux-gnueabihf/9/include
-I./arch/arm/include -I./arch/arm/include/generated  -I./include
-I./arch/arm/include/uapi -I./arch/arm/include/generated/uapi
-I./include/uapi -I./include/generated/uapi -include
./include/linux/compiler-version.h -include ./include/linux/kconfig.h
-include ./include/linux/compiler_types.h -D__KERNEL__ -mlittle-endian
-I./arch/arm/mach-footbridge/include -fmacro-prefix-map=./= -Wall
-Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing
-fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration
-Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu89
-fno-dwarf2-cfi-asm -mno-unaligned-access -fno-omit-frame-pointer -mapcs
-mno-sched-prolog -fno-ipa-sra -mabi=apcs-gnu -mno-thumb-interwork -marm
-Wa,-mno-warn-deprecated -D__LINUX_ARM_ARCH__=4 -march=armv4
-mtune=strongarm110 -msoft-float -Uarm -fno-delete-null-pointer-checks
-Wno-frame-address -Wno-format-truncation -Wno-format-overflow
-Wno-address-of-packed-member -O2 --param=allow-store-data-races=0
-Wframe-larger-than=1024 -fno-stack-protector
-Wno-unused-but-set-variable -Wimplicit-fallthrough
-Wno-unused-const-variable -fno-omit-frame-pointer
-fno-optimize-sibling-calls -fno-inline-functions-called-once
-Wdeclaration-after-statement -Wvla -Wno-pointer-sign
-Wno-stringop-truncation -Wno-array-bounds -Wno-stringop-overflow
-Wno-restrict -Wno-maybe-uninitialized -fno-strict-overflow
-fno-stack-check -fconserve-stack -Werror=date-time
-Werror=incompatible-pointer-types -Werror=designated-init
-Wno-packed-not-aligned-fsanitize-coverage=trace-pc
-DKBUILD_MODFILE='"net/can/can-bcm"' -DKBUILD_BASENAME='"bcm"'
-DKBUILD_MODNAME='"can_bcm"' -D__KBUILD_MODNAME=kmod_can_bcm -c -o
net/can/bcm.o net/can/bcm.c

Lots of gunk. But just to see if one of those options have affected the
-mstructure-size-boundary= value, just take that whole command line and
throw in -Q --help=target at the end, and we get

  -mstructure-size-boundary=32

So let us guess that it's the ABI choice -mabi=apcs-gnu

$ arm-linux-gnueabihf-gcc -O2 -msoft-float -mabi=apcs-gnu -Q
--help=target | grep struct
  -mstructure-size-boundary=32

Bingo. (-msoft-float is also included just as in the real command line
because gcc barfs otherwise).

Now what CONFIG_* knobs are responsible for putting -mabi=apcs-gnu in
CFLAGS is left as an exercise for the reader. Regardless, it is not a
bug in the compiler. The error is the assumption that this language

"Aggregates and Unions

Structures and unions assume the alignment of

Re: [PATCH] Input: analog - fix invalid snprintf() call

2021-03-23 Thread Rasmus Villemoes

On 23/03/2021 14.14, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> overlapping input and output arguments to snprintf() are
> undefined behavior in C99:
> 

Good luck:
https://lore.kernel.org/lkml/1457469654-17059-1-git-send-email-li...@rasmusvillemoes.dk/

At least 5 years ago the consensus from old-timers was that "the
kernel's snprintf supports this use case, just keep it working that way".

> diff --git a/drivers/input/joystick/analog.c b/drivers/input/joystick/analog.c
> index f798922a4598..8c9fed3f13e2 100644
> --- a/drivers/input/joystick/analog.c
> +++ b/drivers/input/joystick/analog.c
> @@ -419,14 +419,16 @@ static void analog_calibrate_timer(struct analog_port 
> *port)
>  
>  static void analog_name(struct analog *analog)
>  {
> - snprintf(analog->name, sizeof(analog->name), "Analog %d-axis %d-button",
> + int len;
> +
> + len = snprintf(analog->name, sizeof(analog->name), "Analog %d-axis 
> %d-button",
>hweight8(analog->mask & ANALOG_AXES_STD),
>hweight8(analog->mask & ANALOG_BTNS_STD) + !!(analog->mask & 
> ANALOG_BTNS_CHF) * 2 +
>hweight16(analog->mask & ANALOG_BTNS_GAMEPAD) + 
> !!(analog->mask & ANALOG_HBTN_CHF) * 4);
>  
>   if (analog->mask & ANALOG_HATS_ALL)
> - snprintf(analog->name, sizeof(analog->name), "%s %d-hat",
> -  analog->name, hweight16(analog->mask & 
> ANALOG_HATS_ALL));
> + len += snprintf(analog->name + len, sizeof(analog->name) - len, 
> "%d-hat",
> +  hweight16(analog->mask & ANALOG_HATS_ALL));

Use scnprintf, this is too fragile and hard to verify. If the first
snprintf overflows, the second passes a huge size_t to snprintf which
will WARN.

Rasmus

Re: [kbuild-all] Re: include/linux/compiler_types.h:315:38: error: call to '__compiletime_assert_536' declared with attribute error: BUILD_BUG_ON failed: offsetof(struct can_frame, len) != offsetof(st

2021-03-23 Thread Rasmus Villemoes

On 23/03/2021 08.45, Oliver Hartkopp wrote:

> IMO we facing a compiler problem here - and we should be very happy that
> the BUILD_BUG_ON() triggered an issue after years of silence.
> 
> I do not have a good feeling about what kind of strange effects this
> compiler issue might have in other code of other projects.
> 
> So I would explicitly suggest NOT to change the af_can.c code to work
> around this compiler issue.
> 
> Let the gcc people fix their product and let them thank all of us for
> detecting it.

I'm sure you'd be eligible for a full refund in case this was a bug in
gcc. It is not. It's a pretty clear ABI requirement for (at least some
flavors of) ARM:

https://stackoverflow.com/questions/43786747/struct-layout-in-apcs-gnu-abi

and more directly from the horse's mouth:

https://developer.arm.com/documentation/dui0067/d/arm-compiler-reference/c-and-c---implementation-details/structures--unions--enumerations--and-bitfields

Field alignment

Structures are arranged with the first-named component at the lowest
address. Fields are aligned as follows:

A field with a char type is aligned to the next available byte.

A field with a short type is aligned to the next even-addressed
byte.

Bitfield alignment depends on how the bitfield is declared. See
Bitfields in packed structures for more information.

All other types are aligned on word boundaries.

That anonymous union falls into the "All other types" bullet.

__packed is the documented and standard way to overrule the
compiler's/ABI's layout decisions.

Rasmus

[PATCH] USB: gadget: legacy: remove left-over __ref annotations

2021-03-23 Thread Rasmus Villemoes

These were added in commit 780cc0f370 ("usb: gadget: add '__ref' for
rndis_config_register() and cdc_config_register()") to silence
modpost, but they didn't fix the real problem - that was fixed later
by removing wrong __init annotations in commit c94e289f195e ("usb:
gadget: remove incorrect __init/__exit annotations").

It really never makes sense for a function to be marked __ref unless
it (1) has some conditional that chooses whether to call an __init
function (or access __initdata) or not and (2) has a comment
explaining why the __ref is there and why it is safe.

Signed-off-by: Rasmus Villemoes 
---
 drivers/usb/gadget/legacy/multi.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/gadget/legacy/multi.c 
b/drivers/usb/gadget/legacy/multi.c
index ec9749845660..f6d0782e6fc3 100644
--- a/drivers/usb/gadget/legacy/multi.c
+++ b/drivers/usb/gadget/legacy/multi.c
@@ -182,7 +182,7 @@ static int rndis_do_config(struct usb_configuration *c)
return ret;
 }
 
-static __ref int rndis_config_register(struct usb_composite_dev *cdev)
+static int rndis_config_register(struct usb_composite_dev *cdev)
 {
static struct usb_configuration config = {
.bConfigurationValue= MULTI_RNDIS_CONFIG_NUM,
@@ -197,7 +197,7 @@ static __ref int rndis_config_register(struct 
usb_composite_dev *cdev)
 
 #else
 
-static __ref int rndis_config_register(struct usb_composite_dev *cdev)
+static int rndis_config_register(struct usb_composite_dev *cdev)
 {
return 0;
 }
@@ -265,7 +265,7 @@ static int cdc_do_config(struct usb_configuration *c)
return ret;
 }
 
-static __ref int cdc_config_register(struct usb_composite_dev *cdev)
+static int cdc_config_register(struct usb_composite_dev *cdev)
 {
static struct usb_configuration config = {
.bConfigurationValue= MULTI_CDC_CONFIG_NUM,
@@ -280,7 +280,7 @@ static __ref int cdc_config_register(struct 
usb_composite_dev *cdev)
 
 #else
 
-static __ref int cdc_config_register(struct usb_composite_dev *cdev)
+static int cdc_config_register(struct usb_composite_dev *cdev)
 {
return 0;
 }
@@ -291,7 +291,7 @@ static __ref int cdc_config_register(struct 
usb_composite_dev *cdev)
 
 /** Gadget Bind **/
 
-static int __ref multi_bind(struct usb_composite_dev *cdev)
+static int multi_bind(struct usb_composite_dev *cdev)
 {
struct usb_gadget *gadget = cdev->gadget;
 #ifdef CONFIG_USB_G_MULTI_CDC
-- 
2.29.2

[PATCH] clkdev: remove unnecessary __ref annotations

2021-03-22 Thread Rasmus Villemoes

clkdev_alloc and vclkdev_alloc do not call any functions living in
.init.text (nor are there comments as to when and why that would be
safe). The original annotation dates back to when the code was
imported from ARM in 6d803ba736ab, and AFAICT the last reason to have
the annotation vanished with ea17c9d868c1 (sh: Use generic clkdev.h
header).

Signed-off-by: Rasmus Villemoes 
---
 drivers/clk/clkdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c
index 0f2e3fcf0f19..d446c2c697d5 100644
--- a/drivers/clk/clkdev.c
+++ b/drivers/clk/clkdev.c
@@ -153,7 +153,7 @@ struct clk_lookup_alloc {
charcon_id[MAX_CON_ID];
 };
 
-static struct clk_lookup * __ref
+static struct clk_lookup *
 vclkdev_alloc(struct clk_hw *hw, const char *con_id, const char *dev_fmt,
va_list ap)
 {
@@ -190,7 +190,7 @@ vclkdev_create(struct clk_hw *hw, const char *con_id, const 
char *dev_fmt,
return cl;
 }
 
-struct clk_lookup * __ref
+struct clk_lookup *
 clkdev_alloc(struct clk *clk, const char *con_id, const char *dev_fmt, ...)
 {
struct clk_lookup *cl;
-- 
2.29.2

Re: [PATCH 06/12] tools: sync small_const_nbits() macro with the kernel

2021-03-22 Thread Rasmus Villemoes

On 21/03/2021 22.54, Yury Norov wrote:
> Move the macro from tools/include/asm-generic/bitsperlong.h
> to tools/include/linux/bitmap.h

The patch does it the other way around :)

> Signed-off-by: Yury Norov 
> ---
>  tools/include/asm-generic/bitsperlong.h | 3 +++
>  tools/include/linux/bitmap.h| 3 ---
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/include/asm-generic/bitsperlong.h 
> b/tools/include/asm-generic/bitsperlong.h
> index 8f2283052333..f530da2506cc 100644
> --- a/tools/include/asm-generic/bitsperlong.h
> +++ b/tools/include/asm-generic/bitsperlong.h
> @@ -18,4 +18,7 @@
>  #define BITS_PER_LONG_LONG 64
>  #endif
>  
> +#define small_const_nbits(nbits) \
> + (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG)
> +

Well, the movement is consistent with the kernel, but shouldn't the
definition also be updated to exclude constant-zero-size? It's not that
they exist or ever have, in tools/ or kernel proper, but just if some
day some oddball CONFIG_ combination ends up creating such a beast, I'd
rather not have code like

+   if (small_const_nbits(size)) {
+   unsigned long val = *addr & GENMASK(size - 1, 0);

blow up at run-time.

Other than that (and the above commit log typo), consider the series

Acked-by: Rasmus Villemoes

Re: [PATCH 2/3] static_call: Align static_call_is_init() patching condition

2021-03-19 Thread Rasmus Villemoes

On 19/03/2021 15.40, Rasmus Villemoes wrote:
> On 19/03/2021 15.13, Peter Zijlstra wrote:
> 
>>> Dunno, probably overkill, but perhaps we could have an atomic_t (or
>>> refcount, whatever) init_ref inited to 1, with init_ref_get() doing an
>>> inc_unless_zero, and iff you get a ref, you're free to call (/patch)
>>> __init functions and access __initdata, but must do init_ref_put(), with
>>> PID1 dropping its initial ref and waiting for it to drop to 0 before
>>> doing the *free_initmem() calls.
>>
>> I'd as soon simply add another SYSTEM state. That way we don't have to
>> worry about who else looks at RUNNING for what etc..
> 
> I don't understand. How would that solve the
> 
> PID1   PIDX
>ok = system_state < INIT_MEM_GONE;
> system_state = INIT_MEM_GONE;
> free_initmem();
> system_state = RUNNING;
>if (ok)
>poke init mem
> 
> race? I really don't see any way arbitrary threads can reliably check
> how far PID1 has progressed at one point in time and use that
> information (a few lines) later to access init memory without
> synchronizing with PID1.
> 
> AFAICT, having an atomic_t object representing the init memory 

something like

--- a/init/main.c
+++ b/init/main.c
@@ -1417,6 +1417,18 @@ void __weak free_initmem(void)
free_initmem_default(POISON_FREE_INITMEM);
 }

+static atomic_t init_mem_ref = ATOMIC_INIT(1);
+static DECLARE_COMPLETION(init_mem_may_go);
+bool init_mem_get(void)
+{
+   return atomic_inc_not_zero(_mem_ref);
+}
+void init_mem_put(void)
+{
+   if (atomic_dec_and_test(_mem_ref))
+   complete(_mem_may_go);
+}
+
 static int __ref kernel_init(void *unused)
 {
int ret;
@@ -1424,6 +1436,8 @@ static int __ref kernel_init(void *unused)
kernel_init_freeable();
/* need to finish all async __init code before freeing the memory */
async_synchronize_full();
+   init_mem_put();
+   wait_for_completion(_mem_may_go);
kprobe_free_init_mem();
ftrace_free_init_mem();
kgdb_free_init_mem();

Re: [PATCH 2/3] static_call: Align static_call_is_init() patching condition

2021-03-19 Thread Rasmus Villemoes

On 19/03/2021 15.13, Peter Zijlstra wrote:

>> Dunno, probably overkill, but perhaps we could have an atomic_t (or
>> refcount, whatever) init_ref inited to 1, with init_ref_get() doing an
>> inc_unless_zero, and iff you get a ref, you're free to call (/patch)
>> __init functions and access __initdata, but must do init_ref_put(), with
>> PID1 dropping its initial ref and waiting for it to drop to 0 before
>> doing the *free_initmem() calls.
> 
> I'd as soon simply add another SYSTEM state. That way we don't have to
> worry about who else looks at RUNNING for what etc..

I don't understand. How would that solve the

PID1   PIDX
   ok = system_state < INIT_MEM_GONE;
system_state = INIT_MEM_GONE;
free_initmem();
system_state = RUNNING;
   if (ok)
   poke init mem

race? I really don't see any way arbitrary threads can reliably check
how far PID1 has progressed at one point in time and use that
information (a few lines) later to access init memory without
synchronizing with PID1.

AFAICT, having an atomic_t object representing the init memory and a
"no, you can't have a new reference" would guarantee correctness of the
jump_label/static_call patching: If we get a reference, we do the
patching just as for the rest of the kernel .text. If we don't, i.e.
observe atomic_load(init_ref)==0, that may not necessarily mean that
PID1 has actually discarded the memory yet, but no thread can currently
or in the future actually run __init code, so it need not be patched.

Rasmus

Re: [PATCH 2/3] static_call: Align static_call_is_init() patching condition

2021-03-19 Thread Rasmus Villemoes

On 18/03/2021 12.31, Peter Zijlstra wrote:
> The intent is to avoid writing init code after init (because the text
> might have been freed). The code is needlessly different between
> jump_label and static_call and not obviously correct.
> 
> The existing code relies on the fact that the module loader clears the
> init layout, such that within_module_init() always fails, while
> jump_label relies on the module state which is more obvious and
> matches the kernel logic.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  kernel/static_call.c |   14 --
>  1 file changed, 4 insertions(+), 10 deletions(-)
> 
> --- a/kernel/static_call.c
> +++ b/kernel/static_call.c
> @@ -149,6 +149,7 @@ void __static_call_update(struct static_
>   };
>  
>   for (site_mod =  site_mod; site_mod = site_mod->next) {
> + bool init = system_state < SYSTEM_RUNNING;

I recently had occasion to look at whether that would be a suitable
condition for knowing whether __init stuff was gone, but concluded that
it's not. Maybe I'm wrong. init/main.c:

free_initmem();
mark_readonly();

/*
 * Kernel mappings are now finalized - update the userspace
page-table
 * to finalize PTI.
 */
pti_finalize();

system_state = SYSTEM_RUNNING;

So ISTM there's window where system_state < SYSTEM_RUNNING but accessing
init stuff is a bad idea. If you're PID1 it's all fine, but I suppose
other kernel threads could end up calling static_call_update. And just
moving the system_state setting up before the *free_initmem() calls
doesn't really solve anything because TOCTOU.

Dunno, probably overkill, but perhaps we could have an atomic_t (or
refcount, whatever) init_ref inited to 1, with init_ref_get() doing an
inc_unless_zero, and iff you get a ref, you're free to call (/patch)
__init functions and access __initdata, but must do init_ref_put(), with
PID1 dropping its initial ref and waiting for it to drop to 0 before
doing the *free_initmem() calls.

Rasmus

Re: [RFC PATCH] devres: better type safety with devm_*_action()

2021-03-18 Thread Rasmus Villemoes

On 10/03/2021 00.59, Rasmus Villemoes wrote:

[quoting in full for context to the new CCs]

> With a little MacroMagic(tm), we can allow users to pass a pointer to
> a function that actually takes the type of the data argument, instead
> of forcing the function to have prototype void (*)(void *). Of course,
> we must still accept such functions.
> 
> This can provide a little more type safety in that we get fewer
> implicit casts to and from void* - as a random example,
> gpio_mockup_dispose_mappings in drivers/gpio/gpio-mockup.c could take
> a "struct gpio_mockup_chip *chip" directly.
> 
> Moreover, when the action is some "external" API, there will in many
> cases no longer be a need for a trivial local wrapper -
> e.g. drivers/watchdog/cadence_wdt.c could just use
> clk_disable_unprepare directly and avoid defining
> cdns_clk_disable_unprepare.
> 
> Signed-off-by: Rasmus Villemoes 
> ---
>  drivers/base/devres.c  | 32 +++-
>  include/linux/device.h | 36 
>  2 files changed, 55 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/base/devres.c b/drivers/base/devres.c
> index fb9d5289a620..97ebd26bc44a 100644
> --- a/drivers/base/devres.c
> +++ b/drivers/base/devres.c
> @@ -728,15 +728,25 @@ static void devm_action_release(struct device *dev, 
> void *res)
>  }
>  
>  /**
> - * devm_add_action() - add a custom action to list of managed resources
> + * __devm_add_action() - add a custom action to list of managed resources
>   * @dev: Device that owns the action
>   * @action: Function that should be called
>   * @data: Pointer to data passed to @action implementation
>   *
>   * This adds a custom action to the list of managed resources so that
>   * it gets executed as part of standard resource unwinding.
> + *
> + * Do not call directly, but use the the macro wrapper
> + * devm_add_action, whose "prototype" is
> + *
> + * devm_add_action(struct device *dev, void (*action)(T *), T *data)
> + *
> + * This allows use of type-correct callbacks and can avoid wrapping
> + * external APIs. For example, if the data item is a "struct clk *", one
> + * can use clk_disable_unprepare directly as the action instead of
> + * creating a local wrapper taking a "void *" argument.
>   */
> -int devm_add_action(struct device *dev, void (*action)(void *), void *data)
> +int __devm_add_action(struct device *dev, void (*action)(void *), void *data)
>  {
>   struct action_devres *devres;
>  
> @@ -751,18 +761,20 @@ int devm_add_action(struct device *dev, void 
> (*action)(void *), void *data)
>   devres_add(dev, devres);
>   return 0;
>  }
> -EXPORT_SYMBOL_GPL(devm_add_action);
> +EXPORT_SYMBOL_GPL(__devm_add_action);
>  
>  /**
> - * devm_remove_action() - removes previously added custom action
> + * __devm_remove_action() - removes previously added custom action
>   * @dev: Device that owns the action
>   * @action: Function implementing the action
>   * @data: Pointer to data passed to @action implementation
>   *
>   * Removes instance of @action previously added by devm_add_action().
>   * Both action and data should match one of the existing entries.
> + *
> + * Use the macro wrapper devm_remove_action, see __devm_add_action for 
> details.
>   */
> -void devm_remove_action(struct device *dev, void (*action)(void *), void 
> *data)
> +void __devm_remove_action(struct device *dev, void (*action)(void *), void 
> *data)
>  {
>   struct action_devres devres = {
>   .data = data,
> @@ -772,10 +784,10 @@ void devm_remove_action(struct device *dev, void 
> (*action)(void *), void *data)
>   WARN_ON(devres_destroy(dev, devm_action_release, devm_action_match,
>  ));
>  }
> -EXPORT_SYMBOL_GPL(devm_remove_action);
> +EXPORT_SYMBOL_GPL(__devm_remove_action);
>  
>  /**
> - * devm_release_action() - release previously added custom action
> + * __devm_release_action() - release previously added custom action
>   * @dev: Device that owns the action
>   * @action: Function implementing the action
>   * @data: Pointer to data passed to @action implementation
> @@ -783,8 +795,10 @@ EXPORT_SYMBOL_GPL(devm_remove_action);
>   * Releases and removes instance of @action previously added by
>   * devm_add_action().  Both action and data should match one of the
>   * existing entries.
> + *
> + * Use the macro wrapper devm_release_action, see __devm_add_action for 
> details.
>   */
> -void devm_release_action(struct device *dev, void (*action)(void *), void 
> *data)
> +void __devm_release_action(struct device *dev, void (*action)(void *), void

Re: [PATCH] ARM: decompressor: remove unused global variable output_data

2021-03-18 Thread Rasmus Villemoes

On 08/03/2021 15.29, Rasmus Villemoes wrote:
> output_data seems to have been write-only since the flush_window()
> callback was removed in commit e7db7b4270ed ("arm: add support for
> LZO-compressed kernels").

Ping.

Re: [PATCH 1/2] devtmpfs: fix placement of complete() call

2021-03-18 Thread Rasmus Villemoes

On 12/03/2021 11.30, Rasmus Villemoes wrote:
> Calling complete() from within the __init function is wrong -
> theoretically, the init process could proceed all the way to freeing
> the init mem before the devtmpfsd thread gets to execute the return
> instruction in devtmpfs_setup().

Greg, ping? Also for the other one.

Re: [PATCH v5] printk: Userspace format enumeration support

2021-03-18 Thread Rasmus Villemoes

On 18/03/2021 11.46, Petr Mladek wrote:

> BTW: Is the trick with int (printk)(const char *s, ...) documented
> somewhere? Is it portable?

It is completely standard and portable C, explicitly spelled out in the
C standard itself. C99:

===
6.10.3 Macro replacement

10 [...] Each subsequent instance of the
function-like macro name followed by a ( as the next preprocessing token
introduces the
sequence of preprocessing tokens that is replaced by the replacement
list in the definition
(an invocation of the macro). [...]
===

and later

===
7.1.4 Use of library functions

1 [...] one
of the techniques shown below can be used to ensure the declaration is
not affected by
such a macro. Any macro definition of a function can be suppressed
locally by enclosing
the name of the function in parentheses, because the name is then not
followed by the left
parenthesis that indicates expansion of a macro function name. For the
same syntactic
reason, it is permitted to take the address of a library function even
if it is also defined as
a macro.
===

Also, the use of printk() inside the definition of a printk()
function-like macro does not lead to infinite recursion, by

===
6.10.3.4 Rescanning and further replacement

2 If the name of the macro being replaced is found during this scan of
the replacement list
(not including the rest of the source file’s preprocessing tokens), it
is not replaced.
===

Rasmus

Re: [PATCH 04/13] lib: introduce BITS_{FIRST,LAST} macro

2021-03-17 Thread Rasmus Villemoes

On 17/03/2021 06.40, Yury Norov wrote:
> On Tue, Mar 16, 2021 at 01:42:45PM +0200, Andy Shevchenko wrote:

>>> It would also be much easier to review if you just redefined the
>>> BITMAP_LAST_WORD_MASK macros etc. in terms of these new things, so you
>>> wouldn't have to do a lot of mechanical changes at the same time as
>>> introducing the new ones - especially when those mechanical changes
>>> involve adding a "minus 1" everywhere.
>>
>> I tend to agree with Rasmus here.
> 
> OK. All this plus terrible GENMASK(high, low) design, when high goes
> first, makes me feel like we need to deprecate GENMASK and propose a
> new interface.
> 
> What do you think about this:
> BITS_FIRST(bitnum)  -> [0, bitnum)
> BITS_LAST(bitnum)   -> [bitnum, BITS_PER_LONG)
> BITS_RANGE(begin, end)  -> [begin, end)

Better, though I'm not too happy about BITS_LAST(n) not producing a word
with the n highest bits set. I dunno, I don't have better names.
BITS_FROM/BITS_UPTO perhaps, but not really (and upto sounds like it is
inclusive). BITS_LOW/BITS_HIGH have the same problem as BITS_LAST.

Also, be careful to document what one can expect from the boundary
values 0/BITS_PER_LONG. Is BITS_FIRST(0) a valid invocation? Does it
yield 0UL? How about BITS_FIRST(BITS_PER_LONG), does that give ~0UL?
Note that BITMAP_{FIRST,LAST}_WORD_MASK never produce 0, they're never
used except with a word we know to be part of the bitmap.

> We can pick BITS_{LAST,FIRST} implementation from existing BITMAP_*_WORD_MASK
> analogues, and make the BITS_RANGE like:
> #define BITS_RANGE(begin, end) BITS_FIRST(end) & BITS_LAST(begin)
> 
> Regarding BITMAP_*_WORD_MASK, I can save them in bitmap.h as aliases
> to BITS_{LAST,FIRST} to avoid massive renaming. (Should I?)

Yes, now that I read these again, I definitely think the
BITMAP_{FIRST,LAST}_WORD_MASK should stay (whether their implementation
change I don't care). Their names document what they do much better than
if you replace them with their potential new implementations:
BITMAP_FIRST_WORD_MASK(start) is obviously about having to mask off some
low bits of the first word we're looking at because we're looking at an
offset into the bitmap, and similarly BITMAP_LAST_WORD_MASK(nbits)
explains itself: nbits is such that the last word needs some masking.
But their replacements would be BITS_LAST(start) and BITS_FIRST(nbits)
respectively (possibly with those arguments reduced mod N), which is
quite confusing.

> Would this all work for you?

Maybe, I think I'd have to see the implementation and how those new
macros get used.

Thanks,
Rasmus

Re: [PATCH v2] Increase page and bit waitqueue hash size

2021-03-17 Thread Rasmus Villemoes

On 17/03/2021 08.54, Nicholas Piggin wrote:

> +#if CONFIG_BASE_SMALL
> +static const unsigned int page_wait_table_bits = 4;
>  static wait_queue_head_t page_wait_table[PAGE_WAIT_TABLE_SIZE] 
> __cacheline_aligned;

>  
> + if (!CONFIG_BASE_SMALL) {
> + page_wait_table = alloc_large_system_hash("page waitqueue hash",
> + 
> sizeof(wait_queue_head_t),
> + 0,

So, how does the compiler not scream at you for assigning to an array,
even if it's inside an if (0)?

Rasmus

Re: [PATCH v5] printk: Userspace format enumeration support

2021-03-17 Thread Rasmus Villemoes

On 17/03/2021 09.40, Petr Mladek wrote:
> On Tue 2021-03-16 14:28:12, Chris Down wrote:
>> Rasmus Villemoes writes:
>>> I think it's pointless renaming the symbol to _printk, with all the
>>> churn and reduced readability that involves (especially when reading
>>> assembly "why are we calling _printk and not printk here?"). There's
>>> nothing wrong with providing a macro wrapper by the same name
>>>
>>> #define printk(bla bla) ({ do_stuff; printk(bla bla); })
>>>
>>> Only two places would need to be updated to surround the word printk in
>>> parentheses to suppress macro expansion: The declaration and the
>>> definition of printk. I.e.
>>>
>>> int (printk)(const char *s, ...)

[Of course, one could avoid that on the declaration by making sure the
macro wrapper is defined below.]

>> Hmm, I'm indifferent to either. Personally I don't like the ambiguity of
>> having both a macro and function share the same name and having to think
>> "what's the preprocessor context here?".
> 
> I would prefer to keep _printk. I agree that it creates some churn but
> it is easier to see what is going on.

It is? Nobody except the few who happen to remember about the
printk_index thing are going to understand why, when looking at
disassembly, there's now calls of a _printk function. "Huh, my code just
does pr_err(), I'm not calling some internal printk function". But it's
not up to me, so I'll stop there.

Anyway, on to the other thing I mentioned on dev_err and friends: I
think it would improve readability and make it a lot easier to (probably
in a later patch) add support for all those dev_* and net_* and whatever
other subsystems have their own wrappers if one created a new header,
linux/printk-index.h, doing something like

#ifdef CONFIG_PRINTK_INDEX
#define __printk_index_emit(fmt) do {the appropriate magic} while (0)
#else
#define __printk_index_emit(fmt) do {} while (0)
#endif

#define printk_index_wrap(fmt, real_func, ...) ({ \
  __printk_index_emit(fmt); \
  real_func(__VA_ARGS__); \
})

and then it's a matter of doing

#define printk(fmt, ...) printk_index_wrap(fmt, _printk, fmt, ##__VA_ARGS__)

or

#define _dev_err(dev, fmt, ...) printk_index_wrap(fmt, __dev_err, dev,
fmt, ##__VA_ARGS__)

(yeah, _dev_err is the real function, dev_err is already a macro, so
doing it for dev_* would involve renaming _dev_err to __dev_err. Or one
could fold the printk_index logic into the existing dev_err macro).

That is, avoid defining two different versions of each and every
required wrapper macro depending on CONFIG_PRINTK_INDEX.

One could also record the function a format is being used with - without
that, the display probably can't show a reasonable  for those
dev_* function.

Rasmus

Re: [PATCH v5] printk: Userspace format enumeration support

2021-03-16 Thread Rasmus Villemoes

On 10/03/2021 03.30, Chris Down wrote:

> ---
>  MAINTAINERS  |   5 +
>  arch/arm/kernel/entry-v7m.S  |   2 +-
>  arch/arm/lib/backtrace-clang.S   |   2 +-
>  arch/arm/lib/backtrace.S |   2 +-
>  arch/arm/mach-rpc/io-acorn.S |   2 +-
>  arch/arm/vfp/vfphw.S |   6 +-
>  arch/ia64/include/uapi/asm/cmpxchg.h |   4 +-
>  arch/openrisc/kernel/entry.S |   6 +-
>  arch/powerpc/kernel/head_fsl_booke.S |   2 +-
>  arch/um/include/shared/user.h|   3 +-
>  arch/x86/kernel/head_32.S|   2 +-
>  fs/seq_file.c|  21 +++
>  include/asm-generic/vmlinux.lds.h|  13 ++
>  include/linux/module.h   |   6 +
>  include/linux/printk.h   |  72 ++-
>  include/linux/seq_file.h |   1 +
>  include/linux/string_helpers.h   |   2 +
>  init/Kconfig |  14 ++
>  kernel/module.c  |  14 +-
>  kernel/printk/Makefile   |   1 +
>  kernel/printk/index.c| 183 +++
>  kernel/printk/printk.c   |  20 ++-
>  lib/string_helpers.c |  29 -
>  lib/test-string_helpers.c|   6 +
>  24 files changed, 386 insertions(+), 32 deletions(-)
>  create mode 100644 kernel/printk/index.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3353de0c4bc8..328b3e83 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14314,6 +14314,11 @@ S:   Maintained
>  F:   include/linux/printk.h
>  F:   kernel/printk/
>  
> +PRINTK INDEXING
> +R:   Chris Down 
> +S:   Maintained
> +F:   kernel/printk/index.c
> +
>  PRISM54 WIRELESS DRIVER
>  M:   Luis Chamberlain 
>  L:   linux-wirel...@vger.kernel.org
> diff --git a/arch/arm/kernel/entry-v7m.S b/arch/arm/kernel/entry-v7m.S
> index d0e898608d30..7bde93c10962 100644
> --- a/arch/arm/kernel/entry-v7m.S
> +++ b/arch/arm/kernel/entry-v7m.S
> @@ -23,7 +23,7 @@ __invalid_entry:
>   adr r0, strerr
>   mrs r1, ipsr
>   mov r2, lr
> - bl  printk
> + bl  _printk

I think it's pointless renaming the symbol to _printk, with all the
churn and reduced readability that involves (especially when reading
assembly "why are we calling _printk and not printk here?"). There's
nothing wrong with providing a macro wrapper by the same name

#define printk(bla bla) ({ do_stuff; printk(bla bla); })

Only two places would need to be updated to surround the word printk in
parentheses to suppress macro expansion: The declaration and the
definition of printk. I.e.

int (printk)(const char *s, ...)

>  
> +struct module;
> +
> +#ifdef CONFIG_PRINTK_INDEX
> +extern void pi_sec_store(struct module *mod);
> +extern void pi_sec_remove(struct module *mod);
> +
> +struct pi_object {
> + const char *fmt;
> + const char *func;
> + const char *file;
> + unsigned int line;
> +};
> +
> +extern struct pi_object __start_printk_index[];
> +extern struct pi_object __stop_printk_index[];

Do you need these declarations to be visible to the whole kernel? Can't
they live in printk/index.c?

> +
> +#define pi_sec_elf_embed(_p_func, _fmt, ...)\
> + ({ \
> + int _p_ret;\
> +\
> + if (__builtin_constant_p(_fmt)) {  \
> + /*
> +  * The compiler may not be able to eliminate this, so
> +  * we need to make sure that it doesn't see any
> +  * hypothetical assignment for non-constants even
> +  * though this is already inside the
> +  * __builtin_constant_p guard.
> +  */\
> + static struct pi_object _pi\

static const struct pi_object?

> + __section(".printk_index") = { \
> + .fmt = __builtin_constant_p(_fmt) ? (_fmt) : 
> NULL, \
> + .func = __func__,  \
> + .file = __FILE__,  \
> + .line = __LINE__,  \
> + }; \
> + _p_ret = _p_func(_pi.fmt, ##__VA_ARGS__);  \

Is the use of _pi.fmt here a trick to prevent gcc from eliding the _pi
object, so it is seen as "used"? That seems a bit fragile, especially if
the compiler ends up generating the same code in .text - that means gcc
does not load the format string from the _pi object (which it
shouldn't), but then I don't see why

Re: [PATCH 06/13] lib: extend the scope of small_const_nbits() macro

2021-03-16 Thread Rasmus Villemoes

On 16/03/2021 02.54, Yury Norov wrote:
> find_bit would also benefit from small_const_nbits() optimizations.
> 
> Signed-off-by: Yury Norov 
> ---
>  include/asm-generic/bitsperlong.h | 9 +
>  include/linux/bitmap.h| 3 ---
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/include/asm-generic/bitsperlong.h 
> b/include/asm-generic/bitsperlong.h
> index 3905c1c93dc2..96032f4f908f 100644
> --- a/include/asm-generic/bitsperlong.h
> +++ b/include/asm-generic/bitsperlong.h
> @@ -23,4 +23,13 @@
>  #define BITS_PER_LONG_LONG 64
>  #endif
>  
> +#define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < 
> BITS_PER_LONG)
> +
> +/*
> + * The valid number of bits for a bitmap to be small enough, or in other 
> words,
> + * fit into a single machine word is 1 to BITS_PER_LONG inclusively. 0 is 
> not a
> + * valid number for size, and most probably a sing of error.
> + */
> +#define small_const_nbits(n) SMALL_CONST((n) - 1)
> +

I still don't see the point of introducing SMALL_CONST and still don't
like the much-too-generic-name - especially since AFAICT you don't
actually use it anywhere outside the definition of small_const_nbits()?

>  #endif /* __ASM_GENERIC_BITS_PER_LONG */
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index adf7bd9f0467..bc13a890ecc1 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -224,9 +224,6 @@ extern int bitmap_print_to_pagebuf(bool list, char *buf,
>   * so make such users (should any ever turn up) call the out-of-line
>   * versions.
>   */

You added another comment near its new definition, but the left-behind
comment in bitmap.h is now somewhat confusing, no? I suggest expanding
your new comment a bit so it's clear why we're interested in whether a
bitmap is known at compile-time to consist of exactly one word:

/*
small_const_nbits(n) is true precisely when it is known at compile-time
that BITMAP_SIZE(n) is 1, i.e. 1 <= n <= BITS_PER_LONG. This allows
various bit/bitmap APIs to provide a fast inline implementation. Bitmaps
of size 0 are very rare, and a compile-time-known-size 0 is most likely
a sign of error. They will be handled correctly by the bit/bitmap APIs,
but using the out-of-line functions, so that the inline implementations
can unconditionally dereference the pointer(s).
*/

> -#define small_const_nbits(nbits) \
> - (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)
> -
>  static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
>  {
>   unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
> 

Rasmus

Re: [PATCH 04/13] lib: introduce BITS_{FIRST,LAST} macro

2021-03-16 Thread Rasmus Villemoes

On 16/03/2021 02.54, Yury Norov wrote:
> BITMAP_{LAST,FIRST}_WORD_MASK() in linux/bitmap.h duplicates the
> functionality of GENMASK(). The scope of BITMAP* macros is wider
> than just bitmaps. This patch defines 4 new macros: BITS_FIRST(),
> BITS_LAST(), BITS_FIRST_MASK() and BITS_LAST_MASK() in linux/bits.h
> on top of GENMASK() and replaces BITMAP_{LAST,FIRST}_WORD_MASK()
> to avoid duplication and increase the scope of the macros.
> 
> This change doesn't affect code generation. On ARM64:
> scripts/bloat-o-meter vmlinux.before vmlinux
> add/remove: 1/2 grow/shrink: 2/0 up/down: 17/-16 (1)
> Function old new   delta
> ethtool_get_drvinfo  900 908  +8
> e843419@0cf2_0001309d_7f0  -   8  +8
> vermagic  48  49  +1
> e843419@0d45_000138bb_f68  8   -  -8
> e843419@0cc9_00012bce_198c 8   -  -8

[what on earth are those weird symbols?]

> diff --git a/include/linux/bits.h b/include/linux/bits.h
> index 7f475d59a097..8c191c29506e 100644
> --- a/include/linux/bits.h
> +++ b/include/linux/bits.h
> @@ -37,6 +37,12 @@
>  #define GENMASK(h, l) \
>   (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l))
>  
> +#define BITS_FIRST(nr)   GENMASK((nr), 0)
> +#define BITS_LAST(nr)GENMASK(BITS_PER_LONG - 1, (nr))
> +
> +#define BITS_FIRST_MASK(nr)  BITS_FIRST((nr) % BITS_PER_LONG)
> +#define BITS_LAST_MASK(nr)   BITS_LAST((nr) % BITS_PER_LONG)

I don't think it's a good idea to propagate the unusual closed-range
semantics of GENMASK to those wrappers. Almost all C and kernel code
uses the 'inclusive lower bound, exclusive upper bound', and I'd expect
BITS_FIRST(5) to result in a word with five bits set, not six. So I
think these changes as-is make the code much harder to read and understand.

Regardless, please add some comments on the valid input ranges to the
macros, whether that ends up being 0 <= nr < BITS_PER_LONG or 0 < nr <=
BITS_PER_LONG or whatnot.

It would also be much easier to review if you just redefined the
BITMAP_LAST_WORD_MASK macros etc. in terms of these new things, so you
wouldn't have to do a lot of mechanical changes at the same time as
introducing the new ones - especially when those mechanical changes
involve adding a "minus 1" everywhere.

Rasmus

Re: [PATCH 02/13] tools: bitmap: sync function declarations with the kernel

2021-03-16 Thread Rasmus Villemoes

On 16/03/2021 02.54, Yury Norov wrote:
> Some functions in tools/include/linux/bitmap.h declare nbits as int. In the
> kernel nbits is declared as unsigned int.
> 
> Signed-off-by: Yury Norov 

Acked-by: Rasmus Villemoes

Re: [PATCH 01/13] tools: disable -Wno-type-limits

2021-03-16 Thread Rasmus Villemoes

On 16/03/2021 02.54, Yury Norov wrote:
> GENMASK(h, l) may be passed with unsigned types. In such case, type-limits
> warning is generated for example in case of GENMASK(h, 0).
> 
> Signed-off-by: Yury Norov 
> ---
>  tools/scripts/Makefile.include | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
> index 84dbf61a7eca..15e99905cb7d 100644
> --- a/tools/scripts/Makefile.include
> +++ b/tools/scripts/Makefile.include
> @@ -38,6 +38,7 @@ EXTRA_WARNINGS += -Wswitch-enum
>  EXTRA_WARNINGS += -Wundef
>  EXTRA_WARNINGS += -Wwrite-strings
>  EXTRA_WARNINGS += -Wformat
> +EXTRA_WARNINGS += -Wno-type-limits
>

I don't like that kind of collateral damage. I seem to recall another
instance where a macro was instead rewritten to avoid triggering the
type-limits warning (with a comment explaining the uglyness). Something like

foo > bar  is the same as
!(foo <= bar)  which is the same as
!(foo == bar || foo < bar)

Dunno if that would work here, but if it did, it would have the bonus
that when somebody builds the kernel proper with Wtype-limits enabled
(maybe W=1 or W=2) there would be no false positives from GENMASK to
wade through.

Alternatively, we really should consider making use of _Pragma to
locally disable/re-enable certain warnings.

Rasmus

Re: [PATCH v3 1/2] init/initramfs.c: do unpacking asynchronously

2021-03-15 Thread Rasmus Villemoes

On 15/03/2021 22.33, Andrew Morton wrote:
> On Sat, 13 Mar 2021 22:25:27 +0100 Rasmus Villemoes 
>  wrote:
> 
>> Most of the boot process doesn't actually need anything from the
>> initramfs, until of course PID1 is to be executed. So instead of doing
>> the decompressing and populating of the initramfs synchronously in
>> populate_rootfs() itself, push that off to a worker thread.
[...]
> 
> This seems sensible.  And nice.

Thanks.

> Are you sure that you've found all the code paths that require that
> initramfs be ready?  You have one in init/main, one in the bowels of
> the firmware loader and one in UML.  How do we know that there are no
> other such places?

No, I don't _know_ that I've found all such places, but nobody, Linus
included, have been able to come up with any that I've missed. At this
point, the only way to figure it out is to get the code into linux-next
(and the more time it gets before the merge window, the better). Since
it's default-on, it should get quite a bit of testing that way.

> Also, all this doesn't buy anything for uniprocessor machines.  Is
> there a simple way of making it all go away if !CONFIG_SMP?

It absolutely does buy something for UP, at least for some special case:
The ppc machine I'm talking about is UP, and without getting the
initramfs unpacking pushed to the background, the machine doesn't get to
pet the external hardware watchdog soon enough. So this is really the
difference between booting successfully or being a power-consuming paper
weight. Also, lots of device initialization actually makes the CPU
twiddle its thumbs now and then, so having something other than the idle
thread to schedule makes better use of the single CPU.

Rasmus

Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure

2021-03-15 Thread Rasmus Villemoes

On 12/03/2021 03.29, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Mar 09, 2021 at 06:19:30AM +, Christophe Leroy wrote:
>> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
>> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
>> generates a call to _restgpr_31_x.
> 
>> I don't know if there is a way to tell GCC not to emit that call, because at 
>> the end we get more instructions than needed.
> 
> The function is required by the ABI, you need to have it.
> 
> You get *fewer* insns statically, and that is what -Os is about: reduce
> the size of the binaries.

Is there any reason to not just always build the vdso with -O2? It's one
page/one VMA either way, and the vdso is about making certain system
calls cheaper, so if unconditional -O2 could save a few cycles compared
to -Os, why not? (And if, as it seems, there's only one user within the
DSO of _restgpr_31_x, yes, the overall size of the .text segment
probably increases slightly).

Rasmus

[PATCH v3 1/2] init/initramfs.c: do unpacking asynchronously

2021-03-13 Thread Rasmus Villemoes

Most of the boot process doesn't actually need anything from the
initramfs, until of course PID1 is to be executed. So instead of doing
the decompressing and populating of the initramfs synchronously in
populate_rootfs() itself, push that off to a worker thread.

This is primarily motivated by an embedded ppc target, where unpacking
even the rather modest sized initramfs takes 0.6 seconds, which is
long enough that the external watchdog becomes unhappy that it doesn't
get attention soon enough. By doing the initramfs decompression in a
worker thread, we get to do the device_initcalls and hence start
petting the watchdog much sooner.

Normal desktops might benefit as well. On my mostly stock Ubuntu
kernel, my initramfs is a 26M xz-compressed blob, decompressing to
around 126M. That takes almost two seconds:

[0.201454] Trying to unpack rootfs image as initramfs...
[1.976633] Freeing initrd memory: 29416K

Before this patch, these lines occur consecutively in dmesg. With this
patch, the timestamps on these two lines is roughly the same as above,
but with 172 lines inbetween - so more than one cpu has been kept busy
doing work that would otherwise only happen after the
populate_rootfs() finished.

Should one of the initcalls done after rootfs_initcall time (i.e.,
device_ and late_ initcalls) need something from the initramfs (say, a
kernel module or a firmware blob), it will simply wait for the
initramfs unpacking to be done before proceeding, which should in
theory make this completely safe.

But if some driver pokes around in the filesystem directly and not via
one of the official kernel interfaces (i.e. request_firmware*(),
call_usermodehelper*) that theory may not hold - also, I certainly
might have missed a spot when sprinkling wait_for_initramfs(). So
there is an escape hatch in the form of an initramfs_async= command
line parameter.

Signed-off-by: Rasmus Villemoes 
---
 .../admin-guide/kernel-parameters.txt | 12 ++
 drivers/base/firmware_loader/main.c   |  2 +
 include/linux/initrd.h|  2 +
 init/initramfs.c  | 38 ++-
 init/main.c   |  1 +
 kernel/umh.c  |  2 +
 6 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..861374df0414 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1825,6 +1825,18 @@
initcall functions.  Useful for debugging built-in
modules and initcalls.
 
+   initramfs_async= [KNL]
+   Format: 
+   Default: 1
+   This parameter controls whether the initramfs
+   image is unpacked asynchronously, concurrently
+   with devices being probed and
+   initialized. This should normally just work,
+   but as a debugging aid, one can get the
+   historical behaviour of the initramfs
+   unpacking being completed before device_ and
+   late_ initcalls.
+
initrd= [BOOT] Specify the location of the initial ramdisk
 
initrdmem=  [KNL] Specify a physical address and size from which to
diff --git a/drivers/base/firmware_loader/main.c 
b/drivers/base/firmware_loader/main.c
index 78355095e00d..4fdb8219cd08 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -504,6 +505,7 @@ fw_get_filesystem_firmware(struct device *device, struct 
fw_priv *fw_priv,
if (!path)
return -ENOMEM;
 
+   wait_for_initramfs();
for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
size_t file_size = 0;
size_t *file_size_ptr = NULL;
diff --git a/include/linux/initrd.h b/include/linux/initrd.h
index 85c15717af34..1bbe9af48dc3 100644
--- a/include/linux/initrd.h
+++ b/include/linux/initrd.h
@@ -20,8 +20,10 @@ extern void free_initrd_mem(unsigned long, unsigned long);
 
 #ifdef CONFIG_BLK_DEV_INITRD
 extern void __init reserve_initrd_mem(void);
+extern void wait_for_initramfs(void);
 #else
 static inline void __init reserve_initrd_mem(void) {}
+static inline void wait_for_initramfs(void) {}
 #endif
 
 extern phys_addr_t phys_initrd_start;
diff --git a/init/initramfs.c b/init/initramfs.c
index d677e8e717f1..af27abc59643 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -541,6 +542,14 @@ static int __init keepinitrd_setup(char *__unused)
 __setup("keepinitrd", keepinitrd_setup);
 #endif

[PATCH v3 2/2] modules: add CONFIG_MODPROBE_PATH

2021-03-13 Thread Rasmus Villemoes

Allow the developer to specifiy the initial value of the
modprobe_path[] string. This can be used to set it to the empty string
initially, thus effectively disabling request_module() during early
boot until userspace writes a new value via the
/proc/sys/kernel/modprobe interface. [1]

When building a custom kernel (often for an embedded target), it's
normal to build everything into the kernel that is needed for booting,
and indeed the initramfs often contains no modules at all, so every
such request_module() done before userspace init has mounted the real
rootfs is a waste of time.

This is particularly useful when combined with the previous patch,
which made the initramfs unpacking asynchronous - for that to work, it
had to make any usermodehelper call wait for the unpacking to finish
before attempting to invoke the userspace helper. By eliminating all
such (known-to-be-futile) calls of usermodehelper, the initramfs
unpacking and the {device,late}_initcalls can proceed in parallel for
much longer.

For a relatively slow ppc board I'm working on, the two patches
combined lead to 0.2s faster boot - but more importantly, the fact
that the initramfs unpacking proceeds completely in the background
while devices get probed means I get to handle the gpio watchdog in
time without getting reset.

[1] __request_module() already has an early -ENOENT return when
modprobe_path is the empty string.

Reviewed-by: Greg Kroah-Hartman 
Acked-by: Jessica Yu 
Signed-off-by: Rasmus Villemoes 
---
 init/Kconfig  | 12 
 kernel/kmod.c |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 22946fe5ded9..18b4ec7346d4 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2264,6 +2264,18 @@ config MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS
 
  If unsure, say N.
 
+config MODPROBE_PATH
+   string "Path to modprobe binary"
+   default "/sbin/modprobe"
+   help
+ When kernel code requests a module, it does so by calling
+ the "modprobe" userspace utility. This option allows you to
+ set the path where that binary is found. This can be changed
+ at runtime via the sysctl file
+ /proc/sys/kernel/modprobe. Setting this to the empty string
+ removes the kernel's ability to request modules (but
+ userspace can still load modules explicitly).
+
 config TRIM_UNUSED_KSYMS
bool "Trim unused exported kernel symbols" if EXPERT
depends on !COMPILE_TEST
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 3cd075ce2a1e..b717134ebe17 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -58,7 +58,7 @@ static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
 /*
modprobe_path is set via /proc/sys.
 */
-char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
+char modprobe_path[KMOD_PATH_LEN] = CONFIG_MODPROBE_PATH;
 
 static void free_modprobe_argv(struct subprocess_info *info)
 {
-- 
2.29.2

[PATCH v3 0/2] background initramfs unpacking, and CONFIG_MODPROBE_PATH

2021-03-13 Thread Rasmus Villemoes

These two patches are independent, but better-together.

The second is a rather trivial patch that simply allows the developer
to change "/sbin/modprobe" to something else - e.g. the empty string,
so that all request_module() during early boot return -ENOENT early,
without even spawning a usermode helper, needlessly synchronizing with
the initramfs unpacking.

The first patch delegates decompressing the initramfs to a worker
thread, allowing do_initcalls() in main.c to proceed to the device_
and late_ initcalls without waiting for that decompression (and
populating of rootfs) to finish. Obviously, some of those later calls
may rely on the initramfs being available, so I've added
synchronization points in the firmware loader and usermodehelper paths
- there might be other places that would need this, but so far no one
has been able to think of any places I have missed.

There's not much to win if most of the functionality needed during
boot is only available as modules. But systems with a custom-made
.config and initramfs can boot faster, partly due to utilizing more
than one cpu earlier, partly by avoiding known-futile modprobe calls
(which would still trigger synchronization with the initramfs
unpacking, thus eliminating most of the first benefit).

Routing-wise, I hope akpm can handle both patches. Andrew, Luis?


v2 at 
<https://lore.kernel.org/lkml/20210309211700.2011017-1-li...@rasmusvillemoes.dk/>

Changes in v3:

- Make it asynchronous by default, and drop the CONFIG_ knob (Linus) -
  the command line parameter escape hatch is still there.

  With that change, it doesn't make sense to have wait_for_initramfs()
  contain a micro-optimization in the form of an early return when
  !initramfs_async.

  I've opted to always push the work to the async machinery, but then
  just do a wait_for_initramfs() in populate_rootfs when
  !initramfs_async.

  I did consider changing the name and sense of the command line
  parameter to "initramfs_sync", but decided to keep the async, since
  that's also the name of the subsystem being used.

- Add A-b, R-b to patch 2.


Rasmus Villemoes (2):
  init/initramfs.c: do unpacking asynchronously
  modules: add CONFIG_MODPROBE_PATH

 .../admin-guide/kernel-parameters.txt | 12 ++
 drivers/base/firmware_loader/main.c   |  2 +
 include/linux/initrd.h|  2 +
 init/Kconfig  | 12 ++
 init/initramfs.c  | 38 ++-
 init/main.c   |  1 +
 kernel/kmod.c |  2 +-
 kernel/umh.c  |  2 +
 8 files changed, 69 insertions(+), 2 deletions(-)

-- 
2.29.2

Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

2021-03-13 Thread Rasmus Villemoes

On 11/03/2021 19.02, Linus Torvalds wrote:
> On Wed, Mar 10, 2021 at 5:45 PM Rasmus Villemoes
>  wrote:
>>
>> Hm, gcc does elide the test of the return value, but jumps back to a
>> place where it always loads state from its memory location and does the
>> whole switch(). To get it to jump directly to the code implementing the
>> various do_* helpers it seems one needs to avoid that global variable
>> and instead return the next state explicitly. The below boots, but I
>> still can't see any measurable improvement on ppc.
> 
> Ok. That's definitely the right way to do efficient statemachines that
> the compiler can actually generate ok code for, but if you can't
> measure the difference I guess it isn't even worth doing.

Just for good measure, I now got around to test on x86 as well, where I
thought the speculation stuff might make a difference. However, the
indirect calls through the actions[] array don't actually hurt due to
__noinitretpoline, and even removing that from the __init definition, I
only see about 1.5% difference with that state machine patch applied.

So it doesn't seem worth pursuing. I'll send v3 of the async patches
shortly.

Rasmus

[PATCH 2/2] devtmpfs: actually reclaim some init memory

2021-03-12 Thread Rasmus Villemoes

Currently gcc seems to inline devtmpfs_setup() into devtmpfsd(), so
its memory footprint isn't reclaimed as intended. Mark it noinline to
make sure it gets put in .init.text.

While here, setup_done can also be put in .init.data: After complete()
releases the internal spinlock, the completion object is never touched
again by that thread, and the waiting thread doesn't proceed until it
observes ->done while holding that spinlock.

This is now the same pattern as for kthreadd_done in init/main.c:
complete() is done in a __ref function, while the corresponding
wait_for_completion() is in an __init function.

Signed-off-by: Rasmus Villemoes 
---
 drivers/base/devtmpfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index aedeb2dc1a18..8be352ab4ddb 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -371,7 +371,7 @@ int __init devtmpfs_mount(void)
return err;
 }
 
-static DECLARE_COMPLETION(setup_done);
+static __initdata DECLARE_COMPLETION(setup_done);
 
 static int handle(const char *name, umode_t mode, kuid_t uid, kgid_t gid,
  struct device *dev)
@@ -405,7 +405,7 @@ static void __noreturn devtmpfs_work_loop(void)
}
 }
 
-static int __init devtmpfs_setup(void *p)
+static noinline int __init devtmpfs_setup(void *p)
 {
int err;
 
-- 
2.29.2

[PATCH 1/2] devtmpfs: fix placement of complete() call

2021-03-12 Thread Rasmus Villemoes

Calling complete() from within the __init function is wrong -
theoretically, the init process could proceed all the way to freeing
the init mem before the devtmpfsd thread gets to execute the return
instruction in devtmpfs_setup().

In practice, it seems to be harmless as gcc inlines devtmpfs_setup()
into devtmpfsd(). So the calls of the __init functions init_chdir()
etc. actually happen from devtmpfs_setup(), but the __ref on that one
silences modpost (it's all right, because those calls happen before
the complete()). But it does make the __init annotation of the setup
function moot, which we'll fix in a subsequent patch.

Fixes: bcbacc4909f1 ("devtmpfs: refactor devtmpfsd()")
Signed-off-by: Rasmus Villemoes 
---
 drivers/base/devtmpfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 653c8c6ac7a7..aedeb2dc1a18 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -419,7 +419,6 @@ static int __init devtmpfs_setup(void *p)
init_chroot(".");
 out:
*(int *)p = err;
-   complete(_done);
return err;
 }
 
@@ -432,6 +431,7 @@ static int __ref devtmpfsd(void *p)
 {
int err = devtmpfs_setup(p);
 
+   complete(_done);
if (err)
return err;
devtmpfs_work_loop();
-- 
2.29.2

Re: [PATCH 14/14] MAINTAINERS: Add entry for the bitmap API

2021-03-12 Thread Rasmus Villemoes

On 18/02/2021 16.34, Yury Norov wrote:
> On Thu, Feb 18, 2021 at 05:28:32PM +0200, Andy Shevchenko wrote:
>> On Wed, Feb 17, 2021 at 08:05:12PM -0800, Yury Norov wrote:
>>> Add myself as maintainer for bitmap API.
>>>
>>> I'm an author of current implementation of lib/find_bit and an
>>> active contributor to lib/bitmap. It was spotted that there's no
>>> maintainer for bitmap API. I'm willing to maintain it.
>>
>> Perhaps reviewers as well, like Rasmus, if he is okay with that, of course?
> 
> I'll be happy if you and Rasmus join the team. :) Guys, just let me
> know and I'll update the patch.
>  
>> Otherwise, why not?
>> Acked-by: Andy Shevchenko 

Sure, you can add my name/email as an R: line (or whatever means
reviewer), and consider this patch (with or without that addition) acked.

Rasmus

Re: [PATCH 06/14] bitsperlong.h: introduce SMALL_CONST() macro

2021-03-12 Thread Rasmus Villemoes

On 12/03/2021 06.28, Yury Norov wrote:
> On Fri, Feb 19, 2021 at 12:07:27AM +0100, Rasmus Villemoes wrote:
>> On 18/02/2021 05.05, Yury Norov wrote:
>>> Many algorithms become simpler if they are passed with relatively small
>>> input values. One example is bitmap operations when the whole bitmap fits
>>> into one word. To implement such simplifications, linux/bitmap.h declares
>>> small_const_nbits() macro.
>>>
>>> Other subsystems may also benefit from optimizations of this sort, like
>>> find_bit API in the following patches. So it looks helpful to generalize
>>> the macro and extend it's visibility.
>>
>> Perhaps, but SMALL_CONST is too generic a name, it needs to keep "bits"
>> somewhere in there. So why not just keep it at small_const_nbits?
>>
>>> Signed-off-by: Yury Norov 
>>> ---
>>>  include/asm-generic/bitsperlong.h |  2 ++
>>>  include/linux/bitmap.h| 33 ++-
>>>  2 files changed, 17 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/include/asm-generic/bitsperlong.h 
>>> b/include/asm-generic/bitsperlong.h
>>> index 3905c1c93dc2..0eeb77544f1d 100644
>>> --- a/include/asm-generic/bitsperlong.h
>>> +++ b/include/asm-generic/bitsperlong.h
>>> @@ -23,4 +23,6 @@
>>>  #define BITS_PER_LONG_LONG 64
>>>  #endif
>>>  
>>> +#define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n) < 
>>> BITS_PER_LONG)
>>> +
>>>  #endif /* __ASM_GENERIC_BITS_PER_LONG */
>>> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
>>> index adf7bd9f0467..e89f1dace846 100644
>>> --- a/include/linux/bitmap.h
>>> +++ b/include/linux/bitmap.h
>>> @@ -224,9 +224,6 @@ extern int bitmap_print_to_pagebuf(bool list, char *buf,
>>>   * so make such users (should any ever turn up) call the out-of-line
>>>   * versions.
>>>   */
>>> -#define small_const_nbits(nbits) \
>>> -   (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)
>>> -
>>>  static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
>>>  {
>>> unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
>>> @@ -278,7 +275,7 @@ extern void bitmap_to_arr32(u32 *buf, const unsigned 
>>> long *bitmap,
>>>  static inline int bitmap_and(unsigned long *dst, const unsigned long *src1,
>>> const unsigned long *src2, unsigned int nbits)
>>>  {
>>> -   if (small_const_nbits(nbits))
>>> +   if (SMALL_CONST(nbits - 1))
>>
>> Please don't force most users to be changed to something less readable.
>> What's wrong with just keeping small_const_nbits() the way it is,
>> avoiding all this churn and keeping the readability?
> 
> The wrong thing is that it's defined in include/linux/bitmap.h, and I
> cannot use it in include/asm-generic/bitops/find.h, so I have to either
> move it to a separate header, or generalize and share with find.h and
> other users this way. I prefer the latter option, thougt it's more
> verbose.

The logical place would be the same place the BITS_PER_LONG macro is
defined, no? No need to introduce a new header for that, and all current
users of small_const_nbits() must already (very possibly indirectly)
include asm-generic/bitsperlong.h.

I do prefer to keep both the name small_const_nbits() and its current
semantics, which, although not currently spelled out that way anywhere,
is "is BITMAP_SIZE(nbits) known at compile time and equal to 1", which
is precisely what allows the static inlines to unconditionally
dereference the pointer (that's the "exclude the 0 case") and just deal
with that one word.

I don't like either SMALL_CONST or small_const_size, because nothing in
there says it has anything to do with bit ops. As I said, if you have
some special place that for some reason cannot handle
nbits==BITS_PER_LONG, then just add that as an additional constraint
with a comment why.

Rasmus

Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

2021-03-10 Thread Rasmus Villemoes

On 11/03/2021 01.17, Rasmus Villemoes wrote:
> On 09/03/2021 23.16, Linus Torvalds wrote:
>> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
>>  wrote:
>>>
>>> So add an initramfs_async= kernel parameter, allowing the main init
>>> process to proceed to handling device_initcall()s without waiting for
>>> populate_rootfs() to finish.
>>
>> Oh, and a completely unrelated second comment about this: some of the
>> initramfs population code seems to be actively written to be slow.
>>
>> For example, I'm not sure why that write_buffer() function uses an
>> array of indirect function pointer actions. Even ignoring the whole
>> "speculation protections make that really slow" issue that came later,
>> it seems to always have been actively (a) slower and (b) more complex.
>>
>> The normal way to do that is with a simple switch() statement, which
>> makes the compiler able to do a much better job. Not just for the
>> state selector - maybe it picks that function pointer approach, but
>> probably these days just direct comparisons - but simply to do things
>> like inline all those "it's used in one place" cases entirely. In
>> fact, at that point, a lot of the state machine changes may end up
>> turning into pure goto's - compilers are sometimes good at that
>> (because state machines have often been very timing-critical).
> 
> FWIW, I tried doing
> 

Hm, gcc does elide the test of the return value, but jumps back to a
place where it always loads state from its memory location and does the
whole switch(). To get it to jump directly to the code implementing the
various do_* helpers it seems one needs to avoid that global variable
and instead return the next state explicitly. The below boots, but I
still can't see any measurable improvement on ppc.

Rasmus

Subject: [PATCH] init/initramfs.c: change state machine implementation

Instead of having write_buffer() rely on the global variable "state",
have each of the do_* helpers return the next state, or the new token
Stop. Also, instead of an array of function pointers, use a switch
statement.

This means all the do_* helpers end up inlined into write_buffer(),
and all the places which return a compile-time constant next state now
compile to a direct jump to that code.

We still need the global variable state for the initial choice within
write_buffer, and we also need to preserve the last non-Stop state
across calls.
---
 init/initramfs.c | 90 
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/init/initramfs.c b/init/initramfs.c
index 1d0fdd05e5e9..ad7e04393acb 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -189,7 +189,8 @@ static __initdata enum state {
GotName,
CopyFile,
GotSymlink,
-   Reset
+   Reset,
+   Stop
 } state, next_state;

 static __initdata char *victim;
@@ -207,17 +208,17 @@ static __initdata char *collected;
 static long remains __initdata;
 static __initdata char *collect;

-static void __init read_into(char *buf, unsigned size, enum state next)
+static int __init read_into(char *buf, unsigned size, enum state next)
 {
if (byte_count >= size) {
collected = victim;
eat(size);
-   state = next;
+   return next;
} else {
collect = collected = buf;
remains = size;
next_state = next;
-   state = Collect;
+   return Collect;
}
 }

@@ -225,8 +226,7 @@ static __initdata char *header_buf, *symlink_buf,
*name_buf;

 static int __init do_start(void)
 {
-   read_into(header_buf, 110, GotHeader);
-   return 0;
+   return read_into(header_buf, 110, GotHeader);
 }

 static int __init do_collect(void)
@@ -238,50 +238,46 @@ static int __init do_collect(void)
eat(n);
collect += n;
if ((remains -= n) != 0)
-   return 1;
-   state = next_state;
-   return 0;
+   return Stop;
+   return next_state;
 }

 static int __init do_header(void)
 {
if (memcmp(collected, "070707", 6)==0) {
error("incorrect cpio method used: use -H newc option");
-   return 1;
+   return Stop;
}
if (memcmp(collected, "070701", 6)) {
error("no cpio magic");
-   return 1;
+   return Stop;
}
parse_header(collected);
next_header = this_header + N_ALIGN(name_len) + body_len;
next_header = (next_header + 3) & ~3;
-   state = SkipIt;
if (name_len <= 0 || name_len > PATH_MAX)
-   return 0;
+   return SkipIt;
if (S_ISLNK(mode)) {

Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

2021-03-10 Thread Rasmus Villemoes

On 09/03/2021 23.16, Linus Torvalds wrote:
> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
>  wrote:
>>
>> So add an initramfs_async= kernel parameter, allowing the main init
>> process to proceed to handling device_initcall()s without waiting for
>> populate_rootfs() to finish.
> 
> Oh, and a completely unrelated second comment about this: some of the
> initramfs population code seems to be actively written to be slow.
> 
> For example, I'm not sure why that write_buffer() function uses an
> array of indirect function pointer actions. Even ignoring the whole
> "speculation protections make that really slow" issue that came later,
> it seems to always have been actively (a) slower and (b) more complex.
> 
> The normal way to do that is with a simple switch() statement, which
> makes the compiler able to do a much better job. Not just for the
> state selector - maybe it picks that function pointer approach, but
> probably these days just direct comparisons - but simply to do things
> like inline all those "it's used in one place" cases entirely. In
> fact, at that point, a lot of the state machine changes may end up
> turning into pure goto's - compilers are sometimes good at that
> (because state machines have often been very timing-critical).

FWIW, I tried doing

--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -401,24 +401,26 @@ static int __init do_symlink(void)
return 0;
 }

-static __initdata int (*actions[])(void) = {
-   [Start] = do_start,
-   [Collect]   = do_collect,
-   [GotHeader] = do_header,
-   [SkipIt]= do_skip,
-   [GotName]   = do_name,
-   [CopyFile]  = do_copy,
-   [GotSymlink]= do_symlink,
-   [Reset] = do_reset,
-};
-
 static long __init write_buffer(char *buf, unsigned long len)
 {
+   int ret;
+
byte_count = len;
victim = buf;

-   while (!actions[state]())
-   ;
+   do {
+   switch (state) {
+   case Start: ret = do_start(); break;
+   case Collect: ret = do_collect(); break;
+   case GotHeader: ret = do_header(); break;
+   case SkipIt: ret = do_skip(); break;
+   case GotName: ret = do_name(); break;
+   case CopyFile: ret = do_copy(); break;
+   case GotSymlink: ret = do_symlink(); break;
+   case Reset: ret = do_reset(); break;
+   }
+   } while (!ret);
+
return len - byte_count;
 }


and yes, all the do_* functions get inlined into write_buffer with some
net space saving:

add/remove: 0/9 grow/shrink: 1/0 up/down: 1696/-2112 (-416)
Function old new   delta
write_buffer 1001796   +1696
actions   32   - -32
do_start  52   - -52
do_reset 112   --112
do_skip  128   --128
do_collect   144   --144
do_symlink   172   --172
do_copy  304   --304
do_header572   --572
do_name  596   --596
Total: Before=5360, After=4944, chg -7.76%

(ppc32 build). But the unpacking still takes the same time. It might be
different on x86.

Rasmus

[PATCH] scsi: bnx2i: make bnx2i_process_iscsi_error simpler and more robust

2021-03-10 Thread Rasmus Villemoes

Instead of strcpy'ing into a stack buffer, just let additional_notice
point to a string literal living in .rodata. This is better in a few
ways:

- Smaller .text - instead of gcc compiling the strcpys as a bunch of
  immediate stores (effectively encoding the string literal in the
  instruction stream), we only pay the price of storing the literal in
  .rodata.

- Faster, because there's no string copying.

- Smaller stack usage (with my compiler, 72 bytes instead of 176 for
  the sole caller, bnx2i_indicate_kcqe)

Moreover, it's currently possible for additional_notice[] to get used
uninitialized, so some random stack garbage would be passed to
printk() - in the worst case without any '\0' anywhere in those 64
bytes. That could be fixed by initializing additional_notice[0], but
the same is achieved here by initializing the new pointer variable to
"".

Also give the message pointer a similar treatment - there's no point
making temporary copies on the stack of those two strings.

Signed-off-by: Rasmus Villemoes 
---
 drivers/scsi/bnx2i/bnx2i_hwi.c | 85 --
 1 file changed, 41 insertions(+), 44 deletions(-)

diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c
index bad396e5c601..43e8a1dafec0 100644
--- a/drivers/scsi/bnx2i/bnx2i_hwi.c
+++ b/drivers/scsi/bnx2i/bnx2i_hwi.c
@@ -2206,10 +2206,8 @@ static void bnx2i_process_iscsi_error(struct bnx2i_hba 
*hba,
 {
struct bnx2i_conn *bnx2i_conn;
u32 iscsi_cid;
-   char warn_notice[] = "iscsi_warning";
-   char error_notice[] = "iscsi_error";
-   char additional_notice[64];
-   char *message;
+   const char *additional_notice = "";
+   const char *message;
int need_recovery;
u64 err_mask64;
 
@@ -2224,133 +,132 @@ static void bnx2i_process_iscsi_error(struct 
bnx2i_hba *hba,
 
if (err_mask64 & iscsi_error_mask) {
need_recovery = 0;
-   message = warn_notice;
+   message = "iscsi_warning";
} else {
need_recovery = 1;
-   message = error_notice;
+   message = "iscsi_error";
}
 
switch (iscsi_err->completion_status) {
case ISCSI_KCQE_COMPLETION_STATUS_HDR_DIG_ERR:
-   strcpy(additional_notice, "hdr digest err");
+   additional_notice = "hdr digest err";
break;
case ISCSI_KCQE_COMPLETION_STATUS_DATA_DIG_ERR:
-   strcpy(additional_notice, "data digest err");
+   additional_notice = "data digest err";
break;
case ISCSI_KCQE_COMPLETION_STATUS_PROTOCOL_ERR_OPCODE:
-   strcpy(additional_notice, "wrong opcode rcvd");
+   additional_notice = "wrong opcode rcvd";
break;
case ISCSI_KCQE_COMPLETION_STATUS_PROTOCOL_ERR_AHS_LEN:
-   strcpy(additional_notice, "AHS len > 0 rcvd");
+   additional_notice = "AHS len > 0 rcvd";
break;
case ISCSI_KCQE_COMPLETION_STATUS_PROTOCOL_ERR_ITT:
-   strcpy(additional_notice, "invalid ITT rcvd");
+   additional_notice = "invalid ITT rcvd";
break;
case ISCSI_KCQE_COMPLETION_STATUS_PROTOCOL_ERR_STATSN:
-   strcpy(additional_notice, "wrong StatSN rcvd");
+   additional_notice = "wrong StatSN rcvd";
break;
case ISCSI_KCQE_COMPLETION_STATUS_PROTOCOL_ERR_EXP_DATASN:
-   strcpy(additional_notice, "wrong DataSN rcvd");
+   additional_notice = "wrong DataSN rcvd";
break;
case ISCSI_KCQE_COMPLETION_STATUS_PROTOCOL_ERR_PEND_R2T:
-   strcpy(additional_notice, "pend R2T violation");
+   additional_notice = "pend R2T violation";
break;
case ISCSI_KCQE_COMPLETION_STATUS_PROTOCOL_ERR_O_U_0:
-   strcpy(additional_notice, "ERL0, UO");
+   additional_notice = "ERL0, UO";
break;
case ISCSI_KCQE_COMPLETION_STATUS_PROTOCOL_ERR_O_U_1:
-   strcpy(additional_notice, "ERL0, U1");
+   additional_notice = "ERL0, U1";
break;
case ISCSI_KCQE_COMPLETION_STATUS_PROTOCOL_ERR_O_U_2:
-   strcpy(additional_notice, "ERL0, U2");
+   additional_notice = "ERL0, U2";
break;
case ISCSI_KCQE_COMPLETION_STATUS_PROTOCOL_ERR_O_U_3:
-   strcpy(additional_notice, "ERL0, U3");
+   additional_notice = "ERL0, U3";
break;
case ISCSI_KCQE_COMPLETION_STATUS_PROTOCOL_ERR_O_U_4:
-

[PATCH resend] kernel/cred.c: make init_groups static

2021-03-10 Thread Rasmus Villemoes

init_groups is declared in both cred.h and init_task.h, but it is not
actually referenced anywhere outside of cred.c where it is defined. So
make it static and remove the declarations.

Signed-off-by: Rasmus Villemoes 
---
 include/linux/cred.h  | 1 -
 include/linux/init_task.h | 1 -
 kernel/cred.c | 2 +-
 3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 4c6350503697..2695ad118806 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -53,7 +53,6 @@ do {  \
groups_free(group_info);\
 } while (0)
 
-extern struct group_info init_groups;
 #ifdef CONFIG_MULTIUSER
 extern struct group_info *groups_alloc(int);
 extern void groups_free(struct group_info *);
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index b2412b4d4c20..40fc5813cf93 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -25,7 +25,6 @@
 extern struct files_struct init_files;
 extern struct fs_struct init_fs;
 extern struct nsproxy init_nsproxy;
-extern struct group_info init_groups;
 extern struct cred init_cred;
 
 #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
diff --git a/kernel/cred.c b/kernel/cred.c
index 421b1149c651..e1d274cd741b 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -33,7 +33,7 @@ do {  
\
 static struct kmem_cache *cred_jar;
 
 /* init to 2 - one for init_task, one to ensure it is never freed */
-struct group_info init_groups = { .usage = ATOMIC_INIT(2) };
+static struct group_info init_groups = { .usage = ATOMIC_INIT(2) };
 
 /*
  * The initial credentials for the initial task
-- 
2.29.2

Re: [PATCH] [RFC] arm64: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

2021-03-10 Thread Rasmus Villemoes

On 10/03/2021 21.49, Masahiro Yamada wrote:
> On Mon, Mar 1, 2021 at 10:11 AM Nicholas Piggin  wrote:

> 
> I tested LD_DEAD_CODE_DATA_ELIMINATION for the latest kernel.
> 
> I added an unused function, this_func_is_unused(),
> then built the ppc kernel with LD_DEAD_CODE_DATA_ELIMINATION.
> 
> It remained in vmlinux.
> 
> 
> masahiro@oscar:~/ref/linux$ echo  'void this_func_is_unused(void) {}'
>>>  kernel/cpu.c
> masahiro@oscar:~/ref/linux$ export
> CROSS_COMPILE=/home/masahiro/tools/powerpc-10.1.0/bin/powerpc-linux-
> masahiro@oscar:~/ref/linux$ make ARCH=powerpc  defconfig
> masahiro@oscar:~/ref/linux$ ./scripts/config  -e EXPERT
> masahiro@oscar:~/ref/linux$ ./scripts/config  -e LD_DEAD_CODE_DATA_ELIMINATION
> masahiro@oscar:~/ref/linux$
> ~/tools/powerpc-10.1.0/bin/powerpc-linux-nm -n  vmlinux | grep
> this_func
> c0170560 T .this_func_is_unused
> c1d8d560 D this_func_is_unused

Dunno, works just fine for my ppc32 target in v4.19 (i.e., the function
gets eliminated when enabling LD_DEAD_CODE_DATA_ELIMINATION).

But yes, I can reproduce for master ppc64 defconfig. kernel/.cpu.o.cmd
says that it wasn't even compiled with -ffunction-sections, nor does
.vmlinux.cmd mention --gc-sections.

> masahiro@oscar:~/ref/linux$ grep DEAD_CODE_ .config
> CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION=y
> CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y

Ah, but scripts/config just blindly adds that config option - I don't
think ppc64 actually supports this, and
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y automagically vanishes from
.config when building.

Rasmus

[RFC PATCH] devres: better type safety with devm_*_action()

2021-03-09 Thread Rasmus Villemoes

With a little MacroMagic(tm), we can allow users to pass a pointer to
a function that actually takes the type of the data argument, instead
of forcing the function to have prototype void (*)(void *). Of course,
we must still accept such functions.

This can provide a little more type safety in that we get fewer
implicit casts to and from void* - as a random example,
gpio_mockup_dispose_mappings in drivers/gpio/gpio-mockup.c could take
a "struct gpio_mockup_chip *chip" directly.

Moreover, when the action is some "external" API, there will in many
cases no longer be a need for a trivial local wrapper -
e.g. drivers/watchdog/cadence_wdt.c could just use
clk_disable_unprepare directly and avoid defining
cdns_clk_disable_unprepare.

Signed-off-by: Rasmus Villemoes 
---
 drivers/base/devres.c  | 32 +++-
 include/linux/device.h | 36 
 2 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index fb9d5289a620..97ebd26bc44a 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -728,15 +728,25 @@ static void devm_action_release(struct device *dev, void 
*res)
 }
 
 /**
- * devm_add_action() - add a custom action to list of managed resources
+ * __devm_add_action() - add a custom action to list of managed resources
  * @dev: Device that owns the action
  * @action: Function that should be called
  * @data: Pointer to data passed to @action implementation
  *
  * This adds a custom action to the list of managed resources so that
  * it gets executed as part of standard resource unwinding.
+ *
+ * Do not call directly, but use the the macro wrapper
+ * devm_add_action, whose "prototype" is
+ *
+ * devm_add_action(struct device *dev, void (*action)(T *), T *data)
+ *
+ * This allows use of type-correct callbacks and can avoid wrapping
+ * external APIs. For example, if the data item is a "struct clk *", one
+ * can use clk_disable_unprepare directly as the action instead of
+ * creating a local wrapper taking a "void *" argument.
  */
-int devm_add_action(struct device *dev, void (*action)(void *), void *data)
+int __devm_add_action(struct device *dev, void (*action)(void *), void *data)
 {
struct action_devres *devres;
 
@@ -751,18 +761,20 @@ int devm_add_action(struct device *dev, void 
(*action)(void *), void *data)
devres_add(dev, devres);
return 0;
 }
-EXPORT_SYMBOL_GPL(devm_add_action);
+EXPORT_SYMBOL_GPL(__devm_add_action);
 
 /**
- * devm_remove_action() - removes previously added custom action
+ * __devm_remove_action() - removes previously added custom action
  * @dev: Device that owns the action
  * @action: Function implementing the action
  * @data: Pointer to data passed to @action implementation
  *
  * Removes instance of @action previously added by devm_add_action().
  * Both action and data should match one of the existing entries.
+ *
+ * Use the macro wrapper devm_remove_action, see __devm_add_action for details.
  */
-void devm_remove_action(struct device *dev, void (*action)(void *), void *data)
+void __devm_remove_action(struct device *dev, void (*action)(void *), void 
*data)
 {
struct action_devres devres = {
.data = data,
@@ -772,10 +784,10 @@ void devm_remove_action(struct device *dev, void 
(*action)(void *), void *data)
WARN_ON(devres_destroy(dev, devm_action_release, devm_action_match,
   ));
 }
-EXPORT_SYMBOL_GPL(devm_remove_action);
+EXPORT_SYMBOL_GPL(__devm_remove_action);
 
 /**
- * devm_release_action() - release previously added custom action
+ * __devm_release_action() - release previously added custom action
  * @dev: Device that owns the action
  * @action: Function implementing the action
  * @data: Pointer to data passed to @action implementation
@@ -783,8 +795,10 @@ EXPORT_SYMBOL_GPL(devm_remove_action);
  * Releases and removes instance of @action previously added by
  * devm_add_action().  Both action and data should match one of the
  * existing entries.
+ *
+ * Use the macro wrapper devm_release_action, see __devm_add_action for 
details.
  */
-void devm_release_action(struct device *dev, void (*action)(void *), void 
*data)
+void __devm_release_action(struct device *dev, void (*action)(void *), void 
*data)
 {
struct action_devres devres = {
.data = data,
@@ -795,7 +809,7 @@ void devm_release_action(struct device *dev, void 
(*action)(void *), void *data)
   ));
 
 }
-EXPORT_SYMBOL_GPL(devm_release_action);
+EXPORT_SYMBOL_GPL(__devm_release_action);
 
 /*
  * Managed kmalloc/kfree
diff --git a/include/linux/device.h b/include/linux/device.h
index ba660731bd25..c924612bfefd 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -250,11 +250,39 @@ void __iomem *devm_of_iomap(struct device *dev,
resource_size_t *size);

Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

2021-03-09 Thread Rasmus Villemoes

On 09/03/2021 23.16, Linus Torvalds wrote:
> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
>  wrote:
>>
>> So add an initramfs_async= kernel parameter, allowing the main init
>> process to proceed to handling device_initcall()s without waiting for
>> populate_rootfs() to finish.
> 
> Oh, and a completely unrelated second comment about this: some of the
> initramfs population code seems to be actively written to be slow.
> 
> For example, I'm not sure why that write_buffer() function uses an
> array of indirect function pointer actions. Even ignoring the whole
> "speculation protections make that really slow" issue that came later,
> it seems to always have been actively (a) slower and (b) more complex.
> 
[...]
> Is that likely to be a big part of the costs? No. I assume it's the
> decompression and the actual VFS operations. 

Yes, I have been doing some simple measurements, simply by decompressing
the blob in userspace and comparing to the time to that used by
populate_rootfs(). For both the 6M lz4-compressed blob on my ppc target
and the 26M xz-compressed blob on my laptop, the result is that the
decompression itself accounts for the vast majority of the time - and
for ppc in particular, I don't think there's any spectre slowdown.

So I haven't dared looking into changing the unpack implementation since
it doesn't seem it could buy that much.

Rasmus

Re: [PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

2021-03-09 Thread Rasmus Villemoes

On 09/03/2021 23.07, Linus Torvalds wrote:
> On Tue, Mar 9, 2021 at 1:17 PM Rasmus Villemoes
>  wrote:
>>
>> So add an initramfs_async= kernel parameter, allowing the main init
>> process to proceed to handling device_initcall()s without waiting for
>> populate_rootfs() to finish.
> 
> I like this smaller second version of the patch, but am wondering why
> we even need the parameter.
> 
> It sounds mostly like a "maybe I didn't think of all cases" thing -

That's exactly what it is.

> and one that will mean that this code will not see a lot of actual
> test coverage..

Yeah, that's probably true.

> And because of the lack of test coverage, I'd rather reverse the
> meaning, and have the async case on by default (without even the
> Kconfig option), and have the kernel command line purely as a "oops,
> it's buggy, easy to ask people to test if this is what ails them".

Well, I wasn't bold enough to make it "default y" by myself, but I can
certainly do that and nuke the config option.

> What *can* happen early boot outside of firmware loading and usermodehelpers?

Well, that was what I tried to get people to tell me when I sent the
first version as RFC, and also before that
(https://lore.kernel.org/lkml/19574912-44b4-c1dc-44c3-67309968d...@rasmusvillemoes.dk/).
That you can't think of anything suggests that I have covered the
important cases - which does leave random drivers that poke around the
filesystem on their own, but (a) it would probably be a good thing to
have this flush those out and (b) there's the command line option to
make it boot anyway.

Rasmus

Re: [PATCH] ethernet: ucc_geth: Use kmemdup instead of kmalloc and memcpy

2021-03-09 Thread Rasmus Villemoes

On 05/03/2021 15.27, angkery wrote:
> From: Junlin Yang 
> 
> Fixes coccicheck warnings:
> ./drivers/net/ethernet/freescale/ucc_geth.c:3594:11-18:
> WARNING opportunity for kmemdup
> 
> Signed-off-by: Junlin Yang 
> ---
>  drivers/net/ethernet/freescale/ucc_geth.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/ucc_geth.c 
> b/drivers/net/ethernet/freescale/ucc_geth.c
> index ef4e2fe..2c079ad 100644
> --- a/drivers/net/ethernet/freescale/ucc_geth.c
> +++ b/drivers/net/ethernet/freescale/ucc_geth.c
> @@ -3591,10 +3591,9 @@ static int ucc_geth_probe(struct platform_device* 
> ofdev)
>   if ((ucc_num < 0) || (ucc_num > 7))
>   return -ENODEV;
>  
> - ug_info = kmalloc(sizeof(*ug_info), GFP_KERNEL);
> + ug_info = kmemdup(_primary_info, sizeof(*ug_info), GFP_KERNEL);
>   if (ug_info == NULL)
>   return -ENOMEM;
> - memcpy(ug_info, _primary_info, sizeof(*ug_info));
>  
>   ug_info->uf_info.ucc_num = ucc_num;
>  
> 

Ah, yes, of course, I should have used that.

Acked-by: Rasmus Villemoes

[PATCH v2 2/2] modules: add CONFIG_MODPROBE_PATH

2021-03-09 Thread Rasmus Villemoes

Allow the developer to specifiy the initial value of the
modprobe_path[] string. This can be used to set it to the empty string
initially, thus effectively disabling request_module() during early
boot until userspace writes a new value via the
/proc/sys/kernel/modprobe interface. [1]

When building a custom kernel (often for an embedded target), it's
normal to build everything into the kernel that is needed for booting,
and indeed the initramfs often contains no modules at all, so every
such request_module() done before userspace init has mounted the real
rootfs is a waste of time.

This is particularly useful when combined with the previous patch,
which allowed the initramfs to be unpacked asynchronously - for that
to work, it had to make any usermodehelper call wait for the unpacking
to finish before attempting to invoke the userspace helper. By
eliminating all such (known-to-be-futile) calls of usermodehelper, the
initramfs unpacking and the {device,late}_initcalls can proceed in
parallel for much longer.

For a relatively slow ppc board I'm working on, the two patches
combined lead to 0.2s faster boot - but more importantly, the fact
that the initramfs unpacking proceeds completely in the background
while devices get probed means I get to handle the gpio watchdog in
time without getting reset.

[1] __request_module() already has an early -ENOENT return when
modprobe_path is the empty string.

Signed-off-by: Rasmus Villemoes 
---
 init/Kconfig  | 12 
 kernel/kmod.c |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 22946fe5ded9..18b4ec7346d4 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2264,6 +2264,18 @@ config MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS
 
  If unsure, say N.
 
+config MODPROBE_PATH
+   string "Path to modprobe binary"
+   default "/sbin/modprobe"
+   help
+ When kernel code requests a module, it does so by calling
+ the "modprobe" userspace utility. This option allows you to
+ set the path where that binary is found. This can be changed
+ at runtime via the sysctl file
+ /proc/sys/kernel/modprobe. Setting this to the empty string
+ removes the kernel's ability to request modules (but
+ userspace can still load modules explicitly).
+
 config TRIM_UNUSED_KSYMS
bool "Trim unused exported kernel symbols" if EXPERT
depends on !COMPILE_TEST
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 3cd075ce2a1e..b717134ebe17 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -58,7 +58,7 @@ static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
 /*
modprobe_path is set via /proc/sys.
 */
-char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
+char modprobe_path[KMOD_PATH_LEN] = CONFIG_MODPROBE_PATH;
 
 static void free_modprobe_argv(struct subprocess_info *info)
 {
-- 
2.29.2

[PATCH v2 0/2] background initramfs unpacking, and CONFIG_MODPROBE_PATH

2021-03-09 Thread Rasmus Villemoes

These two patches are independent, but better-together.

The second is a rather trivial patch that simply allows the developer
to change "/sbin/modprobe" to something else - e.g. the empty string,
so that all request_module() during early boot return -ENOENT early,
without even spawning a usermode helper.

The first patch allows delegating decompressing the initramfs to a
worker thread, allowing do_initcalls() in main.c to proceed to the
device_ and late_ initcalls without waiting for that decompression
(and populating of rootfs) to finish. Obviously, some of those later
calls may rely on the initramfs being available, so I've added
synchronization points in the firmware loader and usermodehelper paths
- there might be other places that would need this.

There's not much to win if most of the functionality needed during
boot is only available as modules. But systems with a custom-made
.config and initramfs can boot faster, partly due to utilizing more
than one cpu earlier, partly by avoiding known-futile modprobe calls
(which would still trigger synchronization with the initramfs
unpacking, thus eliminating most of the first benefit).

Routing-wise, I hope akpm can handle both patches. Andrew, Luis?

Changes in v2:

- Rebase on master, piggy-backing on the include guard and #ifdef
  CONFIG_BLK_DEV_INITRD added to initrd.h in fade5cad93 and
  c72160fe05.

- Use existing async_* API instead of wait_for_completion/complete_all

- Drop debug leftovers from wait_for_initramfs().

- Fix initialization of initramfs_async variable.

Rasmus Villemoes (2):
  init/initramfs.c: allow asynchronous unpacking
  modules: add CONFIG_MODPROBE_PATH

 .../admin-guide/kernel-parameters.txt | 12 ++
 drivers/base/firmware_loader/main.c   |  2 +
 include/linux/initrd.h|  2 +
 init/Kconfig  | 12 ++
 init/initramfs.c  | 41 ++-
 init/main.c   |  1 +
 kernel/kmod.c |  2 +-
 kernel/umh.c  |  2 +
 usr/Kconfig   | 10 +
 9 files changed, 82 insertions(+), 2 deletions(-)

-- 
2.29.2

[PATCH v2 1/2] init/initramfs.c: allow asynchronous unpacking

2021-03-09 Thread Rasmus Villemoes

This is primarily motivated by an embedded ppc target, where unpacking
even the rather modest sized initramfs takes 0.6 seconds, which is
long enough that the external watchdog becomes unhappy that it doesn't
get attention soon enough. By doing the initramfs decompression in a
worker thread, we get to do the device_initcalls and hence start
petting the watchdog much sooner.

So add an initramfs_async= kernel parameter, allowing the main init
process to proceed to handling device_initcall()s without waiting for
populate_rootfs() to finish.

Should one of those initcalls need something from the initramfs (say,
a kernel module or a firmware blob), it will simply wait for the
initramfs unpacking to be done before proceeding, which should in
theory make this completely safe to always enable. But if some driver
pokes around in the filesystem directly and not via one of the
official kernel interfaces (i.e. request_firmware*(),
call_usermodehelper*) that theory may not hold - also, I certainly
might have missed a spot when sprinkling wait_for_initramfs().

Normal desktops might benefit as well. On my mostly stock Ubuntu
kernel, my initramfs is a 26M xz-compressed blob, decompressing to
around 126M. That takes almost two seconds:

[0.201454] Trying to unpack rootfs image as initramfs...
[1.976633] Freeing initrd memory: 29416K

Before this patch, or with initramfs_async=0, these lines occur
consecutively in dmesg. With initramfs_async=1, the timestamps on
these two lines is roughly the same as above, but with 172 lines
inbetween - so more than one cpu has been kept busy doing work that
would otherwise only happen after the populate_rootfs() finished.

Signed-off-by: Rasmus Villemoes 
---
 .../admin-guide/kernel-parameters.txt | 12 ++
 drivers/base/firmware_loader/main.c   |  2 +
 include/linux/initrd.h|  2 +
 init/initramfs.c  | 41 ++-
 init/main.c   |  1 +
 kernel/umh.c  |  2 +
 usr/Kconfig   | 10 +
 7 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..fda9f012c42b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1825,6 +1825,18 @@
initcall functions.  Useful for debugging built-in
modules and initcalls.
 
+   initramfs_async= [KNL] Normally, the initramfs image is
+   unpacked synchronously, before most devices
+   are initialized. When the initramfs is huge,
+   or on slow CPUs, this can take a significant
+   amount of time. Setting this option means the
+   unpacking is instead done in a background
+   thread, allowing the main init process to
+   begin calling device_initcall()s while the
+   initramfs is being unpacked.
+   Format: 
+   Default set by CONFIG_INITRAMFS_ASYNC.
+
initrd= [BOOT] Specify the location of the initial ramdisk
 
initrdmem=  [KNL] Specify a physical address and size from which to
diff --git a/drivers/base/firmware_loader/main.c 
b/drivers/base/firmware_loader/main.c
index 78355095e00d..4fdb8219cd08 100644
--- a/drivers/base/firmware_loader/main.c
+++ b/drivers/base/firmware_loader/main.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -504,6 +505,7 @@ fw_get_filesystem_firmware(struct device *device, struct 
fw_priv *fw_priv,
if (!path)
return -ENOMEM;
 
+   wait_for_initramfs();
for (i = 0; i < ARRAY_SIZE(fw_path); i++) {
size_t file_size = 0;
size_t *file_size_ptr = NULL;
diff --git a/include/linux/initrd.h b/include/linux/initrd.h
index 85c15717af34..1bbe9af48dc3 100644
--- a/include/linux/initrd.h
+++ b/include/linux/initrd.h
@@ -20,8 +20,10 @@ extern void free_initrd_mem(unsigned long, unsigned long);
 
 #ifdef CONFIG_BLK_DEV_INITRD
 extern void __init reserve_initrd_mem(void);
+extern void wait_for_initramfs(void);
 #else
 static inline void __init reserve_initrd_mem(void) {}
+static inline void wait_for_initramfs(void) {}
 #endif
 
 extern phys_addr_t phys_initrd_start;
diff --git a/init/initramfs.c b/init/initramfs.c
index d677e8e717f1..d33bd98481c2 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -541,6 +542,14 @@ static int __init keepinitrd_setup(char *__unused)
 __setup("keepinitrd", keepinitrd_setup);
 #endif
 
+static bool initramfs_async = IS_ENABLED(CONFIG_IN

Re: [PATCH v2 3/4] kbuild: re-implement CONFIG_TRIM_UNUSED_KSYMS to make it work in one-pass

2021-03-09 Thread Rasmus Villemoes

On 09/03/2021 20.54, Nicolas Pitre wrote:
> On Wed, 10 Mar 2021, Masahiro Yamada wrote:
> 

>>> I'm not sure I do understand every detail here, especially since it is
>>> so far away from the version that I originally contributed. But the
>>> concept looks good.
>>>
>>> I still think that there is no way around a recursive approach to get
>>> the maximum effect with LTO, but given that true LTO still isn't applied
>>> to mainline after all those years, the recursive approach brings
>>> nothing. Maybe that could be revisited if true LTO ever makes it into
>>> mainline, and the desire to reduce the binary size is still relevant
>>> enough to justify it.
>>
>> Hmm, I am confused.
>>
>> Does this patch change the behavior in the
>> combination with the "true LTO"?
>>
>> Please let me borrow this sentence from your article:
>> "But what LTO does is more like getting rid of branches that simply
>> float in the air without being connected to anything or which have
>> become loose due to optimization."
>> (https://lwn.net/Articles/746780/)
>>
>> This patch throws unneeded EXPORT_SYMBOL metadata
>> into the /DISCARD/ section of the linker script.
>>
>> The approach is different (preprocessor vs linker), but
>> we will still get the same result; the unneeded
>> EXPORT_SYMBOLs are disconnected from the main trunk.
>>
>> Then, the true LTO will remove branches floating in the air,
>> right?
>>
>> So, what will be lost by this patch?
> 
> Let's say you have this in module_foo:
> 
> int foo(int x)
> {
>   return 2 + bar(x);
> }
> EXPORT_SYMBOL(foo);
> 
> And module_bar:
> 
> int bar(int y)
> {
>   return 3 * baz(y);
> }
> EXPORT_SYMBOL(bar);
> 
> And this in the main kernel image:
> 
> int baz(int z)
> {
>   return plonk(z);
> }
> EXPORT_SYMBOLbaz);
> 
> Now we build the kernel and modules. Then we realize that nothing 
> references symbol "foo". We can trim the "foo" export. But it would be 
> necessary to recompile module_foo with LTO (especially if there is 
> some other code in that module) to realize that nothing 
> references foo() any longer and optimize away the reference to bar(). 

But, does LTO even do that to modules? Sure, the export metadata for foo
vanishes, so there's no function pointer reference to foo, but given
that modules are just -r links, the compiler/linker can't really assume
that the generated object won't later be linked with something that does
require foo? At least for the simpler case of --gc-sections, ld docs say:

'--gc-sections'
...

This option can be set when doing a partial link (enabled with
 option '-r').  In this case the root of symbols kept must be
 explicitly specified either by one of the options '--entry',
 '--undefined', or '--gc-keep-exported' or by a 'ENTRY' command in
 the linker script.

and I would assume that for LTO, --gc-keep-exported would be the only
sane semantics (keep any external symbol with default visibility).

Can you point me at a tree/set of LTO patches and a toolchain where the
previous implementation would actually eventually eliminate bar() from
module_bar?

Rasmus

[PATCH 2/2] kernel/async.c: remove async_unregister_domain()

2021-03-09 Thread Rasmus Villemoes

No callers in the tree.

Signed-off-by: Rasmus Villemoes 
---
 include/linux/async.h |  1 -
 kernel/async.c| 18 --
 2 files changed, 19 deletions(-)

diff --git a/include/linux/async.h b/include/linux/async.h
index 0a17cd27f348..cce4ad31e8fc 100644
--- a/include/linux/async.h
+++ b/include/linux/async.h
@@ -112,7 +112,6 @@ async_schedule_dev_domain(async_func_t func, struct device 
*dev,
return async_schedule_node_domain(func, dev, dev_to_node(dev), domain);
 }
 
-void async_unregister_domain(struct async_domain *domain);
 extern void async_synchronize_full(void);
 extern void async_synchronize_full_domain(struct async_domain *domain);
 extern void async_synchronize_cookie(async_cookie_t cookie);
diff --git a/kernel/async.c b/kernel/async.c
index 4b5971142922..b8d7a663497f 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -245,24 +245,6 @@ void async_synchronize_full(void)
 }
 EXPORT_SYMBOL_GPL(async_synchronize_full);
 
-/**
- * async_unregister_domain - ensure no more anonymous waiters on this domain
- * @domain: idle domain to flush out of any async_synchronize_full instances
- *
- * async_synchronize_{cookie|full}_domain() are not flushed since callers
- * of these routines should know the lifetime of @domain
- *
- * Prefer ASYNC_DOMAIN_EXCLUSIVE() declarations over flushing
- */
-void async_unregister_domain(struct async_domain *domain)
-{
-   spin_lock_irq(_lock);
-   WARN_ON(!domain->registered || !list_empty(>pending));
-   domain->registered = 0;
-   spin_unlock_irq(_lock);
-}
-EXPORT_SYMBOL_GPL(async_unregister_domain);
-
 /**
  * async_synchronize_full_domain - synchronize all asynchronous function 
within a certain domain
  * @domain: the domain to synchronize
-- 
2.29.2

[PATCH 1/2] kernel/async.c: stop guarding pr_debug() statements

2021-03-09 Thread Rasmus Villemoes

It's currently nigh impossible to get these pr_debug()s to print
something. Being guarded by initcall_debug means one has to enable
tons of other debug output during boot, and the system_state condition
further means it's impossible to get them when loading modules later.

Also, the compiler can't know that these global conditions do not
change, so there are W=2 warnings

kernel/async.c:125:9: warning: ‘calltime’ may be used uninitialized in this 
function [-Wmaybe-uninitialized]
kernel/async.c:300:9: warning: ‘starttime’ may be used uninitialized in this 
function [-Wmaybe-uninitialized]

Make it possible, for a DYNAMIC_DEBUG kernel, to get these to print
their messages by booting with appropriate 'dyndbg="file async.c +p"'
command line argument. For a non-DYNAMIC_DEBUG kernel, pr_debug()
compiles to nothing.

This does cost doing an unconditional ktime_get() for the starttime
value, but the corresponding ktime_get for the end time can be elided
by factoring it into a function which only gets called if the printk()
arguments end up being evaluated.

Signed-off-by: Rasmus Villemoes 
---

Andrew, this one is of course on top of
https://lore.kernel.org/lkml/20210226124355.2503524-1-li...@rasmusvillemoes.dk/
which you already applied.

 kernel/async.c | 48 
 1 file changed, 20 insertions(+), 28 deletions(-)

diff --git a/kernel/async.c b/kernel/async.c
index 45a867b8644a..4b5971142922 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -78,6 +78,12 @@ static DECLARE_WAIT_QUEUE_HEAD(async_done);
 
 static atomic_t entry_count;
 
+static long long microseconds_since(ktime_t start)
+{
+   ktime_t now = ktime_get();
+   return ktime_to_ns(ktime_sub(now, start)) >> 10;
+}
+
 static async_cookie_t lowest_in_progress(struct async_domain *domain)
 {
struct async_entry *first = NULL;
@@ -111,24 +117,18 @@ static void async_run_entry_fn(struct work_struct *work)
struct async_entry *entry =
container_of(work, struct async_entry, work);
unsigned long flags;
-   ktime_t calltime, delta, rettime;
+   ktime_t calltime;
 
/* 1) run (and print duration) */
-   if (initcall_debug && system_state < SYSTEM_RUNNING) {
-   pr_debug("calling  %lli_%pS @ %i\n",
-   (long long)entry->cookie,
-   entry->func, task_pid_nr(current));
-   calltime = ktime_get();
-   }
+   pr_debug("calling  %lli_%pS @ %i\n", (long long)entry->cookie,
+entry->func, task_pid_nr(current));
+   calltime = ktime_get();
+
entry->func(entry->data, entry->cookie);
-   if (initcall_debug && system_state < SYSTEM_RUNNING) {
-   rettime = ktime_get();
-   delta = ktime_sub(rettime, calltime);
-   pr_debug("initcall %lli_%pS returned after %lld usecs\n",
-   (long long)entry->cookie,
-   entry->func,
-   (long long)ktime_to_ns(delta) >> 10);
-   }
+
+   pr_debug("initcall %lli_%pS returned after %lld usecs\n",
+(long long)entry->cookie, entry->func,
+microseconds_since(calltime));
 
/* 2) remove self from the pending queues */
spin_lock_irqsave(_lock, flags);
@@ -287,23 +287,15 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
  */
 void async_synchronize_cookie_domain(async_cookie_t cookie, struct 
async_domain *domain)
 {
-   ktime_t starttime, delta, endtime;
+   ktime_t starttime;
 
-   if (initcall_debug && system_state < SYSTEM_RUNNING) {
-   pr_debug("async_waiting @ %i\n", task_pid_nr(current));
-   starttime = ktime_get();
-   }
+   pr_debug("async_waiting @ %i\n", task_pid_nr(current));
+   starttime = ktime_get();
 
wait_event(async_done, lowest_in_progress(domain) >= cookie);
 
-   if (initcall_debug && system_state < SYSTEM_RUNNING) {
-   endtime = ktime_get();
-   delta = ktime_sub(endtime, starttime);
-
-   pr_debug("async_continuing @ %i after %lli usec\n",
-   task_pid_nr(current),
-   (long long)ktime_to_ns(delta) >> 10);
-   }
+   pr_debug("async_continuing @ %i after %lli usec\n", 
task_pid_nr(current),
+microseconds_since(starttime));
 }
 EXPORT_SYMBOL_GPL(async_synchronize_cookie_domain);
 
-- 
2.29.2

Re: [PATCH RESEND 0/2] Common protected-clocks implementation

2021-03-09 Thread Rasmus Villemoes

On 03/09/2020 06.00, Samuel Holland wrote:
> Stephen, Maxime,
> 
> You previously asked me to implement the protected-clocks property in a
> driver-independent way:
> 
> https://www.spinics.net/lists/arm-kernel/msg753832.html
> 
> I provided an implementation 6 months ago, which I am resending now:
> 
> https://patchwork.kernel.org/patch/11398629/
> 
> Do you have any comments on it?

I'm also interested [1] in getting something like this supported in a
generic fashion - i.e., being able to mark a clock as
protected/critical/whatnot by just adding an appropriate property in the
clock provider's DT node, but without modifying the driver to opt-in to
handling it.

Now, as to this implementation, the commit 48d7f160b1 which added the
common protected-clocks binding says

  For example, on some Qualcomm firmwares reading or writing certain clk
  registers causes the entire system to reboot,

so I'm not sure handling protected-clocks by translating it to
CLK_CRITICAL and thus calling prepare/enable on it is the right thing to
do - clks that behave like above are truly "hands off, kernel", so the
current driver-specific implementation of simply not registering those
clocks seems to be the right thing to do - or at least the clk framework
would need to be taught to not actually call any methods on such
protected clocks.

For my use case, either "hands off kernel" or "make sure this clock is
enabled" would work since the bootloader anyway enables the clock.

Rasmus

[1]
https://lore.kernel.org/lkml/20210226141411.2517368-1-li...@rasmusvillemoes.dk/

Re: [PATCH 1/2] dt-bindings: misc: add binding for generic ripple counter

2021-03-08 Thread Rasmus Villemoes

On 08/03/2021 22.38, Rob Herring wrote:
> On Mon, Mar 08, 2021 at 09:02:29PM +0100, Rasmus Villemoes wrote:
>> On 08/03/2021 18.21, Rob Herring wrote:
>>> On Fri, Feb 26, 2021 at 03:14:10PM +0100, Rasmus Villemoes wrote:
>>>> While a ripple counter can not usually be interfaced with (directly)
>>>> from software, it may still be a crucial component in a board
>>>> layout. To prevent its input clock from being disabled by the clock
>>>> core because it apparently has no consumer, one needs to be able to
>>>> represent that consumer in DT.
>>>
>>> I'm okay with this as it is describing h/w, but we already 
>>> 'protected-clocks' property which should work.
>>
>> Hm. Unless
>> https://lore.kernel.org/lkml/20200903040015.5627-2-sam...@sholland.org/
>> gets merged, I don't see how this would work out-of-the-box.
> 
> Hum, no really clear what the hold up is there given it seems it was 
> asked for. Letting it sit for 5 months is certainly not the way 
> to get it merged. Anyways, that's the kernel's problem, not mine as far 
> as DT bindings are concerned.
> 
>>
>> Note that I sent a completely different v2, which made the gpio-wdt the
>> clock consumer based on feedback from Guenter and Arnd, but that v2
>> isn't suitable for our case because it post-poned handling of the
>> watchdog till after i2c is ready, which is too late. Somewhat similar to
>> https://lore.kernel.org/lkml/20210222171247.97609-2-sebastian.reic...@collabora.com/
>> it seems.
> 
> Now at that one in my queue... I think 'protected-clocks' is the best 
> way to avoid any driver probe ordering issues. It's the only thing that 
> really captures don't turn off this clock. 

Agreed, and I did start by looking for a generic way to mark the clock
as either "hands off, kernel" (relying on the bootloader to enable it),
or better "make sure it's enabled". The closest I found was
of_clk_detect_critical(), but the comment above that one says not to use
it, so adding a call to some random RTC driver to support the
clock-critical property just for my use case didn't seem like the right
way to go.

I didn't know about protected-clocks until you mentioned it, and it does
seem to be the right way to handle these situations (which are
apparently more common than I thought).

The ripple counter binding
> doesn't really capture that or what it is related to.

Agreed, it was a "hail mary" and why I explained what I was really
trying to achieve in the cover letter.

Also, the
> ripple-counter driver could be a module and you'd still have the same 
> issue as v2.

Well, not quite. First of all, for a board like this, one always uses a
tailor-made .config, where one would never set that to be a module (and
even more obviously one wouldn't make the gpio-wdt driver a module).
Second, it wouldn't be the same issue as v2. Rather, if the clock only
gets enabled later when the ripple counter module would get loaded,
there would be a period of time where the watchdog was rendered useless
- the problem with v2 was that the watchdog wouldn't be petted in time,
so the board would be reset before it booted completely.

>>>> +Required properties:
>>>> +- compatible: Must be "linux,ripple-ctr".
>>>
>>> Nothing linux specific about this.
>>
>> True, but I was following the lead of the existing gpio-wdt binding. Is
>> there some other "vendor" name one can and should use for completely
>> generic and simple components like these? "generic"?
> 
> Most 'generic' and GPIO based interfaces have no vendor prefix.

Ah, I see. Can we add just plain "wdt-gpio" to the gpio-wdt binding, and
deprecate the "linux,wdt-gpio"? It's a little awkward to handle a
"linux,wdt-gpio" compatible in a U-Boot driver.

Rasmus

Re: [PATCH 1/2] dt-bindings: misc: add binding for generic ripple counter

2021-03-08 Thread Rasmus Villemoes

On 08/03/2021 18.21, Rob Herring wrote:
> On Fri, Feb 26, 2021 at 03:14:10PM +0100, Rasmus Villemoes wrote:
>> While a ripple counter can not usually be interfaced with (directly)
>> from software, it may still be a crucial component in a board
>> layout. To prevent its input clock from being disabled by the clock
>> core because it apparently has no consumer, one needs to be able to
>> represent that consumer in DT.
> 
> I'm okay with this as it is describing h/w, but we already 
> 'protected-clocks' property which should work.

Hm. Unless
https://lore.kernel.org/lkml/20200903040015.5627-2-sam...@sholland.org/
gets merged, I don't see how this would work out-of-the-box.

Note that I sent a completely different v2, which made the gpio-wdt the
clock consumer based on feedback from Guenter and Arnd, but that v2
isn't suitable for our case because it post-poned handling of the
watchdog till after i2c is ready, which is too late. Somewhat similar to
https://lore.kernel.org/lkml/20210222171247.97609-2-sebastian.reic...@collabora.com/
it seems.

>> +
>> +Required properties:
>> +- compatible: Must be "linux,ripple-ctr".
> 
> Nothing linux specific about this.

True, but I was following the lead of the existing gpio-wdt binding. Is
there some other "vendor" name one can and should use for completely
generic and simple components like these? "generic"?

Rasmus

[PATCH] ARM: decompressor: remove unused global variable output_data

2021-03-08 Thread Rasmus Villemoes

output_data seems to have been write-only since the flush_window()
callback was removed in commit e7db7b4270ed ("arm: add support for
LZO-compressed kernels").

Signed-off-by: Rasmus Villemoes 
---
 arch/arm/boot/compressed/misc.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/arm/boot/compressed/misc.c b/arch/arm/boot/compressed/misc.c
index e1e9a5dde853..fdef4d413d24 100644
--- a/arch/arm/boot/compressed/misc.c
+++ b/arch/arm/boot/compressed/misc.c
@@ -103,8 +103,6 @@ static void putstr(const char *ptr)
 extern char input_data[];
 extern char input_data_end[];
 
-unsigned char *output_data;
-
 unsigned long free_mem_ptr;
 unsigned long free_mem_end_ptr;
 
@@ -145,7 +143,6 @@ decompress_kernel(unsigned long output_start, unsigned long 
free_mem_ptr_p,
 {
int ret;
 
-   output_data = (unsigned char *)output_start;
free_mem_ptr= free_mem_ptr_p;
free_mem_end_ptr= free_mem_ptr_end_p;
__machine_arch_type = arch_id;
@@ -154,7 +151,7 @@ decompress_kernel(unsigned long output_start, unsigned long 
free_mem_ptr_p,
 
putstr("Uncompressing Linux...");
ret = do_decompress(input_data, input_data_end - input_data,
-   output_data, error);
+   (u8 *)output_start, error);
if (ret)
error("decompressor returned an error");
else
-- 
2.29.2

[PATCH] kbuild: apply fixdep logic to link-vmlinux.sh

2021-03-05 Thread Rasmus Villemoes

The patch adding CONFIG_VMLINUX_MAP revealed a small defect in the
build system: link-vmlinux.sh takes decisions based on CONFIG_*
options, but changing one of those does not always lead to vmlinux
being linked again.

For most of the CONFIG_* knobs referenced previously, this has
probably been hidden by those knobs also affecting some object file,
hence indirectly also vmlinux.

But CONFIG_VMLINUX_MAP is only handled inside link-vmlinux.sh, and
changing CONFIG_VMLINUX_MAP=n to CONFIG_VMLINUX_MAP=y does not cause
the build system to re-link (and hence have vmlinux.map
emitted). Since that map file is mostly a debugging aid, this is
merely a nuisance which is easily worked around by just deleting
vmlinux and building again.

But one could imagine other (possibly future) CONFIG options that
actually do affect the vmlinux binary but which are not captured
through some object file dependency.

To fix this, make link-vmlinux.sh emit a .vmlinux.d file in the same
format as the dependency files generated by gcc, and apply the fixdep
logic to that. I've tested that this correctly works with both in-tree
and out-of-tree builds.

Signed-off-by: Rasmus Villemoes 
---
 Makefile| 2 +-
 scripts/link-vmlinux.sh | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index b18dbc634690..19d2f7fd088a 100644
--- a/Makefile
+++ b/Makefile
@@ -1192,7 +1192,7 @@ cmd_link-vmlinux =
 \
$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
 
 vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
-   +$(call if_changed,link-vmlinux)
+   +$(call if_changed_dep,link-vmlinux)
 
 targets := vmlinux
 
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 855fd4e6f03e..7d4b7c6f01e8 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -312,6 +312,7 @@ cleanup()
rm -f vmlinux
rm -f vmlinux.map
rm -f vmlinux.o
+   rm -f .vmlinux.d
 }
 
 on_exit()
@@ -421,6 +422,7 @@ if [ -n "${CONFIG_KALLSYMS}" ]; then
 fi
 
 vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o}
+echo "vmlinux: $0" > .vmlinux.d
 
 # fill in BTF IDs
 if [ -n "${CONFIG_DEBUG_INFO_BTF}" -a -n "${CONFIG_BPF}" ]; then
-- 
2.29.2

[PATCH v2] kbuild: add CONFIG_VMLINUX_MAP expert option

2021-03-05 Thread Rasmus Villemoes

It can be quite useful to have ld emit a link map file, in order to
debug or verify that special sections end up where they are supposed
to, and to see what LD_DEAD_CODE_DATA_ELIMINATION manages to get rid
of.

The only reason I'm not just adding this unconditionally is that the
.map file can be rather large (several MB), and that's a waste of
space when one isn't interested in these things. Also make it depend
on CONFIG_EXPERT.

Signed-off-by: Rasmus Villemoes 
---
 .gitignore  |  1 +
 Documentation/dontdiff  |  1 +
 lib/Kconfig.debug   | 10 ++
 scripts/link-vmlinux.sh |  8 
 4 files changed, 20 insertions(+)

diff --git a/.gitignore b/.gitignore
index 3af66272d6f1..3adea59847ce 100644
--- a/.gitignore
+++ b/.gitignore
@@ -59,6 +59,7 @@ modules.order
 /linux
 /vmlinux
 /vmlinux.32
+/vmlinux.map
 /vmlinux.symvers
 /vmlinux-gdb.py
 /vmlinuz
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index e361fc95ca29..ac42ad8d430d 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -252,6 +252,7 @@ vmlinux-*
 vmlinux.aout
 vmlinux.bin.all
 vmlinux.lds
+vmlinux.map
 vmlinux.symvers
 vmlinuz
 voffset.h
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5ea0c1773b0a..663c1cd5018c 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -412,6 +412,16 @@ config VMLINUX_VALIDATION
depends on STACK_VALIDATION && DEBUG_ENTRY && !PARAVIRT
default y
 
+config VMLINUX_MAP
+   bool "Generate vmlinux.map file when linking"
+   depends on EXPERT
+   help
+ Selecting this option will pass "-Map=vmlinux.map" to ld
+ when linking vmlinux. That file can be useful for verifying
+ and debugging magic section games, and for seeing which
+ pieces of code get eliminated with
+ CONFIG_LD_DEAD_CODE_DATA_ELIMINATION.
+
 config DEBUG_FORCE_WEAK_PER_CPU
bool "Force weak per-cpu definitions"
depends on DEBUG_KERNEL
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 3b261b0f74f0..855fd4e6f03e 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -155,6 +155,7 @@ vmlinux_link()
local output=${1}
local objects
local strip_debug
+   local map_option
 
info LD ${output}
 
@@ -166,6 +167,10 @@ vmlinux_link()
strip_debug=-Wl,--strip-debug
fi
 
+   if [ -n "${CONFIG_VMLINUX_MAP}" ]; then
+   map_option="-Map=${output}.map"
+   fi
+
if [ "${SRCARCH}" != "um" ]; then
if [ -n "${CONFIG_LTO_CLANG}" ]; then
# Use vmlinux.o instead of performing the slow LTO
@@ -187,6 +192,7 @@ vmlinux_link()
${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}  \
${strip_debug#-Wl,} \
-o ${output}\
+   ${map_option}   \
-T ${lds} ${objects}
else
objects="-Wl,--whole-archive\
@@ -200,6 +206,7 @@ vmlinux_link()
${CC} ${CFLAGS_vmlinux} \
${strip_debug}  \
-o ${output}\
+   ${map_option:+-Wl,${map_option}}\
-Wl,-T,${lds}   \
${objects}  \
-lutil -lrt -lpthread
@@ -303,6 +310,7 @@ cleanup()
rm -f .tmp_vmlinux*
rm -f System.map
rm -f vmlinux
+   rm -f vmlinux.map
rm -f vmlinux.o
 }
 
-- 
2.29.2

[PATCH] clk: use clk_core_enable_lock() a bit more

2021-03-04 Thread Rasmus Villemoes

Use clk_core_enable_lock() and clk_core_disable_lock() in a few places
rather than open-coding them.

Signed-off-by: Rasmus Villemoes 
---
 drivers/clk/clk.c | 18 +++---
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 5052541a0986..8c1ed844b97e 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2078,12 +2078,8 @@ static void clk_change_rate(struct clk_core *core)
return;
 
if (core->flags & CLK_SET_RATE_UNGATE) {
-   unsigned long flags;
-
clk_core_prepare(core);
-   flags = clk_enable_lock();
-   clk_core_enable(core);
-   clk_enable_unlock(flags);
+   clk_core_enable_lock(core);
}
 
if (core->new_parent && core->new_parent != core->parent) {
@@ -2116,11 +2112,7 @@ static void clk_change_rate(struct clk_core *core)
core->rate = clk_recalc(core, best_parent_rate);
 
if (core->flags & CLK_SET_RATE_UNGATE) {
-   unsigned long flags;
-
-   flags = clk_enable_lock();
-   clk_core_disable(core);
-   clk_enable_unlock(flags);
+   clk_core_disable_lock(core);
clk_core_unprepare(core);
}
 
@@ -3564,8 +3556,6 @@ static int __clk_core_init(struct clk_core *core)
 * reparenting clocks
 */
if (core->flags & CLK_IS_CRITICAL) {
-   unsigned long flags;
-
ret = clk_core_prepare(core);
if (ret) {
pr_warn("%s: critical clk '%s' failed to prepare\n",
@@ -3573,9 +3563,7 @@ static int __clk_core_init(struct clk_core *core)
goto out;
}
 
-   flags = clk_enable_lock();
-   ret = clk_core_enable(core);
-   clk_enable_unlock(flags);
+   ret = clk_core_enable_lock(core);
if (ret) {
pr_warn("%s: critical clk '%s' failed to enable\n",
   __func__, core->name);
-- 
2.29.2

Re: [PATCH 5/7] printk: Make %pS and friends print module build ID

2021-03-04 Thread Rasmus Villemoes

On 04/03/2021 20.15, Stephen Boyd wrote:
> Quoting Matthew Wilcox (2021-03-04 09:00:52)
>> On Mon, Mar 01, 2021 at 09:47:47AM -0800, Stephen Boyd wrote:
>>> Example:
>>>
>>>  WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 
>>> lkdtm_WARNING+0x28/0x30 [lkdtm] (ed5019fdf5e53be37cb1ba7899292d7e143b259e)
>>
>> Would the first 12 characters instead of all 40 make it more palatable
>> without reducing its utility?
> 
> I can't seem to request debuginfo from debuginfod without the full 40
> characters. It's not a git sha1 hash. 
> 
>> And I feel it should be within the [], so maybe this:
>>
>> WARNING: CPU: 4 PID: 3255 at drivers/misc/lkdtm/bugs.c:83 
>> lkdtm_WARNING+0x28/0x30 [lkdtm ed5019fdf5e5]
>>
> 
> Sure I could put the hex numbers inside the brackets. I suspect changing
> %pS or updating the "Modules linked in:" line isn't going to be
> palatable. I've decided to introduce another printk format %pT to print
> the stacktrace 

Can you avoid claiming a new "top-level" %p modifier? Isn't it better to
add a new flag to '%pS', say '%pSb' to include build-id?

Rasmus

Re: [PATCH] kbuild: add CONFIG_VMLINUX_MAP expert option

2021-03-04 Thread Rasmus Villemoes

On 24/02/2021 11.52, Rasmus Villemoes wrote:
> It can be quite useful to have ld emit a link map file, in order to
> debug or verify that special sections end up where they are supposed
> to, and to see what LD_DEAD_CODE_DATA_ELIMINATION manages to get rid
> of.
> 
> The only reason I'm not just adding this unconditionally is that the
> .map file can be rather large (several MB), and that's a waste of
> space when one isn't interested in these things. Also hide the prompt
> behind CONFIG_EXPERT.

ping

[PATCH v2 0/3] add "delay" clock support to gpio_wdt

2021-03-04 Thread Rasmus Villemoes

As Arnd and Guenther suggested, this adds support to the gpio_wdt
driver for being a consumer of the clock driving the ripple
counter. However, I don't think it should be merged as-is, see below.

The first patch makes sense on its own, quick grepping suggests plenty
of places that could benefit from this, and I thought it would be odd
to have to re-introduce a .remove callback in the gpio_wdt driver.

Unfortunately, this turns out to be a bit of an "operation succeeded,
patient (almost) died": We use CONFIG_GPIO_WATCHDOG_ARCH_INITCALL
because the watchdog has a rather short timeout (1.0-2.25s, 1.6s
typical according to data sheet). At first, I put the new code right
after the devm_gpiod_get(), but the problem is that this early, we get
-EPROBE_DEFER since the clock provider (the RTC which sits off i2c)
isn't probed yet. But then the board would reset because it takes way
too long for the rest of the machine to initialize. [The bootloader
makes sure to turn on the RTC's clock output so the watchdog is
actually functional, the task here is to figure out the proper way to
prevent clk_disable_unused() from disabling it.]

Moving the logic to after the first "is it always-running and if so
give it an initial ping" made the board survive, but unfortunately the
second, and succesful, probe happens a little more than a second
later, which happens to work on this particular board, but is
obviously not suitable for production given that it's already above
what the spec says, and other random changes in the future might make
the gap even wider.

So I don't know. The hardware is obviously misdesigned, and I don't
know how far the mainline kernel should stretch to support this; OTOH
the kernel does contain lots of workarounds for quirks and hardware
bugs. 




Rasmus Villemoes (3):
  clk: add devm_clk_prepare_enable() helper
  dt-bindings: watchdog: add optional "delay" clock to gpio-wdt binding
  watchdog: gpio_wdt: implement support for optional "delay" clock

 .../devicetree/bindings/watchdog/gpio-wdt.txt |  6 
 .../driver-api/driver-model/devres.rst|  1 +
 drivers/clk/clk-devres.c  | 29 +++
 drivers/watchdog/gpio_wdt.c   |  9 ++
 include/linux/clk.h   | 13 +
 5 files changed, 58 insertions(+)

-- 
2.29.2

[PATCH v2 2/3] dt-bindings: watchdog: add optional "delay" clock to gpio-wdt binding

2021-03-04 Thread Rasmus Villemoes

[DO NOT MERGE - see cover letter]

We have a board where the reset output from the ADM706S is split in
two: directly routed to an interrupt, and also to start a ripple
counter, which 64 ms later than pulls the SOC's reset pin. That ripple
counter only works if the RTC's 32kHz output is enabled, and since
linux by default disables unused clocks, that effectively renders the
watchdog useless. So add an optional "delay" clock binding.

Suggested-by: Arnd Bergmann 
Signed-off-by: Rasmus Villemoes 
---
 Documentation/devicetree/bindings/watchdog/gpio-wdt.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/watchdog/gpio-wdt.txt 
b/Documentation/devicetree/bindings/watchdog/gpio-wdt.txt
index 198794963786..527be6b30451 100644
--- a/Documentation/devicetree/bindings/watchdog/gpio-wdt.txt
+++ b/Documentation/devicetree/bindings/watchdog/gpio-wdt.txt
@@ -17,6 +17,10 @@ Optional Properties:
 - always-running: If the watchdog timer cannot be disabled, add this flag to
   have the driver keep toggling the signal without a client. It will only cease
   to toggle the signal when the device is open and the timeout elapsed.
+- clock-names: May contain the entry "delay" if the board has logic
+  that delays the reset signal from the watchdog and which requires an
+  external signal to function.
+- clocks: Phandles corresponding to the clock-names.
 
 Example:
watchdog: watchdog {
@@ -25,4 +29,6 @@ Example:
gpios = < 9 GPIO_ACTIVE_LOW>;
hw_algo = "toggle";
hw_margin_ms = <1600>;
+   clock-names = "delay";
+   clocks = < 1>;
};
-- 
2.29.2

[PATCH v2 3/3] watchdog: gpio_wdt: implement support for optional "delay" clock

2021-03-04 Thread Rasmus Villemoes

[DO NOT MERGE - see cover letter]

We have a board where the reset output from the ADM706S is split in
two: directly routed to an interrupt, and also to start a ripple
counter, which 64 ms later than pulls the SOC's reset pin. That ripple
counter only works if the RTC's 32kHz output is enabled, and since
linux by default disables unused clocks, that effectively renders the
watchdog useless.

Add driver support for an optional "delay" clock, as documented in the
preceding patch.

Signed-off-by: Rasmus Villemoes 
---
 drivers/watchdog/gpio_wdt.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/watchdog/gpio_wdt.c b/drivers/watchdog/gpio_wdt.c
index 0923201ce874..f812c39bc1e8 100644
--- a/drivers/watchdog/gpio_wdt.c
+++ b/drivers/watchdog/gpio_wdt.c
@@ -6,6 +6,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -111,6 +112,7 @@ static int gpio_wdt_probe(struct platform_device *pdev)
enum gpiod_flags gflags;
unsigned int hw_margin;
const char *algo;
+   struct clk *clk;
int ret;
 
priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
@@ -164,6 +166,13 @@ static int gpio_wdt_probe(struct platform_device *pdev)
if (priv->always_running)
gpio_wdt_start(>wdd);
 
+   clk = devm_clk_get_optional(dev, "delay");
+   if (IS_ERR(clk))
+   return PTR_ERR(clk);
+   ret = devm_clk_prepare_enable(dev, clk);
+   if (ret)
+   return ret;
+
return devm_watchdog_register_device(dev, >wdd);
 }
 
-- 
2.29.2

[PATCH v2 1/3] clk: add devm_clk_prepare_enable() helper

2021-03-04 Thread Rasmus Villemoes

Add a managed wrapper for clk_prepare_enable().

Signed-off-by: Rasmus Villemoes 
---
 .../driver-api/driver-model/devres.rst|  1 +
 drivers/clk/clk-devres.c  | 29 +++
 include/linux/clk.h   | 13 +
 3 files changed, 43 insertions(+)

diff --git a/Documentation/driver-api/driver-model/devres.rst 
b/Documentation/driver-api/driver-model/devres.rst
index cd8b6e657b94..8ee2557f9ad7 100644
--- a/Documentation/driver-api/driver-model/devres.rst
+++ b/Documentation/driver-api/driver-model/devres.rst
@@ -253,6 +253,7 @@ CLOCK
   devm_clk_hw_register()
   devm_of_clk_add_hw_provider()
   devm_clk_hw_register_clkdev()
+  devm_clk_prepare_enable()
 
 DMA
   dmaenginem_async_device_register()
diff --git a/drivers/clk/clk-devres.c b/drivers/clk/clk-devres.c
index be160764911b..d5bfa8cd7347 100644
--- a/drivers/clk/clk-devres.c
+++ b/drivers/clk/clk-devres.c
@@ -156,3 +156,32 @@ struct clk *devm_get_clk_from_child(struct device *dev,
return clk;
 }
 EXPORT_SYMBOL(devm_get_clk_from_child);
+
+static void devm_clk_disable_unprepare(struct device *dev, void *res)
+{
+   clk_disable_unprepare(*(struct clk **)res);
+}
+
+int devm_clk_prepare_enable(struct device *dev, struct clk *clk)
+{
+   struct clk **ptr;
+   int ret;
+
+   if (!clk)
+   return 0;
+
+   ptr = devres_alloc(devm_clk_disable_unprepare, sizeof(*ptr), 
GFP_KERNEL);
+   if (!ptr)
+   return -ENOMEM;
+
+   ret = clk_prepare_enable(clk);
+   if (!ret) {
+   *ptr = clk;
+   devres_add(dev, ptr);
+   } else {
+   devres_free(ptr);
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL(devm_clk_prepare_enable);
diff --git a/include/linux/clk.h b/include/linux/clk.h
index 266e8de3cb51..04d135520480 100644
--- a/include/linux/clk.h
+++ b/include/linux/clk.h
@@ -485,6 +485,19 @@ struct clk *devm_clk_get_optional(struct device *dev, 
const char *id);
  */
 struct clk *devm_get_clk_from_child(struct device *dev,
struct device_node *np, const char *con_id);
+/**
+ * devm_clk_prepare_enable - prepare and enable a clock source
+ * @dev: device for clock "consumer"
+ * @clk: clock source
+ *
+ * This function calls clk_prepare_enable() on @clk, and ensures the
+ * clock will automatically be disabled and unprepared when the device
+ * is unbound from the bus.
+ *
+ * Must not be called from within atomic context.
+ */
+int devm_clk_prepare_enable(struct device *dev, struct clk *clk);
+
 /**
  * clk_rate_exclusive_get - get exclusivity over the rate control of a
  *  producer
-- 
2.29.2

Re: linux-kernel janitorial RFP: Mark static arrays as const

2021-03-03 Thread Rasmus Villemoes

On 02/03/2021 18.42, Joe Perches wrote:
> Here is a possible opportunity to reduce data usage in the kernel.
> 
> $ git grep -P -n '^static\s+(?!const|struct)(?:\w+\s+){1,3}\w+\s*\[\s*\]' 
> drivers/ | \
>   grep -v __initdata | \
>   wc -l
> 3250
> 
> Meaning there are ~3000 declarations of arrays with what appears to be
> file static const content that are not marked const.
> 
> So there are many static arrays that could be marked const to move the
> compiled object code from data to text minimizing the total amount of
> exposed r/w data.

You can add const if you like, but it will rarely change the generated
code. gcc is already smart enough to take a static array whose contents
are provably never modified within the TU and put it in .rodata:

static int x = 7;
static int y[2] = {13, 19};

static int p(int a, const int *foo)
{
return a + *foo;
}
int q(int a)
{
int b = p(a, );
return p(b, [b & 1]);
}
$ nm c.o
 T q
 r y
$ size c.o
   textdata bss dec hex filename
111   0   0 111  6f c.o

So x gets optimized away completely, but y isn't so easy to get rid of -
nevertheless, it's never modified and the address doesn't escape the TU,
so gcc treats is as if it was declared const.

I think you'll see the same if you try adding the const on a few of your
real-life examples. (Of course, the control flow may be so convoluted
that gcc isn't able to infer the constness, so I'm not saying it will
never make a difference - only that you shouldn't expect too much.)

Rasmus

Re: [PATCH 3/3] [v4] lib/vsprintf: no_hash_pointers prints all addresses as unhashed

2021-03-02 Thread Rasmus Villemoes

On 02/03/2021 15.28, Geert Uytterhoeven wrote:

> Fortunately gcc is already smart enough to deduplicate identical strings,
> but only in the same source file.

Yeah, gcc can't do much more since it only handles one source file at a
time. However, the linker certainly deduplicates strings across
translation units :)

I don't think gcc bothers to do the tail merging since the linker does
it, but that may depend on optimization level and gcc version; it does
seem to merge identical strings within a TU, at least in very simple cases.

Rasmus

$ grep . *.c
a.c:const char *a1(void) { return "string"; }
a.c:const char *a2(void) { return "string"; }
a.c:const char *a3(void) { return "longer string"; }
b.c:const char *b1(void) { return "string"; }
b.c:const char *b2(void) { return "longer string"; }
b.c:const char *b3(void) { return "much longer string"; }
main.c:#include 
main.c:const char *a1(void);
main.c:const char *a2(void);
main.c:const char *a3(void);
main.c:const char *b1(void);
main.c:const char *b2(void);
main.c:const char *b3(void);
main.c:int main(int argc, char *argv[])
main.c:{
main.c:#define p(x) printf("%s(): %p - [%s]\n", #x, x(), x())
main.c: p(a1);
main.c: p(a2);
main.c: p(a3);
main.c: p(b1);
main.c: p(b2);
main.c: p(b3);
main.c:
main.c: return 0;
main.c:}

$ gcc -O2 -o a.o -c a.c ; gcc -O2 -o b.o -c b.c ; gcc -O2 -o main.o -c
main.c ; gcc -o main main.o a.o b.o ; ./main
a1(): 0x560b1bc3d033 - [string]
a2(): 0x560b1bc3d033 - [string]
a3(): 0x560b1bc3d02c - [longer string]
b1(): 0x560b1bc3d033 - [string]
b2(): 0x560b1bc3d02c - [longer string]
b3(): 0x560b1bc3d027 - [much longer string]

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 3339 matches

Mail list logo