Re: [RFC PATCH v2 1/4] tracing: add __print_sym() to replace __print_symbolic()

2024-03-27 Thread Johannes Berg
On Wed, 2024-03-27 at 21:11 +, Simon Horman wrote:
> 
> I'm seeing some allmodconfig build problems with this applied on top of
> net-next.

> ./include/trace/stages/init.h:30: warning: "TRACE_DEFINE_SYM_FNS" redefined

> ./include/trace/stages/init.h:54: warning: "TRACE_DEFINE_SYM_LIST" redefined

Yeah, the 0-day bot reported that too, sorry about that. It needs two
lines to #undef these in init.h before their definition, just like all
other macros there. Not sure why my builds didn't show that, maybe it
doesn't affect all users.

johannes



[RFC PATCH v2 4/4] tracing/timer: use __print_sym()

2024-03-26 Thread Johannes Berg
From: Johannes Berg 

Use the new __print_sym() in the timer tracing, just to show
how to convert something. This adds ~80 bytes of .text for a
saving of ~1.5K of data in my builds.

Note the format changes from

print fmt: "success=%d dependency=%s", REC->success, 
__print_symbolic(REC->dependency, { 0, "NONE" }, { (1 << 0), "POSIX_TIMER" }, { 
(1 << 1), "PERF_EVENTS" }, { (1 << 2), "SCHED" }, { (1 << 3), "CLOCK_UNSTABLE" 
}, { (1 << 4), "RCU" }, { (1 << 5), "RCU_EXP" })

to

print fmt: "success=%d dependency=%s", REC->success, 
__print_symbolic(REC->dependency, { 0, "NONE" }, { 1, "POSIX_TIMER" }, { 2, 
"PERF_EVENTS" }, { 4, "SCHED" }, { 8, "CLOCK_UNSTABLE" }, { 16, "RCU" }, { 32, 
"RCU_EXP" })

Since the values are now just printed in the show function as
pure decimal values.

Signed-off-by: Johannes Berg 
---
 include/trace/events/timer.h | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 1ef58a04fc57..d483abffed78 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -402,26 +402,18 @@ TRACE_EVENT(itimer_expire,
 #undef tick_dep_mask_name
 #undef tick_dep_name_end
 
-/* The MASK will convert to their bits and they need to be processed too */
-#define tick_dep_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \
-   TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep);
-#define tick_dep_name_end(sdep)  TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \
-   TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep);
-/* NONE only has a mask defined for it */
-#define tick_dep_mask_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep);
-
-TICK_DEP_NAMES
-
-#undef tick_dep_name
-#undef tick_dep_mask_name
-#undef tick_dep_name_end
-
 #define tick_dep_name(sdep) { TICK_DEP_MASK_##sdep, #sdep },
 #define tick_dep_mask_name(sdep) { TICK_DEP_MASK_##sdep, #sdep },
 #define tick_dep_name_end(sdep) { TICK_DEP_MASK_##sdep, #sdep }
 
+TRACE_DEFINE_SYM_LIST(tick_dep_names, TICK_DEP_NAMES);
+
+#undef tick_dep_name
+#undef tick_dep_mask_name
+#undef tick_dep_name_end
+
 #define show_tick_dep_name(val)\
-   __print_symbolic(val, TICK_DEP_NAMES)
+   __print_sym(val, tick_dep_names)
 
 TRACE_EVENT(tick_stop,
 
-- 
2.44.0




[RFC PATCH v2 1/4] tracing: add __print_sym() to replace __print_symbolic()

2024-03-26 Thread Johannes Berg
From: Johannes Berg 

The way __print_symbolic() works is limited and inefficient
in multiple ways:
 - you can only use it with a static list of symbols, but
   e.g. the SKB dropreasons are now a dynamic list

 - it builds the list in memory _three_ times, so it takes
   a lot of memory:
   - The print_fmt contains the list (since it's passed to
 the macro there). This actually contains the names
 _twice_, which is fixed up at runtime.
   - TRACE_DEFINE_ENUM() puts a 24-byte struct trace_eval_map
 for every entry, plus the string pointed to by it, which
 cannot be deduplicated with the strings in the print_fmt
   - The in-kernel symbolic printing creates yet another list
 of struct trace_print_flags for trace_print_symbols_seq()

 - it also requires runtime fixup during init, which is a lot
   of string parsing due to the print_fmt fixup

Introduce __print_sym() to - over time - replace the old one.
We can easily extend this also to __print_flags later, but I
cared only about the SKB dropreasons for now, which has only
__print_symbolic().

This new __print_sym() requires only a single list of items,
created by TRACE_DEFINE_SYM_LIST(), or can even use another
already existing list by using TRACE_DEFINE_SYM_FNS() with
lookup and show methods.

Then, instead of doing an init-time fixup, just do this at the
time when userspace reads the print_fmt. This way, dynamically
updated lists are possible.

For userspace, nothing actually changes, because the print_fmt
is shown exactly the same way the old __print_symbolic() was.

This adds about 4k .text in my test builds, but that'll be
more than paid for by the actual conversions.

Signed-off-by: Johannes Berg 
---
v2:
 - fix RCU
 - use ':' as separator to simplify the code, that's
   still not valid in a C identifier
---
 include/asm-generic/vmlinux.lds.h  |  3 +-
 include/linux/module.h |  2 +
 include/linux/trace_events.h   |  7 ++
 include/linux/tracepoint.h | 20 +
 include/trace/stages/init.h| 54 +
 include/trace/stages/stage2_data_offsets.h |  6 ++
 include/trace/stages/stage3_trace_output.h |  9 +++
 include/trace/stages/stage7_class_define.h |  3 +
 kernel/module/main.c   |  3 +
 kernel/trace/trace_events.c| 90 +-
 kernel/trace/trace_output.c| 45 +++
 11 files changed, 239 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index f7749d0f2562..88de434578a5 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -256,7 +256,8 @@
 #define FTRACE_EVENTS()
\
. = ALIGN(8);   \
BOUNDED_SECTION(_ftrace_events) \
-   BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps)
+   BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps) \
+   BOUNDED_SECTION(_ftrace_sym_defs)
 #else
 #define FTRACE_EVENTS()
 #endif
diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..571e5e8f17b6 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -524,6 +524,8 @@ struct module {
unsigned int num_trace_events;
struct trace_eval_map **trace_evals;
unsigned int num_trace_evals;
+   struct trace_sym_def **trace_sym_defs;
+   unsigned int num_trace_sym_defs;
 #endif
 #ifdef CONFIG_FTRACE_MCOUNT_RECORD
unsigned int num_ftrace_callsites;
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 6f9bdfb09d1d..bc7045d535d0 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -27,6 +27,13 @@ const char *trace_print_flags_seq(struct trace_seq *p, const 
char *delim,
 const char *trace_print_symbols_seq(struct trace_seq *p, unsigned long val,
const struct trace_print_flags 
*symbol_array);
 
+const char *trace_print_sym_seq(struct trace_seq *p, unsigned long long val,
+   const char *(*lookup)(unsigned long long val));
+const char *trace_sym_lookup(const struct trace_sym_entry *list,
+size_t len, unsigned long long value);
+void trace_sym_show(struct seq_file *m,
+   const struct trace_sym_entry *list, size_t len);
+
 #if BITS_PER_LONG == 32
 const char *trace_print_flags_seq_u64(struct trace_seq *p, const char *delim,
  unsigned long long flags,
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 689b6d71590e..cc3b387953d1 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -31,6 +31,24 @@ struct trace_eval_map {
unsigned long   eval_value;
 };
 
+struct trace_sym_def {
+   const char  *system;
+   const char

[RFC PATCH v2 3/4] net: drop_monitor: use drop_reason_lookup()

2024-03-26 Thread Johannes Berg
From: Johannes Berg 

Now that we have drop_reason_lookup(), we can just use it for
drop_monitor as well, rather than exporting the list itself.

Signed-off-by: Johannes Berg 
---
 include/net/dropreason.h |  4 
 net/core/drop_monitor.c  | 18 +++---
 net/core/skbuff.c|  6 +++---
 3 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/include/net/dropreason.h b/include/net/dropreason.h
index c157070b5303..0e2195ccf2cd 100644
--- a/include/net/dropreason.h
+++ b/include/net/dropreason.h
@@ -38,10 +38,6 @@ struct drop_reason_list {
size_t n_reasons;
 };
 
-/* Note: due to dynamic registrations, access must be under RCU */
-extern const struct drop_reason_list __rcu *
-drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM];
-
 #ifdef CONFIG_TRACEPOINTS
 const char *drop_reason_lookup(unsigned long long value);
 void drop_reason_show(struct seq_file *m);
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index b0f221d658be..185c43e5b501 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -610,9 +610,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, 
struct sk_buff *skb,
 size_t payload_len)
 {
struct net_dm_skb_cb *cb = NET_DM_SKB_CB(skb);
-   const struct drop_reason_list *list = NULL;
-   unsigned int subsys, subsys_reason;
char buf[NET_DM_MAX_SYMBOL_LEN];
+   const char *reason_str;
struct nlattr *attr;
void *hdr;
int rc;
@@ -630,19 +629,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, 
struct sk_buff *skb,
goto nla_put_failure;
 
rcu_read_lock();
-   subsys = u32_get_bits(cb->reason, SKB_DROP_REASON_SUBSYS_MASK);
-   if (subsys < SKB_DROP_REASON_SUBSYS_NUM)
-   list = rcu_dereference(drop_reasons_by_subsys[subsys]);
-   subsys_reason = cb->reason & ~SKB_DROP_REASON_SUBSYS_MASK;
-   if (!list ||
-   subsys_reason >= list->n_reasons ||
-   !list->reasons[subsys_reason] ||
-   strlen(list->reasons[subsys_reason]) > NET_DM_MAX_REASON_LEN) {
-   list = 
rcu_dereference(drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_CORE]);
-   subsys_reason = SKB_DROP_REASON_NOT_SPECIFIED;
-   }
-   if (nla_put_string(msg, NET_DM_ATTR_REASON,
-  list->reasons[subsys_reason])) {
+   reason_str = drop_reason_lookup(cb->reason);
+   if (nla_put_string(msg, NET_DM_ATTR_REASON, reason_str)) {
rcu_read_unlock();
goto nla_put_failure;
}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 012b48da8810..a8065c40a270 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -141,13 +141,11 @@ static const struct drop_reason_list drop_reasons_core = {
.n_reasons = ARRAY_SIZE(drop_reasons),
 };
 
-const struct drop_reason_list __rcu *
+static const struct drop_reason_list __rcu *
 drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = {
[SKB_DROP_REASON_SUBSYS_CORE] = RCU_INITIALIZER(_reasons_core),
 };
-EXPORT_SYMBOL(drop_reasons_by_subsys);
 
-#ifdef CONFIG_TRACEPOINTS
 const char *drop_reason_lookup(unsigned long long value)
 {
unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT;
@@ -164,7 +162,9 @@ const char *drop_reason_lookup(unsigned long long value)
return NULL;
return subsys->reasons[reason];
 }
+EXPORT_SYMBOL(drop_reason_lookup);
 
+#ifdef CONFIG_TRACEPOINTS
 void drop_reason_show(struct seq_file *m)
 {
u32 subsys_id;
-- 
2.44.0




[RFC PATCH v2 2/4] net: dropreason: use new __print_sym() in tracing

2024-03-26 Thread Johannes Berg
From: Johannes Berg 

The __print_symbolic() could only ever print the core
drop reasons, since that's the way the infrastructure
works. Now that we have __print_sym() with all the
advantages mentioned in that commit, convert to that
and get all the drop reasons from all subsystems. As
we already have a list of them, that's really easy.

This is a little bit of .text (~100 bytes in my build)
and saves a lot of .data (~17k).

Signed-off-by: Johannes Berg 
---
 include/net/dropreason.h   |  5 +
 include/trace/events/skb.h | 16 +++---
 net/core/skbuff.c  | 43 ++
 3 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/include/net/dropreason.h b/include/net/dropreason.h
index 56cb7be92244..c157070b5303 100644
--- a/include/net/dropreason.h
+++ b/include/net/dropreason.h
@@ -42,6 +42,11 @@ struct drop_reason_list {
 extern const struct drop_reason_list __rcu *
 drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM];
 
+#ifdef CONFIG_TRACEPOINTS
+const char *drop_reason_lookup(unsigned long long value);
+void drop_reason_show(struct seq_file *m);
+#endif
+
 void drop_reasons_register_subsys(enum skb_drop_reason_subsys subsys,
  const struct drop_reason_list *list);
 void drop_reasons_unregister_subsys(enum skb_drop_reason_subsys subsys);
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 07e0715628ec..8a1a63f9e796 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -8,15 +8,9 @@
 #include 
 #include 
 #include 
+#include 
 
-#undef FN
-#define FN(reason) TRACE_DEFINE_ENUM(SKB_DROP_REASON_##reason);
-DEFINE_DROP_REASON(FN, FN)
-
-#undef FN
-#undef FNe
-#define FN(reason) { SKB_DROP_REASON_##reason, #reason },
-#define FNe(reason){ SKB_DROP_REASON_##reason, #reason }
+TRACE_DEFINE_SYM_FNS(drop_reason, drop_reason_lookup, drop_reason_show);
 
 /*
  * Tracepoint for free an sk_buff:
@@ -44,13 +38,9 @@ TRACE_EVENT(kfree_skb,
 
TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
  __entry->skbaddr, __entry->protocol, __entry->location,
- __print_symbolic(__entry->reason,
-  DEFINE_DROP_REASON(FN, FNe)))
+ __print_sym(__entry->reason, drop_reason ))
 );
 
-#undef FN
-#undef FNe
-
 TRACE_EVENT(consume_skb,
 
TP_PROTO(struct sk_buff *skb, void *location),
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b99127712e67..012b48da8810 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -147,6 +147,49 @@ drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = {
 };
 EXPORT_SYMBOL(drop_reasons_by_subsys);
 
+#ifdef CONFIG_TRACEPOINTS
+const char *drop_reason_lookup(unsigned long long value)
+{
+   unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT;
+   u32 reason = value & ~SKB_DROP_REASON_SUBSYS_MASK;
+   const struct drop_reason_list *subsys;
+
+   if (subsys_id >= SKB_DROP_REASON_SUBSYS_NUM)
+   return NULL;
+
+   subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]);
+   if (!subsys)
+   return NULL;
+   if (reason >= subsys->n_reasons)
+   return NULL;
+   return subsys->reasons[reason];
+}
+
+void drop_reason_show(struct seq_file *m)
+{
+   u32 subsys_id;
+
+   rcu_read_lock();
+   for (subsys_id = 0; subsys_id < SKB_DROP_REASON_SUBSYS_NUM; 
subsys_id++) {
+   const struct drop_reason_list *subsys;
+   u32 i;
+
+   subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]);
+   if (!subsys)
+   continue;
+
+   for (i = 0; i < subsys->n_reasons; i++) {
+   if (!subsys->reasons[i])
+   continue;
+   seq_printf(m, ", { %u, \"%s\" }",
+  (subsys_id << SKB_DROP_REASON_SUBSYS_SHIFT) 
| i,
+  subsys->reasons[i]);
+   }
+   }
+   rcu_read_unlock();
+}
+#endif
+
 /**
  * drop_reasons_register_subsys - register another drop reason subsystem
  * @subsys: the subsystem to register, must not be the core
-- 
2.44.0




[RFC PATCH v2 0/4] tracing: improve symbolic printing

2024-03-26 Thread Johannes Berg
As I mentioned before, it's annoying to see this in dropreason tracing
with trace-cmd:

 irq/65-iwlwifi:-401   [000]22.79: kfree_skb:
skbaddr=0x6a89b000 protocol=0 location=ieee80211_rx_handlers_result+0x21a 
reason: 0x2

and much nicer to see

 irq/65-iwlwifi:-401   [000]22.79: kfree_skb:
skbaddr=0x69142000 protocol=0 location=ieee80211_rx_handlers_result+0x21a 
reason: RX_DROP_MONITOR

The reason for this is that the __print_symbolic() string in tracing
for trace-cmd to parse it is created at build-time, from the long list
of _core_ drop reasons, but the drop reasons are now more dynamic.

So I came up with __print_sym() which is similar, except it doesn't
build the big list of numbers at build time but rather at runtime,
which is actually a big memory saving too. But building it then, at
the time userspace is recording it, lets us include all the known
reasons.

v2:
 - rebased on 6.9-rc1
 - always search for __print_sym() and get rid of the DYNPRINT flag
   and associated code; I think ideally we'll just remove the older
   __print_symbolic() entirely
 - use ':' as the separator instead of "//" since that makes searching
   for it much easier and it's still not a valid char in an identifier
 - fix RCU

johannes




Re: [PATCH 0/4] tracing: improve symbolic printing

2023-10-04 Thread Johannes Berg
On Wed, 2023-10-04 at 09:22 -0700, Jakub Kicinski wrote:
> 
> Potentially naive question - the trace point holds enum skb_drop_reason.
> The user space can get the names from BTF. Can we not teach user space
> to generically look up names of enums in BTF?

I'll note that, unrelated to the discussion about whether or not we
could use BTF, we couldn't do it in this case anyway since the whole
drop reasons aren't captured in enum skb_drop_reason, that contains only
the core ones, and now other subsystems are adding their own somewhat
dynamically later.

johannes




Re: [RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup()

2023-09-21 Thread Johannes Berg
On Thu, 2023-09-21 at 10:51 +0200, Johannes Berg wrote:
> 
> - if (!list ||
> - subsys_reason >= list->n_reasons ||
> - !list->reasons[subsys_reason] ||
> - strlen(list->reasons[subsys_reason]) > NET_DM_MAX_REASON_LEN) {
> - list = 
> rcu_dereference(drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_CORE]);
> - subsys_reason = SKB_DROP_REASON_NOT_SPECIFIED;
> - }

Oops, I lost this translation of erroneous ones to the core
SKB_DROP_REASON_NOT_SPECIFIED.

Maybe we should just not have the attribute in that case? But that could
be a different change too.

> - if (nla_put_string(msg, NET_DM_ATTR_REASON,
> -list->reasons[subsys_reason])) {
> + reason_str = drop_reason_lookup(cb->reason);
> + if (nla_put_string(msg, NET_DM_ATTR_REASON, reason_str)) {
>   rcu_read_unlock();

johannes



[PATCH 0/4] tracing: improve symbolic printing

2023-09-21 Thread Johannes Berg
So I was frustrated with not seeing the names of SKB dropreasons
for all but the core reasons, and then while looking into this
all, realized, that the current __print_symbolic() is pretty bad
anyway.

So I came up with a new approach, using a separate declaration
of the symbols, and __print_sym() in there, but to userspace it
all doesn't matter, it shows it the same way, just dyamically
instead of munging with the strings all the time.

This is a huge .data savings as far as I can tell, with a modest
amount (~4k) of .text addition, while making it all dynamic and
in the SKB dropreason case even reusing the existing list that
dropmonitor uses today. Surely patch 3 isn't needed here, but it
felt right.

Anyway, I think it's a pretty reasonable approach overall, and
it does works.

I've listed a number of open questions in the first patch since
that's where the real changes for this are.

johannes





[RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic()

2023-09-21 Thread Johannes Berg
From: Johannes Berg 

The way __print_symbolic() works is limited and inefficient
in multiple ways:
 - you can only use it with a static list of symbols, but
   e.g. the SKB dropreasons are now a dynamic list

 - it builds the list in memory _three_ times, so it takes
   a lot of memory:
   - The print_fmt contains the list (since it's passed to
 the macro there). This actually contains the names
 _twice_, which is fixed up at runtime.
   - TRACE_DEFINE_ENUM() puts a 24-byte struct trace_eval_map
 for every entry, plus the string pointed to by it, which
 cannot be deduplicated with the strings in the print_fmt
   - The in-kernel symbolic printing creates yet another list
 of struct trace_print_flags for trace_print_symbols_seq()

 - it also requires runtime fixup during init, which is a lot
   of string parsing due to the print_fmt fixup

Introduce __print_sym() to - over time - replace the old one.
We can easily extend this also to __print_flags later, but I
cared more about the SKB dropreasons for now.

This new __print_sym() requires only a single list of items,
created by TRACE_DEFINE_SYM_LIST(), for can even use another
already existing list by using TRACE_DEFINE_SYM_FNS() with
lookup and show methods.

Then, instead of doing an init-time fixup, just do this at the
time when userspace reads the print_fmt. This way, dynamically
updated lists are possible.

For userspace, nothing actually changes, because the print_fmt
is shown exactly the same way the old __print_symbolic() was.

This adds about 4k .text in my test builds, but that'll be
more than paid for by the actual conversions.

OPEN QUESTIONS:
 - do we need TRACE_EVENT_FL_DYNPRINT or should we just always
   go through show_print_fmt()? If we do, we can get rid of
   set_event_dynprint() which is also a stupid function, and
   somewhat inefficient.

 - Is the WARN_ON() in show_print_fmt() correct? Can we have a
   dynamic event using this infrastructure? I don't know about
   dynamic events enough.

 - Is the double-slash (//) thing a good approach? I was looking
   for something that couldn't happen in C, and that's a comment
   so it shouldn't be possible ... It's only internal though so
   we could always change it. Parsing to the next , means we'd
   have to parse a LOT of C, so that's out of the question.

 - Is it correct that we can assume RCU critical section when in
   the lookup function? The SKB code currently does, but I may
   not ever have actually run this code yet.

 - Where should all the functions be implemented :-)

Signed-off-by: Johannes Berg 
---
 include/asm-generic/vmlinux.lds.h  |   3 +-
 include/linux/module.h |   2 +
 include/linux/trace_events.h   |  10 ++
 include/linux/tracepoint.h |  20 
 include/trace/stages/init.h|  54 +
 include/trace/stages/stage2_data_offsets.h |   6 +
 include/trace/stages/stage3_trace_output.h |   9 ++
 include/trace/stages/stage7_class_define.h |   4 +
 kernel/module/main.c   |   3 +
 kernel/trace/trace_events.c| 131 -
 kernel/trace/trace_output.c|  43 +++
 11 files changed, 282 insertions(+), 3 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 9c59409104f6..b2b45ff0a701 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -263,7 +263,8 @@
 #define FTRACE_EVENTS()
\
. = ALIGN(8);   \
BOUNDED_SECTION(_ftrace_events) \
-   BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps)
+   BOUNDED_SECTION_BY(_ftrace_eval_map, _ftrace_eval_maps) \
+   BOUNDED_SECTION(_ftrace_sym_defs)
 #else
 #define FTRACE_EVENTS()
 #endif
diff --git a/include/linux/module.h b/include/linux/module.h
index a98e188cf37b..bf14b74a872c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -524,6 +524,8 @@ struct module {
unsigned int num_trace_events;
struct trace_eval_map **trace_evals;
unsigned int num_trace_evals;
+   struct trace_sym_def **trace_sym_defs;
+   unsigned int num_trace_sym_defs;
 #endif
 #ifdef CONFIG_FTRACE_MCOUNT_RECORD
unsigned int num_ftrace_callsites;
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index eb5c3add939b..9ab651a8fa0b 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -24,6 +24,13 @@ const char *trace_print_flags_seq(struct trace_seq *p, const 
char *delim,
 const char *trace_print_symbols_seq(struct trace_seq *p, unsigned long val,
const struct trace_print_flags 
*symbol_array);
 
+const char *trace_print_sym_seq(struct trace_seq *p, unsigned long long val,
+   const char *(*lookup

[RFC PATCH 3/4] net: drop_monitor: use drop_reason_lookup()

2023-09-21 Thread Johannes Berg
From: Johannes Berg 

Now that we have drop_reason_lookup(), we can just use it for
drop_monitor as well, rather than exporting the list itself.

Signed-off-by: Johannes Berg 
---
 include/net/dropreason.h |  4 
 net/core/drop_monitor.c  | 18 +++---
 net/core/skbuff.c|  6 +++---
 3 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/include/net/dropreason.h b/include/net/dropreason.h
index c157070b5303..0e2195ccf2cd 100644
--- a/include/net/dropreason.h
+++ b/include/net/dropreason.h
@@ -38,10 +38,6 @@ struct drop_reason_list {
size_t n_reasons;
 };
 
-/* Note: due to dynamic registrations, access must be under RCU */
-extern const struct drop_reason_list __rcu *
-drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM];
-
 #ifdef CONFIG_TRACEPOINTS
 const char *drop_reason_lookup(unsigned long long value);
 void drop_reason_show(struct seq_file *m);
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index aff31cd944c2..d110a16cde46 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -610,9 +610,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, 
struct sk_buff *skb,
 size_t payload_len)
 {
struct net_dm_skb_cb *cb = NET_DM_SKB_CB(skb);
-   const struct drop_reason_list *list = NULL;
-   unsigned int subsys, subsys_reason;
char buf[NET_DM_MAX_SYMBOL_LEN];
+   const char *reason_str;
struct nlattr *attr;
void *hdr;
int rc;
@@ -630,19 +629,8 @@ static int net_dm_packet_report_fill(struct sk_buff *msg, 
struct sk_buff *skb,
goto nla_put_failure;
 
rcu_read_lock();
-   subsys = u32_get_bits(cb->reason, SKB_DROP_REASON_SUBSYS_MASK);
-   if (subsys < SKB_DROP_REASON_SUBSYS_NUM)
-   list = rcu_dereference(drop_reasons_by_subsys[subsys]);
-   subsys_reason = cb->reason & ~SKB_DROP_REASON_SUBSYS_MASK;
-   if (!list ||
-   subsys_reason >= list->n_reasons ||
-   !list->reasons[subsys_reason] ||
-   strlen(list->reasons[subsys_reason]) > NET_DM_MAX_REASON_LEN) {
-   list = 
rcu_dereference(drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_CORE]);
-   subsys_reason = SKB_DROP_REASON_NOT_SPECIFIED;
-   }
-   if (nla_put_string(msg, NET_DM_ATTR_REASON,
-  list->reasons[subsys_reason])) {
+   reason_str = drop_reason_lookup(cb->reason);
+   if (nla_put_string(msg, NET_DM_ATTR_REASON, reason_str)) {
rcu_read_unlock();
goto nla_put_failure;
}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 415329b76921..9efde769949d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -126,13 +126,11 @@ static const struct drop_reason_list drop_reasons_core = {
.n_reasons = ARRAY_SIZE(drop_reasons),
 };
 
-const struct drop_reason_list __rcu *
+static const struct drop_reason_list __rcu *
 drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = {
[SKB_DROP_REASON_SUBSYS_CORE] = RCU_INITIALIZER(_reasons_core),
 };
-EXPORT_SYMBOL(drop_reasons_by_subsys);
 
-#ifdef CONFIG_TRACEPOINTS
 const char *drop_reason_lookup(unsigned long long value)
 {
unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT;
@@ -149,7 +147,9 @@ const char *drop_reason_lookup(unsigned long long value)
return NULL;
return subsys->reasons[reason];
 }
+EXPORT_SYMBOL(drop_reason_lookup);
 
+#ifdef CONFIG_TRACEPOINTS
 void drop_reason_show(struct seq_file *m)
 {
u32 subsys_id;
-- 
2.41.0




[RFC PATCH 4/4] tracing/timer: use __print_sym()

2023-09-21 Thread Johannes Berg
From: Johannes Berg 

Use the new __print_sym() in the timer tracing, just to show
how to convert something. This adds ~80 bytes of .text for a
saving of ~1.5K of data in my builds.

Note the format changes from

print fmt: "success=%d dependency=%s", REC->success, 
__print_symbolic(REC->dependency, { 0, "NONE" }, { (1 << 0), "POSIX_TIMER" }, { 
(1 << 1), "PERF_EVENTS" }, { (1 << 2), "SCHED" }, { (1 << 3), "CLOCK_UNSTABLE" 
}, { (1 << 4), "RCU" }, { (1 << 5), "RCU_EXP" })

to

print fmt: "success=%d dependency=%s", REC->success, 
__print_symbolic(REC->dependency, { 0, "NONE" }, { 1, "POSIX_TIMER" }, { 2, 
"PERF_EVENTS" }, { 4, "SCHED" }, { 8, "CLOCK_UNSTABLE" }, { 16, "RCU" }, { 32, 
"RCU_EXP" })

Since the values are now just printed in the show function as
pure decimal values.

Signed-off-by: Johannes Berg 
---
 include/trace/events/timer.h | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index b4bc2828fa09..cb8294da29d0 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -382,26 +382,18 @@ TRACE_EVENT(itimer_expire,
 #undef tick_dep_mask_name
 #undef tick_dep_name_end
 
-/* The MASK will convert to their bits and they need to be processed too */
-#define tick_dep_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \
-   TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep);
-#define tick_dep_name_end(sdep)  TRACE_DEFINE_ENUM(TICK_DEP_BIT_##sdep); \
-   TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep);
-/* NONE only has a mask defined for it */
-#define tick_dep_mask_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep);
-
-TICK_DEP_NAMES
-
-#undef tick_dep_name
-#undef tick_dep_mask_name
-#undef tick_dep_name_end
-
 #define tick_dep_name(sdep) { TICK_DEP_MASK_##sdep, #sdep },
 #define tick_dep_mask_name(sdep) { TICK_DEP_MASK_##sdep, #sdep },
 #define tick_dep_name_end(sdep) { TICK_DEP_MASK_##sdep, #sdep }
 
+TRACE_DEFINE_SYM_LIST(tick_dep_names, TICK_DEP_NAMES);
+
+#undef tick_dep_name
+#undef tick_dep_mask_name
+#undef tick_dep_name_end
+
 #define show_tick_dep_name(val)\
-   __print_symbolic(val, TICK_DEP_NAMES)
+   __print_sym(val, tick_dep_names)
 
 TRACE_EVENT(tick_stop,
 
-- 
2.41.0




Re: [RFC PATCH 1/4] tracing: add __print_sym() to replace __print_symbolic()

2023-09-21 Thread Johannes Berg
On Thu, 2023-09-21 at 08:51 +, Johannes Berg wrote:
> 
>  - Is it correct that we can assume RCU critical section when in
>the lookup function? The SKB code currently does, but I may
>not ever have actually run this code yet.

Well, I could easily answer that myself, and no, it's incorrect.

It'd be really useful though for these lookups to be able to do them
under RCU, so I think I'll fold this?

--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -34,7 +34,7 @@ struct trace_eval_map {
 struct trace_sym_def {
const char  *system;
const char  *symbol_id;
-   /* may return NULL */
+   /* may return NULL, called under rcu_read_lock() */
const char *(*lookup)(unsigned long long);
/*
 * Must print the list: ', { val, "name"}, ...'
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -155,11 +155,13 @@ trace_print_sym_seq(struct trace_seq *p, unsigned
long long val,
const char *ret = trace_seq_buffer_ptr(p);
const char *name;
 
+   rcu_read_lock();
name = lookup(val);
if (name)
trace_seq_puts(p, name);
else
trace_seq_printf(p, "0x%llx", val);
+   rcu_read_unlock();
 
trace_seq_putc(p, 0);
 


johannes




[RFC PATCH 2/4] net: dropreason: use new __print_sym() in tracing

2023-09-21 Thread Johannes Berg
From: Johannes Berg 

The __print_symbolic() could only ever print the core
drop reasons, since that's the way the infrastructure
works. Now that we have __print_sym() with all the
advantages mentioned in that commit, convert to that
and get all the drop reasons from all subsystems. As
we already have a list of them, that's really easy.

This is a little bit of .text (~100 bytes in my build)
and saves a lot of .data (~17k).

Signed-off-by: Johannes Berg 
---
 include/net/dropreason.h   |  5 +
 include/trace/events/skb.h | 16 +++---
 net/core/skbuff.c  | 43 ++
 3 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/include/net/dropreason.h b/include/net/dropreason.h
index 56cb7be92244..c157070b5303 100644
--- a/include/net/dropreason.h
+++ b/include/net/dropreason.h
@@ -42,6 +42,11 @@ struct drop_reason_list {
 extern const struct drop_reason_list __rcu *
 drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM];
 
+#ifdef CONFIG_TRACEPOINTS
+const char *drop_reason_lookup(unsigned long long value);
+void drop_reason_show(struct seq_file *m);
+#endif
+
 void drop_reasons_register_subsys(enum skb_drop_reason_subsys subsys,
  const struct drop_reason_list *list);
 void drop_reasons_unregister_subsys(enum skb_drop_reason_subsys subsys);
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 07e0715628ec..8a1a63f9e796 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -8,15 +8,9 @@
 #include 
 #include 
 #include 
+#include 
 
-#undef FN
-#define FN(reason) TRACE_DEFINE_ENUM(SKB_DROP_REASON_##reason);
-DEFINE_DROP_REASON(FN, FN)
-
-#undef FN
-#undef FNe
-#define FN(reason) { SKB_DROP_REASON_##reason, #reason },
-#define FNe(reason){ SKB_DROP_REASON_##reason, #reason }
+TRACE_DEFINE_SYM_FNS(drop_reason, drop_reason_lookup, drop_reason_show);
 
 /*
  * Tracepoint for free an sk_buff:
@@ -44,13 +38,9 @@ TRACE_EVENT(kfree_skb,
 
TP_printk("skbaddr=%p protocol=%u location=%pS reason: %s",
  __entry->skbaddr, __entry->protocol, __entry->location,
- __print_symbolic(__entry->reason,
-  DEFINE_DROP_REASON(FN, FNe)))
+ __print_sym(__entry->reason, drop_reason ))
 );
 
-#undef FN
-#undef FNe
-
 TRACE_EVENT(consume_skb,
 
TP_PROTO(struct sk_buff *skb, void *location),
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4eaf7ed0d1f4..415329b76921 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -132,6 +132,49 @@ drop_reasons_by_subsys[SKB_DROP_REASON_SUBSYS_NUM] = {
 };
 EXPORT_SYMBOL(drop_reasons_by_subsys);
 
+#ifdef CONFIG_TRACEPOINTS
+const char *drop_reason_lookup(unsigned long long value)
+{
+   unsigned long long subsys_id = value >> SKB_DROP_REASON_SUBSYS_SHIFT;
+   u32 reason = value & ~SKB_DROP_REASON_SUBSYS_MASK;
+   const struct drop_reason_list *subsys;
+
+   if (subsys_id >= SKB_DROP_REASON_SUBSYS_NUM)
+   return NULL;
+
+   subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]);
+   if (!subsys)
+   return NULL;
+   if (reason >= subsys->n_reasons)
+   return NULL;
+   return subsys->reasons[reason];
+}
+
+void drop_reason_show(struct seq_file *m)
+{
+   u32 subsys_id;
+
+   rcu_read_lock();
+   for (subsys_id = 0; subsys_id < SKB_DROP_REASON_SUBSYS_NUM; 
subsys_id++) {
+   const struct drop_reason_list *subsys;
+   u32 i;
+
+   subsys = rcu_dereference(drop_reasons_by_subsys[subsys_id]);
+   if (!subsys)
+   continue;
+
+   for (i = 0; i < subsys->n_reasons; i++) {
+   if (!subsys->reasons[i])
+   continue;
+   seq_printf(m, ", { %u, \"%s\" }",
+  (subsys_id << SKB_DROP_REASON_SUBSYS_SHIFT) 
| i,
+  subsys->reasons[i]);
+   }
+   }
+   rcu_read_unlock();
+}
+#endif
+
 /**
  * drop_reasons_register_subsys - register another drop reason subsystem
  * @subsys: the subsystem to register, must not be the core
-- 
2.41.0




Re: [PATCH v2 1/8] scripts/gdb/symbols: add specific ko module load command

2023-09-12 Thread Johannes Berg
On Tue, 2023-08-08 at 16:30 +0800, Kuan-Ying Lee wrote:
> Add lx-symbols  command to support add specific
> ko module.

I'm not sure how this was supposed to work? It should have updated the
documentation, but more importantly, it shouldn't have broken the
documented usage of this command:

  The kernel (vmlinux) is taken from the current working directly. Modules 
(.ko)
  are scanned recursively, starting in the same directory. Optionally, the 
module
  search path can be extended by a space separated list of paths passed to 
the
  lx-symbols command.

Note how that talks about a "space separated list of paths" for which
clearly a single path seems like it should be accepted?

> @@ -138,6 +139,19 @@ lx-symbols command."""
>  else:
>  gdb.write("no module object found for 
> '{0}'\n".format(module_name))
>  
> +def load_ko_symbols(self, mod_path):
> +self.loaded_modules = []
> +module_list = modules.module_list()
> +
> +for module in module_list:
> +module_name = module['name'].string()
> +module_pattern = ".*/{0}\.ko(?:.debug)?$".format(
> +module_name.replace("_", r"[_\-]"))
> +if re.match(module_pattern, mod_path) and 
> os.path.exists(mod_path):
> +self.load_module_symbols(module, mod_path)
> +return
> +raise gdb.GdbError("%s is not a valid .ko\n" % mod_path)
> +
>  def load_all_symbols(self):
>  gdb.write("loading vmlinux\n")
>  
> @@ -176,6 +190,11 @@ lx-symbols command."""
>  self.module_files = []
>  self.module_files_updated = False
>  
> +argv = gdb.string_to_argv(arg)
> +if len(argv) == 1:
> +self.load_ko_symbols(argv[0])
> +return

But this obviously breaks it, since passing a single path will go into
the if, then complain "some/folder/ is not a valid .ko" and exit.


But I'm not even sure how you intended this to work _at all_, because in
the context before this if, we have:

 self.module_paths = [os.path.abspath(os.path.expanduser(p))
  for p in arg.split()]
 self.module_paths.append(os.getcwd())

so you first add the (file!) to the list of paths, and then try to load
the file by finding modules in the paths, and filtering by the specified
file? That seems ... very roundabout, and can even only work if the file
can be found via os.getcwd(), so you could never specify an absolute
filename?

All that seems counter to what the patch was meant to do.

I suspect that really you need to individually check "is it a file or a
directory" before handling any of this, and if it's actually a file,
don't use it as a _filter_ as you do in load_ko_symbols() now but
directly use it as is with load_module_symbols()?


@Jan, can we revert this? I came up with the following trivial fix that
makes it at least not break the original use case of passing a single-
entry directory list, but it really doesn't seem right one way or the
other.


--- a/scripts/gdb/linux/symbols.py
+++ b/scripts/gdb/linux/symbols.py
@@ -191,7 +191,7 @@ lx-symbols command."""
 self.module_files_updated = False
 
 argv = gdb.string_to_argv(arg)
-if len(argv) == 1:
+if len(argv) == 1 and os.path.isfile(argv[0]):
 self.load_ko_symbols(argv[0])
 return
 


johannes


Re: [PATCH AUTOSEL 5.4 13/14] gcov: clang: fix clang-11+ build

2021-04-20 Thread Johannes Berg
On Mon, 2021-04-19 at 20:44 +, Sasha Levin wrote:
> From: Johannes Berg 
> 
> [ Upstream commit 04c53de57cb6435738961dace8b1b71d3ecd3c39 ]
> 
> With clang-11+, the code is broken due to my kvmalloc() conversion
> (which predated the clang-11 support code) leaving one vmalloc() in
> place.  Fix that.

This patch might *apply* on 5.4 (and the other kernels you selected it
for), but then I'm pretty sure it'll be broken, unless you also applied
the various patches this was actually fixing (my kvmalloc conversion,
and the clang-11 support).

Also, Linus has (correctly) reverted this patch from mainline, it
shouldn't have gone there in the first place, probably really should be
squashed into my kvmalloc conversion patch that's in -mm now.

Sorry if I didn't make that clear enough at the time.


In any case, please drop this patch from all stable trees.

johannes



Re: iwlwifi: Microcode SW error

2021-04-19 Thread Johannes Berg
On Mon, 2021-04-19 at 11:54 +0200, Gonsolo wrote:
> > Do you use MFP?
> 
> I don't think so. I'm running unmodified kernels from Ubuntu PPA and
> don't meddle with WIFI configs.
> How can I find out?
> 
> > Could be related to
> > https://patchwork.kernel.org/project/linux-wireless/patch/20210416134702.ef8486a64293.If0a9025b39c71bb91b11dd6ac45547aba682df34@changeid/
> > I saw similar issues internally without that fix.
> 
> A quick "git grep" didn't show any of these two patches in the 5.12-
> rc7 kernel.
> 
Yeah, my bad, I only put the offending patch into -next, I
misremembered. Sorry for the noise.

johannes



Re: iwlwifi: Microcode SW error

2021-04-19 Thread Johannes Berg
On Mon, 2021-04-19 at 11:08 +0200, Gon Solo wrote:
> 
> [Apr19 10:50] iwlwifi :02:00.0: Queue 10 is active on fifo 1 and stuck 
> for 1 ms. SW [40, 93] HW [40, 93] FH TRB=0x0c010a037
> [  +0,001244] iwlwifi :02:00.0: Microcode SW error detected.  Restarting 
> 0x200.
> 
> The rest of the message is at the end of this message.
> The kernel version is "Linux Limone 5.12.0-051200rc7-lowlatency" from 
> https://kernel.ubuntu.com/~kernel-ppa/mainline.
> The relevant output of lspci is:
> 02:00.0 Network controller: Intel Corporation Wireless 7260 (rev 73)
> 
> I would be glad to provide additional details if somebody is interested
> to fix this bug.
> 
> Regards,
> Andreas
> 
> [[Apr19 10:50] iwlwifi :02:00.0: Queue 10 is active on fifo 1 and stuck 
> for 1 ms. SW [40, 93] HW [40, 93] FH TRB=0x0c010a037
> [  +0,001244] iwlwifi :02:00.0: Microcode SW error detected.  Restarting 
> 0x200.
> [  +0,000160] iwlwifi :02:00.0: Start IWL Error Log Dump:
> [  +0,04] iwlwifi :02:00.0: Status: 0x0040, count: 6
> [  +0,05] iwlwifi :02:00.0: Loaded firmware version: 17.3216344376.0 
> 7260-17.ucode
> [  +0,05] iwlwifi :02:00.0: 0x0084 | NMI_INTERRUPT_UNKNOWN   

Do you use MFP?

Could be related to

https://patchwork.kernel.org/project/linux-wireless/patch/20210416134702.ef8486a64293.If0a9025b39c71bb91b11dd6ac45547aba682df34@changeid/

I saw similar issues internally without that fix.

johannes



Re: [PATCH] mac80211_hwsim: indicate support for 60GHz channels

2021-04-14 Thread Johannes Berg
On Wed, 2021-04-14 at 12:06 -0300, Ramon Fontes wrote:
> > Advertise 60GHz channels to mac80211.
> Oh.. That was a mistake. Sorry for that.
> 
> Anyway, can we indicate 60GHz support even though it is not being
> supported by mac80211 yet?
> 
No, that makes no sense. DMG operation is significantly different from
non-DMG operation, even the A-MSDU format for example (an abbreviated
format called "short A-MSDU" is used in DMG). Similarly beacons, etc.,
all kinds of operations are significantly different.

Adding 60 GHz channels to hwsim would be essentially operating as non-
DMG yet on a DMG channel, which makes no sense.

I don't think anyone's planning to add DMG support to mac80211, and in
the absence of a real driver requiring that it wouldn't make sense
anyway. Quite possibly, it simply doesn't make sense period, because DMG
operation is sufficiently different.

johannes



Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space

2021-04-14 Thread Johannes Berg
On Wed, 2021-04-14 at 08:22 +0100, Anton Ivanov wrote:
> On 14/04/2021 06:52, Andrei Vagin wrote:
> > We already have process_vm_readv and process_vm_writev to read and write
> > to a process memory faster than we can do this with ptrace. And now it
> > is time for process_vm_exec that allows executing code in an address
> > space of another process. We can do this with ptrace but it is much
> > slower.
> > 
> > = Use-cases =
> > 
> > Here are two known use-cases. The first one is “application kernel”
> > sandboxes like User-mode Linux and gVisor. In this case, we have a
> > process that runs the sandbox kernel and a set of stub processes that
> > are used to manage guest address spaces. Guest code is executed in the
> > context of stub processes but all system calls are intercepted and
> > handled in the sandbox kernel. Right now, these sort of sandboxes use
> > PTRACE_SYSEMU to trap system calls, but the process_vm_exec can
> > significantly speed them up.
> 
> Certainly interesting, but will require um to rework most of its memory 
> management and we will most likely need extra mm support to make use of 
> it in UML. We are not likely to get away just with one syscall there.

Might help the seccomp mode though:

https://patchwork.ozlabs.org/project/linux-um/list/?series=231980

johannes




Re: [PATCH] mac80211_hwsim: indicate support for 60GHz channels

2021-04-14 Thread Johannes Berg
On Mon, 2021-04-12 at 22:06 -0300, Ramon Fontes wrote:
> Advertise 60GHz channels to mac80211.

This is wrong. mac80211 doesn't support 60 GHz operation.

johannes



Re: linux-next: build warning after merge of the mac80211-next tree

2021-04-13 Thread Johannes Berg
On Tue, 2021-04-13 at 10:39 +, Grumbach, Emmanuel wrote:
> Hi,
> 
> > 
> > Hi all,
> > 
> > After merging the mac80211-next tree, today's linux-next build (htmldocs)
> > produced this warning:
> > 
> > include/net/cfg80211.h:6643: warning: expecting prototype for
> > wiphy_rfkill_set_hw_state(). Prototype was for
> > wiphy_rfkill_set_hw_state_reason() instead
> > include/net/cfg80211.h:6643: warning: expecting prototype for
> 
> Ouch
> 
> Johannes, this is the fix:
> diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
> index 3b296f2b7a2c..c6134220dd8f 100644
> --- a/include/net/cfg80211.h
> +++ b/include/net/cfg80211.h
> @@ -6634,7 +6634,7 @@ void cfg80211_notify_new_peer_candidate(struct 
> net_device *dev,
>   */
>  
> 
>  /**
> - * wiphy_rfkill_set_hw_state - notify cfg80211 about hw block state
> + * wiphy_rfkill_set_hw_state_reason - notify cfg80211 about hw block state
>   * @wiphy: the wiphy
>   * @blocked: block status
>   * @reason: one of reasons in  rfkill_hard_block_reasons
> 
> Do you want a patch or you amend the original commit?
> It is not in net-next yet.

Please send a separate fix anyway, I've pushed it out publicly after all
and try not to rebase unless absolutely necessary.

johannes



[PATCH] gcov: clang: fix clang-11+ build

2021-04-12 Thread Johannes Berg
From: Johannes Berg 

With clang-11+, the code is broken due to my kvmalloc() conversion
(which predated the clang-11 support code) leaving one vmalloc()
in place. Fix that.

Signed-off-by: Johannes Berg 
---
This fixes a clang-11 build issue reported against current
linux-next since the clang-11+ support landed in v5.12-rc5
and my kvmalloc() conversion patch is in akpm/linux-next.
I guess this should be folded into

  gcov: use kvmalloc()

If desired, I can send a combined patch instead.
---
 kernel/gcov/clang.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c
index 04a65443005d..d43ffd0c5ddb 100644
--- a/kernel/gcov/clang.c
+++ b/kernel/gcov/clang.c
@@ -368,7 +368,7 @@ static struct gcov_fn_info *gcov_fn_info_dup(struct 
gcov_fn_info *fn)
INIT_LIST_HEAD(_dup->head);
 
cv_size = fn->num_counters * sizeof(fn->counters[0]);
-   fn_dup->counters = vmalloc(cv_size);
+   fn_dup->counters = kvmalloc(cv_size, GFP_KERNEL);
if (!fn_dup->counters) {
kfree(fn_dup);
return NULL;
-- 
2.30.2



Re: [PATCH] net: netlink: fix error check in genl_family_rcv_msg_doit

2021-04-03 Thread Johannes Berg
On Sat, 2021-04-03 at 15:13 +, Pavel Skripkin wrote:
> genl_family_rcv_msg_attrs_parse() can return NULL
> pointer:
> 
> if (!ops->maxattr)
> return NULL;
> 
> But this condition doesn't cause an error in
> genl_family_rcv_msg_doit

And I'm almost certain that in fact it shouldn't cause an error!

If the family doesn't set maxattr then it doesn't want to have generic
netlink doing the parsing, but still it should be possible to call the
ops. Look at fs/dlm/netlink.c for example, it doesn't even have
attributes. You're breaking it with this patch.

Also, the (NULL) pointer is not actually _used_ anywhere, so why would
it matter?

johannes



Re: [PATCH 7/7] devcoredump: fix kernel-doc warning

2021-04-01 Thread Johannes Berg
On Wed, 2021-03-31 at 18:26 -0500, Pierre-Louis Bossart wrote:
> remove make W=1 warnings
> 
> drivers/base/devcoredump.c:208: warning:
> Function parameter or member 'data' not described in
> 'devcd_free_sgtable'
> 
> drivers/base/devcoredump.c:208: warning:
> Excess function parameter 'table' description in 'devcd_free_sgtable'
> 
> drivers/base/devcoredump.c:225: warning:
> expecting prototype for devcd_read_from_table(). Prototype was for
> devcd_read_from_sgtable() instead

Thanks!

Reviewed-by: Johannes Berg 

johannes




Re: Build regressions/improvements in v5.12-rc4

2021-03-22 Thread Johannes Berg
On Mon, 2021-03-22 at 15:43 +0100, Geert Uytterhoeven wrote:
> On Mon, Mar 22, 2021 at 3:16 PM Geert Uytterhoeven  
> wrote:
> > JFYI, when comparing v5.12-rc4[1] to v5.12-rc3[3], the summaries are:
> >   - build errors: +6/-2
> 
> > [1] 
> > http://kisskb.ellerman.id.au/kisskb/branch/linus/head/0d02ec6b3136c73c09e7859f0d0e4e2c4c07b49b/
> >  (all 192 configs)
> > [3] 
> > http://kisskb.ellerman.id.au/kisskb/branch/linus/head/1e28eed17697bcf343c6743f0028cc3b5dd88bf0/
> >  (all 192 configs)
> 
> > 6 error regressions:
> >   + error: modpost: "devm_ioremap_resource" 
> > [drivers/net/ethernet/xilinx/xilinx_emac.ko] undefined!:  => N/A
> >   + error: modpost: "devm_ioremap_resource" 
> > [drivers/net/ethernet/xilinx/xilinx_emaclite.ko] undefined!:  => N/A
> >   + error: modpost: "devm_of_iomap" 
> > [drivers/net/ethernet/xilinx/ll_temac.ko] undefined!:  => N/A
> >   + error: modpost: "devm_platform_ioremap_resource" 
> > [drivers/net/ethernet/xilinx/ll_temac.ko] undefined!:  => N/A
> >   + error: modpost: "devm_platform_ioremap_resource_byname" 
> > [drivers/net/ethernet/xilinx/ll_temac.ko] undefined!:  => N/A
> 
> x86_64/um-allmodconfig

I'd guess that's a missing 'depends on HAS_IOMEM'.

johannes



Re: systemd-rfkill regression on 5.11 and later kernels

2021-03-19 Thread Johannes Berg
On Thu, 2021-03-18 at 12:16 +0100, Johannes Berg wrote:
> > Yeah, that's a dilemma.  An oft-seen trick is to add more bytes for
> > the future use, e.g. extend to 16 bytes and fill 0 for the remaining.
> 
> Yeah, I guess I could stick a reserved[15] there, it's small enough.

Actually, that doesn't really help anything either.

If today I require that the reserved bytes are sent as 0 by userspace,
then any potential expansion that requires userspace to set it will
break when userspace does it and runs on an old kernel.

If I don't require the reserved bytes to be set to 0 then somebody will
invariably get it wrong and send garbage, and then we again cannot
extend it.

So ... that all seems pointless. I guess I'll send the patch as it is
now.

johannes



Re: [PATCH 4/6] um: split up CONFIG_GCOV

2021-03-18 Thread Johannes Berg
Hi Brendan,

> Hey, thanks for doing this! I was looking into this a few weeks ago
> and root caused part of the issue in GCC and in the kernel, but I did
> not have a fix put together.
> 
> Anyway, most of the patches make sense to me, but I am not able to
> apply this patch on torvalds/master. Do you mind sending a rebase so I
> can test it?

Well, if you see my other replies in the thread, I gave up for various
reasons, see

https://lore.kernel.org/r/d36ea54d8c0a8dd706826ba844a6f27691f45d55.ca...@sipsolutions.net

Personally, I ended up switching to CONFIG_GCOV_KERNEL instead because
it actually works for modules, but then it was _really_ slow (think 30s
to copy data for a few modules), but I root-caused this and ultimately
sent these patches instead:

https://patchwork.ozlabs.org/project/linux-um/patch/20210315233804.d3e52f6a3422.I9672eef7dfa7ce6c3de1ccf7ab8d9aad1fa7f3a6@changeid/
https://patchwork.ozlabs.org/project/linux-um/patch/20210315234731.2e03184a344b.I04f1816296f04c5aa7d7d88b33bd4a14dd458da8@changeid/


johannes



Re: systemd-rfkill regression on 5.11 and later kernels

2021-03-18 Thread Johannes Berg
On Thu, 2021-03-18 at 12:11 +0100, Takashi Iwai wrote:
> > That said, we can "fix" this like this, and hope we'll not get this
> > again? And if we do get it again ... well, we keep renaming the structs
> > and add "struct rfkill_event_v3" next time?
> 
> Yeah, that's a dilemma.  An oft-seen trick is to add more bytes for
> the future use, e.g. extend to 16 bytes and fill 0 for the remaining.

Yeah, I guess I could stick a reserved[15] there, it's small enough.

> In the sound driver, we introduced an ioctl to inform from user-space
> which API protocol it can speak, and the kernel falls back to the old
> API/ABI if it's a lower version or it's not told at all.  But I'm not
> sure whether such an implementation is optimal for rfkill.

I thought about it, but it ... doesn't really help.

Somebody's going to do

ioctl(..., sizeof(ev)) == sizeof(ev)

and break on older kernels, or == my_fixed_size, or ... something. It's
not really going to address the issue entirely.

And it's more complexity.

johannes



Re: systemd-rfkill regression on 5.11 and later kernels

2021-03-18 Thread Johannes Berg
Hi,

> OK, I took a deeper look again, and actually there are two issues in
> systemd-rfkill code:
> 
> * It expects 8 bytes returned from read while it reads a struct
>   rfkill_event record.  If the code is rebuilt with the latest kernel
>   headers, it breaks due to the change of rfkill_event.  That's the
>   error openSUSE bug report points to.

Right. It hardcoded the size check but not the size it reads.

> * When systemd-rfkill is built with the latest kernel headers but runs
>   on the old kernel code, the write size check fails as you mentioned
>   in the above.  That's another part of the github issue.

Yes. And it's all confusing, because they only later added the "this is
on 5.10" bits, and on pure 5.11 the second thing made no sense.

Same confusion bit the developer of the systemd fix, but nonetheless the
fix seems OK.

> So, with a kernel devs hat on, I share your feeling, that's an
> application bug.  OTOH, the extension of the rfkill_event is, well,
> not really safe as expected.

Evidently.

> IMO, if systemd-rfkill is the only one that hits such a problem, we
> may let the systemd code fixed, as it's obviously buggy.  But who
> knows...

We hit it in at least one other places, but that was just dev/test code,
I think.

> Is the extension of rfkill_event mandatory?  Can the new entry
> provided in a different way such as another sysfs record?

Yes, it is mandatory - it needs to be provided as an event. Well, I
guess in theory it's all software, but ... getting an event and then
having to poke a sysfs file is also a nightmare.

> IOW, if we revert the change, would it break anything else new?

It would break the necessary notification for the feature :)


That said, we can "fix" this like this, and hope we'll not get this
again? And if we do get it again ... well, we keep renaming the structs
and add "struct rfkill_event_v3" next time?



diff --git a/include/uapi/linux/rfkill.h b/include/uapi/linux/rfkill.h
index 03e8af87b364..9b77cfc42efa 100644
--- a/include/uapi/linux/rfkill.h
+++ b/include/uapi/linux/rfkill.h
@@ -86,34 +86,90 @@ enum rfkill_hard_block_reasons {
  * @op: operation code
  * @hard: hard state (0/1)
  * @soft: soft state (0/1)
+ *
+ * Structure used for userspace communication on /dev/rfkill,
+ * used for events from the kernel and control to the kernel.
+ */
+struct rfkill_event {
+   __u32 idx;
+   __u8  type;
+   __u8  op;
+   __u8  soft;
+   __u8  hard;
+} __attribute__((packed));
+
+/**
+ * struct rfkill_event_ext - events for userspace on /dev/rfkill
+ * @idx: index of dev rfkill
+ * @type: type of the rfkill struct
+ * @op: operation code
+ * @hard: hard state (0/1)
+ * @soft: soft state (0/1)
  * @hard_block_reasons: valid if hard is set. One or several reasons from
  *  rfkill_hard_block_reasons.
  *
  * Structure used for userspace communication on /dev/rfkill,
  * used for events from the kernel and control to the kernel.
+ *
+ * See the extensibility docs below.
  */
-struct rfkill_event {
+struct rfkill_event_ext {
__u32 idx;
__u8  type;
__u8  op;
__u8  soft;
__u8  hard;
+
+   /*
+* older kernels will accept/send only up to this point,
+* and if extended further up to any chunk marked below
+*/
+
__u8  hard_block_reasons;
 } __attribute__((packed));
 
-/*
- * We are planning to be backward and forward compatible with changes
- * to the event struct, by adding new, optional, members at the end.
- * When reading an event (whether the kernel from userspace or vice
- * versa) we need to accept anything that's at least as large as the
- * version 1 event size, but might be able to accept other sizes in
- * the future.
+/**
+ * DOC: Extensibility
+ *
+ * Originally, we had planned to allow backward and forward compatible
+ * changes by just adding fields at the end of the structure that are
+ * then not reported on older kernels on read(), and not written to by
+ * older kernels on write(), with the kernel reporting the size it did
+ * accept as the result.
+ *
+ * This would have allowed userspace to detect on read() and write()
+ * which kernel structure version it was dealing with, and if was just
+ * recompiled it would have gotten the new fields, but obviously not
+ * accessed them, but things should've continued to work.
+ *
+ * Unfortunately, while actually exercising this mechanism to add the
+ * hard block reasons field, we found that userspace (notably systemd)
+ * did all kinds of fun things not in line with this scheme:
+ *
+ * 1. treat the (expected) short writes as an error;
+ * 2. ask to read sizeof(struct rfkill_event) but then compare the
+ *actual return value to RFKILL_EVENT_SIZE_V1 and treat any
+ *mismatch as an error.
+ *
+ * As a consequence, just recompiling with a new struct version caused
+ * things to no longer work correctly on old and new kernels.
+ *
+ * Hence, we've rolled back  rfkill_event to the original version
+ * and 

Re: systemd-rfkill regression on 5.11 and later kernels

2021-03-18 Thread Johannes Berg
Hi Takashi,

Oh yay :-(

> we've received a bug report about rfkill change that was introduced in
> 5.11.  While the systemd-rfkill expects the same size of both read and
> write, the kernel rfkill write cuts off to the old 8 bytes while read
> gives 9 bytes, hence it leads the error:
>   https://github.com/systemd/systemd/issues/18677
>   https://bugzilla.opensuse.org/show_bug.cgi?id=1183147

> As far as I understand from the log in the commit 14486c82612a, this
> sounds like the intended behavior.

Not really? I don't even understand why we get this behaviour.

The code is this:

rfkill_fop_write():

...
/* we don't need the 'hard' variable but accept it */
if (count < RFKILL_EVENT_SIZE_V1 - 1)
return -EINVAL;

# this is actually 7 - RFKILL_EVENT_SIZE_V1 is 8
# (and obviously we get past this if and don't get -EINVAL

/*
 * Copy as much data as we can accept into our 'ev' buffer,
 * but tell userspace how much we've copied so it can determine
 * our API version even in a write() call, if it cares.
 */
count = min(count, sizeof(ev));

# sizeof(ev) should be 9 since 'ev' is the new struct

if (copy_from_user(, buf, count))
return -EFAULT;

...
ret = 0;
...
return ret ?: count;




Ah, no, I see. The bug says:

EDIT: above is with kernel-core-5.10.16-200.fc33.x86_64.

So you've recompiled systemd with 5.11 headers, but are running against
5.10 now, where the short write really was intentional - it lets you
detect that the new fields weren't handled by the kernel. If 


The other issue is basically this (pre-fix) systemd code:

l = read(c.rfkill_fd, , sizeof(event));
...
if (l != RFKILL_EVENT_SIZE_V1) /* log/return error */



So ... honestly, I don't have all that much sympathy, when the uapi
header explicitly says we want to be able to change the size. But I
guess "no regressions" rules are hard, so ... dunno. I guess we can add
a version/size ioctl and keep using 8 bytes unless you send that?

johannes



Re: [PATCH] net: wireless: search and hold bss in cfg80211_connect_done

2021-03-16 Thread Johannes Berg
On Tue, 2021-03-16 at 19:29 +, Abhishek Kumar wrote:
> If BSS instance is not provided in __cfg80211_connect_result then
> a get bss is performed. This can return NULL if the BSS for the
> given SSID is expired due to delayed scheduling of connect result event
> in rdev->event_work. This can cause WARN_ON(!cr->bss) in
> __cfg80211_connect_result to be triggered and cause cascading
> failures. To mitigate this, initiate a get bss call in
> cfg80211_connect_done itself and hold it to ensure that the BSS
> instance does not get expired.

I'm not sure I see the value in this.

You're basically picking a slightly earlier point in time where cfg80211
might know about the BSS entry still, so you're really just making the
problem window a few microseconds or perhaps milliseconds (whatever ends
up being the worker delay) shorter.

Compared to the 30s entry lifetime, that's nothing.

So what's the point? Please fix the driver instead to actually hold on
to it and report it back.

johannes



[PATCH 1/3] gcov: combine common code

2021-03-15 Thread Johannes Berg
From: Johannes Berg 

There's a lot of duplicated code between gcc and clang
implementations, move it over to fs.c to simplify the
code, there's no reason to believe that for small data
like this one would not just implement the simple
convert_to_gcda() function.

Signed-off-by: Johannes Berg 
---
 kernel/gcov/base.c|  49 +
 kernel/gcov/clang.c   | 167 +-
 kernel/gcov/fs.c  | 116 +
 kernel/gcov/gcc_4_7.c | 167 +-
 kernel/gcov/gcov.h|  14 +---
 5 files changed, 171 insertions(+), 342 deletions(-)

diff --git a/kernel/gcov/base.c b/kernel/gcov/base.c
index 0ffe9f194080..073a3738c5e6 100644
--- a/kernel/gcov/base.c
+++ b/kernel/gcov/base.c
@@ -49,6 +49,55 @@ void gcov_enable_events(void)
mutex_unlock(_lock);
 }
 
+/**
+ * store_gcov_u32 - store 32 bit number in gcov format to buffer
+ * @buffer: target buffer or NULL
+ * @off: offset into the buffer
+ * @v: value to be stored
+ *
+ * Number format defined by gcc: numbers are recorded in the 32 bit
+ * unsigned binary form of the endianness of the machine generating the
+ * file. Returns the number of bytes stored. If @buffer is %NULL, doesn't
+ * store anything.
+ */
+size_t store_gcov_u32(void *buffer, size_t off, u32 v)
+{
+   u32 *data;
+
+   if (buffer) {
+   data = buffer + off;
+   *data = v;
+   }
+
+   return sizeof(*data);
+}
+
+/**
+ * store_gcov_u64 - store 64 bit number in gcov format to buffer
+ * @buffer: target buffer or NULL
+ * @off: offset into the buffer
+ * @v: value to be stored
+ *
+ * Number format defined by gcc: numbers are recorded in the 32 bit
+ * unsigned binary form of the endianness of the machine generating the
+ * file. 64 bit numbers are stored as two 32 bit numbers, the low part
+ * first. Returns the number of bytes stored. If @buffer is %NULL, doesn't 
store
+ * anything.
+ */
+size_t store_gcov_u64(void *buffer, size_t off, u64 v)
+{
+   u32 *data;
+
+   if (buffer) {
+   data = buffer + off;
+
+   data[0] = (v & 0xUL);
+   data[1] = (v >> 32);
+   }
+
+   return sizeof(*data) * 2;
+}
+
 #ifdef CONFIG_MODULES
 /* Update list and generate events when modules are unloaded. */
 static int gcov_module_notifier(struct notifier_block *nb, unsigned long event,
diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c
index c94b820a1b62..e59c290527b2 100644
--- a/kernel/gcov/clang.c
+++ b/kernel/gcov/clang.c
@@ -48,7 +48,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include "gcov.h"
@@ -376,71 +375,6 @@ void gcov_info_free(struct gcov_info *info)
kfree(info);
 }
 
-#define ITER_STRIDEPAGE_SIZE
-
-/**
- * struct gcov_iterator - specifies current file position in logical records
- * @info: associated profiling data
- * @buffer: buffer containing file data
- * @size: size of buffer
- * @pos: current position in file
- */
-struct gcov_iterator {
-   struct gcov_info *info;
-   void *buffer;
-   size_t size;
-   loff_t pos;
-};
-
-/**
- * store_gcov_u32 - store 32 bit number in gcov format to buffer
- * @buffer: target buffer or NULL
- * @off: offset into the buffer
- * @v: value to be stored
- *
- * Number format defined by gcc: numbers are recorded in the 32 bit
- * unsigned binary form of the endianness of the machine generating the
- * file. Returns the number of bytes stored. If @buffer is %NULL, doesn't
- * store anything.
- */
-static size_t store_gcov_u32(void *buffer, size_t off, u32 v)
-{
-   u32 *data;
-
-   if (buffer) {
-   data = buffer + off;
-   *data = v;
-   }
-
-   return sizeof(*data);
-}
-
-/**
- * store_gcov_u64 - store 64 bit number in gcov format to buffer
- * @buffer: target buffer or NULL
- * @off: offset into the buffer
- * @v: value to be stored
- *
- * Number format defined by gcc: numbers are recorded in the 32 bit
- * unsigned binary form of the endianness of the machine generating the
- * file. 64 bit numbers are stored as two 32 bit numbers, the low part
- * first. Returns the number of bytes stored. If @buffer is %NULL, doesn't 
store
- * anything.
- */
-static size_t store_gcov_u64(void *buffer, size_t off, u64 v)
-{
-   u32 *data;
-
-   if (buffer) {
-   data = buffer + off;
-
-   data[0] = (v & 0xUL);
-   data[1] = (v >> 32);
-   }
-
-   return sizeof(*data) * 2;
-}
-
 /**
  * convert_to_gcda - convert profiling data set to gcda file format
  * @buffer: the buffer to store file data or %NULL if no data should be stored
@@ -448,7 +382,7 @@ static size_t store_gcov_u64(void *buffer, size_t off, u64 
v)
  *
  * Returns the number of bytes that were/would have been stored into the 
buffer.
  */
-static size_t convert_to_gcda(char *buffer, struct gcov_info *info)
+size_t convert_to_gcda(

[PATCH 2/3] gcov: simplify buffer allocation

2021-03-15 Thread Johannes Berg
From: Johannes Berg 

Use just a single vmalloc() with struct_size() instead of
a separate kmalloc() for the iter struct.

Signed-off-by: Johannes Berg 
---
 kernel/gcov/fs.c | 24 +---
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/kernel/gcov/fs.c b/kernel/gcov/fs.c
index 2d29e1d1225d..40ea81c0475b 100644
--- a/kernel/gcov/fs.c
+++ b/kernel/gcov/fs.c
@@ -97,9 +97,9 @@ __setup("gcov_persist=", gcov_persist_setup);
  */
 struct gcov_iterator {
struct gcov_info *info;
-   void *buffer;
size_t size;
loff_t pos;
+   char buffer[];
 };
 
 /**
@@ -111,25 +111,20 @@ struct gcov_iterator {
 static struct gcov_iterator *gcov_iter_new(struct gcov_info *info)
 {
struct gcov_iterator *iter;
+   size_t size;
+
+   /* Dry-run to get the actual buffer size. */
+   size = convert_to_gcda(NULL, info);
 
-   iter = kzalloc(sizeof(struct gcov_iterator), GFP_KERNEL);
+   iter = vmalloc(struct_size(iter, buffer, size));
if (!iter)
-   goto err_free;
+   return NULL;
 
iter->info = info;
-   /* Dry-run to get the actual buffer size. */
-   iter->size = convert_to_gcda(NULL, info);
-   iter->buffer = vmalloc(iter->size);
-   if (!iter->buffer)
-   goto err_free;
-
+   iter->size = size;
convert_to_gcda(iter->buffer, info);
 
return iter;
-
-err_free:
-   kfree(iter);
-   return NULL;
 }
 
 
@@ -139,8 +134,7 @@ static struct gcov_iterator *gcov_iter_new(struct gcov_info 
*info)
  */
 static void gcov_iter_free(struct gcov_iterator *iter)
 {
-   vfree(iter->buffer);
-   kfree(iter);
+   vfree(iter);
 }
 
 /**
-- 
2.30.2



[PATCH 3/3] gcov: use kvmalloc()

2021-03-15 Thread Johannes Berg
From: Johannes Berg 

Using vmalloc() in gcov is really quite wasteful, many of the
objects allocated are really small (e.g. I've seen 24 bytes.)
Use kvmalloc() to automatically pick the better of kmalloc()
or vmalloc() depending on the size.

Signed-off-by: Johannes Berg 
---
 kernel/gcov/clang.c   | 6 +++---
 kernel/gcov/fs.c  | 6 +++---
 kernel/gcov/gcc_4_7.c | 6 +++---
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/gcov/clang.c b/kernel/gcov/clang.c
index e59c290527b2..89be9ab2d909 100644
--- a/kernel/gcov/clang.c
+++ b/kernel/gcov/clang.c
@@ -49,7 +49,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include "gcov.h"
 
 typedef void (*llvm_gcov_callback)(void);
@@ -308,7 +308,7 @@ static struct gcov_fn_info *gcov_fn_info_dup(struct 
gcov_fn_info *fn)
goto err_name;
 
cv_size = fn->num_counters * sizeof(fn->counters[0]);
-   fn_dup->counters = vmalloc(cv_size);
+   fn_dup->counters = kvmalloc(cv_size, GFP_KERNEL);
if (!fn_dup->counters)
goto err_counters;
memcpy(fn_dup->counters, fn->counters, cv_size);
@@ -367,7 +367,7 @@ void gcov_info_free(struct gcov_info *info)
 
list_for_each_entry_safe(fn, tmp, >functions, head) {
kfree(fn->function_name);
-   vfree(fn->counters);
+   kvfree(fn->counters);
list_del(>head);
kfree(fn);
}
diff --git a/kernel/gcov/fs.c b/kernel/gcov/fs.c
index 40ea81c0475b..5c3086cad8f9 100644
--- a/kernel/gcov/fs.c
+++ b/kernel/gcov/fs.c
@@ -26,7 +26,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include "gcov.h"
 
 /**
@@ -116,7 +116,7 @@ static struct gcov_iterator *gcov_iter_new(struct gcov_info 
*info)
/* Dry-run to get the actual buffer size. */
size = convert_to_gcda(NULL, info);
 
-   iter = vmalloc(struct_size(iter, buffer, size));
+   iter = kvmalloc(struct_size(iter, buffer, size), GFP_KERNEL);
if (!iter)
return NULL;
 
@@ -134,7 +134,7 @@ static struct gcov_iterator *gcov_iter_new(struct gcov_info 
*info)
  */
 static void gcov_iter_free(struct gcov_iterator *iter)
 {
-   vfree(iter);
+   kvfree(iter);
 }
 
 /**
diff --git a/kernel/gcov/gcc_4_7.c b/kernel/gcov/gcc_4_7.c
index 1251f2434e90..460c12b7dfea 100644
--- a/kernel/gcov/gcc_4_7.c
+++ b/kernel/gcov/gcc_4_7.c
@@ -15,7 +15,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include "gcov.h"
 
 #if (__GNUC__ >= 10)
@@ -309,7 +309,7 @@ struct gcov_info *gcov_info_dup(struct gcov_info *info)
 
cv_size = sizeof(gcov_type) * sci_ptr->num;
 
-   dci_ptr->values = vmalloc(cv_size);
+   dci_ptr->values = kvmalloc(cv_size, GFP_KERNEL);
 
if (!dci_ptr->values)
goto err_free;
@@ -351,7 +351,7 @@ void gcov_info_free(struct gcov_info *info)
ci_ptr = info->functions[fi_idx]->ctrs;
 
for (ct_idx = 0; ct_idx < active; ct_idx++, ci_ptr++)
-   vfree(ci_ptr->values);
+   kvfree(ci_ptr->values);
 
kfree(info->functions[fi_idx]);
}
-- 
2.30.2



Re: [PATCH 0/6] um: fix up CONFIG_GCOV support

2021-03-12 Thread Johannes Berg
On Fri, 2021-03-12 at 10:55 +0100, Johannes Berg wrote:
> CONFIG_GCOV is fairly useful for ARCH=um (e.g. with kunit, though
> my main use case is a bit different) since it writes coverage data
> directly out like a normal userspace binary. Theoretically, that
> is.
> 
> Unfortunately, it's broken in multiple ways today:
> 
>  1) it doesn't like, due to 'mangle_path' in seq_file, and the only
> solution to that seems to be to rename our symbol, but that's
> not so bad, and "mangle_path" sounds very generic anyway, which
> it isn't quite
> 
>  2) gcov requires exit handlers to write out the data, and those are
> never called for modules, config CONSTRUCTORS exists for init
> handlers, so add CONFIG_MODULE_DESTRUCTORS here that we can then
> select in ARCH=um

Yeah, I wish.

None of this can really work. Thing is, __gcov_init(), called from the
constructors, will add the local data structure for the object file into
the global list (__gcov_root). So far, so good.

However, __gcov_exit(), which is called from the destructors (fini_array
which I added support for here) never gets passed the local data
structure pointer. It dumps __gcov_root, and that's it.

That basically means each executable/shared object should have its own
instance of __gcov_root and the functions. But the code in UML was set
up to export __gcov_exit(), which obviously then dumps the kernel's gcov
data.

So to make this really work we should treat modules like shared objects,
and link libgcov.a into each one of them. That might even work, but we
get

ERROR: modpost: "free" [module.ko] undefined!
ERROR: modpost: "vfprintf" [module.ko] undefined!
ERROR: modpost: "fcntl" [module.ko] undefined!
ERROR: modpost: "setbuf" [module.ko] undefined!
ERROR: modpost: "exit" [module.ko] undefined!
ERROR: modpost: "fwrite" [module.ko] undefined!
ERROR: modpost: "stderr" [module.ko] undefined!
ERROR: modpost: "fclose" [module.ko] undefined!
ERROR: modpost: "ftell" [module.ko] undefined!
ERROR: modpost: "fopen" [module.ko] undefined!
ERROR: modpost: "fread" [module.ko] undefined!
ERROR: modpost: "fdopen" [module.ko] undefined!
ERROR: modpost: "fseek" [module.ko] undefined!
ERROR: modpost: "fprintf" [module.ko] undefined!
ERROR: modpost: "strtol" [module.ko] undefined!
ERROR: modpost: "malloc" [module.ko] undefined!
ERROR: modpost: "getpid" [module.ko] undefined!
ERROR: modpost: "getenv" [module.ko] undefined!

We could of course export those, but that makes me nervous, e.g.
printf() is known to use a LOT of stack, far more than we have in the
kernel.

Also, we see:

WARNING: modpost: "__gcov_var" [module] is COMMON symbol
WARNING: modpost: "__gcov_root" [module] is COMMON symbol

which means the module cannot be loaded.

I think I'll just make CONFIG_GCOV depend on !MODULE instead, and for my
use case use CONFIG_GCOV_KERNEL.

Or maybe just kill CONFIG_GCOV entirely, since obviously nobody has ever
tried to use it with modules or with recent toolchains (gcc 9 or newer,
the mangle_path conflict).

johannes



Re: [PATCH 2/6] module: add support for CONFIG_MODULE_DESTRUCTORS

2021-03-12 Thread Johannes Berg
On Fri, 2021-03-12 at 10:55 +0100, Johannes Berg wrote:
> From: Johannes Berg 
> 
> At least in ARCH=um with CONFIG_GCOV (which writes all the
> coverage data directly out from the userspace binary rather
> than presenting it in debugfs) it's necessary to run all
> the atexit handlers (dtors/fini_array) so that gcov actually
> does write out the data.
> 
> Add a new config option CONFIG_MODULE_DESTRUCTORS that can
> be selected via CONFIG_WANT_MODULE_DESTRUCTORS that the arch
> selects (this indirection exists so the architecture doesn't
> have to worry about whether or not CONFIG_MODULES is on).
> Additionally, the architecture must then (when it exits and
> no more module code can run) call run_all_module_destructors
> to run the code for all modules that are still loaded. When
> modules are unloaded, the handlers are called as well.

Oops, I forgot to add this bit to the patch:

--- a/scripts/module.lds.S
+++ b/scripts/module.lds.S
@@ -16,6 +16,8 @@ SECTIONS {
 
.init_array 0 : ALIGN(8) { *(SORT(.init_array.*)) 
*(.init_array) }
 
+   .fini_array 0 : ALIGN(8) { *(SORT(.fini_array.*)) 
*(.fini_array) }
+
__jump_table0 : ALIGN(8) { KEEP(*(__jump_table)) }
 
__patchable_function_entries : { *(__patchable_function_entries) }


Should that be under the ifdef? .init_array isn't, even though it's only
relevant for CONFIG_CONSTRUCTORS.

johannes



[PATCH 2/6] module: add support for CONFIG_MODULE_DESTRUCTORS

2021-03-12 Thread Johannes Berg
From: Johannes Berg 

At least in ARCH=um with CONFIG_GCOV (which writes all the
coverage data directly out from the userspace binary rather
than presenting it in debugfs) it's necessary to run all
the atexit handlers (dtors/fini_array) so that gcov actually
does write out the data.

Add a new config option CONFIG_MODULE_DESTRUCTORS that can
be selected via CONFIG_WANT_MODULE_DESTRUCTORS that the arch
selects (this indirection exists so the architecture doesn't
have to worry about whether or not CONFIG_MODULES is on).
Additionally, the architecture must then (when it exits and
no more module code can run) call run_all_module_destructors
to run the code for all modules that are still loaded. When
modules are unloaded, the handlers are called as well.

Signed-off-by: Johannes Berg 
---
 include/linux/module.h | 14 ++
 init/Kconfig   | 17 +
 kernel/module.c| 39 +++
 3 files changed, 70 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 59f094fa6f74..8574d76a884d 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -517,6 +517,12 @@ struct module {
unsigned int num_ctors;
 #endif
 
+#ifdef CONFIG_MODULE_DESTRUCTORS
+   /* Destructor functions. */
+   ctor_fn_t *dtors;
+   unsigned int num_dtors;
+#endif
+
 #ifdef CONFIG_FUNCTION_ERROR_INJECTION
struct error_injection_entry *ei_funcs;
unsigned int num_ei_funcs;
@@ -853,4 +859,12 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const 
char *,
 struct module *, unsigned long),
   void *data);
 
+#ifdef CONFIG_MODULE_DESTRUCTORS
+void run_all_module_destructors(void);
+#else
+static inline void run_all_module_destructors(void)
+{
+}
+#endif
+
 #endif /* _LINUX_MODULE_H */
diff --git a/init/Kconfig b/init/Kconfig
index 22946fe5ded9..b0f0f51f9d2c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2295,6 +2295,23 @@ config UNUSED_KSYMS_WHITELIST
 
 endif # MODULES
 
+config WANT_MODULE_DESTRUCTORS
+   bool
+   help
+ Architectures may select this if they need atexit functions (such as
+ generated by the compiler for -ftest-coverage/gcov) to run in modules.
+ They're then responsible for calling run_all_module_destructors() at
+ shutdown so that module destructors are called for all still loaded
+ modules as well.
+
+ Note that CONFIG_GCOV_KERNEL does *not* require this since it keeps
+ all the coverage data in the kernel, notably CONFIG_GCOV in ARCH=um
+ requires this.
+
+config MODULE_DESTRUCTORS
+   def_bool y
+   depends on WANT_MODULE_DESTRUCTORS && MODULES
+
 config MODULES_TREE_LOOKUP
def_bool y
depends on PERF_EVENTS || TRACING
diff --git a/kernel/module.c b/kernel/module.c
index 30479355ab85..3023b5f054d4 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -904,6 +904,27 @@ EXPORT_SYMBOL(module_refcount);
 /* This exists whether we can unload or not */
 static void free_module(struct module *mod);
 
+#ifdef CONFIG_MODULE_DESTRUCTORS
+static void do_mod_dtors(struct module *mod)
+{
+   unsigned long i;
+
+   for (i = 0; i < mod->num_dtors; i++)
+   mod->dtors[i]();
+}
+
+void run_all_module_destructors(void)
+{
+   struct module *mod;
+
+   /* we no longer need to care about locking at this point */
+   list_for_each_entry(mod, , list)
+   do_mod_dtors(mod);
+}
+#else
+static inline void do_mod_dtors(struct module *mod) {}
+#endif
+
 SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
unsigned int, flags)
 {
@@ -966,6 +987,7 @@ SYSCALL_DEFINE2(delete_module, const char __user *, 
name_user,
 MODULE_STATE_GOING, mod);
klp_module_going(mod);
ftrace_release_mod(mod);
+   do_mod_dtors(mod);
 
async_synchronize_full();
 
@@ -3263,6 +3285,23 @@ static int find_module_sections(struct module *mod, 
struct load_info *info)
}
 #endif
 
+#ifdef CONFIG_MODULE_DESTRUCTORS
+   mod->dtors = section_objs(info, ".dtors",
+ sizeof(*mod->dtors), >num_dtors);
+   if (!mod->dtors)
+   mod->dtors = section_objs(info, ".fini_array",
+   sizeof(*mod->dtors), >num_dtors);
+   else if (find_sec(info, ".fini_array")) {
+   /*
+* This shouldn't happen with same compiler and binutils
+* building all parts of the module.
+*/
+   pr_warn("%s: has both .dtors and .fini_array.\n",
+  mod->name);
+   return -EINVAL;
+   }
+#endif
+
mod->noinstr_text_start = section_objs(info, ".noinstr.text", 1,
>noinstr_text_size);
 
-- 
2.29.2



[PATCH 3/6] .gitignore: also ignore gcda files

2021-03-12 Thread Johannes Berg
From: Johannes Berg 

We already ignore gcno files that are created by the compiler
at build time for -ftest-coverage. However, with ARCH=um it's
possible to select CONFIG_GCOV which actually has the kernel
binary write out gcda files (rather than have them in debugfs
like CONFIG_GCOV_KERNEL does), so an in-tree build can create
them. Ignore them so the tree doesn't look dirty for that.

Signed-off-by: Johannes Berg 
---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index 3af66272d6f1..91e46190d418 100644
--- a/.gitignore
+++ b/.gitignore
@@ -23,6 +23,7 @@
 *.dwo
 *.elf
 *.gcno
+*.gcda
 *.gz
 *.i
 *.ko
-- 
2.29.2



[PATCH 4/6] um: split up CONFIG_GCOV

2021-03-12 Thread Johannes Berg
From: Johannes Berg 

It's not always desirable to collect coverage data for the
entire kernel, so split off CONFIG_GCOV_BASE. This option
only enables linking with coverage options, and compiles a
single file (reboot.c) with them as well to force gcov to
be linked into the kernel binary. That way, modules also
work.

To use this new option properly, one needs to manually add
'-fprofile-arcs -ftest-coverage' to the compiler options
of some object(s) or subdir(s) to collect coverage data at
the desired places.

Signed-off-by: Johannes Berg 
---
 arch/um/Kconfig.debug   | 38 ++
 arch/um/Makefile-skas   |  2 +-
 arch/um/kernel/Makefile | 11 ++-
 3 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/arch/um/Kconfig.debug b/arch/um/Kconfig.debug
index 315d368e63ad..ca040b4e86e5 100644
--- a/arch/um/Kconfig.debug
+++ b/arch/um/Kconfig.debug
@@ -13,19 +13,41 @@ config GPROF
  If you're involved in UML kernel development and want to use gprof,
  say Y.  If you're unsure, say N.
 
-config GCOV
-   bool "Enable gcov support"
+config GCOV_BASE
+   bool "Enable gcov support (selectively)"
depends on DEBUG_INFO
-   depends on !KCOV
+   depends on !KCOV && !GCOV_KERNEL
help
  This option allows developers to retrieve coverage data from a UML
- session.
+ session, stored to disk just like with a regular userspace binary,
+ use the same tools (gcov, lcov, ...) to collect and process the
+ data.
 
- See <http://user-mode-linux.sourceforge.net/old/gprof.html> for more
- details.
+ See also KCOV and GCOV_KERNEL as alternatives.
 
- If you're involved in UML kernel development and want to use gcov,
- say Y.  If you're unsure, say N.
+ This option (almost) only links with the needed support code, but
+ doesn't enable coverage data collection for any code (other than a
+ dummy file to get everything linked properly). See also the GCOV
+ option which enables coverage collection for the entire kernel and
+ all modules.
+
+ If you're using UML to test something and want to manually instruct
+ the compiler to instrument only parts of the code by adding the
+ relevant options for the objects you care about, say Y and do that
+ to get coverage collection only for the parts you need.
+
+ If you're unsure, say N.
+
+config GCOV
+   bool "Enable gcov support (whole kernel)"
+   depends on DEBUG_INFO
+   depends on !KCOV && !GCOV_KERNEL
+   select GCOV_BASE
+   help
+ This enables coverage data collection for the entire kernel and
+ all modules, see the GCOV_BASE option for more information.
+
+ If you're unsure, say N.
 
 config EARLY_PRINTK
bool "Early printk"
diff --git a/arch/um/Makefile-skas b/arch/um/Makefile-skas
index ac35de5316a6..b5be5f55ac11 100644
--- a/arch/um/Makefile-skas
+++ b/arch/um/Makefile-skas
@@ -8,5 +8,5 @@ GCOV_OPT += -fprofile-arcs -ftest-coverage
 
 CFLAGS-$(CONFIG_GCOV) += $(GCOV_OPT)
 CFLAGS-$(CONFIG_GPROF) += $(GPROF_OPT)
-LINK-$(CONFIG_GCOV) += $(GCOV_OPT)
+LINK-$(CONFIG_GCOV_BASE) += $(GCOV_OPT)
 LINK-$(CONFIG_GPROF) += $(GPROF_OPT)
diff --git a/arch/um/kernel/Makefile b/arch/um/kernel/Makefile
index c1205f9ec17e..0403e329f931 100644
--- a/arch/um/kernel/Makefile
+++ b/arch/um/kernel/Makefile
@@ -21,7 +21,7 @@ obj-y = config.o exec.o exitcode.o irq.o ksyms.o mem.o \
 
 obj-$(CONFIG_BLK_DEV_INITRD) += initrd.o
 obj-$(CONFIG_GPROF)+= gprof_syms.o
-obj-$(CONFIG_GCOV) += gmon_syms.o
+obj-$(CONFIG_GCOV_BASE)+= gmon_syms.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-$(CONFIG_GENERIC_PCI_IOMAP) += ioport.o
@@ -32,6 +32,15 @@ include arch/um/scripts/Makefile.rules
 
 targets := config.c config.tmp
 
+# This just causes _something_ to be compiled with coverage
+# collection so that gcov is linked into the binary, in case
+# the only thing that has it enabled is a module, when only
+# CONFIG_GCOV_BASE is selected. Yes, we thus always get some
+# coverage data for this file, but it's not hit often ...
+ifeq ($(CONFIG_GCOV_BASE),y)
+CFLAGS_reboot.o += -fprofile-arcs -ftest-coverage
+endif
+
 # Be careful with the below Sed code - sed is pitfall-rich!
 # We use sed to lower build requirements, for "embedded" builders for instance.
 
-- 
2.29.2



[PATCH 6/6] um: fix CONFIG_GCOV for modules

2021-03-12 Thread Johannes Berg
From: Johannes Berg 

At least with current toolchain versions, gcov (as enabled
by CONFIG_GCOV) requires init and exit handlers to run. For
modules, this wasn't done properly, so use the new support
for CONFIG_MODULE_DESTRUCTORS as well as CONFIG_CONSTRUCTORS
to have gcov init and exit called appropriately.

Signed-off-by: Johannes Berg 
---
 arch/um/Kconfig.debug| 2 ++
 arch/um/kernel/process.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/um/Kconfig.debug b/arch/um/Kconfig.debug
index ca040b4e86e5..74b27b11cb44 100644
--- a/arch/um/Kconfig.debug
+++ b/arch/um/Kconfig.debug
@@ -17,6 +17,8 @@ config GCOV_BASE
bool "Enable gcov support (selectively)"
depends on DEBUG_INFO
depends on !KCOV && !GCOV_KERNEL
+   select CONSTRUCTORS
+   select WANT_MODULE_DESTRUCTORS
help
  This option allows developers to retrieve coverage data from a UML
  session, stored to disk just like with a regular userspace binary,
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index c5011064b5dd..33f895a95ff8 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -240,6 +240,8 @@ void do_uml_exitcalls(void)
call = &__uml_exitcall_end;
while (--call >= &__uml_exitcall_begin)
(*call)();
+
+   run_all_module_destructors();
 }
 
 char *uml_strdup(const char *string)
-- 
2.29.2



[PATCH 1/6] seq_file: rename mangle_path to seq_mangle_path

2021-03-12 Thread Johannes Berg
From: Johannes Berg 

The symbol mangle_path conflicts with a gcov symbol which
can break the build of ARCH=um with gcov, and it's also
not very specific and descriptive.

Rename mangle_path() to seq_mangle_path(), and also remove
the export since it's not needed or used by any modules.

Signed-off-by: Johannes Berg 
---
 fs/seq_file.c| 11 +--
 include/linux/seq_file.h |  2 +-
 lib/seq_buf.c|  2 +-
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index cb11a34fb871..dfa1982a87ca 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -413,7 +413,7 @@ void seq_printf(struct seq_file *m, const char *f, ...)
 EXPORT_SYMBOL(seq_printf);
 
 /**
- * mangle_path -   mangle and copy path to buffer beginning
+ * seq_mangle_path -   mangle and copy path to buffer beginning
  * @s: buffer start
  * @p: beginning of path in above buffer
  * @esc: set of characters that need escaping
@@ -423,7 +423,7 @@ EXPORT_SYMBOL(seq_printf);
  *  Returns pointer past last written character in @s, or NULL in case of
  *  failure.
  */
-char *mangle_path(char *s, const char *p, const char *esc)
+char *seq_mangle_path(char *s, const char *p, const char *esc)
 {
while (s <= p) {
char c = *p++;
@@ -442,7 +442,6 @@ char *mangle_path(char *s, const char *p, const char *esc)
}
return NULL;
 }
-EXPORT_SYMBOL(mangle_path);
 
 /**
  * seq_path - seq_file interface to print a pathname
@@ -462,7 +461,7 @@ int seq_path(struct seq_file *m, const struct path *path, 
const char *esc)
if (size) {
char *p = d_path(path, buf, size);
if (!IS_ERR(p)) {
-   char *end = mangle_path(buf, p, esc);
+   char *end = seq_mangle_path(buf, p, esc);
if (end)
res = end - buf;
}
@@ -505,7 +504,7 @@ int seq_path_root(struct seq_file *m, const struct path 
*path,
return SEQ_SKIP;
res = PTR_ERR(p);
if (!IS_ERR(p)) {
-   char *end = mangle_path(buf, p, esc);
+   char *end = seq_mangle_path(buf, p, esc);
if (end)
res = end - buf;
else
@@ -529,7 +528,7 @@ int seq_dentry(struct seq_file *m, struct dentry *dentry, 
const char *esc)
if (size) {
char *p = dentry_path(dentry, buf, size);
if (!IS_ERR(p)) {
-   char *end = mangle_path(buf, p, esc);
+   char *end = seq_mangle_path(buf, p, esc);
if (end)
res = end - buf;
}
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index b83b3ae3c877..0a7dda239e56 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -104,7 +104,7 @@ static inline void seq_setwidth(struct seq_file *m, size_t 
size)
 }
 void seq_pad(struct seq_file *m, char c);
 
-char *mangle_path(char *s, const char *p, const char *esc);
+char *seq_mangle_path(char *s, const char *p, const char *esc);
 int seq_open(struct file *, const struct seq_operations *);
 ssize_t seq_read(struct file *, char __user *, size_t, loff_t *);
 ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter);
diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 707453f5d58e..90b50a514edb 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -274,7 +274,7 @@ int seq_buf_path(struct seq_buf *s, const struct path 
*path, const char *esc)
if (size) {
char *p = d_path(path, buf, size);
if (!IS_ERR(p)) {
-   char *end = mangle_path(buf, p, esc);
+   char *end = seq_mangle_path(buf, p, esc);
if (end)
res = end - buf;
}
-- 
2.29.2



[PATCH 0/6] um: fix up CONFIG_GCOV support

2021-03-12 Thread Johannes Berg
CONFIG_GCOV is fairly useful for ARCH=um (e.g. with kunit, though
my main use case is a bit different) since it writes coverage data
directly out like a normal userspace binary. Theoretically, that
is.

Unfortunately, it's broken in multiple ways today:

 1) it doesn't like, due to 'mangle_path' in seq_file, and the only
solution to that seems to be to rename our symbol, but that's
not so bad, and "mangle_path" sounds very generic anyway, which
it isn't quite

 2) gcov requires exit handlers to write out the data, and those are
never called for modules, config CONSTRUCTORS exists for init
handlers, so add CONFIG_MODULE_DESTRUCTORS here that we can then
select in ARCH=um

 3) As mentioned above, gcov requires init/exit handlers, but they
aren't linked into binary properly, that's easy to fix.

 4) gcda files are then written, so .gitignore them

 5) it's not always useful to create coverage data for the *entire*
kernel, so I've split off CONFIG_GCOV_BASE from CONFIG_GCOV to
allow option in only in some places, which of course requires
adding the necessary "subdir-cflags" or "CFLAGS_obj" changes in
the places where it's desired, as local patches.


None of these changes (hopefully) seem too controversional, biggest
are the module changes but obviously they compile to nothing if the
architecture doesn't WANT_MODULE_DESTRUCTORS.

Any thoughts on how to merge this? The seq_file/.gitignore changes
are independent at least code-wise, though of course it only works
with the seq_file changes (.gitignore doesn't matter, of course),
while the module changes are a requirement for the later ARCH=um
patches since the Kconfig symbol has to exist.

Perhaps I can just get ACKs on all the patches and then they can go
through the UML tree?

Thanks,
johannes




[PATCH 5/6] um: fix CONFIG_GCOV for built-in code

2021-03-12 Thread Johannes Berg
From: Johannes Berg 

With contemporary toolchains, CONFIG_GCOV doesn't work because
gcov now relies on both init and exit handlers, but those are
discarded from the binary. Fix the linker scripts to keep them
instead, so that CONFIG_GCOV can work again.

Note that this does not make it work in modules yet, since we
don't call their exit handlers.

Signed-off-by: Johannes Berg 
---
 arch/um/include/asm/common.lds.S | 2 ++
 arch/um/kernel/vmlinux.lds.S | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/um/include/asm/common.lds.S b/arch/um/include/asm/common.lds.S
index eca6c452a41b..1223dcaaf7e3 100644
--- a/arch/um/include/asm/common.lds.S
+++ b/arch/um/include/asm/common.lds.S
@@ -84,11 +84,13 @@
   .init_array : {
__init_array_start = .;
*(.init_array)
+   *(.init_array.*)
__init_array_end = .;
   }
   .fini_array : {
__fini_array_start = .;
*(.fini_array)
+   *(.fini_array.*)
__fini_array_end = .;
   }
 
diff --git a/arch/um/kernel/vmlinux.lds.S b/arch/um/kernel/vmlinux.lds.S
index 16e49bfa2b42..2245ae4907d2 100644
--- a/arch/um/kernel/vmlinux.lds.S
+++ b/arch/um/kernel/vmlinux.lds.S
@@ -1,6 +1,8 @@
 
 KERNEL_STACK_SIZE = 4096 * (1 << CONFIG_KERNEL_STACK_ORDER);
 
+#define RUNTIME_DISCARD_EXIT
+
 #ifdef CONFIG_LD_SCRIPT_STATIC
 #include "uml.lds.S"
 #else
-- 
2.29.2



Re: [PATCH] uml: remove unneeded variable 'ret'

2021-03-10 Thread Johannes Berg
On Wed, 2021-03-10 at 16:49 +0800, Yang Li wrote:
> Fix the following coccicheck warning:
> ./arch/um/drivers/hostaudio_kern.c:125:10-14: Unneeded variable: "mask".
> Return "0" on line 131

Word of caution to you:

You've already managed to be in various people's block list due to
sending patches such as

https://lore.kernel.org/linuxppc-dev/1614243417-48556-1-git-send-email-yang@linux.alibaba.com/


You might want to stop for a while, and *really* carefully look at each
and every patch you send out.


In this case, you messed up the subject.

johannes



Re: BUG: soft lockup in ieee80211_tasklet_handler

2021-03-04 Thread Johannes Berg
On Tue, 2021-03-02 at 20:01 +0100, Dmitry Vyukov wrote:
> 
> Looking at the reproducer that mostly contains just perf_event_open,
> It may be the old known issue of perf_event_open with some extreme
> parameters bringing down kernel.
> +perf maintainers
> And as far as I remember +Peter had some patch to restrict
> perf_event_open parameters.
> 
> r0 = perf_event_open(&(0x7f01d000)={0x1, 0x70, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x3ff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xfffe, 0x0, @perf_config_ext}, 0x0,
> 0x0, 0x, 0x0)

Oh! Thanks for looking.

Seems that also applies to

https://syzkaller.appspot.com/bug?extid=d6219cf21f26bdfcc22e

FWIW. I was still tracking that one, but never had a chance to look at
it (also way down the list since it was reported as directly in hwsim)

johannes



Re: BUG: soft lockup in ieee80211_tasklet_handler

2021-03-02 Thread Johannes Berg
On Wed, 2021-02-24 at 10:30 +0800, Hillf Danton wrote:
> 
> Add budget for the 80211 softint handler - it's feasible not to try to
> build the giant pyramid in a week.
> 
> --- x/net/mac80211/main.c
> +++ y/net/mac80211/main.c
> @@ -224,9 +224,15 @@ static void ieee80211_tasklet_handler(un
>  {
>   struct ieee80211_local *local = (struct ieee80211_local *) data;
>   struct sk_buff *skb;
> + int i = 0;
> +
> + while (i++ < 64) {
> + skb = skb_dequeue(>skb_queue);
> + if (!skb)
> + skb = skb_dequeue(>skb_queue_unreliable);
> + if (!skb)
> + return;

I guess that's not such a bad idea, but I do wonder how we get here,
userspace can submit packets faster than we can process?

It feels like a simulation-only case, tbh, since over the air you have
limits how much bandwidth you can get ... unless you have a very slow
CPU?

In any case, if you want anything merged you're going to have to submit
a proper patch with a real commit message and Signed-off-by, etc.

johannes



Re: Lockdep warning in iwl_pcie_rx_handle()

2021-03-01 Thread Johannes Berg
Hi Jiri,

> I am getting the splat below with Linus' tree as of today (5.11-rc1, 
> fe07bfda2fb). I haven't started to look into the code yet, but apparently 
> this has been already reported by Heiner here:
> 
>   https://www.spinics.net/lists/linux-wireless/msg208353.html
> 
> so before I start digging deep into it (the previous kernel this 
> particular machine had is 5.9, so I'd rather avoid lenghty bisect for now 
> in case someone has already looked into it and has ideas where the problem 
> is), I thought I'd ask whether this has been root-caused elsewhere 
> already.

Yeah, I'm pretty sure we have a fix for this, though I'm not sure right
now where it is in the pipeline.

It's called "iwlwifi: pcie: don't add NAPI under rxq->lock" but right
now I can't find it in any of the public archives.

johannes



[PATCH v2 5/8] um: time-travel/signals: fix ndelay() in interrupt

2021-03-01 Thread Johannes Berg
From: Johannes Berg 

We should be able to ndelay() from any context, even from an
interrupt context! However, this is broken (not functionally,
but locking-wise) in time-travel because we'll get into the
time-travel code and enable interrupts to handle messages on
other time-travel aware subsystems (only virtio for now).

Luckily, I've already reworked the time-travel aware signal
(interrupt) delivery for suspend/resume to have a time travel
handler, which runs directly in the context of the signal and
not from the Linux interrupt.

In order to fix this time-travel issue then, we need to do a
few things:

 1) rework the signal handling code to not block SIGIO (if
time-travel is enabled, just to simplify the other case)
but rather let it bubble through the system, all the way
*past* the IRQ's timetravel_handler, stopping it after
that and before real interrupt delivery if IRQs are not
actually enabled;
 2) rework time-travel to not enable interrupts while it's
waiting for a message;
 3) rework time-travel to not (just) disable interrupts but
rather block signals at a lower level while it needs them
disabled for communicating with the controller.

Finally, since now we can actually spend even virtual time
in interrupts-disabled sections, the delay warning when we
deliver a time-travel delayed interrupt is no longer valid,
things can (and should) now get delayed.

Signed-off-by: Johannes Berg 
---
v2:
 - check for signals_enabled as well as irqs_suspended
   to see if only time-travel handlers should be called
 - fix race in unblock
---
 arch/um/include/shared/irq_user.h |  1 +
 arch/um/include/shared/os.h   |  3 +++
 arch/um/kernel/irq.c  | 42 -
 arch/um/kernel/time.c | 35 +---
 arch/um/os-Linux/signal.c | 45 ---
 5 files changed, 93 insertions(+), 33 deletions(-)

diff --git a/arch/um/include/shared/irq_user.h 
b/arch/um/include/shared/irq_user.h
index 07239e801a5b..065829f443ae 100644
--- a/arch/um/include/shared/irq_user.h
+++ b/arch/um/include/shared/irq_user.h
@@ -17,6 +17,7 @@ enum um_irq_type {
 
 struct siginfo;
 extern void sigio_handler(int sig, struct siginfo *unused_si, struct 
uml_pt_regs *regs);
+void sigio_run_timetravel_handlers(void);
 extern void free_irq_by_fd(int fd);
 extern void deactivate_fd(int fd, int irqnum);
 extern int deactivate_all_fds(void);
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index f9fbbddc38bb..453b13369533 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -242,6 +242,9 @@ extern int set_signals_trace(int enable);
 extern int os_is_signal_stack(void);
 extern void deliver_alarm(void);
 extern void register_pm_wake_signal(void);
+extern void block_signals_hard(void);
+extern void unblock_signals_hard(void);
+extern void mark_sigio_pending(void);
 
 /* util.c */
 extern void stack_protections(unsigned long address);
diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index 82af5191e73d..ccf5e4d27202 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -123,7 +123,8 @@ static bool irq_do_timetravel_handler(struct irq_entry 
*entry,
 #endif
 
 static void sigio_reg_handler(int idx, struct irq_entry *entry, enum 
um_irq_type t,
- struct uml_pt_regs *regs)
+ struct uml_pt_regs *regs,
+ bool timetravel_handlers_only)
 {
struct irq_reg *reg = >reg[t];
 
@@ -136,18 +137,32 @@ static void sigio_reg_handler(int idx, struct irq_entry 
*entry, enum um_irq_type
if (irq_do_timetravel_handler(entry, t))
return;
 
-   if (irqs_suspended)
+#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
+   /*
+* We can only get here with signals disabled if we have time-travel
+* support, otherwise we cannot have the hard handlers that may need
+* to run even in interrupts-disabled sections and therefore sigio is
+* blocked as well when interrupts are disabled.
+*/
+   if (!signals_enabled) {
+   mark_sigio_pending();
+   return;
+   }
+#endif
+
+   if (timetravel_handlers_only)
return;
 
irq_io_loop(reg, regs);
 }
 
-void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs 
*regs)
+static void _sigio_handler(struct uml_pt_regs *regs,
+  bool timetravel_handlers_only)
 {
struct irq_entry *irq_entry;
int n, i;
 
-   if (irqs_suspended && !um_irq_timetravel_handler_used())
+   if (timetravel_handlers_only && !um_irq_timetravel_handler_used())
return;
 
while (1) {
@@ -172,14 +187,24 @@ void sigio_handler(int sig, struct siginfo *unused_si, 
struct uml_pt_regs *regs)
irq_entry = os_epoll_get_data_pointer(i);
 
for (t = 0; t

[PATCH v2 6/8] um: irqs: allow invoking time-travel handler multiple times

2021-03-01 Thread Johannes Berg
From: Johannes Berg 

If we happen to get multiple messages while IRQS are already
suspended, we still need to handle them, since otherwise the
simulation blocks.

Remove the "prevent nesting" part, time_travel_add_irq_event()
will deal with being called multiple times just fine.

Signed-off-by: Johannes Berg 
---
 arch/um/kernel/irq.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index ccf5e4d27202..2ee0a368aa59 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -101,10 +101,12 @@ static bool irq_do_timetravel_handler(struct irq_entry 
*entry,
if (!reg->timetravel_handler)
return false;
 
-   /* prevent nesting - we'll get it again later when we SIGIO ourselves */
-   if (reg->pending_on_resume)
-   return true;
-
+   /*
+* Handle all messages - we might get multiple even while
+* interrupts are already suspended, due to suspend order
+* etc. Note that time_travel_add_irq_event() will not add
+* an event twice, if it's pending already "first wins".
+*/
reg->timetravel_handler(reg->irq, entry->fd, reg->id, >event);
 
if (!reg->event.pending)
-- 
2.26.2



[PATCH v2 7/8] um: add PCI over virtio emulation driver

2021-03-01 Thread Johannes Berg
From: Johannes Berg 

To support testing of PCI/PCIe drivers in UML, add a PCI bus
support driver. This driver uses virtio, which in UML is really
just vhost-user, to talk to devices, and adds the devices to
the virtual PCI bus in the system.

Since virtio already allows DMA/bus mastering this really isn't
all that hard, of course we need the logic_iomem infrastructure
that was added by a previous patch.

The protocol to talk to the device is has a few fairly simple
messages for reading to/writing from config and IO spaces, and
messages for the device to send the various interrupts (INT#,
MSI/MSI-X and while suspended PME#).

Note that currently no offical virtio device ID is assigned for
this protocol, as a consequence this patch requires defining it
in the Kconfig, with a default that makes the driver refuse to
work at all.

Finally, in order to add support for MSI/MSI-X interrupts, some
small changes are needed in the UML IRQ code, it needs to have
more interrupts, changing NR_IRQS from 64 to 128 if this driver
is enabled, but not actually use them for anything so that the
generic IRQ domain/MSI infrastructure can allocate IRQ numbers.

Signed-off-by: Johannes Berg 
---
v2:
 - fix memory leak
---
 arch/um/Kconfig|  13 +-
 arch/um/drivers/Kconfig|  20 +
 arch/um/drivers/Makefile   |   1 +
 arch/um/drivers/virt-pci.c | 885 +
 arch/um/include/asm/Kbuild |   1 -
 arch/um/include/asm/io.h   |   7 +
 arch/um/include/asm/irq.h  |   8 +-
 arch/um/include/asm/msi.h  |   1 +
 arch/um/include/asm/pci.h  |  39 ++
 arch/um/kernel/Makefile|   1 +
 arch/um/kernel/ioport.c|  13 +
 arch/um/kernel/irq.c   |   7 +-
 include/uapi/linux/virtio_pcidev.h |  64 +++
 13 files changed, 1054 insertions(+), 6 deletions(-)
 create mode 100644 arch/um/drivers/virt-pci.c
 create mode 100644 arch/um/include/asm/msi.h
 create mode 100644 arch/um/include/asm/pci.h
 create mode 100644 arch/um/kernel/ioport.c
 create mode 100644 include/uapi/linux/virtio_pcidev.h

diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index 20b0640e01b8..f64d774706e5 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -14,7 +14,7 @@ config UML
select HAVE_FUTEX_CMPXCHG if FUTEX
select HAVE_DEBUG_KMEMLEAK
select HAVE_DEBUG_BUGVERBOSE
-   select NO_DMA
+   select NO_DMA if !UML_DMA_EMULATION
select GENERIC_IRQ_SHOW
select GENERIC_CPU_DEVICES
select HAVE_GCC_PLUGINS
@@ -25,10 +25,21 @@ config MMU
bool
default y
 
+config UML_DMA_EMULATION
+   bool
+
 config NO_IOMEM
bool "disable IOMEM" if EXPERT
+   depends on !INDIRECT_IOMEM
default y
 
+config UML_IOMEM_EMULATION
+   bool
+   select INDIRECT_IOMEM
+   select GENERIC_PCI_IOMAP
+   select GENERIC_IOMAP
+   select NO_GENERIC_PCI_IOPORT_MAP
+
 config NO_IOPORT_MAP
def_bool y
 
diff --git a/arch/um/drivers/Kconfig b/arch/um/drivers/Kconfig
index 03ba34b61115..f145842c40b9 100644
--- a/arch/um/drivers/Kconfig
+++ b/arch/um/drivers/Kconfig
@@ -357,3 +357,23 @@ config UML_RTC
  rtcwake, especially in time-travel mode. This driver enables that
  by providing a fake RTC clock that causes a wakeup at the right
  time.
+
+config UML_PCI_OVER_VIRTIO
+   bool "Enable PCI over VIRTIO device simulation"
+   # in theory, just VIRTIO is enough, but that causes recursion
+   depends on VIRTIO_UML
+   select FORCE_PCI
+   select UML_IOMEM_EMULATION
+   select UML_DMA_EMULATION
+   select PCI_MSI
+   select PCI_MSI_IRQ_DOMAIN
+   select PCI_LOCKLESS_CONFIG
+
+config UML_PCI_OVER_VIRTIO_DEVICE_ID
+   int "set the virtio device ID for PCI emulation"
+   default -1
+   depends on UML_PCI_OVER_VIRTIO
+   help
+ There's no official device ID assigned (yet), set the one you
+ wish to use for experimentation here. The default of -1 is
+ not valid and will cause the driver to fail at probe.
diff --git a/arch/um/drivers/Makefile b/arch/um/drivers/Makefile
index dcc64a02f81f..803666e85414 100644
--- a/arch/um/drivers/Makefile
+++ b/arch/um/drivers/Makefile
@@ -64,6 +64,7 @@ obj-$(CONFIG_BLK_DEV_COW_COMMON) += cow_user.o
 obj-$(CONFIG_UML_RANDOM) += random.o
 obj-$(CONFIG_VIRTIO_UML) += virtio_uml.o
 obj-$(CONFIG_UML_RTC) += rtc.o
+obj-$(CONFIG_UML_PCI_OVER_VIRTIO) += virt-pci.o
 
 # pcap_user.o must be added explicitly.
 USER_OBJS := fd.o null.o pty.o tty.o xterm.o slip_common.o pcap_user.o 
vde_user.o vector_user.o
diff --git a/arch/um/drivers/virt-pci.c b/arch/um/drivers/virt-pci.c
new file mode 100644
index ..dd85f36197aa
--- /dev/null
+++ b/arch/um/drivers/virt-pci.c
@@ -0,0 +1,885 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 Intel Corporation
+ * Author: Johannes Berg 
+ */
+#include 
+#include 
+

[PATCH v2 4/8] um: export signals_enabled directly

2021-03-01 Thread Johannes Berg
From: Johannes Berg 

Use signals_enabled instead of always jumping through
a function call to read it, there's not much point in
that.

Signed-off-by: Johannes Berg 
---
 arch/um/include/asm/irqflags.h   | 10 +-
 arch/um/include/shared/longjmp.h | 14 +++---
 arch/um/include/shared/os.h  |  1 -
 arch/um/kernel/ksyms.c   |  2 +-
 arch/um/os-Linux/signal.c|  7 +--
 5 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/arch/um/include/asm/irqflags.h b/arch/um/include/asm/irqflags.h
index 0642ad9035d1..dab5744e9253 100644
--- a/arch/um/include/asm/irqflags.h
+++ b/arch/um/include/asm/irqflags.h
@@ -2,15 +2,15 @@
 #ifndef __UM_IRQFLAGS_H
 #define __UM_IRQFLAGS_H
 
-extern int get_signals(void);
-extern int set_signals(int enable);
-extern void block_signals(void);
-extern void unblock_signals(void);
+extern int signals_enabled;
+int set_signals(int enable);
+void block_signals(void);
+void unblock_signals(void);
 
 #define arch_local_save_flags arch_local_save_flags
 static inline unsigned long arch_local_save_flags(void)
 {
-   return get_signals();
+   return signals_enabled;
 }
 
 #define arch_local_irq_restore arch_local_irq_restore
diff --git a/arch/um/include/shared/longjmp.h b/arch/um/include/shared/longjmp.h
index 85a1cc290ecb..bdb2869b72b3 100644
--- a/arch/um/include/shared/longjmp.h
+++ b/arch/um/include/shared/longjmp.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 
+extern int signals_enabled;
 extern int setjmp(jmp_buf);
 extern void longjmp(jmp_buf, int);
 
@@ -12,13 +13,12 @@ extern void longjmp(jmp_buf, int);
longjmp(*buf, val); \
 } while(0)
 
-#define UML_SETJMP(buf) ({ \
-   int n; \
-   volatile int enable;\
-   enable = get_signals(); \
-   n = setjmp(*buf); \
-   if(n != 0) \
-   set_signals_trace(enable); \
+#define UML_SETJMP(buf) ({ \
+   int n, enable;  \
+   enable = *(volatile int *)_enabled; \
+   n = setjmp(*buf);   \
+   if(n != 0)  \
+   set_signals_trace(enable);  \
n; })
 
 #endif
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 13d86f94cf0f..f9fbbddc38bb 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -237,7 +237,6 @@ extern void send_sigio_to_self(void);
 extern int change_sig(int signal, int on);
 extern void block_signals(void);
 extern void unblock_signals(void);
-extern int get_signals(void);
 extern int set_signals(int enable);
 extern int set_signals_trace(int enable);
 extern int os_is_signal_stack(void);
diff --git a/arch/um/kernel/ksyms.c b/arch/um/kernel/ksyms.c
index 8ade54a86a7e..b1e5634398d0 100644
--- a/arch/um/kernel/ksyms.c
+++ b/arch/um/kernel/ksyms.c
@@ -7,7 +7,7 @@
 #include 
 
 EXPORT_SYMBOL(set_signals);
-EXPORT_SYMBOL(get_signals);
+EXPORT_SYMBOL(signals_enabled);
 
 EXPORT_SYMBOL(os_stat_fd);
 EXPORT_SYMBOL(os_stat_file);
diff --git a/arch/um/os-Linux/signal.c b/arch/um/os-Linux/signal.c
index 96f511d1aabe..8c9d162e6c51 100644
--- a/arch/um/os-Linux/signal.c
+++ b/arch/um/os-Linux/signal.c
@@ -62,7 +62,7 @@ static void sig_handler_common(int sig, struct siginfo *si, 
mcontext_t *mc)
 #define SIGALRM_BIT 1
 #define SIGALRM_MASK (1 << SIGALRM_BIT)
 
-static int signals_enabled;
+int signals_enabled;
 static unsigned int signals_pending;
 static unsigned int signals_active = 0;
 
@@ -334,11 +334,6 @@ void unblock_signals(void)
}
 }
 
-int get_signals(void)
-{
-   return signals_enabled;
-}
-
 int set_signals(int enable)
 {
int ret;
-- 
2.26.2



[PATCH v2 2/8] lib: add iomem emulation (logic_iomem)

2021-03-01 Thread Johannes Berg
From: Johannes Berg 

Add IO memory emulation that uses callbacks for read/write to
the allocated regions. The callbacks can be registered by the
users using logic_iomem_alloc().

To use, an architecture must 'select LOGIC_IOMEM' in Kconfig
and then include  into asm/io.h to get
the __raw_read*/__raw_write* functions.

Optionally, an architecture may 'select LOGIC_IOMEM_FALLBACK'
in which case non-emulated regions will 'fall back' to the
various real_* functions that must then be provided.

Cc: Arnd Bergmann 
Signed-off-by: Johannes Berg 
---
 include/asm-generic/logic_io.h |  78 
 include/linux/logic_iomem.h|  62 +++
 lib/Kconfig|  14 ++
 lib/Makefile   |   2 +
 lib/logic_iomem.c  | 318 +
 5 files changed, 474 insertions(+)
 create mode 100644 include/asm-generic/logic_io.h
 create mode 100644 include/linux/logic_iomem.h
 create mode 100644 lib/logic_iomem.c

diff --git a/include/asm-generic/logic_io.h b/include/asm-generic/logic_io.h
new file mode 100644
index ..a53116b8c57e
--- /dev/null
+++ b/include/asm-generic/logic_io.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2021 Intel Corporation
+ * Author: johan...@sipsolutions.net
+ */
+#ifndef _LOGIC_IO_H
+#define _LOGIC_IO_H
+#include 
+
+/* include this file into asm/io.h */
+
+#ifdef CONFIG_INDIRECT_IOMEM
+
+#ifdef CONFIG_INDIRECT_IOMEM_FALLBACK
+/*
+ * If you want emulated IO memory to fall back to 'normal' IO memory
+ * if a region wasn't registered as emulated, then you need to have
+ * all of the real_* functions implemented.
+ */
+#if !defined(real_ioremap) || !defined(real_iounmap) || \
+!defined(real_raw_readb) || !defined(real_raw_writeb) || \
+!defined(real_raw_readw) || !defined(real_raw_writew) || \
+!defined(real_raw_readl) || !defined(real_raw_writel) || \
+(defined(CONFIG_64BIT) && \
+ (!defined(real_raw_readq) || !defined(real_raw_writeq))) || \
+!defined(real_memset_io) || \
+!defined(real_memcpy_fromio) || \
+!defined(real_memcpy_toio)
+#error "Must provide fallbacks for real IO memory access"
+#endif /* defined ... */
+#endif /* CONFIG_INDIRECT_IOMEM_FALLBACK */
+
+#define ioremap ioremap
+void __iomem *ioremap(phys_addr_t offset, size_t size);
+
+#define iounmap iounmap
+void iounmap(void __iomem *addr);
+
+#define __raw_readb __raw_readb
+u8 __raw_readb(const volatile void __iomem *addr);
+
+#define __raw_readw __raw_readw
+u16 __raw_readw(const volatile void __iomem *addr);
+
+#define __raw_readl __raw_readl
+u32 __raw_readl(const volatile void __iomem *addr);
+
+#ifdef CONFIG_64BIT
+#define __raw_readq __raw_readq
+u64 __raw_readq(const volatile void __iomem *addr);
+#endif /* CONFIG_64BIT */
+
+#define __raw_writeb __raw_writeb
+void __raw_writeb(u8 value, volatile void __iomem *addr);
+
+#define __raw_writew __raw_writew
+void __raw_writew(u16 value, volatile void __iomem *addr);
+
+#define __raw_writel __raw_writel
+void __raw_writel(u32 value, volatile void __iomem *addr);
+
+#ifdef CONFIG_64BIT
+#define __raw_writeq __raw_writeq
+void __raw_writeq(u64 value, volatile void __iomem *addr);
+#endif /* CONFIG_64BIT */
+
+#define memset_io memset_io
+void memset_io(volatile void __iomem *addr, int value, size_t size);
+
+#define memcpy_fromio memcpy_fromio
+void memcpy_fromio(void *buffer, const volatile void __iomem *addr,
+  size_t size);
+
+#define memcpy_toio memcpy_toio
+void memcpy_toio(volatile void __iomem *addr, const void *buffer, size_t size);
+
+#endif /* CONFIG_INDIRECT_IOMEM */
+#endif /* _LOGIC_IO_H */
diff --git a/include/linux/logic_iomem.h b/include/linux/logic_iomem.h
new file mode 100644
index ..3fa65c964379
--- /dev/null
+++ b/include/linux/logic_iomem.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2021 Intel Corporation
+ * Author: johan...@sipsolutions.net
+ */
+#ifndef __LOGIC_IOMEM_H
+#define __LOGIC_IOMEM_H
+#include 
+#include 
+
+/**
+ * struct logic_iomem_ops - emulated IO memory ops
+ * @read: read an 8, 16, 32 or 64 bit quantity from the given offset,
+ * size is given in bytes (1, 2, 4 or 8)
+ * (64-bit only necessary if CONFIG_64BIT is set)
+ * @write: write an 8, 16 32 or 64 bit quantity to the given offset,
+ * size is given in bytes (1, 2, 4 or 8)
+ * (64-bit only necessary if CONFIG_64BIT is set)
+ * @set: optional, for memset_io()
+ * @copy_from: optional, for memcpy_fromio()
+ * @copy_to: optional, for memcpy_toio()
+ * @unmap: optional, this region is getting unmapped
+ */
+struct logic_iomem_ops {
+   unsigned long (*read)(void *priv, unsigned int offset, int size);
+   void (*write)(void *priv, unsigned int offset, int size,
+ unsigned long val);
+
+   void (*set)(void *priv, unsigned int offset, u8 value, int size);
+   void (*copy_from)(void *priv, void *buffer, u

[PATCH v2 8/8] um: virtio/pci: enable suspend/resume

2021-03-01 Thread Johannes Berg
From: Johannes Berg 

The UM virtual PCI devices currently cannot be suspended properly
since the virtio driver already disables VQs well before the PCI
bus's suspend_noirq wants to complete the transition by writing to
PCI config space.

After trying around for a long time with moving the devices on the
DPM list, trying to create dependencies between them, etc. I gave
up and instead added UML specific cross-driver API that lets the
virt-pci code enable not suspending/resuming VQs for its devices.

This then allows the PCI bus suspend_noirq to still talk to the
device, and suspend/resume works properly.

Signed-off-by: Johannes Berg 
---
 arch/um/drivers/virt-pci.c | 10 
 arch/um/drivers/virtio_uml.c   | 40 ++
 arch/um/include/linux/virtio-uml.h | 13 ++
 3 files changed, 53 insertions(+), 10 deletions(-)
 create mode 100644 arch/um/include/linux/virtio-uml.h

diff --git a/arch/um/drivers/virt-pci.c b/arch/um/drivers/virt-pci.c
index dd85f36197aa..0b802834f40a 100644
--- a/arch/um/drivers/virt-pci.c
+++ b/arch/um/drivers/virt-pci.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -134,6 +135,9 @@ static int um_pci_send_cmd(struct um_pci_device *dev,
if (completed == HANDLE_NO_FREE(cmd))
break;
 
+   if (completed && !HANDLE_IS_NO_FREE(completed))
+   kfree(completed);
+
if (WARN_ONCE(virtqueue_is_broken(dev->cmd_vq) ||
  ++delay_count > UM_VIRT_PCI_MAXDELAY,
  "um virt-pci delay: %d", delay_count)) {
@@ -550,6 +554,12 @@ static int um_pci_virtio_probe(struct virtio_device *vdev)
 
device_set_wakeup_enable(>dev, true);
 
+   /*
+* In order to do suspend-resume properly, don't allow VQs
+* to be suspended.
+*/
+   virtio_uml_set_no_vq_suspend(vdev, true);
+
um_pci_rescan();
return 0;
 error:
diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c
index 91ddf74ca888..4412d6febade 100644
--- a/arch/um/drivers/virtio_uml.c
+++ b/arch/um/drivers/virtio_uml.c
@@ -56,6 +56,7 @@ struct virtio_uml_device {
u8 status;
u8 registered:1;
u8 suspended:1;
+   u8 no_vq_suspend:1;
 
u8 config_changed_irq:1;
uint64_t vq_irq_vq_map;
@@ -1098,6 +1099,19 @@ static void virtio_uml_release_dev(struct device *d)
kfree(vu_dev);
 }
 
+void virtio_uml_set_no_vq_suspend(struct virtio_device *vdev,
+ bool no_vq_suspend)
+{
+   struct virtio_uml_device *vu_dev = to_virtio_uml_device(vdev);
+
+   if (WARN_ON(vdev->config != _uml_config_ops))
+   return;
+
+   vu_dev->no_vq_suspend = no_vq_suspend;
+   dev_info(>dev, "%sabled VQ suspend\n",
+no_vq_suspend ? "dis" : "en");
+}
+
 /* Platform device */
 
 static int virtio_uml_probe(struct platform_device *pdev)
@@ -1302,13 +1316,16 @@ MODULE_DEVICE_TABLE(of, virtio_uml_match);
 static int virtio_uml_suspend(struct platform_device *pdev, pm_message_t state)
 {
struct virtio_uml_device *vu_dev = platform_get_drvdata(pdev);
-   struct virtqueue *vq;
 
-   virtio_device_for_each_vq((_dev->vdev), vq) {
-   struct virtio_uml_vq_info *info = vq->priv;
+   if (!vu_dev->no_vq_suspend) {
+   struct virtqueue *vq;
 
-   info->suspended = true;
-   vhost_user_set_vring_enable(vu_dev, vq->index, false);
+   virtio_device_for_each_vq((_dev->vdev), vq) {
+   struct virtio_uml_vq_info *info = vq->priv;
+
+   info->suspended = true;
+   vhost_user_set_vring_enable(vu_dev, vq->index, false);
+   }
}
 
if (!device_may_wakeup(_dev->vdev.dev)) {
@@ -1322,13 +1339,16 @@ static int virtio_uml_suspend(struct platform_device 
*pdev, pm_message_t state)
 static int virtio_uml_resume(struct platform_device *pdev)
 {
struct virtio_uml_device *vu_dev = platform_get_drvdata(pdev);
-   struct virtqueue *vq;
 
-   virtio_device_for_each_vq((_dev->vdev), vq) {
-   struct virtio_uml_vq_info *info = vq->priv;
+   if (!vu_dev->no_vq_suspend) {
+   struct virtqueue *vq;
+
+   virtio_device_for_each_vq((_dev->vdev), vq) {
+   struct virtio_uml_vq_info *info = vq->priv;
 
-   info->suspended = false;
-   vhost_user_set_vring_enable(vu_dev, vq->index, true);
+   info->suspended = false;
+   vhost_user_set_vring_enable(vu_dev, vq->index, true);
+   }
}
 
vu_dev->suspended = false;
diff --git a/arch/um/include/linux/virtio-uml.h 
b/arch

[PATCH v2 0/8] PCI support for UML

2021-03-01 Thread Johannes Berg
Hi,

Changes since v1:
 - fix a memory leak in the PCI code
 - fix race in interrupt handling
 - fix checks in interrupt handling
 - use asm-generic for fb.h and vga.h
 - rediff against v5.12-rc1
 - export signals_enabled directly


Original description:

In order to simulate some devices and write tests completely
independent of real PCI devices, we continued the development
of time-travel and related bits, and are adding PCI support
here now.

The way it works is that it communicates with the outside (of
UML) with virtio, which we previously added using vhost-user,
and then offers a PCI bus to the inside system, where normal
PCI probing etc. happens, but all config space & IO accesses
are forwarded over virtio.

To enable that, add lib/logic_iomem, similar to logic_pio but
for iomem regions, this way, ioread/iowrite can be redirected
over the virtio device.

Since currently no official virtio device ID is assigned yet
a Kconfig option for that is required to be set to the value
you want to use locally for experimentation. Once we have an
official value we can change the default (currently -1 which
makes it non-functional) or remove the option entirely.

johannes






[PATCH v2 3/8] um: remove unused smp_sigio_handler() declaration

2021-03-01 Thread Johannes Berg
From: Johannes Berg 

This function doesn't exist, remove its declaration.

Signed-off-by: Johannes Berg 
---
 arch/um/include/shared/kern_util.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/um/include/shared/kern_util.h 
b/arch/um/include/shared/kern_util.h
index 2888ec812f6e..a2cfd42608a0 100644
--- a/arch/um/include/shared/kern_util.h
+++ b/arch/um/include/shared/kern_util.h
@@ -33,7 +33,6 @@ extern int handle_page_fault(unsigned long address, unsigned 
long ip,
 int is_write, int is_user, int *code_out);
 
 extern unsigned int do_IRQ(int irq, struct uml_pt_regs *regs);
-extern int smp_sigio_handler(void);
 extern void initial_thread_cb(void (*proc)(void *), void *arg);
 extern int is_syscall(unsigned long addr);
 
-- 
2.26.2



[PATCH v2 1/8] um: allow disabling NO_IOMEM

2021-03-01 Thread Johannes Berg
From: Johannes Berg 

Adjust the kconfig a little to allow disabling NO_IOMEM in UML. To
make an "allyesconfig" with CONFIG_NO_IOMEM=n build, adjust a few
Kconfig things elsewhere and add dummy asm/fb.h and asm/vga.h files.

Signed-off-by: Johannes Berg 
---
v2: use asm-generic for fb.h and vga.h
---
 arch/um/Kconfig| 4 
 arch/um/include/asm/Kbuild | 2 ++
 drivers/input/Kconfig  | 1 -
 drivers/input/gameport/Kconfig | 1 +
 drivers/input/joystick/Kconfig | 1 +
 drivers/tty/Kconfig| 5 ++---
 drivers/video/console/Kconfig  | 2 +-
 7 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index c3030db3325f..20b0640e01b8 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -26,6 +26,10 @@ config MMU
default y
 
 config NO_IOMEM
+   bool "disable IOMEM" if EXPERT
+   default y
+
+config NO_IOPORT_MAP
def_bool y
 
 config ISA
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index d7492e5a1bbb..10b7228b3aee 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -7,6 +7,7 @@ generic-y += device.h
 generic-y += emergency-restart.h
 generic-y += exec.h
 generic-y += extable.h
+generic-y += fb.h
 generic-y += ftrace.h
 generic-y += futex.h
 generic-y += hw_irq.h
@@ -27,3 +28,4 @@ generic-y += trace_clock.h
 generic-y += word-at-a-time.h
 generic-y += kprobes.h
 generic-y += mm_hooks.h
+generic-y += vga.h
diff --git a/drivers/input/Kconfig b/drivers/input/Kconfig
index ec0e861f185f..5baebf62df33 100644
--- a/drivers/input/Kconfig
+++ b/drivers/input/Kconfig
@@ -4,7 +4,6 @@
 #
 
 menu "Input device support"
-   depends on !UML
 
 config INPUT
tristate "Generic input layer (needed for keyboard, mouse, ...)" if 
EXPERT
diff --git a/drivers/input/gameport/Kconfig b/drivers/input/gameport/Kconfig
index 4761795cb49f..5a2c2fb3217d 100644
--- a/drivers/input/gameport/Kconfig
+++ b/drivers/input/gameport/Kconfig
@@ -4,6 +4,7 @@
 #
 config GAMEPORT
tristate "Gameport support"
+   depends on !UML
help
  Gameport support is for the standard 15-pin PC gameport. If you
  have a joystick, gamepad, gameport card, a soundcard with a gameport
diff --git a/drivers/input/joystick/Kconfig b/drivers/input/joystick/Kconfig
index 5e38899058c1..b10948bf9551 100644
--- a/drivers/input/joystick/Kconfig
+++ b/drivers/input/joystick/Kconfig
@@ -4,6 +4,7 @@
 #
 menuconfig INPUT_JOYSTICK
bool "Joysticks/Gamepads"
+   depends on !UML
help
  If you have a joystick, 6dof controller, gamepad, steering wheel,
  weapon control system or something like that you can say Y here
diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index e15cd6b5bb99..26648daaaee2 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -12,9 +12,8 @@ if TTY
 
 config VT
bool "Virtual terminal" if EXPERT
-   depends on !UML
select INPUT
-   default y
+   default y if !UML
help
  If you say Y here, you will get support for terminal devices with
  display and keyboard devices. These are called "virtual" because you
@@ -78,7 +77,7 @@ config VT_CONSOLE_SLEEP
 
 config HW_CONSOLE
bool
-   depends on VT && !UML
+   depends on VT
default y
 
 config VT_HW_CONSOLE_BINDING
diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
index ee33b8ec62bb..840d9813b0bc 100644
--- a/drivers/video/console/Kconfig
+++ b/drivers/video/console/Kconfig
@@ -9,7 +9,7 @@ config VGA_CONSOLE
bool "VGA text console" if EXPERT || !X86
depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC &&  !SUPERH 
&& \
(!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) 
&& \
-   !ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !NDS32 && !S390
+   !ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !NDS32 && !S390 
&& !UML
default y
help
  Saying Y here will allow you to use Linux in text mode through a
-- 
2.26.2



Re: [PATCH v2 2/3] lockdep: add lockdep lock state defines

2021-02-26 Thread Johannes Berg


> @@ -5475,7 +5476,7 @@ noinstr int lock_is_held_type(const struct lockdep_map 
> *lock, int read)
>   /* avoid false negative lockdep_assert_not_held()
>* and lockdep_assert_held()
>*/
> - return -1;
> + return LOCK_STATE_UNKNOWN;

I'd argue that then the other two return places here should also be
changed.

johannes



Re: [PATCH 4/7] um: time-travel/signals: fix ndelay() in interrupt

2021-02-25 Thread Johannes Berg
On Tue, 2021-02-23 at 16:27 +0100, Johannes Berg wrote:
> 
> +void unblock_signals_hard(void)
> +{
> + if (!signals_blocked)
> + return;
> +
> + if (signals_pending && signals_enabled) {
> + /* this is a bit inefficient, but that's not really important */
> + block_signals();
> + unblock_signals();
> + } else if (signals_pending & SIGIO_MASK) {
> + /* we need to run time-travel handlers even if not enabled */
> + sigio_run_timetravel_handlers();
> + }
> +
> + signals_blocked = 0;
> +}

This is, of course, racy & wrong - we could set signals_pending just
after checking it.

Need to make this

{
if (!signals_blocked)
return;
signals_blocked = 0;
barrier();

if (signals_pending && signals_enabled) {
...


Anyway, I need to repost this series, but I'll wait a bit longer for
further feedback before that, since I know Arnd and others wanted to
take a look at the IOMEM and PCI bits.

johannes



Re: [PATCH 0/7] PCI support for UML

2021-02-24 Thread Johannes Berg
Hi,

> > If anyone has any suggestions on a good example PCI device that already
> > has a driver upstream I'd be interested - I looked for something simple
> > like LED or GPIO but no such thing I could find (that wasn't platform
> > dependent in some way). So far I've only implemented a virtual Intel
> > WiFi NIC, but that depends on a large body of code I can't publish. As
> > an example, it would be nice to write (and publish there) a simple PCI
> > device implementation. :)
> 
>  bt8xxgpio?

Yeah, that looks pretty good, thanks for the pointer!

johannes



Re: [PATCH 0/7] PCI support for UML

2021-02-23 Thread Johannes Berg
On Tue, 2021-02-23 at 16:27 +0100, Johannes Berg wrote:
> In order to simulate some devices and write tests completely
> independent of real PCI devices, we continued the development
> of time-travel and related bits, and are adding PCI support
> here now.
> 
> The way it works is that it communicates with the outside (of
> UML) with virtio, which we previously added using vhost-user,
> and then offers a PCI bus to the inside system, where normal
> PCI probing etc. happens, but all config space & IO accesses
> are forwarded over virtio.

I hadn't sent it out until now, but the userspace bits for all the time-
travel and PCI-over-vhost-user are here:

https://github.com/linux-test-project/usfstl/

If anyone has any suggestions on a good example PCI device that already
has a driver upstream I'd be interested - I looked for something simple
like LED or GPIO but no such thing I could find (that wasn't platform
dependent in some way). So far I've only implemented a virtual Intel
WiFi NIC, but that depends on a large body of code I can't publish. As
an example, it would be nice to write (and publish there) a simple PCI
device implementation. :)

johannes



Re: [PATCH 1/7] um: allow disabling NO_IOMEM

2021-02-23 Thread Johannes Berg
Hi,

> Can't you just use the asm-generic versions instead?

Hmm, yes, for fb.h and vga.h that should work, since they should never
be used. For vga.h it would be wrong since it assumes the VGA memory is
mapped into the CPU memory, but since it should never run it would still
address the compilation issues I needed this for.

So yeah, I guess we could just use the asm-generic versions of these
two.

johannes



[PATCH 6/7] um: add PCI over virtio emulation driver

2021-02-23 Thread Johannes Berg
From: Johannes Berg 

To support testing of PCI/PCIe drivers in UML, add a PCI bus
support driver. This driver uses virtio, which in UML is really
just vhost-user, to talk to devices, and adds the devices to
the virtual PCI bus in the system.

Since virtio already allows DMA/bus mastering this really isn't
all that hard, of course we need the logic_iomem infrastructure
that was added by a previous patch.

The protocol to talk to the device is has a few fairly simple
messages for reading to/writing from config and IO spaces, and
messages for the device to send the various interrupts (INT#,
MSI/MSI-X and while suspended PME#).

Note that currently no offical virtio device ID is assigned for
this protocol, as a consequence this patch requires defining it
in the Kconfig, with a default that makes the driver refuse to
work at all.

Finally, in order to add support for MSI/MSI-X interrupts, some
small changes are needed in the UML IRQ code, it needs to have
more interrupts, changing NR_IRQS from 64 to 128 if this driver
is enabled, but not actually use them for anything so that the
generic IRQ domain/MSI infrastructure can allocate IRQ numbers.

Signed-off-by: Johannes Berg 
---
 arch/um/Kconfig|  13 +-
 arch/um/drivers/Kconfig|  20 +
 arch/um/drivers/Makefile   |   1 +
 arch/um/drivers/virt-pci.c | 885 +
 arch/um/include/asm/Kbuild |   1 -
 arch/um/include/asm/io.h   |   7 +
 arch/um/include/asm/irq.h  |   8 +-
 arch/um/include/asm/msi.h  |   1 +
 arch/um/include/asm/pci.h  |  39 ++
 arch/um/kernel/Makefile|   1 +
 arch/um/kernel/ioport.c|  13 +
 arch/um/kernel/irq.c   |   7 +-
 include/uapi/linux/virtio_pcidev.h |  64 +++
 13 files changed, 1054 insertions(+), 6 deletions(-)
 create mode 100644 arch/um/drivers/virt-pci.c
 create mode 100644 arch/um/include/asm/msi.h
 create mode 100644 arch/um/include/asm/pci.h
 create mode 100644 arch/um/kernel/ioport.c
 create mode 100644 include/uapi/linux/virtio_pcidev.h

diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index 20b0640e01b8..f64d774706e5 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -14,7 +14,7 @@ config UML
select HAVE_FUTEX_CMPXCHG if FUTEX
select HAVE_DEBUG_KMEMLEAK
select HAVE_DEBUG_BUGVERBOSE
-   select NO_DMA
+   select NO_DMA if !UML_DMA_EMULATION
select GENERIC_IRQ_SHOW
select GENERIC_CPU_DEVICES
select HAVE_GCC_PLUGINS
@@ -25,10 +25,21 @@ config MMU
bool
default y
 
+config UML_DMA_EMULATION
+   bool
+
 config NO_IOMEM
bool "disable IOMEM" if EXPERT
+   depends on !INDIRECT_IOMEM
default y
 
+config UML_IOMEM_EMULATION
+   bool
+   select INDIRECT_IOMEM
+   select GENERIC_PCI_IOMAP
+   select GENERIC_IOMAP
+   select NO_GENERIC_PCI_IOPORT_MAP
+
 config NO_IOPORT_MAP
def_bool y
 
diff --git a/arch/um/drivers/Kconfig b/arch/um/drivers/Kconfig
index 03ba34b61115..f145842c40b9 100644
--- a/arch/um/drivers/Kconfig
+++ b/arch/um/drivers/Kconfig
@@ -357,3 +357,23 @@ config UML_RTC
  rtcwake, especially in time-travel mode. This driver enables that
  by providing a fake RTC clock that causes a wakeup at the right
  time.
+
+config UML_PCI_OVER_VIRTIO
+   bool "Enable PCI over VIRTIO device simulation"
+   # in theory, just VIRTIO is enough, but that causes recursion
+   depends on VIRTIO_UML
+   select FORCE_PCI
+   select UML_IOMEM_EMULATION
+   select UML_DMA_EMULATION
+   select PCI_MSI
+   select PCI_MSI_IRQ_DOMAIN
+   select PCI_LOCKLESS_CONFIG
+
+config UML_PCI_OVER_VIRTIO_DEVICE_ID
+   int "set the virtio device ID for PCI emulation"
+   default -1
+   depends on UML_PCI_OVER_VIRTIO
+   help
+ There's no official device ID assigned (yet), set the one you
+ wish to use for experimentation here. The default of -1 is
+ not valid and will cause the driver to fail at probe.
diff --git a/arch/um/drivers/Makefile b/arch/um/drivers/Makefile
index dcc64a02f81f..803666e85414 100644
--- a/arch/um/drivers/Makefile
+++ b/arch/um/drivers/Makefile
@@ -64,6 +64,7 @@ obj-$(CONFIG_BLK_DEV_COW_COMMON) += cow_user.o
 obj-$(CONFIG_UML_RANDOM) += random.o
 obj-$(CONFIG_VIRTIO_UML) += virtio_uml.o
 obj-$(CONFIG_UML_RTC) += rtc.o
+obj-$(CONFIG_UML_PCI_OVER_VIRTIO) += virt-pci.o
 
 # pcap_user.o must be added explicitly.
 USER_OBJS := fd.o null.o pty.o tty.o xterm.o slip_common.o pcap_user.o 
vde_user.o vector_user.o
diff --git a/arch/um/drivers/virt-pci.c b/arch/um/drivers/virt-pci.c
new file mode 100644
index ..dd85f36197aa
--- /dev/null
+++ b/arch/um/drivers/virt-pci.c
@@ -0,0 +1,885 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 Intel Corporation
+ * Author: Johannes Berg 
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+

[PATCH 3/7] um: remove unused smp_sigio_handler() declaration

2021-02-23 Thread Johannes Berg
From: Johannes Berg 

This function doesn't exist, remove its declaration.

Signed-off-by: Johannes Berg 
---
 arch/um/include/shared/kern_util.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/um/include/shared/kern_util.h 
b/arch/um/include/shared/kern_util.h
index 2888ec812f6e..a2cfd42608a0 100644
--- a/arch/um/include/shared/kern_util.h
+++ b/arch/um/include/shared/kern_util.h
@@ -33,7 +33,6 @@ extern int handle_page_fault(unsigned long address, unsigned 
long ip,
 int is_write, int is_user, int *code_out);
 
 extern unsigned int do_IRQ(int irq, struct uml_pt_regs *regs);
-extern int smp_sigio_handler(void);
 extern void initial_thread_cb(void (*proc)(void *), void *arg);
 extern int is_syscall(unsigned long addr);
 
-- 
2.26.2



[PATCH 7/7] um: virtio/pci: enable suspend/resume

2021-02-23 Thread Johannes Berg
From: Johannes Berg 

The UM virtual PCI devices currently cannot be suspended properly
since the virtio driver already disables VQs well before the PCI
bus's suspend_noirq wants to complete the transition by writing to
PCI config space.

After trying around for a long time with moving the devices on the
DPM list, trying to create dependencies between them, etc. I gave
up and instead added UML specific cross-driver API that lets the
virt-pci code enable not suspending/resuming VQs for its devices.

This then allows the PCI bus suspend_noirq to still talk to the
device, and suspend/resume works properly.

Signed-off-by: Johannes Berg 
---
 arch/um/drivers/virt-pci.c |  7 ++
 arch/um/drivers/virtio_uml.c   | 40 ++
 arch/um/include/linux/virtio-uml.h | 13 ++
 3 files changed, 50 insertions(+), 10 deletions(-)
 create mode 100644 arch/um/include/linux/virtio-uml.h

diff --git a/arch/um/drivers/virt-pci.c b/arch/um/drivers/virt-pci.c
index dd85f36197aa..e5d57ec8ada6 100644
--- a/arch/um/drivers/virt-pci.c
+++ b/arch/um/drivers/virt-pci.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -550,6 +551,12 @@ static int um_pci_virtio_probe(struct virtio_device *vdev)
 
device_set_wakeup_enable(>dev, true);
 
+   /*
+* In order to do suspend-resume properly, don't allow VQs
+* to be suspended.
+*/
+   virtio_uml_set_no_vq_suspend(vdev, true);
+
um_pci_rescan();
return 0;
 error:
diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c
index 91ddf74ca888..4412d6febade 100644
--- a/arch/um/drivers/virtio_uml.c
+++ b/arch/um/drivers/virtio_uml.c
@@ -56,6 +56,7 @@ struct virtio_uml_device {
u8 status;
u8 registered:1;
u8 suspended:1;
+   u8 no_vq_suspend:1;
 
u8 config_changed_irq:1;
uint64_t vq_irq_vq_map;
@@ -1098,6 +1099,19 @@ static void virtio_uml_release_dev(struct device *d)
kfree(vu_dev);
 }
 
+void virtio_uml_set_no_vq_suspend(struct virtio_device *vdev,
+ bool no_vq_suspend)
+{
+   struct virtio_uml_device *vu_dev = to_virtio_uml_device(vdev);
+
+   if (WARN_ON(vdev->config != _uml_config_ops))
+   return;
+
+   vu_dev->no_vq_suspend = no_vq_suspend;
+   dev_info(>dev, "%sabled VQ suspend\n",
+no_vq_suspend ? "dis" : "en");
+}
+
 /* Platform device */
 
 static int virtio_uml_probe(struct platform_device *pdev)
@@ -1302,13 +1316,16 @@ MODULE_DEVICE_TABLE(of, virtio_uml_match);
 static int virtio_uml_suspend(struct platform_device *pdev, pm_message_t state)
 {
struct virtio_uml_device *vu_dev = platform_get_drvdata(pdev);
-   struct virtqueue *vq;
 
-   virtio_device_for_each_vq((_dev->vdev), vq) {
-   struct virtio_uml_vq_info *info = vq->priv;
+   if (!vu_dev->no_vq_suspend) {
+   struct virtqueue *vq;
 
-   info->suspended = true;
-   vhost_user_set_vring_enable(vu_dev, vq->index, false);
+   virtio_device_for_each_vq((_dev->vdev), vq) {
+   struct virtio_uml_vq_info *info = vq->priv;
+
+   info->suspended = true;
+   vhost_user_set_vring_enable(vu_dev, vq->index, false);
+   }
}
 
if (!device_may_wakeup(_dev->vdev.dev)) {
@@ -1322,13 +1339,16 @@ static int virtio_uml_suspend(struct platform_device 
*pdev, pm_message_t state)
 static int virtio_uml_resume(struct platform_device *pdev)
 {
struct virtio_uml_device *vu_dev = platform_get_drvdata(pdev);
-   struct virtqueue *vq;
 
-   virtio_device_for_each_vq((_dev->vdev), vq) {
-   struct virtio_uml_vq_info *info = vq->priv;
+   if (!vu_dev->no_vq_suspend) {
+   struct virtqueue *vq;
+
+   virtio_device_for_each_vq((_dev->vdev), vq) {
+   struct virtio_uml_vq_info *info = vq->priv;
 
-   info->suspended = false;
-   vhost_user_set_vring_enable(vu_dev, vq->index, true);
+   info->suspended = false;
+   vhost_user_set_vring_enable(vu_dev, vq->index, true);
+   }
}
 
vu_dev->suspended = false;
diff --git a/arch/um/include/linux/virtio-uml.h 
b/arch/um/include/linux/virtio-uml.h
new file mode 100644
index ..2f652fa90f04
--- /dev/null
+++ b/arch/um/include/linux/virtio-uml.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2021 Intel Corporation
+ * Author: Johannes Berg 
+ */
+
+#ifndef __VIRTIO_UML_H__
+#define __VIRTIO_UML_H__
+
+void virtio_uml_set_no_vq_suspend(struct virtio_device *vdev,
+ bool no_vq_suspend);
+
+#endif /* __VIRTIO_UML_H__ */
-- 
2.26.2



[PATCH 2/7] lib: add iomem emulation (logic_iomem)

2021-02-23 Thread Johannes Berg
From: Johannes Berg 

Add IO memory emulation that uses callbacks for read/write to
the allocated regions. The callbacks can be registered by the
users using logic_iomem_alloc().

To use, an architecture must 'select LOGIC_IOMEM' in Kconfig
and then include  into asm/io.h to get
the __raw_read*/__raw_write* functions.

Optionally, an architecture may 'select LOGIC_IOMEM_FALLBACK'
in which case non-emulated regions will 'fall back' to the
various real_* functions that must then be provided.

Cc: Arnd Bergmann 
Signed-off-by: Johannes Berg 
---
 include/asm-generic/logic_io.h |  78 
 include/linux/logic_iomem.h|  62 +++
 lib/Kconfig|  14 ++
 lib/Makefile   |   2 +
 lib/logic_iomem.c  | 318 +
 5 files changed, 474 insertions(+)
 create mode 100644 include/asm-generic/logic_io.h
 create mode 100644 include/linux/logic_iomem.h
 create mode 100644 lib/logic_iomem.c

diff --git a/include/asm-generic/logic_io.h b/include/asm-generic/logic_io.h
new file mode 100644
index ..a53116b8c57e
--- /dev/null
+++ b/include/asm-generic/logic_io.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2021 Intel Corporation
+ * Author: johan...@sipsolutions.net
+ */
+#ifndef _LOGIC_IO_H
+#define _LOGIC_IO_H
+#include 
+
+/* include this file into asm/io.h */
+
+#ifdef CONFIG_INDIRECT_IOMEM
+
+#ifdef CONFIG_INDIRECT_IOMEM_FALLBACK
+/*
+ * If you want emulated IO memory to fall back to 'normal' IO memory
+ * if a region wasn't registered as emulated, then you need to have
+ * all of the real_* functions implemented.
+ */
+#if !defined(real_ioremap) || !defined(real_iounmap) || \
+!defined(real_raw_readb) || !defined(real_raw_writeb) || \
+!defined(real_raw_readw) || !defined(real_raw_writew) || \
+!defined(real_raw_readl) || !defined(real_raw_writel) || \
+(defined(CONFIG_64BIT) && \
+ (!defined(real_raw_readq) || !defined(real_raw_writeq))) || \
+!defined(real_memset_io) || \
+!defined(real_memcpy_fromio) || \
+!defined(real_memcpy_toio)
+#error "Must provide fallbacks for real IO memory access"
+#endif /* defined ... */
+#endif /* CONFIG_INDIRECT_IOMEM_FALLBACK */
+
+#define ioremap ioremap
+void __iomem *ioremap(phys_addr_t offset, size_t size);
+
+#define iounmap iounmap
+void iounmap(void __iomem *addr);
+
+#define __raw_readb __raw_readb
+u8 __raw_readb(const volatile void __iomem *addr);
+
+#define __raw_readw __raw_readw
+u16 __raw_readw(const volatile void __iomem *addr);
+
+#define __raw_readl __raw_readl
+u32 __raw_readl(const volatile void __iomem *addr);
+
+#ifdef CONFIG_64BIT
+#define __raw_readq __raw_readq
+u64 __raw_readq(const volatile void __iomem *addr);
+#endif /* CONFIG_64BIT */
+
+#define __raw_writeb __raw_writeb
+void __raw_writeb(u8 value, volatile void __iomem *addr);
+
+#define __raw_writew __raw_writew
+void __raw_writew(u16 value, volatile void __iomem *addr);
+
+#define __raw_writel __raw_writel
+void __raw_writel(u32 value, volatile void __iomem *addr);
+
+#ifdef CONFIG_64BIT
+#define __raw_writeq __raw_writeq
+void __raw_writeq(u64 value, volatile void __iomem *addr);
+#endif /* CONFIG_64BIT */
+
+#define memset_io memset_io
+void memset_io(volatile void __iomem *addr, int value, size_t size);
+
+#define memcpy_fromio memcpy_fromio
+void memcpy_fromio(void *buffer, const volatile void __iomem *addr,
+  size_t size);
+
+#define memcpy_toio memcpy_toio
+void memcpy_toio(volatile void __iomem *addr, const void *buffer, size_t size);
+
+#endif /* CONFIG_INDIRECT_IOMEM */
+#endif /* _LOGIC_IO_H */
diff --git a/include/linux/logic_iomem.h b/include/linux/logic_iomem.h
new file mode 100644
index ..3fa65c964379
--- /dev/null
+++ b/include/linux/logic_iomem.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2021 Intel Corporation
+ * Author: johan...@sipsolutions.net
+ */
+#ifndef __LOGIC_IOMEM_H
+#define __LOGIC_IOMEM_H
+#include 
+#include 
+
+/**
+ * struct logic_iomem_ops - emulated IO memory ops
+ * @read: read an 8, 16, 32 or 64 bit quantity from the given offset,
+ * size is given in bytes (1, 2, 4 or 8)
+ * (64-bit only necessary if CONFIG_64BIT is set)
+ * @write: write an 8, 16 32 or 64 bit quantity to the given offset,
+ * size is given in bytes (1, 2, 4 or 8)
+ * (64-bit only necessary if CONFIG_64BIT is set)
+ * @set: optional, for memset_io()
+ * @copy_from: optional, for memcpy_fromio()
+ * @copy_to: optional, for memcpy_toio()
+ * @unmap: optional, this region is getting unmapped
+ */
+struct logic_iomem_ops {
+   unsigned long (*read)(void *priv, unsigned int offset, int size);
+   void (*write)(void *priv, unsigned int offset, int size,
+ unsigned long val);
+
+   void (*set)(void *priv, unsigned int offset, u8 value, int size);
+   void (*copy_from)(void *priv, void *buffer, u

[PATCH 4/7] um: time-travel/signals: fix ndelay() in interrupt

2021-02-23 Thread Johannes Berg
From: Johannes Berg 

We should be able to ndelay() from any context, even from an
interrupt context! However, this is broken (not functionally,
but locking-wise) in time-travel because we'll get into the
time-travel code and enable interrupts to handle messages on
other time-travel aware subsystems (only virtio for now).

Luckily, I've already reworked the time-travel aware signal
(interrupt) delivery for suspend/resume to have a time travel
handler, which runs directly in the context of the signal and
not from the Linux interrupt.

In order to fix this time-travel issue then, we need to do a
few things:

 1) rework the signal handling code to not block SIGIO (if
time-travel is enabled, just to simplify the other case)
but rather let it bubble through the system, all the way
*past* the IRQ's timetravel_handler, stopping it after
that and before real interrupt delivery if IRQs are not
actually enabled;
 2) rework time-travel to not enable interrupts while it's
waiting for a message;
 3) rework time-travel to not (just) disable interrupts but
rather block signals at a lower level while it needs them
disabled for communicating with the controller.

Finally, since now we can actually spend even virtual time
in interrupts-disabled sections, the delay warning when we
deliver a time-travel delayed interrupt is no longer valid,
things can (and should) now get delayed.

Signed-off-by: Johannes Berg 
---
 arch/um/include/shared/irq_user.h |  1 +
 arch/um/include/shared/os.h   |  3 +++
 arch/um/kernel/irq.c  | 38 +-
 arch/um/kernel/time.c | 35 +---
 arch/um/os-Linux/signal.c | 44 ---
 5 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/arch/um/include/shared/irq_user.h 
b/arch/um/include/shared/irq_user.h
index 07239e801a5b..065829f443ae 100644
--- a/arch/um/include/shared/irq_user.h
+++ b/arch/um/include/shared/irq_user.h
@@ -17,6 +17,7 @@ enum um_irq_type {
 
 struct siginfo;
 extern void sigio_handler(int sig, struct siginfo *unused_si, struct 
uml_pt_regs *regs);
+void sigio_run_timetravel_handlers(void);
 extern void free_irq_by_fd(int fd);
 extern void deactivate_fd(int fd, int irqnum);
 extern int deactivate_all_fds(void);
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 13d86f94cf0f..afd5fbc1c867 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -243,6 +243,9 @@ extern int set_signals_trace(int enable);
 extern int os_is_signal_stack(void);
 extern void deliver_alarm(void);
 extern void register_pm_wake_signal(void);
+extern void block_signals_hard(void);
+extern void unblock_signals_hard(void);
+extern void mark_sigio_pending(void);
 
 /* util.c */
 extern void stack_protections(unsigned long address);
diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index 82af5191e73d..76448b85292f 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -123,7 +123,8 @@ static bool irq_do_timetravel_handler(struct irq_entry 
*entry,
 #endif
 
 static void sigio_reg_handler(int idx, struct irq_entry *entry, enum 
um_irq_type t,
- struct uml_pt_regs *regs)
+ struct uml_pt_regs *regs,
+ bool timetravel_handlers_only)
 {
struct irq_reg *reg = >reg[t];
 
@@ -136,18 +137,32 @@ static void sigio_reg_handler(int idx, struct irq_entry 
*entry, enum um_irq_type
if (irq_do_timetravel_handler(entry, t))
return;
 
-   if (irqs_suspended)
+#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
+   /*
+* We can only get here with signals disabled if we have time-travel
+* support, otherwise we cannot have the hard handlers that may need
+* to run even in interrupts-disabled sections and therefore sigio is
+* blocked as well when interrupts are disabled.
+*/
+   if (!get_signals()) {
+   mark_sigio_pending();
+   return;
+   }
+#endif
+
+   if (timetravel_handlers_only)
return;
 
irq_io_loop(reg, regs);
 }
 
-void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs 
*regs)
+static void _sigio_handler(struct uml_pt_regs *regs,
+  bool timetravel_handlers_only)
 {
struct irq_entry *irq_entry;
int n, i;
 
-   if (irqs_suspended && !um_irq_timetravel_handler_used())
+   if (timetravel_handlers_only && !um_irq_timetravel_handler_used())
return;
 
while (1) {
@@ -172,14 +187,20 @@ void sigio_handler(int sig, struct siginfo *unused_si, 
struct uml_pt_regs *regs)
irq_entry = os_epoll_get_data_pointer(i);
 
for (t = 0; t < NUM_IRQ_TYPES; t++)
-   sigio_reg_handler(i, irq_entry, t, regs);
+   sigio_reg_handler

[PATCH 5/7] um: irqs: allow invoking time-travel handler multiple times

2021-02-23 Thread Johannes Berg
From: Johannes Berg 

If we happen to get multiple messages while IRQS are already
suspended, we still need to handle them, since otherwise the
simulation blocks.

Remove the "prevent nesting" part, time_travel_add_irq_event()
will deal with being called multiple times just fine.

Signed-off-by: Johannes Berg 
---
 arch/um/kernel/irq.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index 76448b85292f..3718a5cdbc85 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -101,10 +101,12 @@ static bool irq_do_timetravel_handler(struct irq_entry 
*entry,
if (!reg->timetravel_handler)
return false;
 
-   /* prevent nesting - we'll get it again later when we SIGIO ourselves */
-   if (reg->pending_on_resume)
-   return true;
-
+   /*
+* Handle all messages - we might get multiple even while
+* interrupts are already suspended, due to suspend order
+* etc. Note that time_travel_add_irq_event() will not add
+* an event twice, if it's pending already "first wins".
+*/
reg->timetravel_handler(reg->irq, entry->fd, reg->id, >event);
 
if (!reg->event.pending)
-- 
2.26.2



[PATCH 1/7] um: allow disabling NO_IOMEM

2021-02-23 Thread Johannes Berg
From: Johannes Berg 

Adjust the kconfig a little to allow disabling NO_IOMEM in UML. To
make an "allyesconfig" with CONFIG_NO_IOMEM=n build, adjust a few
Kconfig things elsewhere and add dummy asm/fb.h and asm/vga.h files.

Signed-off-by: Johannes Berg 
---
 arch/um/Kconfig|  4 
 arch/um/include/asm/fb.h   | 15 +++
 arch/um/include/asm/vga.h  |  9 +
 drivers/input/Kconfig  |  1 -
 drivers/input/gameport/Kconfig |  1 +
 drivers/input/joystick/Kconfig |  1 +
 drivers/tty/Kconfig|  5 ++---
 drivers/video/console/Kconfig  |  2 +-
 8 files changed, 33 insertions(+), 5 deletions(-)
 create mode 100644 arch/um/include/asm/fb.h
 create mode 100644 arch/um/include/asm/vga.h

diff --git a/arch/um/Kconfig b/arch/um/Kconfig
index c3030db3325f..20b0640e01b8 100644
--- a/arch/um/Kconfig
+++ b/arch/um/Kconfig
@@ -26,6 +26,10 @@ config MMU
default y
 
 config NO_IOMEM
+   bool "disable IOMEM" if EXPERT
+   default y
+
+config NO_IOPORT_MAP
def_bool y
 
 config ISA
diff --git a/arch/um/include/asm/fb.h b/arch/um/include/asm/fb.h
new file mode 100644
index ..f8fa5a6b43b5
--- /dev/null
+++ b/arch/um/include/asm/fb.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_UM_FB_H
+#define _ASM_UM_FB_H
+
+static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
+unsigned long off)
+{
+}
+
+static inline int fb_is_primary_device(struct fb_info *info)
+{
+   return 0;
+}
+
+#endif /* _ASM_UM_FB_H */
diff --git a/arch/um/include/asm/vga.h b/arch/um/include/asm/vga.h
new file mode 100644
index ..0b0e73ccdb28
--- /dev/null
+++ b/arch/um/include/asm/vga.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_UM_VGA_H
+#define _ASM_UM_VGA_H
+
+#define VGA_MAP_MEM(x, s)  ((unsigned long) ioremap(x, s))
+#define vga_readb(a)   readb((u8 __iomem *)(a))
+#define vga_writeb(v,a)writeb(v, (u8 __iomem *)(a))
+
+#endif /* _ASM_UM_VGA_H */
diff --git a/drivers/input/Kconfig b/drivers/input/Kconfig
index ec0e861f185f..5baebf62df33 100644
--- a/drivers/input/Kconfig
+++ b/drivers/input/Kconfig
@@ -4,7 +4,6 @@
 #
 
 menu "Input device support"
-   depends on !UML
 
 config INPUT
tristate "Generic input layer (needed for keyboard, mouse, ...)" if 
EXPERT
diff --git a/drivers/input/gameport/Kconfig b/drivers/input/gameport/Kconfig
index 4761795cb49f..5a2c2fb3217d 100644
--- a/drivers/input/gameport/Kconfig
+++ b/drivers/input/gameport/Kconfig
@@ -4,6 +4,7 @@
 #
 config GAMEPORT
tristate "Gameport support"
+   depends on !UML
help
  Gameport support is for the standard 15-pin PC gameport. If you
  have a joystick, gamepad, gameport card, a soundcard with a gameport
diff --git a/drivers/input/joystick/Kconfig b/drivers/input/joystick/Kconfig
index b080f0cfb068..6a03a1f0cd5f 100644
--- a/drivers/input/joystick/Kconfig
+++ b/drivers/input/joystick/Kconfig
@@ -4,6 +4,7 @@
 #
 menuconfig INPUT_JOYSTICK
bool "Joysticks/Gamepads"
+   depends on !UML
help
  If you have a joystick, 6dof controller, gamepad, steering wheel,
  weapon control system or something like that you can say Y here
diff --git a/drivers/tty/Kconfig b/drivers/tty/Kconfig
index e15cd6b5bb99..26648daaaee2 100644
--- a/drivers/tty/Kconfig
+++ b/drivers/tty/Kconfig
@@ -12,9 +12,8 @@ if TTY
 
 config VT
bool "Virtual terminal" if EXPERT
-   depends on !UML
select INPUT
-   default y
+   default y if !UML
help
  If you say Y here, you will get support for terminal devices with
  display and keyboard devices. These are called "virtual" because you
@@ -78,7 +77,7 @@ config VT_CONSOLE_SLEEP
 
 config HW_CONSOLE
bool
-   depends on VT && !UML
+   depends on VT
default y
 
 config VT_HW_CONSOLE_BINDING
diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
index ee33b8ec62bb..840d9813b0bc 100644
--- a/drivers/video/console/Kconfig
+++ b/drivers/video/console/Kconfig
@@ -9,7 +9,7 @@ config VGA_CONSOLE
bool "VGA text console" if EXPERT || !X86
depends on !4xx && !PPC_8xx && !SPARC && !M68K && !PARISC &&  !SUPERH 
&& \
(!ARM || ARCH_FOOTBRIDGE || ARCH_INTEGRATOR || ARCH_NETWINDER) 
&& \
-   !ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !NDS32 && !S390
+   !ARM64 && !ARC && !MICROBLAZE && !OPENRISC && !NDS32 && !S390 
&& !UML
default y
help
  Saying Y here will allow you to use Linux in text mode through a
-- 
2.26.2



[PATCH 0/7] PCI support for UML

2021-02-23 Thread Johannes Berg
Hi,

In order to simulate some devices and write tests completely
independent of real PCI devices, we continued the development
of time-travel and related bits, and are adding PCI support
here now.

The way it works is that it communicates with the outside (of
UML) with virtio, which we previously added using vhost-user,
and then offers a PCI bus to the inside system, where normal
PCI probing etc. happens, but all config space & IO accesses
are forwarded over virtio.

To enable that, add lib/logic_iomem, similar to logic_pio but
for iomem regions, this way, ioread/iowrite can be redirected
over the virtio device.

Since currently no official virtio device ID is assigned yet
a Kconfig option for that is required to be set to the value
you want to use locally for experimentation. Once we have an
official value we can change the default (currently -1 which
makes it non-functional) or remove the option entirely.

johannes




Re: [PATCH v2] gdb: lx-symbols: store the abspath()

2021-02-23 Thread Johannes Berg
On Thu, 2020-12-17 at 09:31 +0100, Jan Kiszka wrote:
> On 17.12.20 09:17, Johannes Berg wrote:
> > From: Johannes Berg 
> > 
> > If we store the relative path, the user might later cd to a
> > different directory, and that would break the automatic symbol
> > resolving that happens when a module is loaded into the target
> > kernel. Fix this by storing the abspath() of each path given,
> > just like we already do for the cwd (os.getcwd() is absolute.)

> Reviewed-by: Jan Kiszka 

So ... I'm still carrying this patch.

Anyone want to send it to Linus? Andrew, maybe you?

Thanks,
johannes



Re: [PATCH net v1 3/3] [RFC] mac80211: ieee80211_store_ack_skb(): make use of skb_clone_sk_optional()

2021-02-23 Thread Johannes Berg
On Mon, 2021-02-22 at 19:51 +0100, Marc Kleine-Budde wrote:
> On 22.02.2021 17:30:59, Johannes Berg wrote:
> > On Mon, 2021-02-22 at 16:12 +0100, Oleksij Rempel wrote:
> > > This code is trying to clone the skb with optional skb->sk. But this
> > > will fail to clone the skb if socket was closed just after the skb was
> > > pushed into the networking stack.
> > 
> > Which IMHO is completely fine. If we then still clone the SKB we can't
> > do anything with it, since the point would be to ... send it back to the
> > socket, but it's gone.
> 
> Ok, but why is the skb cloned if there is no socket linked in skb->sk?

Hm? There are two different ways to get here, one with and one without a
socket.

johannes



Re: [PATCH net v1 3/3] [RFC] mac80211: ieee80211_store_ack_skb(): make use of skb_clone_sk_optional()

2021-02-22 Thread Johannes Berg
On Mon, 2021-02-22 at 16:12 +0100, Oleksij Rempel wrote:
> This code is trying to clone the skb with optional skb->sk. But this
> will fail to clone the skb if socket was closed just after the skb was
> pushed into the networking stack.

Which IMHO is completely fine. If we then still clone the SKB we can't
do anything with it, since the point would be to ... send it back to the
socket, but it's gone.

Nothing to fix here, I'd think. If you wanted to get a copy back that
gives you the status of the SKB, it should not come as a huge surprise
that you have to keep the socket open for that :)

Having the ACK skb will just make us do more work by handing it back
to skb_complete_wifi_ack() at TX status time, which is supposed to put
it into the socket's error queue, but if the socket is closed ... no
point in that.

johannes




Re: [PATCH] [v13] wireless: Initial driver submission for pureLiFi STA devices

2021-02-19 Thread Johannes Berg
On Fri, 2021-02-19 at 05:15 +, Srinivasan Raju wrote:
> Hi,
> 
> Please find a few responses to the comments , We will fix rest of the 
> comments and submit v14 of the patch.
> 
> > Also, you *really* need some validation here, rather than blindly
> > trusting that the file is well-formed, otherwise you immediately have a
> > security issue here.
> 
> The firmware is signed and the harware validates the signature , so we are 
> not validating in the software.

That wasn't my point. My point was that the kernel code trusts the
validity of the firmware image, in the sense of e.g. this piece:

> + no_of_files = *(u32 *)_packed->data[0];

If the firmware file was corrupted (intentionally/maliciously or not), this 
could now be say 0x.

> + for (step = 0; step < no_of_files; step++) {

> + start_addr = *(u32 *)_packed->data[4 + (step * 4)];

But you trust it completely and don't even check that "4 + (step * 4)"
fits into the length of the data!

That's what I meant. Don't allow this to crash the kernel.

> > > +static const struct plf_reg_alpha2_map reg_alpha2_map[] = {
> > > + { PLF_REGDOMAIN_FCC, "US" },
> > > + { PLF_REGDOMAIN_IC, "CA" },
> > > + { PLF_REGDOMAIN_ETSI, "DE" }, /* Generic ETSI, use most restrictive 
> > > */
> > > + { PLF_REGDOMAIN_JAPAN, "JP" },
> > > + { PLF_REGDOMAIN_SPAIN, "ES" },
> > > + { PLF_REGDOMAIN_FRANCE, "FR" },
> > > +};
> > You actually have regulatory restrictions on this stuff?
> 
> Currently, There are no regulatory restrictions applicable for LiFi ,
> regulatory_hint setting is only for wifi core 

So why do you have PLF_REGDOMAIN_* things? What does that even mean
then?

> > OTOH, I can sort of see why/how you're reusing wifi functionality here,
> > very old versions of wifi even had an infrared PHY I think.
> > 
> > Conceptually, it seems odd. Perhaps we should add a new band definition?
> > 
> > And what I also asked above - how much of the rate stuff is completely
> > fake? Are you really doing CCK/OFDM in some (strange?) way?
> 
> Yes, your understanding is correct, and we use OFDM.
> For now we will use the existing band definition.

OFDM over infrared, nice.

Still, I'm not convinced we should piggy-back this on 2.4 GHz.

NL80211_BAND_1THZ? ;-)

Ok, I don't know what wavelength you're using, of course...

But at least thinking about this, wouldn't we be better off if we define
NL80211_BAND_INFRARED or something? That way, at least we can tell that
it's not going to interoperate with real 2.4 GHz, etc.

OTOH, this is a bit of a slippery slow - what if somebody else designs
an *incompatible* infrared PHY? I guess this is not really standardised
at this point.

Not really sure about all this, but I guess we need to think about it
more.

What are your reasons for piggy-backing on 2.4 GHz? Just practical "it's
there and we don't care"?

johannes



Re: [PATCH] kcov: Remove kcov include from sched.h and move it to its users.

2021-02-18 Thread Johannes Berg
On Thu, 2021-02-18 at 18:31 +0100, Sebastian Andrzej Siewior wrote:
> The recent addition of in_serving_softirq() to kconv.h results in

You typo'ed "kconv.h" pretty consistently ;-)

But yes, that makes sense.

Acked-by: Johannes Berg 

johannes




Re: [PATCH 1/2] lockdep: add lockdep_assert_not_held()

2021-02-15 Thread Johannes Berg
On Mon, 2021-02-15 at 17:04 +0100, Peter Zijlstra wrote:
> On Mon, Feb 15, 2021 at 02:12:30PM +0100, Johannes Berg wrote:
> > On Mon, 2021-02-15 at 11:44 +0100, Peter Zijlstra wrote:
> > > I think something like so will work, but please double check.
> > 
> > Yeah, that looks better.
> > 
> > > +++ b/include/linux/lockdep.h
> > > @@ -294,11 +294,15 @@ extern void lock_unpin_lock(struct lockdep_map 
> > > *lock, struct pin_cookie);
> > >  
> > >  #define lockdep_depth(tsk)   (debug_locks ? (tsk)->lockdep_depth : 0)
> > >  
> > > -#define lockdep_assert_held(l)   do {\
> > > - WARN_ON(debug_locks && !lockdep_is_held(l));\
> > > +#define lockdep_assert_held(l)   do {
> > > \
> > > + WARN_ON(debug_locks && lockdep_is_held(l) == 0));   \
> > >   } while (0)
> > 
> > That doesn't really need to change? It's the same.
> 
> Correct, but I found it more symmetric vs the not implementation below.

Fair enough. One might argue that you should have an

enum lockdep_lock_state {
LOCK_STATE_NOT_HELD, /* 0 now */
LOCK_STATE_HELD, /* 1 now */
LOCK_STATE_UNKNOWN, /* -1 with your patch but might as well be 2 */
};

:)

johannes



Re: [PATCH 1/2] lockdep: add lockdep_assert_not_held()

2021-02-15 Thread Johannes Berg
On Mon, 2021-02-15 at 11:44 +0100, Peter Zijlstra wrote:
> 
> I think something like so will work, but please double check.

Yeah, that looks better.

> +++ b/include/linux/lockdep.h
> @@ -294,11 +294,15 @@ extern void lock_unpin_lock(struct lockdep_map *lock, 
> struct pin_cookie);
>  
>  #define lockdep_depth(tsk)   (debug_locks ? (tsk)->lockdep_depth : 0)
>  
> -#define lockdep_assert_held(l)   do {\
> - WARN_ON(debug_locks && !lockdep_is_held(l));\
> +#define lockdep_assert_held(l)   do {
> \
> + WARN_ON(debug_locks && lockdep_is_held(l) == 0));   \
>   } while (0)

That doesn't really need to change? It's the same.

> -#define lockdep_assert_held_write(l) do {\
> +#define lockdep_assert_not_held(l)   do {\
> + WARN_ON(debug_locks && lockdep_is_held(l) == 1));   \
> + } while (0)
> +
> +#define lockdep_assert_held_write(l) do {\
>   WARN_ON(debug_locks && !lockdep_is_held_type(l, 0));\
>   } while (0)
>  
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index c1418b47f625..983ba206f7b2 100644
> --- a/kernel/locking/lockdep.
> +++ b/kernel/locking/lockdep.c
> @@ -5467,7 +5467,7 @@ noinstr int lock_is_held_type(const struct lockdep_map 
> *lock, int read)
>   int ret = 0;
>  
>   if (unlikely(!lockdep_enabled()))
> - return 1; /* avoid false negative lockdep_assert_held() */
> + return -1; /* avoid false negative lockdep_assert_held() */

Maybe add lockdep_assert_not_held() to the comment, to explain the -1
(vs non-zero)?

johannes



Re: [PATCH] [v13] wireless: Initial driver submission for pureLiFi STA devices

2021-02-12 Thread Johannes Berg
Hi,

Thanks for your patience, and thanks for sticking around!

I'm sorry I haven't reviewed this again in a long time, but I was able
to today.


> +PUREILIFI USB DRIVER

Did you mistype "PURELIFI" here, or was that intentional?

> +PUREILIFI USB DRIVER
> +M:   Srinivasan Raju 

Probably would be good to have an "L" entry with the linux-wireless list
here.

> +if WLAN_VENDOR_PURELIFI
> +
> +source "drivers/net/wireless/purelifi/plfxlc/Kconfig"

Seems odd to have the Makefile under purelifi/ but the Kconfig is yet
another directory deeper?

> +++ b/drivers/net/wireless/purelifi/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_WLAN_VENDOR_PURELIFI)   := plfxlc/

Although this one doesn't do anything, so all you did was save a level
of Kconfig inclusion I guess ... no real objection to that.

> diff --git a/drivers/net/wireless/purelifi/plfxlc/Kconfig 
> b/drivers/net/wireless/purelifi/plfxlc/Kconfig
> new file mode 100644
> index ..07a65c0fce68
> --- /dev/null
> +++ b/drivers/net/wireless/purelifi/plfxlc/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config PLFXLC
> +
> + tristate "pureLiFi X, XL, XC device support"

extra blank line.

Also, maybe that should be a bit more verbose? PURELIFI_XLC or so? I
don't think it shows up in many places, but if you see "PLFXLC"
somewhere that's not very helpful.

> + depends on CFG80211 && MAC80211 && USB
> + help
> +This driver makes the adapter appear as a normal WLAN interface.
> +
> +The pureLiFi device requires external STA firmware to be loaded.
> +
> +To compile this driver as a module, choose M here: the module will
> +be called purelifi.

But will it? Seems like it would be called "plfxlc"?

See here:

> +++ b/drivers/net/wireless/purelifi/plfxlc/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_PLFXLC) := plfxlc.o
> +plfxlc-objs  += chip.o firmware.o usb.o mac.o


> +int purelifi_set_beacon_interval(struct purelifi_chip *chip, u16 interval,
> +  u8 dtim_period, int type)
> +{
> + if (!interval ||
> + (chip->beacon_set && chip->beacon_interval == interval)) {
> + return 0;
> + }

Don't need braces here

> + chip->beacon_interval = interval;
> + chip->beacon_set = true;
> + return usb_write_req((const u8 *)>beacon_interval,
> +  sizeof(chip->beacon_interval),
> +  USB_REQ_BEACON_INTERVAL_WR);

There's clearly an endian problem hiding here somewhere. You should have
"chip->beacon_interval" be (probably) __le16 since you send it to the
device as a buffer, yet the parameter to this function is just "u16".
Therefore, a conversion is missing somewhere - likely you need it to be
__le16 in the chip struct since you can't send USB requests from stack.


You should add some annotations for things getting sent to the device
like this, and then you can run sparse to validate them all over the
code.


> +}
> +
> +int purelifi_chip_init_hw(struct purelifi_chip *chip)
> +{
> + u8 *addr = purelifi_mac_get_perm_addr(purelifi_chip_to_mac(chip));
> + struct usb_device *udev = interface_to_usbdev(chip->usb.intf);
> +
> + pr_info("purelifi chip %02x:%02x v%02x  %02x-%02x-%02x %s\n",

"%04x:%04x" for the USB ID?

And you should probably use %pM for the MAC address - the format you
have there looks ... strange? Why only print the vendor OUI?

> +int purelifi_chip_switch_radio(struct purelifi_chip *chip, u32 value)
> +{
> + int r;
> +
> + r = usb_write_req((const u8 *), sizeof(value), USB_REQ_POWER_WR);
> + if (r)
> + dev_err(purelifi_chip_dev(chip), "POWER_WR failed (%d)\n", r);

Same endian problem here.

> 
> +static inline void purelifi_mc_add_addr(struct purelifi_mc_hash *hash,
> + u8 *addr)
> +{
> + unsigned int i = addr[5] >> 2;
> +
> + if (i < 32)
> + hash->low |= 1 << i;
> + else
> + hash->high |= 1 << (i - 32);

That's UB if i == 31, you need 1U << i, or just use the BIT() macro.

> + r = request_firmware((const struct firmware **), fw_name,

Not sure why that cast should be required?

> + fpga_dmabuff = NULL;
> + fpga_dmabuff = kmalloc(PLF_FPGA_SLEN, GFP_KERNEL);

that NULL assignment is pointless :)

> +
> + if (!fpga_dmabuff) {
> + r = -ENOMEM;
> + goto error_free_fw;
> + }
> + send_vendor_request(udev, PLF_VNDR_FPGA_SET_REQ, fpga_dmabuff, 
> sizeof(fpga_setting));
> + memcpy(fpga_setting, fpga_dmabuff, PLF_FPGA_SLEN);
> + kfree(fpga_dmabuff);

I'd say you should remove fpga_setting from the stack since you anyway
allocated it, save the memcpy() to the stack, and just use fpga_dmabuff
below - and then free it later?

> + for (tbuf_idx = 0; tbuf_idx < blk_tran_len; tbuf_idx++) {
> + /* u8 

Re: [PATCH 2/3] mac80211: Add support to trigger sta disconnect on hardware restart

2021-02-12 Thread Johannes Berg
On Fri, 2021-02-12 at 09:42 +0100, Johannes Berg wrote:
> On Tue, 2020-12-15 at 22:53 +0530, Youghandhar Chintala wrote:
> > The right fix would be to pull the entire data path into the host
> > +++ b/net/mac80211/ieee80211_i.h
> > @@ -748,6 +748,8 @@ struct ieee80211_if_mesh {
> >   * back to wireless media and to the local net stack.
> >   * @IEEE80211_SDATA_DISCONNECT_RESUME: Disconnect after resume.
> >   * @IEEE80211_SDATA_IN_DRIVER: indicates interface was added to driver
> > + * @IEEE80211_SDATA_DISCONNECT_HW_RESTART: Disconnect after hardware 
> > restart
> > + * recovery
> 
> How did you model this on IEEE80211_SDATA_DISCONNECT_RESUME, but than
> didn't check how that's actually used?
> 
> Please change it so that the two models are the same. You really don't
> need the wiphy flag.

In fact, you could even simply
generalize IEEE80211_SDATA_DISCONNECT_RESUME
and ieee80211_resume_disconnect() to _reconfig_ instead of _resume_, and
call it from the driver just before requesting HW restart.

johannes



Re: [PATCH 2/3] mac80211: Add support to trigger sta disconnect on hardware restart

2021-02-12 Thread Johannes Berg
On Tue, 2020-12-15 at 22:53 +0530, Youghandhar Chintala wrote:
> The right fix would be to pull the entire data path into the host

> +++ b/net/mac80211/ieee80211_i.h
> @@ -748,6 +748,8 @@ struct ieee80211_if_mesh {
>   *   back to wireless media and to the local net stack.
>   * @IEEE80211_SDATA_DISCONNECT_RESUME: Disconnect after resume.
>   * @IEEE80211_SDATA_IN_DRIVER: indicates interface was added to driver
> + * @IEEE80211_SDATA_DISCONNECT_HW_RESTART: Disconnect after hardware restart
> + *   recovery

How did you model this on IEEE80211_SDATA_DISCONNECT_RESUME, but than
didn't check how that's actually used?

Please change it so that the two models are the same. You really don't
need the wiphy flag.

johannes



Re: [PATCH 2/3] mac80211: Add support to trigger sta disconnect on hardware restart

2021-02-12 Thread Johannes Berg
On Fri, 2021-02-05 at 13:51 -0800, Abhishek Kumar wrote:
> Since using DELBA frame to APs to re-establish BA session has a
> dependency on APs and also some APs may not honour the DELBA frame.


That's completely out of spec ... Can you say which AP this was?

You could also try sending a BAR that updates the SN.

johannes



Re: [PATCH 1/3] cfg80211: Add wiphy flag to trigger STA disconnect after hardware restart

2021-02-12 Thread Johannes Berg
On Tue, 2020-12-15 at 23:00 +0530, Youghandhar Chintala wrote:
> 
>   * @WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK: The device supports bigger kek and kck 
> keys
> + * @WIPHY_FLAG_STA_DISCONNECT_ON_HW_RESTART: The device needs a trigger to
> + *   disconnect STA after target hardware restart. This flag should be
> + *   exposed by drivers which support target recovery.

You're not doing anything with this information in cfg80211, so
consequently it doesn't belong there.

johannes



Re: [PATCH AUTOSEL 4.9 4/4] init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov

2021-02-08 Thread Johannes Berg
On Mon, 2021-02-08 at 18:00 +, Sasha Levin wrote:
> From: Johannes Berg 
> 
> [ Upstream commit 55b6f763d8bcb5546997933105d66d3e6b080e6a ]
> 
> On ARCH=um, loading a module doesn't result in its constructors getting
> called, which breaks module gcov since the debugfs files are never
> registered.  On the other hand, in-kernel constructors have already been
> called by the dynamic linker, so we can't call them again.
> 
> Get out of this conundrum by allowing CONFIG_CONSTRUCTORS to be
> selected, but avoiding the in-kernel constructor calls.
> 
> Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now, since we
> really do want CONSTRUCTORS, just not kernel binary ones.
> 
> Link: 
> https://lkml.kernel.org/r/20210120172041.c246a2cac2fb.I1358f584b76f1898373adfed77f4462c8705b736@changeid
> 


While I don't really *object* to this getting backported, it's also a
(development) corner case that somebody wants gcov and modules in
ARCH=um ... I'd probably not backport this.

johannes



Re: [PATCH] wireless: fix typo issue

2021-02-02 Thread Johannes Berg
On Wed, 2021-02-03 at 15:00 +0800, samirweng1979 wrote:
> From: wengjianfeng 
> 
> change 'iff' to 'if'.
> 
> Signed-off-by: wengjianfeng 
> ---
>  net/wireless/chan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/wireless/chan.c b/net/wireless/chan.c
> index 285b807..2f17edf 100644
> --- a/net/wireless/chan.c
> +++ b/net/wireless/chan.c
> @@ -1084,7 +1084,7 @@ bool cfg80211_chandef_usable(struct wiphy *wiphy,
>   * associated to an AP on the same channel or on the same UNII band
>   * (assuming that the AP is an authorized master).
>   * In addition allow operation on a channel on which indoor operation is
> - * allowed, iff we are currently operating in an indoor environment.
> + * allowed, if we are currently operating in an indoor environment.
>   */

I suspect that was intentional, as a common abbreviation for "if and
only if".

johannes



Re: possible deadlock in cfg80211_netdev_notifier_call

2021-02-01 Thread Johannes Berg
On Mon, 2021-02-01 at 14:37 +0200, Mike Rapoport wrote:
> On Mon, Feb 01, 2021 at 01:17:13AM -0800, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following issue on:
> > 
> > HEAD commit:b01f250d Add linux-next specific files for 20210129
> > git tree:   linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14daa408d0
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=725bc96dc234fda7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=2ae0ca9d7737ad1a62b7
> > compiler:   gcc (GCC) 10.1.0-syz 20200507
> > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1757f2a0d0
> > 
> > The issue was bisected to:
> > 
> > commit cc9327f3b085ba5be5639a5ec3ce5b08a0f14a7c
> > Author: Mike Rapoport 
> > Date:   Thu Jan 28 07:42:40 2021 +
> > 
> > mm: introduce memfd_secret system call to create "secret" memory areas
> > 
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1505d28cd0
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=1705d28cd0
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1305d28cd0
> 
> Sounds really weird to me. At this point the memfd_secret syscall is not
> even wired to arch syscall handlers. I cannot see how it can be a reason of
> deadlock in wireless...

Yeah, forget about it. Usually this is a consequence of the way syzbot
creates tests - it might have created something like

  if (!create_secret_memfd())
return;
  try_something_on_wireless()

and then of course without your patch it cannot get to the wireless
bits.

Pretty sure I know what's going on here, I'll take a closer look later.

johannes



Re: WARNING in sta_info_insert_check

2021-02-01 Thread Johannes Berg
On Sun, 2021-01-31 at 21:26 -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:bec4c296 Merge tag 'ecryptfs-5.11-rc6-setxattr-fix' of git..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11991778d0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f75d66d6d359ef2f
> dashboard link: https://syzkaller.appspot.com/bug?extid=8dcc087eb24227ded47e
> userspace arch: arm64
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+8dcc087eb24227ded...@syzkaller.appspotmail.com

Looks like this is a dup.

#syz dup: WARNING in sta_info_insert_rcu

Just in this case sta_info_insert_check() didn't get inlined into
sta_info_insert_rcu().

johannes



Re: [PATCH] nl80211: ignore the length of hide ssid is zero in scan

2021-01-28 Thread Johannes Berg
On Thu, 2021-01-28 at 19:56 +0800, samirweng1979 wrote:
> From: wengjianfeng 
> 
> If the length of hide ssid is zero in scan, don't pass
> it to driver, which doesn't make any sense.

Err, please check again how scanning works. This is quite obviously
intentional.

johannes



Re: [PATCH] ath9k: fix build error with LEDS_CLASS=m

2021-01-27 Thread Johannes Berg
On Wed, 2021-01-27 at 12:35 +0200, Kalle Valo wrote:
> Arnd Bergmann  writes:
> 
> > On Mon, Jan 25, 2021 at 4:04 PM Krzysztof Kozlowski  wrote:
> > > On Mon, 25 Jan 2021 at 15:38, Arnd Bergmann  wrote:
> > > > On Mon, Jan 25, 2021 at 2:27 PM Krzysztof Kozlowski  
> > > > wrote:
> > > 
> > > I meant that having MAC80211_LEDS selected causes the ath9k driver to
> > > toggle on/off the WiFi LED. Every second, regardless whether it's
> > > doing something or not. In my setup, I have problems with a WiFi
> > > dongle somehow crashing (WiFi disappears, nothing comes from the
> > > dongle... maybe it's Atheros FW, maybe some HW problem) and I found
> > > this LED on/off slightly increases the chances of this dongle-crash.
> > > That was the actual reason behind my commits.
> > > 
> > > Second reason is that I don't want to send USB commands every second
> > > when the device is idle. It unnecessarily consumes power on my
> > > low-power device.
> > 
> > Ok, I see.
> > 
> > > Of course another solution is to just disable the trigger via sysfs
> > > LED API. It would also work but my patch allows entire code to be
> > > compiled-out (which was conditional in ath9k already).
> > > 
> > > Therefore the patch I sent allows the ath9k LED option to be fully
> > > choosable. Someone wants every-second-LED-blink, sure, enable
> > > ATH9K_LEDS and you have it. Someone wants to reduce the kernel size,
> > > don't enable ATH9K_LEDS.
> > 
> > Originally, I think this is what CONFIG_MAC80211_LEDS was meant
> > for, but it seems that this is not actually practical, since this also
> > gets selected by half of the drivers using it, while the other half have
> > a dependency on it. Out of the ones that select it, some in turn
> > select LEDS_CLASS, while some depend on it.
> > 
> > I think this needs a larger-scale cleanup for consistency between
> > (at least) all the wireless drivers using LEDs.
> 
> I agree, this needs cleanup.
> 
> > Either your patch or mine should get applied in the meantime, and I
> > don't care much which one in this case, as we still have the remaining
> > inconsistency.
> 
> My problem with Krzysztof's patch[1] is that it adds a new Kconfig
> option for ath9k, is that really necessary? Like Arnd said, we should
> fix drivers to use CONFIG_MAC80211_LEDS instead of having driver
> specific options.
> 
> So I would prefer take this Arnd's patch instead and queue it for v5.11.
> But as it modifies mac80211 I'll need an ack from Johannes, what do you
> think?

Sure, that seems fine.

Acked-by: Johannes Berg 

johannes



[PATCH] fs/pipe: allow sendfile() to pipe again

2021-01-26 Thread Johannes Berg
After commit 36e2c7421f02 ("fs: don't allow splice read/write
without explicit ops") sendfile() could no longer send data
from a real file to a pipe, breaking for example certain cgit
setups (e.g. when running behind fcgiwrap), because in this
case cgit will try to do exactly this: sendfile() to a pipe.

Fix this by using iter_file_splice_write for the splice_write
method of pipes, as suggested by Christoph.

Cc: sta...@vger.kernel.org
Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Suggested-by: Christoph Hellwig 
Tested-by: Johannes Berg 
Signed-off-by: Johannes Berg 
---
Mostly for the record, since it looks like we'll have a proper
fix without another intermediate pipe. However, this fixes the
regression for now.
---
 fs/pipe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/pipe.c b/fs/pipe.c
index c5989cfd564d..39c96845a72f 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1206,6 +1206,7 @@ const struct file_operations pipefifo_fops = {
.unlocked_ioctl = pipe_ioctl,
.release= pipe_release,
.fasync = pipe_fasync,
+   .splice_write   = iter_file_splice_write,
 };
 
 /*
-- 
2.26.2



Re: linux-next boot error: WARNING in cfg80211_register_netdevice

2021-01-25 Thread Johannes Berg
On Mon, 2021-01-25 at 01:52 -0800, syzbot wrote:
> 
> [ cut here ]
> WARNING: CPU: 0 PID: 1 at net/wireless/core.c:1336 
> cfg80211_register_netdevice+0x235/0x330 net/wireless/core.c:1336
> 


Yes, umm. I accidentally *copied* that line a few lines further down
rather than *moving* it down. Ilan told me about it yesterday but I
didn't have time to check and fix it up. It's fixed in mac80211-next
now.

johannes



Re: Splicing to/from a tty

2021-01-21 Thread Johannes Berg
On Thu, 2021-01-21 at 07:05 +0100, Willy Tarreau wrote:

> I think that most users of splice() on sockets got used to falling back
> to recv/send on splice failure due to various cases not being supported
> historically (UNIX family sockets immediately come to my mind but I seem
> to remember other combinations).

Note, however, that I got here because cgit, if using sendfile(), does
*not* fall back if it fails (and thus my git tree view is currently down
because I haven't downgraded the kernel so far). That may not be common
for splice() though.

johannes



Re: [PATCH v2] init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov

2021-01-20 Thread Johannes Berg
On Wed, 2021-01-20 at 18:04 +0100, Peter Oberparleiter wrote:
> 
> > Tested with a kernel configured with CONFIG_GCOV_KERNEL, without
> > the patch nothing ever appears in /sys/kernel/debug/gcov/ (apart
> > from the reset file), and with it we get the files and they work.
> 
> Just to be sure: could you confirm that this test statement for UML also
> applies to v2, i.e. that it fixes the original problem you were seeing?

Yes, I tested this version too :)

johannes



[PATCH v2] init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov

2021-01-20 Thread Johannes Berg
From: Johannes Berg 

On ARCH=um, loading a module doesn't result in its constructors
getting called, which breaks module gcov since the debugfs files
are never registered. On the other hand, in-kernel constructors
have already been called by the dynamic linker, so we can't call
them again.

Get out of this conundrum by allowing CONFIG_CONSTRUCTORS to be
selected, but avoiding the in-kernel constructor calls.

Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now,
since we really do want CONSTRUCTORS, just not kernel binary
ones.

Signed-off-by: Johannes Berg 
---
Tested with a kernel configured with CONFIG_GCOV_KERNEL, without
the patch nothing ever appears in /sys/kernel/debug/gcov/ (apart
from the reset file), and with it we get the files and they work.

v2:
 * special-case UML instead of splitting the CONSTRUCTORS config
---
 init/Kconfig| 1 -
 init/main.c | 8 +++-
 kernel/gcov/Kconfig | 2 +-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index b77c60f8b963..29ad68325028 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -76,7 +76,6 @@ config CC_HAS_ASM_INLINE
 
 config CONSTRUCTORS
bool
-   depends on !UML
 
 config IRQ_WORK
bool
diff --git a/init/main.c b/init/main.c
index c68d784376ca..a626e78dbf06 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1066,7 +1066,13 @@ asmlinkage __visible void __init __no_sanitize_address 
start_kernel(void)
 /* Call all constructor functions linked into the kernel. */
 static void __init do_ctors(void)
 {
-#ifdef CONFIG_CONSTRUCTORS
+/*
+ * For UML, the constructors have already been called by the
+ * normal setup code as it's just a normal ELF binary, so we
+ * cannot do it again - but we do need CONFIG_CONSTRUCTORS
+ * even on UML for modules.
+ */
+#if defined(CONFIG_CONSTRUCTORS) && !defined(CONFIG_UML)
ctor_fn_t *fn = (ctor_fn_t *) __ctors_start;
 
for (; fn < (ctor_fn_t *) __ctors_end; fn++)
diff --git a/kernel/gcov/Kconfig b/kernel/gcov/Kconfig
index 3110c77230c7..f62de2dea8a3 100644
--- a/kernel/gcov/Kconfig
+++ b/kernel/gcov/Kconfig
@@ -4,7 +4,7 @@ menu "GCOV-based kernel profiling"
 config GCOV_KERNEL
bool "Enable gcov-based kernel profiling"
depends on DEBUG_FS
-   select CONSTRUCTORS if !UML
+   select CONSTRUCTORS
default n
help
This option enables gcov-based code profiling (e.g. for code coverage
-- 
2.26.2



Re: [PATCH] init/module: split CONFIG_CONSTRUCTORS to fix module gcov on UML

2021-01-20 Thread Johannes Berg
On Wed, 2021-01-20 at 17:07 +0100, Peter Oberparleiter wrote:

> Do you expect other users for these new config symbols? 

Probably not.

> If not it seems
> to me that the goal of enabling module constructors for UML could also
> be achieved without introducing new config symbols.

Yeah, true.

> For example you could suppress calling any built-in kernel constructors
> in case of UML by extending the ifdef in do_ctors() to depend on both
> CONFIG_CONSTRUCTORS and !CONFIG_UML (maybe with an explanatory comment).
> 
> Of course you'd still need to remove the !UML dependency in
> CONFIG_GCOV_KERNEL.

Right.

I can post a separate patch and we can see which one looks nicer?

johannes




[PATCH] init/module: split CONFIG_CONSTRUCTORS to fix module gcov on UML

2021-01-19 Thread Johannes Berg
From: Johannes Berg 

On ARCH=um, loading a module doesn't result in its constructors
getting called, which breaks module gcov since the debugfs files
are never registered. On the other hand, in-kernel constructors
have already been called by the dynamic linker, so we can't call
them again.

Get out of this conundrum by splitting CONFIG_CONSTRUCTORS into
CONFIG_CONSTRUCTORS_KERNEL and CONFIG_CONSTRUCTORS_MODULE, both
of which are enabled by default if CONFIG_CONSTRUCTORS is turned
on, but CONFIG_CONSTRUCTORS_KERNEL depends on !UML so that it's
not used on ARCH=um.

Also remove the "if !UML" from GCOV selecting CONSTRUCTORS now,
since we really do want CONSTRUCTORS, just not kernel binary
ones.

Signed-off-by: Johannes Berg 
---
Tested with a kernel configured with CONFIG_GCOV_KERNEL, without
the patch nothing ever appears in /sys/kernel/debug/gcov/ (apart
from the reset file), and with it we get the files and they work.

I have no idea which tree this might go through, any suggestions?
---
 include/asm-generic/vmlinux.lds.h | 6 +++---
 include/linux/module.h| 2 +-
 init/Kconfig  | 9 -
 init/main.c   | 2 +-
 kernel/gcov/Kconfig   | 2 +-
 kernel/module.c   | 4 ++--
 6 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index b2b3d81b1535..87b300471c54 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -698,7 +698,7 @@
INIT_TASK_DATA(align)   \
}
 
-#ifdef CONFIG_CONSTRUCTORS
+#ifdef CONFIG_CONSTRUCTORS_KERNEL
 #define KERNEL_CTORS() . = ALIGN(8);  \
__ctors_start = .; \
KEEP(*(SORT(.ctors.*)))\
@@ -990,11 +990,11 @@
 /*
  * Clang's -fsanitize=kernel-address and -fsanitize=thread produce
  * unwanted sections (.eh_frame and .init_array.*), but
- * CONFIG_CONSTRUCTORS wants to keep any .init_array.* sections.
+ * CONFIG_CONSTRUCTORS_KERNEL wants to keep any .init_array.* sections.
  * https://bugs.llvm.org/show_bug.cgi?id=46478
  */
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KCSAN)
-# ifdef CONFIG_CONSTRUCTORS
+# ifdef CONFIG_CONSTRUCTORS_KERNEL
 #  define SANITIZER_DISCARDS   \
*(.eh_frame)
 # else
diff --git a/include/linux/module.h b/include/linux/module.h
index 7a0bcb5b1ffc..027cfdbd84bd 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -528,7 +528,7 @@ struct module {
atomic_t refcnt;
 #endif
 
-#ifdef CONFIG_CONSTRUCTORS
+#ifdef CONFIG_CONSTRUCTORS_MODULE
/* Constructor functions. */
ctor_fn_t *ctors;
unsigned int num_ctors;
diff --git a/init/Kconfig b/init/Kconfig
index b77c60f8b963..e409de8d6c17 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -76,7 +76,14 @@ config CC_HAS_ASM_INLINE
 
 config CONSTRUCTORS
bool
-   depends on !UML
+
+config CONSTRUCTORS_KERNEL
+   def_bool y
+   depends on CONSTRUCTORS && !UML
+
+config CONSTRUCTORS_MODULE
+   def_bool y
+   depends on CONSTRUCTORS
 
 config IRQ_WORK
bool
diff --git a/init/main.c b/init/main.c
index c68d784376ca..51eb4802511c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1066,7 +1066,7 @@ asmlinkage __visible void __init __no_sanitize_address 
start_kernel(void)
 /* Call all constructor functions linked into the kernel. */
 static void __init do_ctors(void)
 {
-#ifdef CONFIG_CONSTRUCTORS
+#ifdef CONFIG_CONSTRUCTORS_KERNEL
ctor_fn_t *fn = (ctor_fn_t *) __ctors_start;
 
for (; fn < (ctor_fn_t *) __ctors_end; fn++)
diff --git a/kernel/gcov/Kconfig b/kernel/gcov/Kconfig
index 3110c77230c7..f62de2dea8a3 100644
--- a/kernel/gcov/Kconfig
+++ b/kernel/gcov/Kconfig
@@ -4,7 +4,7 @@ menu "GCOV-based kernel profiling"
 config GCOV_KERNEL
bool "Enable gcov-based kernel profiling"
depends on DEBUG_FS
-   select CONSTRUCTORS if !UML
+   select CONSTRUCTORS
default n
help
This option enables gcov-based code profiling (e.g. for code coverage
diff --git a/kernel/module.c b/kernel/module.c
index 4bf30e4b3eaa..c161a360d929 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3257,7 +3257,7 @@ static int find_module_sections(struct module *mod, 
struct load_info *info)
>num_unused_gpl_syms);
mod->unused_gpl_crcs = section_addr(info, "__kcrctab_unused_gpl");
 #endif
-#ifdef CONFIG_CONSTRUCTORS
+#ifdef CONFIG_CONSTRUCTORS_MODULE
mod->ctors = section_objs(info, ".ctors",
  sizeof(*mod->ctors), >num_ctors);
if (!mod->ctors)
@@ -3612,7 +3612,7 @@ static bool finished_loading(const char *name)
 /* Call module constructors. */
 stat

Re: Splicing to/from a tty

2021-01-18 Thread Johannes Berg
On Mon, 2021-01-18 at 09:53 +0100, Christoph Hellwig wrote:
> On Sat, Jan 16, 2021 at 05:46:33PM +0100, Johannes Berg wrote:
> > > For my case, I attempted to instead implement splice_write and
> > > splice_read in tty_fops; I managed to get splice_write working calling
> > > ld->ops->write, but splice_read is not so simple because the
> > > tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> > > how to implement this without either (a) using set_fs, or (b)
> > > implementing iter ops on all line disciplines.
> > > 
> > > Is splice()ing between a tty and a pipe worth supporting at all? Not a
> > > big deal for my use case at least, but it used to work.
> > 
> > Is it even strictly related to the tty?
> > 
> > I was just now looking into why my cgit/fcgi/nginx setup no longer
> > works, and the reason is getting -EINVAL from sendfile() when the input
> > is a file and the output is a pipe().
> 
> Yes, pipes do not support ->splice_write currenly.

Well, it clearly used to work :-)

>I think just wiring
> up iter_file_splice_write would work.  Al?

Seems to work for the simple test case that I had, at least:

diff --git a/fs/pipe.c b/fs/pipe.c
index c5989cfd564d..39c96845a72f 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1206,6 +1206,7 @@ const struct file_operations pipefifo_fops = {
.unlocked_ioctl = pipe_ioctl,
.release= pipe_release,
.fasync = pipe_fasync,
+   .splice_write   = iter_file_splice_write,
 };
 
 /*

johannes



Re: Splicing to/from a tty

2021-01-16 Thread Johannes Berg
On Sat, 2021-01-16 at 20:35 +1300, Oliver Giles wrote:
> Commit 36e2c7421f02 (fs: don't allow splice read/write without
> explicit ops) broke my userspace application which talks to an SSL VPN
> by splice()ing between "openssl s_client" and "pppd". The latter
> operates over a pty, and since that commit there is no fallback for
> splice()ing between a pipe and a pty, or any tty for that matter.
> 
> The above commit mentions switching them to the iter ops and using
> generic_file_splice_read. IIUC, this would require implementing iter
> ops also on the line disciplines, which sounds pretty disruptive.
> 
> For my case, I attempted to instead implement splice_write and
> splice_read in tty_fops; I managed to get splice_write working calling
> ld->ops->write, but splice_read is not so simple because the
> tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> how to implement this without either (a) using set_fs, or (b)
> implementing iter ops on all line disciplines.
> 
> Is splice()ing between a tty and a pipe worth supporting at all? Not a
> big deal for my use case at least, but it used to work.

Is it even strictly related to the tty?

I was just now looking into why my cgit/fcgi/nginx setup no longer
works, and the reason is getting -EINVAL from sendfile() when the input
is a file and the output is a pipe().

So I wrote a simple test program (below) and that errors out on kernel
5.10.4, while it works fine on the 5.9.16 I currently have. Haven't
tried reverting anything yet, but now that I haev a test program it
should be simple to even bisect.

johannes


#include 
#include 
#include 
#include 
#include 
#include 
#include 

int main(int argc, char **argv)
{
int in = open(argv[0], O_RDONLY);
int p[2], out;
off_t off = 0;
int err;

assert(in >= 0);
assert(pipe(p) >= 0);
out = p[1];
err = sendfile(out, in, , 1024);
if (err < 0)
perror("sendfile");
assert(err == 1024);

return 0;
}




Re: [PATCH v6 12/16] net: tip: fix a couple kernel-doc markups

2021-01-14 Thread Johannes Berg
On Thu, 2021-01-14 at 10:34 -0800, Jakub Kicinski wrote:
> On Thu, 14 Jan 2021 10:59:08 -0500 Jon Maloy wrote:
> > On 1/14/21 3:04 AM, Mauro Carvalho Chehab wrote:
> > > A function has a different name between their prototype
> > > and its kernel-doc markup:
> > > 
> > >   ../net/tipc/link.c:2551: warning: expecting prototype for 
> > > link_reset_stats(). Prototype was for tipc_link_reset_stats() instead
> > >   ../net/tipc/node.c:1678: warning: expecting prototype for is the 
> > > general link level function for message sending(). Prototype was for 
> > > tipc_node_xmit() instead
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab 
> > 
> > Acked-by: Jon Maloy 
> 
> Thanks! Applied this one to net, the cfg80211 one does not apply to
> net, so I'll leave it to Johannes.

Right, that was diffed against -next, and I've got a fix pending that I
didn't send yet.

I've applied this now, thanks.

johannes



[PATCH v2] mm/slub: disable user tracing for kmemleak caches by default

2021-01-13 Thread Johannes Berg
From: Johannes Berg 

If kmemleak is enabled, it uses a kmem cache for its own objects.
These objects are used to hold information kmemleak uses, including
a stack trace. If slub_debug is also turned on, each of them has
*another* stack trace, so the overhead adds up, and on my tests (on
ARCH=um, admittedly) 2/3rds of the allocations end up being doing
the stack tracing.

Turn off SLAB_STORE_USER if SLAB_NOLEAKTRACE was given, to avoid
storing the essentially same data twice.

Signed-off-by: Johannes Berg 
---
Perhaps instead it should go the other way around, and kmemleak
could even use/access the stack trace that's already in there ...
But I don't really care too much, I can just turn off slub debug
for the kmemleak caches via the command line anyway :-)

v2:
 - strip SLAB_STORE_USER only coming from slub_debug so that the
   command line args always take effect

---
 mm/slub.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 34dcc09e2ec9..a66c9948c529 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1412,6 +1412,15 @@ slab_flags_t kmem_cache_flags(unsigned int object_size,
size_t len;
char *next_block;
slab_flags_t block_flags;
+   slab_flags_t slub_debug_local = slub_debug;
+
+   /*
+* If the slab cache is for debugging (e.g. kmemleak) then
+* don't store user (stack trace) information by default,
+* but let the user enable it via the command line below.
+*/
+   if (flags & SLAB_NOLEAKTRACE)
+   slub_debug_local &= ~SLAB_STORE_USER;
 
len = strlen(name);
next_block = slub_debug_string;
@@ -1446,7 +1455,7 @@ slab_flags_t kmem_cache_flags(unsigned int object_size,
}
}
 
-   return flags | slub_debug;
+   return flags | slub_debug_local;
 }
 #else /* !CONFIG_SLUB_DEBUG */
 static inline void setup_object_debug(struct kmem_cache *s,
-- 
2.26.2



Re: [PATCH] mm/slub: disable user tracing for kmemleak caches

2021-01-13 Thread Johannes Berg
On Wed, 2021-01-13 at 17:59 +0100, Vlastimil Babka wrote:
> On 1/13/21 5:09 PM, Johannes Berg wrote:
> > From: Johannes Berg 
> > 
> > If kmemleak is enabled, it uses a kmem cache for its own objects.
> > These objects are used to hold information kmemleak uses, including
> > a stack trace. If slub_debug is also turned on, each of them has
> > *another* stack trace, so the overhead adds up, and on my tests (on
> > ARCH=um, admittedly) 2/3rds of the allocations end up being doing
> > the stack tracing.
> > 
> > Turn off SLAB_STORE_USER if SLAB_NOLEAKTRACE was given, to avoid
> > storing the essentially same data twice.
> > 
> > Signed-off-by: Johannes Berg 
> 
> How about stripping away SLAB_STORE_USER only if it's added from the global
> slub_debug variable? In case somebody lists one of the kmemleak caches
> explicitly in "slub_debug=..." instead of just booting with "slub_debug", we
> should honor that.

Good point, that makes a lot of sense.

TBH, I mostly sent this to see if anyone would think it acceptable. I've
now disabled slub debugging completely for the kmemleak caches by
command line, and as expected that improves things further. I'm _hoping_
of course that kmemleak itself doesn't contain egregious bugs, but seems
like a fair bet for now :)

So what do you/people think? Should we disable this? Disable all?
Subject to the above constraint, either way.

Thanks,
johannes



  1   2   3   4   5   6   7   8   9   10   >