Re: tracer_init_tracefs really slow
On Wed, 09 Dec 2020 15:38:50 +0100 Lucas Stach wrote:

> -	trace_eval_init();
> -
> ... and this. Moving the trace_eval_init into its own initcall means it
> gets started before tracer_init_tracefs. As it holds the
> trace_event_sem while updating the eval maps, event_trace_init() then
> blocks further initcall execution when it tries to grab this semaphore
> a bit down the line, killing the parallelism we are trying to unlock
> here.
>
> With those 2 lines dropped, the change seems to work as intended and
> shaves ~830ms from the kernel boot time on this system.

Ah, the rationale for doing that was so that it could start parsing
earlier. But you are right: if it's really slow and is still parsing by
the time we start populating the tracefs directory, it will cause that
to block as well.

OK, I won't move it into its own initcall and will send a v2.

-- Steve
Re: tracer_init_tracefs really slow
Hi Steven,

On Monday, 2020-12-07 at 14:47 -0500, Steven Rostedt wrote:
> On Mon, 07 Dec 2020 17:24:58 +0100 Lucas Stach wrote:
> > I would be happy to test a patch on our wimpy machines, though. :)
>
> Let me know if this helps:
>
> -- Steve
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 06134189e9a7..48aa5b6a7d15 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -9062,7 +9062,10 @@ int tracing_init_dentry(void)
>  extern struct trace_eval_map *__start_ftrace_eval_maps[];
>  extern struct trace_eval_map *__stop_ftrace_eval_maps[];
>  
> -static void __init trace_eval_init(void)
> +static struct workqueue_struct *eval_map_wq __initdata;
> +struct work_struct eval_map_work __initdata;
> +
> +static void __init eval_map_work_func(struct work_struct *work)
>  {
>  	int len;
>  
> @@ -9070,6 +9073,34 @@ static void __init trace_eval_init(void)
>  	trace_insert_eval_map(NULL, __start_ftrace_eval_maps, len);
>  }
>  
> +static int __init trace_eval_init(void)
> +{
> +	INIT_WORK(&eval_map_work, eval_map_work_func);
> +
> +	eval_map_wq = alloc_workqueue("eval_map_wq", WQ_UNBOUND, 0);
> +	if (!eval_map_wq) {
> +		pr_err("Unable to allocate eval_map_wq\n");
> +		/* Do work here */
> +		eval_map_work_func(&eval_map_work);
> +		return -ENOMEM;
> +	}
> +
> +	queue_work(eval_map_wq, &eval_map_work);
> +	return 0;
> +}
> +
> +static int __init trace_eval_sync(void)
> +{
> +	/* Make sure the eval map updates are finished */
> +	if (eval_map_wq)
> +		destroy_workqueue(eval_map_wq);
> +	return 0;
> +}
> +
> +subsys_initcall(trace_eval_init);

You don't want to do this...

> +late_initcall_sync(trace_eval_sync);
> +
> +
>  #ifdef CONFIG_MODULES
>  static void trace_module_add_evals(struct module *mod)
>  {
> @@ -9171,8 +9202,6 @@ static __init int tracer_init_tracefs(void)
>  	trace_create_file("saved_tgids", 0444, NULL,
>  			NULL, &tracing_saved_tgids_fops);
>  
> -	trace_eval_init();
> -

... and this. Moving the trace_eval_init into its own initcall means it
gets started before tracer_init_tracefs. As it holds the
trace_event_sem while updating the eval maps, event_trace_init() then
blocks further initcall execution when it tries to grab this semaphore
a bit down the line, killing the parallelism we are trying to unlock
here.

With those 2 lines dropped, the change seems to work as intended and
shaves ~830ms from the kernel boot time on this system.

Regards,
Lucas

>  	trace_create_eval_file(NULL);
>  
>  #ifdef CONFIG_MODULES
Re: tracer_init_tracefs really slow
On Mon, 07 Dec 2020 17:24:58 +0100 Lucas Stach wrote:

> I would be happy to test a patch on our wimpy machines, though. :)

Let me know if this helps:

-- Steve

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 06134189e9a7..48aa5b6a7d15 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -9062,7 +9062,10 @@ int tracing_init_dentry(void)
 extern struct trace_eval_map *__start_ftrace_eval_maps[];
 extern struct trace_eval_map *__stop_ftrace_eval_maps[];
 
-static void __init trace_eval_init(void)
+static struct workqueue_struct *eval_map_wq __initdata;
+struct work_struct eval_map_work __initdata;
+
+static void __init eval_map_work_func(struct work_struct *work)
 {
 	int len;
 
@@ -9070,6 +9073,34 @@ static void __init trace_eval_init(void)
 	trace_insert_eval_map(NULL, __start_ftrace_eval_maps, len);
 }
 
+static int __init trace_eval_init(void)
+{
+	INIT_WORK(&eval_map_work, eval_map_work_func);
+
+	eval_map_wq = alloc_workqueue("eval_map_wq", WQ_UNBOUND, 0);
+	if (!eval_map_wq) {
+		pr_err("Unable to allocate eval_map_wq\n");
+		/* Do work here */
+		eval_map_work_func(&eval_map_work);
+		return -ENOMEM;
+	}
+
+	queue_work(eval_map_wq, &eval_map_work);
+	return 0;
+}
+
+static int __init trace_eval_sync(void)
+{
+	/* Make sure the eval map updates are finished */
+	if (eval_map_wq)
+		destroy_workqueue(eval_map_wq);
+	return 0;
+}
+
+subsys_initcall(trace_eval_init);
+late_initcall_sync(trace_eval_sync);
+
+
 #ifdef CONFIG_MODULES
 static void trace_module_add_evals(struct module *mod)
 {
@@ -9171,8 +9202,6 @@ static __init int tracer_init_tracefs(void)
 	trace_create_file("saved_tgids", 0444, NULL,
 			NULL, &tracing_saved_tgids_fops);
 
-	trace_eval_init();
-
 	trace_create_eval_file(NULL);
 
 #ifdef CONFIG_MODULES
Re: tracer_init_tracefs really slow
On Mon, 07 Dec 2020 17:24:58 +0100 Lucas Stach wrote:

> > We could make a thread that does this, that the init wakes up and runs,
> > letting the kernel move forward. Would you like to make that patch
> > or shall I?
>
> I guess you are much more likely to come up with a correct patch, as
> I'm not really clear yet on when we would need to synchronize this
> thread, to make sure things are available before they get used by
> something. I likely won't have time in the near future to read enough
> code in this particular spot of the kernel.

No problem. I've just been told to try to delegate when I can ;-)

> I would be happy to test a patch on our wimpy machines, though. :)

The result is only needed by user space, so we can make it a separate
thread that does the work while the kernel moves forward. The work is
wrapped with a mutex, so even if it's still processing when user space
comes up and tries to access it, user space will simply block until the
work is finished.

I'll write something up and send it to you. Thanks for the report!

-- Steve
Re: tracer_init_tracefs really slow
Hi Steven,

On Thursday, 2020-12-03 at 21:18 -0500, Steven Rostedt wrote:
> Sorry for the really late reply, but I received this while I was on
> vacation, and my backlog was so big when I got back that I left most of
> it unread. :-/ And to make matters worse, my out-of-office script
> wasn't working, to let people know I was on vacation.

No problem, I already figured that this might have fallen through the
cracks. It's also not really a high-prio issue for us.

> On Mon, 07 Sep 2020 18:16:52 +0200
> Lucas Stach wrote:
>
> > Hi all,
> >
> > one of my colleagues has taken a look at device boot times and stumbled
> > across a pretty big amount of kernel boot time being spent in
> > tracer_init_tracefs(). On this particular i.MX6Q based device the
> > kernel spends more than 1 second in this function, which is a
> > significant amount of the overall kernel initialization time. While
> > this machine is no rocket with its Cortex A9 @ 800MHz, the amount of
> > CPU time being used there is pretty irritating.
> >
> > Specifically, the issue lies within trace_event_eval_update, where
> > ~1100 trace_event_calls get updated with ~500 trace_eval_maps. I
> > haven't had a chance yet to dig any deeper or try to understand more
> > of what's going on there, but I wanted to get the issue out there in
> > case anyone has some cycles to spare to help us along.
>
> OK, that makes sense. The macro TRACE_DEFINE_ENUM() will make a mapping
> of enums to their values. This is needed because if an enum is used
> in TP_printk() of a TRACE_EVENT(), the name of the enum is passed to
> user space. The enum name is useless to user space, so this function
> will scan the strings that are exported to user space and convert the
> enum name to the enum value.
>
> > The obvious questions for now are:
> > 1. Why is this function so damn expensive (at least on this wimpy ARM
> > machine)? and
>
> Well, it's doing a string substitution for thousands of events.
>
> > 2. Could any of this be done asynchronously, to not block the kernel
> > in early init?
>
> Yes :-)
>
> We could make a thread that does this, that the init wakes up and runs,
> letting the kernel move forward. Would you like to make that patch
> or shall I?

I guess you are much more likely to come up with a correct patch, as
I'm not really clear yet on when we would need to synchronize this
thread, to make sure things are available before they get used by
something. I likely won't have time in the near future to read enough
code in this particular spot of the kernel.

I would be happy to test a patch on our wimpy machines, though. :)

Regards,
Lucas
Re: tracer_init_tracefs really slow
Sorry for the really late reply, but I received this while I was on
vacation, and my backlog was so big when I got back that I left most of
it unread. :-/ And to make matters worse, my out-of-office script
wasn't working, to let people know I was on vacation.

On Mon, 07 Sep 2020 18:16:52 +0200 Lucas Stach wrote:

> Hi all,
>
> one of my colleagues has taken a look at device boot times and stumbled
> across a pretty big amount of kernel boot time being spent in
> tracer_init_tracefs(). On this particular i.MX6Q based device the
> kernel spends more than 1 second in this function, which is a
> significant amount of the overall kernel initialization time. While
> this machine is no rocket with its Cortex A9 @ 800MHz, the amount of
> CPU time being used there is pretty irritating.
>
> Specifically, the issue lies within trace_event_eval_update, where ~1100
> trace_event_calls get updated with ~500 trace_eval_maps. I haven't had
> a chance yet to dig any deeper or try to understand more of what's
> going on there, but I wanted to get the issue out there in case anyone
> has some cycles to spare to help us along.

OK, that makes sense. The macro TRACE_DEFINE_ENUM() will make a mapping
of enums to their values. This is needed because if an enum is used in
TP_printk() of a TRACE_EVENT(), the name of the enum is passed to user
space. The enum name is useless to user space, so this function will
scan the strings that are exported to user space and convert the enum
name to the enum value.

> The obvious questions for now are:
> 1. Why is this function so damn expensive (at least on this wimpy ARM
> machine)? and

Well, it's doing a string substitution for thousands of events.

> 2. Could any of this be done asynchronously, to not block the kernel in
> early init?

Yes :-)

We could make a thread that does this, that the init wakes up and runs,
letting the kernel move forward. Would you like to make that patch or
shall I?

-- Steve
tracer_init_tracefs really slow
Hi all,

one of my colleagues has taken a look at device boot times and stumbled
across a pretty big amount of kernel boot time being spent in
tracer_init_tracefs(). On this particular i.MX6Q based device the
kernel spends more than 1 second in this function, which is a
significant amount of the overall kernel initialization time. While
this machine is no rocket with its Cortex A9 @ 800MHz, the amount of
CPU time being used there is pretty irritating.

Specifically, the issue lies within trace_event_eval_update, where ~1100
trace_event_calls get updated with ~500 trace_eval_maps. I haven't had a
chance yet to dig any deeper or try to understand more of what's going
on there, but I wanted to get the issue out there in case anyone has
some cycles to spare to help us along.

The obvious questions for now are:
1. Why is this function so damn expensive (at least on this wimpy ARM
   machine)? and
2. Could any of this be done asynchronously, to not block the kernel in
   early init?

Regards,
Lucas