Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
On Sat, Oct 17 2020 at 16:15, Alex Belits wrote:
> On Sat, 2020-10-17 at 18:08 +0200, Thomas Gleixner wrote:
>> On Sat, Oct 17 2020 at 01:08, Alex Belits wrote:
>>> I think that the goal of "finding source of disturbance" interface is
>>> different from what can be accomplished by tracing in two ways:
>>>
>>> 1. "Source of disturbance" should provide some useful information about
>>> category of event and its cause, as opposed to determining all precise
>>> details about things being called that resulted or could result in
>>> disturbance. It should not depend on the user's knowledge about details
>>
>> Tracepoints already give you selectively useful information.
>
> Carefully placed tracepoints can also give the user information about
> failures of open(), write(), execve() or mmap(). However, syscalls still
> provide an error code instead of returning a generic failure and letting
> the user debug the cause.

I have absolutely no idea what you are trying to tell me.

Thanks,

        tglx
Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
On Sat, 2020-10-17 at 18:08 +0200, Thomas Gleixner wrote:
> On Sat, Oct 17 2020 at 01:08, Alex Belits wrote:
>> On Mon, 2020-10-05 at 14:52 -0400, Nitesh Narayan Lal wrote:
>>> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
>>
>> I think that the goal of "finding source of disturbance" interface is
>> different from what can be accomplished by tracing in two ways:
>>
>> 1. "Source of disturbance" should provide some useful information about
>> category of event and its cause, as opposed to determining all precise
>> details about things being called that resulted or could result in
>> disturbance. It should not depend on the user's knowledge about details
>
> Tracepoints already give you selectively useful information.

Carefully placed tracepoints can also give the user information about
failures of open(), write(), execve() or mmap(). However, syscalls still
provide an error code instead of returning a generic failure and letting
the user debug the cause.

-- 
Alex
Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
On Sat, Oct 17 2020 at 01:08, Alex Belits wrote:
> On Mon, 2020-10-05 at 14:52 -0400, Nitesh Narayan Lal wrote:
>> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
>
> I think that the goal of "finding source of disturbance" interface is
> different from what can be accomplished by tracing in two ways:
>
> 1. "Source of disturbance" should provide some useful information about
> category of event and its cause, as opposed to determining all precise
> details about things being called that resulted or could result in
> disturbance. It should not depend on the user's knowledge about details

Tracepoints already give you selectively useful information.

> of implementations, it should provide some definite answer of what
> happened (with whatever amount of details can be given in a generic
> mechanism) even if the user has no idea how those things happen and
> what part of kernel is responsible for either causing or processing
> them. Then if the user needs further details, they can be obtained with
> tracing.

It's just a matter of defining the tracepoint at the right place.

> 2. It should be usable as a runtime error handling mechanism, so the
> information it provides should be suitable for application use and
> logging. It should be usable when applications are running on a system
> in production, and no specific tracing or monitoring mechanism can be
> in use.

That's a strawman, really. There is absolutely no reason why a specific
set of tracepoints cannot be enabled on a production system.

Your tracker is a monitoring mechanism, just a different flavour. By your
logic above it could not be enabled on a production system either.

Also, you can enable tracepoints from a control application, consume, log
and act upon them. It's not any different from opening some magic
isolation tracker interface. There are even multiple ways to do that,
including libraries.

> If, say, thousands of devices are controlling neutrino detectors on an
> ocean floor, and in a month of work one of them got one isolation
> breaking event, it should be able to report that isolation was broken
> by an interrupt from a network interface, so the users will be able to
> track it down to some userspace application reconfiguring those
> interrupts.

Tracing can do that, and it can do it selectively on the isolated CPUs.
It's just a matter of proper configuration and usage.

> It will be a good idea to make such mechanism optional and suitable for
> tracking things on conditions other than "always enabled" and "enabled
> with task isolation".

Tracing already provides that. Tracepoints are individually controlled
and filtered.

> However in my opinion, there should be something in kernel entry
> procedure that, if enabled, prepared something to be filled by the
> cause data, and we know at least one such situation when this kernel
> entry procedure should be triggered -- when task isolation is on.

A tracepoint will gather that information for you. Task isolation is not
special; it's just yet another way to configure and use a system, and
tracepoints provide everything you need, with the bonus that you can
gather more correlated information when you need it.

In fact, tracing and tracepoints have replaced all the specialized
trackers which were in the kernel before tracing was available. We're not
going to add a new one just because. If there is anything you find that
tracing and tracepoints cannot provide, then the obvious solution is to
extend that infrastructure so it can serve your use case.

Thanks,

        tglx
Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
On Mon, 2020-10-05 at 14:52 -0400, Nitesh Narayan Lal wrote:
> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
>> On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
>>> On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
>>>> External Email
>>>>
>>>> ----------------------------------------------------------------------
>>>> On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
>>>>> +/*
>>>>> + * Description of the last two tasks that ran isolated on a given
>>>>> CPU.
>>>>> + * This is intended only for messages about isolation breaking. We
>>>>> + * don't want any references to actual task while accessing this
>>>>> from
>>>>> + * CPU that caused isolation breaking -- we know nothing about
>>>>> timing
>>>>> + * and don't want to use locking or RCU.
>>>>> + */
>>>>> +struct isol_task_desc {
>>>>> +	atomic_t curr_index;
>>>>> +	atomic_t curr_index_wr;
>>>>> +	bool	warned[2];
>>>>> +	pid_t	pid[2];
>>>>> +	pid_t	tgid[2];
>>>>> +	char	comm[2][TASK_COMM_LEN];
>>>>> +};
>>>>> +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
>>>>
>>>> So that's quite a huge patch that would have needed to be split up.
>>>> Especially this tracing engine.
>>>>
>>>> Speaking of which, I agree with Thomas that it's unnecessary. It's
>>>> too much code and complexity. We can use the existing trace events
>>>> and perform the analysis from userspace to find the source of the
>>>> disturbance.
>>>
>>> The idea behind this is that isolation breaking events are supposed to
>>> be known to the applications while applications run normally, and they
>>> should not require any analysis or human intervention to be handled.
>>
>> Sure but you can use trace events for that. Just trace interrupts,
>> workqueues, timers, syscalls, exceptions and scheduler events and you
>> get all the local disturbance. You might want to tune a few filters
>> but that's pretty much it.
>>
>> As for the source of the disturbances, if you really need that
>> information, you can trace the workqueue and timer queue events and
>> just filter those that target your isolated CPUs.
>
> I agree that we can do all those things with tracing.
> However, IMHO having a simplified logging mechanism to gather the
> source of violation may help in reducing the manual effort.
>
> Although, I am not sure how easy it will be to maintain such an
> interface over time.

I think that the goal of the "finding source of disturbance" interface is
different from what can be accomplished by tracing in two ways:

1. "Source of disturbance" should provide some useful information about
the category of the event and its cause, as opposed to determining all
precise details about things being called that resulted or could result
in disturbance. It should not depend on the user's knowledge about
details of implementations; it should provide some definite answer of
what happened (with whatever amount of detail can be given in a generic
mechanism) even if the user has no idea how those things happen and what
part of the kernel is responsible for either causing or processing them.
Then if the user needs further details, they can be obtained with
tracing.

2. It should be usable as a runtime error handling mechanism, so the
information it provides should be suitable for application use and
logging. It should be usable when applications are running on a system in
production, and no specific tracing or monitoring mechanism can be in
use. If, say, thousands of devices are controlling neutrino detectors on
an ocean floor, and in a month of work one of them got one isolation
breaking event, it should be able to report that isolation was broken by
an interrupt from a network interface, so the users will be able to track
it down to some userspace application reconfiguring those interrupts.

It will be a good idea to make such a mechanism optional and suitable for
tracking things on conditions other than "always enabled" and "enabled
with task isolation". However, in my opinion, there should be something
in the kernel entry procedure that, if enabled, prepared something to be
filled with the cause data, and we know at least one such situation when
this kernel entry procedure should be triggered -- when task isolation is
on.

-- 
Alex
Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
On Tue, 2020-10-06 at 12:35 +0200, Frederic Weisbecker wrote:
> On Mon, Oct 05, 2020 at 02:52:49PM -0400, Nitesh Narayan Lal wrote:
>> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
>>> On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
>>>>
>>>> The idea behind this is that isolation breaking events are supposed
>>>> to be known to the applications while applications run normally, and
>>>> they should not require any analysis or human intervention to be
>>>> handled.
>>>
>>> Sure but you can use trace events for that. Just trace interrupts,
>>> workqueues, timers, syscalls, exceptions and scheduler events and you
>>> get all the local disturbance. You might want to tune a few filters
>>> but that's pretty much it.
>>>
>>> As for the source of the disturbances, if you really need that
>>> information, you can trace the workqueue and timer queue events and
>>> just filter those that target your isolated CPUs.
>>
>> I agree that we can do all those things with tracing.
>> However, IMHO having a simplified logging mechanism to gather the
>> source of violation may help in reducing the manual effort.
>>
>> Although, I am not sure how easy it will be to maintain such an
>> interface over time.
>
> The thing is: tracing is your simplified logging mechanism here. You
> can achieve the same in userspace with _way_ less code, no race, and
> you can do it in bash.

The idea is that this mechanism should be usable when no one is there to
run things in bash, or there is no information about what might happen.
It should be able to report rare events in production when users may not
be able to reproduce them.

-- 
Alex
Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
On Mon, 2020-10-05 at 01:14 +0200, Frederic Weisbecker wrote:
>>> Speaking of which, I agree with Thomas that it's unnecessary. It's
>>> too much code and complexity. We can use the existing trace events
>>> and perform the analysis from userspace to find the source of the
>>> disturbance.
>>
>> The idea behind this is that isolation breaking events are supposed to
>> be known to the applications while applications run normally, and they
>> should not require any analysis or human intervention to be handled.
>
> Sure but you can use trace events for that. Just trace interrupts,
> workqueues, timers, syscalls, exceptions and scheduler events and you
> get all the local disturbance. You might want to tune a few filters
> but that's pretty much it.

And keep all tracing enabled all the time, just to be able to figure out
that a disturbance happened at all?

Or do you mean that we can use the kernel entry mechanism to reliably
determine that an isolation breaking event happened (so the
isolation-breaking procedure can be triggered as early as possible), yet
avoid trying to determine why exactly it happened, and use tracing if we
want to know?

The original patch did the opposite: it triggered the isolation-breaking
procedure only once it was known specifically what kind of event
happened -- a hardware interrupt, IPI, syscall, page fault, or any other
kind of exception, possibly something architecture-specific. This, of
course, always had a potential problem with coverage -- if handling of
something is missing, isolation breaking is not handled at all, and
there is no obvious way of finding out if we covered everything. This
also made the patch large and somewhat ugly. When I added a mechanism
for low-level isolation breaking handling on kernel entry, it also
partially improved the problem with completeness.

Partially, because I have not yet added handling of "unknown cause"
before returning to userspace; however, that would be a logical thing to
do. Then if we entered the kernel from isolation, did something, and are
returning to userspace still not knowing what kind of isolation-breaking
event happened, we can still trigger isolation breaking.

Did I get it right, and you mean that we can remove all specific
handling of isolation breaking causes, except for the syscall that exits
isolation, and report isolation breaking instead of normally returning
to userspace? Then isolation breaking will be handled reliably without
knowing the cause, and we can leave determining the cause to the tracing
mechanism (if enabled)? This does make sense.

However, to me this looks somewhat strange, because I assume isolation
breaking to be a kind of runtime error that userspace software is
supposed to get some basic information about -- like signals
distinguishing between, say, SIGSEGV and SIGPIPE, or write() being able
to set errno to ENOSPC or EIO. Then userspace receives basic information
about the cause of the exception or error, and can do some meaningful
reporting, or decide if the error should be fatal for the application or
handled differently, based on its internal logic. To get those
distinctions, the application does not have to be aware of anything
internal to the kernel. Similarly, distinguishing between, say, a page
fault, a device interrupt and a timer may be important for logic
implemented in userspace, and I think it may be nice to allow userspace
to get this information immediately, without being aware of any
additional details of the kernel implementation.

The current patch doesn't do this yet; however, the intention is to
implement reliable isolation breaking by checking on userspace re-entry,
plus make reporting of causes, if any were found, visible to userspace
in some convenient way.

The part that determines the cause can be implemented separately from
the isolation breaking mechanism. Then we can have isolation breaking on
kernel entry (or potentially some other condition on kernel entry that
requires logging the cause) enable reporting; then the reporting
mechanism, if it exists, will fill in the blanks, and once either the
cause is known or it's time to return to userspace, notification will be
done with whatever information is available. For some in-depth analysis,
if necessary for debugging the kernel, we can have tracing check whether
we are in this "suspicious kernel entry" mode, and log things that
otherwise would not be.

> As for the source of the disturbances, if you really need that
> information, you can trace the workqueue and timer queue events and
> just filter those that target your isolated CPUs.

For the purpose of a human debugging the kernel or an application, more
information is (usually) better, so the only concern here is that the
user is now responsible for the completeness of the things he is
tracing. However, from the application's point of view, or for logging
in a production environment, it's usually more important to get the
general type of events, so it's possible to, say, confirm that
Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
On Mon, Oct 05, 2020 at 02:52:49PM -0400, Nitesh Narayan Lal wrote:
> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
>> On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
>>> On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
>>>> On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
>>>>> +/*
>>>>> + * Description of the last two tasks that ran isolated on a given
>>>>> CPU.
>>>>> + * This is intended only for messages about isolation breaking. We
>>>>> + * don't want any references to actual task while accessing this
>>>>> from
>>>>> + * CPU that caused isolation breaking -- we know nothing about
>>>>> timing
>>>>> + * and don't want to use locking or RCU.
>>>>> + */
>>>>> +struct isol_task_desc {
>>>>> +	atomic_t curr_index;
>>>>> +	atomic_t curr_index_wr;
>>>>> +	bool	warned[2];
>>>>> +	pid_t	pid[2];
>>>>> +	pid_t	tgid[2];
>>>>> +	char	comm[2][TASK_COMM_LEN];
>>>>> +};
>>>>> +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
>>>>
>>>> So that's quite a huge patch that would have needed to be split up.
>>>> Especially this tracing engine.
>>>>
>>>> Speaking of which, I agree with Thomas that it's unnecessary. It's
>>>> too much code and complexity. We can use the existing trace events
>>>> and perform the analysis from userspace to find the source of the
>>>> disturbance.
>>>
>>> The idea behind this is that isolation breaking events are supposed to
>>> be known to the applications while applications run normally, and they
>>> should not require any analysis or human intervention to be handled.
>>
>> Sure but you can use trace events for that. Just trace interrupts,
>> workqueues, timers, syscalls, exceptions and scheduler events and you
>> get all the local disturbance. You might want to tune a few filters
>> but that's pretty much it.
>>
>> As for the source of the disturbances, if you really need that
>> information, you can trace the workqueue and timer queue events and
>> just filter those that target your isolated CPUs.
>
> I agree that we can do all those things with tracing.
> However, IMHO having a simplified logging mechanism to gather the
> source of violation may help in reducing the manual effort.
>
> Although, I am not sure how easy it will be to maintain such an
> interface over time.

The thing is: tracing is your simplified logging mechanism here. You can
achieve the same in userspace with _way_ less code, no race, and you can
do it in bash.

Thanks.
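[Editorial note: a minimal sketch of the bash-level approach Frederic describes, under the assumption that tracefs is mounted at /sys/kernel/tracing and that CPU 3 is the isolated CPU. The event names are from the stock kernel event lists, but exact availability and filterable fields depend on kernel version and config.]

```shell
cd /sys/kernel/tracing

# Restrict the per-CPU trace buffers to the isolated CPU only.
echo 8 > tracing_cpumask                    # bitmask with bit 3 set

# Local disturbance sources: IRQs, timers, workqueues, syscalls.
echo 1 > events/irq/irq_handler_entry/enable
echo 1 > events/timer/hrtimer_expire_entry/enable
echo 1 > events/workqueue/workqueue_execute_start/enable
echo 1 > events/raw_syscalls/sys_enter/enable

# On recent kernels the common_cpu field allows per-event CPU filters,
# e.g. to restrict an event to occurrences on the isolated CPU:
echo 'common_cpu == 3' > events/timer/timer_start/filter

# Consume and log; a control application can read this instead of bash.
cat trace_pipe
```

The same setup can be driven programmatically (the libtracefs/libtraceevent libraries tglx alludes to earlier in the thread), which is what a production logging daemon would do instead of cat.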
Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
>> On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
>>> On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
>>>> +/*
>>>> + * Description of the last two tasks that ran isolated on a given
>>>> CPU.
>>>> + * This is intended only for messages about isolation breaking. We
>>>> + * don't want any references to actual task while accessing this
>>>> from
>>>> + * CPU that caused isolation breaking -- we know nothing about
>>>> timing
>>>> + * and don't want to use locking or RCU.
>>>> + */
>>>> +struct isol_task_desc {
>>>> +	atomic_t curr_index;
>>>> +	atomic_t curr_index_wr;
>>>> +	bool	warned[2];
>>>> +	pid_t	pid[2];
>>>> +	pid_t	tgid[2];
>>>> +	char	comm[2][TASK_COMM_LEN];
>>>> +};
>>>> +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
>>>
>>> So that's quite a huge patch that would have needed to be split up.
>>> Especially this tracing engine.
>>>
>>> Speaking of which, I agree with Thomas that it's unnecessary. It's
>>> too much code and complexity. We can use the existing trace events
>>> and perform the analysis from userspace to find the source of the
>>> disturbance.
>>
>> The idea behind this is that isolation breaking events are supposed to
>> be known to the applications while applications run normally, and they
>> should not require any analysis or human intervention to be handled.
>
> Sure but you can use trace events for that. Just trace interrupts,
> workqueues, timers, syscalls, exceptions and scheduler events and you
> get all the local disturbance. You might want to tune a few filters
> but that's pretty much it.
>
> As for the source of the disturbances, if you really need that
> information, you can trace the workqueue and timer queue events and
> just filter those that target your isolated CPUs.

I agree that we can do all those things with tracing.
However, IMHO having a simplified logging mechanism to gather the source
of violation may help in reducing the manual effort.

Although, I am not sure how easy it will be to maintain such an
interface over time.

-- 
Thanks
Nitesh
Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
> On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
>> On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
>>> +/*
>>> + * Description of the last two tasks that ran isolated on a given
>>> CPU.
>>> + * This is intended only for messages about isolation breaking. We
>>> + * don't want any references to actual task while accessing this
>>> from
>>> + * CPU that caused isolation breaking -- we know nothing about
>>> timing
>>> + * and don't want to use locking or RCU.
>>> + */
>>> +struct isol_task_desc {
>>> +	atomic_t curr_index;
>>> +	atomic_t curr_index_wr;
>>> +	bool	warned[2];
>>> +	pid_t	pid[2];
>>> +	pid_t	tgid[2];
>>> +	char	comm[2][TASK_COMM_LEN];
>>> +};
>>> +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
>>
>> So that's quite a huge patch that would have needed to be split up.
>> Especially this tracing engine.
>>
>> Speaking of which, I agree with Thomas that it's unnecessary. It's
>> too much code and complexity. We can use the existing trace events
>> and perform the analysis from userspace to find the source of the
>> disturbance.
>
> The idea behind this is that isolation breaking events are supposed to
> be known to the applications while applications run normally, and they
> should not require any analysis or human intervention to be handled.

Sure but you can use trace events for that. Just trace interrupts,
workqueues, timers, syscalls, exceptions and scheduler events and you
get all the local disturbance. You might want to tune a few filters but
that's pretty much it.

As for the source of the disturbances, if you really need that
information, you can trace the workqueue and timer queue events and just
filter those that target your isolated CPUs.

> A process may exit isolation because some leftover delayed work, for
> example a timer or a workqueue, is still present on a CPU, or because
> a page fault or some other exception, normally handled silently, is
> caused by the task. It is also possible to direct an interrupt to a CPU
> that is running an isolated task -- currently it's perfectly valid to
> set interrupt smp affinity to a CPU running an isolated task, and then
> the interrupt will break isolation. While it's probably not the best
> way of handling interrupts, I would rather not prohibit this
> explicitly.

Sure, but you can trace all these events with the existing tracing
interface we have.

> There is also a matter of avoiding race conditions on entering
> isolation. Once a CPU entered isolation, other CPUs should avoid
> disturbing it when they know that the CPU is running a task in isolated
> mode. However, for a short time after entering isolation other CPUs may
> be unaware of this, and will still send IPIs to it. Preventing this
> scenario completely would be very costly in terms of what other CPUs
> will have to do before notifying others, so similar to how EINTR works,
> we can simply specify that this is allowed, and the task is supposed to
> re-enter isolation after this. It's still a bad idea to specify that
> isolation breaking can continue happening while the application is
> running in isolated mode, however allowing some "grace period" after
> entering is acceptable as long as the application is aware of this
> happening.

Right, but that doesn't look related to tracing. Anyway, I guess we can
make the CPU enter some specific mode after calling synchronize_rcu().

> In libtmc I have moved this handling of isolation breaking into a
> separate thread, intended to become a separate daemon if necessary. In
> part it was done because the initial implementation of isolation made
> it very difficult to avoid repeating delayed work on isolated CPUs, so
> something had to watch for it from a non-isolated CPU. It's possible
> that now, when delayed work does not appear on isolated CPUs out of
> nowhere, the need for an isolation manager thread will disappear, and
> the task itself will be able to handle all isolation breaking, like the
> original implementation by Chris was supposed to.
>
> However, in either case it's still useful for the task, or the
> isolation manager, to get a description of the isolation-breaking
> event. This is what those things are intended for. Now they only
> produce log messages because this is where initially all description of
> isolation-breaking events went; however, I would prefer to make logging
> optional but always let applications read those event descriptions,
> regardless of any tracing mechanism being used. I was more focused on
> making the reporting mechanism properly detect the cause of isolation
> breaking, because that functionality was not quite working in earlier
> work by Chris and Yuri, so I have kept logging as the only output, but
> made it suitable for producing events that applications will be able to
> receive.
Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
On Thu, 2020-10-01 at 16:40 +0200, Frederic Weisbecker wrote:
> On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
>> +/**
>> + * task_isolation_kernel_enter() - clear low-level task isolation
>> flag
>> + *
>> + * This should be called immediately after entering kernel.
>> + */
>> +static inline void task_isolation_kernel_enter(void)
>> +{
>> +	unsigned long flags;
>> +
>> +	/*
>> +	 * This function runs on a CPU that ran isolated task.
>> +	 *
>> +	 * We don't want this CPU running code from the rest of kernel
>> +	 * until other CPUs know that it is no longer isolated.
>> +	 * When CPU is running isolated task until this point anything
>> +	 * that causes an interrupt on this CPU must end up calling this
>> +	 * before touching the rest of kernel. That is, this function or
>> +	 * fast_task_isolation_cpu_cleanup() or stop_isolation() calling
>> +	 * it. If any interrupt, including scheduling timer, arrives, it
>> +	 * will still end up here early after entering kernel.
>> +	 * From this point interrupts are disabled until all CPUs will
>> +	 * see that this CPU is no longer running isolated task.
>> +	 *
>> +	 * See also fast_task_isolation_cpu_cleanup().
>> +	 */
>> +	smp_rmb();
>
> I'm a bit confused what this read memory barrier is ordering. Also
> against what it pairs.

My bad, I kept it in after there were no longer any write accesses from
other CPUs.

>> +	if ((this_cpu_read(ll_isol_flags) & FLAG_LL_TASK_ISOLATION) == 0)
>> +		return;
>> +
>> +	local_irq_save(flags);
>> +
>> +	/* Clear low-level flags */
>> +	this_cpu_write(ll_isol_flags, 0);
>> +
>> +	/*
>> +	 * If something happened that requires a barrier that would
>> +	 * otherwise be called from remote CPUs by CPU kick procedure,
>> +	 * this barrier runs instead of it. After this barrier, CPU
>> +	 * kick procedure would see the updated ll_isol_flags, so it
>> +	 * will run its own IPI to trigger a barrier.
>> +	 */
>> +	smp_mb();
>> +	/*
>> +	 * Synchronize instructions -- this CPU was not kicked while
>> +	 * in isolated mode, so it might require synchronization.
>> +	 * There might be an IPI if kick procedure happened and
>> +	 * ll_isol_flags was already updated while it assembled a CPU
>> +	 * mask. However if this did not happen, synchronize everything
>> +	 * here.
>> +	 */
>> +	instr_sync();
>
> It's the first time I meet an instruction barrier. I should get
> information about that, but what is it ordering here?

Against barriers in instruction cache flushing (flush_icache_range() and
such).

>> +	local_irq_restore(flags);
>> +}
>
> Thanks.
Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel
On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
> On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
>> +/*
>> + * Description of the last two tasks that ran isolated on a given
>> CPU.
>> + * This is intended only for messages about isolation breaking. We
>> + * don't want any references to actual task while accessing this
>> from
>> + * CPU that caused isolation breaking -- we know nothing about
>> timing
>> + * and don't want to use locking or RCU.
>> + */
>> +struct isol_task_desc {
>> +	atomic_t curr_index;
>> +	atomic_t curr_index_wr;
>> +	bool	warned[2];
>> +	pid_t	pid[2];
>> +	pid_t	tgid[2];
>> +	char	comm[2][TASK_COMM_LEN];
>> +};
>> +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
>
> So that's quite a huge patch that would have needed to be split up.
> Especially this tracing engine.
>
> Speaking of which, I agree with Thomas that it's unnecessary. It's too
> much code and complexity. We can use the existing trace events and
> perform the analysis from userspace to find the source of the
> disturbance.

The idea behind this is that isolation breaking events are supposed to
be known to the applications while applications run normally, and they
should not require any analysis or human intervention to be handled.

A process may exit isolation because some leftover delayed work, for
example a timer or a workqueue, is still present on a CPU, or because a
page fault or some other exception, normally handled silently, is caused
by the task. It is also possible to direct an interrupt to a CPU that is
running an isolated task -- currently it's perfectly valid to set
interrupt smp affinity to a CPU running an isolated task, and then the
interrupt will break isolation. While it's probably not the best way of
handling interrupts, I would rather not prohibit this explicitly.

There is also a matter of avoiding race conditions on entering
isolation. Once a CPU entered isolation, other CPUs should avoid
disturbing it when they know that the CPU is running a task in isolated
mode. However, for a short time after entering isolation other CPUs may
be unaware of this, and will still send IPIs to it. Preventing this
scenario completely would be very costly in terms of what other CPUs
will have to do before notifying others, so similar to how EINTR works,
we can simply specify that this is allowed, and the task is supposed to
re-enter isolation after this. It's still a bad idea to specify that
isolation breaking can continue happening while the application is
running in isolated mode, however allowing some "grace period" after
entering is acceptable as long as the application is aware of this
happening.

In libtmc I have moved this handling of isolation breaking into a
separate thread, intended to become a separate daemon if necessary. In
part it was done because the initial implementation of isolation made it
very difficult to avoid repeating delayed work on isolated CPUs, so
something had to watch for it from a non-isolated CPU. It's possible
that now, when delayed work does not appear on isolated CPUs out of
nowhere, the need for an isolation manager thread will disappear, and
the task itself will be able to handle all isolation breaking, like the
original implementation by Chris was supposed to.

However, in either case it's still useful for the task, or the isolation
manager, to get a description of the isolation-breaking event. This is
what those things are intended for. Now they only produce log messages,
because this is where initially all description of isolation-breaking
events went; however, I would prefer to make logging optional but always
let applications read those event descriptions, regardless of any
tracing mechanism being used.

I was more focused on making the reporting mechanism properly detect the
cause of isolation breaking, because that functionality was not quite
working in earlier work by Chris and Yuri, so I have kept logging as the
only output, but made it suitable for producing events that applications
will be able to receive. The application, or isolation manager, will
receive clear and unambiguous reporting, so there will be no need for
any additional analysis or guesswork.

After adding proper "low-level" isolation flags, I got the idea that we
might have an even better reporting mechanism. Early isolation breaking
detection on kernel entry may set a flag that says that isolation
breaking happened but its cause is unknown. Or, more likely, only some
general information about the isolation breaking is available, like the
type of exception. Then, once a known isolation-breaking reporting
mechanism is called from interrupt, syscall, IPI or exception
processing, the flag is cleared, and reporting is supposed to be done.
However, if the kernel then returns to userspace on an isolated task but
isolation breaking is not reported yet, an isolation breaking reporting