Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-17 Thread Thomas Gleixner
On Sat, Oct 17 2020 at 16:15, Alex Belits wrote:
> On Sat, 2020-10-17 at 18:08 +0200, Thomas Gleixner wrote:
>> On Sat, Oct 17 2020 at 01:08, Alex Belits wrote:
> > I think that the goal of "finding source of disturbance" interface is
> > different from what can be accomplished by tracing in two ways:
> > 
> > 1. "Source of disturbance" should provide some useful information
> > about the category of the event and its cause, as opposed to
> > determining all precise details about things being called that
> > resulted or could result in disturbance. It should not depend on the
> > user's knowledge about details
>> 
>> Tracepoints already give you selectively useful information.
>
> Carefully placed tracepoints can also give the user information about
> failures of open(), write(), execve() or mmap(). However, syscalls
> still provide an error code instead of returning a generic failure and
> letting the user debug the cause.

I have absolutely no idea what you are trying to tell me.

Thanks,

tglx


Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-17 Thread Alex Belits

On Sat, 2020-10-17 at 18:08 +0200, Thomas Gleixner wrote:
> On Sat, Oct 17 2020 at 01:08, Alex Belits wrote:
> > On Mon, 2020-10-05 at 14:52 -0400, Nitesh Narayan Lal wrote:
> > > On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> > I think that the goal of "finding source of disturbance" interface is
> > different from what can be accomplished by tracing in two ways:
> > 
> > 1. "Source of disturbance" should provide some useful information
> > about the category of the event and its cause, as opposed to
> > determining all precise details about things being called that
> > resulted or could result in disturbance. It should not depend on the
> > user's knowledge about details
> 
> Tracepoints already give you selectively useful information.

Carefully placed tracepoints can also give the user information about
failures of open(), write(), execve() or mmap(). However, syscalls
still provide an error code instead of returning a generic failure and
letting the user debug the cause.

-- 
Alex


Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-17 Thread Thomas Gleixner
On Sat, Oct 17 2020 at 01:08, Alex Belits wrote:
> On Mon, 2020-10-05 at 14:52 -0400, Nitesh Narayan Lal wrote:
>> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> I think that the goal of "finding source of disturbance" interface is
> different from what can be accomplished by tracing in two ways:
>
> 1. "Source of disturbance" should provide some useful information about
> the category of the event and its cause, as opposed to determining all
> precise details about things being called that resulted or could result
> in disturbance. It should not depend on the user's knowledge about
> details

Tracepoints already give you selectively useful information.

> of implementations; it should provide a definite answer about what
> happened (with whatever amount of detail can be given by a generic
> mechanism) even if the user has no idea how those things happen and
> what part of the kernel is responsible for either causing or processing
> them. Then, if the user needs further details, they can be obtained
> with tracing.

It's just a matter of defining the tracepoint at the right place.

> 2. It should be usable as a runtime error handling mechanism, so the
> information it provides should be suitable for application use and
> logging. It should be usable when applications are running on a system
> in production, where no specific tracing or monitoring mechanism may be
> in use.

That's a strawman really. There is absolutely no reason why a specific
set of tracepoints cannot be enabled on a production system.

Your tracker is a monitoring mechanism, just a different flavour.  By
your logic above it cannot be enabled on a production system either.

Also you can enable tracepoints from a control application, consume, log
and act upon them. It's not any different from opening some magic
isolation tracker interface. There are even multiple ways to do that
including libraries.
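
A minimal sketch of such a control application, assuming tracefs is
mounted at /sys/kernel/tracing and using only the stock event enable,
tracing_cpumask and trace_pipe files; the event and CPU-mask choices
below are just examples:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TRACEFS "/sys/kernel/tracing"

/* Write a short string into a tracefs control file. */
static int tracefs_write(const char *path, const char *val)
{
        char full[256];
        int fd, ret;

        snprintf(full, sizeof(full), TRACEFS "/%s", path);
        fd = open(full, O_WRONLY);
        if (fd < 0)
                return -1;
        ret = write(fd, val, strlen(val)) < 0 ? -1 : 0;
        close(fd);
        return ret;
}

int main(void)
{
        char line[512];
        FILE *fp;

        /* Trace only the isolated CPU (here CPU 3, i.e. mask 0x8). */
        tracefs_write("tracing_cpumask", "8");
        /* Enable only the events the application cares about. */
        tracefs_write("events/irq/irq_handler_entry/enable", "1");
        tracefs_write("tracing_on", "1");

        /* Consume events as they arrive; log them or act upon them here. */
        fp = fopen(TRACEFS "/trace_pipe", "r");
        if (!fp)
                return 1;
        while (fgets(line, sizeof(line), fp))
                fprintf(stderr, "disturbance: %s", line);
        fclose(fp);
        return 0;
}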

> If, say, thousands of devices are controlling neutrino detectors on an
> ocean floor, and in a month of work one of them got one isolation
> breaking event, it should be able to report that isolation was broken
> by an interrupt from a network interface, so the users will be able to
> track it down to some userspace application reconfiguring those
> interrupts.

Tracing can do that and it can do it selectively on the isolated
CPUs. It's just a matter of proper configuration and usage.

> It will be a good idea to make such a mechanism optional and suitable
> for tracking things under conditions other than "always enabled" and
> "enabled with task isolation".

Tracing already provides that. Tracepoints are individually controlled
and filtered.

> However, in my opinion, there should be something in the kernel entry
> procedure that, if enabled, prepares something to be filled with the
> cause data, and we know at least one situation in which this kernel
> entry procedure should be triggered -- when task isolation is on.

A tracepoint will gather that information for you.

task isolation is not special; it's just yet another way to configure
and use a system, and tracepoints provide everything you need, with the
bonus that you can gather more correlated information when you need it.

In fact tracing and tracepoints have replaced all specialized trackers
which were in the kernel before tracing was available. We're not going
to add a new one just because.

If there is anything you find that tracing and tracepoints cannot
provide, then the obvious solution is to extend that infrastructure so
it can serve your use case.

Thanks,

tglx


Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-16 Thread Alex Belits

On Mon, 2020-10-05 at 14:52 -0400, Nitesh Narayan Lal wrote:
> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> > On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
> > > On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
> > > > On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
> > > > > +/*
> > > > > + * Description of the last two tasks that ran isolated on a given CPU.
> > > > > + * This is intended only for messages about isolation breaking. We
> > > > > + * don't want any references to actual task while accessing this from
> > > > > + * CPU that caused isolation breaking -- we know nothing about timing
> > > > > + * and don't want to use locking or RCU.
> > > > > + */
> > > > > +struct isol_task_desc {
> > > > > +        atomic_t curr_index;
> > > > > +        atomic_t curr_index_wr;
> > > > > +        bool    warned[2];
> > > > > +        pid_t   pid[2];
> > > > > +        pid_t   tgid[2];
> > > > > +        char    comm[2][TASK_COMM_LEN];
> > > > > +};
> > > > > +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
> > > > So that's quite a huge patch that would have needed to be split up.
> > > > Especially this tracing engine.
> > > > 
> > > > Speaking of which, I agree with Thomas that it's unnecessary. It's
> > > > too much code and complexity. We can use the existing trace events
> > > > and perform the analysis from userspace to find the source of the
> > > > disturbance.
> > > The idea behind this is that isolation breaking events are supposed
> > > to be known to the applications while applications run normally, and
> > > they should not require any analysis or human intervention to be
> > > handled.
> > Sure but you can use trace events for that. Just trace interrupts,
> > workqueues, timers, syscalls, exceptions and scheduler events and you
> > get all the local disturbance. You might want to tune a few filters
> > but that's pretty much it.
> > 
> > As for the source of the disturbances, if you really need that
> > information, you can trace the workqueue and timer queue events and
> > just filter those that target your isolated CPUs.
> > 
> 
> I agree that we can do all those things with tracing.
> However, IMHO having a simplified logging mechanism to gather the
> source of violation may help in reducing the manual effort.
> 
> Although, I am not sure how easy it will be to maintain such an
> interface over time.

I think that the goal of "finding source of disturbance" interface is
different from what can be accomplished by tracing in two ways:

1. "Source of disturbance" should provide some useful information about
the category of the event and its cause, as opposed to determining all
precise details about things being called that resulted or could result
in disturbance. It should not depend on the user's knowledge about
details of implementations; it should provide a definite answer about
what happened (with whatever amount of detail can be given by a generic
mechanism) even if the user has no idea how those things happen and
what part of the kernel is responsible for either causing or processing
them. Then, if the user needs further details, they can be obtained
with tracing.

2. It should be usable as a runtime error handling mechanism, so the
information it provides should be suitable for application use and
logging. It should be usable when applications are running on a system
in production, where no specific tracing or monitoring mechanism may be
in use. If, say, thousands of devices are controlling neutrino
detectors on an ocean floor, and in a month of work one of them got one
isolation breaking event, it should be able to report that isolation
was broken by an interrupt from a network interface, so the users will
be able to track it down to some userspace application reconfiguring
those interrupts.

It will be a good idea to make such a mechanism optional and suitable
for tracking things under conditions other than "always enabled" and
"enabled with task isolation". However, in my opinion, there should be
something in the kernel entry procedure that, if enabled, prepares
something to be filled with the cause data, and we know at least one
situation in which this kernel entry procedure should be triggered --
when task isolation is on.
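
A minimal sketch of what such a record could look like, prepared on
kernel entry and filled in by whichever handler knows the cause; all
names here are hypothetical and are not taken from the posted patches:

#include <linux/percpu.h>
#include <linux/types.h>

/* Hypothetical per-CPU record of the last isolation-breaking cause. */
struct isol_break_report {
        bool            pending;   /* set on kernel entry, cleared on report */
        int             cause;     /* e.g. IRQ number, syscall number, ...   */
        const char      *category; /* "irq", "syscall", "page fault", ...    */
};
static DEFINE_PER_CPU(struct isol_break_report, isol_break_report);

/* Called from the kernel entry path when the current task was isolated. */
static void isol_break_prepare(void)
{
        struct isol_break_report *r = this_cpu_ptr(&isol_break_report);

        r->pending = true;
        r->cause = -1;
        r->category = "unknown";
}

/* Called later by whichever handler knows what actually happened. */
static void isol_break_set_cause(const char *category, int cause)
{
        struct isol_break_report *r = this_cpu_ptr(&isol_break_report);

        if (r->pending) {
                r->category = category;
                r->cause = cause;
        }
}

Whatever ends up consuming this record (a log message, a character
device, or something else) can then report the category even when the
user has no idea which kernel subsystem produced the event.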

-- 
Alex


Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-16 Thread Alex Belits

On Tue, 2020-10-06 at 12:35 +0200, Frederic Weisbecker wrote:
> On Mon, Oct 05, 2020 at 02:52:49PM -0400, Nitesh Narayan Lal wrote:
> > On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> > > On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
> > > 
> > > > The idea behind this is that isolation breaking events are
> > > > supposed to be known to the applications while applications run
> > > > normally, and they should not require any analysis or human
> > > > intervention to be handled.
> > > Sure but you can use trace events for that. Just trace interrupts,
> > > workqueues, timers, syscalls, exceptions and scheduler events and
> > > you get all the local disturbance. You might want to tune a few
> > > filters but that's pretty much it.
> 
> > > As for the source of the disturbances, if you really need that
> > > information, you can trace the workqueue and timer queue events and
> > > just filter those that target your isolated CPUs.
> > > 
> > 
> > I agree that we can do all those things with tracing.
> > However, IMHO having a simplified logging mechanism to gather the
> > source of violation may help in reducing the manual effort.
> > 
> > Although, I am not sure how easy it will be to maintain such an
> > interface over time.
> 
> The thing is: tracing is your simplified logging mechanism here. You
> can achieve the same in userspace with _way_ less code, no race, and
> you can do it in bash.

The idea is that this mechanism should be usable when no one is there
to run things in bash, or when there is no information about what might
happen. It should be able to report rare events in production when
users may not be able to reproduce them.

-- 
Alex


Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-06 Thread Alex Belits
On Mon, 2020-10-05 at 01:14 +0200, Frederic Weisbecker wrote:
> > > Speaking of which, I agree with Thomas that it's unnecessary. It's
> > > too much code and complexity. We can use the existing trace events
> > > and perform the analysis from userspace to find the source of the
> > > disturbance.
> > 
> > The idea behind this is that isolation breaking events are supposed
> > to be known to the applications while applications run normally, and
> > they should not require any analysis or human intervention to be
> > handled.
> 
> Sure but you can use trace events for that. Just trace interrupts,
> workqueues, timers, syscalls, exceptions and scheduler events and you
> get all the local disturbance. You might want to tune a few filters
> but that's pretty much it.

And keep all tracing enabled all the time, just to be able to figure
out that disturbance happened at all?

Or do you mean that we can use the kernel entry mechanism to reliably
determine that an isolation breaking event happened (so the isolation-
breaking procedure can be triggered as early as possible), yet avoid
trying to determine why exactly it happened, and use tracing if we want
to know?

The original patch did the opposite: it triggered the isolation-breaking
procedure only once it was known specifically what kind of event
happened -- a hardware interrupt, IPI, syscall, page fault, or any
other kind of exception, possibly something architecture-specific.
This, of course, always had a potential problem with coverage -- if
handling of something is missing, isolation breaking is not handled at
all, and there is no obvious way of finding out whether we covered
everything. This also made the patch large and somewhat ugly.

When I added a mechanism for low-level isolation breaking handling on
kernel entry, it also partially improved the problem with completeness.
Partially, because I have not yet added handling of an "unknown cause"
before returning to userspace; however, that would be a logical thing
to do. Then, if we entered the kernel from isolation, did something,
and are returning to userspace still not knowing what kind of
isolation-breaking event happened, we can still trigger isolation
breaking.
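
A rough sketch of that "unknown cause" fallback, with hypothetical flag
and function names that are not taken from the posted patches:

#include <linux/percpu.h>
#include <linux/printk.h>
#include <linux/smp.h>

/* Hypothetical per-CPU state; the real flag names in the series differ. */
#define LL_ISOL_BROKEN          0x1  /* kernel was entered from isolation   */
#define LL_ISOL_CAUSE_KNOWN     0x2  /* some handler already reported cause */

static DEFINE_PER_CPU(unsigned long, ll_isol_break_flags);

/* Kernel entry path: note the break, cause still unknown. */
static void isol_note_kernel_entry(void)
{
        this_cpu_or(ll_isol_break_flags, LL_ISOL_BROKEN);
}

/* A handler that knows the cause calls this before returning. */
static void isol_note_cause(const char *what)
{
        pr_warn("task isolation broken by %s on CPU %d\n",
                what, smp_processor_id());
        this_cpu_or(ll_isol_break_flags, LL_ISOL_CAUSE_KNOWN);
}

/* Return-to-userspace path: report even if nobody attributed the break. */
static void isol_note_kernel_exit(void)
{
        unsigned long flags = this_cpu_read(ll_isol_break_flags);

        if ((flags & LL_ISOL_BROKEN) && !(flags & LL_ISOL_CAUSE_KNOWN))
                pr_warn("task isolation broken by unknown cause on CPU %d\n",
                        smp_processor_id());
        this_cpu_write(ll_isol_break_flags, 0);
}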

Did I get it right that you mean we can remove all specific handling of
isolation breaking causes, except for the syscall that exits isolation,
and report isolation breaking instead of returning to userspace
normally? Then isolation breaking will be handled reliably without
knowing the cause, and we can leave determining the cause to the
tracing mechanism (if enabled)?

This does make sense. However, to me it looks somewhat strange, because
I assume isolation breaking to be a kind of runtime error that
userspace software is supposed to get some basic information about --
like signals distinguishing between, say, SIGSEGV and SIGPIPE, or
write() being able to set errno to ENOSPC or EIO. Then userspace
receives basic information about the cause of the exception or error,
and can do some meaningful reporting, or decide whether the error
should be fatal for the application or handled differently, based on
its internal logic. To get those distinctions, the application does not
have to be aware of anything internal to the kernel.

Similarly, distinguishing between, say, a page fault, a device interrupt
and a timer may be important for logic implemented in userspace, and I
think it may be nice to allow userspace to get this information
immediately, without being aware of any additional details of the
kernel implementation. The current patch doesn't do this yet; however,
the intention is to implement reliable isolation breaking by checking
on userspace re-entry, plus make reporting of causes, if any were
found, visible to userspace in some convenient way.
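
As an illustration of the kind of distinction meant here -- categories
only, in the spirit of errno values or signal numbers; none of these
names exist in the posted patches:

enum isol_break_category {
        ISOL_BREAK_UNKNOWN = 0, /* only detected on return to userspace */
        ISOL_BREAK_SYSCALL,     /* the task itself entered the kernel   */
        ISOL_BREAK_PAGE_FAULT,  /* or another synchronous exception     */
        ISOL_BREAK_IRQ,         /* device interrupt routed to this CPU  */
        ISOL_BREAK_IPI,         /* another CPU kicked this one          */
        ISOL_BREAK_TIMER,       /* leftover timer or deferred work      */
};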

The part that determines the cause can be implemented separately from
the isolation breaking mechanism. Then isolation breaking on kernel
entry (or potentially some other condition on kernel entry that
requires logging the cause) can enable reporting; the reporting
mechanism, if it exists, will fill in the blanks, and once either the
cause is known or it's time to return to userspace, notification will
be done with whatever information is available. For some in-depth
analysis, if necessary for debugging the kernel, we can have tracing
check whether we are in this "suspicious kernel entry" mode, and log
things that otherwise would not be logged.

> As for the source of the disturbances, if you really need that
> information, you can trace the workqueue and timer queue events and
> just filter those that target your isolated CPUs.

For the purpose of a human debugging the kernel or an application, more
information is (usually) better, so the only concern here is that the
user is now responsible for the completeness of the things he is
tracing. However, from the application's point of view, or for logging
in a production environment, it's usually more important to get the
general type of events, so it's possible to, say, confirm that 

Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-06 Thread Frederic Weisbecker
On Mon, Oct 05, 2020 at 02:52:49PM -0400, Nitesh Narayan Lal wrote:
> 
> On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> > On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
> >> On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
> >>> On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
>  +/*
>  + * Description of the last two tasks that ran isolated on a given CPU.
>  + * This is intended only for messages about isolation breaking. We
>  + * don't want any references to actual task while accessing this from
>  + * CPU that caused isolation breaking -- we know nothing about timing
>  + * and don't want to use locking or RCU.
>  + */
>  +struct isol_task_desc {
>  +        atomic_t curr_index;
>  +        atomic_t curr_index_wr;
>  +        bool    warned[2];
>  +        pid_t   pid[2];
>  +        pid_t   tgid[2];
>  +        char    comm[2][TASK_COMM_LEN];
>  +};
>  +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
> >>> So that's quite a huge patch that would have needed to be split up.
> >>> Especially this tracing engine.
> >>>
> >>> Speaking of which, I agree with Thomas that it's unnecessary. It's
> >>> too much code and complexity. We can use the existing trace events
> >>> and perform the analysis from userspace to find the source of the
> >>> disturbance.
> >> The idea behind this is that isolation breaking events are supposed to
> >> be known to the applications while applications run normally, and they
> >> should not require any analysis or human intervention to be handled.
> > Sure but you can use trace events for that. Just trace interrupts,
> > workqueues, timers, syscalls, exceptions and scheduler events and you
> > get all the local disturbance. You might want to tune a few filters
> > but that's pretty much it.
> >
> > As for the source of the disturbances, if you really need that
> > information, you can trace the workqueue and timer queue events and
> > just filter those that target your isolated CPUs.
> >
> 
> I agree that we can do all those things with tracing.
> However, IMHO having a simplified logging mechanism to gather the source of
> violation may help in reducing the manual effort.
> 
> Although, I am not sure how easy it will be to maintain such an interface
> over time.

The thing is: tracing is your simplified logging mechanism here. You can achieve
the same in userspace with _way_ less code, no race, and you can do it in
bash.

Thanks.


Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-05 Thread Nitesh Narayan Lal

On 10/4/20 7:14 PM, Frederic Weisbecker wrote:
> On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
>> On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
>>> On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
>>>> +/*
>>>> + * Description of the last two tasks that ran isolated on a given CPU.
>>>> + * This is intended only for messages about isolation breaking. We
>>>> + * don't want any references to actual task while accessing this from
>>>> + * CPU that caused isolation breaking -- we know nothing about timing
>>>> + * and don't want to use locking or RCU.
>>>> + */
>>>> +struct isol_task_desc {
>>>> +        atomic_t curr_index;
>>>> +        atomic_t curr_index_wr;
>>>> +        bool    warned[2];
>>>> +        pid_t   pid[2];
>>>> +        pid_t   tgid[2];
>>>> +        char    comm[2][TASK_COMM_LEN];
>>>> +};
>>>> +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
>>> So that's quite a huge patch that would have needed to be split up.
>>> Especially this tracing engine.
>>>
>>> Speaking of which, I agree with Thomas that it's unnecessary. It's
>>> too much code and complexity. We can use the existing trace events
>>> and perform the analysis from userspace to find the source of the
>>> disturbance.
>> The idea behind this is that isolation breaking events are supposed to
>> be known to the applications while applications run normally, and they
>> should not require any analysis or human intervention to be handled.
> Sure but you can use trace events for that. Just trace interrupts, workqueues,
> timers, syscalls, exceptions and scheduler events and you get all the local
> disturbance. You might want to tune a few filters but that's pretty much it.
>
> As for the source of the disturbances, if you really need that information,
> you can trace the workqueue and timer queue events and just filter those that
> target your isolated CPUs.
>

I agree that we can do all those things with tracing.
However, IMHO having a simplified logging mechanism to gather the source of
violation may help in reducing the manual effort.

Although, I am not sure how easy it will be to maintain such an interface
over time.

--
Thanks
Nitesh


Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-04 Thread Frederic Weisbecker
On Sun, Oct 04, 2020 at 02:44:39PM +, Alex Belits wrote:
> On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
> > On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
> > > +/*
> > > + * Description of the last two tasks that ran isolated on a given CPU.
> > > + * This is intended only for messages about isolation breaking. We
> > > + * don't want any references to actual task while accessing this from
> > > + * CPU that caused isolation breaking -- we know nothing about timing
> > > + * and don't want to use locking or RCU.
> > > + */
> > > +struct isol_task_desc {
> > > +        atomic_t curr_index;
> > > +        atomic_t curr_index_wr;
> > > +        bool    warned[2];
> > > +        pid_t   pid[2];
> > > +        pid_t   tgid[2];
> > > +        char    comm[2][TASK_COMM_LEN];
> > > +};
> > > +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
> > 
> > So that's quite a huge patch that would have needed to be split up.
> > Especially this tracing engine.
> > 
> > Speaking of which, I agree with Thomas that it's unnecessary. It's
> > too much code and complexity. We can use the existing trace events
> > and perform the analysis from userspace to find the source of the
> > disturbance.
> 
> The idea behind this is that isolation breaking events are supposed to
> be known to the applications while applications run normally, and they
> should not require any analysis or human intervention to be handled.

Sure but you can use trace events for that. Just trace interrupts, workqueues,
timers, syscalls, exceptions and scheduler events and you get all the local
disturbance. You might want to tune a few filters but that's pretty much it.

As for the source of the disturbances, if you really need that information,
you can trace the workqueue and timer queue events and just filter those that
target your isolated CPUs.

> 
> A process may exit isolation because some leftover delayed work, for
> example, a timer or a workqueue, is still present on a CPU, or because
> a page fault or some other exception, normally handled silently, is
> caused by the task. It is also possible to direct an interrupt to a CPU
> that is running an isolated task -- currently it's perfectly valid to
> set interrupt smp affinity to a CPU running an isolated task, and then
> the interrupt will cause isolation breaking. While it's probably not the
> best way of handling interrupts, I would rather not prohibit this
> explicitly.

Sure, but you can trace all these events with the existing tracing
interface we have.

> 
> There is also a matter of avoiding race conditions on entering
> isolation. Once a CPU entered isolation, other CPUs should avoid
> disturbing it when they know that the CPU is running a task in isolated
> mode. However, for a short time after entering isolation other CPUs may
> be unaware of this, and will still send IPIs to it. Preventing this
> scenario completely would be very costly in terms of what other CPUs
> will have to do before notifying others, so, similar to how EINTR
> works, we can simply specify that this is allowed, and the task is
> supposed to re-enter isolation after this. It's still a bad idea to
> specify that isolation breaking can continue happening while the
> application is running in isolated mode, however allowing some "grace
> period" after entering is acceptable as long as the application is
> aware of this happening.

Right but that doesn't look related to tracing. Anyway I guess we
can make the CPU enter some specific mode after calling synchronize_rcu().

> 
> In libtmc I have moved this handling of isolation breaking into a
> separate thread, intended to become a separate daemon if necessary. In
> part it was done because the initial implementation of isolation made
> it very difficult to avoid repeating delayed work on isolated CPUs, so
> something had to watch for it from a non-isolated CPU. It's possible
> that now, when delayed work does not appear on isolated CPUs out of
> nowhere, the need for an isolation manager thread will disappear, and
> the task itself will be able to handle all isolation breaking, as the
> original implementation by Chris was supposed to.
> 
> However, in either case it's still useful for the task, or the
> isolation manager, to get a description of the isolation-breaking
> event. This is what those things are intended for. Now they only
> produce log messages, because this is where all descriptions of
> isolation-breaking events initially went; however, I would prefer to
> make logging optional but always let applications read those event
> descriptions, regardless of any tracing mechanism being used. I was
> more focused on making the reporting mechanism properly detect the
> cause of isolation breaking, because that functionality was not quite
> working in earlier work by Chris and Yuri, so I have kept logging as
> the only output, but made it suitable for producing events that
> applications 

Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-04 Thread Alex Belits

On Thu, 2020-10-01 at 16:40 +0200, Frederic Weisbecker wrote:
> On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
> > +/**
> > + * task_isolation_kernel_enter() - clear low-level task isolation flag
> > + *
> > + * This should be called immediately after entering kernel.
> > + */
> > +static inline void task_isolation_kernel_enter(void)
> > +{
> > +        unsigned long flags;
> > +
> > +        /*
> > +         * This function runs on a CPU that ran isolated task.
> > +         *
> > +         * We don't want this CPU running code from the rest of kernel
> > +         * until other CPUs know that it is no longer isolated.
> > +         * When CPU is running isolated task until this point anything
> > +         * that causes an interrupt on this CPU must end up calling this
> > +         * before touching the rest of kernel. That is, this function or
> > +         * fast_task_isolation_cpu_cleanup() or stop_isolation() calling
> > +         * it. If any interrupt, including scheduling timer, arrives, it
> > +         * will still end up here early after entering kernel.
> > +         * From this point interrupts are disabled until all CPUs will see
> > +         * that this CPU is no longer running isolated task.
> > +         *
> > +         * See also fast_task_isolation_cpu_cleanup().
> > +         */
> > +        smp_rmb();
> 
> I'm a bit confused what this read memory barrier is ordering. Also
> against what it pairs.

My bad, I have kept it after there were no write accesses from other
CPUs left.

> 
> > +        if ((this_cpu_read(ll_isol_flags) & FLAG_LL_TASK_ISOLATION) == 0)
> > +                return;
> > +
> > +        local_irq_save(flags);
> > +
> > +        /* Clear low-level flags */
> > +        this_cpu_write(ll_isol_flags, 0);
> > +
> > +        /*
> > +         * If something happened that requires a barrier that would
> > +         * otherwise be called from remote CPUs by CPU kick procedure,
> > +         * this barrier runs instead of it. After this barrier, CPU
> > +         * kick procedure would see the updated ll_isol_flags, so it
> > +         * will run its own IPI to trigger a barrier.
> > +         */
> > +        smp_mb();
> > +        /*
> > +         * Synchronize instructions -- this CPU was not kicked while
> > +         * in isolated mode, so it might require synchronization.
> > +         * There might be an IPI if kick procedure happened and
> > +         * ll_isol_flags was already updated while it assembled a CPU
> > +         * mask. However if this did not happen, synchronize everything
> > +         * here.
> > +         */
> > +        instr_sync();
> 
> It's the first time I meet an instruction barrier. I should get
> information about that but what is it ordering here?

Against barriers in instruction cache flushing (flush_icache_range()
and such). 

> > +        local_irq_restore(flags);
> > +}
> 
> Thanks.



Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard isolation from kernel

2020-10-04 Thread Alex Belits
On Thu, 2020-10-01 at 15:56 +0200, Frederic Weisbecker wrote:
> On Wed, Jul 22, 2020 at 02:49:49PM +, Alex Belits wrote:
> > +/*
> > + * Description of the last two tasks that ran isolated on a given CPU.
> > + * This is intended only for messages about isolation breaking. We
> > + * don't want any references to actual task while accessing this from
> > + * CPU that caused isolation breaking -- we know nothing about timing
> > + * and don't want to use locking or RCU.
> > + */
> > +struct isol_task_desc {
> > +        atomic_t curr_index;
> > +        atomic_t curr_index_wr;
> > +        bool    warned[2];
> > +        pid_t   pid[2];
> > +        pid_t   tgid[2];
> > +        char    comm[2][TASK_COMM_LEN];
> > +};
> > +static DEFINE_PER_CPU(struct isol_task_desc, isol_task_descs);
> 
> So that's quite a huge patch that would have needed to be split up.
> Especially this tracing engine.
> 
> Speaking of which, I agree with Thomas that it's unnecessary. It's
> too much code and complexity. We can use the existing trace events
> and perform the analysis from userspace to find the source of the
> disturbance.

The idea behind this is that isolation breaking events are supposed to
be known to the applications while applications run normally, and they
should not require any analysis or human intervention to be handled.

A process may exit isolation because some leftover delayed work, for
example, a timer or a workqueue, is still present on a CPU, or because
a page fault or some other exception, normally handled silently, is
caused by the task. It is also possible to direct an interrupt to a CPU
that is running an isolated task -- currently it's perfectly valid to
set interrupt smp affinity to a CPU running an isolated task, and then
the interrupt will cause isolation breaking. While it's probably not the
best way of handling interrupts, I would rather not prohibit this
explicitly.

There is also a matter of avoiding race conditions on entering
isolation. Once a CPU entered isolation, other CPUs should avoid
disturbing it when they know that the CPU is running a task in isolated
mode. However, for a short time after entering isolation other CPUs may
be unaware of this, and will still send IPIs to it. Preventing this
scenario completely would be very costly in terms of what other CPUs
will have to do before notifying others, so, similar to how EINTR
works, we can simply specify that this is allowed, and the task is
supposed to re-enter isolation after this. It's still a bad idea to
specify that isolation breaking can continue happening while the
application is running in isolated mode, however allowing some "grace
period" after entering is acceptable as long as the application is
aware of this happening.
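
In userspace, that EINTR-like contract could look roughly like the loop
below. The prctl() command and flag names are placeholders standing in
for the series' actual interface, and the numeric value is only there
so the sketch compiles against an unpatched header:

#include <stdio.h>
#include <sys/prctl.h>

/* Placeholders; the real definitions come from the patched <linux/prctl.h>. */
#ifndef PR_TASK_ISOLATION
#define PR_TASK_ISOLATION               48      /* placeholder value */
#define PR_TASK_ISOLATION_ENABLE        1       /* placeholder value */
#endif

static volatile int broken;

/* Runs with no syscalls until something (e.g. a signal handler) sets
 * "broken" to indicate that isolation was lost. */
static void isolated_work(void)
{
        while (!broken)
                ;       /* polling loop, packet processing, etc. */
        broken = 0;
}

int main(void)
{
        for (;;) {
                /* Ask the kernel to isolate this task on its current CPU. */
                if (prctl(PR_TASK_ISOLATION, PR_TASK_ISOLATION_ENABLE,
                          0, 0, 0) != 0) {
                        perror("prctl(PR_TASK_ISOLATION)");
                        return 1;
                }

                isolated_work();

                /*
                 * Isolation was broken, possibly by a late IPI from a CPU
                 * that had not yet seen the isolated state. Treat it like
                 * EINTR: log it and re-enter isolation.
                 */
                fprintf(stderr, "isolation broken, re-entering\n");
        }
}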

In libtmc I have moved this handling of isolation breaking into a
separate thread, intended to become a separate daemon if necessary. In
part it was done because the initial implementation of isolation made
it very difficult to avoid repeating delayed work on isolated CPUs, so
something had to watch for it from a non-isolated CPU. It's possible
that now, when delayed work does not appear on isolated CPUs out of
nowhere, the need for an isolation manager thread will disappear, and
the task itself will be able to handle all isolation breaking, as the
original implementation by Chris was supposed to.

However, in either case it's still useful for the task, or the
isolation manager, to get a description of the isolation-breaking
event. This is what those things are intended for. Now they only
produce log messages, because this is where all descriptions of
isolation-breaking events initially went; however, I would prefer to
make logging optional but always let applications read those event
descriptions, regardless of any tracing mechanism being used. I was
more focused on making the reporting mechanism properly detect the
cause of isolation breaking, because that functionality was not quite
working in earlier work by Chris and Yuri, so I have kept logging as
the only output, but made it suitable for producing events that
applications will be able to receive. The application, or the isolation
manager, will receive clear and unambiguous reporting, so there will be
no need for any additional analysis or guesswork.

After adding proper "low-level" isolation flags, I got the idea that we
might have an even better reporting mechanism. Early isolation breaking
detection on kernel entry may set a flag that says that isolation
breaking happened but its cause is unknown. Or, more likely, only some
general information about the isolation breaking is available, like the
type of exception. Then, once a known isolation-breaking reporting
mechanism is called from interrupt, syscall, IPI or exception
processing, the flag is cleared, and reporting is supposed to be done.
However, if the kernel then returns to userspace in an isolated task
but isolation breaking has not been reported yet, an isolation breaking
reporting