Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-17 Thread Roger Pau Monné
On Mon, Dec 17, 2018 at 09:47:30AM -0700, Jan Beulich wrote:
> >>> On 17.12.18 at 16:42,  wrote:
> > On Fri, Dec 14, 2018 at 04:52:00AM -0700, Jan Beulich wrote:
> >> >>> On 14.12.18 at 12:45,  wrote:
> >> > On Fri, Dec 14, 2018 at 03:45:21AM -0700, Jan Beulich wrote:
> >> >> >>> On 14.12.18 at 11:03,  wrote:
> >> >> > I expect the interdomain locking resulting from the use of a paging caller
> >> >> > domain to be restricted to the p2m lock of the caller domain,
> >> >> > due to the use of the copy to/from helpers.
> >> >> > 
> >> >> > Maybe the less intrusive change would be to just allow locking the
> >> >> > caller's p2m lock (that lock only) regardless of the subject domain's
> >> >> > lock level?
> >> >> 
> >> >> With caller != subject, and with the lock level then being higher
> >> >> than all "normal" ones, this might be an option. But from the
> >> >> very beginning we should keep the transitive aspect here in
> >> >> mind: if Dom0 controls a PVH domain which controls an HVM one,
> >> >> the Dom0 p2m lock may also need to be allowed to nest inside the PVH
> >> >> DomU's one, so it'll be a total of two extra new lock levels even
> >> >> if we restrict this to the p2m locks.
> >> > 
> >> > I'm not sure I follow here. So far we have spoken about a subject and
> >> > a caller domain (subject being the target of the operation). In the
> >> > above scenario I see a relation between Dom0 and the PVH domain, and
> >> > between the PVH domain and the HVM one, but not a relation that
> >> > encompasses the three domains in terms of mm locking.
> >> > 
> >> > Dom0 (caller) mm lock could be nested inside the PVH DomU (subject) mm
> >> > locks, and the PVH DomU (caller) locks could be nested inside the HVM
> >> > domain (subject) locks, but I don't see an operation where Dom0 mm
> >> > lock could be nested inside of a PVH DomU lock that's already nested
> >> > inside of the HVM DomU lock.
> >> 
> >> Well, if we're _sure_ this can't happen, then of course we also
> >> don't need to deal with it.
> > 
> > So I've got a patch that adds a positive offset to the lock priority
> > when the lock owner is the current domain. This allows taking other
> > domains' (subject) mm locks and then the current domain's mm locks.
> > 
> > There's one slight issue with the page sharing lock, which will always
> > be locked as belonging to a subject domain because there's no clear
> > owner of that lock. AFAICT some page-sharing routines even run from
> > the idle thread.
> > 
> > I'm attaching the patch below; note it keeps the same lock order as
> > before, but allows taking the caller's locks _only_ after having taken
> > some of the subject's mm locks. Note this doesn't allow interleaved mm
> > locking between the subject and the caller.
> > 
> > The nice part about this patch is that the changes are mostly confined
> > to mm-locks.h; there's only one extra change, to xen/arch/x86/mm/p2m-pod.c.
> 
> Yes, this looks quite reasonable at first glance. One thing I
> don't understand is why you compare domain IDs instead of
> struct domain pointers.

This patch started as something else, where I planned to store the
domid, and I just reused that. Comparing domain pointers is indeed
better.

> The other thing is that perhaps it would
> be a good idea to factor the biasing construct out into a macro or
> inline function.

Sure.

While there I might also add a pre-patch to convert __check_lock_level
and __set_lock_level into functions instead of macros.
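Something like the below, as a rough sketch (this assumes the switch to
comparing struct domain pointers suggested above; names are illustrative
only):

static inline int _lock_level(const struct domain *d, int l)
{
    /*
     * Bias the levels of locks owned by the current (caller) domain above
     * all subject domain levels, so caller locks can nest inside subject
     * locks but not the other way around.
     */
    return l + (d == current->domain ? MM_LOCK_ORDER_MAX : 0);
}

static inline void _check_lock_level(const struct domain *d, int l)
{
    const int lvl = _lock_level(d, l);

    if ( unlikely(__get_lock_level() > lvl) )
    {
        printk("mm locking order violation: %i > %i\n",
               __get_lock_level(), lvl);
        BUG();
    }
}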

Thanks, Roger.


Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-17 Thread Jan Beulich
>>> On 17.12.18 at 16:42,  wrote:
> On Fri, Dec 14, 2018 at 04:52:00AM -0700, Jan Beulich wrote:
>> >>> On 14.12.18 at 12:45,  wrote:
>> > On Fri, Dec 14, 2018 at 03:45:21AM -0700, Jan Beulich wrote:
>> >> >>> On 14.12.18 at 11:03,  wrote:
>> >> > I expect the interdomain locking resulting from the use of a paging caller
>> >> > domain to be restricted to the p2m lock of the caller domain,
>> >> > due to the use of the copy to/from helpers.
>> >> > 
>> >> > Maybe the less intrusive change would be to just allow locking the
>> >> > caller's p2m lock (that lock only) regardless of the subject domain's
>> >> > lock level?
>> >> 
>> >> With caller != subject, and with the lock level then being higher
>> >> than all "normal" ones, this might be an option. But from the
>> >> very beginning we should keep the transitive aspect here in
>> >> mind: if Dom0 controls a PVH domain which controls an HVM one,
>> >> the Dom0 p2m lock may also need to be allowed to nest inside the PVH
>> >> DomU's one, so it'll be a total of two extra new lock levels even
>> >> if we restrict this to the p2m locks.
>> > 
>> > I'm not sure I follow here. So far we have spoken about a subject and
>> > a caller domain (subject being the target of the operation). In the
>> > above scenario I see a relation between Dom0 and the PVH domain, and
>> > between the PVH domain and the HVM one, but not a relation that
>> > encompasses the three domains in terms of mm locking.
>> > 
>> > Dom0 (caller) mm lock could be nested inside the PVH DomU (subject) mm
>> > locks, and the PVH DomU (caller) locks could be nested inside the HVM
>> > domain (subject) locks, but I don't see an operation where Dom0 mm
>> > lock could be nested inside of a PVH DomU lock that's already nested
>> > inside of the HVM DomU lock.
>> 
>> Well, if we're _sure_ this can't happen, then of course we also
>> don't need to deal with it.
> 
> So I've got a patch that adds a positive offset to the lock priority
> when the lock owner is the current domain. This allows taking other
> domains' (subject) mm locks and then the current domain's mm locks.
> 
> There's one slight issue with the page sharing lock, which will always
> be locked as belonging to a subject domain because there's no clear
> owner of that lock. AFAICT some page-sharing routines even run from
> the idle thread.
> 
> I'm attaching the patch below; note it keeps the same lock order as
> before, but allows taking the caller's locks _only_ after having taken
> some of the subject's mm locks. Note this doesn't allow interleaved mm
> locking between the subject and the caller.
> 
> The nice part about this patch is that the changes are mostly confined
> to mm-locks.h; there's only one extra change, to xen/arch/x86/mm/p2m-pod.c.

Yes, this looks quite reasonable at first glance. One thing I
don't understand is why you compare domain IDs instead of
struct domain pointers. The other thing is that perhaps it would
be a good idea to factor the biasing construct out into a macro or
inline function.

> I am however not sure how to prove there are no valid paths that could
> require a different lock ordering. Am I expected to check all possible
> code paths for mm lock ordering?

Well, this is a really good question. To do such an audit properly
would require quite a bit of time, so I think it would be unfair to
demand this of you. Yet without such a proof we risk running into
unforeseen issues with this sooner or later. I have no good idea.
George?

Jan




Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-17 Thread Roger Pau Monné
On Fri, Dec 14, 2018 at 04:52:00AM -0700, Jan Beulich wrote:
> >>> On 14.12.18 at 12:45,  wrote:
> > On Fri, Dec 14, 2018 at 03:45:21AM -0700, Jan Beulich wrote:
> >> >>> On 14.12.18 at 11:03,  wrote:
> >> > I expect the interdomain locking resulting from the use of a paging caller
> >> > domain to be restricted to the p2m lock of the caller domain,
> >> > due to the use of the copy to/from helpers.
> >> > 
> >> > Maybe the less intrusive change would be to just allow locking the
> >> > caller's p2m lock (that lock only) regardless of the subject domain's
> >> > lock level?
> >> 
> >> With caller != subject, and with the lock level then being higher
> >> than all "normal" ones, this might be an option. But from the
> >> very beginning we should keep the transitive aspect here in
> >> mind: if Dom0 controls a PVH domain which controls an HVM one,
> >> the Dom0 p2m lock may also need to be allowed to nest inside the PVH
> >> DomU's one, so it'll be a total of two extra new lock levels even
> >> if we restrict this to the p2m locks.
> > 
> > I'm not sure I follow here. So far we have spoken about a subject and
> > a caller domain (subject being the target of the operation). In the
> > above scenario I see a relation between Dom0 and the PVH domain, and
> > between the PVH domain and the HVM one, but not a relation that
> > encompasses the three domains in terms of mm locking.
> > 
> > Dom0 (caller) mm lock could be nested inside the PVH DomU (subject) mm
> > locks, and the PVH DomU (caller) locks could be nested inside the HVM
> > domain (subject) locks, but I don't see an operation where Dom0 mm
> > lock could be nested inside of a PVH DomU lock that's already nested
> > inside of the HVM DomU lock.
> 
> Well, if we're _sure_ this can't happen, then of course we also
> don't need to deal with it.

So I've got a patch that adds a positive offset to the lock priority
when the lock owner is the current domain. This allows taking other
domains' (subject) mm locks and then the current domain's mm locks.

There's one slight issue with the page sharing lock, which will always
be locked as belonging to a subject domain because there's no clear
owner of that lock. AFAICT some page-sharing routines even run from
the idle thread.

I'm attaching the patch below; note it keeps the same lock order as
before, but allows taking the caller's locks _only_ after having taken
some of the subject's mm locks. Note this doesn't allow interleaved mm
locking between the subject and the caller.

The nice part about this patch is that the changes are mostly confined
to mm-locks.h; there's only one extra change, to xen/arch/x86/mm/p2m-pod.c.

I am however not sure how to prove there are no valid paths that could
require a different lock ordering. Am I expected to check all possible
code paths for mm lock ordering?

I think this solution introduces the minimal set of modifications needed
to fix the current issue, ie: the paging caller being able to use the
copy to/from helpers after taking subject mm locks.

Thanks, Roger.

---8<---
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 95295b62d2..a47beaf893 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -46,29 +46,36 @@ static inline int mm_locked_by_me(mm_lock_t *l)
 return (l->lock.recurse_cpu == current->processor);
 }
 
+#define MM_LOCK_ORDER_MAX            64
+
 /*
  * If you see this crash, the numbers printed are order levels defined
  * in this file.
  */
-#define __check_lock_level(l)   \
-do {\
-if ( unlikely(__get_lock_level() > (l)) )   \
-{   \
-printk("mm locking order violation: %i > %i\n", \
-   __get_lock_level(), (l));\
-BUG();  \
-}   \
+#define __check_lock_level(d, l)\
+do {\
+if ( unlikely(__get_lock_level() >  \
+ (l) + ((d) == current->domain->domain_id ? \
+MM_LOCK_ORDER_MAX : 0)) )   \
+{   \
+printk("mm locking order violation: %i > %i\n", \
+   __get_lock_level(),  \
+   (l) + ((d) == current->domain->domain_id ?   \
+  MM_LOCK_ORDER_MAX : 0));  \
+BUG();  \
+}   \
 } while(0)
 
-#define __set_lock_level(l) \
-do {\
-__get_lock_level() = (l);   \
+#define __set_lock_level(l) \
+do {   

Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-14 Thread Jan Beulich
>>> On 14.12.18 at 12:45,  wrote:
> On Fri, Dec 14, 2018 at 03:45:21AM -0700, Jan Beulich wrote:
>> >>> On 14.12.18 at 11:03,  wrote:
>> > I expect the interdomain locking resulting from the use of a paging caller
>> > domain to be restricted to the p2m lock of the caller domain,
>> > due to the use of the copy to/from helpers.
>> > 
>> > Maybe the less intrusive change would be to just allow locking the
>> > caller's p2m lock (that lock only) regardless of the subject domain's
>> > lock level?
>> 
>> With caller != subject, and with the lock level then being higher
>> than all "normal" ones, this might be an option. But from the
>> very beginning we should keep the transitive aspect here in
>> mind: if Dom0 controls a PVH domain which controls an HVM one,
>> the Dom0 p2m lock may also need to be allowed to nest inside the PVH
>> DomU's one, so it'll be a total of two extra new lock levels even
>> if we restrict this to the p2m locks.
> 
> I'm not sure I follow here. So far we have spoken about a subject and
> a caller domain (subject being the target of the operation). In the
> above scenario I see a relation between Dom0 and the PVH domain, and
> between the PVH domain and the HVM one, but not a relation that
> encompasses the three domains in terms of mm locking.
> 
> Dom0 (caller) mm lock could be nested inside the PVH DomU (subject) mm
> locks, and the PVH DomU (caller) locks could be nested inside the HVM
> domain (subject) locks, but I don't see an operation where Dom0 mm
> lock could be nested inside of a PVH DomU lock that's already nested
> inside of the HVM DomU lock.

Well, if we're _sure_ this can't happen, then of course we also
don't need to deal with it.

Jan




Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-14 Thread Roger Pau Monné
On Fri, Dec 14, 2018 at 03:45:21AM -0700, Jan Beulich wrote:
> >>> On 14.12.18 at 11:03,  wrote:
> > I expect the interdomain locking resulting from the use of a paging caller
> > domain to be restricted to the p2m lock of the caller domain,
> > due to the use of the copy to/from helpers.
> > 
> > Maybe the less intrusive change would be to just allow locking the
> > caller's p2m lock (that lock only) regardless of the subject domain's
> > lock level?
> 
> With caller != subject, and with the lock level then being higher
> than all "normal" ones, this might be an option. But from the
> very beginning we should keep the transitive aspect here in
> mind: if Dom0 controls a PVH domain which controls an HVM one,
> the Dom0 p2m lock may also need to be allowed to nest inside the PVH
> DomU's one, so it'll be a total of two extra new lock levels even
> if we restrict this to the p2m locks.

I'm not sure I follow here. So far we have spoken about a subject and
a caller domain (subject being the target of the operation). In the
above scenario I see a relation between Dom0 and the PVH domain, and
between the PVH domain and the HVM one, but not a relation that
encompasses the three domains in terms of mm locking.

Dom0 (caller) mm lock could be nested inside the PVH DomU (subject) mm
locks, and the PVH DomU (caller) locks could be nested inside the HVM
domain (subject) locks, but I don't see an operation where Dom0 mm
lock could be nested inside of a PVH DomU lock that's already nested
inside of the HVM DomU lock.

> 
> Furthermore, if we limited this to the p2m locks, it would become
> illegal to obtain any of the higher lock level locks (i.e. ones that
> nest inside the p2m lock) for the caller domain. I'm not sure
> that's a good idea. As mentioned before, if caller locks collectively
> all get obtained inside any subject domain ones, applying a bias
> to all lock levels would perhaps be preferable. The question is
> whether this criterion holds.

So far I can only think of paging callers requiring the p2m lock after
having locked an arbitrary number of subject domain mm locks, in order
to copy the operation data to/from the caller.

IMO a bias applied to caller locks when nested inside the subject
locks looks like the best solution.

> I also as of yet can't see how this could be expressed
> transparently to existing code. Yet obviously having to change
> various p2m_lock() invocations would be rather undesirable.

Most of the lock callers can be easily adjusted. Take p2m_lock, for
example: the lock owner domain can be obtained from the p2m_domain
parameter (p2m_domain->domain), and this can be checked against
current->domain to figure out whether the lock belongs to the subject
or the caller domain.
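
For example (a hypothetical shape only; this assumes mm_write_lock()
grows an owner-domain parameter for the checker to compare against
current->domain):

/* Sketch: pass the lock's owner down to the level checking logic. */
#define p2m_lock(p)  mm_write_lock(p2m, (p)->domain, &(p)->lock)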

Thanks, Roger.


Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-14 Thread Jan Beulich
>>> On 14.12.18 at 11:03,  wrote:
> I expect the interdomain locking resulting from the use of a paging caller
> domain to be restricted to the p2m lock of the caller domain,
> due to the use of the copy to/from helpers.
> 
> Maybe the less intrusive change would be to just allow locking the
> caller's p2m lock (that lock only) regardless of the subject domain's
> lock level?

With caller != subject, and with the lock level then being higher
than all "normal" ones, this might be an option. But from the
very beginning we should keep the transitive aspect here in
mind: if Dom0 controls a PVH domain which controls an HVM one,
the Dom0 p2m lock may also need to be allowed to nest inside the PVH
DomU's one, so it'll be a total of two extra new lock levels even
if we restrict this to the p2m locks.

Furthermore, if we limited this to the p2m locks, it would become
illegal to obtain any of the higher lock level locks (i.e. ones that
nest inside the p2m lock) for the caller domain. I'm not sure
that's a good idea. As mentioned before, if caller locks collectively
all get obtained inside any subject domain ones, applying a bias
to all lock levels would perhaps be preferable. The question is
whether this criterion holds.

I also as of yet can't see how this could be expressed
transparently to existing code. Yet obviously having to change
various p2m_lock() invocations would be rather undesirable.

Jan




Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-14 Thread Roger Pau Monné
On Thu, Dec 13, 2018 at 08:53:22AM -0700, Jan Beulich wrote:
> >>> On 13.12.18 at 16:34,  wrote:
> > On Thu, Dec 13, 2018 at 07:52:16AM -0700, Jan Beulich wrote:
> >> >>> On 13.12.18 at 15:14,  wrote:
> >> > On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote:
> >> >> >>> On 13.12.18 at 12:39,  wrote:
> >> >> > Well, just keeping correct order between each domain's locks should be
> >> >> > enough?
> >> >> > 
> >> >> > Ie: exactly the same as Xen currently does but on a per-domain
> >> >> > basis. This is feasible, but each CPU would need to store the lock
> >> >> > order of each possible domain:
> >> >> > 
> >> >> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);
> >> >> > 
> >> >> > This would consume ~32KB per CPU, which is not that much but seems a
> >> >> > waste when most of the time a single entry will be used.
> >> >> 
> >> >> Well, tracking by domain ID wouldn't help you - the controlling
> >> >> domain may well have a higher ID than the one being controlled,
> >> >> i.e. the nesting you want needs to be independent of domain ID.
> >> > 
> >> > It's not tracking the domain ID, but rather tracking the lock level of
> >> > each different domain, hence the need for the array in the pcpu
> >> > structure. The lock checker would take a domain id and a level, and
> >> > perform the check as:
> >> > 
> >> > if ( mm_lock_level[domid] > level )
> >> > panic
> >> 
> >> But this would open things up for deadlocks because of intermixed
> >> lock usage between the calling domain's and the subject one's.
> >> There needs to be a linear sequence of locks (of all involved
> >> domains) describing the one and only order in which they may be
> >> acquired.
> > 
> > Well, my plan was to only check for deadlocks between the locks of the
> > same domain, without taking into account intermixed domain locking.
> > 
> > I guess at this point I will need some input from Tim and George about
> > how to proceed, because I'm not sure how to weight locks when using
> > intermixed domain locks, nor what the correct order is. The order
> > in paging_log_dirty_op looks like a valid order that we want to
> > support, but are there any others?
> > 
> > Is it possible to have multiple valid interdomain lock orders that
> > cannot be expressed using the current weighted lock ordering?
> 
> Well, first of all I'm afraid I didn't look closely enough at your
> original mail: We're not talking about the paging lock of two
> domains here, but about the paging lock of the subject domain
> and dom0's p2m lock.
> 
> Second I then notice that
> 
> (XEN) mm locking order violation: 64 > 16
> 
> indicates that it might not have complained when two similar
> locks of different domains were acquired in a nested fashion,
> which I'd call a shortcoming that would be nice to eliminate at
> this same occasion.

Yes, that's a current shortcoming, but then I'm not sure such a case
would be a violation of the lock ordering if the locks belong to
different domains and arbitrary interdomain locking is allowed.

> And third, to answer your question, I can't see anything
> conceptually wrong with an arbitrary intermix of locks from
> two different domains, as long as their inner-domain ordering is
> correct. E.g. a Dom0 hypercall may find a need to acquire
> - the subject domain's p2m lock
> - its own p2m lock
> - the subject domain's PoD lock
> - its own paging lock

OK, so if the plan is to allow arbitrary intermixing of locks from
different domains, then per-domain lock level tracking seems like the
best option, along the lines of what I was proposing earlier:

if ( mm_lock_level[domid] > level )
panic
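
Roughly like this (sketch only; the helper name and message format are
made up):

DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);

static inline void check_lock_level(domid_t domid, uint8_t level)
{
    /* Only enforce ordering between locks belonging to the same domain. */
    if ( unlikely(this_cpu(mm_lock_level)[domid] > level) )
    {
        printk("mm locking order violation for d%u: %u > %u\n",
               domid, this_cpu(mm_lock_level)[domid], level);
        BUG();
    }
}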

> Of course it may be possible to determine that "own" locks
> are not supposed to be acquired outside of any "subject
> domain" ones, in which case we'd have a workable hierarchy
> (along the lines of what you had earlier suggested).

I expect the interdomain locking as a result of using a paging caller
domain is going to be restricted to the p2m lock of the caller domain,
as a result of the usage of copy to/from helpers.

Maybe the less intrusive change would be to just allow locking the
caller p2m lock (that only lock) regardless of the subject domain lock
level?

> But I'm
> not sure how expensive (in terms of code auditing) such
> determination is going to be, which is why for the moment I'm
> trying to think of a solution (ordering criteria) for the general
> case.

Yes, I think auditing the current interdomain locking (if there's more
of it apart from the paging logdirty hypercall) will be expensive.

Thanks, Roger.


Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Jan Beulich
>>> On 13.12.18 at 16:34,  wrote:
> On Thu, Dec 13, 2018 at 07:52:16AM -0700, Jan Beulich wrote:
>> >>> On 13.12.18 at 15:14,  wrote:
>> > On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote:
>> >> >>> On 13.12.18 at 12:39,  wrote:
>> >> > Well, just keeping correct order between each domain's locks should be
>> >> > enough?
>> >> > 
>> >> > Ie: exactly the same as Xen currently does but on a per-domain
>> >> > basis. This is feasible, but each CPU would need to store the lock
>> >> > order of each possible domain:
>> >> > 
>> >> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);
>> >> > 
>> >> > This would consume ~32KB per CPU, which is not that much but seems a
>> >> > waste when most of the time a single entry will be used.
>> >> 
>> >> Well, tracking by domain ID wouldn't help you - the controlling
>> >> domain may well have a higher ID than the one being controlled,
>> >> i.e. the nesting you want needs to be independent of domain ID.
>> > 
>> > It's not tracking the domain ID, but rather tracking the lock level of
>> > each different domain, hence the need for the array in the pcpu
>> > structure. The lock checker would take a domain id and a level, and
>> > perform the check as:
>> > 
>> > if ( mm_lock_level[domid] > level )
>> > panic
>> 
>> But this would open things up for deadlocks because of intermixed
>> lock usage between the calling domain's and the subject one's.
>> There needs to be a linear sequence of locks (of all involved
>> domains) describing the one and only order in which they may be
>> acquired.
> 
> Well, my plan was to only check for deadlocks between the locks of the
> same domain, without taking into account intermixed domain locking.
> 
> I guess at this point I will need some input from Tim and George about
> how to proceed, because I'm not sure how to weight locks when using
> intermixed domain locks, nor what the correct order is. The order
> in paging_log_dirty_op looks like a valid order that we want to
> support, but are there any others?
> 
> Is it possible to have multiple valid interdomain lock orders that
> cannot be expressed using the current weighted lock ordering?

Well, first of all I'm afraid I didn't look closely enough at your
original mail: We're not talking about the paging lock of two
domains here, but about the paging lock of the subject domain
and dom0's p2m lock.

Second I then notice that

(XEN) mm locking order violation: 64 > 16

indicates that it might not have complained when two similar
locks of different domains were acquired in a nested fashion,
which I'd call a shortcoming that would be nice to eliminate at
this same occasion.

And third, to answer your question, I can't see anything
conceptually wrong with an arbitrary intermix of locks from
two different domains, as long as their inner-domain ordering is
correct. E.g. a Dom0 hypercall may find a need to acquire
- the subject domain's p2m lock
- its own p2m lock
- the subject domain's PoD lock
- its own paging lock
Of course it may be possible to determine that "own" locks
are not supposed to be acquired outside of any "subject
domain" ones, in which case we'd have a workable hierarchy
(along the lines of what you had earlier suggested). But I'm
not sure how expensive (in terms of code auditing) such
determination is going to be, which is why for the moment I'm
trying to think of a solution (ordering criteria) for the general
case.

Jan




Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Roger Pau Monné
On Thu, Dec 13, 2018 at 07:52:16AM -0700, Jan Beulich wrote:
> >>> On 13.12.18 at 15:14,  wrote:
> > On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote:
> >> >>> On 13.12.18 at 12:39,  wrote:
> >> > Well, just keeping correct order between each domain's locks should be
> >> > enough?
> >> > 
> >> > Ie: exactly the same as Xen currently does but on a per-domain
> >> > basis. This is feasible, but each CPU would need to store the lock
> >> > order of each possible domain:
> >> > 
> >> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);
> >> > 
> >> > This would consume ~32KB per CPU, which is not that much but seems a
> >> > waste when most of the time a single entry will be used.
> >> 
> >> Well, tracking by domain ID wouldn't help you - the controlling
> >> domain may well have a higher ID than the one being controlled,
> >> i.e. the nesting you want needs to be independent of domain ID.
> > 
> > It's not tracking the domain ID, but rather tracking the lock level of
> > each different domain, hence the need for the array in the pcpu
> > structure. The lock checker would take a domain id and a level, and
> > perform the check as:
> > 
> > if ( mm_lock_level[domid] > level )
> > panic
> 
> But this would open things up for deadlocks because of intermixed
> lock usage between the calling domain's and the subject one's.
> There needs to be a linear sequence of locks (of all involved
> domains) describing the one and only order in which they may be
> acquired.

Well, my plan was to only check for deadlocks between the locks of the
same domain, without taking into account intermixed domain locking.

I guess at this point I will need some input from Tim and George about
how to proceed, because I'm not sure how to weight locks when using
intermixed domain locks, nor what the correct order is. The order
in paging_log_dirty_op looks like a valid order that we want to
support, but are there any others?

Is it possible to have multiple valid interdomain lock orders that
cannot be expressed using the current weighted lock ordering?

Thanks, Roger.


Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Jan Beulich
>>> On 13.12.18 at 15:14,  wrote:
> On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote:
>> >>> On 13.12.18 at 12:39,  wrote:
>> > Well, just keeping correct order between each domain's locks should be
>> > enough?
>> > 
>> > Ie: exactly the same as Xen currently does but on a per-domain
>> > basis. This is feasible, but each CPU would need to store the lock
>> > order of each possible domain:
>> > 
>> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);
>> > 
>> > This would consume ~32KB per CPU, which is not that much but seems a
>> > waste when most of the time a single entry will be used.
>> 
>> Well, tracking by domain ID wouldn't help you - the controlling
>> domain may well have a higher ID than the one being controlled,
>> i.e. the nesting you want needs to be independent of domain ID.
> 
> It's not tracking the domain ID, but rather tracking the lock level of
> each different domain, hence the need for the array in the pcpu
> structure. The lock checker would take a domain id and a level, and
> perform the check as:
> 
> if ( mm_lock_level[domid] > level )
> panic

But this would open things up for deadlocks because of intermixed
lock usage between the calling domain's and the subject one's.
There needs to be a linear sequence of locks (of all involved
domains) describing the one and only order in which they may be
acquired.

Jan




Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Roger Pau Monné
On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote:
> >>> On 13.12.18 at 12:39,  wrote:
> > On Wed, Dec 12, 2018 at 09:07:14AM -0700, Jan Beulich wrote:
> >> >>> On 12.12.18 at 15:54,  wrote:
> >> > @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d,
> >> >  bytes = (unsigned int)((sc->pages - pages + 7) >> 3);
> >> >  if ( likely(peek) )
> >> >  {
> >> > +if ( paging_mode_enabled(current->domain) )
> >> > +/*
> >> > + * Drop the target p2m lock, or else Xen will panic
> >> > + * when trying to acquire the p2m lock of the caller
> >> > + * due to invalid lock order. Note that there are no
> >> > + * lock ordering issues here, and the panic is due to
> >> > + * the fact that the lock level tracking doesn't record
> >> > + * the domain the lock belongs to.
> >> > + */
> >> > +paging_unlock(d);
> >> 
> >> This makes it sound as if tracking the domain would help. It doesn't,
> >> at least not as long as there is not also some sort of ordering or
> >> hierarchy between domains. IOW I'd prefer if the "doesn't" became
> >> "can't".
> > 
> > Well, just keeping correct order between each domain's locks should be
> > enough?
> > 
> > Ie: exactly the same as Xen currently does but on a per-domain
> > basis. This is feasible, but each CPU would need to store the lock
> > order of each possible domain:
> > 
> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);
> > 
> > This would consume ~32KB per CPU, which is not that much but seems a
> > waste when most of the time a single entry will be used.
> 
> Well, tracking by domain ID wouldn't help you - the controlling
> domain may well have a higher ID than the one being controlled,
> i.e. the nesting you want needs to be independent of domain ID.

It's not tracking the domain ID, but rather tracking the lock level of
each different domain, hence the need for the array in the pcpu
structure. The lock checker would take a domain id and a level, and
perform the check as:

if ( mm_lock_level[domid] > level )
panic

> >> Now before we go this route I think we need to consider whether
> >> this really is the only place where the two locks get into one
> >> another's way. That's because I don't think we want to introduce
> >> more of such, well, hackery, and hence we'd otherwise need a
> >> better solution. For example the locking model could be adjusted
> >> to allow such nesting in the general case: Dom0 and any domain
> >> whose ->target matches the subject domain here could be given
> >> a slightly different "weight" in the lock order violation check logic.
> > 
> > So locks from domains != current would be given a lower order, let's
> > say:
> > 
> > #define MM_LOCK_ORDER_nestedp2m           8
> > #define MM_LOCK_ORDER_p2m                16
> > #define MM_LOCK_ORDER_per_page_sharing   24
> > #define MM_LOCK_ORDER_altp2mlist         32
> > #define MM_LOCK_ORDER_altp2m             40
> > #define MM_LOCK_ORDER_pod                48
> > #define MM_LOCK_ORDER_page_alloc         56
> > #define MM_LOCK_ORDER_paging             64
> > #define MM_LOCK_ORDER_MAX                MM_LOCK_ORDER_paging
> > 
> > If domain != current, the above values are used. If domain == current
> > the values above are used + MM_LOCK_ORDER_MAX? So in that case the
> > order of MM_LOCK_ORDER_p2m against the current domain would be 16 + 64
> > = 80?
> > 
> > This has the slight inconvenience that not all mm lock call sites have
> > the target domain available, but that can be solved.
> 
> No, I wasn't envisioning a really generic approach, i.e.
> permitting this for all of the locks. Do we need this? Till now
> I was rather considering doing this for just the paging lock,
> with Dom0 and the controlling domains using just a small
> (+/- 2 and +/- 1) offset from the normal paging one.
> However, I now realize that indeed there may be more of
> such nesting, and you've merely hit a specific case of it.

So far that's the only case I've hit, but I'm quite sure there's a
non-trivial number of hypercalls that were only used by a PV Dom0, so
I wouldn't rule out finding other such instances.

> In any event it seems to me that you've got the ordering
> inversed, i.e. domain == current (or really: the general case,
> i.e. after excluding the two special cases) would use base
> values, domain == currd->target would use some offset, and
> Dom0 would use double the offset.
> 
> With it extended to all locks I'm no longer sure though that
> the end result would be safe. I think it would be helpful to
> rope in whoever did originally design this locking model.

I'm adding Tim, who I think is the original author of this locking model.

Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Jan Beulich
>>> On 13.12.18 at 12:39,  wrote:
> On Wed, Dec 12, 2018 at 09:07:14AM -0700, Jan Beulich wrote:
>> >>> On 12.12.18 at 15:54,  wrote:
>> > @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d,
>> >  bytes = (unsigned int)((sc->pages - pages + 7) >> 3);
>> >  if ( likely(peek) )
>> >  {
>> > +if ( paging_mode_enabled(current->domain) )
>> > +/*
>> > + * Drop the target p2m lock, or else Xen will panic
>> > + * when trying to acquire the p2m lock of the caller
>> > + * due to invalid lock order. Note that there are no
>> > + * lock ordering issues here, and the panic is due to
>> > + * the fact that the lock level tracking doesn't record
>> > + * the domain the lock belongs to.
>> > + */
>> > +paging_unlock(d);
>> 
>> This makes it sound as if tracking the domain would help. It doesn't,
>> at least not as long as there is not also some sort of ordering or
>> hierarchy between domains. IOW I'd prefer if the "doesn't" became
>> "can't".
> 
> Well, just keeping correct order between each domain's locks should be
> enough?
> 
> Ie: exactly the same as Xen currently does but on a per-domain
> basis. This is feasible, but each CPU would need to store the lock
> order of each possible domain:
> 
> DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);
> 
> This would consume ~32KB per CPU, which is not that much but seems a
> waste when most of the time a single entry will be used.

Well, tracking by domain ID wouldn't help you - the controlling
domain may well have a higher ID than the one being controlled,
i.e. the nesting you want needs to be independent of domain ID.

>> Now before we go this route I think we need to consider whether
>> this really is the only place where the two locks get into one
>> another's way. That's because I don't think we want to introduce
>> more of such, well, hackery, and hence we'd otherwise need a
>> better solution. For example the locking model could be adjusted
>> to allow such nesting in the general case: Dom0 and any domain
>> whose ->target matches the subject domain here could be given
>> a slightly different "weight" in the lock order violation check logic.
> 
> So locks from domains != current would be given a lower order, let's
> say:
> 
> #define MM_LOCK_ORDER_nestedp2m           8
> #define MM_LOCK_ORDER_p2m                16
> #define MM_LOCK_ORDER_per_page_sharing   24
> #define MM_LOCK_ORDER_altp2mlist         32
> #define MM_LOCK_ORDER_altp2m             40
> #define MM_LOCK_ORDER_pod                48
> #define MM_LOCK_ORDER_page_alloc         56
> #define MM_LOCK_ORDER_paging             64
> #define MM_LOCK_ORDER_MAX                MM_LOCK_ORDER_paging
> 
> If domain != current, the above values are used. If domain == current
> the values above are used + MM_LOCK_ORDER_MAX? So in that case the
> order of MM_LOCK_ORDER_p2m against the current domain would be 16 + 64
> = 80?
> 
> This has the slight inconvenience that not all mm lock call sites have
> the target domain available, but that can be solved.

No, I wasn't envisioning a really generic approach, i.e.
permitting this for all of the locks. Do we need this? Till now
I was rather considering doing this for just the paging lock,
with Dom0 and the controlling domains using just a small
(+/- 2 and +/- 1) offset from the normal paging one.
However, I now realize that indeed there may be more of
such nesting, and you've merely hit a specific case of it.

In any event it seems to me that you've got the ordering
inverted, i.e. domain == current (or really: the general case,
i.e. after excluding the two special cases) would use base
values, domain == currd->target would use some offset, and
Dom0 would use double the offset.

With it extended to all locks I'm no longer sure though that
the end result would be safe. I think it would be helpful to
rope in whoever did originally design this locking model.

Jan



Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Roger Pau Monné
On Wed, Dec 12, 2018 at 09:07:14AM -0700, Jan Beulich wrote:
> >>> On 12.12.18 at 15:54,  wrote:
> > Fix this by releasing the target paging lock before attempting to
> > perform the copy of the dirty bitmap, and then forcing a restart of
> > the whole process in case there have been changes to the dirty bitmap
> > tables.
> 
> I'm afraid it's not that simple: The writer side (paging_mark_pfn_dirty())
> uses the same lock, and I think the assumption is that while the copying
> takes place no updates to the bitmap can occur. Then again the subject
> domain gets paused anyway, so updates probably can't happen (the
> "probably" of course needs to be eliminated).

I've looked into this, and I think you are right: the copy must be
done with the paging lock held. Even if the domain is paused, the
device model can still mark gfns as dirty using
XEN_DMOP_modified_memory, so the only option would be to copy to a
temporary page while holding the lock, release the lock, and then copy
the temporary page to the caller's buffer.
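
Something along these lines (a rough sketch of the bounce-buffer idea,
with error handling and the lock re-acquisition/restart logic omitted;
variable names are from the existing function):

uint8_t *buf = alloc_xenheap_page();   /* bounce buffer */

if ( !buf )
    return -ENOMEM;

/* The paging lock is held here: snapshot the chunk while no updates occur. */
memcpy(buf, l1, bytes);
paging_unlock(d);

/* Copy out to the caller without holding the subject's paging lock. */
if ( copy_to_guest_offset(sc->dirty_bitmap, pages >> 3, buf, bytes) )
    rv = -EFAULT;

free_xenheap_page(buf);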

> > @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d,
> >  bytes = (unsigned int)((sc->pages - pages + 7) >> 3);
> >  if ( likely(peek) )
> >  {
> > +if ( paging_mode_enabled(current->domain) )
> > +/*
> > + * Drop the target p2m lock, or else Xen will panic
> > + * when trying to acquire the p2m lock of the caller
> > + * due to invalid lock order. Note that there are no
> > + * lock ordering issues here, and the panic is due to
> > + * the fact that the lock level tracking doesn't record
> > + * the domain the lock belongs to.
> > + */
> > +paging_unlock(d);
> 
> This makes it sound as if tracking the domain would help. It doesn't,
> at least not as long as there is not also some sort of ordering or
> hierarchy between domains. IOW I'd prefer if the "doesn't" became
> "can't".

Well, just keeping correct order between each domain's locks should be
enough?

Ie: exactly the same as Xen currently does but on a per-domain
basis. This is feasible, but each CPU would need to store the lock
order of each possible domain:

DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);

This would consume ~32KB per CPU, which is not that much but seems a
waste when most of the time a single entry will be used.

> Now before we go this route I think we need to consider whether
> this really is the only place where the two locks get into one
> another's way. That's because I don't think we want to introduce
> more of such, well, hackery, and hence we'd otherwise need a
> better solution. For example the locking model could be adjusted
> to allow such nesting in the general case: Dom0 and any domain
> whose ->target matches the subject domain here could be given
> a slightly different "weight" in the lock order violation check logic.

So locks from domains != current would be given a lower order, let's
say:

#define MM_LOCK_ORDER_nestedp2m           8
#define MM_LOCK_ORDER_p2m                16
#define MM_LOCK_ORDER_per_page_sharing   24
#define MM_LOCK_ORDER_altp2mlist         32
#define MM_LOCK_ORDER_altp2m             40
#define MM_LOCK_ORDER_pod                48
#define MM_LOCK_ORDER_page_alloc         56
#define MM_LOCK_ORDER_paging             64
#define MM_LOCK_ORDER_MAX                MM_LOCK_ORDER_paging

If domain != current, the above values are used. If domain == current
the values above are used + MM_LOCK_ORDER_MAX? So in that case the
order of MM_LOCK_ORDER_p2m against the current domain would be 16 + 64
= 80?

This has the slight inconvenience that not all mm lock call sites have
the target domain available, but that can be solved.
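
As an illustrative sketch of the biased check (not a patch):

/*
 * Hypothetical: bias the level when the lock being taken belongs to the
 * current (caller) domain, so e.g. the caller's p2m lock would check as
 * 16 + 64 = 80 while a subject domain's p2m lock keeps checking as 16.
 */
#define mm_lock_level_of(d, l) \
    ((l) + ((d) == current->domain ? MM_LOCK_ORDER_MAX : 0))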

Thanks, Roger.


Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-12 Thread Jan Beulich
>>> On 12.12.18 at 15:54,  wrote:
> Fix this by releasing the target paging lock before attempting to
> perform the copy of the dirty bitmap, and then forcing a restart of
> the whole process in case there have been changes to the dirty bitmap
> tables.

I'm afraid it's not that simple: The writer side (paging_mark_pfn_dirty())
uses the same lock, and I think the assumption is that while the copying
takes place no updates to the bitmap can occur. Then again the subject
domain gets paused anyway, so updates probably can't happen (the
"probably" of course needs to be eliminated). But at the very least I'd
expect a more thorough discussion here as to why dropping the lock
intermediately is fine. The preemption mechanism can't be used as sole
reference, because that one doesn't drop the lock in the middle of an
iteration.

> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -408,6 +408,7 @@ static int paging_log_dirty_op(struct domain *d,
>  unsigned long *l1 = NULL;
>  int i4, i3, i2;
>  
> + again:
>  if ( !resuming )
>  {

Considering that you set "resuming" to 1 before jumping here,
why don't you put the label after this if()? I'd even consider pulling
the label even further down (perhaps to immediately before the
last if() ahead of the main loop), and have the path branching there
re-acquire the lock (and drop some of the other state setting that
then becomes unnecessary).
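
I.e. roughly this shape (a sketch of the suggestion only, not tested):

    if ( !resuming )
    {
        /* One-time setup, done only on the first pass. */
    }

    /* ... remaining state setting ... */

 again:
    paging_lock(d);    /* restarts re-acquire the lock here */

    /* main loop copying the dirty bitmap */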

> @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d,
>  bytes = (unsigned int)((sc->pages - pages + 7) >> 3);
>  if ( likely(peek) )
>  {
> +if ( paging_mode_enabled(current->domain) )
> +/*
> + * Drop the target p2m lock, or else Xen will panic
> + * when trying to acquire the p2m lock of the caller
> + * due to invalid lock order. Note that there are no
> + * lock ordering issues here, and the panic is due to
> + * the fact that the lock level tracking doesn't record
> + * the domain the lock belongs to.
> + */
> +paging_unlock(d);

This makes it sound as if tracking the domain would help. It doesn't,
at least not as long as there is not also some sort of ordering or
hierarchy between domains. IOW I'd prefer if the "doesn't" became
"can't".

Now before we go this route I think we need to consider whether
this really is the only place where the two locks get into one
another's way. That's because I don't think we want to introduce
more of such, well, hackery, and hence we'd otherwise need a
better solution. For example the locking model could be adjusted
to allow such nesting in the general case: Dom0 and any domain
whose ->target matches the subject domain here could be given
a slightly different "weight" in the lock order violation check logic.

I'm also uncertain about the overall effect this change has, in
that now the lock gets dropped for _every_ page to be copied.

> @@ -505,6 +517,23 @@ static int paging_log_dirty_op(struct domain *d,
>  clear_page(l1);
>  unmap_domain_page(l1);
>  }
> +if ( paging_mode_enabled(current->domain) && peek )
> +{
> +d->arch.paging.preempt.log_dirty.i4 = i4;
> +d->arch.paging.preempt.log_dirty.i3 = i3;
> +d->arch.paging.preempt.log_dirty.i2 = i2 + 1;

If i2 + 1 == LOGDIRTY_NODE_ENTRIES I think it would be better

> +d->arch.paging.preempt.log_dirty.done = pages;
> +d->arch.paging.preempt.dom = current->domain;
> +d->arch.paging.preempt.op = sc->op;
> +resuming = 1;

true?

> +if ( l2 )
> +unmap_domain_page(l2);
> +if ( l3 )
> +unmap_domain_page(l3);
> +if ( l4 )
> +unmap_domain_page(l4);
> +goto again;

At the very least for safety reasons you should set l2, l3, and l4
back to NULL here.
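I.e. something like (sketch of the adjusted cleanup):

            if ( l2 )
                unmap_domain_page(l2);
            l2 = NULL;
            if ( l3 )
                unmap_domain_page(l3);
            l3 = NULL;
            if ( l4 )
                unmap_domain_page(l4);
            l4 = NULL;
            goto again;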

> @@ -513,6 +542,7 @@ static int paging_log_dirty_op(struct domain *d,
>  {
>  d->arch.paging.preempt.log_dirty.i4 = i4;
>  d->arch.paging.preempt.log_dirty.i3 = i3 + 1;
> +d->arch.paging.preempt.log_dirty.i2 = 0;
>  rv = -ERESTART;
>  break;
>  }
> @@ -525,6 +555,7 @@ static int paging_log_dirty_op(struct domain *d,
>  {
>  d->arch.paging.preempt.log_dirty.i4 = i4 + 1;
>  d->arch.paging.preempt.log_dirty.i3 = 0;
> +d->arch.paging.preempt.log_dirty.i2 = 0;
>  rv = -ERESTART;
>  }

With these I don't see why you need storage in the