Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-19 Thread George Dunlap
On 11/19/18 4:17 PM, Razvan Cojocaru wrote:
> On 11/19/18 5:58 PM, George Dunlap wrote:
>> On 11/17/18 6:58 PM, Razvan Cojocaru wrote:
>>> On 11/16/18 9:50 PM, Razvan Cojocaru wrote:
 On 11/16/18 7:59 PM, George Dunlap wrote:
> On the other hand, we want the logdirty rangesets to actually match the
> host's rangesets; using altp2m->max_mapped_pfn for this is clearly
> wrong. The easiest fix would be just to explicitly use the host's
> max_mapped_pfn when calculating the clipping.  A more complete fix would
> involve calculating two different ranges -- a "rangeset" range and a
> "invalidate" range, the second of which would be clipped on altp2ms by
> {min,max}_remapped_gfn.
>
> Something like the attached (compile-tested only).  I'm partial to
> having both patches applied, but I'd be open to arguments that we should
> only use the first.

 Thanks! I haven't yet been able to think in depth about the logic, but I
 did manage to apply them. Just applying the first one allows me to set
 p2m->max_mapped_pfn = 0; without the ASSERT() crashing the hypervisor,
 and everything appears to work well.

 With both patches applies, the display remains frozen (things appear to
 behave - externally - in the same way as before the series).
>>>
>>> Right, I know why the second patch keeps the display frozen. It's
>>> because for altp2ms (where it matters most), the patch basically does
>>> invalidate_start = max(invalidate_start, p2m->min_remapped_gfn) and
>>> invalidate_end = min(invalidate_end, p2m->max_remapped_gfn).
>>>
>>> However, as previously requested, I've now made p2m->max_remapped_gfn
>>> begin life as 0 for altp2ms, and p2m->min_remapped_gfn is initialized to
>>> INVALID_GFN, which is decimal 18446744073709551615. So we get
>>> invalidate_end: 0, invalidate_start: 18446744073709551615,
>>> invalidate_end < invalidate_start, resulting in nothing being done for
>>> altp2ms, which is functionally back to square one.
>>
>> But this doesn't explain why my reasoning was wrong; and it's always
>> dangerous to use a system whose behavior you don't really understand,
>> even if it seems to work.
>>
>> It turns out I'd made a mistake in saying that altp2m->max_mapped_pfn ==
>> alt2m->max_remapped_gfn.  The first is the highest gfn present in the
>> altp2m, either copied from the hostp2m or changed; the second is the
>> highest value changed (via p2m_altp2m_change_gfn()).
>>
>> What about using the attached patch instead?
> 
> The attached patch does work, thanks! Shall I append them both to the
> end of the series and send out V7?

Yes, please.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-19 Thread Razvan Cojocaru
On 11/19/18 5:58 PM, George Dunlap wrote:
> On 11/17/18 6:58 PM, Razvan Cojocaru wrote:
>> On 11/16/18 9:50 PM, Razvan Cojocaru wrote:
>>> On 11/16/18 7:59 PM, George Dunlap wrote:
 On the other hand, we want the logdirty rangesets to actually match the
 host's rangesets; using altp2m->max_mapped_pfn for this is clearly
 wrong. The easiest fix would be just to explicitly use the host's
 max_mapped_pfn when calculating the clipping.  A more complete fix would
 involve calculating two different ranges -- a "rangeset" range and a
 "invalidate" range, the second of which would be clipped on altp2ms by
 {min,max}_remapped_gfn.

 Something like the attached (compile-tested only).  I'm partial to
 having both patches applied, but I'd be open to arguments that we should
 only use the first.
>>>
>>> Thanks! I haven't yet been able to think in depth about the logic, but I
>>> did manage to apply them. Just applying the first one allows me to set
>>> p2m->max_mapped_pfn = 0; without the ASSERT() crashing the hypervisor,
>>> and everything appears to work well.
>>>
>>> With both patches applies, the display remains frozen (things appear to
>>> behave - externally - in the same way as before the series).
>>
>> Right, I know why the second patch keeps the display frozen. It's
>> because for altp2ms (where it matters most), the patch basically does
>> invalidate_start = max(invalidate_start, p2m->min_remapped_gfn) and
>> invalidate_end = min(invalidate_end, p2m->max_remapped_gfn).
>>
>> However, as previously requested, I've now made p2m->max_remapped_gfn
>> begin life as 0 for altp2ms, and p2m->min_remapped_gfn is initialized to
>> INVALID_GFN, which is decimal 18446744073709551615. So we get
>> invalidate_end: 0, invalidate_start: 18446744073709551615,
>> invalidate_end < invalidate_start, resulting in nothing being done for
>> altp2ms, which is functionally back to square one.
> 
> But this doesn't explain why my reasoning was wrong; and it's always
> dangerous to use a system whose behavior you don't really understand,
> even if it seems to work.
> 
> It turns out I'd made a mistake in saying that altp2m->max_mapped_pfn ==
> alt2m->max_remapped_gfn.  The first is the highest gfn present in the
> altp2m, either copied from the hostp2m or changed; the second is the
> highest value changed (via p2m_altp2m_change_gfn()).
> 
> What about using the attached patch instead?

The attached patch does work, thanks! Shall I append them both to the
end of the series and send out V7?


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-19 Thread George Dunlap
On 11/17/18 6:58 PM, Razvan Cojocaru wrote:
> On 11/16/18 9:50 PM, Razvan Cojocaru wrote:
>> On 11/16/18 7:59 PM, George Dunlap wrote:
>>> On the other hand, we want the logdirty rangesets to actually match the
>>> host's rangesets; using altp2m->max_mapped_pfn for this is clearly
>>> wrong. The easiest fix would be just to explicitly use the host's
>>> max_mapped_pfn when calculating the clipping.  A more complete fix would
>>> involve calculating two different ranges -- a "rangeset" range and a
>>> "invalidate" range, the second of which would be clipped on altp2ms by
>>> {min,max}_remapped_gfn.
>>>
>>> Something like the attached (compile-tested only).  I'm partial to
>>> having both patches applied, but I'd be open to arguments that we should
>>> only use the first.
>>
>> Thanks! I haven't yet been able to think in depth about the logic, but I
>> did manage to apply them. Just applying the first one allows me to set
>> p2m->max_mapped_pfn = 0; without the ASSERT() crashing the hypervisor,
>> and everything appears to work well.
>>
>> With both patches applies, the display remains frozen (things appear to
>> behave - externally - in the same way as before the series).
> 
> Right, I know why the second patch keeps the display frozen. It's
> because for altp2ms (where it matters most), the patch basically does
> invalidate_start = max(invalidate_start, p2m->min_remapped_gfn) and
> invalidate_end = min(invalidate_end, p2m->max_remapped_gfn).
> 
> However, as previously requested, I've now made p2m->max_remapped_gfn
> begin life as 0 for altp2ms, and p2m->min_remapped_gfn is initialized to
> INVALID_GFN, which is decimal 18446744073709551615. So we get
> invalidate_end: 0, invalidate_start: 18446744073709551615,
> invalidate_end < invalidate_start, resulting in nothing being done for
> altp2ms, which is functionally back to square one.

But this doesn't explain why my reasoning was wrong; and it's always
dangerous to use a system whose behavior you don't really understand,
even if it seems to work.

It turns out I'd made a mistake in saying that altp2m->max_mapped_pfn ==
alt2m->max_remapped_gfn.  The first is the highest gfn present in the
altp2m, either copied from the hostp2m or changed; the second is the
highest value changed (via p2m_altp2m_change_gfn()).

What about using the attached patch instead?

 -George
From f4c72ecb95cfa5d597b8e28e99e681cfeaa32199 Mon Sep 17 00:00:00 2001
From: George Dunlap 
Date: Fri, 16 Nov 2018 16:28:25 +
Subject: [PATCH 2/2] p2m: change_range_type: Only invalidate mapped gfns

change_range_type() invalidates gfn ranges to lazily change the type
of a range of gfns, and also modifies the logdirty rangesets of that
p2m. At the moment, it clips both down by the hostp2m.

While this will result in correct behavior, it's not entirely efficient,
since invalidated entries outside that range will, on fault, simply be
modified back to "empty" before faulting normally again.

Separate out the calculation of the two ranges.  Keep using the
hostp2m's max_mapped_pfn to clip the logdirty ranges, but use the
current p2m's max_mapped_pfn to further clip the invalidation range
for alternate p2ms.

Signed-off-by: George Dunlap 
---
 xen/arch/x86/mm/p2m.c | 61 +--
 1 file changed, 42 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6d764d1e22..b97e138452 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1043,41 +1043,64 @@ static void change_type_range(struct p2m_domain *p2m,
   p2m_type_t ot, p2m_type_t nt)
 {
 unsigned long rangeset_start, rangeset_end;
+unsigned long invalidate_start, invalidate_end;
 struct domain *d = p2m->domain;
 unsigned long host_max_pfn = p2m_get_hostp2m(d)->max_mapped_pfn;
+unsigned long max_pfn = p2m->max_mapped_pfn;
 int rc = 0;
 
-rangeset_start = start;
-rangeset_end   = end - 1;
+/* 
+ * If we have an altp2m, the logdirty rangeset range needs to
+ * match that of the hostp2m, but for efficiency, we want to clip
+ * down the the invalidation range according to the mapped values
+ * in the altp2m.  Keep track of and clip the ranges separately.
+ */
+rangeset_start = invalidate_start = start;
+rangeset_end   = invalidate_end   = end - 1;
 
-/* Always clip the rangeset down to the host p2m */
+/* Clip down to the host p2m */
 if ( unlikely(rangeset_end > host_max_pfn) )
-rangeset_end = host_max_pfn;
+rangeset_end = invalidate_end = host_max_pfn;
 
 /* If the requested range is out of scope, return doing nothing */
 if ( rangeset_start > rangeset_end )
 return;
 
+if ( p2m_is_altp2m(p2m) )
+invalidate_end = min(invalidate_end, max_pfn);
+
 p2m->defer_nested_flush = 1;
 
 /* 
- * If all valid gfns are in the invalidation range, just do a
- * global type change.  

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-19 Thread Razvan Cojocaru
On 11/19/18 2:42 PM, Jan Beulich wrote:
 On 16.11.18 at 16:52,  wrote:
>> On 11/16/18 3:05 PM, George Dunlap wrote:
>>> Now take change_type_range.  The global effect of change_type_range
>>> should be that reads of the p2m which happen afterwards should have the
>>> new, changed value.
>>>
>>> In case A, change_type_range will write invalid entries up to
>>> max_remapped_pfn, leaving the range between max_remapped_pfn and
>>> hostp2m->max_mapped_pfn invalid.  When a gfn in this range is read, an
>>> EPT fault will happen, p2m_altp2m_lazy_copy() will be called, and the
>>> new (correct) value copied from the hostp2m.
>>>
>>> In case B, change_type_range will write invalid entries up until
>>> hostp2m->max_mapped_pfn.  When a gfn in this range is accessed, a
>>> MISCONFIG fault will happen, and the correct value will be calculated in
>>> resolve_misconfig.
>>
>> Or, no: resolve_misconfig() will read the current type in the altp2m,
>> which will be p2m_invalid; p2m_recalc_type() will then return
>> p2m_invalid, and then set the entry to a "plain" invalid without the
>> reserved bit set.  Then the fault will happen again, taking the normal
>> HAP fault path (calling p2m_altp2m_lazy_copy()), at which point...
>>
>>> And at this point, I realize that my previous analysis was probably
>>> wrong, because at this point altp2m->max_remapped_gfn will be wrong:
>>> entries above max_remapped_gfn will have become valid without going
>>> through p2m_altp2m_lazy_copy().
>>
>> ...altp2m->max_remapped_gfn will be set appropriately.
>>
>> I think. :-/
> 
> And I think I can follow your analysis now, which in turn means I
> think your two patches are going to help, but for a proper review
> I'd prefer them to be inline in a mail, rather than attachments.

Please see my review of the second patch, which won't work with my series.

Also, the patches George has sent are applicable after my series has
been applied - would you and George like me to append his first patch to
the end of my series, or would we be better off if he sends his patch
first for the current code, and I rebase my series on that?


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-19 Thread Jan Beulich
>>> On 16.11.18 at 16:52,  wrote:
> On 11/16/18 3:05 PM, George Dunlap wrote:
>> Now take change_type_range.  The global effect of change_type_range
>> should be that reads of the p2m which happen afterwards should have the
>> new, changed value.
>> 
>> In case A, change_type_range will write invalid entries up to
>> max_remapped_pfn, leaving the range between max_remapped_pfn and
>> hostp2m->max_mapped_pfn invalid.  When a gfn in this range is read, an
>> EPT fault will happen, p2m_altp2m_lazy_copy() will be called, and the
>> new (correct) value copied from the hostp2m.
>> 
>> In case B, change_type_range will write invalid entries up until
>> hostp2m->max_mapped_pfn.  When a gfn in this range is accessed, a
>> MISCONFIG fault will happen, and the correct value will be calculated in
>> resolve_misconfig.
> 
> Or, no: resolve_misconfig() will read the current type in the altp2m,
> which will be p2m_invalid; p2m_recalc_type() will then return
> p2m_invalid, and then set the entry to a "plain" invalid without the
> reserved bit set.  Then the fault will happen again, taking the normal
> HAP fault path (calling p2m_altp2m_lazy_copy()), at which point...
> 
>> And at this point, I realize that my previous analysis was probably
>> wrong, because at this point altp2m->max_remapped_gfn will be wrong:
>> entries above max_remapped_gfn will have become valid without going
>> through p2m_altp2m_lazy_copy().
> 
> ...altp2m->max_remapped_gfn will be set appropriately.
> 
> I think. :-/

And I think I can follow your analysis now, which in turn means I
think your two patches are going to help, but for a proper review
I'd prefer them to be inline in a mail, rather than attachments.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-17 Thread Razvan Cojocaru
On 11/16/18 9:50 PM, Razvan Cojocaru wrote:
> On 11/16/18 7:59 PM, George Dunlap wrote:
>> On the other hand, we want the logdirty rangesets to actually match the
>> host's rangesets; using altp2m->max_mapped_pfn for this is clearly
>> wrong. The easiest fix would be just to explicitly use the host's
>> max_mapped_pfn when calculating the clipping.  A more complete fix would
>> involve calculating two different ranges -- a "rangeset" range and a
>> "invalidate" range, the second of which would be clipped on altp2ms by
>> {min,max}_remapped_gfn.
>>
>> Something like the attached (compile-tested only).  I'm partial to
>> having both patches applied, but I'd be open to arguments that we should
>> only use the first.
> 
> Thanks! I haven't yet been able to think in depth about the logic, but I
> did manage to apply them. Just applying the first one allows me to set
> p2m->max_mapped_pfn = 0; without the ASSERT() crashing the hypervisor,
> and everything appears to work well.
> 
> With both patches applies, the display remains frozen (things appear to
> behave - externally - in the same way as before the series).

Right, I know why the second patch keeps the display frozen. It's
because for altp2ms (where it matters most), the patch basically does
invalidate_start = max(invalidate_start, p2m->min_remapped_gfn) and
invalidate_end = min(invalidate_end, p2m->max_remapped_gfn).

However, as previously requested, I've now made p2m->max_remapped_gfn
begin life as 0 for altp2ms, and p2m->min_remapped_gfn is initialized to
INVALID_GFN, which is decimal 18446744073709551615. So we get
invalidate_end: 0, invalidate_start: 18446744073709551615,
invalidate_end < invalidate_start, resulting in nothing being done for
altp2ms, which is functionally back to square one.

In light of this analysis, I suppose I should now also disregard the
recommendation to clip first_gfn and max_nr to fit within
p2m->min_remapped_gfn and p2m->max_remapped_gfn for altp2ms in
finish_type_change(), for obvious reasons - unless you prefer a
different strategy.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-16 Thread Razvan Cojocaru
On 11/16/18 7:59 PM, George Dunlap wrote:
> On 11/16/18 2:10 PM, Razvan Cojocaru wrote:
>> On 11/16/18 2:03 PM, George Dunlap wrote:
>>> The code is definitely complicated enough, though, that I may have
>>> missed something, which is why I asked Razvan if there was a reason he
>>> changed it.
>>>
>>> For the purposes of this patch, I propose having p2m_altp2m_init_ept()
>>> set max_mapped_pfn to 0 (if that works), and leaving "get rid of
>>> max_remapped_pfn" for a future clean-up series.
>>
>> I've retraced my previous analysis and re-ran some tests, and I now
>> remember (sorry it took a while) why the p2m->max_mapped_pfn =
>> hostp2m->max_mapped_pfn was both necessary and not accidental.
>>
>> Let's say we set it to 0 in p2m_altp2m_init_ept(). Then,
>> hap_track_dirty_vram() calls p2m_change_type_range(), which calls the
>> newly added change_type_range().
>>
>> Change_type_range() looks like this:
>>
>> static void change_type_range(struct p2m_domain *p2m,
>>   unsigned long start, unsigned long end,
>>   p2m_type_t ot, p2m_type_t nt)
>> {
>> unsigned long gfn = start;
>> struct domain *d = p2m->domain;
>> int rc = 0;
>>
>> p2m->defer_nested_flush = 1;
>>
>> if ( unlikely(end > p2m->max_mapped_pfn) )
>> {
>> if ( !gfn )
>> {
>> p2m->change_entry_type_global(p2m, ot, nt);
>> gfn = end;
>> }
>> end = p2m->max_mapped_pfn + 1;
>> }
>> if ( gfn < end )
>> rc = p2m->change_entry_type_range(p2m, ot, nt, gfn, end - 1);
>> if ( rc )
>> {
>> printk(XENLOG_G_ERR "Error %d changing Dom%d GFNs [%lx,%lx] from
>> %d to %d\n",
>>rc, d->domain_id, start, end - 1, ot, nt);
>> domain_crash(d);
>> }
>>
>> switch ( nt )
>> {
>> case p2m_ram_rw:
>> if ( ot == p2m_ram_logdirty )
>> rc = rangeset_remove_range(p2m->logdirty_ranges, start, end
>> - 1);
>> break;
>> case p2m_ram_logdirty:
>> if ( ot == p2m_ram_rw )
>> rc = rangeset_add_range(p2m->logdirty_ranges, start, end - 1);
>> break;
>> default:
>> break;
>> }
>> if ( rc )
>> {
>> printk(XENLOG_G_ERR "Error %d manipulating Dom%d's log-dirty
>> ranges\n",
>>rc, d->domain_id);
>> domain_crash(d);
>> }
>>
>> p2m->defer_nested_flush = 0;
>> if ( nestedhvm_enabled(d) )
>> p2m_flush_nestedp2m(d);
>> }
>>
>> If we set p2m->max_mapped_pfn to 0, we're guaranteed to run into the if
>> ( unlikely(end > p2m->max_mapped_pfn) ) body, where end =
>> p2m->max_mapped_pfn + 1; will make end 1.
>>
>> Then, we will crash the hypervisor in rangeset_add_range(), where
>> there's an ASSERT() stating that start <= end.
> 
> Ah, right, this was the original crash that you ran into several months
> ago, which flagged up the whole logdirty range synchronization issue.
> 
> But that's partly a logic hole in change_entry_type_range(), which
> assumes that start < p2m->max_mapped_pfn.  It would be better to fix
> that than to work around it by changing the meaning of max_mapped_pfn.
> 
> On the other hand, we want the logdirty rangesets to actually match the
> host's rangesets; using altp2m->max_mapped_pfn for this is clearly
> wrong. The easiest fix would be just to explicitly use the host's
> max_mapped_pfn when calculating the clipping.  A more complete fix would
> involve calculating two different ranges -- a "rangeset" range and a
> "invalidate" range, the second of which would be clipped on altp2ms by
> {min,max}_remapped_gfn.
> 
> Something like the attached (compile-tested only).  I'm partial to
> having both patches applied, but I'd be open to arguments that we should
> only use the first.

Thanks! I haven't yet been able to think in depth about the logic, but I
did manage to apply them. Just applying the first one allows me to set
p2m->max_mapped_pfn = 0; without the ASSERT() crashing the hypervisor,
and everything appears to work well.

With both patches applies, the display remains frozen (things appear to
behave - externally - in the same way as before the series).


Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-16 Thread George Dunlap
On 11/16/18 2:10 PM, Razvan Cojocaru wrote:
> On 11/16/18 2:03 PM, George Dunlap wrote:
>> The code is definitely complicated enough, though, that I may have
>> missed something, which is why I asked Razvan if there was a reason he
>> changed it.
>>
>> For the purposes of this patch, I propose having p2m_altp2m_init_ept()
>> set max_mapped_pfn to 0 (if that works), and leaving "get rid of
>> max_remapped_pfn" for a future clean-up series.
> 
> I've retraced my previous analysis and re-ran some tests, and I now
> remember (sorry it took a while) why the p2m->max_mapped_pfn =
> hostp2m->max_mapped_pfn was both necessary and not accidental.
> 
> Let's say we set it to 0 in p2m_altp2m_init_ept(). Then,
> hap_track_dirty_vram() calls p2m_change_type_range(), which calls the
> newly added change_type_range().
> 
> Change_type_range() looks like this:
> 
> static void change_type_range(struct p2m_domain *p2m,
>   unsigned long start, unsigned long end,
>   p2m_type_t ot, p2m_type_t nt)
> {
> unsigned long gfn = start;
> struct domain *d = p2m->domain;
> int rc = 0;
> 
> p2m->defer_nested_flush = 1;
> 
> if ( unlikely(end > p2m->max_mapped_pfn) )
> {
> if ( !gfn )
> {
> p2m->change_entry_type_global(p2m, ot, nt);
> gfn = end;
> }
> end = p2m->max_mapped_pfn + 1;
> }
> if ( gfn < end )
> rc = p2m->change_entry_type_range(p2m, ot, nt, gfn, end - 1);
> if ( rc )
> {
> printk(XENLOG_G_ERR "Error %d changing Dom%d GFNs [%lx,%lx] from
> %d to %d\n",
>rc, d->domain_id, start, end - 1, ot, nt);
> domain_crash(d);
> }
> 
> switch ( nt )
> {
> case p2m_ram_rw:
> if ( ot == p2m_ram_logdirty )
> rc = rangeset_remove_range(p2m->logdirty_ranges, start, end
> - 1);
> break;
> case p2m_ram_logdirty:
> if ( ot == p2m_ram_rw )
> rc = rangeset_add_range(p2m->logdirty_ranges, start, end - 1);
> break;
> default:
> break;
> }
> if ( rc )
> {
> printk(XENLOG_G_ERR "Error %d manipulating Dom%d's log-dirty
> ranges\n",
>rc, d->domain_id);
> domain_crash(d);
> }
> 
> p2m->defer_nested_flush = 0;
> if ( nestedhvm_enabled(d) )
> p2m_flush_nestedp2m(d);
> }
> 
> If we set p2m->max_mapped_pfn to 0, we're guaranteed to run into the if
> ( unlikely(end > p2m->max_mapped_pfn) ) body, where end =
> p2m->max_mapped_pfn + 1; will make end 1.
> 
> Then, we will crash the hypervisor in rangeset_add_range(), where
> there's an ASSERT() stating that start <= end.

Ah, right, this was the original crash that you ran into several months
ago, which flagged up the whole logdirty range synchronization issue.

But that's partly a logic hole in change_entry_type_range(), which
assumes that start < p2m->max_mapped_pfn.  It would be better to fix
that than to work around it by changing the meaning of max_mapped_pfn.

On the other hand, we want the logdirty rangesets to actually match the
host's rangesets; using altp2m->max_mapped_pfn for this is clearly
wrong. The easiest fix would be just to explicitly use the host's
max_mapped_pfn when calculating the clipping.  A more complete fix would
involve calculating two different ranges -- a "rangeset" range and a
"invalidate" range, the second of which would be clipped on altp2ms by
{min,max}_remapped_gfn.

Something like the attached (compile-tested only).  I'm partial to
having both patches applied, but I'd be open to arguments that we should
only use the first.

 -George
From d92bd123f92d66aef394735a6d836fd104f01867 Mon Sep 17 00:00:00 2001
From: George Dunlap 
Date: Fri, 16 Nov 2018 17:17:48 +
Subject: [PATCH 1/2] p2m: Always use hostp2m when clipping rangesets

The logdirty rangesets of the altp2ms need to be kept in sync with the
hostp2m.  This means when iterating through the altp2ms, we need to
use the host p2m to clip the rangeset, not the indiviual altp2m's
value.

This change also:

- Documents that the end is non-inclusive

- Calculates an "inclusive" value for the end once, rather than
  open-coding the modification, and (worse) back-modifying updates so
  that the calculation ends up correct

- Clarifies the logic deciding whether to call
  change_entry_type_global() or change_entry_type_range()

- Handles the case where start >= hostp2m->max_mapped_pfn

Signed-off-by: George Dunlap 
---
RFC: Wasn't sure what the best thing was to do if start >=
host_max_pfn.  We silently clip the logdirty rangeset to
max_mapped_pfn, and the chosen behavior seems consistent with that.
But it seems like such a request would almost certainly be a bug
somewhere that people might like to find out about.
---
 xen/arch/x86/mm/p2m.c | 46 +++
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c 

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-16 Thread George Dunlap
On 11/16/18 3:05 PM, George Dunlap wrote:
> Now take change_type_range.  The global effect of change_type_range
> should be that reads of the p2m which happen afterwards should have the
> new, changed value.
> 
> In case A, change_type_range will write invalid entries up to
> max_remapped_pfn, leaving the range between max_remapped_pfn and
> hostp2m->max_mapped_pfn invalid.  When a gfn in this range is read, an
> EPT fault will happen, p2m_altp2m_lazy_copy() will be called, and the
> new (correct) value copied from the hostp2m.
> 
> In case B, change_type_range will write invalid entries up until
> hostp2m->max_mapped_pfn.  When a gfn in this range is accessed, a
> MISCONFIG fault will happen, and the correct value will be calculated in
> resolve_misconfig.

Or, no: resolve_misconfig() will read the current type in the altp2m,
which will be p2m_invalid; p2m_recalc_type() will then return
p2m_invalid, and then set the entry to a "plain" invalid without the
reserved bit set.  Then the fault will happen again, taking the normal
HAP fault path (calling p2m_altp2m_lazy_copy()), at which point...

> And at this point, I realize that my previous analysis was probably
> wrong, because at this point altp2m->max_remapped_gfn will be wrong:
> entries above max_remapped_gfn will have become valid without going
> through p2m_altp2m_lazy_copy().

...altp2m->max_remapped_gfn will be set appropriately.

I think. :-/

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-16 Thread George Dunlap
On 11/16/18 12:31 PM, Jan Beulich wrote:
 On 16.11.18 at 13:03,  wrote:
>> So the basic question is, does "max_mapped_pfn" mean, "Maximum pfn _for
>> the domain_", or "Maximum pfn _for this p2m_".  When the element was
>> added to the p2m struct those were the same thing.  Which do the various
>> use cases expect it to be, and which would be the most robust going forward?
> 
> So with the field getting updated right in ept_set_entry(), as long as
> no copying of entries exists which does not go through that function,
> I then agree that it shouldn't really matter whether the field gets
> copied when setting up a new altp2m.
> 
> However, fair parts of your further response are confusing to me,
> rather than clarifying. That's for one because you talk about the
> max_remapped_gfn field, but you never mention its
> min_remapped_gfn sibling. The only place I could find where
> current code consumes these two is p2m_altp2m_propagate_change().
> This suggests to me that both fields really only exist for optimization
> purposes.
> 
> Furthermore I in particular ...
> 
>> I spent a bunch of time going through the code yesterday, and I'm pretty
>> sure that as long as the value in the p2m is one or the other, there
>> will be at the moment no _correctness_ issues.  It looked to me like in
>> the case where altp2m->max_mapped_pfn > altp2m->max_remapped_gfn, then
>> the p2m machinery would do a certain amount of unnecessary work, but
>> that's all.
>>
>> It also looked to me like before this patch, the value mostly ends up
>> being  "maximum pfn ever mapped in this p2m (even across altp2m
>> desroys)".  That's because when the altp2m is allocated, it starts as 0;
>> every time an entry is propagated from the hostp2m to the altp2m,
>> ept_set_entry() updates max_mapped_pfn; but nothing sets it back to zero.
>>
>> Also, hostp2m->max_mapped_pfn is never decreased, only increased.
>>
>> So either before or after this patch, altp2m->max_remapped_gfn <=
>> altp2m->max_mapped_pfn <= hostp2m->max_mapped_pfn.
>>
>> In the vast majority of cases, max_mapped_pfn is explicitly being read
>> from the hostp2m.
>>
>> Below are the cases where it may be read from an altp2m:
>>
>>  - ept_get_entry(), ept_walk_tables(): If ==max_remapped_gfn, it will
>> return INVALID_MFN early.  If higher than max_remapped_gfn, it falls
>> back to walking the altp2m's ept tables, which will give you the same
>> answer, just a bit more slowly.
>>
>>  - ept_dump_p2m_tables(): If ==max_remapped_gfn, it will dump only up to
>> the current map; if >max_remapped_gfn, it will dump a number of
>> unnecessary INVALID_MFN entries, but no more entries than the hostp2m.
>>
>>  - p2m.c:change_type_range(): If ==max_remapped_gfn, it will only
>> invalidate entries in the altp2m as high have been currently remapped.
>> If >max_remapped_gfn, it will write invalid ept entries that *haven't*
>> yet been copied over.  But I don't think either one should cause a
>> correctness issue: either way, accessing a gfn > max_remapped_gfn will
>> cause a fault, at which point either the correct value will be copied
>> from the hostp2m (perhaps going through resolve_misconfig() on the
>> hostp2m in the process) or the correct value will be calculated via
>> resolve_misconfig().
> 
> ... cannot identify any of the three cases above where I understand
> you say a max_mapped_pfn == max_remapped_gfn comparison
> happens. But as you say - the code is complicated  enough, so I may
> easily overlook something.

Sorry, it seems I took too many shortcuts explaining things. :-)  I was
using max_remapped_gfn as a shorthand for, "the highest gfn mapped in
the altp2m" (since that's what it will be equal to).  Not that an actual
comparison will happen there, but we're considering what will happen,
based on various values of altp2m->max_mapped_pfn, when a gfn that is
higher than the highest *remapped* gfn is encountered.

So in the situation we're considering, the following are always true:
- gfn > altp2m->max_remapped_gfn
- altp2m->max_remapped_gfn <= altp2m->max_mapped_pfn <=
hostp2m->max_mapped_pfn

And we're comparing the results in the following cases:

A: altp2m->max_mapped_pfn == alt2m->max_remapped_gfn
B: altp2m->max_mapped_pfn > altp2m->max_remapped_gfn
  (Perhaps == hostp2m->max_mapped_pfn, perhaps something in between).

Take ept_get_entry().  At this level, it's below all the mem_access /
mem_sharing / mem_paging / altp2m abstractions.  Apart from
resolve_misconfig and pod, it should return the actual contents of the
particular p2m it's given.  In the case of an altp2m, any entries above
what's currently been remapped should return empty.

In case A, it will do this, because the first conditional will find that
gfn > altp2m->max_mapped_pfn.

In case B, it will also do this, because although it passes the first
conditional, when it actually reads the table it will find an empty
entry and return that.

Both results are correct, but A is a tiny bit faster.

Now take 

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-16 Thread Razvan Cojocaru
On 11/16/18 2:03 PM, George Dunlap wrote:
> The code is definitely complicated enough, though, that I may have
> missed something, which is why I asked Razvan if there was a reason he
> changed it.
> 
> For the purposes of this patch, I propose having p2m_altp2m_init_ept()
> set max_mapped_pfn to 0 (if that works), and leaving "get rid of
> max_remapped_pfn" for a future clean-up series.

I've retraced my previous analysis and re-ran some tests, and I now
remember (sorry it took a while) why the p2m->max_mapped_pfn =
hostp2m->max_mapped_pfn was both necessary and not accidental.

Let's say we set it to 0 in p2m_altp2m_init_ept(). Then,
hap_track_dirty_vram() calls p2m_change_type_range(), which calls the
newly added change_type_range().

Change_type_range() looks like this:

static void change_type_range(struct p2m_domain *p2m,
  unsigned long start, unsigned long end,
  p2m_type_t ot, p2m_type_t nt)
{
unsigned long gfn = start;
struct domain *d = p2m->domain;
int rc = 0;

p2m->defer_nested_flush = 1;

if ( unlikely(end > p2m->max_mapped_pfn) )
{
if ( !gfn )
{
p2m->change_entry_type_global(p2m, ot, nt);
gfn = end;
}
end = p2m->max_mapped_pfn + 1;
}
if ( gfn < end )
rc = p2m->change_entry_type_range(p2m, ot, nt, gfn, end - 1);
if ( rc )
{
printk(XENLOG_G_ERR "Error %d changing Dom%d GFNs [%lx,%lx] from
%d to %d\n",
   rc, d->domain_id, start, end - 1, ot, nt);
domain_crash(d);
}

switch ( nt )
{
case p2m_ram_rw:
if ( ot == p2m_ram_logdirty )
rc = rangeset_remove_range(p2m->logdirty_ranges, start, end
- 1);
break;
case p2m_ram_logdirty:
if ( ot == p2m_ram_rw )
rc = rangeset_add_range(p2m->logdirty_ranges, start, end - 1);
break;
default:
break;
}
if ( rc )
{
printk(XENLOG_G_ERR "Error %d manipulating Dom%d's log-dirty
ranges\n",
   rc, d->domain_id);
domain_crash(d);
}

p2m->defer_nested_flush = 0;
if ( nestedhvm_enabled(d) )
p2m_flush_nestedp2m(d);
}

If we set p2m->max_mapped_pfn to 0, we're guaranteed to run into the if
( unlikely(end > p2m->max_mapped_pfn) ) body, where end =
p2m->max_mapped_pfn + 1; will make end 1.

Then, we will crash the hypervisor in rangeset_add_range(), where
there's an ASSERT() stating that start <= end.

So my reasoning was that, since setting p2m->max_mapped_pfn =
hostp2m->max_mapped_pfn is both harmless (as both your and Jan's
analyses appear to confirm) and makes this code correct, that was the
way to go.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-16 Thread Jan Beulich
>>> On 16.11.18 at 13:03,  wrote:
> So the basic question is, does "max_mapped_pfn" mean, "Maximum pfn _for
> the domain_", or "Maximum pfn _for this p2m_".  When the element was
> added to the p2m struct those were the same thing.  Which do the various
> use cases expect it to be, and which would be the most robust going forward?

So with the field getting updated right in ept_set_entry(), as long as
no copying of entries exists which does not go through that function,
I then agree that it shouldn't really matter whether the field gets
copied when setting up a new altp2m.

However, fair parts of your further response are confusing to me,
rather than clarifying. That's for one because you talk about the
max_remapped_gfn field, but you never mention its
min_remapped_gfn sibling. The only place I could find where
current code consumes these two is p2m_altp2m_propagate_change().
This suggests to me that both fields really only exist for optimization
purposes.

Furthermore I in particular ...

> I spent a bunch of time going through the code yesterday, and I'm pretty
> sure that as long as the value in the p2m is one or the other, there
> will be at the moment no _correctness_ issues.  It looked to me like in
> the case where altp2m->max_mapped_pfn > altp2m->max_remapped_gfn, then
> the p2m machinery would do a certain amount of unnecessary work, but
> that's all.
> 
> It also looked to me like before this patch, the value mostly ends up
> being  "maximum pfn ever mapped in this p2m (even across altp2m
> desroys)".  That's because when the altp2m is allocated, it starts as 0;
> every time an entry is propagated from the hostp2m to the altp2m,
> ept_set_entry() updates max_mapped_pfn; but nothing sets it back to zero.
> 
> Also, hostp2m->max_mapped_pfn is never decreased, only increased.
> 
> So either before or after this patch, altp2m->max_remapped_gfn <=
> altp2m->max_mapped_pfn <= hostp2m->max_mapped_pfn.
> 
> In the vast majority of cases, max_mapped_pfn is explicitly being read
> from the hostp2m.
> 
> Below are the cases where it may be read from an altp2m:
> 
>  - ept_get_entry(), ept_walk_tables(): If ==max_remapped_gfn, it will
> return INVALID_MFN early.  If higher than max_remapped_gfn, it falls
> back to walking the altp2m's ept tables, which will give you the same
> answer, just a bit more slowly.
> 
>  - ept_dump_p2m_tables(): If ==max_remapped_gfn, it will dump only up to
> the current map; if >max_remapped_gfn, it will dump a number of
> unnecessary INVALID_MFN entries, but no more entries than the hostp2m.
> 
>  - p2m.c:change_type_range(): If ==max_remapped_gfn, it will only
> invalidate entries in the altp2m as high have been currently remapped.
> If >max_remapped_gfn, it will write invalid ept entries that *haven't*
> yet been copied over.  But I don't think either one should cause a
> correctness issue: either way, accessing a gfn > max_remapped_gfn will
> cause a fault, at which point either the correct value will be copied
> from the hostp2m (perhaps going through resolve_misconfig() on the
> hostp2m in the process) or the correct value will be calculated via
> resolve_misconfig().

... cannot identify any of the three cases above where I understand
you say a max_mapped_pfn == max_remapped_gfn comparison
happens. But as you say - the code is complicated  enough, so I may
easily overlook something.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-16 Thread Razvan Cojocaru
On 11/16/18 2:03 PM, George Dunlap wrote:
> On 11/16/18 10:08 AM, Jan Beulich wrote:
> On 15.11.18 at 18:54,  wrote:
>>> On 11/15/18 7:34 PM, George Dunlap wrote:
> On Nov 14, 2018, at 8:40 PM, Razvan Cojocaru  
> wrote:
> @@ -1440,6 +1443,11 @@ void p2m_init_altp2m_ept(struct domain *d, 
> unsigned int i)
> struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
> struct ept_data *ept;
>
> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;

 Why we need to copy this value?

 I’ve just spent the afternoon tracing around the source code, and while 
 I’m 
>>> pretty sure it won’t hurt, I’m also not sure why it’s necessary.
>>>
>>> I think I did that because I assumed that it would matter for
>>> ept_get_entry() and misconfigurations, when:
>>>
>>>  954 /* This pfn is higher than the highest the p2m map currently
>>> holds */
>>>  955 if ( gfn > p2m->max_mapped_pfn )
>>
>> The question is whether under any condition such checks require that
>> the accumulated value be in sync between the host and the various
>> alt p2m-s.
>>
>>>  956 {
>>>  957 for ( i = ept->wl; i > 0; --i )
>>>  958 if ( (gfn & ~((1UL << (i * EPT_TABLE_ORDER)) - 1)) >
>>>  959  p2m->max_mapped_pfn )
>>>  960 break;
>>>  961 goto out;
>>>  962 }
>>>
>>> but I am not certain it is required either. I can certainly remove this
>>> assignment and see if anything breaks.
>>
>> I've seen you mention such or alike a couple of times now while
>> discussing this series, and I have to admit that I'm a little frightened:
>> Making a change just based on the observation that nothing breaks
>> is setting us up for breakage in some corner case. That is - seeing
>> something break is a good indication that a change was wrong, but
>> not seeing any breakage doesn't alone justify a change.
>>
>> In the particular case here I think whether the copying of the field
>> is needed depends on what else is being copied (I'm sorry, I'm not
>> that familiar with altp2m): If mappings get cloned from the host p2m,
>> then this value ought to be cloned too. For any mappings that
>> might be kept in sync between p2m-s, I think this field also would
>> need to be updated in sync.
> 
> So the basic question is, does "max_mapped_pfn" mean, "Maximum pfn _for
> the domain_", or "Maximum pfn _for this p2m_".  When the element was
> added to the p2m struct those were the same thing.  Which do the various
> use cases expect it to be, and which would be the most robust going forward?
> 
> I spent a bunch of time going through the code yesterday, and I'm pretty
> sure that as long as the value in the p2m is one or the other, there
> will be at the moment no _correctness_ issues.  It looked to me like in
> the case where altp2m->max_mapped_pfn > altp2m->max_remapped_gfn, then
> the p2m machinery would do a certain amount of unnecessary work, but
> that's all.
> 
> It also looked to me like before this patch, the value mostly ends up
> being  "maximum pfn ever mapped in this p2m (even across altp2m
> desroys)".  That's because when the altp2m is allocated, it starts as 0;
> every time an entry is propagated from the hostp2m to the altp2m,
> ept_set_entry() updates max_mapped_pfn; but nothing sets it back to zero.
> 
> Also, hostp2m->max_mapped_pfn is never decreased, only increased.
> 
> So either before or after this patch, altp2m->max_remapped_gfn <=
> altp2m->max_mapped_pfn <= hostp2m->max_mapped_pfn.
> 
> In the vast majority of cases, max_mapped_pfn is explicitly being read
> from the hostp2m.
> 
> Below are the cases where it may be read from an altp2m:
> 
>  - ept_get_entry(), ept_walk_tables(): If ==max_remapped_gfn, it will
> return INVALID_MFN early.  If higher than max_remapped_gfn, it falls
> back to walking the altp2m's ept tables, which will give you the same
> answer, just a bit more slowly.
> 
>  - ept_dump_p2m_tables(): If ==max_remapped_gfn, it will dump only up to
> the current map; if >max_remapped_gfn, it will dump a number of
> unnecessary INVALID_MFN entries, but no more entries than the hostp2m.
> 
>  - p2m.c:change_type_range(): If ==max_remapped_gfn, it will only
> invalidate entries in the altp2m as high have been currently remapped.
> If >max_remapped_gfn, it will write invalid ept entries that *haven't*
> yet been copied over.  But I don't think either one should cause a
> correctness issue: either way, accessing a gfn > max_remapped_gfn will
> cause a fault, at which point either the correct value will be copied
> from the hostp2m (perhaps going through resolve_misconfig() on the
> hostp2m in the process) or the correct value will be calculated via
> resolve_misconfig().
> 
> So, it seemed from my investigation like it would be more useful if we
> got rid of altp2m->max_remapped_gfn, and used atlp2m->max_mapped_pfn
> instead.  If people want to know the maximum gfn mapped by the domain as
> a whole, they should just use 

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-16 Thread George Dunlap
On 11/15/18 8:51 PM, Razvan Cojocaru wrote:
> On 11/15/18 10:26 PM, George Dunlap wrote:
>>
>>
>>> On Nov 14, 2018, at 8:40 PM, Razvan Cojocaru  
>>> wrote:
>>>
>>> When an new altp2m view is created very early on guest boot, the
>>> display will freeze (although the guest will run normally). This
>>> may also happen on resizing the display. The reason is the way
>>> Xen currently (mis)handles logdirty VGA: it intentionally
>>> misconfigures VGA pages so that they will fault.
>>>
>>> The problem is that it only does this in the host p2m. Once we
>>> switch to a new altp2m, the misconfigured entries will no longer
>>> fault, so the display will not be updated.
>>>
>>> This patch:
>>> * updates ept_handle_misconfig() to use the active altp2m instead
>>>  of the hostp2m;
>>> * modifies p2m_change_entry_type_global(), p2m_memory_type_changed
>>>  and p2m_change_type_range() to propagate their changes to all
>>>  valid altp2ms.
>>>
>>> With the introduction of altp2m fields in p2m_memory_type_changed()
>>> the whole function has been put under CONFIG_HVM.
>>
>> Sorry — actually, I think there’s yet another issue lurking here:  
>> p2m_finish_type_change(). I’ll poke around a bit more tomorrow.
> 
> OK, I'll wait for your recommendation before working on the last patch
> of the series.

It looks like you'll want to do the same thing to
p2m_finish_type_change() that you did for p2m_change_type_range():
create a finish_type_change() function which loops over the appropriate
range calling p2m->recalc(), and then have p2m_finish_type_change()
first call this on the hostp2m, then one-by-one on all valid altp2ms.

In finish_type_change() you'll want to clip first_gfn and max_nr to fit
within p2m->min_remapped_gfn and p2m->max_remapped_gfn for altp2ms.

I think that should do the trick.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-16 Thread George Dunlap
On 11/16/18 10:08 AM, Jan Beulich wrote:
 On 15.11.18 at 18:54,  wrote:
>> On 11/15/18 7:34 PM, George Dunlap wrote:
 On Nov 14, 2018, at 8:40 PM, Razvan Cojocaru  
 wrote:
 @@ -1440,6 +1443,11 @@ void p2m_init_altp2m_ept(struct domain *d, unsigned 
 int i)
 struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
 struct ept_data *ept;

 +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
>>>
>>> Why we need to copy this value?
>>>
>>> I’ve just spent the afternoon tracing around the source code, and while I’m 
>> pretty sure it won’t hurt, I’m also not sure why it’s necessary.
>>
>> I think I did that because I assumed that it would matter for
>> ept_get_entry() and misconfigurations, when:
>>
>>  954 /* This pfn is higher than the highest the p2m map currently
>> holds */
>>  955 if ( gfn > p2m->max_mapped_pfn )
> 
> The question is whether under any condition such checks require that
> the accumulated value be in sync between the host and the various
> alt p2m-s.
> 
>>  956 {
>>  957 for ( i = ept->wl; i > 0; --i )
>>  958 if ( (gfn & ~((1UL << (i * EPT_TABLE_ORDER)) - 1)) >
>>  959  p2m->max_mapped_pfn )
>>  960 break;
>>  961 goto out;
>>  962 }
>>
>> but I am not certain it is required either. I can certainly remove this
>> assignment and see if anything breaks.
> 
> I've seen you mention such or alike a couple of times now while
> discussing this series, and I have to admit that I'm a little frightened:
> Making a change just based on the observation that nothing breaks
> is setting us up for breakage in some corner case. That is - seeing
> something break is a good indication that a change was wrong, but
> not seeing any breakage doesn't alone justify a change.
> 
> In the particular case here I think whether the copying of the field
> is needed depends on what else is being copied (I'm sorry, I'm not
> that familiar with altp2m): If mappings get cloned from the host p2m,
> then this value ought to be cloned too. For any mappings that
> might be kept in sync between p2m-s, I think this field also would
> need to be updated in sync.

So the basic question is, does "max_mapped_pfn" mean, "Maximum pfn _for
the domain_", or "Maximum pfn _for this p2m_".  When the element was
added to the p2m struct those were the same thing.  Which do the various
use cases expect it to be, and which would be the most robust going forward?

I spent a bunch of time going through the code yesterday, and I'm pretty
sure that as long as the value in the p2m is one or the other, there
will be at the moment no _correctness_ issues.  It looked to me like in
the case where altp2m->max_mapped_pfn > altp2m->max_remapped_gfn, then
the p2m machinery would do a certain amount of unnecessary work, but
that's all.

It also looked to me like before this patch, the value mostly ends up
being  "maximum pfn ever mapped in this p2m (even across altp2m
desroys)".  That's because when the altp2m is allocated, it starts as 0;
every time an entry is propagated from the hostp2m to the altp2m,
ept_set_entry() updates max_mapped_pfn; but nothing sets it back to zero.

Also, hostp2m->max_mapped_pfn is never decreased, only increased.

So either before or after this patch, altp2m->max_remapped_gfn <=
altp2m->max_mapped_pfn <= hostp2m->max_mapped_pfn.

In the vast majority of cases, max_mapped_pfn is explicitly being read
from the hostp2m.

Below are the cases where it may be read from an altp2m:

 - ept_get_entry(), ept_walk_tables(): If ==max_remapped_gfn, it will
return INVALID_MFN early.  If higher than max_remapped_gfn, it falls
back to walking the altp2m's ept tables, which will give you the same
answer, just a bit more slowly.

 - ept_dump_p2m_tables(): If ==max_remapped_gfn, it will dump only up to
the current map; if >max_remapped_gfn, it will dump a number of
unnecessary INVALID_MFN entries, but no more entries than the hostp2m.

 - p2m.c:change_type_range(): If ==max_remapped_gfn, it will only
invalidate entries in the altp2m as high have been currently remapped.
If >max_remapped_gfn, it will write invalid ept entries that *haven't*
yet been copied over.  But I don't think either one should cause a
correctness issue: either way, accessing a gfn > max_remapped_gfn will
cause a fault, at which point either the correct value will be copied
from the hostp2m (perhaps going through resolve_misconfig() on the
hostp2m in the process) or the correct value will be calculated via
resolve_misconfig().

So, it seemed from my investigation like it would be more useful if we
got rid of altp2m->max_remapped_gfn, and used atlp2m->max_mapped_pfn
instead.  If people want to know the maximum gfn mapped by the domain as
a whole, they should just use hostp2m->max_mapped_pfn.

The code is definitely complicated enough, though, that I may have
missed something, which is why I asked Razvan if there was a reason he
changed it.

For the purposes of this 

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-16 Thread Razvan Cojocaru
On 11/16/18 12:08 PM, Jan Beulich wrote:
> I've seen you mention such or alike a couple of times now while
> discussing this series, and I have to admit that I'm a little frightened:
> Making a change just based on the observation that nothing breaks
> is setting us up for breakage in some corner case. That is - seeing
> something break is a good indication that a change was wrong, but
> not seeing any breakage doesn't alone justify a change.
> 
> In the particular case here I think whether the copying of the field
> is needed depends on what else is being copied (I'm sorry, I'm not
> that familiar with altp2m): If mappings get cloned from the host p2m,
> then this value ought to be cloned too. For any mappings that
> might be kept in sync between p2m-s, I think this field also would
> need to be updated in sync.

I agree, I was merely offering to test whether any breakage occurs if
not syncing the field, not suggesting that that's a comprehensive
analysis or a justification for change, in case that offers us more insight.

AFAIK, EPT restrictions are always propagated from the hostp2m to all
active altp2ms. I've tested this at some point in the past and it has
proven true - it's supposedly an altp2m feature although it can also be
a bit of a headache: if we would have been able to use the hostp2m
independently then we wouldn't have had this display freeze problem,
since we could have used the hostp2m as the default (restricted) EPT
view. As things stand now, we need to create a new view when starting
introspection and switch to that immediately, so changes to it will not
propagate into other views (in other words, so we can use it
independently of all other altp2ms).


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-16 Thread Jan Beulich
>>> On 15.11.18 at 18:54,  wrote:
> On 11/15/18 7:34 PM, George Dunlap wrote:
>>> On Nov 14, 2018, at 8:40 PM, Razvan Cojocaru  
>>> wrote:
>>> @@ -1440,6 +1443,11 @@ void p2m_init_altp2m_ept(struct domain *d, unsigned 
>>> int i)
>>> struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
>>> struct ept_data *ept;
>>>
>>> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
>> 
>> Why we need to copy this value?
>> 
>> I’ve just spent the afternoon tracing around the source code, and while I’m 
> pretty sure it won’t hurt, I’m also not sure why it’s necessary.
> 
> I think I did that because I assumed that it would matter for
> ept_get_entry() and misconfigurations, when:
> 
>  954 /* This pfn is higher than the highest the p2m map currently
> holds */
>  955 if ( gfn > p2m->max_mapped_pfn )

The question is whether under any condition such checks require that
the accumulated value be in sync between the host and the various
alt p2m-s.

>  956 {
>  957 for ( i = ept->wl; i > 0; --i )
>  958 if ( (gfn & ~((1UL << (i * EPT_TABLE_ORDER)) - 1)) >
>  959  p2m->max_mapped_pfn )
>  960 break;
>  961 goto out;
>  962 }
> 
> but I am not certain it is required either. I can certainly remove this
> assignment and see if anything breaks.

I've seen you mention such or alike a couple of times now while
discussing this series, and I have to admit that I'm a little frightened:
Making a change just based on the observation that nothing breaks
is setting us up for breakage in some corner case. That is - seeing
something break is a good indication that a change was wrong, but
not seeing any breakage doesn't alone justify a change.

In the particular case here I think whether the copying of the field
is needed depends on what else is being copied (I'm sorry, I'm not
that familiar with altp2m): If mappings get cloned from the host p2m,
then this value ought to be cloned too. For any mappings that
might be kept in sync between p2m-s, I think this field also would
need to be updated in sync.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-15 Thread Razvan Cojocaru
On 11/15/18 10:26 PM, George Dunlap wrote:
> 
> 
>> On Nov 14, 2018, at 8:40 PM, Razvan Cojocaru  
>> wrote:
>>
>> When an new altp2m view is created very early on guest boot, the
>> display will freeze (although the guest will run normally). This
>> may also happen on resizing the display. The reason is the way
>> Xen currently (mis)handles logdirty VGA: it intentionally
>> misconfigures VGA pages so that they will fault.
>>
>> The problem is that it only does this in the host p2m. Once we
>> switch to a new altp2m, the misconfigured entries will no longer
>> fault, so the display will not be updated.
>>
>> This patch:
>> * updates ept_handle_misconfig() to use the active altp2m instead
>>  of the hostp2m;
>> * modifies p2m_change_entry_type_global(), p2m_memory_type_changed
>>  and p2m_change_type_range() to propagate their changes to all
>>  valid altp2ms.
>>
>> With the introduction of altp2m fields in p2m_memory_type_changed()
>> the whole function has been put under CONFIG_HVM.
> 
> Sorry — actually, I think there’s yet another issue lurking here:  
> p2m_finish_type_change(). I’ll poke around a bit more tomorrow.

OK, I'll wait for your recommendation before working on the last patch
of the series.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-15 Thread George Dunlap


> On Nov 14, 2018, at 8:40 PM, Razvan Cojocaru  
> wrote:
> 
> When an new altp2m view is created very early on guest boot, the
> display will freeze (although the guest will run normally). This
> may also happen on resizing the display. The reason is the way
> Xen currently (mis)handles logdirty VGA: it intentionally
> misconfigures VGA pages so that they will fault.
> 
> The problem is that it only does this in the host p2m. Once we
> switch to a new altp2m, the misconfigured entries will no longer
> fault, so the display will not be updated.
> 
> This patch:
> * updates ept_handle_misconfig() to use the active altp2m instead
>  of the hostp2m;
> * modifies p2m_change_entry_type_global(), p2m_memory_type_changed
>  and p2m_change_type_range() to propagate their changes to all
>  valid altp2ms.
> 
> With the introduction of altp2m fields in p2m_memory_type_changed()
> the whole function has been put under CONFIG_HVM.

Sorry — actually, I think there’s yet another issue lurking here:  
p2m_finish_type_change(). I’ll poke around a bit more tomorrow.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-15 Thread Razvan Cojocaru
On 11/15/18 7:34 PM, George Dunlap wrote:
> 
> 
>> On Nov 14, 2018, at 8:40 PM, Razvan Cojocaru  
>> wrote:
>>
>> When an new altp2m view is created very early on guest boot, the
>> display will freeze (although the guest will run normally). This
>> may also happen on resizing the display. The reason is the way
>> Xen currently (mis)handles logdirty VGA: it intentionally
>> misconfigures VGA pages so that they will fault.
>>
>> The problem is that it only does this in the host p2m. Once we
>> switch to a new altp2m, the misconfigured entries will no longer
>> fault, so the display will not be updated.
>>
>> This patch:
>> * updates ept_handle_misconfig() to use the active altp2m instead
>>  of the hostp2m;
>> * modifies p2m_change_entry_type_global(), p2m_memory_type_changed
>>  and p2m_change_type_range() to propagate their changes to all
>>  valid altp2ms.
>>
>> With the introduction of altp2m fields in p2m_memory_type_changed()
>> the whole function has been put under CONFIG_HVM.
>>
>> Signed-off-by: Razvan Cojocaru 
>> Suggested-by: George Dunlap 
>> Reviewed-by: Kevin Tian 
> 
> Just one more question...
> 
>>
>> ---
>> CC: Jun Nakajima 
>> CC: Kevin Tian 
>> CC: George Dunlap 
>> CC: Jan Beulich 
>> CC: Andrew Cooper 
>> CC: Wei Liu 
>>
>> ---
>> Changes since V5:
>> - Added Kevin's Reviewed-by.
>> - Added a note on p2m_memory_type_changed() being under CONFIG_HVM.
>> ---
>> xen/arch/x86/mm/p2m-ept.c |   8 
>> xen/arch/x86/mm/p2m-pt.c  |   8 
>> xen/arch/x86/mm/p2m.c | 115 
>> ++
>> xen/include/asm-x86/p2m.h |   6 +--
>> 4 files changed, 114 insertions(+), 23 deletions(-)
>>
>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>> index fabcd06..e6fa85f 100644
>> --- a/xen/arch/x86/mm/p2m-ept.c
>> +++ b/xen/arch/x86/mm/p2m-ept.c
>> @@ -657,6 +657,9 @@ bool_t ept_handle_misconfig(uint64_t gpa)
>> bool_t spurious;
>> int rc;
>>
>> +if ( altp2m_active(curr->domain) )
>> +p2m = p2m_get_altp2m(curr);
>> +
>> p2m_lock(p2m);
>>
>> spurious = curr->arch.hvm.vmx.ept_spurious_misconfig;
>> @@ -1440,6 +1443,11 @@ void p2m_init_altp2m_ept(struct domain *d, unsigned 
>> int i)
>> struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
>> struct ept_data *ept;
>>
>> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;
> 
> Why we need to copy this value?
> 
> I’ve just spent the afternoon tracing around the source code, and while I’m 
> pretty sure it won’t hurt, I’m also not sure why it’s necessary.

I think I did that because I assumed that it would matter for
ept_get_entry() and misconfigurations, when:

 954 /* This pfn is higher than the highest the p2m map currently
holds */
 955 if ( gfn > p2m->max_mapped_pfn )
 956 {
 957 for ( i = ept->wl; i > 0; --i )
 958 if ( (gfn & ~((1UL << (i * EPT_TABLE_ORDER)) - 1)) >
 959  p2m->max_mapped_pfn )
 960 break;
 961 goto out;
 962 }

but I am not certain it is required either. I can certainly remove this
assignment and see if anything breaks.

> Everything else looks good!


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V6 4/4] x86/altp2m: fix display frozen when switching to a new view early

2018-11-15 Thread George Dunlap


> On Nov 14, 2018, at 8:40 PM, Razvan Cojocaru  
> wrote:
> 
> When an new altp2m view is created very early on guest boot, the
> display will freeze (although the guest will run normally). This
> may also happen on resizing the display. The reason is the way
> Xen currently (mis)handles logdirty VGA: it intentionally
> misconfigures VGA pages so that they will fault.
> 
> The problem is that it only does this in the host p2m. Once we
> switch to a new altp2m, the misconfigured entries will no longer
> fault, so the display will not be updated.
> 
> This patch:
> * updates ept_handle_misconfig() to use the active altp2m instead
>  of the hostp2m;
> * modifies p2m_change_entry_type_global(), p2m_memory_type_changed
>  and p2m_change_type_range() to propagate their changes to all
>  valid altp2ms.
> 
> With the introduction of altp2m fields in p2m_memory_type_changed()
> the whole function has been put under CONFIG_HVM.
> 
> Signed-off-by: Razvan Cojocaru 
> Suggested-by: George Dunlap 
> Reviewed-by: Kevin Tian 

Just one more question...

> 
> ---
> CC: Jun Nakajima 
> CC: Kevin Tian 
> CC: George Dunlap 
> CC: Jan Beulich 
> CC: Andrew Cooper 
> CC: Wei Liu 
> 
> ---
> Changes since V5:
> - Added Kevin's Reviewed-by.
> - Added a note on p2m_memory_type_changed() being under CONFIG_HVM.
> ---
> xen/arch/x86/mm/p2m-ept.c |   8 
> xen/arch/x86/mm/p2m-pt.c  |   8 
> xen/arch/x86/mm/p2m.c | 115 ++
> xen/include/asm-x86/p2m.h |   6 +--
> 4 files changed, 114 insertions(+), 23 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index fabcd06..e6fa85f 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -657,6 +657,9 @@ bool_t ept_handle_misconfig(uint64_t gpa)
> bool_t spurious;
> int rc;
> 
> +if ( altp2m_active(curr->domain) )
> +p2m = p2m_get_altp2m(curr);
> +
> p2m_lock(p2m);
> 
> spurious = curr->arch.hvm.vmx.ept_spurious_misconfig;
> @@ -1440,6 +1443,11 @@ void p2m_init_altp2m_ept(struct domain *d, unsigned 
> int i)
> struct p2m_domain *hostp2m = p2m_get_hostp2m(d);
> struct ept_data *ept;
> 
> +p2m->max_mapped_pfn = hostp2m->max_mapped_pfn;

Why we need to copy this value?

I’ve just spent the afternoon tracing around the source code, and while I’m 
pretty sure it won’t hurt, I’m also not sure why it’s necessary.

Everything else looks good!

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel