On 7/22/2016 4:00 PM, Razvan Cojocaru wrote:
On 07/22/2016 03:32 PM, Corneliu ZUZU wrote:
Look @ hvm_do_resume():
1. If you, as a toolstack user, get at sane step no. 6: "Uninitialize
everything (no events are possible here because of steps 4-5)."
2. But just before you do that, a hvm_do_resume() happens on an
arbitrary vCPU of the domain and gets to this point:
if ( unlikely(v->arch.vm_event) )
{
// ---> let's say you are now at this point, just after the
above non-NULL check
struct monitor_write_data *w = &v->arch.vm_event->write_data;
3. and before proceeding further, the uninitialization sequence
_completes_, i.e. this was done in vm_event_cleanup_domain():
for_each_vcpu ( d, v )
{
xfree(v->arch.vm_event);
v->arch.vm_event = NULL;
}
4. and then in hvm_do_resume() this gets to be done:
struct monitor_write_data *w = &v->arch.vm_event->write_data;
[...]
if ( w->do_write.msr )
{
hvm_msr_write_intercept(w->msr, w->value, 0);
w->do_write.msr = 0;
}
then you'll have a beautiful NULL-pointer access hypervisor crash.
As I've said before, of course I agree with that
Sorry, I thought by your previous statement "I assume that the reason
why this has not been addressed in the past is that introspection
applications are expected to be well-behaved" you were inferring that
with a 'sane' toolstack user, as you defined it, it won't be possible to
have a hypervisor crash.
I did (please see below).
Then, again, you'd be wrong, the hvm_do_resume() was just one example,
in a similar fashion this can happen in a number of other places.
Details will be given in the patch-series.
- isn't that what a
previous patch you've submitted addressed?
I'm not sure what patch you're referring to, I don't remember ever
addressing these issues before..
I am talking about your patch "x86/monitor: fix: don't compromise a
monitor_write_data with pending CR writes", which, if I'm not mistaken,
addresses the above problem (since we won't xfree(v->arch.vm_event) on
monitor uninit anymore).
Heh, I wouldn't call fixing problem A and by -coincidence- with that fix
also having -totally unrelated- problem B go away "addressing problem
B". I didn't even mention these locking issues in that patch, so how
could I have addressed them there?
With the above problem fixed, the workflow I suggested should be fine.
The fact of the matter remains that the current state of staging (which
is what this thread applies to) still has the hvm_do_resume() issue as
well as the others.
Again, I would prefer things to be rock solid, so personally I encourage
your patch and hope we get it in.
Thanks,
Razvan
Corneliu.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel