Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-22 Thread Paul Durrant
> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: 22 February 2016 17:23
> To: Paul Durrant
> Cc: Jan Beulich; Kevin Tian; Zhang Yu; Wei Liu; Ian Campbell; Andrew Cooper;
> xen-devel@lists.xen.org; Stefano Stabellini; Zhiyuan Lv; Ian Jackson; Keir
> (Xen.org)
> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> On 22/02/16 17:01, Paul Durrant wrote:
> >> What you did in an earlier version of this series (correct me if I'm
> >> wrong) is to make a separate hypercall for memory, but still keep using
> >> the same internal implementation (i.e., still having a write_dm p2m type
> >> and using rangesets to determine which ioreq server to give the
> >> notification to).  But since the interface for memory looks exactly the
> >> same as the interface for MMIO, at some point this morphed back to "use
> >> xen_hvm_io_range but with a different range type (i.e., WP_MEM)".
> >>
> >
> > Yes, and that's now pointless since we're going to use purely p2m types for
> sending memory accesses to ioreq servers.
> >
> >> From an *interface* perspective that makes sense, because in both cases
> >> you want to be able to specify a domain, an ioreq server, and a range of
> >> physical addresses.  I don't have any objection to the change you made
> >> to hvm_op.h in this version of the series.
> >>
> >
> > No, if we are intercepting 'memory' purely by p2m type then there is no
> need for the additional range type.
> 
> So here seems to be the crux of our disagreement.
> 
> I don't understand why you think that the WP_MEM interface described in
> patch 2 of this series can't be implemented using p2m types rather than
> rangesets.
> 

Huh? I just said that's exactly how I intend to implement it:

"if we are intercepting 'memory' purely by p2m type then there is no need for 
the additional range type"

I.e. we will not intercept memory accesses by range, we will only intercept 
them by type.

  Paul

>  -George


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-22 Thread George Dunlap
On 22/02/16 17:01, Paul Durrant wrote:
>> What you did in an earlier version of this series (correct me if I'm
>> wrong) is to make a separate hypercall for memory, but still keep using
>> the same internal implementation (i.e., still having a write_dm p2m type
>> and using rangesets to determine which ioreq server to give the
>> notification to).  But since the interface for memory looks exactly the
>> same as the interface for MMIO, at some point this morphed back to "use
>> xen_hvm_io_range but with a different range type (i.e., WP_MEM)".
>>
> 
> Yes, and that's now pointless since we're going to use purely p2m types for 
> sending memory accesses to ioreq servers.
> 
>> From an *interface* perspective that makes sense, because in both cases
>> you want to be able to specify a domain, an ioreq server, and a range of
>> physical addresses.  I don't have any objection to the change you made
>> to hvm_op.h in this version of the series.
>>
> 
> No, if we are intercepting 'memory' purely by p2m type then there is no need 
> for the additional range type.

So here seems to be the crux of our disagreement.

I don't understand why you think that the WP_MEM interface described in
patch 2 of this series can't be implemented using p2m types rather than
rangesets.

 -George



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-22 Thread Paul Durrant
> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: 22 February 2016 16:46
> To: Paul Durrant
> Cc: Jan Beulich; Kevin Tian; Zhang Yu; Wei Liu; Ian Campbell; Andrew Cooper;
> xen-devel@lists.xen.org; Stefano Stabellini; Zhiyuan Lv; Ian Jackson; Keir
> (Xen.org)
> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> On 22/02/16 16:02, Paul Durrant wrote:
> >> -Original Message-
> >> From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
> >> George Dunlap
> >> Sent: 22 February 2016 15:56
> >> To: Paul Durrant
> >> Cc: George Dunlap; Jan Beulich; Kevin Tian; Zhang Yu; Wei Liu; Ian
> Campbell;
> >> Andrew Cooper; xen-devel@lists.xen.org; Stefano Stabellini; Zhiyuan Lv;
> Ian
> >> Jackson; Keir (Xen.org)
> >> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> >> max_wp_ram_ranges.
> >>
> >> On Wed, Feb 17, 2016 at 11:12 AM, Paul Durrant
> <paul.durr...@citrix.com>
> >> wrote:
> >>>> -Original Message-
> >>>> From: George Dunlap [mailto:george.dun...@citrix.com]
> >>>> Sent: 17 February 2016 11:02
> >>>> To: Jan Beulich; Paul Durrant; Kevin Tian; Zhang Yu
> >>>> Cc: Andrew Cooper; Ian Campbell; Ian Jackson; Stefano Stabellini; Wei
> Liu;
> >>>> Zhiyuan Lv; xen-devel@lists.xen.org; George Dunlap; Keir (Xen.org)
> >>>> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> >>>> max_wp_ram_ranges.
> >>>>
> >>>> On 17/02/16 10:22, Jan Beulich wrote:
> >>>>>>>> On 17.02.16 at 10:58, <kevin.t...@intel.com> wrote:
> >>>>>> Thanks for the help. Let's see whether we can have some solution
> >> ready
> >>>> for
> >>>>>> 4.7. :-)
> >>>>>
> >>>>> Since we now seem to all agree that a different approach is going to
> >>>>> be taken, I think we indeed should revert f5a32c5b8e ("x86/HVM:
> >>>>> differentiate IO/mem resources tracked by ioreq server"). Please
> >>>>> voice objections to the plan pretty soon.
> >>>>
> >>>> FWIW, after this discussion, I don't have an objection to the basic
> >>>> interface in this series as it is, since it addresses my request that it
> >>>> be memory-based, and it could be switched to using p2m types
> >>>> behind-the-scenes -- with the exception of the knob to control how
> many
> >>>> ranges are allowed (since it exposes the internal implementation).
> >>>>
> >>>> I wouldn't object to the current implementation going in as it was in v9
> >>>> (apparently), and then changing the type stuff around behind the
> scenes
> >>>> later as an optimization.  I also don't think it would be terribly
> >>>> difficult to change the implementation as it is to just use write_dm for
> >>>> a single IOREQ server.  We can rename it ioreq_server and expand it
> >>>> later.  Sorry if this wasn't clear from my comments before.
> >>>>
> >>>
> >>> The problem is that, since you objected to wp_mem being treated the
> >> same as mmio, we had to introduce a new range type to the map/unmap
> >> hypercalls and that will become redundant if we steer by type. If we could
> >> have just increased the size of the rangeset for mmio and used that *for
> >> now* then weeks of work could have been saved going down this dead
> end.
> >>
> >> So first of all, let me say that I handled things with respect to this
> >> thread rather poorly in a lot of ways, and you're right to be
> >> frustrated.  All I can say is, I'm sorry and I'll try to do better.
> >>
> >> I'm not sure I understand what you're saying with this line: "We had
> >> to introduce a new range type to the map/unmap hypercalls that will
> >> become redundant if we steer by type".
> >>
> >
> > What I mean is that if we are going to do away with the concept of 'wp
> mem' and replace it with 'special type for steering to ioreq server' then we
> only need range types of pci, portio and mmio as we had before.
> 
> I'm still confused.
> 

So am I. I thought I was being quite clear.

> To me, there are two distinct things here: the interface and the
> implementation.
> 

Indeed.

> From an interface perspective, ioreq servers 

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-22 Thread George Dunlap
On 22/02/16 16:02, Paul Durrant wrote:
>> -Original Message-
>> From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
>> George Dunlap
>> Sent: 22 February 2016 15:56
>> To: Paul Durrant
>> Cc: George Dunlap; Jan Beulich; Kevin Tian; Zhang Yu; Wei Liu; Ian Campbell;
>> Andrew Cooper; xen-devel@lists.xen.org; Stefano Stabellini; Zhiyuan Lv; Ian
>> Jackson; Keir (Xen.org)
>> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
>> max_wp_ram_ranges.
>>
>> On Wed, Feb 17, 2016 at 11:12 AM, Paul Durrant <paul.durr...@citrix.com>
>> wrote:
>>>> -Original Message-
>>>> From: George Dunlap [mailto:george.dun...@citrix.com]
>>>> Sent: 17 February 2016 11:02
>>>> To: Jan Beulich; Paul Durrant; Kevin Tian; Zhang Yu
>>>> Cc: Andrew Cooper; Ian Campbell; Ian Jackson; Stefano Stabellini; Wei Liu;
>>>> Zhiyuan Lv; xen-devel@lists.xen.org; George Dunlap; Keir (Xen.org)
>>>> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
>>>> max_wp_ram_ranges.
>>>>
>>>> On 17/02/16 10:22, Jan Beulich wrote:
>>>>>>>> On 17.02.16 at 10:58, <kevin.t...@intel.com> wrote:
>>>>>> Thanks for the help. Let's see whether we can have some solution
>> ready
>>>> for
>>>>>> 4.7. :-)
>>>>>
>>>>> Since we now seem to all agree that a different approach is going to
>>>>> be taken, I think we indeed should revert f5a32c5b8e ("x86/HVM:
>>>>> differentiate IO/mem resources tracked by ioreq server"). Please
>>>>> voice objections to the plan pretty soon.
>>>>
>>>> FWIW, after this discussion, I don't have an objection to the basic
>>>> interface in this series as it is, since it addresses my request that it
>>>> be memory-based, and it could be switched to using p2m types
>>>> behind-the-scenes -- with the exception of the knob to control how many
>>>> ranges are allowed (since it exposes the internal implementation).
>>>>
>>>> I wouldn't object to the current implementation going in as it was in v9
>>>> (apparently), and then changing the type stuff around behind the scenes
>>>> later as an optimization.  I also don't think it would be terribly
>>>> difficult to change the implementation as it is to just use write_dm for
>>>> a single IOREQ server.  We can rename it ioreq_server and expand it
>>>> later.  Sorry if this wasn't clear from my comments before.
>>>>
>>>
>>> The problem is that, since you objected to wp_mem being treated the
>> same as mmio, we had to introduce a new range type to the map/unmap
>> hypercalls and that will become redundant if we steer by type. If we could
>> have just increased the size of the rangeset for mmio and used that *for
>> now* then weeks of work could have been saved going down this dead end.
>>
>> So first of all, let me say that I handled things with respect to this
>> thread rather poorly in a lot of ways, and you're right to be
>> frustrated.  All I can say is, I'm sorry and I'll try to do better.
>>
>> I'm not sure I understand what you're saying with this line: "We had
>> to introduce a new range type to the map/unmap hypercalls that will
>> become redundant if we steer by type".
>>
> 
> What I mean is that if we are going to do away with the concept of 'wp mem' 
> and replace it with 'special type for steering to ioreq server' then we only 
> need range types of pci, portio and mmio as we had before.

I'm still confused.

To me, there are two distinct things here: the interface and the
implementation.

From an interface perspective, ioreq servers need to be able to ask Xen
to intercept writes to specific memory regions.  I insisted that this
interface be specifically designed for memory, not calling them MMIO
regions (since it's actually memory and not MMIO regions); and I said
that the explosion in number of required rangesets demonstrated that
rangesets were a bad fit to implement this interface.
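
As a rough standalone illustration (not Xen code) of why rangesets fit
poorly here: write-protected gpfns tend to be scattered, and every
non-contiguous gpfn starts a new range, so the range count approaches the
page count.

#include <stdio.h>

int main(void)
{
    /* e.g. alternate pages of a shadowed GPU page table */
    unsigned long gpfns[] = { 0x1000, 0x1002, 0x1004, 0x1006 };
    unsigned int n = sizeof(gpfns) / sizeof(gpfns[0]), ranges = 1, i;

    for ( i = 1; i < n; i++ )
        if ( gpfns[i] != gpfns[i - 1] + 1 )
            ranges++;               /* a gap forces a new range */

    printf("%u scattered pages -> %u ranges\n", n, ranges);
    return 0;
}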

What you did in an earlier version of this series (correct me if I'm
wrong) is to make a separate hypercall for memory, but still keep using
the same internal implementation (i.e., still having a write_dm p2m type
and using rangesets to determine which ioreq server to give the
notification to).  But since the interface for memory looks exactly the
same as the interface for MMIO, at some point this morphed back to "use
xen_hvm_io_range but with a different range type (i.e., WP_MEM)".

From an *

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-22 Thread Paul Durrant
> -Original Message-
> From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
> George Dunlap
> Sent: 22 February 2016 15:56
> To: Paul Durrant
> Cc: George Dunlap; Jan Beulich; Kevin Tian; Zhang Yu; Wei Liu; Ian Campbell;
> Andrew Cooper; xen-devel@lists.xen.org; Stefano Stabellini; Zhiyuan Lv; Ian
> Jackson; Keir (Xen.org)
> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> On Wed, Feb 17, 2016 at 11:12 AM, Paul Durrant <paul.durr...@citrix.com>
> wrote:
> >> -Original Message-
> >> From: George Dunlap [mailto:george.dun...@citrix.com]
> >> Sent: 17 February 2016 11:02
> >> To: Jan Beulich; Paul Durrant; Kevin Tian; Zhang Yu
> >> Cc: Andrew Cooper; Ian Campbell; Ian Jackson; Stefano Stabellini; Wei Liu;
> >> Zhiyuan Lv; xen-devel@lists.xen.org; George Dunlap; Keir (Xen.org)
> >> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> >> max_wp_ram_ranges.
> >>
> >> On 17/02/16 10:22, Jan Beulich wrote:
> >> >>>> On 17.02.16 at 10:58, <kevin.t...@intel.com> wrote:
> >> >> Thanks for the help. Let's see whether we can have some solution
> ready
> >> for
> >> >> 4.7. :-)
> >> >
> >> > Since we now seem to all agree that a different approach is going to
> >> > be taken, I think we indeed should revert f5a32c5b8e ("x86/HVM:
> >> > differentiate IO/mem resources tracked by ioreq server"). Please
> >> > voice objections to the plan pretty soon.
> >>
> >> FWIW, after this discussion, I don't have an objection to the basic
> >> interface in this series as it is, since it addresses my request that it
> >> be memory-based, and it could be switched to using p2m types
> >> behind-the-scenes -- with the exception of the knob to control how many
> >> ranges are allowed (since it exposes the internal implementation).
> >>
> >> I wouldn't object to the current implementation going in as it was in v9
> >> (apparently), and then changing the type stuff around behind the scenes
> >> later as an optimization.  I also don't think it would be terribly
> >> difficult to change the implementation as it is to just use write_dm for
> >> a single IOREQ server.  We can rename it ioreq_server and expand it
> >> later.  Sorry if this wasn't clear from my comments before.
> >>
> >
> > The problem is that, since you objected to wp_mem being treated the
> same as mmio, we had to introduce a new range type to the map/unmap
> hypercalls and that will become redundant if we steer by type. If we could
> have just increased the size of the rangeset for mmio and used that *for
> now* then weeks of work could have been saved going down this dead end.
> 
> So first of all, let me say that I handled things with respect to this
> thread rather poorly in a lot of ways, and you're right to be
> frustrated.  All I can say is, I'm sorry and I'll try to do better.
> 
> I'm not sure I understand what you're saying with this line: "We had
> to introduce a new range type to the map/unmap hypercalls that will
> become redundant if we steer by type".
>

What I mean is that if we are going to do away with the concept of 'wp mem' 
and replace it with 'special type for steering to ioreq server' then we only 
need range types of pci, portio and mmio as we had before.
 
> The interface you have here simply says, "Please allow the guest to
> read from these gpfns, but give me (an ioreq server) a notification if
> the guest attempts to write to them."  One way to implement this is
> the way you have done -- by taking the existing (basically unused)
> write_dm p2m type, and layering the existing MMIO rangesets on top of
> it.

That type was introduced for this purpose; it was not there before.

>  Another way you could have (and AFAICS still can) implemented
> this interface is by allowing ioreq servers asking for write
> intercepts on gpfns to claim individual p2m types set aside for that
> purpose -- starting with the existing write_dm, which would only allow
> one such server to register, and later perhaps extending to allow
> more.
> 
> In what way would the second implementation make the new type for the
> map/unmap hypercalls redundant?
> 

Because that type is going to go away, now that it has no use.

  Paul

>  -George


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-22 Thread George Dunlap
On Wed, Feb 17, 2016 at 11:12 AM, Paul Durrant <paul.durr...@citrix.com> wrote:
>> -Original Message-
>> From: George Dunlap [mailto:george.dun...@citrix.com]
>> Sent: 17 February 2016 11:02
>> To: Jan Beulich; Paul Durrant; Kevin Tian; Zhang Yu
>> Cc: Andrew Cooper; Ian Campbell; Ian Jackson; Stefano Stabellini; Wei Liu;
>> Zhiyuan Lv; xen-devel@lists.xen.org; George Dunlap; Keir (Xen.org)
>> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
>> max_wp_ram_ranges.
>>
>> On 17/02/16 10:22, Jan Beulich wrote:
>> >>>> On 17.02.16 at 10:58, <kevin.t...@intel.com> wrote:
>> >> Thanks for the help. Let's see whether we can have some solution ready
>> for
>> >> 4.7. :-)
>> >
>> > Since we now seem to all agree that a different approach is going to
>> > be taken, I think we indeed should revert f5a32c5b8e ("x86/HVM:
>> > differentiate IO/mem resources tracked by ioreq server"). Please
>> > voice objections to the plan pretty soon.
>>
>> FWIW, after this discussion, I don't have an objection to the basic
>> interface in this series as it is, since it addresses my request that it
>> be memory-based, and it could be switched to using p2m types
>> behind-the-scenes -- with the exception of the knob to control how many
>> ranges are allowed (since it exposes the internal implementation).
>>
>> I wouldn't object to the current implementation going in as it was in v9
>> (apparently), and then changing the type stuff around behind the scenes
>> later as an optimization.  I also don't think it would be terribly
>> difficult to change the implementation as it is to just use write_dm for
>> a single IOREQ server.  We can rename it ioreq_server and expand it
>> later.  Sorry if this wasn't clear from my comments before.
>>
>
> The problem is that, since you objected to wp_mem being treated the same as 
> mmio, we had to introduce a new range type to the map/unmap hypercalls and 
> that will become redundant if we steer by type. If we could have just 
> increased the size of the rangeset for mmio and used that *for now* then 
> weeks of work could have been saved going down this dead end.

So first of all, let me say that I handled things with respect to this
thread rather poorly in a lot of ways, and you're right to be
frustrated.  All I can say is, I'm sorry and I'll try to do better.

I'm not sure I understand what you're saying with this line: "We had
to introduce a new range type to the map/unmap hypercalls that will
become redundant if we steer by type".

The interface you have here simply says, "Please allow the guest to
read from these gpfns, but give me (an ioreq server) a notification if
the guest attempts to write to them."  One way to implement this is
the way you have done -- by taking the existing (basically unused)
write_dm p2m type, and layering the existing MMIO rangesets on top of
it.  Another way you could have (and AFAICS still can) implemented
this interface is by allowing ioreq servers asking for write
intercepts on gpfns to claim individual p2m types set aside for that
purpose -- starting with the existing write_dm, which would only allow
one such server to register, and later perhaps extending to allow
more.
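
A minimal sketch of that second option (names invented, locking and the
actual hypercall plumbing omitted): the first ioreq server to ask for write
intercepts claims them, and a second claimant gets -EBUSY, i.e. 'N' is 1 to
begin with.

#include <errno.h>
#include <stddef.h>

struct ioreq_server { unsigned int id; };

static struct ioreq_server *write_intercept_owner;   /* 'N' == 1 for now */

static int claim_write_intercepts(struct ioreq_server *s)
{
    if ( write_intercept_owner && write_intercept_owner != s )
        return -EBUSY;              /* first come, first served */
    write_intercept_owner = s;
    return 0;
}

static void release_write_intercepts(struct ioreq_server *s)
{
    if ( write_intercept_owner == s )
        write_intercept_owner = NULL;
}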

In what way would the second implementation make the new type for the
map/unmap hypercalls redundant?

 -George



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-17 Thread Paul Durrant
> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: 17 February 2016 11:02
> To: Jan Beulich; Paul Durrant; Kevin Tian; Zhang Yu
> Cc: Andrew Cooper; Ian Campbell; Ian Jackson; Stefano Stabellini; Wei Liu;
> Zhiyuan Lv; xen-devel@lists.xen.org; George Dunlap; Keir (Xen.org)
> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> On 17/02/16 10:22, Jan Beulich wrote:
> >>>> On 17.02.16 at 10:58, <kevin.t...@intel.com> wrote:
> >> Thanks for the help. Let's see whether we can have some solution ready
> for
> >> 4.7. :-)
> >
> > Since we now seem to all agree that a different approach is going to
> > be taken, I think we indeed should revert f5a32c5b8e ("x86/HVM:
> > differentiate IO/mem resources tracked by ioreq server"). Please
> > voice objections to the plan pretty soon.
> 
> FWIW, after this discussion, I don't have an objection to the basic
> interface in this series as it is, since it addresses my request that it
> be memory-based, and it could be switched to using p2m types
> behind-the-scenes -- with the exception of the knob to control how many
> ranges are allowed (since it exposes the internal implementation).
> 
> I wouldn't object to the current implementation going in as it was in v9
> (apparently), and then changing the type stuff around behind the scenes
> later as an optimization.  I also don't think it would be terribly
> difficult to change the implementation as it is to just use write_dm for
> a single IOREQ server.  We can rename it ioreq_server and expand it
> later.  Sorry if this wasn't clear from my comments before.
>

The problem is that, since you objected to wp_mem being treated the same as 
mmio, we had to introduce a new range type to the map/unmap hypercalls and that 
will become redundant if we steer by type. If we could have just increased the 
size of the rangeset for mmio and used that *for now* then weeks of work could 
have been saved going down this dead end.

  Paul
 
> A new interface that's been carefully thought through would of course be
> nice too.
> 
>  -George



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-17 Thread George Dunlap
On 17/02/16 10:22, Jan Beulich wrote:
>>>> On 17.02.16 at 10:58, <kevin.t...@intel.com> wrote:
>> Thanks for the help. Let's see whether we can have some solution ready for 
>> 4.7. :-)
> 
> Since we now seem to all agree that a different approach is going to
> be taken, I think we indeed should revert f5a32c5b8e ("x86/HVM:
> differentiate IO/mem resources tracked by ioreq server"). Please
> voice objections to the plan pretty soon.

FWIW, after this discussion, I don't have an objection to the basic
interface in this series as it is, since it addresses my request that it
be memory-based, and it could be switched to using p2m types
behind-the-scenes -- with the exception of the knob to control how many
ranges are allowed (since it exposes the internal implementation).

I wouldn't object to the current implementation going in as it was in v9
(apparently), and then changing the type stuff around behind the scenes
later as an optimization.  I also don't think it would be terribly
difficult to change the implementation as it is to just use write_dm for
a single IOREQ server.  We can rename it ioreq_server and expand it
later.  Sorry if this wasn't clear from my comments before.

A new interface that's been carefully thought through would of course be
nice too.

 -George



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-17 Thread Tian, Kevin
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: Wednesday, February 17, 2016 6:24 PM
> 
> > -Original Message-
> > From: Jan Beulich [mailto:jbeul...@suse.com]
> > Sent: 17 February 2016 10:22
> > To: Paul Durrant; Kevin Tian; Zhang Yu
> > Cc: Andrew Cooper; George Dunlap; Ian Campbell; Ian Jackson; Stefano
> > Stabellini; Wei Liu; Zhiyuan Lv; xen-devel@lists.xen.org; George Dunlap; 
> > Keir
> > (Xen.org)
> > Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> > max_wp_ram_ranges.
> >
> > >>> On 17.02.16 at 10:58, <kevin.t...@intel.com> wrote:
> > > Thanks for the help. Let's see whether we can have some solution ready
> > for
> > > 4.7. :-)
> >
> > Since we now seem to all agree that a different approach is going to
> > be taken, I think we indeed should revert f5a32c5b8e ("x86/HVM:
> > differentiate IO/mem resources tracked by ioreq server"). Please
> > voice objections to the plan pretty soon.
> >
> 
> I'm happy with reversion at this point.
> 

Agree.

Thanks
Kevin



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-17 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 17 February 2016 10:22
> To: Paul Durrant; Kevin Tian; Zhang Yu
> Cc: Andrew Cooper; George Dunlap; Ian Campbell; Ian Jackson; Stefano
> Stabellini; Wei Liu; Zhiyuan Lv; xen-devel@lists.xen.org; George Dunlap; Keir
> (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> >>> On 17.02.16 at 10:58, <kevin.t...@intel.com> wrote:
> > Thanks for the help. Let's see whether we can have some solution ready
> for
> > 4.7. :-)
> 
> Since we now seem to all agree that a different approach is going to
> be taken, I think we indeed should revert f5a32c5b8e ("x86/HVM:
> differentiate IO/mem resources tracked by ioreq server"). Please
> voice objections to the plan pretty soon.
> 

I'm happy with reversion at this point.

  Paul



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-17 Thread Jan Beulich
>>> On 17.02.16 at 10:58,  wrote:
> Thanks for the help. Let's see whether we can have some solution ready for 
> 4.7. :-)

Since we now seem to all agree that a different approach is going to
be taken, I think we indeed should revert f5a32c5b8e ("x86/HVM:
differentiate IO/mem resources tracked by ioreq server"). Please
voice objections to the plan pretty soon.

Jan




Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-17 Thread Paul Durrant
> -Original Message-
> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> Sent: 17 February 2016 09:58
> To: Paul Durrant; Jan Beulich
> Cc: Andrew Cooper; George Dunlap; Ian Campbell; Ian Jackson; Stefano
> Stabellini; Wei Liu; Lv, Zhiyuan; Zhang Yu; xen-devel@lists.xen.org; George
> Dunlap; Keir (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> > From: Paul Durrant [mailto:paul.durr...@citrix.com]
> > Sent: Wednesday, February 17, 2016 4:58 PM
> >
> > >
> > > btw does this design consider the case where multiple ioreq servers
> > > may claim on same page?
> >
> > Yes it does and there are currently insufficient page types to allow any
> more than a single
> > ioreq server to claim a type. My plan is that, in future, we can add a p2t
> mapping table to
> > allow for more types and then introduce HVMMEM_ioreq_1,
> HVMMEM_ioreq_2, etc.
> 
> so these new types actually represent ioreq server ID, right? If yes I can 
> then
> understand your earlier explanations.

Well, not quite. Each of these types can be steered to exactly one ioreq server 
but there will be a new hypercall to control the mapping. The design doc will 
make it clear.

  Paul

> 
> >
> > > For example, different usages may both
> > > want to capture write requests on the same set of pages (say XenGT
> > > selectively write-protects a subset of pages due to shadow GTT, while
> > > another agent wants to monitor all guest writes to any guest memory
> > > page).
> >
> > Monitoring is a different thing altogether. Emulation is costly and not
> something you'd want
> > to use for that purpose. If you want to monitor writes then log-dirty
> already exists for that
> > purpose.
> 
> Agree.
> 
> >
> > >
> > > Thanks
> > > Kevin
> >
> > I hope my explanation helped. I think things will be clearer once I've had
> chance to actually
> > put together a design doc. and hack up a PoC (probably only for EPT at
> first).
> >
> 
> Thanks for the help. Let's see whether we can have some solution ready for
> 4.7. :-)
> 
> Thanks
> Kevin



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-17 Thread Tian, Kevin
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: Wednesday, February 17, 2016 4:58 PM
> 
> >
> > btw does this design consider the case where multiple ioreq servers
> > may claim on same page?
> 
> Yes it does and there are currently insufficient page types to allow any more 
> than a single
> ioreq server to claim a type. My plan is that, in future, we can add a p2t 
> mapping table to
> allow for more types and then introduce HVMMEM_ioreq_1, HVMMEM_ioreq_2, etc.

so these new types actually represent ioreq server ID, right? If yes I can then
understand your earlier explanations.

> 
> > For example, different usages may both
> > want to capture write requests on the same set of pages (say XenGT
> > selectively write-protects a subset of pages due to shadow GTT, while
> > another agent wants to monitor all guest writes to any guest memory
> > page).
> 
> Monitoring is a different thing altogether. Emulation is costly and not 
> something you'd want
> to use for that purpose. If you want to monitor writes then log-dirty already 
> exists for that
> purpose.

Agree.

> 
> >
> > Thanks
> > Kevin
> 
> I hope my explanation helped. I think things will be clearer once I've had 
> chance to actually
> put together a design doc. and hack up a PoC (probably only for EPT at first).
> 

Thanks for the help. Let's see whether we can have some solution ready for 4.7. 
:-)

Thanks
Kevin



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-17 Thread Jan Beulich
>>> On 17.02.16 at 09:58,  wrote:
>> > I'd envisaged that setting HVM_emulate_0 type on a page would mean
>> nothing until an
>> 
>> for "mean nothing" what is the default policy then if guest happens to access
>> it
>> before any ioreq server claims it?
>> 
> 
> My thoughts were that, since no specific emulation has yet been requested 
> (because no ioreq server has yet claimed it), the default policy is to 
> treat it as r/w RAM as I said below. This is because I think the only legal 
> type transitions should be from HVMMEM_ram_rw to HVMMEM_emulate_0 and back 
> again.

+1

Jan




Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-17 Thread Paul Durrant
> -Original Message-
[snip]
> > > >
> > > > I'm afraid I have made little progress due to the distractions of trying to
> > > > get
> > > > some patches into Linux but my thoughts are around replacing the
> > > > HVM_mmio_write_dm with something like HVM_emulate_0 (i.e. the
> zero-
> > > th example
> > > > of a type that requires emulation, to be followed by others in future)
> and
> > > > then add a hypercall along the lines of
> > > HVMOP_map_mem_type_to_ioreq_server
> > > > which will take an ioreq server id, a type and flags saying whether it
> wishes
> > > > to handle reads and/or writes to that type.
> > > >
> > > > Thoughts (anyone)?
> > >
> > > I think as a general idea also allowing reads to be intercepted is
> > > nice, but would incur quite a few changes which we don't currently
> > > have a user for. Hence I'd suggest making the public interface
> > > ready for that without actually implementing hypervisor support.
> > >
> >
> > Well, we need some form of hypervisor support to replace what's already
> there.
> >
> > I'd envisaged that setting HVM_emulate_0 type on a page would mean
> nothing until an
> 
> for "mean nothing" what is the default policy then if guest happens to access
> it
> before any ioreq server claims it?
> 

My thoughts were that, since no specific emulation has yet been requested 
(because no ioreq server has yet claimed it), the default policy is to 
treat it as r/w RAM as I said below. This is because I think the only legal 
type transitions should be from HVMMEM_ram_rw to HVMMEM_emulate_0 and back 
again.
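
Expressed as a sketch (hypothetical helper; HVMMEM_emulate_0 is the
proposed name rather than an existing type), the type-change path would
accept only those two transitions:

#include <stdbool.h>

/* Hypothetical subset of types, for illustration. */
typedef enum { HVMMEM_ram_rw, HVMMEM_ram_ro, HVMMEM_emulate_0 } hvmmem_type_t;

static bool emulate_transition_ok(hvmmem_type_t from, hvmmem_type_t to)
{
    /* Only ram_rw <-> emulate_0; anything else is rejected.  Until an
     * ioreq server claims HVMMEM_emulate_0, pages of that type would
     * still behave as ordinary r/w RAM. */
    return (from == HVMMEM_ram_rw && to == HVMMEM_emulate_0) ||
           (from == HVMMEM_emulate_0 && to == HVMMEM_ram_rw);
}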

> > ioreq server claims it (i.e. it stays as r/w RAM) but when the ioreq server
> makes the claim
> > the EPT is changed according to whether reads and/or writes are wanted
> and then the fault
> > handler steers transactions to the (single at the moment) ioreq server. I'll
> need to code up
> > a PoC to make sure I'm not barking up the wrong tree though.
> >
> 
> Curious any reason why we must have a HVM_emulate_0 placeholder
> first and why we can't allow ioreq server to claim on any existing type?

Which type were you thinking of? Remember that the ioreq server would be 
claiming *all* pages of that type.

> Thinking about XenGT usage, I cannot envisage when a page should
> be set to HVM_emulate_0 first. The write-protection operation is dynamic
> according to guest page table operations, upon which we'll directly jump
> to claim phase...

I don't imagine that things would happen that way round in the common case. For 
XenGT I'd expect the ioreq server to immediately claim HVMMEM_emulate_0 and 
then set that type on any page that it wants to trap accesses on (which means 
that
I am assuming that the same emulation - i.e. write accesses only - is desired 
for all pages... but I think that is indeed the case).
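
So the device-model flow being anticipated would look roughly like the toy
sequence below; the helpers stand in for hypercalls that do not exist yet
and are named purely for illustration.

#include <stdio.h>

/* Illustrative only: invented helpers standing in for hypercalls. */
struct ioreq_server { unsigned int id; };

static int claim_emulate_0(struct ioreq_server *s)
{ printf("server %u claims HVMMEM_emulate_0\n", s->id); return 0; }

static int set_emulate_0(unsigned long gfn)
{ printf("gfn %#lx -> HVMMEM_emulate_0 (writes now trap)\n", gfn); return 0; }

int main(void)
{
    struct ioreq_server xengt = { .id = 1 };
    unsigned long gtt_gfns[] = { 0x1000, 0x2340, 0x5670 };
    unsigned int i;

    claim_emulate_0(&xengt);              /* once, at start of day */
    for ( i = 0; i < sizeof(gtt_gfns) / sizeof(gtt_gfns[0]); i++ )
        set_emulate_0(gtt_gfns[i]);       /* per page the shadow code tracks */
    return 0;
}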

> 
> btw does this design consider the case where multiple ioreq servers
> may claim on same page?

Yes it does and there are currently insufficient page types to allow any more 
than a single ioreq server to claim a type. My plan is that, in future, we can 
add a p2t mapping table to allow for more types and then introduce 
HVMMEM_ioreq_1, HVMMEM_ioreq_2, etc.
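
A sketch of what such a p2t might amount to (layout and names invented): a
per-gfn side table, consulted when the p2m entry merely says "look in the
p2t", widening the type space beyond the bits available in the p2m entry
itself.

#include <stdint.h>

/* Hypothetical p2t: one byte per gfn, 0 = none, N = HVMMEM_ioreq_N. */
struct p2t {
    uint8_t *entry;
    unsigned long nr_gfns;
};

static unsigned int p2t_get(const struct p2t *t, unsigned long gfn)
{
    return (gfn < t->nr_gfns) ? t->entry[gfn] : 0;
}

static int p2t_set(struct p2t *t, unsigned long gfn, unsigned int type)
{
    if ( gfn >= t->nr_gfns || type > UINT8_MAX )
        return -1;
    t->entry[gfn] = (uint8_t)type;  /* selects which HVMMEM_ioreq_N applies */
    return 0;
}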

> For example, different usages may both
> want to capture write requests on the same set of pages (say XenGT
> selectively write-protects a subset of pages due to shadow GTT, while
> another agent wants to monitor all guest writes to any guest memory
> page).

Monitoring is a different thing altogether. Emulation is costly and not 
something you'd want to use for that purpose. If you want to monitor writes 
then log-dirty already exists for that purpose.

> 
> Thanks
> Kevin

I hope my explanation helped. I think things will be clearer once I've had 
chance to actually put together a design doc. and hack up a PoC (probably only 
for EPT at first).

  Paul



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-16 Thread Tian, Kevin
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: Tuesday, February 16, 2016 7:11 PM
> 
> > -Original Message-
> > From: Jan Beulich [mailto:jbeul...@suse.com]
> > Sent: 16 February 2016 10:34
> > To: Paul Durrant
> > Cc: Andrew Cooper; George Dunlap; Ian Campbell; Ian Jackson; Stefano
> > Stabellini; Wei Liu; Kevin Tian; Zhiyuan Lv; Zhang Yu; 
> > xen-devel@lists.xen.org;
> > George Dunlap; Keir (Xen.org)
> > Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> > max_wp_ram_ranges.
> >
> > >>> On 16.02.16 at 09:50, <paul.durr...@citrix.com> wrote:
> > >>  -Original Message-
> > >> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> > >> Sent: 16 February 2016 07:23
> > >> To: Paul Durrant; George Dunlap
> > >> Cc: Jan Beulich; George Dunlap; Wei Liu; Ian Campbell; Andrew Cooper;
> > >> Zhang Yu; xen-devel@lists.xen.org; Stefano Stabellini; Lv, Zhiyuan; Ian
> > >> Jackson; Keir (Xen.org)
> > >> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> > >> max_wp_ram_ranges.
> > >>
> > >> > From: Paul Durrant [mailto:paul.durr...@citrix.com]
> > >> > Sent: Friday, February 05, 2016 7:24 PM
> > >> >
> > >> > > -Original Message-
> > >> > > From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
> > >> > > George Dunlap
> > >> > > Sent: 05 February 2016 11:14
> > >> > > To: Paul Durrant
> > >> > > Cc: Jan Beulich; George Dunlap; Kevin Tian; Wei Liu; Ian Campbell;
> > >> Andrew
> > >> > > Cooper; Zhang Yu; xen-devel@lists.xen.org; Stefano Stabellini;
> > >> > > zhiyuan...@intel.com; Ian Jackson; Keir (Xen.org)
> > >> > > Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> > >> > > max_wp_ram_ranges.
> > >> > >
> > >> > > On Fri, Feb 5, 2016 at 9:24 AM, Paul Durrant
> > <paul.durr...@citrix.com>
> > >> > > wrote:
> > >> > > > Utilizing the default server is a backwards step. GVT-g would have 
> > >> > > > to
> > >> use
> > >> > > the old HVM_PARAM mechanism to cause its emulator to become
> > >> default. I
> > >> > > think a more appropriate mechanism would be p2m_mmio_write_dm
> > to
> > >> > > become something like 'p2m_ioreq_server_write' and then have a
> > >> hypercall
> > >> > > to allow it to be mapped to a particular ioreq server.
> > >> > > > Obviously only one could claim it but, with a p2t, the bit could 
> > >> > > > be re-
> > >> > > purposed to simply mean 'go look in the p2t' for more information and
> > >> then
> > >> > > the p2t could be structured to allow emulations to be steered to one
> > of
> > >> many
> > >> > > ioreq servers (for read and/or write emulation).
> > >> > >
> > >> > > Right; I had in mind that Xen would allow at any given time a max of 
> > >> > > N
> > >> > > ioreq servers to register for mmio_write_dm ranges, first-come
> > >> > > first-served; with 'N' being '1' to begin with.  If a second ioreq
> > >> > > server requested mmio_write_dm functionality, it would get -EBUSY.
> > >> > > This would allow their current setup (one qemu dm which doesn't do
> > >> > > mmio_write_dm, one xengt dm which does) to work without needing
> > to
> > >> > > worry any more about how many pages might need to be tracked
> > (either
> > >> > > for efficiency or correctness).
> > >> > >
> > >> > > We could then extend this to some larger number (4 seems pretty
> > >> > > reasonable to me) either by adding an extra 3 types, or by some other
> > >> > > method (such as the one Paul suggests).
> > >> >
> > >> > I think it would be best to do away with the 'write dm' name though. I
> > >> would like to see it
> > >> > be possible to steer reads+writes, as well as writes (and maybe just
> > > reads?)
> > >> to a particular
> > >> > ioreq server based on type information. So maybe we just call the
> > existing
> > >> type
> > >> > 'p2m_io

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-16 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 16 February 2016 10:34
> To: Paul Durrant
> Cc: Andrew Cooper; George Dunlap; Ian Campbell; Ian Jackson; Stefano
> Stabellini; Wei Liu; Kevin Tian; Zhiyuan Lv; Zhang Yu; 
> xen-devel@lists.xen.org;
> George Dunlap; Keir (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> >>> On 16.02.16 at 09:50, <paul.durr...@citrix.com> wrote:
> >>  -Original Message-
> >> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> >> Sent: 16 February 2016 07:23
> >> To: Paul Durrant; George Dunlap
> >> Cc: Jan Beulich; George Dunlap; Wei Liu; Ian Campbell; Andrew Cooper;
> >> Zhang Yu; xen-devel@lists.xen.org; Stefano Stabellini; Lv, Zhiyuan; Ian
> >> Jackson; Keir (Xen.org)
> >> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> >> max_wp_ram_ranges.
> >>
> >> > From: Paul Durrant [mailto:paul.durr...@citrix.com]
> >> > Sent: Friday, February 05, 2016 7:24 PM
> >> >
> >> > > -Original Message-
> >> > > From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
> >> > > George Dunlap
> >> > > Sent: 05 February 2016 11:14
> >> > > To: Paul Durrant
> >> > > Cc: Jan Beulich; George Dunlap; Kevin Tian; Wei Liu; Ian Campbell;
> >> Andrew
> >> > > Cooper; Zhang Yu; xen-devel@lists.xen.org; Stefano Stabellini;
> >> > > zhiyuan...@intel.com; Ian Jackson; Keir (Xen.org)
> >> > > Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> >> > > max_wp_ram_ranges.
> >> > >
> >> > > On Fri, Feb 5, 2016 at 9:24 AM, Paul Durrant
> <paul.durr...@citrix.com>
> >> > > wrote:
> >> > > > Utilizing the default server is a backwards step. GVT-g would have to
> >> use
> >> > > the old HVM_PARAM mechanism to cause its emulator to become
> >> default. I
> >> > > think a more appropriate mechanism would be p2m_mmio_write_dm
> to
> >> > > become something like 'p2m_ioreq_server_write' and then have a
> >> hypercall
> >> > > to allow it to be mapped to a particular ioreq server.
> >> > > > Obviously only one could claim it but, with a p2t, the bit could be 
> >> > > > re-
> >> > > purposed to simply mean 'go look in the p2t' for more information and
> >> then
> >> > > the p2t could be structured to allow emulations to be steered to one
> of
> >> many
> >> > > ioreq servers (for read and/or write emulation).
> >> > >
> >> > > Right; I had in mind that Xen would allow at any given time a max of N
> >> > > ioreq servers to register for mmio_write_dm ranges, first-come
> >> > > first-served; with 'N' being '1' to begin with.  If a second ioreq
> >> > > server requested mmio_write_dm functionality, it would get -EBUSY.
> >> > > This would allow their current setup (one qemu dm which doesn't do
> >> > > mmio_write_dm, one xengt dm which does) to work without needing
> to
> >> > > worry any more about how many pages might need to be tracked
> (either
> >> > > for efficiency or correctness).
> >> > >
> >> > > We could then extend this to some larger number (4 seems pretty
> >> > > reasonable to me) either by adding an extra 3 types, or by some other
> >> > > method (such as the one Paul suggests).
> >> >
> >> > I think it would be best to do away with the 'write dm' name though. I
> >> would like to see it
> >> > be possible to steer reads+writes, as well as writes (and maybe just
> > reads?)
> >> to a particular
> >> > ioreq server based on type information. So maybe we just call the
> existing
> >> type
> >> > 'p2m_ioreq_server' and then, in the absence of a p2t, hardcode this to
> go
> >> to whichever
> >> > emulator makes the new TBD hypercall.
> >> > I think we need a proper design at this point. Given that it's Chinese
> New
> >> Year maybe I'll
> >> > have a stab in Yu's absence.
> >> >
> >>
> >> Hi, Paul, what about your progress on this?
> >>
> >> My feeling is that we do not need a new hypercall to explicitly claim
> >> whether a ioreq server wants to handle write reque

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-16 Thread Jan Beulich
>>> On 16.02.16 at 09:50, <paul.durr...@citrix.com> wrote:
>>  -Original Message-
>> From: Tian, Kevin [mailto:kevin.t...@intel.com]
>> Sent: 16 February 2016 07:23
>> To: Paul Durrant; George Dunlap
>> Cc: Jan Beulich; George Dunlap; Wei Liu; Ian Campbell; Andrew Cooper;
>> Zhang Yu; xen-devel@lists.xen.org; Stefano Stabellini; Lv, Zhiyuan; Ian
>> Jackson; Keir (Xen.org)
>> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
>> max_wp_ram_ranges.
>> 
>> > From: Paul Durrant [mailto:paul.durr...@citrix.com]
>> > Sent: Friday, February 05, 2016 7:24 PM
>> >
>> > > -Original Message-
>> > > From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
>> > > George Dunlap
>> > > Sent: 05 February 2016 11:14
>> > > To: Paul Durrant
>> > > Cc: Jan Beulich; George Dunlap; Kevin Tian; Wei Liu; Ian Campbell;
>> Andrew
>> > > Cooper; Zhang Yu; xen-devel@lists.xen.org; Stefano Stabellini;
>> > > zhiyuan...@intel.com; Ian Jackson; Keir (Xen.org)
>> > > Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
>> > > max_wp_ram_ranges.
>> > >
>> > > On Fri, Feb 5, 2016 at 9:24 AM, Paul Durrant <paul.durr...@citrix.com>
>> > > wrote:
>> > > > Utilizing the default server is a backwards step. GVT-g would have to
>> use
>> > > the old HVM_PARAM mechanism to cause its emulator to become
>> default. I
>> > > think a more appropriate mechanism would be p2m_mmio_write_dm to
>> > > become something like 'p2m_ioreq_server_write' and then have a
>> hypercall
>> > > to allow it to be mapped to a particular ioreq server.
>> > > > Obviously only one could claim it but, with a p2t, the bit could be re-
>> > > purposed to simply mean 'go look in the p2t' for more information and
>> then
>> > > the p2t could be structured to allow emulations to be steered to one of
>> many
>> > > ioreq servers (for read and/or write emulation).
>> > >
>> > > Right; I had in mind that Xen would allow at any given time a max of N
>> > > ioreq servers to register for mmio_write_dm ranges, first-come
>> > > first-served; with 'N' being '1' to begin with.  If a second ioreq
>> > > server requested mmio_write_dm functionality, it would get -EBUSY.
>> > > This would allow their current setup (one qemu dm which doesn't do
>> > > mmio_write_dm, one xengt dm which does) to work without needing to
>> > > worry any more about how many pages might need to be tracked (either
>> > > for efficiency or correctness).
>> > >
>> > > We could then extend this to some larger number (4 seems pretty
>> > > reasonable to me) either by adding an extra 3 types, or by some other
>> > > method (such as the one Paul suggests).
>> >
>> > I think it would be best to do away with the 'write dm' name though. I
>> would like to see it
>> > be possible to steer reads+writes, as well as writes (and maybe just 
> reads?)
>> to a particular
>> > ioreq server based on type information. So maybe we just call the existing
>> type
>> > 'p2m_ioreq_server' and then, in the absence of a p2t, hardcode this to go
>> to whichever
>> > emulator makes the new TBD hypercall.
>> > I think we need a proper design at this point. Given that it's Chinese New
>> Year maybe I'll
>> > have a stab in Yu's absence.
>> >
>> 
>> Hi, Paul, what about your progress on this?
>> 
>> My feeling is that we do not need a new hypercall to explicitly claim
> whether an ioreq server wants to handle write requests. It can be
>> implicitly marked upon whether a specific page is requested for
>> write-protection through a specific ioreq channel, and then that
>> ioreq server will claim the attribute automatically.
> 
> Hi Kevin,
> 
> Is there a hypercall to do that? Maybe I'm missing something but I was under 
> the impression that the only way to set write protection was via an 
> HVMOP_set_mem_type and that does not carry an ioreq server id.
> 
> I'm afraid I have made little progress due to the distractions of trying to get 
> some patches into Linux but my thoughts are around replacing the 
> HVM_mmio_write_dm with something like HVM_emulate_0 (i.e. the zero-th example 
> of a type that requires emulation, to be followed by others in future) and 
> then add a hypercall along the lines of HVMOP_map_mem_type_to_ioreq_server 
> which will take an ioreq server id, a type and flags saying whether it wishes 
> to handle reads and/or writes to that type.
> 
> Thoughts (anyone)?

I think as a general idea also allowing reads to be intercepted is
nice, but would incur quite a few changes which we don't currently
have a user for. Hence I'd suggest making the public interface
ready for that without actually implementing hypervisor support.

Jan



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-16 Thread Paul Durrant
> -Original Message-
> From: Tian, Kevin [mailto:kevin.t...@intel.com]
> Sent: 16 February 2016 07:23
> To: Paul Durrant; George Dunlap
> Cc: Jan Beulich; George Dunlap; Wei Liu; Ian Campbell; Andrew Cooper;
> Zhang Yu; xen-devel@lists.xen.org; Stefano Stabellini; Lv, Zhiyuan; Ian
> Jackson; Keir (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> > From: Paul Durrant [mailto:paul.durr...@citrix.com]
> > Sent: Friday, February 05, 2016 7:24 PM
> >
> > > -Original Message-
> > > From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
> > > George Dunlap
> > > Sent: 05 February 2016 11:14
> > > To: Paul Durrant
> > > Cc: Jan Beulich; George Dunlap; Kevin Tian; Wei Liu; Ian Campbell;
> Andrew
> > > Cooper; Zhang Yu; xen-devel@lists.xen.org; Stefano Stabellini;
> > > zhiyuan...@intel.com; Ian Jackson; Keir (Xen.org)
> > > Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> > > max_wp_ram_ranges.
> > >
> > > On Fri, Feb 5, 2016 at 9:24 AM, Paul Durrant <paul.durr...@citrix.com>
> > > wrote:
> > > > Utilizing the default server is a backwards step. GVT-g would have to
> use
> > > the old HVM_PARAM mechanism to cause its emulator to become
> default. I
> > > think a more appropriate mechanism would be p2m_mmio_write_dm to
> > > become something like 'p2m_ioreq_server_write' and then have a
> hypercall
> > > to allow it to be mapped to a particular ioreq server.
> > > > Obviously only one could claim it but, with a p2t, the bit could be re-
> > > purposed to simply mean 'go look in the p2t' for more information and
> then
> > > the p2t could be structured to allow emulations to be steered to one of
> many
> > > ioreq servers (for read and/or write emulation).
> > >
> > > Right; I had in mind that Xen would allow at any given time a max of N
> > > ioreq servers to register for mmio_write_dm ranges, first-come
> > > first-served; with 'N' being '1' to begin with.  If a second ioreq
> > > server requested mmio_write_dm functionality, it would get -EBUSY.
> > > This would allow their current setup (one qemu dm which doesn't do
> > > mmio_write_dm, one xengt dm which does) to work without needing to
> > > worry any more about how many pages might need to be tracked (either
> > > for efficiency or correctness).
> > >
> > > We could then extend this to some larger number (4 seems pretty
> > > reasonable to me) either by adding an extra 3 types, or by some other
> > > method (such as the one Paul suggests).
> >
> > I think it would be best to do away with the 'write dm' name though. I
> would like to see it
> > be possible to steer reads+writes, as well as writes (and maybe just reads?)
> to a particular
> > ioreq server based on type information. So maybe we just call the existing
> type
> > 'p2m_ioreq_server' and then, in the absence of a p2t, hardcode this to go
> to whichever
> > emulator makes the new TBD hypercall.
> > I think we need a proper design at this point. Given that it's Chinese New
> Year maybe I'll
> > have a stab in Yu's absence.
> >
> 
> Hi, Paul, what about your progress on this?
> 
> My feeling is that we do not need a new hypercall to explicitly claim
> whether an ioreq server wants to handle write requests. It can be
> implicitly marked upon whether a specific page is requested for
> write-protection through a specific ioreq channel, and then that
> ioreq server will claim the attribute automatically.

Hi Kevin,

Is there a hypercall to do that? Maybe I'm missing something but I was under 
the impression that the only way to set write protection was via an 
HVMOP_set_mem_type and that does not carry an ioreq server id.

I'm afraid I have made little progress due to the distractions of trying to get 
some patches into Linux but my thoughts are around replacing the 
HVM_mmio_write_dm with something like HVM_emulate_0 (i.e. the zero-th example 
of a type that requires emulation, to be followed by others in future) and then 
add a hypercall along the lines of HVMOP_map_mem_type_to_ioreq_server which 
will take an ioreq server id, a type and flags saying whether it wishes to 
handle reads and/or writes to that type.

Thoughts (anyone)?
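
For concreteness, a guess at the shape such a public interface could take --
field names, widths and flag values below are placeholders, not an actual
hvm_op.h definition:

#include <stdint.h>

/* Hypothetical layout for the proposed hypercall argument. */
#define IOREQ_MEM_ACCESS_READ   (1u << 0)
#define IOREQ_MEM_ACCESS_WRITE  (1u << 1)

struct xen_hvm_map_mem_type_to_ioreq_server {
    uint16_t domid;     /* target domain */
    uint16_t id;        /* ioreq server id */
    uint16_t type;      /* e.g. HVMMEM_emulate_0 */
    uint16_t flags;     /* IOREQ_MEM_ACCESS_READ and/or ..._WRITE */
};

An emulator such as XenGT would issue this once with the write flag set,
after which all pages of that type would be steered to it.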

  Paul

> 
> Thanks
> Kevin


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-15 Thread Tian, Kevin
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: Friday, February 05, 2016 7:24 PM
> 
> > -Original Message-
> > From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
> > George Dunlap
> > Sent: 05 February 2016 11:14
> > To: Paul Durrant
> > Cc: Jan Beulich; George Dunlap; Kevin Tian; Wei Liu; Ian Campbell; Andrew
> > Cooper; Zhang Yu; xen-devel@lists.xen.org; Stefano Stabellini;
> > zhiyuan...@intel.com; Ian Jackson; Keir (Xen.org)
> > Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> > max_wp_ram_ranges.
> >
> > On Fri, Feb 5, 2016 at 9:24 AM, Paul Durrant <paul.durr...@citrix.com>
> > wrote:
> > > Utilizing the default server is a backwards step. GVT-g would have to use
> > the old HVM_PARAM mechanism to cause its emulator to become the default. I
> > think a more appropriate mechanism would be p2m_mmio_write_dm to
> > become something like 'p2m_ioreq_server_write' and then have a hypercall
> > to allow it to be mapped to a particular ioreq server.
> > > Obviously only one could claim it but, with a p2t, the bit could be re-
> > purposed to simply mean 'go look in the p2t' for more information and then
> > the p2t could be structured to allow emulations to be steered to one of many
> > ioreq servers (for read and/or write emulation).
> >
> > Right; I had in mind that Xen would allow at any given time a max of N
> > ioreq servers to register for mmio_write_dm ranges, first-come
> > first-served; with 'N' being '1' to begin with.  If a second ioreq
> > server requested mmio_write_dm functionality, it would get -EBUSY.
> > This would allow their current setup (one qemu dm which doesn't do
> > mmio_write_dm, one xengt dm which does) to work without needing to
> > worry any more about how many pages might need to be tracked (either
> > for efficiency or correctness).
> >
> > We could then extend this to some larger number (4 seems pretty
> > reasonable to me) either by adding an extra 3 types, or by some other
> > method (such as the one Paul suggests).
> 
> I think it would be best to do away with the 'write dm' name though. I would 
> like to see it
> be possible to steer reads+writes, as well as writes (and maybe just reads?) 
> to a particular
> ioreq server based on type information. So maybe we just call the existing 
> type
> 'p2m_ioreq_server' and then, in the absence of a p2t, hardcode this to go to 
> whichever
> emulator makes the new TBD hypercall.
> I think we need a proper design at this point. Given that it's Chinese New 
> Year maybe I'll
> have a stab in Yu's absence.
> 

Hi, Paul, what about your progress on this?

My feeling is that we do not need a new hypercall to explicitly claim
whether an ioreq server wants to handle write requests. It can be
implicitly marked upon whether a specific page is requested for
write-protection through a specific ioreq channel, and then that
ioreq server will claim the attribute automatically. 

Thanks
Kevin


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread Zhiyuan Lv
Hi George,

On Fri, Feb 05, 2016 at 11:05:39AM +, George Dunlap wrote:
> On Fri, Feb 5, 2016 at 3:44 AM, Tian, Kevin  wrote:
> >> > So as long as the currently-in-use GTT tree contains no more than
> >> > $LIMIT ranges, you can unshadow and reshadow; this will be slow, but
> >> > strictly speaking correct.
> >> >
> >> > What do you do if the guest driver switches to a GTT such that the
> >> > entire tree takes up more than $LIMIT entries?
> >>
> >> GPU has some special properties different from CPU, which make things
> >> easier. The GPU page table is constructed by CPU and used by GPU
> >> workloads. GPU workload itself will not change the page table.
> >> Meanwhile, GPU workload submission, in virtualized environment, is
> >> controlled by our device model. Then we can reshadow the whole table
> >> every time before we submit workload. That can reduce the total number
> >> required for write protection but with performance impact, because GPU
> >> has to have more idle time waiting for CPU. Hope the info helps.
> >> Thanks!
> >>
> >
> > Putting it another way, it's fully under mediation when a GPU page table
> > (GTT) will be referenced by the GPU, so there's plenty of room to
> > optimize existing shadowing (always shadowing all recognized GPU page
> > tables), e.g. shadowing only active one when a VM is scheduled in. It's
> > a performance matter but no correctness issue.
> >
> > This is why Yu mentioned earlier whether we can just set a default
> > limit which is good for the majority of use cases, while extending our
> > device model to drop/recreate some shadow tables when the limitation
> > is hit. I think this matches how today's CPU shadow page table is
> > implemented, which also has a limitation of how many shadow pages
> > are allowed per-VM.
> 
> I don't think you've understood my question (or maybe I still don't
> understood the situation properly).
> 
> So in memory pagetables, there's a "tree" that contains a single
> top-level page, which points to other pages, which defines one address
> space (usually corresponding to one process or thread).   (This is
> often just refered to as 'cr3', since it's the value you write into
> the cr3 register on x86 processors.) I'm assuming that the structure
> is similar for your GPU translation tables -- that a single GTT is
> effectively a "tree" sort of like a process address space for an OS.
> 
> And it sounds like what you're saying is: suppose we have 10 different
> GTTs (i.e., an entire tree / gpu thread), and each one requires 1024
> ranges to shadow.  In that case, a limit of 8192 ranges means we can
> only keep 8 of the ten actually shadowed at any one time.  This is not
> optimal, since it will occasionally mean unshadowing an entire GTT and
> re-shadowing another one, but it will work, because we can always make
> sure that the currently-active GTT is shadowed.
> 
> My question is, suppose a single GTT / gpu thread / tree has 9000
> ranges.  It would be trivial for an attacker to break into the
> operating system and *construct* such a tree, but it's entirely
> possible that due to a combination of memory fragmentation and very
> large usage, the normal driver might accidentally create such a GTT.
> In that case, the device model will not be able to write-protect all
> the pages in the single GTT, and thus will not be able to correctly
> track changes to the currently-active GTT.  What does your device
> model do in that case?

We can live with a partially write-protected tree. That is because the
GPU's workload execution is controlled by the device model. We still
have a chance to update the shadow page table before we submit a workload
to the GPU. The impact is on performance, not correctness. Thanks!

-Zhiyuan

> 
>  -George
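
To make the mediated-submission point above concrete, here is a minimal,
self-contained C sketch of that control flow. The structures and helpers
(shadow_page, recopy_guest_gtt_page, submit_to_gpu) are invented for this
illustration and are not the real device-model code; the point is only that
a GTT page which could not be write-protected gets re-copied just before
submission, so the cost is time rather than correctness.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical shadow state for one guest GTT page (names invented). */
struct shadow_page {
    unsigned long guest_gfn;   /* guest frame holding the GTT page      */
    bool write_protected;      /* did we manage to write-protect it?    */
    bool dirty;                /* must be re-copied before the next run */
};

struct shadow_gtt {
    struct shadow_page *pages;
    size_t nr_pages;
};

/* Stand-ins for the real device-model primitives. */
static void recopy_guest_gtt_page(struct shadow_page *sp)
{
    printf("re-shadowing GTT page at gfn %#lx\n", sp->guest_gfn);
}

static void submit_to_gpu(struct shadow_gtt *gtt)
{
    printf("submitting workload (%zu GTT pages)\n", gtt->nr_pages);
}

/*
 * Workload submission is mediated by the device model, so any GTT page
 * that could not be write-protected (or was written since the last run)
 * is re-shadowed just before submission.  A missing write-protection
 * therefore costs extra copying, not correctness.
 */
void submit_workload(struct shadow_gtt *gtt)
{
    for (size_t i = 0; i < gtt->nr_pages; i++) {
        struct shadow_page *sp = &gtt->pages[i];

        if (!sp->write_protected || sp->dirty) {
            recopy_guest_gtt_page(sp);
            sp->dirty = false;
        }
    }
    submit_to_gpu(gtt);
}

int main(void)
{
    struct shadow_page pages[2] = {
        { .guest_gfn = 0x1000, .write_protected = true,  .dirty = false },
        { .guest_gfn = 0x1001, .write_protected = false, .dirty = false },
    };
    struct shadow_gtt gtt = { .pages = pages, .nr_pages = 2 };

    submit_workload(&gtt);
    return 0;
}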



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread Jan Beulich
>>> On 04.02.16 at 18:12,  wrote:
> Two angles on this.
> 
> First, assuming that limiting the number of ranges is what we want:  I'm
> not really a fan of using HVM_PARAMs for this, but as long as it's not
> considered a public interface (i.e., it could go away or disappear and
> everything would Just Work), then I wouldn't object.

The parameter setting interface generally is guest exposed, it's just
that many parameters can't be set by guests for themselves. Of
course the new #define could be put inside a Xen/tools conditional.

> Although I would ask: would it instead be suitable for now to just set
the default limit for WP_RAM to 8192 in the hypervisor, since we do
> expect it to be tracking gpfn ranges rather than IO regions?  And if we
> determine in the future that more ranges are necessary, to then do the
> work of moving it to using p2m types (or exposing a knob to adjust it)?

That's what collides with disaggregation (as pointed out before):
Any tool stack component could then create wp-ram pages in
guests it controls, no matter whether those guests actually have
a need for such. I continue to think that if we indeed go this
route, then the host admin should be required to explicitly state
that the risks are acceptable (by allowing only certain guests to
actually create [many] of such pages).

> But (and this the other angle): is simply marking a numerical limit
> sufficient to avoid memory exhaustion? Is there a danger that after
> creating several guests, such that Xen was now running very low on
> memory, that a guest would (purposely or not) cause memory to be
> exhausted sometime further after boot, causing a system-wide DoS (or
> just general lack of stability)?

The guest itself can't, but other than fully privileged tool stack
components could, and that's still something we need to avoid.

> In the shadow / hap memory case, the memory is pre-allocated up front,
> which makes sure that nothing a guest does can cause Xen to run out of
> memory once it's booted.  Without pre-allocating it, it's still possible
> that the admin might start up enough VMs that exhaustion is *possible*,
> but won't be triggered until later (when the guest starts using more GTTs).
> 
> Although in fact this really points to the need for a general overhaul
> in how memory allocation on behalf of a domain is handled in general;
> that's a bigger chunk of work.

Well, right now there's pretty little allocation happening at run time
afaict, i.e. having a couple of pages available will generally keep
things going smoothly. (Once upon a time it was that there were -
iirc - no active runtime allocations at all, i.e. such had occurred only
in the context of things like domctls, where failure could be dealt
with by ballooning a few pages out of some domain and retrying.
I'm afraid that's not the case anymore nowadays, and even if it
was would collide with disaggregation, as a tool stack component
legitimately issuing a domctl may not have the privileges to
balloon other than the domain it controls, in particular not Dom0.)

> But in any case, it seems to me that we can entirely avoid the question
> of how many ranges might ever be necessary by starting with a fixed
> limit in the hypervisor, and then moving to a p2m-type based
> implementation if and when that becomes unsatisfactory.

With all of the above I think we should rather explore the p2m-type
based approach, in particular the suggestion Kevin has made to
direct all p2m_mmio_write_dm (a name which appears to have been
badly chosen, considering that we're talking about RAM pages
here) write accesses to the default ioreq server (utilizing that
upstream qemu doesn't itself register as such).

Jan
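
As a rough illustration of the type-based routing being discussed here
(simplified stand-ins, not the actual Xen structures or the eventual
interface), the hypervisor-side selection of an ioreq server for a write
fault could look something like this:

#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for Xen's types; not the real definitions. */
typedef enum {
    p2m_ram_rw,
    p2m_mmio_dm,
    p2m_mmio_write_dm,   /* RAM page whose writes go to an emulator */
} p2m_type_t;

struct ioreq_server {
    int id;              /* plus event channel, shared page, ... */
};

struct domain {
    struct ioreq_server *write_dm_server;  /* claimant, NULL if none */
    struct ioreq_server *default_server;
};

/* Toy lookup standing in for the real p2m query. */
static p2m_type_t p2m_get_type(struct domain *d, uint64_t gfn)
{
    (void)d;
    return gfn == 0x1234 ? p2m_mmio_write_dm : p2m_ram_rw;
}

/*
 * Select the ioreq server for a write fault.  With a type-based scheme
 * there is no rangeset walk: the p2m type alone picks the target.
 */
static struct ioreq_server *select_ioreq_server(struct domain *d, uint64_t gfn)
{
    switch (p2m_get_type(d, gfn)) {
    case p2m_mmio_write_dm:
        /* Claimed by at most one emulator (e.g. the XenGT dm). */
        return d->write_dm_server ? d->write_dm_server : d->default_server;
    case p2m_mmio_dm:
        return d->default_server;   /* ordinary emulated MMIO */
    default:
        return NULL;                /* not an emulated access */
    }
}

int main(void)
{
    struct ioreq_server qemu = { .id = 0 }, xengt = { .id = 1 };
    struct domain d = { .write_dm_server = &xengt, .default_server = &qemu };

    struct ioreq_server *s = select_ioreq_server(&d, 0x1234);
    printf("write to gfn 0x1234 goes to server %d\n", s ? s->id : -1);
    return 0;
}

Whether such accesses fall through to the default server or to an emulator
that explicitly claimed the type is exactly the policy question being
debated in this thread.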




Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread Yu, Zhang



On 2/4/2016 7:06 PM, George Dunlap wrote:

On Thu, Feb 4, 2016 at 9:38 AM, Yu, Zhang  wrote:

On 2/4/2016 5:28 PM, Paul Durrant wrote:

I assume this means that the emulator can 'unshadow' GTTs (I guess on an
LRU basis) so that it can shadow new ones when the limit has been exhausted?
If so, how bad is performance likely to be if we live with a lower limit
and take the hit of unshadowing if the guest GTTs become heavily fragmented?


Thank you, Paul.

Well, I was told the emulator has approaches to delay the shadowing of
the GTT till future GPU commands are submitted. By now, I'm not sure
about the performance penalties if the limit is set too low. Although
we are confident 8K is a secure limit, it seems still too high to be
accepted. We will perform more experiments with this new approach to
find a balance between the lowest limit and the XenGT performance.


Just to check some of my assumptions:

I assume that unlike memory accesses, your GPU hardware cannot
'recover' from faults in the GTTs. That is, for memory, you can take a
page fault, fix up the pagetables, and then re-execute the original
instruction; but so far I haven't heard of any devices being able to
seamlessly re-execute a transaction after a fault.  Is my
understanding correct?



Yes


If that is the case, then for every top-level value (whatever the
equivalent of the CR3), you need to be able to shadow the entire GTT
tree below it, yes?  You can't use a trick that the memory shadow
pagetables can use, of unshadowing parts of the tree and reshadowing
them.

So as long as the currently-in-use GTT tree contains no more than
$LIMIT ranges, you can unshadow and reshadow; this will be slow, but
strictly speaking correct.

What do you do if the guest driver switches to a GTT such that the
entire tree takes up more than $LIMIT entries?



Good question. Like memory virtualization, IIUC, besides write-protecting
the guest page tables, we can also track updates to them when cr3 is
written or when a TLB flush occurs. We can consider optimizing our GPU
device model to achieve a similar goal, e.g. when a root pointer (like
cr3) to the page table is written and when a set of commands is
submitted (both situations are triggered by MMIO operations). But taking
performance into consideration, we will probably still need to
write-protect all the page tables when they are first created. It
requires a lot of optimization work on the device model side to find a
balance between a minimal set of write-protected gpfns and reasonable
performance. We'd like to have a try. :)

Yu



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread Yu, Zhang



On 2/5/2016 1:12 AM, George Dunlap wrote:

On 04/02/16 14:08, Jan Beulich wrote:

On 04.02.16 at 14:33, <ian.jack...@eu.citrix.com> wrote:

Jan Beulich writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
max_wp_ram_ranges."):

On 04.02.16 at 10:38, <yu.c.zh...@linux.intel.com> wrote:

So another question is, if the value of this limit really matters, will a
lower one be more acceptable (the current 256 being not enough)?


If you've carefully read George's replies, [...]


Thanks to George for the very clear explanation, and also to him for
an illuminating in-person discussion.

It is disturbing that as a result of me as a tools maintainer asking
questions about what seems to me to be a troublesome user-visible
control setting in libxl, we are now apparently revisiting lower
layers of the hypervisor design, which have already been committed.

While I find George's line of argument convincing, neither I nor
George are maintainers of the relevant hypervisor code.  I am not
going to insist that anything in the hypervisor is done different and
am not trying to use my tools maintainer position to that end.

Clearly there has been a failure of our workflow to consider and
review everything properly together.  But given where we are now, I
think that this discussion about hypervisor internals is probably a
distraction.


While I recall George having made that alternative suggestion,
both Yu and Paul having reservations against it made me not
insist on that alternative. Instead I've been trying to limit some
of the bad effects that the variant originally proposed brought
with it. Clearly, with the more detailed reply George has now
given (involving areas where he is the maintainer for), I should
have been more demanding towards the exploration of that
alternative. That's clearly unfortunate, and I apologize for that,
but such things happen.

As to one of the patches having already been committed - I'm not
worried about that at all. We can always revert, that's why the
thing is called "unstable".


It looks like I should have been more careful to catch up on the current
state of things before I started arguing again -- please accept my
apologies.



In fact, I need to say thank you all for your patience and suggestions.
I'm thrilled to see XenGT is receiving so much attention. :)


I see that patch 2/3 addresses the gpfn/io question in the commit
message by saying, "Previously, a new hypercall or subop was suggested
to map write-protected pages into ioreq server. However, it turned out
handler of this new hypercall would be almost the same with the existing
pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a
type parameter in this hypercall. So no new hypercall defined, only a
new type is introduced."

And I see that 2/3 internally separates the WP_RAM type into a separate
rangeset, whose size can be adjusted separately.

This addresses my complaint about the interface using gpfns rather than
MMIO ranges as an interface (somewhat anyway).  Sorry for not
acknowledging this at first.

The question of the internal implementation -- whether to use RB tree
rangesets, or radix trees (as apparently ARM memaccess does) or p2m
types -- is an internal implementation question.  I think p2m types is
long-term the best way to go, but it won't hurt to have the current
implementation checked in, as long as it doesn't have any impacts on the
stable interface.

At the moment, as far as I can tell, there's no way for libxl to even
run a version of qemu with XenGT enabled, so there's no real need for
libxl to be involved.



I agree.


The purpose of having the limit would putatively be to prevent a guest
being able to trigger an exhaustion of hypervisor memory by inducing the
device model to mark an arbitrary number of ranges as mmio_dm.

Two angles on this.

First, assuming that limiting the number of ranges is what we want:  I'm
not really a fan of using HVM_PARAMs for this, but as long as it's not
considered a public interface (i.e., it could go away or disappear and
everything would Just Work), then I wouldn't object.

Although I would ask: would it instead be suitable for now to just set
the default limit for WP_RAM to 8192 in the hypervisor, since we do
expect it to be tracking gpfn ranges rather than IO regions?  And if we


That is what we have suggested in v9. But Jan proposed we leave this
option to the admin. And to some extent, I can understand his concern.


determine in the future that more ranges are necessary, to then do the
work of moving it to using p2m types (or exposing a knob to adjust it)?

But (and this the other angle): is simply marking a numerical limit
sufficient to avoid memory exhaustion? Is there a danger that after
creating several guests, such that Xen was now running very low on
memory, that a guest would (purposely or not) cause memory to be
exhausted sometime further after boot, causing a system-wide DoS (or
just general lack of stability)?

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread Jan Beulich
>>> On 05.02.16 at 04:44,  wrote:
> This is why Yu mentioned earlier whether we can just set a default
> limit which is good for the majority of use cases, while extending our
> device model to drop/recreate some shadow tables when the limitation
> is hit. I think this matches how today's CPU shadow page table is
> implemented, which also has a limitation of how many shadow pages
> are allowed per-VM.

Except that un-shadowing some victim page in order to shadow another
and make forward progress can be done there, in the page fault handler,
i.e. we don't need to determine up front the set of pages to be
shadowed for a certain operation to complete.

Jan




Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread Yu, Zhang



On 2/5/2016 12:18 PM, Tian, Kevin wrote:

From: George Dunlap [mailto:george.dun...@citrix.com]
Sent: Friday, February 05, 2016 1:12 AM

On 04/02/16 14:08, Jan Beulich wrote:

On 04.02.16 at 14:33, <ian.jack...@eu.citrix.com> wrote:

Jan Beulich writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
max_wp_ram_ranges."):

On 04.02.16 at 10:38, <yu.c.zh...@linux.intel.com> wrote:

So another question is, if the value of this limit really matters, will a
lower one be more acceptable (the current 256 being not enough)?


If you've carefully read George's replies, [...]


Thanks to George for the very clear explanation, and also to him for
an illuminating in-person discussion.

It is disturbing that as a result of me as a tools maintainer asking
questions about what seems to me to be a troublesome user-visible
control setting in libxl, we are now apparently revisiting lower
layers of the hypervisor design, which have already been committed.

While I find George's line of argument convincing, neither I nor
George are maintainers of the relevant hypervisor code.  I am not
going to insist that anything in the hypervisor is done different and
am not trying to use my tools maintainer position to that end.

Clearly there has been a failure of our workflow to consider and
review everything properly together.  But given where we are now, I
think that this discussion about hypervisor internals is probably a
distraction.


While I recall George having made that alternative suggestion,
both Yu and Paul having reservations against it made me not
insist on that alternative. Instead I've been trying to limit some
of the bad effects that the variant originally proposed brought
with it. Clearly, with the more detailed reply George has now
given (involving areas where he is the maintainer for), I should
have been more demanding towards the exploration of that
alternative. That's clearly unfortunate, and I apologize for that,
but such things happen.

As to one of the patches having already been committed - I'm not
worried about that at all. We can always revert, that's why the
thing is called "unstable".


It looks like I should have been more careful to catch up on the current
state of things before I started arguing again -- please accept my
apologies.


Thanks George for your careful thinking.



I see that patch 2/3 addresses the gpfn/io question in the commit
message by saying, "Previously, a new hypercall or subop was suggested
to map write-protected pages into ioreq server. However, it turned out
handler of this new hypercall would be almost the same with the existing
pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a
type parameter in this hypercall. So no new hypercall defined, only a
new type is introduced."

And I see that 2/3 internally separates the WP_RAM type into a separate
rangeset, whose size can be adjusted separately.

This addresses my complaint about the interface using gpfns rather than
MMIO ranges as an interface (somewhat anyway).  Sorry for not
acknowledging this at first.

The question of the internal implementation -- whether to use RB tree
rangesets, or radix trees (as apparently ARM memaccess does) or p2m
types -- is an internal implementation question.  I think p2m types is
long-term the best way to go, but it won't hurt to have the current
implementation checked in, as long as it doesn't have any impacts on the
stable interface.


I'm still trying to understand your suggestion vs. this one. Today we
already have a p2m_mmio_write_dm type. It's there already, and any
write fault hitting that type will be delivered to an ioreq server. The next
open question is how an ioreq server could know whether it should handle this
request or not, which is why some tracking structures (either RB/radix)
are created to maintain that specific information. It's under the assumption
that multiple ioreq servers co-exist, so a loop check on all ioreq servers
is required to identify the right target. And multiple ioreq servers are
a real case in XenGT, because our vGPU device model is in the kernel, as
part of the Intel i915 graphics driver. So at least two ioreq servers already
exist, with one routing to XenGT in Dom0 kernel space and the other
to the default Qemu in Dom0 user space.

In your long-term approach with p2m types, it looks like you are proposing
to encode the ioreq server ID in the p2m type directly (e.g. 4 bits), which
then eliminates the need for tracking on the ioreq server side, so the whole
security concern is gone. And no limitation at all. But because the available
p2m bits are limited, as Andrew pointed out, it might be reasonable to
implement this approach only when a new p2t structure is added, which
is why we consider it a long-term approach.

Please correct me if the above understanding is wrong.
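
For illustration only, a minimal sketch of the "encode the ioreq server ID
in spare bits" idea just described; the bit positions and the 4-bit width
are invented for this example and are not the real EPT/p2m layout:

#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical encoding: 4 spare bits of a per-gfn entry identify the
 * ioreq server that handles faults on that gfn.  Invented layout.
 */
#define IOREQ_ID_SHIFT   52
#define IOREQ_ID_MASK    (0xfULL << IOREQ_ID_SHIFT)

static uint64_t set_ioreq_id(uint64_t entry, unsigned int id)
{
    return (entry & ~IOREQ_ID_MASK) |
           ((uint64_t)(id & 0xf) << IOREQ_ID_SHIFT);
}

static unsigned int get_ioreq_id(uint64_t entry)
{
    return (entry & IOREQ_ID_MASK) >> IOREQ_ID_SHIFT;
}

int main(void)
{
    uint64_t entry = 0;

    entry = set_ioreq_id(entry, 3);    /* steer this gfn to server 3 */
    printf("server id = %u\n", get_ioreq_id(entry));
    return 0;
}

Four bits would allow up to 16 servers, which is why the number of spare
bits (and hence the possible need for a separate p2t) is the limiting
factor in this variant.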



At the moment, as far as I can tell, there's no way for libxl to even
run a version of qemu with XenGT enabled, so there's no real need for
libxl to be involved.


no way because we

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread George Dunlap
On Fri, Feb 5, 2016 at 3:13 PM, Zhiyuan Lv  wrote:
>> My question is, suppose a single GTT / gpu thread / tree has 9000
>> ranges.  It would be trivial for an attacker to break into the
>> operating system and *construct* such a tree, but it's entirely
>> possible that due to a combination of memory fragmentation and very
>> large usage, the normal driver might accidentally create such a GTT.
>> In that case, the device model will not be able to write-protect all
>> the pages in the single GTT, and thus will not be able to correctly
>> track changes to the currently-active GTT.  What does your device
>> model do in that case?
>
> We can live with a partially write-protected tree. That is because the
> GPU's workload execution is controlled by the device model. We still
> have a chance to update the shadow page table before we submit a workload
> to the GPU. The impact is on performance, not correctness. Thanks!

Right -- so it's actually never a hard limit.  That's good to know.

 -George



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 05 February 2016 08:33
> To: George Dunlap
> Cc: Andrew Cooper; Ian Campbell; Paul Durrant; Stefano Stabellini; Wei Liu;
> Ian Jackson; Kevin Tian; zhiyuan...@intel.com; Zhang Yu; xen-
> de...@lists.xen.org; Keir (Xen.org)
> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> >>> On 04.02.16 at 18:12, <george.dun...@citrix.com> wrote:
> > Two angles on this.
> >
> > First, assuming that limiting the number of ranges is what we want:  I'm
> > not really a fan of using HVM_PARAMs for this, but as long as it's not
> > considered a public interface (i.e., it could go away or disappear and
> > everything would Just Work), then I wouldn't object.
> 
> The parameter setting interface generally is guest exposed, it's just
> that many parameters can't be set by guests for themselves. Of
> course the new #define could be put inside a Xen/tools conditional.
> 
> > Although I would ask: would it instead be suitable for now to just set
> > the default limit for WP_RAM to 8192 in the hypervisor, since we do
> > expect it to be tracking gpfn ranges rather than IO regions?  And if we
> > determine in the future that more ranges are necessary, to then do the
> > work of moving it to using p2m types (or exposing a knob to adjust it)?
> 
> That's what collides with disaggregation (as pointed out before):
> Any tool stack component could then create wp-ram pages in
> guests it controls, no matter whether those guests actually have
> a need for such. I continue to think that if we indeed go this
> route, then the host admin should be required to explicitly state
> that the risks are acceptable (by allowing only certain guests to
> actually create [many] of such pages).
> 
> > But (and this the other angle): is simply marking a numerical limit
> > sufficient to avoid memory exhaustion? Is there a danger that after
> > creating several guests, such that Xen was now running very low on
> > memory, that a guest would (purposely or not) cause memory to be
> > exhausted sometime further after boot, causing a system-wide DoS (or
> > just general lack of stability)?
> 
> The guest itself can't, but other than fully privileged tool stack
> components could, and that's still something we need to avoid.
> 
> > In the shadow / hap memory case, the memory is pre-allocated up front,
> > which makes sure that nothing a guest does can cause Xen to run out of
> > memory once it's booted.  Without pre-allocating it, it's still possible
> > that the admin might start up enough VMs that exhaustion is *possible*,
> > but won't be triggered until later (when the guest starts using more GTTs).
> >
> > Although in fact this really points to the need for a general overhaul
> > in how memory allocation on behalf of a domain is handled in general;
> > that's a bigger chunk of work.
> 
> Well, right now there's pretty little allocation happening at run time
> afaict, i.e. having a couple of pages available will generally keep
> things going smoothly. (Once upon a time it was that there were -
> iirc - no active runtime allocations at all, i.e. such had occurred only
> in the context of things like domctls, where failure could be dealt
> with by ballooning a few pages out of some domain and retrying.
> I'm afraid that's not the case anymore nowadays, and even if it
> was would collide with disaggregation, as a tool stack component
> legitimately issuing a domctl may not have the privileges to
> balloon other than the domain it controls, in particular not Dom0.)
> 
> > But in any case, it seems to me that we can entirely avoid the question
> > of how many ranges might ever be necessary by starting with a fixed
> > limit in the hypervisor, and then moving to a p2m-type based
> > implementation if and when that becomes unsatisfactory.
> 
> With all of the above I think we should rather explore the p2m-type
> based approach, in particular the suggestion Kevin has made to
> direct all p2m_mmio_write_dm (a name which appears to have been
> badly chosen, considering that we're talking about RAM pages
> here) write accesses to the default ioreq server (utilizing that
> upstream qemu doesn't itself register as such).
> 

Utilizing the default server is a backwards step. GVT-g would have to use the
old HVM_PARAM mechanism to cause its emulator to become default. I think a
more appropriate mechanism would be p2m_mmio_write_dm to become something like
'p2m_ioreq_server_write' and then have a hypercall to allow it to be mapped to
a particular ioreq server.
Obviously only one could claim it but, with a p2t, the bit could be
re-purposed to simply mean 'go look in the p2t' for more information and
then the p2t could be structured to allow emulations to be steered to one of
many ioreq servers (for read and/or write emulation).

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread Jan Beulich
>>> On 05.02.16 at 10:24,  wrote:
> Utilizing the default server is a backwards step. GVT-g would have to use the 
> old HVM_PARAM mechanism to cause its emulator to become default. I think a 
> more appropriate mechanism would be p2m_mmio_write_dm to become something 
> like 'p2m_ioreq_server_write' and then have a hypercall to allow it to be 
> mapped to a particular ioreq server.
> Obviously only one could claim it but, with a p2t, the bit could be 
> re-purposed to simply mean 'go look in the p2t' for more information and 
> then the p2t could be structured to allow emulations to be steered to one of 
> many ioreq servers (for read and/or write emulation).

Sounds reasonable.

Jan




Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread George Dunlap
On Fri, Feb 5, 2016 at 3:44 AM, Tian, Kevin  wrote:
>> > So as long as the currently-in-use GTT tree contains no more than
>> > $LIMIT ranges, you can unshadow and reshadow; this will be slow, but
>> > strictly speaking correct.
>> >
>> > What do you do if the guest driver switches to a GTT such that the
>> > entire tree takes up more than $LIMIT entries?
>>
>> GPU has some special properties different from CPU, which make things
>> easier. The GPU page table is constructed by CPU and used by GPU
>> workloads. GPU workload itself will not change the page table.
>> Meanwhile, GPU workload submission, in virtualized environment, is
>> controlled by our device model. Then we can reshadow the whole table
>> every time before we submit workload. That can reduce the total number
>> required for write protection but with performance impact, because GPU
>> has to have more idle time waiting for CPU. Hope the info helps.
>> Thanks!
>>
>
> Putting in another way, it's fully under mediation when a GPU page table
> (GTT) will be referenced by the GPU, so there's plenty of room to
> optimize existing shadowing (always shadowing all recognized GPU page
> tables), e.g. shadowing only active one when a VM is scheduled in. It's
> a performance matter but no correctness issue.
>
> This is why Yu mentioned earlier whether we can just set a default
> limit which is good for the majority of use cases, while extending our
> device model to drop/recreate some shadow tables when the limitation
> is hit. I think this matches how today's CPU shadow page table is
> implemented, which also has a limitation of how many shadow pages
> are allowed per-VM.

I don't think you've understood my question (or maybe I still don't
understood the situation properly).

So in memory pagetables, there's a "tree" that contains a single
top-level page, which points to other pages, which defines one address
space (usually corresponding to one process or thread).   (This is
often just referred to as 'cr3', since it's the value you write into
the cr3 register on x86 processors.) I'm assuming that the structure
is similar for your GPU translation tables -- that a single GTT is
effectively a "tree" sort of like a process address space for an OS.

And it sounds like what you're saying is: suppose we have 10 different
GTTs (i.e., an entire tree / gpu thread), and each one requires 1024
ranges to shadow.  In that case, a limit of 8192 ranges means we can
only keep 8 of the ten actually shadowed at any one time.  This is not
optimal, since it will occasionally mean unshadowing an entire GTT and
re-shadowing another one, but it will work, because we can always make
sure that the currently-active GTT is shadowed.

My question is, suppose a single GTT / gpu thread / tree has 9000
ranges.  It would be trivial for an attacker to break into the
operating system and *construct* such a tree, but it's entirely
possible that due to a combination of memory fragmentation and very
large usage, the normal driver might accidentally create such a GTT.
In that case, the device model will not be able to write-protect all
the pages in the single GTT, and thus will not be able to correctly
track changes to the currently-active GTT.  What does your device
model do in that case?

 -George



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread George Dunlap
On Fri, Feb 5, 2016 at 9:24 AM, Paul Durrant  wrote:
> Utilizing the default server is a backwards step. GVT-g would have to use the 
old HVM_PARAM mechanism to cause its emulator to become default. I think a 
> more appropriate mechanism would be p2m_mmio_write_dm to become something 
> like 'p2m_ioreq_server_write' and then have a hypercall to allow it to be 
> mapped to a particular ioreq server.
> Obviously only one could claim it but, with a p2t, the bit could be 
> re-purposed to simply mean 'go look in the p2t' for more information and then 
> the p2t could be structured to allow emulations to be steered to one of many 
> ioreq servers (for read and/or write emulation).

Right; I had in mind that Xen would allow at any given time a max of N
ioreq servers to register for mmio_write_dm ranges, first-come
first-served; with 'N' being '1' to begin with.  If a second ioreq
server requested mmio_write_dm functionality, it would get -EBUSY.
This would allow their current setup (one qemu dm which doesn't do
mmio_write_dm, one xengt dm which does) to work without needing to
worry any more about how many pages might need to be tracked (either
for efficiency or correctness).

We could then extend this to some larger number (4 seems pretty
reasonable to me) either by adding an extra 3 types, or by some other
method (such as the one Paul suggests).

 -George
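
A minimal sketch of that first-come first-served claim, with N = 1; the
structures below are simplified stand-ins rather than the real Xen ones,
and locking is omitted for brevity:

#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins; not the real Xen structures. */
struct ioreq_server { int id; };

struct domain {
    /* With N = 1, a single slot records who claimed write_dm pages. */
    struct ioreq_server *write_dm_claimant;
};

/*
 * First-come first-served claim of the write_dm (p2m-type based)
 * functionality: the first ioreq server to ask gets it, any later one
 * is refused with -EBUSY.  A real implementation would take the
 * domain's ioreq-server lock around this check.
 */
int claim_write_dm(struct domain *d, struct ioreq_server *s)
{
    if (d->write_dm_claimant && d->write_dm_claimant != s)
        return -EBUSY;

    d->write_dm_claimant = s;
    return 0;
}

void release_write_dm(struct domain *d, struct ioreq_server *s)
{
    if (d->write_dm_claimant == s)
        d->write_dm_claimant = NULL;
}

int main(void)
{
    struct domain d = { .write_dm_claimant = NULL };
    struct ioreq_server qemu = { .id = 1 }, xengt = { .id = 2 };

    printf("xengt claim: %d\n", claim_write_dm(&d, &xengt)); /* 0      */
    printf("qemu  claim: %d\n", claim_write_dm(&d, &qemu));  /* -EBUSY */
    return 0;
}

Extending this to a small N (say 4) would turn the single slot into a short
array, or use the per-type/p2t scheme Paul suggests.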



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-05 Thread Paul Durrant
> -Original Message-
> From: dunl...@gmail.com [mailto:dunl...@gmail.com] On Behalf Of
> George Dunlap
> Sent: 05 February 2016 11:14
> To: Paul Durrant
> Cc: Jan Beulich; George Dunlap; Kevin Tian; Wei Liu; Ian Campbell; Andrew
> Cooper; Zhang Yu; xen-devel@lists.xen.org; Stefano Stabellini;
> zhiyuan...@intel.com; Ian Jackson; Keir (Xen.org)
> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> On Fri, Feb 5, 2016 at 9:24 AM, Paul Durrant <paul.durr...@citrix.com>
> wrote:
> > Utilizing the default server is a backwards step. GVT-g would have to use
> the old HVM_PARAM mechanism to cause its emulator to become default. I
> think a more appropriate mechanism would be p2m_mmio_write_dm to
> become something like 'p2m_ioreq_server_write' and then have a hypercall
> to allow it to be mapped to a particular ioreq server.
> > Obviously only one could claim it but, with a p2t, the bit could be re-
> purposed to simply mean 'go look in the p2t' for more information and then
> the p2t could be structured to allow emulations to be steered to one of many
> ioreq servers (for read and/or write emulation).
> 
> Right; I had in mind that Xen would allow at any given time a max of N
> ioreq servers to register for mmio_write_dm ranges, first-come
> first-served; with 'N' being '1' to begin with.  If a second ioreq
> server requested mmio_write_dm functionality, it would get -EBUSY.
> This would allow their current setup (one qemu dm which doesn't do
> mmio_write_dm, one xengt dm which does) to work without needing to
> worry any more about how many pages might need to be tracked (either
> for efficiency or correctness).
> 
> We could then extend this to some larger number (4 seems pretty
> reasonable to me) either by adding an extra 3 types, or by some other
> method (such as the one Paul suggests).

I think it would be best to do away with the 'write dm' name though. I would 
like to see it be possible to steer reads+writes, as well as writes (and maybe 
just reads?) to a particular ioreq server based on type information. So maybe 
we just call the existing type 'p2m_ioreq_server' and then, in the absence of a 
p2t, hardcode this to go to whichever emulator makes the new TBD hypercall.
I think we need a proper design at this point. Given that it's Chinese New Year 
maybe I'll have a stab in Yu's absence.

  Paul

> 
>  -George
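
Purely as an illustration of the shape such a "TBD hypercall" might take,
letting an emulator claim the 'p2m_ioreq_server' type for a domain and say
which accesses it wants: all names, values and fields below are invented
for the example, and this is not an existing Xen interface.

#include <stdint.h>
#include <stdio.h>

/* Invented flag values for the example. */
#define EXAMPLE_STEER_READS   (1u << 0)
#define EXAMPLE_STEER_WRITES  (1u << 1)

/* Invented payload for a hypothetical "map p2m type to ioreq server" op. */
struct example_map_type_to_ioreq_server {
    uint16_t domid;      /* target domain                              */
    uint16_t server_id;  /* ioreq server claiming the p2m type         */
    uint16_t p2m_type;   /* the type to steer, e.g. p2m_ioreq_server   */
    uint16_t flags;      /* EXAMPLE_STEER_READS and/or _WRITES         */
};

int main(void)
{
    /* An emulator (e.g. the XenGT dm) asking for write faults only. */
    struct example_map_type_to_ioreq_server req = {
        .domid     = 1,
        .server_id = 2,
        .p2m_type  = 12,                  /* stand-in for p2m_ioreq_server */
        .flags     = EXAMPLE_STEER_WRITES,
    };

    printf("map type %u -> server %u (flags %#x)\n",
           (unsigned)req.p2m_type, (unsigned)req.server_id,
           (unsigned)req.flags);
    return 0;
}

The read/write flags reflect Paul's wish to steer reads+writes, writes only,
or perhaps reads only, based purely on type information.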


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Yu, Zhang



On 2/4/2016 5:28 PM, Paul Durrant wrote:

-Original Message-
From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com]
Sent: 04 February 2016 08:51
To: George Dunlap; Ian Jackson
Cc: Paul Durrant; Kevin Tian; Wei Liu; Ian Campbell; Andrew Cooper; xen-
de...@lists.xen.org; Stefano Stabellini; zhiyuan...@intel.com; Jan Beulich;
Keir (Xen.org)
Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
max_wp_ram_ranges.



On 2/4/2016 2:21 AM, George Dunlap wrote:

On Wed, Feb 3, 2016 at 5:41 PM, George Dunlap
<george.dun...@eu.citrix.com> wrote:

I think at some point I suggested an alternate design based on marking
such gpfns with a special p2m type; I can't remember if that
suggestion was actually addressed or not.


FWIW, the thread where I suggested using p2m types was in response to

<1436163912-1506-2-git-send-email-yu.c.zh...@linux.intel.com>

Looking through it again, the main objection Paul gave[1]  was:

"And it's the assertion that use of write_dm will only be relevant to
gfns, and that all such notifications only need go to a single ioreq
server, that I have a problem with. Whilst the use of io ranges to
track gfn updates is, I agree, not ideal I think the overloading of
write_dm is not a step in the right direction."

Two issues raised here, about using only p2m types to implement write_dm:

1. More than one ioreq server may want to use the write_dm functionality
2. ioreq servers may want to use write_dm for things other than individual gpfns

My answer to #1 was:
1. At the moment, we only need to support a single ioreq server using write_dm

2. It's not technically difficult to extend the number of servers
supported to something sensible, like 4 (using 4 different write_dm
p2m types)
3. The interface can be designed such that we can extend support to
multiple servers when we need to.

My answer to #2 was that there's no reason why write_dm couldn't be
used for both individual gpfns and ranges; there's no reason the
interface can't take a "start" and "count" argument, even if for the
time being "count" is almost always going to be 1.



Well, talking about "the 'count' always going to be 1". I doubt that. :)
Statistics in XenGT shows that, GPU page tables are very likely to
be allocated in contiguous gpfns.


Compare this to the downsides of the approach you're proposing:
1. Using 40 bytes of hypervisor space per guest GPU pagetable page (as
opposed to using a bit in the existing p2m table)
2. Walking down an RB tree with 8000 individual nodes to find out
which server to send the message to (rather than just reading the
value from the p2m table).


8K is an upper limit for the rangeset, in many cases the RB tree will
not contain that many nodes.


3. Needing to determine on a guest-by-guest basis whether to change the limit

4. Needing to have an interface to make the limit even bigger, just in
case we find workloads that have even more GTTs.



Well, I have suggested this in yesterday's reply: XenGT can choose not to
change this limit even when workloads are getting heavy - with
trade-offs on the device model side.


I assume this means that the emulator can 'unshadow' GTTs (I guess on an LRU 
basis) so that it can shadow new ones when the limit has been exhausted?
If so, how bad is performance likely to be if we live with a lower limit and 
take the hit of unshadowing if the guest GTTs become heavily fragmented?


Thank you, Paul.

Well, I was told the emulator has approaches to delay the shadowing of
the GTT till future GPU commands are submitted. For now, I'm not sure
about the performance penalties if the limit is set too low. Although
we are confident 8K is a secure limit, it seems still too high to be
accepted. We will perform more experiments with this new approach to
find a balance between the lowest limit and the XenGT performance.

So another question is, if the value of this limit really matters, will a
lower one be more acceptable (the current 256 being not enough)?

Thanks
Yu



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Jan Beulich
>>> On 04.02.16 at 10:38,  wrote:
> So another question is, if the value of this limit really matters, will a
> lower one be more acceptable (the current 256 being not enough)?

If you've carefully read George's replies, a primary aspect is
whether we wouldn't better revert commit f5a32c5b8e
("x86/HVM: differentiate IO/mem resources tracked by ioreq
server"), as with the alternative approach we wouldn't even
need HVMOP_IO_RANGE_WP_MEM afaict. And then the question
you raise would become irrelevant.

The part of the public interface being tools-only allows some
freedom in when to do this, but I think it would be a bad idea
to ship 4.7 with this still in if you're not going to pursue this
route.

Jan




Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Ian Campbell
On Thu, 2016-02-04 at 10:49 +, George Dunlap wrote:
> On Thu, Feb 4, 2016 at 8:51 AM, Yu, Zhang 
> wrote:
> > > Going forward, we probably will, at some point, need to implement a
> > > parallel "p2t" structure to keep track of types -- and probably will
> > > whether end up implementing 4 separate write_dm types or not (for the
> > > reasons you describe).
> > > 
> > 
> > Thank you, George. Could you please elaborate more about the idea of
> > "p2t"?
> 
> So the p2m table is a partially-sparse structure designed to map pfns
> to mfns.  At the bottom is a 64-bit entry that contains certain bits
> specified by the hardware (mfn number, permissions, other features).
> There are at the moment extra bits that aren't use, and in these bits
> we store information about the pfn that Xen wants to know -- among
> other things, the 'type' of the gpfn.
> 
> But as Andy pointed out, Intel are adding new features which take up
> more of these bits; and at the same time, Xen has more features for
> which using a p2m type is the most obvious solution.  So the idea
> would be to have a separate structure, similar to the p2m table, but
> wouldn't be used by the hardware -- it would only be used by Xen to
> map pfn to type (or whatever other information it wanted about the
> gpfn).  This does mean duplicating all the non-leaf nodes, as well as
> doing two sparse-tree-walks rather  than just one when looking up gpfn
> information.

FWIW ARM already has such a structure to support xenaccess, since ARM had
far fewer available bits to start with.

It is not quite the same as above since it is only populated with pages for
which xenaccess is in use rather than all pages (and is only consulted if
we have reason to believe the page might be subject to xenaccess). FWIW it
uses Xen's radix tree stuff.

Ian.
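
A toy model of such a parallel, software-only lookup structure, populated
only for gfns that carry a special type (similar in spirit to the ARM
xenaccess structure described above): real code would use Xen's radix tree
and proper locking, and the two-level layout and names here are invented
for the example.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define L1_BITS   9
#define L1_SIZE   (1u << L1_BITS)          /* entries per leaf        */
#define L0_SIZE   4096u                    /* number of leaf pointers */

typedef uint8_t p2t_type_t;                /* 0 == "no special type"  */

struct p2t {
    p2t_type_t *leaves[L0_SIZE];           /* allocated on demand     */
};

static int p2t_set(struct p2t *t, uint64_t gfn, p2t_type_t type)
{
    uint64_t l0 = gfn >> L1_BITS, l1 = gfn & (L1_SIZE - 1);

    if (l0 >= L0_SIZE)
        return -1;
    if (!t->leaves[l0]) {
        t->leaves[l0] = calloc(L1_SIZE, sizeof(p2t_type_t));
        if (!t->leaves[l0])
            return -1;
    }
    t->leaves[l0][l1] = type;
    return 0;
}

static p2t_type_t p2t_get(const struct p2t *t, uint64_t gfn)
{
    uint64_t l0 = gfn >> L1_BITS, l1 = gfn & (L1_SIZE - 1);

    if (l0 >= L0_SIZE || !t->leaves[l0])
        return 0;                          /* sparse: default type    */
    return t->leaves[l0][l1];
}

int main(void)
{
    struct p2t t = { { NULL } };

    p2t_set(&t, 0x12345, 1);               /* 1 == "write_dm", say    */
    printf("type(0x12345) = %u, type(0x99999) = %u\n",
           (unsigned)p2t_get(&t, 0x12345), (unsigned)p2t_get(&t, 0x99999));
    return 0;
}

This shows the trade-off George mentions: the non-leaf level is duplicated
and each lookup is a second sparse walk, but no hardware-visible bits are
consumed.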



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Paul Durrant
> -Original Message-
[snip]
> >>> Compare this to the downsides of the approach you're proposing:
> >>> 1. Using 40 bytes of hypervisor space per guest GPU pagetable page (as
> >>> opposed to using a bit in the existing p2m table)
> >>> 2. Walking down an RB tree with 8000 individual nodes to find out
> >>> which server to send the message to (rather than just reading the
> >>> value from the p2m table).
> >>
> >> 8K is an upper limit for the rangeset, in many cases the RB tree will
> >> not contain that many nodes.
> >>
> >>> 3. Needing to determine on a guest-by-guest basis whether to change
> the
> >> limit
> >>> 4. Needing to have an interface to make the limit even bigger, just in
> >>> case we find workloads that have even more GTTs.
> >>>
> >>
> >> Well, I have suggested in yesterday's reply. XenGT can choose not to
> >> change this limit even when workloads are getting heavy - with
> >> tradeoffs in the device model side.
> >
> > I assume this means that the emulator can 'unshadow' GTTs (I guess on an
> LRU basis) so that it can shadow new ones when the limit has been
> exhausted?
> > If so, how bad is performance likely to be if we live with a lower limit and
> take the hit of unshadowing if the guest GTTs become heavily fragmented?
> >
> Thank you, Paul.
> 
> > Well, I was told the emulator has approaches to delay the shadowing of
> the GTT till future GPU commands are submitted. By now, I'm not sure
> about the performance penalties if the limit is set too low. Although
> we are confident 8K is a secure limit, it seems still too high to be
> accepted. We will perform more experiments with this new approach to
> find a balance between the lowest limit and the XenGT performance.
> 
> So another question is, if the value of this limit really matters, will a
> lower one be more acceptable (the current 256 being not enough)?

Well, do you know conclusively that 256 is absolutely not enough, or is it just 
too low for acceptable performance?

  Paul

> 
> Thanks
> Yu


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Ian Campbell
On Wed, 2016-02-03 at 17:41 +, George Dunlap wrote:
> But of course, since they they aren't actually ranges but just gpfns,
> they're scattered randomly throughout the guest physical address
> space.

(Possibly) stupid question:

Since, AIUI, the in-guest GPU driver is XenGT-aware, could it not allocate a
contiguous range of pages at start of day to use as GPU PTs? Or even just N
contiguous regions, e.g. I think the "8K" refers to pages, which is 16 2M
allocations - a far more manageable number of ranges to track than
8192 individual pages.

Ian.



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread George Dunlap
On Thu, Feb 4, 2016 at 8:51 AM, Yu, Zhang  wrote:
>> Going forward, we probably will, at some point, need to implement a
>> parallel "p2t" structure to keep track of types -- and probably will
>> whether end up implementing 4 separate write_dm types or not (for the
>> reasons you describe).
>>
>
> Thank you, George. Could you please elaborate more about the idea of
> "p2t"?

So the p2m table is a partially-sparse structure designed to map pfns
to mfns.  At the bottom is a 64-bit entry that contains certain bits
specified by the hardware (mfn number, permissions, other features).
There are at the moment extra bits that aren't used, and in these bits
we store information about the pfn that Xen wants to know -- among
other things, the 'type' of the gpfn.

But as Andy pointed out, Intel are adding new features which take up
more of these bits; and at the same time, Xen has more features for
which using a p2m type is the most obvious solution.  So the idea
would be to have a separate structure, similar to the p2m table, but
wouldn't be used by the hardware -- it would only be used by Xen to
map pfn to type (or whatever other information it wanted about the
gpfn).  This does mean duplicating all the non-leaf nodes, as well as
doing two sparse-tree-walks rather  than just one when looking up gpfn
information.

 -George
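
For illustration, a made-up leaf-entry layout showing the split between the
hardware-defined fields and the handful of software-available bits that can
hold the type; the positions and widths below are invented and are not the
real EPT/NPT format.

#include <stdint.h>
#include <stdio.h>

/* Invented layout: permissions in the low bits, a 40-bit frame number,
 * and a 6-bit software-only "type" field in otherwise unused bits. */
#define ENTRY_PERM_MASK   0x7ULL
#define ENTRY_MFN_SHIFT   12
#define ENTRY_MFN_MASK    (0xFFFFFFFFFFULL << ENTRY_MFN_SHIFT)  /* 40 bits */
#define ENTRY_TYPE_SHIFT  58
#define ENTRY_TYPE_MASK   (0x3FULL << ENTRY_TYPE_SHIFT)         /* 6 bits  */

static uint64_t make_entry(uint64_t mfn, unsigned int perm, unsigned int type)
{
    return (perm & ENTRY_PERM_MASK) |
           ((mfn << ENTRY_MFN_SHIFT) & ENTRY_MFN_MASK) |
           (((uint64_t)type << ENTRY_TYPE_SHIFT) & ENTRY_TYPE_MASK);
}

static unsigned int entry_type(uint64_t e)
{
    return (e & ENTRY_TYPE_MASK) >> ENTRY_TYPE_SHIFT;
}

int main(void)
{
    /* Type 5 standing in for something like the write_dm type. */
    uint64_t e = make_entry(0x123456, 0x1 /* read-only */, 5);

    printf("entry %#llx has software type %u\n",
           (unsigned long long)e, entry_type(e));
    return 0;
}

Once the hardware starts defining more of those bits for its own features,
the 6-bit field shrinks, which is what motivates the separate p2t.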



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread George Dunlap
On Thu, Feb 4, 2016 at 9:38 AM, Yu, Zhang  wrote:
> On 2/4/2016 5:28 PM, Paul Durrant wrote:
>> I assume this means that the emulator can 'unshadow' GTTs (I guess on an
>> LRU basis) so that it can shadow new ones when the limit has been exhausted?
>> If so, how bad is performance likely to be if we live with a lower limit
>> and take the hit of unshadowing if the guest GTTs become heavily fragmented?
>>
> Thank you, Paul.
>
> Well, I was told the emulator have approaches to delay the shadowing of
> the GTT till future GPU commands are submitted. By now, I'm not sure
> about the performance penalties if the limit is set too low. Although
> we are confident 8K is a secure limit, it seems still too high to be
> accepted. We will perform more experiments with this new approach to
> find a balance between the lowest limit and the XenGT performance.

Just to check some of my assumptions:

I assume that unlike memory accesses, your GPU hardware cannot
'recover' from faults in the GTTs. That is, for memory, you can take a
page fault, fix up the pagetables, and then re-execute the original
instruction; but so far I haven't heard of any devices being able to
seamlessly re-execute a transaction after a fault.  Is my
understanding correct?

If that is the case, then for every top-level value (whatever the
equivalent of the CR3), you need to be able to shadow the entire GTT
tree below it, yes?  You can't use a trick that the memory shadow
pagetables can use, of unshadowing parts of the tree and reshadowing
them.

So as long as the currently-in-use GTT tree contains no more than
$LIMIT ranges, you can unshadow and reshadow; this will be slow, but
strictly speaking correct.

What do you do if the guest driver switches to a GTT such that the
entire tree takes up more than $LIMIT entries?

 -George



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Ian Campbell
On Thu, 2016-02-04 at 11:08 +, Ian Campbell wrote:
> On Thu, 2016-02-04 at 10:49 +, George Dunlap wrote:
> > On Thu, Feb 4, 2016 at 8:51 AM, Yu, Zhang 
> > wrote:
> > > > Going forward, we probably will, at some point, need to implement a
> > > > parallel "p2t" structure to keep track of types -- and probably
> > > > will
> > > > whether end up implementing 4 separate write_dm types or not (for
> > > > the
> > > > reasons you describe).
> > > > 
> > > 
> > > Thank you, George. Could you please elaborate more about the idea of
> > > "p2t"?
> > 
> > So the p2m table is a partially-sparse structure designed to map pfns
> > to mfns.  At the bottom is a 64-bit entry that contains certain bits
> > specified by the hardware (mfn number, permissions, other features).
> > There are at the moment extra bits that aren't used, and in these bits
> > we store information about the pfn that Xen wants to know -- among
> > other things, the 'type' of the gpfn.
> > 
> > But as Andy pointed out, Intel are adding new features which take up
> > more of these bits; and at the same time, Xen has more features for
> > which using a p2m type is the most obvious solution.  So the idea
> > would be to have a separate structure, similar to the p2m table, but
> > wouldn't be used by the hardware -- it would only be used by Xen to
> > map pfn to type (or whatever other information it wanted about the
> > gpfn).  This does mean duplicating all the non-leaf nodes, as well as
> > doing two sparse-tree-walks rather  than just one when looking up gpfn
> > information.
> 
> FWIW ARM already has such a structure to support xenaccess, since ARM had
> far fewer available bits to start with.
> 
> It is not quite the same as above since it is only populated with pages for
> which xenaccess is in use rather than all pages (and is only consulted if
> we have reason to believe the page might be subject to xenaccess). FWIW it
> uses Xen's radix tree stuff.

A second FWIW in case it is useful, but Linux on 32-bit ARM (older ones
with 32-bit PTEs) allocates every PT page as a pair of pages, so it gets 32
software defined bits for every h/w visible pte at pte+4K.

If you wanted to avoid the order-1 allocations which that implies then I
suppose you could chain to the "pteext" page from the struct page of the
h/w visible page.

Ian.
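
A small, self-contained toy of that page-pair trick (simplified types and
sizes; not the actual Linux code): the software word for a PTE lives at the
same offset one page further on.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

/* Order-1 allocation: h/w PTEs in page 0, software bits in page 1. */
static uint32_t *alloc_pte_page_pair(void)
{
    return calloc(2 * PAGE_SIZE, 1);
}

/* The software word sits exactly one page after its h/w PTE. */
static uint32_t *pte_to_sw(uint32_t *pte)
{
    return (uint32_t *)((char *)pte + PAGE_SIZE);
}

int main(void)
{
    uint32_t *ptes = alloc_pte_page_pair();
    if (!ptes)
        return 1;

    ptes[5] = 0xdeadb000;              /* a hardware-visible entry      */
    *pte_to_sw(&ptes[5]) = 0x1;        /* 32 software bits for that PTE */

    printf("hw=%#x sw=%#x\n", ptes[5], *pte_to_sw(&ptes[5]));
    free(ptes);
    return 0;
}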



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Zhiyuan Lv
Hi George,

On Thu, Feb 04, 2016 at 11:06:33AM +, George Dunlap wrote:
> On Thu, Feb 4, 2016 at 9:38 AM, Yu, Zhang  wrote:
> > On 2/4/2016 5:28 PM, Paul Durrant wrote:
> >> I assume this means that the emulator can 'unshadow' GTTs (I guess on an
> >> LRU basis) so that it can shadow new ones when the limit has been 
> >> exhausted?
> >> If so, how bad is performance likely to be if we live with a lower limit
> >> and take the hit of unshadowing if the guest GTTs become heavily 
> >> fragmented?
> >>
> > Thank you, Paul.
> >
> > Well, I was told the emulator has approaches to delay the shadowing of
> > the GTT till future GPU commands are submitted. By now, I'm not sure
> > about the performance penalties if the limit is set too low. Although
> > we are confident 8K is a secure limit, it seems still too high to be
> > accepted. We will perform more experiments with this new approach to
> > find a balance between the lowest limit and the XenGT performance.
> 
> Just to check some of my assumptions:
> 
> I assume that unlike memory accesses, your GPU hardware cannot
> 'recover' from faults in the GTTs. That is, for memory, you can take a
> page fault, fix up the pagetables, and then re-execute the original
> instruction; but so far I haven't heard of any devices being able to
> seamlessly re-execute a transaction after a fault.  Is my
> understanding correct?

That's correct. At least for now, we do not have GPU page faults.

> 
> If that is the case, then for every top-level value (whatever the
> equivalent of the CR3), you need to be able to shadow the entire GTT
> tree below it, yes?  You can't use a trick that the memory shadow

That's right also.

> pagetables can use, of unshadowing parts of the tree and reshadowing
> them.
> 
> So as long as the currently-in-use GTT tree contains no more than
> $LIMIT ranges, you can unshadow and reshadow; this will be slow, but
> strictly speaking correct.
> 
> What do you do if the guest driver switches to a GTT such that the
> entire tree takes up more than $LIMIT entries?

GPU has some special properties different from CPU, which make things
easier. The GPU page table is constructed by CPU and used by GPU
workloads. GPU workload itself will not change the page table.
Meanwhile, GPU workload submission, in virtualized environment, is
controlled by our device model. Then we can reshadow the whole table
every time before we submit workload. That can reduce the total number
required for write protection but with performance impact, because GPU
has to have more idle time waiting for CPU. Hope the info helps.
Thanks!

Regards,
-Zhiyuan

> 
>  -George



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Jan Beulich
>>> On 04.02.16 at 14:47,  wrote:
>> From: Ian Jackson [mailto:ian.jack...@eu.citrix.com]
>> Sent: 04 February 2016 13:34
>>  * Is it possible for libxl to somehow tell from the rest of the
>>configuration that this larger limit should be applied ?
>> 
>>AFAICT there is nothing in libxl directly involving vgpu.  How can
>>libxl be used to create a guest with vgpu enabled ?  I had thought
>>that this was done merely with the existing PCI passthrough
>>configuration, but it now seems that somehow a second device model
>>would have to be started.  libxl doesn't have code to do that.
>> 
> 
> AIUI if the setting of the increased limit is tied to provisioning a gvt-g
> instance for a VM then I don't think there needs to be extra information in the VM
> config. This seems like the most sensible thing to do.

I don't understand this: For one, it's still unclear to me on what basis
it would be known that a given VM is a "gvt-g instance". And even if
that's indeed derivable from something, the uncertainty about a
workable upper bound on the number of WP ranges would still seem
to demand the value to be specifiable separately...

>> I now understand that these mmio ranges are created by the device
>> model.  Of course the device model needs to be able to create mmio
>> ranges for the guest.  And since they consume hypervisor resources,
>> the number of these must be limited (device models not necessarily
>> being trusted).
> 
> ...but I think there is still an open question as to whether the toolstack 
> is allowed to set that limit for a VM or not. IMO the toolstack should be 
> allowed to set that limit when creating a domain.

... as you indicate here.

Jan




Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Ian Jackson
Paul Durrant writes ("RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter 
max_wp_ram_ranges."):
> There are patches in the XenGT xen repo which add extra parameters
> into the VM config to allow libxl to provision a gvt-g instance (of
> which there are a finite number per GPU) for a VM. The increased
> limit could be applied when doing so and it may be feasible to
> determine (maybe from the version of the GPU h/w) what a reasonable
> limit is.

Right.  So in that case there would be no need for this user-visible
knob (along with its sadly not very helpful documentation) to be
exposed in the libxl stable API.

Ian.



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 04 February 2016 14:13
> To: Paul Durrant
> Cc: Andrew Cooper; George Dunlap; Ian Campbell; Ian Jackson; Stefano
> Stabellini; Wei Liu; Kevin Tian; zhiyuan...@intel.com; Zhang Yu; xen-
> de...@lists.xen.org; Keir (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> >>> On 04.02.16 at 14:47, <paul.durr...@citrix.com> wrote:
> >> From: Ian Jackson [mailto:ian.jack...@eu.citrix.com]
> >> Sent: 04 February 2016 13:34
> >>  * Is it possible for libxl to somehow tell from the rest of the
> >>configuration that this larger limit should be applied ?
> >>
> >>AFAICT there is nothing in libxl directly involving vgpu.  How can
> >>libxl be used to create a guest with vgpu enabled ?  I had thought
> >>that this was done merely with the existing PCI passthrough
> >>configuration, but it now seems that somehow a second device model
> >>would have to be started.  libxl doesn't have code to do that.
> >>
> >
> > AIUI if the setting of the increased limit is tied to provisioning a gvt-g
> > instance for a VM then I don't think there needs to be extra information in the
> > VM config. This seems like the most sensible thing to do.
> 
> I don't understand this: For one, it's still unclear to me on what basis
> it would be known that a given VM is a "gvt-g instance". And even if
> that's indeed derivable from something, the uncertainty about a
> workable upper bound on the number of WP ranges would still seem
> to demand the value to be specifiable separately...

There are patches in the XenGT xen repo which add extra parameters into the VM 
config to allow libxl to provision a gvt-g instance (of which there are a 
finite number per GPU) for a VM. The increased limit could be applied when 
doing so and it may be feasible to determine (maybe from the version of the GPU 
h/w) what a reasonable limit is.

  Paul

> 
> >> I now understand that these mmio ranges are created by the device
> >> model.  Of course the device model needs to be able to create mmio
> >> ranges for the guest.  And since they consume hypervisor resources,
> >> the number of these must be limited (device models not necessarily
> >> being trusted).
> >
> > ...but I think there is still an open question as to whether the toolstack
> > is allowed to set that limit for a VM or not. IMO the toolstack should be
> > allowed to set that limit when creating a domain.
> 
> ... as you indicate here.
> 
> Jan




Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Tian, Kevin
> From: Lv, Zhiyuan
> Sent: Friday, February 05, 2016 10:01 AM
> 
> Hi George,
> 
> On Thu, Feb 04, 2016 at 11:06:33AM +, George Dunlap wrote:
> > On Thu, Feb 4, 2016 at 9:38 AM, Yu, Zhang  
> > wrote:
> > > On 2/4/2016 5:28 PM, Paul Durrant wrote:
> > >> I assume this means that the emulator can 'unshadow' GTTs (I guess on an
> > >> LRU basis) so that it can shadow new ones when the limit has been 
> > >> exhausted?
> > >> If so, how bad is performance likely to be if we live with a lower limit
> > >> and take the hit of unshadowing if the guest GTTs become heavily 
> > >> fragmented?
> > >>
> > > Thank you, Paul.
> > >
> > > Well, I was told the emulator has approaches to delay the shadowing of
> > > the GTT till future GPU commands are submitted. By now, I'm not sure
> > > about the performance penalties if the limit is set too low. Although
> > > we are confident 8K is a secure limit, it seems still too high to be
> > > accepted. We will perform more experiments with this new approach to
> > > find a balance between the lowest limit and the XenGT performance.
> >
> > Just to check some of my assumptions:
> >
> > I assume that unlike memory accesses, your GPU hardware cannot
> > 'recover' from faults in the GTTs. That is, for memory, you can take a
> > page fault, fix up the pagetables, and then re-execute the original
> > instruction; but so far I haven't heard of any devices being able to
> > seamlessly re-execute a transaction after a fault.  Is my
> > understanding correct?
> 
> That's correct. At least for now, we do not have GPU page fault.
> 
> >
> > If that is the case, then for every top-level value (whatever the
> > equivalent of the CR3), you need to be able to shadow the entire GTT
> > tree below it, yes?  You can't use a trick that the memory shadow
> 
> That's right also.
> 
> > pagetables can use, of unshadowing parts of the tree and reshadowing
> > them.
> >
> > So as long as the currently-in-use GTT tree contains no more than
> > $LIMIT ranges, you can unshadow and reshadow; this will be slow, but
> > strictly speaking correct.
> >
> > What do you do if the guest driver switches to a GTT such that the
> > entire tree takes up more than $LIMIT entries?
> 
> The GPU has some special properties different from the CPU, which make things
> easier. The GPU page table is constructed by the CPU and used by GPU
> workloads. The GPU workload itself will not change the page table.
> Meanwhile, GPU workload submission, in a virtualized environment, is
> controlled by our device model. We can therefore reshadow the whole table
> every time before we submit a workload. That can reduce the total number
> of pages requiring write protection, but with a performance impact, because
> the GPU has to spend more idle time waiting for the CPU. Hope the info helps.
> Thanks!
> 

Put another way, it is fully under mediation when a GPU page table
(GTT) will be referenced by the GPU, so there is plenty of room to
optimize the existing shadowing (which always shadows all recognized GPU
page tables), e.g. by shadowing only the active one when a VM is scheduled
in. It's a performance matter, not a correctness issue.

This is why Yu asked earlier whether we can just set a default
limit which is good for the majority of use cases, while extending our
device model to drop/recreate some shadow tables when the limit
is hit. I think this matches how today's CPU shadow page tables are
implemented, which also have a limit on how many shadow pages
are allowed per VM.
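
(For illustration, the drop/recreate policy described above could look
roughly like the following. This is a minimal sketch only, assuming a
device model that keeps its shadowed GTT pages in an MRU-ordered array;
it is not actual XenGT code, and a real implementation would use a
proper LRU list or hash rather than a linear scan.)

    /*
     * Keep at most SHADOW_LIMIT guest GPU page tables shadowed (i.e.
     * write-protected); drop the least-recently-used one when the
     * budget is exhausted.  Dropping is safe: the table is simply
     * re-shadowed the next time a submitted workload references it,
     * at some performance cost.
     */
    #include <stdint.h>
    #include <string.h>

    #define SHADOW_LIMIT 8192          /* stand-in for the wp-ram budget */

    struct shadow_pool {
        uint64_t gfn[SHADOW_LIMIT];    /* shadowed GTT pages, MRU first */
        unsigned int nr;
    };

    /* Shadow (and write-protect) gfn, evicting the LRU entry if needed. */
    static void shadow_gtt_page(struct shadow_pool *p, uint64_t gfn)
    {
        unsigned int i;

        for (i = 0; i < p->nr; i++)
            if (p->gfn[i] == gfn)
                break;                 /* already shadowed */

        if (i == p->nr) {              /* not shadowed yet */
            if (p->nr == SHADOW_LIMIT)
                i = p->nr - 1;         /* evict/unshadow the LRU slot */
            else
                i = p->nr++;
            /* here: write-protect gfn and build its shadow copy */
        }

        memmove(&p->gfn[1], &p->gfn[0], i * sizeof(p->gfn[0]));
        p->gfn[0] = gfn;               /* move to the MRU position */
    }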

Thanks
Kevin



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Tian, Kevin
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: Friday, February 05, 2016 1:12 AM
> 
> On 04/02/16 14:08, Jan Beulich wrote:
> >>>> On 04.02.16 at 14:33, <ian.jack...@eu.citrix.com> wrote:
> >> Jan Beulich writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce 
> >> parameter
> >> max_wp_ram_ranges."):
> >>> On 04.02.16 at 10:38, <yu.c.zh...@linux.intel.com> wrote:
> >>>> So another question is, if value of this limit really matters, will a
> >>>> lower one be more acceptable(the current 256 being not enough)?
> >>>
> >>> If you've carefully read George's replies, [...]
> >>
> >> Thanks to George for the very clear explanation, and also to him for
> >> an illuminating in-person discussion.
> >>
> >> It is disturbing that as a result of me as a tools maintainer asking
> >> questions about what seems to me to be a troublesome user-visible
> >> control setting in libxl, we are now apparently revisiting lower
> >> layers of the hypervisor design, which have already been committed.
> >>
> >> While I find George's line of argument convincing, neither I nor
> >> George are maintainers of the relevant hypervisor code.  I am not
> >> going to insist that anything in the hypervisor is done different and
> >> am not trying to use my tools maintainer position to that end.
> >>
> >> Clearly there has been a failure of our workflow to consider and
> >> review everything properly together.  But given where we are now, I
> >> think that this discussion about hypervisor internals is probably a
> >> distraction.
> >
> > While I recall George having made that alternative suggestion,
> > both Yu and Paul having reservations against it made me not
> > insist on that alternative. Instead I've been trying to limit some
> > of the bad effects that the variant originally proposed brought
> > with it. Clearly, with the more detailed reply George has now
> > given (involving areas where he is the maintainer for), I should
> > have been more demanding towards the exploration of that
> > alternative. That's clearly unfortunate, and I apologize for that,
> > but such things happen.
> >
> > As to one of the patches having already been committed - I'm not
> > worried about that at all. We can always revert, that's why the
> > thing is called "unstable".
> 
> It looks like I should have been more careful to catch up on the current
> state of things before I started arguing again -- please accept my
> apologies.

Thanks George for your careful thinking.

> 
> I see that patch 2/3 addresses the gpfn/io question in the commit
> message by saying, "Previously, a new hypercall or subop was suggested
> to map write-protected pages into ioreq server. However, it turned out
> handler of this new hypercall would be almost the same with the existing
> pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a
> type parameter in this hypercall. So no new hypercall defined, only a
> new type is introduced."
> 
> And I see that 2/3 internally separates the WP_RAM type into a separate
> rangeset, whose size can be adjusted separately.
> 
> This addresses my complaint about the interface using gpfns rather than
> MMIO ranges as an interface (somewhat anyway).  Sorry for not
> acknowledging this at first.
> 
> The question of the internal implementation -- whether to use RB tree
> rangesets, or radix trees (as apparently ARM memaccess does) or p2m
> types -- is an internal implementation question.  I think p2m types is
> long-term the best way to go, but it won't hurt to have the current
> implementation checked in, as long as it doesn't have any impacts on the
> stable interface.

I'm still trying to understand your suggestion vs. this one. Today we
already have a p2m_mmio_write_dm type, and any
write fault hitting that type will be delivered to an ioreq server. The next
open question is how an ioreq server knows whether it should handle a given
request or not, which is why tracking structures (either RB tree or radix tree)
are created to maintain that information. This is under the assumption
that multiple ioreq servers co-exist, so a check across all ioreq servers
is required to identify the right target. And multiple ioreq servers are a
real case in XenGT, because our vGPU device model is in the kernel, as
part of the Intel i915 graphics driver. So at least two ioreq servers already
exist, one routing to XenGT in Dom0 kernel space and the other
to the default QEMU in Dom0 user space.
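
(For illustration, the per-server lookup described above has roughly the
following shape: on a write fault to a write-protected page, loop over
the ioreq servers and ask each one's registered WP_RAM ranges whether
they claim the faulting gfn. This is a simplified sketch, not the actual
Xen code; in Xen the ranges live in per-server rangesets backed by an RB
tree rather than the flat arrays used here.)

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct range { uint64_t start, end; };          /* inclusive gfn range */

    struct ioreq_server {
        const char *name;                           /* e.g. "qemu", "xengt" */
        const struct range *wp_ram;                 /* registered WP_RAM ranges */
        size_t nr_wp_ram;
    };

    static bool claims_gfn(const struct ioreq_server *s, uint64_t gfn)
    {
        for (size_t i = 0; i < s->nr_wp_ram; i++)   /* RB-tree walk in reality */
            if (gfn >= s->wp_ram[i].start && gfn <= s->wp_ram[i].end)
                return true;
        return false;
    }

    /* Pick the ioreq server whose WP_RAM ranges contain this gfn. */
    static const struct ioreq_server *
    select_server(const struct ioreq_server *srv, size_t nr, uint64_t gfn)
    {
        for (size_t i = 0; i < nr; i++)
            if (claims_gfn(&srv[i], gfn))
                return &srv[i];
        return NULL;                                /* nobody claimed it */
    }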

In your long-term approach with p2m types, looks you are

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Tian, Kevin
> From: Ian Campbell [mailto:ian.campb...@citrix.com]
> Sent: Thursday, February 04, 2016 6:06 PM
> 
> On Wed, 2016-02-03 at 17:41 +, George Dunlap wrote:
> > But of course, since they they aren't actually ranges but just gpfns,
> > they're scattered randomly throughout the guest physical address
> > space.
> 
> (Possibly) stupid question:
> 
> Since, AIUI, the in-guest GPU driver is XenGT aware could it not allocate a
> contiguous range of pages at start of day to use as GPU PTs? Or even just N
> contiguous regions, e.g. I think the "8K" refers to pages, which is 16 2MB
> allocations, which is a far more manageable number of ranges to track than
> 8192 individual pages.
> 

We add XenGT awareness to the guest driver only where strictly required (e.g.
to handle address space ballooning due to graphics memory partitioning),
which only impacts hardware-specific initialization code (think of a vgpu as
a new SKU).

However, we're hesitant to touch other general driver code, such as the
allocation policy you mentioned earlier. Changing that part specifically
for one SKU needs a very strong reason to convince the driver maintainer.

And we cannot assume the above allocation policy can always be met.

Thanks
Kevin


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Tian, Kevin
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: Thursday, February 04, 2016 11:51 PM
> 
> > -Original Message-
> > From: Ian Jackson [mailto:ian.jack...@eu.citrix.com]
> > Sent: 04 February 2016 15:06
> > To: Paul Durrant
> > Cc: Jan Beulich; Andrew Cooper; George Dunlap; Ian Campbell; Stefano
> > Stabellini; Wei Liu; Kevin Tian; zhiyuan...@intel.com; Zhang Yu; xen-
> > de...@lists.xen.org; Keir (Xen.org)
> > Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> > max_wp_ram_ranges.
> >
> > Paul Durrant writes ("RE: [Xen-devel] [PATCH v3 3/3] tools: introduce
> > parameter max_wp_ram_ranges."):
> > > There are patches in the XenGT xen repo which add extra parameters
> > > into the VM config to allow libxl to provision a gvt-g instance (of
> > > which there are a finite number per GPU) for a VM. The increased
> > > limit could be applied when doing so and it may be feasible to
> > > determine (maybe from the version of the GPU h/w) what a reasonable
> > > limit is.
> >
> > Right.  So in that case there would be no need for this user-visible
> > knob (along with its sadly not very helpful documentation) to be
> > exposed in the libxl stable API.
> >
> 
> No, I really don't think that should be necessary. Libxl merely needs to 
> configure a limit in
> the hypervisor appropriate to the gvt-g instance that is provisioned.
> 

Agree.

Thanks
Kevin



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, February 04, 2016 10:13 PM
> 
> >>> On 04.02.16 at 14:47,  wrote:
> >> From: Ian Jackson [mailto:ian.jack...@eu.citrix.com]
> >> Sent: 04 February 2016 13:34
> >>  * Is it possible for libxl to somehow tell from the rest of the
> >>configuration that this larger limit should be applied ?
> >>
> >>AFAICT there is nothing in libxl directly involving vgpu.  How can
> >>libxl be used to create a guest with vgpu enabled ?  I had thought
> >>that this was done merely with the existing PCI passthrough
> >>configuration, but it now seems that somehow a second device model
> >>would have to be started.  libxl doesn't have code to do that.
> >>
> >
> > AIUI if the setting of the increased limit is tied to provisioning a gvt-g
> > instance for a VM then I don't think there needs to be extra information in the VM
> > config. This seems like the most sensible thing to do.
> 
> I don't understand this: For one, it's still unclear to me on what basis
> it would be known that a given VM is a "gvt-g instance". And even if
> that's indeed derivable from something, the uncertainty about a
> workable upper bound on the number of WP ranges would still seem
> to demand the value to be specifiable separately...
> 

We'll invent a different parameter, e.g. gvt-g, separate from the existing
passthrough one. So, just for this question, the toolstack can know whether a
VM is provisioned with a vgpu based on that parameter in the config file.

Thanks
Kevin



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Paul Durrant
> -Original Message-
> From: Yu, Zhang [mailto:yu.c.zh...@linux.intel.com]
> Sent: 04 February 2016 08:51
> To: George Dunlap; Ian Jackson
> Cc: Paul Durrant; Kevin Tian; Wei Liu; Ian Campbell; Andrew Cooper; xen-
> de...@lists.xen.org; Stefano Stabellini; zhiyuan...@intel.com; Jan Beulich;
> Keir (Xen.org)
> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> 
> 
> On 2/4/2016 2:21 AM, George Dunlap wrote:
> > On Wed, Feb 3, 2016 at 5:41 PM, George Dunlap
> > <george.dun...@eu.citrix.com> wrote:
> >> I think at some point I suggested an alternate design based on marking
> >> such gpfns with a special p2m type; I can't remember if that
> >> suggestion was actually addressed or not.
> >
> > FWIW, the thread where I suggested using p2m types was in response to
> >
> > <1436163912-1506-2-git-send-email-yu.c.zh...@linux.intel.com>
> >
> > Looking through it again, the main objection Paul gave[1]  was:
> >
> > "And it's the assertion that use of write_dm will only be relevant to
> > gfns, and that all such notifications only need go to a single ioreq
> > server, that I have a problem with. Whilst the use of io ranges to
> > track gfn updates is, I agree, not ideal I think the overloading of
> > write_dm is not a step in the right direction."
> >
> > Two issues raised here, about using only p2m types to implement
> write_dm:
> > 1. More than one ioreq server may want to use the write_dm functionality
> > 2. ioreq servers may want to use write_dm for things other than individual
> gpfns
> >
> > My answer to #1 was:
> > 1. At the moment, we only need to support a single ioreq server using
> write_dm
> > 2. It's not technically difficult to extend the number of servers
> > supported to something sensible, like 4 (using 4 different write_dm
> > p2m types)
> > 3. The interface can be designed such that we can extend support to
> > multiple servers when we need to.
> >
> > My answer to #2 was that there's no reason why using write_dm could be
> > used for both individual gpfns and ranges; there's no reason the
> > interface can't take a "start" and "count" argument, even if for the
> > time being "count" is almost always going to be 1.
> >
> 
> Well, talking about "the 'count' always going to be 1": I doubt that. :)
> Statistics in XenGT show that GPU page tables are very likely to
> be allocated in contiguous gpfns.
> 
> > Compare this to the downsides of the approach you're proposing:
> > 1. Using 40 bytes of hypervisor space per guest GPU pagetable page (as
> > opposed to using a bit in the existing p2m table)
> > 2. Walking down an RB tree with 8000 individual nodes to find out
> > which server to send the message to (rather than just reading the
> > value from the p2m table).
> 
> 8K is an upper limit for the rangeset, in many cases the RB tree will
> not contain that many nodes.
> 
> > 3. Needing to determine on a guest-by-guest basis whether to change the
> limit
> > 4. Needing to have an interface to make the limit even bigger, just in
> > case we find workloads that have even more GTTs.
> >
> 
> Well, as I suggested in yesterday's reply, XenGT can choose not to
> change this limit even when workloads are getting heavy - with
> tradeoffs on the device model side.

I assume this means that the emulator can 'unshadow' GTTs (I guess on an LRU 
basis) so that it can shadow new ones when the limit has been exhausted?
If so, how bad is performance likely to be if we live with a lower limit and 
take the hit of unshadowing if the guest GTTs become heavily fragmented?

  Paul

> 
> > I really don't understand where you're coming from on this.  The
> > approach you've chosen looks to me to be slower, more difficult to
> > implement, and more complicated; and it's caused a lot more resistance
> > trying to get this series accepted.
> >
> 
> I agree utilizing the p2m types to do so is more efficient and quite
> intuitive. But I hesitate to occupy the software-available bits in EPT
> PTEs (as Andrew's reply notes). Although we have introduced one such type,
> we believe it can also be used for other situations in the future, not just XenGT.
> 
> Thanks
> Yu


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread George Dunlap
On 04/02/16 14:08, Jan Beulich wrote:
>>>> On 04.02.16 at 14:33, <ian.jack...@eu.citrix.com> wrote:
>> Jan Beulich writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce 
>> parameter 
>> max_wp_ram_ranges."):
>>> On 04.02.16 at 10:38, <yu.c.zh...@linux.intel.com> wrote:
>>>> So another question is, if value of this limit really matters, will a
>>>> lower one be more acceptable(the current 256 being not enough)?
>>>
>>> If you've carefully read George's replies, [...]
>>
>> Thanks to George for the very clear explanation, and also to him for
>> an illuminating in-person discussion.
>>
>> It is disturbing that as a result of me as a tools maintainer asking
>> questions about what seems to me to be a troublesome user-visible
>> control setting in libxl, we are now apparently revisiting lower
>> layers of the hypervisor design, which have already been committed.
>>
>> While I find George's line of argument convincing, neither I nor
>> George are maintainers of the relevant hypervisor code.  I am not
>> going to insist that anything in the hypervisor is done different and
>> am not trying to use my tools maintainer position to that end.
>>
>> Clearly there has been a failure of our workflow to consider and
>> review everything properly together.  But given where we are now, I
>> think that this discussion about hypervisor internals is probably a
>> distraction.
> 
> While I recall George having made that alternative suggestion,
> both Yu and Paul having reservations against it made me not
> insist on that alternative. Instead I've been trying to limit some
> of the bad effects that the variant originally proposed brought
> with it. Clearly, with the more detailed reply George has now
> given (involving areas where he is the maintainer for), I should
> have been more demanding towards the exploration of that
> alternative. That's clearly unfortunate, and I apologize for that,
> but such things happen.
> 
> As to one of the patches having already been committed - I'm not
> worried about that at all. We can always revert, that's why the
> thing is called "unstable".

It looks like I should have been more careful to catch up on the current
state of things before I started arguing again -- please accept my
apologies.

I see that patch 2/3 addresses the gpfn/io question in the commit
message by saying, "Previously, a new hypercall or subop was suggested
to map write-protected pages into ioreq server. However, it turned out
handler of this new hypercall would be almost the same with the existing
pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a
type parameter in this hypercall. So no new hypercall defined, only a
new type is introduced."

And I see that 2/3 internally separates the WP_RAM type into a separate
rangeset, whose size can be adjusted separately.

This addresses my complaint about the interface using gpfns rather than
MMIO ranges as an interface (somewhat anyway).  Sorry for not
acknowledging this at first.

The question of the internal implementation -- whether to use RB tree
rangesets, or radix trees (as apparently ARM memaccess does) or p2m
types -- is an internal implementation question.  I think p2m types is
long-term the best way to go, but it won't hurt to have the current
implementation checked in, as long as it doesn't have any impacts on the
stable interface.

At the moment, as far as I can tell, there's no way for libxl to even
run a version of qemu with XenGT enabled, so there's no real need for
libxl to be involved.

The purpose of having the limit would putatively be to prevent a guest
being able to trigger an exhaustion of hypervisor memory by inducing the
device model to mark an arbitrary number of ranges as mmio_dm.

Two angles on this.

First, assuming that limiting the number of ranges is what we want:  I'm
not really a fan of using HVM_PARAMs for this, but as long as it's not
considered a public interface (i.e., it could go away or disappear and
everything would Just Work), then I wouldn't object.

Although I would ask: would it instead be suitable for now to just set
the default limit for WP_RAM to 8192 in the hypervisor, since we do
expect it to be tracking gpfn ranges rather than IO regions?  And if we
determine in the future that more ranges are necessary, to then do the
work of moving it to using p2m types (or exposing a knob to adjust it)?
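
(For illustration, that suggestion amounts to something like the
following inside the hypervisor. The constant and type names here are
placeholders rather than the real Xen identifiers; the point is only
that WP_RAM gets its own, larger, hypervisor-chosen default while the
existing range types keep their 256-entry budget.)

    /* Sketch only, not actual Xen code. */
    enum io_range_type { RANGE_PORTIO, RANGE_MMIO, RANGE_PCI, RANGE_WP_RAM };

    #define NR_IO_RANGES_DEFAULT      256   /* unchanged for existing types */
    #define NR_WP_RAM_RANGES_DEFAULT 8192   /* gfns of shadowed GPU page tables */

    static unsigned int rangeset_limit(enum io_range_type t)
    {
        return t == RANGE_WP_RAM ? NR_WP_RAM_RANGES_DEFAULT
                                 : NR_IO_RANGES_DEFAULT;
    }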

But (and this is the other angle): is simply imposing a numerical limit
sufficient to avoid memory exhaustion? Is there a danger that, after
creating several guests such that Xen was now running very low on
memory, a guest would (purposely or not) cause memory to be
exhausted sometime further after boot, causing a system-wide DoS (or
just general lac

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Paul Durrant
> -Original Message-
> From: Ian Jackson [mailto:ian.jack...@eu.citrix.com]
> Sent: 04 February 2016 13:34
> To: Jan Beulich
> Cc: Zhang Yu; Andrew Cooper; George Dunlap; Ian Campbell; Paul Durrant;
> Stefano Stabellini; Wei Liu; Kevin Tian; zhiyuan...@intel.com; xen-
> de...@lists.xen.org; Keir (Xen.org)
> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> Jan Beulich writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce
> parameter max_wp_ram_ranges."):
> > On 04.02.16 at 10:38, <yu.c.zh...@linux.intel.com> wrote:
> > > So another question is, if value of this limit really matters, will a
> > > lower one be more acceptable(the current 256 being not enough)?
> >
> > If you've carefully read George's replies, [...]
> 
> Thanks to George for the very clear explanation, and also to him for
> an illuminating in-person discussion.
> 
> It is disturbing that as a result of me as a tools maintainer asking
> questions about what seems to me to be a troublesome user-visible
> control setting in libxl, we are now apparently revisiting lower
> layers of the hypervisor design, which have already been committed.
> 
> While I find George's line of argument convincing, neither I nor
> George are maintainers of the relevant hypervisor code.  I am not
> going to insist that anything in the hypervisor is done different and
> am not trying to use my tools maintainer position to that end.
> 
> Clearly there has been a failure of our workflow to consider and
> review everything properly together.  But given where we are now, I
> think that this discussion about hypervisor internals is probably a
> distraction.
> 
> 
> Let me pose again some questions that I still don't have clear answers
> to:
> 
>  * Is it possible for libxl to somehow tell from the rest of the
>configuration that this larger limit should be applied ?
> 
>AFAICT there is nothing in libxl directly involving vgpu.  How can
>libxl be used to create a guest with vgpu enabled ?  I had thought
>that this was done merely with the existing PCI passthrough
>configuration, but it now seems that somehow a second device model
>would have to be started.  libxl doesn't have code to do that.
> 

AIUI if the setting of the increased limit is tied to provisioning a gvt-g 
instance for a VM then I don't think there needs to be extra information in the VM 
config. This seems like the most sensible thing to do.

>  * In the configurations where a larger number is needed, what larger
>limit is appropriate ?  How should it be calculated ?
> 
>AFAICT from the discussion, 8192 is a reasonable bet.  Is everyone
>happy with it.
> 
> Ian.
> 
> PS: Earlier I asked:
> 
>  * How do we know that this does not itself give an opportunity for
>hypervisor resource exhaustion attacks by guests ?  (Note: if it
>_does_ give such an opportunity, this should be mentioned more
>clearly in the documentation.)
> 
>  * If we are talking about mmio ranges for ioreq servers, why do
>guests which do not use this feature have the ability to create
>them at all ?
> 
> I now understand that these mmio ranges are created by the device
> model.  Of course the device model needs to be able to create mmio
> ranges for the guest.  And since they consume hypervisor resources,
> the number of these must be limited (device models not necessarily
> being trusted).

...but I think there is still an open question as to whether the toolstack is 
allowed to set that limit for a VM or not. IMO the toolstack should be allowed 
to set that limit when creating a domain.

  Paul



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Jan Beulich
>>> On 04.02.16 at 14:33, <ian.jack...@eu.citrix.com> wrote:
> Jan Beulich writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce 
> parameter 
> max_wp_ram_ranges."):
>> On 04.02.16 at 10:38, <yu.c.zh...@linux.intel.com> wrote:
>> > So another question is, if value of this limit really matters, will a
>> > lower one be more acceptable(the current 256 being not enough)?
>> 
>> If you've carefully read George's replies, [...]
> 
> Thanks to George for the very clear explanation, and also to him for
> an illuminating in-person discussion.
> 
> It is disturbing that as a result of me as a tools maintainer asking
> questions about what seems to me to be a troublesome user-visible
> control setting in libxl, we are now apparently revisiting lower
> layers of the hypervisor design, which have already been committed.
> 
> While I find George's line of argument convincing, neither I nor
> George are maintainers of the relevant hypervisor code.  I am not
> going to insist that anything in the hypervisor is done different and
> am not trying to use my tools maintainer position to that end.
> 
> Clearly there has been a failure of our workflow to consider and
> review everything properly together.  But given where we are now, I
> think that this discussion about hypervisor internals is probably a
> distraction.

While I recall George having made that alternative suggestion,
both Yu and Paul having reservations against it made me not
insist on that alternative. Instead I've been trying to limit some
of the bad effects that the variant originally proposed brought
with it. Clearly, with the more detailed reply George has now
given (involving areas where he is the maintainer for), I should
have been more demanding towards the exploration of that
alternative. That's clearly unfortunate, and I apologize for that,
but such things happen.

As to one of the patches having already been committed - I'm not
worried about that at all. We can always revert, that's why the
thing is called "unstable".

Everything below here I think was meant to be mainly directed
at Yu instead of me.

> Let me pose again some questions that I still don't have clear answers
> to:
> 
>  * Is it possible for libxl to somehow tell from the rest of the
>configuration that this larger limit should be applied ?
> 
>AFAICT there is nothing in libxl directly involving vgpu.  How can
>libxl be used to create a guest with vgpu enabled ?  I had thought
>that this was done merely with the existing PCI passthrough
>configuration, but it now seems that somehow a second device model
>would have to be started.  libxl doesn't have code to do that.
> 
>  * In the configurations where a larger number is needed, what larger
>limit is appropriate ?  How should it be calculated ?
> 
>AFAICT from the discussion, 8192 is a reasonable bet.  Is everyone
>happy with it.
> 
> Ian.
> 
> PS: Earlier I asked:
> 
>  * How do we know that this does not itself give an opportunity for
>hypervisor resource exhaustion attacks by guests ?  (Note: if it
>_does_ give such an opportunity, this should be mentioned more
>clearly in the documentation.)
> 
>  * If we are talking about mmio ranges for ioreq servers, why do
>guests which do not use this feature have the ability to create
>them at all ?
> 
> I now understand that these mmio ranges are created by the device
> model.  Of course the device model needs to be able to create mmio
> ranges for the guest.  And since they consume hypervisor resources,
> the number of these must be limited (device models not necessarily
> being trusted).

Which is why I have been so hesitant to accept these patches.

Jan




Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Ian Jackson
Jan Beulich writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter 
max_wp_ram_ranges."):
> On 04.02.16 at 10:38, <yu.c.zh...@linux.intel.com> wrote:
> > So another question is, if value of this limit really matters, will a
> > lower one be more acceptable(the current 256 being not enough)?
> 
> If you've carefully read George's replies, [...]

Thanks to George for the very clear explanation, and also to him for
an illuminating in-person discussion.

It is disturbing that as a result of me as a tools maintainer asking
questions about what seems to me to be a troublesome user-visible
control setting in libxl, we are now apparently revisiting lower
layers of the hypervisor design, which have already been committed.

While I find George's line of argument convincing, neither I nor
George are maintainers of the relevant hypervisor code.  I am not
going to insist that anything in the hypervisor is done different and
am not trying to use my tools maintainer position to that end.

Clearly there has been a failure of our workflow to consider and
review everything properly together.  But given where we are now, I
think that this discussion about hypervisor internals is probably a
distraction.


Let me pose again some questions that I still don't have clear answers
to:

 * Is it possible for libxl to somehow tell from the rest of the
   configuration that this larger limit should be applied ?

   AFAICT there is nothing in libxl directly involving vgpu.  How can
   libxl be used to create a guest with vgpu enabled ?  I had thought
   that this was done merely with the existing PCI passthrough
   configuration, but it now seems that somehow a second device model
   would have to be started.  libxl doesn't have code to do that.

 * In the configurations where a larger number is needed, what larger
   limit is appropriate ?  How should it be calculated ?

   AFAICT from the discussion, 8192 is a reasonable bet.  Is everyone
   happy with it?

Ian.

PS: Earlier I asked:

 * How do we know that this does not itself give an opportunity for
   hypervisor resource exhaustion attacks by guests ?  (Note: if it
   _does_ give such an opportunity, this should be mentioned more
   clearly in the documentation.)

 * If we are talking about mmio ranges for ioreq servers, why do
   guests which do not use this feature have the ability to create
   them at all ?

I now understand that these mmio ranges are created by the device
model.  Of course the device model needs to be able to create mmio
ranges for the guest.  And since they consume hypervisor resources,
the number of these must be limited (device models not necessarily
being trusted).



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Paul Durrant
> -Original Message-
> From: Ian Jackson [mailto:ian.jack...@eu.citrix.com]
> Sent: 04 February 2016 15:06
> To: Paul Durrant
> Cc: Jan Beulich; Andrew Cooper; George Dunlap; Ian Campbell; Stefano
> Stabellini; Wei Liu; Kevin Tian; zhiyuan...@intel.com; Zhang Yu; xen-
> de...@lists.xen.org; Keir (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> Paul Durrant writes ("RE: [Xen-devel] [PATCH v3 3/3] tools: introduce
> parameter max_wp_ram_ranges."):
> > There are patches in the XenGT xen repo which add extra parameters
> > into the VM config to allow libxl to provision a gvt-g instance (of
> > which there are a finite number per GPU) for a VM. The increased
> > limit could be applied when doing so and it may be feasible to
> > determine (maybe from the version of the GPU h/w) what a reasonable
> > limit is.
> 
> Right.  So in that case there would be no need for this user-visible
> knob (along with its sadly not very helpful documentation) to be
> exposed in the libxl stable API.
> 

No, I really don't think that should be necessary. Libxl merely needs to 
configure a limit in the hypervisor appropriate to the gvt-g instance that is 
provisioned.

  Paul

> Ian.



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Yu, Zhang



On 2/4/2016 3:12 AM, George Dunlap wrote:

On 03/02/16 18:39, Andrew Cooper wrote:

On 03/02/16 18:21, George Dunlap wrote:

2. It's not technically difficult to extend the number of servers
supported to something sensible, like 4 (using 4 different write_dm
p2m types)


While technically true, spare bits in the pagetable entries are at a
premium, and steadily decreasing as Intel are introducing new features.

We have 16 current p2m types, and a finite upper bound of 6 bits of p2m
type space, already with a push to reduce this number.

While introducing 1 new p2m type for this purpose might be an acceptable
tradeoff, a using a p2m type per ioreq server is not IMO.


It is true that we don't have a ton of elbow room to grow at the moment.

But we actually already have a single p2m type -- mmio_write_dm -- that
as far as I know is only being used by XenGT.  We don't actually need to
add any new p2m types for my initial proposal.

Going forward, we probably will, at some point, need to implement a
parallel "p2t" structure to keep track of types -- and probably will
whether end up implementing 4 separate write_dm types or not (for the
reasons you describe).



Thank you, George. Could you please elaborate more about the idea of
"p2t"?

Yu
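
(Purely as an illustration of the "p2t" idea mentioned above: a side
structure, separate from the p2m, that records a wider per-gfn type so
that scarce p2m-entry bits are not consumed. The sketch below uses a
flat byte array for simplicity; it is a hypothetical shape, not a Xen
design, and a real implementation would likely use a sparse radix
tree with per-domain allocation.)

    #include <stdint.h>
    #include <stdlib.h>

    typedef uint8_t p2t_type_t;              /* room for 256 types/servers */

    struct p2t {
        p2t_type_t *type;                    /* indexed by gfn */
        uint64_t max_gfn;
    };

    static int p2t_init(struct p2t *t, uint64_t max_gfn)
    {
        t->type = calloc(max_gfn + 1, sizeof(*t->type));
        t->max_gfn = max_gfn;
        return t->type ? 0 : -1;
    }

    static void p2t_set(struct p2t *t, uint64_t gfn, p2t_type_t ty)
    {
        if (gfn <= t->max_gfn)
            t->type[gfn] = ty;
    }

    static p2t_type_t p2t_get(const struct p2t *t, uint64_t gfn)
    {
        return gfn <= t->max_gfn ? t->type[gfn] : 0;
    }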



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Yu, Zhang

On 2/4/2016 1:50 AM, George Dunlap wrote:

On Wed, Feb 3, 2016 at 3:10 PM, Paul Durrant  wrote:

  * Is it possible for libxl to somehow tell from the rest of the
configuration that this larger limit should be applied ?



If a XenGT-enabled VM is provisioned through libxl then some larger limit is 
likely to be required. One of the issues is that it is impossible (or at least 
difficult) to know how many GTTs are going to need to be shadowed.


By GTT, you mean the GPU pagetables I assume?  So you're talking about


Yes, GTT is "Graphics Translation Table" for short.


how large this value should be made, not whether the
heuristically-chosen larger value should be used.  libxl should be
able to tell that XenGT is enabled, I assume, so it should be able to
automatically bump this to 8k if necessary, right?



Yes.


But I think you'd still need a parameter that you could tweak if it
turned out that 8k wasn't enough for a particular workload, right?



Well, not exactly. For XenGT, the latest suggestion is that even when 8K
is not enough, we will not extend this limit anymore. But when
introducing this parameter, I had thought it might also be helpful for
other device virtualization cases which would like to use the mediated
passthrough idea.


  * If we are talking about mmio ranges for ioreq servers, why do
guests which do not use this feature have the ability to create
them at all ?


It's not the guest that directly creates the ranges, it's the emulator. 
Normally device emulation would require a relatively small number of MMIO 
ranges and a total number that cannot be influenced by the guest itself. In 
this case though, as I said above, the number *can* be influenced by the guest 
(although it is still the emulator which actually causes the ranges to be 
created).


Just to make this clear: The guest chooses how many gpfns are used in
the GPU pagetables; for each additional gpfn in the guest pagetable,
qemu / xen have the option of either marking it to be emulated (at the
moment, by marking it as a one-page "MMIO region") or crashing the
guest.



Well, kind of. The backend device model in dom0 (not qemu) makes the
decision whether or not this page is to be emulated.



(A background problem I have is that this thread is full of product
name jargon and assumes a lot of background knowledge of the
implementation of these features - background knowledge which I lack
and which isn't in these patches.  If someone could point me at a
quick summary of what `GVT-g' and `GVT-d' are that might help.)



GVT-d is a name applied to PCI passthrough of an Intel GPU. GVT-g is a name 
applied to Intel GPU virtualization, which makes use of an emulator to mediate 
guest access to the real GPU so that it is possible to share the resources 
securely.


And GTT are the GPU equivalent of page tables?


Yes.

Here let me try to give a brief introduction to the jargon:
* Intel GVT-d: an Intel graphics virtualization solution which dedicates
one physical GPU to a guest exclusively.

* Intel GVT-g: an Intel graphics virtualization solution with mediated
pass-through support. One physical GPU can be shared by multiple guests.
GPU performance-critical resources are partitioned among and passed
through to different vGPUs; other GPU resources are trapped and
emulated by the device model.

* XenGT: the code name of Intel GVT-g for Xen.
This patch series provides features required by XenGT.

* vGPU: a virtual GPU presented to guests.

* GTT: abbreviation for graphics translation table, a page table
structure which translates graphics memory addresses to physical
ones. For a vGPU, the PTEs in its GTT are GPFNs, which creates the need
for the device model to construct a set of shadow GPU page tables
(a rough sketch of this shadowing follows below).
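
(A rough sketch of the shadowing mentioned in the GTT entry above,
purely for illustration and not XenGT code: when a write to a
write-protected guest GTT page reaches the device model, the updated
entry is mirrored into the shadow table that the real GPU walks, with
the gpfn translated to a machine frame on the way. The PTE layout used
here, address above bit 12 and attributes in the low bits, is a
simplification, not the real hardware format.)

    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t gfn_t;
    typedef uint64_t mfn_t;

    /* Stand-in for the real p2m lookup the device model would perform. */
    static mfn_t gfn_to_mfn(gfn_t gfn)
    {
        return gfn + 0x100000;               /* fake mapping for the sketch */
    }

    /* Mirror one updated guest GTT entry into the shadow GTT. */
    static void shadow_gtt_update(uint64_t *shadow_gtt,
                                  const uint64_t *guest_gtt, size_t index)
    {
        gfn_t gpfn     = guest_gtt[index] >> 12;    /* frame in guest PTE */
        uint64_t attrs = guest_gtt[index] & 0xfff;  /* low attribute bits */

        shadow_gtt[index] = (gfn_to_mfn(gpfn) << 12) | attrs;
    }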

Thanks
Yu



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-04 Thread Yu, Zhang



On 2/4/2016 2:21 AM, George Dunlap wrote:

On Wed, Feb 3, 2016 at 5:41 PM, George Dunlap
 wrote:

I think at some point I suggested an alternate design based on marking
such gpfns with a special p2m type; I can't remember if that
suggestion was actually addressed or not.


FWIW, the thread where I suggested using p2m types was in response to

<1436163912-1506-2-git-send-email-yu.c.zh...@linux.intel.com>

Looking through it again, the main objection Paul gave[1]  was:

"And it's the assertion that use of write_dm will only be relevant to
gfns, and that all such notifications only need go to a single ioreq
server, that I have a problem with. Whilst the use of io ranges to
track gfn updates is, I agree, not ideal I think the overloading of
write_dm is not a step in the right direction."

Two issues raised here, about using only p2m types to implement write_dm:
1. More than one ioreq server may want to use the write_dm functionality
2. ioreq servers may want to use write_dm for things other than individual gpfns

My answer to #1 was:
1. At the moment, we only need to support a single ioreq server using write_dm
2. It's not technically difficult to extend the number of servers
supported to something sensible, like 4 (using 4 different write_dm
p2m types)
3. The interface can be designed such that we can extend support to
multiple servers when we need to.

My answer to #2 was that there's no reason why using write_dm could be
used for both individual gpfns and ranges; there's no reason the
interface can't take a "start" and "count" argument, even if for the
time being "count" is almost always going to be 1.



Well, talking about "the 'count' always going to be 1": I doubt that. :)
Statistics in XenGT show that GPU page tables are very likely to
be allocated in contiguous gpfns.


Compare this to the downsides of the approach you're proposing:
1. Using 40 bytes of hypervisor space per guest GPU pagetable page (as
opposed to using a bit in the existing p2m table)
2. Walking down an RB tree with 8000 individual nodes to find out
which server to send the message to (rather than just reading the
value from the p2m table).


8K is an upper limit for the rangeset, in many cases the RB tree will
not contain that many nodes.


3. Needing to determine on a guest-by-guest basis whether to change the limit
4. Needing to have an interface to make the limit even bigger, just in
case we find workloads that have even more GTTs.



Well, as I suggested in yesterday's reply, XenGT can choose not to
change this limit even when workloads are getting heavy - with
tradeoffs on the device model side.


I really don't understand where you're coming from on this.  The
approach you've chosen looks to me to be slower, more difficult to
implement, and more complicated; and it's caused a lot more resistance
trying to get this series accepted.



I agree utilizing the p2m types to do so is more efficient and quite
intuitive. But I hesitate to occupy the software-available bits in EPT
PTEs (as Andrew's reply notes). Although we have introduced one such type,
we believe it can also be used for other situations in the future, not just XenGT.


Thanks
Yu



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread Jan Beulich
>>> On 03.02.16 at 08:10,  wrote:
> On 2/2/2016 11:21 PM, Jan Beulich wrote:
> On 02.02.16 at 16:00,  wrote:
>>> The limit of 4G is to avoid the data missing from uint64 to uint32
>>> assignment. And I can accept the 8K limit for XenGT in practice.
>>> After all, it is vGPU page tables we are trying to trap and emulate,
>>> not normal page frames.
>>>
>>> And I guess the reason that one domain exhausting Xen's memory can
>>> affect another domain is because rangeset uses Xen heap, instead of the
>>> per-domain memory. So what about we use a 8K limit by now for XenGT,
>>> and in the future, if a per-domain memory allocation solution for
>>> rangeset is ready, we do need to limit the rangeset size. Does this
>>> sounds more acceptable?
>>
>> The lower the limit the better (but no matter how low the limit
>> it won't make this a pretty thing). Anyway I'd still like to wait
>> for what Ian may further say on this.
>>
> Hi Jan, I just had a discussion with my colleague. We believe 8K could
> be the biggest limit for the write-protected ram ranges. If in the
> future, number of vGPU page tables exceeds this limit, we will modify
> our back-end device model to find a trade-off method, instead of
> extending this limit. If you can accept this value as the upper bound
> of rangeset, maybe we do not need to add any tool stack parameters, but
> define a MAX_NR_WR_RAM_RANGES for the write-protected ram rangesset. As
> to other rangesets, we keep their limit as 256. Does this sounds OK? :)

I'm getting the impression that we're moving in circles. A blanket
limit above the 256 one for all domains is _not_ going to be
acceptable; going to 8k will still need host admin consent. With
your rangeset performance improvement patch, each range is
going to be tracked by a 40 byte structure (up from 32), which
already means an overhead increase for all the other ranges. 8k
of wp ranges implies an overhead beyond 448k (including the
xmalloc() overhead), which is not _that_ much, but also not
negligible.
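
(For reference, the 448k figure works out as follows, assuming roughly
16 bytes of xmalloc() bookkeeping per allocation on top of the 40-byte
structure; the 16-byte figure is an assumption for illustration.)

    8192 ranges * 40 bytes per structure   = 320 KiB
    8192 ranges * ~16 bytes xmalloc() cost = 128 KiB
    total                                  ~ 448 KiB per such domain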

Jan




Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread Jan Beulich
>>> On 03.02.16 at 13:50, <paul.durr...@citrix.com> wrote:
>>  -Original Message-
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: 03 February 2016 12:36
>> To: Paul Durrant
>> Cc: Andrew Cooper; Ian Campbell; Ian Jackson; Stefano Stabellini; Wei Liu;
>> Kevin Tian; zhiyuan...@intel.com; Zhang Yu; xen-devel@lists.xen.org; Keir
>> (Xen.org)
>> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
>> max_wp_ram_ranges.
>> 
>> >>> On 03.02.16 at 13:20, <paul.durr...@citrix.com> wrote:
>> >>  -Original Message-
>> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> Sent: 03 February 2016 08:33
>> >> To: Zhang Yu
>> >> Cc: Andrew Cooper; Ian Campbell; Paul Durrant; Wei Liu; Ian Jackson;
>> Stefano
>> >> Stabellini; Kevin Tian; zhiyuan...@intel.com; xen-devel@lists.xen.org; 
>> >> Keir
>> >> (Xen.org)
>> >> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
>> >> max_wp_ram_ranges.
>> >>
>> >> >>> On 03.02.16 at 08:10, <yu.c.zh...@linux.intel.com> wrote:
>> >> > On 2/2/2016 11:21 PM, Jan Beulich wrote:
>> >> >>>>> On 02.02.16 at 16:00, <yu.c.zh...@linux.intel.com> wrote:
>> >> >>> The limit of 4G is to avoid the data missing from uint64 to uint32
>> >> >>> assignment. And I can accept the 8K limit for XenGT in practice.
>> >> >>> After all, it is vGPU page tables we are trying to trap and emulate,
>> >> >>> not normal page frames.
>> >> >>>
>> >> >>> And I guess the reason that one domain exhausting Xen's memory
>> can
>> >> >>> affect another domain is because rangeset uses Xen heap, instead of
>> >> the
>> >> >>> per-domain memory. So what about we use a 8K limit by now for
>> XenGT,
>> >> >>> and in the future, if a per-domain memory allocation solution for
>> >> >>> rangeset is ready, we do need to limit the rangeset size. Does this
>> >> >>> sounds more acceptable?
>> >> >>
>> >> >> The lower the limit the better (but no matter how low the limit
>> >> >> it won't make this a pretty thing). Anyway I'd still like to wait
>> >> >> for what Ian may further say on this.
>> >> >>
>> >> > Hi Jan, I just had a discussion with my colleague. We believe 8K could
>> >> > be the biggest limit for the write-protected ram ranges. If in the
>> >> > future, number of vGPU page tables exceeds this limit, we will modify
>> >> > our back-end device model to find a trade-off method, instead of
>> >> > extending this limit. If you can accept this value as the upper bound
>> >> > of rangeset, maybe we do not need to add any tool stack parameters,
>> but
>> >> > define a MAX_NR_WR_RAM_RANGES for the write-protected ram
>> >> > rangeset. As
>> >> > to other rangesets, we keep their limit as 256. Does this sounds OK? :)
>> >>
>> >> I'm getting the impression that we're moving in circles. A blanket
>> >> limit above the 256 one for all domains is _not_ going to be
>> >> acceptable; going to 8k will still need host admin consent. With
>> >> your rangeset performance improvement patch, each range is
>> >> going to be tracked by a 40 byte structure (up from 32), which
>> >> already means an overhead increase for all the other ranges. 8k
>> >> of wp ranges implies an overhead beyond 448k (including the
>> >> xmalloc() overhead), which is not _that_ much, but also not
>> >> negligible.
>> >>
>> >
>> > ... which means we are still going to need a toolstack parameter to set the
>> > limit. We already have a parameter for VRAM size so is having a parameter
>> for
>> > max. GTT shadow ranges such a bad thing?
>> 
>> It's workable, but not nice (see also Ian's earlier response).
>> 
>> > Is the fact that the memory comes
>> > from xenheap rather than domheap the real problem?
>> 
>> Not the primary one, since except on huge memory machines
>> both heaps are identical. To me the primary one is the quite
>> more significant resource consumption in the first place (I'm not
>> going to repeat what I've written in already way too many
>> replies before).
> 
> Ok. Well the only way round tracking specific ranges for emulation (and 
> consequently suffering the overhead) is tracking by type. For XenGT I guess 
> it would be possible to live with a situation where a single ioreq server can 
> register all wp mem emulations for a given VM. I can't say I particularly 
> like that way of doing things but if it's the only way forward then I guess 
> we may have to live with it.

Well, subject to Ian not objecting (still awaiting some follow-up by
him), I didn't mean to say doing it the proposed way is a no-go.
All that I really insist on is that this larger resource consumption
won't go without some form of host admin consent.

Jan



Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread Jan Beulich
>>> On 03.02.16 at 14:07,  wrote:
>>  -Original Message-
> [snip]
>> >> >> I'm getting the impression that we're moving in circles. A blanket
>> >> >> limit above the 256 one for all domains is _not_ going to be
>> >> >> acceptable; going to 8k will still need host admin consent. With
>> >> >> your rangeset performance improvement patch, each range is
>> >> >> going to be tracked by a 40 byte structure (up from 32), which
>> >> >> already means an overhead increase for all the other ranges. 8k
>> >> >> of wp ranges implies an overhead beyond 448k (including the
>> >> >> xmalloc() overhead), which is not _that_ much, but also not
>> >> >> negligible.
>> >> >>
>> >> >
>> >> > ... which means we are still going to need a toolstack parameter to set
>> the
>> >> > limit. We already have a parameter for VRAM size so is having a
>> parameter
>> >> for
>> >> > max. GTT shadow ranges such a bad thing?
>> >>
>> >> It's workable, but not nice (see also Ian's earlier response).
>> >>
>> >> > Is the fact that the memory comes
>> >> > from xenheap rather than domheap the real problem?
>> >>
>> >> Not the primary one, since except on huge memory machines
>> >> both heaps are identical. To me the primary one is the quite
>> >> more significant resource consumption in the first place (I'm not
>> >> going to repeat what I've written in already way too many
>> >> replies before).
>> >
>> > Ok. Well the only way round tracking specific ranges for emulation (and
>> > consequently suffering the overhead) is tracking by type. For XenGT I
>> guess
>> > it would be possible to live with a situation where a single ioreq server 
>> > can
>> > register all wp mem emulations for a given VM. I can't say I particularly
>> > like that way of doing things but if it's the only way forward then I guess
>> > we may have to live with it.
>> 
>> Well, subject to Ian not objecting (still awaiting some follow-up by
>> him), I didn't mean to say doing it the proposed way is a no-go.
>> All that I really insist on is that this larger resource consumption
>> won't go without some form of host admin consent.
> 
> Would you be ok with purely host admin consent e.g. just setting the limit 
> via boot command line?

I wouldn't be happy with that (and I've said so before), since it
would allow all VMs this extra resource consumption.

Jan




Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 03 February 2016 13:18
> To: Paul Durrant
> Cc: Andrew Cooper; Ian Campbell; Ian Jackson; Stefano Stabellini; Wei Liu;
> Kevin Tian; zhiyuan...@intel.com; Zhang Yu; xen-devel@lists.xen.org; Keir
> (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> >>> On 03.02.16 at 14:07, <paul.durr...@citrix.com> wrote:
> >>  -Original Message-
> > [snip]
> >> >> >> I'm getting the impression that we're moving in circles. A blanket
> >> >> >> limit above the 256 one for all domains is _not_ going to be
> >> >> >> acceptable; going to 8k will still need host admin consent. With
> >> >> >> your rangeset performance improvement patch, each range is
> >> >> >> going to be tracked by a 40 byte structure (up from 32), which
> >> >> >> already means an overhead increase for all the other ranges. 8k
> >> >> >> of wp ranges implies an overhead beyond 448k (including the
> >> >> >> xmalloc() overhead), which is not _that_ much, but also not
> >> >> >> negligible.
> >> >> >>
> >> >> >
> >> >> > ... which means we are still going to need a toolstack parameter to
> set
> >> the
> >> >> > limit. We already have a parameter for VRAM size so is having a
> >> parameter
> >> >> for
> >> >> > max. GTT shadow ranges such a bad thing?
> >> >>
> >> >> It's workable, but not nice (see also Ian's earlier response).
> >> >>
> >> >> > Is the fact that the memory comes
> >> >> > from xenheap rather than domheap the real problem?
> >> >>
> >> >> Not the primary one, since except on huge memory machines
> >> >> both heaps are identical. To me the primary one is the quite
> >> >> more significant resource consumption in the first place (I'm not
> >> >> going to repeat what I've written in already way too many
> >> >> replies before).
> >> >
> >> > Ok. Well the only way round tracking specific ranges for emulation (and
> >> > consequently suffering the overhead) is tracking by type. For XenGT I
> >> guess
> >> > it would be possible to live with a situation where a single ioreq server
> can
> >> > register all wp mem emulations for a given VM. I can't say I particularly
> >> > like that way of doing things but if it's the only way forward then I 
> >> > guess
> >> > we may have to live with it.
> >>
> >> Well, subject to Ian not objecting (still awaiting some follow-up by
> >> him), I didn't mean to say doing it the proposed way is a no-go.
> >> All that I really insist on is that this larger resource consumption
> >> won't go without some form of host admin consent.
> >
> > Would you be ok with purely host admin consent e.g. just setting the limit
> > via boot command line?
> 
> I wouldn't be happy with that (and I've said so before), since it
> would allow all VM this extra resource consumption.
> 

The ball is back in Ian's court then.

  Paul

> Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread Paul Durrant
> -Original Message-
[snip]
> >> >> I'm getting the impression that we're moving in circles. A blanket
> >> >> limit above the 256 one for all domains is _not_ going to be
> >> >> acceptable; going to 8k will still need host admin consent. With
> >> >> your rangeset performance improvement patch, each range is
> >> >> going to be tracked by a 40 byte structure (up from 32), which
> >> >> already means an overhead increase for all the other ranges. 8k
> >> >> of wp ranges implies an overhead beyond 448k (including the
> >> >> xmalloc() overhead), which is not _that_ much, but also not
> >> >> negligible.
> >> >>
> >> >
> >> > ... which means we are still going to need a toolstack parameter to set
> the
> >> > limit. We already have a parameter for VRAM size so is having a
> parameter
> >> for
> >> > max. GTT shadow ranges such a bad thing?
> >>
> >> It's workable, but not nice (see also Ian's earlier response).
> >>
> >> > Is the fact that the memory comes
> >> > from xenheap rather than domheap the real problem?
> >>
> >> Not the primary one, since except on huge memory machines
> >> both heaps are identical. To me the primary one is the quite
> >> more significant resource consumption in the first place (I'm not
> >> going to repeat what I've written in already way too many
> >> replies before).
> >
> > Ok. Well the only way round tracking specific ranges for emulation (and
> > consequently suffering the overhead) is tracking by type. For XenGT I
> guess
> > it would be possible to live with a situation where a single ioreq server 
> > can
> > register all wp mem emulations for a given VM. I can't say I particularly
> > like that way of doing things but if it's the only way forward then I guess
> > we may have to live with it.
> 
> Well, subject to Ian not objecting (still awaiting some follow-up by
> him), I didn't mean to say doing it the proposed way is a no-go.
> All that I really insist on is that this larger resource consumption
> won't go without some form of host admin consent.
> 

Would you be ok with purely host admin consent e.g. just setting the limit via 
boot command line?

  Paul

> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 03 February 2016 08:33
> To: Zhang Yu
> Cc: Andrew Cooper; Ian Campbell; Paul Durrant; Wei Liu; Ian Jackson; Stefano
> Stabellini; Kevin Tian; zhiyuan...@intel.com; xen-devel@lists.xen.org; Keir
> (Xen.org)
> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> >>> On 03.02.16 at 08:10, <yu.c.zh...@linux.intel.com> wrote:
> > On 2/2/2016 11:21 PM, Jan Beulich wrote:
> >>>>> On 02.02.16 at 16:00, <yu.c.zh...@linux.intel.com> wrote:
> >>> The limit of 4G is to avoid the data missing from uint64 to uint32
> >>> assignment. And I can accept the 8K limit for XenGT in practice.
> >>> After all, it is vGPU page tables we are trying to trap and emulate,
> >>> not normal page frames.
> >>>
> >>> And I guess the reason that one domain exhausting Xen's memory can
> >>> affect another domain is because rangeset uses Xen heap, instead of
> the
> >>> per-domain memory. So what about we use a 8K limit by now for XenGT,
> >>> and in the future, if a per-domain memory allocation solution for
> >>> rangeset is ready, we do need to limit the rangeset size. Does this
> >>> sounds more acceptable?
> >>
> >> The lower the limit the better (but no matter how low the limit
> >> it won't make this a pretty thing). Anyway I'd still like to wait
> >> for what Ian may further say on this.
> >>
> > Hi Jan, I just had a discussion with my colleague. We believe 8K could
> > be the biggest limit for the write-protected ram ranges. If in the
> > future, number of vGPU page tables exceeds this limit, we will modify
> > our back-end device model to find a trade-off method, instead of
> > extending this limit. If you can accept this value as the upper bound
> > of rangeset, maybe we do not need to add any tool stack parameters, but
> > define a MAX_NR_WR_RAM_RANGES for the write-protected ram
> rangesset. As
> > to other rangesets, we keep their limit as 256. Does this sounds OK? :)
> 
> I'm getting the impression that we're moving in circles. A blanket
> limit above the 256 one for all domains is _not_ going to be
> acceptable; going to 8k will still need host admin consent. With
> your rangeset performance improvement patch, each range is
> going to be tracked by a 40 byte structure (up from 32), which
> already means an overhead increase for all the other ranges. 8k
> of wp ranges implies an overhead beyond 448k (including the
> xmalloc() overhead), which is not _that_ much, but also not
> negligible.
> 

... which means we are still going to need a toolstack parameter to set the 
limit. We already have a parameter for VRAM size so is having a parameter for 
max. GTT shadow ranges such a bad thing? Is the fact that the memory comes from 
xenheap rather than domheap the real problem?

  Paul

> Jan
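
To put the figure quoted above in context, a rough reconstruction of the
arithmetic, assuming roughly 16 bytes of xmalloc() header per allocation (the
header size is an assumption, not stated in the thread):

    8192 ranges x (40-byte range structure + ~16 bytes xmalloc overhead)
      = 8192 x 56 bytes
      = 458,752 bytes  (~448 KiB of xenheap per fully-populated domain)

The per-domain cost is modest on its own, but it is memory whose size the
guest can influence, which is the core of the objection.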


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread Ian Jackson
Paul Durrant writes ("RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter 
max_wp_ram_ranges."):
> > From: Jan Beulich [mailto:jbeul...@suse.com]
...
> > I wouldn't be happy with that (and I've said so before), since it
> > would allow all VM this extra resource consumption.
> 
> The ball is back in Ian's court then.

Sorry to be vague, but: I'm not definitely objecting to some toolstack
parameter.  I'm trying to figure out whether this parameter, in this
form, with this documentation, makes some kind of sense.

In the most recent proposed patch the docs basically say (to most
users) "there is this parameter, but it is very complicated, so do not
set it".  We already have a lot of these kinds of parameters.  As a
general rule they are OK if it is really the case that the parameter
should be ignored.  I am happy to have a whole lot of strange
parameters that the user can ignore.

But as far as I can tell from this conversation, users are going to
need to set this parameter in normal operation in some
configurations.

I would ideally like to avoid a situation where (i) the Xen docs say
"do not set this parameter because it is confusing" but (ii) other
less authoritative sources (wiki pages, or mailing list threads, etc.)
say "oh yes just set this weird parameter to 8192 for no readily
comprehensible reason".

I say `some configurations' because, I'm afraid, most of the
conversation about hypervisor internals has gone over my head.  Let me
try to summarise (correct me if I am wrong):

 * There are some hypervisor tracking resources associated with each
   emulated MMIO range.

   (Do we mean the memory ranges that are configured in the hypervisor
   to be sent to an ioemu via the ioreq protocol - ie, the system
   which is normally used in HVM domains to interface to the device
   model?)

   (Are these ranges denominated in guest-physical space?)

 * For almost all domains the set of such MMIO ranges is small or very
   small.

 * Such ranges are sometimes created by, or specified by, the guest.

   (I don't understand why this should be the case but perhaps this is
   an inherent aspect of the design of this new feature.)

 * If too many such ranges were created by the guest the guest could
   consume excessive hypervisor memory.

 * Therefore normally the number of such ranges per guest is (or
   should be) limited to a small number.

 * With `Intel GVT-g broadwell platform' and `vGPU in GVT-g' or
   `GVT-d' it may be necessary for functionality to allow a larger
   number of such ranges.

But to be able to know what the right interface is for the system
administrator (and what to write in the docs), I need know:

 * Is it possible for libxl to somehow tell from the rest of the
   configuration that this larger limit should be applied ?

 * In the configurations where a larger number is needed, what larger
   limit is appropriate ?  How should it be calculated ?

 * How do we know that this does not itself give an opportunity for
   hypervisor resource exhaustion attacks by guests ?  (Note: if it
   _does_ give such an opportunity, this should be mentioned more
   clearly in the documentation.)

 * If we are talking about mmio ranges for ioreq servers, why do
   guests which do not use this feature have the ability to create
   them at all ?

(A background problem I have is that this thread is full of product
name jargon and assumes a lot of background knowledge of the
implementation of these features - background knowledge which I lack
and which isn't in these patches.  If someone could point me at a
quick summary of what `GVT-g' and `GVT-d' are that might help.)

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread Paul Durrant
> -Original Message-
> From: Ian Jackson [mailto:ian.jack...@eu.citrix.com]
> Sent: 03 February 2016 14:44
> To: Paul Durrant
> Cc: Jan Beulich; Andrew Cooper; Ian Campbell; Stefano Stabellini; Wei Liu;
> Kevin Tian; zhiyuan...@intel.com; Zhang Yu; xen-devel@lists.xen.org; Keir
> (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> Paul Durrant writes ("RE: [Xen-devel] [PATCH v3 3/3] tools: introduce
> parameter max_wp_ram_ranges."):
> > > From: Jan Beulich [mailto:jbeul...@suse.com]
> ...
> > > I wouldn't be happy with that (and I've said so before), since it
> > > would allow all VM this extra resource consumption.
> >
> > The ball is back in Ian's court then.
> 
> Sorry to be vague, but: I'm not definitely objecting to some toolstack
> parameter.  I'm trying to figure out whether this parameter, in this
> form, with this documentation, makes some kind of sense.
> 
> In the most recent proposed patch the docs basically say (to most
> users) "there is this parameter, but it is very complicated, so do not
> set it.  We already have a lot of these kind of parameters.  As a
> general rule they are OK if it is really the case that the parameter
> should be ignored.  I am happy to have a whole lot of strange
> parameters that the user can ignore.
> 
> But as far as I can tell from this conversation, users are going to
> need to set this parameter in normal operation in some
> configurations.
> 
> I would ideally like to avoid a situation where (i) the Xen docs say
> "do not set this parameter because it is confusing" but (ii) other
> less authoritative sources (wiki pages, or mailing list threads, etc.)
> say "oh yes just set this weird parameter to 8192 for no readily
> comprehensible reason".
> 
> I say `some configurations' because, I'm afraid, most of the
> conversation about hypervisor internals has gone over my head.  Let me
> try to summarise (correct me if I am wrong):
> 
>  * There are some hypervisor tracking resources associated with each
>emulated MMIO range.
> 
>(Do we mean the memory ranges that are configured in the hypervisor
>to be sent to an ioemu via the ioreq protocol - ie, the system
>which is normally used in HVM domains to interface to the device
>model?)

Yes, the ioreq server infrastructure needs rangesets to steer  particular 
emulation requests to the correct emulator.

> 
>(Are these ranges denominated in guest-physical space?)
> 

Yes.

>  * For almost all domains the set of such MMIO ranges is small or very
>small.
> 
>  * Such ranges are sometimes created by, or specified by, the guest.
> 
>(I don't understand why this should be the case but perhaps this is
>an inherent aspect of the design of this new feature.)
> 

This is the 'new' thing here. Because XenGT needs to shadow GTTs (i.e. GPU page 
tables) by steering writes to the GVT-g ioreq server and the number of GTTs is 
broadly determined by the number of applications using the GPU inside the VM 
then we have a situation where the guest is determining Xen's use of memory.

>  * If too many such ranges were created by the guest the guest could
>consume excessive hypervisor memory.
> 
>  * Therefore normally the number of such ranges per guest is (or
>should be) limited to a small number.
> 
>  * With `Intel GVT-g broadwell platform' and `vGPU in GVT-g' or
>`GVT-d' it may be necessary for functionality to allow a larger
>number of such ranges.
> 
> But to be able to know what the right interface is for the system
> administrator (and what to write in the docs), I need know:
> 
>  * Is it possible for libxl to somehow tell from the rest of the
>configuration that this larger limit should be applied ?
> 

If a XenGT-enabled VM is provisioned through libxl then some larger limit is 
likely to be required. One of the issues is that it is impossible (or at least 
difficult) to know how many GTTs are going to need to be shadowed.

>  * In the configurations where a larger number is needed, what larger
>limit is appropriate ?  How should it be calculated ?
> 

Intel are suggesting a static 8k value should be ok.

>  * How do we know that this does not itself give an opportunity for
>hypervisor resource exhaustion attacks by guests ?  (Note: if it
>_does_ give such an opportunity, this should be mentioned more
>clearly in the documentation.)
> 

In practice XenGT only supports 4 VMs (including dom0) on Haswell and 7 on 
Broadwell so even with the larger limit we're not talking about a huge amount 
of resource.

>  * If we are talking about mmio ranges for ioreq servers, why do
>guests which do not use this feature have the ability to create
>them at all ?
> 

It's not the guest that directly creates the ranges, it's the emulator.
Normally device emulation would require a relatively small number of MMIO
ranges and a total number that cannot be influenced by the guest itself. In
this case though, as I said above, the number *can* be influenced by the
guest (although it is still the emulator which actually causes the ranges to
be created).

> (A background problem I have is that this thread is full of product
> name jargon and assumes a lot of background knowledge of the
> implementation of these features - background knowledge which I lack
> and which isn't in these patches.  If someone could point me at a
> quick summary of what `GVT-g' and `GVT-d' are that might help.)
> 

GVT-d is a name applied to PCI passthrough of an Intel GPU. GVT-g is a name
applied to Intel GPU virtualization, which makes use of an emulator to
mediate guest access to the real GPU so that it is possible to share the
resources securely.

  Paul
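
A minimal sketch of the rangeset-based steering described above, in
simplified pseudo-C (the structure and field names here are hypothetical,
not the actual Xen code; only rangeset_contains_range() is a real helper):
every write-protected GTT page the emulator registers becomes a node in the
ioreq server's rangeset, and every trapped write has to search those nodes
to find the right server.

    /* Hypothetical, simplified dispatch -- not the real Xen implementation. */
    static struct ioreq_server *server_for_wp_write(struct domain *d,
                                                    unsigned long gfn)
    {
        struct ioreq_server *s;

        /* One node per registered range; for shadowed GTT pages the
         * ranges are typically single gfns, so the node count scales
         * with the number of GPU page tables in use by the guest. */
        list_for_each_entry ( s, &d->ioreq_server_list, list_entry )
            if ( rangeset_contains_range(s->wp_ram_ranges, gfn, gfn) )
                return s;

        return NULL; /* nobody claimed it: treat as ordinary RAM */
    }

This lookup, and the per-node memory behind it, is what the proposed
max_wp_ram_ranges limit is bounding.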

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread Jan Beulich
>>> On 03.02.16 at 13:20, <paul.durr...@citrix.com> wrote:
>>  -Original Message-
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: 03 February 2016 08:33
>> To: Zhang Yu
>> Cc: Andrew Cooper; Ian Campbell; Paul Durrant; Wei Liu; Ian Jackson; Stefano
>> Stabellini; Kevin Tian; zhiyuan...@intel.com; xen-devel@lists.xen.org; Keir
>> (Xen.org)
>> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
>> max_wp_ram_ranges.
>> 
>> >>> On 03.02.16 at 08:10, <yu.c.zh...@linux.intel.com> wrote:
>> > On 2/2/2016 11:21 PM, Jan Beulich wrote:
>> >>>>> On 02.02.16 at 16:00, <yu.c.zh...@linux.intel.com> wrote:
>> >>> The limit of 4G is to avoid the data missing from uint64 to uint32
>> >>> assignment. And I can accept the 8K limit for XenGT in practice.
>> >>> After all, it is vGPU page tables we are trying to trap and emulate,
>> >>> not normal page frames.
>> >>>
>> >>> And I guess the reason that one domain exhausting Xen's memory can
>> >>> affect another domain is because rangeset uses Xen heap, instead of
>> the
>> >>> per-domain memory. So what about we use a 8K limit by now for XenGT,
>> >>> and in the future, if a per-domain memory allocation solution for
>> >>> rangeset is ready, we do need to limit the rangeset size. Does this
>> >>> sounds more acceptable?
>> >>
>> >> The lower the limit the better (but no matter how low the limit
>> >> it won't make this a pretty thing). Anyway I'd still like to wait
>> >> for what Ian may further say on this.
>> >>
>> > Hi Jan, I just had a discussion with my colleague. We believe 8K could
>> > be the biggest limit for the write-protected ram ranges. If in the
>> > future, number of vGPU page tables exceeds this limit, we will modify
>> > our back-end device model to find a trade-off method, instead of
>> > extending this limit. If you can accept this value as the upper bound
>> > of rangeset, maybe we do not need to add any tool stack parameters, but
>> > define a MAX_NR_WR_RAM_RANGES for the write-protected ram
>> rangesset. As
>> > to other rangesets, we keep their limit as 256. Does this sounds OK? :)
>> 
>> I'm getting the impression that we're moving in circles. A blanket
>> limit above the 256 one for all domains is _not_ going to be
>> acceptable; going to 8k will still need host admin consent. With
>> your rangeset performance improvement patch, each range is
>> going to be tracked by a 40 byte structure (up from 32), which
>> already means an overhead increase for all the other ranges. 8k
>> of wp ranges implies an overhead beyond 448k (including the
>> xmalloc() overhead), which is not _that_ much, but also not
>> negligible.
>> 
> 
> ... which means we are still going to need a toolstack parameter to set the 
> limit. We already have a parameter for VRAM size so is having a parameter for 
> max. GTT shadow ranges such a bad thing?

It's workable, but not nice (see also Ian's earlier response).

> Is the fact that the memory comes 
> from xenheap rather than domheap the real problem?

Not the primary one, since except on huge memory machines
both heaps are identical. To me the primary one is the much
more significant resource consumption in the first place (I'm not
going to repeat what I've written in already way too many
replies before).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: 03 February 2016 12:36
> To: Paul Durrant
> Cc: Andrew Cooper; Ian Campbell; Ian Jackson; Stefano Stabellini; Wei Liu;
> Kevin Tian; zhiyuan...@intel.com; Zhang Yu; xen-devel@lists.xen.org; Keir
> (Xen.org)
> Subject: RE: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> max_wp_ram_ranges.
> 
> >>> On 03.02.16 at 13:20, <paul.durr...@citrix.com> wrote:
> >>  -Original Message-
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: 03 February 2016 08:33
> >> To: Zhang Yu
> >> Cc: Andrew Cooper; Ian Campbell; Paul Durrant; Wei Liu; Ian Jackson;
> Stefano
> >> Stabellini; Kevin Tian; zhiyuan...@intel.com; xen-devel@lists.xen.org; Keir
> >> (Xen.org)
> >> Subject: Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
> >> max_wp_ram_ranges.
> >>
> >> >>> On 03.02.16 at 08:10, <yu.c.zh...@linux.intel.com> wrote:
> >> > On 2/2/2016 11:21 PM, Jan Beulich wrote:
> >> >>>>> On 02.02.16 at 16:00, <yu.c.zh...@linux.intel.com> wrote:
> >> >>> The limit of 4G is to avoid the data missing from uint64 to uint32
> >> >>> assignment. And I can accept the 8K limit for XenGT in practice.
> >> >>> After all, it is vGPU page tables we are trying to trap and emulate,
> >> >>> not normal page frames.
> >> >>>
> >> >>> And I guess the reason that one domain exhausting Xen's memory
> can
> >> >>> affect another domain is because rangeset uses Xen heap, instead of
> >> the
> >> >>> per-domain memory. So what about we use a 8K limit by now for
> XenGT,
> >> >>> and in the future, if a per-domain memory allocation solution for
> >> >>> rangeset is ready, we do need to limit the rangeset size. Does this
> >> >>> sounds more acceptable?
> >> >>
> >> >> The lower the limit the better (but no matter how low the limit
> >> >> it won't make this a pretty thing). Anyway I'd still like to wait
> >> >> for what Ian may further say on this.
> >> >>
> >> > Hi Jan, I just had a discussion with my colleague. We believe 8K could
> >> > be the biggest limit for the write-protected ram ranges. If in the
> >> > future, number of vGPU page tables exceeds this limit, we will modify
> >> > our back-end device model to find a trade-off method, instead of
> >> > extending this limit. If you can accept this value as the upper bound
> >> > of rangeset, maybe we do not need to add any tool stack parameters,
> but
> >> > define a MAX_NR_WR_RAM_RANGES for the write-protected ram
> >> rangesset. As
> >> > to other rangesets, we keep their limit as 256. Does this sounds OK? :)
> >>
> >> I'm getting the impression that we're moving in circles. A blanket
> >> limit above the 256 one for all domains is _not_ going to be
> >> acceptable; going to 8k will still need host admin consent. With
> >> your rangeset performance improvement patch, each range is
> >> going to be tracked by a 40 byte structure (up from 32), which
> >> already means an overhead increase for all the other ranges. 8k
> >> of wp ranges implies an overhead beyond 448k (including the
> >> xmalloc() overhead), which is not _that_ much, but also not
> >> negligible.
> >>
> >
> > ... which means we are still going to need a toolstack parameter to set the
> > limit. We already have a parameter for VRAM size so is having a parameter
> for
> > max. GTT shadow ranges such a bad thing?
> 
> It's workable, but not nice (see also Ian's earlier response).
> 
> > Is the fact that the memory comes
> > from xenheap rather than domheap the real problem?
> 
> Not the primary one, since except on huge memory machines
> both heaps are identical. To me the primary one is the quite
> more significant resource consumption in the first place (I'm not
> going to repeat what I've written in already way too many
> replies before).

Ok. Well the only way round tracking specific ranges for emulation (and 
consequently suffering the overhead) is tracking by type. For XenGT I guess it 
would be possible to live with a situation where a single ioreq server can 
register all wp mem emulations for a given VM. I can't say I particularly like 
that way of doing things but if it's the only way forward then I guess we may 
have to live with it.

  Paul

> 
> Jan
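
A sketch of the "tracking by type" alternative Paul refers to, in simplified
pseudo-C (the helper and field names are illustrative, not the real Xen API;
only the p2m_mmio_write_dm type name comes from the thread): no per-range
state is allocated, because the p2m entry's type is the only marker.

    /* Hypothetical, simplified sketch of tracking purely by p2m type. */
    static bool handle_wp_ram_write(struct domain *d, unsigned long gfn,
                                    const void *data, unsigned int size)
    {
        if ( p2m_type_of(d, gfn) != p2m_mmio_write_dm )
            return false;

        /* With a single write_dm type, all such pages necessarily go to
         * one ioreq server per domain -- the restriction Paul says XenGT
         * could probably live with. */
        send_to_ioreq_server(d->wp_ram_server, gfn, data, size);
        return true;
    }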


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread George Dunlap
On Wed, Feb 3, 2016 at 3:10 PM, Paul Durrant  wrote:
>>  * Is it possible for libxl to somehow tell from the rest of the
>>configuration that this larger limit should be applied ?
>>
>
> If a XenGT-enabled VM is provisioned through libxl then some larger limit is 
> likely to be required. One of the issues is that it is impossible (or at 
> least difficult) to know how many GTTs are going to need to be shadowed.

By GTT, you mean the GPU pagetables I assume?  So you're talking about
how large this value should be made, not whether the
heuristically-chosen larger value should be used.  libxl should be
able to tell that XenGT is enabled, I assume, so it should be able to
automatically bump this to 8k if necessary, right?

But I think you'd still need a parameter that you could tweak if it
turned out that 8k wasn't enough for a particular workload, right?

>>  * If we are talking about mmio ranges for ioreq servers, why do
>>guests which do not use this feature have the ability to create
>>them at all ?
>
> It's not the guest that directly creates the ranges, it's the emulator. 
> Normally device emulation would require a relatively small number of MMIO 
> ranges and a total number that cannot be influenced by the guest itself. In 
> this case though, as I said above, the number *can* be influenced by the 
> guest (although it is still the emulator which actually causes the ranges to 
> be created).

Just to make this clear: The guest chooses how many gpfns are used in
the GPU pagetables; for each additional gpfn in the guest pagetable,
qemu / xen have the option of either marking it to be emulated (at the
moment, by marking it as a one-page "MMIO region") or crashing the
guest.

>> (A background problem I have is that this thread is full of product
>> name jargon and assumes a lot of background knowledge of the
>> implementation of these features - background knowledge which I lack
>> and which isn't in these patches.  If someone could point me at a
>> quick summary of what `GVT-g' and `GVT-d' are that might help.)
>>
>
> GVT-d is a name applied to PCI passthrough of an Intel GPU. GVT-g is a name 
> applied to Intel GPU virtualization, which makes use of an emulator to 
> mediate guest access to the real GPU so that it is possible to share the 
> resources securely.

And GTT are the GPU equivalent of page tables?

 -George
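
A sketch of the kind of libxl defaulting George suggests here (the
gvt_g_enabled and max_wp_ram_ranges field names are hypothetical, not the
names used by the series):

    /* In the domain-build path: pick the larger limit automatically for a
     * XenGT-enabled guest, while keeping the knob available for workloads
     * where 8k turns out not to be enough. */
    if (libxl_defbool_val(info->u.hvm.gvt_g_enabled) &&
        info->u.hvm.max_wp_ram_ranges == 0)
        info->u.hvm.max_wp_ram_ranges = 8192;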

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread George Dunlap
On Wed, Feb 3, 2016 at 2:43 PM, Ian Jackson <ian.jack...@eu.citrix.com> wrote:
> Paul Durrant writes ("RE: [Xen-devel] [PATCH v3 3/3] tools: introduce 
> parameter max_wp_ram_ranges."):
>> > From: Jan Beulich [mailto:jbeul...@suse.com]
> ...
>> > I wouldn't be happy with that (and I've said so before), since it
>> > would allow all VM this extra resource consumption.
>>
>> The ball is back in Ian's court then.
>
> Sorry to be vague, but: I'm not definitely objecting to some toolstack
> parameter.  I'm trying to figure out whether this parameter, in this
> form, with this documentation, makes some kind of sense.
>
> In the most recent proposed patch the docs basically say (to most
> users) "there is this parameter, but it is very complicated, so do not
> set it.  We already have a lot of these kind of parameters.  As a
> general rule they are OK if it is really the case that the parameter
> should be ignored.  I am happy to have a whole lot of strange
> parameters that the user can ignore.
>
> But as far as I can tell from this conversation, users are going to
> need to set this parameter in normal operation in some
> configurations.
>
> I would ideally like to avoid a situation where (i) the Xen docs say
> "do not set this parameter because it is confusing" but (ii) other
> less authoritative sources (wiki pages, or mailing list threads, etc.)
> say "oh yes just set this weird parameter to 8192 for no readily
> comprehensible reason".
>
> I say `some configurations' because, I'm afraid, most of the
> conversation about hypervisor internals has gone over my head.  Let me
> try to summarise (correct me if I am wrong):
>
>  * There are some hypervisor tracking resources associated with each
>emulated MMIO range.
>
>(Do we mean the memory ranges that are configured in the hypervisor
>to be sent to an ioemu via the ioreq protocol - ie, the system
>which is normally used in HVM domains to interface to the device
>model?)
>
>(Are these ranges denominated in guest-physical space?)
>
>  * For almost all domains the set of such MMIO ranges is small or very
>small.
>
>  * Such ranges are sometimes created by, or specified by, the guest.
>
>(I don't understand why this should be the case but perhaps this is
>an inherent aspect of the design of this new feature.)

So the real issue here, as I've said elsewhere is this:

These are not MMIO regions.  They are not IO because they do not talk
to devices, and they are not regions; they are individual gpfns.

What's happening here is that qemu wants to be able to do the
equivalent of shadow pagetable emulation for the GPU's equivalent of
the pagetables (there's a special name, I forget what they're called).

These gpfns are just normal guest memory, selected by the operating
system and/or the graphics driver to use in the equivalent of a
pagetable for the GPU.

And instead of making a new interface designed to keep track of gpfns,
they are (ab)using the existing "MMIO range" interface.

But of course, since they aren't actually ranges but just gpfns,
they're scattered randomly throughout the guest physical address
space.

That's why XenGT suddenly wants orders of magnitude more "MMIO
regions" than any other device has ever needed before -- because
they're using a hammer when what they want is a rake.

They claim that 8k "should be enough for anybody", but as far as I can
tell, the theoretical limit of the number of pages being used in the
GPU pagetables could be unbounded.

I think at some point I suggested an alternate design based on marking
such gpfns with a special p2m type; I can't remember if that
suggestion was actually addressed or not.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread George Dunlap
On 03/02/16 18:39, Andrew Cooper wrote:
> On 03/02/16 18:21, George Dunlap wrote:
>> 2. It's not technically difficult to extend the number of servers
>> supported to something sensible, like 4 (using 4 different write_dm
>> p2m types)
> 
> While technically true, spare bits in the pagetable entries are at a
> premium, and steadily decreasing as Intel are introducing new features.
> 
> We have 16 current p2m types, and a finite upper bound of 6 bits of p2m
> type space, already with a push to reduce this number.
> 
> While introducing 1 new p2m type for this purpose might be an acceptable
> tradeoff, a using a p2m type per ioreq server is not IMO.

It is true that we don't have a ton of elbow room to grow at the moment.

But we actually already have a single p2m type -- mmio_write_dm -- that
as far as I know is only being used by XenGT.  We don't actually need to
add any new p2m types for my initial proposal.

Going forward, we probably will, at some point, need to implement a
parallel "p2t" structure to keep track of types -- and probably will
whether we end up implementing 4 separate write_dm types or not (for the
reasons you describe).

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread George Dunlap
On Wed, Feb 3, 2016 at 5:41 PM, George Dunlap
 wrote:
> I think at some point I suggested an alternate design based on marking
> such gpfns with a special p2m type; I can't remember if that
> suggestion was actually addressed or not.

FWIW, the thread where I suggested using p2m types was in response to

<1436163912-1506-2-git-send-email-yu.c.zh...@linux.intel.com>

Looking through it again, the main objection Paul gave[1]  was:

"And it's the assertion that use of write_dm will only be relevant to
gfns, and that all such notifications only need go to a single ioreq
server, that I have a problem with. Whilst the use of io ranges to
track gfn updates is, I agree, not ideal I think the overloading of
write_dm is not a step in the right direction."

Two issues raised here, about using only p2m types to implement write_dm:
1. More than one ioreq server may want to use the write_dm functionality
2. ioreq servers may want to use write_dm for things other than individual gpfns

My answer to #1 was:
1. At the moment, we only need to support a single ioreq server using write_dm
2. It's not technically difficult to extend the number of servers
supported to something sensible, like 4 (using 4 different write_dm
p2m types)
3. The interface can be designed such that we can extend support to
multiple servers when we need to.

My answer to #2 was that there's no reason why write_dm couldn't be
used for both individual gpfns and ranges; there's no reason the
interface can't take a "start" and "count" argument, even if for the
time being "count" is almost always going to be 1.

Compare this to the downsides of the approach you're proposing:
1. Using 40 bytes of hypervisor space per guest GPU pagetable page (as
opposed to using a bit in the existing p2m table)
2. Walking down an RB tree with 8000 individual nodes to find out
which server to send the message to (rather than just reading the
value from the p2m table).
3. Needing to determine on a guest-by-guest basis whether to change the limit
4. Needing to have an interface to make the limit even bigger, just in
case we find workloads that have even more GTTs.

I really don't understand where you're coming from on this.  The
approach you've chosen looks to me to be slower, more difficult to
implement, and more complicated; and it's caused a lot more resistance
trying to get this series accepted.

 -George

[1] <9aae0902d5bc7e449b7c8e4e778abcd02598b...@amspex01cl02.citrite.net>
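
A sketch of the sort of interface described above -- a (start, count) pair so
that single scattered gpfns and genuine ranges use the same call -- with a
hypothetical structure layout (this is not the hypercall the series, or Xen,
actually defines):

    struct xen_hvm_wp_mem_to_ioreq_server {
        domid_t    domid;   /* the domain whose memory is being tracked */
        ioservid_t id;      /* ioreq server to receive the write faults */
        uint64_t   start;   /* first gpfn to mark write_dm */
        uint64_t   count;   /* number of gpfns; usually 1 for GTT pages */
    };

Internally each marked gpfn would simply have its p2m type set to write_dm,
so the number of tracked pages would cost no additional hypervisor
allocation.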

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread George Dunlap
On Wed, Feb 3, 2016 at 6:21 PM, George Dunlap
 wrote:
> I really don't understand where you're coming from on this.  The
> approach you've chosen looks to me to be slower, more difficult to
> implement, and more complicated; and it's caused a lot more resistance
> trying to get this series accepted.

Maybe it would be helpful if I did a mock-up implementation of what I
had in mind, and you can have something concrete to criticize.  I'll
see if I can do that sometime this week.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-03 Thread Andrew Cooper
On 03/02/16 18:21, George Dunlap wrote:
> 2. It's not technically difficult to extend the number of servers
> supported to something sensible, like 4 (using 4 different write_dm
> p2m types)

While technically true, spare bits in the pagetable entries are at a
premium, and steadily decreasing as Intel are introducing new features.

We have 16 current p2m types, and a finite upper bound of 6 bits of p2m
type space, already with a push to reduce this number.

While introducing 1 new p2m type for this purpose might be an acceptable
tradeoff, using a p2m type per ioreq server is not, IMO.


As an orthogonal exercise, several of the existing p2m types are
redundant and can (or are already) expressed elsehow.  There is
definitely room to reduce the current 16 types down a bit without too
much effort.

~Andrew
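
For reference, the arithmetic behind the bit-budget concern above:

    6 type bits  =>  at most 2^6 = 64 distinct p2m types
    16 types already defined, with pressure to shrink rather than grow

so burning one p2m type per ioreq server scales poorly compared with a
single shared write_dm type.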

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Yu, Zhang

Thanks for your reply, Ian.

On 2/2/2016 1:05 AM, Ian Jackson wrote:

Yu, Zhang writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter 
max_wp_ram_ranges."):

On 2/2/2016 12:35 AM, Jan Beulich wrote:

On 01.02.16 at 17:19, <yu.c.zh...@linux.intel.com> wrote:

So, we need also validate this param in hvm_allow_set_param,
current although hvm_allow_set_param has not performed any
validation other parameters. We need to do this for the new
ones. Is this understanding correct?


Yes.


Another question is: as to the tool stack side, do you think
an error message would suffice? Shouldn't xl be terminated?


I have no idea what consistent behavior in such a case would
be - I'll defer input on this to the tool stack maintainers.


Thank you.
Wei, which one do you prefer?


I think that arrangements should be made for the hypercall failure to
be properly reported to the caller, and properly logged.

I don't think it is desirable to duplicate the sanity check in
xl/libxl/libxc.  That would simply result in there being two limits to
update.



Sorry, I do not follow. What does "being two limits to update" mean?


I have to say, though, that the situation with this parameter seems
quite unsatisfactory.  It seems to be a kind of bodge.



By "situation with this parameter", do you mean:
a> the introduction of this parameter in tool stack, or
b> the sanitizing for this parameter (in fact I'd prefer not to treat
the check of this parameter as sanitizing, because it only checks
the input against 4G to avoid data loss in the uint64 to uint32
assignment in hvm_ioreq_server_alloc_rangesets)?




The changeable limit is there to prevent excessive resource usage by a
guest.  But the docs suggest that the excessive usage might be
normal.  That sounds like a suboptimal design to me.



Yes, there might be situations where this limit is set to some large
value. But I think that situation would be very rare. Like the docs
suggested, for XenGT, 8K is a big enough one for most cases.


For reference, here is the docs proposed in this patch:

   =item B

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Jan Beulich
>>> On 02.02.16 at 16:00,  wrote:
> The limit of 4G is to avoid the data missing from uint64 to uint32
> assignment. And I can accept the 8K limit for XenGT in practice.
> After all, it is vGPU page tables we are trying to trap and emulate,
> not normal page frames.
> 
> And I guess the reason that one domain exhausting Xen's memory can
> affect another domain is because rangeset uses Xen heap, instead of the
> per-domain memory. So what about we use a 8K limit by now for XenGT,
> and in the future, if a per-domain memory allocation solution for
> rangeset is ready, we do need to limit the rangeset size. Does this
> sounds more acceptable?

The lower the limit the better (but no matter how low the limit
it won't make this a pretty thing). Anyway I'd still like to wait
for what Ian may further say on this.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Yu, Zhang



On 2/2/2016 10:42 PM, Jan Beulich wrote:

On 02.02.16 at 15:01,  wrote:

On 2/2/2016 7:12 PM, Jan Beulich wrote:

On 02.02.16 at 11:56,  wrote:

I understand your concern, and to be honest, I do not think
this is an optimal solution. But I also have no better idea
in mind.  :(
Another option may be: instead of opening this parameter to
the tool stack, we use a XenGT flag, which set the rangeset
limit to a default value. But like I said, this default value
may not always work on future XenGT platforms.


Assuming that you think of something set e.g. by hypervisor
command line option: How would that work? I.e. how would
that limit the resource use for all VMs not using XenGT? Or if
you mean a flag settable in the domain config - how would you
avoid a malicious admin to set this flag for all the VMs created
in the controlled partition of the system?


Well, I am not satisfied with this new parameter, because:
1> exposing an option like max_wp_ram_ranges to the user seems too
detailed;
2> but if not, using a XenGT flag means it would be hard for hypervisor
to find a default value which can work in all situations theoretically,
although in practice, 8K is already a big enough one.

However, as to the security concern you raised, I can not fully
understand. :) E.g. I believe a malicious admin can also breach the
system even without this patch. This argument may not be convincing to
you, but as to this specific case, even if an admin set XenGT flag to
all VMs, what harm will this action do? It only means the ioreq server
can at most allocate 8K ranges, will that consume all the Xen heaps,
especially for 64 bit Xen?


First of all so far you meant to set a limit of 4G, which - taking a
handful of domains - if fully used would take even a mid-size
host out of memory. And then you have to consider bad effects
resulting from Xen itself not normally having a lot of memory left
(especially when "dom0_mem=" is not forcing most of the memory
to be in Xen's hands), which may mean that one domain
exhausting Xen's memory can affect another domain if Xen can't
allocate memory it needs to support that other domain, in the
worst case leading to a domain crash. And this all is still leaving
aside Xen's own health...



Thanks, Jan.
The limit of 4G is to avoid data loss in the uint64 to uint32
assignment. And I can accept the 8K limit for XenGT in practice.
After all, it is vGPU page tables we are trying to trap and emulate,
not normal page frames.

And I guess the reason that one domain exhausting Xen's memory can
affect another domain is because rangeset uses Xen heap, instead of the
per-domain memory. So what about using an 8K limit for now for XenGT,
and in the future, if a per-domain memory allocation solution for
rangeset is ready, we do need to limit the rangeset size. Does this
sound more acceptable?

B.R.
Yu


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Andrew Cooper
On 02/02/16 11:43, Jan Beulich wrote:
 On 02.02.16 at 12:31,  wrote:
>> This specific issue concerns resource allocation during domain building
>> and is an area which can never ever be given to a less privileged entity.
> Which is because of ...? (And if so, why would we have put
> XEN_DOMCTL_createdomain on the XSA-77 waiver list?)

That list came out of the blue as far as the Xen community went.

The purpose of XEN_DOMCTL_createdomain is to mutate the set of valid
identifiers in Xen on which XSM permissions are based, and any entity
capable of making the hypercall can at the very least cause reuse of an
existing identifier.

For a different example, take XEN_DOMCTL_gdbsx_guestmemio.  This
hypercall specifically permits the caller to change arbitrary memory,
including that of the Xen itself.

Neither of these two operations will ever be safe in the hands of
anything but a fully privileged entity.  Pretending otherwise isn't
going to change this fact.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Jan Beulich
>>> On 02.02.16 at 15:01,  wrote:
> On 2/2/2016 7:12 PM, Jan Beulich wrote:
> On 02.02.16 at 11:56,  wrote:
>>> I understand your concern, and to be honest, I do not think
>>> this is an optimal solution. But I also have no better idea
>>> in mind.  :(
>>> Another option may be: instead of opening this parameter to
>>> the tool stack, we use a XenGT flag, which set the rangeset
>>> limit to a default value. But like I said, this default value
>>> may not always work on future XenGT platforms.
>>
>> Assuming that you think of something set e.g. by hypervisor
>> command line option: How would that work? I.e. how would
>> that limit the resource use for all VMs not using XenGT? Or if
>> you mean a flag settable in the domain config - how would you
>> avoid a malicious admin to set this flag for all the VMs created
>> in the controlled partition of the system?
> 
> Well, I am not satisfied with this new parameter, because:
> 1> exposing an option like max_wp_ram_ranges to the user seems too
> detailed;
> 2> but if not, using a XenGT flag means it would be hard for hypervisor
> to find a default value which can work in all situations theoretically,
> although in practice, 8K is already a big enough one.
> 
> However, as to the security concern you raised, I can not fully
> understand. :) E.g. I believe a malicious admin can also breach the
> system even without this patch. This argument may not be convincing to 
> you, but as to this specific case, even if an admin set XenGT flag to
> all VMs, what harm will this action do? It only means the ioreq server 
> can at most allocate 8K ranges, will that consume all the Xen heaps, 
> especially for 64 bit Xen?

First of all so far you meant to set a limit of 4G, which - taking a
handful of domains - if fully used would take even a mid-size
host out of memory. And then you have to consider bad effects
resulting from Xen itself not normally having a lot of memory left
(especially when "dom0_mem=" is not forcing most of the memory
to be in Xen's hands), which may mean that one domain
exhausting Xen's memory can affect another domain if Xen can't
allocate memory it needs to support that other domain, in the
worst case leading to a domain crash. And this all is still leaving
aside Xen's own health...

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Yu, Zhang



On 2/2/2016 11:21 PM, Jan Beulich wrote:

On 02.02.16 at 16:00,  wrote:

The limit of 4G is to avoid the data missing from uint64 to uint32
assignment. And I can accept the 8K limit for XenGT in practice.
After all, it is vGPU page tables we are trying to trap and emulate,
not normal page frames.

And I guess the reason that one domain exhausting Xen's memory can
affect another domain is because rangeset uses Xen heap, instead of the
per-domain memory. So what about we use a 8K limit by now for XenGT,
and in the future, if a per-domain memory allocation solution for
rangeset is ready, we do need to limit the rangeset size. Does this
sounds more acceptable?


The lower the limit the better (but no matter how low the limit
it won't make this a pretty thing). Anyway I'd still like to wait
for what Ian may further say on this.



OK then. :)
Ian, do you have any suggestions?

Thanks
Yu

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Jan Beulich
>>> On 01.02.16 at 18:05,  wrote:
> Having said that, if the hypervisor maintainers are happy with a
> situation where this value is configured explicitly, and the
> configurations where a non-default value is required is expected to be
> rare, then I guess we can live with it.

Well, from the very beginning I have been not very happy with
the introduction of this, and I still consider it half way acceptable
only because of not seeing any good alternative. If we look at
it strictly, it's in violation of the rule we set forth after XSA-77:
No introduction of new code making the system susceptible to
bad (malicious) tool stack behavior, and hence we should reject
it. Yet that would leave XenGT in a state where it would have no
perspective of ever getting merged, which doesn't seem very
desirable either.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Yu, Zhang



On 2/2/2016 6:32 PM, Jan Beulich wrote:

On 01.02.16 at 18:05,  wrote:

Having said that, if the hypervisor maintainers are happy with a
situation where this value is configured explicitly, and the
configurations where a non-default value is required is expected to be
rare, then I guess we can live with it.


Well, from the very beginning I have been not very happy with
the introduction of this, and I still consider it half way acceptable
only because of not seeing any good alternative. If we look at
it strictly, it's in violation of the rule we set forth after XSA-77:
No introduction of new code making the system susceptible to
bad (malicious) tool stack behavior, and hence we should reject
it. Yet that would leave XenGT in a state where it would have no
perspective of ever getting merged, which doesn't seem very
desirable either.

Jan



Thanks, Jan.
I understand your concern, and to be honest, I do not think
this is an optimal solution. But I also have no better idea
in mind.  :(
Another option may be: instead of opening this parameter to
the tool stack, we use a XenGT flag, which sets the rangeset
limit to a default value. But like I said, this default value
may not always work on future XenGT platforms.


B.R.
Yu

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Andrew Cooper
On 02/02/16 10:32, Jan Beulich wrote:
 On 01.02.16 at 18:05,  wrote:
>> Having said that, if the hypervisor maintainers are happy with a
>> situation where this value is configured explicitly, and the
>> configurations where a non-default value is required is expected to be
>> rare, then I guess we can live with it.
> Well, from the very beginning I have been not very happy with
> the introduction of this, and I still consider it half way acceptable
> only because of not seeing any good alternative. If we look at
> it strictly, it's in violation of the rule we set forth after XSA-77:
> No introduction of new code making the system susceptible to
> bad (malicious) tool stack behavior

Let's take a step back here.

If your toolstack is malicious, you have already lost.  Coding Xen
around this is a waste of time.

The XSM case is for splitting out some of the privileged domains
responsibilities to less privileged domains.  In these cases, we do
indeed want to assure that the somewhat-privileged entity cannot abuse
anything outside its area of privilege.

This specific issue concerns resource allocation during domain building
and is an area which can never ever be given to a less privileged entity.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Jan Beulich
>>> On 02.02.16 at 12:31,  wrote:
> This specific issue concerns resource allocation during domain building
> and is an area which can never ever be given to a less privileged entity.

Which is because of ...? (And if so, why would we have put
XEN_DOMCTL_createdomain on the XSA-77 waiver list?)

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Jan Beulich
>>> On 02.02.16 at 11:56,  wrote:
> I understand your concern, and to be honest, I do not think
> this is an optimal solution. But I also have no better idea
> in mind.  :(
> Another option may be: instead of opening this parameter to
> the tool stack, we use a XenGT flag, which set the rangeset
> limit to a default value. But like I said, this default value
> may not always work on future XenGT platforms.

Assuming that you think of something set e.g. by hypervisor
command line option: How would that work? I.e. how would
that limit the resource use for all VMs not using XenGT? Or if
you mean a flag settable in the domain config - how would you
avoid a malicious admin to set this flag for all the VMs created
in the controlled partition of the system?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Wei Liu
On Tue, Feb 02, 2016 at 04:04:14PM +0800, Yu, Zhang wrote:
> Thanks for your reply, Ian.
> 
> On 2/2/2016 1:05 AM, Ian Jackson wrote:
> >Yu, Zhang writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter 
> >max_wp_ram_ranges."):
> >>On 2/2/2016 12:35 AM, Jan Beulich wrote:
> >>>On 01.02.16 at 17:19, <yu.c.zh...@linux.intel.com> wrote:
> >>>>So, we need also validate this param in hvm_allow_set_param,
> >>>>current although hvm_allow_set_param has not performed any
> >>>>validation other parameters. We need to do this for the new
> >>>>ones. Is this understanding correct?
> >>>
> >>>Yes.
> >>>
> >>>>Another question is: as to the tool stack side, do you think
> >>>>an error message would suffice? Shouldn't xl be terminated?
> >>>
> >>>I have no idea what consistent behavior in such a case would
> >>>be - I'll defer input on this to the tool stack maintainers.
> >>
> >>Thank you.
> >>Wei, which one do you prefer?
> >
> >I think that arrangements should be made for the hypercall failure to
> >be properly reported to the caller, and properly logged.
> >
> >I don't think it is desirable to duplicate the sanity check in
> >xl/libxl/libxc.  That would simply result in there being two limits to
> >update.
> >
> 
> Sorry, I do not follow. What does "being two limits to update" mean?
> 

I can't speak for Ian, but my understanding is that if the code logic is
duplicated in several places, you need to update all of them whenever
you change the logic. But he hasn't said whether this is a blocker for this
series, so I will let him clarify.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Yu, Zhang



On 2/2/2016 7:51 PM, Wei Liu wrote:

On Tue, Feb 02, 2016 at 04:04:14PM +0800, Yu, Zhang wrote:

Thanks for your reply, Ian.

On 2/2/2016 1:05 AM, Ian Jackson wrote:

Yu, Zhang writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter 
max_wp_ram_ranges."):

On 2/2/2016 12:35 AM, Jan Beulich wrote:

On 01.02.16 at 17:19, <yu.c.zh...@linux.intel.com> wrote:

So, we need also validate this param in hvm_allow_set_param,
current although hvm_allow_set_param has not performed any
validation other parameters. We need to do this for the new
ones. Is this understanding correct?


Yes.


Another question is: as to the tool stack side, do you think
an error message would suffice? Shouldn't xl be terminated?


I have no idea what consistent behavior in such a case would
be - I'll defer input on this to the tool stack maintainers.


Thank you.
Wei, which one do you prefer?


I think that arrangements should be made for the hypercall failure to
be properly reported to the caller, and properly logged.

I don't think it is desirable to duplicate the sanity check in
xl/libxl/libxc.  That would simply result in there being two limits to
update.



Sorry, I do not follow. What does "being two limits to update" mean?



I can't speak for Ian, but my understanding is that if the code logic is
duplicated in several places, you need to update all of them whenever
you change the logic. But, he has expressed if this is blocker for this
series, so I will let him clarify.


Thank you, Wei. This explanation helps. :)

B.R.
Yu

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-02 Thread Yu, Zhang



On 2/2/2016 7:12 PM, Jan Beulich wrote:

On 02.02.16 at 11:56,  wrote:

I understand your concern, and to be honest, I do not think
this is an optimal solution. But I also have no better idea
in mind.  :(
Another option may be: instead of opening this parameter to
the tool stack, we use a XenGT flag, which set the rangeset
limit to a default value. But like I said, this default value
may not always work on future XenGT platforms.


Assuming that you think of something set e.g. by hypervisor
command line option: How would that work? I.e. how would
that limit the resource use for all VMs not using XenGT? Or if
you mean a flag settable in the domain config - how would you
avoid a malicious admin to set this flag for all the VMs created
in the controlled partition of the system?



Well, I am not satisfied with this new parameter, because:
1> exposing an option like max_wp_ram_ranges to the user seems too
detailed;
2> but if not, using a XenGT flag means it would be hard for hypervisor
to find a default value which can work in all situations theoretically,
although in practice, 8K is already a big enough one.

However, as to the security concern you raised, I cannot fully
understand it. :) E.g. I believe a malicious admin can also breach the
system even without this patch. This argument may not be convincing to
you, but as to this specific case, even if an admin sets the XenGT flag for
all VMs, what harm will this action do? It only means the ioreq server
can allocate at most 8K ranges; will that consume all of the Xen heap,
especially on 64-bit Xen?


Anyway, despite our different opinions, I still need to say thank you
for your explanation. Upstreaming XenGT features is my task; it is
painfully rewarding to receive suggestions from community maintainers,
which helps a newbie like me better understand the virtualization
technology. :)

Thanks
Yu

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Jan Beulich
>>> On 01.02.16 at 13:49,  wrote:
> On Mon, Feb 01, 2016 at 05:15:16AM -0700, Jan Beulich wrote:
>> >>> On 01.02.16 at 13:02,  wrote:
>> > On Mon, Feb 01, 2016 at 12:52:51AM -0700, Jan Beulich wrote:
>> >> >>> On 30.01.16 at 15:38,  wrote:
>> >> 
>> >> > On 1/30/2016 12:33 AM, Jan Beulich wrote:
>> >> > On 29.01.16 at 11:45,  wrote:
>> >> >>> --- a/xen/arch/x86/hvm/hvm.c
>> >> >>> +++ b/xen/arch/x86/hvm/hvm.c
>> >> >>> @@ -940,6 +940,8 @@ static int 
>> >> >>> hvm_ioreq_server_alloc_rangesets(struct 
>> > hvm_ioreq_server *s,
>> >> >>>   {
>> >> >>>   unsigned int i;
>> >> >>>   int rc;
>> >> >>> +unsigned int max_wp_ram_ranges =
>> >> >>> +
>> >> >>> s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES];
>> >> >>
>> >> >> You're still losing the upper 32 bits here. Iirc you agreed to range
>> >> >> check the value before storing into params[]...
>> >> > 
>> >> > Thanks, Jan. :)
>> >> > In this version, the check is added in routine parse_config_data().
>> >> > If option 'max_wp_ram_ranges' is configured with an unreasonable value,
>> >> > the xl will terminate, before calling xc_hvm_param_set(). Does this
>> >> > change meet your requirement? Or maybe did I have some misunderstanding
>> >> > on this issue?
>> >> 
>> >> Checking in the tools is desirable, but the hypervisor shouldn't rely
>> >> on any tool side checking.
>> >> 
>> > 
>> > As in hypervisor needs to sanitise all input from toolstack? I don't
>> > think Xen does that today.
>> 
>> If it doesn't, then that's a bug. Note that in many cases (domctl-s
>> and alike) such bogus trusting in the tool stack behaving correctly
>> is only not a security issue due to XSA-77. Yet with XSA-77 we
>> made quite clear that we shouldn't knowingly allow in further such
>> issues (it'll be hard enough to find and address all existing ones).
> 
> So are you suggesting pulling the check done in toolstack into
> hypervisor?

I think the check in the tools should stay (allowing for a
distinguishable error message to be issued); all I'm saying is
that doing the check in the tools is not enough.

Jan
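
For illustration, here is a tiny standalone demo (plain C, not Xen
code; the value is made up) of the truncation being discussed: params[]
entries are 64-bit, so assigning one to an unsigned int silently drops
the upper 32 bits.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* A bogus 64-bit value a caller could hand to HVMOP_set_param. */
    uint64_t param = (1ULL << 32) | 8192;

    /* Implicit conversion keeps only the low 32 bits. */
    unsigned int max_wp_ram_ranges = param;

    /* Prints 8192: a huge value now looks like a sane limit. */
    printf("%u\n", max_wp_ram_ranges);
    return 0;
}

Hence the request to range check the value before it is ever stored
into params[].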


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Ian Jackson
Yu, Zhang writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter 
max_wp_ram_ranges."):
> On 2/2/2016 12:35 AM, Jan Beulich wrote:
>> On 01.02.16 at 17:19, <yu.c.zh...@linux.intel.com> wrote:
> >> So, we need also validate this param in hvm_allow_set_param,
> >> current although hvm_allow_set_param has not performed any
> >> validation other parameters. We need to do this for the new
> >> ones. Is this understanding correct?
> >
> > Yes.
> >
> >> Another question is: as to the tool stack side, do you think
> >> an error message would suffice? Shouldn't xl be terminated?
> >
> > I have no idea what consistent behavior in such a case would
> > be - I'll defer input on this to the tool stack maintainers.
> 
> Thank you.
> Wei, which one do you prefer?

I think that arrangements should be made for the hypercall failure to
be properly reported to the caller, and properly logged.

I don't think it is desirable to duplicate the sanity check in
xl/libxl/libxc.  That would simply result in there being two limits to
update.

I have to say, though, that the situation with this parameter seems
quite unsatisfactory.  It seems to be a kind of bodge.

The changeable limit is there to prevent excessive resource usage by a
guest.  But the docs suggest that the excessive usage might be
normal.  That sounds like a suboptimal design to me.

For reference, here are the docs proposed in this patch:

  =item B

Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Yu, Zhang



On 2/1/2016 11:14 PM, Yu, Zhang wrote:



On 2/1/2016 9:07 PM, Jan Beulich wrote:

On 01.02.16 at 13:49,  wrote:

On Mon, Feb 01, 2016 at 05:15:16AM -0700, Jan Beulich wrote:

On 01.02.16 at 13:02,  wrote:

On Mon, Feb 01, 2016 at 12:52:51AM -0700, Jan Beulich wrote:

On 30.01.16 at 15:38,  wrote:



On 1/30/2016 12:33 AM, Jan Beulich wrote:

On 29.01.16 at 11:45,  wrote:

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -940,6 +940,8 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
 {
     unsigned int i;
     int rc;
+    unsigned int max_wp_ram_ranges =
+        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES];


You're still losing the upper 32 bits here. Iirc you agreed to
range
check the value before storing into params[]...


Thanks, Jan. :)
In this version, the check is added in routine parse_config_data().
If option 'max_wp_ram_ranges' is configured with an unreasonable
value,
the xl will terminate, before calling xc_hvm_param_set(). Does this
change meet your requirement? Or maybe did I have some
misunderstanding
on this issue?


Checking in the tools is desirable, but the hypervisor shouldn't rely
on any tool side checking.



As in hypervisor needs to sanitise all input from toolstack? I don't
think Xen does that today.


If it doesn't, then that's a bug. Note that in many cases (domctl-s
and alike) such bogus trusting in the tool stack behaving correctly
is only not a security issue due to XSA-77. Yet with XSA-77 we
made quite clear that we shouldn't knowingly allow in further such
issues (it'll be hard enough to find and address all existing ones).


So are you suggesting pulling the check done in toolstack into
hypervisor?


I think the check in the tools should stay (allowing for a
distinguishable error message to be issued); all I'm saying is
that doing the check in the tools is not enough.

Jan



Thank you Jan and Wei. And sorry for the late response.
But I still do not quite understand. :)
If tool stack can guarantee the validity of a parameter,
under which circumstances will hypervisor be threatened?
I'm not familiar with XSA-77, and I'll read it ASAP.

B.R.
Yu


Sorry to bother you, Jan.
After a second thought, I guess one of the security concern
is when some APP is trying to trigger the HVMOP_set_param
directly with some illegal values.
So we also need to validate this param in hvm_allow_set_param;
currently hvm_allow_set_param does not perform any validation on the
other parameters, but we need to do it for the new ones. Is this
understanding correct?
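
For concreteness, a minimal sketch of what such a check in
hvm_allow_set_param might look like; the function signature, the switch
layout and the exact bound are assumptions for illustration, not taken
from the actual patch:

static int hvm_allow_set_param(struct domain *d,
                               const struct xen_hvm_param *a)
{
    /* ... existing permission checks elided ... */

    switch ( a->index )
    {
    case HVM_PARAM_MAX_WP_RAM_RANGES:
        /*
         * Reject anything that would be truncated when later read into
         * a 32-bit variable, and anything above half of the domain's
         * page count, mirroring the tool-side check.
         */
        if ( (a->value >> 32) || a->value > d->max_pages / 2 )
            return -EINVAL;
        break;
    }

    return 0;
}

With something like this in place, later readers of
params[HVM_PARAM_MAX_WP_RAM_RANGES] can safely narrow the value to 32
bits.
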
Another question is: as to the tool stack side, do you think
an error message would suffice? Shouldn't xl be terminated?

Thanks
Yu



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Jan Beulich
>>> On 01.02.16 at 16:14,  wrote:
> But I still do not quite understand. :)
> If tool stack can guarantee the validity of a parameter,
> under which circumstances will hypervisor be threatened?

At least in disaggregated environments the hypervisor cannot
trust the (parts of the) tool stack(s) living outside of Dom0. But
even without disaggregation in mind it is bad practice to have
the hypervisor assume the tool stack will only pass sane values.
Just take the example of the param you're introducing: You don't
even do the validation in libxc, so any (theoretical) tool stack
not based on xl/libxl would not be guaranteed to pass a sane
value. And even if you moved it into libxc, one could still argue
that there could be an even more theoretical tool stack not even
building on top of libxc.

Jan
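
To make that point concrete, a hedged sketch of such a "more
theoretical" tool stack: a few lines against libxc that push an
arbitrary value straight to the hypervisor, with no xl/libxl validation
anywhere on the path (the domid and the HVM_PARAM_MAX_WP_RAM_RANGES
constant proposed by this series are assumed for the example):

#include <stdint.h>
#include <xenctrl.h>
#include <xen/hvm/params.h>

int main(void)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    uint32_t domid = 1;   /* some HVM guest; value chosen for the example */

    if ( !xch )
        return 1;

    /* Deliberately absurd value: nothing between here and the hypercall
     * checks it, so the range check must also live in Xen itself. */
    xc_hvm_param_set(xch, domid, HVM_PARAM_MAX_WP_RAM_RANGES, ~0ULL);

    xc_interface_close(xch);
    return 0;
}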


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Yu, Zhang



On 2/2/2016 12:16 AM, Jan Beulich wrote:

On 01.02.16 at 16:14,  wrote:

But I still do not quite understand. :)
If tool stack can guarantee the validity of a parameter,
under which circumstances will hypervisor be threatened?


At least in disaggregated environments the hypervisor cannot
trust the (parts of the) tool stack(s) living outside of Dom0. But
even without disaggregation in mind it is bad practice to have
the hypervisor assume the tool stack will only pass sane values.
Just take the example of the param you're introducing: You don't
even do the validation in libxc, so any (theoretical) tool stack
not based on xl/libxl would not be guaranteed to pass a sane
value. And even if you moved it into libxc, one could still argue
that there could be an even more theoretical tool stack not even
building on top of libxc.

Jan



Great. Thank you very much for your patience in explaining this.
I just sent out another mail about my understanding a moment ago;
it seems I partially get it. :)
My vnc connection is too slow, so I will change the code tomorrow.


Yu

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Yu, Zhang



On 2/2/2016 12:35 AM, Jan Beulich wrote:

On 01.02.16 at 17:19,  wrote:

After a second thought, I guess one of the security concern
is when some APP is trying to trigger the HVMOP_set_param
directly with some illegal values.


Not sure what "directly" is supposed to mean here.


I mean a caller that does no validation of its own, like libxc...


So we also need to validate this param in hvm_allow_set_param;
currently hvm_allow_set_param does not perform any validation on the
other parameters, but we need to do it for the new ones. Is this
understanding correct?


Yes.


Another question is: as to the tool stack side, do you think
an error message would suffice? Shouldn't xl be terminated?


I have no idea what consistent behavior in such a case would
be - I'll defer input on this to the tool stack maintainers.



Thank you.
Wei, which one do you prefer?

Yu

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Wei Liu
On Mon, Feb 01, 2016 at 12:52:51AM -0700, Jan Beulich wrote:
> >>> On 30.01.16 at 15:38,  wrote:
> 
> > On 1/30/2016 12:33 AM, Jan Beulich wrote:
> > On 29.01.16 at 11:45,  wrote:
> >>> --- a/xen/arch/x86/hvm/hvm.c
> >>> +++ b/xen/arch/x86/hvm/hvm.c
> >>> @@ -940,6 +940,8 @@ static int hvm_ioreq_server_alloc_rangesets(struct 
> >>> hvm_ioreq_server *s,
> >>>   {
> >>>   unsigned int i;
> >>>   int rc;
> >>> +unsigned int max_wp_ram_ranges =
> >>> +s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES];
> >>
> >> You're still losing the upper 32 bits here. Iirc you agreed to range
> >> check the value before storing into params[]...
> > 
> > Thanks, Jan. :)
> > In this version, the check is added in routine parse_config_data().
> > If option 'max_wp_ram_ranges' is configured with an unreasonable value,
> > the xl will terminate, before calling xc_hvm_param_set(). Does this
> > change meet your requirement? Or maybe did I have some misunderstanding
> > on this issue?
> 
> Checking in the tools is desirable, but the hypervisor shouldn't rely
> on any tool side checking.
> 

As in hypervisor needs to sanitise all input from toolstack? I don't
think Xen does that today.

What is the difference between this particular configuration option and
all other options in the same hvm_set_conf_params function?

Wei.

> Jan
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Jan Beulich
>>> On 01.02.16 at 13:02,  wrote:
> On Mon, Feb 01, 2016 at 12:52:51AM -0700, Jan Beulich wrote:
>> >>> On 30.01.16 at 15:38,  wrote:
>> 
>> > On 1/30/2016 12:33 AM, Jan Beulich wrote:
>> > On 29.01.16 at 11:45,  wrote:
>> >>> --- a/xen/arch/x86/hvm/hvm.c
>> >>> +++ b/xen/arch/x86/hvm/hvm.c
>> >>> @@ -940,6 +940,8 @@ static int hvm_ioreq_server_alloc_rangesets(struct 
> hvm_ioreq_server *s,
>> >>>   {
>> >>>   unsigned int i;
>> >>>   int rc;
>> >>> +unsigned int max_wp_ram_ranges =
>> >>> +s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES];
>> >>
>> >> You're still losing the upper 32 bits here. Iirc you agreed to range
>> >> check the value before storing into params[]...
>> > 
>> > Thanks, Jan. :)
>> > In this version, the check is added in routine parse_config_data().
>> > If option 'max_wp_ram_ranges' is configured with an unreasonable value,
>> > the xl will terminate, before calling xc_hvm_param_set(). Does this
>> > change meet your requirement? Or maybe did I have some misunderstanding
>> > on this issue?
>> 
>> Checking in the tools is desirable, but the hypervisor shouldn't rely
>> on any tool side checking.
>> 
> 
> As in hypervisor needs to sanitise all input from toolstack? I don't
> think Xen does that today.

If it doesn't, then that's a bug. Note that in many cases (domctl-s
and alike) such bogus trusting in the tool stack behaving correctly
is only not a security issue due to XSA-77. Yet with XSA-77 we
made quite clear that we shouldn't knowingly allow in further such
issues (it'll be hard enough to find and address all existing ones).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Wei Liu
On Mon, Feb 01, 2016 at 05:15:16AM -0700, Jan Beulich wrote:
> >>> On 01.02.16 at 13:02,  wrote:
> > On Mon, Feb 01, 2016 at 12:52:51AM -0700, Jan Beulich wrote:
> >> >>> On 30.01.16 at 15:38,  wrote:
> >> 
> >> > On 1/30/2016 12:33 AM, Jan Beulich wrote:
> >> > On 29.01.16 at 11:45,  wrote:
> >> >>> --- a/xen/arch/x86/hvm/hvm.c
> >> >>> +++ b/xen/arch/x86/hvm/hvm.c
> >> >>> @@ -940,6 +940,8 @@ static int hvm_ioreq_server_alloc_rangesets(struct 
> > hvm_ioreq_server *s,
> >> >>>   {
> >> >>>   unsigned int i;
> >> >>>   int rc;
> >> >>> +unsigned int max_wp_ram_ranges =
> >> >>> +
> >> >>> s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES];
> >> >>
> >> >> You're still losing the upper 32 bits here. Iirc you agreed to range
> >> >> check the value before storing into params[]...
> >> > 
> >> > Thanks, Jan. :)
> >> > In this version, the check is added in routine parse_config_data().
> >> > If option 'max_wp_ram_ranges' is configured with an unreasonable value,
> >> > the xl will terminate, before calling xc_hvm_param_set(). Does this
> >> > change meet your requirement? Or maybe did I have some misunderstanding
> >> > on this issue?
> >> 
> >> Checking in the tools is desirable, but the hypervisor shouldn't rely
> >> on any tool side checking.
> >> 
> > 
> > As in hypervisor needs to sanitise all input from toolstack? I don't
> > think Xen does that today.
> 
> If it doesn't, then that's a bug. Note that in many cases (domctl-s
> and alike) such bogus trusting in the tool stack behaving correctly
> is only not a security issue due to XSA-77. Yet with XSA-77 we
> made quite clear that we shouldn't knowingly allow in further such
> issues (it'll be hard enough to find and address all existing ones).
> 

So are you suggesting pulling the check done in toolstack into
hypervisor?

Wei.

> Jan
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Wei Liu
On Fri, Jan 29, 2016 at 06:45:14PM +0800, Yu Zhang wrote:
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 25507c7..0c19dee 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "libxl.h"
>  #include "libxl_utils.h"
> @@ -1626,6 +1627,22 @@ static void parse_config_data(const char *config_source,
>  
>      if (!xlu_cfg_get_long (config, "rdm_mem_boundary", &l, 0))
>          b_info->u.hvm.rdm_mem_boundary_memkb = l * 1024;
> +
> +    if (!xlu_cfg_get_long (config, "max_wp_ram_ranges", &l, 0)) {
> +        uint64_t nr_pages = (b_info->max_memkb << 10) >> XC_PAGE_SHIFT;
> +
> +        /* Due to rangeset's ability to combine continuous ranges, this
> +         * parameter shall not be configured with values greater than half
> +         * of the number of VM's page frames. It also shall not exceed 4G,
> +         * because of the limitation from the rangeset side. */
> +        if (l > (nr_pages / 2) || l > UINT32_MAX) {
> +            fprintf(stderr, "ERROR: Invalid value for \"max_wp_ram_ranges\". "
> +                    "Shall not exceed %ld or 4G.\n", nr_pages / 2);
> +            exit(1);
> +        }
> +        b_info->u.hvm.max_wp_ram_ranges = l;
> +    }
> +
> +

Xl is only one of the applications that use libxl (the library).  This
check should be inside libxl so that all applications (xl, libvirt and
others) have the same behaviour.

Take a look at initiate_domain_create where numerous validations are
done.

Wei.
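
For illustration, a hedged sketch of where such a check could live
inside libxl, e.g. a helper called from initiate_domain_create (the
helper name, exact fields and error handling are assumptions, not taken
from an actual patch):

static int check_max_wp_ram_ranges(libxl__gc *gc,
                                   const libxl_domain_build_info *b_info)
{
    uint64_t nr_pages = (b_info->max_memkb << 10) >> XC_PAGE_SHIFT;
    uint64_t limit = b_info->u.hvm.max_wp_ram_ranges;

    /* Same bounds as the xl check: half of the guest's page frames,
     * and never more than 32 bits worth of ranges. */
    if (limit > nr_pages / 2 || limit > UINT32_MAX) {
        LOG(ERROR, "max_wp_ram_ranges must not exceed %"PRIu64" or 4G",
            nr_pages / 2);
        return ERROR_INVAL;
    }

    return 0;
}

That way xl, libvirt and any other libxl user get the same behaviour,
while xl remains free to print its own friendlier message on top.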

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Yu, Zhang



On 2/1/2016 7:57 PM, Wei Liu wrote:

On Fri, Jan 29, 2016 at 06:45:14PM +0800, Yu Zhang wrote:

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 25507c7..0c19dee 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -35,6 +35,7 @@
  #include 
  #include 
  #include 
+#include 

  #include "libxl.h"
  #include "libxl_utils.h"
@@ -1626,6 +1627,22 @@ static void parse_config_data(const char *config_source,

  if (!xlu_cfg_get_long (config, "rdm_mem_boundary", , 0))
  b_info->u.hvm.rdm_mem_boundary_memkb = l * 1024;
+
+if (!xlu_cfg_get_long (config, "max_wp_ram_ranges", , 0)) {
+uint64_t nr_pages = (b_info->max_memkb << 10) >> XC_PAGE_SHIFT;
+
+/* Due to rangeset's ability to combine continuous ranges, this
+ * parameter shall not be configured with values greater than half
+ * of the number of VM's page frames. It also shall not exceed 4G,
+ * because of the limitation from the rangeset side. */
+if (l > (nr_pages / 2) || l > UINT32_MAX) {
+fprintf(stderr, "ERROR: Invalid value for \"max_wp_ram_ranges\". 
"
+"Shall not exceed %ld or 4G.\n", nr_pages / 2);
+exit(1);
+}
+b_info->u.hvm.max_wp_ram_ranges = l;
+}
+


Xl is only one of the applications that use libxl (the library).  This
check should be inside libxl so that all applications (xl, libvirt and
others) have the same behaviour.

Take a look at initiate_domain_create where numerous validations are
done.

Wei.



Thank you, Wei. I'll try to move this part into
initiate_domain_create().

B.R.
Yu

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.

2016-02-01 Thread Yu, Zhang



On 2/1/2016 9:07 PM, Jan Beulich wrote:

On 01.02.16 at 13:49,  wrote:

On Mon, Feb 01, 2016 at 05:15:16AM -0700, Jan Beulich wrote:

On 01.02.16 at 13:02,  wrote:

On Mon, Feb 01, 2016 at 12:52:51AM -0700, Jan Beulich wrote:

On 30.01.16 at 15:38,  wrote:



On 1/30/2016 12:33 AM, Jan Beulich wrote:

On 29.01.16 at 11:45,  wrote:

--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -940,6 +940,8 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
 {
     unsigned int i;
     int rc;
+    unsigned int max_wp_ram_ranges =
+        s->domain->arch.hvm_domain.params[HVM_PARAM_MAX_WP_RAM_RANGES];


You're still losing the upper 32 bits here. Iirc you agreed to range
check the value before storing into params[]...


Thanks, Jan. :)
In this version, the check is added in routine parse_config_data().
If option 'max_wp_ram_ranges' is configured with an unreasonable value,
the xl will terminate, before calling xc_hvm_param_set(). Does this
change meet your requirement? Or maybe did I have some misunderstanding
on this issue?


Checking in the tools is desirable, but the hypervisor shouldn't rely
on any tool side checking.



As in hypervisor needs to sanitise all input from toolstack? I don't
think Xen does that today.


If it doesn't, then that's a bug. Note that in many cases (domctl-s
and alike) such bogus trusting in the tool stack behaving correctly
is only not a security issue due to XSA-77. Yet with XSA-77 we
made quite clear that we shouldn't knowingly allow in further such
issues (it'll be hard enough to find and address all existing ones).


So are you suggesting pulling the check done in toolstack into
hypervisor?


I think the check in the tools should stay (allowing for a
distinguishable error message to be issued); all I'm saying is
that doing the check in the tools is not enough.

Jan



Thank you Jan and Wei. And sorry for the late response.
But I still do not quite understand. :)
If tool stack can guarantee the validity of a parameter,
under which circumstances will hypervisor be threatened?
I'm not familiar with XSA-77, and I'll read it ASAP.

B.R.
Yu


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

