Re: [Xen-devel] [PATCH] domain_create: honour global grant/maptrack frame limits...

2019-11-13 Thread Durrant, Paul
Sorry, the Cc list got dropped... I'll re-send.

  Paul

> -----Original Message-----
> From: Paul Durrant 
> Sent: 13 November 2019 13:47
> To: xen-devel@lists.xenproject.org
> Cc: Durrant, Paul 
> Subject: [PATCH] domain_create: honour global grant/maptrack frame
> limits...
> 
> ...when their values are larger than the per-domain configured limits.
> 
> Signed-off-by: Paul Durrant 
> ---
> After mining through commits it is still unclear to me exactly when Xen
> stopped honouring the global values, but I really think this commit should
> be back-ported to stable trees as it was a behavioural change that can
> cause domUs to fail in non-obvious ways.
> ---
>  xen/common/domain.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 66c7fc..aad6d55b82 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -335,6 +335,7 @@ struct domain *domain_create(domid_t domid,
>  enum { INIT_watchdog = 1u<<1,
> INIT_evtchn = 1u<<3, INIT_gnttab = 1u<<4, INIT_arch = 1u<<5 };
>  int err, init_status = 0;
> +unsigned int max_grant_frames, max_maptrack_frames;
> 
>  if ( config && (err = sanitise_domain_config(config)) )
>  return ERR_PTR(err);
> @@ -456,8 +457,17 @@ struct domain *domain_create(domid_t domid,
>  goto fail;
>  init_status |= INIT_evtchn;
> 
> -if ( (err = grant_table_init(d, config->max_grant_frames,
> - config->max_maptrack_frames)) != 0 )
> +/*
> + * Make sure that the configured values don't reduce any
> + * global command line override.
> + */
> +max_grant_frames = max(config->max_grant_frames,
> +   opt_max_grant_frames);
> +max_maptrack_frames = max(config->max_maptrack_frames,
> +  opt_max_maptrack_frames);
> +
> +if ( (err = grant_table_init(d, max_grant_frames,
> + max_maptrack_frames)) != 0 )
>  goto fail;
>  init_status |= INIT_gnttab;
> 
> --
> 2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] domain_create: honour global grant/maptrack frame limits...

2019-11-13 Thread Durrant, Paul
> -----Original Message-----
> From: Andrew Cooper 
> Sent: 13 November 2019 14:05
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH] domain_create: honour global
> grant/maptrack frame limits...
> 
> On 13/11/2019 13:47, Paul Durrant wrote:
> > ...when their values are larger than the per-domain configured limits.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > After mining through commits it is still unclear to me exactly when Xen
> > stopped honouring the global values, but I really think this commit
> should
> > be back-ported to stable trees as it was a behavioural change that can
> > cause domUs to fail in non-obvious ways.
> 
> -1.  Overriding toolstack settings like this is the same kind of "bad"
> as silently converting HAP => Shadow.
> 
> In particular, this breaks one of the points of the original per-domain work
> to deliberately allow stub xenstored to be configured with tiny
> grant/maptrack tables.

Ok, but IMO subtly breaking domUs in the process is not really acceptable 
behaviour.

> 
> You also break the setting of these parameters in xl.conf.

No I don't. xl.conf can still increase values over the command line.

> 
> I'm not defending how the interface changed subtly/unexpectedly; that was
> bad and we should have done better, but this change is just as bad in
> the opposite direction.
> 
> The way to fix this is to plumb the Xen default parameters through so
> that, in the absence of any explicit configuration (vm.cfg or xl.conf),
> libxl can then use "xen cmdline" as a source of configuration, before
> falling back to hardcoded numbers.
> 

I agree that is the best way to fix it, but not so easy to backport. So my 
proposal (via email a few days ago) was that regressions are fixed immediately 
in this way and then a *proper* fix is done moving forward, which I shall base 
upon Juergen's patches which should allow easy retrieval of the command line 
values.

  Paul
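
The lookup order Andrew proposes above (explicit guest config, then xl.conf, then the Xen command line, then a hardcoded number) could be sketched roughly as follows. This is only an illustration under stated assumptions: `struct frames_cfg`, `FRAMES_UNSET`, `FRAMES_HARDCODED_DEFAULT` and `resolve_max_frames()` are hypothetical names, not the libxl interface, and the fallback of 32 is simply the default mentioned elsewhere in this thread.

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "nothing was configured at this level". */
#define FRAMES_UNSET UINT32_MAX

/* Hypothetical last-resort value, used only when every source is unset. */
#define FRAMES_HARDCODED_DEFAULT 32u

/* Hypothetical bundle of the possible sources for one frame limit. */
struct frames_cfg {
    uint32_t vm_cfg;      /* per-domain value from the guest config file */
    uint32_t xl_conf;     /* toolstack-wide value from xl.conf */
    uint32_t xen_cmdline; /* value read back from the Xen command line */
};

/* Return the first configured value, most specific source first. */
static uint32_t resolve_max_frames(const struct frames_cfg *cfg)
{
    if (cfg->vm_cfg != FRAMES_UNSET)
        return cfg->vm_cfg;
    if (cfg->xl_conf != FRAMES_UNSET)
        return cfg->xl_conf;
    if (cfg->xen_cmdline != FRAMES_UNSET)
        return cfg->xen_cmdline;
    return FRAMES_HARDCODED_DEFAULT;
}
```

Unlike taking the maximum in domain_create(), this ordering would still let a toolstack deliberately request a value smaller than the command line setting (the stub xenstored case), while an upgraded host with only gnttab_max_frames=128 on the command line would keep getting 128.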


Re: [Xen-devel] [PATCH v2 0/2] AMD/IOMMU: re-work mode updating

2019-11-14 Thread Durrant, Paul
> -----Original Message-----
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 14 November 2019 16:42
> To: xen-devel@lists.xenproject.org
> Cc: Juergen Gross ; Sander Eikelenboom
> ; Andrew Cooper 
> Subject: [Xen-devel] [PATCH v2 0/2] AMD/IOMMU: re-work mode updating
> 
> update_paging_mode() in the AMD IOMMU code expects to be invoked with
> the PCI devices lock held. The check occurring only when the mode
> actually needs updating, the violation of this rule by the majority
> of callers did go unnoticed until per-domain IOMMU setup was changed
> to do away with on-demand creation of IOMMU page tables.

Wouldn't it be safer to just get rid of update_paging_mode() and start with a 
reasonable number of levels?

  Paul

> 
> Unfortunately the only half way reasonable fix to this that I could
> come up with requires more re-work than would seem desirable at this
> time of the release process, but addressing the issue seems
> unavoidable to me as its manifestation is a regression from the
> IOMMU page table setup re-work. The change also isn't without risk
> of further regressions - if in patch 2 I've missed a code path that
> would also need to invoke the new hook, then this might mean non-
> working guests (with passed-through devices on AMD hardware).
> 
> 1: introduce GFN notification for translated domains
> 2: AMD/IOMMU: use notify_dfn() hook to update paging mode
> 
> Jan
> 

Re: [Xen-devel] Call for new Release Manager for Xen 4.14+

2019-11-15 Thread Durrant, Paul
> -----Original Message-----
> From: Lars Kurth 
> Sent: 07 November 2019 22:30
> To: xen-devel ; Juergen Gross
> 
> Cc: committ...@xenproject.org; Durrant, Paul ; Brian
> Woods 
> Subject: Call for new Release Manager for Xen 4.14+
> 
> Dear Community Members,
> 
> Juergen will be stepping down as Release Manager after Xen 4.13 has been
> delivered, following the 4.11 and 4.12 releases. Release managers prior to
> Juergen were Julien Grall, Konrad Wilk, Wei Liu and George Dunlap. We are
> looking for active community members to follow in previous release
> managers' footsteps. I also wanted to thank Juergen for performing the
> role.
> 
> We have discussed with a number of people, however Wei made the very valid
> point that we should make an announcement about the role on the list.  In
> terms of effort, the effort required prior to the release is relatively
> low (1-2 days a month), however in the last two months of the release it
> goes up to 1-2 days per week. Typically release managers manage 2-3 releases.
> 
> What is involved in the role is described here:
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/process/xen-release-management.pandoc;h=d6abc90a0248b769161bce79e8dc6904c654904a;hb=HEAD
> 
> If you are a community member who feels the release manager role would be
> a good match for you, please contact me. Also feel free to ask me or
> previous release managers any questions.

[Replying publicly as requested by Lars]

I would be happy to do the job, so you can consider me a candidate.

  Paul

> 
> Best Regards
> Lars


Re: [Xen-devel] [PATCH v2 0/2] AMD/IOMMU: re-work mode updating

2019-11-15 Thread Durrant, Paul
> -----Original Message-----
> From: Jan Beulich 
> Sent: 15 November 2019 09:29
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Sander Eikelenboom ;
> Juergen Gross 
> Subject: Re: [Xen-devel] [PATCH v2 0/2] AMD/IOMMU: re-work mode updating
> 
> On 14.11.2019 18:29,  Durrant, Paul  wrote:
> >> -----Original Message-----
> >> From: Xen-devel  On Behalf Of
> Jan
> >> Beulich
> >> Sent: 14 November 2019 16:42
> >> To: xen-devel@lists.xenproject.org
> >> Cc: Juergen Gross ; Sander Eikelenboom
> >> ; Andrew Cooper 
> >> Subject: [Xen-devel] [PATCH v2 0/2] AMD/IOMMU: re-work mode updating
> >>
> >> update_paging_mode() in the AMD IOMMU code expects to be invoked with
> >> the PCI devices lock held. The check occurring only when the mode
> >> actually needs updating, the violation of this rule by the majority
> >> of callers did go unnoticed until per-domain IOMMU setup was changed
> >> to do away with on-demand creation of IOMMU page tables.
> >
> > Wouldn't it be safer to just get rid of update_paging_mode() and start
> > with a reasonable number of levels?
> 
> Andrew did basically ask the same, but I continue to be unconvinced:
> We can't pick a "reasonable" level, we have to pick the maximum a
> guest may end up using. Yet why would we want to have all guests pay
> the price of at least one unnecessary page walk level? I don't mean
> to say I'm entirely opposed, but trading code simplicity for
> performance is almost never an easy or obvious decision.

I think in this case, versus the hoops your patches have to jump through just 
to save (possibly) a level of IOMMU page walk, the simplicity argument is quite 
compelling... particularly at this stage in the release cycle.
The fact that we don't know, at start of day, what the max gfn of the guest is 
going to be is also something that really ought to be fixed too... but that is 
another debate.

  Paul

> 
> Jan
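
To make the trade-off in this exchange concrete, here is a rough sketch of how the number of page table levels relates to the highest guest frame number that has to be mapped. It assumes 4k pages and 512 entries (9 address bits) per table level; `BITS_PER_LEVEL` and `levels_for_max_gfn()` are hypothetical names, and this is not the code used by Xen's AMD IOMMU driver.

```c
#include <stdint.h>

/* Assumption: each page table level resolves 9 bits (512 entries). */
#define BITS_PER_LEVEL 9

/* Smallest number of levels able to index every frame in 0..max_gfn. */
static unsigned int levels_for_max_gfn(uint64_t max_gfn)
{
    unsigned int levels = 1;

    /* Every additional level extends the reachable frame number by 9 bits. */
    while ((max_gfn >> BITS_PER_LEVEL) != 0) {
        max_gfn >>= BITS_PER_LEVEL;
        levels++;
    }

    return levels;
}
```

Under these assumptions any guest whose highest frame number is below 2^27 (512 GiB of guest physical address space) fits in three levels, so fixing the level count at whatever the largest possible guest needs would cost such guests at least one extra level per walk. That is the price Jan refers to, weighed against the simplicity Paul argues for.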

Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking and logging

2019-11-04 Thread Durrant, Paul
> -----Original Message-----
> From: Anthony PERARD 
> Sent: 04 November 2019 12:14
> To: Durrant, Paul 
> Cc: Andrew Cooper ; xen-
> de...@lists.xenproject.org; jgr...@suse.com; Igor Druzhinin
> ; jbeul...@suse.com
> Subject: Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking
> and logging
> 
> On Mon, Nov 04, 2019 at 11:13:48AM +, Durrant, Paul wrote:
> > > -----Original Message-----
> > > From: Andrew Cooper 
> > > Sent: 04 November 2019 11:06
> > > To: Durrant, Paul ; xen-
> de...@lists.xenproject.org
> > > Cc: Igor Druzhinin ; jgr...@suse.com;
> > > jbeul...@suse.com
> > > Subject: Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify
> locking
> > > and logging
> > >
> > > On 04/11/2019 08:31, Durrant, Paul wrote:
> > > >> -----Original Message-----
> > > >> From: Igor Druzhinin 
> > > >> Sent: 01 November 2019 19:28
> > > >> To: xen-devel@lists.xenproject.org
> > > >> Cc: Durrant, Paul ; jbeul...@suse.com;
> > > >> jgr...@suse.com
> > > >> Subject: [PATCH for-4.13 v2] passthrough: simplify locking and
> logging
> > > >>
> > > >> From: Paul Durrant 
> > > >>
> > > >>
> > > >> Signed-off-by: Paul Durrant 
> > > >> ---
> > > >>
> > > >>
> > > >> v2: updated Paul's email address
> > >
> > > This was work you did at Citrix, yes?
> > >
> > > > Reviewed-by: Paul Durrant 
> > >
> > > SoB and R-by?
> >
> > I did do the work while I was at Citrix, but surely the SoB must be
> updated since the patch is only now being posted?
> 
> I don't think it matters when a patch is publicly posted, the SoB
> shouldn't change.
> Also, Igor, I think you need to add your own SoB to the patch. This would
> be because of (b) or (c) of the "Developer's Certificate of Origin 1.1"
> [1].
> 
> > As for the R-b, why should that be historic?
> 
> I think he meant that reviewing your own work is a bit weird. On the
> other hand, it is possible to have both a SoB and a R-b from the same
> person, if the original patch has been modified.

I was merely reviewing the change of email address and verifying that it was 
the patch I wrote :-)

 Paul

> 
> 
> [1]:
> Developer's Certificate of Origin 1.1
> 
> By making a contribution to this project, I certify that:
> 
> (a) The contribution was created in whole or in part by me and I
> have the right to submit it under the open source license
> indicated in the file; or
> 
> (b) The contribution is based upon previous work that, to the best
> of my knowledge, is covered under an appropriate open source
> license and I have the right under that license to submit that
> work with modifications, whether created in whole or in part
> by me, under the same open source license (unless I am
> permitted to submit under a different license), as indicated
> in the file; or
> 
> (c) The contribution was provided directly to me by some other
> person who certified (a), (b) or (c) and I have not modified
> it.
> 
> (d) I understand and agree that this project and the contribution
> are public and that a record of the contribution (including all
> personal information I submit with it, including my sign-off) is
> maintained indefinitely and may be redistributed consistent with
> this project or the open source license(s) involved.
> 
> 
> Cheers,
> 
> --
> Anthony PERARD


Re: [Xen-devel] max_grant_frames/max_maptrack_frames

2019-11-08 Thread Durrant, Paul
> -----Original Message-----
> From: Jürgen Groß 
> Sent: 08 November 2019 12:14
> To: Jan Beulich ; Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] max_grant_frames/max_maptrack_frames
> 
> On 08.11.19 12:38, Jan Beulich wrote:
> > On 08.11.2019 09:45,  Durrant, Paul  wrote:
> >> When per-domain options for maximum grant and maptrack frames came in
> >> (in 4.10?) Xen's behaviour w.r.t. the global command line values
> >> (gnttab_max_frames and gnttab_max_maptrack_frames respectively) regressed.
> >>
> >> For example, a host running a prior version of Xen with a command line
> >> setting gnttab_max_frames=128 would have all of its domUs running with 128
> >> frames. However, after update to a newer Xen, they will only get 32 frames
> >> (unless the host is particularly large, in which case they will get 64).
> >> Why is this? It's because neither xl.cfg files, nor xl.conf, will specify
> >> values (because the scenario is an update from an older installation) and
> >> so the hardcoded 32/64 default applies. Hence some domUs with large
> >> numbers of PV devices start failing (or at least substantially slow down)
> >> and admins start wondering what's going on.
> >>
> >> So how best to fix this?
> >>
> >> For the sake of a quick fix for the regression, and ease of back-porting,
> >> I think it would be best to add a check in domain_create() and
> >> create the grant table with parameters which are the larger of the
> >> toolstack configured value and the corresponding command line value.
> >
> > How about people simply setting the value in xl.conf, if indeed it can be
> > set there?
> >
> >> This does, however, go against the recent direction of the toolstack
> >> getting exactly what it asked for. So for the longer term I am wondering
> >> whether there ought to be a way for the toolstack to query the globally
> >> configured grant table limits. A GNTTABOP seems the wrong candidate for
> >> this, since GNTTABOPs are per-domain, so I'm wondering about a new sysctl
> >> to return the value of a named command line parameter.
> >
> > Such a series was already posted (and even had some review, so it's
> > already at v4, but iirc no update has been provided since May):
> > https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg02206.html
> 
> My "Hypervisor file system" series includes that functionality:
> 
> https://patchew.org/Xen/20191002112004.25793-1-jgr...@suse.com/
> 

Oh, even better :-)

  Paul

> 
> Juergen


Re: [Xen-devel] max_grant_frames/max_maptrack_frames

2019-11-08 Thread Durrant, Paul
> -----Original Message-----
> From: Jan Beulich 
> Sent: 08 November 2019 11:38
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] max_grant_frames/max_maptrack_frames
> 
> On 08.11.2019 09:45,  Durrant, Paul  wrote:
> > When per-domain options for maximum grant and maptrack frames came in
> > (in 4.10?) Xen's behaviour w.r.t. the global command line values
> > (gnttab_max_frames and gnttab_max_maptrack_frames respectively) regressed.
> >
> > For example, a host running a prior version of Xen with a command line
> > setting gnttab_max_frames=128 would have all of its domUs running with 128
> > frames. However, after update to a newer Xen, they will only get 32 frames
> > (unless the host is particularly large, in which case they will get 64).
> > Why is this? It's because neither xl.cfg files, nor xl.conf, will specify
> > values (because the scenario is an update from an older installation) and
> > so the hardcoded 32/64 default applies. Hence some domUs with large
> > numbers of PV devices start failing (or at least substantially slow down)
> > and admins start wondering what's going on.
> >
> > So how best to fix this?
> >
> > For the sake of a quick fix for the regression, and ease of back-porting,
> > I think it would be best to add a check in domain_create() and
> > create the grant table with parameters which are the larger of the
> > toolstack configured value and the corresponding command line value.
> 
> How about people simply setting the value in xl.conf, if indeed it can be
> set there?

It could be set there, but that's really not the right solution. A set of 
command line parameters that appropriately configured the host on an older Xen 
really ought to continue to do the same after installation of the newer Xen, 
without any additional config requirements.

> 
> > This does, however, go against the recent direction of the toolstack
> > getting exactly what it asked for. So for the longer term I am wondering
> > whether there ought to be a way for the toolstack to query the globally
> > configured grant table limits. A GNTTABOP seems the wrong candidate for
> > this, since GNTTABOPs are per-domain, so I'm wondering about a new sysctl
> > to return the value of a named command line parameter.
> 
> Such a series was already posted (and even had some review, so it's
> already at v4, but iirc no update has been provided since May):
> https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg02206.html

Ok, I'll take a look. Thanks,

  Paul

> 
> Jan

[Xen-devel] max_grant_frames/max_maptrack_frames

2019-11-08 Thread Durrant, Paul
Picking up the discussion from IRC to make it more widely visible...

When per-domain options for maximum grant and maptrack frames came in (in 
4.10?) Xen's behaviour w.r.t. the global command line values 
(gnttab_max_frames and gnttab_max_maptrack_frames respectively) regressed.

For example, a host running a prior version of Xen with a command line setting 
gnttab_max_frames=128 would have all of its domUs running with 128 frames. 
However, after update to a newer Xen, they will only get 32 frames (unless the 
host is particularly large, in which case they will get 64). Why is this? It's 
because neither xl.cfg files, nor xl.conf, will specify values (because the 
scenario is an update from an older installation) and so the hardcoded 32/64 
default applies. Hence some domUs with large numbers of PV devices start 
failing (or at least substantially slow down) and admins start wondering what's 
going on.

So how best to fix this?

For the sake of a quick fix for the regression, and ease of back-porting, I 
think it would be best to add a check in domain_create() and create the grant 
table with parameters which are the larger of the toolstack configured value 
and the corresponding command line value. This does, however, go against the 
recent direction of the toolstack getting exactly what it asked for. So for the 
longer term I am wondering whether there ought to be a way for the toolstack to 
query the globally configured grant table limits. A GNTTABOP seems the wrong 
candidate for this, since GNTTABOPs are per-domain, so I'm wondering about a 
new sysctl to return the value of a named command line parameter.

Thoughts?

  Paul


Re: [Xen-devel] [PATCH 0/2] xen/blkback: Aggressively shrink page pools if a memory pressure is detected

2019-12-04 Thread Durrant, Paul
> -----Original Message-----
> From: Xen-devel  On Behalf Of
> SeongJae Park
> Sent: 04 December 2019 11:34
> To: konrad.w...@oracle.com; roger@citrix.com; ax...@kernel.dk
> Cc: sj38.p...@gmail.com; xen-devel@lists.xenproject.org; linux-
> bl...@vger.kernel.org; linux-ker...@vger.kernel.org; Park, Seongjae
> 
> Subject: [Xen-devel] [PATCH 0/2] xen/blkback: Aggressively shrink page
> pools if a memory pressure is detected
> 
> Each `blkif` has a free pages pool for the grant mapping.  The size of
> the pool starts from zero and is increased on demand while processing
> the I/O requests.  If current I/O request handling is finished or 100
> milliseconds have passed since the last I/O request handling, it checks and
> shrinks the pool to not exceed the size limit, `max_buffer_pages`.
> 
> Therefore, `blkfront` running guests can cause memory pressure in the
> `blkback` running guest by attaching an arbitrarily large number of block
> devices and inducing I/O.

OOI... How do guests unilaterally cause the attachment of arbitrary numbers of 
PV devices?

  Paul
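
The pool behaviour described in the quoted cover letter can be modelled roughly as below. This is a simplified user-space sketch that only tracks page counts; it is not the blkback code. `struct page_pool`, `pool_grow()` and `pool_shrink_to_limit()` are hypothetical names, and only the `max_buffer_pages` limit is taken from the text.

```c
#include <stddef.h>

/* Hypothetical model of one blkif's free page pool (counts only). */
struct page_pool {
    size_t free_pages;       /* pages currently held in the pool */
    size_t max_buffer_pages; /* size limit enforced when shrinking */
};

/* The pool starts empty and grows on demand as I/O is processed. */
static void pool_grow(struct page_pool *pool, size_t nr)
{
    pool->free_pages += nr;
}

/*
 * Run when request handling finishes or the 100 ms interval elapses:
 * drop whatever exceeds the limit and return how many pages were released.
 */
static size_t pool_shrink_to_limit(struct page_pool *pool)
{
    size_t excess = 0;

    if (pool->free_pages > pool->max_buffer_pages) {
        excess = pool->free_pages - pool->max_buffer_pages;
        pool->free_pages = pool->max_buffer_pages;
    }

    return excess;
}
```

In this model each pool can keep up to `max_buffer_pages` pages after shrinking, so the memory retained by the backend grows with the number of attached devices. That appears to be the pressure the cover letter is concerned with, although, as the question above points out, the number of devices is not something a guest decides on its own.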


Re: [Xen-devel] Xen 4.14 and future work

2019-12-03 Thread Durrant, Paul
> -----Original Message-----
> From: Xen-devel  On Behalf Of
> Andrew Cooper
> Sent: 02 December 2019 19:52
> To: Xen-devel List 
> Subject: [Xen-devel] Xen 4.14 and future work
> 
> Hello,
> 
> Now that 4.13 is on its way out of the door, it is time to look to
> ongoing work.
> 
> We have a large backlog of speculation-related work.  For one, we still
> don't virtualise MSR_ARCH_CAPS for guests, or use eIBRS ourselves in
> Xen.  Therefore, while Xen does function on Cascade Lake, support is
> distinctly suboptimal.
> 
> Similarly, AMD systems frequently fill /var/log with:
> 
> (XEN) emul-priv-op.c:1113:d0v13 Domain attempted WRMSR c0011020 from
> 0x00064040 to 0x000640400400
> 
> which is an interaction between Linux's prctl() to disable memory
> disambiguation on a per-process basis, Xen's write/discard behaviour for
> MSRs, and the long-overdue series to properly virtualise SSBD support on
> AMD hardware.  AMD Rome hardware, like Cascade Lake, has certain hardware
> speculative mitigation features which need virtualising for guests to
> make use of.
> 

I assume this would be addressed by the proposed cpuid/msr policy work? I think it 
is quite vital for Xen that we are able to migrate guests across pools of 
heterogeneous h/w and therefore I'd like to see this done in 4.14 if possible.

> 
> Similarly, there is plenty more work to do with core-aware scheduling,
> and from my side of things, sane guest topology.  This will eventually
> unblock one of the factors on the hard 128 vcpu limit for HVM guests.
> 
> 
> Another big area is the stability of toolstack hypercalls.  This is a
> crippling pain point for distros and upgradeability of systems, and
> there is frankly no justifiable reason for the way we currently do
> things.  The real reason is inertia from back in the days when Xen.git
> (bitkeeper as it was back then) contained a fork of every relevant
> piece of software, but this is a long-since obsolete model that is still
> causing us pain.  I will follow up with a proposal in due course, but as
> a oneliner, it will build on the dm_op() API model.

This is also fairly vital for the work on live update of Xen (as discussed at 
the last dev summit). Any instability in the tools ABI will compromise 
hypervisor update and fixing such issues on an ad-hoc basis as they arise is 
not really a desirable prospect.

> 
> Likely included within this is making the domain/vcpu destroy paths
> idempotent so we can fix a load of NULL pointer dereferences in Xen
> caused by XEN_DOMCTL_max_vcpus not being part of XEN_DOMCTL_createdomain.
> 
> Other work in this area involves adding X86_EMUL_{VIRIDIAN,NESTED_VIRT}
> to replace their existing problematic enablement interfaces.
> 

I think this should include deprecation of HVMOP_get/set_param as far as is 
possible (i.e. tools use)...

> 
> A start needs to be made on a total rethink of the HVM ABI.  This has
> come up repeatedly at previous dev summits, and is in desperate need of
> having some work started on it.
> 

...and completely in any new ABI.

I wonder to what extent we can provide a guest-side compat layer here, 
otherwise it would be hard to get traction I think.
There was an interesting talk at KVM Forum (https://sched.co/Tmuy) on dealing 
with emulation inside guest context by essentially re-injecting the VMEXITs 
back into the guest for pseudo-SMM code (loaded as part of the firmware blob) 
to deal with. I could imagine potentially using such a mechanism to have a 
'legacy' hypercall translated to the new ABI, which would allow older guests to 
be supported unmodified (albeit with a performance penalty). Such a mechanism 
may also be useful as an alternative way of dealing with some of the emulation 
dealt with directly in Xen at the moment, to reduce the hypervisor attack 
surface e.g. stdvga caching, hpet, rtc... perhaps.

Cheers,

  Paul

Re: [Xen-devel] [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory pressure

2019-12-09 Thread Durrant, Paul
> -----Original Message-----
> From: Jürgen Groß 
> Sent: 09 December 2019 09:39
> To: Park, Seongjae ; ax...@kernel.dk;
> konrad.w...@oracle.com; roger@citrix.com
> Cc: linux-bl...@vger.kernel.org; linux-ker...@vger.kernel.org; Durrant,
> Paul ; sj38.p...@gmail.com; xen-
> de...@lists.xenproject.org
> Subject: Re: [PATCH v3 0/1] xen/blkback: Squeeze page pools if a memory
> pressure
> 
> On 09.12.19 09:58, SeongJae Park wrote:
> > Each `blkif` has a free pages pool for the grant mapping.  The size of
> > the pool starts from zero and is increased on demand while processing
> > the I/O requests.  If current I/O request handling is finished or 100
> > milliseconds have passed since the last I/O request handling, it checks and
> > shrinks the pool to not exceed the size limit, `max_buffer_pages`.
> >
> > Therefore, `blkfront` running guests can cause memory pressure in the
> > `blkback` running guest by attaching a large number of block devices and
> > inducing I/O.
> 
> I'm having trouble understanding how a guest can attach a large number
> of block devices without those having been configured by the host admin
> before.
> 
> If those devices have been configured, dom0 should be ready for that
> number of devices, e.g. by having enough spare memory area for ballooned
> pages.
> 
> So either I'm missing something here or your reasoning for the need of
> the patch is wrong.
> 

I think the underlying issue is that persistent grant support is hogging memory 
in the backends, thereby compromising scalability. IIUC this patch is 
essentially a band-aid to get back to the scalability that was possible before 
persistent grant support was added. Ultimately the right answer should be to 
get rid of persistent grants support and use grant copy, but such a change is 
clearly more invasive and would need far more testing.

  Paul

Re: [Xen-devel] [PATCH-for-5.0 v3 5/6] hw/pci-host/i440fx: Extract the IGD passthrough host bridge device

2019-12-09 Thread Durrant, Paul
> -----Original Message-----
> From: Xen-devel  On Behalf Of
> Philippe Mathieu-Daudé
> Sent: 09 December 2019 09:50
> To: qemu-de...@nongnu.org
> Cc: Thomas Huth ; Stefano Stabellini
> ; Michael S. Tsirkin ; Paul
> Durrant ; Markus Armbruster ; Alex
> Williamson ; Marcel Apfelbaum
> ; Paolo Bonzini ; Anthony
> Perard ; xen-devel@lists.xenproject.org;
> Philippe Mathieu-Daudé 
> Subject: [Xen-devel] [PATCH-for-5.0 v3 5/6] hw/pci-host/i440fx: Extract
> the IGD passthrough host bridge device
> 
> We can use a i440FX without the IGD passthrough host bridge.
> Extract it into a new file, 'hw/pci-host/igd_pt.c'.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Acked-by: Paul Durrant 

> ---
> v3:
> - Rename as 'xen_igd_pt.c' (Alex Williamson)
> - Add an entry in MAINTAINERS::Xen
> ---
>  hw/pci-host/i440fx.c  |  84 --
>  hw/pci-host/xen_igd_pt.c  | 120 ++
>  MAINTAINERS   |   1 +
>  hw/pci-host/Makefile.objs |   1 +
>  4 files changed, 122 insertions(+), 84 deletions(-)
>  create mode 100644 hw/pci-host/xen_igd_pt.c
> 
> diff --git a/hw/pci-host/i440fx.c b/hw/pci-host/i440fx.c
> index 414138595b..bae7b42327 100644
> --- a/hw/pci-host/i440fx.c
> +++ b/hw/pci-host/i440fx.c
> @@ -368,89 +368,6 @@ static const TypeInfo i440fx_info = {
>  },
>  };
> 
> -/* IGD Passthrough Host Bridge. */
> -typedef struct {
> -uint8_t offset;
> -uint8_t len;
> -} IGDHostInfo;
> -
> -/* Here we just expose minimal host bridge offset subset. */
> -static const IGDHostInfo igd_host_bridge_infos[] = {
> -{PCI_REVISION_ID, 2},
> -{PCI_SUBSYSTEM_VENDOR_ID, 2},
> -{PCI_SUBSYSTEM_ID,2},
> -{0x50,2}, /* SNB: processor graphics control
> register */
> -{0x52,2}, /* processor graphics control register
> */
> -{0xa4,4}, /* SNB: graphics base of stolen memory
> */
> -{0xa8,4}, /* SNB: base of GTT stolen memory */
> -};
> -
> -static void host_pci_config_read(int pos, int len, uint32_t *val, Error
> **errp)
> -{
> -int rc, config_fd;
> -/* Access real host bridge. */
> -char *path =
> g_strdup_printf("/sys/bus/pci/devices/%04x:%02x:%02x.%d/%s",
> - 0, 0, 0, 0, "config");
> -
> -config_fd = open(path, O_RDWR);
> -if (config_fd < 0) {
> -error_setg_errno(errp, errno, "Failed to open: %s", path);
> -goto out;
> -}
> -
> -if (lseek(config_fd, pos, SEEK_SET) != pos) {
> -error_setg_errno(errp, errno, "Failed to seek: %s", path);
> -goto out_close_fd;
> -}
> -
> -do {
> -rc = read(config_fd, (uint8_t *)val, len);
> -} while (rc < 0 && (errno == EINTR || errno == EAGAIN));
> -if (rc != len) {
> -error_setg_errno(errp, errno, "Failed to read: %s", path);
> -}
> -
> -out_close_fd:
> -close(config_fd);
> -out:
> -g_free(path);
> -}
> -
> -static void igd_pt_i440fx_realize(PCIDevice *pci_dev, Error **errp)
> -{
> -uint32_t val = 0;
> -size_t i;
> -int pos, len;
> -Error *local_err = NULL;
> -
> -for (i = 0; i < ARRAY_SIZE(igd_host_bridge_infos); i++) {
> -pos = igd_host_bridge_infos[i].offset;
> -len = igd_host_bridge_infos[i].len;
> -host_pci_config_read(pos, len, &val, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> -return;
> -}
> -pci_default_write_config(pci_dev, pos, val, len);
> -}
> -}
> -
> -static void igd_passthrough_i440fx_class_init(ObjectClass *klass, void
> *data)
> -{
> -DeviceClass *dc = DEVICE_CLASS(klass);
> -PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> -
> -k->realize = igd_pt_i440fx_realize;
> -dc->desc = "IGD Passthrough Host bridge";
> -}
> -
> -static const TypeInfo igd_passthrough_i440fx_info = {
> -.name  = TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE,
> -.parent= TYPE_I440FX_PCI_DEVICE,
> -.instance_size = sizeof(PCII440FXState),
> -.class_init= igd_passthrough_i440fx_class_init,
> -};
> -
>  static const char *i440fx_pcihost_root_bus_path(PCIHostState
> *host_bridge,
>  PCIBus *rootbus)
>  {
> @@ -495,7 +412,6 @@ static const TypeInfo i440fx_pcihost_info = {
>  static void i440fx_register_types(void)
>  {
>  type_register_static(&i440fx_info);
> -type_register_static(&igd_passthrough_i440fx_info);
>  type_register_static(&i440fx_pcihost_info);
>  }
> 
> diff --git a/hw/pci-host/xen_igd_pt.c b/hw/pci-host/xen_igd_pt.c
> new file mode 100644
> index 00..efcc9347ff
> --- /dev/null
> +++ b/hw/pci-host/xen_igd_pt.c
> @@ -0,0 +1,120 @@
> +/*
> + * QEMU Intel IGD Passthrough Host Bridge Emulation
> + *
> + * Copyright (c) 2006 Fabrice Bellard
> + *
> + * SPDX-License-Identifier: MIT
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining
> a copy
> + * of this software 

Re: [Xen-devel] [PATCH-for-5.0 v3 6/6] hw/pci-host: Add Kconfig entry to select the IGD Passthrough Host Bridge

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Philippe Mathieu-Daudé
> Sent: 09 December 2019 09:50
> To: qemu-de...@nongnu.org
> Cc: Thomas Huth ; Stefano Stabellini
> ; Michael S. Tsirkin ; Paul
> Durrant ; Markus Armbruster ; Alex
> Williamson ; Marcel Apfelbaum
> ; Paolo Bonzini ; Anthony
> Perard ; xen-devel@lists.xenproject.org;
> Philippe Mathieu-Daudé 
> Subject: [Xen-devel] [PATCH-for-5.0 v3 6/6] hw/pci-host: Add Kconfig entry
> to select the IGD Passthrough Host Bridge
> 
> Add the XEN_IGD_PASSTHROUGH Kconfig option.
> 
> Xen build has that option selected by default. Non-Xen builds now
> have to select this feature manually.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> v3: Only default with Xen (Alex Williamson)
> 
> I did not use 'depends on XEN' as suggested by Alex but
> 'default y if XEN', so one can build XEN without this feature
> (for example, on other ARCH than X86).

Allowing it to be compiled out for Xen builds is quite reasonable IMO. I don't 
believe it is widely used.

Acked-by: Paul Durrant 

> ---
>  hw/pci-host/Kconfig   | 5 +
>  hw/pci-host/Makefile.objs | 2 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/pci-host/Kconfig b/hw/pci-host/Kconfig
> index b0aa8351c4..24ba8ea046 100644
> --- a/hw/pci-host/Kconfig
> +++ b/hw/pci-host/Kconfig
> @@ -1,6 +1,11 @@
>  config PAM
>  bool
> 
> +config XEN_IGD_PASSTHROUGH
> +bool
> +default y if XEN
> +select PCI_I440FX
> +
>  config PREP_PCI
>  bool
>  select PCI
> diff --git a/hw/pci-host/Makefile.objs b/hw/pci-host/Makefile.objs
> index fa6d1556c0..9c466fab01 100644
> --- a/hw/pci-host/Makefile.objs
> +++ b/hw/pci-host/Makefile.objs
> @@ -14,7 +14,7 @@ common-obj-$(CONFIG_VERSATILE_PCI) += versatile.o
>  common-obj-$(CONFIG_PCI_SABRE) += sabre.o
>  common-obj-$(CONFIG_FULONG) += bonito.o
>  common-obj-$(CONFIG_PCI_I440FX) += i440fx.o
> -common-obj-$(CONFIG_PCI_I440FX) += xen_igd_pt.o
> +common-obj-$(CONFIG_XEN_IGD_PASSTHROUGH) += xen_igd_pt.o
>  common-obj-$(CONFIG_PCI_EXPRESS_Q35) += q35.o
>  common-obj-$(CONFIG_PCI_EXPRESS_GENERIC_BRIDGE) += gpex.o
>  common-obj-$(CONFIG_PCI_EXPRESS_XILINX) += xilinx-pcie.o
> --
> 2.21.0
> 
> 

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 09 December 2019 11:39
> To: Durrant, Paul 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Juergen
> Gross ; Stefano Stabellini ;
> Boris Ostrovsky 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> > Only force state to closed in the case when the toolstack may need to
> > clean up. This can be detected by checking whether the state in xenstore
> > has been set to closing prior to device removal.
> 
> I'm not sure I see the point of this, I would expect that a failure to
> probe or the removal of the device would leave the xenbus state as
> closed, which is consistent with the actual driver state.
> 
> Can you explain what's the benefit of leaving a device without a
> driver in such unknown state?
> 

If probe fails then I think it should leave the state alone. If the state is 
moved to closed then basically you just killed that connection to the guest (as 
the frontend will normally close down when it sees this change) so, if the 
probe failure was due to a bug in blkback or, e.g., a transient resource issue 
then it's game over as far as that guest goes.
The ultimate goal here is PV backend re-load that is completely transparent to 
the guest. Modifying anything in xenstore compromises that so we need to be 
careful.

  Paul


Re: [Xen-devel] [PATCH 4/4] xen-blkback: support dynamic unbind/bind

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 09 December 2019 12:17
> To: Durrant, Paul 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Konrad
> Rzeszutek Wilk ; Jens Axboe ;
> Boris Ostrovsky ; Juergen Gross
> ; Stefano Stabellini 
> Subject: Re: [PATCH 4/4] xen-blkback: support dynamic unbind/bind
> 
> On Thu, Dec 05, 2019 at 02:01:23PM +, Paul Durrant wrote:
> > By simply re-attaching to shared rings during connect_ring() rather than
> > assuming they are freshly allocated (i.e assuming the counters are zero)
> > it is possible for vbd instances to be unbound and re-bound from and to
> > (respectively) a running guest.
> >
> > This has been tested by running:
> >
> > while true; do dd if=/dev/urandom of=test.img bs=1M count=1024; done
> >
> > in a PV guest whilst running:
> >
> > while true;
> >   do echo vbd-$DOMID-$VBD >unbind;
> >   echo unbound;
> >   sleep 5;
> >   echo vbd-$DOMID-$VBD >bind;
> >   echo bound;
> >   sleep 3;
> >   done
> 
> So this does unbind blkback while leaving the PV interface as
> connected?
> 

Yes, everything is left in place in the frontend. The backend detaches from the 
ring, closes its end of the event channels, etc. but the guest can still send 
requests which will get serviced when the new backend attaches.
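
To illustrate why the way a backend re-attaches matters, here is a toy model in C. It is only a sketch under stated assumptions, not the Xen ring macros: the struct and function names (toy_sring, toy_back_ring, toy_attach_discarding, toy_attach_preserving) are made up for illustration, and it assumes the previous backend instance had published a response for every request it consumed before it detached.

```c
#include <assert.h>
#include <stddef.h>

/* Toy shared ring: only the free-running producer counters are modelled. */
struct toy_sring {
    unsigned int req_prod;  /* requests published by the frontend */
    unsigned int rsp_prod;  /* responses published by the backend */
};

/* Backend-private view of the ring (hypothetical, for illustration). */
struct toy_back_ring {
    unsigned int req_cons;      /* next request the backend will consume */
    unsigned int rsp_prod_pvt;  /* next response the backend will produce */
    struct toy_sring *sring;
};

/* Attach that skips anything queued while detached. */
static void toy_attach_discarding(struct toy_back_ring *r, struct toy_sring *s)
{
    r->sring = s;
    r->rsp_prod_pvt = s->rsp_prod;
    r->req_cons = s->req_prod;   /* pending requests are never looked at */
}

/*
 * Attach that keeps pending work: resume consuming where responses stopped.
 * Assumption: every request consumed before the detach was responded to.
 */
static void toy_attach_preserving(struct toy_back_ring *r, struct toy_sring *s)
{
    r->sring = s;
    r->rsp_prod_pvt = s->rsp_prod;
    r->req_cons = s->rsp_prod;
}

/* Requests the backend still has to service (unsigned math handles wrap). */
static unsigned int toy_pending_requests(const struct toy_back_ring *r)
{
    assert(r->sring != NULL);
    return r->sring->req_prod - r->req_cons;
}
```

With req_prod at 10 and rsp_prod at 7, the preserving attach leaves three requests to service, whereas the discarding attach leaves none and the frontend would wait on responses that never arrive.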

  Paul


Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 14:10
> To: Durrant, Paul ; Roger Pau Monné
> 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
> Stabellini ; Boris Ostrovsky
> 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On 09.12.19 15:06, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 09 December 2019 13:39
> >> To: Durrant, Paul ; Roger Pau Monné
> >> 
> >> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Stefano
> >> Stabellini ; Boris Ostrovsky
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced
> to
> >> closed
> >>
> >> On 09.12.19 13:19, Durrant, Paul wrote:
> >>>> -Original Message-
> >>>> From: Jürgen Groß 
> >>>> Sent: 09 December 2019 12:09
> >>>> To: Durrant, Paul ; Roger Pau Monné
> >>>> 
> >>>> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> >> Stefano
> >>>> Stabellini ; Boris Ostrovsky
> >>>> 
> >>>> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> forced
> >> to
> >>>> closed
> >>>>
> >>>> On 09.12.19 13:03, Durrant, Paul wrote:
> >>>>>> -Original Message-
> >>>>>> From: Jürgen Groß 
> >>>>>> Sent: 09 December 2019 11:55
> >>>>>> To: Roger Pau Monné ; Durrant, Paul
> >>>>>> 
> >>>>>> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> >>>> Stefano
> >>>>>> Stabellini ; Boris Ostrovsky
> >>>>>> 
> >>>>>> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> >> forced
> >>>> to
> >>>>>> closed
> >>>>>>
> >>>>>> On 09.12.19 12:39, Roger Pau Monné wrote:
> >>>>>>> On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> >>>>>>>> Only force state to closed in the case when the toolstack may
> need
> >> to
> >>>>>>>> clean up. This can be detected by checking whether the state in
> >>>>>> xenstore
> >>>>>>>> has been set to closing prior to device removal.
> >>>>>>>
> >>>>>>> I'm not sure I see the point of this, I would expect that a
> failure
> >> to
> >>>>>>> probe or the removal of the device would leave the xenbus state as
> >>>>>>> closed, which is consistent with the actual driver state.
> >>>>>>>
> >>>>>>> Can you explain what's the benefit of leaving a device without a
> >>>>>>> driver in such unknown state?
> >>>>>>
> >>>>>> And more concerning: did you check that no frontend/backend is
> >>>>>> relying on the closed state to be visible without closing having
> been
> >>>>>> set before?
> >>>>>
> >>>>> Blkfront doesn't seem to mind and I believe the Windows PV drivers
> >> cope,
> >>>> but I don't really understand the comment since this patch is
> actually
> >>>> removing a case where the backend transitions directly to closed.
> >>>>
> >>>> I'm not speaking of blkfront/blkback only, but of net, tpm, scsi,
> >> pvcall
> >>>> etc. frontends/backends. After all you are modifying a function
> common
> >>>> to all PV driver pairs.
> >>>>
> >>>> You are removing a state switch to "closed" in case the state was
> _not_
> >>>> "closing" before.
> >>>
> >>> Yes, which AFAIK is against the intention of the generic PV protocol
> >> such that it ever existed anyway.
> >>
> >> While this might be the case we should _not_ break any guests
> >> running now. So this kind of reasoning is dangerous.
> >>
> >>>
> >>>> So any PV driver reacting to "closed" of the other end
> >>>> in case the previous state might not have been "closing" before is at
> >>>> risk to misbehave with your patch.
> >>>
> >>> We

Re: [Xen-devel] [PATCH 3/4] xen/interface: don't discard pending work in FRONT/BACK_RING_ATTACH

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Jürgen Groß
> Sent: 09 December 2019 11:52
> To: Roger Pau Monné ; Durrant, Paul
> 
> Cc: xen-devel@lists.xenproject.org; Boris Ostrovsky
> ; Stefano Stabellini ;
> linux-ker...@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH 3/4] xen/interface: don't discard pending
> work in FRONT/BACK_RING_ATTACH
> 
> On 09.12.19 12:41, Roger Pau Monné wrote:
> > On Thu, Dec 05, 2019 at 02:01:22PM +, Paul Durrant wrote:
> >> Currently these macros will skip over any requests/responses that are
> >> added to the shared ring whilst it is detached. This, in general, is
> not
> >> a desirable semantic since most frontend implementations will
> eventually
> >> block waiting for a response which would either never appear or never
> be
> >> processed.
> >>
> >> NOTE: These macros are currently unused. BACK_RING_ATTACH(), however,
> will
> >>be used in a subsequent patch.
> >>
> >> Signed-off-by: Paul Durrant 
> >
> > Those headers come from Xen, and should be modified in Xen first and
> > then imported into Linux IMO.
> 
> In theory, yes. But the Xen variant doesn't contain the ATTACH macros.
> 

OOI do we have a policy about this? Re-importing headers into Linux wholesale 
is always slightly painful because of interdependencies and style checking 
issues.

  Paul

> 
> Juergen
> 

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 11:55
> To: Roger Pau Monné ; Durrant, Paul
> 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
> Stabellini ; Boris Ostrovsky
> 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On 09.12.19 12:39, Roger Pau Monné wrote:
> > On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> >> Only force state to closed in the case when the toolstack may need to
> >> clean up. This can be detected by checking whether the state in
> xenstore
> >> has been set to closing prior to device removal.
> >
> > I'm not sure I see the point of this, I would expect that a failure to
> > probe or the removal of the device would leave the xenbus state as
> > closed, which is consistent with the actual driver state.
> >
> > Can you explain what's the benefit of leaving a device without a
> > driver in such unknown state?
> 
> And more concerning: did you check that no frontend/backend is
> relying on the closed state to be visible without closing having been
> set before?

Blkfront doesn't seem to mind and I believe the Windows PV drivers cope, but I 
don't really understand the comment since this patch is actually removing a 
case where the backend transitions directly to closed.

  Paul

> 
> 
> Juergen


Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 13:39
> To: Durrant, Paul ; Roger Pau Monné
> 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
> Stabellini ; Boris Ostrovsky
> 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On 09.12.19 13:19, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 09 December 2019 12:09
> >> To: Durrant, Paul ; Roger Pau Monné
> >> 
> >> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Stefano
> >> Stabellini ; Boris Ostrovsky
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced
> to
> >> closed
> >>
> >> On 09.12.19 13:03, Durrant, Paul wrote:
> >>>> -Original Message-
> >>>> From: Jürgen Groß 
> >>>> Sent: 09 December 2019 11:55
> >>>> To: Roger Pau Monné ; Durrant, Paul
> >>>> 
> >>>> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> >> Stefano
> >>>> Stabellini ; Boris Ostrovsky
> >>>> 
> >>>> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> forced
> >> to
> >>>> closed
> >>>>
> >>>> On 09.12.19 12:39, Roger Pau Monné wrote:
> >>>>> On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> >>>>>> Only force state to closed in the case when the toolstack may need
> to
> >>>>>> clean up. This can be detected by checking whether the state in
> >>>> xenstore
> >>>>>> has been set to closing prior to device removal.
> >>>>>
> >>>>> I'm not sure I see the point of this, I would expect that a failure
> to
> >>>>> probe or the removal of the device would leave the xenbus state as
> >>>>> closed, which is consistent with the actual driver state.
> >>>>>
> >>>>> Can you explain what's the benefit of leaving a device without a
> >>>>> driver in such unknown state?
> >>>>
> >>>> And more concerning: did you check that no frontend/backend is
> >>>> relying on the closed state to be visible without closing having been
> >>>> set before?
> >>>
> >>> Blkfront doesn't seem to mind and I believe the Windows PV drivers
> cope,
> >> but I don't really understand the comment since this patch is actually
> >> removing a case where the backend transitions directly to closed.
> >>
> >> I'm not speaking of blkfront/blkback only, but of net, tpm, scsi,
> pvcall
> >> etc. frontends/backends. After all you are modifying a function common
> >> to all PV driver pairs.
> >>
> >> You are removing a state switch to "closed" in case the state was _not_
> >> "closing" before.
> >
> > Yes, which AFAIK is against the intention of the generic PV protocol
> such that it ever existed anyway.
> 
> While this might be the case we should _not_ break any guests
> running now. So this kind of reasoning is dangerous.
> 
> >
> >> So any PV driver reacting to "closed" of the other end
> >> in case the previous state might not have been "closing" before is at
> >> risk to misbehave with your patch.
> >
> > Well, they will see nothing now. If the state was not closing, it gets
> left alone, so the frontend shouldn't do anything. The only risk that I
> can see is that some frontend/backend pair needed a direct 4 -> 6
> transition to support 'unbind' before but AFAIK nothing has ever supported
> that, and blk and net crash'n'burn if you try that on upstream as it
> stands. A clean unplug would always set state to 5 first, since that's
> part of the unplug protocol.
> 
> That was my question: are you sure all current and previous
> guest frontends and backends are handling unplug this way?
> 
> Not "should handle", but "do handle".

That depends on the toolstack. IIUC the only 'supported' toolstack is xl/libxl, 
which will set 'state' to 5 and 'online' to 0 to initiate an unplug.

  Paul

> 
> 
> Juergen

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 14:41
> To: Durrant, Paul ; Roger Pau Monné
> 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
> Stabellini ; Boris Ostrovsky
> 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On 09.12.19 15:23, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 09 December 2019 14:10
> >> To: Durrant, Paul ; Roger Pau Monné
> >> 
> >> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Stefano
> >> Stabellini ; Boris Ostrovsky
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced
> to
> >> closed
> >>
> >> On 09.12.19 15:06, Durrant, Paul wrote:
> >>>> -Original Message-
> >>>> From: Jürgen Groß 
> >>>> Sent: 09 December 2019 13:39
> >>>> To: Durrant, Paul ; Roger Pau Monné
> >>>> 
> >>>> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> >> Stefano
> >>>> Stabellini ; Boris Ostrovsky
> >>>> 
> >>>> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> forced
> >> to
> >>>> closed
> >>>>
> >>>> On 09.12.19 13:19, Durrant, Paul wrote:
> >>>>>> -Original Message-
> >>>>>> From: Jürgen Groß 
> >>>>>> Sent: 09 December 2019 12:09
> >>>>>> To: Durrant, Paul ; Roger Pau Monné
> >>>>>> 
> >>>>>> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> >>>> Stefano
> >>>>>> Stabellini ; Boris Ostrovsky
> >>>>>> 
> >>>>>> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> >> forced
> >>>> to
> >>>>>> closed
> >>>>>>
> >>>>>> On 09.12.19 13:03, Durrant, Paul wrote:
> >>>>>>>> -Original Message-
> >>>>>>>> From: Jürgen Groß 
> >>>>>>>> Sent: 09 December 2019 11:55
> >>>>>>>> To: Roger Pau Monné ; Durrant, Paul
> >>>>>>>> 
> >>>>>>>> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> >>>>>> Stefano
> >>>>>>>> Stabellini ; Boris Ostrovsky
> >>>>>>>> 
> >>>>>>>> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> >>>> forced
> >>>>>> to
> >>>>>>>> closed
> >>>>>>>>
> >>>>>>>> On 09.12.19 12:39, Roger Pau Monné wrote:
> >>>>>>>>> On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> >>>>>>>>>> Only force state to closed in the case when the toolstack may
> >> need
> >>>> to
> >>>>>>>>>> clean up. This can be detected by checking whether the state in
> >>>>>>>> xenstore
> >>>>>>>>>> has been set to closing prior to device removal.
> >>>>>>>>>
> >>>>>>>>> I'm not sure I see the point of this, I would expect that a
> >> failure
> >>>> to
> >>>>>>>>> probe or the removal of the device would leave the xenbus state
> as
> >>>>>>>>> closed, which is consistent with the actual driver state.
> >>>>>>>>>
> >>>>>>>>> Can you explain what's the benefit of leaving a device without a
> >>>>>>>>> driver in such unknown state?
> >>>>>>>>
> >>>>>>>> And more concerning: did you check that no frontend/backend is
> >>>>>>>> relying on the closed state to be visible without closing having
> >> been
> >>>>>>>> set before?
> >>>>>>>
> >>>>>>> Blkfront doesn't seem to mind and I believe the Windows PV drivers
> >>>> cope,
> >>>>>> but I don't really understand the comment since this patch is
> >> actually
> >>>>>> removing a case where the backend transitions directly

Re: [Xen-devel] [PATCH 1/4] xenbus: move xenbus_dev_shutdown() into frontend code...

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 11:34
> To: Durrant, Paul ; linux-ker...@vger.kernel.org;
> xen-devel@lists.xenproject.org
> Cc: Boris Ostrovsky ; Stefano Stabellini
> 
> Subject: Re: [PATCH 1/4] xenbus: move xenbus_dev_shutdown() into frontend
> code...
> 
> On 05.12.19 15:01, Paul Durrant wrote:
> > ...and make it static
> >
> > xenbus_dev_shutdown() is seemingly intended to cause clean shutdown of
> PV
> > frontends when a guest is rebooted. Indeed the function waits for a
> > completion which is only set by a call to xenbus_frontend_closed().
> >
> > This patch removes the shutdown() method from backends and moves
> > xenbus_dev_shutdown() from xenbus_probe.c into xenbus_probe_frontend.c,
> > renaming it appropriately and making it static.
> 
> Is this a good move considering driver domains?

I don't think it can have ever worked properly for driver domains, and with the 
rest of the patches a backend should be able to go away and return unannounced (as 
long as the domain id is kept... for which patches need to be upstreamed into 
Xen).

> 
> At least I'd expect the commit message addressing the expected behavior
> with rebooting a driver domain and why this patch isn't making things
> worse.
> 

For a clean reboot I'd expect the toolstack to shut down the protocol before 
rebooting the driver domain, so the backend shutdown method is moot. And I 
don't believe re-startable driver domains were something that ever made it into 
support (because of the non-persistent domid problem). I can add something to 
the commit comment to that effect if you'd like.

  Paul

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 12:09
> To: Durrant, Paul ; Roger Pau Monné
> 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Stefano
> Stabellini ; Boris Ostrovsky
> 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On 09.12.19 13:03, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 09 December 2019 11:55
> >> To: Roger Pau Monné ; Durrant, Paul
> >> 
> >> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Stefano
> >> Stabellini ; Boris Ostrovsky
> >> 
> >> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced
> to
> >> closed
> >>
> >> On 09.12.19 12:39, Roger Pau Monné wrote:
> >>> On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> >>>> Only force state to closed in the case when the toolstack may need to
> >>>> clean up. This can be detected by checking whether the state in
> >> xenstore
> >>>> has been set to closing prior to device removal.
> >>>
> >>> I'm not sure I see the point of this, I would expect that a failure to
> >>> probe or the removal of the device would leave the xenbus state as
> >>> closed, which is consistent with the actual driver state.
> >>>
> >>> Can you explain what's the benefit of leaving a device without a
> >>> driver in such unknown state?
> >>
> >> And more concerning: did you check that no frontend/backend is
> >> relying on the closed state to be visible without closing having been
> >> set before?
> >
> > Blkfront doesn't seem to mind and I believe the Windows PV drivers cope,
> but I don't really understand the comment since this patch is actually
> removing a case where the backend transitions directly to closed.
> 
> I'm not speaking of blkfront/blkback only, but of net, tpm, scsi, pvcall
> etc. frontends/backends. After all you are modifying a function common
> to all PV driver pairs.
> 
> You are removing a state switch to "closed" in case the state was _not_
> "closing" before.

Yes, which AFAIK is against the intention of the generic PV protocol such that 
it ever existed anyway.

> So any PV driver reacting to "closed" of the other end
> in case the previous state might not have been "closing" before is at
> risk to misbehave with your patch.

Well, they will see nothing now. If the state was not closing, it gets left 
alone, so the frontend shouldn't do anything. The only risk that I can see is 
that some frontend/backend pair needed a direct 4 -> 6 transition to support 
'unbind' before but AFAIK nothing has ever supported that, and blk and net 
crash'n'burn if you try that on upstream as it stands. A clean unplug would 
always set state to 5 first, since that's part of the unplug protocol.
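
The rule being argued for can be sketched as below. This is a minimal illustration, not the xenbus code: the enum and the helper state_after_remove() are hypothetical names, and only the numbering (4 = connected, 5 = closing, 6 = closed) is taken from the discussion above.

```c
#include <assert.h>

/* State numbering as used in this thread. */
enum sketch_xenbus_state {
    SKETCH_STATE_CONNECTED = 4,
    SKETCH_STATE_CLOSING   = 5,
    SKETCH_STATE_CLOSED    = 6,
};

/*
 * Hypothetical helper: which state should be left in xenstore when the
 * backend driver goes away. Only an unplug that the toolstack has already
 * started (state set to closing) is completed by moving to closed; any
 * other state is left alone so the frontend sees no change.
 */
static enum sketch_xenbus_state
state_after_remove(enum sketch_xenbus_state current)
{
    if (current == SKETCH_STATE_CLOSING)
        return SKETCH_STATE_CLOSED;

    return current;
}

/* Usage example covering the cases discussed above. */
static void state_after_remove_examples(void)
{
    /* Clean unplug: 5 -> 6. */
    assert(state_after_remove(SKETCH_STATE_CLOSING) == SKETCH_STATE_CLOSED);
    /* Unbind of a connected backend: no direct 4 -> 6 transition. */
    assert(state_after_remove(SKETCH_STATE_CONNECTED) == SKETCH_STATE_CONNECTED);
}
```

The point of keeping the connected case untouched is that nothing in xenstore changes from the frontend's point of view, which is what a transparent backend re-load needs.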

  Paul

> 
> Juergen

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 09 December 2019 12:26
> To: Durrant, Paul 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Juergen
> Gross ; Stefano Stabellini ;
> Boris Ostrovsky 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On Mon, Dec 09, 2019 at 12:01:38PM +, Durrant, Paul wrote:
> > > -Original Message-
> > > From: Roger Pau Monné 
> > > Sent: 09 December 2019 11:39
> > > To: Durrant, Paul 
> > > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Juergen
> > > Gross ; Stefano Stabellini ;
> > > Boris Ostrovsky 
> > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> forced to
> > > closed
> > >
> > > On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> > > > Only force state to closed in the case when the toolstack may need
> to
> > > > clean up. This can be detected by checking whether the state in
> xenstore
> > > > has been set to closing prior to device removal.
> > >
> > > I'm not sure I see the point of this, I would expect that a failure to
> > > probe or the removal of the device would leave the xenbus state as
> > > closed, which is consistent with the actual driver state.
> > >
> > > Can you explain what's the benefit of leaving a device without a
> > > driver in such unknown state?
> > >
> >
> > If probe fails then I think it should leave the state alone. If the
> > state is moved to closed then basically you just killed that
> > connection to the guest (as the frontend will normally close down
> > when it sees this change) so, if the probe failure was due to a bug
> > in blkback or, e.g., a transient resource issue then it's game over
> > as far as that guest goes.
> 
> But the connection can be restarted by switching the backend to the
> init state again.

Too late. The frontend saw closed and you already lost.

> 
> > The ultimate goal here is PV backend re-load that is completely
> transparent to the guest. Modifying anything in xenstore compromises that
> so we need to be careful.
> 
> That's a fine goal, but not switching to closed state in
> xenbus_dev_remove seems wrong, as you have actually left the frontend
> without a matching backend and with the state not set to closed.
> 

Why is this a problem? With this series fully applied a (block) backend can 
come and go without needing to change the state. Relying on guests to DTRT is 
not a sustainable option for a cloud deployment.

> Ie: that would be fine if you explicitly state this is some kind of
> internal blkback reload, but not for the general case where blkback
> has been unbound. I think we need some way to differentiate a blkback
> reload from an unbind.
> 

Why do we need that though? Why is it advantageous for a backend to go to 
closed? No PV backends cope with an unbind as-is, and a toolstack initiated 
unplug will always set state to 5 anyway. So TBH any state transition done 
directly in the xenbus code looks wrong to me anyway (but appears to be a 
necessary evil to keep the toolstack working in the event it spawns a backend 
where there is actually no driver present, or it doesn't come online).

  Paul



Re: [Xen-devel] [PATCH 4/4] xen-blkback: support dynamic unbind/bind

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 13:58
> To: Durrant, Paul ; linux-ker...@vger.kernel.org;
> xen-devel@lists.xenproject.org
> Cc: Konrad Rzeszutek Wilk ; Roger Pau Monné
> ; Jens Axboe ; Boris Ostrovsky
> ; Stefano Stabellini 
> Subject: Re: [PATCH 4/4] xen-blkback: support dynamic unbind/bind
> 
> On 05.12.19 15:01, Paul Durrant wrote:
> > By simply re-attaching to shared rings during connect_ring() rather than
> > assuming they are freshly allocated (i.e assuming the counters are zero)
> > it is possible for vbd instances to be unbound and re-bound from and to
> > (respectively) a running guest.
> >
> > This has been tested by running:
> >
> > while true; do dd if=/dev/urandom of=test.img bs=1M count=1024; done
> >
> > in a PV guest whilst running:
> >
> > while true;
> >do echo vbd-$DOMID-$VBD >unbind;
> >echo unbound;
> >sleep 5;
> >echo vbd-$DOMID-$VBD >bind;
> >echo bound;
> >sleep 3;
> >done
> >
> > in dom0 from /sys/bus/xen-backend/drivers/vbd to continuously unbind and
> > re-bind its system disk image.
> 
> Could you do the same test with mixed reads/writes and verification of
> the read/written data, please? A write-only test is not _that_
> convincing regarding correctness. It only proves the guest is not
> crashing.

Sure. I'll find something that will verify content.
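
As a rough idea of what a content-verifying test could look like, here is a small C sketch. It is not the test that was eventually run: the names fill_block() and write_and_verify(), the block size and the pattern generator are all assumptions made for illustration. Note also that it goes through the guest page cache, so a real test would need to bypass or drop the cache between the write and read phases for the data to actually cross the ring.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 4096u

/* Fill one block with a repeatable pattern derived from its index and a seed. */
static void fill_block(uint8_t *buf, uint32_t index, uint32_t seed)
{
    uint32_t x = seed ^ (index * 2654435761u);
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < SKETCH_BLOCK_SIZE; i++) {
        x = x * 1664525u + 1013904223u;   /* simple linear congruential step */
        buf[i] = (uint8_t)(x >> 24);
    }
}

/*
 * Write nblocks patterned blocks to path, then read them back and compare.
 * Returns 0 on success, -1 on an I/O error, or the 1-based index of the
 * first block whose contents do not match.
 */
static long write_and_verify(const char *path, uint32_t nblocks, uint32_t seed)
{
    static uint8_t expect[SKETCH_BLOCK_SIZE];
    static uint8_t got[SKETCH_BLOCK_SIZE];
    long result = 0;
    uint32_t i;
    FILE *f;

    f = fopen(path, "wb");
    if (!f)
        return -1;
    for (i = 0; i < nblocks; i++) {
        fill_block(expect, i, seed);
        if (fwrite(expect, 1, sizeof(expect), f) != sizeof(expect)) {
            fclose(f);
            return -1;
        }
    }
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "rb");
    if (!f)
        return -1;
    for (i = 0; i < nblocks; i++) {
        fill_block(expect, i, seed);
        if (fread(got, 1, sizeof(got), f) != sizeof(got)) {
            result = -1;
            break;
        }
        if (memcmp(expect, got, sizeof(got)) != 0) {
            result = (long)i + 1;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Running write_and_verify() in a loop in the guest, with a different seed each pass, while the backend is repeatedly unbound and re-bound would show lost or corrupted requests as a non-zero return rather than just the absence of a crash.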

> 
> I'm fine with the general approach, though.
> 

Cool, thanks,

  Paul

> 
> Juergen

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 09 December 2019 14:29
> To: Durrant, Paul 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Juergen
> Gross ; Stefano Stabellini ;
> Boris Ostrovsky 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On Mon, Dec 09, 2019 at 12:40:47PM +, Durrant, Paul wrote:
> > > -Original Message-
> > > From: Roger Pau Monné 
> > > Sent: 09 December 2019 12:26
> > > To: Durrant, Paul 
> > > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> Juergen
> > > Gross ; Stefano Stabellini ;
> > > Boris Ostrovsky 
> > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> forced to
> > > closed
> > >
> > > On Mon, Dec 09, 2019 at 12:01:38PM +, Durrant, Paul wrote:
> > > > > -Original Message-
> > > > > From: Roger Pau Monné 
> > > > > Sent: 09 December 2019 11:39
> > > > > To: Durrant, Paul 
> > > > > Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org;
> > > Juergen
> > > > > Gross ; Stefano Stabellini
> ;
> > > > > Boris Ostrovsky 
> > > > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> > > forced to
> > > > > closed
> > > > >
> > > > > On Thu, Dec 05, 2019 at 02:01:21PM +, Paul Durrant wrote:
> > > > > > Only force state to closed in the case when the toolstack may
> need
> > > to
> > > > > > clean up. This can be detected by checking whether the state in
> > > xenstore
> > > > > > has been set to closing prior to device removal.
> > > > >
> > > > > > I'm not sure I see the point of this, I would expect that a failure to
> > > > > > probe or the removal of the device would leave the xenbus state as
> > > > > > closed, which is consistent with the actual driver state.
> > > > > >
> > > > > > Can you explain what's the benefit of leaving a device without a
> > > > > > driver in such an unknown state?
> > > > >
> > > >
> > > > If probe fails then I think it should leave the state alone. If the
> > > > state is moved to closed then basically you just killed that
> > > > connection to the guest (as the frontend will normally close down
> > > > when it sees this change) so, if the probe failure was due to a bug
> > > > in blkback or, e.g., a transient resource issue then it's game over
> > > > as far as that guest goes.
> > >
> > > But the connection can be restarted by switching the backend to the
> > > init state again.
> >
> > Too late. The frontend saw closed and you already lost.
> >
> > >
> > > > The ultimate goal here is PV backend re-load that is completely
> > > > transparent to the guest. Modifying anything in xenstore compromises
> > > > that so we need to be careful.
> > >
> > > That's a fine goal, but not switching to closed state in
> > > xenbus_dev_remove seems wrong, as you have actually left the frontend
> > > without a matching backend and with the state not set to closed.
> > >
> >
> > Why is this a problem? With this series fully applied a (block) backend
> > can come and go without needing to change the state. Relying on guests to
> > DTRT is not a sustainable option for a cloud deployment.
> >
> > > Ie: that would be fine if you explicitly state this is some kind of
> > > internal blkback reload, but not for the general case where blkback
> > > has been unbound. I think we need some way to differentiate a blkback
> > > reload from an unbind.
> > >
> >
> > Why do we need that though? Why is it advantageous for a backend to go
> > to closed? No PV backends cope with an unbind as-is, and a toolstack
> > initiated unplug will always set state to 5 anyway. So TBH any state
> > transition done directly in the xenbus code looks wrong to me anyway (but
> > appears to be a necessary evil to keep the toolstack working in the event
> > it spawns a backend where there is actually no driver present, or it
> > doesn't come online).
> 
> IMO the normal flow for unbind would be to attempt to close open
> connections and then remove the driver: leaving frontends connected
> without any attached backends is not correct, and will just block the
> guest frontend until requests start timing out.
> 
> I can see

Re: [Xen-devel] [PATCH 3/4] xen/interface: don't discard pending work in FRONT/BACK_RING_ATTACH

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 09 December 2019 13:55
> To: Durrant, Paul ; linux-ker...@vger.kernel.org;
> xen-devel@lists.xenproject.org
> Cc: Boris Ostrovsky ; Stefano Stabellini
> 
> Subject: Re: [PATCH 3/4] xen/interface: don't discard pending work in
> FRONT/BACK_RING_ATTACH
> 
> On 05.12.19 15:01, Paul Durrant wrote:
> > Currently these macros will skip over any requests/responses that are
> > added to the shared ring whilst it is detached. This, in general, is not
> > a desirable semantic since most frontend implementations will eventually
> > block waiting for a response which would either never appear or never be
> > processed.
> >
> > NOTE: These macros are currently unused. BACK_RING_ATTACH(), however, will
> >       be used in a subsequent patch.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Boris Ostrovsky 
> > Cc: Juergen Gross 
> > Cc: Stefano Stabellini 
> > ---
> >   include/xen/interface/io/ring.h | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/xen/interface/io/ring.h b/include/xen/interface/io/ring.h
> > index 3f40501fc60b..405adfed87e6 100644
> > --- a/include/xen/interface/io/ring.h
> > +++ b/include/xen/interface/io/ring.h
> > @@ -143,14 +143,14 @@ struct __name##_back_ring { \
> >   #define FRONT_RING_ATTACH(_r, _s, __size) do { \
> >       (_r)->sring = (_s); \
> >       (_r)->req_prod_pvt = (_s)->req_prod; \
> > -    (_r)->rsp_cons = (_s)->rsp_prod; \
> > +    (_r)->rsp_cons = (_s)->req_prod; \
> >       (_r)->nr_ents = __RING_SIZE(_s, __size); \
> >   } while (0)
> >
> >   #define BACK_RING_ATTACH(_r, _s, __size) do { \
> >       (_r)->sring = (_s); \
> >       (_r)->rsp_prod_pvt = (_s)->rsp_prod; \
> > -    (_r)->req_cons = (_s)->req_prod; \
> > +    (_r)->req_cons = (_s)->rsp_prod; \
> >       (_r)->nr_ents = __RING_SIZE(_s, __size); \
> >   } while (0)
> 
> Let's look at all possible scenarios where BACK_RING_ATTACH()
> might happen:
> 
> Initially (after [FRONT|BACK]_RING_INIT(), leaving _pvt away):
> req_prod=0, rsp_cons=0, rsp_prod=0, req_cons=0
> Using BACK_RING_ATTACH() is fine (no change)
> 
> Request queued:
> req_prod=1, rsp_cons=0, rsp_prod=0, req_cons=0
> Using BACK_RING_ATTACH() is fine (no change)
> 
> and taken by backend:
> req_prod=1, rsp_cons=0, rsp_prod=0, req_cons=1
> Using BACK_RING_ATTACH() is resetting req_cons to 0, will result
> in redoing request (for blk this is fine, other devices like SCSI
> tapes will have issues with that). One possible solution would be
> to ensure all taken requests are either stopped or the response
> is queued already.

Yes, it is the assumption that a backend will drain and complete any requests 
it is handling, but it will not deal with new ones being posted by the 
frontend. This does appear to be the case for blkback.
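
To make the "taken by backend" case above concrete, here is a minimal model
of the four indices. It is only a sketch: the demo_* structs and helpers are
illustrative stand-ins, not the real ring.h types, and the attach helper
assumes the semantics proposed in this patch (req_cons resumes at rsp_prod).

```c
#include <assert.h>

/* Stand-in for the shared ring: only the two shared producer indices. */
struct demo_sring {
    unsigned int req_prod;
    unsigned int rsp_prod;
};

/* Stand-in for the backend's private view of the ring. */
struct demo_back_ring {
    unsigned int req_cons;
    unsigned int rsp_prod_pvt;
};

/* Attach as proposed in the patch: resume consuming at rsp_prod. */
static void demo_back_attach(struct demo_back_ring *r,
                             const struct demo_sring *s)
{
    r->rsp_prod_pvt = s->rsp_prod;
    r->req_cons = s->rsp_prod;
}

/*
 * Number of requests a freshly attached backend will process for a given
 * shared-ring state. Unsigned subtraction keeps this correct across
 * index wrap-around.
 */
static unsigned int demo_pending_after_attach(unsigned int req_prod,
                                              unsigned int rsp_prod)
{
    struct demo_sring s = { req_prod, rsp_prod };
    struct demo_back_ring r;

    demo_back_attach(&r, &s);
    return s.req_prod - r.req_cons;
}
```

With req_prod=1 and rsp_prod=0 the helper reports one pending request
whether or not the previous backend instance had already taken it, which is
why the old instance has to drain and complete everything it took before the
ring is handed over.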

> 
> Response queued:
> req_prod=1, rsp_cons=0, rsp_prod=1, req_cons=1
> Using BACK_RING_ATTACH() is fine (no change)
> 
> Response taken:
> req_prod=1, rsp_cons=1, rsp_prod=1, req_cons=1
> Using BACK_RING_ATTACH() is fine (no change)
> 
> In general I believe the [FRONT|BACK]_RING_ATTACH() macros are not
> fine to be used in the current state, as the *_pvt fields normally not
> accessible by the other end are initialized using the (possibly
> untrusted) values from the shared ring. There needs at least to be a
> test for the values to be sane, and your change should not result in the
> same value to be read twice, as it could have changed in between.

What test would you apply to sanitize the value of the pvt pointer? Another 
option would be to have a backend write its pvt value into the xenstore backend 
area when the ring is unmapped, so that a new instance definitely resumes where 
the old one left off. The value of rsp_prod could, of course, be overwritten by 
the guest at any time and so there's little point in attempting to sanitize it.
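
For discussion, one possible shape of the "read once, then test" approach
Juergen asks for is sketched below. The chk_* names are hypothetical, and
the bound used (outstanding requests must not exceed the ring size) is an
assumption about what "sane" could mean here, not an agreed definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the shared ring; the guest may change these at any time. */
struct chk_sring {
    volatile unsigned int req_prod;
    volatile unsigned int rsp_prod;
};

/* Stand-in for the backend's private state. */
struct chk_back_ring {
    unsigned int req_cons;
    unsigned int rsp_prod_pvt;
    unsigned int nr_ents;
};

/*
 * Attach using a single snapshot of each shared index, so the value that
 * was checked is the value that gets used. Returns false, leaving *r
 * untouched, if the snapshot implies more outstanding requests than the
 * ring can hold.
 */
static bool chk_back_attach(struct chk_back_ring *r, struct chk_sring *s,
                            unsigned int nr_ents)
{
    unsigned int rsp_prod = s->rsp_prod; /* read exactly once */
    unsigned int req_prod = s->req_prod; /* read exactly once */

    /* Unsigned difference also rejects rsp_prod running ahead of req_prod. */
    if (req_prod - rsp_prod > nr_ents)
        return false;

    r->rsp_prod_pvt = rsp_prod;
    r->req_cons = rsp_prod;
    r->nr_ents = nr_ents;
    return true;
}
```

This does not stop the guest from rewriting rsp_prod before the ring is
re-attached; it only ensures the backend starts from a self-consistent
snapshot.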

> 
> As this is an error which can happen in other OS's, too, I'd recommend
> to add the adapted macros (plus a comment regarding the possible
> problem noted above for special devices like tapes) to the Xen variant
> of ring.h.
> 

I can certainly send a patch to Xen once we agree on the final definition.

  Paul

> 
> Juergen

Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 09 December 2019 17:18
> To: Durrant, Paul 
> Cc: linux-ker...@vger.kernel.org; xen-devel@lists.xenproject.org; Juergen
> Gross ; Stefano Stabellini ;
> Boris Ostrovsky 
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
> 
> On Mon, Dec 09, 2019 at 04:26:15PM +, Durrant, Paul wrote:
> > > > If you want unbind to actually do a proper unplug then that's extra
> > > > work and not really something I want to tackle (and re-bind would still
> > > > need to be toolstack initiated as something would have to re-create the
> > > > xenstore area).
> > >
> > > Why do you say the xenstore area would need to be recreated?
> > >
> > > Setting state to closed shouldn't cause any cleanup of the xenstore
> > > area, as that should already happen for example when using pvgrub
> > > since in that case grub itself disconnects and already causes a
> > > transition to closed and a re-attachment afterwards by the guest
> > > kernel.
> > >
> >
> > For some reason, when I originally tested, the xenstore area
> > disappeared. I checked again and it did not this time. I just ended up
> > with a frontend stuck in state 5 (because it is the system disk and won't
> > go offline) trying to talk to a non-existent backend. Upon re-bind the
> > backend goes into state 5 (because it sees the 5 in the frontend) and
> > leaves the guest wedged.
> 
> Likely blkfront should go back to init state, but anyway, that's not
> something that needs fixing as part of this series.
> 

Ok, cool.

I am wondering though whether we ought to suppress bind/unbind for drivers that 
don't whitelist themselves (through the new xenbus_driver flag that I'll add). 
It's somewhat misleading that the nodes are there but don't necessarily work.

  Paul



Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

2019-12-09 Thread Durrant, Paul
> -Original Message-
[snip]
> >
> > Well unbind is pretty useless now IMO since bind doesn't work, and a
> > transition straight to closed is just plain wrong anyway.
> 
> Why do you claim that a straight transition into the closed state is
> wrong?

It's badly documented, I agree, but have a look at 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/xen-netback/xenbus.c#n480.
 Connected -> Closed is not a valid transition, and I don't think it was ever 
intended to be.

> 
> I don't see any such mention in blkif.h, which also doesn't contain
> any guidelines regarding closing state transitions, so unless
> otherwise stated somewhere else transitions into closed can happen
> from any state IMO.
> 

They can, but it is even more poorly documented what should be done in this 
case.

> > But, we could have a flag that the backend driver sets to say that it
> > supports transparent re-bind that gates this code. Would that make you
> > feel more comfortable?
> 
> Having an option to leave state untouched when unbinding would be fine
> for me, otherwise state should be set to closed when unbinding. I
> don't think there's anything else that needs to be done in this
> regard, the cleanup should be exactly the same the only difference
> being the setting of all the active backends to closed state.
> 

Ok, I'll add such a flag and define it for blkback only, in patch #4 i.e. when 
it actually gains the ability to rebind.
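
As a rough sketch of how such a flag could gate the state change at remove
time (the rb_* names and the allow_rebind field are illustrative
assumptions, not code from the series; the numeric values match the xenbus
states referred to in this thread, where 5 is closing):

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the xenbus states, using the usual numeric values. */
enum rb_state {
    RB_CONNECTED = 4,
    RB_CLOSING   = 5,
    RB_CLOSED    = 6
};

/* Hypothetical per-driver flag; false means "legacy" behaviour. */
struct rb_driver {
    bool allow_rebind;
};

/*
 * State to leave in xenstore when the driver is removed:
 *  - drivers without re-bind support are always forced to closed;
 *  - re-bind capable drivers are forced to closed only when the toolstack
 *    has already moved the state to closing (an unplug in progress);
 *  - otherwise the state is left alone so a new instance can pick up.
 */
static enum rb_state rb_state_after_remove(const struct rb_driver *drv,
                                           enum rb_state current)
{
    if (!drv->allow_rebind || current == RB_CLOSING)
        return RB_CLOSED;
    return current;
}
```

A driver that never sets the flag keeps today's behaviour, which is the
property Roger asks for above.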

> > If you want unbind to actually do a proper unplug then that's extra work
> > and not really something I want to tackle (and re-bind would still need to
> > be toolstack initiated as something would have to re-create the xenstore
> > area).
> 
> Why do you say the xenstore area would need to be recreated?
> 
> Setting state to closed shouldn't cause any cleanup of the xenstore
> area, as that should already happen for example when using pvgrub
> since in that case grub itself disconnects and already causes a
> transition to closed and a re-attachment afterwards by the guest
> kernel.
> 

For some reason, when I originally tested, the xenstore area disappeared. I 
checked again and it did not this time. I just ended up with a frontend stuck 
in state 5 (because it is the system disk and won't go offline) trying to talk 
to a non-existent backend. Upon re-bind the backend goes into state 5 (because 
it sees the 5 in the frontend) and leaves the guest wedged.
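
For readers decoding the bare numbers used in this thread, the xenbus state
values from the public xenbus.h header map to names as in the small lookup
below. The st_name() helper itself is only an illustration and is not part
of any Xen or Linux API.

```c
#include <assert.h>
#include <string.h>

/* Numeric values as defined by the Xen public io/xenbus.h header. */
enum st_xenbus_state {
    ST_UNKNOWN       = 0,
    ST_INITIALISING  = 1,
    ST_INIT_WAIT     = 2,
    ST_INITIALISED   = 3,
    ST_CONNECTED     = 4,
    ST_CLOSING       = 5,
    ST_CLOSED        = 6,
    ST_RECONFIGURING = 7,
    ST_RECONFIGURED  = 8
};

/* Map a numeric state, as seen in the xenstore "state" node, to a name. */
static const char *st_name(int state)
{
    static const char *const names[] = {
        "Unknown", "Initialising", "InitWait", "Initialised", "Connected",
        "Closing", "Closed", "Reconfiguring", "Reconfigured"
    };

    if (state < ST_UNKNOWN || state > ST_RECONFIGURED)
        return "Invalid";
    return names[state];
}
```

So "state 5" above is Closing: both ends sit in Closing and neither makes
progress.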

  Paul



Re: [Xen-devel] [PATCH v2] x86 / iommu: set up a scratch page in the quarantine domain

2019-12-10 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 10 December 2019 09:07
> To: Jan Beulich 
> Cc: Tian, Kevin ; Durrant, Paul
> ; Andrew Cooper ; xen-
> de...@lists.xenproject.org; Roger Pau Monné ; Wei
> Liu 
> Subject: Re: [PATCH v2] x86 / iommu: set up a scratch page in the
> quarantine domain
> 
> On 10.12.19 09:57, Jan Beulich wrote:
> > On 10.12.2019 09:12, Jürgen Groß wrote:
> >> On 10.12.19 09:05, Jan Beulich wrote:
> >>> On 10.12.2019 08:16, Tian, Kevin wrote:
> >>>> While the quarantine idea sounds good overall, I'm still not convinced
> >>>> to have it the only way in place just for handling some known-buggy
> >>>> device. It kills the possibility of identifying a new buggy device and then
> >>>> deciding not to use it in the first place... I thought about whether it
> >>>> will get better when future IOMMU implements A/D bit - by checking
> >>>> access bit being set then we'll know some buggy device exists, but,
> >>>> the scratch page is shared by all devices then we cannot rely on this
> >>>> feature to find out the actual buggy one.
> >>>
> >>> Thinking about it - yes, I think I agree. This (as with so many
> >>> workarounds) would better be an off-by-default one. The main issue
> >>> I understand this would have is that buggy systems then might hang
> >>> without even having managed to get a log message out - Paul?
> >>>
> >>> Jürgen - would you be amenable to an almost last minute refinement
> >>> here (would then also need to still be backported to 4.12.2, or
> >>> the original backport reverted, to avoid giving the impression of
> >>> a regression)?
> >>
> >> So what is your suggestion here? To have a boot option (defaulting to
> >> off) for enabling the scratch page?
> >
> > Yes (and despite having seen Paul's reply).
> 
> I'd release ack such a patch in case you come to an agreement regarding
> the default soon.
> 

Ok. The default is not that crucial. Perhaps it's just me who thinks defaults 
should be chosen on the basis of being most likely to result in a working 
system.

  Paul

> 
> Juergen

Re: [Xen-devel] [PATCH v2] x86 / iommu: set up a scratch page in the quarantine domain

2019-12-10 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 10 December 2019 09:45
> To: Durrant, Paul 
> Cc: Jürgen Groß ; Tian, Kevin ;
> Andrew Cooper ; xen-devel@lists.xenproject.org;
> Roger Pau Monné ; Wei Liu 
> Subject: Re: [PATCH v2] x86 / iommu: set up a scratch page in the
> quarantine domain
> 
> On 10.12.2019 10:16, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 10 December 2019 09:07
> >> To: Jan Beulich 
> >> Cc: Tian, Kevin ; Durrant, Paul
> >> ; Andrew Cooper ; xen-
> >> de...@lists.xenproject.org; Roger Pau Monné ; Wei
> >> Liu 
> >> Subject: Re: [PATCH v2] x86 / iommu: set up a scratch page in the
> >> quarantine domain
> >>
> >> On 10.12.19 09:57, Jan Beulich wrote:
> >>> On 10.12.2019 09:12, Jürgen Groß wrote:
> >>>> On 10.12.19 09:05, Jan Beulich wrote:
> >>>>> On 10.12.2019 08:16, Tian, Kevin wrote:
> >>>>>> While the quarantine idea sounds good overall, I'm still not convinced
> >>>>>> to have it the only way in place just for handling some known-buggy
> >>>>>> device. It kills the possibility of identifying a new buggy device and then
> >>>>>> deciding not to use it in the first place... I thought about whether it
> >>>>>> will get better when future IOMMU implements A/D bit - by checking
> >>>>>> access bit being set then we'll know some buggy device exists, but,
> >>>>>> the scratch page is shared by all devices then we cannot rely on this
> >>>>>> feature to find out the actual buggy one.
> >>>>>
> >>>>> Thinking about it - yes, I think I agree. This (as with so many
> >>>>> workarounds) would better be an off-by-default one. The main issue
> >>>>> I understand this would have is that buggy systems then might hang
> >>>>> without even having managed to get a log message out - Paul?
> >>>>>
> >>>>> Jürgen - would you be amenable to an almost last minute refinement
> >>>>> here (would then also need to still be backported to 4.12.2, or
> >>>>> the original backport reverted, to avoid giving the impression of
> >>>>> a regression)?
> >>>>
> >>>> So what is your suggestion here? To have a boot option (defaulting to
> >>>> off) for enabling the scratch page?
> >>>
> >>> Yes (and despite having seen Paul's reply).
> >>
> >> I'd release ack such a patch in case you come to an agreement regarding
> >> the default soon.
> >>
> >
> > Ok. The default is not that crucial. Perhaps it's just me who thinks
> > defaults should be chosen on the basis of being most likely to result
> > in a working system.
> 
> If it wasn't for quirky hardware (or firmware to cover the general case,
> in particular to avoid getting quoted on this wrt my position on EFI
> workarounds), I'd agree. But personally I think Kevin's point takes
> priority here: Admins should at least be aware of running quirky
> hardware, and hence I'd prefer the default to be logging of faults
> rather than their silencing. Documentation of the new (sub-)option may
> give suitable hints, and we may even go as far as providing a Kconfig
> option for the default to be chosen at build time.
> 
> Main question now is - who's going to make a patch? Will you? Should I?
> 

I'm happy to do it, but it would probably be more expedient if you did.
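
To illustrate the shape Jan describes (a default picked at build time that a
boot-time sub-option can override), here is a small sketch. Everything named
qd_* or QD_* is hypothetical, as are the "scratch-page" and
"no-scratch-page" strings; none of it is taken from an actual Xen patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Build-time default; a Kconfig-style option could define this macro. */
#ifndef QD_SCRATCH_PAGE_DEFAULT
#define QD_SCRATCH_PAGE_DEFAULT false
#endif

/* Whether DMA from quarantined devices is sunk into a scratch page. */
static bool qd_scratch_page = QD_SCRATCH_PAGE_DEFAULT;

/*
 * Parse a hypothetical boot-time sub-option. Returns 0 on success and -1
 * for an unrecognised value, in which case the current setting is kept.
 */
static int qd_parse_quarantine(const char *arg)
{
    if (strcmp(arg, "scratch-page") == 0) {
        qd_scratch_page = true;
        return 0;
    }
    if (strcmp(arg, "no-scratch-page") == 0) {
        qd_scratch_page = false;
        return 0;
    }
    return -1;
}
```

With the default left off, IOMMU faults from a quirky device stay visible in
the log unless the admin opts in, which is the behaviour Kevin and Jan argue
for.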

  Paul

> Jan

Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is forced to closed

2019-12-11 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 11 December 2019 10:06
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; linux-ker...@vger.kernel.org; Juergen
> Gross ; Stefano Stabellini ;
> Boris Ostrovsky 
> Subject: Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is forced
> to closed
> 
> On Tue, Dec 10, 2019 at 11:33:45AM +, Paul Durrant wrote:
> > If a driver probe() fails then leave the xenstore state alone. There is no
> > reason to modify it as the failure may be due to transient resource
> > allocation issues and hence a subsequent probe() may succeed.
> >
> > If the driver supports re-binding then force state to closed during
> > remove() only in the case when the toolstack may need to clean up. This can
> > be detected by checking whether the state in xenstore has been set to
> > closing prior to device removal.
> >
> > NOTE: Re-bind support is indicated by new boolean in struct xenbus_driver,
> >   which defaults to false. Subsequent patches will add support to
> >   some backend drivers.
> 
> My intention was to specify whether you want to close the
> backends on unbind in sysfs, so that a user can decide at runtime,
> rather than having a hardcoded value in the driver.
> 
> Anyway, I'm less sure whether such a runtime tunable is useful at all,
> so let's leave it out; it can always be added afterwards. At the end
> of the day a user wrongly doing a rmmod blkback can always recover
> gracefully by loading blkback again with your proposed approach to
> leave connections open on module removal.
> 
> Sorry for the extra work.
> 

Does this mean you don't think the extra driver flag is necessary any more? NB: 
now that xenbus actually takes module references you can't accidentally rmmod 
any more :-)

  Paul


Re: [Xen-devel] [PATCH v3 3/4] xen/interface: re-define FRONT/BACK_RING_ATTACH()

2019-12-13 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 13 December 2019 09:00
> To: Durrant, Paul ; xen-devel@lists.xenproject.org;
> linux-bl...@vger.kernel.org; linux-ker...@vger.kernel.org
> Cc: Boris Ostrovsky ; Stefano Stabellini
> 
> Subject: Re: [PATCH v3 3/4] xen/interface: re-define
> FRONT/BACK_RING_ATTACH()
> 
> On 12.12.19 07:04, Jürgen Groß wrote:
> > On 11.12.19 16:29, Paul Durrant wrote:
> >> Currently these macros are defined to re-initialize a front/back ring
> >> (respectively) to values read from the shared ring in such a way that any
> >> requests/responses that are added to the shared ring whilst the front/back
> >> is detached will be skipped over. This, in general, is not a desirable
> >> semantic since most frontend implementations will eventually block waiting
> >> for a response which would either never appear or never be processed.
> >>
> >> Since the macros are currently unused, take this opportunity to re-define
> >> them to re-initialize a front/back ring using specified values. This also
> >> allows FRONT/BACK_RING_INIT() to be re-defined in terms of
> >> FRONT/BACK_RING_ATTACH() using a specified value of 0.
> >>
> >> NOTE: BACK_RING_ATTACH() will be used directly in a subsequent patch.
> >>
> >> Signed-off-by: Paul Durrant 
> >
> > Reviewed-by: Juergen Gross 
> 
> Paul, I think you should send a patch changing ring.h in the Xen tree.
> 
> As soon as it has been accepted I'll take your series for the kernel.
> 

Ok. I was waiting for a push so that I could cite the commit hash but I'll prep 
something now instead.
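
For reference, the re-definition described in the quoted commit message can
be sketched as below. This is a simplified illustration: the V3_* macros and
v3_* structs are stand-ins, the ring size is passed directly instead of
being computed with __RING_SIZE(), and the single index argument is an
assumption based on "a specified value of 0".

```c
#include <assert.h>

/* Stand-in for the shared ring. */
struct v3_sring {
    unsigned int req_prod;
    unsigned int rsp_prod;
};

/* Stand-in for the backend's private ring state. */
struct v3_back_ring {
    unsigned int rsp_prod_pvt;
    unsigned int req_cons;
    unsigned int nr_ents;
    struct v3_sring *sring;
};

/* Attach using a caller-supplied index, not values read from the ring. */
#define V3_BACK_RING_ATTACH(_r, _s, _i, _nr_ents) do { \
    (_r)->rsp_prod_pvt = (_i);                         \
    (_r)->req_cons = (_i);                             \
    (_r)->nr_ents = (_nr_ents);                        \
    (_r)->sring = (_s);                                \
} while (0)

/* Initialisation is then just an attach at index 0. */
#define V3_BACK_RING_INIT(_r, _s, _nr_ents) \
    V3_BACK_RING_ATTACH(_r, _s, 0, _nr_ents)
```

Because the index comes from the caller, a new backend instance can resume
from a value it trusts, and nothing private is seeded from guest-writable
memory.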

  Paul

Re: [Xen-devel] [PATCH] public/io/netif.h: document a mechanism to advertise carrier state

2019-12-13 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 13 December 2019 14:17
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Konrad Rzeszutek Wilk 
> Subject: Re: [PATCH] public/io/netif.h: document a mechanism to advertise
> carrier state
> 
> On 13.12.19 14:03, Paul Durrant wrote:
> > This patch adds a specification for a 'carrier' node in xenstore to allow
> > a backend to notify a frontend of its virtual carrier/link state. E.g.
> > a backend that is unable to forward packets from the guest because it is
> > not attached to a bridge may wish to advertise 'no carrier'.
> >
> > NOTE: This is purely a documentation patch. No functional change.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: Juergen Gross 
> > ---
> >   xen/include/public/io/netif.h | 14 ++
> >   1 file changed, 14 insertions(+)
> >
> > diff --git a/xen/include/public/io/netif.h b/xen/include/public/io/netif.h
> > index 2454448baa..e587055f68 100644
> > --- a/xen/include/public/io/netif.h
> > +++ b/xen/include/public/io/netif.h
> > @@ -190,6 +190,20 @@
> >* order as requests.
> >*/
> >
> > +/*
> > + * Link state
> > + * ==
> > + *
> > + * The backend can advertise it is current link (carrier) state to the
> 
> s/it is/its/ ?
> 

Oh yes.

> > + * frontend using the /local/domain/X/backend///carrier node.
> 
> Hmm, I just realized that the other mentioned backend path in this file
> is wrong, it should be: /local/domain/X/backend/vif///...
> 
> Mind correcting that in your patch, too?
> 

Sure.
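
A frontend consuming such a node might interpret it along these lines. The
encoding is an assumption made for the sake of the example ("0" meaning no
carrier, any other value meaning carrier present, and a missing node treated
as link up so older backends keep working); the quoted patch text does not
spell this out, and cr_link_is_up() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Interpret the contents of the backend's 'carrier' node.
 * 'value' is the string read from xenstore, or NULL if the node is absent.
 */
static bool cr_link_is_up(const char *value)
{
    if (value == NULL)
        return true;  /* assumed: no node means the backend predates it */
    return strcmp(value, "0") != 0;
}
```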

  Paul

Re: [Xen-devel] [xen-4.13-testing test] 144736: regressions - FAIL

2019-12-13 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Julien Grall
> Sent: 13 December 2019 15:37
> To: Ian Jackson 
> Cc: Jürgen Groß ; xen-devel@lists.xenproject.org; Stefano
> Stabellini ; osstest service owner  ad...@xenproject.org>; Anthony Perard 
> Subject: Re: [Xen-devel] [xen-4.13-testing test] 144736: regressions -
> FAIL
> 
> +Anthony
> 
> On 13/12/2019 11:40, Ian Jackson wrote:
> > Julien Grall writes ("Re: [Xen-devel] [xen-4.13-testing test] 144736:
> regressions - FAIL"):
> >> AMD Seattle boards (laxton*) are known to fail booting from time to time
> >> because of a PCI training issue. We have a workaround for it (involving
> >> a longer power cycle) but this is not 100% reliable.
> >
> > This wasn't a power cycle.  It was a software-initiated reboot.  It
> > does appear to hang in the firmware somewhere.  Do we expect the pci
> > training issue to occur in this case ?
> 
> The PCI training happens at every reset (including software). So I may
> have confused the workaround for firmware corruption with the PCI
> training. We definitely have a workaround for the former.
> 
> For the latter, I can't remember if we did use a new firmware or just
> hope it does not happen often.
> 
> I think we had a thread on infra@ about the workaround some time last
> year. Sadly this was sent on my Arm e-mail address and I didn't archive
> it before leaving :(. Can you have a look if you can find the thread?
> 
> >
> >>>> test-armhf-armhf-xl-vhd  18 leak-check/check fail REGR. vs. 144673
> >>>
> >>> That one is strange. A qemu process seems to have died producing
> >>> a core file, but I couldn't find any log containing any other indication
> >>> of a crashed program.
> >>
> >> I haven't found anything interesting in the log. @Ian could you set up
> >> a repro for this?
> >
> > There is some heisenbug where qemu crashes with very low probability.
> > (I forget whether only on arm or on x86 too).  This has been around
> > for a little while.  I doubt this particular failure will be
> > reproducible.
> 
> I can't remember such a bug being reported on Arm before. Anyway, I managed
> to get the stack trace from gdb:
> 
> Core was generated by `/usr/local/lib/xen/bin/qemu-system-i386 -xen-domid 1 -chardev socket,id=libxl-c'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x006342be in xen_block_handle_requests (dataplane=0x108e600) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c:531
> 531 /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c: No such file or directory.
> [Current thread is 1 (LWP 1987)]
> (gdb) bt
> #0  0x006342be in xen_block_handle_requests (dataplane=0x108e600) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c:531
> #1  0x0063447c in xen_block_dataplane_event (opaque=0x108e600) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/block/dataplane/xen-block.c:626
> #2  0x008d005c in xen_device_poll (opaque=0x107a3b0) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/hw/xen/xen-bus.c:1077
> #3  0x00a4175c in run_poll_handlers_once (ctx=0x1079708, timeout=0xb1ba17f8) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:520
> #4  0x00a41826 in run_poll_handlers (ctx=0x1079708, max_ns=8000, timeout=0xb1ba17f8) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:562
> #5  0x00a41956 in try_poll_mode (ctx=0x1079708, timeout=0xb1ba17f8) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:597
> #6  0x00a41a2c in aio_poll (ctx=0x1079708, blocking=true) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/aio-posix.c:639
> #7  0x0071dc16 in iothread_run (opaque=0x107d328) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/iothread.c:75
> #8  0x00a44c80 in qemu_thread_start (args=0x1079538) at /home/osstest/build.144736.build-armhf/xen/tools/qemu-xen-dir/util/qemu-thread-posix.c:502
> #9  0xb67ae5d8 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> 
> This feels like a race condition between the init/free code and the
> handler. Anthony, does it ring any bell?
> 

From that stack bt it looks like an iothread managed to run after the sring was
NULLed. This should not be able to happen as the dataplane should have been moved
back onto QEMU's main thread context before the ring is unmapped.

  Paul

> Cheers,
> 
> --
> Julien Grall
> 

Re: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of quarantined devices optional

2019-12-13 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Jürgen Groß
> Sent: 13 December 2019 13:47
> To: Jan Beulich 
> Cc: Kevin Tian ; Stefano Stabellini
> ; Julien Grall ; Wei Liu
> ; Konrad Wilk ; George Dunlap
> ; Andrew Cooper ;
> Paul Durrant ; Ian Jackson ; xen-
> de...@lists.xenproject.org; Roger Pau Monné 
> Subject: Re: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of
> quarantined devices optional
> 
> On 13.12.19 14:38, Jan Beulich wrote:
> > On 13.12.2019 14:31, Jürgen Groß wrote:
> >> On 13.12.19 14:21, Jan Beulich wrote:
> >>> On 13.12.2019 14:11, Jürgen Groß wrote:
> >>>> On 13.12.19 13:53, Jan Beulich wrote:
> >>>>> Containing still in flight DMA was introduced to work around certain
> >>>>> devices / systems hanging hard upon hitting an IOMMU fault. Passing
> >>>>> through (such) devices (on such systems) is inherently insecure (as
> >>>>> guests could easily arrange for IOMMU faults to occur). Defaulting to
> >>>>> a mode where admins may not even become aware of issues with devices can
> >>>>> be considered undesirable. Therefore convert this mode of operation to
> >>>>> an optional one, not one enabled by default.
> >>>>>
> >>>>> This involves resurrecting code commit ea38867831da ("x86 / iommu: set
> >>>>> up a scratch page in the quarantine domain") did remove, in a slightly
> >>>>> extended and abstracted fashion. Here, instead of reintroducing a pretty
> >>>>> pointless use of "goto" in domain_context_unmap(), and instead of making
> >>>>> the function (at least temporarily) inconsistent, take the opportunity
> >>>>> and replace the other similarly pointless "goto" as well.
> >>>>>
> >>>>> In order to key the re-instated bypasses off of there (not) being a root
> >>>>> page table this further requires moving the allocate_domain_resources()
> >>>>> invocation from reassign_device() to amd_iommu_setup_domain_device() (or
> >>>>> else reassign_device() would allocate a root page table anyway); this is
> >>>>> benign to the second caller of the latter function.
> >>>>>
> >>>>> Signed-off-by: Jan Beulich
> >>>>> ---
> >>>>> As far as 4.13 is concerned, I guess if we can't come to an agreement
> >>>>> here, the only other option is to revert ea38867831da from the branch,
> >>>>> for having been committed prematurely (I'm not so much worried about the
> >>>>> master branch, where we have ample time until 4.14). What I surely want
> >>>>> to see us avoid is a back and forth in behavior of released versions.
> >>>>> (Note that 4.12.2 is similarly blocked on a decision either way here.)
> >>>>
> >>>> I'm not really sure we really need to revert ea38867831da before the
> >>>> 4.13 release. It might not be optimal, but I'm quite sure the number of
> >>>> cases where this could be an issue is rather small already, and I tend
> >>>> to agree with Paul that admins who really care will more likely want to
> >>>> select the option where the system will "just work". IMO the "noticeable
> >>>> failure" is something which will be selected mostly by developers. But
> >>>> I'm not an expert in that area, so I don't want to influence the
> >>>> decision regarding the to be selected default too much.
> >>>
> >>> An admin not wanting to know is, to me, the same as them not wanting
> >>> to know about security issues, and hence not subscribing to our
> >>> announcements lists. I can accept this being a reasonable thing to
> >>> do when it is an _informed_ decision. But with the current
> >>> arrangements there's no way whatsoever for an admin to know.
> >>
> >> Maybe I have misunderstood the current state, but I thought that it
> >> would just silently hide quirky devices without imposing a security
> >> risk. We would not learn which devices are quirky, but OTOH I doubt
> >> we'd get many reports about those in case your patch goes in.
> >
> > We don't want or need such reports, that's not the point. The
> > security risk comes from the quirkiness of the devices - admins
> > may wrongly think all is well and expose quirky devices to not
> > sufficiently trusted guests. (I say this fully realizing that
> > exposing devices to untrusted guests is almost always a certain
> > level of risk.)
> 
> Do we _know_ those devices are problematic from a security standpoint?
> Normally the IOMMU should do the isolation just fine. If it doesn't
> then it's not the quirky device which is problematic, but the IOMMU.
> 
> I thought the problem was that the quirky devices would not stop all
> (read) DMA even when being unassigned from the guest resulting in
> fatal IOMMU faults. The dummy page should stop those faults from
> happening, resulting in a more stable system.

That's right.

> 
> So what are the security problems which are added by this behavior?
> 

Since *not* having the 'sink' page allows a guest to pull off a host DoS in the 
presence of such h/w, security is surely increased by having it?

  Paul

> 
> Juergen
> 
> 

Re: [Xen-devel] [PATCH net-next] xen-netback: get rid of old udev related code

2019-12-13 Thread Durrant, Paul
> -----Original Message-----
> From: Jürgen Groß 
> Sent: 13 December 2019 05:41
> To: David Miller ; Durrant, Paul
> 
> Cc: xen-devel@lists.xenproject.org; wei@kernel.org; linux-
> ker...@vger.kernel.org; net...@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: get rid of old udev
> related code
> 
> On 12.12.19 20:05, David Miller wrote:
> > From: Paul Durrant 
> > Date: Thu, 12 Dec 2019 13:54:06 +
> >
> >> In the past it used to be the case that the Xen toolstack relied upon
> >> udev to execute backend hotplug scripts. However this has not been the
> >> case for many releases now and removal of the associated code in
> >> xen-netback shortens the source by more than 100 lines, and removes
> much
> >> complexity in the interaction with the xenstore backend state.
> >>
> >> NOTE: xen-netback is the only xenbus driver to have a functional
> uevent()
> >>method. The only other driver to have a method at all is
> >>pvcalls-back, and currently pvcalls_back_uevent() simply returns
> 0.
> >>Hence this patch also facilitates further cleanup.
> >>
> >> Signed-off-by: Paul Durrant 
> >
> > If userspace ever used this stuff, I seriously doubt you can remove this
> > even if it hasn't been used in 5+ years.
> 
> Hmm, depends.
> 
> This has been used by Xen tools in dom0 only. If the last usage has been
> in a Xen version which is no longer able to run with current Linux in
> dom0 it could be removed. But I guess this would have to be a rather old
> version of Xen (like 3.x?).
> 
> Paul, can you give a hint since which Xen version the toolstack no
> longer relies on udev to start the hotplug scripts?
> 

The udev rules were in a file called tools/hotplug/Linux/xen-backend.rules (in 
xen.git), and a commit from Roger removed the NIC rules in 2012:

commit 57ad6afe2a08a03c40bcd336bfb27e008e1d3e53
Author: Roger Pau Monne 
Date:   Thu Jul 26 16:47:35 2012 +0100

libxl: call hotplug scripts for nic devices from libxl

Since most of the needed work is already done in previous patches,
this patch only contains the necessary code to call hotplug scripts
for nic devices, that should be called when the device is added or
removed from a guest.

Added another parameter to libxl__get_hotplug_script_info, that is
used to know the number of times hotplug scripts have been called for
that device. This is currently used by IOEMU nics on Linux.

Signed-off-by: Roger Pau Monne 
Acked-by: Ian Jackson
Committed-by: Ian Campbell 

The last commit I could find to that file modified its name to 
xen-backend.rules.in, and this was finally removed by George in 2015:

commit 2ba368d13893402b2f1fb3c283ddcc714659dd9b
Author: George Dunlap 
Date:   Mon Jul 6 11:51:39 2015 +0100

libxl: Remove linux udev rules

They are no longer needed, having been replaced by a daemon for
driverdomains which will run scripts as necessary.

Worse yet, they seem to be broken for script-based block devices, such
as block-iscsi.  This wouldn't matter so much if they were never run
by default; but if you run block-attach without having created a
domain, then the appropriate node to disable running udev scripts will
not have been written yet, and the attach will silently fail.

Rather than try to sort out that issue, just remove them entirely.

Signed-off-by: George Dunlap 
Acked-by: Wei Liu 

So, I think this means anyone using a version of the Xen tools within recent 
memory will be having their hotplug scripts called directly by libxl (and 
having udev rules present would actually be counter-productive, as George's 
commit states and as I discovered the hard way when the change was originally 
made).

  Paul



> 
> Juergen
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of quarantined devices optional

2019-12-13 Thread Durrant, Paul
> -----Original Message-----
> From: Jan Beulich 
> Sent: 13 December 2019 13:26
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Juergen Gross ; Kevin
> Tian ; Stefano Stabellini ;
> Julien Grall ; Wei Liu ; Konrad Wilk
> ; George Dunlap ;
> Andrew Cooper ; Paul Durrant ;
> Ian Jackson ; Roger Pau Monné
> 
> Subject: Re: [PATCH v2] IOMMU: make DMA containment of quarantined devices
> optional
> 
> On 13.12.2019 14:12, Durrant, Paul wrote:
> >> -----Original Message-----
> >> From: Xen-devel  On Behalf Of
> Jan
> >> Beulich
> >> Sent: 13 December 2019 12:53
> >> To: xen-devel@lists.xenproject.org
> >> Cc: Juergen Gross ; Kevin Tian ;
> >> Stefano Stabellini ; Julien Grall
> >> ; Wei Liu ; Konrad Wilk
> >> ; George Dunlap ;
> >> Andrew Cooper ; Paul Durrant ;
> >> Ian Jackson ; Roger Pau Monné
> >> 
> >> Subject: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of
> quarantined
> >> devices optional
> >>
> >> Containing still in flight DMA was introduced to work around certain
> >> devices / systems hanging hard upon hitting an IOMMU fault. Passing
> >> through (such) devices (on such systems) is inherently insecure (as
> >> guests could easily arrange for IOMMU faults to occur). Defaulting to
> >> a mode where admins may not even become aware of issues with devices
> can
> >> be considered undesirable. Therefore convert this mode of operation to
> >> an optional one, not one enabled by default.
> >>
> >> This involves resurrecting code commit ea38867831da ("x86 / iommu: set
> >> up a scratch page in the quarantine domain") did remove, in a slightly
> >> extended and abstracted fashion. Here, instead of reintroducing a
> pretty
> >> pointless use of "goto" in domain_context_unmap(), and instead of
> making
> >> the function (at least temporarily) inconsistent, take the opportunity
> >> and replace the other similarly pointless "goto" as well.
> >>
> >> In order to key the re-instated bypasses off of there (not) being a
> root
> >> page table this further requires moving the allocate_domain_resources()
> >> invocation from reassign_device() to amd_iommu_setup_domain_device()
> (or
> >> else reassign_device() would allocate a root page table anyway); this
> is
> >> benign to the second caller of the latter function.
> >>
> >> Signed-off-by: Jan Beulich 
> >> ---
> >> As far as 4.13 is concerned, I guess if we can't come to an agreement
> >> here, the only other option is to revert ea38867831da from the branch,
> >> for having been committed prematurely (I'm not so much worried about
> the
> >> master branch, where we have ample time until 4.14). What I surely want
> >> to see us avoid is a back and forth in behavior of released versions.
> >> (Note that 4.12.2 is similarly blocked on a decision either way here.)
> >>
> >> I'm happy to take better suggestions to replace "full".
> >
> > How about simply "sink", since that's what it does?
> 
> But it's not really a "sink", as we still fault writes (which is the
> only thing I can see to be "sunk" if I'm getting the meaning of the
> word right).
> 
> >> --- a/xen/drivers/passthrough/iommu.c
> >> +++ b/xen/drivers/passthrough/iommu.c
> >> @@ -30,13 +30,17 @@ bool_t __initdata iommu_enable = 1;
> >>  bool_t __read_mostly iommu_enabled;
> >>  bool_t __read_mostly force_iommu;
> >>  bool_t __read_mostly iommu_verbose;
> >> -bool __read_mostly iommu_quarantine = true;
> >>  bool_t __read_mostly iommu_igfx = 1;
> >>  bool_t __read_mostly iommu_snoop = 1;
> >>  bool_t __read_mostly iommu_qinval = 1;
> >>  bool_t __read_mostly iommu_intremap = 1;
> >>  bool_t __read_mostly iommu_crash_disable;
> >>
> >> +#define IOMMU_quarantine_none  0
> >> +#define IOMMU_quarantine_basic 1
> >> +#define IOMMU_quarantine_full  2
> >> +uint8_t __read_mostly iommu_quarantine = IOMMU_quarantine_basic;
> >
> > If we have 'IOMMU_quarantine_sink' instead of 'IOMMU_quarantine_full',
> > then how about 'IOMMU_quarantine_write_fault' instead of
> > 'IOMMU_quarantine_basic'?
> 
> Why "write_fault"? Even in "full" mode you only avoid read faults
> aiui (see also above). So if anything "write_fault" would be a
> replacement for "full"; "basic" could be replaced by just "fault"
> then.

Sorry, yes, I had things the wrong way round. "fault" and "write_fault" sound 
good.
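
For illustration only, the naming settled on here could be sketched as
follows. This is an assumption-laden sketch, not the committed code: the
patch itself uses #defines named none/basic/full, the enum and the helper
quarantined_dma_faults() are hypothetical, and the behaviour modelled
(reads avoided, writes still faulting in the former "full" mode) is taken
only from the discussion in this thread.

```c
#include <stdbool.h>

/* Hypothetical renaming following the thread; the posted patch uses
 * IOMMU_quarantine_none / _basic / _full as plain #defines. */
enum iommu_quarantine_mode {
    IOMMU_quarantine_none = 0,        /* no quarantining at all */
    IOMMU_quarantine_fault = 1,       /* was "basic": all DMA faults */
    IOMMU_quarantine_write_fault = 2, /* was "full": only writes fault */
};

/* Illustrative helper (not part of Xen): does in-flight DMA of the given
 * direction from a quarantined device raise an IOMMU fault? */
static bool quarantined_dma_faults(enum iommu_quarantine_mode mode,
                                   bool is_write)
{
    switch (mode) {
    case IOMMU_quarantine_fault:
        return true;      /* nothing is mapped, so reads and writes fault */
    case IOMMU_quarantine_write_fault:
        return is_write;  /* reads hit the scratch page, writes fault */
    default:
        return false;     /* no quarantine: not modelled in this sketch */
    }
}
```

The point of the renaming is that each mode's name states which accesses
still fault, which is what the two-way confusion above was about.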

  Paul

> 
> Jan

Re: [Xen-devel] [PATCH net-next] xen-netback: get rid of old udev related code

2019-12-13 Thread Durrant, Paul
> -----Original Message-----
> From: Jürgen Groß 
> Sent: 13 December 2019 10:02
> To: Durrant, Paul ; David Miller
> 
> Cc: xen-devel@lists.xenproject.org; wei@kernel.org; linux-
> ker...@vger.kernel.org; net...@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: get rid of old udev
> related code
> 
> On 13.12.19 10:24, Durrant, Paul wrote:
> >> -----Original Message-----
> >> From: Jürgen Groß 
> >> Sent: 13 December 2019 05:41
> >> To: David Miller ; Durrant, Paul
> >> 
> >> Cc: xen-devel@lists.xenproject.org; wei@kernel.org; linux-
> >> ker...@vger.kernel.org; net...@vger.kernel.org
> >> Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: get rid of old
> udev
> >> related code
> >>
> >> On 12.12.19 20:05, David Miller wrote:
> >>> From: Paul Durrant 
> >>> Date: Thu, 12 Dec 2019 13:54:06 +
> >>>
> >>>> In the past it used to be the case that the Xen toolstack relied upon
> >>>> udev to execute backend hotplug scripts. However this has not been
> the
> >>>> case for many releases now and removal of the associated code in
> >>>> xen-netback shortens the source by more than 100 lines, and removes
> >> much
> >>>> complexity in the interaction with the xenstore backend state.
> >>>>
> >>>> NOTE: xen-netback is the only xenbus driver to have a functional
> >> uevent()
> >>>> method. The only other driver to have a method at all is
> >>>> pvcalls-back, and currently pvcalls_back_uevent() simply
> returns
> >> 0.
> >>>> Hence this patch also facilitates further cleanup.
> >>>>
> >>>> Signed-off-by: Paul Durrant 
> >>>
> >>> If userspace ever used this stuff, I seriously doubt you can remove
> this
> >>> even if it hasn't been used in 5+ years.
> >>
> >> Hmm, depends.
> >>
> >> This has been used by Xen tools in dom0 only. If the last usage has
> been
> >> in a Xen version which is no longer able to run with current Linux in
> >> dom0 it could be removed. But I guess this would have to be a rather
> old
> >> version of Xen (like 3.x?).
> >>
> >> Paul, can you give a hint since which Xen version the toolstack no
> >> longer relies on udev to start the hotplug scripts?
> >>
> >
> > The udev rules were in a file called tools/hotplug/Linux/xen-
> backend.rules (in xen.git), and a commit from Roger removed the NIC rules
> in 2012:
> >
> > commit 57ad6afe2a08a03c40bcd336bfb27e008e1d3e53
> 
> Xen 4.2
> 
> > The last commit I could find to that file modified its name to xen-
> backend.rules.in, and this was finally removed by George in 2015:
> >
> > commit 2ba368d13893402b2f1fb3c283ddcc714659dd9b
> 
> Xen 4.6
> 
> > So, I think this means anyone using a version of the Xen tools within
> recent memory will be having their hotplug scripts called directly by
> libxl (and having udev rules present would actually be counter-productive,
> as George's commit states and as I discovered the hard way when the change
> was originally made).
> 
> The problem is systems with either old Xen versions (before Xen 4.2) or
> with other toolstacks (e.g. Xen 4.4 with xend) which want to use a new
> dom0 kernel.
> 
> And I'm not sure there aren't such systems (especially in case someone
> wants to stick with xend).
> 

But would someone sticking with such an old toolstack expect to run on an 
unmodified upstream dom0? There has to be some way in which we can retire old 
code.

Aside from the udev kicks though, I still think the hotplug-status/ring state 
interaction is just bogus anyway. As I said in a previous thread, the 
hotplug-status ought to be indicated as carrier status, if at all, so I still 
think all that code ought to go.

  Paul

> 
> Juergen

Re: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of quarantined devices optional

2019-12-13 Thread Durrant, Paul
> -----Original Message-----
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 13 December 2019 12:53
> To: xen-devel@lists.xenproject.org
> Cc: Juergen Gross ; Kevin Tian ;
> Stefano Stabellini ; Julien Grall
> ; Wei Liu ; Konrad Wilk
> ; George Dunlap ;
> Andrew Cooper ; Paul Durrant ;
> Ian Jackson ; Roger Pau Monné
> 
> Subject: [Xen-devel] [PATCH v2] IOMMU: make DMA containment of quarantined
> devices optional
> 
> Containing still in flight DMA was introduced to work around certain
> devices / systems hanging hard upon hitting an IOMMU fault. Passing
> through (such) devices (on such systems) is inherently insecure (as
> guests could easily arrange for IOMMU faults to occur). Defaulting to
> a mode where admins may not even become aware of issues with devices can
> be considered undesirable. Therefore convert this mode of operation to
> an optional one, not one enabled by default.
> 
> This involves resurrecting code commit ea38867831da ("x86 / iommu: set
> up a scratch page in the quarantine domain") did remove, in a slightly
> extended and abstracted fashion. Here, instead of reintroducing a pretty
> pointless use of "goto" in domain_context_unmap(), and instead of making
> the function (at least temporarily) inconsistent, take the opportunity
> and replace the other similarly pointless "goto" as well.
> 
> In order to key the re-instated bypasses off of there (not) being a root
> page table this further requires moving the allocate_domain_resources()
> invocation from reassign_device() to amd_iommu_setup_domain_device() (or
> else reassign_device() would allocate a root page table anyway); this is
> benign to the second caller of the latter function.
> 
> Signed-off-by: Jan Beulich 
> ---
> As far as 4.13 is concerned, I guess if we can't come to an agreement
> here, the only other option is to revert ea38867831da from the branch,
> for having been committed prematurely (I'm not so much worried about the
> master branch, where we have ample time until 4.14). What I surely want
> to see us avoid is a back and forth in behavior of released versions.
> (Note that 4.12.2 is similarly blocked on a decision either way here.)
> 
> I'm happy to take better suggestions to replace "full".

How about simply "sink", since that's what it does?

[snip]
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -30,13 +30,17 @@ bool_t __initdata iommu_enable = 1;
>  bool_t __read_mostly iommu_enabled;
>  bool_t __read_mostly force_iommu;
>  bool_t __read_mostly iommu_verbose;
> -bool __read_mostly iommu_quarantine = true;
>  bool_t __read_mostly iommu_igfx = 1;
>  bool_t __read_mostly iommu_snoop = 1;
>  bool_t __read_mostly iommu_qinval = 1;
>  bool_t __read_mostly iommu_intremap = 1;
>  bool_t __read_mostly iommu_crash_disable;
> 
> +#define IOMMU_quarantine_none  0
> +#define IOMMU_quarantine_basic 1
> +#define IOMMU_quarantine_full  2
> +uint8_t __read_mostly iommu_quarantine = IOMMU_quarantine_basic;

If we have 'IOMMU_quarantine_sink' instead of 'IOMMU_quarantine_full', then how 
about 'IOMMU_quarantine_write_fault' instead of 'IOMMU_quarantine_basic'?

  Paul

Re: [Xen-devel] [RFC PATCH 3/3] xen/netback: Fix grant copy across page boundary with KASAN

2019-12-17 Thread Durrant, Paul
> -----Original Message-----
> From: Xen-devel  On Behalf Of
> Sergey Dyasli
> Sent: 17 December 2019 14:08
> To: xen-de...@lists.xen.org; kasan-...@googlegroups.com; linux-
> ker...@vger.kernel.org
> Cc: Juergen Gross ; Sergey Dyasli
> ; Stefano Stabellini ;
> George Dunlap ; Ross Lagerwall
> ; Alexander Potapenko ;
> Andrey Ryabinin ; Boris Ostrovsky
> ; Dmitry Vyukov 
> Subject: [Xen-devel] [RFC PATCH 3/3] xen/netback: Fix grant copy across
> page boundary with KASAN
> 
> From: Ross Lagerwall 
> 
> When KASAN (or SLUB_DEBUG) is turned on, the normal expectation that
> allocations are aligned to the next power of 2 of the size does not
> hold. Therefore, handle grant copies that cross page boundaries.
> 
> Signed-off-by: Ross Lagerwall 
> Signed-off-by: Sergey Dyasli 

Would have been nice to cc netback maintainers...

> ---
>  drivers/net/xen-netback/common.h  |  2 +-
>  drivers/net/xen-netback/netback.c | 55 ---
>  2 files changed, 45 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-
> netback/common.h
> index 05847eb91a1b..e57684415edd 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -155,7 +155,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
>   struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
>   grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
> 
> - struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
> + struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS * 2];
>   struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
>   struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
>   /* passed to gnttab_[un]map_refs with pages under (un)mapping */
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-
> netback/netback.c
> index 0020b2e8c279..1541b6e0cc62 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -320,6 +320,7 @@ static int xenvif_count_requests(struct xenvif_queue
> *queue,
> 
>  struct xenvif_tx_cb {
>   u16 pending_idx;
> + u8 copies;
>  };

I know we're a way off the limit (48 bytes) but I wonder if we ought to have a 
compile time check here that we're not overflowing skb->cb.
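
A minimal userspace sketch of such a check, under stated assumptions:
SKB_CB_SIZE is a stand-in for the 48-byte skb->cb limit mentioned above,
and the struct mirrors the patched xenvif_tx_cb using fixed-width types.
In the kernel itself this would presumably be written with BUILD_BUG_ON()
against the real size of the cb field rather than C11 static_assert.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Stand-in for the size of skb->cb; 48 bytes as noted in the review. */
#define SKB_CB_SIZE 48

/* Mirrors struct xenvif_tx_cb after the patch, with fixed-width types. */
struct xenvif_tx_cb {
    uint16_t pending_idx;
    uint8_t copies;
};

/* The build fails if the private data ever outgrows the control buffer. */
static_assert(sizeof(struct xenvif_tx_cb) <= SKB_CB_SIZE,
              "xenvif_tx_cb overflows skb->cb");
```

The check costs nothing at run time and documents the constraint next to
the structure it protects.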

> 
>  #define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb)
> @@ -439,6 +440,7 @@ static int xenvif_tx_check_gop(struct xenvif_queue
> *queue,
>  {
>   struct gnttab_map_grant_ref *gop_map = *gopp_map;
>   u16 pending_idx = XENVIF_TX_CB(skb)->pending_idx;
> + u8 copies = XENVIF_TX_CB(skb)->copies;
>   /* This always points to the shinfo of the skb being checked, which
>* could be either the first or the one on the frag_list
>*/
> @@ -450,23 +452,27 @@ static int xenvif_tx_check_gop(struct xenvif_queue
> *queue,
>   int nr_frags = shinfo->nr_frags;
>   const bool sharedslot = nr_frags &&
>   frag_get_pending_idx(&shinfo->frags[0]) ==
> pending_idx;
> - int i, err;
> + int i, err = 0;
> 
> - /* Check status of header. */
> - err = (*gopp_copy)->status;
> - if (unlikely(err)) {
> - if (net_ratelimit())
> - netdev_dbg(queue->vif->dev,
> + while (copies) {
> + /* Check status of header. */
> + int newerr = (*gopp_copy)->status;
> + if (unlikely(newerr)) {
> + if (net_ratelimit())
> + netdev_dbg(queue->vif->dev,
>  "Grant copy of header failed! status: %d
> pending_idx: %u ref: %u\n",
>  (*gopp_copy)->status,
>  pending_idx,
>  (*gopp_copy)->source.u.ref);
> - /* The first frag might still have this slot mapped */
> - if (!sharedslot)
> - xenvif_idx_release(queue, pending_idx,
> -XEN_NETIF_RSP_ERROR);
> + /* The first frag might still have this slot mapped */
> + if (!sharedslot && !err)
> + xenvif_idx_release(queue, pending_idx,
> +XEN_NETIF_RSP_ERROR);

Can't this be done after the loop, if there is an accumulated err? I think it 
would make the code slightly neater.
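
Roughly what I have in mind, as a standalone sketch rather than a patch
against netback.c: struct copy_op is a stand-in for struct gnttab_copy,
check_header_copies() and its release flag are hypothetical names, and the
rate-limited logging is left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for struct gnttab_copy: only the status matters here. */
struct copy_op {
    int status; /* 0 on success, non-zero on failure */
};

/*
 * Check `copies` header copy operations starting at *opp and advance *opp
 * past all of them. Returns the last non-zero status seen (0 if none).
 * *release is set when the caller should release the pending slot; that
 * decision is made once, after the loop, from the accumulated error.
 */
static int check_header_copies(struct copy_op **opp, unsigned int copies,
                               bool sharedslot, bool *release)
{
    int err = 0;

    while (copies--) {
        int newerr = (*opp)->status;

        if (newerr)
            err = newerr;
        (*opp)++;
    }

    /* The first frag might still have this slot mapped. */
    *release = err && !sharedslot;
    return err;
}
```

With the release hoisted out of the loop there is no need for the extra
!err test to guard against releasing the slot twice.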

> + err = newerr;
> + }
> + (*gopp_copy)++;
> + copies--;
>   }
> - (*gopp_copy)++;
> 
>  check_frags:
>   for (i = 0; i < nr_frags; i++, gop_map++) {
> @@ -910,6 +916,7 @@ static void xenvif_tx_build_gops(struct xenvif_queue
> *queue,
>   xenvif_tx_err(queue, &txreq, extra_count, idx);
>   break;
>   }
> + XENVIF_TX_CB(skb)->copies = 0;
> 
>   skb_shinfo(skb)->nr_frags = ret;
>   if (data_len < txreq.size)
> @@ -933,6 +940,7 @@ static void 

Re: [Xen-devel] [PATCH for-next 1/7] x86: import hyperv-tlfs.h from Linux

2019-12-11 Thread Durrant, Paul
> -----Original Message-----
> From: Wei Liu 
> Sent: 11 December 2019 11:15
> To: Jan Beulich 
> Cc: Durrant, Paul ; Wei Liu ; Wei Liu
> ; Paul Durrant ; Andrew Cooper
> ; Michael Kelley ; Xen
> Development List ; Roger Pau Monné
> 
> Subject: Re: [PATCH for-next 1/7] x86: import hyperv-tlfs.h from Linux
> 
> On Tue, Dec 10, 2019 at 04:43:30PM +0100, Jan Beulich wrote:
> > On 10.12.2019 16:37, Durrant, Paul wrote:
> > >> -----Original Message-----
> > >> From: Xen-devel  On Behalf Of
> Jan
> > >> Beulich
> > >> Sent: 10 December 2019 15:34
> > >> To: Wei Liu 
> > >> Cc: Wei Liu ; Paul Durrant ;
> Andrew
> > >> Cooper ; Michael Kelley
> > >> ; Xen Development List  > >> de...@lists.xenproject.org>; Roger Pau Monné 
> > >> Subject: Re: [Xen-devel] [PATCH for-next 1/7] x86: import hyperv-
> tlfs.h
> > >> from Linux
> > >>
> > >> On 25.10.2019 11:16, Wei Liu wrote:
> > >>> Taken from Linux commit b2d8b167e15bb5ec2691d1119c025630a247f649.
> > >>>
> > >>> This is a pristine copy from Linux. It is not used yet and probably
> > >>> doesn't compile. Changes to make it work will come later.
> > >>>
> > >>> Signed-off-by: Wei Liu 
> > >>
> > >> This coming from Linux and assuming at least a fair part of it is
> > >> going to be used, in principle
> > >> Acked-by: Jan Beulich 
> > >>
> > >> However, there are many seemingly unnecessary uses of __packed
> > >> here, which I'd rather not see go in at all (i.e. not be dropped
> > >> later on, and then potentially missing some). I find ...
> > >>
> > >>> +typedef struct _HV_REFERENCE_TSC_PAGE {
> > >>> +   __u32 tsc_sequence;
> > >>> +   __u32 res1;
> > >>> +   __u64 tsc_scale;
> > >>> +   __s64 tsc_offset;
> > >>> +}  __packed HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
> > >>
> > >
> > > You realise there's a definition of this in the viridian code already,
> right?
> >
> > It looked familiar, but it didn't occur to me to point this out.
> > Yes, there looks to be room for deduplication...
> >
> 
> I had a plan to make viridian code use this copy directly.
> 

I have no objection to that, but I think it ought to be done as part of this 
series so that we don't end up with long-term duplication.

  Paul

> 
> > Actually, Wei, one more thing I was curious about - what is "tlfs"
> > an acronym of?
> 
> It means "Top-Level Function Specification".
> 
> (I wish Xen had something similar)
> 
> Wei.
> 
> >
> > Jan


Re: [Xen-devel] [PATCH] xen-blkback: prevent premature module unload

2019-12-11 Thread Durrant, Paul
> -----Original Message-----
> From: Roger Pau Monné 
> Sent: 11 December 2019 11:29
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; linux-bl...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Konrad Rzeszutek Wilk ;
> Jens Axboe 
> Subject: Re: [PATCH] xen-blkback: prevent premature module unload
> 
> On Tue, Dec 10, 2019 at 02:53:05PM +, Paul Durrant wrote:
> > Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem
> > cache. This cache is destroyed when xen-blkif is unloaded so it is
> > necessary to wait for the deferred free routine used for such objects to
> > complete. This necessity was missed in commit 14855954f636 "xen-blkback:
> > allow module to be cleanly unloaded". This patch fixes the problem by
> > taking/releasing extra module references in xen_blkif_alloc/free()
> > respectively.
> >
> > Signed-off-by: Paul Durrant 
> 
> Reviewed-by: Roger Pau Monné 
> 
> One nit below.
> 
> > ---
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: "Roger Pau Monné" 
> > Cc: Jens Axboe 
> > ---
> >  drivers/block/xen-blkback/xenbus.c | 10 ++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-
> blkback/xenbus.c
> > index e8c5c54e1d26..59d576d27ca7 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -171,6 +171,15 @@ static struct xen_blkif *xen_blkif_alloc(domid_t
> domid)
> > blkif->domid = domid;
> > atomic_set(&blkif->refcnt, 1);
> > init_completion(&blkif->drain_complete);
> > +
> > +   /*
> > +* Because freeing back to the cache may be deferred, it is not
> > +* safe to unload the module (and hence destroy the cache) until
> > +* this has completed. To prevent premature unloading, take an
> > +* extra module reference here and release only when the object
> > +* has been free back to the cache.
> ^ freed

Oh yes. Can this be done on commit, or would you like me to send a v2?

  Paul

> > +*/
> > +   __module_get(THIS_MODULE);
> > INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
> >
> > return blkif;
> > @@ -320,6 +329,7 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> >
> > /* Make sure everything is drained before shutting down */
> > kmem_cache_free(xen_blkif_cachep, blkif);
> > +   module_put(THIS_MODULE);
> >  }
> >
> >  int __init xen_blkif_interface_init(void)
> > --
> > 2.20.1
> >


Re: [Xen-devel] [PATCH] xen-blkback: prevent premature module unload

2019-12-11 Thread Durrant, Paul
> -----Original Message-----
> From: Roger Pau Monné 
> Sent: 11 December 2019 13:55
> To: Durrant, Paul ; Juergen Gross 
> Cc: xen-devel@lists.xenproject.org; linux-bl...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Konrad Rzeszutek Wilk ;
> Jens Axboe 
> Subject: Re: [PATCH] xen-blkback: prevent premature module unload
> 
> On Wed, Dec 11, 2019 at 01:27:42PM +, Durrant, Paul wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monné 
> > > Sent: 11 December 2019 11:29
> > > To: Durrant, Paul 
> > > Cc: xen-devel@lists.xenproject.org; linux-bl...@vger.kernel.org;
> linux-
> > > ker...@vger.kernel.org; Konrad Rzeszutek Wilk
> ;
> > > Jens Axboe 
> > > Subject: Re: [PATCH] xen-blkback: prevent premature module unload
> > >
> > > On Tue, Dec 10, 2019 at 02:53:05PM +, Paul Durrant wrote:
> > > > Objects allocated by xen_blkif_alloc come from the 'blkif_cache'
> kmem
> > > > cache. This cache is destroyed when xen-blkif is unloaded so it is
> > > > necessary to wait for the deferred free routine used for such
> objects to
> > > > complete. This necessity was missed in commit 14855954f636 "xen-
> blkback:
> > > > allow module to be cleanly unloaded". This patch fixes the problem
> by
> > > > taking/releasing extra module references in xen_blkif_alloc/free()
> > > > respectively.
> > > >
> > > > Signed-off-by: Paul Durrant 
> > >
> > > Reviewed-by: Roger Pau Monné 
> > >
> > > One nit below.
> > >
> > > > ---
> > > > Cc: Konrad Rzeszutek Wilk 
> > > > Cc: "Roger Pau Monné" 
> > > > Cc: Jens Axboe 
> > > > ---
> > > >  drivers/block/xen-blkback/xenbus.c | 10 ++
> > > >  1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-
> > > blkback/xenbus.c
> > > > index e8c5c54e1d26..59d576d27ca7 100644
> > > > --- a/drivers/block/xen-blkback/xenbus.c
> > > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > > @@ -171,6 +171,15 @@ static struct xen_blkif
> *xen_blkif_alloc(domid_t
> > > domid)
> > > > blkif->domid = domid;
> > > > atomic_set(&blkif->refcnt, 1);
> > > > init_completion(&blkif->drain_complete);
> > > > +
> > > > +   /*
> > > > +* Because freeing back to the cache may be deferred, it is
> not
> > > > +* safe to unload the module (and hence destroy the cache)
> until
> > > > +* this has completed. To prevent premature unloading, take an
> > > > +* extra module reference here and release only when the
> object
> > > > +* has been free back to the cache.
> > > ^ freed
> >
> > Oh yes. Can this be done on commit, or would you like me to send a v2?
> 
> Adjusting on commit would be fine for me, but it's up to Juergen since
> he is the one that will pick this up. IIRC the module unload patches
> didn't go through the block subsystem.

True. I forgot to manually add Juergen to the cc list.

  Paul

> 
> Thanks, Roger.


Re: [Xen-devel] [PATCH v2 4/4] xen-blkback: support dynamic unbind/bind

2019-12-11 Thread Durrant, Paul
> -----Original Message-----
> From: Roger Pau Monné 
> Sent: 11 December 2019 10:46
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; linux-ker...@vger.kernel.org; Konrad
> Rzeszutek Wilk ; Jens Axboe ;
> Boris Ostrovsky ; Juergen Gross
> ; Stefano Stabellini 
> Subject: Re: [PATCH v2 4/4] xen-blkback: support dynamic unbind/bind
> 
> On Tue, Dec 10, 2019 at 11:33:47AM +, Paul Durrant wrote:
> > By simply re-attaching to shared rings during connect_ring() rather than
> > assuming they are freshly allocated (i.e. assuming the counters are zero)
> > it is possible for vbd instances to be unbound and re-bound from and to
> > (respectively) a running guest.
> >
> > This has been tested by running:
> >
> > while true;
> >   do fio --name=randwrite --ioengine=libaio --iodepth=16 \
> >   --rw=randwrite --bs=4k --direct=1 --size=1G --verify=crc32;
> >   done
> >
> > in a PV guest whilst running:
> >
> > while true;
> >   do echo vbd-$DOMID-$VBD >unbind;
> >   echo unbound;
> >   sleep 5;
> 
> Is there any way to know when the unbind has finished? AFAICT
> xen_blkif_disconnect will return EBUSY if there are in flight
> requests, and the disconnect won't be completed until those requests
> are finished.

Yes, the device sysfs node will disappear when remove() completes.
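
A test harness could therefore poll for the node going away instead of
sleeping for a fixed time. The following is only a sketch under
assumptions: wait_for_node_removal() is a hypothetical helper, the caller
has to supply the path of the device's sysfs node (its exact location is
not spelled out in this thread), and the 100ms polling step is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Poll until `path` no longer exists or roughly `timeout_ms` has elapsed.
 * Returns 0 once the path is gone, -1 on timeout or on an unexpected
 * stat() error.
 */
static int wait_for_node_removal(const char *path, int timeout_ms)
{
    const struct timespec step = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
    int waited_ms = 0;

    for (;;) {
        struct stat st;

        if (stat(path, &st) != 0)
            return errno == ENOENT ? 0 : -1; /* gone, or a real error */
        if (waited_ms >= timeout_ms)
            return -1;                       /* still present: give up */
        nanosleep(&step, NULL);
        waited_ms += 100;
    }
}
```

In the test loops quoted below, a call like this would stand in for the
fixed "sleep 5" after the unbind.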

> 
> >   echo vbd-$DOMID-$VBD >bind;
> >   echo bound;
> >   sleep 3;
> >   done
> >
> > in dom0 from /sys/bus/xen-backend/drivers/vbd to continuously unbind and
> > re-bind its system disk image.
> >
> > This is a highly useful feature for a backend module as it allows it to
> be
> > unloaded and re-loaded (i.e. updated) without requiring domUs to be
> halted.
> > This was also tested by running:
> >
> > while true;
> >   do echo vbd-$DOMID-$VBD >unbind;
> >   echo unbound;
> >   sleep 5;
> >   rmmod xen-blkback;
> >   echo unloaded;
> >   sleep 1;
> >   modprobe xen-blkback;
> >   echo bound;
> >   cd $(pwd);
> >   sleep 3;
> >   done
> >
> > in dom0 whilst running the same loop as above in the (single) PV guest.
> >
> > Some (less stressful) testing has also been done using a Windows HVM
> guest
> > with the latest 9.0 PV drivers installed.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: "Roger Pau Monné" 
> > Cc: Jens Axboe 
> > Cc: Boris Ostrovsky 
> > Cc: Juergen Gross 
> > Cc: Stefano Stabellini 
> >
> > v2:
> >  - Apply a sanity check to the value of rsp_prod and fail the re-attach
> >if it is implausible
> >  - Set allow_rebind to prevent ring from being closed on unbind
> >  - Update test workload from dd to fio (with verification)
> > ---
> >  drivers/block/xen-blkback/xenbus.c | 59 +-
> >  1 file changed, 41 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-
> blkback/xenbus.c
> > index e8c5c54e1d26..13d09630b237 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -181,6 +181,8 @@ static int xen_blkif_map(struct xen_blkif_ring
> *ring, grant_ref_t *gref,
> >  {
> > int err;
> > struct xen_blkif *blkif = ring->blkif;
> > +   struct blkif_common_sring *sring_common;
> > +   RING_IDX rsp_prod, req_prod;
> >
> > /* Already connected through? */
> > if (ring->irq)
> > @@ -191,46 +193,66 @@ static int xen_blkif_map(struct xen_blkif_ring
> *ring, grant_ref_t *gref,
> > if (err < 0)
> > return err;
> >
> > +   sring_common = (struct blkif_common_sring *)ring->blk_ring;
> > +   rsp_prod = READ_ONCE(sring_common->rsp_prod);
> > +   req_prod = READ_ONCE(sring_common->req_prod);
> > +
> > switch (blkif->blk_protocol) {
> > case BLKIF_PROTOCOL_NATIVE:
> > {
> > -   struct blkif_sring *sring;
> > -   sring = (struct blkif_sring *)ring->blk_ring;
> > -   BACK_RING_INIT(&ring->blk_rings.native, sring,
> > -  XEN_PAGE_SIZE * nr_grefs);
> > +   struct blkif_sring *sring_native =
> > +   (struct blkif_sring *)ring->blk_ring;
> 
> I think you can constify both sring_native and sring_common (and the
> other instances below).

Yes, I can do that. I don't think the macros would mind.

> 
> > +   unsigned int size = __RING_SIZE(sring_native,
> > +  

Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is forced to closed

2019-12-11 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 11 December 2019 10:21
> To: Durrant, Paul ; Roger Pau Monné
> 
> Cc: xen-devel@lists.xenproject.org; linux-ker...@vger.kernel.org; Stefano
> Stabellini ; Boris Ostrovsky
> 
> Subject: Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is forced
> to closed
> 
> On 11.12.19 11:14, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Roger Pau Monné 
> >> Sent: 11 December 2019 10:06
> >> To: Durrant, Paul 
> >> Cc: xen-devel@lists.xenproject.org; linux-ker...@vger.kernel.org;
> Juergen
> >> Gross ; Stefano Stabellini ;
> >> Boris Ostrovsky 
> >> Subject: Re: [Xen-devel] [PATCH v2 2/4] xenbus: limit when state is
> forced
> >> to closed
> >>
> >> On Tue, Dec 10, 2019 at 11:33:45AM +, Paul Durrant wrote:
> >>> If a driver probe() fails then leave the xenstore state alone. There
> is
> >> no
> >>> reason to modify it as the failure may be due to transient resource
> >>> allocation issues and hence a subsequent probe() may succeed.
> >>>
> >>> If the driver supports re-binding then force state to closed
> during
> >>> remove() only in the case when the toolstack may need to clean up.
> This
> >> can
> >>> be detected by checking whether the state in xenstore has been set to
> >>> closing prior to device removal.
> >>>
> >>> NOTE: Re-bind support is indicated by new boolean in struct
> >> xenbus_driver,
> >>>which defaults to false. Subsequent patches will add support to
> >>>some backend drivers.
> >>
> >> My intention was to specify whether you want to close the
> >> backends on unbind in sysfs, so that a user can decide at runtime,
> >> rather than having a hardcoded value in the driver.
> >>
> >> Anyway, I'm less sure whether such a runtime tunable is useful at all,
> >> so let's leave it out; it can always be added afterwards. At the end
> >> of the day a user wrongly doing a rmmod blkback can always recover
> >> gracefully by loading blkback again with your proposed approach to
> >> leave connections open on module removal.
> >>
> >> Sorry for the extra work.
> >>
> >
> > Does this mean you don't think the extra driver flag is necessary any
> more? NB: now that xenbus actually takes module references you can't
> accidentally rmmod any more :-)
> 
> I'd like it to be kept, please.
> 

Ok. I'll leave this patch alone then.

  Paul

> Juergen
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/4] xen-blkback: support dynamic unbind/bind

2019-12-11 Thread Durrant, Paul
> -Original Message-
> > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-
> > blkback/xenbus.c
> > > index e8c5c54e1d26..13d09630b237 100644
> > > --- a/drivers/block/xen-blkback/xenbus.c
> > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > @@ -181,6 +181,8 @@ static int xen_blkif_map(struct xen_blkif_ring
> > *ring, grant_ref_t *gref,
> > >  {
> > >   int err;
> > >   struct xen_blkif *blkif = ring->blkif;
> > > + struct blkif_common_sring *sring_common;
> > > + RING_IDX rsp_prod, req_prod;
> > >
> > >   /* Already connected through? */
> > >   if (ring->irq)
> > > @@ -191,46 +193,66 @@ static int xen_blkif_map(struct xen_blkif_ring
> > *ring, grant_ref_t *gref,
> > >   if (err < 0)
> > >   return err;
> > >
> > > + sring_common = (struct blkif_common_sring *)ring->blk_ring;
> > > + rsp_prod = READ_ONCE(sring_common->rsp_prod);
> > > + req_prod = READ_ONCE(sring_common->req_prod);
> > > +
> > >   switch (blkif->blk_protocol) {
> > >   case BLKIF_PROTOCOL_NATIVE:
> > >   {
> > > - struct blkif_sring *sring;
> > > - sring = (struct blkif_sring *)ring->blk_ring;
> > > - BACK_RING_INIT(&ring->blk_rings.native, sring,
> > > -XEN_PAGE_SIZE * nr_grefs);
> > > + struct blkif_sring *sring_native =
> > > + (struct blkif_sring *)ring->blk_ring;
> >
> > I think you can constify both sring_native and sring_common (and the
> > other instances below).
> 
> Yes, I can do that. I don't think the macros would mind.
> 

Spoke too soon. They do mind, of course, because the sring pointer in the 
front/back ring is not (and should not be) const. I can const sring_common but 
no others.

  Paul
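
The constraint described above can be illustrated with a cut-down model. This is only a sketch: demo_sring, demo_back_ring and both helpers are hypothetical stand-ins, not the real ring.h types or macros. It shows why a pointer used purely to snapshot the producer indices can be const, while a pointer handed to the ring attach/init step cannot, because that step stores it in a non-const member.

```c
#include <stdint.h>

/* Hypothetical stand-in for the shared ring page. */
struct demo_sring {
    uint32_t req_prod;
    uint32_t rsp_prod;
};

/* Hypothetical stand-in for the backend's private view of the ring. */
struct demo_back_ring {
    uint32_t rsp_prod_pvt;
    uint32_t req_cons;
    struct demo_sring *sring;   /* non-const: responses are written via this */
};

/* Read-only snapshot of the indices: a pointer-to-const is sufficient. */
static uint32_t demo_outstanding(const struct demo_sring *s)
{
    return s->req_prod - s->rsp_prod;
}

/*
 * Attaching stores the pointer in the non-const member above, so the
 * parameter cannot be const without discarding the qualifier.
 */
static void demo_back_ring_attach(struct demo_back_ring *r,
                                  struct demo_sring *s)
{
    r->rsp_prod_pvt = s->rsp_prod;
    r->req_cons = s->rsp_prod;
    r->sring = s;
}
```

Declaring the second parameter of demo_back_ring_attach() as const would make the assignment to r->sring a qualifier-discarding conversion, which is the same kind of complaint the real macros raise.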

Re: [Xen-devel] [PATCH net-next] xen-netback: get rid of old udev related code

2019-12-12 Thread Durrant, Paul
> -Original Message-
> From: jandr...@gmail.com 
> Sent: 12 December 2019 16:32
> To: Durrant, Paul 
> Cc: xen-devel ; net...@vger.kernel.org;
> open list ; Wei Liu ;
> David S. Miller 
> Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: get rid of old udev
> related code
> 
> On Thu, Dec 12, 2019 at 8:56 AM Paul Durrant  wrote:
> >
> > In the past it used to be the case that the Xen toolstack relied upon
> > udev to execute backend hotplug scripts. However this has not been the
> > case for many releases now and removal of the associated code in
> > xen-netback shortens the source by more than 100 lines, and removes much
> > complexity in the interaction with the xenstore backend state.
> >
> > NOTE: xen-netback is the only xenbus driver to have a functional
> uevent()
> >   method. The only other driver to have a method at all is
> >   pvcalls-back, and currently pvcalls_back_uevent() simply returns
> 0.
> >   Hence this patch also facilitates further cleanup.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Wei Liu 
> > Cc: "David S. Miller" 
> > ---
> >  drivers/net/xen-netback/common.h |  11 ---
> >  drivers/net/xen-netback/xenbus.c | 125 ---
> >  2 files changed, 14 insertions(+), 122 deletions(-)
> >
> > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-
> netback/common.h
> > index 05847eb91a1b..e48da004c1a3 100644
> 
> 
> 
> > -static inline void backend_switch_state(struct backend_info *be,
> > -   enum xenbus_state state)
> > -{
> > -   struct xenbus_device *dev = be->dev;
> > -
> > -   pr_debug("%s -> %s\n", dev->nodename, xenbus_strstate(state));
> > -   be->state = state;
> > -
> > -   /* If we are waiting for a hotplug script then defer the
> > -* actual xenbus state change.
> > -*/
> > -   if (!be->have_hotplug_status_watch)
> > -   xenbus_switch_state(dev, state);
> 
> have_hotplug_status_watch prevents xen-netback from switching to
> connected state unless the backend scripts have written
> "hotplug-status" "success".  I had always thought that was intentional
> so the frontend doesn't connect when the backend is unconnected.  i.e.
> if the backend script fails, it writes "hotplug-status" "error" and
> the frontend doesn't connect.
> 
> That behavior is independent of using udev to run the scripts.  I'm
> not opposed to removing it, but I think it at least warrants
> mentioning in the commit message.

True, but it's probably related. The netback probe would previously kick udev, 
the hotplug script would then run, and then the state would go connected. I 
think, because the hotplug is invoked directly by the toolstack now, these 
things really ought not to be tied together. TBH I can't see any harm in the 
frontend seeing the network connection before the backend plumbing is done... 
If the frontend should have any sort of indication of whether the backend is 
plumbed or not then IMO it ought to be as a virtual carrier/link status, 
because unplumbing and re-plumbing could be done at any time really without any 
need for the shared ring to go away (and in fact I will be following up at some 
point with a patch to allow unbind and re-bind of netback).

I'll elaborate in the commit message as you suggest :-)

Cheers,

  Paul

> 
> Regards,
> Jason

Re: [Xen-devel] Xen 4.14 and future work

2019-12-06 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Andrew Cooper
> Sent: 05 December 2019 15:31
> To: Xen-devel List 
> Subject: Re: [Xen-devel] Xen 4.14 and future work
> 
> On 02/12/2019 19:51, Andrew Cooper wrote:
> > Hello,
> >
> > Now that 4.13 is on its way out of the door, it is time to look to
> > ongoing work.
> 
[snip]

/me remembers something else...

ISTR work was being done to replace minios stubdoms with something more modern. 
Is this continuing? AFAIK we are really only keeping qemu trad alive for 
stubdoms and it would be nice if we could finally retire it.

  Paul

Re: [Xen-devel] xen-unstable (4.14 to be): Assertion '!preempt_count()' failed at preempt.c:36

2019-12-05 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Sander Eikelenboom
> Sent: 04 December 2019 21:04
> To: Jan Beulich 
> Cc: xen-devel@lists.xenproject.org; Igor Druzhinin
> ; Paul Durrant 
> Subject: Re: [Xen-devel] xen-unstable (4.14 to be): Assertion
> '!preempt_count()' failed at preempt.c:36
> 
> On 04/12/2019 18:30, Jan Beulich wrote:
> > On 04.12.2019 18:21, Sander Eikelenboom wrote:
> >> On current xen-unstable (4.14 to be) and AMD cpu:
> >>
> >> After rebooting the host, while the guests are starting, I hit the
> assertion below.
> >> xen-staging-4.13 seems fine on the same machine.
> >
> > Nothing between 4.13 RC4 and the tip of staging stands out,
> > so I wonder if you could bisect over this range? Or perhaps
> > someone else sees something I don't see (right now).
> >
> > Jan
> 
> Bisection came up with:
> 
> commit cd7dedad8209753e0fc8a97e61d04b74912b53dc
> Author: Paul Durrant 
> Date:   Fri Nov 15 18:59:30 2019 +
> 
> passthrough: simplify locking and logging
> 
> Dropping the pcidevs lock between calling device_assigned() and
> assign_device() means that the latter has to do the same check as the
> former for no obvious gain. Also, since long running operations under
> pcidevs lock already drop the lock and return -ERESTART periodically
> there
> is little point in immediately failing an assignment operation with
> -ERESTART just because the pcidevs lock could not be acquired (for the
> second time, having already blocked on acquiring the lock in
> device_assigned()).
> 
> This patch instead acquires the lock once for assignment (or test
> assign)
> operations directly in iommu_do_pci_domctl() and thus can remove the
> duplicate domain ownership check in assign_device(). Whilst in the
> neighbourhood, the patch also removes some debug logging from
> assign_device() and deassign_device() and replaces it with proper
> error
> logging, which allows error logging in iommu_do_pci_domctl() to be
> removed.
> 
> Signed-off-by: Paul Durrant 
> Signed-off-by: Igor Druzhinin 
> Acked-by: Jan Beulich 
> 

Going through the code, I notice a missing pcidevs_unlock() in the case of a 
device already assigned. I fixed it with a bit of re-structuring. Could you try 
the following patch?

---8<---
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index ced0c28e4f..c7207998a5 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1696,16 +1696,12 @@ int iommu_do_pci_domctl(

 pcidevs_lock();
 ret = device_assigned(seg, bus, devfn);
-if ( domctl->cmd == XEN_DOMCTL_test_assign_device )
+if ( ret && domctl->cmd == XEN_DOMCTL_test_assign_device )
 {
-if ( ret )
-{
-printk(XENLOG_G_INFO
-   "%04x:%02x:%02x.%u already assigned, or non-existent\n",
-   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
-ret = -EINVAL;
-}
-break;
+printk(XENLOG_G_INFO
+   "%04x:%02x:%02x.%u already assigned, or non-existent\n",
+   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
+ret = -EINVAL;
 }
---8<---

Thanks,

  Paul



Re: [Xen-devel] xen-unstable (4.14 to be): Assertion '!preempt_count()' failed at preempt.c:36

2019-12-05 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 05 December 2019 08:44
> To: Durrant, Paul 
> Cc: Sander Eikelenboom ; xen-
> de...@lists.xenproject.org; Igor Druzhinin ;
> Paul Durrant 
> Subject: Re: xen-unstable (4.14 to be): Assertion '!preempt_count()'
> failed at preempt.c:36
> 
> On 05.12.2019 09:35, Durrant, Paul wrote:
> > --- a/xen/drivers/passthrough/pci.c
> > +++ b/xen/drivers/passthrough/pci.c
> > @@ -1696,16 +1696,12 @@ int iommu_do_pci_domctl(
> >
> >  pcidevs_lock();
> >  ret = device_assigned(seg, bus, devfn);
> > -if ( domctl->cmd == XEN_DOMCTL_test_assign_device )
> > +if ( ret && domctl->cmd == XEN_DOMCTL_test_assign_device )
> >  {
> > -if ( ret )
> > -{
> > -printk(XENLOG_G_INFO
> > -   "%04x:%02x:%02x.%u already assigned, or non-
> existent\n",
> > -   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> > -ret = -EINVAL;
> > -}
> > -break;
> > +printk(XENLOG_G_INFO
> > +   "%04x:%02x:%02x.%u already assigned, or non-
> existent\n",
> > +   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> > +ret = -EINVAL;
> >  }
> 
> But this seems wrong - you'd end up calling assign_device() even
> for the XEN_DOMCTL_test_assign_device case, when ret is 0. All we
> want is to delete the break statement afaict.
> 

Ah, yes; that logic is quite confusing. The patch should indeed be:

---8<---
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index ced0c28e4f..c07a63981a 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1705,7 +1705,6 @@ int iommu_do_pci_domctl(
seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
 ret = -EINVAL;
 }
-break;
 }
 else if ( !ret )
 ret = assign_device(d, seg, bus, devfn, flags);
---8<---
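
For readers following the control flow, the shape of the problem can be modelled in a few lines. This is a simplified sketch, not the Xen code: a counter stands in for the pcidevs lock, and every demo_* name is hypothetical. It shows how the early break leaves the lock held, and how dropping it lets the test-assign path fall through to the unlock without ever reaching the assignment call.

```c
#include <errno.h>

/* Hypothetical model: a counter stands in for the pcidevs lock. */
static int lock_depth;
static int assign_calls;

static void demo_lock(void)   { lock_depth++; }
static void demo_unlock(void) { lock_depth--; }

/* Stand-in for assign_device(); only counts how often it is reached. */
static int demo_assign(void)
{
    assign_calls++;
    return 0;
}

enum demo_cmd { DEMO_TEST_ASSIGN, DEMO_ASSIGN };

/* Shape of the regression: the early break skips the unlock. */
static int demo_domctl_buggy(enum demo_cmd cmd, int assigned)
{
    int ret;

    demo_lock();
    ret = assigned;
    do {
        if ( cmd == DEMO_TEST_ASSIGN )
        {
            if ( ret )
                ret = -EINVAL;
            break;              /* leaves with the lock still held */
        }
        else if ( !ret )
            ret = demo_assign();
        demo_unlock();
    } while ( 0 );

    return ret;
}

/* Shape of the fix: no break, so every path reaches the unlock. */
static int demo_domctl_fixed(enum demo_cmd cmd, int assigned)
{
    int ret;

    demo_lock();
    ret = assigned;
    if ( cmd == DEMO_TEST_ASSIGN )
    {
        if ( ret )
            ret = -EINVAL;
    }
    else if ( !ret )
        ret = demo_assign();
    demo_unlock();

    return ret;
}
```

Because the else-if hangs off the outer command check, the test-assign path with ret == 0 still does not call the assignment function once the break is gone.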

> Jan

Re: [Xen-devel] [PATCH v2] passthrough: drop break statement following c/s cd7dedad820

2019-12-05 Thread Durrant, Paul
> -Original Message-
> From: Igor Druzhinin 
> Sent: 05 December 2019 12:14
> To: xen-devel@lists.xenproject.org
> Cc: jbeul...@suse.com; li...@eikelenboom.it; Durrant, Paul
> ; Igor Druzhinin 
> Subject: [PATCH v2] passthrough: drop break statement following c/s
> cd7dedad820
> 
> The locking responsibilities have changed and a premature break in
> this section now causes the following assertion:
> 
> Assertion '!preempt_count()' failed at preempt.c:36
> 
> Suggested-by: Paul Durrant 

Actually, it was suggested by Jan, but you can put my R-b on the patch.

  Paul

> Reported-by: Sander Eikelenboom 
> Signed-off-by: Igor Druzhinin 
> ---
>  xen/drivers/passthrough/pci.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index ced0c28..c07a639 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -1705,7 +1705,6 @@ int iommu_do_pci_domctl(
> seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
>  ret = -EINVAL;
>  }
> -break;
>  }
>  else if ( !ret )
>  ret = assign_device(d, seg, bus, devfn, flags);
> --
> 2.7.4



Re: [Xen-devel] xen-block: race condition when stopping the device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)

2019-12-16 Thread Durrant, Paul
> -Original Message-
[snip]
> >>
> >> This feels like a race condition between the init/free code with
> >> handler. Anthony, does it ring any bell?
> >>
> >
> >  From that stack bt it looks like an iothread managed to run after the
> sring was NULLed. This should not be able to happen as the dataplane should
> have been moved back onto QEMU's main thread context before the ring is
> unmapped.
> 
> My knowledge of this code is fairly limited, so correct me if I am wrong.
> 
> blk_set_aio_context() would set the context for the block aio. AFAICT,
> the only aio for the block is xen_block_complete_aio().

Not quite. xen_block_dataplane_start() calls xen_device_bind_event_channel() 
and that will add an event channel fd into the aio context, so the shared ring 
is polled by the iothread as well as block i/o completion.

> 
> In the stack above, we are not dealing with a block aio but an aio tied
> to the event channel (see the call from xen_device_poll). So I don't
> think the blk_set_aio_context() would affect the aio.
> 

For the reason I outline above, it does.

> So it would be possible to get the iothread running because we received
> a notification on the event channel while we are stopping the block (i.e
> xen_block_dataplane_stop()).
> 

We should assume an iothread can essentially run at any time, as it is a 
polling entity. It should eventually block polling on fds assigned to its aio 
context but I don't think the abstraction guarantees that it cannot be awoken 
for other reasons (e.g. off a timeout). However, an event from the frontend 
will certainly cause the evtchn fd poll to wake up.

> If xen_block_dataplane_stop() grabs the context lock first, then the
> iothread dealing with the event may wait on the lock until it's released.
> 
> By the time the lock is grabbed, we may have freed all the resources
> (including srings). So the event iothread will end up dereferencing a
> NULL pointer.
> 

I think the problem may actually be that xen_block_dataplane_event() does not 
acquire the context and thus is not synchronized with 
xen_block_dataplane_stop(). The documentation in multiple-iothreads.txt is not 
clear whether a poll handler called by an iothread needs to acquire the context 
though; TBH I would not have thought it necessary.

> It feels to me we need a way to quiesce all the iothreads (blk,
> event,...) before continuing. But I am a bit unsure how to do this in
> QEMU.
> 

Looking at virtio-blk.c I see that it does seem to close off its evtchn 
equivalent from iothread context via aio_wait_bh_oneshot(). So I wonder whether 
the 'right' thing to do is to call xen_device_unbind_event_channel() using the 
same mechanism to ensure xen_block_dataplane_event() can't race.

  Paul

> Cheers,
> 
> --
> Julien Grall

Re: [Xen-devel] [PATCH net-next] xen-netback: get rid of old udev related code

2019-12-16 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 16 December 2019 08:10
> To: Durrant, Paul ; David Miller
> 
> Cc: xen-devel@lists.xenproject.org; wei@kernel.org; linux-
> ker...@vger.kernel.org; net...@vger.kernel.org
> Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: get rid of old udev
> related code
> 
> On 13.12.19 11:12, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 13 December 2019 10:02
> >> To: Durrant, Paul ; David Miller
> >> 
> >> Cc: xen-devel@lists.xenproject.org; wei@kernel.org; linux-
> >> ker...@vger.kernel.org; net...@vger.kernel.org
> >> Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: get rid of old
> udev
> >> related code
> >>
> >> On 13.12.19 10:24, Durrant, Paul wrote:
> >>>> -Original Message-
> >>>> From: Jürgen Groß 
> >>>> Sent: 13 December 2019 05:41
> >>>> To: David Miller ; Durrant, Paul
> >>>> 
> >>>> Cc: xen-devel@lists.xenproject.org; wei@kernel.org; linux-
> >>>> ker...@vger.kernel.org; net...@vger.kernel.org
> >>>> Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: get rid of old
> >> udev
> >>>> related code
> >>>>
> >>>> On 12.12.19 20:05, David Miller wrote:
> >>>>> From: Paul Durrant 
> >>>>> Date: Thu, 12 Dec 2019 13:54:06 +
> >>>>>
> >>>>>> In the past it used to be the case that the Xen toolstack relied
> upon
> >>>>>> udev to execute backend hotplug scripts. However this has not been
> >> the
> >>>>>> case for many releases now and removal of the associated code in
> >>>>>> xen-netback shortens the source by more than 100 lines, and removes
> >>>> much
> >>>>>> complexity in the interaction with the xenstore backend state.
> >>>>>>
> >>>>>> NOTE: xen-netback is the only xenbus driver to have a functional
> >>>> uevent()
> >>>>>>  method. The only other driver to have a method at all is
> >>>>>>  pvcalls-back, and currently pvcalls_back_uevent() simply
> >> returns
> >>>> 0.
> >>>>>>  Hence this patch also facilitates further cleanup.
> >>>>>>
> >>>>>> Signed-off-by: Paul Durrant 
> >>>>>
> >>>>> If userspace ever used this stuff, I seriously doubt you can remove
> >> this
> >>>>> even if it hasn't been used in 5+ years.
> >>>>
> >>>> Hmm, depends.
> >>>>
> >>>> This has been used by Xen tools in dom0 only. If the last usage has
> >> been
> >>>> in a Xen version which is no longer able to run with current Linux in
> >>>> dom0 it could be removed. But I guess this would have to be a rather
> >> old
> >>>> version of Xen (like 3.x?).
> >>>>
> >>>> Paul, can you give a hint since which Xen version the toolstack no
> >>>> longer relies on udev to start the hotplug scripts?
> >>>>
> >>>
> >>> The udev rules were in a file called tools/hotplug/Linux/xen-
> >> backend.rules (in xen.git), and a commit from Roger removed the NIC
> rules
> >> in 2012:
> >>>
> >>> commit 57ad6afe2a08a03c40bcd336bfb27e008e1d3e53
> >>
> >> Xen 4.2
> >>
> >>> The last commit I could find to that file modified its name to xen-
> >> backend.rules.in, and this was finally removed by George in 2015:
> >>>
> >>> commit 2ba368d13893402b2f1fb3c283ddcc714659dd9b
> >>
> >> Xen 4.6
> >>
> >>> So, I think this means anyone using a version of the Xen tools within
> >> recent memory will be having their hotplug scripts called directly by
> >> libxl (and having udev rules present would actually be counter-
> productive,
> >> as George's commit states and as I discovered the hard way when the
> change
> >> was originally made).
> >>
> >> The problem are systems with either old Xen versions (before Xen 4.2)
> or
> >> with other toolstacks (e.g. Xen 4.4 with xend) which want to use a new
> >> dom0 kernel.
> >>
> >> And I'm not sure there aren't such systems (especially in case someone
> >> wants to 

Re: [Xen-devel] xen-block: race condition when stopping the device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)

2019-12-16 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Durrant, Paul
> Sent: 16 December 2019 09:34
> To: Julien Grall ; Ian Jackson 
> Cc: Jürgen Groß ; Stefano Stabellini
> ; qemu-de...@nongnu.org; osstest service owner
> ; Anthony Perard
> ; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] xen-block: race condition when stopping the
> device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)
> 
> > -Original Message-
> [snip]
> > >>
> > >> This feels like a race condition between the init/free code with
> > >> handler. Anthony, does it ring any bell?
> > >>
> > >
> > >  From that stack bt it looks like an iothread managed to run after the
> > sring was NULLed. This should not be able to happen as the dataplane should
> > have been moved back onto QEMU's main thread context before the ring is
> > unmapped.
> >
> > My knowledge of this code is fairly limited, so correct me if I am
> wrong.
> >
> > blk_set_aio_context() would set the context for the block aio. AFAICT,
> > the only aio for the block is xen_block_complete_aio().
> 
> Not quite. xen_block_dataplane_start() calls
> xen_device_bind_event_channel() and that will add an event channel fd into
> the aio context, so the shared ring is polled by the iothread as well as
> block i/o completion.
> 
> >
> > In the stack above, we are not dealing with a block aio but an aio tied
> > to the event channel (see the call from xen_device_poll). So I don't
> > think the blk_set_aio_context() would affect the aio.
> >
> 
> For the reason I outline above, it does.
> 
> > So it would be possible to get the iothread running because we received
> > a notification on the event channel while we are stopping the block (i.e
> > xen_block_dataplane_stop()).
> >
> 
> We should assume an iothread can essentially run at any time, as it is a
> polling entity. It should eventually block polling on fds assigned to its
> aio context but I don't think the abstraction guarantees that it cannot be
> awoken for other reasons (e.g. off a timeout). However, an event from the
> frontend will certainly cause the evtchn fd poll to wake up.
> 
> > If xen_block_dataplane_stop() grabs the context lock first, then the
> > iothread dealing with the event may wait on the lock until it's released.
> >
> > By the time the lock is grabbed, we may have freed all the resources
> > (including srings). So the event iothread will end up dereferencing a
> > NULL pointer.
> >
> 
> I think the problem may actually be that xen_block_dataplane_event() does
> not acquire the context and thus is not synchronized with
> xen_block_dataplane_stop(). The documentation in multiple-iothreads.txt is
> not clear whether a poll handler called by an iothread needs to acquire
> the context though; TBH I would not have thought it necessary.
> 
> > It feels to me we need a way to quiesce all the iothreads (blk,
> > event,...) before continuing. But I am a bit unsure how to do this in
> > QEMU.
> >
> 
> Looking at virtio-blk.c I see that it does seem to close off its evtchn
> equivalent from iothread context via aio_wait_bh_oneshot(). So I wonder
> whether the 'right' thing to do is to call
> xen_device_unbind_event_channel() using the same mechanism to ensure
> xen_block_dataplane_event() can't race.

Digging around the virtio-blk history I see:

commit 1010cadf62332017648abee0d7a3dc7f2eef9632
Author: Stefan Hajnoczi 
Date:   Wed Mar 7 14:42:03 2018 +

virtio-blk: fix race between .ioeventfd_stop() and vq handler

If the main loop thread invokes .ioeventfd_stop() just as the vq handler
function begins in the IOThread then the handler may lose the race for
the AioContext lock.  By the time the vq handler is able to acquire the
AioContext lock the ioeventfd has already been removed and the handler
isn't supposed to run anymore!

Use the new aio_wait_bh_oneshot() function to perform ioeventfd removal
from within the IOThread.  This way no races with the vq handler are
possible.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Fam Zheng 
Acked-by: Paolo Bonzini 
Message-id: 20180307144205.20619-3-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 

...so I think xen-block has exactly the same problem. I think we may also be 
missing a qemu_bh_cancel() to make sure block aio completions are stopped. I'll 
prep a patch.

  Paul
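
The ordering argument in the quoted virtio-blk commit can be sketched with a toy model. Everything here is hypothetical: a single-threaded callback queue stands in for the IOThread's context, and none of the demo_* names are QEMU APIs. It only illustrates why performing the unbind as a callback inside the handler's own context, and waiting for it before unmapping the ring, leaves no window in which the handler sees a NULL ring.

```c
#include <stddef.h>

#define DEMO_QUEUE_LEN 8

typedef void (*demo_cb)(void *opaque);

/* Hypothetical single-threaded stand-in for an IOThread's context. */
struct demo_ctx {
    struct { demo_cb cb; void *opaque; } q[DEMO_QUEUE_LEN];
    size_t head, tail;
};

struct demo_dataplane {
    int *sring;     /* NULL once the ring has been unmapped */
    int bound;      /* event source still registered with the context */
    int handled;    /* events processed while the ring was valid */
    int faults;     /* handler ran after the ring was gone */
};

static void demo_schedule(struct demo_ctx *ctx, demo_cb cb, void *opaque)
{
    size_t i = ctx->tail++ % DEMO_QUEUE_LEN;

    ctx->q[i].cb = cb;
    ctx->q[i].opaque = opaque;
}

/* Run everything queued so far, in order, as the context's thread would. */
static void demo_run(struct demo_ctx *ctx)
{
    while (ctx->head != ctx->tail) {
        size_t i = ctx->head++ % DEMO_QUEUE_LEN;

        ctx->q[i].cb(ctx->q[i].opaque);
    }
}

/* Event handler: counts a fault instead of dereferencing a NULL ring. */
static void demo_event(void *opaque)
{
    struct demo_dataplane *dp = opaque;

    if (!dp->bound)
        return;
    if (dp->sring)
        dp->handled++;
    else
        dp->faults++;
}

static void demo_unbind_bh(void *opaque)
{
    ((struct demo_dataplane *)opaque)->bound = 0;
}

/* Racy stop: the ring goes away while an event is still pending. */
static void demo_stop_racy(struct demo_ctx *ctx, struct demo_dataplane *dp)
{
    dp->sring = NULL;
    demo_run(ctx);
}

/* Stop via a one-shot callback: unbind in-context, wait, then unmap. */
static void demo_stop_bh(struct demo_ctx *ctx, struct demo_dataplane *dp)
{
    demo_schedule(ctx, demo_unbind_bh, dp);
    demo_run(ctx);
    dp->sring = NULL;
}
```

In the model, an event queued before the stop is handled against a valid ring, and any event arriving afterwards is dropped because the source is already unbound.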

> 
>   Paul
> 
> > Cheers,
> >
> > --
> > Julien Grall

Re: [Xen-devel] xen-block: race condition when stopping the device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)

2019-12-16 Thread Durrant, Paul
> -Original Message-
> From: Durrant, Paul 
> Sent: 16 December 2019 09:50
> To: Durrant, Paul ; Julien Grall ;
> Ian Jackson 
> Cc: Jürgen Groß ; Stefano Stabellini
> ; qemu-de...@nongnu.org; osstest service owner
> ; Anthony Perard
> ; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] xen-block: race condition when stopping the
> device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)
> 
> > -Original Message-
> > From: Xen-devel  On Behalf Of
> > Durrant, Paul
> > Sent: 16 December 2019 09:34
> > To: Julien Grall ; Ian Jackson 
> > Cc: Jürgen Groß ; Stefano Stabellini
> > ; qemu-de...@nongnu.org; osstest service owner
> > ; Anthony Perard
> > ; xen-devel@lists.xenproject.org
> > Subject: Re: [Xen-devel] xen-block: race condition when stopping the
> > device (WAS: Re: [xen-4.13-testing test] 144736: regressions - FAIL)
> >
> > > -Original Message-
> > [snip]
> > > >>
> > > >> This feels like a race condition between the init/free code with
> > > >> handler. Anthony, does it ring any bell?
> > > >>
> > > >
> > > >  From that stack bt it looks like an iothread managed to run after
> the
> > > sring was NULLed. This should not be able to happen as the dataplane
> should
> > > have been moved back onto QEMU's main thread context before the ring
> is
> > > unmapped.
> > >
> > > My knowledge of this code is fairly limited, so correct me if I am
> > wrong.
> > >
> > > blk_set_aio_context() would set the context for the block aio. AFAICT,
> > > the only aio for the block is xen_block_complete_aio().
> >
> > Not quite. xen_block_dataplane_start() calls
> > xen_device_bind_event_channel() and that will add an event channel fd
> into
> > the aio context, so the shared ring is polled by the iothread as well as
> > block i/o completion.
> >
> > >
> > > In the stack above, we are not dealing with a block aio but an aio tied
> > > to the event channel (see the call from xen_device_poll). So I don't
> > > think the blk_set_aio_context() would affect the aio.
> > >
> >
> > For the reason I outline above, it does.
> >
> > > So it would be possible to get the iothread running because we
> received
> > > a notification on the event channel while we are stopping the block
> (i.e
> > > xen_block_dataplane_stop()).
> > >
> >
> > We should assume an iothread can essentially run at any time, as it is a
> > polling entity. It should eventually block polling on fds assigned to its
> > aio context but I don't think the abstraction guarantees that it cannot
> be
> > awoken for other reasons (e.g. off a timeout). However, an event from
> the
> > frontend will certainly cause the evtchn fd poll to wake up.
> >
> > > If xen_block_dataplane_stop() grabs the context lock first, then the
> > > iothread dealing with the event may wait on the lock until it's
> released.
> > >
> > > By the time the lock is grabbed, we may have freed all the resources
> > > (including srings). So the event iothread will end up dereferencing a
> > > NULL pointer.
> > >
> >
> > I think the problem may actually be that xen_block_dataplane_event()
> does
> > not acquire the context and thus is not synchronized with
> > xen_block_dataplane_stop(). The documentation in multiple-iothreads.txt
> is
> > not clear whether a poll handler called by an iothread needs to acquire
> > the context though; TBH I would not have thought it necessary.
> >
> > > It feels to me we need a way to quiesce all the iothreads (blk,
> > > event,...) before continuing. But I am a bit unsure how to do this in
> > > QEMU.
> > >
> >
> > Looking at virtio-blk.c I see that it does seem to close off its evtchn
> > equivalent from iothread context via aio_wait_bh_oneshot(). So I wonder
> > whether the 'right' thing to do is to call
> > xen_device_unbind_event_channel() using the same mechanism to ensure
> > xen_block_dataplane_event() can't race.
> 
> Digging around the virtio-blk history I see:
> 
> commit 1010cadf62332017648abee0d7a3dc7f2eef9632
> Author: Stefan Hajnoczi 
> Date:   Wed Mar 7 14:42:03 2018 +
> 
> virtio-blk: fix race between .ioeventfd_stop() and vq handler
> 
> If the main loop thread invokes .ioeventfd_stop() just as the vq
> handler
> function begins in the IOThread then the handler may lose the race for
> the AioCont

Re: [Xen-devel] [PATCH for-next 1/7] x86: import hyperv-tlfs.h from Linux

2019-12-10 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 10 December 2019 15:34
> To: Wei Liu 
> Cc: Wei Liu ; Paul Durrant ; Andrew
> Cooper ; Michael Kelley
> ; Xen Development List  de...@lists.xenproject.org>; Roger Pau Monné 
> Subject: Re: [Xen-devel] [PATCH for-next 1/7] x86: import hyperv-tlfs.h
> from Linux
> 
> On 25.10.2019 11:16, Wei Liu wrote:
> > Taken from Linux commit b2d8b167e15bb5ec2691d1119c025630a247f649.
> >
> > This is a pristine copy from Linux. It is not used yet and probably
> > doesn't compile. Changes to make it work will come later.
> >
> > Signed-off-by: Wei Liu 
> 
> This coming from Linux and assuming at least a fair part of it is
> going to be used, in principle
> Acked-by: Jan Beulich 
> 
> However, there are many seemingly unnecessary uses of __packed
> here, which I'd rather not see go in at all (i.e. not be dropped
> later on, and then potentially missing some). I find ...
> 
> > +typedef struct _HV_REFERENCE_TSC_PAGE {
> > +   __u32 tsc_sequence;
> > +   __u32 res1;
> > +   __u64 tsc_scale;
> > +   __s64 tsc_offset;
> > +}  __packed HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
>

You realise there's a definition of this in the viridian code already, right?

  Paul
 
> .. this one particularly suspicious: I don't think it is well
> defined for __packed to also apply to the type
> PHV_REFERENCE_TSC_PAGE points to (and I suspect it doesn't).
> 
> Jan
> 

Re: [Xen-devel] grant table size

2019-11-20 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 20 November 2019 12:09
> To: Durrant, Paul 
> Cc: Roger Pau Monné ; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] grant table size
> 
> On 20.11.2019 11:49,  Durrant, Paul  wrote:
> >> From: Roger Pau Monné 
> >> Sent: 20 November 2019 11:06
> >>
> >> Do you have in mind to signal this somehow to guests, or the
> >> expectation is that the guest will have to poll GNTTABOP_query_size
> >> and at some point the size will increase?
> >
> > I don't think the guest need care until its grant table grows to the
> > max. At that point, rather than giving up, the guest would re-query
> > the max value to see if there is now more headroom and then re-size
> > its internal data structures accordingly.
> 
> If we consider dynamic adjustments, what about shrinking of the
> table? This would of course require some form of guest consent,
> but it would be nice if the option would at least be accounted
> for when working out how all of this should behave, even if the
> case may not get handled right now.
> 

Well, perhaps we could have a set_size gnttab op where a guest would be allowed 
to call it with a value less than (or equal to) its current max, so that it can 
voluntarily yield its headroom, but only a privileged guest would be allowed to 
call it with an increased max value?
I'm not sure what mechanism would be best for requesting a guest reduction 
though, I guess probably xenstore... something akin to balloon target pages?

A guest reduction of max is of pretty limited value though AFAICT as only 
in-use frames really use any memory. The (active/shared/status) arrays could, 
of course, be reduced in size but that only gets you a few bytes back.
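
As a rough sketch of the set_size semantics suggested above (the struct, the
function name and the "cannot shrink below in-use frames" rule are all
assumptions made for illustration, not existing Xen code):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-domain state; not Xen's real struct grant_table. */
struct gt_limits {
    unsigned int nr_frames;   /* frames currently in use */
    unsigned int max_frames;  /* current maximum */
};

/*
 * Hypothetical set_size operation: any caller may lower (or keep) the
 * maximum, only a privileged caller may raise it.
 */
int gnttab_set_max(struct gt_limits *gt, unsigned int new_max,
                   bool privileged)
{
    /* Assumption: the maximum can never drop below what is in use. */
    if ( new_max < gt->nr_frames )
        return -EBUSY;

    /* Raising the maximum is reserved for a privileged caller. */
    if ( new_max > gt->max_frames && !privileged )
        return -EPERM;

    gt->max_frames = new_max;
    return 0;
}
```

An unprivileged guest can therefore only ever yield headroom; getting it back
would need the toolstack.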

  Paul

Re: [Xen-devel] grant table size

2019-11-20 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 20 November 2019 11:06
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] grant table size
> 
> On Wed, Nov 20, 2019 at 09:43:59AM +, Durrant, Paul wrote:
> > I've dealt with a few problems over the years where the root cause was a
> > guest running out of grant table and so I'm wondering whether it would be
> > a good idea to allow a toolstack to increase the table size of a running
> > guest, e.g. when plugging in a new PV interface.
>
> I would rather have a new xl command that does the grant table
> increase (ie: xl set-max-grant-frames) instead of doing it when
> plugging new interfaces.
>

That would be ok too... Just thought it might be nicer if it were automatic but 
it would indeed be complete guess-work in libxl to come up with a per-interface 
grant table quota.
 
> > It would appear that current Linux guests would not be able to make use
> > of this as it stands (but that could be fixed), but as far as I can tell a
> > pvops kernel would not misbehave if the maximum table size were to
> > increase. Similarly Windows PV drivers would need modification to make use
> > of a dynamic maximum table size but would not misbehave as is.
> > Does anyone have any objection to the idea?
> 
> Do you have in mind to signal this somehow to guests, or the
> expectation is that the guest will have to poll GNTTABOP_query_size
> and at some point the size will increase?
> 

I don't think the guest need care until its grant table grows to the max. At 
that point, rather than giving up, the guest would re-query the max value to 
see if there is now more headroom and then re-size its internal data structures 
accordingly.
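
A minimal sketch of that guest-side behaviour, assuming a hypothetical wrapper
around GNTTABOP_query_size (here faked with a global so the example is
self-contained; none of these names are real guest code):

```c
#include <errno.h>

/* Stand-in for the hypervisor side: a toolstack may raise this later. */
unsigned int hyp_max_frames = 2;

/* Stand-in for a wrapper that issues GNTTABOP_query_size. */
unsigned int query_max_frames(void)
{
    return hyp_max_frames;
}

/* Hypothetical guest bookkeeping. */
struct guest_gnttab {
    unsigned int nr_frames;   /* frames the guest has set up */
    unsigned int cached_max;  /* maximum seen at the last query */
};

/* Add one frame; only re-query the maximum once the cached one is hit. */
int gnttab_try_grow(struct guest_gnttab *gt)
{
    if ( gt->nr_frames >= gt->cached_max )
    {
        unsigned int max = query_max_frames();

        if ( max <= gt->cached_max )
            return -ENOSPC;   /* still no headroom, give up as before */

        /* More headroom now: internal structures would be re-sized here. */
        gt->cached_max = max;
    }

    gt->nr_frames++;
    return 0;
}
```

The point being that the guest never needs to poll: the extra query only
happens on the path where it would otherwise have failed.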

  Paul




[Xen-devel] grant table size

2019-11-20 Thread Durrant, Paul
I've dealt with a few problems over the years where the root cause was a guest 
running out of grant table and so I'm wondering whether it would be a good idea 
to allow a toolstack to increase the table size of a running guest, e.g. when 
plugging in a new PV interface.
It would appear that current Linux guests would not be able to make use of this 
as it stands (but that could be fixed), but as far as I can tell a pvops kernel 
would not misbehave if the maximum table size were to increase. Similarly 
Windows PV drivers would need modification to make use of a dynamic maximum 
table size but would not misbehave as is.
Does anyone have any objection to the idea?

  Paul 


Re: [Xen-devel] grant table size

2019-11-20 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 20 November 2019 12:42
> To: Durrant, Paul 
> Cc: Roger Pau Monné ; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] grant table size
> 
> On 20.11.2019 12:18,  Durrant, Paul  wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: 20 November 2019 12:09
> >> To: Durrant, Paul 
> >> Cc: Roger Pau Monné ; xen-
> de...@lists.xenproject.org
> >> Subject: Re: [Xen-devel] grant table size
> >>
> >> On 20.11.2019 11:49,  Durrant, Paul  wrote:
> >>>> From: Roger Pau Monné 
> >>>> Sent: 20 November 2019 11:06
> >>>>
> >>>> Do you have in mind to signal this somehow to guests, or the
> >>>> expectation is that the guest will have to poll GNTTABOP_query_size
> >>>> and at some point the size will increase?
> >>>
> >>> I don't think the guest need care until its grant table grows to the
> >>> max. At that point, rather than giving up, the guest would re-query
> >>> the max value to see if there is now more headroom and then re-size
> >>> its internal data structures accordingly.
> >>
> >> If we consider dynamic adjustments, what about shrinking of the
> >> table? This would of course require some form of guest consent,
> >> but it would be nice if the option would at least be accounted
> >> for when working out how all of this should behave, even if the
> >> case may not get handled right now.
> >>
> >
> > Well, perhaps we could have a set_size gnttab op where a guest would
> > be allowed to call it with a value less than (or equal to) its current
> > max, so that it can voluntarily yield its headroom, but only a
> > privileged guest would be allowed to call it with an increased max
> > value?
> 
> Ah yes, this sounds good.
> 
> > I'm not sure what mechanism would be best for requesting a guest
> > reduction though, I guess probably xenstore... something akin to
> > balloon target pages?
> 
> Perhaps.
> 
> > A guest reduction of max is of pretty limited value though AFAICT as
> > only in-use frames really use any memory. The (active/shared/status)
> > arrays could, of course, be reduced in size but that only gets you a
> > few bytes back.
> 
> Well, if this really was about just "a few bytes", why wouldn't we
> allow arbitrary size grant tables to begin with?
> 

Well, another option would be to always set the value of max seen by the guest 
to be some really large value but actually apply a lower limit in Xen, which 
could then be increased by the toolstack. I don't believe that would require 
any guest-side modification either.
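
To illustrate the split between the advertised and the enforced maximum (all
names and the value 4096 are hypothetical, this is not how Xen's grant table
code is structured today):

```c
#include <errno.h>

/* Hypothetical value advertised to every guest as its maximum. */
#define GT_ADVERTISED_MAX_FRAMES 4096u

/* Hypothetical per-domain state, not Xen's real struct grant_table. */
struct gt_domain {
    unsigned int nr_frames;     /* frames currently allocated */
    unsigned int enforced_max;  /* real limit, adjustable by the toolstack */
};

/* What the guest would see when it queries the maximum. */
unsigned int gt_query_max(const struct gt_domain *d)
{
    (void)d;
    return GT_ADVERTISED_MAX_FRAMES;
}

/* Guest asks to grow to req frames; only the enforced limit decides. */
int gt_grow(struct gt_domain *d, unsigned int req)
{
    if ( req > d->enforced_max )
        return -ENOSPC;
    if ( req > d->nr_frames )
        d->nr_frames = req;
    return 0;
}

/* Toolstack raises the enforced limit, never past the advertised one. */
int gt_raise_limit(struct gt_domain *d, unsigned int new_max)
{
    if ( new_max > GT_ADVERTISED_MAX_FRAMES || new_max < d->enforced_max )
        return -EINVAL;
    d->enforced_max = new_max;
    return 0;
}
```

A guest that simply retries a failed grow would pick up a raised limit without
ever having to re-query the maximum.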

  Paul 

> Jan

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Durrant, Paul
> -Original Message-
> From: Ian Jackson 
> Sent: 26 November 2019 15:06
> To: George Dunlap ; xen-
> de...@lists.xenproject.org; Wei Liu ; Jan Beulich
> ; Durrant, Paul ; Juergen Gross
> 
> Subject: Re: [PATCH for-4.13] docs/xl: Document pci-assignable state
> 
> Ian Jackson writes ("Re: [PATCH for-4.13] docs/xl: Document pci-assignable
> state"):
> > George Dunlap writes ("Re: [PATCH for-4.13] docs/xl: Document pci-
> assignable state"):
> > > I kind of feel like the discussion of the security risks inherent in pci
> > > passthrough belong in a separate document, but perhaps a brief mention
> > > here would be helpful.  Perhaps the following?
> > >
> > > "As always, this should only be done if you trust the guest, or are
> > > confident that the particular device you're re-assigning to dom0 will
> > > cancel all in-flight DMA on FLR."
> >
> > SGTM.
> >
> > I like "as always" which clearly signals that this is a more general
> > problem without requiring us to actually write that other
> > comprehensive document...
> 

The text sounds fine in general but the 'as always' does rather imply 'hey, we 
never said PCI pass-through was safe, did we?'

  Paul


Re: [Xen-devel] [PATCH] domain_create: honour global grant/maptrack frame limits...

2019-11-26 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 26 November 2019 11:37
> To: Paul Durrant ; Durrant, Paul 
> Cc: Stefano Stabellini ; Julien Grall
> ; Wei Liu ; Konrad Rzeszutek Wilk
> ; George Dunlap ;
> Andrew Cooper ; Ian Jackson
> ; Jan Beulich ; xen-devel
> 
> Subject: Re: [Xen-devel] [PATCH] domain_create: honour global
> grant/maptrack frame limits...
> 
> On 26.11.19 12:30, Paul Durrant wrote:
> > On Wed, 13 Nov 2019 at 13:55, Paul Durrant  wrote:
> >>
> >> ...when their values are larger than the per-domain configured limits.
> >>
> >> Signed-off-by: Paul Durrant 
> >> ---
> >> Cc: Andrew Cooper 
> >> Cc: George Dunlap 
> >> Cc: Ian Jackson 
> >> Cc: Jan Beulich 
> >> Cc: Julien Grall 
> >> Cc: Konrad Rzeszutek Wilk 
> >> Cc: Stefano Stabellini 
> >> Cc: Wei Liu 
> >>
> >> After mining through commits it is still unclear to me exactly when Xen
> >> stopped honouring the global values, but I really think this commit should
> >> be back-ported to stable trees as it was a behavioural change that can
> >> cause domUs to fail in non-obvious ways.
> >
> > Any other opinions on this? AFAICT the question is still open:
> >
> > - Do we consider not honouring the command line values to be a
> > regression (since domUs that would have worked before will no longer
> > work after a basic upgrade of Xen)?
> >
> >Paul
> >
> >> ---
> >>   xen/common/domain.c | 14 ++++++++++++--
> >>   1 file changed, 12 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/xen/common/domain.c b/xen/common/domain.c
> >> index 66c7fc..aad6d55b82 100644
> >> --- a/xen/common/domain.c
> >> +++ b/xen/common/domain.c
> >> @@ -335,6 +335,7 @@ struct domain *domain_create(domid_t domid,
> >>   enum { INIT_watchdog = 1u<<1,
> >>  INIT_evtchn = 1u<<3, INIT_gnttab = 1u<<4, INIT_arch = 1u<<5 };
> >>   int err, init_status = 0;
> >> +unsigned int max_grant_frames, max_maptrack_frames;
> >>
> >>   if ( config && (err = sanitise_domain_config(config)) )
> >>   return ERR_PTR(err);
> >> @@ -456,8 +457,17 @@ struct domain *domain_create(domid_t domid,
> >>   goto fail;
> >>   init_status |= INIT_evtchn;
> >>
> >> -if ( (err = grant_table_init(d, config->max_grant_frames,
> >> - config->max_maptrack_frames)) != 0 )
> >> +/*
> >> + * Make sure that the configured values don't reduce any
> >> + * global command line override.
> >> + */
> >> +max_grant_frames = max(config->max_grant_frames,
> >> +   opt_max_grant_frames);
> >> +max_maptrack_frames = max(config->max_maptrack_frames,
> >> +  opt_max_maptrack_frames);
> >> +
> >> +if ( (err = grant_table_init(d, max_grant_frames,
> >> + max_maptrack_frames)) != 0 )
> 
> So basically the per-domain settings are ignored.
> 

Basically, yes.

> They are not allowed to be smaller than the global limits (due to
> using max()).
> 
> They are not allowed to be larger than the global limits (due to the
> test in grant_table_init()).
> 
> That is _not_ the purpose of being able to control the settings per
> domain.
> 

Ok, if a straight-up return to old behaviour is out then I guess 4.13 will 
carry the regression.
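
For the record, a small model of why the two checks together collapse to the
global value. The function name is made up, and the upper-bound test is only
modelled from Juergen's description of grant_table_init() above:

```c
#include <errno.h>

/*
 * cfg: per-domain configured limit, opt: global command-line limit.
 * Returns 0 and stores the effective limit, or -EINVAL on rejection.
 */
int patched_limit(unsigned int cfg, unsigned int opt, unsigned int *out)
{
    unsigned int v = cfg > opt ? cfg : opt;  /* the max() from the patch */

    /* Upper-bound test as described for grant_table_init(). */
    if ( v > opt )
        return -EINVAL;

    *out = v;
    return 0;
}
```

Whenever the call succeeds the result is exactly opt, so a smaller per-domain
value is silently raised and a larger one is rejected.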

  Paul

> 
> Juergen

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Durrant, Paul
> -Original Message-
> From: Ian Jackson 
> Sent: 26 November 2019 14:22
> To: George Dunlap 
> Cc: xen-devel@lists.xenproject.org; Wei Liu ; Jan Beulich
> ; Paul Durrant ; Juergen Gross
> 
> Subject: Re: [PATCH for-4.13] docs/xl: Document pci-assignable state
> 
> [resending to just Paul to fix email address problem]
> 
> George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable
> state"):
> >  =item B [I<-r>] I
> ...
> > +Make the device at PCI Bus/Device/Function BDF not assignable to
> > +guests.  This will at least unbind the device from pciback, and
> > +re-assign it from the "quarantine domain" back to domain 0.  If the -r
> > +option is specified, it will also attempt to re-bind the device to its
> > +original driver, making it usable by Domain 0 again.  If the device is
> > +not bound to pciback, it will return success.
> > +
> > +Note that this functionality will work even for devices which were not
> > +made assignable by B.  This can be used to allow
> > +dom0 to access devices which were automatically quarantined by Xen
> > +after domain destruction as a result of Xen's B
> > +command-line default.
> 
> What are the security implications of doing this if the device might
> still be doing DMA or something ?
> 
> (For that matter, presumably there are security implications of
> assigning the same device in sequence to different guests?)
> 

Assigning any device carries a risk and can never be considered secure in
any general way. E.g. a device that exposes its config space in a writable
fashion via an internal i2c bus that can be accessed via one of its BARs.
Quarantining helps to the extent that, if a device is continuing to DMA, then at
least that doesn't hit dom0 whilst the FLR/SBR is attempted, but if even that's
not effective then the device should probably remain in quarantine until it is
power-cycled.

  Paul


Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 26 November 2019 14:27
> To: Ian Jackson 
> Cc: Juergen Gross ; xen-devel@lists.xenproject.org; Paul
> Durrant ; George Dunlap
> ; Wei Liu 
> Subject: Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable
> state
> 
> On 26.11.2019 15:14, Ian Jackson wrote:
> > George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable
> state"):
> >>  =item B [I<-r>] I
> > ...
> >> +Make the device at PCI Bus/Device/Function BDF not assignable to
> >> +guests.  This will at least unbind the device from pciback, and
> >> +re-assign it from the "quarantine domain" back to domain 0.  If the -r
> >> +option is specified, it will also attempt to re-bind the device to its
> >> +original driver, making it usable by Domain 0 again.  If the device is
> >> +not bound to pciback, it will return success.
> >> +
> >> +Note that this functionality will work even for devices which were not
> >> +made assignable by B.  This can be used to allow
> >> +dom0 to access devices which were automatically quarantined by Xen
> >> +after domain destruction as a result of Xen's B
> >> +command-line default.
> >
> > What are the security implications of doing this if the device might
> > still be doing DMA or something ?
> 
> Devices get reset in between, so well behaving ones should not
> still be doing DMA at that point. Misbehaving ones would better
> not be assigned (back and forth) anyway. But a recent patch of
> Paul's suggests that people still wish to do so, on the
> assumption that such DMA will drain sufficiently quickly.

Yes. I will hopefully find time to post the next version of that patch this 
week.

  Paul

> 
> > (For that matter, presumably there are security implications of
> > assigning the same device in sequence to different guests?)
> 
> Right.
> 
> Jan
> 

Re: [Xen-devel] [PATCH] domain_create: honour global grant/maptrack frame limits...

2019-11-26 Thread Durrant, Paul
> -Original Message-
> From: George Dunlap 
> Sent: 26 November 2019 12:32
> To: Paul Durrant ; Durrant, Paul 
> Cc: xen-devel ; Stefano Stabellini
> ; Julien Grall ; Wei Liu
> ; Konrad Rzeszutek Wilk ; George
> Dunlap ; Andrew Cooper
> ; Ian Jackson ; Jan
> Beulich 
> Subject: Re: [Xen-devel] [PATCH] domain_create: honour global
> grant/maptrack frame limits...
> 
> On 11/26/19 11:30 AM, Paul Durrant wrote:
> > On Wed, 13 Nov 2019 at 13:55, Paul Durrant  wrote:
> >>
> >> ...when their values are larger than the per-domain configured limits.
> >>
> >> Signed-off-by: Paul Durrant 
> >> ---
> >> Cc: Andrew Cooper 
> >> Cc: George Dunlap 
> >> Cc: Ian Jackson 
> >> Cc: Jan Beulich 
> >> Cc: Julien Grall 
> >> Cc: Konrad Rzeszutek Wilk 
> >> Cc: Stefano Stabellini 
> >> Cc: Wei Liu 
> >>
> >> After mining through commits it is still unclear to me exactly when Xen
> >> stopped honouring the global values, but I really think this commit should
> >> be back-ported to stable trees as it was a behavioural change that can
> >> cause domUs to fail in non-obvious ways.
> >
> > Any other opinions on this? AFAICT the question is still open:
> >
> > - Do we consider not honouring the command line values to be a
> > regression (since domUs that would have worked before will no longer
> > work after a basic upgrade of Xen)?
> 
> This would be a bit easier to form a "policy" opinion on (or perhaps
> alternate solutions to) if more of the situation were outlined here.
> 
> Is the problem that the per-domain config is always set, and doesn't
> take the hypervisor-set config into account?  Wouldn't it be better to
> modify the toolstack to use the hypervisor value if it's not set?
> 
> In fact, it looks kind of like things are screwed up anyway -- the
> "default" value of max_grant_frames, if no value is specified, is set in
> xl.c.  If that were the behavior we wanted, it should be set in libxl.c.
> 
> But it doesn't seem like it should be terribly difficult to get a "use
> the default" sentinel value passed in to Xen, such that:
> 
> 1. People who don't do anything will get the default currently specified
> in xl.c
> 
> 2. People who set the value on the Xen command-line and don't set
> anything in the guest config file will get the Xen command-line value
> 
> 3. People who set the value in the config file will get the value they
> specified (regardless of the global setting).
> 
> Is that the behaviour you'd like to see, Paul?

I think the order should be:

If set in xl.cfg => use that, else
If set in xl.conf => use that, else
Use the command line/default value

I.e. the ultimate value should be set in Xen (and possibly overridden by the 
command line) and not hardcoded at any other layer.

There is also the issue of limits but I guess the rationale there should be: If 
a value *is* specified then it should not exceed the value set in Xen.

Does that sound right?
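
Roughly, as a sketch (0 is used as a hypothetical "not set" marker and the
function name is invented; this is not libxl or Xen code):

```c
#include <errno.h>

/* Hypothetical marker meaning "no value given at this layer". */
#define GNT_UNSET 0u

/*
 * cfg:       value from the guest's xl.cfg, or GNT_UNSET
 * conf:      value from xl.conf, or GNT_UNSET
 * xen_limit: value held by Xen (built-in default or command line)
 *
 * Returns 0 and stores the effective limit, or -EINVAL if an explicitly
 * specified value exceeds the value set in Xen.
 */
int resolve_frame_limit(unsigned int cfg, unsigned int conf,
                        unsigned int xen_limit, unsigned int *out)
{
    unsigned int chosen;

    if ( cfg != GNT_UNSET )
        chosen = cfg;
    else if ( conf != GNT_UNSET )
        chosen = conf;
    else
    {
        /* Nothing specified anywhere: Xen's own value wins. */
        *out = xen_limit;
        return 0;
    }

    if ( chosen > xen_limit )
        return -EINVAL;

    *out = chosen;
    return 0;
}
```

So no layer above Xen carries a hardcoded number; it either passes down what
the admin wrote or nothing at all.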

  Paul


> 
>  -George

Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-26 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> George Dunlap
> Sent: 26 November 2019 17:18
> To: xen-devel@lists.xenproject.org
> Cc: Juergen Gross ; Stefano Stabellini
> ; Julien Grall ; Wei Liu
> ; Paul Durrant ; Andrew Cooper
> ; Konrad Rzeszutek Wilk
> ; George Dunlap ; Marek
> Marczykowski-Górecki ; Jan Beulich
> ; Ian Jackson 
> Subject: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> Xen used to have single, system-wide limits for the number of grant
> frames and maptrack frames a guest was allowed to create.  Increasing
> or decreasing this single limit on the Xen command-line would change
> the limit for all guests on the system.
> 
> Later, per-domain limits for these values were created.  The
> system-wide limits became strict limits: domains could not be created
> with higher limits, but could be created with lower limits.
> 
> However, the change also introduced a range of different "default"
> values into various places in the toolstack:
> 
> - The python libxc bindings hard-coded these values to 32 and 1024,
>   respectively
> 
> - The libxl default values are 32 and 1024 respectively.
> 
> - xl will use the libxl default for maptrack, but does its own default
>   calculation for grant frames: either 32 or 64, based on the max
>   possible mfn.
> 
> These defaults interact poorly with the hypervisor command-line limit:
> 
> - The hypervisor command-line limit cannot be used to raise the limit
>   for all guests anymore, as the default in the toolstack will
>   effectively override this.
> 
> - If you use the hypervisor command-line limit to *reduce* the limit,
>   then the "default" values generated by the toolstack are too high,
>   and all guest creations will fail.
> 
> In other words, the toolstack defaults require any change to be
> effected by having the admin explicitly specify a new value in every
> guest.
> 
> In order to address this, have grant_table_init treat '0' values for
> max_grant_frames and max_maptrack_frames as instructions to use the
> system-wide default.  Have all the above toolstacks default to passing
> 0 unless a different value is explicitly given.
> 
> This restores the old behavior, that changing the hypervisor
> command-line option can change the behavior for all guests, while
> retaining the ability to set per-guest values.  It also removes the
> bug that *reducing* the system-wide max will cause all domains without
> explicit limits to fail.
> 
> (The ocaml bindings require the caller to always specify a value, and
> the code to start a xenstored stubdomain hard-codes these to 4 and 128
> respectively; these will not be addressed here.)
> 
> Signed-off-by: George Dunlap 
> ---
> Release justification: This is an observed regression (albeit one that
> has spanned several releases now).
> 
> Compile-tested only.
> 
> NB this patch could be applied without the whitespace fixes (perhaps
> with some fix-ups); it's just easier since my editor strips trailing
> whitespace out automatically.
> 
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Andrew Cooper 
> CC: Jan Beulich 
> CC: Paul Durrant 
> CC: Julien Grall 
> CC: Konrad Rzeszutek Wilk 
> CC: Stefano Stabellini 
> CC: Juergen Gross 
> CC: Marek Marczykowski-Górecki 
> ---
>  tools/libxl/libxl.h   |  4 ++--
>  tools/python/xen/lowlevel/xc/xc.c |  2 --
>  tools/xl/xl.c | 12 ++--
>  xen/common/grant_table.c  |  7 +++
>  xen/include/public/domctl.h   |  6 --
>  5 files changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 49b56fa1a3..1648d337e7 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -364,8 +364,8 @@
>   */
>  #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1
> 
> -#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32
> -#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024
> +#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 0
> +#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 0
> 
>  /*
>   * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has
> diff --git a/tools/python/xen/lowlevel/xc/xc.c
> b/tools/python/xen/lowlevel/xc/xc.c
> index 6d2afd5695..0f861872ce 100644
> --- a/tools/python/xen/lowlevel/xc/xc.c
> +++ b/tools/python/xen/lowlevel/xc/xc.c
> @@ -127,8 +127,6 @@ static PyObject *pyxc_domain_create(XcObject *self,
>  },
>  .max_vcpus = 1,
>  .max_evtchn_port = -1, /* No limit. */
> -.max_grant_frames = 32,
> -.max_maptrack_frames = 1024,
>  };
> 
>  static char *kwd_list[] = { "domid", "ssidref", "handle", "flags",
> diff --git a/tools/xl/xl.c b/tools/xl/xl.c
> index ddd29b3f1b..b6e220184d 100644
> --- a/tools/xl/xl.c
> +++ b/tools/xl/xl.c
> @@ -51,8 +51,8 @@ libxl_bitmap global_pv_affinity_mask;
>  enum output_format default_output_format = OUTPUT_FORMAT_JSON;
>  int claim_mode = 1;
>  bool progress_use_cr = 0;
> -int max_grant_frames = -1;
> -int max_maptrack_frames = -1;
> +int 

Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-27 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Ian
> Jackson
> Sent: 26 November 2019 17:36
> To: George Dunlap 
> Cc: Juergen Gross ; Stefano Stabellini
> ; Julien Grall ; Wei Liu
> ; Paul Durrant ; Andrew Cooper
> ; Konrad Rzeszutek Wilk
> ; Marek Marczykowski-Górecki
> ; Hans van Kranenburg ;
> Jan Beulich ; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames
> and max_maptrack_frames handling
> 
> George Dunlap writes ("[PATCH for-4.13 2/2] Rationalize max_grant_frames
> and max_maptrack_frames handling"):
> > Xen used to have single, system-wide limits for the number of grant
> > frames and maptrack frames a guest was allowed to create.  Increasing
> > or decreasing this single limit on the Xen command-line would change
> > the limit for all guests on the system.
> 
> If I am not mistaken, this is an important change to have.
> 

It is, and many thanks to George for picking this up.

> I have seen reports of users who ran out of grant/maptrack frames
> because of updates to use multiring protocols etc.  The error messages
> are not very good and the recommended workaround has been to increase
> the default limit on the hypervisor command line.
> 
> It is important that we don't break that workaround!

Alas it has apparently been broken for several releases now :-(

  Paul

> 
> Thanks,
> Ian.
> 

Re: [Xen-devel] [PATCH] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

2019-11-27 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 27 November 2019 09:44
> To: Durrant, Paul ; Grall, Julien 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Roger Pau Monné ; Wei
> Liu 
> Subject: Re: [PATCH] xen/x86: vpmu: Unmap per-vCPU PMU page when the
> domain is destroyed
> 
> On 26.11.2019 18:17, Paul Durrant wrote:
> > From: Julien Grall 
> >
> > A guest will set up a shared page with the hypervisor for each vCPU via
> > XENPMU_init. The page will then get mapped in the hypervisor and only
> > released when XENPMU_finish is called.
> >
> > This means that if the guest is not shut down gracefully (such as via xl
> > destroy), the page will stay mapped in the hypervisor.
> 
> Isn't this still too weak a description? It's not the tool stack
> invoking XENPMU_finish, but the guest itself afaics. I.e. a
> misbehaving guest could prevent proper cleanup even with graceful
> shutdown.
> 

Ok, how about 'if the guest fails to invoke XENPMU_finish, e.g. if it is 
destroyed, rather than cleanly shut down'?

> > @@ -2224,6 +2221,9 @@ int domain_relinquish_resources(struct domain *d)
> >  if ( is_hvm_domain(d) )
> >  hvm_domain_relinquish_resources(d);
> >
> > +for_each_vcpu ( d, v )
> > +vpmu_destroy(v);
> > +
> >  return 0;
> >  }
> 
> I think simple things which may allow shrinking the page lists
> should be done early in the function. As vpmu_destroy() looks
> to be idempotent, how about leveraging the very first
> for_each_vcpu() loop in the function (there are too many of them
> there anyway, at least for my taste)?
> 

Ok. I did wonder where in the sequence was best... Leaving it to the end
obviously puts it closer to where it was previously called, but I can't see any
harm in moving it earlier.

  Paul

> Jan

Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-27 Thread Durrant, Paul
> -Original Message-
> From: Ian Jackson 
> Sent: 27 November 2019 11:10
> To: Durrant, Paul 
> Cc: Ian Jackson ; George Dunlap
> ; Juergen Gross ; Stefano
> Stabellini ; Julien Grall ; Wei
> Liu ; Paul Durrant ; Andrew Cooper
> ; Konrad Rzeszutek Wilk
> ; Marek Marczykowski-Górecki
> ; Hans van Kranenburg ;
> Jan Beulich ; xen-devel@lists.xenproject.org
> Subject: RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames
> and max_maptrack_frames handling
> 
> Durrant, Paul writes ("RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize
> max_grant_frames and max_maptrack_frames handling"):
> > > -Original Message-
> > > From: Xen-devel  On Behalf Of
> Ian
> > > Jackson
> > > I have seen reports of users who ran out of grant/maptrack frames
> > > because of updates to use multiring protocols etc.  The error messages
> > > are not very good and the recommended workaround has been to increase
> > > the default limit on the hypervisor command line.
> > >
> > > It is important that we don't break that workaround!
> >
> > Alas it has apparently been broken for several releases now :-(
> 
> I guess at least in Debian (where I have seen this) we haven't
> released with any affected versions yet...

I believe the problem was introduced in 4.10, so I think it would be prudent to
also back-port the final fix to stable trees from then on.

  Paul

> 
> Ian.


Re: [Xen-devel] [PATCH] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

2019-11-27 Thread Durrant, Paul
> -Original Message-
> From: Julien Grall 
> Sent: 27 November 2019 11:15
> To: Jan Beulich ; Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Roger Pau Monné ; Wei
> Liu 
> Subject: Re: [PATCH] xen/x86: vpmu: Unmap per-vCPU PMU page when the
> domain is destroyed
> 
> Hi,
> 
> On 27/11/2019 09:44, Jan Beulich wrote:
> > On 26.11.2019 18:17, Paul Durrant wrote:
> >> From: Julien Grall 
> >>
> >> A guest will set up a shared page with the hypervisor for each vCPU via
> >> XENPMU_init. The page will then get mapped in the hypervisor and only
> >> released when XENPMU_finish is called.
> >>
> >> This means that if the guest is not shut down gracefully (such as via xl
> >> destroy), the page will stay mapped in the hypervisor.
> >
> > Isn't this still too weak a description? It's not the tool stack
> > invoking XENPMU_finish, but the guest itself afaics. I.e. a
> > misbehaving guest could prevent proper cleanup even with graceful
> > shutdown.
> >
> >> @@ -2224,6 +2221,9 @@ int domain_relinquish_resources(struct domain *d)
> >>   if ( is_hvm_domain(d) )
> >>   hvm_domain_relinquish_resources(d);
> >>
> >> +for_each_vcpu ( d, v )
> >> +vpmu_destroy(v);
> >> +
> >>   return 0;
> >>   }
> >
> > I think simple things which may allow shrinking the page lists
> > should be done early in the function. As vpmu_destroy() looks
> > to be idempotent, how about leveraging the very first
> > for_each_vcpu() loop in the function (there are too many of them
> > there anyway, at least for my taste)?
> 
> It is not entirely obvious that vpmu_destroy() is idempotent.
> 
> For instance, I can't find out who is clearing VPMU_CONTEXT_ALLOCATED,
> so I think vpmu_arch_destroy() would be executed over and over.
> 
> I don't know whether this is an issue, but I can't figure out that it is
> not one. Did I miss anything?

It's sufficiently unobvious that it is a concern whether a guest invoking 
XENPMU_finish multiple times can cause harm. I'll see if I can clean that up.

  Paul

> 
> Cheers,

Re: [Xen-devel] [PATCH v2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-27 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 27 November 2019 15:56
> To: Durrant, Paul ; George Dunlap
> 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Anthony PERARD ;
> Roger Pau Monné ; Volodymyr Babchuk
> ; George Dunlap ;
> Ian Jackson ; Marek Marczykowski-Górecki
> ; Stefano Stabellini
> ; Konrad Rzeszutek Wilk ;
> Julien Grall ; Wei Liu 
> Subject: Re: [PATCH v2] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> On 27.11.2019 15:37, Paul Durrant wrote:
> > --- a/xen/arch/arm/setup.c
> > +++ b/xen/arch/arm/setup.c
> > @@ -789,7 +789,7 @@ void __init start_xen(unsigned long
> boot_phys_offset,
> >  .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >  .max_evtchn_port = -1,
> >  .max_grant_frames = gnttab_dom0_frames(),
> > -.max_maptrack_frames = opt_max_maptrack_frames,
> > +.max_maptrack_frames = -1,
> >  };
> >  int rc;
> >
> > --- a/xen/arch/x86/setup.c
> > +++ b/xen/arch/x86/setup.c
> > @@ -697,8 +697,8 @@ void __init noreturn __start_xen(unsigned long
> mbi_p)
> >  struct xen_domctl_createdomain dom0_cfg = {
> >  .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity
> : 0,
> >  .max_evtchn_port = -1,
> > -.max_grant_frames = opt_max_grant_frames,
> > -.max_maptrack_frames = opt_max_maptrack_frames,
> > +.max_grant_frames = -1,
> > +.max_maptrack_frames = -1,
> >  };
> 
> With these there's no need anymore for opt_max_maptrack_frames to
> be non-static. Sadly Arm still wants opt_max_grant_frames
> accessible in gnttab_dom0_frames().
>

Yes, I was about to make them static until I saw what the ARM code did.
 
> > --- a/xen/common/grant_table.c
> > +++ b/xen/common/grant_table.c
> > @@ -1837,12 +1837,18 @@ active_alloc_failed:
> >  return -ENOMEM;
> >  }
> >
> > -int grant_table_init(struct domain *d, unsigned int max_grant_frames,
> > - unsigned int max_maptrack_frames)
> > +int grant_table_init(struct domain *d, int max_grant_frames,
> > + int max_maptrack_frames)
> >  {
> >  struct grant_table *gt;
> >  int ret = -ENOMEM;
> >
> > +/* Default to maximum value if no value was specified */
> > +if ( max_grant_frames < 0 )
> > +max_grant_frames = opt_max_grant_frames;
> > +if ( max_maptrack_frames < 0 )
> > +max_maptrack_frames = opt_max_maptrack_frames;
> > +
> >  if ( max_grant_frames < INITIAL_NR_GRANT_FRAMES ||
> 
> > I take it we don't expect people to specify 2^31 or more
> frames for either option. It looks like almost everything
> here would cope, except for this very comparison. Nevertheless
> I wonder whether you wouldn't better confine both values to
> [0, INT_MAX] now, including when adjusted at runtime.

I can certainly remove the 'U' from the definition of INITIAL_NR_GRANT_FRAMES, 
but do you want me to make opt_max_grant_frames and opt_max_maptrack_frames 
into signed ints and add signed parser code too? I also don't understand the 
'adjusted at runtime' part.

  Paul

> 
> Jan

Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the quarantine domain

2019-11-27 Thread Durrant, Paul
> -Original Message-
> From: Tian, Kevin 
> Sent: 25 November 2019 08:22
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: RE: [PATCH] x86 / iommu: set up a scratch page in the quarantine
> domain
> 
> > From: Paul Durrant [mailto:pdurr...@amazon.com]
> > Sent: Wednesday, November 20, 2019 8:09 PM
> >
> > This patch introduces a new iommu_op to facilitate a per-implementation
> > quarantine set up, and then further code for x86 implementations
> > (amd and vtd) to set up a read/wrote scratch page to serve as the
> source/
> > target for all DMA whilst a device is assigned to dom_io.
> >
> > The reason for doing this is that some hardware may continue to re-try
> > DMA, despite FLR, in the event of an error. Having a scratch page mapped
> > will allow pending DMA to drain and thus quiesce such buggy hardware.
> 
> then there is no diagnostics at all since all faults are quiescent now...
> why do we want to support such buggy hardware? Is it better to make
> it a default-off option, since buggy hardware is supposed to be a niche case?

I guess it could be a command line option... perhaps making the new 
'iommu=quarantine' boolean into something more complex, but I'm not sure it's 
really worth it. Perhaps a compile time option would be better?

  Paul

Re: [Xen-devel] [PATCH v2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-27 Thread Durrant, Paul
> -Original Message-
> From: George Dunlap 
> Sent: 27 November 2019 16:34
> To: Jan Beulich ; Durrant, Paul 
> Cc: AndrewCooper ; Anthony PERARD
> ; Roger Pau Monné ;
> Volodymyr Babchuk ; George Dunlap
> ; Ian Jackson ;
> Marek Marczykowski-Górecki ; Stefano
> Stabellini ; xen-devel@lists.xenproject.org;
> Konrad Rzeszutek Wilk ; Julien Grall
> ; Wei Liu 
> Subject: Re: [PATCH v2] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> On 11/27/19 4:20 PM, Jan Beulich wrote:
> > On 27.11.2019 17:14,  Durrant, Paul  wrote:
> >>> From: Jan Beulich 
> >>> Sent: 27 November 2019 15:56
> >>>
> >>> On 27.11.2019 15:37, Paul Durrant wrote:
> >>>> --- a/xen/arch/arm/setup.c
> >>>> +++ b/xen/arch/arm/setup.c
> >>>> @@ -789,7 +789,7 @@ void __init start_xen(unsigned long
> >>> boot_phys_offset,
> >>>>  .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >>>>  .max_evtchn_port = -1,
> >>>>  .max_grant_frames = gnttab_dom0_frames(),
> >>>> -.max_maptrack_frames = opt_max_maptrack_frames,
> >>>> +.max_maptrack_frames = -1,
> >>>>  };
> >>>>  int rc;
> >>>>
> >>>> --- a/xen/arch/x86/setup.c
> >>>> +++ b/xen/arch/x86/setup.c
> >>>> @@ -697,8 +697,8 @@ void __init noreturn __start_xen(unsigned long
> >>> mbi_p)
> >>>>  struct xen_domctl_createdomain dom0_cfg = {
> >>>>  .flags = IS_ENABLED(CONFIG_TBOOT) ?
> XEN_DOMCTL_CDF_s3_integrity
> >>> : 0,
> >>>>  .max_evtchn_port = -1,
> >>>> -.max_grant_frames = opt_max_grant_frames,
> >>>> -.max_maptrack_frames = opt_max_maptrack_frames,
> >>>> +.max_grant_frames = -1,
> >>>> +.max_maptrack_frames = -1,
> >>>>  };
> >>>
> >>> With these there's no need anymore for opt_max_maptrack_frames to
> >>> be non-static. Sadly Arm still wants opt_max_grant_frames
> >>> accessible in gnttab_dom0_frames().
> >>
> >> Yes, I was about to make them static until I saw what the ARM code did.
> >
> > But the one that Arm doesn't need should become static now.
> >
> >>>> --- a/xen/common/grant_table.c
> >>>> +++ b/xen/common/grant_table.c
> >>>> @@ -1837,12 +1837,18 @@ active_alloc_failed:
> >>>>  return -ENOMEM;
> >>>>  }
> >>>>
> >>>> -int grant_table_init(struct domain *d, unsigned int
> max_grant_frames,
> >>>> - unsigned int max_maptrack_frames)
> >>>> +int grant_table_init(struct domain *d, int max_grant_frames,
> >>>> + int max_maptrack_frames)
> >>>>  {
> >>>>  struct grant_table *gt;
> >>>>  int ret = -ENOMEM;
> >>>>
> >>>> +/* Default to maximum value if no value was specified */
> >>>> +if ( max_grant_frames < 0 )
> >>>> +max_grant_frames = opt_max_grant_frames;
> >>>> +if ( max_maptrack_frames < 0 )
> >>>> +max_maptrack_frames = opt_max_maptrack_frames;
> >>>> +
> >>>>  if ( max_grant_frames < INITIAL_NR_GRANT_FRAMES ||
> >>>
> >>> I take it we don't expect people to specify 2^31 or more
> >>> frames for either option. It looks like almost everything
> >>> here would cope, except for this very comparison. Nevertheless
> >>> I wonder whether you wouldn't better confine both values to
> >>> [0, INT_MAX] now, including when adjusted at runtime.
> >>
> >> I can certainly remove the 'U' from the definition of
> >> INITIAL_NR_GRANT_FRAMES,
> >
> > Oh, I didn't pay attention that it has a U on it - in this case
> > the comparison above is fine.
> >
> >> but do you want me to make opt_max_grant_frames and
> >> opt_max_maptrack_frames into signed ints and add signed parser
> >> code too?
> >
> > Definitely not. They should remain unsigned quantities, but their
> > values may need sanity checking now.
> >
> >> I also don't understand the 'adjusted at runtime' part.
> >
> > Well, for a command line driven value you could adjust an out of
> > bounds value in some __init function. But for runtime modifiable
> > settings you won't get away with it this easily.
> 
> TBH I'd be tempted to define XENSOMETHING_MAX_DEFAULT as (unsigned
> long)(-1) or something, and explicitly compare to that.  That leaves
> open the possibility of having more sentinel values if we decided we
> wanted them.

I'm extremely confused now. What do you want me to compare and where?

I assume we're talking about the opt_XXX values. Am I supposed to stop values
greater than INT_MAX being assigned to them? Or should I define local unsigned
values for max_maptrack/grant_frames and simply initialize them to the
passed-in arg (if >= 0) or the opt_XXX value otherwise?
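
To make the second option concrete, here is a minimal sketch. It is not the code from the patch: resolve_frames() and its parameter names are hypothetical, and it assumes the command-line value has already been sanity-checked into [0, INT_MAX] elsewhere. A negative argument is treated as "not specified" and replaced by the unsigned global value.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper (not a Xen function): pick the effective frame limit.
 * 'requested' is the signed per-domain value, where any negative number
 * means "no value specified". 'opt_default' stands in for the unsigned
 * command-line option.
 */
static unsigned int resolve_frames(int requested, unsigned int opt_default)
{
    /* Assumption: the option was clamped to [0, INT_MAX] when it was parsed. */
    assert(opt_default <= (unsigned int)INT_MAX);

    if ( requested < 0 )
        return opt_default;

    return (unsigned int)requested;
}
```

With this shape the opt_XXX values can stay unsigned, and only the local copies used for the later range checks are derived from the signed arguments.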

  Paul

> 
>  -George

Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the quarantine domain

2019-11-27 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 20 November 2019 13:52
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Kevin Tian ;
> Roger Pau Monné ; Wei Liu ; Andrew
> Cooper 
> Subject: Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the
> quarantine domain
> 
> On 20.11.2019 13:08, Paul Durrant wrote:
> > This patch introduces a new iommu_op to facilitate a per-implementation
> > quarantine set up, and then further code for x86 implementations
> > (amd and vtd) to set up a read/wrote scratch page to serve as the
> source/
> > target for all DMA whilst a device is assigned to dom_io.
> 
> A single page in the system won't do, I'm afraid. If one guest's
> (prior) device is retrying reads with data containing secrets of that
> guest, another guest's (prior) device could end up writing this data
> to e.g. storage where after a guest restart it is then available to
> the wrong guest.
> 

True. I was unsure whether this was a concern in the scenarios we had to deal 
with but I'm informed it is, and in the general case it is too.

> Also nit: s/wrote/write/ .
> 

Yep. Will fix.

> > The reason for doing this is that some hardware may continue to re-try
> > DMA, despite FLR, in the event of an error. Having a scratch page mapped
> > will allow pending DMA to drain and thus quiesce such buggy hardware.
> 
> Without a "sink" page mapped, this would result in IOMMU faults aiui.
> What's the problem with having these faults surface and get handled,
> eventually leading to the device getting bus-mastering disabled? Is
> it that devices continue DMAing even when bus-mastering is off? If
> so, is it even safe to pass through any such device? In any event
> the description needs to be extended here.
> 

The devices in question ignore both FLR and BME and some IOMMU faults are 
fatal. I believe, however, write faults are not and so I think a single 
read-only 'source' page will be sufficient.
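
As a rough illustration of how one page can back every DMA address, the toy model below (plain C, not Xen code; all toy_* names, the 512-entry tables and the three levels are assumptions made for the example) fills every entry of each table with a pointer to the same next-level table, and every entry of the last table with the same leaf page. Any device frame number then resolves to that one page. Permissions are not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ENTRIES 512u   /* entries per table: 9 index bits per level */
#define TOY_LEVELS  3u     /* table levels above the leaf page */

struct toy_table {
    const void *entry[TOY_ENTRIES];
};

/* One table per level, plus the single leaf page shared by all mappings. */
static struct toy_table toy_tables[TOY_LEVELS];
static unsigned char toy_scratch_page[4096];

static void toy_quarantine_init(void)
{
    for ( unsigned int level = 0; level < TOY_LEVELS; level++ )
    {
        /* The last level points at the leaf page, the others at the next table. */
        const void *next = (level + 1 < TOY_LEVELS)
                           ? (const void *)&toy_tables[level + 1]
                           : (const void *)toy_scratch_page;

        for ( unsigned int i = 0; i < TOY_ENTRIES; i++ )
            toy_tables[level].entry[i] = next;
    }
}

/* Walk the toy tables for a device frame number and return the leaf page. */
static const void *toy_walk(unsigned long dfn)
{
    const void *p = &toy_tables[0];

    for ( unsigned int level = 0; level < TOY_LEVELS; level++ )
    {
        unsigned int shift = 9u * (TOY_LEVELS - 1u - level);
        unsigned int idx = (unsigned int)(dfn >> shift) & (TOY_ENTRIES - 1u);

        p = ((const struct toy_table *)p)->entry[idx];
        assert(p != NULL);   /* holds once toy_quarantine_init() has run */
    }

    return p;
}
```

The cost is one table page per level plus the leaf, independent of how much address space is covered.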

> > Signed-off-by: Paul Durrant 
> 
> What about Arm? Can devices which Arm allows to assign to guests
> also "babble" like this after de-assignment? If not, this should be
> said in the description. If so, obviously that side would also want
> fixing.
> 
> > --- a/xen/drivers/passthrough/amd/iommu_map.c
> > +++ b/xen/drivers/passthrough/amd/iommu_map.c
> > @@ -560,6 +560,63 @@ int amd_iommu_reserve_domain_unity_map(struct
> domain *domain,
> >  return rt;
> >  }
> >
> > +int amd_iommu_quarantine_init(struct domain *d)
> 
> __init
> 

Ok.

> > +{
> > +struct domain_iommu *hd = dom_iommu(d);
> > +unsigned int level;
> > +struct amd_iommu_pte *table;
> > +
> > +if ( hd->arch.root_table )
> > +{
> > +ASSERT_UNREACHABLE();
> > +return 0;
> > +}
> > +
> > +spin_lock(&hd->arch.mapping_lock);
> > +
> > +level = hd->arch.paging_mode;
> 
> With DomIO being PV in principle, this is going to be the
> fixed value PV domains get set, merely depending on RAM size at
> boot time (i.e. not accounting for memory hotplug). This could
> be easily too little for HVM guests, which are free to extend
> their GFN (and hence DFN) space. Therefore I think you need to
> set the maximum possible level here.
>

Ok. I'd not considered memory hotplug. I'll use a static maximum value instead, 
as VT-d does in general.

> > +hd->arch.root_table = alloc_amd_iommu_pgtable();
> > +if ( !hd->arch.root_table )
> > +goto out;
> > +
> > +table = __map_domain_page(hd->arch.root_table);
> > +while ( level )
> > +{
> > +struct page_info *pg;
> > +unsigned int i;
> > +
> > +/*
> > + * The pgtable allocator is fine for the leaf page, as well as
> > + * page table pages.
> > + */
> > +pg = alloc_amd_iommu_pgtable();
> > +if ( !pg )
> > +break;
> > +
> > +for ( i = 0; i < PTE_PER_TABLE_SIZE; i++ )
> > +{
> > +struct amd_iommu_pte *pde = &table[i];
> > +
> > +set_iommu_pde_present(pde, mfn_x(page_to_mfn(pg)), level -
> 1,
> > +  true, true);
> 
> This would also benefit from a comment indicating that it's fine
> for the leaf level as well, despite the "pde" in the name (and
> its sibling set_iommu_pte_present() actually existing). Of course
> you could as well extend the comment a few lines up.

The AMD IOMMU conflates PDE and PTE all over the place but I'll add 

Re: [Xen-devel] [PATCH v2] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

2019-11-27 Thread Durrant, Paul
> -Original Message-
> From: Boris Ostrovsky 
> Sent: 27 November 2019 16:32
> To: Jan Beulich ; Durrant, Paul 
> Cc: Grall, Julien ; Andrew Cooper
> ; Roger Pau Monné ; Jun
> Nakajima ; Kevin Tian ; Wei
> Liu ; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v2] xen/x86: vpmu: Unmap per-vCPU PMU page when the
> domain is destroyed
> 
> On 11/27/19 10:44 AM, Jan Beulich wrote:
> > On 27.11.2019 13:00, Paul Durrant wrote:
> >> --- a/xen/arch/x86/cpu/vpmu.c
> >> +++ b/xen/arch/x86/cpu/vpmu.c
> >> @@ -479,6 +479,8 @@ static int vpmu_arch_initialise(struct vcpu *v)
> >>
> >>  if ( ret )
> >>  printk(XENLOG_G_WARNING "VPMU: Initialization failed for
> %pv\n", v);
> >> +else
> >> +vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED);
> 
> That won't work I think.
> 
> On Intel the context is allocated lazily for HVM/PVH guests during the
> first MSR access. For example:
> 
> core2_vpmu_do_wrmsr() ->
>     core2_vpmu_msr_common_check():
>         if ( unlikely(!vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED)) &&
>              !core2_vpmu_alloc_resource(current) )
>             return 0;
> 
> For PV guests the context *is* allocated from vmx_vpmu_initialise().
> 
> I don't remember why only PV does eager allocation but I think doing it
> for all guests would make code much simpler and then this patch will be
> correct.
> 

Ok. Simpler if I leave setting the flag in the implementation code. I think 
clearing it in vpmu_arch_destroy() would still be correct in all cases.
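
A toy model of that split (not Xen code; the toy_* names and the flag value are invented for illustration): the implementation sets the flag whenever it actually allocates, lazily or eagerly, while the common destroy path releases the context at most once and always leaves the flag clear, so repeated calls are harmless.

```c
#include <assert.h>

#define TOY_CONTEXT_ALLOCATED (1u << 0)   /* hypothetical flag bit */

struct toy_vpmu {
    unsigned int flags;
    unsigned int frees;   /* how many times the context was released */
};

/* Stand-in for implementation code that may allocate lazily on first use. */
static void toy_alloc(struct toy_vpmu *vpmu)
{
    if ( vpmu->flags & TOY_CONTEXT_ALLOCATED )
        return;

    vpmu->flags |= TOY_CONTEXT_ALLOCATED;
}

/* Stand-in for the common destroy path: safe to call any number of times. */
static void toy_destroy(struct toy_vpmu *vpmu)
{
    if ( vpmu->flags & TOY_CONTEXT_ALLOCATED )
        vpmu->frees++;

    /* Clear unconditionally so a second call finds nothing to release. */
    vpmu->flags &= ~TOY_CONTEXT_ALLOCATED;
    assert(!(vpmu->flags & TOY_CONTEXT_ALLOCATED));
}
```

Whether allocation happened eagerly or on first MSR access makes no difference to the destroy side in this model, which is the property being relied on.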

  Paul

> -boris
> 
> 
> >>
> >>  return ret;
> >>  }
> >> @@ -576,11 +578,36 @@ static void vpmu_arch_destroy(struct vcpu *v)
> >>
> >>   vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
> >>  }
> >> +
> >> +vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
> >>  }
> > Boris,
> >
> > I'd like to ask that you comment on this part of the change at
> > least, as I seem to vaguely recall that things were intentionally
> > not done this way originally.
> >
> > Paul,
> >
> > everything else looks good to me now.
> >
> > Jan


Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the quarantine domain

2019-11-27 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 27 November 2019 15:26
> To: Durrant, Paul 
> Cc: Kevin Tian ; xen-devel@lists.xenproject.org;
> Andrew Cooper ; Roger Pau Monné
> ; Wei Liu 
> Subject: Re: [Xen-devel] [PATCH] x86 / iommu: set up a scratch page in the
> quarantine domain
> 
> On 27.11.2019 16:18,  Durrant, Paul  wrote:
> >> -Original Message-
> >> From: Tian, Kevin 
> >> Sent: 25 November 2019 08:22
> >> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> >> Cc: Jan Beulich ; Andrew Cooper
> >> ; Wei Liu ; Roger Pau Monné
> >> 
> >> Subject: RE: [PATCH] x86 / iommu: set up a scratch page in the
> quarantine
> >> domain
> >>
> >>> From: Paul Durrant [mailto:pdurr...@amazon.com]
> >>> Sent: Wednesday, November 20, 2019 8:09 PM
> >>>
> >>> This patch introduces a new iommu_op to facilitate a per-
> implementation
> >>> quarantine set up, and then further code for x86 implementations
> >>> (amd and vtd) to set up a read/wrote scratch page to serve as the
> >> source/
> >>> target for all DMA whilst a device is assigned to dom_io.
> >>>
> >>> The reason for doing this is that some hardware may continue to re-try
> >>> DMA, despite FLR, in the event of an error. Having a scratch page
> mapped
> >>> will allow pending DMA to drain and thus quiesce such buggy hardware.
> >>
> >> then there is no diagnostics at all since all faults are quiescent
> now...
> >> why do we want to support such buggy hardware? Is it better to make
> >> it a default-off option, since buggy hardware is supposed to be a niche case?
> >
> > I guess it could be a command line option... perhaps making the new
> > 'iommu=quarantine' boolean into something more complex, but I'm not
> > sure it's really worth it. Perhaps a compile time option would be
> > better?
> 
> Yet another option: How about installing the scratch page mappings
> only after a (handful of) IOMMU faults? But of course there was the
> related earlier question of whether indeed our turning off of bus
> mastering doesn't already help silencing the faults.

No. Unfortunately the h/w has zero tolerance for some faults.

  Paul

> 
> Jan

Re: [Xen-devel] [PATCH v3 1/3] introduce GFN notification for translated domains

2019-11-25 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 25 November 2019 09:58
> To: xen-devel@lists.xenproject.org
> Cc: Juergen Gross ; Stefano Stabellini
> ; Julien Grall ; Wei Liu
> ; Konrad Wilk ; George Dunlap
> ; Andrew Cooper ;
> Sander Eikelenboom ; Ian Jackson
> ; Roger Pau Monné 
> Subject: [Xen-devel] [PATCH v3 1/3] introduce GFN notification for
> translated domains
> 
> In order for individual IOMMU drivers (and from an abstract pov also
> architectures) to be able to adjust, ahead of actual mapping requests,
> their data structures when they might cover only a sub-range of all
> possible GFNs, introduce a notification call used by various code paths
> potentially installing a fresh mapping of a never used GFN (for a
> particular domain).
> 
> Note that before this patch, in gnttab_transfer(), once past
> assign_pages(), further errors modifying the physmap are ignored
> (presumably because it would be too complicated to try to roll back at
> that point). This patch follows suit by ignoring failed notify_gfn()s or
> races due to the need to intermediately drop locks, simply printing out
> a warning that the gfn may not be accessible due to the failure.
> 
> Signed-off-by: Jan Beulich 
> ---
> v3: Conditionalize upon CONFIG_IOMMU_FORCE_PT_SHARE, also covering the
> share_p2m_table() functionality as appropriate. Un-comment the
> GNTMAP_host_map check.
> v2: Introduce arch_notify_gfn(), to invoke gfn_valid() on x86 (this
> unfortunately means it and notify_gfn() now need to be macros, or
> else include file dependencies get in the way, as gfn_valid() lives
> in paging.h, which we shouldn't include from xen/sched.h). Improve
> description.
> 
> TBD: Does Arm actually have anything to check against in its
>  arch_notify_gfn()?
> 
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -173,7 +173,8 @@ static int __init pvh_populate_memory_ra
>  continue;
>  }
> 
> -rc = guest_physmap_add_page(d, _gfn(start), page_to_mfn(page),
> +rc = notify_gfn(d, _gfn(start + (1UL << order) - 1)) ?:
> + guest_physmap_add_page(d, _gfn(start), page_to_mfn(page),
>  order);
>  if ( rc != 0 )
>  {
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -4304,9 +4304,17 @@ static int hvmop_set_param(
>  if ( a.value > SHUTDOWN_MAX )
>  rc = -EINVAL;
>  break;
> +
>  case HVM_PARAM_IOREQ_SERVER_PFN:
> -d->arch.hvm.ioreq_gfn.base = a.value;
> +if ( d->arch.hvm.params[HVM_PARAM_NR_IOREQ_SERVER_PAGES] )
> +rc = notify_gfn(
> + d,
> + _gfn(a.value + d->arch.hvm.params
> +[HVM_PARAM_NR_IOREQ_SERVER_PAGES] -
> 1));

IIRC the PFN is typically set by the toolstack before the number of pages, so 
the notify will be for a.value - 1, i.e. the previous gfn. Is that a problem?

  Paul

> +if ( !rc )
> + d->arch.hvm.ioreq_gfn.base = a.value;
>  break;
> +
>  case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
>  {
>  unsigned int i;
> @@ -4317,6 +4325,9 @@ static int hvmop_set_param(
>  rc = -EINVAL;
>  break;
>  }
> +rc = notify_gfn(d, _gfn(d->arch.hvm.ioreq_gfn.base + a.value -
> 1));
> +if ( rc )
> +break;
>  for ( i = 0; i < a.value; i++ )
>  set_bit(i, &d->arch.hvm.ioreq_gfn.mask);
> 
> @@ -4330,7 +4341,11 @@ static int hvmop_set_param(
>  BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN >
>   sizeof(d->arch.hvm.ioreq_gfn.legacy_mask) * 8);
>  if ( a.value )
> -set_bit(a.index, &d->arch.hvm.ioreq_gfn.legacy_mask);
> +{
> +rc = notify_gfn(d, _gfn(a.value));
> +if ( !rc )
> +set_bit(a.index, &d->arch.hvm.ioreq_gfn.legacy_mask);
> +}
>  break;
> 
>  case HVM_PARAM_X87_FIP_WIDTH:
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -946,6 +946,16 @@ map_grant_ref(
>  return;
>  }
> 
> +if ( paging_mode_translate(ld) && (op->flags & GNTMAP_host_map) &&
> + (rc = notify_gfn(ld, gaddr_to_gfn(op->host_addr))) )
> +{
> +gdprintk(XENLOG_INFO, "notify(%"PRI_gfn") -> %d\n",
> + gfn_x(gaddr_to_gfn(op->host_addr)), rc);
> +op->status = GNTST_general_error;
> +return;
> +BUILD_BUG_ON(GNTST_okay);
> +}
> +
>  if ( unlikely((rd = rcu_lock_domain_by_id(op->dom)) == NULL) )
>  {
>  gdprintk(XENLOG_INFO, "Could not find domain %d\n", op->dom);
> @@ -2123,6 +2133,7 @@ gnttab_transfer(
>  {
>  bool_t okay;
>  int rc;
> +gfn_t gfn;
> 
>  if ( i && hypercall_preempt_check() )
>  return i;
> @@ -2300,21 +2311,52 @@ gnttab_transfer(
>  act = 

Re: [Xen-devel] [PATCH v3 1/3] introduce GFN notification for translated domains

2019-11-25 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 25 November 2019 10:51
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Ian Jackson ; Roger
> Pau Monné ; Sander Eikelenboom
> ; George Dunlap ;
> Stefano Stabellini ; Konrad Wilk
> ; Juergen Gross ; Julien Grall
> ; Wei Liu 
> Subject: Re: [Xen-devel] [PATCH v3 1/3] introduce GFN notification for
> translated domains
> 
> On 25.11.2019 11:37,  Durrant, Paul  wrote:
> >> -Original Message-
> >> From: Xen-devel  On Behalf Of
> Jan
> >> Beulich
> >> Sent: 25 November 2019 09:58
> >>
> >> --- a/xen/arch/x86/hvm/hvm.c
> >> +++ b/xen/arch/x86/hvm/hvm.c
> >> @@ -4304,9 +4304,17 @@ static int hvmop_set_param(
> >>  if ( a.value > SHUTDOWN_MAX )
> >>  rc = -EINVAL;
> >>  break;
> >> +
> >>  case HVM_PARAM_IOREQ_SERVER_PFN:
> >> -d->arch.hvm.ioreq_gfn.base = a.value;
> >> +if ( d->arch.hvm.params[HVM_PARAM_NR_IOREQ_SERVER_PAGES] )
> >> +rc = notify_gfn(
> >> + d,
> >> + _gfn(a.value + d->arch.hvm.params
> >> +[HVM_PARAM_NR_IOREQ_SERVER_PAGES]
> -
> >> 1));
> >
> > IIRC the PFN is typically set by the toolstack before the number of
> > pages, so the notify will be for a.value - 1, i.e. the previous gfn.
> > Is that a problem?
> 
> There's an if() around the invocation to avoid this situation, so I'm
> afraid I don't understand the question.

D'oh... Missed it. Sorry for the noise.
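
For the record, the arithmetic in question as a small standalone sketch (toy_last_gfn() is a hypothetical helper, not something in the patch): the last GFN of the range is base + nr_pages - 1, and with a page count of zero there is nothing to notify, which is the case the if() guards against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: compute the last GFN of a range of 'nr_pages'
 * starting at 'base'. Returns false when the count is still zero, i.e.
 * the base was set before the number of pages and no notification is due.
 */
static bool toy_last_gfn(unsigned long base, unsigned long nr_pages,
                         unsigned long *last)
{
    assert(last != NULL);

    if ( nr_pages == 0 )
        return false;

    *last = base + nr_pages - 1;
    return true;
}
```

Without the zero check the computed value would be base - 1, the GFN just below the range.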

  Paul

> 
> Jan

Re: [Xen-devel] [PATCH v2] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

2019-11-28 Thread Durrant, Paul
> -Original Message-
> From: Julien Grall 
> Sent: 27 November 2019 19:42
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> ; Jun Nakajima ; Kevin Tian
> 
> Subject: Re: [PATCH v2] xen/x86: vpmu: Unmap per-vCPU PMU page when the
> domain is destroyed
> 
> Hi Paul,
> 
> On 27/11/2019 12:00, Paul Durrant wrote:
> > From: Julien Grall 
> >
> > A guest will set up a shared page with the hypervisor for each vCPU via
> > XENPMU_init. The page will then get mapped in the hypervisor and only
> > released when XENPMU_finish is called.
> >
> > This means that if the guest fails to invoke XENPMU_finish, e.g. if it is
> > destroyed rather than cleanly shut down, the page will stay mapped in
> the
> > hypervisor. One of the consequences is the domain can never be fully
> > destroyed as a page reference is still held.
> >
> > As Xen should never rely on the guest to correctly clean-up any
> > allocation in the hypervisor, we should also unmap such pages during the
> > domain destruction if there are any left.
> >
> > We can re-use the same logic as in pvpmu_finish(). To avoid
> > duplication, move the logic in a new function that can also be called
> > from vpmu_destroy().
> >
> > NOTE: The call to vpmu_destroy() must also be moved from
> >arch_vcpu_destroy() into domain_relinquish_resources() such that
> the
> >reference on the mapped page does not prevent domain_destroy()
> (which
> >calls arch_vcpu_destroy()) from being called.
> >Also, whilst it appears that vpmu_arch_destroy() is idempotent it
> is
> >by no means obvious. Hence move manipulation of the
> >VPMU_CONTEXT_ALLOCATED flag out of implementation specific code
> and
> >make sure it is cleared at the end of vpmu_arch_destroy().
> 
> If you resend the patch, it might be worth adding a line about the lack
> of an XSA. Something like:
> 
> There is no associated XSA because vPMU  is not security supported (see
> XSA-163).

Sure, I'll add another note.

  Paul

> 
> Cheers,

Re: [Xen-devel] [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

2019-11-28 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 28 November 2019 10:23
> To: Durrant, Paul ; Boris Ostrovsky
> 
> Cc: xen-devel@lists.xenproject.org; Grall, Julien ;
> Andrew Cooper ; Roger Pau Monné
> ; Jun Nakajima ; Kevin Tian
> ; Wei Liu 
> Subject: Re: [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the
> domain is destroyed
> 
> On 28.11.2019 10:38, Paul Durrant wrote:
> > From: Julien Grall 
> >
> > A guest will set up a shared page with the hypervisor for each vCPU via
> > XENPMU_init. The page will then get mapped in the hypervisor and only
> > released when XENPMU_finish is called.
> >
> > This means that if the guest fails to invoke XENPMU_finish, e.g. if it is
> > destroyed rather than cleanly shut down, the page will stay mapped in
> the
> > hypervisor. One of the consequences is the domain can never be fully
> > destroyed as a page reference is still held.
> >
> > As Xen should never rely on the guest to correctly clean-up any
> > allocation in the hypervisor, we should also unmap such pages during the
> > domain destruction if there are any left.
> >
> > We can re-use the same logic as in pvpmu_finish(). To avoid
> > duplication, move the logic in a new function that can also be called
> > from vpmu_destroy().
> >
> > NOTE: - The call to vpmu_destroy() must also be moved from
> > arch_vcpu_destroy() into domain_relinquish_resources() such that
> > the reference on the mapped page does not prevent
> domain_destroy()
> > (which calls arch_vcpu_destroy()) from being called.
> >   - Whilst it appears that vpmu_arch_destroy() is idempotent it is
> > by no means obvious. Hence make sure the VPMU_CONTEXT_ALLOCATED
> > flag is cleared at the end of vpmu_arch_destroy().
> >   - This is not an XSA because vPMU is not security supported (see
> > XSA-163).
> >
> > Signed-off-by: Julien Grall 
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Jan Beulich 
> > Cc: Andrew Cooper 
> > Cc: Wei Liu 
> > Cc: "Roger Pau Monné" 
> > Cc: Jun Nakajima 
> > Cc: Kevin Tian 
> >
> > v2:
> >  - Re-word commit comment slightly
> >  - Re-enforce idempotency of vpmu_arch_destroy()
> >  - Move invocation of vpmu_destroy() earlier in
> >domain_relinquish_resources()
> 
> What about v3?

Oh, sorry:

v3:
 - Add comment regarding XSA-163
 - Revert changes setting VPMU_CONTEXT_ALLOCATED in common code

  Paul

> 
> > --- a/xen/arch/x86/cpu/vpmu.c
> > +++ b/xen/arch/x86/cpu/vpmu.c
> > @@ -576,11 +576,36 @@ static void vpmu_arch_destroy(struct vcpu *v)
> >
> >   vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
> >  }
> > +
> > +vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
> >  }
> 
> Boris, to be on the safe side - are you in agreement with this
> change, now that the setting of the flag is being left untouched?
> 
> Jan

Re: [Xen-devel] [PATCH-for-4.13 v3] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Durrant, Paul
> -Original Message-
> From: Paul Durrant 
> Sent: 28 November 2019 12:51
> To: xen-devel@lists.xenproject.org
> Cc: George Dunlap ; Durrant, Paul
> ; Ian Jackson ; Wei Liu
> ; Andrew Cooper ; George Dunlap
> ; Jan Beulich ; Julien
> Grall ; Konrad Rzeszutek Wilk ;
> Stefano Stabellini ; Anthony PERARD
> ; Marek Marczykowski-Górecki
> ; Volodymyr Babchuk
> ; Roger Pau Monné 
> Subject: [PATCH-for-4.13 v3] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> From: George Dunlap 
> 
> Xen used to have single, system-wide limits for the number of grant
> frames and maptrack frames a guest was allowed to create. Increasing
> or decreasing this single limit on the Xen command-line would change
> the limit for all guests on the system.
> 
> Later, per-domain limits for these values were created. The system-wide
> limits became strict limits: domains could not be created with higher
> limits, but could be created with lower limits. However, that change
> also introduced a range of different "default" values into various
> places in the toolstack:
> 
> - The python libxc bindings hard-coded these values to 32 and 1024,
>   respectively
> - The libxl default values are 32 and 1024 respectively.
> - xl will use the libxl default for maptrack, but does its own default
>   calculation for grant frames: either 32 or 64, based on the max
>   possible mfn.
> 
> These defaults interact poorly with the hypervisor command-line limit:
> 
> - The hypervisor command-line limit cannot be used to raise the limit
>   for all guests anymore, as the default in the toolstack will
>   effectively override this.
> - If you use the hypervisor command-line limit to *reduce* the limit,
>   then the "default" values generated by the toolstack are too high,
>   and all guest creations will fail.
> 
> In other words, the toolstack defaults require any change to be
> effected by having the admin explicitly specify a new value in every
> guest.
> 
> In order to address this, have grant_table_init treat negative values
> for max_grant_frames and max_maptrack_frames as instructions to use the
> system-wide default, and have all the above toolstacks default to passing
> -1 unless a different value is explicitly configured.
> 
> This restores the old behavior in that changing the hypervisor
> command-line option can change the behavior for all guests, while
> retaining the ability
> to set per-guest values.  It also removes the bug that reducing the
> system-wide max will cause all domains without explicit limits to fail.
> 
> NOTE: - The Ocaml bindings require the caller to always specify a value,
> and the code to start a xenstored stubdomain hard-codes these to 4
>   and 128 respectively; this behaviour will not be modified.
> 
> Signed-off-by: George Dunlap 
> Signed-off-by: Paul Durrant 
> ---
> Cc: Ian Jackson 
> Cc: Wei Liu 
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Jan Beulich 
> Cc: Julien Grall 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Anthony PERARD 
> Cc: "Marek Marczykowski-Górecki" 
> Cc: Volodymyr Babchuk 
> Cc: "Roger Pau Monné" 
> 
> v3:
>  - Make sure that specified values cannot be negative or overflow a
>signed int
> 
> v2:
>  - re-worked George's original commit message a little
>  - fixed the text in xl.conf.5.pod
>  - use -1 as the sentinel value for 'default' and < 0 for checking it
> ---
>  docs/man/xl.conf.5.pod|  6 ++--
>  tools/libxl/libxl.h   |  4 +--
>  tools/libxl/libxl_types.idl   |  4 +--
>  tools/libxl/libxlu_cfg.c  | 24 ++--
>  tools/libxl/libxlutil.h   |  2 ++
>  tools/python/xen/lowlevel/xc/xc.c |  4 +--
>  tools/xl/xl.c | 15 --
>  tools/xl/xl_parse.c   |  9 --
>  xen/arch/arm/setup.c  |  2 +-
>  xen/arch/x86/setup.c  |  4 +--
>  xen/common/grant_table.c  | 46 ---
>  xen/include/public/domctl.h   | 10 ---
>  xen/include/xen/grant_table.h |  8 +++---
>  13 files changed, 100 insertions(+), 38 deletions(-)
> 
> diff --git a/docs/man/xl.conf.5.pod b/docs/man/xl.conf.5.pod
> index 962144e38e..207ab3e77a 100644
> --- a/docs/man/xl.conf.5.pod
> +++ b/docs/man/xl.conf.5.pod
> @@ -81,13 +81,15 @@ Default: C
> 
>  Sets the default value for the C domain config value.
> 
> -Default: C<32> on hosts up to 16TB of memory, C<64> on hosts larger than
> 16TB
> +Default: value of Xen command line B parameter (or its
> +default value if unspecified).
> 
>  =item B
> 
>  Sets the defa

Re: [Xen-devel] [PATCH v2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Durrant, Paul
> -Original Message-
> From: George Dunlap 
> Sent: 27 November 2019 16:52
> To: Durrant, Paul ; Jan Beulich 
> Cc: AndrewCooper ; Anthony PERARD
> ; Roger Pau Monné ;
> Volodymyr Babchuk ; George Dunlap
> ; Ian Jackson ;
> Marek Marczykowski-Górecki ; Stefano
> Stabellini ; xen-devel@lists.xenproject.org;
> Konrad Rzeszutek Wilk ; Julien Grall
> ; Wei Liu 
> Subject: Re: [PATCH v2] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> On 11/27/19 4:43 PM, Durrant, Paul wrote:
> >> -Original Message-
> >> From: George Dunlap 
> >> Sent: 27 November 2019 16:34
> >> To: Jan Beulich ; Durrant, Paul
> 
> >> Cc: AndrewCooper ; Anthony PERARD
> >> ; Roger Pau Monné ;
> >> Volodymyr Babchuk ; George Dunlap
> >> ; Ian Jackson ;
> >> Marek Marczykowski-Górecki ; Stefano
> >> Stabellini ; xen-devel@lists.xenproject.org;
> >> Konrad Rzeszutek Wilk ; Julien Grall
> >> ; Wei Liu 
> >> Subject: Re: [PATCH v2] Rationalize max_grant_frames and
> >> max_maptrack_frames handling
> >>
> >> On 11/27/19 4:20 PM, Jan Beulich wrote:
> >>> On 27.11.2019 17:14,  Durrant, Paul  wrote:
> >>>>> From: Jan Beulich 
> >>>>> Sent: 27 November 2019 15:56
> >>>>>
> >>>>> On 27.11.2019 15:37, Paul Durrant wrote:
> >>>>>> --- a/xen/arch/arm/setup.c
> >>>>>> +++ b/xen/arch/arm/setup.c
> >>>>>> @@ -789,7 +789,7 @@ void __init start_xen(unsigned long
> >>>>> boot_phys_offset,
> >>>>>>  .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >>>>>>  .max_evtchn_port = -1,
> >>>>>>  .max_grant_frames = gnttab_dom0_frames(),
> >>>>>> -.max_maptrack_frames = opt_max_maptrack_frames,
> >>>>>> +.max_maptrack_frames = -1,
> >>>>>>  };
> >>>>>>  int rc;
> >>>>>>
> >>>>>> --- a/xen/arch/x86/setup.c
> >>>>>> +++ b/xen/arch/x86/setup.c
> >>>>>> @@ -697,8 +697,8 @@ void __init noreturn __start_xen(unsigned long
> >>>>> mbi_p)
> >>>>>>  struct xen_domctl_createdomain dom0_cfg = {
> >>>>>>  .flags = IS_ENABLED(CONFIG_TBOOT) ?
> >> XEN_DOMCTL_CDF_s3_integrity
> >>>>> : 0,
> >>>>>>  .max_evtchn_port = -1,
> >>>>>> -.max_grant_frames = opt_max_grant_frames,
> >>>>>> -.max_maptrack_frames = opt_max_maptrack_frames,
> >>>>>> +.max_grant_frames = -1,
> >>>>>> +.max_maptrack_frames = -1,
> >>>>>>  };
> >>>>>
> >>>>> With these there's no need anymore for opt_max_maptrack_frames to
> >>>>> be non-static. Sadly Arm still wants opt_max_grant_frames
> >>>>> accessible in gnttab_dom0_frames().
> >>>>
> >>>> Yes, I was about to make them static until I saw what the ARM code
> did.
> >>>
> >>> But the one that Arm doesn't need should become static now.
> >>>
> >>>>>> --- a/xen/common/grant_table.c
> >>>>>> +++ b/xen/common/grant_table.c
> >>>>>> @@ -1837,12 +1837,18 @@ active_alloc_failed:
> >>>>>>  return -ENOMEM;
> >>>>>>  }
> >>>>>>
> >>>>>> -int grant_table_init(struct domain *d, unsigned int
> >> max_grant_frames,
> >>>>>> - unsigned int max_maptrack_frames)
> >>>>>> +int grant_table_init(struct domain *d, int max_grant_frames,
> >>>>>> + int max_maptrack_frames)
> >>>>>>  {
> >>>>>>  struct grant_table *gt;
> >>>>>>  int ret = -ENOMEM;
> >>>>>>
> >>>>>> +/* Default to maximum value if no value was specified */
> >>>>>> +if ( max_grant_frames < 0 )
> >>>>>> +max_grant_frames = opt_max_grant_frames;
> >>>>>> +if ( max_maptrack_frames < 0 )
> >>>>>> +max_maptrack_frames = opt_max_maptrack_frames;
> >>>>>> +
> >>>>>>  if ( max_grant_frames < INITIAL_NR_GRANT_FRAMES ||
>
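
For readers following the thread, the "negative means use the system-wide
value" rule from the commit message can be sketched in plain C as below.
This is only an illustration, not the code from the patch: resolve_limit()
and its sys_max parameter are hypothetical names standing in for the
opt_max_*_frames handling, and the rejection of requests above the
system-wide value assumes the strict-limit behaviour described in the
commit message is retained.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: resolve a per-domain request against the
 * system-wide value. A negative request selects the system-wide
 * value; an explicit request above it is rejected (assumption).
 */
int resolve_limit(int requested, unsigned int sys_max, unsigned int *out)
{
    if ( requested < 0 )
    {
        *out = sys_max;           /* "default" sentinel was passed */
        return 0;
    }

    if ( (unsigned int)requested > sys_max )
        return -EINVAL;           /* explicit value exceeds the global limit */

    *out = (unsigned int)requested;
    return 0;
}
```

With a system-wide value of 64, a request of -1 resolves to 64, a request
of 32 is kept as 32, and a request of 128 is refused.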

Re: [Xen-devel] [PATCH-for-4.13 v5] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-29 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 29 November 2019 10:46
> To: Durrant, Paul 
> Cc: Andrew Cooper ; Anthony PERARD
> ; George Dunlap ;
> Roger Pau Monné ; Volodymyr Babchuk
> ; George Dunlap ;
> Ian Jackson ; Marek Marczykowski-Górecki
> ; Stefano Stabellini
> ; xen-devel@lists.xenproject.org; Konrad Rzeszutek
> Wilk ; Julien Grall ; Wei Liu
> 
> Subject: Re: [PATCH-for-4.13 v5] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> On 29.11.2019 11:39, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: 29 November 2019 10:29
> >> To: Durrant, Paul 
> >> Cc: Andrew Cooper ; Anthony PERARD
> >> ; George Dunlap ;
> >> Roger Pau Monné ; Volodymyr Babchuk
> >> ; George Dunlap
> ;
> >> Ian Jackson ; Marek Marczykowski-Górecki
> >> ; Stefano Stabellini
> >> ; xen-devel@lists.xenproject.org; Konrad
> Rzeszutek
> >> Wilk ; Julien Grall ; Wei Liu
> >> 
> >> Subject: Re: [PATCH-for-4.13 v5] Rationalize max_grant_frames and
> >> max_maptrack_frames handling
> >>
> >> On 29.11.2019 11:22, Jan Beulich wrote:
> >>> On 28.11.2019 17:52, Paul Durrant wrote:
> >>>> --- a/xen/common/grant_table.c
> >>>> +++ b/xen/common/grant_table.c
> >>>> @@ -84,11 +84,40 @@ struct grant_table {
> >>>>  struct grant_table_arch arch;
> >>>>  };
> >>>>
> >>>> +static int parse_gnttab_limit(const char *param, const char *arg,
> >>>> +  unsigned int *valp)
> >>>> +{
> >>>> +const char *e;
> >>>> +unsigned long val;
> >>>> +
> >>>> +val = simple_strtoul(arg, &e, 0);
> >>>> +if ( *e )
> >>>> +return -EINVAL;
> >>>> +
> >>>> +if ( val > INT_MAX )
> >>>> +return -ERANGE;
> >>>> +
> >>>> +return 0;
> >>>> +}
> >>>
> >>> *valp doesn't get written to anymore.
> >
> > That was intentional, given Juergen's comment...
> >
> >> With this fixed (and no new
> >>> issues introduced ;-) ) hypervisor side
> >>> Reviewed-by: Jan Beulich 
> >>
> >> And I guess I should have clarified: I'd be fine adding the missing
> >> assignment while committing, provided the tools side won't require
> >> any changes.
> >
> > ...but if we want to allow dom0 to set itself up for INT_MAX frames
> > in the event of a bad value then I'm not objecting.
> 
> Looks like you're misunderstanding, or I'm missing something:
> The command line options right now won't take any effect, as
> the opt_* global variables won't be written to at all. I'm not
> talking about falling back to using INT_MAX when we've noticed
> an out of bounds value.

Oh, sorry... too deep with my cutting. Yes, of course there should be a '*valp 
= val' just above 'return 0'.

  Paul

> 
> Jan
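
A userspace sketch of the parser with the missing assignment restored may
help make the point concrete. It is an approximation only: it uses the
standard strtoul() in place of Xen's simple_strtoul(), the name
parse_limit() is made up for the example, and the empty-string check is an
addition that is not in the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a non-negative limit that must fit in a signed int.
 * On success the value is stored through valp; on failure *valp
 * is left untouched.
 */
int parse_limit(const char *arg, unsigned int *valp)
{
    char *end;
    unsigned long val;

    errno = 0;
    val = strtoul(arg, &end, 0);
    if ( end == arg || *end != '\0' )
        return -EINVAL;           /* empty input or trailing junk */

    if ( errno == ERANGE || val > INT_MAX )
        return -ERANGE;           /* would overflow a signed int */

    *valp = (unsigned int)val;    /* the assignment that v5 dropped */
    return 0;
}
```

Without the final assignment the function still validates its input, but
the caller's variable never changes, which is why the command-line options
had no effect.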

Re: [Xen-devel] [PATCH-for-4.13 v5] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-29 Thread Durrant, Paul
> -Original Message-
> From: Anthony PERARD 
> Sent: 29 November 2019 12:46
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; George Dunlap
> ; Ian Jackson ; Wei
> Liu ; Andrew Cooper ; George Dunlap
> ; Jan Beulich ; Julien
> Grall ; Konrad Rzeszutek Wilk ;
> Stefano Stabellini ; Marek Marczykowski-Górecki
> ; Volodymyr Babchuk
> ; Roger Pau Monné 
> Subject: Re: [PATCH-for-4.13 v5] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> On Thu, Nov 28, 2019 at 04:52:24PM +0000, Paul Durrant wrote:
> > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> > index 49b56fa1a3..a2a5d321c5 100644
> > --- a/tools/libxl/libxl.h
> > +++ b/tools/libxl/libxl.h
> > @@ -364,8 +364,8 @@
> >   */
> >  #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1
> >
> > -#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32
> > -#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024
> > +#define LIBXL_MAX_GRANT_FRAMES_DEFAULT -1
> > +#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT -1
> >
> >  /*
> >   * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has
> > diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> > index 0546d7865a..63e29bb2fb 100644
> > --- a/tools/libxl/libxl_types.idl
> > +++ b/tools/libxl/libxl_types.idl
> > @@ -511,8 +511,8 @@ libxl_domain_build_info =
> Struct("domain_build_info",[
> >
> >  ("vnuma_nodes", Array(libxl_vnode_info, "num_vnuma_nodes")),
> >
> > -("max_grant_frames",uint32, {'init_val':
> 'LIBXL_MAX_GRANT_FRAMES_DEFAULT'}),
> > -("max_maptrack_frames", uint32, {'init_val':
> 'LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT'}),
> > +("max_grant_frames",integer, {'init_val':
> 'LIBXL_MAX_GRANT_FRAMES_DEFAULT'}),
> > +("max_maptrack_frames", integer, {'init_val':
> 'LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT'}),
> 
> That's a change in the libxl API, could you add a LIBXL_HAVE_* macro?
> 

Is it really, in practice?

> >
> >  ("device_model_version", libxl_device_model_version),
> >  ("device_model_stubdomain", libxl_defbool),
> > diff --git a/tools/libxl/libxlu_cfg.c b/tools/libxl/libxlu_cfg.c
> > index 72815d25dd..cafc632fc1 100644
> > --- a/tools/libxl/libxlu_cfg.c
> > +++ b/tools/libxl/libxlu_cfg.c
> > @@ -268,8 +268,9 @@ int xlu_cfg_replace_string(const XLU_Config *cfg,
> const char *n,
> >  return 0;
> >  }
> >
> > -int xlu_cfg_get_long(const XLU_Config *cfg, const char *n,
> > - long *value_r, int dont_warn) {
> > +int xlu_cfg_get_bounded_long(const XLU_Config *cfg, const char *n,
> > + long min, long max, long *value_r,
> > + int dont_warn) {
> >  long l;
> >  XLU_ConfigSetting *set;
> >  int e;
> > @@ -303,10 +304,31 @@ int xlu_cfg_get_long(const XLU_Config *cfg, const
> char *n,
> >  cfg->config_source, set->lineno, n);
> >  return EINVAL;
> >  }
> > +if (l < min) {
> > +if (!dont_warn)
> > +fprintf(cfg->report,
> > +"%s:%d: warning: value `%ld' is smaller than
> minimum bound '%ld'\n",
> > +cfg->config_source, set->lineno, l, min);
> > +return EINVAL;
> > +}
> > +if (l > max) {
> > +if (!dont_warn)
> > +fprintf(cfg->report,
> > +"%s:%d: warning: value `%ld' is greater than
> maximum bound '%ld'\n",
> > +cfg->config_source, set->lineno, l, max);
> > +return EINVAL;
> > +}
> 
> I'm not sure what was the intention with the new function
> xlu_cfg_get_bounded_long(), but I don't think libxlu is the right place
> for it. That function is only going to make it harder for users to find
> mistakes in the config file. If `n' value is out of bound, it will only
> get ignored, and xl will keep going. I think xlu_cfg should only be a
> parser (and can check for syntax error).
> 
> Can you move that function to xl?
> 

I can, but why is this not considered useful in libxl? The call returns failure 
for an out-of-bounds check. If xl currently chooses to treat EINVAL as ENOENT 
then that's xl's bug to deal with.

  Paul

> Thanks,
> 
> --
> Anthony PERARD


Re: [Xen-devel] [PATCH-for-4.13 v5] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-29 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 29 November 2019 10:29
> To: Durrant, Paul 
> Cc: Andrew Cooper ; Anthony PERARD
> ; George Dunlap ;
> Roger Pau Monné ; Volodymyr Babchuk
> ; George Dunlap ;
> Ian Jackson ; Marek Marczykowski-Górecki
> ; Stefano Stabellini
> ; xen-devel@lists.xenproject.org; Konrad Rzeszutek
> Wilk ; Julien Grall ; Wei Liu
> 
> Subject: Re: [PATCH-for-4.13 v5] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> On 29.11.2019 11:22, Jan Beulich wrote:
> > On 28.11.2019 17:52, Paul Durrant wrote:
> >> --- a/xen/common/grant_table.c
> >> +++ b/xen/common/grant_table.c
> >> @@ -84,11 +84,40 @@ struct grant_table {
> >>  struct grant_table_arch arch;
> >>  };
> >>
> >> +static int parse_gnttab_limit(const char *param, const char *arg,
> >> +  unsigned int *valp)
> >> +{
> >> +const char *e;
> >> +unsigned long val;
> >> +
> >> +val = simple_strtoul(arg, &e, 0);
> >> +if ( *e )
> >> +return -EINVAL;
> >> +
> >> +if ( val > INT_MAX )
> >> +return -ERANGE;
> >> +
> >> +return 0;
> >> +}
> >
> > *valp doesn't get written to anymore.

That was intentional, given Juergen's comment...

> With this fixed (and no new
> > issues introduced ;-) ) hypervisor side
> > Reviewed-by: Jan Beulich 
> 
> And I guess I should have clarified: I'd be fine adding the missing
> assignment while committing, provided the tools side won't require
> any changes.

...but if we want to allow dom0 to set itself up for INT_MAX frames in the 
event of a bad value then I'm not objecting.

  Paul

> 
> Jan

Re: [Xen-devel] [PATCH] xen-blkback: allow module to be cleanly unloaded

2019-11-29 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 29 November 2019 11:56
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; linux-bl...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Roger Pau Monné ; Jens Axboe
> ; Konrad Rzeszutek Wilk 
> Subject: Re: [PATCH] xen-blkback: allow module to be cleanly unloaded
> 
> On 29.11.2019 12:31, Paul Durrant wrote:
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -173,6 +173,8 @@ static struct xen_blkif *xen_blkif_alloc(domid_t
> domid)
> > init_completion(&blkif->drain_complete);
> > INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
> >
> > +   __module_get(THIS_MODULE);
> > +
> > return blkif;
> >  }
> >
> > @@ -320,6 +322,8 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> >
> > /* Make sure everything is drained before shutting down */
> > kmem_cache_free(xen_blkif_cachep, blkif);
> > +
> > +   module_put(THIS_MODULE);
> >  }
> 
> I realize there are various examples of this in the tree, but
> isn't this a flawed approach? __module_get() (nor even
> try_module_get()) will prevent an unload attempt ahead of it
> getting invoked, while execution is already in this module's
> .text section.

Good point. That does appear to be a race.

> I think the xenbus driver should do this
> before calling ->probe(), in case of its failure, and after
> a successful call to ->remove().
> 

That does sound better. I'll see if I can pick up other occurrences (certainly 
netback) and fix.

  Paul

Re: [Xen-devel] [PATCH-for-4.13 v5] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-29 Thread Durrant, Paul
> -Original Message-
> From: Anthony PERARD 
> Sent: 29 November 2019 13:52
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; George Dunlap
> ; Ian Jackson ; Wei
> Liu ; Andrew Cooper ; George Dunlap
> ; Jan Beulich ; Julien
> Grall ; Konrad Rzeszutek Wilk ;
> Stefano Stabellini ; Marek Marczykowski-Górecki
> ; Volodymyr Babchuk
> ; Roger Pau Monné 
> Subject: Re: [PATCH-for-4.13 v5] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> On Fri, Nov 29, 2019 at 12:51:47PM +0000, Durrant, Paul wrote:
> > > -Original Message-
> > > From: Anthony PERARD 
> > > Sent: 29 November 2019 12:46
> > > I'm not sure what was the intention with the new function
> > > xlu_cfg_get_bounded_long(), but I don't think libxlu is the right
> place
> > > for it. That function is only going to make it harder for users to
> find
> > > mistakes in the config file. If `n' value is out of bound, it will
> only
> > > get ignored, and xl will keep going. I think xlu_cfg should only be a
> > > parser (and can check for syntax error).
> > >
> > > Can you move that function to xl?
> > >
> >
> > I can, but why is this not considered useful in libxl? The call returns
> failure for an out-of-bounds check.
> 
> Sorry that the repo layout is confusing, but libxl != libxlu. libxl
> doesn't even use libxlu!

Oh, that is confusing... there is code under tools/libxl that is not used by
libxl; totally sane, of course.

> 
> > If xl currently chooses to treat EINVAL as ENOENT then that's xl's bug
> to deal with.
> 
> The general use of xlu_cfg_get_*() that treats all errors as ENOENT in
> xl is an issue, I think, but this patch does the same thing and treats
> EINVAL as ENOENT when using the newly introduced
> xlu_cfg_get_bounded_long() function. I don't think that's an xl bug to
> deal with, but an issue with the patch.
> 

True, but I don't think that makes things strictly worse. I'll send v6 with 
extra checks though.

  Paul

> Cheers,
> 
> --
> Anthony PERARD
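
The disagreement above is about what the caller does with the error code.
A small sketch, assuming a hypothetical lookup that returns ENOENT for an
unset option and EINVAL for an out-of-bounds one (get_bounded_long() and
limit_from_config() are invented names, not libxlu or xl functions), shows
a caller that falls back to the default only in the ENOENT case:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical bounded lookup. 'setting' is NULL when the option is
 * absent from the config. Returns 0, ENOENT (absent) or EINVAL
 * (present but outside [min, max]).
 */
int get_bounded_long(const long *setting, long min, long max, long *value_r)
{
    if (setting == NULL)
        return ENOENT;
    if (*setting < min || *setting > max)
        return EINVAL;
    *value_r = *setting;
    return 0;
}

/*
 * Caller-side handling: only a missing option selects the default;
 * any other error is handed back so the user sees the mistake.
 */
int limit_from_config(const long *setting, long dflt, long *out)
{
    int e = get_bounded_long(setting, 0, INT_MAX, out);

    if (e == ENOENT) {
        *out = dflt;
        return 0;
    }
    return e;
}
```

A caller that collapsed every non-zero return into "not set" would quietly
use the default for a value such as -5, which is the behaviour being
objected to.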


Re: [Xen-devel] [PATCH v2 2/2] block/xen-blkback: allow module to be cleanly unloaded

2019-11-29 Thread Durrant, Paul
> -Original Message-
> From: Roger Pau Monné 
> Sent: 29 November 2019 15:00
> To: Durrant, Paul 
> Cc: linux-bl...@vger.kernel.org; linux-ker...@vger.kernel.org; xen-
> de...@lists.xenproject.org; Konrad Rzeszutek Wilk
> ; Jens Axboe 
> Subject: Re: [PATCH v2 2/2] block/xen-blkback: allow module to be cleanly
> unloaded
> 
> On Fri, Nov 29, 2019 at 01:43:06PM +0000, Paul Durrant wrote:
> > Add a module_exit() to perform the necessary clean-up.
> >
> > Signed-off-by: Paul Durrant 
> 
> LGTM:
> 
> Reviewed-by: Roger Pau Monné 
> 

Thanks.

> AFAICT we should make sure this is not committed before patch 1, or
> else you could unload a blkback module that's still in use?
> 

Yes, that's correct.

  Paul


Re: [Xen-devel] [PATCH-for-4.13 v5] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-29 Thread Durrant, Paul
> -Original Message-
> From: Ian Jackson 
> Sent: 29 November 2019 15:47
> To: Wei Liu 
> Cc: Durrant, Paul ; Anthony Perard
> ; xen-devel@lists.xenproject.org; George Dunlap
> ; Andrew Cooper ; Jan
> Beulich ; Julien Grall ; Konrad
> Rzeszutek Wilk ; Stefano Stabellini
> ; Marek Marczykowski-Górecki
> ; Volodymyr Babchuk
> ; Roger Pau Monne 
> Subject: Re: [PATCH-for-4.13 v5] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> Wei Liu writes ("Re: [PATCH-for-4.13 v5] Rationalize max_grant_frames and
> max_maptrack_frames handling"):
> > What if we use 0xffffffff to denote default instead? That wouldn't
> > require changing the type here.
> 
> Is there some reason we wouldn't use ~0 to mean default ?
> 
> In the tools area we normally spell this as
>  ~(some appropriate type)0
> to make sure it has the right width.  But if we know the type and it
> is of fixed length, as here, 0xffffffffu is OK too.
> 
> > The type change here makes me feel a bit uncomfortable, though in
> > practice it may not matter. I don't see anyone would specify a value
> > that would become negative when cast from uint32 to integer.
> 
> The problem with the type change is that in principle we have to audit
> all the places the variables are used.
> 

Can a toolstack maintainer please come up with a concrete suggestion as to what 
the patch should do then? It's now at v6 and time is short.

  Paul


Re: [Xen-devel] [PATCH v2 1/2] xen/xenbus: reference count registered modules

2019-11-29 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 29 November 2019 16:01
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; linux-bl...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Stefano Stabellini ; Boris
> Ostrovsky ; Juergen Gross 
> Subject: Re: [PATCH v2 1/2] xen/xenbus: reference count registered modules
> 
> On 29.11.2019 14:43, Paul Durrant wrote:
> > To prevent a module being removed whilst attached to a frontend, and
> 
> Why only frontend?
> 

True. Originally this was only intended for backends, but I guess this should 
now be 'otherend' or some equivalent form of words.

> > hence xenbus calling into potentially invalid text, take a reference on
> > the module before calling the probe() method (dropping it if
> unsuccessful)
> > and drop the reference after returning from the remove() method.
> >
> > NOTE: This allows the ad-hoc reference counting in xen-netback to be
> >   removed. This will be done in a subsequent patch.
> >
> > Suggested-by: Jan Beulich 
> > Signed-off-by: Paul Durrant 
> >
> > --- a/drivers/xen/xenbus/xenbus_probe.c
> > +++ b/drivers/xen/xenbus/xenbus_probe.c
> > @@ -232,9 +232,11 @@ int xenbus_dev_probe(struct device *_dev)
> > return err;
> > }
> >
> > +   __module_get(drv->driver.owner);
> 
> I guess you really want try_module_get() and deal with it returning
> false.
> 

Perhaps, yes.

  Paul

> Jan
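
The pattern under discussion (take a reference before probe(), drop it if
probe() fails, drop it after remove()) can be modelled in userspace as
below. This is a toy model only: none of these names exist in the kernel,
the "going" flag merely imitates the way try_module_get() can fail once an
unload has started, and the -ENODEV returned in that case is an arbitrary
choice for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a module: a reference count and an unload flag. */
struct toy_module {
	int refs;
	bool going;	/* set once unloading has begun */
};

/* Imitates try_module_get(): refuses once unloading has begun. */
bool toy_try_get(struct toy_module *m)
{
	if (m->going)
		return false;
	m->refs++;
	return true;
}

void toy_put(struct toy_module *m)
{
	m->refs--;
}

/* Bus-side wrapper: a reference is held only while a device is bound. */
int toy_bus_probe(struct toy_module *m, int (*probe)(void))
{
	int err;

	if (!toy_try_get(m))
		return -ENODEV;	/* hypothetical error code */

	err = probe();
	if (err)
		toy_put(m);	/* probe failed, nothing stays bound */
	return err;
}

/* Bus-side wrapper: drop the reference after remove() has returned. */
void toy_bus_remove(struct toy_module *m, void (*remove)(void))
{
	remove();
	toy_put(m);
}

/* Trivial driver callbacks used to exercise the wrappers. */
int toy_probe_ok(void) { return 0; }
int toy_probe_fail(void) { return -EIO; }
void toy_remove(void) { }
```

Because the count is taken by the bus code rather than by the driver, the
driver's own text is never executing without a reference held, which is
the race Jan pointed out in the blkback version.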

Re: [Xen-devel] [PATCH-for-4.13 v5] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-29 Thread Durrant, Paul
> -Original Message-
> From: Ian Jackson 
> Sent: 29 November 2019 16:08
> To: Durrant, Paul 
> Cc: Wei Liu ; Anthony Perard ; xen-
> de...@lists.xenproject.org; George Dunlap ;
> Andrew Cooper ; Jan Beulich
> ; Julien Grall ; Konrad Rzeszutek Wilk
> ; Stefano Stabellini ;
> Marek Marczykowski-Górecki ; Volodymyr
> Babchuk ; Roger Pau Monne
> 
> Subject: RE: [PATCH-for-4.13 v5] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> Durrant, Paul writes ("RE: [PATCH-for-4.13 v5] Rationalize
> max_grant_frames and max_maptrack_frames handling"):
> > > -Original Message-
> > > From: Ian Jackson 
> ...
> > > Is there some reason we wouldn't use ~0 to mean default ?
> > >
> > > In the tools area we normally spell this as
> > >  ~(some appropriate type)0
> > > to make sure it has the right width.  But if we know the type and it
> > > is of fixed length, as here, 0xffffffffu is OK too.
> > >
> > > > The type change here makes me feel a bit uncomfortable, though in
> > > > practice it may not matter. I don't see anyone would specify a value
> > > > that would become negative when cast from uint32 to integer.
> > >
> > > The problem with the type change is that in principle we have to audit
> > > all the places the variables are used.
> >
> > Can a toolstack maintainer please come up with a concrete suggestion as
> to what the patch should do then? It's now at v6 and time is short.
> 
> I think our proposal is to drop the type change, continue to use
> uint32_t everywhere for these values, and specify the "use default"
> value to be all-bits-set.
> 

Where? Everywhere or just in buildinfo? The switch from uint32_t to int32_t in 
the domctl does not, of course, change the width at all.

  Paul
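
To illustrate why the two spellings of the sentinel are interchangeable at
the bit level, here is a short sketch. LIMIT_USE_DEFAULT and the two helper
functions are hypothetical names used only for this example; nothing here
is taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name for the all-bits-set "use the default" marker. */
#define LIMIT_USE_DEFAULT (~(uint32_t)0)

/* Unsigned view: compare against the all-bits-set pattern. */
bool limit_is_default(uint32_t v)
{
    return v == LIMIT_USE_DEFAULT;
}

/* Signed view of the same 32 bits, copied without changing any bit. */
int32_t limit_as_signed(uint32_t v)
{
    int32_t s;

    memcpy(&s, &v, sizeof(s));
    return s;
}
```

The all-bits-set uint32_t value reads back as -1 through an int32_t of the
same width, so a consumer testing for "< 0" and one testing for equality
with the all-bits-set pattern agree on the sentinel. They differ only for
explicit values above INT32_MAX, which the signed view would also treat as
"default".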


Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking and logging

2019-11-04 Thread Durrant, Paul
> -Original Message-
> From: Igor Druzhinin 
> Sent: 01 November 2019 19:28
> To: xen-devel@lists.xenproject.org
> Cc: Durrant, Paul ; jbeul...@suse.com;
> jgr...@suse.com
> Subject: [PATCH for-4.13 v2] passthrough: simplify locking and logging
> 
> From: Paul Durrant 
> 
> Dropping the pcidevs lock between calling device_assigned() and
> assign_device() means that the latter has to do the same check as the
> former for no obvious gain. Also, since long running operations under
> pcidevs lock already drop the lock and return -ERESTART periodically there
> is little point in immediately failing an assignment operation with
> -ERESTART just because the pcidevs lock could not be acquired (for the
> second time, having already blocked on acquiring the lock in
> device_assigned()).
> 
> This patch instead acquires the lock once for assignment (or test assign)
> operations directly in iommu_do_pci_domctl() and thus can remove the
> duplicate domain ownership check in assign_device(). Whilst in the
> neighbourhood, the patch also removes some debug logging from
> assign_device() and deassign_device() and replaces it with proper error
> logging, which allows error logging in iommu_do_pci_domctl() to be
> removed. Also, since device_assigned() can tell the difference between a
> guest assigned device and a non-existent one, log the actual error
> condition rather than being ambiguous for the sake of a few extra lines of
> code.
> 
> Signed-off-by: Paul Durrant 
> ---
> 
> This is XSA-302 followup and contains some changes important for
> XenServer.
> Juergen, could this be considered for 4.13 inclusion?
> 
> v2: updated Paul's email address

Reviewed-by: Paul Durrant 

> 
> ---
>  xen/drivers/passthrough/pci.c | 101 ++---
> -
>  1 file changed, 42 insertions(+), 59 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index e64666d..ea0770d 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -932,30 +932,27 @@ static int deassign_device(struct domain *d,
> uint16_t seg, uint8_t bus,
>  break;
>  ret = hd->platform_ops->reassign_device(d, target, devfn,
>  pci_to_dev(pdev));
> -if ( !ret )
> -continue;
> -
> -printk(XENLOG_G_ERR "%pd: deassign %04x:%02x:%02x.%u failed
> (%d)\n",
> -   d, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ret);
> -return ret;
> +if ( ret )
> +goto out;
>  }
> 
>  devfn = pdev->devfn;
>  ret = hd->platform_ops->reassign_device(d, target, devfn,
>  pci_to_dev(pdev));
>  if ( ret )
> -{
> -dprintk(XENLOG_G_ERR,
> -"%pd: deassign device (%04x:%02x:%02x.%u) failed\n",
> -d, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
> -return ret;
> -}
> +goto out;
> 
>  if ( pdev->domain == hardware_domain  )
>  pdev->quarantine = false;
> 
>  pdev->fault.count = 0;
> 
> +out:
> +if ( ret )
> +printk(XENLOG_G_ERR
> +   "%pd: deassign device (%04x:%02x:%02x.%u) failed (%d)\n",
> d,
> +   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ret);
> +
>  return ret;
>  }
> 
> @@ -976,10 +973,7 @@ int pci_release_devices(struct domain *d)
>  {
>  bus = pdev->bus;
>  devfn = pdev->devfn;
> -if ( deassign_device(d, pdev->seg, bus, devfn) )
> -printk("domain %d: deassign device (%04x:%02x:%02x.%u)
> failed!\n",
> -   d->domain_id, pdev->seg, bus,
> -   PCI_SLOT(devfn), PCI_FUNC(devfn));
> +deassign_device(d, pdev->seg, bus, devfn);
>  }
>  pcidevs_unlock();
> 
> @@ -1534,8 +1528,7 @@ static int device_assigned(u16 seg, u8 bus, u8
> devfn)
>  struct pci_dev *pdev;
>  int rc = 0;
> 
> -pcidevs_lock();
> -
> +ASSERT(pcidevs_locked());
>  pdev = pci_get_pdev(seg, bus, devfn);
> 
>  if ( !pdev )
> @@ -1549,11 +1542,10 @@ static int device_assigned(u16 seg, u8 bus, u8
> devfn)
>pdev->domain != dom_io )
>  rc = -EBUSY;
> 
> -pcidevs_unlock();
> -
>  return rc;
>  }
> 
> +/* caller should hold the pcidevs_lock */
>  static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32
> flag)
>  {
>  const struct domain_iommu *hd = dom_iommu(d);
> @@ -1571,23 +1563,11 @@ static int a

Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking and logging

2019-11-04 Thread Durrant, Paul
> -Original Message-
> From: Andrew Cooper 
> Sent: 04 November 2019 11:06
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Igor Druzhinin ; jgr...@suse.com;
> jbeul...@suse.com
> Subject: Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking
> and logging
> 
> On 04/11/2019 08:31, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Igor Druzhinin 
> >> Sent: 01 November 2019 19:28
> >> To: xen-devel@lists.xenproject.org
> >> Cc: Durrant, Paul ; jbeul...@suse.com;
> >> jgr...@suse.com
> >> Subject: [PATCH for-4.13 v2] passthrough: simplify locking and logging
> >>
> >> From: Paul Durrant 
> >>
> >> Dropping the pcidevs lock between calling device_assigned() and
> >> assign_device() means that the latter has to do the same check as the
> >> former for no obvious gain. Also, since long running operations under
> >> pcidevs lock already drop the lock and return -ERESTART periodically
> there
> >> is little point in immediately failing an assignment operation with
> >> -ERESTART just because the pcidevs lock could not be acquired (for the
> >> second time, having already blocked on acquiring the lock in
> >> device_assigned()).
> >>
> >> This patch instead acquires the lock once for assignment (or test assign)
> >> operations directly in iommu_do_pci_domctl() and thus can remove the
> >> duplicate domain ownership check in assign_device(). Whilst in the
> >> neighbourhood, the patch also removes some debug logging from
> >> assign_device() and deassign_device() and replaces it with proper error
> >> logging, which allows error logging in iommu_do_pci_domctl() to be
> >> removed. Also, since device_assigned() can tell the difference between a
> >> guest assigned device and a non-existent one, log the actual error
> >> condition rather than being ambiguous for the sake of a few extra lines of
> >> code.
> >>
> >> Signed-off-by: Paul Durrant 
> >> ---
> >>
> >> This is XSA-302 followup and contains some changes important for
> >> XenServer.
> >> Juergen, could this be considered for 4.13 inclusion?
> >>
> >> v2: updated Paul's email address
> 
> This was work you did at Citrix, yes?
> 
> > Reviewed-by: Paul Durrant 
> 
> SoB and R-by?

I did do the work while I was at Citrix, but surely the SoB must be updated 
since the patch is only now being posted? As for the R-b, why should that be 
historic?

  Paul

> 
> ~Andrew
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] gnttab: make sure grant map operations don't skip their IOMMU part

2019-11-22 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 21 November 2019 17:38
> To: xen-devel@lists.xenproject.org
> Cc: Juergen Gross ; Stefano Stabellini
> ; Julien Grall ; Wei Liu
> ; Konrad Wilk ; George Dunlap
> ; Andrew Cooper ;
> Ian Jackson 
> Subject: [Xen-devel] [PATCH] gnttab: make sure grant map operations don't
> skip their IOMMU part
> 
> Two almost simultaneous mapping requests need to make sure that at the
> completion of the earlier one IOMMU mappings (established explicitly
> here in the PV case) have been put in place. Forever since the splitting
> of the grant table lock a violation of this has been possible (using
> simplified pin counts, as it doesn't matter whether we talk about read
> or write mappings here):
> 
> initial state: act->pin = 0
> 
> vCPU A: progress the operation past the dropping of the locks after the
> act->pin updates (act->pin = 1, old_pin = 0, act_pin = 1)
> 
> vCPU B: progress the operation past the dropping of the locks after the
> act->pin updates (act->pin = 2, old_pin = 1, act_pin = 2)
> 
> vCPU B: (re-)acquire both gt locks, mapkind() returns 0, but both
> iommu_legacy_map() invocations get skipped due to non-zero
> old_pin
> 
> vCPU B: return to caller without IOMMU mapping
> 
> vCPU A: (re-)acquire both gt locks, mapkind() returns 0,
> iommu_legacy_map() gets invoked
> 
> With the locks dropped intermediately, whether to invoke
> iommu_legacy_map() must depend on only the return value of mapkind()
> and of course the kind of mapping request being processed, just like
> is already the case in unmap_common().
> 
> Also fix the style of the adjacent comment, and correct a nearby one
> still referring to a prior name of what is now mapkind().
> 
> Signed-off-by: Jan Beulich 
> 
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -917,8 +917,6 @@ map_grant_ref(
>  mfn_t mfn;
>  struct page_info *pg = NULL;
>  int            rc = GNTST_okay;
> -u32            old_pin;
> -u32            act_pin;
>  unsigned int   cache_flags, clear_flags = 0, refcnt = 0, typecnt = 0;
>  bool   host_map_created = false;
>  struct active_grant_entry *act = NULL;
> @@ -1027,7 +1025,6 @@ map_grant_ref(
>  }
>  }
> 
> -old_pin = act->pin;
>  if ( op->flags & GNTMAP_device_map )
>  act->pin += (op->flags & GNTMAP_readonly) ?
>  GNTPIN_devr_inc : GNTPIN_devw_inc;
> @@ -1036,7 +1033,6 @@ map_grant_ref(
>  GNTPIN_hstr_inc : GNTPIN_hstw_inc;
> 
>  mfn = act->mfn;
> -act_pin = act->pin;
> 
>  cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
> 
> @@ -1144,27 +1140,22 @@ map_grant_ref(
>  if ( need_iommu )
>  {
>  unsigned int kind;
> -int err = 0;
> 
>  double_gt_lock(lgt, rgt);
> 
> -/* We're not translated, so we know that gmfns and mfns are
> -   the same things, so the IOMMU entry is always 1-to-1. */
> +/*
> + * We're not translated, so we know that dfns and mfns are
> + * the same things, so the IOMMU entry is always 1-to-1.
> + */
>  kind = mapkind(lgt, rd, mfn);
> -if ( (act_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) &&
> - !(old_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) )
> -{
> -if ( !(kind & MAPKIND_WRITE) )
> -err = iommu_legacy_map(ld, _dfn(mfn_x(mfn)), mfn, 0,
> -   IOMMUF_readable |
> IOMMUF_writable);
> -}
> -else if ( act_pin && !old_pin )
> -{
> -if ( !kind )
> -err = iommu_legacy_map(ld, _dfn(mfn_x(mfn)), mfn, 0,
> -   IOMMUF_readable);
> -}
> -if ( err )
> +if ( !(op->flags & GNTMAP_readonly) &&
> + !(kind & MAPKIND_WRITE) )
> +kind = IOMMUF_readable | IOMMUF_writable;
> +else if ( !kind )
> +kind = IOMMUF_readable;
> +else
> +kind = 0;
> +if ( kind && iommu_legacy_map(ld, _dfn(mfn_x(mfn)), mfn, 0, kind)

Re-using 'kind' in this way slightly obfuscates things. I'm sure the compiler 
would still generate reasonable code if you used a separate 'flags' variable 
within the same scope.
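
To illustrate, the decision logic with a separate 'flags' variable could be
sketched standalone as below. This is untested against the tree: the constant
values and the helper name iommu_flags_for_map() are stand-ins invented for
the example, not the real Xen definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values for illustration only; not taken from the Xen headers. */
#define MAPKIND_READ     1u
#define MAPKIND_WRITE    2u
#define IOMMUF_readable  1u
#define IOMMUF_writable  2u

/*
 * Hypothetical helper: 'kind' is what mapkind() reported for the existing
 * mappings of the frame, 'readonly' says whether the new request is a
 * read-only one. Returns the IOMMU flags to map with, or 0 if the IOMMU
 * mapping does not need touching.
 */
static unsigned int iommu_flags_for_map(unsigned int kind, bool readonly)
{
    unsigned int flags = 0;

    /* Only the two MAPKIND_* bits are expected in 'kind'. */
    assert(!(kind & ~(MAPKIND_READ | MAPKIND_WRITE)));

    if ( !readonly && !(kind & MAPKIND_WRITE) )
        flags = IOMMUF_readable | IOMMUF_writable;
    else if ( !kind )
        flags = IOMMUF_readable;

    return flags;
}
```

'kind' then keeps its meaning throughout, and the caller only has to test
'flags' before invoking iommu_legacy_map().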

  Paul

> )
>  {
>  double_gt_unlock(lgt, rgt);
>  rc = GNTST_general_error;
> @@ -1179,7 +1170,7 @@ map_grant_ref(
>   * other fields so just ensure the flags field is stored last.
>   *
>   * However, if gnttab_need_iommu_mapping() then this would race
> - * with a concurrent mapcount() call (on an unmap, for example)
> + * with a concurrent mapkind() call (on an unmap, for example)
>   * and a lock is required.
>   */
>  mt = _entry(lgt, handle);
> 

Re: [Xen-devel] [PATCH v4 4/8] x86: introduce hypervisor framework

2019-11-22 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Wei
> Liu
> Sent: 21 November 2019 19:51
> To: Xen Development List 
> Cc: Wei Liu ; Wei Liu ; Andrew Cooper
> ; Michael Kelley ; Jan
> Beulich ; Roger Pau Monné 
> Subject: [Xen-devel] [PATCH v4 4/8] x86: introduce hypervisor framework
> 
> We will soon implement Hyper-V support for Xen. Add a framework for
> that.
> 
> This requires moving some of the hypervisor_* functions from xen.h to
> hypervisor.h.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

> ---
> Changes in v4:
> 1. Add ASSERT_UNREACHABLE to stubs.
> 2. Move __read_mostly.
> 3. Return hops directly.
> 4. Drop Paul's review tag.
> ---
>  xen/arch/x86/guest/Makefile|  2 +
>  xen/arch/x86/guest/hypervisor.c| 42 +
>  xen/include/asm-x86/guest.h|  1 +
>  xen/include/asm-x86/guest/hypervisor.h | 62 ++
>  xen/include/asm-x86/guest/xen.h| 12 -
>  5 files changed, 107 insertions(+), 12 deletions(-)
>  create mode 100644 xen/arch/x86/guest/hypervisor.c
>  create mode 100644 xen/include/asm-x86/guest/hypervisor.h
> 
> diff --git a/xen/arch/x86/guest/Makefile b/xen/arch/x86/guest/Makefile
> index 6806f04947..f63d64bbee 100644
> --- a/xen/arch/x86/guest/Makefile
> +++ b/xen/arch/x86/guest/Makefile
> @@ -1 +1,3 @@
> +obj-y += hypervisor.o
> +
>  subdir-$(CONFIG_XEN_GUEST) += xen
> diff --git a/xen/arch/x86/guest/hypervisor.c
> b/xen/arch/x86/guest/hypervisor.c
> new file mode 100644
> index 00..103feba5d8
> --- /dev/null
> +++ b/xen/arch/x86/guest/hypervisor.c
> @@ -0,0 +1,42 @@
> +/
> **
> + * arch/x86/guest/hypervisor.c
> + *
> + * Support for detecting and running under a hypervisor.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see .
> + *
> + * Copyright (c) 2019 Microsoft.
> + */
> +
> +#include 
> +
> +#include 
> +#include 
> +
> +static const struct hypervisor_ops __read_mostly *hops;
> +
> +const struct hypervisor_ops *hypervisor_probe(void)
> +{
> +return hops;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/asm-x86/guest.h b/xen/include/asm-x86/guest.h
> index a38c6b5b3f..8e167165ae 100644
> --- a/xen/include/asm-x86/guest.h
> +++ b/xen/include/asm-x86/guest.h
> @@ -20,6 +20,7 @@
>  #define __X86_GUEST_H__
> 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> diff --git a/xen/include/asm-x86/guest/hypervisor.h b/xen/include/asm-
> x86/guest/hypervisor.h
> new file mode 100644
> index 00..2ab15a7108
> --- /dev/null
> +++ b/xen/include/asm-x86/guest/hypervisor.h
> @@ -0,0 +1,62 @@
> +/
> **
> + * asm-x86/guest/hypervisor.h
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see
> .
> + *
> + * Copyright (c) 2019 Microsoft.
> + */
> +
> +#ifndef __X86_HYPERVISOR_H__
> +#define __X86_HYPERVISOR_H__
> +
> +struct hypervisor_ops {
> +/* Name of the hypervisor */
> +const char *name;
> +/* Main setup routine */
> +void (*setup)(void);
> +/* AP setup */
> +void (*ap_setup)(void);
> +/* Resume from suspension */
> +void (*resume)(void);
> +};
> +
> +#ifdef CONFIG_GUEST
> +
> +const struct hypervisor_ops *hypervisor_probe(void);
> +void hypervisor_setup(void);
> +void hypervisor_ap_setup(void);
> +void hypervisor_resume(void);
> +
> +#else
> +
> +#include 
> +#include 
> +
> +static inline const struct hypervisor_ops *hypervisor_probe(void) {
> return NULL; }
> +static inline void hypervisor_setup(void) { ASSERT_UNREACHABLE(); }
> +static inline void 

Re: [Xen-devel] [PATCH v4 5/8] x86: rename hypervisor_{alloc, free}_unused_page

2019-11-22 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Wei
> Liu
> Sent: 21 November 2019 19:51
> To: Xen Development List 
> Cc: Wei Liu ; Wei Liu ; Andrew Cooper
> ; Michael Kelley ; Jan
> Beulich ; Roger Pau Monné 
> Subject: [Xen-devel] [PATCH v4 5/8] x86: rename hypervisor_{alloc,
> free}_unused_page
> 
> They are used in Xen code only.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

> ---
> Changes in v4:
> 1. Use xg_ prefix instead.
> 2. Drop Roger's review tag.
> ---
>  xen/arch/x86/guest/xen/xen.c| 6 +++---
>  xen/arch/x86/pv/shim.c  | 4 ++--
>  xen/include/asm-x86/guest/xen.h | 4 ++--
>  3 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
> index 1e37086518..0f5b5267c5 100644
> --- a/xen/arch/x86/guest/xen/xen.c
> +++ b/xen/arch/x86/guest/xen/xen.c
> @@ -93,7 +93,7 @@ static void map_shared_info(void)
>  unsigned int i;
>  unsigned long rc;
> 
> -if ( hypervisor_alloc_unused_page(&mfn) )
> +if ( xg_alloc_unused_page(&mfn) )
>  panic("unable to reserve shared info memory page\n");
> 
>  xatp.gpfn = mfn_x(mfn);
> @@ -280,7 +280,7 @@ void hypervisor_ap_setup(void)
>  init_evtchn();
>  }
> 
> -int hypervisor_alloc_unused_page(mfn_t *mfn)
> +int xg_alloc_unused_page(mfn_t *mfn)
>  {
>  unsigned long m;
>  int rc;
> @@ -292,7 +292,7 @@ int hypervisor_alloc_unused_page(mfn_t *mfn)
>  return rc;
>  }
> 
> -int hypervisor_free_unused_page(mfn_t mfn)
> +int xg_free_unused_page(mfn_t mfn)
>  {
>  return rangeset_remove_range(mem, mfn_x(mfn), mfn_x(mfn));
>  }
> diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
> index 351da970ef..7a898fdbe5 100644
> --- a/xen/arch/x86/pv/shim.c
> +++ b/xen/arch/x86/pv/shim.c
> @@ -742,7 +742,7 @@ static long pv_shim_grant_table_op(unsigned int cmd,
>  };
>  mfn_t mfn;
> 
> -rc = hypervisor_alloc_unused_page(&mfn);
> +rc = xg_alloc_unused_page(&mfn);
>  if ( rc )
>  {
>  gprintk(XENLOG_ERR,
> @@ -754,7 +754,7 @@ static long pv_shim_grant_table_op(unsigned int cmd,
>  rc = xen_hypercall_memory_op(XENMEM_add_to_physmap,
> );
>  if ( rc )
>  {
> -hypervisor_free_unused_page(mfn);
> +xg_free_unused_page(mfn);
>  break;
>  }
> 
> diff --git a/xen/include/asm-x86/guest/xen.h b/xen/include/asm-
> x86/guest/xen.h
> index 3145f75361..01dc3ee6f6 100644
> --- a/xen/include/asm-x86/guest/xen.h
> +++ b/xen/include/asm-x86/guest/xen.h
> @@ -33,8 +33,8 @@ extern bool pv_console;
>  extern uint32_t xen_cpuid_base;
> 
>  void probe_hypervisor(void);
> -int hypervisor_alloc_unused_page(mfn_t *mfn);
> -int hypervisor_free_unused_page(mfn_t mfn);
> +int xg_alloc_unused_page(mfn_t *mfn);
> +int xg_free_unused_page(mfn_t mfn);
> 
>  DECLARE_PER_CPU(unsigned int, vcpu_id);
>  DECLARE_PER_CPU(struct vcpu_info *, vcpu_info);
> --
> 2.20.1

Re: [Xen-devel] [PATCH v4 7/8] x86: be more verbose when running on a hypervisor

2019-11-22 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Wei
> Liu
> Sent: 21 November 2019 19:51
> To: Xen Development List 
> Cc: Wei Liu ; Wei Liu ; Andrew Cooper
> ; Michael Kelley ; Jan
> Beulich ; Roger Pau Monné 
> Subject: [Xen-devel] [PATCH v4 7/8] x86: be more verbose when running on a
> hypervisor
> 
> Also replace xen_guest with running_on_hypervisor boolean.
> 
> Signed-off-by: Wei Liu 
> ---
> Changes in v4:
> 1. Access ->name directly.
> 2. Drop Roger's review tag.
> ---
>  xen/arch/x86/setup.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> index 19606d909b..123436b35a 100644
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -689,6 +689,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  int i, j, e820_warn = 0, bytes = 0;
>  bool acpi_boot_table_init_done = false, relocated = false;
>  int ret;
> +bool running_on_hypervisor;

Why not stash hops here? Then you can...

>  struct ns16550_defaults ns16550 = {
>  .data_bits = 8,
>  .parity= 'n',
> @@ -763,7 +764,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>   * allocing any xenheap structures wanted in lower memory. */
>  kexec_early_calculations();
> 
> -hypervisor_probe();
> +running_on_hypervisor = !!hypervisor_probe();
> 
>  parse_video_info();
> 
> @@ -788,6 +789,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  printk("Command line: %s\n", cmdline);
> 
>  printk("Xen image load base address: %#lx\n", xen_phys_start);
> +if ( running_on_hypervisor )
> +printk("Running on %s\n", hypervisor_probe()->name);

...avoid calling hypervisor_probe() again here.
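
Roughly what is meant, as a standalone sketch. The struct is cut down to the
one field used, and example_ops, probe_calls and start_sketch() are made-up
stand-ins for illustration, so this is not the real __start_xen() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Cut-down stand-in for the struct in the earlier patch of this series. */
struct hypervisor_ops {
    const char *name;
};

/* Made-up ops instance so the sketch has something to report. */
static const struct hypervisor_ops example_ops = { .name = "ExampleHV" };

/* Counts probe invocations, purely to show that one call is enough. */
static unsigned int probe_calls;

static const struct hypervisor_ops *hypervisor_probe(void)
{
    probe_calls++;
    return &example_ops;
}

/*
 * Hypothetical condensed caller: probe once, stash the pointer, and use it
 * both for the log message and for the later "are we a guest" check.
 * Returns the detected name, or NULL when not running on a hypervisor.
 */
static const char *start_sketch(void)
{
    const struct hypervisor_ops *hops = hypervisor_probe();

    if ( hops )
    {
        assert(hops->name != NULL);
        printf("Running on %s\n", hops->name);
    }

    /* A later 'if ( hops ) hypervisor_setup();' needs no separate bool. */
    return hops ? hops->name : NULL;
}
```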

  Paul

> 
>  #ifdef CONFIG_VIDEO
>  printk("Video information:\n");
> @@ -1569,7 +1572,7 @@ void __init noreturn __start_xen(unsigned long
> mbi_p)
>  max_cpus = nr_cpu_ids;
>  }
> 
> -if ( xen_guest )
> +if ( running_on_hypervisor )
>  hypervisor_setup();
> 
>  /* Low mappings were only needed for some BIOS table parsing. */
> --
> 2.20.1

Re: [Xen-devel] [PATCH v4 6/8] x86: switch xen guest implementation to use hypervisor framework

2019-11-22 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Wei
> Liu
> Sent: 21 November 2019 19:51
> To: Xen Development List 
> Cc: Wei Liu ; Wei Liu ; Andrew Cooper
> ; Michael Kelley ; Jan
> Beulich ; Roger Pau Monné 
> Subject: [Xen-devel] [PATCH v4 6/8] x86: switch xen guest implementation
> to use hypervisor framework
> 
> Signed-off-by: Wei Liu 
[snip] 
> diff --git a/xen/include/asm-x86/guest/xen.h b/xen/include/asm-
> x86/guest/xen.h
> index 01dc3ee6f6..db90b550a7 100644
> --- a/xen/include/asm-x86/guest/xen.h
> +++ b/xen/include/asm-x86/guest/xen.h
> @@ -23,6 +23,7 @@
> 
>  #include 
>  #include 
> +#include 
> 
>  #define XEN_shared_info ((struct shared_info
> *)fix_to_virt(FIX_XEN_SHARED_INFO))
> 
> @@ -32,7 +33,7 @@ extern bool xen_guest;
>  extern bool pv_console;
>  extern uint32_t xen_cpuid_base;
> 
> -void probe_hypervisor(void);
> +const struct hypervisor_ops *xen_probe(void);
>  int xg_alloc_unused_page(mfn_t *mfn);
>  int xg_free_unused_page(mfn_t mfn);
> 
> @@ -44,7 +45,7 @@ DECLARE_PER_CPU(struct vcpu_info *, vcpu_info);
>  #define xen_guest 0
>  #define pv_console 0

Nit: These should be #defined to false rather than 0. The rest LGTM so with 
those fixed,

Reviewed-by: Paul Durrant 



> 
> -static inline void probe_hypervisor(void) {}
> +static inline const struct hypervisor_ops *xen_probe(void) { return NULL;
> }
> 
>  #endif /* CONFIG_XEN_GUEST */
>  #endif /* __X86_GUEST_XEN_H__ */
> --
> 2.20.1

Re: [Xen-devel] [PATCH v4 8/8] x86: introduce CONFIG_HYPERV and detection code

2019-11-22 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Wei
> Liu
> Sent: 21 November 2019 19:51
> To: Xen Development List 
> Cc: Wei Liu ; Wei Liu ; Andrew Cooper
> ; Michael Kelley ; Jan
> Beulich ; Roger Pau Monné 
> Subject: [Xen-devel] [PATCH v4 8/8] x86: introduce CONFIG_HYPERV and
> detection code
> 
> We use the same code structure as we did for Xen.
> 
> As starters, detect Hyper-V in probe routine. More complex
> functionalities will be added later.
> 
> Take the chance to fix XEN_GUEST in Kconfig.

Would this fix be better in your earlier renaming patch?

> 
> Signed-off-by: Wei Liu 

Either way...

Reviewed-by: Paul Durrant 

> ---
> Changes in V4:
> 1. Add comment regarding order of probe functions.
> 2. Adapt to changes in previous patches.
> ---
>  xen/arch/x86/Kconfig   | 11 --
>  xen/arch/x86/guest/Makefile|  1 +
>  xen/arch/x86/guest/hyperv/Makefile |  1 +
>  xen/arch/x86/guest/hyperv/hyperv.c | 54 ++
>  xen/arch/x86/guest/hypervisor.c|  8 +
>  xen/include/asm-x86/guest.h|  1 +
>  xen/include/asm-x86/guest/hyperv.h | 43 
>  7 files changed, 117 insertions(+), 2 deletions(-)
>  create mode 100644 xen/arch/x86/guest/hyperv/Makefile
>  create mode 100644 xen/arch/x86/guest/hyperv/hyperv.c
>  create mode 100644 xen/include/asm-x86/guest/hyperv.h
> 
> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
> index 867de857e8..0a02b6ee3f 100644
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -164,10 +164,17 @@ endchoice
>  config GUEST
>   bool
> 
> +config HYPERV_GUEST
> + bool "Hyper-V Guest"
> + select GUEST
> + ---help---
> +   Support for Xen detecting when it is running under Hyper-V.
> +
> +   If unsure, say N.
> +
>  config XEN_GUEST
> - def_bool n
> + bool "Xen Guest"
>   select GUEST
> - prompt "Xen Guest"
>   ---help---
> Support for Xen detecting when it is running under Xen.
> 
> diff --git a/xen/arch/x86/guest/Makefile b/xen/arch/x86/guest/Makefile
> index f63d64bbee..f164196772 100644
> --- a/xen/arch/x86/guest/Makefile
> +++ b/xen/arch/x86/guest/Makefile
> @@ -1,3 +1,4 @@
>  obj-y += hypervisor.o
> 
> +subdir-$(CONFIG_HYPERV_GUEST) += hyperv
>  subdir-$(CONFIG_XEN_GUEST) += xen
> diff --git a/xen/arch/x86/guest/hyperv/Makefile
> b/xen/arch/x86/guest/hyperv/Makefile
> new file mode 100644
> index 00..68170109a9
> --- /dev/null
> +++ b/xen/arch/x86/guest/hyperv/Makefile
> @@ -0,0 +1 @@
> +obj-y += hyperv.o
> diff --git a/xen/arch/x86/guest/hyperv/hyperv.c
> b/xen/arch/x86/guest/hyperv/hyperv.c
> new file mode 100644
> index 00..916e08ff89
> --- /dev/null
> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> @@ -0,0 +1,54 @@
> +/
> **
> + * arch/x86/guest/hyperv/hyperv.c
> + *
> + * Support for detecting and running under Hyper-V.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see .
> + *
> + * Copyright (c) 2019 Microsoft.
> + */
> +#include 
> +
> +#include 
> +
> +static const struct hypervisor_ops hyperv_ops = {
> +.name = "Hyper-V",
> +};
> +
> +const struct hypervisor_ops * __init hyperv_probe(void)
> +{
> +uint32_t eax, ebx, ecx, edx;
> +
> +cpuid(0x40000000, &eax, &ebx, &ecx, &edx);
> +if ( !((ebx == 0x7263694d) &&  /* "Micr" */
> +   (ecx == 0x666f736f) &&  /* "osof" */
> +   (edx == 0x76482074)) )  /* "t Hv" */
> +return NULL;
> +
> +cpuid(0x40000001, &eax, &ebx, &ecx, &edx);
> +if ( eax != 0x31237648 )/* Hv#1 */
> +return NULL;
> +
> +return &hyperv_ops;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/x86/guest/hypervisor.c
> b/xen/arch/x86/guest/hypervisor.c
> index a067cacecb..c293e185cc 100644
> --- a/xen/arch/x86/guest/hypervisor.c
> +++ b/xen/arch/x86/guest/hypervisor.c
> @@ -39,6 +39,14 @@ const struct hypervisor_ops *hypervisor_probe(void)
>  if ( hops )
>  goto out;
> 
> +/*
> + * Detection of Hyper-V must come after Xen to avoid false positive
> due
> + * to viridian support
> + */
> +hops = hyperv_probe();
> +if ( hops )
> +goto out;
> +
>   out:
>  return hops;
>  }
> diff --git 
