Re: [Xen-devel] [PATCH v2] xen/events: remove event handling recursion detection

2019-11-28 Thread Jürgen Groß

On 28.11.19 22:37, Boris Ostrovsky wrote:

On 11/28/19 3:45 AM, Juergen Gross wrote:

-
  static void __xen_evtchn_do_upcall(void)
  {
struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu);
-   int cpu = get_cpu();
-   unsigned count;
+   int cpu = smp_processor_id();
  
  	do {

vcpu_info->evtchn_upcall_pending = 0;
  
-		if (__this_cpu_inc_return(xed_nesting_count) - 1)

-   goto out;
-
xen_evtchn_handle_events(cpu);
  
  		BUG_ON(!irqs_disabled());
  
-		count = __this_cpu_read(xed_nesting_count);

-   __this_cpu_write(xed_nesting_count, 0);
-   } while (count != 1 || vcpu_info->evtchn_upcall_pending);
-
-out:
+   rmb(); /* Hypervisor can set upcall pending. */


virt_rmb() perhaps then?


Yes, that's better.


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 6/6] Added Resolving Disagreement

2019-11-28 Thread Lars Kurth


On 28/11/2019, 12:50, "Stefano Stabellini"  wrote:

On Thu, 28 Nov 2019, Jan Beulich wrote:
> On 28.11.2019 01:56, Stefano Stabellini wrote:
> > On Thu, 26 Sep 2019, Lars Kurth wrote:

> > I think a good recommendation would be for the contributor to try to
> > follow the maintainers requests, even if they could be considered
> > "style", trusting their experience on the matter. And a good
> > recommendation for the maintainer would be to try to let the contributor
> > have freedom of implementation choice on things that don't make a
> > significant difference.
> 
> I think we try to, but I also think we suffer from too little
> clear documentation on e.g. style aspects. Attempts on my part
> to address this have mostly (not entirely) lead no-where (lack of
> feedback on proposed patches to ./CODING_STYLE). So for the time
> being there are (many) aspects where we have de-facto expectations
> that aren't written down anywhere, with the result of (in a subset
> of cases) disagreement on what the perceived de-facto standard
> actually is.

I recognize that it could be challenging finding a consensus to update
CODING_STYLE but it might be worth doing to reduce frictions with both
contributors and other reviewers.

But to be clear, I was also referring to things that might be actually
hard to add to CODING_STYLE, such as macro vs. static inlines, when to
split a single function into multiple smaller functions, etc.

I think this is definitely something we ought to do. I am volunteering to
pick this up, but changing/clarifying the CODING_STYLE needs to be 
considered in conjunction with checking tools

I have parked this for now, as
a) I did not want to disrupt 4.13 
b) and until recently I also didn’t fully understand what kind of coding
standards would help with safety certification

And of course, having a bot do the checking would remove the friction
entirely. 

Lars


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide

2019-11-28 Thread Lars Kurth


On 28/11/2019, 12:12, "Rich Persaud"  wrote:

On Nov 28, 2019, at 05:12, Jan Beulich  wrote:
> 
> On 28.11.2019 01:54, Stefano Stabellini wrote:
>>> On Thu, 26 Sep 2019, Lars Kurth wrote:
>>> From: Lars Kurth 
>>> 
>>> This document highlights what reviewers such as maintainers and 
committers look
>>> for when reviewing code. It sets expectations for code authors and 
provides
>>> a framework for code reviewers.
>> 
>> I think the document is missing a couple of things:
>> 
>> - a simple one line statement that possibly the most important thing in
>>  a code review is to indentify any bugs in the code
>> 
>> - an explanation that requests for major changes to the series should be
>>  made early on (i.e. let's not change the architecture of a feature at
>>  v9 if possible) I also made this comment in reply to patch #5. I'll
>>  let you decide where is the best place for it.
> 
> This needs balancing. People crucial to the evaluation of a new
> feature and its implementation simply may not have the time to
> reply prior to v9. We've had situations where people posted new
> revisions every other day, sometimes even more than one per day.
> 
> As indicated in several other contexts before - imo people not
> helping to shoulder the review load should also not have the
> expectation that their (large) contributions will be looked at
> in due course. 

To make this actionable, we could have:

- reviewer demand index:  automated index of open patches still in need of 
review, sorted by decreasing age

- review flow control:  each new patch submission cites one recent review 
by the patch submitter, for a patch of comparable size

- reviewer supply growth:  a bootstrapping guide for new reviewers and 
submitters, with patterns, anti-patterns, and examples to be emulated

That is a great idea. However, I would not want to hold up the publication of 
these documents on these suggestions. Some of them would require implementing 
tools. I was hoping there would be more progress on lore and others 
tooling/workflow related stuff by now. So I think for now, I think it is 
sufficient to set expectations better.

Regards
Lars

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] psr: fix bug which may cause crash

2019-11-28 Thread Yi Sun
On 19-11-28 12:25:44, Jan Beulich wrote:
> On 28.11.2019 11:18, Yi Sun wrote:
> > --- a/xen/arch/x86/psr.c
> > +++ b/xen/arch/x86/psr.c
> > @@ -1271,7 +1271,8 @@ static void do_write_psr_msrs(void *data)
> >  
> >  for ( j = 0; j < cos_num; j++, index++ )
> >  {
> > -if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] )
> > +if ( cos <= feat->cos_max &&
> > + feat->cos_reg_val[cos * cos_num + j] != info->val[index] )
> 
> The description is indeed much better now, thanks. However,
> as indicated in reply to v1, this extra (and at the first glance
> unmotivated) bounds check wants to be accompanied by a brief but

I will add the comment.

> precise comment. Furthermore with the loop bounded by a local
> variable, why not
> 
> cos_num = min(props->cos_num, feat->cos_max + 1);
> 
> a few lines up from here (again suitable commented)?
>
cos_num is a different thing with the number of COS registers. The
meaning of it is "COS registers number that feature uses for one COS
ID". E.g. MBA/CAT cos_num is always 1. But CDP cos_num is 2 because it
uses 2 COS registers for one COS ID.
 
> Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 6/6] Added Resolving Disagreement

2019-11-28 Thread Lars Kurth


On 27/11/2019, 18:56, "Stefano Stabellini"  wrote:

On Thu, 26 Sep 2019, Lars Kurth wrote:
> From: Lars Kurth 
> 
> This guide provides Best Practice on identifying and resolving
> common classes of disagreement
> 
> Signed-off-by: Lars Kurth 
> --
> Cc: minios-de...@lists.xenproject.org
> Cc: xen-...@lists.xenproject.org
> Cc: win-pv-de...@lists.xenproject.org
> Cc: mirageos-de...@lists.xenproject.org
> Cc: committ...@xenproject.org
> ---
>  resolving-disagreement.md | 146 
++
>  1 file changed, 146 insertions(+)
>  create mode 100644 resolving-disagreement.md
> 
> diff --git a/resolving-disagreement.md b/resolving-disagreement.md
> new file mode 100644
> index 000..19aedbe
> --- /dev/null
> +++ b/resolving-disagreement.md
> @@ -0,0 +1,146 @@
> +# Resolving Disagreement
> +
> +This guide provides Best Practice on resolving disagreement, such as
> +* Gracefully accept constructive criticism
> +* Focus on what is best for the community
> +* Resolve differences in opinion effectively
> +
> +## Theory: Paul Graham's hierarchy of disagreement
> +Paul Graham proposed a **disagreement hierarchy** in a 2008 essay 
> +**[How to Disagree](http://www.paulgraham.com/disagree.html)**, putting 
types of
> +arguments into a seven-point hierarchy and observing that *moving up the
> +disagreement hierarchy makes people less mean, and will make most of 
them happier*.
> +Graham also suggested that the hierarchy can be thought of as a pyramid, 
as the 
> +highest forms of disagreement are rarer.
> +
> +| ![Graham's Hierarchy of 
Disagreemen](https://upload.wikimedia.org/wikipedia/commons/a/a3/Graham%27s_Hierarchy_of_Disagreement-en.svg)
 |
 ^ Disagreement

This is a NIT but in a few places in this series you go over the
original line length.

True: typically for URLs. Primarily because I don't know whether they are 
rendered correctly when split

> +| *A representation of Graham's hierarchy of disagreement from 
[Loudacris](http://www.createdebate.com/user/viewprofile/Loudacris) modified by 
[Rocket000](https://en.wikipedia.org/wiki/User:Rocket000)* |
> +
> +In the context of the Xen Project we strive to **only use the top half** 
of the hierarchy.
> +**Name-calling** and **Ad hominem** arguments are not acceptable within 
the Xen
> +Project.
> +
> +## Issue: Scope creep
> +
> +One thing which occasionally happens during code review is that a code 
reviewer
> +asks or appears to ask the author of patch to implement additional 
functionality.
   ^ a patch  ^ 
functionalities 


> +This could take for example the form of
> +> Do you think it would be useful for the code to do XXX? 
> +> I can imagine a user wanting to do YYY (and XXX would enable this)
> +
> +That potentially adds additional work for the code author, which they 
may not have
> +the time to perform. It is good practice for authors to consider such a 
request in terms of
> +* Usefulness to the user
> +* Code churn, complexity or impact on other system properties
> +* Extra time to implement and report back to the reviewer
> +
> +If you believe that the impact/cost is too high, report back to the 
reviewer. To resolve
> +this, it is advisable to
> +* Report your findings
> +* And then check whether this was merely an interesting suggestion, or 
something the
> +reviewer feels more strongly about
> +
> +In the latter case, there are typically several common outcomes
> +* The **author and reviewer agree** that the suggestion should be 
implemented
> +* The **author and reviewer agree** that it may make sense to defer 
implementation
> +* The **author and reviewer agree** that it makes no sense to implement 
the suggestion
> +
> +The author of a patch would typically suggest their preferred outcome, 
for example
> +> I am not sure it is worth to implement XXX
> +> Do you think this could be done as a separate patch in future?
> +
> +In cases, where no agreement can be found, the best approach would be to 
get an
> +independent opinion from another maintainer or the project's leadership 
team.

I think we should mention somewhere here that it is recommended for
reviewers to be explicit about whether a request is optional or whether
it is a requirement.

For instance: "I think it would be good if X also did Y" doesn't say if
it is optional (future work) or it is actually required as part of this
series. More explicit word choices are preferable, such as:

"I think it would be good if X also did Y, not a requirement but good to
have."

"I think it 

Re: [Xen-devel] [MirageOS-devel] [PATCH v2 4/6] Add Code Review Guide

2019-11-28 Thread Lars Kurth


From: Rich Persaud 
Date: Thursday, 28 November 2019 at 12:21
To: Lars Kurth 
Cc: 'Jan Beulich' , "lars.ku...@xenproject.org" 
, Stefano Stabellini , 
"xen-...@lists.xenproject.org" , 
"minios-de...@lists.xenproject.org" , 
"committ...@xenproject.org" , 
"mirageos-de...@lists.xenproject.org" , 
xen-devel , "win-pv-de...@lists.xenproject.org" 

Subject: Re: [MirageOS-devel] [PATCH v2 4/6] Add Code Review Guide

On Nov 28, 2019, at 09:05, Lars Kurth  wrote:

On 28/11/2019, 07:37, "Jan Beulich"  wrote:

   On 28.11.2019 14:06, Lars Kurth wrote:

I can certainly add something on the timing , along the lines of
* For complex series, consider the time it takes to do reviews (maybe with a 
guide of LOC per hour) and give reviewers enough time to
* For series with design issues or large questions, try and highlight the key 
open issues in cover letters clearly and solicit feedback from key maintainers 
who can comment on the open issue. The idea is to save both the contributor and 
the reviewers time by focussing on what needs to be resolved
* Don’t repost a series, unless all review comments are addressed
or the reviewers asked you to do so. The problem with this is that
this is somewhat in conflict with the "let's focus on the core
issues and not get distracted by details early on in a review cycle".
In other words, this can only work, if reviewers focus on major
issues in early reviews only and do not focus on style, coding
standards, etc.

   But this doesn't make much sense either, because then full re-reviews
   need to happen anyway on later versions, to also deal with the minor
   issues. For RFC kind of series omitting style and alike feedback
   certainly makes sense, but as soon as a patch is non-RFC, it should
   be considered good to go in by the submitter.

OK, I think we have a disconnect between ideal and reality.

I see two issues today
* Key maintainers don't always review RFC series [they end up at the bottom of 
the priority list, even though spending time on RFCs will save time elsewhere 
later]. So the effect is that then the contributor assumes there are no major 
issues and ends it as a proper series
* In practice what has happened often in the past is that design, architecture, 
assumption flaws are found in early versions of a series.
  - This usually happens because of an oversight or because there was no design 
discussion prior to the series being posted and agreed
  - Common sense would dictate that the biggest benefit for both the reviewer, 
the contributor and the community as a whole would be to try and focus on such 
flaws and leave everything aside
  - Of course there may be value in doing a detailed review of parts of such a 
series as there may be bits that are unaffected by such a flaw
  - But there will likely be parts which are not: doing a detailed review of 
such portions wastes everyone's time

So coming back to your point. Ideally, it would be nice if we had the 
capability to call out parts of a series as "problematic" and treating such 
parts differently.

  We may be able to reuse some "Shift Left" terminology, including citations of 
previous Xen code reviews to illustrate categories of design issues that can be 
shifted left:

https://devopedia.org/shift-left

I like that idea. We seem to not have come to a conclusion on this specific 
topic, but maybe for now it is sufficient to call this out as a potential issue 
in the guide.

Before I send out a new version, it would be good to get at least Jan’s view on 
the issue.

Lars

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xtf test] 144355: regressions - FAIL

2019-11-28 Thread osstest service owner
flight 144355 xtf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144355/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-xtf-amd64-amd64-4   85 leak-check/check fail REGR. vs. 143721
 test-xtf-amd64-amd64-1   85 leak-check/check fail REGR. vs. 143721

Tests which did not succeed, but are not blocking:
 test-xtf-amd64-amd64-4   84 xtf/test-pv32pae-xsa-296 fail   never pass
 test-xtf-amd64-amd64-1   84 xtf/test-pv32pae-xsa-296 fail   never pass

version targeted for testing:
 xtf  58bcde1b9209bab1e51f5645a913998b9da14bce
baseline version:
 xtf  08a19af3c78e8a03f83bc354b50545136c03edd2

Last test of basis   143721  2019-11-04 13:24:42 Z   24 days
Testing same since   144355  2019-11-28 23:38:54 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 

jobs:
 build-amd64-xtf  pass
 build-amd64  pass
 build-amd64-pvopspass
 test-xtf-amd64-amd64-1   fail
 test-xtf-amd64-amd64-2   pass
 test-xtf-amd64-amd64-3   pass
 test-xtf-amd64-amd64-4   fail
 test-xtf-amd64-amd64-5   pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 58bcde1b9209bab1e51f5645a913998b9da14bce
Author: Andrew Cooper 
Date:   Tue May 7 11:19:58 2019 +0100

XSA-298 PoC

commit fe4b41b4519d54f60347eb3f895d0a7b63d46e8d
Author: Andrew Cooper 
Date:   Thu Apr 11 16:50:23 2019 +0100

XSA-296 PoC

Signed-off-by: Andrew Cooper 

commit 7482af39dc2c6b9e90fa100cc35e3727283da6e0
Author: Andrew Cooper 
Date:   Fri Nov 15 13:23:03 2019 +

idt: Constify the xtf_idte parameter to xtf_set_idte()

It is only ever read.  Take the opportunity to adjust all callers to 
construct
their struct xtf_idte in .rodata.

Signed-off-by: Andrew Cooper 

commit 31c03eaaf3424fb454c4768cc8df28a0c45bbddb
Author: Andrew Cooper 
Date:   Thu Nov 28 20:52:36 2019 +

docs: Use https:// links in preference to http://

Signed-off-by: Andrew Cooper 

commit b213e50d7b5af3c1bd9e4ba0d22f15f91c3a877a
Author: Andrew Cooper 
Date:   Mon Nov 25 13:31:54 2019 +

Drop custom stack handling for nested tasks

I don't recall how I came to this conclusion, but its not correct.  IRET 
with
NT set doesn't inspect the stack at all.

Signed-off-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable test] 144349: regressions - FAIL

2019-11-28 Thread osstest service owner
flight 144349 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144349/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-libvirt 18 guest-start/debian.repeat fail REGR. vs. 144344

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 16 guest-localmigrate   fail  like 144344
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 144344
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144344
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144344
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 144344
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144344
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 144344
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail  like 144344
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144344
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 144344
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144344
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass

version targeted for testing:
 xen  56348df32bbc782e63b6e3fb978b80e015ae76e7
baseline version:
 xen  9a400d1797ec7f77ffefeb5c4e17a8c2e8b91a12

Last test of basis   144344  2019-11-28 02:00:09 Z0 days
Testing same since   144349  2019-11-28 14:36:38 Z0 days1 attempts


People who touched 

Re: [Xen-devel] [PATCH v2 5/6] Add guide on Communication Best Practice

2019-11-28 Thread Lars Kurth


On 27/11/2019, 19:06, "Stefano Stabellini"  wrote:

On Fri, 27 Sep 2019, Jan Beulich wrote:
> On 26.09.2019 21:39, Lars Kurth wrote:
> > +### Verbose vs. terse
> > +Due to the time it takes to review and compose code reviewer, 
reviewers often adopt a
> > +terse style. It is not unusual to see review comments such as
> > +> typo
> > +> s/resions/regions/
> > +> coding style
> > +> coding style: brackets not needed
> > +etc.
> > +
> > +Terse code review style has its place and can be productive for both 
the reviewer and
> > +the author. However, overuse can come across as unfriendly, lacking 
empathy and
> > +can thus create a negative impression with the author of a patch. This 
is in particular
> > +true, when you do not know the author or the author is a newcomer. 
Terse
> > +communication styles can also be perceived as rude in some cultures.
> 
> And another remark here: Not being terse in situations like the ones
> enumerated as examples above is a double waste of the reviewer's time:
> They shouldn't even need to make such comments, especially not many
> times for a single patch (see your mention of "overuse"). I realize
> we still have no automated mechanism to check style aspects, but
> anybody can easily look over their patches before submitting them.
> And for an occasional issue I think a terse reply is quite reasonable
> to have.
> 
> Overall I'm seeing the good intentions of this document, yet I'd still
> vote at least -1 on it if it came to a vote. Following even just a
> fair part of it is a considerable extra amount of time to invest in
> reviews, when we already have a severe reviewing bottleneck. If I have
> to judge between doing a bad (stylistically according to this doc, not
> technically) review or none at all (because of time constraints), I'd
> favor the former. Unless of course I'm asked to stop doing so, in
> which case I'd expect whoever asks to arrange for the reviews to be
> done by someone else in due course.

Reading the document, I think Jan has a point that it gives the
impression that following the suggestions would take significant
efforts, while actually I don't think Lars meant it that way at all, and
I don't think it should be the case either.

Yes. Ultimately the effect of a better communication should overall be a 
net-positive in terms of effort. 

Maybe we should highlight and encourage "clarity" instead of "verbosity"
of the communication, and encourage "expressing appreciation" to
newcomers, not necessarily to seasoned contributors.

Good idea

The ultimate goal of this document is actually to *reduce* our overall
efforts by making our communication more efficient, not to increase
efforts. Maybe it is worth saying this too.

It is worth saying this. 

Regards
Lars



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 5/6] Add guide on Communication Best Practice

2019-11-28 Thread Lars Kurth


On 27/11/2019, 18:57, "Stefano Stabellini"  wrote:

On Thu, 26 Sep 2019, Lars Kurth wrote:
> From: Lars Kurth 
> 
> This guide covers the bulk on Best Practice related to code review
> It primarily focusses on code review interactions
> It also covers how to deal with Misunderstandings and Cultural
> Differences
> 
> Signed-off-by: Lars Kurth 
> ---
> Cc: minios-de...@lists.xenproject.org
> Cc: xen-...@lists.xenproject.org
> Cc: win-pv-de...@lists.xenproject.org
> Cc: mirageos-de...@lists.xenproject.org
> Cc: committ...@xenproject.org
> ---
>  communication-practice.md | 410 
++
>  1 file changed, 410 insertions(+)
>  create mode 100644 communication-practice.md
> 
> diff --git a/communication-practice.md b/communication-practice.md
> new file mode 100644
> index 000..db9a5ef
> --- /dev/null
> +++ b/communication-practice.md
> @@ -0,0 +1,410 @@
> +# Communication Best Practice
> +
> +This guide provides communication Best Practice that helps you in
> +* Using welcoming and inclusive language
> +* Keeping discussions technical and actionable
> +* Being respectful of differing viewpoints and experiences
> +* Being aware of your own and counterpart’s communication style and 
culture
> +* Show empathy towards other community members
> +
> +## Code reviews for **reviewers** and **patch authors**
> +
> +Before embarking on a code review, it is important to remember that
> +* A poorly executed code review can hurt the contributors feeling, even 
when a reviewer
> +  did not intend to do so. Feeling defensive is a normal reaction to a 
critique or feedback.
> +  A reviewer should be aware of how the pitch, tone, or sentiment of 
their comments
> +  could be interpreted by the contributor. The same applies to responses 
of an author
> +  to the reviewer.
> +* When reviewing someone's code, you are ultimately looking for issues. 
A good code
> +  reviewer is able to mentally separate finding issues from articulating 
code review
> +  comments in a constructive and positive manner: depending on your 
personality this
> +  can be **difficult** and you may need to develop a technique that 
works for you.
> +* As software engineers we like to be proud of the solutions we came up 
with. This can
> +  make it easy to take another people’s criticism personally. Always 
remember that it is
> +  the code that is being reviewed, not you as a person.
> +* When you receive code review feedback, please be aware that we have 
reviewers
> +  from different backgrounds, communication styles and cultures. 
Although we all trying
> +  to create a productive, welcoming and agile environment, we do not 
always succeed.
> +
> +### Express appreciation
> +As the nature of code review to find bugs and possible issues, it is 
very easy for
> +reviewers to get into a mode of operation where the patch review ends up 
being a list
> +of issues, not mentioning what is right and well done. This can lead to 
the code
> +submitter interpreting your feedback in a negative way.
> +
> +The opening of a code review provides an opportunity to address this and 
also sets the
> +tone for the rest of the code review. Starting **every** review on a 
positive note, helps
> +set the tone for the rest of the review.
> +
> +For an initial patch, you can use phrases such as
> +> Thanks for the patch
> +> Thanks for doing this
> +
> +For further revisions within a review, phrases such as
> +> Thank you for addressing the last set of changes
> +
> +If you believe the code was good, it is good practice to highlight this 
by using phrases
> +such as
> +> Looks good, just a few comments
> +> The changes you have made since the last version look good
> +
> +If you think there were issues too many with the code to use one of the 
phrases,
> +you can still start on a positive note, by for example saying
> +> I think this is a good change
> +> I think this is a good feature proposal
> +
> +It is also entirely fine to highlight specific changes as good. The best 
place to
> +do this, is at top of a patch, as addressing code review comments 
typically requires
 ^ the top


> +a contributor to go through the list of things to address and an 
in-lined positive
> +comment is likely to break that workflow.
> +
> +You should also consider, that if you review a patch of an experienced
> +contributor phrases such as *Thanks for the patch* could come across as
> +patronizing, while using *Thanks for doing this* is less likely to be 
interpreted
> +as such.
> +
> +Appreciation should also be expressed by patch authors when asking for 

Re: [Xen-devel] [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

2019-11-28 Thread Boris Ostrovsky
On 11/28/19 5:23 AM, Jan Beulich wrote:
> On 28.11.2019 10:38, Paul Durrant wrote:
>
>> --- a/xen/arch/x86/cpu/vpmu.c
>> +++ b/xen/arch/x86/cpu/vpmu.c
>> @@ -576,11 +576,36 @@ static void vpmu_arch_destroy(struct vcpu *v)
>>  
>>   vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
>>  }
>> +
>> +vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
>>  }
> Boris, to be on the safe side - are you in agreement with this
> change, now that the setting of the flag is being left untouched?

Yes, this is fine.

(I probably would clear it in arch_vpmu_destroy op since it is set in
arch-specific code but either way works)

-boris

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 0/2] xen/gntdev: sanitize user interface handling

2019-11-28 Thread Boris Ostrovsky
On 11/28/19 3:48 AM, Jürgen Groß wrote:
> On 07.11.19 12:15, Juergen Gross wrote:
>> The Xen gntdev driver's checking of the number of allowed mapped pages
>> is in need of some sanitizing work.
>>
>> Changes in V2:
>> - enhanced commit message of patch 1 (Andrew Cooper)
>>
>> Juergen Gross (2):
>>    xen/gntdev: replace global limit of mapped pages by limit per call
>>    xen/gntdev: switch from kcalloc() to kvcalloc()
>>
>>   drivers/xen/gntdev-common.h |  2 +-
>>   drivers/xen/gntdev-dmabuf.c | 11 +++--
>>   drivers/xen/gntdev.c    | 55
>> +++--
>>   3 files changed, 27 insertions(+), 41 deletions(-)
>>
>
> Boris, could you please comment on the patches?

Sure, both look good (and so R-b). I haven't commented since Oleksandr
already gave you his.


-boris

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/events: remove event handling recursion detection

2019-11-28 Thread Boris Ostrovsky
On 11/28/19 3:45 AM, Juergen Gross wrote:
> -
>  static void __xen_evtchn_do_upcall(void)
>  {
>   struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu);
> - int cpu = get_cpu();
> - unsigned count;
> + int cpu = smp_processor_id();
>  
>   do {
>   vcpu_info->evtchn_upcall_pending = 0;
>  
> - if (__this_cpu_inc_return(xed_nesting_count) - 1)
> - goto out;
> -
>   xen_evtchn_handle_events(cpu);
>  
>   BUG_ON(!irqs_disabled());
>  
> - count = __this_cpu_read(xed_nesting_count);
> - __this_cpu_write(xed_nesting_count, 0);
> - } while (count != 1 || vcpu_info->evtchn_upcall_pending);
> -
> -out:
> + rmb(); /* Hypervisor can set upcall pending. */

virt_rmb() perhaps then?

-boris

>  
> - put_cpu();
> + } while (vcpu_info->evtchn_upcall_pending);
>  }
>  
>  void xen_evtchn_do_upcall(struct pt_regs *regs)


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable-smoke test] 144352: tolerable all pass - PUSHED

2019-11-28 Thread osstest service owner
flight 144352 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144352/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  1d758bc6d1a8c0f658a874470c349ee4e27aee46
baseline version:
 xen  5655ce8b1ec2a82ef080078e41c73bbd536174e1

Last test of basis   144350  2019-11-28 15:00:23 Z0 days
Testing same since   144352  2019-11-28 18:01:10 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Konrad Rzeszutek Wilk 
  Sergey Dyasli 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   5655ce8b1e..1d758bc6d1  1d758bc6d1a8c0f658a874470c349ee4e27aee46 -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range

2019-11-28 Thread Julien Grall
On Thu, 28 Nov 2019, 18:09 Stefano Stabellini, 
wrote:

> On Thu, 28 Nov 2019, Julien Grall wrote:
> > > In both cases I see no reason to keep wrong code.
> > >
> > > Either the patch will let run Linux 5.4 fine - then the patch should
> > > definitely be taken.
> > That's up to Stefano and Peng to provide me information why this is fine.
> > FAOD, the current justification provided is not acceptable for me.
>
> I disagree. This is a typo fix. The original design was never spec
> compliant. You cannot expect the typo fix to explain why the original
> behavior is tolerable. That is out of scope and should *not* be required
> for this fix.


May I remind you that as a maintainer, this is in my right to say no to a
patch.


> We cannot expect typo fixes to go and trigger vgic/gic reworks and deep
> investigations. This is a wrong expectation now, and going forward.
>

That's the best way to turn Xen into a bunch of hacks.

I pointed out several times a potential issue with this patch. I also spent
some part of my week-end investigating it and provide some insight. Did you
look at them?

If you want this patch in, then please help explaining why 5.4 is going to
run fine on Xen 4.13 rather than keeping arguing this is a typo fix.

Cheers,
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/arm: include xen-tools/libs.h from libxl_arm_acpi.c

2019-11-28 Thread Stefano Stabellini
On Thu, 28 Nov 2019, Wei Liu wrote:
> On Thu, Nov 28, 2019 at 11:30:34AM +0100, Jürgen Groß wrote:
> > On 28.11.19 11:15, Wei Liu wrote:
> > > On Wed, Nov 27, 2019 at 06:24:58PM -0800, Stefano Stabellini wrote:
> > > > libxl_arm_acpi.c is using BUILD_BUG_ON but it is not including
> > > > xen-tools/libs.h that defines it.
> > > > 
> > > > Signed-off-by: Stefano Stabellini 
> > > 
> > > Acked-by: Wei Liu 
> > > 
> > > Juergen, this is a trivial patch. I think it can go in 4.13.
> > 
> > Why is this patch needed?
> > 
> > tools/libxl/libxl_arm_acpi.c includes libxl_arm.h, which includes
> > libxl_internal.h, which includes xen-tools/libs.h.
> 
> Oh I missed that.
> 
> In that case I don't think this patch is required for 4.13.
> 
> Stefano, did you see a build error or something?
> 

Hi Wei, and Jurgen,

Thanks for the review and also for probably catching a mistake in the
patch.

Yes, this patch fixes a build error with the latest Yocto and gcc 9:

| libxl_arm_acpi.c: In function 'make_acpi_rsdp':
| libxl_arm_acpi.c:193:5: error: implicit declaration of function 
'BUILD_BUG_ON' [-Werror=implicit-function-declaration]
|   193 | BUILD_BUG_ON(sizeof(ACPI_OEM_ID) != sizeof(rsdp->oem_id));
|   | ^~~~

but the error was based on an older Xen tree based on 4.11, which
doesn't have an include of xen-tools/libs.h in libxl_internal.h. So, I
think Juergen is right that this should not be needed upstream.

I didn't actually have a repro (the issue was reported to me by somebody
else), so before I sent the patch to xen-devel I manually check the code
but couldn't actually try a build. And I didn't notice the include
xen-tools/libs.h in libxl_internal.h. Sorry about that.___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 6/6] Added Resolving Disagreement

2019-11-28 Thread Stefano Stabellini
On Thu, 28 Nov 2019, Jan Beulich wrote:
> On 28.11.2019 01:56, Stefano Stabellini wrote:
> > On Thu, 26 Sep 2019, Lars Kurth wrote:
> >> +This could take for example the form of
> >> +> Do you think it would be useful for the code to do XXX? 
> >> +> I can imagine a user wanting to do YYY (and XXX would enable this)
> >> +
> >> +That potentially adds additional work for the code author, which they may 
> >> not have
> >> +the time to perform. It is good practice for authors to consider such a 
> >> request in terms of
> >> +* Usefulness to the user
> >> +* Code churn, complexity or impact on other system properties
> >> +* Extra time to implement and report back to the reviewer
> >> +
> >> +If you believe that the impact/cost is too high, report back to the 
> >> reviewer. To resolve
> >> +this, it is advisable to
> >> +* Report your findings
> >> +* And then check whether this was merely an interesting suggestion, or 
> >> something the
> >> +reviewer feels more strongly about
> >> +
> >> +In the latter case, there are typically several common outcomes
> >> +* The **author and reviewer agree** that the suggestion should be 
> >> implemented
> >> +* The **author and reviewer agree** that it may make sense to defer 
> >> implementation
> >> +* The **author and reviewer agree** that it makes no sense to implement 
> >> the suggestion
> >> +
> >> +The author of a patch would typically suggest their preferred outcome, 
> >> for example
> >> +> I am not sure it is worth to implement XXX
> >> +> Do you think this could be done as a separate patch in future?
> >> +
> >> +In cases, where no agreement can be found, the best approach would be to 
> >> get an
> >> +independent opinion from another maintainer or the project's leadership 
> >> team.
> > 
> > I think we should mention somewhere here that it is recommended for
> > reviewers to be explicit about whether a request is optional or whether
> > it is a requirement.
> > 
> > For instance: "I think it would be good if X also did Y" doesn't say if
> > it is optional (future work) or it is actually required as part of this
> > series. More explicit word choices are preferable, such as:
> > 
> > "I think it would be good if X also did Y, not a requirement but good to
> > have."
> > 
> > "I think it would be good if X also did Y and it should be part of this
> > series."
> 
> I think without it being made explicit that something is optional,
> the assumption should be that it isn't. I.e. in the first example
> I agree with the idea to have something after the comma, but in
> the second example I think the extra wording is a waste of effort.

If you are concerned about brevity, then a better example would be:

  "X should also do Y" -> required

  "It would be good if X also did Y, just optional." -> optional

I didn't want to go that far but we could even have an actually tag,
like:

  "X should also do Y [REQ]"
  "X should also do Y [OPT]"


On the default, let me premise that at the end of the day I agree that
the default should be that "it is required", because it is probably what
it makes more sense.

That said, the issue is that as human beings we tend to forget about
these things. As an example, I have been trying to apply this simple
optional/reply communication style to my own reviews in the last few
months and I still forget often to mark them appropriately.

If you forget to mark an optional suggestion as such, it might annoy the
contributor, thinking that you are requiring something that she doesn't
want to do. If we switched the default to optional, then the risk would
be that the contributor could ignore it, which is problematic for the
reviewer, although we recommend in these docs for the contributor to say
what they intend to do about optional suggestions explicitly, and also
as a reviewer I always manually check if all my comments from a previous
version of the series were addressed anyway.

I don't have a good solution to this, I am just sharing my experience.


> > I think there is something else we should say on this topic. There is a
> > category of things which could be done in multiple ways and it is not
> > overtly obvious which one is best. It is done to the maintainer and the
> > author personal styles. It is easy to disagree on that.
> > 
> > I think a good recommendation would be for the contributor to try to
> > follow the maintainers requests, even if they could be considered
> > "style", trusting their experience on the matter. And a good
> > recommendation for the maintainer would be to try to let the contributor
> > have freedom of implementation choice on things that don't make a
> > significant difference.
> 
> I think we try to, but I also think we suffer from too little
> clear documentation on e.g. style aspects. Attempts on my part
> to address this have mostly (not entirely) lead no-where (lack of
> feedback on proposed patches to ./CODING_STYLE). So for the time
> being there are (many) aspects where we have de-facto 

Re: [Xen-devel] [MirageOS-devel] [PATCH v2 4/6] Add Code Review Guide

2019-11-28 Thread Rich Persaud
On Nov 28, 2019, at 09:05, Lars Kurth  wrote:
> 
> On 28/11/2019, 07:37, "Jan Beulich"  wrote:
> 
>>On 28.11.2019 14:06, Lars Kurth wrote:
>> I can certainly add something on the timing , along the lines of
>> * For complex series, consider the time it takes to do reviews (maybe with a 
>> guide of LOC per hour) and give reviewers enough time to
>> * For series with design issues or large questions, try and highlight the 
>> key open issues in cover letters clearly and solicit feedback from key 
>> maintainers who can comment on the open issue. The idea is to save both the 
>> contributor and the reviewers time by focussing on what needs to be resolved 
>> * Don’t repost a series, unless all review comments are addressed
>> or the reviewers asked you to do so. The problem with this is that
>> this is somewhat in conflict with the "let's focus on the core
>> issues and not get distracted by details early on in a review cycle".
>> In other words, this can only work, if reviewers focus on major
>> issues in early reviews only and do not focus on style, coding
>> standards, etc.
> 
>But this doesn't make much sense either, because then full re-reviews
>need to happen anyway on later versions, to also deal with the minor
>issues. For RFC kind of series omitting style and alike feedback
>certainly makes sense, but as soon as a patch is non-RFC, it should
>be considered good to go in by the submitter.
> 
> OK, I think we have a disconnect between ideal and reality. 
> 
> I see two issues today
> * Key maintainers don't always review RFC series [they end up at the bottom 
> of the priority list, even though spending time on RFCs will save time 
> elsewhere later]. So the effect is that then the contributor assumes there 
> are no major issues and ends it as a proper series
> * In practice what has happened often in the past is that design, 
> architecture, assumption flaws are found in early versions of a series.
>   - This usually happens because of an oversight or because there was no 
> design discussion prior to the series being posted and agreed
>   - Common sense would dictate that the biggest benefit for both the 
> reviewer, the contributor and the community as a whole would be to try and 
> focus on such flaws and leave everything aside
>   - Of course there may be value in doing a detailed reviews of such a series 
> as there may be bits that are unaffected by such a flaw
>   - But there will likely be parts which are not: doing a detailed review of 
> such portions wastes everyone's time
> 
> So coming back to your point. Ideally, it would be nice if we had the 
> capability to call out parts of a series as "problematic" and treating such 
> parts differently.

We may be able to reuse some "Shift Left" terminology, including citations of 
previous Xen code reviews to illustrate categories of design issues that can be 
shifted left:

  https://devopedia.org/shift-left

Rich

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide

2019-11-28 Thread Stefano Stabellini
On Thu, 28 Nov 2019, Jan Beulich wrote:
> On 28.11.2019 01:54, Stefano Stabellini wrote:
> > On Thu, 26 Sep 2019, Lars Kurth wrote:
> >> From: Lars Kurth 
> >>
> >> This document highlights what reviewers such as maintainers and committers 
> >> look
> >> for when reviewing code. It sets expectations for code authors and provides
> >> a framework for code reviewers.
> > 
> > I think the document is missing a couple of things:
> > 
> > - a simple one line statement that possibly the most important thing in
> >   a code review is to indentify any bugs in the code
> > 
> > - an explanation that requests for major changes to the series should be
> >   made early on (i.e. let's not change the architecture of a feature at
> >   v9 if possible) I also made this comment in reply to patch #5. I'll
> >   let you decide where is the best place for it.
> 
> This needs balancing. People crucial to the evaluation of a new
> feature and its implementation simply may not have the time to
> reply prior to v9. We've had situations where people posted new
> revisions every other day, sometimes even more than one per day.

Yes, you are right, it needs balancing. This is not meant to encourage
contributors to send 9 versions of a series within a week or two :-)

We could say that "contributors should make sure to give enough time to
all the key stakeholders to review the series".



> As indicated in several other contexts before - imo people not
> helping to shoulder the review load should also not have the
> expectation that their (large) contributions will be looked at
> in due course. 

I think you are right on this point, and maybe we could add something to
that effect

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide

2019-11-28 Thread Rich Persaud
On Nov 28, 2019, at 05:12, Jan Beulich  wrote:
> 
> On 28.11.2019 01:54, Stefano Stabellini wrote:
>>> On Thu, 26 Sep 2019, Lars Kurth wrote:
>>> From: Lars Kurth 
>>> 
>>> This document highlights what reviewers such as maintainers and committers 
>>> look
>>> for when reviewing code. It sets expectations for code authors and provides
>>> a framework for code reviewers.
>> 
>> I think the document is missing a couple of things:
>> 
>> - a simple one line statement that possibly the most important thing in
>>  a code review is to indentify any bugs in the code
>> 
>> - an explanation that requests for major changes to the series should be
>>  made early on (i.e. let's not change the architecture of a feature at
>>  v9 if possible) I also made this comment in reply to patch #5. I'll
>>  let you decide where is the best place for it.
> 
> This needs balancing. People crucial to the evaluation of a new
> feature and its implementation simply may not have the time to
> reply prior to v9. We've had situations where people posted new
> revisions every other day, sometimes even more than one per day.
> 
> As indicated in several other contexts before - imo people not
> helping to shoulder the review load should also not have the
> expectation that their (large) contributions will be looked at
> in due course. 

To make this actionable, we could have:

- reviewer demand index:  automated index of open patches still in need of 
review, sorted by decreasing age

- review flow control:  each new patch submission cites one recent review by 
the patch submitter, for a patch of comparable size

- reviewer supply growth:  a bootstrapping guide for new reviewers and 
submitters, with patterns, anti-patterns, and examples to be emulated

Rich
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range

2019-11-28 Thread Stefano Stabellini
On Thu, 28 Nov 2019, Julien Grall wrote:
> > In both cases I see no reason to keep wrong code.
> > 
> > Either the patch will let run Linux 5.4 fine - then the patch should
> > definitely be taken.
> That's up to Stefano and Peng to provide me information why this is fine.
> FAOD, the current justification provided is not acceptable for me.

I disagree. This is a typo fix. The original design was never spec
compliant. You cannot expect the typo fix to explain why the original
behavior is tolerable. That is out of scope and should *not* be required
for this fix.

We cannot expect typo fixes to go and trigger vgic/gic reworks and deep
investigations. This is a wrong expectation now, and going forward.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable-smoke test] 144350: tolerable all pass - PUSHED

2019-11-28 Thread osstest service owner
flight 144350 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144350/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  5655ce8b1ec2a82ef080078e41c73bbd536174e1
baseline version:
 xen  56348df32bbc782e63b6e3fb978b80e015ae76e7

Last test of basis   144346  2019-11-28 11:01:36 Z0 days
Testing same since   144350  2019-11-28 15:00:23 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   56348df32b..5655ce8b1e  5655ce8b1ec2a82ef080078e41c73bbd536174e1 -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Jan Beulich
On 28.11.2019 17:42, Ian Jackson wrote:
> Jan Beulich writes ("Re: [PATCH-for-4.13 v4] Rationalize max_grant_frames and 
> max_maptrack_frames handling"):
>> On 28.11.2019 14:58, Paul Durrant wrote:
>>>  uint32_t max_vcpus;
>>>  uint32_t max_evtchn_port;
>>> -uint32_t max_grant_frames;
>>> -uint32_t max_maptrack_frames;
>>> +int32_t max_grant_frames;
>>> +int32_t max_maptrack_frames;
>>
>> While this may want backporting aiui, we need to be a little
>> careful with the interface change here.
> 
> A note here in a list discussion, or even in a commit message, is
> perhaps not going to be very effective to deal with this.
> 
> How bad would it be to change the names as well as the types ?

Hmm, not sure - on one hand this would flag the issue to people
consuming the interface. Otoh it wouldn't help people using
derived header files (they'd notice far later), and causing
breakage like this in stable trees does't look very friendly
either.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH-for-4.13 v5] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Paul Durrant
From: George Dunlap 

Xen used to have single, system-wide limits for the number of grant
frames and maptrack frames a guest was allowed to create. Increasing
or decreasing this single limit on the Xen command-line would change
the limit for all guests on the system.

Later, per-domain limits for these values was created. The system-wide
limits became strict limits: domains could not be created with higher
limits, but could be created with lower limits. However, that change
also introduced a range of different "default" values into various
places in the toolstack:

- The python libxc bindings hard-coded these values to 32 and 1024,
  respectively
- The libxl default values are 32 and 1024 respectively.
- xl will use the libxl default for maptrack, but does its own default
  calculation for grant frames: either 32 or 64, based on the max
  possible mfn.

These defaults interact poorly with the hypervisor command-line limit:

- The hypervisor command-line limit cannot be used to raise the limit
  for all guests anymore, as the default in the toolstack will
  effectively override this.
- If you use the hypervisor command-line limit to *reduce* the limit,
  then the "default" values generated by the toolstack are too high,
  and all guest creations will fail.

In other words, the toolstack defaults require any change to be
effected by having the admin explicitly specify a new value in every
guest.

In order to address this, have grant_table_init treat negative values
for max_grant_frames and max_maptrack_frames as instructions to use the
system-wide default, and have all the above toolstacks default to passing
-1 unless a different value is explicitly configured.

This restores the old behavior in that changing the hypervisor command-line
option can change the behavior for all guests, while retaining the ability
to set per-guest values.  It also removes the bug that reducing the
system-wide max will cause all domains without explicit limits to fail.

NOTE: - The Ocaml bindings require the caller to always specify a value,
and the code to start a xenstored stubdomain hard-codes these to 4
and 128 respectively; this behavour will not be modified.

Signed-off-by: George Dunlap 
Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Anthony PERARD 
Cc: "Marek Marczykowski-Górecki" 
Cc: Volodymyr Babchuk 
Cc: "Roger Pau Monné" 

v5:
 - Remove erroneous __init annotations
 - Fail out of range command line values with ERANGE
 - Make opt_max_maptrack_frames static

v4:
 - Add missing braces in xlu_cfg_get_bounded_long()

v3:
 - Make sure that specified values cannot be negative or overflow a
   signed int

v2:
 - re-worked George's original commit massage a little
 - fixed the text in xl.conf.5.pod
 - use -1 as the sentinel value for 'default' and < 0 for checking it
---
 docs/man/xl.conf.5.pod|  6 +++--
 tools/libxl/libxl.h   |  4 +--
 tools/libxl/libxl_types.idl   |  4 +--
 tools/libxl/libxlu_cfg.c  | 26 --
 tools/libxl/libxlutil.h   |  2 ++
 tools/python/xen/lowlevel/xc/xc.c |  4 +--
 tools/xl/xl.c | 15 ---
 tools/xl/xl_parse.c   |  9 ---
 xen/arch/arm/setup.c  |  2 +-
 xen/arch/x86/setup.c  |  4 +--
 xen/common/grant_table.c  | 45 +++
 xen/include/public/domctl.h   | 10 ---
 xen/include/xen/grant_table.h | 10 +++
 13 files changed, 100 insertions(+), 41 deletions(-)

diff --git a/docs/man/xl.conf.5.pod b/docs/man/xl.conf.5.pod
index 962144e38e..207ab3e77a 100644
--- a/docs/man/xl.conf.5.pod
+++ b/docs/man/xl.conf.5.pod
@@ -81,13 +81,15 @@ Default: C
 
 Sets the default value for the C domain config value.
 
-Default: C<32> on hosts up to 16TB of memory, C<64> on hosts larger than 16TB
+Default: value of Xen command line B parameter (or its
+default value if unspecified).
 
 =item B
 
 Sets the default value for the C domain config value.
 
-Default: C<1024>
+Default: value of Xen command line B
+parameter (or its default value if unspecified).
 
 =item B
 
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 49b56fa1a3..a2a5d321c5 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -364,8 +364,8 @@
  */
 #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1
 
-#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32
-#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024
+#define LIBXL_MAX_GRANT_FRAMES_DEFAULT -1
+#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT -1
 
 /*
  * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 0546d7865a..63e29bb2fb 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -511,8 +511,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
 
 ("vnuma_nodes", 

Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Jan Beulich
On 28.11.2019 17:47, Jürgen Groß wrote:
> On 28.11.19 17:36, Jan Beulich wrote:
>> On 28.11.2019 14:58, Paul Durrant wrote:
>>> --- a/xen/common/grant_table.c
>>> +++ b/xen/common/grant_table.c
>>> @@ -84,11 +84,43 @@ struct grant_table {
>>>   struct grant_table_arch arch;
>>>   };
>>>   
>>> +static int __init parse_gnttab_limit(const char *param, const char *arg,
>>
>> No __init please here and below, for this being runtime option
>> parsing functions.
>>
>>> + unsigned int *valp)
>>> +{
>>> +const char *e;
>>> +unsigned long val;
>>> +
>>> +val = simple_strtoul(arg, , 0);
>>> +if ( *e )
>>> +return -EINVAL;
>>> +
>>> +if ( val <= INT_MAX )
>>> +*valp = val;
>>> +else
>>> +printk("parameter \"%s\" value \"%s\" is out of range; using value 
>>> \"%u\"\n",
>>> +   param, arg, *valp);
>>
>> Better store INT_MAX in this case rather than leaving the value
> 
> No, INT_MAX is no good idea. In case of this happening at boot time we'd
> allocate an array of 2 billion pointers for dom0...

But we've been asked for even more. We should let the admin shoot
itself in the foot, I think.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Jan Beulich
On 28.11.2019 17:42,  Durrant, Paul  wrote:
>> From: Jan Beulich 
>> Sent: 28 November 2019 16:37
>>
>> On 28.11.2019 14:58, Paul Durrant wrote:
>>> +val = simple_strtoul(arg, , 0);
>>> +if ( *e )
>>> +return -EINVAL;
>>> +
>>> +if ( val <= INT_MAX )
>>> +*valp = val;
>>> +else
>>> +printk("parameter \"%s\" value \"%s\" is out of range; using
>> value \"%u\"\n",
>>> +   param, arg, *valp);
>>
>> Better store INT_MAX in this case rather than leaving the value
>> unchanged? Or otherwise ...
>>
>>> +return 0;
>>
>> ... at least don't return success?
> 
> TBH I wasn't sure what the best thing to do was. In the end I opted
> for the warning and a successful completion as I thought a failure
> would be largely unhelpful. I can change this into an error though.

Well, if you return success, then the option should be handled in at
least a best effort manner, i.e. by storing INT_MAX as indicated.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Jürgen Groß

On 28.11.19 17:36, Jan Beulich wrote:

On 28.11.2019 14:58, Paul Durrant wrote:

--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -84,11 +84,43 @@ struct grant_table {
  struct grant_table_arch arch;
  };
  
+static int __init parse_gnttab_limit(const char *param, const char *arg,


No __init please here and below, for this being runtime option
parsing functions.


+ unsigned int *valp)
+{
+const char *e;
+unsigned long val;
+
+val = simple_strtoul(arg, , 0);
+if ( *e )
+return -EINVAL;
+
+if ( val <= INT_MAX )
+*valp = val;
+else
+printk("parameter \"%s\" value \"%s\" is out of range; using value 
\"%u\"\n",
+   param, arg, *valp);


Better store INT_MAX in this case rather than leaving the value


No, INT_MAX is no good idea. In case of this happening at boot time we'd
allocate an array of 2 billion pointers for dom0...


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 28 November 2019 16:37
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Anthony PERARD ;
> George Dunlap ; Roger Pau Monné
> ; Volodymyr Babchuk ;
> George Dunlap ; Ian Jackson
> ; Marek Marczykowski-Górecki
> ; Stefano Stabellini
> ; Konrad Rzeszutek Wilk ;
> Julien Grall ; Wei Liu 
> Subject: Re: [PATCH-for-4.13 v4] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> On 28.11.2019 14:58, Paul Durrant wrote:
> > --- a/xen/common/grant_table.c
> > +++ b/xen/common/grant_table.c
> > @@ -84,11 +84,43 @@ struct grant_table {
> >  struct grant_table_arch arch;
> >  };
> >
> > +static int __init parse_gnttab_limit(const char *param, const char
> *arg,
> 
> No __init please here and below, for this being runtime option
> parsing functions.
> 

Sorry, yes... forgot about the runtime part.

> > + unsigned int *valp)
> > +{
> > +const char *e;
> > +unsigned long val;
> > +
> > +val = simple_strtoul(arg, , 0);
> > +if ( *e )
> > +return -EINVAL;
> > +
> > +if ( val <= INT_MAX )
> > +*valp = val;
> > +else
> > +printk("parameter \"%s\" value \"%s\" is out of range; using
> value \"%u\"\n",
> > +   param, arg, *valp);
> 
> Better store INT_MAX in this case rather than leaving the value
> unchanged? Or otherwise ...
> 
> > +return 0;
> 
> ... at least don't return success?

TBH I wasn't sure what the best thing to do was. In the end I opted for the 
warning and a successful completion as I thought a failure would be largely 
unhelpful. I can change this into an error though.

> 
> > +}
> > +
> >  unsigned int __read_mostly opt_max_grant_frames = 64;
> > -integer_runtime_param("gnttab_max_frames", opt_max_grant_frames);
> > +
> > +static int __init parse_gnttab_max_frames(const char *arg)
> > +{
> > +return parse_gnttab_limit("gnttab_max_frames", arg,
> > +  _max_grant_frames);
> > +}
> > +custom_runtime_param("gnttab_max_frames", parse_gnttab_max_frames);
> >
> >  unsigned int __read_mostly opt_max_maptrack_frames = 1024;
> 
> As indicated this wants to become static now.

Sorry I forgot about that.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Ian Jackson
Jan Beulich writes ("Re: [PATCH-for-4.13 v4] Rationalize max_grant_frames and 
max_maptrack_frames handling"):
> On 28.11.2019 14:58, Paul Durrant wrote:
> >  uint32_t max_vcpus;
> >  uint32_t max_evtchn_port;
> > -uint32_t max_grant_frames;
> > -uint32_t max_maptrack_frames;
> > +int32_t max_grant_frames;
> > +int32_t max_maptrack_frames;
> 
> While this may want backporting aiui, we need to be a little
> careful with the interface change here.

A note here in a list discussion, or even in a commit message, is
perhaps not going to be very effective to deal with this.

How bad would it be to change the names as well as the types ?

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses

2019-11-28 Thread Jan Beulich
On 28.11.2019 12:44, Andrew Cooper wrote:
> c/s d0a699a389f1 "x86/monitor: add support for descriptor access events"
> introduced logic looking for what appeared to be exitinfo (not that this
> exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring
> information.  There is never any IDT vectoring involved in these intercepts so
> the value passed is always zero.
> 
> In fact, SVM doesn't provide any information, even in exitinfo1 and 2.  Note
> the error in the public API and state that this field is always 0, and drop
> the SVM logic in hvm_monitor_descriptor_access().
> 
> In the SVM vmexit handler itself, optimise the switch statement by observing
> that there is a linear transformation between the SVM exit_reason and
> VM_EVENT_DESC_* values.  (Bloat-o-meter reports 6028 => 5877 for a saving of
> 151 bytes).
> 
> Signed-off-by: Andrew Cooper 

SVM part of the change
Acked-by: Jan Beulich 

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Jan Beulich
On 28.11.2019 14:58, Paul Durrant wrote:
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -84,11 +84,43 @@ struct grant_table {
>  struct grant_table_arch arch;
>  };
>  
> +static int __init parse_gnttab_limit(const char *param, const char *arg,

No __init please here and below, for this being runtime option
parsing functions.

> + unsigned int *valp)
> +{
> +const char *e;
> +unsigned long val;
> +
> +val = simple_strtoul(arg, , 0);
> +if ( *e )
> +return -EINVAL;
> +
> +if ( val <= INT_MAX )
> +*valp = val;
> +else
> +printk("parameter \"%s\" value \"%s\" is out of range; using value 
> \"%u\"\n",
> +   param, arg, *valp);

Better store INT_MAX in this case rather than leaving the value
unchanged? Or otherwise ...

> +return 0;

... at least don't return success?

> +}
> +
>  unsigned int __read_mostly opt_max_grant_frames = 64;
> -integer_runtime_param("gnttab_max_frames", opt_max_grant_frames);
> +
> +static int __init parse_gnttab_max_frames(const char *arg)
> +{
> +return parse_gnttab_limit("gnttab_max_frames", arg,
> +  _max_grant_frames);
> +}
> +custom_runtime_param("gnttab_max_frames", parse_gnttab_max_frames);
>  
>  unsigned int __read_mostly opt_max_maptrack_frames = 1024;

As indicated this wants to become static now.

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -82,13 +82,15 @@ struct xen_domctl_createdomain {
>  uint32_t iommu_opts;
>  
>  /*
> - * Various domain limits, which impact the quantity of resources (global
> - * mapping space, xenheap, etc) a guest may consume.
> + * Various domain limits, which impact the quantity of resources
> + * (global mapping space, xenheap, etc) a guest may consume.  For
> + * max_grant_frames and max_maptrack_frames, < 0 means "use the
> + * default maximum value in the hypervisor".
>   */
>  uint32_t max_vcpus;
>  uint32_t max_evtchn_port;
> -uint32_t max_grant_frames;
> -uint32_t max_maptrack_frames;
> +int32_t max_grant_frames;
> +int32_t max_maptrack_frames;

While this may want backporting aiui, we need to be a little
careful with the interface change here.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
On 28.11.2019 15:30, Roger Pau Monné  wrote:
> On Thu, Nov 28, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
>> On 28.11.2019 15:13, Roger Pau Monné  wrote:
>>> On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
 On 28.11.2019 12:39, Roger Pau Monné wrote:
> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
>> At the time the pending EOI stack was introduced there were no
>> internally used IRQs which would have the LAPIC EOI issued from the
>> ->end() hook. This had then changed with the introduction of IOMMUs,
>> but the interaction issue was presumably masked by
>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>> running without need"]).
>>
>> The problem is that with us re-enabling interrupts across handler
>> invocation, a higher priority (guest) interrupt may trigger while
>> handling a lower priority (internal) one. The EOI issued from
>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>> EOI the higher priority (guest) interrupt, breaking (among other
>> things) pending EOI stack logic's assumptions.
>
> Maybe there's something that I'm missing, but shouldn't hypervisor
> vectors always be higher priority than guest ones?

 Depends - IOMMU ones imo aren't something that needs urgently
 dealing with, so a little bit of delay won't hurt. There would
 only be a problem if such interrupts could be deferred
 indefinitely.

> I see there's already a range reserved for high priority vectors
> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
> interrupts not using this range?

 We'd quickly run out of high priority vectors on systems with
 multiple (and perhaps indeed many) IOMMUs.
>>>
>>> Well, there's no limit on the number of high priority vectors, since
>>> this is all a software abstraction. It only matters that such vectors
>>> are higher than guest owned ones.
>>>
>>> I have to take a look, but I would think that Xen used vectors are the
>>> first ones to be allocated, and hence could start from
>>> FIRST_HIPRIORITY_VECTOR - 1 and go down from there.
>>
>> If this was the case, then we wouldn't have observed the issue (despite
>> it being there) this patch tries to address. The IOMMUs for both Andrew
>> and me ended up using vector 0x28, below everything that e.g. the
>> IO-APIC RTE got assigned.
> 
> I know it's not like that ATM, and hence I wonder whether it would be
> possible to make it so: Xen vectors get allocated down from
> FIRST_HIPRIORITY_VECTOR - 1 and then we won't have this issue.
> 
>> Also don't forget that we don't allocate
>> vectors continuously, but such that they'd get spread across the
>> different priority levels. (Whether that's an awfully good idea is a
>> separate question.)
> 
> Well, vectors used by Xen would be allocated downwards continuously
> from FIRST_HIPRIORITY_VECTOR - 1, and hence won't be spread.
> 
> Guest used vectors could continue to use the same allocation
> mechanism, since that's a different issue.

The issue would go away only if guest vectors are at strictly
lower priority than Xen ones. I.e. we'd need to go in steps of
16. And there aren't that many vectors ... (I'm happy to see
changes here, but it'll need to be very careful ones.)

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v2 0/3] x86/hvm: Multiple corrections to task switch handling

2019-11-28 Thread Jürgen Groß

On 26.11.19 13:03, Andrew Cooper wrote:

These patches want backporting due to the severity of patch 2.  They should
therefore be considered for 4.13 at this point.

Andrew Cooper (3):
   x86/vtx: Fix fault semantics for early task switch failures
   x86/svm: Always intercept ICEBP
   x86/svm: Write the correct %eip into the outgoing task

  xen/arch/x86/hvm/hvm.c|  4 +-
  xen/arch/x86/hvm/svm/emulate.c| 54 
  xen/arch/x86/hvm/svm/svm.c| 77 ++-
  xen/arch/x86/hvm/svm/vmcb.c   |  2 +-
  xen/arch/x86/hvm/vmx/vmx.c|  4 +-
  xen/arch/x86/monitor.c|  3 --
  xen/include/asm-x86/hvm/hvm.h | 13 +-
  xen/include/asm-x86/hvm/svm/emulate.h |  1 +
  8 files changed, 109 insertions(+), 49 deletions(-)



For the series:

Release-acked-by: Juergen Gross 


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Jürgen Groß

On 28.11.19 16:33, Hans van Kranenburg wrote:

On 11/28/19 3:54 PM, George Dunlap wrote:

On 11/27/19 10:32 PM, Hans van Kranenburg wrote:

Hi all,

On 11/27/19 12:13 PM, Durrant, Paul wrote:

-Original Message-
From: Ian Jackson 
Sent: 27 November 2019 11:10
[...]
Subject: RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames
and max_maptrack_frames handling

Durrant, Paul writes ("RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize
max_grant_frames and max_maptrack_frames handling"):

-Original Message-
From: Xen-devel  On Behalf Of

Ian

Jackson
I have seen reports of users who ran out of grant/maptrack frames
because of updates to use multiring protocols etc.  The error messages
are not very good and the recommended workaround has been to increase
the default limit on the hypervisor command line.

It is important that we don't break that workaround!


Alas it has apparently been broken for several releases now :-(


I guess at least in Debian (where I have seen this) we haven't
released with any affected versions yet...


I believe the problem was introduce in 4.10, so I think it would be prudent to 
also back-port the final fix to stable trees from then on.


Yes, the max grant frame issue has historically always been a painful
experience for end users, and Xen 4.11 which we now have in the current
Debian stable has made it worse compared to previous versions indeed.


This rather suggests that the default value isn't very well chosen.
Ideally some investigation would be done to improve the default sizing;
end-users shouldn't have to know anything about grant table frames.


Most of the problems started happening a few years ago when using a
newer Linux that got all kinds of multiqueue block stuff for disk and
network enabled on top of an older Xen. (e.g. in Debian using the Linux
4.9 backports kernel on top of Xen 4.4 in Jessie).

The default for the hypervisor option has already been doubled from 32
to 64, which I think is sufficient. However, having the toolstack revert
it back to 32 again is not very helpful, but that's what this thread is
about to solve. :)

A while ago I did some testing:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=880554#119

I haven't been able to cause nr_frames to go over 64 in any test myself,
and also have never seen values that high in production use. The above
debian bug also does not contain any other report from anyone with a
number above 64. There are reports of users setting it to 256 and then
not caring about it any more, but they didn't report the xen_diag output
back after that, so there's no real data.


I have seen guests needing 256.

My Linux kernel patches reducing the default max. number of queues in
netfront/netback to 8 made things much better (on a large host running
a guest with 64 vcpus using 8 network interfaces was blowing up rather
fast).


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Hans van Kranenburg
On 11/28/19 3:54 PM, George Dunlap wrote:
> On 11/27/19 10:32 PM, Hans van Kranenburg wrote:
>> Hi all,
>>
>> On 11/27/19 12:13 PM, Durrant, Paul wrote:
 -Original Message-
 From: Ian Jackson 
 Sent: 27 November 2019 11:10
 [...]
 Subject: RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames
 and max_maptrack_frames handling

 Durrant, Paul writes ("RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize
 max_grant_frames and max_maptrack_frames handling"):
>> -Original Message-
>> From: Xen-devel  On Behalf Of
 Ian
>> Jackson
>> I have seen reports of users who ran out of grant/maptrack frames
>> because of updates to use multiring protocols etc.  The error messages
>> are not very good and the recommended workaround has been to increase
>> the default limit on the hypervisor command line.
>>
>> It is important that we don't break that workaround!
>
> Alas it has apparently been broken for several releases now :-(

 I guess at least in Debian (where I have seen this) we haven't
 released with any affected versions yet...
>>>
>>> I believe the problem was introduce in 4.10, so I think it would be prudent 
>>> to also back-port the final fix to stable trees from then on.
>>
>> Yes, the max grant frame issue has historically always been a painful
>> experience for end users, and Xen 4.11 which we now have in the current
>> Debian stable has made it worse compared to previous versions indeed.
> 
> This rather suggests that the default value isn't very well chosen.
> Ideally some investigation would be done to improve the default sizing;
> end-users shouldn't have to know anything about grant table frames.

Most of the problems started happening a few years ago when using a
newer Linux that got all kinds of multiqueue block stuff for disk and
network enabled on top of an older Xen. (e.g. in Debian using the Linux
4.9 backports kernel on top of Xen 4.4 in Jessie).

The default for the hypervisor option has already been doubled from 32
to 64, which I think is sufficient. However, having the toolstack revert
it back to 32 again is not very helpful, but that's what this thread is
about to solve. :)

A while ago I did some testing:
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=880554#119

I haven't been able to cause nr_frames to go over 64 in any test myself,
and also have never seen values that high in production use. The above
debian bug also does not contain any other report from anyone with a
number above 64. There are reports of users setting it to 256 and then
not caring about it any more, but they didn't report the xen_diag output
back after that, so there's no real data.

Hans

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Andrew Cooper
On 28/11/2019 14:00, Jan Beulich wrote:
> On 28.11.2019 14:53, Andrew Cooper wrote:
>> On 28/11/2019 12:10, Andrew Cooper wrote:
>>> On 28/11/2019 11:03, Jan Beulich wrote:
 Notes:

 - In principle we could get away without the check_eoi_deferral flag.
   I've introduced it just to make sure there's as little change as
   possible to unaffected paths.
 - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
   strictly necessary.
>>> I don't think the cpu_has_pending_apic_eoi() check is necessary.  It is
>>> checked at the head of end_nonmaskable_irq() as well.
>>>
>>> Similarly, I'm not sure that check_eoi_deferral is something that we'd
>>> want to introduce.
>>>
>>> I'll drop both of these and test, seeing as I have a repro of the problem.
>> Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so
>> I didn't), and dropping just check_eoi_deferral on its own definitely
>> breaks things.
>>
>> Given the 4.13 timeline, lets go with it in this form, seeing as it is
>> the version which had all of last night's worth of testing.
>>
>> Acked-by: Andrew Cooper 
>> Tested-by: Andrew Cooper 
> Thanks! I've taken note to produce a patch (if possible at all, given
> the results of your attempt) to remove the extra pieces again, ideally
> to go in pretty soon after the branching.

Just for morbid curiosity, this is what happens without the
check_eoi_deferral check:

[  383.845620] dracut-initqueue[302]: Warning: Could not boot.
[  383.892655] dracut-initqueue[302]: Warning:
/dev/disk/by-label/root-qomhwa does not exist
 Starting Dracut Emergency Shell...

dracut:/# [  439.282461] BUG: unable to handle kernel NULL pointer
dereference at 09a0
[  439.282773] PGD 0 P4D 0
[  439.282928] Oops:  [#1] SMP NOPTI
[  439.283088] CPU: 0 PID: 281 Comm: systemd-udevd Tainted: G  
O  4.19.0+1 #1
[  439.283337] Hardware name: Dell Inc. PowerEdge R415/0GXH08, BIOS
2.0.2 10/22/2012
[  439.283591] RIP: e030:mptscsih_remove+0x5b/0xb0 [mptscsih]
[  439.283751] Code: ff 74 1e 8b 83 9c 01 00 00 44 8d 2c c5 00 00 00 00
e8 c9 4d e9 c0 48 c7 83 c0 0c 00 00 00 00 00 00 f6 83 c8 00 00 00 01 75
34 <48> 8b bd a0
[  439.284153] RSP: e02b:c900404b7b50 EFLAGS: 00010246
[  439.284312] RAX: 0049 RBX: 88809a7d9000 RCX:
0006
[  439.284477] RDX:  RSI: 0001 RDI:

[  439.284644] RBP:  R08:  R09:
0384
[  439.284810] R10: 0004 R11:  R12:
88809a7cc000
[  439.284975] R13:  R14: c038b180 R15:
c900404b7e98
[  439.285149] FS:  7ff0922a38c0() GS:8880a360()
knlGS:
[  439.285428] CS:  e033 DS:  ES:  CR0: 80050033
[  439.285590] CR2: 09a0 CR3: 04012000 CR4:
0660
[  439.285765] Call Trace:
[  439.285924]  mptsas_probe+0x384/0x530 [mptsas]
[  439.286133]  pci_device_probe+0xc9/0x140
[  439.286297]  really_probe+0x238/0x3e0
[  439.286455]  driver_probe_device+0x115/0x130
[  439.286622]  __driver_attach+0x103/0x110
[  439.286805]  ? driver_probe_device+0x130/0x130
[  439.286964]  bus_for_each_dev+0x67/0xc0
[  439.287120]  bus_add_driver+0x41/0x260
[  439.287274]  ? 0xc038f000
[  439.287426]  driver_register+0x5b/0xe0
[  439.287578]  ? 0xc038f000
[  439.287732]  mptsas_init+0x114/0x1000 [mptsas]
[  439.287890]  do_one_initcall+0x4e/0x1d4
[  439.288045]  ? _cond_resched+0x15/0x30
[  439.288232]  ? kmem_cache_alloc_trace+0x15f/0x1c0
[  439.288392]  do_init_module+0x5a/0x21b
[  439.288546]  load_module+0x2270/0x2800
[  439.288700]  ? m_show+0x1c0/0x1c0
[  439.288854]  __do_sys_finit_module+0x94/0xe0
[  439.289012]  do_syscall_64+0x4e/0x100
[  439.289168]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  439.289326] RIP: 0033:0x7ff090ef9ec9
[  439.289482] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f8
[  439.289871] RSP: 002b:7ffd96d355d8 EFLAGS: 0246 ORIG_RAX:
0139
[  439.290120] RAX: ffda RBX: 5581851f9f70 RCX:
7ff090ef9ec9
[  439.290287] RDX:  RSI: 7ff09181a099 RDI:
000c
[  439.290458] RBP: 7ff09181a099 R08:  R09:
5581851f6670
[  439.290625] R10: 000c R11: 0246 R12:

[  439.290792] R13: 5581851f6240 R14: 0002 R15:

[  439.290993] Modules linked in: ata_generic pata_acpi ohci_pci ahci
serio_raw pata_atiixp libahci mptsas(+) scsi_transport_sas ehci_pci
bnx2(O) libata mptscst
[  439.291484] CR2: 09a0
[  439.291665] ---[ end trace 519b6f631d8509d1 ]---
[  439.295635] RIP: e030:mptscsih_remove+0x5b/0xb0 [mptscsih]
[  439.295804] Code: ff 74 1e 8b 83 9c 01 00 00 44 8d 2c c5 00 00 00 00
e8 c9 4d e9 c0 48 c7 83 c0 0c 00 00 00 00 00 00 f6 83 

Re: [Xen-devel] [PATCH LP-BUILD_TOOLS] Fix building with updated ENFORCE_UNIQUE_SYMBOLS behaviour

2019-11-28 Thread Sergey Dyasli
On 28/11/2019 14:51, Ross Lagerwall wrote:
> The patch "build: provide option to disambiguate symbol names" changes
> ENFORCE_UNIQUE_SYMBOLS so that gcc generates output to a temporary file
> and then objcopy is used to create the final object file. This breaks
> livepatch-build's interposition of GCC to capture the changed object
> files so intercept calls to objcopy as well to capture the final object
> files.
> 
> While in the area, add a couple of extra object files to be ignored when
> patching.
> 
> Signed-off-by: Ross Lagerwall 
> ---
> 
> With this change, I've built and successfully applied a trivial
> livepatch with Jan's patch applied and ENFORCE_UNIQUE_SYMBOLS turned on.

Can confirm the same with my test LP. So

Tested-by: Sergey Dyasli 

can go to both this and Jan's patch I guess.

--
Thanks,
Sergey

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses

2019-11-28 Thread Adrian Pop

Hello,

Andrew Cooper  writes:

> c/s d0a699a389f1 "x86/monitor: add support for descriptor access events"
> introduced logic looking for what appeared to be exitinfo (not that this
> exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring
> information.  There is never any IDT vectoring involved in these intercepts so
> the value passed is always zero.
>
> In fact, SVM doesn't provide any information, even in exitinfo1 and 2.  Note
> the error in the public API and state that this field is always 0, and drop
> the SVM logic in hvm_monitor_descriptor_access().
>
> In the SVM vmexit handler itself, optimise the switch statement by observing
> that there is a linear transformation between the SVM exit_reason and
> VM_EVENT_DESC_* values.  (Bloat-o-meter reports 6028 => 5877 for a saving of
> 151 bytes).
>
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Razvan Cojocaru 
> CC: Tamas K Lengyel 
> CC: Alexandru Isaila 
> CC: Petre Pircalabu 
> CC: Adrian Pop 
>
> Adrian: Do you recall what information you were attempting to forward from the
> VMCB?  I can't locate anything which would plausibly be interesting.

The SVM part was most likely meant to mirror the logic from VMX.  But,
as I recall, monitoring hadn't been implemented with SVM so this
couldn't really be used on AMD.  Unfortunately I'm not sure what
information was supposed to be forwarded.

The cleanup looks good to me.

Acked-by: Adrian Pop 

> This is part of a longer cleanup series I gathered in the wake of the task
> switch issues, but it is pulled out ahead due to its interaction with the
> public interface.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread Wei Liu
On Thu, Nov 28, 2019 at 01:10:05PM +, Andrew Cooper wrote:
> On 28/11/2019 10:17, George Dunlap wrote:
> >> On Nov 28, 2019, at 9:55 AM, Jan Beulich  wrote:
> > Has anyone actually tried building a livepatch with this change in 
> > place?
>  Actually - what is your concern here? The exact spelling of symbols
>  names should be of no interest to the tool. After all the compiler is
>  free to invent all sorts of names for its local symbols, including
>  the ones we would produce with this change in place. All the tool
>  cares about is that the names be unambiguous. Hence any (theoretical)
>  regression here would be a bug in the tools, which imo is no reason
>  to delay this change any further. (Granted I should have got to it
>  earlier, but it had been continuing to get deferred.)
> >>> This might all be true (theoretically).
> >>>
> >>> The livepatch build tools are fragile and very sensitive to how the
> >>> object files are laid out.  There is a very real risk that this change
> >>> accidentally breaks livepatching totally, even on GCC builds.
> >>>
> >>> Were this to happen, it would be yet another 4.13 regression.
> >> It's perhaps a matter of perception, but I'd still call this a
> >> live patching tools bug then, not a 4.13 regression.
> > After the discussion yesterday, I was thinking a bit more about this, and 
> > I’m not happy with the principle Andy seems to be operating on,
> 
> I'm sorry that you feel that way.
> 
> > that it’s reasonable to completely block a bug-fixing patch to Xen, forcing 
> > a work-around to be used in a release, because it has unknown effects on 
> > livepatching.
> 
> This is not a fair characterisation of what is going on here.  Ignore
> the specifics of this patch - they are not relevant to my objection.
> 
> As a maintainer, it is my responsibility to ensure that crap doesn't get
> committed.
> 

It is fine to have differing opinions; it is not fine to make an
emotionally charged comment like this. It serves no-one's interest.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread George Dunlap
On 11/27/19 10:32 PM, Hans van Kranenburg wrote:
> Hi all,
> 
> On 11/27/19 12:13 PM, Durrant, Paul wrote:
>>> -Original Message-
>>> From: Ian Jackson 
>>> Sent: 27 November 2019 11:10
>>> [...]
>>> Subject: RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames
>>> and max_maptrack_frames handling
>>>
>>> Durrant, Paul writes ("RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize
>>> max_grant_frames and max_maptrack_frames handling"):
> -Original Message-
> From: Xen-devel  On Behalf Of
>>> Ian
> Jackson
> I have seen reports of users who ran out of grant/maptrack frames
> because of updates to use multiring protocols etc.  The error messages
> are not very good and the recommended workaround has been to increase
> the default limit on the hypervisor command line.
>
> It is important that we don't break that workaround!

 Alas it has apparently been broken for several releases now :-(
>>>
>>> I guess at least in Debian (where I have seen this) we haven't
>>> released with any affected versions yet...
>>
>> I believe the problem was introduce in 4.10, so I think it would be prudent 
>> to also back-port the final fix to stable trees from then on.
> 
> Yes, the max grant frame issue has historically always been a painful
> experience for end users, and Xen 4.11 which we now have in the current
> Debian stable has made it worse compared to previous versions indeed.

This rather suggests that the default value isn't very well chosen.
Ideally some investigation would be done to improve the default sizing;
end-users shouldn't have to know anything about grant table frames.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH LP-BUILD_TOOLS] Fix building with updated ENFORCE_UNIQUE_SYMBOLS behaviour

2019-11-28 Thread Ross Lagerwall
The patch "build: provide option to disambiguate symbol names" changes
ENFORCE_UNIQUE_SYMBOLS so that gcc generates output to a temporary file
and then objcopy is used to create the final object file. This breaks
livepatch-build's interposition of GCC to capture the changed object
files so intercept calls to objcopy as well to capture the final object
files.

While in the area, add a couple of extra object files to be ignored when
patching.

Signed-off-by: Ross Lagerwall 
---

With this change, I've built and successfully applied a trivial
livepatch with Jan's patch applied and ENFORCE_UNIQUE_SYMBOLS turned on.

To be applied at the same time as Jan's patch.

 livepatch-gcc | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/livepatch-gcc b/livepatch-gcc
index 01e4b8c..91333d5 100755
--- a/livepatch-gcc
+++ b/livepatch-gcc
@@ -26,6 +26,7 @@ declare -a args=("$@")
 keep=no
 
 declare -r GCC_RE='gcc.*'
+declare -r OBJCOPY_RE='objcopy.*'
 if [[ "$TOOLCHAINCMD" =~ $GCC_RE ]] ; then
 while [ "$#" -gt 0 ]; do
 if [ "$1" = "-o" ]; then
@@ -34,7 +35,9 @@ if [[ "$TOOLCHAINCMD" =~ $GCC_RE ]] ; then
 case "$obj" in
 version.o|\
 debug.o|\
+efi/check.o|\
 *.xen-syms.*.o|\
+*.xen.efi.*.o|\
 built_in.o|\
 prelink.o|\
 .*.o)
@@ -56,6 +59,25 @@ if [[ "$TOOLCHAINCMD" =~ $GCC_RE ]] ; then
 fi
 shift
 done
+elif [[ "$TOOLCHAINCMD" =~ $OBJCOPY_RE ]] ; then
+obj="${!#}"
+case "$obj" in
+version.o|\
+debug.o|\
+efi/check.o|\
+.*.o)
+;;
+*.o)
+path="$(pwd)/$(dirname $obj)"
+dir="${path#$LIVEPATCH_BUILD_DIR}"
+if [ -n "$LIVEPATCH_CAPTURE_DIR" -a -d "$LIVEPATCH_CAPTURE_DIR" ]; 
then
+echo "$dir/$obj" >> "${LIVEPATCH_CAPTURE_DIR}/changed_objs"
+keep=yes
+fi
+;;
+*)
+;;
+esac
 fi
 
 "$TOOLCHAINCMD" "${args[@]}"
-- 
2.21.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread Ross Lagerwall
On 11/28/19 1:57 PM, Jan Beulich wrote:
> On 28.11.2019 14:54, Ross Lagerwall wrote:
>> On 11/28/19 1:49 PM, Jan Beulich wrote:
>>> On 28.11.2019 13:15, Sergey Dyasli wrote:
 Applying the patch didn't end up well for my test LP (from another thread):

 Perform full initial build with 8 CPU(s)...
 Reading special section data
 Apply patch and build with 8 CPU(s)...
 Unapply patch and build with 8 CPU(s)...
 Extracting new and modified ELF sections...
 Processing xen/arch/x86/mm/shadow/guest_2.o
 Processing xen/arch/x86/mm/shadow/guest_4.o
 Processing xen/arch/x86/mm/shadow/guest_3.o
 Processing xen/arch/x86/mm/guest_walk_3.o
 Processing xen/arch/x86/mm/hap/guest_walk_3level.o
 Processing xen/arch/x86/mm/hap/guest_walk_4level.o
 Processing xen/arch/x86/mm/hap/guest_walk_2level.o
 Processing xen/arch/x86/mm/guest_walk_2.o
 Processing xen/arch/x86/mm/guest_walk_4.o
 Processing xen/arch/x86/efi/efi/check.o
 Processing xen/arch/x86/pv/gpr_switch.o
 Processing xen/arch/x86/indirect-thunk.o
 Processing xen/arch/x86/boot/head.o
 Processing xen/arch/x86/x86_64/kexec_reloc.o
 Processing xen/arch/x86/x86_64/compat/entry.o
 Processing xen/arch/x86/x86_64/entry.o
 Processing xen/arch/x86/hvm/vmx/entry.o
 Processing xen/arch/x86/hvm/svm/entry.o
 Processing 
 xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0s.o
 Processing 
 xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0r.o
 Processing 
 xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1s.o
 Processing 
 xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1r.o
 ERROR: no functional changes found.

 So this looks like a regression.
>>>
>>> Thanks for doing the testing. But what am I to conclude from
>>> the above? I can't even tell why "no functional changes found"
>>> is an error.
>>>
>>
>> It's due to the way livepatch-build tool interposes on the build to capture
>> changed object files for later comparison.  Now that objcopy writes out the
>> proper object files rather than gcc (which just writes a temporary one), the
>> livepatch-build tool needs some adjustment otherwise it doesn't capture any
>> changed files. I'm working on a patch.
> 
> For my own education, and just if you have the time: Why would there
> be any dependency on which build utility produces the object file?
> 

It uses CROSS_COMPILE to funnel all build output to a script
(https://xenbits.xen.org/gitweb/?p=livepatch-build-tools.git;a=blob;f=livepatch-gcc)
which then notes changed object files by processing gcc's command-line.

If objcopy is used instead of gcc to produce the final output then the script
processes gcc's command-line but doesn't get the output it expects so no
changes are detected.

Yes, this is hacky -- improvements are welcome!

-- 
Ross Lagerwall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/3] arm64: remove uaccess_ttbr0 asm macros from cache functions

2019-11-28 Thread Mark Rutland
On Wed, Nov 27, 2019 at 01:44:52PM -0500, Pavel Tatashin wrote:
> We currently duplicate the logic to enable/disable uaccess via TTBR0,
> with C functions and assembly macros. This is a maintenenace burden
> and is liable to lead to subtle bugs, so let's get rid of the assembly
> macros, and always use the C functions. This requires refactoring
> some assembly functions to have a C wrapper.

[...]

> +static inline int invalidate_icache_range(unsigned long start,
> +   unsigned long end)
> +{
> + int rv;

Please make this 'ret', for consistency with other arm64 code. We only
use 'rv' in one place where it's short for "Revision and Variant", and
'ret' is our usual name for a temporary variable used to hold a return
value.

> +
> + if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC)) {
> + isb();
> + return 0;
> + }
> + uaccess_ttbr0_enable();

Please place a newline between these two, for consistency with other
arm64 code.

> + rv = asm_invalidate_icache_range(start, end);
> + uaccess_ttbr0_disable();
> +
> + return rv;
> +}
> +
>  static inline void flush_icache_range(unsigned long start, unsigned long end)
>  {
>   __flush_icache_range(start, end);
> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index db767b072601..a48b6dba304e 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -15,7 +15,7 @@
>  #include 
>  
>  /*
> - *   flush_icache_range(start,end)
> + *   __asm_flush_icache_range(start,end)
>   *
>   *   Ensure that the I and D caches are coherent within specified region.
>   *   This is typically used when code has been written to a memory region,
> @@ -24,11 +24,11 @@
>   *   - start   - virtual start address of region
>   *   - end - virtual end address of region
>   */
> -ENTRY(__flush_icache_range)
> +ENTRY(__asm_flush_icache_range)
>   /* FALLTHROUGH */
>  
>  /*
> - *   __flush_cache_user_range(start,end)
> + *   __asm_flush_cache_user_range(start,end)
>   *
>   *   Ensure that the I and D caches are coherent within specified region.
>   *   This is typically used when code has been written to a memory region,
> @@ -37,8 +37,7 @@ ENTRY(__flush_icache_range)
>   *   - start   - virtual start address of region
>   *   - end - virtual end address of region
>   */
> -ENTRY(__flush_cache_user_range)
> - uaccess_ttbr0_enable x2, x3, x4
> +ENTRY(__asm_flush_cache_user_range)
>  alternative_if ARM64_HAS_CACHE_IDC

It's unfortunate that we pulled the IDC alternative out for
invalidate_icache_range(), but not here.

If we want to pull out that, then I reckon what we might want to do is
have two asm primitives:

* __asm_clean_dcache_range
* __asm_invalidate_icache_range

... which just do the by_line op, with a fixup that can return -EFAULT.

Then we can give each a C wrapper that just does the IDC/DIC check, e.g.

static int __clean_dcache_range(unsigned long start, unsigned long end)
{
if (cpus_have_const_cap(ARM64_HAS_CACHE_IDC)) {
dsb(ishst);
return 0;
}

return __asm_clean_dcache_range(start, end);
}

Then we can build all the more complicated variants atop of those, e.g.

static int __flush_cache_user_range(unsigned long start, unsigned long end)
{
int ret;

uaccess_ttbr0_enable();

ret = __clean_dcache_range(start, end);
if (ret)
goto out;

ret = __invalidate_icache_range(start, end);

out:
uaccess_ttbr0_disable();
return ret;
}

Thanks,
Mark.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses

2019-11-28 Thread Alexandru Stefan ISAILA


On 28.11.2019 13:44, Andrew Cooper wrote:
> c/s d0a699a389f1 "x86/monitor: add support for descriptor access events"
> introduced logic looking for what appeared to be exitinfo (not that this
> exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring
> information.  There is never any IDT vectoring involved in these intercepts so
> the value passed is always zero.
> 
> In fact, SVM doesn't provide any information, even in exitinfo1 and 2.  Note
> the error in the public API and state that this field is always 0, and drop
> the SVM logic in hvm_monitor_descriptor_access().
> 
> In the SVM vmexit handler itself, optimise the switch statement by observing
> that there is a linear transformation between the SVM exit_reason and
> VM_EVENT_DESC_* values.  (Bloat-o-meter reports 6028 => 5877 for a saving of
> 151 bytes).
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Alexandru Isaila 

I agree with Tamas, good thing to have that field removed.
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Roger Pau Monné
On Thu, Nov 28, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
> On 28.11.2019 15:13, Roger Pau Monné  wrote:
> > On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
> >> On 28.11.2019 12:39, Roger Pau Monné wrote:
> >>> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
>  At the time the pending EOI stack was introduced there were no
>  internally used IRQs which would have the LAPIC EOI issued from the
>  ->end() hook. This had then changed with the introduction of IOMMUs,
>  but the interaction issue was presumably masked by
>  irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>  (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>  running without need"]).
> 
>  The problem is that with us re-enabling interrupts across handler
>  invocation, a higher priority (guest) interrupt may trigger while
>  handling a lower priority (internal) one. The EOI issued from
>  ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>  EOI the higher priority (guest) interrupt, breaking (among other
>  things) pending EOI stack logic's assumptions.
> >>>
> >>> Maybe there's something that I'm missing, but shouldn't hypervisor
> >>> vectors always be higher priority than guest ones?
> >>
> >> Depends - IOMMU ones imo aren't something that needs urgently
> >> dealing with, so a little bit of delay won't hurt. There would
> >> only be a problem if such interrupts could be deferred
> >> indefinitely.
> >>
> >>> I see there's already a range reserved for high priority vectors
> >>> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
> >>> interrupts not using this range?
> >>
> >> We'd quickly run out of high priority vectors on systems with
> >> multiple (and perhaps indeed many) IOMMUs.
> > 
> > Well, there's no limit on the number of high priority vectors, since
> > this is all a software abstraction. It only matters that such vectors
> > are higher than guest owned ones.
> > 
> > I have to take a look, but I would think that Xen used vectors are the
> > first ones to be allocated, and hence could start from
> > FIRST_HIPRIORITY_VECTOR - 1 and go down from there.
> 
> If this was the case, then we wouldn't have observed the issue (despite
> it being there) this patch tries to address. The IOMMUs for both Andrew
> and me ended up using vector 0x28, below everything that e.g. the
> IO-APIC RTE got assigned.

I know it's not like that ATM, and hence I wonder whether it would be
possible to make it so: Xen vectors get allocated down from
FIRST_HIPRIORITY_VECTOR - 1 and then we won't have this issue.

> Also don't forget that we don't allocate
> vectors continuously, but such that they'd get spread across the
> different priority levels. (Whether that's an awfully good idea is a
> separate question.)

Well, vectors used by Xen would be allocated downwards continuously
from FIRST_HIPRIORITY_VECTOR - 1, and hence won't be spread.

Guest used vectors could continue to use the same allocation
mechanism, since that's a different issue.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable test] 144344: tolerable FAIL - PUSHED

2019-11-28 Thread osstest service owner
flight 144344 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144344/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 16 guest-localmigrate   fail REGR. vs. 144323

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 144323
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144323
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144323
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 144323
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144323
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 144323
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail  like 144323
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144323
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 144323
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144323
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass

version targeted for testing:
 xen  9a400d1797ec7f77ffefeb5c4e17a8c2e8b91a12
baseline version:
 xen  5530782cfe70ed22fe44358f6a10c38916443b42

Last test of basis   144323  2019-11-27 12:02:43 Z1 days
Testing same since   144344  2019-11-28 02:00:09 Z0 days1 attempts


People who touched revisions under test:
  George Dunlap 
  Ian Jackson 
  Igor Druzhinin 
  Jan Beulich 
  Julien 

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread George Dunlap
On 11/28/19 1:10 PM, Andrew Cooper wrote:
> On 28/11/2019 10:17, George Dunlap wrote:
>>> On Nov 28, 2019, at 9:55 AM, Jan Beulich  wrote:
>> Has anyone actually tried building a livepatch with this change in place?
> Actually - what is your concern here? The exact spelling of symbols
> names should be of no interest to the tool. After all the compiler is
> free to invent all sorts of names for its local symbols, including
> the ones we would produce with this change in place. All the tool
> cares about is that the names be unambiguous. Hence any (theoretical)
> regression here would be a bug in the tools, which imo is no reason
> to delay this change any further. (Granted I should have got to it
> earlier, but it had been continuing to get deferred.)
 This might all be true (theoretically).

 The livepatch build tools are fragile and very sensitive to how the
 object files are laid out.  There is a very real risk that this change
 accidentally breaks livepatching totally, even on GCC builds.

 Were this to happen, it would be yet another 4.13 regression.
>>> It's perhaps a matter of perception, but I'd still call this a
>>> live patching tools bug then, not a 4.13 regression.
>> After the discussion yesterday, I was thinking a bit more about this, and 
>> I’m not happy with the principle Andy seems to be operating on,
> 
> I'm sorry that you feel that way.
> 
>> that it’s reasonable to completely block a bug-fixing patch to Xen, forcing 
>> a work-around to be used in a release, because it has unknown effects on 
>> livepatching.
> 
> This is not a fair characterisation of what is going on here.  Ignore
> the specifics of this patch - they are not relevant to my objection.
> 
> As a maintainer, it is my responsibility to ensure that crap doesn't get
> committed.

Jan's patch is not crap; this is out of line.

>> And I do not consider it my responsibility to
>> do regression tests of the live patching tools.
> 
> Yes.  Yes it really is, when you're making a material change
> specifically to this area, with a high chance of adverse impact.

Then it looks like we need to have a wider discussion about this,
because my answer is, "No, not it's really not."

And I don't think you would think so either, except that you happen to
use livepatching.  Imagine if you posted a patch trying to fix nested
HVM, and out of the blue Olaf turned up and said, "Have you tested this
with hypervisor paging?"  And when you said, "No, I have no idea how to
test that", Jan simply blocked the patch indefinitely?  You would be
angry, and rightly so.

As a general principle, the cost of features should be borne by the
people who use them.  That includes the cost of determining
authoritatively whether a change is safe or not -- either by review, or
by doing manual testing, or by having automated tests in place to flag
up issues.

If you had questions about the patch, *you*, with your developer hat on,
should have either done testing, or arranged for someone who uses it
regularly to do testing to make sure it works.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread Jan Beulich
On 28.11.2019 10:55, Jan Beulich wrote:
> On 27.11.2019 19:17, Andrew Cooper wrote:
>> On 21/11/2019 08:34, Jan Beulich wrote:
>>> On 20.11.2019 18:13, Andrew Cooper wrote:
 On 20/11/2019 16:40, Jürgen Groß wrote:
> On 20.11.19 17:30, Jan Beulich wrote:
>> On 08.11.2019 12:18, Jan Beulich wrote:
>>> The .file assembler directives generated by the compiler do not include
>>> any path components (gcc) or just the ones specified on the command
>>> line
>>> (clang, at least version 5), and hence multiple identically named
>>> source
>>> files (in different directories) may produce identically named static
>>> symbols (in their kallsyms representation). The binary diffing
>>> algorithm
>>> used by xen-livepatch, however, depends on having unique symbols.
>>>
>>> Make the ENFORCE_UNIQUE_SYMBOLS Kconfig option control the (build)
>>> behavior, and if enabled use objcopy to prepend the (relative to the
>>> xen/ subdirectory) path to the compiler invoked STT_FILE symbols. Note
>>> that this build option is made no longer depend on LIVEPATCH, but
>>> merely
>>> defaults to its setting now.
>>>
>>> Conditionalize explicit .file directive insertion in C files where it
>>> exists just to disambiguate names in a less generic manner; note that
>>> at the same time the redundant emission of STT_FILE symbols gets
>>> suppressed for clang. Assembler files as well as multiply compiled C
>>> ones using __OBJECT_FILE__ are left alone for the time being.
>>>
>>> Since we now expect there not to be any duplicates anymore, also don't
>>> force the selection of the option to 'n' anymore in allrandom.config.
>>> Similarly COVERAGE no longer suppresses duplicate symbol warnings if
>>> enforcement is in effect, which in turn allows
>>> SUPPRESS_DUPLICATE_SYMBOL_WARNINGS to simply depend on
>>> !ENFORCE_UNIQUE_SYMBOLS.
>>>
>>> Signed-off-by: Jan Beulich 
>> I've got acks from Konrad and Wei, but still need an x86 and a release
>> one here. Andrew? Or alternatively - Jürgen, would you rather not see
>> this go in anymore?
> In case the needed x86 Ack is coming in before RC3 I'm fine to give my
> Release-ack, but I'm hesitant to take it later.
 Has anyone actually tried building a livepatch with this change in place?
>>> Actually - what is your concern here? The exact spelling of symbols
>>> names should be of no interest to the tool. After all the compiler is
>>> free to invent all sorts of names for its local symbols, including
>>> the ones we would produce with this change in place. All the tool
>>> cares about is that the names be unambiguous. Hence any (theoretical)
>>> regression here would be a bug in the tools, which imo is no reason
>>> to delay this change any further. (Granted I should have got to it
>>> earlier, but it had been continuing to get deferred.)
>>
>> This might all be true (theoretically).
>>
>> The livepatch build tools are fragile and very sensitive to how the
>> object files are laid out.  There is a very real risk that this change
>> accidentally breaks livepatching totally, even on GCC builds.
>>
>> Were this to happen, it would be yet another 4.13 regression.
> 
> It's perhaps a matter of perception, but I'd still call this a
> live patching tools bug then, not a 4.13 regression.
> 
>> This is a change to fix a concrete livepatch issue with Clang.  Sure -
>> it resolves the symbol uniqueness failures for the in-tree build, but
>> considering the risks to the area you are modifying, the fact you
>> haven't even done a dev test of a livepatch build on GCC means that the
>> patch as a whole has not had what I would consider a reasonable amount
>> of testing.
> 
> While Clang is the primary area where we need this change, it is
> in no way limited to that environment. Gcc can, at any time,
> start triggering the issue again as well - both because of changes
> to the compiler itself, or because of (perhaps seemingly innocent)
> changes we do to Xen. I can't imagine the workaround for this
> would be to make it impossible altogether to select LIVEPATCH=y.

And indeed - on my box with gcc 4.3, a full fresh build fails with
two duplicate symbols. Roger's workaround therefore is not enough
in any event for 4.13, unless we want to - in the last minute -
raise the bar of what gcc versions we claim compatibility with.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
On 28.11.2019 15:13, Roger Pau Monné  wrote:
> On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
>> On 28.11.2019 12:39, Roger Pau Monné wrote:
>>> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
 At the time the pending EOI stack was introduced there were no
 internally used IRQs which would have the LAPIC EOI issued from the
 ->end() hook. This had then changed with the introduction of IOMMUs,
 but the interaction issue was presumably masked by
 irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
 (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
 running without need"]).

 The problem is that with us re-enabling interrupts across handler
 invocation, a higher priority (guest) interrupt may trigger while
 handling a lower priority (internal) one. The EOI issued from
 ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
 EOI the higher priority (guest) interrupt, breaking (among other
 things) pending EOI stack logic's assumptions.
>>>
>>> Maybe there's something that I'm missing, but shouldn't hypervisor
>>> vectors always be higher priority than guest ones?
>>
>> Depends - IOMMU ones imo aren't something that needs urgently
>> dealing with, so a little bit of delay won't hurt. There would
>> only be a problem if such interrupts could be deferred
>> indefinitely.
>>
>>> I see there's already a range reserved for high priority vectors
>>> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
>>> interrupts not using this range?
>>
>> We'd quickly run out of high priority vectors on systems with
>> multiple (and perhaps indeed many) IOMMUs.
> 
> Well, there's no limit on the number of high priority vectors, since
> this is all a software abstraction. It only matters that such vectors
> are higher than guest owned ones.
> 
> I have to take a look, but I would think that Xen used vectors are the
> first ones to be allocated, and hence could start from
> FIRST_HIPRIORITY_VECTOR - 1 and go down from there.

If this was the case, then we wouldn't have observed the issue (despite
it being there) this patch tries to address. The IOMMUs for both Andrew
and me ended up using vector 0x28, below everything that e.g. the
IO-APIC RTE got assigned. Also don't forget that we don't allocate
vectors continuously, but such that they'd get spread across the
different priority levels. (Whether that's an awfully good idea is a
separate question.)

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Roger Pau Monné
On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote:
> On 28.11.2019 12:39, Roger Pau Monné wrote:
> > On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
> >> At the time the pending EOI stack was introduced there were no
> >> internally used IRQs which would have the LAPIC EOI issued from the
> >> ->end() hook. This had then changed with the introduction of IOMMUs,
> >> but the interaction issue was presumably masked by
> >> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
> >> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
> >> running without need"]).
> >>
> >> The problem is that with us re-enabling interrupts across handler
> >> invocation, a higher priority (guest) interrupt may trigger while
> >> handling a lower priority (internal) one. The EOI issued from
> >> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
> >> EOI the higher priority (guest) interrupt, breaking (among other
> >> things) pending EOI stack logic's assumptions.
> > 
> > Maybe there's something that I'm missing, but shouldn't hypervisor
> > vectors always be higher priority than guest ones?
> 
> Depends - IOMMU ones imo aren't something that needs urgently
> dealing with, so a little bit of delay won't hurt. There would
> only be a problem if such interrupts could be deferred
> indefinitely.
> 
> > I see there's already a range reserved for high priority vectors
> > ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
> > interrupts not using this range?
> 
> We'd quickly run out of high priority vectors on systems with
> multiple (and perhaps indeed many) IOMMUs.

Well, there's no limit on the number of high priority vectors, since
this is all a software abstraction. It only matters that such vectors
are higher than guest owned ones.

I have to take a look, but I would think that Xen used vectors are the
first ones to be allocated, and hence could start from
FIRST_HIPRIORITY_VECTOR - 1 and go down from there. This way we will
end up with high priority vectors first, Xen used vectors, and
finally guest vectors.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses

2019-11-28 Thread Razvan COJOCARU
On 11/28/19 1:44 PM, Andrew Cooper wrote:
> c/s d0a699a389f1 "x86/monitor: add support for descriptor access events"
> introduced logic looking for what appeared to be exitinfo (not that this
> exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring
> information.  There is never any IDT vectoring involved in these intercepts so
> the value passed is always zero.
> 
> In fact, SVM doesn't provide any information, even in exitinfo1 and 2.  Note
> the error in the public API and state that this field is always 0, and drop
> the SVM logic in hvm_monitor_descriptor_access().
> 
> In the SVM vmexit handler itself, optimise the switch statement by observing
> that there is a linear transformation between the SVM exit_reason and
> VM_EVENT_DESC_* values.  (Bloat-o-meter reports 6028 => 5877 for a saving of
> 151 bytes).
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Razvan Cojocaru 
> CC: Tamas K Lengyel 
> CC: Alexandru Isaila 
> CC: Petre Pircalabu 
> CC: Adrian Pop 
> 
> Adrian: Do you recall what information you were attempting to forward from the
> VMCB?  I can't locate anything which would plausibly be interesting.

I think it's safe to go the route you're going (you shouldn't break 
anything).

Acked-by: Razvan Cojocaru 

(with or without addressing Tamas' comments).


Thanks,
Razvan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide

2019-11-28 Thread Lars Kurth


On 28/11/2019, 07:37, "Jan Beulich"  wrote:

On 28.11.2019 14:06, Lars Kurth wrote:
> I can certainly add something on the timing , along the lines of
> * For complex series, consider the time it takes to do reviews (maybe 
with a guide of LOC per hour) and give reviewers enough time to
> * For series with design issues or large questions, try and highlight the 
key open issues in cover letters clearly and solicit feedback from key 
maintainers who can comment on the open issue. The idea is to save both the 
contributor and the reviewers time by focussing on what needs to be resolved 
> * Don’t repost a series, unless all review comments are addressed
> or the reviewers asked you to do so. The problem with this is that
> this is somewhat in conflict with the "let's focus on the core
> issues and not get distracted by details early on in a review cycle".
> In other words, this can only work, if reviewers focus on major
> issues in early reviews only and do not focus on style, coding
> standards, etc.

But this doesn't make much sense either, because then full re-reviews
need to happen anyway on later versions, to also deal with the minor
issues. For RFC kind of series omitting style and alike feedback
certainly makes sense, but as soon as a patch is non-RFC, it should
be considered good to go in by the submitter.

OK, I think we have a disconnect between ideal and reality. 

I see two issues today
* Key maintainers don't always review RFC series [they end up at the bottom of 
the priority list, even though spending time on RFCs will save time elsewhere 
later]. So the effect is that then the contributor assumes there are no major 
issues and ends it as a proper series
* In practice what has happened often in the past is that design, architecture, 
assumption flaws are found in early versions of a series.
   - This usually happens because of an oversight or because there was no 
design discussion prior to the series being posted and agreed
   - Common sense would dictate that the biggest benefit for both the reviewer, 
the contributor and the community as a whole would be to try and focus on such 
flaws and leave everything aside
   - Of course there may be value in doing a detailed reviews of such a series 
as there may be bits that are unaffected by such a flaw
   - But there will likely be parts which are not: doing a detailed review of 
such portions wastes everyone's time

So coming back to your point. Ideally, it would be nice if we had the 
capability to call out parts of a series as "problematic" and treating such 
parts differently

Lars

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
On 28.11.2019 14:53, Andrew Cooper wrote:
> On 28/11/2019 12:10, Andrew Cooper wrote:
>> On 28/11/2019 11:03, Jan Beulich wrote:
>>> Notes:
>>>
>>> - In principle we could get away without the check_eoi_deferral flag.
>>>   I've introduced it just to make sure there's as little change as
>>>   possible to unaffected paths.
>>> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>>>   strictly necessary.
>> I don't think the cpu_has_pending_apic_eoi() check is necessary.  It is
>> checked at the head of end_nonmaskable_irq() as well.
>>
>> Similarly, I'm not sure that check_eoi_deferral is something that we'd
>> want to introduce.
>>
>> I'll drop both of these and test, seeing as I have a repro of the problem.
> 
> Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so
> I didn't), and dropping just check_eoi_deferral on its own definitely
> breaks things.
> 
> Given the 4.13 timeline, lets go with it in this form, seeing as it is
> the version which had all of last night's worth of testing.
> 
> Acked-by: Andrew Cooper 
> Tested-by: Andrew Cooper 

Thanks! I've taken note to produce a patch (if possible at all, given
the results of your attempt) to remove the extra pieces again, ideally
to go in pretty soon after the branching.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Paul Durrant
From: George Dunlap 

Xen used to have single, system-wide limits for the number of grant
frames and maptrack frames a guest was allowed to create. Increasing
or decreasing this single limit on the Xen command-line would change
the limit for all guests on the system.

Later, per-domain limits for these values was created. The system-wide
limits became strict limits: domains could not be created with higher
limits, but could be created with lower limits. However, that change
also introduced a range of different "default" values into various
places in the toolstack:

- The python libxc bindings hard-coded these values to 32 and 1024,
  respectively
- The libxl default values are 32 and 1024 respectively.
- xl will use the libxl default for maptrack, but does its own default
  calculation for grant frames: either 32 or 64, based on the max
  possible mfn.

These defaults interact poorly with the hypervisor command-line limit:

- The hypervisor command-line limit cannot be used to raise the limit
  for all guests anymore, as the default in the toolstack will
  effectively override this.
- If you use the hypervisor command-line limit to *reduce* the limit,
  then the "default" values generated by the toolstack are too high,
  and all guest creations will fail.

In other words, the toolstack defaults require any change to be
effected by having the admin explicitly specify a new value in every
guest.

In order to address this, have grant_table_init treat negative values
for max_grant_frames and max_maptrack_frames as instructions to use the
system-wide default, and have all the above toolstacks default to passing
-1 unless a different value is explicitly configured.

This restores the old behavior in that changing the hypervisor command-line
option can change the behavior for all guests, while retaining the ability
to set per-guest values.  It also removes the bug that reducing the
system-wide max will cause all domains without explicit limits to fail.

NOTE: - The Ocaml bindings require the caller to always specify a value,
and the code to start a xenstored stubdomain hard-codes these to 4
and 128 respectively; this behavour will not be modified.

Signed-off-by: George Dunlap 
Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Anthony PERARD 
Cc: "Marek Marczykowski-Górecki" 
Cc: Volodymyr Babchuk 
Cc: "Roger Pau Monné" 

v4:
 - Add missing braces in xlu_cfg_get_bounded_long()

v3:
 - Make sure that specified values cannot be negative or overflow a
   signed int

v2:
 - re-worked George's original commit massage a little
 - fixed the text in xl.conf.5.pod
 - use -1 as the sentinel value for 'default' and < 0 for checking it
---
 docs/man/xl.conf.5.pod|  6 ++--
 tools/libxl/libxl.h   |  4 +--
 tools/libxl/libxl_types.idl   |  4 +--
 tools/libxl/libxlu_cfg.c  | 26 +++--
 tools/libxl/libxlutil.h   |  2 ++
 tools/python/xen/lowlevel/xc/xc.c |  4 +--
 tools/xl/xl.c | 15 --
 tools/xl/xl_parse.c   |  9 --
 xen/arch/arm/setup.c  |  2 +-
 xen/arch/x86/setup.c  |  4 +--
 xen/common/grant_table.c  | 46 ---
 xen/include/public/domctl.h   | 10 ---
 xen/include/xen/grant_table.h |  8 +++---
 13 files changed, 102 insertions(+), 38 deletions(-)

diff --git a/docs/man/xl.conf.5.pod b/docs/man/xl.conf.5.pod
index 962144e38e..207ab3e77a 100644
--- a/docs/man/xl.conf.5.pod
+++ b/docs/man/xl.conf.5.pod
@@ -81,13 +81,15 @@ Default: C
 
 Sets the default value for the C domain config value.
 
-Default: C<32> on hosts up to 16TB of memory, C<64> on hosts larger than 16TB
+Default: value of Xen command line B parameter (or its
+default value if unspecified).
 
 =item B
 
 Sets the default value for the C domain config value.
 
-Default: C<1024>
+Default: value of Xen command line B
+parameter (or its default value if unspecified).
 
 =item B
 
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 49b56fa1a3..a2a5d321c5 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -364,8 +364,8 @@
  */
 #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1
 
-#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32
-#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024
+#define LIBXL_MAX_GRANT_FRAMES_DEFAULT -1
+#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT -1
 
 /*
  * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 0546d7865a..63e29bb2fb 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -511,8 +511,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
 
 ("vnuma_nodes", Array(libxl_vnode_info, "num_vnuma_nodes")),
 
-("max_grant_frames",uint32, {'init_val': 
'LIBXL_MAX_GRANT_FRAMES_DEFAULT'}),
-

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread Jan Beulich
On 28.11.2019 14:54, Ross Lagerwall wrote:
> On 11/28/19 1:49 PM, Jan Beulich wrote:
>> On 28.11.2019 13:15, Sergey Dyasli wrote:
>>> Applying the patch didn't end up well for my test LP (from another thread):
>>>
>>> Perform full initial build with 8 CPU(s)...
>>> Reading special section data
>>> Apply patch and build with 8 CPU(s)...
>>> Unapply patch and build with 8 CPU(s)...
>>> Extracting new and modified ELF sections...
>>> Processing xen/arch/x86/mm/shadow/guest_2.o
>>> Processing xen/arch/x86/mm/shadow/guest_4.o
>>> Processing xen/arch/x86/mm/shadow/guest_3.o
>>> Processing xen/arch/x86/mm/guest_walk_3.o
>>> Processing xen/arch/x86/mm/hap/guest_walk_3level.o
>>> Processing xen/arch/x86/mm/hap/guest_walk_4level.o
>>> Processing xen/arch/x86/mm/hap/guest_walk_2level.o
>>> Processing xen/arch/x86/mm/guest_walk_2.o
>>> Processing xen/arch/x86/mm/guest_walk_4.o
>>> Processing xen/arch/x86/efi/efi/check.o
>>> Processing xen/arch/x86/pv/gpr_switch.o
>>> Processing xen/arch/x86/indirect-thunk.o
>>> Processing xen/arch/x86/boot/head.o
>>> Processing xen/arch/x86/x86_64/kexec_reloc.o
>>> Processing xen/arch/x86/x86_64/compat/entry.o
>>> Processing xen/arch/x86/x86_64/entry.o
>>> Processing xen/arch/x86/hvm/vmx/entry.o
>>> Processing xen/arch/x86/hvm/svm/entry.o
>>> Processing 
>>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0s.o
>>> Processing 
>>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0r.o
>>> Processing 
>>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1s.o
>>> Processing 
>>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1r.o
>>> ERROR: no functional changes found.
>>>
>>> So this looks like a regression.
>>
>> Thanks for doing the testing. But what am I to conclude from
>> the above? I can't even tell why "no functional changes found"
>> is an error.
>>
> 
> It's due to the way livepatch-build tool interposes on the build to capture
> changed object files for later comparison.  Now that objcopy writes out the
> proper object files rather than gcc (which just writes a temporary one), the
> livepatch-build tool needs some adjustment otherwise it doesn't capture any
> changed files. I'm working on a patch.

For my own education, and just if you have the time: Why would there
be any dependency on which build utility produces the object file?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable-smoke test] 144346: tolerable all pass - PUSHED

2019-11-28 Thread osstest service owner
flight 144346 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144346/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  56348df32bbc782e63b6e3fb978b80e015ae76e7
baseline version:
 xen  9a400d1797ec7f77ffefeb5c4e17a8c2e8b91a12

Last test of basis   144335  2019-11-27 17:01:49 Z0 days
Testing same since   144346  2019-11-28 11:01:36 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 
  Joe Jin 
  Roger Pau Monné 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   9a400d1797..56348df32b  56348df32bbc782e63b6e3fb978b80e015ae76e7 -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread Ross Lagerwall
On 11/28/19 1:49 PM, Jan Beulich wrote:
> On 28.11.2019 13:15, Sergey Dyasli wrote:
>> Applying the patch didn't end up well for my test LP (from another thread):
>>
>> Perform full initial build with 8 CPU(s)...
>> Reading special section data
>> Apply patch and build with 8 CPU(s)...
>> Unapply patch and build with 8 CPU(s)...
>> Extracting new and modified ELF sections...
>> Processing xen/arch/x86/mm/shadow/guest_2.o
>> Processing xen/arch/x86/mm/shadow/guest_4.o
>> Processing xen/arch/x86/mm/shadow/guest_3.o
>> Processing xen/arch/x86/mm/guest_walk_3.o
>> Processing xen/arch/x86/mm/hap/guest_walk_3level.o
>> Processing xen/arch/x86/mm/hap/guest_walk_4level.o
>> Processing xen/arch/x86/mm/hap/guest_walk_2level.o
>> Processing xen/arch/x86/mm/guest_walk_2.o
>> Processing xen/arch/x86/mm/guest_walk_4.o
>> Processing xen/arch/x86/efi/efi/check.o
>> Processing xen/arch/x86/pv/gpr_switch.o
>> Processing xen/arch/x86/indirect-thunk.o
>> Processing xen/arch/x86/boot/head.o
>> Processing xen/arch/x86/x86_64/kexec_reloc.o
>> Processing xen/arch/x86/x86_64/compat/entry.o
>> Processing xen/arch/x86/x86_64/entry.o
>> Processing xen/arch/x86/hvm/vmx/entry.o
>> Processing xen/arch/x86/hvm/svm/entry.o
>> Processing 
>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0s.o
>> Processing 
>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0r.o
>> Processing 
>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1s.o
>> Processing 
>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1r.o
>> ERROR: no functional changes found.
>>
>> So this looks like a regression.
> 
> Thanks for doing the testing. But what am I to conclude from
> the above? I can't even tell why "no functional changes found"
> is an error.
> 

It's due to the way livepatch-build tool interposes on the build to capture
changed object files for later comparison.  Now that objcopy writes out the
proper object files rather than gcc (which just writes a temporary one), the
livepatch-build tool needs some adjustment otherwise it doesn't capture any
changed files. I'm working on a patch.

Thanks,
-- 
Ross Lagerwall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Andrew Cooper
On 28/11/2019 12:10, Andrew Cooper wrote:
> On 28/11/2019 11:03, Jan Beulich wrote:
>> Notes:
>>
>> - In principle we could get away without the check_eoi_deferral flag.
>>   I've introduced it just to make sure there's as little change as
>>   possible to unaffected paths.
>> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>>   strictly necessary.
> I don't think the cpu_has_pending_apic_eoi() check is necessary.  It is
> checked at the head of end_nonmaskable_irq() as well.
>
> Similarly, I'm not sure that check_eoi_deferral is something that we'd
> want to introduce.
>
> I'll drop both of these and test, seeing as I have a repro of the problem.

Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so
I didn't), and dropping just check_eoi_deferral on its own definitely
breaks things.

Given the 4.13 timeline, lets go with it in this form, seeing as it is
the version which had all of last night's worth of testing.

Acked-by: Andrew Cooper 
Tested-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH-for-4.13 v3] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Durrant, Paul
> -Original Message-
> From: Paul Durrant 
> Sent: 28 November 2019 12:51
> To: xen-devel@lists.xenproject.org
> Cc: George Dunlap ; Durrant, Paul
> ; Ian Jackson ; Wei Liu
> ; Andrew Cooper ; George Dunlap
> ; Jan Beulich ; Julien
> Grall ; Konrad Rzeszutek Wilk ;
> Stefano Stabellini ; Anthony PERARD
> ; Marek Marczykowski-Górecki
> ; Volodymyr Babchuk
> ; Roger Pau Monné 
> Subject: [PATCH-for-4.13 v3] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> From: George Dunlap 
> 
> Xen used to have single, system-wide limits for the number of grant
> frames and maptrack frames a guest was allowed to create. Increasing
> or decreasing this single limit on the Xen command-line would change
> the limit for all guests on the system.
> 
> Later, per-domain limits for these values was created. The system-wide
> limits became strict limits: domains could not be created with higher
> limits, but could be created with lower limits. However, that change
> also introduced a range of different "default" values into various
> places in the toolstack:
> 
> - The python libxc bindings hard-coded these values to 32 and 1024,
>   respectively
> - The libxl default values are 32 and 1024 respectively.
> - xl will use the libxl default for maptrack, but does its own default
>   calculation for grant frames: either 32 or 64, based on the max
>   possible mfn.
> 
> These defaults interact poorly with the hypervisor command-line limit:
> 
> - The hypervisor command-line limit cannot be used to raise the limit
>   for all guests anymore, as the default in the toolstack will
>   effectively override this.
> - If you use the hypervisor command-line limit to *reduce* the limit,
>   then the "default" values generated by the toolstack are too high,
>   and all guest creations will fail.
> 
> In other words, the toolstack defaults require any change to be
> effected by having the admin explicitly specify a new value in every
> guest.
> 
> In order to address this, have grant_table_init treat negative values
> for max_grant_frames and max_maptrack_frames as instructions to use the
> system-wide default, and have all the above toolstacks default to passing
> -1 unless a different value is explicitly configured.
> 
> This restores the old behavior in that changing the hypervisor command-
> line
> option can change the behavior for all guests, while retaining the ability
> to set per-guest values.  It also removes the bug that reducing the
> system-wide max will cause all domains without explicit limits to fail.
> 
> NOTE: - The Ocaml bindings require the caller to always specify a value,
> and the code to start a xenstored stubdomain hard-codes these to 4
>   and 128 respectively; this behavour will not be modified.
> 
> Signed-off-by: George Dunlap 
> Signed-off-by: Paul Durrant 
> ---
> Cc: Ian Jackson 
> Cc: Wei Liu 
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Jan Beulich 
> Cc: Julien Grall 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Anthony PERARD 
> Cc: "Marek Marczykowski-Górecki" 
> Cc: Volodymyr Babchuk 
> Cc: "Roger Pau Monné" 
> 
> v3:
>  - Make sure that specified values cannot be negative or overflow a
>signed int
> 
> v2:
>  - re-worked George's original commit massage a little
>  - fixed the text in xl.conf.5.pod
>  - use -1 as the sentinel value for 'default' and < 0 for checking it
> ---
>  docs/man/xl.conf.5.pod|  6 ++--
>  tools/libxl/libxl.h   |  4 +--
>  tools/libxl/libxl_types.idl   |  4 +--
>  tools/libxl/libxlu_cfg.c  | 24 ++--
>  tools/libxl/libxlutil.h   |  2 ++
>  tools/python/xen/lowlevel/xc/xc.c |  4 +--
>  tools/xl/xl.c | 15 --
>  tools/xl/xl_parse.c   |  9 --
>  xen/arch/arm/setup.c  |  2 +-
>  xen/arch/x86/setup.c  |  4 +--
>  xen/common/grant_table.c  | 46 ---
>  xen/include/public/domctl.h   | 10 ---
>  xen/include/xen/grant_table.h |  8 +++---
>  13 files changed, 100 insertions(+), 38 deletions(-)
> 
> diff --git a/docs/man/xl.conf.5.pod b/docs/man/xl.conf.5.pod
> index 962144e38e..207ab3e77a 100644
> --- a/docs/man/xl.conf.5.pod
> +++ b/docs/man/xl.conf.5.pod
> @@ -81,13 +81,15 @@ Default: C
> 
>  Sets the default value for the C domain config value.
> 
> -Default: C<32> on hosts up to 16TB of memory, C<64> on hosts larger than
> 16TB
> +Default: value of Xen command line B parameter (or its
> +default value if unspecified).
> 
>  =item B
> 
>  Sets the default value for the C domain config
> value.
> 
> -Default: C<1024>
> +Default: value of Xen command line B
> +parameter (or its default value if unspecified).
> 
>  =item B
> 
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 49b56fa1a3..a2a5d321c5 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -364,8 +364,8 @@
>   */
>  #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1
> 
> -#define 

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread Jan Beulich
On 28.11.2019 13:15, Sergey Dyasli wrote:
> Applying the patch didn't end up well for my test LP (from another thread):
> 
> Perform full initial build with 8 CPU(s)...
> Reading special section data
> Apply patch and build with 8 CPU(s)...
> Unapply patch and build with 8 CPU(s)...
> Extracting new and modified ELF sections...
> Processing xen/arch/x86/mm/shadow/guest_2.o
> Processing xen/arch/x86/mm/shadow/guest_4.o
> Processing xen/arch/x86/mm/shadow/guest_3.o
> Processing xen/arch/x86/mm/guest_walk_3.o
> Processing xen/arch/x86/mm/hap/guest_walk_3level.o
> Processing xen/arch/x86/mm/hap/guest_walk_4level.o
> Processing xen/arch/x86/mm/hap/guest_walk_2level.o
> Processing xen/arch/x86/mm/guest_walk_2.o
> Processing xen/arch/x86/mm/guest_walk_4.o
> Processing xen/arch/x86/efi/efi/check.o
> Processing xen/arch/x86/pv/gpr_switch.o
> Processing xen/arch/x86/indirect-thunk.o
> Processing xen/arch/x86/boot/head.o
> Processing xen/arch/x86/x86_64/kexec_reloc.o
> Processing xen/arch/x86/x86_64/compat/entry.o
> Processing xen/arch/x86/x86_64/entry.o
> Processing xen/arch/x86/hvm/vmx/entry.o
> Processing xen/arch/x86/hvm/svm/entry.o
> Processing 
> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0s.o
> Processing 
> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0r.o
> Processing 
> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1s.o
> Processing 
> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1r.o
> ERROR: no functional changes found.
> 
> So this looks like a regression.

Thanks for doing the testing. But what am I to conclude from
the above? I can't even tell why "no functional changes found"
is an error.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses

2019-11-28 Thread Tamas K Lengyel
On Thu, Nov 28, 2019 at 4:44 AM Andrew Cooper  wrote:
>
> c/s d0a699a389f1 "x86/monitor: add support for descriptor access events"
> introduced logic looking for what appeared to be exitinfo (not that this
> exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring
> information.  There is never any IDT vectoring involved in these intercepts so
> the value passed is always zero.
>
> In fact, SVM doesn't provide any information, even in exitinfo1 and 2.  Note
> the error in the public API and state that this field is always 0, and drop
> the SVM logic in hvm_monitor_descriptor_access().
>
> In the SVM vmexit handler itself, optimise the switch statement by observing
> that there is a linear transformation between the SVM exit_reason and
> VM_EVENT_DESC_* values.  (Bloat-o-meter reports 6028 => 5877 for a saving of
> 151 bytes).
>
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Razvan Cojocaru 
> CC: Tamas K Lengyel 
> CC: Alexandru Isaila 
> CC: Petre Pircalabu 
> CC: Adrian Pop 
>
> Adrian: Do you recall what information you were attempting to forward from the
> VMCB?  I can't locate anything which would plausibly be interesting.
>
> This is part of a longer cleanup series I gathered in the wake of the task
> switch issues, but it is pulled out ahead due to its interaction with the
> public interface.
> ---
>  xen/arch/x86/hvm/monitor.c|  4 
>  xen/arch/x86/hvm/svm/svm.c| 37 +
>  xen/include/public/vm_event.h |  4 ++--
>  3 files changed, 19 insertions(+), 26 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
> index 7fb1e2c04e..1f23fe25e8 100644
> --- a/xen/arch/x86/hvm/monitor.c
> +++ b/xen/arch/x86/hvm/monitor.c
> @@ -113,10 +113,6 @@ void hvm_monitor_descriptor_access(uint64_t exit_info,
>  req.u.desc_access.arch.vmx.instr_info = exit_info;
>  req.u.desc_access.arch.vmx.exit_qualification = 
> vmx_exit_qualification;
>  }
> -else
> -{
> -req.u.desc_access.arch.svm.exitinfo = exit_info;
> -}
>
>  monitor_traps(current, true, );
>  }
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 0fb1908c18..776cf11459 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -2980,29 +2980,26 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
>  svm_vmexit_do_pause(regs);
>  break;
>
> -case VMEXIT_IDTR_READ:
> -case VMEXIT_IDTR_WRITE:
> -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0,
> -VM_EVENT_DESC_IDTR, exit_reason == VMEXIT_IDTR_WRITE);
> -break;
> -
> -case VMEXIT_GDTR_READ:
> -case VMEXIT_GDTR_WRITE:
> -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0,
> -VM_EVENT_DESC_GDTR, exit_reason == VMEXIT_GDTR_WRITE);
> -break;
> +case VMEXIT_IDTR_READ ... VMEXIT_TR_WRITE:
> +{
> +/*
> + * Consecutive block of 8 exit codes (sadly not aligned).  Top bit
> + * indicates write (vs read), bottom 2 bits map linearly to
> + * VM_EVENT_DESC_* values.
> + */
> +#define E2D(e)  e) - VMEXIT_IDTR_READ) & 3) + 1)
> +bool write = ((exit_reason - VMEXIT_IDTR_READ) & 4);
> +unsigned int desc = E2D(exit_reason);
>
> -case VMEXIT_LDTR_READ:
> -case VMEXIT_LDTR_WRITE:
> -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0,
> -VM_EVENT_DESC_LDTR, exit_reason == VMEXIT_LDTR_WRITE);
> -break;
> +BUILD_BUG_ON(E2D(VMEXIT_IDTR_READ) != VM_EVENT_DESC_IDTR);
> +BUILD_BUG_ON(E2D(VMEXIT_GDTR_READ) != VM_EVENT_DESC_GDTR);
> +BUILD_BUG_ON(E2D(VMEXIT_LDTR_READ) != VM_EVENT_DESC_LDTR);
> +BUILD_BUG_ON(E2D(VMEXIT_TR_READ)   != VM_EVENT_DESC_TR);
> +#undef E2D
>
> -case VMEXIT_TR_READ:
> -case VMEXIT_TR_WRITE:
> -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0,
> -VM_EVENT_DESC_TR, exit_reason == VMEXIT_TR_WRITE);
> +hvm_descriptor_access_intercept(0, 0, desc, write);
>  break;
> +}
>
>  default:
>  unexpected_exit_type:
> diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
> index 959083d8c4..d1b5c95f72 100644
> --- a/xen/include/public/vm_event.h
> +++ b/xen/include/public/vm_event.h
> @@ -302,8 +302,8 @@ struct vm_event_desc_access {
>  uint64_t exit_qualification; /* VMX: VMCS Exit Qualification */
>  } vmx;
>  struct {
> -uint64_t exitinfo;   /* SVM: VMCB EXITINFO */
> -uint64_t _pad2;
> +uint64_t exitinfo;   /* SVM: Always 0.  This field made 
> */

There really is no point in retaining a useless field. Just remove it
and bump the event interface version. That's what it's for.

> +uint64_t _pad2;  /* its way into the API by error.  
> */


Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread Jan Beulich
On 28.11.2019 14:10, Andrew Cooper wrote:
> On 28/11/2019 10:17, George Dunlap wrote:
>>> On Nov 28, 2019, at 9:55 AM, Jan Beulich  wrote:
>> Has anyone actually tried building a livepatch with this change in place?
> Actually - what is your concern here? The exact spelling of symbols
> names should be of no interest to the tool. After all the compiler is
> free to invent all sorts of names for its local symbols, including
> the ones we would produce with this change in place. All the tool
> cares about is that the names be unambiguous. Hence any (theoretical)
> regression here would be a bug in the tools, which imo is no reason
> to delay this change any further. (Granted I should have got to it
> earlier, but it had been continuing to get deferred.)
 This might all be true (theoretically).

 The livepatch build tools are fragile and very sensitive to how the
 object files are laid out.  There is a very real risk that this change
 accidentally breaks livepatching totally, even on GCC builds.

 Were this to happen, it would be yet another 4.13 regression.
>>> It's perhaps a matter of perception, but I'd still call this a
>>> live patching tools bug then, not a 4.13 regression.
>> After the discussion yesterday, I was thinking a bit more about
> this, and I’m not happy with the principle Andy seems to be operating on,
> 
> I'm sorry that you feel that way.
> 
>> that it’s reasonable to completely block a bug-fixing patch to
> Xen, forcing a work-around to be used in a release, because it
> has unknown effects on livepatching.
> 
> This is not a fair characterisation of what is going on here.  Ignore
> the specifics of this patch - they are not relevant to my objection.
> 
> As a maintainer, it is my responsibility to ensure that crap doesn't get
> committed.

While above you say to ignore the specifics of this patch, I'm
afraid I still take "crap" as a qualification of the submitted
patch (and possibly other parts of my work). Perhaps I should
go look for another job if it's like this.

> As a consequence, it is up to me to judge whether I believe that the
> submitter of a patch has provided adequate thought/testing to what they
> are changing.  Mostly this is judged on comments provided (or usually,
> their absence), weighed up against the risk of what it might be likely
> to break.
> 
> In the case that I don't believe adequate through/testing/etc has been
> done, I'm not going to ack the patch.  I'd be failing as a maintainer if
> I did.
> 
> Ergo, I am not inclined to change my position.
> 
> 
> In this case, all I asked was "has anyone done a livepatch build?"
> 
> I'd be entirely happy with a reply of "yes [I or someone else did] and
> it seems ok".
> 
> I'd even be happy with "There does seem to be an issue with
> livepatching, but I've engaged the relevant people in this other thread".
> 
> What I'm not happy with is "I haven't even done a single build to see
> whether it might have problems", and what is definitely not acceptable
> is, and I quote:
> 
>> And I do not consider it my responsibility to
>> do regression tests of the live patching tools.
> 
> Yes.  Yes it really is, when you're making a material change
> specifically to this area, with a high chance of adverse impact.
> 
> I don't expect you necessarily to fix the issue, but I do expect you to
> have some idea of whether you're trading one 4.13 blocker for a
> different one.

Once again - no. If there is a problem with this patch, then
please point it out by review comments. If there is a problem
with the live patching tools, it is there where things need
fixing. Just to remind you of what I'm hearing/seeing you and
others say/do: Quite often issues with the tools arise when
dealing with particular XSAs. That's still no reason to
delay/reject the fixes for those XSAs.

And once again - please accept that views if different people
may vary.

>>>  If they're so
>>> extremely fragile, then I think this needs urgently taking care of
>>> by their maintainers. As mentioned before - how exactly static
>>> symbols get represented is up to the compiler, i.e. may change at
>>> any time. As a result, any change whatsoever would need such
>>> regression testing, no matter that I agree that a larger scope
>>> change like this one has slightly higher potential of triggering
>>> some issue.
>> This is another argument I would agree with.
>>
>> Given the closeness to the release, I’d favor checking in the
> patch today or tomorrow, regardless of testing status, so that
> it can get more testing in our automated systems; it can always
> be reverted if it is shown to break livepatching on gcc.
> 
> Oh, and shocker in what is apparently a surprise to everyone but me...
> 
> Sergey went and did the work Jan believed to be "not his
> responsibility", and yes - this really does break livepatching.
> 
> Do I expect Jan to fix it?  No, but I do expect a discussion and an
> understanding of the 

Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide

2019-11-28 Thread Jan Beulich
On 28.11.2019 14:06, Lars Kurth wrote:
> I can certainly add something on the timing , along the lines of
> * For complex series, consider the time it takes to do reviews (maybe with a 
> guide of LOC per hour) and give reviewers enough time to
> * For series with design issues or large questions, try and highlight the key 
> open issues in cover letters clearly and solicit feedback from key 
> maintainers who can comment on the open issue. The idea is to save both the 
> contributor and the reviewers time by focussing on what needs to be resolved 
> * Don’t repost a series, unless all review comments are addressed
> or the reviewers asked you to do so. The problem with this is that
> this is somewhat in conflict with the "let's focus on the core
> issues and not get distracted by details early on in a review cycle".
> In other words, this can only work, if reviewers focus on major
> issues in early reviews only and do not focus on style, coding
> standards, etc.

But this doesn't make much sense either, because then full re-reviews
need to happen anyway on later versions, to also deal with the minor
issues. For RFC kind of series omitting style and alike feedback
certainly makes sense, but as soon as a patch is non-RFC, it should
be considered good to go in by the submitter.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
On 28.11.2019 12:39, Roger Pau Monné wrote:
> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
>> At the time the pending EOI stack was introduced there were no
>> internally used IRQs which would have the LAPIC EOI issued from the
>> ->end() hook. This had then changed with the introduction of IOMMUs,
>> but the interaction issue was presumably masked by
>> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
>> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
>> running without need"]).
>>
>> The problem is that with us re-enabling interrupts across handler
>> invocation, a higher priority (guest) interrupt may trigger while
>> handling a lower priority (internal) one. The EOI issued from
>> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
>> EOI the higher priority (guest) interrupt, breaking (among other
>> things) pending EOI stack logic's assumptions.
> 
> Maybe there's something that I'm missing, but shouldn't hypervisor
> vectors always be higher priority than guest ones?

Depends - IOMMU ones imo aren't something that needs urgently
dealing with, so a little bit of delay won't hurt. There would
only be a problem if such interrupts could be deferred
indefinitely.

> I see there's already a range reserved for high priority vectors
> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
> interrupts not using this range?

We'd quickly run out of high priority vectors on systems with
multiple (and perhaps indeed many) IOMMUs.

> IMO it seems troublesome that pending guests vectors can delay the
> injection of hypervisor interrupt vectors.

As per above - depends.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread Andrew Cooper
On 28/11/2019 10:17, George Dunlap wrote:
>> On Nov 28, 2019, at 9:55 AM, Jan Beulich  wrote:
> Has anyone actually tried building a livepatch with this change in place?
 Actually - what is your concern here? The exact spelling of symbols
 names should be of no interest to the tool. After all the compiler is
 free to invent all sorts of names for its local symbols, including
 the ones we would produce with this change in place. All the tool
 cares about is that the names be unambiguous. Hence any (theoretical)
 regression here would be a bug in the tools, which imo is no reason
 to delay this change any further. (Granted I should have got to it
 earlier, but it had been continuing to get deferred.)
>>> This might all be true (theoretically).
>>>
>>> The livepatch build tools are fragile and very sensitive to how the
>>> object files are laid out.  There is a very real risk that this change
>>> accidentally breaks livepatching totally, even on GCC builds.
>>>
>>> Were this to happen, it would be yet another 4.13 regression.
>> It's perhaps a matter of perception, but I'd still call this a
>> live patching tools bug then, not a 4.13 regression.
> After the discussion yesterday, I was thinking a bit more about this, and I’m 
> not happy with the principle Andy seems to be operating on,

I'm sorry that you feel that way.

> that it’s reasonable to completely block a bug-fixing patch to Xen, forcing a 
> work-around to be used in a release, because it has unknown effects on 
> livepatching.

This is not a fair characterisation of what is going on here.  Ignore
the specifics of this patch - they are not relevant to my objection.

As a maintainer, it is my responsibility to ensure that crap doesn't get
committed.

As a consequence, it is up to me to judge whether I believe that the
submitter of a patch has provided adequate thought/testing to what they
are changing.  Mostly this is judged on comments provided (or usually,
their absence), weighed up against the risk of what it might be likely
to break.

In the case that I don't believe adequate through/testing/etc has been
done, I'm not going to ack the patch.  I'd be failing as a maintainer if
I did.

Ergo, I am not inclined to change my position.


In this case, all I asked was "has anyone done a livepatch build?"

I'd be entirely happy with a reply of "yes [I or someone else did] and
it seems ok".

I'd even be happy with "There does seem to be an issue with
livepatching, but I've engaged the relevant people in this other thread".

What I'm not happy with is "I haven't even done a single build to see
whether it might have problems", and what is definitely not acceptable
is, and I quote:

> And I do not consider it my responsibility to
> do regression tests of the live patching tools.

Yes.  Yes it really is, when you're making a material change
specifically to this area, with a high chance of adverse impact.

I don't expect you necessarily to fix the issue, but I do expect you to
have some idea of whether you're trading one 4.13 blocker for a
different one.

>>  If they're so
>> extremely fragile, then I think this needs urgently taking care of
>> by their maintainers. As mentioned before - how exactly static
>> symbols get represented is up to the compiler, i.e. may change at
>> any time. As a result, any change whatsoever would need such
>> regression testing, no matter that I agree that a larger scope
>> change like this one has slightly higher potential of triggering
>> some issue.
> This is another argument I would agree with.
>
> Given the closeness to the release, I’d favor checking in the patch today or 
> tomorrow, regardless of testing status, so that it can get more testing in 
> our automated systems; it can always be reverted if it is shown to break 
> livepatching on gcc.

Oh, and shocker in what is apparently a surprise to everyone but me...

Sergey went and did the work Jan believed to be "not his
responsibility", and yes - this really does break livepatching.

Do I expect Jan to fix it?  No, but I do expect a discussion and an
understanding of the issue before this patch gets anywhere near staging.

~Andrew, quite irritated at the total lack of due diligence being
demonstrated here.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide

2019-11-28 Thread Lars Kurth


On 28/11/2019, 04:09, "Jan Beulich"  wrote:

On 28.11.2019 01:54, Stefano Stabellini wrote:
> On Thu, 26 Sep 2019, Lars Kurth wrote:
>> From: Lars Kurth 
>>
>> This document highlights what reviewers such as maintainers and 
committers look
>> for when reviewing code. It sets expectations for code authors and 
provides
>> a framework for code reviewers.
> 
> I think the document is missing a couple of things:
> 
> - a simple one line statement that possibly the most important thing in
>   a code review is to indentify any bugs in the code
> 
> - an explanation that requests for major changes to the series should be
>   made early on (i.e. let's not change the architecture of a feature at
>   v9 if possible) I also made this comment in reply to patch #5. I'll
>   let you decide where is the best place for it.

This needs balancing. People crucial to the evaluation of a new
feature and its implementation simply may not have the time to
reply prior to v9. We've had situations where people posted new
revisions every other day, sometimes even more than one per day.

I can certainly add something on the timing , along the lines of
* For complex series, consider the time it takes to do reviews (maybe with a 
guide of LOC per hour) and give reviewers enough time to
* For series with design issues or large questions, try and highlight the key 
open issues in cover letters clearly and solicit feedback from key maintainers 
who can comment on the open issue. The idea is to save both the contributor and 
the reviewers time by focussing on what needs to be resolved 
* Don’t repost a series, unless all review comments are addressed or the 
reviewers asked you to do so. The problem with this is that this is somewhat in 
conflict with the "let's focus on the core issues and not get distracted by 
details early on in a review cycle". In other words, this can only work, if 
reviewers focus on major issues in early reviews only and do not focus on 
style, coding standards, etc. As soon as a reviewer comes back with detailed 
feedback, the contributor will feel obliged to fix these. This creates a 
motivation to want to please the reviewer send out new versions of series 
fixing cosmetic issues without addressing the substantial issues, leading to 
what Jan describes. I am looking for opinions here.  

As indicated in several other contexts before - imo people not
helping to shoulder the review load should also not have the
expectation that their (large) contributions will be looked at
in due course. 

I can add something to this effect.  

Lars


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH-for-4.13 v3] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Paul Durrant
From: George Dunlap 

Xen used to have single, system-wide limits for the number of grant
frames and maptrack frames a guest was allowed to create. Increasing
or decreasing this single limit on the Xen command-line would change
the limit for all guests on the system.

Later, per-domain limits for these values was created. The system-wide
limits became strict limits: domains could not be created with higher
limits, but could be created with lower limits. However, that change
also introduced a range of different "default" values into various
places in the toolstack:

- The python libxc bindings hard-coded these values to 32 and 1024,
  respectively
- The libxl default values are 32 and 1024 respectively.
- xl will use the libxl default for maptrack, but does its own default
  calculation for grant frames: either 32 or 64, based on the max
  possible mfn.

These defaults interact poorly with the hypervisor command-line limit:

- The hypervisor command-line limit cannot be used to raise the limit
  for all guests anymore, as the default in the toolstack will
  effectively override this.
- If you use the hypervisor command-line limit to *reduce* the limit,
  then the "default" values generated by the toolstack are too high,
  and all guest creations will fail.

In other words, the toolstack defaults require any change to be
effected by having the admin explicitly specify a new value in every
guest.

In order to address this, have grant_table_init treat negative values
for max_grant_frames and max_maptrack_frames as instructions to use the
system-wide default, and have all the above toolstacks default to passing
-1 unless a different value is explicitly configured.

This restores the old behavior in that changing the hypervisor command-line
option can change the behavior for all guests, while retaining the ability
to set per-guest values.  It also removes the bug that reducing the
system-wide max will cause all domains without explicit limits to fail.

NOTE: - The Ocaml bindings require the caller to always specify a value,
and the code to start a xenstored stubdomain hard-codes these to 4
and 128 respectively; this behavour will not be modified.

Signed-off-by: George Dunlap 
Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Anthony PERARD 
Cc: "Marek Marczykowski-Górecki" 
Cc: Volodymyr Babchuk 
Cc: "Roger Pau Monné" 

v3:
 - Make sure that specified values cannot be negative or overflow a
   signed int

v2:
 - re-worked George's original commit massage a little
 - fixed the text in xl.conf.5.pod
 - use -1 as the sentinel value for 'default' and < 0 for checking it
---
 docs/man/xl.conf.5.pod|  6 ++--
 tools/libxl/libxl.h   |  4 +--
 tools/libxl/libxl_types.idl   |  4 +--
 tools/libxl/libxlu_cfg.c  | 24 ++--
 tools/libxl/libxlutil.h   |  2 ++
 tools/python/xen/lowlevel/xc/xc.c |  4 +--
 tools/xl/xl.c | 15 --
 tools/xl/xl_parse.c   |  9 --
 xen/arch/arm/setup.c  |  2 +-
 xen/arch/x86/setup.c  |  4 +--
 xen/common/grant_table.c  | 46 ---
 xen/include/public/domctl.h   | 10 ---
 xen/include/xen/grant_table.h |  8 +++---
 13 files changed, 100 insertions(+), 38 deletions(-)

diff --git a/docs/man/xl.conf.5.pod b/docs/man/xl.conf.5.pod
index 962144e38e..207ab3e77a 100644
--- a/docs/man/xl.conf.5.pod
+++ b/docs/man/xl.conf.5.pod
@@ -81,13 +81,15 @@ Default: C
 
 Sets the default value for the C domain config value.
 
-Default: C<32> on hosts up to 16TB of memory, C<64> on hosts larger than 16TB
+Default: value of Xen command line B parameter (or its
+default value if unspecified).
 
 =item B
 
 Sets the default value for the C domain config value.
 
-Default: C<1024>
+Default: value of Xen command line B
+parameter (or its default value if unspecified).
 
 =item B
 
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 49b56fa1a3..a2a5d321c5 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -364,8 +364,8 @@
  */
 #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1
 
-#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32
-#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024
+#define LIBXL_MAX_GRANT_FRAMES_DEFAULT -1
+#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT -1
 
 /*
  * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 0546d7865a..63e29bb2fb 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -511,8 +511,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
 
 ("vnuma_nodes", Array(libxl_vnode_info, "num_vnuma_nodes")),
 
-("max_grant_frames",uint32, {'init_val': 
'LIBXL_MAX_GRANT_FRAMES_DEFAULT'}),
-("max_maptrack_frames", uint32, {'init_val': 

[Xen-devel] [libvirt test] 144345: regressions - FAIL

2019-11-28 Thread osstest service owner
flight 144345 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144345/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-libvirt-qcow2 15 guest-start/debian.repeat fail REGR. vs. 
144304

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 144304
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 144304
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 12 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 13 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  83254ea750b476b5041c838a583d9ea3f632a6a2
baseline version:
 libvirt  9d6920bd7de3f92be1894790adeb689060ab25eb

Last test of basis   144304  2019-11-26 04:19:14 Z2 days
Failing since144318  2019-11-27 04:19:28 Z1 days2 attempts
Testing same since   144345  2019-11-28 04:19:33 Z0 days1 attempts


People who touched revisions under test:
  Daniel P. Berrangé 
  Michal Privoznik 
  Peter Krempa 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   fail
 test-armhf-armhf-libvirt-raw pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 

Re: [Xen-devel] [RFC v5 000/126] error: auto propagated local_err

2019-11-28 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> 28.11.2019 11:54, Markus Armbruster wrote:
>> Please accept my sincere apologies for taking so long to reply.  A few
>> thoughts before I dig deeper.
>> 
>> Vladimir Sementsov-Ogievskiy  writes:
>> 
>>> Hi all!
>>>
>>> At the request of Markus: full version of errp propagation. Let's look
>>> at it. Cover as much as possible, except inserting macro invocation
>>> where it's not necessary.
>>>
>>> It's huge, and so it's an RFC.
>> 
>> It's a monster.  Best to get it into full view before we commit to
>> fighting it.
>> 
>>> In v5 I've added a lot more preparation cleanups:
>>> 01-23 are preparation cleanups
>>>01: not changed, keep Eric's r-b
>>>02: improve commit msg [Markus], keep Eric's r-b
>>>03: changed, only error API here, drop r-b
>>> 24 is core macro
>>>- improve cover letter, wording and macro code style
>>>- keep Eric's r-b
>>> 25-26: automation scripts
>>> - commit-per-subsystem changed a lot. it's a draft, don't bother too
>>>   much with it
>>> - coccinelle: add support of error_propagate_prepend
>>>
>>> 27-126: generated patches
>> 
>> Splitting up the monster can make fighting it easier.
>> 
>> Your description suggests three high-level parts:
>> 
>> Part 1: Preparation (makes sense by itself)
>
> I already resent part 1 all patches (handling review comments) in separate as 
> v6.
> If it is convenient, I can resend them in one series as v7.

Recommend to await review.  The more we can merge without another
respin, the better.

>> Part 2: Error interface update (with rules what code should do now)
>
> Note, that patch 21 is actually from part2, not part1.
> So Part 2 is 21, 24, 25.

Thanks for the heads-up.

> So I wait for your comments and resend (if needed) as separate small series.
>
> And 26 is auto-patch-splitter, but we don't need it now, if we are going
> to start from several big subsystems.
>
>> Part 3: Make the code obey the new rules everywhere
>> 
>> I hope we can get part 1 out of the way quickly.  Diffstat:
>> 
>>   backends/cryptodev.c   |  11 +---
>>   block/nbd.c|  10 +--
>>   block/snapshot.c   |   4 +-
>>   dump/dump-hmp-cmds.c   |   4 +-
>>   hw/9pfs/9p-local.c |   4 +-
>>   hw/9pfs/9p-proxy.c |   5 +-
>>   hw/core/loader-fit.c   |   5 +-
>>   hw/core/machine-hmp-cmds.c |   6 +-
>>   hw/core/qdev.c |  28 
>>   hw/i386/amd_iommu.c|  14 ++--
>>   hw/ppc/spapr.c |   2 +-
>>   hw/s390x/event-facility.c  |   2 +-
>>   hw/s390x/s390-stattrib.c   |   3 +-
>>   hw/sd/sdhci.c  |   2 +-
>>   hw/tpm/tpm_emulator.c  |   8 +--
>>   hw/usb/dev-network.c   |   2 +-
>>   hw/vfio/ap.c   |  16 +
>>   include/block/snapshot.h   |   2 +-
>>   include/monitor/hmp.h  |   2 +-
>>   include/qapi/error.h   |  69 ++--
>>   include/qom/object.h   |   4 +-
>>   monitor/hmp-cmds.c | 155 
>> ++---
>>   monitor/qmp-cmds.c |   2 +-
>>   net/net.c  |  17 ++---
>>   qdev-monitor.c |  28 
>>   qga/commands-posix.c   |   2 +-
>>   qga/commands-win32.c   |   2 +-
>>   qga/commands.c |  12 ++--
>>   qom/qom-hmp-cmds.c |   4 +-
>>   target/ppc/kvm.c   |   6 +-
>>   target/ppc/kvm_ppc.h   |   4 +-
>>   ui/vnc.c   |  20 ++
>>   ui/vnc.h   |   2 +-
>>   util/error.c   |  30 -
>>   34 files changed, 261 insertions(+), 226 deletions(-)
>> 
>> At first glance, I can see bug fixes, non-mechanical cleanups, and
>> mechanical cleanups.
>> 
>> Within each of these three groups, we have related sub-groups.  For
>> instance, several patches clean up funny names for the common Error **
>> parameters.  Several more rename "uncommon" Error ** parameters, to
>> signal their uncommon role.  I doubt splitting up these subgroups of
>> related mechanical changes along subsystem lines is worthwhile.
>> 
>> Part 2 needs careful interface review.  Having part 3 ready helps there,
>> because we can see rather than guess how the interface changes play out.
>> We really want to get this part right from the start, because if we
>> don't, we get to do part 3 again.
>> 
>> Part 3 is what makes this a monster.  I understand it's mechanical.  We
>> can merge it incrementally, but we do want to merge it all, and sooner
>> rather than later, to avoid a mix of old and new error handling code.
>> Such mixes inevitably confuse developers, and lead to new instances of
>> the old patterns creeping in.
>> 
>> I do have doubts about your automated split.
>> 
>> I acknowledge maintainers of active subsystems may want to merge this on
>> their own terms, to minimize disruption.  Splitting off sub-monsters for
>> them makes sense.  Splitting off the long tail of less busy subsystems
>> not so much; it'll only drag out the merging.  Your list 

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread Sergey Dyasli
On 27/11/2019 18:17, Andrew Cooper wrote:
> On 21/11/2019 08:34, Jan Beulich wrote:
>> On 20.11.2019 18:13, Andrew Cooper wrote:
>>> On 20/11/2019 16:40, Jürgen Groß wrote:
 On 20.11.19 17:30, Jan Beulich wrote:
> On 08.11.2019 12:18, Jan Beulich wrote:
>> The .file assembler directives generated by the compiler do not include
>> any path components (gcc) or just the ones specified on the command
>> line
>> (clang, at least version 5), and hence multiple identically named
>> source
>> files (in different directories) may produce identically named static
>> symbols (in their kallsyms representation). The binary diffing
>> algorithm
>> used by xen-livepatch, however, depends on having unique symbols.
>>
>> Make the ENFORCE_UNIQUE_SYMBOLS Kconfig option control the (build)
>> behavior, and if enabled use objcopy to prepend the (relative to the
>> xen/ subdirectory) path to the compiler invoked STT_FILE symbols. Note
>> that this build option is made no longer depend on LIVEPATCH, but
>> merely
>> defaults to its setting now.
>>
>> Conditionalize explicit .file directive insertion in C files where it
>> exists just to disambiguate names in a less generic manner; note that
>> at the same time the redundant emission of STT_FILE symbols gets
>> suppressed for clang. Assembler files as well as multiply compiled C
>> ones using __OBJECT_FILE__ are left alone for the time being.
>>
>> Since we now expect there not to be any duplicates anymore, also don't
>> force the selection of the option to 'n' anymore in allrandom.config.
>> Similarly COVERAGE no longer suppresses duplicate symbol warnings if
>> enforcement is in effect, which in turn allows
>> SUPPRESS_DUPLICATE_SYMBOL_WARNINGS to simply depend on
>> !ENFORCE_UNIQUE_SYMBOLS.
>>
>> Signed-off-by: Jan Beulich 
> I've got acks from Konrad and Wei, but still need an x86 and a release
> one here. Andrew? Or alternatively - Jürgen, would you rather not see
> this go in anymore?
 In case the needed x86 Ack is coming in before RC3 I'm fine to give my
 Release-ack, but I'm hesitant to take it later.
>>> Has anyone actually tried building a livepatch with this change in place?
>> Actually - what is your concern here? The exact spelling of symbols
>> names should be of no interest to the tool. After all the compiler is
>> free to invent all sorts of names for its local symbols, including
>> the ones we would produce with this change in place. All the tool
>> cares about is that the names be unambiguous. Hence any (theoretical)
>> regression here would be a bug in the tools, which imo is no reason
>> to delay this change any further. (Granted I should have got to it
>> earlier, but it had been continuing to get deferred.)
> 
> This might all be true (theoretically).
> 
> The livepatch build tools are fragile and very sensitive to how the
> object files are laid out.  There is a very real risk that this change
> accidentally breaks livepatching totally, even on GCC builds.
> 
> Were this to happen, it would be yet another 4.13 regression.
> 
> This is a change to fix a concrete livepatch issue with Clang.  Sure -
> it resolves the symbol uniqueness failures for the in-tree build, but
> considering the risks to the area you are modifying, the fact you
> haven't even done a dev test of a livepatch build on GCC means that the
> patch as a whole has not had what I would consider a reasonable amount
> of testing.
> 
> Luckily for you, Ross and Sergey have agreed to smoke test this with
> some livepatches.  They will report on this thread with their findings.

Applying the patch didn't end up well for my test LP (from another thread):

Perform full initial build with 8 CPU(s)...
Reading special section data
Apply patch and build with 8 CPU(s)...
Unapply patch and build with 8 CPU(s)...
Extracting new and modified ELF sections...
Processing xen/arch/x86/mm/shadow/guest_2.o
Processing xen/arch/x86/mm/shadow/guest_4.o
Processing xen/arch/x86/mm/shadow/guest_3.o
Processing xen/arch/x86/mm/guest_walk_3.o
Processing xen/arch/x86/mm/hap/guest_walk_3level.o
Processing xen/arch/x86/mm/hap/guest_walk_4level.o
Processing xen/arch/x86/mm/hap/guest_walk_2level.o
Processing xen/arch/x86/mm/guest_walk_2.o
Processing xen/arch/x86/mm/guest_walk_4.o
Processing xen/arch/x86/efi/efi/check.o
Processing xen/arch/x86/pv/gpr_switch.o
Processing xen/arch/x86/indirect-thunk.o
Processing xen/arch/x86/boot/head.o
Processing xen/arch/x86/x86_64/kexec_reloc.o
Processing xen/arch/x86/x86_64/compat/entry.o
Processing xen/arch/x86/x86_64/entry.o
Processing xen/arch/x86/hvm/vmx/entry.o
Processing xen/arch/x86/hvm/svm/entry.o
Processing 
xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0s.o
Processing 
xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0r.o
Processing 

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Andrew Cooper
On 28/11/2019 11:03, Jan Beulich wrote:
> Notes:
>
> - In principle we could get away without the check_eoi_deferral flag.
>   I've introduced it just to make sure there's as little change as
>   possible to unaffected paths.
> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>   strictly necessary.

I don't think the cpu_has_pending_apic_eoi() check is necessary.  It is
checked at the head of end_nonmaskable_irq() as well.

Similarly, I'm not sure that check_eoi_deferral is something that we'd
want to introduce.

I'll drop both of these and test, seeing as I have a repro of the problem.

> - The new function's name isn't very helpful with its use in
>   end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
>   parallel ack_APIC_irq()), but then liked this even less.

I don't have a better suggestion.

> - Other than I first thought serial console IRQs shouldn't be
>   affected, as we run them on specifically allocated high priority
>   vectors.

Good, although I don't think this wants to find its way into the final
commit message.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses

2019-11-28 Thread Andrew Cooper
c/s d0a699a389f1 "x86/monitor: add support for descriptor access events"
introduced logic looking for what appeared to be exitinfo (not that this
exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring
information.  There is never any IDT vectoring involved in these intercepts so
the value passed is always zero.

In fact, SVM doesn't provide any information, even in exitinfo1 and 2.  Note
the error in the public API and state that this field is always 0, and drop
the SVM logic in hvm_monitor_descriptor_access().

In the SVM vmexit handler itself, optimise the switch statement by observing
that there is a linear transformation between the SVM exit_reason and
VM_EVENT_DESC_* values.  (Bloat-o-meter reports 6028 => 5877 for a saving of
151 bytes).

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Razvan Cojocaru 
CC: Tamas K Lengyel 
CC: Alexandru Isaila 
CC: Petre Pircalabu 
CC: Adrian Pop 

Adrian: Do you recall what information you were attempting to forward from the
VMCB?  I can't locate anything which would plausibly be interesting.

This is part of a longer cleanup series I gathered in the wake of the task
switch issues, but it is pulled out ahead due to its interaction with the
public interface.
---
 xen/arch/x86/hvm/monitor.c|  4 
 xen/arch/x86/hvm/svm/svm.c| 37 +
 xen/include/public/vm_event.h |  4 ++--
 3 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c
index 7fb1e2c04e..1f23fe25e8 100644
--- a/xen/arch/x86/hvm/monitor.c
+++ b/xen/arch/x86/hvm/monitor.c
@@ -113,10 +113,6 @@ void hvm_monitor_descriptor_access(uint64_t exit_info,
 req.u.desc_access.arch.vmx.instr_info = exit_info;
 req.u.desc_access.arch.vmx.exit_qualification = vmx_exit_qualification;
 }
-else
-{
-req.u.desc_access.arch.svm.exitinfo = exit_info;
-}
 
 monitor_traps(current, true, );
 }
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 0fb1908c18..776cf11459 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2980,29 +2980,26 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
 svm_vmexit_do_pause(regs);
 break;
 
-case VMEXIT_IDTR_READ:
-case VMEXIT_IDTR_WRITE:
-hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0,
-VM_EVENT_DESC_IDTR, exit_reason == VMEXIT_IDTR_WRITE);
-break;
-
-case VMEXIT_GDTR_READ:
-case VMEXIT_GDTR_WRITE:
-hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0,
-VM_EVENT_DESC_GDTR, exit_reason == VMEXIT_GDTR_WRITE);
-break;
+case VMEXIT_IDTR_READ ... VMEXIT_TR_WRITE:
+{
+/*
+ * Consecutive block of 8 exit codes (sadly not aligned).  Top bit
+ * indicates write (vs read), bottom 2 bits map linearly to
+ * VM_EVENT_DESC_* values.
+ */
+#define E2D(e)  e) - VMEXIT_IDTR_READ) & 3) + 1)
+bool write = ((exit_reason - VMEXIT_IDTR_READ) & 4);
+unsigned int desc = E2D(exit_reason);
 
-case VMEXIT_LDTR_READ:
-case VMEXIT_LDTR_WRITE:
-hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0,
-VM_EVENT_DESC_LDTR, exit_reason == VMEXIT_LDTR_WRITE);
-break;
+BUILD_BUG_ON(E2D(VMEXIT_IDTR_READ) != VM_EVENT_DESC_IDTR);
+BUILD_BUG_ON(E2D(VMEXIT_GDTR_READ) != VM_EVENT_DESC_GDTR);
+BUILD_BUG_ON(E2D(VMEXIT_LDTR_READ) != VM_EVENT_DESC_LDTR);
+BUILD_BUG_ON(E2D(VMEXIT_TR_READ)   != VM_EVENT_DESC_TR);
+#undef E2D
 
-case VMEXIT_TR_READ:
-case VMEXIT_TR_WRITE:
-hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0,
-VM_EVENT_DESC_TR, exit_reason == VMEXIT_TR_WRITE);
+hvm_descriptor_access_intercept(0, 0, desc, write);
 break;
+}
 
 default:
 unexpected_exit_type:
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index 959083d8c4..d1b5c95f72 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -302,8 +302,8 @@ struct vm_event_desc_access {
 uint64_t exit_qualification; /* VMX: VMCS Exit Qualification */
 } vmx;
 struct {
-uint64_t exitinfo;   /* SVM: VMCB EXITINFO */
-uint64_t _pad2;
+uint64_t exitinfo;   /* SVM: Always 0.  This field made */
+uint64_t _pad2;  /* its way into the API by error.  */
 } svm;
 } arch;
 uint8_t descriptor;  /* VM_EVENT_DESC_* */
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Roger Pau Monné
On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote:
> At the time the pending EOI stack was introduced there were no
> internally used IRQs which would have the LAPIC EOI issued from the
> ->end() hook. This had then changed with the introduction of IOMMUs,
> but the interaction issue was presumably masked by
> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
> running without need"]).
> 
> The problem is that with us re-enabling interrupts across handler
> invocation, a higher priority (guest) interrupt may trigger while
> handling a lower priority (internal) one. The EOI issued from
> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
> EOI the higher priority (guest) interrupt, breaking (among other
> things) pending EOI stack logic's assumptions.

Maybe there's something that I'm missing, but shouldn't hypervisor
vectors always be higher priority than guest ones?

I see there's already a range reserved for high priority vectors
({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor
interrupts not using this range?

IMO it seems troublesome that pending guests vectors can delay the
injection of hypervisor interrupt vectors.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jürgen Groß

On 28.11.19 12:19, Jan Beulich wrote:

On 28.11.2019 12:03, Jan Beulich wrote:

At the time the pending EOI stack was introduced there were no
internally used IRQs which would have the LAPIC EOI issued from the
->end() hook. This had then changed with the introduction of IOMMUs,
but the interaction issue was presumably masked by
irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
(which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
running without need"]).

The problem is that with us re-enabling interrupts across handler
invocation, a higher priority (guest) interrupt may trigger while
handling a lower priority (internal) one. The EOI issued from
->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
EOI the higher priority (guest) interrupt, breaking (among other
things) pending EOI stack logic's assumptions.

Notes:

- In principle we could get away without the check_eoi_deferral flag.
   I've introduced it just to make sure there's as little change as
   possible to unaffected paths.
- Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
   strictly necessary.
- The new function's name isn't very helpful with its use in
   end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
   parallel ack_APIC_irq()), but then liked this even less.
- Other than I first thought serial console IRQs shouldn't be
   affected, as we run them on specifically allocated high priority
   vectors.

Reported-by: Igor Druzhinin 
Diagnosed-by: Andrew Cooper 
Signed-off-by: Jan Beulich 


In case it's not explicit enough from the description: While the
issue appears to be long standing, it looks to have been masked
by other shortcomings prior to 4.13. Therefore this should be
considered at least a perceived regression, even if it may not
strictly be one.


Assuming an Ack: in a timely manner:

Release-acked-by: Juergen Gross 


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] psr: fix bug which may cause crash

2019-11-28 Thread Jan Beulich
On 28.11.2019 11:18, Yi Sun wrote:
> --- a/xen/arch/x86/psr.c
> +++ b/xen/arch/x86/psr.c
> @@ -1271,7 +1271,8 @@ static void do_write_psr_msrs(void *data)
>  
>  for ( j = 0; j < cos_num; j++, index++ )
>  {
> -if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] )
> +if ( cos <= feat->cos_max &&
> + feat->cos_reg_val[cos * cos_num + j] != info->val[index] )

The description is indeed much better now, thanks. However,
as indicated in reply to v1, this extra (and at the first glance
unmotivated) bounds check wants to be accompanied by a brief but
precise comment. Furthermore with the loop bounded by a local
variable, why not

cos_num = min(props->cos_num, feat->cos_max + 1);

a few lines up from here (again suitable commented)?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86 / iommu: set up a scratch page in the quarantine domain

2019-11-28 Thread Jürgen Groß

On 28.11.19 12:17, Jan Beulich wrote:

On 27.11.2019 18:11, Paul Durrant wrote:

This patch introduces a new iommu_op to facilitate a per-implementation
quarantine set up, and then further code for x86 implementations
(amd and vtd) to set up a read-only scratch page to serve as the source
for DMA reads whilst a device is assigned to dom_io. DMA writes will
continue to fault as before.

The reason for doing this is that some hardware may continue to re-try
DMA (despite FLR) in the event of an error, or even BME being cleared, and
will fail to deal with DMA read faults gracefully. Having a scratch page
mapped will allow pending DMA reads to complete and thus such buggy
hardware will eventually be quiesced.

NOTE: These modifications are restricted to x86 implementations only as
   the buggy h/w I am aware of is only used with Xen in an x86
   environment. ARM may require similar code but, since I am not
   aware of the need, this patch does not modify any ARM implementation.

Signed-off-by: Paul Durrant 


Reviewed-by: Jan Beulich 


There is still the open question of whether use of a scratch page ought
to be gated on something, either are run-time or compile-time.


I have no clear opinion either way here. The workaround seems low
overhead enough that there may not be a need to have an admin (or
build time) control for this.

As to 4.13: The quarantining as a whole is pretty fresh. While it
has been backported to security maintained trees, I'd still consider
it a new feature in 4.13, and hence this workaround at least eligible
for consideration.


I agree.

Release-acked-by: Juergen Gross 


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
On 28.11.2019 12:03, Jan Beulich wrote:
> At the time the pending EOI stack was introduced there were no
> internally used IRQs which would have the LAPIC EOI issued from the
> ->end() hook. This had then changed with the introduction of IOMMUs,
> but the interaction issue was presumably masked by
> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
> running without need"]).
> 
> The problem is that with us re-enabling interrupts across handler
> invocation, a higher priority (guest) interrupt may trigger while
> handling a lower priority (internal) one. The EOI issued from
> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
> EOI the higher priority (guest) interrupt, breaking (among other
> things) pending EOI stack logic's assumptions.
> 
> Notes:
> 
> - In principle we could get away without the check_eoi_deferral flag.
>   I've introduced it just to make sure there's as little change as
>   possible to unaffected paths.
> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
>   strictly necessary.
> - The new function's name isn't very helpful with its use in
>   end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
>   parallel ack_APIC_irq()), but then liked this even less.
> - Other than I first thought serial console IRQs shouldn't be
>   affected, as we run them on specifically allocated high priority
>   vectors.
> 
> Reported-by: Igor Druzhinin 
> Diagnosed-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 

In case it's not explicit enough from the description: While the
issue appears to be long standing, it looks to have been masked
by other shortcomings prior to 4.13. Therefore this should be
considered at least a perceived regression, even if it may not
strictly be one.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86 / iommu: set up a scratch page in the quarantine domain

2019-11-28 Thread Jan Beulich
On 27.11.2019 18:11, Paul Durrant wrote:
> This patch introduces a new iommu_op to facilitate a per-implementation
> quarantine set up, and then further code for x86 implementations
> (amd and vtd) to set up a read-only scratch page to serve as the source
> for DMA reads whilst a device is assigned to dom_io. DMA writes will
> continue to fault as before.
> 
> The reason for doing this is that some hardware may continue to re-try
> DMA (despite FLR) in the event of an error, or even BME being cleared, and
> will fail to deal with DMA read faults gracefully. Having a scratch page
> mapped will allow pending DMA reads to complete and thus such buggy
> hardware will eventually be quiesced.
> 
> NOTE: These modifications are restricted to x86 implementations only as
>   the buggy h/w I am aware of is only used with Xen in an x86
>   environment. ARM may require similar code but, since I am not
>   aware of the need, this patch does not modify any ARM implementation.
> 
> Signed-off-by: Paul Durrant 

Reviewed-by: Jan Beulich 

> There is still the open question of whether use of a scratch page ought
> to be gated on something, either are run-time or compile-time.

I have no clear opinion either way here. The workaround seems low
overhead enough that there may not be a need to have an admin (or
build time) control for this.

As to 4.13: The quarantining as a whole is pretty fresh. While it
has been backported to security maintained trees, I'd still consider
it a new feature in 4.13, and hence this workaround at least eligible
for consideration.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack

2019-11-28 Thread Jan Beulich
At the time the pending EOI stack was introduced there were no
internally used IRQs which would have the LAPIC EOI issued from the
->end() hook. This had then changed with the introduction of IOMMUs,
but the interaction issue was presumably masked by
irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early
(which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer
running without need"]).

The problem is that with us re-enabling interrupts across handler
invocation, a higher priority (guest) interrupt may trigger while
handling a lower priority (internal) one. The EOI issued from
->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly
EOI the higher priority (guest) interrupt, breaking (among other
things) pending EOI stack logic's assumptions.

Notes:

- In principle we could get away without the check_eoi_deferral flag.
  I've introduced it just to make sure there's as little change as
  possible to unaffected paths.
- Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't
  strictly necessary.
- The new function's name isn't very helpful with its use in
  end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to
  parallel ack_APIC_irq()), but then liked this even less.
- Other than I first thought serial console IRQs shouldn't be
  affected, as we run them on specifically allocated high priority
  vectors.

Reported-by: Igor Druzhinin 
Diagnosed-by: Andrew Cooper 
Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1734,7 +1734,7 @@ static void end_level_ioapic_irq_new(str
 
 v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
 
-ack_APIC_irq();
+end_nonmaskable_irq(desc, vector);
 
 if ( (desc->status & IRQ_MOVE_PENDING) &&
  !io_apic_level_ack_pending(desc->irq) )
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -438,6 +438,7 @@ int __init init_irq_data(void)
 }
 
 static void __do_IRQ_guest(int vector);
+static void flush_ready_eoi(void);
 
 static void ack_none(struct irq_desc *desc)
 {
@@ -865,6 +866,7 @@ void pirq_set_affinity(struct domain *d,
 }
 
 DEFINE_PER_CPU(unsigned int, irq_count);
+static DEFINE_PER_CPU(bool, check_eoi_deferral);
 
 uint8_t alloc_hipriority_vector(void)
 {
@@ -1008,7 +1010,25 @@ void do_IRQ(struct cpu_user_regs *regs)
 
  out:
 if ( desc->handler->end )
+{
+/*
+ * If higher priority vectors still have their EOIs pending, we may
+ * not issue an EOI here, as this would EOI the highest priority one.
+ */
+if ( cpu_has_pending_apic_eoi() )
+{
+this_cpu(check_eoi_deferral) = true;
+desc->handler->end(desc, vector);
+this_cpu(check_eoi_deferral) = false;
+
+spin_unlock(>lock);
+flush_ready_eoi();
+goto out_no_unlock;
+}
+
 desc->handler->end(desc, vector);
+}
+
  out_no_end:
 spin_unlock(>lock);
  out_no_unlock:
@@ -1163,6 +1183,29 @@ bool cpu_has_pending_apic_eoi(void)
 return pending_eoi_sp(this_cpu(pending_eoi)) != 0;
 }
 
+void end_nonmaskable_irq(struct irq_desc *desc, uint8_t vector)
+{
+struct pending_eoi *peoi = this_cpu(pending_eoi);
+unsigned int sp = pending_eoi_sp(peoi);
+
+if ( !this_cpu(check_eoi_deferral) || !sp || peoi[sp - 1].vector < vector )
+{
+ack_APIC_irq();
+return;
+}
+
+/* Defer this vector's EOI until all higher ones have been EOI-ed. */
+pending_eoi_sp(peoi) = sp + 1;
+do {
+peoi[sp] = peoi[sp - 1];
+} while ( --sp && peoi[sp - 1].vector > vector );
+ASSERT(!sp || peoi[sp - 1].vector < vector);
+
+peoi[sp].irq = desc->irq;
+peoi[sp].vector = vector;
+peoi[sp].ready = 1;
+}
+
 static inline void set_pirq_eoi(struct domain *d, unsigned int irq)
 {
 if ( d->arch.pirq_eoi_map )
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -512,11 +512,6 @@ static void ack_maskable_msi_irq(struct
 ack_APIC_irq(); /* ACKTYPE_NONE */
 }
 
-void end_nonmaskable_msi_irq(struct irq_desc *desc, u8 vector)
-{
-ack_APIC_irq(); /* ACKTYPE_EOI */
-}
-
 /*
  * IRQ chip for MSI PCI/PCI-X/PCI-Express devices,
  * which implement the MSI or MSI-X capability structure.
@@ -539,7 +534,7 @@ static hw_irq_controller pci_msi_nonmask
 .enable   = irq_enable_none,
 .disable  = irq_disable_none,
 .ack  = ack_nonmaskable_msi_irq,
-.end  = end_nonmaskable_msi_irq,
+.end  = end_nonmaskable_irq,
 .set_affinity = set_msi_affinity
 };
 
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -427,7 +427,7 @@ static unsigned int iommu_msi_startup(st
 static void iommu_msi_end(struct irq_desc *desc, u8 vector)
 {
 iommu_msi_unmask(desc);
-ack_APIC_irq();
+end_nonmaskable_irq(desc, vector);
 }
 
 
@@ -460,7 +460,7 @@ static void iommu_maskable_msi_shutdown(
  * maskable flavors here, as we want the ACK to be issued in ->end().
  */
 #define 

Re: [Xen-devel] [Xen-users] 4.13RC3 and PVHVM makes drive drops just after boot

2019-11-28 Thread Anthony PERARD
On Wed, Nov 27, 2019 at 10:54:42PM +1000, Andrew wrote:
> I have the system setup for testing, so happy to test any patches that may
> come out.

Thanks Andrew for the report, I think I understand the issue. I'll work
on a fix.

Cheers,

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/arm: include xen-tools/libs.h from libxl_arm_acpi.c

2019-11-28 Thread Wei Liu
On Thu, Nov 28, 2019 at 11:30:34AM +0100, Jürgen Groß wrote:
> On 28.11.19 11:15, Wei Liu wrote:
> > On Wed, Nov 27, 2019 at 06:24:58PM -0800, Stefano Stabellini wrote:
> > > libxl_arm_acpi.c is using BUILD_BUG_ON but it is not including
> > > xen-tools/libs.h that defines it.
> > > 
> > > Signed-off-by: Stefano Stabellini 
> > 
> > Acked-by: Wei Liu 
> > 
> > Juergen, this is a trivial patch. I think it can go in 4.13.
> 
> Why is this patch needed?
> 
> tools/libxl/libxl_arm_acpi.c includes libxl_arm.h, which includes
> libxl_internal.h, which includes xen-tools/libs.h.

Oh I missed that.

In that case I don't think this patch is required for 4.13.

Stefano, did you see a build error or something?

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Jan Beulich
On 28.11.2019 11:26,  Durrant, Paul  wrote:
>> From: George Dunlap 
>> Sent: 27 November 2019 16:52
>>
>> On 11/27/19 4:43 PM, Durrant, Paul wrote:
 From: George Dunlap 
 Sent: 27 November 2019 16:34

 TBH I'd be tempted to define XENSOMETHING_MAX_DEFAULT as (unsigned
 long)(-1) or something, and explicitly compare to that.  That leaves
 open the possibility of having more sentinel values if we decided we
 wanted them.
>>>
>>> I'm extremely confused now. What do you want me to compare and where?
>>>
>>> I assume we're talking about the opt_XXX values. Am I supposed to stop
>>> INT_MAX being assigned to them? Or should I define local unsigned values
>> for max_maptrack/grant_frames and simply initialize them to the passed-in
>> arg (if >= 0) or the opt_XXX value otherwise.
>>
>> In this version of the patch, you change the domctl arguments from
>> uint32_t to int32_t.  I would leave them uint32_t, and if (
>> max_grant_frames == XENSOMETHING_MAX_DEFAULT ) max_grant_frames = opt_
>>
>> Then the only invalid value we have to worry about is checking for
>> XENSOMETHING_MAX_DEFAULT.
>>
>> This is a suggestion, and I wouldn't argue strongly if someone thought
>> it was a bad idea, but it seems like the most straightforward option to
>> me.
> 
> AFAICT the definition of that invalid value is going to be needed by both
> the grant table code and the user-space toolstack code so I guess the
> logical place for the definition would be a tools-only section of the
> public grant table header? TBH I prefer the idea of any negative value
> being default though.

FWIW I agree, as I can't really see what other purposes we might need
sentinel values for down the road.

> As long as the xl/libxl parts don't allow a *specified* value > INT_MAX
> then that should be fine, although for the full story a custom parser
> for the command line values should also be added to ensure the same
> semantics there.

Right. While going a little far, I can't right now see easy alternatives
to a custom parser.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/arm: include xen-tools/libs.h from libxl_arm_acpi.c

2019-11-28 Thread Jürgen Groß

On 28.11.19 11:15, Wei Liu wrote:

On Wed, Nov 27, 2019 at 06:24:58PM -0800, Stefano Stabellini wrote:

libxl_arm_acpi.c is using BUILD_BUG_ON but it is not including
xen-tools/libs.h that defines it.

Signed-off-by: Stefano Stabellini 


Acked-by: Wei Liu 

Juergen, this is a trivial patch. I think it can go in 4.13.


Why is this patch needed?

tools/libxl/libxl_arm_acpi.c includes libxl_arm.h, which includes
libxl_internal.h, which includes xen-tools/libs.h.

So this is a purely cosmetic patch, especially as libxl_arm.h and
libxl_internal.h are libxl-internal includes, so there is a very low
risk for the include of xen-tools/libs.h suddenly disappearing, and
even it would due to a patch of one of those include files, it would
be detected by a failing build immediately.

So I won't take it for 4.13, even if being trivial.


Juergen



Wei.


---
  tools/libxl/libxl_arm_acpi.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/tools/libxl/libxl_arm_acpi.c b/tools/libxl/libxl_arm_acpi.c
index ba874c3d32..52c476ff65 100644
--- a/tools/libxl/libxl_arm_acpi.c
+++ b/tools/libxl/libxl_arm_acpi.c
@@ -19,6 +19,7 @@
  #include "libxl_arm.h"
  
  #include 

+#include 
  
  /* Below typedefs are useful for the headers under acpi/ */

  typedef uint8_t u8;
--
2.17.1




___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

2019-11-28 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 28 November 2019 10:23
> To: Durrant, Paul ; Boris Ostrovsky
> 
> Cc: xen-devel@lists.xenproject.org; Grall, Julien ;
> Andrew Cooper ; Roger Pau Monné
> ; Jun Nakajima ; Kevin Tian
> ; Wei Liu 
> Subject: Re: [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the
> domain is destroyed
> 
> On 28.11.2019 10:38, Paul Durrant wrote:
> > From: Julien Grall 
> >
> > A guest will setup a shared page with the hypervisor for each vCPU via
> > XENPMU_init. The page will then get mapped in the hypervisor and only
> > released when XENPMU_finish is called.
> >
> > This means that if the guest fails to invoke XENPMU_finish, e.g if it is
> > destroyed rather than cleanly shut down, the page will stay mapped in
> the
> > hypervisor. One of the consequences is the domain can never be fully
> > destroyed as a page reference is still held.
> >
> > As Xen should never rely on the guest to correctly clean-up any
> > allocation in the hypervisor, we should also unmap such pages during the
> > domain destruction if there are any left.
> >
> > We can re-use the same logic as in pvpmu_finish(). To avoid
> > duplication, move the logic in a new function that can also be called
> > from vpmu_destroy().
> >
> > NOTE: - The call to vpmu_destroy() must also be moved from
> > arch_vcpu_destroy() into domain_relinquish_resources() such that
> > the reference on the mapped page does not prevent
> domain_destroy()
> > (which calls arch_vcpu_destroy()) from being called.
> >   - Whilst it appears that vpmu_arch_destroy() is idempotent it is
> > by no means obvious. Hence make sure the VPMU_CONTEXT_ALLOCATED
> > flag is cleared at the end of vpmu_arch_destroy().
> >   - This is not an XSA because vPMU is not security supported (see
> > XSA-163).
> >
> > Signed-off-by: Julien Grall 
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Jan Beulich 
> > Cc: Andrew Cooper 
> > Cc: Wei Liu 
> > Cc: "Roger Pau Monné" 
> > Cc: Jun Nakajima 
> > Cc: Kevin Tian 
> >
> > v2:
> >  - Re-word commit comment slightly
> >  - Re-enforce idempotency of vmpu_arch_destroy()
> >  - Move invocation of vpmu_destroy() earlier in
> >domain_relinquish_resources()
> 
> What about v3?

Oh, sorry:

v3:
 - Add comment regarding XSA-163
 - Revert changes setting VPMU_CONTEXT_ALLOCATED in common code

  Paul

> 
> > --- a/xen/arch/x86/cpu/vpmu.c
> > +++ b/xen/arch/x86/cpu/vpmu.c
> > @@ -576,11 +576,36 @@ static void vpmu_arch_destroy(struct vcpu *v)
> >
> >   vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
> >  }
> > +
> > +vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
> >  }
> 
> Boris, to be on the safe side - are you in agreement with this
> change, now that the setting of the flag is being left untouched?
> 
> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Durrant, Paul
> -Original Message-
> From: George Dunlap 
> Sent: 27 November 2019 16:52
> To: Durrant, Paul ; Jan Beulich 
> Cc: AndrewCooper ; Anthony PERARD
> ; Roger Pau Monné ;
> Volodymyr Babchuk ; George Dunlap
> ; Ian Jackson ;
> Marek Marczykowski-Górecki ; Stefano
> Stabellini ; xen-devel@lists.xenproject.org;
> Konrad Rzeszutek Wilk ; Julien Grall
> ; Wei Liu 
> Subject: Re: [PATCH v2] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> On 11/27/19 4:43 PM, Durrant, Paul wrote:
> >> -Original Message-
> >> From: George Dunlap 
> >> Sent: 27 November 2019 16:34
> >> To: Jan Beulich ; Durrant, Paul
> 
> >> Cc: AndrewCooper ; Anthony PERARD
> >> ; Roger Pau Monné ;
> >> Volodymyr Babchuk ; George Dunlap
> >> ; Ian Jackson ;
> >> Marek Marczykowski-Górecki ; Stefano
> >> Stabellini ; xen-devel@lists.xenproject.org;
> >> Konrad Rzeszutek Wilk ; Julien Grall
> >> ; Wei Liu 
> >> Subject: Re: [PATCH v2] Rationalize max_grant_frames and
> >> max_maptrack_frames handling
> >>
> >> On 11/27/19 4:20 PM, Jan Beulich wrote:
> >>> On 27.11.2019 17:14,  Durrant, Paul  wrote:
> > From: Jan Beulich 
> > Sent: 27 November 2019 15:56
> >
> > On 27.11.2019 15:37, Paul Durrant wrote:
> >> --- a/xen/arch/arm/setup.c
> >> +++ b/xen/arch/arm/setup.c
> >> @@ -789,7 +789,7 @@ void __init start_xen(unsigned long
> > boot_phys_offset,
> >>  .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >>  .max_evtchn_port = -1,
> >>  .max_grant_frames = gnttab_dom0_frames(),
> >> -.max_maptrack_frames = opt_max_maptrack_frames,
> >> +.max_maptrack_frames = -1,
> >>  };
> >>  int rc;
> >>
> >> --- a/xen/arch/x86/setup.c
> >> +++ b/xen/arch/x86/setup.c
> >> @@ -697,8 +697,8 @@ void __init noreturn __start_xen(unsigned long
> > mbi_p)
> >>  struct xen_domctl_createdomain dom0_cfg = {
> >>  .flags = IS_ENABLED(CONFIG_TBOOT) ?
> >> XEN_DOMCTL_CDF_s3_integrity
> > : 0,
> >>  .max_evtchn_port = -1,
> >> -.max_grant_frames = opt_max_grant_frames,
> >> -.max_maptrack_frames = opt_max_maptrack_frames,
> >> +.max_grant_frames = -1,
> >> +.max_maptrack_frames = -1,
> >>  };
> >
> > With these there's no need anymore for opt_max_maptrack_frames to
> > be non-static. Sadly Arm still wants opt_max_grant_frames
> > accessible in gnttab_dom0_frames().
> 
>  Yes, I was about to make them static until I saw what the ARM code
> did.
> >>>
> >>> But the one that Arm doesn't need should become static now.
> >>>
> >> --- a/xen/common/grant_table.c
> >> +++ b/xen/common/grant_table.c
> >> @@ -1837,12 +1837,18 @@ active_alloc_failed:
> >>  return -ENOMEM;
> >>  }
> >>
> >> -int grant_table_init(struct domain *d, unsigned int
> >> max_grant_frames,
> >> - unsigned int max_maptrack_frames)
> >> +int grant_table_init(struct domain *d, int max_grant_frames,
> >> + int max_maptrack_frames)
> >>  {
> >>  struct grant_table *gt;
> >>  int ret = -ENOMEM;
> >>
> >> +/* Default to maximum value if no value was specified */
> >> +if ( max_grant_frames < 0 )
> >> +max_grant_frames = opt_max_grant_frames;
> >> +if ( max_maptrack_frames < 0 )
> >> +max_maptrack_frames = opt_max_maptrack_frames;
> >> +
> >>  if ( max_grant_frames < INITIAL_NR_GRANT_FRAMES ||
> >
> > I take it we don't expect people to specify 2^^31 or more
> > frames for either option. It looks like almost everything
> > here would cope, except for this very comparison. Nevertheless
> > I wonder whether you wouldn't better confine both values to
> > [0, INT_MAX] now, including when adjusted at runtime.
> 
>  I can certainly remove the 'U' from the definition of
>  INITIAL_NR_GRANT_FRAMES,
> >>>
> >>> Oh, I didn't pay attention that is has a U on it - in this case
> >>> the comparison above is fine.
> >>>
>  but do you want me to make opt_max_grant_frames and
>  opt_max_maptrack_frames into signed ints and add signed parser
>  code too?
> >>>
> >>> Definitely not. They should remain unsigned quantities, but their
> >>> values may need sanity checking now.
> >>>
>  I also don't understand the 'adjusted at runtime' part.
> >>>
> >>> Well, for a command line drive value you could adjust an out of
> >>> bounds value in some __init function. But for runtime modifiable
> >>> settings you won't get away this easily.
> >>
> >> TBH I'd be tempted to define XENSOMETHING_MAX_DEFAULT as (unsigned
> >> long)(-1) or something, and explicitly compare to that.  That leaves
> >> open the possibility of having more sentinel values if we decided we
> >> wanted them.
> >
> > I'm extremely confused now. What do you want me to compare and where?
> >
> > 

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread Jürgen Groß

On 28.11.19 11:17, George Dunlap wrote:



On Nov 28, 2019, at 9:55 AM, Jan Beulich  wrote:

Has anyone actually tried building a livepatch with this change in place?

Actually - what is your concern here? The exact spelling of symbols
names should be of no interest to the tool. After all the compiler is
free to invent all sorts of names for its local symbols, including
the ones we would produce with this change in place. All the tool
cares about is that the names be unambiguous. Hence any (theoretical)
regression here would be a bug in the tools, which imo is no reason
to delay this change any further. (Granted I should have got to it
earlier, but it had been continuing to get deferred.)


This might all be true (theoretically).

The livepatch build tools are fragile and very sensitive to how the
object files are laid out.  There is a very real risk that this change
accidentally breaks livepatching totally, even on GCC builds.

Were this to happen, it would be yet another 4.13 regression.


It's perhaps a matter of perception, but I'd still call this a
live patching tools bug then, not a 4.13 regression.


After the discussion yesterday, I was thinking a bit more about this, and I’m 
not happy with the principle Andy seems to be operating on, that it’s 
reasonable to completely block a bug-fixing patch to Xen, forcing a work-around 
to be used in a release, because it has unknown effects on livepatching.

Consider the relationship between Xen and Linux, for example.  Suppose that a 
patch were posted to Linux to fix an issue, and Juergen responded by saying 
that he wasn’t happy with it because it  might possibly break things running 
under Xen.  But he didn’t actually test it himself, nor propose some alternate 
way of fixing the original problem; rather, he expected the original patch 
submitter, who doesn’t use Xen, to set up a Xen system and test it themselves 
before accepting it.

Do you think anyone in the kernel community would stand for that?  Of course 
not.  Naturally the patch would be *paused* while *people in the Xen community* 
tested and or proposed alternate solutions; but if there was a delay, 
eventually it would be checked in.

I think the same principle should apply here.  If people using the livepatch 
code are afraid that Jan’s patch *may* affect livepatching on gcc, then they 
should be given time to review, test, and/or propose alternate solutions.  But 
it should be the responsibility of people working on that code, not the 
responsibility of developers who don’t use that code.


  If they're so
extremely fragile, then I think this needs urgently taking care of
by their maintainers. As mentioned before - how exactly static
symbols get represented is up to the compiler, i.e. may change at
any time. As a result, any change whatsoever would need such
regression testing, no matter that I agree that a larger scope
change like this one has slightly higher potential of triggering
some issue.


This is another argument I would agree with.

Given the closeness to the release, I’d favor checking in the patch today or 
tomorrow, regardless of testing status, so that it can get more testing in our 
automated systems; it can always be reverted if it is shown to break 
livepatching on gcc.


In that case: please rather today than tomorrow. The earlier the
better.

Juergen


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

2019-11-28 Thread Jan Beulich
On 28.11.2019 10:38, Paul Durrant wrote:
> From: Julien Grall 
> 
> A guest will setup a shared page with the hypervisor for each vCPU via
> XENPMU_init. The page will then get mapped in the hypervisor and only
> released when XENPMU_finish is called.
> 
> This means that if the guest fails to invoke XENPMU_finish, e.g if it is
> destroyed rather than cleanly shut down, the page will stay mapped in the
> hypervisor. One of the consequences is the domain can never be fully
> destroyed as a page reference is still held.
> 
> As Xen should never rely on the guest to correctly clean-up any
> allocation in the hypervisor, we should also unmap such pages during the
> domain destruction if there are any left.
> 
> We can re-use the same logic as in pvpmu_finish(). To avoid
> duplication, move the logic in a new function that can also be called
> from vpmu_destroy().
> 
> NOTE: - The call to vpmu_destroy() must also be moved from
> arch_vcpu_destroy() into domain_relinquish_resources() such that
> the reference on the mapped page does not prevent domain_destroy()
> (which calls arch_vcpu_destroy()) from being called.
>   - Whilst it appears that vpmu_arch_destroy() is idempotent it is
> by no means obvious. Hence make sure the VPMU_CONTEXT_ALLOCATED
> flag is cleared at the end of vpmu_arch_destroy().
>   - This is not an XSA because vPMU is not security supported (see
> XSA-163).
> 
> Signed-off-by: Julien Grall 
> Signed-off-by: Paul Durrant 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: "Roger Pau Monné" 
> Cc: Jun Nakajima 
> Cc: Kevin Tian 
> 
> v2:
>  - Re-word commit comment slightly
>  - Re-enforce idempotency of vmpu_arch_destroy()
>  - Move invocation of vpmu_destroy() earlier in
>domain_relinquish_resources()

What about v3?

> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -576,11 +576,36 @@ static void vpmu_arch_destroy(struct vcpu *v)
>  
>   vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
>  }
> +
> +vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
>  }

Boris, to be on the safe side - are you in agreement with this
change, now that the setting of the flag is being left untouched?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2] psr: fix bug which may cause crash

2019-11-28 Thread Yi Sun
During test, we found a crash on Xen with below trace.
(XEN) Xen call trace:
(XEN)[] R psr.c#l3_cdp_write_msr+0x1e/0x22
(XEN)[] F psr.c#do_write_psr_msrs+0x6d/0x109
(XEN)[] F smp_call_function_interrupt+0x5a/0xac
(XEN)[] F call_function_interrupt+0x20/0x34
(XEN)[] F do_IRQ+0x175/0x6ae
(XEN)[] F common_interrupt+0x10a/0x120
(XEN)[] F cpu_idle.c#acpi_idle_do_entry+0x9d/0xb1
(XEN)[] F cpu_idle.c#acpi_processor_idle+0x41d/0x626
(XEN)[] F domain.c#idle_loop+0xa5/0xa7
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 20:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=]
(XEN) 

The bug happens when CDP and MBA co-exist and MBA COS_MAX is bigger
than CDP COS_MAX. E.g. MBA has 8 COS registers but CDP only have 6.
When setting MBA throttling value for the 7th guest, the value array
would be:
+--+--+--+
| Data default val | Code default val | MBA throttle |
+--+--+--+

Then, COS id 7 will be selected for writting the values. We should
avoid writting CDP data/code valules to COS id 7 MSR because it
exceeds the CDP COS_MAX.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 5866a26..ecca5b4 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -1271,7 +1271,8 @@ static void do_write_psr_msrs(void *data)
 
 for ( j = 0; j < cos_num; j++, index++ )
 {
-if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] )
+if ( cos <= feat->cos_max &&
+ feat->cos_reg_val[cos * cos_num + j] != info->val[index] )
 {
 feat->cos_reg_val[cos * cos_num + j] = info->val[index];
 props->write_msr(cos, info->val[index], props->type[j]);
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 6/6] Added Resolving Disagreement

2019-11-28 Thread Jan Beulich
On 28.11.2019 01:56, Stefano Stabellini wrote:
> On Thu, 26 Sep 2019, Lars Kurth wrote:
>> +This could take for example the form of
>> +> Do you think it would be useful for the code to do XXX? 
>> +> I can imagine a user wanting to do YYY (and XXX would enable this)
>> +
>> +That potentially adds additional work for the code author, which they may 
>> not have
>> +the time to perform. It is good practice for authors to consider such a 
>> request in terms of
>> +* Usefulness to the user
>> +* Code churn, complexity or impact on other system properties
>> +* Extra time to implement and report back to the reviewer
>> +
>> +If you believe that the impact/cost is too high, report back to the 
>> reviewer. To resolve
>> +this, it is advisable to
>> +* Report your findings
>> +* And then check whether this was merely an interesting suggestion, or 
>> something the
>> +reviewer feels more strongly about
>> +
>> +In the latter case, there are typically several common outcomes
>> +* The **author and reviewer agree** that the suggestion should be 
>> implemented
>> +* The **author and reviewer agree** that it may make sense to defer 
>> implementation
>> +* The **author and reviewer agree** that it makes no sense to implement the 
>> suggestion
>> +
>> +The author of a patch would typically suggest their preferred outcome, for 
>> example
>> +> I am not sure it is worth to implement XXX
>> +> Do you think this could be done as a separate patch in future?
>> +
>> +In cases, where no agreement can be found, the best approach would be to 
>> get an
>> +independent opinion from another maintainer or the project's leadership 
>> team.
> 
> I think we should mention somewhere here that it is recommended for
> reviewers to be explicit about whether a request is optional or whether
> it is a requirement.
> 
> For instance: "I think it would be good if X also did Y" doesn't say if
> it is optional (future work) or it is actually required as part of this
> series. More explicit word choices are preferable, such as:
> 
> "I think it would be good if X also did Y, not a requirement but good to
> have."
> 
> "I think it would be good if X also did Y and it should be part of this
> series."

I think without it being made explicit that something is optional,
the assumption should be that it isn't. I.e. in the first example
I agree with the idea to have something after the comma, but in
the second example I think the extra wording is a waste of effort.

> I think there is something else we should say on this topic. There is a
> category of things which could be done in multiple ways and it is not
> overtly obvious which one is best. It is done to the maintainer and the
> author personal styles. It is easy to disagree on that.
> 
> I think a good recommendation would be for the contributor to try to
> follow the maintainers requests, even if they could be considered
> "style", trusting their experience on the matter. And a good
> recommendation for the maintainer would be to try to let the contributor
> have freedom of implementation choice on things that don't make a
> significant difference.

I think we try to, but I also think we suffer from too little
clear documentation on e.g. style aspects. Attempts on my part
to address this have mostly (not entirely) lead no-where (lack of
feedback on proposed patches to ./CODING_STYLE). So for the time
being there are (many) aspects where we have de-facto expectations
that aren't written down anywhere, with the result of (in a subset
of cases) disagreement on what the perceived de-facto standard
actually is.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-4.12-testing test] 144339: tolerable FAIL - PUSHED

2019-11-28 Thread osstest service owner
flight 144339 xen-4.12-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144339/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qcow217 guest-localmigrate/x10   fail  like 144319
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass

version targeted for testing:
 xen  e10c1fbde8a81f541cb2c58a3078b2e7df89a801
baseline version:
 xen  875879a7b8c1d561e6ea2a20958a1e61242ffef1

Last test of basis   144319  2019-11-27 05:27:24 Z1 days
Testing same since   144339  2019-11-27 23:05:58 Z0 days1 attempts


People who touched revisions under test:
  Julien Grall 
  Mark Rutland 
  Stefano Stabellini 
  Will Deacon 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread George Dunlap

> On Nov 28, 2019, at 9:55 AM, Jan Beulich  wrote:
 Has anyone actually tried building a livepatch with this change in place?
>>> Actually - what is your concern here? The exact spelling of symbols
>>> names should be of no interest to the tool. After all the compiler is
>>> free to invent all sorts of names for its local symbols, including
>>> the ones we would produce with this change in place. All the tool
>>> cares about is that the names be unambiguous. Hence any (theoretical)
>>> regression here would be a bug in the tools, which imo is no reason
>>> to delay this change any further. (Granted I should have got to it
>>> earlier, but it had been continuing to get deferred.)
>> 
>> This might all be true (theoretically).
>> 
>> The livepatch build tools are fragile and very sensitive to how the
>> object files are laid out.  There is a very real risk that this change
>> accidentally breaks livepatching totally, even on GCC builds.
>> 
>> Were this to happen, it would be yet another 4.13 regression.
> 
> It's perhaps a matter of perception, but I'd still call this a
> live patching tools bug then, not a 4.13 regression.

After the discussion yesterday, I was thinking a bit more about this, and I’m 
not happy with the principle Andy seems to be operating on, that it’s 
reasonable to completely block a bug-fixing patch to Xen, forcing a work-around 
to be used in a release, because it has unknown effects on livepatching.

Consider the relationship between Xen and Linux, for example.  Suppose that a 
patch were posted to Linux to fix an issue, and Juergen responded by saying 
that he wasn’t happy with it because it  might possibly break things running 
under Xen.  But he didn’t actually test it himself, nor propose some alternate 
way of fixing the original problem; rather, he expected the original patch 
submitter, who doesn’t use Xen, to set up a Xen system and test it themselves 
before accepting it.

Do you think anyone in the kernel community would stand for that?  Of course 
not.  Naturally the patch would be *paused* while *people in the Xen community* 
tested and or proposed alternate solutions; but if there was a delay, 
eventually it would be checked in.

I think the same principle should apply here.  If people using the livepatch 
code are afraid that Jan’s patch *may* affect livepatching on gcc, then they 
should be given time to review, test, and/or propose alternate solutions.  But 
it should be the responsibility of people working on that code, not the 
responsibility of developers who don’t use that code.

>  If they're so
> extremely fragile, then I think this needs urgently taking care of
> by their maintainers. As mentioned before - how exactly static
> symbols get represented is up to the compiler, i.e. may change at
> any time. As a result, any change whatsoever would need such
> regression testing, no matter that I agree that a larger scope
> change like this one has slightly higher potential of triggering
> some issue.

This is another argument I would agree with.

Given the closeness to the release, I’d favor checking in the patch today or 
tomorrow, regardless of testing status, so that it can get more testing in our 
automated systems; it can always be reverted if it is shown to break 
livepatching on gcc.

 -George
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/arm: include xen-tools/libs.h from libxl_arm_acpi.c

2019-11-28 Thread Wei Liu
On Wed, Nov 27, 2019 at 06:24:58PM -0800, Stefano Stabellini wrote:
> libxl_arm_acpi.c is using BUILD_BUG_ON but it is not including
> xen-tools/libs.h that defines it.
> 
> Signed-off-by: Stefano Stabellini 

Acked-by: Wei Liu 

Juergen, this is a trivial patch. I think it can go in 4.13.

Wei.

> ---
>  tools/libxl/libxl_arm_acpi.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/libxl/libxl_arm_acpi.c b/tools/libxl/libxl_arm_acpi.c
> index ba874c3d32..52c476ff65 100644
> --- a/tools/libxl/libxl_arm_acpi.c
> +++ b/tools/libxl/libxl_arm_acpi.c
> @@ -19,6 +19,7 @@
>  #include "libxl_arm.h"
>  
>  #include 
> +#include 
>  
>  /* Below typedefs are useful for the headers under acpi/ */
>  typedef uint8_t u8;
> -- 
> 2.17.1
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide

2019-11-28 Thread Jan Beulich
On 28.11.2019 01:54, Stefano Stabellini wrote:
> On Thu, 26 Sep 2019, Lars Kurth wrote:
>> From: Lars Kurth 
>>
>> This document highlights what reviewers such as maintainers and committers 
>> look
>> for when reviewing code. It sets expectations for code authors and provides
>> a framework for code reviewers.
> 
> I think the document is missing a couple of things:
> 
> - a simple one line statement that possibly the most important thing in
>   a code review is to indentify any bugs in the code
> 
> - an explanation that requests for major changes to the series should be
>   made early on (i.e. let's not change the architecture of a feature at
>   v9 if possible) I also made this comment in reply to patch #5. I'll
>   let you decide where is the best place for it.

This needs balancing. People crucial to the evaluation of a new
feature and its implementation simply may not have the time to
reply prior to v9. We've had situations where people posted new
revisions every other day, sometimes even more than one per day.

As indicated in several other contexts before - imo people not
helping to shoulder the review load should also not have the
expectation that their (large) contributions will be looked at
in due course. 

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] getting 4.11.3 ready

2019-11-28 Thread Jan Beulich
On 28.11.2019 00:51, Lars Kurth wrote:
> On 29/10/2019, 12:41, "Stefano Stabellini"  wrote:
> On Tue, 29 Oct 2019, Julien Grall wrote:
> > On 28/10/2019 21:43, Stefano Stabellini wrote:
> > > On Mon, 28 Oct 2019, Julien Grall wrote:
> > >> Hi,
> > >>
> > >> On 25/10/2019 11:31, Jan Beulich wrote:
> > >>> All,
> > >>>
> > >>> the 4.11.3 stable release is due. I intend to wait for the XSA fixes
> > >>> going public on the 31st, but not (much) longer. Please point out
> > >>> backporting candidates that you find missing from the respective
> > >>> stable trees. I have three ones queued which haven't passed the push
> > >>> gate to the master branch yet:
> > >>>
> > >>> 9257c218e5  x86/vvmx: Fix the use of RDTSCP when it is intercepted 
> at L0
> > >>> 7eee9c16d6  x86/tsc: update vcpu time info on guest TSC adjustments
> > >>> 9633929824  x86: fix off-by-one in is_xen_fixed_mfn()
> > >>
> > >> We don't seem to have backported patches for quite a while on Arm. 
> Some of my
> > >> patches have been marked as to be backported from the beginning [1]. 
> I am
> > >> specifically thinking to:
> > >>  
> > >> e04818b46d xen/arm: traps: Avoid using BUG_ON() to check guest state 
> in
> > >> advance_pc()
> > 
> > Urgh, I gave the correct title but the wrong commit sha1. It should be 
> > 
> > 72615f2e6b98e861c08abb1d2b194126013d54fe
> > 
> > > 
> > > I have e04818b46d, plus the following marked for backport:
> > > 
> > > e98edccb944a80db782e551f3090628e66c7fb52 xen/arm: SCTLR_EL1 is a 
> 64-bit register on Arm64
> > 
> > There are more than that to backport:
> > 
> > 30f5047b2c4e577436b505ba7627f34c3be02014xen/arm: gic: Make sure the 
> number of interrupt lines is valid before using it  [4.11]
> > 8aa276235b93eeb4f81095c638970900e19b31e5xen/arm: irq: End cleanly 
> spurious interrupt[4.11]
> > b4df73de493954c44f240f78779c9bd3782e1572xen/arm: gic-v2: deactivate 
> interrupts during initialization[4.11]
> > 0322e0db5b29a0d1ce4b452885e34023e3a4b00earm: gic-v3: deactivate 
> interrupts during initialization[4.11]
> > 
> > 5ba1c5d0641cf63086b3058e547fcd28c3c4a011xen/arm: memaccess: 
> Initialize correctly *access in __p2m_get_mem_access[4.12]
> > 07e44b3d1be32fa2165c2367ae3ef9c6c8b39e1exen/arm: Implement 
> workaround for Cortex A-57 and Cortex A72 AT speculate   [4.12]
> > 
> > 08e2059facd78d5ffaf206ba06ac2017c4adeed4xen/arm: setup: Calculate 
> correctly the size of Xen [4.11+]
> > 8dba9a81e7c62b8a7dbe023fffecd2e16cc20486xen/arm: Don't use _end in 
> is_xen_fixed_mfn()   [4.11+]
> > 671878779741b38c5f2363adceef8de2ce0b3945xen/arm: p2m: Free the p2m 
> entry after flushing the IOMMU TLBs  [4.11+]
> > 7f4217cc60574866cb90d67d9750228c6b86c91exen/arm: vsmc: The function 
> identifier is always 32-bit [4.11+]
> > 612d476e74a314be514ee6a9744eea8db09d32e5xen/arm64: Correctly 
> compute the virtual address in maddr_to_virt() [4.11+]
> > f51027be0688540aaab61513b06a8693a37e4c00xen/arm: fix nr_pdxs 
> calculation[4.11+]
> > a189ef027dbb7a3c0dfe566137f05c06d6685fb9xen/arm: mm: Flush the TLBs 
> even if a mapping failed in create_xen_entries  [4.11+]
> 
> They all make sense, I did the backports, building each commit
> individually.
> 
> Jan, AFAICT this is not yet ready to run the XSA checking tools.
> Let me know when you think I should run them

I'm lost. Why would the tree not be ready? There are no further
backports supposed to arrive prior to the release. I was hoping
to do the release yesterday. Also the mail you've replied to is
about a month old.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] psr: fix bug which may cause crash

2019-11-28 Thread Jan Beulich
On 28.11.2019 09:17, Yi Sun wrote:
> On 19-11-27 11:06:49, Jan Beulich wrote:
>> On 27.11.2019 07:24, Yi Sun wrote:
>>> --- a/xen/arch/x86/psr.c
>>> +++ b/xen/arch/x86/psr.c
>>> @@ -316,6 +316,7 @@ static bool cat_init_feature(const struct cpuid_leaf 
>>> *regs,
>>>  [FEAT_TYPE_L3_CDP] = "L3 CDP",
>>>  [FEAT_TYPE_L2_CAT] = "L2 CAT",
>>>  };
>>> +unsigned int i = 0;
>>
>> Unnecessary initializer and too wide a scope.
>>
> Ok, u8 is enough.

Did you perhaps mistake "scope" for "width"? You shouldn't use
fixed width types when there's no strict need to do so.

>>> @@ -332,7 +333,8 @@ static bool cat_init_feature(const struct cpuid_leaf 
>>> *regs,
>>>  return false;
>>>  
>>>  /* We reserve cos=0 as default cbm (all bits within cbm_len are 
>>> 1). */
>>> -feat->cos_reg_val[0] = cat_default_val(feat->cat.cbm_len);
>>> +for(i = 0; i < MAX_COS_REG_CNT; i++)
>>
>> There are number of blanks missing here (and even more ones in
>> the other instance below). It also seems to me that the comment
>> ends up misplaced now. If ...
>>
> Sorry, the comment should be modified.
> 
>>> +feat->cos_reg_val[i] = cat_default_val(feat->cat.cbm_len);
>>>  
>>>  wrmsrl((type == FEAT_TYPE_L3_CAT ?
>>>  MSR_IA32_PSR_L3_MASK(0) :
>>
>> ... this indeed is to remain a single write, it may want to move
>> here. But as per above keeping cached and actual values in sync
>> may make it necessary to move this write into the loop as well.
>>
> You are right, I missed to loop this sentence.
> 
> Another idea:
> I remembered that the original purpose to only write COS[0] here is to
> improve performance by not writing too many MSRs. So I am thinking to
> change the fix to below line in do_write_psr_msrs().
> if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] &&
>  cos <= feat->cos_max )
> 
> What is your opinion? Thanks!

Looks reasonable, provided it gets accompanied by a brief but
precise comment. I'd also suggest switching around the two
sides of the && .

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names

2019-11-28 Thread Jan Beulich
On 27.11.2019 19:17, Andrew Cooper wrote:
> On 21/11/2019 08:34, Jan Beulich wrote:
>> On 20.11.2019 18:13, Andrew Cooper wrote:
>>> On 20/11/2019 16:40, Jürgen Groß wrote:
 On 20.11.19 17:30, Jan Beulich wrote:
> On 08.11.2019 12:18, Jan Beulich wrote:
>> The .file assembler directives generated by the compiler do not include
>> any path components (gcc) or just the ones specified on the command
>> line
>> (clang, at least version 5), and hence multiple identically named
>> source
>> files (in different directories) may produce identically named static
>> symbols (in their kallsyms representation). The binary diffing
>> algorithm
>> used by xen-livepatch, however, depends on having unique symbols.
>>
>> Make the ENFORCE_UNIQUE_SYMBOLS Kconfig option control the (build)
>> behavior, and if enabled use objcopy to prepend the (relative to the
>> xen/ subdirectory) path to the compiler invoked STT_FILE symbols. Note
>> that this build option is made no longer depend on LIVEPATCH, but
>> merely
>> defaults to its setting now.
>>
>> Conditionalize explicit .file directive insertion in C files where it
>> exists just to disambiguate names in a less generic manner; note that
>> at the same time the redundant emission of STT_FILE symbols gets
>> suppressed for clang. Assembler files as well as multiply compiled C
>> ones using __OBJECT_FILE__ are left alone for the time being.
>>
>> Since we now expect there not to be any duplicates anymore, also don't
>> force the selection of the option to 'n' anymore in allrandom.config.
>> Similarly COVERAGE no longer suppresses duplicate symbol warnings if
>> enforcement is in effect, which in turn allows
>> SUPPRESS_DUPLICATE_SYMBOL_WARNINGS to simply depend on
>> !ENFORCE_UNIQUE_SYMBOLS.
>>
>> Signed-off-by: Jan Beulich 
> I've got acks from Konrad and Wei, but still need an x86 and a release
> one here. Andrew? Or alternatively - Jürgen, would you rather not see
> this go in anymore?
 In case the needed x86 Ack is coming in before RC3 I'm fine to give my
 Release-ack, but I'm hesitant to take it later.
>>> Has anyone actually tried building a livepatch with this change in place?
>> Actually - what is your concern here? The exact spelling of symbols
>> names should be of no interest to the tool. After all the compiler is
>> free to invent all sorts of names for its local symbols, including
>> the ones we would produce with this change in place. All the tool
>> cares about is that the names be unambiguous. Hence any (theoretical)
>> regression here would be a bug in the tools, which imo is no reason
>> to delay this change any further. (Granted I should have got to it
>> earlier, but it had been continuing to get deferred.)
> 
> This might all be true (theoretically).
> 
> The livepatch build tools are fragile and very sensitive to how the
> object files are laid out.  There is a very real risk that this change
> accidentally breaks livepatching totally, even on GCC builds.
> 
> Were this to happen, it would be yet another 4.13 regression.

It's perhaps a matter of perception, but I'd still call this a
live patching tools bug then, not a 4.13 regression.

> This is a change to fix a concrete livepatch issue with Clang.  Sure -
> it resolves the symbol uniqueness failures for the in-tree build, but
> considering the risks to the area you are modifying, the fact you
> haven't even done a dev test of a livepatch build on GCC means that the
> patch as a whole has not had what I would consider a reasonable amount
> of testing.

While Clang is the primary area where we need this change, it is
in no way limited to that environment. Gcc can, at any time,
start triggering the issue again as well - both because of changes
to the compiler itself, or because of (perhaps seemingly innocent)
changes we do to Xen. I can't imagine the workaround for this
would be to make it impossible altogether to select LIVEPATCH=y.
(In fact some of the recent always_inline may want dropping again
as well with this change in place, when they were added for
symbol collision reasons rather than code generation ones.)

> Luckily for you, Ross and Sergey have agreed to smoke test this with
> some livepatches.  They will report on this thread with their findings.

I appreciate this. And I do not consider it my responsibility to
do regression tests of the live patching tools. If they're so
extremely fragile, then I think this needs urgently taking care of
by their maintainers. As mentioned before - how exactly static
symbols get represented is up to the compiler, i.e. may change at
any time. As a result, any change whatsoever would need such
regression testing, no matter that I agree that a larger scope
change like this one has slightly higher potential of triggering
some issue.

Jan

___
Xen-devel mailing list

Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests

2019-11-28 Thread Julien Grall



On 28/11/2019 01:07, Stefano Stabellini wrote:

On Wed, 27 Nov 2019, Julien Grall wrote:

On Tue, 26 Nov 2019, 23:18 Stefano Stabellini,  wrote:
   On Fri, 15 Nov 2019, Stewart Hildebrand wrote:
   > Allow vgic_get_hw_irq_desc to be called with a vcpu argument.
   >
   > Use vcpu argument in vgic_connect_hw_irq.
   >
   > vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with
   > ASSERTs.
   >
   > Signed-off-by: Stewart Hildebrand 
   >
   > ---
   > v3: new patch
   >
   > ---
   > Note: I have only modified the old vgic to allow delivery of PPIs.
   > ---
   >  xen/arch/arm/gic-vgic.c | 24 
   >  xen/arch/arm/vgic.c     |  6 +++---
   >  2 files changed, 19 insertions(+), 11 deletions(-)
   >
   > diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
   > index 98c021f1a8..2c66a8fa92 100644
   > --- a/xen/arch/arm/gic-vgic.c
   > +++ b/xen/arch/arm/gic-vgic.c
   > @@ -418,7 +418,7 @@ struct irq_desc *vgic_get_hw_irq_desc(struct 
domain *d, struct vcpu *v,
   >  {
   >      struct pending_irq *p;
   >
   > -    ASSERT(!v && virq >= 32);
   > +    ASSERT((!v && (virq >= 32)) || (!d && v && (virq >= 16) && (virq 
< 32)));

   I don't think !d is necessary for this to work as intended so I would
   limit the ASSERT to

     ASSERT((!v && (virq >= 32)) || (v && (virq >= 16) && (virq < 32)));

   the caller can always pass v->domain

But then you have the risk to run into d != v->domain. So at least with the 
ASSERT you document the expectation.


Yes, that was not my intention.

It makes sense in certain scenarios for v to be NULL. What I was trying
to say is that when v is not-NULL, then also d should be not-NULL for
consistency. I don't think it makes sense to pass v corresponding to
vcpu1 of domain2 and d == NULL, right?


While I usually like consistency, 'd' is only used to find 'v' if it is 
NULL. So I really see limited reason to impose the caller to set 'd' in 
this case.




I don't know if you want to add a (d == v->domain) check to the ASSERT
as it is pretty busy already. I am OK either way.



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range

2019-11-28 Thread Julien Grall



On 28/11/2019 09:00, Jürgen Groß wrote:

On 28.11.19 09:48, Julien Grall wrote:

Hi,

On 28/11/2019 08:32, Jürgen Groß wrote:

On 28.11.19 09:14, Julien Grall wrote:
In short, I think the patch should go in now and there are no 
downsides

to it. That's it, I rest my case. Julien, I hope you'll reconsider.
I don't really see the point to try to allow Linux 5.4 booting on 
Xen 4.13 without knowing whether we are not going to uncovered more 
BUG around I*ACTIVER.


Sorry, but this is a rather weird statement.

IIUC you are saying that a typo blocking boot of Linux 5.4 should not be
fixed as you are not sure there are no other bugs?


The implementation of I*ACTIVER was incorrect but gone unnoticed 
because no-one used it until 5.4. It also happen that we didn't cover 
all the I*ACTIVER registers, so 5.4 crashes instead of using the wrong 
behavior.


This patch is basically replacing a guest crash by a behavior we don't 
fully understand.




If you really want this patch in Xen 4.13, then you should read my 
mail on the first version and trying to answer me why this 5.4 is 
going to be safe running on Xen 4.13.


Or do you think that with the typo fixed and running Linux 5.4 guests
will destabilize the host?


It is not going to destabilize the hosts. But this is not going to 
make 5.4 running correctly as Xen guest.


Have you verified it isn't running correctly or do you just think it
could hit problems?


I haven't tested myself, but any bug around vGIC is usually subtled. I 
wrote a long e-mail on v1 (see [1]) explaning what could happen.


To summarize briefly, Linux is reading the I*ACTIVER registers to check 
whether an interrupt is active at the hardware level. For instance, this 
is used to ensure all active interrupts have been handled before continuing.


By always returning 0, we tell Linux there are no interrupts. One of the 
risk is interrupts may be lost.


But that's Linux behavior, I can't tell how this is going to be used by 
others OSes.




In both cases I see no reason to keep wrong code.

Either the patch will let run Linux 5.4 fine - then the patch should
definitely be taken.
That's up to Stefano and Peng to provide me information why this is 
fine. FAOD, the current justification provided is not acceptable for me.




Or the patch will let Linux 5.4 boot further, but some problems will
occur. Then it will be possible to analyze those problems and try to
fix them, very possibly with the sane approach you are hoping for.


Juergen


[1] 
https://lore.kernel.org/xen-devel/7289f75f-1ab2-2d42-cd88-1be5306b3...@xen.org/


--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

2019-11-28 Thread Paul Durrant
From: Julien Grall 

A guest will setup a shared page with the hypervisor for each vCPU via
XENPMU_init. The page will then get mapped in the hypervisor and only
released when XENPMU_finish is called.

This means that if the guest fails to invoke XENPMU_finish, e.g if it is
destroyed rather than cleanly shut down, the page will stay mapped in the
hypervisor. One of the consequences is the domain can never be fully
destroyed as a page reference is still held.

As Xen should never rely on the guest to correctly clean-up any
allocation in the hypervisor, we should also unmap such pages during the
domain destruction if there are any left.

We can re-use the same logic as in pvpmu_finish(). To avoid
duplication, move the logic in a new function that can also be called
from vpmu_destroy().

NOTE: - The call to vpmu_destroy() must also be moved from
arch_vcpu_destroy() into domain_relinquish_resources() such that
the reference on the mapped page does not prevent domain_destroy()
(which calls arch_vcpu_destroy()) from being called.
  - Whilst it appears that vpmu_arch_destroy() is idempotent it is
by no means obvious. Hence make sure the VPMU_CONTEXT_ALLOCATED
flag is cleared at the end of vpmu_arch_destroy().
  - This is not an XSA because vPMU is not security supported (see
XSA-163).

Signed-off-by: Julien Grall 
Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: "Roger Pau Monné" 
Cc: Jun Nakajima 
Cc: Kevin Tian 

v2:
 - Re-word commit comment slightly
 - Re-enforce idempotency of vmpu_arch_destroy()
 - Move invocation of vpmu_destroy() earlier in
   domain_relinquish_resources()
---
 xen/arch/x86/cpu/vpmu.c | 47 +++--
 xen/arch/x86/domain.c   | 10 +
 2 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index f397183ec3..792953e7cb 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -576,11 +576,36 @@ static void vpmu_arch_destroy(struct vcpu *v)
 
  vpmu->arch_vpmu_ops->arch_vpmu_destroy(v);
 }
+
+vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
 }
 
-void vpmu_destroy(struct vcpu *v)
+static void vpmu_cleanup(struct vcpu *v)
 {
+struct vpmu_struct *vpmu = vcpu_vpmu(v);
+mfn_t mfn;
+void *xenpmu_data;
+
+spin_lock(>vpmu_lock);
+
 vpmu_arch_destroy(v);
+xenpmu_data = vpmu->xenpmu_data;
+vpmu->xenpmu_data = NULL;
+
+spin_unlock(>vpmu_lock);
+
+if ( xenpmu_data )
+{
+mfn = domain_page_map_to_mfn(xenpmu_data);
+ASSERT(mfn_valid(mfn));
+unmap_domain_page_global(xenpmu_data);
+put_page_and_type(mfn_to_page(mfn));
+}
+}
+
+void vpmu_destroy(struct vcpu *v)
+{
+vpmu_cleanup(v);
 
 put_vpmu(v);
 }
@@ -639,9 +664,6 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t 
*params)
 static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
 {
 struct vcpu *v;
-struct vpmu_struct *vpmu;
-mfn_t mfn;
-void *xenpmu_data;
 
 if ( (params->vcpu >= d->max_vcpus) || (d->vcpu[params->vcpu] == NULL) )
 return;
@@ -650,22 +672,7 @@ static void pvpmu_finish(struct domain *d, 
xen_pmu_params_t *params)
 if ( v != current )
 vcpu_pause(v);
 
-vpmu = vcpu_vpmu(v);
-spin_lock(>vpmu_lock);
-
-vpmu_arch_destroy(v);
-xenpmu_data = vpmu->xenpmu_data;
-vpmu->xenpmu_data = NULL;
-
-spin_unlock(>vpmu_lock);
-
-if ( xenpmu_data )
-{
-mfn = domain_page_map_to_mfn(xenpmu_data);
-ASSERT(mfn_valid(mfn));
-unmap_domain_page_global(xenpmu_data);
-put_page_and_type(mfn_to_page(mfn));
-}
+vpmu_cleanup(v);
 
 if ( v != current )
 vcpu_unpause(v);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f1dd86e12e..f5c0c378ef 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -454,9 +454,6 @@ void arch_vcpu_destroy(struct vcpu *v)
 xfree(v->arch.msrs);
 v->arch.msrs = NULL;
 
-if ( !is_idle_domain(v->domain) )
-vpmu_destroy(v);
-
 if ( is_hvm_vcpu(v) )
 hvm_vcpu_destroy(v);
 else
@@ -2136,12 +2133,17 @@ int domain_relinquish_resources(struct domain *d)
 
 PROGRESS(vcpu_pagetables):
 
-/* Drop the in-use references to page-table bases. */
+/*
+ * Drop the in-use references to page-table bases and clean
+ * up vPMU instances.
+ */
 for_each_vcpu ( d, v )
 {
 ret = vcpu_destroy_pagetables(v);
 if ( ret )
 return ret;
+
+vpmu_destroy(v);
 }
 
 if ( altp2m_active(d) )
-- 
2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC v5 000/126] error: auto propagated local_err

2019-11-28 Thread Vladimir Sementsov-Ogievskiy
28.11.2019 11:54, Markus Armbruster wrote:
> Please accept my sincere apologies for taking so long to reply.  A few
> thoughts before I dig deeper.
> 
> Vladimir Sementsov-Ogievskiy  writes:
> 
>> Hi all!
>>
>> At the request of Markus: full version of errp propagation. Let's look
>> at it. Cover as much as possible, except inserting macro invocation
>> where it's not necessary.
>>
>> It's huge, and so it's an RFC.
> 
> It's a monster.  Best to get it into full view before we commit to
> fighting it.
> 
>> In v5 I've added a lot more preparation cleanups:
>> 01-23 are preparation cleanups
>>01: not changed, keep Eric's r-b
>>02: improve commit msg [Markus], keep Eric's r-b
>>03: changed, only error API here, drop r-b
>> 24 is core macro
>>- improve cover letter, wording and macro code style
>>- keep Eric's r-b
>> 25-26: automation scripts
>> - commit-per-subsystem changed a lot. it's a draft, don't bother too
>>   much with it
>> - coccinelle: add support of error_propagate_prepend
>>
>> 27-126: generated patches
> 
> Splitting up the monster can make fighting it easier.
> 
> Your description suggests three high-level parts:
> 
> Part 1: Preparation (makes sense by itself)

I already resent part 1 all patches (handling review comments) in separate as 
v6.
If it is convenient, I can resend them in one series as v7.

> Part 2: Error interface update (with rules what code should do now)

Note, that patch 21 is actually from part2, not part1.
So Part 2 is 21, 24, 25.
So I wait for your comments and resend (if needed) as separate small series.

And 26 is auto-patch-splitter, but we don't need it now, if we are going
to start from several big subsystems.

> Part 3: Make the code obey the new rules everywhere
> 
> I hope we can get part 1 out of the way quickly.  Diffstat:
> 
>   backends/cryptodev.c   |  11 +---
>   block/nbd.c|  10 +--
>   block/snapshot.c   |   4 +-
>   dump/dump-hmp-cmds.c   |   4 +-
>   hw/9pfs/9p-local.c |   4 +-
>   hw/9pfs/9p-proxy.c |   5 +-
>   hw/core/loader-fit.c   |   5 +-
>   hw/core/machine-hmp-cmds.c |   6 +-
>   hw/core/qdev.c |  28 
>   hw/i386/amd_iommu.c|  14 ++--
>   hw/ppc/spapr.c |   2 +-
>   hw/s390x/event-facility.c  |   2 +-
>   hw/s390x/s390-stattrib.c   |   3 +-
>   hw/sd/sdhci.c  |   2 +-
>   hw/tpm/tpm_emulator.c  |   8 +--
>   hw/usb/dev-network.c   |   2 +-
>   hw/vfio/ap.c   |  16 +
>   include/block/snapshot.h   |   2 +-
>   include/monitor/hmp.h  |   2 +-
>   include/qapi/error.h   |  69 ++--
>   include/qom/object.h   |   4 +-
>   monitor/hmp-cmds.c | 155 
> ++---
>   monitor/qmp-cmds.c |   2 +-
>   net/net.c  |  17 ++---
>   qdev-monitor.c |  28 
>   qga/commands-posix.c   |   2 +-
>   qga/commands-win32.c   |   2 +-
>   qga/commands.c |  12 ++--
>   qom/qom-hmp-cmds.c |   4 +-
>   target/ppc/kvm.c   |   6 +-
>   target/ppc/kvm_ppc.h   |   4 +-
>   ui/vnc.c   |  20 ++
>   ui/vnc.h   |   2 +-
>   util/error.c   |  30 -
>   34 files changed, 261 insertions(+), 226 deletions(-)
> 
> At first glance, I can see bug fixes, non-mechanical cleanups, and
> mechanical cleanups.
> 
> Within each of these three groups, we have related sub-groups.  For
> instance, several patches clean up funny names for the common Error **
> parameters.  Several more rename "uncommon" Error ** parameters, to
> signal their uncommon role.  I doubt splitting up these subgroups of
> related mechanical changes along subsystem lines is worthwhile.
> 
> Part 2 needs careful interface review.  Having part 3 ready helps there,
> because we can see rather than guess how the interface changes play out.
> We really want to get this part right from the start, because if we
> don't, we get to do part 3 again.
> 
> Part 3 is what makes this a monster.  I understand it's mechanical.  We
> can merge it incrementally, but we do want to merge it all, and sooner
> rather than later, to avoid a mix of old and new error handling code.
> Such mixes inevitably confuse developers, and lead to new instances of
> the old patterns creeping in.
> 
> I do have doubts about your automated split.
> 
> I acknowledge maintainers of active subsystems may want to merge this on
> their own terms, to minimize disruption.  Splitting off sub-monsters for
> them makes sense.  Splitting off the long tail of less busy subsystems
> not so much; it'll only drag out the merging.  Your list below shows 100
> parts, and chasing their maintainers is not going to be a fun
> experience.
> 
> Moreover, using MAINTAINERS to guide an automatic split is a cute idea,
> but it falls apart when MAINTAINERS attributes the same file to several
> subsystems, which is fairly 

  1   2   >