Re: [Xen-devel] [PATCH v2] xen/events: remove event handling recursion detection
On 28.11.19 22:37, Boris Ostrovsky wrote: On 11/28/19 3:45 AM, Juergen Gross wrote: - static void __xen_evtchn_do_upcall(void) { struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu); - int cpu = get_cpu(); - unsigned count; + int cpu = smp_processor_id(); do { vcpu_info->evtchn_upcall_pending = 0; - if (__this_cpu_inc_return(xed_nesting_count) - 1) - goto out; - xen_evtchn_handle_events(cpu); BUG_ON(!irqs_disabled()); - count = __this_cpu_read(xed_nesting_count); - __this_cpu_write(xed_nesting_count, 0); - } while (count != 1 || vcpu_info->evtchn_upcall_pending); - -out: + rmb(); /* Hypervisor can set upcall pending. */ virt_rmb() perhaps then? Yes, that's better. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 6/6] Added Resolving Disagreement
On 28/11/2019, 12:50, "Stefano Stabellini" wrote: On Thu, 28 Nov 2019, Jan Beulich wrote: > On 28.11.2019 01:56, Stefano Stabellini wrote: > > On Thu, 26 Sep 2019, Lars Kurth wrote: > > I think a good recommendation would be for the contributor to try to > > follow the maintainers requests, even if they could be considered > > "style", trusting their experience on the matter. And a good > > recommendation for the maintainer would be to try to let the contributor > > have freedom of implementation choice on things that don't make a > > significant difference. > > I think we try to, but I also think we suffer from too little > clear documentation on e.g. style aspects. Attempts on my part > to address this have mostly (not entirely) lead no-where (lack of > feedback on proposed patches to ./CODING_STYLE). So for the time > being there are (many) aspects where we have de-facto expectations > that aren't written down anywhere, with the result of (in a subset > of cases) disagreement on what the perceived de-facto standard > actually is. I recognize that it could be challenging finding a consensus to update CODING_STYLE but it might be worth doing to reduce frictions with both contributors and other reviewers. But to be clear, I was also referring to things that might be actually hard to add to CODING_STYLE, such as macro vs. static inlines, when to split a single function into multiple smaller functions, etc. I think this is definitely something we ought to do. I am volunteering to pick this up, but changing/clarifying the CODING_STYLE needs to be considered in conjunction with checking tools I have parked this for now, as a) I did not want to disrupt 4.13 b) and until recently I also didn’t fully understand what kind of coding standards would help with safety certification And of course, having a bot do the checking would remove the friction entirely. Lars ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide
On 28/11/2019, 12:12, "Rich Persaud" wrote: On Nov 28, 2019, at 05:12, Jan Beulich wrote: > > On 28.11.2019 01:54, Stefano Stabellini wrote: >>> On Thu, 26 Sep 2019, Lars Kurth wrote: >>> From: Lars Kurth >>> >>> This document highlights what reviewers such as maintainers and committers look >>> for when reviewing code. It sets expectations for code authors and provides >>> a framework for code reviewers. >> >> I think the document is missing a couple of things: >> >> - a simple one line statement that possibly the most important thing in >> a code review is to indentify any bugs in the code >> >> - an explanation that requests for major changes to the series should be >> made early on (i.e. let's not change the architecture of a feature at >> v9 if possible) I also made this comment in reply to patch #5. I'll >> let you decide where is the best place for it. > > This needs balancing. People crucial to the evaluation of a new > feature and its implementation simply may not have the time to > reply prior to v9. We've had situations where people posted new > revisions every other day, sometimes even more than one per day. > > As indicated in several other contexts before - imo people not > helping to shoulder the review load should also not have the > expectation that their (large) contributions will be looked at > in due course. To make this actionable, we could have: - reviewer demand index: automated index of open patches still in need of review, sorted by decreasing age - review flow control: each new patch submission cites one recent review by the patch submitter, for a patch of comparable size - reviewer supply growth: a bootstrapping guide for new reviewers and submitters, with patterns, anti-patterns, and examples to be emulated That is a great idea. However, I would not want to hold up the publication of these documents on these suggestions. Some of them would require implementing tools. I was hoping there would be more progress on lore and others tooling/workflow related stuff by now. So I think for now, I think it is sufficient to set expectations better. Regards Lars ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] psr: fix bug which may cause crash
On 19-11-28 12:25:44, Jan Beulich wrote: > On 28.11.2019 11:18, Yi Sun wrote: > > --- a/xen/arch/x86/psr.c > > +++ b/xen/arch/x86/psr.c > > @@ -1271,7 +1271,8 @@ static void do_write_psr_msrs(void *data) > > > > for ( j = 0; j < cos_num; j++, index++ ) > > { > > -if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] ) > > +if ( cos <= feat->cos_max && > > + feat->cos_reg_val[cos * cos_num + j] != info->val[index] ) > > The description is indeed much better now, thanks. However, > as indicated in reply to v1, this extra (and at the first glance > unmotivated) bounds check wants to be accompanied by a brief but I will add the comment. > precise comment. Furthermore with the loop bounded by a local > variable, why not > > cos_num = min(props->cos_num, feat->cos_max + 1); > > a few lines up from here (again suitable commented)? > cos_num is a different thing with the number of COS registers. The meaning of it is "COS registers number that feature uses for one COS ID". E.g. MBA/CAT cos_num is always 1. But CDP cos_num is 2 because it uses 2 COS registers for one COS ID. > Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 6/6] Added Resolving Disagreement
On 27/11/2019, 18:56, "Stefano Stabellini" wrote: On Thu, 26 Sep 2019, Lars Kurth wrote: > From: Lars Kurth > > This guide provides Best Practice on identifying and resolving > common classes of disagreement > > Signed-off-by: Lars Kurth > -- > Cc: minios-de...@lists.xenproject.org > Cc: xen-...@lists.xenproject.org > Cc: win-pv-de...@lists.xenproject.org > Cc: mirageos-de...@lists.xenproject.org > Cc: committ...@xenproject.org > --- > resolving-disagreement.md | 146 ++ > 1 file changed, 146 insertions(+) > create mode 100644 resolving-disagreement.md > > diff --git a/resolving-disagreement.md b/resolving-disagreement.md > new file mode 100644 > index 000..19aedbe > --- /dev/null > +++ b/resolving-disagreement.md > @@ -0,0 +1,146 @@ > +# Resolving Disagreement > + > +This guide provides Best Practice on resolving disagreement, such as > +* Gracefully accept constructive criticism > +* Focus on what is best for the community > +* Resolve differences in opinion effectively > + > +## Theory: Paul Graham's hierarchy of disagreement > +Paul Graham proposed a **disagreement hierarchy** in a 2008 essay > +**[How to Disagree](http://www.paulgraham.com/disagree.html)**, putting types of > +arguments into a seven-point hierarchy and observing that *moving up the > +disagreement hierarchy makes people less mean, and will make most of them happier*. > +Graham also suggested that the hierarchy can be thought of as a pyramid, as the > +highest forms of disagreement are rarer. > + > +| ![Graham's Hierarchy of Disagreemen](https://upload.wikimedia.org/wikipedia/commons/a/a3/Graham%27s_Hierarchy_of_Disagreement-en.svg) | ^ Disagreement This is a NIT but in a few places in this series you go over the original line length. True: typically for URLs. Primarily because I don't know whether they are rendered correctly when split > +| *A representation of Graham's hierarchy of disagreement from [Loudacris](http://www.createdebate.com/user/viewprofile/Loudacris) modified by [Rocket000](https://en.wikipedia.org/wiki/User:Rocket000)* | > + > +In the context of the Xen Project we strive to **only use the top half** of the hierarchy. > +**Name-calling** and **Ad hominem** arguments are not acceptable within the Xen > +Project. > + > +## Issue: Scope creep > + > +One thing which occasionally happens during code review is that a code reviewer > +asks or appears to ask the author of patch to implement additional functionality. ^ a patch ^ functionalities > +This could take for example the form of > +> Do you think it would be useful for the code to do XXX? > +> I can imagine a user wanting to do YYY (and XXX would enable this) > + > +That potentially adds additional work for the code author, which they may not have > +the time to perform. It is good practice for authors to consider such a request in terms of > +* Usefulness to the user > +* Code churn, complexity or impact on other system properties > +* Extra time to implement and report back to the reviewer > + > +If you believe that the impact/cost is too high, report back to the reviewer. To resolve > +this, it is advisable to > +* Report your findings > +* And then check whether this was merely an interesting suggestion, or something the > +reviewer feels more strongly about > + > +In the latter case, there are typically several common outcomes > +* The **author and reviewer agree** that the suggestion should be implemented > +* The **author and reviewer agree** that it may make sense to defer implementation > +* The **author and reviewer agree** that it makes no sense to implement the suggestion > + > +The author of a patch would typically suggest their preferred outcome, for example > +> I am not sure it is worth to implement XXX > +> Do you think this could be done as a separate patch in future? > + > +In cases, where no agreement can be found, the best approach would be to get an > +independent opinion from another maintainer or the project's leadership team. I think we should mention somewhere here that it is recommended for reviewers to be explicit about whether a request is optional or whether it is a requirement. For instance: "I think it would be good if X also did Y" doesn't say if it is optional (future work) or it is actually required as part of this series. More explicit word choices are preferable, such as: "I think it would be good if X also did Y, not a requirement but good to have." "I think it
Re: [Xen-devel] [MirageOS-devel] [PATCH v2 4/6] Add Code Review Guide
From: Rich Persaud Date: Thursday, 28 November 2019 at 12:21 To: Lars Kurth Cc: 'Jan Beulich' , "lars.ku...@xenproject.org" , Stefano Stabellini , "xen-...@lists.xenproject.org" , "minios-de...@lists.xenproject.org" , "committ...@xenproject.org" , "mirageos-de...@lists.xenproject.org" , xen-devel , "win-pv-de...@lists.xenproject.org" Subject: Re: [MirageOS-devel] [PATCH v2 4/6] Add Code Review Guide On Nov 28, 2019, at 09:05, Lars Kurth wrote: On 28/11/2019, 07:37, "Jan Beulich" wrote: On 28.11.2019 14:06, Lars Kurth wrote: I can certainly add something on the timing , along the lines of * For complex series, consider the time it takes to do reviews (maybe with a guide of LOC per hour) and give reviewers enough time to * For series with design issues or large questions, try and highlight the key open issues in cover letters clearly and solicit feedback from key maintainers who can comment on the open issue. The idea is to save both the contributor and the reviewers time by focussing on what needs to be resolved * Don’t repost a series, unless all review comments are addressed or the reviewers asked you to do so. The problem with this is that this is somewhat in conflict with the "let's focus on the core issues and not get distracted by details early on in a review cycle". In other words, this can only work, if reviewers focus on major issues in early reviews only and do not focus on style, coding standards, etc. But this doesn't make much sense either, because then full re-reviews need to happen anyway on later versions, to also deal with the minor issues. For RFC kind of series omitting style and alike feedback certainly makes sense, but as soon as a patch is non-RFC, it should be considered good to go in by the submitter. OK, I think we have a disconnect between ideal and reality. I see two issues today * Key maintainers don't always review RFC series [they end up at the bottom of the priority list, even though spending time on RFCs will save time elsewhere later]. So the effect is that then the contributor assumes there are no major issues and ends it as a proper series * In practice what has happened often in the past is that design, architecture, assumption flaws are found in early versions of a series. - This usually happens because of an oversight or because there was no design discussion prior to the series being posted and agreed - Common sense would dictate that the biggest benefit for both the reviewer, the contributor and the community as a whole would be to try and focus on such flaws and leave everything aside - Of course there may be value in doing a detailed review of parts of such a series as there may be bits that are unaffected by such a flaw - But there will likely be parts which are not: doing a detailed review of such portions wastes everyone's time So coming back to your point. Ideally, it would be nice if we had the capability to call out parts of a series as "problematic" and treating such parts differently. We may be able to reuse some "Shift Left" terminology, including citations of previous Xen code reviews to illustrate categories of design issues that can be shifted left: https://devopedia.org/shift-left I like that idea. We seem to not have come to a conclusion on this specific topic, but maybe for now it is sufficient to call this out as a potential issue in the guide. Before I send out a new version, it would be good to get at least Jan’s view on the issue. Lars ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xtf test] 144355: regressions - FAIL
flight 144355 xtf real [real] http://logs.test-lab.xenproject.org/osstest/logs/144355/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-xtf-amd64-amd64-4 85 leak-check/check fail REGR. vs. 143721 test-xtf-amd64-amd64-1 85 leak-check/check fail REGR. vs. 143721 Tests which did not succeed, but are not blocking: test-xtf-amd64-amd64-4 84 xtf/test-pv32pae-xsa-296 fail never pass test-xtf-amd64-amd64-1 84 xtf/test-pv32pae-xsa-296 fail never pass version targeted for testing: xtf 58bcde1b9209bab1e51f5645a913998b9da14bce baseline version: xtf 08a19af3c78e8a03f83bc354b50545136c03edd2 Last test of basis 143721 2019-11-04 13:24:42 Z 24 days Testing same since 144355 2019-11-28 23:38:54 Z0 days1 attempts People who touched revisions under test: Andrew Cooper jobs: build-amd64-xtf pass build-amd64 pass build-amd64-pvopspass test-xtf-amd64-amd64-1 fail test-xtf-amd64-amd64-2 pass test-xtf-amd64-amd64-3 pass test-xtf-amd64-amd64-4 fail test-xtf-amd64-amd64-5 pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. commit 58bcde1b9209bab1e51f5645a913998b9da14bce Author: Andrew Cooper Date: Tue May 7 11:19:58 2019 +0100 XSA-298 PoC commit fe4b41b4519d54f60347eb3f895d0a7b63d46e8d Author: Andrew Cooper Date: Thu Apr 11 16:50:23 2019 +0100 XSA-296 PoC Signed-off-by: Andrew Cooper commit 7482af39dc2c6b9e90fa100cc35e3727283da6e0 Author: Andrew Cooper Date: Fri Nov 15 13:23:03 2019 + idt: Constify the xtf_idte parameter to xtf_set_idte() It is only ever read. Take the opportunity to adjust all callers to construct their struct xtf_idte in .rodata. Signed-off-by: Andrew Cooper commit 31c03eaaf3424fb454c4768cc8df28a0c45bbddb Author: Andrew Cooper Date: Thu Nov 28 20:52:36 2019 + docs: Use https:// links in preference to http:// Signed-off-by: Andrew Cooper commit b213e50d7b5af3c1bd9e4ba0d22f15f91c3a877a Author: Andrew Cooper Date: Mon Nov 25 13:31:54 2019 + Drop custom stack handling for nested tasks I don't recall how I came to this conclusion, but its not correct. IRET with NT set doesn't inspect the stack at all. Signed-off-by: Andrew Cooper ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable test] 144349: regressions - FAIL
flight 144349 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/144349/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-libvirt 18 guest-start/debian.repeat fail REGR. vs. 144344 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-rtds 16 guest-localmigrate fail like 144344 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 144344 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144344 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144344 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 144344 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144344 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 144344 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail like 144344 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144344 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 144344 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144344 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: xen 56348df32bbc782e63b6e3fb978b80e015ae76e7 baseline version: xen 9a400d1797ec7f77ffefeb5c4e17a8c2e8b91a12 Last test of basis 144344 2019-11-28 02:00:09 Z0 days Testing same since 144349 2019-11-28 14:36:38 Z0 days1 attempts People who touched
Re: [Xen-devel] [PATCH v2 5/6] Add guide on Communication Best Practice
On 27/11/2019, 19:06, "Stefano Stabellini" wrote: On Fri, 27 Sep 2019, Jan Beulich wrote: > On 26.09.2019 21:39, Lars Kurth wrote: > > +### Verbose vs. terse > > +Due to the time it takes to review and compose code reviewer, reviewers often adopt a > > +terse style. It is not unusual to see review comments such as > > +> typo > > +> s/resions/regions/ > > +> coding style > > +> coding style: brackets not needed > > +etc. > > + > > +Terse code review style has its place and can be productive for both the reviewer and > > +the author. However, overuse can come across as unfriendly, lacking empathy and > > +can thus create a negative impression with the author of a patch. This is in particular > > +true, when you do not know the author or the author is a newcomer. Terse > > +communication styles can also be perceived as rude in some cultures. > > And another remark here: Not being terse in situations like the ones > enumerated as examples above is a double waste of the reviewer's time: > They shouldn't even need to make such comments, especially not many > times for a single patch (see your mention of "overuse"). I realize > we still have no automated mechanism to check style aspects, but > anybody can easily look over their patches before submitting them. > And for an occasional issue I think a terse reply is quite reasonable > to have. > > Overall I'm seeing the good intentions of this document, yet I'd still > vote at least -1 on it if it came to a vote. Following even just a > fair part of it is a considerable extra amount of time to invest in > reviews, when we already have a severe reviewing bottleneck. If I have > to judge between doing a bad (stylistically according to this doc, not > technically) review or none at all (because of time constraints), I'd > favor the former. Unless of course I'm asked to stop doing so, in > which case I'd expect whoever asks to arrange for the reviews to be > done by someone else in due course. Reading the document, I think Jan has a point that it gives the impression that following the suggestions would take significant efforts, while actually I don't think Lars meant it that way at all, and I don't think it should be the case either. Yes. Ultimately the effect of a better communication should overall be a net-positive in terms of effort. Maybe we should highlight and encourage "clarity" instead of "verbosity" of the communication, and encourage "expressing appreciation" to newcomers, not necessarily to seasoned contributors. Good idea The ultimate goal of this document is actually to *reduce* our overall efforts by making our communication more efficient, not to increase efforts. Maybe it is worth saying this too. It is worth saying this. Regards Lars ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 5/6] Add guide on Communication Best Practice
On 27/11/2019, 18:57, "Stefano Stabellini" wrote: On Thu, 26 Sep 2019, Lars Kurth wrote: > From: Lars Kurth > > This guide covers the bulk on Best Practice related to code review > It primarily focusses on code review interactions > It also covers how to deal with Misunderstandings and Cultural > Differences > > Signed-off-by: Lars Kurth > --- > Cc: minios-de...@lists.xenproject.org > Cc: xen-...@lists.xenproject.org > Cc: win-pv-de...@lists.xenproject.org > Cc: mirageos-de...@lists.xenproject.org > Cc: committ...@xenproject.org > --- > communication-practice.md | 410 ++ > 1 file changed, 410 insertions(+) > create mode 100644 communication-practice.md > > diff --git a/communication-practice.md b/communication-practice.md > new file mode 100644 > index 000..db9a5ef > --- /dev/null > +++ b/communication-practice.md > @@ -0,0 +1,410 @@ > +# Communication Best Practice > + > +This guide provides communication Best Practice that helps you in > +* Using welcoming and inclusive language > +* Keeping discussions technical and actionable > +* Being respectful of differing viewpoints and experiences > +* Being aware of your own and counterpart’s communication style and culture > +* Show empathy towards other community members > + > +## Code reviews for **reviewers** and **patch authors** > + > +Before embarking on a code review, it is important to remember that > +* A poorly executed code review can hurt the contributors feeling, even when a reviewer > + did not intend to do so. Feeling defensive is a normal reaction to a critique or feedback. > + A reviewer should be aware of how the pitch, tone, or sentiment of their comments > + could be interpreted by the contributor. The same applies to responses of an author > + to the reviewer. > +* When reviewing someone's code, you are ultimately looking for issues. A good code > + reviewer is able to mentally separate finding issues from articulating code review > + comments in a constructive and positive manner: depending on your personality this > + can be **difficult** and you may need to develop a technique that works for you. > +* As software engineers we like to be proud of the solutions we came up with. This can > + make it easy to take another people’s criticism personally. Always remember that it is > + the code that is being reviewed, not you as a person. > +* When you receive code review feedback, please be aware that we have reviewers > + from different backgrounds, communication styles and cultures. Although we all trying > + to create a productive, welcoming and agile environment, we do not always succeed. > + > +### Express appreciation > +As the nature of code review to find bugs and possible issues, it is very easy for > +reviewers to get into a mode of operation where the patch review ends up being a list > +of issues, not mentioning what is right and well done. This can lead to the code > +submitter interpreting your feedback in a negative way. > + > +The opening of a code review provides an opportunity to address this and also sets the > +tone for the rest of the code review. Starting **every** review on a positive note, helps > +set the tone for the rest of the review. > + > +For an initial patch, you can use phrases such as > +> Thanks for the patch > +> Thanks for doing this > + > +For further revisions within a review, phrases such as > +> Thank you for addressing the last set of changes > + > +If you believe the code was good, it is good practice to highlight this by using phrases > +such as > +> Looks good, just a few comments > +> The changes you have made since the last version look good > + > +If you think there were issues too many with the code to use one of the phrases, > +you can still start on a positive note, by for example saying > +> I think this is a good change > +> I think this is a good feature proposal > + > +It is also entirely fine to highlight specific changes as good. The best place to > +do this, is at top of a patch, as addressing code review comments typically requires ^ the top > +a contributor to go through the list of things to address and an in-lined positive > +comment is likely to break that workflow. > + > +You should also consider, that if you review a patch of an experienced > +contributor phrases such as *Thanks for the patch* could come across as > +patronizing, while using *Thanks for doing this* is less likely to be interpreted > +as such. > + > +Appreciation should also be expressed by patch authors when asking for
Re: [Xen-devel] [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed
On 11/28/19 5:23 AM, Jan Beulich wrote: > On 28.11.2019 10:38, Paul Durrant wrote: > >> --- a/xen/arch/x86/cpu/vpmu.c >> +++ b/xen/arch/x86/cpu/vpmu.c >> @@ -576,11 +576,36 @@ static void vpmu_arch_destroy(struct vcpu *v) >> >> vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); >> } >> + >> +vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); >> } > Boris, to be on the safe side - are you in agreement with this > change, now that the setting of the flag is being left untouched? Yes, this is fine. (I probably would clear it in arch_vpmu_destroy op since it is set in arch-specific code but either way works) -boris ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 0/2] xen/gntdev: sanitize user interface handling
On 11/28/19 3:48 AM, Jürgen Groß wrote: > On 07.11.19 12:15, Juergen Gross wrote: >> The Xen gntdev driver's checking of the number of allowed mapped pages >> is in need of some sanitizing work. >> >> Changes in V2: >> - enhanced commit message of patch 1 (Andrew Cooper) >> >> Juergen Gross (2): >> xen/gntdev: replace global limit of mapped pages by limit per call >> xen/gntdev: switch from kcalloc() to kvcalloc() >> >> drivers/xen/gntdev-common.h | 2 +- >> drivers/xen/gntdev-dmabuf.c | 11 +++-- >> drivers/xen/gntdev.c | 55 >> +++-- >> 3 files changed, 27 insertions(+), 41 deletions(-) >> > > Boris, could you please comment on the patches? Sure, both look good (and so R-b). I haven't commented since Oleksandr already gave you his. -boris ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] xen/events: remove event handling recursion detection
On 11/28/19 3:45 AM, Juergen Gross wrote: > - > static void __xen_evtchn_do_upcall(void) > { > struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu); > - int cpu = get_cpu(); > - unsigned count; > + int cpu = smp_processor_id(); > > do { > vcpu_info->evtchn_upcall_pending = 0; > > - if (__this_cpu_inc_return(xed_nesting_count) - 1) > - goto out; > - > xen_evtchn_handle_events(cpu); > > BUG_ON(!irqs_disabled()); > > - count = __this_cpu_read(xed_nesting_count); > - __this_cpu_write(xed_nesting_count, 0); > - } while (count != 1 || vcpu_info->evtchn_upcall_pending); > - > -out: > + rmb(); /* Hypervisor can set upcall pending. */ virt_rmb() perhaps then? -boris > > - put_cpu(); > + } while (vcpu_info->evtchn_upcall_pending); > } > > void xen_evtchn_do_upcall(struct pt_regs *regs) ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable-smoke test] 144352: tolerable all pass - PUSHED
flight 144352 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/144352/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 1d758bc6d1a8c0f658a874470c349ee4e27aee46 baseline version: xen 5655ce8b1ec2a82ef080078e41c73bbd536174e1 Last test of basis 144350 2019-11-28 15:00:23 Z0 days Testing same since 144352 2019-11-28 18:01:10 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Jan Beulich Konrad Rzeszutek Wilk Sergey Dyasli Wei Liu jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 5655ce8b1e..1d758bc6d1 1d758bc6d1a8c0f658a874470c349ee4e27aee46 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range
On Thu, 28 Nov 2019, 18:09 Stefano Stabellini, wrote: > On Thu, 28 Nov 2019, Julien Grall wrote: > > > In both cases I see no reason to keep wrong code. > > > > > > Either the patch will let run Linux 5.4 fine - then the patch should > > > definitely be taken. > > That's up to Stefano and Peng to provide me information why this is fine. > > FAOD, the current justification provided is not acceptable for me. > > I disagree. This is a typo fix. The original design was never spec > compliant. You cannot expect the typo fix to explain why the original > behavior is tolerable. That is out of scope and should *not* be required > for this fix. May I remind you that as a maintainer, this is in my right to say no to a patch. > We cannot expect typo fixes to go and trigger vgic/gic reworks and deep > investigations. This is a wrong expectation now, and going forward. > That's the best way to turn Xen into a bunch of hacks. I pointed out several times a potential issue with this patch. I also spent some part of my week-end investigating it and provide some insight. Did you look at them? If you want this patch in, then please help explaining why 5.4 is going to run fine on Xen 4.13 rather than keeping arguing this is a typo fix. Cheers, ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] tools/arm: include xen-tools/libs.h from libxl_arm_acpi.c
On Thu, 28 Nov 2019, Wei Liu wrote: > On Thu, Nov 28, 2019 at 11:30:34AM +0100, Jürgen Groß wrote: > > On 28.11.19 11:15, Wei Liu wrote: > > > On Wed, Nov 27, 2019 at 06:24:58PM -0800, Stefano Stabellini wrote: > > > > libxl_arm_acpi.c is using BUILD_BUG_ON but it is not including > > > > xen-tools/libs.h that defines it. > > > > > > > > Signed-off-by: Stefano Stabellini > > > > > > Acked-by: Wei Liu > > > > > > Juergen, this is a trivial patch. I think it can go in 4.13. > > > > Why is this patch needed? > > > > tools/libxl/libxl_arm_acpi.c includes libxl_arm.h, which includes > > libxl_internal.h, which includes xen-tools/libs.h. > > Oh I missed that. > > In that case I don't think this patch is required for 4.13. > > Stefano, did you see a build error or something? > Hi Wei, and Jurgen, Thanks for the review and also for probably catching a mistake in the patch. Yes, this patch fixes a build error with the latest Yocto and gcc 9: | libxl_arm_acpi.c: In function 'make_acpi_rsdp': | libxl_arm_acpi.c:193:5: error: implicit declaration of function 'BUILD_BUG_ON' [-Werror=implicit-function-declaration] | 193 | BUILD_BUG_ON(sizeof(ACPI_OEM_ID) != sizeof(rsdp->oem_id)); | | ^~~~ but the error was based on an older Xen tree based on 4.11, which doesn't have an include of xen-tools/libs.h in libxl_internal.h. So, I think Juergen is right that this should not be needed upstream. I didn't actually have a repro (the issue was reported to me by somebody else), so before I sent the patch to xen-devel I manually check the code but couldn't actually try a build. And I didn't notice the include xen-tools/libs.h in libxl_internal.h. Sorry about that.___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 6/6] Added Resolving Disagreement
On Thu, 28 Nov 2019, Jan Beulich wrote: > On 28.11.2019 01:56, Stefano Stabellini wrote: > > On Thu, 26 Sep 2019, Lars Kurth wrote: > >> +This could take for example the form of > >> +> Do you think it would be useful for the code to do XXX? > >> +> I can imagine a user wanting to do YYY (and XXX would enable this) > >> + > >> +That potentially adds additional work for the code author, which they may > >> not have > >> +the time to perform. It is good practice for authors to consider such a > >> request in terms of > >> +* Usefulness to the user > >> +* Code churn, complexity or impact on other system properties > >> +* Extra time to implement and report back to the reviewer > >> + > >> +If you believe that the impact/cost is too high, report back to the > >> reviewer. To resolve > >> +this, it is advisable to > >> +* Report your findings > >> +* And then check whether this was merely an interesting suggestion, or > >> something the > >> +reviewer feels more strongly about > >> + > >> +In the latter case, there are typically several common outcomes > >> +* The **author and reviewer agree** that the suggestion should be > >> implemented > >> +* The **author and reviewer agree** that it may make sense to defer > >> implementation > >> +* The **author and reviewer agree** that it makes no sense to implement > >> the suggestion > >> + > >> +The author of a patch would typically suggest their preferred outcome, > >> for example > >> +> I am not sure it is worth to implement XXX > >> +> Do you think this could be done as a separate patch in future? > >> + > >> +In cases, where no agreement can be found, the best approach would be to > >> get an > >> +independent opinion from another maintainer or the project's leadership > >> team. > > > > I think we should mention somewhere here that it is recommended for > > reviewers to be explicit about whether a request is optional or whether > > it is a requirement. > > > > For instance: "I think it would be good if X also did Y" doesn't say if > > it is optional (future work) or it is actually required as part of this > > series. More explicit word choices are preferable, such as: > > > > "I think it would be good if X also did Y, not a requirement but good to > > have." > > > > "I think it would be good if X also did Y and it should be part of this > > series." > > I think without it being made explicit that something is optional, > the assumption should be that it isn't. I.e. in the first example > I agree with the idea to have something after the comma, but in > the second example I think the extra wording is a waste of effort. If you are concerned about brevity, then a better example would be: "X should also do Y" -> required "It would be good if X also did Y, just optional." -> optional I didn't want to go that far but we could even have an actually tag, like: "X should also do Y [REQ]" "X should also do Y [OPT]" On the default, let me premise that at the end of the day I agree that the default should be that "it is required", because it is probably what it makes more sense. That said, the issue is that as human beings we tend to forget about these things. As an example, I have been trying to apply this simple optional/reply communication style to my own reviews in the last few months and I still forget often to mark them appropriately. If you forget to mark an optional suggestion as such, it might annoy the contributor, thinking that you are requiring something that she doesn't want to do. If we switched the default to optional, then the risk would be that the contributor could ignore it, which is problematic for the reviewer, although we recommend in these docs for the contributor to say what they intend to do about optional suggestions explicitly, and also as a reviewer I always manually check if all my comments from a previous version of the series were addressed anyway. I don't have a good solution to this, I am just sharing my experience. > > I think there is something else we should say on this topic. There is a > > category of things which could be done in multiple ways and it is not > > overtly obvious which one is best. It is done to the maintainer and the > > author personal styles. It is easy to disagree on that. > > > > I think a good recommendation would be for the contributor to try to > > follow the maintainers requests, even if they could be considered > > "style", trusting their experience on the matter. And a good > > recommendation for the maintainer would be to try to let the contributor > > have freedom of implementation choice on things that don't make a > > significant difference. > > I think we try to, but I also think we suffer from too little > clear documentation on e.g. style aspects. Attempts on my part > to address this have mostly (not entirely) lead no-where (lack of > feedback on proposed patches to ./CODING_STYLE). So for the time > being there are (many) aspects where we have de-facto
Re: [Xen-devel] [MirageOS-devel] [PATCH v2 4/6] Add Code Review Guide
On Nov 28, 2019, at 09:05, Lars Kurth wrote: > > On 28/11/2019, 07:37, "Jan Beulich" wrote: > >>On 28.11.2019 14:06, Lars Kurth wrote: >> I can certainly add something on the timing , along the lines of >> * For complex series, consider the time it takes to do reviews (maybe with a >> guide of LOC per hour) and give reviewers enough time to >> * For series with design issues or large questions, try and highlight the >> key open issues in cover letters clearly and solicit feedback from key >> maintainers who can comment on the open issue. The idea is to save both the >> contributor and the reviewers time by focussing on what needs to be resolved >> * Don’t repost a series, unless all review comments are addressed >> or the reviewers asked you to do so. The problem with this is that >> this is somewhat in conflict with the "let's focus on the core >> issues and not get distracted by details early on in a review cycle". >> In other words, this can only work, if reviewers focus on major >> issues in early reviews only and do not focus on style, coding >> standards, etc. > >But this doesn't make much sense either, because then full re-reviews >need to happen anyway on later versions, to also deal with the minor >issues. For RFC kind of series omitting style and alike feedback >certainly makes sense, but as soon as a patch is non-RFC, it should >be considered good to go in by the submitter. > > OK, I think we have a disconnect between ideal and reality. > > I see two issues today > * Key maintainers don't always review RFC series [they end up at the bottom > of the priority list, even though spending time on RFCs will save time > elsewhere later]. So the effect is that then the contributor assumes there > are no major issues and ends it as a proper series > * In practice what has happened often in the past is that design, > architecture, assumption flaws are found in early versions of a series. > - This usually happens because of an oversight or because there was no > design discussion prior to the series being posted and agreed > - Common sense would dictate that the biggest benefit for both the > reviewer, the contributor and the community as a whole would be to try and > focus on such flaws and leave everything aside > - Of course there may be value in doing a detailed reviews of such a series > as there may be bits that are unaffected by such a flaw > - But there will likely be parts which are not: doing a detailed review of > such portions wastes everyone's time > > So coming back to your point. Ideally, it would be nice if we had the > capability to call out parts of a series as "problematic" and treating such > parts differently. We may be able to reuse some "Shift Left" terminology, including citations of previous Xen code reviews to illustrate categories of design issues that can be shifted left: https://devopedia.org/shift-left Rich ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide
On Thu, 28 Nov 2019, Jan Beulich wrote: > On 28.11.2019 01:54, Stefano Stabellini wrote: > > On Thu, 26 Sep 2019, Lars Kurth wrote: > >> From: Lars Kurth > >> > >> This document highlights what reviewers such as maintainers and committers > >> look > >> for when reviewing code. It sets expectations for code authors and provides > >> a framework for code reviewers. > > > > I think the document is missing a couple of things: > > > > - a simple one line statement that possibly the most important thing in > > a code review is to indentify any bugs in the code > > > > - an explanation that requests for major changes to the series should be > > made early on (i.e. let's not change the architecture of a feature at > > v9 if possible) I also made this comment in reply to patch #5. I'll > > let you decide where is the best place for it. > > This needs balancing. People crucial to the evaluation of a new > feature and its implementation simply may not have the time to > reply prior to v9. We've had situations where people posted new > revisions every other day, sometimes even more than one per day. Yes, you are right, it needs balancing. This is not meant to encourage contributors to send 9 versions of a series within a week or two :-) We could say that "contributors should make sure to give enough time to all the key stakeholders to review the series". > As indicated in several other contexts before - imo people not > helping to shoulder the review load should also not have the > expectation that their (large) contributions will be looked at > in due course. I think you are right on this point, and maybe we could add something to that effect ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide
On Nov 28, 2019, at 05:12, Jan Beulich wrote: > > On 28.11.2019 01:54, Stefano Stabellini wrote: >>> On Thu, 26 Sep 2019, Lars Kurth wrote: >>> From: Lars Kurth >>> >>> This document highlights what reviewers such as maintainers and committers >>> look >>> for when reviewing code. It sets expectations for code authors and provides >>> a framework for code reviewers. >> >> I think the document is missing a couple of things: >> >> - a simple one line statement that possibly the most important thing in >> a code review is to indentify any bugs in the code >> >> - an explanation that requests for major changes to the series should be >> made early on (i.e. let's not change the architecture of a feature at >> v9 if possible) I also made this comment in reply to patch #5. I'll >> let you decide where is the best place for it. > > This needs balancing. People crucial to the evaluation of a new > feature and its implementation simply may not have the time to > reply prior to v9. We've had situations where people posted new > revisions every other day, sometimes even more than one per day. > > As indicated in several other contexts before - imo people not > helping to shoulder the review load should also not have the > expectation that their (large) contributions will be looked at > in due course. To make this actionable, we could have: - reviewer demand index: automated index of open patches still in need of review, sorted by decreasing age - review flow control: each new patch submission cites one recent review by the patch submitter, for a patch of comparable size - reviewer supply growth: a bootstrapping guide for new reviewers and submitters, with patterns, anti-patterns, and examples to be emulated Rich ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range
On Thu, 28 Nov 2019, Julien Grall wrote: > > In both cases I see no reason to keep wrong code. > > > > Either the patch will let run Linux 5.4 fine - then the patch should > > definitely be taken. > That's up to Stefano and Peng to provide me information why this is fine. > FAOD, the current justification provided is not acceptable for me. I disagree. This is a typo fix. The original design was never spec compliant. You cannot expect the typo fix to explain why the original behavior is tolerable. That is out of scope and should *not* be required for this fix. We cannot expect typo fixes to go and trigger vgic/gic reworks and deep investigations. This is a wrong expectation now, and going forward. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable-smoke test] 144350: tolerable all pass - PUSHED
flight 144350 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/144350/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 5655ce8b1ec2a82ef080078e41c73bbd536174e1 baseline version: xen 56348df32bbc782e63b6e3fb978b80e015ae76e7 Last test of basis 144346 2019-11-28 11:01:36 Z0 days Testing same since 144350 2019-11-28 15:00:23 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Jan Beulich jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 56348df32b..5655ce8b1e 5655ce8b1ec2a82ef080078e41c73bbd536174e1 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling
On 28.11.2019 17:42, Ian Jackson wrote: > Jan Beulich writes ("Re: [PATCH-for-4.13 v4] Rationalize max_grant_frames and > max_maptrack_frames handling"): >> On 28.11.2019 14:58, Paul Durrant wrote: >>> uint32_t max_vcpus; >>> uint32_t max_evtchn_port; >>> -uint32_t max_grant_frames; >>> -uint32_t max_maptrack_frames; >>> +int32_t max_grant_frames; >>> +int32_t max_maptrack_frames; >> >> While this may want backporting aiui, we need to be a little >> careful with the interface change here. > > A note here in a list discussion, or even in a commit message, is > perhaps not going to be very effective to deal with this. > > How bad would it be to change the names as well as the types ? Hmm, not sure - on one hand this would flag the issue to people consuming the interface. Otoh it wouldn't help people using derived header files (they'd notice far later), and causing breakage like this in stable trees does't look very friendly either. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH-for-4.13 v5] Rationalize max_grant_frames and max_maptrack_frames handling
From: George Dunlap Xen used to have single, system-wide limits for the number of grant frames and maptrack frames a guest was allowed to create. Increasing or decreasing this single limit on the Xen command-line would change the limit for all guests on the system. Later, per-domain limits for these values was created. The system-wide limits became strict limits: domains could not be created with higher limits, but could be created with lower limits. However, that change also introduced a range of different "default" values into various places in the toolstack: - The python libxc bindings hard-coded these values to 32 and 1024, respectively - The libxl default values are 32 and 1024 respectively. - xl will use the libxl default for maptrack, but does its own default calculation for grant frames: either 32 or 64, based on the max possible mfn. These defaults interact poorly with the hypervisor command-line limit: - The hypervisor command-line limit cannot be used to raise the limit for all guests anymore, as the default in the toolstack will effectively override this. - If you use the hypervisor command-line limit to *reduce* the limit, then the "default" values generated by the toolstack are too high, and all guest creations will fail. In other words, the toolstack defaults require any change to be effected by having the admin explicitly specify a new value in every guest. In order to address this, have grant_table_init treat negative values for max_grant_frames and max_maptrack_frames as instructions to use the system-wide default, and have all the above toolstacks default to passing -1 unless a different value is explicitly configured. This restores the old behavior in that changing the hypervisor command-line option can change the behavior for all guests, while retaining the ability to set per-guest values. It also removes the bug that reducing the system-wide max will cause all domains without explicit limits to fail. NOTE: - The Ocaml bindings require the caller to always specify a value, and the code to start a xenstored stubdomain hard-codes these to 4 and 128 respectively; this behavour will not be modified. Signed-off-by: George Dunlap Signed-off-by: Paul Durrant --- Cc: Ian Jackson Cc: Wei Liu Cc: Andrew Cooper Cc: George Dunlap Cc: Jan Beulich Cc: Julien Grall Cc: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Anthony PERARD Cc: "Marek Marczykowski-Górecki" Cc: Volodymyr Babchuk Cc: "Roger Pau Monné" v5: - Remove erroneous __init annotations - Fail out of range command line values with ERANGE - Make opt_max_maptrack_frames static v4: - Add missing braces in xlu_cfg_get_bounded_long() v3: - Make sure that specified values cannot be negative or overflow a signed int v2: - re-worked George's original commit massage a little - fixed the text in xl.conf.5.pod - use -1 as the sentinel value for 'default' and < 0 for checking it --- docs/man/xl.conf.5.pod| 6 +++-- tools/libxl/libxl.h | 4 +-- tools/libxl/libxl_types.idl | 4 +-- tools/libxl/libxlu_cfg.c | 26 -- tools/libxl/libxlutil.h | 2 ++ tools/python/xen/lowlevel/xc/xc.c | 4 +-- tools/xl/xl.c | 15 --- tools/xl/xl_parse.c | 9 --- xen/arch/arm/setup.c | 2 +- xen/arch/x86/setup.c | 4 +-- xen/common/grant_table.c | 45 +++ xen/include/public/domctl.h | 10 --- xen/include/xen/grant_table.h | 10 +++ 13 files changed, 100 insertions(+), 41 deletions(-) diff --git a/docs/man/xl.conf.5.pod b/docs/man/xl.conf.5.pod index 962144e38e..207ab3e77a 100644 --- a/docs/man/xl.conf.5.pod +++ b/docs/man/xl.conf.5.pod @@ -81,13 +81,15 @@ Default: C Sets the default value for the C domain config value. -Default: C<32> on hosts up to 16TB of memory, C<64> on hosts larger than 16TB +Default: value of Xen command line B parameter (or its +default value if unspecified). =item B Sets the default value for the C domain config value. -Default: C<1024> +Default: value of Xen command line B +parameter (or its default value if unspecified). =item B diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 49b56fa1a3..a2a5d321c5 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -364,8 +364,8 @@ */ #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1 -#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32 -#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024 +#define LIBXL_MAX_GRANT_FRAMES_DEFAULT -1 +#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT -1 /* * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 0546d7865a..63e29bb2fb 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -511,8 +511,8 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("vnuma_nodes",
Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling
On 28.11.2019 17:47, Jürgen Groß wrote: > On 28.11.19 17:36, Jan Beulich wrote: >> On 28.11.2019 14:58, Paul Durrant wrote: >>> --- a/xen/common/grant_table.c >>> +++ b/xen/common/grant_table.c >>> @@ -84,11 +84,43 @@ struct grant_table { >>> struct grant_table_arch arch; >>> }; >>> >>> +static int __init parse_gnttab_limit(const char *param, const char *arg, >> >> No __init please here and below, for this being runtime option >> parsing functions. >> >>> + unsigned int *valp) >>> +{ >>> +const char *e; >>> +unsigned long val; >>> + >>> +val = simple_strtoul(arg, , 0); >>> +if ( *e ) >>> +return -EINVAL; >>> + >>> +if ( val <= INT_MAX ) >>> +*valp = val; >>> +else >>> +printk("parameter \"%s\" value \"%s\" is out of range; using value >>> \"%u\"\n", >>> + param, arg, *valp); >> >> Better store INT_MAX in this case rather than leaving the value > > No, INT_MAX is no good idea. In case of this happening at boot time we'd > allocate an array of 2 billion pointers for dom0... But we've been asked for even more. We should let the admin shoot itself in the foot, I think. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling
On 28.11.2019 17:42, Durrant, Paul wrote: >> From: Jan Beulich >> Sent: 28 November 2019 16:37 >> >> On 28.11.2019 14:58, Paul Durrant wrote: >>> +val = simple_strtoul(arg, , 0); >>> +if ( *e ) >>> +return -EINVAL; >>> + >>> +if ( val <= INT_MAX ) >>> +*valp = val; >>> +else >>> +printk("parameter \"%s\" value \"%s\" is out of range; using >> value \"%u\"\n", >>> + param, arg, *valp); >> >> Better store INT_MAX in this case rather than leaving the value >> unchanged? Or otherwise ... >> >>> +return 0; >> >> ... at least don't return success? > > TBH I wasn't sure what the best thing to do was. In the end I opted > for the warning and a successful completion as I thought a failure > would be largely unhelpful. I can change this into an error though. Well, if you return success, then the option should be handled in at least a best effort manner, i.e. by storing INT_MAX as indicated. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling
On 28.11.19 17:36, Jan Beulich wrote: On 28.11.2019 14:58, Paul Durrant wrote: --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -84,11 +84,43 @@ struct grant_table { struct grant_table_arch arch; }; +static int __init parse_gnttab_limit(const char *param, const char *arg, No __init please here and below, for this being runtime option parsing functions. + unsigned int *valp) +{ +const char *e; +unsigned long val; + +val = simple_strtoul(arg, , 0); +if ( *e ) +return -EINVAL; + +if ( val <= INT_MAX ) +*valp = val; +else +printk("parameter \"%s\" value \"%s\" is out of range; using value \"%u\"\n", + param, arg, *valp); Better store INT_MAX in this case rather than leaving the value No, INT_MAX is no good idea. In case of this happening at boot time we'd allocate an array of 2 billion pointers for dom0... Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling
> -Original Message- > From: Jan Beulich > Sent: 28 November 2019 16:37 > To: Durrant, Paul > Cc: xen-devel@lists.xenproject.org; Andrew Cooper > ; Anthony PERARD ; > George Dunlap ; Roger Pau Monné > ; Volodymyr Babchuk ; > George Dunlap ; Ian Jackson > ; Marek Marczykowski-Górecki > ; Stefano Stabellini > ; Konrad Rzeszutek Wilk ; > Julien Grall ; Wei Liu > Subject: Re: [PATCH-for-4.13 v4] Rationalize max_grant_frames and > max_maptrack_frames handling > > On 28.11.2019 14:58, Paul Durrant wrote: > > --- a/xen/common/grant_table.c > > +++ b/xen/common/grant_table.c > > @@ -84,11 +84,43 @@ struct grant_table { > > struct grant_table_arch arch; > > }; > > > > +static int __init parse_gnttab_limit(const char *param, const char > *arg, > > No __init please here and below, for this being runtime option > parsing functions. > Sorry, yes... forgot about the runtime part. > > + unsigned int *valp) > > +{ > > +const char *e; > > +unsigned long val; > > + > > +val = simple_strtoul(arg, , 0); > > +if ( *e ) > > +return -EINVAL; > > + > > +if ( val <= INT_MAX ) > > +*valp = val; > > +else > > +printk("parameter \"%s\" value \"%s\" is out of range; using > value \"%u\"\n", > > + param, arg, *valp); > > Better store INT_MAX in this case rather than leaving the value > unchanged? Or otherwise ... > > > +return 0; > > ... at least don't return success? TBH I wasn't sure what the best thing to do was. In the end I opted for the warning and a successful completion as I thought a failure would be largely unhelpful. I can change this into an error though. > > > +} > > + > > unsigned int __read_mostly opt_max_grant_frames = 64; > > -integer_runtime_param("gnttab_max_frames", opt_max_grant_frames); > > + > > +static int __init parse_gnttab_max_frames(const char *arg) > > +{ > > +return parse_gnttab_limit("gnttab_max_frames", arg, > > + _max_grant_frames); > > +} > > +custom_runtime_param("gnttab_max_frames", parse_gnttab_max_frames); > > > > unsigned int __read_mostly opt_max_maptrack_frames = 1024; > > As indicated this wants to become static now. Sorry I forgot about that. Paul ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling
Jan Beulich writes ("Re: [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling"): > On 28.11.2019 14:58, Paul Durrant wrote: > > uint32_t max_vcpus; > > uint32_t max_evtchn_port; > > -uint32_t max_grant_frames; > > -uint32_t max_maptrack_frames; > > +int32_t max_grant_frames; > > +int32_t max_maptrack_frames; > > While this may want backporting aiui, we need to be a little > careful with the interface change here. A note here in a list discussion, or even in a commit message, is perhaps not going to be very effective to deal with this. How bad would it be to change the names as well as the types ? Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses
On 28.11.2019 12:44, Andrew Cooper wrote: > c/s d0a699a389f1 "x86/monitor: add support for descriptor access events" > introduced logic looking for what appeared to be exitinfo (not that this > exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring > information. There is never any IDT vectoring involved in these intercepts so > the value passed is always zero. > > In fact, SVM doesn't provide any information, even in exitinfo1 and 2. Note > the error in the public API and state that this field is always 0, and drop > the SVM logic in hvm_monitor_descriptor_access(). > > In the SVM vmexit handler itself, optimise the switch statement by observing > that there is a linear transformation between the SVM exit_reason and > VM_EVENT_DESC_* values. (Bloat-o-meter reports 6028 => 5877 for a saving of > 151 bytes). > > Signed-off-by: Andrew Cooper SVM part of the change Acked-by: Jan Beulich Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling
On 28.11.2019 14:58, Paul Durrant wrote: > --- a/xen/common/grant_table.c > +++ b/xen/common/grant_table.c > @@ -84,11 +84,43 @@ struct grant_table { > struct grant_table_arch arch; > }; > > +static int __init parse_gnttab_limit(const char *param, const char *arg, No __init please here and below, for this being runtime option parsing functions. > + unsigned int *valp) > +{ > +const char *e; > +unsigned long val; > + > +val = simple_strtoul(arg, , 0); > +if ( *e ) > +return -EINVAL; > + > +if ( val <= INT_MAX ) > +*valp = val; > +else > +printk("parameter \"%s\" value \"%s\" is out of range; using value > \"%u\"\n", > + param, arg, *valp); Better store INT_MAX in this case rather than leaving the value unchanged? Or otherwise ... > +return 0; ... at least don't return success? > +} > + > unsigned int __read_mostly opt_max_grant_frames = 64; > -integer_runtime_param("gnttab_max_frames", opt_max_grant_frames); > + > +static int __init parse_gnttab_max_frames(const char *arg) > +{ > +return parse_gnttab_limit("gnttab_max_frames", arg, > + _max_grant_frames); > +} > +custom_runtime_param("gnttab_max_frames", parse_gnttab_max_frames); > > unsigned int __read_mostly opt_max_maptrack_frames = 1024; As indicated this wants to become static now. > --- a/xen/include/public/domctl.h > +++ b/xen/include/public/domctl.h > @@ -82,13 +82,15 @@ struct xen_domctl_createdomain { > uint32_t iommu_opts; > > /* > - * Various domain limits, which impact the quantity of resources (global > - * mapping space, xenheap, etc) a guest may consume. > + * Various domain limits, which impact the quantity of resources > + * (global mapping space, xenheap, etc) a guest may consume. For > + * max_grant_frames and max_maptrack_frames, < 0 means "use the > + * default maximum value in the hypervisor". > */ > uint32_t max_vcpus; > uint32_t max_evtchn_port; > -uint32_t max_grant_frames; > -uint32_t max_maptrack_frames; > +int32_t max_grant_frames; > +int32_t max_maptrack_frames; While this may want backporting aiui, we need to be a little careful with the interface change here. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.2019 15:30, Roger Pau Monné wrote: > On Thu, Nov 28, 2019 at 03:19:50PM +0100, Jan Beulich wrote: >> On 28.11.2019 15:13, Roger Pau Monné wrote: >>> On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote: On 28.11.2019 12:39, Roger Pau Monné wrote: > On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote: >> At the time the pending EOI stack was introduced there were no >> internally used IRQs which would have the LAPIC EOI issued from the >> ->end() hook. This had then changed with the introduction of IOMMUs, >> but the interaction issue was presumably masked by >> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early >> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer >> running without need"]). >> >> The problem is that with us re-enabling interrupts across handler >> invocation, a higher priority (guest) interrupt may trigger while >> handling a lower priority (internal) one. The EOI issued from >> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly >> EOI the higher priority (guest) interrupt, breaking (among other >> things) pending EOI stack logic's assumptions. > > Maybe there's something that I'm missing, but shouldn't hypervisor > vectors always be higher priority than guest ones? Depends - IOMMU ones imo aren't something that needs urgently dealing with, so a little bit of delay won't hurt. There would only be a problem if such interrupts could be deferred indefinitely. > I see there's already a range reserved for high priority vectors > ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor > interrupts not using this range? We'd quickly run out of high priority vectors on systems with multiple (and perhaps indeed many) IOMMUs. >>> >>> Well, there's no limit on the number of high priority vectors, since >>> this is all a software abstraction. It only matters that such vectors >>> are higher than guest owned ones. >>> >>> I have to take a look, but I would think that Xen used vectors are the >>> first ones to be allocated, and hence could start from >>> FIRST_HIPRIORITY_VECTOR - 1 and go down from there. >> >> If this was the case, then we wouldn't have observed the issue (despite >> it being there) this patch tries to address. The IOMMUs for both Andrew >> and me ended up using vector 0x28, below everything that e.g. the >> IO-APIC RTE got assigned. > > I know it's not like that ATM, and hence I wonder whether it would be > possible to make it so: Xen vectors get allocated down from > FIRST_HIPRIORITY_VECTOR - 1 and then we won't have this issue. > >> Also don't forget that we don't allocate >> vectors continuously, but such that they'd get spread across the >> different priority levels. (Whether that's an awfully good idea is a >> separate question.) > > Well, vectors used by Xen would be allocated downwards continuously > from FIRST_HIPRIORITY_VECTOR - 1, and hence won't be spread. > > Guest used vectors could continue to use the same allocation > mechanism, since that's a different issue. The issue would go away only if guest vectors are at strictly lower priority than Xen ones. I.e. we'd need to go in steps of 16. And there aren't that many vectors ... (I'm happy to see changes here, but it'll need to be very careful ones.) Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v2 0/3] x86/hvm: Multiple corrections to task switch handling
On 26.11.19 13:03, Andrew Cooper wrote: These patches want backporting due to the severity of patch 2. They should therefore be considered for 4.13 at this point. Andrew Cooper (3): x86/vtx: Fix fault semantics for early task switch failures x86/svm: Always intercept ICEBP x86/svm: Write the correct %eip into the outgoing task xen/arch/x86/hvm/hvm.c| 4 +- xen/arch/x86/hvm/svm/emulate.c| 54 xen/arch/x86/hvm/svm/svm.c| 77 ++- xen/arch/x86/hvm/svm/vmcb.c | 2 +- xen/arch/x86/hvm/vmx/vmx.c| 4 +- xen/arch/x86/monitor.c| 3 -- xen/include/asm-x86/hvm/hvm.h | 13 +- xen/include/asm-x86/hvm/svm/emulate.h | 1 + 8 files changed, 109 insertions(+), 49 deletions(-) For the series: Release-acked-by: Juergen Gross Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling
On 28.11.19 16:33, Hans van Kranenburg wrote: On 11/28/19 3:54 PM, George Dunlap wrote: On 11/27/19 10:32 PM, Hans van Kranenburg wrote: Hi all, On 11/27/19 12:13 PM, Durrant, Paul wrote: -Original Message- From: Ian Jackson Sent: 27 November 2019 11:10 [...] Subject: RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling Durrant, Paul writes ("RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling"): -Original Message- From: Xen-devel On Behalf Of Ian Jackson I have seen reports of users who ran out of grant/maptrack frames because of updates to use multiring protocols etc. The error messages are not very good and the recommended workaround has been to increase the default limit on the hypervisor command line. It is important that we don't break that workaround! Alas it has apparently been broken for several releases now :-( I guess at least in Debian (where I have seen this) we haven't released with any affected versions yet... I believe the problem was introduce in 4.10, so I think it would be prudent to also back-port the final fix to stable trees from then on. Yes, the max grant frame issue has historically always been a painful experience for end users, and Xen 4.11 which we now have in the current Debian stable has made it worse compared to previous versions indeed. This rather suggests that the default value isn't very well chosen. Ideally some investigation would be done to improve the default sizing; end-users shouldn't have to know anything about grant table frames. Most of the problems started happening a few years ago when using a newer Linux that got all kinds of multiqueue block stuff for disk and network enabled on top of an older Xen. (e.g. in Debian using the Linux 4.9 backports kernel on top of Xen 4.4 in Jessie). The default for the hypervisor option has already been doubled from 32 to 64, which I think is sufficient. However, having the toolstack revert it back to 32 again is not very helpful, but that's what this thread is about to solve. :) A while ago I did some testing: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=880554#119 I haven't been able to cause nr_frames to go over 64 in any test myself, and also have never seen values that high in production use. The above debian bug also does not contain any other report from anyone with a number above 64. There are reports of users setting it to 256 and then not caring about it any more, but they didn't report the xen_diag output back after that, so there's no real data. I have seen guests needing 256. My Linux kernel patches reducing the default max. number of queues in netfront/netback to 8 made things much better (on a large host running a guest with 64 vcpus using 8 network interfaces was blowing up rather fast). Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling
On 11/28/19 3:54 PM, George Dunlap wrote: > On 11/27/19 10:32 PM, Hans van Kranenburg wrote: >> Hi all, >> >> On 11/27/19 12:13 PM, Durrant, Paul wrote: -Original Message- From: Ian Jackson Sent: 27 November 2019 11:10 [...] Subject: RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling Durrant, Paul writes ("RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling"): >> -Original Message- >> From: Xen-devel On Behalf Of Ian >> Jackson >> I have seen reports of users who ran out of grant/maptrack frames >> because of updates to use multiring protocols etc. The error messages >> are not very good and the recommended workaround has been to increase >> the default limit on the hypervisor command line. >> >> It is important that we don't break that workaround! > > Alas it has apparently been broken for several releases now :-( I guess at least in Debian (where I have seen this) we haven't released with any affected versions yet... >>> >>> I believe the problem was introduce in 4.10, so I think it would be prudent >>> to also back-port the final fix to stable trees from then on. >> >> Yes, the max grant frame issue has historically always been a painful >> experience for end users, and Xen 4.11 which we now have in the current >> Debian stable has made it worse compared to previous versions indeed. > > This rather suggests that the default value isn't very well chosen. > Ideally some investigation would be done to improve the default sizing; > end-users shouldn't have to know anything about grant table frames. Most of the problems started happening a few years ago when using a newer Linux that got all kinds of multiqueue block stuff for disk and network enabled on top of an older Xen. (e.g. in Debian using the Linux 4.9 backports kernel on top of Xen 4.4 in Jessie). The default for the hypervisor option has already been doubled from 32 to 64, which I think is sufficient. However, having the toolstack revert it back to 32 again is not very helpful, but that's what this thread is about to solve. :) A while ago I did some testing: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=880554#119 I haven't been able to cause nr_frames to go over 64 in any test myself, and also have never seen values that high in production use. The above debian bug also does not contain any other report from anyone with a number above 64. There are reports of users setting it to 256 and then not caring about it any more, but they didn't report the xen_diag output back after that, so there's no real data. Hans ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28/11/2019 14:00, Jan Beulich wrote: > On 28.11.2019 14:53, Andrew Cooper wrote: >> On 28/11/2019 12:10, Andrew Cooper wrote: >>> On 28/11/2019 11:03, Jan Beulich wrote: Notes: - In principle we could get away without the check_eoi_deferral flag. I've introduced it just to make sure there's as little change as possible to unaffected paths. - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't strictly necessary. >>> I don't think the cpu_has_pending_apic_eoi() check is necessary. It is >>> checked at the head of end_nonmaskable_irq() as well. >>> >>> Similarly, I'm not sure that check_eoi_deferral is something that we'd >>> want to introduce. >>> >>> I'll drop both of these and test, seeing as I have a repro of the problem. >> Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so >> I didn't), and dropping just check_eoi_deferral on its own definitely >> breaks things. >> >> Given the 4.13 timeline, lets go with it in this form, seeing as it is >> the version which had all of last night's worth of testing. >> >> Acked-by: Andrew Cooper >> Tested-by: Andrew Cooper > Thanks! I've taken note to produce a patch (if possible at all, given > the results of your attempt) to remove the extra pieces again, ideally > to go in pretty soon after the branching. Just for morbid curiosity, this is what happens without the check_eoi_deferral check: [ 383.845620] dracut-initqueue[302]: Warning: Could not boot. [ 383.892655] dracut-initqueue[302]: Warning: /dev/disk/by-label/root-qomhwa does not exist Starting Dracut Emergency Shell... dracut:/# [ 439.282461] BUG: unable to handle kernel NULL pointer dereference at 09a0 [ 439.282773] PGD 0 P4D 0 [ 439.282928] Oops: [#1] SMP NOPTI [ 439.283088] CPU: 0 PID: 281 Comm: systemd-udevd Tainted: G O 4.19.0+1 #1 [ 439.283337] Hardware name: Dell Inc. PowerEdge R415/0GXH08, BIOS 2.0.2 10/22/2012 [ 439.283591] RIP: e030:mptscsih_remove+0x5b/0xb0 [mptscsih] [ 439.283751] Code: ff 74 1e 8b 83 9c 01 00 00 44 8d 2c c5 00 00 00 00 e8 c9 4d e9 c0 48 c7 83 c0 0c 00 00 00 00 00 00 f6 83 c8 00 00 00 01 75 34 <48> 8b bd a0 [ 439.284153] RSP: e02b:c900404b7b50 EFLAGS: 00010246 [ 439.284312] RAX: 0049 RBX: 88809a7d9000 RCX: 0006 [ 439.284477] RDX: RSI: 0001 RDI: [ 439.284644] RBP: R08: R09: 0384 [ 439.284810] R10: 0004 R11: R12: 88809a7cc000 [ 439.284975] R13: R14: c038b180 R15: c900404b7e98 [ 439.285149] FS: 7ff0922a38c0() GS:8880a360() knlGS: [ 439.285428] CS: e033 DS: ES: CR0: 80050033 [ 439.285590] CR2: 09a0 CR3: 04012000 CR4: 0660 [ 439.285765] Call Trace: [ 439.285924] mptsas_probe+0x384/0x530 [mptsas] [ 439.286133] pci_device_probe+0xc9/0x140 [ 439.286297] really_probe+0x238/0x3e0 [ 439.286455] driver_probe_device+0x115/0x130 [ 439.286622] __driver_attach+0x103/0x110 [ 439.286805] ? driver_probe_device+0x130/0x130 [ 439.286964] bus_for_each_dev+0x67/0xc0 [ 439.287120] bus_add_driver+0x41/0x260 [ 439.287274] ? 0xc038f000 [ 439.287426] driver_register+0x5b/0xe0 [ 439.287578] ? 0xc038f000 [ 439.287732] mptsas_init+0x114/0x1000 [mptsas] [ 439.287890] do_one_initcall+0x4e/0x1d4 [ 439.288045] ? _cond_resched+0x15/0x30 [ 439.288232] ? kmem_cache_alloc_trace+0x15f/0x1c0 [ 439.288392] do_init_module+0x5a/0x21b [ 439.288546] load_module+0x2270/0x2800 [ 439.288700] ? m_show+0x1c0/0x1c0 [ 439.288854] __do_sys_finit_module+0x94/0xe0 [ 439.289012] do_syscall_64+0x4e/0x100 [ 439.289168] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 439.289326] RIP: 0033:0x7ff090ef9ec9 [ 439.289482] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f8 [ 439.289871] RSP: 002b:7ffd96d355d8 EFLAGS: 0246 ORIG_RAX: 0139 [ 439.290120] RAX: ffda RBX: 5581851f9f70 RCX: 7ff090ef9ec9 [ 439.290287] RDX: RSI: 7ff09181a099 RDI: 000c [ 439.290458] RBP: 7ff09181a099 R08: R09: 5581851f6670 [ 439.290625] R10: 000c R11: 0246 R12: [ 439.290792] R13: 5581851f6240 R14: 0002 R15: [ 439.290993] Modules linked in: ata_generic pata_acpi ohci_pci ahci serio_raw pata_atiixp libahci mptsas(+) scsi_transport_sas ehci_pci bnx2(O) libata mptscst [ 439.291484] CR2: 09a0 [ 439.291665] ---[ end trace 519b6f631d8509d1 ]--- [ 439.295635] RIP: e030:mptscsih_remove+0x5b/0xb0 [mptscsih] [ 439.295804] Code: ff 74 1e 8b 83 9c 01 00 00 44 8d 2c c5 00 00 00 00 e8 c9 4d e9 c0 48 c7 83 c0 0c 00 00 00 00 00 00 f6 83
Re: [Xen-devel] [PATCH LP-BUILD_TOOLS] Fix building with updated ENFORCE_UNIQUE_SYMBOLS behaviour
On 28/11/2019 14:51, Ross Lagerwall wrote: > The patch "build: provide option to disambiguate symbol names" changes > ENFORCE_UNIQUE_SYMBOLS so that gcc generates output to a temporary file > and then objcopy is used to create the final object file. This breaks > livepatch-build's interposition of GCC to capture the changed object > files so intercept calls to objcopy as well to capture the final object > files. > > While in the area, add a couple of extra object files to be ignored when > patching. > > Signed-off-by: Ross Lagerwall > --- > > With this change, I've built and successfully applied a trivial > livepatch with Jan's patch applied and ENFORCE_UNIQUE_SYMBOLS turned on. Can confirm the same with my test LP. So Tested-by: Sergey Dyasli can go to both this and Jan's patch I guess. -- Thanks, Sergey ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses
Hello, Andrew Cooper writes: > c/s d0a699a389f1 "x86/monitor: add support for descriptor access events" > introduced logic looking for what appeared to be exitinfo (not that this > exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring > information. There is never any IDT vectoring involved in these intercepts so > the value passed is always zero. > > In fact, SVM doesn't provide any information, even in exitinfo1 and 2. Note > the error in the public API and state that this field is always 0, and drop > the SVM logic in hvm_monitor_descriptor_access(). > > In the SVM vmexit handler itself, optimise the switch statement by observing > that there is a linear transformation between the SVM exit_reason and > VM_EVENT_DESC_* values. (Bloat-o-meter reports 6028 => 5877 for a saving of > 151 bytes). > > Signed-off-by: Andrew Cooper > --- > CC: Jan Beulich > CC: Wei Liu > CC: Roger Pau Monné > CC: Razvan Cojocaru > CC: Tamas K Lengyel > CC: Alexandru Isaila > CC: Petre Pircalabu > CC: Adrian Pop > > Adrian: Do you recall what information you were attempting to forward from the > VMCB? I can't locate anything which would plausibly be interesting. The SVM part was most likely meant to mirror the logic from VMX. But, as I recall, monitoring hadn't been implemented with SVM so this couldn't really be used on AMD. Unfortunately I'm not sure what information was supposed to be forwarded. The cleanup looks good to me. Acked-by: Adrian Pop > This is part of a longer cleanup series I gathered in the wake of the task > switch issues, but it is pulled out ahead due to its interaction with the > public interface. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On Thu, Nov 28, 2019 at 01:10:05PM +, Andrew Cooper wrote: > On 28/11/2019 10:17, George Dunlap wrote: > >> On Nov 28, 2019, at 9:55 AM, Jan Beulich wrote: > > Has anyone actually tried building a livepatch with this change in > > place? > Actually - what is your concern here? The exact spelling of symbols > names should be of no interest to the tool. After all the compiler is > free to invent all sorts of names for its local symbols, including > the ones we would produce with this change in place. All the tool > cares about is that the names be unambiguous. Hence any (theoretical) > regression here would be a bug in the tools, which imo is no reason > to delay this change any further. (Granted I should have got to it > earlier, but it had been continuing to get deferred.) > >>> This might all be true (theoretically). > >>> > >>> The livepatch build tools are fragile and very sensitive to how the > >>> object files are laid out. There is a very real risk that this change > >>> accidentally breaks livepatching totally, even on GCC builds. > >>> > >>> Were this to happen, it would be yet another 4.13 regression. > >> It's perhaps a matter of perception, but I'd still call this a > >> live patching tools bug then, not a 4.13 regression. > > After the discussion yesterday, I was thinking a bit more about this, and > > I’m not happy with the principle Andy seems to be operating on, > > I'm sorry that you feel that way. > > > that it’s reasonable to completely block a bug-fixing patch to Xen, forcing > > a work-around to be used in a release, because it has unknown effects on > > livepatching. > > This is not a fair characterisation of what is going on here. Ignore > the specifics of this patch - they are not relevant to my objection. > > As a maintainer, it is my responsibility to ensure that crap doesn't get > committed. > It is fine to have differing opinions; it is not fine to make an emotionally charged comment like this. It serves no-one's interest. Wei. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling
On 11/27/19 10:32 PM, Hans van Kranenburg wrote: > Hi all, > > On 11/27/19 12:13 PM, Durrant, Paul wrote: >>> -Original Message- >>> From: Ian Jackson >>> Sent: 27 November 2019 11:10 >>> [...] >>> Subject: RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames >>> and max_maptrack_frames handling >>> >>> Durrant, Paul writes ("RE: [Xen-devel] [PATCH for-4.13 2/2] Rationalize >>> max_grant_frames and max_maptrack_frames handling"): > -Original Message- > From: Xen-devel On Behalf Of >>> Ian > Jackson > I have seen reports of users who ran out of grant/maptrack frames > because of updates to use multiring protocols etc. The error messages > are not very good and the recommended workaround has been to increase > the default limit on the hypervisor command line. > > It is important that we don't break that workaround! Alas it has apparently been broken for several releases now :-( >>> >>> I guess at least in Debian (where I have seen this) we haven't >>> released with any affected versions yet... >> >> I believe the problem was introduce in 4.10, so I think it would be prudent >> to also back-port the final fix to stable trees from then on. > > Yes, the max grant frame issue has historically always been a painful > experience for end users, and Xen 4.11 which we now have in the current > Debian stable has made it worse compared to previous versions indeed. This rather suggests that the default value isn't very well chosen. Ideally some investigation would be done to improve the default sizing; end-users shouldn't have to know anything about grant table frames. -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH LP-BUILD_TOOLS] Fix building with updated ENFORCE_UNIQUE_SYMBOLS behaviour
The patch "build: provide option to disambiguate symbol names" changes ENFORCE_UNIQUE_SYMBOLS so that gcc generates output to a temporary file and then objcopy is used to create the final object file. This breaks livepatch-build's interposition of GCC to capture the changed object files so intercept calls to objcopy as well to capture the final object files. While in the area, add a couple of extra object files to be ignored when patching. Signed-off-by: Ross Lagerwall --- With this change, I've built and successfully applied a trivial livepatch with Jan's patch applied and ENFORCE_UNIQUE_SYMBOLS turned on. To be applied at the same time as Jan's patch. livepatch-gcc | 22 ++ 1 file changed, 22 insertions(+) diff --git a/livepatch-gcc b/livepatch-gcc index 01e4b8c..91333d5 100755 --- a/livepatch-gcc +++ b/livepatch-gcc @@ -26,6 +26,7 @@ declare -a args=("$@") keep=no declare -r GCC_RE='gcc.*' +declare -r OBJCOPY_RE='objcopy.*' if [[ "$TOOLCHAINCMD" =~ $GCC_RE ]] ; then while [ "$#" -gt 0 ]; do if [ "$1" = "-o" ]; then @@ -34,7 +35,9 @@ if [[ "$TOOLCHAINCMD" =~ $GCC_RE ]] ; then case "$obj" in version.o|\ debug.o|\ +efi/check.o|\ *.xen-syms.*.o|\ +*.xen.efi.*.o|\ built_in.o|\ prelink.o|\ .*.o) @@ -56,6 +59,25 @@ if [[ "$TOOLCHAINCMD" =~ $GCC_RE ]] ; then fi shift done +elif [[ "$TOOLCHAINCMD" =~ $OBJCOPY_RE ]] ; then +obj="${!#}" +case "$obj" in +version.o|\ +debug.o|\ +efi/check.o|\ +.*.o) +;; +*.o) +path="$(pwd)/$(dirname $obj)" +dir="${path#$LIVEPATCH_BUILD_DIR}" +if [ -n "$LIVEPATCH_CAPTURE_DIR" -a -d "$LIVEPATCH_CAPTURE_DIR" ]; then +echo "$dir/$obj" >> "${LIVEPATCH_CAPTURE_DIR}/changed_objs" +keep=yes +fi +;; +*) +;; +esac fi "$TOOLCHAINCMD" "${args[@]}" -- 2.21.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On 11/28/19 1:57 PM, Jan Beulich wrote: > On 28.11.2019 14:54, Ross Lagerwall wrote: >> On 11/28/19 1:49 PM, Jan Beulich wrote: >>> On 28.11.2019 13:15, Sergey Dyasli wrote: Applying the patch didn't end up well for my test LP (from another thread): Perform full initial build with 8 CPU(s)... Reading special section data Apply patch and build with 8 CPU(s)... Unapply patch and build with 8 CPU(s)... Extracting new and modified ELF sections... Processing xen/arch/x86/mm/shadow/guest_2.o Processing xen/arch/x86/mm/shadow/guest_4.o Processing xen/arch/x86/mm/shadow/guest_3.o Processing xen/arch/x86/mm/guest_walk_3.o Processing xen/arch/x86/mm/hap/guest_walk_3level.o Processing xen/arch/x86/mm/hap/guest_walk_4level.o Processing xen/arch/x86/mm/hap/guest_walk_2level.o Processing xen/arch/x86/mm/guest_walk_2.o Processing xen/arch/x86/mm/guest_walk_4.o Processing xen/arch/x86/efi/efi/check.o Processing xen/arch/x86/pv/gpr_switch.o Processing xen/arch/x86/indirect-thunk.o Processing xen/arch/x86/boot/head.o Processing xen/arch/x86/x86_64/kexec_reloc.o Processing xen/arch/x86/x86_64/compat/entry.o Processing xen/arch/x86/x86_64/entry.o Processing xen/arch/x86/hvm/vmx/entry.o Processing xen/arch/x86/hvm/svm/entry.o Processing xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0s.o Processing xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0r.o Processing xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1s.o Processing xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1r.o ERROR: no functional changes found. So this looks like a regression. >>> >>> Thanks for doing the testing. But what am I to conclude from >>> the above? I can't even tell why "no functional changes found" >>> is an error. >>> >> >> It's due to the way livepatch-build tool interposes on the build to capture >> changed object files for later comparison. Now that objcopy writes out the >> proper object files rather than gcc (which just writes a temporary one), the >> livepatch-build tool needs some adjustment otherwise it doesn't capture any >> changed files. I'm working on a patch. > > For my own education, and just if you have the time: Why would there > be any dependency on which build utility produces the object file? > It uses CROSS_COMPILE to funnel all build output to a script (https://xenbits.xen.org/gitweb/?p=livepatch-build-tools.git;a=blob;f=livepatch-gcc) which then notes changed object files by processing gcc's command-line. If objcopy is used instead of gcc to produce the final output then the script processes gcc's command-line but doesn't get the output it expects so no changes are detected. Yes, this is hacky -- improvements are welcome! -- Ross Lagerwall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/3] arm64: remove uaccess_ttbr0 asm macros from cache functions
On Wed, Nov 27, 2019 at 01:44:52PM -0500, Pavel Tatashin wrote: > We currently duplicate the logic to enable/disable uaccess via TTBR0, > with C functions and assembly macros. This is a maintenenace burden > and is liable to lead to subtle bugs, so let's get rid of the assembly > macros, and always use the C functions. This requires refactoring > some assembly functions to have a C wrapper. [...] > +static inline int invalidate_icache_range(unsigned long start, > + unsigned long end) > +{ > + int rv; Please make this 'ret', for consistency with other arm64 code. We only use 'rv' in one place where it's short for "Revision and Variant", and 'ret' is our usual name for a temporary variable used to hold a return value. > + > + if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC)) { > + isb(); > + return 0; > + } > + uaccess_ttbr0_enable(); Please place a newline between these two, for consistency with other arm64 code. > + rv = asm_invalidate_icache_range(start, end); > + uaccess_ttbr0_disable(); > + > + return rv; > +} > + > static inline void flush_icache_range(unsigned long start, unsigned long end) > { > __flush_icache_range(start, end); > diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S > index db767b072601..a48b6dba304e 100644 > --- a/arch/arm64/mm/cache.S > +++ b/arch/arm64/mm/cache.S > @@ -15,7 +15,7 @@ > #include > > /* > - * flush_icache_range(start,end) > + * __asm_flush_icache_range(start,end) > * > * Ensure that the I and D caches are coherent within specified region. > * This is typically used when code has been written to a memory region, > @@ -24,11 +24,11 @@ > * - start - virtual start address of region > * - end - virtual end address of region > */ > -ENTRY(__flush_icache_range) > +ENTRY(__asm_flush_icache_range) > /* FALLTHROUGH */ > > /* > - * __flush_cache_user_range(start,end) > + * __asm_flush_cache_user_range(start,end) > * > * Ensure that the I and D caches are coherent within specified region. > * This is typically used when code has been written to a memory region, > @@ -37,8 +37,7 @@ ENTRY(__flush_icache_range) > * - start - virtual start address of region > * - end - virtual end address of region > */ > -ENTRY(__flush_cache_user_range) > - uaccess_ttbr0_enable x2, x3, x4 > +ENTRY(__asm_flush_cache_user_range) > alternative_if ARM64_HAS_CACHE_IDC It's unfortunate that we pulled the IDC alternative out for invalidate_icache_range(), but not here. If we want to pull out that, then I reckon what we might want to do is have two asm primitives: * __asm_clean_dcache_range * __asm_invalidate_icache_range ... which just do the by_line op, with a fixup that can return -EFAULT. Then we can give each a C wrapper that just does the IDC/DIC check, e.g. static int __clean_dcache_range(unsigned long start, unsigned long end) { if (cpus_have_const_cap(ARM64_HAS_CACHE_IDC)) { dsb(ishst); return 0; } return __asm_clean_dcache_range(start, end); } Then we can build all the more complicated variants atop of those, e.g. static int __flush_cache_user_range(unsigned long start, unsigned long end) { int ret; uaccess_ttbr0_enable(); ret = __clean_dcache_range(start, end); if (ret) goto out; ret = __invalidate_icache_range(start, end); out: uaccess_ttbr0_disable(); return ret; } Thanks, Mark. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses
On 28.11.2019 13:44, Andrew Cooper wrote: > c/s d0a699a389f1 "x86/monitor: add support for descriptor access events" > introduced logic looking for what appeared to be exitinfo (not that this > exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring > information. There is never any IDT vectoring involved in these intercepts so > the value passed is always zero. > > In fact, SVM doesn't provide any information, even in exitinfo1 and 2. Note > the error in the public API and state that this field is always 0, and drop > the SVM logic in hvm_monitor_descriptor_access(). > > In the SVM vmexit handler itself, optimise the switch statement by observing > that there is a linear transformation between the SVM exit_reason and > VM_EVENT_DESC_* values. (Bloat-o-meter reports 6028 => 5877 for a saving of > 151 bytes). > > Signed-off-by: Andrew Cooper Reviewed-by: Alexandru Isaila I agree with Tamas, good thing to have that field removed. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On Thu, Nov 28, 2019 at 03:19:50PM +0100, Jan Beulich wrote: > On 28.11.2019 15:13, Roger Pau Monné wrote: > > On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote: > >> On 28.11.2019 12:39, Roger Pau Monné wrote: > >>> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote: > At the time the pending EOI stack was introduced there were no > internally used IRQs which would have the LAPIC EOI issued from the > ->end() hook. This had then changed with the introduction of IOMMUs, > but the interaction issue was presumably masked by > irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early > (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer > running without need"]). > > The problem is that with us re-enabling interrupts across handler > invocation, a higher priority (guest) interrupt may trigger while > handling a lower priority (internal) one. The EOI issued from > ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly > EOI the higher priority (guest) interrupt, breaking (among other > things) pending EOI stack logic's assumptions. > >>> > >>> Maybe there's something that I'm missing, but shouldn't hypervisor > >>> vectors always be higher priority than guest ones? > >> > >> Depends - IOMMU ones imo aren't something that needs urgently > >> dealing with, so a little bit of delay won't hurt. There would > >> only be a problem if such interrupts could be deferred > >> indefinitely. > >> > >>> I see there's already a range reserved for high priority vectors > >>> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor > >>> interrupts not using this range? > >> > >> We'd quickly run out of high priority vectors on systems with > >> multiple (and perhaps indeed many) IOMMUs. > > > > Well, there's no limit on the number of high priority vectors, since > > this is all a software abstraction. It only matters that such vectors > > are higher than guest owned ones. > > > > I have to take a look, but I would think that Xen used vectors are the > > first ones to be allocated, and hence could start from > > FIRST_HIPRIORITY_VECTOR - 1 and go down from there. > > If this was the case, then we wouldn't have observed the issue (despite > it being there) this patch tries to address. The IOMMUs for both Andrew > and me ended up using vector 0x28, below everything that e.g. the > IO-APIC RTE got assigned. I know it's not like that ATM, and hence I wonder whether it would be possible to make it so: Xen vectors get allocated down from FIRST_HIPRIORITY_VECTOR - 1 and then we won't have this issue. > Also don't forget that we don't allocate > vectors continuously, but such that they'd get spread across the > different priority levels. (Whether that's an awfully good idea is a > separate question.) Well, vectors used by Xen would be allocated downwards continuously from FIRST_HIPRIORITY_VECTOR - 1, and hence won't be spread. Guest used vectors could continue to use the same allocation mechanism, since that's a different issue. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable test] 144344: tolerable FAIL - PUSHED
flight 144344 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/144344/ Failures :-/ but no regressions. Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-rtds 16 guest-localmigrate fail REGR. vs. 144323 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 144323 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144323 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144323 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 144323 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144323 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 144323 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail like 144323 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144323 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 144323 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144323 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: xen 9a400d1797ec7f77ffefeb5c4e17a8c2e8b91a12 baseline version: xen 5530782cfe70ed22fe44358f6a10c38916443b42 Last test of basis 144323 2019-11-27 12:02:43 Z1 days Testing same since 144344 2019-11-28 02:00:09 Z0 days1 attempts People who touched revisions under test: George Dunlap Ian Jackson Igor Druzhinin Jan Beulich Julien
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On 11/28/19 1:10 PM, Andrew Cooper wrote: > On 28/11/2019 10:17, George Dunlap wrote: >>> On Nov 28, 2019, at 9:55 AM, Jan Beulich wrote: >> Has anyone actually tried building a livepatch with this change in place? > Actually - what is your concern here? The exact spelling of symbols > names should be of no interest to the tool. After all the compiler is > free to invent all sorts of names for its local symbols, including > the ones we would produce with this change in place. All the tool > cares about is that the names be unambiguous. Hence any (theoretical) > regression here would be a bug in the tools, which imo is no reason > to delay this change any further. (Granted I should have got to it > earlier, but it had been continuing to get deferred.) This might all be true (theoretically). The livepatch build tools are fragile and very sensitive to how the object files are laid out. There is a very real risk that this change accidentally breaks livepatching totally, even on GCC builds. Were this to happen, it would be yet another 4.13 regression. >>> It's perhaps a matter of perception, but I'd still call this a >>> live patching tools bug then, not a 4.13 regression. >> After the discussion yesterday, I was thinking a bit more about this, and >> I’m not happy with the principle Andy seems to be operating on, > > I'm sorry that you feel that way. > >> that it’s reasonable to completely block a bug-fixing patch to Xen, forcing >> a work-around to be used in a release, because it has unknown effects on >> livepatching. > > This is not a fair characterisation of what is going on here. Ignore > the specifics of this patch - they are not relevant to my objection. > > As a maintainer, it is my responsibility to ensure that crap doesn't get > committed. Jan's patch is not crap; this is out of line. >> And I do not consider it my responsibility to >> do regression tests of the live patching tools. > > Yes. Yes it really is, when you're making a material change > specifically to this area, with a high chance of adverse impact. Then it looks like we need to have a wider discussion about this, because my answer is, "No, not it's really not." And I don't think you would think so either, except that you happen to use livepatching. Imagine if you posted a patch trying to fix nested HVM, and out of the blue Olaf turned up and said, "Have you tested this with hypervisor paging?" And when you said, "No, I have no idea how to test that", Jan simply blocked the patch indefinitely? You would be angry, and rightly so. As a general principle, the cost of features should be borne by the people who use them. That includes the cost of determining authoritatively whether a change is safe or not -- either by review, or by doing manual testing, or by having automated tests in place to flag up issues. If you had questions about the patch, *you*, with your developer hat on, should have either done testing, or arranged for someone who uses it regularly to do testing to make sure it works. -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On 28.11.2019 10:55, Jan Beulich wrote: > On 27.11.2019 19:17, Andrew Cooper wrote: >> On 21/11/2019 08:34, Jan Beulich wrote: >>> On 20.11.2019 18:13, Andrew Cooper wrote: On 20/11/2019 16:40, Jürgen Groß wrote: > On 20.11.19 17:30, Jan Beulich wrote: >> On 08.11.2019 12:18, Jan Beulich wrote: >>> The .file assembler directives generated by the compiler do not include >>> any path components (gcc) or just the ones specified on the command >>> line >>> (clang, at least version 5), and hence multiple identically named >>> source >>> files (in different directories) may produce identically named static >>> symbols (in their kallsyms representation). The binary diffing >>> algorithm >>> used by xen-livepatch, however, depends on having unique symbols. >>> >>> Make the ENFORCE_UNIQUE_SYMBOLS Kconfig option control the (build) >>> behavior, and if enabled use objcopy to prepend the (relative to the >>> xen/ subdirectory) path to the compiler invoked STT_FILE symbols. Note >>> that this build option is made no longer depend on LIVEPATCH, but >>> merely >>> defaults to its setting now. >>> >>> Conditionalize explicit .file directive insertion in C files where it >>> exists just to disambiguate names in a less generic manner; note that >>> at the same time the redundant emission of STT_FILE symbols gets >>> suppressed for clang. Assembler files as well as multiply compiled C >>> ones using __OBJECT_FILE__ are left alone for the time being. >>> >>> Since we now expect there not to be any duplicates anymore, also don't >>> force the selection of the option to 'n' anymore in allrandom.config. >>> Similarly COVERAGE no longer suppresses duplicate symbol warnings if >>> enforcement is in effect, which in turn allows >>> SUPPRESS_DUPLICATE_SYMBOL_WARNINGS to simply depend on >>> !ENFORCE_UNIQUE_SYMBOLS. >>> >>> Signed-off-by: Jan Beulich >> I've got acks from Konrad and Wei, but still need an x86 and a release >> one here. Andrew? Or alternatively - Jürgen, would you rather not see >> this go in anymore? > In case the needed x86 Ack is coming in before RC3 I'm fine to give my > Release-ack, but I'm hesitant to take it later. Has anyone actually tried building a livepatch with this change in place? >>> Actually - what is your concern here? The exact spelling of symbols >>> names should be of no interest to the tool. After all the compiler is >>> free to invent all sorts of names for its local symbols, including >>> the ones we would produce with this change in place. All the tool >>> cares about is that the names be unambiguous. Hence any (theoretical) >>> regression here would be a bug in the tools, which imo is no reason >>> to delay this change any further. (Granted I should have got to it >>> earlier, but it had been continuing to get deferred.) >> >> This might all be true (theoretically). >> >> The livepatch build tools are fragile and very sensitive to how the >> object files are laid out. There is a very real risk that this change >> accidentally breaks livepatching totally, even on GCC builds. >> >> Were this to happen, it would be yet another 4.13 regression. > > It's perhaps a matter of perception, but I'd still call this a > live patching tools bug then, not a 4.13 regression. > >> This is a change to fix a concrete livepatch issue with Clang. Sure - >> it resolves the symbol uniqueness failures for the in-tree build, but >> considering the risks to the area you are modifying, the fact you >> haven't even done a dev test of a livepatch build on GCC means that the >> patch as a whole has not had what I would consider a reasonable amount >> of testing. > > While Clang is the primary area where we need this change, it is > in no way limited to that environment. Gcc can, at any time, > start triggering the issue again as well - both because of changes > to the compiler itself, or because of (perhaps seemingly innocent) > changes we do to Xen. I can't imagine the workaround for this > would be to make it impossible altogether to select LIVEPATCH=y. And indeed - on my box with gcc 4.3, a full fresh build fails with two duplicate symbols. Roger's workaround therefore is not enough in any event for 4.13, unless we want to - in the last minute - raise the bar of what gcc versions we claim compatibility with. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.2019 15:13, Roger Pau Monné wrote: > On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote: >> On 28.11.2019 12:39, Roger Pau Monné wrote: >>> On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote: At the time the pending EOI stack was introduced there were no internally used IRQs which would have the LAPIC EOI issued from the ->end() hook. This had then changed with the introduction of IOMMUs, but the interaction issue was presumably masked by irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer running without need"]). The problem is that with us re-enabling interrupts across handler invocation, a higher priority (guest) interrupt may trigger while handling a lower priority (internal) one. The EOI issued from ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly EOI the higher priority (guest) interrupt, breaking (among other things) pending EOI stack logic's assumptions. >>> >>> Maybe there's something that I'm missing, but shouldn't hypervisor >>> vectors always be higher priority than guest ones? >> >> Depends - IOMMU ones imo aren't something that needs urgently >> dealing with, so a little bit of delay won't hurt. There would >> only be a problem if such interrupts could be deferred >> indefinitely. >> >>> I see there's already a range reserved for high priority vectors >>> ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor >>> interrupts not using this range? >> >> We'd quickly run out of high priority vectors on systems with >> multiple (and perhaps indeed many) IOMMUs. > > Well, there's no limit on the number of high priority vectors, since > this is all a software abstraction. It only matters that such vectors > are higher than guest owned ones. > > I have to take a look, but I would think that Xen used vectors are the > first ones to be allocated, and hence could start from > FIRST_HIPRIORITY_VECTOR - 1 and go down from there. If this was the case, then we wouldn't have observed the issue (despite it being there) this patch tries to address. The IOMMUs for both Andrew and me ended up using vector 0x28, below everything that e.g. the IO-APIC RTE got assigned. Also don't forget that we don't allocate vectors continuously, but such that they'd get spread across the different priority levels. (Whether that's an awfully good idea is a separate question.) Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On Thu, Nov 28, 2019 at 02:33:08PM +0100, Jan Beulich wrote: > On 28.11.2019 12:39, Roger Pau Monné wrote: > > On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote: > >> At the time the pending EOI stack was introduced there were no > >> internally used IRQs which would have the LAPIC EOI issued from the > >> ->end() hook. This had then changed with the introduction of IOMMUs, > >> but the interaction issue was presumably masked by > >> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early > >> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer > >> running without need"]). > >> > >> The problem is that with us re-enabling interrupts across handler > >> invocation, a higher priority (guest) interrupt may trigger while > >> handling a lower priority (internal) one. The EOI issued from > >> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly > >> EOI the higher priority (guest) interrupt, breaking (among other > >> things) pending EOI stack logic's assumptions. > > > > Maybe there's something that I'm missing, but shouldn't hypervisor > > vectors always be higher priority than guest ones? > > Depends - IOMMU ones imo aren't something that needs urgently > dealing with, so a little bit of delay won't hurt. There would > only be a problem if such interrupts could be deferred > indefinitely. > > > I see there's already a range reserved for high priority vectors > > ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor > > interrupts not using this range? > > We'd quickly run out of high priority vectors on systems with > multiple (and perhaps indeed many) IOMMUs. Well, there's no limit on the number of high priority vectors, since this is all a software abstraction. It only matters that such vectors are higher than guest owned ones. I have to take a look, but I would think that Xen used vectors are the first ones to be allocated, and hence could start from FIRST_HIPRIORITY_VECTOR - 1 and go down from there. This way we will end up with high priority vectors first, Xen used vectors, and finally guest vectors. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses
On 11/28/19 1:44 PM, Andrew Cooper wrote: > c/s d0a699a389f1 "x86/monitor: add support for descriptor access events" > introduced logic looking for what appeared to be exitinfo (not that this > exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring > information. There is never any IDT vectoring involved in these intercepts so > the value passed is always zero. > > In fact, SVM doesn't provide any information, even in exitinfo1 and 2. Note > the error in the public API and state that this field is always 0, and drop > the SVM logic in hvm_monitor_descriptor_access(). > > In the SVM vmexit handler itself, optimise the switch statement by observing > that there is a linear transformation between the SVM exit_reason and > VM_EVENT_DESC_* values. (Bloat-o-meter reports 6028 => 5877 for a saving of > 151 bytes). > > Signed-off-by: Andrew Cooper > --- > CC: Jan Beulich > CC: Wei Liu > CC: Roger Pau Monné > CC: Razvan Cojocaru > CC: Tamas K Lengyel > CC: Alexandru Isaila > CC: Petre Pircalabu > CC: Adrian Pop > > Adrian: Do you recall what information you were attempting to forward from the > VMCB? I can't locate anything which would plausibly be interesting. I think it's safe to go the route you're going (you shouldn't break anything). Acked-by: Razvan Cojocaru (with or without addressing Tamas' comments). Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide
On 28/11/2019, 07:37, "Jan Beulich" wrote: On 28.11.2019 14:06, Lars Kurth wrote: > I can certainly add something on the timing , along the lines of > * For complex series, consider the time it takes to do reviews (maybe with a guide of LOC per hour) and give reviewers enough time to > * For series with design issues or large questions, try and highlight the key open issues in cover letters clearly and solicit feedback from key maintainers who can comment on the open issue. The idea is to save both the contributor and the reviewers time by focussing on what needs to be resolved > * Don’t repost a series, unless all review comments are addressed > or the reviewers asked you to do so. The problem with this is that > this is somewhat in conflict with the "let's focus on the core > issues and not get distracted by details early on in a review cycle". > In other words, this can only work, if reviewers focus on major > issues in early reviews only and do not focus on style, coding > standards, etc. But this doesn't make much sense either, because then full re-reviews need to happen anyway on later versions, to also deal with the minor issues. For RFC kind of series omitting style and alike feedback certainly makes sense, but as soon as a patch is non-RFC, it should be considered good to go in by the submitter. OK, I think we have a disconnect between ideal and reality. I see two issues today * Key maintainers don't always review RFC series [they end up at the bottom of the priority list, even though spending time on RFCs will save time elsewhere later]. So the effect is that then the contributor assumes there are no major issues and ends it as a proper series * In practice what has happened often in the past is that design, architecture, assumption flaws are found in early versions of a series. - This usually happens because of an oversight or because there was no design discussion prior to the series being posted and agreed - Common sense would dictate that the biggest benefit for both the reviewer, the contributor and the community as a whole would be to try and focus on such flaws and leave everything aside - Of course there may be value in doing a detailed reviews of such a series as there may be bits that are unaffected by such a flaw - But there will likely be parts which are not: doing a detailed review of such portions wastes everyone's time So coming back to your point. Ideally, it would be nice if we had the capability to call out parts of a series as "problematic" and treating such parts differently Lars ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.2019 14:53, Andrew Cooper wrote: > On 28/11/2019 12:10, Andrew Cooper wrote: >> On 28/11/2019 11:03, Jan Beulich wrote: >>> Notes: >>> >>> - In principle we could get away without the check_eoi_deferral flag. >>> I've introduced it just to make sure there's as little change as >>> possible to unaffected paths. >>> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't >>> strictly necessary. >> I don't think the cpu_has_pending_apic_eoi() check is necessary. It is >> checked at the head of end_nonmaskable_irq() as well. >> >> Similarly, I'm not sure that check_eoi_deferral is something that we'd >> want to introduce. >> >> I'll drop both of these and test, seeing as I have a repro of the problem. > > Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so > I didn't), and dropping just check_eoi_deferral on its own definitely > breaks things. > > Given the 4.13 timeline, lets go with it in this form, seeing as it is > the version which had all of last night's worth of testing. > > Acked-by: Andrew Cooper > Tested-by: Andrew Cooper Thanks! I've taken note to produce a patch (if possible at all, given the results of your attempt) to remove the extra pieces again, ideally to go in pretty soon after the branching. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling
From: George Dunlap Xen used to have single, system-wide limits for the number of grant frames and maptrack frames a guest was allowed to create. Increasing or decreasing this single limit on the Xen command-line would change the limit for all guests on the system. Later, per-domain limits for these values was created. The system-wide limits became strict limits: domains could not be created with higher limits, but could be created with lower limits. However, that change also introduced a range of different "default" values into various places in the toolstack: - The python libxc bindings hard-coded these values to 32 and 1024, respectively - The libxl default values are 32 and 1024 respectively. - xl will use the libxl default for maptrack, but does its own default calculation for grant frames: either 32 or 64, based on the max possible mfn. These defaults interact poorly with the hypervisor command-line limit: - The hypervisor command-line limit cannot be used to raise the limit for all guests anymore, as the default in the toolstack will effectively override this. - If you use the hypervisor command-line limit to *reduce* the limit, then the "default" values generated by the toolstack are too high, and all guest creations will fail. In other words, the toolstack defaults require any change to be effected by having the admin explicitly specify a new value in every guest. In order to address this, have grant_table_init treat negative values for max_grant_frames and max_maptrack_frames as instructions to use the system-wide default, and have all the above toolstacks default to passing -1 unless a different value is explicitly configured. This restores the old behavior in that changing the hypervisor command-line option can change the behavior for all guests, while retaining the ability to set per-guest values. It also removes the bug that reducing the system-wide max will cause all domains without explicit limits to fail. NOTE: - The Ocaml bindings require the caller to always specify a value, and the code to start a xenstored stubdomain hard-codes these to 4 and 128 respectively; this behavour will not be modified. Signed-off-by: George Dunlap Signed-off-by: Paul Durrant --- Cc: Ian Jackson Cc: Wei Liu Cc: Andrew Cooper Cc: George Dunlap Cc: Jan Beulich Cc: Julien Grall Cc: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Anthony PERARD Cc: "Marek Marczykowski-Górecki" Cc: Volodymyr Babchuk Cc: "Roger Pau Monné" v4: - Add missing braces in xlu_cfg_get_bounded_long() v3: - Make sure that specified values cannot be negative or overflow a signed int v2: - re-worked George's original commit massage a little - fixed the text in xl.conf.5.pod - use -1 as the sentinel value for 'default' and < 0 for checking it --- docs/man/xl.conf.5.pod| 6 ++-- tools/libxl/libxl.h | 4 +-- tools/libxl/libxl_types.idl | 4 +-- tools/libxl/libxlu_cfg.c | 26 +++-- tools/libxl/libxlutil.h | 2 ++ tools/python/xen/lowlevel/xc/xc.c | 4 +-- tools/xl/xl.c | 15 -- tools/xl/xl_parse.c | 9 -- xen/arch/arm/setup.c | 2 +- xen/arch/x86/setup.c | 4 +-- xen/common/grant_table.c | 46 --- xen/include/public/domctl.h | 10 --- xen/include/xen/grant_table.h | 8 +++--- 13 files changed, 102 insertions(+), 38 deletions(-) diff --git a/docs/man/xl.conf.5.pod b/docs/man/xl.conf.5.pod index 962144e38e..207ab3e77a 100644 --- a/docs/man/xl.conf.5.pod +++ b/docs/man/xl.conf.5.pod @@ -81,13 +81,15 @@ Default: C Sets the default value for the C domain config value. -Default: C<32> on hosts up to 16TB of memory, C<64> on hosts larger than 16TB +Default: value of Xen command line B parameter (or its +default value if unspecified). =item B Sets the default value for the C domain config value. -Default: C<1024> +Default: value of Xen command line B +parameter (or its default value if unspecified). =item B diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 49b56fa1a3..a2a5d321c5 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -364,8 +364,8 @@ */ #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1 -#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32 -#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024 +#define LIBXL_MAX_GRANT_FRAMES_DEFAULT -1 +#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT -1 /* * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 0546d7865a..63e29bb2fb 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -511,8 +511,8 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("vnuma_nodes", Array(libxl_vnode_info, "num_vnuma_nodes")), -("max_grant_frames",uint32, {'init_val': 'LIBXL_MAX_GRANT_FRAMES_DEFAULT'}), -
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On 28.11.2019 14:54, Ross Lagerwall wrote: > On 11/28/19 1:49 PM, Jan Beulich wrote: >> On 28.11.2019 13:15, Sergey Dyasli wrote: >>> Applying the patch didn't end up well for my test LP (from another thread): >>> >>> Perform full initial build with 8 CPU(s)... >>> Reading special section data >>> Apply patch and build with 8 CPU(s)... >>> Unapply patch and build with 8 CPU(s)... >>> Extracting new and modified ELF sections... >>> Processing xen/arch/x86/mm/shadow/guest_2.o >>> Processing xen/arch/x86/mm/shadow/guest_4.o >>> Processing xen/arch/x86/mm/shadow/guest_3.o >>> Processing xen/arch/x86/mm/guest_walk_3.o >>> Processing xen/arch/x86/mm/hap/guest_walk_3level.o >>> Processing xen/arch/x86/mm/hap/guest_walk_4level.o >>> Processing xen/arch/x86/mm/hap/guest_walk_2level.o >>> Processing xen/arch/x86/mm/guest_walk_2.o >>> Processing xen/arch/x86/mm/guest_walk_4.o >>> Processing xen/arch/x86/efi/efi/check.o >>> Processing xen/arch/x86/pv/gpr_switch.o >>> Processing xen/arch/x86/indirect-thunk.o >>> Processing xen/arch/x86/boot/head.o >>> Processing xen/arch/x86/x86_64/kexec_reloc.o >>> Processing xen/arch/x86/x86_64/compat/entry.o >>> Processing xen/arch/x86/x86_64/entry.o >>> Processing xen/arch/x86/hvm/vmx/entry.o >>> Processing xen/arch/x86/hvm/svm/entry.o >>> Processing >>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0s.o >>> Processing >>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0r.o >>> Processing >>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1s.o >>> Processing >>> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1r.o >>> ERROR: no functional changes found. >>> >>> So this looks like a regression. >> >> Thanks for doing the testing. But what am I to conclude from >> the above? I can't even tell why "no functional changes found" >> is an error. >> > > It's due to the way livepatch-build tool interposes on the build to capture > changed object files for later comparison. Now that objcopy writes out the > proper object files rather than gcc (which just writes a temporary one), the > livepatch-build tool needs some adjustment otherwise it doesn't capture any > changed files. I'm working on a patch. For my own education, and just if you have the time: Why would there be any dependency on which build utility produces the object file? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable-smoke test] 144346: tolerable all pass - PUSHED
flight 144346 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/144346/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 56348df32bbc782e63b6e3fb978b80e015ae76e7 baseline version: xen 9a400d1797ec7f77ffefeb5c4e17a8c2e8b91a12 Last test of basis 144335 2019-11-27 17:01:49 Z0 days Testing same since 144346 2019-11-28 11:01:36 Z0 days1 attempts People who touched revisions under test: Jan Beulich Joe Jin Roger Pau Monné jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 9a400d1797..56348df32b 56348df32bbc782e63b6e3fb978b80e015ae76e7 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On 11/28/19 1:49 PM, Jan Beulich wrote: > On 28.11.2019 13:15, Sergey Dyasli wrote: >> Applying the patch didn't end up well for my test LP (from another thread): >> >> Perform full initial build with 8 CPU(s)... >> Reading special section data >> Apply patch and build with 8 CPU(s)... >> Unapply patch and build with 8 CPU(s)... >> Extracting new and modified ELF sections... >> Processing xen/arch/x86/mm/shadow/guest_2.o >> Processing xen/arch/x86/mm/shadow/guest_4.o >> Processing xen/arch/x86/mm/shadow/guest_3.o >> Processing xen/arch/x86/mm/guest_walk_3.o >> Processing xen/arch/x86/mm/hap/guest_walk_3level.o >> Processing xen/arch/x86/mm/hap/guest_walk_4level.o >> Processing xen/arch/x86/mm/hap/guest_walk_2level.o >> Processing xen/arch/x86/mm/guest_walk_2.o >> Processing xen/arch/x86/mm/guest_walk_4.o >> Processing xen/arch/x86/efi/efi/check.o >> Processing xen/arch/x86/pv/gpr_switch.o >> Processing xen/arch/x86/indirect-thunk.o >> Processing xen/arch/x86/boot/head.o >> Processing xen/arch/x86/x86_64/kexec_reloc.o >> Processing xen/arch/x86/x86_64/compat/entry.o >> Processing xen/arch/x86/x86_64/entry.o >> Processing xen/arch/x86/hvm/vmx/entry.o >> Processing xen/arch/x86/hvm/svm/entry.o >> Processing >> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0s.o >> Processing >> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0r.o >> Processing >> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1s.o >> Processing >> xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1r.o >> ERROR: no functional changes found. >> >> So this looks like a regression. > > Thanks for doing the testing. But what am I to conclude from > the above? I can't even tell why "no functional changes found" > is an error. > It's due to the way livepatch-build tool interposes on the build to capture changed object files for later comparison. Now that objcopy writes out the proper object files rather than gcc (which just writes a temporary one), the livepatch-build tool needs some adjustment otherwise it doesn't capture any changed files. I'm working on a patch. Thanks, -- Ross Lagerwall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28/11/2019 12:10, Andrew Cooper wrote: > On 28/11/2019 11:03, Jan Beulich wrote: >> Notes: >> >> - In principle we could get away without the check_eoi_deferral flag. >> I've introduced it just to make sure there's as little change as >> possible to unaffected paths. >> - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't >> strictly necessary. > I don't think the cpu_has_pending_apic_eoi() check is necessary. It is > checked at the head of end_nonmaskable_irq() as well. > > Similarly, I'm not sure that check_eoi_deferral is something that we'd > want to introduce. > > I'll drop both of these and test, seeing as I have a repro of the problem. Dropping cpu_has_pending_apic_eoi() wasn't possible in a trivial way (so I didn't), and dropping just check_eoi_deferral on its own definitely breaks things. Given the 4.13 timeline, lets go with it in this form, seeing as it is the version which had all of last night's worth of testing. Acked-by: Andrew Cooper Tested-by: Andrew Cooper ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH-for-4.13 v3] Rationalize max_grant_frames and max_maptrack_frames handling
> -Original Message- > From: Paul Durrant > Sent: 28 November 2019 12:51 > To: xen-devel@lists.xenproject.org > Cc: George Dunlap ; Durrant, Paul > ; Ian Jackson ; Wei Liu > ; Andrew Cooper ; George Dunlap > ; Jan Beulich ; Julien > Grall ; Konrad Rzeszutek Wilk ; > Stefano Stabellini ; Anthony PERARD > ; Marek Marczykowski-Górecki > ; Volodymyr Babchuk > ; Roger Pau Monné > Subject: [PATCH-for-4.13 v3] Rationalize max_grant_frames and > max_maptrack_frames handling > > From: George Dunlap > > Xen used to have single, system-wide limits for the number of grant > frames and maptrack frames a guest was allowed to create. Increasing > or decreasing this single limit on the Xen command-line would change > the limit for all guests on the system. > > Later, per-domain limits for these values was created. The system-wide > limits became strict limits: domains could not be created with higher > limits, but could be created with lower limits. However, that change > also introduced a range of different "default" values into various > places in the toolstack: > > - The python libxc bindings hard-coded these values to 32 and 1024, > respectively > - The libxl default values are 32 and 1024 respectively. > - xl will use the libxl default for maptrack, but does its own default > calculation for grant frames: either 32 or 64, based on the max > possible mfn. > > These defaults interact poorly with the hypervisor command-line limit: > > - The hypervisor command-line limit cannot be used to raise the limit > for all guests anymore, as the default in the toolstack will > effectively override this. > - If you use the hypervisor command-line limit to *reduce* the limit, > then the "default" values generated by the toolstack are too high, > and all guest creations will fail. > > In other words, the toolstack defaults require any change to be > effected by having the admin explicitly specify a new value in every > guest. > > In order to address this, have grant_table_init treat negative values > for max_grant_frames and max_maptrack_frames as instructions to use the > system-wide default, and have all the above toolstacks default to passing > -1 unless a different value is explicitly configured. > > This restores the old behavior in that changing the hypervisor command- > line > option can change the behavior for all guests, while retaining the ability > to set per-guest values. It also removes the bug that reducing the > system-wide max will cause all domains without explicit limits to fail. > > NOTE: - The Ocaml bindings require the caller to always specify a value, > and the code to start a xenstored stubdomain hard-codes these to 4 > and 128 respectively; this behavour will not be modified. > > Signed-off-by: George Dunlap > Signed-off-by: Paul Durrant > --- > Cc: Ian Jackson > Cc: Wei Liu > Cc: Andrew Cooper > Cc: George Dunlap > Cc: Jan Beulich > Cc: Julien Grall > Cc: Konrad Rzeszutek Wilk > Cc: Stefano Stabellini > Cc: Anthony PERARD > Cc: "Marek Marczykowski-Górecki" > Cc: Volodymyr Babchuk > Cc: "Roger Pau Monné" > > v3: > - Make sure that specified values cannot be negative or overflow a >signed int > > v2: > - re-worked George's original commit massage a little > - fixed the text in xl.conf.5.pod > - use -1 as the sentinel value for 'default' and < 0 for checking it > --- > docs/man/xl.conf.5.pod| 6 ++-- > tools/libxl/libxl.h | 4 +-- > tools/libxl/libxl_types.idl | 4 +-- > tools/libxl/libxlu_cfg.c | 24 ++-- > tools/libxl/libxlutil.h | 2 ++ > tools/python/xen/lowlevel/xc/xc.c | 4 +-- > tools/xl/xl.c | 15 -- > tools/xl/xl_parse.c | 9 -- > xen/arch/arm/setup.c | 2 +- > xen/arch/x86/setup.c | 4 +-- > xen/common/grant_table.c | 46 --- > xen/include/public/domctl.h | 10 --- > xen/include/xen/grant_table.h | 8 +++--- > 13 files changed, 100 insertions(+), 38 deletions(-) > > diff --git a/docs/man/xl.conf.5.pod b/docs/man/xl.conf.5.pod > index 962144e38e..207ab3e77a 100644 > --- a/docs/man/xl.conf.5.pod > +++ b/docs/man/xl.conf.5.pod > @@ -81,13 +81,15 @@ Default: C > > Sets the default value for the C domain config value. > > -Default: C<32> on hosts up to 16TB of memory, C<64> on hosts larger than > 16TB > +Default: value of Xen command line B parameter (or its > +default value if unspecified). > > =item B > > Sets the default value for the C domain config > value. > > -Default: C<1024> > +Default: value of Xen command line B > +parameter (or its default value if unspecified). > > =item B > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index 49b56fa1a3..a2a5d321c5 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -364,8 +364,8 @@ > */ > #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1 > > -#define
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On 28.11.2019 13:15, Sergey Dyasli wrote: > Applying the patch didn't end up well for my test LP (from another thread): > > Perform full initial build with 8 CPU(s)... > Reading special section data > Apply patch and build with 8 CPU(s)... > Unapply patch and build with 8 CPU(s)... > Extracting new and modified ELF sections... > Processing xen/arch/x86/mm/shadow/guest_2.o > Processing xen/arch/x86/mm/shadow/guest_4.o > Processing xen/arch/x86/mm/shadow/guest_3.o > Processing xen/arch/x86/mm/guest_walk_3.o > Processing xen/arch/x86/mm/hap/guest_walk_3level.o > Processing xen/arch/x86/mm/hap/guest_walk_4level.o > Processing xen/arch/x86/mm/hap/guest_walk_2level.o > Processing xen/arch/x86/mm/guest_walk_2.o > Processing xen/arch/x86/mm/guest_walk_4.o > Processing xen/arch/x86/efi/efi/check.o > Processing xen/arch/x86/pv/gpr_switch.o > Processing xen/arch/x86/indirect-thunk.o > Processing xen/arch/x86/boot/head.o > Processing xen/arch/x86/x86_64/kexec_reloc.o > Processing xen/arch/x86/x86_64/compat/entry.o > Processing xen/arch/x86/x86_64/entry.o > Processing xen/arch/x86/hvm/vmx/entry.o > Processing xen/arch/x86/hvm/svm/entry.o > Processing > xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0s.o > Processing > xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0r.o > Processing > xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1s.o > Processing > xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.1r.o > ERROR: no functional changes found. > > So this looks like a regression. Thanks for doing the testing. But what am I to conclude from the above? I can't even tell why "no functional changes found" is an error. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses
On Thu, Nov 28, 2019 at 4:44 AM Andrew Cooper wrote: > > c/s d0a699a389f1 "x86/monitor: add support for descriptor access events" > introduced logic looking for what appeared to be exitinfo (not that this > exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring > information. There is never any IDT vectoring involved in these intercepts so > the value passed is always zero. > > In fact, SVM doesn't provide any information, even in exitinfo1 and 2. Note > the error in the public API and state that this field is always 0, and drop > the SVM logic in hvm_monitor_descriptor_access(). > > In the SVM vmexit handler itself, optimise the switch statement by observing > that there is a linear transformation between the SVM exit_reason and > VM_EVENT_DESC_* values. (Bloat-o-meter reports 6028 => 5877 for a saving of > 151 bytes). > > Signed-off-by: Andrew Cooper > --- > CC: Jan Beulich > CC: Wei Liu > CC: Roger Pau Monné > CC: Razvan Cojocaru > CC: Tamas K Lengyel > CC: Alexandru Isaila > CC: Petre Pircalabu > CC: Adrian Pop > > Adrian: Do you recall what information you were attempting to forward from the > VMCB? I can't locate anything which would plausibly be interesting. > > This is part of a longer cleanup series I gathered in the wake of the task > switch issues, but it is pulled out ahead due to its interaction with the > public interface. > --- > xen/arch/x86/hvm/monitor.c| 4 > xen/arch/x86/hvm/svm/svm.c| 37 + > xen/include/public/vm_event.h | 4 ++-- > 3 files changed, 19 insertions(+), 26 deletions(-) > > diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c > index 7fb1e2c04e..1f23fe25e8 100644 > --- a/xen/arch/x86/hvm/monitor.c > +++ b/xen/arch/x86/hvm/monitor.c > @@ -113,10 +113,6 @@ void hvm_monitor_descriptor_access(uint64_t exit_info, > req.u.desc_access.arch.vmx.instr_info = exit_info; > req.u.desc_access.arch.vmx.exit_qualification = > vmx_exit_qualification; > } > -else > -{ > -req.u.desc_access.arch.svm.exitinfo = exit_info; > -} > > monitor_traps(current, true, ); > } > diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c > index 0fb1908c18..776cf11459 100644 > --- a/xen/arch/x86/hvm/svm/svm.c > +++ b/xen/arch/x86/hvm/svm/svm.c > @@ -2980,29 +2980,26 @@ void svm_vmexit_handler(struct cpu_user_regs *regs) > svm_vmexit_do_pause(regs); > break; > > -case VMEXIT_IDTR_READ: > -case VMEXIT_IDTR_WRITE: > -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0, > -VM_EVENT_DESC_IDTR, exit_reason == VMEXIT_IDTR_WRITE); > -break; > - > -case VMEXIT_GDTR_READ: > -case VMEXIT_GDTR_WRITE: > -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0, > -VM_EVENT_DESC_GDTR, exit_reason == VMEXIT_GDTR_WRITE); > -break; > +case VMEXIT_IDTR_READ ... VMEXIT_TR_WRITE: > +{ > +/* > + * Consecutive block of 8 exit codes (sadly not aligned). Top bit > + * indicates write (vs read), bottom 2 bits map linearly to > + * VM_EVENT_DESC_* values. > + */ > +#define E2D(e) e) - VMEXIT_IDTR_READ) & 3) + 1) > +bool write = ((exit_reason - VMEXIT_IDTR_READ) & 4); > +unsigned int desc = E2D(exit_reason); > > -case VMEXIT_LDTR_READ: > -case VMEXIT_LDTR_WRITE: > -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0, > -VM_EVENT_DESC_LDTR, exit_reason == VMEXIT_LDTR_WRITE); > -break; > +BUILD_BUG_ON(E2D(VMEXIT_IDTR_READ) != VM_EVENT_DESC_IDTR); > +BUILD_BUG_ON(E2D(VMEXIT_GDTR_READ) != VM_EVENT_DESC_GDTR); > +BUILD_BUG_ON(E2D(VMEXIT_LDTR_READ) != VM_EVENT_DESC_LDTR); > +BUILD_BUG_ON(E2D(VMEXIT_TR_READ) != VM_EVENT_DESC_TR); > +#undef E2D > > -case VMEXIT_TR_READ: > -case VMEXIT_TR_WRITE: > -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0, > -VM_EVENT_DESC_TR, exit_reason == VMEXIT_TR_WRITE); > +hvm_descriptor_access_intercept(0, 0, desc, write); > break; > +} > > default: > unexpected_exit_type: > diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h > index 959083d8c4..d1b5c95f72 100644 > --- a/xen/include/public/vm_event.h > +++ b/xen/include/public/vm_event.h > @@ -302,8 +302,8 @@ struct vm_event_desc_access { > uint64_t exit_qualification; /* VMX: VMCS Exit Qualification */ > } vmx; > struct { > -uint64_t exitinfo; /* SVM: VMCB EXITINFO */ > -uint64_t _pad2; > +uint64_t exitinfo; /* SVM: Always 0. This field made > */ There really is no point in retaining a useless field. Just remove it and bump the event interface version. That's what it's for. > +uint64_t _pad2; /* its way into the API by error. > */
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On 28.11.2019 14:10, Andrew Cooper wrote: > On 28/11/2019 10:17, George Dunlap wrote: >>> On Nov 28, 2019, at 9:55 AM, Jan Beulich wrote: >> Has anyone actually tried building a livepatch with this change in place? > Actually - what is your concern here? The exact spelling of symbols > names should be of no interest to the tool. After all the compiler is > free to invent all sorts of names for its local symbols, including > the ones we would produce with this change in place. All the tool > cares about is that the names be unambiguous. Hence any (theoretical) > regression here would be a bug in the tools, which imo is no reason > to delay this change any further. (Granted I should have got to it > earlier, but it had been continuing to get deferred.) This might all be true (theoretically). The livepatch build tools are fragile and very sensitive to how the object files are laid out. There is a very real risk that this change accidentally breaks livepatching totally, even on GCC builds. Were this to happen, it would be yet another 4.13 regression. >>> It's perhaps a matter of perception, but I'd still call this a >>> live patching tools bug then, not a 4.13 regression. >> After the discussion yesterday, I was thinking a bit more about > this, and I’m not happy with the principle Andy seems to be operating on, > > I'm sorry that you feel that way. > >> that it’s reasonable to completely block a bug-fixing patch to > Xen, forcing a work-around to be used in a release, because it > has unknown effects on livepatching. > > This is not a fair characterisation of what is going on here. Ignore > the specifics of this patch - they are not relevant to my objection. > > As a maintainer, it is my responsibility to ensure that crap doesn't get > committed. While above you say to ignore the specifics of this patch, I'm afraid I still take "crap" as a qualification of the submitted patch (and possibly other parts of my work). Perhaps I should go look for another job if it's like this. > As a consequence, it is up to me to judge whether I believe that the > submitter of a patch has provided adequate thought/testing to what they > are changing. Mostly this is judged on comments provided (or usually, > their absence), weighed up against the risk of what it might be likely > to break. > > In the case that I don't believe adequate through/testing/etc has been > done, I'm not going to ack the patch. I'd be failing as a maintainer if > I did. > > Ergo, I am not inclined to change my position. > > > In this case, all I asked was "has anyone done a livepatch build?" > > I'd be entirely happy with a reply of "yes [I or someone else did] and > it seems ok". > > I'd even be happy with "There does seem to be an issue with > livepatching, but I've engaged the relevant people in this other thread". > > What I'm not happy with is "I haven't even done a single build to see > whether it might have problems", and what is definitely not acceptable > is, and I quote: > >> And I do not consider it my responsibility to >> do regression tests of the live patching tools. > > Yes. Yes it really is, when you're making a material change > specifically to this area, with a high chance of adverse impact. > > I don't expect you necessarily to fix the issue, but I do expect you to > have some idea of whether you're trading one 4.13 blocker for a > different one. Once again - no. If there is a problem with this patch, then please point it out by review comments. If there is a problem with the live patching tools, it is there where things need fixing. Just to remind you of what I'm hearing/seeing you and others say/do: Quite often issues with the tools arise when dealing with particular XSAs. That's still no reason to delay/reject the fixes for those XSAs. And once again - please accept that views if different people may vary. >>> If they're so >>> extremely fragile, then I think this needs urgently taking care of >>> by their maintainers. As mentioned before - how exactly static >>> symbols get represented is up to the compiler, i.e. may change at >>> any time. As a result, any change whatsoever would need such >>> regression testing, no matter that I agree that a larger scope >>> change like this one has slightly higher potential of triggering >>> some issue. >> This is another argument I would agree with. >> >> Given the closeness to the release, I’d favor checking in the > patch today or tomorrow, regardless of testing status, so that > it can get more testing in our automated systems; it can always > be reverted if it is shown to break livepatching on gcc. > > Oh, and shocker in what is apparently a surprise to everyone but me... > > Sergey went and did the work Jan believed to be "not his > responsibility", and yes - this really does break livepatching. > > Do I expect Jan to fix it? No, but I do expect a discussion and an > understanding of the
Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide
On 28.11.2019 14:06, Lars Kurth wrote: > I can certainly add something on the timing , along the lines of > * For complex series, consider the time it takes to do reviews (maybe with a > guide of LOC per hour) and give reviewers enough time to > * For series with design issues or large questions, try and highlight the key > open issues in cover letters clearly and solicit feedback from key > maintainers who can comment on the open issue. The idea is to save both the > contributor and the reviewers time by focussing on what needs to be resolved > * Don’t repost a series, unless all review comments are addressed > or the reviewers asked you to do so. The problem with this is that > this is somewhat in conflict with the "let's focus on the core > issues and not get distracted by details early on in a review cycle". > In other words, this can only work, if reviewers focus on major > issues in early reviews only and do not focus on style, coding > standards, etc. But this doesn't make much sense either, because then full re-reviews need to happen anyway on later versions, to also deal with the minor issues. For RFC kind of series omitting style and alike feedback certainly makes sense, but as soon as a patch is non-RFC, it should be considered good to go in by the submitter. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.2019 12:39, Roger Pau Monné wrote: > On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote: >> At the time the pending EOI stack was introduced there were no >> internally used IRQs which would have the LAPIC EOI issued from the >> ->end() hook. This had then changed with the introduction of IOMMUs, >> but the interaction issue was presumably masked by >> irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early >> (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer >> running without need"]). >> >> The problem is that with us re-enabling interrupts across handler >> invocation, a higher priority (guest) interrupt may trigger while >> handling a lower priority (internal) one. The EOI issued from >> ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly >> EOI the higher priority (guest) interrupt, breaking (among other >> things) pending EOI stack logic's assumptions. > > Maybe there's something that I'm missing, but shouldn't hypervisor > vectors always be higher priority than guest ones? Depends - IOMMU ones imo aren't something that needs urgently dealing with, so a little bit of delay won't hurt. There would only be a problem if such interrupts could be deferred indefinitely. > I see there's already a range reserved for high priority vectors > ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor > interrupts not using this range? We'd quickly run out of high priority vectors on systems with multiple (and perhaps indeed many) IOMMUs. > IMO it seems troublesome that pending guests vectors can delay the > injection of hypervisor interrupt vectors. As per above - depends. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On 28/11/2019 10:17, George Dunlap wrote: >> On Nov 28, 2019, at 9:55 AM, Jan Beulich wrote: > Has anyone actually tried building a livepatch with this change in place? Actually - what is your concern here? The exact spelling of symbols names should be of no interest to the tool. After all the compiler is free to invent all sorts of names for its local symbols, including the ones we would produce with this change in place. All the tool cares about is that the names be unambiguous. Hence any (theoretical) regression here would be a bug in the tools, which imo is no reason to delay this change any further. (Granted I should have got to it earlier, but it had been continuing to get deferred.) >>> This might all be true (theoretically). >>> >>> The livepatch build tools are fragile and very sensitive to how the >>> object files are laid out. There is a very real risk that this change >>> accidentally breaks livepatching totally, even on GCC builds. >>> >>> Were this to happen, it would be yet another 4.13 regression. >> It's perhaps a matter of perception, but I'd still call this a >> live patching tools bug then, not a 4.13 regression. > After the discussion yesterday, I was thinking a bit more about this, and I’m > not happy with the principle Andy seems to be operating on, I'm sorry that you feel that way. > that it’s reasonable to completely block a bug-fixing patch to Xen, forcing a > work-around to be used in a release, because it has unknown effects on > livepatching. This is not a fair characterisation of what is going on here. Ignore the specifics of this patch - they are not relevant to my objection. As a maintainer, it is my responsibility to ensure that crap doesn't get committed. As a consequence, it is up to me to judge whether I believe that the submitter of a patch has provided adequate thought/testing to what they are changing. Mostly this is judged on comments provided (or usually, their absence), weighed up against the risk of what it might be likely to break. In the case that I don't believe adequate through/testing/etc has been done, I'm not going to ack the patch. I'd be failing as a maintainer if I did. Ergo, I am not inclined to change my position. In this case, all I asked was "has anyone done a livepatch build?" I'd be entirely happy with a reply of "yes [I or someone else did] and it seems ok". I'd even be happy with "There does seem to be an issue with livepatching, but I've engaged the relevant people in this other thread". What I'm not happy with is "I haven't even done a single build to see whether it might have problems", and what is definitely not acceptable is, and I quote: > And I do not consider it my responsibility to > do regression tests of the live patching tools. Yes. Yes it really is, when you're making a material change specifically to this area, with a high chance of adverse impact. I don't expect you necessarily to fix the issue, but I do expect you to have some idea of whether you're trading one 4.13 blocker for a different one. >> If they're so >> extremely fragile, then I think this needs urgently taking care of >> by their maintainers. As mentioned before - how exactly static >> symbols get represented is up to the compiler, i.e. may change at >> any time. As a result, any change whatsoever would need such >> regression testing, no matter that I agree that a larger scope >> change like this one has slightly higher potential of triggering >> some issue. > This is another argument I would agree with. > > Given the closeness to the release, I’d favor checking in the patch today or > tomorrow, regardless of testing status, so that it can get more testing in > our automated systems; it can always be reverted if it is shown to break > livepatching on gcc. Oh, and shocker in what is apparently a surprise to everyone but me... Sergey went and did the work Jan believed to be "not his responsibility", and yes - this really does break livepatching. Do I expect Jan to fix it? No, but I do expect a discussion and an understanding of the issue before this patch gets anywhere near staging. ~Andrew, quite irritated at the total lack of due diligence being demonstrated here. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide
On 28/11/2019, 04:09, "Jan Beulich" wrote: On 28.11.2019 01:54, Stefano Stabellini wrote: > On Thu, 26 Sep 2019, Lars Kurth wrote: >> From: Lars Kurth >> >> This document highlights what reviewers such as maintainers and committers look >> for when reviewing code. It sets expectations for code authors and provides >> a framework for code reviewers. > > I think the document is missing a couple of things: > > - a simple one line statement that possibly the most important thing in > a code review is to indentify any bugs in the code > > - an explanation that requests for major changes to the series should be > made early on (i.e. let's not change the architecture of a feature at > v9 if possible) I also made this comment in reply to patch #5. I'll > let you decide where is the best place for it. This needs balancing. People crucial to the evaluation of a new feature and its implementation simply may not have the time to reply prior to v9. We've had situations where people posted new revisions every other day, sometimes even more than one per day. I can certainly add something on the timing , along the lines of * For complex series, consider the time it takes to do reviews (maybe with a guide of LOC per hour) and give reviewers enough time to * For series with design issues or large questions, try and highlight the key open issues in cover letters clearly and solicit feedback from key maintainers who can comment on the open issue. The idea is to save both the contributor and the reviewers time by focussing on what needs to be resolved * Don’t repost a series, unless all review comments are addressed or the reviewers asked you to do so. The problem with this is that this is somewhat in conflict with the "let's focus on the core issues and not get distracted by details early on in a review cycle". In other words, this can only work, if reviewers focus on major issues in early reviews only and do not focus on style, coding standards, etc. As soon as a reviewer comes back with detailed feedback, the contributor will feel obliged to fix these. This creates a motivation to want to please the reviewer send out new versions of series fixing cosmetic issues without addressing the substantial issues, leading to what Jan describes. I am looking for opinions here. As indicated in several other contexts before - imo people not helping to shoulder the review load should also not have the expectation that their (large) contributions will be looked at in due course. I can add something to this effect. Lars ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH-for-4.13 v3] Rationalize max_grant_frames and max_maptrack_frames handling
From: George Dunlap Xen used to have single, system-wide limits for the number of grant frames and maptrack frames a guest was allowed to create. Increasing or decreasing this single limit on the Xen command-line would change the limit for all guests on the system. Later, per-domain limits for these values was created. The system-wide limits became strict limits: domains could not be created with higher limits, but could be created with lower limits. However, that change also introduced a range of different "default" values into various places in the toolstack: - The python libxc bindings hard-coded these values to 32 and 1024, respectively - The libxl default values are 32 and 1024 respectively. - xl will use the libxl default for maptrack, but does its own default calculation for grant frames: either 32 or 64, based on the max possible mfn. These defaults interact poorly with the hypervisor command-line limit: - The hypervisor command-line limit cannot be used to raise the limit for all guests anymore, as the default in the toolstack will effectively override this. - If you use the hypervisor command-line limit to *reduce* the limit, then the "default" values generated by the toolstack are too high, and all guest creations will fail. In other words, the toolstack defaults require any change to be effected by having the admin explicitly specify a new value in every guest. In order to address this, have grant_table_init treat negative values for max_grant_frames and max_maptrack_frames as instructions to use the system-wide default, and have all the above toolstacks default to passing -1 unless a different value is explicitly configured. This restores the old behavior in that changing the hypervisor command-line option can change the behavior for all guests, while retaining the ability to set per-guest values. It also removes the bug that reducing the system-wide max will cause all domains without explicit limits to fail. NOTE: - The Ocaml bindings require the caller to always specify a value, and the code to start a xenstored stubdomain hard-codes these to 4 and 128 respectively; this behavour will not be modified. Signed-off-by: George Dunlap Signed-off-by: Paul Durrant --- Cc: Ian Jackson Cc: Wei Liu Cc: Andrew Cooper Cc: George Dunlap Cc: Jan Beulich Cc: Julien Grall Cc: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Anthony PERARD Cc: "Marek Marczykowski-Górecki" Cc: Volodymyr Babchuk Cc: "Roger Pau Monné" v3: - Make sure that specified values cannot be negative or overflow a signed int v2: - re-worked George's original commit massage a little - fixed the text in xl.conf.5.pod - use -1 as the sentinel value for 'default' and < 0 for checking it --- docs/man/xl.conf.5.pod| 6 ++-- tools/libxl/libxl.h | 4 +-- tools/libxl/libxl_types.idl | 4 +-- tools/libxl/libxlu_cfg.c | 24 ++-- tools/libxl/libxlutil.h | 2 ++ tools/python/xen/lowlevel/xc/xc.c | 4 +-- tools/xl/xl.c | 15 -- tools/xl/xl_parse.c | 9 -- xen/arch/arm/setup.c | 2 +- xen/arch/x86/setup.c | 4 +-- xen/common/grant_table.c | 46 --- xen/include/public/domctl.h | 10 --- xen/include/xen/grant_table.h | 8 +++--- 13 files changed, 100 insertions(+), 38 deletions(-) diff --git a/docs/man/xl.conf.5.pod b/docs/man/xl.conf.5.pod index 962144e38e..207ab3e77a 100644 --- a/docs/man/xl.conf.5.pod +++ b/docs/man/xl.conf.5.pod @@ -81,13 +81,15 @@ Default: C Sets the default value for the C domain config value. -Default: C<32> on hosts up to 16TB of memory, C<64> on hosts larger than 16TB +Default: value of Xen command line B parameter (or its +default value if unspecified). =item B Sets the default value for the C domain config value. -Default: C<1024> +Default: value of Xen command line B +parameter (or its default value if unspecified). =item B diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 49b56fa1a3..a2a5d321c5 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -364,8 +364,8 @@ */ #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1 -#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32 -#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024 +#define LIBXL_MAX_GRANT_FRAMES_DEFAULT -1 +#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT -1 /* * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 0546d7865a..63e29bb2fb 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -511,8 +511,8 @@ libxl_domain_build_info = Struct("domain_build_info",[ ("vnuma_nodes", Array(libxl_vnode_info, "num_vnuma_nodes")), -("max_grant_frames",uint32, {'init_val': 'LIBXL_MAX_GRANT_FRAMES_DEFAULT'}), -("max_maptrack_frames", uint32, {'init_val':
[Xen-devel] [libvirt test] 144345: regressions - FAIL
flight 144345 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/144345/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-arm64-arm64-libvirt-qcow2 15 guest-start/debian.repeat fail REGR. vs. 144304 Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 144304 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 144304 test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-arm64-arm64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-qcow2 12 migrate-support-checkfail never pass test-arm64-arm64-libvirt-qcow2 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass version targeted for testing: libvirt 83254ea750b476b5041c838a583d9ea3f632a6a2 baseline version: libvirt 9d6920bd7de3f92be1894790adeb689060ab25eb Last test of basis 144304 2019-11-26 04:19:14 Z2 days Failing since144318 2019-11-27 04:19:28 Z1 days2 attempts Testing same since 144345 2019-11-28 04:19:33 Z0 days1 attempts People who touched revisions under test: Daniel P. Berrangé Michal Privoznik Peter Krempa jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-arm64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-arm64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass test-amd64-amd64-libvirt-xsm pass test-arm64-arm64-libvirt-xsm pass test-amd64-i386-libvirt-xsm pass test-amd64-amd64-libvirt pass test-arm64-arm64-libvirt pass test-armhf-armhf-libvirt pass test-amd64-i386-libvirt pass test-amd64-amd64-libvirt-pairpass test-amd64-i386-libvirt-pair pass test-arm64-arm64-libvirt-qcow2 fail test-armhf-armhf-libvirt-raw pass test-amd64-amd64-libvirt-vhd pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. commit
Re: [Xen-devel] [RFC v5 000/126] error: auto propagated local_err
Vladimir Sementsov-Ogievskiy writes: > 28.11.2019 11:54, Markus Armbruster wrote: >> Please accept my sincere apologies for taking so long to reply. A few >> thoughts before I dig deeper. >> >> Vladimir Sementsov-Ogievskiy writes: >> >>> Hi all! >>> >>> At the request of Markus: full version of errp propagation. Let's look >>> at it. Cover as much as possible, except inserting macro invocation >>> where it's not necessary. >>> >>> It's huge, and so it's an RFC. >> >> It's a monster. Best to get it into full view before we commit to >> fighting it. >> >>> In v5 I've added a lot more preparation cleanups: >>> 01-23 are preparation cleanups >>>01: not changed, keep Eric's r-b >>>02: improve commit msg [Markus], keep Eric's r-b >>>03: changed, only error API here, drop r-b >>> 24 is core macro >>>- improve cover letter, wording and macro code style >>>- keep Eric's r-b >>> 25-26: automation scripts >>> - commit-per-subsystem changed a lot. it's a draft, don't bother too >>> much with it >>> - coccinelle: add support of error_propagate_prepend >>> >>> 27-126: generated patches >> >> Splitting up the monster can make fighting it easier. >> >> Your description suggests three high-level parts: >> >> Part 1: Preparation (makes sense by itself) > > I already resent part 1 all patches (handling review comments) in separate as > v6. > If it is convenient, I can resend them in one series as v7. Recommend to await review. The more we can merge without another respin, the better. >> Part 2: Error interface update (with rules what code should do now) > > Note, that patch 21 is actually from part2, not part1. > So Part 2 is 21, 24, 25. Thanks for the heads-up. > So I wait for your comments and resend (if needed) as separate small series. > > And 26 is auto-patch-splitter, but we don't need it now, if we are going > to start from several big subsystems. > >> Part 3: Make the code obey the new rules everywhere >> >> I hope we can get part 1 out of the way quickly. Diffstat: >> >> backends/cryptodev.c | 11 +--- >> block/nbd.c| 10 +-- >> block/snapshot.c | 4 +- >> dump/dump-hmp-cmds.c | 4 +- >> hw/9pfs/9p-local.c | 4 +- >> hw/9pfs/9p-proxy.c | 5 +- >> hw/core/loader-fit.c | 5 +- >> hw/core/machine-hmp-cmds.c | 6 +- >> hw/core/qdev.c | 28 >> hw/i386/amd_iommu.c| 14 ++-- >> hw/ppc/spapr.c | 2 +- >> hw/s390x/event-facility.c | 2 +- >> hw/s390x/s390-stattrib.c | 3 +- >> hw/sd/sdhci.c | 2 +- >> hw/tpm/tpm_emulator.c | 8 +-- >> hw/usb/dev-network.c | 2 +- >> hw/vfio/ap.c | 16 + >> include/block/snapshot.h | 2 +- >> include/monitor/hmp.h | 2 +- >> include/qapi/error.h | 69 ++-- >> include/qom/object.h | 4 +- >> monitor/hmp-cmds.c | 155 >> ++--- >> monitor/qmp-cmds.c | 2 +- >> net/net.c | 17 ++--- >> qdev-monitor.c | 28 >> qga/commands-posix.c | 2 +- >> qga/commands-win32.c | 2 +- >> qga/commands.c | 12 ++-- >> qom/qom-hmp-cmds.c | 4 +- >> target/ppc/kvm.c | 6 +- >> target/ppc/kvm_ppc.h | 4 +- >> ui/vnc.c | 20 ++ >> ui/vnc.h | 2 +- >> util/error.c | 30 - >> 34 files changed, 261 insertions(+), 226 deletions(-) >> >> At first glance, I can see bug fixes, non-mechanical cleanups, and >> mechanical cleanups. >> >> Within each of these three groups, we have related sub-groups. For >> instance, several patches clean up funny names for the common Error ** >> parameters. Several more rename "uncommon" Error ** parameters, to >> signal their uncommon role. I doubt splitting up these subgroups of >> related mechanical changes along subsystem lines is worthwhile. >> >> Part 2 needs careful interface review. Having part 3 ready helps there, >> because we can see rather than guess how the interface changes play out. >> We really want to get this part right from the start, because if we >> don't, we get to do part 3 again. >> >> Part 3 is what makes this a monster. I understand it's mechanical. We >> can merge it incrementally, but we do want to merge it all, and sooner >> rather than later, to avoid a mix of old and new error handling code. >> Such mixes inevitably confuse developers, and lead to new instances of >> the old patterns creeping in. >> >> I do have doubts about your automated split. >> >> I acknowledge maintainers of active subsystems may want to merge this on >> their own terms, to minimize disruption. Splitting off sub-monsters for >> them makes sense. Splitting off the long tail of less busy subsystems >> not so much; it'll only drag out the merging. Your list
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On 27/11/2019 18:17, Andrew Cooper wrote: > On 21/11/2019 08:34, Jan Beulich wrote: >> On 20.11.2019 18:13, Andrew Cooper wrote: >>> On 20/11/2019 16:40, Jürgen Groß wrote: On 20.11.19 17:30, Jan Beulich wrote: > On 08.11.2019 12:18, Jan Beulich wrote: >> The .file assembler directives generated by the compiler do not include >> any path components (gcc) or just the ones specified on the command >> line >> (clang, at least version 5), and hence multiple identically named >> source >> files (in different directories) may produce identically named static >> symbols (in their kallsyms representation). The binary diffing >> algorithm >> used by xen-livepatch, however, depends on having unique symbols. >> >> Make the ENFORCE_UNIQUE_SYMBOLS Kconfig option control the (build) >> behavior, and if enabled use objcopy to prepend the (relative to the >> xen/ subdirectory) path to the compiler invoked STT_FILE symbols. Note >> that this build option is made no longer depend on LIVEPATCH, but >> merely >> defaults to its setting now. >> >> Conditionalize explicit .file directive insertion in C files where it >> exists just to disambiguate names in a less generic manner; note that >> at the same time the redundant emission of STT_FILE symbols gets >> suppressed for clang. Assembler files as well as multiply compiled C >> ones using __OBJECT_FILE__ are left alone for the time being. >> >> Since we now expect there not to be any duplicates anymore, also don't >> force the selection of the option to 'n' anymore in allrandom.config. >> Similarly COVERAGE no longer suppresses duplicate symbol warnings if >> enforcement is in effect, which in turn allows >> SUPPRESS_DUPLICATE_SYMBOL_WARNINGS to simply depend on >> !ENFORCE_UNIQUE_SYMBOLS. >> >> Signed-off-by: Jan Beulich > I've got acks from Konrad and Wei, but still need an x86 and a release > one here. Andrew? Or alternatively - Jürgen, would you rather not see > this go in anymore? In case the needed x86 Ack is coming in before RC3 I'm fine to give my Release-ack, but I'm hesitant to take it later. >>> Has anyone actually tried building a livepatch with this change in place? >> Actually - what is your concern here? The exact spelling of symbols >> names should be of no interest to the tool. After all the compiler is >> free to invent all sorts of names for its local symbols, including >> the ones we would produce with this change in place. All the tool >> cares about is that the names be unambiguous. Hence any (theoretical) >> regression here would be a bug in the tools, which imo is no reason >> to delay this change any further. (Granted I should have got to it >> earlier, but it had been continuing to get deferred.) > > This might all be true (theoretically). > > The livepatch build tools are fragile and very sensitive to how the > object files are laid out. There is a very real risk that this change > accidentally breaks livepatching totally, even on GCC builds. > > Were this to happen, it would be yet another 4.13 regression. > > This is a change to fix a concrete livepatch issue with Clang. Sure - > it resolves the symbol uniqueness failures for the in-tree build, but > considering the risks to the area you are modifying, the fact you > haven't even done a dev test of a livepatch build on GCC means that the > patch as a whole has not had what I would consider a reasonable amount > of testing. > > Luckily for you, Ross and Sergey have agreed to smoke test this with > some livepatches. They will report on this thread with their findings. Applying the patch didn't end up well for my test LP (from another thread): Perform full initial build with 8 CPU(s)... Reading special section data Apply patch and build with 8 CPU(s)... Unapply patch and build with 8 CPU(s)... Extracting new and modified ELF sections... Processing xen/arch/x86/mm/shadow/guest_2.o Processing xen/arch/x86/mm/shadow/guest_4.o Processing xen/arch/x86/mm/shadow/guest_3.o Processing xen/arch/x86/mm/guest_walk_3.o Processing xen/arch/x86/mm/hap/guest_walk_3level.o Processing xen/arch/x86/mm/hap/guest_walk_4level.o Processing xen/arch/x86/mm/hap/guest_walk_2level.o Processing xen/arch/x86/mm/guest_walk_2.o Processing xen/arch/x86/mm/guest_walk_4.o Processing xen/arch/x86/efi/efi/check.o Processing xen/arch/x86/pv/gpr_switch.o Processing xen/arch/x86/indirect-thunk.o Processing xen/arch/x86/boot/head.o Processing xen/arch/x86/x86_64/kexec_reloc.o Processing xen/arch/x86/x86_64/compat/entry.o Processing xen/arch/x86/x86_64/entry.o Processing xen/arch/x86/hvm/vmx/entry.o Processing xen/arch/x86/hvm/svm/entry.o Processing xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0s.o Processing xen/arch/x86/mnt/media/git/upstream/xen/xen/mnt/media/git/upstream/xen/xen/.xen.efi.0r.o Processing
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28/11/2019 11:03, Jan Beulich wrote: > Notes: > > - In principle we could get away without the check_eoi_deferral flag. > I've introduced it just to make sure there's as little change as > possible to unaffected paths. > - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't > strictly necessary. I don't think the cpu_has_pending_apic_eoi() check is necessary. It is checked at the head of end_nonmaskable_irq() as well. Similarly, I'm not sure that check_eoi_deferral is something that we'd want to introduce. I'll drop both of these and test, seeing as I have a repro of the problem. > - The new function's name isn't very helpful with its use in > end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to > parallel ack_APIC_irq()), but then liked this even less. I don't have a better suggestion. > - Other than I first thought serial console IRQs shouldn't be > affected, as we run them on specifically allocated high priority > vectors. Good, although I don't think this wants to find its way into the final commit message. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-next] x86/svm: Correct vm_event API for descriptor accesses
c/s d0a699a389f1 "x86/monitor: add support for descriptor access events" introduced logic looking for what appeared to be exitinfo (not that this exists in SVM - exitinfo1 or 2 do), but actually passed the exit IDT vectoring information. There is never any IDT vectoring involved in these intercepts so the value passed is always zero. In fact, SVM doesn't provide any information, even in exitinfo1 and 2. Note the error in the public API and state that this field is always 0, and drop the SVM logic in hvm_monitor_descriptor_access(). In the SVM vmexit handler itself, optimise the switch statement by observing that there is a linear transformation between the SVM exit_reason and VM_EVENT_DESC_* values. (Bloat-o-meter reports 6028 => 5877 for a saving of 151 bytes). Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Wei Liu CC: Roger Pau Monné CC: Razvan Cojocaru CC: Tamas K Lengyel CC: Alexandru Isaila CC: Petre Pircalabu CC: Adrian Pop Adrian: Do you recall what information you were attempting to forward from the VMCB? I can't locate anything which would plausibly be interesting. This is part of a longer cleanup series I gathered in the wake of the task switch issues, but it is pulled out ahead due to its interaction with the public interface. --- xen/arch/x86/hvm/monitor.c| 4 xen/arch/x86/hvm/svm/svm.c| 37 + xen/include/public/vm_event.h | 4 ++-- 3 files changed, 19 insertions(+), 26 deletions(-) diff --git a/xen/arch/x86/hvm/monitor.c b/xen/arch/x86/hvm/monitor.c index 7fb1e2c04e..1f23fe25e8 100644 --- a/xen/arch/x86/hvm/monitor.c +++ b/xen/arch/x86/hvm/monitor.c @@ -113,10 +113,6 @@ void hvm_monitor_descriptor_access(uint64_t exit_info, req.u.desc_access.arch.vmx.instr_info = exit_info; req.u.desc_access.arch.vmx.exit_qualification = vmx_exit_qualification; } -else -{ -req.u.desc_access.arch.svm.exitinfo = exit_info; -} monitor_traps(current, true, ); } diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index 0fb1908c18..776cf11459 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -2980,29 +2980,26 @@ void svm_vmexit_handler(struct cpu_user_regs *regs) svm_vmexit_do_pause(regs); break; -case VMEXIT_IDTR_READ: -case VMEXIT_IDTR_WRITE: -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0, -VM_EVENT_DESC_IDTR, exit_reason == VMEXIT_IDTR_WRITE); -break; - -case VMEXIT_GDTR_READ: -case VMEXIT_GDTR_WRITE: -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0, -VM_EVENT_DESC_GDTR, exit_reason == VMEXIT_GDTR_WRITE); -break; +case VMEXIT_IDTR_READ ... VMEXIT_TR_WRITE: +{ +/* + * Consecutive block of 8 exit codes (sadly not aligned). Top bit + * indicates write (vs read), bottom 2 bits map linearly to + * VM_EVENT_DESC_* values. + */ +#define E2D(e) e) - VMEXIT_IDTR_READ) & 3) + 1) +bool write = ((exit_reason - VMEXIT_IDTR_READ) & 4); +unsigned int desc = E2D(exit_reason); -case VMEXIT_LDTR_READ: -case VMEXIT_LDTR_WRITE: -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0, -VM_EVENT_DESC_LDTR, exit_reason == VMEXIT_LDTR_WRITE); -break; +BUILD_BUG_ON(E2D(VMEXIT_IDTR_READ) != VM_EVENT_DESC_IDTR); +BUILD_BUG_ON(E2D(VMEXIT_GDTR_READ) != VM_EVENT_DESC_GDTR); +BUILD_BUG_ON(E2D(VMEXIT_LDTR_READ) != VM_EVENT_DESC_LDTR); +BUILD_BUG_ON(E2D(VMEXIT_TR_READ) != VM_EVENT_DESC_TR); +#undef E2D -case VMEXIT_TR_READ: -case VMEXIT_TR_WRITE: -hvm_descriptor_access_intercept(vmcb->exitintinfo.bytes, 0, -VM_EVENT_DESC_TR, exit_reason == VMEXIT_TR_WRITE); +hvm_descriptor_access_intercept(0, 0, desc, write); break; +} default: unexpected_exit_type: diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h index 959083d8c4..d1b5c95f72 100644 --- a/xen/include/public/vm_event.h +++ b/xen/include/public/vm_event.h @@ -302,8 +302,8 @@ struct vm_event_desc_access { uint64_t exit_qualification; /* VMX: VMCS Exit Qualification */ } vmx; struct { -uint64_t exitinfo; /* SVM: VMCB EXITINFO */ -uint64_t _pad2; +uint64_t exitinfo; /* SVM: Always 0. This field made */ +uint64_t _pad2; /* its way into the API by error. */ } svm; } arch; uint8_t descriptor; /* VM_EVENT_DESC_* */ -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On Thu, Nov 28, 2019 at 12:03:47PM +0100, Jan Beulich wrote: > At the time the pending EOI stack was introduced there were no > internally used IRQs which would have the LAPIC EOI issued from the > ->end() hook. This had then changed with the introduction of IOMMUs, > but the interaction issue was presumably masked by > irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early > (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer > running without need"]). > > The problem is that with us re-enabling interrupts across handler > invocation, a higher priority (guest) interrupt may trigger while > handling a lower priority (internal) one. The EOI issued from > ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly > EOI the higher priority (guest) interrupt, breaking (among other > things) pending EOI stack logic's assumptions. Maybe there's something that I'm missing, but shouldn't hypervisor vectors always be higher priority than guest ones? I see there's already a range reserved for high priority vectors ({FIRST/LAST}_HIPRIORITY_VECTOR), what's the reason for hypervisor interrupts not using this range? IMO it seems troublesome that pending guests vectors can delay the injection of hypervisor interrupt vectors. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.19 12:19, Jan Beulich wrote: On 28.11.2019 12:03, Jan Beulich wrote: At the time the pending EOI stack was introduced there were no internally used IRQs which would have the LAPIC EOI issued from the ->end() hook. This had then changed with the introduction of IOMMUs, but the interaction issue was presumably masked by irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer running without need"]). The problem is that with us re-enabling interrupts across handler invocation, a higher priority (guest) interrupt may trigger while handling a lower priority (internal) one. The EOI issued from ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly EOI the higher priority (guest) interrupt, breaking (among other things) pending EOI stack logic's assumptions. Notes: - In principle we could get away without the check_eoi_deferral flag. I've introduced it just to make sure there's as little change as possible to unaffected paths. - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't strictly necessary. - The new function's name isn't very helpful with its use in end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to parallel ack_APIC_irq()), but then liked this even less. - Other than I first thought serial console IRQs shouldn't be affected, as we run them on specifically allocated high priority vectors. Reported-by: Igor Druzhinin Diagnosed-by: Andrew Cooper Signed-off-by: Jan Beulich In case it's not explicit enough from the description: While the issue appears to be long standing, it looks to have been masked by other shortcomings prior to 4.13. Therefore this should be considered at least a perceived regression, even if it may not strictly be one. Assuming an Ack: in a timely manner: Release-acked-by: Juergen Gross Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] psr: fix bug which may cause crash
On 28.11.2019 11:18, Yi Sun wrote: > --- a/xen/arch/x86/psr.c > +++ b/xen/arch/x86/psr.c > @@ -1271,7 +1271,8 @@ static void do_write_psr_msrs(void *data) > > for ( j = 0; j < cos_num; j++, index++ ) > { > -if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] ) > +if ( cos <= feat->cos_max && > + feat->cos_reg_val[cos * cos_num + j] != info->val[index] ) The description is indeed much better now, thanks. However, as indicated in reply to v1, this extra (and at the first glance unmotivated) bounds check wants to be accompanied by a brief but precise comment. Furthermore with the loop bounded by a local variable, why not cos_num = min(props->cos_num, feat->cos_max + 1); a few lines up from here (again suitable commented)? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] x86 / iommu: set up a scratch page in the quarantine domain
On 28.11.19 12:17, Jan Beulich wrote: On 27.11.2019 18:11, Paul Durrant wrote: This patch introduces a new iommu_op to facilitate a per-implementation quarantine set up, and then further code for x86 implementations (amd and vtd) to set up a read-only scratch page to serve as the source for DMA reads whilst a device is assigned to dom_io. DMA writes will continue to fault as before. The reason for doing this is that some hardware may continue to re-try DMA (despite FLR) in the event of an error, or even BME being cleared, and will fail to deal with DMA read faults gracefully. Having a scratch page mapped will allow pending DMA reads to complete and thus such buggy hardware will eventually be quiesced. NOTE: These modifications are restricted to x86 implementations only as the buggy h/w I am aware of is only used with Xen in an x86 environment. ARM may require similar code but, since I am not aware of the need, this patch does not modify any ARM implementation. Signed-off-by: Paul Durrant Reviewed-by: Jan Beulich There is still the open question of whether use of a scratch page ought to be gated on something, either are run-time or compile-time. I have no clear opinion either way here. The workaround seems low overhead enough that there may not be a need to have an admin (or build time) control for this. As to 4.13: The quarantining as a whole is pretty fresh. While it has been backported to security maintained trees, I'd still consider it a new feature in 4.13, and hence this workaround at least eligible for consideration. I agree. Release-acked-by: Juergen Gross Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
On 28.11.2019 12:03, Jan Beulich wrote: > At the time the pending EOI stack was introduced there were no > internally used IRQs which would have the LAPIC EOI issued from the > ->end() hook. This had then changed with the introduction of IOMMUs, > but the interaction issue was presumably masked by > irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early > (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer > running without need"]). > > The problem is that with us re-enabling interrupts across handler > invocation, a higher priority (guest) interrupt may trigger while > handling a lower priority (internal) one. The EOI issued from > ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly > EOI the higher priority (guest) interrupt, breaking (among other > things) pending EOI stack logic's assumptions. > > Notes: > > - In principle we could get away without the check_eoi_deferral flag. > I've introduced it just to make sure there's as little change as > possible to unaffected paths. > - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't > strictly necessary. > - The new function's name isn't very helpful with its use in > end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to > parallel ack_APIC_irq()), but then liked this even less. > - Other than I first thought serial console IRQs shouldn't be > affected, as we run them on specifically allocated high priority > vectors. > > Reported-by: Igor Druzhinin > Diagnosed-by: Andrew Cooper > Signed-off-by: Jan Beulich In case it's not explicit enough from the description: While the issue appears to be long standing, it looks to have been masked by other shortcomings prior to 4.13. Therefore this should be considered at least a perceived regression, even if it may not strictly be one. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] x86 / iommu: set up a scratch page in the quarantine domain
On 27.11.2019 18:11, Paul Durrant wrote: > This patch introduces a new iommu_op to facilitate a per-implementation > quarantine set up, and then further code for x86 implementations > (amd and vtd) to set up a read-only scratch page to serve as the source > for DMA reads whilst a device is assigned to dom_io. DMA writes will > continue to fault as before. > > The reason for doing this is that some hardware may continue to re-try > DMA (despite FLR) in the event of an error, or even BME being cleared, and > will fail to deal with DMA read faults gracefully. Having a scratch page > mapped will allow pending DMA reads to complete and thus such buggy > hardware will eventually be quiesced. > > NOTE: These modifications are restricted to x86 implementations only as > the buggy h/w I am aware of is only used with Xen in an x86 > environment. ARM may require similar code but, since I am not > aware of the need, this patch does not modify any ARM implementation. > > Signed-off-by: Paul Durrant Reviewed-by: Jan Beulich > There is still the open question of whether use of a scratch page ought > to be gated on something, either are run-time or compile-time. I have no clear opinion either way here. The workaround seems low overhead enough that there may not be a need to have an admin (or build time) control for this. As to 4.13: The quarantining as a whole is pretty fresh. While it has been backported to security maintained trees, I'd still consider it a new feature in 4.13, and hence this workaround at least eligible for consideration. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] x86/IRQ: make internally used IRQs also honor the pending EOI stack
At the time the pending EOI stack was introduced there were no internally used IRQs which would have the LAPIC EOI issued from the ->end() hook. This had then changed with the introduction of IOMMUs, but the interaction issue was presumably masked by irq_guest_eoi_timer_fn() frequently EOI-ing interrupts way too early (which got fixed by 359cf6f8a0ec ["x86/IRQ: don't keep EOI timer running without need"]). The problem is that with us re-enabling interrupts across handler invocation, a higher priority (guest) interrupt may trigger while handling a lower priority (internal) one. The EOI issued from ->end() (for ACKTYPE_EOI kind interrupts) would then mistakenly EOI the higher priority (guest) interrupt, breaking (among other things) pending EOI stack logic's assumptions. Notes: - In principle we could get away without the check_eoi_deferral flag. I've introduced it just to make sure there's as little change as possible to unaffected paths. - Similarly the cpu_has_pending_apic_eoi() check in do_IRQ() isn't strictly necessary. - The new function's name isn't very helpful with its use in end_level_ioapic_irq_new(). I did also consider eoi_APIC_irq() (to parallel ack_APIC_irq()), but then liked this even less. - Other than I first thought serial console IRQs shouldn't be affected, as we run them on specifically allocated high priority vectors. Reported-by: Igor Druzhinin Diagnosed-by: Andrew Cooper Signed-off-by: Jan Beulich --- a/xen/arch/x86/io_apic.c +++ b/xen/arch/x86/io_apic.c @@ -1734,7 +1734,7 @@ static void end_level_ioapic_irq_new(str v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1)); -ack_APIC_irq(); +end_nonmaskable_irq(desc, vector); if ( (desc->status & IRQ_MOVE_PENDING) && !io_apic_level_ack_pending(desc->irq) ) --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -438,6 +438,7 @@ int __init init_irq_data(void) } static void __do_IRQ_guest(int vector); +static void flush_ready_eoi(void); static void ack_none(struct irq_desc *desc) { @@ -865,6 +866,7 @@ void pirq_set_affinity(struct domain *d, } DEFINE_PER_CPU(unsigned int, irq_count); +static DEFINE_PER_CPU(bool, check_eoi_deferral); uint8_t alloc_hipriority_vector(void) { @@ -1008,7 +1010,25 @@ void do_IRQ(struct cpu_user_regs *regs) out: if ( desc->handler->end ) +{ +/* + * If higher priority vectors still have their EOIs pending, we may + * not issue an EOI here, as this would EOI the highest priority one. + */ +if ( cpu_has_pending_apic_eoi() ) +{ +this_cpu(check_eoi_deferral) = true; +desc->handler->end(desc, vector); +this_cpu(check_eoi_deferral) = false; + +spin_unlock(>lock); +flush_ready_eoi(); +goto out_no_unlock; +} + desc->handler->end(desc, vector); +} + out_no_end: spin_unlock(>lock); out_no_unlock: @@ -1163,6 +1183,29 @@ bool cpu_has_pending_apic_eoi(void) return pending_eoi_sp(this_cpu(pending_eoi)) != 0; } +void end_nonmaskable_irq(struct irq_desc *desc, uint8_t vector) +{ +struct pending_eoi *peoi = this_cpu(pending_eoi); +unsigned int sp = pending_eoi_sp(peoi); + +if ( !this_cpu(check_eoi_deferral) || !sp || peoi[sp - 1].vector < vector ) +{ +ack_APIC_irq(); +return; +} + +/* Defer this vector's EOI until all higher ones have been EOI-ed. */ +pending_eoi_sp(peoi) = sp + 1; +do { +peoi[sp] = peoi[sp - 1]; +} while ( --sp && peoi[sp - 1].vector > vector ); +ASSERT(!sp || peoi[sp - 1].vector < vector); + +peoi[sp].irq = desc->irq; +peoi[sp].vector = vector; +peoi[sp].ready = 1; +} + static inline void set_pirq_eoi(struct domain *d, unsigned int irq) { if ( d->arch.pirq_eoi_map ) --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -512,11 +512,6 @@ static void ack_maskable_msi_irq(struct ack_APIC_irq(); /* ACKTYPE_NONE */ } -void end_nonmaskable_msi_irq(struct irq_desc *desc, u8 vector) -{ -ack_APIC_irq(); /* ACKTYPE_EOI */ -} - /* * IRQ chip for MSI PCI/PCI-X/PCI-Express devices, * which implement the MSI or MSI-X capability structure. @@ -539,7 +534,7 @@ static hw_irq_controller pci_msi_nonmask .enable = irq_enable_none, .disable = irq_disable_none, .ack = ack_nonmaskable_msi_irq, -.end = end_nonmaskable_msi_irq, +.end = end_nonmaskable_irq, .set_affinity = set_msi_affinity }; --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -427,7 +427,7 @@ static unsigned int iommu_msi_startup(st static void iommu_msi_end(struct irq_desc *desc, u8 vector) { iommu_msi_unmask(desc); -ack_APIC_irq(); +end_nonmaskable_irq(desc, vector); } @@ -460,7 +460,7 @@ static void iommu_maskable_msi_shutdown( * maskable flavors here, as we want the ACK to be issued in ->end(). */ #define
Re: [Xen-devel] [Xen-users] 4.13RC3 and PVHVM makes drive drops just after boot
On Wed, Nov 27, 2019 at 10:54:42PM +1000, Andrew wrote: > I have the system setup for testing, so happy to test any patches that may > come out. Thanks Andrew for the report, I think I understand the issue. I'll work on a fix. Cheers, -- Anthony PERARD ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] tools/arm: include xen-tools/libs.h from libxl_arm_acpi.c
On Thu, Nov 28, 2019 at 11:30:34AM +0100, Jürgen Groß wrote: > On 28.11.19 11:15, Wei Liu wrote: > > On Wed, Nov 27, 2019 at 06:24:58PM -0800, Stefano Stabellini wrote: > > > libxl_arm_acpi.c is using BUILD_BUG_ON but it is not including > > > xen-tools/libs.h that defines it. > > > > > > Signed-off-by: Stefano Stabellini > > > > Acked-by: Wei Liu > > > > Juergen, this is a trivial patch. I think it can go in 4.13. > > Why is this patch needed? > > tools/libxl/libxl_arm_acpi.c includes libxl_arm.h, which includes > libxl_internal.h, which includes xen-tools/libs.h. Oh I missed that. In that case I don't think this patch is required for 4.13. Stefano, did you see a build error or something? Wei. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] Rationalize max_grant_frames and max_maptrack_frames handling
On 28.11.2019 11:26, Durrant, Paul wrote: >> From: George Dunlap >> Sent: 27 November 2019 16:52 >> >> On 11/27/19 4:43 PM, Durrant, Paul wrote: From: George Dunlap Sent: 27 November 2019 16:34 TBH I'd be tempted to define XENSOMETHING_MAX_DEFAULT as (unsigned long)(-1) or something, and explicitly compare to that. That leaves open the possibility of having more sentinel values if we decided we wanted them. >>> >>> I'm extremely confused now. What do you want me to compare and where? >>> >>> I assume we're talking about the opt_XXX values. Am I supposed to stop >>> INT_MAX being assigned to them? Or should I define local unsigned values >> for max_maptrack/grant_frames and simply initialize them to the passed-in >> arg (if >= 0) or the opt_XXX value otherwise. >> >> In this version of the patch, you change the domctl arguments from >> uint32_t to int32_t. I would leave them uint32_t, and if ( >> max_grant_frames == XENSOMETHING_MAX_DEFAULT ) max_grant_frames = opt_ >> >> Then the only invalid value we have to worry about is checking for >> XENSOMETHING_MAX_DEFAULT. >> >> This is a suggestion, and I wouldn't argue strongly if someone thought >> it was a bad idea, but it seems like the most straightforward option to >> me. > > AFAICT the definition of that invalid value is going to be needed by both > the grant table code and the user-space toolstack code so I guess the > logical place for the definition would be a tools-only section of the > public grant table header? TBH I prefer the idea of any negative value > being default though. FWIW I agree, as I can't really see what other purposes we might need sentinel values for down the road. > As long as the xl/libxl parts don't allow a *specified* value > INT_MAX > then that should be fine, although for the full story a custom parser > for the command line values should also be added to ensure the same > semantics there. Right. While going a little far, I can't right now see easy alternatives to a custom parser. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] tools/arm: include xen-tools/libs.h from libxl_arm_acpi.c
On 28.11.19 11:15, Wei Liu wrote: On Wed, Nov 27, 2019 at 06:24:58PM -0800, Stefano Stabellini wrote: libxl_arm_acpi.c is using BUILD_BUG_ON but it is not including xen-tools/libs.h that defines it. Signed-off-by: Stefano Stabellini Acked-by: Wei Liu Juergen, this is a trivial patch. I think it can go in 4.13. Why is this patch needed? tools/libxl/libxl_arm_acpi.c includes libxl_arm.h, which includes libxl_internal.h, which includes xen-tools/libs.h. So this is a purely cosmetic patch, especially as libxl_arm.h and libxl_internal.h are libxl-internal includes, so there is a very low risk for the include of xen-tools/libs.h suddenly disappearing, and even it would due to a patch of one of those include files, it would be detected by a failing build immediately. So I won't take it for 4.13, even if being trivial. Juergen Wei. --- tools/libxl/libxl_arm_acpi.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/libxl/libxl_arm_acpi.c b/tools/libxl/libxl_arm_acpi.c index ba874c3d32..52c476ff65 100644 --- a/tools/libxl/libxl_arm_acpi.c +++ b/tools/libxl/libxl_arm_acpi.c @@ -19,6 +19,7 @@ #include "libxl_arm.h" #include +#include /* Below typedefs are useful for the headers under acpi/ */ typedef uint8_t u8; -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed
> -Original Message- > From: Jan Beulich > Sent: 28 November 2019 10:23 > To: Durrant, Paul ; Boris Ostrovsky > > Cc: xen-devel@lists.xenproject.org; Grall, Julien ; > Andrew Cooper ; Roger Pau Monné > ; Jun Nakajima ; Kevin Tian > ; Wei Liu > Subject: Re: [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the > domain is destroyed > > On 28.11.2019 10:38, Paul Durrant wrote: > > From: Julien Grall > > > > A guest will setup a shared page with the hypervisor for each vCPU via > > XENPMU_init. The page will then get mapped in the hypervisor and only > > released when XENPMU_finish is called. > > > > This means that if the guest fails to invoke XENPMU_finish, e.g if it is > > destroyed rather than cleanly shut down, the page will stay mapped in > the > > hypervisor. One of the consequences is the domain can never be fully > > destroyed as a page reference is still held. > > > > As Xen should never rely on the guest to correctly clean-up any > > allocation in the hypervisor, we should also unmap such pages during the > > domain destruction if there are any left. > > > > We can re-use the same logic as in pvpmu_finish(). To avoid > > duplication, move the logic in a new function that can also be called > > from vpmu_destroy(). > > > > NOTE: - The call to vpmu_destroy() must also be moved from > > arch_vcpu_destroy() into domain_relinquish_resources() such that > > the reference on the mapped page does not prevent > domain_destroy() > > (which calls arch_vcpu_destroy()) from being called. > > - Whilst it appears that vpmu_arch_destroy() is idempotent it is > > by no means obvious. Hence make sure the VPMU_CONTEXT_ALLOCATED > > flag is cleared at the end of vpmu_arch_destroy(). > > - This is not an XSA because vPMU is not security supported (see > > XSA-163). > > > > Signed-off-by: Julien Grall > > Signed-off-by: Paul Durrant > > --- > > Cc: Jan Beulich > > Cc: Andrew Cooper > > Cc: Wei Liu > > Cc: "Roger Pau Monné" > > Cc: Jun Nakajima > > Cc: Kevin Tian > > > > v2: > > - Re-word commit comment slightly > > - Re-enforce idempotency of vmpu_arch_destroy() > > - Move invocation of vpmu_destroy() earlier in > >domain_relinquish_resources() > > What about v3? Oh, sorry: v3: - Add comment regarding XSA-163 - Revert changes setting VPMU_CONTEXT_ALLOCATED in common code Paul > > > --- a/xen/arch/x86/cpu/vpmu.c > > +++ b/xen/arch/x86/cpu/vpmu.c > > @@ -576,11 +576,36 @@ static void vpmu_arch_destroy(struct vcpu *v) > > > > vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); > > } > > + > > +vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); > > } > > Boris, to be on the safe side - are you in agreement with this > change, now that the setting of the flag is being left untouched? > > Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] Rationalize max_grant_frames and max_maptrack_frames handling
> -Original Message- > From: George Dunlap > Sent: 27 November 2019 16:52 > To: Durrant, Paul ; Jan Beulich > Cc: AndrewCooper ; Anthony PERARD > ; Roger Pau Monné ; > Volodymyr Babchuk ; George Dunlap > ; Ian Jackson ; > Marek Marczykowski-Górecki ; Stefano > Stabellini ; xen-devel@lists.xenproject.org; > Konrad Rzeszutek Wilk ; Julien Grall > ; Wei Liu > Subject: Re: [PATCH v2] Rationalize max_grant_frames and > max_maptrack_frames handling > > On 11/27/19 4:43 PM, Durrant, Paul wrote: > >> -Original Message- > >> From: George Dunlap > >> Sent: 27 November 2019 16:34 > >> To: Jan Beulich ; Durrant, Paul > > >> Cc: AndrewCooper ; Anthony PERARD > >> ; Roger Pau Monné ; > >> Volodymyr Babchuk ; George Dunlap > >> ; Ian Jackson ; > >> Marek Marczykowski-Górecki ; Stefano > >> Stabellini ; xen-devel@lists.xenproject.org; > >> Konrad Rzeszutek Wilk ; Julien Grall > >> ; Wei Liu > >> Subject: Re: [PATCH v2] Rationalize max_grant_frames and > >> max_maptrack_frames handling > >> > >> On 11/27/19 4:20 PM, Jan Beulich wrote: > >>> On 27.11.2019 17:14, Durrant, Paul wrote: > > From: Jan Beulich > > Sent: 27 November 2019 15:56 > > > > On 27.11.2019 15:37, Paul Durrant wrote: > >> --- a/xen/arch/arm/setup.c > >> +++ b/xen/arch/arm/setup.c > >> @@ -789,7 +789,7 @@ void __init start_xen(unsigned long > > boot_phys_offset, > >> .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, > >> .max_evtchn_port = -1, > >> .max_grant_frames = gnttab_dom0_frames(), > >> -.max_maptrack_frames = opt_max_maptrack_frames, > >> +.max_maptrack_frames = -1, > >> }; > >> int rc; > >> > >> --- a/xen/arch/x86/setup.c > >> +++ b/xen/arch/x86/setup.c > >> @@ -697,8 +697,8 @@ void __init noreturn __start_xen(unsigned long > > mbi_p) > >> struct xen_domctl_createdomain dom0_cfg = { > >> .flags = IS_ENABLED(CONFIG_TBOOT) ? > >> XEN_DOMCTL_CDF_s3_integrity > > : 0, > >> .max_evtchn_port = -1, > >> -.max_grant_frames = opt_max_grant_frames, > >> -.max_maptrack_frames = opt_max_maptrack_frames, > >> +.max_grant_frames = -1, > >> +.max_maptrack_frames = -1, > >> }; > > > > With these there's no need anymore for opt_max_maptrack_frames to > > be non-static. Sadly Arm still wants opt_max_grant_frames > > accessible in gnttab_dom0_frames(). > > Yes, I was about to make them static until I saw what the ARM code > did. > >>> > >>> But the one that Arm doesn't need should become static now. > >>> > >> --- a/xen/common/grant_table.c > >> +++ b/xen/common/grant_table.c > >> @@ -1837,12 +1837,18 @@ active_alloc_failed: > >> return -ENOMEM; > >> } > >> > >> -int grant_table_init(struct domain *d, unsigned int > >> max_grant_frames, > >> - unsigned int max_maptrack_frames) > >> +int grant_table_init(struct domain *d, int max_grant_frames, > >> + int max_maptrack_frames) > >> { > >> struct grant_table *gt; > >> int ret = -ENOMEM; > >> > >> +/* Default to maximum value if no value was specified */ > >> +if ( max_grant_frames < 0 ) > >> +max_grant_frames = opt_max_grant_frames; > >> +if ( max_maptrack_frames < 0 ) > >> +max_maptrack_frames = opt_max_maptrack_frames; > >> + > >> if ( max_grant_frames < INITIAL_NR_GRANT_FRAMES || > > > > I take it we don't expect people to specify 2^^31 or more > > frames for either option. It looks like almost everything > > here would cope, except for this very comparison. Nevertheless > > I wonder whether you wouldn't better confine both values to > > [0, INT_MAX] now, including when adjusted at runtime. > > I can certainly remove the 'U' from the definition of > INITIAL_NR_GRANT_FRAMES, > >>> > >>> Oh, I didn't pay attention that is has a U on it - in this case > >>> the comparison above is fine. > >>> > but do you want me to make opt_max_grant_frames and > opt_max_maptrack_frames into signed ints and add signed parser > code too? > >>> > >>> Definitely not. They should remain unsigned quantities, but their > >>> values may need sanity checking now. > >>> > I also don't understand the 'adjusted at runtime' part. > >>> > >>> Well, for a command line drive value you could adjust an out of > >>> bounds value in some __init function. But for runtime modifiable > >>> settings you won't get away this easily. > >> > >> TBH I'd be tempted to define XENSOMETHING_MAX_DEFAULT as (unsigned > >> long)(-1) or something, and explicitly compare to that. That leaves > >> open the possibility of having more sentinel values if we decided we > >> wanted them. > > > > I'm extremely confused now. What do you want me to compare and where? > > > >
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On 28.11.19 11:17, George Dunlap wrote: On Nov 28, 2019, at 9:55 AM, Jan Beulich wrote: Has anyone actually tried building a livepatch with this change in place? Actually - what is your concern here? The exact spelling of symbols names should be of no interest to the tool. After all the compiler is free to invent all sorts of names for its local symbols, including the ones we would produce with this change in place. All the tool cares about is that the names be unambiguous. Hence any (theoretical) regression here would be a bug in the tools, which imo is no reason to delay this change any further. (Granted I should have got to it earlier, but it had been continuing to get deferred.) This might all be true (theoretically). The livepatch build tools are fragile and very sensitive to how the object files are laid out. There is a very real risk that this change accidentally breaks livepatching totally, even on GCC builds. Were this to happen, it would be yet another 4.13 regression. It's perhaps a matter of perception, but I'd still call this a live patching tools bug then, not a 4.13 regression. After the discussion yesterday, I was thinking a bit more about this, and I’m not happy with the principle Andy seems to be operating on, that it’s reasonable to completely block a bug-fixing patch to Xen, forcing a work-around to be used in a release, because it has unknown effects on livepatching. Consider the relationship between Xen and Linux, for example. Suppose that a patch were posted to Linux to fix an issue, and Juergen responded by saying that he wasn’t happy with it because it might possibly break things running under Xen. But he didn’t actually test it himself, nor propose some alternate way of fixing the original problem; rather, he expected the original patch submitter, who doesn’t use Xen, to set up a Xen system and test it themselves before accepting it. Do you think anyone in the kernel community would stand for that? Of course not. Naturally the patch would be *paused* while *people in the Xen community* tested and or proposed alternate solutions; but if there was a delay, eventually it would be checked in. I think the same principle should apply here. If people using the livepatch code are afraid that Jan’s patch *may* affect livepatching on gcc, then they should be given time to review, test, and/or propose alternate solutions. But it should be the responsibility of people working on that code, not the responsibility of developers who don’t use that code. If they're so extremely fragile, then I think this needs urgently taking care of by their maintainers. As mentioned before - how exactly static symbols get represented is up to the compiler, i.e. may change at any time. As a result, any change whatsoever would need such regression testing, no matter that I agree that a larger scope change like this one has slightly higher potential of triggering some issue. This is another argument I would agree with. Given the closeness to the release, I’d favor checking in the patch today or tomorrow, regardless of testing status, so that it can get more testing in our automated systems; it can always be reverted if it is shown to break livepatching on gcc. In that case: please rather today than tomorrow. The earlier the better. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed
On 28.11.2019 10:38, Paul Durrant wrote: > From: Julien Grall > > A guest will setup a shared page with the hypervisor for each vCPU via > XENPMU_init. The page will then get mapped in the hypervisor and only > released when XENPMU_finish is called. > > This means that if the guest fails to invoke XENPMU_finish, e.g if it is > destroyed rather than cleanly shut down, the page will stay mapped in the > hypervisor. One of the consequences is the domain can never be fully > destroyed as a page reference is still held. > > As Xen should never rely on the guest to correctly clean-up any > allocation in the hypervisor, we should also unmap such pages during the > domain destruction if there are any left. > > We can re-use the same logic as in pvpmu_finish(). To avoid > duplication, move the logic in a new function that can also be called > from vpmu_destroy(). > > NOTE: - The call to vpmu_destroy() must also be moved from > arch_vcpu_destroy() into domain_relinquish_resources() such that > the reference on the mapped page does not prevent domain_destroy() > (which calls arch_vcpu_destroy()) from being called. > - Whilst it appears that vpmu_arch_destroy() is idempotent it is > by no means obvious. Hence make sure the VPMU_CONTEXT_ALLOCATED > flag is cleared at the end of vpmu_arch_destroy(). > - This is not an XSA because vPMU is not security supported (see > XSA-163). > > Signed-off-by: Julien Grall > Signed-off-by: Paul Durrant > --- > Cc: Jan Beulich > Cc: Andrew Cooper > Cc: Wei Liu > Cc: "Roger Pau Monné" > Cc: Jun Nakajima > Cc: Kevin Tian > > v2: > - Re-word commit comment slightly > - Re-enforce idempotency of vmpu_arch_destroy() > - Move invocation of vpmu_destroy() earlier in >domain_relinquish_resources() What about v3? > --- a/xen/arch/x86/cpu/vpmu.c > +++ b/xen/arch/x86/cpu/vpmu.c > @@ -576,11 +576,36 @@ static void vpmu_arch_destroy(struct vcpu *v) > > vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); > } > + > +vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); > } Boris, to be on the safe side - are you in agreement with this change, now that the setting of the flag is being left untouched? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2] psr: fix bug which may cause crash
During test, we found a crash on Xen with below trace. (XEN) Xen call trace: (XEN)[] R psr.c#l3_cdp_write_msr+0x1e/0x22 (XEN)[] F psr.c#do_write_psr_msrs+0x6d/0x109 (XEN)[] F smp_call_function_interrupt+0x5a/0xac (XEN)[] F call_function_interrupt+0x20/0x34 (XEN)[] F do_IRQ+0x175/0x6ae (XEN)[] F common_interrupt+0x10a/0x120 (XEN)[] F cpu_idle.c#acpi_idle_do_entry+0x9d/0xb1 (XEN)[] F cpu_idle.c#acpi_processor_idle+0x41d/0x626 (XEN)[] F domain.c#idle_loop+0xa5/0xa7 (XEN) (XEN) (XEN) (XEN) Panic on CPU 20: (XEN) GENERAL PROTECTION FAULT (XEN) [error_code=] (XEN) The bug happens when CDP and MBA co-exist and MBA COS_MAX is bigger than CDP COS_MAX. E.g. MBA has 8 COS registers but CDP only have 6. When setting MBA throttling value for the 7th guest, the value array would be: +--+--+--+ | Data default val | Code default val | MBA throttle | +--+--+--+ Then, COS id 7 will be selected for writting the values. We should avoid writting CDP data/code valules to COS id 7 MSR because it exceeds the CDP COS_MAX. Signed-off-by: Yi Sun --- xen/arch/x86/psr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index 5866a26..ecca5b4 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -1271,7 +1271,8 @@ static void do_write_psr_msrs(void *data) for ( j = 0; j < cos_num; j++, index++ ) { -if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] ) +if ( cos <= feat->cos_max && + feat->cos_reg_val[cos * cos_num + j] != info->val[index] ) { feat->cos_reg_val[cos * cos_num + j] = info->val[index]; props->write_msr(cos, info->val[index], props->type[j]); -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 6/6] Added Resolving Disagreement
On 28.11.2019 01:56, Stefano Stabellini wrote: > On Thu, 26 Sep 2019, Lars Kurth wrote: >> +This could take for example the form of >> +> Do you think it would be useful for the code to do XXX? >> +> I can imagine a user wanting to do YYY (and XXX would enable this) >> + >> +That potentially adds additional work for the code author, which they may >> not have >> +the time to perform. It is good practice for authors to consider such a >> request in terms of >> +* Usefulness to the user >> +* Code churn, complexity or impact on other system properties >> +* Extra time to implement and report back to the reviewer >> + >> +If you believe that the impact/cost is too high, report back to the >> reviewer. To resolve >> +this, it is advisable to >> +* Report your findings >> +* And then check whether this was merely an interesting suggestion, or >> something the >> +reviewer feels more strongly about >> + >> +In the latter case, there are typically several common outcomes >> +* The **author and reviewer agree** that the suggestion should be >> implemented >> +* The **author and reviewer agree** that it may make sense to defer >> implementation >> +* The **author and reviewer agree** that it makes no sense to implement the >> suggestion >> + >> +The author of a patch would typically suggest their preferred outcome, for >> example >> +> I am not sure it is worth to implement XXX >> +> Do you think this could be done as a separate patch in future? >> + >> +In cases, where no agreement can be found, the best approach would be to >> get an >> +independent opinion from another maintainer or the project's leadership >> team. > > I think we should mention somewhere here that it is recommended for > reviewers to be explicit about whether a request is optional or whether > it is a requirement. > > For instance: "I think it would be good if X also did Y" doesn't say if > it is optional (future work) or it is actually required as part of this > series. More explicit word choices are preferable, such as: > > "I think it would be good if X also did Y, not a requirement but good to > have." > > "I think it would be good if X also did Y and it should be part of this > series." I think without it being made explicit that something is optional, the assumption should be that it isn't. I.e. in the first example I agree with the idea to have something after the comma, but in the second example I think the extra wording is a waste of effort. > I think there is something else we should say on this topic. There is a > category of things which could be done in multiple ways and it is not > overtly obvious which one is best. It is done to the maintainer and the > author personal styles. It is easy to disagree on that. > > I think a good recommendation would be for the contributor to try to > follow the maintainers requests, even if they could be considered > "style", trusting their experience on the matter. And a good > recommendation for the maintainer would be to try to let the contributor > have freedom of implementation choice on things that don't make a > significant difference. I think we try to, but I also think we suffer from too little clear documentation on e.g. style aspects. Attempts on my part to address this have mostly (not entirely) lead no-where (lack of feedback on proposed patches to ./CODING_STYLE). So for the time being there are (many) aspects where we have de-facto expectations that aren't written down anywhere, with the result of (in a subset of cases) disagreement on what the perceived de-facto standard actually is. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-4.12-testing test] 144339: tolerable FAIL - PUSHED
flight 144339 xen-4.12-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/144339/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qcow217 guest-localmigrate/x10 fail like 144319 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: xen e10c1fbde8a81f541cb2c58a3078b2e7df89a801 baseline version: xen 875879a7b8c1d561e6ea2a20958a1e61242ffef1 Last test of basis 144319 2019-11-27 05:27:24 Z1 days Testing same since 144339 2019-11-27 23:05:58 Z0 days1 attempts People who touched revisions under test: Julien Grall Mark Rutland Stefano Stabellini Will Deacon jobs: build-amd64-xsm pass build-arm64-xsm
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
> On Nov 28, 2019, at 9:55 AM, Jan Beulich wrote: Has anyone actually tried building a livepatch with this change in place? >>> Actually - what is your concern here? The exact spelling of symbols >>> names should be of no interest to the tool. After all the compiler is >>> free to invent all sorts of names for its local symbols, including >>> the ones we would produce with this change in place. All the tool >>> cares about is that the names be unambiguous. Hence any (theoretical) >>> regression here would be a bug in the tools, which imo is no reason >>> to delay this change any further. (Granted I should have got to it >>> earlier, but it had been continuing to get deferred.) >> >> This might all be true (theoretically). >> >> The livepatch build tools are fragile and very sensitive to how the >> object files are laid out. There is a very real risk that this change >> accidentally breaks livepatching totally, even on GCC builds. >> >> Were this to happen, it would be yet another 4.13 regression. > > It's perhaps a matter of perception, but I'd still call this a > live patching tools bug then, not a 4.13 regression. After the discussion yesterday, I was thinking a bit more about this, and I’m not happy with the principle Andy seems to be operating on, that it’s reasonable to completely block a bug-fixing patch to Xen, forcing a work-around to be used in a release, because it has unknown effects on livepatching. Consider the relationship between Xen and Linux, for example. Suppose that a patch were posted to Linux to fix an issue, and Juergen responded by saying that he wasn’t happy with it because it might possibly break things running under Xen. But he didn’t actually test it himself, nor propose some alternate way of fixing the original problem; rather, he expected the original patch submitter, who doesn’t use Xen, to set up a Xen system and test it themselves before accepting it. Do you think anyone in the kernel community would stand for that? Of course not. Naturally the patch would be *paused* while *people in the Xen community* tested and or proposed alternate solutions; but if there was a delay, eventually it would be checked in. I think the same principle should apply here. If people using the livepatch code are afraid that Jan’s patch *may* affect livepatching on gcc, then they should be given time to review, test, and/or propose alternate solutions. But it should be the responsibility of people working on that code, not the responsibility of developers who don’t use that code. > If they're so > extremely fragile, then I think this needs urgently taking care of > by their maintainers. As mentioned before - how exactly static > symbols get represented is up to the compiler, i.e. may change at > any time. As a result, any change whatsoever would need such > regression testing, no matter that I agree that a larger scope > change like this one has slightly higher potential of triggering > some issue. This is another argument I would agree with. Given the closeness to the release, I’d favor checking in the patch today or tomorrow, regardless of testing status, so that it can get more testing in our automated systems; it can always be reverted if it is shown to break livepatching on gcc. -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] tools/arm: include xen-tools/libs.h from libxl_arm_acpi.c
On Wed, Nov 27, 2019 at 06:24:58PM -0800, Stefano Stabellini wrote: > libxl_arm_acpi.c is using BUILD_BUG_ON but it is not including > xen-tools/libs.h that defines it. > > Signed-off-by: Stefano Stabellini Acked-by: Wei Liu Juergen, this is a trivial patch. I think it can go in 4.13. Wei. > --- > tools/libxl/libxl_arm_acpi.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/tools/libxl/libxl_arm_acpi.c b/tools/libxl/libxl_arm_acpi.c > index ba874c3d32..52c476ff65 100644 > --- a/tools/libxl/libxl_arm_acpi.c > +++ b/tools/libxl/libxl_arm_acpi.c > @@ -19,6 +19,7 @@ > #include "libxl_arm.h" > > #include > +#include > > /* Below typedefs are useful for the headers under acpi/ */ > typedef uint8_t u8; > -- > 2.17.1 > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 4/6] Add Code Review Guide
On 28.11.2019 01:54, Stefano Stabellini wrote: > On Thu, 26 Sep 2019, Lars Kurth wrote: >> From: Lars Kurth >> >> This document highlights what reviewers such as maintainers and committers >> look >> for when reviewing code. It sets expectations for code authors and provides >> a framework for code reviewers. > > I think the document is missing a couple of things: > > - a simple one line statement that possibly the most important thing in > a code review is to indentify any bugs in the code > > - an explanation that requests for major changes to the series should be > made early on (i.e. let's not change the architecture of a feature at > v9 if possible) I also made this comment in reply to patch #5. I'll > let you decide where is the best place for it. This needs balancing. People crucial to the evaluation of a new feature and its implementation simply may not have the time to reply prior to v9. We've had situations where people posted new revisions every other day, sometimes even more than one per day. As indicated in several other contexts before - imo people not helping to shoulder the review load should also not have the expectation that their (large) contributions will be looked at in due course. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] getting 4.11.3 ready
On 28.11.2019 00:51, Lars Kurth wrote: > On 29/10/2019, 12:41, "Stefano Stabellini" wrote: > On Tue, 29 Oct 2019, Julien Grall wrote: > > On 28/10/2019 21:43, Stefano Stabellini wrote: > > > On Mon, 28 Oct 2019, Julien Grall wrote: > > >> Hi, > > >> > > >> On 25/10/2019 11:31, Jan Beulich wrote: > > >>> All, > > >>> > > >>> the 4.11.3 stable release is due. I intend to wait for the XSA fixes > > >>> going public on the 31st, but not (much) longer. Please point out > > >>> backporting candidates that you find missing from the respective > > >>> stable trees. I have three ones queued which haven't passed the push > > >>> gate to the master branch yet: > > >>> > > >>> 9257c218e5 x86/vvmx: Fix the use of RDTSCP when it is intercepted > at L0 > > >>> 7eee9c16d6 x86/tsc: update vcpu time info on guest TSC adjustments > > >>> 9633929824 x86: fix off-by-one in is_xen_fixed_mfn() > > >> > > >> We don't seem to have backported patches for quite a while on Arm. > Some of my > > >> patches have been marked as to be backported from the beginning [1]. > I am > > >> specifically thinking to: > > >> > > >> e04818b46d xen/arm: traps: Avoid using BUG_ON() to check guest state > in > > >> advance_pc() > > > > Urgh, I gave the correct title but the wrong commit sha1. It should be > > > > 72615f2e6b98e861c08abb1d2b194126013d54fe > > > > > > > > I have e04818b46d, plus the following marked for backport: > > > > > > e98edccb944a80db782e551f3090628e66c7fb52 xen/arm: SCTLR_EL1 is a > 64-bit register on Arm64 > > > > There are more than that to backport: > > > > 30f5047b2c4e577436b505ba7627f34c3be02014xen/arm: gic: Make sure the > number of interrupt lines is valid before using it [4.11] > > 8aa276235b93eeb4f81095c638970900e19b31e5xen/arm: irq: End cleanly > spurious interrupt[4.11] > > b4df73de493954c44f240f78779c9bd3782e1572xen/arm: gic-v2: deactivate > interrupts during initialization[4.11] > > 0322e0db5b29a0d1ce4b452885e34023e3a4b00earm: gic-v3: deactivate > interrupts during initialization[4.11] > > > > 5ba1c5d0641cf63086b3058e547fcd28c3c4a011xen/arm: memaccess: > Initialize correctly *access in __p2m_get_mem_access[4.12] > > 07e44b3d1be32fa2165c2367ae3ef9c6c8b39e1exen/arm: Implement > workaround for Cortex A-57 and Cortex A72 AT speculate [4.12] > > > > 08e2059facd78d5ffaf206ba06ac2017c4adeed4xen/arm: setup: Calculate > correctly the size of Xen [4.11+] > > 8dba9a81e7c62b8a7dbe023fffecd2e16cc20486xen/arm: Don't use _end in > is_xen_fixed_mfn() [4.11+] > > 671878779741b38c5f2363adceef8de2ce0b3945xen/arm: p2m: Free the p2m > entry after flushing the IOMMU TLBs [4.11+] > > 7f4217cc60574866cb90d67d9750228c6b86c91exen/arm: vsmc: The function > identifier is always 32-bit [4.11+] > > 612d476e74a314be514ee6a9744eea8db09d32e5xen/arm64: Correctly > compute the virtual address in maddr_to_virt() [4.11+] > > f51027be0688540aaab61513b06a8693a37e4c00xen/arm: fix nr_pdxs > calculation[4.11+] > > a189ef027dbb7a3c0dfe566137f05c06d6685fb9xen/arm: mm: Flush the TLBs > even if a mapping failed in create_xen_entries [4.11+] > > They all make sense, I did the backports, building each commit > individually. > > Jan, AFAICT this is not yet ready to run the XSA checking tools. > Let me know when you think I should run them I'm lost. Why would the tree not be ready? There are no further backports supposed to arrive prior to the release. I was hoping to do the release yesterday. Also the mail you've replied to is about a month old. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1] psr: fix bug which may cause crash
On 28.11.2019 09:17, Yi Sun wrote: > On 19-11-27 11:06:49, Jan Beulich wrote: >> On 27.11.2019 07:24, Yi Sun wrote: >>> --- a/xen/arch/x86/psr.c >>> +++ b/xen/arch/x86/psr.c >>> @@ -316,6 +316,7 @@ static bool cat_init_feature(const struct cpuid_leaf >>> *regs, >>> [FEAT_TYPE_L3_CDP] = "L3 CDP", >>> [FEAT_TYPE_L2_CAT] = "L2 CAT", >>> }; >>> +unsigned int i = 0; >> >> Unnecessary initializer and too wide a scope. >> > Ok, u8 is enough. Did you perhaps mistake "scope" for "width"? You shouldn't use fixed width types when there's no strict need to do so. >>> @@ -332,7 +333,8 @@ static bool cat_init_feature(const struct cpuid_leaf >>> *regs, >>> return false; >>> >>> /* We reserve cos=0 as default cbm (all bits within cbm_len are >>> 1). */ >>> -feat->cos_reg_val[0] = cat_default_val(feat->cat.cbm_len); >>> +for(i = 0; i < MAX_COS_REG_CNT; i++) >> >> There are number of blanks missing here (and even more ones in >> the other instance below). It also seems to me that the comment >> ends up misplaced now. If ... >> > Sorry, the comment should be modified. > >>> +feat->cos_reg_val[i] = cat_default_val(feat->cat.cbm_len); >>> >>> wrmsrl((type == FEAT_TYPE_L3_CAT ? >>> MSR_IA32_PSR_L3_MASK(0) : >> >> ... this indeed is to remain a single write, it may want to move >> here. But as per above keeping cached and actual values in sync >> may make it necessary to move this write into the loop as well. >> > You are right, I missed to loop this sentence. > > Another idea: > I remembered that the original purpose to only write COS[0] here is to > improve performance by not writing too many MSRs. So I am thinking to > change the fix to below line in do_write_psr_msrs(). > if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] && > cos <= feat->cos_max ) > > What is your opinion? Thanks! Looks reasonable, provided it gets accompanied by a brief but precise comment. I'd also suggest switching around the two sides of the && . Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Ping: [PATCH v2] build: provide option to disambiguate symbol names
On 27.11.2019 19:17, Andrew Cooper wrote: > On 21/11/2019 08:34, Jan Beulich wrote: >> On 20.11.2019 18:13, Andrew Cooper wrote: >>> On 20/11/2019 16:40, Jürgen Groß wrote: On 20.11.19 17:30, Jan Beulich wrote: > On 08.11.2019 12:18, Jan Beulich wrote: >> The .file assembler directives generated by the compiler do not include >> any path components (gcc) or just the ones specified on the command >> line >> (clang, at least version 5), and hence multiple identically named >> source >> files (in different directories) may produce identically named static >> symbols (in their kallsyms representation). The binary diffing >> algorithm >> used by xen-livepatch, however, depends on having unique symbols. >> >> Make the ENFORCE_UNIQUE_SYMBOLS Kconfig option control the (build) >> behavior, and if enabled use objcopy to prepend the (relative to the >> xen/ subdirectory) path to the compiler invoked STT_FILE symbols. Note >> that this build option is made no longer depend on LIVEPATCH, but >> merely >> defaults to its setting now. >> >> Conditionalize explicit .file directive insertion in C files where it >> exists just to disambiguate names in a less generic manner; note that >> at the same time the redundant emission of STT_FILE symbols gets >> suppressed for clang. Assembler files as well as multiply compiled C >> ones using __OBJECT_FILE__ are left alone for the time being. >> >> Since we now expect there not to be any duplicates anymore, also don't >> force the selection of the option to 'n' anymore in allrandom.config. >> Similarly COVERAGE no longer suppresses duplicate symbol warnings if >> enforcement is in effect, which in turn allows >> SUPPRESS_DUPLICATE_SYMBOL_WARNINGS to simply depend on >> !ENFORCE_UNIQUE_SYMBOLS. >> >> Signed-off-by: Jan Beulich > I've got acks from Konrad and Wei, but still need an x86 and a release > one here. Andrew? Or alternatively - Jürgen, would you rather not see > this go in anymore? In case the needed x86 Ack is coming in before RC3 I'm fine to give my Release-ack, but I'm hesitant to take it later. >>> Has anyone actually tried building a livepatch with this change in place? >> Actually - what is your concern here? The exact spelling of symbols >> names should be of no interest to the tool. After all the compiler is >> free to invent all sorts of names for its local symbols, including >> the ones we would produce with this change in place. All the tool >> cares about is that the names be unambiguous. Hence any (theoretical) >> regression here would be a bug in the tools, which imo is no reason >> to delay this change any further. (Granted I should have got to it >> earlier, but it had been continuing to get deferred.) > > This might all be true (theoretically). > > The livepatch build tools are fragile and very sensitive to how the > object files are laid out. There is a very real risk that this change > accidentally breaks livepatching totally, even on GCC builds. > > Were this to happen, it would be yet another 4.13 regression. It's perhaps a matter of perception, but I'd still call this a live patching tools bug then, not a 4.13 regression. > This is a change to fix a concrete livepatch issue with Clang. Sure - > it resolves the symbol uniqueness failures for the in-tree build, but > considering the risks to the area you are modifying, the fact you > haven't even done a dev test of a livepatch build on GCC means that the > patch as a whole has not had what I would consider a reasonable amount > of testing. While Clang is the primary area where we need this change, it is in no way limited to that environment. Gcc can, at any time, start triggering the issue again as well - both because of changes to the compiler itself, or because of (perhaps seemingly innocent) changes we do to Xen. I can't imagine the workaround for this would be to make it impossible altogether to select LIVEPATCH=y. (In fact some of the recent always_inline may want dropping again as well with this change in place, when they were added for symbol collision reasons rather than code generation ones.) > Luckily for you, Ross and Sergey have agreed to smoke test this with > some livepatches. They will report on this thread with their findings. I appreciate this. And I do not consider it my responsibility to do regression tests of the live patching tools. If they're so extremely fragile, then I think this needs urgently taking care of by their maintainers. As mentioned before - how exactly static symbols get represented is up to the compiler, i.e. may change at any time. As a result, any change whatsoever would need such regression testing, no matter that I agree that a larger scope change like this one has slightly higher potential of triggering some issue. Jan ___ Xen-devel mailing list
Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests
On 28/11/2019 01:07, Stefano Stabellini wrote: On Wed, 27 Nov 2019, Julien Grall wrote: On Tue, 26 Nov 2019, 23:18 Stefano Stabellini, wrote: On Fri, 15 Nov 2019, Stewart Hildebrand wrote: > Allow vgic_get_hw_irq_desc to be called with a vcpu argument. > > Use vcpu argument in vgic_connect_hw_irq. > > vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with > ASSERTs. > > Signed-off-by: Stewart Hildebrand > > --- > v3: new patch > > --- > Note: I have only modified the old vgic to allow delivery of PPIs. > --- > xen/arch/arm/gic-vgic.c | 24 > xen/arch/arm/vgic.c | 6 +++--- > 2 files changed, 19 insertions(+), 11 deletions(-) > > diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c > index 98c021f1a8..2c66a8fa92 100644 > --- a/xen/arch/arm/gic-vgic.c > +++ b/xen/arch/arm/gic-vgic.c > @@ -418,7 +418,7 @@ struct irq_desc *vgic_get_hw_irq_desc(struct domain *d, struct vcpu *v, > { > struct pending_irq *p; > > - ASSERT(!v && virq >= 32); > + ASSERT((!v && (virq >= 32)) || (!d && v && (virq >= 16) && (virq < 32))); I don't think !d is necessary for this to work as intended so I would limit the ASSERT to ASSERT((!v && (virq >= 32)) || (v && (virq >= 16) && (virq < 32))); the caller can always pass v->domain But then you have the risk to run into d != v->domain. So at least with the ASSERT you document the expectation. Yes, that was not my intention. It makes sense in certain scenarios for v to be NULL. What I was trying to say is that when v is not-NULL, then also d should be not-NULL for consistency. I don't think it makes sense to pass v corresponding to vcpu1 of domain2 and d == NULL, right? While I usually like consistency, 'd' is only used to find 'v' if it is NULL. So I really see limited reason to impose the caller to set 'd' in this case. I don't know if you want to add a (d == v->domain) check to the ASSERT as it is pretty busy already. I am OK either way. -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range
On 28/11/2019 09:00, Jürgen Groß wrote: On 28.11.19 09:48, Julien Grall wrote: Hi, On 28/11/2019 08:32, Jürgen Groß wrote: On 28.11.19 09:14, Julien Grall wrote: In short, I think the patch should go in now and there are no downsides to it. That's it, I rest my case. Julien, I hope you'll reconsider. I don't really see the point to try to allow Linux 5.4 booting on Xen 4.13 without knowing whether we are not going to uncovered more BUG around I*ACTIVER. Sorry, but this is a rather weird statement. IIUC you are saying that a typo blocking boot of Linux 5.4 should not be fixed as you are not sure there are no other bugs? The implementation of I*ACTIVER was incorrect but gone unnoticed because no-one used it until 5.4. It also happen that we didn't cover all the I*ACTIVER registers, so 5.4 crashes instead of using the wrong behavior. This patch is basically replacing a guest crash by a behavior we don't fully understand. If you really want this patch in Xen 4.13, then you should read my mail on the first version and trying to answer me why this 5.4 is going to be safe running on Xen 4.13. Or do you think that with the typo fixed and running Linux 5.4 guests will destabilize the host? It is not going to destabilize the hosts. But this is not going to make 5.4 running correctly as Xen guest. Have you verified it isn't running correctly or do you just think it could hit problems? I haven't tested myself, but any bug around vGIC is usually subtled. I wrote a long e-mail on v1 (see [1]) explaning what could happen. To summarize briefly, Linux is reading the I*ACTIVER registers to check whether an interrupt is active at the hardware level. For instance, this is used to ensure all active interrupts have been handled before continuing. By always returning 0, we tell Linux there are no interrupts. One of the risk is interrupts may be lost. But that's Linux behavior, I can't tell how this is going to be used by others OSes. In both cases I see no reason to keep wrong code. Either the patch will let run Linux 5.4 fine - then the patch should definitely be taken. That's up to Stefano and Peng to provide me information why this is fine. FAOD, the current justification provided is not acceptable for me. Or the patch will let Linux 5.4 boot further, but some problems will occur. Then it will be possible to analyze those problems and try to fix them, very possibly with the sane approach you are hoping for. Juergen [1] https://lore.kernel.org/xen-devel/7289f75f-1ab2-2d42-cd88-1be5306b3...@xen.org/ -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed
From: Julien Grall A guest will setup a shared page with the hypervisor for each vCPU via XENPMU_init. The page will then get mapped in the hypervisor and only released when XENPMU_finish is called. This means that if the guest fails to invoke XENPMU_finish, e.g if it is destroyed rather than cleanly shut down, the page will stay mapped in the hypervisor. One of the consequences is the domain can never be fully destroyed as a page reference is still held. As Xen should never rely on the guest to correctly clean-up any allocation in the hypervisor, we should also unmap such pages during the domain destruction if there are any left. We can re-use the same logic as in pvpmu_finish(). To avoid duplication, move the logic in a new function that can also be called from vpmu_destroy(). NOTE: - The call to vpmu_destroy() must also be moved from arch_vcpu_destroy() into domain_relinquish_resources() such that the reference on the mapped page does not prevent domain_destroy() (which calls arch_vcpu_destroy()) from being called. - Whilst it appears that vpmu_arch_destroy() is idempotent it is by no means obvious. Hence make sure the VPMU_CONTEXT_ALLOCATED flag is cleared at the end of vpmu_arch_destroy(). - This is not an XSA because vPMU is not security supported (see XSA-163). Signed-off-by: Julien Grall Signed-off-by: Paul Durrant --- Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: "Roger Pau Monné" Cc: Jun Nakajima Cc: Kevin Tian v2: - Re-word commit comment slightly - Re-enforce idempotency of vmpu_arch_destroy() - Move invocation of vpmu_destroy() earlier in domain_relinquish_resources() --- xen/arch/x86/cpu/vpmu.c | 47 +++-- xen/arch/x86/domain.c | 10 + 2 files changed, 33 insertions(+), 24 deletions(-) diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c index f397183ec3..792953e7cb 100644 --- a/xen/arch/x86/cpu/vpmu.c +++ b/xen/arch/x86/cpu/vpmu.c @@ -576,11 +576,36 @@ static void vpmu_arch_destroy(struct vcpu *v) vpmu->arch_vpmu_ops->arch_vpmu_destroy(v); } + +vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); } -void vpmu_destroy(struct vcpu *v) +static void vpmu_cleanup(struct vcpu *v) { +struct vpmu_struct *vpmu = vcpu_vpmu(v); +mfn_t mfn; +void *xenpmu_data; + +spin_lock(>vpmu_lock); + vpmu_arch_destroy(v); +xenpmu_data = vpmu->xenpmu_data; +vpmu->xenpmu_data = NULL; + +spin_unlock(>vpmu_lock); + +if ( xenpmu_data ) +{ +mfn = domain_page_map_to_mfn(xenpmu_data); +ASSERT(mfn_valid(mfn)); +unmap_domain_page_global(xenpmu_data); +put_page_and_type(mfn_to_page(mfn)); +} +} + +void vpmu_destroy(struct vcpu *v) +{ +vpmu_cleanup(v); put_vpmu(v); } @@ -639,9 +664,6 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params) static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params) { struct vcpu *v; -struct vpmu_struct *vpmu; -mfn_t mfn; -void *xenpmu_data; if ( (params->vcpu >= d->max_vcpus) || (d->vcpu[params->vcpu] == NULL) ) return; @@ -650,22 +672,7 @@ static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params) if ( v != current ) vcpu_pause(v); -vpmu = vcpu_vpmu(v); -spin_lock(>vpmu_lock); - -vpmu_arch_destroy(v); -xenpmu_data = vpmu->xenpmu_data; -vpmu->xenpmu_data = NULL; - -spin_unlock(>vpmu_lock); - -if ( xenpmu_data ) -{ -mfn = domain_page_map_to_mfn(xenpmu_data); -ASSERT(mfn_valid(mfn)); -unmap_domain_page_global(xenpmu_data); -put_page_and_type(mfn_to_page(mfn)); -} +vpmu_cleanup(v); if ( v != current ) vcpu_unpause(v); diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index f1dd86e12e..f5c0c378ef 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -454,9 +454,6 @@ void arch_vcpu_destroy(struct vcpu *v) xfree(v->arch.msrs); v->arch.msrs = NULL; -if ( !is_idle_domain(v->domain) ) -vpmu_destroy(v); - if ( is_hvm_vcpu(v) ) hvm_vcpu_destroy(v); else @@ -2136,12 +2133,17 @@ int domain_relinquish_resources(struct domain *d) PROGRESS(vcpu_pagetables): -/* Drop the in-use references to page-table bases. */ +/* + * Drop the in-use references to page-table bases and clean + * up vPMU instances. + */ for_each_vcpu ( d, v ) { ret = vcpu_destroy_pagetables(v); if ( ret ) return ret; + +vpmu_destroy(v); } if ( altp2m_active(d) ) -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC v5 000/126] error: auto propagated local_err
28.11.2019 11:54, Markus Armbruster wrote: > Please accept my sincere apologies for taking so long to reply. A few > thoughts before I dig deeper. > > Vladimir Sementsov-Ogievskiy writes: > >> Hi all! >> >> At the request of Markus: full version of errp propagation. Let's look >> at it. Cover as much as possible, except inserting macro invocation >> where it's not necessary. >> >> It's huge, and so it's an RFC. > > It's a monster. Best to get it into full view before we commit to > fighting it. > >> In v5 I've added a lot more preparation cleanups: >> 01-23 are preparation cleanups >>01: not changed, keep Eric's r-b >>02: improve commit msg [Markus], keep Eric's r-b >>03: changed, only error API here, drop r-b >> 24 is core macro >>- improve cover letter, wording and macro code style >>- keep Eric's r-b >> 25-26: automation scripts >> - commit-per-subsystem changed a lot. it's a draft, don't bother too >> much with it >> - coccinelle: add support of error_propagate_prepend >> >> 27-126: generated patches > > Splitting up the monster can make fighting it easier. > > Your description suggests three high-level parts: > > Part 1: Preparation (makes sense by itself) I already resent part 1 all patches (handling review comments) in separate as v6. If it is convenient, I can resend them in one series as v7. > Part 2: Error interface update (with rules what code should do now) Note, that patch 21 is actually from part2, not part1. So Part 2 is 21, 24, 25. So I wait for your comments and resend (if needed) as separate small series. And 26 is auto-patch-splitter, but we don't need it now, if we are going to start from several big subsystems. > Part 3: Make the code obey the new rules everywhere > > I hope we can get part 1 out of the way quickly. Diffstat: > > backends/cryptodev.c | 11 +--- > block/nbd.c| 10 +-- > block/snapshot.c | 4 +- > dump/dump-hmp-cmds.c | 4 +- > hw/9pfs/9p-local.c | 4 +- > hw/9pfs/9p-proxy.c | 5 +- > hw/core/loader-fit.c | 5 +- > hw/core/machine-hmp-cmds.c | 6 +- > hw/core/qdev.c | 28 > hw/i386/amd_iommu.c| 14 ++-- > hw/ppc/spapr.c | 2 +- > hw/s390x/event-facility.c | 2 +- > hw/s390x/s390-stattrib.c | 3 +- > hw/sd/sdhci.c | 2 +- > hw/tpm/tpm_emulator.c | 8 +-- > hw/usb/dev-network.c | 2 +- > hw/vfio/ap.c | 16 + > include/block/snapshot.h | 2 +- > include/monitor/hmp.h | 2 +- > include/qapi/error.h | 69 ++-- > include/qom/object.h | 4 +- > monitor/hmp-cmds.c | 155 > ++--- > monitor/qmp-cmds.c | 2 +- > net/net.c | 17 ++--- > qdev-monitor.c | 28 > qga/commands-posix.c | 2 +- > qga/commands-win32.c | 2 +- > qga/commands.c | 12 ++-- > qom/qom-hmp-cmds.c | 4 +- > target/ppc/kvm.c | 6 +- > target/ppc/kvm_ppc.h | 4 +- > ui/vnc.c | 20 ++ > ui/vnc.h | 2 +- > util/error.c | 30 - > 34 files changed, 261 insertions(+), 226 deletions(-) > > At first glance, I can see bug fixes, non-mechanical cleanups, and > mechanical cleanups. > > Within each of these three groups, we have related sub-groups. For > instance, several patches clean up funny names for the common Error ** > parameters. Several more rename "uncommon" Error ** parameters, to > signal their uncommon role. I doubt splitting up these subgroups of > related mechanical changes along subsystem lines is worthwhile. > > Part 2 needs careful interface review. Having part 3 ready helps there, > because we can see rather than guess how the interface changes play out. > We really want to get this part right from the start, because if we > don't, we get to do part 3 again. > > Part 3 is what makes this a monster. I understand it's mechanical. We > can merge it incrementally, but we do want to merge it all, and sooner > rather than later, to avoid a mix of old and new error handling code. > Such mixes inevitably confuse developers, and lead to new instances of > the old patterns creeping in. > > I do have doubts about your automated split. > > I acknowledge maintainers of active subsystems may want to merge this on > their own terms, to minimize disruption. Splitting off sub-monsters for > them makes sense. Splitting off the long tail of less busy subsystems > not so much; it'll only drag out the merging. Your list below shows 100 > parts, and chasing their maintainers is not going to be a fun > experience. > > Moreover, using MAINTAINERS to guide an automatic split is a cute idea, > but it falls apart when MAINTAINERS attributes the same file to several > subsystems, which is fairly