[Xen-devel] [PATCH v1] psr: fix bug which may cause crash
During test, we found a crash on Xen with below trace. (XEN) Xen call trace: (XEN)[] R psr.c#l3_cdp_write_msr+0x1e/0x22 (XEN)[] F psr.c#do_write_psr_msrs+0x6d/0x109 (XEN)[] F smp_call_function_interrupt+0x5a/0xac (XEN)[] F call_function_interrupt+0x20/0x34 (XEN)[] F do_IRQ+0x175/0x6ae (XEN)[] F common_interrupt+0x10a/0x120 (XEN)[] F cpu_idle.c#acpi_idle_do_entry+0x9d/0xb1 (XEN)[] F cpu_idle.c#acpi_processor_idle+0x41d/0x626 (XEN)[] F domain.c#idle_loop+0xa5/0xa7 (XEN) (XEN) (XEN) (XEN) Panic on CPU 20: (XEN) GENERAL PROTECTION FAULT (XEN) [error_code=] (XEN) Root cause is that the cache of COS registers are not initialized for CAT/CDP which have non-zero default value. That causes invalid write to MSR when COS id has exceeded the max number.. So fix it by initializing the cache. Signed-off-by: Yi Sun --- xen/arch/x86/psr.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index 5866a26..d3e7467 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -316,6 +316,7 @@ static bool cat_init_feature(const struct cpuid_leaf *regs, [FEAT_TYPE_L3_CDP] = "L3 CDP", [FEAT_TYPE_L2_CAT] = "L2 CAT", }; +unsigned int i = 0; /* No valid value so do not enable feature. */ if ( !regs->a || !regs->d ) @@ -332,7 +333,8 @@ static bool cat_init_feature(const struct cpuid_leaf *regs, return false; /* We reserve cos=0 as default cbm (all bits within cbm_len are 1). */ -feat->cos_reg_val[0] = cat_default_val(feat->cat.cbm_len); +for(i = 0; i < MAX_COS_REG_CNT; i++) +feat->cos_reg_val[i] = cat_default_val(feat->cat.cbm_len); wrmsrl((type == FEAT_TYPE_L3_CAT ? MSR_IA32_PSR_L3_MASK(0) : @@ -352,8 +354,11 @@ static bool cat_init_feature(const struct cpuid_leaf *regs, feat->cos_max = (feat->cos_max - 1) >> 1; /* We reserve cos=0 as default cbm (all bits within cbm_len are 1). */ -get_cdp_code(feat, 0) = cat_default_val(feat->cat.cbm_len); -get_cdp_data(feat, 0) = cat_default_val(feat->cat.cbm_len); +for(i = 0; i < MAX_COS_REG_CNT/2; i++) +{ +get_cdp_code(feat, i) = cat_default_val(feat->cat.cbm_len); +get_cdp_data(feat, i) = cat_default_val(feat->cat.cbm_len); +} wrmsrl(MSR_IA32_PSR_L3_MASK(0), cat_default_val(feat->cat.cbm_len)); wrmsrl(MSR_IA32_PSR_L3_MASK(1), cat_default_val(feat->cat.cbm_len)); -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range
On 27.11.19 01:01, Julien Grall wrote: Hi, On 26/11/2019 23:17, Stefano Stabellini wrote: On Tue, 26 Nov 2019, Julien Grall wrote: Hi, On 26/11/2019 20:43, Stefano Stabellini wrote: + Juergen I missed that you weren't in CC to the original patch, sorry. I think this patch should go in, as otherwise Linux 5.4 could run into problems. It is also a pretty straightforward 4 lines patch. 5.5 (or 5.6) is not going to run on Xen for other reasons (still in the vGIC)... So I would not view this as critical. 5.5 is not out yet, in fact, the dev window has just opened. Isn't your statement a bit premature? The GICv4.1 work [1] is going to prevent Linux booting on all current versions of Xen. While I can't confirm this is going to be merged in 5.5, I can tell you this will break. In any case, even if potential future Linux releases could have other additional issues, I don't think it should change our current view on this specific issue which affects 5.4, just released. The patch is definitely not as straightforward as you may think. Please refer to the discussion we had on the first version. I voiced concern about this approach and gave point what could go wrong with happen. This patch may be better than the current state (i.e crashing), but this wasn't tested enough to confirm this is the correct things to do and no other bug will appear (I don't believe reading I*ACTIVER was ever tested before). It is an annoying bug, but this is only affecting 5.4 which has just been released. It feels to me this is a fairly risky choice to merge it qutie late in the release without a good graps of the problem (see above). So I would definitly, prefer if this patch is getting through backport once we get more testing. We can still document the bug in the release note and point people to the patch. Anyway, this is Juergen choice here. But at least now he has the full picture... Cheers, [1] https://lwn.net/Articles/800494/ Thanks, Julien, for sharing your opinion. With that statement I'd like to defer this patch to 4.14. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling
On 26.11.19 18:17, George Dunlap wrote: Xen used to have single, system-wide limits for the number of grant frames and maptrack frames a guest was allowed to create. Increasing or decreasing this single limit on the Xen command-line would change the limit for all guests on the system. Later, per-domain limits for these values was created. The system-wide limits became strict limits: domains could not be created with higher limits, but could be created with lower limits. However, the change also introduced a range of different "default" values into various places in the toolstack: - The python libxc bindings hard-coded these values to 32 and 1024, respectively - The libxl default values are 32 and 1024 respectively. - xl will use the libxl default for maptrack, but does its own default calculation for grant frames: either 32 or 64, based on the max possible mfn. These defaults interact poorly with the hypervisor command-line limit: - The hypervisor command-line limit cannot be used to raise the limit for all guests anymore, as the default in the toolstack will effectively override this. - If you use the hypervisor command-line limit to *reduce* the limit, then the "default" values generated by the toolstack are too high, and all guest creations will fail. In other words, the toolstack defaults require any change to be effected by having the admin explicitly specify a new value in every guest. In order to address this, have grant_table_init treat '0' values for max_grant_frames and max_maptrack_frames as instructions to use the system-wide default. Have all the above toolstacks default to passing 0 unless a different value is explicitly given. This restores the old behavior, that changing the hypervisor command-line option can change the behavior for all guests, while retaining the ability to set per-guest values. It also removes the bug that *reducing* the system-wide max will cause all domains without explicit limits to fail. (The ocaml bindings require the caller to always specify a value, and the code to start a xenstored stubdomain hard-codes these to 4 and 128 respectively; these will not be addressed here.) Signed-off-by: George Dunlap --- Release justification: This is an observed regression (albeit one that has spanned several releases now). Compile-tested only. NB this patch could be applied without the whitespace fixes (perhaps with some fix-ups); it's just easier since my editor strips trailing whitespace out automatically. CC: Ian Jackson CC: Wei Liu CC: Andrew Cooper CC: Jan Beulich CC: Paul Durrant CC: Julien Grall CC: Konrad Rzeszutek Wilk CC: Stefano Stabellini CC: Juergen Gross CC: Marek Marczykowski-Górecki --- tools/libxl/libxl.h | 4 ++-- tools/python/xen/lowlevel/xc/xc.c | 2 -- tools/xl/xl.c | 12 ++-- xen/common/grant_table.c | 7 +++ xen/include/public/domctl.h | 6 -- 5 files changed, 15 insertions(+), 16 deletions(-) diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 49b56fa1a3..1648d337e7 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -364,8 +364,8 @@ */ #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1 -#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32 -#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024 +#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 0 +#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 0 I'd rather use -1 for the "not specified" value. This allows to set e.g. the maptrack frames to 0 for non-driver domains. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v2] AMD/IOMMU: honour IR setting while pre-filling DTEs
On 26.11.19 18:08, Igor Druzhinin wrote: IV bit shouldn't be set in DTE if interrupt remapping is not enabled. It's a regression in behavior of "iommu=no-intremap" option which otherwise would keep interrupt requests untranslated for all of the devices in the system regardless of wether it's described as valid in IVRS or not. Signed-off-by: Igor Druzhinin Release-acked-by: Juergen Gross Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v3 1/2] x86/vmx: add ASSERT to prevent syncing PIR to IRR...
> From: Jan Beulich > Sent: Wednesday, November 27, 2019 12:59 AM > > On 26.11.2019 17:47, Roger Pau Monné wrote: > > On Tue, Nov 26, 2019 at 05:32:04PM +0100, Jan Beulich wrote: > >> On 26.11.2019 14:26, Roger Pau Monne wrote: > >>> --- a/xen/arch/x86/hvm/vmx/vmx.c > >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c > >>> @@ -2054,6 +2054,19 @@ static void vmx_sync_pir_to_irr(struct vcpu > *v) > >>> unsigned int group, i; > >>> DECLARE_BITMAP(pending_intr, NR_VECTORS); > >>> > >>> +if ( v != current && !atomic_read(>pause_count) ) > >>> +{ > >>> +/* > >>> + * Syncing PIR to IRR must not be done behind the back of the > >>> CPU, > >>> + * since the IRR is controlled by the hardware when the vCPU is > >>> + * executing. Only allow Xen to do such sync if the vCPU is the > current > >>> + * one or if it's paused: that's required in order to sync the > >>> lapic > >>> + * state before saving it. > >>> + */ > >> > >> Is this stated this way by the SDM anywhere? > > > > No, I think the SDM is not very clear on this, there's a paragraph > > about PIR: > > > > "The logical processor performs a logical-OR of PIR into VIRR and > > clears PIR. No other agent can read or write a PIR bit (or group of > > bits) between the time it is read (to determine what to OR into VIRR) > > and when it is cleared." > > Well, this is about PIR, but my question was rather towards the > effects on vIRR. > > >> I ask because the > >> comment then really doesn't apply to just this function, but to > >> vlapic_{,test_and_}{set,clear}_vector() more generally. It's > >> not clear to me at all whether the CPU caches (in an incoherent > >> fashion) IRR (and maybe other APIC page elements), rather than > >> honoring the atomic updates these macros do. > > > > IMO syncing PIR to IRR when the vCPU is running on a different pCPU is > > likely to at least defeat the purpose of posted interrupts: > > I agree here. > > > when the > > CPU receives the posted interrupt vector it won't see the > > outstanding-notification bit in the posted-interrupt descriptor > > because the sync done from a different pCPU would have cleared it, at > > which point it's not clear to me that the processor will check vIRR > > for pending interrupts. The description in section 29.6 > > POSTED-INTERRUPT PROCESSING doesn't explicitly mention whether the > > value of the outstanding-notification bit affects the logic of posted > > interrupt processing. I think the outstanding-notification is one-off checked for triggering interrupt posting process. Once the process starts, there is no need to look at it again. The step 3 of posting process in 29.6 clearly says: "The processor clears the outstanding-notification bit in the posted- interrupt descriptor. This is done atomically so as to leave the remainder of the descriptor unmodified (e.g., with a locked AND operation)." But regardless of the hardware behavior, I think it's safe to restrict sync_pir_to_irr as this patch does. > > But overall this again is all posted interrupt centric when my > question was about vIRR, in particular whether the asserting you > add may need to be even more rigid. > > Anyway, let's see what the VMX maintainers have to say. > There is one paragraph in 29.6: "Use of the posted-interrupt descriptor differs from that of other data structures that are referenced by pointers in a VMCS. There is a general requirement that software ensure that each such data structure is modified only when no logical processor with a current VMCS that references it is in VMX non-root operation. That requirement does not apply to the posted-interrupt descriptor. There is a requirement, however, that such modifications be done using locked read-modify-write instructions." virtual-APIC page is pointer-referenced by VMCS, thus it falls into above general requirement. But I suppose there should be some exception with this page too, otherwise the point of posted interrupt is killed (if we have to kick the dest vcpu into root to update the vIRR). Let me confirm internally. Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 for 4.13] x86/microcode: refuse to load the same revision ucode
On Tue, Nov 26, 2019 at 03:41:53PM +, Sergey Dyasli wrote: >Currently if a user tries to live-load the same or older ucode revision >than CPU already has, he will get a single message in Xen log like: > >(XEN) 128 cores are to update their microcode > >No actual ucode loading will happen and this situation can be quite >confusing. Fix this by starting ucode update only when the provided >ucode revision is higher than the currently cached one (if any). >This is based on the property that if microcode_cache exists, all CPUs >in the system should have at least that ucode revision. > >Additionally, print a user friendly message if no matching or newer >ucode can be found in the provided blob. This also requires ignoring >-ENODATA in AMD-side code, otherwise the message given to the user is: > >(XEN) Parsing microcode blob error -61 > >Which actually means that a ucode blob was parsed fine, but no matching >ucode was found. > >Signed-off-by: Sergey Dyasli >--- >v2 --> v3: >- move ucode comparison to generic code >- ignore -ENODATA in a different code section > >v1 --> v2: >- compare provided ucode with the currently cached one > >CC: Jan Beulich >CC: Andrew Cooper >CC: Roger Pau Monné >CC: Chao Gao >CC: Juergen Gross >--- > xen/arch/x86/microcode.c | 19 +++ > xen/arch/x86/microcode_amd.c | 7 +++ > 2 files changed, 26 insertions(+) > >diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c >index 65d1f41e7c..44efc2d9b3 100644 >--- a/xen/arch/x86/microcode.c >+++ b/xen/arch/x86/microcode.c >@@ -640,10 +640,29 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) >buf, unsigned long len) > > if ( !patch ) > { >+printk(XENLOG_WARNING "microcode: couldn't find any matching ucode in >" >+ "the provided blob!\n"); > ret = -ENOENT; > goto put; > } > >+/* >+ * If microcode_cache exists, all CPUs in the system should have at least >+ * that ucode revision. >+ */ >+spin_lock(_mutex); >+if ( microcode_cache && >+ microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE ) >+{ >+spin_unlock(_mutex); >+printk(XENLOG_WARNING "microcode: couldn't find any newer revision " >+ "in the provided blob!\n"); The patch needs to be freed. With it fixed, Reviewed-by: Chao Gao Thanks Chao ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v3 2/2] x86/vmx: always sync PIR to IRR before vmentry
> From: Roger Pau Monne > Sent: Tuesday, November 26, 2019 9:27 PM > > When using posted interrupts on Intel hardware it's possible that the > vCPU resumes execution with a stale local APIC IRR register because > depending on the interrupts to be injected vlapic_has_pending_irq > might not be called, and thus PIR won't be synced into IRR. > > Fix this by making sure PIR is always synced to IRR in > hvm_vcpu_has_pending_irq regardless of what interrupts are pending. > > While there also simplify the code in __vmx_deliver_posted_interrupt: > only raise a softirq if the vCPU is the one currently running and > __vmx_deliver_posted_interrupt is called from interrupt context. The as commented earlier, this is what exactly original code does. Then what is the simplification? > softirq is raised to make sure vmx_intr_assist is retried if the > interrupt happens to arrive after vmx_intr_assist but before > interrupts are disabled in vmx_do_vmentry. Also simplify the logic for > IPIing other pCPUs, there's no need to check v->processor since the > IPI should be sent as long as the vCPU is not the current one and it's > running. > > Reported-by: Joe Jin > Signed-off-by: Roger Pau Monné > --- > Cc: Juergen Gross > --- > Changes since v2: > - Raise a softirq if in interrupt context and the vCPU is the current >one. > - Use is_running instead of runnable. > - Remove the call to vmx_sync_pir_to_irr in vmx_intr_assist and >instead always call vlapic_has_pending_irq in >hvm_vcpu_has_pending_irq. > --- > xen/arch/x86/hvm/irq.c | 7 +++-- > xen/arch/x86/hvm/vmx/vmx.c | 64 +++--- > 2 files changed, 24 insertions(+), 47 deletions(-) > > diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c > index e03a87ad50..b50ac62a16 100644 > --- a/xen/arch/x86/hvm/irq.c > +++ b/xen/arch/x86/hvm/irq.c > @@ -515,7 +515,11 @@ void hvm_set_callback_via(struct domain *d, > uint64_t via) > struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v) > { > struct hvm_domain *plat = >domain->arch.hvm; > -int vector; > +/* > + * Always call vlapic_has_pending_irq so that PIR is synced into IRR when > + * using posted interrupts. > + */ > +int vector = vlapic_has_pending_irq(v); > > if ( unlikely(v->nmi_pending) ) > return hvm_intack_nmi; > @@ -530,7 +534,6 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct > vcpu *v) > if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output ) > return hvm_intack_pic(0); > > -vector = vlapic_has_pending_irq(v); > if ( vector != -1 ) > return hvm_intack_lapic(vector); > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > index c817aec75d..4dea868cda 100644 > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -1945,57 +1945,31 @@ static void vmx_process_isr(int isr, struct vcpu > *v) > > static void __vmx_deliver_posted_interrupt(struct vcpu *v) > { > -bool_t running = v->is_running; > - > vcpu_unblock(v); > -/* > - * Just like vcpu_kick(), nothing is needed for the following two cases: > - * 1. The target vCPU is not running, meaning it is blocked or runnable. > - * 2. The target vCPU is the current vCPU and we're in non-interrupt > - * context. > - */ > -if ( running && (in_irq() || (v != current)) ) > -{ > +if ( v->is_running && v != current ) > /* > - * Note: Only two cases will reach here: > - * 1. The target vCPU is running on other pCPU. > - * 2. The target vCPU is the current vCPU. > + * If the vCPU is running on another pCPU send an IPI to the pCPU. > When > + * the IPI arrives, the target vCPU may be running in non-root mode, > + * running in root mode, runnable or blocked. If the target vCPU is > + * running in non-root mode, the hardware will sync PIR to vIRR for > + * 'posted_intr_vector' is a special vector handled directly by the > + * hardware. > * > - * Note2: Don't worry the v->processor may change. The vCPU being > - * moved to another processor is guaranteed to sync PIR to vIRR, > - * due to the involved scheduling cycle. > + * If the target vCPU is running in root-mode, the interrupt handler > + * starts to run. Considering an IPI may arrive in the window between > + * the call to vmx_intr_assist() and interrupts getting disabled, the > + * interrupt handler should raise a softirq to ensure events will be > + * delivered in time. I prefer to original comment which covers all possible conditions that the target vcpu might be. You may help improve it if some words are not well-written, but removing useful information is not good there. > */ > -unsigned int cpu = v->processor; > - > -/* > - * For case 1, we send an IPI to the pCPU. When an IPI arrives, the > - *
Re: [Xen-devel] [PATCH for-4.13] x86/vmx: always sync PIR to IRR before vmentry
> From: Roger Pau Monné > Sent: Monday, November 18, 2019 10:56 PM > > On Mon, Nov 18, 2019 at 03:19:50PM +0100, Jan Beulich wrote: > > On 18.11.2019 15:03, Roger Pau Monné wrote: > > > On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote: > > >> On 18.11.2019 11:16, Roger Pau Monne wrote: > > >>> @@ -1954,48 +1952,28 @@ static void > __vmx_deliver_posted_interrupt(struct vcpu *v) > > >>> * 2. The target vCPU is the current vCPU and we're in > > >>> non-interrupt > > >>> * context. > > >>> */ > > >>> -if ( running && (in_irq() || (v != current)) ) > > >>> -{ > > >>> +if ( vcpu_runnable(v) && v != current ) > > >> > > >> I'm afraid you need to be more careful with the running vs runnable > > >> distinction here. The comment above here becomes stale with the > > >> change (also wrt the removal of in_irq(), which I'm at least uneasy > > >> about), and the new commentary below also largely says/assumes > > >> "running", not "runnable". > > > > > > I've missed to fix that comment, will take care in the next version. > > > Note also that the comment is quite pointless, it only states what the > > > code below is supposed to do, but doesn't give any reasoning as to why > > > in_irq is relevant here. > > > > It's main "value" is to refer to vcpu_kick(), which has ... > > > > > TBH I'm not sure of the point of the in_irq check, I don't think it's > > > relevant for the code here. > > > > ... a similar in_irq() check. Sadly that one, while having a bigger > > comment, also doesn't explain what it's needed for. It looks like I > > should recall the reason, but I'm sorry - I don't right now. > > By reading the message of the commit that introduced the in_irq check > in vcpu_kick: > > "The drawback is that {vmx,svm}_intr_assist() now races new event > notifications delivered by IRQ or IPI. We close down this race by > having vcpu_kick() send a dummy softirq -- this gets picked up in > IRQ-sage context and will cause retry of *_intr_assist(). We avoid > delivering the softirq where possible by avoiding it when we are > running in the non-IRQ context of the VCPU to be kicked." > > AFAICT in the vcpu_kick case this is done because the softirq should > only be raised when in IRQ context in order to trigger the code in > vmx_do_vmentry to retry the call to vmx_intr_assist (this is relevant > if vcpu_kick is issued from an irq handler executed after > vmx_intr_assist and before the disabling interrupts in > vmx_do_vmentry. > > I think we need something along the lines of: > > if ( v->is_running && v != current ) > send_IPI_mask(cpumask_of(v->processor), posted_intr_vector); > else if ( v == current && in_irq() && !softirq_pending(smp_processor_id()) ) > raise_softirq(VCPU_KICK_SOFTIRQ); Then what's the difference from original logic? > > So that vmx_intr_assist is also retried if a vector is signaled in PIR > on the vCPU currently running between the call to vmx_intr_assist and > the disabling of interrupts in vmx_do_vmentry. > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] x86/vmx: always sync PIR to IRR before vmentry
> From: Roger Pau Monné > Sent: Thursday, November 21, 2019 5:26 PM > > On Mon, Nov 18, 2019 at 05:00:29PM +0100, Jan Beulich wrote: > > On 18.11.2019 15:20, Roger Pau Monné wrote: > > > On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote: > > >> On 18.11.2019 14:46, Roger Pau Monné wrote: > > >>> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote: > > On 18.11.2019 11:16, Roger Pau Monne wrote: > > > When using posted interrupts on Intel hardware it's possible that the > > > vCPU resumes execution with a stale local APIC IRR register because > > > depending on the interrupts to be injected vlapic_has_pending_irq > > > might not be called, and thus PIR won't be synced into IRR. > > > > > > Fix this by making sure PIR is always synced to IRR in vmx_intr_assist > > > regardless of what interrupts are pending. > > > > For this part, did you consider pulling ahead to the beginning > > of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()? > > >>> > > >>> I assumed the order in hvm_vcpu_has_pending_irq is there for a > reason. > > >>> I could indeed move vlapic_has_pending_irq to the top, but then either > > >>> the result is discarded if for example a NMI is pending injection > > >>> (in which case there's no need to go through all the logic in > > >>> vlapic_has_pending_irq), or we invert the priority of event > > >>> injection. > > >> > > >> Changing the order of events injected is not an option afaict. The > > >> pointless processing done is a valid concern, yet the suggestion > > >> was specifically to have (part of) this processing to occur early. > > >> The discarding of the result, in turn, is not a problem afaict, as > > >> a subsequent call will return the same result (unless a higher > > >> priority interrupt has surfaced in the meantime). > > > > > > Yes, that's fine. So you would prefer to move the call to > > > vlapic_has_pending_irq before any exit path in > > > hvm_vcpu_has_pending_irq? > > > > "Prefer" isn't really the way I would put it. I'd like this to be > > considered as an alternative because, as said, I think the current > > placement look more like a plaster than a cure. I'm also open for > > other suggestions. But first of all I'd like to see what the VMX > > maintainers think. > > Kevin/Jun, can we please get your opinion on the above item? > putting the sync within hvm_vcpu_has_pending_irq sounds better, implying that all intermediate states must be synced back to architectural states anytime when software wants to check virtual interrupt. Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests
On Tue, 26 Nov 2019, 23:18 Stefano Stabellini, wrote: > On Fri, 15 Nov 2019, Stewart Hildebrand wrote: > > Allow vgic_get_hw_irq_desc to be called with a vcpu argument. > > > > Use vcpu argument in vgic_connect_hw_irq. > > > > vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with > > ASSERTs. > > > > Signed-off-by: Stewart Hildebrand > > > > --- > > v3: new patch > > > > --- > > Note: I have only modified the old vgic to allow delivery of PPIs. > > --- > > xen/arch/arm/gic-vgic.c | 24 > > xen/arch/arm/vgic.c | 6 +++--- > > 2 files changed, 19 insertions(+), 11 deletions(-) > > > > diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c > > index 98c021f1a8..2c66a8fa92 100644 > > --- a/xen/arch/arm/gic-vgic.c > > +++ b/xen/arch/arm/gic-vgic.c > > @@ -418,7 +418,7 @@ struct irq_desc *vgic_get_hw_irq_desc(struct domain > *d, struct vcpu *v, > > { > > struct pending_irq *p; > > > > -ASSERT(!v && virq >= 32); > > +ASSERT((!v && (virq >= 32)) || (!d && v && (virq >= 16) && (virq < > 32))); > > I don't think !d is necessary for this to work as intended so I would > limit the ASSERT to > > ASSERT((!v && (virq >= 32)) || (v && (virq >= 16) && (virq < 32))); > > the caller can always pass v->domain > But then you have the risk to run into d != v->domain. So at least with the ASSERT you document the expectation. Cheers, > > > if ( !v ) > > v = d->vcpu[0]; > > ___ > Xen-devel mailing list > Xen-devel@lists.xenproject.org > https://lists.xenproject.org/mailman/listinfo/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range
Hi, On 26/11/2019 23:17, Stefano Stabellini wrote: On Tue, 26 Nov 2019, Julien Grall wrote: Hi, On 26/11/2019 20:43, Stefano Stabellini wrote: + Juergen I missed that you weren't in CC to the original patch, sorry. I think this patch should go in, as otherwise Linux 5.4 could run into problems. It is also a pretty straightforward 4 lines patch. 5.5 (or 5.6) is not going to run on Xen for other reasons (still in the vGIC)... So I would not view this as critical. 5.5 is not out yet, in fact, the dev window has just opened. Isn't your statement a bit premature? The GICv4.1 work [1] is going to prevent Linux booting on all current versions of Xen. While I can't confirm this is going to be merged in 5.5, I can tell you this will break. In any case, even if potential future Linux releases could have other additional issues, I don't think it should change our current view on this specific issue which affects 5.4, just released. The patch is definitely not as straightforward as you may think. Please refer to the discussion we had on the first version. I voiced concern about this approach and gave point what could go wrong with happen. This patch may be better than the current state (i.e crashing), but this wasn't tested enough to confirm this is the correct things to do and no other bug will appear (I don't believe reading I*ACTIVER was ever tested before). It is an annoying bug, but this is only affecting 5.4 which has just been released. It feels to me this is a fairly risky choice to merge it qutie late in the release without a good graps of the problem (see above). So I would definitly, prefer if this patch is getting through backport once we get more testing. We can still document the bug in the release note and point people to the patch. Anyway, this is Juergen choice here. But at least now he has the full picture... Cheers, [1] https://lwn.net/Articles/800494/ -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [qemu-mainline test] 144305: tolerable FAIL - PUSHED
flight 144305 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/144305/ Failures :-/ but no regressions. Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 144297 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10 fail blocked in 144297 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144297 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 144297 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144297 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 144297 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144297 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass version targeted for testing: qemuu65e05c82bdc6d348155e301c9d87dba7a08a5701 baseline version: qemuu122e6d2a9c1bf8aa1d51409c15809a82621515b1 Last test of basis 144297 2019-11-25 15:06:18 Z1 days Testing same since 144305 2019-11-26 05:17:32 Z0 days1 attempts People who touched revisions under test: Fangrui Song Marc-André Lureau Markus Armbruster Michael S. Tsirkin Peter Maydell Qi, Yadong Zhang, Qi jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64
Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range
On Tue, 26 Nov 2019, Julien Grall wrote: > Hi, > > On 26/11/2019 20:43, Stefano Stabellini wrote: > > + Juergen > > > > I missed that you weren't in CC to the original patch, sorry. > > I think this patch should go in, as otherwise Linux 5.4 could run into > > problems. It is also a pretty straightforward 4 lines patch. > > 5.5 (or 5.6) is not going to run on Xen for other reasons (still in the > vGIC)... So I would not view this as critical. 5.5 is not out yet, in fact, the dev window has just opened. Isn't your statement a bit premature? In any case, even if potential future Linux releases could have other additional issues, I don't think it should change our current view on this specific issue which affects 5.4, just released. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC XEN PATCH v3 10/11] xen: arm: context switch vtimer PPI state.
On Mon, 25 Nov 2019, Julien Grall wrote: > On 15/11/2019 20:14, Stewart Hildebrand wrote: > > From: Ian Campbell > > > > ... instead of artificially masking the timer interrupt in the timer > > state and relying on the guest to unmask (which it isn't required to > > do per the h/w spec, although Linux does). > > > > By using the newly added hwppi save/restore functionality we make use > > of the GICD I[SC]ACTIVER registers to save and restore the active > > state of the interrupt, which prevents the nested invocations which > > the current masking works around. > > > > Signed-off-by: Ian Campbell > > Signed-off-by: Stewart Hildebrand > > --- > > v2: Rebased, in particular over Julien's passthrough stuff which > > reworked a bunch of IRQ related stuff. > > Also largely rewritten since precursor patches now lay very > > different groundwork. > > > > v3: Address feedback from v2 [1]: > >* Remove virt_timer_irqs performance counter since it is now unused. > >* Add caveat to comment about not using I*ACTIVER register. > >* HACK: don't initialize pending_irq->irq in vtimer for new vGIC to > > allows us to build with CONFIG_NEW_VGIC=y > > > > [1] > > https://lists.xenproject.org/archives/html/xen-devel/2015-11/msg01065.html > > --- > > > > Note: Regarding Stefano's comment in [2], I did test it with the call > > to gic_hwppi_set_pending removed, and I was able to boot dom0. > > When dealing with the vGIC, testing is not enough to justify the removal of > some code. We need a worded justification of why it is (or not) necessary. > > In this case the timer is level (despite some broken HW misconfiguring it), so > by removing set_pending() you don't affect anything as restoring the timer > registers will automatically mark the interrupt pending. > > > > > [2] > > https://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02683.html > > --- > > xen/arch/arm/time.c | 26 ++ > > xen/arch/arm/vtimer.c| 45 +--- > > xen/include/asm-arm/domain.h | 1 + > > xen/include/asm-arm/perfc_defn.h | 1 - > > 4 files changed, 44 insertions(+), 29 deletions(-) > > > > diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c > > index 739bcf186c..e3a23b8e16 100644 > > --- a/xen/arch/arm/time.c > > +++ b/xen/arch/arm/time.c > > @@ -243,28 +243,6 @@ static void timer_interrupt(int irq, void *dev_id, > > struct cpu_user_regs *regs) > > } > > } > > -static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs > > *regs) > > -{ > > -/* > > - * Edge-triggered interrupts can be used for the virtual timer. Even > > - * if the timer output signal is masked in the context switch, the > > - * GIC will keep track that of any interrupts raised while IRQS are > > - * disabled. As soon as IRQs are re-enabled, the virtual interrupt > > - * will be injected to Xen. > > - * > > - * If an IDLE vCPU was scheduled next then we should ignore the > > - * interrupt. > > - */ > > -if ( unlikely(is_idle_vcpu(current)) ) > > -return; > > - > > -perfc_incr(virt_timer_irqs); > > - > > -current->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0); > > -WRITE_SYSREG32(current->arch.virt_timer.ctl | CNTx_CTL_MASK, > > CNTV_CTL_EL0); > > -vgic_inject_irq(current->domain, current, current->arch.virt_timer.irq, > > true); > > -} > > - > > /* > >* Arch timer interrupt really ought to be level triggered, since the > >* design of the timer/comparator mechanism is based around that > > @@ -304,8 +282,8 @@ void init_timer_interrupt(void) > > request_irq(timer_irq[TIMER_HYP_PPI], 0, timer_interrupt, > > "hyptimer", NULL); > > -request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt, > > - "virtimer", NULL); > > +route_hwppi_to_current_vcpu(timer_irq[TIMER_VIRT_PPI], "virtimer"); > > + > > request_irq(timer_irq[TIMER_PHYS_NONSECURE_PPI], 0, timer_interrupt, > > "phytimer", NULL); > > diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c > > index e6aebdac9e..6e3498952d 100644 > > --- a/xen/arch/arm/vtimer.c > > +++ b/xen/arch/arm/vtimer.c > > @@ -55,9 +55,19 @@ static void phys_timer_expired(void *data) > > static void virt_timer_expired(void *data) > > { > > struct vtimer *t = data; > > -t->ctl |= CNTx_CTL_MASK; > > -vgic_inject_irq(t->v->domain, t->v, t->irq, true); > > -perfc_incr(vtimer_virt_inject); > > +t->ctl |= CNTx_CTL_PENDING; > > I don't think this is necessary. If the software timer fire, then the virtual > timer (in HW) would have fired too. So by restoring the timer, then the HW > should by itself set the pending bit and trigger the interrupt. > > > +if ( !(t->ctl & CNTx_CTL_MASK) ) > > The timer is only set if the virtual timer is enabled and not masked. So I > think this check is unnecessary as we could never reached
Re: [Xen-devel] [RFC XEN PATCH v3 08/11] xen: arm: vgic: don't fail if IRQ is already connected
On Fri, 15 Nov 2019, Stewart Hildebrand wrote: > There are some IRQs that happen to have multiple "interrupts = < ... >;" > properties with the same IRQ in the device tree. For example: > > interrupts = <0 123 4>, > <0 123 4>, > <0 123 4>, > <0 123 4>, > <0 123 4>; > > In this case it seems that we are invoking vgic_connect_hw_irq multiple > times for the same IRQ. > > Rework the checks to allow booting in this scenario. > > I have not seen any cases where the pre-existing p->desc is any different from > the new desc, so BUG() out if they're different for now. > > Signed-off-by: Stewart Hildebrand > > --- > v3: new patch > > I tested on Xilinx Zynq UltraScale+ with the old vGIC. I have not fully > tested with CONFIG_NEW_VGIC. This hack only became necessary after > introducing the PPI series, and I'm not entirely sure what the reason > is for that. I think the reason is actually very simple: with the previous code if the irq was already setup and the details matched it would "goto out" all the way out of route_irq_to_guest. Now with the new code, it would "goto out" of setup_guest_irq returning zero, which means that gic_route_irq_to_guest is actually going to be called anyway, which is a mistake. I think we want to avoid that by returning an appropriate error condition from setup_guest_irq so that we also return early from route_irq_to_guest. > I'm also unsure if BUG()ing out is the right thing to do in case of > desc != p->desc, or what conditions would even trigger this? Is this > function exposed to guests? I think the original code printed a warning and returned an error. That's probably still what we want. > --- > xen/arch/arm/gic-vgic.c | 9 +++-- > xen/arch/arm/vgic/vgic.c | 4 > 2 files changed, 11 insertions(+), 2 deletions(-) > > diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c > index 2c66a8fa92..5c16e66b32 100644 > --- a/xen/arch/arm/gic-vgic.c > +++ b/xen/arch/arm/gic-vgic.c > @@ -460,9 +460,14 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu > *v, unsigned int virq, > if ( connect ) > { > /* The VIRQ should not be already enabled by the guest */ > -if ( !p->desc && > - !test_bit(GIC_IRQ_GUEST_ENABLED, >status) ) > +if ( !test_bit(GIC_IRQ_GUEST_ENABLED, >status) ) > +{ > +if (p->desc && p->desc != desc) Code style > +{ > +BUG(); > +} > p->desc = desc; > +} > else > ret = -EBUSY; > } > diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c > index f0f2ea5021..aa775f7668 100644 > --- a/xen/arch/arm/vgic/vgic.c > +++ b/xen/arch/arm/vgic/vgic.c > @@ -882,6 +882,10 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu > *vcpu, > irq->hw = true; > irq->hwintid = desc->irq; > } > +else if ( irq->hw && !irq->enabled && irq->hwintid == desc->irq ) > +{ > +/* The IRQ was already connected. No action is necessary. */ > +} > else > ret = -EBUSY; > } ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests
On Fri, 15 Nov 2019, Stewart Hildebrand wrote: > Allow vgic_get_hw_irq_desc to be called with a vcpu argument. > > Use vcpu argument in vgic_connect_hw_irq. > > vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with > ASSERTs. > > Signed-off-by: Stewart Hildebrand > > --- > v3: new patch > > --- > Note: I have only modified the old vgic to allow delivery of PPIs. > --- > xen/arch/arm/gic-vgic.c | 24 > xen/arch/arm/vgic.c | 6 +++--- > 2 files changed, 19 insertions(+), 11 deletions(-) > > diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c > index 98c021f1a8..2c66a8fa92 100644 > --- a/xen/arch/arm/gic-vgic.c > +++ b/xen/arch/arm/gic-vgic.c > @@ -418,7 +418,7 @@ struct irq_desc *vgic_get_hw_irq_desc(struct domain *d, > struct vcpu *v, > { > struct pending_irq *p; > > -ASSERT(!v && virq >= 32); > +ASSERT((!v && (virq >= 32)) || (!d && v && (virq >= 16) && (virq < 32))); I don't think !d is necessary for this to work as intended so I would limit the ASSERT to ASSERT((!v && (virq >= 32)) || (v && (virq >= 16) && (virq < 32))); the caller can always pass v->domain > if ( !v ) > v = d->vcpu[0]; ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [XEN PATCH v3 05/11] xen: arm: add interfaces to save/restore the state of a PPI.
On Fri, 15 Nov 2019, Stewart Hildebrand wrote: > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index f3f3fb7d7f..c3f4cd5069 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -34,6 +34,17 @@ enum domain_type { > /* The hardware domain has always its memory direct mapped. */ > #define is_domain_direct_mapped(d) ((d) == hardware_domain) > > +struct hwppi_state { > +/* h/w state */ > +unsigned irq; It doesn't look like we need to save the irq number again here. > +unsigned long enabled:1; > +unsigned long pending:1; > +unsigned long active:1; > + > +/* Xen s/w state */ > +unsigned long inprogress:1; > +}; > + > struct vtimer { > struct vcpu *v; > int irq; > diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h > index 793d324b33..1164e0c7a6 100644 > --- a/xen/include/asm-arm/gic.h > +++ b/xen/include/asm-arm/gic.h > @@ -275,6 +275,26 @@ extern int gicv_setup(struct domain *d); > extern void gic_save_state(struct vcpu *v); > extern void gic_restore_state(struct vcpu *v); > > +/* > + * Save/restore the state of a single PPI which must be routed to > + * (that is, is defined to be injected to the current > + * vcpu). > + * > + * We expect the device which use this PPI to be quiet while we > + * save/restore. > + * > + * For instance we want to disable the timer before saving the state. > + * Otherwise we will mess up the state. > + */ > +struct hwppi_state; It is a bit awkward to have to do this "redefine" struct hwppi_state here. I know that the Xen headers file don't always include correctly, but could we move the full definition of struct hwppi_state here? domain.h is already including gic.h, so it should work? > +extern void gic_hwppi_state_init(struct hwppi_state *s, unsigned irq); > +extern void gic_hwppi_set_pending(struct hwppi_state *s); > +extern void gic_save_and_mask_hwppi(struct vcpu *v, unsigned irq, > +struct hwppi_state *s); > +extern void gic_restore_hwppi(struct vcpu *v, > + const unsigned virq, > + const struct hwppi_state *s); > + > /* SGI (AKA IPIs) */ > enum gic_sgi { > GIC_SGI_EVENT_CHECK = 0, > @@ -325,8 +345,10 @@ struct gic_hw_operations { > int (*init)(void); > /* Save GIC registers */ > void (*save_state)(struct vcpu *); > +void (*save_and_mask_hwppi)(struct irq_desc *desc, struct hwppi_state > *s); > /* Restore GIC registers */ > void (*restore_state)(const struct vcpu *); > +void (*restore_hwppi)(struct irq_desc *desc, const struct hwppi_state > *s); > /* Dump GIC LR register information */ > void (*dump_state)(const struct vcpu *); > > diff --git a/xen/include/asm-arm/irq.h b/xen/include/asm-arm/irq.h > index e14001b5c6..3b37a21c06 100644 > --- a/xen/include/asm-arm/irq.h > +++ b/xen/include/asm-arm/irq.h > @@ -96,6 +96,8 @@ void irq_set_affinity(struct irq_desc *desc, const > cpumask_t *cpu_mask); > */ > bool irq_type_set_by_domain(const struct domain *d); > > +void irq_set_virq(struct irq_desc *desc, unsigned virq); > + > #endif /* _ASM_HW_IRQ_H */ > /* > * Local variables: ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests
On 26/11/2019 22:36, Stefano Stabellini wrote: On Mon, 25 Nov 2019, Julien Grall wrote: On 23/11/2019 20:35, Julien Grall wrote: Hi, On 15/11/2019 20:10, Stewart Hildebrand wrote: Allow vgic_get_hw_irq_desc to be called with a vcpu argument. Use vcpu argument in vgic_connect_hw_irq. vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with ASSERTs. Signed-off-by: Stewart Hildebrand --- v3: new patch --- Note: I have only modified the old vgic to allow delivery of PPIs. The new vGIC should also be modified to support delivery of PPIs. diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index 82f524a35c..c3933c2687 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -410,10 +410,10 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n) irq_set_affinity(p->desc, cpumask_of(v_target->processor)); spin_lock_irqsave(>desc->lock, flags); /* - * The irq cannot be a PPI, we only support delivery of SPIs - * to guests. + * The irq cannot be a SGI, we only support delivery of SPIs + * and PPIs to guests. */ - ASSERT(irq >= 32); + ASSERT(irq >= NR_SGIS); We usually put ASSERT() in place we know that code wouldn't be able to work correctly if there ASSERT were hit. In this particular case: if ( irq_type_set_by_domain(d) ) gic_set_irq_type(p->desc, vgic_get_virq_type(v, n, i)); 1) We don't want to allow any domain (including Dom0) to modify the interrupt type (i.e. level/edge) for PPIs as this is shared. You will also most likely need to modify the counterpart in setup_guest_irq(). p->desc->handler->enable(p->desc); 2) On GICv3, the re-distributor of vCPU A is accessible by vCPU B. So vCPU B could enable the SGI for vCPU A. But this would be called on the wrong pCPU leading to inconsistency between the hardware state of the internal vGIC state. I thought a bit more of the issue over the week-end. The current vGIC is fairly messy. I can see two solutions on how to solve this: 1) Send an IPI to the pCPU where the vCPU A is running and disable/enable the interrupt. The other side would need to the vCPU was actually running to avoid disabling the PPI for the wrong pCPU 2) Keep the HW interrupt always enabled We propagated the enable/disable because of some messy part in the vGIC: - vgic_inject_irq() will not queue any pending interrupt if the vCPU is offline. While interrupt cannot be delivered, we still need to keep them pending as they will never occur again otherwise. This is because they are active on the host side and the guest has no way to deactivate them. - Our implementation of PSCI CPU will remove all pending interrupts (see vgic_clear_pending_irqs()). I am not entirely sure the implication here because of the previous. There are a probably more. Aside the issues with it, I don't really see good advantage to propagate the interrupt state as the interrupts (PPIs, SPIs) have active state. So they can only be received once until the guest actually handles it. So my preference would still be 2) because this makes the code simpler, avoid IPI and other potential locking trouble. Yes, I think that is a good suggestion. I take that you mean that in vgic_disable_irqs for PPIs we would only clear GIC_IRQ_GUEST_ENABLED then return basically, right? Not really, I am only suggesting to remove the part if ( desc != NULL ) ... But this change alone is not enough. It would require some modification in the rest of the vGIC (see my previous e-mail) and likely some investigation to understand the implication of keeping the interrupt enabled from the HW (I am a bit worry we may have backed this assumption into other part of the vGIC :(). Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests
On Mon, 25 Nov 2019, Julien Grall wrote: > On 23/11/2019 20:35, Julien Grall wrote: > > Hi, > > > > On 15/11/2019 20:10, Stewart Hildebrand wrote: > > > Allow vgic_get_hw_irq_desc to be called with a vcpu argument. > > > > > > Use vcpu argument in vgic_connect_hw_irq. > > > > > > vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with > > > ASSERTs. > > > > > > Signed-off-by: Stewart Hildebrand > > > > > > --- > > > v3: new patch > > > > > > --- > > > Note: I have only modified the old vgic to allow delivery of PPIs. > > > > The new vGIC should also be modified to support delivery of PPIs. > > > > > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c > > > index 82f524a35c..c3933c2687 100644 > > > --- a/xen/arch/arm/vgic.c > > > +++ b/xen/arch/arm/vgic.c > > > @@ -410,10 +410,10 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, > > > int n) > > > irq_set_affinity(p->desc, cpumask_of(v_target->processor)); > > > spin_lock_irqsave(>desc->lock, flags); > > > /* > > > - * The irq cannot be a PPI, we only support delivery of SPIs > > > - * to guests. > > > + * The irq cannot be a SGI, we only support delivery of SPIs > > > + * and PPIs to guests. > > > */ > > > - ASSERT(irq >= 32); > > > + ASSERT(irq >= NR_SGIS); > > > > We usually put ASSERT() in place we know that code wouldn't be able to work > > correctly if there ASSERT were hit. In this particular case: > > > > > if ( irq_type_set_by_domain(d) ) > > > gic_set_irq_type(p->desc, vgic_get_virq_type(v, n, i)); > > > > 1) We don't want to allow any domain (including Dom0) to modify the > > interrupt type (i.e. level/edge) for PPIs as this is shared. You will also > > most likely need to modify the counterpart in setup_guest_irq(). > > > > > p->desc->handler->enable(p->desc); > > > > 2) On GICv3, the re-distributor of vCPU A is accessible by vCPU B. So vCPU B > > could enable the SGI for vCPU A. But this would be called on the wrong pCPU > > leading to inconsistency between the hardware state of the internal vGIC > > state. > > I thought a bit more of the issue over the week-end. The current vGIC is > fairly messy. I can see two solutions on how to solve this: > 1) Send an IPI to the pCPU where the vCPU A is running and disable/enable > the interrupt. The other side would need to the vCPU was actually running to > avoid disabling the PPI for the wrong pCPU > 2) Keep the HW interrupt always enabled > > We propagated the enable/disable because of some messy part in the vGIC: > - vgic_inject_irq() will not queue any pending interrupt if the vCPU is > offline. While interrupt cannot be delivered, we still need to keep them > pending as they will never occur again otherwise. This is because they are > active on the host side and the guest has no way to deactivate them. > - Our implementation of PSCI CPU will remove all pending interrupts (see > vgic_clear_pending_irqs()). I am not entirely sure the implication here > because of the previous. > > There are a probably more. Aside the issues with it, I don't really see good > advantage to propagate the interrupt state as the interrupts (PPIs, SPIs) have > active state. So they can only be received once until the guest actually > handles it. > > So my preference would still be 2) because this makes the code simpler, avoid > IPI and other potential locking trouble. Yes, I think that is a good suggestion. I take that you mean that in vgic_disable_irqs for PPIs we would only clear GIC_IRQ_GUEST_ENABLED then return basically, right?___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)
On Tue, Nov 26, 2019 at 1:20 PM Rich Persaud wrote: > > On Nov 26, 2019, at 15:23, Andrew Cooper wrote: > > > On 26/11/2019 20:12, Roman Shaposhnik wrote: > > On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki > > wrote: > > On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote: > > Hi Marek, after applying Jan's patch I'm making much further progress. > > Xen boots fine and Dom0 seems to be OK (more tests are needed tho on > > my end). > > > I'm attaching the logs from Xen and Dom0. > > > At this point it seems that adding efi=attr=uc is a better option for > > these boxes than a wholesale efi=no-rs > > > Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was > > supposed to cover by default (so I don't have to add efi=attr=uc)? > > No, this looks like some different firmware (?) issue. > > > Question #2: is there any downside to *always* specifying efi=attr=uc? > > Even for servers that, strictly speaking, don't need it? > > TL;DR: It should be fine. It is what Linux does too. > > > Details: > > > Lets take a look why 'efi=attr=uc' helps, and how can we make it work > > out of the box: > > > The issue is about memory marked as type=11 (EfiMemoryMappedIO) with > > attr=8000 (EFI_MEMORY_RUNTIME). Indeed none of cachability > > attribute is defined. For the record, defined attributes are (UEFI spec > > .6): > > >EFI_MEMORY_UC Memory cacheability attribute: The memory region supports > >being configured as not cacheable. > > >EFI_MEMORY_WC Memory cacheability attribute: The memory region supports > >being configured as write combining. > > >EFI_MEMORY_WT Memory cacheability attribute: The memory region supports > >being configured as cacheable with a “write through” policy. > >Writes that hit in the cache will also be written to main memory. > > >EFI_MEMORY_WB Memory cacheability attribute: The memory region supports > >being configured as cacheable with a “write back” policy. Reads > >and writes that hit in the cache do not propagate to main memory. > >Dirty data is written back to main memory when a new cache line > >is allocated. > > >EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports > >being configured as not cacheable, exported, and supports the > >“fetch and add” semaphore mechanism. > > > My reading of UEFI spec doesn't give much hints what to do with memory > > mappings without any cachability attribute. The only related info I've > > found is about EfiMemoryMappedIO: > > >This memory is not used by the OS. All system memory-mapped IO > >information should come from ACPI tables. > > > So, maybe there is some more info? > > > Anyway, if I understand correctly, MMIO region should be mapped as UC, > > right? > > > I've also taken look at what Linux does. And basically, the only bit > > Linux care about is EFI_MEMORY_WB - if it's absent, then set the region > > as uncachable (page cache disabled bit in page table entry). So, > > basically Linux by default does what Xen's efi=attr=uc does. > > Very interesting! Thanks for doing the research. > > > So, to improve Xen's hardware/firmware compatibility, I have two ideas: > > > 1. Make efi=attr=uc the default (it's still possible to disable it with > > efi=attr=no). > > I'd be very much in favor of that too (especially since it seems to match > > Linux behaviour) What do others think? > > > Its more than just this. Linux also doesn't use EFI reboot because it > is broken almost everywhere (because Windows doesn't use it because its > broken almost everywhere, so it never gets fixed). > > Xen should be following Linux, but I'm exhausted arguing this point. > > A consequence is that downstream tend to share a pile of "unbreak Xen on > UEFI" patches which have been rejected upstream on philosophical rather > than technical grounds, despite this being a toxic environment to work in. > > > As an intermediate step, could we have an umbrella opt-in Kconfig option > (CONFIG_EFI_NONSPEC_COMPATIBILITY?) that enables multiple EFI options for > maximum hardware compatibility? For this thread and Xen 4.13, that would be > EFI_SET_VIRTUAL_ADDRESS_MAP and efi=attr=uc. If more options/quirks are > added in the future, downstreams using EFI_NONSPEC_COMPATIBILITY would get > them by default. As one of those downstream users I have to say I like this A LOT! > The long-term solution is an OSS virtualization-security test tool (e.g. with > Xen and QEMU KVM) that can be run by OEM/ODM QA factory teams on > pre-production firmware and hardware. That is the most OEM-actionable > development window where firmware quality issues can be detected and fixed. > Microsoft's hardware logo/certification work with Windows 10 OEMs on "secured > core" features is also tackling firmware improvements for > virtualization-based security. That's a good proposal, but the question, as always becomes who moves the needle on this one so we avoid
Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)
On Nov 26, 2019, at 15:23, Andrew Cooper wrote: > On 26/11/2019 20:12, Roman Shaposhnik wrote: >>> On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki >>> wrote: >>> On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote: Hi Marek, after applying Jan's patch I'm making much further progress. Xen boots fine and Dom0 seems to be OK (more tests are needed tho on my end). I'm attaching the logs from Xen and Dom0. At this point it seems that adding efi=attr=uc is a better option for these boxes than a wholesale efi=no-rs Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was supposed to cover by default (so I don't have to add efi=attr=uc)? >>> No, this looks like some different firmware (?) issue. Question #2: is there any downside to *always* specifying efi=attr=uc? Even for servers that, strictly speaking, don't need it? >>> TL;DR: It should be fine. It is what Linux does too. >>> Details: >>> Lets take a look why 'efi=attr=uc' helps, and how can we make it work >>> out of the box: >>> The issue is about memory marked as type=11 (EfiMemoryMappedIO) with >>> attr=8000 (EFI_MEMORY_RUNTIME). Indeed none of cachability >>> attribute is defined. For the record, defined attributes are (UEFI spec >>> .6): >>>EFI_MEMORY_UC Memory cacheability attribute: The memory region supports >>>being configured as not cacheable. >>>EFI_MEMORY_WC Memory cacheability attribute: The memory region supports >>>being configured as write combining. >>>EFI_MEMORY_WT Memory cacheability attribute: The memory region supports >>>being configured as cacheable with a “write through” policy. >>>Writes that hit in the cache will also be written to main memory. >>>EFI_MEMORY_WB Memory cacheability attribute: The memory region supports >>>being configured as cacheable with a “write back” policy. Reads >>>and writes that hit in the cache do not propagate to main memory. >>>Dirty data is written back to main memory when a new cache line >>>is allocated. >>>EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports >>>being configured as not cacheable, exported, and supports the >>>“fetch and add” semaphore mechanism. >>> My reading of UEFI spec doesn't give much hints what to do with memory >>> mappings without any cachability attribute. The only related info I've >>> found is about EfiMemoryMappedIO: >>>This memory is not used by the OS. All system memory-mapped IO >>>information should come from ACPI tables. >>> So, maybe there is some more info? >>> Anyway, if I understand correctly, MMIO region should be mapped as UC, >>> right? >>> I've also taken look at what Linux does. And basically, the only bit >>> Linux care about is EFI_MEMORY_WB - if it's absent, then set the region >>> as uncachable (page cache disabled bit in page table entry). So, >>> basically Linux by default does what Xen's efi=attr=uc does. >> Very interesting! Thanks for doing the research. >> >>> So, to improve Xen's hardware/firmware compatibility, I have two ideas: >>> 1. Make efi=attr=uc the default (it's still possible to disable it with >>> efi=attr=no). >> I'd be very much in favor of that too (especially since it seems to match >> Linux behaviour) What do others think? > > Its more than just this. Linux also doesn't use EFI reboot because it > is broken almost everywhere (because Windows doesn't use it because its > broken almost everywhere, so it never gets fixed). > > Xen should be following Linux, but I'm exhausted arguing this point. > > A consequence is that downstream tend to share a pile of "unbreak Xen on > UEFI" patches which have been rejected upstream on philosophical rather > than technical grounds, despite this being a toxic environment to work in. As an intermediate step, could we have an umbrella opt-in Kconfig option (CONFIG_EFI_NONSPEC_COMPATIBILITY?) that enables multiple EFI options for maximum hardware compatibility? For this thread and Xen 4.13, that would be EFI_SET_VIRTUAL_ADDRESS_MAP and efi=attr=uc. If more options/quirks are added in the future, downstreams using EFI_NONSPEC_COMPATIBILITY would get them by default. The long-term solution is an OSS virtualization-security test tool (e.g. with Xen and QEMU KVM) that can be run by OEM/ODM QA factory teams on pre-production firmware and hardware. That is the most OEM-actionable development window where firmware quality issues can be detected and fixed. Microsoft's hardware logo/certification work with Windows 10 OEMs on "secured core" features is also tackling firmware improvements for virtualization-based security. From the business side, Dell/HP/Lenovo + other OEMs and ODMs could add premium "FirmCare" SKUs into their custom build ordering systems, where customers could pay a small fee for additional firmware support, custom root-of-trust (e.g. BootGuard) key management, or even coreboot.
[Xen-devel] [PATCH v2] xen/arm: remove physical timer offset
The physical timer traps apply an offset so that time starts at 0 for the guest. However, this offset is not currently applied to the physical counter. Per the ARMv8 Reference Manual (ARM DDI 0487E.a), section D11.2.4 Timers, the "Offset" between the counter and timer should be zero for a physical timer. This removes the offset to make the timer and counter consistent. Furthermore, section D11.2.4 specifies that the values in the TimerValue view of the timers are signed in standard two's complement form. When writing to the TimerValue register, it should be signed extended as described by the equation CompareValue = (Counter[63:0] + SignExtend(TimerValue))[63:0] Signed-off-by: Jeff Kubascik --- Changes in v2: - Update commit message to specify reference manual version and section - Change physical timer cval to hold hardware value - Make sure to sign extend TimerValue on writes. This was done by first casting the r pointer to (int32_t *), dereferencing it, then casting to uint64_t. Please let me know if there is a more correct way to do this --- xen/arch/arm/vtimer.c| 21 + xen/include/asm-arm/domain.h | 3 --- 2 files changed, 9 insertions(+), 15 deletions(-) diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c index e6aebdac9e..eb12a08acf 100644 --- a/xen/arch/arm/vtimer.c +++ b/xen/arch/arm/vtimer.c @@ -62,7 +62,6 @@ static void virt_timer_expired(void *data) int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config) { -d->arch.phys_timer_base.offset = NOW(); d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0); d->time_offset_seconds = ticks_to_ns(d->arch.virt_timer_base.offset - boot_count); do_div(d->time_offset_seconds, 10); @@ -185,7 +184,7 @@ static bool vtimer_cntp_ctl(struct cpu_user_regs *regs, uint32_t *r, bool read) if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) { set_timer(>arch.phys_timer.timer, - v->arch.phys_timer.cval + v->domain->arch.phys_timer_base.offset); + ticks_to_ns(v->arch.phys_timer.cval - boot_count)); } else stop_timer(>arch.phys_timer.timer); @@ -197,26 +196,25 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, uint32_t *r, bool read) { struct vcpu *v = current; -s_time_t now; +uint64_t cntpct; if ( !ACCESS_ALLOWED(regs, EL0PTEN) ) return false; -now = NOW() - v->domain->arch.phys_timer_base.offset; +cntpct = get_cycles(); if ( read ) { -*r = (uint32_t)(ns_to_ticks(v->arch.phys_timer.cval - now) & 0xull); +*r = (uint32_t)((v->arch.phys_timer.cval - cntpct) & 0xull); } else { -v->arch.phys_timer.cval = now + ticks_to_ns(*r); +v->arch.phys_timer.cval = cntpct + (uint64_t)(*((int32_t *)r)); if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) { v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING; set_timer(>arch.phys_timer.timer, - v->arch.phys_timer.cval + - v->domain->arch.phys_timer_base.offset); + ticks_to_ns(v->arch.phys_timer.cval - boot_count)); } } return true; @@ -232,17 +230,16 @@ static bool vtimer_cntp_cval(struct cpu_user_regs *regs, uint64_t *r, if ( read ) { -*r = ns_to_ticks(v->arch.phys_timer.cval); +*r = v->arch.phys_timer.cval; } else { -v->arch.phys_timer.cval = ticks_to_ns(*r); +v->arch.phys_timer.cval = *r; if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) { v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING; set_timer(>arch.phys_timer.timer, - v->arch.phys_timer.cval + - v->domain->arch.phys_timer_base.offset); + ticks_to_ns(v->arch.phys_timer.cval - boot_count)); } } return true; diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 86ebdd2bcf..16a7150a95 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -65,9 +65,6 @@ struct arch_domain RELMEM_done, } relmem; -struct { -uint64_t offset; -} phys_timer_base; struct { uint64_t offset; } virt_timer_base; -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range
Hi, On 26/11/2019 20:43, Stefano Stabellini wrote: + Juergen I missed that you weren't in CC to the original patch, sorry. I think this patch should go in, as otherwise Linux 5.4 could run into problems. It is also a pretty straightforward 4 lines patch. 5.5 (or 5.6) is not going to run on Xen for other reasons (still in the vGIC)... So I would not view this as critical. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] bsp/xen: Update README
On 11/26/2019 3:34 PM, Julien Grall wrote: > Hi, > > On 26/11/2019 20:27, Jeff Kubascik wrote: >> Add some background information on the BSP and instructions on how to >> run the ticker application. >> >> Change-Id: I05050a335a938f00cc59bae69a014c5f04e05d23 >> --- >> bsps/arm/xen/README | 130 ++-- > > Hmmm what repo is it? Whoops. This is the RTEMS port that I am working on. I must of did a git send-email from the wrong directory. This will be making its way to rtems-devel soon. Apologies! > Cheers, > >> 1 file changed, 64 insertions(+), 66 deletions(-) >> >> diff --git a/bsps/arm/xen/README b/bsps/arm/xen/README >> index 1b24d84c9a..2ae2f2170d 100644 >> --- a/bsps/arm/xen/README >> +++ b/bsps/arm/xen/README >> @@ -1,66 +1,64 @@ >> -# This is a sample hardware description file for a BSP. This comment >> -# block does not have to appear in a real one. The intention of this >> -# file is to provide a central place to look when searching for >> -# information about a board when starting a new BSP. For example, >> -# you may want to find an existing timer driver for the chip you are >> -# using on your board. It is easier to grep for the chip name in >> -# all of the HARDWARE files than to peruse the source tree. Hopefully, >> -# making the HARDDWARE files accurate will also alleviate the common >> -# problem of not knowing anything about a board based on its BSP >> -# name. >> -# >> -# NOTE: If you have a class of peripheral chip on board which >> -# is not in this list please add it to this file so >> -# others will also use the same name. >> -# >> -# Timer resolution is the way it is configured in this BSP. >> -# On a counting timer, this is the length of time which >> -# corresponds to 1 count. >> -# >> - >> -BSP NAME: fastsbc1 >> -BOARD: Fat Computers, Fast SBC-1 >> -BUS:SchoolBus >> -CPU FAMILY: i386 >> -CPU:Intel Hexium >> -COPROCESSORS: Witch Hex87 >> -MODE: 32 bit mode >> - >> -DEBUG MONITOR: HexBug >> - >> -PERIPHERALS >> -=== >> -TIMERS: Intel i8254 >> - RESOLUTION: .0001 microseconds >> -SERIAL PORTS: Zilog Z8530 (with 2 ports) >> -REAL-TIME CLOCK:RTC-4 >> -DMA:Intel i8259 >> -VIDEO: none >> -SCSI: none >> -NETWORKING: none >> - >> -DRIVER INFORMATION >> -== >> -CLOCK DRIVER: RTC-4 >> -IOSUPP DRIVER: Zilog Z8530 port A >> -SHMSUPP:polled and interrupts >> -TIMER DRIVER: Intel i8254 >> -TTY DRIVER: stub only >> - >> -STDIO >> -= >> -PORT: Console port 0 >> -ELECTRICAL: RS-232 >> -BAUD: 9600 >> -BITS PER CHARACTER: 8 >> -PARITY: None >> -STOP BITS: 1 >> - >> -NOTES >> -= >> - >> -(1) 900 Mhz and 950 Mhz versions. >> - >> -(2) 1 Gb or 2 Gb RAM. >> - >> -(3) PC compatible if HexBug not enabled. >> +BSP for Xen on ARM >> + >> +Overview >> + >> + >> +This BSP enables RTEMS to run as a guest virtual machine in AArch32 mode on >> the >> +Xen hypervisor for ARMv8 platforms. >> + >> +Drivers: >> +- Clock: ARMv7-AR Generic Timer >> +- Console: Virtual PL011 device >> +- Interrupt: GICv2 >> + >> +BSP variants: >> +- xen_virtual: completely virtualized guest with no dependence on underlying >> + hardware >> + >> +The xen_virtual BSP variant relies on standard Xen features, so it should be >> +able to run on any ARMv8 platform. >> + >> +Xen allows for the passthrough of hardware peripherals to guest virtual >> +machines. BSPs could be added in the future targeting specific hardware >> +platforms and include the appropriate drivers. >> + >> +This BSP was tested with Xen running on the Xilinx Zynq UltraScale+ MPSoC >> using >> +the Virtuosity distribution maintained by DornerWorks. >> + >> +Execution >> +- >> + >> +This procedure describes how to run the ticker sample application that >> should >> +already be built with the BSP. >> + >> +The `ticker.exe` file can be found in the BSP build tree at: >> + >> + arm-rtems5/c/xen_virtual/testsuites/samples/ticker.exe >> + >> +The `ticker.exe` elf file must be translated to a binary format. >> + >> + arm-rtems5-objcopy -O binary ticker.exe ticker.bin >> + >> +Then place the `ticker.bin` file on the dom0 filesystem. >> + >> +From the dom0 console, create a configuration file `ticker.cfg` with the >> +following contents. >> + >> + name = "ticker" >> + kernel = "ticker.bin" >> + memory = 8 >> + vcpus = 1 >> + gic_version = "v2" >> + vuart = "sbsa_uart" >> + >> +Create the virtual machine and attach to the virtual vpl011 console. >> + >> + xl create ticker.cfg && xl console -t vuart ticker >> + >> +To return back to the dom0 console, press both `Ctrl` and `]` on your >> keyboard. >> + >> +Additional information >> +-- >> + >>
Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range
+ Juergen I missed that you weren't in CC to the original patch, sorry. I think this patch should go in, as otherwise Linux 5.4 could run into problems. It is also a pretty straightforward 4 lines patch. On Fri, 22 Nov 2019, Stefano Stabellini wrote: > On Fri, 22 Nov 2019, Peng Fan wrote: > > The end should be GICD_ISACTIVERN not GICD_ISACTIVER, > > and also print a warning for the unhandled read. > > > > Signed-off-by: Peng Fan > > --- > > > > V2: > > Add a warning message > > > > xen/arch/arm/vgic-v3.c | 5 - > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c > > index 422b94f902..a15b9f6441 100644 > > --- a/xen/arch/arm/vgic-v3.c > > +++ b/xen/arch/arm/vgic-v3.c > > @@ -706,7 +706,10 @@ static int __vgic_v3_distr_common_mmio_read(const char > > *name, struct vcpu *v, > > goto read_as_zero; > > > > /* Read the active status of an IRQ via GICD/GICR is not supported */ > > -case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVER): > > +case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN): > > +printk(XENLOG_G_ERR "%pv: vGICD: unhandled read from > > ISACTIVER%d\n", > > + v, (reg - GICD_ISACTIVER) / 4); > > All the other similar printks that we have in vgic-v3.c don't have the > "/ 4", for instance: > > case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN): > if ( dabt.size != DABT_WORD ) goto bad_width; > printk(XENLOG_G_ERR >"%pv: %s: unhandled word write %#"PRIregister" to > ISACTIVER%d\n", >v, name, r, reg - GICD_ISACTIVER); > > However, reg reflects the address of the register, so actually, the > division by 4 looks correct if we want to get the index of the specific > register. Thanks for spotting this. We'll need to do a clean-up in the > file later. > > Reviewed-by: Stefano Stabellini > > > > > +goto read_as_zero; > > case VRANGE32(GICD_ICACTIVER, GICD_ICACTIVERN): > > goto read_as_zero; ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [libvirt test] 144304: tolerable all pass - PUSHED
flight 144304 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/144304/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 144233 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 144233 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-arm64-arm64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-qcow2 12 migrate-support-checkfail never pass test-arm64-arm64-libvirt-qcow2 13 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass version targeted for testing: libvirt 9d6920bd7de3f92be1894790adeb689060ab25eb baseline version: libvirt 5e939cea896fb3373a6f68f86e325c657429ed3d Last test of basis 144233 2019-11-21 04:18:53 Z5 days Failing since144244 2019-11-22 04:18:48 Z4 days5 attempts Testing same since 144304 2019-11-26 04:19:14 Z0 days1 attempts People who touched revisions under test: Christian Ehrhardt Daniel P. Berrangé Erik Skultety Jamie Strandboge Jiri Denemark Ján Tomko Laine Stump Michal Privoznik Pavel Mores Peter Krempa Pino Toscano jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-arm64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-arm64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass test-amd64-amd64-libvirt-xsm pass test-arm64-arm64-libvirt-xsm pass test-amd64-i386-libvirt-xsm pass test-amd64-amd64-libvirt pass test-arm64-arm64-libvirt pass test-armhf-armhf-libvirt pass test-amd64-i386-libvirt pass test-amd64-amd64-libvirt-pairpass test-amd64-i386-libvirt-pair pass test-arm64-arm64-libvirt-qcow2 pass test-armhf-armhf-libvirt-raw pass test-amd64-amd64-libvirt-vhd pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/libvirt.git 5e939cea89..9d6920bd7d
Re: [Xen-devel] [PATCH v2] bsp/xen: Update README
Hi, On 26/11/2019 20:27, Jeff Kubascik wrote: Add some background information on the BSP and instructions on how to run the ticker application. Change-Id: I05050a335a938f00cc59bae69a014c5f04e05d23 --- bsps/arm/xen/README | 130 ++-- Hmmm what repo is it? Cheers, 1 file changed, 64 insertions(+), 66 deletions(-) diff --git a/bsps/arm/xen/README b/bsps/arm/xen/README index 1b24d84c9a..2ae2f2170d 100644 --- a/bsps/arm/xen/README +++ b/bsps/arm/xen/README @@ -1,66 +1,64 @@ -# This is a sample hardware description file for a BSP. This comment -# block does not have to appear in a real one. The intention of this -# file is to provide a central place to look when searching for -# information about a board when starting a new BSP. For example, -# you may want to find an existing timer driver for the chip you are -# using on your board. It is easier to grep for the chip name in -# all of the HARDWARE files than to peruse the source tree. Hopefully, -# making the HARDDWARE files accurate will also alleviate the common -# problem of not knowing anything about a board based on its BSP -# name. -# -# NOTE: If you have a class of peripheral chip on board which -# is not in this list please add it to this file so -# others will also use the same name. -# -# Timer resolution is the way it is configured in this BSP. -# On a counting timer, this is the length of time which -# corresponds to 1 count. -# - -BSP NAME: fastsbc1 -BOARD: Fat Computers, Fast SBC-1 -BUS:SchoolBus -CPU FAMILY: i386 -CPU:Intel Hexium -COPROCESSORS: Witch Hex87 -MODE: 32 bit mode - -DEBUG MONITOR: HexBug - -PERIPHERALS -=== -TIMERS: Intel i8254 - RESOLUTION: .0001 microseconds -SERIAL PORTS: Zilog Z8530 (with 2 ports) -REAL-TIME CLOCK:RTC-4 -DMA:Intel i8259 -VIDEO: none -SCSI: none -NETWORKING: none - -DRIVER INFORMATION -== -CLOCK DRIVER: RTC-4 -IOSUPP DRIVER: Zilog Z8530 port A -SHMSUPP:polled and interrupts -TIMER DRIVER: Intel i8254 -TTY DRIVER: stub only - -STDIO -= -PORT: Console port 0 -ELECTRICAL: RS-232 -BAUD: 9600 -BITS PER CHARACTER: 8 -PARITY: None -STOP BITS: 1 - -NOTES -= - -(1) 900 Mhz and 950 Mhz versions. - -(2) 1 Gb or 2 Gb RAM. - -(3) PC compatible if HexBug not enabled. +BSP for Xen on ARM + +Overview + + +This BSP enables RTEMS to run as a guest virtual machine in AArch32 mode on the +Xen hypervisor for ARMv8 platforms. + +Drivers: +- Clock: ARMv7-AR Generic Timer +- Console: Virtual PL011 device +- Interrupt: GICv2 + +BSP variants: +- xen_virtual: completely virtualized guest with no dependence on underlying + hardware + +The xen_virtual BSP variant relies on standard Xen features, so it should be +able to run on any ARMv8 platform. + +Xen allows for the passthrough of hardware peripherals to guest virtual +machines. BSPs could be added in the future targeting specific hardware +platforms and include the appropriate drivers. + +This BSP was tested with Xen running on the Xilinx Zynq UltraScale+ MPSoC using +the Virtuosity distribution maintained by DornerWorks. + +Execution +- + +This procedure describes how to run the ticker sample application that should +already be built with the BSP. + +The `ticker.exe` file can be found in the BSP build tree at: + + arm-rtems5/c/xen_virtual/testsuites/samples/ticker.exe + +The `ticker.exe` elf file must be translated to a binary format. + + arm-rtems5-objcopy -O binary ticker.exe ticker.bin + +Then place the `ticker.bin` file on the dom0 filesystem. + +From the dom0 console, create a configuration file `ticker.cfg` with the +following contents. + + name = "ticker" + kernel = "ticker.bin" + memory = 8 + vcpus = 1 + gic_version = "v2" + vuart = "sbsa_uart" + +Create the virtual machine and attach to the virtual vpl011 console. + + xl create ticker.cfg && xl console -t vuart ticker + +To return back to the dom0 console, press both `Ctrl` and `]` on your keyboard. + +Additional information +-- + +The Virtuosity distribution can be found at + https://dornerworks.com/xen/virtuosity -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable test] 144301: tolerable FAIL - PUSHED
flight 144301 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/144301/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qemut-ws16-amd64 16 guest-localmigrate/x10 fail in 144295 pass in 144301 test-amd64-amd64-libvirt-xsm 21 leak-check/check fail pass in 144295 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 12 guest-start/debianhvm.repeat fail pass in 144295 Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-rtds 17 guest-start.2 fail in 144295 blocked in 144289 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10 fail like 144289 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 144289 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144289 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144289 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 144289 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144289 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 144289 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144289 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 144289 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144289 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: xen 77beba7c921a286c31a2a76f26500047f353614a baseline version: xen 183f354e1430087879de071f0c7122e42703916e Last test of basis 144289
[Xen-devel] [PATCH v2] bsp/xen: Update README
Add some background information on the BSP and instructions on how to run the ticker application. Change-Id: I05050a335a938f00cc59bae69a014c5f04e05d23 --- bsps/arm/xen/README | 130 ++-- 1 file changed, 64 insertions(+), 66 deletions(-) diff --git a/bsps/arm/xen/README b/bsps/arm/xen/README index 1b24d84c9a..2ae2f2170d 100644 --- a/bsps/arm/xen/README +++ b/bsps/arm/xen/README @@ -1,66 +1,64 @@ -# This is a sample hardware description file for a BSP. This comment -# block does not have to appear in a real one. The intention of this -# file is to provide a central place to look when searching for -# information about a board when starting a new BSP. For example, -# you may want to find an existing timer driver for the chip you are -# using on your board. It is easier to grep for the chip name in -# all of the HARDWARE files than to peruse the source tree. Hopefully, -# making the HARDDWARE files accurate will also alleviate the common -# problem of not knowing anything about a board based on its BSP -# name. -# -# NOTE: If you have a class of peripheral chip on board which -# is not in this list please add it to this file so -# others will also use the same name. -# -# Timer resolution is the way it is configured in this BSP. -# On a counting timer, this is the length of time which -# corresponds to 1 count. -# - -BSP NAME: fastsbc1 -BOARD: Fat Computers, Fast SBC-1 -BUS:SchoolBus -CPU FAMILY: i386 -CPU:Intel Hexium -COPROCESSORS: Witch Hex87 -MODE: 32 bit mode - -DEBUG MONITOR: HexBug - -PERIPHERALS -=== -TIMERS: Intel i8254 - RESOLUTION: .0001 microseconds -SERIAL PORTS: Zilog Z8530 (with 2 ports) -REAL-TIME CLOCK:RTC-4 -DMA:Intel i8259 -VIDEO: none -SCSI: none -NETWORKING: none - -DRIVER INFORMATION -== -CLOCK DRIVER: RTC-4 -IOSUPP DRIVER: Zilog Z8530 port A -SHMSUPP:polled and interrupts -TIMER DRIVER: Intel i8254 -TTY DRIVER: stub only - -STDIO -= -PORT: Console port 0 -ELECTRICAL: RS-232 -BAUD: 9600 -BITS PER CHARACTER: 8 -PARITY: None -STOP BITS: 1 - -NOTES -= - -(1) 900 Mhz and 950 Mhz versions. - -(2) 1 Gb or 2 Gb RAM. - -(3) PC compatible if HexBug not enabled. +BSP for Xen on ARM + +Overview + + +This BSP enables RTEMS to run as a guest virtual machine in AArch32 mode on the +Xen hypervisor for ARMv8 platforms. + +Drivers: +- Clock: ARMv7-AR Generic Timer +- Console: Virtual PL011 device +- Interrupt: GICv2 + +BSP variants: +- xen_virtual: completely virtualized guest with no dependence on underlying + hardware + +The xen_virtual BSP variant relies on standard Xen features, so it should be +able to run on any ARMv8 platform. + +Xen allows for the passthrough of hardware peripherals to guest virtual +machines. BSPs could be added in the future targeting specific hardware +platforms and include the appropriate drivers. + +This BSP was tested with Xen running on the Xilinx Zynq UltraScale+ MPSoC using +the Virtuosity distribution maintained by DornerWorks. + +Execution +- + +This procedure describes how to run the ticker sample application that should +already be built with the BSP. + +The `ticker.exe` file can be found in the BSP build tree at: + + arm-rtems5/c/xen_virtual/testsuites/samples/ticker.exe + +The `ticker.exe` elf file must be translated to a binary format. + + arm-rtems5-objcopy -O binary ticker.exe ticker.bin + +Then place the `ticker.bin` file on the dom0 filesystem. + +From the dom0 console, create a configuration file `ticker.cfg` with the +following contents. + + name = "ticker" + kernel = "ticker.bin" + memory = 8 + vcpus = 1 + gic_version = "v2" + vuart = "sbsa_uart" + +Create the virtual machine and attach to the virtual vpl011 console. + + xl create ticker.cfg && xl console -t vuart ticker + +To return back to the dom0 console, press both `Ctrl` and `]` on your keyboard. + +Additional information +-- + +The Virtuosity distribution can be found at + https://dornerworks.com/xen/virtuosity -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)
On 26/11/2019 20:12, Roman Shaposhnik wrote: > On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki > wrote: >> On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote: >>> Hi Marek, after applying Jan's patch I'm making much further progress. >>> Xen boots fine and Dom0 seems to be OK (more tests are needed tho on >>> my end). >>> >>> I'm attaching the logs from Xen and Dom0. >>> >>> At this point it seems that adding efi=attr=uc is a better option for >>> these boxes than a wholesale efi=no-rs >>> >>> Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was >>> supposed to cover by default (so I don't have to add efi=attr=uc)? >> No, this looks like some different firmware (?) issue. >> >>> Question #2: is there any downside to *always* specifying efi=attr=uc? >>> Even for servers that, strictly speaking, don't need it? >> TL;DR: It should be fine. It is what Linux does too. >> >> Details: >> >> Lets take a look why 'efi=attr=uc' helps, and how can we make it work >> out of the box: >> >> The issue is about memory marked as type=11 (EfiMemoryMappedIO) with >> attr=8000 (EFI_MEMORY_RUNTIME). Indeed none of cachability >> attribute is defined. For the record, defined attributes are (UEFI spec >> .6): >> >> EFI_MEMORY_UC Memory cacheability attribute: The memory region supports >> being configured as not cacheable. >> >> EFI_MEMORY_WC Memory cacheability attribute: The memory region supports >> being configured as write combining. >> >> EFI_MEMORY_WT Memory cacheability attribute: The memory region supports >> being configured as cacheable with a “write through” policy. >> Writes that hit in the cache will also be written to main memory. >> >> EFI_MEMORY_WB Memory cacheability attribute: The memory region supports >> being configured as cacheable with a “write back” policy. Reads >> and writes that hit in the cache do not propagate to main memory. >> Dirty data is written back to main memory when a new cache line >> is allocated. >> >> EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports >> being configured as not cacheable, exported, and supports the >> “fetch and add” semaphore mechanism. >> >> My reading of UEFI spec doesn't give much hints what to do with memory >> mappings without any cachability attribute. The only related info I've >> found is about EfiMemoryMappedIO: >> >> This memory is not used by the OS. All system memory-mapped IO >> information should come from ACPI tables. >> >> So, maybe there is some more info? >> >> Anyway, if I understand correctly, MMIO region should be mapped as UC, >> right? >> >> I've also taken look at what Linux does. And basically, the only bit >> Linux care about is EFI_MEMORY_WB - if it's absent, then set the region >> as uncachable (page cache disabled bit in page table entry). So, >> basically Linux by default does what Xen's efi=attr=uc does. > Very interesting! Thanks for doing the research. > >> So, to improve Xen's hardware/firmware compatibility, I have two ideas: >> >> 1. Make efi=attr=uc the default (it's still possible to disable it with >> efi=attr=no). > I'd be very much in favor of that too (especially since it seems to match > Linux behaviour) What do others think? Its more than just this. Linux also doesn't use EFI reboot because it is broken almost everywhere (because Windows doesn't use it because its broken almost everywhere, so it never gets fixed). Xen should be following Linux, but I'm exhausted arguing this point. A consequence is that downstream tend to share a pile of "unbreak Xen on UEFI" patches which have been rejected upstream on philosophical rather than technical grounds, despite this being a toxic environment to work in. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)
On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki wrote: > > On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote: > > Hi Marek, after applying Jan's patch I'm making much further progress. > > Xen boots fine and Dom0 seems to be OK (more tests are needed tho on > > my end). > > > > I'm attaching the logs from Xen and Dom0. > > > > At this point it seems that adding efi=attr=uc is a better option for > > these boxes than a wholesale efi=no-rs > > > > Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was > > supposed to cover by default (so I don't have to add efi=attr=uc)? > > No, this looks like some different firmware (?) issue. > > > Question #2: is there any downside to *always* specifying efi=attr=uc? > > Even for servers that, strictly speaking, don't need it? > > TL;DR: It should be fine. It is what Linux does too. > > Details: > > Lets take a look why 'efi=attr=uc' helps, and how can we make it work > out of the box: > > The issue is about memory marked as type=11 (EfiMemoryMappedIO) with > attr=8000 (EFI_MEMORY_RUNTIME). Indeed none of cachability > attribute is defined. For the record, defined attributes are (UEFI spec > .6): > > EFI_MEMORY_UC Memory cacheability attribute: The memory region supports > being configured as not cacheable. > > EFI_MEMORY_WC Memory cacheability attribute: The memory region supports > being configured as write combining. > > EFI_MEMORY_WT Memory cacheability attribute: The memory region supports > being configured as cacheable with a “write through” policy. > Writes that hit in the cache will also be written to main memory. > > EFI_MEMORY_WB Memory cacheability attribute: The memory region supports > being configured as cacheable with a “write back” policy. Reads > and writes that hit in the cache do not propagate to main memory. > Dirty data is written back to main memory when a new cache line > is allocated. > > EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports > being configured as not cacheable, exported, and supports the > “fetch and add” semaphore mechanism. > > My reading of UEFI spec doesn't give much hints what to do with memory > mappings without any cachability attribute. The only related info I've > found is about EfiMemoryMappedIO: > > This memory is not used by the OS. All system memory-mapped IO > information should come from ACPI tables. > > So, maybe there is some more info? > > Anyway, if I understand correctly, MMIO region should be mapped as UC, > right? > > I've also taken look at what Linux does. And basically, the only bit > Linux care about is EFI_MEMORY_WB - if it's absent, then set the region > as uncachable (page cache disabled bit in page table entry). So, > basically Linux by default does what Xen's efi=attr=uc does. Very interesting! Thanks for doing the research. > So, to improve Xen's hardware/firmware compatibility, I have two ideas: > > 1. Make efi=attr=uc the default (it's still possible to disable it with > efi=attr=no). I'd be very much in favor of that too (especially since it seems to match Linux behaviour) What do others think? > 2. Map type=11 (MMIO) as UC, unless attributes specify otherwise. This seems to be the subset of the #1 option. As such -- perhaps it is "safer" than a wholesale efi=attr=uc but at the same time Linux behaviour gives me pretty good confidence that we should probably be safe, no? > Any preference? I can prepare a patch for either version. But I guess > it's too late for getting it into 4.13. Good question as well. Thanks, Roman. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] livepatch-build-tools regression
It looks like gcc plays the usual dirty tricks with local variables renaming: - xen-syms 7529: 82d0805fed50 8 OBJECT LOCAL DEFAULT 4230 lastpage.22857 - livepatch 289: 8 OBJECT GLOBAL DEFAULT UND hvm.c#lastpage.22856 Then, symbols resolution by name fails.. Can you please try to build the livepatch module with additional option '—prelink' and give it a try ? > On 26. Nov 2019, at 18:51, Wieczorkiewicz, Pawel wrote: > > > >> On 20. Nov 2019, at 12:42, Sergey Dyasli wrote: >> >> On 19/11/2019 17:21, Wieczorkiewicz, Pawel wrote: >>> >>> On 18. Nov 2019, at 18:41, Sergey Dyasli wrote: On 18/11/2019 17:28, Wieczorkiewicz, Pawel wrote: > > Could you build the lp with debug (-d) and provide me with the > create-diff-object.log file? > I've attached the log. Btw, I think I provided all the necessary information for others to repeat my experiment. >>> >>> Sorry for another request, but I do not seem to be able to reproduce this >>> locally. >>> Could you send me the livepatch module binary that fails to upload? >> >> That's interesting. I've attached the binary that my system produces. >> What version of gcc do you use? > > The version used was: gcc (GCC) 7.2.1 20170915 > > But I have finally managed to reproduce the issue with: > 1. gcc (Ubuntu 6.5.0-2ubuntu1~18.04) 6.5.0 20181026 > 2. gcc-7 (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 > > I think it is not related to the commit: > commit 854a7ca60e35 "create-diff-object: Do not include all .rodata sections" > > I managed to reproduce it also with earlier version commit: > "0c10457 Remove section alignment requirement" > > But this time a different symbol causes the failure: > > (XEN) livepatch: 0001-live-patch: Unknown symbol: hvm.c#lastpage.22856 > >> >> -- >> Thanks, >> Sergey >> <0001-live-patch-stripped.livepatch> > > Best Regards, > Pawel Wieczorkiewicz Best Regards, Pawel Wieczorkiewicz Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)
On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote: > Hi Marek, after applying Jan's patch I'm making much further progress. > Xen boots fine and Dom0 seems to be OK (more tests are needed tho on > my end). > > I'm attaching the logs from Xen and Dom0. > > At this point it seems that adding efi=attr=uc is a better option for > these boxes than a wholesale efi=no-rs > > Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was > supposed to cover by default (so I don't have to add efi=attr=uc)? No, this looks like some different firmware (?) issue. > Question #2: is there any downside to *always* specifying efi=attr=uc? > Even for servers that, strictly speaking, don't need it? TL;DR: It should be fine. It is what Linux does too. Details: Lets take a look why 'efi=attr=uc' helps, and how can we make it work out of the box: The issue is about memory marked as type=11 (EfiMemoryMappedIO) with attr=8000 (EFI_MEMORY_RUNTIME). Indeed none of cachability attribute is defined. For the record, defined attributes are (UEFI spec .6): EFI_MEMORY_UC Memory cacheability attribute: The memory region supports being configured as not cacheable. EFI_MEMORY_WC Memory cacheability attribute: The memory region supports being configured as write combining. EFI_MEMORY_WT Memory cacheability attribute: The memory region supports being configured as cacheable with a “write through” policy. Writes that hit in the cache will also be written to main memory. EFI_MEMORY_WB Memory cacheability attribute: The memory region supports being configured as cacheable with a “write back” policy. Reads and writes that hit in the cache do not propagate to main memory. Dirty data is written back to main memory when a new cache line is allocated. EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports being configured as not cacheable, exported, and supports the “fetch and add” semaphore mechanism. My reading of UEFI spec doesn't give much hints what to do with memory mappings without any cachability attribute. The only related info I've found is about EfiMemoryMappedIO: This memory is not used by the OS. All system memory-mapped IO information should come from ACPI tables. So, maybe there is some more info? Anyway, if I understand correctly, MMIO region should be mapped as UC, right? I've also taken look at what Linux does. And basically, the only bit Linux care about is EFI_MEMORY_WB - if it's absent, then set the region as uncachable (page cache disabled bit in page table entry). So, basically Linux by default does what Xen's efi=attr=uc does. So, to improve Xen's hardware/firmware compatibility, I have two ideas: 1. Make efi=attr=uc the default (it's still possible to disable it with efi=attr=no). 2. Map type=11 (MMIO) as UC, unless attributes specify otherwise. Any preference? I can prepare a patch for either version. But I guess it's too late for getting it into 4.13. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v3 2/2] x86/vmx: always sync PIR to IRR before vmentry
On Tue, Nov 26, 2019 at 05:50:32PM +0100, Jan Beulich wrote: > On 26.11.2019 14:26, Roger Pau Monne wrote: > > --- a/xen/arch/x86/hvm/irq.c > > +++ b/xen/arch/x86/hvm/irq.c > > @@ -515,7 +515,11 @@ void hvm_set_callback_via(struct domain *d, uint64_t > > via) > > struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v) > > { > > struct hvm_domain *plat = >domain->arch.hvm; > > -int vector; > > +/* > > + * Always call vlapic_has_pending_irq so that PIR is synced into IRR > > when > > + * using posted interrupts. > > + */ > > +int vector = vlapic_has_pending_irq(v); > > Did you consider doing this conditionally either here ... > > > @@ -530,7 +534,6 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu > > *v) > > if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output ) > > return hvm_intack_pic(0); > > > > -vector = vlapic_has_pending_irq(v); > > if ( vector != -1 ) > > return hvm_intack_lapic(vector); > > ... or here? I ask not only because the function isn't exactly > cheap to call (as iirc you did also mention during the v2 > discussion), but also because of its interaction with Viridian > and nested mode. In case of problems there, avoiding the use > of interrupt posting would be a workaround in such cases then. TBH my preference was to do the PIR to IRR sync in vmx_intr_assist by directly calling vmx_sync_pir_to_irr because it was IMO less intrusive and confined to VMX code. I think this approach is more risky as vlapic_has_pending_irq does way more than simply syncing PIR to IRR. > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > @@ -1945,57 +1945,31 @@ static void vmx_process_isr(int isr, struct vcpu *v) > > > > static void __vmx_deliver_posted_interrupt(struct vcpu *v) > > { > > -bool_t running = v->is_running; > > - > > vcpu_unblock(v); > > -/* > > - * Just like vcpu_kick(), nothing is needed for the following two > > cases: > > - * 1. The target vCPU is not running, meaning it is blocked or > > runnable. > > - * 2. The target vCPU is the current vCPU and we're in non-interrupt > > - * context. > > - */ > > -if ( running && (in_irq() || (v != current)) ) > > -{ > > +if ( v->is_running && v != current ) > > I continue to be concerned by this evaluation of ->is_running > _after_ vcpu_unblock(), when previously (just like vcpu_kick() > does) it was intentionally done before. If the unblock sets v->is_running to true that's fine, Xen will send a posted interrupt IPI and the destination pCPU will either be in root mode and thus raise a softirq or in non-root mode and perform the PIR to IRR sync and possible interrupt injection. I see that caching the value of is_running might be helpful in order to prevent pointless IPI'ing. If the vCPU wasn't running before the unblock there's no reason to send an IPI to it, because the sync of PIR to IRR will happen at vmentry anyway. > I wonder anyway > whether this and the change to irq.c should really be in a > single patch, the more that you start the respective part of > the description with "While there also simplify ...". But in > the end it is up to Kevin's or Jun's judgement anyway. Yes, that makes sense. Will wait for feedback from Kevin or Jun before sending a new version anyway. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)
Hi Marek, after applying Jan's patch I'm making much further progress. Xen boots fine and Dom0 seems to be OK (more tests are needed tho on my end). I'm attaching the logs from Xen and Dom0. At this point it seems that adding efi=attr=uc is a better option for these boxes than a wholesale efi=no-rs Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was supposed to cover by default (so I don't have to add efi=attr=uc)? Question #2: is there any downside to *always* specifying efi=attr=uc? Even for servers that, strictly speaking, don't need it? Thanks, Roman. On Mon, Nov 25, 2019 at 11:02 PM Roman Shaposhnik wrote: > > On Mon, Nov 25, 2019 at 7:55 PM Marek Marczykowski-Górecki > wrote: > > > > On Mon, Nov 25, 2019 at 07:44:03PM -0800, Roman Shaposhnik wrote: > > > On Sun, Nov 24, 2019 at 4:48 PM Marek Marczykowski-Górecki > > > wrote: > > > > Do you have by > > > > a chance messages of that crash (without efi=no-rs, but with > > > > EFI_SET_VIRTUAL_ADDRESS_MAP enabled)? Or even a photo if no serial > > > > output is > > > > available? > > > > > > With my awesome soldering skills ;-) I managed to rig a serial console. > > > > > > Output is attached. Please let me know if you'd like me to run any > > > other experiments. > > > > Looks helpful, lets try to do something: > > > > > Xen 4.13.0-rc > > > (XEN) Xen version 4.13.0-rc (@) (gcc (Alpine 6.4.0) 6.4.0) debug=y Tue > > > Nov 26 03:19:38 UTC 2019 > > > (XEN) Latest ChangeSet: > > > (XEN) build-id: 07aa9f711fe09a91be2588ee7df10d93ebe34c80 > > > (XEN) Bootloader: GRUB 2.03 > > > (XEN) Command line: com1=115200,8n1 console=com1 loglvl=all noreboot > > > dom0_mem=640M,max:640M dom0_max_vcpus=1 dom0_vcpus_pin smt=false > > (...) > > > (XEN) EFI memory map: > > (...) > > > (XEN) 077587000-0775f4fff type=5 attr=800f > > > > This is code that crashes - runtime services code, so somewhere with > > actual UEFI code. > > Yup -- that was my hunch with adding efi=no-rs option. > > > (...) > > > (XEN) 0ff90-0 type=11 attr=8000 > > > (XEN) Unknown cachability for MFNs 0xff900-0xf > > > > The faulting address is in this range. And because of unknown > > cachability, it isn't mapped. Try adding 'efi=attr=uc' to the Xen > > cmdline. > > Feels like we're getting exactly the same failure. Log attached. > > Thanks, > Roman. (XEN) Xen version 4.13.0-rc (@) (gcc (Alpine 6.4.0) 6.4.0) debug=y Tue Nov 26 16:59:33 UTC 2019 (XEN) Latest ChangeSet: (XEN) build-id: 65fa4bd58d340488ec6963bd8ca5418747541fe5 (XEN) Bootloader: GRUB 2.03 (XEN) Command line: com1=115200,8n1 console=com1 efi=attr=uc loglvl=all noreboot dom0_mem=640M,max:640M dom0_max_vcpus=1 dom0_vcpus_pin smt=false (XEN) Xen image load base address: 0x70e0 (XEN) Video information: (XEN) VGA is text mode 80x25, font 8x16 (XEN) Disc information: (XEN) Found 0 MBR signatures (XEN) Found 1 EDD information structures (XEN) EFI RAM map: (XEN) - 0003f000 (usable) (XEN) 0003f000 - 0004 (ACPI NVS) (XEN) 0004 - 000a (usable) (XEN) 0010 - 2000 (usable) (XEN) 2000 - 2010 (reserved) (XEN) 2010 - 76ccb000 (usable) (XEN) 76ccb000 - 76d43000 (reserved) (XEN) 76d43000 - 76d54000 (ACPI data) (XEN) 76d54000 - 772de000 (ACPI NVS) (XEN) 772de000 - 775f5000 (reserved) (XEN) 775f5000 - 775f6000 (usable) (XEN) 775f6000 - 77638000 (reserved) (XEN) 77638000 - 789e5000 (usable) (XEN) 789e5000 - 78ffa000 (reserved) (XEN) 78ffa000 - 7900 (usable) (XEN) e000 - f000 (reserved) (XEN) fec0 - fec01000 (reserved) (XEN) fed01000 - fed02000 (reserved) (XEN) fed03000 - fed04000 (reserved) (XEN) fed08000 - fed09000 (reserved) (XEN) fed0c000 - fed1 (reserved) (XEN) fed1c000 - fed1d000 (reserved) (XEN) fee0 - fee01000 (reserved) (XEN) fef0 - ff00 (reserved) (XEN) ff90 - 0001 (reserved) (XEN) System RAM: 1919MB (1965176kB) (XEN) ACPI: RSDP 76D46000, 0024 (r2 DELL) (XEN) ACPI: XSDT 76D46088, 0094 (r1 DELL AS09 1072009 AMI 10013) (XEN) ACPI: FACP 76D52560, 010C (r5 DELL AS09 1072009 AMI 10013) (XEN) ACPI: DSDT 76D461B0, C3AF (r2 DELL AS09 1072009 INTL 20120913) (XEN) ACPI: FACS 772DDE80, 0040 (XEN) ACPI: APIC 76D52670, 0068 (r3 DELL AS09 1072009 AMI 10013) (XEN) ACPI: FPDT 76D526D8, 0044 (r1 DELL AS09 1072009 AMI 10013) (XEN) ACPI: FIDT 76D52720, 009C (r1 DELL AS09 1072009 AMI 10013) (XEN) ACPI: MCFG 76D527C0, 003C (r1 DELL AS09 1072009 MSFT 97) (XEN) ACPI: LPIT 76D52800, 0104 (r1 DELL AS09
[Xen-devel] [xen-unstable-smoke test] 144310: tolerable all pass - PUSHED
flight 144310 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/144310/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 5530782cfe70ed22fe44358f6a10c38916443b42 baseline version: xen 8c79c129a6db2220c1089e0ce5fa49e7298b1d3e Last test of basis 144307 2019-11-26 11:03:51 Z0 days Testing same since 144310 2019-11-26 15:01:24 Z0 days1 attempts People who touched revisions under test: Anthony PERARD Jan Beulich jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 8c79c129a6..5530782cfe 5530782cfe70ed22fe44358f6a10c38916443b42 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] livepatch-build-tools regression
> On 20. Nov 2019, at 12:42, Sergey Dyasli wrote: > > On 19/11/2019 17:21, Wieczorkiewicz, Pawel wrote: >> >> >>> On 18. Nov 2019, at 18:41, Sergey Dyasli wrote: >>> >>> On 18/11/2019 17:28, Wieczorkiewicz, Pawel wrote: Could you build the lp with debug (-d) and provide me with the create-diff-object.log file? >>> >>> I've attached the log. Btw, I think I provided all the necessary information >>> for others to repeat my experiment. >>> >> >> Sorry for another request, but I do not seem to be able to reproduce this >> locally. >> Could you send me the livepatch module binary that fails to upload? > > That's interesting. I've attached the binary that my system produces. > What version of gcc do you use? The version used was: gcc (GCC) 7.2.1 20170915 But I have finally managed to reproduce the issue with: 1. gcc (Ubuntu 6.5.0-2ubuntu1~18.04) 6.5.0 20181026 2. gcc-7 (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 I think it is not related to the commit: commit 854a7ca60e35 "create-diff-object: Do not include all .rodata sections" I managed to reproduce it also with earlier version commit: "0c10457 Remove section alignment requirement" But this time a different symbol causes the failure: (XEN) livepatch: 0001-live-patch: Unknown symbol: hvm.c#lastpage.22856 > > -- > Thanks, > Sergey > <0001-live-patch-stripped.livepatch> Best Regards, Pawel Wieczorkiewicz Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling
> -Original Message- > From: Xen-devel On Behalf Of > George Dunlap > Sent: 26 November 2019 17:18 > To: xen-devel@lists.xenproject.org > Cc: Juergen Gross ; Stefano Stabellini > ; Julien Grall ; Wei Liu > ; Paul Durrant ; Andrew Cooper > ; Konrad Rzeszutek Wilk > ; George Dunlap ; Marek > Marczykowski-Górecki ; Jan Beulich > ; Ian Jackson > Subject: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and > max_maptrack_frames handling > > Xen used to have single, system-wide limits for the number of grant > frames and maptrack frames a guest was allowed to create. Increasing > or decreasing this single limit on the Xen command-line would change > the limit for all guests on the system. > > Later, per-domain limits for these values was created. The > system-wide limits became strict limits: domains could not be created > with higher limits, but could be created with lower limits. > > However, the change also introduced a range of different "default" > values into various places in the toolstack: > > - The python libxc bindings hard-coded these values to 32 and 1024, > respectively > > - The libxl default values are 32 and 1024 respectively. > > - xl will use the libxl default for maptrack, but does its own default > calculation for grant frames: either 32 or 64, based on the max > possible mfn. > > These defaults interact poorly with the hypervisor command-line limit: > > - The hypervisor command-line limit cannot be used to raise the limit > for all guests anymore, as the default in the toolstack will > effectively override this. > > - If you use the hypervisor command-line limit to *reduce* the limit, > then the "default" values generated by the toolstack are too high, > and all guest creations will fail. > > In other words, the toolstack defaults require any change to be > effected by having the admin explicitly specify a new value in every > guest. > > In order to address this, have grant_table_init treat '0' values for > max_grant_frames and max_maptrack_frames as instructions to use the > system-wide default. Have all the above toolstacks default to passing > 0 unless a different value is explicitly given. > > This restores the old behavior, that changing the hypervisor > command-line option can change the behavior for all guests, while > retaining the ability to set per-guest values. It also removes the > bug that *reducing* the system-wide max will cause all domains without > explicit limits to fail. > > (The ocaml bindings require the caller to always specify a value, and > the code to start a xenstored stubdomain hard-codes these to 4 and 128 > respectively; these will not be addressed here.) > > Signed-off-by: George Dunlap > --- > Release justification: This is an observed regression (albeit one that > has spanned several releases now). > > Compile-tested only. > > NB this patch could be applied without the whitespace fixes (perhaps > with some fix-ups); it's just easier since my editor strips trailing > whitespace out automatically. > > CC: Ian Jackson > CC: Wei Liu > CC: Andrew Cooper > CC: Jan Beulich > CC: Paul Durrant > CC: Julien Grall > CC: Konrad Rzeszutek Wilk > CC: Stefano Stabellini > CC: Juergen Gross > CC: Marek Marczykowski-Górecki > --- > tools/libxl/libxl.h | 4 ++-- > tools/python/xen/lowlevel/xc/xc.c | 2 -- > tools/xl/xl.c | 12 ++-- > xen/common/grant_table.c | 7 +++ > xen/include/public/domctl.h | 6 -- > 5 files changed, 15 insertions(+), 16 deletions(-) > > diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h > index 49b56fa1a3..1648d337e7 100644 > --- a/tools/libxl/libxl.h > +++ b/tools/libxl/libxl.h > @@ -364,8 +364,8 @@ > */ > #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1 > > -#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32 > -#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024 > +#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 0 > +#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 0 > > /* > * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has > diff --git a/tools/python/xen/lowlevel/xc/xc.c > b/tools/python/xen/lowlevel/xc/xc.c > index 6d2afd5695..0f861872ce 100644 > --- a/tools/python/xen/lowlevel/xc/xc.c > +++ b/tools/python/xen/lowlevel/xc/xc.c > @@ -127,8 +127,6 @@ static PyObject *pyxc_domain_create(XcObject *self, > }, > .max_vcpus = 1, > .max_evtchn_port = -1, /* No limit. */ > -.max_grant_frames = 32, > -.max_maptrack_frames = 1024, > }; > > static char *kwd_list[] = { "domid", "ssidref", "handle", "flags", > diff --git a/tools/xl/xl.c b/tools/xl/xl.c > index ddd29b3f1b..b6e220184d 100644 > --- a/tools/xl/xl.c > +++ b/tools/xl/xl.c > @@ -51,8 +51,8 @@ libxl_bitmap global_pv_affinity_mask; > enum output_format default_output_format = OUTPUT_FORMAT_JSON; > int claim_mode = 1; > bool progress_use_cr = 0; > -int max_grant_frames = -1; > -int max_maptrack_frames = -1; > +int
Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling
George Dunlap writes ("[PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling"): > Xen used to have single, system-wide limits for the number of grant > frames and maptrack frames a guest was allowed to create. Increasing > or decreasing this single limit on the Xen command-line would change > the limit for all guests on the system. If I am not mistaken, this is an important change to have. I have seen reports of users who ran out of grant/maptrack frames because of updates to use multiring protocols etc. The error messages are not very good and the recommended workaround has been to increase the default limit on the hypervisor command line. It is important that we don't break that workaround! Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling
On 11/26/19 5:17 PM, George Dunlap wrote: > - xl will use the libxl default for maptrack, but does its own default > calculation for grant frames: either 32 or 64, based on the max > possible mfn. [snip] > @@ -199,13 +198,6 @@ static void parse_global_config(const char *configfile, > > if (!xlu_cfg_get_long (config, "max_grant_frames", , 0)) > max_grant_frames = l; > -else { > -libxl_physinfo_init(); > -max_grant_frames = (libxl_get_physinfo(ctx, ) != 0 || > -!(physinfo.max_possible_mfn >> 32)) > - ? 32 : 64; > -libxl_physinfo_dispose(); > -} Sorry, meant to add a patch to add this functionality back into the hypervisor -- i.e., so that opt_max_grant_frames would be 32 on systems with 32-bit mfns. But this seems like a fairly strange calculation anyway; it's not clear to me where it would have come from. -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] UEFI support on Dell boxes
On Tue, Nov 26, 2019 at 12:31 AM Jan Beulich wrote: > > On 26.11.2019 08:02, Roman Shaposhnik wrote: > > On Mon, Nov 25, 2019 at 7:55 PM Marek Marczykowski-Górecki > > wrote: > >> On Mon, Nov 25, 2019 at 07:44:03PM -0800, Roman Shaposhnik wrote: > >>> (XEN) 0ff90-0 type=11 attr=8000 > >>> (XEN) Unknown cachability for MFNs 0xff900-0xf > >> > >> The faulting address is in this range. And because of unknown > >> cachability, it isn't mapped. Try adding 'efi=attr=uc' to the Xen > >> cmdline. > > > > Feels like we're getting exactly the same failure. Log attached. > > Clearly the option hasn't been taking effect. Could you please > retry with this fix > https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg01494.html > in place? This works very well indeed! I acked it in the patch thread. Thanks, Roman. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] EFI: fix "efi=attr=" handling
On Tue, Nov 26, 2019 at 1:51 AM Wei Liu wrote: > > On Tue, Nov 26, 2019 at 09:25:27AM +0100, Jan Beulich wrote: > > Commit 633a40947321 ("docs: Improve documentation and parsing for efi=") > > failed to honor the strcmp()-like return value convention of > > cmdline_strcmp(). > > > > Reported-by: Roman Shaposhnik > > Signed-off-by: Jan Beulich > > Reviewed-by: Wei Liu Tested-by: Roman Shaposhnik Thanks, Roman. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] getting 4.12.2 ready
On 25/11/2019, 09:10, "Jan Beulich" wrote: All, the 4.12.2 stable release is due in about 2 weeks time. Please point out backporting candidates that you find missing from the respective stable trees. Jan Hi all: I ran the XSA scripts and all is fine. BreaThe tool does not always understand the xsa303/*.patch xen-unstable .. Xen 4.9 notation in xsa.git, which is not well defined. But it's probably not worth trying to fix this, as a manual check takes less than a minute Tool output is below XSA 296 : Some patches not applied => check Applied XSA 297 : No patch found => check Was applied in Xen 4.12.1 XSA 298 : All patches found (no need to check) XSA 299 : All patches found (no need to check) XSA 300 : No patch found => check Linux only XSA 301 : All patches found (no need to check) XSA 302 : Some patches not applied => check Applied XSA 303 : Some patches not applied => check Applied XSA 304 : All patches found (no need to check) XSA 305 : All patches found (no need to check) Regards Lars ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed
From: Julien Grall A guest will setup a shared page with the hypervisor for each vCPU via XENPMU_init. The page will then get mapped in the hypervisor and only released when XEMPMU_finish is called. This means that if the guest is not shutdown gracefully (such as via xl destroy), the page will stay mapped in the hypervisor. One of the consequence is the domain can never be fully destroyed as some of its memory is still mapped. As Xen should never rely on the guest to correctly clean-up any allocation in the hypervisor, we should also unmap pages during the domain destruction if there are any left. We can re-use the same logic as in pvpmu_finish(). To avoid duplication, move the logic in a new function that can also be called from vpmu_destroy(). NOTE: The call to vpmu_destroy() must also be moved from arch_vcpu_destroy() into domain_relinquish_resources() such that the mapped page does not prevent domain_destroy() (which calls arch_vcpu_destroy()) from being called. Signed-off-by: Julien Grall Signed-off-by: Paul Durrant --- Cc: Jan Beulich Cc: Andrew Cooper Cc: Wei Liu Cc: "Roger Pau Monné" --- xen/arch/x86/cpu/vpmu.c | 45 +++-- xen/arch/x86/domain.c | 6 +++--- 2 files changed, 28 insertions(+), 23 deletions(-) diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c index f397183ec3..9ae4ed48c8 100644 --- a/xen/arch/x86/cpu/vpmu.c +++ b/xen/arch/x86/cpu/vpmu.c @@ -578,9 +578,32 @@ static void vpmu_arch_destroy(struct vcpu *v) } } -void vpmu_destroy(struct vcpu *v) +static void vpmu_cleanup(struct vcpu *v) { +struct vpmu_struct *vpmu = vcpu_vpmu(v); +mfn_t mfn; +void *xenpmu_data; + +spin_lock(>vpmu_lock); + vpmu_arch_destroy(v); +xenpmu_data = vpmu->xenpmu_data; +vpmu->xenpmu_data = NULL; + +spin_unlock(>vpmu_lock); + +if ( xenpmu_data ) +{ +mfn = domain_page_map_to_mfn(xenpmu_data); +ASSERT(mfn_valid(mfn)); +unmap_domain_page_global(xenpmu_data); +put_page_and_type(mfn_to_page(mfn)); +} +} + +void vpmu_destroy(struct vcpu *v) +{ +vpmu_cleanup(v); put_vpmu(v); } @@ -639,9 +662,6 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params) static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params) { struct vcpu *v; -struct vpmu_struct *vpmu; -mfn_t mfn; -void *xenpmu_data; if ( (params->vcpu >= d->max_vcpus) || (d->vcpu[params->vcpu] == NULL) ) return; @@ -650,22 +670,7 @@ static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params) if ( v != current ) vcpu_pause(v); -vpmu = vcpu_vpmu(v); -spin_lock(>vpmu_lock); - -vpmu_arch_destroy(v); -xenpmu_data = vpmu->xenpmu_data; -vpmu->xenpmu_data = NULL; - -spin_unlock(>vpmu_lock); - -if ( xenpmu_data ) -{ -mfn = domain_page_map_to_mfn(xenpmu_data); -ASSERT(mfn_valid(mfn)); -unmap_domain_page_global(xenpmu_data); -put_page_and_type(mfn_to_page(mfn)); -} +vpmu_cleanup(v); if ( v != current ) vcpu_unpause(v); diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index f1dd86e12e..1d75b2e6c3 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -454,9 +454,6 @@ void arch_vcpu_destroy(struct vcpu *v) xfree(v->arch.msrs); v->arch.msrs = NULL; -if ( !is_idle_domain(v->domain) ) -vpmu_destroy(v); - if ( is_hvm_vcpu(v) ) hvm_vcpu_destroy(v); else @@ -2224,6 +2221,9 @@ int domain_relinquish_resources(struct domain *d) if ( is_hvm_domain(d) ) hvm_domain_relinquish_resources(d); +for_each_vcpu ( d, v ) +vpmu_destroy(v); + return 0; } -- 2.20.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.13 1/2] python/xc.c: Remove trailing whitespace
No functional change. Signed-off-by: George Dunlap --- CC: Marek Marczykowski-Górecki CC: Juergen Gross --- tools/python/xen/lowlevel/xc/xc.c | 210 +++--- 1 file changed, 105 insertions(+), 105 deletions(-) diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c index 44d3606141..6d2afd5695 100644 --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -1,6 +1,6 @@ /** * Xc.c - * + * * Copyright (c) 2003-2004, K A Fraser (University of Cambridge) */ @@ -107,7 +107,7 @@ static PyObject *pyxc_domain_dumpcore(XcObject *self, PyObject *args) if ( xc_domain_dumpcore(self->xc_handle, dom, corefile) != 0 ) return pyxc_error_to_exception(self->xc_handle); - + Py_INCREF(zero); return zero; } @@ -141,7 +141,7 @@ static PyObject *pyxc_domain_create(XcObject *self, return NULL; if ( pyhandle != NULL ) { -if ( !PyList_Check(pyhandle) || +if ( !PyList_Check(pyhandle) || (PyList_Size(pyhandle) != sizeof(xen_domain_handle_t)) ) goto out_exception; @@ -188,7 +188,7 @@ static PyObject *pyxc_domain_max_vcpus(XcObject *self, PyObject *args) if (xc_domain_max_vcpus(self->xc_handle, dom, max) != 0) return pyxc_error_to_exception(self->xc_handle); - + Py_INCREF(zero); return zero; } @@ -223,7 +223,7 @@ static PyObject *pyxc_domain_shutdown(XcObject *self, PyObject *args) if ( xc_domain_shutdown(self->xc_handle, dom, reason) != 0 ) return pyxc_error_to_exception(self->xc_handle); - + Py_INCREF(zero); return zero; } @@ -255,7 +255,7 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self, static char *kwd_list[] = { "domid", "vcpu", "cpumap", NULL }; -if ( !PyArg_ParseTupleAndKeywords(args, kwds, "i|iO", kwd_list, +if ( !PyArg_ParseTupleAndKeywords(args, kwds, "i|iO", kwd_list, , , ) ) return NULL; @@ -269,7 +269,7 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self, if ( (cpulist != NULL) && PyList_Check(cpulist) ) { -for ( i = 0; i < PyList_Size(cpulist); i++ ) +for ( i = 0; i < PyList_Size(cpulist); i++ ) { long cpu = PyLongOrInt_AsLong(PyList_GetItem(cpulist, i)); if ( cpu < 0 || cpu >= nr_cpus ) @@ -282,7 +282,7 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self, cpumap[cpu / 8] |= 1 << (cpu % 8); } } - + if ( xc_vcpu_setaffinity(self->xc_handle, dom, vcpu, cpumap, NULL, XEN_VCPUAFFINITY_HARD) != 0 ) { @@ -290,7 +290,7 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self, return pyxc_error_to_exception(self->xc_handle); } Py_INCREF(zero); -free(cpumap); +free(cpumap); return zero; } @@ -304,7 +304,7 @@ static PyObject *pyxc_domain_sethandle(XcObject *self, PyObject *args) if (!PyArg_ParseTuple(args, "iO", , )) return NULL; -if ( !PyList_Check(pyhandle) || +if ( !PyList_Check(pyhandle) || (PyList_Size(pyhandle) != sizeof(xen_domain_handle_t)) ) { goto out_exception; @@ -320,7 +320,7 @@ static PyObject *pyxc_domain_sethandle(XcObject *self, PyObject *args) if (xc_domain_sethandle(self->xc_handle, dom, handle) < 0) return pyxc_error_to_exception(self->xc_handle); - + Py_INCREF(zero); return zero; @@ -342,7 +342,7 @@ static PyObject *pyxc_domain_getinfo(XcObject *self, xc_dominfo_t *info; static char *kwd_list[] = { "first_dom", "max_doms", NULL }; - + if ( !PyArg_ParseTupleAndKeywords(args, kwds, "|ii", kwd_list, _dom, _doms) ) return NULL; @@ -415,7 +415,7 @@ static PyObject *pyxc_vcpu_getinfo(XcObject *self, int nr_cpus; static char *kwd_list[] = { "domid", "vcpu", NULL }; - + if ( !PyArg_ParseTupleAndKeywords(args, kwds, "i|i", kwd_list, , ) ) return NULL; @@ -470,7 +470,7 @@ static PyObject *pyxc_hvm_param_get(XcObject *self, int param; uint64_t value; -static char *kwd_list[] = { "domid", "param", NULL }; +static char *kwd_list[] = { "domid", "param", NULL }; if ( !PyArg_ParseTupleAndKeywords(args, kwds, "ii", kwd_list, , ) ) return NULL; @@ -490,7 +490,7 @@ static PyObject *pyxc_hvm_param_set(XcObject *self, int param; uint64_t value; -static char *kwd_list[] = { "domid", "param", "value", NULL }; +static char *kwd_list[] = { "domid", "param", "value", NULL }; if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iiL", kwd_list, , , ) ) return NULL; @@ -660,7 +660,7 @@ static PyObject
[Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling
Xen used to have single, system-wide limits for the number of grant frames and maptrack frames a guest was allowed to create. Increasing or decreasing this single limit on the Xen command-line would change the limit for all guests on the system. Later, per-domain limits for these values was created. The system-wide limits became strict limits: domains could not be created with higher limits, but could be created with lower limits. However, the change also introduced a range of different "default" values into various places in the toolstack: - The python libxc bindings hard-coded these values to 32 and 1024, respectively - The libxl default values are 32 and 1024 respectively. - xl will use the libxl default for maptrack, but does its own default calculation for grant frames: either 32 or 64, based on the max possible mfn. These defaults interact poorly with the hypervisor command-line limit: - The hypervisor command-line limit cannot be used to raise the limit for all guests anymore, as the default in the toolstack will effectively override this. - If you use the hypervisor command-line limit to *reduce* the limit, then the "default" values generated by the toolstack are too high, and all guest creations will fail. In other words, the toolstack defaults require any change to be effected by having the admin explicitly specify a new value in every guest. In order to address this, have grant_table_init treat '0' values for max_grant_frames and max_maptrack_frames as instructions to use the system-wide default. Have all the above toolstacks default to passing 0 unless a different value is explicitly given. This restores the old behavior, that changing the hypervisor command-line option can change the behavior for all guests, while retaining the ability to set per-guest values. It also removes the bug that *reducing* the system-wide max will cause all domains without explicit limits to fail. (The ocaml bindings require the caller to always specify a value, and the code to start a xenstored stubdomain hard-codes these to 4 and 128 respectively; these will not be addressed here.) Signed-off-by: George Dunlap --- Release justification: This is an observed regression (albeit one that has spanned several releases now). Compile-tested only. NB this patch could be applied without the whitespace fixes (perhaps with some fix-ups); it's just easier since my editor strips trailing whitespace out automatically. CC: Ian Jackson CC: Wei Liu CC: Andrew Cooper CC: Jan Beulich CC: Paul Durrant CC: Julien Grall CC: Konrad Rzeszutek Wilk CC: Stefano Stabellini CC: Juergen Gross CC: Marek Marczykowski-Górecki --- tools/libxl/libxl.h | 4 ++-- tools/python/xen/lowlevel/xc/xc.c | 2 -- tools/xl/xl.c | 12 ++-- xen/common/grant_table.c | 7 +++ xen/include/public/domctl.h | 6 -- 5 files changed, 15 insertions(+), 16 deletions(-) diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 49b56fa1a3..1648d337e7 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -364,8 +364,8 @@ */ #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1 -#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32 -#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024 +#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 0 +#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 0 /* * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c index 6d2afd5695..0f861872ce 100644 --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -127,8 +127,6 @@ static PyObject *pyxc_domain_create(XcObject *self, }, .max_vcpus = 1, .max_evtchn_port = -1, /* No limit. */ -.max_grant_frames = 32, -.max_maptrack_frames = 1024, }; static char *kwd_list[] = { "domid", "ssidref", "handle", "flags", diff --git a/tools/xl/xl.c b/tools/xl/xl.c index ddd29b3f1b..b6e220184d 100644 --- a/tools/xl/xl.c +++ b/tools/xl/xl.c @@ -51,8 +51,8 @@ libxl_bitmap global_pv_affinity_mask; enum output_format default_output_format = OUTPUT_FORMAT_JSON; int claim_mode = 1; bool progress_use_cr = 0; -int max_grant_frames = -1; -int max_maptrack_frames = -1; +int max_grant_frames = 0; +int max_maptrack_frames = 0; xentoollog_level minmsglevel = minmsglevel_default; @@ -96,7 +96,6 @@ static void parse_global_config(const char *configfile, XLU_Config *config; int e; const char *buf; -libxl_physinfo physinfo; config = xlu_cfg_init(stderr, configfile); if (!config) { @@ -199,13 +198,6 @@ static void parse_global_config(const char *configfile, if (!xlu_cfg_get_long (config, "max_grant_frames", , 0)) max_grant_frames = l; -else { -libxl_physinfo_init(); -max_grant_frames = (libxl_get_physinfo(ctx, ) != 0 || -!(physinfo.max_possible_mfn >> 32)) -
Re: [Xen-devel] [PATCH for-4.13 v2] AMD/IOMMU: honour IR setting while pre-filling DTEs
On 26/11/2019 17:08, Igor Druzhinin wrote: > IV bit shouldn't be set in DTE if interrupt remapping is not > enabled. It's a regression in behavior of "iommu=no-intremap" > option which otherwise would keep interrupt requests untranslated > for all of the devices in the system regardless of wether it's > described as valid in IVRS or not. > > Signed-off-by: Igor Druzhinin Reviewed-by: Andrew Cooper Juergen: 4.13 justification - this is a regression from 4.12. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.13 v2] AMD/IOMMU: honour IR setting while pre-filling DTEs
IV bit shouldn't be set in DTE if interrupt remapping is not enabled. It's a regression in behavior of "iommu=no-intremap" option which otherwise would keep interrupt requests untranslated for all of the devices in the system regardless of wether it's described as valid in IVRS or not. Signed-off-by: Igor Druzhinin --- xen/drivers/passthrough/amd/iommu_init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c index 16e84d4..2b81e38 100644 --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table( for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf ) dt[bdf] = (struct amd_iommu_dte){ .v = true, - .iv = true, + .iv = iommu_intremap, }; } -- 2.7.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v3 1/2] x86/vmx: add ASSERT to prevent syncing PIR to IRR...
On 26.11.2019 17:47, Roger Pau Monné wrote: > On Tue, Nov 26, 2019 at 05:32:04PM +0100, Jan Beulich wrote: >> On 26.11.2019 14:26, Roger Pau Monne wrote: >>> --- a/xen/arch/x86/hvm/vmx/vmx.c >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c >>> @@ -2054,6 +2054,19 @@ static void vmx_sync_pir_to_irr(struct vcpu *v) >>> unsigned int group, i; >>> DECLARE_BITMAP(pending_intr, NR_VECTORS); >>> >>> +if ( v != current && !atomic_read(>pause_count) ) >>> +{ >>> +/* >>> + * Syncing PIR to IRR must not be done behind the back of the CPU, >>> + * since the IRR is controlled by the hardware when the vCPU is >>> + * executing. Only allow Xen to do such sync if the vCPU is the >>> current >>> + * one or if it's paused: that's required in order to sync the >>> lapic >>> + * state before saving it. >>> + */ >> >> Is this stated this way by the SDM anywhere? > > No, I think the SDM is not very clear on this, there's a paragraph > about PIR: > > "The logical processor performs a logical-OR of PIR into VIRR and > clears PIR. No other agent can read or write a PIR bit (or group of > bits) between the time it is read (to determine what to OR into VIRR) > and when it is cleared." Well, this is about PIR, but my question was rather towards the effects on vIRR. >> I ask because the >> comment then really doesn't apply to just this function, but to >> vlapic_{,test_and_}{set,clear}_vector() more generally. It's >> not clear to me at all whether the CPU caches (in an incoherent >> fashion) IRR (and maybe other APIC page elements), rather than >> honoring the atomic updates these macros do. > > IMO syncing PIR to IRR when the vCPU is running on a different pCPU is > likely to at least defeat the purpose of posted interrupts: I agree here. > when the > CPU receives the posted interrupt vector it won't see the > outstanding-notification bit in the posted-interrupt descriptor > because the sync done from a different pCPU would have cleared it, at > which point it's not clear to me that the processor will check vIRR > for pending interrupts. The description in section 29.6 > POSTED-INTERRUPT PROCESSING doesn't explicitly mention whether the > value of the outstanding-notification bit affects the logic of posted > interrupt processing. But overall this again is all posted interrupt centric when my question was about vIRR, in particular whether the asserting you add may need to be even more rigid. Anyway, let's see what the VMX maintainers have to say. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v3 2/2] x86/vmx: always sync PIR to IRR before vmentry
On 26.11.2019 14:26, Roger Pau Monne wrote: > --- a/xen/arch/x86/hvm/irq.c > +++ b/xen/arch/x86/hvm/irq.c > @@ -515,7 +515,11 @@ void hvm_set_callback_via(struct domain *d, uint64_t via) > struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v) > { > struct hvm_domain *plat = >domain->arch.hvm; > -int vector; > +/* > + * Always call vlapic_has_pending_irq so that PIR is synced into IRR when > + * using posted interrupts. > + */ > +int vector = vlapic_has_pending_irq(v); Did you consider doing this conditionally either here ... > @@ -530,7 +534,6 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v) > if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output ) > return hvm_intack_pic(0); > > -vector = vlapic_has_pending_irq(v); > if ( vector != -1 ) > return hvm_intack_lapic(vector); ... or here? I ask not only because the function isn't exactly cheap to call (as iirc you did also mention during the v2 discussion), but also because of its interaction with Viridian and nested mode. In case of problems there, avoiding the use of interrupt posting would be a workaround in such cases then. > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -1945,57 +1945,31 @@ static void vmx_process_isr(int isr, struct vcpu *v) > > static void __vmx_deliver_posted_interrupt(struct vcpu *v) > { > -bool_t running = v->is_running; > - > vcpu_unblock(v); > -/* > - * Just like vcpu_kick(), nothing is needed for the following two cases: > - * 1. The target vCPU is not running, meaning it is blocked or runnable. > - * 2. The target vCPU is the current vCPU and we're in non-interrupt > - * context. > - */ > -if ( running && (in_irq() || (v != current)) ) > -{ > +if ( v->is_running && v != current ) I continue to be concerned by this evaluation of ->is_running _after_ vcpu_unblock(), when previously (just like vcpu_kick() does) it was intentionally done before. I wonder anyway whether this and the change to irq.c should really be in a single patch, the more that you start the respective part of the description with "While there also simplify ...". But in the end it is up to Kevin's or Jun's judgement anyway. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v3 1/2] x86/vmx: add ASSERT to prevent syncing PIR to IRR...
On Tue, Nov 26, 2019 at 05:32:04PM +0100, Jan Beulich wrote: > On 26.11.2019 14:26, Roger Pau Monne wrote: > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > @@ -2054,6 +2054,19 @@ static void vmx_sync_pir_to_irr(struct vcpu *v) > > unsigned int group, i; > > DECLARE_BITMAP(pending_intr, NR_VECTORS); > > > > +if ( v != current && !atomic_read(>pause_count) ) > > +{ > > +/* > > + * Syncing PIR to IRR must not be done behind the back of the CPU, > > + * since the IRR is controlled by the hardware when the vCPU is > > + * executing. Only allow Xen to do such sync if the vCPU is the > > current > > + * one or if it's paused: that's required in order to sync the > > lapic > > + * state before saving it. > > + */ > > Is this stated this way by the SDM anywhere? No, I think the SDM is not very clear on this, there's a paragraph about PIR: "The logical processor performs a logical-OR of PIR into VIRR and clears PIR. No other agent can read or write a PIR bit (or group of bits) between the time it is read (to determine what to OR into VIRR) and when it is cleared." > I ask because the > comment then really doesn't apply to just this function, but to > vlapic_{,test_and_}{set,clear}_vector() more generally. It's > not clear to me at all whether the CPU caches (in an incoherent > fashion) IRR (and maybe other APIC page elements), rather than > honoring the atomic updates these macros do. IMO syncing PIR to IRR when the vCPU is running on a different pCPU is likely to at least defeat the purpose of posted interrupts: when the CPU receives the posted interrupt vector it won't see the outstanding-notification bit in the posted-interrupt descriptor because the sync done from a different pCPU would have cleared it, at which point it's not clear to me that the processor will check vIRR for pending interrupts. The description in section 29.6 POSTED-INTERRUPT PROCESSING doesn't explicitly mention whether the value of the outstanding-notification bit affects the logic of posted interrupt processing. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v3 0/2] x86/vmx: posted interrupt fixes
On Tue, Nov 26, 2019 at 02:26:46PM +0100, Roger Pau Monne wrote: > Hello, > > The following series aim to solve the issue reported by Joe Jin related > to posted interrupts. Regarding the release blockers email, and the qualification of this series: - a regression introduced since 4.12 This is not a regression, since AFAICT the posted interrupt code has always been like this. - a severe bug of a 4.13 feature The bug seems to impact people using PCI-passthrough on Intel hardware that supports posted interrupts (aka APICv). In my opinion, we either fix it or disable APICv by default (now it's currently enabled by default). Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v3 2/2] x86/vmx: always sync PIR to IRR before vmentry
On 11/26/19 5:26 AM, Roger Pau Monne wrote: > When using posted interrupts on Intel hardware it's possible that the > vCPU resumes execution with a stale local APIC IRR register because > depending on the interrupts to be injected vlapic_has_pending_irq > might not be called, and thus PIR won't be synced into IRR. > > Fix this by making sure PIR is always synced to IRR in > hvm_vcpu_has_pending_irq regardless of what interrupts are pending. > > While there also simplify the code in __vmx_deliver_posted_interrupt: > only raise a softirq if the vCPU is the one currently running and > __vmx_deliver_posted_interrupt is called from interrupt context. The > softirq is raised to make sure vmx_intr_assist is retried if the > interrupt happens to arrive after vmx_intr_assist but before > interrupts are disabled in vmx_do_vmentry. Also simplify the logic for > IPIing other pCPUs, there's no need to check v->processor since the > IPI should be sent as long as the vCPU is not the current one and it's > running. > > Reported-by: Joe Jin > Signed-off-by: Roger Pau Monné > --- > Cc: Juergen Gross Patch works for me. Tested-by: Joe Jin Thanks, Joe ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v3 1/2] x86/vmx: add ASSERT to prevent syncing PIR to IRR...
On 26.11.2019 14:26, Roger Pau Monne wrote: > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -2054,6 +2054,19 @@ static void vmx_sync_pir_to_irr(struct vcpu *v) > unsigned int group, i; > DECLARE_BITMAP(pending_intr, NR_VECTORS); > > +if ( v != current && !atomic_read(>pause_count) ) > +{ > +/* > + * Syncing PIR to IRR must not be done behind the back of the CPU, > + * since the IRR is controlled by the hardware when the vCPU is > + * executing. Only allow Xen to do such sync if the vCPU is the > current > + * one or if it's paused: that's required in order to sync the lapic > + * state before saving it. > + */ Is this stated this way by the SDM anywhere? I ask because the comment then really doesn't apply to just this function, but to vlapic_{,test_and_}{set,clear}_vector() more generally. It's not clear to me at all whether the CPU caches (in an incoherent fashion) IRR (and maybe other APIC page elements), rather than honoring the atomic updates these macros do. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Block Tap driver
Hello, I am doing a bit of research on block tap, and I cannot find the source code of a blktap driver module in the linux kernel. I see in the linux tree some references to commits related to blktap, but I am stuck with finding the actual code. It would be really helpful to provide me some directions because I am completely lost. I also noticed that blktap2 from tools was removed and added again later. What is the reason for that? Thank you very much! Best, Roxana ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v2] docs/xl: Document pci-assignable state
George Dunlap writes ("[PATCH for-4.13 v2] docs/xl: Document pci-assignable state"): > Changesets 319f9a0ba9 ("passthrough: quarantine PCI devices") and > ba2ab00bbb ("IOMMU: default to always quarantining PCI devices") > introduced PCI device "quarantine" behavior, but did not document how > the pci-assignable-add and -remove functions act in regard to this. > Rectify this. > > Signed-off-by: George Dunlap > Release-acked-by: Juergen Gross Acked-by: Ian Jackson ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP
On 26/11/2019 16:14, Jan Beulich wrote: > On 26.11.2019 17:11, Andrew Cooper wrote: >> On 26/11/2019 16:05, Jan Beulich wrote: >>> On 26.11.2019 16:59, Andrew Cooper wrote: On 26/11/2019 15:32, Jan Beulich wrote: > On 26.11.2019 13:03, Andrew Cooper wrote: >> ICEBP isn't handled well by SVM. >> >> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the >> appropriate instruction boundary (fault or trap, as appropriate), except >> for >> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP >> instruction >> rather than after it. As ICEBP isn't distinguished in the vectoring >> event >> type, the state is ambiguous. >> >> To add to the confusion, an ICEBP which occurs due to Introspection >> intercepting the instruction, or from x86_emulate() will have %rip >> updated as >> a consequence of partial emulation required to inject an ICEBP event in >> the >> first place. >> >> We could in principle spot the non-injected case in the TASK_SWITCH >> handler, >> but this still results in complexity if the ICEBP instruction also has an >> Instruction Breakpoint active on it (which genuinely has fault >> semantics). >> >> Unconditionally intercept ICEBP. This does have a trap semantics for the >> intercept, and allows us to move %rip forwards appropriately before the >> TASK_SWITCH intercept is hit. > Both because of you mentioning the moving forwards of %rip and with the > irc discussion in mind that we had no irc, don't you mean "fault > semantics" here? ICEBP really is too broken under SVM to handle architecturally. The ICEBP intercept has nRIP decode support, because it is an instruction intercept. We emulate the injection (because it is ICEBP), which means we re-enter the guest with %rip moved forward, and #DB (HW_EXCEPTION) pending for injection. This means that... > If so > Reviewed-by: Jan Beulich ... the ICEBP-#DB-vectored TASK_SWITCH will now find %rip pointing after the ICEBP instruction, rather than at it, making it consistent with every other #DB-vectored TASK_SWITCH. This does means that an early task-switch fault for ICEBP will reliably be delivered with the wrong (i.e. trap) semantics, but this is less bad than mixed fault/trap semantics depending on whether the source of the ICEBP was introspection/emulation or native execution. We could restore proper fault behaviour by extending svm_emul_swint_injection() to figure out that a task switch is needed, and invoke hvm_task_switch() directly, but I don't have enough TUITS right now. > Otherwise I guess I'm still missing something. I hope this clears it up. >>> Well, it helps, but you don't really answer the question: Is "trap" >>> in that sentence of the description really correct? I.e. don't you >>> instead mean "fault" there? >> I've reworded that bit to: >> >> Unconditionally intercept ICEBP. This does have NRIPs support as it is an >> instruction intercept, which allows us allows us to move %rip forwards >> appropriately before the TASK_SWITCH intercept is hit. This allows... >> >> Any better? > Ah yes, thanks. (But please drop one of the two "allows us".) Oops yes. Irritatingly, that causes #DB-vectored to move onto a new line, and trigger Git's comment syntax. I'll tweak a little bit more. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] 4.13 Release blockers
In order to be able to release Xen 4.13 we need to get all regressions fixed rather soon. I know there are quite some patches waiting to be taken for 4.13. So please, don't tag any further patches as "for 4.13" if they are not fixing either: - a regression introduced since 4.12 - a severe bug of a 4.13 feature I'd like to ask all patch authors who have pending patches "for 4.13" to reply to their patches clearly stating whether the patch qualifies for 4.13 regarding above rules. For any such pending patches I'd like the maintainers to review those patches at high priority. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] xen/blkback: Avoid unmapping unmapped grant pages
From: SeongJae Park For each I/O request, blkback first maps the foreign pages for the request to its local pages. If an allocation of a local page for the mapping fails, it should unmap every mapping already made for the request. However, blkback's handling mechanism for the allocation failure does not mark the remaining foreign pages as unmapped. Therefore, the unmap function merely tries to unmap every valid grant page for the request, including the pages not mapped due to the allocation failure. On a system that fails the allocation frequently, this problem leads to following kernel crash. [ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0001 [ 372.012546] IP: [] gnttab_unmap_refs.part.7+0x1c/0x40 [ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0 [ 372.012562] Oops: 0002 [#1] SMP [ 372.012566] Modules linked in: act_police sch_ingress cls_u32 ... [ 372.012746] Call Trace: [ 372.012752] [] gnttab_unmap_refs+0x34/0x40 [ 372.012759] [] xen_blkbk_unmap+0x83/0x150 [xen_blkback] ... [ 372.012802] [] dispatch_rw_block_io+0x970/0x980 [xen_blkback] ... Decompressing Linux... Parsing ELF... done. Booting the kernel. [0.00] Initializing cgroup subsys cpuset This commit fixes this problem by marking the grant pages of the given request that didn't mapped due to the allocation failure as invalid. Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings") Signed-off-by: SeongJae Park Reviewed-by: David Woodhouse Reviewed-by: Maximilian Heyne Reviewed-by: Paul Durrant --- drivers/block/xen-blkback/blkback.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index fd1e19f1a49f..3666afa639d1 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -936,6 +936,8 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring, out_of_memory: pr_alert("%s: out of memory\n", __func__); put_free_pages(ring, pages_to_gnt, segs_to_map); + for (i = last_map; i < num; i++) + pages[i]->handle = BLKBACK_INVALID_HANDLE; return -ENOMEM; } -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP
On 26.11.2019 17:11, Andrew Cooper wrote: > On 26/11/2019 16:05, Jan Beulich wrote: >> On 26.11.2019 16:59, Andrew Cooper wrote: >>> On 26/11/2019 15:32, Jan Beulich wrote: On 26.11.2019 13:03, Andrew Cooper wrote: > ICEBP isn't handled well by SVM. > > The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the > appropriate instruction boundary (fault or trap, as appropriate), except > for > an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP > instruction > rather than after it. As ICEBP isn't distinguished in the vectoring event > type, the state is ambiguous. > > To add to the confusion, an ICEBP which occurs due to Introspection > intercepting the instruction, or from x86_emulate() will have %rip > updated as > a consequence of partial emulation required to inject an ICEBP event in > the > first place. > > We could in principle spot the non-injected case in the TASK_SWITCH > handler, > but this still results in complexity if the ICEBP instruction also has an > Instruction Breakpoint active on it (which genuinely has fault semantics). > > Unconditionally intercept ICEBP. This does have a trap semantics for the > intercept, and allows us to move %rip forwards appropriately before the > TASK_SWITCH intercept is hit. Both because of you mentioning the moving forwards of %rip and with the irc discussion in mind that we had no irc, don't you mean "fault semantics" here? >>> ICEBP really is too broken under SVM to handle architecturally. >>> >>> The ICEBP intercept has nRIP decode support, because it is an >>> instruction intercept. We emulate the injection (because it is ICEBP), >>> which means we re-enter the guest with %rip moved forward, and #DB >>> (HW_EXCEPTION) pending for injection. This means that... >>> If so Reviewed-by: Jan Beulich >>> ... the ICEBP-#DB-vectored TASK_SWITCH will now find %rip pointing after >>> the ICEBP instruction, rather than at it, making it consistent with >>> every other #DB-vectored TASK_SWITCH. >>> >>> This does means that an early task-switch fault for ICEBP will reliably >>> be delivered with the wrong (i.e. trap) semantics, but this is less bad >>> than mixed fault/trap semantics depending on whether the source of the >>> ICEBP was introspection/emulation or native execution. >>> >>> We could restore proper fault behaviour by extending >>> svm_emul_swint_injection() to figure out that a task switch is needed, >>> and invoke hvm_task_switch() directly, but I don't have enough TUITS >>> right now. >>> Otherwise I guess I'm still missing something. >>> I hope this clears it up. >> Well, it helps, but you don't really answer the question: Is "trap" >> in that sentence of the description really correct? I.e. don't you >> instead mean "fault" there? > > I've reworded that bit to: > > Unconditionally intercept ICEBP. This does have NRIPs support as it is an > instruction intercept, which allows us allows us to move %rip forwards > appropriately before the TASK_SWITCH intercept is hit. This allows... > > Any better? Ah yes, thanks. (But please drop one of the two "allows us".) Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP
On 26/11/2019 16:05, Jan Beulich wrote: > On 26.11.2019 16:59, Andrew Cooper wrote: >> On 26/11/2019 15:32, Jan Beulich wrote: >>> On 26.11.2019 13:03, Andrew Cooper wrote: ICEBP isn't handled well by SVM. The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the appropriate instruction boundary (fault or trap, as appropriate), except for an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP instruction rather than after it. As ICEBP isn't distinguished in the vectoring event type, the state is ambiguous. To add to the confusion, an ICEBP which occurs due to Introspection intercepting the instruction, or from x86_emulate() will have %rip updated as a consequence of partial emulation required to inject an ICEBP event in the first place. We could in principle spot the non-injected case in the TASK_SWITCH handler, but this still results in complexity if the ICEBP instruction also has an Instruction Breakpoint active on it (which genuinely has fault semantics). Unconditionally intercept ICEBP. This does have a trap semantics for the intercept, and allows us to move %rip forwards appropriately before the TASK_SWITCH intercept is hit. >>> Both because of you mentioning the moving forwards of %rip and with the >>> irc discussion in mind that we had no irc, don't you mean "fault >>> semantics" here? >> ICEBP really is too broken under SVM to handle architecturally. >> >> The ICEBP intercept has nRIP decode support, because it is an >> instruction intercept. We emulate the injection (because it is ICEBP), >> which means we re-enter the guest with %rip moved forward, and #DB >> (HW_EXCEPTION) pending for injection. This means that... >> >>> If so >>> Reviewed-by: Jan Beulich >> ... the ICEBP-#DB-vectored TASK_SWITCH will now find %rip pointing after >> the ICEBP instruction, rather than at it, making it consistent with >> every other #DB-vectored TASK_SWITCH. >> >> This does means that an early task-switch fault for ICEBP will reliably >> be delivered with the wrong (i.e. trap) semantics, but this is less bad >> than mixed fault/trap semantics depending on whether the source of the >> ICEBP was introspection/emulation or native execution. >> >> We could restore proper fault behaviour by extending >> svm_emul_swint_injection() to figure out that a task switch is needed, >> and invoke hvm_task_switch() directly, but I don't have enough TUITS >> right now. >> >>> Otherwise I guess I'm still missing something. >> I hope this clears it up. > Well, it helps, but you don't really answer the question: Is "trap" > in that sentence of the description really correct? I.e. don't you > instead mean "fault" there? I've reworded that bit to: Unconditionally intercept ICEBP. This does have NRIPs support as it is an instruction intercept, which allows us allows us to move %rip forwards appropriately before the TASK_SWITCH intercept is hit. This allows... Any better? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP
On 26/11/2019 15:34, Roger Pau Monné wrote: > On Tue, Nov 26, 2019 at 12:03:56PM +, Andrew Cooper wrote: >> ICEBP isn't handled well by SVM. >> >> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the >> appropriate instruction boundary (fault or trap, as appropriate), except for >> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP instruction >> rather than after it. As ICEBP isn't distinguished in the vectoring event >> type, the state is ambiguous. >> >> To add to the confusion, an ICEBP which occurs due to Introspection >> intercepting the instruction, or from x86_emulate() will have %rip updated as >> a consequence of partial emulation required to inject an ICEBP event in the >> first place. >> >> We could in principle spot the non-injected case in the TASK_SWITCH handler, >> but this still results in complexity if the ICEBP instruction also has an >> Instruction Breakpoint active on it (which genuinely has fault semantics). >> >> Unconditionally intercept ICEBP. This does have a trap semantics for the >> intercept, and allows us to move %rip forwards appropriately before the >> TASK_SWITCH intercept is hit. This makes the behaviour of #DB-vectored >> switches consistent however the ICEBP #DB came about, and avoids special >> cases >> in the TASK_SWITCH intercept. >> >> This in turn allows for the removal of the conditional >> hvm_set_icebp_interception() logic used by the monitor subsystem, as ICEBP's >> will now always be submitted for monitoring checks. >> >> Signed-off-by: Andrew Cooper > Reviewed-by: Roger Pau Monné > > AFAICT this brings AMD implementation inline with Intel that also will > unconditionally vmexit on icebp? VT-x and SVM handle things quite differently. VT-x has no instruction intercept for ICEBP, but the #DB intercept will triggered by an ICEBP instruction. ICEBP has its own event type (Privileged Software Exception, which is an amusing name considering it is an unprivleged instruction, bypasses privilege checks, and sets the External bit in an error code). SVM does have an instruction intercept for ICEBP, but the #DB from ICEBP's don't trigger the normal #DB intercept. However, secondary #DB's generated by ICEBP's unintercepted #DB do trigger the #DB intercept. For safety reasons we must intercept #DB to prevent CPU deadlocks. This means that ICEBP are in practice always intercepted on Intel due to their #DB side effect, but they weren't intercepted on AMD, which is why the monitor subsystem had a way of turning interception on. So yes, the overall effect is that ICEBPs will now unconditionally vmexit on both Intel and AMD, but underlying mechanism for why they vmexit is still vendor-specific. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP
On 26.11.2019 16:59, Andrew Cooper wrote: > On 26/11/2019 15:32, Jan Beulich wrote: >> On 26.11.2019 13:03, Andrew Cooper wrote: >>> ICEBP isn't handled well by SVM. >>> >>> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the >>> appropriate instruction boundary (fault or trap, as appropriate), except for >>> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP instruction >>> rather than after it. As ICEBP isn't distinguished in the vectoring event >>> type, the state is ambiguous. >>> >>> To add to the confusion, an ICEBP which occurs due to Introspection >>> intercepting the instruction, or from x86_emulate() will have %rip updated >>> as >>> a consequence of partial emulation required to inject an ICEBP event in the >>> first place. >>> >>> We could in principle spot the non-injected case in the TASK_SWITCH handler, >>> but this still results in complexity if the ICEBP instruction also has an >>> Instruction Breakpoint active on it (which genuinely has fault semantics). >>> >>> Unconditionally intercept ICEBP. This does have a trap semantics for the >>> intercept, and allows us to move %rip forwards appropriately before the >>> TASK_SWITCH intercept is hit. >> Both because of you mentioning the moving forwards of %rip and with the >> irc discussion in mind that we had no irc, don't you mean "fault >> semantics" here? > > ICEBP really is too broken under SVM to handle architecturally. > > The ICEBP intercept has nRIP decode support, because it is an > instruction intercept. We emulate the injection (because it is ICEBP), > which means we re-enter the guest with %rip moved forward, and #DB > (HW_EXCEPTION) pending for injection. This means that... > >> If so >> Reviewed-by: Jan Beulich > > ... the ICEBP-#DB-vectored TASK_SWITCH will now find %rip pointing after > the ICEBP instruction, rather than at it, making it consistent with > every other #DB-vectored TASK_SWITCH. > > This does means that an early task-switch fault for ICEBP will reliably > be delivered with the wrong (i.e. trap) semantics, but this is less bad > than mixed fault/trap semantics depending on whether the source of the > ICEBP was introspection/emulation or native execution. > > We could restore proper fault behaviour by extending > svm_emul_swint_injection() to figure out that a task switch is needed, > and invoke hvm_task_switch() directly, but I don't have enough TUITS > right now. > >> Otherwise I guess I'm still missing something. > > I hope this clears it up. Well, it helps, but you don't really answer the question: Is "trap" in that sentence of the description really correct? I.e. don't you instead mean "fault" there? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP
On 26/11/2019 15:32, Jan Beulich wrote: > On 26.11.2019 13:03, Andrew Cooper wrote: >> ICEBP isn't handled well by SVM. >> >> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the >> appropriate instruction boundary (fault or trap, as appropriate), except for >> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP instruction >> rather than after it. As ICEBP isn't distinguished in the vectoring event >> type, the state is ambiguous. >> >> To add to the confusion, an ICEBP which occurs due to Introspection >> intercepting the instruction, or from x86_emulate() will have %rip updated as >> a consequence of partial emulation required to inject an ICEBP event in the >> first place. >> >> We could in principle spot the non-injected case in the TASK_SWITCH handler, >> but this still results in complexity if the ICEBP instruction also has an >> Instruction Breakpoint active on it (which genuinely has fault semantics). >> >> Unconditionally intercept ICEBP. This does have a trap semantics for the >> intercept, and allows us to move %rip forwards appropriately before the >> TASK_SWITCH intercept is hit. > Both because of you mentioning the moving forwards of %rip and with the > irc discussion in mind that we had no irc, don't you mean "fault > semantics" here? ICEBP really is too broken under SVM to handle architecturally. The ICEBP intercept has nRIP decode support, because it is an instruction intercept. We emulate the injection (because it is ICEBP), which means we re-enter the guest with %rip moved forward, and #DB (HW_EXCEPTION) pending for injection. This means that... > If so > Reviewed-by: Jan Beulich ... the ICEBP-#DB-vectored TASK_SWITCH will now find %rip pointing after the ICEBP instruction, rather than at it, making it consistent with every other #DB-vectored TASK_SWITCH. This does means that an early task-switch fault for ICEBP will reliably be delivered with the wrong (i.e. trap) semantics, but this is less bad than mixed fault/trap semantics depending on whether the source of the ICEBP was introspection/emulation or native execution. We could restore proper fault behaviour by extending svm_emul_swint_injection() to figure out that a task switch is needed, and invoke hvm_task_switch() directly, but I don't have enough TUITS right now. > Otherwise I guess I'm still missing something. I hope this clears it up. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 for 4.13] x86/microcode: refuse to load the same revision ucode
On 26.11.2019 16:41, Sergey Dyasli wrote: > Currently if a user tries to live-load the same or older ucode revision > than CPU already has, he will get a single message in Xen log like: > > (XEN) 128 cores are to update their microcode > > No actual ucode loading will happen and this situation can be quite > confusing. Fix this by starting ucode update only when the provided > ucode revision is higher than the currently cached one (if any). > This is based on the property that if microcode_cache exists, all CPUs > in the system should have at least that ucode revision. > > Additionally, print a user friendly message if no matching or newer > ucode can be found in the provided blob. This also requires ignoring > -ENODATA in AMD-side code, otherwise the message given to the user is: > > (XEN) Parsing microcode blob error -61 > > Which actually means that a ucode blob was parsed fine, but no matching > ucode was found. > > Signed-off-by: Sergey Dyasli Acked-by: Jan Beulich ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.13 v2] docs/xl: Document pci-assignable state
Changesets 319f9a0ba9 ("passthrough: quarantine PCI devices") and ba2ab00bbb ("IOMMU: default to always quarantining PCI devices") introduced PCI device "quarantine" behavior, but did not document how the pci-assignable-add and -remove functions act in regard to this. Rectify this. Signed-off-by: George Dunlap Release-acked-by: Juergen Gross --- Release justification: This brings documentation into line with the actual code that will be released. CC: Ian Jackson CC: Wei Liu CC: Jan Beulich CC: Paul Durrant CC: Juergen Gross --- docs/man/xl.1.pod.in | 31 ++- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in index 2303b81e4f..d4b5e8e362 100644 --- a/docs/man/xl.1.pod.in +++ b/docs/man/xl.1.pod.in @@ -1589,10 +1589,12 @@ backend driver in domain 0 rather than a real driver. =item B I Make the device at PCI Bus/Device/Function BDF assignable to guests. -This will bind the device to the pciback driver. If it is already -bound to a driver, it will first be unbound, and the original driver -stored so that it can be re-bound to the same driver later if desired. -If the device is already bound, it will return success. +This will bind the device to the pciback driver and assign it to the +"quarantine domain". If it is already bound to a driver, it will +first be unbound, and the original driver stored so that it can be +re-bound to the same driver later if desired. If the device is +already bound, it will assign it to the quarantine domain and return +success. CAUTION: This will make the device unusable by Domain 0 until it is returned with pci-assignable-remove. Care should therefore be taken @@ -1602,11 +1604,22 @@ being used. =item B [I<-r>] I -Make the device at PCI Bus/Device/Function BDF not assignable to guests. This -will at least unbind the device from pciback. If the -r option is specified, -it will also attempt to re-bind the device to its original driver, making it -usable by Domain 0 again. If the device is not bound to pciback, it will -return success. +Make the device at PCI Bus/Device/Function BDF not assignable to +guests. This will at least unbind the device from pciback, and +re-assign it from the "quarantine domain" back to domain 0. If the -r +option is specified, it will also attempt to re-bind the device to its +original driver, making it usable by Domain 0 again. If the device is +not bound to pciback, it will return success. + +Note that this functionality will work even for devices which were not +made assignable by B. This can be used to allow +dom0 to access devices which were automatically quarantined by Xen +after domain destruction as a result of Xen's B +command-line default. + +As always, this should only be done if you trust the guest, or are +confident that the particular device you're re-assigning to dom0 will +cancel all in-flight DMA on FLR. =item B I I -- 2.24.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 3/3] x86/svm: Write the correct %eip into the outgoing task
On 26.11.2019 13:03, Andrew Cooper wrote: > The TASK_SWITCH vmexit has fault semantics, and doesn't provide any NRIPs > assistance with instruction length. As a result, any instruction-induced task > switch has the outgoing task's %eip pointing at the instruction switch caused > the switch, rather than after it. > > This causes callers of task gates to livelock (repeatedly execute the call/jmp > to enter the task), and any restartable task to become a nop after its first > use (the (re)entry state points at the ret/iret used to exit the task). > > 32bit Windows in particular is known to use task gates for NMI handling, and > to use NMI IPIs. > > In the task switch handler, distinguish instruction-induced from > interrupt/exception-induced task switches, and decode the instruction under > %rip to calculate its length. > > Signed-off-by: Andrew Cooper Reviewed-by: Jan Beulich ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 for 4.13] x86/microcode: refuse to load the same revision ucode
Currently if a user tries to live-load the same or older ucode revision than CPU already has, he will get a single message in Xen log like: (XEN) 128 cores are to update their microcode No actual ucode loading will happen and this situation can be quite confusing. Fix this by starting ucode update only when the provided ucode revision is higher than the currently cached one (if any). This is based on the property that if microcode_cache exists, all CPUs in the system should have at least that ucode revision. Additionally, print a user friendly message if no matching or newer ucode can be found in the provided blob. This also requires ignoring -ENODATA in AMD-side code, otherwise the message given to the user is: (XEN) Parsing microcode blob error -61 Which actually means that a ucode blob was parsed fine, but no matching ucode was found. Signed-off-by: Sergey Dyasli --- v2 --> v3: - move ucode comparison to generic code - ignore -ENODATA in a different code section v1 --> v2: - compare provided ucode with the currently cached one CC: Jan Beulich CC: Andrew Cooper CC: Roger Pau Monné CC: Chao Gao CC: Juergen Gross --- xen/arch/x86/microcode.c | 19 +++ xen/arch/x86/microcode_amd.c | 7 +++ 2 files changed, 26 insertions(+) diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c index 65d1f41e7c..44efc2d9b3 100644 --- a/xen/arch/x86/microcode.c +++ b/xen/arch/x86/microcode.c @@ -640,10 +640,29 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len) if ( !patch ) { +printk(XENLOG_WARNING "microcode: couldn't find any matching ucode in " + "the provided blob!\n"); ret = -ENOENT; goto put; } +/* + * If microcode_cache exists, all CPUs in the system should have at least + * that ucode revision. + */ +spin_lock(_mutex); +if ( microcode_cache && + microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE ) +{ +spin_unlock(_mutex); +printk(XENLOG_WARNING "microcode: couldn't find any newer revision " + "in the provided blob!\n"); +ret = -ENOENT; + +goto put; +} +spin_unlock(_mutex); + if ( microcode_ops->start_update ) { ret = microcode_ops->start_update(); diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c index 1e52f7f49a..00750f7bbb 100644 --- a/xen/arch/x86/microcode_amd.c +++ b/xen/arch/x86/microcode_amd.c @@ -502,6 +502,13 @@ static struct microcode_patch *cpu_request_microcode(const void *buf, if ( error ) { +/* + * -ENODATA here means that the blob was parsed fine but no matching + * ucode was found. Don't return it to the caller. + */ +if ( error == -ENODATA ) +error = 0; + xfree(mc_amd->equiv_cpu_table); xfree(mc_amd); goto out; -- 2.17.1 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 0/2] Fixes to AMD IOMMU logging
On 26.11.19 16:01, Andrew Cooper wrote: RFC for-4.13. These are diagnostic improvements/corrections, so are low risk and high utility. Sorry, but the release is at least 1 month late now. As said before: I'll only take real bug fixes now. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP
On Tue, Nov 26, 2019 at 12:03:56PM +, Andrew Cooper wrote: > ICEBP isn't handled well by SVM. > > The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the > appropriate instruction boundary (fault or trap, as appropriate), except for > an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP instruction > rather than after it. As ICEBP isn't distinguished in the vectoring event > type, the state is ambiguous. > > To add to the confusion, an ICEBP which occurs due to Introspection > intercepting the instruction, or from x86_emulate() will have %rip updated as > a consequence of partial emulation required to inject an ICEBP event in the > first place. > > We could in principle spot the non-injected case in the TASK_SWITCH handler, > but this still results in complexity if the ICEBP instruction also has an > Instruction Breakpoint active on it (which genuinely has fault semantics). > > Unconditionally intercept ICEBP. This does have a trap semantics for the > intercept, and allows us to move %rip forwards appropriately before the > TASK_SWITCH intercept is hit. This makes the behaviour of #DB-vectored > switches consistent however the ICEBP #DB came about, and avoids special cases > in the TASK_SWITCH intercept. > > This in turn allows for the removal of the conditional > hvm_set_icebp_interception() logic used by the monitor subsystem, as ICEBP's > will now always be submitted for monitoring checks. > > Signed-off-by: Andrew Cooper Reviewed-by: Roger Pau Monné AFAICT this brings AMD implementation inline with Intel that also will unconditionally vmexit on icebp? Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP
On 26.11.2019 13:03, Andrew Cooper wrote: > ICEBP isn't handled well by SVM. > > The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the > appropriate instruction boundary (fault or trap, as appropriate), except for > an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP instruction > rather than after it. As ICEBP isn't distinguished in the vectoring event > type, the state is ambiguous. > > To add to the confusion, an ICEBP which occurs due to Introspection > intercepting the instruction, or from x86_emulate() will have %rip updated as > a consequence of partial emulation required to inject an ICEBP event in the > first place. > > We could in principle spot the non-injected case in the TASK_SWITCH handler, > but this still results in complexity if the ICEBP instruction also has an > Instruction Breakpoint active on it (which genuinely has fault semantics). > > Unconditionally intercept ICEBP. This does have a trap semantics for the > intercept, and allows us to move %rip forwards appropriately before the > TASK_SWITCH intercept is hit. Both because of you mentioning the moving forwards of %rip and with the irc discussion in mind that we had no irc, don't you mean "fault semantics" here? If so Reviewed-by: Jan Beulich Otherwise I guess I'm still missing something. > --- > xen/arch/x86/hvm/svm/svm.c| 19 --- > xen/arch/x86/hvm/svm/vmcb.c | 2 +- > xen/arch/x86/monitor.c| 3 --- > xen/include/asm-x86/hvm/hvm.h | 11 --- > 4 files changed, 1 insertion(+), 34 deletions(-) This, of course, is pretty nice in any event. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state
> -Original Message- > From: Xen-devel On Behalf Of Jan > Beulich > Sent: 26 November 2019 14:27 > To: Ian Jackson > Cc: Juergen Gross ; xen-devel@lists.xenproject.org; Paul > Durrant ; George Dunlap > ; Wei Liu > Subject: Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable > state > > On 26.11.2019 15:14, Ian Jackson wrote: > > George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable > state"): > >> =item B [I<-r>] I > > ... > >> +Make the device at PCI Bus/Device/Function BDF not assignable to > >> +guests. This will at least unbind the device from pciback, and > >> +re-assign it from the "quarantine domain" back to domain 0. If the -r > >> +option is specified, it will also attempt to re-bind the device to its > >> +original driver, making it usable by Domain 0 again. If the device is > >> +not bound to pciback, it will return success. > >> + > >> +Note that this functionality will work even for devices which were not > >> +made assignable by B. This can be used to allow > >> +dom0 to access devices which were automatically quarantined by Xen > >> +after domain destruction as a result of Xen's B > >> +command-line default. > > > > What are the security implications of doing this if the device might > > still be doing DMA or something ? > > Devices get reset in between, so well behaving ones should not > still be doing DMA at that point. Misbehaving ones would better > not be assigned (back and forth) anyway. But a recent patch of > Paul's suggests that people still wish to do so, on the > assumption that such DMA will drain sufficiently quickly. Yes. I will hopefully find time to post the next version of that patch this week. Paul > > > (For that matter, presumably there are security implications of > > assigning the same device in sequence to different guests?) > > Right. > > Jan > > ___ > Xen-devel mailing list > Xen-devel@lists.xenproject.org > https://lists.xenproject.org/mailman/listinfo/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state
> -Original Message- > From: Ian Jackson > Sent: 26 November 2019 15:06 > To: George Dunlap ; xen- > de...@lists.xenproject.org; Wei Liu ; Jan Beulich > ; Durrant, Paul ; Juergen Gross > > Subject: Re: [PATCH for-4.13] docs/xl: Document pci-assignable state > > Ian Jackson writes ("Re: [PATCH for-4.13] docs/xl: Document pci-assignable > state"): > > George Dunlap writes ("Re: [PATCH for-4.13] docs/xl: Document pci- > assignable state"): > > > I kind of feel like the discussion of the security risks inherent in > pci > > > passthrough belong in a separate document, but perhaps a brief mention > > > here would be helpful. Perhaps the following? > > > > > > "As always, this should only be done if you trust the guest, or are > > > confident that the particular device you're re-assigning to dom0 will > > > cancel all in-flight DMA on FLR." > > > > SGTM. > > > > I like "as always" which clearly signals that this is a more general > > problem without requiring us to actually write that other > > comprehensive document... > The text sounds fine in general but the 'as always' does rather imply 'hey, we never said PCI pass-through was safe, did we?' Paul ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen/arm: remove physical timer offset
On 11/25/2019 5:07 PM, Julien Grall wrote: > Hi, > > On 25/11/2019 16:14, Jeff Kubascik wrote: >> The physical timer traps apply an offset so that time starts at 0 for >> the guest. However, this offset is not currently applied to the physical >> counter. Per the ARMv8 Arch Reference Manual, the offset between the > > Which bit of the Arm Arm do you refer to here? In general, I would > recommend to give the exact section and version of the manual you use to > avoid any misunderstanding. Fair point, I'll clarify this. >> physical timer and counter should be 0. This removes the offset to make >> the timer and counter consistent. >> >> Xen time is at offset boot_count from the physical counter, so we need >> to take this into account when reading/writing to CNTP_CVAL. >> >> Signed-off-by: Jeff Kubascik >> --- >> xen/arch/arm/vtimer.c| 18 ++ >> xen/include/asm-arm/domain.h | 3 --- >> 2 files changed, 6 insertions(+), 15 deletions(-) >> >> diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c >> index e6aebdac9e..4790b5ce58 100644 >> --- a/xen/arch/arm/vtimer.c >> +++ b/xen/arch/arm/vtimer.c >> @@ -62,7 +62,6 @@ static void virt_timer_expired(void *data) >> >> int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig >> *config) >> { >> -d->arch.phys_timer_base.offset = NOW(); >> d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0); >> d->time_offset_seconds = ticks_to_ns(d->arch.virt_timer_base.offset - >> boot_count); >> do_div(d->time_offset_seconds, 10); >> @@ -184,8 +183,7 @@ static bool vtimer_cntp_ctl(struct cpu_user_regs *regs, >> uint32_t *r, bool read) >> >> if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) >> { >> -set_timer(>arch.phys_timer.timer, >> - v->arch.phys_timer.cval + >> v->domain->arch.phys_timer_base.offset); >> +set_timer(>arch.phys_timer.timer, v->arch.phys_timer.cval); >> } >> else >> stop_timer(>arch.phys_timer.timer); >> @@ -202,7 +200,7 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, >> uint32_t *r, >> if ( !ACCESS_ALLOWED(regs, EL0PTEN) ) >> return false; >> >> -now = NOW() - v->domain->arch.phys_timer_base.offset; >> +now = NOW(); >> >> if ( read ) >> { >> @@ -214,9 +212,7 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, >> uint32_t *r, >> if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) >> { >> v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING; >> -set_timer(>arch.phys_timer.timer, >> - v->arch.phys_timer.cval + >> - v->domain->arch.phys_timer_base.offset); >> +set_timer(>arch.phys_timer.timer, v->arch.phys_timer.cval); >> } >> } >> return true; >> @@ -232,17 +228,15 @@ static bool vtimer_cntp_cval(struct cpu_user_regs >> *regs, uint64_t *r, >> >> if ( read ) >> { >> -*r = ns_to_ticks(v->arch.phys_timer.cval); >> +*r = ns_to_ticks(v->arch.phys_timer.cval) + boot_count; >> } >> else >> { >> -v->arch.phys_timer.cval = ticks_to_ns(*r); >> +v->arch.phys_timer.cval = ticks_to_ns(*r - boot_count); > > I know that this is already like that in the code. But it feels weird > (to not say wrong) that cval will have a different meaning between the > virtual timer and physical timer. > > Indeed, in the former case it is an exact copy of the hardware value > whilst in the latter it is the hardware value - NOW(). I noticed this discrepancy as well. Even worse, the virtual timer cval is in ticks and the physical timer cval is Xen system time in ns. I believe that changing the physical timer cval to be the hardware value in ticks would be the more correct approach. The conversion to Xen system time is only needed for the timer APIs. > While you are reworking a big chunk of the physical timer emulation, > could you looking at removing this discrepancy? > >> if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE ) >> { >> v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING; >> -set_timer(>arch.phys_timer.timer, >> - v->arch.phys_timer.cval + >> - v->domain->arch.phys_timer_base.offset); >> +set_timer(>arch.phys_timer.timer, v->arch.phys_timer.cval); >> } >> } >> return true; >> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >> index 86ebdd2bcf..16a7150a95 100644 >> --- a/xen/include/asm-arm/domain.h >> +++ b/xen/include/asm-arm/domain.h >> @@ -65,9 +65,6 @@ struct arch_domain >> RELMEM_done, >> } relmem; >> >> -struct { >> -uint64_t offset; >> -} phys_timer_base; >> struct { >> uint64_t offset; >> } virt_timer_base; >> > > Cheers, > > -- > Julien Grall > Sure thing, I'll update
Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state
> -Original Message- > From: Ian Jackson > Sent: 26 November 2019 14:22 > To: George Dunlap > Cc: xen-devel@lists.xenproject.org; Wei Liu ; Jan Beulich > ; Paul Durrant ; Juergen Gross > > Subject: Re: [PATCH for-4.13] docs/xl: Document pci-assignable state > > [resending to just Paul to fix email address problem] > > George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable > state"): > > =item B [I<-r>] I > ... > > +Make the device at PCI Bus/Device/Function BDF not assignable to > > +guests. This will at least unbind the device from pciback, and > > +re-assign it from the "quarantine domain" back to domain 0. If the -r > > +option is specified, it will also attempt to re-bind the device to its > > +original driver, making it usable by Domain 0 again. If the device is > > +not bound to pciback, it will return success. > > + > > +Note that this functionality will work even for devices which were not > > +made assignable by B. This can be used to allow > > +dom0 to access devices which were automatically quarantined by Xen > > +after domain destruction as a result of Xen's B > > +command-line default. > > What are the security implications of doing this if the device might > still be doing DMA or something ? > > (For that matter, presumably there are security implications of > assigning the same device in sequence to different guests?) > Assigning any device carries a risk and can never considered to be secure in any general way. E.g. a device that exposes its config space in a writable fashion via an internal i2c bus that can be accessed via one of its BARs. Quarantining helps to the extent that, if a device is continuing to DMA than at least that doesn't hit dom0 whilst the FLR/SBR is attempted, but if even that's not effective then the device should probably remain in quarantine until it is power-cycled. Paul ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] AMD/IOMMU: honour IR setting while pre-filling DTEs
On 26.11.2019 15:34, Andrew Cooper wrote: > On 26/11/2019 14:14, Jan Beulich wrote: >> On 26.11.2019 13:25, Andrew Cooper wrote: >>> On 26/11/2019 08:42, Jan Beulich wrote: On 25.11.2019 22:05, Igor Druzhinin wrote: > --- a/xen/drivers/passthrough/amd/iommu_init.c > +++ b/xen/drivers/passthrough/amd/iommu_init.c > @@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table( > for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf ) > dt[bdf] = (struct amd_iommu_dte){ >.v = true, > - .iv = true, > + .iv = iommu_intremap, This was very intentionally "true", and ignoring "iommu_intremap": >>> Deliberate or not, it is a regression from 4.12. >> I accept it's a regression (which wants fixing), but I don't think >> this is the way to address is. I could be convinced by good >> arguments, though. >> >>> Booting with iommu=no-intremap is a common debugging technique, and that >>> means no interrupt remapping anywhere in the system, even for >>> supposedly-unused DTEs. >> Whether IV=1 or IV=0, there's no interrupt _remapping_ with this >> option specified. There's some interrupt _blocking_, yes. It's >> not immediately clear to me whether this is a good or a bad thing. > > You're attempting to argue semantics. Blocking is a special case remapping. > > "iommu=no-intremap" (for better or worse, naming wise) refers to the > interrupt mediation functionality in the IOMMU, and means "don't use any > of it". Any other behaviour is a regression. I can accept this pov. Nevertheless I'd like to first see whether we can't address the issue at hand with a less big hammer solution. We can then always decide to still put in this change. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP
On Tue, 2019-11-26 at 12:03 +, Andrew Cooper wrote: > ICEBP isn't handled well by SVM. > > The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to > the > appropriate instruction boundary (fault or trap, as appropriate), > except for > an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP > instruction > rather than after it. As ICEBP isn't distinguished in the vectoring > event > type, the state is ambiguous. > > To add to the confusion, an ICEBP which occurs due to Introspection > intercepting the instruction, or from x86_emulate() will have %rip > updated as > a consequence of partial emulation required to inject an ICEBP event > in the > first place. > > We could in principle spot the non-injected case in the TASK_SWITCH > handler, > but this still results in complexity if the ICEBP instruction also > has an > Instruction Breakpoint active on it (which genuinely has fault > semantics). > > Unconditionally intercept ICEBP. This does have a trap semantics for > the > intercept, and allows us to move %rip forwards appropriately before > the > TASK_SWITCH intercept is hit. This makes the behaviour of #DB- > vectored > switches consistent however the ICEBP #DB came about, and avoids > special cases > in the TASK_SWITCH intercept. > > This in turn allows for the removal of the conditional > hvm_set_icebp_interception() logic used by the monitor subsystem, as > ICEBP's > will now always be submitted for monitoring checks. > > Signed-off-by: Andrew Cooper > Reviewed-by: Petre Pircalabu ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state
Ian Jackson writes ("Re: [PATCH for-4.13] docs/xl: Document pci-assignable state"): > George Dunlap writes ("Re: [PATCH for-4.13] docs/xl: Document pci-assignable > state"): > > I kind of feel like the discussion of the security risks inherent in pci > > passthrough belong in a separate document, but perhaps a brief mention > > here would be helpful. Perhaps the following? > > > > "As always, this should only be done if you trust the guest, or are > > confident that the particular device you're re-assigning to dom0 will > > cancel all in-flight DMA on FLR." > > SGTM. > > I like "as always" which clearly signals that this is a more general > problem without requiring us to actually write that other > comprehensive document... Resending with Paul's new address. Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state
George Dunlap writes ("Re: [PATCH for-4.13] docs/xl: Document pci-assignable state"): > I kind of feel like the discussion of the security risks inherent in pci > passthrough belong in a separate document, but perhaps a brief mention > here would be helpful. Perhaps the following? > > "As always, this should only be done if you trust the guest, or are > confident that the particular device you're re-assigning to dom0 will > cancel all in-flight DMA on FLR." SGTM. I like "as always" which clearly signals that this is a more general problem without requiring us to actually write that other comprehensive document... Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.13 0/2] Fixes to AMD IOMMU logging
RFC for-4.13. These are diagnostic improvements/corrections, so are low risk and high utility. Andrew Cooper (2): AMD/IOMMU: Always print IOMMU errors AMD/IOMMU: Render IO_PAGE_FAULT errors in a more useful manner xen/drivers/passthrough/amd/iommu_init.c | 48 +++ xen/include/asm-x86/hvm/svm/amd-iommu-proto.h | 3 -- 2 files changed, 27 insertions(+), 24 deletions(-) -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 1/2] AMD/IOMMU: Always print IOMMU errors
Unhandled IOMMU errors (i.e. not IO_PAGE_FAULT) should still be printed, and not hidden behind iommu=debug. While adjusting this, factor out the symbolic name handling to just one location exposing its off-by-one nature. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Juergen Gross --- xen/drivers/passthrough/amd/iommu_init.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c index 16e84d43d4..8aa8788797 100644 --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -530,6 +530,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) EVENT_STR(INVALID_DEV_REQUEST) #undef EVENT_STR }; +const char *code_str = "event"; code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK, IOMMU_EVENT_CODE_SHIFT); @@ -553,6 +554,10 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) IOMMU_EVENT_CODE_SHIFT); } +/* Look up the symbolic name for code. */ +if ( code <= ARRAY_SIZE(event_str) ) +code_str = event_str[code - 1]; + if ( code == IOMMU_EVENT_IO_PAGE_FAULT ) { device_id = iommu_get_devid_from_event(entry[0]); @@ -566,7 +571,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) printk(XENLOG_ERR "AMD-Vi: " "%s: domain = %d, device id = %#x, " "fault address = %#"PRIx64", flags = %#x\n", - event_str[code-1], domain_id, device_id, *addr, flags); + code_str, domain_id, device_id, *addr, flags); for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ ) if ( get_dma_requestor_id(iommu->seg, bdf) == device_id ) @@ -574,12 +579,8 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) PCI_DEVFN2(bdf)); } else -{ -AMD_IOMMU_DEBUG("%s %08x %08x %08x %08x\n", -code <= ARRAY_SIZE(event_str) ? event_str[code - 1] - : "event", -entry[0], entry[1], entry[2], entry[3]); -} +printk(XENLOG_ERR "%s %08x %08x %08x %08x\n", + code_str, entry[0], entry[1], entry[2], entry[3]); memset(entry, 0, IOMMU_EVENT_LOG_ENTRY_SIZE); } -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 2/2] AMD/IOMMU: Render IO_PAGE_FAULT errors in a more useful manner
Print the PCI coordinates in its common format and use d%u notation for the domain. As well as printing flags, decode them. IO_PAGE_FAULT is used for interrupt remapping errors as well as DMA remapping errors. Before: (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 0xbf695000, flags = 0x10 (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 0xbf695040, flags = 0x10 (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 0xfff0, flags = 0x30 (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 0x1, flags = 0x30 (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 0x10040, flags = 0x30 After: (XEN) AMD-Vi: IO_PAGE_FAULT: :00:14.1 d0 addr bf5fc000 flags 0x10 PR (XEN) AMD-Vi: IO_PAGE_FAULT: :00:14.1 d0 addr bf5fc040 flags 0x10 PR (XEN) AMD-Vi: IO_PAGE_FAULT: :00:14.1 d0 addr fff0 flags 0x30 RW PR (XEN) AMD-Vi: IO_PAGE_FAULT: :00:14.1 d0 addr 0001 flags 0x30 RW PR (XEN) AMD-Vi: IO_PAGE_FAULT: :00:14.1 d0 addr 00010040 flags 0x30 RW PR No functional change. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Juergen Gross --- xen/drivers/passthrough/amd/iommu_init.c | 35 +++ xen/include/asm-x86/hvm/svm/amd-iommu-proto.h | 3 --- 2 files changed, 20 insertions(+), 18 deletions(-) diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c index 8aa8788797..cd4e6e16b8 100644 --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -513,10 +513,7 @@ static hw_irq_controller iommu_x2apic_type = { static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) { -u16 domain_id, device_id, flags; -unsigned int bdf; u32 code; -u64 *addr; int count = 0; static const char *const event_str[] = { #define EVENT_STR(name) [IOMMU_EVENT_##name - 1] = #name @@ -560,18 +557,26 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) if ( code == IOMMU_EVENT_IO_PAGE_FAULT ) { -device_id = iommu_get_devid_from_event(entry[0]); -domain_id = get_field_from_reg_u32(entry[1], - IOMMU_EVENT_DOMAIN_ID_MASK, - IOMMU_EVENT_DOMAIN_ID_SHIFT); -flags = get_field_from_reg_u32(entry[1], - IOMMU_EVENT_FLAGS_MASK, - IOMMU_EVENT_FLAGS_SHIFT); -addr= (u64*) (entry + 2); -printk(XENLOG_ERR "AMD-Vi: " - "%s: domain = %d, device id = %#x, " - "fault address = %#"PRIx64", flags = %#x\n", - code_str, domain_id, device_id, *addr, flags); +unsigned int bdf; +uint16_t device_id = MASK_EXTR(entry[0], IOMMU_CMD_DEVICE_ID_MASK); +uint16_t domain_id = MASK_EXTR(entry[1], IOMMU_EVENT_DOMAIN_ID_MASK); +uint16_t flags = MASK_EXTR(entry[1], IOMMU_EVENT_FLAGS_MASK); +uint64_t addr = *(uint64_t *)(entry + 2); + +printk(XENLOG_ERR "AMD-Vi: %s: %04x:%02x:%02x.%u d%d addr %016"PRIx64 + " flags %#x%s%s%s%s%s%s%s%s%s%s\n", + code_str, iommu->seg, PCI_BUS(device_id), PCI_SLOT(device_id), + PCI_FUNC(device_id), domain_id, addr, flags, + (flags & 0xe00) ? " ??" : "", + (flags & 0x100) ? " TR" : "", + (flags & 0x080) ? " RZ" : "", + (flags & 0x040) ? " PE" : "", + (flags & 0x020) ? " RW" : "", + (flags & 0x010) ? " PR" : "", + (flags & 0x008) ? " I" : "", + (flags & 0x004) ? " US" : "", + (flags & 0x002) ? " NX" : "", + (flags & 0x001) ? " GN" : ""); for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ ) if ( get_dma_requestor_id(iommu->seg, bdf) == device_id ) diff --git a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h index 8ed9482791..53900cd60c 100644 --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h @@ -265,9 +265,6 @@ static inline uint32_t iommu_get_addr_hi_from_cmd(uint32_t cmd) IOMMU_CMD_ADDR_HIGH_SHIFT); } -/* access address field from event log entry */ -#define iommu_get_devid_from_event iommu_get_devid_from_cmd - /* access iommu base addresses field from mmio regs */ static inline void iommu_set_addr_lo_to_reg(uint32_t *reg, uint32_t addr) { -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] AMD/IOMMU: honour IR setting while pre-filling DTEs
On 26/11/2019 14:14, Jan Beulich wrote: > On 26.11.2019 13:25, Andrew Cooper wrote: >> On 26/11/2019 08:42, Jan Beulich wrote: >>> On 25.11.2019 22:05, Igor Druzhinin wrote: --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table( for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf ) dt[bdf] = (struct amd_iommu_dte){ .v = true, - .iv = true, + .iv = iommu_intremap, >>> This was very intentionally "true", and ignoring "iommu_intremap": >> Deliberate or not, it is a regression from 4.12. > I accept it's a regression (which wants fixing), but I don't think > this is the way to address is. I could be convinced by good > arguments, though. > >> Booting with iommu=no-intremap is a common debugging technique, and that >> means no interrupt remapping anywhere in the system, even for >> supposedly-unused DTEs. > Whether IV=1 or IV=0, there's no interrupt _remapping_ with this > option specified. There's some interrupt _blocking_, yes. It's > not immediately clear to me whether this is a good or a bad thing. You're attempting to argue semantics. Blocking is a special case remapping. "iommu=no-intremap" (for better or worse, naming wise) refers to the interrupt mediation functionality in the IOMMU, and means "don't use any of it". Any other behaviour is a regression. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 6/7] livepatch-build: Strip transient or unneeded symbols
On 11/26/19 12:25 PM, Pawel Wieczorkiewicz wrote: > In the process of creating a final hotpatch module file make sure to > strip all transient symbols that have not been caught and removed by > create-diff-object processing. For now these are only the hooks > kpatch load/unload symbols. > > For all new object files that are carried along for the final linking > the transient hooks symbols are not stripped and neither are any > unneeded symbols. Strip the transient hooks symbols explicitly from > resulting object file. > Add a new option '--strip' to additionally strip all unneeded symbols > from new object files. > > Signed-off-by: Pawel Wieczorkiewicz > --- Reviewed-by: Ross Lagerwall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] AMD/IOMMU: honour IR setting while pre-filling DTEs
On 26.11.2019 15:24, Igor Druzhinin wrote: > On 26/11/2019 14:14, Jan Beulich wrote: >> On 26.11.2019 13:25, Andrew Cooper wrote: >>> On 26/11/2019 08:42, Jan Beulich wrote: On 25.11.2019 22:05, Igor Druzhinin wrote: > --- a/xen/drivers/passthrough/amd/iommu_init.c > +++ b/xen/drivers/passthrough/amd/iommu_init.c > @@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table( > for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf ) > dt[bdf] = (struct amd_iommu_dte){ >.v = true, > - .iv = true, > + .iv = iommu_intremap, This was very intentionally "true", and ignoring "iommu_intremap": >>> >>> Deliberate or not, it is a regression from 4.12. >> >> I accept it's a regression (which wants fixing), but I don't think >> this is the way to address is. I could be convinced by good >> arguments, though. > > Do you have any suggestions how to address that? I'd like to reply in the other context, after a little more thinking about the situation. I think I see an oversight of mine. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] MAINTAINERS: Add mandatory V: version identifier
On 11/26/19 1:11 PM, Pawel Wieczorkiewicz wrote: > The livepatch-build-tools MAINTAINERS file is missing V: version > identifier. This seems required by the Xen repo's add_maintainers.pl > script. > > Signed-off-by: Pawel Wieczorkiewicz > --- Acked-by: Ross Lagerwall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state
On 26.11.2019 15:14, Ian Jackson wrote: > George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable > state"): >> =item B [I<-r>] I > ... >> +Make the device at PCI Bus/Device/Function BDF not assignable to >> +guests. This will at least unbind the device from pciback, and >> +re-assign it from the "quarantine domain" back to domain 0. If the -r >> +option is specified, it will also attempt to re-bind the device to its >> +original driver, making it usable by Domain 0 again. If the device is >> +not bound to pciback, it will return success. >> + >> +Note that this functionality will work even for devices which were not >> +made assignable by B. This can be used to allow >> +dom0 to access devices which were automatically quarantined by Xen >> +after domain destruction as a result of Xen's B >> +command-line default. > > What are the security implications of doing this if the device might > still be doing DMA or something ? Devices get reset in between, so well behaving ones should not still be doing DMA at that point. Misbehaving ones would better not be assigned (back and forth) anyway. But a recent patch of Paul's suggests that people still wish to do so, on the assumption that such DMA will drain sufficiently quickly. > (For that matter, presumably there are security implications of > assigning the same device in sequence to different guests?) Right. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state
On 11/26/19 2:14 PM, Ian Jackson wrote: > George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable > state"): >> =item B [I<-r>] I > ... >> +Make the device at PCI Bus/Device/Function BDF not assignable to >> +guests. This will at least unbind the device from pciback, and >> +re-assign it from the "quarantine domain" back to domain 0. If the -r >> +option is specified, it will also attempt to re-bind the device to its >> +original driver, making it usable by Domain 0 again. If the device is >> +not bound to pciback, it will return success. >> + >> +Note that this functionality will work even for devices which were not >> +made assignable by B. This can be used to allow >> +dom0 to access devices which were automatically quarantined by Xen >> +after domain destruction as a result of Xen's B >> +command-line default. > > What are the security implications of doing this if the device might > still be doing DMA or something ? Then the device might scribble over any memory dom0 has access to. Function-level reset will theoretically stop this, but fundamentally we have to consider it unreliable in the general case. Same thing for assigning to a different guest. I kind of feel like the discussion of the security risks inherent in pci passthrough belong in a separate document, but perhaps a brief mention here would be helpful. Perhaps the following? "As always, this should only be done if you trust the guest, or are confident that the particular device you're re-assigning to dom0 will cancel all in-flight DMA on FLR." -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] AMD/IOMMU: honour IR setting while pre-filling DTEs
On 26/11/2019 14:14, Jan Beulich wrote: > On 26.11.2019 13:25, Andrew Cooper wrote: >> On 26/11/2019 08:42, Jan Beulich wrote: >>> On 25.11.2019 22:05, Igor Druzhinin wrote: --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table( for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf ) dt[bdf] = (struct amd_iommu_dte){ .v = true, - .iv = true, + .iv = iommu_intremap, >>> This was very intentionally "true", and ignoring "iommu_intremap": >> >> Deliberate or not, it is a regression from 4.12. > > I accept it's a regression (which wants fixing), but I don't think > this is the way to address is. I could be convinced by good > arguments, though. Do you have any suggestions how to address that? >> Booting with iommu=no-intremap is a common debugging technique, and that >> means no interrupt remapping anywhere in the system, even for >> supposedly-unused DTEs. > > Whether IV=1 or IV=0, there's no interrupt _remapping_ with this > option specified. There's some interrupt _blocking_, yes. It's > not immediately clear to me whether this is a good or a bad thing. From user point of view, if I supply "iommu=no-intremap" I'm not expecting any interrupts in the system to be blocked either. And as Andrew said we frequently use this option for debugging which means we expect this functionality to be off completely. Igor ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state
On 26.11.19 15:10, George Dunlap wrote: Changesets 319f9a0ba9 ("passthrough: quarantine PCI devices") and ba2ab00bbb ("IOMMU: default to always quarantining PCI devices") introduced PCI device "quarantine" behavior, but did not document how the pci-assignable-add and -remove functions act in regard to this. Rectify this. Signed-off-by: George Dunlap Release-acked-by: Juergen Gross Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable-smoke test] 144307: tolerable all pass - PUSHED
flight 144307 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/144307/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 8c79c129a6db2220c1089e0ce5fa49e7298b1d3e baseline version: xen 77beba7c921a286c31a2a76f26500047f353614a Last test of basis 144294 2019-11-25 11:00:30 Z1 days Testing same since 144307 2019-11-26 11:03:51 Z0 days1 attempts People who touched revisions under test: George Dunlap Jan Beulich jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 77beba7c92..8c79c129a6 8c79c129a6db2220c1089e0ce5fa49e7298b1d3e -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] AMD/IOMMU: honour IR setting while pre-filling DTEs
On 26.11.2019 13:25, Andrew Cooper wrote: > On 26/11/2019 08:42, Jan Beulich wrote: >> On 25.11.2019 22:05, Igor Druzhinin wrote: >>> --- a/xen/drivers/passthrough/amd/iommu_init.c >>> +++ b/xen/drivers/passthrough/amd/iommu_init.c >>> @@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table( >>> for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf ) >>> dt[bdf] = (struct amd_iommu_dte){ >>>.v = true, >>> - .iv = true, >>> + .iv = iommu_intremap, >> This was very intentionally "true", and ignoring "iommu_intremap": > > Deliberate or not, it is a regression from 4.12. I accept it's a regression (which wants fixing), but I don't think this is the way to address is. I could be convinced by good arguments, though. > Booting with iommu=no-intremap is a common debugging technique, and that > means no interrupt remapping anywhere in the system, even for > supposedly-unused DTEs. Whether IV=1 or IV=0, there's no interrupt _remapping_ with this option specified. There's some interrupt _blocking_, yes. It's not immediately clear to me whether this is a good or a bad thing. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state
George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable state"): > =item B [I<-r>] I ... > +Make the device at PCI Bus/Device/Function BDF not assignable to > +guests. This will at least unbind the device from pciback, and > +re-assign it from the "quarantine domain" back to domain 0. If the -r > +option is specified, it will also attempt to re-bind the device to its > +original driver, making it usable by Domain 0 again. If the device is > +not bound to pciback, it will return success. > + > +Note that this functionality will work even for devices which were not > +made assignable by B. This can be used to allow > +dom0 to access devices which were automatically quarantined by Xen > +after domain destruction as a result of Xen's B > +command-line default. What are the security implications of doing this if the device might still be doing DMA or something ? (For that matter, presumably there are security implications of assigning the same device in sequence to different guests?) Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state
On Tue, Nov 26, 2019 at 02:10:26PM +, George Dunlap wrote: > Changesets 319f9a0ba9 ("passthrough: quarantine PCI devices") and > ba2ab00bbb ("IOMMU: default to always quarantining PCI devices") > introduced PCI device "quarantine" behavior, but did not document how > the pci-assignable-add and -remove functions act in regard to this. > Rectify this. > > Signed-off-by: George Dunlap LGTM. Acked-by: Wei Liu ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state
Changesets 319f9a0ba9 ("passthrough: quarantine PCI devices") and ba2ab00bbb ("IOMMU: default to always quarantining PCI devices") introduced PCI device "quarantine" behavior, but did not document how the pci-assignable-add and -remove functions act in regard to this. Rectify this. Signed-off-by: George Dunlap --- Release justification: This brings documentation into line with the actual code that will be released. CC: Ian Jackson CC: Wei Liu CC: Jan Beulich CC: Paul Durrant CC: Juergen Gross --- docs/man/xl.1.pod.in | 27 ++- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in index 2303b81e4f..372c229244 100644 --- a/docs/man/xl.1.pod.in +++ b/docs/man/xl.1.pod.in @@ -1589,10 +1589,12 @@ backend driver in domain 0 rather than a real driver. =item B I Make the device at PCI Bus/Device/Function BDF assignable to guests. -This will bind the device to the pciback driver. If it is already -bound to a driver, it will first be unbound, and the original driver -stored so that it can be re-bound to the same driver later if desired. -If the device is already bound, it will return success. +This will bind the device to the pciback driver and assign it to the +"quarantine domain". If it is already bound to a driver, it will +first be unbound, and the original driver stored so that it can be +re-bound to the same driver later if desired. If the device is +already bound, it will assign it to the quarantine domain and return +success. CAUTION: This will make the device unusable by Domain 0 until it is returned with pci-assignable-remove. Care should therefore be taken @@ -1602,11 +1604,18 @@ being used. =item B [I<-r>] I -Make the device at PCI Bus/Device/Function BDF not assignable to guests. This -will at least unbind the device from pciback. If the -r option is specified, -it will also attempt to re-bind the device to its original driver, making it -usable by Domain 0 again. If the device is not bound to pciback, it will -return success. +Make the device at PCI Bus/Device/Function BDF not assignable to +guests. This will at least unbind the device from pciback, and +re-assign it from the "quarantine domain" back to domain 0. If the -r +option is specified, it will also attempt to re-bind the device to its +original driver, making it usable by Domain 0 again. If the device is +not bound to pciback, it will return success. + +Note that this functionality will work even for devices which were not +made assignable by B. This can be used to allow +dom0 to access devices which were automatically quarantined by Xen +after domain destruction as a result of Xen's B +command-line default. =item B I I -- 2.24.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] MAINTAINERS: Update path to the livepatch documentation
On 26.11.2019 14:30, Julien Grall wrote: > Commit d661611d08 "docs/markdown: Switch to using pandoc, and fix > underscore escaping" converted the livepatch documentation from markdown > to pandoc. > > Update MAINTAINERS to reflect the change so the correct maintainers are > CCed to the patches. > > Fixes: d661611d08 ("docs/markdown: Switch to using pandoc, and fix underscore > escaping") > Signed-off-by: Julien Grall Acked-by: Jan Beulich ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-4.11-testing test] 144300: tolerable FAIL - PUSHED
flight 144300 xen-4.11-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/144300/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass version targeted for testing: xen 0eb99bf90b64737c5ba6adaa46951127dcf150cc baseline version: xen 006b2041242129896fbd30135b3dc6f575894a07 Last test of basis 144025 2019-11-11 17:36:00 Z 14 days Failing since144058 2019-11-12 18:05:56 Z 13 days 24 attempts Testing same since 144300 2019-11-25 20:37:05 Z0 days1 attempts People who touched revisions under test: Andrew Cooper Anthony
Re: [Xen-devel] [PATCH] domain_create: honour global grant/maptrack frame limits...
On 11/26/19 1:26 PM, Durrant, Paul wrote: >> -Original Message- >> From: George Dunlap >> Sent: 26 November 2019 12:32 >> To: Paul Durrant ; Durrant, Paul >> Cc: xen-devel ; Stefano Stabellini >> ; Julien Grall ; Wei Liu >> ; Konrad Rzeszutek Wilk ; George >> Dunlap ; Andrew Cooper >> ; Ian Jackson ; Jan >> Beulich >> Subject: Re: [Xen-devel] [PATCH] domain_create: honour global >> grant/maptrack frame limits... >> >> On 11/26/19 11:30 AM, Paul Durrant wrote: >>> On Wed, 13 Nov 2019 at 13:55, Paul Durrant wrote: ...when their values are larger than the per-domain configured limits. Signed-off-by: Paul Durrant --- Cc: Andrew Cooper Cc: George Dunlap Cc: Ian Jackson Cc: Jan Beulich Cc: Julien Grall Cc: Konrad Rzeszutek Wilk Cc: Stefano Stabellini Cc: Wei Liu After mining through commits it is still unclear to me exactly when Xen stopped honouring the global values, but I really think this commit >> should be back-ported to stable trees as it was a behavioural change that can cause domUs to fail in non-obvious ways. >>> >>> Any other opinions on this? AFAICT questions is still open: >>> >>> - Do we consider not honouring the command line values to be a >>> regression (since domUs that would have worked before will no longer >>> work after a basic upgrade of Xen)? >> >> This would be a bit easier to form a "policy" opinion on (or perhaps >> alternate solutions to) if more of the situation were outlined here. >> >> Is the problem that the per-domain config is always set, and doesn't >> take the hypervisor-set config into account? Wouldn't it be better to >> modify the toolstack to use the hypervisor value if it's not set? >> >> In fact, it looks kind of like things are screwed up anyway -- the >> "default" value of max_grant_frames, if no value is specified, is set in >> xl.c. If that were the behavior we wanted, it should be set in libxl.c. >> >> But it doesn't seem like it should be terribly difficult to get a "use >> the default" sentinel value passed in to Xen, such that: >> >> 1. People who don't do anything will get the default currently specified >> in xl.c >> >> 2. People who set the value on the Xen command-line and don't set >> anything in the guest config file will get the Xen command-line value >> >> 3. People who set the value in the config file will get the value they >> specified (regardless of the global setting). >> >> Is that the behaviour you'd like to see, Paul? > > I think the order should be: > > If set in xl.cfg => use that, else > If set in xl.conf => use that, else > Use the command line/default value > > I.e. the ultimate value should be set in Xen (and possibly overridden by the > command line) and not hardcoded at any other layer. > > There is also the issue of limits but I guess the rationale there should be: > If a value *is* specified then it should not exceed the value set in Xen. > > Does that sound right? So part of the issue here sounds like a terminology issue. Is it the case that there's a default "max", and you want to raise the default "max"; is that right? But the documentation actually says: "Specify the maximum number of frames which any domain may use as part of its grant table." Which makes it sound a lot more like a "maximum max" -- i.e., that any domain which is created with a value higher than this should fail. -George ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Xen-unstable + linux 5.4.0-rc8: RIP: 0010:xennet_poll+0x35f/0xae0
On 25/11/2019 15:42, Jan Beulich wrote: > On 25.11.2019 15:21, Sander Eikelenboom wrote: >> L.S., >> >> At present one of my PVH VM's kernel crashed with the splat below >> (haven't seen it before, so could be something that happens sporadically). >> >> Any ideas ? >> >> -- >> Sander >> >> >> >> database databaselogin: login: [184503.428811] general protection fault: >> [#1] SMP NOPTI >> [184503.428887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted >> 5.4.0-rc8-20191123-doflr-mac80211debug+ #1 >> [184503.428932] RIP: 0010:xennet_poll+0x35f/0xae0 >> [184503.428955] Code: ba 00 01 00 00 48 8b 8d c0 00 00 00 0f b7 b4 24 92 00 >> 00 00 48 8b 5c 24 78 3d 00 01 00 00 0f 4e d0 89 55 28 8b 95 bc 00 00 00 <89> >> 74 11 3c 48 8b 8d c0 00 00 00 8b 95 bc 00 00 00 89 44 11 38 89 > > The insn here being "mov %esi,(%rcx,%rdx,0x3c)" ... > >> [184503.429027] RSP: 0018:c9003e10 EFLAGS: 00010287 >> [184503.429049] RAX: 0042 RBX: c9003e88 RCX: >> fffe88800b865a80 > > ... I notice corruption to bit 48 of RCX here. This can be a result of > memory corruption, but prior instances of such that I had to look into > were bit flips in the CPU instead. Is this a server or desktop class > CPU? > > Jan Hi Jan, Fortunately (or more unfortunate for me), memtest86 gave errors on one stick of memory. So this is the probable cause. Sorry for the noise. -- Sander ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests
Hi, On 26/11/2019 01:20, Stefano Stabellini wrote: On Mon, 25 Nov 2019, Julien Grall wrote: (+ Andre) On 23/11/2019 20:35, Julien Grall wrote: Hi, On 15/11/2019 20:10, Stewart Hildebrand wrote: Allow vgic_get_hw_irq_desc to be called with a vcpu argument. Use vcpu argument in vgic_connect_hw_irq. vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with ASSERTs. Signed-off-by: Stewart Hildebrand --- v3: new patch --- Note: I have only modified the old vgic to allow delivery of PPIs. The new vGIC should also be modified to support delivery of PPIs. diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index 82f524a35c..c3933c2687 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -410,10 +410,10 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n) irq_set_affinity(p->desc, cpumask_of(v_target->processor)); spin_lock_irqsave(>desc->lock, flags); /* - * The irq cannot be a PPI, we only support delivery of SPIs - * to guests. + * The irq cannot be a SGI, we only support delivery of SPIs + * and PPIs to guests. */ - ASSERT(irq >= 32); + ASSERT(irq >= NR_SGIS); We usually put ASSERT() in place we know that code wouldn't be able to work correctly if there ASSERT were hit. In this particular case: if ( irq_type_set_by_domain(d) ) gic_set_irq_type(p->desc, vgic_get_virq_type(v, n, i)); 1) We don't want to allow any domain (including Dom0) to modify the interrupt type (i.e. level/edge) for PPIs as this is shared. You will also most likely need to modify the counterpart in setup_guest_irq(). p->desc->handler->enable(p->desc); 2) On GICv3, the re-distributor of vCPU A is accessible by vCPU B. So vCPU B could enable the SGI for vCPU A. But this would be called on the wrong pCPU leading to inconsistency between the hardware state of the internal vGIC state. Is it actually meant to work from a GIC specification perspective? It sounds "wrong" somehow to me, but I went through the spec and it doesn't say explicitly that cpuB couldn't enable a SGI/PPI of cpuA. I am still a bit shocked by this revelation. To be honest, I can see reason to allow this but this is a different subject. In this case the re-distributor is per-CPU and can accessible by any CPU. For instance, Linux will access it to find the re-distributor associated to a given CPU at boot. FWIW, the vGIC implementation in KVM handles it the same way. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel