[Xen-devel] [PATCH v1] psr: fix bug which may cause crash

2019-11-26 Thread Yi Sun
During test, we found a crash on Xen with below trace.
(XEN) Xen call trace:
(XEN)[] R psr.c#l3_cdp_write_msr+0x1e/0x22
(XEN)[] F psr.c#do_write_psr_msrs+0x6d/0x109
(XEN)[] F smp_call_function_interrupt+0x5a/0xac
(XEN)[] F call_function_interrupt+0x20/0x34
(XEN)[] F do_IRQ+0x175/0x6ae
(XEN)[] F common_interrupt+0x10a/0x120
(XEN)[] F cpu_idle.c#acpi_idle_do_entry+0x9d/0xb1
(XEN)[] F cpu_idle.c#acpi_processor_idle+0x41d/0x626
(XEN)[] F domain.c#idle_loop+0xa5/0xa7
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 20:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=]
(XEN) 

Root cause is that the cache of COS registers are not initialized
for CAT/CDP which have non-zero default value. That causes invalid
write to MSR when COS id has exceeded the max number.. So fix it by
initializing the cache.

Signed-off-by: Yi Sun 
---
 xen/arch/x86/psr.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 5866a26..d3e7467 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -316,6 +316,7 @@ static bool cat_init_feature(const struct cpuid_leaf *regs,
 [FEAT_TYPE_L3_CDP] = "L3 CDP",
 [FEAT_TYPE_L2_CAT] = "L2 CAT",
 };
+unsigned int i = 0;
 
 /* No valid value so do not enable feature. */
 if ( !regs->a || !regs->d )
@@ -332,7 +333,8 @@ static bool cat_init_feature(const struct cpuid_leaf *regs,
 return false;
 
 /* We reserve cos=0 as default cbm (all bits within cbm_len are 1). */
-feat->cos_reg_val[0] = cat_default_val(feat->cat.cbm_len);
+for(i = 0; i < MAX_COS_REG_CNT; i++)
+feat->cos_reg_val[i] = cat_default_val(feat->cat.cbm_len);
 
 wrmsrl((type == FEAT_TYPE_L3_CAT ?
 MSR_IA32_PSR_L3_MASK(0) :
@@ -352,8 +354,11 @@ static bool cat_init_feature(const struct cpuid_leaf *regs,
 feat->cos_max = (feat->cos_max - 1) >> 1;
 
 /* We reserve cos=0 as default cbm (all bits within cbm_len are 1). */
-get_cdp_code(feat, 0) = cat_default_val(feat->cat.cbm_len);
-get_cdp_data(feat, 0) = cat_default_val(feat->cat.cbm_len);
+for(i = 0; i < MAX_COS_REG_CNT/2; i++)
+{
+get_cdp_code(feat, i) = cat_default_val(feat->cat.cbm_len);
+get_cdp_data(feat, i) = cat_default_val(feat->cat.cbm_len);
+}
 
 wrmsrl(MSR_IA32_PSR_L3_MASK(0), cat_default_val(feat->cat.cbm_len));
 wrmsrl(MSR_IA32_PSR_L3_MASK(1), cat_default_val(feat->cat.cbm_len));
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range

2019-11-26 Thread Jürgen Groß

On 27.11.19 01:01, Julien Grall wrote:

Hi,

On 26/11/2019 23:17, Stefano Stabellini wrote:

On Tue, 26 Nov 2019, Julien Grall wrote:

Hi,

On 26/11/2019 20:43, Stefano Stabellini wrote:

+ Juergen

I missed that you weren't in CC to the original patch, sorry.
I think this patch should go in, as otherwise Linux 5.4 could run into
problems. It is also a pretty straightforward 4 lines patch.


5.5 (or 5.6) is not going to run on Xen for other reasons (still in the
vGIC)... So I would not view this as critical.


5.5 is not out yet, in fact, the dev window has just opened. Isn't your
statement a bit premature?


The GICv4.1 work [1] is going to prevent Linux booting on all current 
versions of Xen. While I can't confirm this is going to be merged in 
5.5, I can tell you this will break.




In any case, even if potential future Linux releases could have other
additional issues, I don't think it should change our current view on
this specific issue which affects 5.4, just released.


The patch is definitely not as straightforward as you may think. Please 
refer to the discussion we had on the first version. I voiced concern 
about this approach and gave point what could go wrong with happen.


This patch may be better than the current state (i.e crashing), but this 
wasn't tested enough to confirm this is the correct things to do and no 
other bug will appear (I don't believe reading I*ACTIVER was ever tested 
before).


It is an annoying bug, but this is only affecting 5.4 which has just 
been released. It feels to me this is a fairly risky choice to merge it 
qutie late in the release without a good graps of the problem (see above).


So I would definitly, prefer if this patch is getting through backport 
once we get more testing.


We can still document the bug in the release note and point people to 
the patch.


Anyway, this is Juergen choice here. But at least now he has the full 
picture...


Cheers,

[1] https://lwn.net/Articles/800494/



Thanks, Julien, for sharing your opinion.

With that statement I'd like to defer this patch to 4.14.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-26 Thread Jürgen Groß

On 26.11.19 18:17, George Dunlap wrote:

Xen used to have single, system-wide limits for the number of grant
frames and maptrack frames a guest was allowed to create.  Increasing
or decreasing this single limit on the Xen command-line would change
the limit for all guests on the system.

Later, per-domain limits for these values was created.  The
system-wide limits became strict limits: domains could not be created
with higher limits, but could be created with lower limits.

However, the change also introduced a range of different "default"
values into various places in the toolstack:

- The python libxc bindings hard-coded these values to 32 and 1024,
   respectively

- The libxl default values are 32 and 1024 respectively.

- xl will use the libxl default for maptrack, but does its own default
   calculation for grant frames: either 32 or 64, based on the max
   possible mfn.

These defaults interact poorly with the hypervisor command-line limit:

- The hypervisor command-line limit cannot be used to raise the limit
   for all guests anymore, as the default in the toolstack will
   effectively override this.

- If you use the hypervisor command-line limit to *reduce* the limit,
   then the "default" values generated by the toolstack are too high,
   and all guest creations will fail.

In other words, the toolstack defaults require any change to be
effected by having the admin explicitly specify a new value in every
guest.

In order to address this, have grant_table_init treat '0' values for
max_grant_frames and max_maptrack_frames as instructions to use the
system-wide default.  Have all the above toolstacks default to passing
0 unless a different value is explicitly given.

This restores the old behavior, that changing the hypervisor
command-line option can change the behavior for all guests, while
retaining the ability to set per-guest values.  It also removes the
bug that *reducing* the system-wide max will cause all domains without
explicit limits to fail.

(The ocaml bindings require the caller to always specify a value, and
the code to start a xenstored stubdomain hard-codes these to 4 and 128
respectively; these will not be addressed here.)

Signed-off-by: George Dunlap 
---
Release justification: This is an observed regression (albeit one that
has spanned several releases now).

Compile-tested only.

NB this patch could be applied without the whitespace fixes (perhaps
with some fix-ups); it's just easier since my editor strips trailing
whitespace out automatically.

CC: Ian Jackson 
CC: Wei Liu 
CC: Andrew Cooper 
CC: Jan Beulich 
CC: Paul Durrant 
CC: Julien Grall 
CC: Konrad Rzeszutek Wilk 
CC: Stefano Stabellini 
CC: Juergen Gross 
CC: Marek Marczykowski-Górecki 
---
  tools/libxl/libxl.h   |  4 ++--
  tools/python/xen/lowlevel/xc/xc.c |  2 --
  tools/xl/xl.c | 12 ++--
  xen/common/grant_table.c  |  7 +++
  xen/include/public/domctl.h   |  6 --
  5 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 49b56fa1a3..1648d337e7 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -364,8 +364,8 @@
   */
  #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1
  
-#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32

-#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024
+#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 0
+#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 0


I'd rather use -1 for the "not specified" value. This allows to set e.g.
the maptrack frames to 0 for non-driver domains.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v2] AMD/IOMMU: honour IR setting while pre-filling DTEs

2019-11-26 Thread Jürgen Groß

On 26.11.19 18:08, Igor Druzhinin wrote:

IV bit shouldn't be set in DTE if interrupt remapping is not
enabled. It's a regression in behavior of "iommu=no-intremap"
option which otherwise would keep interrupt requests untranslated
for all of the devices in the system regardless of wether it's
described as valid in IVRS or not.

Signed-off-by: Igor Druzhinin 


Release-acked-by: Juergen Gross 


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v3 1/2] x86/vmx: add ASSERT to prevent syncing PIR to IRR...

2019-11-26 Thread Tian, Kevin
> From: Jan Beulich 
> Sent: Wednesday, November 27, 2019 12:59 AM
> 
> On 26.11.2019 17:47, Roger Pau Monné  wrote:
> > On Tue, Nov 26, 2019 at 05:32:04PM +0100, Jan Beulich wrote:
> >> On 26.11.2019 14:26, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >>> @@ -2054,6 +2054,19 @@ static void vmx_sync_pir_to_irr(struct vcpu
> *v)
> >>>  unsigned int group, i;
> >>>  DECLARE_BITMAP(pending_intr, NR_VECTORS);
> >>>
> >>> +if ( v != current && !atomic_read(>pause_count) )
> >>> +{
> >>> +/*
> >>> + * Syncing PIR to IRR must not be done behind the back of the 
> >>> CPU,
> >>> + * since the IRR is controlled by the hardware when the vCPU is
> >>> + * executing. Only allow Xen to do such sync if the vCPU is the
> current
> >>> + * one or if it's paused: that's required in order to sync the 
> >>> lapic
> >>> + * state before saving it.
> >>> + */
> >>
> >> Is this stated this way by the SDM anywhere?
> >
> > No, I think the SDM is not very clear on this, there's a paragraph
> > about PIR:
> >
> > "The logical processor performs a logical-OR of PIR into VIRR and
> > clears PIR. No other agent can read or write a PIR bit (or group of
> > bits) between the time it is read (to determine what to OR into VIRR)
> > and when it is cleared."
> 
> Well, this is about PIR, but my question was rather towards the
> effects on vIRR.
> 
> >> I ask because the
> >> comment then really doesn't apply to just this function, but to
> >> vlapic_{,test_and_}{set,clear}_vector() more generally. It's
> >> not clear to me at all whether the CPU caches (in an incoherent
> >> fashion) IRR (and maybe other APIC page elements), rather than
> >> honoring the atomic updates these macros do.
> >
> > IMO syncing PIR to IRR when the vCPU is running on a different pCPU is
> > likely to at least defeat the purpose of posted interrupts:
> 
> I agree here.
> 
> > when the
> > CPU receives the posted interrupt vector it won't see the
> > outstanding-notification bit in the posted-interrupt descriptor
> > because the sync done from a different pCPU would have cleared it, at
> > which point it's not clear to me that the processor will check vIRR
> > for pending interrupts. The description in section 29.6
> > POSTED-INTERRUPT PROCESSING doesn't explicitly mention whether the
> > value of the outstanding-notification bit affects the logic of posted
> > interrupt processing.

I think the outstanding-notification is one-off checked for triggering 
interrupt posting process. Once the process starts, there is no need to 
look at it again. The step 3 of posting process in 29.6 clearly says:

"The processor clears the outstanding-notification bit in the posted-
interrupt descriptor. This is done atomically so as to leave the remainder 
of the descriptor unmodified (e.g., with a locked AND operation)."

But regardless of the hardware behavior, I think it's safe to restrict
sync_pir_to_irr as this patch does.

> 
> But overall this again is all posted interrupt centric when my
> question was about vIRR, in particular whether the asserting you
> add may need to be even more rigid.
> 
> Anyway, let's see what the VMX maintainers have to say.
> 

There is one paragraph in 29.6:

"Use of the posted-interrupt descriptor differs from that of other data 
structures that are referenced by pointers in a VMCS. There is a general 
requirement that software ensure that each such data structure is 
modified only when no logical processor with a current VMCS that 
references it is in VMX non-root operation. That requirement does
not apply to the posted-interrupt descriptor. There is a requirement, 
however, that such modifications be done using locked read-modify-write 
instructions."

virtual-APIC page is pointer-referenced by VMCS, thus it falls into above
general requirement. But I suppose there should be some exception with
this page too, otherwise the point of posted interrupt is killed (if we have
to kick the dest vcpu into root to update the vIRR). Let me confirm
internally.

Thanks
Kevin
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 for 4.13] x86/microcode: refuse to load the same revision ucode

2019-11-26 Thread Chao Gao
On Tue, Nov 26, 2019 at 03:41:53PM +, Sergey Dyasli wrote:
>Currently if a user tries to live-load the same or older ucode revision
>than CPU already has, he will get a single message in Xen log like:
>
>(XEN) 128 cores are to update their microcode
>
>No actual ucode loading will happen and this situation can be quite
>confusing. Fix this by starting ucode update only when the provided
>ucode revision is higher than the currently cached one (if any).
>This is based on the property that if microcode_cache exists, all CPUs
>in the system should have at least that ucode revision.
>
>Additionally, print a user friendly message if no matching or newer
>ucode can be found in the provided blob. This also requires ignoring
>-ENODATA in AMD-side code, otherwise the message given to the user is:
>
>(XEN) Parsing microcode blob error -61
>
>Which actually means that a ucode blob was parsed fine, but no matching
>ucode was found.
>
>Signed-off-by: Sergey Dyasli 
>---
>v2 --> v3:
>- move ucode comparison to generic code
>- ignore -ENODATA in a different code section
>
>v1 --> v2:
>- compare provided ucode with the currently cached one
>
>CC: Jan Beulich 
>CC: Andrew Cooper 
>CC: Roger Pau Monné 
>CC: Chao Gao 
>CC: Juergen Gross 
>---
> xen/arch/x86/microcode.c | 19 +++
> xen/arch/x86/microcode_amd.c |  7 +++
> 2 files changed, 26 insertions(+)
>
>diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>index 65d1f41e7c..44efc2d9b3 100644
>--- a/xen/arch/x86/microcode.c
>+++ b/xen/arch/x86/microcode.c
>@@ -640,10 +640,29 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) 
>buf, unsigned long len)
> 
> if ( !patch )
> {
>+printk(XENLOG_WARNING "microcode: couldn't find any matching ucode in 
>"
>+  "the provided blob!\n");
> ret = -ENOENT;
> goto put;
> }
> 
>+/*
>+ * If microcode_cache exists, all CPUs in the system should have at least
>+ * that ucode revision.
>+ */
>+spin_lock(_mutex);
>+if ( microcode_cache &&
>+ microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE )
>+{
>+spin_unlock(_mutex);
>+printk(XENLOG_WARNING "microcode: couldn't find any newer revision "
>+  "in the provided blob!\n");

The patch needs to be freed.

With it fixed,
Reviewed-by: Chao Gao 

Thanks
Chao

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v3 2/2] x86/vmx: always sync PIR to IRR before vmentry

2019-11-26 Thread Tian, Kevin
> From: Roger Pau Monne 
> Sent: Tuesday, November 26, 2019 9:27 PM
> 
> When using posted interrupts on Intel hardware it's possible that the
> vCPU resumes execution with a stale local APIC IRR register because
> depending on the interrupts to be injected vlapic_has_pending_irq
> might not be called, and thus PIR won't be synced into IRR.
> 
> Fix this by making sure PIR is always synced to IRR in
> hvm_vcpu_has_pending_irq regardless of what interrupts are pending.
> 
> While there also simplify the code in __vmx_deliver_posted_interrupt:
> only raise a softirq if the vCPU is the one currently running and
> __vmx_deliver_posted_interrupt is called from interrupt context. The

as commented earlier, this is what exactly original code does. Then
what is the simplification?

> softirq is raised to make sure vmx_intr_assist is retried if the
> interrupt happens to arrive after vmx_intr_assist but before
> interrupts are disabled in vmx_do_vmentry. Also simplify the logic for
> IPIing other pCPUs, there's no need to check v->processor since the
> IPI should be sent as long as the vCPU is not the current one and it's
> running.
> 
> Reported-by: Joe Jin 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Juergen Gross 
> ---
> Changes since v2:
>  - Raise a softirq if in interrupt context and the vCPU is the current
>one.
>  - Use is_running instead of runnable.
>  - Remove the call to vmx_sync_pir_to_irr in vmx_intr_assist and
>instead always call vlapic_has_pending_irq in
>hvm_vcpu_has_pending_irq.
> ---
>  xen/arch/x86/hvm/irq.c |  7 +++--
>  xen/arch/x86/hvm/vmx/vmx.c | 64 +++---
>  2 files changed, 24 insertions(+), 47 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
> index e03a87ad50..b50ac62a16 100644
> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -515,7 +515,11 @@ void hvm_set_callback_via(struct domain *d,
> uint64_t via)
>  struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
>  {
>  struct hvm_domain *plat = >domain->arch.hvm;
> -int vector;
> +/*
> + * Always call vlapic_has_pending_irq so that PIR is synced into IRR when
> + * using posted interrupts.
> + */
> +int vector = vlapic_has_pending_irq(v);
> 
>  if ( unlikely(v->nmi_pending) )
>  return hvm_intack_nmi;
> @@ -530,7 +534,6 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct
> vcpu *v)
>  if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
>  return hvm_intack_pic(0);
> 
> -vector = vlapic_has_pending_irq(v);
>  if ( vector != -1 )
>  return hvm_intack_lapic(vector);
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index c817aec75d..4dea868cda 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1945,57 +1945,31 @@ static void vmx_process_isr(int isr, struct vcpu
> *v)
> 
>  static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>  {
> -bool_t running = v->is_running;
> -
>  vcpu_unblock(v);
> -/*
> - * Just like vcpu_kick(), nothing is needed for the following two cases:
> - * 1. The target vCPU is not running, meaning it is blocked or runnable.
> - * 2. The target vCPU is the current vCPU and we're in non-interrupt
> - * context.
> - */
> -if ( running && (in_irq() || (v != current)) )
> -{
> +if ( v->is_running && v != current )
>  /*
> - * Note: Only two cases will reach here:
> - * 1. The target vCPU is running on other pCPU.
> - * 2. The target vCPU is the current vCPU.
> + * If the vCPU is running on another pCPU send an IPI to the pCPU.
> When
> + * the IPI arrives, the target vCPU may be running in non-root mode,
> + * running in root mode, runnable or blocked. If the target vCPU is
> + * running in non-root mode, the hardware will sync PIR to vIRR for
> + * 'posted_intr_vector' is a special vector handled directly by the
> + * hardware.
>   *
> - * Note2: Don't worry the v->processor may change. The vCPU being
> - * moved to another processor is guaranteed to sync PIR to vIRR,
> - * due to the involved scheduling cycle.
> + * If the target vCPU is running in root-mode, the interrupt handler
> + * starts to run. Considering an IPI may arrive in the window between
> + * the call to vmx_intr_assist() and interrupts getting disabled, the
> + * interrupt handler should raise a softirq to ensure events will be
> + * delivered in time.

I prefer to original comment which covers all possible conditions that the
target vcpu might be. You may help improve it if some words are not
well-written, but removing useful information is not good there.

>   */
> -unsigned int cpu = v->processor;
> -
> -/*
> - * For case 1, we send an IPI to the pCPU. When an IPI arrives, the
> - * 

Re: [Xen-devel] [PATCH for-4.13] x86/vmx: always sync PIR to IRR before vmentry

2019-11-26 Thread Tian, Kevin
> From: Roger Pau Monné 
> Sent: Monday, November 18, 2019 10:56 PM
> 
> On Mon, Nov 18, 2019 at 03:19:50PM +0100, Jan Beulich wrote:
> > On 18.11.2019 15:03, Roger Pau Monné  wrote:
> > > On Mon, Nov 18, 2019 at 01:26:46PM +0100, Jan Beulich wrote:
> > >> On 18.11.2019 11:16, Roger Pau Monne wrote:
> > >>> @@ -1954,48 +1952,28 @@ static void
> __vmx_deliver_posted_interrupt(struct vcpu *v)
> > >>>   * 2. The target vCPU is the current vCPU and we're in 
> > >>> non-interrupt
> > >>>   * context.
> > >>>   */
> > >>> -if ( running && (in_irq() || (v != current)) )
> > >>> -{
> > >>> +if ( vcpu_runnable(v) && v != current )
> > >>
> > >> I'm afraid you need to be more careful with the running vs runnable
> > >> distinction here. The comment above here becomes stale with the
> > >> change (also wrt the removal of in_irq(), which I'm at least uneasy
> > >> about), and the new commentary below also largely says/assumes
> > >> "running", not "runnable".
> > >
> > > I've missed to fix that comment, will take care in the next version.
> > > Note also that the comment is quite pointless, it only states what the
> > > code below is supposed to do, but doesn't give any reasoning as to why
> > > in_irq is relevant here.
> >
> > It's main "value" is to refer to vcpu_kick(), which has ...
> >
> > > TBH I'm not sure of the point of the in_irq check, I don't think it's
> > > relevant for the code here.
> >
> > ... a similar in_irq() check. Sadly that one, while having a bigger
> > comment, also doesn't explain what it's needed for. It looks like I
> > should recall the reason, but I'm sorry - I don't right now.
> 
> By reading the message of the commit that introduced the in_irq check
> in vcpu_kick:
> 
> "The drawback is that {vmx,svm}_intr_assist() now races new event
> notifications delivered by IRQ or IPI. We close down this race by
> having vcpu_kick() send a dummy softirq -- this gets picked up in
> IRQ-sage context and will cause retry of *_intr_assist(). We avoid
> delivering the softirq where possible by avoiding it when we are
> running in the non-IRQ context of the VCPU to be kicked."
> 
> AFAICT in the vcpu_kick case this is done because the softirq should
> only be raised when in IRQ context in order to trigger the code in
> vmx_do_vmentry to retry the call to vmx_intr_assist (this is relevant
> if vcpu_kick is issued from an irq handler executed after
> vmx_intr_assist and before the disabling interrupts in
> vmx_do_vmentry.
> 
> I think we need something along the lines of:
> 
> if ( v->is_running && v != current )
> send_IPI_mask(cpumask_of(v->processor), posted_intr_vector);
> else if ( v == current && in_irq() && !softirq_pending(smp_processor_id()) )
> raise_softirq(VCPU_KICK_SOFTIRQ);

Then what's the difference from original logic?

> 
> So that vmx_intr_assist is also retried if a vector is signaled in PIR
> on the vCPU currently running between the call to vmx_intr_assist and
> the disabling of interrupts in vmx_do_vmentry.
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] x86/vmx: always sync PIR to IRR before vmentry

2019-11-26 Thread Tian, Kevin
> From: Roger Pau Monné 
> Sent: Thursday, November 21, 2019 5:26 PM
> 
> On Mon, Nov 18, 2019 at 05:00:29PM +0100, Jan Beulich wrote:
> > On 18.11.2019 15:20, Roger Pau Monné  wrote:
> > > On Mon, Nov 18, 2019 at 03:00:00PM +0100, Jan Beulich wrote:
> > >> On 18.11.2019 14:46, Roger Pau Monné  wrote:
> > >>> On Mon, Nov 18, 2019 at 01:01:58PM +0100, Jan Beulich wrote:
> >  On 18.11.2019 11:16, Roger Pau Monne wrote:
> > > When using posted interrupts on Intel hardware it's possible that the
> > > vCPU resumes execution with a stale local APIC IRR register because
> > > depending on the interrupts to be injected vlapic_has_pending_irq
> > > might not be called, and thus PIR won't be synced into IRR.
> > >
> > > Fix this by making sure PIR is always synced to IRR in vmx_intr_assist
> > > regardless of what interrupts are pending.
> > 
> >  For this part, did you consider pulling ahead to the beginning
> >  of hvm_vcpu_has_pending_irq() its call to vlapic_has_pending_irq()?
> > >>>
> > >>> I assumed the order in hvm_vcpu_has_pending_irq is there for a
> reason.
> > >>> I could indeed move vlapic_has_pending_irq to the top, but then either
> > >>> the result is discarded if for example a NMI is pending injection
> > >>> (in which case there's no need to go through all the logic in
> > >>> vlapic_has_pending_irq), or we invert the priority of event
> > >>> injection.
> > >>
> > >> Changing the order of events injected is not an option afaict. The
> > >> pointless processing done is a valid concern, yet the suggestion
> > >> was specifically to have (part of) this processing to occur early.
> > >> The discarding of the result, in turn, is not a problem afaict, as
> > >> a subsequent call will return the same result (unless a higher
> > >> priority interrupt has surfaced in the meantime).
> > >
> > > Yes, that's fine. So you would prefer to move the call to
> > > vlapic_has_pending_irq before any exit path in
> > > hvm_vcpu_has_pending_irq?
> >
> > "Prefer" isn't really the way I would put it. I'd like this to be
> > considered as an alternative because, as said, I think the current
> > placement look more like a plaster than a cure. I'm also open for
> > other suggestions. But first of all I'd like to see what the VMX
> > maintainers think.
> 
> Kevin/Jun, can we please get your opinion on the above item?
> 

putting the sync within hvm_vcpu_has_pending_irq sounds better,
implying that all intermediate states must be synced back to 
architectural states anytime when software wants to check virtual
interrupt.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests

2019-11-26 Thread Julien Grall
On Tue, 26 Nov 2019, 23:18 Stefano Stabellini, 
wrote:

> On Fri, 15 Nov 2019, Stewart Hildebrand wrote:
> > Allow vgic_get_hw_irq_desc to be called with a vcpu argument.
> >
> > Use vcpu argument in vgic_connect_hw_irq.
> >
> > vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with
> > ASSERTs.
> >
> > Signed-off-by: Stewart Hildebrand 
> >
> > ---
> > v3: new patch
> >
> > ---
> > Note: I have only modified the old vgic to allow delivery of PPIs.
> > ---
> >  xen/arch/arm/gic-vgic.c | 24 
> >  xen/arch/arm/vgic.c |  6 +++---
> >  2 files changed, 19 insertions(+), 11 deletions(-)
> >
> > diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
> > index 98c021f1a8..2c66a8fa92 100644
> > --- a/xen/arch/arm/gic-vgic.c
> > +++ b/xen/arch/arm/gic-vgic.c
> > @@ -418,7 +418,7 @@ struct irq_desc *vgic_get_hw_irq_desc(struct domain
> *d, struct vcpu *v,
> >  {
> >  struct pending_irq *p;
> >
> > -ASSERT(!v && virq >= 32);
> > +ASSERT((!v && (virq >= 32)) || (!d && v && (virq >= 16) && (virq <
> 32)));
>
> I don't think !d is necessary for this to work as intended so I would
> limit the ASSERT to
>
>   ASSERT((!v && (virq >= 32)) || (v && (virq >= 16) && (virq < 32)));
>
> the caller can always pass v->domain
>

But then you have the risk to run into d != v->domain. So at least with the
ASSERT you document the expectation.

Cheers,


>
> >  if ( !v )
> >  v = d->vcpu[0];
>
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range

2019-11-26 Thread Julien Grall

Hi,

On 26/11/2019 23:17, Stefano Stabellini wrote:

On Tue, 26 Nov 2019, Julien Grall wrote:

Hi,

On 26/11/2019 20:43, Stefano Stabellini wrote:

+ Juergen

I missed that you weren't in CC to the original patch, sorry.
I think this patch should go in, as otherwise Linux 5.4 could run into
problems. It is also a pretty straightforward 4 lines patch.


5.5 (or 5.6) is not going to run on Xen for other reasons (still in the
vGIC)... So I would not view this as critical.


5.5 is not out yet, in fact, the dev window has just opened. Isn't your
statement a bit premature?


The GICv4.1 work [1] is going to prevent Linux booting on all current 
versions of Xen. While I can't confirm this is going to be merged in 
5.5, I can tell you this will break.




In any case, even if potential future Linux releases could have other
additional issues, I don't think it should change our current view on
this specific issue which affects 5.4, just released.


The patch is definitely not as straightforward as you may think. Please 
refer to the discussion we had on the first version. I voiced concern 
about this approach and gave point what could go wrong with happen.


This patch may be better than the current state (i.e crashing), but this 
wasn't tested enough to confirm this is the correct things to do and no 
other bug will appear (I don't believe reading I*ACTIVER was ever tested 
before).


It is an annoying bug, but this is only affecting 5.4 which has just 
been released. It feels to me this is a fairly risky choice to merge it 
qutie late in the release without a good graps of the problem (see above).


So I would definitly, prefer if this patch is getting through backport 
once we get more testing.


We can still document the bug in the release note and point people to 
the patch.


Anyway, this is Juergen choice here. But at least now he has the full 
picture...


Cheers,

[1] https://lwn.net/Articles/800494/

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [qemu-mainline test] 144305: tolerable FAIL - PUSHED

2019-11-26 Thread osstest service owner
flight 144305 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144305/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 144297

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10  fail blocked in 144297
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144297
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 144297
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144297
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 144297
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144297
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass

version targeted for testing:
 qemuu65e05c82bdc6d348155e301c9d87dba7a08a5701
baseline version:
 qemuu122e6d2a9c1bf8aa1d51409c15809a82621515b1

Last test of basis   144297  2019-11-25 15:06:18 Z1 days
Testing same since   144305  2019-11-26 05:17:32 Z0 days1 attempts


People who touched revisions under test:
  Fangrui Song 
  Marc-André Lureau 
  Markus Armbruster 
  Michael S. Tsirkin 
  Peter Maydell 
  Qi, Yadong 
  Zhang, Qi 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64 

Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range

2019-11-26 Thread Stefano Stabellini
On Tue, 26 Nov 2019, Julien Grall wrote:
> Hi,
> 
> On 26/11/2019 20:43, Stefano Stabellini wrote:
> > + Juergen
> > 
> > I missed that you weren't in CC to the original patch, sorry.
> > I think this patch should go in, as otherwise Linux 5.4 could run into
> > problems. It is also a pretty straightforward 4 lines patch.
> 
> 5.5 (or 5.6) is not going to run on Xen for other reasons (still in the
> vGIC)... So I would not view this as critical.

5.5 is not out yet, in fact, the dev window has just opened. Isn't your
statement a bit premature?

In any case, even if potential future Linux releases could have other
additional issues, I don't think it should change our current view on
this specific issue which affects 5.4, just released.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC XEN PATCH v3 10/11] xen: arm: context switch vtimer PPI state.

2019-11-26 Thread Stefano Stabellini
On Mon, 25 Nov 2019, Julien Grall wrote:
> On 15/11/2019 20:14, Stewart Hildebrand wrote:
> > From: Ian Campbell 
> > 
> > ... instead of artificially masking the timer interrupt in the timer
> > state and relying on the guest to unmask (which it isn't required to
> > do per the h/w spec, although Linux does).
> > 
> > By using the newly added hwppi save/restore functionality we make use
> > of the GICD I[SC]ACTIVER registers to save and restore the active
> > state of the interrupt, which prevents the nested invocations which
> > the current masking works around.
> > 
> > Signed-off-by: Ian Campbell 
> > Signed-off-by: Stewart Hildebrand 
> > ---
> > v2: Rebased, in particular over Julien's passthrough stuff which
> >  reworked a bunch of IRQ related stuff.
> >  Also largely rewritten since precursor patches now lay very
> >  different groundwork.
> > 
> > v3: Address feedback from v2 [1]:
> >* Remove virt_timer_irqs performance counter since it is now unused.
> >* Add caveat to comment about not using I*ACTIVER register.
> >* HACK: don't initialize pending_irq->irq in vtimer for new vGIC to
> >  allows us to build with CONFIG_NEW_VGIC=y
> > 
> > [1]
> > https://lists.xenproject.org/archives/html/xen-devel/2015-11/msg01065.html
> > ---
> > 
> > Note: Regarding Stefano's comment in [2], I did test it with the call
> > to gic_hwppi_set_pending removed, and I was able to boot dom0.
> 
> When dealing with the vGIC, testing is not enough to justify the removal of
> some code. We need a worded justification of why it is (or not) necessary.
> 
> In this case the timer is level (despite some broken HW misconfiguring it), so
> by removing set_pending() you don't affect anything as restoring the timer
> registers will automatically mark the interrupt pending.
> 
> > 
> > [2]
> > https://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02683.html
> > ---
> >   xen/arch/arm/time.c  | 26 ++
> >   xen/arch/arm/vtimer.c| 45 +---
> >   xen/include/asm-arm/domain.h |  1 +
> >   xen/include/asm-arm/perfc_defn.h |  1 -
> >   4 files changed, 44 insertions(+), 29 deletions(-)
> > 
> > diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
> > index 739bcf186c..e3a23b8e16 100644
> > --- a/xen/arch/arm/time.c
> > +++ b/xen/arch/arm/time.c
> > @@ -243,28 +243,6 @@ static void timer_interrupt(int irq, void *dev_id,
> > struct cpu_user_regs *regs)
> >   }
> >   }
> >   -static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs
> > *regs)
> > -{
> > -/*
> > - * Edge-triggered interrupts can be used for the virtual timer. Even
> > - * if the timer output signal is masked in the context switch, the
> > - * GIC will keep track that of any interrupts raised while IRQS are
> > - * disabled. As soon as IRQs are re-enabled, the virtual interrupt
> > - * will be injected to Xen.
> > - *
> > - * If an IDLE vCPU was scheduled next then we should ignore the
> > - * interrupt.
> > - */
> > -if ( unlikely(is_idle_vcpu(current)) )
> > -return;
> > -
> > -perfc_incr(virt_timer_irqs);
> > -
> > -current->arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0);
> > -WRITE_SYSREG32(current->arch.virt_timer.ctl | CNTx_CTL_MASK,
> > CNTV_CTL_EL0);
> > -vgic_inject_irq(current->domain, current, current->arch.virt_timer.irq,
> > true);
> > -}
> > -
> >   /*
> >* Arch timer interrupt really ought to be level triggered, since the
> >* design of the timer/comparator mechanism is based around that
> > @@ -304,8 +282,8 @@ void init_timer_interrupt(void)
> > request_irq(timer_irq[TIMER_HYP_PPI], 0, timer_interrupt,
> >   "hyptimer", NULL);
> > -request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt,
> > -   "virtimer", NULL);
> > +route_hwppi_to_current_vcpu(timer_irq[TIMER_VIRT_PPI], "virtimer");
> > +
> >   request_irq(timer_irq[TIMER_PHYS_NONSECURE_PPI], 0, timer_interrupt,
> >   "phytimer", NULL);
> >   diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
> > index e6aebdac9e..6e3498952d 100644
> > --- a/xen/arch/arm/vtimer.c
> > +++ b/xen/arch/arm/vtimer.c
> > @@ -55,9 +55,19 @@ static void phys_timer_expired(void *data)
> >   static void virt_timer_expired(void *data)
> >   {
> >   struct vtimer *t = data;
> > -t->ctl |= CNTx_CTL_MASK;
> > -vgic_inject_irq(t->v->domain, t->v, t->irq, true);
> > -perfc_incr(vtimer_virt_inject);
> > +t->ctl |= CNTx_CTL_PENDING;
> 
> I don't think this is necessary. If the software timer fire, then the virtual
> timer (in HW) would have fired too. So by restoring the timer, then the HW
> should by itself set the pending bit and trigger the interrupt.
> 
> > +if ( !(t->ctl & CNTx_CTL_MASK) )
> 
> The timer is only set if the virtual timer is enabled and not masked. So I
> think this check is unnecessary as we could never reached 

Re: [Xen-devel] [RFC XEN PATCH v3 08/11] xen: arm: vgic: don't fail if IRQ is already connected

2019-11-26 Thread Stefano Stabellini
On Fri, 15 Nov 2019, Stewart Hildebrand wrote:
> There are some IRQs that happen to have multiple "interrupts = < ... >;"
> properties with the same IRQ in the device tree. For example:
> 
> interrupts = <0 123 4>,
>  <0 123 4>,
>  <0 123 4>,
>  <0 123 4>,
>  <0 123 4>;
> 
> In this case it seems that we are invoking vgic_connect_hw_irq multiple
> times for the same IRQ.
> 
> Rework the checks to allow booting in this scenario.
> 
> I have not seen any cases where the pre-existing p->desc is any different from
> the new desc, so BUG() out if they're different for now.
> 
> Signed-off-by: Stewart Hildebrand 
> 
> ---
> v3: new patch
> 
> I tested on Xilinx Zynq UltraScale+ with the old vGIC. I have not fully
> tested with CONFIG_NEW_VGIC. This hack only became necessary after
> introducing the PPI series, and I'm not entirely sure what the reason
> is for that.

I think the reason is actually very simple: with the previous code if
the irq was already setup and the details matched it would "goto out"
all the way out of route_irq_to_guest.

Now with the new code, it would "goto out" of setup_guest_irq returning
zero, which means that gic_route_irq_to_guest is actually going to be
called anyway, which is a mistake. I think we want to avoid that by
returning an appropriate error condition from setup_guest_irq so that we
also return early from route_irq_to_guest.


> I'm also unsure if BUG()ing out is the right thing to do in case of
> desc != p->desc, or what conditions would even trigger this? Is this
> function exposed to guests?

I think the original code printed a warning and returned an error.
That's probably still what we want.



> ---
>  xen/arch/arm/gic-vgic.c  | 9 +++--
>  xen/arch/arm/vgic/vgic.c | 4 
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
> index 2c66a8fa92..5c16e66b32 100644
> --- a/xen/arch/arm/gic-vgic.c
> +++ b/xen/arch/arm/gic-vgic.c
> @@ -460,9 +460,14 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu 
> *v, unsigned int virq,
>  if ( connect )
>  {
>  /* The VIRQ should not be already enabled by the guest */
> -if ( !p->desc &&
> - !test_bit(GIC_IRQ_GUEST_ENABLED, >status) )
> +if ( !test_bit(GIC_IRQ_GUEST_ENABLED, >status) )
> +{
> +if (p->desc && p->desc != desc)

Code style


> +{
> +BUG();
> +}
>  p->desc = desc;
> +}
>  else
>  ret = -EBUSY;
>  }
> diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
> index f0f2ea5021..aa775f7668 100644
> --- a/xen/arch/arm/vgic/vgic.c
> +++ b/xen/arch/arm/vgic/vgic.c
> @@ -882,6 +882,10 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu 
> *vcpu,
>  irq->hw = true;
>  irq->hwintid = desc->irq;
>  }
> +else if ( irq->hw && !irq->enabled && irq->hwintid == desc->irq )
> +{
> +/* The IRQ was already connected. No action is necessary. */
> +}
>  else
>  ret = -EBUSY;
>  }


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests

2019-11-26 Thread Stefano Stabellini
On Fri, 15 Nov 2019, Stewart Hildebrand wrote:
> Allow vgic_get_hw_irq_desc to be called with a vcpu argument.
> 
> Use vcpu argument in vgic_connect_hw_irq.
> 
> vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with
> ASSERTs.
> 
> Signed-off-by: Stewart Hildebrand 
> 
> ---
> v3: new patch
> 
> ---
> Note: I have only modified the old vgic to allow delivery of PPIs.
> ---
>  xen/arch/arm/gic-vgic.c | 24 
>  xen/arch/arm/vgic.c |  6 +++---
>  2 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
> index 98c021f1a8..2c66a8fa92 100644
> --- a/xen/arch/arm/gic-vgic.c
> +++ b/xen/arch/arm/gic-vgic.c
> @@ -418,7 +418,7 @@ struct irq_desc *vgic_get_hw_irq_desc(struct domain *d, 
> struct vcpu *v,
>  {
>  struct pending_irq *p;
>  
> -ASSERT(!v && virq >= 32);
> +ASSERT((!v && (virq >= 32)) || (!d && v && (virq >= 16) && (virq < 32)));

I don't think !d is necessary for this to work as intended so I would
limit the ASSERT to

  ASSERT((!v && (virq >= 32)) || (v && (virq >= 16) && (virq < 32)));

the caller can always pass v->domain


>  if ( !v )
>  v = d->vcpu[0];

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [XEN PATCH v3 05/11] xen: arm: add interfaces to save/restore the state of a PPI.

2019-11-26 Thread Stefano Stabellini
On Fri, 15 Nov 2019, Stewart Hildebrand wrote:
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index f3f3fb7d7f..c3f4cd5069 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -34,6 +34,17 @@ enum domain_type {
>  /* The hardware domain has always its memory direct mapped. */
>  #define is_domain_direct_mapped(d) ((d) == hardware_domain)
>  
> +struct hwppi_state {
> +/* h/w state */
> +unsigned irq;

It doesn't look like we need to save the irq number again here.


> +unsigned long enabled:1;
> +unsigned long pending:1;
> +unsigned long active:1;
> +
> +/* Xen s/w state */
> +unsigned long inprogress:1;
> +};
> +
>  struct vtimer {
>  struct vcpu *v;
>  int irq;
> diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
> index 793d324b33..1164e0c7a6 100644
> --- a/xen/include/asm-arm/gic.h
> +++ b/xen/include/asm-arm/gic.h
> @@ -275,6 +275,26 @@ extern int gicv_setup(struct domain *d);
>  extern void gic_save_state(struct vcpu *v);
>  extern void gic_restore_state(struct vcpu *v);
>  
> +/*
> + * Save/restore the state of a single PPI which must be routed to
> + *  (that is, is defined to be injected to the current
> + * vcpu).
> + *
> + * We expect the device which use this PPI to be quiet while we
> + * save/restore.
> + *
> + * For instance we want to disable the timer before saving the state.
> + * Otherwise we will mess up the state.
> + */
> +struct hwppi_state;

It is a bit awkward to have to do this "redefine" struct hwppi_state
here. I know that the Xen headers file don't always include correctly,
but could we move the full definition of struct hwppi_state here?
domain.h is already including gic.h, so it should work?


> +extern void gic_hwppi_state_init(struct hwppi_state *s, unsigned irq);
> +extern void gic_hwppi_set_pending(struct hwppi_state *s);
> +extern void gic_save_and_mask_hwppi(struct vcpu *v, unsigned irq,
> +struct hwppi_state *s);
> +extern void gic_restore_hwppi(struct vcpu *v,
> +  const unsigned virq,
> +  const struct hwppi_state *s);
> +
>  /* SGI (AKA IPIs) */
>  enum gic_sgi {
>  GIC_SGI_EVENT_CHECK = 0,
> @@ -325,8 +345,10 @@ struct gic_hw_operations {
>  int (*init)(void);
>  /* Save GIC registers */
>  void (*save_state)(struct vcpu *);
> +void (*save_and_mask_hwppi)(struct irq_desc *desc, struct hwppi_state 
> *s);
>  /* Restore GIC registers */
>  void (*restore_state)(const struct vcpu *);
> +void (*restore_hwppi)(struct irq_desc *desc, const struct hwppi_state 
> *s);
>  /* Dump GIC LR register information */
>  void (*dump_state)(const struct vcpu *);
>  
> diff --git a/xen/include/asm-arm/irq.h b/xen/include/asm-arm/irq.h
> index e14001b5c6..3b37a21c06 100644
> --- a/xen/include/asm-arm/irq.h
> +++ b/xen/include/asm-arm/irq.h
> @@ -96,6 +96,8 @@ void irq_set_affinity(struct irq_desc *desc, const 
> cpumask_t *cpu_mask);
>   */
>  bool irq_type_set_by_domain(const struct domain *d);
>  
> +void irq_set_virq(struct irq_desc *desc, unsigned virq);
> +
>  #endif /* _ASM_HW_IRQ_H */
>  /*
>   * Local variables:

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests

2019-11-26 Thread Julien Grall



On 26/11/2019 22:36, Stefano Stabellini wrote:

On Mon, 25 Nov 2019, Julien Grall wrote:

On 23/11/2019 20:35, Julien Grall wrote:

Hi,

On 15/11/2019 20:10, Stewart Hildebrand wrote:

Allow vgic_get_hw_irq_desc to be called with a vcpu argument.

Use vcpu argument in vgic_connect_hw_irq.

vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with
ASSERTs.

Signed-off-by: Stewart Hildebrand 

---
v3: new patch

---
Note: I have only modified the old vgic to allow delivery of PPIs.


The new vGIC should also be modified to support delivery of PPIs.


diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 82f524a35c..c3933c2687 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -410,10 +410,10 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r,
int n)
   irq_set_affinity(p->desc, cpumask_of(v_target->processor));
   spin_lock_irqsave(>desc->lock, flags);
   /*
- * The irq cannot be a PPI, we only support delivery of SPIs
- * to guests.
+ * The irq cannot be a SGI, we only support delivery of SPIs
+ * and PPIs to guests.
    */
-    ASSERT(irq >= 32);
+    ASSERT(irq >= NR_SGIS);


We usually put ASSERT() in place we know that code wouldn't be able to work
correctly if there ASSERT were hit. In this particular case:


   if ( irq_type_set_by_domain(d) )
   gic_set_irq_type(p->desc, vgic_get_virq_type(v, n, i));


1) We don't want to allow any domain (including Dom0) to modify the
interrupt type (i.e. level/edge) for PPIs as this is shared. You will also
most likely need to modify the counterpart in setup_guest_irq().


   p->desc->handler->enable(p->desc);


2) On GICv3, the re-distributor of vCPU A is accessible by vCPU B. So vCPU B
could enable the SGI for vCPU A. But this would be called on the wrong pCPU
leading to inconsistency between the hardware state of the internal vGIC
state.


I thought a bit more of the issue over the week-end. The current vGIC is
fairly messy. I can see two solutions on how to solve this:
 1) Send an IPI to the pCPU where the vCPU A is running and disable/enable
the interrupt. The other side would need to the vCPU was actually running to
avoid disabling the PPI for the wrong pCPU
 2) Keep the HW interrupt always enabled

We propagated the enable/disable because of some messy part in the vGIC:
 - vgic_inject_irq() will not queue any pending interrupt if the vCPU is
offline. While interrupt cannot be delivered, we still need to keep them
pending as they will never occur again otherwise. This is because they are
active on the host side and the guest has no way to deactivate them.
 - Our implementation of PSCI CPU will remove all pending interrupts (see
vgic_clear_pending_irqs()). I am not entirely sure the implication here
because of the previous.

There are a probably more. Aside the issues with it, I don't really see good
advantage to propagate the interrupt state as the interrupts (PPIs, SPIs) have
active state. So they can only be received once until the guest actually
handles it.

So my preference would still be 2) because this makes the code simpler, avoid
IPI and other potential locking trouble.


Yes, I think that is a good suggestion. I take that you mean that in
vgic_disable_irqs for PPIs we would only clear GIC_IRQ_GUEST_ENABLED
then return basically, right?

Not really, I am only suggesting to remove the part

if ( desc != NULL )
  ...

But this change alone is not enough. It would require some modification 
in the rest of the vGIC (see my previous e-mail) and likely some 
investigation to understand the implication of keeping the interrupt 
enabled from the HW (I am a bit worry we may have backed this assumption 
into other part of the vGIC :().


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests

2019-11-26 Thread Stefano Stabellini
On Mon, 25 Nov 2019, Julien Grall wrote:
> On 23/11/2019 20:35, Julien Grall wrote:
> > Hi,
> > 
> > On 15/11/2019 20:10, Stewart Hildebrand wrote:
> > > Allow vgic_get_hw_irq_desc to be called with a vcpu argument.
> > > 
> > > Use vcpu argument in vgic_connect_hw_irq.
> > > 
> > > vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with
> > > ASSERTs.
> > > 
> > > Signed-off-by: Stewart Hildebrand 
> > > 
> > > ---
> > > v3: new patch
> > > 
> > > ---
> > > Note: I have only modified the old vgic to allow delivery of PPIs.
> > 
> > The new vGIC should also be modified to support delivery of PPIs.
> > 
> > > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > > index 82f524a35c..c3933c2687 100644
> > > --- a/xen/arch/arm/vgic.c
> > > +++ b/xen/arch/arm/vgic.c
> > > @@ -410,10 +410,10 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r,
> > > int n)
> > >   irq_set_affinity(p->desc, cpumask_of(v_target->processor));
> > >   spin_lock_irqsave(>desc->lock, flags);
> > >   /*
> > > - * The irq cannot be a PPI, we only support delivery of SPIs
> > > - * to guests.
> > > + * The irq cannot be a SGI, we only support delivery of SPIs
> > > + * and PPIs to guests.
> > >    */
> > > -    ASSERT(irq >= 32);
> > > +    ASSERT(irq >= NR_SGIS);
> > 
> > We usually put ASSERT() in place we know that code wouldn't be able to work
> > correctly if there ASSERT were hit. In this particular case:
> > 
> > >   if ( irq_type_set_by_domain(d) )
> > >   gic_set_irq_type(p->desc, vgic_get_virq_type(v, n, i));
> > 
> > 1) We don't want to allow any domain (including Dom0) to modify the
> > interrupt type (i.e. level/edge) for PPIs as this is shared. You will also
> > most likely need to modify the counterpart in setup_guest_irq().
> > 
> > >   p->desc->handler->enable(p->desc);
> > 
> > 2) On GICv3, the re-distributor of vCPU A is accessible by vCPU B. So vCPU B
> > could enable the SGI for vCPU A. But this would be called on the wrong pCPU
> > leading to inconsistency between the hardware state of the internal vGIC
> > state.
> 
> I thought a bit more of the issue over the week-end. The current vGIC is
> fairly messy. I can see two solutions on how to solve this:
> 1) Send an IPI to the pCPU where the vCPU A is running and disable/enable
> the interrupt. The other side would need to the vCPU was actually running to
> avoid disabling the PPI for the wrong pCPU
> 2) Keep the HW interrupt always enabled
> 
> We propagated the enable/disable because of some messy part in the vGIC:
> - vgic_inject_irq() will not queue any pending interrupt if the vCPU is
> offline. While interrupt cannot be delivered, we still need to keep them
> pending as they will never occur again otherwise. This is because they are
> active on the host side and the guest has no way to deactivate them.
> - Our implementation of PSCI CPU will remove all pending interrupts (see
> vgic_clear_pending_irqs()). I am not entirely sure the implication here
> because of the previous.
> 
> There are a probably more. Aside the issues with it, I don't really see good
> advantage to propagate the interrupt state as the interrupts (PPIs, SPIs) have
> active state. So they can only be received once until the guest actually
> handles it.
> 
> So my preference would still be 2) because this makes the code simpler, avoid
> IPI and other potential locking trouble.

Yes, I think that is a good suggestion. I take that you mean that in
vgic_disable_irqs for PPIs we would only clear GIC_IRQ_GUEST_ENABLED
then return basically, right?___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)

2019-11-26 Thread Roman Shaposhnik
On Tue, Nov 26, 2019 at 1:20 PM Rich Persaud  wrote:
>
> On Nov 26, 2019, at 15:23, Andrew Cooper  wrote:
>
>
> On 26/11/2019 20:12, Roman Shaposhnik wrote:
>
> On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki
>
>  wrote:
>
> On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote:
>
> Hi Marek, after applying Jan's patch I'm making much further progress.
>
> Xen boots fine and Dom0 seems to be OK (more tests are needed tho on
>
> my end).
>
>
> I'm attaching the logs from Xen and Dom0.
>
>
> At this point it seems that adding efi=attr=uc is a better option for
>
> these boxes than a wholesale efi=no-rs
>
>
> Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was
>
> supposed to cover by default (so I don't have to add efi=attr=uc)?
>
> No, this looks like some different firmware (?) issue.
>
>
> Question #2: is there any downside to *always* specifying efi=attr=uc?
>
> Even for servers that, strictly speaking, don't need it?
>
> TL;DR: It should be fine. It is what Linux does too.
>
>
> Details:
>
>
> Lets take a look why 'efi=attr=uc' helps, and how can we make it work
>
> out of the box:
>
>
> The issue is about memory marked as type=11 (EfiMemoryMappedIO) with
>
> attr=8000 (EFI_MEMORY_RUNTIME). Indeed none of cachability
>
> attribute is defined. For the record, defined attributes are (UEFI spec
>
> .6):
>
>
>EFI_MEMORY_UC Memory cacheability attribute: The memory region supports
>
>being configured as not cacheable.
>
>
>EFI_MEMORY_WC Memory cacheability attribute: The memory region supports
>
>being configured as write combining.
>
>
>EFI_MEMORY_WT Memory cacheability attribute: The memory region supports
>
>being configured as cacheable with a “write through” policy.
>
>Writes that hit in the cache will also be written to main memory.
>
>
>EFI_MEMORY_WB Memory cacheability attribute: The memory region supports
>
>being configured as cacheable with a “write back” policy. Reads
>
>and writes that hit in the cache do not propagate to main memory.
>
>Dirty data is written back to main memory when a new cache line
>
>is allocated.
>
>
>EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports
>
>being configured as not cacheable, exported, and supports the
>
>“fetch and add” semaphore mechanism.
>
>
> My reading of UEFI spec doesn't give much hints what to do with memory
>
> mappings without any cachability attribute. The only related info I've
>
> found is about EfiMemoryMappedIO:
>
>
>This memory is not used by the OS. All system memory-mapped IO
>
>information should come from ACPI tables.
>
>
> So, maybe there is some more info?
>
>
> Anyway, if I understand correctly, MMIO region should be mapped as UC,
>
> right?
>
>
> I've also taken look at what Linux does. And basically, the only bit
>
> Linux care about is EFI_MEMORY_WB - if it's absent, then set the region
>
> as uncachable (page cache disabled bit in page table entry). So,
>
> basically Linux by default does what Xen's efi=attr=uc does.
>
> Very interesting! Thanks for doing the research.
>
>
> So, to improve Xen's hardware/firmware compatibility, I have two ideas:
>
>
> 1. Make efi=attr=uc the default (it's still possible to disable it with
>
> efi=attr=no).
>
> I'd be very much in favor of that too (especially since it seems to match
>
> Linux behaviour) What do others think?
>
>
> Its more than just this.  Linux also doesn't use EFI reboot because it
> is broken almost everywhere (because Windows doesn't use it because its
> broken almost everywhere, so it never gets fixed).
>
> Xen should be following Linux, but I'm exhausted arguing this point.
>
> A consequence is that downstream tend to share a pile of "unbreak Xen on
> UEFI" patches which have been rejected upstream on philosophical rather
> than technical grounds, despite this being a toxic environment to work in.
>
>
> As an intermediate step, could we have an umbrella opt-in Kconfig option 
> (CONFIG_EFI_NONSPEC_COMPATIBILITY?) that enables multiple EFI options for 
> maximum hardware compatibility?  For this thread and Xen 4.13, that would be 
> EFI_SET_VIRTUAL_ADDRESS_MAP and efi=attr=uc.  If more options/quirks are 
> added in the future, downstreams using EFI_NONSPEC_COMPATIBILITY would get 
> them by default.

As one of those downstream users I have to say I like this A LOT!

> The long-term solution is an OSS virtualization-security test tool (e.g. with 
> Xen and QEMU KVM) that can be run by OEM/ODM QA factory teams on 
> pre-production firmware and hardware.  That is the most OEM-actionable 
> development window where firmware quality issues can be detected and fixed.  
> Microsoft's hardware logo/certification work with Windows 10 OEMs on "secured 
> core" features is also tackling firmware improvements for 
> virtualization-based security.

That's a good proposal, but the question, as always becomes who moves
the needle on this one so we avoid 

Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)

2019-11-26 Thread Rich Persaud
On Nov 26, 2019, at 15:23, Andrew Cooper  wrote:
> On 26/11/2019 20:12, Roman Shaposhnik wrote:
>>> On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki
>>>  wrote:
>>> On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote:
 Hi Marek, after applying Jan's patch I'm making much further progress.
 Xen boots fine and Dom0 seems to be OK (more tests are needed tho on
 my end).
 I'm attaching the logs from Xen and Dom0.
 At this point it seems that adding efi=attr=uc is a better option for
 these boxes than a wholesale efi=no-rs
 Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was
 supposed to cover by default (so I don't have to add efi=attr=uc)?
>>> No, this looks like some different firmware (?) issue.
 Question #2: is there any downside to *always* specifying efi=attr=uc?
 Even for servers that, strictly speaking, don't need it?
>>> TL;DR: It should be fine. It is what Linux does too.
>>> Details:
>>> Lets take a look why 'efi=attr=uc' helps, and how can we make it work
>>> out of the box:
>>> The issue is about memory marked as type=11 (EfiMemoryMappedIO) with
>>> attr=8000 (EFI_MEMORY_RUNTIME). Indeed none of cachability
>>> attribute is defined. For the record, defined attributes are (UEFI spec
>>> .6):
>>>EFI_MEMORY_UC Memory cacheability attribute: The memory region supports
>>>being configured as not cacheable.
>>>EFI_MEMORY_WC Memory cacheability attribute: The memory region supports
>>>being configured as write combining.
>>>EFI_MEMORY_WT Memory cacheability attribute: The memory region supports
>>>being configured as cacheable with a “write through” policy.
>>>Writes that hit in the cache will also be written to main memory.
>>>EFI_MEMORY_WB Memory cacheability attribute: The memory region supports
>>>being configured as cacheable with a “write back” policy. Reads
>>>and writes that hit in the cache do not propagate to main memory.
>>>Dirty data is written back to main memory when a new cache line
>>>is allocated.
>>>EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports
>>>being configured as not cacheable, exported, and supports the
>>>“fetch and add” semaphore mechanism.
>>> My reading of UEFI spec doesn't give much hints what to do with memory
>>> mappings without any cachability attribute. The only related info I've
>>> found is about EfiMemoryMappedIO:
>>>This memory is not used by the OS. All system memory-mapped IO
>>>information should come from ACPI tables.
>>> So, maybe there is some more info?
>>> Anyway, if I understand correctly, MMIO region should be mapped as UC,
>>> right?
>>> I've also taken look at what Linux does. And basically, the only bit
>>> Linux care about is EFI_MEMORY_WB - if it's absent, then set the region
>>> as uncachable (page cache disabled bit in page table entry). So,
>>> basically Linux by default does what Xen's efi=attr=uc does.
>> Very interesting! Thanks for doing the research.
>> 
>>> So, to improve Xen's hardware/firmware compatibility, I have two ideas:
>>> 1. Make efi=attr=uc the default (it's still possible to disable it with
>>> efi=attr=no).
>> I'd be very much in favor of that too (especially since it seems to match
>> Linux behaviour) What do others think?
> 
> Its more than just this.  Linux also doesn't use EFI reboot because it
> is broken almost everywhere (because Windows doesn't use it because its
> broken almost everywhere, so it never gets fixed).
> 
> Xen should be following Linux, but I'm exhausted arguing this point.
> 
> A consequence is that downstream tend to share a pile of "unbreak Xen on
> UEFI" patches which have been rejected upstream on philosophical rather
> than technical grounds, despite this being a toxic environment to work in.

As an intermediate step, could we have an umbrella opt-in Kconfig option 
(CONFIG_EFI_NONSPEC_COMPATIBILITY?) that enables multiple EFI options for 
maximum hardware compatibility?  For this thread and Xen 4.13, that would be 
EFI_SET_VIRTUAL_ADDRESS_MAP and efi=attr=uc.  If more options/quirks are added 
in the future, downstreams using EFI_NONSPEC_COMPATIBILITY would get them by 
default.

The long-term solution is an OSS virtualization-security test tool (e.g. with 
Xen and QEMU KVM) that can be run by OEM/ODM QA factory teams on pre-production 
firmware and hardware.  That is the most OEM-actionable development window 
where firmware quality issues can be detected and fixed.  Microsoft's hardware 
logo/certification work with Windows 10 OEMs on "secured core" features is also 
tackling firmware improvements for virtualization-based security. 

From the business side, Dell/HP/Lenovo + other OEMs and ODMs could add premium 
"FirmCare" SKUs into their custom build ordering systems, where customers could 
pay a small fee for additional firmware support, custom root-of-trust (e.g. 
BootGuard) key management, or even coreboot.  

[Xen-devel] [PATCH v2] xen/arm: remove physical timer offset

2019-11-26 Thread Jeff Kubascik
The physical timer traps apply an offset so that time starts at 0 for
the guest. However, this offset is not currently applied to the physical
counter. Per the ARMv8 Reference Manual (ARM DDI 0487E.a), section
D11.2.4 Timers, the "Offset" between the counter and timer should be
zero for a physical timer. This removes the offset to make the timer and
counter consistent.

Furthermore, section D11.2.4 specifies that the values in the TimerValue
view of the timers are signed in standard two's complement form. When
writing to the TimerValue register, it should be signed extended as
described by the equation

  CompareValue = (Counter[63:0] + SignExtend(TimerValue))[63:0]

Signed-off-by: Jeff Kubascik 
---
Changes in v2:
- Update commit message to specify reference manual version and section
- Change physical timer cval to hold hardware value
- Make sure to sign extend TimerValue on writes. This was done by first
  casting the r pointer to (int32_t *), dereferencing it, then casting
  to uint64_t. Please let me know if there is a more correct way to do
  this
---
 xen/arch/arm/vtimer.c| 21 +
 xen/include/asm-arm/domain.h |  3 ---
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
index e6aebdac9e..eb12a08acf 100644
--- a/xen/arch/arm/vtimer.c
+++ b/xen/arch/arm/vtimer.c
@@ -62,7 +62,6 @@ static void virt_timer_expired(void *data)
 
 int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config)
 {
-d->arch.phys_timer_base.offset = NOW();
 d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0);
 d->time_offset_seconds = ticks_to_ns(d->arch.virt_timer_base.offset - 
boot_count);
 do_div(d->time_offset_seconds, 10);
@@ -185,7 +184,7 @@ static bool vtimer_cntp_ctl(struct cpu_user_regs *regs, 
uint32_t *r, bool read)
 if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE )
 {
 set_timer(>arch.phys_timer.timer,
-  v->arch.phys_timer.cval + 
v->domain->arch.phys_timer_base.offset);
+  ticks_to_ns(v->arch.phys_timer.cval - boot_count));
 }
 else
 stop_timer(>arch.phys_timer.timer);
@@ -197,26 +196,25 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, 
uint32_t *r,
  bool read)
 {
 struct vcpu *v = current;
-s_time_t now;
+uint64_t cntpct;
 
 if ( !ACCESS_ALLOWED(regs, EL0PTEN) )
 return false;
 
-now = NOW() - v->domain->arch.phys_timer_base.offset;
+cntpct = get_cycles();
 
 if ( read )
 {
-*r = (uint32_t)(ns_to_ticks(v->arch.phys_timer.cval - now) & 
0xull);
+*r = (uint32_t)((v->arch.phys_timer.cval - cntpct) & 0xull);
 }
 else
 {
-v->arch.phys_timer.cval = now + ticks_to_ns(*r);
+v->arch.phys_timer.cval = cntpct + (uint64_t)(*((int32_t *)r));
 if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE )
 {
 v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING;
 set_timer(>arch.phys_timer.timer,
-  v->arch.phys_timer.cval +
-  v->domain->arch.phys_timer_base.offset);
+  ticks_to_ns(v->arch.phys_timer.cval - boot_count));
 }
 }
 return true;
@@ -232,17 +230,16 @@ static bool vtimer_cntp_cval(struct cpu_user_regs *regs, 
uint64_t *r,
 
 if ( read )
 {
-*r = ns_to_ticks(v->arch.phys_timer.cval);
+*r = v->arch.phys_timer.cval;
 }
 else
 {
-v->arch.phys_timer.cval = ticks_to_ns(*r);
+v->arch.phys_timer.cval = *r;
 if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE )
 {
 v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING;
 set_timer(>arch.phys_timer.timer,
-  v->arch.phys_timer.cval +
-  v->domain->arch.phys_timer_base.offset);
+  ticks_to_ns(v->arch.phys_timer.cval - boot_count));
 }
 }
 return true;
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 86ebdd2bcf..16a7150a95 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -65,9 +65,6 @@ struct arch_domain
 RELMEM_done,
 } relmem;
 
-struct {
-uint64_t offset;
-} phys_timer_base;
 struct {
 uint64_t offset;
 } virt_timer_base;
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range

2019-11-26 Thread Julien Grall

Hi,

On 26/11/2019 20:43, Stefano Stabellini wrote:

+ Juergen

I missed that you weren't in CC to the original patch, sorry.
I think this patch should go in, as otherwise Linux 5.4 could run into
problems. It is also a pretty straightforward 4 lines patch.


5.5 (or 5.6) is not going to run on Xen for other reasons (still in the 
vGIC)... So I would not view this as critical.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] bsp/xen: Update README

2019-11-26 Thread Jeff Kubascik
On 11/26/2019 3:34 PM, Julien Grall wrote:
> Hi,
> 
> On 26/11/2019 20:27, Jeff Kubascik wrote:
>> Add some background information on the BSP and instructions on how to
>> run the ticker application.
>>
>> Change-Id: I05050a335a938f00cc59bae69a014c5f04e05d23
>> ---
>>   bsps/arm/xen/README | 130 ++--
> 
> Hmmm what repo is it?

Whoops. This is the RTEMS port that I am working on. I must of did a git
send-email from the wrong directory. This will be making its way to rtems-devel
soon. Apologies!

> Cheers,
> 
>>   1 file changed, 64 insertions(+), 66 deletions(-)
>>
>> diff --git a/bsps/arm/xen/README b/bsps/arm/xen/README
>> index 1b24d84c9a..2ae2f2170d 100644
>> --- a/bsps/arm/xen/README
>> +++ b/bsps/arm/xen/README
>> @@ -1,66 +1,64 @@
>> -#  This is a sample hardware description file for a BSP.  This comment
>> -#  block does not have to appear in a real one.  The intention of this
>> -#  file is to provide a central place to look when searching for
>> -#  information about a board when starting a new BSP.  For example,
>> -#  you may want to find an existing timer driver for the chip you are
>> -#  using on your board.  It is easier to grep for the chip name in
>> -#  all of the HARDWARE files than to peruse the source tree.  Hopefully,
>> -#  making the HARDDWARE files accurate will also alleviate the common
>> -#  problem of not knowing anything about a board based on its BSP
>> -#  name.
>> -#
>> -#  NOTE:  If you have a class of peripheral chip on board which
>> -# is not in this list please add it to this file so
>> -# others will also use the same name.
>> -#
>> -# Timer resolution is the way it is configured in this BSP.
>> -# On a counting timer, this is the length of time which
>> -# corresponds to 1 count.
>> -#
>> -
>> -BSP NAME:   fastsbc1
>> -BOARD:  Fat Computers, Fast SBC-1
>> -BUS:SchoolBus
>> -CPU FAMILY: i386
>> -CPU:Intel Hexium
>> -COPROCESSORS:   Witch Hex87
>> -MODE:   32 bit mode
>> -
>> -DEBUG MONITOR:  HexBug
>> -
>> -PERIPHERALS
>> -===
>> -TIMERS: Intel i8254
>> -  RESOLUTION: .0001 microseconds
>> -SERIAL PORTS:   Zilog Z8530 (with 2 ports)
>> -REAL-TIME CLOCK:RTC-4
>> -DMA:Intel i8259
>> -VIDEO:  none
>> -SCSI:   none
>> -NETWORKING: none
>> -
>> -DRIVER INFORMATION
>> -==
>> -CLOCK DRIVER:   RTC-4
>> -IOSUPP DRIVER:  Zilog Z8530 port A
>> -SHMSUPP:polled and interrupts
>> -TIMER DRIVER:   Intel i8254
>> -TTY DRIVER: stub only
>> -
>> -STDIO
>> -=
>> -PORT:   Console port 0
>> -ELECTRICAL: RS-232
>> -BAUD:   9600
>> -BITS PER CHARACTER: 8
>> -PARITY: None
>> -STOP BITS:  1
>> -
>> -NOTES
>> -=
>> -
>> -(1) 900 Mhz and 950 Mhz versions.
>> -
>> -(2) 1 Gb or 2 Gb RAM.
>> -
>> -(3) PC compatible if HexBug not enabled.
>> +BSP for Xen on ARM
>> +
>> +Overview
>> +
>> +
>> +This BSP enables RTEMS to run as a guest virtual machine in AArch32 mode on 
>> the
>> +Xen hypervisor for ARMv8 platforms.
>> +
>> +Drivers:
>> +- Clock: ARMv7-AR Generic Timer
>> +- Console: Virtual PL011 device
>> +- Interrupt: GICv2
>> +
>> +BSP variants:
>> +- xen_virtual: completely virtualized guest with no dependence on underlying
>> +  hardware
>> +
>> +The xen_virtual BSP variant relies on standard Xen features, so it should be
>> +able to run on any ARMv8 platform.
>> +
>> +Xen allows for the passthrough of hardware peripherals to guest virtual
>> +machines. BSPs could be added in the future targeting specific hardware
>> +platforms and include the appropriate drivers.
>> +
>> +This BSP was tested with Xen running on the Xilinx Zynq UltraScale+ MPSoC 
>> using
>> +the Virtuosity distribution maintained by DornerWorks.
>> +
>> +Execution
>> +-
>> +
>> +This procedure describes how to run the ticker sample application that 
>> should
>> +already be built with the BSP.
>> +
>> +The `ticker.exe` file can be found in the BSP build tree at:
>> +
>> +  arm-rtems5/c/xen_virtual/testsuites/samples/ticker.exe
>> +
>> +The `ticker.exe` elf file must be translated to a binary format.
>> +
>> +  arm-rtems5-objcopy -O binary ticker.exe ticker.bin
>> +
>> +Then place the `ticker.bin` file on the dom0 filesystem.
>> +
>> +From the dom0 console, create a configuration file `ticker.cfg` with the
>> +following contents.
>> +
>> +  name = "ticker"
>> +  kernel = "ticker.bin"
>> +  memory = 8
>> +  vcpus = 1
>> +  gic_version = "v2"
>> +  vuart = "sbsa_uart"
>> +
>> +Create the virtual machine and attach to the virtual vpl011 console.
>> +
>> +  xl create ticker.cfg && xl console -t vuart ticker
>> +
>> +To return back to the dom0 console, press both `Ctrl` and `]` on your 
>> keyboard.
>> +
>> +Additional information
>> +--
>> +
>> 

Re: [Xen-devel] [PATCH V2] arch: arm: vgic-v3: fix GICD_ISACTIVER range

2019-11-26 Thread Stefano Stabellini
+ Juergen

I missed that you weren't in CC to the original patch, sorry.
I think this patch should go in, as otherwise Linux 5.4 could run into
problems. It is also a pretty straightforward 4 lines patch.



On Fri, 22 Nov 2019, Stefano Stabellini wrote:
> On Fri, 22 Nov 2019, Peng Fan wrote:
> > The end should be GICD_ISACTIVERN not GICD_ISACTIVER,
> > and also print a warning for the unhandled read.
> > 
> > Signed-off-by: Peng Fan 
> > ---
> > 
> > V2:
> >  Add a warning message
> > 
> >  xen/arch/arm/vgic-v3.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> > index 422b94f902..a15b9f6441 100644
> > --- a/xen/arch/arm/vgic-v3.c
> > +++ b/xen/arch/arm/vgic-v3.c
> > @@ -706,7 +706,10 @@ static int __vgic_v3_distr_common_mmio_read(const char 
> > *name, struct vcpu *v,
> >  goto read_as_zero;
> >  
> >  /* Read the active status of an IRQ via GICD/GICR is not supported */
> > -case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVER):
> > +case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
> > +printk(XENLOG_G_ERR "%pv: vGICD: unhandled read from 
> > ISACTIVER%d\n",
> > +   v, (reg - GICD_ISACTIVER) / 4);
> 
> All the other similar printks that we have in vgic-v3.c don't have the
> "/ 4", for instance:
> 
> case VRANGE32(GICD_ISACTIVER, GICD_ISACTIVERN):
> if ( dabt.size != DABT_WORD ) goto bad_width;
> printk(XENLOG_G_ERR
>"%pv: %s: unhandled word write %#"PRIregister" to 
> ISACTIVER%d\n",
>v, name, r, reg - GICD_ISACTIVER);
> 
> However, reg reflects the address of the register, so actually, the
> division by 4 looks correct if we want to get the index of the specific
> register. Thanks for spotting this. We'll need to do a clean-up in the
> file later.
> 
> Reviewed-by: Stefano Stabellini 
> 
> 
> 
> > +goto read_as_zero;
> >  case VRANGE32(GICD_ICACTIVER, GICD_ICACTIVERN):
> >  goto read_as_zero;


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [libvirt test] 144304: tolerable all pass - PUSHED

2019-11-26 Thread osstest service owner
flight 144304 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144304/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 144233
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 144233
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 12 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 13 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  9d6920bd7de3f92be1894790adeb689060ab25eb
baseline version:
 libvirt  5e939cea896fb3373a6f68f86e325c657429ed3d

Last test of basis   144233  2019-11-21 04:18:53 Z5 days
Failing since144244  2019-11-22 04:18:48 Z4 days5 attempts
Testing same since   144304  2019-11-26 04:19:14 Z0 days1 attempts


People who touched revisions under test:
  Christian Ehrhardt 
  Daniel P. Berrangé 
  Erik Skultety 
  Jamie Strandboge 
  Jiri Denemark 
  Ján Tomko 
  Laine Stump 
  Michal Privoznik 
  Pavel Mores 
  Peter Krempa 
  Pino Toscano 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/libvirt.git
   5e939cea89..9d6920bd7d  

Re: [Xen-devel] [PATCH v2] bsp/xen: Update README

2019-11-26 Thread Julien Grall

Hi,

On 26/11/2019 20:27, Jeff Kubascik wrote:

Add some background information on the BSP and instructions on how to
run the ticker application.

Change-Id: I05050a335a938f00cc59bae69a014c5f04e05d23
---
  bsps/arm/xen/README | 130 ++--


Hmmm what repo is it?

Cheers,


  1 file changed, 64 insertions(+), 66 deletions(-)

diff --git a/bsps/arm/xen/README b/bsps/arm/xen/README
index 1b24d84c9a..2ae2f2170d 100644
--- a/bsps/arm/xen/README
+++ b/bsps/arm/xen/README
@@ -1,66 +1,64 @@
-#  This is a sample hardware description file for a BSP.  This comment
-#  block does not have to appear in a real one.  The intention of this
-#  file is to provide a central place to look when searching for
-#  information about a board when starting a new BSP.  For example,
-#  you may want to find an existing timer driver for the chip you are
-#  using on your board.  It is easier to grep for the chip name in
-#  all of the HARDWARE files than to peruse the source tree.  Hopefully,
-#  making the HARDDWARE files accurate will also alleviate the common
-#  problem of not knowing anything about a board based on its BSP
-#  name.
-#
-#  NOTE:  If you have a class of peripheral chip on board which
-# is not in this list please add it to this file so
-# others will also use the same name.
-#
-# Timer resolution is the way it is configured in this BSP.
-# On a counting timer, this is the length of time which
-# corresponds to 1 count.
-#
-
-BSP NAME:   fastsbc1
-BOARD:  Fat Computers, Fast SBC-1
-BUS:SchoolBus
-CPU FAMILY: i386
-CPU:Intel Hexium
-COPROCESSORS:   Witch Hex87
-MODE:   32 bit mode
-
-DEBUG MONITOR:  HexBug
-
-PERIPHERALS
-===
-TIMERS: Intel i8254
-  RESOLUTION: .0001 microseconds
-SERIAL PORTS:   Zilog Z8530 (with 2 ports)
-REAL-TIME CLOCK:RTC-4
-DMA:Intel i8259
-VIDEO:  none
-SCSI:   none
-NETWORKING: none
-
-DRIVER INFORMATION
-==
-CLOCK DRIVER:   RTC-4
-IOSUPP DRIVER:  Zilog Z8530 port A
-SHMSUPP:polled and interrupts
-TIMER DRIVER:   Intel i8254
-TTY DRIVER: stub only
-
-STDIO
-=
-PORT:   Console port 0
-ELECTRICAL: RS-232
-BAUD:   9600
-BITS PER CHARACTER: 8
-PARITY: None
-STOP BITS:  1
-
-NOTES
-=
-
-(1) 900 Mhz and 950 Mhz versions.
-
-(2) 1 Gb or 2 Gb RAM.
-
-(3) PC compatible if HexBug not enabled.
+BSP for Xen on ARM
+
+Overview
+
+
+This BSP enables RTEMS to run as a guest virtual machine in AArch32 mode on the
+Xen hypervisor for ARMv8 platforms.
+
+Drivers:
+- Clock: ARMv7-AR Generic Timer
+- Console: Virtual PL011 device
+- Interrupt: GICv2
+
+BSP variants:
+- xen_virtual: completely virtualized guest with no dependence on underlying
+  hardware
+
+The xen_virtual BSP variant relies on standard Xen features, so it should be
+able to run on any ARMv8 platform.
+
+Xen allows for the passthrough of hardware peripherals to guest virtual
+machines. BSPs could be added in the future targeting specific hardware
+platforms and include the appropriate drivers.
+
+This BSP was tested with Xen running on the Xilinx Zynq UltraScale+ MPSoC using
+the Virtuosity distribution maintained by DornerWorks.
+
+Execution
+-
+
+This procedure describes how to run the ticker sample application that should
+already be built with the BSP.
+
+The `ticker.exe` file can be found in the BSP build tree at:
+
+  arm-rtems5/c/xen_virtual/testsuites/samples/ticker.exe
+
+The `ticker.exe` elf file must be translated to a binary format.
+
+  arm-rtems5-objcopy -O binary ticker.exe ticker.bin
+
+Then place the `ticker.bin` file on the dom0 filesystem.
+
+From the dom0 console, create a configuration file `ticker.cfg` with the
+following contents.
+
+  name = "ticker"
+  kernel = "ticker.bin"
+  memory = 8
+  vcpus = 1
+  gic_version = "v2"
+  vuart = "sbsa_uart"
+
+Create the virtual machine and attach to the virtual vpl011 console.
+
+  xl create ticker.cfg && xl console -t vuart ticker
+
+To return back to the dom0 console, press both `Ctrl` and `]` on your keyboard.
+
+Additional information
+--
+
+The Virtuosity distribution can be found at
+  https://dornerworks.com/xen/virtuosity



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable test] 144301: tolerable FAIL - PUSHED

2019-11-26 Thread osstest service owner
flight 144301 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144301/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-ws16-amd64 16 guest-localmigrate/x10 fail in 144295 
pass in 144301
 test-amd64-amd64-libvirt-xsm 21 leak-check/check   fail pass in 144295
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 12 
guest-start/debianhvm.repeat fail pass in 144295

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-rtds 17 guest-start.2 fail in 144295 blocked in 144289
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10   fail  like 144289
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 144289
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 144289
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 144289
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 144289
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 144289
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 144289
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 144289
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 144289
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 144289
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass

version targeted for testing:
 xen  77beba7c921a286c31a2a76f26500047f353614a
baseline version:
 xen  183f354e1430087879de071f0c7122e42703916e

Last test of basis   144289  

[Xen-devel] [PATCH v2] bsp/xen: Update README

2019-11-26 Thread Jeff Kubascik
Add some background information on the BSP and instructions on how to
run the ticker application.

Change-Id: I05050a335a938f00cc59bae69a014c5f04e05d23
---
 bsps/arm/xen/README | 130 ++--
 1 file changed, 64 insertions(+), 66 deletions(-)

diff --git a/bsps/arm/xen/README b/bsps/arm/xen/README
index 1b24d84c9a..2ae2f2170d 100644
--- a/bsps/arm/xen/README
+++ b/bsps/arm/xen/README
@@ -1,66 +1,64 @@
-#  This is a sample hardware description file for a BSP.  This comment
-#  block does not have to appear in a real one.  The intention of this
-#  file is to provide a central place to look when searching for
-#  information about a board when starting a new BSP.  For example,
-#  you may want to find an existing timer driver for the chip you are
-#  using on your board.  It is easier to grep for the chip name in
-#  all of the HARDWARE files than to peruse the source tree.  Hopefully,
-#  making the HARDDWARE files accurate will also alleviate the common
-#  problem of not knowing anything about a board based on its BSP
-#  name.
-#
-#  NOTE:  If you have a class of peripheral chip on board which
-# is not in this list please add it to this file so
-# others will also use the same name.
-#
-# Timer resolution is the way it is configured in this BSP.
-# On a counting timer, this is the length of time which
-# corresponds to 1 count.
-#
-
-BSP NAME:   fastsbc1
-BOARD:  Fat Computers, Fast SBC-1
-BUS:SchoolBus
-CPU FAMILY: i386
-CPU:Intel Hexium
-COPROCESSORS:   Witch Hex87
-MODE:   32 bit mode
-
-DEBUG MONITOR:  HexBug
-
-PERIPHERALS
-===
-TIMERS: Intel i8254
-  RESOLUTION: .0001 microseconds
-SERIAL PORTS:   Zilog Z8530 (with 2 ports)
-REAL-TIME CLOCK:RTC-4
-DMA:Intel i8259
-VIDEO:  none
-SCSI:   none
-NETWORKING: none
-
-DRIVER INFORMATION
-==
-CLOCK DRIVER:   RTC-4
-IOSUPP DRIVER:  Zilog Z8530 port A
-SHMSUPP:polled and interrupts
-TIMER DRIVER:   Intel i8254
-TTY DRIVER: stub only
-
-STDIO
-=
-PORT:   Console port 0
-ELECTRICAL: RS-232
-BAUD:   9600
-BITS PER CHARACTER: 8
-PARITY: None
-STOP BITS:  1
-
-NOTES
-=
-
-(1) 900 Mhz and 950 Mhz versions.
-
-(2) 1 Gb or 2 Gb RAM.
-
-(3) PC compatible if HexBug not enabled.
+BSP for Xen on ARM
+
+Overview
+
+
+This BSP enables RTEMS to run as a guest virtual machine in AArch32 mode on the
+Xen hypervisor for ARMv8 platforms.
+
+Drivers:
+- Clock: ARMv7-AR Generic Timer
+- Console: Virtual PL011 device
+- Interrupt: GICv2
+
+BSP variants:
+- xen_virtual: completely virtualized guest with no dependence on underlying
+  hardware
+
+The xen_virtual BSP variant relies on standard Xen features, so it should be
+able to run on any ARMv8 platform.
+
+Xen allows for the passthrough of hardware peripherals to guest virtual
+machines. BSPs could be added in the future targeting specific hardware
+platforms and include the appropriate drivers.
+
+This BSP was tested with Xen running on the Xilinx Zynq UltraScale+ MPSoC using
+the Virtuosity distribution maintained by DornerWorks.
+
+Execution
+-
+
+This procedure describes how to run the ticker sample application that should
+already be built with the BSP.
+
+The `ticker.exe` file can be found in the BSP build tree at:
+
+  arm-rtems5/c/xen_virtual/testsuites/samples/ticker.exe
+
+The `ticker.exe` elf file must be translated to a binary format.
+
+  arm-rtems5-objcopy -O binary ticker.exe ticker.bin
+
+Then place the `ticker.bin` file on the dom0 filesystem.
+
+From the dom0 console, create a configuration file `ticker.cfg` with the
+following contents.
+
+  name = "ticker"
+  kernel = "ticker.bin"
+  memory = 8
+  vcpus = 1
+  gic_version = "v2"
+  vuart = "sbsa_uart"
+
+Create the virtual machine and attach to the virtual vpl011 console.
+
+  xl create ticker.cfg && xl console -t vuart ticker
+
+To return back to the dom0 console, press both `Ctrl` and `]` on your keyboard.
+
+Additional information
+--
+
+The Virtuosity distribution can be found at
+  https://dornerworks.com/xen/virtuosity
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)

2019-11-26 Thread Andrew Cooper
On 26/11/2019 20:12, Roman Shaposhnik wrote:
> On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki
>  wrote:
>> On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote:
>>> Hi Marek, after applying Jan's patch I'm making much further progress.
>>> Xen boots fine and Dom0 seems to be OK (more tests are needed tho on
>>> my end).
>>>
>>> I'm attaching the logs from Xen and Dom0.
>>>
>>> At this point it seems that adding efi=attr=uc is a better option for
>>> these boxes than a wholesale efi=no-rs
>>>
>>> Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was
>>> supposed to cover by default (so I don't have to add efi=attr=uc)?
>> No, this looks like some different firmware (?) issue.
>>
>>> Question #2: is there any downside to *always* specifying efi=attr=uc?
>>> Even for servers that, strictly speaking, don't need it?
>> TL;DR: It should be fine. It is what Linux does too.
>>
>> Details:
>>
>> Lets take a look why 'efi=attr=uc' helps, and how can we make it work
>> out of the box:
>>
>> The issue is about memory marked as type=11 (EfiMemoryMappedIO) with
>> attr=8000 (EFI_MEMORY_RUNTIME). Indeed none of cachability
>> attribute is defined. For the record, defined attributes are (UEFI spec
>> .6):
>>
>> EFI_MEMORY_UC Memory cacheability attribute: The memory region supports
>> being configured as not cacheable.
>>
>> EFI_MEMORY_WC Memory cacheability attribute: The memory region supports
>> being configured as write combining.
>>
>> EFI_MEMORY_WT Memory cacheability attribute: The memory region supports
>> being configured as cacheable with a “write through” policy.
>> Writes that hit in the cache will also be written to main memory.
>>
>> EFI_MEMORY_WB Memory cacheability attribute: The memory region supports
>> being configured as cacheable with a “write back” policy. Reads
>> and writes that hit in the cache do not propagate to main memory.
>> Dirty data is written back to main memory when a new cache line
>> is allocated.
>>
>> EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports
>> being configured as not cacheable, exported, and supports the
>> “fetch and add” semaphore mechanism.
>>
>> My reading of UEFI spec doesn't give much hints what to do with memory
>> mappings without any cachability attribute. The only related info I've
>> found is about EfiMemoryMappedIO:
>>
>> This memory is not used by the OS. All system memory-mapped IO
>> information should come from ACPI tables.
>>
>> So, maybe there is some more info?
>>
>> Anyway, if I understand correctly, MMIO region should be mapped as UC,
>> right?
>>
>> I've also taken look at what Linux does. And basically, the only bit
>> Linux care about is EFI_MEMORY_WB - if it's absent, then set the region
>> as uncachable (page cache disabled bit in page table entry). So,
>> basically Linux by default does what Xen's efi=attr=uc does.
> Very interesting! Thanks for doing the research.
>
>> So, to improve Xen's hardware/firmware compatibility, I have two ideas:
>>
>> 1. Make efi=attr=uc the default (it's still possible to disable it with
>> efi=attr=no).
> I'd be very much in favor of that too (especially since it seems to match
> Linux behaviour) What do others think?

Its more than just this.  Linux also doesn't use EFI reboot because it
is broken almost everywhere (because Windows doesn't use it because its
broken almost everywhere, so it never gets fixed).

Xen should be following Linux, but I'm exhausted arguing this point.

A consequence is that downstream tend to share a pile of "unbreak Xen on
UEFI" patches which have been rejected upstream on philosophical rather
than technical grounds, despite this being a toxic environment to work in.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)

2019-11-26 Thread Roman Shaposhnik
On Tue, Nov 26, 2019 at 10:32 AM Marek Marczykowski-Górecki
 wrote:
>
> On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote:
> > Hi Marek, after applying Jan's patch I'm making much further progress.
> > Xen boots fine and Dom0 seems to be OK (more tests are needed tho on
> > my end).
> >
> > I'm attaching the logs from Xen and Dom0.
> >
> > At this point it seems that adding efi=attr=uc is a better option for
> > these boxes than a wholesale efi=no-rs
> >
> > Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was
> > supposed to cover by default (so I don't have to add efi=attr=uc)?
>
> No, this looks like some different firmware (?) issue.
>
> > Question #2: is there any downside to *always* specifying efi=attr=uc?
> > Even for servers that, strictly speaking, don't need it?
>
> TL;DR: It should be fine. It is what Linux does too.
>
> Details:
>
> Lets take a look why 'efi=attr=uc' helps, and how can we make it work
> out of the box:
>
> The issue is about memory marked as type=11 (EfiMemoryMappedIO) with
> attr=8000 (EFI_MEMORY_RUNTIME). Indeed none of cachability
> attribute is defined. For the record, defined attributes are (UEFI spec
> .6):
>
> EFI_MEMORY_UC Memory cacheability attribute: The memory region supports
> being configured as not cacheable.
>
> EFI_MEMORY_WC Memory cacheability attribute: The memory region supports
> being configured as write combining.
>
> EFI_MEMORY_WT Memory cacheability attribute: The memory region supports
> being configured as cacheable with a “write through” policy.
> Writes that hit in the cache will also be written to main memory.
>
> EFI_MEMORY_WB Memory cacheability attribute: The memory region supports
> being configured as cacheable with a “write back” policy. Reads
> and writes that hit in the cache do not propagate to main memory.
> Dirty data is written back to main memory when a new cache line
> is allocated.
>
> EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports
> being configured as not cacheable, exported, and supports the
> “fetch and add” semaphore mechanism.
>
> My reading of UEFI spec doesn't give much hints what to do with memory
> mappings without any cachability attribute. The only related info I've
> found is about EfiMemoryMappedIO:
>
> This memory is not used by the OS. All system memory-mapped IO
> information should come from ACPI tables.
>
> So, maybe there is some more info?
>
> Anyway, if I understand correctly, MMIO region should be mapped as UC,
> right?
>
> I've also taken look at what Linux does. And basically, the only bit
> Linux care about is EFI_MEMORY_WB - if it's absent, then set the region
> as uncachable (page cache disabled bit in page table entry). So,
> basically Linux by default does what Xen's efi=attr=uc does.

Very interesting! Thanks for doing the research.

> So, to improve Xen's hardware/firmware compatibility, I have two ideas:
>
> 1. Make efi=attr=uc the default (it's still possible to disable it with
> efi=attr=no).

I'd be very much in favor of that too (especially since it seems to match
Linux behaviour) What do others think?

> 2. Map type=11 (MMIO) as UC, unless attributes specify otherwise.

This seems to be the subset of the #1 option. As such -- perhaps it
is "safer" than a wholesale efi=attr=uc but at the same time Linux
behaviour gives me pretty good confidence that we should probably
be safe, no?

> Any preference? I can prepare a patch for either version. But I guess
> it's too late for getting it into 4.13.

Good question as well.

Thanks,
Roman.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] livepatch-build-tools regression

2019-11-26 Thread Wieczorkiewicz, Pawel
It looks like gcc plays the usual dirty tricks with local variables renaming:

- xen-syms
  7529: 82d0805fed50 8 OBJECT  LOCAL  DEFAULT 4230 lastpage.22857
- livepatch
   289:  8 OBJECT  GLOBAL DEFAULT  UND hvm.c#lastpage.22856

Then, symbols resolution by name fails..

Can you please try to build the livepatch module with additional option 
'—prelink' and give it a try ?

> On 26. Nov 2019, at 18:51, Wieczorkiewicz, Pawel  wrote:
> 
> 
> 
>> On 20. Nov 2019, at 12:42, Sergey Dyasli  wrote:
>> 
>> On 19/11/2019 17:21, Wieczorkiewicz, Pawel wrote:
>>> 
>>> 
 On 18. Nov 2019, at 18:41, Sergey Dyasli  wrote:
 
 On 18/11/2019 17:28, Wieczorkiewicz, Pawel wrote:
> 
> Could you build the lp with debug (-d) and provide me with the 
> create-diff-object.log file?
> 
 
 I've attached the log. Btw, I think I provided all the necessary 
 information
 for others to repeat my experiment.
 
>>> 
>>> Sorry for another request, but I do not seem to be able to reproduce this 
>>> locally.
>>> Could you send me the livepatch module binary that fails to upload?
>> 
>> That's interesting. I've attached the binary that my system produces.
>> What version of gcc do you use?
> 
> The version used was: gcc (GCC) 7.2.1 20170915
> 
> But I have finally managed to reproduce the issue with:
> 1. gcc (Ubuntu 6.5.0-2ubuntu1~18.04) 6.5.0 20181026
> 2. gcc-7 (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> 
> I think it is not related to the commit:
> commit 854a7ca60e35 "create-diff-object: Do not include all .rodata sections"
> 
> I managed to reproduce it also with earlier version commit:
> "0c10457 Remove section alignment requirement"
> 
> But this time a different symbol causes the failure:
> 
> (XEN) livepatch: 0001-live-patch: Unknown symbol: hvm.c#lastpage.22856
> 
>> 
>> --
>> Thanks,
>> Sergey
>> <0001-live-patch-stripped.livepatch>
> 
> Best Regards,
> Pawel Wieczorkiewicz

Best Regards,
Pawel Wieczorkiewicz






Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)

2019-11-26 Thread Marek Marczykowski-Górecki
On Tue, Nov 26, 2019 at 09:56:25AM -0800, Roman Shaposhnik wrote:
> Hi Marek, after applying Jan's patch I'm making much further progress.
> Xen boots fine and Dom0 seems to be OK (more tests are needed tho on
> my end).
> 
> I'm attaching the logs from Xen and Dom0.
> 
> At this point it seems that adding efi=attr=uc is a better option for
> these boxes than a wholesale efi=no-rs
> 
> Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was
> supposed to cover by default (so I don't have to add efi=attr=uc)?

No, this looks like some different firmware (?) issue.

> Question #2: is there any downside to *always* specifying efi=attr=uc?
> Even for servers that, strictly speaking, don't need it?

TL;DR: It should be fine. It is what Linux does too.

Details:

Lets take a look why 'efi=attr=uc' helps, and how can we make it work
out of the box:

The issue is about memory marked as type=11 (EfiMemoryMappedIO) with
attr=8000 (EFI_MEMORY_RUNTIME). Indeed none of cachability 
attribute is defined. For the record, defined attributes are (UEFI spec 
.6):

EFI_MEMORY_UC Memory cacheability attribute: The memory region supports
being configured as not cacheable.

EFI_MEMORY_WC Memory cacheability attribute: The memory region supports
being configured as write combining.

EFI_MEMORY_WT Memory cacheability attribute: The memory region supports
being configured as cacheable with a “write through” policy.
Writes that hit in the cache will also be written to main memory.

EFI_MEMORY_WB Memory cacheability attribute: The memory region supports
being configured as cacheable with a “write back” policy. Reads
and writes that hit in the cache do not propagate to main memory.
Dirty data is written back to main memory when a new cache line
is allocated.

EFI_MEMORY_UCE Memory cacheability attribute: The memory region supports
being configured as not cacheable, exported, and supports the
“fetch and add” semaphore mechanism.

My reading of UEFI spec doesn't give much hints what to do with memory
mappings without any cachability attribute. The only related info I've
found is about EfiMemoryMappedIO:

This memory is not used by the OS. All system memory-mapped IO
information should come from ACPI tables.

So, maybe there is some more info?

Anyway, if I understand correctly, MMIO region should be mapped as UC,
right?

I've also taken look at what Linux does. And basically, the only bit
Linux care about is EFI_MEMORY_WB - if it's absent, then set the region
as uncachable (page cache disabled bit in page table entry). So,
basically Linux by default does what Xen's efi=attr=uc does.

So, to improve Xen's hardware/firmware compatibility, I have two ideas:

1. Make efi=attr=uc the default (it's still possible to disable it with
efi=attr=no).

2. Map type=11 (MMIO) as UC, unless attributes specify otherwise.

Any preference? I can prepare a patch for either version. But I guess
it's too late for getting it into 4.13.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v3 2/2] x86/vmx: always sync PIR to IRR before vmentry

2019-11-26 Thread Roger Pau Monné
On Tue, Nov 26, 2019 at 05:50:32PM +0100, Jan Beulich wrote:
> On 26.11.2019 14:26, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/hvm/irq.c
> > +++ b/xen/arch/x86/hvm/irq.c
> > @@ -515,7 +515,11 @@ void hvm_set_callback_via(struct domain *d, uint64_t 
> > via)
> >  struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
> >  {
> >  struct hvm_domain *plat = >domain->arch.hvm;
> > -int vector;
> > +/*
> > + * Always call vlapic_has_pending_irq so that PIR is synced into IRR 
> > when
> > + * using posted interrupts.
> > + */
> > +int vector = vlapic_has_pending_irq(v);
> 
> Did you consider doing this conditionally either here ...
> 
> > @@ -530,7 +534,6 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu 
> > *v)
> >  if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
> >  return hvm_intack_pic(0);
> >  
> > -vector = vlapic_has_pending_irq(v);
> >  if ( vector != -1 )
> >  return hvm_intack_lapic(vector);
> 
> ... or here? I ask not only because the function isn't exactly
> cheap to call (as iirc you did also mention during the v2
> discussion), but also because of its interaction with Viridian
> and nested mode. In case of problems there, avoiding the use
> of interrupt posting would be a workaround in such cases then.

TBH my preference was to do the PIR to IRR sync in vmx_intr_assist by
directly calling vmx_sync_pir_to_irr because it was IMO less intrusive
and confined to VMX code. I think this approach is more risky as
vlapic_has_pending_irq does way more than simply syncing PIR to IRR.

> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1945,57 +1945,31 @@ static void vmx_process_isr(int isr, struct vcpu *v)
> >  
> >  static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> >  {
> > -bool_t running = v->is_running;
> > -
> >  vcpu_unblock(v);
> > -/*
> > - * Just like vcpu_kick(), nothing is needed for the following two 
> > cases:
> > - * 1. The target vCPU is not running, meaning it is blocked or 
> > runnable.
> > - * 2. The target vCPU is the current vCPU and we're in non-interrupt
> > - * context.
> > - */
> > -if ( running && (in_irq() || (v != current)) )
> > -{
> > +if ( v->is_running && v != current )
> 
> I continue to be concerned by this evaluation of ->is_running
> _after_ vcpu_unblock(), when previously (just like vcpu_kick()
> does) it was intentionally done before.

If the unblock sets v->is_running to true that's fine, Xen will send a
posted interrupt IPI and the destination pCPU will either be in root
mode and thus raise a softirq or in non-root mode and perform the PIR
to IRR sync and possible interrupt injection.

I see that caching the value of is_running might be helpful in order
to prevent pointless IPI'ing. If the vCPU wasn't running before the
unblock there's no reason to send an IPI to it, because the sync of
PIR to IRR will happen at vmentry anyway.

> I wonder anyway
> whether this and the change to irq.c should really be in a
> single patch, the more that you start the respective part of
> the description with "While there also simplify ...". But in
> the end it is up to Kevin's or Jun's judgement anyway.

Yes, that makes sense. Will wait for feedback from Kevin or Jun before
sending a new version anyway.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] UEFI support on Dell boxes (was: Re: Status of 4.13)

2019-11-26 Thread Roman Shaposhnik
Hi Marek, after applying Jan's patch I'm making much further progress.
Xen boots fine and Dom0 seems to be OK (more tests are needed tho on
my end).

I'm attaching the logs from Xen and Dom0.

At this point it seems that adding efi=attr=uc is a better option for
these boxes than a wholesale efi=no-rs

Question #1: is this something that EFI_SET_VIRTUAL_ADDRESS_MAP was
supposed to cover by default (so I don't have to add efi=attr=uc)?

Question #2: is there any downside to *always* specifying efi=attr=uc?
Even for servers that, strictly speaking, don't need it?

Thanks,
Roman.

On Mon, Nov 25, 2019 at 11:02 PM Roman Shaposhnik  wrote:
>
> On Mon, Nov 25, 2019 at 7:55 PM Marek Marczykowski-Górecki
>  wrote:
> >
> > On Mon, Nov 25, 2019 at 07:44:03PM -0800, Roman Shaposhnik wrote:
> > > On Sun, Nov 24, 2019 at 4:48 PM Marek Marczykowski-Górecki
> > >  wrote:
> > > > Do you have by
> > > > a chance messages of that crash (without efi=no-rs, but with
> > > > EFI_SET_VIRTUAL_ADDRESS_MAP enabled)? Or even a photo if no serial 
> > > > output is
> > > > available?
> > >
> > > With my awesome soldering skills ;-) I managed to rig a serial console.
> > >
> > > Output is attached. Please let me know if you'd like me to run any
> > > other experiments.
> >
> > Looks helpful, lets try to do something:
> >
> > >  Xen 4.13.0-rc
> > > (XEN) Xen version 4.13.0-rc (@) (gcc (Alpine 6.4.0) 6.4.0) debug=y  Tue 
> > > Nov 26 03:19:38 UTC 2019
> > > (XEN) Latest ChangeSet:
> > > (XEN) build-id: 07aa9f711fe09a91be2588ee7df10d93ebe34c80
> > > (XEN) Bootloader: GRUB 2.03
> > > (XEN) Command line: com1=115200,8n1 console=com1 loglvl=all noreboot 
> > > dom0_mem=640M,max:640M dom0_max_vcpus=1 dom0_vcpus_pin smt=false
> > (...)
> > > (XEN) EFI memory map:
> > (...)
> > > (XEN)  077587000-0775f4fff type=5 attr=800f
> >
> > This is code that crashes - runtime services code, so somewhere with
> > actual UEFI code.
>
> Yup -- that was my hunch with adding efi=no-rs option.
>
> > (...)
> > > (XEN)  0ff90-0 type=11 attr=8000
> > > (XEN) Unknown cachability for MFNs 0xff900-0xf
> >
> > The faulting address is in this range. And because of unknown
> > cachability, it isn't mapped. Try adding 'efi=attr=uc' to the Xen
> > cmdline.
>
> Feels like we're getting exactly the same failure. Log attached.
>
> Thanks,
> Roman.
(XEN) Xen version 4.13.0-rc (@) (gcc (Alpine 6.4.0) 6.4.0) debug=y  Tue Nov 26 
16:59:33 UTC 2019
(XEN) Latest ChangeSet:
(XEN) build-id: 65fa4bd58d340488ec6963bd8ca5418747541fe5
(XEN) Bootloader: GRUB 2.03
(XEN) Command line: com1=115200,8n1 console=com1 efi=attr=uc loglvl=all 
noreboot dom0_mem=640M,max:640M dom0_max_vcpus=1 dom0_vcpus_pin smt=false
(XEN) Xen image load base address: 0x70e0
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN)  Found 0 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) EFI RAM map:
(XEN)   - 0003f000 (usable)
(XEN)  0003f000 - 0004 (ACPI NVS)
(XEN)  0004 - 000a (usable)
(XEN)  0010 - 2000 (usable)
(XEN)  2000 - 2010 (reserved)
(XEN)  2010 - 76ccb000 (usable)
(XEN)  76ccb000 - 76d43000 (reserved)
(XEN)  76d43000 - 76d54000 (ACPI data)
(XEN)  76d54000 - 772de000 (ACPI NVS)
(XEN)  772de000 - 775f5000 (reserved)
(XEN)  775f5000 - 775f6000 (usable)
(XEN)  775f6000 - 77638000 (reserved)
(XEN)  77638000 - 789e5000 (usable)
(XEN)  789e5000 - 78ffa000 (reserved)
(XEN)  78ffa000 - 7900 (usable)
(XEN)  e000 - f000 (reserved)
(XEN)  fec0 - fec01000 (reserved)
(XEN)  fed01000 - fed02000 (reserved)
(XEN)  fed03000 - fed04000 (reserved)
(XEN)  fed08000 - fed09000 (reserved)
(XEN)  fed0c000 - fed1 (reserved)
(XEN)  fed1c000 - fed1d000 (reserved)
(XEN)  fee0 - fee01000 (reserved)
(XEN)  fef0 - ff00 (reserved)
(XEN)  ff90 - 0001 (reserved)
(XEN) System RAM: 1919MB (1965176kB)
(XEN) ACPI: RSDP 76D46000, 0024 (r2   DELL)
(XEN) ACPI: XSDT 76D46088, 0094 (r1   DELL AS09  1072009 AMI 10013)
(XEN) ACPI: FACP 76D52560, 010C (r5   DELL AS09  1072009 AMI 10013)
(XEN) ACPI: DSDT 76D461B0, C3AF (r2   DELL AS09  1072009 INTL 20120913)
(XEN) ACPI: FACS 772DDE80, 0040
(XEN) ACPI: APIC 76D52670, 0068 (r3   DELL AS09  1072009 AMI 10013)
(XEN) ACPI: FPDT 76D526D8, 0044 (r1   DELL AS09  1072009 AMI 10013)
(XEN) ACPI: FIDT 76D52720, 009C (r1   DELL AS09  1072009 AMI 10013)
(XEN) ACPI: MCFG 76D527C0, 003C (r1   DELL AS09  1072009 MSFT   97)
(XEN) ACPI: LPIT 76D52800, 0104 (r1   DELL AS09   

[Xen-devel] [xen-unstable-smoke test] 144310: tolerable all pass - PUSHED

2019-11-26 Thread osstest service owner
flight 144310 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144310/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  5530782cfe70ed22fe44358f6a10c38916443b42
baseline version:
 xen  8c79c129a6db2220c1089e0ce5fa49e7298b1d3e

Last test of basis   144307  2019-11-26 11:03:51 Z0 days
Testing same since   144310  2019-11-26 15:01:24 Z0 days1 attempts


People who touched revisions under test:
  Anthony PERARD 
  Jan Beulich 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   8c79c129a6..5530782cfe  5530782cfe70ed22fe44358f6a10c38916443b42 -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] livepatch-build-tools regression

2019-11-26 Thread Wieczorkiewicz, Pawel


> On 20. Nov 2019, at 12:42, Sergey Dyasli  wrote:
> 
> On 19/11/2019 17:21, Wieczorkiewicz, Pawel wrote:
>> 
>> 
>>> On 18. Nov 2019, at 18:41, Sergey Dyasli  wrote:
>>> 
>>> On 18/11/2019 17:28, Wieczorkiewicz, Pawel wrote:
 
 Could you build the lp with debug (-d) and provide me with the 
 create-diff-object.log file?
 
>>> 
>>> I've attached the log. Btw, I think I provided all the necessary information
>>> for others to repeat my experiment.
>>> 
>> 
>> Sorry for another request, but I do not seem to be able to reproduce this 
>> locally.
>> Could you send me the livepatch module binary that fails to upload?
> 
> That's interesting. I've attached the binary that my system produces.
> What version of gcc do you use?

The version used was: gcc (GCC) 7.2.1 20170915

But I have finally managed to reproduce the issue with:
1. gcc (Ubuntu 6.5.0-2ubuntu1~18.04) 6.5.0 20181026
2. gcc-7 (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

I think it is not related to the commit:
commit 854a7ca60e35 "create-diff-object: Do not include all .rodata sections"

I managed to reproduce it also with earlier version commit:
"0c10457 Remove section alignment requirement"

But this time a different symbol causes the failure:

(XEN) livepatch: 0001-live-patch: Unknown symbol: hvm.c#lastpage.22856

> 
> --
> Thanks,
> Sergey
> <0001-live-patch-stripped.livepatch>

Best Regards,
Pawel Wieczorkiewicz






Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-26 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> George Dunlap
> Sent: 26 November 2019 17:18
> To: xen-devel@lists.xenproject.org
> Cc: Juergen Gross ; Stefano Stabellini
> ; Julien Grall ; Wei Liu
> ; Paul Durrant ; Andrew Cooper
> ; Konrad Rzeszutek Wilk
> ; George Dunlap ; Marek
> Marczykowski-Górecki ; Jan Beulich
> ; Ian Jackson 
> Subject: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> Xen used to have single, system-wide limits for the number of grant
> frames and maptrack frames a guest was allowed to create.  Increasing
> or decreasing this single limit on the Xen command-line would change
> the limit for all guests on the system.
> 
> Later, per-domain limits for these values was created.  The
> system-wide limits became strict limits: domains could not be created
> with higher limits, but could be created with lower limits.
> 
> However, the change also introduced a range of different "default"
> values into various places in the toolstack:
> 
> - The python libxc bindings hard-coded these values to 32 and 1024,
>   respectively
> 
> - The libxl default values are 32 and 1024 respectively.
> 
> - xl will use the libxl default for maptrack, but does its own default
>   calculation for grant frames: either 32 or 64, based on the max
>   possible mfn.
> 
> These defaults interact poorly with the hypervisor command-line limit:
> 
> - The hypervisor command-line limit cannot be used to raise the limit
>   for all guests anymore, as the default in the toolstack will
>   effectively override this.
> 
> - If you use the hypervisor command-line limit to *reduce* the limit,
>   then the "default" values generated by the toolstack are too high,
>   and all guest creations will fail.
> 
> In other words, the toolstack defaults require any change to be
> effected by having the admin explicitly specify a new value in every
> guest.
> 
> In order to address this, have grant_table_init treat '0' values for
> max_grant_frames and max_maptrack_frames as instructions to use the
> system-wide default.  Have all the above toolstacks default to passing
> 0 unless a different value is explicitly given.
> 
> This restores the old behavior, that changing the hypervisor
> command-line option can change the behavior for all guests, while
> retaining the ability to set per-guest values.  It also removes the
> bug that *reducing* the system-wide max will cause all domains without
> explicit limits to fail.
> 
> (The ocaml bindings require the caller to always specify a value, and
> the code to start a xenstored stubdomain hard-codes these to 4 and 128
> respectively; these will not be addressed here.)
> 
> Signed-off-by: George Dunlap 
> ---
> Release justification: This is an observed regression (albeit one that
> has spanned several releases now).
> 
> Compile-tested only.
> 
> NB this patch could be applied without the whitespace fixes (perhaps
> with some fix-ups); it's just easier since my editor strips trailing
> whitespace out automatically.
> 
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Andrew Cooper 
> CC: Jan Beulich 
> CC: Paul Durrant 
> CC: Julien Grall 
> CC: Konrad Rzeszutek Wilk 
> CC: Stefano Stabellini 
> CC: Juergen Gross 
> CC: Marek Marczykowski-Górecki 
> ---
>  tools/libxl/libxl.h   |  4 ++--
>  tools/python/xen/lowlevel/xc/xc.c |  2 --
>  tools/xl/xl.c | 12 ++--
>  xen/common/grant_table.c  |  7 +++
>  xen/include/public/domctl.h   |  6 --
>  5 files changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 49b56fa1a3..1648d337e7 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -364,8 +364,8 @@
>   */
>  #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1
> 
> -#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32
> -#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024
> +#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 0
> +#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 0
> 
>  /*
>   * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has
> diff --git a/tools/python/xen/lowlevel/xc/xc.c
> b/tools/python/xen/lowlevel/xc/xc.c
> index 6d2afd5695..0f861872ce 100644
> --- a/tools/python/xen/lowlevel/xc/xc.c
> +++ b/tools/python/xen/lowlevel/xc/xc.c
> @@ -127,8 +127,6 @@ static PyObject *pyxc_domain_create(XcObject *self,
>  },
>  .max_vcpus = 1,
>  .max_evtchn_port = -1, /* No limit. */
> -.max_grant_frames = 32,
> -.max_maptrack_frames = 1024,
>  };
> 
>  static char *kwd_list[] = { "domid", "ssidref", "handle", "flags",
> diff --git a/tools/xl/xl.c b/tools/xl/xl.c
> index ddd29b3f1b..b6e220184d 100644
> --- a/tools/xl/xl.c
> +++ b/tools/xl/xl.c
> @@ -51,8 +51,8 @@ libxl_bitmap global_pv_affinity_mask;
>  enum output_format default_output_format = OUTPUT_FORMAT_JSON;
>  int claim_mode = 1;
>  bool progress_use_cr = 0;
> -int max_grant_frames = -1;
> -int max_maptrack_frames = -1;
> +int 

Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-26 Thread Ian Jackson
George Dunlap writes ("[PATCH for-4.13 2/2] Rationalize max_grant_frames and 
max_maptrack_frames handling"):
> Xen used to have single, system-wide limits for the number of grant
> frames and maptrack frames a guest was allowed to create.  Increasing
> or decreasing this single limit on the Xen command-line would change
> the limit for all guests on the system.

If I am not mistaken, this is an important change to have.

I have seen reports of users who ran out of grant/maptrack frames
because of updates to use multiring protocols etc.  The error messages
are not very good and the recommended workaround has been to increase
the default limit on the hypervisor command line.

It is important that we don't break that workaround!

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-26 Thread George Dunlap
On 11/26/19 5:17 PM, George Dunlap wrote:
> - xl will use the libxl default for maptrack, but does its own default
>   calculation for grant frames: either 32 or 64, based on the max
>   possible mfn.

[snip]

> @@ -199,13 +198,6 @@ static void parse_global_config(const char *configfile,
>  
>  if (!xlu_cfg_get_long (config, "max_grant_frames", , 0))
>  max_grant_frames = l;
> -else {
> -libxl_physinfo_init();
> -max_grant_frames = (libxl_get_physinfo(ctx, ) != 0 ||
> -!(physinfo.max_possible_mfn >> 32))
> -   ? 32 : 64;
> -libxl_physinfo_dispose();
> -}

Sorry, meant to add a patch to add this functionality back into the
hypervisor -- i.e., so that opt_max_grant_frames would be 32 on systems
with 32-bit mfns.

But this seems like a fairly strange calculation anyway; it's not clear
to me where it would have come from.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] UEFI support on Dell boxes

2019-11-26 Thread Roman Shaposhnik
On Tue, Nov 26, 2019 at 12:31 AM Jan Beulich  wrote:
>
> On 26.11.2019 08:02, Roman Shaposhnik wrote:
> > On Mon, Nov 25, 2019 at 7:55 PM Marek Marczykowski-Górecki
> >  wrote:
> >> On Mon, Nov 25, 2019 at 07:44:03PM -0800, Roman Shaposhnik wrote:
> >>> (XEN)  0ff90-0 type=11 attr=8000
> >>> (XEN) Unknown cachability for MFNs 0xff900-0xf
> >>
> >> The faulting address is in this range. And because of unknown
> >> cachability, it isn't mapped. Try adding 'efi=attr=uc' to the Xen
> >> cmdline.
> >
> > Feels like we're getting exactly the same failure. Log attached.
>
> Clearly the option hasn't been taking effect. Could you please
> retry with this fix
> https://lists.xenproject.org/archives/html/xen-devel/2019-11/msg01494.html
> in place?

This works very well indeed! I acked it in the patch thread.

Thanks,
Roman.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] EFI: fix "efi=attr=" handling

2019-11-26 Thread Roman Shaposhnik
On Tue, Nov 26, 2019 at 1:51 AM Wei Liu  wrote:
>
> On Tue, Nov 26, 2019 at 09:25:27AM +0100, Jan Beulich wrote:
> > Commit 633a40947321 ("docs: Improve documentation and parsing for efi=")
> > failed to honor the strcmp()-like return value convention of
> > cmdline_strcmp().
> >
> > Reported-by: Roman Shaposhnik 
> > Signed-off-by: Jan Beulich 
>
> Reviewed-by: Wei Liu 

Tested-by: Roman Shaposhnik 

Thanks,
Roman.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] getting 4.12.2 ready

2019-11-26 Thread Lars Kurth


On 25/11/2019, 09:10, "Jan Beulich"  wrote:

All,

the 4.12.2 stable release is due in about 2 weeks time. Please point
out backporting candidates that you find missing from the respective
stable trees.

Jan


Hi all: 

I ran the XSA scripts and all is fine. BreaThe tool does not always understand 
the
xsa303/*.patch xen-unstable .. Xen 4.9
notation in xsa.git, which is not well defined. But it's probably not worth 
trying to fix this, as a manual check takes less than a minute

Tool output is below

XSA 296 : Some patches not applied => check
Applied 

XSA 297 : No patch found => check
Was applied in Xen 4.12.1

XSA 298 : All patches found (no need to check)
XSA 299 : All patches found (no need to check)

XSA 300 : No patch found => check
Linux only

XSA 301 : All patches found (no need to check)

XSA 302 : Some patches not applied => check
Applied

XSA 303 : Some patches not applied => check
Applied

XSA 304 : All patches found (no need to check)
XSA 305 : All patches found (no need to check)

Regards
Lars

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] xen/x86: vpmu: Unmap per-vCPU PMU page when the domain is destroyed

2019-11-26 Thread Paul Durrant
From: Julien Grall 

A guest will setup a shared page with the hypervisor for each vCPU via
XENPMU_init. The page will then get mapped in the hypervisor and only
released when XEMPMU_finish is called.

This means that if the guest is not shutdown gracefully (such as via xl
destroy), the page will stay mapped in the hypervisor. One of the
consequence is the domain can never be fully destroyed as some of its
memory is still mapped.

As Xen should never rely on the guest to correctly clean-up any
allocation in the hypervisor, we should also unmap pages during the
domain destruction if there are any left.

We can re-use the same logic as in pvpmu_finish(). To avoid
duplication, move the logic in a new function that can also be called
from vpmu_destroy().

NOTE: The call to vpmu_destroy() must also be moved from
  arch_vcpu_destroy() into domain_relinquish_resources() such that the
  mapped page does not prevent domain_destroy() (which calls
  arch_vcpu_destroy()) from being called.

Signed-off-by: Julien Grall 
Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Wei Liu 
Cc: "Roger Pau Monné" 
---
 xen/arch/x86/cpu/vpmu.c | 45 +++--
 xen/arch/x86/domain.c   |  6 +++---
 2 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index f397183ec3..9ae4ed48c8 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -578,9 +578,32 @@ static void vpmu_arch_destroy(struct vcpu *v)
 }
 }
 
-void vpmu_destroy(struct vcpu *v)
+static void vpmu_cleanup(struct vcpu *v)
 {
+struct vpmu_struct *vpmu = vcpu_vpmu(v);
+mfn_t mfn;
+void *xenpmu_data;
+
+spin_lock(>vpmu_lock);
+
 vpmu_arch_destroy(v);
+xenpmu_data = vpmu->xenpmu_data;
+vpmu->xenpmu_data = NULL;
+
+spin_unlock(>vpmu_lock);
+
+if ( xenpmu_data )
+{
+mfn = domain_page_map_to_mfn(xenpmu_data);
+ASSERT(mfn_valid(mfn));
+unmap_domain_page_global(xenpmu_data);
+put_page_and_type(mfn_to_page(mfn));
+}
+}
+
+void vpmu_destroy(struct vcpu *v)
+{
+vpmu_cleanup(v);
 
 put_vpmu(v);
 }
@@ -639,9 +662,6 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t 
*params)
 static void pvpmu_finish(struct domain *d, xen_pmu_params_t *params)
 {
 struct vcpu *v;
-struct vpmu_struct *vpmu;
-mfn_t mfn;
-void *xenpmu_data;
 
 if ( (params->vcpu >= d->max_vcpus) || (d->vcpu[params->vcpu] == NULL) )
 return;
@@ -650,22 +670,7 @@ static void pvpmu_finish(struct domain *d, 
xen_pmu_params_t *params)
 if ( v != current )
 vcpu_pause(v);
 
-vpmu = vcpu_vpmu(v);
-spin_lock(>vpmu_lock);
-
-vpmu_arch_destroy(v);
-xenpmu_data = vpmu->xenpmu_data;
-vpmu->xenpmu_data = NULL;
-
-spin_unlock(>vpmu_lock);
-
-if ( xenpmu_data )
-{
-mfn = domain_page_map_to_mfn(xenpmu_data);
-ASSERT(mfn_valid(mfn));
-unmap_domain_page_global(xenpmu_data);
-put_page_and_type(mfn_to_page(mfn));
-}
+vpmu_cleanup(v);
 
 if ( v != current )
 vcpu_unpause(v);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f1dd86e12e..1d75b2e6c3 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -454,9 +454,6 @@ void arch_vcpu_destroy(struct vcpu *v)
 xfree(v->arch.msrs);
 v->arch.msrs = NULL;
 
-if ( !is_idle_domain(v->domain) )
-vpmu_destroy(v);
-
 if ( is_hvm_vcpu(v) )
 hvm_vcpu_destroy(v);
 else
@@ -2224,6 +2221,9 @@ int domain_relinquish_resources(struct domain *d)
 if ( is_hvm_domain(d) )
 hvm_domain_relinquish_resources(d);
 
+for_each_vcpu ( d, v )
+vpmu_destroy(v);
+
 return 0;
 }
 
-- 
2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.13 1/2] python/xc.c: Remove trailing whitespace

2019-11-26 Thread George Dunlap
No functional change.

Signed-off-by: George Dunlap 
---
CC: Marek Marczykowski-Górecki 
CC: Juergen Gross 
---
 tools/python/xen/lowlevel/xc/xc.c | 210 +++---
 1 file changed, 105 insertions(+), 105 deletions(-)

diff --git a/tools/python/xen/lowlevel/xc/xc.c 
b/tools/python/xen/lowlevel/xc/xc.c
index 44d3606141..6d2afd5695 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -1,6 +1,6 @@
 /**
  * Xc.c
- * 
+ *
  * Copyright (c) 2003-2004, K A Fraser (University of Cambridge)
  */
 
@@ -107,7 +107,7 @@ static PyObject *pyxc_domain_dumpcore(XcObject *self, 
PyObject *args)
 
 if ( xc_domain_dumpcore(self->xc_handle, dom, corefile) != 0 )
 return pyxc_error_to_exception(self->xc_handle);
-
+
 Py_INCREF(zero);
 return zero;
 }
@@ -141,7 +141,7 @@ static PyObject *pyxc_domain_create(XcObject *self,
 return NULL;
 if ( pyhandle != NULL )
 {
-if ( !PyList_Check(pyhandle) || 
+if ( !PyList_Check(pyhandle) ||
  (PyList_Size(pyhandle) != sizeof(xen_domain_handle_t)) )
 goto out_exception;
 
@@ -188,7 +188,7 @@ static PyObject *pyxc_domain_max_vcpus(XcObject *self, 
PyObject *args)
 
 if (xc_domain_max_vcpus(self->xc_handle, dom, max) != 0)
 return pyxc_error_to_exception(self->xc_handle);
-
+
 Py_INCREF(zero);
 return zero;
 }
@@ -223,7 +223,7 @@ static PyObject *pyxc_domain_shutdown(XcObject *self, 
PyObject *args)
 
 if ( xc_domain_shutdown(self->xc_handle, dom, reason) != 0 )
 return pyxc_error_to_exception(self->xc_handle);
-
+
 Py_INCREF(zero);
 return zero;
 }
@@ -255,7 +255,7 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self,
 
 static char *kwd_list[] = { "domid", "vcpu", "cpumap", NULL };
 
-if ( !PyArg_ParseTupleAndKeywords(args, kwds, "i|iO", kwd_list, 
+if ( !PyArg_ParseTupleAndKeywords(args, kwds, "i|iO", kwd_list,
   , , ) )
 return NULL;
 
@@ -269,7 +269,7 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self,
 
 if ( (cpulist != NULL) && PyList_Check(cpulist) )
 {
-for ( i = 0; i < PyList_Size(cpulist); i++ ) 
+for ( i = 0; i < PyList_Size(cpulist); i++ )
 {
 long cpu = PyLongOrInt_AsLong(PyList_GetItem(cpulist, i));
 if ( cpu < 0 || cpu >= nr_cpus )
@@ -282,7 +282,7 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self,
 cpumap[cpu / 8] |= 1 << (cpu % 8);
 }
 }
-  
+
 if ( xc_vcpu_setaffinity(self->xc_handle, dom, vcpu, cpumap,
  NULL, XEN_VCPUAFFINITY_HARD) != 0 )
 {
@@ -290,7 +290,7 @@ static PyObject *pyxc_vcpu_setaffinity(XcObject *self,
 return pyxc_error_to_exception(self->xc_handle);
 }
 Py_INCREF(zero);
-free(cpumap); 
+free(cpumap);
 return zero;
 }
 
@@ -304,7 +304,7 @@ static PyObject *pyxc_domain_sethandle(XcObject *self, 
PyObject *args)
 if (!PyArg_ParseTuple(args, "iO", , ))
 return NULL;
 
-if ( !PyList_Check(pyhandle) || 
+if ( !PyList_Check(pyhandle) ||
  (PyList_Size(pyhandle) != sizeof(xen_domain_handle_t)) )
 {
 goto out_exception;
@@ -320,7 +320,7 @@ static PyObject *pyxc_domain_sethandle(XcObject *self, 
PyObject *args)
 
 if (xc_domain_sethandle(self->xc_handle, dom, handle) < 0)
 return pyxc_error_to_exception(self->xc_handle);
-
+
 Py_INCREF(zero);
 return zero;
 
@@ -342,7 +342,7 @@ static PyObject *pyxc_domain_getinfo(XcObject *self,
 xc_dominfo_t *info;
 
 static char *kwd_list[] = { "first_dom", "max_doms", NULL };
-
+
 if ( !PyArg_ParseTupleAndKeywords(args, kwds, "|ii", kwd_list,
   _dom, _doms) )
 return NULL;
@@ -415,7 +415,7 @@ static PyObject *pyxc_vcpu_getinfo(XcObject *self,
 int nr_cpus;
 
 static char *kwd_list[] = { "domid", "vcpu", NULL };
-
+
 if ( !PyArg_ParseTupleAndKeywords(args, kwds, "i|i", kwd_list,
   , ) )
 return NULL;
@@ -470,7 +470,7 @@ static PyObject *pyxc_hvm_param_get(XcObject *self,
 int param;
 uint64_t value;
 
-static char *kwd_list[] = { "domid", "param", NULL }; 
+static char *kwd_list[] = { "domid", "param", NULL };
 if ( !PyArg_ParseTupleAndKeywords(args, kwds, "ii", kwd_list,
   , ) )
 return NULL;
@@ -490,7 +490,7 @@ static PyObject *pyxc_hvm_param_set(XcObject *self,
 int param;
 uint64_t value;
 
-static char *kwd_list[] = { "domid", "param", "value", NULL }; 
+static char *kwd_list[] = { "domid", "param", "value", NULL };
 if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iiL", kwd_list,
   , , ) )
 return NULL;
@@ -660,7 +660,7 @@ static PyObject 

[Xen-devel] [PATCH for-4.13 2/2] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-26 Thread George Dunlap
Xen used to have single, system-wide limits for the number of grant
frames and maptrack frames a guest was allowed to create.  Increasing
or decreasing this single limit on the Xen command-line would change
the limit for all guests on the system.

Later, per-domain limits for these values was created.  The
system-wide limits became strict limits: domains could not be created
with higher limits, but could be created with lower limits.

However, the change also introduced a range of different "default"
values into various places in the toolstack:

- The python libxc bindings hard-coded these values to 32 and 1024,
  respectively

- The libxl default values are 32 and 1024 respectively.

- xl will use the libxl default for maptrack, but does its own default
  calculation for grant frames: either 32 or 64, based on the max
  possible mfn.

These defaults interact poorly with the hypervisor command-line limit:

- The hypervisor command-line limit cannot be used to raise the limit
  for all guests anymore, as the default in the toolstack will
  effectively override this.

- If you use the hypervisor command-line limit to *reduce* the limit,
  then the "default" values generated by the toolstack are too high,
  and all guest creations will fail.

In other words, the toolstack defaults require any change to be
effected by having the admin explicitly specify a new value in every
guest.

In order to address this, have grant_table_init treat '0' values for
max_grant_frames and max_maptrack_frames as instructions to use the
system-wide default.  Have all the above toolstacks default to passing
0 unless a different value is explicitly given.

This restores the old behavior, that changing the hypervisor
command-line option can change the behavior for all guests, while
retaining the ability to set per-guest values.  It also removes the
bug that *reducing* the system-wide max will cause all domains without
explicit limits to fail.

(The ocaml bindings require the caller to always specify a value, and
the code to start a xenstored stubdomain hard-codes these to 4 and 128
respectively; these will not be addressed here.)

Signed-off-by: George Dunlap 
---
Release justification: This is an observed regression (albeit one that
has spanned several releases now).

Compile-tested only.

NB this patch could be applied without the whitespace fixes (perhaps
with some fix-ups); it's just easier since my editor strips trailing
whitespace out automatically.

CC: Ian Jackson 
CC: Wei Liu 
CC: Andrew Cooper 
CC: Jan Beulich 
CC: Paul Durrant 
CC: Julien Grall 
CC: Konrad Rzeszutek Wilk 
CC: Stefano Stabellini 
CC: Juergen Gross 
CC: Marek Marczykowski-Górecki 
---
 tools/libxl/libxl.h   |  4 ++--
 tools/python/xen/lowlevel/xc/xc.c |  2 --
 tools/xl/xl.c | 12 ++--
 xen/common/grant_table.c  |  7 +++
 xen/include/public/domctl.h   |  6 --
 5 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 49b56fa1a3..1648d337e7 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -364,8 +364,8 @@
  */
 #define LIBXL_HAVE_BUILDINFO_GRANT_LIMITS 1
 
-#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 32
-#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 1024
+#define LIBXL_MAX_GRANT_FRAMES_DEFAULT 0
+#define LIBXL_MAX_MAPTRACK_FRAMES_DEFAULT 0
 
 /*
  * LIBXL_HAVE_BUILDINFO_* indicates that libxl_domain_build_info has
diff --git a/tools/python/xen/lowlevel/xc/xc.c 
b/tools/python/xen/lowlevel/xc/xc.c
index 6d2afd5695..0f861872ce 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -127,8 +127,6 @@ static PyObject *pyxc_domain_create(XcObject *self,
 },
 .max_vcpus = 1,
 .max_evtchn_port = -1, /* No limit. */
-.max_grant_frames = 32,
-.max_maptrack_frames = 1024,
 };
 
 static char *kwd_list[] = { "domid", "ssidref", "handle", "flags",
diff --git a/tools/xl/xl.c b/tools/xl/xl.c
index ddd29b3f1b..b6e220184d 100644
--- a/tools/xl/xl.c
+++ b/tools/xl/xl.c
@@ -51,8 +51,8 @@ libxl_bitmap global_pv_affinity_mask;
 enum output_format default_output_format = OUTPUT_FORMAT_JSON;
 int claim_mode = 1;
 bool progress_use_cr = 0;
-int max_grant_frames = -1;
-int max_maptrack_frames = -1;
+int max_grant_frames = 0;
+int max_maptrack_frames = 0;
 
 xentoollog_level minmsglevel = minmsglevel_default;
 
@@ -96,7 +96,6 @@ static void parse_global_config(const char *configfile,
 XLU_Config *config;
 int e;
 const char *buf;
-libxl_physinfo physinfo;
 
 config = xlu_cfg_init(stderr, configfile);
 if (!config) {
@@ -199,13 +198,6 @@ static void parse_global_config(const char *configfile,
 
 if (!xlu_cfg_get_long (config, "max_grant_frames", , 0))
 max_grant_frames = l;
-else {
-libxl_physinfo_init();
-max_grant_frames = (libxl_get_physinfo(ctx, ) != 0 ||
-!(physinfo.max_possible_mfn >> 32))
-

Re: [Xen-devel] [PATCH for-4.13 v2] AMD/IOMMU: honour IR setting while pre-filling DTEs

2019-11-26 Thread Andrew Cooper
On 26/11/2019 17:08, Igor Druzhinin wrote:
> IV bit shouldn't be set in DTE if interrupt remapping is not
> enabled. It's a regression in behavior of "iommu=no-intremap"
> option which otherwise would keep interrupt requests untranslated
> for all of the devices in the system regardless of wether it's
> described as valid in IVRS or not.
>
> Signed-off-by: Igor Druzhinin 

Reviewed-by: Andrew Cooper 

Juergen: 4.13 justification - this is a regression from 4.12.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.13 v2] AMD/IOMMU: honour IR setting while pre-filling DTEs

2019-11-26 Thread Igor Druzhinin
IV bit shouldn't be set in DTE if interrupt remapping is not
enabled. It's a regression in behavior of "iommu=no-intremap"
option which otherwise would keep interrupt requests untranslated
for all of the devices in the system regardless of wether it's
described as valid in IVRS or not.

Signed-off-by: Igor Druzhinin 
---
 xen/drivers/passthrough/amd/iommu_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/amd/iommu_init.c 
b/xen/drivers/passthrough/amd/iommu_init.c
index 16e84d4..2b81e38 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table(
 for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf )
 dt[bdf] = (struct amd_iommu_dte){
   .v = true,
-  .iv = true,
+  .iv = iommu_intremap,
   };
 }
 
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v3 1/2] x86/vmx: add ASSERT to prevent syncing PIR to IRR...

2019-11-26 Thread Jan Beulich
On 26.11.2019 17:47, Roger Pau Monné  wrote:
> On Tue, Nov 26, 2019 at 05:32:04PM +0100, Jan Beulich wrote:
>> On 26.11.2019 14:26, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -2054,6 +2054,19 @@ static void vmx_sync_pir_to_irr(struct vcpu *v)
>>>  unsigned int group, i;
>>>  DECLARE_BITMAP(pending_intr, NR_VECTORS);
>>>  
>>> +if ( v != current && !atomic_read(>pause_count) )
>>> +{
>>> +/*
>>> + * Syncing PIR to IRR must not be done behind the back of the CPU,
>>> + * since the IRR is controlled by the hardware when the vCPU is
>>> + * executing. Only allow Xen to do such sync if the vCPU is the 
>>> current
>>> + * one or if it's paused: that's required in order to sync the 
>>> lapic
>>> + * state before saving it.
>>> + */
>>
>> Is this stated this way by the SDM anywhere?
> 
> No, I think the SDM is not very clear on this, there's a paragraph
> about PIR:
> 
> "The logical processor performs a logical-OR of PIR into VIRR and
> clears PIR. No other agent can read or write a PIR bit (or group of
> bits) between the time it is read (to determine what to OR into VIRR)
> and when it is cleared."

Well, this is about PIR, but my question was rather towards the
effects on vIRR.

>> I ask because the
>> comment then really doesn't apply to just this function, but to
>> vlapic_{,test_and_}{set,clear}_vector() more generally. It's
>> not clear to me at all whether the CPU caches (in an incoherent
>> fashion) IRR (and maybe other APIC page elements), rather than
>> honoring the atomic updates these macros do.
> 
> IMO syncing PIR to IRR when the vCPU is running on a different pCPU is
> likely to at least defeat the purpose of posted interrupts:

I agree here.

> when the
> CPU receives the posted interrupt vector it won't see the
> outstanding-notification bit in the posted-interrupt descriptor
> because the sync done from a different pCPU would have cleared it, at
> which point it's not clear to me that the processor will check vIRR
> for pending interrupts. The description in section 29.6
> POSTED-INTERRUPT PROCESSING doesn't explicitly mention whether the
> value of the outstanding-notification bit affects the logic of posted
> interrupt processing.

But overall this again is all posted interrupt centric when my
question was about vIRR, in particular whether the asserting you
add may need to be even more rigid.

Anyway, let's see what the VMX maintainers have to say.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v3 2/2] x86/vmx: always sync PIR to IRR before vmentry

2019-11-26 Thread Jan Beulich
On 26.11.2019 14:26, Roger Pau Monne wrote:
> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -515,7 +515,11 @@ void hvm_set_callback_via(struct domain *d, uint64_t via)
>  struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
>  {
>  struct hvm_domain *plat = >domain->arch.hvm;
> -int vector;
> +/*
> + * Always call vlapic_has_pending_irq so that PIR is synced into IRR when
> + * using posted interrupts.
> + */
> +int vector = vlapic_has_pending_irq(v);

Did you consider doing this conditionally either here ...

> @@ -530,7 +534,6 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu *v)
>  if ( vlapic_accept_pic_intr(v) && plat->vpic[0].int_output )
>  return hvm_intack_pic(0);
>  
> -vector = vlapic_has_pending_irq(v);
>  if ( vector != -1 )
>  return hvm_intack_lapic(vector);

... or here? I ask not only because the function isn't exactly
cheap to call (as iirc you did also mention during the v2
discussion), but also because of its interaction with Viridian
and nested mode. In case of problems there, avoiding the use
of interrupt posting would be a workaround in such cases then.

> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1945,57 +1945,31 @@ static void vmx_process_isr(int isr, struct vcpu *v)
>  
>  static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>  {
> -bool_t running = v->is_running;
> -
>  vcpu_unblock(v);
> -/*
> - * Just like vcpu_kick(), nothing is needed for the following two cases:
> - * 1. The target vCPU is not running, meaning it is blocked or runnable.
> - * 2. The target vCPU is the current vCPU and we're in non-interrupt
> - * context.
> - */
> -if ( running && (in_irq() || (v != current)) )
> -{
> +if ( v->is_running && v != current )

I continue to be concerned by this evaluation of ->is_running
_after_ vcpu_unblock(), when previously (just like vcpu_kick()
does) it was intentionally done before. I wonder anyway
whether this and the change to irq.c should really be in a
single patch, the more that you start the respective part of
the description with "While there also simplify ...". But in
the end it is up to Kevin's or Jun's judgement anyway.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v3 1/2] x86/vmx: add ASSERT to prevent syncing PIR to IRR...

2019-11-26 Thread Roger Pau Monné
On Tue, Nov 26, 2019 at 05:32:04PM +0100, Jan Beulich wrote:
> On 26.11.2019 14:26, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -2054,6 +2054,19 @@ static void vmx_sync_pir_to_irr(struct vcpu *v)
> >  unsigned int group, i;
> >  DECLARE_BITMAP(pending_intr, NR_VECTORS);
> >  
> > +if ( v != current && !atomic_read(>pause_count) )
> > +{
> > +/*
> > + * Syncing PIR to IRR must not be done behind the back of the CPU,
> > + * since the IRR is controlled by the hardware when the vCPU is
> > + * executing. Only allow Xen to do such sync if the vCPU is the 
> > current
> > + * one or if it's paused: that's required in order to sync the 
> > lapic
> > + * state before saving it.
> > + */
> 
> Is this stated this way by the SDM anywhere?

No, I think the SDM is not very clear on this, there's a paragraph
about PIR:

"The logical processor performs a logical-OR of PIR into VIRR and
clears PIR. No other agent can read or write a PIR bit (or group of
bits) between the time it is read (to determine what to OR into VIRR)
and when it is cleared."

> I ask because the
> comment then really doesn't apply to just this function, but to
> vlapic_{,test_and_}{set,clear}_vector() more generally. It's
> not clear to me at all whether the CPU caches (in an incoherent
> fashion) IRR (and maybe other APIC page elements), rather than
> honoring the atomic updates these macros do.

IMO syncing PIR to IRR when the vCPU is running on a different pCPU is
likely to at least defeat the purpose of posted interrupts: when the
CPU receives the posted interrupt vector it won't see the
outstanding-notification bit in the posted-interrupt descriptor
because the sync done from a different pCPU would have cleared it, at
which point it's not clear to me that the processor will check vIRR
for pending interrupts. The description in section 29.6
POSTED-INTERRUPT PROCESSING doesn't explicitly mention whether the
value of the outstanding-notification bit affects the logic of posted
interrupt processing.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v3 0/2] x86/vmx: posted interrupt fixes

2019-11-26 Thread Roger Pau Monné
On Tue, Nov 26, 2019 at 02:26:46PM +0100, Roger Pau Monne wrote:
> Hello,
> 
> The following series aim to solve the issue reported by Joe Jin related
> to posted interrupts.

Regarding the release blockers email, and the qualification of this
series:

 - a regression introduced since 4.12

This is not a regression, since AFAICT the posted interrupt code has
always been like this.

 - a severe bug of a 4.13 feature

The bug seems to impact people using PCI-passthrough on Intel hardware
that supports posted interrupts (aka APICv). In my opinion, we either
fix it or disable APICv by default (now it's currently enabled by
default).

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v3 2/2] x86/vmx: always sync PIR to IRR before vmentry

2019-11-26 Thread Joe Jin
On 11/26/19 5:26 AM, Roger Pau Monne wrote:
> When using posted interrupts on Intel hardware it's possible that the
> vCPU resumes execution with a stale local APIC IRR register because
> depending on the interrupts to be injected vlapic_has_pending_irq
> might not be called, and thus PIR won't be synced into IRR.
> 
> Fix this by making sure PIR is always synced to IRR in
> hvm_vcpu_has_pending_irq regardless of what interrupts are pending.
> 
> While there also simplify the code in __vmx_deliver_posted_interrupt:
> only raise a softirq if the vCPU is the one currently running and
> __vmx_deliver_posted_interrupt is called from interrupt context. The
> softirq is raised to make sure vmx_intr_assist is retried if the
> interrupt happens to arrive after vmx_intr_assist but before
> interrupts are disabled in vmx_do_vmentry. Also simplify the logic for
> IPIing other pCPUs, there's no need to check v->processor since the
> IPI should be sent as long as the vCPU is not the current one and it's
> running.
> 
> Reported-by: Joe Jin 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Juergen Gross 

Patch works for me.
Tested-by: Joe Jin 

Thanks,
Joe

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v3 1/2] x86/vmx: add ASSERT to prevent syncing PIR to IRR...

2019-11-26 Thread Jan Beulich
On 26.11.2019 14:26, Roger Pau Monne wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2054,6 +2054,19 @@ static void vmx_sync_pir_to_irr(struct vcpu *v)
>  unsigned int group, i;
>  DECLARE_BITMAP(pending_intr, NR_VECTORS);
>  
> +if ( v != current && !atomic_read(>pause_count) )
> +{
> +/*
> + * Syncing PIR to IRR must not be done behind the back of the CPU,
> + * since the IRR is controlled by the hardware when the vCPU is
> + * executing. Only allow Xen to do such sync if the vCPU is the 
> current
> + * one or if it's paused: that's required in order to sync the lapic
> + * state before saving it.
> + */

Is this stated this way by the SDM anywhere? I ask because the
comment then really doesn't apply to just this function, but to
vlapic_{,test_and_}{set,clear}_vector() more generally. It's
not clear to me at all whether the CPU caches (in an incoherent
fashion) IRR (and maybe other APIC page elements), rather than
honoring the atomic updates these macros do.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] Block Tap driver

2019-11-26 Thread Roxana Nicolescu

Hello,


I am doing a bit of research on block tap, and I cannot find the source 
code of a blktap driver module in the linux kernel.


I see in the linux tree some references to commits related to blktap, 
but I am stuck with finding the actual code.


It would be really helpful to provide me some directions because I am 
completely lost.


I also noticed that blktap2 from tools was removed and added again 
later. What is the reason for that?


Thank you very much!


Best,

Roxana


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 v2] docs/xl: Document pci-assignable state

2019-11-26 Thread Ian Jackson
George Dunlap writes ("[PATCH for-4.13 v2] docs/xl: Document pci-assignable 
state"):
> Changesets 319f9a0ba9 ("passthrough: quarantine PCI devices") and
> ba2ab00bbb ("IOMMU: default to always quarantining PCI devices")
> introduced PCI device "quarantine" behavior, but did not document how
> the pci-assignable-add and -remove functions act in regard to this.
> Rectify this.
> 
> Signed-off-by: George Dunlap 
> Release-acked-by: Juergen Gross 

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP

2019-11-26 Thread Andrew Cooper
On 26/11/2019 16:14, Jan Beulich wrote:
> On 26.11.2019 17:11, Andrew Cooper wrote:
>> On 26/11/2019 16:05, Jan Beulich wrote:
>>> On 26.11.2019 16:59, Andrew Cooper wrote:
 On 26/11/2019 15:32, Jan Beulich wrote:
> On 26.11.2019 13:03, Andrew Cooper wrote:
>> ICEBP isn't handled well by SVM.
>>
>> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the
>> appropriate instruction boundary (fault or trap, as appropriate), except 
>> for
>> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP 
>> instruction
>> rather than after it.  As ICEBP isn't distinguished in the vectoring 
>> event
>> type, the state is ambiguous.
>>
>> To add to the confusion, an ICEBP which occurs due to Introspection
>> intercepting the instruction, or from x86_emulate() will have %rip 
>> updated as
>> a consequence of partial emulation required to inject an ICEBP event in 
>> the
>> first place.
>>
>> We could in principle spot the non-injected case in the TASK_SWITCH 
>> handler,
>> but this still results in complexity if the ICEBP instruction also has an
>> Instruction Breakpoint active on it (which genuinely has fault 
>> semantics).
>>
>> Unconditionally intercept ICEBP.  This does have a trap semantics for the
>> intercept, and allows us to move %rip forwards appropriately before the
>> TASK_SWITCH intercept is hit.
> Both because of you mentioning the moving forwards of %rip and with the
> irc discussion in mind that we had no irc, don't you mean "fault
> semantics" here?
 ICEBP really is too broken under SVM to handle architecturally.

 The ICEBP intercept has nRIP decode support, because it is an
 instruction intercept.  We emulate the injection (because it is ICEBP),
 which means we re-enter the guest with %rip moved forward, and #DB
 (HW_EXCEPTION) pending for injection.  This means that...

>  If so
> Reviewed-by: Jan Beulich 
 ... the ICEBP-#DB-vectored TASK_SWITCH will now find %rip pointing after
 the ICEBP instruction, rather than at it, making it consistent with
 every other #DB-vectored TASK_SWITCH.

 This does means that an early task-switch fault for ICEBP will reliably
 be delivered with the wrong (i.e. trap) semantics, but this is less bad
 than mixed fault/trap semantics depending on whether the source of the
 ICEBP was introspection/emulation or native execution.

 We could restore proper fault behaviour by extending
 svm_emul_swint_injection() to figure out that a task switch is needed,
 and invoke hvm_task_switch() directly, but I don't have enough TUITS
 right now.

> Otherwise I guess I'm still missing something.
 I hope this clears it up.
>>> Well, it helps, but you don't really answer the question: Is "trap"
>>> in that sentence of the description really correct? I.e. don't you
>>> instead mean "fault" there?
>> I've reworded that bit to:
>>
>> Unconditionally intercept ICEBP.  This does have NRIPs support as it is an
>> instruction intercept, which allows us allows us to move %rip forwards
>> appropriately before the TASK_SWITCH intercept is hit.  This allows...
>>
>> Any better?
> Ah yes, thanks. (But please drop one of the two "allows us".)

Oops yes.  Irritatingly, that causes #DB-vectored to move onto a new
line, and trigger Git's comment syntax.  I'll tweak a little bit more.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] 4.13 Release blockers

2019-11-26 Thread Jürgen Groß

In order to be able to release Xen 4.13 we need to get all regressions
fixed rather soon. I know there are quite some patches waiting to be
taken for 4.13.

So please, don't tag any further patches as "for 4.13" if they are not
fixing either:

- a regression introduced since 4.12
- a severe bug of a 4.13 feature

I'd like to ask all patch authors who have pending patches "for 4.13"
to reply to their patches clearly stating whether the patch qualifies
for 4.13 regarding above rules.

For any such pending patches I'd like the maintainers to review those
patches at high priority.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] xen/blkback: Avoid unmapping unmapped grant pages

2019-11-26 Thread SeongJae Park
From: SeongJae Park 

For each I/O request, blkback first maps the foreign pages for the
request to its local pages.  If an allocation of a local page for the
mapping fails, it should unmap every mapping already made for the
request.

However, blkback's handling mechanism for the allocation failure does
not mark the remaining foreign pages as unmapped.  Therefore, the unmap
function merely tries to unmap every valid grant page for the request,
including the pages not mapped due to the allocation failure.  On a
system that fails the allocation frequently, this problem leads to
following kernel crash.

  [  372.012538] BUG: unable to handle kernel NULL pointer dereference at 
0001
  [  372.012546] IP: [] gnttab_unmap_refs.part.7+0x1c/0x40
  [  372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0
  [  372.012562] Oops: 0002 [#1] SMP
  [  372.012566] Modules linked in: act_police sch_ingress cls_u32
  ...
  [  372.012746] Call Trace:
  [  372.012752]  [] gnttab_unmap_refs+0x34/0x40
  [  372.012759]  [] xen_blkbk_unmap+0x83/0x150 [xen_blkback]
  ...
  [  372.012802]  [] dispatch_rw_block_io+0x970/0x980 
[xen_blkback]
  ...
  Decompressing Linux... Parsing ELF... done.
  Booting the kernel.
  [0.00] Initializing cgroup subsys cpuset

This commit fixes this problem by marking the grant pages of the given
request that didn't mapped due to the allocation failure as invalid.

Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings")

Signed-off-by: SeongJae Park 
Reviewed-by: David Woodhouse 
Reviewed-by: Maximilian Heyne 
Reviewed-by: Paul Durrant 
---
 drivers/block/xen-blkback/blkback.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index fd1e19f1a49f..3666afa639d1 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -936,6 +936,8 @@ static int xen_blkbk_map(struct xen_blkif_ring *ring,
 out_of_memory:
pr_alert("%s: out of memory\n", __func__);
put_free_pages(ring, pages_to_gnt, segs_to_map);
+   for (i = last_map; i < num; i++)
+   pages[i]->handle = BLKBACK_INVALID_HANDLE;
return -ENOMEM;
 }
 
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP

2019-11-26 Thread Jan Beulich
On 26.11.2019 17:11, Andrew Cooper wrote:
> On 26/11/2019 16:05, Jan Beulich wrote:
>> On 26.11.2019 16:59, Andrew Cooper wrote:
>>> On 26/11/2019 15:32, Jan Beulich wrote:
 On 26.11.2019 13:03, Andrew Cooper wrote:
> ICEBP isn't handled well by SVM.
>
> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the
> appropriate instruction boundary (fault or trap, as appropriate), except 
> for
> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP 
> instruction
> rather than after it.  As ICEBP isn't distinguished in the vectoring event
> type, the state is ambiguous.
>
> To add to the confusion, an ICEBP which occurs due to Introspection
> intercepting the instruction, or from x86_emulate() will have %rip 
> updated as
> a consequence of partial emulation required to inject an ICEBP event in 
> the
> first place.
>
> We could in principle spot the non-injected case in the TASK_SWITCH 
> handler,
> but this still results in complexity if the ICEBP instruction also has an
> Instruction Breakpoint active on it (which genuinely has fault semantics).
>
> Unconditionally intercept ICEBP.  This does have a trap semantics for the
> intercept, and allows us to move %rip forwards appropriately before the
> TASK_SWITCH intercept is hit.
 Both because of you mentioning the moving forwards of %rip and with the
 irc discussion in mind that we had no irc, don't you mean "fault
 semantics" here?
>>> ICEBP really is too broken under SVM to handle architecturally.
>>>
>>> The ICEBP intercept has nRIP decode support, because it is an
>>> instruction intercept.  We emulate the injection (because it is ICEBP),
>>> which means we re-enter the guest with %rip moved forward, and #DB
>>> (HW_EXCEPTION) pending for injection.  This means that...
>>>
  If so
 Reviewed-by: Jan Beulich 
>>> ... the ICEBP-#DB-vectored TASK_SWITCH will now find %rip pointing after
>>> the ICEBP instruction, rather than at it, making it consistent with
>>> every other #DB-vectored TASK_SWITCH.
>>>
>>> This does means that an early task-switch fault for ICEBP will reliably
>>> be delivered with the wrong (i.e. trap) semantics, but this is less bad
>>> than mixed fault/trap semantics depending on whether the source of the
>>> ICEBP was introspection/emulation or native execution.
>>>
>>> We could restore proper fault behaviour by extending
>>> svm_emul_swint_injection() to figure out that a task switch is needed,
>>> and invoke hvm_task_switch() directly, but I don't have enough TUITS
>>> right now.
>>>
 Otherwise I guess I'm still missing something.
>>> I hope this clears it up.
>> Well, it helps, but you don't really answer the question: Is "trap"
>> in that sentence of the description really correct? I.e. don't you
>> instead mean "fault" there?
> 
> I've reworded that bit to:
> 
> Unconditionally intercept ICEBP.  This does have NRIPs support as it is an
> instruction intercept, which allows us allows us to move %rip forwards
> appropriately before the TASK_SWITCH intercept is hit.  This allows...
> 
> Any better?

Ah yes, thanks. (But please drop one of the two "allows us".)

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP

2019-11-26 Thread Andrew Cooper
On 26/11/2019 16:05, Jan Beulich wrote:
> On 26.11.2019 16:59, Andrew Cooper wrote:
>> On 26/11/2019 15:32, Jan Beulich wrote:
>>> On 26.11.2019 13:03, Andrew Cooper wrote:
 ICEBP isn't handled well by SVM.

 The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the
 appropriate instruction boundary (fault or trap, as appropriate), except 
 for
 an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP 
 instruction
 rather than after it.  As ICEBP isn't distinguished in the vectoring event
 type, the state is ambiguous.

 To add to the confusion, an ICEBP which occurs due to Introspection
 intercepting the instruction, or from x86_emulate() will have %rip updated 
 as
 a consequence of partial emulation required to inject an ICEBP event in the
 first place.

 We could in principle spot the non-injected case in the TASK_SWITCH 
 handler,
 but this still results in complexity if the ICEBP instruction also has an
 Instruction Breakpoint active on it (which genuinely has fault semantics).

 Unconditionally intercept ICEBP.  This does have a trap semantics for the
 intercept, and allows us to move %rip forwards appropriately before the
 TASK_SWITCH intercept is hit.
>>> Both because of you mentioning the moving forwards of %rip and with the
>>> irc discussion in mind that we had no irc, don't you mean "fault
>>> semantics" here?
>> ICEBP really is too broken under SVM to handle architecturally.
>>
>> The ICEBP intercept has nRIP decode support, because it is an
>> instruction intercept.  We emulate the injection (because it is ICEBP),
>> which means we re-enter the guest with %rip moved forward, and #DB
>> (HW_EXCEPTION) pending for injection.  This means that...
>>
>>>  If so
>>> Reviewed-by: Jan Beulich 
>> ... the ICEBP-#DB-vectored TASK_SWITCH will now find %rip pointing after
>> the ICEBP instruction, rather than at it, making it consistent with
>> every other #DB-vectored TASK_SWITCH.
>>
>> This does means that an early task-switch fault for ICEBP will reliably
>> be delivered with the wrong (i.e. trap) semantics, but this is less bad
>> than mixed fault/trap semantics depending on whether the source of the
>> ICEBP was introspection/emulation or native execution.
>>
>> We could restore proper fault behaviour by extending
>> svm_emul_swint_injection() to figure out that a task switch is needed,
>> and invoke hvm_task_switch() directly, but I don't have enough TUITS
>> right now.
>>
>>> Otherwise I guess I'm still missing something.
>> I hope this clears it up.
> Well, it helps, but you don't really answer the question: Is "trap"
> in that sentence of the description really correct? I.e. don't you
> instead mean "fault" there?

I've reworded that bit to:

Unconditionally intercept ICEBP.  This does have NRIPs support as it is an
instruction intercept, which allows us allows us to move %rip forwards
appropriately before the TASK_SWITCH intercept is hit.  This allows...

Any better?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP

2019-11-26 Thread Andrew Cooper
On 26/11/2019 15:34, Roger Pau Monné wrote:
> On Tue, Nov 26, 2019 at 12:03:56PM +, Andrew Cooper wrote:
>> ICEBP isn't handled well by SVM.
>>
>> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the
>> appropriate instruction boundary (fault or trap, as appropriate), except for
>> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP instruction
>> rather than after it.  As ICEBP isn't distinguished in the vectoring event
>> type, the state is ambiguous.
>>
>> To add to the confusion, an ICEBP which occurs due to Introspection
>> intercepting the instruction, or from x86_emulate() will have %rip updated as
>> a consequence of partial emulation required to inject an ICEBP event in the
>> first place.
>>
>> We could in principle spot the non-injected case in the TASK_SWITCH handler,
>> but this still results in complexity if the ICEBP instruction also has an
>> Instruction Breakpoint active on it (which genuinely has fault semantics).
>>
>> Unconditionally intercept ICEBP.  This does have a trap semantics for the
>> intercept, and allows us to move %rip forwards appropriately before the
>> TASK_SWITCH intercept is hit.  This makes the behaviour of #DB-vectored
>> switches consistent however the ICEBP #DB came about, and avoids special 
>> cases
>> in the TASK_SWITCH intercept.
>>
>> This in turn allows for the removal of the conditional
>> hvm_set_icebp_interception() logic used by the monitor subsystem, as ICEBP's
>> will now always be submitted for monitoring checks.
>>
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Roger Pau Monné 
>
> AFAICT this brings AMD implementation inline with Intel that also will
> unconditionally vmexit on icebp?

VT-x and SVM handle things quite differently.

VT-x has no instruction intercept for ICEBP, but the #DB intercept will
triggered by an ICEBP instruction.  ICEBP has its own event type
(Privileged Software Exception, which is an amusing name considering it
is an unprivleged instruction, bypasses privilege checks, and sets the
External bit in an error code).

SVM does have an instruction intercept for ICEBP, but the #DB from
ICEBP's don't trigger the normal #DB intercept.  However, secondary
#DB's generated by ICEBP's unintercepted #DB do trigger the #DB intercept.

For safety reasons we must intercept #DB to prevent CPU deadlocks.  This
means that ICEBP are in practice always intercepted on Intel due to
their #DB side effect, but they weren't intercepted on AMD, which is why
the monitor subsystem had a way of turning interception on.

So yes, the overall effect is that ICEBPs will now unconditionally
vmexit on both Intel and AMD, but underlying mechanism for why they
vmexit is still vendor-specific.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP

2019-11-26 Thread Jan Beulich
On 26.11.2019 16:59, Andrew Cooper wrote:
> On 26/11/2019 15:32, Jan Beulich wrote:
>> On 26.11.2019 13:03, Andrew Cooper wrote:
>>> ICEBP isn't handled well by SVM.
>>>
>>> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the
>>> appropriate instruction boundary (fault or trap, as appropriate), except for
>>> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP instruction
>>> rather than after it.  As ICEBP isn't distinguished in the vectoring event
>>> type, the state is ambiguous.
>>>
>>> To add to the confusion, an ICEBP which occurs due to Introspection
>>> intercepting the instruction, or from x86_emulate() will have %rip updated 
>>> as
>>> a consequence of partial emulation required to inject an ICEBP event in the
>>> first place.
>>>
>>> We could in principle spot the non-injected case in the TASK_SWITCH handler,
>>> but this still results in complexity if the ICEBP instruction also has an
>>> Instruction Breakpoint active on it (which genuinely has fault semantics).
>>>
>>> Unconditionally intercept ICEBP.  This does have a trap semantics for the
>>> intercept, and allows us to move %rip forwards appropriately before the
>>> TASK_SWITCH intercept is hit.
>> Both because of you mentioning the moving forwards of %rip and with the
>> irc discussion in mind that we had no irc, don't you mean "fault
>> semantics" here?
> 
> ICEBP really is too broken under SVM to handle architecturally.
> 
> The ICEBP intercept has nRIP decode support, because it is an
> instruction intercept.  We emulate the injection (because it is ICEBP),
> which means we re-enter the guest with %rip moved forward, and #DB
> (HW_EXCEPTION) pending for injection.  This means that...
> 
>>  If so
>> Reviewed-by: Jan Beulich 
> 
> ... the ICEBP-#DB-vectored TASK_SWITCH will now find %rip pointing after
> the ICEBP instruction, rather than at it, making it consistent with
> every other #DB-vectored TASK_SWITCH.
> 
> This does means that an early task-switch fault for ICEBP will reliably
> be delivered with the wrong (i.e. trap) semantics, but this is less bad
> than mixed fault/trap semantics depending on whether the source of the
> ICEBP was introspection/emulation or native execution.
> 
> We could restore proper fault behaviour by extending
> svm_emul_swint_injection() to figure out that a task switch is needed,
> and invoke hvm_task_switch() directly, but I don't have enough TUITS
> right now.
> 
>> Otherwise I guess I'm still missing something.
> 
> I hope this clears it up.

Well, it helps, but you don't really answer the question: Is "trap"
in that sentence of the description really correct? I.e. don't you
instead mean "fault" there?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP

2019-11-26 Thread Andrew Cooper
On 26/11/2019 15:32, Jan Beulich wrote:
> On 26.11.2019 13:03, Andrew Cooper wrote:
>> ICEBP isn't handled well by SVM.
>>
>> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the
>> appropriate instruction boundary (fault or trap, as appropriate), except for
>> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP instruction
>> rather than after it.  As ICEBP isn't distinguished in the vectoring event
>> type, the state is ambiguous.
>>
>> To add to the confusion, an ICEBP which occurs due to Introspection
>> intercepting the instruction, or from x86_emulate() will have %rip updated as
>> a consequence of partial emulation required to inject an ICEBP event in the
>> first place.
>>
>> We could in principle spot the non-injected case in the TASK_SWITCH handler,
>> but this still results in complexity if the ICEBP instruction also has an
>> Instruction Breakpoint active on it (which genuinely has fault semantics).
>>
>> Unconditionally intercept ICEBP.  This does have a trap semantics for the
>> intercept, and allows us to move %rip forwards appropriately before the
>> TASK_SWITCH intercept is hit.
> Both because of you mentioning the moving forwards of %rip and with the
> irc discussion in mind that we had no irc, don't you mean "fault
> semantics" here?

ICEBP really is too broken under SVM to handle architecturally.

The ICEBP intercept has nRIP decode support, because it is an
instruction intercept.  We emulate the injection (because it is ICEBP),
which means we re-enter the guest with %rip moved forward, and #DB
(HW_EXCEPTION) pending for injection.  This means that...

>  If so
> Reviewed-by: Jan Beulich 

... the ICEBP-#DB-vectored TASK_SWITCH will now find %rip pointing after
the ICEBP instruction, rather than at it, making it consistent with
every other #DB-vectored TASK_SWITCH.

This does means that an early task-switch fault for ICEBP will reliably
be delivered with the wrong (i.e. trap) semantics, but this is less bad
than mixed fault/trap semantics depending on whether the source of the
ICEBP was introspection/emulation or native execution.

We could restore proper fault behaviour by extending
svm_emul_swint_injection() to figure out that a task switch is needed,
and invoke hvm_task_switch() directly, but I don't have enough TUITS
right now.

> Otherwise I guess I'm still missing something.

I hope this clears it up.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 for 4.13] x86/microcode: refuse to load the same revision ucode

2019-11-26 Thread Jan Beulich
On 26.11.2019 16:41, Sergey Dyasli wrote:
> Currently if a user tries to live-load the same or older ucode revision
> than CPU already has, he will get a single message in Xen log like:
> 
> (XEN) 128 cores are to update their microcode
> 
> No actual ucode loading will happen and this situation can be quite
> confusing. Fix this by starting ucode update only when the provided
> ucode revision is higher than the currently cached one (if any).
> This is based on the property that if microcode_cache exists, all CPUs
> in the system should have at least that ucode revision.
> 
> Additionally, print a user friendly message if no matching or newer
> ucode can be found in the provided blob. This also requires ignoring
> -ENODATA in AMD-side code, otherwise the message given to the user is:
> 
> (XEN) Parsing microcode blob error -61
> 
> Which actually means that a ucode blob was parsed fine, but no matching
> ucode was found.
> 
> Signed-off-by: Sergey Dyasli 

Acked-by: Jan Beulich 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.13 v2] docs/xl: Document pci-assignable state

2019-11-26 Thread George Dunlap
Changesets 319f9a0ba9 ("passthrough: quarantine PCI devices") and
ba2ab00bbb ("IOMMU: default to always quarantining PCI devices")
introduced PCI device "quarantine" behavior, but did not document how
the pci-assignable-add and -remove functions act in regard to this.
Rectify this.

Signed-off-by: George Dunlap 
Release-acked-by: Juergen Gross 
---
Release justification: This brings documentation into line with the
actual code that will be released.

CC: Ian Jackson 
CC: Wei Liu 
CC: Jan Beulich 
CC: Paul Durrant 
CC: Juergen Gross 
---
 docs/man/xl.1.pod.in | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 2303b81e4f..d4b5e8e362 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -1589,10 +1589,12 @@ backend driver in domain 0 rather than a real driver.
 =item B I
 
 Make the device at PCI Bus/Device/Function BDF assignable to guests.
-This will bind the device to the pciback driver.  If it is already
-bound to a driver, it will first be unbound, and the original driver
-stored so that it can be re-bound to the same driver later if desired.
-If the device is already bound, it will return success.
+This will bind the device to the pciback driver and assign it to the
+"quarantine domain".  If it is already bound to a driver, it will
+first be unbound, and the original driver stored so that it can be
+re-bound to the same driver later if desired.  If the device is
+already bound, it will assign it to the quarantine domain and return
+success.
 
 CAUTION: This will make the device unusable by Domain 0 until it is
 returned with pci-assignable-remove.  Care should therefore be taken
@@ -1602,11 +1604,22 @@ being used.
 
 =item B [I<-r>] I
 
-Make the device at PCI Bus/Device/Function BDF not assignable to guests.  This
-will at least unbind the device from pciback.  If the -r option is specified,
-it will also attempt to re-bind the device to its original driver, making it
-usable by Domain 0 again.  If the device is not bound to pciback, it will
-return success.
+Make the device at PCI Bus/Device/Function BDF not assignable to
+guests.  This will at least unbind the device from pciback, and
+re-assign it from the "quarantine domain" back to domain 0.  If the -r
+option is specified, it will also attempt to re-bind the device to its
+original driver, making it usable by Domain 0 again.  If the device is
+not bound to pciback, it will return success.
+
+Note that this functionality will work even for devices which were not
+made assignable by B.  This can be used to allow
+dom0 to access devices which were automatically quarantined by Xen
+after domain destruction as a result of Xen's B
+command-line default.
+
+As always, this should only be done if you trust the guest, or are
+confident that the particular device you're re-assigning to dom0 will
+cancel all in-flight DMA on FLR.
 
 =item B I I
 
-- 
2.24.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 3/3] x86/svm: Write the correct %eip into the outgoing task

2019-11-26 Thread Jan Beulich
On 26.11.2019 13:03, Andrew Cooper wrote:
> The TASK_SWITCH vmexit has fault semantics, and doesn't provide any NRIPs
> assistance with instruction length.  As a result, any instruction-induced task
> switch has the outgoing task's %eip pointing at the instruction switch caused
> the switch, rather than after it.
> 
> This causes callers of task gates to livelock (repeatedly execute the call/jmp
> to enter the task), and any restartable task to become a nop after its first
> use (the (re)entry state points at the ret/iret used to exit the task).
> 
> 32bit Windows in particular is known to use task gates for NMI handling, and
> to use NMI IPIs.
> 
> In the task switch handler, distinguish instruction-induced from
> interrupt/exception-induced task switches, and decode the instruction under
> %rip to calculate its length.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 for 4.13] x86/microcode: refuse to load the same revision ucode

2019-11-26 Thread Sergey Dyasli
Currently if a user tries to live-load the same or older ucode revision
than CPU already has, he will get a single message in Xen log like:

(XEN) 128 cores are to update their microcode

No actual ucode loading will happen and this situation can be quite
confusing. Fix this by starting ucode update only when the provided
ucode revision is higher than the currently cached one (if any).
This is based on the property that if microcode_cache exists, all CPUs
in the system should have at least that ucode revision.

Additionally, print a user friendly message if no matching or newer
ucode can be found in the provided blob. This also requires ignoring
-ENODATA in AMD-side code, otherwise the message given to the user is:

(XEN) Parsing microcode blob error -61

Which actually means that a ucode blob was parsed fine, but no matching
ucode was found.

Signed-off-by: Sergey Dyasli 
---
v2 --> v3:
- move ucode comparison to generic code
- ignore -ENODATA in a different code section

v1 --> v2:
- compare provided ucode with the currently cached one

CC: Jan Beulich 
CC: Andrew Cooper 
CC: Roger Pau Monné 
CC: Chao Gao 
CC: Juergen Gross 
---
 xen/arch/x86/microcode.c | 19 +++
 xen/arch/x86/microcode_amd.c |  7 +++
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 65d1f41e7c..44efc2d9b3 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -640,10 +640,29 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) 
buf, unsigned long len)
 
 if ( !patch )
 {
+printk(XENLOG_WARNING "microcode: couldn't find any matching ucode in "
+  "the provided blob!\n");
 ret = -ENOENT;
 goto put;
 }
 
+/*
+ * If microcode_cache exists, all CPUs in the system should have at least
+ * that ucode revision.
+ */
+spin_lock(_mutex);
+if ( microcode_cache &&
+ microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE )
+{
+spin_unlock(_mutex);
+printk(XENLOG_WARNING "microcode: couldn't find any newer revision "
+  "in the provided blob!\n");
+ret = -ENOENT;
+
+goto put;
+}
+spin_unlock(_mutex);
+
 if ( microcode_ops->start_update )
 {
 ret = microcode_ops->start_update();
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 1e52f7f49a..00750f7bbb 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -502,6 +502,13 @@ static struct microcode_patch *cpu_request_microcode(const 
void *buf,
 
 if ( error )
 {
+/*
+ * -ENODATA here means that the blob was parsed fine but no matching
+ * ucode was found. Don't return it to the caller.
+ */
+if ( error == -ENODATA )
+error = 0;
+
 xfree(mc_amd->equiv_cpu_table);
 xfree(mc_amd);
 goto out;
-- 
2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13 0/2] Fixes to AMD IOMMU logging

2019-11-26 Thread Jürgen Groß

On 26.11.19 16:01, Andrew Cooper wrote:

RFC for-4.13.  These are diagnostic improvements/corrections, so are low risk
and high utility.


Sorry, but the release is at least 1 month late now. As said before:
I'll only take real bug fixes now.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP

2019-11-26 Thread Roger Pau Monné
On Tue, Nov 26, 2019 at 12:03:56PM +, Andrew Cooper wrote:
> ICEBP isn't handled well by SVM.
> 
> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the
> appropriate instruction boundary (fault or trap, as appropriate), except for
> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP instruction
> rather than after it.  As ICEBP isn't distinguished in the vectoring event
> type, the state is ambiguous.
> 
> To add to the confusion, an ICEBP which occurs due to Introspection
> intercepting the instruction, or from x86_emulate() will have %rip updated as
> a consequence of partial emulation required to inject an ICEBP event in the
> first place.
> 
> We could in principle spot the non-injected case in the TASK_SWITCH handler,
> but this still results in complexity if the ICEBP instruction also has an
> Instruction Breakpoint active on it (which genuinely has fault semantics).
> 
> Unconditionally intercept ICEBP.  This does have a trap semantics for the
> intercept, and allows us to move %rip forwards appropriately before the
> TASK_SWITCH intercept is hit.  This makes the behaviour of #DB-vectored
> switches consistent however the ICEBP #DB came about, and avoids special cases
> in the TASK_SWITCH intercept.
> 
> This in turn allows for the removal of the conditional
> hvm_set_icebp_interception() logic used by the monitor subsystem, as ICEBP's
> will now always be submitted for monitoring checks.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

AFAICT this brings AMD implementation inline with Intel that also will
unconditionally vmexit on icebp?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP

2019-11-26 Thread Jan Beulich
On 26.11.2019 13:03, Andrew Cooper wrote:
> ICEBP isn't handled well by SVM.
> 
> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to the
> appropriate instruction boundary (fault or trap, as appropriate), except for
> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP instruction
> rather than after it.  As ICEBP isn't distinguished in the vectoring event
> type, the state is ambiguous.
> 
> To add to the confusion, an ICEBP which occurs due to Introspection
> intercepting the instruction, or from x86_emulate() will have %rip updated as
> a consequence of partial emulation required to inject an ICEBP event in the
> first place.
> 
> We could in principle spot the non-injected case in the TASK_SWITCH handler,
> but this still results in complexity if the ICEBP instruction also has an
> Instruction Breakpoint active on it (which genuinely has fault semantics).
> 
> Unconditionally intercept ICEBP.  This does have a trap semantics for the
> intercept, and allows us to move %rip forwards appropriately before the
> TASK_SWITCH intercept is hit.

Both because of you mentioning the moving forwards of %rip and with the
irc discussion in mind that we had no irc, don't you mean "fault
semantics" here? If so
Reviewed-by: Jan Beulich 
Otherwise I guess I'm still missing something.

> ---
>  xen/arch/x86/hvm/svm/svm.c| 19 ---
>  xen/arch/x86/hvm/svm/vmcb.c   |  2 +-
>  xen/arch/x86/monitor.c|  3 ---
>  xen/include/asm-x86/hvm/hvm.h | 11 ---
>  4 files changed, 1 insertion(+), 34 deletions(-)

This, of course, is pretty nice in any event.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 26 November 2019 14:27
> To: Ian Jackson 
> Cc: Juergen Gross ; xen-devel@lists.xenproject.org; Paul
> Durrant ; George Dunlap
> ; Wei Liu 
> Subject: Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable
> state
> 
> On 26.11.2019 15:14, Ian Jackson wrote:
> > George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable
> state"):
> >>  =item B [I<-r>] I
> > ...
> >> +Make the device at PCI Bus/Device/Function BDF not assignable to
> >> +guests.  This will at least unbind the device from pciback, and
> >> +re-assign it from the "quarantine domain" back to domain 0.  If the -r
> >> +option is specified, it will also attempt to re-bind the device to its
> >> +original driver, making it usable by Domain 0 again.  If the device is
> >> +not bound to pciback, it will return success.
> >> +
> >> +Note that this functionality will work even for devices which were not
> >> +made assignable by B.  This can be used to allow
> >> +dom0 to access devices which were automatically quarantined by Xen
> >> +after domain destruction as a result of Xen's B
> >> +command-line default.
> >
> > What are the security implications of doing this if the device might
> > still be doing DMA or something ?
> 
> Devices get reset in between, so well behaving ones should not
> still be doing DMA at that point. Misbehaving ones would better
> not be assigned (back and forth) anyway. But a recent patch of
> Paul's suggests that people still wish to do so, on the
> assumption that such DMA will drain sufficiently quickly.

Yes. I will hopefully find time to post the next version of that patch this 
week.

  Paul

> 
> > (For that matter, presumably there are security implications of
> > assigning the same device in sequence to different guests?)
> 
> Right.
> 
> Jan
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Durrant, Paul
> -Original Message-
> From: Ian Jackson 
> Sent: 26 November 2019 15:06
> To: George Dunlap ; xen-
> de...@lists.xenproject.org; Wei Liu ; Jan Beulich
> ; Durrant, Paul ; Juergen Gross
> 
> Subject: Re: [PATCH for-4.13] docs/xl: Document pci-assignable state
> 
> Ian Jackson writes ("Re: [PATCH for-4.13] docs/xl: Document pci-assignable
> state"):
> > George Dunlap writes ("Re: [PATCH for-4.13] docs/xl: Document pci-
> assignable state"):
> > > I kind of feel like the discussion of the security risks inherent in
> pci
> > > passthrough belong in a separate document, but perhaps a brief mention
> > > here would be helpful.  Perhaps the following?
> > >
> > > "As always, this should only be done if you trust the guest, or are
> > > confident that the particular device you're re-assigning to dom0 will
> > > cancel all in-flight DMA on FLR."
> >
> > SGTM.
> >
> > I like "as always" which clearly signals that this is a more general
> > problem without requiring us to actually write that other
> > comprehensive document...
> 

The text sounds fine in general but the 'as always' does rather imply 'hey, we 
never said PCI pass-through was safe, did we?'

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xen/arm: remove physical timer offset

2019-11-26 Thread Jeff Kubascik
On 11/25/2019 5:07 PM, Julien Grall wrote:
> Hi,
> 
> On 25/11/2019 16:14, Jeff Kubascik wrote:
>> The physical timer traps apply an offset so that time starts at 0 for
>> the guest. However, this offset is not currently applied to the physical
>> counter. Per the ARMv8 Arch Reference Manual, the offset between the
> 
> Which bit of the Arm Arm do you refer to here? In general, I would
> recommend to give the exact section and version of the manual you use to
> avoid any misunderstanding.

Fair point, I'll clarify this.

>> physical timer and counter should be 0. This removes the offset to make
>> the timer and counter consistent.
>>
>> Xen time is at offset boot_count from the physical counter, so we need
>> to take this into account when reading/writing to CNTP_CVAL.
>>
>> Signed-off-by: Jeff Kubascik 
>> ---
>>   xen/arch/arm/vtimer.c| 18 ++
>>   xen/include/asm-arm/domain.h |  3 ---
>>   2 files changed, 6 insertions(+), 15 deletions(-)
>>
>> diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
>> index e6aebdac9e..4790b5ce58 100644
>> --- a/xen/arch/arm/vtimer.c
>> +++ b/xen/arch/arm/vtimer.c
>> @@ -62,7 +62,6 @@ static void virt_timer_expired(void *data)
>>
>>   int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig 
>> *config)
>>   {
>> -d->arch.phys_timer_base.offset = NOW();
>>   d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0);
>>   d->time_offset_seconds = ticks_to_ns(d->arch.virt_timer_base.offset - 
>> boot_count);
>>   do_div(d->time_offset_seconds, 10);
>> @@ -184,8 +183,7 @@ static bool vtimer_cntp_ctl(struct cpu_user_regs *regs, 
>> uint32_t *r, bool read)
>>
>>   if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE )
>>   {
>> -set_timer(>arch.phys_timer.timer,
>> -  v->arch.phys_timer.cval + 
>> v->domain->arch.phys_timer_base.offset);
>> +set_timer(>arch.phys_timer.timer, v->arch.phys_timer.cval);
>>   }
>>   else
>>   stop_timer(>arch.phys_timer.timer);
>> @@ -202,7 +200,7 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, 
>> uint32_t *r,
>>   if ( !ACCESS_ALLOWED(regs, EL0PTEN) )
>>   return false;
>>
>> -now = NOW() - v->domain->arch.phys_timer_base.offset;
>> +now = NOW();
>>
>>   if ( read )
>>   {
>> @@ -214,9 +212,7 @@ static bool vtimer_cntp_tval(struct cpu_user_regs *regs, 
>> uint32_t *r,
>>   if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE )
>>   {
>>   v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING;
>> -set_timer(>arch.phys_timer.timer,
>> -  v->arch.phys_timer.cval +
>> -  v->domain->arch.phys_timer_base.offset);
>> +set_timer(>arch.phys_timer.timer, v->arch.phys_timer.cval);
>>   }
>>   }
>>   return true;
>> @@ -232,17 +228,15 @@ static bool vtimer_cntp_cval(struct cpu_user_regs 
>> *regs, uint64_t *r,
>>
>>   if ( read )
>>   {
>> -*r = ns_to_ticks(v->arch.phys_timer.cval);
>> +*r = ns_to_ticks(v->arch.phys_timer.cval) + boot_count;
>>   }
>>   else
>>   {
>> -v->arch.phys_timer.cval = ticks_to_ns(*r);
>> +v->arch.phys_timer.cval = ticks_to_ns(*r - boot_count);
> 
> I know that this is already like that in the code. But it feels weird
> (to not say wrong) that cval will have a different meaning between the
> virtual timer and physical timer.
> 
> Indeed, in the former case it is an exact copy of the hardware value
> whilst in the latter it is the hardware value - NOW().

I noticed this discrepancy as well. Even worse, the virtual timer cval is in
ticks and the physical timer cval is Xen system time in ns.

I believe that changing the physical timer cval to be the hardware value in
ticks would be the more correct approach. The conversion to Xen system time is
only needed for the timer APIs.

> While you are reworking a big chunk of the physical timer emulation,
> could you looking at removing this discrepancy?
> 
>>   if ( v->arch.phys_timer.ctl & CNTx_CTL_ENABLE )
>>   {
>>   v->arch.phys_timer.ctl &= ~CNTx_CTL_PENDING;
>> -set_timer(>arch.phys_timer.timer,
>> -  v->arch.phys_timer.cval +
>> -  v->domain->arch.phys_timer_base.offset);
>> +set_timer(>arch.phys_timer.timer, v->arch.phys_timer.cval);
>>   }
>>   }
>>   return true;
>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
>> index 86ebdd2bcf..16a7150a95 100644
>> --- a/xen/include/asm-arm/domain.h
>> +++ b/xen/include/asm-arm/domain.h
>> @@ -65,9 +65,6 @@ struct arch_domain
>>   RELMEM_done,
>>   } relmem;
>>
>> -struct {
>> -uint64_t offset;
>> -} phys_timer_base;
>>   struct {
>>   uint64_t offset;
>>   } virt_timer_base;
>>
> 
> Cheers,
> 
> --
> Julien Grall
> 

Sure thing, I'll update 

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Durrant, Paul
> -Original Message-
> From: Ian Jackson 
> Sent: 26 November 2019 14:22
> To: George Dunlap 
> Cc: xen-devel@lists.xenproject.org; Wei Liu ; Jan Beulich
> ; Paul Durrant ; Juergen Gross
> 
> Subject: Re: [PATCH for-4.13] docs/xl: Document pci-assignable state
> 
> [resending to just Paul to fix email address problem]
> 
> George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable
> state"):
> >  =item B [I<-r>] I
> ...
> > +Make the device at PCI Bus/Device/Function BDF not assignable to
> > +guests.  This will at least unbind the device from pciback, and
> > +re-assign it from the "quarantine domain" back to domain 0.  If the -r
> > +option is specified, it will also attempt to re-bind the device to its
> > +original driver, making it usable by Domain 0 again.  If the device is
> > +not bound to pciback, it will return success.
> > +
> > +Note that this functionality will work even for devices which were not
> > +made assignable by B.  This can be used to allow
> > +dom0 to access devices which were automatically quarantined by Xen
> > +after domain destruction as a result of Xen's B
> > +command-line default.
> 
> What are the security implications of doing this if the device might
> still be doing DMA or something ?
> 
> (For that matter, presumably there are security implications of
> assigning the same device in sequence to different guests?)
> 

Assigning any device carries a risk and can never considered to be secure in 
any general way. E.g. a device that exposes its config space in a writable 
fashion via an internal i2c bus that can be accessed via one of its BARs. 
Quarantining helps to the extent that, if a device is continuing to DMA than at 
least that doesn't hit dom0 whilst the FLR/SBR is attempted, but if even that's 
not effective then the device should probably remain in quarantine until it is 
power-cycled.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] AMD/IOMMU: honour IR setting while pre-filling DTEs

2019-11-26 Thread Jan Beulich
On 26.11.2019 15:34, Andrew Cooper wrote:
> On 26/11/2019 14:14, Jan Beulich wrote:
>> On 26.11.2019 13:25, Andrew Cooper wrote:
>>> On 26/11/2019 08:42, Jan Beulich wrote:
 On 25.11.2019 22:05, Igor Druzhinin wrote:
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table(
>  for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf )
>  dt[bdf] = (struct amd_iommu_dte){
>.v = true,
> -  .iv = true,
> +  .iv = iommu_intremap,
 This was very intentionally "true", and ignoring "iommu_intremap":
>>> Deliberate or not, it is a regression from 4.12.
>> I accept it's a regression (which wants fixing), but I don't think
>> this is the way to address is. I could be convinced by good
>> arguments, though.
>>
>>> Booting with iommu=no-intremap is a common debugging technique, and that
>>> means no interrupt remapping anywhere in the system, even for
>>> supposedly-unused DTEs.
>> Whether IV=1 or IV=0, there's no interrupt _remapping_ with this
>> option specified. There's some interrupt _blocking_, yes. It's
>> not immediately clear to me whether this is a good or a bad thing.
> 
> You're attempting to argue semantics.  Blocking is a special case remapping.
> 
> "iommu=no-intremap" (for better or worse, naming wise) refers to the
> interrupt mediation functionality in the IOMMU, and means "don't use any
> of it".  Any other behaviour is a regression.

I can accept this pov. Nevertheless I'd like to first see whether
we can't address the issue at hand with a less big hammer solution.
We can then always decide to still put in this change.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/3] x86/svm: Always intercept ICEBP

2019-11-26 Thread Petre Ovidiu PIRCALABU
On Tue, 2019-11-26 at 12:03 +, Andrew Cooper wrote:
> ICEBP isn't handled well by SVM.
> 
> The VMexit state for a #DB-vectored TASK_SWITCH has %rip pointing to
> the
> appropriate instruction boundary (fault or trap, as appropriate),
> except for
> an ICEBP-induced #DB TASK_SWITCH, where %rip points at the ICEBP
> instruction
> rather than after it.  As ICEBP isn't distinguished in the vectoring
> event
> type, the state is ambiguous.
> 
> To add to the confusion, an ICEBP which occurs due to Introspection
> intercepting the instruction, or from x86_emulate() will have %rip
> updated as
> a consequence of partial emulation required to inject an ICEBP event
> in the
> first place.
> 
> We could in principle spot the non-injected case in the TASK_SWITCH
> handler,
> but this still results in complexity if the ICEBP instruction also
> has an
> Instruction Breakpoint active on it (which genuinely has fault
> semantics).
> 
> Unconditionally intercept ICEBP.  This does have a trap semantics for
> the
> intercept, and allows us to move %rip forwards appropriately before
> the
> TASK_SWITCH intercept is hit.  This makes the behaviour of #DB-
> vectored
> switches consistent however the ICEBP #DB came about, and avoids
> special cases
> in the TASK_SWITCH intercept.
> 
> This in turn allows for the removal of the conditional
> hvm_set_icebp_interception() logic used by the monitor subsystem, as
> ICEBP's
> will now always be submitted for monitoring checks.
> 
> Signed-off-by: Andrew Cooper 
> 
Reviewed-by: Petre Pircalabu 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Ian Jackson
Ian Jackson writes ("Re: [PATCH for-4.13] docs/xl: Document pci-assignable 
state"):
> George Dunlap writes ("Re: [PATCH for-4.13] docs/xl: Document pci-assignable 
> state"):
> > I kind of feel like the discussion of the security risks inherent in pci
> > passthrough belong in a separate document, but perhaps a brief mention
> > here would be helpful.  Perhaps the following?
> > 
> > "As always, this should only be done if you trust the guest, or are
> > confident that the particular device you're re-assigning to dom0 will
> > cancel all in-flight DMA on FLR."
> 
> SGTM.
> 
> I like "as always" which clearly signals that this is a more general
> problem without requiring us to actually write that other
> comprehensive document...

Resending with Paul's new address.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Ian Jackson
George Dunlap writes ("Re: [PATCH for-4.13] docs/xl: Document pci-assignable 
state"):
> I kind of feel like the discussion of the security risks inherent in pci
> passthrough belong in a separate document, but perhaps a brief mention
> here would be helpful.  Perhaps the following?
> 
> "As always, this should only be done if you trust the guest, or are
> confident that the particular device you're re-assigning to dom0 will
> cancel all in-flight DMA on FLR."

SGTM.

I like "as always" which clearly signals that this is a more general
problem without requiring us to actually write that other
comprehensive document...

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.13 0/2] Fixes to AMD IOMMU logging

2019-11-26 Thread Andrew Cooper
RFC for-4.13.  These are diagnostic improvements/corrections, so are low risk
and high utility.

Andrew Cooper (2):
  AMD/IOMMU: Always print IOMMU errors
  AMD/IOMMU: Render IO_PAGE_FAULT errors in a more useful manner

 xen/drivers/passthrough/amd/iommu_init.c  | 48 +++
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |  3 --
 2 files changed, 27 insertions(+), 24 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 1/2] AMD/IOMMU: Always print IOMMU errors

2019-11-26 Thread Andrew Cooper
Unhandled IOMMU errors (i.e. not IO_PAGE_FAULT) should still be printed, and
not hidden behind iommu=debug.

While adjusting this, factor out the symbolic name handling to just one
location exposing its off-by-one nature.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Juergen Gross 
---
 xen/drivers/passthrough/amd/iommu_init.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_init.c 
b/xen/drivers/passthrough/amd/iommu_init.c
index 16e84d43d4..8aa8788797 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -530,6 +530,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, 
u32 entry[])
 EVENT_STR(INVALID_DEV_REQUEST)
 #undef EVENT_STR
 };
+const char *code_str = "event";
 
 code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
 IOMMU_EVENT_CODE_SHIFT);
@@ -553,6 +554,10 @@ static void parse_event_log_entry(struct amd_iommu *iommu, 
u32 entry[])
   IOMMU_EVENT_CODE_SHIFT);
 }
 
+/* Look up the symbolic name for code. */
+if ( code <= ARRAY_SIZE(event_str) )
+code_str = event_str[code - 1];
+
 if ( code == IOMMU_EVENT_IO_PAGE_FAULT )
 {
 device_id = iommu_get_devid_from_event(entry[0]);
@@ -566,7 +571,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, 
u32 entry[])
 printk(XENLOG_ERR "AMD-Vi: "
"%s: domain = %d, device id = %#x, "
"fault address = %#"PRIx64", flags = %#x\n",
-   event_str[code-1], domain_id, device_id, *addr, flags);
+   code_str, domain_id, device_id, *addr, flags);
 
 for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ )
 if ( get_dma_requestor_id(iommu->seg, bdf) == device_id )
@@ -574,12 +579,8 @@ static void parse_event_log_entry(struct amd_iommu *iommu, 
u32 entry[])
  PCI_DEVFN2(bdf));
 }
 else
-{
-AMD_IOMMU_DEBUG("%s %08x %08x %08x %08x\n",
-code <= ARRAY_SIZE(event_str) ? event_str[code - 1]
-  : "event",
-entry[0], entry[1], entry[2], entry[3]);
-}
+printk(XENLOG_ERR "%s %08x %08x %08x %08x\n",
+   code_str, entry[0], entry[1], entry[2], entry[3]);
 
 memset(entry, 0, IOMMU_EVENT_LOG_ENTRY_SIZE);
 }
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/2] AMD/IOMMU: Render IO_PAGE_FAULT errors in a more useful manner

2019-11-26 Thread Andrew Cooper
Print the PCI coordinates in its common format and use d%u notation for the
domain.  As well as printing flags, decode them.  IO_PAGE_FAULT is used for
interrupt remapping errors as well as DMA remapping errors.

Before:
  (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 
0xbf695000, flags = 0x10
  (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 
0xbf695040, flags = 0x10
  (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 
0xfff0, flags = 0x30
  (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 
0x1, flags = 0x30
  (XEN) AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0xa1, fault address = 
0x10040, flags = 0x30

After:
  (XEN) AMD-Vi: IO_PAGE_FAULT: :00:14.1 d0 addr bf5fc000 flags 0x10 
PR
  (XEN) AMD-Vi: IO_PAGE_FAULT: :00:14.1 d0 addr bf5fc040 flags 0x10 
PR
  (XEN) AMD-Vi: IO_PAGE_FAULT: :00:14.1 d0 addr fff0 flags 0x30 
RW PR
  (XEN) AMD-Vi: IO_PAGE_FAULT: :00:14.1 d0 addr 0001 flags 0x30 
RW PR
  (XEN) AMD-Vi: IO_PAGE_FAULT: :00:14.1 d0 addr 00010040 flags 0x30 
RW PR

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Juergen Gross 
---
 xen/drivers/passthrough/amd/iommu_init.c  | 35 +++
 xen/include/asm-x86/hvm/svm/amd-iommu-proto.h |  3 ---
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_init.c 
b/xen/drivers/passthrough/amd/iommu_init.c
index 8aa8788797..cd4e6e16b8 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -513,10 +513,7 @@ static hw_irq_controller iommu_x2apic_type = {
 
 static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
 {
-u16 domain_id, device_id, flags;
-unsigned int bdf;
 u32 code;
-u64 *addr;
 int count = 0;
 static const char *const event_str[] = {
 #define EVENT_STR(name) [IOMMU_EVENT_##name - 1] = #name
@@ -560,18 +557,26 @@ static void parse_event_log_entry(struct amd_iommu 
*iommu, u32 entry[])
 
 if ( code == IOMMU_EVENT_IO_PAGE_FAULT )
 {
-device_id = iommu_get_devid_from_event(entry[0]);
-domain_id = get_field_from_reg_u32(entry[1],
-   IOMMU_EVENT_DOMAIN_ID_MASK,
-   IOMMU_EVENT_DOMAIN_ID_SHIFT);
-flags = get_field_from_reg_u32(entry[1],
-   IOMMU_EVENT_FLAGS_MASK,
-   IOMMU_EVENT_FLAGS_SHIFT);
-addr= (u64*) (entry + 2);
-printk(XENLOG_ERR "AMD-Vi: "
-   "%s: domain = %d, device id = %#x, "
-   "fault address = %#"PRIx64", flags = %#x\n",
-   code_str, domain_id, device_id, *addr, flags);
+unsigned int bdf;
+uint16_t device_id = MASK_EXTR(entry[0], IOMMU_CMD_DEVICE_ID_MASK);
+uint16_t domain_id = MASK_EXTR(entry[1], IOMMU_EVENT_DOMAIN_ID_MASK);
+uint16_t flags = MASK_EXTR(entry[1], IOMMU_EVENT_FLAGS_MASK);
+uint64_t addr = *(uint64_t *)(entry + 2);
+
+printk(XENLOG_ERR "AMD-Vi: %s: %04x:%02x:%02x.%u d%d addr %016"PRIx64
+   " flags %#x%s%s%s%s%s%s%s%s%s%s\n",
+   code_str, iommu->seg, PCI_BUS(device_id), PCI_SLOT(device_id),
+   PCI_FUNC(device_id), domain_id, addr, flags,
+   (flags & 0xe00) ? " ??" : "",
+   (flags & 0x100) ? " TR" : "",
+   (flags & 0x080) ? " RZ" : "",
+   (flags & 0x040) ? " PE" : "",
+   (flags & 0x020) ? " RW" : "",
+   (flags & 0x010) ? " PR" : "",
+   (flags & 0x008) ? " I" : "",
+   (flags & 0x004) ? " US" : "",
+   (flags & 0x002) ? " NX" : "",
+   (flags & 0x001) ? " GN" : "");
 
 for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ )
 if ( get_dma_requestor_id(iommu->seg, bdf) == device_id )
diff --git a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h 
b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
index 8ed9482791..53900cd60c 100644
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -265,9 +265,6 @@ static inline uint32_t iommu_get_addr_hi_from_cmd(uint32_t 
cmd)
   IOMMU_CMD_ADDR_HIGH_SHIFT);
 }
 
-/* access address field from event log entry */
-#define iommu_get_devid_from_event  iommu_get_devid_from_cmd
-
 /* access iommu base addresses field from mmio regs */
 static inline void iommu_set_addr_lo_to_reg(uint32_t *reg, uint32_t addr)
 {
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] AMD/IOMMU: honour IR setting while pre-filling DTEs

2019-11-26 Thread Andrew Cooper
On 26/11/2019 14:14, Jan Beulich wrote:
> On 26.11.2019 13:25, Andrew Cooper wrote:
>> On 26/11/2019 08:42, Jan Beulich wrote:
>>> On 25.11.2019 22:05, Igor Druzhinin wrote:
 --- a/xen/drivers/passthrough/amd/iommu_init.c
 +++ b/xen/drivers/passthrough/amd/iommu_init.c
 @@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table(
  for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf )
  dt[bdf] = (struct amd_iommu_dte){
.v = true,
 -  .iv = true,
 +  .iv = iommu_intremap,
>>> This was very intentionally "true", and ignoring "iommu_intremap":
>> Deliberate or not, it is a regression from 4.12.
> I accept it's a regression (which wants fixing), but I don't think
> this is the way to address is. I could be convinced by good
> arguments, though.
>
>> Booting with iommu=no-intremap is a common debugging technique, and that
>> means no interrupt remapping anywhere in the system, even for
>> supposedly-unused DTEs.
> Whether IV=1 or IV=0, there's no interrupt _remapping_ with this
> option specified. There's some interrupt _blocking_, yes. It's
> not immediately clear to me whether this is a good or a bad thing.

You're attempting to argue semantics.  Blocking is a special case remapping.

"iommu=no-intremap" (for better or worse, naming wise) refers to the
interrupt mediation functionality in the IOMMU, and means "don't use any
of it".  Any other behaviour is a regression.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 6/7] livepatch-build: Strip transient or unneeded symbols

2019-11-26 Thread Ross Lagerwall
On 11/26/19 12:25 PM, Pawel Wieczorkiewicz wrote:
> In the process of creating a final hotpatch module file make sure to
> strip all transient symbols that have not been caught and removed by
> create-diff-object processing. For now these are only the hooks
> kpatch load/unload symbols.
> 
> For all new object files that are carried along for the final linking
> the transient hooks symbols are not stripped and neither are any
> unneeded symbols. Strip the transient hooks symbols explicitly from
> resulting object file.
> Add a new option '--strip' to additionally strip all unneeded symbols
> from new object files.
> 
> Signed-off-by: Pawel Wieczorkiewicz 
> ---
Reviewed-by: Ross Lagerwall 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] AMD/IOMMU: honour IR setting while pre-filling DTEs

2019-11-26 Thread Jan Beulich
On 26.11.2019 15:24, Igor Druzhinin wrote:
> On 26/11/2019 14:14, Jan Beulich wrote:
>> On 26.11.2019 13:25, Andrew Cooper wrote:
>>> On 26/11/2019 08:42, Jan Beulich wrote:
 On 25.11.2019 22:05, Igor Druzhinin wrote:
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table(
>  for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf )
>  dt[bdf] = (struct amd_iommu_dte){
>.v = true,
> -  .iv = true,
> +  .iv = iommu_intremap,
 This was very intentionally "true", and ignoring "iommu_intremap":
>>>
>>> Deliberate or not, it is a regression from 4.12.
>>
>> I accept it's a regression (which wants fixing), but I don't think
>> this is the way to address is. I could be convinced by good
>> arguments, though.
> 
> Do you have any suggestions how to address that?

I'd like to reply in the other context, after a little more
thinking about the situation. I think I see an oversight of
mine.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] MAINTAINERS: Add mandatory V: version identifier

2019-11-26 Thread Ross Lagerwall
On 11/26/19 1:11 PM, Pawel Wieczorkiewicz wrote:
> The livepatch-build-tools MAINTAINERS file is missing V: version
> identifier. This seems required by the Xen repo's add_maintainers.pl
> script.
> 
> Signed-off-by: Pawel Wieczorkiewicz 
> ---

Acked-by: Ross Lagerwall 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Jan Beulich
On 26.11.2019 15:14, Ian Jackson wrote:
> George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable 
> state"):
>>  =item B [I<-r>] I
> ...
>> +Make the device at PCI Bus/Device/Function BDF not assignable to
>> +guests.  This will at least unbind the device from pciback, and
>> +re-assign it from the "quarantine domain" back to domain 0.  If the -r
>> +option is specified, it will also attempt to re-bind the device to its
>> +original driver, making it usable by Domain 0 again.  If the device is
>> +not bound to pciback, it will return success.
>> +
>> +Note that this functionality will work even for devices which were not
>> +made assignable by B.  This can be used to allow
>> +dom0 to access devices which were automatically quarantined by Xen
>> +after domain destruction as a result of Xen's B
>> +command-line default.
> 
> What are the security implications of doing this if the device might
> still be doing DMA or something ?

Devices get reset in between, so well behaving ones should not
still be doing DMA at that point. Misbehaving ones would better
not be assigned (back and forth) anyway. But a recent patch of
Paul's suggests that people still wish to do so, on the
assumption that such DMA will drain sufficiently quickly.

> (For that matter, presumably there are security implications of
> assigning the same device in sequence to different guests?)

Right.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread George Dunlap
On 11/26/19 2:14 PM, Ian Jackson wrote:
> George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable 
> state"):
>>  =item B [I<-r>] I
> ...
>> +Make the device at PCI Bus/Device/Function BDF not assignable to
>> +guests.  This will at least unbind the device from pciback, and
>> +re-assign it from the "quarantine domain" back to domain 0.  If the -r
>> +option is specified, it will also attempt to re-bind the device to its
>> +original driver, making it usable by Domain 0 again.  If the device is
>> +not bound to pciback, it will return success.
>> +
>> +Note that this functionality will work even for devices which were not
>> +made assignable by B.  This can be used to allow
>> +dom0 to access devices which were automatically quarantined by Xen
>> +after domain destruction as a result of Xen's B
>> +command-line default.
> 
> What are the security implications of doing this if the device might
> still be doing DMA or something ?

Then the device might scribble over any memory dom0 has access to.
Function-level reset will theoretically stop this, but fundamentally we
have to consider it unreliable in the general case.  Same thing for
assigning to a different guest.

I kind of feel like the discussion of the security risks inherent in pci
passthrough belong in a separate document, but perhaps a brief mention
here would be helpful.  Perhaps the following?

"As always, this should only be done if you trust the guest, or are
confident that the particular device you're re-assigning to dom0 will
cancel all in-flight DMA on FLR."

 -George


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] AMD/IOMMU: honour IR setting while pre-filling DTEs

2019-11-26 Thread Igor Druzhinin
On 26/11/2019 14:14, Jan Beulich wrote:
> On 26.11.2019 13:25, Andrew Cooper wrote:
>> On 26/11/2019 08:42, Jan Beulich wrote:
>>> On 25.11.2019 22:05, Igor Druzhinin wrote:
 --- a/xen/drivers/passthrough/amd/iommu_init.c
 +++ b/xen/drivers/passthrough/amd/iommu_init.c
 @@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table(
  for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf )
  dt[bdf] = (struct amd_iommu_dte){
.v = true,
 -  .iv = true,
 +  .iv = iommu_intremap,
>>> This was very intentionally "true", and ignoring "iommu_intremap":
>>
>> Deliberate or not, it is a regression from 4.12.
> 
> I accept it's a regression (which wants fixing), but I don't think
> this is the way to address is. I could be convinced by good
> arguments, though.

Do you have any suggestions how to address that?

>> Booting with iommu=no-intremap is a common debugging technique, and that
>> means no interrupt remapping anywhere in the system, even for
>> supposedly-unused DTEs.
> 
> Whether IV=1 or IV=0, there's no interrupt _remapping_ with this
> option specified. There's some interrupt _blocking_, yes. It's
> not immediately clear to me whether this is a good or a bad thing.

From user point of view, if I supply "iommu=no-intremap" I'm not
expecting any interrupts in the system to be blocked either. And
as Andrew said we frequently use this option for debugging which
means we expect this functionality to be off completely.

Igor

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Jürgen Groß

On 26.11.19 15:10, George Dunlap wrote:

Changesets 319f9a0ba9 ("passthrough: quarantine PCI devices") and
ba2ab00bbb ("IOMMU: default to always quarantining PCI devices")
introduced PCI device "quarantine" behavior, but did not document how
the pci-assignable-add and -remove functions act in regard to this.
Rectify this.

Signed-off-by: George Dunlap 


Release-acked-by: Juergen Gross 


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable-smoke test] 144307: tolerable all pass - PUSHED

2019-11-26 Thread osstest service owner
flight 144307 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144307/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  8c79c129a6db2220c1089e0ce5fa49e7298b1d3e
baseline version:
 xen  77beba7c921a286c31a2a76f26500047f353614a

Last test of basis   144294  2019-11-25 11:00:30 Z1 days
Testing same since   144307  2019-11-26 11:03:51 Z0 days1 attempts


People who touched revisions under test:
  George Dunlap 
  Jan Beulich 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   77beba7c92..8c79c129a6  8c79c129a6db2220c1089e0ce5fa49e7298b1d3e -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] AMD/IOMMU: honour IR setting while pre-filling DTEs

2019-11-26 Thread Jan Beulich
On 26.11.2019 13:25, Andrew Cooper wrote:
> On 26/11/2019 08:42, Jan Beulich wrote:
>> On 25.11.2019 22:05, Igor Druzhinin wrote:
>>> --- a/xen/drivers/passthrough/amd/iommu_init.c
>>> +++ b/xen/drivers/passthrough/amd/iommu_init.c
>>> @@ -1279,7 +1279,7 @@ static int __init amd_iommu_setup_device_table(
>>>  for ( bdf = 0, size /= sizeof(*dt); bdf < size; ++bdf )
>>>  dt[bdf] = (struct amd_iommu_dte){
>>>.v = true,
>>> -  .iv = true,
>>> +  .iv = iommu_intremap,
>> This was very intentionally "true", and ignoring "iommu_intremap":
> 
> Deliberate or not, it is a regression from 4.12.

I accept it's a regression (which wants fixing), but I don't think
this is the way to address is. I could be convinced by good
arguments, though.

> Booting with iommu=no-intremap is a common debugging technique, and that
> means no interrupt remapping anywhere in the system, even for
> supposedly-unused DTEs.

Whether IV=1 or IV=0, there's no interrupt _remapping_ with this
option specified. There's some interrupt _blocking_, yes. It's
not immediately clear to me whether this is a good or a bad thing.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Ian Jackson
George Dunlap writes ("[PATCH for-4.13] docs/xl: Document pci-assignable 
state"):
>  =item B [I<-r>] I
...
> +Make the device at PCI Bus/Device/Function BDF not assignable to
> +guests.  This will at least unbind the device from pciback, and
> +re-assign it from the "quarantine domain" back to domain 0.  If the -r
> +option is specified, it will also attempt to re-bind the device to its
> +original driver, making it usable by Domain 0 again.  If the device is
> +not bound to pciback, it will return success.
> +
> +Note that this functionality will work even for devices which were not
> +made assignable by B.  This can be used to allow
> +dom0 to access devices which were automatically quarantined by Xen
> +after domain destruction as a result of Xen's B
> +command-line default.

What are the security implications of doing this if the device might
still be doing DMA or something ?

(For that matter, presumably there are security implications of
assigning the same device in sequence to different guests?)

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread Wei Liu
On Tue, Nov 26, 2019 at 02:10:26PM +, George Dunlap wrote:
> Changesets 319f9a0ba9 ("passthrough: quarantine PCI devices") and
> ba2ab00bbb ("IOMMU: default to always quarantining PCI devices")
> introduced PCI device "quarantine" behavior, but did not document how
> the pci-assignable-add and -remove functions act in regard to this.
> Rectify this.
> 
> Signed-off-by: George Dunlap 

LGTM.

Acked-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH for-4.13] docs/xl: Document pci-assignable state

2019-11-26 Thread George Dunlap
Changesets 319f9a0ba9 ("passthrough: quarantine PCI devices") and
ba2ab00bbb ("IOMMU: default to always quarantining PCI devices")
introduced PCI device "quarantine" behavior, but did not document how
the pci-assignable-add and -remove functions act in regard to this.
Rectify this.

Signed-off-by: George Dunlap 
---
Release justification: This brings documentation into line with the
actual code that will be released.

CC: Ian Jackson 
CC: Wei Liu 
CC: Jan Beulich 
CC: Paul Durrant 
CC: Juergen Gross 
---
 docs/man/xl.1.pod.in | 27 ++-
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 2303b81e4f..372c229244 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -1589,10 +1589,12 @@ backend driver in domain 0 rather than a real driver.
 =item B I
 
 Make the device at PCI Bus/Device/Function BDF assignable to guests.
-This will bind the device to the pciback driver.  If it is already
-bound to a driver, it will first be unbound, and the original driver
-stored so that it can be re-bound to the same driver later if desired.
-If the device is already bound, it will return success.
+This will bind the device to the pciback driver and assign it to the
+"quarantine domain".  If it is already bound to a driver, it will
+first be unbound, and the original driver stored so that it can be
+re-bound to the same driver later if desired.  If the device is
+already bound, it will assign it to the quarantine domain and return
+success.
 
 CAUTION: This will make the device unusable by Domain 0 until it is
 returned with pci-assignable-remove.  Care should therefore be taken
@@ -1602,11 +1604,18 @@ being used.
 
 =item B [I<-r>] I
 
-Make the device at PCI Bus/Device/Function BDF not assignable to guests.  This
-will at least unbind the device from pciback.  If the -r option is specified,
-it will also attempt to re-bind the device to its original driver, making it
-usable by Domain 0 again.  If the device is not bound to pciback, it will
-return success.
+Make the device at PCI Bus/Device/Function BDF not assignable to
+guests.  This will at least unbind the device from pciback, and
+re-assign it from the "quarantine domain" back to domain 0.  If the -r
+option is specified, it will also attempt to re-bind the device to its
+original driver, making it usable by Domain 0 again.  If the device is
+not bound to pciback, it will return success.
+
+Note that this functionality will work even for devices which were not
+made assignable by B.  This can be used to allow
+dom0 to access devices which were automatically quarantined by Xen
+after domain destruction as a result of Xen's B
+command-line default.
 
 =item B I I
 
-- 
2.24.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] MAINTAINERS: Update path to the livepatch documentation

2019-11-26 Thread Jan Beulich
On 26.11.2019 14:30, Julien Grall wrote:
> Commit d661611d08 "docs/markdown: Switch to using pandoc, and fix
> underscore escaping" converted the livepatch documentation from markdown
> to pandoc.
> 
> Update MAINTAINERS to reflect the change so the correct maintainers are
> CCed to the patches.
> 
> Fixes: d661611d08 ("docs/markdown: Switch to using pandoc, and fix underscore 
> escaping")
> Signed-off-by: Julien Grall 

Acked-by: Jan Beulich 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-4.11-testing test] 144300: tolerable FAIL - PUSHED

2019-11-26 Thread osstest service owner
flight 144300 xen-4.11-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/144300/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install 
fail never pass
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install 
fail never pass
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass

version targeted for testing:
 xen  0eb99bf90b64737c5ba6adaa46951127dcf150cc
baseline version:
 xen  006b2041242129896fbd30135b3dc6f575894a07

Last test of basis   144025  2019-11-11 17:36:00 Z   14 days
Failing since144058  2019-11-12 18:05:56 Z   13 days   24 attempts
Testing same since   144300  2019-11-25 20:37:05 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Anthony 

Re: [Xen-devel] [PATCH] domain_create: honour global grant/maptrack frame limits...

2019-11-26 Thread George Dunlap
On 11/26/19 1:26 PM, Durrant, Paul wrote:
>> -Original Message-
>> From: George Dunlap 
>> Sent: 26 November 2019 12:32
>> To: Paul Durrant ; Durrant, Paul 
>> Cc: xen-devel ; Stefano Stabellini
>> ; Julien Grall ; Wei Liu
>> ; Konrad Rzeszutek Wilk ; George
>> Dunlap ; Andrew Cooper
>> ; Ian Jackson ; Jan
>> Beulich 
>> Subject: Re: [Xen-devel] [PATCH] domain_create: honour global
>> grant/maptrack frame limits...
>>
>> On 11/26/19 11:30 AM, Paul Durrant wrote:
>>> On Wed, 13 Nov 2019 at 13:55, Paul Durrant  wrote:

 ...when their values are larger than the per-domain configured limits.

 Signed-off-by: Paul Durrant 
 ---
 Cc: Andrew Cooper 
 Cc: George Dunlap 
 Cc: Ian Jackson 
 Cc: Jan Beulich 
 Cc: Julien Grall 
 Cc: Konrad Rzeszutek Wilk 
 Cc: Stefano Stabellini 
 Cc: Wei Liu 

 After mining through commits it is still unclear to me exactly when Xen
 stopped honouring the global values, but I really think this commit
>> should
 be back-ported to stable trees as it was a behavioural change that can
 cause domUs to fail in non-obvious ways.
>>>
>>> Any other opinions on this? AFAICT questions is still open:
>>>
>>> - Do we consider not honouring the command line values to be a
>>> regression (since domUs that would have worked before will no longer
>>> work after a basic upgrade of Xen)?
>>
>> This would be a bit easier to form a "policy" opinion on (or perhaps
>> alternate solutions to) if more of the situation were outlined here.
>>
>> Is the problem that the per-domain config is always set, and doesn't
>> take the hypervisor-set config into account?  Wouldn't it be better to
>> modify the toolstack to use the hypervisor value if it's not set?
>>
>> In fact, it looks kind of like things are screwed up anyway -- the
>> "default" value of max_grant_frames, if no value is specified, is set in
>> xl.c.  If that were the behavior we wanted, it should be set in libxl.c.
>>
>> But it doesn't seem like it should be terribly difficult to get a "use
>> the default" sentinel value passed in to Xen, such that:
>>
>> 1. People who don't do anything will get the default currently specified
>> in xl.c
>>
>> 2. People who set the value on the Xen command-line and don't set
>> anything in the guest config file will get the Xen command-line value
>>
>> 3. People who set the value in the config file will get the value they
>> specified (regardless of the global setting).
>>
>> Is that the behaviour you'd like to see, Paul?
> 
> I think the order should be:
> 
> If set in xl.cfg => use that, else
> If set in xl.conf => use that, else
> Use the command line/default value
> 
> I.e. the ultimate value should be set in Xen (and possibly overridden by the 
> command line) and not hardcoded at any other layer.
> 
> There is also the issue of limits but I guess the rationale there should be: 
> If a value *is* specified then it should not exceed the value set in Xen.
> 
> Does that sound right?

So part of the issue here sounds like a terminology issue.  Is it the
case that there's a default "max", and you want to raise the default
"max"; is that right?

But the documentation actually says:

"Specify the maximum number of frames which any domain may use as part
of its grant table."

Which makes it sound a lot more like a "maximum max" -- i.e., that any
domain which is created with a value higher than this should fail.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen-unstable + linux 5.4.0-rc8: RIP: 0010:xennet_poll+0x35f/0xae0

2019-11-26 Thread Sander Eikelenboom
On 25/11/2019 15:42, Jan Beulich wrote:
> On 25.11.2019 15:21, Sander Eikelenboom wrote:
>> L.S.,
>>
>> At present one of my PVH VM's kernel crashed with the splat below
>> (haven't seen it before, so could be something that happens sporadically).
>>
>> Any ideas ?
>>
>> --
>> Sander
>>
>>
>>
>> database databaselogin:  login: [184503.428811] general protection fault: 
>>  [#1] SMP NOPTI
>> [184503.428887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
>> 5.4.0-rc8-20191123-doflr-mac80211debug+ #1
>> [184503.428932] RIP: 0010:xennet_poll+0x35f/0xae0
>> [184503.428955] Code: ba 00 01 00 00 48 8b 8d c0 00 00 00 0f b7 b4 24 92 00 
>> 00 00 48 8b 5c 24 78 3d 00 01 00 00 0f 4e d0 89 55 28 8b 95 bc 00 00 00 <89> 
>> 74 11 3c 48 8b 8d c0 00 00 00 8b 95 bc 00 00 00 89 44 11 38 89
> 
> The insn here being "mov %esi,(%rcx,%rdx,0x3c)" ...
> 
>> [184503.429027] RSP: 0018:c9003e10 EFLAGS: 00010287
>> [184503.429049] RAX: 0042 RBX: c9003e88 RCX: 
>> fffe88800b865a80
> 
> ... I notice corruption to bit 48 of RCX here. This can be a result of
> memory corruption, but prior instances of such that I had to look into
> were bit flips in the CPU instead. Is this a server or desktop class
> CPU?
> 
> Jan

Hi Jan,

Fortunately (or more unfortunate for me), memtest86 gave errors on one
stick of memory. So this is the probable cause.

Sorry for the noise.

--
Sander


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [XEN PATCH v3 07/11] xen: arm: vgic: allow delivery of PPIs to guests

2019-11-26 Thread Julien Grall

Hi,

On 26/11/2019 01:20, Stefano Stabellini wrote:

On Mon, 25 Nov 2019, Julien Grall wrote:

(+ Andre)

On 23/11/2019 20:35, Julien Grall wrote:

Hi,

On 15/11/2019 20:10, Stewart Hildebrand wrote:

Allow vgic_get_hw_irq_desc to be called with a vcpu argument.

Use vcpu argument in vgic_connect_hw_irq.

vgic_connect_hw_irq is called for PPIs and SPIs, not SGIs. Enforce with
ASSERTs.

Signed-off-by: Stewart Hildebrand 

---
v3: new patch

---
Note: I have only modified the old vgic to allow delivery of PPIs.


The new vGIC should also be modified to support delivery of PPIs.


diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 82f524a35c..c3933c2687 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -410,10 +410,10 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r,
int n)
   irq_set_affinity(p->desc, cpumask_of(v_target->processor));
   spin_lock_irqsave(>desc->lock, flags);
   /*
- * The irq cannot be a PPI, we only support delivery of SPIs
- * to guests.
+ * The irq cannot be a SGI, we only support delivery of SPIs
+ * and PPIs to guests.
    */
-    ASSERT(irq >= 32);
+    ASSERT(irq >= NR_SGIS);


We usually put ASSERT() in place we know that code wouldn't be able to work
correctly if there ASSERT were hit. In this particular case:


   if ( irq_type_set_by_domain(d) )
   gic_set_irq_type(p->desc, vgic_get_virq_type(v, n, i));


1) We don't want to allow any domain (including Dom0) to modify the
interrupt type (i.e. level/edge) for PPIs as this is shared. You will also
most likely need to modify the counterpart in setup_guest_irq().


   p->desc->handler->enable(p->desc);


2) On GICv3, the re-distributor of vCPU A is accessible by vCPU B. So vCPU B
could enable the SGI for vCPU A. But this would be called on the wrong pCPU
leading to inconsistency between the hardware state of the internal vGIC
state.


Is it actually meant to work from a GIC specification perspective? It
sounds "wrong" somehow to me, but I went through the spec and it doesn't
say explicitly that cpuB couldn't enable a SGI/PPI of cpuA. I am still
a bit shocked by this revelation.


To be honest, I can see reason to allow this but this is a different 
subject.


In this case the re-distributor is per-CPU and can accessible by any 
CPU. For instance, Linux will access it to find the re-distributor 
associated to a given CPU at boot.


FWIW, the vGIC implementation in KVM handles it the same way.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  1   2   >