date:20170802

Re: [Xen-devel] Is possible to do GPU virtualization in Intel® Atom?

2017-08-02 Thread Haozhong Zhang

+Hongbo from Intel GPU virtualization team

On 08/02/17 09:41 +, Asharaf Perinchikkal wrote:
> Is possible to achieve GPU virtualization in Intel® Atom using para 
> virtualization?
> 
> From: Roger Pau Monné [roger@citrix.com]
> Sent: Wednesday, August 02, 2017 1:04 PM
> To: Asharaf Perinchikkal
> Cc: xen-devel@lists.xen.org; Anoop Babu
> Subject: Re: [Xen-devel] Is possible to do GPU virtualization in Intel® Atom?
> 
> On Tue, Aug 01, 2017 at 10:01:01AM +, Asharaf Perinchikkal wrote:
> > Hi All,
> >
> >
> > In Intel® Atom™ E3845(MinnowBoard Turbot Quad-Core board) has only  support 
> > for Virtualization Technology (VT-x).
> >
> > No support for Intel® Virtualization Technology for Directed I/O (VT-d). 
> > [https://ark.intel.com/products/78475/Intel-Atom-Processor-E3845-2M-Cache-1_91-GHz]
> 
> Without VT-d (IOMMU) you won't be able to passthrough any physical
> device to a guest, so no, you won't be able to do GPU passthrough (at
> least in a safe way).
> 
> Roger.
> ---Disclaimer-- This e-mail contains PRIVILEGED 
> AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). 
> If you are not the intended recipient, please notify the sender by e-mail and 
> delete the original message. Opinions, conclusions and other information in 
> this transmission that do not relate to the official business of QuEST Global 
> and/or its subsidiaries, shall be understood as neither given nor endorsed by 
> it. Any statements made herein that are tantamount to contractual 
> obligations, promises, claims or commitments shall not be binding on the 
> Company unless followed by written confirmation by an authorized signatory of 
> the Company. 
> ---
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] DESIGN v2: CPUID part 3

2017-08-02 Thread Dario Faggioli

On Wed, 2017-08-02 at 11:34 +0100, Joao Martins wrote:
> On 08/01/2017 07:34 PM, Andrew Cooper wrote:
> > > On Wed, Jul 05, 2017 at 02:22:00PM +0100, Joao Martins wrote:
> > > > 
> > > > There could be other uses too on passing this info to Xen, say
> > > > e.g. the
> > > > scheduler knowing the guest CPU topology it would allow better
> > > > selection of
> > > > core+sibling pair such that it could match cache/cpu topology
> > > > passed on the
> > > > guest (for unpinned SMT guests).
> > 
> > I remain to be convinced (i.e. with some real performance numbers)
> > that
> > the added complexity in the scheduler for that logic is a benefit
> > in the
> > general case.
> > 
> 
> The suggestion above was a simple extension to struct domain (e.g.
> cores/threads
> or struct cpu_topology field) - nothing too disruptive I think.
> 
> But I cannot really argue on this as this was just an idea that I
> found
> interesting (no numbers to support it entirely). We just happened to
> see it
> under-perform when a simple range of cpus was used for affinity, and
> that some
> vcpus end up being scheduled belonging the same core+sibling pair
> IIRC; hence I
> (perhaps naively) imagined that there could be value in further
> scheduler
> enlightenment e.g. "gang-scheduling" where we schedule core+sibling
> always
> together. I was speaking to Dario (CC'ed) on the summit whether CPU
> topology
> could have value - and there might be but it remains to be explored
> once we're
> able to pass a cpu topology to the guest. (In the past it seemed
> enthusiastic of
> the idea of the topology[0] and hence I assumed to be in the context
> of schedulers)
> 
> [0] https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg0
> 3850.html
> 
> > In practice, customers are either running very specific and
> > dedicated
> > workloads (at which point pinning is used and there is no
> > oversubscription, and exposing the actual SMT topology is a good
> > thing),
> > 
> /nods
> 
I am enthusiast of there going to be a way for specifying explicitly
the CPU topology of a guest.

The way we can take advantage of this, at least as a first step, is,
when the guest is pinned, and two of its vCPUs are pinned to two host
hyperthreads, make the two vCPUs hyperthreads as well, from the guest
point of view.

Then, it will be the guest's (e.g., Linux's) scheduler that will do
something clever with this information, so no need for adding
complexity anywhere (well, in theory, in the guest scheduler, but in
practise, code it's there already!).

Or, on the other hand, if pinning is *not* used, then I'd use this
mechanism to tell the guest that there's no relationship between its
vCPUs whatsoever.

In fact, currently --sticking to SMT as example-- by not specifying the
topology explicitly, there may be cases where the guest scheduler comes
to thinking that two vCPUs are SMT siblings, while they either are not,
or (if no pinning is in place), they may or may not be, depending on
onto which pCPUs the two vCPUs are executing at any given time. This
means the guest's scheduler's SMT optimization logic will trigger,
while it probably better wouldn't have.

These are the first two use cases that, as the "scheduler guy", I'm
interested to use this feature for.

Then, there indeed is the chance of using the guest topology to affect
the decision of the Xen's scheduler, e.g., to implement some form of
gang scheduling, or to force two vCPU to be executed on pCPUs that
respect such topology... But this is all still in the "wild ideas"
camp, for now. :-D

> > or customers are running general workloads with no pinning (or
> > perhaps
> > cpupool-numa-split) with a moderate amount of oversubscription (at
> > which
> > point exposing SMT is a bad move).
> > 
> 
> Given the scale you folks invest on over-subscription (1000 VMs), I
> wonder what
> moderate here means :P
> 
> > Counterintuitively, exposing NUMA in general oversubscribed
> > scenarios is
> > terrible for net system performance.  What happens in practice is
> > that
> > VMs which see NUMA spend their idle cycles trying to balance their
> > own
> > userspace processes, rather than yielding to the hypervisor so
> > another
> > guest can get a go.
> > 
> 
For NUMA aware workloads, running in guests, the guests themselves
doing something sane with both the placement and the balancing of tasks
and memory is a good thing, that will improve the performance of the
workload itself. Provided the (topology) information used for doing
this placement and this balancing are accurate... a vNUMA is what makes
them accurate.

So, IMO, for big guests, running NUMA-aware workload, vNUMA will most
of the times improve things.

I totally don't get the part where a vCPU becoming idle is what Xen
needs for running other guests' vCPUs... Xen does not at all rely on a
vCPU to periodically block or yield, in order to let another vCPU run,
neither in undesubscribed, nor in oversubscribed scenarios.

It's competitive multitasking, not

Re: [Xen-devel] [PATCH RFC v1 0/3] Enable XL to set and get per-VCPU work conserving flag for RTDS scheduler

2017-08-02 Thread Meng Xu

On Wed, Aug 2, 2017 at 1:49 PM, Dario Faggioli
 wrote:
> On Tue, 2017-08-01 at 14:33 -0400, Meng Xu wrote:
>> This series of patches enable the toolstack to
>> set and get per-VCPU work-conserving flag.
>> With the toolstack, system administrators can decide
>> which VCPUs will be made work-conserving.
>>
> Thanks for this series as well, Meng. I'll look at it in the next
> couple of days.
>>
>> We plan to perform two steps in making RTDS scheduler work-
>> conserving:
>> (1) First make all VCPUs work-conserving by default,
>> which was sent as a separate patch. This work aims for Xen 4.10
>> release.
>> (2) After that, we enable the XL to set and get per-VCPU work-
>> conserving flag,
>> which is this series of patches.
>>
> I think it's better if you merge the "xen:rtds: towards work conserving
> RTDS" as patch 1 of this series.
>
> In fact, sending them as separate series, you make people think that
> they're independent, while they're not (as this series is pretty
> useless, without that patch :-P).

Sure. I can do that. :)

Thanks,

Meng


Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC v1] xen:rtds: towards work conserving RTDS

2017-08-02 Thread Meng Xu

On Wed, Aug 2, 2017 at 1:46 PM, Dario Faggioli
 wrote:
> Hey, Meng!
>
> It's really cool to see progress on this... There was quite a bit of
> interest in scheduling in general at the Summit in Budapest, and one
> important thing for making sure RTDS will be really useful, is for it
> to have a work conserving mode! :-)

Glad to hear that. :-)

>
> On Tue, 2017-08-01 at 14:13 -0400, Meng Xu wrote:
>> Make RTDS scheduler work conserving to utilize the idle resource,
>> without breaking the real-time guarantees.
>
> Just kill the "to utilize the idle resource". We can expect that people
>  that are interested in this commit, also know what 'work conserving'
> means. :-)

Got it. Will do.

>
>> VCPU model:
>> Each real-time VCPU is extended to have a work conserving flag
>> and a priority_level field.
>> When a VCPU's budget is depleted in the current period,
>> if it has work conserving flag set,
>> its priority_level will increase by 1 and its budget will be
>> refilled;
>> othewrise, the VCPU will be moved to the depletedq.
>>
> Mmm... Ok. But is the budget burned, while the vCPU executes at
> priority_level 1? If yes, doesn't this mean we risk having less budget
> when we get back to priority_lvevel 0?
>
> Oh, wait, maybe it's the case that, when we get back to priority_level
> 0, we also get another replenishment, is that the case? If yes, I
> actually think it's fine...

It's the latter case: the vcpu will get another replenishment when it
gets back to priority_level 0.

>
>> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
>> index 39f6bee..740a712 100644
>> --- a/xen/common/sched_rt.c
>> +++ b/xen/common/sched_rt.c
>> @@ -191,6 +195,7 @@ struct rt_vcpu {
>>  /* VCPU parameters, in nanoseconds */
>>  s_time_t period;
>>  s_time_t budget;
>> +bool_t is_work_conserving;   /* is vcpu work conserving */
>>
>>  /* VCPU current infomation in nanosecond */
>>  s_time_t cur_budget; /* current budget */
>> @@ -201,6 +206,8 @@ struct rt_vcpu {
>>  struct rt_dom *sdom;
>>  struct vcpu *vcpu;
>>
>> +unsigned priority_level;
>> +
>>  unsigned flags;  /* mark __RTDS_scheduled, etc.. */
>>
> So, since we've got a 'flags' field already, can the flag be one of its
> bit, instead of adding a new bool in the struct:
>
> /*
>  * RTDS_work_conserving: Can the vcpu run in the time that is
>  * not part of any real-time reservation, and would therefore
>  * be otherwise left idle?
>  */
> __RTDS_work_conserving   4
> #define RTDS_work_conserving (1<<__RTDS_work_conserving)

Thank you very much for the suggestion! I will modify based on your suggestion.

Actually, I was not very comfortable with the is_work_conserving field either.
It makes the structure verbose and mess up the struct's the cache_line
alignment.

>
>> @@ -245,6 +252,11 @@ static inline struct list_head *rt_replq(const
>> struct scheduler *ops)
>>  return _priv(ops)->replq;
>>  }
>>
>> +static inline bool_t is_work_conserving(const struct rt_vcpu *svc)
>> +{
>>
> Use bool.

OK.

>
>> @@ -273,6 +285,20 @@ vcpu_on_replq(const struct rt_vcpu *svc)
>>  return !list_empty(>replq_elem);
>>  }
>>
>> +/* If v1 priority >= v2 priority, return value > 0
>> + * Otherwise, return value < 0
>> + */
>>
> Comment style.

Got it. Will make it as:
/*
 * If v1 priority >= v2 priority, return value > 0
 * Otherwise, return value < 0
 */

>
> Apart from that, do you want this to return >0 if v1 should have
> priority over v2, and <0 if vice-versa, right? If yes...

Yes.

>
>> +static int
>> +compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu
>> *v2)
>> +{
>> +if ( v1->priority_level < v2->priority_level ||
>> + ( v1->priority_level == v2->priority_level &&
>> + v1->cur_deadline <= v2->cur_deadline ) )
>> +return 1;
>> +else
>> +return -1;
>>
>   int prio = v2->priority_level - v1->priority_level;
>
>   if ( prio == 0 )
> return v2->cur_deadline - v1->cur_deadline;
>
>   return prio;
>
> Return type has to become s_time_t, and there's a chance that it'll
> return 0, if they are at the same level, and have the same absolute
> deadline. But I think you can deal with this in the caller.

OK. Will do.

>
>> @@ -966,8 +1001,16 @@ burn_budget(const struct scheduler *ops, struct
>> rt_vcpu *svc, s_time_t now)
>>
>>  if ( svc->cur_budget <= 0 )
>>  {
>> -svc->cur_budget = 0;
>> -__set_bit(__RTDS_depleted, >flags);
>> +if ( is_work_conserving(svc) )
>> +{
>> +svc->priority_level++;
>>
>ASSERT(svc->priority_level <= 1);
>
>> +svc->cur_budget = svc->budget;
>> +}
>> +else
>> +{
>> +svc->cur_budget = 0;
>> +__set_bit(__RTDS_depleted, >flags);
>> +}
>>  }
>>
> The rest looks good to me.

Thank you very much for the review!

I will revise it and combine this patch into the series

[Xen-devel] [PATCH v15.1 13/23] x86: refactor psr: CDP: implement CPU init flow.

2017-08-02 Thread Yi Sun

This patch implements the CPU init flow for CDP. The flow is almost
same as L3 CAT.

Signed-off-by: Yi Sun 
---
v15:
- refine process in 'psr_cpu_init' to remove the 'goto'.
  (suggested by Jan Beulich)
v14:
- remove the 'Notes' in commit message because a stub function is
  implemented to avoid potential issue.
  (suggested by Jan Beulich)
- remove 'feat_l3_cdp' because it can be replaced by 'feat_l3'.
  (suggested by Jan Beulich)
- implement stub callback functions for CDP to avoid system crash if
  not full CDP patches applied.
  (suggested by Jan Beulich)
- directly assign correct value to 'alt_type' of CDP.
  (suggested by Jan Beulich)
- goto L3 CAT init process if CDP init fails.
  (suggested by Jan Beulich)
v13:
- add commit message.
  (suggested by Jan Beulich)
- fix comment issue.
  (suggested by Jan Beulich)
- set CDP default value before enabling it.
  (suggested by Jan Beulich)
- remove unnecessary check.
  (suggested by Jan Beulich)
- set 'alt_type' for CDP.
  (suggested by Jan Beulich)
- check 'cos_max' and substract 1 before right shift it to get correct
  value.
  (suggested by Jan Beulich)
v12:
- move 'type[]' assignment into l3_cdp_props declaration to make it be
  'const'.
  (suggested by Jan Beulich)
- remove "L2 CAT" indication in printk.
  (suggested by Jan Beulich)
- fix coding style issue.
  (suggested by Jan Beulich)
- change 'val' type to uint64_t.
  (suggested by Jan Beulich)
- use 1ull.
  (suggested by Jan Beulich)
- restore mask(0) MSR to default value.
  (suggested by Jan Beulich)
v11:
- changes about 'feat_props'.
  (suggested by Jan Beulich)
- remove MSR restore action which is unnecessary.
  (suggested by Jan Beulich)
- modify commit message.
v10:
- fix comment.
  (suggested by Jan Beulich)
- use swith in 'cat_init_feature' to handle different feature types.
  (suggested by Jan Beulich)
- changes about 'props'.
  (suggested by Jan Beulich)
- restore MSRs to default value when cpu online.
  (suggested by Jan Beulich)
- remove feat_mask.
  (suggested by Jan Beulich)
v9:
- modify commit message to describe flow clearer.
- handle cpu offline and online again case to read MSRs registers values
  back and save them into cos array to make user can get real data.
- modify error handling process in 'psr_cpu_prepare' to reduce redundant
  codes.
- modify 'get_cdp_data' and 'get_cdp_code' to make them standard.
  (suggested by Roger Pau and Jan Beulich)
- encapsulate CDP operations into 'cat_init_feature' to reduce redundant
  codes.
  (suggested by Roger Pau)
- reuse 'cat_get_cos_max' for CDP.
  (suggested by Roger Pau)
- handle 'PSR_CDP' in psr_presmp_init to make init work can be done when
  there is only 'psr=cdp' in cmdline.
- remove unnecessary comment.
  (suggested by Jan Beulich)
- move CDP related codes in 'cpu_init_work' into 'psr_cpu_init'.
  (suggested by Jan Beulich)
- add codes to handle CDP's 'cos_num'.
  (suggested by Jan Beulich)
- fix coding style issue.
  (suggested by Jan Beulich)
- do not free resources when allocation fails in 'psr_cpu_prepare'.
  (suggested by Jan Beulich)
- changes about 'uint64_t' to 'uint32_t'.
  (suggested by Jan Beulich)
v7:
- initialize 'l3_cdp'.
  (suggested by Konrad Rzeszutek Wilk)
v6:
- use 'cpuid_leaf'.
  (suggested by Konrad Rzeszutek Wilk and Jan Beulich)
v5:
- remove codes to free 'feat_l3_cdp' in 'free_feature'.
  (suggested by Jan Beulich)
- encapsulate cpuid registers into 'struct cpuid_leaf_regs'.
  (suggested by Jan Beulich)
- print socket info when 'opt_cpu_info' is true.
  (suggested by Jan Beulich)
- rename 'l3_cdp_get_max_cos_max' to 'l3_cdp_get_cos_max'.
  (suggested by Jan Beulich)
- rename 'dat[]' to 'data[]'.
  (suggested by Jan Beulich)
- move 'cpu_prepare_work' contents into 'psr_cpu_prepare'.
  (suggested by Jan Beulich)
v4:
- create this patch to make codes easier to understand.
  (suggested by Jan Beulich)
---
 xen/arch/x86/psr.c | 83 ++
 1 file changed, 77 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 6ea2e4e..f007e3b 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -62,6 +62,7 @@
 
 enum psr_feat_type {
 FEAT_TYPE_L3_CAT,
+FEAT_TYPE_L3_CDP,
 FEAT_TYPE_NUM,
 FEAT_TYPE_UNKNOWN,
 };
@@ -163,6 +164,22 @@ static struct feat_node *feat_l3;
 #define cat_default_val(len) (0x >> (32 - (len)))
 
 /*
+ * get_cdp_data - get DATA COS register value from input COS ID.
+ * @feat:the feature node.
+ * @cos: the COS ID.
+ */
+#define get_cdp_data(feat, cos)

[Xen-devel] [PATCH v5] xen: rtds: only tickle non-already tickled CPUs

2017-08-02 Thread Meng Xu

When more than one idle VCPUs that have the same PCPU as their
previous running core invoke runq_tickle(), they will tickle the same
PCPU. The tickled PCPU will only pick at most one VCPU, i.e., the
highest-priority one, to execute. The other VCPUs will not be
scheduled for a period, even when there is an idle core, making these
VCPUs unnecessarily starve for one period.

Therefore, always make sure that we only tickle PCPUs that have not
been tickled already.

Signed-off-by: Haoran Li 
Signed-off-by: Meng Xu 
Reviewed-by: Dario Faggioli 

---
The initial discussion of this patch can be found at
https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg02857.html

Changes in v5:
Revise comments as Dario suggested

Changes in v4:
1) Take Dario's suggestions:
   Search the new->cpu first for the cpu to tickle.
   This get rid of the if statement in previous versions.
2) Reword the comments and commit messages.
3) Rebased on staging branch.

Issues in v2 and v3:
Did not rebase on the latest staging branch.
Did not solve the comments/issues in v1.
Please ignore the v2 and v3.
---
 xen/common/sched_rt.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 39f6bee..0ac5816 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1147,9 +1147,9 @@ rt_vcpu_sleep(const struct scheduler *ops, struct vcpu 
*vc)
  * Called by wake() and context_saved()
  * We have a running candidate here, the kick logic is:
  * Among all the cpus that are within the cpu affinity
- * 1) if the new->cpu is idle, kick it. This could benefit cache hit
- * 2) if there are any idle vcpu, kick it.
- * 3) now all pcpus are busy;
+ * 1) if there are any idle CPUs, kick one.
+  For cache benefit, we check new->cpu as first
+ * 2) now all pcpus are busy;
  *among all the running vcpus, pick lowest priority one
  *if snext has higher priority, kick it.
  *
@@ -1177,17 +1177,13 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu 
*new)
 cpumask_and(_tickled, online, new->vcpu->cpu_hard_affinity);
 cpumask_andnot(_tickled, _tickled, >tickled);
 
-/* 1) if new's previous cpu is idle, kick it for cache benefit */
-if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
-{
-SCHED_STAT_CRANK(tickled_idle_cpu);
-cpu_to_tickle = new->vcpu->processor;
-goto out;
-}
-
-/* 2) if there are any idle pcpu, kick it */
-/* The same loop also find the one with lowest priority */
-for_each_cpu(cpu, _tickled)
+/*
+ * 1) If there are any idle CPUs, kick one.
+ *For cache benefit,we first search new->cpu.
+ *The same loop also find the one with lowest priority.
+ */
+cpu = cpumask_test_or_cycle(new->vcpu->processor, _tickled);
+while ( cpu!= nr_cpu_ids )
 {
 iter_vc = curr_on_cpu(cpu);
 if ( is_idle_vcpu(iter_vc) )
@@ -1200,9 +1196,12 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu 
*new)
 if ( latest_deadline_vcpu == NULL ||
  iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
 latest_deadline_vcpu = iter_svc;
+
+cpumask_clear_cpu(cpu, _tickled);
+cpu = cpumask_cycle(cpu, _tickled);
 }
 
-/* 3) candicate has higher priority, kick out lowest priority vcpu */
+/* 2) candicate has higher priority, kick out lowest priority vcpu */
 if ( latest_deadline_vcpu != NULL &&
  new->cur_deadline < latest_deadline_vcpu->cur_deadline )
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] Xen 4.9 + kernel 4.13rc2 -- ballooning regression? reappearance of "Over-allocation for domain 1" errors

2017-08-02 Thread Boris Ostrovsky

On 08/02/2017 08:01 AM, Juergen Gross wrote:
> On 01/08/17 16:28, PGNet Dev wrote:
>> On 7/28/17 9:02 AM, PGNet Dev wrote:
>>> On 7/27/17 11:23 PM, Juergen Gross wrote:
 Can you please post the domain's config file used to create the domain
 and the kernel config?
>>> Sure.
>>>
>>>https://pastebin.com/M6cr2pX7
>>>
>> Any add'l info needed?
> No, I don't think so.
>
> IMO the problem is related to the fact that the balloon driver tries to
> use then kernel's view of how much memory it is owning and setting this
> number in relation to Xen's view how much memory it should try to have.
>
> Maybe before adding memory from Xen the kernel should ask the hypervisor
> how much memory it has already from Xen's point of view and how much it
> is allowed to have. This will avoid the messages you have seen as long
> as there are no interfering actions from Xen (e.g. lowering the maximum
> reservation) while the kernel is trying to balloon up.

Could this be caused by your recent ballooning patch
(96edd61dcf44362d3ef0bed1a5361e0ac7886a63)?

-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] xen-platform: constify pci_device_id.

2017-08-02 Thread Boris Ostrovsky

On 08/02/2017 01:46 PM, Arvind Yadav wrote:
> pci_device_id are not supposed to change at runtime. All functions
> working with pci_device_id provided by  work with
> const pci_device_id. So mark the non-const structs as const.
>
> Signed-off-by: Arvind Yadav 

Reviewed-by: Boris Ostrovsky 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] tools/libxl: Fix a segment fault when mmio_hole is set in hvm.cfg

2017-08-02 Thread Christopher Clark

On Wed, Jul 12, 2017 at 3:15 AM, Wei Liu  wrote:
>
> On Thu, Jul 13, 2017 at 10:03:39AM +0800, Xiong Zhang wrote:
> > When valid mmio_hole is set in hvm.cfg, segment fault happens at accessing
> > localents pointer.
> >
> > Because the size of localents pointer isn't enough to store appended
> > mmio_hole_size parameter.
> >
> > Signed-off-by: Xiong Zhang 
> > ---
> >
> > tools/libxl/libxl_create.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> >index bffbc45..1158303 100644
> >--- a/tools/libxl/libxl_create.c
> >+++ b/tools/libxl/libxl_create.c
> >@@ -451,7 +451,7 @@ int libxl__domain_build(libxl__gc *gc,
> > vments[4] = "start_time";
> > vments[5] = GCSPRINTF("%lu.%02d", 
> > start_time.tv_sec,(int)start_time.tv_usec/1);
> >
> >-localents = libxl__calloc(gc, 9, sizeof(char *));
> >+localents = libxl__calloc(gc, 11, sizeof(char *));
> > i = 0;
> > localents[i++] = "platform/acpi";
> > localents[i++] = libxl__acpi_defbool_val(info) ? "1" : "0";
>
>
> Acked-by: Wei Liu 
>
> Ian please backport this.


Bump: the above patch still needs to be backported into 4.9 please.

master commit: 614a14736e33fb84872eb00f08799ebbc73a96c6

thanks,

Christopher
--
http://openxt.org/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC v4]Proposal to allow setting up shared memory areas between VMs from xl config file

2017-08-02 Thread Edgar E. Iglesias

On Wed, Aug 02, 2017 at 07:10:25PM +0100, Julien Grall wrote:
> Hi,
> 
> On 01/08/17 00:53, Stefano Stabellini wrote:
> >On Mon, 31 Jul 2017, Edgar E. Iglesias wrote:
> >  @role Can only be 'master' or 'slave', it defaults to 
> > 'slave'.
> >
> >  @prot When @role = master, this means the largest set 
> > of
> >stage-2 permission flags that can be granted to 
> > the
> >slave domains.
> >When @role = slave, this means the stage-2 permission
> >flags of the shared memory area.
> >Currently only 'rw' is supported. If not set. it
> >defaults to 'rw'.
> >
> >  @cache_policy The stage-2 cacheability/shareability attributes 
> > of the
> >shared memory area. Currently, only two policies 
> > are
> >supported:
> >  * ARM_normal: Only applicable to ARM guests. 
> > This
> >would mean Inner and Outer 
> > Write-Back
> >Cacheable, and Inner Shareable.
> 
> 
> Is there a reason not to set this to Outer Shareable?
> Again, mainly useful when these pages get shared with devs as well.
> 
> The guest can always lower it to Inner Shareable via S1 tables if needed.
> >>>
> >>>I don't think we can support memory sharing with devices in this version
> >>>of the document (see above about GSoC timelines). Normal memory is inner
> >>>shareable in Xen today, it makes sense to default to that.
> >>
> >>I thought we mapped RAM as Outer shareable to guests but you seem to be 
> >>right.
> >>I think we should be mapping all RAM as Outer Shareable and then let the
> >>guest decide what is Inner and what is Outer via it's S1 tables.
> >>Right now it would be impossible to be Coherent with a DMA device outside
> >>of the Inner domain...
> >>
> >>Perhaps we should fix that and then ARM_normal would by itself become Outer.
> >>If there's agreement I can test it and send a patch.
> >
> >Today, only device memory is mapped Outer Shareable, while normal memory
> >is mapped Inner Shareable. I am OK with changing the default in
> >mfn_to_p2m_entry to Outer Shareable for normal RAM if the change would
> >make it possible to do coherent DMA with more devices on the platform.
> >Julien?
> 
> Per Table D4-44 in ARM DDI 0487B.a, outer-shareable takes precedence on
> inner-shareable. So if the guest configures the stage-1 page table with
> outer-shareable, the resultant will be outer-shareable.

You're right, I had gotten this backwards.

Thanks,
Edgar



> 
> As I pointed out in a previous version, this option is a bit confusing
> because the value set in stage-2 does not mean this will be the resultant
> value. I would strongly recommend any user wanting to share memory between
> guests to read D4.5.4 "Combining the stage 1 and stage 2 attributes" before
> using this option. Otherwise, he/she will likely be confused and end up to
> mis-configure the attribute.
> 
> The current configuration (i.e "ARM_normal") gives the most flexibility to
> the guest in term of memory attribute but prevent non-shareable mapping.
> 
> Cheers,
> 
> -- 
> Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 1/2] arm: smccc: handle SMCs/HVCs according to SMCCC

2017-08-02 Thread Stefano Stabellini

On Tue, 1 Aug 2017, Julien Grall wrote:
> Hi Edgar,
> 
> On 01/08/2017 22:22, Edgar E. Iglesias wrote:
> > On Tue, Aug 01, 2017 at 10:02:45PM +0100, Julien Grall wrote:
> > > Hi,
> > > 
> > > On 01/08/2017 21:40, Stefano Stabellini wrote:
> > > > On Tue, 1 Aug 2017, Edgar E. Iglesias wrote:
> > > > > On Tue, Aug 01, 2017 at 11:59:00AM +0100, Julien Grall wrote:
> > > > > > (+ Edgar, Mark, Dave)
> > > > > > 
> > > > > > Hi,
> > > > > 
> > > > > Hi Julien,
> > > > > 
> > > > > I'll share some thoughts based on our platforms.
> > > > > 
> > > > > 
> > > > > > On 14/06/17 15:10, Volodymyr Babchuk wrote:
> > > > > > > SMCCC (SMC Call Convention) describes how to handle both HVCs and
> > > > > > > SMCs.
> > > > > > > SMCCC states that both HVC and SMC are valid conduits to call to a
> > > > > > > different
> > > > > > > firmware functions. Thus, for example PSCI calls can be made both
> > > > > > > by
> > > > > > > SMC or HVC. Also SMCCC defines function number coding for such
> > > > > > > calls.
> > > > > > > Besides functional calls there are query calls, which allows
> > > > > > > underling
> > > > > > > OS determine version, UID and number of functions provided by
> > > > > > > service
> > > > > > > provider.
> > > > > > > 
> > > > > > > This patch adds new file `smccc.c`, which handles both generic
> > > > > > > SMCs
> > > > > > > and HVC according to SMC. At this moment it implements only one
> > > > > > > service: Standard Hypervisor Service.
> > > > > > > 
> > > > > > > Standard Hypervisor Service only supports query calls, so caller
> > > > > > > can
> > > > > > > ask about hypervisor UID and determine that it is XEN running.
> > > > > > > 
> > > > > > > This change allows more generic handling for SMCs and HVCs and it
> > > > > > > can
> > > > > > > be easily extended to support new services and functions.
> > > > > > 
> > > > > > I have already reviewed the code and one thing I missed is how a
> > > > > > domain will
> > > > > > know that Xen supports SMCCC.
> > > > > > 
> > > > > > Currently, Xen will:
> > > > > > - inject an undefined instruction for any SMC issued by a
> > > > > > guest
> > > > > > - crash the guest (quite bad) for any unknown HCV #0
> > > > > > 
> > > > > > So a guest needs to be aware whether Xen supports SMCCC convention
> > > > > > or not. I
> > > > > > am not aware of any bindings in the device-tree for doing that.
> > > > > 
> > > > > On our platforms, SW probes the DT for specific service classes and
> > > > > then
> > > > > probes for specific versions via SMC calls using the standard Version
> > > > > FIDs.
> > > > > If the DT does not specify the firmware node, I don't think any SMCs
> > > > > will be
> > > > > issued but the guest may not be functional (as the firmware interface
> > > > > is
> > > > > mandatory).
> > > > > 
> > > > > I don't know of a generic DT node/compat that tells guests if SMCC
> > > > > probing
> > > > > is allowed or not. Perhaps there should be one, or there should be yet
> > > > > another service specific one for Hypervisors. I don't know.
> > > > > 
> > > > > For example, these are the nodes we've got (PSCI and EEMI/SIP):
> > > > >psci {
> > > > >compatible = "arm,psci-0.2";
> > > > >method = "smc";
> > > > >};
> > > > > 
> > > > >pmufw: firmware {
> > > > >compatible = "xlnx,zynqmp-pm";
> > > > >method = "smc";
> > > > >interrupt-parent = <>;
> > > > >interrupts = <0 35 4>;
> > > > >};
> > > > > 
> > > > > SW that does not have DT support, will either directly probe the SMC
> > > > > interface or in some cases just assume it's there and use it.
> > > > > 
> > > > > ZynqMP-wise, Xen is in a little bit of an akward position by messing
> > > > > guests up if they probe for non-existing SMC services but I don't
> > > > > think
> > > > > it's that bad. Mainly because there's really very little ZynqMP guests
> > > > > that need the firmware would be able todo without it.
> > > > > 
> > > > > For other platforms and services, I guess FW may very well be optional
> > > > > and probing makes more sence. So getting support for gracefully
> > > > > returning
> > > > > Unknown FID still important...
> > > > 
> > > > Indeed, but unfortunately older versions of Xen don't do that. That's
> > > > why I think we'll have to introduce a feature flag under the Xen
> > > > hypervisor node on device tree. The presence of the flag could mean "it
> > > > is safe to probe via SMC/HVC". I think it makes sense for the flag to be
> > > > Xen specific, because we are talking about Xen behavior. Applications
> > > > that know they are running on a new enough Xen can skip the check (this
> > > > is going to be the case in most embedded scenarios).
> > > 
> > > This is not Xen specific behavior. Per ARM ARM, SMC will be undefined when
> > > EL3 is not present. Today Xen is behaving the same way as EL3 was not
> > > existing on the

Re: [Xen-devel] Next Xen ARM community call - Wednesday 2nd August 2017

2017-08-02 Thread Stefano Stabellini

Hi everybody,

the Xen based suspend notification interface is implemented in Linux by
drivers/xen/manage.c. The guest monitors a node on xenstore named
"control/shutdown", one of the possible commands is "suspend".

See the corresponding libxl__domain_pvcontrol_* functions and their
callers under tools/libxl. You can see that libxl is able to detect if
the PV protocol is supported by the guest calling
libxl__domain_pvcontrol_available, although the detection wouldn't work
today for ARM guests.

You might also want to give a look at drivers/xen/cpu_hotplug.c (also in
Linux) which is the implementation of the PV cpu hotplug protocol.
See libxl__set_vcpuonline_xenstore in tools/libxl/libxl_domain.c.

I am sorry this protocol is not properly documented anywhere (as far as
I am aware). However, it is so simple that you should be able to
understand how it works just from the code.



On Wed, 2 Aug 2017, Mirela Simonovic wrote:
> Hi,
> Thanks for the invite, please find the presentationhere: 
> https://docs.google.com/presentation/d/1PhK9bdLyY_DSr8RbSs7LDdLWADbSW7YGwwnkpMLWeQ
> g/edit#slide=id.g24c05965d7_0_15
> 
> Regards,
> Mirela
> 
> On Tue, Aug 1, 2017 at 7:43 PM, Stefano Stabellini  
> wrote:
>   On Tue, 1 Aug 2017, Julien Grall wrote:
>   > On 01/08/17 17:44, Edgar E. Iglesias wrote:
>   > > On Wed, Jul 26, 2017 at 04:59:43PM +0100, Julien Grall wrote:
>   > > > Hi all,
>   > > >
>   > > > The next Xen ARM community call will be Wednesday 2nd August 2017 
> 5pm
>   BST.
>   > > >
>   > > > Do you have any specific topic you would like to discuss?
>   > >
>   > > CC: Davorin and Mirella from Aggios
>   > >
>   > > Hi Julien,
>   >
>   > Hi,
>   >
>   > > I was talking with the Aggios folks today and they were wondering if
>   > > it's possible to share screens to present slides during the call?
>   > > I'm guessing not, since the info below only has dial in info.
>   >
>   > It sounds like it is possible to share screen but I can't find a 
> public
>   link
>   > for it :/.
>   >
>   > I can look for an alternative (I have plenty of choice at Arm :)) if 
> it is
>   > something people wants to do in the future. But for tomorrow, it 
> might be
>   > difficult to get something up by tomorrow.
>   >
>   > Can you upload the slides somewhere and send a link by e-mail?
> 
>   Sending the slides by email beforehand is always a good idea. However,
>   it's 2017 and we *have* to be able to share slides live during a meeting
>   :-)
> 
>   I talked with Julien: we are going to use my uberconference details for
>   the call, which supports dialing in by phone, from your PC and slide
>   sharing:
> 
> 
>   Join the call: https://www.uberconference.com/stefano-stabellini
>   US dial-in number: 669-999-0613
>   No PIN needed
> 
>   For the international numbers, go to
>   https://www.uberconference.com/stefano-stabellini and choose "Join by
>   Phone", a list of international numbers and a pin will be provided.
> 
>   I'll also send another reminder tomorrow.
> 
> 
>   Cheers,
> 
>   Stefano
> 
> 
> 
> ___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC v4]Proposal to allow setting up shared memory areas between VMs from xl config file

2017-08-02 Thread Julien Grall


Hi,

On 01/08/17 00:53, Stefano Stabellini wrote:

On Mon, 31 Jul 2017, Edgar E. Iglesias wrote:

  @role Can only be 'master' or 'slave', it defaults to 'slave'.

  @prot When @role = master, this means the largest set of
stage-2 permission flags that can be granted to the
slave domains.
When @role = slave, this means the stage-2 permission
flags of the shared memory area.
Currently only 'rw' is supported. If not set. it
defaults to 'rw'.

  @cache_policy The stage-2 cacheability/shareability attributes of the
shared memory area. Currently, only two policies are
supported:
  * ARM_normal: Only applicable to ARM guests. This
would mean Inner and Outer Write-Back
Cacheable, and Inner Shareable.



Is there a reason not to set this to Outer Shareable?
Again, mainly useful when these pages get shared with devs as well.

The guest can always lower it to Inner Shareable via S1 tables if needed.


I don't think we can support memory sharing with devices in this version
of the document (see above about GSoC timelines). Normal memory is inner
shareable in Xen today, it makes sense to default to that.


I thought we mapped RAM as Outer shareable to guests but you seem to be right.
I think we should be mapping all RAM as Outer Shareable and then let the
guest decide what is Inner and what is Outer via it's S1 tables.
Right now it would be impossible to be Coherent with a DMA device outside
of the Inner domain...

Perhaps we should fix that and then ARM_normal would by itself become Outer.
If there's agreement I can test it and send a patch.


Today, only device memory is mapped Outer Shareable, while normal memory
is mapped Inner Shareable. I am OK with changing the default in
mfn_to_p2m_entry to Outer Shareable for normal RAM if the change would
make it possible to do coherent DMA with more devices on the platform.
Julien?


Per Table D4-44 in ARM DDI 0487B.a, outer-shareable takes precedence on 
inner-shareable. So if the guest configures the stage-1 page table with 
outer-shareable, the resultant will be outer-shareable.


As I pointed out in a previous version, this option is a bit confusing 
because the value set in stage-2 does not mean this will be the 
resultant value. I would strongly recommend any user wanting to share 
memory between guests to read D4.5.4 "Combining the stage 1 and stage 2 
attributes" before using this option. Otherwise, he/she will likely be 
confused and end up to mis-configure the attribute.


The current configuration (i.e "ARM_normal") gives the most flexibility 
to the guest in term of memory attribute but prevent non-shareable mapping.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC 16/22] x86/percpu: Adapt percpu for PIE support

2017-08-02 Thread Thomas Garnier

On Wed, Aug 2, 2017 at 9:56 AM, Kees Cook  wrote:
> On Wed, Aug 2, 2017 at 9:42 AM, Thomas Garnier  wrote:
>> I noticed that not only we have the problem of gs:0x40 not being
>> accessible. The compiler will default to the fs register if
>> mcmodel=kernel is not set.
>>
>> On the next patch set, I am going to add support for
>> -mstack-protector-guard=global so a global variable can be used
>> instead of the segment register. Similar approach than ARM/ARM64.
>
> While this is probably understood, I have to point out that this would
> be a major regression for the stack protection on x86.

I agree, the optimal solution will be using updated gcc/clang.

>
>> Following this patch, I will work with gcc and llvm to add
>> -mstack-protector-reg= support similar to PowerPC.
>> This way we can have gs used even without mcmodel=kernel. Once that's
>> an option, I can setup the GDT as described in the previous email
>> (similar to RFG).
>
> It would be much nicer if we could teach gcc about the percpu area
> instead. This would let us solve the global stack protector problem on
> the other architectures:
> http://www.openwall.com/lists/kernel-hardening/2017/06/27/6

Yes, while I am looking at gcc I will take a look at other
architecture to see if I can help there too.

>
> -Kees
>
> --
> Kees Cook
> Pixel Security



-- 
Thomas

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH] xen-platform: constify pci_device_id.

2017-08-02 Thread Arvind Yadav

pci_device_id are not supposed to change at runtime. All functions
working with pci_device_id provided by  work with
const pci_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/xen/platform-pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/platform-pci.c b/drivers/xen/platform-pci.c
index 1275df8..5d7dcad 100644
--- a/drivers/xen/platform-pci.c
+++ b/drivers/xen/platform-pci.c
@@ -175,7 +175,7 @@ static int platform_pci_probe(struct pci_dev *pdev,
return ret;
 }
 
-static struct pci_device_id platform_pci_tbl[] = {
+static const struct pci_device_id platform_pci_tbl[] = {
{PCI_VENDOR_ID_XEN, PCI_DEVICE_ID_XEN_PLATFORM,
PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
{0,}
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC v1 0/3] Enable XL to set and get per-VCPU work conserving flag for RTDS scheduler

2017-08-02 Thread Dario Faggioli

On Tue, 2017-08-01 at 14:33 -0400, Meng Xu wrote:
> This series of patches enable the toolstack to
> set and get per-VCPU work-conserving flag.
> With the toolstack, system administrators can decide
> which VCPUs will be made work-conserving.
> 
Thanks for this series as well, Meng. I'll look at it in the next
couple of days.
> 
> We plan to perform two steps in making RTDS scheduler work-
> conserving:
> (1) First make all VCPUs work-conserving by default,
> which was sent as a separate patch. This work aims for Xen 4.10
> release.
> (2) After that, we enable the XL to set and get per-VCPU work-
> conserving flag,
> which is this series of patches.
> 
I think it's better if you merge the "xen:rtds: towards work conserving
RTDS" as patch 1 of this series.

In fact, sending them as separate series, you make people think that
they're independent, while they're not (as this series is pretty
useless, without that patch :-P).

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 00/13] "Non-shared" IOMMU support on ARM

2017-08-02 Thread Oleksandr Tyshchenko

Hi, Kevin

On Wed, Aug 2, 2017 at 9:12 AM, Tian, Kevin  wrote:
>> From: Oleksandr Tyshchenko [mailto:olekst...@gmail.com]
>> Sent: Tuesday, August 1, 2017 7:08 PM
>>
>> Hi, Kevin
>>
>> On Tue, Aug 1, 2017 at 6:06 AM, Tian, Kevin  wrote:
>> >> From: Oleksandr Tyshchenko [mailto:olekst...@gmail.com]
>> >> Sent: Monday, July 31, 2017 7:58 PM
>> >>
>> >> Hi, Kevin
>> >>
>> >> On Mon, Jul 31, 2017 at 8:57 AM, Tian, Kevin 
>> wrote:
>> >> >> From: Oleksandr Tyshchenko
>> >> >> Sent: Wednesday, July 26, 2017 1:27 AM
>> >> >>
>> >> >> From: Oleksandr Tyshchenko 
>> >> >>
>> >> >> Hi, all.
>> >> >>
>> >> >> The purpose of this patch series is to create a base for porting
>> >> >> any "Non-shared" IOMMUs to Xen on ARM. Saying "Non-shared"
>> IOMMU
>> >> I
>> >> >> mean
>> >> >> the IOMMU that can't share the page table with the CPU.
>> >> >
>> >> > Is "non-shared" IOMMU a standard terminology in ARM side? I quickly
>> >> > searched to find it mostly used in this thread...
>> >> I don't think that it is a standard terminology.
>> >>
>> >> >
>> >> > On the other hand, all IOMMUs support a basic DMA remapping
>> >> > mechanism with page table not shared with CPU. Then some IOMMUs
>> >> > may optional support Shared Virtual Memory (SVM) through page
>> >> > sharing with CPU. Then I'm not sure why need to highlight the
>> >> > "non-shared" manner in this thread, instead of just saying
>> >> > IPMMU-VMSA support...
>> >> I wouldn't use "IPMMU-VMSA support" in this thread since it may be any
>> >> other IOMMUs which can't share page table
>> >> with CPU because of format incompatibilities.
>> >
>> > As I commented you can assume all IOMMUs cannot share page
>> > table with CPU as the starting point. It's not good to name an IOMMU
>> > driver based on such fact.
>> >
>> >> I needed something short to describe such IOMMUs, but, If title
>> >> "non-shared" IOMMU sounds confusing
>> >> I won't use it anymore. Do you have something in mind?
>> >
>> > IOMMU driver needs to be vendor specific. Is your driver working
>> > for all IPMMU-VMSA compatible IOMMUs or only for Renesas?
>> > If the latter, you may make the name explicit for such purpose.
>> >
>> > btw since you're porting Linux driver to Xen. What 's the name
>> > used in Linux side? that should be a good reference to you.
>>
>> I am afraid a misunderstanding took place. Let me elaborate a bit more
>> about this.
>>
>> The IOMMU driver I am porting to Xen is IPMMU-VMSA [1]. This name is
>> used in Linux
>> and this name was retained during porting to Xen. This driver is
>> intended to work with Renesas VMSA-compatible IPMMUs.
>> But, IPMMU-VMSA support is not a target for the current thread, there
>> is another thread for adding it [2].
>>
>> The purpose of the current thread is to create ground for IPMMU-VMSA
>> IOMMUs
>> (as well as other IOMMUs which can't share page table with CPU because
>> of format incompatibilities) to be functional inside Xen on ARM.
>> The only IOMMU supported today in Xen on ARM is the ARM SMMU (which
>> uses the same page table format as the CPU and can share page table
>> with it).
>> And ARM specific code assumes that P2M table is *always* shared and
>> acts accordingly. So, this patch series is trying
>> to, let's say, brake this assumption and create environment to handle
>> such IOMMUs as well.
>> So, I may use the whole sentence as a patch series title in order not
>> to confuse people:
>> "Support IOMMUs which don't share page table with the CPU on ARM"
>> No objections?
>
> well, I saw where disconnect comes. My context when reviewing
> this message is right around thinking some about Shared Virtual
> Memory (SVM), which is a feature to allow IOMMU sharing same
> CPU page table for VA->PA so user application can directly send
> VA to device to do DMA. In virtualization it means IOMMU supports
> two-level translation with 1st level for GVA -> GPA and 2nd level
> for GPA->HPA. 1st level directly uses guest CPU page table. That
> is an optional feature not supported by all IOMMUs.
>
> while your work is really about IOMMU sharing CPU EPT page
> table (GPA->HPA) (sorry I don't know ARM's term for EPT), and
> for this usage some ARM SMMUs have compatible format while
> others may not, which is why you introduced the "non-shared"
> model here.
Yes. Exactly. It was about sharing *stage-2* page table (that handles
IPA->PA mappings).
Sorry if I was unclear.

>
> I'm not sure whether it's worthy of differentiating two usages
> in your subject line, since SVM support is not in Xen today. And
> from Julien's comment looks people don't have confusion on
> its meaning. Then I'm fine with your original description. it's
> Julien's call. :-)
>
> Thanks
> Kevin



-- 
Regards,

Oleksandr Tyshchenko

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC v1] xen:rtds: towards work conserving RTDS

2017-08-02 Thread Dario Faggioli

Hey, Meng!

It's really cool to see progress on this... There was quite a bit of
interest in scheduling in general at the Summit in Budapest, and one
important thing for making sure RTDS will be really useful, is for it
to have a work conserving mode! :-)

On Tue, 2017-08-01 at 14:13 -0400, Meng Xu wrote:
> Make RTDS scheduler work conserving to utilize the idle resource,
> without breaking the real-time guarantees.

Just kill the "to utilize the idle resource". We can expect that people
 that are interested in this commit, also know what 'work conserving'
means. :-)

> VCPU model:
> Each real-time VCPU is extended to have a work conserving flag
> and a priority_level field.
> When a VCPU's budget is depleted in the current period,
> if it has work conserving flag set,
> its priority_level will increase by 1 and its budget will be
> refilled;
> othewrise, the VCPU will be moved to the depletedq.
> 
Mmm... Ok. But is the budget burned, while the vCPU executes at
priority_level 1? If yes, doesn't this mean we risk having less budget
when we get back to priority_lvevel 0?

Oh, wait, maybe it's the case that, when we get back to priority_level
0, we also get another replenishment, is that the case? If yes, I
actually think it's fine...

> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 39f6bee..740a712 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -191,6 +195,7 @@ struct rt_vcpu {
>  /* VCPU parameters, in nanoseconds */
>  s_time_t period;
>  s_time_t budget;
> +bool_t is_work_conserving;   /* is vcpu work conserving */
>  
>  /* VCPU current infomation in nanosecond */
>  s_time_t cur_budget; /* current budget */
> @@ -201,6 +206,8 @@ struct rt_vcpu {
>  struct rt_dom *sdom;
>  struct vcpu *vcpu;
>  
> +unsigned priority_level;
> +
>  unsigned flags;  /* mark __RTDS_scheduled, etc.. */
>
So, since we've got a 'flags' field already, can the flag be one of its
bit, instead of adding a new bool in the struct:

/*
 * RTDS_work_conserving: Can the vcpu run in the time that is
 * not part of any real-time reservation, and would therefore
 * be otherwise left idle?
 */
__RTDS_work_conserving   4
#define RTDS_work_conserving (1<<__RTDS_work_conserving)

> @@ -245,6 +252,11 @@ static inline struct list_head *rt_replq(const
> struct scheduler *ops)
>  return _priv(ops)->replq;
>  }
>  
> +static inline bool_t is_work_conserving(const struct rt_vcpu *svc)
> +{
>
Use bool.

> @@ -273,6 +285,20 @@ vcpu_on_replq(const struct rt_vcpu *svc)
>  return !list_empty(>replq_elem);
>  }
>  
> +/* If v1 priority >= v2 priority, return value > 0
> + * Otherwise, return value < 0
> + */
>
Comment style.

Apart from that, do you want this to return >0 if v1 should have
priority over v2, and <0 if vice-versa, right? If yes...

> +static int
> +compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu
> *v2)
> +{
> +if ( v1->priority_level < v2->priority_level ||
> + ( v1->priority_level == v2->priority_level && 
> + v1->cur_deadline <= v2->cur_deadline ) )
> +return 1;
> +else
> +return -1;
>
  int prio = v2->priority_level - v1->priority_level;

  if ( prio == 0 )
return v2->cur_deadline - v1->cur_deadline;

  return prio;

Return type has to become s_time_t, and there's a chance that it'll
return 0, if they are at the same level, and have the same absolute
deadline. But I think you can deal with this in the caller.

> @@ -966,8 +1001,16 @@ burn_budget(const struct scheduler *ops, struct
> rt_vcpu *svc, s_time_t now)
>  
>  if ( svc->cur_budget <= 0 )
>  {
> -svc->cur_budget = 0;
> -__set_bit(__RTDS_depleted, >flags);
> +if ( is_work_conserving(svc) )
> +{
> +svc->priority_level++;
>
   ASSERT(svc->priority_level <= 1);

> +svc->cur_budget = svc->budget;
> +}
> +else
> +{
> +svc->cur_budget = 0;
> +__set_bit(__RTDS_depleted, >flags);
> +}
>  }
>  
The rest looks good to me.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] Next Xen ARM community call - Wednesday 2nd August 2017

2017-08-02 Thread Mirela Simonovic

Hi,

Thanks for the invite, please find the presentation here:
https://docs.google.com/presentation/d/1PhK9bdLyY_DSr8RbSs7LDdLWADbSW7YGwwnkpMLWeQg/edit#slide=id.g24c05965d7_0_15

Regards,
Mirela

On Tue, Aug 1, 2017 at 7:43 PM, Stefano Stabellini 
wrote:

> On Tue, 1 Aug 2017, Julien Grall wrote:
> > On 01/08/17 17:44, Edgar E. Iglesias wrote:
> > > On Wed, Jul 26, 2017 at 04:59:43PM +0100, Julien Grall wrote:
> > > > Hi all,
> > > >
> > > > The next Xen ARM community call will be Wednesday 2nd August 2017
> 5pm BST.
> > > >
> > > > Do you have any specific topic you would like to discuss?
> > >
> > > CC: Davorin and Mirella from Aggios
> > >
> > > Hi Julien,
> >
> > Hi,
> >
> > > I was talking with the Aggios folks today and they were wondering if
> > > it's possible to share screens to present slides during the call?
> > > I'm guessing not, since the info below only has dial in info.
> >
> > It sounds like it is possible to share screen but I can't find a public
> link
> > for it :/.
> >
> > I can look for an alternative (I have plenty of choice at Arm :)) if it
> is
> > something people wants to do in the future. But for tomorrow, it might be
> > difficult to get something up by tomorrow.
> >
> > Can you upload the slides somewhere and send a link by e-mail?
>
> Sending the slides by email beforehand is always a good idea. However,
> it's 2017 and we *have* to be able to share slides live during a meeting
> :-)
>
> I talked with Julien: we are going to use my uberconference details for
> the call, which supports dialing in by phone, from your PC and slide
> sharing:
>
>
> Join the call: https://www.uberconference.com/stefano-stabellini
> US dial-in number: 669-999-0613
> No PIN needed
>
> For the international numbers, go to
> https://www.uberconference.com/stefano-stabellini and choose "Join by
> Phone", a list of international numbers and a pin will be provided.
>
> I'll also send another reminder tomorrow.
>
>
> Cheers,
>
> Stefano
>
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC 16/22] x86/percpu: Adapt percpu for PIE support

2017-08-02 Thread Kees Cook

On Wed, Aug 2, 2017 at 9:42 AM, Thomas Garnier  wrote:
> I noticed that not only we have the problem of gs:0x40 not being
> accessible. The compiler will default to the fs register if
> mcmodel=kernel is not set.
>
> On the next patch set, I am going to add support for
> -mstack-protector-guard=global so a global variable can be used
> instead of the segment register. Similar approach than ARM/ARM64.

While this is probably understood, I have to point out that this would
be a major regression for the stack protection on x86.

> Following this patch, I will work with gcc and llvm to add
> -mstack-protector-reg= support similar to PowerPC.
> This way we can have gs used even without mcmodel=kernel. Once that's
> an option, I can setup the GDT as described in the previous email
> (similar to RFG).

It would be much nicer if we could teach gcc about the percpu area
instead. This would let us solve the global stack protector problem on
the other architectures:
http://www.openwall.com/lists/kernel-hardening/2017/06/27/6

-Kees

-- 
Kees Cook
Pixel Security

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC 16/22] x86/percpu: Adapt percpu for PIE support

2017-08-02 Thread Thomas Garnier

On Thu, Jul 20, 2017 at 7:26 AM, Thomas Garnier  wrote:
> On Wed, Jul 19, 2017 at 4:33 PM, H. Peter Anvin  wrote:
>> On 07/19/17 11:26, Thomas Garnier wrote:
>>> On Tue, Jul 18, 2017 at 8:08 PM, Brian Gerst  wrote:
 On Tue, Jul 18, 2017 at 6:33 PM, Thomas Garnier  
 wrote:
> Perpcu uses a clever design where the .percu ELF section has a virtual
> address of zero and the relocation code avoid relocating specific
> symbols. It makes the code simple and easily adaptable with or without
> SMP support.
>
> This design is incompatible with PIE because generated code always try to
> access the zero virtual address relative to the default mapping address.
> It becomes impossible when KASLR is configured to go below -2G. This
> patch solves this problem by removing the zero mapping and adapting the GS
> base to be relative to the expected address. These changes are done only
> when PIE is enabled. The original implementation is kept as-is
> by default.

 The reason the per-cpu section is zero-based on x86-64 is to
 workaround GCC hardcoding the stack protector canary at %gs:40.  So
 this patch is incompatible with CONFIG_STACK_PROTECTOR.
>>>
>>> Ok, that make sense. I don't want this feature to not work with
>>> CONFIG_CC_STACKPROTECTOR*. One way to fix that would be adding a GDT
>>> entry for gs so gs:40 points to the correct memory address and
>>> gs:[rip+XX] works correctly through the MSR.
>>
>> What are you talking about?  A GDT entry and the MSR do the same thing,
>> except that a GDT entry is limited to an offset of 0-0x (which
>> doesn't work for us, obviously.)
>>
>
> A GDT entry would allow gs:0x40 to be valid while all gs:[rip+XX]
> addresses uses the MSR.
>
> I didn't tested it but that was used on the RFG mitigation [1]. The fs
> segment register was used for both thread storage and shadow stack.
>
> [1] http://xlab.tencent.com/en/2016/11/02/return-flow-guard/
>

Small update on that.

I noticed that not only we have the problem of gs:0x40 not being
accessible. The compiler will default to the fs register if
mcmodel=kernel is not set.

On the next patch set, I am going to add support for
-mstack-protector-guard=global so a global variable can be used
instead of the segment register. Similar approach than ARM/ARM64.

Following this patch, I will work with gcc and llvm to add
-mstack-protector-reg= support similar to PowerPC.
This way we can have gs used even without mcmodel=kernel. Once that's
an option, I can setup the GDT as described in the previous email
(similar to RFG).

Let me know what you think about this approach.

>>> Given the separate
>>> discussion on mcmodel, I am going first to check if we can move from
>>> PIE to PIC with a mcmodel=small or medium that would remove the percpu
>>> change requirement. I tried before without success but I understand
>>> better percpu and other components so maybe I can make it work.
>>
 This is silly.  The right thing is for PIE is to be explicitly absolute,
 without (%rip).  The use of (%rip) memory references for percpu is just
 an optimization.
>>>
>>> I agree that it is odd but that's how the compiler generates code. I
>>> will re-explore PIC options with mcmodel=small or medium, as mentioned
>>> on other threads.
>>
>> Why should the way compiler generates code affect the way we do things
>> in assembly?
>>
>> That being said, the compiler now has support for generating this kind
>> of code explicitly via the __seg_gs pointer modifier.  That should let
>> us drop the __percpu_prefix and just use variables directly.  I suspect
>> we want to declare percpu variables as "volatile __seg_gs" to account
>> for the possibility of CPU switches.
>>
>> Older compilers won't be able to work with this, of course, but I think
>> that it is acceptable for those older compilers to not be able to
>> support PIE.
>>
>> -hpa
>>
>
>
>
> --
> Thomas



-- 
Thomas

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4] xen: rtds: only tickle non-already tickled CPUs

2017-08-02 Thread Dario Faggioli

On Tue, 2017-08-01 at 15:24 -0400, Meng Xu wrote:
> The initial discussion of this patch can be found at
> https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg02857
> .html
> 
> Changes in v4:
> 1) Take Dario's suggestions:
>    Search the new->cpu first for the cpu to tickle.
>    This get rid of the if statement in previous versions.
> 2) Reword the comments and commit messages.
> 3) Rebased on staging branch.
> 
> Issues in v2 and v3:
> Did not rebase on the latest staging branch.
> Did not solve the comments/issues in v1.
> Please ignore the v2 and v3.
>
Ok, thanks for taking care of this.

I've only have two more minor comments.

> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 39f6bee..5fec95f 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -1147,9 +1147,9 @@ rt_vcpu_sleep(const struct scheduler *ops,
> struct vcpu *vc)
>   * Called by wake() and context_saved()
>   * We have a running candidate here, the kick logic is:
>   * Among all the cpus that are within the cpu affinity
> - * 1) if the new->cpu is idle, kick it. This could benefit cache hit
> - * 2) if there are any idle vcpu, kick it.
> - * 3) now all pcpus are busy;
> + * 1) if there are any idle vcpu, kick it.
>
Either:

"if there is an ile CPU, kick it."

Or

"if there are any idle CPUs, kick one."

Feel both more accurate (it's a CPU that is idle, not a vCPU, although,
yes, vcpus of the idle domain do exist, I know), and better in English.

This applies to both here and below, where this line is repeated.

> +  For cache benefit,we first search new->cpu.
> + * 2) now all pcpus are busy;
>   *among all the running vcpus, pick lowest priority one
>   *if snext has higher priority, kick it.
>   *
> @@ -1177,17 +1177,13 @@ runq_tickle(const struct scheduler *ops,
> struct rt_vcpu *new)
>  cpumask_and(_tickled, online, new->vcpu->cpu_hard_affinity);
>  cpumask_andnot(_tickled, _tickled, >tickled);
>  
> -/* 1) if new's previous cpu is idle, kick it for cache benefit
> */
> -if ( is_idle_vcpu(curr_on_cpu(new->vcpu->processor)) )
> -{
> -SCHED_STAT_CRANK(tickled_idle_cpu);
> -cpu_to_tickle = new->vcpu->processor;
> -goto out;
> -}
> -
> -/* 2) if there are any idle pcpu, kick it */
> -/* The same loop also find the one with lowest priority */
> -for_each_cpu(cpu, _tickled)
> +/*
> + * 1) If there are any idle vcpu, kick it.
> + *For cache benefit,we first search new->cpu.
>
"we check new->cpu for first" (or "as first", I think they both can be
used but I'm no native speaker).

With these two adjustments, you can have my:

Reviewed-by: Dario Faggioli 

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)

signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] Next Xen ARM community call - Wednesday 2nd August 2017

2017-08-02 Thread Stefano Stabellini

Just a reminder that we are going to use the following conf bridge:

Join the call: https://www.uberconference.com/stefano-stabellini
US dial-in number: 669-999-0613
No PIN needed

For the international numbers, go to
https://www.uberconference.com/stefano-stabellini and choose "Join by
Phone", a list of international numbers and a pin will be provided.



On Wed, 26 Jul 2017, Julien Grall wrote:
> Hi all,
> 
> The next Xen ARM community call will be Wednesday 2nd August 2017 5pm BST.
> 
> Do you have any specific topic you would like to discuss?
> 
> Call+44 1223 406065 (Local dial in)
> and enter the access code below followed by # key.
> Participant code: 4915191
> 
> Mobile Auto Dial:
> VoIP: voip://+441223406065;4915191#
> iOS devices: +44 1223 406065,4915191 and press #
> Other devices: +44 1223 406065x4915191#
> 
> Additional Calling Information:
> 
> UK +44 1142828002
> US CA +1 4085761502
> US TX +1 5123141073
> JP +81 453455355
> DE +49 8945604050
> NO +47 73187518
> SE +46 46313131
> FR +33 497235101
> TW +886 35657119
> HU +36 13275600
> IE +353 91337900
> 
> Toll Free
> 
> UK 0800 1412084
> US +1 8668801148
> CN +86 4006782367
> IN 0008009868365
> IN +918049282778
> TW 08000 22065
> HU 0680981587
> IE 1800800022
> KF +972732558877
> 
> -- 
> Julien Grall
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/efi: Do not write relocations in efi_arch_relocate_image() first pass

2017-08-02 Thread David Woodhouse

On Wed, 2017-08-02 at 09:16 -0600, Jan Beulich wrote:
> 
> Well, I've seen breakage in all sorts of places I wouldn't have expected
> anyone to fine a need to fiddle with.

This is the nature of 'value subtract', I suppose. 

How about something like this? Somewhat complicated by the fact that
COFF section names get truncated, so maybe there's a nicer place to put
it (or maybe we explicitly include .init.da into the .init.data output
section, in the efi.lds linker script, or something?)

--- a/xen/arch/x86/efi/mkreloc.c
+++ b/xen/arch/x86/efi/mkreloc.c
@@ -346,6 +346,15 @@ int main(int argc, char *argv[])
  memcmp(sec1[i].name, ".lockpro", sizeof(sec1[i].name)) == 0 )
 continue;
 
+   /* For sections with relocations, force them to be writeable */
+   if (memcmp(sec1[i].name, ".init.da", sizeof(sec1[i].name)) == 0)
+   printf(".pushsection .init.data, \"awx\"\n");
+   else if (memcmp(sec1[i].name, ".init.te", sizeof(sec1[i].name) ) == 0)
+   printf(".pushsection .init.text, \"awx\"\n");
+   else
+   printf(".pushsection %.*s, \"awx\"\n", 
(int)sizeof(sec1[i].name), sec1[i].name);
+   printf(".popsection\n");
+
 if ( !sec1[i].rva )
 {
 fprintf(stderr, "Can't handle section %u with zero RVA\n", i);

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] FYI qemu-xen updated to v2.9

2017-08-02 Thread Anthony PERARD

Hi,

I've just pushed QEMU v2.9.0 to our qemu-xen tree.

Regards,

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist

2017-08-02 Thread Boris Ostrovsky

On 08/02/2017 05:24 AM, Jan Beulich wrote:
 Boris Ostrovsky  07/31/17 6:03 PM >>>
> On 07/31/2017 10:45 AM, Jan Beulich wrote:
> Boris Ostrovsky  07/23/17 4:01 AM >>>
>>> On 06/27/2017 01:06 PM, Jan Beulich wrote:
>>> Boris Ostrovsky  06/22/17 8:55 PM >>>
>> --- a/xen/include/asm-x86/mm.h
>> +++ b/xen/include/asm-x86/mm.h
>> @@ -88,7 +88,15 @@ struct page_info
>>   /* Page is on a free list: ((count_info & PGC_count_mask) == 
>> 0). */
>>   struct {
>>   /* Do TLBs need flushing for safety before next page use? 
>> */
>> -bool_t need_tlbflush;
>> +unsigned long need_tlbflush:1;
>> +
>> +/*
>> + * Index of the first *possibly* unscrubbed page in the 
>> buddy.
>> + * One more than maximum possible order (MAX_ORDER+1) to
> Why +1 here and hence ...
 Don't we have MAX_ORDER+1 orders?
>>> So here there might be a simple misunderstanding: I understand the
>>> parenthesized MAX_ORDER+1 to represent "maximum possible
>>> order", i.e. excluding the "one more than", not the least because of
>>> the ...
>>>
> + * accommodate INVALID_DIRTY_IDX.
> + */
> +#define INVALID_DIRTY_IDX (-1UL & (((1UL< +unsigned long first_dirty:MAX_ORDER + 2;
>>> +2 here.
>>>
> ... why +2 instead of +1? And isn't the expression INVALID_DIRTY_IDX 
> wrongly
> parenthesized (apart from lacking blanks around the shift operator)? I'd
> expect you want a value with MAX_ORDER+1 set bits, i.e.
> (1UL << (MAX_ORDER + 1)) - 1. ANDing with -1UL seems quite pointless too.
 Yes to parentheses and AND. Should be (1UL << (MAX_ORDER + 2)) - 1
>>> I.e. I would still expect it to be (1UL << (MAX_ORDER + 1)) - 1
>>> here.
>>
>> Sorry, I still don't get it.
>>
>> Say, MAX_ORDER is 1. Since this implies that indexes 0, 1, 2 and 3 are
>> all valid (because we can have up to 2^(MAX_ORDER+1) pages), don't we
>> need 3 bits to indicate an invalid index?
> Why 0, 1, 2, and 3? 

Of course, it's 0 and 1 only. MAX_ORDER+1 completely threw me off.

-boris

> When MAX_ORDER is 1, we only have a single bit, i.e.
> valid values 0 and 1 (plus one more for the invalid indicator), i.e. need 2 
> bits
> for representation of all used values.
>
> Jan
>


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 13/23] x86: refactor psr: CDP: implement CPU init flow.

2017-08-02 Thread Jan Beulich

>>> Yi Sun  08/02/17 5:12 PM >>>
>On 17-08-02 06:35:46, Jan Beulich wrote:
>> >>> Yi Sun  08/01/17 11:04 AM >>>
>> >@@ -1278,15 +1339,31 @@ static void psr_cpu_init(void)
>> >cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 0, ®s);
>> >if ( regs.b & PSR_RESOURCE_TYPE_L3 )
>> >{
>> >+bool do_l3_cat_init = true;
>> >+
>> >cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 1, ®s);
>>  >
>> >feat = feat_l3;
>> >feat_l3 = NULL;
>>  >
>> >-if ( !cat_init_feature(®s, feat, info, FEAT_TYPE_L3_CAT) )
>> >-feat_props[FEAT_TYPE_L3_CAT] = _cat_props;
>> >-else
>> >-feat_l3 = feat;
>> >+if ( (regs.c & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) )
>> >+{
>> >+/* If CDP init fails, try to work as L3 CAT. */
>> >+if ( !cat_init_feature(®s, feat, info, FEAT_TYPE_L3_CDP) )
>> >+{
>> >+feat_props[FEAT_TYPE_L3_CDP] = _cdp_props;
>> >+/* CDP init succeeds, no need to do L3 CAT init. */
>> >+do_l3_cat_init = false;
>> >+}
>> >+}
>> 
>> The comment ahead of the inner if() now really describes the (implicit)
>> else case. That's somewhat misleading. How about putting feat back
>> into feat_l3 in an actual "else", and using that at once instead of the
>> somewhat clumsily named "do_l3_cat_init" local variable? That would
>> additionally avoid the need for me to ask you to fold the two if()s. Plus
>> the resulting code would become more similar ...
>> 
>> >+if ( do_l3_cat_init )
>> >+{
>> >+if ( !cat_init_feature(®s, feat, info, FEAT_TYPE_L3_CAT) )
>> >+feat_props[FEAT_TYPE_L3_CAT] = _cat_props;
>> >+else
>> >+feat_l3 = feat;
>> >+}
>> 
>> ... to this.
>> 
>Thanks for the comment! But I do not understand the intention to put feat
>back into feat_l3 in else. After putting feat back into feat_l3, how to
>handle L3 CAT init? The L3 CAT init should be entered under two cases:
>1. No CDP capability in regs.c. In such case, the feat equals feat_l3 and
>feat_l3 has been NULL at above step.
>2. CDP init fails. In such case, the feat equals feat_l3. If we put feat
>back into feat_l3, then they are equal.
>
>So, we cannot use feat or feat_l3 to decide entering L3 CAT init or not.

You could pull the copying of feat_l3 into feat inside the if() checking
the capability. But your alternative doesn't look too bad as well.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/efi: Do not write relocations in efi_arch_relocate_image() first pass

2017-08-02 Thread Jan Beulich

>>> David Woodhouse  08/02/17 4:45 PM >>>
>On Wed, 2017-08-02 at 07:58 -0600, Jan Beulich wrote:
>> > > > David Woodhouse  08/02/17 2:11 PM >>>
>> > This change is sufficient (we believe) to make EFI builds of Xen
>> > actually boot again on current EDK2, is it not?
>>
>> It is safe / sufficient only with the specific behavior you've observed, i.e.
>> when permission restrictions are being removed during ExitBootServices().
>> I don't recall having seen any statement to that effect in the spec, and
>> even if there was such a statement surely we'd find a firmware vendor
>> who doesn't play by it.
>
>I'd be relatively surprised if a vendor were to make changes to the
>underlying TianoCore/EDK2 implementation in this respect. It doesn't
>seem like one of the areas in which they would want to apply the "value
>subtract" that they so desperately cling to.

Well, I've seen breakage in all sorts of places I wouldn't have expected
anyone to fine a need to fiddle with.

>Perhaps we should push to have the spec amended to mandate the current
>behaviour?

That would be nice, but wouldn't keep people from still getting it wrong,
I'm afraid.

>(Hm, ick... the more I look at this, the more I wish my initial
>response had been "la la la it was already broken it wasn't me..." :)

;-)

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 13/23] x86: refactor psr: CDP: implement CPU init flow.

2017-08-02 Thread Yi Sun

On 17-08-02 06:35:46, Jan Beulich wrote:
> >>> Yi Sun  08/01/17 11:04 AM >>>
> >@@ -1278,15 +1339,31 @@ static void psr_cpu_init(void)
> >cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 0, ®s);
> >if ( regs.b & PSR_RESOURCE_TYPE_L3 )
> >{
> >+bool do_l3_cat_init = true;
> >+
> >cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 1, ®s);
>  >
> >feat = feat_l3;
> >feat_l3 = NULL;
>  >
> >-if ( !cat_init_feature(®s, feat, info, FEAT_TYPE_L3_CAT) )
> >-feat_props[FEAT_TYPE_L3_CAT] = _cat_props;
> >-else
> >-feat_l3 = feat;
> >+if ( (regs.c & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) )
> >+{
> >+/* If CDP init fails, try to work as L3 CAT. */
> >+if ( !cat_init_feature(®s, feat, info, FEAT_TYPE_L3_CDP) )
> >+{
> >+feat_props[FEAT_TYPE_L3_CDP] = _cdp_props;
> >+/* CDP init succeeds, no need to do L3 CAT init. */
> >+do_l3_cat_init = false;
> >+}
> >+}
> 
> The comment ahead of the inner if() now really describes the (implicit)
> else case. That's somewhat misleading. How about putting feat back
> into feat_l3 in an actual "else", and using that at once instead of the
> somewhat clumsily named "do_l3_cat_init" local variable? That would
> additionally avoid the need for me to ask you to fold the two if()s. Plus
> the resulting code would become more similar ...
> 
> >+if ( do_l3_cat_init )
> >+{
> >+if ( !cat_init_feature(®s, feat, info, FEAT_TYPE_L3_CAT) )
> >+feat_props[FEAT_TYPE_L3_CAT] = _cat_props;
> >+else
> >+feat_l3 = feat;
> >+}
> 
> ... to this.
> 
Thanks for the comment! But I do not understand the intention to put feat
back into feat_l3 in else. After putting feat back into feat_l3, how to
handle L3 CAT init? The L3 CAT init should be entered under two cases:
1. No CDP capability in regs.c. In such case, the feat equals feat_l3 and
   feat_l3 has been NULL at above step.
2. CDP init fails. In such case, the feat equals feat_l3. If we put feat
   back into feat_l3, then they are equal.

So, we cannot use feat or feat_l3 to decide entering L3 CAT init or not.

I think we may check if 'feat_props[FEAT_TYPE_L3_CDP]' is NULL to decide
entering L3 CAT init or not. Because under above two cases, it is NULL.
E.g.
feat = feat_l3;
feat_l3 = NULL;
if ( (regs.c & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) )
{
/* If CDP init fails, try to work as L3 CAT. */
if ( !cat_init_feature(®s, feat, info, FEAT_TYPE_L3_CDP) )
feat_props[FEAT_TYPE_L3_CDP] = _cdp_props;
} 

if ( !feat_props[FEAT_TYPE_L3_CDP] )
{
if ( !cat_init_feature(®s, feat, info, FEAT_TYPE_L3_CAT) )
feat_props[FEAT_TYPE_L3_CAT] = _cat_props;
else
feat_l3 = feat;
}

How do you think? Thanks!

> And btw, to avoid spamming the list with another full version of this
> series, I'd be fine with you just submitting v15.1 of this one patch.
> 
> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 9/9] vpci/msix: add MSI-X handlers

2017-08-02 Thread Jan Beulich

>>> Roger Pau Monne  06/30/17 5:01 PM >>>
>Note that accesses to the Table Offset, Table BIR, PBA Offset and PBA
>BIR are not trapped by Xen at the moment.

They're mandated r/o by the spec anyway.

>@@ -113,6 +148,35 @@ static int vpci_modify_bar(struct domain *d, const struct 
>vpci_bar *bar,
>if ( IS_ERR(mem) )
>return -PTR_ERR(mem);
 >
>+/*
>+ * Make sure the MSI-X regions of the BAR are not mapped into the domain
>+ * p2m, or else the MSI-X handlers are useless. Only do this when mapping,
>+ * since that's when the memory decoding on the device is enabled.
>+ */
>+for ( i = 0; map && i < ARRAY_SIZE(bar->msix); i++ )
>+{
>+struct vpci_msix_mem *msix = bar->msix[i];
>+
>+if ( !msix || msix->addr == INVALID_PADDR )
>+continue;
>+
>+rc = vpci_unmap_msix(d, msix);

Why do you need this, instead of being able to simply rely on the rangeset
based (un)mapping?

>@@ -405,7 +475,20 @@ static int vpci_init_bars(struct pci_dev *pdev)
>continue;
>}
 >
>-bars[i].addr = (cmd & PCI_COMMAND_MEMORY) ? addr : INVALID_PADDR;
>+if ( cmd & PCI_COMMAND_MEMORY )
>+{
>+unsigned int j;
>+
>+bars[i].addr = addr;
>+
>+for ( j = 0; j < ARRAY_SIZE(bars[i].msix); j++ )
>+if ( bars[i].msix[j] )
>+bars[i].msix[j]->addr = bars[i].addr +
>+bars[i].msix[j]->offset;
>+}
>+else
>+bars[i].addr = INVALID_PADDR;

As (I think) mentioned elsewhere already, this init-time special case looks
dangerous (and unnecessary) to me (or else I'd expect you to also zap
the field when the memory decode bit is being cleared).

>--- /dev/null
>+++ b/xen/drivers/vpci/msix.c
>@@ -0,0 +1,503 @@
>+/*
>+ * Handlers for accesses to the MSI-X capability structure and the memory
>+ * region.
>+ *
>+ * Copyright (C) 2017 Citrix Systems R
>+ *
>+ * This program is free software; you can redistribute it and/or
>+ * modify it under the terms and conditions of the GNU General Public
>+ * License, version 2, as published by the Free Software Foundation.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>+ * General Public License for more details.
>+ *
>+ * You should have received a copy of the GNU General Public
>+ * License along with this program; If not, see 
>.
>+ */
>+
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+
>+#define MSIX_SIZE(num) (offsetof(struct vpci_msix, entries[num]))

The outermost parens are pointless here.

>+static void vpci_msix_control_write(struct pci_dev *pdev, unsigned int reg,
>+union vpci_val val, void *data)
>+{
>+uint8_t seg = pdev->seg, bus = pdev->bus;
>+uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
>+struct vpci_msix *msix = data;
>+paddr_t table_base = pdev->vpci->header.bars[msix->table.bir].addr;
>+bool new_masked, new_enabled;
>+unsigned int i;
>+int rc;
>+
>+new_masked = val.u16 & PCI_MSIX_FLAGS_MASKALL;
>+new_enabled = val.u16 & PCI_MSIX_FLAGS_ENABLE;
>+
>+if ( !msix->enabled && new_enabled )
>+{
>+/* MSI-X enabled. */

Insert "being"?

>+for ( i = 0; i < msix->max_entries; i++ )
>+{
>+if ( msix->entries[i].masked )
>+continue;

Why is the mask bit relevant here, but not the mask-all one?

>+rc = vpci_msix_arch_enable(>entries[i].arch, pdev,
>+   msix->entries[i].addr,
>+   msix->entries[i].data,
>+   msix->entries[i].nr, table_base);
>+if ( rc )
>+{
>+gdprintk(XENLOG_ERR,
<+ "%04x:%02x:%02x.%u: unable to update entry %u: %d\n",
>+ seg, bus, slot, func, i, rc);
>+return;
>+}
>+
>+vpci_msix_arch_mask(>entries[i].arch, pdev, false);

Same question here.

>+}
>+}
>+else if ( msix->enabled && !new_enabled )
>+{
>+/* MSI-X disabled. */
>+for ( i = 0; i < msix->max_entries; i++ )
>+{
>+rc = vpci_msix_arch_disable(>entries[i].arch, pdev);
>+if ( rc )
>+{
>+gdprintk(XENLOG_ERR,
>+ "%04x:%02x:%02x.%u: unable to disable entry %u: 
>%d\n",
>+ seg, bus, slot, func, i, rc);
>+return;
>+}
>+}
>+}
>+
>+if ( (new_enabled != msix->enabled || new_masked != msix->masked) &&
>+ pci_msi_conf_write_intercept(pdev, reg, 2, ) >= 0 )
>+pci_conf_write16(seg, bus, slot, func, reg, val.u32);

DYM val.u16

Re: [Xen-devel] [PATCH] x86/efi: Do not write relocations in efi_arch_relocate_image() first pass

2017-08-02 Thread David Woodhouse

On Wed, 2017-08-02 at 07:58 -0600, Jan Beulich wrote:
> > > > David Woodhouse  08/02/17 2:11 PM >>>
> > This change is sufficient (we believe) to make EFI builds of Xen
> > actually boot again on current EDK2, is it not?
>
> It is safe / sufficient only with the specific behavior you've observed, i.e.
> when permission restrictions are being removed during ExitBootServices().
> I don't recall having seen any statement to that effect in the spec, and
> even if there was such a statement surely we'd find a firmware vendor
> who doesn't play by it.

I'd be relatively surprised if a vendor were to make changes to the
underlying TianoCore/EDK2 implementation in this respect. It doesn't
seem like one of the areas in which they would want to apply the "value
subtract" that they so desperately cling to.

Perhaps we should push to have the spec amended to mandate the current
behaviour?

> > 
> > What is the "other half" of which you speak? You mentioned marking the
> > section RWX — but for that to be a long-term solution to the issue,
> > we'd basically have to ensure that we always mark *all* sections which
> > might have relocations (even .rodata) as writeable, in case UEFI
> > decides to honour their read-only status at some point in the future.
>
> The other half is to preferably remove all (assembly) contributions from
> sections ending up R or RX. In particular, just like the compiler does,
> such .rodata pieces ought to be moved to .data.rel.ro; .init.text ones
> would need to move to .init.data or .init.data.rel.ro. And ideally we'd have
> link time verification that no relocations exist for R or RX sections ...

It wouldn't be that hard to add build-time verification in mkreloc.c —
it's already processing one section at a time, and can see the flags.
So it shouldn't be hard to make it bail out if a section without the W
flag contains any relocations.

But we might do better just to mark all the COFF sections as writeable,
even if it's done by post-processing the EFI executable. The
*important* part is marking them read-only in our own page tables after
we're running, and grouping them into sections to make that possible is
most useful. UEFI marking them read-only in the brief period while
we're starting up is just kind of pointless from our point of view.

Alternatively, if we really don't trust the UEFI implementation to let
us write to read-only sections after ExitBootServices, we could ensure
that we switch to our own page tables *before* doing the final pass of
relocations. Currently efi_arch_post_exit_boot() will do them just
*before* the switch. 

That's slightly less trivial than it sounds, as it would need to be
position-independent. But doesn't it basically already need to be, or
it would end up relocating itself while it's running?

(Hm, ick... the more I look at this, the more I wish my initial
response had been "la la la it was already broken it wasn't me..." :)

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 8/9] vpci: add a priority parameter to the vPCI register initializer

2017-08-02 Thread Jan Beulich

>>> Roger Pau Monne  06/30/17 5:02 PM >>>
>--- a/xen/drivers/vpci/header.c
>+++ b/xen/drivers/vpci/header.c
>@@ -459,7 +459,7 @@ static int vpci_init_bars(struct pci_dev *pdev)
>return 0;
>}
 >
>-REGISTER_VPCI_INIT(vpci_init_bars);
>+REGISTER_VPCI_INIT(vpci_init_bars, VPCI_PRIORITY_LOW);
 
Why "LOW"? I'd expect the BARs to possibly have further dependents, so
their init should be somewhere in the middle.

>--- a/xen/drivers/vpci/msi.c
>+++ b/xen/drivers/vpci/msi.c
>@@ -290,7 +290,7 @@ static int vpci_init_msi(struct pci_dev *pdev)
>return ret;
>}
 >
>-REGISTER_VPCI_INIT(vpci_init_msi);
>+REGISTER_VPCI_INIT(vpci_init_msi, VPCI_PRIORITY_LOW);
 
Whereas these indeed are unlikely to have further dependents.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/efi: Do not write relocations in efi_arch_relocate_image() first pass

2017-08-02 Thread Jan Beulich

>>> David Woodhouse  08/02/17 2:11 PM >>>
>On Wed, 2017-08-02 at 05:56 -0600, Jan Beulich wrote:
>> > > > David Woodhouse  08/02/17 1:30 PM >>>
>> > --- a/xen/arch/x86/efi/efi-boot.h
>> > +++ b/xen/arch/x86/efi/efi-boot.h
>> > @@ -87,7 +87,8 @@ static void __init efi_arch_relocate_image(unsigned long 
>> > delta)
>> > case PE_BASE_RELOC_DIR64:
>> > if ( in_page_tables(addr) )
>> > blexit(L"Unexpected relocation type");
>> > -*(u64 *)addr += delta;
>> > +if ( delta )
>> > +*(u64 *)addr += delta;
>> > break;
>> > default:
>> > blexit(L"Unsupported relocation type");
>>
>> While of course this and the other half of the necessary changes are
>> independent (i.e. this patch could be taken as is), I really had intended
>> to deal with both sides of this issue at once, and hence I'm not entirely
>> happy for this partial change to go in on its own. Nevertheless if need
>> be it can have my ack.
>
>I am, evidently, still being dense.
>
>This change is sufficient (we believe) to make EFI builds of Xen
>actually boot again on current EDK2, is it not?

It is safe / sufficient only with the specific behavior you've observed, i.e.
when permission restrictions are being removed during ExitBootServices().
I don't recall having seen any statement to that effect in the spec, and
even if there was such a statement surely we'd find a firmware vendor
who doesn't play by it.

>What is the "other half" of which you speak? You mentioned marking the
>section RWX — but for that to be a long-term solution to the issue,
>we'd basically have to ensure that we always mark *all* sections which
>might have relocations (even .rodata) as writeable, in case UEFI
>decides to honour their read-only status at some point in the future.

The other half is to preferably remove all (assembly) contributions from
sections ending up R or RX. In particular, just like the compiler does,
such .rodata pieces ought to be moved to .data.rel.ro; .init.text ones
would need to move to .init.data or .init.data.rel.ro. And ideally we'd have
link time verification that no relocations exist for R or RX sections ...

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [XenSummit 2017] Shared coprocessor framework followup

2017-08-02 Thread Andrii Anisov



On 02.08.17 15:58, Edgar E. Iglesias wrote:

Today it's an SMMUv2.

You would need to implement additional ops like [1].


I don't necessarily think so. The context-switch would involve saving and
restoring accelerator state aswell as re-programming the PL.
With allocate/release, we only need to re-program the PL.

Saving the state of the PL might be tricky since we don't know how to
(we don't know the details of the acceelerator ahead of time).
I guess we could somehow let the guest that owns the accellerator
save/restore the state somehow but perhaps that brings us back
in the direction of allocate/release semantics...
In this area context switch means to me a process to prepare a 
coprocessor to (start or continue) serving another domain tasks. It 
depends on a coprocessor nature and use-cases what the context switch 
physically means. And it is up to coprocessor platform driver how 
ctx_switch_from() and ctx_switch_to() are implemented [2]. Theoretically 
the framework should be platform agnostic.


BTW, the most up to date sources you can find here [3].

[1] 
https://github.com/xen-troops/xen/commit/a01f7ccf8bd5e9069f82c6ea6b92e2faca4920d9
[2] 
https://github.com/xen-troops/xen/blob/vgpu-dev/xen/arch/arm/coproc/coproc.h#L81

[3] https://github.com/xen-troops/xen/tree/vgpu-dev

--

*Andrii Anisov*



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH v1] tools/libxc: use superpages during restore of HVM guest

2017-08-02 Thread Olaf Hering

During creating of a HVM domU meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost, everything is
allocated in single pages. This causes a performance degradition after
migration.

Add neccessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering 
---

based on RELEASE-4.9.0


 tools/libxc/xc_sr_common.c  |  41 
 tools/libxc/xc_sr_common.h  |  79 +++-
 tools/libxc/xc_sr_restore.c | 135 +-
 tools/libxc/xc_sr_restore_x86_hvm.c | 183 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +-
 5 files changed, 376 insertions(+), 134 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 48fa676f4e..9b68a064eb 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -156,6 +156,47 @@ static void __attribute__((unused)) build_assertions(void)
 }
 
 /*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size increased to the nearest 
power
+ * of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+{
+size_t new_max;
+size_t old_sz, new_sz;
+void *p;
+
+/* Round up to the nearest power of two larger than bit, less 1. */
+new_max = bits;
+new_max |= new_max >> 1;
+new_max |= new_max >> 2;
+new_max |= new_max >> 4;
+new_max |= new_max >> 8;
+new_max |= new_max >> 16;
+#ifdef __x86_64__
+new_max |= new_max >> 32;
+#endif
+
+old_sz = bitmap_size(bm->bits + 1);
+new_sz = bitmap_size(new_max + 1);
+p = realloc(bm->p, new_sz);
+if (!p)
+return false;
+
+if (bm->p)
+memset(p + old_sz, 0, new_sz - old_sz);
+else
+memset(p, 0, new_sz);
+
+bm->p = p;
+bm->bits = new_max;
+}
+return true;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..ad1a2e6e02 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -140,6 +140,13 @@ struct xc_sr_restore_ops
 int (*setup)(struct xc_sr_context *ctx);
 
 /**
+ * Populate PFNs
+ *
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t 
*types);
+
+/**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
  *
@@ -172,6 +179,12 @@ struct xc_sr_x86_pv_restore_vcpu
 size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+void *p;
+unsigned long bits;
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -255,8 +268,7 @@ struct xc_sr_context
 domid_t  xenstore_domid,  console_domid;
 
 /* Bitmap of currently populated PFNs during restore. */
-unsigned long *populated_pfns;
-xen_pfn_t max_populated_pfn;
+struct xc_sr_bitmap populated_pfns;
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
@@ -331,6 +343,11 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
 } restore;
 };
 } x86_hvm;
@@ -343,6 +360,64 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long 
bits)
+{
+if (bits > bm->bits)
+return _xc_sr_bitmap_resize(bm, bits);
+return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+free(bm->p);
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (!xc_sr_bitmap_resize(bm, bit))
+return false;
+
+set_bit(bit, bm->p);
+return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bits)
+return false;
+return !!test_bit(bit, bm->p);
+}
+
+static inline

Re: [Xen-devel] [PATCH v4 7/9] vpci/msi: add MSI handlers

2017-08-02 Thread Jan Beulich

>>> Roger Pau Monne  06/30/17 5:01 PM >>>
>Add handlers for the MSI control, address, data and mask fields in
>order to detect accesses to them and setup the interrupts as requested
>by the guest.
>
>Note that the pending register is not trapped, and the guest can
>freely read/write to it.
>
>Whether Xen is going to provide this functionality to Dom0 (MSI
>emulation) is controlled by the "msi" option in the dom0 field. When
>disabling this option Xen will hide the MSI capability structure from
>Dom0.

Isn't this last paragraph stale now?

>+void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,
>+unsigned int entry, bool mask)
>+{
>+struct domain *d = pdev->domain;
>+const struct pirq *pinfo;
>+struct irq_desc *desc;
>+unsigned long flags;
>+int irq;
>+
>+ASSERT(arch->pirq >= 0);
>+pinfo = pirq_info(d, arch->pirq + entry);
>+ASSERT(pinfo);
>+
>+irq = pinfo->arch.irq;
>+ASSERT(irq < nr_irqs && irq >= 0);
>+
>+desc = irq_to_desc(irq);
>+ASSERT(desc);

I know the goal is Dom0 support only at this point, but nevertheless I think
we shouldn't have ASSERT()s in place which could trigger if Dom0
misbehaves (and which would all need to be audited if we were to extend
support to DomU): I'm not convinced all of the ones above could really only
trigger depending on Xen (mis)behavior.

>+int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
>+ uint64_t address, uint32_t data, unsigned int 
>vectors)
>+{
>+struct msi_info msi_info = {
>+.seg = pdev->seg,
>+.bus = pdev->bus,
>+.devfn = pdev->devfn,
>+.entry_nr = vectors,
>+};
>+unsigned int i;
>+int rc;
>+
>+ASSERT(arch->pirq == -1);

Please introduce a #define for the -1 here, to allow easily matching up
producer and consumer side(s).

>+/* Get a PIRQ. */
>+rc = allocate_and_map_msi_pirq(pdev->domain, -1, >pirq,
>+   MAP_PIRQ_TYPE_MULTI_MSI, _info);
>+if ( rc )
>+{
>+dprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
>+pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>+PCI_FUNC(pdev->devfn), rc);
>+return rc;
>+}
>+
>+for ( i = 0; i < vectors; i++ )
>+{
>+xen_domctl_bind_pt_irq_t bind = {
>+.machine_irq = arch->pirq + i,
>+.irq_type = PT_IRQ_TYPE_MSI,
>+.u.msi.gvec = msi_vector(data) + i,
>+.u.msi.gflags = msi_flags(data, address),
>+};
>+
>+pcidevs_lock();
>+rc = pt_irq_create_bind(pdev->domain, );
>+if ( rc )
>+{
>+dprintk(XENLOG_ERR,
>+"%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
>+pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>+PCI_FUNC(pdev->devfn), arch->pirq + i, rc);
>+spin_lock(>domain->event_lock);
>+unmap_domain_pirq(pdev->domain, arch->pirq);

Don't you also need to undo the pt_irq_create_bind() calls here for all prior
successful iterations?

>+int vpci_msi_arch_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
>+  unsigned int vectors)
>+{
>+unsigned int i;
>+
>+ASSERT(arch->pirq != -1);
>+
>+for ( i = 0; i < vectors; i++ )
>+{
>+xen_domctl_bind_pt_irq_t bind = {
>+.machine_irq = arch->pirq + i,
>+.irq_type = PT_IRQ_TYPE_MSI,
>+};
>+
>+pcidevs_lock();
>+pt_irq_destroy_bind(pdev->domain, );

While I agree that the loop should continue of this fails, I'm not convinced
you should entirely ignore the return value here.

>+pcidevs_unlock();
>+}
>+
>+pcidevs_lock();

What good does it do to acquire the lock for most of the loop body as well
as for most of the epilogue, instead of just acquiring it once ahead of the
loop?

>+int vpci_msi_arch_init(struct vpci_arch_msi *arch)
>+{
>+arch->pirq = -1;
>+return 0;
>+}

At this point I think the function would better return void.

>+void vpci_msi_arch_print(struct vpci_arch_msi *arch, uint16_t data,

const

>+ uint64_t addr)
>+{
>+printk("vec=%#02x%7s%6s%3sassert%5s%7s dest_id=%lu pirq: %d\n",
>+   MASK_EXTR(data, MSI_DATA_VECTOR_MASK),
>+   data & MSI_DATA_DELIVERY_LOWPRI ? "lowest" : "fixed",
>+   data & MSI_DATA_TRIGGER_LEVEL ? "level" : "edge",
>+   data & MSI_DATA_LEVEL_ASSERT ? "" : "de",
>+   addr & MSI_ADDR_DESTMODE_LOGIC ? "log" : "phys",
>+   addr & MSI_ADDR_REDIRECTION_LOWPRI ? "lowest" : "cpu",

Why "cpu"? Elsewhere we call this mode "fixed".

>--- /dev/null
>+++ b/xen/drivers/vpci/msi.c
>@@ -0,0 +1,348 @@
>+/*
>+ * Handlers for accesses to the MSI capability structure.
>+ *
>+ * Copyright (C) 2017 Citrix Systems R
>+ *
>+ * This program is free software; you can redistribute it and/or
>+

Re: [Xen-devel] xen/arm: Software Step ARMv8 - PC stuck on instruction

2017-08-02 Thread Julien Grall


Hi Florian,

Sorry for the late answer.

On 26/07/17 14:12, Florian Jakobsmeier wrote:

Hello,

i was just testing the single step implementation and realized that the
before mentioned solution is not fully working. I'm still trying to
enable SS for a VM on Xen.


Would you mind sharing the latest version of your code?


To test my implementation i wrote a small Kernel Module and started it
in the DomU. The module only contains a loop which increments a counter
and prints its value.
Right after loading the moduleI start the single step mechanism in the
Dom0 for the VM (again with xen-access).
As soon as i start the SS the VM will stop working.

In the SS handler i print the "cpu_user_regs->pc" program counter. From
there i can see, that each instruction address is used twice: (as it
generates the following outputs)

(XEN) d1v0 do_trap_software_step PC =  0x08081a80
(XEN) d1v0 do_trap_software_step PC =  0x08081a80
(XEN) d1v0 do_trap_software_step PC =  0x08082700
(XEN) d1v0 do_trap_software_step PC =  0x08082700
(XEN) d1v0 do_trap_software_step PC =  0x08082704
(XEN) d1v0 do_trap_software_step PC =  0x08082704
(XEN) d1v0 do_trap_software_step PC =  0x08082708
(XEN) printk: 119614 messages suppressed.
(XEN) d1v0 do_trap_software_step PC =  0x088cbd6c
(XEN) printk: 120131 messages suppressed.
(XEN) d1v0 do_trap_software_step PC =  0x088cbd64
(XEN) printk: 120255 messages suppressed.
(XEN) d1v0 do_trap_software_step PC =  0x088cbd64


Did you look where the PCs point to? Is it your kernel module?




The single step handler "do_trap_software_step" is called from (file is
/arch/arm/arm64/entry.S): hyp_traps_vector
(VBAR_EL2)->guest_sync->do_trap_guest_sync->do_trap_software_step

The ARM ARM (D2-1956 - ARM DDI 0487B.a ID033117) states that, in order
to enables software step:

A debugger enables MDSCR_EL1.SS = 1

Executes an ERET


The PE executes the instruction to be single-stepped

Takes a software step exception on the next instruction



As mentioned I set the needed registers (including MDSCR_EL1) every time
when the "leave_hypervisor_tail" function is called. This function will
called from within the "exit" macro in "/arch/arm/arm64/entry.S" which
is called after every exception return. Including the "guest_sync"
exception.

Right after the "leave_hypervisor_tail" the ERET instruction will also
be called within the "return_from_trap" macro.

Because of the prints in the single step handler I can assure that the
software step exceptions are executed and correctly routed to the
hypervisor.
Yet I can't figure out why the PC got the same value twice and why the
VM will stop working.

My guess is that by setting the needed SS registers ever time when we
leave the guest, the configuration won't allow the guest to execute the
"to be single stepped instruction"
Before executing the (first) instruction the VM will generate the SS
exception (as desired). In the hypervisor we will set the SS registers
again, which could hinder the VM to execute the instruction (which we
want because we already generated an SS exception for this instruction)
and instead generate a second SS exception for it. This will lead to the
second PC print in the single step handler

But I'm not able to find any proof for this.

If I'm using the software step exception for only one instruction and
disable it right after it (from within xen-access with an VM_EVENT) the
VM will work without problems.

Any help to find the missing step in order to enable VM single stepping
would be appreciated

Greetings Florian


2017-07-05 16:03 GMT+02:00 Florian Jakobsmeier
>:


2017-07-04 20:37 GMT+02:00 Julien Grall >:


On 07/04/2017 01:30 PM, Florian Jakobsmeier wrote:

Hello all,


Hi Florian,


  asmlinkage void leave_hypervisor_tail(void)
  {
+/*This methode will be called after the
'guest_entry' macro in
/arch/arm64/entry.S set guest registers
+Check single_step_enabled flag in domain struct
here and set
needed registers
+
+*/
+
+struct vcpu *v = current;
+
+if (
unlikely(v->domain->arch.monitor.singlestep_enabled ) )
+{
+
+WRITE_SYSREG(READ_SYSREG(MDCR_EL2)  | HDCR_TDE,
MDCR_EL2);
+WRITE_SYSREG(READ_SYSREG(SPSR_EL2)  | 0x20,
SPSR_EL2 );
+WRITE_SYSREG(READ_SYSREG(MDSCR_EL1) | 0x1,
MDSCR_EL1);
+
+if

Re: [Xen-devel] [PATCH v2] x86/hvm: Allow guest_request vm_events coming from userspace

2017-08-02 Thread Razvan Cojocaru




On 01.08.2017 19:07, Tamas K Lengyel wrote:

On Tue, Aug 1, 2017 at 4:30 AM, Andrew Cooper  wrote:

On 01/08/17 10:46, Alexandru Isaila wrote:

Allow guest userspace code to request that a vm_event be sent out
via VMCALL. This functionality seems to be handy for a number of
Xen developers, as stated on the mailing list (thread "[Xen-devel]
HVMOP_guest_request_vm_event only works from guest in ring0").
This is a use case in communication between a userspace application
in the guest and the introspection application in dom0.

Signed-off-by: Alexandru Isaila 


This issue has been argued several times before, and while I am in
favour of the change, there is a legitimate argument that it breaks one
of our security boundaries.

One intermediate option comes to mind however.

Could we introduce a new monitor op which permits the use of
HVMOP_guest_request_vm_event from userspace?  This way, it requires a
positive action on behalf of the introspection agent to relax the CPL
check, rather than having the CPL check unconditionally relaxed.


I agree, it would be required to gate this on a monitor option that is
disabled by default.


That's the way we'll move forward. About that: would you prefer a new 
libxc function that toggles only the userspace part, or that we add a 
bool parameter to the existing xc_monitor_guest_request()?



Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [XenSummit 2017] Shared coprocessor framework followup

2017-08-02 Thread Edgar E. Iglesias

On Wed, Aug 02, 2017 at 02:07:17PM +0300, Andrii Anisov wrote:
> Hello Edgar,

Hi Andrii,

> 
> 
> On 01.08.17 20:13, Edgar E. Iglesias wrote:
> >>Are master ports behind IOMMU?
> >Yes, they are.
> What IOMMU IP is used?

Today it's an SMMUv2.


> 
> >>>It's possible to reprogram the configuration of the PL and swap 
> >>>accelerators in
> >>>and out on the fly. It's probably going to be too slow for trying to
> >>>context switch between guests
> >>So let us assume it is a FW-less resource we need to share between domains.
> >>Context switch will be stripped to mapping its mmio (or passing mmio
> >>accesses) next domain to serve and IOMMU configuration switching.
> >>Yep, IOMMU matters.
> >OK. I think the PL is more like a firmware enabled resource.
> >The PL configuration needs to be loaded much like firmware
> >otherwise the accelerators can't change shape and all guests
> >must use the same kind.
> I understand this.
> But I got your words like you are going to give a try to the same kind for
> all domains first. Because you assumed that reconfiguring would be too slow,
> what is actually discussable.

Aha, OK. What I meant was that it may be to slow for context-switching
at a micro-level. But with an allocate/release interface for batch
processing, I don't think it's to slow to reprogram the PL between
guests.

I agree that we need hard numbers on the PL programming before we rule
things out. I'll try to dig internally for some.


> >>>  so I think primarily we would be looking at
> >>>a way to lets say, "allocate" and "release" the resources for batch use.
> >>Kind of voluntary preemption?
> >Right. That could be a start.
> >In the future perhaps it makes sense to context-switch.
> We still need the context switch to be done. The difference is that now it
> could be done only when the accelerator is not busy.

I don't necessarily think so. The context-switch would involve saving and
restoring accelerator state aswell as re-programming the PL.
With allocate/release, we only need to re-program the PL.

Saving the state of the PL might be tricky since we don't know how to
(we don't know the details of the acceelerator ahead of time).
I guess we could somehow let the guest that owns the accellerator
save/restore the state somehow but perhaps that brings us back
in the direction of allocate/release semantics...


> >>>If a guest cannot allocate an accelerator, it could fall back to emulation
> >>>or just to using SW libraries until an accelerator slot is available.
> >>What about the thing I called "an access emulation" [1]? From the domain's
> >>point of view it would be reflected in a delayed response (via IRQ or
> >>register polling) from an accelerator.
> >>
> >>I guess the concept described above is feasible even with current SCF code
> >>and will not take too much efforts.
> >I'll have a look, thanks!
> Do not hesitate to contact us in case you need any help or clarification.

Thanks!
Edgar

> 
> 
> -- 
> 
> *Andrii Anisov*
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v15 13/23] x86: refactor psr: CDP: implement CPU init flow.

2017-08-02 Thread Jan Beulich

>>> Yi Sun  08/01/17 11:04 AM >>>
>@@ -1278,15 +1339,31 @@ static void psr_cpu_init(void)
>cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 0, ®s);
>if ( regs.b & PSR_RESOURCE_TYPE_L3 )
>{
>+bool do_l3_cat_init = true;
>+
>cpuid_count_leaf(PSR_CPUID_LEVEL_CAT, 1, ®s);
 >
>feat = feat_l3;
>feat_l3 = NULL;
 >
>-if ( !cat_init_feature(®s, feat, info, FEAT_TYPE_L3_CAT) )
>-feat_props[FEAT_TYPE_L3_CAT] = _cat_props;
>-else
>-feat_l3 = feat;
>+if ( (regs.c & PSR_CAT_CDP_CAPABILITY) && (opt_psr & PSR_CDP) )
>+{
>+/* If CDP init fails, try to work as L3 CAT. */
>+if ( !cat_init_feature(®s, feat, info, FEAT_TYPE_L3_CDP) )
>+{
>+feat_props[FEAT_TYPE_L3_CDP] = _cdp_props;
>+/* CDP init succeeds, no need to do L3 CAT init. */
>+do_l3_cat_init = false;
>+}
>+}

The comment ahead of the inner if() now really describes the (implicit)
else case. That's somewhat misleading. How about putting feat back
into feat_l3 in an actual "else", and using that at once instead of the
somewhat clumsily named "do_l3_cat_init" local variable? That would
additionally avoid the need for me to ask you to fold the two if()s. Plus
the resulting code would become more similar ...

>+if ( do_l3_cat_init )
>+{
>+if ( !cat_init_feature(®s, feat, info, FEAT_TYPE_L3_CAT) )
>+feat_props[FEAT_TYPE_L3_CAT] = _cat_props;
>+else
>+feat_l3 = feat;
>+}

... to this.

And btw, to avoid spamming the list with another full version of this
series, I'd be fine with you just submitting v15.1 of this one patch.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/efi: Do not write relocations in efi_arch_relocate_image() first pass

2017-08-02 Thread David Woodhouse

On Wed, 2017-08-02 at 05:56 -0600, Jan Beulich wrote:
> > 
> > > 
> > > > 
> > > > David Woodhouse  08/02/17 1:30 PM >>>
> > --- a/xen/arch/x86/efi/efi-boot.h
> > +++ b/xen/arch/x86/efi/efi-boot.h
> > @@ -87,7 +87,8 @@ static void __init efi_arch_relocate_image(unsigned long 
> > delta)
> > case PE_BASE_RELOC_DIR64:
> > if ( in_page_tables(addr) )
> > blexit(L"Unexpected relocation type");
> > -*(u64 *)addr += delta;
> > +if ( delta )
> > +*(u64 *)addr += delta;
> > break;
> > default:
> > blexit(L"Unsupported relocation type");
>
> While of course this and the other half of the necessary changes are
> independent (i.e. this patch could be taken as is), I really had intended
> to deal with both sides of this issue at once, and hence I'm not entirely
> happy for this partial change to go in on its own. Nevertheless if need
> be it can have my ack.

I am, evidently, still being dense.

This change is sufficient (we believe) to make EFI builds of Xen
actually boot again on current EDK2, is it not?

What is the "other half" of which you speak? You mentioned marking the
section RWX — but for that to be a long-term solution to the issue,
we'd basically have to ensure that we always mark *all* sections which
might have relocations (even .rodata) as writeable, in case UEFI
decides to honour their read-only status at some point in the future.

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] Xen 4.9 + kernel 4.13rc2 -- ballooning regression? reappearance of "Over-allocation for domain 1" errors

2017-08-02 Thread Juergen Gross

On 01/08/17 16:28, PGNet Dev wrote:
> On 7/28/17 9:02 AM, PGNet Dev wrote:
>> On 7/27/17 11:23 PM, Juergen Gross wrote:
>>> Can you please post the domain's config file used to create the domain
>>> and the kernel config?
>>
>> Sure.
>>
>>https://pastebin.com/M6cr2pX7
>>
> 
> Any add'l info needed?

No, I don't think so.

IMO the problem is related to the fact that the balloon driver tries to
use then kernel's view of how much memory it is owning and setting this
number in relation to Xen's view how much memory it should try to have.

Maybe before adding memory from Xen the kernel should ask the hypervisor
how much memory it has already from Xen's point of view and how much it
is allowed to have. This will avoid the messages you have seen as long
as there are no interfering actions from Xen (e.g. lowering the maximum
reservation) while the kernel is trying to balloon up.

Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] x86/efi: Do not write relocations in efi_arch_relocate_image() first pass

2017-08-02 Thread Jan Beulich

>>> David Woodhouse  08/02/17 1:30 PM >>>
>--- a/xen/arch/x86/efi/efi-boot.h
>+++ b/xen/arch/x86/efi/efi-boot.h
>@@ -87,7 +87,8 @@ static void __init efi_arch_relocate_image(unsigned long 
>delta)
>case PE_BASE_RELOC_DIR64:
>if ( in_page_tables(addr) )
>blexit(L"Unexpected relocation type");
>-*(u64 *)addr += delta;
>+if ( delta )
>+*(u64 *)addr += delta;
>break;
>default:
>blexit(L"Unsupported relocation type");

While of course this and the other half of the necessary changes are
independent (i.e. this patch could be taken as is), I really had intended
to deal with both sides of this issue at once, and hence I'm not entirely
happy for this partial change to go in on its own. Nevertheless if need
be it can have my ack.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST v3 8/8] sg-run-job: hook the memdisk test into examine

2017-08-02 Thread Roger Pau Monne

Hook the memdisk parameter detection and the saving of the host
properties into the examine jobs.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Do not pass a host ident to ts-examine-hostprops-save.
 - Use .- for ts-memdisk-try-append so that the rest of the job will
   run even if this step fails.

Changes since v1:
 - Run the memdisk test first (so that we don't leave the host in a
   weird state).
 - Pass a host to the examine-hostprops-save.
---
 sg-run-job | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sg-run-job b/sg-run-job
index ed1ed3c8..de6e3f76 100755
--- a/sg-run-job
+++ b/sg-run-job
@@ -655,6 +655,7 @@ proc examine-host-install-xen {} {
 proc examine-host-examine {install} {
 global ok
 catching-otherwise fail {
+   run-ts -.  =ts-memdisk-try-append + host
examine-host-install-$install
run-ts .   =ts-examine-serial-pre + host
run-ts .   reboot   ts-host-reboot+ host
@@ -663,6 +664,7 @@ proc examine-host-examine {install} {
 if {$ok} {
run-ts -.  =   ts-examine-serial-post + host
run-ts .   =   ts-examine-logs-save   + host
+   run-ts .   =   ts-examine-hostprops-save
 }
 }
 
-- 
2.11.0 (Apple Git-81)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST v3 4/8] ts-freebsd-host-install: add arguments to test memdisk append options

2017-08-02 Thread Roger Pau Monne

This is needed in order to figure out which memdisk options should be
used to boot the images on each specific box.

Note that when passed the --recordappend argument upon success the
script stores the tentative host property in the runvars.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Fix commit message.

Changes since v1:
 - Provide a --recordappend argument to force the recording the
   memdisk parameters.
 - Exit gracefully if a bootonly test is attempted against a
   non-supported architecture.
 - Use NONE instead of an empty string when calling
   setup_netboot_memdisk if nothing should be appended.
 - Do not perform any arch test in ts-freebsd-host-install.
---
 ts-freebsd-host-install | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/ts-freebsd-host-install b/ts-freebsd-host-install
index 483b9aec..50af5dd3 100755
--- a/ts-freebsd-host-install
+++ b/ts-freebsd-host-install
@@ -41,6 +41,23 @@ use Osstest::TestSupport;
 
 tsreadconfig();
 
+our $bootonly;
+our $memdisk_append;
+our $record_append;
+while (@ARGV && $ARGV[0] =~ m/^-/g) {
+if ($ARGV[0] =~ m/^--memdiskappend=(.*)/) {
+$memdisk_append = $1;
+} elsif ($ARGV[0] eq "--testboot") {
+$memdisk_append //= "NONE";
+$bootonly = 1;
+} elsif ($ARGV[0] eq "--recordappend") {
+$record_append = 1;
+} else {
+die "Unknown argument $ARGV[0]";
+}
+shift @ARGV;
+}
+
 our ($whhost) = @ARGV;
 $whhost ||= 'host';
 our $ho= selecthost($whhost);
@@ -95,7 +112,7 @@ END
 
 # Setup the pxelinux config file
 logm("Booting from installer image at $pxeimg");
-setup_netboot_memdisk($ho, $pxeimg);
+setup_netboot_memdisk($ho, $pxeimg, $memdisk_append);
 }
 
 sub install () {
@@ -247,6 +264,12 @@ power_state($ho, 1);
 logm("Waiting for the installer to boot");
 await_tcp(get_timeout($ho,'reboot',$timeout), 5, $ho);
 
+if ($bootonly) {
+hostprop_putative_record($ho, "MemdiskAppend", $memdisk_append)
+if $record_append;
+exit 0;
+}
+
 # Next boot will be from local disk
 setup_netboot_local($ho);
 
-- 
2.11.0 (Apple Git-81)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST v3 0/8] Add support to examine the needed memdisk flags for each hos

2017-08-02 Thread Roger Pau Monne

Hello,

This is the remaining memdisk examine series (ie: starting with the
first non-acked patch) and it's rebased on top of the previous FreeBSD
host install series.

The series expands the examine flight in order to test which memdisk
options should be used for each host. Hopefully all of this will be
automatic upon running a examine flight. The required options are
detected by ts-memdisk-try-append and stored into the database by
ts-examine-hostprops-save.

This has been tested except for ts-examine-hostprops-save because I
haven't run it in a flight with "real" blessing.

The series can be found at:

git://xenbits.xen.org/people/royger/osstest.git freebsd_v11

Which as said is already rebased on top of the previous FreeBSD
series.

Note that before running the FreeBSD branch flights an examine flight
with "real" intended blessing must be run, so that the proper memdisk
append options are set for each host. After that FreeBSD flights can
be run as normal.

The output of an examine flight with this series can be found at:

http://osstest.xs.citrite.net/~osstest/testlogs/logs/71931/

List of patches:

A = Acked

  1/8 HostDB: introduce set_property
A 2/8 mfi-common: move set_freebsd_runvars to
A 3/8 TestSupport: introduce
  4/8 ts-freebsd-host-install: add arguments to test
A 5/8 ts-memdisk-try-append: introduce a script to
  6/8 ts-examine-hostprops-save: introduce a script
A 7/8 make-hosts-flight: set runvars for FreeBSD
  8/8 sg-run-job: hook the memdisk test into examine

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST v3 6/8] ts-examine-hostprops-save: introduce a script to save properties

2017-08-02 Thread Roger Pau Monne

This script turns the properties stored in the runvars using the
format hostprop/$ident/$prop=$val into host properties stored in the
database.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Call selecthost based on the idents passed in the putative
   hostprops runvar.
 - Fix commit message.
 - Use '/' instead of '_' in the runvars.
 - Do a dry run if flight blessing != real.
 - Fix parentheses indentation.

Changes since v1:
 - Select a host for setting the properties.
 - Print a message before exiting if blessing != real.
 - Skip properties that don't contain the selected host.
---
 ts-examine-hostprops-save | 46 ++
 1 file changed, 46 insertions(+)
 create mode 100755 ts-examine-hostprops-save

diff --git a/ts-examine-hostprops-save b/ts-examine-hostprops-save
new file mode 100755
index ..99a4627b
--- /dev/null
+++ b/ts-examine-hostprops-save
@@ -0,0 +1,46 @@
+#!/usr/bin/perl -w
+# This is part of "osstest", an automated testing framework for Xen.
+# Copyright (C) 2017 Citrix Inc.
+# 
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+# 
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see .
+
+use strict qw(vars);
+use DBI;
+use POSIX;
+
+unshift @INC, qw(.);
+use Osstest;
+use Osstest::TestSupport;
+
+tsreadconfig();
+
+our $blessing = intended_blessing();
+
+logm("setting host properties");
+
+foreach my $k (sort keys %r) {
+next unless $k =~ m/^hostprop\/([^\/]*)\/([^\/]*)$/;
+my $ho = selecthost($1);
+my $prop = $2;
+
+logm("recording for $ho->{Name} $prop=$r{$k}");
+
+# NB: in order to aid debug only attempt to save the host props
+# on flights with real blessing, for the rest just do a dry run.
+if ($blessing eq "real") {
+set_host_property($ho, $prop, $r{$k});
+} else {
+logm("not saving host prop with blessing $blessing != real");
+}
+}
-- 
2.11.0 (Apple Git-81)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST v3 2/8] mfi-common: move set_freebsd_runvars to mfi-common

2017-08-02 Thread Roger Pau Monne

So that it can also be used by make-hosts-flight. No functional change
intended.

Signed-off-by: Roger Pau Monné 
Acked-by: Ian Jackson 
---
 make-freebsd-flight | 31 ---
 mfi-common  | 31 +++
 2 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/make-freebsd-flight b/make-freebsd-flight
index 64dfe9a6..72695742 100755
--- a/make-freebsd-flight
+++ b/make-freebsd-flight
@@ -36,37 +36,6 @@ job_create_build_filter_callback () {
 :
 }
 
-set_freebsd_runvars () {
-# Caller should have done if required:
-# local freebsd_runvars
-#
-# Figure out where are the installer binaries. The order is the
-# following:
-#
-# 1. Env variable FREEBSD__BUILDJOB: use the output from a
-# previous build--freebsd.
-#
-# 2. Env variables FREEBSD_DIST, FREEBSD_VERSION: set before calling
-# into make-flight, provide the path to the installer image, the sets
-# to install and the version being installed.
-#
-# 3. Config file FreeBSDDist, FreeBSDVersion: same as 2. except that
-# they are set on the config file.
-#
-envvar="FREEBSD_${arch^^}_BUILDJOB"
-if [ -n "${!envvar}" ]; then
-freebsd_runvars="freebsdbuildjob=${!envvar}"
-elif [ -n "$FREEBSD_DIST" ] && [ -n "$FREEBSD_VERSION" ]; then
-freebsd_runvars="freebsd_distpath=$FREEBSD_DIST/$arch \
- freebsd_version=$FREEBSD_VERSION"
-else
-distpath=`getconfig "FreeBSDDist"`
-version=`getconfig "FreeBSDVersion"`
-freebsd_runvars="freebsd_distpath=$distpath/$arch \
- freebsd_version=$version"
-fi
-}
-
 for arch in "$arches"; do
 set_freebsd_runvars
 job_create_build build-$arch-freebsd build-freebsd  \
diff --git a/mfi-common b/mfi-common
index 4827c827..8a9546ab 100644
--- a/mfi-common
+++ b/mfi-common
@@ -113,6 +113,37 @@ set_hostos_runvars () {
   esac
 }
 
+set_freebsd_runvars () {
+# Caller should have done if required:
+# local freebsd_runvars
+#
+# Figure out where are the installer binaries. The order is the
+# following:
+#
+# 1. Env variable FREEBSD__BUILDJOB: use the output from a
+# previous build--freebsd.
+#
+# 2. Env variables FREEBSD_DIST, FREEBSD_VERSION: set before calling
+# into make-flight, provide the path to the installer image, the sets
+# to install and the version being installed.
+#
+# 3. Config file FreeBSDDist, FreeBSDVersion: same as 2. except that
+# they are set on the config file.
+#
+local envvar="FREEBSD_${arch^^}_BUILDJOB"
+if [ -n "${!envvar}" ]; then
+freebsd_runvars="freebsdbuildjob=${!envvar}"
+elif [ -n "$FREEBSD_DIST" ] && [ -n "$FREEBSD_VERSION" ]; then
+freebsd_runvars="freebsd_distpath=$FREEBSD_DIST/$arch \
+ freebsd_version=$FREEBSD_VERSION"
+else
+local distpath=`getconfig "FreeBSDDist"`
+local version=`getconfig "FreeBSDVersion"`
+freebsd_runvars="freebsd_distpath=$distpath/$arch \
+ freebsd_version=$version"
+fi
+}
+
 create_build_jobs () {
 
   local arch
-- 
2.11.0 (Apple Git-81)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST v3 3/8] TestSupport: introduce hostprop_putative_record

2017-08-02 Thread Roger Pau Monne

This is used to store tentative host properties in the runvars of a
job, with the expectation that at some point (ie: at the end of the
job) they will be turned into real properties stored in the database.

Signed-off-by: Roger Pau Monné 
Acked-by: Ian Jackson 
---
Changes since v2:
 - Use the following runvar format to store the putative host props:
   hostprop/$ident/$prop=$val.
---
 Osstest/TestSupport.pm | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 27b2342c..a5cca391 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -82,6 +82,7 @@ BEGIN {
   power_state power_cycle power_cycle_sleep
   serial_fetch_logs set_host_property
   propname_massage propname_check
+  hostprop_putative_record
  
   get_stashed open_unique_stashfile compress_stashed
   dir_identify_vcs
@@ -1189,6 +1190,12 @@ sub set_host_property ($$$) {
 $mhostdb->set_property($ho, $prop, $val);
 }
 
+sub hostprop_putative_record ($$$) {
+my ($ho, $prop, $val) = @_;
+
+store_runvar("hostprop/$ho->{Ident}/$prop", $val);
+}
+
 sub get_target_property ($$;$);
 sub get_target_property ($$;$) {
 my ($ho, $prop, $defval) = @_;
-- 
2.11.0 (Apple Git-81)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST v3 7/8] make-hosts-flight: set runvars for FreeBSD test

2017-08-02 Thread Roger Pau Monne

This is needed in order to run the memdisk test.

Signed-off-by: Roger Pau Monné 
Acked-by: Ian Jackson 
---
 make-hosts-flight | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/make-hosts-flight b/make-hosts-flight
index 0152dfe1..d5670857 100755
--- a/make-hosts-flight
+++ b/make-hosts-flight
@@ -69,10 +69,13 @@ hosts_iterate () {
 case $kern in
   xen|linux)
 local di_version=`getconfig_TftpDiVersion_suite $suite`
+local freebsd_runvars
+set_freebsd_runvars
 runvars+=" 
kernkind=pvops
all_host_di_version=$di_version
all_host_suite=$suite
+   $freebsd_runvars
  "
 ;;
 esac
-- 
2.11.0 (Apple Git-81)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST v3 1/8] HostDB: introduce set_property

2017-08-02 Thread Roger Pau Monne

And provide a helper in TestSupport to use it. This allows osstest to
set host properties from test script themselves (instead of using
the mg-hosts clu).

Note that the setting of host properties is limited to flights with
intended blessing real, and it will fail for any other blessing.

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Die if attempting to modify a host prop with intended blessing !=
   real.
---
 Osstest/HostDB/Executive.pm | 23 +++
 Osstest/HostDB/Static.pm|  7 +++
 Osstest/TestSupport.pm  |  8 +++-
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/Osstest/HostDB/Executive.pm b/Osstest/HostDB/Executive.pm
index 300178bb..2a961b6d 100644
--- a/Osstest/HostDB/Executive.pm
+++ b/Osstest/HostDB/Executive.pm
@@ -51,6 +51,29 @@ END
 }
 }
 
+sub set_property() {
+my ($hd, $ho, $prop, $val) = @_;
+my $rmq = $dbh_tests->prepare(<prepare(<execute($ho->{Name}, $prop);
+if (length $val) {
+$addq->execute($ho->{Name}, $prop, $val);
+   }
+});
+}
+
 sub get_flags ($$) {
 my ($hd, $ho) = @_;
 
diff --git a/Osstest/HostDB/Static.pm b/Osstest/HostDB/Static.pm
index 60f5d3c2..3191c565 100644
--- a/Osstest/HostDB/Static.pm
+++ b/Osstest/HostDB/Static.pm
@@ -40,6 +40,13 @@ sub get_properties ($$$) { #method
 my ($hd, $name, $hp) = @_;
 }
 
+sub set_property() {
+my ($hd, $ho, $prop, $val) = @_;
+
+die
+"Cannot set property in standalone mode for $ho->{Name} $prop = $val\n";
+}
+
 sub get_flags ($$) { #method
 my ($hd, $ho) = @_;
 
diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm
index 0af5..27b2342c 100644
--- a/Osstest/TestSupport.pm
+++ b/Osstest/TestSupport.pm
@@ -80,7 +80,7 @@ BEGIN {
   get_target_property get_host_native_linux_console
   hostnamepath hostnamepath_list set_runtime_hostflag
   power_state power_cycle power_cycle_sleep
-  serial_fetch_logs
+  serial_fetch_logs set_host_property
   propname_massage propname_check
  
   get_stashed open_unique_stashfile compress_stashed
@@ -1183,6 +1183,12 @@ sub get_host_property ($$;$) {
 return defined($val) ? $val : $defval;
 }
 
+sub set_host_property ($$$) {
+my ($ho,$prop,$val) = @_;
+
+$mhostdb->set_property($ho, $prop, $val);
+}
+
 sub get_target_property ($$;$);
 sub get_target_property ($$;$) {
 my ($ho, $prop, $defval) = @_;
-- 
2.11.0 (Apple Git-81)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST v3 5/8] ts-memdisk-try-append: introduce a script to test memdisk options

2017-08-02 Thread Roger Pau Monne

The intended usage is to run this script against every host in order
to record the possible needed memdisk flags.

Signed-off-by: Roger Pau Monné 
Acked-by: Ian Jackson 
---
Changes since v1:
 - Get the arch of the job and exit with 0 if it's not supported.
 - Pass the --recordappend argument to ts-memdisk-try-append.
---
 ts-memdisk-try-append | 45 +
 1 file changed, 45 insertions(+)
 create mode 100755 ts-memdisk-try-append

diff --git a/ts-memdisk-try-append b/ts-memdisk-try-append
new file mode 100755
index ..86c6ee41
--- /dev/null
+++ b/ts-memdisk-try-append
@@ -0,0 +1,45 @@
+#!/bin/bash
+
+# This is part of "osstest", an automated testing framework for Xen.
+# Copyright (C) 2017 Citrix Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see .
+
+set -xe -o posix
+
+arch=`perl -e '
+use Osstest;
+use Osstest::TestSupport;
+
+tsreadconfig();
+print $r{"arch"} or die $!;
+  '`
+
+case "$arch" in
+amd64)
+;;
+*)
+echo "Arch $arch not supported for memdisk tests"
+exit 0
+;;
+esac
+
+if ./ts-freebsd-host-install --testboot --recordappend $@; then
+exit 0
+elif ./ts-freebsd-host-install --testboot --recordappend \
+   --memdiskappend="raw" $@; then
+exit 0
+fi
+
+exit 1
-- 
2.11.0 (Apple Git-81)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] xen/link: Move .data.rel.ro sections into .rodata for final link

2017-08-02 Thread David Woodhouse

On Wed, 2017-08-02 at 03:17 -0600, Jan Beulich wrote:
> 
> >... but then again, if the whole function is really doing nothing at
> >all when invoked with delta==0 then perhaps it would just be easier to
> >remove the first pass altogether. I feel sure I'm missing something,
> 
> The reason is even visible in context above: We can't call blexit() after
> having called ExitBootServices(). Hence we need a "dry run" done early,
> to be certain we won't encounter unhandlable relocations later on.

Ah, thanks. I'd glossed over that as a "can never happen" condition.
And the 'default:' case really is that — the build system is broken if
we ever see that. But the DIR64 / in_page_tables() one is less provably
impossible.

> >Either way, this is still broken before my patch though, right?
> 
> As long as there are .rodata entries needing relocation (which I
> understand you've found example of), yes.

And more to the point, .init.text entries needing relocation (since
UEFI isn't marking read-only *data* sections read-only yet for some
reason; only R+X sections).

But still that basically means that new versions of UEFI are going to
stop booting all existing EFI Xen images, doesn't it? Perhaps we should
look for a mitigation on the UEFI side.

Jeff, Jiewen, has this actually been shipped in an EDK2 release yet?

I confess I haven't actually built a current OVMF and *tested* the
hypothesis that it breaks; it just seems "obvious" :)

Adding Mark. Background: we think
https://github.com/tianocore/edk2/commit/d0e92aad46 will break existing
Xen EFI binaries because they write to a read-only code section
(.init.te) before calling ExitBootServices().

smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 06/13] libxl: change p9 to use generec add function

2017-08-02 Thread Oleksandr Grytsov

On Tue, Aug 1, 2017 at 4:00 PM, Wei Liu  wrote:
> On Tue, Aug 01, 2017 at 02:58:19PM +0300, Oleksandr Grytsov wrote:
>> On Mon, Jul 31, 2017 at 5:36 PM, Wei Liu  wrote:
>> > On Sun, Jul 30, 2017 at 09:42:09PM +0300, Oleksandr Grytsov wrote:
>> >> On Fri, Jul 28, 2017 at 7:23 PM, Wei Liu  wrote:
>> >> > On Fri, Jul 28, 2017 at 03:11:34PM +0100, Wei Liu wrote:
>> >> >> On Tue, Jul 18, 2017 at 05:25:23PM +0300, Oleksandr Grytsov wrote:
>> >> >> [...]
>> >> >> >  /* Waits for the passed device to reach state XenbusStateInitWait.
>> >> >> >   * This is not really useful by itself, but is important when 
>> >> >> > executing
>> >> >> >   * hotplug scripts, since we need to be sure the device is in the 
>> >> >> > correct
>> >> >> > @@ -3565,6 +3559,7 @@ extern const struct libxl_device_type 
>> >> >> > libxl__usbctrl_devtype;
>> >> >> >  extern const struct libxl_device_type libxl__usbdev_devtype;
>> >> >> >  extern const struct libxl_device_type libxl__pcidev_devtype;
>> >> >> >  extern const struct libxl_device_type libxl__vdispl_devtype;
>> >> >> > +extern const struct libxl_device_type libxl__p9_devtype;
>> >> >> >
>> >> >> >  extern const struct libxl_device_type *device_type_tbl[];
>> >> >> >
>> >> >> > diff --git a/tools/libxl/libxl_types.idl 
>> >> >> > b/tools/libxl/libxl_types.idl
>> >> >> > index 25563cf..96dbaed 100644
>> >> >> > --- a/tools/libxl/libxl_types.idl
>> >> >> > +++ b/tools/libxl/libxl_types.idl
>> >> >> > @@ -804,7 +804,7 @@ libxl_domain_config = Struct("domain_config", [
>> >> >> >  ("vfbs", Array(libxl_device_vfb, "num_vfbs")),
>> >> >> >  ("vkbs", Array(libxl_device_vkb, "num_vkbs")),
>> >> >> >  ("vtpms", Array(libxl_device_vtpm, "num_vtpms")),
>> >> >> > -("p9", Array(libxl_device_p9, "num_p9s")),
>> >> >> > +("p9s", Array(libxl_device_p9, "num_p9s")),
>> >> >>
>> >> >> Oh, no, please don't do this. We can't change the name of the fields.
>> >> >>
>> >> >> There is already on irregular device type -- the PCI device. I suppose
>> >> >> you probably need another hook somewhere. And please convert PCI 
>> >> >> devices
>> >> >> if you can.
>> >> >
>> >> > OK, going through the code I think we need to come to a conclusion if we
>> >> > want an extra callback to handle the irregular device names first
>> >> > because that's likely to affect the code of the framework in previous
>> >> > patch.
>> >>
>> >> Actually creating new callback to handle irregular device name looks
>> >> not so good.
>> >> There is the pattern which all namings should follow. May be it has to
>> >> be documented
>> >
>> > The normal pattern is DEVTYPEs.
>> >
>> >> somewhere. p9 was added recently we can ask the author to review this 
>> >> rename.
>> >
>> > Once it is released we can't change it, of course unless we deem it
>> > unstable. I'm two minded here. P9 was released in 4.9, which was only a
>> > few months old.
>> >
>>
>> IMHO this the bug I mean wrong naming and it should be fixed.
>>
>> > But we definitely can't change the PCI type. It has been around since
>> > forever. And there is provision in code to deal with that.
>>
>> Agree, I didn't touch PCI.
>>
>> >> From other side this rename touches only internals changes: no changes
>> >> in config file
>> >> or CLI interface.
>> >>
>> >
>> > As said, the framework need to be ready to deal with PCI anyway.
>> >
>> > What sort of issues do you foresee here?
>>
>> Do you mean in case to rework PCI to use the device framework?
>>
>
> No, I mean adding the new hook. You said "handle irregular device name
> looks not so good"

As for me when only internal name of structure or fields are changed
then it should not break anyone configs, setup etc.
Creating hooks in this case makes code hard to read and maintain.

-- 
Best Regards,
Oleksandr Grytsov.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH] x86/efi: Do not write relocations in efi_arch_relocate_image() first pass

2017-08-02 Thread David Woodhouse

The function is invoked with delta=0 before ExitBootServices() is called,
as a dummy run purely to validate that all the relocations can be handled.
This allows us to exit gracefully with an error message.

However, we have relocations in read-only sections such as .rodata and
.init.te(xt). Recent versions of UEFI will actually make those sections
read-only, which will cause a fault. This functionaity was added in
EDK2 commit d0e92aad4 ("MdeModulePkg/DxeCore: Add UEFI image protection.")

It's OK to actually make the changes in the later pass because UEFI will
tear down the protection when ExitBootServices() is called, because it
knows we're going to need to do this kind of thing.

Signed-off-by: David Woodhouse 
---
This basically means that new versions of UEFI are going to break
(all?) existing EFI Xen versions? Or at least any that have relocations
in .init.text.

 xen/arch/x86/efi/efi-boot.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index bedac5c..8d295ff 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -87,7 +87,8 @@ static void __init efi_arch_relocate_image(unsigned long 
delta)
 case PE_BASE_RELOC_DIR64:
 if ( in_page_tables(addr) )
 blexit(L"Unexpected relocation type");
-*(u64 *)addr += delta;
+if ( delta )
+*(u64 *)addr += delta;
 break;
 default:
 blexit(L"Unsupported relocation type");
-- 
2.7.4


smime.p7s
Description: S/MIME cryptographic signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 2/3] docs: add xen-release-management.pandoc

2017-08-02 Thread Juergen Gross

On 31/07/17 13:22, Wei Liu wrote:
> A document for the release manager.
> 
> Signed-off-by: Wei Liu 

With two minor corrections (see below):

Reviewed-by: Juergen Gross 

> ---
>  docs/process/xen-release-management.pandoc | 594 
> +
>  1 file changed, 594 insertions(+)
>  create mode 100644 docs/process/xen-release-management.pandoc
> 
> diff --git a/docs/process/xen-release-management.pandoc 
> b/docs/process/xen-release-management.pandoc
> new file mode 100644
> index 00..afdf559429
> --- /dev/null
> +++ b/docs/process/xen-release-management.pandoc
> @@ -0,0 +1,594 @@
> +% Xen Release Management
> +% Wei Liu <>
> +% Revision 1
> +
> +# Motivation
> +
> +Over the years we have had different people signing up as the Release Manager
> +of Xen. It would be rather wasteful if every new Release Manager has to go 
> over
> +everything and tripped over by the same mistakes again and again.
> +
> +This file intends to document the process of managing a Xen release. It is
> +mainly written for Release Manager, but other roles (contributors,
> +maintainers and committers) are also encouraged to read this document, so
> +that they can have an idea what to expect from the Release Manager.
> +
> +# Xen release cycle
> +
> +The Xen hypervisor project now releases twice a year, at the beginning of
> +June and the beginning of December. The actual release date depends on a lot
> +of factors.
> +
> +We can roughly divide one release into two periods. The development period
> +and the freeze period. The former is 4 months long and the latter is about 2
> +months long.
> +
> +During development period, contributors submit patches to be reviewed and
> +committed into xen.git. All feature patches must be committed before a date,
> +which is normally called the "cut-off date", after which the freeze period
> +starts. There will be a date before which all patches that wish to be merged
> +for the release should be posted -- it is normally called the "last posting
> +date" and it is normally two weeks before the "cut-off date".
> +
> +During freeze period, the tree is closed for new features. Only bug fixes are
> +accepted. This period can be shorter or longer than 2 months. If it ends up
> +longer than 2 months, it eats into the next development period.
> +
> +Here is a conjured up example (use ```cal 2017``` to get an idea):
> +
> +* Development period: 2017 June 11 - 2017 September 29
> +* the "cut-off date" is 2017 September 29
> +* the "last posting date" is 2017 September 15
> +* Freeze period: 2017 October 2 - 2017 December 7
> +* the anticipated release date is 2017 December 7
> +
> +# The different roles in a Xen release
> +
> +## Release Manager
> +
> +A trusted developer in the community that owns the release process. The major
> +goal of the Release Manager is to make sure a Xen release has high quality
> +and doesn't slip too much.
> +
> +The Release Manager will not see much workload during development period, but
> +expects to see increasing workload during the freeze period until the final
> +release. He or she is expected to keep track of issues, arrange RCs,
> +negotiate with relevant stakeholders, balance the need from various parties
> +and make difficult decisions when necessary.
> +
> +The Release Manager essentially owns xen-unstable branch during the freeze
> +period. The Committers will act on the wishes of the Release Manager during
> +that time.
> +
> +## Maintainers
> +
> +A group of trusted developers who are responsible for certain components in
> +xen.git. They are expected to respond to patches / questions with regard to
> +their components in a timely manner, especially during the freeze period.
> +
> +## Committers
> +
> +A group of trusted maintainers who can commit to xen.git. During the
> +development window they normally push things as they see fit. During the
> +freeze period they transfer xen-unstable branch ownership and act on the
> +wishes of the Release Manager. That normally means they need to have an
> +Release Ack in order to push a patch.
> +
> +## Contributors
> +
> +Contributors are also expected to respond quickly to any issues regarding the
> +code they submitted during development period. Failing that, the Release
> +Manager might decide to revert the changes, declare feature unsupported or
> +take any action he / she deems appropriate.
> +
> +## The Security Team
> +
> +The Security Team operates independently. The visibility might be rather
> +limited due to the sensitive nature of security work. The best action the
> +Release Manager can take is to set aside some time for potential security
> +issues to be fixed.
> +
> +## The Release Technician
> +
> +The Release Technician is the person who tags various trees, prepares tarball
> +etc. He or she acts on the wishes of the Release Manager. Please make sure
> +the communication is as clear as it can be.
> +
> +## The

Re: [Xen-devel] [PATCH v6] x86/monitor: Notify monitor if an emulation fails.

2017-08-02 Thread Jan Beulich

>>> Razvan Cojocaru  08/01/17 9:22 PM >>>
>Since this seems to be blocked as-is, I propose transforming this patch
>into a series, with one patch adding a new return code specifically for
>unsupported instructions (X86_EMUL_UNIMPLEMENTED or
>X86_EMUL_UNSUPPORTED?), and this patch sending the vm_event out only for
>that. (In which case the event's name should probably change as well to
>reflect the name of the new error code.)

Yes please (and I'd favor X86EMUL_UNIMPLEMENTED fwiw).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2] xen: get rid of paravirt op adjust_exception_frame

2017-08-02 Thread Juergen Gross

On 01/08/17 21:45, Andy Lutomirski wrote:
> On Tue, Aug 1, 2017 at 3:39 AM, Juergen Gross  wrote:
>> When running as Xen pv-guest the exception frame on the stack contains
>> %r11 and %rcx additional to the other data pushed by the processor.
>>
>> Instead of having a paravirt op being called for each exception type
>> prepend the Xen specific code to each exception entry. When running as
>> Xen pv-guest just use the exception entry with prepended instructions,
>> otherwise use the entry without the Xen specific code.
>>
>> Signed-off-by: Juergen Gross 
>> ---
>>  arch/x86/entry/entry_64.S | 22 ++
>>  arch/x86/entry/entry_64_compat.S  |  1 -
>>  arch/x86/include/asm/desc.h   | 16 ++
>>  arch/x86/include/asm/paravirt.h   |  5 ---
>>  arch/x86/include/asm/paravirt_types.h |  4 ---
>>  arch/x86/include/asm/proto.h  |  3 ++
>>  arch/x86/include/asm/traps.h  | 51 +--
>>  arch/x86/kernel/asm-offsets_64.c  |  1 -
>>  arch/x86/kernel/paravirt.c|  3 --
>>  arch/x86/kernel/traps.c   | 57 
>> ++-
>>  arch/x86/xen/enlighten_pv.c   | 16 +-
>>  arch/x86/xen/irq.c|  3 --
>>  arch/x86/xen/xen-asm_64.S | 45 ---
>>  arch/x86/xen/xen-ops.h|  1 -
>>  14 files changed, 147 insertions(+), 81 deletions(-)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index a9a8027a6c0e..602bcf68a32c 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -745,7 +745,6 @@ ENTRY(\sym)
>> .endif
>>
>> ASM_CLAC
>> -   PARAVIRT_ADJUST_EXCEPTION_FRAME
>>
>> .ifeq \has_error_code
>> pushq   $-1 /* ORIG_RAX: no syscall to 
>> restart */
>> @@ -901,7 +900,7 @@ ENTRY(do_softirq_own_stack)
>>  END(do_softirq_own_stack)
>>
>>  #ifdef CONFIG_XEN
>> -idtentry xen_hypervisor_callback xen_do_hypervisor_callback has_error_code=0
>> +idtentry hypervisor_callback xen_do_hypervisor_callback has_error_code=0
>>
>>  /*
>>   * A note on the "critical region" in our callback handler.
>> @@ -967,8 +966,6 @@ ENTRY(xen_failsafe_callback)
>> movq8(%rsp), %r11
>> addq$0x30, %rsp
>> pushq   $0  /* RIP */
>> -   pushq   %r11
>> -   pushq   %rcx
>> jmp general_protection
>>  1: /* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
>> movq(%rsp), %rcx
>> @@ -997,9 +994,8 @@ idtentry int3   do_int3  
>>has_error_code=0paranoid=1 shift_ist=DEBUG_STACK
>>  idtentry stack_segment do_stack_segmenthas_error_code=1
>>
>>  #ifdef CONFIG_XEN
>> -idtentry xen_debug do_debughas_error_code=0
>> -idtentry xen_int3  do_int3 has_error_code=0
>> -idtentry xen_stack_segment do_stack_segmenthas_error_code=1
>> +idtentry xendebug  do_debughas_error_code=0
>> +idtentry xenint3   do_int3 has_error_code=0
>>  #endif
>>
>>  idtentry general_protectiondo_general_protection   has_error_code=1
>> @@ -1161,18 +1157,6 @@ END(error_exit)
>>  /* Runs on exception stack */
>>  ENTRY(nmi)
>> /*
>> -* Fix up the exception frame if we're on Xen.
>> -* PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
>> -* one value to the stack on native, so it may clobber the rdx
>> -* scratch slot, but it won't clobber any of the important
>> -* slots past it.
>> -*
>> -* Xen is a different story, because the Xen frame itself overlaps
>> -* the "NMI executing" variable.
>> -*/
> 
> Based on Andrew Cooper's email, it sounds like this function is just
> straight-up broken on Xen PV.  Maybe change the comment to "XXX:
> broken on Xen PV" or similar.

Fine with me.

> 
>> +#if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV)
>> +#define pv_trap_entry(name) (void *)(xen_pv_domain() ? xen_ ## name : name)
>> +#else
>> +#define pv_trap_entry(name) (void *)(name)
>> +#endif
>> +
> 
> Seems reasonable to me.
> 
>>  #ifdef CONFIG_X86_64
>>
>>  static inline void pack_gate(gate_desc *gate, unsigned type, unsigned long 
>> func,
>> @@ -482,6 +490,14 @@ static inline void _set_gate(int gate, unsigned type, 
>> void *addr,
>> 0, 0, __KERNEL_CS); \
>> } while (0)
>>
>> +#define set_intr_gate_pv(n, addr)  \
>> +   do {\
>> +   set_intr_gate_notrace(n, pv_trap_entry(addr));  \
>> +   _trace_set_gate(n, GATE_INTERRUPT,  \
>> +

Re: [Xen-devel] [XenSummit 2017] Shared coprocessor framework followup

2017-08-02 Thread Andrii Anisov


Hello Edgar,


On 01.08.17 20:13, Edgar E. Iglesias wrote:

Are master ports behind IOMMU?

Yes, they are.

What IOMMU IP is used?


It's possible to reprogram the configuration of the PL and swap accelerators in
and out on the fly. It's probably going to be too slow for trying to
context switch between guests

So let us assume it is a FW-less resource we need to share between domains.
Context switch will be stripped to mapping its mmio (or passing mmio
accesses) next domain to serve and IOMMU configuration switching.
Yep, IOMMU matters.

OK. I think the PL is more like a firmware enabled resource.
The PL configuration needs to be loaded much like firmware
otherwise the accelerators can't change shape and all guests
must use the same kind.

I understand this.
But I got your words like you are going to give a try to the same kind 
for all domains first. Because you assumed that reconfiguring would be 
too slow, what is actually discussable.



  so I think primarily we would be looking at
a way to lets say, "allocate" and "release" the resources for batch use.

Kind of voluntary preemption?

Right. That could be a start.
In the future perhaps it makes sense to context-switch.
We still need the context switch to be done. The difference is that now 
it could be done only when the accelerator is not busy.



If a guest cannot allocate an accelerator, it could fall back to emulation
or just to using SW libraries until an accelerator slot is available.

What about the thing I called "an access emulation" [1]? From the domain's
point of view it would be reflected in a delayed response (via IRQ or
register polling) from an accelerator.

I guess the concept described above is feasible even with current SCF code
and will not take too much efforts.

I'll have a look, thanks!

Do not hesitate to contact us in case you need any help or clarification.


--

*Andrii Anisov*


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] DESIGN v2: CPUID part 3

2017-08-02 Thread Joao Martins

On 08/01/2017 07:34 PM, Andrew Cooper wrote:
> On 31/07/2017 20:49, Konrad Rzeszutek Wilk wrote:
>> On Wed, Jul 05, 2017 at 02:22:00PM +0100, Joao Martins wrote:
>>> On 07/05/2017 12:16 PM, Andrew Cooper wrote:
 On 05/07/17 10:46, Joao Martins wrote:
> Hey Andrew,
>
> On 07/04/2017 03:55 PM, Andrew Cooper wrote:
>
>> (RFC: Decide exactly where to fit this.  _XEN\_DOMCTL\_max\_vcpus_ 
>> perhaps?)
>> The toolstack shall also have a mechanism to explicitly select topology
>> configuration for the guest, which primarily affects the virtual APIC ID
>> layout, and has a knock on effect for the APIC ID of the virtual IO-APIC.
>> Xen's auditing shall ensure that guests observe values consistent with 
>> the
>> guarantees made by the vendor manuals.
>>
> Why choose max_vcpus domctl?
 Despite its name, the max_vcpus hypercall is the one which allocates all
 the vcpus in the hypervisor.  I don't want there to be any opportunity
 for vcpus to exist but no topology information to have been provided.

>>> /nods
>>>
>>> So then doing this at vcpus allocation we would need to pass an additional 
>>> CPU
>>> topology argument on the max_vcpus hypercall? Otherwise it's sort of guess 
>>> work
>>> wrt sockets, cores, threads ... no?
>> Andrew, thoughts on this and the one below?
> 
> Urgh sorry.  I've been distracted with some high priority interrupts (of
> the non-maskable variety).
> 
> So, bad news is that the CPUID and MSR policy handling has become
> substantially more complicated and entwined than I had first planned.  A
> change in either of the data alters the auditing of the other, so I am
> leaning towards implementing everything with a single set hypercall (as
> this is the only way to get a plausibly-consistent set of data).
> 
> The good news is that I don't think we actually need any changes to the
> XEN_DOMCTL_max_vcpus.  I now think there is sufficient expressibility in
> the static cpuid policy to work.
> 
Awesome!

>>> There could be other uses too on passing this info to Xen, say e.g. the
>>> scheduler knowing the guest CPU topology it would allow better selection of
>>> core+sibling pair such that it could match cache/cpu topology passed on the
>>> guest (for unpinned SMT guests).
> 
> I remain to be convinced (i.e. with some real performance numbers) that
> the added complexity in the scheduler for that logic is a benefit in the
> general case.
> 
The suggestion above was a simple extension to struct domain (e.g. cores/threads
or struct cpu_topology field) - nothing too disruptive I think.

But I cannot really argue on this as this was just an idea that I found
interesting (no numbers to support it entirely). We just happened to see it
under-perform when a simple range of cpus was used for affinity, and that some
vcpus end up being scheduled belonging the same core+sibling pair IIRC; hence I
(perhaps naively) imagined that there could be value in further scheduler
enlightenment e.g. "gang-scheduling" where we schedule core+sibling always
together. I was speaking to Dario (CC'ed) on the summit whether CPU topology
could have value - and there might be but it remains to be explored once we're
able to pass a cpu topology to the guest. (In the past it seemed enthusiastic of
the idea of the topology[0] and hence I assumed to be in the context of 
schedulers)

[0] https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg03850.html

> In practice, customers are either running very specific and dedicated
> workloads (at which point pinning is used and there is no
> oversubscription, and exposing the actual SMT topology is a good thing),
>
/nods

> or customers are running general workloads with no pinning (or perhaps
> cpupool-numa-split) with a moderate amount of oversubscription (at which
> point exposing SMT is a bad move).
> 
Given the scale you folks invest on over-subscription (1000 VMs), I wonder what
moderate here means :P

> Counterintuitively, exposing NUMA in general oversubscribed scenarios is
> terrible for net system performance.  What happens in practice is that
> VMs which see NUMA spend their idle cycles trying to balance their own
> userspace processes, rather than yielding to the hypervisor so another
> guest can get a go.
> 
Interesting to know - vNUMA perhaps is only better placed for performance cases
where both (or either) I/O topology and memory locality matter - or when going
for bigger guests. Provided that the correspondent CPU topology is provided.

Joao

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] Xenstore filling information about Dom0

2017-08-02 Thread Juergen Gross

On 02/08/17 12:12, Waseem, Amna wrote:
> Hello All,
> 
> Can anyone tell me how xenstore filesystem gets populated about information 
> about Dom0.

see tools/helpers/xen-init-dom0.c

> Which function or script creates node for local domain?

see tools/libxl/ (various source files)

> I want to statically configure the xenstore about guest parameters.

I guess you want to create xenstore nodes for all guests? You could use
a Xenstore watch for "@introduceDomain" to be called whenever a new
guest is added to Xenstore.

> I want to statically create nodes before bringing up xenstored daemon.

How should that work? Without xenstored daemon there is no Xenstore.
You can, however, create nodes at xenstored startup. See function
setup_structure() in tools/xenstore/xenstored_core.c

Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] PV drivers and zero copying

2017-08-02 Thread Julien Grall


Hi,

On 01/08/17 20:07, Stefano Stabellini wrote:

On Tue, 1 Aug 2017, Oleksandr Andrushchenko wrote:

Hi, Stefano!

On 07/31/2017 11:28 PM, Stefano Stabellini wrote:

On Mon, 31 Jul 2017, Oleksandr Andrushchenko wrote:

3 Sharing with page exchange (XENMEM_exchange)
==

This API was pointed to me by Stefano Stabellini as one of the possible
ways
to
achieve zero copying and share physically contiguous buffers. It is used
by
x86
SWIOTLB code (xen_create_contiguous_region, [5]), but as per my
understanding
this API cannot be used on ARM as of now [6].  Conclusion: not an option
for
ARM
at the moment

Let me elaborate on this. The purpose of XENMEM_exchange is to exchange
a number of memory pages with an equal number of contiguous memory
pages, possibly even under 4G. The original purpose of the hypercall was
to get DMA-able memory.

this is good to know


So far, it has only been used by Dom0 on x86. Dom0 on ARM doesn't need
it because it is mapped 1:1 by default and device assignment is not
allowed without an IOMMU. However it should work on ARM too, as the
implementation is all common code in Xen.

well, according to [6]:
"Currently XENMEM_exchange is not supported on ARM because the steal_page is
left unimplemented.

However, even if steal_page is implemented, the hypercall can't work for ARM
because:
- Direct mapped domain is not supported
- ARM doesn't have a M2P and therefore usage of mfn_to_gmfn is
invalid"
And what I see at [7] is that it is still EOPNOTSUPP
So, yes, common code is usable for both ARM and x86, but
underlying support for ARM is till not there.
Please correct me if I am wrong here


Ops, I forgot about that! Implementing steal_page on ARM is not be a
problem, and direct mapped domains are not a concern in this scenario.

The issue is mfn_to_gmfn. However, we do not actually need mfn_to_gmfn
to implement xen/common/memory.c:memory_exchange as Julien pointed out
in http://marc.info/?l=xen-devel=145037009127660.

Julien, Jan, two years have passed. Do you think we can find a way to
make that old series work for everybody?


I looked at the series and AFAICT Jan was pushing towards M2P.

I've already made my arguments in another thread (see [1]) why I don't 
think M2P is a solution for ARM.


I really want to kill the use of M2P on ARM because I think this using 
too much memory for the benefits it brings.


I don't have much time to investigate more on XENMEM_exchange, so I 
would be grateful if someone pick up that task.


Cheers,

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg01370.html


--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 5/5] xen: RCU: avoid busy waiting until the end of grace period.

2017-08-02 Thread Dario Faggioli

Hey, apologies in advance, for the very long email! :-O

On Tue, 2017-08-01 at 12:13 -0700, Stefano Stabellini wrote:
> Given the description below, it's possible that the new timer has to
> fire several times before the callback can be finally invoked, right?
> 
> I would make some measurements to check how many times the timer has
> to
> fire (how long we actually need to wait before calling the callback)
> in
> various scenarios. Then I would choose a value to minimize the number
> of
> unnecessary wake-ups.
> 
You seem to be thinking to this timer as something that disturbs the
idleness of a CPU. It's not, because the CPU is not idle! It has a
callback waiting to be invoked, and it better do that as soon as
possible, in order to:

- freeing the memory/the resources as soon as practically possible. 
  E.g., if we wait for, say, 1 sec, destroying a VM with a "passed-
  through" I/O range, and then trying to create another one, with the 
  same resource assigned, would fail until that 1 sec expires. True,
  we save more power on that CPU, but it does not look neither fair 
  nor correct to me to strive toward achieving that, in this situation.
  Power should indeed be saved, and unnecessary wake-ups prevented, but
  when there isn't anything to do! If there is stuff to do, and this is
  the case, our goal should be to get it done as soon as possible, so
  that, afterwords, we can go into a (hopefully) long sleep.

- by delaying detection of a finished grace period, we're also 
  delaying the beginning of a new one. This means, in case there are
  RCU operation waiting for it (because they arrived too late for the
  current one, and have been queued), we're potentially significantly
  delaying them too. So, basically, this again looks like the same
  kind of paradox to me, where we have stuff to do, but we decide to
  try to save power.

RCU are not a mechanism for allow the CPUs to stay idle longer, they're
a lockless and asymmetric serialization primitive. And as any
serialization primitive --either lock-based or lockless-- keeping the
duration of critical sections small is rather important.

So, I think the goal here is not minimizing the wakeups; it much rather
is completing the grace period sooner rather than later. In theory, if
rcu_needs_cpu() returns true, we may very well not even let the CPU go
idle, and have it spin (which indeed is what happens without this
patch).
That would guarantee a prompt detection of CPU quiescence, and of the
ending of a grace period. Then, since we know that spinning consumes a
lot of power, and generates overhead that, potentially, may even affect
and slow down activities going on other CPUs, we let it go idle with a
timer armed. But that's a compromise, and can't be confused with the
primary goal. And any nanosecond that we spend, on that CPU, consuming
less power than what we'd have consumed running a tight loop around
rcu_pending(), we should be thankful for it, instead of regretting that
there could have been more. :-)

In Linux, as said a few times, they do the same, and the role of this
new timer I'm introducing here, is played by the tick. The tick period
is controlled by the HZ config variable, possible values for which (in
2.6.21, where CONFIG_NO_HZ was introduced for first time for all
arches) are 100Hz, 250Hz, 300Hz and 1000Hz, which translates into 10ms,
 4ms, 3.33ms or 1ms. Default is 250HZ==4ms. The periodic tick is the
minimum time granularity of the kernel for a lot of subsystem
(especially in that kernel), so it's something that can be considered
pretty frequently running. As also said many times, we don't have any
such thing, but still I think we should use something with a similar
order of magnitude.

[NOTE that I'm not using Linux as an example because what is done in
Linux is always correct and is always what we also should do. However:
(1) RCU has been basically invented for Linux, or at least Linux is
where they've seen the widest adoption and development, and (2) our RCU
code has been (not in the best possible way, allow me to say) imported
from there. therefore, looking at what's done in Linux seems reasonable
to me, in this case.]

About the kind of measurements you're suggesting doing. I think I
understand the rationale behind it, but at the same time, I'm quite
sure that what we'd get would be pseudo-random numbers.

In fact, the length of this "wait time", although it has a relationship
with the workload being run, it also depends on a lot of other things,
within the *system* as a whole. It would be basically impossible to run
any meaningful statistical analysis on the results and come to
something that is good enough in general.

Think, for instance, at how this is not at all architecture specific
code, but the behavior we're seeing is so different between x86 and 
ARM. Think at how this doesn't really have to do anything with the
scheduler, but the behavior we observe varies, depending on some
internal implementation details of a

[Xen-devel] Xenstore filling information about Dom0

2017-08-02 Thread Waseem, Amna

Hello All,

Can anyone tell me how xenstore filesystem gets populated about information 
about Dom0. Which function or script creates node for local domain?

I want to statically configure the xenstore about guest parameters. I want to 
statically create nodes before bringing up xenstored daemon.

Regards
Amna
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH 0/5] x86: guest resource mepping

2017-08-02 Thread Paul Durrant

This series introduces support for direct mapping of guest resources.
Mapping is currently limited to grant tables but support for other resources
will be added by subsequent patches.

Paul Durrant (5):
  arch/[x86|arm]: remove code duplication
  x86/mm: allow a privileged PV domain to map guest mfns
  x86/mm: add HYPERVISOR_memory_op to acquire guest resources
  tools/libxenforeignmemory: add support for resource mapping
  tools/libxenctrl: use new xenforeignmemory API to seed grant table

 tools/include/xen-sys/Linux/privcmd.h  |  11 ++
 tools/libs/foreignmemory/core.c|  42 ++
 .../libs/foreignmemory/include/xenforeignmemory.h  |  39 ++
 tools/libs/foreignmemory/libxenforeignmemory.map   |   5 +
 tools/libs/foreignmemory/linux.c   |  45 +++
 tools/libs/foreignmemory/private.h |  30 +
 tools/libxc/include/xc_dom.h   |   8 +-
 tools/libxc/xc_dom_boot.c  | 102 +++---
 tools/libxc/xc_sr_restore_x86_hvm.c|  10 +-
 tools/libxc/xc_sr_restore_x86_pv.c |   2 +-
 tools/libxl/libxl_dom.c|   1 -
 tools/python/xen/lowlevel/xc/xc.c  |   6 +-
 xen/arch/arm/mm.c  |  29 +---
 xen/arch/x86/mm.c  | 150 +
 xen/arch/x86/mm/p2m.c  |   3 +-
 xen/common/grant_table.c   |  33 +
 xen/include/asm-x86/p2m.h  |   3 +
 xen/include/public/memory.h|  38 +-
 xen/include/xen/grant_table.h  |   3 +
 19 files changed, 467 insertions(+), 93 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH 2/5] x86/mm: allow a privileged PV domain to map guest mfns

2017-08-02 Thread Paul Durrant

In the case where a PV domain is mapping guest resources then it needs make
the HYPERVISOR_mmu_update call using DOMID_SELF, rather than the guest
domid, so that the passed in gmfn values are correctly treated as mfns
rather than gfns present in the guest p2m.

This patch removes a check which currently disallows mapping of a page when
the owner of the page tables matches the domain passed to
HYPERVISOR_mmu_update, but that domain is not the real owner of the page.
The check was introduced by patch d3c6a215ca9 ("x86: Clean up
get_page_from_l1e() to correctly distinguish between owner-of-pte and
owner-of-data-page in all cases") but it's not clear why it was needed.

Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/mm.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index f8b3505849..6bf8945406 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1010,12 +1010,15 @@ get_page_from_l1e(
(real_pg_owner != dom_cow) ) )
 {
 /*
- * Let privileged domains transfer the right to map their target
- * domain's pages. This is used to allow stub-domain pvfb export to
- * dom0, until pvfb supports granted mappings. At that time this
- * minor hack can go away.
+ * If the real page owner is not the domain specified in the
+ * hypercall then establish that the specified domain has
+ * mapping privilege over the page owner.
+ * This is used to allow stub-domain pvfb export to dom0. It is
+ * also used to allow a privileged PV domain to map mfns using
+ * DOMID_SELF, which is needed for mapping guest resources such
+ * grant table frames.
  */
-if ( (real_pg_owner == NULL) || (pg_owner == l1e_owner) ||
+if ( (real_pg_owner == NULL) ||
  xsm_priv_mapping(XSM_TARGET, pg_owner, real_pg_owner) )
 {
 gdprintk(XENLOG_WARNING,
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [PATCH 4/5] tools/libxenforeignmemory: add support for resource mapping

2017-08-02 Thread Paul Durrant

A previous patch introduced a new HYPERVISOR_memory_op to acquire guest
resources for direct priv-mapping.

This patch adds new functionality into libxenforeignmemory to make use
of a new privcmd ioctl [1] that uses the new memory op to make such
resources available via mmap(2).

[1] 
http://xenbits.xen.org/gitweb/?p=people/pauldu/linux.git;a=commit;h=c5cf2b15f7a448277716a7e96fea1c93df6c17a5

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/include/xen-sys/Linux/privcmd.h  | 11 ++
 tools/libs/foreignmemory/core.c| 42 
 .../libs/foreignmemory/include/xenforeignmemory.h  | 39 +++
 tools/libs/foreignmemory/libxenforeignmemory.map   |  5 +++
 tools/libs/foreignmemory/linux.c   | 45 ++
 tools/libs/foreignmemory/private.h | 30 +++
 6 files changed, 172 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h 
b/tools/include/xen-sys/Linux/privcmd.h
index 732ff7c15a..9531b728f9 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -86,6 +86,15 @@ typedef struct privcmd_dm_op {
const privcmd_dm_op_buf_t __user *ubufs;
 } privcmd_dm_op_t;
 
+typedef struct privcmd_mmap_resource {
+   domid_t dom;
+   __u32 type;
+   __u32 id;
+   __u32 idx;
+   __u64 num;
+   __u64 addr;
+} privcmd_mmap_resource_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: _hypercall_t
@@ -103,5 +112,7 @@ typedef struct privcmd_dm_op {
_IOC(_IOC_NONE, 'P', 5, sizeof(privcmd_dm_op_t))
 #define IOCTL_PRIVCMD_RESTRICT \
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
+#define IOCTL_PRIVCMD_MMAP_RESOURCE\
+   _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
 
 #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
diff --git a/tools/libs/foreignmemory/core.c b/tools/libs/foreignmemory/core.c
index a6897dc561..291ee44516 100644
--- a/tools/libs/foreignmemory/core.c
+++ b/tools/libs/foreignmemory/core.c
@@ -120,6 +120,48 @@ int xenforeignmemory_restrict(xenforeignmemory_handle 
*fmem,
 return osdep_xenforeignmemory_restrict(fmem, domid);
 }
 
+xenforeignmemory_resource_handle *xenforeignmemory_map_resource(
+xenforeignmemory_handle *fmem, domid_t domid, unsigned int type,
+unsigned int id, unsigned long frame, unsigned long nr_frames,
+void **paddr, int prot, int flags)
+{
+xenforeignmemory_resource_handle *fres;
+int rc;
+
+fres = calloc(1, sizeof(*fres));
+if ( !fres )
+return NULL;
+
+fres->domid = domid;
+fres->type = type;
+fres->id = id;
+fres->frame = frame;
+fres->nr_frames = nr_frames;
+fres->addr = *paddr;
+fres->prot = prot;
+fres->flags = flags;
+
+rc = osdep_xenforeignmemory_map_resource(fmem, fres);
+if ( rc )
+goto fail;
+
+*paddr = fres->addr;
+return fres;
+
+fail:
+free(fres);
+
+return NULL;
+}
+
+void xenforeignmemory_unmap_resource(
+xenforeignmemory_handle *fmem, xenforeignmemory_resource_handle *fres)
+{
+osdep_xenforeignmemory_unmap_resource(fmem, fres);
+
+free(fres);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/foreignmemory/include/xenforeignmemory.h 
b/tools/libs/foreignmemory/include/xenforeignmemory.h
index f4814c390f..e56eb3c4d4 100644
--- a/tools/libs/foreignmemory/include/xenforeignmemory.h
+++ b/tools/libs/foreignmemory/include/xenforeignmemory.h
@@ -138,6 +138,45 @@ int xenforeignmemory_unmap(xenforeignmemory_handle *fmem,
 int xenforeignmemory_restrict(xenforeignmemory_handle *fmem,
   domid_t domid);
 
+typedef struct xenforeignmemory_resource_handle 
xenforeignmemory_resource_handle;
+
+/**
+ * This function maps a guest resource.
+ *
+ * @parm fmem handle to the open foreignmemory interface
+ * @parm domid the domain id
+ * @parm type the resource type
+ * @parm id the type-specific resource identifier
+ * @parm frame base frame index within the resource
+ * @parm nr_frames number of frames to map
+ * @parm paddr pointer to an address passed through to mmap(2)
+ * @parm prot passed through to mmap(2)
+ * @parm flags passed through to mmap(2)
+ * @return pointer to foreignmemory resource handle on success, NULL on
+ * failure
+ *
+ * *paddr is used, on entry, as a hint address for foreign map placement
+ * (see mmap(2)) so should be set to NULL if no specific placement is
+ * required. On return *paddr contains the address where the resource is
+ * mapped.
+ * As for xenforeignmemory_map2() flags is a set of additional flags
+ * for mmap(2). Not all of the flag combinations are possible due to
+ * implementation details on different platforms.
+ */
+xenforeignmemory_resource_handle *xenforeignmemory_map_resource(
+xenforeignmemory_handle *fmem, domid_t

[Xen-devel] [PATCH 5/5] tools/libxenctrl: use new xenforeignmemory API to seed grant table

2017-08-02 Thread Paul Durrant

A previous patch added support for priv-mapping guest resources directly
(rather than having to foreign-map, which requires P2M modification for
HVM guests).

This patch makes use of the new API to seed the guest grant table unless
the underlying infrastructure (i.e. privcmd) doesn't support it, in which
case the old scheme is used.

NOTE: The call to xc_dom_gnttab_hvm_seed() in hvm_build_set_params() was
  actually unnecessary, as the grant table has already been seeded
  by a prior call to xc_dom_gnttab_init() made by libxl__build_dom().

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/libxc/include/xc_dom.h|   8 +--
 tools/libxc/xc_dom_boot.c   | 102 
 tools/libxc/xc_sr_restore_x86_hvm.c |  10 ++--
 tools/libxc/xc_sr_restore_x86_pv.c  |   2 +-
 tools/libxl/libxl_dom.c |   1 -
 tools/python/xen/lowlevel/xc/xc.c   |   6 +--
 6 files changed, 92 insertions(+), 37 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index ce47058c41..d6ca0a8680 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -323,12 +323,8 @@ void *xc_dom_boot_domU_map(struct xc_dom_image *dom, 
xen_pfn_t pfn,
 int xc_dom_boot_image(struct xc_dom_image *dom);
 int xc_dom_compat_check(struct xc_dom_image *dom);
 int xc_dom_gnttab_init(struct xc_dom_image *dom);
-int xc_dom_gnttab_hvm_seed(xc_interface *xch, domid_t domid,
-   xen_pfn_t console_gmfn,
-   xen_pfn_t xenstore_gmfn,
-   domid_t console_domid,
-   domid_t xenstore_domid);
-int xc_dom_gnttab_seed(xc_interface *xch, domid_t domid,
+int xc_dom_gnttab_seed(xc_interface *xch, domid_t guest_domid,
+   bool is_hvm,
xen_pfn_t console_gmfn,
xen_pfn_t xenstore_gmfn,
domid_t console_domid,
diff --git a/tools/libxc/xc_dom_boot.c b/tools/libxc/xc_dom_boot.c
index c3b44dd399..fc3174ad7e 100644
--- a/tools/libxc/xc_dom_boot.c
+++ b/tools/libxc/xc_dom_boot.c
@@ -280,11 +280,11 @@ static xen_pfn_t xc_dom_gnttab_setup(xc_interface *xch, 
domid_t domid)
 return gmfn;
 }
 
-int xc_dom_gnttab_seed(xc_interface *xch, domid_t domid,
-   xen_pfn_t console_gmfn,
-   xen_pfn_t xenstore_gmfn,
-   domid_t console_domid,
-   domid_t xenstore_domid)
+static int compat_gnttab_seed(xc_interface *xch, domid_t domid,
+  xen_pfn_t console_gmfn,
+  xen_pfn_t xenstore_gmfn,
+  domid_t console_domid,
+  domid_t xenstore_domid)
 {
 
 xen_pfn_t gnttab_gmfn;
@@ -337,11 +337,11 @@ int xc_dom_gnttab_seed(xc_interface *xch, domid_t domid,
 return 0;
 }
 
-int xc_dom_gnttab_hvm_seed(xc_interface *xch, domid_t domid,
-   xen_pfn_t console_gpfn,
-   xen_pfn_t xenstore_gpfn,
-   domid_t console_domid,
-   domid_t xenstore_domid)
+static int compat_gnttab_hvm_seed(xc_interface *xch, domid_t domid,
+  xen_pfn_t console_gpfn,
+  xen_pfn_t xenstore_gpfn,
+  domid_t console_domid,
+  domid_t xenstore_domid)
 {
 int rc;
 xen_pfn_t scratch_gpfn;
@@ -380,7 +380,7 @@ int xc_dom_gnttab_hvm_seed(xc_interface *xch, domid_t domid,
 return -1;
 }
 
-rc = xc_dom_gnttab_seed(xch, domid,
+rc = compat_gnttab_seed(xch, domid,
 console_gpfn, xenstore_gpfn,
 console_domid, xenstore_domid);
 if (rc != 0)
@@ -405,18 +405,78 @@ int xc_dom_gnttab_hvm_seed(xc_interface *xch, domid_t 
domid,
 return 0;
 }
 
-int xc_dom_gnttab_init(struct xc_dom_image *dom)
+int xc_dom_gnttab_seed(xc_interface *xch, domid_t guest_domid,
+   bool is_hvm, xen_pfn_t console_gmfn,
+   xen_pfn_t xenstore_gmfn, domid_t console_domid,
+   domid_t xenstore_domid)
 {
-if ( xc_dom_translated(dom) ) {
-return xc_dom_gnttab_hvm_seed(dom->xch, dom->guest_domid,
-  dom->console_pfn, dom->xenstore_pfn,
-  dom->console_domid, dom->xenstore_domid);
-} else {
-return xc_dom_gnttab_seed(dom->xch, dom->guest_domid,
-  xc_dom_p2m(dom, dom->console_pfn),
-  xc_dom_p2m(dom, dom->xenstore_pfn),
-  dom->console_domid, dom->xenstore_domid);
+xenforeignmemory_handle* fmem = xch->fmem;
+

[Xen-devel] [PATCH 1/5] [x86|arm]: remove code duplication

2017-08-02 Thread Paul Durrant

There is a substantial amount of code duplicated between the x86 and arm
implementations of mm.c:xenmem_add_to_physmap_one() for
XENMAPSPACE_grant_table. Also, the code in question looks like it really
should be in common/grant_table.c

This patch introduces a new function in common/grant_table.c to get the mfn
of a specified frame in the grant table of a specified guest, and calls to
that from the arch-specific code in mm.c.

Signed-off-by: Paul Durrant 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/arm/mm.c | 29 -
 xen/arch/x86/mm.c | 26 +++---
 xen/common/grant_table.c  | 33 +
 xen/include/xen/grant_table.h |  3 +++
 4 files changed, 43 insertions(+), 48 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 98260f604c..038b20cd5d 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1229,32 +1229,11 @@ int xenmem_add_to_physmap_one(
 switch ( space )
 {
 case XENMAPSPACE_grant_table:
-grant_write_lock(d->grant_table);
-
-if ( d->grant_table->gt_version == 0 )
-d->grant_table->gt_version = 1;
-
-if ( d->grant_table->gt_version == 2 &&
-(idx & XENMAPIDX_grant_table_status) )
-{
-idx &= ~XENMAPIDX_grant_table_status;
-if ( idx < nr_status_frames(d->grant_table) )
-mfn = virt_to_mfn(d->grant_table->status[idx]);
-else
-return -EINVAL;
-}
-else
-{
-if ( (idx >= nr_grant_frames(d->grant_table)) &&
- (idx < max_grant_frames) )
-gnttab_grow_table(d, idx + 1);
-
-if ( idx < nr_grant_frames(d->grant_table) )
-mfn = virt_to_mfn(d->grant_table->shared_raw[idx]);
-else
-return -EINVAL;
-}
+mfn = gnttab_get_frame(d, idx);
+if ( mfn_eq(mfn, INVALID_MFN) )
+return -EINVAL;
 
+grant_write_lock(d->grant_table);
 d->arch.grant_table_gfn[idx] = gfn;
 
 t = p2m_ram_rw;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 19f672d880..f8b3505849 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4863,29 +4863,9 @@ int xenmem_add_to_physmap_one(
 mfn = virt_to_mfn(d->shared_info);
 break;
 case XENMAPSPACE_grant_table:
-grant_write_lock(d->grant_table);
-
-if ( d->grant_table->gt_version == 0 )
-d->grant_table->gt_version = 1;
-
-if ( d->grant_table->gt_version == 2 &&
- (idx & XENMAPIDX_grant_table_status) )
-{
-idx &= ~XENMAPIDX_grant_table_status;
-if ( idx < nr_status_frames(d->grant_table) )
-mfn = virt_to_mfn(d->grant_table->status[idx]);
-}
-else
-{
-if ( (idx >= nr_grant_frames(d->grant_table)) &&
- (idx < max_grant_frames) )
-gnttab_grow_table(d, idx + 1);
-
-if ( idx < nr_grant_frames(d->grant_table) )
-mfn = virt_to_mfn(d->grant_table->shared_raw[idx]);
-}
-
-grant_write_unlock(d->grant_table);
+mfn = mfn_x(gnttab_get_frame(d, idx));
+if ( mfn_eq(_mfn(mfn), INVALID_MFN) )
+return -EINVAL;
 break;
 case XENMAPSPACE_gmfn_range:
 case XENMAPSPACE_gmfn:
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index ae34547005..1411519126 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1604,6 +1604,39 @@ active_alloc_failed:
 return 0;
 }
 
+mfn_t
+gnttab_get_frame(struct domain *d, unsigned int idx)
+{
+struct grant_table *gt = d->grant_table;
+mfn_t mfn = INVALID_MFN;
+
+grant_write_lock(gt);
+
+if ( gt->gt_version == 0 )
+gt->gt_version = 1;
+
+if ( gt->gt_version == 2 &&
+ (idx & XENMAPIDX_grant_table_status) )
+{
+idx &= ~XENMAPIDX_grant_table_status;
+if ( idx < nr_status_frames(gt) )
+mfn = _mfn(virt_to_mfn(gt->status[idx]));
+}
+else
+{
+if ( (idx >= nr_grant_frames(gt)) &&
+ (idx < max_grant_frames) )
+gnttab_grow_table(d, idx + 1);
+
+if ( idx < nr_grant_frames(gt) )
+mfn = _mfn(virt_to_mfn(gt->shared_raw[idx]));
+}
+
+grant_write_unlock(gt);
+
+return mfn;
+}
+
 static long 
 gnttab_setup_table(
 XEN_GUEST_HANDLE_PARAM(gnttab_setup_table_t) uop, unsigned int count)
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 4e7789968c..685af7c578 100644
--- a/xen/include/xen/grant_table.h
+++

[Xen-devel] [PATCH 3/5] x86/mm: add HYPERVISOR_memory_op to acquire guest resources

2017-08-02 Thread Paul Durrant

Certain memory resources associated with a guest are not necessarily
present in the guest P2M and so are not necessarily available to be
foreign-mapped by a tools domain unless they are inserted, which risks
shattering a super-page mapping.

This patch adds a new memory op to allow such resourced to be priv-mapped
directly, by either a PV or HVM tools domain.

NOTE: Whilst the new op is not intrinsicly specific to the x86 architecture,
  I have no means to test it on an ARM platform and so cannot verify
  that it functions correctly. Hence it is currently only implemented
  for x86.

Signed-off-by: Paul Durrant 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: George Dunlap 
---
 xen/arch/x86/mm.c   | 111 
 xen/arch/x86/mm/p2m.c   |   3 +-
 xen/include/asm-x86/p2m.h   |   3 ++
 xen/include/public/memory.h |  38 ++-
 4 files changed, 152 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 6bf8945406..b295dd412a 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4941,6 +4941,107 @@ int xenmem_add_to_physmap_one(
 return rc;
 }
 
+static int xenmem_acquire_grant_table(struct domain *d,
+  unsigned long frame,
+  unsigned long nr_frames,
+  unsigned long mfn_list[])
+{
+unsigned int i;
+
+/*
+ * Iterate through the list backwards so that gnttab_get_frame() is
+ * first called for the highest numbered frame. This means that the
+ * out-of-bounds check will be done on the first iteration and, if
+ * the table needs to grow, it will only grow once.
+ */
+i = nr_frames;
+while ( i-- != 0 )
+{
+mfn_t mfn = gnttab_get_frame(d, frame + i);
+
+if ( mfn_eq(mfn, INVALID_MFN) )
+return -EINVAL;
+
+mfn_list[i] = mfn_x(mfn);
+}
+
+return 0;
+}
+
+static int xenmem_acquire_resource(xen_mem_acquire_resource_t *xmar)
+{
+struct domain *d, *currd = current->domain;
+unsigned long *mfn_list;
+int rc;
+
+if ( xmar->nr_frames == 0 )
+return -EINVAL;
+
+d = rcu_lock_domain_by_any_id(xmar->domid);
+if ( d == NULL )
+return -ESRCH;
+
+rc = xsm_domain_memory_map(XSM_TARGET, d);
+if ( rc )
+goto out;
+
+mfn_list = xmalloc_array(unsigned long, xmar->nr_frames);
+
+rc = -ENOMEM;
+if ( !mfn_list )
+goto out;
+
+switch ( xmar->type )
+{
+case XENMEM_resource_grant_table:
+rc = -EINVAL;
+if ( xmar->id ) /* must be zero for grant_table */
+break;
+
+rc = xenmem_acquire_grant_table(d, xmar->frame, xmar->nr_frames,
+mfn_list);
+break;
+
+default:
+rc = -EOPNOTSUPP;
+break;
+}
+
+if ( rc )
+goto free_and_out;
+
+if ( !paging_mode_translate(currd) )
+{
+if ( __copy_to_guest_offset(xmar->gmfn_list, 0, mfn_list,
+xmar->nr_frames) )
+rc = -EFAULT;
+}
+else
+{
+unsigned int i;
+
+for ( i = 0; i < xmar->nr_frames; i++ )
+{
+xen_pfn_t gfn;
+
+rc = -EFAULT;
+if ( __copy_from_guest_offset(, xmar->gmfn_list, i, 1) )
+goto free_and_out;
+
+rc = set_foreign_p2m_entry(currd, gfn, _mfn(mfn_list[i]));
+if ( rc )
+goto free_and_out;
+}
+}
+
+ free_and_out:
+xfree(mfn_list);
+
+ out:
+rcu_unlock_domain(d);
+return rc;
+}
+
 long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 int rc;
@@ -5163,6 +5264,16 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
+case XENMEM_acquire_resource:
+{
+xen_mem_acquire_resource_t xmar;
+
+if ( copy_from_guest(, arg, 1) )
+return -EFAULT;
+
+return xenmem_acquire_resource();
+}
+
 default:
 return subarch_memory_op(cmd, arg);
 }
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index e8a57d118c..c503a7f1d2 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1118,8 +1118,7 @@ static int set_typed_p2m_entry(struct domain *d, unsigned 
long gfn, mfn_t mfn,
 }
 
 /* Set foreign mfn in the given guest's p2m table. */
-static int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
- mfn_t mfn)
+int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
 return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign,
p2m_get_hostp2m(d)->default_access);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index

Re: [Xen-devel] Is possible to do GPU virtualization in Intel® Atom?

2017-08-02 Thread Asharaf Perinchikkal

Is possible to achieve GPU virtualization in Intel® Atom using para 
virtualization?

From: Roger Pau Monné [roger@citrix.com]
Sent: Wednesday, August 02, 2017 1:04 PM
To: Asharaf Perinchikkal
Cc: xen-devel@lists.xen.org; Anoop Babu
Subject: Re: [Xen-devel] Is possible to do GPU virtualization in Intel® Atom?

On Tue, Aug 01, 2017 at 10:01:01AM +, Asharaf Perinchikkal wrote:
> Hi All,
>
>
> In Intel® Atom™ E3845(MinnowBoard Turbot Quad-Core board) has only  support 
> for Virtualization Technology (VT-x).
>
> No support for Intel® Virtualization Technology for Directed I/O (VT-d). 
> [https://ark.intel.com/products/78475/Intel-Atom-Processor-E3845-2M-Cache-1_91-GHz]

Without VT-d (IOMMU) you won't be able to passthrough any physical
device to a guest, so no, you won't be able to do GPU passthrough (at
least in a safe way).

Roger.
---Disclaimer-- This e-mail contains PRIVILEGED AND 
CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If 
you are not the intended recipient, please notify the sender by e-mail and 
delete the original message. Opinions, conclusions and other information in 
this transmission that do not relate to the official business of QuEST Global 
and/or its subsidiaries, shall be understood as neither given nor endorsed by 
it. Any statements made herein that are tantamount to contractual obligations, 
promises, claims or commitments shall not be binding on the Company unless 
followed by written confirmation by an authorized signatory of the Company. 
---

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v5 4/8] mm: Scrub memory from idle loop

2017-08-02 Thread Jan Beulich

>>> Boris Ostrovsky  07/31/17 6:16 PM >>>
>On 07/31/2017 11:20 AM, Jan Beulich wrote:
> Boris Ostrovsky  07/23/17 4:14 AM >>>
> @@ -1050,17 +1120,42 @@ static void scrub_free_pages(unsigned int node)
> -if ( node_need_scrub[node] == 0 )
> -return;
> +if ( preempt || (node_need_scrub[node] == 0) )
> +goto out;
>   }
>   } while ( order-- != 0 );
>   }
> +
> + out:
> +spin_unlock(_lock);
> +node_clear(node, node_scrubbing);
> +return softirq_pending(cpu) || (node_to_scrub(false) != 
> NUMA_NO_NODE);
 While I can see why you use it here, the softirq_pending() looks sort of
 misplaced: While invoking it twice in the caller will look a little odd 
 too,
 I still think that's where the check belongs.
>>>
>>> scrub_free_pages is called from idle loop as
>>>
>>> else if ( !softirq_pending(cpu) && !scrub_free_pages() )
>>> pm_idle();
>>>
>>> so softirq_pending() is unnecessary here.
>>>
>>> (Not sure why you are saying it would be invoked twice)
>> That was sort of implicit - the caller would want to become
>>
>>
>> else if ( !softirq_pending(cpu) && !scrub_free_pages() && 
>> !softirq_pending(cpu) )
>> pm_idle();
>>
>> to account for the fact that a softirq may become pending while scrubbing.
>
>That would look really odd IMO.
>
>Would
>
>else if ( !softirq_pending(cpu) )
>if ( !scrub_free_pages() && !softirq_pending(cpu) )
>pm_idle();
>
>or 
>
>else if ( !softirq_pending(cpu) && !scrub_free_pages() )
>if ( !softirq_pending(cpu) )
>pm_idle();
>
>
>be better? (I'd prefer the first)

I dislike both (as we always as people to fold such chained if()s), and hence
would prefer my variant plus a suitable comment explaining the oddity.

Jan





___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2] rombios: prevent building with PIC/PIE

2017-08-02 Thread George Dunlap

On Mon, Jun 26, 2017 at 3:37 PM, Andrew Cooper
 wrote:
> On 26/06/17 14:00, Andrew Cooper wrote:
>> On 26/06/17 13:55, Olaf Hering wrote:
>>> If the default compiler silently defaults to to -fPIC/-fPIE building
>>> rombios fails:
>>>
>>>  ld -melf_i386 -s -r 32bitbios.o tcgbios/tcgbiosext.o util.o pmm.o -o 
>>> 32bitbios_all.o
>>>  There are undefined symbols in the BIOS:
>>>   U _GLOBAL_OFFSET_TABLE_
>>>  make[10]: *** [Makefile:26: 32bitbios_all.o] Error 11
>>>
>>> Prevent the failure by enforcing non-PIC/PIE mode.
>>>
>>> Signed-off-by: Olaf Hering 
>> Acked-by: Andrew Cooper 
>
> Committed, thanks.

This needs to be backported at least to 4.8 and 4.9; sounds like maybe
all supported versions as well?

 -George

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist

2017-08-02 Thread Jan Beulich

>>> Boris Ostrovsky  07/31/17 6:03 PM >>>
On 07/31/2017 10:45 AM, Jan Beulich wrote:
 Boris Ostrovsky  07/23/17 4:01 AM >>>
>> On 06/27/2017 01:06 PM, Jan Beulich wrote:
>> Boris Ostrovsky  06/22/17 8:55 PM >>>
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -88,7 +88,15 @@ struct page_info
>   /* Page is on a free list: ((count_info & PGC_count_mask) == 
> 0). */
>   struct {
>   /* Do TLBs need flushing for safety before next page use? */
> -bool_t need_tlbflush;
> +unsigned long need_tlbflush:1;
> +
> +/*
> + * Index of the first *possibly* unscrubbed page in the 
> buddy.
> + * One more than maximum possible order (MAX_ORDER+1) to
 Why +1 here and hence ...
>>> Don't we have MAX_ORDER+1 orders?
>> So here there might be a simple misunderstanding: I understand the
>> parenthesized MAX_ORDER+1 to represent "maximum possible
>> order", i.e. excluding the "one more than", not the least because of
>> the ...
>>
 + * accommodate INVALID_DIRTY_IDX.
 + */
 +#define INVALID_DIRTY_IDX (-1UL & (((1UL<> +2 here.
>>
 ... why +2 instead of +1? And isn't the expression INVALID_DIRTY_IDX 
 wrongly
 parenthesized (apart from lacking blanks around the shift operator)? I'd
 expect you want a value with MAX_ORDER+1 set bits, i.e.
 (1UL << (MAX_ORDER + 1)) - 1. ANDing with -1UL seems quite pointless too.
>>> Yes to parentheses and AND. Should be (1UL << (MAX_ORDER + 2)) - 1
>> I.e. I would still expect it to be (1UL << (MAX_ORDER + 1)) - 1
>> here.
>
>
>Sorry, I still don't get it.
>
>Say, MAX_ORDER is 1. Since this implies that indexes 0, 1, 2 and 3 are
>all valid (because we can have up to 2^(MAX_ORDER+1) pages), don't we
>need 3 bits to indicate an invalid index?

Why 0, 1, 2, and 3? When MAX_ORDER is 1, we only have a single bit, i.e.
valid values 0 and 1 (plus one more for the invalid indicator), i.e. need 2 bits
for representation of all used values.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 3/5] xen/livepatch/ARM32: Don't load and crash on livepatches loaded with wrong alignment.

2017-08-02 Thread Jan Beulich

>>> Konrad Rzeszutek Wilk  07/31/17 6:04 PM >>>
>On Mon, Jul 31, 2017 at 07:55:34AM -0600, Jan Beulich wrote:
>> >>> Konrad Rzeszutek Wilk  07/26/17 9:50 PM >>>
>> >--- a/docs/misc/livepatch.markdown
>> >+++ b/docs/misc/livepatch.markdown
>> >@@ -279,6 +279,10 @@ It may also have some architecture-specific sections. 
>> >For example:
>> >* Exception tables.
>> >* Relocations for each of these sections.
>>  >
>> >+Note that on ARM 32 the sections SHOULD be four byte aligned. Otherwise
>> >+we risk hitting Data Abort exception as un-aligned manipulation of data is
>> >+prohibited on ARM 32.
>> 
>> This (and hence the rest of the patch) is not in line with the outcome of the
>> earlier discussion we had. Nothing is wrong with a section having smaller
>> alignment, as long as there are no 32-bit (or wider, but I don't think there
>> are any such) relocations against such a section. And even if there were, I
>> think it should rather be the code doing the relocations needing to cope, as
>> I don't think the ARM ELF ABI imposes any such restriction.
>
>The idea behind this patch is to give advance warnings. Akin to what
>2ff229643b739e2fd0cd0536ee9fca506cfa92f8
>"xen/livepatch: Don't crash on encountering STN_UNDEF relocations" did.
>
>The other patches in this series fix the alignment issues.
>
>The ARM ELF ABI 
>(http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044f/IHI0044F_aaelf.pdf)
>
>says:
>
>4.3.5 Section Alignment
>There is no minimum alignment required for a section. However, sections 
>containing thumb code must be at least
>16-bit aligned and sections containing ARM code must be at least 32-bit 
>aligned.
>Platform standards may set a limit on the maximum alignment that they can 
>guarantee (normally the page size).

Note the "thumb code" and "ARM code" in here - iirc you're checking _all_
sections, not just ones containing code.

Jan





___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] xen/link: Move .data.rel.ro sections into .rodata for final link

2017-08-02 Thread Jan Beulich

>>> David Woodhouse  07/31/17 5:57 PM >>>
>There is a first call to efi_arch_relocate_image(0) before the
>ExitBootServices() call. However I'm missing something because I can't
>see how that call achieves anything *other* than to trigger the fault
>we're concerned about.
>
>There are three types of relocations — PE_BASE_RELOC_ABS,
>PE_BASE_RELOC_HIGHLOW, and PE_BASE_RELOC_DIR64.
>
>The first (ABS) doesn't seem to do anything, ever. And is never emitted
>by mkrelocs.c. 
>
>The second (HIGHLOW) does nothing if (!delta).
>
>The third (DIR64) simply adds 'delta' to the target address. We could
>potentially stop it faulting on that pointless '*addr += 0' by doing
>this...
>
>--- a/xen/arch/x86/efi/efi-boot.h
>+++ b/xen/arch/x86/efi/efi-boot.h
>@@ -87,7 +87,8 @@ static void __init efi_arch_relocate_image(unsigned long 
>delta)
>case PE_BASE_RELOC_DIR64:
>if ( in_page_tables(addr) )
>blexit(L"Unexpected relocation type");
>-*(u64 *)addr += delta;
>+if ( delta )
>+*(u64 *)addr += delta;

Yes, that's what I had described as "adjustment to efi_arch_relocate_image()".

>break;
>default:
>blexit(L"Unsupported relocation type");
>
>
>... but then again, if the whole function is really doing nothing at
>all when invoked with delta==0 then perhaps it would just be easier to
>remove the first pass altogether. I feel sure I'm missing something,

The reason is even visible in context above: We can't call blexit() after
having called ExitBootServices(). Hence we need a "dry run" done early,
to be certain we won't encounter unhandlable relocations later on.

>Either way, this is still broken before my patch though, right?

As long as there are .rodata entries needing relocation (which I
understand you've found example of), yes.

>Especially if UEFI learns to do it for non-executable sections, but
>AFAICT even before that.
>
>These are the sections I see the PE section headers of a local build:
>
>Name Characteristics   Relocations
>.text0xe0d00020 (WRX)✓
>.rodata  0x40600040 ( R )✓
>.buildid 0x40300040 ( R )
>.init.te 0x60500020 ( RX)✓
>.init.da 0xc0d00040 (WR )✓
>.data.re 0xc0800040 (WR )✓
>.data0xc0d00040 (WR )✓
>.bss 0xc180 (WR )
>.reloc   0x42300040 ( R )
>.pad 0xc0300080 (WR )
>
>So there are (again, before my patch) relocations in .init.da(ta) and
>.rodata sections which UEFI *might* start marking read-only, and also
>in .init.te(xt) which is R+X and could be marked read-only today.

Not sure why you mention .init.data, but yes, .init.text could have the
same issue. But here it would probably be better to simply mark the
section WRX.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing

2017-08-02 Thread Jan Beulich

>>> Boris Ostrovsky  07/23/17 4:28 AM >>>
>On 06/27/2017 03:28 PM, Jan Beulich wrote:
> Boris Ostrovsky  06/22/17 8:56 PM >>>
>>> +static void check_and_stop_scrub(struct page_info *head)
>>> +{
>>> +if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
>>> +{
>>> +struct page_info pg;
>>> +
>>> +head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
>>> +spin_lock_kick();
>>> +for ( ; ; )
>>> +{
>>> +/* Can't ACCESS_ONCE() a bitfield. */
>>> +pg.u.free.val = ACCESS_ONCE(head->u.free.val);
>> 
>> Something like ACCESS_ONCE(head->u.free).val ought to work (or read_atomic(),
>> due to the questionable scalar type check in ACCESS_ONCE()).
>
>Hmm... I couldn't get this to work with either suggestion.
>
>page_alloc.c:751:13: error: conversion to non-scalar type requested
>pg.u.free = read_atomic(>u.free);
>
>page_alloc.c:753:6: error: conversion to non-scalar type requested
>if ( ACCESS_ONCE(head->u.free).scrub_state != BUDDY_SCRUB_ABORT )

Oh, indeed. That's rather unfortunate.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] [distros-debian-squeeze test] 71930: tolerable trouble: broken/fail/pass

2017-08-02 Thread Platform Team regression test user

flight 71930 distros-debian-squeeze real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/71930/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 build-arm64-pvops 2 hosts-allocate   broken like 71702
 build-arm64   2 hosts-allocate   broken like 71702
 build-arm64   3 capture-logs broken like 71702
 build-arm64-pvops 3 capture-logs broken like 71702
 test-amd64-i386-amd64-squeeze-netboot-pygrub 10 debian-di-install fail like 
71702
 test-amd64-i386-i386-squeeze-netboot-pygrub 10 debian-di-install fail like 
71702
 test-amd64-amd64-i386-squeeze-netboot-pygrub 10 debian-di-install fail like 
71702
 test-amd64-amd64-amd64-squeeze-netboot-pygrub 10 debian-di-install fail like 
71702

baseline version:
 flight   71702

jobs:
 build-amd64  pass
 build-arm64  broken  
 build-armhf  pass
 build-i386   pass
 build-amd64-pvopspass
 build-arm64-pvopsbroken  
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-amd64-squeeze-netboot-pygrubfail
 test-amd64-i386-amd64-squeeze-netboot-pygrub fail
 test-amd64-amd64-i386-squeeze-netboot-pygrub fail
 test-amd64-i386-i386-squeeze-netboot-pygrub  fail



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] Is possible to do GPU virtualization in Intel® Atom?

2017-08-02 Thread Roger Pau Monné

On Tue, Aug 01, 2017 at 10:01:01AM +, Asharaf Perinchikkal wrote:
> Hi All,
> 
> 
> In Intel® Atom™ E3845(MinnowBoard Turbot Quad-Core board) has only  support 
> for Virtualization Technology (VT-x).
> 
> No support for Intel® Virtualization Technology for Directed I/O (VT-d). 
> [https://ark.intel.com/products/78475/Intel-Atom-Processor-E3845-2M-Cache-1_91-GHz]

Without VT-d (IOMMU) you won't be able to passthrough any physical
device to a guest, so no, you won't be able to do GPU passthrough (at
least in a safe way).

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

[Xen-devel] OSSTest hardware maintenance

2017-08-02 Thread Wei Liu

Hi all

OSSTest is shut down for maintenance at the moment. It will be brought
back either later today or tomorrow. I will send out another email when
the maintenance is done.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v2 00/13] "Non-shared" IOMMU support on ARM

2017-08-02 Thread Tian, Kevin

> From: Oleksandr Tyshchenko [mailto:olekst...@gmail.com]
> Sent: Tuesday, August 1, 2017 7:08 PM
> 
> Hi, Kevin
> 
> On Tue, Aug 1, 2017 at 6:06 AM, Tian, Kevin  wrote:
> >> From: Oleksandr Tyshchenko [mailto:olekst...@gmail.com]
> >> Sent: Monday, July 31, 2017 7:58 PM
> >>
> >> Hi, Kevin
> >>
> >> On Mon, Jul 31, 2017 at 8:57 AM, Tian, Kevin 
> wrote:
> >> >> From: Oleksandr Tyshchenko
> >> >> Sent: Wednesday, July 26, 2017 1:27 AM
> >> >>
> >> >> From: Oleksandr Tyshchenko 
> >> >>
> >> >> Hi, all.
> >> >>
> >> >> The purpose of this patch series is to create a base for porting
> >> >> any "Non-shared" IOMMUs to Xen on ARM. Saying "Non-shared"
> IOMMU
> >> I
> >> >> mean
> >> >> the IOMMU that can't share the page table with the CPU.
> >> >
> >> > Is "non-shared" IOMMU a standard terminology in ARM side? I quickly
> >> > searched to find it mostly used in this thread...
> >> I don't think that it is a standard terminology.
> >>
> >> >
> >> > On the other hand, all IOMMUs support a basic DMA remapping
> >> > mechanism with page table not shared with CPU. Then some IOMMUs
> >> > may optional support Shared Virtual Memory (SVM) through page
> >> > sharing with CPU. Then I'm not sure why need to highlight the
> >> > "non-shared" manner in this thread, instead of just saying
> >> > IPMMU-VMSA support...
> >> I wouldn't use "IPMMU-VMSA support" in this thread since it may be any
> >> other IOMMUs which can't share page table
> >> with CPU because of format incompatibilities.
> >
> > As I commented you can assume all IOMMUs cannot share page
> > table with CPU as the starting point. It's not good to name an IOMMU
> > driver based on such fact.
> >
> >> I needed something short to describe such IOMMUs, but, If title
> >> "non-shared" IOMMU sounds confusing
> >> I won't use it anymore. Do you have something in mind?
> >
> > IOMMU driver needs to be vendor specific. Is your driver working
> > for all IPMMU-VMSA compatible IOMMUs or only for Renesas?
> > If the latter, you may make the name explicit for such purpose.
> >
> > btw since you're porting Linux driver to Xen. What 's the name
> > used in Linux side? that should be a good reference to you.
> 
> I am afraid a misunderstanding took place. Let me elaborate a bit more
> about this.
> 
> The IOMMU driver I am porting to Xen is IPMMU-VMSA [1]. This name is
> used in Linux
> and this name was retained during porting to Xen. This driver is
> intended to work with Renesas VMSA-compatible IPMMUs.
> But, IPMMU-VMSA support is not a target for the current thread, there
> is another thread for adding it [2].
> 
> The purpose of the current thread is to create ground for IPMMU-VMSA
> IOMMUs
> (as well as other IOMMUs which can't share page table with CPU because
> of format incompatibilities) to be functional inside Xen on ARM.
> The only IOMMU supported today in Xen on ARM is the ARM SMMU (which
> uses the same page table format as the CPU and can share page table
> with it).
> And ARM specific code assumes that P2M table is *always* shared and
> acts accordingly. So, this patch series is trying
> to, let's say, brake this assumption and create environment to handle
> such IOMMUs as well.
> So, I may use the whole sentence as a patch series title in order not
> to confuse people:
> "Support IOMMUs which don't share page table with the CPU on ARM"
> No objections?

well, I saw where disconnect comes. My context when reviewing
this message is right around thinking some about Shared Virtual
Memory (SVM), which is a feature to allow IOMMU sharing same
CPU page table for VA->PA so user application can directly send
VA to device to do DMA. In virtualization it means IOMMU supports
two-level translation with 1st level for GVA -> GPA and 2nd level
for GPA->HPA. 1st level directly uses guest CPU page table. That 
is an optional feature not supported by all IOMMUs.

while your work is really about IOMMU sharing CPU EPT page
table (GPA->HPA) (sorry I don't know ARM's term for EPT), and
for this usage some ARM SMMUs have compatible format while
others may not, which is why you introduced the "non-shared"
model here.

I'm not sure whether it's worthy of differentiating two usages
in your subject line, since SVM support is not in Xen today. And
from Julien's comment looks people don't have confusion on
its meaning. Then I'm fine with your original description. it's
Julien's call. :-)

Thanks
Kevin
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

81 matches

Mail list logo