Re: [PATCH 16/16] x86/P2M: the majority for struct p2m_domain's fields are HVM-only

2022-02-15 Thread Jan Beulich
On 14.02.2022 17:07, Jan Beulich wrote:
> On 14.02.2022 16:51, George Dunlap wrote:
>>> On Jul 5, 2021, at 5:15 PM, Jan Beulich  wrote:
>>>
>>> ..., as are the majority of the locks involved. Conditionalize things
>>> accordingly.
>>>
>>> Also adjust the ioreq field's indentation at this occasion.
>>>
>>> Signed-off-by: Jan Beulich 
>>
>> Reviewed-by: George Dunlap 
> 
> Thanks.
> 
>> With one question…
>>
>>> @@ -905,10 +917,10 @@ int p2m_altp2m_propagate_change(struct d
>>> /* Set a specific p2m view visibility */
>>> int p2m_set_altp2m_view_visibility(struct domain *d, unsigned int idx,
>>>uint8_t visible);
>>> -#else
>>> +#else /* CONFIG_HVM */
>>> struct p2m_domain *p2m_get_altp2m(struct vcpu *v);
>>> static inline void p2m_altp2m_check(struct vcpu *v, uint16_t idx) {}
>>> -#endif
>>> +#endif /* CONFIG_HVM */
>>
>> This is relatively minor, but what’s the norm for how to label #else 
>> macros here?  Wouldn’t you normally see “#endif /* CONFIG_HVM */“ and think 
>> that the immediately preceding lines are compiled only if CONFIG_HVM is 
>> defined?  I.e., would this be more accurate to write “!CONFIG_HVM” here?
>>
>> I realize in this case it’s not a big deal since the #else is just three 
>> lines above it, but since you took the time to add the comment in there, it 
>> seems like it’s worth the time to have a quick think about whether that’s 
>> the right thing to do.
> 
> Hmm, yes, let me make this !CONFIG_HVM. I think we're not really
> consistent with this, but I agree it's more natural like you say.

Coming to write a similar construct elsewhere, I've realized this is
odd. Looking through include/asm/, the model generally used is

#ifdef CONFIG_xyz
#else /* !CONFIG_xyz */
#endif /* CONFIG_xyz */

That's what I'll switch to here then as well.

Jan




Re: [PATCH v2 2/2] xen/include/public: deprecate GNTTABOP_transfer

2022-02-15 Thread Juergen Gross

On 15.02.22 22:13, Julien Grall wrote:

Hi Juergen,

On 03/02/2022 13:14, Juergen Gross wrote:

Add a comment to include/public/grant_table.h that GNTTABOP_transfer
is deprecated, in order to discourage new use cases.


 From the commit message, it is unclear to me why we are discouraging 
new use cases and indirectly encouraging current users to move away from 
the feature.


Patch #1 seems to imply this is because the feature is not present in 
Linux upstream. But I don't think this is a sufficient reason to 
deprecate a feature.


A more compelling reason would be that the feature is broken and too 
complex to fix it.


So can you provide more details?


It is a feature available for PV domains only, and it is very complex
and hasn't been tested for ages.


As a side note, should we also update SUPPORT.md?


Good question.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[qemu-mainline test] 168123: tolerable FAIL - PUSHED

2022-02-15 Thread osstest service owner
flight 168123 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168123/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail REGR. vs. 168120

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168120
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168120
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168120
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168120
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168120
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168120
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168120
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168120
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 qemuuad38520bdeb2b1e0b487db317f29119e94c1c88d
baseline version:
 qemuue56d873f0ed9f7ed35b40cc1be841bf7f22db690

Last test of basis   168120  2022-02-15 14:08:16 Z0 days
Testing same since   168123  2022-02-15 23:08:13 Z0 days1 attempts

---

[PATCH v2] tools: remove xenstore entries on vchan server closure

2022-02-15 Thread Oleksandr Andrushchenko
From: Oleksandr Andrushchenko 

vchan server creates XenStore entries to advertise its event channel and
ring, but those are not removed after the server quits.
Add an additional cleanup step to remove them, so that clients do not try
to connect to a non-existing server.

Signed-off-by: Oleksandr Andrushchenko 

---
Since v1:
- add NULL check after strdup
---
 tools/include/libxenvchan.h |  5 +
 tools/libs/vchan/init.c | 25 +
 tools/libs/vchan/io.c   |  4 
 tools/libs/vchan/vchan.h| 31 +++
 4 files changed, 65 insertions(+)
 create mode 100644 tools/libs/vchan/vchan.h

diff --git a/tools/include/libxenvchan.h b/tools/include/libxenvchan.h
index d6010b145df2..30cc73cf97e3 100644
--- a/tools/include/libxenvchan.h
+++ b/tools/include/libxenvchan.h
@@ -86,6 +86,11 @@ struct libxenvchan {
int blocking:1;
/* communication rings */
struct libxenvchan_ring read, write;
+   /**
+* Base xenstore path for storing ring/event data used by the server
+* during cleanup.
+* */
+   char *xs_path;
 };
 
 /**
diff --git a/tools/libs/vchan/init.c b/tools/libs/vchan/init.c
index c8510e6ce98a..ae9a6b579753 100644
--- a/tools/libs/vchan/init.c
+++ b/tools/libs/vchan/init.c
@@ -46,6 +46,8 @@
 #include 
 #include 
 
+#include "vchan.h"
+
 #ifndef PAGE_SHIFT
 #define PAGE_SHIFT 12
 #endif
@@ -251,6 +253,12 @@ static int init_xs_srv(struct libxenvchan *ctrl, int domain, const char* xs_base
char ref[16];
char* domid_str = NULL;
xs_transaction_t xs_trans = XBT_NULL;
+
+   // store the base path so we can clean up on server closure
+   ctrl->xs_path = strdup(xs_base);
+   if (!ctrl->xs_path)
+   goto fail;
+
xs = xs_open(0);
if (!xs)
goto fail;
@@ -298,6 +306,23 @@ retry_transaction:
return ret;
 }
 
+void close_xs_srv(struct libxenvchan *ctrl)
+{
+   struct xs_handle *xs;
+
+   if (!ctrl->xs_path)
+   return;
+
+   xs = xs_open(0);
+   if (!xs)
+   goto fail;
+
+   xs_rm(xs, XBT_NULL, ctrl->xs_path);
+
+fail:
+   free(ctrl->xs_path);
+}
+
 static int min_order(size_t siz)
 {
int rv = PAGE_SHIFT;
diff --git a/tools/libs/vchan/io.c b/tools/libs/vchan/io.c
index da303fbc01ca..1f201ad554f2 100644
--- a/tools/libs/vchan/io.c
+++ b/tools/libs/vchan/io.c
@@ -40,6 +40,8 @@
 #include 
 #include 
 
+#include "vchan.h"
+
 #ifndef PAGE_SHIFT
 #define PAGE_SHIFT 12
 #endif
@@ -384,5 +386,7 @@ void libxenvchan_close(struct libxenvchan *ctrl)
if (ctrl->gnttab)
xengnttab_close(ctrl->gnttab);
}
+   if (ctrl->is_server)
+   close_xs_srv(ctrl);
free(ctrl);
 }
diff --git a/tools/libs/vchan/vchan.h b/tools/libs/vchan/vchan.h
new file mode 100644
index ..621016ef42e5
--- /dev/null
+++ b/tools/libs/vchan/vchan.h
@@ -0,0 +1,31 @@
+/**
+ * @file
+ * @section AUTHORS
+ *
+ * Copyright (C) 2021 EPAM Systems Inc.
+ *
+ * @section LICENSE
+ *
+ *  This library is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU Lesser General Public
+ *  License as published by the Free Software Foundation; either
+ *  version 2.1 of the License, or (at your option) any later version.
+ *
+ *  This library is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  Lesser General Public License for more details.
+ *
+ *  You should have received a copy of the GNU Lesser General Public
+ *  License along with this library; If not, see .
+ *
+ * @section DESCRIPTION
+ *
+ *  This file contains common libxenvchan declarations.
+ */
+#ifndef LIBVCHAN_H
+#define LIBVCHAN_H
+
+void close_xs_srv(struct libxenvchan *ctrl);
+
+#endif /* LIBVCHAN_H */
-- 
2.25.1




Re: [PATCH] docs: add some clarification to xenstore-migration.md

2022-02-15 Thread Juergen Gross

On 15.02.22 21:40, Julien Grall wrote:

Hi Juergen,

On 10/02/2022 11:26, Juergen Gross wrote:

The Xenstore migration document is missing the specification that a
node record must be preceded by the record of its parent node in case
of live update.

Add that missing part.

Signed-off-by: Juergen Gross 
---
  docs/designs/xenstore-migration.md | 4 
  1 file changed, 4 insertions(+)

diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
index 5f1155273e..39e31c984b 100644
--- a/docs/designs/xenstore-migration.md
+++ b/docs/designs/xenstore-migration.md
@@ -316,6 +316,10 @@ a _committed_ node (globally visible in xenstored) or a _pending_ node (created
  or modified by a transaction for which there is also a `TRANSACTION_DATA`
  record previously present).
+In the live update case the _committed_ nodes must be in a top-down sequence,
+i.e. the first node needs to be `/`, and each other node in the stream must
+come _after_ its parent node.


I would actually expect the same restriction to apply for the 
non-liveupdate case. I.e. we want the parent to either exist in the tree 
or the record for the parent to be before in the stream.


Hmm, true. I'll rephrase that.


Juergen




Re: [PATCH 4/5] xen/cpupool: Create different cpupools at boot time

2022-02-15 Thread Juergen Gross

On 15.02.22 18:56, Luca Fancellu wrote:




On 15 Feb 2022, at 10:48, Juergen Gross  wrote:

On 15.02.22 11:15, Luca Fancellu wrote:

Introduce an architecture specific way to create different cpupools
at boot time; this is particularly useful on ARM big.LITTLE systems
where there might be the need to have different cpupools for each type
of core, but also on systems using NUMA, which can have different
cpupools for each node.
The feature on arm relies on a specification of the cpupools from the
device tree to build pools and assign cpus to them.
Documentation is created to explain the feature.
Signed-off-by: Luca Fancellu 


IIRC I suggested to have the core functionality in common code in order
to allow using boot time cpupool creation e.g. via commandline for x86,
too.


Yes, however I think the parser to handle everything by command line would
be huge due to input sanitisation, and not as easy to use as the DT. However,
I see Hyperlaunch has plans to use DT on x86, so I guess it would be ok to make
this feature common once the DT is available also on x86.


Everything not being explicitly specific to Arm should be in common
code. Think of the work in progress for Risc-V.


Juergen




Re: [PATCH 2/5] xen/sched: create public function for cpupools creation

2022-02-15 Thread Juergen Gross

On 15.02.22 18:50, Luca Fancellu wrote:




On 15 Feb 2022, at 10:38, Juergen Gross  wrote:

On 15.02.22 11:15, Luca Fancellu wrote:

Create new public function to create cpupools, it checks for pool id
uniqueness before creating the pool and can take a scheduler id or
a negative value that means the default Xen scheduler will be used.
Signed-off-by: Luca Fancellu 


Reviewed-by: Juergen Gross 

with one further question: you are allowing the use of another scheduler,
but what if someone wants to set non-standard scheduling parameters
(e.g. another time slice)?


I guess for now parameters can be tuned only by xl tool, however it would
be possible as future work to allow parameters in the device tree for each
scheduler.


That is basically my concern here: A true dom0less setup won't have the
possibility to use xl...

I don't mind this series not supporting that scheme, but the chosen
syntax for the device tree should support that extension (I have not
looked into that, as I have no detailed knowledge about that topic).


Juergen




Re: [PATCH 1/5] tools/cpupools: Give a name to unnamed cpupools

2022-02-15 Thread Juergen Gross

On 15.02.22 18:48, Luca Fancellu wrote:




On 15 Feb 2022, at 10:33, Juergen Gross  wrote:

On 15.02.22 11:15, Luca Fancellu wrote:

With the introduction of boot time cpupools, Xen can create many
different cpupools at boot time other than cpupool with id 0.
Since these newly created cpupools don't have an
entry in Xenstore, create the entry using the xen-init-dom0
helper with the usual convention: Pool-.
Given the change, remove the check for poolid == 0 from
libxl_cpupoolid_to_name(...).
Signed-off-by: Luca Fancellu 
---
  tools/helpers/xen-init-dom0.c  | 26 +-
  tools/libs/light/libxl_utils.c |  3 +--
  2 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/tools/helpers/xen-init-dom0.c b/tools/helpers/xen-init-dom0.c
index c99224a4b607..3539f56faeb0 100644
--- a/tools/helpers/xen-init-dom0.c
+++ b/tools/helpers/xen-init-dom0.c
@@ -43,7 +43,10 @@ int main(int argc, char **argv)
  int rc;
  struct xs_handle *xsh = NULL;
  xc_interface *xch = NULL;
-char *domname_string = NULL, *domid_string = NULL;
+char *domname_string = NULL, *domid_string = NULL, *pool_string = NULL;


Hi Juergen,



pool_string seems to be unused.


I will remove it




+char pool_path[strlen("/local/pool") + 12], pool_name[strlen("Pool-") + 5];


I don't like that. Why don't you use pointers and ...


+xc_cpupoolinfo_t *xcinfo;
+unsigned int pool_id = 0;
  libxl_uuid uuid;
/* Accept 0 or 1 argument */
@@ -114,6 +117,27 @@ int main(int argc, char **argv)
  goto out;
  }
  +/* Create an entry in xenstore for each cpupool on the system */
+do {
+xcinfo = xc_cpupool_getinfo(xch, pool_id);
+if (xcinfo != NULL) {
+if (xcinfo->cpupool_id != pool_id)
+pool_id = xcinfo->cpupool_id;
+snprintf(pool_path, sizeof(pool_path), "/local/pool/%d/name",
+ pool_id);
+snprintf(pool_name, sizeof(pool_name), "Pool-%d", pool_id);


... use asprintf() here for allocating the strings in the needed size?


Why would you prefer this approach? I was trying to avoid allocation/free
operations in a loop that would also need more code to check cases in which
memory is not allocated. Instead, with the buffers on the stack we don’t have 
problems.


My main concerns are the usage of strlen() for sizing an on-stack array,
the duplication of the format strings (once in the arrays definition and
once in the snprintf()), and the mixture of strlen() and constants for
sizing the arrays.

There are actually some errors in your approach for sizing the arrays,
showing how fragile your solution is: you are allowing a "positive
integer number" for a cpupool-id, which could easily need 10 digits,
while your arrays allow only for 5 or 4 digits, depending on the array.

And doing the two asprintf() calls and then checking that both arrays
are not NULL isn't that much code. BTW, your approach is missing the
test that the arrays have been large enough.

The performance of that loop shouldn't be that critical that a few
additional microseconds really hurt, especially as I don't think any
use case will exceed single digit loop iterations.






+pool_id++;
+if (!xs_write(xsh, XBT_NULL, pool_path, pool_name,
+  strlen(pool_name))) {
+fprintf(stderr, "cannot set pool name\n");
+rc = 1;
+}
+xc_cpupool_infofree(xch, xcinfo);
+if (rc)
+goto out;


Moving the call of xc_cpupool_infofree() ahead of the call of xs_write()
would drop the need for this last if statement, as you could add the
goto to the upper if.


Yes you are right, it would simplify the code




+}
+} while(xcinfo != NULL);
+


With doing all of this for being able to assign other domains created
at boot to cpupools, shouldn't you add names for other domains than dom0
here, too?


This series is more about cpupools; maybe I can do that in another commit
outside of this series.


Fine with me.


Juergen




[linux-linus test] 168121: tolerable FAIL - PUSHED

2022-02-15 Thread osstest service owner
flight 168121 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168121/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail  like 168113
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 168113
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168113
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168113
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 168113
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168113
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168113
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168113
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168113
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 linux705d84a366cfccda1e7aec1113a5399cd2ffee7d
baseline version:
 linuxd567f5db412ed52de0b3b3efca4a451263de6108

Last test of basis   168113  2022-02-14 19:11:17 Z1 days
Testing same since   168121  2022-02-15 17:42:38 Z0 days1 attempts


People who touched revisions under test:
  Andy Shevchenko 
  Cai Huoqing 
  David Sterba 
  Dāvis Mosāns 
  Filipe Manana 
  Helge Deller 
  John David Anglin 
  Linus Torvalds 
  Long Li 
  Lorenzo Pieralisi 
  Miaoqian Lin 
  Michael Kelley 
  Nathan Chancellor 
  Purna Pavan Chandra Aekkaladevi 
  Qu Wenruo 
  Randy Dunlap 
  Wei Liu 


Re: [PATCH 5/5] arm/dom0less: assign dom0less guests to cpupools

2022-02-15 Thread Stefano Stabellini
On Tue, 15 Feb 2022, Luca Fancellu wrote:
> Introduce a domain-cpupool property of the xen,domain device tree node
> that specifies the device tree handle of a xen,cpupool node identifying
> a cpupool created at boot time, to which the guest will be assigned on
> creation.
> 
> Add member to the xen_arch_domainconfig public interface so the
> XEN_DOMCTL_INTERFACE_VERSION version is bumped.
> 
> Update documentation about the property.
> 
> Signed-off-by: Luca Fancellu 
> ---
>  docs/misc/arm/device-tree/booting.txt | 5 +
>  xen/arch/arm/domain.c | 6 ++
>  xen/arch/arm/domain_build.c   | 9 -
>  xen/arch/x86/domain.c | 6 ++
>  xen/common/domain.c   | 5 -
>  xen/include/public/arch-arm.h | 2 ++
>  xen/include/public/domctl.h   | 2 +-
>  xen/include/xen/domain.h  | 3 +++
>  8 files changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
> index 71895663a4de..0f1f210fa449 100644
> --- a/docs/misc/arm/device-tree/booting.txt
> +++ b/docs/misc/arm/device-tree/booting.txt
> @@ -182,6 +182,11 @@ with the following properties:
>  Both #address-cells and #size-cells need to be specified because
>  both sub-nodes (described shortly) have reg properties.
>  
> +- domain-cpupool
> +
> +Optional. Handle to a xen,cpupool device tree node that identifies the
> +cpupool where the guest will be started at boot.
> +
>  Under the "xen,domain" compatible node, one or more sub-nodes are present
>  for the DomU kernel and ramdisk.
>  
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 92a6c509e5c5..be350b28b588 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -788,6 +788,12 @@ fail:
>  return rc;
>  }
>  
> +unsigned int
> +arch_get_domain_cpupool_id(const struct xen_domctl_createdomain *config)
> +{
> +return config->arch.cpupool_id;
> +}
> +
>  void arch_domain_destroy(struct domain *d)
>  {
>  /* IOMMU page table is shared with P2M, always call
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 6931c022a2e8..4f239e756775 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -3015,7 +3015,8 @@ static int __init construct_domU(struct domain *d,
>  void __init create_domUs(void)
>  {
>  struct dt_device_node *node;
> -const struct dt_device_node *chosen = dt_find_node_by_path("/chosen");
> +const struct dt_device_node *cpupool_node,
> +*chosen = dt_find_node_by_path("/chosen");
>  
>  BUG_ON(chosen == NULL);
>  dt_for_each_child_node(chosen, node)
> @@ -3053,6 +3054,12 @@ void __init create_domUs(void)
>   GUEST_VPL011_SPI - 32 + 1);
>  }
>  
> +/* Get the optional property domain-cpupool */
> +cpupool_node = dt_parse_phandle(node, "domain-cpupool", 0);
> +if ( cpupool_node )
> +dt_property_read_u32(cpupool_node, "cpupool-id",
> + &d_cfg.arch.cpupool_id);

Is this the reason to make "cpupool-id" mandatory in device tree? If so,
I think we can avoid it. Instead, we could:

- generate the cpupool-id in xen/arch/arm/cpupool.c
- keep track of cpupool-id <--> "cpupool name", where "cpupool name"
  is the node name in device tree (sibling node names are unique in
  device tree). (Alternatively we could use the phandle.) We could have
  a __initdata pool_names_map to convert cpupool names to cpupool-ids.
- here retrieve the cpupool_id from the cpupool node name


>  /*
>   * The variable max_init_domid is initialized with zero, so here it's
>   * very important to use the pre-increment operator to call
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index ef1812dc1402..3e3cf88c9c82 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -880,6 +880,12 @@ int arch_domain_create(struct domain *d,
>  return rc;
>  }
>  
> +unsigned int
> +arch_get_domain_cpupool_id(const struct xen_domctl_createdomain *config)
> +{
> +return 0;
> +}
> +
>  void arch_domain_destroy(struct domain *d)
>  {
>  if ( is_hvm_domain(d) )
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 2048ebad86ff..d42ca8292025 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -665,6 +665,8 @@ struct domain *domain_create(domid_t domid,
>  
>  if ( !is_idle_domain(d) )
>  {
> +unsigned int domain_cpupool_id;
> +
>  watchdog_domain_init(d);
>  init_status |= INIT_watchdog;
>  
> @@ -698,7 +700,8 @@ struct domain *domain_create(domid_t domid,
>  if ( !d->pbuf )
>  goto fail;
>  
> -if ( (err = sched_init_domain(d, 0)) != 0 )
> +domain_cpupool_id = arch_get_domain_cpupool_id(config);
> +if ( (err = sched_init_domain(d, doma

Re: [PATCH v2 02/70] xen/sort: Switch to an extern inline implementation

2022-02-15 Thread Stefano Stabellini
On Mon, 14 Feb 2022, Julien Grall wrote:
> On 14/02/2022 12:50, Andrew Cooper wrote:
> > There are exactly 3 callers of sort() in the hypervisor.  Callbacks in a
> > tight
> > loop like this are problematic for performance, especially with Spectre v2
> > protections, which is why extern inline is used commonly by libraries.
> > 
> > Both ARM callers pass in NULL for the swap function, and while this might
> > seem
> > like an attractive option at first, it causes generic_swap() to be used,
> > which
> > forced a byte-wise copy.  Provide real swap functions so the compiler can
> > optimise properly, which is very important for ARM downstreams where
> > milliseconds until the system is up matters.
> 
> Did you actually benchmark it? Both those lists will have < 128 elements in
> them. So I would be extremely surprised if you save more than a few hundred
> microseconds with this approach.
> 
> So, my opinion on this approach hasn't changed. On v1, we discussed an
> approach that would suit both Stefano and I. Jan seemed to confirm that would
> also suit x86.


This patch series has become 70 patches and for the sake of helping
Andrew move forward in the quickest and most painless way possible, I
append the following using generic_swap as static inline.

Julien, Bertrand, is that acceptable to you?

Andrew, I know this is not your favorite approach but you have quite a
lot of changes to handle -- probably not worth focussing on one detail
which is pretty minor?


---
xen/sort: Switch to an extern inline implementation

There are exactly 3 callers of sort() in the hypervisor.  Callbacks in a tight
loop like this are problematic for performance, especially with Spectre v2
protections, which is why extern inline is used commonly by libraries.

Make generic_swap() a static inline and use it at the two ARM call
sites.

No functional change.

Signed-off-by: Andrew Cooper 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Jan Beulich 
---
 xen/arch/arm/bootfdt.c |  2 +-
 xen/arch/arm/io.c  |  2 +-
 xen/include/xen/sort.h | 67 ++-
 xen/lib/sort.c | 80 ++
 4 files changed, 70 insertions(+), 81 deletions(-)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index afaa0e249b..0d62945d56 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -472,7 +472,7 @@ size_t __init boot_fdt_info(const void *fdt, paddr_t paddr)
  * the banks sorted in ascending order. So sort them through.
  */
 sort(bootinfo.mem.bank, bootinfo.mem.nr_banks, sizeof(struct membank),
- cmp_memory_node, NULL);
+ cmp_memory_node, generic_swap);
 
 early_print_info();
 
diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 729287e37c..1f35aaeea6 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -170,7 +170,7 @@ void register_mmio_handler(struct domain *d,
 
 /* Sort mmio handlers in ascending order based on base address */
 sort(vmmio->handlers, vmmio->num_entries, sizeof(struct mmio_handler),
- cmp_mmio_handler, NULL);
+ cmp_mmio_handler, generic_swap);
 
 write_unlock(&vmmio->lock);
 }
diff --git a/xen/include/xen/sort.h b/xen/include/xen/sort.h
index a403652948..f6065eda58 100644
--- a/xen/include/xen/sort.h
+++ b/xen/include/xen/sort.h
@@ -3,8 +3,73 @@
 
 #include 
 
+extern gnu_inline
+void generic_swap(void *a, void *b, size_t size)
+{
+char t;
+
+do {
+t = *(char *)a;
+*(char *)a++ = *(char *)b;
+*(char *)b++ = t;
+} while ( --size > 0 );
+}
+
+/*
+ * sort - sort an array of elements
+ * @base: pointer to data to sort
+ * @num: number of elements
+ * @size: size of each element
+ * @cmp: pointer to comparison function
+ * @swap: pointer to swap function or NULL
+ *
+ * This function does a heapsort on the given array. You may provide a
+ * swap function optimized to your element type.
+ *
+ * Sorting time is O(n log n) both on average and worst-case. While
+ * qsort is about 20% faster on average, it suffers from exploitable
+ * O(n*n) worst-case behavior and extra memory requirements that make
+ * it less suitable for kernel use.
+ */
+#ifndef SORT_IMPLEMENTATION
+extern gnu_inline
+#endif
 void sort(void *base, size_t num, size_t size,
   int (*cmp)(const void *, const void *),
-  void (*swap)(void *, void *, size_t));
+  void (*swap)(void *, void *, size_t))
+{
+/* pre-scale counters for performance */
+size_t i = (num / 2) * size, n = num * size, c, r;
+
+/* heapify */
+while ( i > 0 )
+{
+for ( r = i -= size; r * 2 + size < n; r = c )
+{
+c = r * 2 + size;
+if ( (c < n - size) && (cmp(base + c, base + c + size) < 0) )
+c += size;
+if ( cmp(base + r, base + c) >= 0 )
+break;
+swap(base + r, base + c, size);
+}
+}
+
+/* sort */
+for ( i = n; i > 0; )
+{
+i -= size;

Re: [PATCH 4/5] xen/cpupool: Create different cpupools at boot time

2022-02-15 Thread Stefano Stabellini
On Tue, 15 Feb 2022, Luca Fancellu wrote:
> Introduce an architecture specific way to create different cpupools
> at boot time. This is particularly useful on ARM big.LITTLE systems
> where there might be the need to have different cpupools for each type
> of core, but also systems using NUMA can have different cpu pools for
> each node.
> 
> The feature on arm relies on a specification of the cpupools from the
> device tree to build pools and assign cpus to them.
> 
> Documentation is created to explain the feature.
> 
> Signed-off-by: Luca Fancellu 
> ---
>  docs/misc/arm/device-tree/cpupools.txt | 118 +
>  xen/arch/arm/Kconfig   |   9 ++
>  xen/arch/arm/Makefile  |   1 +
>  xen/arch/arm/cpupool.c | 118 +
>  xen/common/sched/cpupool.c |   4 +-
>  xen/include/xen/sched.h|  11 +++
>  6 files changed, 260 insertions(+), 1 deletion(-)
>  create mode 100644 docs/misc/arm/device-tree/cpupools.txt
>  create mode 100644 xen/arch/arm/cpupool.c
> 
> diff --git a/docs/misc/arm/device-tree/cpupools.txt 
> b/docs/misc/arm/device-tree/cpupools.txt
> new file mode 100644
> index ..7298b6394332
> --- /dev/null
> +++ b/docs/misc/arm/device-tree/cpupools.txt
> @@ -0,0 +1,118 @@
> +Boot time cpupools
> +==
> +
> +On arm, when BOOT_TIME_CPUPOOLS is enabled in the Xen configuration, it is
> +possible to create cpupools during boot phase by specifying them in the 
> device
> +tree.
> +
> +Cpupool specification nodes shall be direct children of the /chosen node.
> +Each cpupool node contains the following properties:
> +
> +- compatible (mandatory)
> +
> +Must always include the compatibility string: "xen,cpupool".
> +
> +- cpupool-id (mandatory)
> +
> +Must be a positive integer number.

Why is cpupool-id mandatory? It looks like it could be generated by Xen.
Or is it actually better to have the user specify it anyway?


> +- cpupool-cpus (mandatory)
> +
> +Must be a list of device tree phandles to nodes describing cpus (e.g. having
> +device_type = "cpu"); it can't be empty.
> +
> +- cpupool-sched (optional)
> +
> +Must be a string holding the name of a Xen scheduler; it has no effect when
> +used in conjunction with a cpupool-id equal to zero, in which case the
> +default Xen scheduler is selected (sched=<...> boot argument).

I don't get why cpupool-id == 0 should trigger a special cpupool-sched
behavior.


> +Constraints
> +===
> +
> +The cpupool with id zero is implicitly created even if not specified; that pool
> +must have at least one cpu assigned, otherwise Xen will stop.
> +
> +Every cpu brought up by Xen will be assigned to the cpupool with id zero if it's
> +not assigned to any other cpupool.
> +
> +If a cpu is assigned to a cpupool, but it's not brought up correctly, Xen will
> +stop.

Thank you for documenting the constraints, but why do we have them?
Imagine a user specifying 3 cpu pools and imagine the cpupool-id is
optional and missing. We could take care of the cpupool-id generation in
Xen and we could also assign the default scheduler everywhere
cpupool-sched is not specified. Maybe I am missing something?

Does cpupool0 has to exist? I guess the answer could be yes, but if it
is specified as id of one of the pools we are fine, otherwise it could
be automatically generated by Xen.

In any case, I don't think that cpupool0 has to have the default
scheduler?

My suggestion would be:

- make cpupool-id optional
- assign automatically cpupool-ids starting from 0
- respect cpupool-ids chosen by the user
- if some CPUs are left out (not specified in any pool) add an extra cpupool
- the extra cpupool doesn't have to be cpupool-id == 0, it could be
  cpupool-id == n
- the extra cpupool uses the default scheduler

If the user created cpupools in device tree covering all CPUs and also
specified all cpupool-ids everywhere, and none of them are 0 (no cpupool
in the system is cpupool0) then panic. (Assuming that cpupool0 is
required.)
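For what it's worth, the allocation policy sketched in the bullets above (keep user-chosen ids, hand out the lowest free id otherwise) is simple to implement. A hypothetical illustration, not taken from the patch:

```c
#include <stdbool.h>

#define MAX_POOLS  8
#define POOL_UNSET (-1)

/*
 * Hypothetical helper: ids[] holds the user-specified cpupool-id of each
 * device tree cpupool node, or POOL_UNSET when the property is absent.
 * Explicit ids (assumed to be < MAX_POOLS) are respected; the remaining
 * pools get the lowest ids still free.
 */
static void assign_pool_ids(int ids[], unsigned int n)
{
    bool used[MAX_POOLS] = { false };
    unsigned int i;
    int next = 0;

    /* First pass: reserve every explicitly requested id. */
    for ( i = 0; i < n; i++ )
        if ( ids[i] != POOL_UNSET )
            used[ids[i]] = true;

    /* Second pass: fill the gaps with the lowest free ids. */
    for ( i = 0; i < n; i++ )
        if ( ids[i] == POOL_UNSET )
        {
            while ( used[next] )
                next++;
            ids[i] = next;
            used[next] = true;
        }
}
```

Whether cpupool0 ended up existing can then be checked with a single scan of ids[] afterwards.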


> +Examples
> +
> +
> +On a system having two types of cores, the following device tree specification
> +will instruct Xen to create two cpupools:
> +
> +- The cpupool with id 0 will have 4 cpus assigned.
> +- The cpupool with id 1 will have 2 cpus assigned.
> +
> +As can be seen from the example, cpupool_a has only two cpus assigned, but since
> +there are two cpus unassigned, they are automatically assigned to the cpupool
> +with id zero. The following example can work only if hmp-unsafe=1 is passed to Xen
> +boot arguments, otherwise not all cores will be brought up by Xen and the
> +cpupool creation process will stop Xen.
> +
> +
> +a72_1: cpu@0 {
> +compatible = "arm,cortex-a72";
> +reg = <0x0 0x0>;
> +device_type = "cpu";
> +[...]
> +};
> +
> +a72_2: cpu@1 {
> +compatible = "arm,cortex-a72";
> +reg = <0x0 0x1>;
> +de

[xen-unstable-smoke test] 168122: tolerable all pass - PUSHED

2022-02-15 Thread osstest service owner
flight 168122 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168122/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  e7c2017cf4a91ab6a0fea6adca2d9dd2ab1603b0
baseline version:
 xen  94334d854bd358bd1d9c61d5e3306e4d903b120b

Last test of basis   168110  2022-02-14 10:00:43 Z1 days
Testing same since   168122  2022-02-15 21:01:44 Z0 days1 attempts


People who touched revisions under test:
  Stefano Stabellini 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   94334d854b..e7c2017cf4  e7c2017cf4a91ab6a0fea6adca2d9dd2ab1603b0 -> smoke



[qemu-mainline test] 168120: tolerable FAIL - PUSHED

2022-02-15 Thread osstest service owner
flight 168120 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168120/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168114
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168114
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168114
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168114
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168114
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168114
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168114
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168114
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 qemuue56d873f0ed9f7ed35b40cc1be841bf7f22db690
baseline version:
 qemuu2d88a3a595f1094e3ecc6cd2fd1e804634c84b0f

Last test of basis   168114  2022-02-14 23:09:51 Z0 days
Testing same since   168120  2022-02-15 14:08:16 Z0 days1 attempts


People who touched revisions under test:
  Jason Wang 
  Lei Rao 
  Nick Hudson 
  Peter Foley 

Re: [PATCH v2 69/70] x86/efi: Disable CET-IBT around Runtime Services calls

2022-02-15 Thread Andrew Cooper
On 15/02/2022 16:53, Jan Beulich wrote:
> On 14.02.2022 13:51, Andrew Cooper wrote:
>> UEFI Runtime services, at the time of writing, aren't CET-IBT compatible.
>> Work is ongoing to address this. In the meantime, unconditionally disable 
>> IBT.
>>
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Thanks.

>
>> --- a/xen/common/efi/runtime.c
>> +++ b/xen/common/efi/runtime.c
>> @@ -21,6 +21,7 @@ struct efi_rs_state {
>>* don't strictly need that.
>>*/
>>   unsigned long __aligned(32) cr3;
>> +unsigned long msr_s_cet;
>>  #endif
>>  };
> The latest with the next addition here we will probably want to ...
>
>> @@ -113,6 +114,19 @@ struct efi_rs_state efi_rs_enter(void)
> ... no longer have this be the function's return type.

So about this.

why aren't we using __attribute__((force_align_arg_pointer)) ?  It
exists in at least GCC 4.1 and Clang 6.

We're way, way overdue bumping the minimum toolchain versions, and even
Clang 3.5=>6 would still be a very obsolete minimum version.  This way, we're not
depending on some very subtle ABI mechanics to try and keep the stack
properly aligned.

~Andrew
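For context, the attribute in question is applied per function and makes the callee realign its own stack pointer on entry. A minimal sketch (hypothetical code, not from the Xen tree):

```c
/*
 * force_align_arg_pointer makes the compiler realign the stack pointer in
 * the function prologue, so the body is safe even when the caller ran with
 * an incompatible stack alignment (the UEFI runtime-services situation).
 * The attribute is x86-specific, hence the guard.
 */
#if defined(__x86_64__) || defined(__i386__)
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

REALIGN_STACK static int efi_callback(int status)
{
    /*
     * Locals the compiler may give 16-byte alignment (e.g. for SSE spills)
     * could fault on a misaligned stack without the realignment above.
     */
    return status == 0 ? 1 : -1;
}
```

The attribute costs a few prologue instructions but removes the dependency on the caller's stack discipline.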


Re: [PATCH V3 6/13] input: serio: use time_is_before_jiffies() instead of open coding it

2022-02-15 Thread Dmitry Torokhov
Hi Wang,

On Mon, Feb 14, 2022 at 05:55:43PM -0800, Qing Wang wrote:
> From: Wang Qing 
> 
> Use the helper function time_is_{before,after}_jiffies() to improve
> code readability.

I applied changes by Danilo Krummrich converting the driver to use
ktime_t (see 
https://lore.kernel.org/r/20220215160208.34826-3-danilokrummr...@dk-develop.de)
which makes this change not applicable.

Thanks.

-- 
Dmitry



Re: [PATCH v6 00/21] Introduce power-off+restart call chain API

2022-02-15 Thread Dmitry Osipenko
31.01.2022 02:36, Dmitry Osipenko wrote:
> Problem
> ---
> 
> SoC devices require power-off call chaining functionality from kernel.
> We have a widely used restart chaining provided by restart notifier API,
> but nothing for power-off.
> 
> Solution
> 
> 
> Introduce new API that provides both restart and power-off call chains.
> 
> Why combine restart with power-off? Because drivers often do both.
> More practical to have API that provides both under the same roof.
> 
> The new API is designed with simplicity and extensibility in mind.
> It's built upon the existing restart and reboot APIs. The simplicity
> is in new helper functions that are convenient for drivers. The
> extensibility is in the design that doesn't hardcode callback
> arguments, making easy to add new parameters and remove old.
> 
> This is a third attempt to introduce the new API. First was made by
> Guenter Roeck back in 2014, second was made by Thierry Reding in 2017.
> In fact the work didn't stop and recently arm_pm_restart() was removed
> from v5.14 kernel, which was a part of preparatory work started by
> Guenter Roeck. I took into account experience and ideas from the
> previous attempts, extended and polished them.


Rafael and all, do you see anything critical that needs to be improved
in this v6?

Will be great if you could take this patchset via the power tree if it
looks okay, or give an ack.



Re: [PATCH v2 2/2] xen/include/public: deprecate GNTTABOP_transfer

2022-02-15 Thread Julien Grall

Hi Juergen,

On 03/02/2022 13:14, Juergen Gross wrote:

Add a comment to include/public/grant_table.h that GNTTABOP_transfer
is deprecated, in order to discourage new use cases.


From the commit message, it is unclear to me why we are discouraging 
new use cases and indirectly encouraging current users to move away from 
the feature.


Patch #1 seems to imply this is because the feature is not present in 
Linux upstream. But I don't think this is a sufficient reason to 
deprecate a feature.


A more compelling reason would be that the feature is broken and too 
complex to fix it.


So can you provide more details?

As a side note, should we also update SUPPORT.md?

Cheers,

--
Julien Grall



Re: [PATCH v2] lib: extend ASSERT()

2022-02-15 Thread Julien Grall

(+ Bertrand)

Hi Jan,

On 27/01/2022 14:34, Jan Beulich wrote:

The increasing amount of constructs along the lines of

 if ( !condition )
 {
 ASSERT_UNREACHABLE();
 return;
 }

is not only longer than necessary, but also doesn't produce incident
specific console output (except for file name and line number).


So I agree that this construct will always result in a minimum of 5 lines,
which is not nice. But the proposed change is...



Allow
the intended effect to be achieved with ASSERT(), by giving it a second
parameter allowing specification of the action to take in release builds
in case an assertion would have triggered in a debug one. The example
above then becomes

 ASSERT(condition, return);

Make sure the condition will continue to not get evaluated when just a
single argument gets passed to ASSERT().

Signed-off-by: Jan Beulich 
---
v2: Rename new macro parameter.
---
RFC: The use of a control flow construct as a macro argument may be
  controversial.


indeed controversial. I find this quite confusing and not something I 
would request a user to switch to if they use the longer version.


That said, this is mainly a matter of taste. So I am interested to hear 
others view.


I have also CCed Bertrand to have an opinions from the Fusa Group (I 
suspect this will go backward for them).




--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -826,11 +826,7 @@ int xenmem_add_to_physmap(struct domain
  union add_to_physmap_extra extra = {};
  struct page_info *pages[16];
  
-if ( !paging_mode_translate(d) )
-{
-ASSERT_UNREACHABLE();
-return -EACCES;
-}
+ASSERT(paging_mode_translate(d), return -EACCES);
  
  if ( xatp->space == XENMAPSPACE_gmfn_foreign )

  extra.foreign_domid = DOMID_INVALID;
@@ -920,11 +916,7 @@ static int xenmem_add_to_physmap_batch(s
   * call doesn't succumb to dead-code-elimination. Duplicate the short-circuit
   * from xatp_permission_check() to try and help the compiler out.
   */
-if ( !paging_mode_translate(d) )
-{
-ASSERT_UNREACHABLE();
-return -EACCES;
-}
+ASSERT(paging_mode_translate(d), return -EACCES);
  
  if ( unlikely(xatpb->size < extent) )

  return -EILSEQ;
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -49,11 +49,13 @@
  #endif
  
  #ifndef NDEBUG

-#define ASSERT(p) \
+#define ASSERT(p, ...) \
  do { if ( unlikely(!(p)) ) assert_failed(#p); } while (0)
  #define ASSERT_UNREACHABLE() assert_failed("unreachable")
  #else
-#define ASSERT(p) do { if ( 0 && (p) ) {} } while (0)
+#define ASSERT(p, failsafe...) do { \
+if ( !count_args(failsafe) || unlikely(!(p)) ) { failsafe; } \
+} while ( 0 )
  #define ASSERT_UNREACHABLE() do { } while (0)
  #endif
  



Cheers,

--
Julien Grall



Re: [PATCH v2 68/70] x86/setup: Rework MSR_S_CET handling for CET-IBT

2022-02-15 Thread Andrew Cooper
On 15/02/2022 16:46, Jan Beulich wrote:
> On 14.02.2022 13:51, Andrew Cooper wrote:
>> CET-SS and CET-IBT can be independently controlled, so the configuration of
>> MSR_S_CET can't be constant any more.
>>
>> Introduce xen_msr_s_cet_value(), mostly because I don't fancy
>> writing/maintaining that logic in assembly.  Use this in the 3 paths which
>> alter MSR_S_CET when both features are potentially active.
>>
>> To activate CET-IBT, we only need CR4.CET and MSR_S_CET.ENDBR_EN.  This is
>> common with the CET-SS setup, so reorder the operations to set up CR4 and
>> MSR_S_CET for any nonzero result from xen_msr_s_cet_value(), and set up
>> MSR_PL0_SSP and SSP if SHSTK_EN was also set.
>>
>> Adjust the crash path to disable CET-IBT too.
>>
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Thanks,

> albeit with a nit and a remark:
>
>> --- a/xen/arch/x86/acpi/wakeup_prot.S
>> +++ b/xen/arch/x86/acpi/wakeup_prot.S
>> @@ -63,7 +63,26 @@ ENTRY(s3_resume)
>>  pushq   %rax
>>  lretq
>>  1:
>> -#ifdef CONFIG_XEN_SHSTK
>> +#if defined(CONFIG_XEN_SHSTK) || defined(CONFIG_XEN_IBT)
>> +callxen_msr_s_cet_value
>> +test%eax, %eax
>> +jz  .L_cet_done
>> +
>> +/* Set up MSR_S_CET. */
>> +mov $MSR_S_CET, %ecx
>> +xor %edx, %edx
>> +wrmsr
>> +
>> +/* Enable CR4.CET. */
>> +mov $XEN_MINIMAL_CR4 | X86_CR4_CET, %ecx
>> +mov %rcx, %cr4
>> +
>> +/* WARNING! call/ret now fatal (iff SHSTK) until SETSSBSY loads SSP */
>> +
>> +#if defined(CONFIG_XEN_SHSTK)
> Just #ifdef, as it was before?

I can if you insist, but that's breaking consistency with the other
ifdefary.

>
>> @@ -90,10 +101,6 @@ ENTRY(s3_resume)
>>  mov %edi, %eax
>>  wrmsr
>>  
>> -/* Enable CET.  MSR_INTERRUPT_SSP_TABLE is set up later in load_system_tables(). */
>> -mov $XEN_MINIMAL_CR4 | X86_CR4_CET, %ebx
>> -mov %rbx, %cr4
> The latter part of this comment could do with retaining.

So I tried that in v1, and concluded not for v2.

There is nowhere appropriate for it to live, anywhere in this block. 
And it is an artefact of me bootstrapping SHSTK to start with.

The truth is that nothing about MSR_ISST_TABLE matters until
load_system_table sets up both this and the TSS IST fields together. 
IST exceptions are already fatal at this point for non-SHSTK reasons.

~Andrew


Re: [PATCH] docs: add some clarification to xenstore-migration.md

2022-02-15 Thread Julien Grall

Hi Juergen,

On 10/02/2022 11:26, Juergen Gross wrote:

The Xenstore migration document is missing the specification that a
node record must be preceded by the record of its parent node in case
of live update.

Add that missing part.

Signed-off-by: Juergen Gross 
---
  docs/designs/xenstore-migration.md | 4 
  1 file changed, 4 insertions(+)

diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
index 5f1155273e..39e31c984b 100644
--- a/docs/designs/xenstore-migration.md
+++ b/docs/designs/xenstore-migration.md
@@ -316,6 +316,10 @@ a _committed_ node (globally visible in xenstored) or a _pending_ node (created
  or modified by a transaction for which there is also a `TRANSACTION_DATA`
  record previously present).
  
+In the live update case the _committed_ nodes must be in a top-down sequence,
+i.e. the first node needs to be `/`, and each other node in the stream must
+come _after_ its parent node.


I would actually expect the same restriction to apply for the
non-liveupdate case, i.e. we want the parent to either exist in the tree
or the record for the parent to appear earlier in the stream.


Cheers,

--
Julien Grall



Re: [PATCH v6 03/11] xen/arm: Allow device-passthrough even the IOMMU is off

2022-02-15 Thread Julien Grall

Hi,

On 14/02/2022 03:19, Penny Zheng wrote:

From: Stefano Stabellini 

At the moment, we are only supporting device-passthrough when Xen has
enabled the IOMMU. There are some use cases where it is not possible to
use the IOMMU (e.g. doesn't exist, hardware limitation, performance) yet
it would be OK to assign a device to a trusted domain so long as the domain is
direct-mapped or the device doesn't do DMA.

Note that when the IOMMU is disabled, it will be necessary to add
xen,force-assign-without-iommu for every device that needs to be assigned.

Signed-off-by: Stefano Stabellini 
Signed-off-by: Penny Zheng 
Tested-by: Stefano Stabellini 


Acked-by: Julien Grall 

Cheers,

--
Julien Grall



Re: [PATCH v6 02/11] xen: introduce CDF_directmap

2022-02-15 Thread Julien Grall

(+ Jan)

Hi Penny,

I am CCing Jan to give him a chance to...

On 14/02/2022 03:19, Penny Zheng wrote:

diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index cfb0b47f13..24eb4cc7d3 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -31,6 +31,10 @@ void arch_get_domain_info(const struct domain *d,
  /* CDF_* constant. Internal flags for domain creation. */
  /* Is this a privileged domain? */
  #define CDF_privileged   (1U << 0)
+#ifdef CONFIG_ARM
+/* Should domain memory be directly mapped? */
+#define CDF_directmap(1U << 1)
+#endif


... comment on this approach. I would be happy to switch to an ASSERT() 
if that's preferred.


Please note that as you modify x86 code (even it is a couple of lines) 
you should technically CC the x86 maintainers. Similarly the changes in 
include/xen/domain.h should have the REST CCed.


We have a script that will find the proper correct CC for each patch 
(see scripts/add_maintainers.pl). The workflow is written down in the 
script itself.


Cheers,

--
Julien Grall



Re: [PATCH] MAINTAINERS: make Bertrand ARM maintainer

2022-02-15 Thread Julien Grall

Hi,

On 10/02/2022 19:08, Stefano Stabellini wrote:

Signed-off-by: Stefano Stabellini 


Acked-by: Julien Grall 

The proposal has been on the ML for a few days without any objection, so
I will commit it now.


Cheers,

--
Julien Grall



Re: [PATCH 5/5] arm/dom0less: assign dom0less guests to cpupools

2022-02-15 Thread Luca Fancellu



> On 15 Feb 2022, at 10:56, Juergen Gross  wrote:
> 
> On 15.02.22 11:15, Luca Fancellu wrote:
>> Introduce domain-cpupool property of a xen,domain device tree node,
>> that specifies the cpupool device tree handle of a xen,cpupool node
>> that identifies a cpupool created at boot time where the guest will
>> be assigned on creation.
>> Add member to the xen_arch_domainconfig public interface so the
>> XEN_DOMCTL_INTERFACE_VERSION version is bumped.
>> Update documentation about the property.
>> Signed-off-by: Luca Fancellu 
>> ---
>>  docs/misc/arm/device-tree/booting.txt | 5 +
>>  xen/arch/arm/domain.c | 6 ++
>>  xen/arch/arm/domain_build.c   | 9 -
>>  xen/arch/x86/domain.c | 6 ++
>>  xen/common/domain.c   | 5 -
>>  xen/include/public/arch-arm.h | 2 ++
>>  xen/include/public/domctl.h   | 2 +-
>>  xen/include/xen/domain.h  | 3 +++
>>  8 files changed, 35 insertions(+), 3 deletions(-)
>> diff --git a/docs/misc/arm/device-tree/booting.txt 
>> b/docs/misc/arm/device-tree/booting.txt
>> index 71895663a4de..0f1f210fa449 100644
>> --- a/docs/misc/arm/device-tree/booting.txt
>> +++ b/docs/misc/arm/device-tree/booting.txt
>> @@ -182,6 +182,11 @@ with the following properties:
>>  Both #address-cells and #size-cells need to be specified because
>>  both sub-nodes (described shortly) have reg properties.
>>  +- domain-cpupool
>> +
>> +Optional. Handle to a xen,cpupool device tree node that identifies the
>> +cpupool where the guest will be started at boot.
>> +
>>  Under the "xen,domain" compatible node, one or more sub-nodes are present
>>  for the DomU kernel and ramdisk.
>>  diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index 92a6c509e5c5..be350b28b588 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -788,6 +788,12 @@ fail:
>>  return rc;
>>  }
>>  +unsigned int
>> +arch_get_domain_cpupool_id(const struct xen_domctl_createdomain *config)
>> +{
>> +return config->arch.cpupool_id;
>> +}
>> +
> 
> I don't see why this should be arch specific.
> 
>>  void arch_domain_destroy(struct domain *d)
>>  {
>>  /* IOMMU page table is shared with P2M, always call
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index 6931c022a2e8..4f239e756775 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -3015,7 +3015,8 @@ static int __init construct_domU(struct domain *d,
>>  void __init create_domUs(void)
>>  {
>>  struct dt_device_node *node;
>> -const struct dt_device_node *chosen = dt_find_node_by_path("/chosen");
>> +const struct dt_device_node *cpupool_node,
>> +*chosen = dt_find_node_by_path("/chosen");
>>BUG_ON(chosen == NULL);
>>  dt_for_each_child_node(chosen, node)
>> @@ -3053,6 +3054,12 @@ void __init create_domUs(void)
>>   GUEST_VPL011_SPI - 32 + 1);
>>  }
>>  +/* Get the optional property domain-cpupool */
>> +cpupool_node = dt_parse_phandle(node, "domain-cpupool", 0);
>> +if ( cpupool_node )
>> +dt_property_read_u32(cpupool_node, "cpupool-id",
>> + &d_cfg.arch.cpupool_id);
>> +
>>  /*
>>   * The variable max_init_domid is initialized with zero, so here 
>> it's
>>   * very important to use the pre-increment operator to call
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index ef1812dc1402..3e3cf88c9c82 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -880,6 +880,12 @@ int arch_domain_create(struct domain *d,
>>  return rc;
>>  }
>>  +unsigned int
>> +arch_get_domain_cpupool_id(const struct xen_domctl_createdomain *config)
>> +{
>> +return 0;
>> +}
>> +
>>  void arch_domain_destroy(struct domain *d)
>>  {
>>  if ( is_hvm_domain(d) )
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 2048ebad86ff..d42ca8292025 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -665,6 +665,8 @@ struct domain *domain_create(domid_t domid,
>>if ( !is_idle_domain(d) )
>>  {
>> +unsigned int domain_cpupool_id;
>> +
>>  watchdog_domain_init(d);
>>  init_status |= INIT_watchdog;
>>  @@ -698,7 +700,8 @@ struct domain *domain_create(domid_t domid,
>>  if ( !d->pbuf )
>>  goto fail;
>>  -if ( (err = sched_init_domain(d, 0)) != 0 )
>> +domain_cpupool_id = arch_get_domain_cpupool_id(config);
>> +if ( (err = sched_init_domain(d, domain_cpupool_id)) != 0 )
>>  goto fail;
>>if ( (err = late_hwdom_init(d)) != 0 )
>> diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
>> index 94b31511ddea..2c5d1ea7f01a 100644
>> --- a/xen/include/public/arch-arm.h
>> +++ b/xen/include/public/arch-arm.h
>> @@ -321,6 +321,8 @

Re: [PATCH 4/5] xen/cpupool: Create different cpupools at boot time

2022-02-15 Thread Luca Fancellu



> On 15 Feb 2022, at 10:48, Juergen Gross  wrote:
> 
> On 15.02.22 11:15, Luca Fancellu wrote:
>> Introduce an architecture specific way to create different cpupools
>> at boot time. This is particularly useful on ARM big.LITTLE systems
>> where there might be the need to have different cpupools for each type
>> of core, but also systems using NUMA can have different cpu pools for
>> each node.
>> The feature on arm relies on a specification of the cpupools from the
>> device tree to build pools and assign cpus to them.
>> Documentation is created to explain the feature.
>> Signed-off-by: Luca Fancellu 
> 
> IIRC I suggested to have the core functionality in common code in order
> to allow using boot time cpupool creation e.g. via commandline for x86,
> too.

Yes, however I think the parser to handle everything on the command line would
be huge due to input sanitisation, and not as easy to use as the DT. However,
I see Hyperlaunch has plans to use DT on x86, so I guess it would be ok to make
this feature common once the DT is available on x86 as well.

Cheers,
Luca

> 
> 
> Juergen
> 




Re: [PATCH 3/5] xen/sched: retrieve scheduler id by name

2022-02-15 Thread Luca Fancellu



> On 15 Feb 2022, at 10:40, Juergen Gross  wrote:
> 
> On 15.02.22 11:15, Luca Fancellu wrote:
>> Add a public function to retrieve the scheduler id by the scheduler
>> name.
>> Signed-off-by: Luca Fancellu 
>> ---
>>  xen/common/sched/core.c | 11 +++
>>  xen/include/xen/sched.h | 11 +++
>>  2 files changed, 22 insertions(+)
>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>> index 8f4b1ca10d1c..9696d3c1d769 100644
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>> @@ -2947,6 +2947,17 @@ void scheduler_enable(void)
>>  scheduler_active = true;
>>  }
>>  +int __init sched_get_id_by_name(const char *sched_name)
>> +{
>> +unsigned int i;
>> +
>> +for ( i = 0; i < NUM_SCHEDULERS; i++ )
>> +if ( schedulers[i] && !strcmp(schedulers[i]->opt_name, sched_name) )
>> +return schedulers[i]->sched_id;
>> +
>> +return -1;
>> +}
>> +
> 
> Please make use of this function in scheduler_init(), as this
> functionality is open coded there, too.
> 

Ok I will change the code in scheduler_init to use the new function.

Cheers,
Luca

> 
> Juergen
> 




Re: [PATCH v2 07/70] x86: Build check for embedded endbr64 instructions

2022-02-15 Thread Andrew Cooper
On 15/02/2022 15:12, Jan Beulich wrote:
> On 14.02.2022 13:50, Andrew Cooper wrote:
>> From: Marek Marczykowski-Górecki 
>>
>> Embedded endbr64 instructions mark legal indirect branches as far as the CPU
>> is concerned, which aren't legal as far as the logic is concerned.
> I think it would help if it was clarified what "embedded" actually means
> here.

Oh yeah, that's lost a bit of context now I've split it out of the patch
introducing endbr.h

>
>> --- a/xen/arch/x86/Makefile
>> +++ b/xen/arch/x86/Makefile
>> @@ -155,6 +155,9 @@ $(TARGET)-syms: prelink.o xen.lds
>>  $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
>>  $(LD) $(XEN_LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
>>  $(@D)/.$(@F).1.o -o $@
>> +ifeq ($(CONFIG_XEN_IBT),y)
>> +$(SHELL) $(BASEDIR)/tools/check-endbr.sh $@
>> +endif
>>  $(NM) -pa --format=sysv $(@D)/$(@F) \
>> | $(BASEDIR)/tools/symbols --all-symbols --xensyms --sysv --sort \
>>  >$(@D)/$(@F).map
> The same wants doing on xen.efi, I guess?

Probably.

>
>> --- /dev/null
>> +++ b/xen/tools/check-endbr.sh
>> @@ -0,0 +1,76 @@
>> +#!/bin/sh
>> +
>> +#
>> +# Usage ./$0 xen-syms
>> +#
>> +
>> +set -e
>> +
>> +OBJCOPY="${OBJCOPY:-objcopy} -j .text $1"
>> +OBJDUMP="${OBJDUMP:-objdump} -j .text $1"
>> +
>> +D=$(mktemp -d)
>> +trap "rm -rf $D" EXIT
>> +
>> +TEXT_BIN=$D/xen-syms.text
>> +VALID=$D/valid-addrs
>> +ALL=$D/all-addrs
>> +BAD=$D/bad-addrs
>> +
>> +#
>> +# First, look for all the valid endbr64 instructions.
>> +# A worst-case disassembly, viewed through cat -A, may look like:
>> +#
>> +# 82d040337bd4 :$
>> +# 82d040337bd4:^If3 0f 1e fa  ^Iendbr64 $
>> # 82d040337bd8:^Ieb fe^Ijmp82d040337bd8 $
>> +# 82d040337bda:^Ib8 f3 0f 1e fa   ^Imov$0xfa1e0ff3,%eax$
>> +#
>> +# Want to grab the address of endbr64 instructions only, ignoring function
>> +# names/jump labels/etc, so look for 'endbr64' preceded by a tab and with any
>> +# number of trailing spaces before the end of the line.
>> +#
>> +${OBJDUMP} -d | grep '  endbr64 *$' | cut -f 1 -d ':' > $VALID &
> Since you look at only .text the risk of the disassembler coming
> out of sync with the actual instruction stream is lower than when
> 32- and 16-bit code was also part of what is disassembled, but it's
> not zero.

I'm not sure that we have any interesting non-64bit code at all in .text.

_start is technically 32bit but is mode-invariant as far as decoding goes.

The kexec trampoline is here too, but when I dust off my cleanup patch,
there will no longer be data or mode-dependent things to disassemble.

Everything else I can think of is in .init.text.

> Any zero-padding inserted anywhere by the linker can
> result in an immediately following ENDBR to be missed (because
> sequences of zeros resemble 2-byte insns).

I'm not sure this is a problem.  This pass is looking for everything
that objdump thinks is a legal endbr64 instruction, and it splits at labels.

Only the hand-written stubs can legitimately have an endbr64 without a
symbol pointing at it.

We also don't have any 0 padding.  It's specified as 0x90 in the linker
file, although I've been debating switching this to 0xcc for a while now
already.

>  While this risk may be
> acceptable, I think it wants mentioning at least in the description,
> maybe even at the top of the script (where one would likely look
> first after it spitting out an error).
>
> Do you perhaps want to also pass -w to objdump, to eliminate the
> risk of getting confused by split lines?

I think that's probably a good move irrespective.  This particular pipe
is the longest single task in the script which is why I backgrounded it
while the second scan occurs.  -w means fewer lines so hopefully a minor
speedup.

>> +#
>> +# Second, look for any endbr64 byte sequence
>> +# This has a couple of complications:
>> +#
>> +# 1) Grep binary search isn't VMA aware.  Copy .text out as binary, causing
>> +#the grep offset to be from the start of .text.
>> +#
>> +# 2) AWK can't add 64bit integers, because internally all numbers are doubles.
>> +#    When the upper bits are set, the exponents worth of precision is lost in
>> +#    the lower bits, rounding integers to the nearest 4k.
>> +#
>> +#    Instead, use the fact that Xen's .text is within a 1G aligned region, and
>> +#    split the VMA in half so AWK's numeric addition is only working on 32 bit
>> +#    numbers, which don't lose precision.
>> +#
>> +eval $(${OBJDUMP} -h | awk '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
>> +
>> +${OBJCOPY} -O binary $TEXT_BIN
>> +grep -aob "$(printf '\363\17\36\372')" $TEXT_BIN |
>> +awk -F':' '{printf "%s%x\n", "'$vma_hi'", strtonum(0x'$vma_lo') + $1}' > $ALL
> None of the three options passed to grep look to be standardized.
> Is this going to cause problems on non-Linux systems? Should this
> checking perhaps be put behind a

Re: [PATCH 2/5] xen/sched: create public function for cpupools creation

2022-02-15 Thread Luca Fancellu



> On 15 Feb 2022, at 10:38, Juergen Gross  wrote:
> 
> On 15.02.22 11:15, Luca Fancellu wrote:
>> Create new public function to create cpupools, it checks for pool id
>> uniqueness before creating the pool and can take a scheduler id or
>> a negative value that means the default Xen scheduler will be used.
>> Signed-off-by: Luca Fancellu 
> 
> Reviewed-by: Juergen Gross 
> 
> with one further question: you are allowing to use another scheduler,
> but what if someone wants to set non-standard scheduling parameters
> (e.g. another time slice)?

I guess for now parameters can be tuned only via the xl tool; however, as
future work it would be possible to allow parameters in the device tree for
each scheduler.

Cheers,
Luca

> 
> 
> Juergen
> 




Re: [PATCH 1/5] tools/cpupools: Give a name to unnamed cpupools

2022-02-15 Thread Luca Fancellu



> On 15 Feb 2022, at 10:33, Juergen Gross  wrote:
> 
> On 15.02.22 11:15, Luca Fancellu wrote:
>> With the introduction of boot time cpupools, Xen can create many
>> different cpupools at boot time other than cpupool with id 0.
>> Since these newly created cpupools can't have an
>> entry in Xenstore, create the entry using xen-init-dom0
>> helper with the usual convention: Pool-.
>> Given the change, remove the check for poolid == 0 from
>> libxl_cpupoolid_to_name(...).
>> Signed-off-by: Luca Fancellu 
>> ---
>>  tools/helpers/xen-init-dom0.c  | 26 +-
>>  tools/libs/light/libxl_utils.c |  3 +--
>>  2 files changed, 26 insertions(+), 3 deletions(-)
>> diff --git a/tools/helpers/xen-init-dom0.c b/tools/helpers/xen-init-dom0.c
>> index c99224a4b607..3539f56faeb0 100644
>> --- a/tools/helpers/xen-init-dom0.c
>> +++ b/tools/helpers/xen-init-dom0.c
>> @@ -43,7 +43,10 @@ int main(int argc, char **argv)
>>  int rc;
>>  struct xs_handle *xsh = NULL;
>>  xc_interface *xch = NULL;
>> -char *domname_string = NULL, *domid_string = NULL;
>> +char *domname_string = NULL, *domid_string = NULL, *pool_string = NULL;

Hi Juergen,

> 
> pool_string seems to be unused.

I will remove it

> 
>> +char pool_path[strlen("/local/pool") + 12], pool_name[strlen("Pool-") + 5];
> 
> I don't like that. Why don't you use pointers and ...
> 
>> +xc_cpupoolinfo_t *xcinfo;
>> +unsigned int pool_id = 0;
>>  libxl_uuid uuid;
>>/* Accept 0 or 1 argument */
>> @@ -114,6 +117,27 @@ int main(int argc, char **argv)
>>  goto out;
>>  }
>>  +/* Create an entry in xenstore for each cpupool on the system */
>> +do {
>> +xcinfo = xc_cpupool_getinfo(xch, pool_id);
>> +if (xcinfo != NULL) {
>> +if (xcinfo->cpupool_id != pool_id)
>> +pool_id = xcinfo->cpupool_id;
>> +snprintf(pool_path, sizeof(pool_path), "/local/pool/%d/name",
>> + pool_id);
>> +snprintf(pool_name, sizeof(pool_name), "Pool-%d", pool_id);
> 
> ... use asprintf() here for allocating the strings in the needed size?

Why do you prefer that approach? I was trying to avoid allocation/free
operations in a loop, which would also need more code to handle cases where
memory can't be allocated. With the buffers on the stack we don't have those
problems.

> 
>> +pool_id++;
>> +if (!xs_write(xsh, XBT_NULL, pool_path, pool_name,
>> +  strlen(pool_name))) {
>> +fprintf(stderr, "cannot set pool name\n");
>> +rc = 1;
>> +}
>> +xc_cpupool_infofree(xch, xcinfo);
>> +if (rc)
>> +goto out;
> 
> Moving the call of xc_cpupool_infofree() ahead of the call of xs_write()
> would drop the need for this last if statement, as you could add the
> goto to the upper if.

Yes you are right, it would simplify the code

> 
>> +}
>> +} while(xcinfo != NULL);
>> +
> 
> With doing all of this for being able to assign other domains created
> at boot to cpupools, shouldn't you add names for other domains than dom0
> here, too?

This series is more about cpupools; maybe I can do that in another commit
outside of this series.

Thanks for your review.

Cheers,
Luca

> 
> 
> Juergen
> 




[xen-unstable test] 168118: tolerable FAIL

2022-02-15 Thread osstest service owner
flight 168118 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168118/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64 12 windows-install fail in 168111 pass in 168118
 test-arm64-arm64-xl-thunderx  8 xen-boot   fail pass in 168111
 test-amd64-i386-qemuu-rhel6hvm-amd 14 guest-start/redhat.repeat fail pass in 168111
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail pass in 168111
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 18 guest-start/debianhvm.repeat fail pass in 168111

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail blocked in 168111
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail in 168111 never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail in 168111 never pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 168111
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168111
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168111
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168111
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168111
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168111
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168111
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168111
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168111
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 168111
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168111
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass

Re: [PATCH v2 69/70] x86/efi: Disable CET-IBT around Runtime Services calls

2022-02-15 Thread Jan Beulich
On 14.02.2022 13:51, Andrew Cooper wrote:
> UEFI Runtime services, at the time of writing, aren't CET-IBT compatible.
> Work is ongoing to address this. In the meantime, unconditionally disable IBT.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 

> --- a/xen/common/efi/runtime.c
> +++ b/xen/common/efi/runtime.c
> @@ -21,6 +21,7 @@ struct efi_rs_state {
>* don't strictly need that.
>*/
>   unsigned long __aligned(32) cr3;
> +unsigned long msr_s_cet;
>  #endif
>  };

At the latest with the next addition here we will probably want to ...

> @@ -113,6 +114,19 @@ struct efi_rs_state efi_rs_enter(void)

... no longer have this be the function's return type.

Jan




Re: [PATCH v2 68/70] x86/setup: Rework MSR_S_CET handling for CET-IBT

2022-02-15 Thread Jan Beulich
On 14.02.2022 13:51, Andrew Cooper wrote:
> CET-SS and CET-IBT can be independently controlled, so the configuration of
> MSR_S_CET can't be constant any more.
> 
> Introduce xen_msr_s_cet_value(), mostly because I don't fancy
> writing/maintaining that logic in assembly.  Use this in the 3 paths which
> alter MSR_S_CET when both features are potentially active.
> 
> To activate CET-IBT, we only need CR4.CET and MSR_S_CET.ENDBR_EN.  This is
> common with the CET-SS setup, so reorder the operations to set up CR4 and
> MSR_S_CET for any nonzero result from xen_msr_s_cet_value(), and set up
> MSR_PL0_SSP and SSP if SHSTK_EN was also set.
> 
> Adjust the crash path to disable CET-IBT too.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 
albeit with a nit and a remark:

> --- a/xen/arch/x86/acpi/wakeup_prot.S
> +++ b/xen/arch/x86/acpi/wakeup_prot.S
> @@ -63,7 +63,26 @@ ENTRY(s3_resume)
>  pushq   %rax
>  lretq
>  1:
> -#ifdef CONFIG_XEN_SHSTK
> +#if defined(CONFIG_XEN_SHSTK) || defined(CONFIG_XEN_IBT)
> +callxen_msr_s_cet_value
> +test%eax, %eax
> +jz  .L_cet_done
> +
> +/* Set up MSR_S_CET. */
> +mov $MSR_S_CET, %ecx
> +xor %edx, %edx
> +wrmsr
> +
> +/* Enable CR4.CET. */
> +mov $XEN_MINIMAL_CR4 | X86_CR4_CET, %ecx
> +mov %rcx, %cr4
> +
> +/* WARNING! call/ret now fatal (iff SHSTK) until SETSSBSY loads SSP */
> +
> +#if defined(CONFIG_XEN_SHSTK)

Just #ifdef, as it was before?

> @@ -90,10 +101,6 @@ ENTRY(s3_resume)
>  mov %edi, %eax
>  wrmsr
>  
> -/* Enable CET.  MSR_INTERRUPT_SSP_TABLE is set up later in load_system_tables(). */
> -mov $XEN_MINIMAL_CR4 | X86_CR4_CET, %ebx
> -mov %rbx, %cr4

The latter part of this comment could do with retaining.

Jan




Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Jan Beulich
On 15.02.2022 17:28, Oleksandr Andrushchenko wrote:
> On 15.02.22 18:18, Jan Beulich wrote:
>> On 15.02.2022 16:46, Oleksandr Andrushchenko wrote:
>>> Question: can anyone please explain why pcidevs is a recursive lock?
>> Well, assuming you did look at the change making it so, can you be a
>> little more specific with your question? Are you perhaps suggesting
>> the original reason has disappeared, and no new one has appeared? I'm
>> afraid I have to repeat what I did say before: If you want to remove
>> the recursive nature of the lock, then it is all on you to prove that
>> there's no code path where the lock is taken recursively. IOW even if
>> no-one knew of a reason, you'd still need to provide this proof.
>> Unless of course we'd all agree we're okay to take the risk; I don't
>> see us doing so, though.
> The question was exactly as asked: I don't understand why it is
> recursive and for what reason. I am not suggesting we blindly
> change it to a normal spinlock.

But the reason for changing it to be recursive is stated in the
description of the respective commit bfa493f52e89:

IOMMU: make the pcidevs_lock a recursive one

The pcidevs_lock is going to be recursively taken for hiding ATS
device, when VT-d Device-TLB flush timed out.

Before asking such a question, I would have assumed that you looked
up that very commit. Hence my asking to be more specific.

Jan




Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-02-15 Thread Jane Malalane
On 15/02/2022 15:21, Jan Beulich wrote:
> [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments 
> unless you have verified the sender and know the content is safe.
> 
> On 15.02.2022 16:10, Jane Malalane wrote:
>> On 15/02/2022 10:19, Jan Beulich wrote:
>>>
>>> On 15.02.2022 11:14, Jane Malalane wrote:
 On 15/02/2022 07:09, Jan Beulich wrote:
>
> On 14.02.2022 18:09, Jane Malalane wrote:
>> On 14/02/2022 13:18, Jan Beulich wrote:
>>>
>>> On 14.02.2022 14:11, Jane Malalane wrote:
 On 11/02/2022 11:46, Jan Beulich wrote:
>
> On 11.02.2022 12:29, Roger Pau Monné wrote:
>> On Fri, Feb 11, 2022 at 10:06:48AM +, Jane Malalane wrote:
>>> On 10/02/2022 10:03, Roger Pau Monné wrote:
 On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote:
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c 
> b/xen/arch/x86/hvm/vmx/vmcs.c
> index 7ab15e07a0..4060aef1bd 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp)
>MSR_IA32_VMX_PROCBASED_CTLS2, &mismatch);
>}
>
> +/* Check whether hardware supports accelerated xapic and x2apic. */
> +if ( bsp )
> +{
> +assisted_xapic_available = cpu_has_vmx_virtualize_apic_accesses;
> +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt ||
> + cpu_has_vmx_virtual_intr_delivery) &&
> +cpu_has_vmx_virtualize_x2apic_mode;

 I've been thinking about this, and it seems kind of asymmetric that for
 xAPIC mode we report hw assisted support only with
 virtualize_apic_accesses available, while for x2APIC we require
 virtualize_x2apic_mode plus either apic_reg_virt or
 virtual_intr_delivery.

 I think we likely need to be more consistent here, and report hw
 assisted x2APIC support as long as virtualize_x2apic_mode is
 available.

 This will likely have some effect on patch 2 also, as you will have to
 adjust vmx_vlapic_msr_changed.

 Thanks, Roger.
>>>
>>> Any other thoughts on this? As on one hand it is asymmetric but also
>>> there isn't much assistance with only virtualize_x2apic_mode set as, in
>>> this case, a VM exit will be avoided only when trying to access the TPR
>>> register.
>>
>> I've been thinking about this, and reporting hardware assisted
>> x{2}APIC virtualization with just
>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES or
>> SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE doesn't seem very helpful. While
>> those provide some assistance to the VMM in order to handle APIC
>> accesses, it will still require a trap into the hypervisor to handle
>> most of the accesses.
>>
>> So maybe we should only report hardware assisted support when the
>> mentioned features are present together with
>> SECONDARY_EXEC_APIC_REGISTER_VIRT?
>
> Not sure - "some assistance" seems still a little better than none at all.
> Which route to go depends on what exactly we intend the bit to be used for.
>
 True. I intended this bit to be specifically for enabling
 assisted_x{2}apic. So, would it be inconsistent to report hardware
 assistance with just VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE
 but still claim that x{2}apic is virtualized if no MSR accesses are
 intercepted with XEN_HVM_CPUID_X2APIC_VIRT (in traps.c) so that, as you
 say, the guest gets at least "some assistance" instead of none but we
 still claim x{2}apic virtualization when it is actually complete? Maybe
 I could also add a comment alluding to this in the xl 

Re: [PATCH v2 64/70] x86: Introduce helpers/checks for endbr64 instructions

2022-02-15 Thread Jan Beulich
On 14.02.2022 13:51, Andrew Cooper wrote:
> ... to prevent the optimiser creating unsafe code.  See the code comment for
> full details.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 




Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Oleksandr Andrushchenko


On 15.02.22 18:18, Jan Beulich wrote:
> On 15.02.2022 16:46, Oleksandr Andrushchenko wrote:
>> Question: can anyone please explain why pcidevs is a recursive lock?
> Well, assuming you did look at the change making it so, can you be a
> little more specific with your question? Are you perhaps suggesting
> the original reason has disappeared, and no new one has appeared? I'm
> afraid I have to repeat what I did say before: If you want to remove
> the recursive nature of the lock, then it is all on you to prove that
> there's no code path where the lock is taken recursively. IOW even if
> no-one knew of a reason, you'd still need to provide this proof.
> Unless of course we'd all agree we're okay to take the risk; I don't
> see us doing so, though.
The question was exactly as asked: I don't understand why it is
recursive and for what reason. I am not suggesting we blindly
change it to a normal spinlock.

My impression was that the code is structured in a way that functions
which already hold the lock can call others which are about to acquire
the same lock. That allowed not introducing XXX and XXX_unlocked
function pairs, which can be desirable for many reasons.

That's it

> Jan
>
>
Thank you,
Oleksandr

Re: [PATCH v2 60/70] x86: Use control flow typechecking where possible

2022-02-15 Thread Jan Beulich
On 14.02.2022 13:51, Andrew Cooper wrote:
> Now all callees have been annotated, turn on typechecking to catch issues in
> the future.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> CC: Wei Liu 
> 
RFC.  This is still an experimental compiler extension
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

Hmm, the last update to that bugzilla entry was in November. I'm not
sure it is a good idea to carry code for something which hasn't even
reached gcc's master branch yet.

Jan




Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Jan Beulich
On 15.02.2022 16:46, Oleksandr Andrushchenko wrote:
> Question: can anyone please explain why pcidevs is a recursive lock?

Well, assuming you did look at the change making it so, can you be a
little more specific with your question? Are you perhaps suggesting
the original reason has disappeared, and no new one has appeared? I'm
afraid I have to repeat what I did say before: If you want to remove
the recursive nature of the lock, then it is all on you to prove that
there's no code path where the lock is taken recursively. IOW even if
no-one knew of a reason, you'd still need to provide this proof.
Unless of course we'd all agree we're okay to take the risk; I don't
see us doing so, though.

Jan




Re: tools backports

2022-02-15 Thread Anthony PERARD
On Mon, Feb 14, 2022 at 11:55:26AM +0100, Roger Pau Monné wrote:
> On Mon, Feb 14, 2022 at 11:18:44AM +0100, Jan Beulich wrote:
> > Anthony,
> > 
> > I have a couple of simple tool stack backports queued, which - with your
> > agreement - I would want to put onto the stable tree whenever I get
> > around to applying the next batch of backports:
> > 
> > d9d3496e817a tools/libs/light: don't touch nr_vcpus_out if listing vcpus and returning NULL
> > e62cc29f9b6c tools/libs: Fix build dependencies
> 
> I would also like to request:
> 
> 0bdc43c8de libxl: force netback to wait for hotplug execution before connecting
Looks good to be backported as well.

Thanks,

-- 
Anthony PERARD



Re: tools backports

2022-02-15 Thread Anthony PERARD
On Mon, Feb 14, 2022 at 11:18:44AM +0100, Jan Beulich wrote:
> I have a couple of simple tool stack backports queued, which - with your
> agreement - I would want to put onto the stable tree whenever I get
> around to applying the next batch of backports:
> 
> d9d3496e817a tools/libs/light: don't touch nr_vcpus_out if listing vcpus and returning NULL
> e62cc29f9b6c tools/libs: Fix build dependencies
> 
> For 4.15 additionally
> 
> dd6c062a7a4a tools/libxl: Correctly align the ACPI tables
> 
> Please let me know if that's okay with you.

All looks good to go.

Thanks,


-- 
Anthony PERARD



[ovmf test] 168119: all pass - PUSHED

2022-02-15 Thread osstest service owner
flight 168119 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168119/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 85589ddbf6f8c6dc75f73aa32e484e3cfd439e7a
baseline version:
 ovmf 1193aa2dfbbd11fa7191d000a0cc166d03a249d2

Last test of basis   168115  2022-02-15 02:41:41 Z0 days
Testing same since   168119  2022-02-15 10:43:01 Z0 days1 attempts


People who touched revisions under test:
  Rebecca Cran 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   1193aa2dfb..85589ddbf6  85589ddbf6f8c6dc75f73aa32e484e3cfd439e7a -> xen-tested-master



Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Oleksandr Andrushchenko


On 15.02.22 14:56, Jan Beulich wrote:
> On 15.02.2022 13:44, Oleksandr Andrushchenko wrote:
>> On 15.02.22 13:54, Oleksandr Andrushchenko wrote:
>>> On 15.02.22 13:50, Jan Beulich wrote:
 On 15.02.2022 12:45, Oleksandr Andrushchenko wrote:
> I'm on your side, I just want to hear that we all agree pcidevs
> needs to be converted into rwlock according with the plan you
> suggested and at least now it seems to be an acceptable solution.
 I'd like to express worries though about the conversion of this
 recursive lock into an r/w one.
>>> Could you please elaborate more on this?
>> What if we just do the following:
>>
>> static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
>> static rwlock_t DEFINE_RWLOCK(_pcidevs_rwlock);
>>
>> void pcidevs_lock(void)
>> {
>>       read_lock(&_pcidevs_rwlock);
>>       spin_lock_recursive(&_pcidevs_lock);
>> }
>>
>> void pcidevs_unlock(void)
>> {
>>       spin_unlock_recursive(&_pcidevs_lock);
>>       read_unlock(&_pcidevs_rwlock);
>> }
>>
>> void pcidevs_read_lock(void)
>> {
>>       read_lock(&_pcidevs_rwlock);
>> }
>>
>> void pcidevs_read_unlock(void)
>> {
>>       read_unlock(&_pcidevs_rwlock);
>> }
>>
>> void pcidevs_write_lock(void)
>> {
>>       write_lock(&_pcidevs_rwlock);
>> }
>>
>> void pcidevs_write_unlock(void)
>> {
>>       write_unlock(&_pcidevs_rwlock);
>> }
> Hmm, this is an interesting idea. Except that I'm not sure in how
> far it'll be suitable: read_lock() won't lock out users of just
> lock(), so the solution looks tailored to your vPCI use case. Yet
> obviously (I think) read_lock() would want to become usable for
> e.g. simple list traversal as well, down the road.

1. Assumption: _pcidevs_rwlock is used to protect pdev
structure itself, so after calling pcidevs_lock(), pcidevs_read_lock()
and pcidevs_write_lock() we need to check if pdev != NULL
at all sites

2. _pcidevs_rwlock is not meant to protect the contents of pdev:
- for that _pcidevs_lock is used
- _pcidevs_lock doesn't protect pdev->vpci: for that
   pdev->vpci->lock is used.

3. Current code will continue using pcidevs_lock() as it is now.
With the exception of the writers: pci_{add|remove}_device.
These will use pcidevs_write_lock() instead.

4. vPCI code, such as vpci_{read|write} will use
pcidevs_{read|write}_lock (write mode for modify_bars)
and pdev->vpci->lock to protect and/or modify pdev->vpci.
This should be safe because under the rwlock we are
guaranteed that pdev exists and no other code, but vPCI can
remove pdev->vpci.

for_each_pdev and pci_get_pdev_by_domain, when used by vPCI, will use
pcidevs_read_lock, expecting that only pdev->vpci needs to be accessed.
If this is not the case and we need to modify the contents of pdev, we
need to acquire spin_lock_recursive(&_pcidevs_lock) with a new helper
(see 5).

5. A new helper is needed to acquire spin_lock_recursive(&_pcidevs_lock);
This will be used by at least vPCI code if it needs modifying
something in pdev other than pdev->vpci. In that case
we "upgrade" pcidevs_read_lock() to pcidevs_lock()

Question: can anyone please explain why pcidevs is a recursive lock?

>
> Jan
>
Thank you and hope to hear your thought on the above,
Oleksandr

Re: [PATCH] tools/xenstore: add error indicator to ring page

2022-02-15 Thread Juergen Gross

On 15.02.22 16:42, Anthony PERARD wrote:

On Thu, Feb 10, 2022 at 12:16:20PM +0100, Juergen Gross wrote:

+The "Connection error indicator" is used to let the server indicate it has
+detected some error that led to deactivation of the connection by the server.
+If the feature has been advertised then the "Connection error indicator" may
+take the following values:
+
+Value   Description
+-
+0   No error, connection is valid
+1   Communication problems (event channel not functional)
+2   Inconsistent producer or consumer offset
+3   Protocol violation (client data package too long)


Is this meant to be the complete set of possible error values? If in the
future we want to add more possible errors, is it going to need a new
feature bit and maybe a new error field?


No, as the guest is not opting into this feature, but just gets it
presented, there is no need to have another bit for new error values.

Note that this is a purely informational interface. The error value
(other than 0) is only for diagnostic purposes, there is no way a
guest could react in a sane way to a specific error case.
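As a purely illustrative sketch of how a diagnostic consumer (e.g. a debugging tool on the guest side) might render the proposed values; the helper name is made up and not part of any ABI:

```c
#include <stddef.h>

/* Hypothetical decoder for the proposed "Connection error indicator".
 * Values 0-3 follow the table in the patch; anything else is treated
 * as an unknown (future) diagnostic code. */
static const char *xs_ring_error_str(unsigned int err)
{
    static const char *const strs[] = {
        "no error, connection is valid",
        "communication problems (event channel not functional)",
        "inconsistent producer or consumer offset",
        "protocol violation (client data package too long)",
    };

    if ( err < sizeof(strs) / sizeof(strs[0]) )
        return strs[err];

    return "unknown error (diagnostic only)";
}
```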


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH] tools/xenstore: add error indicator to ring page

2022-02-15 Thread Anthony PERARD
On Thu, Feb 10, 2022 at 12:16:20PM +0100, Juergen Gross wrote:
> +The "Connection error indicator" is used to let the server indicate it has
> +detected some error that led to deactivation of the connection by the server.
> +If the feature has been advertised then the "Connection error indicator" may
> +take the following values:
> +
> +Value   Description
> +-
> +0   No error, connection is valid
> +1   Communication problems (event channel not functional)
> +2   Inconsistent producer or consumer offset
> +3   Protocol violation (client data package too long)

Is this meant to be the complete set of possible error values? If in the
future we want to add more possible errors, is it going to need a new
feature bit and maybe a new error field?

Thanks,

-- 
Anthony PERARD



Re: [PATCH] rwlock: remove unneeded subtraction

2022-02-15 Thread Roger Pau Monné
On Tue, Feb 15, 2022 at 03:45:00PM +0100, Jan Beulich wrote:
> On 15.02.2022 15:16, Roger Pau Monné wrote:
> > I could add to the commit message:
> > 
> > "Originally _QR_BIAS was subtracted from the result of
> > atomic_add_return_acquire in order to prevent GCC from emitting an
> > unneeded ADD instruction. This being in the lock slow path, such
> > optimizations don't seem likely to make any relevant performance
> > difference. Also modern GCC and CLANG versions will already avoid
> > emitting the ADD instruction."
> 
> ... I'm fine with this as explanation; I'd also be fine adding
> this to the description while committing.

Sure, thanks.

Roger.
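The reasoning in the commit message above can be demonstrated with plain C11 atomics. The constants below are illustrative stand-ins, not Xen's real rwlock definitions; the point is that when the writer state lives in bits the reader bias does not touch, subtracting the bias back out never changes the writer bits, so the writer check works on either the pre-add or the post-add value:

```c
#include <stdatomic.h>

/* Illustrative stand-ins, NOT Xen's real layout. */
#define QW_WMASK 0x1ffU   /* writer state: low bits */
#define QR_BIAS  0x200U   /* one reader: counted from bit 9 up */

static atomic_uint cnts;

/* Old style: subtract the bias back out of the post-add value, hinting
 * the compiler to reuse the XADD result (atomic_fetch_add returns the
 * old value, which equals the new value minus QR_BIAS). */
static unsigned int reader_cnts_old(void)
{
    return atomic_fetch_add(&cnts, QR_BIAS);
}

/* New style: simply use the post-add value. */
static unsigned int reader_cnts_new(void)
{
    return atomic_fetch_add(&cnts, QR_BIAS) + QR_BIAS;
}
```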



[PATCH v2] xen/arm: vpci: remove PCI I/O ranges property value

2022-02-15 Thread Rahul Singh
PCI I/O space is not mapped to dom0 when PCI passthrough is enabled,
and there is no vpci trap handler registered for the I/O BAR.

Remove the PCI I/O ranges property value from the dom0 device tree node so
that dom0 Linux will not allocate I/O space for PCI devices if
pci-passthrough is enabled.

Signed-off-by: Rahul Singh 
---
 xen/arch/arm/domain_build.c   | 29 +++
 xen/common/device_tree.c  | 69 +++
 xen/include/xen/device_tree.h | 10 +
 3 files changed, 108 insertions(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 6931c022a2..7cfe64fe97 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -648,6 +648,31 @@ static void __init allocate_static_memory(struct domain *d,
 }
 #endif
 
+/*
+ * PCI I/O BARs are not mapped to dom0 when PCI passthrough is enabled, and
+ * there is no trap handler registered for the I/O BAR, therefore remove the
+ * I/O range property from the device tree node for dom0.
+ */
+static int handle_linux_pci_io_ranges(struct kernel_info *kinfo,
+  const struct dt_device_node *node)
+{
+if ( !is_pci_passthrough_enabled() )
+return 0;
+
+if ( !dt_device_type_is_equal(node, "pci") )
+return 0;
+
+/*
+ * The current heuristic assumes that a device is a host bridge
+ * if the type is "pci" and the parent type is not "pci".
+ */
+if ( node->parent && dt_device_type_is_equal(node->parent, "pci") )
+return 0;
+
+return dt_pci_remove_io_ranges(kinfo->fdt, node);
+}
+
 /*
  * When PCI passthrough is available we want to keep the
  * "linux,pci-domain" in sync for every host bridge.
@@ -723,6 +748,10 @@ static int __init write_properties(struct domain *d, 
struct kernel_info *kinfo,
 if ( iommu_node && device_get_class(iommu_node) != DEVICE_IOMMU )
 iommu_node = NULL;
 
+res = handle_linux_pci_io_ranges(kinfo, node);
+if ( res )
+return res;
+
 dt_for_each_property_node (node, prop)
 {
 const void *prop_data = prop->value;
diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 4aae281e89..55a883e0f6 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -2195,6 +2195,75 @@ int dt_get_pci_domain_nr(struct dt_device_node *node)
 return (u16)domain;
 }
 
+int dt_pci_remove_io_ranges(void *fdt, const struct dt_device_node *dev)
+{
+const struct dt_device_node *parent = NULL;
+const struct dt_bus *bus, *pbus;
+unsigned int rlen;
+int na, ns, pna, pns, rone;
+const __be32 *ranges;
+__be32 regs[((GUEST_ROOT_ADDRESS_CELLS * 2) + GUEST_ROOT_SIZE_CELLS + 1)
+   * 2];
+__be32 *addr = &regs[0];
+
+bus = dt_match_bus(dev);
+if ( !bus )
+return 0; /* device is not a bus */
+
+parent = dt_get_parent(dev);
+if ( !parent )
+return -EINVAL;
+
+ranges = dt_get_property(dev, "ranges", &rlen);
+if ( !ranges )
+{
+printk(XENLOG_ERR "DT: no ranges; cannot enumerate %s\n",
+   dev->full_name);
+return -EINVAL;
+}
+if ( !rlen ) /* Nothing to do */
+return 0;
+
+bus->count_cells(dev, &na, &ns);
+if ( !DT_CHECK_COUNTS(na, ns) )
+{
+printk(XENLOG_ERR "dt_parse: Bad cell count for device %s\n",
+   dev->full_name);
+return -EINVAL;
+}
+
+pbus = dt_match_bus(parent);
+if ( !pbus )
+{
+printk(XENLOG_ERR "DT: %s is not a valid bus\n", parent->full_name);
+return -EINVAL;
+}
+
+pbus->count_cells(dev, &pna, &pns);
+if ( !DT_CHECK_COUNTS(pna, pns) )
+{
+printk(XENLOG_ERR "dt_parse: Bad cell count for parent %s\n",
+   dev->full_name);
+return -EINVAL;
+}
+
+/* Now walk through the ranges */
+rlen /= 4;
+rone = na + pna + ns;
+for ( ; rlen >= rone; rlen -= rone, ranges += rone )
+{
+unsigned int flags = bus->get_flags(ranges);
+if ( flags & IORESOURCE_IO )
+continue;
+
+memcpy(addr, ranges, 4 * rone);
+
+addr += rone;
+}
+
+return fdt_property(fdt, "ranges", regs, sizeof(regs));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index fd6cd00b43..580231f872 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -849,6 +849,16 @@ int dt_count_phandle_with_args(const struct dt_device_node 
*np,
  */
 int dt_get_pci_domain_nr(struct dt_device_node *node);
 
+/**
+ * dt_pci_remove_io_ranges - Remove the PCI I/O ranges property value.
+ * @fdt: Pointer to the flattened device tree.
+ * @node: Device tree node.
+ *
+ * This function will remove the PCI I/O ranges property from the PCI device
+ * tree node.
+ */
+int dt_pci_remove_io_ranges(void *fdt, const struct dt_device_node *node);
+
 struct dt_device_node *dt_find_node_by_phandle(dt_phandle handle);
 
 #ifdef CONFIG_DEVICE
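For reference, the IORESOURCE_IO test in the patch's range walk boils down to the bus get_flags hook decoding the first (phys.hi) cell of each PCI "ranges" entry; per the IEEE 1275 PCI bus binding, bits 25:24 of that cell hold the address space code. A minimal standalone sketch (helper names hypothetical):

```c
#include <stdint.h>

/* Space codes from the phys.hi cell of a PCI "ranges"/"reg" entry
 * (IEEE 1275 PCI bus binding): 0 = config, 1 = I/O, 2 = 32-bit MMIO,
 * 3 = 64-bit MMIO. */
static unsigned int pci_range_space(uint32_t phys_hi)
{
    return (phys_hi >> 24) & 0x3;
}

static int pci_range_is_io(uint32_t phys_hi)
{
    return pci_range_space(phys_hi) == 1;
}
```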

[PATCH v2 3/3] xen/vpci: msix: move read/write call to MSI-X PBA entry to arch file

2022-02-15 Thread Rahul Singh
The {read,write}{l,q} function arguments differ between ARM and x86:
the ARM {read,write}{l,q} functions take a pointer, whereas the x86
{read,write}{l,q} functions take the address itself.

{read,write}{l,q} is only used in the common file to access the MSI-X PBA
structure. To avoid impacting other x86 code and to keep the code common,
move the read/write calls to the MSI-X PBA into an arch-specific file.
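As a standalone illustration of the two prototype styles the message describes (these are simplified stand-ins, not the real Xen accessors, which additionally carry MMIO and memory-ordering semantics):

```c
#include <stdint.h>

/* Arm style: the accessor takes a pointer. */
static inline uint32_t arm_style_readl(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* x86 style: the accessor takes the (already mapped) address itself. */
static inline uint32_t x86_style_readl(unsigned long addr)
{
    return *(const volatile uint32_t *)addr;
}
```

Because the argument types differ, a common caller cannot portably invoke either form directly, hence hiding these calls behind arch-specific helpers.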

Signed-off-by: Rahul Singh 
---
Changes since v1:
 - Added in this version
---
 xen/arch/x86/hvm/vmsi.c | 47 +
 xen/drivers/vpci/msix.c | 43 ++---
 xen/include/xen/vpci.h  |  6 ++
 3 files changed, 55 insertions(+), 41 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 761ce674d7..f124a1d07d 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -1033,4 +1033,51 @@ void vpci_msix_arch_register(struct vpci_msix *msix, 
struct domain *d)
 
 list_add(&msix->next, &d->arch.hvm.msix_tables);
 }
+
+bool vpci_msix_arch_pba_read(unsigned long addr, unsigned int len,
+ unsigned long *data)
+{
+/*
+ * Access to PBA.
+ *
+ * TODO: note that this relies on having the PBA identity mapped to the
+ * guest address space. If this changes the address will need to be
+ * translated.
+ */
+switch ( len )
+{
+case 4:
+*data = readl(addr);
+break;
+
+case 8:
+*data = readq(addr);
+break;
+
+default:
+ASSERT_UNREACHABLE();
+break;
+}
+
+return true;
+}
+
+void vpci_msix_arch_pba_write(unsigned long addr, unsigned int len,
+  unsigned long data)
+{
+switch ( len )
+{
+case 4:
+writel(data, addr);
+break;
+
+case 8:
+writeq(data, addr);
+break;
+
+default:
+ASSERT_UNREACHABLE();
+break;
+}
+}
 #endif /* CONFIG_HAS_VPCI */
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 5b315757ef..b6720f1a1a 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -170,31 +170,7 @@ bool vpci_msix_read(struct vpci_msix *msix, unsigned long 
addr,
 return true;
 
 if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
-{
-/*
- * Access to PBA.
- *
- * TODO: note that this relies on having the PBA identity mapped to the
- * guest address space. If this changes the address will need to be
- * translated.
- */
-switch ( len )
-{
-case 4:
-*data = readl(addr);
-break;
-
-case 8:
-*data = readq(addr);
-break;
-
-default:
-ASSERT_UNREACHABLE();
-break;
-}
-
-return true;
-}
+return vpci_msix_arch_pba_read(addr, len, data);
 
 spin_lock(&msix->pdev->vpci->lock);
 entry = get_entry(msix, addr);
@@ -247,22 +223,7 @@ bool vpci_msix_write(struct vpci_msix *msix, unsigned long 
addr,
 {
 /* Ignore writes to PBA for DomUs, its behavior is undefined. */
 if ( is_hardware_domain(d) )
-{
-switch ( len )
-{
-case 4:
-writel(data, addr);
-break;
-
-case 8:
-writeq(data, addr);
-break;
-
-default:
-ASSERT_UNREACHABLE();
-break;
-}
-}
+vpci_msix_arch_pba_write(addr, len, data);
 
 return true;
 }
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 1c36845abf..a61daf9d53 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -231,6 +231,12 @@ bool vpci_msix_write(struct vpci_msix *msix, unsigned long 
addr,
 bool vpci_msix_read(struct vpci_msix *msix, unsigned long addr,
 unsigned int len, unsigned long *data);
 
+bool vpci_msix_arch_pba_read(unsigned long addr, unsigned int len,
+ unsigned long *data);
+
+void vpci_msix_arch_pba_write(unsigned long addr, unsigned int len,
+  unsigned long data);
+
 #endif /* __XEN__ */
 
 #else /* !CONFIG_HAS_VPCI */
-- 
2.25.1




[PATCH v2 2/3] xen/vpci: msix: change return value of vpci_msix_{read,write}

2022-02-15 Thread Rahul Singh
Return value is different for the MMIO handler on ARM and x86
architecture.

To make the code common to both architectures, change the return value
of vpci_msix_{read,write} to bool. The architecture-specific return value
will be handled in arch code.

Signed-off-by: Rahul Singh 
---
Changes since v1:
 - Added in this version
---
 xen/arch/x86/hvm/vmsi.c | 10 --
 xen/drivers/vpci/msix.c | 24 
 xen/include/xen/vpci.h  |  8 
 3 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 17426f238c..761ce674d7 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -1002,7 +1002,10 @@ static int x86_msix_write(struct vcpu *v, unsigned long 
addr, unsigned int len,
 const struct domain *d = v->domain;
 struct vpci_msix *msix = vpci_msix_find(d, addr);
 
-return vpci_msix_write(msix, addr, len, data);
+if ( !vpci_msix_write(msix, addr, len, data) )
+return X86EMUL_RETRY;
+
+return X86EMUL_OKAY;
 }
 
 static int x86_msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
@@ -1011,7 +1014,10 @@ static int x86_msix_read(struct vcpu *v, unsigned long 
addr, unsigned int len,
 const struct domain *d = v->domain;
 struct vpci_msix *msix = vpci_msix_find(d, addr);
 
-return vpci_msix_read(msix, addr, len, data);
+if ( !vpci_msix_read(msix, addr, len, data) )
+return X86EMUL_RETRY;
+
+return X86EMUL_OKAY;
 }
 
 static const struct hvm_mmio_ops vpci_msix_table_ops = {
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index d89396a3b4..5b315757ef 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -155,8 +155,8 @@ static struct vpci_msix_entry *get_entry(struct vpci_msix 
*msix,
 return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
 }
 
-int vpci_msix_read(struct vpci_msix *msix, unsigned long addr,
-   unsigned int len, unsigned long *data)
+bool vpci_msix_read(struct vpci_msix *msix, unsigned long addr,
+unsigned int len, unsigned long *data)
 {
 const struct vpci_msix_entry *entry;
 unsigned int offset;
@@ -164,10 +164,10 @@ int vpci_msix_read(struct vpci_msix *msix, unsigned long 
addr,
 *data = ~0ul;
 
 if ( !msix )
-return X86EMUL_RETRY;
+return false;
 
 if ( !access_allowed(msix->pdev, addr, len) )
-return X86EMUL_OKAY;
+return true;
 
 if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
 {
@@ -193,7 +193,7 @@ int vpci_msix_read(struct vpci_msix *msix, unsigned long 
addr,
 break;
 }
 
-return X86EMUL_OKAY;
+return true;
 }
 
 spin_lock(&msix->pdev->vpci->lock);
@@ -227,21 +227,21 @@ int vpci_msix_read(struct vpci_msix *msix, unsigned long 
addr,
 }
 spin_unlock(&msix->pdev->vpci->lock);
 
-return X86EMUL_OKAY;
+return true;
 }
 
-int vpci_msix_write(struct vpci_msix *msix, unsigned long addr,
-unsigned int len, unsigned long data)
+bool vpci_msix_write(struct vpci_msix *msix, unsigned long addr,
+ unsigned int len, unsigned long data)
 {
 const struct domain *d = msix->pdev->domain;
 struct vpci_msix_entry *entry;
 unsigned int offset;
 
 if ( !msix )
-return X86EMUL_RETRY;
+return false;
 
 if ( !access_allowed(msix->pdev, addr, len) )
-return X86EMUL_OKAY;
+return true;
 
 if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
 {
@@ -264,7 +264,7 @@ int vpci_msix_write(struct vpci_msix *msix, unsigned long 
addr,
 }
 }
 
-return X86EMUL_OKAY;
+return true;
 }
 
 spin_lock(&msix->pdev->vpci->lock);
@@ -342,7 +342,7 @@ int vpci_msix_write(struct vpci_msix *msix, unsigned long 
addr,
 }
 spin_unlock(&msix->pdev->vpci->lock);
 
-return X86EMUL_OKAY;
+return true;
 }
 
 static int init_msix(struct pci_dev *pdev)
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 0381a2c911..1c36845abf 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -225,11 +225,11 @@ bool vpci_ecam_read(pci_sbdf_t sbdf, unsigned int reg, 
unsigned int len,
 
 void vpci_msix_arch_register(struct vpci_msix *msix, struct domain *d);
 
-int vpci_msix_write(struct vpci_msix *msix, unsigned long addr,
-unsigned int len, unsigned long data);
+bool vpci_msix_write(struct vpci_msix *msix, unsigned long addr,
+ unsigned int len, unsigned long data);
 
-int vpci_msix_read(struct vpci_msix *msix, unsigned long addr,
-   unsigned int len, unsigned long *data);
+bool vpci_msix_read(struct vpci_msix *msix, unsigned long addr,
+unsigned int len, unsigned long *data);
 
 #endif /* __XEN__ */
 
-- 
2.25.1




[PATCH v2 1/3] xen/vpci: msix: move x86 specific code to x86 file

2022-02-15 Thread Rahul Singh
The vpci/msix.c file will be used by the Arm architecture once vpci MSI-X
support is added to Arm, but there is x86-specific code in this
file.

Move the x86-specific code to the x86/hvm/vmsi.c file to make sure the
common code can be used by other architectures.

No functional change intended.

Signed-off-by: Rahul Singh 
---
Changes since v1:
 - Split the return value change of vpci_msix_{read,write} into a separate patch.
 - Move the {read,write}{l,q} calls in common code to an arch-specific file.
---
 xen/arch/x86/hvm/vmsi.c | 102 
 xen/arch/x86/include/asm/msi.h  |  28 
 xen/arch/x86/msi.c  |   2 +-
 xen/drivers/passthrough/amd/iommu.h |   1 +
 xen/drivers/vpci/msi.c  |   3 +-
 xen/drivers/vpci/msix.c | 101 +++
 xen/include/xen/msi.h   |  28 
 xen/include/xen/vpci.h  |  13 
 8 files changed, 154 insertions(+), 124 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 13e2a190b4..17426f238c 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -925,4 +925,106 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
 
 return 0;
 }
+
+int vpci_make_msix_hole(const struct pci_dev *pdev)
+{
+struct domain *d = pdev->domain;
+unsigned int i;
+
+if ( !pdev->vpci->msix )
+return 0;
+
+/* Make sure there's a hole for the MSIX table/PBA in the p2m. */
+for ( i = 0; i < ARRAY_SIZE(pdev->vpci->msix->tables); i++ )
+{
+unsigned long start = PFN_DOWN(vmsix_table_addr(pdev->vpci, i));
+unsigned long end = PFN_DOWN(vmsix_table_addr(pdev->vpci, i) +
+ vmsix_table_size(pdev->vpci, i) - 1);
+
+for ( ; start <= end; start++ )
+{
+p2m_type_t t;
+mfn_t mfn = get_gfn_query(d, start, &t);
+
+switch ( t )
+{
+case p2m_mmio_dm:
+case p2m_invalid:
+break;
+case p2m_mmio_direct:
+if ( mfn_x(mfn) == start )
+{
+clear_identity_p2m_entry(d, start);
+break;
+}
+/* fallthrough. */
+default:
+put_gfn(d, start);
+gprintk(XENLOG_WARNING,
+"%pp: existing mapping (mfn: %" PRI_mfn
+"type: %d) at %#lx clobbers MSIX MMIO area\n",
+&pdev->sbdf, mfn_x(mfn), t, start);
+return -EEXIST;
+}
+put_gfn(d, start);
+}
+}
+
+return 0;
+}
+
+struct vpci_msix *vpci_msix_find(const struct domain *d, unsigned long addr)
+{
+struct vpci_msix *msix;
+
+list_for_each_entry ( msix, &d->arch.hvm.msix_tables, next )
+{
+const struct vpci_bar *bars = msix->pdev->vpci->header.bars;
+unsigned int i;
+
+for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
+if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
+ VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
+return msix;
+}
+
+return NULL;
+}
+
+static int x86_msix_accept(struct vcpu *v, unsigned long addr)
+{
+return !!vpci_msix_find(v->domain, addr);
+}
+
+static int x86_msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
+  unsigned long data)
+{
+const struct domain *d = v->domain;
+struct vpci_msix *msix = vpci_msix_find(d, addr);
+
+return vpci_msix_write(msix, addr, len, data);
+}
+
+static int x86_msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
+ unsigned long *data)
+{
+const struct domain *d = v->domain;
+struct vpci_msix *msix = vpci_msix_find(d, addr);
+
+return vpci_msix_read(msix, addr, len, data);
+}
+
+static const struct hvm_mmio_ops vpci_msix_table_ops = {
+.check = x86_msix_accept,
+.read = x86_msix_read,
+.write = x86_msix_write,
+};
+
+void vpci_msix_arch_register(struct vpci_msix *msix, struct domain *d)
+{
+if ( list_empty(&d->arch.hvm.msix_tables) )
+register_mmio_handler(d, &vpci_msix_table_ops);
+
+list_add(&msix->next, &d->arch.hvm.msix_tables);
+}
 #endif /* CONFIG_HAS_VPCI */
diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
index e228b0f3f3..0a7912e9be 100644
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -148,34 +148,6 @@ int msi_free_irq(struct msi_desc *entry);
  */
 #define NR_HP_RESERVED_VECTORS 20
 
-#define msi_control_reg(base)  (base + PCI_MSI_FLAGS)
-#define msi_lower_address_reg(base)(base + PCI_MSI_ADDRESS_LO)
-#define msi_upper_address_reg(base)(base + PCI_MSI_ADDRESS_HI)
-#define msi_data_reg(base, is64bit)\
-   ( (is64bit == 1) ? base+PCI_MSI_DATA_64 : base+PCI_MSI_DATA_32 )
-#define msi_mask_bits_reg(base, is64bit) \
-  

[PATCH v2 0/3] xen/vpci: msix: Make VPCI MSI-X code non-x86 specific

2022-02-15 Thread Rahul Singh
This patch series is v2 of preparatory work to make VPCI MSI-X code non-x86
specific.

Rahul Singh (3):
  xen/vpci: msix: move x86 specific code to x86 file
  xen/vpci: msix: change return value of vpci_msix_{read,write}
  xen/vpci: msix: move read/write call to MSI-X PBA entry to arch file

 xen/arch/x86/hvm/vmsi.c | 155 +++
 xen/arch/x86/include/asm/msi.h  |  28 -
 xen/arch/x86/msi.c  |   2 +-
 xen/drivers/passthrough/amd/iommu.h |   1 +
 xen/drivers/vpci/msi.c  |   3 +-
 xen/drivers/vpci/msix.c | 158 +++-
 xen/include/xen/msi.h   |  28 +
 xen/include/xen/vpci.h  |  19 
 8 files changed, 222 insertions(+), 172 deletions(-)

-- 
2.25.1




Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-02-15 Thread Jan Beulich
On 15.02.2022 16:10, Jane Malalane wrote:
> On 15/02/2022 10:19, Jan Beulich wrote:
>> [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments 
>> unless you have verified the sender and know the content is safe.
>>
>> On 15.02.2022 11:14, Jane Malalane wrote:
>>> On 15/02/2022 07:09, Jan Beulich wrote:

 On 14.02.2022 18:09, Jane Malalane wrote:
> On 14/02/2022 13:18, Jan Beulich wrote:
>>
>> On 14.02.2022 14:11, Jane Malalane wrote:
>>> On 11/02/2022 11:46, Jan Beulich wrote:

 On 11.02.2022 12:29, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 10:06:48AM +, Jane Malalane wrote:
>> On 10/02/2022 10:03, Roger Pau Monné wrote:
>>> On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote:
 diff --git a/xen/arch/x86/hvm/vmx/vmcs.c 
 b/xen/arch/x86/hvm/vmx/vmcs.c
 index 7ab15e07a0..4060aef1bd 100644
 --- a/xen/arch/x86/hvm/vmx/vmcs.c
 +++ b/xen/arch/x86/hvm/vmx/vmcs.c
 @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp)
   MSR_IA32_VMX_PROCBASED_CTLS2, &mismatch);
   }
   
 +/* Check whether hardware supports accelerated xapic and 
 x2apic. */
 +if ( bsp )
 +{
 +assisted_xapic_available = 
 cpu_has_vmx_virtualize_apic_accesses;
 +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt ||
 + 
 cpu_has_vmx_virtual_intr_delivery) &&
 +
 cpu_has_vmx_virtualize_x2apic_mode;
>>>
>>> I've been think about this, and it seems kind of asymmetric that for
>>> xAPIC mode we report hw assisted support only with
>>> virtualize_apic_accesses available, while for x2APIC we require
>>> virtualize_x2apic_mode plus either apic_reg_virt or
>>> virtual_intr_delivery.
>>>
>>> I think we likely need to be more consistent here, and report hw
>>> assisted x2APIC support as long as virtualize_x2apic_mode is
>>> available.
>>>
>>> This will likely have some effect on patch 2 also, as you will have 
>>> to
>>> adjust vmx_vlapic_msr_changed.
>>>
>>> Thanks, Roger.
>>
>> Any other thoughts on this? As on one hand it is asymmetric but also
>> there isn't much assistance with only virtualize_x2apic_mode set as, 
>> in
>> this case, a VM exit will be avoided only when trying to access the 
>> TPR
>> register.
>
> I've been thinking about this, and reporting hardware assisted
> x{2}APIC virtualization with just
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES or
> SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE doesn't seem very helpful. While
> those provide some assistance to the VMM in order to handle APIC
> accesses, it will still require a trap into the hypervisor to handle
> most of the accesses.
>
> So maybe we should only report hardware assisted support when the
> mentioned features are present together with
> SECONDARY_EXEC_APIC_REGISTER_VIRT?

 Not sure - "some assistance" seems still a little better than none at 
 all.
 Which route to go depends on what exactly we intend the bit to be used 
 for.

>>> True. I intended this bit to be specifically for enabling
>>> assisted_x{2}apic. So, would it be inconsistent to report hardware
>>> assistance with just VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE
>>> but still claim that x{2}apic is virtualized if no MSR accesses are
>>> intercepted with XEN_HVM_CPUID_X2APIC_VIRT (in traps.c) so that, as you
>>> say, the guest gets at least "some assistance" instead of none but we
>>> still claim x{2}apic virtualization when it is actually complete? Maybe
>>> I could also add a comment alluding to this in the xl documentation.
>>
>> To rephrase my earlier point: Which kind of decisions are the consumer(s)
>> of us reporting hardware assistance going to take? In how far is there a
>> risk that "some assistance" is overall going to lead to a loss of
>> performance? I guess I'd need to see comment and a

Re: [PATCH v2 07/70] x86: Build check for embedded endbr64 instructions

2022-02-15 Thread Jan Beulich
On 14.02.2022 13:50, Andrew Cooper wrote:
> From: Marek Marczykowski-Górecki 
> 
> Embedded endbr64 instructions mark legal indirect branches as far as the CPU
> is concerned, which aren't legal as far as the logic is concerned.

I think it would help if it was clarified what "embedded" actually means
here.

> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -155,6 +155,9 @@ $(TARGET)-syms: prelink.o xen.lds
>   $(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
>   $(LD) $(XEN_LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
>   $(@D)/.$(@F).1.o -o $@
> +ifeq ($(CONFIG_XEN_IBT),y)
> + $(SHELL) $(BASEDIR)/tools/check-endbr.sh $@
> +endif
>   $(NM) -pa --format=sysv $(@D)/$(@F) \
>   | $(BASEDIR)/tools/symbols --all-symbols --xensyms --sysv 
> --sort \
>   >$(@D)/$(@F).map

The same wants doing on xen.efi, I guess?

> --- /dev/null
> +++ b/xen/tools/check-endbr.sh
> @@ -0,0 +1,76 @@
> +#!/bin/sh
> +
> +#
> +# Usage ./$0 xen-syms
> +#
> +
> +set -e
> +
> +OBJCOPY="${OBJCOPY:-objcopy} -j .text $1"
> +OBJDUMP="${OBJDUMP:-objdump} -j .text $1"
> +
> +D=$(mktemp -d)
> +trap "rm -rf $D" EXIT
> +
> +TEXT_BIN=$D/xen-syms.text
> +VALID=$D/valid-addrs
> +ALL=$D/all-addrs
> +BAD=$D/bad-addrs
> +
> +#
> +# First, look for all the valid endbr64 instructions.
> +# A worst-case disassembly, viewed through cat -A, may look like:
> +#
> +# 82d040337bd4 :$
> +# 82d040337bd4:^If3 0f 1e fa  ^Iendbr64 $
> +# 82d040337bd8:^Ieb fe^Ijmp82d040337bd8 
> $
> +# 82d040337bda:^Ib8 f3 0f 1e fa   ^Imov$0xfa1e0ff3,%eax$
> +#
> +# Want to grab the address of endbr64 instructions only, ignoring function
> +# names/jump labels/etc, so look for 'endbr64' preceded by a tab and with any
> +# number of trailing spaces before the end of the line.
> +#
> +${OBJDUMP} -d | grep '   endbr64 *$' | cut -f 1 -d ':' > $VALID &

Since you look at only .text the risk of the disassembler coming
out of sync with the actual instruction stream is lower than when
32- and 16-bit code was also part of what is disassembled, but it's
not zero. Any zero-padding inserted anywhere by the linker can
result in an immediately following ENDBR being missed (because
sequences of zeros resemble 2-byte insns). While this risk may be
acceptable, I think it wants mentioning at least in the description,
maybe even at the top of the script (where one would likely look
first after it spitting out an error).

Do you perhaps want to also pass -w to objdump, to eliminate the
risk of getting confused by split lines?

> +#
> +# Second, look for any endbr64 byte sequence
> +# This has a couple of complications:
> +#
> +# 1) Grep binary search isn't VMA aware.  Copy .text out as binary, causing
> +#the grep offset to be from the start of .text.
> +#
> +# 2) AWK can't add 64bit integers, because internally all numbers are 
> doubles.
> +#When the upper bits are set, the exponent's worth of precision is lost in
> +#the lower bits, rounding integers to the nearest 4k.
> +#
> +#Instead, use the fact that Xen's .text is within a 1G aligned region, 
> and
> +#split the VMA in half so AWK's numeric addition is only working on 32 
> bit
> +#numbers, which don't lose precision.
> +#
> +eval $(${OBJDUMP} -h | awk '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", 
> substr($4, 1, 8), substr($4, 9, 16)}')
> +
> +${OBJCOPY} -O binary $TEXT_BIN
> +grep -aob "$(printf '\363\17\36\372')" $TEXT_BIN |
> +awk -F':' '{printf "%s%x\n", "'$vma_hi'", strtonum(0x'$vma_lo') + $1}' > 
> $ALL

None of the three options passed to grep look to be standardized.
Is this going to cause problems on non-Linux systems? Should this
checking perhaps be put behind a separate Kconfig option?

> +# Wait for $VALID to become complete
> +wait
> +
> +# Sanity check $VALID and $ALL, in case the string parsing bitrots
> +val_sz=$(stat -c '%s' $VALID)
> +all_sz=$(stat -c '%s' $ALL)
> +[ "$val_sz" -eq 0 ] && { echo "Error: Empty valid-addrs" >&2; exit 
> 1; }
> +[ "$all_sz" -eq 0 ] && { echo "Error: Empty all-addrs" >&2; exit 1; }
> +[ "$all_sz" -lt "$val_sz" ] && { echo "Error: More valid-addrs than 
> all-addrs" >&2; exit 1; }
> +
> +# $BAD = $ALL - $VALID
> +join -v 2 $VALID $ALL > $BAD
> +nr_bad=$(wc -l < $BAD)
> +
> +# Success
> +[ "$nr_bad" -eq 0 ] && exit 0
> +
> +# Failure
> +echo "Fail: Found ${nr_bad} embedded endbr64 instructions" >&2
> +addr2line -afip -e $1 < $BAD >&2

There probably also wants to be an ADDR2LINE variable then. If
one overrides objdump and objcopy, one would likely want/need to
override this one as well.

Jan




Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-02-15 Thread Jane Malalane
On 15/02/2022 10:19, Jan Beulich wrote:
> 
> On 15.02.2022 11:14, Jane Malalane wrote:
>> On 15/02/2022 07:09, Jan Beulich wrote:
>>>
>>> On 14.02.2022 18:09, Jane Malalane wrote:
 On 14/02/2022 13:18, Jan Beulich wrote:
>
> On 14.02.2022 14:11, Jane Malalane wrote:
>> On 11/02/2022 11:46, Jan Beulich wrote:
>>>
>>> On 11.02.2022 12:29, Roger Pau Monné wrote:
 On Fri, Feb 11, 2022 at 10:06:48AM +, Jane Malalane wrote:
> On 10/02/2022 10:03, Roger Pau Monné wrote:
>> On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote:
>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c 
>>> b/xen/arch/x86/hvm/vmx/vmcs.c
>>> index 7ab15e07a0..4060aef1bd 100644
>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>>> @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp)
>>>   MSR_IA32_VMX_PROCBASED_CTLS2, &mismatch);
>>>   }
>>>   
>>> +/* Check whether hardware supports accelerated xapic and 
>>> x2apic. */
>>> +if ( bsp )
>>> +{
>>> +assisted_xapic_available = 
>>> cpu_has_vmx_virtualize_apic_accesses;
>>> +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt ||
>>> + 
>>> cpu_has_vmx_virtual_intr_delivery) &&
>>> +
>>> cpu_has_vmx_virtualize_x2apic_mode;
>>
>> I've been think about this, and it seems kind of asymmetric that for
>> xAPIC mode we report hw assisted support only with
>> virtualize_apic_accesses available, while for x2APIC we require
>> virtualize_x2apic_mode plus either apic_reg_virt or
>> virtual_intr_delivery.
>>
>> I think we likely need to be more consistent here, and report hw
>> assisted x2APIC support as long as virtualize_x2apic_mode is
>> available.
>>
>> This will likely have some effect on patch 2 also, as you will have 
>> to
>> adjust vmx_vlapic_msr_changed.
>>
>> Thanks, Roger.
>
> Any other thoughts on this? As on one hand it is asymmetric but also
> there isn't much assistance with only virtualize_x2apic_mode set as, 
> in
> this case, a VM exit will be avoided only when trying to access the 
> TPR
> register.

 I've been thinking about this, and reporting hardware assisted
 x{2}APIC virtualization with just
 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES or
 SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE doesn't seem very helpful. While
 those provide some assistance to the VMM in order to handle APIC
 accesses, it will still require a trap into the hypervisor to handle
 most of the accesses.

 So maybe we should only report hardware assisted support when the
 mentioned features are present together with
 SECONDARY_EXEC_APIC_REGISTER_VIRT?
>>>
>>> Not sure - "some assistance" seems still a little better than none at 
>>> all.
>>> Which route to go depends on what exactly we intend the bit to be used 
>>> for.
>>>
>> True. I intended this bit to be specifically for enabling
>> assisted_x{2}apic. So, would it be inconsistent to report hardware
>> assistance with just VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE
>> but still claim that x{2}apic is virtualized if no MSR accesses are
>> intercepted with XEN_HVM_CPUID_X2APIC_VIRT (in traps.c) so that, as you
>> say, the guest gets at least "some assistance" instead of none but we
>> still claim x{2}apic virtualization when it is actually complete? Maybe
>> I could also add a comment alluding to this in the xl documentation.
>
> To rephrase my earlier point: Which kind of decisions are the consumer(s)
> of us reporting hardware assistance going to take? In how far is there a
> risk that "some assistance" is overall going to lead to a loss of
> performance? I guess I'd need to see comment and actual code all in one
> place ...
>
 So, I was thinking of adding something along the lines of:

 +=item B B<(x86 only)>
 +E

Re: [PATCH v2] tools/libxl: don't allow IOMMU usage with PoD

2022-02-15 Thread Anthony PERARD
On Thu, Feb 03, 2022 at 03:32:11PM +0100, Roger Pau Monne wrote:
>  if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
> -d_config->num_pcidevs && pod_enabled) {
> +d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
> +pod_enabled) {
>  ret = ERROR_INVAL;
> -LOGD(ERROR, domid,
> - "PCI device assignment for HVM guest failed due to PoD 
> enabled");
> +LOGD(ERROR, domid, "IOMMU not supported together with PoD");

I'm not sure that this new error message is going to be good enough to
point out the configuration issue for the guest.

One is going to set 'pci=["foo"]' or 'dtdev=["bar"]', which will enable
passthrough. Then they may get an error about IOMMU or PoD.
Should we maybe write something like this instead?

   "IOMMU or device passthrough not supported together with PoD"

Thanks,

-- 
Anthony PERARD



Re: [PATCH] tools: remove xenstore entries on vchan server closure

2022-02-15 Thread Oleksandr Andrushchenko


On 15.02.22 16:41, Jason Andryuk wrote:
> On Fri, Dec 10, 2021 at 7:35 AM Oleksandr Andrushchenko
>  wrote:
>> From: Oleksandr Andrushchenko 
>>
>> vchan server creates XenStore entries to advertise its event channel and
>> ring, but those are not removed after the server quits.
>> Add additional cleanup step, so those are removed, so clients do not try
>> to connect to a non-existing server.
>>
>> Signed-off-by: Oleksandr Andrushchenko 
>> ---
>>   tools/include/libxenvchan.h |  5 +
>>   tools/libs/vchan/init.c | 23 +++
>>   tools/libs/vchan/io.c   |  4 
>>   tools/libs/vchan/vchan.h| 31 +++
>>   4 files changed, 63 insertions(+)
>>   create mode 100644 tools/libs/vchan/vchan.h
>>
>> diff --git a/tools/include/libxenvchan.h b/tools/include/libxenvchan.h
>> index d6010b145df2..30cc73cf97e3 100644
>> --- a/tools/include/libxenvchan.h
>> +++ b/tools/include/libxenvchan.h
>> @@ -86,6 +86,11 @@ struct libxenvchan {
>>  int blocking:1;
>>  /* communication rings */
>>  struct libxenvchan_ring read, write;
>> +   /**
>> +* Base xenstore path for storing ring/event data used by the server
>> +* during cleanup.
>> +* */
>> +   char *xs_path;
>>   };
>>
>>   /**
>> diff --git a/tools/libs/vchan/init.c b/tools/libs/vchan/init.c
>> index c8510e6ce98a..c6b8674ef541 100644
>> --- a/tools/libs/vchan/init.c
>> +++ b/tools/libs/vchan/init.c
>> @@ -46,6 +46,8 @@
>>   #include 
>>   #include 
>>
>> +#include "vchan.h"
>> +
>>   #ifndef PAGE_SHIFT
>>   #define PAGE_SHIFT 12
>>   #endif
>> @@ -251,6 +253,10 @@ static int init_xs_srv(struct libxenvchan *ctrl, int 
>> domain, const char* xs_base
>>  char ref[16];
>>  char* domid_str = NULL;
>>  xs_transaction_t xs_trans = XBT_NULL;
>> +
>> +   // store the base path so we can clean up on server closure
>> +   ctrl->xs_path = strdup(xs_base);
> You don't check for NULL here, but you do check for NULL in
> close_xs_srv().  I guess it's okay, since it does the right thing.
> But I think it would be more robust to check for NULL here.  Is there
> a specific reason you wrote it this way?  Otherwise it looks good.
It does need a NULL check, thanks
It comes from writing code with all those allocations and the garbage
collector in the tools stack, where allocations "don't fail" ;)
But that is indeed not the case here, so a proper check is needed
I'll wait for other comments and send v2
>
> Regards,
> Jason
Thank you,
Oleksandr

Re: Metadata and signalling channels for Zephyr virtio-backends on Xen

2022-02-15 Thread Vincent Guittot
On Sat, 12 Feb 2022 at 00:34, Stefano Stabellini
 wrote:
>
> On Fri, 11 Feb 2022, Alex Bennée wrote:
> > > FYI, a good and promising approach to handle both SCMI and SCPI is the
> > > series recently submitted by EPAM to mediate SCMI and SCPI requests in
> > > Xen: https://marc.info/?l=xen-devel&m=163947444032590
> > >
> > > (Another "special" virtio backend is virtio-iommu for similar reasons:
> > > the guest p2m address mappings and also the IOMMU drivers are in Xen.
> > > It is not immediately clear whether a virtio-iommu backend would need to
> > > be in Xen or run as a process in dom0/domU.)
> > >
> > > On the other hand, for all the other "normal" protocols (e.g.
> > > virtio-net, virtio-block, etc.) the backend would naturally run as a
> > > process in dom0 or domU (e.g. QEMU in Dom0) as one would expect.
> >
> > Can domU's not be given particular access to HW they might want to
> > tweak? I assume at some point a block device backend needs to actually
> > talk to real HW to store the blocks (even if in most cases it would be a
> > kernel doing the HW access on it's behalf).
>
> Yes, it would. Block and network are subsystems with limited visibility,
> access, and harmful capabilities (assuming IOMMU).
>
> If the block device goes down or is misused, block might not work but
> everything else is expected to work. Block only requires visibility of
> the block device for it to work. The same is true for network, GPU, USB,
> etc.
>
> SCMI is different. If SCMI is misused the whole platform is affected.
> SCMI implies visibility of everything in the system. It is not much
> about emulating SCMI but more about mediating SCMI calls.
>
> In other words, SCMI is not a device, it is a core interface. In a Xen
> model, Xen virtualizes CPU and memory and other core features/interfaces
> (timers, interrupt controller, IOMMU, etc). The PCI root complex is
> handled by Xen too. Individual (PCI and non-PCI) devices are assigned to
> guests.
>
> These are the reasons why I think the best way to enable SCMI in
> upstream Xen is with a mediator in the hypervisor as it is currently in
> development. Any chances you could combine your efforts with EPAM's
> outstanding series? You might be able to spot gaps if any, and might
> even have already code to fill those gaps. It would be fantastic to have
> your reviews and/or contributions on xen-devel.
>
> Otherwise, if you have to run the virtio-scmi backend in userspace, why

Just to clarify, the goal is not to run the SCMI backend as a Linux
userspace app but to run a virtual power coprocessor that will handle
everything which is not system critical and will change from one
product to another, which makes it quite hard to maintain in the
hypervisor.

I have only looked at the cover letter, which mentions the use of SMC
calls which will be trapped by Xen before being modified and forwarded to
ATF. AFAICT, the ATF execution context is quite simple and synchronous
with the request. In our case, we want to be able to manage an I2C
device as an example, or to notify VMs with asynchronous events like
sensor or performance changes, which virtio-scmi supports


> not try to get it to work on Xen :-) It might not be the ideal solution,
> but it could be a good learning experience and pave the way for the
> other virtio backends which definitely will be in userspace
> (virtio-block, virtio-gpu, etc).
>
>
> > >> Currently the demo setup
> > >> is intermediated by a double-ended vhost-user daemon running on the
> > >> devbox acting as a go between a number of QEMU instances representing
> > >> the front and back-ends. You can view the architecture with Vincents
> > >> diagram here:
> > >>
> > >>   
> > >> https://docs.google.com/drawings/d/1YSuJUSjEdTi2oEUq4oG4A9pBKSEJTAp6hhcHKKhmYHs/edit?usp=sharing
> > >>
> > >> The key virtq handling is done over the special carve outs of shared
> > >> memory between the front end and guest. However the signalling is
> > >> currently over a virtio device on the backend. This is useful for the
> > >> PoC but obviously in a real system we don't have a hidden POSIX system
> > >> acting as a go between not to mention the additional latency it causes
> > >> with all those context switches.
> > >>
> > >> I was hoping we could get some more of the Xen experts to the next
> > >> Stratos sync (17th Feb) to go over approaches for a properly hosted on
> > >> Xen approach. From my recollection (Vincent please correct me if I'm
> > >> wrong) of last week the issues that need solving are:
> > >
> > > Unfortunately I have a regular conflict which prevents me from being
> > > able to join the Stratos calls. However, I can certainly make myself
> > > available for one call (unless something unexpected comes up).
> > >
> > >
> > >>  * How to handle configuration steps as FE guests come up
> > >>
> > >> The SCMI server will be a long running persistent backend because it is
> > >> managing real HW resources. However the guests may be ephemeral (or just
> > >> restarted) so we

Re: [PATCH] rwlock: remove unneeded subtraction

2022-02-15 Thread Jan Beulich
On 15.02.2022 15:16, Roger Pau Monné wrote:
> On Tue, Feb 15, 2022 at 02:22:02PM +0100, Jan Beulich wrote:
>> On 15.02.2022 14:13, Julien Grall wrote:
>>> On 15/02/2022 09:39, Roger Pau Monne wrote:
 There's no need to subtract _QR_BIAS from the lock value for storing
 in the local cnts variable in the read lock slow path: the users of
 the value in cnts only care about the writer-related bits and use a
 mask to get the value.

 Note that further setting of cnts in rspin_until_writer_unlock already
 do not subtract _QR_BIAS.
>>>
>>> The rwlock is a copy of the Linux implementation. So I looked at the 
>>> history to find out why _QR_BIAS was substracted.
>>>
>>> It looks like this was done to get better assembly on x86:
>>>
>>> commit f9852b74bec0117b888da39d070c323ea1cb7f4c
>>> Author: Peter Zijlstra 
>>> Date:   Mon Apr 18 01:27:03 2016 +0200
>>>
>>>  locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
>>>
>>>  The only reason for the current code is to make GCC emit only the
>>>  "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
>>>  the result), do so nicer.
>>>
>>>  Signed-off-by: Peter Zijlstra (Intel) 
>>>  Acked-by: Waiman Long 
>>>  Cc: Andrew Morton 
>>>  Cc: Linus Torvalds 
>>>  Cc: Paul E. McKenney 
>>>  Cc: Peter Zijlstra 
>>>  Cc: Thomas Gleixner 
>>>  Cc: linux-a...@vger.kernel.org
>>>  Cc: linux-ker...@vger.kernel.org
>>>  Signed-off-by: Ingo Molnar 
>>>
>>> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
>>> index fec082338668..19248ddf37ce 100644
>>> --- a/kernel/locking/qrwlock.c
>>> +++ b/kernel/locking/qrwlock.c
>>> @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, 
>>> u32 cnts)
>>>   * that accesses can't leak upwards out of our subsequent critical
>>>   * section in the case that the lock is currently held for write.
>>>   */
>>> -   cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
>>> +   cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
>>>  rspin_until_writer_unlock(lock, cnts);
>>>
>>>  /*
>>>
>>> This is a slowpath, so probably not a concern. But I thought I would 
>>> double check whether the x86 folks are still happy to proceed with that 
>>> in mind.
>>
>> Hmm, that's an interesting observation. Roger - did you inspect the
>> generated code? At the very least the description may want amending.
> 
> It seems to always generate the same code for me when using gcc 8.3,
> I even tried using arch_fetch_and_add directly, it always results
> in:
> 
> 82d04022d983:   f0 0f c1 03 lock xadd %eax,(%rbx)
> 82d04022d987:   25 00 30 00 00  and$0x3000,%eax
> 
> Similarly clang 13.0.0 seems to always generate:
> 
> 82d0402085de:   f0 0f c1 03 lock xadd %eax,(%rbx)
> 82d0402085e2:   89 c1   mov%eax,%ecx
> 82d0402085e4:   81 e1 00 30 00 00   and$0x3000,%ecx
> 
> Maybe I'm missing something, but I don't see a difference in the
> generated code.

I've looked myself in the meantime, and I can largely confirm this.
Clang 5 doesn't eliminate the "add" (or really "lea") though. But
nevertheless ...

> I could add to the commit message:
> 
> "Originally _QR_BIAS was subtracted from the result of
> atomic_add_return_acquire in order to prevent GCC from emitting an
> unneeded ADD instruction. This being in the lock slow path such
> optimizations don't seem likely to make any relevant performance
> difference. Also modern GCC and CLANG versions will already avoid
> emitting the ADD instruction."

... I'm fine with this as explanation; I'd also be fine adding
this to the description while committing.

Jan




Re: [PATCH] tools: remove xenstore entries on vchan server closure

2022-02-15 Thread Jason Andryuk
On Fri, Dec 10, 2021 at 7:35 AM Oleksandr Andrushchenko
 wrote:
>
> From: Oleksandr Andrushchenko 
>
> vchan server creates XenStore entries to advertise its event channel and
> ring, but those are not removed after the server quits.
> Add additional cleanup step, so those are removed, so clients do not try
> to connect to a non-existing server.
>
> Signed-off-by: Oleksandr Andrushchenko 
> ---
>  tools/include/libxenvchan.h |  5 +
>  tools/libs/vchan/init.c | 23 +++
>  tools/libs/vchan/io.c   |  4 
>  tools/libs/vchan/vchan.h| 31 +++
>  4 files changed, 63 insertions(+)
>  create mode 100644 tools/libs/vchan/vchan.h
>
> diff --git a/tools/include/libxenvchan.h b/tools/include/libxenvchan.h
> index d6010b145df2..30cc73cf97e3 100644
> --- a/tools/include/libxenvchan.h
> +++ b/tools/include/libxenvchan.h
> @@ -86,6 +86,11 @@ struct libxenvchan {
> int blocking:1;
> /* communication rings */
> struct libxenvchan_ring read, write;
> +   /**
> +* Base xenstore path for storing ring/event data used by the server
> +* during cleanup.
> +* */
> +   char *xs_path;
>  };
>
>  /**
> diff --git a/tools/libs/vchan/init.c b/tools/libs/vchan/init.c
> index c8510e6ce98a..c6b8674ef541 100644
> --- a/tools/libs/vchan/init.c
> +++ b/tools/libs/vchan/init.c
> @@ -46,6 +46,8 @@
>  #include 
>  #include 
>
> +#include "vchan.h"
> +
>  #ifndef PAGE_SHIFT
>  #define PAGE_SHIFT 12
>  #endif
> @@ -251,6 +253,10 @@ static int init_xs_srv(struct libxenvchan *ctrl, int 
> domain, const char* xs_base
> char ref[16];
> char* domid_str = NULL;
> xs_transaction_t xs_trans = XBT_NULL;
> +
> +   // store the base path so we can clean up on server closure
> +   ctrl->xs_path = strdup(xs_base);

You don't check for NULL here, but you do check for NULL in
close_xs_srv().  I guess it's okay, since it does the right thing.
But I think it would be more robust to check for NULL here.  Is there
a specific reason you wrote it this way?  Otherwise it looks good.

Regards,
Jason



Re: Metadata and signalling channels for Zephyr virtio-backends on Xen

2022-02-15 Thread Vincent Guittot
Hi Stefano,

On Tue, 8 Feb 2022 at 01:16, Stefano Stabellini
 wrote:
>
> On Mon, 7 Feb 2022, Alex Bennée wrote:
> > Hi Stefano,
> >
> > Vincent gave an update on his virtio-scmi work at the last Stratos sync
> > call and the discussion moved onto next steps.
>
> Hi Alex,
>
> I don't know the specifics of virtio-scmi, but if it is about power,
> clocks, reset, etc. like the original SCMI protocol, then virtio-scmi is

virtio-scmi is one transport channel that supports the SCMI protocol

> likely going to be very different from all the other virtio frontends

The virtio-scmi front-end is merged in mainline

> and backends. That's because SCMI requires a full view of the system,
> which is different from something like virtio-net that is limited to the
> emulation of 1 device. For this reason, it is likely that the
> virtio-scmi backend would be a better fit in Xen itself, rather than run
> in userspace inside a VM.

Not sure what you mean when you say that SCMI requires a full view of
the system. If you are referring to the system-wide resources which
reset or power up/down the whole SoC, this is not really what we are
targeting here. Those system-wide resources should already be handled
by a dedicated power coprocessor. In our case, the IPs of the SoC will
be handled by different VMs, but those IPs usually share common
resources like a parent PLL, a power domain or a clock gating register,
to give a few examples. Because all those VMs can't directly set these
resources without taking the others into account, and because the power
coprocessor doesn't have an unlimited number of channels, we add an SCMI
backend that will gather and proxy the VM requests before accessing, as
an example, the register that gates some clock IP, or before powering
down an external regulator shared between the camera and another
device. This SCMI backend will most probably also send requests with
OSPM permission access to the power coprocessor once it has aggregated
all the VMs' requests.
We are using the virtio-scmi protocol because it has the main advantage
of not being tied to a hypervisor.

In our PoC, the SCMI backend is running with Zephyr and reuses the same
software that can run in the power coprocessor, which helps split
what is critical and must be handled by the power coprocessor from what
is not critical for the system (what is typically managed by Linux
directly when there is no hypervisor involved)

>
> FYI, a good and promising approach to handle both SCMI and SCPI is the
> series recently submitted by EPAM to mediate SCMI and SCPI requests in
> Xen: https://marc.info/?l=xen-devel&m=163947444032590
>
> (Another "special" virtio backend is virtio-iommu for similar reasons:
> the guest p2m address mappings and also the IOMMU drivers are in Xen.
> It is not immediately clear whether a virtio-iommu backend would need to
> be in Xen or run as a process in dom0/domU.)
>
> On the other hand, for all the other "normal" protocols (e.g.
> virtio-net, virtio-block, etc.) the backend would naturally run as a
> process in dom0 or domU (e.g. QEMU in Dom0) as one would expect.
>
>
> > Currently the demo setup
> > is intermediated by a double-ended vhost-user daemon running on the
> > devbox acting as a go between a number of QEMU instances representing
> > the front and back-ends. You can view the architecture with Vincents
> > diagram here:
> >
> >   
> > https://docs.google.com/drawings/d/1YSuJUSjEdTi2oEUq4oG4A9pBKSEJTAp6hhcHKKhmYHs/edit?usp=sharing
> >
> > The key virtq handling is done over the special carve outs of shared
> > memory between the front end and guest. However the signalling is
> > currently over a virtio device on the backend. This is useful for the
> > PoC but obviously in a real system we don't have a hidden POSIX system
> > acting as a go between not to mention the additional latency it causes
> > with all those context switches.
> >
> > I was hoping we could get some more of the Xen experts to the next
> > Stratos sync (17th Feb) to go over approaches for a properly hosted on
> > Xen approach. From my recollection (Vincent please correct me if I'm
> > wrong) of last week the issues that need solving are:
>
> Unfortunately I have a regular conflict which prevents me from being
> able to join the Stratos calls. However, I can certainly make myself
> available for one call (unless something unexpected comes up).
>
>
> >  * How to handle configuration steps as FE guests come up
> >
> > The SCMI server will be a long running persistent backend because it is
> > managing real HW resources. However the guests may be ephemeral (or just
> > restarted) so we can't just hard-code everything in a DTB. While the
> > virtio-negotiation in the config space covers most things we still need
> > information like where in the guests address space the shared memory
> > lives and at what offset into that the queues are created. As far as I'm
> > aware the canonical source of domain information is XenStore
> > (https://wiki.xenproject.org/wiki/XenStore) but this re

[libvirt test] 168117: regressions - FAIL

2022-02-15 Thread osstest service owner
flight 168117 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168117/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-amd64-libvirt   6 libvirt-buildfail REGR. vs. 151777
 build-i386-libvirt6 libvirt-buildfail REGR. vs. 151777
 build-arm64-libvirt   6 libvirt-buildfail REGR. vs. 151777

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  f92362003104d72ac3545fac34d7d2972449a2c1
baseline version:
 libvirt  2c846fa6bcc11929c9fb857a22430fb9945654ad

Last test of basis   151777  2020-07-10 04:19:19 Z  585 days
Failing since151818  2020-07-11 04:18:52 Z  584 days  566 attempts
Testing same since   168117  2022-02-15 04:19:00 Z0 days1 attempts


People who touched revisions under test:
Adolfo Jayme Barrientos 
  Aleksandr Alekseev 
  Aleksei Zakharov 
  Andika Triwidada 
  Andrea Bolognani 
  Ani Sinha 
  Balázs Meskó 
  Barrett Schonefeld 
  Bastian Germann 
  Bastien Orivel 
  BiaoXiang Ye 
  Bihong Yu 
  Binfeng Wu 
  Bjoern Walk 
  Boris Fiuczynski 
  Brad Laue 
  Brian Turek 
  Bruno Haible 
  Chris Mayo 
  Christian Borntraeger 
  Christian Ehrhardt 
  Christian Kirbach 
  Christian Schoenebeck 
  Christophe Fergeau 
  Cole Robinson 
  Collin Walling 
  Cornelia Huck 
  Cédric Bosdonnat 
  Côme Borsoi 
  Daniel Henrique Barboza 
  Daniel Letai 
  Daniel P. Berrange 
  Daniel P. Berrangé 
  Didik Supriadi 
  dinglimin 
  Divya Garg 
  Dmitrii Shcherbakov 
  Dmytro Linkin 
  Eiichi Tsukata 
  Emilio Herrera 
  Eric Farman 
  Erik Skultety 
  Fabian Affolter 
  Fabian Freyer 
  Fabiano Fidêncio 
  Fangge Jin 
  Farhan Ali 
  Fedora Weblate Translation 
  Franck Ridel 
  Gavi Teitz 
  gongwei 
  Guoyi Tu
  Göran Uddeborg 
  Halil Pasic 
  Han Han 
  Hao Wang 
  Hela Basa 
  Helmut Grohne 
  Hiroki Narukawa 
  Hyman Huang(黄勇) 
  Ian Wienand 
  Ioanna Alifieraki 
  Ivan Teterevkov 
  Jakob Meng 
  Jamie Strandboge 
  Jamie Strandboge 
  Jan Kuparinen 
  jason lee 
  Jean-Baptiste Holcroft 
  Jia Zhou 
  Jianan Gao 
  Jim Fehlig 
  Jin Yan 
  Jing Qi 
  Jinsheng Zhang 
  Jiri Denemark 
  Joachim Falk 
  John Ferlan 
  Jonathan Watt 
  Jonathon Jongsma 
  Julio Faracco 
  Justin Gatzen 
  Ján Tomko 
  Kashyap Chamarthy 
  Kevin Locke 
  Koichi Murase 
  Kristina Hanicova 
  Laine Stump 
  Laszlo Ersek 
  Lee Yarwood 
  Lei Yang 
  Liao Pingfang 
  Lin Ma 
  Lin Ma 
  Lin Ma 
  Liu Yiding 
  Lubomir Rintel 
  Luke Yue 
  Luyao Zhong 
  Marc Hartmayer 
  Marc-André Lureau 
  Marek Marczykowski-Górecki 
  Markus Schade 
  Martin Kletzander 
  Masayoshi Mizuma 
  Matej Cepl 
  Matt Coleman 
  Matt Coleman 
  Mauro Matteo Cascella 
  Meina Li 
  Michal Privoznik 
  Michał Smyk 
  Milo Casagrande 
  Moshe Levi 
  Muha Aliss 
  Nathan 
  Neal Gompa 
  Nick Chevsky 
  Nick Shyrokovskiy 
  Nickys Music Group 
  Nico Pache 
  Nicolas Lécureuil 
  Nicolas Lécureuil 
  Nikolay Shirokovskiy 
  Olaf Hering 
  Olesya Gerasimenko 
  Or Ozeri 
  Orion Poplawski 
  Pany 
  Patrick Magauran 
  Paulo de Rezende Pinatti 
  Pavel Hrdina 
  Peng Liang 
  Peter Krempa 
  Pino Toscano 
  Pino Toscano 
  Piotr Drąg 
  Prathamesh Chavan 
  Praveen K Paladugu 
  Richard W.M. Jones 
  Ricky Tigg 
  Robin Lee 
  Rohit Kumar 
  Roman Bogorodskiy 
  Roman Bolshakov 
  Ryan Gahagan 
  Ryan Schmidt 
  Sam Hartman 
  Scott Shambarger 
  Sebastian Mitterle 
  SeongHyun Jo 
  Shalini Chellathurai Saroja 
  Shaojun Yang 

Re: [PATCH] rwlock: remove unneeded subtraction

2022-02-15 Thread Roger Pau Monné
On Tue, Feb 15, 2022 at 02:22:02PM +0100, Jan Beulich wrote:
> On 15.02.2022 14:13, Julien Grall wrote:
> > On 15/02/2022 09:39, Roger Pau Monne wrote:
> >> There's no need to subtract _QR_BIAS from the lock value for storing
> >> in the local cnts variable in the read lock slow path: the users of
> >> the value in cnts only care about the writer-related bits and use a
> >> mask to get the value.
> >>
> >> Note that further setting of cnts in rspin_until_writer_unlock already
> >> do not subtract _QR_BIAS.
> > 
> > The rwlock is a copy of the Linux implementation. So I looked at the 
> > history to find out why _QR_BIAS was substracted.
> > 
> > It looks like this was done to get better assembly on x86:
> > 
> > commit f9852b74bec0117b888da39d070c323ea1cb7f4c
> > Author: Peter Zijlstra 
> > Date:   Mon Apr 18 01:27:03 2016 +0200
> > 
> >  locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
> > 
> >  The only reason for the current code is to make GCC emit only the
> >  "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
> >  the result), do so nicer.
> > 
> >  Signed-off-by: Peter Zijlstra (Intel) 
> >  Acked-by: Waiman Long 
> >  Cc: Andrew Morton 
> >  Cc: Linus Torvalds 
> >  Cc: Paul E. McKenney 
> >  Cc: Peter Zijlstra 
> >  Cc: Thomas Gleixner 
> >  Cc: linux-a...@vger.kernel.org
> >  Cc: linux-ker...@vger.kernel.org
> >  Signed-off-by: Ingo Molnar 
> > 
> > diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> > index fec082338668..19248ddf37ce 100644
> > --- a/kernel/locking/qrwlock.c
> > +++ b/kernel/locking/qrwlock.c
> > @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, 
> > u32 cnts)
> >   * that accesses can't leak upwards out of our subsequent critical
> >   * section in the case that the lock is currently held for write.
> >   */
> > -   cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> > +   cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
> >  rspin_until_writer_unlock(lock, cnts);
> > 
> >  /*
> > 
> > This is a slowpath, so probably not a concern. But I thought I would 
> > double check whether the x86 folks are still happy to proceed with that 
> > in mind.
> 
> Hmm, that's an interesting observation. Roger - did you inspect the
> generated code? At the very least the description may want amending.

It seems to always generate the same code for me when using gcc 8.3,
I even tried using arch_fetch_and_add directly, it always results
in:

82d04022d983:   f0 0f c1 03 lock xadd %eax,(%rbx)
82d04022d987:   25 00 30 00 00  and$0x3000,%eax

Similarly clang 13.0.0 seems to always generate:

82d0402085de:   f0 0f c1 03 lock xadd %eax,(%rbx)
82d0402085e2:   89 c1   mov%eax,%ecx
82d0402085e4:   81 e1 00 30 00 00   and$0x3000,%ecx

Maybe I'm missing something, but I don't see a difference in the
generated code.

I could add to the commit message:

"Originally _QR_BIAS was subtracted from the result of
atomic_add_return_acquire in order to prevent GCC from emitting an
unneeded ADD instruction. This being in the lock slow path such
optimizations don't seem likely to make any relevant performance
difference. Also modern GCC and CLANG versions will already avoid
emitting the ADD instruction."

Thanks, Roger.



Re: [PATCH v2 34/70] x86/emul: CFI hardening

2022-02-15 Thread Jan Beulich
On 15.02.2022 14:43, Andrew Cooper wrote:
> On 14/02/2022 13:38, Jan Beulich wrote:
>> On 14.02.2022 13:50, Andrew Cooper wrote:
>>> Control Flow Integrity schemes use toolchain and optionally hardware support
>>> to help protect against call/jump/return oriented programming attacks.
>>>
>>> Use cf_check to annotate function pointer targets for the toolchain.
>>>
>>> pv_emul_is_mem_write() is only used in a single file.  Having it as a static
>>> inline is pointless because it can't be inlined to begin with.
>> I'd like you to consider to re-word this:
> 
> This is the reworded version.
> 
>> It being static inline was for
>> the case of there appearing a 2nd user. I don't view such as pointless.
> 
> I find that impossible to reconcile with your normal review feedback.

Interesting. I don't think I would have objected to something like
this, if it was conceivable that a 2nd user may appear. I don't
think this is the only inline function we've got with just a single
user. I also don't think this is the only inline function we've got
with its address taken, and hence having an out-of-line instantiation.

> It is unconditionally forced out of line because of how it's used,
> meaning that if it ever got used in a second translation unit we'd end
> up with a duplicate function, at which point it would need to be
> non-static and exported to pass review.  (And sanity.)

I'm afraid you've lost me here. What duplicate function? Before and
after the patch the function is static; what changes is merely the
"inline". Two CUs can have identically named static functions, can't
they? Or if that's not the point you try to make, then I have no idea
what it is that you're trying to tell me.

Jan




Re: Metadata and signalling channels for Zephyr virtio-backends on Xen

2022-02-15 Thread Vincent Guittot
Hi All,

Sorry for the late reply but I was off last week. I will go through
the thread and try to answer the open points

On Mon, 7 Feb 2022 at 11:56, Alex Bennée  wrote:
>
>
> Hi Stefano,
>
> Vincent gave an update on his virtio-scmi work at the last Stratos sync
> call and the discussion moved onto next steps. Currently the demo setup
> is intermediated by a double-ended vhost-user daemon running on the
> devbox acting as a go between a number of QEMU instances representing
> the front and back-ends. You can view the architecture with Vincents
> diagram here:
>
>   
> https://docs.google.com/drawings/d/1YSuJUSjEdTi2oEUq4oG4A9pBKSEJTAp6hhcHKKhmYHs/edit?usp=sharing
>
> The key virtq handling is done over the special carve outs of shared
> memory between the front end and guest. However the signalling is
> currently over a virtio device on the backend. This is useful for the
> PoC but obviously in a real system we don't have a hidden POSIX system
> acting as a go between not to mention the additional latency it causes
> with all those context switches.
>
> I was hoping we could get some more of the Xen experts to the next
> Stratos sync (17th Feb) to go over approaches for a properly hosted on
> Xen approach. From my recollection (Vincent please correct me if I'm
> wrong) of last week the issues that need solving are:
>
>  * How to handle configuration steps as FE guests come up
>
> The SCMI server will be a long running persistent backend because it is
> managing real HW resources. However the guests may be ephemeral (or just
> restarted) so we can't just hard-code everything in a DTB. While the
> virtio-negotiation in the config space covers most things we still need
> information like where in the guests address space the shared memory
> lives and at what offset into that the queues are created. As far as I'm
> aware the canonical source of domain information is XenStore
> (https://wiki.xenproject.org/wiki/XenStore) but this relies on a Dom0
> type approach. Is there an alternative for dom0less systems or do we
> need a dom0-light approach, for example using STR-21 (Ensure Zephyr can
> run cleanly as a Dom0 guest) providing just enough services for FE's to
> register metadata and BE's to read it?
>
>  * How to handle mapping of memory
>
> AIUI the Xen model is the FE guest explicitly makes grant table requests
> to expose portions of its memory to other domains. Can the BE query the
> hypervisor itself to discover the available grants or does it require
> coordination with Dom0/XenStore for that information to be available to
> the BE domain?

I have noticed that it is possible to share memory between VMs in the
VM config file, which seems quite similar to what is done with
qemu to share memory objects between VMs
>
>  * How to handle signalling
>
> I guess this requires a minimal implementation of the IOREQ calls for
> Zephyr so we can register the handler in the backend? Does the IOREQ API
> allow for a IPI style notifications using the global GIC IRQs?
>
> Forgive the incomplete notes from the Stratos sync, I was trying to type
> while participating in the discussion so hopefully this email captures
> what was missed:
>
>   
> https://linaro.atlassian.net/wiki/spaces/STR/pages/28682518685/2022-02-03+Project+Stratos+Sync+Meeting+Notes
>
> Vincent, anything to add?

I want to use an interface that is not tied to a hypervisor; that's
why I have reused virtio_mmio to emulate the device side, where the
backend can get the virtqueue description

>
> --
> Alex Bennée



Re: [PATCH v2 06/70] x86: Introduce support for CET-IBT

2022-02-15 Thread Jan Beulich
On 14.02.2022 13:50, Andrew Cooper wrote:
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -39,6 +39,11 @@ config HAS_AS_CET_SS
>   # binutils >= 2.29 or LLVM >= 6
>   def_bool $(as-instr,wrssq %rax$(comma)0;setssbsy)
>  
> +config HAS_CC_CET_IBT
> + # GCC >= 9 and binutils >= 2.29
> + # Retpoline check to work around 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654
> + def_bool $(cc-option,-fcf-protection=branch -mmanual-endbr 
> -mindirect-branch=thunk-extern) && $(as-instr,endbr64)

At the top of asm-defns.h we have a number of similarly operand-less
instructions expressed via .macro expanding to .byte. I don't see why
we couldn't do so here as well, eliminating the need for the
$(as-instr ...). In fact ...

> --- a/xen/arch/x86/include/asm/asm-defns.h
> +++ b/xen/arch/x86/include/asm/asm-defns.h
> @@ -57,6 +57,12 @@
>  INDIRECT_BRANCH jmp \arg
>  .endm
>  
> +#ifdef CONFIG_XEN_IBT
> +# define ENDBR64 endbr64
> +#else
> +# define ENDBR64
> +#endif

... it could also be this macro which ends up conditionally empty,
but would then want expressing as an assembler macro. Albeit no, the
lower case form would probably still be needed to deal with compiler
emitted insns, as the compiler doesn't appear to make recognition of
the command line option dependent on the underlying assembler's
capabilities.

> --- a/xen/arch/x86/include/asm/cpufeatures.h
> +++ b/xen/arch/x86/include/asm/cpufeatures.h
> @@ -39,6 +39,7 @@ XEN_CPUFEATURE(SC_VERW_PV,X86_SYNTH(23)) /* VERW 
> used by Xen for PV */
>  XEN_CPUFEATURE(SC_VERW_HVM,   X86_SYNTH(24)) /* VERW used by Xen for HVM 
> */
>  XEN_CPUFEATURE(SC_VERW_IDLE,  X86_SYNTH(25)) /* VERW used by Xen for 
> idle */
>  XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow 
> Stacks */
> +XEN_CPUFEATURE(XEN_IBT,   X86_SYNTH(27)) /* Xen uses CET Indirect 
> Branch Tracking */

Is a feature flag actually warranted here, rather than a single
global boolean? You don't key any alternatives patching to this
bit, unlike was the case for XEN_SHSTK. And the only consumer is
cpu_has_xen_ibt, expanding to the boot CPU's instance of the bit.

Jan




Re: [PATCH v2 34/70] x86/emul: CFI hardening

2022-02-15 Thread Andrew Cooper
On 14/02/2022 13:38, Jan Beulich wrote:
> On 14.02.2022 13:50, Andrew Cooper wrote:
>> Control Flow Integrity schemes use toolchain and optionally hardware support
>> to help protect against call/jump/return oriented programming attacks.
>>
>> Use cf_check to annotate function pointer targets for the toolchain.
>>
>> pv_emul_is_mem_write() is only used in a single file.  Having it as a static
>> inline is pointless because it can't be inlined to begin with.
> I'd like you to consider re-wording this:

This is the reworded version.

> It being static inline was for
> the case of there appearing a 2nd user. I don't view such as pointless.

I find that impossible to reconcile with your normal review feedback.

It is unconditionally forced out of line because of how it's used,
meaning that if it ever got used in a second translation unit we'd end
up with a duplicate function, at which point it would need to be
non-static and exported to pass review.  (And sanity.)

~Andrew



Re: [PATCH] x86/vmx: remove dead code to create domains without a vLAPIC

2022-02-15 Thread Jan Beulich
On 15.02.2022 12:28, Roger Pau Monne wrote:
> After the removal of PVHv1 it's no longer supported to create a domain
> using hardware virtualization extensions and without a local APIC:
> PVHv2 mandates domains to always have a LAPIC. Remove some stale code
> in VMCS construction and related helpers that catered for that
> use-case.
> 
> No functional change.
> 
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich 




Re: [PATCH] rwlock: remove unneeded subtraction

2022-02-15 Thread Jan Beulich
On 15.02.2022 14:13, Julien Grall wrote:
> On 15/02/2022 09:39, Roger Pau Monne wrote:
>> There's no need to subtract _QR_BIAS from the lock value for storing
>> in the local cnts variable in the read lock slow path: the users of
>> the value in cnts only care about the writer-related bits and use a
>> mask to get the value.
>>
>> Note that further setting of cnts in rspin_until_writer_unlock already
>> does not subtract _QR_BIAS.
> 
> The rwlock is a copy of the Linux implementation. So I looked at the 
> history to find out why _QR_BIAS was subtracted.
> 
> It looks like this was done to get better assembly on x86:
> 
> commit f9852b74bec0117b888da39d070c323ea1cb7f4c
> Author: Peter Zijlstra 
> Date:   Mon Apr 18 01:27:03 2016 +0200
> 
>  locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
> 
>  The only reason for the current code is to make GCC emit only the
>  "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
>  the result), do so nicer.
> 
>  Signed-off-by: Peter Zijlstra (Intel) 
>  Acked-by: Waiman Long 
>  Cc: Andrew Morton 
>  Cc: Linus Torvalds 
>  Cc: Paul E. McKenney 
>  Cc: Peter Zijlstra 
>  Cc: Thomas Gleixner 
>  Cc: linux-a...@vger.kernel.org
>  Cc: linux-ker...@vger.kernel.org
>  Signed-off-by: Ingo Molnar 
> 
> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> index fec082338668..19248ddf37ce 100644
> --- a/kernel/locking/qrwlock.c
> +++ b/kernel/locking/qrwlock.c
> @@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, 
> u32 cnts)
>   * that accesses can't leak upwards out of our subsequent critical
>   * section in the case that the lock is currently held for write.
>   */
> -   cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> +   cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
>  rspin_until_writer_unlock(lock, cnts);
> 
>  /*
> 
> This is a slowpath, so probably not a concern. But I thought I would 
> double check whether the x86 folks are still happy to proceed with that 
> in mind.

Hmm, that's an interesting observation. Roger - did you inspect the
generated code? At the very least the description may want amending.

Jan




Re: [PATCH] rwlock: remove unneeded subtraction

2022-02-15 Thread Julien Grall

Hi,

On 15/02/2022 09:39, Roger Pau Monne wrote:

There's no need to subtract _QR_BIAS from the lock value for storing
in the local cnts variable in the read lock slow path: the users of
the value in cnts only care about the writer-related bits and use a
mask to get the value.

Note that further setting of cnts in rspin_until_writer_unlock already
does not subtract _QR_BIAS.


The rwlock is a copy of the Linux implementation. So I looked at the 
history to find out why _QR_BIAS was subtracted.


It looks like this was done to get better assembly on x86:

commit f9852b74bec0117b888da39d070c323ea1cb7f4c
Author: Peter Zijlstra 
Date:   Mon Apr 18 01:27:03 2016 +0200

locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()

The only reason for the current code is to make GCC emit only the
"LOCK XADD" instruction on x86 (and not do a pointless extra ADD on
the result), do so nicer.

Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Waiman Long 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-a...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Ingo Molnar 

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index fec082338668..19248ddf37ce 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -93,7 +93,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, 
u32 cnts)

 * that accesses can't leak upwards out of our subsequent critical
 * section in the case that the lock is currently held for write.
 */
-   cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;
+   cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
rspin_until_writer_unlock(lock, cnts);

/*

This is a slowpath, so probably not a concern. But I thought I would 
double check whether the x86 folks are still happy to proceed with that 
in mind.


Cheers,

--
Julien Grall



Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Jan Beulich
On 15.02.2022 13:56, Oleksandr Andrushchenko wrote:
> On 15.02.22 14:49, Jan Beulich wrote:
>> On 15.02.2022 12:54, Oleksandr Andrushchenko wrote:
>>> On 15.02.22 13:50, Jan Beulich wrote:
>>>> On 15.02.2022 12:45, Oleksandr Andrushchenko wrote:
>>>>> I'm on your side, I just want to hear that we all agree pcidevs
>>>>> needs to be converted into rwlock according to the plan you
>>>>> suggested and at least now it seems to be an acceptable solution.
>>>> I'd like to express worries though about the conversion of this
>>>> recursive lock into an r/w one.
>>> Could you please elaborate more on this?
>> Not sure what to say beyond the obvious:
> I thought you had something specific in mind that worries
> you and that you could tell what it is. Thus the question.

Well, the "specific" thing I had in mind was: You'll need to prove
correctness, and we'll need to understand the proof.

Jan




Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Oleksandr Andrushchenko


On 15.02.22 14:49, Jan Beulich wrote:
> On 15.02.2022 12:54, Oleksandr Andrushchenko wrote:
>> On 15.02.22 13:50, Jan Beulich wrote:
>>> On 15.02.2022 12:45, Oleksandr Andrushchenko wrote:
>>>> I'm on your side, I just want to hear that we all agree pcidevs
>>>> needs to be converted into rwlock according to the plan you
>>>> suggested and at least now it seems to be an acceptable solution.
>>> I'd like to express worries though about the conversion of this
>>> recursive lock into an r/w one.
>> Could you please elaborate more on this?
> Not sure what to say beyond the obvious:
I thought you had something specific in mind that worries
you and that you could tell what it is. Thus the question.
>   At the time of the conversion,
> there certainly was an issue to be solved. You'd need to solve this
> issue differently then. Plus you'd need to make sure that no further
> incarnations of the original issue had been there or have been added in
> the meantime.
>
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Jan Beulich
On 15.02.2022 13:44, Oleksandr Andrushchenko wrote:
> On 15.02.22 13:54, Oleksandr Andrushchenko wrote:
>> On 15.02.22 13:50, Jan Beulich wrote:
>>> On 15.02.2022 12:45, Oleksandr Andrushchenko wrote:
>>>> I'm on your side, I just want to hear that we all agree pcidevs
>>>> needs to be converted into rwlock according to the plan you
>>>> suggested and at least now it seems to be an acceptable solution.
>>> I'd like to express worries though about the conversion of this
>>> recursive lock into an r/w one.
>> Could you please elaborate more on this?
> What if we just do the following:
> 
> static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
> static rwlock_t DEFINE_RWLOCK(_pcidevs_rwlock);
> 
> void pcidevs_lock(void)
> {
>      read_lock(&_pcidevs_rwlock);
>      spin_lock_recursive(&_pcidevs_lock);
> }
> 
> void pcidevs_unlock(void)
> {
>      spin_unlock_recursive(&_pcidevs_lock);
>      read_unlock(&_pcidevs_rwlock);
> }
> 
> void pcidevs_read_lock(void)
> {
>      read_lock(&_pcidevs_rwlock);
> }
> 
> void pcidevs_read_unlock(void)
> {
>      read_unlock(&_pcidevs_rwlock);
> }
> 
> void pcidevs_write_lock(void)
> {
>      write_lock(&_pcidevs_rwlock);
> }
> 
> void pcidevs_write_unlock(void)
> {
>      write_unlock(&_pcidevs_rwlock);
> }

Hmm, this is an interesting idea. Except that I'm not sure to what
extent it'll be suitable: read_lock() won't lock out users of just
lock(), so the solution looks tailored to your vPCI use case. Yet
obviously (I think) read_lock() would want to become usable for
e.g. simple list traversal as well, down the road.

Jan




Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Jan Beulich
On 15.02.2022 12:54, Oleksandr Andrushchenko wrote:
> On 15.02.22 13:50, Jan Beulich wrote:
>> On 15.02.2022 12:45, Oleksandr Andrushchenko wrote:
>>> I'm on your side, I just want to hear that we all agree pcidevs
>>> needs to be converted into rwlock according to the plan you
>>> suggested and at least now it seems to be an acceptable solution.
>> I'd like to express worries though about the conversion of this
>> recursive lock into an r/w one.
> Could you please elaborate more on this?

Not sure what to say beyond the obvious: At the time of the conversion,
there certainly was an issue to be solved. You'd need to solve this
issue differently then. Plus you'd need to make sure that no further
incarnations of the original issue had been there or have been added in
the meantime.

Jan




[qemu-mainline test] 168114: tolerable FAIL - PUSHED

2022-02-15 Thread osstest service owner
flight 168114 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168114/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168059
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168059
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168059
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168059
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168059
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168059
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168059
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168059
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 qemuu2d88a3a595f1094e3ecc6cd2fd1e804634c84b0f
baseline version:
 qemuu0a301624c2f4ced3331ffd5bce85b4274fe132af

Last test of basis   168059  2022-02-08 15:36:56 Z6 days
Failing since168095  2022-02-12 22:37:11 Z2 days5 attempts
Testing same since   168114  2022-02-14 23:09:51 Z0 days1 attempts


People who touched r

Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Oleksandr Andrushchenko


On 15.02.22 13:54, Oleksandr Andrushchenko wrote:
>
> On 15.02.22 13:50, Jan Beulich wrote:
>> On 15.02.2022 12:45, Oleksandr Andrushchenko wrote:
>>> I'm on your side, I just want to hear that we all agree pcidevs
>>> needs to be converted into rwlock according to the plan you
>>> suggested and at least now it seems to be an acceptable solution.
>> I'd like to express worries though about the conversion of this
>> recursive lock into an r/w one.
> Could you please elaborate more on this?
What if we just do the following:

static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
static rwlock_t DEFINE_RWLOCK(_pcidevs_rwlock);

void pcidevs_lock(void)
{
     read_lock(&_pcidevs_rwlock);
     spin_lock_recursive(&_pcidevs_lock);
}

void pcidevs_unlock(void)
{
     spin_unlock_recursive(&_pcidevs_lock);
     read_unlock(&_pcidevs_rwlock);
}

void pcidevs_read_lock(void)
{
     read_lock(&_pcidevs_rwlock);
}

void pcidevs_read_unlock(void)
{
     read_unlock(&_pcidevs_rwlock);
}

void pcidevs_write_lock(void)
{
     write_lock(&_pcidevs_rwlock);
}

void pcidevs_write_unlock(void)
{
     write_unlock(&_pcidevs_rwlock);
}

1. This way most of the code continues to use pcidevs_{lock|unlock}.
2. We need to change writers, those which can add/remove a pdev, to use
pcidevs_write_{un}lock
3. Those, which do not modify pdevs (vpci_{read|write}), will use
pcidevs_read_lock
4. We do not introduce d->vpci_rwlock and use pcidevs_{read|write}_lock
as vpci doesn't seem to need to acquire _pcidevs_lock + we use pdev->vpci->lock
as it is now

Is this something that may address your worries?

Thank you,
Oleksandr

[linux-linus test] 168113: tolerable FAIL - PUSHED

2022-02-15 Thread osstest service owner
flight 168113 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168113/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 20 guest-localmigrate/x10   fail REGR. vs. 168080

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 168080
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168080
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168080
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 168080
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168080
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168080
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168080
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168080
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 linuxd567f5db412ed52de0b3b3efca4a451263de6108
baseline version:
 linuxf1baf68e1383f6ed93eb9cff2866d46562607a43

Last test of basis   168080  2022-02-11 00:09:22 Z4 days
Failing since168086  2022-02-11 20:11:19 Z3 days8 attempts
Testing same since   168113  2022-02-14 19:11:17 Z0 days1 attempts


People who touched revisions under test:
  Aaron Liu 
  Adam Ford 
  Al Cooper 
  Alex Deucher 
  Alexander Egorenkov 
  Alexander Gordeev 
  Alexander Stein 
  Alexandre Ghiti 
  Alim Akhtar 
  Alviro Iskandar Set

Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Oleksandr Andrushchenko


On 15.02.22 13:50, Jan Beulich wrote:
> On 15.02.2022 12:45, Oleksandr Andrushchenko wrote:
>> I'm on your side, I just want to hear that we all agree pcidevs
>> needs to be converted into rwlock according to the plan you
>> suggested and at least now it seems to be an acceptable solution.
> I'd like to express worries though about the conversion of this
> recursive lock into an r/w one.
Could you please elaborate more on this?
I would love not to have a 4th approach requested to be implemented ;)
> Jan
>
Thank you in advance,
Oleksandr

Re: [PATCH] rwlock: remove unneeded subtraction

2022-02-15 Thread Luca Fancellu



> On 15 Feb 2022, at 09:39, Roger Pau Monne  wrote:
> 
> There's no need to subtract _QR_BIAS from the lock value for storing
> in the local cnts variable in the read lock slow path: the users of
> the value in cnts only care about the writer-related bits and use a
> mask to get the value.
> 
> Note that further setting of cnts in rspin_until_writer_unlock already
> does not subtract _QR_BIAS.
> 
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Luca Fancellu 

> ---
> xen/common/rwlock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
> index dadab372b5..aa15529bbe 100644
> --- a/xen/common/rwlock.c
> +++ b/xen/common/rwlock.c
> @@ -47,7 +47,7 @@ void queue_read_lock_slowpath(rwlock_t *lock)
> while ( atomic_read(&lock->cnts) & _QW_WMASK )
> cpu_relax();
> 
> -cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
> +cnts = atomic_add_return(_QR_BIAS, &lock->cnts);
> rspin_until_writer_unlock(lock, cnts);
> 
> /*
> -- 
> 2.34.1
> 
> 




Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Jan Beulich
On 15.02.2022 12:45, Oleksandr Andrushchenko wrote:
> I'm on your side, I just want to hear that we all agree pcidevs
> needs to be converted into rwlock according to the plan you
> suggested and at least now it seems to be an acceptable solution.

I'd like to express worries though about the conversion of this
recursive lock into an r/w one.

Jan




Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Oleksandr Andrushchenko


On 15.02.22 13:39, Roger Pau Monné wrote:
> On Tue, Feb 15, 2022 at 11:12:23AM +, Oleksandr Andrushchenko wrote:
>>
>> On 15.02.22 12:48, Roger Pau Monné wrote:
>>> On Tue, Feb 15, 2022 at 10:11:35AM +0200, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko
>>>>
>>>> Introduce a per-domain read/write lock to check whether vpci is present,
>>>> so we are sure there are no accesses to the contents of the vpci struct
>>>> if not. This lock can be used (and in a few cases is used right away)
>>>> so that vpci removal can be performed while holding the lock in write
>>>> mode. Previously such removal could race with vpci_read for example.
>>>>
>>>> 1. Per-domain's vpci_rwlock is used to protect the pdev->vpci structure
>>>> from being removed.
>>>>
>>>> 2. Writing the command register and ROM BAR register may trigger
>>>> modify_bars to run, which in turn may access multiple pdevs while
>>>> checking for the existing BAR's overlap. The overlapping check, if done
>>>> under the read lock, requires vpci->lock to be acquired on both devices
>>>> being compared, which may produce a deadlock. It is not possible to
>>>> upgrade a read lock to a write lock in such a case. So, in order to prevent
>>>> the deadlock, check which registers are going to be written and acquire
>>>> the lock in the appropriate mode from the beginning.
>>>>
>>>> All other code, which doesn't lead to pdev->vpci destruction and does not
>>>> access multiple pdevs at the same time, can still use a combination of the
>>>> read lock and pdev->vpci->lock.
>>>>
>>>> 3. Optimize detection of whether a ROM BAR write requires the write lock
>>>> by caching the offset of the ROM BAR register in vpci->header->rom_reg,
>>>> which depends on the header's type.
>>>>
>>>> 4. Reduce the locked region in vpci_remove_device as it is now possible
>>>> to set pdev->vpci to NULL early, right after the write lock is acquired.
>>>>
>>>> 5. Reduce the locked region in vpci_add_handlers as it is possible to
>>>> initialize many more fields of the struct vpci before assigning it to
>>>> pdev->vpci.
>>>>
>>>> 6. vpci_{add|remove}_register are required to be called with the write lock
>>>> held, but it is not feasible to add an assert there as it requires
>>>> struct domain to be passed for that. So, add a comment about this
>>>> requirement to these and other functions with the equivalent constraints.
>>>>
>>>> 7. Drop the const qualifier where the new rwlock is used and this is
>>>> appropriate.
>>>>
>>>> 8. Do not call process_pending_softirqs with any locks held. For that,
>>>> unlock prior to the call and re-acquire the locks after. After re-acquiring
>>>> the lock there is no need to check if pdev->vpci exists:
>>>>  - in apply_map because of the context it is called in (no race condition
>>>>    possible)
>>>>  - for MSI/MSI-X debug code because it is called at the end of
>>>>    pdev->vpci access and no further access to pdev->vpci is made
>>>>
>>>> 9. Check for !pdev->vpci in vpci_{read|write} after acquiring the lock
>>>> and, if so, allow reading or writing the hardware register directly. This is
>>>> acceptable as we only deal with Dom0 as of now. Once DomU support is
>>>> added, the write will need to be ignored and reads will return all 0's for
>>>> the guests, while Dom0 can still access the registers directly.
>>>>
>>>> 10. Introduce pcidevs_trylock, so there is a possibility to try locking
>>>> the pcidevs lock.
>>>>
>>>> 11. Use the pcidevs lock around for_each_pdev and pci_get_pdev_by_domain
>>>> while accessing pdevs in vpci code.
>>> So if you use the pcidevs_lock then it's impossible for the pdev or
>>> pdev->vpci to be removed or recreated, as the pcidevs lock protects
>>> any device operations (add, remove, assign, deassign).
>>>
>>> It's however not OK to use the pcidevs lock in vpci_{read,write}
>>> as-is, as the introduced contention is IMO not acceptable.
>>>
>>> The only viable option I see here is to:
>>>
>>>1. Make the pcidevs lock a rwlock: switch current callers to take the
>>>   lock in write mode, detect and fixup any issues that could arise
>>>   from the lock not being recursive anymore.
>>>2. Take the lock in read mode around vpci_{read,write} sections that
>>>   rely on pdev (including the handlers).
>>>
>>> These items should be at least two separate patches. Let's not mix the
>>> conversion of pcidevs locks with the addition of vPCI support.
>>>
>>> I think with that we could get away without requiring a per-domain
>>> rwlock? Just doing lock ordering in modify_bars regarding
>>> tmp->vpci->lock vs pdev->vpci->lock. Neither pdev or vpci can go away
>>> while holding the pcidevs lock.
>>>
>>> Sorting the situation in modify_bars should also be done as a separate
>>> patch on top of 1. and 2.
>> So, to make it crystal clear: we can do with the locking as in this
>> patch and instead we need to convert pcidevs lock into rwlock.
>> Meaning that I need to drop this patch.
>>
>> Then, 3 patches to follow:
>> 1. pcidevs as rwlock
>> 2. vpci_

Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Roger Pau Monné
On Tue, Feb 15, 2022 at 11:12:23AM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 15.02.22 12:48, Roger Pau Monné wrote:
> > On Tue, Feb 15, 2022 at 10:11:35AM +0200, Oleksandr Andrushchenko wrote:
> >> From: Oleksandr Andrushchenko 
> >>
> >> Introduce a per-domain read/write lock to check whether vpci is present,
> >> so we are sure there are no accesses to the contents of the vpci struct
> >> if not. This lock can be used (and in a few cases is used right away)
> >> so that vpci removal can be performed while holding the lock in write
> >> mode. Previously such removal could race with vpci_read for example.
> >>
> >> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure
> >> from being removed.
> >>
> >> 2. Writing the command register and ROM BAR register may trigger
> >> modify_bars to run, which in turn may access multiple pdevs while
> >> checking for the existing BAR's overlap. The overlapping check, if done
> >> under the read lock, requires vpci->lock to be acquired on both devices
> >> being compared, which may produce a deadlock. It is not possible to
> >> upgrade read lock to write lock in such a case. So, in order to prevent
> >> the deadlock, check which registers are going to be written and acquire
> >> the lock in the appropriate mode from the beginning.
> >>
> >> All other code, which doesn't lead to pdev->vpci destruction and does not
> >> access multiple pdevs at the same time, can still use a combination of the
> >> read lock and pdev->vpci->lock.
> >>
> >> 3. Optimize if ROM BAR write lock required detection by caching offset
> >> of the ROM BAR register in vpci->header->rom_reg which depends on
> >> header's type.
> >>
> >> 4. Reduce locked region in vpci_remove_device as it is now possible
> >> to set pdev->vpci to NULL early right after the write lock is acquired.
> >>
> >> 5. Reduce locked region in vpci_add_handlers as it is possible to
> >> initialize many more fields of the struct vpci before assigning it to
> >> pdev->vpci.
> >>
> >> 6. vpci_{add|remove}_register are required to be called with the write lock
> >> held, but it is not feasible to add an assert there as it requires
> >> struct domain to be passed for that. So, add a comment about this 
> >> requirement
> >> to these and other functions with the equivalent constraints.
> >>
> >> 7. Drop const qualifier where the new rwlock is used and this is 
> >> appropriate.
> >>
> >> 8. Do not call process_pending_softirqs with any locks held. For that 
> >> unlock
> >> prior the call and re-acquire the locks after. After re-acquiring the
> >> lock there is no need to check if pdev->vpci exists:
> >>   - in apply_map because of the context it is called (no race condition
> >> possible)
> >>   - for MSI/MSI-X debug code because it is called at the end of
> >> pdev->vpci access and no further access to pdev->vpci is made
> >>
> >> 9. Check for !pdev->vpci in vpci_{read|write} after acquiring the lock
> >> and if so, allow reading or writing the hardware register directly. This is
> >> acceptable as we only deal with Dom0 as of now. Once DomU support is
> >> added the write will need to be ignored and read return all 0's for the
> >> guests, while Dom0 can still access the registers directly.
> >>
> >> 10. Introduce pcidevs_trylock, so there is a possibility to try locking
> >> the pcidev's lock.
> >>
> >> 11. Use pcidev's lock around for_each_pdev and pci_get_pdev_by_domain
> >> while accessing pdevs in vpci code.
> > So if you use the pcidevs_lock then it's impossible for the pdev or
> > pdev->vpci to be removed or recreated, as the pcidevs lock protects
> > any device operations (add, remove, assign, deassign).
> >
> > It's however not OK to use the pcidevs lock in vpci_{read,write}
> > as-is, as the introduced contention is IMO not acceptable.
> >
> > The only viable option I see here is to:
> >
> >   1. Make the pcidevs lock a rwlock: switch current callers to take the
> >  lock in write mode, detect and fixup any issues that could arise
> >  from the lock not being recursive anymore.
> >   2. Take the lock in read mode around vpci_{read,write} sections that
> >  rely on pdev (including the handlers).
> >
> > These items should be at least two separate patches. Let's not mix the
> > conversion of pcidevs locks with the addition of vPCI support.
> >
> > I think with that we could get away without requiring a per-domain
> > rwlock? Just doing lock ordering in modify_bars regarding
> > tmp->vpci->lock vs pdev->vpci->lock. Neither pdev or vpci can go away
> > while holding the pcidevs lock.
> >
> > Sorting the situation in modify_bars should also be done as a separate
> > patch on top of 1. and 2.
> So, to make it crystal clear: we can do with the locking as in this
> patch and instead we need to convert pcidevs lock into rwlock.
> Meaning that I need to drop this patch.
> 
> Then, 3 patches to follow:
> 1. pcidevs as rwlock
> 2. vpci_{read|write} and the rest using new pcidevs rwlock
> 3. lock ordering in modify_bars

Re: [PATCH] tools: remove xenstore entries on vchan server closure

2022-02-15 Thread Oleksandr Andrushchenko
Anthony, could you please take a look?

Thank you in advance,
Oleksandr

On 10.12.21 14:35, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko 
>
> vchan server creates XenStore entries to advertise its event channel and
> ring, but those are not removed after the server quits.
> Add additional cleanup step, so those are removed, so clients do not try
> to connect to a non-existing server.
>
> Signed-off-by: Oleksandr Andrushchenko 
> ---
>   tools/include/libxenvchan.h |  5 +
>   tools/libs/vchan/init.c | 23 +++
>   tools/libs/vchan/io.c   |  4 
>   tools/libs/vchan/vchan.h| 31 +++
>   4 files changed, 63 insertions(+)
>   create mode 100644 tools/libs/vchan/vchan.h
>
> diff --git a/tools/include/libxenvchan.h b/tools/include/libxenvchan.h
> index d6010b145df2..30cc73cf97e3 100644
> --- a/tools/include/libxenvchan.h
> +++ b/tools/include/libxenvchan.h
> @@ -86,6 +86,11 @@ struct libxenvchan {
>   int blocking:1;
>   /* communication rings */
>   struct libxenvchan_ring read, write;
> + /**
> +  * Base xenstore path for storing ring/event data used by the server
> +  * during cleanup.
> +  * */
> + char *xs_path;
>   };
>   
>   /**
> diff --git a/tools/libs/vchan/init.c b/tools/libs/vchan/init.c
> index c8510e6ce98a..c6b8674ef541 100644
> --- a/tools/libs/vchan/init.c
> +++ b/tools/libs/vchan/init.c
> @@ -46,6 +46,8 @@
>   #include 
>   #include 
>   
> +#include "vchan.h"
> +
>   #ifndef PAGE_SHIFT
>   #define PAGE_SHIFT 12
>   #endif
> @@ -251,6 +253,10 @@ static int init_xs_srv(struct libxenvchan *ctrl, int domain, const char* xs_base
>   char ref[16];
>   char* domid_str = NULL;
>   xs_transaction_t xs_trans = XBT_NULL;
> +
> + // store the base path so we can clean up on server closure
> + ctrl->xs_path = strdup(xs_base);
> +
>   xs = xs_open(0);
>   if (!xs)
>   goto fail;
> @@ -298,6 +304,23 @@ retry_transaction:
>   return ret;
>   }
>   
> +void close_xs_srv(struct libxenvchan *ctrl)
> +{
> + struct xs_handle *xs;
> +
> + if (!ctrl->xs_path)
> + return;
> +
> + xs = xs_open(0);
> + if (!xs)
> + goto fail;
> +
> + xs_rm(xs, XBT_NULL, ctrl->xs_path);
> +
> +fail:
> + free(ctrl->xs_path);
> +}
> +
>   static int min_order(size_t siz)
>   {
>   int rv = PAGE_SHIFT;
> diff --git a/tools/libs/vchan/io.c b/tools/libs/vchan/io.c
> index da303fbc01ca..1f201ad554f2 100644
> --- a/tools/libs/vchan/io.c
> +++ b/tools/libs/vchan/io.c
> @@ -40,6 +40,8 @@
>   #include 
>   #include 
>   
> +#include "vchan.h"
> +
>   #ifndef PAGE_SHIFT
>   #define PAGE_SHIFT 12
>   #endif
> @@ -384,5 +386,7 @@ void libxenvchan_close(struct libxenvchan *ctrl)
>   if (ctrl->gnttab)
>   xengnttab_close(ctrl->gnttab);
>   }
> + if (ctrl->is_server)
> + close_xs_srv(ctrl);
>   free(ctrl);
>   }
> diff --git a/tools/libs/vchan/vchan.h b/tools/libs/vchan/vchan.h
> new file mode 100644
> index ..621016ef42e5
> --- /dev/null
> +++ b/tools/libs/vchan/vchan.h
> @@ -0,0 +1,31 @@
> +/**
> + * @file
> + * @section AUTHORS
> + *
> + * Copyright (C) 2021 EPAM Systems Inc.
> + *
> + * @section LICENSE
> + *
> + *  This library is free software; you can redistribute it and/or
> + *  modify it under the terms of the GNU Lesser General Public
> + *  License as published by the Free Software Foundation; either
> + *  version 2.1 of the License, or (at your option) any later version.
> + *
> + *  This library is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + *  Lesser General Public License for more details.
> + *
> + *  You should have received a copy of the GNU Lesser General Public
> + *  License along with this library; If not, see
> + *  <http://www.gnu.org/licenses/>.
> + *
> + * @section DESCRIPTION
> + *
> + *  This file contains common libxenvchan declarations.
> + */
> +#ifndef LIBVCHAN_H
> +#define LIBVCHAN_H
> +
> +void close_xs_srv(struct libxenvchan *ctrl);
> +
> +#endif /* LIBVCHAN_H */


[PATCH] x86/vmx: remove dead code to create domains without a vLAPIC

2022-02-15 Thread Roger Pau Monne
After the removal of PVHv1 it's no longer supported to create a domain
using hardware virtualization extensions and without a local APIC:
PVHv2 mandates domains to always have a LAPIC. Remove some stale code
in VMCS construction and related helpers that catered for that
use-case.

No functional change.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/hvm/vmx/vmcs.c | 14 --
 xen/arch/x86/hvm/vmx/vmx.c  |  4 ++--
 2 files changed, 2 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 7ab15e07a0..9eda8a5f0f 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1123,20 +1123,6 @@ static int construct_vmcs(struct vcpu *v)
 /* Do not enable Monitor Trap Flag unless start single step debug */
 v->arch.hvm.vmx.exec_control &= ~CPU_BASED_MONITOR_TRAP_FLAG;
 
-if ( !has_vlapic(d) )
-{
-/* Disable virtual apics, TPR */
-v->arch.hvm.vmx.secondary_exec_control &=
-~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
-  | SECONDARY_EXEC_APIC_REGISTER_VIRT
-  | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
-v->arch.hvm.vmx.exec_control &= ~CPU_BASED_TPR_SHADOW;
-
-/* In turn, disable posted interrupts. */
-__vmwrite(PIN_BASED_VM_EXEC_CONTROL,
-  vmx_pin_based_exec_control & ~PIN_BASED_POSTED_INTERRUPT);
-}
-
 vmx_update_cpu_exec_control(v);
 
 __vmwrite(VM_EXIT_CONTROLS, vmexit_ctl);
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 36c8a12cfe..0f98fb4f29 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -419,7 +419,7 @@ static void domain_creation_finished(struct domain *d)
 gfn_t gfn = gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE);
 bool ipat;
 
-if ( !has_vlapic(d) || mfn_eq(apic_access_mfn, INVALID_MFN) )
+if ( mfn_eq(apic_access_mfn, INVALID_MFN) )
 return;
 
 ASSERT(epte_get_entry_emt(d, gfn, apic_access_mfn, 0, &ipat,
@@ -3317,7 +3317,7 @@ static void vmx_install_vlapic_mapping(struct vcpu *v)
 {
 paddr_t virt_page_ma, apic_page_ma;
 
-if ( !has_vlapic(v->domain) || mfn_eq(apic_access_mfn, INVALID_MFN) )
+if ( mfn_eq(apic_access_mfn, INVALID_MFN) )
 return;
 
 ASSERT(cpu_has_vmx_virtualize_apic_accesses);
-- 
2.34.1




Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Oleksandr Andrushchenko


On 15.02.22 12:48, Roger Pau Monné wrote:
> On Tue, Feb 15, 2022 at 10:11:35AM +0200, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko 
>>
>> Introduce a per-domain read/write lock to check whether vpci is present,
>> so we are sure there are no accesses to the contents of the vpci struct
>> if not. This lock can be used (and in a few cases is used right away)
>> so that vpci removal can be performed while holding the lock in write
>> mode. Previously such removal could race with vpci_read for example.
>>
>> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure
>> from being removed.
>>
>> 2. Writing the command register and ROM BAR register may trigger
>> modify_bars to run, which in turn may access multiple pdevs while
>> checking for the existing BAR's overlap. The overlapping check, if done
>> under the read lock, requires vpci->lock to be acquired on both devices
>> being compared, which may produce a deadlock. It is not possible to
>> upgrade read lock to write lock in such a case. So, in order to prevent
>> the deadlock, check which registers are going to be written and acquire
>> the lock in the appropriate mode from the beginning.
>>
>> All other code, which doesn't lead to pdev->vpci destruction and does not
>> access multiple pdevs at the same time, can still use a combination of the
>> read lock and pdev->vpci->lock.
>>
>> 3. Optimize if ROM BAR write lock required detection by caching offset
>> of the ROM BAR register in vpci->header->rom_reg which depends on
>> header's type.
>>
>> 4. Reduce locked region in vpci_remove_device as it is now possible
>> to set pdev->vpci to NULL early right after the write lock is acquired.
>>
>> 5. Reduce locked region in vpci_add_handlers as it is possible to
>> initialize many more fields of the struct vpci before assigning it to
>> pdev->vpci.
>>
>> 6. vpci_{add|remove}_register are required to be called with the write lock
>> held, but it is not feasible to add an assert there as it requires
>> struct domain to be passed for that. So, add a comment about this requirement
>> to these and other functions with the equivalent constraints.
>>
>> 7. Drop const qualifier where the new rwlock is used and this is appropriate.
>>
>> 8. Do not call process_pending_softirqs with any locks held. For that unlock
>> prior the call and re-acquire the locks after. After re-acquiring the
>> lock there is no need to check if pdev->vpci exists:
>>   - in apply_map because of the context it is called (no race condition
>> possible)
>>   - for MSI/MSI-X debug code because it is called at the end of
>> pdev->vpci access and no further access to pdev->vpci is made
>>
>> 9. Check for !pdev->vpci in vpci_{read|write} after acquiring the lock
>> and if so, allow reading or writing the hardware register directly. This is
>> acceptable as we only deal with Dom0 as of now. Once DomU support is
>> added the write will need to be ignored and read return all 0's for the
>> guests, while Dom0 can still access the registers directly.
>>
>> 10. Introduce pcidevs_trylock, so there is a possibility to try locking
>> the pcidev's lock.
>>
>> 11. Use pcidev's lock around for_each_pdev and pci_get_pdev_by_domain
>> while accessing pdevs in vpci code.
> So if you use the pcidevs_lock then it's impossible for the pdev or
> pdev->vpci to be removed or recreated, as the pcidevs lock protects
> any device operations (add, remove, assign, deassign).
>
> It's however not OK to use the pcidevs lock in vpci_{read,write}
> as-is, as the introduced contention is IMO not acceptable.
>
> The only viable option I see here is to:
>
>   1. Make the pcidevs lock a rwlock: switch current callers to take the
>  lock in write mode, detect and fixup any issues that could arise
>  from the lock not being recursive anymore.
>   2. Take the lock in read mode around vpci_{read,write} sections that
>  rely on pdev (including the handlers).
>
> These items should be at least two separate patches. Let's not mix the
> conversion of pcidevs locks with the addition of vPCI support.
>
> I think with that we could get away without requiring a per-domain
> rwlock? Just doing lock ordering in modify_bars regarding
> tmp->vpci->lock vs pdev->vpci->lock. Neither pdev or vpci can go away
> while holding the pcidevs lock.
>
> Sorting the situation in modify_bars should also be done as a separate
> patch on top of 1. and 2.
So, to make it crystal clear: we can do with the locking as in this
patch and instead we need to convert pcidevs lock into rwlock.
Meaning that I need to drop this patch.

Then, 3 patches to follow:
1. pcidevs as rwlock
2. vpci_{read|write} and the rest using new pcidevs rwlock
3. lock ordering in modify_bars

Is it what we want?

Thank you,
Oleksandr

Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Oleksandr Andrushchenko


On 15.02.22 12:48, Roger Pau Monné wrote:
> On Tue, Feb 15, 2022 at 10:11:35AM +0200, Oleksandr Andrushchenko wrote:
>> @@ -911,7 +914,11 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>   struct pci_dev *pdev = msix->pdev;
>>   
>>   spin_unlock(&msix->pdev->vpci->lock);
>> +pcidevs_unlock();
>> +read_unlock(&pdev->domain->vpci_rwlock);
>>   process_pending_softirqs();
>> +read_lock(&pdev->domain->vpci_rwlock);
>> +pcidevs_lock();
> This is again an ABBA situation: vpci_add_handlers will get called
> with pci_devs locked, and it will try to acquire the per-domain vpci
> lock (so pcidevs -> vpci_rwlock) while here and in other places in the
> patch to you have inverse locking order (vpci_rwlock -> pcidevs).
Indeed, I need to always lock in this order: pcidevs -> vpci_rwlock
to prevent ABBA, good catch
>
>>   /* NB: we assume that pdev cannot go away for an alive domain. */
>>   if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
>>   return -EBUSY;
>> @@ -323,10 +334,18 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>   }
>>   
>>   /* Find the PCI dev matching the address. */
>> +pcidevs_lock();
>>   pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>> +pcidevs_unlock();
>>   if ( !pdev )
>>   return vpci_read_hw(sbdf, reg, size);
> There's a window here (between dropping the pcidevs lock and acquiring
> the vpci_rwlock where either the pdev or pdev->vpci could be removed
> or recreated.
Yes, I know that. But this is the best I came up with...
>
> Thanks, Roger.
Thank you,
Oleksandr

Re: [PATCH 5/5] arm/dom0less: assign dom0less guests to cpupools

2022-02-15 Thread Juergen Gross

On 15.02.22 11:15, Luca Fancellu wrote:

Introduce domain-cpupool property of a xen,domain device tree node,
that specifies the cpupool device tree handle of a xen,cpupool node
that identifies a cpupool created at boot time where the guest will
be assigned on creation.

Add member to the xen_arch_domainconfig public interface so the
XEN_DOMCTL_INTERFACE_VERSION version is bumped.

Update documentation about the property.

Signed-off-by: Luca Fancellu 
---
  docs/misc/arm/device-tree/booting.txt | 5 +
  xen/arch/arm/domain.c | 6 ++
  xen/arch/arm/domain_build.c   | 9 -
  xen/arch/x86/domain.c | 6 ++
  xen/common/domain.c   | 5 -
  xen/include/public/arch-arm.h | 2 ++
  xen/include/public/domctl.h   | 2 +-
  xen/include/xen/domain.h  | 3 +++
  8 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index 71895663a4de..0f1f210fa449 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -182,6 +182,11 @@ with the following properties:
  Both #address-cells and #size-cells need to be specified because
  both sub-nodes (described shortly) have reg properties.
  
+- domain-cpupool

+
+Optional. Handle to a xen,cpupool device tree node that identifies the
+cpupool where the guest will be started at boot.
+
  Under the "xen,domain" compatible node, one or more sub-nodes are present
  for the DomU kernel and ramdisk.
  
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c

index 92a6c509e5c5..be350b28b588 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -788,6 +788,12 @@ fail:
  return rc;
  }
  
+unsigned int

+arch_get_domain_cpupool_id(const struct xen_domctl_createdomain *config)
+{
+return config->arch.cpupool_id;
+}
+


I don't see why this should be arch specific.


  void arch_domain_destroy(struct domain *d)
  {
  /* IOMMU page table is shared with P2M, always call
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 6931c022a2e8..4f239e756775 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -3015,7 +3015,8 @@ static int __init construct_domU(struct domain *d,
  void __init create_domUs(void)
  {
  struct dt_device_node *node;
-const struct dt_device_node *chosen = dt_find_node_by_path("/chosen");
+const struct dt_device_node *cpupool_node,
+*chosen = dt_find_node_by_path("/chosen");
  
  BUG_ON(chosen == NULL);

  dt_for_each_child_node(chosen, node)
@@ -3053,6 +3054,12 @@ void __init create_domUs(void)
   GUEST_VPL011_SPI - 32 + 1);
  }
  
+/* Get the optional property domain-cpupool */

+cpupool_node = dt_parse_phandle(node, "domain-cpupool", 0);
+if ( cpupool_node )
+dt_property_read_u32(cpupool_node, "cpupool-id",
+ &d_cfg.arch.cpupool_id);
+
  /*
   * The variable max_init_domid is initialized with zero, so here it's
   * very important to use the pre-increment operator to call
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index ef1812dc1402..3e3cf88c9c82 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -880,6 +880,12 @@ int arch_domain_create(struct domain *d,
  return rc;
  }
  
+unsigned int

+arch_get_domain_cpupool_id(const struct xen_domctl_createdomain *config)
+{
+return 0;
+}
+
  void arch_domain_destroy(struct domain *d)
  {
  if ( is_hvm_domain(d) )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 2048ebad86ff..d42ca8292025 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -665,6 +665,8 @@ struct domain *domain_create(domid_t domid,
  
  if ( !is_idle_domain(d) )

  {
+unsigned int domain_cpupool_id;
+
  watchdog_domain_init(d);
  init_status |= INIT_watchdog;
  
@@ -698,7 +700,8 @@ struct domain *domain_create(domid_t domid,

  if ( !d->pbuf )
  goto fail;
  
-if ( (err = sched_init_domain(d, 0)) != 0 )

+domain_cpupool_id = arch_get_domain_cpupool_id(config);
+if ( (err = sched_init_domain(d, domain_cpupool_id)) != 0 )
  goto fail;
  
  if ( (err = late_hwdom_init(d)) != 0 )

diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 94b31511ddea..2c5d1ea7f01a 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -321,6 +321,8 @@ struct xen_arch_domainconfig {
  uint16_t tee_type;
  /* IN */
  uint32_t nr_spis;
+/* IN */
+unsigned int cpupool_id;


As said above: why is this arch specific? Moving it to the common part
would enable libxl to get rid of having to call xc_cpupool_movedomain()
in libxl__domain_make().


Juergen



Re: [PATCH 4/5] xen/cpupool: Create different cpupools at boot time

2022-02-15 Thread Juergen Gross

On 15.02.22 11:15, Luca Fancellu wrote:

Introduce an architecture specific way to create different cpupools
at boot time, this is particularly useful on ARM big.LITTLE system
where there might be the need to have different cpupools for each type
of core, but also systems using NUMA can have different cpu pools for
each node.

The feature on arm relies on a specification of the cpupools from the
device tree to build pools and assign cpus to them.

Documentation is created to explain the feature.

Signed-off-by: Luca Fancellu 


IIRC I suggested to have the core functionality in common code in order
to allow using boot time cpupool creation e.g. via commandline for x86,
too.


Juergen




Re: [PATCH v2] vpci: introduce per-domain lock to protect vpci structure

2022-02-15 Thread Roger Pau Monné
On Tue, Feb 15, 2022 at 10:11:35AM +0200, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko 
> 
> Introduce a per-domain read/write lock to check whether vpci is present,
> so we are sure there are no accesses to the contents of the vpci struct
> if not. This lock can be used (and in a few cases is used right away)
> so that vpci removal can be performed while holding the lock in write
> mode. Previously such removal could race with vpci_read for example.
> 
> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure
> from being removed.
> 
> 2. Writing the command register and ROM BAR register may trigger
> modify_bars to run, which in turn may access multiple pdevs while
> checking for the existing BAR's overlap. The overlapping check, if done
> under the read lock, requires vpci->lock to be acquired on both devices
> being compared, which may produce a deadlock. It is not possible to
> upgrade read lock to write lock in such a case. So, in order to prevent
> the deadlock, check which registers are going to be written and acquire
> the lock in the appropriate mode from the beginning.
> 
> All other code, which doesn't lead to pdev->vpci destruction and does not
> access multiple pdevs at the same time, can still use a combination of the
> read lock and pdev->vpci->lock.
> 
> 3. Optimize if ROM BAR write lock required detection by caching offset
> of the ROM BAR register in vpci->header->rom_reg which depends on
> header's type.
> 
> 4. Reduce locked region in vpci_remove_device as it is now possible
> to set pdev->vpci to NULL early right after the write lock is acquired.
> 
> 5. Reduce locked region in vpci_add_handlers as it is possible to
> initialize many more fields of the struct vpci before assigning it to
> pdev->vpci.
> 
> 6. vpci_{add|remove}_register are required to be called with the write lock
> held, but it is not feasible to add an assert there as it requires
> struct domain to be passed for that. So, add a comment about this requirement
> to these and other functions with the equivalent constraints.
> 
> 7. Drop const qualifier where the new rwlock is used and this is appropriate.
> 
> 8. Do not call process_pending_softirqs with any locks held. For that unlock
> prior the call and re-acquire the locks after. After re-acquiring the
> lock there is no need to check if pdev->vpci exists:
>  - in apply_map because of the context it is called (no race condition
>possible)
>  - for MSI/MSI-X debug code because it is called at the end of
>pdev->vpci access and no further access to pdev->vpci is made
> 
> 9. Check for !pdev->vpci in vpci_{read|write} after acquiring the lock
> and if so, allow reading or writing the hardware register directly. This is
> acceptable as we only deal with Dom0 as of now. Once DomU support is
> added the write will need to be ignored and read return all 0's for the
> guests, while Dom0 can still access the registers directly.
> 
> 10. Introduce pcidevs_trylock, so there is a possibility to try locking
> the pcidev's lock.
> 
> 11. Use pcidev's lock around for_each_pdev and pci_get_pdev_by_domain
> while accessing pdevs in vpci code.

So if you use the pcidevs_lock then it's impossible for the pdev or
pdev->vpci to be removed or recreated, as the pcidevs lock protects
any device operations (add, remove, assign, deassign).

It's however not OK to use the pcidevs lock in vpci_{read,write}
as-is, as the introduced contention is IMO not acceptable.

The only viable option I see here is to:

 1. Make the pcidevs lock a rwlock: switch current callers to take the
lock in write mode, detect and fixup any issues that could arise
from the lock not being recursive anymore.
 2. Take the lock in read mode around vpci_{read,write} sections that
rely on pdev (including the handlers).

These items should be at least two separate patches. Let's not mix the
conversion of pcidevs locks with the addition of vPCI support.

I think with that we could get away without requiring a per-domain
rwlock? Just doing lock ordering in modify_bars regarding
tmp->vpci->lock vs pdev->vpci->lock. Neither pdev or vpci can go away
while holding the pcidevs lock.

Sorting the situation in modify_bars should also be done as a separate
patch on top of 1. and 2.

> 
> 12. This is based on the discussion at [1].
> 
> [1] https://lore.kernel.org/all/20220204063459.680961-4-andr2...@gmail.com/
> 
> Suggested-by: Roger Pau Monné 
> Suggested-by: Jan Beulich 
> Signed-off-by: Oleksandr Andrushchenko 

I've made some small comments below, but given my proposal above I
think the code would change a great deal if we decide to use pcidevs
lock.

> 
> ---
> This was checked on x86: with and without PVH Dom0.
> 
> Since v1:
> - s/ASSERT(!!/ASSERT(
> - move vpci_header_write_lock to vpci.c and rename to
>   vpci_header_need_write_lock
> - use a simple static overlap function instead of vpci_offset_cmp
> - signal no ROM BAR with rom_reg == 0
> - msix_accept: new line before return

Re: [PATCH 3/5] xen/sched: retrieve scheduler id by name

2022-02-15 Thread Juergen Gross

On 15.02.22 11:15, Luca Fancellu wrote:

Add a public function to retrieve the scheduler id by the scheduler
name.

Signed-off-by: Luca Fancellu 
---
  xen/common/sched/core.c | 11 +++
  xen/include/xen/sched.h | 11 +++
  2 files changed, 22 insertions(+)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 8f4b1ca10d1c..9696d3c1d769 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2947,6 +2947,17 @@ void scheduler_enable(void)
  scheduler_active = true;
  }
  
+int __init sched_get_id_by_name(const char *sched_name)

+{
+unsigned int i;
+
+for ( i = 0; i < NUM_SCHEDULERS; i++ )
+if ( schedulers[i] && !strcmp(schedulers[i]->opt_name, sched_name) )
+return schedulers[i]->sched_id;
+
+return -1;
+}
+


Please make use of this function in scheduler_init(), as this
functionality is open coded there, too.


Juergen




Re: [PATCH 2/5] xen/sched: create public function for cpupools creation

2022-02-15 Thread Juergen Gross

On 15.02.22 11:15, Luca Fancellu wrote:

Create new public function to create cpupools, it checks for pool id
uniqueness before creating the pool and can take a scheduler id or
a negative value that means the default Xen scheduler will be used.

Signed-off-by: Luca Fancellu 


Reviewed-by: Juergen Gross 

with one further question: you are allowing to use another scheduler,
but what if someone wants to set non-standard scheduling parameters
(e.g. another time slice)?


Juergen




Re: [PATCH] rwlock: remove unneeded subtraction

2022-02-15 Thread Jan Beulich
On 15.02.2022 10:39, Roger Pau Monne wrote:
> There's no need to subtract _QR_BIAS from the lock value for storing
> in the local cnts variable in the read lock slow path: the users of
> the value in cnts only care about the writer-related bits and use a
> mask to get the value.
> 
> Note that further setting of cnts in rspin_until_writer_unlock already
> do not subtract _QR_BIAS.
> 
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich 




Re: [PATCH 1/5] tools/cpupools: Give a name to unnamed cpupools

2022-02-15 Thread Juergen Gross

On 15.02.22 11:15, Luca Fancellu wrote:

With the introduction of boot time cpupools, Xen can create many
different cpupools at boot time other than cpupool with id 0.

Since these newly created cpupools can't have an
entry in Xenstore, create the entry using xen-init-dom0
helper with the usual convention: Pool-.

Given the change, remove the check for poolid == 0 from
libxl_cpupoolid_to_name(...).

Signed-off-by: Luca Fancellu 
---
  tools/helpers/xen-init-dom0.c  | 26 +-
  tools/libs/light/libxl_utils.c |  3 +--
  2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/tools/helpers/xen-init-dom0.c b/tools/helpers/xen-init-dom0.c
index c99224a4b607..3539f56faeb0 100644
--- a/tools/helpers/xen-init-dom0.c
+++ b/tools/helpers/xen-init-dom0.c
@@ -43,7 +43,10 @@ int main(int argc, char **argv)
  int rc;
  struct xs_handle *xsh = NULL;
  xc_interface *xch = NULL;
-char *domname_string = NULL, *domid_string = NULL;
+char *domname_string = NULL, *domid_string = NULL, *pool_string = NULL;


pool_string seems to be unused.


+char pool_path[strlen("/local/pool") + 12], pool_name[strlen("Pool-") + 5];


I don't like that. Why don't you use pointers and ...


+xc_cpupoolinfo_t *xcinfo;
+unsigned int pool_id = 0;
  libxl_uuid uuid;
  
  /* Accept 0 or 1 argument */

@@ -114,6 +117,27 @@ int main(int argc, char **argv)
  goto out;
  }
  
+/* Create an entry in xenstore for each cpupool on the system */

+do {
+xcinfo = xc_cpupool_getinfo(xch, pool_id);
+if (xcinfo != NULL) {
+if (xcinfo->cpupool_id != pool_id)
+pool_id = xcinfo->cpupool_id;
+snprintf(pool_path, sizeof(pool_path), "/local/pool/%d/name",
+ pool_id);
+snprintf(pool_name, sizeof(pool_name), "Pool-%d", pool_id);


... use asprintf() here for allocating the strings in the needed size?


+pool_id++;
+if (!xs_write(xsh, XBT_NULL, pool_path, pool_name,
+  strlen(pool_name))) {
+fprintf(stderr, "cannot set pool name\n");
+rc = 1;
+}
+xc_cpupool_infofree(xch, xcinfo);
+if (rc)
+goto out;


Moving the call of xc_cpupool_infofree() ahead of the call of xs_write()
would drop the need for this last if statement, as you could add the
goto to the upper if.


+}
+} while(xcinfo != NULL);
+


With doing all of this for being able to assign other domains created
at boot to cpupools, shouldn't you add names for other domains than dom0
here, too?


Juergen




Re: [PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-02-15 Thread Jan Beulich
On 15.02.2022 11:14, Jane Malalane wrote:
> On 15/02/2022 07:09, Jan Beulich wrote:
>> [CAUTION - EXTERNAL EMAIL] DO NOT reply, click links, or open attachments 
>> unless you have verified the sender and know the content is safe.
>>
>> On 14.02.2022 18:09, Jane Malalane wrote:
>>> On 14/02/2022 13:18, Jan Beulich wrote:

 On 14.02.2022 14:11, Jane Malalane wrote:
> On 11/02/2022 11:46, Jan Beulich wrote:
>>
>> On 11.02.2022 12:29, Roger Pau Monné wrote:
>>> On Fri, Feb 11, 2022 at 10:06:48AM +, Jane Malalane wrote:
 On 10/02/2022 10:03, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 06:21:00PM +, Jane Malalane wrote:
>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c 
>> b/xen/arch/x86/hvm/vmx/vmcs.c
>> index 7ab15e07a0..4060aef1bd 100644
>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
>> @@ -343,6 +343,15 @@ static int vmx_init_vmcs_config(bool bsp)
>>  MSR_IA32_VMX_PROCBASED_CTLS2, &mismatch);
>>  }
>>  
>> +/* Check whether hardware supports accelerated xapic and 
>> x2apic. */
>> +if ( bsp )
>> +{
>> +assisted_xapic_available = 
>> cpu_has_vmx_virtualize_apic_accesses;
>> +assisted_x2apic_available = (cpu_has_vmx_apic_reg_virt ||
>> + 
>> cpu_has_vmx_virtual_intr_delivery) &&
>> +
>> cpu_has_vmx_virtualize_x2apic_mode;
>
> I've been think about this, and it seems kind of asymmetric that for
> xAPIC mode we report hw assisted support only with
> virtualize_apic_accesses available, while for x2APIC we require
> virtualize_x2apic_mode plus either apic_reg_virt or
> virtual_intr_delivery.
>
> I think we likely need to be more consistent here, and report hw
> assisted x2APIC support as long as virtualize_x2apic_mode is
> available.
>
> This will likely have some effect on patch 2 also, as you will have to
> adjust vmx_vlapic_msr_changed.
>
> Thanks, Roger.

 Any other thoughts on this? As on one hand it is asymmetric but also
 there isn't much assistance with only virtualize_x2apic_mode set as, in
 this case, a VM exit will be avoided only when trying to access the TPR
 register.
>>>
>>> I've been thinking about this, and reporting hardware assisted
>>> x{2}APIC virtualization with just
>>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES or
>>> SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE doesn't seem very helpful. While
>>> those provide some assistance to the VMM in order to handle APIC
>>> accesses, it will still require a trap into the hypervisor to handle
>>> most of the accesses.
>>>
>>> So maybe we should only report hardware assisted support when the
>>> mentioned features are present together with
>>> SECONDARY_EXEC_APIC_REGISTER_VIRT?
>>
>> Not sure - "some assistance" seems still a little better than none at 
>> all.
>> Which route to go depends on what exactly we intend the bit to be used 
>> for.
>>
> True. I intended this bit to be specifically for enabling
> assisted_x{2}apic. So, would it be inconsistent to report hardware
> assistance with just VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE
> but still claim that x{2}apic is virtualized if no MSR accesses are
> intercepted with XEN_HVM_CPUID_X2APIC_VIRT (in traps.c) so that, as you
> say, the guest gets at least "some assistance" instead of none but we
> still claim x{2}apic virtualization when it is actually complete? Maybe
> I could also add a comment alluding to this in the xl documentation.

 To rephrase my earlier point: Which kind of decisions are the consumer(s)
 of us reporting hardware assistance going to take? In how far is there a
 risk that "some assistance" is overall going to lead to a loss of
 performance? I guess I'd need to see comment and actual code all in one
 place ...

>>> So, I was thinking of adding something along the lines of:
>>>
>>> +=item B B<(x86 only)>
>>> +Enables or disables hardware assisted virtualization for xAPIC. This
>>> +allows accessing APIC registers without a VM-exit. Notice enabling
>>> +this does not guarantee full virtualization for xAPIC, as this can
>>> +only be achieved if hardware supports “APIC-register virtualization”
>>> +and “virtual-interrupt delivery”

[PATCH 5/5] arm/dom0less: assign dom0less guests to cpupools

2022-02-15 Thread Luca Fancellu
Introduce domain-cpupool property of a xen,domain device tree node,
that specifies the cpupool device tree handle of a xen,cpupool node
that identifies a cpupool created at boot time where the guest will
be assigned on creation.

Add member to the xen_arch_domainconfig public interface so the
XEN_DOMCTL_INTERFACE_VERSION version is bumped.

Update documentation about the property.

Signed-off-by: Luca Fancellu 
---
 docs/misc/arm/device-tree/booting.txt | 5 +
 xen/arch/arm/domain.c | 6 ++
 xen/arch/arm/domain_build.c   | 9 -
 xen/arch/x86/domain.c | 6 ++
 xen/common/domain.c   | 5 -
 xen/include/public/arch-arm.h | 2 ++
 xen/include/public/domctl.h   | 2 +-
 xen/include/xen/domain.h  | 3 +++
 8 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
index 71895663a4de..0f1f210fa449 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -182,6 +182,11 @@ with the following properties:
 Both #address-cells and #size-cells need to be specified because
 both sub-nodes (described shortly) have reg properties.
 
+- domain-cpupool
+
+Optional. Handle to a xen,cpupool device tree node that identifies the
+cpupool where the guest will be started at boot.
+
 Under the "xen,domain" compatible node, one or more sub-nodes are present
 for the DomU kernel and ramdisk.
 
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 92a6c509e5c5..be350b28b588 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -788,6 +788,12 @@ fail:
 return rc;
 }
 
+unsigned int
+arch_get_domain_cpupool_id(const struct xen_domctl_createdomain *config)
+{
+return config->arch.cpupool_id;
+}
+
 void arch_domain_destroy(struct domain *d)
 {
 /* IOMMU page table is shared with P2M, always call
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 6931c022a2e8..4f239e756775 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -3015,7 +3015,8 @@ static int __init construct_domU(struct domain *d,
 void __init create_domUs(void)
 {
 struct dt_device_node *node;
-const struct dt_device_node *chosen = dt_find_node_by_path("/chosen");
+const struct dt_device_node *cpupool_node,
+*chosen = dt_find_node_by_path("/chosen");
 
 BUG_ON(chosen == NULL);
 dt_for_each_child_node(chosen, node)
@@ -3053,6 +3054,12 @@ void __init create_domUs(void)
  GUEST_VPL011_SPI - 32 + 1);
 }
 
+/* Get the optional property domain-cpupool */
+cpupool_node = dt_parse_phandle(node, "domain-cpupool", 0);
+if ( cpupool_node )
+dt_property_read_u32(cpupool_node, "cpupool-id",
+ &d_cfg.arch.cpupool_id);
+
 /*
  * The variable max_init_domid is initialized with zero, so here it's
  * very important to use the pre-increment operator to call
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index ef1812dc1402..3e3cf88c9c82 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -880,6 +880,12 @@ int arch_domain_create(struct domain *d,
 return rc;
 }
 
+unsigned int
+arch_get_domain_cpupool_id(const struct xen_domctl_createdomain *config)
+{
+return 0;
+}
+
 void arch_domain_destroy(struct domain *d)
 {
 if ( is_hvm_domain(d) )
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 2048ebad86ff..d42ca8292025 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -665,6 +665,8 @@ struct domain *domain_create(domid_t domid,
 
 if ( !is_idle_domain(d) )
 {
+unsigned int domain_cpupool_id;
+
 watchdog_domain_init(d);
 init_status |= INIT_watchdog;
 
@@ -698,7 +700,8 @@ struct domain *domain_create(domid_t domid,
 if ( !d->pbuf )
 goto fail;
 
-if ( (err = sched_init_domain(d, 0)) != 0 )
+domain_cpupool_id = arch_get_domain_cpupool_id(config);
+if ( (err = sched_init_domain(d, domain_cpupool_id)) != 0 )
 goto fail;
 
 if ( (err = late_hwdom_init(d)) != 0 )
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index 94b31511ddea..2c5d1ea7f01a 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -321,6 +321,8 @@ struct xen_arch_domainconfig {
 uint16_t tee_type;
 /* IN */
 uint32_t nr_spis;
+/* IN */
+unsigned int cpupool_id;
 /*
  * OUT
  * Based on the property clock-frequency in the DT timer node.
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index b85e6170b0aa..31ec083cb06e 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -38,7 +38,7 @@
 #include "hvm/save.h"
 #include "memory.h"
 
-#define XEN_DOMCT

[PATCH 1/5] tools/cpupools: Give a name to unnamed cpupools

2022-02-15 Thread Luca Fancellu
With the introduction of boot time cpupools, Xen can create many
different cpupools at boot time other than cpupool with id 0.

Since these newly created cpupools can't have an
entry in Xenstore, create the entry using xen-init-dom0
helper with the usual convention: Pool-.

Given the change, remove the check for poolid == 0 from
libxl_cpupoolid_to_name(...).

Signed-off-by: Luca Fancellu 
---
 tools/helpers/xen-init-dom0.c  | 26 +-
 tools/libs/light/libxl_utils.c |  3 +--
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/tools/helpers/xen-init-dom0.c b/tools/helpers/xen-init-dom0.c
index c99224a4b607..3539f56faeb0 100644
--- a/tools/helpers/xen-init-dom0.c
+++ b/tools/helpers/xen-init-dom0.c
@@ -43,7 +43,10 @@ int main(int argc, char **argv)
 int rc;
 struct xs_handle *xsh = NULL;
 xc_interface *xch = NULL;
-char *domname_string = NULL, *domid_string = NULL;
+char *domname_string = NULL, *domid_string = NULL, *pool_string = NULL;
+char pool_path[strlen("/local/pool") + 12], pool_name[strlen("Pool-") + 5];
+xc_cpupoolinfo_t *xcinfo;
+unsigned int pool_id = 0;
 libxl_uuid uuid;
 
 /* Accept 0 or 1 argument */
@@ -114,6 +117,27 @@ int main(int argc, char **argv)
 goto out;
 }
 
+/* Create an entry in xenstore for each cpupool on the system */
+do {
+xcinfo = xc_cpupool_getinfo(xch, pool_id);
+if (xcinfo != NULL) {
+if (xcinfo->cpupool_id != pool_id)
+pool_id = xcinfo->cpupool_id;
+snprintf(pool_path, sizeof(pool_path), "/local/pool/%d/name",
+ pool_id);
+snprintf(pool_name, sizeof(pool_name), "Pool-%d", pool_id);
+pool_id++;
+if (!xs_write(xsh, XBT_NULL, pool_path, pool_name,
+  strlen(pool_name))) {
+fprintf(stderr, "cannot set pool name\n");
+rc = 1;
+}
+xc_cpupool_infofree(xch, xcinfo);
+if (rc)
+goto out;
+}
+} while(xcinfo != NULL);
+
 printf("Done setting up Dom0\n");
 
 out:
diff --git a/tools/libs/light/libxl_utils.c b/tools/libs/light/libxl_utils.c
index b91c2cafa223..81780da3ff40 100644
--- a/tools/libs/light/libxl_utils.c
+++ b/tools/libs/light/libxl_utils.c
@@ -151,8 +151,7 @@ char *libxl_cpupoolid_to_name(libxl_ctx *ctx, uint32_t poolid)
 
 snprintf(path, sizeof(path), "/local/pool/%d/name", poolid);
 s = xs_read(ctx->xsh, XBT_NULL, path, &len);
-if (!s && (poolid == 0))
-return strdup("Pool-0");
+
 return s;
 }
 
-- 
2.17.1




[PATCH 4/5] xen/cpupool: Create different cpupools at boot time

2022-02-15 Thread Luca Fancellu
Introduce an architecture-specific way to create different cpupools
at boot time. This is particularly useful on ARM big.LITTLE systems,
where there might be the need to have different cpupools for each type
of core, but also systems using NUMA can have different cpupools for
each node.

The feature on arm relies on a specification of the cpupools from the
device tree to build pools and assign cpus to them.

Documentation is created to explain the feature.

Signed-off-by: Luca Fancellu 
---
 docs/misc/arm/device-tree/cpupools.txt | 118 +
 xen/arch/arm/Kconfig   |   9 ++
 xen/arch/arm/Makefile  |   1 +
 xen/arch/arm/cpupool.c | 118 +
 xen/common/sched/cpupool.c |   4 +-
 xen/include/xen/sched.h|  11 +++
 6 files changed, 260 insertions(+), 1 deletion(-)
 create mode 100644 docs/misc/arm/device-tree/cpupools.txt
 create mode 100644 xen/arch/arm/cpupool.c

diff --git a/docs/misc/arm/device-tree/cpupools.txt b/docs/misc/arm/device-tree/cpupools.txt
new file mode 100644
index ..7298b6394332
--- /dev/null
+++ b/docs/misc/arm/device-tree/cpupools.txt
@@ -0,0 +1,118 @@
+Boot time cpupools
+==
+
+On arm, when BOOT_TIME_CPUPOOLS is enabled in the Xen configuration, it is
+possible to create cpupools during boot phase by specifying them in the device
+tree.
+
+Cpupool specification nodes shall be direct children of the /chosen node.
+Each cpupool node contains the following properties:
+
+- compatible (mandatory)
+
+Must always include the compatibility string: "xen,cpupool".
+
+- cpupool-id (mandatory)
+
+Must be a positive integer number.
+
+- cpupool-cpus (mandatory)
+
+Must be a list of device tree phandles to nodes describing cpus (e.g. having
+device_type = "cpu"); it can't be empty.
+
+- cpupool-sched (optional)
+
+Must be a string with the name of a Xen scheduler. It has no effect when
+used in conjunction with a cpupool-id equal to zero; in that case the
+default Xen scheduler is selected (sched=<...> boot argument).
+
+
+Constraints
+===
+
+The cpupool with id zero is implicitly created even if not specified; that pool
+must have at least one cpu assigned, otherwise Xen will stop.
+
+Every cpu brought up by Xen will be assigned to the cpupool with id zero if it's
+not assigned to any other cpupool.
+
+If a cpu is assigned to a cpupool, but it's not brought up correctly, Xen will
+stop.
+
+
+Examples
+
+
+For a system having two types of cores, the following device tree specification
+will instruct Xen to create two cpupools:
+
+- The cpupool with id 0 will have 4 cpus assigned.
+- The cpupool with id 1 will have 2 cpus assigned.
+
+As can be seen from the example, cpupool_a has only two cpus assigned, but since
+there are two cpus unassigned, they are automatically assigned to cpupool with
+id zero. The following example can work only if hmp-unsafe=1 is passed to Xen
+boot arguments, otherwise not all cores will be brought up by Xen and the
+cpupool creation process will stop Xen.
+
+
+a72_1: cpu@0 {
+compatible = "arm,cortex-a72";
+reg = <0x0 0x0>;
+device_type = "cpu";
+[...]
+};
+
+a72_2: cpu@1 {
+compatible = "arm,cortex-a72";
+reg = <0x0 0x1>;
+device_type = "cpu";
+[...]
+};
+
+a53_1: cpu@100 {
+compatible = "arm,cortex-a53";
+reg = <0x0 0x100>;
+device_type = "cpu";
+[...]
+};
+
+a53_2: cpu@101 {
+compatible = "arm,cortex-a53";
+reg = <0x0 0x101>;
+device_type = "cpu";
+[...]
+};
+
+cpu@102 {
+compatible = "arm,cortex-a53";
+reg = <0x0 0x102>;
+device_type = "cpu";
+[...]
+};
+
+cpu@103 {
+compatible = "arm,cortex-a53";
+reg = <0x0 0x103>;
+device_type = "cpu";
+[...]
+};
+
+chosen {
+
+cpupool_a {
+compatible = "xen,cpupool";
+cpupool-id = <0>;
+cpupool-cpus = <&a53_1 &a53_2>;
+};
+cpupool_b {
+compatible = "xen,cpupool";
+cpupool-id = <1>;
+cpupool-cpus = <&a72_1 &a72_2>;
+cpupool-sched = "credit2";
+};
+
+[...]
+
+};
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index ecfa6822e4d3..64c2879513b7 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -33,6 +33,15 @@ config ACPI
  Advanced Configuration and Power Interface (ACPI) support for Xen is
  an alternative to device tree on ARM64.
 
+config BOOT_TIME_CPUPOOLS
+   bool "Create cpupools at boot time"
+   depends on ARM
+   default n
+   help
+
+ Creates cpupools during boot time and assigns cpus to them. Cpupools
+ options can be specified in the device tree.
+
 config GICV3
bool "GICv3 driver"
depends on ARM_64 && !NEW_VGIC
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index d0dee10102b6..6165da4e77b4 10

[PATCH 2/5] xen/sched: create public function for cpupools creation

2022-02-15 Thread Luca Fancellu
Create new public function to create cpupools, it checks for pool id
uniqueness before creating the pool and can take a scheduler id or
a negative value that means the default Xen scheduler will be used.

Signed-off-by: Luca Fancellu 
---
 xen/common/sched/cpupool.c | 26 ++
 xen/include/xen/sched.h| 17 +
 2 files changed, 43 insertions(+)

diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
index 8c6e6eb9ccd5..4da12528d6b9 100644
--- a/xen/common/sched/cpupool.c
+++ b/xen/common/sched/cpupool.c
@@ -1218,6 +1218,32 @@ static void cpupool_hypfs_init(void)
 
 #endif /* CONFIG_HYPFS */
 
+struct cpupool *__init cpupool_create_pool(unsigned int pool_id, int sched_id)
+{
+struct cpupool *pool;
+
+ASSERT(!spin_is_locked(&cpupool_lock));
+
+spin_lock(&cpupool_lock);
+/* Check if a cpupool with pool_id exists */
+pool = __cpupool_find_by_id(pool_id, true);
+spin_unlock(&cpupool_lock);
+
+/* Pool exists, return an error */
+if ( pool )
+return NULL;
+
+if ( sched_id < 0 )
+sched_id = scheduler_get_default()->sched_id;
+
+pool = cpupool_create(pool_id, sched_id);
+
+BUG_ON(IS_ERR(pool));
+cpupool_put(pool);
+
+return pool;
+}
+
 static int __init cpupool_init(void)
 {
 unsigned int cpu;
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 37f78cc4c4c9..a50df1bccdc0 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
 int cpupool_move_domain(struct domain *d, struct cpupool *c);
 int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op);
 unsigned int cpupool_get_id(const struct domain *d);
 const cpumask_t *cpupool_valid_cpus(const struct cpupool *pool);
+
+/*
+ * cpupool_create_pool - Creates a cpupool
+ * @pool_id: id of the pool to be created
+ * @sched_id: id of the scheduler to be used for the pool
+ *
+ * Creates a cpupool with pool_id id, the id must be unique and the function
+ * will return an error if the pool id exists.
+ * The sched_id parameter identifies the scheduler to be used, if it is
+ * negative, the default scheduler of Xen will be used.
+ *
+ * returns:
+ * pointer to the struct cpupool just created, on success
+ * NULL, on cpupool creation error
+ */
+struct cpupool *cpupool_create_pool(unsigned int pool_id, int sched_id);
+
 extern void dump_runq(unsigned char key);
 
 void arch_do_physinfo(struct xen_sysctl_physinfo *pi);
-- 
2.17.1




[PATCH 3/5] xen/sched: retrieve scheduler id by name

2022-02-15 Thread Luca Fancellu
Add a public function to retrieve the scheduler id by the scheduler
name.

Signed-off-by: Luca Fancellu 
---
 xen/common/sched/core.c | 11 +++
 xen/include/xen/sched.h | 11 +++
 2 files changed, 22 insertions(+)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 8f4b1ca10d1c..9696d3c1d769 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2947,6 +2947,17 @@ void scheduler_enable(void)
 scheduler_active = true;
 }
 
+int __init sched_get_id_by_name(const char *sched_name)
+{
+unsigned int i;
+
+for ( i = 0; i < NUM_SCHEDULERS; i++ )
+if ( schedulers[i] && !strcmp(schedulers[i]->opt_name, sched_name) )
+return schedulers[i]->sched_id;
+
+return -1;
+}
+
 /* Initialise the data structures. */
 void __init scheduler_init(void)
 {
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index a50df1bccdc0..a67a9eb2fe9d 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -756,6 +756,17 @@ void sched_destroy_domain(struct domain *d);
 long sched_adjust(struct domain *, struct xen_domctl_scheduler_op *);
 long sched_adjust_global(struct xen_sysctl_scheduler_op *);
 int  sched_id(void);
+
+/*
+ * sched_get_id_by_name - retrieves a scheduler id given a scheduler name
+ * @sched_name: scheduler name as a string
+ *
+ * returns:
+ * positive value being the scheduler id, on success
+ * negative value if the scheduler name is not found.
+ */
+int sched_get_id_by_name(const char *sched_name);
+
 void vcpu_wake(struct vcpu *v);
 long vcpu_yield(void);
 void vcpu_sleep_nosync(struct vcpu *v);
-- 
2.17.1




[PATCH 0/5] Boot time cpupools

2022-02-15 Thread Luca Fancellu
This series introduces a feature for Xen to create cpupools at boot time; the
feature is gated by a Kconfig option that is disabled by default.
The boot time cpupool feature relies on the device tree to describe the
cpupools.
The series also introduces the possibility to assign a dom0less guest to a
cpupool at boot time.

Here follows an example, Xen is built with CONFIG_BOOT_TIME_CPUPOOLS=y.

From the DT:

  [...]

  a72_0: cpu@0 {
compatible = "arm,cortex-a72";
reg = <0x0 0x0>;
device_type = "cpu";
[...]
  };

  a72_1: cpu@1 {
compatible = "arm,cortex-a72";
reg = <0x0 0x1>;
device_type = "cpu";
[...]
  };

  a53_0: cpu@100 {
compatible = "arm,cortex-a53";
reg = <0x0 0x100>;
device_type = "cpu";
[...]
  };

  a53_1: cpu@101 {
compatible = "arm,cortex-a53";
reg = <0x0 0x101>;
device_type = "cpu";
[...]
  };

  a53_2: cpu@102 {
compatible = "arm,cortex-a53";
reg = <0x0 0x102>;
device_type = "cpu";
[...]
  };

  a53_3: cpu@103 {
compatible = "arm,cortex-a53";
reg = <0x0 0x103>;
device_type = "cpu";
[...]
  };

  chosen {
#size-cells = <0x1>;
#address-cells = <0x1>;
xen,dom0-bootargs = "...";
xen,xen-bootargs = "...";

cpupool0 {
  compatible = "xen,cpupool";
  cpupool-id = <0>;
  cpupool-cpus = <&a72_0 &a72_1>;
};

cp1: cpupool1 {
  compatible = "xen,cpupool";
  cpupool-id = <1>;
  cpupool-cpus = <&a53_0 &a53_1 &a53_2 &a53_3>;
  cpupool-sched = "null";
};

module@0 {
  reg = <0x8008 0x130>;
  compatible = "multiboot,module";
};

domU1 {
  #size-cells = <0x1>;
  #address-cells = <0x1>;
  compatible = "xen,domain";
  cpus = <1>;
  memory = <0 0xC>;
  vpl011;
  domain-cpupool = <&cp1>;

  module@9200 {
compatible = "multiboot,kernel", "multiboot,module";
reg = <0x9200 0x1ff>;
bootargs = "...";
  };
};
  };

  [...]

The example DT instructs Xen to create two cpupools: the one with id 0 has two
physical cpus and the one with id 1 has four physical cpus. The second cpupool
uses the null scheduler, and from the /chosen node we can see that a dom0less
guest will be started in that cpupool.

In this particular case Xen must boot with different types of cpus, so the
hmp_unsafe boot argument must be enabled.

Luca Fancellu (5):
  tools/cpupools: Give a name to unnamed cpupools
  xen/sched: create public function for cpupools creation
  xen/sched: retrieve scheduler id by name
  xen/cpupool: Create different cpupools at boot time
  arm/dom0less: assign dom0less guests to cpupools

 docs/misc/arm/device-tree/booting.txt  |   5 ++
 docs/misc/arm/device-tree/cpupools.txt | 118 +
 tools/helpers/xen-init-dom0.c  |  26 +-
 tools/libs/light/libxl_utils.c |   3 +-
 xen/arch/arm/Kconfig   |   9 ++
 xen/arch/arm/Makefile  |   1 +
 xen/arch/arm/cpupool.c | 118 +
 xen/arch/arm/domain.c  |   6 ++
 xen/arch/arm/domain_build.c|   9 +-
 xen/arch/x86/domain.c  |   6 ++
 xen/common/domain.c|   5 +-
 xen/common/sched/core.c|  11 +++
 xen/common/sched/cpupool.c |  30 ++-
 xen/include/public/arch-arm.h  |   2 +
 xen/include/public/domctl.h|   2 +-
 xen/include/xen/domain.h   |   3 +
 xen/include/xen/sched.h|  39 
 17 files changed, 386 insertions(+), 7 deletions(-)
 create mode 100644 docs/misc/arm/device-tree/cpupools.txt
 create mode 100644 xen/arch/arm/cpupool.c

-- 
2.17.1




[ovmf test] 168115: all pass - PUSHED

2022-02-15 Thread osstest service owner
flight 168115 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168115/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 1193aa2dfbbd11fa7191d000a0cc166d03a249d2
baseline version:
 ovmf c9b7c6e0cc7da76b74bcdd8c90cef956d5ae971c

Last test of basis   168074  2022-02-10 02:10:25 Z5 days
Testing same since   168115  2022-02-15 02:41:41 Z0 days1 attempts


People who touched revisions under test:
  Bob Feng 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   c9b7c6e0cc..1193aa2dfb  1193aa2dfbbd11fa7191d000a0cc166d03a249d2 -> xen-tested-master


