[PATCH v2] xen/public: add comment to struct xen_mem_acquire_resource

2022-02-07 Thread Juergen Gross
Commit 7c7f7e8fba01 changed xen/include/public/memory.h in an incompatible
way. Unfortunately the changed parts were already in use in the Linux
kernel, so an update of the header in the kernel would result in a build
breakage.

As the change of above commit was in a section originally meant to be not
stable, it was the usage in the kernel which was wrong.

Add a comment to the modified struct for not reusing the now removed bit,
in order to avoid kernels using it stumbling over a possible new meaning
of the bit.

In case the kernel is updating to a new version of the header, the wrong
use case must be removed first.

Signed-off-by: Juergen Gross 
---
V2:
- only add comment instead of reverting commit 7c7f7e8fba01 (Jan Beulich)
---
 xen/include/public/memory.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 383a9468c3..86513057f7 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -662,6 +662,11 @@ struct xen_mem_acquire_resource {
  * two calls.
  */
 uint32_t nr_frames;
+/*
+ * Padding field, must be zero on input.
+ * The lowest bit was named XENMEM_rsrc_acq_caller_owned in a previous
+ * version and should not be reused in future.
+ */
 uint32_t pad;
 /*
  * IN - the index of the initial frame to be mapped. This parameter
-- 
2.34.1




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 18:44, Oleksandr Andrushchenko wrote:
>
> On 07.02.22 18:37, Jan Beulich wrote:
>> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 18:15, Jan Beulich wrote:
 On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
> On 07.02.22 17:26, Jan Beulich wrote:
>> 1b. Make vpci_write use write lock for writes to command register and 
>> BARs
>> only; keep using the read lock for all other writes.
> I am not quite sure how to do that. Do you mean something like:
> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>     uint32_t data)
> [snip]
>     list_for_each_entry ( r, >vpci->handlers, node )
> {
> [snip]
>     if ( r->needs_write_lock)
>         write_lock(d->vpci_lock)
>     else
>         read_lock(d->vpci_lock)
> 
>
> And provide rw as an argument to:
>
> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>   vpci_write_t *write_handler, unsigned int 
> offset,
>   unsigned int size, void *data, --->>> bool 
> write_path <<<-)
>
> Is this what you mean?
 This sounds overly complicated. You can derive locally in vpci_write(),
 from just its "reg" and "size" parameters, whether the lock needs taking
 in write mode.
>>> Yes, I started writing a reply with that. So, the summary (ROM
>>> position depends on header type):
>>> if ( (reg == PCI_COMMAND) || (reg == ROM) )
>>> {
>>>    read PCI_COMMAND and see if memory or IO decoding are enabled.
>>>    if ( enabled )
>>>        write_lock(d->vpci_lock)
>>>    else
>>>        read_lock(d->vpci_lock)
>>> }
>> Hmm, yes, you can actually get away without using "size", since both
>> command register and ROM BAR are 32-bit aligned registers, and 64-bit
>> accesses get split in vpci_ecam_write().
> But, OS may want reading a single byte of ROM BAR, so I think
> I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
> ranges
>> For the command register the memory- / IO-decoding-enabled check may
>> end up a little more complicated, as the value to be written also
>> matters. Maybe read the command register only for the ROM BAR write,
>> using the write lock uniformly for all command register writes?
> Sounds good for the start.
> Another concern is that if we go with a read_lock and then in the
> underlying code we disable memory decoding and try doing
> something and calling cmd_write handler for any reason then
>
> I mean that the check in the vpci_write is somewhat we can tolerate,
> but then it is must be considered that no code in the read path
> is allowed to perform write path functions. Which brings a pretty
> valid use-case: say in read mode we detect an unrecoverable error
> and need to remove the device:
> vpci_process_pending -> ERROR -> vpci_remove_device or similar.
>
> What do we do then? It is all going to be fragile...
I have tried to summarize the options we have wrt locking
and would love to hear from @Roger and @Jan.

In every variant there is a task of dealing with the overlap
detection in modify_bars, so this is the only place as of now
which needs special treatment.

Existing limitations: there is no way to upgrade a read lock to a write
lock, so paths which may require write lock protection need to use
write lock from the very beginning. Workarounds can be applied.

1. Per-domain rw lock, aka d->vpci_lock
==
Note: with per-domain rw lock it is possible to do without introducing
per-device locks, so pdev->vpci->lock can be removed and no pdev->vpci_lock
should be required.

This is only going to work in case if vpci_write always takes the write lock
and vpci_read takes a read lock and no path in vpci_read is allowed to
perform write path operations.
vpci_process_pending uses write lock as it have vpci_remove_device in its
error path.

Pros:
- no per-device vpci lock is needed?
- solves overlap code ABBA in modify_bars

Cons:
- all writes are serialized
- need to carefully select read paths, so they are guaranteed not to lead
   to lock upgrade use-cases

1.1. Semi read lock upgrade in modify bars
--
In this case both vpci_read and vpci_write take a read lock and when it comes
to modify_bars:

1. read_unlock(d->vpci_lock)
2. write_lock(d->vpci_lock)
3. Check that pdev->vpci is still available and is the same object:
if (pdev->vpci && (pdev->vpci == old_vpci) )
{
     /* vpci structure is valid and can be used. */
}
else
{
     /* vpci has gone, return an error. */
}

Pros:
- no per-device vpci lock is needed?
- solves overlap code ABBA in modify_bars
- readers and writers are NOT serialized
- NO need to carefully select read paths, so they are guaranteed not to lead
   to lock upgrade use-cases

Cons:
- ???

2. per-device lock 

[PATCH v3] docs: document patch rules

2022-02-07 Thread Juergen Gross
Add a document to describe the rules for sending a proper patch.

As it contains all the information already being present in
docs/process/tags.pandoc remove that file.

The "Reviewed-by:" and "Acked-by:" tags are expanded to allow an
optional restriction of the tag.

A new tag "Origin:" is added to tag patches taken from another project.

Signed-off-by: Juergen Gross 
Acked-by: Jan Beulich 
Reviewed-by: Julien Grall 
---
v3:
- add note regarding commit id length for Origin: (Julien Grall)
v2:
- expanded commit message (Roger Pau Monné)
- some rewordings (Roger Pau Monné, Jan Beulich)
- add "Requested-by:" description (Jan Beulich)
- rename "Taken-from:" to "Origin:" (Jan Beulich)
- add reviewers as recipients of patch (Jan Beulich)
- style fixes (Roger Pau Monné, Jan Beulich)
---
 docs/process/sending-patches.pandoc | 300 
 docs/process/tags.pandoc|  55 -
 2 files changed, 300 insertions(+), 55 deletions(-)
 create mode 100644 docs/process/sending-patches.pandoc
 delete mode 100644 docs/process/tags.pandoc

diff --git a/docs/process/sending-patches.pandoc 
b/docs/process/sending-patches.pandoc
new file mode 100644
index 00..7ff7826c99
--- /dev/null
+++ b/docs/process/sending-patches.pandoc
@@ -0,0 +1,300 @@
+# How a proper patch should look like
+
+This is a brief description how a proper patch for the Xen project should
+look like. Examples and tooling tips are not part of this document, those
+can be found in the
+[Xen Wiki](https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches).
+
+## The patch subject
+
+The first line at the top of the patch should contain a short description of
+what the patch does, and hints as to what code it touches. This line is used
+as the **Subject** line of the mail when sending the patch.
+
+The hint which code is touched is usually in form of an abstract entity
+(like e.g. `build` for the build system), or a component (like `tools` or
+`iommu`). Further specification is possible via adding a sub-component with
+a slash (e.g. `tools/xenstore`):
+
+: 
+
+E.g.:
+
+xen/arm: increase memory banks number define value
+tools/libxenevtchn: deduplicate xenevtchn_fd()
+MAINTAINERS: update my email address
+build: correct usage comments in Kbuild.include
+
+The description should give a rough hint *what* is done in the patch.
+
+The subject line should in general not exceed 80 characters. It must be
+followed by a blank line.
+
+## The commit message
+
+The commit message is free text describing *why* the patch is done and
+*how* the goal of the patch is achieved. A good commit message will describe
+the current situation, the desired goal, and the way this goal is being
+achieved. Parts of that can be omitted in obvious cases.
+
+In case additional changes are done in the patch (like e.g. cleanups), those
+should be mentioned.
+
+When referencing other patches (e.g. `similar to patch xy ...`) those
+patches should be referenced via their commit id (at least 12 digits)
+and the patch subject, if the very same patch isn't referenced by the
+`Fixes:` tag, too:
+
+Similar to commit 67d01cdb5518 ("x86: infrastructure to allow converting
+certain indirect calls to direct ones") add ...
+
+The following ``git config`` settings can be used to add a pretty format for
+outputting the above style in the ``git log`` or ``git show`` commands:
+
+[core]
+abbrev = 12
+[pretty]
+fixes = Fixes: %h (\"%s\")
+
+Lines in the commit message should not exceed 75 characters, except when
+copying error output directly into the commit message.
+
+## Tags
+
+Tags are entries in the form
+
+Tag: something
+
+In general tags are added in chronological order. So a `Reviewed-by:` tag
+should be added **after** the `Signed-off-by:` tag, as the review happened
+after the patch was written.
+
+Do not split a tag across multiple lines, tags are exempt from the
+"wrap at 75 columns" rule in order to simplify parsing scripts.
+
+### Origin:
+
+Xen has inherited some source files from other open source projects. In case
+a patch modifying such an inherited file is taken from that project (maybe in
+modified form), the `Origin:` tag specifies the source of the patch:
+
+Origin:  
+
+E.g.:
+
+Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
f093b08c47b3
+
+The commit id should be shortened to its first 12 characters.
+
+All tags **above** the `Origin:` tag are from the original patch (which
+should all be kept), while tags **after** `Origin:` are related to the
+normal Xen patch process as described here.
+
+### Fixes:
+
+If your patch fixes a bug in a specific commit, e.g. you found an issue using
+``git bisect``, please use the `Fixes:` tag with the first 12 characters of
+the commit id, and the one line summary.
+
+Fixes:  ("")
+
+E.g.:
+
+Fixes: 67d01cdb5518 ("x86: infrastructure to allow converting certain 
indirect calls to direct 

[PATCH v3 5/5] tools/include: remove xen-external directory

2022-02-07 Thread Juergen Gross
There is no user of tools/include/xen-external/* left. Remove it.

Signed-off-by: Juergen Gross 
---
 tools/include/xen-external/README |   24 -
 tools/include/xen-external/bsd-COPYRIGHT  |  126 --
 tools/include/xen-external/bsd-queue.3| 1044 -
 .../xen-external/bsd-sys-queue-h-seddery  |   74 --
 tools/include/xen-external/bsd-sys-queue.h|  637 --
 5 files changed, 1905 deletions(-)
 delete mode 100644 tools/include/xen-external/README
 delete mode 100644 tools/include/xen-external/bsd-COPYRIGHT
 delete mode 100644 tools/include/xen-external/bsd-queue.3
 delete mode 100755 tools/include/xen-external/bsd-sys-queue-h-seddery
 delete mode 100644 tools/include/xen-external/bsd-sys-queue.h

diff --git a/tools/include/xen-external/README 
b/tools/include/xen-external/README
deleted file mode 100644
index 93c2bc9cd8..00
--- a/tools/include/xen-external/README
+++ /dev/null
@@ -1,24 +0,0 @@
-WARNING - DO NOT EDIT THINGS IN THIS DIRECTORY
---
-
-These files were obtained elsewhere and should only be updated by
-copying new versions from the source location, as documented below:
-
-bsd-COPYRIGHT
-bsd-sys-queue.h
-bsd-queue.3
-
-  Obtained from the FreeBSD SVN using the following commands:
-svn co -r 221843 svn://svn.freebsd.org/base/head/sys/sys/
-svn co -r 221843 svn://svn.freebsd.org/base/head/share/man/man3
-svn cat -r 221843 http://svn.freebsd.org/base/head/COPYRIGHT 
>tools/libxl/external/bsd-COPYRIGHT
-
-Exceptions:
-
-README
-
-  This file
-
-bsd-sys-queue-h-seddery
-
-  Script to transform the above into a new namespace.
diff --git a/tools/include/xen-external/bsd-COPYRIGHT 
b/tools/include/xen-external/bsd-COPYRIGHT
deleted file mode 100644
index 6dc5d16b46..00
--- a/tools/include/xen-external/bsd-COPYRIGHT
+++ /dev/null
@@ -1,126 +0,0 @@
-# $FreeBSD$
-#  @(#)COPYRIGHT   8.2 (Berkeley) 3/21/94
-
-The compilation of software known as FreeBSD is distributed under the
-following terms:
-
-Copyright (c) 1992-2011 The FreeBSD Project. All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions
-are met:
-1. Redistributions of source code must retain the above copyright
-   notice, this list of conditions and the following disclaimer.
-2. Redistributions in binary form must reproduce the above copyright
-   notice, this list of conditions and the following disclaimer in the
-   documentation and/or other materials provided with the distribution.
-
-THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
-OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
-LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
-OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
-SUCH DAMAGE.
-
-The 4.4BSD and 4.4BSD-Lite software is distributed under the following
-terms:
-
-All of the documentation and software included in the 4.4BSD and 4.4BSD-Lite
-Releases is copyrighted by The Regents of the University of California.
-
-Copyright 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
-   The Regents of the University of California.  All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions
-are met:
-1. Redistributions of source code must retain the above copyright
-   notice, this list of conditions and the following disclaimer.
-2. Redistributions in binary form must reproduce the above copyright
-   notice, this list of conditions and the following disclaimer in the
-   documentation and/or other materials provided with the distribution.
-3. All advertising materials mentioning features or use of this software
-   must display the following acknowledgement:
-This product includes software developed by the University of
-California, Berkeley and its contributors.
-4. Neither the name of the University nor the names of its contributors
-   may be used to endorse or promote products derived from this software
-   without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
-FOR ANY DIRECT, INDIRECT, 

[PATCH v3 3/5] tools/libs/toolcore: replace _xentoolcore_list.h with _xen_list.h

2022-02-07 Thread Juergen Gross
Remove generating _xentoolcore_list.h and use the common _xen_list.h
instead.

Signed-off-by: Juergen Gross 
---
v3:
- fix build (Anthony PERARD)
---
 .gitignore   | 1 -
 tools/include/xentoolcore_internal.h | 4 ++--
 tools/libs/toolcore/Makefile | 8 
 tools/libs/toolcore/handlereg.c  | 8 
 4 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/.gitignore b/.gitignore
index 3f9d55ba87..afe78c787c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -227,7 +227,6 @@ tools/hotplug/NetBSD/rc.d/xencommons
 tools/hotplug/NetBSD/rc.d/xendriverdomain
 tools/include/acpi
 tools/include/_libxl*.h
-tools/include/_xentoolcore_list.h
 tools/include/xen/*
 tools/include/xen-xsm/*
 tools/include/xen-foreign/*.(c|h|size)
diff --git a/tools/include/xentoolcore_internal.h 
b/tools/include/xentoolcore_internal.h
index 04f5848f09..deccefd612 100644
--- a/tools/include/xentoolcore_internal.h
+++ b/tools/include/xentoolcore_internal.h
@@ -27,7 +27,7 @@
 #include 
 
 #include "xentoolcore.h"
-#include "_xentoolcore_list.h"
+#include "_xen_list.h"
 
 /*-- active handle registration --*/
 
@@ -87,7 +87,7 @@ typedef int 
Xentoolcore__Restrict_Callback(Xentoolcore__Active_Handle*,
 
 struct Xentoolcore__Active_Handle {
 Xentoolcore__Restrict_Callback *restrict_callback;
-XENTOOLCORE_LIST_ENTRY(Xentoolcore__Active_Handle) entry;
+XEN_LIST_ENTRY(Xentoolcore__Active_Handle) entry;
 };
 
 void xentoolcore__register_active_handle(Xentoolcore__Active_Handle*);
diff --git a/tools/libs/toolcore/Makefile b/tools/libs/toolcore/Makefile
index ed4ae00694..9c013b2879 100644
--- a/tools/libs/toolcore/Makefile
+++ b/tools/libs/toolcore/Makefile
@@ -3,7 +3,6 @@ include $(XEN_ROOT)/tools/Rules.mk
 
 MAJOR  = 1
 MINOR  = 0
-AUTOINCS := $(XEN_INCLUDE)/_xentoolcore_list.h
 
 LIBHEADER := xentoolcore.h
 
@@ -12,10 +11,3 @@ SRCS-y   += handlereg.c
 include $(XEN_ROOT)/tools/libs/libs.mk
 
 PKG_CONFIG_DESC := Central support for Xen Hypervisor userland libraries
-
-$(LIB_OBJS): $(AUTOINCS)
-$(PIC_OBJS): $(AUTOINCS)
-
-$(XEN_INCLUDE)/_xentoolcore_list.h: 
$(XEN_INCLUDE)/xen-external/bsd-sys-queue-h-seddery 
$(XEN_INCLUDE)/xen-external/bsd-sys-queue.h
-   $(PERL) $^ --prefix=xentoolcore >$(notdir $@).new
-   $(call move-if-changed,$(notdir $@).new,$@)
diff --git a/tools/libs/toolcore/handlereg.c b/tools/libs/toolcore/handlereg.c
index baec55e2a4..b43cb0e8ac 100644
--- a/tools/libs/toolcore/handlereg.c
+++ b/tools/libs/toolcore/handlereg.c
@@ -31,7 +31,7 @@
 #include 
 
 static pthread_mutex_t handles_lock = PTHREAD_MUTEX_INITIALIZER;
-static XENTOOLCORE_LIST_HEAD(, Xentoolcore__Active_Handle) handles;
+static XEN_LIST_HEAD(, Xentoolcore__Active_Handle) handles;
 
 static void lock(void) {
 int e = pthread_mutex_lock(_lock);
@@ -45,13 +45,13 @@ static void unlock(void) {
 
 void xentoolcore__register_active_handle(Xentoolcore__Active_Handle *ah) {
 lock();
-XENTOOLCORE_LIST_INSERT_HEAD(, ah, entry);
+XEN_LIST_INSERT_HEAD(, ah, entry);
 unlock();
 }
 
 void xentoolcore__deregister_active_handle(Xentoolcore__Active_Handle *ah) {
 lock();
-XENTOOLCORE_LIST_REMOVE(ah, entry);
+XEN_LIST_REMOVE(ah, entry);
 unlock();
 }
 
@@ -60,7 +60,7 @@ int xentoolcore_restrict_all(domid_t domid) {
 Xentoolcore__Active_Handle *ah;
 
 lock();
-XENTOOLCORE_LIST_FOREACH(ah, , entry) {
+XEN_LIST_FOREACH(ah, , entry) {
 r = ah->restrict_callback(ah, domid);
 if (r) goto out;
 }
-- 
2.34.1




[PATCH v3 4/5] tools/libs/evtchn: use _xen_list.h

2022-02-07 Thread Juergen Gross
Instead of including xen-external/bsd-sys-queue.h use the header
_xen_list.h in minios.c.

Signed-off-by: Juergen Gross 
---
 tools/libs/evtchn/minios.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/libs/evtchn/minios.c b/tools/libs/evtchn/minios.c
index 30f98bc7e4..65cfccfd09 100644
--- a/tools/libs/evtchn/minios.c
+++ b/tools/libs/evtchn/minios.c
@@ -20,7 +20,7 @@
  * Split off from xc_minios.c
  */
 
-#include "xen-external/bsd-sys-queue.h"
+#include "_xen_list.h"
 #include 
 #include 
 #include 
@@ -38,10 +38,10 @@
 
 #include "private.h"
 
-LIST_HEAD(port_list, port_info);
+XEN_LIST_HEAD(port_list, struct port_info);
 
 struct port_info {
-LIST_ENTRY(port_info) list;
+XEN_LIST_ENTRY(struct port_info) list;
 evtchn_port_t port;
 bool pending;
 bool bound;
@@ -62,7 +62,7 @@ static struct port_info *port_alloc(xenevtchn_handle *xce)
 port_info->port = -1;
 port_info->bound = false;
 
-LIST_INSERT_HEAD(port_list, port_info, list);
+XEN_LIST_INSERT_HEAD(port_list, port_info, list);
 
 return port_info;
 }
@@ -72,7 +72,7 @@ static void port_dealloc(struct port_info *port_info)
 if ( port_info->bound )
 unbind_evtchn(port_info->port);
 
-LIST_REMOVE(port_info, list);
+XEN_LIST_REMOVE(port_info, list);
 free(port_info);
 }
 
@@ -81,7 +81,7 @@ static int evtchn_close_fd(struct file *file)
 struct port_info *port_info, *tmp;
 struct port_list *port_list = file->dev;
 
-LIST_FOREACH_SAFE(port_info, port_list, list, tmp)
+XEN_LIST_FOREACH_SAFE(port_info, port_list, list, tmp)
 port_dealloc(port_info);
 free(port_list);
 
@@ -126,7 +126,7 @@ int osdep_evtchn_open(xenevtchn_handle *xce, unsigned int 
flags)
 }
 
 file->dev = list;
-LIST_INIT(list);
+XEN_LIST_INIT(list);
 xce->fd = fd;
 printf("evtchn_open() -> %d\n", fd);
 
@@ -173,7 +173,7 @@ static void evtchn_handler(evtchn_port_t port, struct 
pt_regs *regs, void *data)
 assert(file);
 port_list = file->dev;
 mask_evtchn(port);
-LIST_FOREACH(port_info, port_list, list)
+XEN_LIST_FOREACH(port_info, port_list, list)
 {
 if ( port_info->port == port )
 goto found;
@@ -257,7 +257,7 @@ int xenevtchn_unbind(xenevtchn_handle *xce, evtchn_port_t 
port)
 struct port_info *port_info;
 struct port_list *port_list = file->dev;
 
-LIST_FOREACH(port_info, port_list, list)
+XEN_LIST_FOREACH(port_info, port_list, list)
 {
 if ( port_info->port == port )
 {
@@ -314,7 +314,7 @@ xenevtchn_port_or_error_t 
xenevtchn_pending(xenevtchn_handle *xce)
 
 file->read = false;
 
-LIST_FOREACH(port_info, port_list, list)
+XEN_LIST_FOREACH(port_info, port_list, list)
 {
 if ( port_info->port != -1 && port_info->pending )
 {
-- 
2.34.1




[PATCH v3 1/5] tools/include: generate a _xen_list.h file

2022-02-07 Thread Juergen Gross
Today tools/include contains two basically identical header files
generated from the same source. They just differ by the used name space
and they are being generated from different Makefiles via a perl
script.

Prepare to have only one such header by using a more generic namespace
"XEN" for _xen_list.h.

As the original header hasn't been updated in the Xen tree since its
introduction about 10 years ago, and the updates of FreeBSD side have
mostly covered BSD internal debugging aids, just don't generate the
new header during build, especially as using the current FreeBSD
version of the file would require some updates of the perl script,
which are potentially more work than just doing the needed editing by
hand. Additionally this enables to remove the not needed debugging
extensions of FreeBSD.

Signed-off-by: Juergen Gross 
---
 tools/include/Makefile|   2 +
 tools/include/_xen_list.h | 509 ++
 2 files changed, 511 insertions(+)
 create mode 100644 tools/include/_xen_list.h

diff --git a/tools/include/Makefile b/tools/include/Makefile
index d7b51006e0..d965987f55 100644
--- a/tools/include/Makefile
+++ b/tools/include/Makefile
@@ -70,11 +70,13 @@ install: all
$(INSTALL_DATA) xen/io/*.h $(DESTDIR)$(includedir)/xen/io
$(INSTALL_DATA) xen/sys/*.h $(DESTDIR)$(includedir)/xen/sys
$(INSTALL_DATA) xen/xsm/*.h $(DESTDIR)$(includedir)/xen/xsm
+   $(INSTALL_DATA) _xen_list.h $(DESTDIR)$(includedir)
 
 .PHONY: uninstall
 uninstall:
echo "[FIXME] uninstall headers"
rm -rf $(DESTDIR)$(includedir)/xen
+   rm -f $(DESTDIR)$(includedir)/_xen_list.h
 
 .PHONY: clean
 clean:
diff --git a/tools/include/_xen_list.h b/tools/include/_xen_list.h
new file mode 100644
index 00..ce246f95c9
--- /dev/null
+++ b/tools/include/_xen_list.h
@@ -0,0 +1,509 @@
+/*-
+ * Copyright (c) 1991, 1993
+ * The Regents of the University of California.  All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 4. Neither the name of the University nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * @(#)queue.h 8.5 (Berkeley) 8/20/94
+ * $FreeBSD$
+ */
+
+#ifndef XEN__SYS_QUEUE_H_
+#defineXEN__SYS_QUEUE_H_
+
+/* #include  */
+
+/*
+ * This file defines four types of data structures: singly-linked lists,
+ * singly-linked tail queues, lists and tail queues.
+ *
+ * A singly-linked list is headed by a single forward pointer. The elements
+ * are singly linked for minimum space and pointer manipulation overhead at
+ * the expense of O(n) removal for arbitrary elements. New elements can be
+ * added to the list after an existing element or at the head of the list.
+ * Elements being removed from the head of the list should use the explicit
+ * macro for this purpose for optimum efficiency. A singly-linked list may
+ * only be traversed in the forward direction.  Singly-linked lists are ideal
+ * for applications with large datasets and few or no removals or for
+ * implementing a LIFO queue.
+ *
+ * A singly-linked tail queue is headed by a pair of pointers, one to the
+ * head of the list and the other to the tail of the list. The elements are
+ * singly linked for minimum space and pointer manipulation overhead at the
+ * expense of O(n) removal for arbitrary elements. New elements can be added
+ * to the list after an existing element, at the head of the list, or at the
+ * end of the list. Elements being removed from the head of the tail queue
+ * should use the explicit macro for this purpose for optimum efficiency.
+ * A singly-linked tail queue may only be traversed 

[PATCH v3 2/5] tools/libs/light: replace _libxl_list.h with _xen_list.h

2022-02-07 Thread Juergen Gross
Remove generating _libxl_list.h and use the common _xen_list.h instead.

Signed-off-by: Juergen Gross 
---
 tools/include/libxl.h|   4 +-
 tools/libs/light/Makefile|  10 +--
 tools/libs/light/libxl.c |  40 -
 tools/libs/light/libxl_aoutils.c |  20 ++---
 tools/libs/light/libxl_device.c  |  27 +++---
 tools/libs/light/libxl_disk.c|   4 +-
 tools/libs/light/libxl_domain.c  |  18 ++--
 tools/libs/light/libxl_event.c   | 128 +--
 tools/libs/light/libxl_fork.c|  44 -
 tools/libs/light/libxl_internal.h|  86 +-
 tools/libs/light/libxl_qmp.c |  19 ++--
 tools/libs/light/libxl_stream_read.c |  20 ++---
 12 files changed, 206 insertions(+), 214 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 2bbbd21f0b..51a9b6cfac 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -747,7 +747,7 @@
 typedef struct libxl__ctx libxl_ctx;
 
 #include 
-#include <_libxl_list.h>
+#include <_xen_list.h>
 
 /* API compatibility. */
 #ifdef LIBXL_API_VERSION
@@ -1448,7 +1448,7 @@ typedef struct {
 } libxl_enum_string_table;
 
 struct libxl_event;
-typedef LIBXL_TAILQ_ENTRY(struct libxl_event) libxl_ev_link;
+typedef XEN_TAILQ_ENTRY(struct libxl_event) libxl_ev_link;
 
 /*
  * A boolean variable with an explicit default state.
diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile
index be32d95d39..5642955672 100644
--- a/tools/libs/light/Makefile
+++ b/tools/libs/light/Makefile
@@ -153,14 +153,14 @@ LIBXL_TEST_OBJS += $(foreach t, 
$(LIBXL_TESTS_INSIDE),libxl_test_$t.opic)
 TEST_PROG_OBJS += $(foreach t, $(LIBXL_TESTS_PROGS),test_$t.o) test_common.o
 TEST_PROGS += $(foreach t, $(LIBXL_TESTS_PROGS),test_$t)
 
-AUTOINCS = $(XEN_INCLUDE)/_libxl_list.h _libxl_save_msgs_callout.h 
_libxl_save_msgs_helper.h
+AUTOINCS = _libxl_save_msgs_callout.h _libxl_save_msgs_helper.h
 AUTOSRCS = _libxl_save_msgs_callout.c _libxl_save_msgs_helper.c
 
 CLIENTS = testidl libxl-save-helper
 
 SAVE_HELPER_OBJS = libxl_save_helper.o _libxl_save_msgs_helper.o
 
-LIBHEADER := libxl.h libxl_event.h libxl_json.h _libxl_types.h 
_libxl_types_json.h _libxl_list.h libxl_utils.h libxl_uuid.h
+LIBHEADER := libxl.h libxl_event.h libxl_json.h _libxl_types.h 
_libxl_types_json.h libxl_utils.h libxl_uuid.h
 
 NO_HEADERS_CHK := y
 
@@ -201,17 +201,13 @@ _libxl.api-for-check: $(XEN_INCLUDE)/libxl.h $(AUTOINCS)
>$@.new
mv -f $@.new $@
 
-$(XEN_INCLUDE)/_libxl_list.h: 
$(XEN_INCLUDE)/xen-external/bsd-sys-queue-h-seddery 
$(XEN_INCLUDE)/xen-external/bsd-sys-queue.h
-   $(PERL) $^ --prefix=libxl >$(notdir $@).new
-   $(call move-if-changed,$(notdir $@).new,$@)
-
 _libxl_save_msgs_helper.c _libxl_save_msgs_callout.c \
 _libxl_save_msgs_helper.h _libxl_save_msgs_callout.h: \
libxl_save_msgs_gen.pl
$(PERL) -w $< $@ >$@.new
$(call move-if-changed,$@.new,$@)
 
-$(XEN_INCLUDE)/libxl.h: $(XEN_INCLUDE)/_libxl_types.h 
$(XEN_INCLUDE)/_libxl_list.h
+$(XEN_INCLUDE)/libxl.h: $(XEN_INCLUDE)/_libxl_types.h
 $(XEN_INCLUDE)/libxl_json.h: $(XEN_INCLUDE)/_libxl_types_json.h
 libxl_internal.h: _libxl_types_internal.h _libxl_types_private.h 
_libxl_types_internal_private.h
 libxl_internal_json.h: _libxl_types_internal_json.h
diff --git a/tools/libs/light/libxl.c b/tools/libs/light/libxl.c
index 667ae6409b..a0bf7d186f 100644
--- a/tools/libs/light/libxl.c
+++ b/tools/libs/light/libxl.c
@@ -41,29 +41,29 @@ int libxl_ctx_alloc(libxl_ctx **pctx, int version,
 ctx->nogc_gc.alloc_maxsize = -1;
 ctx->nogc_gc.owner = ctx;
 
-LIBXL_TAILQ_INIT(>occurred);
+XEN_TAILQ_INIT(>occurred);
 
 ctx->osevent_hooks = 0;
 
 ctx->poller_app = 0;
-LIBXL_LIST_INIT(>pollers_event);
-LIBXL_LIST_INIT(>pollers_idle);
-LIBXL_LIST_INIT(>pollers_active);
+XEN_LIST_INIT(>pollers_event);
+XEN_LIST_INIT(>pollers_idle);
+XEN_LIST_INIT(>pollers_active);
 
-LIBXL_LIST_INIT(>efds);
-LIBXL_TAILQ_INIT(>etimes);
+XEN_LIST_INIT(>efds);
+XEN_TAILQ_INIT(>etimes);
 
 ctx->watch_slots = 0;
-LIBXL_SLIST_INIT(>watch_freeslots);
+XEN_SLIST_INIT(>watch_freeslots);
 libxl__ev_fd_init(>watch_efd);
 
 ctx->xce = 0;
-LIBXL_LIST_INIT(>evtchns_waiting);
+XEN_LIST_INIT(>evtchns_waiting);
 libxl__ev_fd_init(>evtchn_efd);
 
-LIBXL_LIST_INIT(>aos_inprogress);
+XEN_LIST_INIT(>aos_inprogress);
 
-LIBXL_TAILQ_INIT(>death_list);
+XEN_TAILQ_INIT(>death_list);
 libxl__ev_xswatch_init(>death_watch);
 
 ctx->childproc_hooks = __childproc_default_hooks;
@@ -122,14 +122,14 @@ int libxl_ctx_alloc(libxl_ctx **pctx, int version,
 static void free_disable_deaths(libxl__gc *gc,
 struct libxl__evgen_domain_death_list *l) {
 libxl_evgen_domain_death *death;
-while ((death = LIBXL_TAILQ_FIRST(l)))
+while ((death = XEN_TAILQ_FIRST(l)))
 

[PATCH v3 0/5] tools: remove include/xen-external directory

2022-02-07 Thread Juergen Gross
The tools/include/xen-external directory contains a header file from
FreeBSD used to generate Xen header files. This series is replacing the
complete directory by a single header with the same semantics.

Changes in V3:
- fix patch 3

Changes in V2:
- remove stale comment in patch 1

Juergen Gross (5):
  tools/include: generate a _xen_list.h file
  tools/libs/light: replace _libxl_list.h with _xen_list.h
  tools/libs/toolcore: replace _xentoolcore_list.h with _xen_list.h
  tools/libs/evtchn: use _xen_list.h
  tools/include: remove xen-external directory

 .gitignore|1 -
 tools/include/Makefile|2 +
 tools/include/_xen_list.h |  509 
 tools/include/libxl.h |4 +-
 tools/include/xen-external/README |   24 -
 tools/include/xen-external/bsd-COPYRIGHT  |  126 --
 tools/include/xen-external/bsd-queue.3| 1044 -
 .../xen-external/bsd-sys-queue-h-seddery  |   74 --
 tools/include/xen-external/bsd-sys-queue.h|  637 --
 tools/include/xentoolcore_internal.h  |4 +-
 tools/libs/evtchn/minios.c|   20 +-
 tools/libs/light/Makefile |   10 +-
 tools/libs/light/libxl.c  |   40 +-
 tools/libs/light/libxl_aoutils.c  |   20 +-
 tools/libs/light/libxl_device.c   |   27 +-
 tools/libs/light/libxl_disk.c |4 +-
 tools/libs/light/libxl_domain.c   |   18 +-
 tools/libs/light/libxl_event.c|  128 +-
 tools/libs/light/libxl_fork.c |   44 +-
 tools/libs/light/libxl_internal.h |   86 +-
 tools/libs/light/libxl_qmp.c  |   19 +-
 tools/libs/light/libxl_stream_read.c  |   20 +-
 tools/libs/toolcore/Makefile  |8 -
 tools/libs/toolcore/handlereg.c   |8 +-
 24 files changed, 733 insertions(+), 2144 deletions(-)
 create mode 100644 tools/include/_xen_list.h
 delete mode 100644 tools/include/xen-external/README
 delete mode 100644 tools/include/xen-external/bsd-COPYRIGHT
 delete mode 100644 tools/include/xen-external/bsd-queue.3
 delete mode 100755 tools/include/xen-external/bsd-sys-queue-h-seddery
 delete mode 100644 tools/include/xen-external/bsd-sys-queue.h

-- 
2.34.1




Re: [PATCH v2 0/5] tools: remove include/xen-external directory

2022-02-07 Thread Juergen Gross

On 07.02.22 19:09, Anthony PERARD wrote:

On Mon, Feb 07, 2022 at 07:41:42AM +0100, Juergen Gross wrote:

The tools/include/xen-external directory contains a header file from
FreeBSD used to generate Xen header files. This series is replacing the
complete directory by a single header with the same semantics.

Changes in V2:
- remove stale comment in patch 1


Just need to fix the build now, otherwise, the series looks fine:
https://gitlab.com/xen-project/patchew/xen/-/pipelines/464803007
^ xentoolcore conversion seems unfinished.


Hmm, weird.

Seems I did only an incremental build, but I think it should have failed
nevertheless. I suspect something is wrong with the dependencies.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH V5] xen/gnttab: Store frame GFN in struct page_info on Arm

2022-02-07 Thread Oleksandr Tyshchenko

On 07.02.22 19:15, Julien Grall wrote:
> Hi Oleksandr,


Hi Julien


>
>
> On 05/01/2022 23:11, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko 
>>
>> Rework Arm implementation to store grant table frame GFN
>> in struct page_info directly instead of keeping it in
>> standalone status/shared arrays. This patch is based on
>> the assumption that grant table page is the xenheap page.
>
> I would write "grant table pages are xenheap pages" or "a grant table 
> page is a xenheap page".


ok, will do


>
> [...]
>
>> diff --git a/xen/arch/arm/include/asm/grant_table.h 
>> b/xen/arch/arm/include/asm/grant_table.h
>> index d31a4d6..d6fda31 100644
>> --- a/xen/arch/arm/include/asm/grant_table.h
>> +++ b/xen/arch/arm/include/asm/grant_table.h
>> @@ -11,11 +11,6 @@
>>   #define INITIAL_NR_GRANT_FRAMES 1U
>>   #define GNTTAB_MAX_VERSION 1
>>   -struct grant_table_arch {
>> -    gfn_t *shared_gfn;
>> -    gfn_t *status_gfn;
>> -};
>> -
>>   static inline void gnttab_clear_flags(struct domain *d,
>>     unsigned int mask, uint16_t 
>> *addr)
>>   {
>> @@ -46,41 +41,12 @@ int replace_grant_host_mapping(unsigned long 
>> gpaddr, mfn_t mfn,
>>   #define gnttab_dom0_frames() \
>>   min_t(unsigned int, opt_max_grant_frames, PFN_DOWN(_etext - 
>> _stext))
>>   -#define gnttab_init_arch(gt) \
>> -({ \
>> -    unsigned int ngf_ = 
>> (gt)->max_grant_frames;  \
>> -    unsigned int nsf_ = 
>> grant_to_status_frames(ngf_);    \
>> - \
>> -    (gt)->arch.shared_gfn = xmalloc_array(gfn_t, 
>> ngf_);  \
>> -    (gt)->arch.status_gfn = xmalloc_array(gfn_t, 
>> nsf_);  \
>> -    if ( (gt)->arch.shared_gfn && (gt)->arch.status_gfn 
>> )    \
>> - { \
>> -    while ( ngf_-- 
>> ) \
>> -    (gt)->arch.shared_gfn[ngf_] = 
>> INVALID_GFN;   \
>> -    while ( nsf_-- 
>> ) \
>> -    (gt)->arch.status_gfn[nsf_] = 
>> INVALID_GFN;   \
>> - } \
>> - else \
>> - gnttab_destroy_arch(gt); \
>> -    (gt)->arch.shared_gfn ? 0 : 
>> -ENOMEM; \
>> -})
>> -
>> -#define gnttab_destroy_arch(gt) \
>> -    do { \
>> - XFREE((gt)->arch.shared_gfn); \
>> - XFREE((gt)->arch.status_gfn); \
>> -    } while ( 0 )
>> -
>>   #define gnttab_set_frame_gfn(gt, st, idx, gfn, 
>> mfn)  \
>> ({ \
>> -    int rc_ = 
>> 0; \
>>   gfn_t ogfn = gnttab_get_frame_gfn(gt, st, 
>> idx);  \
>> -    if ( gfn_eq(ogfn, INVALID_GFN) || gfn_eq(ogfn, gfn) 
>> ||   \
>> - (rc_ = guest_physmap_remove_page((gt)->domain, ogfn, 
>> mfn,   \
>> -  0)) == 0 
>> ) \
>> -    ((st) ? 
>> (gt)->arch.status_gfn    \
>> -  : (gt)->arch.shared_gfn)[idx] = 
>> (gfn); \
>> - rc_; \
>> +    (!gfn_eq(ogfn, INVALID_GFN) && !gfn_eq(ogfn, 
>> gfn))   \
>> + ? guest_physmap_remove_page((gt)->domain, ogfn, mfn, 
>> 0) \
>> + : 
>> 0;    \
>
> Given that we are implementing something similar to an M2P, I was 
> expecting the implementation to be pretty much the same as the x86 
> helper.
>
> Would you be able to outline why it is different?

Being honest, I didn't think about it so far.  But, I agree with the 
question.

It feels to me that Arm variant can now behave as x86 one (as 
xenmem_add_to_physmap_one() now checks for the prior mapping), I mean to 
use INVALID_GFN as an indication to remove a page.

What do you think?


>
>
>>   })
>>     #define gnttab_get_frame_gfn(gt, st, idx) 
>> ({ \
>> @@ -88,11 +54,21 @@ int replace_grant_host_mapping(unsigned long 
>> gpaddr, mfn_t mfn,
>>   : gnttab_shared_gfn(NULL, gt, 
>> idx);  \
>>   })
>>   -#define gnttab_shared_gfn(d, t, 
>> i)   \
>> -    (((i) >= nr_grant_frames(t)) ? INVALID_GFN : 
>> (t)->arch.shared_gfn[i])
>> +#define gnttab_shared_page(t, i) 
>> ({  \
>> + virt_to_page((t)->shared_raw[i]); \
>> +})
>
> This can be simplified to:
>
> #define gnttab_shared_page(t, i) virt_to_page((t)->shared_raw[i])


agree, will do


>
>> +
>> +#define gnttab_status_page(t, i) 
>> ({  \
>> + virt_to_page((t)->status[i]); \
>> +})
>
> Same here.

ok


>
>>   -#define gnttab_status_gfn(d, t, 
>> i)   \
>> -    (((i) >= nr_status_frames(t)) ? INVALID_GFN : 
>> (t)->arch.status_gfn[i])
>> +#define gnttab_shared_gfn(d, t, i) 
>> ({    \
>> +    

Re: [PATCH V5] xen/gnttab: Store frame GFN in struct page_info on Arm

2022-02-07 Thread Oleksandr Tyshchenko

On 07.02.22 19:59, Julien Grall wrote:

Hi Julien


>
>
> On 07/02/2022 17:58, Oleksandr Tyshchenko wrote:
>>
>> On 07.02.22 19:41, Julien Grall wrote:
>>> On 06/01/2022 16:30, Oleksandr wrote:
>>>
>>> So I agree with Jan that the name should be adjusted if it stays where
>>> it is.
>>>
>>> That said, I would actually prefer the adjustment in
>>> alloc_heap_pages(). It is one less assignment per page and I don't
>>> expect any issue with setting the bits to INVALID_GFN everywhere in
>>> the future on Arm.
>>
>>
>> Sorry I lost the context. To clarify, are you speaking about what I
>> proposed at [1]?
>
> That's correct.


Thank you for the clarification.


>
> Cheers,
>
-- 
Regards,

Oleksandr Tyshchenko


Re: [PATCH V5] xen/gnttab: Store frame GFN in struct page_info on Arm

2022-02-07 Thread Oleksandr Tyshchenko

On 07.02.22 19:41, Julien Grall wrote:
> Hi,


Hi Julien


>
> Sorry for the late reply.

np


>
> On 06/01/2022 16:30, Oleksandr wrote:
>
> So I agree with Jan that the name should be adjusted if it stays where 
> it is.
>
> That said, I would actually prefer the adjustment in 
> alloc_heap_pages(). It is one less assignment per page and I don't 
> expect any issue with setting the bits to INVALID_GFN everywhere in 
> the future on Arm.


Sorry I lost the context. To clarify, are you speaking about what I 
proposed at [1]?


If yes, then ...


>
> Note that you would also need to update acquire_staticmem_pages().


  ... yes, will do.


[1] 
https://lore.kernel.org/xen-devel/b4832284-9bfc-d600-14b1-1784f53e5...@gmail.com/


>
>
> Cheers,
>
-- 
Regards,

Oleksandr Tyshchenko


[linux-linus test] 168050: tolerable FAIL - PUSHED

2022-02-07 Thread osstest service owner
flight 168050 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168050/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 168041
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168041
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168041
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 168041
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168041
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168041
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168041
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168041
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 linux2ade8eef993c37a2a43e51a9b1f6c25509a2acce
baseline version:
 linuxdfd42facf1e4ada021b939b4e19c935dcdd55566

Last test of basis   168041  2022-02-07 05:04:10 Z0 days
Testing same since   168050  2022-02-07 20:39:51 Z0 days1 attempts


People who touched revisions under test:
  Abderraouf Adjal 
  Adrian Hunter 
  Andrey Skvortsov 
  Christian Brauner 
  Damien Le Moal 
  Eric Biggers 
  Greg Kroah-Hartman 
  Jiasheng Jiang 
  Linus Torvalds 
  Mimi Zohar 
  Roberto Sassu 
  Stefan Berger 
  Ulf Hansson 
  Xiaoke Wang 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  

Re: [PATCH v3 1/2] dt-bindings: arm: xen: document Xen iommu device

2022-02-07 Thread Stefano Stabellini
On Fri, 4 Feb 2022, Rob Herring wrote:
> On Wed, Jan 26, 2022 at 10:56:39AM -0800, Stefano Stabellini wrote:
> > On Wed, 26 Jan 2022, Robin Murphy wrote:
> > > On 2022-01-26 15:09, Sergiy Kibrik wrote:
> > > > Hi Robin,
> > > > 
> > > > > 
> > > > > This could break Linux guests, since depending on the deferred probe
> > > > > timeout setting it could lead to drivers never probing because the 
> > > > > "IOMMU"
> > > > > never becomes available.
> > > > > 
> > > > 
> > > > I've noticed no deferred probe timeouts when booting with this patch. 
> > > > Could
> > > > you please explain more on how this would break guests?
> > > 
> > > Right now I think it would actually require command-line intervention, 
> > > e.g.
> > > "fw_devlink=on" or "deferred_probe_timeout=3600" (with modules enabled 
> > > for the
> > > latter to take full effect), but I'm wary of the potential for future 
> > > config
> > > options to control those behaviours by default.
> 
> fw_devlink=on is now the default (for at least a couple of cycles).
> 
> > 
> > If deferred_probe_timeout=3600 was specified, we would just need an
> > IOMMU driver in Linux for the "xen,iommu-el2-v1" node to solve the
> > problem, right? I guess I am trying to say that it wouldn't be a device
> > tree interface problem but rather a Linux implementation discussion.
> 
> You would have to add that IOMMU driver to old, existing kernels if you 
> want compatibility with a new DT. Otherwise, that kernel would stop 
> booting with a new DT.

The tiny "xen,iommu-el2-v1" driver could be backported to the stable
trees, I would imagine. Otherwise, do you have another suggestion?

It looks like fw_devlink=on applies to supplier/consumer interfaces.
If that is the problem, then maybe we should avoid supplier/consumer
interfaces altogether. Instead, we could add a new Xen specific
property, e.g.:

device@ff {
compatible = "this,device";

xen,iommu = "on";
};



Re: Metadata and signalling channels for Zephyr virtio-backends on Xen

2022-02-07 Thread Stefano Stabellini
On Mon, 7 Feb 2022, Alex Bennée wrote:
> Hi Stefano,
> 
> Vincent gave an update on his virtio-scmi work at the last Stratos sync
> call and the discussion moved onto next steps.

Hi Alex,

I don't know the specifics of virtio-scmi, but if it is about power,
clocks, reset, etc. like the original SCMI protocol, then virtio-scmi is
likely going to be very different from all the other virtio frontends
and backends. That's because SCMI requires a full view of the system,
which is different from something like virtio-net that is limited to the
emulation of 1 device. For this reason, it is likely that the
virtio-scmi backend would be a better fit in Xen itself, rather than run
in userspace inside a VM.

FYI, a good and promising approach to handle both SCMI and SCPI is the
series recently submitted by EPAM to mediate SCMI and SCPI requests in
Xen: https://marc.info/?l=xen-devel=163947444032590

(Another "special" virtio backend is virtio-iommu for similar reasons:
the guest p2m address mappings and also the IOMMU drivers are in Xen.
It is not immediately clear whether a virtio-iommu backend would need to
be in Xen or run as a process in dom0/domU.)

On the other hand, for all the other "normal" protocols (e.g.
virtio-net, virtio-block, etc.) the backend would naturally run as a
process in dom0 or domU (e.g. QEMU in Dom0) as one would expect.


> Currently the demo setup
> is intermediated by a double-ended vhost-user daemon running on the
> devbox acting as a go between a number of QEMU instances representing
> the front and back-ends. You can view the architecture with Vincents
> diagram here:
> 
>   
> https://docs.google.com/drawings/d/1YSuJUSjEdTi2oEUq4oG4A9pBKSEJTAp6hhcHKKhmYHs/edit?usp=sharing
> 
> The key virtq handling is done over the special carve outs of shared
> memory between the front end and guest. However the signalling is
> currently over a virtio device on the backend. This is useful for the
> PoC but obviously in a real system we don't have a hidden POSIX system
> acting as a go between not to mention the additional latency it causes
> with all those context switches.
> 
> I was hoping we could get some more of the Xen experts to the next
> Stratos sync (17th Feb) to go over approaches for a properly hosted on
> Xen approach. From my recollection (Vincent please correct me if I'm
> wrong) of last week the issues that need solving are:

Unfortunately I have a regular conflict which prevents me from being
able to join the Stratos calls. However, I can certainly make myself
available for one call (unless something unexpected comes up).


>  * How to handle configuration steps as FE guests come up
> 
> The SCMI server will be a long running persistent backend because it is
> managing real HW resources. However the guests may be ephemeral (or just
> restarted) so we can't just hard-code everything in a DTB. While the
> virtio-negotiation in the config space covers most things we still need
> information like where in the guests address space the shared memory
> lives and at what offset into that the queues are created. As far as I'm
> aware the canonical source of domain information is XenStore
> (https://wiki.xenproject.org/wiki/XenStore) but this relies on a Dom0
> type approach. Is there an alternative for dom0less systems or do we
> need a dom0-light approach, for example using STR-21 (Ensure Zephyr can
> run cleanly as a Dom0 guest) providing just enough services for FE's to
> register metadata and BE's to read it?

I'll try to answer the question for a generic virtio frontend and
backend instead (not SCMI because SCMI is unique due to the reasons
above.)

Yes, xenstore is the easiest way to exchange configuration information
between domains. I think EPAM used xenstore to exchange the
configuration information in their virtio-block demo. There is a way to
use xenstore even between dom0less VMs:
https://marc.info/?l=xen-devel=164340547602391 Not just xenstore but
full PV drivers too. However, in the dom0less case xenstore is going to
become available some time after boot, not immediately at startup time.
That's because you need to wait until xenstored is up and running.

There are other ways to send data from one VM to another which are
available immediately at boot, such as Argo and static shared memory.

But dom0less is all about static partitioning, so it makes sense to
exploit the build-time tools to the fullest. In the dom0less case, we
already know what is going to run on the target before it is even turned
on. As an example, we might have already prepared an environment with 3
VMs using Yocto and ImageBuilder. We could also generate all
configurations needed and place them inside each VMs using Yocto's
standard tools and ImageBuilder. So for dom0less, I recommend to go via
a different route and pre-generate the configuration directly where
needed instead of doing dynamic discovery.


>  * How to handle mapping of memory
> 
> AIUI the Xen model is the FE guest explicitly makes grant 

[qemu-mainline test] 168047: tolerable FAIL - PUSHED

2022-02-07 Thread osstest service owner
flight 168047 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168047/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168034
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168034
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168034
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168034
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168034
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168034
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168034
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168034
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 qemuu55ef0b702bc2c90c3c4ed97f97676d8f139e5ca1
baseline version:
 qemuu0d564a3e32ba8494014c67cdd2ebf0fb71860dff

Last test of basis   168034  2022-02-06 18:38:21 Z1 days
Testing same since   168047  2022-02-07 15:37:04 Z0 days1 attempts


People who touched revisions under test:
  Cameron Esfahani 
  Laurent Vivier 
  Patrick 

[xen-unstable-smoke test] 168049: tolerable all pass - PUSHED

2022-02-07 Thread osstest service owner
flight 168049 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168049/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  10d33220f2363a21a52a394159118ab4ddaed50e
baseline version:
 xen  820cc393434097f3b7976acdccbf1d96071d6d23

Last test of basis   168011  2022-02-04 16:01:44 Z3 days
Testing same since   168049  2022-02-07 18:01:43 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Roger Pau Monne 
  Roger Pau Monné 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   820cc39343..10d33220f2  10d33220f2363a21a52a394159118ab4ddaed50e -> smoke



[ovmf test] 168048: all pass - PUSHED

2022-02-07 Thread osstest service owner
flight 168048 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168048/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf b360b0b589697da267f5d3a553e65635b91ebae2
baseline version:
 ovmf 1f54eaa725f45e0c66c28f8d47fa8fb33f7be52c

Last test of basis   168046  2022-02-07 14:11:34 Z0 days
Testing same since   168048  2022-02-07 17:10:21 Z0 days1 attempts


People who touched revisions under test:
  Xiaoyu Lu 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   1f54eaa725..b360b0b589  b360b0b589697da267f5d3a553e65635b91ebae2 -> 
xen-tested-master



Re: [PATCH RFC 3/3] xen: Introduce a platform sub-op to retrieve the VGA information

2022-02-07 Thread Julien Grall

Hi Roger,

On 07/02/2022 11:58, Roger Pau Monné wrote:

On Mon, Feb 07, 2022 at 09:57:55AM +0100, Jan Beulich wrote:

On 06.02.2022 20:28, Julien Grall wrote:

From: Julien Grall 

When using EFI, the VGA information is fetched using the EFI
boot services. However, Xen will have exited the boot services.
Therefore, we need to find a different way to pass the information
to dom0.

For PV dom0, they are part of the start_info. But this is not
something that exists on Arm. So the best way would to be to
use a hypercall.

For now the structure layout is based on dom0_vga_console_info
for convenience. I am open on another proposal.

Signed-off-by: Julien Grall 


Cc-ing Roger as this may want using for PVH Dom0 also on x86; my
first attempt to propagate this information was rejected.


I think it's easier to use a Xen specific layout in XENPF, as that's
already a Xen specific interface.

I wonder however if passing the information here (instead of doing it
in the start info or equivalent) could cause a delay in the
initialization of the video console.


My current plan for Arm is to issue the hypercall as part of an 
earlyinit call. But we can do much earlier (i.e. xen_early_init() which 
is called from setup_arch()) if necessary.


This should be early enough for Arm. How about x86?


I guess the same happens when
using the Xen consoles (either the hypercall one or the shared ring),
so it's fine.


--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -244,6 +244,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_efi_runtime_call_t);
  #define  XEN_FW_EFI_PCI_ROM5
  #define  XEN_FW_EFI_APPLE_PROPERTIES 6
  #define XEN_FW_KBD_SHIFT_FLAGS5
+#define XEN_FW_VGA_INFO   6


Perhaps s/VGA/VIDEO/, despite ...


  struct xenpf_firmware_info {
  /* IN variables. */
  uint32_t type;
@@ -311,6 +312,7 @@ struct xenpf_firmware_info {
  
  /* Int16, Fn02: Get keyboard shift flags. */

  uint8_t kbd_shift_flags; /* XEN_FW_KBD_SHIFT_FLAGS */
+struct dom0_vga_console_info vga;


... the structure name including "vga" (but if the #define is adjusted,
the field name would want to become "video" as well).




[...]

(Re-ordered the quote as it makes more sense for my reply)


There's no need to propagate XEN_VGATYPE_TEXT_MODE_3 into this
interface.


So for Arm, we are only caring about XEN_VGATYPE_EFI_LFB. I wasn't sure 
what would be your need on x86. Hence, why I keep it.


If you don't need then, then I am happy to trim the structure for the 
new hypercall.


> It's my understanding that this will forcefully be
> XEN_VGATYPE_EFI_LFB, at which point we could consider giving a type
> name to the vesa_lfb field of dom0_vga_console_info (video_lfb) and
> use the same struct here?>

Just to clarify, are you suggesting to only pass video_lfb? IOW, we will 
always assume it is an EFI framebuffer and not pass the video_type.


Cheers,

--
Julien Grall



Re: [PATCH RFC 2/3] xen/arm: efi: Introduce and fill the vga_console_info

2022-02-07 Thread Julien Grall

Hi,

On 07/02/2022 08:53, Jan Beulich wrote:

On 06.02.2022 20:28, Julien Grall wrote:

From: Julien Grall 

In a follow-up patch will we want to add support for EFI framebuffer
in dom0. Yet, Xen may not use the framebuffer, so it would be ideal
to not have to enable CONFIG_VIDEO/CONFIG_VGA.

Introduce vga_console_info in a hacky way and move the code
to fill it up from x86 to common.

Signed-off-by: Julien Grall 



This is a bit of a hack. Sent early to gather opinion on whether
we should enable allow Dom0 to use the EFI Framebuffer even
if Xen is built with CONFIG_VIDEO=n on Arm.


I have no input here; this will need to be settled among you Arm folks.
I have no objection to the code movement, just one nit:


@@ -1025,6 +1026,77 @@ static void __init efi_get_apple_properties(void)
  }
  }
  
+static void __init efi_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,

+  UINTN info_size,
+  EFI_GRAPHICS_OUTPUT_MODE_INFORMATION 
*mode_info)
+{
+#if defined(CONFIG_VIDEO) || defined(CONFIG_ARM)
+int bpp = 0;
+
+switch ( mode_info->PixelFormat )
+{
+case PixelRedGreenBlueReserved8BitPerColor:
+vga_console_info.u.vesa_lfb.red_pos = 0;
+vga_console_info.u.vesa_lfb.red_size = 8;
+vga_console_info.u.vesa_lfb.green_pos = 8;
+vga_console_info.u.vesa_lfb.green_size = 8;
+vga_console_info.u.vesa_lfb.blue_pos = 16;
+vga_console_info.u.vesa_lfb.blue_size = 8;
+vga_console_info.u.vesa_lfb.rsvd_pos = 24;
+vga_console_info.u.vesa_lfb.rsvd_size = 8;
+bpp = 32;
+break;
+case PixelBlueGreenRedReserved8BitPerColor:
+vga_console_info.u.vesa_lfb.red_pos = 16;
+vga_console_info.u.vesa_lfb.red_size = 8;
+vga_console_info.u.vesa_lfb.green_pos = 8;
+vga_console_info.u.vesa_lfb.green_size = 8;
+vga_console_info.u.vesa_lfb.blue_pos = 0;
+vga_console_info.u.vesa_lfb.blue_size = 8;
+vga_console_info.u.vesa_lfb.rsvd_pos = 24;
+vga_console_info.u.vesa_lfb.rsvd_size = 8;
+bpp = 32;
+break;
+case PixelBitMask:
+bpp = set_color(mode_info->PixelInformation.RedMask, bpp,
+_console_info.u.vesa_lfb.red_pos,
+_console_info.u.vesa_lfb.red_size);
+bpp = set_color(mode_info->PixelInformation.GreenMask, bpp,
+_console_info.u.vesa_lfb.green_pos,
+_console_info.u.vesa_lfb.green_size);
+bpp = set_color(mode_info->PixelInformation.BlueMask, bpp,
+_console_info.u.vesa_lfb.blue_pos,
+_console_info.u.vesa_lfb.blue_size);
+if ( mode_info->PixelInformation.ReservedMask )
+bpp = set_color(mode_info->PixelInformation.ReservedMask, bpp,
+_console_info.u.vesa_lfb.rsvd_pos,
+_console_info.u.vesa_lfb.rsvd_size);
+if ( bpp > 0 )
+break;
+/* fall through */
+default:
+PrintErr(L"Current graphics mode is unsupported!\r\n");
+bpp  = 0;
+break;
+}
+if ( bpp > 0 )
+{
+vga_console_info.video_type = XEN_VGATYPE_EFI_LFB;
+vga_console_info.u.vesa_lfb.gbl_caps = 2; /* possibly non-VGA */
+vga_console_info.u.vesa_lfb.width =
+mode_info->HorizontalResolution;
+vga_console_info.u.vesa_lfb.height = mode_info->VerticalResolution;
+vga_console_info.u.vesa_lfb.bits_per_pixel = bpp;
+vga_console_info.u.vesa_lfb.bytes_per_line =
+(mode_info->PixelsPerScanLine * bpp + 7) >> 3;
+vga_console_info.u.vesa_lfb.lfb_base = gop->Mode->FrameBufferBase;
+vga_console_info.u.vesa_lfb.ext_lfb_base = gop->Mode->FrameBufferBase 
>> 32;
+vga_console_info.u.vesa_lfb.lfb_size =
+(gop->Mode->FrameBufferSize + 0x) >> 16;
+}
+#endif
+}


While you move this code, could you please insert blank lines between
non-fall-through case blocks, and perhaps another one between the switch()
and the if() blocks? And it looks like
- the "gop" parameter could also do with becoming pointer-to-const,


I can do that.


- the expanded #ifdef could do with a comment briefly explaining why Arm
   needs-special casing.
Agree. I will wait input with the others regarding the #ifdef approach 
before respinning this patch.


Cheers,

--
Julien Grall



Re: [PATCH RFC 1/3] xen/efi: Always query the console information and get GOP

2022-02-07 Thread Julien Grall

Hi Jan,

On 07/02/2022 08:46, Jan Beulich wrote:

On 06.02.2022 20:28, Julien Grall wrote:

From: Julien Grall 

Currently, the EFI stub will only query the console information and
get the GOP when using the configuration file.

However, GRUB is never providing the a configuration file. So the
EFI framebuffer will not be usable at least on Arm (support will
be added in a follow-up patch).

Move out the code outside of the configuration file section.

Take the opportunity to remove the variable 'size' which was
set but never used (interestingly GCC is only complaining if it is
initialization when declaring the variable).

With this change, GCC 8.3 will complain of argc potentially been
used unitiatlized. I suspect this is because the argc will
be iniitalized and used in a different if code-blocks. Yet they
are using the same check.


I'm inclined to suggest this wants to be a separate change, with its
own justification. You're not touching any use of argc here, after
all.


Ok. I will split it.




Signed-off-by: Julien Grall 



It is not entirely clear to me why the GOP was only fetched when
the configuration file is used.

I have tested this on RPI4 and it seems to work. Any chance this
was done to workaround an x86 platform?


This was done so in the context of making the code work for Arm. See
commit c38cf865ec82 ("EFI: ignore EFI commandline, skip console setup
when booted from GRUB"), the description of which explicitly says

"Don't do EFI console or video configuration when booted by GRUB.  The EFI boot
  code does some console and video initialization to support native EFI boot 
from
  the EFI boot manager or EFI shell.  This initlization should not be done when
  booted using GRUB."


I read that and still couldn't figure out why this was done like that.



What you say now is effectively the opposite (and unlike back then
x86 is now able to use this code path as well, so needs considering
too). Cc-ing Daniel for possibly having a GrUB-side opinion.


I am quite interested to know the answer. Linux is able to use the EFI 
framebuffer when booting via GRUB. So I am a bit puzzled why we are 
preventing this setup on dom0/Xen.


Cheers,

--
Julien Grall



[PATCH v2 2/2] x86/xen: Allow per-domain usage of hardware virtualized APIC

2022-02-07 Thread Jane Malalane
Introduce a new per-domain creation x86 specific flag to
select whether hardware assisted virtualization should be used for
x{2}APIC.

A per-domain option is added to xl in order to select the usage of
x{2}APIC hardware assisted vitualization, as well as a global
configuration option.

Having all APIC interaction exit to Xen for emulation is slow and can
induce much overhead. Hardware can speed up x{2}APIC by running APIC
read/write accesses without taking a VM exit.

Being able to disable x{2}APIC hardware assisted vitualization can be
useful for testing and debugging purposes.

Signed-off-by: Jane Malalane 
Suggested-by: Andrew Cooper 
---
CC: Wei Liu 
CC: Anthony PERARD 
CC: Juergen Gross 
CC: Andrew Cooper 
CC: George Dunlap 
CC: Jan Beulich 
CC: Julien Grall 
CC: Stefano Stabellini 
CC: Christian Lindig 
CC: David Scott 
CC: Volodymyr Babchuk 
CC: "Roger Pau Monné" 

v2:
 * Add a LIBXL_HAVE_ASSISTED_APIC macro
 * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo
 * Add a return statement in now "int"
   libxl__arch_domain_build_info_setdefault
 * Preserve libxl__arch_domain_build_info_setdefault 's location in
   libxl_create.c
 * Correct x{2}apic default setting logic in
   libxl__arch_domain_prepare_config
 * Correct logic for parsing assisted_x{2}apic host/guest options in
   xl_parse.c and initialize them to -1 in xl.c
 * Use guest options directly in vmx_vlapic_msr_changed
 * Fix indentation of bool assisted_x{2}apic in struct hvm_domain
 * Add a change in xenctrl_stubs.c to pass xenctrl ABI checks
---
 docs/man/xl.cfg.5.pod.in  | 10 ++
 docs/man/xl.conf.5.pod.in | 12 
 tools/golang/xenlight/helpers.gen.go  | 12 
 tools/include/libxl.h |  7 +++
 tools/libs/light/libxl_arch.h |  5 +++--
 tools/libs/light/libxl_arm.c  |  7 +--
 tools/libs/light/libxl_create.c   | 23 ++-
 tools/libs/light/libxl_types.idl  |  2 ++
 tools/libs/light/libxl_x86.c  | 31 +--
 tools/ocaml/libs/xc/xenctrl.ml|  2 ++
 tools/ocaml/libs/xc/xenctrl.mli   |  2 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c   |  2 +-
 tools/xl/xl.c |  8 
 tools/xl/xl.h |  2 ++
 tools/xl/xl_parse.c   | 16 
 xen/arch/x86/domain.c | 28 +++-
 xen/arch/x86/hvm/vmx/vmcs.c   |  4 
 xen/arch/x86/hvm/vmx/vmx.c| 14 +-
 xen/arch/x86/include/asm/hvm/domain.h |  6 ++
 xen/arch/x86/traps.c  |  8 
 xen/include/public/arch-x86/xen.h |  2 ++
 21 files changed, 173 insertions(+), 30 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index b98d161398..1d98bbd182 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -1862,6 +1862,16 @@ firmware tables when using certain older guest Operating
 Systems. These tables have been superseded by newer constructs within
 the ACPI tables.
 
+=item B
+B<(x86 only)> Enables or disables hardware assisted virtualization for xapic.
+This allows accessing APIC registers without a VM-exit.
+The default is settable via L.
+
+=item B
+B<(x86 only)> Enables or disables hardware assisted virtualization for x2apic.
+This allows accessing APIC registers without a VM-exit.
+The default is settable via L.
+
 =item B
 
 B<(x86 only)> Hides or exposes the No-eXecute capability. This allows a guest
diff --git a/docs/man/xl.conf.5.pod.in b/docs/man/xl.conf.5.pod.in
index df20c08137..30993827e5 100644
--- a/docs/man/xl.conf.5.pod.in
+++ b/docs/man/xl.conf.5.pod.in
@@ -107,6 +107,18 @@ Sets the default value for the C domain 
config value.
 
 Default: maximum grant version supported by the hypervisor.
 
+=item B
+
+If enabled, domains will use xAPIC hardware assisted virtualization by default.
+
+Default: enabled if supported.
+
+=item B
+
+If enabled, domains will use x2APIC hardware assisted virtualization by 
default.
+
+Default: enabled if supported.
+
 =item B
 
 Configures the default hotplug script used by virtual network devices.
diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index dd4e6c9f14..90e7b9b205 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -636,6 +636,12 @@ x.Passthrough = Passthrough(xc.passthrough)
 if err := x.XendSuspendEvtchnCompat.fromC(_suspend_evtchn_compat);err 
!= nil {
 return fmt.Errorf("converting field XendSuspendEvtchnCompat: %v", err)
 }
+if err := x.ArchX86.AssistedXapic.fromC(_x86.assisted_xapic);err != 
nil {
+return fmt.Errorf("converting field ArchX86.AssistedXapic: %v", err)
+}
+if err := x.ArchX86.AssistedX2Apic.fromC(_x86.assisted_x2apic);err != 
nil {
+return fmt.Errorf("converting field ArchX86.AssistedX2Apic: %v", err)
+}
 
  return nil}
 
@@ -679,6 +685,12 @@ xc.passthrough = 

[PATCH v2 1/2] xen+tools: Report Interrupt Controller Virtualization capabilities on x86

2022-02-07 Thread Jane Malalane
Add XEN_SYSCTL_PHYSCAP_ARCH_ASSISTED_xapic and
XEN_SYSCTL_PHYSCAP_ARCH_ASSISTED_x2apic to report accelerated xapic
and x2apic, on x86 hardware.
No such features are currently implemented on AMD hardware.

For that purpose, also add an arch-specific "capabilities" parameter
to struct xen_sysctl_physinfo.

Signed-off-by: Jane Malalane 
Suggested-by: Andrew Cooper 
---
CC: Wei Liu 
CC: Anthony PERARD 
CC: Juergen Gross 
CC: Andrew Cooper 
CC: George Dunlap 
CC: Jan Beulich 
CC: Julien Grall 
CC: Stefano Stabellini 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Jun Nakajima 
CC: Kevin Tian 
CC: "Roger Pau Monné" 

v2:
 * Use one macro LIBXL_HAVE_PHYSINFO_ASSISTED_APIC instead of two
 * Pass xcpyshinfo as a pointer in libxl__arch_get_physinfo
 * Set assisted_x{2}apic_available to be conditional upon "bsp" and
   annotate it with __ro_after_init
 * Change XEN_SYSCTL_PHYSCAP_ARCH_ASSISTED_X{2}APIC to
   .._X86_ASSISTED_X{2}APIC
 * Keep XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X{2}APIC contained within
   sysctl.h
 * Fix padding introduced in struct xen_sysctl_physinfo and bump
   XEN_SYSCTL_INTERFACE_VERSION
---
 tools/golang/xenlight/helpers.gen.go |  4 
 tools/golang/xenlight/types.gen.go   |  6 ++
 tools/include/libxl.h|  7 +++
 tools/libs/light/libxl.c |  3 +++
 tools/libs/light/libxl_arch.h|  4 
 tools/libs/light/libxl_arm.c |  5 +
 tools/libs/light/libxl_types.idl |  2 ++
 tools/libs/light/libxl_x86.c | 11 +++
 tools/ocaml/libs/xc/xenctrl.ml   |  5 +
 tools/ocaml/libs/xc/xenctrl.mli  |  5 +
 tools/xl/xl_info.c   |  6 --
 xen/arch/x86/hvm/vmx/vmcs.c  |  9 +
 xen/arch/x86/include/asm/domain.h|  3 +++
 xen/arch/x86/sysctl.c|  7 +++
 xen/include/public/sysctl.h  |  8 +++-
 15 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index b746ff1081..dd4e6c9f14 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -3373,6 +3373,8 @@ x.CapVmtrace = bool(xc.cap_vmtrace)
 x.CapVpmu = bool(xc.cap_vpmu)
 x.CapGnttabV1 = bool(xc.cap_gnttab_v1)
 x.CapGnttabV2 = bool(xc.cap_gnttab_v2)
+x.CapAssistedXapic = bool(xc.cap_assisted_xapic)
+x.CapAssistedX2Apic = bool(xc.cap_assisted_x2apic)
 
  return nil}
 
@@ -3407,6 +3409,8 @@ xc.cap_vmtrace = C.bool(x.CapVmtrace)
 xc.cap_vpmu = C.bool(x.CapVpmu)
 xc.cap_gnttab_v1 = C.bool(x.CapGnttabV1)
 xc.cap_gnttab_v2 = C.bool(x.CapGnttabV2)
+xc.cap_assisted_xapic = C.bool(x.CapAssistedXapic)
+xc.cap_assisted_x2apic = C.bool(x.CapAssistedX2Apic)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index b1e84d5258..5f384b767c 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -389,6 +389,10 @@ RunHotplugScripts Defbool
 DriverDomain Defbool
 Passthrough Passthrough
 XendSuspendEvtchnCompat Defbool
+ArchX86 struct {
+AssistedXapic Defbool
+AssistedX2Apic Defbool
+}
 }
 
 type DomainRestoreParams struct {
@@ -1014,6 +1018,8 @@ CapVmtrace bool
 CapVpmu bool
 CapGnttabV1 bool
 CapGnttabV2 bool
+CapAssistedXApic bool
+CapAssistedX2apic bool
 }
 
 type Connectorinfo struct {
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 2bbbd21f0b..924e142628 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -528,6 +528,13 @@
 #define LIBXL_HAVE_MAX_GRANT_VERSION 1
 
 /*
+ * LIBXL_HAVE_PHYSINFO_ASSISTED_APIC indicates that libxl_physinfo has
+ * cap_assisted_x{2}apic fields, which indicates the availability of x{2}APIC
+ * hardware assisted virtualization.
+ */
+#define LIBXL_HAVE_PHYSINFO_ASSISTED_APIC 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libs/light/libxl.c b/tools/libs/light/libxl.c
index 667ae6409b..fabb474221 100644
--- a/tools/libs/light/libxl.c
+++ b/tools/libs/light/libxl.c
@@ -15,6 +15,7 @@
 #include "libxl_osdeps.h"
 
 #include "libxl_internal.h"
+#include "libxl_arch.h"
 
 int libxl_ctx_alloc(libxl_ctx **pctx, int version,
 unsigned flags, xentoollog_logger * lg)
@@ -410,6 +411,8 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo 
*physinfo)
 physinfo->cap_gnttab_v2 =
 !!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_gnttab_v2);
 
+libxl__arch_get_physinfo(physinfo, );
+
 GC_FREE;
 return 0;
 }
diff --git a/tools/libs/light/libxl_arch.h b/tools/libs/light/libxl_arch.h
index 1522ecb97f..207ceac6a1 100644
--- a/tools/libs/light/libxl_arch.h
+++ b/tools/libs/light/libxl_arch.h
@@ -86,6 +86,10 @@ int libxl__arch_extra_memory(libxl__gc *gc,
  uint64_t *out);
 
 _hidden
+void libxl__arch_get_physinfo(libxl_physinfo *physinfo,
+  const xc_physinfo_t *xcphysinfo);
+
+_hidden
 void 

[PATCH v2 0/2] xen: Report and use hardware APIC virtualization capabilities

2022-02-07 Thread Jane Malalane
Jane Malalane (2):
  xen+tools: Report Interrupt Controller Virtualization capabilities on
x86
  x86/xen: Allow per-domain usage of hardware virtualized APIC

 docs/man/xl.cfg.5.pod.in  | 10 +
 docs/man/xl.conf.5.pod.in | 12 ++
 tools/golang/xenlight/helpers.gen.go  | 16 +
 tools/golang/xenlight/types.gen.go|  6 +
 tools/include/libxl.h | 14 
 tools/libs/light/libxl.c  |  3 +++
 tools/libs/light/libxl_arch.h |  9 ++--
 tools/libs/light/libxl_arm.c  | 12 --
 tools/libs/light/libxl_create.c   | 23 +++
 tools/libs/light/libxl_types.idl  |  4 
 tools/libs/light/libxl_x86.c  | 42 +--
 tools/ocaml/libs/xc/xenctrl.ml|  7 ++
 tools/ocaml/libs/xc/xenctrl.mli   |  7 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c   |  2 +-
 tools/xl/xl.c |  8 +++
 tools/xl/xl.h |  2 ++
 tools/xl/xl_info.c|  6 +++--
 tools/xl/xl_parse.c   | 16 +
 xen/arch/x86/domain.c | 28 ++-
 xen/arch/x86/hvm/vmx/vmcs.c   | 13 +++
 xen/arch/x86/hvm/vmx/vmx.c| 14 +---
 xen/arch/x86/include/asm/domain.h |  3 +++
 xen/arch/x86/include/asm/hvm/domain.h |  6 +
 xen/arch/x86/sysctl.c |  7 ++
 xen/arch/x86/traps.c  |  8 +++
 xen/include/public/arch-x86/xen.h |  2 ++
 xen/include/public/sysctl.h   |  8 ++-
 27 files changed, 255 insertions(+), 33 deletions(-)

-- 
2.11.0




Re: [PATCH v2 0/5] tools: remove include/xen-external directory

2022-02-07 Thread Anthony PERARD
On Mon, Feb 07, 2022 at 07:41:42AM +0100, Juergen Gross wrote:
> The tools/include/xen-external directory contains a header file from
> FreeBSD used to generate Xen header files. This series is replacing the
> complete directory by a single header with the same semantics.
> 
> Changes in V2:
> - remove stale comment in patch 1

Just need to fix the build now, otherwise, the series looks fine:
https://gitlab.com/xen-project/patchew/xen/-/pipelines/464803007
^ xentoolcore conversion seems unfinished.

Thanks,

-- 
Anthony PERARD



Re: [PATCH V5] xen/gnttab: Store frame GFN in struct page_info on Arm

2022-02-07 Thread Julien Grall




On 07/02/2022 17:58, Oleksandr Tyshchenko wrote:


On 07.02.22 19:41, Julien Grall wrote:

On 06/01/2022 16:30, Oleksandr wrote:

So I agree with Jan that the name should be adjusted if it stays where
it is.

That said, I would actually prefer the adjustment in
alloc_heap_pages(). It is one less assignment per page and I don't
expect any issue with setting the bits to INVALID_GFN everywhere in
the future on Arm.



Sorry I lost the context. To clarify, are you speaking about what I
proposed at [1]?


That's correct.

Cheers,

--
Julien Grall



Re: [PATCH V5] xen/gnttab: Store frame GFN in struct page_info on Arm

2022-02-07 Thread Julien Grall

Hi,

Sorry for the late reply.

On 06/01/2022 16:30, Oleksandr wrote:

So I agree with Jan that the name should be adjusted if it stays where 
it is.


That said, I would actually prefer the adjustment in alloc_heap_pages(). 
It is one less assignment per page and I don't expect any issue with 
setting the bits to INVALID_GFN everywhere in the future on Arm.


Note that you would also need to update acquire_staticmem_pages().

Cheers,

--
Julien Grall



Re: [PATCH] xen/smp: Speed up on_selected_cpus()

2022-02-07 Thread Andrew Cooper
On 06/02/2022 19:40, Julien Grall wrote:
> Hi,
>
> On 04/02/2022 20:31, Andrew Cooper wrote:
>> cpumask_weight() is a horribly expensive way to find if no bits are
>> set, made
>> worse by the fact that the calculation is performed with the global
>> call_lock
>> held.
>
> I looked at the archive because I was wondering why we were using
> cpumask_weight here. It looks like this was a left-over of the rework
> in ac3fc35d919c "x86: Fix flush_area_mask() and on_selected_cpus() to
> not race updates".

That change shuffled the code, but didn't introduce the problem.

I'm pretty sure it was 433f14699d48 which dropped the !=0 user of nr_cpus.


Talking of, there is more efficiency to be gained by reworking the
second cpumask_empty() call to not restart from 0 on failure, because
that removes useless reads.


>
>>
>> Switch to using cpumask_empty() instead, which will short circuit as
>> soon as
>> it find any set bit in the cpumask.
>>
>> Signed-off-by: Andrew Cooper 
>
> Reviewed-by: Julien Grall 

Thanks.

~Andrew


Re: [PATCH V5] xen/gnttab: Store frame GFN in struct page_info on Arm

2022-02-07 Thread Julien Grall

Hi Oleksandr,

On 05/01/2022 23:11, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko 

Rework Arm implementation to store grant table frame GFN
in struct page_info directly instead of keeping it in
standalone status/shared arrays. This patch is based on
the assumption that grant table page is the xenheap page.


I would write "grant table pages are xenheap pages" or "a grant table 
page is a xenheap page".


[...]


diff --git a/xen/arch/arm/include/asm/grant_table.h 
b/xen/arch/arm/include/asm/grant_table.h
index d31a4d6..d6fda31 100644
--- a/xen/arch/arm/include/asm/grant_table.h
+++ b/xen/arch/arm/include/asm/grant_table.h
@@ -11,11 +11,6 @@
  #define INITIAL_NR_GRANT_FRAMES 1U
  #define GNTTAB_MAX_VERSION 1
  
-struct grant_table_arch {

-gfn_t *shared_gfn;
-gfn_t *status_gfn;
-};
-
  static inline void gnttab_clear_flags(struct domain *d,
unsigned int mask, uint16_t *addr)
  {
@@ -46,41 +41,12 @@ int replace_grant_host_mapping(unsigned long gpaddr, mfn_t 
mfn,
  #define gnttab_dom0_frames() \
  min_t(unsigned int, opt_max_grant_frames, PFN_DOWN(_etext - _stext))
  
-#define gnttab_init_arch(gt) \

-({   \
-unsigned int ngf_ = (gt)->max_grant_frames;  \
-unsigned int nsf_ = grant_to_status_frames(ngf_);\
- \
-(gt)->arch.shared_gfn = xmalloc_array(gfn_t, ngf_);  \
-(gt)->arch.status_gfn = xmalloc_array(gfn_t, nsf_);  \
-if ( (gt)->arch.shared_gfn && (gt)->arch.status_gfn )\
-{\
-while ( ngf_-- ) \
-(gt)->arch.shared_gfn[ngf_] = INVALID_GFN;   \
-while ( nsf_-- ) \
-(gt)->arch.status_gfn[nsf_] = INVALID_GFN;   \
-}\
-else \
-gnttab_destroy_arch(gt); \
-(gt)->arch.shared_gfn ? 0 : -ENOMEM; \
-})
-
-#define gnttab_destroy_arch(gt)  \
-do { \
-XFREE((gt)->arch.shared_gfn);\
-XFREE((gt)->arch.status_gfn);\
-} while ( 0 )
-
  #define gnttab_set_frame_gfn(gt, st, idx, gfn, mfn)  \
  ({   \
-int rc_ = 0; \
  gfn_t ogfn = gnttab_get_frame_gfn(gt, st, idx);  \
-if ( gfn_eq(ogfn, INVALID_GFN) || gfn_eq(ogfn, gfn) ||   \
- (rc_ = guest_physmap_remove_page((gt)->domain, ogfn, mfn,   \
-  0)) == 0 ) \
-((st) ? (gt)->arch.status_gfn\
-  : (gt)->arch.shared_gfn)[idx] = (gfn); \
-rc_; \
+(!gfn_eq(ogfn, INVALID_GFN) && !gfn_eq(ogfn, gfn))   \
+ ? guest_physmap_remove_page((gt)->domain, ogfn, mfn, 0) \
+ : 0;\


Given that we are implementing something similar to an M2P, I was 
expecting the implementation to be pretty much the same as the x86 helper.


Would you be able to outline why it is different?


  })
  
  #define gnttab_get_frame_gfn(gt, st, idx) ({ \

@@ -88,11 +54,21 @@ int replace_grant_host_mapping(unsigned long gpaddr, mfn_t 
mfn,
  : gnttab_shared_gfn(NULL, gt, idx);  \
  })
  
-#define gnttab_shared_gfn(d, t, i)   \

-(((i) >= nr_grant_frames(t)) ? INVALID_GFN : (t)->arch.shared_gfn[i])
+#define gnttab_shared_page(t, i) ({  \
+virt_to_page((t)->shared_raw[i]);\
+})


This can be simplified to:

#define gnttab_shared_page(t, i) virt_to_page((t)->shared_raw[i])


+
+#define gnttab_status_page(t, i) ({  \
+virt_to_page((t)->status[i]);\
+})


Same here.

  
-#define gnttab_status_gfn(d, t, i)   \

-(((i) >= nr_status_frames(t)) ? INVALID_GFN : (t)->arch.status_gfn[i])

Re: [PATCH] xen/smp: Speed up on_selected_cpus()

2022-02-07 Thread Andrew Cooper
On 07/02/2022 08:11, Jan Beulich wrote:
> On 04.02.2022 21:31, Andrew Cooper wrote:
>> cpumask_weight() is a horribly expensive way to find if no bits are set, made
>> worse by the fact that the calculation is performed with the global call_lock
>> held.
>>
>> Switch to using cpumask_empty() instead, which will short circuit as soon as
>> it find any set bit in the cpumask.
>>
>> Signed-off-by: Andrew Cooper 
> May I suggest to drop "horribly"? How expensive one is compared to the other
> depends on the number of CPUs actually enumerated in the system.

In absolute terms perhaps, but they both scale as O(nr_cpus).  Hamming
weight has a far larger constant.

>  (And of
> course I still have that conversion to POPCNT alternatives patching pending,
> where Roger did ask for some re-work in reply to v2, but where it has
> remained unclear whether investing time into that wouldn't be in vein,
> considering some of your replies on v1. Thus would have further shrunk the
> difference, without me meaning to say the change here isn't a good one.)

There is a perfectly clear and simple way forward.  It's the one which
doesn't fight the optimiser and actively regress the code generation in
the calling functions, and add an unreasonable quantity technical debt
into the marginal paths.

I will ack a version where you're not adding complexity for negative gains.

~Andrew


Re: [PATCH v6 06/13] vpci/header: implement guest BAR register handlers

2022-02-07 Thread Jan Beulich
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> +static uint32_t guest_bar_ignore_read(const struct pci_dev *pdev,
> +  unsigned int reg, void *data)
> +{
> +return 0;
> +}
> +
> +static int bar_ignore_access(const struct pci_dev *pdev, unsigned int reg,
> + struct vpci_bar *bar)
> +{
> +if ( is_hardware_domain(pdev->domain) )
> +return 0;
> +
> +return vpci_add_register(pdev->vpci, guest_bar_ignore_read, NULL,
> + reg, 4, bar);
> +}

For these two functions: I'm not sure "ignore" is an appropriate
term here. unused_bar_read() and unused_bar() maybe? Or,
considering we already have VPCI_BAR_EMPTY, s/unused/empty/ ? I'm
also not sure we really need the is_hardware_domain() check here:
Returning 0 for Dom0 is going to be fine as well; there's no need
to fetch the value from actual hardware. The one exception might
be for devices with buggy BAR behavior ...

> @@ -516,6 +594,11 @@ static int init_bars(struct pci_dev *pdev)
>  if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>  {
>  bars[i].type = VPCI_BAR_IO;
> +
> +rc = bar_ignore_access(pdev, reg, [i]);
> +if ( rc )
> +return rc;

Elsewhere the command register is restored on error paths.

Jan




Re: [XEN PATCH v3] xen/arm: introduce dummy iommu node for dom0

2022-02-07 Thread Julien Grall

Hi,

On 11/01/2022 11:26, Sergiy Kibrik wrote:

Currently no IOMMU properties are exposed to dom0, thus kernel by default
assumes no protection and enables swiotlb-xen, which leads to costly and
unnecessary buffers bouncing.

To let kernel know which device is behing IOMMU and hence needs no swiotlb
services we introduce dummy xen-iommu node in FDT and link protected device
nodes to it, using here device tree iommu bindings.

Signed-off-by: Sergiy Kibrik 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Oleksandr Tyshchenko 
Cc: Andrii Anisov 


Changelog:

v3: rebased over staging & remove redundand phandle_iommu attribute, discussion:

https://lists.xenproject.org/archives/html/xen-devel/2021-12/msg01753.html

v2: re-use common iommu dt bindings to let guests know which devices are 
protected:

https://lists.xenproject.org/archives/html/xen-devel/2021-10/msg00073.html

  xen/arch/arm/domain_build.c   | 42 +++
  xen/include/public/device_tree_defs.h |  1 +
  2 files changed, 43 insertions(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 6931c022a2..b82ba72fac 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -845,6 +845,12 @@ static int __init write_properties(struct domain *d, 
struct kernel_info *kinfo,
  }
  }
  
+if ( iommu_node && is_iommu_enabled(d) && dt_device_is_protected(node) )


I think it should be sufficient to check dt_device_is_protected() 
because it is set it means that device behind an IOMMU known by Xen. So 
iommu_node will always be valid.


Furthermore, you can't assign to dom0 a device that was protected with 
enabling the IOMMU for the domain.



+{
+res = fdt_property_cell(kinfo->fdt, "iommus", GUEST_PHANDLE_IOMMU);
+if ( res )
+return res;
+}
  return 0;
  }
  
@@ -1479,6 +1485,38 @@ static int __init make_cpus_node(const struct domain *d, void *fdt)

  return res;
  }
  
+static int __init make_iommu_node(const struct domain *d,

+  const struct kernel_info *kinfo)
+{
+const char compat[] = "xen,iommu-el2-v1";
+int res;
+
+if ( !is_iommu_enabled(d) )
+return 0;
+
+dt_dprintk("Create iommu node\n");
+
+res = fdt_begin_node(kinfo->fdt, "xen-iommu");
+if ( res )
+return res;
+
+res = fdt_property(kinfo->fdt, "compatible", compat, sizeof(compat));
+if ( res )
+return res;
+
+res = fdt_property_cell(kinfo->fdt, "#iommu-cells", 0);
+if ( res )
+return res;
+
+res = fdt_property_cell(kinfo->fdt, "phandle", GUEST_PHANDLE_IOMMU);


Please don't hardocode the phandle for the IOMMU. Instead we should use 
one for an IOMMU that is used by Xen.


This will reduce the risk to use a phandle that could be possibly used 
in the host Device-Tree.


Cheers,

--
Julien Grall



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 18:37, Jan Beulich wrote:
> On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 18:15, Jan Beulich wrote:
>>> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
 On 07.02.22 17:26, Jan Beulich wrote:
> 1b. Make vpci_write use write lock for writes to command register and BARs
> only; keep using the read lock for all other writes.
 I am not quite sure how to do that. Do you mean something like:
 void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
    uint32_t data)
 [snip]
    list_for_each_entry ( r, >vpci->handlers, node )
 {
 [snip]
    if ( r->needs_write_lock)
        write_lock(d->vpci_lock)
    else
        read_lock(d->vpci_lock)
 

 And provide rw as an argument to:

 int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
  vpci_write_t *write_handler, unsigned int offset,
  unsigned int size, void *data, --->>> bool 
 write_path <<<-)

 Is this what you mean?
>>> This sounds overly complicated. You can derive locally in vpci_write(),
>>> from just its "reg" and "size" parameters, whether the lock needs taking
>>> in write mode.
>> Yes, I started writing a reply with that. So, the summary (ROM
>> position depends on header type):
>> if ( (reg == PCI_COMMAND) || (reg == ROM) )
>> {
>>       read PCI_COMMAND and see if memory or IO decoding are enabled.
>>       if ( enabled )
>>           write_lock(d->vpci_lock)
>>       else
>>           read_lock(d->vpci_lock)
>> }
> Hmm, yes, you can actually get away without using "size", since both
> command register and ROM BAR are 32-bit aligned registers, and 64-bit
> accesses get split in vpci_ecam_write().
But, OS may want reading a single byte of ROM BAR, so I think
I'll need to check if reg+size fall into PCI_COMAND and ROM BAR
ranges
>
> For the command register the memory- / IO-decoding-enabled check may
> end up a little more complicated, as the value to be written also
> matters. Maybe read the command register only for the ROM BAR write,
> using the write lock uniformly for all command register writes?
Sounds good for the start.
Another concern is that if we go with a read_lock and then in the
underlying code we disable memory decoding and try doing
something and calling cmd_write handler for any reason then

I mean that the check in the vpci_write is somewhat we can tolerate,
but then it is must be considered that no code in the read path
is allowed to perform write path functions. Which brings a pretty
valid use-case: say in read mode we detect an unrecoverable error
and need to remove the device:
vpci_process_pending -> ERROR -> vpci_remove_device or similar.

What do we do then? It is all going to be fragile...
>
>> Do you also think we can drop pdev->vpci (or currently pdev->vpci->lock)
>> at all then?
> I haven't looked at this in any detail, sorry. It sounds possible,
> yes.
>
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 17:21, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 18:15, Jan Beulich wrote:
>> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 17:26, Jan Beulich wrote:
 1b. Make vpci_write use write lock for writes to command register and BARs
 only; keep using the read lock for all other writes.
>>> I am not quite sure how to do that. Do you mean something like:
>>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>>       uint32_t data)
>>> [snip]
>>>       list_for_each_entry ( r, >vpci->handlers, node )
>>> {
>>> [snip]
>>>       if ( r->needs_write_lock)
>>>           write_lock(d->vpci_lock)
>>>       else
>>>           read_lock(d->vpci_lock)
>>> 
>>>
>>> And provide rw as an argument to:
>>>
>>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>>     vpci_write_t *write_handler, unsigned int offset,
>>>     unsigned int size, void *data, --->>> bool 
>>> write_path <<<-)
>>>
>>> Is this what you mean?
>> This sounds overly complicated. You can derive locally in vpci_write(),
>> from just its "reg" and "size" parameters, whether the lock needs taking
>> in write mode.
> Yes, I started writing a reply with that. So, the summary (ROM
> position depends on header type):
> if ( (reg == PCI_COMMAND) || (reg == ROM) )
> {
>      read PCI_COMMAND and see if memory or IO decoding are enabled.
>      if ( enabled )
>          write_lock(d->vpci_lock)
>      else
>          read_lock(d->vpci_lock)
> }

Hmm, yes, you can actually get away without using "size", since both
command register and ROM BAR are 32-bit aligned registers, and 64-bit
accesses get split in vpci_ecam_write().

For the command register the memory- / IO-decoding-enabled check may
end up a little more complicated, as the value to be written also
matters. Maybe read the command register only for the ROM BAR write,
using the write lock uniformly for all command register writes?

> Do you also think we can drop pdev->vpci (or currently pdev->vpci->lock)
> at all then?

I haven't looked at this in any detail, sorry. It sounds possible,
yes.

Jan




Re: [PATCH v6 05/13] vpci: add hooks for PCI device assign/de-assign

2022-02-07 Thread Jan Beulich
On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> @@ -1507,6 +1511,8 @@ static int assign_device(struct domain *d, u16 seg, u8 
> bus, u8 devfn, u32 flag)
>  pci_to_dev(pdev), flag);
>  }
>  
> +rc = vpci_assign_device(d, pdev);
> +
>   done:
>  if ( rc )
>  printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",

There's no attempt to undo anything in the case of getting back an
error. ISTR this being deemed okay on the basis that the tool stack
would then take whatever action, but whatever it is that is supposed
to deal with errors here wants spelling out in the description.
What's important is that no caller up the call tree may be left with
the impression that the device is still owned by the original
domain. With how you have it, the device is going to be owned by the
new domain, but not really usable.

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -99,6 +99,33 @@ int vpci_add_handlers(struct pci_dev *pdev)
>  
>  return rc;
>  }
> +
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +/* Notify vPCI that device is assigned to guest. */
> +int vpci_assign_device(struct domain *d, struct pci_dev *pdev)
> +{
> +int rc;
> +
> +if ( !has_vpci(d) )
> +return 0;
> +
> +rc = vpci_add_handlers(pdev);
> +if ( rc )
> +vpci_deassign_device(d, pdev);
> +
> +return rc;
> +}
> +
> +/* Notify vPCI that device is de-assigned from guest. */
> +void vpci_deassign_device(struct domain *d, struct pci_dev *pdev)
> +{
> +if ( !has_vpci(d) )
> +return;
> +
> +vpci_remove_device(pdev);
> +}
> +#endif /* CONFIG_HAS_VPCI_GUEST_SUPPORT */

While for the latter function you look to need two parameters, do you
really need them also in the former one?

Symmetry considerations make me wonder though whether the de-assign
hook shouldn't be called earlier, when pdev->domain still has the
original owner. At which point the 2nd parameter could disappear there
as well.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 18:15, Jan Beulich wrote:
> On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
>> On 07.02.22 17:26, Jan Beulich wrote:
>>> 1b. Make vpci_write use write lock for writes to command register and BARs
>>> only; keep using the read lock for all other writes.
>> I am not quite sure how to do that. Do you mean something like:
>> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>       uint32_t data)
>> [snip]
>>       list_for_each_entry ( r, >vpci->handlers, node )
>> {
>> [snip]
>>       if ( r->needs_write_lock)
>>           write_lock(d->vpci_lock)
>>       else
>>           read_lock(d->vpci_lock)
>> 
>>
>> And provide rw as an argument to:
>>
>> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>>     vpci_write_t *write_handler, unsigned int offset,
>>     unsigned int size, void *data, --->>> bool 
>> write_path <<<-)
>>
>> Is this what you mean?
> This sounds overly complicated. You can derive locally in vpci_write(),
> from just its "reg" and "size" parameters, whether the lock needs taking
> in write mode.
Yes, I started writing a reply with that. So, the summary (ROM
position depends on header type):
if ( (reg == PCI_COMMAND) || (reg == ROM) )
{
     read PCI_COMMAND and see if memory or IO decoding are enabled.
     if ( enabled )
         write_lock(d->vpci_lock)
     else
         read_lock(d->vpci_lock)
}

Do you also think we can drop pdev->vpci (or currently pdev->vpci->lock)
at all then?
> Jan
>
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 17:07, Oleksandr Andrushchenko wrote:
> On 07.02.22 17:26, Jan Beulich wrote:
>> 1b. Make vpci_write use write lock for writes to command register and BARs
>> only; keep using the read lock for all other writes.
> I am not quite sure how to do that. Do you mean something like:
> void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>      uint32_t data)
> [snip]
>      list_for_each_entry ( r, >vpci->handlers, node )
> {
> [snip]
>      if ( r->needs_write_lock)
>          write_lock(d->vpci_lock)
>      else
>          read_lock(d->vpci_lock)
> 
> 
> And provide rw as an argument to:
> 
> int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
>    vpci_write_t *write_handler, unsigned int offset,
>    unsigned int size, void *data, --->>> bool write_path 
> <<<-)
> 
> Is this what you mean?

This sounds overly complicated. You can derive locally in vpci_write(),
from just its "reg" and "size" parameters, whether the lock needs taking
in write mode.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 17:08, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 04:26:56PM +0100, Jan Beulich wrote:
>> On 07.02.2022 16:11, Oleksandr Andrushchenko wrote:
>>>
>>>
>>> On 07.02.22 16:35, Oleksandr Andrushchenko wrote:

 On 07.02.22 16:27, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
>> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 14:46, Roger Pau Monné wrote:
 I think the per-domain rwlock seems like a good option. I would do
 that as a pre-patch.
>>> It is. But it seems it won't solve the thing we started this adventure 
>>> for:
>>>
>>> With per-domain read lock and still ABBA in modify_bars (hope the below
>>> is correctly seen with a monospace font):
>>>
>>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->   
>>>    rom_write -> modify_bars: tmp (pdev2) ->lock
>>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> 
>>> modify_bars: tmp (pdev1) ->lock
>>>
>>> There is no API to upgrade read lock to write lock in modify_bars which 
>>> could help,
>>> so in both cases vpci_write should take write lock.
>> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
>> to acquire the write lock, but its (perhaps indirect) caller. Effectively
>> vpci_write() would need to take the write lock if the range written
>> overlaps the BARs or the command register.
> I'm confused. If we use a per-domain rwlock approach there would be no
> need to lock tmp again in modify_bars, because we should hold the
> rwlock in write mode, so there's no ABBA?
 this is only possible with what you wrote below:
> We will have however to drop the per domain read and vpci locks and
> pick the per-domain lock in write mode.
 I think this is going to be unreliable. We need a reliable way to
 upgrade read lock to write lock.
 Then, we can drop pdev->vpci_lock at all, because we are always
 protected with d->rwlock and those who want to free pdev->vpci
 will use write lock.

 So, per-domain rwlock with write upgrade implemented minus pdev->vpci
 should do the trick
>>> Linux doesn't implement write upgrade and it seems for a reason [1]:
>>> "Also, you cannot “upgrade” a read-lock to a write-lock, so if you at _any_ 
>>> time
>>> need to do any changes (even if you don’t do it every time), you have to get
>>> the write-lock at the very beginning."
>>>
>>> So, I am not sure we can have the same for Xen...
>>>
>>> At the moment I see at least two possible ways to solve the issue:
>>> 1. Make vpci_write use write lock, thus make all write accesses synchronized
>>> for the given domain, read are fully parallel
>>
>> 1b. Make vpci_write use write lock for writes to command register and BARs
>> only; keep using the read lock for all other writes.
> 
> We do not support writing to the BARs with memory decoding enabled
> currently for dom0, so we would only need to pick the lock in write
> mode for the command register and ROM BAR write handler AFAICT.

Oh, right - this then makes for even less contention due to needing to
acquire the lock in write mode.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 04:26:56PM +0100, Jan Beulich wrote:
> On 07.02.2022 16:11, Oleksandr Andrushchenko wrote:
> > 
> > 
> > On 07.02.22 16:35, Oleksandr Andrushchenko wrote:
> >>
> >> On 07.02.22 16:27, Roger Pau Monné wrote:
> >>> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
>  On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
> > On 07.02.22 14:46, Roger Pau Monné wrote:
> >> I think the per-domain rwlock seems like a good option. I would do
> >> that as a pre-patch.
> > It is. But it seems it won't solve the thing we started this adventure 
> > for:
> >
> > With per-domain read lock and still ABBA in modify_bars (hope the below
> > is correctly seen with a monospace font):
> >
> > cpu0: vpci_write-> d->RLock -> pdev1->lock ->   
> >    rom_write -> modify_bars: tmp (pdev2) ->lock
> > cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> 
> > modify_bars: tmp (pdev1) ->lock
> >
> > There is no API to upgrade read lock to write lock in modify_bars which 
> > could help,
> > so in both cases vpci_write should take write lock.
>  Hmm, yes, I think you're right: It's not modify_bars() itself which needs
>  to acquire the write lock, but its (perhaps indirect) caller. Effectively
>  vpci_write() would need to take the write lock if the range written
>  overlaps the BARs or the command register.
> >>> I'm confused. If we use a per-domain rwlock approach there would be no
> >>> need to lock tmp again in modify_bars, because we should hold the
> >>> rwlock in write mode, so there's no ABBA?
> >> this is only possible with what you wrote below:
> >>> We will have however to drop the per domain read and vpci locks and
> >>> pick the per-domain lock in write mode.
> >> I think this is going to be unreliable. We need a reliable way to
> >> upgrade read lock to write lock.
> >> Then, we can drop pdev->vpci_lock at all, because we are always
> >> protected with d->rwlock and those who want to free pdev->vpci
> >> will use write lock.
> >>
> >> So, per-domain rwlock with write upgrade implemented minus pdev->vpci
> >> should do the trick
> > Linux doesn't implement write upgrade and it seems for a reason [1]:
> > "Also, you cannot “upgrade” a read-lock to a write-lock, so if you at _any_ 
> > time
> > need to do any changes (even if you don’t do it every time), you have to get
> > the write-lock at the very beginning."
> > 
> > So, I am not sure we can have the same for Xen...
> > 
> > At the moment I see at least two possible ways to solve the issue:
> > 1. Make vpci_write use write lock, thus make all write accesses synchronized
> > for the given domain, read are fully parallel
> 
> 1b. Make vpci_write use write lock for writes to command register and BARs
> only; keep using the read lock for all other writes.

We do not support writing to the BARs with memory decoding enabled
currently for dom0, so we would only need to pick the lock in write
mode for the command register and ROM BAR write handler AFAICT.

Thanks, Roger.



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 17:26, Jan Beulich wrote:
> On 07.02.2022 16:11, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 16:35, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 16:27, Roger Pau Monné wrote:
 On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
>> On 07.02.22 14:46, Roger Pau Monné wrote:
>>> I think the per-domain rwlock seems like a good option. I would do
>>> that as a pre-patch.
>> It is. But it seems it won't solve the thing we started this adventure 
>> for:
>>
>> With per-domain read lock and still ABBA in modify_bars (hope the below
>> is correctly seen with a monospace font):
>>
>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->
>>   rom_write -> modify_bars: tmp (pdev2) ->lock
>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> 
>> modify_bars: tmp (pdev1) ->lock
>>
>> There is no API to upgrade read lock to write lock in modify_bars which 
>> could help,
>> so in both cases vpci_write should take write lock.
> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
> to acquire the write lock, but its (perhaps indirect) caller. Effectively
> vpci_write() would need to take the write lock if the range written
> overlaps the BARs or the command register.
 I'm confused. If we use a per-domain rwlock approach there would be no
 need to lock tmp again in modify_bars, because we should hold the
 rwlock in write mode, so there's no ABBA?
>>> this is only possible with what you wrote below:
 We will have however to drop the per domain read and vpci locks and
 pick the per-domain lock in write mode.
>>> I think this is going to be unreliable. We need a reliable way to
>>> upgrade read lock to write lock.
>>> Then, we can drop pdev->vpci_lock at all, because we are always
>>> protected with d->rwlock and those who want to free pdev->vpci
>>> will use write lock.
>>>
>>> So, per-domain rwlock with write upgrade implemented minus pdev->vpci
>>> should do the trick
>> Linux doesn't implement write upgrade and it seems for a reason [1]:
>> "Also, you cannot “upgrade” a read-lock to a write-lock, so if you at _any_ 
>> time
>> need to do any changes (even if you don’t do it every time), you have to get
>> the write-lock at the very beginning."
>>
>> So, I am not sure we can have the same for Xen...
>>
>> At the moment I see at least two possible ways to solve the issue:
>> 1. Make vpci_write use write lock, thus make all write accesses synchronized
>> for the given domain, read are fully parallel
> 1b. Make vpci_write use write lock for writes to command register and BARs
> only; keep using the read lock for all other writes.
I am not quite sure how to do that. Do you mean something like:
void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
     uint32_t data)
[snip]
     list_for_each_entry ( r, >vpci->handlers, node )
{
[snip]
     if ( r->needs_write_lock)
         write_lock(d->vpci_lock)
     else
         read_lock(d->vpci_lock)


And provide rw as an argument to:

int vpci_add_register(struct vpci *vpci, vpci_read_t *read_handler,
   vpci_write_t *write_handler, unsigned int offset,
   unsigned int size, void *data, --->>> bool write_path 
<<<-)

Is this what you mean?

With the above, if we have d->vpci_lock, I think we can drop
pdev->vpci_lock at all

Thank you,
Oleksandr

P.S. I don't think you mean we just drop the read lock and acquire write lock
as it leads to the mentioned before unreliability.


Re: [PATCH] x86/hvm: Fix boot on systems where HVM isn't available

2022-02-07 Thread Andrew Cooper
On 07/02/2022 08:29, Jan Beulich wrote:
> On 05.02.2022 10:47, Roger Pau Monné wrote:
>> On Fri, Feb 04, 2022 at 05:34:05PM +, Andrew Cooper wrote:
>>> c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to
>>> alt-call") went too far with dropping NULL function pointer checks.
> Oh, I'm sorry, I should have noticed this.
>
>>> smp_callin() calls hvm_cpu_up() unconditionally.  When the platform doesn't
>>> support HVM, hvm_enable() exits without filling in hvm_funcs, after which 
>>> the
>>> altcall pass nukes the (now unconditional) indirect call, causing:
>>>
>>>   (XEN) [ Xen-4.17.0-10.18-d  x86_64  debug=y  Not tainted ]
>>>   (XEN) CPU:1
>>>   (XEN) RIP:e008:[] start_secondary+0x393/0x3b7
>>>   (XEN) RFLAGS: 00010086   CONTEXT: hypervisor
>>>   ...
>>>   (XEN) Xen code around  (start_secondary+0x393/0x3b7):
>>>   (XEN)  ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db 
>>> fe ff ff
>>>   ...
>>>   (XEN) Xen call trace:
>>>   (XEN)[] R start_secondary+0x393/0x3b7
>>>   (XEN)[] F __high_start+0x42/0x60
>>>
>>> To make matters worse, __stop_this_cpu() calls hvm_cpu_down() 
>>> unconditionally
>>> too, so what happen next is:
>>>
>>>   (XEN) [ Xen-4.17.0-10.18-d  x86_64  debug=y  Not tainted ]
>>>   (XEN) CPU:0
>>>   (XEN) RIP:e008:[] __stop_this_cpu+0x12/0x3c
>>>   (XEN) RFLAGS: 00010046   CONTEXT: hypervisor
>>>   ...
>>>   (XEN) Xen code around  (__stop_this_cpu+0x12/0x3c):
>>>   (XEN)  48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 
>>> 48 0d ff
>>>   ...
>>>   (XEN) Xen call trace:
>>>   (XEN)[] R __stop_this_cpu+0x12/0x3c
>>>   (XEN)[] F smp_send_stop+0xdd/0xf8
>>>   (XEN)[] F machine_restart+0xa2/0x298
>>>   (XEN)[] F 
>>> arch/x86/shutdown.c#__machine_restart+0xb/0x11
>>>   (XEN)[] F smp_call_function_interrupt+0xbf/0xea
>>>   (XEN)[] F call_function_interrupt+0x35/0x37
>>>   (XEN)[] F do_IRQ+0xa3/0x6b5
>>>   (XEN)[] F common_interrupt+0x10a/0x120
>>>   (XEN)[] F __udelay+0x3a/0x51
>>>   (XEN)[] F __cpu_up+0x48f/0x734
>>>   (XEN)[] F cpu_up+0x7d/0xde
>>>   (XEN)[] F __start_xen+0x200b/0x2618
>>>   (XEN)[] F __high_start+0x4f/0x60
>>>
>>> which recurses until hitting a stack overflow.  The #DF handler, which 
>>> resets
>>> its stack on each invocation, loops indefinitely.
>>>
>>> Reinstate the NULL function pointer checks for hvm_cpu_{up,down}().
>>>
>>> Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations 
>>> to alt-call")
>>> Signed-off-by: Andrew Cooper 
>> Reviewed-by: Roger Pau Monné 
> Reviewed-by: Jan Beulich 

Thanks.

>
>>> ---
>>> CC: Jan Beulich 
>>> CC: Roger Pau Monné 
>>> CC: Wei Liu 
>>>
>>> RFC.  Not tested yet on the imacted hardware.  It's a Xeon PHI with another
>>> werid thing in need of debugging.  First boot is fine, while second
>>> boot (loading microcode this time) has a problem with vmx.
> Sounds not unfamiliar: My meanwhile oldish Romley needs to be cold-
> booted for VMX to actually be usable (not locked) on APs.

This is something which goes wrong as a consequence of loading microcode.

>>> I wonder if we want to modify the callers to check for HVM being enabled,
>>> rather than leaving the NULL pointer checks in a position where they're 
>>> liable
>>> to be reaped again.
>> What about adding a couple of comments to hvm_cpu_{up,down} to note
>> they are called unconditionally regardless of whether HVM is present
>> or not?
> I second this as the perhaps better alternative: The S3 path is
> similarly affected (and you may want to mention this in the
> description), so this would mean up to 5 conditionals (at the
> source level) instead of the just two you get away with here.

Ok.  I've added:

/* Called in boot/resume paths.  Must cope with no HVM support. */

and:

/* Called in shutdown paths.  Must cope with no HVM support. */

~Andrew


Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 17:28, Jan Beulich wrote:
> On 07.02.2022 16:14, Oleksandr Andrushchenko wrote:
>> On 07.02.22 17:05, Jan Beulich wrote:
>>> On 07.02.2022 15:46, Oleksandr Andrushchenko wrote:
 On 07.02.22 16:31, Jan Beulich wrote:
> But: What's still missing here then is the separation of guest and host
> views. When we set INTx behind the guest's back, it shouldn't observe the
> bit set. Or is this meant to be another (big) TODO?
 But, patch [PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for 
 guests
 already takes care of it, I mean that it will set/reset INTx for the guest
 according to MSI/MSI-X. So, if we squash these two patches the whole
 picture will be seen at once.
>>> Does it? I did get the impression that the guest would be able to observe
>>> the bit set even after writing zero to it (while a reason exists that Xen
>>> wants the bit set).
>> Yes, you are correct: guest might not see what it wanted to set.
>> I meant that Xen won't allow resetting INTx if it is not possible
>> due to MSI/MSI-X
>>
>> Anyways, I think squashing will be a good idea to have the relevant
>> functionality in a single change set. Will this work for you?
> It might work, but I'd prefer things which can sensibly be separate to
> remain separate.
Ok, two patches
> Jan
>


[ovmf test] 168046: all pass - PUSHED

2022-02-07 Thread osstest service owner
flight 168046 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168046/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 1f54eaa725f45e0c66c28f8d47fa8fb33f7be52c
baseline version:
 ovmf 96b8b5fd108a1f27960eee3915c0b10db191c849

Last test of basis   168043  2022-02-07 10:42:40 Z0 days
Testing same since   168046  2022-02-07 14:11:34 Z0 days1 attempts


People who touched revisions under test:
  Gerd Hoffmann 
  Leif Lindholm 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   96b8b5fd10..1f54eaa725  1f54eaa725f45e0c66c28f8d47fa8fb33f7be52c -> 
xen-tested-master



Re: [PATCH 02/16] x86/P2M: introduce p2m_{add,remove}_page()

2022-02-07 Thread George Dunlap


> On Feb 7, 2022, at 9:38 AM, Jan Beulich  wrote:
> 
> On 04.02.2022 23:07, George Dunlap wrote:
>> On Mon, Jul 5, 2021 at 5:06 PM Jan Beulich  wrote:
>> 
>>> p2m_add_page() is simply a rename from guest_physmap_add_entry().
>>> p2m_remove_page() then is its counterpart, despite rendering
>>> guest_physmap_remove_page().
> 
> First of all: It has been long ago that I noticed that this sentence
> misses words. It now ends "...  a trivial wrapper."
> 
>>> This way callers can use suitable pairs of
>>> functions (previously violated by hvm/grant_table.c).
>>> 
>> 
>> Obviously this needs some clarification.  While we're here, I find this a
>> bit confusing; I tend to use the present tense for the way the code is
>> before the patch, and the imperative for what the patch does; so Id' say:
>> 
>> Rename guest_physmap_add_entry() to p2m_add_page; make
>> guest_physmap_remove_page a wrapper with p2m_remove_page.  That way callers
>> can use suitable pairs...
> 
> Well, yes, I understand you might word it this way. I'm not convinced
> of the fixed scheme you mention for present vs imperative use to be a
> universal fit though, requiring to always be followed. When reading
> the description with the title in mind (and with the previously missing
> words added), I find the use of present tense quite reasonable here.

The way you wrote it is ambiguous grammatically; it could either mean, “Right 
now p2m_add_page() is simply a rename, and so…” or it could mean, “In this 
patch, p2m_add_page() is simply a rename.”  If a reader starts interpreting it 
the first way, then they’ll read along until it doesn’t make sense any more, 
then have to re-evaluate the whole paragraph.

It seems to me that my proposal is unambiguous.

> I'm further slightly puzzled by you keeping the use of present tense in
> "That way callers can use ...".

I wouldn’t call that the present tense; I’m sure a real linguist would have a 
name for it. Consider the sentence, “Put the box near the door; that way we can 
find it easily when we need it.”  The second half of the sentence is set in the 
hypothetical universe in which the imperative has been carried out.

 -George




Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-07 Thread Jan Beulich
On 07.02.2022 16:14, Oleksandr Andrushchenko wrote:
> On 07.02.22 17:05, Jan Beulich wrote:
>> On 07.02.2022 15:46, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 16:31, Jan Beulich wrote:
 But: What's still missing here then is the separation of guest and host
 views. When we set INTx behind the guest's back, it shouldn't observe the
 bit set. Or is this meant to be another (big) TODO?
>>> But, patch [PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for 
>>> guests
>>> already takes care of it, I mean that it will set/reset INTx for the guest
>>> according to MSI/MSI-X. So, if we squash these two patches the whole
>>> picture will be seen at once.
>> Does it? I did get the impression that the guest would be able to observe
>> the bit set even after writing zero to it (while a reason exists that Xen
>> wants the bit set).
> Yes, you are correct: guest might not see what it wanted to set.
> I meant that Xen won't allow resetting INTx if it is not possible
> due to MSI/MSI-X
> 
> Anyways, I think squashing will be a good idea to have the relevant
> functionality in a single change set. Will this work for you?

It might work, but I'd prefer things which can sensibly be separate to
remain separate.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 16:11, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 16:35, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 16:27, Roger Pau Monné wrote:
>>> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
 On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
> On 07.02.22 14:46, Roger Pau Monné wrote:
>> I think the per-domain rwlock seems like a good option. I would do
>> that as a pre-patch.
> It is. But it seems it won't solve the thing we started this adventure 
> for:
>
> With per-domain read lock and still ABBA in modify_bars (hope the below
> is correctly seen with a monospace font):
>
> cpu0: vpci_write-> d->RLock -> pdev1->lock -> 
>  rom_write -> modify_bars: tmp (pdev2) ->lock
> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> 
> modify_bars: tmp (pdev1) ->lock
>
> There is no API to upgrade read lock to write lock in modify_bars which 
> could help,
> so in both cases vpci_write should take write lock.
 Hmm, yes, I think you're right: It's not modify_bars() itself which needs
 to acquire the write lock, but its (perhaps indirect) caller. Effectively
 vpci_write() would need to take the write lock if the range written
 overlaps the BARs or the command register.
>>> I'm confused. If we use a per-domain rwlock approach there would be no
>>> need to lock tmp again in modify_bars, because we should hold the
>>> rwlock in write mode, so there's no ABBA?
>> this is only possible with what you wrote below:
>>> We will have however to drop the per domain read and vpci locks and
>>> pick the per-domain lock in write mode.
>> I think this is going to be unreliable. We need a reliable way to
>> upgrade read lock to write lock.
>> Then, we can drop pdev->vpci_lock at all, because we are always
>> protected with d->rwlock and those who want to free pdev->vpci
>> will use write lock.
>>
>> So, per-domain rwlock with write upgrade implemented minus pdev->vpci
>> should do the trick
> Linux doesn't implement write upgrade and it seems for a reason [1]:
> "Also, you cannot “upgrade” a read-lock to a write-lock, so if you at _any_ 
> time
> need to do any changes (even if you don’t do it every time), you have to get
> the write-lock at the very beginning."
> 
> So, I am not sure we can have the same for Xen...
> 
> At the moment I see at least two possible ways to solve the issue:
> 1. Make vpci_write use write lock, thus make all write accesses synchronized
> for the given domain, read are fully parallel

1b. Make vpci_write use write lock for writes to command register and BARs
only; keep using the read lock for all other writes.

Jan

> 2. Re-implement pdev/tmp overlapping detection with something which won't
> require pdev->vpci_lock/tmp->vpci_lock
> 
> 3. Drop read and acquire write lock in modify_bars... but this is not reliable
> and will hide a free(pdev->vpci) bug
> 
> @Roger, @Jan: Any other suggestions?
> 
> Thank you,
> Oleksandr
> 
> [1] 
> https://www.kernel.org/doc/html/latest/locking/spinlocks.html#lesson-2-reader-writer-spinlocks




Re: [PATCH 08/16] x86/P2M: PoD, altp2m, and nested-p2m are HVM-only

2022-02-07 Thread Jan Beulich
On 07.02.2022 15:45, George Dunlap wrote:
>> On Feb 7, 2022, at 10:11 AM, Jan Beulich  wrote:
>> On 05.02.2022 22:29, George Dunlap wrote:
 On Jul 5, 2021, at 5:09 PM, Jan Beulich  wrote:
 --- a/xen/arch/x86/mm/p2m-pod.c
 +++ b/xen/arch/x86/mm/p2m-pod.c
 @@ -1135,6 +1135,12 @@ p2m_pod_demand_populate(struct p2m_domai
mfn_t mfn;
unsigned long i;

 +if ( !p2m_is_hostp2m(p2m) )
 +{
 +ASSERT_UNREACHABLE();
 +return false;
 +}
 +
ASSERT(gfn_locked_by_me(p2m, gfn));
pod_lock(p2m);
>>>
>>> Why this check rather than something which explicitly says HVM?
>>
>> Checking for just HVM is too lax here imo. PoD operations should
>> never be invoked for alternative or nested p2ms; see the various
>> uses of p2m_get_hostp2m() in p2m-pod.c.
> 
> The fact remains that it doesn’t match what the patch descriptions says, and 
> you’re making me, the reviewer, guess why you changed it — along with anyone 
> else coming back to try to figure out why the code was this way.
> 
> If you want me to approve of the decision to make the check more strict than 
> simply HVM, then you need to make it clear why you’re doing it.  Adding a 
> sentence in the commit message should be fine.

I've added a paragraph, but already after your first reply I was
asking myself whether I actually need that change here. It's
more of the "just to be on the safe side" nature, I think. But
it's been quite a while since I put this change together, so I
may also have forgotten about some subtle aspect.

Jan




Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 17:05, Jan Beulich wrote:
> On 07.02.2022 15:46, Oleksandr Andrushchenko wrote:
>> On 07.02.22 16:31, Jan Beulich wrote:
>>> But: What's still missing here then is the separation of guest and host
>>> views. When we set INTx behind the guest's back, it shouldn't observe the
>>> bit set. Or is this meant to be another (big) TODO?
>> But, patch [PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for 
>> guests
>> already takes care of it, I mean that it will set/reset INTx for the guest
>> according to MSI/MSI-X. So, if we squash these two patches the whole
>> picture will be seen at once.
> Does it? I did get the impression that the guest would be able to observe
> the bit set even after writing zero to it (while a reason exists that Xen
> wants the bit set).
Yes, you are correct: guest might not see what it wanted to set.
I meant that Xen won't allow resetting INTx if it is not possible
due to MSI/MSI-X

Anyways, I think squashing will be a good idea to have the relevant
functionality in a single change set. Will this work for you?
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 16:35, Oleksandr Andrushchenko wrote:
>
> On 07.02.22 16:27, Roger Pau Monné wrote:
>> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
>>> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
 On 07.02.22 14:46, Roger Pau Monné wrote:
> I think the per-domain rwlock seems like a good option. I would do
> that as a pre-patch.
 It is. But it seems it won't solve the thing we started this adventure for:

 With per-domain read lock and still ABBA in modify_bars (hope the below
 is correctly seen with a monospace font):

 cpu0: vpci_write-> d->RLock -> pdev1->lock ->  
     rom_write -> modify_bars: tmp (pdev2) ->lock
 cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> 
 modify_bars: tmp (pdev1) ->lock

 There is no API to upgrade read lock to write lock in modify_bars which 
 could help,
 so in both cases vpci_write should take write lock.
>>> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
>>> to acquire the write lock, but its (perhaps indirect) caller. Effectively
>>> vpci_write() would need to take the write lock if the range written
>>> overlaps the BARs or the command register.
>> I'm confused. If we use a per-domain rwlock approach there would be no
>> need to lock tmp again in modify_bars, because we should hold the
>> rwlock in write mode, so there's no ABBA?
> this is only possible with what you wrote below:
>> We will have however to drop the per domain read and vpci locks and
>> pick the per-domain lock in write mode.
> I think this is going to be unreliable. We need a reliable way to
> upgrade read lock to write lock.
> Then, we can drop pdev->vpci_lock at all, because we are always
> protected with d->rwlock and those who want to free pdev->vpci
> will use write lock.
>
> So, per-domain rwlock with write upgrade implemented minus pdev->vpci
> should do the trick
Linux doesn't implement write upgrade and it seems for a reason [1]:
"Also, you cannot “upgrade” a read-lock to a write-lock, so if you at _any_ time
need to do any changes (even if you don’t do it every time), you have to get
the write-lock at the very beginning."

So, I am not sure we can have the same for Xen...

At the moment I see at least two possible ways to solve the issue:
1. Make vpci_write use write lock, thus make all write accesses synchronized
for the given domain, read are fully parallel

2. Re-implement pdev/tmp overlapping detection with something which won't
require pdev->vpci_lock/tmp->vpci_lock

3. Drop read and acquire write lock in modify_bars... but this is not reliable
and will hide a free(pdev->vpci) bug

@Roger, @Jan: Any other suggestions?

Thank you,
Oleksandr

[1] 
https://www.kernel.org/doc/html/latest/locking/spinlocks.html#lesson-2-reader-writer-spinlocks

Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-07 Thread Jan Beulich
On 07.02.2022 15:46, Oleksandr Andrushchenko wrote:
> On 07.02.22 16:31, Jan Beulich wrote:
>> But: What's still missing here then is the separation of guest and host
>> views. When we set INTx behind the guest's back, it shouldn't observe the
>> bit set. Or is this meant to be another (big) TODO?
> But, patch [PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for 
> guests
> already takes care of it, I mean that it will set/reset INTx for the guest
> according to MSI/MSI-X. So, if we squash these two patches the whole
> picture will be seen at once.

Does it? I did get the impression that the guest would be able to observe
the bit set even after writing zero to it (while a reason exists that Xen
wants the bit set).

Jan




Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 16:31, Jan Beulich wrote:
> On 07.02.2022 15:17, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 14:54, Jan Beulich wrote:
>>> On 07.02.2022 13:51, Oleksandr Andrushchenko wrote:
 On 07.02.22 14:38, Jan Beulich wrote:
> On 07.02.2022 12:27, Oleksandr Andrushchenko wrote:
>> On 07.02.22 09:29, Jan Beulich wrote:
>>> On 04.02.2022 15:37, Oleksandr Andrushchenko wrote:
 On 04.02.22 16:30, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> Reset the command register when assigning a PCI device to a guest:
>> according to the PCI spec the PCI_COMMAND register is typically all 
>> 0's
>> after reset.
> It's not entirely clear to me whether setting the hardware register to
> zero is okay. What wants to be zero is the value the guest observes
> initially.
 "the PCI spec says the PCI_COMMAND register is typically all 0's after 
 reset."
 Why wouldn't it be ok? What is the exact concern here?
>>> The concern is - as voiced is similar ways before, perhaps in other
>>> contexts - that you need to consider bit-by-bit whether overwriting
>>> with 0 what is currently there is okay. Xen and/or Dom0 may have put
>>> values there which they expect to remain unaltered. I guess
>>> PCI_COMMAND_SERR is a good example: While the guest's view of this
>>> will want to be zero initially, the host having set it to 1 may not
>>> easily be overwritten with 0, or else you'd effectively imply giving
>>> the guest control of the bit.
>> We have already discussed in great detail PCI_COMMAND emulation [1].
>> At the end you wrote [1]:
>> "Well, in order for the whole thing to be security supported it needs to
>> be explained for every bit why it is safe to allow the guest to drive it.
>> Until you mean vPCI to reach that state, leaving TODO notes in the code
>> for anything not investigated may indeed be good enough.
>>
>> Jan"
>>
>> So, this is why I left a TODO in the PCI_COMMAND emulation for now and 
>> only
>> care about INTx which is honored with the code in this patch.
> Right. The issue I see is that the description does not have any
> mention of this, but instead talks about simply writing zero.
 How do you want that mentioned? Extended commit message or
 just a link to the thread [1]?
>>> What I'd like you to describe is what the change does without
>>> fundamentally implying it'll end up being zero which gets written
>>> to the register. Stating as a conclusion that for the time being
>>> this means writing zero is certainly fine (and likely helpful if
>>> made explicit).
>> Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
>> to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
>> guest's view of this will want to be zero initially, the host having set
>> it to 1 may not easily be overwritten with 0, or else we'd effectively
>> imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
>> proper emulation in order to honor host's settings.
>>
>> There are examples of emulators [1], [2] which already deal with PCI_COMMAND
>> register emulation and it seems that at most they care about the only INTX
>> bit (besides IO/memory enable and bus muster which are write through).
>> It could be because in order to properly emulate the PCI_COMMAND register
>> we need to know about the whole PCI topology, e.g. if any setting in device's
>> command register is aligned with the upstream port etc.
>> This makes me think that because of this complexity others just ignore that.
>> Neither I think this can be easily done in Xen case.
>>
>> According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
>> Device Control" says that the reset state of the command register is
>> typically 0, so reset the command register when assigning a PCI device
>> to a guest t all 0's and for now only make sure INTx bit is set according
>> to if MSI/MSI-X enabled.
> "... is typically 0, so when assigning a PCI device reset the guest view of
>   the command register to all 0's. For now our emulation only makes sure INTx
>   is set according to host requirements, i.e. depending on MSI/MSI-X enabled
>   state."
This sounds good, I will use it. Thank you
>
> Maybe? (Obviously a fresh device given to a guest will have MSI/MSI-X
> disabled, so I'm not sure that aspect really needs mentioning.)
>
> But: What's still missing here then is the separation of guest and host
> views. When we set INTx behind the guest's back, it shouldn't observe the
> bit set. Or is this meant to be another (big) TODO?
But, patch [PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for guests
already takes care of it, I mean that it will set/reset INTx for the guest
according to MSI/MSI-X. So, if we squash these two patches the whole
picture will be seen at once.
>
> Jan
>
Thank 

Re: [PATCH 08/16] x86/P2M: PoD, altp2m, and nested-p2m are HVM-only

2022-02-07 Thread George Dunlap


> On Feb 7, 2022, at 10:11 AM, Jan Beulich  wrote:
> 
> On 05.02.2022 22:29, George Dunlap wrote:
>>> On Jul 5, 2021, at 5:09 PM, Jan Beulich  wrote:
>>> --- a/xen/arch/x86/mm/p2m-pod.c
>>> +++ b/xen/arch/x86/mm/p2m-pod.c
>>> @@ -1135,6 +1135,12 @@ p2m_pod_demand_populate(struct p2m_domai
>>>mfn_t mfn;
>>>unsigned long i;
>>> 
>>> +if ( !p2m_is_hostp2m(p2m) )
>>> +{
>>> +ASSERT_UNREACHABLE();
>>> +return false;
>>> +}
>>> +
>>>ASSERT(gfn_locked_by_me(p2m, gfn));
>>>pod_lock(p2m);
>> 
>> Why this check rather than something which explicitly says HVM?
> 
> Checking for just HVM is too lax here imo. PoD operations should
> never be invoked for alternative or nested p2ms; see the various
> uses of p2m_get_hostp2m() in p2m-pod.c.

The fact remains that it doesn’t match what the patch descriptions says, and 
you’re making me, the reviewer, guess why you changed it — along with anyone 
else coming back to try to figure out why the code was this way.

If you want me to approve of the decision to make the check more strict than 
simply HVM, then you need to make it clear why you’re doing it.  Adding a 
sentence in the commit message should be fine.

 -George

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 16:27, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
>> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 14:46, Roger Pau Monné wrote:
 I think the per-domain rwlock seems like a good option. I would do
 that as a pre-patch.
>>> It is. But it seems it won't solve the thing we started this adventure for:
>>>
>>> With per-domain read lock and still ABBA in modify_bars (hope the below
>>> is correctly seen with a monospace font):
>>>
>>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->   
>>>    rom_write -> modify_bars: tmp (pdev2) ->lock
>>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
>>> tmp (pdev1) ->lock
>>>
>>> There is no API to upgrade read lock to write lock in modify_bars which 
>>> could help,
>>> so in both cases vpci_write should take write lock.
>> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
>> to acquire the write lock, but its (perhaps indirect) caller. Effectively
>> vpci_write() would need to take the write lock if the range written
>> overlaps the BARs or the command register.
> I'm confused. If we use a per-domain rwlock approach there would be no
> need to lock tmp again in modify_bars, because we should hold the
> rwlock in write mode, so there's no ABBA?
this is only possible with what you wrote below:
>
> We will have however to drop the per domain read and vpci locks and
> pick the per-domain lock in write mode.
I think this is going to be unreliable. We need a reliable way to
upgrade read lock to write lock.
Then, we can drop pdev->vpci_lock at all, because we are always
protected with d->rwlock and those who want to free pdev->vpci
will use write lock.

So, per-domain rwlock with write upgrade implemented minus pdev->vpci
should do the trick
> Thanks, Roger.
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 15:27, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
>> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
>>> On 07.02.22 14:46, Roger Pau Monné wrote:
 I think the per-domain rwlock seems like a good option. I would do
 that as a pre-patch.
>>> It is. But it seems it won't solve the thing we started this adventure for:
>>>
>>> With per-domain read lock and still ABBA in modify_bars (hope the below
>>> is correctly seen with a monospace font):
>>>
>>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->   
>>>    rom_write -> modify_bars: tmp (pdev2) ->lock
>>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
>>> tmp (pdev1) ->lock
>>>
>>> There is no API to upgrade read lock to write lock in modify_bars which 
>>> could help,
>>> so in both cases vpci_write should take write lock.
>>
>> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
>> to acquire the write lock, but its (perhaps indirect) caller. Effectively
>> vpci_write() would need to take the write lock if the range written
>> overlaps the BARs or the command register.
> 
> I'm confused. If we use a per-domain rwlock approach there would be no
> need to lock tmp again in modify_bars, because we should hold the
> rwlock in write mode, so there's no ABBA?
> 
> We will have however to drop the per domain read and vpci locks and
> pick the per-domain lock in write mode.

Well, yes, with intermediate dropping of the lock acquiring in write mode
can be done in modify_bars(). I'm not convinced (yet) that such intermediate
dropping is actually going to be okay.

Jan




Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-07 Thread Jan Beulich
On 07.02.2022 15:17, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 14:54, Jan Beulich wrote:
>> On 07.02.2022 13:51, Oleksandr Andrushchenko wrote:
>>>
>>> On 07.02.22 14:38, Jan Beulich wrote:
 On 07.02.2022 12:27, Oleksandr Andrushchenko wrote:
> On 07.02.22 09:29, Jan Beulich wrote:
>> On 04.02.2022 15:37, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 16:30, Jan Beulich wrote:
 On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> Reset the command register when assigning a PCI device to a guest:
> according to the PCI spec the PCI_COMMAND register is typically all 
> 0's
> after reset.
 It's not entirely clear to me whether setting the hardware register to
 zero is okay. What wants to be zero is the value the guest observes
 initially.
>>> "the PCI spec says the PCI_COMMAND register is typically all 0's after 
>>> reset."
>>> Why wouldn't it be ok? What is the exact concern here?
>> The concern is - as voiced is similar ways before, perhaps in other
>> contexts - that you need to consider bit-by-bit whether overwriting
>> with 0 what is currently there is okay. Xen and/or Dom0 may have put
>> values there which they expect to remain unaltered. I guess
>> PCI_COMMAND_SERR is a good example: While the guest's view of this
>> will want to be zero initially, the host having set it to 1 may not
>> easily be overwritten with 0, or else you'd effectively imply giving
>> the guest control of the bit.
> We have already discussed in great detail PCI_COMMAND emulation [1].
> At the end you wrote [1]:
> "Well, in order for the whole thing to be security supported it needs to
> be explained for every bit why it is safe to allow the guest to drive it.
> Until you mean vPCI to reach that state, leaving TODO notes in the code
> for anything not investigated may indeed be good enough.
>
> Jan"
>
> So, this is why I left a TODO in the PCI_COMMAND emulation for now and 
> only
> care about INTx which is honored with the code in this patch.
 Right. The issue I see is that the description does not have any
 mention of this, but instead talks about simply writing zero.
>>> How do you want that mentioned? Extended commit message or
>>> just a link to the thread [1]?
>> What I'd like you to describe is what the change does without
>> fundamentally implying it'll end up being zero which gets written
>> to the register. Stating as a conclusion that for the time being
>> this means writing zero is certainly fine (and likely helpful if
>> made explicit).
> Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
> to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
> guest's view of this will want to be zero initially, the host having set
> it to 1 may not easily be overwritten with 0, or else we'd effectively
> imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
> proper emulation in order to honor host's settings.
> 
> There are examples of emulators [1], [2] which already deal with PCI_COMMAND
> register emulation and it seems that at most they care about the only INTX
> bit (besides IO/memory enable and bus muster which are write through).
> It could be because in order to properly emulate the PCI_COMMAND register
> we need to know about the whole PCI topology, e.g. if any setting in device's
> command register is aligned with the upstream port etc.
> This makes me think that because of this complexity others just ignore that.
> Neither I think this can be easily done in Xen case.
> 
> According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
> Device Control" says that the reset state of the command register is
> typically 0, so reset the command register when assigning a PCI device
> to a guest t all 0's and for now only make sure INTx bit is set according
> to if MSI/MSI-X enabled.

"... is typically 0, so when assigning a PCI device reset the guest view of
 the command register to all 0's. For now our emulation only makes sure INTx
 is set according to host requirements, i.e. depending on MSI/MSI-X enabled
 state."

Maybe? (Obviously a fresh device given to a guest will have MSI/MSI-X 
disabled, so I'm not sure that aspect really needs mentioning.)

But: What's still missing here then is the separation of guest and host
views. When we set INTx behind the guest's back, it shouldn't observe the
bit set. Or is this meant to be another (big) TODO?

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 16:11, Jan Beulich wrote:
> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
>> On 07.02.22 14:46, Roger Pau Monné wrote:
>>> I think the per-domain rwlock seems like a good option. I would do
>>> that as a pre-patch.
>> It is. But it seems it won't solve the thing we started this adventure for:
>>
>> With per-domain read lock and still ABBA in modify_bars (hope the below
>> is correctly seen with a monospace font):
>>
>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->
>>   rom_write -> modify_bars: tmp (pdev2) ->lock
>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
>> tmp (pdev1) ->lock
>>
>> There is no API to upgrade read lock to write lock in modify_bars which 
>> could help,
>> so in both cases vpci_write should take write lock.
> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
> to acquire the write lock, but its (perhaps indirect) caller. Effectively
> vpci_write() would need to take the write lock if the range written
> overlaps the BARs or the command register.
Exactly, vpci_write needs a write lock, but it is not desirable.
And again, there is a single offending piece of code which wants that...
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 16:19, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 01:53:34PM +, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 14:46, Roger Pau Monné wrote:
>>> On Mon, Feb 07, 2022 at 11:08:39AM +, Oleksandr Andrushchenko wrote:
 ==

 Bottom line:
 ==

 1. vpci_{read|write} are not protected with pcidevs_lock and can run in
 parallel with pci_remove_device which can remove pdev after 
 vpci_{read|write}
 acquired the pdev pointer. This may lead to a fail due to pdev dereference.

 So, to protect pdev dereference vpci_{read|write} must also use pdevs_lock.
>>> We would like to take the pcidevs_lock only while fetching the device
>>> (ie: pci_get_pdev_by_domain), afterwards it should be fine to lock the
>>> device using a vpci specific lock so calls to vpci_{read,write} can be
>>> partially concurrent across multiple domains.
>> This means this can't be done a pre-req patch, but as a part of the
>> patch which changes locking.
>>> In fact I think Jan had already pointed out that the pci lock would
>>> need taking while searching for the device in vpci_{read,write}.
>> I was referring to the time after we found pdev and it is currently
>> possible to free pdev while using it after the search
>>> It seems to me that if you implement option 3 below taking the
>>> per-domain rwlock in read mode in vpci_{read|write} will already
>>> protect you from the device being removed if the same per-domain lock
>>> is taken in write mode in vpci_remove_device.
>> Yes, it should. Again this can't be done as a pre-req patch because
>> this relies on pdev->vpci_lock
> Hm, no, I don't think so. You could introduce this per-domain rwlock
> in a prepatch, and then move the vpci lock outside of the vpci struct.
> I see no problem with that.
>
 2. The only offending place which is in the way of pci_dev->vpci_lock is
 modify_bars. If it can be re-worked to track already mapped and unmapped
 regions then we can avoid having a possible deadlock and can use
 pci_dev->vpci_lock (rangesets won't help here as we also need refcounting 
 be
 implemented).
>>> I think a refcounting based solution will be very complex to
>>> implement. I'm however happy to be proven wrong.
>> I can't estimate, but I have a feeling that all these plays around locking
>> is just because of this single piece of code. No other place suffer from
>> pdev->vpci_lock and no d->lock
 If pcidevs_lock is used for vpci_{read|write} then no deadlock is possible,
 but modify_bars code must be re-worked not to lock itself (pdev->vpci_lock 
 and
 tmp->vpci_lock when pdev == tmp, this is minor).
>>> Taking the pcidevs lock (a global lock) is out of the picture IMO, as
>>> it's going to serialize all calls of vpci_{read|write}, and would
>>> create too much contention on the pcidevs lock.
>> I understand that. But if we would like to fix the existing code I see
>> no other alternative.
 3. We may think about a per-domain rwlock and pdev->vpci_lock, so this 
 solves
 modify_bars's two pdevs access. But this doesn't solve possible pdev
 de-reference in vpci_{read|write} vs pci_remove_device.
>>> pci_remove device will call vpci_remove_device, so as long as
>>> vpci_remove_device taken the per-domain lock in write (exclusive) mode
>>> it should be fine.
>> I think I need to see if there are any other places which similarly
>> require the write lock
 @Roger, @Jan, I would like to hear what do you think about the above 
 analysis
 and how can we proceed with locking re-work?
>>> I think the per-domain rwlock seems like a good option. I would do
>>> that as a pre-patch.
>> It is. But it seems it won't solve the thing we started this adventure for:
>>
>> With per-domain read lock and still ABBA in modify_bars (hope the below
>> is correctly seen with a monospace font):
>>
>> cpu0: vpci_write-> d->RLock -> pdev1->lock ->
>>   rom_write -> modify_bars: tmp (pdev2) ->lock
>> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
>> tmp (pdev1) ->lock
>>
>> There is no API to upgrade read lock to write lock in modify_bars which 
>> could help,
>> so in both cases vpci_write should take write lock.
> I've thought more than once that it would be nice to have a
> write_{upgrade,downgrade} (read_downgrade maybe?) or similar helper.
Yes, this is the real use-case for that
>
> I think you could also drop the read lock, take the write lock and
> check that >vpci->header == header in order to be sure
> pdev->vpci hasn't been recreated.
And have pdev freed in between
>   You would have to do similar in
> order to get back again from a write lock into a read one.
Not sure this is reliable.
>
> We should avoid taking the rwlock in write mode in vpci_write
> unconditionally.
Yes, but without upgrading the read lock I see no way 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 03:11:03PM +0100, Jan Beulich wrote:
> On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
> > On 07.02.22 14:46, Roger Pau Monné wrote:
> >> I think the per-domain rwlock seems like a good option. I would do
> >> that as a pre-patch.
> > It is. But it seems it won't solve the thing we started this adventure for:
> > 
> > With per-domain read lock and still ABBA in modify_bars (hope the below
> > is correctly seen with a monospace font):
> > 
> > cpu0: vpci_write-> d->RLock -> pdev1->lock ->   
> >    rom_write -> modify_bars: tmp (pdev2) ->lock
> > cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
> > tmp (pdev1) ->lock
> > 
> > There is no API to upgrade read lock to write lock in modify_bars which 
> > could help,
> > so in both cases vpci_write should take write lock.
> 
> Hmm, yes, I think you're right: It's not modify_bars() itself which needs
> to acquire the write lock, but its (perhaps indirect) caller. Effectively
> vpci_write() would need to take the write lock if the range written
> overlaps the BARs or the command register.

I'm confused. If we use a per-domain rwlock approach there would be no
need to lock tmp again in modify_bars, because we should hold the
rwlock in write mode, so there's no ABBA?

We will have however to drop the per domain read and vpci locks and
pick the per-domain lock in write mode.

Thanks, Roger.



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 01:53:34PM +, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 14:46, Roger Pau Monné wrote:
> > On Mon, Feb 07, 2022 at 11:08:39AM +, Oleksandr Andrushchenko wrote:
> >> ==
> >>
> >> Bottom line:
> >> ==
> >>
> >> 1. vpci_{read|write} are not protected with pcidevs_lock and can run in
> >> parallel with pci_remove_device which can remove pdev after 
> >> vpci_{read|write}
> >> acquired the pdev pointer. This may lead to a fail due to pdev dereference.
> >>
> >> So, to protect pdev dereference vpci_{read|write} must also use pdevs_lock.
> > We would like to take the pcidevs_lock only while fetching the device
> > (ie: pci_get_pdev_by_domain), afterwards it should be fine to lock the
> > device using a vpci specific lock so calls to vpci_{read,write} can be
> > partially concurrent across multiple domains.
> This means this can't be done a pre-req patch, but as a part of the
> patch which changes locking.
> >
> > In fact I think Jan had already pointed out that the pci lock would
> > need taking while searching for the device in vpci_{read,write}.
> I was referring to the time after we found pdev and it is currently
> possible to free pdev while using it after the search
> >
> > It seems to me that if you implement option 3 below taking the
> > per-domain rwlock in read mode in vpci_{read|write} will already
> > protect you from the device being removed if the same per-domain lock
> > is taken in write mode in vpci_remove_device.
> Yes, it should. Again this can't be done as a pre-req patch because
> this relies on pdev->vpci_lock

Hm, no, I don't think so. You could introduce this per-domain rwlock
in a prepatch, and then move the vpci lock outside of the vpci struct.
I see no problem with that.

> >
> >> 2. The only offending place which is in the way of pci_dev->vpci_lock is
> >> modify_bars. If it can be re-worked to track already mapped and unmapped
> >> regions then we can avoid having a possible deadlock and can use
> >> pci_dev->vpci_lock (rangesets won't help here as we also need refcounting 
> >> be
> >> implemented).
> > I think a refcounting based solution will be very complex to
> > implement. I'm however happy to be proven wrong.
> I can't estimate, but I have a feeling that all these plays around locking
> is just because of this single piece of code. No other place suffer from
> pdev->vpci_lock and no d->lock
> >
> >> If pcidevs_lock is used for vpci_{read|write} then no deadlock is possible,
> >> but modify_bars code must be re-worked not to lock itself (pdev->vpci_lock 
> >> and
> >> tmp->vpci_lock when pdev == tmp, this is minor).
> > Taking the pcidevs lock (a global lock) is out of the picture IMO, as
> > it's going to serialize all calls of vpci_{read|write}, and would
> > create too much contention on the pcidevs lock.
> I understand that. But if we would like to fix the existing code I see
> no other alternative.
> >
> >> 3. We may think about a per-domain rwlock and pdev->vpci_lock, so this 
> >> solves
> >> modify_bars's two pdevs access. But this doesn't solve possible pdev
> >> de-reference in vpci_{read|write} vs pci_remove_device.
> > pci_remove device will call vpci_remove_device, so as long as
> > vpci_remove_device taken the per-domain lock in write (exclusive) mode
> > it should be fine.
> I think I need to see if there are any other places which similarly
> require the write lock
> >
> >> @Roger, @Jan, I would like to hear what do you think about the above 
> >> analysis
> >> and how can we proceed with locking re-work?
> > I think the per-domain rwlock seems like a good option. I would do
> > that as a pre-patch.
> It is. But it seems it won't solve the thing we started this adventure for:
> 
> With per-domain read lock and still ABBA in modify_bars (hope the below
> is correctly seen with a monospace font):
> 
> cpu0: vpci_write-> d->RLock -> pdev1->lock -> 
>  rom_write -> modify_bars: tmp (pdev2) ->lock
> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
> tmp (pdev1) ->lock
> 
> There is no API to upgrade read lock to write lock in modify_bars which could 
> help,
> so in both cases vpci_write should take write lock.

I've thought more than once that it would be nice to have a
write_{upgrade,downgrade} (read_downgrade maybe?) or similar helper.

I think you could also drop the read lock, take the write lock and
check that >vpci->header == header in order to be sure
pdev->vpci hasn't been recreated. You would have to do similar in
order to get back again from a write lock into a read one.

We should avoid taking the rwlock in write mode in vpci_write
unconditionally.

Thanks, Roger.



Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 14:54, Jan Beulich wrote:
> On 07.02.2022 13:51, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 14:38, Jan Beulich wrote:
>>> On 07.02.2022 12:27, Oleksandr Andrushchenko wrote:
 On 07.02.22 09:29, Jan Beulich wrote:
> On 04.02.2022 15:37, Oleksandr Andrushchenko wrote:
>> On 04.02.22 16:30, Jan Beulich wrote:
>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
 Reset the command register when assigning a PCI device to a guest:
 according to the PCI spec the PCI_COMMAND register is typically all 0's
 after reset.
>>> It's not entirely clear to me whether setting the hardware register to
>>> zero is okay. What wants to be zero is the value the guest observes
>>> initially.
>> "the PCI spec says the PCI_COMMAND register is typically all 0's after 
>> reset."
>> Why wouldn't it be ok? What is the exact concern here?
> The concern is - as voiced is similar ways before, perhaps in other
> contexts - that you need to consider bit-by-bit whether overwriting
> with 0 what is currently there is okay. Xen and/or Dom0 may have put
> values there which they expect to remain unaltered. I guess
> PCI_COMMAND_SERR is a good example: While the guest's view of this
> will want to be zero initially, the host having set it to 1 may not
> easily be overwritten with 0, or else you'd effectively imply giving
> the guest control of the bit.
 We have already discussed in great detail PCI_COMMAND emulation [1].
 At the end you wrote [1]:
 "Well, in order for the whole thing to be security supported it needs to
 be explained for every bit why it is safe to allow the guest to drive it.
 Until you mean vPCI to reach that state, leaving TODO notes in the code
 for anything not investigated may indeed be good enough.

 Jan"

 So, this is why I left a TODO in the PCI_COMMAND emulation for now and only
 care about INTx which is honored with the code in this patch.
>>> Right. The issue I see is that the description does not have any
>>> mention of this, but instead talks about simply writing zero.
>> How do you want that mentioned? Extended commit message or
>> just a link to the thread [1]?
> What I'd like you to describe is what the change does without
> fundamentally implying it'll end up being zero which gets written
> to the register. Stating as a conclusion that for the time being
> this means writing zero is certainly fine (and likely helpful if
> made explicit).
Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's view of this will want to be zero initially, the host having set
it to 1 may not easily be overwritten with 0, or else we'd effectively
imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.

There are examples of emulators [1], [2] which already deal with PCI_COMMAND
register emulation and it seems that at most they care about the only INTX
bit (besides IO/memory enable and bus muster which are write through).
It could be because in order to properly emulate the PCI_COMMAND register
we need to know about the whole PCI topology, e.g. if any setting in device's
command register is aligned with the upstream port etc.
This makes me think that because of this complexity others just ignore that.
Neither I think this can be easily done in Xen case.

According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" says that the reset state of the command register is
typically 0, so reset the command register when assigning a PCI device
to a guest t all 0's and for now only make sure INTx bit is set according
to if MSI/MSI-X enabled.

[1] https://github.com/qemu/qemu/blob/master/hw/xen/xen_pt_config_init.c#L310
[2] 
https://github.com/projectacrn/acrn-hypervisor/blob/master/hypervisor/hw/pci.c#L336

Will the above description be enough?

It also seems to be a good move to squash the following patches:
[PATCH v6 09/13] vpci/header: emulate PCI_COMMAND register for guests
[PATCH v6 10/13] vpci/header: reset the command register when adding devices

as they implement a single piece of functionality now.
>> With the above done, do you think that writing 0's is an acceptable
>> approach as of now?
> Well, yes, provided we have a sufficiently similar understanding
> of what "acceptable" here means.
>
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 14:53, Oleksandr Andrushchenko wrote:
> On 07.02.22 14:46, Roger Pau Monné wrote:
>> I think the per-domain rwlock seems like a good option. I would do
>> that as a pre-patch.
> It is. But it seems it won't solve the thing we started this adventure for:
> 
> With per-domain read lock and still ABBA in modify_bars (hope the below
> is correctly seen with a monospace font):
> 
> cpu0: vpci_write-> d->RLock -> pdev1->lock -> 
>  rom_write -> modify_bars: tmp (pdev2) ->lock
> cpu1:    vpci_write-> d->RLock pdev2->lock -> cmd_write -> modify_bars: 
> tmp (pdev1) ->lock
> 
> There is no API to upgrade read lock to write lock in modify_bars which could 
> help,
> so in both cases vpci_write should take write lock.

Hmm, yes, I think you're right: It's not modify_bars() itself which needs
to acquire the write lock, but its (perhaps indirect) caller. Effectively
vpci_write() would need to take the write lock if the range written
overlaps the BARs or the command register.

Jan




[linux-linus test] 168041: tolerable FAIL - PUSHED

2022-02-07 Thread osstest service owner
flight 168041 linux-linus real [real]
flight 168045 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/168041/
http://logs.test-lab.xenproject.org/osstest/logs/168045/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 12 debian-hvm-install fail 
pass in 168045-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail in 168045 never pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 168035
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168035
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168035
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 168035
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168035
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168035
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168035
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168035
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 linuxdfd42facf1e4ada021b939b4e19c935dcdd55566
baseline version:
 linuxd8ad2ce873abab1cfd38779c626b79cef6307aac

Last test of basis   168035  2022-02-06 18:41:13 Z0 days
Testing same since   168041  2022-02-07 05:04:10 Z0 days1 attempts


People who touched revisions under test:
  Linus Torvalds 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm   

[ovmf test] 168043: all pass - PUSHED

2022-02-07 Thread osstest service owner
flight 168043 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168043/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 96b8b5fd108a1f27960eee3915c0b10db191c849
baseline version:
 ovmf f78b937c95ddc4f7a29e41fee98e96076828a108

Last test of basis   168042  2022-02-07 06:41:36 Z0 days
Testing same since   168043  2022-02-07 10:42:40 Z0 days1 attempts


People who touched revisions under test:
  Matt DeVillier 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   f78b937c95..96b8b5fd10  96b8b5fd108a1f27960eee3915c0b10db191c849 -> 
xen-tested-master



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 14:46, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 11:08:39AM +, Oleksandr Andrushchenko wrote:
>> Hello,
>>
>> On 04.02.22 16:57, Roger Pau Monné wrote:
>>> On Fri, Feb 04, 2022 at 02:43:07PM +, Oleksandr Andrushchenko wrote:
 On 04.02.22 15:06, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
>> On 04.02.22 14:47, Jan Beulich wrote:
>>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
 On 04.02.22 13:37, Jan Beulich wrote:
> On 04.02.2022 12:13, Roger Pau Monné wrote:
>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
 On 04.02.22 11:15, Jan Beulich wrote:
> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>> On 04.02.22 09:52, Jan Beulich wrote:
>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
 @@ -285,6 +286,12 @@ static int modify_bars(const struct 
 pci_dev *pdev, uint16_t cmd, bool rom_only)
 continue;
 }
 
 +spin_lock(>vpci_lock);
 +if ( !tmp->vpci )
 +{
 +spin_unlock(>vpci_lock);
 +continue;
 +}
 for ( i = 0; i < 
 ARRAY_SIZE(tmp->vpci->header.bars); i++ )
 {
 const struct vpci_bar *bar = 
 >vpci->header.bars[i];
 @@ -303,12 +310,14 @@ static int modify_bars(const struct 
 pci_dev *pdev, uint16_t cmd, bool rom_only)
 rc = rangeset_remove_range(mem, start, 
 end);
 if ( rc )
 {
 +spin_unlock(>vpci_lock);
 printk(XENLOG_G_WARNING "Failed to 
 remove [%lx, %lx]: %d\n",
start, end, rc);
 rangeset_destroy(mem);
 return rc;
 }
 }
 +spin_unlock(>vpci_lock);
 }
>>> At the first glance this simply looks like another unjustified 
>>> (in the
>>> description) change, as you're not converting anything here but 
>>> you
>>> actually add locking (and I realize this was there before, so 
>>> I'm sorry
>>> for not pointing this out earlier).
>> Well, I thought that the description already has "...the lock 
>> can be
>> used (and in a few cases is used right away) to check whether 
>> vpci
>> is present" and this is enough for such uses as here.
>>> But then I wonder whether you
>>> actually tested this, since I can't help getting the impression 
>>> that
>>> you're introducing a live-lock: The function is called from 
>>> cmd_write()
>>> and rom_write(), which in turn are called out of vpci_write(). 
>>> Yet that
>>> function already holds the lock, and the lock is not (currently)
>>> recursive. (For the 3rd caller of the function - init_bars() - 
>>> otoh
>>> the locking looks to be entirely unnecessary.)
>> Well, you are correct: if tmp != pdev then it is correct to 
>> acquire
>> the lock. But if tmp == pdev and rom_only == true
>> then we'll deadlock.
>>
>> It seems we need to have the locking conditional, e.g. only lock
>> if tmp != pdev
> Which will address the live-lock, but introduce ABBA deadlock 
> potential
> between the two locks.
 I am not sure I can suggest a better solution here
 @Roger, @Jan, could you please help here?
>>> Well, first of all I'd like to mention that while it may have been 
>>> okay to
>>> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
>>> dealing
>>> with DomU-s' lists of PCI devices. The requirement really applies 
>>> to the
>>> other use of for_each_pdev() as well (in vpci_dump_msi()), except 
>>> that
>>> there it probably wants to be a try-lock.
>>>
>>> Next I'd like to point out that here we have the still pending 
>>> issue of
>>> how to deal with hidden devices, which Dom0 can access. See my 

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 13:57, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 14:34, Jan Beulich wrote:
>> On 07.02.2022 12:08, Oleksandr Andrushchenko wrote:
>>> 1. vpci_{read|write} are not protected with pcidevs_lock and can run in
>>> parallel with pci_remove_device which can remove pdev after 
>>> vpci_{read|write}
>>> acquired the pdev pointer. This may lead to a fail due to pdev dereference.
>>>
>>> So, to protect pdev dereference vpci_{read|write} must also use pdevs_lock.
>> I think this is not the only place where there is a theoretical race
>> against pci_remove_device().
> Not at all, that was just to demonstrate one of the possible sources of races.
>>   I would recommend to separate the
>> overall situation with pcidevs_lock from the issue here.
> Do you agree that there is already an issue with that? In the currently 
> existing code?
>>   I don't view
>> it as an option to acquire pcidevs_lock in vpci_{read,write}().
> Yes, that would hurt too much, I agree. But this needs to be solved
>>   If
>> anything, we need proper refcounting of PCI devices (at which point
>> likely a number of lock uses can go away).
> It seems so. Then not only pdev's need refcounting, but pdev->vpci as well
> 
> What's your view on how can we achieve both goals?
> pdev and pdev->vpci and locking/refcounting

I don't see why pdev->vpci might need refcounting. And just to state it
in different words: I'd like to suggest to leave aside the pdev locking
as long as it's _just_ to protect against hot remove of a device. That's
orthogonal to what you need for vPCI, where you need to protect
against the device disappearing from a guest (without at the same time
disappearing from the host).

Jan




[xtf test] 168044: all pass - PUSHED

2022-02-07 Thread osstest service owner
flight 168044 xtf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168044/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 xtf  bc0abf2a5498d4691538bf34496ca0f0f189951b
baseline version:
 xtf  61e6f40b07d256bd62ae7b231a3eeecd49d0b15b

Last test of basis   165349  2021-10-04 12:43:20 Z  126 days
Testing same since   168044  2022-02-07 11:41:42 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 

jobs:
 build-amd64-xtf  pass
 build-amd64  pass
 build-amd64-pvopspass
 test-xtf-amd64-amd64-1   pass
 test-xtf-amd64-amd64-2   pass
 test-xtf-amd64-amd64-3   pass
 test-xtf-amd64-amd64-4   pass
 test-xtf-amd64-amd64-5   pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xtf.git
   61e6f40..bc0abf2  bc0abf2a5498d4691538bf34496ca0f0f189951b -> 
xen-tested-master



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 14:34, Jan Beulich wrote:
> On 07.02.2022 12:08, Oleksandr Andrushchenko wrote:
>> 1. vpci_{read|write} are not protected with pcidevs_lock and can run in
>> parallel with pci_remove_device which can remove pdev after vpci_{read|write}
>> acquired the pdev pointer. This may lead to a fail due to pdev dereference.
>>
>> So, to protect pdev dereference vpci_{read|write} must also use pdevs_lock.
> I think this is not the only place where there is a theoretical race
> against pci_remove_device().
Not at all, that was just to demonstrate one of the possible sources of races.
>   I would recommend to separate the
> overall situation with pcidevs_lock from the issue here.
Do you agree that there is already an issue with that? In the currently 
existing code?
>   I don't view
> it as an option to acquire pcidevs_lock in vpci_{read,write}().
Yes, that would hurt too much, I agree. But this needs to be solved
>   If
> anything, we need proper refcounting of PCI devices (at which point
> likely a number of lock uses can go away).
It seems so. Then not only pdev's need refcounting, but pdev->vpci as well

What's your view on how can we achieve both goals?
pdev and pdev->vpci and locking/refcounting
This is really crucial for all the code for PCI passthrough on Arm because
without this ground work done we can't accept all the patches which rely
on this: vPCI changes, MSI/MSI-X etc.
>
> Jan
>
Thank you,
Oleksandr

Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-07 Thread Jan Beulich
On 07.02.2022 13:51, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 14:38, Jan Beulich wrote:
>> On 07.02.2022 12:27, Oleksandr Andrushchenko wrote:
>>>
>>> On 07.02.22 09:29, Jan Beulich wrote:
 On 04.02.2022 15:37, Oleksandr Andrushchenko wrote:
> On 04.02.22 16:30, Jan Beulich wrote:
>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>> Reset the command register when assigning a PCI device to a guest:
>>> according to the PCI spec the PCI_COMMAND register is typically all 0's
>>> after reset.
>> It's not entirely clear to me whether setting the hardware register to
>> zero is okay. What wants to be zero is the value the guest observes
>> initially.
> "the PCI spec says the PCI_COMMAND register is typically all 0's after 
> reset."
> Why wouldn't it be ok? What is the exact concern here?
 The concern is - as voiced is similar ways before, perhaps in other
 contexts - that you need to consider bit-by-bit whether overwriting
 with 0 what is currently there is okay. Xen and/or Dom0 may have put
 values there which they expect to remain unaltered. I guess
 PCI_COMMAND_SERR is a good example: While the guest's view of this
 will want to be zero initially, the host having set it to 1 may not
 easily be overwritten with 0, or else you'd effectively imply giving
 the guest control of the bit.
>>> We have already discussed in great detail PCI_COMMAND emulation [1].
>>> At the end you wrote [1]:
>>> "Well, in order for the whole thing to be security supported it needs to
>>> be explained for every bit why it is safe to allow the guest to drive it.
>>> Until you mean vPCI to reach that state, leaving TODO notes in the code
>>> for anything not investigated may indeed be good enough.
>>>
>>> Jan"
>>>
>>> So, this is why I left a TODO in the PCI_COMMAND emulation for now and only
>>> care about INTx which is honored with the code in this patch.
>> Right. The issue I see is that the description does not have any
>> mention of this, but instead talks about simply writing zero.
> How do you want that mentioned? Extended commit message or
> just a link to the thread [1]?

What I'd like you to describe is what the change does without
fundamentally implying it'll end up being zero which gets written
to the register. Stating as a conclusion that for the time being
this means writing zero is certainly fine (and likely helpful if
made explicit).

> With the above done, do you think that writing 0's is an acceptable
> approach as of now?

Well, yes, provided we have a sufficiently similar understanding
of what "acceptable" here means.

Jan




Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 14:38, Jan Beulich wrote:
> On 07.02.2022 12:27, Oleksandr Andrushchenko wrote:
>>
>> On 07.02.22 09:29, Jan Beulich wrote:
>>> On 04.02.2022 15:37, Oleksandr Andrushchenko wrote:
 On 04.02.22 16:30, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> Reset the command register when assigning a PCI device to a guest:
>> according to the PCI spec the PCI_COMMAND register is typically all 0's
>> after reset.
> It's not entirely clear to me whether setting the hardware register to
> zero is okay. What wants to be zero is the value the guest observes
> initially.
 "the PCI spec says the PCI_COMMAND register is typically all 0's after 
 reset."
 Why wouldn't it be ok? What is the exact concern here?
>>> The concern is - as voiced is similar ways before, perhaps in other
>>> contexts - that you need to consider bit-by-bit whether overwriting
>>> with 0 what is currently there is okay. Xen and/or Dom0 may have put
>>> values there which they expect to remain unaltered. I guess
>>> PCI_COMMAND_SERR is a good example: While the guest's view of this
>>> will want to be zero initially, the host having set it to 1 may not
>>> easily be overwritten with 0, or else you'd effectively imply giving
>>> the guest control of the bit.
>> We have already discussed in great detail PCI_COMMAND emulation [1].
>> At the end you wrote [1]:
>> "Well, in order for the whole thing to be security supported it needs to
>> be explained for every bit why it is safe to allow the guest to drive it.
>> Until you mean vPCI to reach that state, leaving TODO notes in the code
>> for anything not investigated may indeed be good enough.
>>
>> Jan"
>>
>> So, this is why I left a TODO in the PCI_COMMAND emulation for now and only
>> care about INTx which is honored with the code in this patch.
> Right. The issue I see is that the description does not have any
> mention of this, but instead talks about simply writing zero.
How do you want that mentioned? Extended commit message or
just a link to the thread [1]?
With the above done, do you think that writing 0's is an acceptable
approach as of now?
> Jan
>
Thank you,
Oleksandr

Re: [PATCH] xen/public: partially revert commit 7c7f7e8fba01

2022-02-07 Thread Juergen Gross

On 07.02.22 13:46, Jan Beulich wrote:

On 07.02.2022 12:00, Juergen Gross wrote:

On 07.02.22 11:46, Jan Beulich wrote:

On 07.02.2022 11:36, Juergen Gross wrote:

--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -662,7 +662,17 @@ struct xen_mem_acquire_resource {
* two calls.
*/
   uint32_t nr_frames;
-uint32_t pad;
+
+/*
+ * OUT - Must be zero on entry. On return this may contain a bitwise
+ *   OR of the following values.
+ */
+uint32_t flags;
+
+/* No longer supported - will be never set */
+#define _XENMEM_rsrc_acq_caller_owned 0
+#define XENMEM_rsrc_acq_caller_owned (1u << _XENMEM_rsrc_acq_caller_owned)


I think this goes too far: Neither do we want to re-introduce the
#define-s, nor should we re-fix the purpose of the padding field
to be OUT (only). All we need to make sure is that the field
coming in as zero won't get responded to by setting bit 0 of it.
Imo this can only reasonably be done by way of adding a comment.
This comment may, in turn, mention XENMEM_rsrc_acq_caller_owned
of course.


The kernel could be changed to no longer use that #define before
updating the header from Xen, but are we really sure there are no
other users, too?


Pretty sure. And I think in this case it's better to break the build
of consumers (so we're sure they'd notice, assuming they import the
header directly in the first place). It's rather an exceptional case
after all.


Okay, I'll just add a comment regarding the reserved bit then, without
reverting any part of commit 7c7f7e8fba01.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v2] dom0/pvh: fix processing softirqs during memory map population

2022-02-07 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 01:41:38PM +0100, Jan Beulich wrote:
> On 07.02.2022 12:20, Roger Pau Monne wrote:
> > Make sure softirqs are processed after every successful call to
> > guest_physmap_add_page. Even if only a single page is to be added,
> > it's unknown whether the p2m or the IOMMU will require splitting the
> > provided page into smaller ones, and thus in case of having to break
> > a 1G page into 4K entries the amount of time taken by a single of
> > those additions will be non-trivial. Stay on the safe side an check
> > for pending softirqs on ever successful loop iteration.
> > 
> > Fixes: 5427134eae ('x86: populate PVHv2 Dom0 physical memory map')
> > Signed-off-by: Roger Pau Monné 
> 
> Reviewed-by: Jan Beulich 
> 
> Nit: I guess it's "and" and "every" in the last sentence. I'd be
> happy to adjust while committing.

Yes please, if you are happy to adjust on commit.

Thanks, Roger.



Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 11:08:39AM +, Oleksandr Andrushchenko wrote:
> Hello,
> 
> On 04.02.22 16:57, Roger Pau Monné wrote:
> > On Fri, Feb 04, 2022 at 02:43:07PM +, Oleksandr Andrushchenko wrote:
> >>
> >> On 04.02.22 15:06, Roger Pau Monné wrote:
> >>> On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
>  On 04.02.22 14:47, Jan Beulich wrote:
> > On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 13:37, Jan Beulich wrote:
> >>> On 04.02.2022 12:13, Roger Pau Monné wrote:
>  On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> > On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 11:15, Jan Beulich wrote:
> >>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>  On 04.02.22 09:52, Jan Beulich wrote:
> > On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> >> @@ -285,6 +286,12 @@ static int modify_bars(const struct 
> >> pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>continue;
> >>}
> >>
> >> +spin_lock(>vpci_lock);
> >> +if ( !tmp->vpci )
> >> +{
> >> +spin_unlock(>vpci_lock);
> >> +continue;
> >> +}
> >>for ( i = 0; i < 
> >> ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> >>{
> >>const struct vpci_bar *bar = 
> >> >vpci->header.bars[i];
> >> @@ -303,12 +310,14 @@ static int modify_bars(const struct 
> >> pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>rc = rangeset_remove_range(mem, start, end);
> >>if ( rc )
> >>{
> >> +spin_unlock(>vpci_lock);
> >>printk(XENLOG_G_WARNING "Failed to 
> >> remove [%lx, %lx]: %d\n",
> >>   start, end, rc);
> >>rangeset_destroy(mem);
> >>return rc;
> >>}
> >>}
> >> +spin_unlock(>vpci_lock);
> >>}
> > At the first glance this simply looks like another unjustified 
> > (in the
> > description) change, as you're not converting anything here but 
> > you
> > actually add locking (and I realize this was there before, so 
> > I'm sorry
> > for not pointing this out earlier).
>  Well, I thought that the description already has "...the lock 
>  can be
>  used (and in a few cases is used right away) to check whether 
>  vpci
>  is present" and this is enough for such uses as here.
> >But then I wonder whether you
> > actually tested this, since I can't help getting the impression 
> > that
> > you're introducing a live-lock: The function is called from 
> > cmd_write()
> > and rom_write(), which in turn are called out of vpci_write(). 
> > Yet that
> > function already holds the lock, and the lock is not (currently)
> > recursive. (For the 3rd caller of the function - init_bars() - 
> > otoh
> > the locking looks to be entirely unnecessary.)
>  Well, you are correct: if tmp != pdev then it is correct to 
>  acquire
>  the lock. But if tmp == pdev and rom_only == true
>  then we'll deadlock.
> 
>  It seems we need to have the locking conditional, e.g. only lock
>  if tmp != pdev
> >>> Which will address the live-lock, but introduce ABBA deadlock 
> >>> potential
> >>> between the two locks.
> >> I am not sure I can suggest a better solution here
> >> @Roger, @Jan, could you please help here?
> > Well, first of all I'd like to mention that while it may have been 
> > okay to
> > not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
> > dealing
> > with DomU-s' lists of PCI devices. The requirement really applies 
> > to the
> > other use of for_each_pdev() as well (in vpci_dump_msi()), except 
> > that
> > there it probably wants to be a try-lock.
> >
> > Next I'd like to point out that here we have the still pending 
> > issue of
> > how to deal with hidden devices, which Dom0 can access. See my RFC 
> > patch
> > "vPCI: account for hidden devices in 

Re: [PATCH] xen/public: partially revert commit 7c7f7e8fba01

2022-02-07 Thread Jan Beulich
On 07.02.2022 12:00, Juergen Gross wrote:
> On 07.02.22 11:46, Jan Beulich wrote:
>> On 07.02.2022 11:36, Juergen Gross wrote:
>>> --- a/xen/include/public/memory.h
>>> +++ b/xen/include/public/memory.h
>>> @@ -662,7 +662,17 @@ struct xen_mem_acquire_resource {
>>>* two calls.
>>>*/
>>>   uint32_t nr_frames;
>>> -uint32_t pad;
>>> +
>>> +/*
>>> + * OUT - Must be zero on entry. On return this may contain a bitwise
>>> + *   OR of the following values.
>>> + */
>>> +uint32_t flags;
>>> +
>>> +/* No longer supported - will be never set */
>>> +#define _XENMEM_rsrc_acq_caller_owned 0
>>> +#define XENMEM_rsrc_acq_caller_owned (1u << _XENMEM_rsrc_acq_caller_owned)
>>
>> I think this goes too far: Neither do we want to re-introduce the
>> #define-s, nor should we re-fix the purpose of the padding field
>> to be OUT (only). All we need to make sure is that the field
>> coming in as zero won't get responded to by setting bit 0 of it.
>> Imo this can only reasonably be done by way of adding a comment.
>> This comment may, in turn, mention XENMEM_rsrc_acq_caller_owned
>> of course.
> 
> The kernel could be changed to no longer use that #define before
> updating the header from Xen, but are we really sure there are no
> other users, too?

Pretty sure. And I think in this case it's better to break the build
of consumers (so we're sure they'd notice, assuming they import the
header directly in the first place). It's rather an exceptional case
after all.

Jan




Re: [PATCH RFC 3/3] xen: Introduce a platform sub-op to retrieve the VGA information

2022-02-07 Thread Jan Beulich
On 07.02.2022 12:58, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 09:57:55AM +0100, Jan Beulich wrote:
>> On 06.02.2022 20:28, Julien Grall wrote:
>>> @@ -311,6 +312,7 @@ struct xenpf_firmware_info {
>>>  
>>>  /* Int16, Fn02: Get keyboard shift flags. */
>>>  uint8_t kbd_shift_flags; /* XEN_FW_KBD_SHIFT_FLAGS */
>>> +struct dom0_vga_console_info vga;
>>
>> ... the structure name including "vga" (but if the #define is adjusted,
>> the field name would want to become "video" as well).
> 
> It's my understanding that this will forcefully be
> XEN_VGATYPE_EFI_LFB, at which point we could consider giving a type
> name to the vesa_lfb field of dom0_vga_console_info (video_lfb) and
> use the same struct here?
> 
> There's no need to propagate XEN_VGATYPE_TEXT_MODE_3 into this
> interface.

Hmm, yes, this is probably better / more clean. Julien, thoughts?

Jan




Re: [PATCH v2] dom0/pvh: fix processing softirqs during memory map population

2022-02-07 Thread Jan Beulich
On 07.02.2022 12:20, Roger Pau Monne wrote:
> Make sure softirqs are processed after every successful call to
> guest_physmap_add_page. Even if only a single page is to be added,
> it's unknown whether the p2m or the IOMMU will require splitting the
> provided page into smaller ones, and thus in case of having to break
> a 1G page into 4K entries the amount of time taken by a single of
> those additions will be non-trivial. Stay on the safe side an check
> for pending softirqs on ever successful loop iteration.
> 
> Fixes: 5427134eae ('x86: populate PVHv2 Dom0 physical memory map')
> Signed-off-by: Roger Pau Monné 

Reviewed-by: Jan Beulich 

Nit: I guess it's "and" and "every" in the last sentence. I'd be
happy to adjust while committing.

Jan




Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-07 Thread Jan Beulich
On 07.02.2022 12:27, Oleksandr Andrushchenko wrote:
> 
> 
> On 07.02.22 09:29, Jan Beulich wrote:
>> On 04.02.2022 15:37, Oleksandr Andrushchenko wrote:
>>> On 04.02.22 16:30, Jan Beulich wrote:
 On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> Reset the command register when assigning a PCI device to a guest:
> according to the PCI spec the PCI_COMMAND register is typically all 0's
> after reset.
 It's not entirely clear to me whether setting the hardware register to
 zero is okay. What wants to be zero is the value the guest observes
 initially.
>>> "the PCI spec says the PCI_COMMAND register is typically all 0's after 
>>> reset."
>>> Why wouldn't it be ok? What is the exact concern here?
>> The concern is - as voiced is similar ways before, perhaps in other
>> contexts - that you need to consider bit-by-bit whether overwriting
>> with 0 what is currently there is okay. Xen and/or Dom0 may have put
>> values there which they expect to remain unaltered. I guess
>> PCI_COMMAND_SERR is a good example: While the guest's view of this
>> will want to be zero initially, the host having set it to 1 may not
>> easily be overwritten with 0, or else you'd effectively imply giving
>> the guest control of the bit.
> We have already discussed in great detail PCI_COMMAND emulation [1].
> At the end you wrote [1]:
> "Well, in order for the whole thing to be security supported it needs to
> be explained for every bit why it is safe to allow the guest to drive it.
> Until you mean vPCI to reach that state, leaving TODO notes in the code
> for anything not investigated may indeed be good enough.
> 
> Jan"
> 
> So, this is why I left a TODO in the PCI_COMMAND emulation for now and only
> care about INTx which is honored with the code in this patch.

Right. The issue I see is that the description does not have any
mention of this, but instead talks about simply writing zero.

Jan




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Jan Beulich
On 07.02.2022 12:08, Oleksandr Andrushchenko wrote:
> 1. vpci_{read|write} are not protected with pcidevs_lock and can run in
> parallel with pci_remove_device which can remove pdev after vpci_{read|write}
> acquired the pdev pointer. This may lead to a fail due to pdev dereference.
> 
> So, to protect pdev dereference vpci_{read|write} must also use pdevs_lock.

I think this is not the only place where there is a theoretical race
against pci_remove_device(). I would recommend to separate the
overall situation with pcidevs_lock from the issue here. I don't view
it as an option to acquire pcidevs_lock in vpci_{read,write}(). If
anything, we need proper refcounting of PCI devices (at which point
likely a number of lock uses can go away).

Jan




Re: [PATCH RFC 3/3] xen: Introduce a platform sub-op to retrieve the VGA information

2022-02-07 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 09:57:55AM +0100, Jan Beulich wrote:
> On 06.02.2022 20:28, Julien Grall wrote:
> > From: Julien Grall 
> > 
> > When using EFI, the VGA information is fetched using the EFI
> > boot services. However, Xen will have exited the boot services.
> > Therefore, we need to find a different way to pass the information
> > to dom0.
> > 
> > For PV dom0, they are part of the start_info. But this is not
> > something that exists on Arm. So the best way would to be to
> > use a hypercall.
> > 
> > For now the structure layout is based on dom0_vga_console_info
> > for convenience. I am open on another proposal.
> > 
> > Signed-off-by: Julien Grall 
> 
> Cc-ing Roger as this may want using for PVH Dom0 also on x86; my
> first attempt to propagate this information was rejected.

I think it's easier to use a Xen specific layout in XENPF, as that's
already a Xen specific interface.

I wonder however if passing the information here (instead of doing it
in the start info or equivalent) could cause a delay in the
initialization of the video console. I guess the same happens when
using the Xen consoles (either the hypercall one or the shared ring),
so it's fine.

> > --- a/xen/include/public/platform.h
> > +++ b/xen/include/public/platform.h
> > @@ -244,6 +244,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_efi_runtime_call_t);
> >  #define  XEN_FW_EFI_PCI_ROM5
> >  #define  XEN_FW_EFI_APPLE_PROPERTIES 6
> >  #define XEN_FW_KBD_SHIFT_FLAGS5
> > +#define XEN_FW_VGA_INFO   6
> 
> Perhaps s/VGA/VIDEO/, despite ...
> 
> >  struct xenpf_firmware_info {
> >  /* IN variables. */
> >  uint32_t type;
> > @@ -311,6 +312,7 @@ struct xenpf_firmware_info {
> >  
> >  /* Int16, Fn02: Get keyboard shift flags. */
> >  uint8_t kbd_shift_flags; /* XEN_FW_KBD_SHIFT_FLAGS */
> > +struct dom0_vga_console_info vga;
> 
> ... the structure name including "vga" (but if the #define is adjusted,
> the field name would want to become "video" as well).

It's my understanding that this will forcefully be
XEN_VGATYPE_EFI_LFB, at which point we could consider giving a type
name to the vesa_lfb field of dom0_vga_console_info (video_lfb) and
use the same struct here?

There's no need to propagate XEN_VGATYPE_TEXT_MODE_3 into this
interface.

Thanks, Roger.



Re: [PATCH v6 10/13] vpci/header: reset the command register when adding devices

2022-02-07 Thread Oleksandr Andrushchenko


On 07.02.22 09:29, Jan Beulich wrote:
> On 04.02.2022 15:37, Oleksandr Andrushchenko wrote:
>> On 04.02.22 16:30, Jan Beulich wrote:
>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
 Reset the command register when assigning a PCI device to a guest:
 according to the PCI spec the PCI_COMMAND register is typically all 0's
 after reset.
>>> It's not entirely clear to me whether setting the hardware register to
>>> zero is okay. What wants to be zero is the value the guest observes
>>> initially.
>> "the PCI spec says the PCI_COMMAND register is typically all 0's after 
>> reset."
>> Why wouldn't it be ok? What is the exact concern here?
> The concern is - as voiced is similar ways before, perhaps in other
> contexts - that you need to consider bit-by-bit whether overwriting
> with 0 what is currently there is okay. Xen and/or Dom0 may have put
> values there which they expect to remain unaltered. I guess
> PCI_COMMAND_SERR is a good example: While the guest's view of this
> will want to be zero initially, the host having set it to 1 may not
> easily be overwritten with 0, or else you'd effectively imply giving
> the guest control of the bit.
We have already discussed in great detail PCI_COMMAND emulation [1].
At the end you wrote [1]:
"Well, in order for the whole thing to be security supported it needs to
be explained for every bit why it is safe to allow the guest to drive it.
Until you mean vPCI to reach that state, leaving TODO notes in the code
for anything not investigated may indeed be good enough.

Jan"

So, this is why I left a TODO in the PCI_COMMAND emulation for now and only
care about INTx which is honored with the code in this patch.
>
> Jan
>

Thank you,
Oleksandr

[1] 
https://patchwork.kernel.org/project/xen-devel/patch/20210903100831.177748-9-andr2...@gmail.com/
[2] https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg00737.html

[xen-unstable test] 168037: tolerable FAIL

2022-02-07 Thread osstest service owner
flight 168037 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168037/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-libvirt-xsm  20 guest-start/debian.repeat  fail pass in 168032
 test-armhf-armhf-xl-rtds 18 guest-start/debian.repeat  fail pass in 168032

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 168032
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 168032
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 168032
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 168032
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install fail like 
168032
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 168032
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 168032
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 168032
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 168032
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 168032
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 168032
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 168032
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 168032
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 

[PATCH v2] dom0/pvh: fix processing softirqs during memory map population

2022-02-07 Thread Roger Pau Monne
Make sure softirqs are processed after every successful call to
guest_physmap_add_page. Even if only a single page is to be added,
it's unknown whether the p2m or the IOMMU will require splitting the
provided page into smaller ones, and thus in case of having to break
a 1G page into 4K entries the amount of time taken by a single of
those additions will be non-trivial. Stay on the safe side an check
for pending softirqs on ever successful loop iteration.

Fixes: 5427134eae ('x86: populate PVHv2 Dom0 physical memory map')
Signed-off-by: Roger Pau Monné 
---
Changes since v1:
 - Unconditionally process softirqs after every successful loop
   iteration.
---
 xen/arch/x86/hvm/dom0_build.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 549ff8ec7c..cbc28113cb 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -114,10 +114,9 @@ static int __init pvh_populate_memory_range(struct domain 
*d,
 { .align = PFN_DOWN(MB(2)), .order = PAGE_ORDER_2M },
 { .align = PFN_DOWN(KB(4)), .order = PAGE_ORDER_4K },
 };
-unsigned int max_order = MAX_ORDER, i = 0;
+unsigned int max_order = MAX_ORDER;
 struct page_info *page;
 int rc;
-#define MAP_MAX_ITER 64
 
 while ( nr_pages != 0 )
 {
@@ -186,12 +185,16 @@ static int __init pvh_populate_memory_range(struct domain 
*d,
 start += 1UL << order;
 nr_pages -= 1UL << order;
 order_stats[order]++;
-if ( (++i % MAP_MAX_ITER) == 0 )
-process_pending_softirqs();
+/*
+ * Process pending softirqs on every successful loop: it's unknown
+ * whether the p2m/IOMMU code will have split the page into multiple
+ * smaller entries, and thus the time consumed would be much higher
+ * than populating a single entry.
+ */
+process_pending_softirqs();
 }
 
 return 0;
-#undef MAP_MAX_ITER
 }
 
 /* Steal RAM from the end of a memory region. */
-- 
2.34.1




Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

2022-02-07 Thread Oleksandr Andrushchenko
Hello,

On 04.02.22 16:57, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 02:43:07PM +, Oleksandr Andrushchenko wrote:
>>
>> On 04.02.22 15:06, Roger Pau Monné wrote:
>>> On Fri, Feb 04, 2022 at 12:53:20PM +, Oleksandr Andrushchenko wrote:
 On 04.02.22 14:47, Jan Beulich wrote:
> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
>> On 04.02.22 13:37, Jan Beulich wrote:
>>> On 04.02.2022 12:13, Roger Pau Monné wrote:
 On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>> On 04.02.22 11:15, Jan Beulich wrote:
>>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
 On 04.02.22 09:52, Jan Beulich wrote:
> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev 
>> *pdev, uint16_t cmd, bool rom_only)
>>continue;
>>}
>>
>> +spin_lock(>vpci_lock);
>> +if ( !tmp->vpci )
>> +{
>> +spin_unlock(>vpci_lock);
>> +continue;
>> +}
>>for ( i = 0; i < 
>> ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>{
>>const struct vpci_bar *bar = 
>> >vpci->header.bars[i];
>> @@ -303,12 +310,14 @@ static int modify_bars(const struct 
>> pci_dev *pdev, uint16_t cmd, bool rom_only)
>>rc = rangeset_remove_range(mem, start, end);
>>if ( rc )
>>{
>> +spin_unlock(>vpci_lock);
>>printk(XENLOG_G_WARNING "Failed to remove 
>> [%lx, %lx]: %d\n",
>>   start, end, rc);
>>rangeset_destroy(mem);
>>return rc;
>>}
>>}
>> +spin_unlock(>vpci_lock);
>>}
> At the first glance this simply looks like another unjustified 
> (in the
> description) change, as you're not converting anything here but 
> you
> actually add locking (and I realize this was there before, so I'm 
> sorry
> for not pointing this out earlier).
 Well, I thought that the description already has "...the lock can 
 be
 used (and in a few cases is used right away) to check whether vpci
 is present" and this is enough for such uses as here.
>But then I wonder whether you
> actually tested this, since I can't help getting the impression 
> that
> you're introducing a live-lock: The function is called from 
> cmd_write()
> and rom_write(), which in turn are called out of vpci_write(). 
> Yet that
> function already holds the lock, and the lock is not (currently)
> recursive. (For the 3rd caller of the function - init_bars() - 
> otoh
> the locking looks to be entirely unnecessary.)
 Well, you are correct: if tmp != pdev then it is correct to acquire
 the lock. But if tmp == pdev and rom_only == true
 then we'll deadlock.

 It seems we need to have the locking conditional, e.g. only lock
 if tmp != pdev
>>> Which will address the live-lock, but introduce ABBA deadlock 
>>> potential
>>> between the two locks.
>> I am not sure I can suggest a better solution here
>> @Roger, @Jan, could you please help here?
> Well, first of all I'd like to mention that while it may have been 
> okay to
> not hold pcidevs_lock here for Dom0, it surely needs acquiring when 
> dealing
> with DomU-s' lists of PCI devices. The requirement really applies to 
> the
> other use of for_each_pdev() as well (in vpci_dump_msi()), except that
> there it probably wants to be a try-lock.
>
> Next I'd like to point out that here we have the still pending issue 
> of
> how to deal with hidden devices, which Dom0 can access. See my RFC 
> patch
> "vPCI: account for hidden devices in modify_bars()". Whatever the 
> solution
> here, I think it wants to at least account for the extra need there.
 Yes, sorry, I should take care of that.

> Now it is quite clear that pcidevs_lock isn't going to help with 
> avoiding
> the deadlock, as it's 

Re: [PATCH] xen/public: partially revert commit 7c7f7e8fba01

2022-02-07 Thread Juergen Gross

On 07.02.22 11:46, Jan Beulich wrote:

On 07.02.2022 11:36, Juergen Gross wrote:

Commit 7c7f7e8fba01 changed xen/include/public/memory.h in an incompatible
way. Unfortunately the changed parts were already in use in the Linux
kernel, so an update of the header in the kernel would result in a build
breakage.

Even when removing its usage from the kernel the used flag bit should be
marked as reserved in order to avoid to give it a different semantic in
the future.

Do a partial revert of said commit in order to enable the kernel to take
an updated version of memory.h.


I don't think it should be a partial revert, and as said on irc I'm of
the opinion that ...


Fixes: 7c7f7e8fba01 ("include/public/memory.h: remove the 
XENMEM_rsrc_acq_caller_owned flag")


... it's 0e2e54966af5 which should have taken measures to protect
against re-use of the flag as an output.


The design of that feature was flawed from the beginning, as it was used
in the kernel right away. So I think the initial revert was the
effective start of the problem.




--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -662,7 +662,17 @@ struct xen_mem_acquire_resource {
   * two calls.
   */
  uint32_t nr_frames;
-uint32_t pad;
+
+/*
+ * OUT - Must be zero on entry. On return this may contain a bitwise
+ *   OR of the following values.
+ */
+uint32_t flags;
+
+/* No longer supported - will be never set */
+#define _XENMEM_rsrc_acq_caller_owned 0
+#define XENMEM_rsrc_acq_caller_owned (1u << _XENMEM_rsrc_acq_caller_owned)


I think this goes too far: Neither do we want to re-introduce the
#define-s, nor should we re-fix the purpose of the padding field
to be OUT (only). All we need to make sure is that the field
coming in as zero won't get responded to by setting bit 0 of it.
Imo this can only reasonably be done by way of adding a comment.
This comment may, in turn, mention XENMEM_rsrc_acq_caller_owned
of course.


The kernel could be changed to no longer use that #define before
updating the header from Xen, but are we really sure there are no
other users, too?

I'm fine doing it that way, but I think I should spell out the
consequences of that decision.


Btw., if the field was to become OUT-only again, I think you'd
also need to revert the change to xen/common/compat/memory.c. At
least to not leave a trap for someone to later fall into.


Okay, if you like that better.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Metadata and signalling channels for Zephyr virtio-backends on Xen

2022-02-07 Thread Alex Bennée


Hi Stefano,

Vincent gave an update on his virtio-scmi work at the last Stratos sync
call and the discussion moved onto next steps. Currently the demo setup
is intermediated by a double-ended vhost-user daemon running on the
devbox acting as a go between a number of QEMU instances representing
the front and back-ends. You can view the architecture with Vincents
diagram here:

  
https://docs.google.com/drawings/d/1YSuJUSjEdTi2oEUq4oG4A9pBKSEJTAp6hhcHKKhmYHs/edit?usp=sharing

The key virtq handling is done over the special carve outs of shared
memory between the front end and guest. However the signalling is
currently over a virtio device on the backend. This is useful for the
PoC but obviously in a real system we don't have a hidden POSIX system
acting as a go between not to mention the additional latency it causes
with all those context switches.

I was hoping we could get some more of the Xen experts to the next
Stratos sync (17th Feb) to go over approaches for a properly hosted on
Xen approach. From my recollection (Vincent please correct me if I'm
wrong) of last week the issues that need solving are:

 * How to handle configuration steps as FE guests come up

The SCMI server will be a long running persistent backend because it is
managing real HW resources. However the guests may be ephemeral (or just
restarted) so we can't just hard-code everything in a DTB. While the
virtio-negotiation in the config space covers most things we still need
information like where in the guests address space the shared memory
lives and at what offset into that the queues are created. As far as I'm
aware the canonical source of domain information is XenStore
(https://wiki.xenproject.org/wiki/XenStore) but this relies on a Dom0
type approach. Is there an alternative for dom0less systems or do we
need a dom0-light approach, for example using STR-21 (Ensure Zephyr can
run cleanly as a Dom0 guest) providing just enough services for FE's to
register metadata and BE's to read it?

 * How to handle mapping of memory

AIUI the Xen model is the FE guest explicitly makes grant table requests
to expose portions of it's memory to other domains. Can the BE query the
hypervisor itself to discover the available grants or does it require
coordination with Dom0/XenStore for that information to be available to
the BE domain?

 * How to handle signalling

I guess this requires a minimal implementation of the IOREQ calls for
Zephyr so we can register the handler in the backend? Does the IOREQ API
allow for a IPI style notifications using the global GIC IRQs?

Forgive the incomplete notes from the Stratos sync, I was trying to type
while participating in the discussion so hopefully this email captures
what was missed:

  
https://linaro.atlassian.net/wiki/spaces/STR/pages/28682518685/2022-02-03+Project+Stratos+Sync+Meeting+Notes

Vincent, anything to add?

-- 
Alex Bennée



Re: [PATCH] xen/public: partially revert commit 7c7f7e8fba01

2022-02-07 Thread Jan Beulich
On 07.02.2022 11:36, Juergen Gross wrote:
> Commit 7c7f7e8fba01 changed xen/include/public/memory.h in an incompatible
> way. Unfortunately the changed parts were already in use in the Linux
> kernel, so an update of the header in the kernel would result in a build
> breakage.
> 
> Even when removing its usage from the kernel the used flag bit should be
> marked as reserved in order to avoid to give it a different semantic in
> the future.
> 
> Do a partial revert of said commit in order to enable the kernel to take
> an updated version of memory.h.

I don't think it should be a partial revert, and as said on irc I'm of
the opinion that ...

> Fixes: 7c7f7e8fba01 ("include/public/memory.h: remove the 
> XENMEM_rsrc_acq_caller_owned flag")

... it's 0e2e54966af5 which should have taken measures to protect
against re-use of the flag as an output.

> --- a/xen/include/public/memory.h
> +++ b/xen/include/public/memory.h
> @@ -662,7 +662,17 @@ struct xen_mem_acquire_resource {
>   * two calls.
>   */
>  uint32_t nr_frames;
> -uint32_t pad;
> +
> +/*
> + * OUT - Must be zero on entry. On return this may contain a bitwise
> + *   OR of the following values.
> + */
> +uint32_t flags;
> +
> +/* No longer supported - will be never set */
> +#define _XENMEM_rsrc_acq_caller_owned 0
> +#define XENMEM_rsrc_acq_caller_owned (1u << _XENMEM_rsrc_acq_caller_owned)

I think this goes too far: Neither do we want to re-introduce the
#define-s, nor should we re-fix the purpose of the padding field
to be OUT (only). All we need to make sure is that the field
coming in as zero won't get responded to by setting bit 0 of it.
Imo this can only reasonably be done by way of adding a comment.
This comment may, in turn, mention XENMEM_rsrc_acq_caller_owned
of course.

Btw., if the field was to become OUT-only again, I think you'd
also need to revert the change to xen/common/compat/memory.c. At
least to not leave a trap for someone to later fall into.

Jan




Re: [PATCH] xen/x2apic: Fix inconsistent indenting

2022-02-07 Thread Juergen Gross

On 07.02.22 11:35, Jiapeng Chong wrote:

Eliminate the follow smatch warning:

arch/x86/xen/enlighten_hvm.c:189 xen_cpu_dead_hvm() warn: inconsistent
indenting.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 


Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[ovmf test] 168042: all pass - PUSHED

2022-02-07 Thread osstest service owner
flight 168042 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/168042/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf f78b937c95ddc4f7a29e41fee98e96076828a108
baseline version:
 ovmf 6fb09da89f88000a7592171a0ce08cf1feaa0646

Last test of basis   168038  2022-02-07 01:55:23 Z0 days
Testing same since   168042  2022-02-07 06:41:36 Z0 days1 attempts


People who touched revisions under test:
  Gerd Hoffmann 
  Jake Garver 
  Jake Garver via groups.io 
  Wei6 Xu 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   6fb09da89f..f78b937c95  f78b937c95ddc4f7a29e41fee98e96076828a108 -> 
xen-tested-master



[PATCH] xen/public: partially revert commit 7c7f7e8fba01

2022-02-07 Thread Juergen Gross
Commit 7c7f7e8fba01 changed xen/include/public/memory.h in an incompatible
way. Unfortunately the changed parts were already in use in the Linux
kernel, so an update of the header in the kernel would result in a build
breakage.

Even when removing its usage from the kernel the used flag bit should be
marked as reserved in order to avoid to give it a different semantic in
the future.

Do a partial revert of said commit in order to enable the kernel to take
an updated version of memory.h.

Fixes: 7c7f7e8fba01 ("include/public/memory.h: remove the 
XENMEM_rsrc_acq_caller_owned flag")
Signed-off-by: Juergen Gross 
---
 xen/common/memory.c |  2 +-
 xen/include/public/memory.h | 12 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 0d7c413df8..9b5214a8a9 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1234,7 +1234,7 @@ static int acquire_resource(
 if ( copy_from_guest(, arg, 1) )
 return -EFAULT;
 
-if ( xmar.pad != 0 )
+if ( xmar.flags != 0 )
 return -EINVAL;
 
 /*
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 383a9468c3..fd768e0b7b 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -662,7 +662,17 @@ struct xen_mem_acquire_resource {
  * two calls.
  */
 uint32_t nr_frames;
-uint32_t pad;
+
+/*
+ * OUT - Must be zero on entry. On return this may contain a bitwise
+ *   OR of the following values.
+ */
+uint32_t flags;
+
+/* No longer supported - will be never set */
+#define _XENMEM_rsrc_acq_caller_owned 0
+#define XENMEM_rsrc_acq_caller_owned (1u << _XENMEM_rsrc_acq_caller_owned)
+
 /*
  * IN - the index of the initial frame to be mapped. This parameter
  *  is ignored if nr_frames is 0.  This value may be updated
-- 
2.34.1




[PATCH] xen/x2apic: Fix inconsistent indenting

2022-02-07 Thread Jiapeng Chong
Eliminate the follow smatch warning:

arch/x86/xen/enlighten_hvm.c:189 xen_cpu_dead_hvm() warn: inconsistent
indenting.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 arch/x86/xen/enlighten_hvm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 6448c5071117..6f4c9b57eda8 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -185,8 +185,7 @@ static int xen_cpu_dead_hvm(unsigned int cpu)
 
if (xen_have_vector_callback && xen_feature(XENFEAT_hvm_safe_pvclock))
xen_teardown_timer(cpu);
-
-   return 0;
+   return 0;
 }
 
 static bool no_vector_callback __initdata;
-- 
2.20.1.7.g153144c




Re: [PATCH 08/16] x86/P2M: PoD, altp2m, and nested-p2m are HVM-only

2022-02-07 Thread Jan Beulich
On 05.02.2022 22:29, George Dunlap wrote:
>> On Jul 5, 2021, at 5:09 PM, Jan Beulich  wrote:
>> --- a/xen/arch/x86/mm/p2m-pod.c
>> +++ b/xen/arch/x86/mm/p2m-pod.c
>> @@ -1135,6 +1135,12 @@ p2m_pod_demand_populate(struct p2m_domai
>> mfn_t mfn;
>> unsigned long i;
>>
>> +if ( !p2m_is_hostp2m(p2m) )
>> +{
>> +ASSERT_UNREACHABLE();
>> +return false;
>> +}
>> +
>> ASSERT(gfn_locked_by_me(p2m, gfn));
>> pod_lock(p2m);
> 
> Why this check rather than something which explicitly says HVM?

Checking for just HVM is too lax here imo. PoD operations should
never be invoked for alternative or nested p2ms; see the various
uses of p2m_get_hostp2m() in p2m-pod.c. However, looking at the
call sites again, I no longer see why I did put in
ASSERT_UNREACHABLE() here. IOW ...

> If you really mean to check for HVM here but are just using this as a 
> shortcut, it needs a comment.

... it's not just a shortcut, yet it feels as if even then you'd
want a comment attached. I'm not really sure though what such a
comment might say which goes beyond what the use p2m_is_hostp2m()
already communicates.

> With that addressed:
> 
> Reviewed-by: George Dunlap 

Thanks, but as per above I'll wait with making use of this.

Jan




Re: [PATCH] dom0/pvh: fix processing softirqs during memory map population

2022-02-07 Thread Jan Beulich
On 07.02.2022 10:51, Roger Pau Monné wrote:
> On Mon, Feb 07, 2022 at 09:21:11AM +0100, Jan Beulich wrote:
>> On 05.02.2022 11:18, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/hvm/dom0_build.c
>>> +++ b/xen/arch/x86/hvm/dom0_build.c
>>> @@ -186,7 +186,7 @@ static int __init pvh_populate_memory_range(struct 
>>> domain *d,
>>>  start += 1UL << order;
>>>  nr_pages -= 1UL << order;
>>>  order_stats[order]++;
>>> -if ( (++i % MAP_MAX_ITER) == 0 )
>>> +if ( (i++ % MAP_MAX_ITER) == 0 )
>>>  process_pending_softirqs();
>>>  }
>>
>> This way is perhaps easiest, so
>>
>> Acked-by: Jan Beulich 
>>
>> but I'd like you to consider to avoid doing this on the first
>> iteration. How about keeping the code here as is, but instead
>> insert an invocation in the sole caller (and there unconditionally
>> at the end of every successful loop iteration)?
> 
> In fact I was thinking that we should call process_pending_softirqs on
> every iteration: the calls to guest_physmap_add_page could use a 1G
> page order, so if not using sync-pt (at least until your series for
> IOMMU super-page support is committed) mapping a whole 1G page using
> 4K chunks on the IOMMU page-tables could be quite time consuming, and
> hence we would likely need to process softirqs on every iteration.

Good point; please do so.

Jan




Re: [PATCH 03/16] x86/P2M: drop a few CONFIG_HVM

2022-02-07 Thread Jan Beulich
On 04.02.2022 23:13, George Dunlap wrote:
> On Mon, Jul 5, 2021 at 5:06 PM Jan Beulich  wrote:
> 
>> This is to make it easier to see which parts of p2m.c still aren't HVM-
>> specific: In one case the conditionals sat in an already guarded region,
>> while in the other case P2M_AUDIT implies HVM.
>>
> 
> I think this would be much more easy to understand what's going on if it
> was more like this:
> 
> ---
> x86/p2m: P2M_AUDIT implies CONFIG_HVM
> 
> Remove one #endif / #ifdef CONFIG_HVM pair to make all the audit code
> CONFIG_HVM only.  This is to make it easier to see which parts of p2m.c
> still aren't HVM-specific.
> 
> While here, remove a redundant set of nested #ifdef CONFIG_HVM.
> ---
> 
> Reviewed-by: George Dunlap 

Thanks. Unless you tell me otherwise I'll assume the changed title and
description are merely a suggestion, not a requirement for your R-b to
apply. I continue to like my variant better; in particular I'd like to
not mention P2M_AUDIT in the title and this way avoid "While here, ..."
or alike.

Jan




Re: [PATCH] dom0/pvh: fix processing softirqs during memory map population

2022-02-07 Thread Roger Pau Monné
On Mon, Feb 07, 2022 at 09:21:11AM +0100, Jan Beulich wrote:
> On 05.02.2022 11:18, Roger Pau Monne wrote:
> > Make sure softirqs are processed at least once for every call to
> > pvh_populate_memory_range. It's likely that none of the calls to
> > pvh_populate_memory_range will perform 64 iterations, in which case
> > softirqs won't be processed for the whole duration of the p2m
> > population.
> > 
> > In order to force softirqs to be processed at least once for every
> > pvh_populate_memory_range call move the increasing of 'i' to be done
> > after evaluation, so on the first loop iteration softirqs will
> > unconditionally be processed.
> 
> Nit: The change still guarantees one invocation only for every
> iteration not encountering an error. That's fine from a functional
> pov, but doesn't fully match what you claim.

OK, will fix on next iteration.

> > --- a/xen/arch/x86/hvm/dom0_build.c
> > +++ b/xen/arch/x86/hvm/dom0_build.c
> > @@ -186,7 +186,7 @@ static int __init pvh_populate_memory_range(struct 
> > domain *d,
> >  start += 1UL << order;
> >  nr_pages -= 1UL << order;
> >  order_stats[order]++;
> > -if ( (++i % MAP_MAX_ITER) == 0 )
> > +if ( (i++ % MAP_MAX_ITER) == 0 )
> >  process_pending_softirqs();
> >  }
> 
> This way is perhaps easiest, so
> 
> Acked-by: Jan Beulich 
> 
> but I'd like you to consider to avoid doing this on the first
> iteration. How about keeping the code here as is, but instead
> insert an invocation in the sole caller (and there unconditionally
> at the end of every successful loop iteration)?

In fact I was thinking that we should call process_pending_softirqs on
every iteration: the calls to guest_physmap_add_page could use a 1G
page order, so if not using sync-pt (at least until your series for
IOMMU super-page support is committed) mapping a whole 1G page using
4K chunks on the IOMMU page-tables could be quite time consuming, and
hence we would likely need to process softirqs on every iteration.

> 
> Furthermore, how about taking the opportunity and deleting the mis-
> named and single-use-only MAP_MAX_ITER define?

Right, let me know your opinion about the comment above.

Thanks, Roger.



Re: [PATCH 02/16] x86/P2M: introduce p2m_{add,remove}_page()

2022-02-07 Thread Jan Beulich
On 04.02.2022 23:07, George Dunlap wrote:
> On Mon, Jul 5, 2021 at 5:06 PM Jan Beulich  wrote:
> 
>> p2m_add_page() is simply a rename from guest_physmap_add_entry().
>> p2m_remove_page() then is its counterpart, despite rendering
>> guest_physmap_remove_page().

First of all: It has been long ago that I noticed that this sentence
misses words. It now ends "...  a trivial wrapper."

>> This way callers can use suitable pairs of
>> functions (previously violated by hvm/grant_table.c).
>>
> 
> Obviously this needs some clarification.  While we're here, I find this a
> bit confusing; I tend to use the present tense for the way the code is
> before the patch, and the imperative for what the patch does; so Id' say:
> 
> Rename guest_physmap_add_entry() to p2m_add_page; make
> guest_physmap_remove_page a wrapper with p2m_remove_page.  That way callers
> can use suitable pairs...

Well, yes, I understand you might word it this way. I'm not convinced
of the fixed scheme you mention for present vs imperative use to be a
universal fit though, requiring to always be followed. When reading
the description with the title in mind (and with the previously missing
words added), I find the use of present tense quite reasonable here.
I'm further slightly puzzled by you keeping the use of present tense in
"That way callers can use ...".

Jan




Re: [PATCH 01/16] x86/P2M: rename p2m_remove_page()

2022-02-07 Thread Jan Beulich
On 04.02.2022 22:54, George Dunlap wrote:
> On Mon, Jul 5, 2021 at 5:05 PM Jan Beulich  wrote:
> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -788,8 +788,8 @@ void p2m_final_teardown(struct domain *d
>>  #ifdef CONFIG_HVM
>>
>>  static int __must_check
>> -p2m_remove_page(struct p2m_domain *p2m, gfn_t gfn, mfn_t mfn,
>> -unsigned int page_order)
>> +p2m_remove_entry(struct p2m_domain *p2m, gfn_t gfn, mfn_t mfn,
>> + unsigned int page_order)
>>
> 
> One question that's naturally raised for both this and the following patch
> is, what is the new naming "scheme" for these renamed functions, and how do
> they relate to the old scheme?
> 
> Overall it seems like the intention is that "guest_physmap_..." can be
> called on a domain which may be PV or HVM, while "p2m_..." should only be
> called on HVM domains.

Yes. I think by the end of the series all p2m_...() named functions
pertain to HVM domains only.

> There's also "..._entry" vs "..._page".  Is the p2m_remove_page /
> p2m_remove_entry distinction have a meaning, and is it the same meaning as
> guest_physmap_add_page / guest_physmap_add_entry?

In the next patch a pair p2m_{add,remove}_page() is introduced.
p2m_remove_entry() remains a static helper for the latter of the two,
assuming the GFN is already locked. I've used the "page" vs "entry" in
the names just like it was used prior to patch 2; I'd be happy to take
suggestions on what else could be used in place of "entry" (but I'd
like to stick to "page").

Jan




Re: [PATCH 2/2] xen/x86: detect support for extended destination ID

2022-02-07 Thread Juergen Gross

On 20.01.22 16:25, Roger Pau Monne wrote:

Xen allows the usage of some previously reserved bits in the IO-APIC
RTE and the MSI address fields in order to store high bits for the
target APIC ID. Such feature is already implemented by QEMU/KVM and
HyperV, so in order to enable it just add the handler that checks for
it's presence.

Signed-off-by: Roger Pau Monné 


Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH RFC 3/3] xen: Introduce a platform sub-op to retrieve the VGA information

2022-02-07 Thread Jan Beulich
On 06.02.2022 20:28, Julien Grall wrote:
> From: Julien Grall 
> 
> When using EFI, the VGA information is fetched using the EFI
> boot services. However, Xen will have exited the boot services.
> Therefore, we need to find a different way to pass the information
> to dom0.
> 
> For PV dom0, they are part of the start_info. But this is not
> something that exists on Arm. So the best way would to be to
> use a hypercall.
> 
> For now the structure layout is based on dom0_vga_console_info
> for convenience. I am open on another proposal.
> 
> Signed-off-by: Julien Grall 

Cc-ing Roger as this may want using for PVH Dom0 also on x86; my
first attempt to propagate this information was rejected.

> --- a/xen/include/public/platform.h
> +++ b/xen/include/public/platform.h
> @@ -244,6 +244,7 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_efi_runtime_call_t);
>  #define  XEN_FW_EFI_PCI_ROM5
>  #define  XEN_FW_EFI_APPLE_PROPERTIES 6
>  #define XEN_FW_KBD_SHIFT_FLAGS5
> +#define XEN_FW_VGA_INFO   6

Perhaps s/VGA/VIDEO/, despite ...

>  struct xenpf_firmware_info {
>  /* IN variables. */
>  uint32_t type;
> @@ -311,6 +312,7 @@ struct xenpf_firmware_info {
>  
>  /* Int16, Fn02: Get keyboard shift flags. */
>  uint8_t kbd_shift_flags; /* XEN_FW_KBD_SHIFT_FLAGS */
> +struct dom0_vga_console_info vga;

... the structure name including "vga" (but if the #define is adjusted,
the field name would want to become "video" as well).

Jan




Re: [PATCH RFC 2/3] xen/arm: efi: Introduce and fill the vga_console_info

2022-02-07 Thread Jan Beulich
On 06.02.2022 20:28, Julien Grall wrote:
> From: Julien Grall 
> 
> In a follow-up patch will we want to add support for EFI framebuffer
> in dom0. Yet, Xen may not use the framebuffer, so it would be ideal
> to not have to enable CONFIG_VIDEO/CONFIG_VGA.
> 
> Introduce vga_console_info in a hacky way and move the code
> to fill it up from x86 to common.
> 
> Signed-off-by: Julien Grall 
> 
> 
> 
> This is a bit of a hack. Sent early to gather opinion on whether
> we should enable allow Dom0 to use the EFI Framebuffer even
> if Xen is built with CONFIG_VIDEO=n on Arm.

I have no input here; this will need to be settled among you Arm folks.
I have no objection to the code movement, just one nit:

> @@ -1025,6 +1026,77 @@ static void __init efi_get_apple_properties(void)
>  }
>  }
>  
> +static void __init efi_video_init(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
> +  UINTN info_size,
> +  EFI_GRAPHICS_OUTPUT_MODE_INFORMATION 
> *mode_info)
> +{
> +#if defined(CONFIG_VIDEO) || defined(CONFIG_ARM)
> +int bpp = 0;
> +
> +switch ( mode_info->PixelFormat )
> +{
> +case PixelRedGreenBlueReserved8BitPerColor:
> +vga_console_info.u.vesa_lfb.red_pos = 0;
> +vga_console_info.u.vesa_lfb.red_size = 8;
> +vga_console_info.u.vesa_lfb.green_pos = 8;
> +vga_console_info.u.vesa_lfb.green_size = 8;
> +vga_console_info.u.vesa_lfb.blue_pos = 16;
> +vga_console_info.u.vesa_lfb.blue_size = 8;
> +vga_console_info.u.vesa_lfb.rsvd_pos = 24;
> +vga_console_info.u.vesa_lfb.rsvd_size = 8;
> +bpp = 32;
> +break;
> +case PixelBlueGreenRedReserved8BitPerColor:
> +vga_console_info.u.vesa_lfb.red_pos = 16;
> +vga_console_info.u.vesa_lfb.red_size = 8;
> +vga_console_info.u.vesa_lfb.green_pos = 8;
> +vga_console_info.u.vesa_lfb.green_size = 8;
> +vga_console_info.u.vesa_lfb.blue_pos = 0;
> +vga_console_info.u.vesa_lfb.blue_size = 8;
> +vga_console_info.u.vesa_lfb.rsvd_pos = 24;
> +vga_console_info.u.vesa_lfb.rsvd_size = 8;
> +bpp = 32;
> +break;
> +case PixelBitMask:
> +bpp = set_color(mode_info->PixelInformation.RedMask, bpp,
> +_console_info.u.vesa_lfb.red_pos,
> +_console_info.u.vesa_lfb.red_size);
> +bpp = set_color(mode_info->PixelInformation.GreenMask, bpp,
> +_console_info.u.vesa_lfb.green_pos,
> +_console_info.u.vesa_lfb.green_size);
> +bpp = set_color(mode_info->PixelInformation.BlueMask, bpp,
> +_console_info.u.vesa_lfb.blue_pos,
> +_console_info.u.vesa_lfb.blue_size);
> +if ( mode_info->PixelInformation.ReservedMask )
> +bpp = set_color(mode_info->PixelInformation.ReservedMask, bpp,
> +_console_info.u.vesa_lfb.rsvd_pos,
> +_console_info.u.vesa_lfb.rsvd_size);
> +if ( bpp > 0 )
> +break;
> +/* fall through */
> +default:
> +PrintErr(L"Current graphics mode is unsupported!\r\n");
> +bpp  = 0;
> +break;
> +}
> +if ( bpp > 0 )
> +{
> +vga_console_info.video_type = XEN_VGATYPE_EFI_LFB;
> +vga_console_info.u.vesa_lfb.gbl_caps = 2; /* possibly non-VGA */
> +vga_console_info.u.vesa_lfb.width =
> +mode_info->HorizontalResolution;
> +vga_console_info.u.vesa_lfb.height = mode_info->VerticalResolution;
> +vga_console_info.u.vesa_lfb.bits_per_pixel = bpp;
> +vga_console_info.u.vesa_lfb.bytes_per_line =
> +(mode_info->PixelsPerScanLine * bpp + 7) >> 3;
> +vga_console_info.u.vesa_lfb.lfb_base = gop->Mode->FrameBufferBase;
> +vga_console_info.u.vesa_lfb.ext_lfb_base = 
> gop->Mode->FrameBufferBase >> 32;
> +vga_console_info.u.vesa_lfb.lfb_size =
> +(gop->Mode->FrameBufferSize + 0x) >> 16;
> +}
> +#endif
> +}

While you move this code, could you please insert blank lines between
non-fall-through case blocks, and perhaps another one between the switch()
and the if() blocks? And it looks like
- the "gop" parameter could also do with becoming pointer-to-const,
- the expanded #ifdef could do with a comment briefly explaining why Arm
  needs-special casing.

Jan




Re: [PATCH] xen/x86: obtain full video frame buffer address for Dom0 also under EFI

2022-02-07 Thread Juergen Gross

On 07.02.22 08:41, Jan Beulich wrote:

The initial change would not work when Xen was booted from EFI: There
is an early exit from the case block in that case. Move the necessary
code ahead of that.

Fixes: 335e4dd67b48 ("xen/x86: obtain upper 32 bits of video frame
buffer address for Dom0") Signed-off-by: Jan Beulich



Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


  1   2   >