Re: NVME hotplug support ?

2024-01-23 Thread Hannes Reinecke

On 1/24/24 07:52, Philippe Mathieu-Daudé wrote:

Hi Hannes,

[+Markus as QOM/QDev rubber duck]

On 23/1/24 13:40, Hannes Reinecke wrote:

On 1/23/24 11:59, Damien Hedde wrote:

Hi all,

We are currently looking into hotplugging nvme devices and it is 
currently not possible:

When nvme was introduced 2 years ago, the feature was disabled.

commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
Author: Klaus Jensen
Date:   Tue Jul 6 10:48:40 2021 +0200

    hw/nvme: mark nvme-subsys non-hotpluggable
    We currently lack the infrastructure to handle subsystem 
hotplugging, so

    disable it.


Do someone know what's lacking or anyone have some tips/idea of what 
we should develop to add the support ?


Problem is that the object model is messed up. In qemu namespaces are 
attached to controllers, which in turn are children of the PCI device.

There are subsystems, but these just reference the controller.

So if you hotunplug the PCI device you detach/destroy the controller 
and detach the namespaces from the controller.
But if you hotplug the PCI device again the NVMe controller will be 
attached to the PCI device, but the namespace are still be detached.


Klaus said he was going to fix that, and I dimly remember some patches
floating around. But apparently it never went anywhere.

Fundamental problem is that the NVMe hierarchy as per spec is 
incompatible with the qemu object model; qemu requires a strict

tree model where every object has exactly _one_ parent.


The modelling problem is not clear to me.
Do you have an example of how the NVMe hierarchy should be?


Sure.

As per NVMe spec we have this hierarchy:

 --->  subsys ---
||
|V
controller  namespaces

There can be several controllers, and several
namespaces.
The initiator (ie the linux 'nvme' driver) connects
to a controller, queries the subsystem for the attached
namespaces, and presents each namespace as a block device.

For Qemu we have the problem that every device _must_ be
a direct descendant of the parent (expressed by the fact
that each 'parent' object is embedded in the device object).

So if we were to present a NVMe PCI device, the controller
must be derived from the PCI device:

pci -> controller

but now we have to express the NVMe hierarchy, too:

pci -> ctrl1 -> subsys1 -> namespace1

which actually works.
We can easily attach several namespaces:

pci -> ctrl1 ->subsys1 -> namespace2

For a single controller and a single subsystem.
However, as mentioned above, there can be _several_
controllers attached to the same subsystem.
So we can express the second controller:

pci -> ctrl2

but we cannot attach the controller to 'subsys1'
as then 'subsys1' would need to be derived from
'ctrl2', and not (as it is now) from 'ctrl1'.

The most logical step would be to have 'subsystems'
their own entity, independent of any controllers.
But then the block devices (which are derived from
the namespaces) could not be traced back
to the PCI device, and a PCI hotplug would not
'automatically' disconnect the nvme block devices.

Plus the subsystem would be independent from the NVMe
PCI devices, so you could have a subsystem with
no controllers attached. And one would wonder who
should be responsible for cleaning up that.

Cheers,

Hannes




Re: NVME hotplug support ?

2024-01-23 Thread Klaus Jensen
On Jan 23 13:40, Hannes Reinecke wrote:
> On 1/23/24 11:59, Damien Hedde wrote:
> > Hi all,
> > 
> > We are currently looking into hotplugging nvme devices and it is currently 
> > not possible:
> > When nvme was introduced 2 years ago, the feature was disabled.
> > > commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
> > > Author: Klaus Jensen
> > > Date:   Tue Jul 6 10:48:40 2021 +0200
> > > 
> > > hw/nvme: mark nvme-subsys non-hotpluggable
> > > We currently lack the infrastructure to handle subsystem hotplugging, 
> > > so
> > > disable it.
> > 
> > Do someone know what's lacking or anyone have some tips/idea of what we 
> > should develop to add the support ?
> > 
> Problem is that the object model is messed up. In qemu namespaces are
> attached to controllers, which in turn are children of the PCI device.
> There are subsystems, but these just reference the controller.
> 
> So if you hotunplug the PCI device you detach/destroy the controller and
> detach the namespaces from the controller.
> But if you hotplug the PCI device again the NVMe controller will be attached
> to the PCI device, but the namespace are still be detached.
> 
> Klaus said he was going to fix that, and I dimly remember some patches
> floating around. But apparently it never went anywhere.
> 
> Fundamental problem is that the NVMe hierarchy as per spec is incompatible
> with the qemu object model; qemu requires a strict
> tree model where every object has exactly _one_ parent.
> 

A little history might help to nuance this just a bit. And to defend the
current model ;)

When we added support for multiple namespaces we did not consider
subsystem support, so the namespaces would just be associated directly
with a parent controller (in QDev terms, the parent has a bus that the
namespace devices are attached to).

When we added subsystems, where namespaces may be attached to several
controllers, it became necessary to break the controller/namespace
parent/child relationship. The problem was that removing the controller
would take all the bus children with it, causing namespaces to be
removed from other controllers in the subsystem. We fixed this by
reparenting the namespaces to the subsystem device instead.

I think this model fits the NVMe hierarchy as good as possible.
Controllers and namespaces are considered children of the subsystem (as
they are in NVMe).

Now, the problem with namespaces not being re-attached is partly false.
If the namespaces are 'shared=on', they will be automatically attached
to any new controller attached to the subsystem. However, if they are
private, that is is not the case. In NVMe, a private namespace just
means a namespace that can only be attached to a single controller at a
time. It is not entirely unlikely that you have a private namespace that
you then reassign to controller B when controller A is removed. Now,
what we could do is track the last controller identifier that a private
namespace was attached to, and if the same controller identifier is
added to the subsystem, we could reattach the private namespace.

However, broadly, I think the current model does a pretty good job in
supporting experimentation with hotplug, multipath and failover
configurations.


signature.asc
Description: PGP signature


Re: NVME hotplug support ?

2024-01-23 Thread Philippe Mathieu-Daudé

Hi Hannes,

[+Markus as QOM/QDev rubber duck]

On 23/1/24 13:40, Hannes Reinecke wrote:

On 1/23/24 11:59, Damien Hedde wrote:

Hi all,

We are currently looking into hotplugging nvme devices and it is 
currently not possible:

When nvme was introduced 2 years ago, the feature was disabled.

commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
Author: Klaus Jensen
Date:   Tue Jul 6 10:48:40 2021 +0200

    hw/nvme: mark nvme-subsys non-hotpluggable
    We currently lack the infrastructure to handle subsystem 
hotplugging, so

    disable it.


Do someone know what's lacking or anyone have some tips/idea of what 
we should develop to add the support ?


Problem is that the object model is messed up. In qemu namespaces are 
attached to controllers, which in turn are children of the PCI device.

There are subsystems, but these just reference the controller.

So if you hotunplug the PCI device you detach/destroy the controller and 
detach the namespaces from the controller.
But if you hotplug the PCI device again the NVMe controller will be 
attached to the PCI device, but the namespace are still be detached.


Klaus said he was going to fix that, and I dimly remember some patches
floating around. But apparently it never went anywhere.

Fundamental problem is that the NVMe hierarchy as per spec is 
incompatible with the qemu object model; qemu requires a strict

tree model where every object has exactly _one_ parent.


The modelling problem is not clear to me.
Do you have an example of how the NVMe hierarchy should be?

Thanks,

Phil.



Re: [PATCH v2 1/4] util/uri: Remove uri_string_unescape()

2024-01-23 Thread Stefan Weil via

Am 23.01.24 um 19:22 schrieb Thomas Huth:


uri_string_unescape() basically does the same as the glib function
g_uri_unescape_segment(). So we can get rid of our implementation
completely by simply using the glib function instead.

Suggested-by: Stefan Weil  [g_uri_unescape_string()]
Suggested-by: Paolo Bonzini  [g_uri_unescape_segment()]
Signed-off-by: Thomas Huth 
---
  include/qemu/uri.h |  1 -
  util/uri.c | 97 ++
  2 files changed, 11 insertions(+), 87 deletions(-)



Reviewed-by: Stefan Weil

Thank you, Thomas and Paolo.




Re: [PATCH v2 0/4] util/uri: Simplify the code, remove unused functions

2024-01-23 Thread Richard Henderson

On 1/24/24 04:22, Thomas Huth wrote:

Thomas Huth (4):
   util/uri: Remove uri_string_unescape()
   util/uri: Remove unused functions uri_resolve() and
 uri_resolve_relative()
   util/uri: Remove the uri_string_escape() function
   util/uri: Remove unused macros ISA_RESERVED() and ISA_GEN_DELIM()


Reviewed-by: Richard Henderson 

r~



[PATCH v2 4/4] util/uri: Remove unused macros ISA_RESERVED() and ISA_GEN_DELIM()

2024-01-23 Thread Thomas Huth
They are not used anywhere, so there's no need to keep them around.

Message-ID: <20240122191753.103118-6-th...@redhat.com>
Reviewed-by: Stefan Weil 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: "Daniel P. Berrangé" 
Signed-off-by: Thomas Huth 
---
 util/uri.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/util/uri.c b/util/uri.c
index 350835b03f..573174bf47 100644
--- a/util/uri.c
+++ b/util/uri.c
@@ -163,19 +163,6 @@ static void uri_clean(URI *uri);
  ((*(p) == '+')) || ((*(p) == ',')) || ((*(p) == ';')) ||  
\
  ((*(p) == '=')) || ((*(p) == '\'')))
 
-/*
- *gen-delims= ":" / "/" / "?" / "#" / "[" / "]" / "@"
- */
-#define ISA_GEN_DELIM(p)   
\
-(((*(p) == ':')) || ((*(p) == '/')) || ((*(p) == '?')) ||  
\
- ((*(p) == '#')) || ((*(p) == '[')) || ((*(p) == ']')) ||  
\
- ((*(p) == '@')))
-
-/*
- *reserved  = gen-delims / sub-delims
- */
-#define ISA_RESERVED(p) (ISA_GEN_DELIM(p) || (ISA_SUB_DELIM(p)))
-
 /*
  *unreserved= ALPHA / DIGIT / "-" / "." / "_" / "~"
  */
-- 
2.43.0




[PATCH v2 3/4] util/uri: Remove the uri_string_escape() function

2024-01-23 Thread Thomas Huth
Now that uri_resolve_relative() has been removed, this function is not
used in QEMU anymore - and if somebody needs this functionality, they
can simply use g_uri_escape_string() from the glib instead.

Reviewed-by: Stefan Weil 
Reviewed-by: "Daniel P. Berrangé" 
Signed-off-by: Thomas Huth 
---
 include/qemu/uri.h |  1 -
 util/uri.c | 70 --
 2 files changed, 71 deletions(-)

diff --git a/include/qemu/uri.h b/include/qemu/uri.h
index 899ce852f5..255e61f452 100644
--- a/include/qemu/uri.h
+++ b/include/qemu/uri.h
@@ -76,7 +76,6 @@ URI *uri_parse(const char *str);
 URI *uri_parse_raw(const char *str, int raw);
 int uri_parse_into(URI *uri, const char *str);
 char *uri_to_string(URI *uri);
-char *uri_string_escape(const char *str, const char *list);
 void uri_free(URI *uri);
 
 /* Single web service query parameter 'name=value'. */
diff --git a/util/uri.c b/util/uri.c
index 1891ca6fb3..350835b03f 100644
--- a/util/uri.c
+++ b/util/uri.c
@@ -1349,76 +1349,6 @@ void uri_free(URI *uri)
 g_free(uri);
 }
 
-/
- *  *
- *   Helper functions   *
- *  *
- /
-
-/**
- * uri_string_escape:
- * @str:  string to escape
- * @list: exception list string of chars not to escape
- *
- * This routine escapes a string to hex, ignoring reserved characters (a-z)
- * and the characters in the exception list.
- *
- * Returns a new escaped string or NULL in case of error.
- */
-char *uri_string_escape(const char *str, const char *list)
-{
-char *ret, ch;
-char *temp;
-const char *in;
-int len, out;
-
-if (str == NULL) {
-return NULL;
-}
-if (str[0] == 0) {
-return g_strdup(str);
-}
-len = strlen(str);
-if (!(len > 0)) {
-return NULL;
-}
-
-len += 20;
-ret = g_malloc(len);
-in = str;
-out = 0;
-while (*in != 0) {
-if (len - out <= 3) {
-temp = realloc2n(ret, );
-ret = temp;
-}
-
-ch = *in;
-
-if ((ch != '@') && (!IS_UNRESERVED(ch)) && (!strchr(list, ch))) {
-unsigned char val;
-ret[out++] = '%';
-val = ch >> 4;
-if (val <= 9) {
-ret[out++] = '0' + val;
-} else {
-ret[out++] = 'A' + val - 0xA;
-}
-val = ch & 0xF;
-if (val <= 9) {
-ret[out++] = '0' + val;
-} else {
-ret[out++] = 'A' + val - 0xA;
-}
-in++;
-} else {
-ret[out++] = *in++;
-}
-}
-ret[out] = 0;
-return ret;
-}
-
 /
  *  *
  *   Public functions   *
-- 
2.43.0




[PATCH v2 1/4] util/uri: Remove uri_string_unescape()

2024-01-23 Thread Thomas Huth
uri_string_unescape() basically does the same as the glib function
g_uri_unescape_segment(). So we can get rid of our implementation
completely by simply using the glib function instead.

Suggested-by: Stefan Weil  [g_uri_unescape_string()]
Suggested-by: Paolo Bonzini  [g_uri_unescape_segment()]
Signed-off-by: Thomas Huth 
---
 include/qemu/uri.h |  1 -
 util/uri.c | 97 ++
 2 files changed, 11 insertions(+), 87 deletions(-)

diff --git a/include/qemu/uri.h b/include/qemu/uri.h
index 1855b764f2..f0722b75da 100644
--- a/include/qemu/uri.h
+++ b/include/qemu/uri.h
@@ -79,7 +79,6 @@ URI *uri_parse_raw(const char *str, int raw);
 int uri_parse_into(URI *uri, const char *str);
 char *uri_to_string(URI *uri);
 char *uri_string_escape(const char *str, const char *list);
-char *uri_string_unescape(const char *str, int len, char *target);
 void uri_free(URI *uri);
 
 /* Single web service query parameter 'name=value'. */
diff --git a/util/uri.c b/util/uri.c
index dcb3305236..fb7823a43c 100644
--- a/util/uri.c
+++ b/util/uri.c
@@ -267,7 +267,7 @@ static int rfc3986_parse_fragment(URI *uri, const char 
**str)
 if (uri->cleanup & 2) {
 uri->fragment = g_strndup(*str, cur - *str);
 } else {
-uri->fragment = uri_string_unescape(*str, cur - *str, NULL);
+uri->fragment = g_uri_unescape_segment(*str, cur, NULL);
 }
 }
 *str = cur;
@@ -368,7 +368,7 @@ static int rfc3986_parse_user_info(URI *uri, const char 
**str)
 if (uri->cleanup & 2) {
 uri->user = g_strndup(*str, cur - *str);
 } else {
-uri->user = uri_string_unescape(*str, cur - *str, NULL);
+uri->user = g_uri_unescape_segment(*str, cur, NULL);
 }
 }
 *str = cur;
@@ -496,7 +496,7 @@ found:
 if (uri->cleanup & 2) {
 uri->server = g_strndup(host, cur - host);
 } else {
-uri->server = uri_string_unescape(host, cur - host, NULL);
+uri->server = g_uri_unescape_segment(host, cur, NULL);
 }
 } else {
 uri->server = NULL;
@@ -614,7 +614,7 @@ static int rfc3986_parse_path_ab_empty(URI *uri, const char 
**str)
 if (uri->cleanup & 2) {
 uri->path = g_strndup(*str, cur - *str);
 } else {
-uri->path = uri_string_unescape(*str, cur - *str, NULL);
+uri->path = g_uri_unescape_segment(*str, cur, NULL);
 }
 } else {
 uri->path = NULL;
@@ -663,7 +663,7 @@ static int rfc3986_parse_path_absolute(URI *uri, const char 
**str)
 if (uri->cleanup & 2) {
 uri->path = g_strndup(*str, cur - *str);
 } else {
-uri->path = uri_string_unescape(*str, cur - *str, NULL);
+uri->path = g_uri_unescape_segment(*str, cur, NULL);
 }
 } else {
 uri->path = NULL;
@@ -709,7 +709,7 @@ static int rfc3986_parse_path_rootless(URI *uri, const char 
**str)
 if (uri->cleanup & 2) {
 uri->path = g_strndup(*str, cur - *str);
 } else {
-uri->path = uri_string_unescape(*str, cur - *str, NULL);
+uri->path = g_uri_unescape_segment(*str, cur, NULL);
 }
 } else {
 uri->path = NULL;
@@ -755,7 +755,7 @@ static int rfc3986_parse_path_no_scheme(URI *uri, const 
char **str)
 if (uri->cleanup & 2) {
 uri->path = g_strndup(*str, cur - *str);
 } else {
-uri->path = uri_string_unescape(*str, cur - *str, NULL);
+uri->path = g_uri_unescape_segment(*str, cur, NULL);
 }
 } else {
 uri->path = NULL;
@@ -1561,81 +1561,6 @@ done_cd:
 return 0;
 }
 
-static int is_hex(char c)
-{
-if (((c >= '0') && (c <= '9')) || ((c >= 'a') && (c <= 'f')) ||
-((c >= 'A') && (c <= 'F'))) {
-return 1;
-}
-return 0;
-}
-
-/**
- * uri_string_unescape:
- * @str:  the string to unescape
- * @len:   the length in bytes to unescape (or <= 0 to indicate full string)
- * @target:  optional destination buffer
- *
- * Unescaping routine, but does not check that the string is an URI. The
- * output is a direct unsigned char translation of %XX values (no encoding)
- * Note that the length of the result can only be smaller or same size as
- * the input string.
- *
- * Returns a copy of the string, but unescaped, will return NULL only in case
- * of error
- */
-char *uri_string_unescape(const char *str, int len, char *target)
-{
-char *ret, *out;
-const char *in;
-
-if (str == NULL) {
-return NULL;
-}
-if (len <= 0) {
-len = strlen(str);
-}
-if (len < 0) {
-return NULL;
-}
-
-if (target == NULL) {
-ret = g_malloc(len + 1);
-} else {
-ret = 

[PATCH v2 2/4] util/uri: Remove unused functions uri_resolve() and uri_resolve_relative()

2024-01-23 Thread Thomas Huth
These rather complex functions have never been used since they've been
introduced in 2012, so looks like they are not really useful for QEMU.
And since the static normalize_uri_path() function is also only used by
uri_resolve(), we can remove that function now, too.

Reviewed-by: Stefan Weil 
Reviewed-by: "Daniel P. Berrangé" 
Signed-off-by: Thomas Huth 
---
 include/qemu/uri.h |   2 -
 util/uri.c | 689 -
 2 files changed, 691 deletions(-)

diff --git a/include/qemu/uri.h b/include/qemu/uri.h
index f0722b75da..899ce852f5 100644
--- a/include/qemu/uri.h
+++ b/include/qemu/uri.h
@@ -72,8 +72,6 @@ typedef struct URI {
 } URI;
 
 URI *uri_new(void);
-char *uri_resolve(const char *URI, const char *base);
-char *uri_resolve_relative(const char *URI, const char *base);
 URI *uri_parse(const char *str);
 URI *uri_parse_raw(const char *str, int raw);
 int uri_parse_into(URI *uri, const char *str);
diff --git a/util/uri.c b/util/uri.c
index fb7823a43c..1891ca6fb3 100644
--- a/util/uri.c
+++ b/util/uri.c
@@ -1355,212 +1355,6 @@ void uri_free(URI *uri)
  *  *
  /
 
-/**
- * normalize_uri_path:
- * @path:  pointer to the path string
- *
- * Applies the 5 normalization steps to a path string--that is, RFC 2396
- * Section 5.2, steps 6.c through 6.g.
- *
- * Normalization occurs directly on the string, no new allocation is done
- *
- * Returns 0 or an error code
- */
-static int normalize_uri_path(char *path)
-{
-char *cur, *out;
-
-if (path == NULL) {
-return -1;
-}
-
-/* Skip all initial "/" chars.  We want to get to the beginning of the
- * first non-empty segment.
- */
-cur = path;
-while (cur[0] == '/') {
-++cur;
-}
-if (cur[0] == '\0') {
-return 0;
-}
-
-/* Keep everything we've seen so far.  */
-out = cur;
-
-/*
- * Analyze each segment in sequence for cases (c) and (d).
- */
-while (cur[0] != '\0') {
-/*
- * c) All occurrences of "./", where "." is a complete path segment,
- *are removed from the buffer string.
- */
-if ((cur[0] == '.') && (cur[1] == '/')) {
-cur += 2;
-/* '//' normalization should be done at this point too */
-while (cur[0] == '/') {
-cur++;
-}
-continue;
-}
-
-/*
- * d) If the buffer string ends with "." as a complete path segment,
- *that "." is removed.
- */
-if ((cur[0] == '.') && (cur[1] == '\0')) {
-break;
-}
-
-/* Otherwise keep the segment.  */
-while (cur[0] != '/') {
-if (cur[0] == '\0') {
-goto done_cd;
-}
-(out++)[0] = (cur++)[0];
-}
-/* nomalize // */
-while ((cur[0] == '/') && (cur[1] == '/')) {
-cur++;
-}
-
-(out++)[0] = (cur++)[0];
-}
-done_cd:
-out[0] = '\0';
-
-/* Reset to the beginning of the first segment for the next sequence.  */
-cur = path;
-while (cur[0] == '/') {
-++cur;
-}
-if (cur[0] == '\0') {
-return 0;
-}
-
-/*
- * Analyze each segment in sequence for cases (e) and (f).
- *
- * e) All occurrences of "/../", where  is a
- *complete path segment not equal to "..", are removed from the
- *buffer string.  Removal of these path segments is performed
- *iteratively, removing the leftmost matching pattern on each
- *iteration, until no matching pattern remains.
- *
- * f) If the buffer string ends with "/..", where 
- *is a complete path segment not equal to "..", that
- *"/.." is removed.
- *
- * To satisfy the "iterative" clause in (e), we need to collapse the
- * string every time we find something that needs to be removed.  Thus,
- * we don't need to keep two pointers into the string: we only need a
- * "current position" pointer.
- */
-while (1) {
-char *segp, *tmp;
-
-/* At the beginning of each iteration of this loop, "cur" points to
- * the first character of the segment we want to examine.
- */
-
-/* Find the end of the current segment.  */
-segp = cur;
-while ((segp[0] != '/') && (segp[0] != '\0')) {
-++segp;
-}
-
-/* If this is the last segment, we're done (we need at least two
- * segments to meet the criteria for the (e) and (f) cases).
- */
-if (segp[0] == '\0') {
-break;
-}
-
-/* If the first segment is "..", or if the next segment _isn't_ "..",
- * keep this segment and try the next one.
- */
-++segp;
-if (((cur[0] == '.') && (cur[1] == '.') && (segp == cur + 

[PATCH v2 0/4] util/uri: Simplify the code, remove unused functions

2024-01-23 Thread Thomas Huth
The URI function uri_string_unescape() is pretty much the same as the
function g_uri_unescape_segment() from the glib, so we can simplify
our code here quite a bit.
While at it, I also noticed that there are many other unused functions
in here which we likely can drop, too.

v2:
- Replace uri_string_unescape() with g_uri_unescape_segment(), so
  we can remove uri_string_unescape() completely now

Thomas Huth (4):
  util/uri: Remove uri_string_unescape()
  util/uri: Remove unused functions uri_resolve() and
uri_resolve_relative()
  util/uri: Remove the uri_string_escape() function
  util/uri: Remove unused macros ISA_RESERVED() and ISA_GEN_DELIM()

 include/qemu/uri.h |   4 -
 util/uri.c | 869 +
 2 files changed, 11 insertions(+), 862 deletions(-)

-- 
2.43.0




Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread

2024-01-23 Thread Hanna Czenczek

On 23.01.24 18:10, Kevin Wolf wrote:

Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:

On 21.12.23 22:23, Kevin Wolf wrote:

From: Stefan Hajnoczi

Stop depending on the AioContext lock and instead access
SCSIDevice->requests from only one thread at a time:
- When the VM is running only the BlockBackend's AioContext may access
the requests list.
- When the VM is stopped only the main loop may access the requests
list.

These constraints protect the requests list without the need for locking
in the I/O code path.

Note that multiple IOThreads are not supported yet because the code
assumes all SCSIRequests are executed from a single AioContext. Leave
that as future work.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Message-ID:<20231204164259.1515217-2-stefa...@redhat.com>
Signed-off-by: Kevin Wolf
---
   include/hw/scsi/scsi.h |   7 +-
   hw/scsi/scsi-bus.c | 181 -
   2 files changed, 131 insertions(+), 57 deletions(-)

My reproducer forhttps://issues.redhat.com/browse/RHEL-3934  now breaks more
often because of this commit than because of the original bug, i.e. when
repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
this tends to happen when unplugging the scsi-hd:

{"execute":"device_del","arguments":{"id":"stg0"}}
{"return": {}}
qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
Assertion `ctx == blk->ctx' failed.

(gdb) bt
#0  0x7f32a668d83c in  () at /usr/lib/libc.so.6
#1  0x7f32a663d668 in raise () at /usr/lib/libc.so.6
#2  0x7f32a66254b8 in abort () at /usr/lib/libc.so.6
#3  0x7f32a66253dc in  () at /usr/lib/libc.so.6
#4  0x7f32a6635d26 in  () at /usr/lib/libc.so.6
#5  0x556e6b4880a4 in blk_get_aio_context (blk=0x556e6e89ccf0) at
../block/block-backend.c:2429
#6  blk_get_aio_context (blk=0x556e6e89ccf0) at
../block/block-backend.c:2417
#7  0x556e6b112d87 in scsi_device_for_each_req_async_bh
(opaque=0x556e6e2c6d10) at ../hw/scsi/scsi-bus.c:128
#8  0x556e6b5d1966 in aio_bh_poll (ctx=ctx@entry=0x556e6d8aa290) at
../util/async.c:218
#9  0x556e6b5bb16a in aio_poll (ctx=0x556e6d8aa290,
blocking=blocking@entry=true) at ../util/aio-posix.c:722
#10 0x556e6b4564b6 in iothread_run (opaque=opaque@entry=0x556e6d89d920)
at ../iothread.c:63
#11 0x556e6b5bde58 in qemu_thread_start (args=0x556e6d8aa9b0) at
../util/qemu-thread-posix.c:541
#12 0x7f32a668b9eb in  () at /usr/lib/libc.so.6
#13 0x7f32a670f7cc in  () at /usr/lib/libc.so.6

I don’t know anything about the problem yet, but as usual, I like
speculation and discovering how wrong I was later on, so one thing I came
across that’s funny about virtio-scsi is that requests can happen even while
a disk is being attached or detached.  That is, Linux seems to probe all
LUNs when a new virtio-scsi device is being attached, and it won’t stop just
because a disk is being attached or removed.  So maybe that’s part of the
problem, that we get a request while the BB is being detached, and
temporarily in an inconsistent state (BDS context differs from BB context).

I don't know anything about the problem either, but since you already
speculated about the cause, let me speculate about the solution:
Can we hold the graph writer lock for the tran_commit() call in
bdrv_try_change_aio_context()? And of course take the reader lock for
blk_get_aio_context(), but that should be completely unproblematic.

At the first sight I don't see a reason why this would break something,
but I've learnt not to trust my first impression with the graph locking
work...

Of course, I also didn't check if there are more things inside of the
device emulation that need additional locking in this case, too. But
even if so, blk_get_aio_context() should never see an inconsistent
state.


Ah, sorry, saw your reply only now that I hit “send”.

I forgot that we do have that big lock that I thought rather to avoid 
:)  Sounds good and very reasonable to me.  Changing the contexts in the 
graph sounds like a graph change operation, and reading and comparing 
contexts in the graph sounds like reading the graph.


Hanna

Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread

2024-01-23 Thread Hanna Czenczek

On 23.01.24 17:40, Hanna Czenczek wrote:

On 21.12.23 22:23, Kevin Wolf wrote:

From: Stefan Hajnoczi

Stop depending on the AioContext lock and instead access
SCSIDevice->requests from only one thread at a time:
- When the VM is running only the BlockBackend's AioContext may access
   the requests list.
- When the VM is stopped only the main loop may access the requests
   list.

These constraints protect the requests list without the need for locking
in the I/O code path.

Note that multiple IOThreads are not supported yet because the code
assumes all SCSIRequests are executed from a single AioContext. Leave
that as future work.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Message-ID:<20231204164259.1515217-2-stefa...@redhat.com>
Signed-off-by: Kevin Wolf
---
  include/hw/scsi/scsi.h |   7 +-
  hw/scsi/scsi-bus.c | 181 -
  2 files changed, 131 insertions(+), 57 deletions(-)

[...]

I don’t know anything about the problem yet, but as usual, I like 
speculation and discovering how wrong I was later on, so one thing I 
came across that’s funny about virtio-scsi is that requests can happen 
even while a disk is being attached or detached.  That is, Linux seems 
to probe all LUNs when a new virtio-scsi device is being attached, and 
it won’t stop just because a disk is being attached or removed.  So 
maybe that’s part of the problem, that we get a request while the BB 
is being detached, and temporarily in an inconsistent state (BDS 
context differs from BB context).


I’ll look more into it.


What I think happens is that scsi_device_purge_requests() runs (perhaps 
through virtio_scsi_hotunplug() -> qdev_simple_device_unplug_cb() -> 
scsi_qdev_unrealize()?), which schedules 
scsi_device_for_each_req_async_bh() to run, but doesn’t await it.  We go 
on, begin to move the BB and its BDS back to the main context (via 
blk_set_aio_context() in virtio_scsi_hotunplug()), but 
scsi_device_for_each_req_async_bh() still runs in the I/O thread, it 
calls blk_get_aio_context() while the contexts are inconsistent, and we 
get the crash.


There is a comment above blk_get_aio_context() in 
scsi_device_for_each_req_async_bh() about the BB potentially being moved 
to a different context prior to the BH running, but it doesn’t consider 
the possibility that that move may occur *concurrently*.


I don’t know how to fix this, though.  The whole idea of anything 
happening to a BB while it is being moved to a different context seems 
so wrong to me that I’d want to slap a big lock on it, but I have the 
feeling that that isn’t what we want.


Hanna




Re: [PATCH v2 2/4] Avoid conflicting types for 'copy_file_range'

2024-01-23 Thread Kevin Wolf
Am 22.01.2024 um 18:04 hat Peter Maydell geschrieben:
> (Cc'ing the block folks)
> 
> On Thu, 18 Jan 2024 at 16:03, Manolo de Medici  
> wrote:
> >
> > Compilation fails on systems where copy_file_range is already defined as a
> > stub.
> 
> What do you mean by "stub" here ? If the system headers define
> a prototype for the function, I would have expected the
> meson check to set HAVE_COPY_FILE_RANGE, and then this
> function doesn't get defined at all. That is, the prototype
> mismatch shouldn't matter because if the prototype exists we
> use the libc function, and if it doesn't then we use our version.
> 
> > The prototype of copy_file_range in glibc returns an ssize_t, not an off_t.
> >
> > The function currently only exists on linux and freebsd, and in both cases
> > the return type is ssize_t
> >
> > Signed-off-by: Manolo de Medici 
> > ---
> >  block/file-posix.c | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 35684f7e21..f744b35642 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2000,12 +2000,13 @@ static int handle_aiocb_write_zeroes_unmap(void 
> > *opaque)
> >  }
> >
> >  #ifndef HAVE_COPY_FILE_RANGE
> > -static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
> > - off_t *out_off, size_t len, unsigned int 
> > flags)
> > +ssize_t copy_file_range (int infd, off_t *pinoff,
> > + int outfd, off_t *poutoff,
> > + size_t length, unsigned int flags)
> 
> No space after "copy_file_range". No need to rename all the
> arguments either.
> 
> >  {
> >  #ifdef __NR_copy_file_range
> > -return syscall(__NR_copy_file_range, in_fd, in_off, out_fd,
> > -   out_off, len, flags);
> > +return (ssize_t)syscall(__NR_copy_file_range, infd, pinoff, outfd,
> > +poutoff, length, flags);
> 
> Don't need a cast here, because returning the value will
> automatically cast it to the right thing.
> 
> >  #else
> >  errno = ENOSYS;
> >  return -1;
> 
> These changes aren't wrong, but as noted above I'm surprised that
> the Hurd gets into this code at all.

Yes, I think we didn't expect that HAVE_COPY_FILE_RANGE would not be
defined in some cases even if copy_file_range() exists in the libc.

> Note for Kevin: shouldn't this direct use of syscall() have
> some sort of OS-specific guard on it? There's nothing that
> says that a non-Linux host OS has to have the same set of
> arguments to an __NR_copy_file_range syscall. If this
> fallback is a Linux-ism we should guard it appropriately.

Yes, I think this should be #if defined(__linux__) &&
defined(__NR_copy_file_range).

> For that matter, at what point can we just remove the fallback
> entirely? copy_file_range() went into Linux in 4.5, apparently,
> which is now nearly 8 years old. Maybe all our supported
> hosts now have a new enough kernel and we can drop this
> somewhat ugly syscall() wrapper...

The kernel doesn't really matter here, but the libc. Apparently
copy_file_range() was added in glibc 2.27 in 2018. If we want to remove
the wrapper, we'd have to check if all currently supported distributions
have a new enough glibc.

Kevin




Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread

2024-01-23 Thread Kevin Wolf
Am 23.01.2024 um 17:40 hat Hanna Czenczek geschrieben:
> On 21.12.23 22:23, Kevin Wolf wrote:
> > From: Stefan Hajnoczi
> > 
> > Stop depending on the AioContext lock and instead access
> > SCSIDevice->requests from only one thread at a time:
> > - When the VM is running only the BlockBackend's AioContext may access
> >the requests list.
> > - When the VM is stopped only the main loop may access the requests
> >list.
> > 
> > These constraints protect the requests list without the need for locking
> > in the I/O code path.
> > 
> > Note that multiple IOThreads are not supported yet because the code
> > assumes all SCSIRequests are executed from a single AioContext. Leave
> > that as future work.
> > 
> > Signed-off-by: Stefan Hajnoczi
> > Reviewed-by: Eric Blake
> > Message-ID:<20231204164259.1515217-2-stefa...@redhat.com>
> > Signed-off-by: Kevin Wolf
> > ---
> >   include/hw/scsi/scsi.h |   7 +-
> >   hw/scsi/scsi-bus.c | 181 -
> >   2 files changed, 131 insertions(+), 57 deletions(-)
> 
> My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
> often because of this commit than because of the original bug, i.e. when
> repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
> this tends to happen when unplugging the scsi-hd:
> 
> {"execute":"device_del","arguments":{"id":"stg0"}}
> {"return": {}}
> qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
> Assertion `ctx == blk->ctx' failed.
> 
> (gdb) bt
> #0  0x7f32a668d83c in  () at /usr/lib/libc.so.6
> #1  0x7f32a663d668 in raise () at /usr/lib/libc.so.6
> #2  0x7f32a66254b8 in abort () at /usr/lib/libc.so.6
> #3  0x7f32a66253dc in  () at /usr/lib/libc.so.6
> #4  0x7f32a6635d26 in  () at /usr/lib/libc.so.6
> #5  0x556e6b4880a4 in blk_get_aio_context (blk=0x556e6e89ccf0) at
> ../block/block-backend.c:2429
> #6  blk_get_aio_context (blk=0x556e6e89ccf0) at
> ../block/block-backend.c:2417
> #7  0x556e6b112d87 in scsi_device_for_each_req_async_bh
> (opaque=0x556e6e2c6d10) at ../hw/scsi/scsi-bus.c:128
> #8  0x556e6b5d1966 in aio_bh_poll (ctx=ctx@entry=0x556e6d8aa290) at
> ../util/async.c:218
> #9  0x556e6b5bb16a in aio_poll (ctx=0x556e6d8aa290,
> blocking=blocking@entry=true) at ../util/aio-posix.c:722
> #10 0x556e6b4564b6 in iothread_run (opaque=opaque@entry=0x556e6d89d920)
> at ../iothread.c:63
> #11 0x556e6b5bde58 in qemu_thread_start (args=0x556e6d8aa9b0) at
> ../util/qemu-thread-posix.c:541
> #12 0x7f32a668b9eb in  () at /usr/lib/libc.so.6
> #13 0x7f32a670f7cc in  () at /usr/lib/libc.so.6
> 
> I don’t know anything about the problem yet, but as usual, I like
> speculation and discovering how wrong I was later on, so one thing I came
> across that’s funny about virtio-scsi is that requests can happen even while
> a disk is being attached or detached.  That is, Linux seems to probe all
> LUNs when a new virtio-scsi device is being attached, and it won’t stop just
> because a disk is being attached or removed.  So maybe that’s part of the
> problem, that we get a request while the BB is being detached, and
> temporarily in an inconsistent state (BDS context differs from BB context).

I don't know anything about the problem either, but since you already
speculated about the cause, let me speculate about the solution:
Can we hold the graph writer lock for the tran_commit() call in
bdrv_try_change_aio_context()? And of course take the reader lock for
blk_get_aio_context(), but that should be completely unproblematic.

At the first sight I don't see a reason why this would break something,
but I've learnt not to trust my first impression with the graph locking
work...

Of course, I also didn't check if there are more things inside of the
device emulation that need additional locking in this case, too. But
even if so, blk_get_aio_context() should never see an inconsistent
state.

Kevin




Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread

2024-01-23 Thread Hanna Czenczek

On 21.12.23 22:23, Kevin Wolf wrote:

From: Stefan Hajnoczi

Stop depending on the AioContext lock and instead access
SCSIDevice->requests from only one thread at a time:
- When the VM is running only the BlockBackend's AioContext may access
   the requests list.
- When the VM is stopped only the main loop may access the requests
   list.

These constraints protect the requests list without the need for locking
in the I/O code path.

Note that multiple IOThreads are not supported yet because the code
assumes all SCSIRequests are executed from a single AioContext. Leave
that as future work.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Eric Blake
Message-ID:<20231204164259.1515217-2-stefa...@redhat.com>
Signed-off-by: Kevin Wolf
---
  include/hw/scsi/scsi.h |   7 +-
  hw/scsi/scsi-bus.c | 181 -
  2 files changed, 131 insertions(+), 57 deletions(-)


My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks 
more often because of this commit than because of the original bug, i.e. 
when repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd 
device, this tends to happen when unplugging the scsi-hd:


{"execute":"device_del","arguments":{"id":"stg0"}}
{"return": {}}
qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context: 
Assertion `ctx == blk->ctx' failed.


(gdb) bt
#0  0x7f32a668d83c in  () at /usr/lib/libc.so.6
#1  0x7f32a663d668 in raise () at /usr/lib/libc.so.6
#2  0x7f32a66254b8 in abort () at /usr/lib/libc.so.6
#3  0x7f32a66253dc in  () at /usr/lib/libc.so.6
#4  0x7f32a6635d26 in  () at /usr/lib/libc.so.6
#5  0x556e6b4880a4 in blk_get_aio_context (blk=0x556e6e89ccf0) at 
../block/block-backend.c:2429
#6  blk_get_aio_context (blk=0x556e6e89ccf0) at 
../block/block-backend.c:2417
#7  0x556e6b112d87 in scsi_device_for_each_req_async_bh 
(opaque=0x556e6e2c6d10) at ../hw/scsi/scsi-bus.c:128
#8  0x556e6b5d1966 in aio_bh_poll (ctx=ctx@entry=0x556e6d8aa290) at 
../util/async.c:218
#9  0x556e6b5bb16a in aio_poll (ctx=0x556e6d8aa290, 
blocking=blocking@entry=true) at ../util/aio-posix.c:722
#10 0x556e6b4564b6 in iothread_run 
(opaque=opaque@entry=0x556e6d89d920) at ../iothread.c:63
#11 0x556e6b5bde58 in qemu_thread_start (args=0x556e6d8aa9b0) at 
../util/qemu-thread-posix.c:541

#12 0x7f32a668b9eb in  () at /usr/lib/libc.so.6
#13 0x7f32a670f7cc in  () at /usr/lib/libc.so.6

I don’t know anything about the problem yet, but as usual, I like 
speculation and discovering how wrong I was later on, so one thing I 
came across that’s funny about virtio-scsi is that requests can happen 
even while a disk is being attached or detached.  That is, Linux seems 
to probe all LUNs when a new virtio-scsi device is being attached, and 
it won’t stop just because a disk is being attached or removed.  So 
maybe that’s part of the problem, that we get a request while the BB is 
being detached, and temporarily in an inconsistent state (BDS context 
differs from BB context).


I’ll look more into it.

Hanna

Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler

2024-01-23 Thread Hanna Czenczek

On 02.01.24 16:24, Hanna Czenczek wrote:

[...]

I’ve attached the preliminary patch that I didn’t get to send (or test 
much) last year.  Not sure if it has the same CPU-usage-spike issue 
Fiona was seeing, the only functional difference is that I notify the 
vq after attaching the notifiers instead of before.


It’ll take me some more time to send a patch mail to that effect, 
because now there’s an assertion failure on hotunplug, which bisects 
back to eaad0fe26050c227dc5dad63205835bac4912a51 (“scsi: only access 
SCSIDevice->requests from one thread”):


{"execute":"device_del","arguments":{"id":"stg0"}}
{"return": {}}
qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context: 
Assertion `ctx == blk->ctx' failed.


(gdb) bt
#0  0x7f32a668d83c in  () at /usr/lib/libc.so.6
#1  0x7f32a663d668 in raise () at /usr/lib/libc.so.6
#2  0x7f32a66254b8 in abort () at /usr/lib/libc.so.6
#3  0x7f32a66253dc in  () at /usr/lib/libc.so.6
#4  0x7f32a6635d26 in  () at /usr/lib/libc.so.6
#5  0x556e6b4880a4 in blk_get_aio_context (blk=0x556e6e89ccf0) at 
../block/block-backend.c:2429
#6  blk_get_aio_context (blk=0x556e6e89ccf0) at 
../block/block-backend.c:2417
#7  0x556e6b112d87 in scsi_device_for_each_req_async_bh 
(opaque=0x556e6e2c6d10) at ../hw/scsi/scsi-bus.c:128
#8  0x556e6b5d1966 in aio_bh_poll (ctx=ctx@entry=0x556e6d8aa290) at 
../util/async.c:218
#9  0x556e6b5bb16a in aio_poll (ctx=0x556e6d8aa290, 
blocking=blocking@entry=true) at ../util/aio-posix.c:722
#10 0x556e6b4564b6 in iothread_run 
(opaque=opaque@entry=0x556e6d89d920) at ../iothread.c:63
#11 0x556e6b5bde58 in qemu_thread_start (args=0x556e6d8aa9b0) at 
../util/qemu-thread-posix.c:541

#12 0x7f32a668b9eb in  () at /usr/lib/libc.so.6
#13 0x7f32a670f7cc in  () at /usr/lib/libc.so.6




Re: [PATCH v2 2/4] Avoid conflicting types for 'copy_file_range'

2024-01-23 Thread Manolo de Medici
On Mon, Jan 22, 2024 at 6:04 PM Peter Maydell  wrote:
>
> (Cc'ing the block folks)
>
> On Thu, 18 Jan 2024 at 16:03, Manolo de Medici  
> wrote:
> >
> > Compilation fails on systems where copy_file_range is already defined as a
> > stub.
>
> What do you mean by "stub" here ? If the system headers define
> a prototype for the function, I would have expected the
> meson check to set HAVE_COPY_FILE_RANGE, and then this
> function doesn't get defined at all. That is, the prototype
> mismatch shouldn't matter because if the prototype exists we
> use the libc function, and if it doesn't then we use our version.
>
> > The prototype of copy_file_range in glibc returns an ssize_t, not an off_t.
> >
> > The function currently only exists on linux and freebsd, and in both cases
> > the return type is ssize_t
> >
> > Signed-off-by: Manolo de Medici 
> > ---
> >  block/file-posix.c | 9 +
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 35684f7e21..f744b35642 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -2000,12 +2000,13 @@ static int handle_aiocb_write_zeroes_unmap(void 
> > *opaque)
> >  }
> >
> >  #ifndef HAVE_COPY_FILE_RANGE
> > -static off_t copy_file_range(int in_fd, off_t *in_off, int out_fd,
> > - off_t *out_off, size_t len, unsigned int 
> > flags)
> > +ssize_t copy_file_range (int infd, off_t *pinoff,
> > + int outfd, off_t *poutoff,
> > + size_t length, unsigned int flags)
>
> No space after "copy_file_range". No need to rename all the
> arguments either.

Ok

>
> >  {
> >  #ifdef __NR_copy_file_range
> > -return syscall(__NR_copy_file_range, in_fd, in_off, out_fd,
> > -   out_off, len, flags);
> > +return (ssize_t)syscall(__NR_copy_file_range, infd, pinoff, outfd,
> > +poutoff, length, flags);
>
> Don't need a cast here, because returning the value will
> automatically cast it to the right thing.
>

Ok

> >  #else
> >  errno = ENOSYS;
> >  return -1;
>
> These changes aren't wrong, but as noted above I'm surprised that
> the Hurd gets into this code at all.
>

I think Sergey explained very well why the Hurd its this piece of code

> Note for Kevin: shouldn't this direct use of syscall() have
> some sort of OS-specific guard on it? There's nothing that
> says that a non-Linux host OS has to have the same set of
> arguments to an __NR_copy_file_range syscall. If this
> fallback is a Linux-ism we should guard it appropriately.
>
> For that matter, at what point can we just remove the fallback
> entirely? copy_file_range() went into Linux in 4.5, apparently,
> which is now nearly 8 years old. Maybe all our supported
> hosts now have a new enough kernel and we can drop this
> somewhat ugly syscall() wrapper...

I'd love to remove the syscall wrapper if you give me the ok to do it

Thanks



[PATCH v2 0/2] hw/block/block.c: improve confusing error

2024-01-23 Thread Manos Pitsidianakis
In cases where a device tries to read more bytes than the block device
contains with the blk_check_size_and_read_all() function, the error is
vague: "device requires X bytes, block backend provides Y bytes".

This patch changes the errors of this function to include the block
backend name, the device id and device type name where appropriate.

Version 2:
- Assert dev is not NULL on qdev_get_human_name
(thanks Phil Mathieu-Daudé )

Manos Pitsidianakis (2):
  hw/core/qdev.c: add qdev_get_human_name()
  hw/block/block.c: improve confusing blk_check_size_and_read_all()
error

 hw/block/block.c | 25 +++--
 hw/block/m25p80.c|  3 ++-
 hw/block/pflash_cfi01.c  |  4 ++--
 hw/block/pflash_cfi02.c  |  2 +-
 hw/core/qdev.c   |  8 
 include/hw/block/block.h |  4 ++--
 include/hw/qdev-core.h   | 14 ++
 7 files changed, 44 insertions(+), 16 deletions(-)

Range-diff against v1:
1:  15b15d6d4f ! 1:  5fb5879708 hw/core/qdev.c: add qdev_get_human_name()
@@ hw/core/qdev.c: Object *qdev_get_machine(void)
  
 +char *qdev_get_human_name(DeviceState *dev)
 +{
-+if (!dev) {
-+return g_strdup("");
-+}
++g_assert(dev != NULL);
 +
 +return dev->id ?
 +   g_strdup(dev->id) : object_get_canonical_path(OBJECT(dev));
@@ include/hw/qdev-core.h: const char *qdev_fw_name(DeviceState *dev);
  
 +/**
 + * qdev_get_human_name() - Return a human-readable name for a device
-+ * @dev: The device
++ * @dev: The device. Must be a valid and non-NULL pointer.
 + *
 + * .. note::
 + *This function is intended for user friendly error messages.
 + *
 + * Returns: A newly allocated string containing the device id if not null,
-+ * else the object canonical path if not null. If @dev is NULL, it 
returns an
-+ * allocated empty string.
++ * else the object canonical path.
 + *
 + * Use g_free() to free it.
 + */
2:  e3701762ed ! 2:  8e7eb17fbd hw/block/block.c: improve confusing 
blk_check_size_and_read_all() error
@@ Commit message
 This patch changes the errors of this function to include the block
 backend name, the device id and device type name where appropriate.
 
+Reviewed-by: Philippe Mathieu-Daudé 
 Signed-off-by: Manos Pitsidianakis 
 
  ## hw/block/block.c ##

base-commit: 09be34717190c1620f0c6e5c8765b8da354aeb4b
-- 
γαῖα πυρί μιχθήτω




[PATCH v2 2/2] hw/block/block.c: improve confusing blk_check_size_and_read_all() error

2024-01-23 Thread Manos Pitsidianakis
In cases where a device tries to read more bytes than the block device
contains, the error is vague: "device requires X bytes, block backend
provides Y bytes".

This patch changes the errors of this function to include the block
backend name, the device id and device type name where appropriate.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Manos Pitsidianakis 
---
 hw/block/block.c | 25 +++--
 hw/block/m25p80.c|  3 ++-
 hw/block/pflash_cfi01.c  |  4 ++--
 hw/block/pflash_cfi02.c  |  2 +-
 include/hw/block/block.h |  4 ++--
 5 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/hw/block/block.c b/hw/block/block.c
index 9f52ee6e72..624389d62d 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -54,29 +54,30 @@ static int blk_pread_nonzeroes(BlockBackend *blk, hwaddr 
size, void *buf)
  * BDRV_REQUEST_MAX_BYTES.
  * On success, return true.
  * On failure, store an error through @errp and return false.
- * Note that the error messages do not identify the block backend.
- * TODO Since callers don't either, this can result in confusing
- * errors.
+ *
  * This function not intended for actual block devices, which read on
  * demand.  It's for things like memory devices that (ab)use a block
  * backend to provide persistence.
  */
-bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
- Error **errp)
+bool blk_check_size_and_read_all(BlockBackend *blk, DeviceState *dev,
+ void *buf, hwaddr size, Error **errp)
 {
 int64_t blk_len;
 int ret;
+g_autofree char *dev_id = NULL;
 
 blk_len = blk_getlength(blk);
 if (blk_len < 0) {
 error_setg_errno(errp, -blk_len,
- "can't get size of block backend");
+ "can't get size of %s block backend", blk_name(blk));
 return false;
 }
 if (blk_len != size) {
-error_setg(errp, "device requires %" HWADDR_PRIu " bytes, "
-   "block backend provides %" PRIu64 " bytes",
-   size, blk_len);
+dev_id = qdev_get_human_name(dev);
+error_setg(errp, "%s device with id='%s' requires %" HWADDR_PRIu
+   " bytes, %s block backend provides %" PRIu64 " bytes",
+   object_get_typename(OBJECT(dev)), dev_id, size,
+   blk_name(blk), blk_len);
 return false;
 }
 
@@ -89,7 +90,11 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void 
*buf, hwaddr size,
 assert(size <= BDRV_REQUEST_MAX_BYTES);
 ret = blk_pread_nonzeroes(blk, size, buf);
 if (ret < 0) {
-error_setg_errno(errp, -ret, "can't read block backend");
+dev_id = qdev_get_human_name(dev);
+error_setg_errno(errp, -ret, "can't read %s block backend"
+ "for %s device with id='%s'",
+ blk_name(blk), object_get_typename(OBJECT(dev)),
+ dev_id);
 return false;
 }
 return true;
diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 26ce895628..0a12030a3a 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -1617,7 +1617,8 @@ static void m25p80_realize(SSIPeripheral *ss, Error 
**errp)
 trace_m25p80_binding(s);
 s->storage = blk_blockalign(s->blk, s->size);
 
-if (!blk_check_size_and_read_all(s->blk, s->storage, s->size, errp)) {
+if (!blk_check_size_and_read_all(s->blk, DEVICE(s),
+ s->storage, s->size, errp)) {
 return;
 }
 } else {
diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index f956f8bcf7..1bda8424b9 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -848,8 +848,8 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 }
 
 if (pfl->blk) {
-if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
- errp)) {
+if (!blk_check_size_and_read_all(pfl->blk, dev, pfl->storage,
+ total_len, errp)) {
 vmstate_unregister_ram(>mem, DEVICE(pfl));
 return;
 }
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index 6fa56f14c0..2314142373 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -902,7 +902,7 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
**errp)
 }
 
 if (pfl->blk) {
-if (!blk_check_size_and_read_all(pfl->blk, pfl->storage,
+if (!blk_check_size_and_read_all(pfl->blk, dev, pfl->storage,
  pfl->chip_len, errp)) {
 vmstate_unregister_ram(>orig_mem, DEVICE(pfl));
 return;
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 15fff66435..de3946a5f1 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -88,8 +88,8 @@ 

[PATCH v2 1/2] hw/core/qdev.c: add qdev_get_human_name()

2024-01-23 Thread Manos Pitsidianakis
Add a simple method to return some kind of human readable identifier for
use in error messages.

Signed-off-by: Manos Pitsidianakis 
---
 hw/core/qdev.c |  8 
 include/hw/qdev-core.h | 14 ++
 2 files changed, 22 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 43d863b0c5..c68d0f7c51 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -879,6 +879,14 @@ Object *qdev_get_machine(void)
 return dev;
 }
 
+char *qdev_get_human_name(DeviceState *dev)
+{
+g_assert(dev != NULL);
+
+return dev->id ?
+   g_strdup(dev->id) : object_get_canonical_path(OBJECT(dev));
+}
+
 static MachineInitPhase machine_phase;
 
 bool phase_check(MachineInitPhase phase)
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 151d968238..66338f479f 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -993,6 +993,20 @@ const char *qdev_fw_name(DeviceState *dev);
 void qdev_assert_realized_properly(void);
 Object *qdev_get_machine(void);
 
+/**
+ * qdev_get_human_name() - Return a human-readable name for a device
+ * @dev: The device. Must be a valid and non-NULL pointer.
+ *
+ * .. note::
+ *This function is intended for user friendly error messages.
+ *
+ * Returns: A newly allocated string containing the device id if not null,
+ * else the object canonical path.
+ *
+ * Use g_free() to free it.
+ */
+char *qdev_get_human_name(DeviceState *dev);
+
 /* FIXME: make this a link<> */
 bool qdev_set_parent_bus(DeviceState *dev, BusState *bus, Error **errp);
 
-- 
γαῖα πυρί μιχθήτω




NVME hotplug support ?

2024-01-23 Thread Damien Hedde
Hi all,

We are currently looking into hotplugging nvme devices and it is currently not 
possible:
When nvme was introduced 2 years ago, the feature was disabled.
> commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
> Author: Klaus Jensen 
> Date:   Tue Jul 6 10:48:40 2021 +0200
>
>hw/nvme: mark nvme-subsys non-hotpluggable
>
>We currently lack the infrastructure to handle subsystem hotplugging, so
>disable it.

Do someone know what's lacking or anyone have some tips/idea of what we should 
develop to add the support ?

Regards,
--
Damien 







Re: [PATCH] block/io_uring: improve error message when init fails

2024-01-23 Thread Stefan Hajnoczi
On Tue, Jan 23, 2024 at 02:50:44PM +0100, Fiona Ebner wrote:
> The man page for io_uring_queue_init states:
> 
> > io_uring_queue_init(3) returns 0 on success and -errno on failure.
> 
> and the man page for io_uring_setup (which is one of the functions
> where the return value of io_uring_queue_init() can come from) states:
> 
> > On error, a negative error code is returned. The caller should not
> > rely on errno variable.
> 
> Tested using 'sysctl kernel.io_uring_disabled=2'. Output before this
> change:
> 
> > failed to init linux io_uring ring
> 
> Output after this change:
> 
> > failed to init linux io_uring ring: Operation not permitted
> 
> Signed-off-by: Fiona Ebner 
> ---
>  block/io_uring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, applied to my master tree:
https://gitlab.com/stefanha/qemu/commits/master

Stefan


signature.asc
Description: PGP signature


[PATCH] block/io_uring: improve error message when init fails

2024-01-23 Thread Fiona Ebner
The man page for io_uring_queue_init states:

> io_uring_queue_init(3) returns 0 on success and -errno on failure.

and the man page for io_uring_setup (which is one of the functions
where the return value of io_uring_queue_init() can come from) states:

> On error, a negative error code is returned. The caller should not
> rely on errno variable.

Tested using 'sysctl kernel.io_uring_disabled=2'. Output before this
change:

> failed to init linux io_uring ring

Output after this change:

> failed to init linux io_uring ring: Operation not permitted

Signed-off-by: Fiona Ebner 
---
 block/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index d77ae55745..d11b2051ab 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -432,7 +432,7 @@ LuringState *luring_init(Error **errp)
 
 rc = io_uring_queue_init(MAX_ENTRIES, ring, 0);
 if (rc < 0) {
-error_setg_errno(errp, errno, "failed to init linux io_uring ring");
+error_setg_errno(errp, -rc, "failed to init linux io_uring ring");
 g_free(s);
 return NULL;
 }
-- 
2.39.2





Re: NVME hotplug support ?

2024-01-23 Thread Hannes Reinecke

On 1/23/24 11:59, Damien Hedde wrote:

Hi all,

We are currently looking into hotplugging nvme devices and it is currently not 
possible:
When nvme was introduced 2 years ago, the feature was disabled.

commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
Author: Klaus Jensen
Date:   Tue Jul 6 10:48:40 2021 +0200

hw/nvme: mark nvme-subsys non-hotpluggable

We currently lack the infrastructure to handle subsystem hotplugging, so

disable it.


Do someone know what's lacking or anyone have some tips/idea of what we should 
develop to add the support ?

Problem is that the object model is messed up. In qemu namespaces are 
attached to controllers, which in turn are children of the PCI device.

There are subsystems, but these just reference the controller.

So if you hotunplug the PCI device you detach/destroy the controller and 
detach the namespaces from the controller.
But if you hotplug the PCI device again the NVMe controller will be 
attached to the PCI device, but the namespace are still be detached.


Klaus said he was going to fix that, and I dimly remember some patches
floating around. But apparently it never went anywhere.

Fundamental problem is that the NVMe hierarchy as per spec is 
incompatible with the qemu object model; qemu requires a strict

tree model where every object has exactly _one_ parent.

Cheers,

Hannes




Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler

2024-01-23 Thread Hanna Czenczek

On 23.01.24 12:12, Fiona Ebner wrote:

[...]


I noticed poll_set_started() is not called, because
ctx->fdmon_ops->need_wait(ctx) was true, i.e. ctx->poll_disable_cnt was
positive (I'm using fdmon_poll). I then found this is because of the
notifier for the event vq, being attached with


virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);

in virtio_scsi_dataplane_start(). But in virtio_scsi_drained_end() it is
attached with virtio_queue_aio_attach_host_notifier() instead of the
_no_poll() variant. So that might be the actual issue here?

 From a quick test, I cannot see the CPU-usage-spike issue with the
following either:


diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 690aceec45..ba1ab8e410 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1166,7 +1166,15 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
  
  for (uint32_t i = 0; i < total_queues; i++) {

  VirtQueue *vq = virtio_get_queue(vdev, i);
-virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+if (!virtio_queue_get_notification(vq)) {
+virtio_queue_set_notification(vq, true);
+}
+if (vq == VIRTIO_SCSI_COMMON(s)->event_vq) {
+virtio_queue_aio_attach_host_notifier_no_poll(vq, s->ctx);
+} else {
+virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+}
+virtio_queue_notify(vdev, i);
  }
  }


Perfect, so we agree on trying it that way. :)

Hanna

Re: NVME hotplug support ?

2024-01-23 Thread Klaus Jensen
On Jan 23 10:59, Damien Hedde wrote:
> Hi all,
> 
> We are currently looking into hotplugging nvme devices and it is currently 
> not possible:
> When nvme was introduced 2 years ago, the feature was disabled.
> > commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
> > Author: Klaus Jensen 
> > Date:   Tue Jul 6 10:48:40 2021 +0200
> >
> >hw/nvme: mark nvme-subsys non-hotpluggable
> >
> >We currently lack the infrastructure to handle subsystem hotplugging, so
> >disable it.
> 
> Do someone know what's lacking or anyone have some tips/idea of what we 
> should develop to add the support ?
> 
> Regards,
> --
> Damien 
> 

That's not entirely true.

The *subsystem* is non-hotpluggable, but individual controllers can be
hotplugged. Even into an existing subsystem.

However, you cannot hotplug pci devices unless you set up a pcie root
port. Say,

  -device "pcie-root-port,id=pcie_root_port0,chassis=1,slot=0"
  -device "nvme,id=nvme0,serial=nvme0,bus=pcie_root_port0"

nvme0 can then be removed with device_del and added back as a new device
with device_add.


signature.asc
Description: PGP signature


Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler

2024-01-23 Thread Hanna Czenczek

On 22.01.24 18:52, Hanna Czenczek wrote:

On 22.01.24 18:41, Hanna Czenczek wrote:

On 05.01.24 15:30, Fiona Ebner wrote:

Am 05.01.24 um 14:43 schrieb Fiona Ebner:

Am 03.01.24 um 14:35 schrieb Paolo Bonzini:

On 1/3/24 12:40, Fiona Ebner wrote:
I'm happy to report that I cannot reproduce the CPU-usage-spike 
issue
with the patch, but I did run into an assertion failure when 
trying to
verify that it fixes my original stuck-guest-IO issue. See below 
for the
backtrace [0]. Hanna wrote in 
https://issues.redhat.com/browse/RHEL-3934



I think it’s sufficient to simply call virtio_queue_notify_vq(vq)
after the virtio_queue_aio_attach_host_notifier(vq, ctx) call, 
because
both virtio-scsi’s and virtio-blk’s .handle_output() 
implementations
acquire the device’s context, so this should be directly 
callable from

any context.

I guess this is not true anymore now that the AioContext locking was
removed?

Good point and, in fact, even before it was much safer to use
virtio_queue_notify() instead.  Not only does it use the event 
notifier

handler, but it also calls it in the right thread/AioContext just by
doing event_notifier_set().


But with virtio_queue_notify() using the event notifier, the
CPU-usage-spike issue is present:

Back to the CPU-usage-spike issue: I experimented around and it 
doesn't
seem to matter whether I notify the virt queue before or after 
attaching

the notifiers. But there's another functional difference. My patch
called virtio_queue_notify() which contains this block:


 if (vq->host_notifier_enabled) {
event_notifier_set(>host_notifier);
 } else if (vq->handle_output) {
 vq->handle_output(vdev, vq);
In my testing, the first branch was taken, calling 
event_notifier_set().

Hanna's patch uses virtio_queue_notify_vq() and there,
vq->handle_output() will be called. That seems to be the relevant
difference regarding the CPU-usage-spike issue.
I should mention that this is with a VirtIO SCSI disk. I also 
attempted

to reproduce the CPU-usage-spike issue with a VirtIO block disk, but
didn't manage yet.

What I noticed is that in virtio_queue_host_notifier_aio_poll(), 
one of

the queues (but only one) will always show as nonempty. And then,
run_poll_handlers_once() will always detect progress which explains 
the

CPU usage.

The following shows
1. vq address
2. number of times vq was passed to 
virtio_queue_host_notifier_aio_poll()

3. number of times the result of virtio_queue_host_notifier_aio_poll()
was true for the vq


0x555fd93f9c80 17162000 0
0x555fd93f9e48 17162000 6
0x555fd93f9ee0 17162000 0
0x555fd93f9d18 17162000 17162000
0x555fd93f9db0 17162000 0
0x555fd93f9f78 17162000 0

And for the problematic one, the reason it is seen as nonempty is:


0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0

vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and
s->events_dropped is false in my testing, so
virtio_scsi_handle_event_vq() doesn't do anything.


Those values stay like this while the call counts above increase.

So something going wrong with the indices when the event notifier 
is set

from QEMU side (in the main thread)?

The guest is Debian 12 with a 6.1 kernel.


So, trying to figure out a new RFC version:

About the stack trace you, Fiona, posted:  As far as I understand, 
that happens because virtio_blk_drained_end() calling 
virtio_queue_notify_vq() wasn’t safe after all, and instead we need 
to use virtio_queue_notify().  Right?


However, you say using virtio_queue_notify() instead causes busy 
loops of doing nothing in virtio-scsi (what you describe above). I 
mean, better than a crash, but, you know. :)


I don’t know have any prior knowledge about the event handling, 
unfortunately.  The fact that 8 buffers are available but we don’t 
use any sounds OK to me; as far as I understand, we only use those 
buffers if we have any events to push into them, so as long as we 
don’t, we won’t.  Question is, should we not have its poll handler 
return false if we don’t have any events (i.e. events_dropped is 
false)?  Would that solve it?


Or actually, maybe we could just skip the virtio_queue_notify() call 
for the event vq?  I.e. have it be `if (vq != 
VIRTIO_SCSI_COMMON(s)->event_vq) { virtio_queue_notify(vdev, i); }`?  
I wouldn’t like that very much, (a) because this would make it 
slightly cumbersome to put that into 
virtio_queue_aio_attach_host_notifier*(), and (b) in case that does 
fix it, I do kind of feel like the real problem is that we use 
virtio_queue_host_notifier_aio_poll() for the event vq, which tells 
the polling code to poll whenever the vq is non-empty, but we (AFAIU) 
expect the event vq to be non-empty all the time.


Turns out there was commit 38738f7dbbda90fbc161757b7f4be35b52205552 
(“virtio-scsi: don't waste CPU polling the event virtqueue”) by Stefan, 
which specifically intended to not use 
virtio_queue_host_notifier_aio_poll() for the event vq.  I think the 
problem is that virtio_scsi_drained_end() should have taken 

Re: [RFC 0/3] aio-posix: call ->poll_end() when removing AioHandler

2024-01-23 Thread Fiona Ebner
Am 22.01.24 um 18:52 schrieb Hanna Czenczek:
> On 22.01.24 18:41, Hanna Czenczek wrote:
>> On 05.01.24 15:30, Fiona Ebner wrote:
>>> Am 05.01.24 um 14:43 schrieb Fiona Ebner:
 Am 03.01.24 um 14:35 schrieb Paolo Bonzini:
> On 1/3/24 12:40, Fiona Ebner wrote:
>> I'm happy to report that I cannot reproduce the CPU-usage-spike issue
>> with the patch, but I did run into an assertion failure when
>> trying to
>> verify that it fixes my original stuck-guest-IO issue. See below
>> for the
>> backtrace [0]. Hanna wrote in
>> https://issues.redhat.com/browse/RHEL-3934
>>
>>> I think it’s sufficient to simply call virtio_queue_notify_vq(vq)
>>> after the virtio_queue_aio_attach_host_notifier(vq, ctx) call,
>>> because
>>> both virtio-scsi’s and virtio-blk’s .handle_output() implementations
>>> acquire the device’s context, so this should be directly callable
>>> from
>>> any context.
>> I guess this is not true anymore now that the AioContext locking was
>> removed?
> Good point and, in fact, even before it was much safer to use
> virtio_queue_notify() instead.  Not only does it use the event
> notifier
> handler, but it also calls it in the right thread/AioContext just by
> doing event_notifier_set().
>
 But with virtio_queue_notify() using the event notifier, the
 CPU-usage-spike issue is present:

>> Back to the CPU-usage-spike issue: I experimented around and it
>> doesn't
>> seem to matter whether I notify the virt queue before or after
>> attaching
>> the notifiers. But there's another functional difference. My patch
>> called virtio_queue_notify() which contains this block:
>>
>>>  if (vq->host_notifier_enabled) {
>>>  event_notifier_set(>host_notifier);
>>>  } else if (vq->handle_output) {
>>>  vq->handle_output(vdev, vq);
>> In my testing, the first branch was taken, calling
>> event_notifier_set().
>> Hanna's patch uses virtio_queue_notify_vq() and there,
>> vq->handle_output() will be called. That seems to be the relevant
>> difference regarding the CPU-usage-spike issue.
 I should mention that this is with a VirtIO SCSI disk. I also attempted
 to reproduce the CPU-usage-spike issue with a VirtIO block disk, but
 didn't manage yet.

 What I noticed is that in virtio_queue_host_notifier_aio_poll(), one of
 the queues (but only one) will always show as nonempty. And then,
 run_poll_handlers_once() will always detect progress which explains the
 CPU usage.

 The following shows
 1. vq address
 2. number of times vq was passed to
 virtio_queue_host_notifier_aio_poll()
 3. number of times the result of virtio_queue_host_notifier_aio_poll()
 was true for the vq

> 0x555fd93f9c80 17162000 0
> 0x555fd93f9e48 17162000 6
> 0x555fd93f9ee0 17162000 0
> 0x555fd93f9d18 17162000 17162000
> 0x555fd93f9db0 17162000 0
> 0x555fd93f9f78 17162000 0
 And for the problematic one, the reason it is seen as nonempty is:

> 0x555fd93f9d18 shadow_avail_idx 8 last_avail_idx 0
>>> vring_avail_idx(vq) also gives 8 here. This is the vs->event_vq and
>>> s->events_dropped is false in my testing, so
>>> virtio_scsi_handle_event_vq() doesn't do anything.
>>>
 Those values stay like this while the call counts above increase.

 So something going wrong with the indices when the event notifier is
 set
 from QEMU side (in the main thread)?

 The guest is Debian 12 with a 6.1 kernel.
>>
>> So, trying to figure out a new RFC version:
>>
>> About the stack trace you, Fiona, posted:  As far as I understand,
>> that happens because virtio_blk_drained_end() calling
>> virtio_queue_notify_vq() wasn’t safe after all, and instead we need to
>> use virtio_queue_notify().  Right?
>>

AFAICT, yes. In particular, after 4f36b13847 ("scsi: remove AioContext
locking"), the AioContext is not acquired by
virtio_scsi_handle_{cmd,ctrl,event} anymore.

>> However, you say using virtio_queue_notify() instead causes busy loops
>> of doing nothing in virtio-scsi (what you describe above). I mean,
>> better than a crash, but, you know. :)

Yes, that happens for me when using virtio_queue_notify(), i.e.

> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 690aceec45..8cdf04ac2d 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1166,7 +1166,11 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
>  
>  for (uint32_t i = 0; i < total_queues; i++) {
>  VirtQueue *vq = virtio_get_queue(vdev, i);
> +if (!virtio_queue_get_notification(vq)) {
> +virtio_queue_set_notification(vq, true);
> +}
>  virtio_queue_aio_attach_host_notifier(vq, s->ctx);
> +virtio_queue_notify(vdev, i);
>  }
>  }
>  


>>
>> I don’t know have any prior knowledge about the event 

Re: [PATCH 1/2] hw/core/qdev.c: add qdev_get_human_name()

2024-01-23 Thread Philippe Mathieu-Daudé

On 23/1/24 09:15, Manos Pitsidianakis wrote:
On Tue, 23 Jan 2024 10:13, Philippe Mathieu-Daudé  
wrote:

Hi Manos,

On 23/1/24 09:09, Manos Pitsidianakis wrote:

Add a simple method to return some kind of human readable identifier for
use in error messages.

Signed-off-by: Manos Pitsidianakis 
---
  hw/core/qdev.c | 10 ++
  include/hw/qdev-core.h | 15 +++
  2 files changed, 25 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 43d863b0c5..499f191826 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -879,6 +879,16 @@ Object *qdev_get_machine(void)
  return dev;
  }
+char *qdev_get_human_name(DeviceState *dev)
+{
+    if (!dev) {
+    return g_strdup("");
+    }
+
+    return dev->id ?
+   g_strdup(dev->id) : object_get_canonical_path(OBJECT(dev));
+}
+
  static MachineInitPhase machine_phase;
  bool phase_check(MachineInitPhase phase)
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 151d968238..a8c742b4a3 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -993,6 +993,21 @@ const char *qdev_fw_name(DeviceState *dev);
  void qdev_assert_realized_properly(void);
  Object *qdev_get_machine(void);
+/**
+ * qdev_get_human_name() - Return a human-readable name for a device
+ * @dev: The device
+ *
+ * .. note::
+ *    This function is intended for user friendly error messages.
+ *
+ * Returns: A newly allocated string containing the device id if not 
null,
+ * else the object canonical path if not null. If @dev is NULL, it 
returns an

+ * allocated empty string.


In which case do we want to call this with NULL?


None I could think of, just future-proofing the NULL case.


I'd rather have a raw assert() as future-proof API, avoiding
dubious corner cases :)




+ *
+ * Use g_free() to free it.
+ */
+char *qdev_get_human_name(DeviceState *dev);
+
  /* FIXME: make this a link<> */
  bool qdev_set_parent_bus(DeviceState *dev, BusState *bus, Error 
**errp);







Re: [PATCH 1/2] hw/core/qdev.c: add qdev_get_human_name()

2024-01-23 Thread Manos Pitsidianakis

On Tue, 23 Jan 2024 10:13, Philippe Mathieu-Daudé  wrote:

Hi Manos,

On 23/1/24 09:09, Manos Pitsidianakis wrote:

Add a simple method to return some kind of human readable identifier for
use in error messages.

Signed-off-by: Manos Pitsidianakis 
---
  hw/core/qdev.c | 10 ++
  include/hw/qdev-core.h | 15 +++
  2 files changed, 25 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 43d863b0c5..499f191826 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -879,6 +879,16 @@ Object *qdev_get_machine(void)
  return dev;
  }
  
+char *qdev_get_human_name(DeviceState *dev)

+{
+if (!dev) {
+return g_strdup("");
+}
+
+return dev->id ?
+   g_strdup(dev->id) : object_get_canonical_path(OBJECT(dev));
+}
+
  static MachineInitPhase machine_phase;
  
  bool phase_check(MachineInitPhase phase)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 151d968238..a8c742b4a3 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -993,6 +993,21 @@ const char *qdev_fw_name(DeviceState *dev);
  void qdev_assert_realized_properly(void);
  Object *qdev_get_machine(void);
  
+/**

+ * qdev_get_human_name() - Return a human-readable name for a device
+ * @dev: The device
+ *
+ * .. note::
+ *This function is intended for user friendly error messages.
+ *
+ * Returns: A newly allocated string containing the device id if not null,
+ * else the object canonical path if not null. If @dev is NULL, it returns an
+ * allocated empty string.


In which case do we want to call this with NULL?


None I could think of, just future-proofing the NULL case.



+ *
+ * Use g_free() to free it.
+ */
+char *qdev_get_human_name(DeviceState *dev);
+
  /* FIXME: make this a link<> */
  bool qdev_set_parent_bus(DeviceState *dev, BusState *bus, Error **errp);
  






Re: [PATCH 1/2] hw/core/qdev.c: add qdev_get_human_name()

2024-01-23 Thread Philippe Mathieu-Daudé

Hi Manos,

On 23/1/24 09:09, Manos Pitsidianakis wrote:

Add a simple method to return some kind of human readable identifier for
use in error messages.

Signed-off-by: Manos Pitsidianakis 
---
  hw/core/qdev.c | 10 ++
  include/hw/qdev-core.h | 15 +++
  2 files changed, 25 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 43d863b0c5..499f191826 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -879,6 +879,16 @@ Object *qdev_get_machine(void)
  return dev;
  }
  
+char *qdev_get_human_name(DeviceState *dev)

+{
+if (!dev) {
+return g_strdup("");
+}
+
+return dev->id ?
+   g_strdup(dev->id) : object_get_canonical_path(OBJECT(dev));
+}
+
  static MachineInitPhase machine_phase;
  
  bool phase_check(MachineInitPhase phase)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 151d968238..a8c742b4a3 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -993,6 +993,21 @@ const char *qdev_fw_name(DeviceState *dev);
  void qdev_assert_realized_properly(void);
  Object *qdev_get_machine(void);
  
+/**

+ * qdev_get_human_name() - Return a human-readable name for a device
+ * @dev: The device
+ *
+ * .. note::
+ *This function is intended for user friendly error messages.
+ *
+ * Returns: A newly allocated string containing the device id if not null,
+ * else the object canonical path if not null. If @dev is NULL, it returns an
+ * allocated empty string.


In which case do we want to call this with NULL?


+ *
+ * Use g_free() to free it.
+ */
+char *qdev_get_human_name(DeviceState *dev);
+
  /* FIXME: make this a link<> */
  bool qdev_set_parent_bus(DeviceState *dev, BusState *bus, Error **errp);
  





Re: [PATCH 2/2] hw/block/block.c: improve confusing blk_check_size_and_read_all() error

2024-01-23 Thread Philippe Mathieu-Daudé

On 23/1/24 09:09, Manos Pitsidianakis wrote:

In cases where a device tries to read more bytes than the block device
contains, the error is vague: "device requires X bytes, block backend
provides Y bytes".

This patch changes the errors of this function to include the block
backend name, the device id and device type name where appropriate.

Signed-off-by: Manos Pitsidianakis 
---
  hw/block/block.c | 25 +++--
  hw/block/m25p80.c|  3 ++-
  hw/block/pflash_cfi01.c  |  4 ++--
  hw/block/pflash_cfi02.c  |  2 +-
  include/hw/block/block.h |  4 ++--
  5 files changed, 22 insertions(+), 16 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




[PATCH 2/2] hw/block/block.c: improve confusing blk_check_size_and_read_all() error

2024-01-23 Thread Manos Pitsidianakis
In cases where a device tries to read more bytes than the block device
contains, the error is vague: "device requires X bytes, block backend
provides Y bytes".

This patch changes the errors of this function to include the block
backend name, the device id and device type name where appropriate.

Signed-off-by: Manos Pitsidianakis 
---
 hw/block/block.c | 25 +++--
 hw/block/m25p80.c|  3 ++-
 hw/block/pflash_cfi01.c  |  4 ++--
 hw/block/pflash_cfi02.c  |  2 +-
 include/hw/block/block.h |  4 ++--
 5 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/hw/block/block.c b/hw/block/block.c
index 9f52ee6e72..624389d62d 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -54,29 +54,30 @@ static int blk_pread_nonzeroes(BlockBackend *blk, hwaddr 
size, void *buf)
  * BDRV_REQUEST_MAX_BYTES.
  * On success, return true.
  * On failure, store an error through @errp and return false.
- * Note that the error messages do not identify the block backend.
- * TODO Since callers don't either, this can result in confusing
- * errors.
+ *
  * This function not intended for actual block devices, which read on
  * demand.  It's for things like memory devices that (ab)use a block
  * backend to provide persistence.
  */
-bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
- Error **errp)
+bool blk_check_size_and_read_all(BlockBackend *blk, DeviceState *dev,
+ void *buf, hwaddr size, Error **errp)
 {
 int64_t blk_len;
 int ret;
+g_autofree char *dev_id = NULL;
 
 blk_len = blk_getlength(blk);
 if (blk_len < 0) {
 error_setg_errno(errp, -blk_len,
- "can't get size of block backend");
+ "can't get size of %s block backend", blk_name(blk));
 return false;
 }
 if (blk_len != size) {
-error_setg(errp, "device requires %" HWADDR_PRIu " bytes, "
-   "block backend provides %" PRIu64 " bytes",
-   size, blk_len);
+dev_id = qdev_get_human_name(dev);
+error_setg(errp, "%s device with id='%s' requires %" HWADDR_PRIu
+   " bytes, %s block backend provides %" PRIu64 " bytes",
+   object_get_typename(OBJECT(dev)), dev_id, size,
+   blk_name(blk), blk_len);
 return false;
 }
 
@@ -89,7 +90,11 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void 
*buf, hwaddr size,
 assert(size <= BDRV_REQUEST_MAX_BYTES);
 ret = blk_pread_nonzeroes(blk, size, buf);
 if (ret < 0) {
-error_setg_errno(errp, -ret, "can't read block backend");
+dev_id = qdev_get_human_name(dev);
+error_setg_errno(errp, -ret, "can't read %s block backend"
+ "for %s device with id='%s'",
+ blk_name(blk), object_get_typename(OBJECT(dev)),
+ dev_id);
 return false;
 }
 return true;
diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 26ce895628..0a12030a3a 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -1617,7 +1617,8 @@ static void m25p80_realize(SSIPeripheral *ss, Error 
**errp)
 trace_m25p80_binding(s);
 s->storage = blk_blockalign(s->blk, s->size);
 
-if (!blk_check_size_and_read_all(s->blk, s->storage, s->size, errp)) {
+if (!blk_check_size_and_read_all(s->blk, DEVICE(s),
+ s->storage, s->size, errp)) {
 return;
 }
 } else {
diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index f956f8bcf7..1bda8424b9 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -848,8 +848,8 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 }
 
 if (pfl->blk) {
-if (!blk_check_size_and_read_all(pfl->blk, pfl->storage, total_len,
- errp)) {
+if (!blk_check_size_and_read_all(pfl->blk, dev, pfl->storage,
+ total_len, errp)) {
 vmstate_unregister_ram(>mem, DEVICE(pfl));
 return;
 }
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index 6fa56f14c0..2314142373 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -902,7 +902,7 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
**errp)
 }
 
 if (pfl->blk) {
-if (!blk_check_size_and_read_all(pfl->blk, pfl->storage,
+if (!blk_check_size_and_read_all(pfl->blk, dev, pfl->storage,
  pfl->chip_len, errp)) {
 vmstate_unregister_ram(>orig_mem, DEVICE(pfl));
 return;
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 15fff66435..de3946a5f1 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -88,8 +88,8 @@ static inline unsigned int 

[PATCH 1/2] hw/core/qdev.c: add qdev_get_human_name()

2024-01-23 Thread Manos Pitsidianakis
Add a simple method to return some kind of human readable identifier for
use in error messages.

Signed-off-by: Manos Pitsidianakis 
---
 hw/core/qdev.c | 10 ++
 include/hw/qdev-core.h | 15 +++
 2 files changed, 25 insertions(+)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 43d863b0c5..499f191826 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -879,6 +879,16 @@ Object *qdev_get_machine(void)
 return dev;
 }
 
+char *qdev_get_human_name(DeviceState *dev)
+{
+if (!dev) {
+return g_strdup("");
+}
+
+return dev->id ?
+   g_strdup(dev->id) : object_get_canonical_path(OBJECT(dev));
+}
+
 static MachineInitPhase machine_phase;
 
 bool phase_check(MachineInitPhase phase)
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 151d968238..a8c742b4a3 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -993,6 +993,21 @@ const char *qdev_fw_name(DeviceState *dev);
 void qdev_assert_realized_properly(void);
 Object *qdev_get_machine(void);
 
+/**
+ * qdev_get_human_name() - Return a human-readable name for a device
+ * @dev: The device
+ *
+ * .. note::
+ *This function is intended for user friendly error messages.
+ *
+ * Returns: A newly allocated string containing the device id if not null,
+ * else the object canonical path if not null. If @dev is NULL, it returns an
+ * allocated empty string.
+ *
+ * Use g_free() to free it.
+ */
+char *qdev_get_human_name(DeviceState *dev);
+
 /* FIXME: make this a link<> */
 bool qdev_set_parent_bus(DeviceState *dev, BusState *bus, Error **errp);
 
-- 
γαῖα πυρί μιχθήτω




[PATCH 0/2] hw/block/block.c: improve confusing error

2024-01-23 Thread Manos Pitsidianakis
In cases where a device tries to read more bytes than the block device 
contains with the blk_check_size_and_read_all() function, the error is 
vague: "device requires X bytes, block backend provides Y bytes".

This patch changes the errors of this function to include the block
backend name, the device id and device type name where appropriate.

Manos Pitsidianakis (2):
  hw/core/qdev.c: add qdev_get_human_name()
  hw/block/block.c: improve confusing blk_check_size_and_read_all()
error

 hw/block/block.c | 25 +++--
 hw/block/m25p80.c|  3 ++-
 hw/block/pflash_cfi01.c  |  4 ++--
 hw/block/pflash_cfi02.c  |  2 +-
 hw/core/qdev.c   | 10 ++
 include/hw/block/block.h |  4 ++--
 include/hw/qdev-core.h   | 15 +++
 7 files changed, 47 insertions(+), 16 deletions(-)


base-commit: 09be34717190c1620f0c6e5c8765b8da354aeb4b
-- 
γαῖα πυρί μιχθήτω