Re: [PATCH] tests/vm: update openbsd to release 6.8

2020-11-06 Thread Brad Smith

ping.

On 10/27/2020 6:22 AM, Brad Smith wrote:

On Tue, Oct 27, 2020 at 11:05:20AM +0100, Philippe Mathieu-Daudé wrote:

On 10/27/20 6:30 AM, Brad Smith wrote:

tests/vm: update openbsd to release 6.8

A double dash at the end of a package name removes ambiguity
when the intent is to install a non-FLAVORed package.

Signed-off-by: Brad Smith 
Reviewed-by: Gerd Hoffmann 
Tested-by: Gerd Hoffmann 
Reviewed-by: Philippe Mathieu-Daudé 

I confirm Brad sent us this patch off-list, and
- our review comments are addressed,
- the tags are correct.

The patch format itself seems broken... Like a copy/paste
into an email client...

Well, git diff vs a format-patch.


Subject: [PATCH] tests/vm: update openbsd to release 6.8

A double dash at the end of a package name removes ambiguity
when the intent is to install a non-FLAVORed package.

Signed-off-by: Brad Smith 
Reviewed-by: Gerd Hoffmann 
Tested-by: Gerd Hoffmann 
Reviewed-by: Philippe Mathieu-Daudé 
---
  tests/vm/openbsd | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/vm/openbsd b/tests/vm/openbsd
index 8356646f21..5ffa4f1b37 100755
--- a/tests/vm/openbsd
+++ b/tests/vm/openbsd
@@ -22,8 +22,8 @@ class OpenBSDVM(basevm.BaseVM):
  name = "openbsd"
  arch = "x86_64"
  
-link = "https://cdn.openbsd.org/pub/OpenBSD/6.6/amd64/install66.iso;

-csum = "b22e63df56e6266de6bbeed8e9be0fbe9ee2291551c5bc03f3cc2e4ab9436ee3"
+link = "https://cdn.openbsd.org/pub/OpenBSD/6.8/amd64/install68.iso;
+csum = "47e291fcc2d0c1a8ae0b66329f040b33af755b6adbd21739e20bb5ad56f62b6c"
  size = "20G"
  pkgs = [
  # tools
@@ -36,10 +36,10 @@ class OpenBSDVM(basevm.BaseVM):
  "bash",
  "gmake",
  "gsed",
-"gettext",
+"gettext-tools",
  
  # libs: usb

-"libusb1",
+"libusb1--",
  
  # libs: crypto

  "gnutls",




Re: [PATCH v2 11/11] qapi/introspect.py: Add docstring to _tree_to_qlit

2020-11-06 Thread Cleber Rosa
On Mon, Oct 26, 2020 at 03:42:51PM -0400, John Snow wrote:
> Signed-off-by: John Snow 
> ---

Not a big deal, but maybe move this to an earlier position in the
series?  IMO it'd make other reviewers' lives easier.  Either way:

Reviewed-by: Cleber Rosa 


signature.asc
Description: PGP signature


Re: [PATCH v2 10/11] qapi/introspect.py: improve readability of _tree_to_qlit

2020-11-06 Thread Cleber Rosa
On Mon, Oct 26, 2020 at 03:42:50PM -0400, John Snow wrote:
> Subjective, but I find getting rid of the comprehensions helps. Also,
> divide the sections into scalar and non-scalar sections, and remove
> old-style string formatting.
>

It's certainly a matter of picking your favorite poison... but for the
most part I think this is an improvement...

> Signed-off-by: John Snow 
> ---
>  scripts/qapi/introspect.py | 37 +
>  1 file changed, 21 insertions(+), 16 deletions(-)
> 
> diff --git a/scripts/qapi/introspect.py b/scripts/qapi/introspect.py
> index a261e402d69..d4f28485ba5 100644
> --- a/scripts/qapi/introspect.py
> +++ b/scripts/qapi/introspect.py
> @@ -100,7 +100,7 @@ def indent(level: int) -> str:
>  
>  ret = ''
>  if obj.comment:
> -ret += indent(level) + '/* %s */\n' % obj.comment
> +ret += indent(level) + f"/* {obj.comment} */\n"
>  if obj.ifcond:
>  ret += gen_if(obj.ifcond)
>  ret += _tree_to_qlit(obj.value, level, suppress_first_indent)
> @@ -111,31 +111,36 @@ def indent(level: int) -> str:
>  ret = ''
>  if not suppress_first_indent:
>  ret += indent(level)
> +
> +# Scalars:
>  if obj is None:
>  ret += 'QLIT_QNULL'
>  elif isinstance(obj, str):
> -ret += 'QLIT_QSTR(' + to_c_string(obj) + ')'
> +ret += f"QLIT_QSTR({to_c_string(obj)})"
> +elif isinstance(obj, bool):
> +ret += "QLIT_QBOOL({:s})".format(str(obj).lower())
> +
> +# Non-scalars:
>  elif isinstance(obj, list):
> -elts = [_tree_to_qlit(elt, level + 1).strip('\n')
> -for elt in obj]
> -elts.append(indent(level + 1) + "{}")
>  ret += 'QLIT_QLIST(((QLitObject[]) {\n'
> -ret += '\n'.join(elts) + '\n'
> +for value in obj:
> +ret += _tree_to_qlit(value, level + 1).strip('\n') + '\n'
> +ret += indent(level + 1) + '{}\n'
>  ret += indent(level) + '}))'
>  elif isinstance(obj, dict):
> -elts = []
> -for key, value in sorted(obj.items()):
> -elts.append(indent(level + 1) + '{ %s, %s }' %
> -(to_c_string(key),
> - _tree_to_qlit(value, level + 1, True)))
> -elts.append(indent(level + 1) + '{}')
>  ret += 'QLIT_QDICT(((QLitDictEntry[]) {\n'
> -ret += ',\n'.join(elts) + '\n'
> +for key, value in sorted(obj.items()):
> +ret += indent(level + 1) + "{{ {:s}, {:s} }},\n".format(
> +to_c_string(key),
> +_tree_to_qlit(value, level + 1, suppress_first_indent=True)
> +)
> +ret += indent(level + 1) + '{}\n'
>  ret += indent(level) + '}))'
> -elif isinstance(obj, bool):
> -ret += 'QLIT_QBOOL(%s)' % ('true' if obj else 'false')
>  else:
> -assert False# not implemented
> +raise NotImplementedError(

This is an improvement, but doesn't fall into the *readability* bucket.
IMO it's worth splitting it out.

- Cleber.


signature.asc
Description: PGP signature


Re: [PATCH v2 09/11] qapi/introspect.py: create a typed 'Annotated' data structure

2020-11-06 Thread Cleber Rosa
On Mon, Oct 26, 2020 at 03:42:49PM -0400, John Snow wrote:
> This replaces _make_tree with Annotated(). By creating it as a generic
> container, we can more accurately describe the exact nature of this
> particular value. i.e., each Annotated object is actually an
> Annotated, describing its contained value.
> 
> This adds stricter typing to Annotated nodes and extra annotated
> information. It also replaces a check of "isinstance tuple" with the
> much more explicit "isinstance Annotated" which is guaranteed not to
> break if a tuple is accidentally introduced into the type tree. (Perhaps
> as a result of a bad conversion from a list.)
> 
> Signed-off-by: John Snow 
> ---
>  scripts/qapi/introspect.py | 97 +++---
>  1 file changed, 48 insertions(+), 49 deletions(-)
> 
> diff --git a/scripts/qapi/introspect.py b/scripts/qapi/introspect.py
> index a0978cb3adb..a261e402d69 100644
> --- a/scripts/qapi/introspect.py
> +++ b/scripts/qapi/introspect.py
> @@ -13,12 +13,13 @@
>  from typing import (
>  Any,
>  Dict,
> +Generic,
> +Iterable,
>  List,
>  Optional,
>  Sequence,
> -Tuple,
> +TypeVar,
>  Union,
> -cast,
>  )
>  
>  from .common import (
> @@ -63,50 +64,48 @@
>  _scalar = Union[str, bool, None]
>  _nonscalar = Union[Dict[str, _stub], List[_stub]]
>  _value = Union[_scalar, _nonscalar]
> -TreeValue = Union[_value, 'Annotated']
> +TreeValue = Union[_value, 'Annotated[_value]']
>  
>  # This is just an alias for an object in the structure described above:
>  _DObject = Dict[str, object]
>  
> -# Represents the annotations themselves:
> -Annotations = Dict[str, object]
>  
> -# Represents an annotated node (of some kind).
> -Annotated = Tuple[_value, Annotations]
> +_AnnoType = TypeVar('_AnnoType', bound=TreeValue)

Here it becomes much harder to keep the suggestions I made on patch
5 because of forward and backward references.

Reviewed-by: Cleber Rosa 


signature.asc
Description: PGP signature


Re: [PATCH v2 08/11] qapi/introspect.py: replace 'extra' dict with 'comment' argument

2020-11-06 Thread Cleber Rosa
On Mon, Oct 26, 2020 at 03:42:48PM -0400, John Snow wrote:
> This is only used to pass in a dictionary with a comment already set, so
> skip the runaround and just accept the comment.
> 
> Signed-off-by: John Snow 
> ---

Reviewed-by: Cleber Rosa 


signature.asc
Description: PGP signature


Re: [PATCH v2 07/11] qapi/introspect.py: Unify return type of _make_tree()

2020-11-06 Thread Cleber Rosa
On Mon, Oct 26, 2020 at 03:42:47PM -0400, John Snow wrote:
> Returning two different types conditionally can be complicated to
> type. Let's always return a tuple for consistency.

This seems like a standalone change.

> Prohibit the use of
> annotations with dict-values in this circumstance. It can be implemented
> later if and when the need for it arises.

And this seems like another change.

- Cleber.


signature.asc
Description: PGP signature


Re: [PATCH v2 06/11] qapi/introspect.py: add _gen_features helper

2020-11-06 Thread Cleber Rosa
On Mon, Oct 26, 2020 at 03:42:46PM -0400, John Snow wrote:
> _make_tree might receive a dict or some other type. Adding features
> information should arguably be performed by the caller at such a time
> when we know the type of the object and don't have to re-interrogate it.
> 
> Signed-off-by: John Snow 
> ---

Reviewed-by: Cleber Rosa 


signature.asc
Description: PGP signature


Re: [PATCH v2 05/11] qapi/introspect.py: add preliminary type hint annotations

2020-11-06 Thread Cleber Rosa
On Mon, Oct 26, 2020 at 03:42:45PM -0400, John Snow wrote:
> The typing of _make_tree and friends is a bit involved, but it can be
> done with some stubbed out types and a bit of elbow grease. The
> forthcoming patches attempt to make some simplifications, but having the
> type hints in advance may aid in review of subsequent patches.
> 
> 
> Some notes on the abstract types used at this point, and what they
> represent:
> 
> - TreeValue represents any object in the type tree. _make_tree is an
>   optional call -- not every node in the final type tree will have been
>   passed to _make_tree, so this type encompasses not only what is passed
>   to _make_tree (dicts, strings) or returned from it (dicts, strings, a
>   2-tuple), but any recursive value for any of the dicts passed to
>   _make_tree -- which includes lists, strings, integers, null constants,
>   and so on.
> 
> - _DObject is a type alias I use to mean "A JSON-style object,
>   represented as a Python dict." There is no "JSON" type in Python, they
>   are converted natively to recursively nested dicts and lists, with
>   leaf values of str, int, float, None, True/False and so on. This type
>   structure is not possible to accurately portray in mypy yet, so a
>   placeholder is used.
> 
>   In this case, _DObject is being used to refer to SchemaInfo-like
>   structures as defined in qapi/introspect.json, OR any sub-object
>   values they may reference. We don't have strong typing available for
>   those, so a generic alternative is used.
> 
> - Extra refers explicitly to the dict containing "extra" information
>   about a node in the tree. mypy does not offer per-key typing for dicts
>   in Python 3.6, so this is the best we can do here.
> 
> - Annotated refers to (one of) the return types of _make_tree:
>   It represents a 2-tuple of (TreeValue, Extra).
> 
> 
> Signed-off-by: Eduardo Habkost 
> Signed-off-by: John Snow 
> ---
>  scripts/qapi/introspect.py | 157 -
>  scripts/qapi/mypy.ini  |   5 --
>  scripts/qapi/schema.py |   2 +-
>  3 files changed, 121 insertions(+), 43 deletions(-)
> 
> diff --git a/scripts/qapi/introspect.py b/scripts/qapi/introspect.py
> index 63f721ebfb6..803288a64e7 100644
> --- a/scripts/qapi/introspect.py
> +++ b/scripts/qapi/introspect.py
> @@ -10,7 +10,16 @@
>  See the COPYING file in the top-level directory.
>  """
>  
> -from typing import Optional, Sequence, cast
> +from typing import (
> +Any,
> +Dict,
> +List,
> +Optional,
> +Sequence,
> +Tuple,
> +Union,
> +cast,
> +)
>  
>  from .common import (
>  c_name,
> @@ -20,13 +29,56 @@
>  )
>  from .gen import QAPISchemaMonolithicCVisitor
>  from .schema import (
> +QAPISchema,
>  QAPISchemaArrayType,
>  QAPISchemaBuiltinType,
> +QAPISchemaEntity,
> +QAPISchemaEnumMember,
> +QAPISchemaFeature,
> +QAPISchemaObjectType,
> +QAPISchemaObjectTypeMember,
>  QAPISchemaType,
> +QAPISchemaVariant,
> +QAPISchemaVariants,
>  )
> +from .source import QAPISourceInfo
>  
>  
> -def _make_tree(obj, ifcond, features, extra=None):
> +# This module constructs a tree-like data structure that is used to
> +# generate the introspection information for QEMU. It behaves similarly
> +# to a JSON value.
> +#
> +# A complexity over JSON is that our values may or may not be annotated.
> +#
> +# Un-annotated values may be:
> +# Scalar: str, bool, None.
> +# Non-scalar: List, Dict
> +# _Value = Union[str, bool, None, Dict[str, Value], List[Value]]

Here (and in a few other places) you mention `_Value`, but then define it as
`_value` (lowercase).

> +#
> +# With optional annotations, the type of all values is:
> +# TreeValue = Union[_Value, Annotated[_Value]]
> +#
> +# Sadly, mypy does not support recursive types, so we must approximate this.
> +_stub = Any
> +_scalar = Union[str, bool, None]
> +_nonscalar = Union[Dict[str, _stub], List[_stub]]

Why not use `Any` here instead of declaring/using `_stub`?

> +_value = Union[_scalar, _nonscalar]
> +TreeValue = Union[_value, 'Annotated']
> +

Maybe declare `Annotations` first, then `Annotated`, then *use*
`Annotated` proper here?

- Cleber.


signature.asc
Description: PGP signature


Re: [Virtio-fs] [qemu-web PATCH v2] Add virtio-fs in OSv overview blog post

2020-11-06 Thread Fotis Xenakis
From: Dr. David Alan Gilbert 
* Fotis Xenakis (fo...@windowslive.com) wrote:
> This post briefly goes over the main points of virtio-fs and OSv, a
> unikernel running under QEMU/KVM and taking advantage of its virtio-fs
> implementation.
>
> Changes since v1:
> - Fixed wording and links, as suggested by Thomas Huth.
> - Added a short example of virtio-fs usage in OSv.
>
> Signed-off-by: Fotis Xenakis 

> +One central point is OSv's support for booting from virtio-fs: this enables
> +deploying a modified version or a whole new application **without 
> rebuilding**
> +the image, just by adjusting its root file system contents on the host. Last,
> +owing to the DAX window practically providing low-overhead access to the 
> host's
> +page cache, scalability is also expected to excel, with it being a common
> +concern due to the potentially high density of unikernels per host.

Hi Fotis,
Hello Dave,
  Since I'm not used to unikernels, I'm a little confused by this; I'd
appreciate some explanation.

  In your unikernel, does the root filesystem just contain data?  I mean
being a 'unikernel' aren't all the binaries and support all linked into
the kernel itself?
Short answer: the root file system doesn't contain only data; the
executable can also be loaded from it. Although a unikernel, it
supports both the embedded-in-kernel and separate-filesystem
approaches.

OSv is more heavy-weight than most unikernels, in the sense that it
supports a lot of features found in general-purpose OSs. One could
just describe it as a specialized, light-weight OS. On the filesystem
side, it's actually pretty close e.g. to Linux: it has a VFS and multiple
filesystems:

  *   pseudo-fs (devfs and others)
  *   ramfs, which is indeed embedded into the kernel
  *   ZFS and ROFS, on regular block devices
  *   NFS, virtio-fs

Also like Linux, it initially boots into its initramfs (embedded in the
kernel) and then typically mounts a root file system from a device
(ZFS, ROFS or virtio-fs), before executing the application code.

In case this is not clear, please feel free to ask further!

Fotis

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[PATCH v9 1/1] audio/jack: fix use after free segfault

2020-11-06 Thread Geoffrey McRae
This change registers a bottom half handler to close the JACK client
connection when a server shutdown signal is received. Without this,
libjack2 attempts to "clean up" old clients and causes a use-after-free
segfault.
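
A minimal sketch of the pattern described above (names follow the patch;
error handling omitted): the shutdown callback runs on a libjack thread,
so it only schedules a bottom half, and the actual teardown happens later
in the QEMU main loop, where calling jack_client_close() is safe.

    #include "qemu/osdep.h"
    #include "qemu/main-loop.h"

    static void shutdown_bh(void *opaque)
    {
        QJackClient *c = opaque;

        /* Runs in the main loop: safe to close the client and free state */
        qjack_client_fini(c);
    }

    static void on_shutdown(void *arg)   /* registered via jack_on_shutdown() */
    {
        QJackClient *c = arg;

        /* Runs on a libjack thread: defer the cleanup instead of doing it here */
        c->state = QJACK_STATE_SHUTDOWN;
        qemu_bh_schedule(c->shutdown_bh);   /* created with qemu_bh_new() at init */
    }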

Signed-off-by: Geoffrey McRae 
---
 audio/jackaudio.c | 50 +++
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/audio/jackaudio.c b/audio/jackaudio.c
index 1e714b30bc..e00e19061a 100644
--- a/audio/jackaudio.c
+++ b/audio/jackaudio.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 #include "qemu/module.h"
 #include "qemu/atomic.h"
+#include "qemu/main-loop.h"
 #include "qemu-common.h"
 #include "audio.h"
 
@@ -63,6 +64,7 @@ typedef struct QJackClient {
 QJackState  state;
 jack_client_t  *client;
 jack_nframes_t  freq;
+QEMUBH *shutdown_bh;
 
 struct QJack   *j;
 int nchannels;
@@ -87,6 +89,7 @@ QJackIn;
 static int qjack_client_init(QJackClient *c);
 static void qjack_client_connect_ports(QJackClient *c);
 static void qjack_client_fini(QJackClient *c);
+QemuMutex qjack_shutdown_lock;
 
 static void qjack_buffer_create(QJackBuffer *buffer, int channels, int frames)
 {
@@ -306,21 +309,27 @@ static int qjack_xrun(void *arg)
 return 0;
 }
 
+static void qjack_shutdown_bh(void *opaque)
+{
+QJackClient *c = (QJackClient *)opaque;
+qjack_client_fini(c);
+}
+
 static void qjack_shutdown(void *arg)
 {
 QJackClient *c = (QJackClient *)arg;
 c->state = QJACK_STATE_SHUTDOWN;
+qemu_bh_schedule(c->shutdown_bh);
 }
 
 static void qjack_client_recover(QJackClient *c)
 {
-if (c->state == QJACK_STATE_SHUTDOWN) {
-qjack_client_fini(c);
+if (c->state != QJACK_STATE_DISCONNECTED) {
+return;
 }
 
 /* packets is used simply to throttle this */
-if (c->state == QJACK_STATE_DISCONNECTED &&
-c->packets % 100 == 0) {
+if (c->packets % 100 == 0) {
 
 /* if enabled then attempt to recover */
 if (c->enabled) {
@@ -489,15 +498,16 @@ static int qjack_init_out(HWVoiceOut *hw, struct 
audsettings *as,
 QJackOut *jo  = (QJackOut *)hw;
 Audiodev *dev = (Audiodev *)drv_opaque;
 
-qjack_client_fini(&jo->c);
-
 jo->c.out   = true;
 jo->c.enabled   = false;
 jo->c.nchannels = as->nchannels;
 jo->c.opt   = dev->u.jack.out;
 
+jo->c.shutdown_bh = qemu_bh_new(qjack_shutdown_bh, &jo->c);
+
 int ret = qjack_client_init(&jo->c);
 if (ret != 0) {
+qemu_bh_delete(jo->c.shutdown_bh);
 return ret;
 }
 
@@ -525,15 +535,16 @@ static int qjack_init_in(HWVoiceIn *hw, struct 
audsettings *as,
 QJackIn  *ji  = (QJackIn *)hw;
 Audiodev *dev = (Audiodev *)drv_opaque;
 
-qjack_client_fini(&ji->c);
-
 ji->c.out   = false;
 ji->c.enabled   = false;
 ji->c.nchannels = as->nchannels;
 ji->c.opt   = dev->u.jack.in;
 
+ji->c.shutdown_bh = qemu_bh_new(qjack_shutdown_bh, &ji->c);
+
 int ret = qjack_client_init(&ji->c);
 if (ret != 0) {
+qemu_bh_delete(ji->c.shutdown_bh);
 return ret;
 }
 
@@ -555,7 +566,7 @@ static int qjack_init_in(HWVoiceIn *hw, struct audsettings 
*as,
 return 0;
 }
 
-static void qjack_client_fini(QJackClient *c)
+static void qjack_client_fini_locked(QJackClient *c)
 {
 switch (c->state) {
 case QJACK_STATE_RUNNING:
@@ -564,28 +575,40 @@ static void qjack_client_fini(QJackClient *c)
 
 case QJACK_STATE_SHUTDOWN:
 jack_client_close(c->client);
+c->client = NULL;
+
+qjack_buffer_free(&c->fifo);
+g_free(c->port);
+
+c->state = QJACK_STATE_DISCONNECTED;
 /* fallthrough */
 
 case QJACK_STATE_DISCONNECTED:
 break;
 }
+}
 
-qjack_buffer_free(&c->fifo);
-g_free(c->port);
-
-c->state = QJACK_STATE_DISCONNECTED;
+static void qjack_client_fini(QJackClient *c)
+{
+qemu_mutex_lock(&qjack_shutdown_lock);
+qjack_client_fini_locked(c);
+qemu_mutex_unlock(&qjack_shutdown_lock);
 }
 
 static void qjack_fini_out(HWVoiceOut *hw)
 {
 QJackOut *jo = (QJackOut *)hw;
 qjack_client_fini(&jo->c);
+
+qemu_bh_delete(jo->c.shutdown_bh);
 }
 
 static void qjack_fini_in(HWVoiceIn *hw)
 {
 QJackIn *ji = (QJackIn *)hw;
 qjack_client_fini(&ji->c);
+
+qemu_bh_delete(ji->c.shutdown_bh);
 }
 
 static void qjack_enable_out(HWVoiceOut *hw, bool enable)
@@ -662,6 +685,7 @@ static void qjack_info(const char *msg)
 
 static void register_audio_jack(void)
 {
+qemu_mutex_init(&qjack_shutdown_lock);
 audio_driver_register(&jack_driver);
 jack_set_thread_creator(qjack_thread_creator);
 jack_set_error_function(qjack_error);
-- 
2.20.1




[PATCH v8 0/1] audio/jack: fix use after free segfault

2020-11-06 Thread Geoffrey McRae
v9:
  * switch to using a global shutdown mutex

Geoffrey McRae (1):
  audio/jack: fix use after free segfault

 audio/jackaudio.c | 50 +++
 1 file changed, 37 insertions(+), 13 deletions(-)

-- 
2.20.1




[PATCH] hw/core/qdev-properties-system: allow bus addresses > 0x1f

2020-11-06 Thread Geoffrey McRae
The commit bccb20c49df1bd683248a366021973901c11982f introduced an error
in the checking logic that validates the bus number of a PCI device
address, preventing the use of devices via vfio-pci that sit at a bus
address of 0x20 or higher. This patch resolves this by reverting the bus
number check to the original maximum value of 0xff.
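
For illustration only (not part of the patch): a PCI device address has an
8-bit bus number, a 5-bit slot number and a 3-bit function number, so the
bus field must accept values up to 0xff, while 0x1f is only correct as the
slot limit. A minimal standalone sketch of the intended range checks:

    #include <stdbool.h>
    #include <stdio.h>

    /* Valid ranges for a "bus:slot.fn" PCI device address */
    static bool devaddr_in_range(unsigned bus, unsigned slot, unsigned fn)
    {
        return bus <= 0xff && slot <= 0x1f && fn <= 0x7;
    }

    int main(void)
    {
        /* e.g. a device at bus 0x21, rejected by the broken 0x1f bus check */
        printf("%d\n", devaddr_in_range(0x21, 0x00, 0x0)); /* prints 1: valid */
        return 0;
    }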

Signed-off-by: Geoffrey McRae 
---
 hw/core/qdev-properties-system.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index b81a4e8d14..e62644bc69 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -882,7 +882,7 @@ static void set_pci_host_devaddr(Object *obj, Visitor *v, 
const char *name,
 bus = val;
 
 p = (char *)e + 1;
-if (qemu_strtoul(p, &e, 16, &val) < 0 || val > 0x1f || e == p) {
+if (qemu_strtoul(p, &e, 16, &val) < 0 || val > 0xff || e == p) {
 goto inval;
 }
 if (*e == ':') {
-- 
2.20.1




[PATCH v10 10/12] hw/block/nvme: Support Zone Descriptor Extensions

2020-11-06 Thread Dmitry Fomichev
Zone Descriptor Extension is a label that can be assigned to a zone.
It can be set to an Empty zone and it stays assigned until the zone
is reset.

This commit adds a new optional module property,
"zoned.descr_ext_size". Its value must be a multiple of 64 bytes.
If this value is non-zero, it becomes possible to assign extensions
of that size to any Empty zones. The default value for this property
is 0, therefore setting extensions is disabled by default.
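
To make the sizing rule concrete, here is a small standalone sketch of the
64-byte-unit arithmetic behind the zdes field (illustration only, not code
from the patch; the value 128 is just an example setting):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t sz = 128;   /* hypothetical zoned.descr_ext_size, in bytes */

        /* A non-zero size must be a multiple of 64 bytes and fit in the
         * 8-bit zdes field, which stores the size in units of 64 bytes. */
        int valid = sz != 0 && (sz & 0x3f) == 0 && (sz >> 6) <= 0xff;
        unsigned zdes = sz >> 6;

        printf("valid=%d zdes=%u max=%d bytes\n", valid, zdes, 0xff * 64);
        return 0;
    }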

Signed-off-by: Hans Holmberg 
Signed-off-by: Dmitry Fomichev 
Reviewed-by: Klaus Jensen 
Reviewed-by: Niklas Cassel 
---
 hw/block/nvme-ns.h|  8 +++
 hw/block/nvme-ns.c| 25 ++--
 hw/block/nvme.c   | 54 +--
 hw/block/trace-events |  2 ++
 4 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 421bab0a57..50a6a0e1ac 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -35,6 +35,7 @@ typedef struct NvmeNamespaceParams {
 uint64_t zone_cap_bs;
 uint32_t max_active_zones;
 uint32_t max_open_zones;
+uint32_t zd_extension_size;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
@@ -58,6 +59,7 @@ typedef struct NvmeNamespace {
  uint64_t        zone_capacity;
  uint64_t        zone_array_size;
  uint32_t        zone_size_log2;
+uint8_t *zd_extensions;
 int32_t nr_open_zones;
 int32_t nr_active_zones;
 
@@ -127,6 +129,12 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone)
st != NVME_ZONE_STATE_OFFLINE;
 }
 
+static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns,
+ uint32_t zone_idx)
+{
+return &ns->zd_extensions[zone_idx * ns->params.zd_extension_size];
+}
+
 static inline void nvme_aor_inc_open(NvmeNamespace *ns)
 {
 assert(ns->nr_open_zones >= 0);
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 2e45838c15..85dc73cf06 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -133,6 +133,18 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, 
Error **errp)
 return -1;
 }
 
+if (ns->params.zd_extension_size) {
+if (ns->params.zd_extension_size & 0x3f) {
+error_setg(errp,
+"zone descriptor extension size must be a multiple of 64B");
+return -1;
+}
+if ((ns->params.zd_extension_size >> 6) > 0xff) {
+error_setg(errp, "zone descriptor extension size is too large");
+return -1;
+}
+}
+
 return 0;
 }
 
@@ -144,6 +156,10 @@ static void nvme_init_zone_state(NvmeNamespace *ns)
 int i;
 
 ns->zone_array = g_malloc0(ns->zone_array_size);
+if (ns->params.zd_extension_size) {
+ns->zd_extensions = g_malloc0(ns->params.zd_extension_size *
+  ns->num_zones);
+}
 
  QTAILQ_INIT(&ns->exp_open_zones);
  QTAILQ_INIT(&ns->imp_open_zones);
@@ -186,7 +202,8 @@ static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace 
*ns, int lba_index,
 id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00;
 
 id_ns_z->lbafe[lba_index].zsze = cpu_to_le64(ns->zone_size);
-id_ns_z->lbafe[lba_index].zdes = 0;
+id_ns_z->lbafe[lba_index].zdes =
+ns->params.zd_extension_size >> 6; /* Units of 64B */
 
 ns->csi = NVME_CSI_ZONED;
 ns->id_ns.nsze = cpu_to_le64(ns->zone_size * ns->num_zones);
@@ -204,7 +221,8 @@ static void nvme_clear_zone(NvmeNamespace *ns, NvmeZone 
*zone)
 
 zone->w_ptr = zone->d.wp;
 state = nvme_get_zone_state(zone);
-if (zone->d.wp != zone->d.zslba) {
+if (zone->d.wp != zone->d.zslba ||
+(zone->d.za & NVME_ZA_ZD_EXT_VALID)) {
 if (state != NVME_ZONE_STATE_CLOSED) {
 trace_pci_nvme_clear_ns_close(state, zone->d.zslba);
 nvme_set_zone_state(zone, NVME_ZONE_STATE_CLOSED);
@@ -301,6 +319,7 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
 if (ns->params.zoned) {
 g_free(ns->id_ns_zoned);
 g_free(ns->zone_array);
+g_free(ns->zd_extensions);
 }
 }
 
@@ -332,6 +351,8 @@ static Property nvme_ns_props[] = {
params.max_active_zones, 0),
 DEFINE_PROP_UINT32("zoned.max_open", NvmeNamespace,
params.max_open_zones, 0),
+DEFINE_PROP_UINT32("zoned.descr_ext_size", NvmeNamespace,
+   params.zd_extension_size, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index c6b3a5dcf7..e82e3be821 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1703,6 +1703,26 @@ static uint16_t nvme_offline_zone(NvmeNamespace *ns, 
NvmeZone *zone,
 return NVME_ZONE_INVAL_TRANSITION;
 }
 
+static uint16_t nvme_set_zd_ext(NvmeNamespace *ns, NvmeZone *zone)
+{
+uint16_t status;
+uint8_t state = nvme_get_zone_state(zone);
+
+if (state == NVME_ZONE_STATE_EMPTY) {
+nvme_auto_transition_zone(ns, false, true);
+

[PATCH v10 09/12] hw/block/nvme: Introduce max active and open zone limits

2020-11-06 Thread Dmitry Fomichev
Add two module properties, "zoned.max_active" and "zoned.max_open"
to control the maximum number of zones that can be active or open.
Once these variables are set to non-default values, these limits are
checked during I/O and Too Many Active or Too Many Open command status
is returned if they are exceeded.
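
A condensed sketch of the check described above (simplified from the patch;
the helper name is illustrative, and a limit of 0 means the property is
left at its default, i.e. unlimited):

    static uint16_t nvme_aor_check(NvmeNamespace *ns, uint32_t act, uint32_t opn)
    {
        if (ns->params.max_active_zones != 0 &&
            ns->nr_active_zones + act > ns->params.max_active_zones) {
            return NVME_ZONE_TOO_MANY_ACTIVE;
        }
        if (ns->params.max_open_zones != 0 &&
            ns->nr_open_zones + opn > ns->params.max_open_zones) {
            return NVME_ZONE_TOO_MANY_OPEN;
        }
        return NVME_SUCCESS;
    }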

Signed-off-by: Hans Holmberg 
Signed-off-by: Dmitry Fomichev 
Reviewed-by: Niklas Cassel 
---
 hw/block/nvme-ns.h| 41 +++
 hw/block/nvme-ns.c| 30 +-
 hw/block/nvme.c   | 94 +++
 hw/block/trace-events |  2 +
 4 files changed, 165 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index d2631ff5a3..421bab0a57 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -33,6 +33,8 @@ typedef struct NvmeNamespaceParams {
 bool cross_zone_read;
 uint64_t zone_size_bs;
 uint64_t zone_cap_bs;
+uint32_t max_active_zones;
+uint32_t max_open_zones;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
@@ -56,6 +58,8 @@ typedef struct NvmeNamespace {
  uint64_t        zone_capacity;
  uint64_t        zone_array_size;
  uint32_t        zone_size_log2;
+int32_t nr_open_zones;
+int32_t nr_active_zones;
 
 NvmeNamespaceParams params;
 } NvmeNamespace;
@@ -123,6 +127,43 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone)
st != NVME_ZONE_STATE_OFFLINE;
 }
 
+static inline void nvme_aor_inc_open(NvmeNamespace *ns)
+{
+assert(ns->nr_open_zones >= 0);
+if (ns->params.max_open_zones) {
+ns->nr_open_zones++;
+assert(ns->nr_open_zones <= ns->params.max_open_zones);
+}
+}
+
+static inline void nvme_aor_dec_open(NvmeNamespace *ns)
+{
+if (ns->params.max_open_zones) {
+assert(ns->nr_open_zones > 0);
+ns->nr_open_zones--;
+}
+assert(ns->nr_open_zones >= 0);
+}
+
+static inline void nvme_aor_inc_active(NvmeNamespace *ns)
+{
+assert(ns->nr_active_zones >= 0);
+if (ns->params.max_active_zones) {
+ns->nr_active_zones++;
+assert(ns->nr_active_zones <= ns->params.max_active_zones);
+}
+}
+
+static inline void nvme_aor_dec_active(NvmeNamespace *ns)
+{
+if (ns->params.max_active_zones) {
+assert(ns->nr_active_zones > 0);
+ns->nr_active_zones--;
+assert(ns->nr_active_zones >= ns->nr_open_zones);
+}
+assert(ns->nr_active_zones >= 0);
+}
+
 int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_flush(NvmeNamespace *ns);
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index e6db7f7d3b..2e45838c15 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -119,6 +119,20 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, 
Error **errp)
 ns->zone_size_log2 = 63 - clz64(ns->zone_size);
 }
 
+/* Make sure that the values of all ZNS properties are sane */
+if (ns->params.max_open_zones > nz) {
+error_setg(errp,
+   "max_open_zones value %u exceeds the number of zones %u",
+   ns->params.max_open_zones, nz);
+return -1;
+}
+if (ns->params.max_active_zones > nz) {
+error_setg(errp,
+   "max_active_zones value %u exceeds the number of zones %u",
+   ns->params.max_active_zones, nz);
+return -1;
+}
+
 return 0;
 }
 
@@ -166,8 +180,8 @@ static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace 
*ns, int lba_index,
 id_ns_z = g_malloc0(sizeof(NvmeIdNsZoned));
 
  /* MAR/MOR are zeroes-based, 0xffffffff means no limit */
-id_ns_z->mar = 0xffffffff;
-id_ns_z->mor = 0xffffffff;
+id_ns_z->mar = cpu_to_le32(ns->params.max_active_zones - 1);
+id_ns_z->mor = cpu_to_le32(ns->params.max_open_zones - 1);
 id_ns_z->zoc = 0;
 id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00;
 
@@ -195,6 +209,7 @@ static void nvme_clear_zone(NvmeNamespace *ns, NvmeZone 
*zone)
 trace_pci_nvme_clear_ns_close(state, zone->d.zslba);
 nvme_set_zone_state(zone, NVME_ZONE_STATE_CLOSED);
 }
+nvme_aor_inc_active(ns);
 QTAILQ_INSERT_HEAD(>closed_zones, zone, entry);
 } else {
 trace_pci_nvme_clear_ns_reset(state, zone->d.zslba);
@@ -211,16 +226,23 @@ static void nvme_zoned_ns_shutdown(NvmeNamespace *ns)
 
  QTAILQ_FOREACH_SAFE(zone, &ns->closed_zones, entry, next) {
  QTAILQ_REMOVE(&ns->closed_zones, zone, entry);
+nvme_aor_dec_active(ns);
  nvme_clear_zone(ns, zone);
  }
  QTAILQ_FOREACH_SAFE(zone, &ns->imp_open_zones, entry, next) {
  QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);
+nvme_aor_dec_open(ns);
+nvme_aor_dec_active(ns);
  nvme_clear_zone(ns, zone);
  }
  QTAILQ_FOREACH_SAFE(zone, &ns->exp_open_zones, entry, next) {
  QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry);
+nvme_aor_dec_open(ns);
+

[PATCH v10 08/12] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-06 Thread Dmitry Fomichev
The emulation code has been changed to advertise the NVM Command Set when
the "zoned" device property is not set (the default), and the Zoned
Namespace Command Set otherwise.

Define values and structures that are needed to support Zoned
Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
Define trace events where needed in newly introduced code.

In order to improve scalability, all open, closed and full zones
are organized in separate linked lists. Consequently, almost all
zone operations don't require scanning of the entire zone array
(which potentially can be quite large) - it is only necessary to
enumerate one or more zone lists.
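
As an illustration of the list-based bookkeeping described above (a
simplified sketch, not code from the patch), closing an implicitly open
zone only touches two QTAILQ lists rather than the whole zone array:

    static void zns_close_imp_open_zone(NvmeNamespace *ns, NvmeZone *zone)
    {
        /* drop the zone from the implicitly-open list ... */
        QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);

        /* ... update its state and park it on the closed list; no scan of
         * ns->zone_array is needed */
        nvme_set_zone_state(zone, NVME_ZONE_STATE_CLOSED);
        QTAILQ_INSERT_TAIL(&ns->closed_zones, zone, entry);
    }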

Handlers for three new NVMe commands introduced in Zoned Namespace
Command Set specification are added, namely for Zone Management
Receive, Zone Management Send and Zone Append.

Device initialization code has been extended to create a proper
configuration for zoned operation using device properties.

Read/Write command handler is modified to only allow writes at the
write pointer if the namespace is zoned. For Zone Append command,
writes implicitly happen at the write pointer and the starting write
pointer value is returned as the result of the command. Write Zeroes
handler is modified to add zoned checks that are identical to those
done as a part of Write flow.
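
A simplified sketch of the Zone Append semantics just described (not the
patch code; bounds, state and locking checks omitted): the data is written
at the current write pointer, and the old write pointer value is what the
host gets back via NvmeZonedResult:

    static uint64_t zone_append_sketch(NvmeZone *zone, uint32_t nlb)
    {
        uint64_t slba = zone->w_ptr;   /* returned to the host in the CQE */

        zone->w_ptr += nlb;            /* the write lands at the old pointer */
        return slba;
    }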

Subsequent commits in this series add ZDE support and checks for
active and open zone limits.

Signed-off-by: Niklas Cassel 
Signed-off-by: Hans Holmberg 
Signed-off-by: Ajay Joshi 
Signed-off-by: Chaitanya Kulkarni 
Signed-off-by: Matias Bjorling 
Signed-off-by: Aravind Ramesh 
Signed-off-by: Shin'ichiro Kawasaki 
Signed-off-by: Adam Manzanares 
Signed-off-by: Dmitry Fomichev 
Reviewed-by: Niklas Cassel 
---
 hw/block/nvme-ns.h|  54 +++
 hw/block/nvme.h   |   8 +
 hw/block/nvme-ns.c| 173 
 hw/block/nvme.c   | 972 +-
 hw/block/trace-events |  18 +-
 5 files changed, 1210 insertions(+), 15 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 2d9cd29d07..d2631ff5a3 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -19,9 +19,20 @@
 #define NVME_NS(obj) \
 OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
 
+typedef struct NvmeZone {
+NvmeZoneDescr   d;
+uint64_t        w_ptr;
+QTAILQ_ENTRY(NvmeZone) entry;
+} NvmeZone;
+
 typedef struct NvmeNamespaceParams {
 uint32_t nsid;
 QemuUUID uuid;
+
+bool zoned;
+bool cross_zone_read;
+uint64_t zone_size_bs;
+uint64_t zone_cap_bs;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
@@ -34,6 +45,18 @@ typedef struct NvmeNamespace {
 bool attached;
 uint8_t  csi;
 
+NvmeIdNsZoned   *id_ns_zoned;
+NvmeZone*zone_array;
+QTAILQ_HEAD(, NvmeZone) exp_open_zones;
+QTAILQ_HEAD(, NvmeZone) imp_open_zones;
+QTAILQ_HEAD(, NvmeZone) closed_zones;
+QTAILQ_HEAD(, NvmeZone) full_zones;
+uint32_t        num_zones;
+uint64_t        zone_size;
+uint64_t        zone_capacity;
+uint64_t        zone_array_size;
+uint32_t        zone_size_log2;
+
 NvmeNamespaceParams params;
 } NvmeNamespace;
 
@@ -71,8 +94,39 @@ static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t 
lba)
 
 typedef struct NvmeCtrl NvmeCtrl;
 
+static inline uint8_t nvme_get_zone_state(NvmeZone *zone)
+{
+return zone->d.zs >> 4;
+}
+
+static inline void nvme_set_zone_state(NvmeZone *zone, enum NvmeZoneState 
state)
+{
+zone->d.zs = state << 4;
+}
+
+static inline uint64_t nvme_zone_rd_boundary(NvmeNamespace *ns, NvmeZone *zone)
+{
+return zone->d.zslba + ns->zone_size;
+}
+
+static inline uint64_t nvme_zone_wr_boundary(NvmeZone *zone)
+{
+return zone->d.zslba + zone->d.zcap;
+}
+
+static inline bool nvme_wp_is_valid(NvmeZone *zone)
+{
+uint8_t st = nvme_get_zone_state(zone);
+
+return st != NVME_ZONE_STATE_FULL &&
+   st != NVME_ZONE_STATE_READ_ONLY &&
+   st != NVME_ZONE_STATE_OFFLINE;
+}
+
 int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_flush(NvmeNamespace *ns);
+void nvme_ns_shutdown(NvmeNamespace *ns);
+void nvme_ns_cleanup(NvmeNamespace *ns);
 
 #endif /* NVME_NS_H */
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index e080a2318a..4cb0615128 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -6,6 +6,9 @@
 
 #define NVME_MAX_NAMESPACES 256
 
+#define NVME_DEFAULT_ZONE_SIZE   (128 * MiB)
+#define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
+
 typedef struct NvmeParams {
 char *serial;
 uint32_t num_queues; /* deprecated since 5.1 */
@@ -16,6 +19,7 @@ typedef struct NvmeParams {
 uint32_t aer_max_queued;
 uint8_t  mdts;
 bool use_intel_id;
+uint32_t zasl_bs;
 } NvmeParams;
 
 typedef struct NvmeAsyncEvent {
@@ -28,6 +32,8 @@ typedef struct NvmeRequest {
  struct NvmeNamespace    *ns;
  BlockAIOCB              *aiocb;
  uint16_t                status;
+

[PATCH v10 07/12] block/nvme: Make ZNS-related definitions

2020-11-06 Thread Dmitry Fomichev
Define values and structures that are needed to support Zoned
Namespace Command Set (NVMe TP 4053).

Signed-off-by: Dmitry Fomichev 
---
 include/block/nvme.h | 114 ++-
 1 file changed, 113 insertions(+), 1 deletion(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 394db19022..752623b4f9 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -489,6 +489,9 @@ enum NvmeIoCommands {
 NVME_CMD_COMPARE= 0x05,
 NVME_CMD_WRITE_ZEROES   = 0x08,
 NVME_CMD_DSM= 0x09,
+NVME_CMD_ZONE_MGMT_SEND = 0x79,
+NVME_CMD_ZONE_MGMT_RECV = 0x7a,
+NVME_CMD_ZONE_APPEND= 0x7d,
 };
 
 typedef struct QEMU_PACKED NvmeDeleteQ {
@@ -648,9 +651,13 @@ typedef struct QEMU_PACKED NvmeAerResult {
 uint8_t resv;
 } NvmeAerResult;
 
+typedef struct QEMU_PACKED NvmeZonedResult {
+uint64_t slba;
+} NvmeZonedResult;
+
 typedef struct QEMU_PACKED NvmeCqe {
  uint32_t    result;
-uint32_t    rsvd;
+uint32_t    dw1;
  uint16_t    sq_head;
  uint16_t    sq_id;
  uint16_t    cid;
@@ -679,6 +686,7 @@ enum NvmeStatusCodes {
 NVME_INVALID_USE_OF_CMB = 0x0012,
 NVME_INVALID_PRP_OFFSET = 0x0013,
 NVME_CMD_SET_CMB_REJECTED   = 0x002b,
+NVME_INVALID_CMD_SET= 0x002c,
 NVME_LBA_RANGE  = 0x0080,
 NVME_CAP_EXCEEDED   = 0x0081,
 NVME_NS_NOT_READY   = 0x0082,
@@ -703,6 +711,14 @@ enum NvmeStatusCodes {
 NVME_CONFLICTING_ATTRS  = 0x0180,
 NVME_INVALID_PROT_INFO  = 0x0181,
 NVME_WRITE_TO_RO= 0x0182,
+NVME_ZONE_BOUNDARY_ERROR= 0x01b8,
+NVME_ZONE_FULL  = 0x01b9,
+NVME_ZONE_READ_ONLY = 0x01ba,
+NVME_ZONE_OFFLINE   = 0x01bb,
+NVME_ZONE_INVALID_WRITE = 0x01bc,
+NVME_ZONE_TOO_MANY_ACTIVE   = 0x01bd,
+NVME_ZONE_TOO_MANY_OPEN = 0x01be,
+NVME_ZONE_INVAL_TRANSITION  = 0x01bf,
 NVME_WRITE_FAULT= 0x0280,
 NVME_UNRECOVERED_READ   = 0x0281,
 NVME_E2E_GUARD_ERROR= 0x0282,
@@ -887,6 +903,11 @@ typedef struct QEMU_PACKED NvmeIdCtrl {
 uint8_t vs[1024];
 } NvmeIdCtrl;
 
+typedef struct NvmeIdCtrlZoned {
+uint8_t zasl;
+uint8_t rsvd1[4095];
+} NvmeIdCtrlZoned;
+
 enum NvmeIdCtrlOacs {
 NVME_OACS_SECURITY  = 1 << 0,
 NVME_OACS_FORMAT= 1 << 1,
@@ -1012,6 +1033,12 @@ typedef struct QEMU_PACKED NvmeLBAF {
 uint8_t rp;
 } NvmeLBAF;
 
+typedef struct QEMU_PACKED NvmeLBAFE {
+uint64_t    zsze;
+uint8_t zdes;
+uint8_t rsvd9[7];
+} NvmeLBAFE;
+
 #define NVME_NSID_BROADCAST 0xffffffff
 
 typedef struct QEMU_PACKED NvmeIdNs {
@@ -1066,10 +1093,24 @@ enum NvmeNsIdentifierType {
 
 enum NvmeCsi {
 NVME_CSI_NVM= 0x00,
+NVME_CSI_ZONED  = 0x02,
 };
 
 #define NVME_SET_CSI(vec, csi) (vec |= (uint8_t)(1 << (csi)))
 
+typedef struct QEMU_PACKED NvmeIdNsZoned {
+uint16_t    zoc;
+uint16_t    ozcs;
+uint32_t    mar;
+uint32_t    mor;
+uint32_t    rrl;
+uint32_t    frl;
+uint8_t rsvd20[2796];
+NvmeLBAFE   lbafe[16];
+uint8_t rsvd3072[768];
+uint8_t vs[256];
+} NvmeIdNsZoned;
+
 /*Deallocate Logical Block Features*/
 #define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat)   ((dlfeat) & 0x10)
 #define NVME_ID_NS_DLFEAT_WRITE_ZEROES(dlfeat)((dlfeat) & 0x08)
@@ -1101,10 +1142,76 @@ enum NvmeIdNsDps {
 DPS_FIRST_EIGHT = 8,
 };
 
+enum NvmeZoneAttr {
+NVME_ZA_FINISHED_BY_CTLR = 1 << 0,
+NVME_ZA_FINISH_RECOMMENDED   = 1 << 1,
+NVME_ZA_RESET_RECOMMENDED= 1 << 2,
+NVME_ZA_ZD_EXT_VALID = 1 << 7,
+};
+
+typedef struct QEMU_PACKED NvmeZoneReportHeader {
+uint64_t    nr_zones;
+uint8_t rsvd[56];
+} NvmeZoneReportHeader;
+
+enum NvmeZoneReceiveAction {
+NVME_ZONE_REPORT = 0,
+NVME_ZONE_REPORT_EXTENDED= 1,
+};
+
+enum NvmeZoneReportType {
+NVME_ZONE_REPORT_ALL = 0,
+NVME_ZONE_REPORT_EMPTY   = 1,
+NVME_ZONE_REPORT_IMPLICITLY_OPEN = 2,
+NVME_ZONE_REPORT_EXPLICITLY_OPEN = 3,
+NVME_ZONE_REPORT_CLOSED  = 4,
+NVME_ZONE_REPORT_FULL= 5,
+NVME_ZONE_REPORT_READ_ONLY   = 6,
+NVME_ZONE_REPORT_OFFLINE = 7,
+};
+
+enum NvmeZoneType {
+NVME_ZONE_TYPE_RESERVED  = 0x00,
+NVME_ZONE_TYPE_SEQ_WRITE = 0x02,
+};
+
+enum NvmeZoneSendAction {
+NVME_ZONE_ACTION_RSD = 0x00,
+NVME_ZONE_ACTION_CLOSE   = 0x01,
+NVME_ZONE_ACTION_FINISH  = 0x02,
+NVME_ZONE_ACTION_OPEN= 0x03,
+NVME_ZONE_ACTION_RESET   = 0x04,
+NVME_ZONE_ACTION_OFFLINE = 0x05,
+NVME_ZONE_ACTION_SET_ZD_EXT  = 0x10,
+};
+
+typedef struct QEMU_PACKED NvmeZoneDescr {
+uint8_t zt;
+uint8_t zs;
+uint8_t za;
+uint8_t rsvd3[5];
+uint64_t    zcap;
+

[PATCH 1/2] hw/m68k/q800: Don't connect two qemu_irqs directly to the same input

2020-11-06 Thread Peter Maydell
The q800 board code connects both of the IRQ outputs of the ESCC
to the same pic[3] qemu_irq. Connecting two qemu_irqs outputs directly
to the same input is not valid as it produces subtly wrong behaviour
(for instance if both the IRQ lines are high, and then one goes
low, the PIC input will see this as a high-to-low transition
even though the second IRQ line should still be holding it high).

This kind of wiring needs an explicitly created OR gate; add one.

Signed-off-by: Peter Maydell 
---
 hw/m68k/q800.c  | 12 ++--
 hw/m68k/Kconfig |  1 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c
index ce4b47c3e34..dc13007aaf2 100644
--- a/hw/m68k/q800.c
+++ b/hw/m68k/q800.c
@@ -28,6 +28,7 @@
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "hw/irq.h"
+#include "hw/or-irq.h"
 #include "elf.h"
 #include "hw/loader.h"
 #include "ui/console.h"
@@ -171,6 +172,7 @@ static void q800_init(MachineState *machine)
 CPUState *cs;
 DeviceState *dev;
 DeviceState *via_dev;
+DeviceState *escc_orgate;
 SysBusESPState *sysbus_esp;
 ESPState *esp;
 SysBusDevice *sysbus;
@@ -283,8 +285,14 @@ static void q800_init(MachineState *machine)
 qdev_prop_set_uint32(dev, "chnAtype", 0);
 sysbus = SYS_BUS_DEVICE(dev);
  sysbus_realize_and_unref(sysbus, &error_fatal);
-sysbus_connect_irq(sysbus, 0, pic[3]);
-sysbus_connect_irq(sysbus, 1, pic[3]);
+
+/* Logically OR both its IRQs together */
+escc_orgate = DEVICE(object_new(TYPE_OR_IRQ));
+object_property_set_int(OBJECT(escc_orgate), "num-lines", 2, &error_fatal);
+qdev_realize_and_unref(escc_orgate, NULL, &error_fatal);
+sysbus_connect_irq(sysbus, 0, qdev_get_gpio_in(escc_orgate, 0));
+sysbus_connect_irq(sysbus, 1, qdev_get_gpio_in(escc_orgate, 1));
+qdev_connect_gpio_out(DEVICE(escc_orgate), 0, pic[3]);
 sysbus_mmio_map(sysbus, 0, SCC_BASE);
 
 /* SCSI */
diff --git a/hw/m68k/Kconfig b/hw/m68k/Kconfig
index c757e7dfa48..60d7bcfb8f2 100644
--- a/hw/m68k/Kconfig
+++ b/hw/m68k/Kconfig
@@ -22,3 +22,4 @@ config Q800
 select ESCC
 select ESP
 select DP8393X
+select OR_IRQ
-- 
2.20.1




[PATCH 2/2] hw/m68k/q800.c: Make the GLUE chip an actual QOM device

2020-11-06 Thread Peter Maydell
The handling of the GLUE (General Logic Unit) device is
currently open-coded. Make this into a proper QOM device.

This minor piece of modernisation gets rid of the free
floating qemu_irq array 'pic', which Coverity points out
is technically leaked when we exit the machine init function.
(The replacement glue device is not leaked because it gets
added to the sysbus, so it's accessible via that.)

Fixes: Coverity CID 1421883
Signed-off-by: Peter Maydell 
---
 hw/m68k/q800.c | 82 ++
 1 file changed, 70 insertions(+), 12 deletions(-)

diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c
index dc13007aaf2..05bb372f958 100644
--- a/hw/m68k/q800.c
+++ b/hw/m68k/q800.c
@@ -47,6 +47,7 @@
 #include "sysemu/qtest.h"
 #include "sysemu/runstate.h"
 #include "sysemu/reset.h"
+#include "migration/vmstate.h"
 
 #define MACROM_ADDR 0x4080
 #define MACROM_SIZE 0x0010
@@ -94,10 +95,14 @@
  * CPU.
  */
 
-typedef struct {
+#define TYPE_GLUE "q800-glue"
+OBJECT_DECLARE_SIMPLE_TYPE(GLUEState, GLUE)
+
+struct GLUEState {
+SysBusDevice parent_obj;
 M68kCPU *cpu;
 uint8_t ipr;
-} GLUEState;
+};
 
 static void GLUE_set_irq(void *opaque, int irq, int level)
 {
@@ -119,6 +124,58 @@ static void GLUE_set_irq(void *opaque, int irq, int level)
 m68k_set_irq_level(s->cpu, 0, 0);
 }
 
+static void glue_reset(DeviceState *dev)
+{
+GLUEState *s = GLUE(dev);
+
+s->ipr = 0;
+}
+
+static const VMStateDescription vmstate_glue = {
+.name = "q800-glue",
+.version_id = 0,
+.minimum_version_id = 0,
+.fields = (VMStateField[]) {
+VMSTATE_UINT8(ipr, GLUEState),
+VMSTATE_END_OF_LIST(),
+},
+};
+
+/*
+ * If the m68k CPU implemented its inbound irq lines as GPIO lines
+ * rather than via the m68k_set_irq_level() function we would not need
+ * this cpu link property and could instead provide outbound IRQ lines
+ * that the board could wire up to the CPU.
+ */
+static Property glue_properties[] = {
+DEFINE_PROP_LINK("cpu", GLUEState, cpu, TYPE_M68K_CPU, M68kCPU *),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void glue_init(Object *obj)
+{
+DeviceState *dev = DEVICE(obj);
+
+qdev_init_gpio_in(dev, GLUE_set_irq, 8);
+}
+
+static void glue_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->vmsd = &vmstate_glue;
+dc->reset = glue_reset;
+device_class_set_props(dc, glue_properties);
+}
+
+static const TypeInfo glue_info = {
+.name = TYPE_GLUE,
+.parent = TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(GLUEState),
+.instance_init = glue_init,
+.class_init = glue_class_init,
+};
+
 static void main_cpu_reset(void *opaque)
 {
 M68kCPU *cpu = opaque;
@@ -178,8 +235,7 @@ static void q800_init(MachineState *machine)
 SysBusDevice *sysbus;
 BusState *adb_bus;
 NubusBus *nubus;
-GLUEState *irq;
-qemu_irq *pic;
+DeviceState *glue;
 DriveInfo *dinfo;
 
 linux_boot = (kernel_filename != NULL);
@@ -213,10 +269,9 @@ static void q800_init(MachineState *machine)
 }
 
 /* IRQ Glue */
-
-irq = g_new0(GLUEState, 1);
-irq->cpu = cpu;
-pic = qemu_allocate_irqs(GLUE_set_irq, irq, 8);
+glue = qdev_new(TYPE_GLUE);
+object_property_set_link(OBJECT(glue), "cpu", OBJECT(cpu), &error_abort);
+sysbus_realize_and_unref(SYS_BUS_DEVICE(glue), &error_fatal);
 
 /* VIA */
 
@@ -228,8 +283,10 @@ static void q800_init(MachineState *machine)
 sysbus = SYS_BUS_DEVICE(via_dev);
  sysbus_realize_and_unref(sysbus, &error_fatal);
 sysbus_mmio_map(sysbus, 0, VIA_BASE);
-qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 0, pic[0]);
-qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 1, pic[1]);
+qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 0,
+qdev_get_gpio_in(glue, 0));
+qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 1,
+qdev_get_gpio_in(glue, 1));
 
 
 adb_bus = qdev_get_child_bus(via_dev, "adb.0");
@@ -270,7 +327,7 @@ static void q800_init(MachineState *machine)
  sysbus_realize_and_unref(sysbus, &error_fatal);
 sysbus_mmio_map(sysbus, 0, SONIC_BASE);
 sysbus_mmio_map(sysbus, 1, SONIC_PROM_BASE);
-sysbus_connect_irq(sysbus, 0, pic[2]);
+sysbus_connect_irq(sysbus, 0, qdev_get_gpio_in(glue, 2));
 
 /* SCC */
 
@@ -292,7 +349,7 @@ static void q800_init(MachineState *machine)
  qdev_realize_and_unref(escc_orgate, NULL, &error_fatal);
 sysbus_connect_irq(sysbus, 0, qdev_get_gpio_in(escc_orgate, 0));
 sysbus_connect_irq(sysbus, 1, qdev_get_gpio_in(escc_orgate, 1));
-qdev_connect_gpio_out(DEVICE(escc_orgate), 0, pic[3]);
+qdev_connect_gpio_out(DEVICE(escc_orgate), 0, qdev_get_gpio_in(glue, 3));
 sysbus_mmio_map(sysbus, 0, SCC_BASE);
 
 /* SCSI */
@@ -457,6 +514,7 @@ static const TypeInfo q800_machine_typeinfo = {
 static void q800_machine_register_types(void)
 {
  type_register_static(&q800_machine_typeinfo);
+

[PATCH 0/2] m68k/q800: make the GLUE chip a QOM device

2020-11-06 Thread Peter Maydell
This series is 6.0 material really I think.  It's a bit of cleanup
prompted by a Coverity issue, CID 1421883.  There are another half
dozen or so similar issues, where Coverity is complaining that we
allocate an array of qemu_irqs with qemu_allocate_irqs() in a board
init function -- in this case the 'pic' array in q800_init() -- and
then we return from the board init function and the memory is leaked,
in the sense that nobody has a pointer to it any more.

The leak isn't real, in that board init functions are called only
once, and the array of qemu_irqs really does need to stay around for
the life of the simulation, so these are pretty much insignificant
as Coverity issues go. But this coding style which uses a free-floating
set of qemu_irqs is not very "modern QEMU", so the issues act as
a nudge that we should clean the code up by encapsulating the
interrupt-line behaviour in a QOM device. In the q800 case there
actually is already a GLUEState struct, it just needs to be turned
into a QOM device with GPIO input lines. Patch 2 does that.

Patch 1 fixes a bug I noticed while doing this work -- it's
not valid to connect two qemu_irq lines directly to the same
input (here both ESCC irq lines go to pic[3]) because it produces
weird behaviour like "both lines are asserted but the device
consuming the interrupt sees the line deassert when one of the
two inputs goes low, rather than only when they both go low".
You need to put an explicit OR gate in, assuming that logical-OR
is the desired behaviour, which it usually is.

Tested only with 'make check' and 'make check-acceptance',
but the latter does have a q800 bootup test.

thanks
-- PMM

Peter Maydell (2):
  hw/m68k/q800: Don't connect two qemu_irqs directly to the same input
  hw/m68k/q800.c: Make the GLUE chip an actual QOM device

 hw/m68k/q800.c  | 92 ++---
 hw/m68k/Kconfig |  1 +
 2 files changed, 80 insertions(+), 13 deletions(-)

-- 
2.20.1




[PATCH v10 05/12] hw/block/nvme: Add support for Namespace Types

2020-11-06 Thread Dmitry Fomichev
From: Niklas Cassel 

Define the structures and constants required to implement
Namespace Types support.

Namespace Types introduce a new command set, "I/O Command Sets",
that allows the host to retrieve the command sets associated with
a namespace. Introduce support for the command set and enable
detection for the NVM Command Set.

The new workflows for identify commands rely heavily on zero-filled
identify structs. E.g., certain CNS commands are defined to return
a zero-filled identify struct when an inactive namespace NSID
is supplied.

Add a helper function in order to avoid code duplication when
reporting zero-filled identify structures.
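
The helper mentioned above boils down to something like the following
sketch (the function name here is illustrative, not necessarily the one
used in the patch): DMA a zeroed, identify-sized buffer back to the host.

    static uint16_t nvme_rpt_empty_id_struct(NvmeCtrl *n, NvmeRequest *req)
    {
        uint8_t id[NVME_IDENTIFY_DATA_SIZE] = {};

        return nvme_dma(n, id, sizeof(id), DMA_DIRECTION_FROM_DEVICE, req);
    }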

Signed-off-by: Niklas Cassel 
Signed-off-by: Dmitry Fomichev 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-ns.h|   1 +
 include/block/nvme.h  |  66 +++
 hw/block/nvme-ns.c|   2 +
 hw/block/nvme.c   | 188 +++---
 hw/block/trace-events |   7 ++
 5 files changed, 219 insertions(+), 45 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index a38071884a..d795e44bab 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -31,6 +31,7 @@ typedef struct NvmeNamespace {
 int64_t  size;
 NvmeIdNs id_ns;
 const uint32_t *iocs;
+uint8_t  csi;
 
 NvmeNamespaceParams params;
 } NvmeNamespace;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index f62cc90d49..af23514713 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -84,6 +84,7 @@ enum NvmeCapMask {
 
 enum NvmeCapCss {
 NVME_CAP_CSS_NVM= 1 << 0,
+NVME_CAP_CSS_CSI_SUPP   = 1 << 6,
 NVME_CAP_CSS_ADMIN_ONLY = 1 << 7,
 };
 
@@ -117,9 +118,25 @@ enum NvmeCcMask {
 
 enum NvmeCcCss {
 NVME_CC_CSS_NVM= 0x0,
+NVME_CC_CSS_CSI= 0x6,
 NVME_CC_CSS_ADMIN_ONLY = 0x7,
 };
 
+#define NVME_SET_CC_EN(cc, val) \
+(cc |= (uint32_t)((val) & CC_EN_MASK) << CC_EN_SHIFT)
+#define NVME_SET_CC_CSS(cc, val)\
+(cc |= (uint32_t)((val) & CC_CSS_MASK) << CC_CSS_SHIFT)
+#define NVME_SET_CC_MPS(cc, val)\
+(cc |= (uint32_t)((val) & CC_MPS_MASK) << CC_MPS_SHIFT)
+#define NVME_SET_CC_AMS(cc, val)\
+(cc |= (uint32_t)((val) & CC_AMS_MASK) << CC_AMS_SHIFT)
+#define NVME_SET_CC_SHN(cc, val)\
+(cc |= (uint32_t)((val) & CC_SHN_MASK) << CC_SHN_SHIFT)
+#define NVME_SET_CC_IOSQES(cc, val) \
+(cc |= (uint32_t)((val) & CC_IOSQES_MASK) << CC_IOSQES_SHIFT)
+#define NVME_SET_CC_IOCQES(cc, val) \
+(cc |= (uint32_t)((val) & CC_IOCQES_MASK) << CC_IOCQES_SHIFT)
+
 enum NvmeCstsShift {
 CSTS_RDY_SHIFT  = 0,
 CSTS_CFS_SHIFT  = 1,
@@ -534,8 +551,13 @@ typedef struct QEMU_PACKED NvmeIdentify {
  uint64_t    rsvd2[2];
  uint64_t    prp1;
  uint64_t    prp2;
-uint32_t    cns;
-uint32_t    rsvd11[5];
+uint8_t     cns;
+uint8_t     rsvd10;
+uint16_t    ctrlid;
+uint16_t    nvmsetid;
+uint8_t     rsvd11;
+uint8_t     csi;
+uint32_t    rsvd12[4];
 } NvmeIdentify;
 
 typedef struct QEMU_PACKED NvmeRwCmd {
@@ -656,6 +678,7 @@ enum NvmeStatusCodes {
 NVME_SGL_DESCR_TYPE_INVALID = 0x0011,
 NVME_INVALID_USE_OF_CMB = 0x0012,
 NVME_INVALID_PRP_OFFSET = 0x0013,
+NVME_CMD_SET_CMB_REJECTED   = 0x002b,
 NVME_LBA_RANGE  = 0x0080,
 NVME_CAP_EXCEEDED   = 0x0081,
 NVME_NS_NOT_READY   = 0x0082,
@@ -782,11 +805,15 @@ typedef struct QEMU_PACKED NvmePSD {
 
 #define NVME_IDENTIFY_DATA_SIZE 4096
 
-enum {
-NVME_ID_CNS_NS = 0x0,
-NVME_ID_CNS_CTRL   = 0x1,
-NVME_ID_CNS_NS_ACTIVE_LIST = 0x2,
-NVME_ID_CNS_NS_DESCR_LIST  = 0x3,
+enum NvmeIdCns {
+NVME_ID_CNS_NS= 0x00,
+NVME_ID_CNS_CTRL  = 0x01,
+NVME_ID_CNS_NS_ACTIVE_LIST= 0x02,
+NVME_ID_CNS_NS_DESCR_LIST = 0x03,
+NVME_ID_CNS_CS_NS = 0x05,
+NVME_ID_CNS_CS_CTRL   = 0x06,
+NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07,
+NVME_ID_CNS_IO_COMMAND_SET= 0x1c,
 };
 
 typedef struct QEMU_PACKED NvmeIdCtrl {
@@ -934,6 +961,7 @@ enum NvmeFeatureIds {
 NVME_WRITE_ATOMICITY= 0xa,
 NVME_ASYNCHRONOUS_EVENT_CONF= 0xb,
 NVME_TIMESTAMP  = 0xe,
+NVME_COMMAND_SET_PROFILE= 0x19,
 NVME_SOFTWARE_PROGRESS_MARKER   = 0x80,
 NVME_FID_MAX= 0x100,
 };
@@ -1018,18 +1046,26 @@ typedef struct QEMU_PACKED NvmeIdNsDescr {
 uint8_t rsvd2[2];
 } NvmeIdNsDescr;
 
-enum {
-NVME_NIDT_EUI64_LEN =  8,
-NVME_NIDT_NGUID_LEN = 16,
-NVME_NIDT_UUID_LEN  = 16,
+enum NvmeNsIdentifierLength {
+NVME_NIDL_EUI64 = 8,
+NVME_NIDL_NGUID = 16,
+NVME_NIDL_UUID  = 16,
+NVME_NIDL_CSI   = 1,
 };
 
 enum NvmeNsIdentifierType {
-NVME_NIDT_EUI64 = 0x1,
-NVME_NIDT_NGUID = 0x2,
-NVME_NIDT_UUID  = 0x3,
+NVME_NIDT_EUI64 = 0x01,
+NVME_NIDT_NGUID = 0x02,
+

[PATCH v10 04/12] hw/block/nvme: Merge nvme_write_zeroes() with nvme_write()

2020-11-06 Thread Dmitry Fomichev
nvme_write() now handles WRITE, WRITE ZEROES and ZONE_APPEND.

Signed-off-by: Dmitry Fomichev 
Reviewed-by: Niklas Cassel 
Acked-by: Klaus Jensen 
---
 hw/block/nvme.c   | 72 +--
 hw/block/trace-events |  1 -
 2 files changed, 28 insertions(+), 45 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 770e42a066..48adbe84f5 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1001,32 +1001,7 @@ invalid:
 return status | NVME_DNR;
 }
 
-static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
-{
-NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
-NvmeNamespace *ns = req->ns;
-uint64_t slba = le64_to_cpu(rw->slba);
-uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
-uint64_t offset = nvme_l2b(ns, slba);
-uint32_t count = nvme_l2b(ns, nlb);
-uint16_t status;
-
-trace_pci_nvme_write_zeroes(nvme_cid(req), nvme_nsid(ns), slba, nlb);
-
-status = nvme_check_bounds(n, ns, slba, nlb);
-if (status) {
-trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
-return status;
-}
-
-block_acct_start(blk_get_stats(req->ns->blkconf.blk), &req->acct, 0,
- BLOCK_ACCT_WRITE);
-req->aiocb = blk_aio_pwrite_zeroes(req->ns->blkconf.blk, offset, count,
-   BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req);
-return NVME_NO_COMPLETE;
-}
-
-static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool wrz)
 {
 NvmeRwCmd *rw = (NvmeRwCmd *)>cmd;
 NvmeNamespace *ns = req->ns;
@@ -1040,10 +1015,12 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode),
  nvme_nsid(ns), nlb, data_size, slba);
 
-status = nvme_check_mdts(n, data_size);
-if (status) {
-trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
-goto invalid;
+if (!wrz) {
+status = nvme_check_mdts(n, data_size);
+if (status) {
+trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
+goto invalid;
+}
 }
 
 status = nvme_check_bounds(n, ns, slba, nlb);
@@ -1052,21 +1029,28 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest 
*req)
 goto invalid;
 }
 
-status = nvme_map_dptr(n, data_size, req);
-if (status) {
-goto invalid;
-}
-
 data_offset = nvme_l2b(ns, slba);
 
-block_acct_start(blk_get_stats(blk), &req->acct, data_size,
- BLOCK_ACCT_WRITE);
-if (req->qsg.sg) {
-req->aiocb = dma_blk_write(blk, &req->qsg, data_offset,
-   BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+if (!wrz) {
+status = nvme_map_dptr(n, data_size, req);
+if (status) {
+goto invalid;
+}
+
+block_acct_start(blk_get_stats(blk), &req->acct, data_size,
+ BLOCK_ACCT_WRITE);
+if (req->qsg.sg) {
+req->aiocb = dma_blk_write(blk, &req->qsg, data_offset,
+   BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+} else {
+req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0,
+ nvme_rw_cb, req);
+}
 } else {
-req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0,
- nvme_rw_cb, req);
+block_acct_start(blk_get_stats(blk), &req->acct, 0, BLOCK_ACCT_WRITE);
+req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size,
+   BDRV_REQ_MAY_UNMAP, nvme_rw_cb,
+   req);
 }
 return NVME_NO_COMPLETE;
 
@@ -1100,9 +1084,9 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 case NVME_CMD_FLUSH:
 return nvme_flush(n, req);
 case NVME_CMD_WRITE_ZEROES:
-return nvme_write_zeroes(n, req);
+return nvme_write(n, req, true);
 case NVME_CMD_WRITE:
-return nvme_write(n, req);
+return nvme_write(n, req, false);
 case NVME_CMD_READ:
 return nvme_read(n, req);
 default:
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 540c600931..e67e96c2b5 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -43,7 +43,6 @@ pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t 
opcode, const char *opna
 pci_nvme_read(uint16_t cid, uint32_t nsid, uint32_t nlb, uint64_t count, 
uint64_t lba) "cid %"PRIu16" nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 
0x%"PRIx64""
 pci_nvme_write(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, 
uint64_t count, uint64_t lba) "cid %"PRIu16" opname '%s' nsid %"PRIu32" nlb 
%"PRIu32" count %"PRIu64" lba 0x%"PRIx64""
 pci_nvme_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk '%s'"
-pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t 
nlb) "cid %"PRIu16" nsid %"PRIu32" 

[PATCH v10 02/12] hw/block/nvme: Generate namespace UUIDs

2020-11-06 Thread Dmitry Fomichev
In NVMe 1.4, a namespace must report an ID descriptor of UUID type
if it doesn't support EUI64 or NGUID. Add a new namespace property,
"uuid", that provides the user the option to either specify the UUID
explicitly or have a UUID generated automatically every time a
namespace is initialized.

Suggested-by: Klaus Jensen 
Signed-off-by: Dmitry Fomichev 
Reviewed-by: Klaus Jensen 
Reviewed-by: Keith Busch 
Reviewed-by: Niklas Cassel 
---
 hw/block/nvme-ns.h | 1 +
 hw/block/nvme-ns.c | 1 +
 hw/block/nvme.c| 9 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index ea8c2f785d..a38071884a 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -21,6 +21,7 @@
 
 typedef struct NvmeNamespaceParams {
 uint32_t nsid;
+QemuUUID uuid;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index b69cdaf27e..de735eb9f3 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -129,6 +129,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
+DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 702f7cc2e3..ed3f38f01d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1564,6 +1564,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req)
 
 static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
 {
+NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
 uint32_t nsid = le32_to_cpu(c->nsid);
 uint8_t list[NVME_IDENTIFY_DATA_SIZE];
@@ -1583,7 +1584,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_INVALID_NSID | NVME_DNR;
 }
 
-if (unlikely(!nvme_ns(n, nsid))) {
+ns = nvme_ns(n, nsid);
+if (unlikely(!ns)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
@@ -1592,12 +1594,11 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl 
*n, NvmeRequest *req)
 /*
  * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data
  * structure, a Namespace UUID (nidt = 0x3) must be reported in the
- * Namespace Identification Descriptor. Add a very basic Namespace UUID
- * here.
+ * Namespace Identification Descriptor. Add the namespace UUID here.
  */
 ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID;
 ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN;
-stl_be_p(&ns_descrs->uuid.v, nsid);
+memcpy(&ns_descrs->uuid.v, ns->params.uuid.data, NVME_NIDT_UUID_LEN);
 
 return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE,
 DMA_DIRECTION_FROM_DEVICE, req);
-- 
2.21.0




[PATCH v10 11/12] hw/block/nvme: Add injection of Offline/Read-Only zones

2020-11-06 Thread Dmitry Fomichev
The ZNS specification defines two zone conditions for zones that can no
longer function properly, possibly because of flash wear or another
internal fault. It is useful to be able to "inject" a small number of
such zones for testing purposes.

This commit defines two optional device properties, "offline_zones"
and "rdonly_zones". Users can assign non-zero values to these variables
to specify the number of zones to be initialized as Offline or
Read-Only. The actual number of injected zones may be smaller than the
requested amount - Read-Only and Offline counts are expected to be much
smaller than the total number of zones on a drive.
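
As a hypothetical example (drive id and values are placeholders, and the
"zoned" namespace property comes from the rest of this series), a namespace
with a few injected faulty zones could be configured like this:

  -device nvme-ns,drive=zns-1,nsid=1,zoned=true,zoned.offline_zones=2,zoned.rdonly_zones=2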

Signed-off-by: Dmitry Fomichev 
Reviewed-by: Niklas Cassel 
---
 hw/block/nvme-ns.h |  2 ++
 hw/block/nvme-ns.c | 52 ++
 2 files changed, 54 insertions(+)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 50a6a0e1ac..b30478e5d7 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -36,6 +36,8 @@ typedef struct NvmeNamespaceParams {
 uint32_t max_active_zones;
 uint32_t max_open_zones;
 uint32_t zd_extension_size;
+uint32_t nr_offline_zones;
+uint32_t nr_rdonly_zones;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 85dc73cf06..5e4a6705cd 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -21,6 +21,7 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
+#include "crypto/random.h"
 
 #include "hw/qdev-properties.h"
 #include "hw/qdev-core.h"
@@ -145,6 +146,20 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, 
Error **errp)
 }
 }
 
+if (ns->params.max_open_zones < nz) {
+if (ns->params.nr_offline_zones > nz - ns->params.max_open_zones) {
+error_setg(errp, "offline_zones value %u is too large",
+ns->params.nr_offline_zones);
+return -1;
+}
+if (ns->params.nr_rdonly_zones >
+nz - ns->params.max_open_zones - ns->params.nr_offline_zones) {
+error_setg(errp, "rdonly_zones value %u is too large",
+ns->params.nr_rdonly_zones);
+return -1;
+}
+}
+
 return 0;
 }
 
@@ -153,7 +168,9 @@ static void nvme_init_zone_state(NvmeNamespace *ns)
 uint64_t start = 0, zone_size = ns->zone_size;
 uint64_t capacity = ns->num_zones * zone_size;
 NvmeZone *zone;
+uint32_t rnd;
 int i;
+uint16_t zs;
 
 ns->zone_array = g_malloc0(ns->zone_array_size);
 if (ns->params.zd_extension_size) {
@@ -180,6 +197,37 @@ static void nvme_init_zone_state(NvmeNamespace *ns)
 zone->w_ptr = start;
 start += zone_size;
 }
+
+/* If required, make some zones Offline or Read Only */
+
+for (i = 0; i < ns->params.nr_offline_zones; i++) {
+do {
+qcrypto_random_bytes(&rnd, sizeof(rnd), NULL);
+rnd %= ns->num_zones;
+} while (rnd < ns->params.max_open_zones);
+zone = &ns->zone_array[rnd];
+zs = nvme_get_zone_state(zone);
+if (zs != NVME_ZONE_STATE_OFFLINE) {
+nvme_set_zone_state(zone, NVME_ZONE_STATE_OFFLINE);
+} else {
+i--;
+}
+}
+
+for (i = 0; i < ns->params.nr_rdonly_zones; i++) {
+do {
+qcrypto_random_bytes(&rnd, sizeof(rnd), NULL);
+rnd %= ns->num_zones;
+} while (rnd < ns->params.max_open_zones);
+zone = &ns->zone_array[rnd];
+zs = nvme_get_zone_state(zone);
+if (zs != NVME_ZONE_STATE_OFFLINE &&
+zs != NVME_ZONE_STATE_READ_ONLY) {
+nvme_set_zone_state(zone, NVME_ZONE_STATE_READ_ONLY);
+} else {
+i--;
+}
+}
 }
 
 static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index,
@@ -353,6 +401,10 @@ static Property nvme_ns_props[] = {
params.max_open_zones, 0),
 DEFINE_PROP_UINT32("zoned.descr_ext_size", NvmeNamespace,
params.zd_extension_size, 0),
+DEFINE_PROP_UINT32("zoned.offline_zones", NvmeNamespace,
+   params.nr_offline_zones, 0),
+DEFINE_PROP_UINT32("zoned.rdonly_zones", NvmeNamespace,
+   params.nr_rdonly_zones, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.21.0




[PATCH v10 01/12] hw/block/nvme: Add Commands Supported and Effects log

2020-11-06 Thread Dmitry Fomichev
It becomes necessary to implement this log page to allow checking for
Zone Append command support in the Zoned Namespace Command Set.

This commit adds the code to report this log page for NVM Command
Set only. The parts that are specific to zoned operation will be
added later in the series.

All incoming admin and I/O commands are now only processed if their
corresponding support bits are set in this log. This provides an
easy way to control which commands are supported and which are not,
depending on the selected CC.CSS.
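
Assuming a reasonably recent nvme-cli on the host (this invocation is only
an illustration and is not part of the patch), the new log page can be
inspected with something like:

  nvme get-log /dev/nvme0 --log-id=0x05 --log-len=4096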

Signed-off-by: Dmitry Fomichev 
Reviewed-by: Niklas Cassel 
---
 hw/block/nvme-ns.h|  1 +
 include/block/nvme.h  | 19 +
 hw/block/nvme.c   | 96 +++
 hw/block/trace-events |  1 +
 4 files changed, 108 insertions(+), 9 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 83734f4606..ea8c2f785d 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -29,6 +29,7 @@ typedef struct NvmeNamespace {
 int32_t  bootindex;
 int64_t  size;
 NvmeIdNs id_ns;
+const uint32_t *iocs;
 
 NvmeNamespaceParams params;
 } NvmeNamespace;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 8a46d9cf01..f62cc90d49 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -745,10 +745,27 @@ enum NvmeSmartWarn {
 NVME_SMART_FAILED_VOLATILE_MEDIA  = 1 << 4,
 };
 
+typedef struct NvmeEffectsLog {
+uint32_tacs[256];
+uint32_tiocs[256];
+uint8_t resv[2048];
+} NvmeEffectsLog;
+
+enum {
+NVME_CMD_EFF_CSUPP  = 1 << 0,
+NVME_CMD_EFF_LBCC   = 1 << 1,
+NVME_CMD_EFF_NCC= 1 << 2,
+NVME_CMD_EFF_NIC= 1 << 3,
+NVME_CMD_EFF_CCC= 1 << 4,
+NVME_CMD_EFF_CSE_MASK   = 3 << 16,
+NVME_CMD_EFF_UUID_SEL   = 1 << 19,
+};
+
 enum NvmeLogIdentifier {
 NVME_LOG_ERROR_INFO = 0x01,
 NVME_LOG_SMART_INFO = 0x02,
 NVME_LOG_FW_SLOT_INFO   = 0x03,
+NVME_LOG_CMD_EFFECTS= 0x05,
 };
 
 typedef struct QEMU_PACKED NvmePSD {
@@ -861,6 +878,7 @@ enum NvmeIdCtrlFrmw {
 
 enum NvmeIdCtrlLpa {
 NVME_LPA_NS_SMART = 1 << 0,
+NVME_LPA_CSE  = 1 << 1,
 NVME_LPA_EXTENDED = 1 << 2,
 };
 
@@ -1060,6 +1078,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512);
 QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512);
+QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16);
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 065e763e4f..702f7cc2e3 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -111,6 +111,28 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
 [NVME_TIMESTAMP]= NVME_FEAT_CAP_CHANGE,
 };
 
+static const uint32_t nvme_cse_acs[256] = {
+[NVME_ADM_CMD_DELETE_SQ]= NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_CREATE_SQ]= NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_GET_LOG_PAGE] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_DELETE_CQ]= NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_CREATE_CQ]= NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_IDENTIFY] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_ABORT]= NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
+[NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
+};
+
+static const uint32_t nvme_cse_iocs_none[256];
+
+static const uint32_t nvme_cse_iocs_nvm[256] = {
+[NVME_CMD_FLUSH]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_WRITE]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+[NVME_CMD_READ] = NVME_CMD_EFF_CSUPP,
+};
+
 static void nvme_process_sq(void *opaque);
 
 static uint16_t nvme_cid(NvmeRequest *req)
@@ -1022,10 +1044,6 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_io_cmd(nvme_cid(req), nsid, nvme_sqid(req),
   req->cmd.opcode, nvme_io_opc_str(req->cmd.opcode));
 
-if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ADMIN_ONLY) {
-return NVME_INVALID_OPCODE | NVME_DNR;
-}
-
 if (!nvme_nsid_valid(n, nsid)) {
 return NVME_INVALID_NSID | NVME_DNR;
 }
@@ -1035,6 +1053,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
+if (!(req->ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
+trace_pci_nvme_err_invalid_opc(req->cmd.opcode);
+return NVME_INVALID_OPCODE | NVME_DNR;
+}
+
 switch (req->cmd.opcode) {
 case NVME_CMD_FLUSH:
 return nvme_flush(n, req);
@@ -1044,8 +1067,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 case 

[PATCH v10 12/12] hw/block/nvme: Document zoned parameters in usage text

2020-11-06 Thread Dmitry Fomichev
Added brief descriptions of the new device properties that are
now available to users to configure features of Zoned Namespace
Command Set in the emulator.

This patch is for documentation only, no functionality change.
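
Pulling the pieces together, a sketch of a zoned-namespace invocation based
on the property names documented below might look like this (the serial,
drive id and sizes are arbitrary examples, not defaults):

  -device nvme,serial=deadbeef,zoned.append_size_limit=128K \
  -device nvme-ns,drive=zns-1,nsid=1,zoned=true,zoned.zsze=64M,zoned.zcap=62M,zoned.max_open=16,zoned.max_active=32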

Signed-off-by: Dmitry Fomichev 
Reviewed-by: Niklas Cassel 
---
 hw/block/nvme.c | 47 ++-
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e82e3be821..6043f95116 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -9,7 +9,7 @@
  */
 
 /**
- * Reference Specs: http://www.nvmexpress.org, 1.2, 1.1, 1.0e
+ * Reference Specs: http://www.nvmexpress.org, 1.4, 1.3, 1.2, 1.1, 1.0e
  *
  *  https://nvmexpress.org/developers/nvme-specification/
  */
@@ -22,8 +22,9 @@
  *  [pmrdev=,] \
  *  max_ioqpairs=, \
  *  aerl=, aer_max_queued=, \
- *  mdts=
- *  -device nvme-ns,drive=,bus=bus_name,nsid=
+ *  mdts=,zoned.append_size_limit= \
+ *  -device nvme-ns,drive=,bus=,nsid=,\
+ *  zoned=
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
@@ -41,14 +42,50 @@
  * ~~
  * - `aerl`
  *   The Asynchronous Event Request Limit (AERL). Indicates the maximum number
- *   of concurrently outstanding Asynchronous Event Request commands suppoert
+ *   of concurrently outstanding Asynchronous Event Request commands support
  *   by the controller. This is a 0's based value.
  *
  * - `aer_max_queued`
  *   This is the maximum number of events that the device will enqueue for
- *   completion when there are no oustanding AERs. When the maximum number of
+ *   completion when there are no outstanding AERs. When the maximum number of
  *   enqueued events are reached, subsequent events will be dropped.
  *
+ * - `zoned.append_size_limit`
+ *   The maximum I/O size in bytes that is allowed in Zone Append command.
+ *   The default is 128KiB. Since internally this this value is maintained as
+ *   ZASL = log2( / ), some values assigned
+ *   to this property may be rounded down and result in a lower maximum ZA
+ *   data size being in effect. By setting this property to 0, users can make
+ *   ZASL to be equal to MDTS. This property only affects zoned namespaces.
+ *
+ * Setting `zoned` to true selects Zoned Command Set at the namespace.
+ * In this case, the following namespace properties are available to configure
+ * zoned operation:
+ * zoned.zsze=
+ * The number may be followed by K, M, G as in kilo-, mega- or giga-.
+ *
+ * zoned.zcap=
+ * The value 0 (default) forces zone capacity to be the same as zone
+ * size. The value of this property may not exceed zone size.
+ *
+ * zoned.descr_ext_size=
+ * This value needs to be specified in 64B units. If it is zero,
+ * namespace(s) will not support zone descriptor extensions.
+ *
+ * zoned.max_active=
+ * The default value means there is no limit to the number of
+ * concurrently active zones.
+ *
+ * zoned.max_open=
+ * The default value means there is no limit to the number of
+ * concurrently open zones.
+ *
+ * zoned.offline_zones=
+ *
+ * zoned.rdonly_zones=
+ *
+ * zoned.cross_zone_read=
+ * Setting this property to true enables Read Across Zone Boundaries.
  */
 
 #include "qemu/osdep.h"
-- 
2.21.0




[PATCH v10 06/12] hw/block/nvme: Support allocated CNS command variants

2020-11-06 Thread Dmitry Fomichev
From: Niklas Cassel 

Many CNS commands have "allocated" command variants. These include
a namespace as long as it is allocated, that is, a namespace is
included regardless of whether it is active (attached) or not.

While these commands are optional (they are mandatory for controllers
supporting the namespace attachment command), our QEMU implementation
becomes more complete by actually providing support for these CNS values.

However, since our QEMU model currently does not support the namespace
attachment command, these new allocated CNS commands will return the
same result as the active CNS command variants.

In NVMe, a namespace is active if it exists and is attached to the
controller.

Add a new Boolean namespace flag, "attached", to provide the most
basic namespace attachment support. The default value for this new
flag is true. Also, implement the logic in the new CNS values to
include/exclude namespaces based on this new property. The only thing
missing is hooking up the actual Namespace Attachment command opcode,
which will allow a user to toggle the "attached" flag per namespace.

The reason for not hooking up this command completely is that the
NVMe specification requires the namespace management command to be
supported if the namespace attachment command is supported.
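
For a rough idea of how this looks from the host side (the nvme-cli
invocations are given only as an illustration and are not part of the patch),
CNS 02h vs CNS 10h roughly corresponds to listing active vs allocated
namespaces:

  nvme list-ns /dev/nvme0          # active namespaces (CNS 0x02)
  nvme list-ns /dev/nvme0 --all    # allocated namespaces (CNS 0x10)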

Signed-off-by: Niklas Cassel 
Signed-off-by: Dmitry Fomichev 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-ns.h   |  1 +
 include/block/nvme.h | 20 +++
 hw/block/nvme-ns.c   |  1 +
 hw/block/nvme.c  | 46 +---
 4 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index d795e44bab..2d9cd29d07 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -31,6 +31,7 @@ typedef struct NvmeNamespace {
 int64_t  size;
 NvmeIdNs id_ns;
 const uint32_t *iocs;
+bool attached;
 uint8_t  csi;
 
 NvmeNamespaceParams params;
diff --git a/include/block/nvme.h b/include/block/nvme.h
index af23514713..394db19022 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -806,14 +806,18 @@ typedef struct QEMU_PACKED NvmePSD {
 #define NVME_IDENTIFY_DATA_SIZE 4096
 
 enum NvmeIdCns {
-NVME_ID_CNS_NS= 0x00,
-NVME_ID_CNS_CTRL  = 0x01,
-NVME_ID_CNS_NS_ACTIVE_LIST= 0x02,
-NVME_ID_CNS_NS_DESCR_LIST = 0x03,
-NVME_ID_CNS_CS_NS = 0x05,
-NVME_ID_CNS_CS_CTRL   = 0x06,
-NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07,
-NVME_ID_CNS_IO_COMMAND_SET= 0x1c,
+NVME_ID_CNS_NS= 0x00,
+NVME_ID_CNS_CTRL  = 0x01,
+NVME_ID_CNS_NS_ACTIVE_LIST= 0x02,
+NVME_ID_CNS_NS_DESCR_LIST = 0x03,
+NVME_ID_CNS_CS_NS = 0x05,
+NVME_ID_CNS_CS_CTRL   = 0x06,
+NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07,
+NVME_ID_CNS_NS_PRESENT_LIST   = 0x10,
+NVME_ID_CNS_NS_PRESENT= 0x11,
+NVME_ID_CNS_CS_NS_PRESENT_LIST= 0x1a,
+NVME_ID_CNS_CS_NS_PRESENT = 0x1b,
+NVME_ID_CNS_IO_COMMAND_SET= 0x1c,
 };
 
 typedef struct QEMU_PACKED NvmeIdCtrl {
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index c0362426cc..e191ef9be0 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -42,6 +42,7 @@ static void nvme_ns_init(NvmeNamespace *ns)
 id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns));
 
 ns->csi = NVME_CSI_NVM;
+ns->attached = true;
 
 /* no thin provisioning */
 id_ns->ncap = id_ns->nsze;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index bb82dd9975..7495cdb5ef 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1236,6 +1236,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, 
uint32_t buf_len,
 uint32_t trans_len;
 NvmeNamespace *ns;
 time_t current_ms;
+int i;
 
 if (off >= sizeof(smart)) {
 return NVME_INVALID_FIELD | NVME_DNR;
@@ -1246,10 +1247,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t 
rae, uint32_t buf_len,
 if (!ns) {
 return NVME_INVALID_NSID | NVME_DNR;
 }
-nvme_set_blk_stats(ns, &smart);
 } else {
-int i;
-
 for (i = 1; i <= n->num_namespaces; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
@@ -1552,7 +1550,8 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req,
+ bool only_active)
 {
 NvmeNamespace *ns;
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
@@ -1569,11 +1568,16 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, 
NvmeRequest *req)
 return nvme_rpt_empty_id_struct(n, req);
 }
 
+if (only_active && !ns->attached) {
+return nvme_rpt_empty_id_struct(n, req);
+}
+
 return nvme_dma(n, (uint8_t *)&ns->id_ns, 

[PATCH v10 00/12] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set

2020-11-06 Thread Dmitry Fomichev
v9 -> v10:

 - Correctly check for MDTS in Zone Management Receive handler.

 - Change Klaus' "Reviewed-by" email in UUID patch.

v8 -> v9:

 - Move the modifications to "include/block/nvme.h" made to
   introduce ZNS-related definitions to a separate patch.

 - Add a new struct, NvmeZonedResult, along the same lines as the
   existing NvmeAerResult, to carry Zone Append LBA returned to
   the host. Now, there is no need to modify NvmeCqe struct except
   renaming DW1 field from "rsvd" to "dw1".

 - Add check for MDTS in Zone Management Receive handler.

 - Remove checks for ns->attached since the value of this flag
   is always true for now.

 - Rebase to the current qemu-nvme/nvme-next branch.

v7 -> v8:

 - Move refactoring commits to the front of the series.

 - Remove "attached" and "fill_pattern" device properties.

 - Only close open zones upon subsystem shutdown, not when CC.EN flag
   is set to 0. Avoid looping through all zones by iterating through
   lists of open and closed zones.

 - Improve bulk processing of zones aka zoned operations with "all"
   flag set. Avoid looping through the entire zone array for all zone
   operations except Offline Zone.

 - Prefix ZNS-related property names with "zoned.". The "zoned" Boolean
   property is retained to turn on zoned command set as it is much more
   intuitive and user-friendly compared to setting a magic number value
   to csi property.

 - Address review comments.

 - Remove unused trace events.

v6 -> v7:

 - Introduce ns->iocs initialization function earlier in the series,
   in CSE Log patch.

 - Set NVM iocs for zoned namespaces when CC.CSS is set to
   NVME_CC_CSS_NVM.

 - Clean up code in CSE log handler.
 
v5 -> v6:

 - Remove zoned state persistence code. Replace position-independent
   zone lists with QTAILQs.

 - Close all open zones upon clearing of the controller. This is
   a similar procedure to the one previously performed upon powering
   up with zone persistence. 

 - Squash NS Types and ZNS triplets of commits to keep definitions
   and trace event definitions together with the implementation code.

 - Move namespace UUID generation to a separate patch. Add the new
   "uuid" property as suggested by Klaus.

 - Rework Commands and Effects patch to make sure that the log is
   always in sync with the actual set of commands supported.

 - Add two refactoring commits at the end of the series to
   optimize read and write i/o path.

- Incorporate feedback from Keith, Klaus and Niklas:

  * fix rebase errors in nvme_identify_ns_descr_list()
  * remove unnecessary code from nvme_write_bar()
  * move csi to NvmeNamespace and use it from the beginning in NSTypes
patch
  * change zone read processing to cover all corner cases with RAZB=1
  * sync w_ptr and d.wp in case of a i/o error at the preceding zone
  * reword the commit message in active/inactive patch with the new
text from Niklas
  * correct dlfeat reporting depending on the fill pattern set
  * add more checks for "attached" n/s parameter to prevent i/o and
get/set features on inactive namespaces
  * Use DEFINE_PROP_SIZE and DEFINE_PROP_SIZE32 for zone size/capacity
and ZASL respectively
  * Improve zone size and capacity validation
  * Correctly report NSZE

v4 -> v5:

 - Rebase to the current qemu-nvme.

 - Use HostMemoryBackendFile as the backing storage for persistent
   zone metadata.

 - Fix the issue with filling the valid data in the next zone if RAZB
   is enabled.

v3 -> v4:

 - Fix bugs introduced in v2/v3 for QD > 1 operation. Now, all writes
   to a zone happen at the new write pointer variable, zone->w_ptr,
   that is advanced right after submitting the backend i/o. The existing
   zone->d.wp variable is updated upon the successful write completion
   and it is used for zone reporting. Some code has been split from
   nvme_finalize_zoned_write() function to a new function,
   nvme_advance_zone_wp().

 - Make the code compile under mingw. Switch to using QEMU API for
   mmap/msync, i.e. memory_region...(). Since mmap is not available in
   mingw (even though there is mman-win32 library available on Github),
   conditional compilation is added around these calls to avoid
   undefined symbols under mingw. A better fix would be to add stub
   functions to softmmu/memory.c for the case when CONFIG_POSIX is not
   defined, but such change is beyond the scope of this patchset and it
   can be made in a separate patch.

 - Correct permission mask used to open zone metadata file.

 - Fold "Define 64 bit cqe.result" patch into ZNS commit.

 - Use clz64/clz32 instead of defining nvme_ilog2() function.

 - Simplify rpt_empty_id_struct() code, move nvme_fill_data() back
   to ZNS patch.

 - Fix a power-on processing bug.

 - Rename NVME_CMD_ZONE_APND to NVME_CMD_ZONE_APPEND.

 - Make the list of review comments addressed in v2 of the series
   (see below).

v2 -> v3:

 - Moved nvme_fill_data() function to the NSTypes patch as it is
   now used there to output 

[PATCH v10 03/12] hw/block/nvme: Separate read and write handlers

2020-11-06 Thread Dmitry Fomichev
With ZNS support in place, the majority of code in nvme_rw() has
become read- or write-specific. Move these parts to two separate
handlers, nvme_read() and nvme_write() to make the code more
readable and to remove multiple is_write checks that so far existed
in the i/o path.

This is a refactoring patch, no change in functionality.

Signed-off-by: Dmitry Fomichev 
Reviewed-by: Niklas Cassel 
Acked-by: Klaus Jensen 
---
 hw/block/nvme.c   | 91 ++-
 hw/block/trace-events |  3 +-
 2 files changed, 67 insertions(+), 27 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ed3f38f01d..770e42a066 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -953,6 +953,54 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
+static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
+NvmeNamespace *ns = req->ns;
+uint64_t slba = le64_to_cpu(rw->slba);
+uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
+uint64_t data_size = nvme_l2b(ns, nlb);
+uint64_t data_offset;
+BlockBackend *blk = ns->blkconf.blk;
+uint16_t status;
+
+trace_pci_nvme_read(nvme_cid(req), nvme_nsid(ns), nlb, data_size, slba);
+
+status = nvme_check_mdts(n, data_size);
+if (status) {
+trace_pci_nvme_err_mdts(nvme_cid(req), data_size);
+goto invalid;
+}
+
+status = nvme_check_bounds(n, ns, slba, nlb);
+if (status) {
+trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
+goto invalid;
+}
+
+status = nvme_map_dptr(n, data_size, req);
+if (status) {
+goto invalid;
+}
+
+data_offset = nvme_l2b(ns, slba);
+
+block_acct_start(blk_get_stats(blk), &req->acct, data_size,
+ BLOCK_ACCT_READ);
+if (req->qsg.sg) {
+req->aiocb = dma_blk_read(blk, &req->qsg, data_offset,
+  BDRV_SECTOR_SIZE, nvme_rw_cb, req);
+} else {
+req->aiocb = blk_aio_preadv(blk, data_offset, &req->iov, 0,
+nvme_rw_cb, req);
+}
+return NVME_NO_COMPLETE;
+
+invalid:
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ);
+return status | NVME_DNR;
+}
+
 static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
@@ -978,22 +1026,19 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_NO_COMPLETE;
 }
 
-static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
 NvmeNamespace *ns = req->ns;
-uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
 uint64_t slba = le64_to_cpu(rw->slba);
-
+uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
 uint64_t data_size = nvme_l2b(ns, nlb);
-uint64_t data_offset = nvme_l2b(ns, slba);
-enum BlockAcctType acct = req->cmd.opcode == NVME_CMD_WRITE ?
-BLOCK_ACCT_WRITE : BLOCK_ACCT_READ;
+uint64_t data_offset;
 BlockBackend *blk = ns->blkconf.blk;
 uint16_t status;
 
-trace_pci_nvme_rw(nvme_cid(req), nvme_io_opc_str(rw->opcode),
-  nvme_nsid(ns), nlb, data_size, slba);
+trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode),
+ nvme_nsid(ns), nlb, data_size, slba);
 
 status = nvme_check_mdts(n, data_size);
 if (status) {
@@ -1012,29 +1057,22 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 goto invalid;
 }
 
-block_acct_start(blk_get_stats(blk), &req->acct, data_size, acct);
+data_offset = nvme_l2b(ns, slba);
+
+block_acct_start(blk_get_stats(blk), &req->acct, data_size,
+ BLOCK_ACCT_WRITE);
 if (req->qsg.sg) {
-if (acct == BLOCK_ACCT_WRITE) {
-req->aiocb = dma_blk_write(blk, &req->qsg, data_offset,
-   BDRV_SECTOR_SIZE, nvme_rw_cb, req);
-} else {
-req->aiocb = dma_blk_read(blk, &req->qsg, data_offset,
-  BDRV_SECTOR_SIZE, nvme_rw_cb, req);
-}
+req->aiocb = dma_blk_write(blk, &req->qsg, data_offset,
+   BDRV_SECTOR_SIZE, nvme_rw_cb, req);
 } else {
-if (acct == BLOCK_ACCT_WRITE) {
-req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0,
- nvme_rw_cb, req);
-} else {
-req->aiocb = blk_aio_preadv(blk, data_offset, &req->iov, 0,
-nvme_rw_cb, req);
-}
+req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0,
+ nvme_rw_cb, req);
 }
 return NVME_NO_COMPLETE;
 
 invalid:
-block_acct_invalid(blk_get_stats(ns->blkconf.blk), acct);
-return status;
+block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE);
+return status | NVME_DNR;

RE: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set

2020-11-06 Thread Dmitry Fomichev
> -Original Message-
> From: Niklas Cassel 
> Sent: Friday, November 6, 2020 6:59 AM
> To: Dmitry Fomichev 
> Cc: Keith Busch ; Klaus Jensen
> ; Kevin Wolf ; Philippe
> Mathieu-Daudé ; Max Reitz ;
> Maxim Levitsky ; Fam Zheng ;
> Alistair Francis ; Matias Bjorling
> ; Damien Le Moal ;
> qemu-bl...@nongnu.org; qemu-devel@nongnu.org
> Subject: Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace
> Command Set
> 
> On Thu, Nov 05, 2020 at 11:53:38AM +0900, Dmitry Fomichev wrote:
> > The emulation code has been changed to advertise NVM Command Set
> when
> > "zoned" device property is not set (default) and Zoned Namespace
> > Command Set otherwise.
> >
> > Define values and structures that are needed to support Zoned
> > Namespace Command Set (NVMe TP 4053) in PCI NVMe controller
> emulator.
> > Define trace events where needed in newly introduced code.
> >
> > In order to improve scalability, all open, closed and full zones
> > are organized in separate linked lists. Consequently, almost all
> > zone operations don't require scanning of the entire zone array
> > (which potentially can be quite large) - it is only necessary to
> > enumerate one or more zone lists.
> >
> > Handlers for three new NVMe commands introduced in Zoned Namespace
> > Command Set specification are added, namely for Zone Management
> > Receive, Zone Management Send and Zone Append.
> >
> > Device initialization code has been extended to create a proper
> > configuration for zoned operation using device properties.
> >
> > Read/Write command handler is modified to only allow writes at the
> > write pointer if the namespace is zoned. For Zone Append command,
> > writes implicitly happen at the write pointer and the starting write
> > pointer value is returned as the result of the command. Write Zeroes
> > handler is modified to add zoned checks that are identical to those
> > done as a part of Write flow.
> >
> > Subsequent commits in this series add ZDE support and checks for
> > active and open zone limits.
> >
> > Signed-off-by: Niklas Cassel 
> > Signed-off-by: Hans Holmberg 
> > Signed-off-by: Ajay Joshi 
> > Signed-off-by: Chaitanya Kulkarni 
> > Signed-off-by: Matias Bjorling 
> > Signed-off-by: Aravind Ramesh 
> > Signed-off-by: Shin'ichiro Kawasaki 
> > Signed-off-by: Adam Manzanares 
> > Signed-off-by: Dmitry Fomichev 
> > Reviewed-by: Niklas Cassel 
> > ---
> >  hw/block/nvme-ns.h|  54 +++
> >  hw/block/nvme.h   |   8 +
> >  hw/block/nvme-ns.c| 173 
> >  hw/block/nvme.c   | 971
> +-
> >  hw/block/trace-events |  18 +-
> >  5 files changed, 1209 insertions(+), 15 deletions(-)
> >
> 
> (snip)
> 
> > +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest
> *req)
> > +{
> > +NvmeCmd *cmd = (NvmeCmd *)&req->cmd;
> > +NvmeNamespace *ns = req->ns;
> > +/* cdw12 is zero-based number of dwords to return. Convert to bytes
> */
> > +uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2;
> > +uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> > +uint32_t zone_idx, zra, zrasf, partial;
> > +uint64_t max_zones, nr_zones = 0;
> > +uint16_t ret;
> > +uint64_t slba;
> > +NvmeZoneDescr *z;
> > +NvmeZone *zs;
> > +NvmeZoneReportHeader *header;
> > +void *buf, *buf_p;
> > +size_t zone_entry_sz;
> > +
> > +req->status = NVME_SUCCESS;
> > +
> > +ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx);
> > +if (ret) {
> > +return ret;
> > +}
> > +
> > +zra = dw13 & 0xff;
> > +if (zra != NVME_ZONE_REPORT) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +zrasf = (dw13 >> 8) & 0xff;
> > +if (zrasf > NVME_ZONE_REPORT_OFFLINE) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +if (data_size < sizeof(NvmeZoneReportHeader)) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> > +ret = nvme_map_dptr(n, data_size, req);
> 
> nvme_map_dptr() call was not here in v8 patch set.
> 
> On v7 I commented that you were missing a call to nvme_check_mdts().
> I think you still need to call nvme_check_mdts in order to verify
> that data_size < mdts, no?

Ugh, I've added nvme_map_dptr() instead of nvme_check_mdts() :o
Will send the corrected version shortly...

Cheers,
DF

> 
> This function already has a call do nvme_dma(). nvme_dma() already
> calls nvme_map_dptr().
> If you use nvme_dma(), you cannot use nvme_map_dptr().
> 
> It will call nvme_map_addr() (which calls qemu_sglist_add()) on the
> same buffer twice, causing the qsg->size to be twice what the user
> sent in. Which will cause the:
> 
> if (unlikely(residual)) {
> 
> check in nvme_dma() to fail.
> 
> 
> Looking at nvme_read()/nvme_write(), they use nvme_map_dptr()
> (without any call to nvme_dma()), and then use dma_blk_read() or
> dma_blk_write(). (and they both call nvme_check_mdts())
> 
> 
> Kind regards,
> Niklas
> 
> > +if (ret) {
> > +return 

Re: nbd: bitmap_to_extents() calls nbd_extent_array_add() without checking return value: coverity false positive?

2020-11-06 Thread Peter Maydell
On Fri, 6 Nov 2020 at 20:36, Eric Blake  wrote:
>
> On 11/6/20 11:22 AM, Peter Maydell wrote:
> > Hi; Coverity's "you usually check the return value of this function
> > but you didn't do that here" heuristic has fired on the code in
> > nbd/server.c:bitmap_to_extents() -- the function nbd_extent_array_add()
> > is called five times in server.c, and the return value is checked
> > in four of those, but not in the final call at the end of
> > bitmap_to_extents(). (CID 1436125.)
> >
> > Is this a false positive, or should the caller be handling an
> > error here ?
>
> False positive, but I don't mind tweaking the code to silence Coverity.
> This should do it; let me know if I should turn it into a formal patch.
>
> diff --git i/nbd/server.c w/nbd/server.c
> index d145e1a69083..377698a2ce85 100644
> --- i/nbd/server.c
> +++ w/nbd/server.c
> @@ -2128,9 +2128,8 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
>  }
>  }
>
> -if (!full) {
> -/* last non dirty extent */
> -nbd_extent_array_add(es, end - start, 0);
> +if (!full && nbd_extent_array_add(es, end - start, 0) < 0) {
> +/* last non dirty extent, not a problem if array is now full */
>  }
>
>  bdrv_dirty_bitmap_unlock(bitmap);

Hmm; that looks a little odd but I guess it's a bit more
documentative of the intent. Up to you whether you want
to submit it as a patch or not I guess :-)

thanks
-- PMM



Re: [PATCH v3 01/41] tcg: Enhance flush_icache_range with separate data pointer

2020-11-06 Thread Richard Henderson
On 11/6/20 12:31 PM, Alex Bennée wrote:
>> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
>> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t 
>> len)
>>  {
>> -__builtin___clear_cache((char *)start, (char *)stop);
>> +/* TODO: Copy this from gcc to avoid 4 loops instead of 2. */
> 
> Why not do it now?

Do you really want to be including that into a patch that touches all host
backends?  I do it later, in a change by itself.

> I take it i386 is just too primitive to care about flushing things?

Or the reverse.  ;-)


r~



Re: [Virtio-fs] [PATCH] virtiofsd: Use --thread-pool-size=0 to mean no thread pool

2020-11-06 Thread Vivek Goyal
On Fri, Nov 06, 2020 at 08:33:50PM +, Venegas Munoz, Jose Carlos wrote:
> Hi Vivek,
> 
> I have tested with Kata 1.12-alpha0; the results seem to be better for the
> fio config I am tracking.
> 
> The fio config does  randrw:
> 
> fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio 
> --bs=4k --iodepth=64 --size=200M --readwrite=randrw --rwmixread=75
> 

Hi Carlos,

Thanks for the testing. 

So basically two conclusions from your tests.

- for virtiofs, --thread-pool-size=0 is performing better as compared
  to --thread-pool-size=1 as well as --thread-pool-size=64. Approximately
  35-40% better.

- virtio-9p is still approximately 30% better than virtiofs
  --thread-pool-size=0.
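
(For reference, the comparison above boils down to launching virtiofsd with
different values of the option under discussion; the paths, cache mode and
the rest of the command line below are placeholders, not the exact setup
used:

  virtiofsd --socket-path=/tmp/vhostqemu -o source=/srv/shared -o cache=auto --thread-pool-size=0
  virtiofsd --socket-path=/tmp/vhostqemu -o source=/srv/shared -o cache=auto --thread-pool-size=64

and then running the same fio job inside the guest.)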

As I found in my earlier analysis, this particular workload (mixed read and
write) is bad with virtiofs because after every write we invalidate
attrs and cache, so the next read ends up fetching attrs again. I had posted
patches to regain some of the performance.

https://lore.kernel.org/linux-fsdevel/20200929185015.gg220...@redhat.com/

But I got the feedback to look into implementing file leases instead.

Anyway, good to know that --thread-pool-size=0 narrows the performance
gap substantially (more than half) for this workload. I will look into
identifying further bottlenecks.

Given that in my tests most of the workloads benefited from
--thread-pool-size=0, maybe this can be the default, and users can
opt in to a thread pool (--thread-pool-size=64) once it is known
that a particular workload benefits from it.

David, Stefan, WDYT?

Thanks
Vivek

> - I can see better results with this patch
> - 9pfs is still better in the case of Kata because of the use of:
> https://github.com/kata-containers/packaging/blob/stable-1.12/qemu/patches/5.0.x/0001-9p-removing-coroutines-of-9p-to-increase-the-I-O-per.patch
> 
> Results:
> ./fio-results-run_virtiofs_tread_pool_0
>READ: bw=42.8MiB/s (44.8MB/s), 42.8MiB/s-42.8MiB/s (44.8MB/s-44.8MB/s), 
> io=150MiB (157MB), run=3507-3507msec
>   WRITE: bw=14.3MiB/s (14.9MB/s), 14.3MiB/s-14.3MiB/s (14.9MB/s-14.9MB/s), 
> io=49.0MiB (52.4MB), run=3507-3507msec
> ./fio-results-run_9pfs
>READ: bw=55.1MiB/s (57.8MB/s), 55.1MiB/s-55.1MiB/s (57.8MB/s-57.8MB/s), 
> io=150MiB (157MB), run=2722-2722msec
>   WRITE: bw=18.4MiB/s (19.3MB/s), 18.4MiB/s-18.4MiB/s (19.3MB/s-19.3MB/s), 
> io=49.0MiB (52.4MB), run=2722-2722msec
> ./fio-results-run_virtiofs_tread_pool_1
>READ: bw=34.5MiB/s (36.1MB/s), 34.5MiB/s-34.5MiB/s (36.1MB/s-36.1MB/s), 
> io=150MiB (157MB), run=4354-4354msec
>   WRITE: bw=11.5MiB/s (12.0MB/s), 11.5MiB/s-11.5MiB/s (12.0MB/s-12.0MB/s), 
> io=49.0MiB (52.4MB), run=4354-4354msec
> ./fio-results-run_virtiofs_tread_pool_64
>READ: bw=32.3MiB/s (33.8MB/s), 32.3MiB/s-32.3MiB/s (33.8MB/s-33.8MB/s), 
> io=150MiB (157MB), run=4648-4648msec
>   WRITE: bw=10.8MiB/s (11.3MB/s), 10.8MiB/s-10.8MiB/s (11.3MB/s-11.3MB/s), 
> io=49.0MiB (52.4MB), run=4648-4648msec
> 
> Next: 
> - run https://github.com/rhvgoyal/virtiofs-tests for  tread_pool_0, 
> tread_pool_64, and 9pfs
> - Test 
> https://lore.kernel.org/linux-fsdevel/20201009181512.65496-1-vgo...@redhat.com/
> 
> All the testing  for kata is based in:
> https://github.com/jcvenegas/mrunner/blob/master/scripts/bash_workloads/build-qemu-and-run-fio-test.sh
> 
> I ran this using an azure VM.
> 
> Cheers,
> Carlos
> 
> 
> On 05/11/20 13:53, "Vivek Goyal"  wrote:
> 
> On Thu, Nov 05, 2020 at 02:44:16PM -0500, Vivek Goyal wrote:
> > Right now we create a thread pool and main thread hands over the request
> > to thread in thread pool to process. Number of threads in thread pool
> > can be managed by option --thread-pool-size.
> > 
> > There is a chance that in case of some workloads, we might get better
> > performance if we don't handover the request to a different thread
> > and process in the context of thread receiving the request.
> > 
> > To implement that, redefine the meaning of --thread-pool-size=0 to
> > mean that don't use a thread pool. Instead process the request in
> > the context of thread receiving request from the queue.
> > 
> > I can't think how --thread-pool-size=0 is useful and hence using
> > that. If it is already useful somehow, I could look at defining
> > a new option say "--no-thread-pool".
> > 
> > I think this patch will be used more as a debug help to do comparison
> > when it is more effecient to do not hand over the requests to a
> > thread pool.
> 
> I ran virtiofs-tests to comapre --thread-pool-size=0 and
> --thread-pool-size=64. And results seem to be all over the place. In
> some cases thread pool seems to perform batter and in other cases
> no-thread-pool seems to perform better.
> 
> But in general it looks like that except for the case of libaio workload,
> no-thread-pool is helping.
> 
> Thanks
> Vivek
> 
NAME    WORKLOAD    Bandwidth   IOPS

Re: [PATCH v2 00/44] Make qdev static property API usable by any QOM type

2020-11-06 Thread Eduardo Habkost
On Fri, Nov 06, 2020 at 10:50:19AM -0500, Eduardo Habkost wrote:
> On Fri, Nov 06, 2020 at 10:45:11AM +0100, Kevin Wolf wrote:
> > Am 04.11.2020 um 16:59 hat Eduardo Habkost geschrieben:
> > > This series refactor the qdev property code so the static
> > > property system can be used by any QOM type.  As an example, at
> > > the end of the series some properties in TYPE_MACHINE are
> > > converted to static properties to demonstrate the new API.
> > > 
> > > Changes v1 -> v2
> > > 
> > > 
> > > * Rename functions and source files to call the new feature
> > >   "field property" instead of "static property"
> > > 
> > > * Change the API signature from an array-based interface similar
> > >   to device_class_set_props() to a object_property_add()-like
> > >   interface.
> > > 
> > >   This means instead of doing this:
> > > 
> > > object_class_property_add_static_props(oc, my_array_of_props);
> > > 
> > >   properties are registered like this:
> > > 
> > > object_class_property_add_field(oc, "property-name"
> > > PROP_XXX(MyState, my_field),
> > > prop_allow_set_always);
> > > 
> > >   where PROP_XXX is a PROP_* macro like PROP_STRING, PROP_BOOL,
> > >   etc.
> > 
> > In comparison, I really like the resulting code from the array based
> > interface in v1 better.
> > 
> > I think it's mostly for two reasons: First, the array is much more
> > compact and easier to read. And maybe even more importantly, you know
> > it's static data and only static data. The function based interface can
> > be mixed with other code or the parameter list can contain calls to
> > other functions with side effects, so there are a lot more opportunities
> > for surprises.
> 
> This is a really good point, and I strongly agree with it.
> Letting code do funny tricks at runtime is one of the reasons QOM
> properties became hard to introspect.
> 
> > 
> > What I didn't like about the v1 interface is that there is still a
> > separate object_class_property_set_description() for each property, but
> > I think that could have been fixed by moving the description to the
> > definitions in the array, too.
> 
> This would be very easy to implement.

This was implemented at:
https://gitlab.com/ehabkost/qemu/-/commits/work/qdev-make-generic

This is the interface I'd like to submit as v3:

static Property machine_props[] = {
DEFINE_PROP_STRING("kernel", MachineState, kernel_filename,
   .description = "Linux kernel image file"),
DEFINE_PROP_STRING("initrd", MachineState, initrd_filename,
   .description = "Linux initial ramdisk file"),
DEFINE_PROP_STRING("append", MachineState, kernel_cmdline,
   .description = "Linux kernel command line"),
DEFINE_PROP_STRING("dtb", MachineState, dtb,
   .description = "Linux kernel device tree file"),
DEFINE_PROP_STRING("dumpdtb", MachineState, dumpdtb,
   .description = "Dump current dtb to a file and quit"),
DEFINE_PROP_STRING("dt-compatible", MachineState, dt_compatible,
   .description = "Overrides the \"compatible\" "
  "property of the dt root node"),
DEFINE_PROP_STRING("firmware", MachineState, firmware,
   .description = "Firmware image"),
DEFINE_PROP_STRING("memory-backend", MachineState, ram_memdev_id,
   .description = "ID of memory backend object"),
DEFINE_PROP_BOOL("dump-guest-core", MachineState, dump_guest_core, true,
 .description = "Include guest memory in a core dump"),
DEFINE_PROP_BOOL("mem-merge", MachineState, mem_merge, true,
 .description = "Enable/disable memory merge support"),
DEFINE_PROP_BOOL("graphics", MachineState, enable_graphics, true,
 .description = "Set on/off to enable/disable graphics 
emulation"),
DEFINE_PROP_BOOL("suppress-vmdesc", MachineState, suppress_vmdesc, false,
 .description = "Set on to disable self-describing 
migration"),
DEFINE_PROP_UINT32("phandle-start", MachineState, phandle_start, 0,
   .description = "The first phandle ID we may generate 
dynamically"),
DEFINE_PROP_END_OF_LIST(),
};

static void machine_class_init(ObjectClass *oc, void *data)
{
...
object_class_add_field_properties(oc, machine_props, prop_allow_set_always);
...
}



> 
> > 
> > On Fri, Oct 30, 2020 at 06:10:34PM +0100, Paolo Bonzini wrote:
> > > On 29/10/20 23:02, Eduardo Habkost wrote:
> > > > +static Property machine_props[] = {
> > > > +DEFINE_PROP_STRING("kernel", MachineState, kernel_filename),
> > > > +DEFINE_PROP_STRING("initrd", MachineState, initrd_filename),
> > > > +DEFINE_PROP_STRING("append", MachineState, kernel_cmdline),
> > > > +DEFINE_PROP_STRING("dtb", MachineState, dtb),
> > > > +

[PATCH] meson: always include contrib/libvhost-user

2020-11-06 Thread Stefan Hajnoczi
libvhost-user is needed when CONFIG_LINUX is set. The CONFIG_VHOST_USER
check in meson.build is incorrect.

In fact, no explicit check is needed since this dependency is not built
by default. If something declares a dependency on libvhost-user then it
will be built, otherwise it won't be built (i.e. on non-Linux hosts).

This fixes ./configure --disable-vhost-user && make.

Fixes: bc15e44cb2191bbb2318878acdf5038134e56394 ("configure: introduce 
--enable-vhost-user-blk-server")
Reported-by: Philippe Mathieu-Daudé 
Reported-by: Michael S. Tsirkin 
Signed-off-by: Stefan Hajnoczi 
---
 meson.build | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index f5175010df..b473620321 100644
--- a/meson.build
+++ b/meson.build
@@ -1450,11 +1450,7 @@ trace_events_subdirs += [
   'util',
 ]
 
-vhost_user = not_found
-if 'CONFIG_VHOST_USER' in config_host
-  subdir('contrib/libvhost-user')
-endif
-
+subdir('contrib/libvhost-user')
 subdir('qapi')
 subdir('qobject')
 subdir('stubs')
-- 
2.28.0



Re: nbd: bitmap_to_extents() calls nbd_extent_array_add() without checking return value: coverity false positive?

2020-11-06 Thread Eric Blake
On 11/6/20 11:22 AM, Peter Maydell wrote:
> Hi; Coverity's "you usually check the return value of this function
> but you didn't do that here" heuristic has fired on the code in
> nbd/server.c:bitmap_to_extents() -- the function nbd_extent_array_add()
> is called five times in server.c, and the return value is checked
> in four of those, but not in the final call at the end of
> bitmap_to_extents(). (CID 1436125.)
> 
> Is this a false positive, or should the caller be handling an
> error here ?

False positive, but I don't mind tweaking the code to silence Coverity.
This should do it; let me know if I should turn it into a formal patch.

diff --git i/nbd/server.c w/nbd/server.c
index d145e1a69083..377698a2ce85 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -2128,9 +2128,8 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
 }
 }

-if (!full) {
-/* last non dirty extent */
-nbd_extent_array_add(es, end - start, 0);
+if (!full && nbd_extent_array_add(es, end - start, 0) < 0) {
+/* last non dirty extent, not a problem if array is now full */
 }

 bdrv_dirty_bitmap_unlock(bitmap);


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [Virtio-fs] [PATCH] virtiofsd: Use --thread-pool-size=0 to mean no thread pool

2020-11-06 Thread Venegas Munoz, Jose Carlos
Hi Vivek,

I have tested with Kata 1.12-alpha0; the results seem to be better for the
fio config I am tracking.

The fio config does  randrw:

fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio 
--bs=4k --iodepth=64 --size=200M --readwrite=randrw --rwmixread=75

- I can see better results with this patch
- 9pfs is still better in the case of Kata because of the use of:
https://github.com/kata-containers/packaging/blob/stable-1.12/qemu/patches/5.0.x/0001-9p-removing-coroutines-of-9p-to-increase-the-I-O-per.patch

Results:
./fio-results-run_virtiofs_tread_pool_0
   READ: bw=42.8MiB/s (44.8MB/s), 42.8MiB/s-42.8MiB/s (44.8MB/s-44.8MB/s), 
io=150MiB (157MB), run=3507-3507msec
  WRITE: bw=14.3MiB/s (14.9MB/s), 14.3MiB/s-14.3MiB/s (14.9MB/s-14.9MB/s), 
io=49.0MiB (52.4MB), run=3507-3507msec
./fio-results-run_9pfs
   READ: bw=55.1MiB/s (57.8MB/s), 55.1MiB/s-55.1MiB/s (57.8MB/s-57.8MB/s), 
io=150MiB (157MB), run=2722-2722msec
  WRITE: bw=18.4MiB/s (19.3MB/s), 18.4MiB/s-18.4MiB/s (19.3MB/s-19.3MB/s), 
io=49.0MiB (52.4MB), run=2722-2722msec
./fio-results-run_virtiofs_tread_pool_1
   READ: bw=34.5MiB/s (36.1MB/s), 34.5MiB/s-34.5MiB/s (36.1MB/s-36.1MB/s), 
io=150MiB (157MB), run=4354-4354msec
  WRITE: bw=11.5MiB/s (12.0MB/s), 11.5MiB/s-11.5MiB/s (12.0MB/s-12.0MB/s), 
io=49.0MiB (52.4MB), run=4354-4354msec
./fio-results-run_virtiofs_tread_pool_64
   READ: bw=32.3MiB/s (33.8MB/s), 32.3MiB/s-32.3MiB/s (33.8MB/s-33.8MB/s), 
io=150MiB (157MB), run=4648-4648msec
  WRITE: bw=10.8MiB/s (11.3MB/s), 10.8MiB/s-10.8MiB/s (11.3MB/s-11.3MB/s), 
io=49.0MiB (52.4MB), run=4648-4648msec

Next: 
- run https://github.com/rhvgoyal/virtiofs-tests for  tread_pool_0, 
tread_pool_64, and 9pfs
- Test 
https://lore.kernel.org/linux-fsdevel/20201009181512.65496-1-vgo...@redhat.com/

All the testing  for kata is based in:
https://github.com/jcvenegas/mrunner/blob/master/scripts/bash_workloads/build-qemu-and-run-fio-test.sh

I ran this using an azure VM.

Cheers,
Carlos


On 05/11/20 13:53, "Vivek Goyal"  wrote:

On Thu, Nov 05, 2020 at 02:44:16PM -0500, Vivek Goyal wrote:
> Right now we create a thread pool and main thread hands over the request
> to thread in thread pool to process. Number of threads in thread pool
> can be managed by option --thread-pool-size.
> 
> There is a chance that in case of some workloads, we might get better
> performance if we don't handover the request to a different thread
> and process in the context of thread receiving the request.
> 
> To implement that, redefine the meaning of --thread-pool-size=0 to
> mean that don't use a thread pool. Instead process the request in
> the context of thread receiving request from the queue.
> 
> I can't think how --thread-pool-size=0 is useful and hence using
> that. If it is already useful somehow, I could look at defining
> a new option say "--no-thread-pool".
> 
> I think this patch will be used more as a debugging aid, to compare
> when it is more efficient to not hand over the requests to a
> thread pool.

I ran virtiofs-tests to compare --thread-pool-size=0 and
--thread-pool-size=64. And results seem to be all over the place. In
some cases thread pool seems to perform better and in other cases
no-thread-pool seems to perform better.

But in general it looks like that except for the case of libaio workload,
no-thread-pool is helping.

Thanks
Vivek

NAME    WORKLOAD    Bandwidth   IOPS

thread-pool seqread-psync   682.4mb 170.6k  

no-thread-pool  seqread-psync   679.3mb 169.8k  


thread-pool seqread-psync-multi 2415.9mb603.9k  

no-thread-pool  seqread-psync-multi 2528.5mb632.1k  


thread-pool seqread-mmap591.7mb 147.9k  

no-thread-pool  seqread-mmap595.6mb 148.9k  


thread-pool seqread-mmap-multi  2195.3mb548.8k  

no-thread-pool  seqread-mmap-multi  2286.1mb571.5k  


thread-pool seqread-libaio  329.1mb 82.2k   

no-thread-pool  seqread-libaio  271.5mb 67.8k   


thread-pool seqread-libaio-multi1387.1mb346.7k  

no-thread-pool  seqread-libaio-multi1508.2mb377.0k  


thread-pool randread-psync  59.0mb  14.7k   

no-thread-pool  randread-psync  78.5mb  19.6k   


thread-pool randread-psync-multi226.4mb 56.6k   

no-thread-pool  randread-psync-multi289.2mb 72.3k   


thread-pool 

Re: [PATCH v3 01/41] tcg: Enhance flush_icache_range with separate data pointer

2020-11-06 Thread Alex Bennée


Richard Henderson  writes:

> We are shortly going to have a split rw/rx jit buffer.  Depending
> on the host, we need to flush the dcache at the rw data pointer and
> flush the icache at the rx code pointer.
>
> For now, the two passed pointers are identical, so there is no
> effective change in behaviour.
>
> Signed-off-by: Richard Henderson 
> ---
>  tcg/aarch64/tcg-target.h |  9 +++--
>  tcg/arm/tcg-target.h |  8 ++--
>  tcg/i386/tcg-target.h|  3 ++-
>  tcg/mips/tcg-target.h|  8 ++--
>  tcg/ppc/tcg-target.h |  2 +-
>  tcg/riscv/tcg-target.h   |  8 ++--
>  tcg/s390/tcg-target.h|  3 ++-
>  tcg/sparc/tcg-target.h   |  8 +---
>  tcg/tci/tcg-target.h |  3 ++-
>  softmmu/physmem.c|  9 -
>  tcg/tcg.c|  6 --
>  tcg/aarch64/tcg-target.c.inc |  2 +-
>  tcg/mips/tcg-target.c.inc|  2 +-
>  tcg/ppc/tcg-target.c.inc | 21 +++--
>  tcg/sparc/tcg-target.c.inc   |  4 ++--
>  15 files changed, 64 insertions(+), 32 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 663dd0b95e..d0a6a059b7 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -148,9 +148,14 @@ typedef enum {
>  #define TCG_TARGET_DEFAULT_MO (0)
>  #define TCG_TARGET_HAS_MEMORY_BSWAP 1
>  
> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t 
> len)
>  {
> -__builtin___clear_cache((char *)start, (char *)stop);
> +/* TODO: Copy this from gcc to avoid 4 loops instead of 2. */

Why not do it now?

> +if (rw != rx) {
> +__builtin___clear_cache((char *)rw, (char *)(rw + len));
> +}
> +__builtin___clear_cache((char *)rx, (char *)(rx + len));
>  }
>  

>  
> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t 
> len)
>  {
>  }

I take it i386 is just too primitive to care about flushing things?

Anyway:

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [PULL 33/43] target/ppc: convert xxspltw to vector operations

2020-11-06 Thread Richard Henderson
On 11/6/20 10:47 AM, Peter Maydell wrote:
>> +#ifndef HOST_WORDS_BIG_ENDIAN
>> +bofs ^= 8 | 4;
>> +#endif
> 
> The ifdef is HOST_WORDS_BIGENDIAN without the
> third underscore, so this XOR operation will be
> done on both little and big-endian hosts.

Ho hum.

> Should the ifndef line be fixed...

This.

I once had a patch set that changed all of our endian tests from defined/undef
to true/false, so that we could detect errors like this.  Perhaps I'll try to
recreate it next dev cycle...


r~



Re: [PATCH v1] docs/devel: Add VFIO device migration documentation

2020-11-06 Thread Alex Williamson
On Sat, 7 Nov 2020 00:27:46 +0530
Kirti Wankhede  wrote:

> On 11/6/2020 2:56 AM, Alex Williamson wrote:
> > On Fri, 6 Nov 2020 02:22:11 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 11/6/2020 12:41 AM, Alex Williamson wrote:  
> >>> On Fri, 6 Nov 2020 00:29:36 +0530
> >>> Kirti Wankhede  wrote:
> >>>  
>  On 11/4/2020 6:15 PM, Alex Williamson wrote:  
> > On Wed, 4 Nov 2020 13:25:40 +0530
> > Kirti Wankhede  wrote:
> > 
> >> On 11/4/2020 1:57 AM, Alex Williamson wrote:  
> >>> On Wed, 4 Nov 2020 01:18:12 +0530
> >>> Kirti Wankhede  wrote:
> >>>
>  On 10/30/2020 12:35 AM, Alex Williamson wrote:  
> > On Thu, 29 Oct 2020 23:11:16 +0530
> > Kirti Wankhede  wrote:
> >   
> 
>  
>    
>  +System memory dirty pages tracking
>  +--
>  +
>  +A ``log_sync`` memory listener callback is added to mark system 
>  memory pages  
> >>>
> >>> s/is added to mark/marks those/
> >>>  
>  +as dirty which are used for DMA by VFIO device. Dirty pages 
>  bitmap is queried  
> >>>
> >>> s/by/by the/
> >>> s/Dirty/The dirty/
> >>>  
>  +per container. All pages pinned by vendor driver through 
>  vfio_pin_pages()  
> >>>
> >>> s/by/by the/
> >>>  
>  +external API have to be marked as dirty during migration. When 
>  there are CPU
>  +writes, CPU dirty page tracking can identify dirtied pages, but 
>  any page pinned
>  +by vendor driver can also be written by device. There is 
>  currently no device  
> >>>
> >>> s/by/by the/ (x2)
> >>>  
>  +which has hardware support for dirty page tracking. So all 
>  pages which are
>  +pinned by vendor driver are considered as dirty.
>  +Dirty pages are tracked when device is in stop-and-copy phase 
>  because if pages
>  +are marked dirty during pre-copy phase and content is 
>  transfered from source to
>  +destination, there is no way to know newly dirtied pages from 
>  the point they
>  +were copied earlier until device stops. To avoid repeated copy 
>  of same content,
>  +pinned pages are marked dirty only during stop-and-copy phase.  
> >>
> >>  
> >>> Let me take a quick stab at rewriting this paragraph (not sure if 
> >>> I
> >>> understood it correctly):
> >>>
> >>> "Dirty pages are tracked when the device is in the stop-and-copy 
> >>> phase.
> >>> During the pre-copy phase, it is not possible to distinguish a 
> >>> dirty
> >>> page that has been transferred from the source to the destination 
> >>> from
> >>> newly dirtied pages, which would lead to repeated copying of the 
> >>> same
> >>> content. Therefore, pinned pages are only marked dirty during the
> >>> stop-and-copy phase." ?
> >>>  
> >>
> >> I think above rephrase only talks about repeated copying in 
> >> pre-copy
> >> phase. Used "copied earlier until device stops" to indicate both
> >> pre-copy and stop-and-copy till device stops.  
> >
> >
> > Now I'm confused, I thought we had abandoned the idea that we can 
> > only
> > report pinned pages during stop-and-copy.  Doesn't the device needs 
> > to
> > expose its dirty memory footprint during the iterative phase 
> > regardless
> > of whether that causes repeat copies?  If QEMU iterates and sees 
> > that
> > all memory is still dirty, it may have transferred more data, but it
> > can actually predict if it can achieve its downtime tolerances.  
> > Which
> > is more important, less data transfer or predictability?  Thanks,
> >   
> 
>  Even if QEMU copies and transfers content of all sys mem pages during
>  pre-copy (worst case with IOMMU backed mdev device when its vendor
>  driver is not smart to pin pages explicitly and all sys mem pages are
>  marked dirty), then also its prediction about downtime tolerance will
>  not be correct, because during stop-and-copy again all pages need to 
>  be
>  copied as device can write to any of those pinned pages.  
> >>>
> >>> I think you're only reiterating my point.  If QEMU copies all of guest
> >>> memory during the iterative phase and each time it sees that all 
> 

[PATCH-for-5.2] configure: Check vhost-user is available to build vhost-user-blk-server

2020-11-06 Thread Philippe Mathieu-Daudé
Check vhost-user is available to build vhost-user-blk-server to fix:

 $ ../configure \
  --disable-vhost-user --enable-vhost-user-blk-server && \
   make qemu-nbd
 ...
 [505/505] Linking target qemu-nbd
 FAILED: qemu-nbd
 cc  -o qemu-nbd qemu-nbd.p/qemu-nbd.c.o -Wl,--as-needed -Wl,--no-undefined 
-pie -Wl,--whole-archive libblockdev.fa libblock.fa libcrypto.fa libauthz.fa 
libqom.fa libio.fa -Wl,--no-whole-archive -Wl,--warn-common -Wl,-z,relro 
-Wl,-z,now -m64 -fstack-protector-strong -Wl,--start-group libqemuutil.a 
libblockdev.fa libblock.fa libcrypto.fa libauthz.fa libqom.fa libio.fa 
@block.syms -lgio-2.0 -lgobject-2.0 -lglib-2.0 -lgio-2.0 -lgobject-2.0 
-lglib-2.0 -pthread -lgnutls -lutil -lm -lgthread-2.0 -lglib-2.0 -lbz2 
-lgthread-2.0 -lglib-2.0 -lssh -lrbd -lrados -lcurl -lxml2 -lzstd -lacl -lgfapi 
-lglusterfs -lgfrpc -lgfxdr -luuid -laio /usr/lib64/libz.so -L/usr/lib64/iscsi 
-liscsi -lnettle -lgnutls -lpam -Wl,--end-group
 /usr/bin/ld: libblockdev.fa(block_export_vhost-user-blk-server.c.o): in 
function `vu_blk_process_vq':
 block/export/vhost-user-blk-server.c:203: undefined reference to `vu_get_queue'
 /usr/bin/ld: block/export/vhost-user-blk-server.c:208: undefined reference to 
`vu_queue_pop'
 /usr/bin/ld: libblockdev.fa(block_export_vhost-user-blk-server.c.o): in 
function `vu_blk_queue_set_started':
 block/export/vhost-user-blk-server.c:228: undefined reference to `vu_get_queue'
 /usr/bin/ld: libblockdev.fa(block_export_vhost-user-blk-server.c.o): in 
function `vu_blk_req_complete':
 block/export/vhost-user-blk-server.c:55: undefined reference to `vu_queue_push'
 /usr/bin/ld: block/export/vhost-user-blk-server.c:56: undefined reference to 
`vu_queue_notify'
 /usr/bin/ld: libblockdev.fa(block_export_vhost-user-blk-server.c.o): in 
function `vu_blk_queue_set_started':
 block/export/vhost-user-blk-server.c:229: undefined reference to 
`vu_set_queue_handler'
 /usr/bin/ld: libqemuutil.a(util_vhost-user-server.c.o): in function 
`vu_client_trip':
 util/vhost-user-server.c:176: undefined reference to `vu_dispatch'
 /usr/bin/ld: util/vhost-user-server.c:180: undefined reference to `vu_deinit'
 /usr/bin/ld: libqemuutil.a(util_vhost-user-server.c.o): in function 
`vu_accept':
 util/vhost-user-server.c:291: undefined reference to `vu_init'
 collect2: error: ld returned 1 exit status
 ninja: build stopped: subcommand failed.
 make: *** [Makefile:171: run-ninja] Error 1

Fixes: bc15e44cb21 ("configure: introduce --enable-vhost-user-blk-server")
Signed-off-by: Philippe Mathieu-Daudé 
---
 configure | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/configure b/configure
index 805f7791503..5079ebb416a 100755
--- a/configure
+++ b/configure
@@ -2390,6 +2390,9 @@ fi
 
 # libvhost-user is Linux-only
 test "$vhost_user_blk_server" = "" && vhost_user_blk_server=$linux
+if test "$vhost_user_blk_server" = "yes" && test "$vhost_user" = "no"; then
+  error_exit "--enable-vhost-user-blk-server requires --enable-vhost-user"
+fi
 if test "$vhost_user_blk_server" = "yes" && test "$linux" = "no"; then
   error_exit "--enable-vhost-user-blk-server is only available on Linux"
 fi
-- 
2.26.2




Re: [PATCH v2 04/11] qapi/introspect.py: add assertions and casts

2020-11-06 Thread Cleber Rosa
On Mon, Oct 26, 2020 at 03:42:44PM -0400, John Snow wrote:
> This is necessary to keep mypy passing in the next patch when we add
> preliminary type hints. It will be removed shortly.
> 
> Signed-off-by: John Snow 
> ---

Reviewed-by: Cleber Rosa 




Re: [PATCH v1] docs/devel: Add VFIO device migration documentation

2020-11-06 Thread Kirti Wankhede




On 11/6/2020 2:56 AM, Alex Williamson wrote:

On Fri, 6 Nov 2020 02:22:11 +0530
Kirti Wankhede  wrote:


On 11/6/2020 12:41 AM, Alex Williamson wrote:

On Fri, 6 Nov 2020 00:29:36 +0530
Kirti Wankhede  wrote:
   

On 11/4/2020 6:15 PM, Alex Williamson wrote:

On Wed, 4 Nov 2020 13:25:40 +0530
Kirti Wankhede  wrote:
  

On 11/4/2020 1:57 AM, Alex Williamson wrote:

On Wed, 4 Nov 2020 01:18:12 +0530
Kirti Wankhede  wrote:
 

On 10/30/2020 12:35 AM, Alex Williamson wrote:

On Thu, 29 Oct 2020 23:11:16 +0530
Kirti Wankhede  wrote:






+System memory dirty pages tracking
+--
+
+A ``log_sync`` memory listener callback is added to mark system memory pages


s/is added to mark/marks those/
   

+as dirty which are used for DMA by VFIO device. Dirty pages bitmap is queried


s/by/by the/
s/Dirty/The dirty/
   

+per container. All pages pinned by vendor driver through vfio_pin_pages()


s/by/by the/
   

+external API have to be marked as dirty during migration. When there are CPU
+writes, CPU dirty page tracking can identify dirtied pages, but any page pinned
+by vendor driver can also be written by device. There is currently no device


s/by/by the/ (x2)
   

+which has hardware support for dirty page tracking. So all pages which are
+pinned by vendor driver are considered as dirty.
+Dirty pages are tracked when device is in stop-and-copy phase because if pages
+are marked dirty during pre-copy phase and content is transfered from source to
+destination, there is no way to know newly dirtied pages from the point they
+were copied earlier until device stops. To avoid repeated copy of same content,
+pinned pages are marked dirty only during stop-and-copy phase.


   

Let me take a quick stab at rewriting this paragraph (not sure if I
understood it correctly):

"Dirty pages are tracked when the device is in the stop-and-copy phase.
During the pre-copy phase, it is not possible to distinguish a dirty
page that has been transferred from the source to the destination from
newly dirtied pages, which would lead to repeated copying of the same
content. Therefore, pinned pages are only marked dirty during the
stop-and-copy phase." ?
   


I think above rephrase only talks about repeated copying in pre-copy
phase. Used "copied earlier until device stops" to indicate both
pre-copy and stop-and-copy till device stops.



Now I'm confused, I thought we had abandoned the idea that we can only
report pinned pages during stop-and-copy.  Doesn't the device needs to
expose its dirty memory footprint during the iterative phase regardless
of whether that causes repeat copies?  If QEMU iterates and sees that
all memory is still dirty, it may have transferred more data, but it
can actually predict if it can achieve its downtime tolerances.  Which
is more important, less data transfer or predictability?  Thanks,



Even if QEMU copies and transfers content of all sys mem pages during
pre-copy (worst case with IOMMU backed mdev device when its vendor
driver is not smart to pin pages explicitly and all sys mem pages are
marked dirty), then also its prediction about downtime tolerance will
not be correct, because during stop-and-copy again all pages need to be
copied as device can write to any of those pinned pages.


I think you're only reiterating my point.  If QEMU copies all of guest
memory during the iterative phase and each time it sees that all memory
is dirty, such as if CPUs or devices (including assigned devices) are
dirtying pages as fast as it copies them (or continuously marks them
dirty), then QEMU can predict that downtime will require copying all
pages.


But as of now there is no way to know if the device has dirtied pages
during the iterative phase.



This claim doesn't make any sense, pinned pages are considered
persistently dirtied, during the iterative phase and while stopped.



If instead devices don't mark dirty pages until the VM is
stopped, then QEMU might iterate through memory copy and predict a short
downtime because not much memory is dirty, only to be surprised that
all of memory is suddenly dirty.  At that point it's too late, the VM
is already stopped, the predicted short downtime takes far longer than
expected.  This is exactly why we made the kernel interface mark pinned
pages persistently dirty when it was proposed that we only report
pinned pages once.  Thanks,
 


Since there is no way to know if the device dirtied pages during the
iterative phase, QEMU should query pinned pages in the stop-and-copy phase.



As above, I don't believe this is true.

  

Whenever there is hardware support or some software mechanism to
report pages dirtied by the device, we will add a capability bit to the
migration capabilities, and based on that capability bit the QEMU/userspace
app should decide whether to query dirty pages in the iterative phase.



Yes, we could advertise support 

Re: [PULL 33/43] target/ppc: convert xxspltw to vector operations

2020-11-06 Thread Peter Maydell
On Mon, 18 Feb 2019 at 14:31, David Gibson  wrote:
>
> From: Richard Henderson 
>
> Signed-off-by: Richard Henderson 
> Acked-by: David Gibson 
> Message-Id: <20190215100058.20015-8-mark.cave-ayl...@ilande.co.uk>
> Signed-off-by: David Gibson 

So this is a commit from 18 months back, but I happened
to notice an oddity in it while grepping the code base for
something else:

> diff --git a/target/ppc/translate/vsx-impl.inc.c 
> b/target/ppc/translate/vsx-impl.inc.c
> index 944fc0608a..0e8cecb00a 100644
> --- a/target/ppc/translate/vsx-impl.inc.c
> +++ b/target/ppc/translate/vsx-impl.inc.c
> @@ -1359,38 +1359,24 @@ static void gen_xxsel(DisasContext * ctx)
>
>  static void gen_xxspltw(DisasContext *ctx)
>  {
> -TCGv_i64 b, b2;
> -TCGv_i64 vsr;
> +int rt = xT(ctx->opcode);
> +int rb = xB(ctx->opcode);
> +int uim = UIM(ctx->opcode);
> +int tofs, bofs;

[...]

> -tcg_gen_shli_i64(b2, b, 32);
> -tcg_gen_or_i64(vsr, b, b2);
> -set_cpu_vsrh(xT(ctx->opcode), vsr);
> -set_cpu_vsrl(xT(ctx->opcode), vsr);
> +tofs = vsr_full_offset(rt);
> +bofs = vsr_full_offset(rb);
> +bofs += uim << MO_32;
> +#ifndef HOST_WORDS_BIG_ENDIAN
> +bofs ^= 8 | 4;
> +#endif

The ifdef is HOST_WORDS_BIGENDIAN without the
third underscore, so this XOR operation will be
done on both little and big-endian hosts.

Should the ifndef line be fixed, or is this actually
a case where the xor really should be done on all hosts
so we should drop the ifndef/endif lines?

>
> -tcg_temp_free_i64(vsr);
> -tcg_temp_free_i64(b);
> -tcg_temp_free_i64(b2);
> +tcg_gen_gvec_dup_mem(MO_32, tofs, bofs, 16, 16);
>  }

thanks
-- PMM



Re: [PATCH for-5.2 3/4] hw/net/can/ctucan_core: Handle big-endian hosts

2020-11-06 Thread Pavel Pisa
On Friday 06 of November 2020 19:29:27 Philippe Mathieu-Daudé wrote:
> On 11/6/20 6:11 PM, Peter Maydell wrote:
> > The ctucan driver defines types for its registers which are a union
> > of a uint32_t with a struct with bitfields for the individual
> > fields within that register. This is a bad idea, because bitfields
> > aren't portable. The ctu_can_fd_regs.h header works around the
> > most glaring of the portability issues by defining the
> > fields in two different orders depending on the setting of the
> > __LITTLE_ENDIAN_BITFIELD define. However, in ctucan_core.h this
> > is unconditionally set to 1, which is wrong for big-endian hosts.
> >
> > Set it only if HOST_WORDS_BIGENDIAN is not set. There is no need
> > for a "have we defined it already" guard, because the only place
> > that should set it is ctucan_core.h, which has the usual
> > double-inclusion guard.
> >
> > Signed-off-by: Peter Maydell 
> > ---
> > Ideally all that bitfield-using code would be rewritten to use
> > extract32 and deposit32 instead, IMHO.
> > ---
> >  hw/net/can/ctucan_core.h | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/hw/net/can/ctucan_core.h b/hw/net/can/ctucan_core.h
> > index f21cb1c5ec3..bbc09ae0678 100644
> > --- a/hw/net/can/ctucan_core.h
> > +++ b/hw/net/can/ctucan_core.h
> > @@ -31,8 +31,7 @@
> >  #include "exec/hwaddr.h"
> >  #include "net/can_emu.h"
> >
> > -
> > -#ifndef __LITTLE_ENDIAN_BITFIELD
> > +#ifndef HOST_WORDS_BIGENDIAN
> >  #define __LITTLE_ENDIAN_BITFIELD 1
> >  #endif
>
> Alternatively s/#ifdef __LITTLE_ENDIAN_BITFIELD/#ifndef
> HOST_WORDS_BIGENDIAN/ the codebase and remove this here...

But then we cannot share the same generated, untouched header file
between the Linux kernel and QEMU, since each uses different defines.
Or at least that was the goal; after the mainline Linux review we are
switching, in the longer run, to defines based on BIT, GENMASK,
FIELD_GET and FIELD_PREP.
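
For reference, simplified approximations of those helpers (not the exact
kernel definitions, which add extra type checking) look roughly like:

    #include <stdint.h>
    #include <stdio.h>

    #define BIT(n)          (1UL << (n))
    /* mask with bits h..l (inclusive) set, e.g. GENMASK(7, 4) == 0xf0 */
    #define GENMASK(h, l) \
        (((~0UL) << (l)) & (~0UL >> (8 * sizeof(unsigned long) - 1 - (h))))
    /* extract/insert a field described by a contiguous mask */
    #define FIELD_GET(mask, reg)  (((reg) & (mask)) >> __builtin_ctzl(mask))
    #define FIELD_PREP(mask, val) \
        (((unsigned long)(val) << __builtin_ctzl(mask)) & (mask))

    int main(void)
    {
        unsigned long reg = 0;

        reg |= FIELD_PREP(GENMASK(7, 4), 0x9);            /* set bits 7..4 to 9 */
        printf("reg = 0x%lx, field = 0x%lx\n",
               reg, FIELD_GET(GENMASK(7, 4), reg));       /* 0x90 and 0x9 */
        return 0;
    }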

But even GENMASK does not seem to be defined in QEMU's headers, even
though it is referenced in include/hw/usb/dwc2-regs.h
and include/standard-headers/asm-x86/kvm_para.h.

So the idea of having nice, common generated headers that can be
checked for consistency and the right version by diff across the
Linux kernel, QEMU, userspace tests and even other OSes (there with
the Linux defines substituted) seems to be only a dream.

Anyway, we will switch to a solution matching the requirements of
each project if one is suggested. But it takes time.

Best wishes,

Pavel




Re: [PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers

2020-11-06 Thread Peter Maydell
On Fri, 6 Nov 2020 at 18:31, Philippe Mathieu-Daudé  wrote:
>
> On 11/6/20 6:11 PM, Peter Maydell wrote:
> > Instead of casting an address within a uint8_t array to a
> > uint32_t*, use stl_le_p(). This handles possibly misaligned
> > addresses which would otherwise crash on some hosts.
> >
> > Signed-off-by: Peter Maydell 
> > ---
> >  hw/net/can/ctucan_core.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
> > index f2ce978e5ec..e66526efa83 100644
> > --- a/hw/net/can/ctucan_core.c
> > +++ b/hw/net/can/ctucan_core.c
> > @@ -305,8 +305,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, 
> > uint64_t val,
> >  addr %= CTUCAN_CORE_TXBUFF_SPAN;
> >  assert(buff_num < CTUCAN_CORE_TXBUF_NUM);
> >  if (addr < sizeof(s->tx_buffer[buff_num].data)) {
> > -uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + 
> > addr);
> > -*bufp = cpu_to_le32(val);
> > +stl_le_p(s->tx_buffer[buff_num].data + addr, val);
>
> Out of curiosity, how did you notice? Passing by while reviewing?

I saw it while I was fixing the Coverity issue from
patch 1, yes.

thanks
-- PMM



Re: [PATCH for-5.2] hw/mips/boston.c: Fix memory leak in boston_fdt_filter() error-handling paths

2020-11-06 Thread Philippe Mathieu-Daudé
On 11/6/20 6:58 PM, Peter Maydell wrote:
> Coverity points out that the error-handling paths in the
> boston_fdt_filter() function don't free the fdt that was allocated.
> Fix the leak by using g_autofree.
> 
> Fixes: Coverity CID 1432275
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/mips/boston.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers

2020-11-06 Thread Philippe Mathieu-Daudé
On 11/6/20 7:31 PM, Philippe Mathieu-Daudé wrote:
> On 11/6/20 6:11 PM, Peter Maydell wrote:
>> Instead of casting an address within a uint8_t array to a
>> uint32_t*, use stl_le_p(). This handles possibly misaligned
>> addresses which would otherwise crash on some hosts.
>>
>> Signed-off-by: Peter Maydell 
>> ---
>>  hw/net/can/ctucan_core.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
>> index f2ce978e5ec..e66526efa83 100644
>> --- a/hw/net/can/ctucan_core.c
>> +++ b/hw/net/can/ctucan_core.c
>> @@ -305,8 +305,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, 
>> uint64_t val,
>>  addr %= CTUCAN_CORE_TXBUFF_SPAN;
>>  assert(buff_num < CTUCAN_CORE_TXBUF_NUM);
>>  if (addr < sizeof(s->tx_buffer[buff_num].data)) {
>> -uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + 
>> addr);
>> -*bufp = cpu_to_le32(val);
>> +stl_le_p(s->tx_buffer[buff_num].data + addr, val);
> 
> Out of curiosity, how did you notice? Passing by while reviewing?

$ git grep -P '^\s+\*.*=.*(cpu_to.*|to_cpu)\('|wc -l
82



Re: [PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers

2020-11-06 Thread Peter Maydell
On Fri, 6 Nov 2020 at 18:19, Pavel Pisa  wrote:
> On Friday 06 of November 2020 18:11:53 Peter Maydell wrote:
> > Instead of casting an address within a uint8_t array to a
> > uint32_t*, use stl_le_p(). This handles possibly misaligned
> > addresses which would otherwise crash on some hosts.
> >
> > Signed-off-by: Peter Maydell 
> > ---

> even that I do not like stl_le_p name, because it differs from the Linux
> kernel one. cpu_to_le32 matches. The pointer variant is cpu_to_le32p
> on Linux kernel side, I think. stl is strange name and l for long
> is problematic as well, if it is st32_le_p or st_le32_p I would recognize
> that much easier.

QEMU is not the kernel. We have our own naming conventions
and our own APIs. I agree that the b/w/l/q suffixing
is less intuitive than 8/16/32/64, but we have a lot of
functions using that convention, and the API is what it is.
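
For readers unfamiliar with the convention: the suffix encodes the access
size (b/w/l/q for 8/16/32/64 bits) plus the endianness, and the _p variants
take a possibly unaligned pointer. A simplified sketch of what such a store
helper does (not QEMU's actual bswap.h implementation):

    #include <stdint.h>
    #include <string.h>

    /* Store a 32-bit value little-endian at a possibly unaligned address:
     * byte-swap on big-endian hosts, then memcpy instead of a
     * (uint32_t *) cast, so unaligned pointers are handled portably. */
    static inline void example_stl_le_p(void *ptr, uint32_t v)
    {
    #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        v = __builtin_bswap32(v);
    #endif
        memcpy(ptr, &v, sizeof(v));
    }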

thanks
-- PMM



[PATCH v2 1/1] Fix use after free in vfio_migration_probe

2020-11-06 Thread Kirti Wankhede
Fixes Coverity issue:
CID 1436126:  Memory - illegal accesses  (USE_AFTER_FREE)

Fixes: a9e271ec9b36 ("vfio: Add migration region initialization and finalize
function")

Signed-off-by: Kirti Wankhede 
Reviewed-by: David Edmondson 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/vfio/migration.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 3ce285ea395d..55261562d4f3 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -897,8 +897,8 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 goto add_blocker;
 }
 
-g_free(info);
 trace_vfio_migration_probe(vbasedev->name, info->index);
+g_free(info);
 return 0;
 
 add_blocker:
-- 
2.7.0




Re: [PATCH for-5.2 1/4] hw/net/can/ctucan: Don't allow guest to write off end of tx_buffer

2020-11-06 Thread Pavel Pisa
Hello Peter,

On Friday 06 of November 2020 19:04:38 Peter Maydell wrote:
> On Fri, 6 Nov 2020 at 17:48, Pavel Pisa  wrote:
> > Hello Peter,
> >
> > thanks much for the catching the problem and investing time into
> > fixing. I hope to find time for more review of remarks and Xilinx
> > patches next week. I do not find reasonable time slot till Sunday.
> > Excuse me. To not block updates, I confirm your changes.
> >
> > On Friday 06 of November 2020 18:11:50 Peter Maydell wrote:
> > > The ctucan device has 4 CAN bus cores, each of which has a set of 20
> > > 32-bit registers for writing the transmitted data. The registers are
> > > however not contiguous; each core's buffers is 0x100 bytes after
> > > the last.
> > >
> > > We got the checks on the address wrong in the ctucan_mem_write()
> > > function:
> > >  * the first "is addr in range at all" check allowed
> > >addr == CTUCAN_CORE_MEM_SIZE, which is actually the first
> > >byte off the end of the range
> > >  * the decode of addresses into core-number plus offset in the
> > >tx buffer for that core failed to check that the offset was
> > >in range, so the guest could write off the end of the
> > >tx_buffer[] array
> > >  * the decode had an explicit check for whether the core-number
> > >was out of range, which is actually impossible given the
> > >CTUCAN_CORE_MEM_SIZE check and the number of cores.
> > >
> > > Fix the top level check, check the offset, and turn the check
> > > on the core-number into an assertion.
> > >
> > > Fixes: Coverity CID 1432874
> > > Signed-off-by: Peter Maydell 
> > > ---
> > >  hw/net/can/ctucan_core.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
> > > index d20835cd7e9..ea09bf71a0c 100644
> > > --- a/hw/net/can/ctucan_core.c
> > > +++ b/hw/net/can/ctucan_core.c
> > > @@ -303,7 +303,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr
> > > addr, uint64_t val, DPRINTF("write 0x%02llx addr 0x%02x\n",
> > >  (unsigned long long)val, (unsigned int)addr);
> > >
> > > -if (addr > CTUCAN_CORE_MEM_SIZE) {
> > > +if (addr >= CTUCAN_CORE_MEM_SIZE) {
> > >  return;
> > >  }
> >
> > Acked-by: Pavel Pisa 
> >
> > > @@ -312,7 +312,8 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr
> > > addr, uint64_t val, addr -= CTU_CAN_FD_TXTB1_DATA_1;
> > >  buff_num = addr / CTUCAN_CORE_TXBUFF_SPAN;
> > >  addr %= CTUCAN_CORE_TXBUFF_SPAN;
> > > -if (buff_num < CTUCAN_CORE_TXBUF_NUM) {
> > > +assert(buff_num < CTUCAN_CORE_TXBUF_NUM);
> >
> > Assert is not necessary. If there is not buffer at that location,
> > then write has no effect. Assert would check for driver errors,
> > but it is not a problem of QEMU and for sure should not lead to its
> > crash.
>
> We assert() here as a guide to readers of the code that we know
> that buff_num can't possibly be out of range for the array
> access we're about to do: the values of CTUCAN_CORE_MEM_SIZE,
> CTUCAN_CORE_TXBUFF_SPAN, etc, make it mathematically impossible.
> We prefer to assert() that kind of condition rather than having
> an if() test for it.

I understand, but I consider it fully valid to have CTUCAN_CORE_MEM_SIZE
with some reserve and only two buffers populated, which would leave
"spare" space after the end of the area where the buffers are mapped
into the address space. The same applies to CTUCAN_CORE_TXBUFF_SPAN
and CTUCAN_CORE_MSG_MAX_LEN. There could be a check

  assert(CTUCAN_CORE_MSG_MAX_LEN <= CTUCAN_CORE_TXBUFF_SPAN)

and 
 
  assert(CTUCAN_CORE_TXBUFF_SPAN * CTUCAN_CORE_TXBUF_NUM +
 CTU_CAN_FD_TXTB1_DATA_1 <= CTUCAN_CORE_MEM_SIZE)

or even some cross-checks against the sizeof of the field.

But all other combinations are in fact real; they depend on the core
synthesis parameters. Yes, for core release 2.x they are mostly fixed
now. But the option to provide a variant compatible with the actual
driver, with CTUCAN_CORE_TXBUF_NUM > 4 up to the hard limit of 8,
is still left open.

Best wishes,

Pavel


 



Re: [PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers

2020-11-06 Thread Philippe Mathieu-Daudé
On 11/6/20 6:11 PM, Peter Maydell wrote:
> Instead of casting an address within a uint8_t array to a
> uint32_t*, use stl_le_p(). This handles possibly misaligned
> addresses which would otherwise crash on some hosts.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/net/can/ctucan_core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
> index f2ce978e5ec..e66526efa83 100644
> --- a/hw/net/can/ctucan_core.c
> +++ b/hw/net/can/ctucan_core.c
> @@ -305,8 +305,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, 
> uint64_t val,
>  addr %= CTUCAN_CORE_TXBUFF_SPAN;
>  assert(buff_num < CTUCAN_CORE_TXBUF_NUM);
>  if (addr < sizeof(s->tx_buffer[buff_num].data)) {
> -uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + 
> addr);
> -*bufp = cpu_to_le32(val);
> +stl_le_p(s->tx_buffer[buff_num].data + addr, val);

Out of curiosity, how did you notice? Passing by while reviewing?

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH for-5.2 3/4] hw/net/can/ctucan_core: Handle big-endian hosts

2020-11-06 Thread Philippe Mathieu-Daudé
On 11/6/20 6:11 PM, Peter Maydell wrote:
> The ctucan driver defines types for its registers which are a union
> of a uint32_t with a struct with bitfields for the individual
> fields within that register. This is a bad idea, because bitfields
> aren't portable. The ctu_can_fd_regs.h header works around the
> most glaring of the portability issues by defining the
> fields in two different orders depending on the setting of the
> __LITTLE_ENDIAN_BITFIELD define. However, in ctucan_core.h this
> is unconditionally set to 1, which is wrong for big-endian hosts.
> 
> Set it only if HOST_WORDS_BIGENDIAN is not set. There is no need
> for a "have we defined it already" guard, because the only place
> that should set it is ctucan_core.h, which has the usual
> double-inclusion guard.
> 
> Signed-off-by: Peter Maydell 
> ---
> Ideally all that bitfield-using code would be rewritten to use
> extract32 and deposit32 instead, IMHO.
> ---
>  hw/net/can/ctucan_core.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/hw/net/can/ctucan_core.h b/hw/net/can/ctucan_core.h
> index f21cb1c5ec3..bbc09ae0678 100644
> --- a/hw/net/can/ctucan_core.h
> +++ b/hw/net/can/ctucan_core.h
> @@ -31,8 +31,7 @@
>  #include "exec/hwaddr.h"
>  #include "net/can_emu.h"
>  
> -
> -#ifndef __LITTLE_ENDIAN_BITFIELD
> +#ifndef HOST_WORDS_BIGENDIAN
>  #define __LITTLE_ENDIAN_BITFIELD 1
>  #endif

Alternatively s/#ifdef __LITTLE_ENDIAN_BITFIELD/#ifndef
HOST_WORDS_BIGENDIAN/ the codebase and remove this here...

Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH 2/2] qemu-option: warn for short-form boolean options

2020-11-06 Thread Paolo Bonzini

On 06/11/20 17:49, Markus Armbruster wrote:

Deprecate all this, except for -chardev and -spice where it is in
wide use.

I consider this a misuse of deprecation, to be frank.  If something is
known to be unused, we just remove it.  Deprecation is precisely for
things that are used.  I'm with Daniel here: let's deprecate this sugar
everywhere.

Wide use may justify extending the deprecation grace period.


Fair enough.  However now that I think of it I'd have to remove the 
coverage of the "feature" in tests, because they'd warn.


Paolo




Re: [PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers

2020-11-06 Thread Pavel Pisa



On Friday 06 of November 2020 18:11:53 Peter Maydell wrote:
> Instead of casting an address within a uint8_t array to a
> uint32_t*, use stl_le_p(). This handles possibly misaligned
> addresses which would otherwise crash on some hosts.
>
> Signed-off-by: Peter Maydell 
> ---
>  hw/net/can/ctucan_core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
> index f2ce978e5ec..e66526efa83 100644
> --- a/hw/net/can/ctucan_core.c
> +++ b/hw/net/can/ctucan_core.c
> @@ -305,8 +305,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr,
> uint64_t val, addr %= CTUCAN_CORE_TXBUFF_SPAN;
>  assert(buff_num < CTUCAN_CORE_TXBUF_NUM);
>  if (addr < sizeof(s->tx_buffer[buff_num].data)) {
> -uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data +
> addr); -*bufp = cpu_to_le32(val);
> +stl_le_p(s->tx_buffer[buff_num].data + addr, val);
>  }
>  } else {
>  switch (addr & ~3) {

Acked-by: Pavel Pisa 

even though I do not like the stl_le_p name, because it differs from the
Linux kernel one. cpu_to_le32 matches; the pointer variant is cpu_to_le32p
on the Linux kernel side, I think. stl is a strange name, and l for long
is problematic as well; if it were st32_le_p or st_le32_p I would
recognize it much more easily.

Best wishes,

Pavel Pisa



Re: [PATCH for-5.2 2/4] hw/net/can/ctucan: Avoid unused value in ctucan_send_ready_buffers()

2020-11-06 Thread Pavel Pisa
Hello Peter,

this one is a little problematic. I understand that you want
to have clean code and no warning reports from Coverity.

On Friday 06 of November 2020 18:11:51 Peter Maydell wrote:
> Coverity points out that in ctucan_send_ready_buffers() we
> set buff_st_mask = 0xf << (i * 4) inside the loop, but then
> we never use it before overwriting it later.
>
> The only thing we use the mask for is as part of the code that is
> inserting the new buff_st field into tx_status.  That is more
> comprehensibly written using deposit32(), so do that and drop the
> mask variable entirely.
>
> We also update the buff_st local variable at multiple points
> during this function, but nothing can ever see these
> intermediate values, so just drop those, write the final
> TXT_TOK as a fixed constant value, and collapse the only
> remaining set/use of buff_st down into an extract32().
>
> Fixes: Coverity CID 1432869
> Signed-off-by: Peter Maydell 
> ---
>  hw/net/can/ctucan_core.c | 15 +++
>  1 file changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
> index ea09bf71a0c..f2ce978e5ec 100644
> --- a/hw/net/can/ctucan_core.c
> +++ b/hw/net/can/ctucan_core.c
> @@ -240,8 +240,6 @@ static void ctucan_send_ready_buffers(CtuCanCoreState
> *s) uint8_t *pf;
>  int buff2tx_idx;
>  uint32_t tx_prio_max;
> -unsigned int buff_st;
> -uint32_t buff_st_mask;
>
>  if (!s->mode_settings.s.ena) {
>  return;
> @@ -256,10 +254,7 @@ static void ctucan_send_ready_buffers(CtuCanCoreState
> *s) for (i = 0; i < CTUCAN_CORE_TXBUF_NUM; i++) {
>  uint32_t prio;
>
> -buff_st_mask = 0xf << (i * 4);
> -buff_st = (s->tx_status.u32 >> (i * 4)) & 0xf;
> -
> -if (buff_st != TXT_RDY) {
> +if (extract32(s->tx_status.u32, i * 4, 4) != TXT_RDY) {
>  continue;
>  }

This is the right replacement. Initially buff_st kept the location of the
bits of the buffer found to be processed later, but after Tx prioritization
this cannot be used without recomputing it.
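
For anyone following along, the helpers are plain shift-and-mask
operations; a rough sketch of their behaviour (not the qemu/bitops.h
source) is:

    #include <stdint.h>
    #include <assert.h>

    /* rough equivalents of QEMU's extract32()/deposit32() */
    static inline uint32_t example_extract32(uint32_t value, int start, int length)
    {
        return (value >> start) & (~0U >> (32 - length));
    }

    static inline uint32_t example_deposit32(uint32_t value, int start,
                                             int length, uint32_t field)
    {
        uint32_t mask = (~0U >> (32 - length)) << start;
        return (value & ~mask) | ((field << start) & mask);
    }

    int main(void)
    {
        uint32_t tx_status = 0x4321;
        /* buffer 1 occupies bits 7..4: old "(tx_status >> (1 * 4)) & 0xf" */
        assert(example_extract32(tx_status, 1 * 4, 4) == 0x2);
        /* set buffer 1's state field to 0xA */
        assert(example_deposit32(tx_status, 1 * 4, 4, 0xA) == 0x43A1);
        return 0;
    }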

>  prio = (s->tx_priority.u32 >> (i * 4)) & 0x7;
> @@ -271,10 +266,7 @@ static void ctucan_send_ready_buffers(CtuCanCoreState
> *s) if (buff2tx_idx == -1) {
>  break;
>  }
> -buff_st_mask = 0xf << (buff2tx_idx * 4);
> -buff_st = (s->tx_status.u32 >> (buff2tx_idx * 4)) & 0xf;

Here I would have kept the extracted state in the variable. The current
model is greatly simplified compared to real hardware: Tx succeeds in
zero time.

>  int_stat.u32 = 0;
> -buff_st = TXT_RDY;

This is why the TXT_RDY state immediately changes to TXT_TOK. It works well
for the current simple CAN subsystem implementation. But if we want to
implement prioritization of messages on the emulated bus, and even simulate
real bus latency by delaying the state change and interrupt delivery, then
we need to go through the TXT_RDY state. If it is a problem for the release
that you want a Coverity-clean source tree, then please leave the line
there as a comment, or use some trick with
   (void)buff_st;

Or if you prefer, use

  +s->tx_status.u32 = deposit32(s->tx_status.u32,
  + buff2tx_idx * 4, 4, TXT_RDY);

if it silences Coverity.

>  pf = s->tx_buffer[buff2tx_idx].data;
>  ctucan_buff2frame(pf, );
>  s->status.s.idle = 0;
> @@ -283,12 +275,11 @@ static void ctucan_send_ready_buffers(CtuCanCoreState
> *s) s->status.s.idle = 1;
>  s->status.s.txs = 0;
>  s->tx_fr_ctr.s.tx_fr_ctr_val++;
> -buff_st = TXT_TOK;
>  int_stat.s.txi = 1;
>  int_stat.s.txbhci = 1;
>  s->int_stat.u32 |= int_stat.u32 & ~s->int_mask.u32;
> -s->tx_status.u32 = (s->tx_status.u32 & ~buff_st_mask) |
> -(buff_st << (buff2tx_idx * 4));
> +s->tx_status.u32 = deposit32(s->tx_status.u32,
> + buff2tx_idx * 4, 4, TXT_TOK);
>  } while (1);
>  }


Thanks for your time. I have planned to look at these and attempt
to provide a solution which is acceptable but documents our intentions;
it is still in my task queue,

Pavel



Re: [Virtio-fs] [qemu-web PATCH v2] Add virtio-fs in OSv overview blog post

2020-11-06 Thread Dr. David Alan Gilbert
* Fotis Xenakis (fo...@windowslive.com) wrote:
> This post briefly goes over the main points of virtio-fs and OSv, a
> unikernel running under QEMU/KVM and taking advantage of its virtio-fs
> implementation.
> 
> Changes since v1:
> - Fixed wording and links, as suggested by Thomas Huth.
> - Added a short example of virtio-fs usage in OSv.
> 
> Signed-off-by: Fotis Xenakis 

> +One central point is OSv's support for booting from virtio-fs: this enables
> +deploying a modified version or a whole new application **without 
> rebuilding**
> +the image, just by adjusting its root file system contents on the host. Last,
> +owing to the DAX window practically providing low-overhead access to the 
> host's
> +page cache, scalability is also expected to excel, with it being a common
> +concern due to the potentially high density of unikernels per host.

Hi Fotis,
  Since I'm not used to unikernels, I'm a little confused by this; I'd
appreciate some explanation.

  In your unikernel, does the root filesystem just contain data?  I mean,
being a 'unikernel', aren't all the binaries and supporting code linked
into the kernel itself?

Dave
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH 1/2] docs/fuzz: rST-ify the fuzzing documentation

2020-11-06 Thread Alexander Bulekov
Signed-off-by: Alexander Bulekov 
---
 MAINTAINERS|   2 +-
 docs/devel/fuzzing.rst | 236 +
 docs/devel/fuzzing.txt | 214 -
 docs/devel/index.rst   |   1 +
 4 files changed, 238 insertions(+), 215 deletions(-)
 create mode 100644 docs/devel/fuzzing.rst
 delete mode 100644 docs/devel/fuzzing.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 63223e1183..da1ef68ff1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2517,7 +2517,7 @@ R: Thomas Huth 
 S: Maintained
 F: tests/qtest/fuzz/
 F: scripts/oss-fuzz/
-F: docs/devel/fuzzing.txt
+F: docs/devel/fuzzing.rst
 
 Register API
 M: Alistair Francis 
diff --git a/docs/devel/fuzzing.rst b/docs/devel/fuzzing.rst
new file mode 100644
index 00..f19d75ceff
--- /dev/null
+++ b/docs/devel/fuzzing.rst
@@ -0,0 +1,236 @@
+
+Fuzzing
+
+
+This document describes the virtual-device fuzzing infrastructure in QEMU and
+how to use it to implement additional fuzzers.
+
+Basics
+--
+
+Fuzzing operates by passing inputs to an entry point/target function. The
+fuzzer tracks the code coverage triggered by the input. Based on these
+findings, the fuzzer mutates the input and repeats the fuzzing.
+
+To fuzz QEMU, we rely on libfuzzer. Unlike other fuzzers such as AFL, libfuzzer
+is an *in-process* fuzzer. For the developer, this means that it is their
+responsibility to ensure that state is reset between fuzzing-runs.
+
+Building the fuzzers
+
+
+*NOTE*: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
+much faster, since the page-map has a smaller size. This is due to the fact 
that
+AddressSanitizer maps ~20TB of memory, as part of its detection. This results
+in a large page-map, and a much slower ``fork()``.
+
+To build the fuzzers, install a recent version of clang:
+Configure with (substitute the clang binaries with the version you installed).
+Here, enable-sanitizers, is optional but it allows us to reliably detect bugs
+such as out-of-bounds accesses, use-after-frees, double-frees etc.::
+
+CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \
+--enable-sanitizers
+
+Fuzz targets are built similarly to system targets::
+
+make i386-softmmu/fuzz
+
+This builds ``./i386-softmmu/qemu-fuzz-i386``
+
+The first option to this command is: ``--fuzz-target=FUZZ_NAME``
+To list all of the available fuzzers run ``qemu-fuzz-i386`` with no arguments.
+
+For example::
+
+./i386-softmmu/qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz
+
+Internally, libfuzzer parses all arguments that do not begin with ``"--"``.
+Information about these is available by passing ``-help=1``
+
+Now the only thing left to do is wait for the fuzzer to trigger potential
+crashes.
+
+Useful libFuzzer flags
+--
+
+As mentioned above, libFuzzer accepts some arguments. Passing ``-help=1`` will
+list the available arguments. In particular, these arguments might be helpful:
+
+* ``CORPUS_DIR/`` : Specify a directory as the last argument to libFuzzer.
+  libFuzzer stores each "interesting" input in this corpus directory. The next
+  time you run libFuzzer, it will read all of the inputs from the corpus, and
+  continue fuzzing from there. You can also specify multiple directories.
+  libFuzzer loads existing inputs from all specified directories, but will only
+  write new ones to the first one specified.
+
+* ``-max_len=4096`` : specify the maximum byte-length of the inputs libFuzzer
+  will generate.
+
+* ``-close_fd_mask={1,2,3}`` : close stdout, stderr, or both. Useful for targets that
+  trigger many debug/error messages, or create output on the serial console.
+
+* ``-jobs=4 -workers=4`` : These arguments configure libFuzzer to run 4 
fuzzers in
+  parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
+  ``-jobs=N``, libFuzzer automatically spawns a number of workers less than or 
equal
+  to half the available CPU cores. Replace 4 with a number appropriate for your
+  machine. Make sure to specify a ``CORPUS_DIR``, which will allow the parallel
+  fuzzers to share information about the interesting inputs they find.
+
+* ``-use_value_profile=1`` : For each comparison operation, libFuzzer computes
+  ``(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12)`` and places this in the
+  coverage table. Useful for targets with "magic" constants. If Arg1 came from
+  the fuzzer's input and Arg2 is a magic constant, then each time the Hamming
+  distance between Arg1 and Arg2 decreases, libFuzzer adds the input to the
+  corpus.
+
+* ``-shrink=1`` : Tries to make elements of the corpus "smaller". Might lead to
+  better coverage performance, depending on the target.
+
+Note that libFuzzer's exact behavior will depend on the version of
+clang and libFuzzer used to build the device fuzzers.
+
+Generating Coverage Reports
+---
+
+Code coverage is a crucial 

[PATCH 0/2] Fuzzing Documentation Updates

2020-11-06 Thread Alexander Bulekov
I converted the documentation to reStructuredText and updated the build
instructions that changed after the meson conversion.

Alexander Bulekov (2):
  docs/fuzz: rST-ify the fuzzing documentation
  docs/fuzz: update fuzzing documentation post-meson

 MAINTAINERS|   2 +-
 docs/devel/fuzzing.rst | 236 +
 docs/devel/fuzzing.txt | 214 -
 docs/devel/index.rst   |   1 +
 4 files changed, 238 insertions(+), 215 deletions(-)
 create mode 100644 docs/devel/fuzzing.rst
 delete mode 100644 docs/devel/fuzzing.txt

-- 
2.28.0




[PATCH 2/2] docs/fuzz: update fuzzing documentation post-meson

2020-11-06 Thread Alexander Bulekov
Signed-off-by: Alexander Bulekov 
---
 docs/devel/fuzzing.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/devel/fuzzing.rst b/docs/devel/fuzzing.rst
index f19d75ceff..6096242d99 100644
--- a/docs/devel/fuzzing.rst
+++ b/docs/devel/fuzzing.rst
@@ -34,16 +34,16 @@ such as out-of-bounds accesses, use-after-frees, 
double-frees etc.::
 
 Fuzz targets are built similarly to system targets::
 
-make i386-softmmu/fuzz
+make qemu-fuzz-i386
 
-This builds ``./i386-softmmu/qemu-fuzz-i386``
+This builds ``./qemu-fuzz-i386``
 
 The first option to this command is: ``--fuzz-target=FUZZ_NAME``
 To list all of the available fuzzers run ``qemu-fuzz-i386`` with no arguments.
 
 For example::
 
-./i386-softmmu/qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz
+./qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz
 
 Internally, libfuzzer parses all arguments that do not begin with ``"--"``.
 Information about these is available by passing ``-help=1``
-- 
2.28.0




Re: [PATCH for-5.2 1/4] hw/net/can/ctucan: Don't allow guest to write off end of tx_buffer

2020-11-06 Thread Peter Maydell
On Fri, 6 Nov 2020 at 17:48, Pavel Pisa  wrote:
>
> Hello Peter,
>
> thanks much for the catching the problem and investing time into
> fixing. I hope to find time for more review of remarks and Xilinx
> patches next week. I do not find reasonable time slot till Sunday.
> Excuse me. To not block updates, I confirm your changes.
>
> On Friday 06 of November 2020 18:11:50 Peter Maydell wrote:
> > The ctucan device has 4 CAN bus cores, each of which has a set of 20
> > 32-bit registers for writing the transmitted data. The registers are
> > however not contiguous; each core's buffers is 0x100 bytes after
> > the last.
> >
> > We got the checks on the address wrong in the ctucan_mem_write()
> > function:
> >  * the first "is addr in range at all" check allowed
> >addr == CTUCAN_CORE_MEM_SIZE, which is actually the first
> >byte off the end of the range
> >  * the decode of addresses into core-number plus offset in the
> >tx buffer for that core failed to check that the offset was
> >in range, so the guest could write off the end of the
> >tx_buffer[] array
> >  * the decode had an explicit check for whether the core-number
> >was out of range, which is actually impossible given the
> >CTUCAN_CORE_MEM_SIZE check and the number of cores.
> >
> > Fix the top level check, check the offset, and turn the check
> > on the core-number into an assertion.
> >
> > Fixes: Coverity CID 1432874
> > Signed-off-by: Peter Maydell 
> > ---
> >  hw/net/can/ctucan_core.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
> > index d20835cd7e9..ea09bf71a0c 100644
> > --- a/hw/net/can/ctucan_core.c
> > +++ b/hw/net/can/ctucan_core.c
> > @@ -303,7 +303,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr,
> > uint64_t val, DPRINTF("write 0x%02llx addr 0x%02x\n",
> >  (unsigned long long)val, (unsigned int)addr);
> >
> > -if (addr > CTUCAN_CORE_MEM_SIZE) {
> > +if (addr >= CTUCAN_CORE_MEM_SIZE) {
> >  return;
> >  }
>
> Acked-by: Pavel Pisa 

> > @@ -312,7 +312,8 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr,
> > uint64_t val, addr -= CTU_CAN_FD_TXTB1_DATA_1;
> >  buff_num = addr / CTUCAN_CORE_TXBUFF_SPAN;
> >  addr %= CTUCAN_CORE_TXBUFF_SPAN;
> > -if (buff_num < CTUCAN_CORE_TXBUF_NUM) {
> > +assert(buff_num < CTUCAN_CORE_TXBUF_NUM);
>
> Assert is not necessary. If there is not buffer at that location,
> then write has no effect. Assert would check for driver errors,
> but it is not a problem of QEMU and for sure should not lead to its
> crash.

We assert() here as a guide to readers of the code that we know
that buff_num can't possibly be out of range for the array
access we're about to do: the values of CTUCAN_CORE_MEM_SIZE,
CTUCAN_CORE_TXBUFF_SPAN, etc, make it mathematically impossible.
We prefer to assert() that kind of condition rather than having
an if() test for it.

thanks
-- PMM



[PATCH for-5.2] hw/mips/boston.c: Fix memory leak in boston_fdt_filter() error-handling paths

2020-11-06 Thread Peter Maydell
Coverity points out that the error-handling paths in the
boston_fdt_filter() function don't free the fdt that was allocated.
Fix the leak by using g_autofree.

Fixes: Coverity CID 1432275

Signed-off-by: Peter Maydell 
---
 hw/mips/boston.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/mips/boston.c b/hw/mips/boston.c
index 3356d7a6814..3d40867dc4c 100644
--- a/hw/mips/boston.c
+++ b/hw/mips/boston.c
@@ -349,11 +349,9 @@ static const void *boston_fdt_filter(void *opaque, const 
void *fdt_orig,
 MachineState *machine = s->mach;
 const char *cmdline;
 int err;
-void *fdt;
-size_t fdt_sz, ram_low_sz, ram_high_sz;
-
-fdt_sz = fdt_totalsize(fdt_orig) * 2;
-fdt = g_malloc0(fdt_sz);
+size_t ram_low_sz, ram_high_sz;
+size_t fdt_sz = fdt_totalsize(fdt_orig) * 2;
+g_autofree void *fdt = g_malloc0(fdt_sz);
 
 err = fdt_open_into(fdt_orig, fdt, fdt_sz);
 if (err) {
@@ -380,7 +378,7 @@ static const void *boston_fdt_filter(void *opaque, const 
void *fdt_orig,
 
 s->fdt_base = *load_addr;
 
-return fdt;
+return g_steal_pointer(&fdt);
 }
 
 static const void *boston_kernel_filter(void *opaque, const void *kernel,
-- 
2.20.1




Re: [PATCH for-5.2 1/4] hw/net/can/ctucan: Don't allow guest to write off end of tx_buffer

2020-11-06 Thread Pavel Pisa
Hello Peter,

thanks much for the catching the problem and investing time into
fixing. I hope to find time for more review of remarks and Xilinx
patches next week. I do not find reasonable time slot till Sunday.
Excuse me. To not block updates, I confirm your changes.

On Friday 06 of November 2020 18:11:50 Peter Maydell wrote:
> The ctucan device has 4 CAN bus cores, each of which has a set of 20
> 32-bit registers for writing the transmitted data. The registers are
> however not contiguous; each core's buffers is 0x100 bytes after
> the last.
>
> We got the checks on the address wrong in the ctucan_mem_write()
> function:
>  * the first "is addr in range at all" check allowed
>addr == CTUCAN_CORE_MEM_SIZE, which is actually the first
>byte off the end of the range
>  * the decode of addresses into core-number plus offset in the
>tx buffer for that core failed to check that the offset was
>in range, so the guest could write off the end of the
>tx_buffer[] array
>  * the decode had an explicit check for whether the core-number
>was out of range, which is actually impossible given the
>CTUCAN_CORE_MEM_SIZE check and the number of cores.
>
> Fix the top level check, check the offset, and turn the check
> on the core-number into an assertion.
>
> Fixes: Coverity CID 1432874
> Signed-off-by: Peter Maydell 
> ---
>  hw/net/can/ctucan_core.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
> index d20835cd7e9..ea09bf71a0c 100644
> --- a/hw/net/can/ctucan_core.c
> +++ b/hw/net/can/ctucan_core.c
> @@ -303,7 +303,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr,
> uint64_t val, DPRINTF("write 0x%02llx addr 0x%02x\n",
>  (unsigned long long)val, (unsigned int)addr);
>
> -if (addr > CTUCAN_CORE_MEM_SIZE) {
> +if (addr >= CTUCAN_CORE_MEM_SIZE) {
>  return;
>  }

Acked-by: Pavel Pisa 

There is another mistake

diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
index d20835cd7e..b90a8e3b76 100644
--- a/hw/net/can/ctucan_core.c
+++ b/hw/net/can/ctucan_core.c
@@ -418,7 +418,7 @@ uint64_t ctucan_mem_read(CtuCanCoreState *s, hwaddr addr, 
unsigned size)

 DPRINTF("read addr 0x%02x ...\n", (unsigned int)addr);

-if (addr > CTUCAN_CORE_MEM_SIZE) {
+if (addr >= CTUCAN_CORE_MEM_SIZE) {
 return 0;
 }
But it should be really harmless.

> @@ -312,7 +312,8 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr,
> uint64_t val, addr -= CTU_CAN_FD_TXTB1_DATA_1;
>  buff_num = addr / CTUCAN_CORE_TXBUFF_SPAN;
>  addr %= CTUCAN_CORE_TXBUFF_SPAN;
> -if (buff_num < CTUCAN_CORE_TXBUF_NUM) {
> +assert(buff_num < CTUCAN_CORE_TXBUF_NUM);

Assert is not necessary. If there is not buffer at that location,
then write has no effect. Assert would check for driver errors,
but it is not a problem of QEMU and for sure should not lead to its
crash.

> +if (addr < sizeof(s->tx_buffer[buff_num].data)) {
>  uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data +
> addr); *bufp = cpu_to_le32(val);
>  }

So my proposed solution is

diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
index d20835cd7e..af30d57cfd 100644
--- a/hw/net/can/ctucan_core.c
+++ b/hw/net/can/ctucan_core.c
@@ -312,7 +312,9 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, 
uint64_t val,
 addr -= CTU_CAN_FD_TXTB1_DATA_1;
 buff_num = addr / CTUCAN_CORE_TXBUFF_SPAN;
 addr %= CTUCAN_CORE_TXBUFF_SPAN;
-if (buff_num < CTUCAN_CORE_TXBUF_NUM) {
+addr &= ~3;
+if (buff_num < CTUCAN_CORE_TXBUF_NUM &&
+addr < CTUCAN_CORE_MSG_MAX_LEN) {
 uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + addr);
 *bufp = cpu_to_le32(val);
 }

It ignores writes to unimplemented locations.

There is also a fix/workaround to make writes to unaligned addresses safe.
They are not supported by the current QEMU CTU CAN FD model.
Real HW supports them, but the driver never uses unaligned writes
nor any access size other than 32 bits.

Reads at least support smaller access sizes correctly,
but do not support unaligned reads which cross 32-bit boundaries.
Again, the actual driver code never uses a size other than 32 bits
for reads. Byte access for debug dumps via rdwrmem, byte by byte,
has been tested and works OK as well.

The following proposed correction right solution too

diff --git a/hw/net/can/ctucan_core.h b/hw/net/can/ctucan_core.h
index f21cb1c5ec..ad701c4764 100644
--- a/hw/net/can/ctucan_core.h
+++ b/hw/net/can/ctucan_core.h
@@ -31,10 +31,11 @@
 #include "exec/hwaddr.h"
 #include "net/can_emu.h"

-
+#ifndef HOST_WORDS_BIGENDIAN
 #ifndef __LITTLE_ENDIAN_BITFIELD
 #define __LITTLE_ENDIAN_BITFIELD 1
 #endif
+#endif

 #include "ctu_can_fd_frame.h"
 #include "ctu_can_fd_regs.h"

As for the bitfields use, there is a plan to add an additional mode to
generate the registers header file from 

[PATCH v2 1/2] quorum: Implement bdrv_co_block_status()

2020-11-06 Thread Alberto Garcia
The quorum driver does not implement bdrv_co_block_status() and
because of that it always reports to contain data even if all its
children are known to be empty.

One consequence of this is that if we for example create a quorum with
a size of 10GB and we mirror it to a new image the operation will
write 10GB of actual zeroes to the destination image wasting a lot of
time and disk space.

Since a quorum has an arbitrary number of children of potentially
different formats there is no way to report all possible allocation
status flags in a way that makes sense, so this implementation only
reports when a given region is known to contain zeroes
(BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA).

If all children agree that a region contains zeroes then we can return
BDRV_BLOCK_ZERO using the smallest size reported by the children
(because all agree that a region of at least that size contains
zeroes).

If at least one child disagrees we have to return BDRV_BLOCK_DATA.
In this case we use the largest of the sizes reported by the children
that didn't return BDRV_BLOCK_ZERO (because we know that there won't
be an agreement for at least that size).

Signed-off-by: Alberto Garcia 
Tested-by: Tao Xu 
---
 block/quorum.c |  49 
 tests/qemu-iotests/312 | 148 +
 tests/qemu-iotests/312.out |  67 +
 tests/qemu-iotests/group   |   1 +
 4 files changed, 265 insertions(+)
 create mode 100755 tests/qemu-iotests/312
 create mode 100644 tests/qemu-iotests/312.out

diff --git a/block/quorum.c b/block/quorum.c
index e846a7e892..29cee42705 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -18,6 +18,7 @@
 #include "qemu/module.h"
 #include "qemu/option.h"
 #include "block/block_int.h"
+#include "block/coroutines.h"
 #include "block/qdict.h"
 #include "qapi/error.h"
 #include "qapi/qapi-events-block.h"
@@ -1174,6 +1175,53 @@ static void quorum_child_perm(BlockDriverState *bs, 
BdrvChild *c,
  | DEFAULT_PERM_UNCHANGED;
 }
 
+/*
+ * Each one of the children can report different status flags even
+ * when they contain the same data, so what this function does is
+ * return BDRV_BLOCK_ZERO if *all* children agree that a certain
+ * region contains zeroes, and BDRV_BLOCK_DATA otherwise.
+ */
+static int coroutine_fn quorum_co_block_status(BlockDriverState *bs,
+   bool want_zero,
+   int64_t offset, int64_t count,
+   int64_t *pnum, int64_t *map,
+   BlockDriverState **file)
+{
+BDRVQuorumState *s = bs->opaque;
+int i, ret;
+int64_t pnum_zero = count;
+int64_t pnum_data = 0;
+
+for (i = 0; i < s->num_children; i++) {
+int64_t bytes;
+ret = bdrv_co_common_block_status_above(s->children[i]->bs, NULL, 
false,
+want_zero, offset, count,
+&bytes, NULL, NULL, NULL);
+if (ret < 0) {
+return ret;
+}
+/*
+ * Even if all children agree about whether there are zeroes
+ * or not at @offset they might disagree on the size, so use
+ * the smallest when reporting BDRV_BLOCK_ZERO and the largest
+ * when reporting BDRV_BLOCK_DATA.
+ */
+if (ret & BDRV_BLOCK_ZERO) {
+pnum_zero = MIN(pnum_zero, bytes);
+} else {
+pnum_data = MAX(pnum_data, bytes);
+}
+}
+
+if (pnum_data) {
+*pnum = pnum_data;
+return BDRV_BLOCK_DATA;
+} else {
+*pnum = pnum_zero;
+return BDRV_BLOCK_ZERO;
+}
+}
+
 static const char *const quorum_strong_runtime_opts[] = {
 QUORUM_OPT_VOTE_THRESHOLD,
 QUORUM_OPT_BLKVERIFY,
@@ -1192,6 +1240,7 @@ static BlockDriver bdrv_quorum = {
 .bdrv_close = quorum_close,
 .bdrv_gather_child_options  = quorum_gather_child_options,
 .bdrv_dirname   = quorum_dirname,
+.bdrv_co_block_status   = quorum_co_block_status,
 
 .bdrv_co_flush_to_disk  = quorum_co_flush,
 
diff --git a/tests/qemu-iotests/312 b/tests/qemu-iotests/312
new file mode 100755
index 00..1b08f1552f
--- /dev/null
+++ b/tests/qemu-iotests/312
@@ -0,0 +1,148 @@
+#!/usr/bin/env bash
+#
+# Test drive-mirror with quorum
+#
+# The goal of this test is to check how the quorum driver reports
+# regions that are known to read as zeroes (BDRV_BLOCK_ZERO). The idea
+# is that drive-mirror will try the efficient representation of zeroes
+# in the destination image instead of writing actual zeroes.
+#
+# Copyright (C) 2020 Igalia, S.L.
+# Author: Alberto Garcia 
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; 

[PATCH v2 2/2] quorum: Implement bdrv_co_pwrite_zeroes()

2020-11-06 Thread Alberto Garcia
This simply calls bdrv_co_pwrite_zeroes() in all children

Signed-off-by: Alberto Garcia 
---
 block/quorum.c | 18 --
 tests/qemu-iotests/312 |  7 +++
 tests/qemu-iotests/312.out |  4 
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 29cee42705..70b39a7226 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -692,8 +692,13 @@ static void write_quorum_entry(void *opaque)
 QuorumChildRequest *sacb = >qcrs[i];
 
 sacb->bs = s->children[i]->bs;
-sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
-acb->qiov, acb->flags);
+if (acb->flags & BDRV_REQ_ZERO_WRITE) {
+sacb->ret = bdrv_co_pwrite_zeroes(s->children[i], acb->offset,
+  acb->bytes, acb->flags);
+} else {
+sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
+acb->qiov, acb->flags);
+}
 if (sacb->ret == 0) {
 acb->success_count++;
 } else {
@@ -739,6 +744,14 @@ static int quorum_co_pwritev(BlockDriverState *bs, 
uint64_t offset,
 return ret;
 }
 
+static int quorum_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+   int bytes, BdrvRequestFlags flags)
+
+{
+return quorum_co_pwritev(bs, offset, bytes, NULL,
+ flags | BDRV_REQ_ZERO_WRITE);
+}
+
 static int64_t quorum_getlength(BlockDriverState *bs)
 {
 BDRVQuorumState *s = bs->opaque;
@@ -1248,6 +1261,7 @@ static BlockDriver bdrv_quorum = {
 
 .bdrv_co_preadv = quorum_co_preadv,
 .bdrv_co_pwritev= quorum_co_pwritev,
+.bdrv_co_pwrite_zeroes  = quorum_co_pwrite_zeroes,
 
 .bdrv_add_child = quorum_add_child,
 .bdrv_del_child = quorum_del_child,
diff --git a/tests/qemu-iotests/312 b/tests/qemu-iotests/312
index 1b08f1552f..93046393e7 100755
--- a/tests/qemu-iotests/312
+++ b/tests/qemu-iotests/312
@@ -114,6 +114,13 @@ $QEMU_IO -c "write -P 0 $((0x20)) $((0x1))" 
"$TEST_IMG.0" | _filter_qemu
 $QEMU_IO -c "write -z   $((0x20)) $((0x3))" "$TEST_IMG.1" | 
_filter_qemu_io
 $QEMU_IO -c "write -P 0 $((0x20)) $((0x2))" "$TEST_IMG.2" | 
_filter_qemu_io
 
+# Test 5: write data to a region and then zeroize it, doing it
+# directly on the quorum device instead of the individual images.
+# This has no effect on the end result but proves that the quorum driver
+# supports 'write -z'.
+$QEMU_IO -c "open -o $quorum" -c "write $((0x25)) $((0x1))" | 
_filter_qemu_io
+$QEMU_IO -c "open -o $quorum" -c "write -z $((0x25)) $((0x1))" | 
_filter_qemu_io
+
 echo
 echo '### Launch the drive-mirror job'
 echo
diff --git a/tests/qemu-iotests/312.out b/tests/qemu-iotests/312.out
index 4ae749175b..778dda95c7 100644
--- a/tests/qemu-iotests/312.out
+++ b/tests/qemu-iotests/312.out
@@ -39,6 +39,10 @@ wrote 196608/196608 bytes at offset 2097152
 192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 131072/131072 bytes at offset 2097152
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 2424832
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 2424832
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 ### Launch the drive-mirror job
 
-- 
2.20.1




[PATCH v2 0/2] quorum: Implement bdrv_co_block_status()

2020-11-06 Thread Alberto Garcia
Hi,

The first patch is the same as in v1, but now that we're at it I
decided to also implement bdrv_co_pwrite_zeroes()

Berto

v2:
- Implement bdrv_co_pwrite_zeroes() for quorum

v1: https://lists.gnu.org/archive/html/qemu-block/2020-11/msg00163.html

Alberto Garcia (2):
  quorum: Implement bdrv_co_block_status()
  quorum: Implement bdrv_co_pwrite_zeroes()

 block/quorum.c |  67 +++-
 tests/qemu-iotests/312 | 155 +
 tests/qemu-iotests/312.out |  71 +
 tests/qemu-iotests/group   |   1 +
 4 files changed, 292 insertions(+), 2 deletions(-)
 create mode 100755 tests/qemu-iotests/312
 create mode 100644 tests/qemu-iotests/312.out

-- 
2.20.1




Re: [PATCH v2 4/4] linux-user/sparc: Handle tstate in sparc64_get/set_context()

2020-11-06 Thread Richard Henderson
On 11/6/20 7:27 AM, Peter Maydell wrote:
> +#ifdef TARGET_SPARC64
> +/* win_helper.c */
> +target_ulong cpu_get_ccr(CPUSPARCState *env1);
> +void cpu_put_ccr(CPUSPARCState *env1, target_ulong val);
> +target_ulong cpu_get_cwp64(CPUSPARCState *env1);
> +void cpu_put_cwp64(CPUSPARCState *env1, int cwp);
> +
> +static inline uint64_t sparc64_tstate(CPUSPARCState *env)
> +{
> +uint64_t tstate = (cpu_get_ccr(env) << 32) |
> +((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
> +cpu_get_cwp64(env);
> +
> +if (env->def.features & CPU_FEATURE_GL) {
> +tstate |= (env->gl & 7ULL) << 40;
> +}
> +return tstate;
> +}
> +#endif

Given that this inline function calls 2 other out-of-line functions, I think it
might as well be out-of-line itself.
I'd place it in win_helper.c alongside the functions that it calls.
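
For concreteness, that would mean keeping only the prototype
"uint64_t sparc64_tstate(CPUSPARCState *env);" in cpu.h and moving the
body unchanged into win_helper.c.  A sketch only, not a tested patch:

    /* win_helper.c */
    uint64_t sparc64_tstate(CPUSPARCState *env)
    {
        uint64_t tstate = (cpu_get_ccr(env) << 32) |
            ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
            cpu_get_cwp64(env);

        if (env->def.features & CPU_FEATURE_GL) {
            tstate |= (env->gl & 7ULL) << 40;
        }
        return tstate;
    }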

But either way,
Reviewed-by: Richard Henderson 

r~



nbd: bitmap_to_extents() calls nbd_extent_array_add() without checking return value: coverity false positive?

2020-11-06 Thread Peter Maydell
Hi; Coverity's "you usually check the return value of this function
but you didn't do that here" heuristic has fired on the code in
nbd/server.c:bitmap_to_extents() -- the function nbd_extent_array_add()
is called five times in server.c, and the return value is checked
in four of those, but not in the final call at the end of
bitmap_to_extents(). (CID 1436125.)

Is this a false positive, or should the caller be handling an
error here ?

thanks
-- PMM



[PATCH for-5.2 3/4] hw/net/can/ctucan_core: Handle big-endian hosts

2020-11-06 Thread Peter Maydell
The ctucan driver defines types for its registers which are a union
of a uint32_t with a struct with bitfields for the individual
fields within that register. This is a bad idea, because bitfields
aren't portable. The ctu_can_fd_regs.h header works around the
most glaring of the portability issues by defining the
fields in two different orders depending on the setting of the
__LITTLE_ENDIAN_BITFIELD define. However, in ctucan_core.h this
is unconditionally set to 1, which is wrong for big-endian hosts.

Set it only if HOST_WORDS_BIGENDIAN is not set. There is no need
for a "have we defined it already" guard, because the only place
that should set it is ctucan_core.h, which has the usual
double-inclusion guard.

Signed-off-by: Peter Maydell 
---
Ideally all that bitfield-using code would be rewritten to use
extract32 and deposit32 instead, IMHO.
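
For illustration, a minimal sketch of what such a rewrite could look
like.  The register name, flag and field positions below are made up
for the example; they are not the real ctucan register layout:

    uint32_t reg = s->some_reg.u32;       /* whole register as one word */
    if (extract32(reg, 0, 1)) {           /* hypothetical 1-bit 'ena' flag */
        reg = deposit32(reg, 4, 3, 0x5);  /* hypothetical 3-bit field at bits 4..6 */
    }
    s->some_reg.u32 = reg;

That form behaves the same on little- and big-endian hosts, with no
dependence on the compiler's bitfield ordering.
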
---
 hw/net/can/ctucan_core.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/net/can/ctucan_core.h b/hw/net/can/ctucan_core.h
index f21cb1c5ec3..bbc09ae0678 100644
--- a/hw/net/can/ctucan_core.h
+++ b/hw/net/can/ctucan_core.h
@@ -31,8 +31,7 @@
 #include "exec/hwaddr.h"
 #include "net/can_emu.h"
 
-
-#ifndef __LITTLE_ENDIAN_BITFIELD
+#ifndef HOST_WORDS_BIGENDIAN
 #define __LITTLE_ENDIAN_BITFIELD 1
 #endif
 
-- 
2.20.1




Re: [PATCH v2 1/4] linux-user/sparc: Correct sparc64_get/set_context() FPU handling

2020-11-06 Thread Peter Maydell
On Fri, 6 Nov 2020 at 17:09, Richard Henderson
 wrote:
>
> On 11/6/20 7:27 AM, Peter Maydell wrote:
> > +if (fprs & FPRS_DU) {
> > +for (i = 16; i < 31; i++) {
>
> 32.

Derp. Lucky this code basically never gets run, eh ? :-)

-- PMM



[PATCH for-5.2 0/4] hw/net/can/ctucan: fix Coverity and other issues

2020-11-06 Thread Peter Maydell
This patchset fixes a couple of issues spotted by Coverity:
 * incorrect address checks meant the guest could write off the
   end of the tx_buffer arrays
 * we had an unused value in ctucan_send_ready_buffers()
and also some I noticed while reading the code:
 * we don't adjust the device's non-portable use of bitfields
   on bigendian hosts
 * we should use stl_le_p() rather than casting uint8_t* to
   uint32_t*

Tested with "make check" only.

thanks
 -- PMM

Peter Maydell (4):
  hw/net/can/ctucan: Don't allow guest to write off end of tx_buffer
  hw/net/can/ctucan: Avoid unused value in ctucan_send_ready_buffers()
  hw/net/can/ctucan_core: Handle big-endian hosts
  hw/net/ctucan_core: Use stl_le_p to write to tx_buffers

 hw/net/can/ctucan_core.h |  3 +--
 hw/net/can/ctucan_core.c | 23 +++
 2 files changed, 8 insertions(+), 18 deletions(-)

-- 
2.20.1




[PATCH for-5.2 2/4] hw/net/can/ctucan: Avoid unused value in ctucan_send_ready_buffers()

2020-11-06 Thread Peter Maydell
Coverity points out that in ctucan_send_ready_buffers() we
set buff_st_mask = 0xf << (i * 4) inside the loop, but then
we never use it before overwriting it later.

The only thing we use the mask for is as part of the code that is
inserting the new buff_st field into tx_status.  That is more
comprehensibly written using deposit32(), so do that and drop the
mask variable entirely.

We also update the buff_st local variable at multiple points
during this function, but nothing can ever see these
intermediate values, so just drop those, write the final
TXT_TOK as a fixed constant value, and collapse the only
remaining set/use of buff_st down into an extract32().

Fixes: Coverity CID 1432869
Signed-off-by: Peter Maydell 
---
 hw/net/can/ctucan_core.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
index ea09bf71a0c..f2ce978e5ec 100644
--- a/hw/net/can/ctucan_core.c
+++ b/hw/net/can/ctucan_core.c
@@ -240,8 +240,6 @@ static void ctucan_send_ready_buffers(CtuCanCoreState *s)
 uint8_t *pf;
 int buff2tx_idx;
 uint32_t tx_prio_max;
-unsigned int buff_st;
-uint32_t buff_st_mask;
 
 if (!s->mode_settings.s.ena) {
 return;
@@ -256,10 +254,7 @@ static void ctucan_send_ready_buffers(CtuCanCoreState *s)
 for (i = 0; i < CTUCAN_CORE_TXBUF_NUM; i++) {
 uint32_t prio;
 
-buff_st_mask = 0xf << (i * 4);
-buff_st = (s->tx_status.u32 >> (i * 4)) & 0xf;
-
-if (buff_st != TXT_RDY) {
+if (extract32(s->tx_status.u32, i * 4, 4) != TXT_RDY) {
 continue;
 }
 prio = (s->tx_priority.u32 >> (i * 4)) & 0x7;
@@ -271,10 +266,7 @@ static void ctucan_send_ready_buffers(CtuCanCoreState *s)
 if (buff2tx_idx == -1) {
 break;
 }
-buff_st_mask = 0xf << (buff2tx_idx * 4);
-buff_st = (s->tx_status.u32 >> (buff2tx_idx * 4)) & 0xf;
 int_stat.u32 = 0;
-buff_st = TXT_RDY;
 pf = s->tx_buffer[buff2tx_idx].data;
 ctucan_buff2frame(pf, &frame);
 s->status.s.idle = 0;
@@ -283,12 +275,11 @@ static void ctucan_send_ready_buffers(CtuCanCoreState *s)
 s->status.s.idle = 1;
 s->status.s.txs = 0;
 s->tx_fr_ctr.s.tx_fr_ctr_val++;
-buff_st = TXT_TOK;
 int_stat.s.txi = 1;
 int_stat.s.txbhci = 1;
 s->int_stat.u32 |= int_stat.u32 & ~s->int_mask.u32;
-s->tx_status.u32 = (s->tx_status.u32 & ~buff_st_mask) |
-(buff_st << (buff2tx_idx * 4));
+s->tx_status.u32 = deposit32(s->tx_status.u32,
+ buff2tx_idx * 4, 4, TXT_TOK);
 } while (1);
 }
 
-- 
2.20.1




[PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers

2020-11-06 Thread Peter Maydell
Instead of casting an address within a uint8_t array to a
uint32_t*, use stl_le_p(). This handles possibly misaligned
addresses which would otherwise crash on some hosts.

Signed-off-by: Peter Maydell 
---
 hw/net/can/ctucan_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
index f2ce978e5ec..e66526efa83 100644
--- a/hw/net/can/ctucan_core.c
+++ b/hw/net/can/ctucan_core.c
@@ -305,8 +305,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, 
uint64_t val,
 addr %= CTUCAN_CORE_TXBUFF_SPAN;
 assert(buff_num < CTUCAN_CORE_TXBUF_NUM);
 if (addr < sizeof(s->tx_buffer[buff_num].data)) {
-uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + addr);
-*bufp = cpu_to_le32(val);
+stl_le_p(s->tx_buffer[buff_num].data + addr, val);
 }
 } else {
 switch (addr & ~3) {
-- 
2.20.1




Re: [PATCH v2 2/4] linux-user/sparc: Remove unneeded checks of 'err' from sparc64_get_context()

2020-11-06 Thread Richard Henderson
On 11/6/20 7:27 AM, Peter Maydell wrote:
> Unlike the kernel macros, our __get_user() and __put_user() do not
> return a failure code.  Kernel code typically has a style of
>   err |= __get_user(...); err |= __get_user(...);
> and then checking err at the end.  In sparc64_get_context() our
> version of the code dropped the accumulating into err but left the
> "if (err) goto do_sigsegv" checks, which will never be taken. Delete
> unnecessary if()s.
> 
> Signed-off-by: Peter Maydell 
> ---
>  linux-user/sparc/signal.c | 4 
>  1 file changed, 4 deletions(-)

Reviewed-by: Richard Henderson 

r~




[PATCH for-5.2 1/4] hw/net/can/ctucan: Don't allow guest to write off end of tx_buffer

2020-11-06 Thread Peter Maydell
The ctucan device has 4 CAN bus cores, each of which has a set of 20
32-bit registers for writing the transmitted data. The registers are
however not contiguous; each core's buffers start 0x100 bytes after
the previous core's.

We got the checks on the address wrong in the ctucan_mem_write()
function:
 * the first "is addr in range at all" check allowed
   addr == CTUCAN_CORE_MEM_SIZE, which is actually the first
   byte off the end of the range
 * the decode of addresses into core-number plus offset in the
   tx buffer for that core failed to check that the offset was
   in range, so the guest could write off the end of the
   tx_buffer[] array
 * the decode had an explicit check for whether the core-number
   was out of range, which is actually impossible given the
   CTUCAN_CORE_MEM_SIZE check and the number of cores.

Fix the top level check, check the offset, and turn the check
on the core-number into an assertion.

Fixes: Coverity CID 1432874
Signed-off-by: Peter Maydell 
---
 hw/net/can/ctucan_core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
index d20835cd7e9..ea09bf71a0c 100644
--- a/hw/net/can/ctucan_core.c
+++ b/hw/net/can/ctucan_core.c
@@ -303,7 +303,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, 
uint64_t val,
 DPRINTF("write 0x%02llx addr 0x%02x\n",
 (unsigned long long)val, (unsigned int)addr);
 
-if (addr > CTUCAN_CORE_MEM_SIZE) {
+if (addr >= CTUCAN_CORE_MEM_SIZE) {
 return;
 }
 
@@ -312,7 +312,8 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, 
uint64_t val,
 addr -= CTU_CAN_FD_TXTB1_DATA_1;
 buff_num = addr / CTUCAN_CORE_TXBUFF_SPAN;
 addr %= CTUCAN_CORE_TXBUFF_SPAN;
-if (buff_num < CTUCAN_CORE_TXBUF_NUM) {
+assert(buff_num < CTUCAN_CORE_TXBUF_NUM);
+if (addr < sizeof(s->tx_buffer[buff_num].data)) {
 uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + addr);
 *bufp = cpu_to_le32(val);
 }
-- 
2.20.1




Re: [PATCH v2 3/4] linux-user/sparc: Don't restore %g7 in sparc64_set_context()

2020-11-06 Thread Richard Henderson
On 11/6/20 7:27 AM, Peter Maydell wrote:
> The kernel does not restore the g7 register in sparc64_set_context();
> neither should we. (We still save it in sparc64_get_context().)
> 
> Signed-off-by: Peter Maydell 
> ---
>  linux-user/sparc/signal.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson 

r~




Re: Question on UEFI ACPI tables setup and probing on arm64

2020-11-06 Thread Laszlo Ersek
On 11/05/20 05:30, Ying Fang wrote:

> I see it in Qemu the *loader_start* is fixed at 1 GiB on the
> physical address space which points to the DRAM base. In ArmVirtQemu.dsc
> PcdDeviceTreeInitialBaseAddress is set 0x4000 with correspondence.
> 
> Here I also see the discussion about DRAM base for ArmVirtQemu.
> https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03127.html
> 
> I am still not sure how UEFI knows that it is running on a ArmVirtQemu
> machine type.

It doesn't know. It remains a convention.

This part is not auto-detected; the constants in QEMU and edk2 are
independently open-coded; their values were synchronized by human effort
initially.

The user or the management layer has to make sure that the UEFI
firmware binary they boot is compatible with the machine type.

There is some meta-data to help with that:

> Does UEFI derive it from the fdt *compatible* property ?

Please see the schema "docs/interop/firmware.json" in the QEMU tree; in
particular the @FirmwareTarget element.

For an actual example: QEMU bundles some edk2 firmware binaries (purely
as a convenience, not for production), and those are accompanied by
matching descriptor files. See
"pc-bios/descriptors/60-edk2-aarch64.json". (It is a template that's
fixed up during QEMU installation, but that's tangential here.)

"targets": [
{
"architecture": "aarch64",
"machines": [
"virt-*"
]
}
],

Thanks
Laszlo




Re: [PATCH v2 1/4] linux-user/sparc: Correct sparc64_get/set_context() FPU handling

2020-11-06 Thread Richard Henderson
On 11/6/20 7:27 AM, Peter Maydell wrote:
> +if (fprs & FPRS_DU) {
> +for (i = 16; i < 31; i++) {

32.

Otherwise,
Reviewed-by: Richard Henderson 

r~



[PATCH-for-6.0 1/2] hw/scsi/scsi-disk: Rename type as TYPE_SCSI_DISK

2020-11-06 Thread Philippe Mathieu-Daudé
Rename TYPE_SCSI_DISK_BASE as TYPE_SCSI_DISK, dropping the '_BASE'
suffix, to match the other abstract types in the codebase.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/scsi/scsi-disk.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index e859534eaf3..d2b9cb28da1 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -53,9 +53,9 @@
 #define DEFAULT_MAX_UNMAP_SIZE  (1 * GiB)
 #define DEFAULT_MAX_IO_SIZE INT_MAX /* 2 GB - 1 block */
 
-#define TYPE_SCSI_DISK_BASE "scsi-disk-base"
+#define TYPE_SCSI_DISK  "scsi-disk-base"
 
-OBJECT_DECLARE_TYPE(SCSIDiskState, SCSIDiskClass, SCSI_DISK_BASE)
+OBJECT_DECLARE_TYPE(SCSIDiskState, SCSIDiskClass, SCSI_DISK)
 
 struct SCSIDiskClass {
 SCSIDeviceClass parent_class;
@@ -2956,7 +2956,7 @@ BlockAIOCB *scsi_dma_writev(int64_t offset, QEMUIOVector 
*iov,
 static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass);
+SCSIDiskClass *sdc = SCSI_DISK_CLASS(klass);
 
 dc->fw_name = "disk";
 dc->reset = scsi_disk_reset;
@@ -2966,7 +2966,7 @@ static void scsi_disk_base_class_initfn(ObjectClass 
*klass, void *data)
 }
 
 static const TypeInfo scsi_disk_base_info = {
-.name  = TYPE_SCSI_DISK_BASE,
+.name  = TYPE_SCSI_DISK,
 .parent= TYPE_SCSI_DEVICE,
 .class_init= scsi_disk_base_class_initfn,
 .instance_size = sizeof(SCSIDiskState),
@@ -3036,7 +3036,7 @@ static void scsi_hd_class_initfn(ObjectClass *klass, void 
*data)
 
 static const TypeInfo scsi_hd_info = {
 .name  = "scsi-hd",
-.parent= TYPE_SCSI_DISK_BASE,
+.parent= TYPE_SCSI_DISK,
 .class_init= scsi_hd_class_initfn,
 };
 
@@ -3067,7 +3067,7 @@ static void scsi_cd_class_initfn(ObjectClass *klass, void 
*data)
 
 static const TypeInfo scsi_cd_info = {
 .name  = "scsi-cd",
-.parent= TYPE_SCSI_DISK_BASE,
+.parent= TYPE_SCSI_DISK,
 .class_init= scsi_cd_class_initfn,
 };
 
@@ -3090,7 +3090,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, 
void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass);
-SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass);
+SCSIDiskClass *sdc = SCSI_DISK_CLASS(klass);
 
 sc->realize  = scsi_block_realize;
 sc->alloc_req= scsi_block_new_request;
@@ -3106,7 +3106,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, 
void *data)
 
 static const TypeInfo scsi_block_info = {
 .name  = "scsi-block",
-.parent= TYPE_SCSI_DISK_BASE,
+.parent= TYPE_SCSI_DISK,
 .class_init= scsi_block_class_initfn,
 };
 #endif
@@ -3146,7 +3146,7 @@ static void scsi_disk_class_initfn(ObjectClass *klass, 
void *data)
 
 static const TypeInfo scsi_disk_info = {
 .name  = "scsi-disk",
-.parent= TYPE_SCSI_DISK_BASE,
+.parent= TYPE_SCSI_DISK,
 .class_init= scsi_disk_class_initfn,
 };
 
-- 
2.26.2




[PATCH-for-6.0 2/2] hw/scsi/scsi-disk: Use SCSI_DISK_GET_CLASS() macro

2020-11-06 Thread Philippe Mathieu-Daudé
Use the SCSI_DISK_GET_CLASS() macro to match the rest of
the codebase.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/scsi/scsi-disk.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index d2b9cb28da1..deb51ec8e7d 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -338,7 +338,7 @@ static void scsi_read_complete(void *opaque, int ret)
 static void scsi_do_read(SCSIDiskReq *r, int ret)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
+SCSIDiskClass *sdc = SCSI_DISK_GET_CLASS(s);
 
 assert (r->req.aiocb == NULL);
 if (scsi_disk_req_check_error(r, ret, false)) {
@@ -438,7 +438,7 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int error, 
bool acct_failed)
 {
 bool is_read = (r->req.cmd.mode == SCSI_XFER_FROM_DEV);
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
+SCSIDiskClass *sdc = SCSI_DISK_GET_CLASS(s);
 BlockErrorAction action = blk_get_error_action(s->qdev.conf.blk,
is_read, error);
 
@@ -538,7 +538,7 @@ static void scsi_write_data(SCSIRequest *req)
 {
 SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
+SCSIDiskClass *sdc = SCSI_DISK_GET_CLASS(s);
 
 /* No data transfer may already be in progress */
 assert(r->req.aiocb == NULL);
@@ -2149,7 +2149,7 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, 
uint8_t *buf)
 {
 SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
-SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s));
+SCSIDiskClass *sdc = SCSI_DISK_GET_CLASS(s);
 uint32_t len;
 uint8_t command;
 
-- 
2.26.2




[PATCH-for-6.0 0/2] hw/scsi/scsi-disk: QOM style change

2020-11-06 Thread Philippe Mathieu-Daudé
Some QOM style changes in TYPE_SCSI_DISK to follow
the rest of the codebase style. No logical change.

Philippe Mathieu-Daudé (2):
  hw/scsi/scsi-disk: Rename type as TYPE_SCSI_DISK
  hw/scsi/scsi-disk: Use SCSI_DISK_GET_CLASS() macro

 hw/scsi/scsi-disk.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

-- 
2.26.2





Re: [PATCH 1/1] Change the order of g_free(info) and tracepoint

2020-11-06 Thread Philippe Mathieu-Daudé
On 11/6/20 4:59 PM, David Edmondson wrote:
> On Friday, 2020-11-06 at 19:09:24 +0530, Kirti Wankhede wrote:
> 
>> Fixes Coverity issue:
>> CID 1436126:  Memory - illegal accesses  (USE_AFTER_FREE)
>>
>> Fixes: a9e271ec9b36 ("vfio: Add migration region initialization and finalize
>> function")
>>
>> Signed-off-by: Kirti Wankhede 
> 
> Maybe "fix use after free in vfio_migration_probe" as a summary?

Yes please :)

Reviewed-by: Philippe Mathieu-Daudé 

> 
> Reviewed-by: David Edmondson 
> 
>> ---
>>  hw/vfio/migration.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 3ce285ea395d..55261562d4f3 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -897,8 +897,8 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error 
>> **errp)
>>  goto add_blocker;
>>  }
>>  
>> -g_free(info);
>>  trace_vfio_migration_probe(vbasedev->name, info->index);
>> +g_free(info);
>>  return 0;
>>  
>>  add_blocker:
>> -- 
>> 2.7.0
> 
> dme.
> 




Re: [PATCH 1/2] qemu-option: move help handling to get_opt_name_value

2020-11-06 Thread Markus Armbruster
One more thought...

Markus Armbruster  writes:

> Paolo Bonzini  writes:
[...]
>> diff --git a/util/qemu-option.c b/util/qemu-option.c
[...]
>> @@ -767,16 +763,18 @@ void qemu_opts_print(QemuOpts *opts, const char 
>> *separator)
>>  
>>  static const char *get_opt_name_value(const char *params,
>>const char *firstname,
>> +  bool *help_wanted,
>>char **name, char **value)
>>  {
>> -const char *p, *pe, *pc;
>> -
>> -pe = strchr(params, '=');
>> -pc = strchr(params, ',');
>> +const char *p;
>> +size_t len;
>>  
>> -if (!pe || (pc && pc < pe)) {
>> +len = strcspn(params, "=,");
>> +if (params[len] != '=') {
>>  /* found "foo,more" */
>> -if (firstname) {
>> +if (help_wanted && starts_with_help_option(params) == len) {
>> +*help_wanted = true;
>> +} else if (firstname) {
>>  /* implicitly named first option */
>>  *name = g_strdup(firstname);
>>  p = get_opt_value(params, value);
>
> This function parses one parameter from @params into @name, @value, and
> returns a pointer to the next parameter, or else to the terminating
> '\0'.

Like opt_validate() before, this sets *help_wanted only to true.
Callers must pass a pointer to false.  Perhaps having it set
*help_wanted always could simplify things overall.  Up to you.
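
Concretely, the difference would only be visible at the call sites; an
illustrative sketch (not the actual caller code from this patch):

    /*
     * Sketch: if get_opt_name_value() stored a value unconditionally,
     * a caller would no longer need to pre-initialize the flag.
     */
    bool help_wanted;   /* today this has to be "bool help_wanted = false;" */
    p = get_opt_name_value(p, firstname, &help_wanted, &name, &value);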

[...]




Re: [PATCH] migration/dirtyrate: simplify inlcudes in dirtyrate.c

2020-11-06 Thread Dr. David Alan Gilbert
* Zheng Chuan (zhengch...@huawei.com) wrote:
> Kindly ping for not forgetting this trivial fix:)

Yes but it's too late for the merge window, so it'll happen on the next
one, no rush!

Dave

> On 2020/10/30 22:09, Mark Kanda wrote:
> > On 10/29/2020 10:58 PM, Chuan Zheng wrote:
> >> Remove redundant blank line which is left by Commit 662770af7c6e8c,
> >> also take this opportunity to remove redundant includes in dirtyrate.c.
> >>
> >> Signed-off-by: Chuan Zheng 
> > 
> > Reviewed-by: Mark Kanda 
> > 
> >> ---
> >>   migration/dirtyrate.c | 5 -
> >>   1 file changed, 5 deletions(-)
> >>
> >> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> >> index 8f728d2..ccb9814 100644
> >> --- a/migration/dirtyrate.c
> >> +++ b/migration/dirtyrate.c
> >> @@ -11,17 +11,12 @@
> >>    */
> >>     #include "qemu/osdep.h"
> >> -
> >>   #include 
> >>   #include "qapi/error.h"
> >>   #include "cpu.h"
> >> -#include "qemu/config-file.h"
> >> -#include "exec/memory.h"
> >>   #include "exec/ramblock.h"
> >> -#include "exec/target_page.h"
> >>   #include "qemu/rcu_queue.h"
> >>   #include "qapi/qapi-commands-migration.h"
> >> -#include "migration.h"
> >>   #include "ram.h"
> >>   #include "trace.h"
> >>   #include "dirtyrate.h"
> >>
> 
> -- 
> Regards.
> Chuan
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH 2/2] qemu-option: warn for short-form boolean options

2020-11-06 Thread Markus Armbruster
Paolo Bonzini  writes:

> Options such as "server" or "nowait", that are commonly found in -chardev,
> are sugar for "server=on" and "wait=off".  This is quite surprising and
> also does not have any notion of typing attached.  It is even possible to
> do "-device e1000,noid" and get a device with "id=off".
>
> Deprecate all this, except for -chardev and -spice where it is in
> wide use.

I consider this a misuse of deprecation, to be frank.  If something is
known to be unused, we just remove it.  Deprecation is precisely for
things that are used.  I'm with Daniel here: let's deprecate this sugar
everywhere.

Wide use may justify extending the deprecation grace period.

> Signed-off-by: Paolo Bonzini 
> ---
>  chardev/char.c |  1 +
>  docs/system/deprecated.rst |  7 +++
>  include/qemu/option.h  |  1 +
>  tests/test-qemu-opts.c |  1 +
>  ui/spice-core.c|  1 +
>  util/qemu-option.c | 21 ++---
>  6 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/chardev/char.c b/chardev/char.c
> index 78553125d3..108da615df 100644
> --- a/chardev/char.c
> +++ b/chardev/char.c
> @@ -829,6 +829,7 @@ Chardev *qemu_chr_find(const char *name)
>  
>  QemuOptsList qemu_chardev_opts = {
>  .name = "chardev",
> +.allow_flag_options = true, /* server, nowait, etc. */
>  .implied_opt_name = "backend",
>  .head = QTAILQ_HEAD_INITIALIZER(qemu_chardev_opts.head),
>  .desc = {
> diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
> index 32a0e620db..0e7edf7e56 100644
> --- a/docs/system/deprecated.rst
> +++ b/docs/system/deprecated.rst
> @@ -146,6 +146,13 @@ Drives with interface types other than ``if=none`` are 
> for onboard
>  devices.  It is possible to use drives the board doesn't pick up with
>  -device.  This usage is now deprecated.  Use ``if=none`` instead.
>  
> +Short-form boolean options (since 5.2)
> +''
> +
> +Boolean options such as ``share=on``/``share=off`` can be written
> +in short form as ``share`` and ``noshare``.  This is deprecated
> +for all command-line options except ``-chardev` and ``-spice``, for
> +which the short form was in wide use.

Unlike the commit message, the text here misleads readers into assuming
the sugar applies only to boolean options.  The commit message's own
"-device e1000,noid" example (which results in "id=off" for a string
property) shows that it is not so limited.

>  
>  QEMU Machine Protocol (QMP) commands
>  
> diff --git a/include/qemu/option.h b/include/qemu/option.h
> index ac69352e0e..046ac06a1f 100644
> --- a/include/qemu/option.h
> +++ b/include/qemu/option.h
> @@ -65,6 +65,7 @@ struct QemuOptsList {
>  const char *name;
>  const char *implied_opt_name;
>  bool merge_lists;  /* Merge multiple uses of option into a single list? 
> */
> +bool allow_flag_options; /* Whether to warn for short-form boolean 
> options */

I disagree with having this option.  I'm reviewing it anyway.

Staying under the 80 characters limit is really, really easy here:

   bool allow_flag_options;/* Warn on short-form boolean options? */

I had to read ahead to figure out that false means warn, true means
accept silently.  The comment is confusing because "whether to warn"
suggests "warn if true", which is wrong.

Clearer (I think), and even shorter:

   bool allow_bool_sugar;  /* Is boolean option sugar okay? */

>  QTAILQ_HEAD(, QemuOpts) head;
>  QemuOptDesc desc[];
>  };
> diff --git a/tests/test-qemu-opts.c b/tests/test-qemu-opts.c
> index 297ffe79dd..a74c475773 100644
> --- a/tests/test-qemu-opts.c
> +++ b/tests/test-qemu-opts.c
> @@ -77,6 +77,7 @@ static QemuOptsList opts_list_02 = {
>  static QemuOptsList opts_list_03 = {
>  .name = "opts_list_03",
>  .implied_opt_name = "implied",
> +.allow_flag_options = true,
>  .head = QTAILQ_HEAD_INITIALIZER(opts_list_03.head),
>  .desc = {
>  /* no elements => accept any params */

Can you point me to the tests this hunk affects?

Do we have test coverage for boolean sugar with both values of
allow_flag_options?

If no, why is that okay?

> diff --git a/ui/spice-core.c b/ui/spice-core.c
> index eea52f5389..08f834fa41 100644
> --- a/ui/spice-core.c
> +++ b/ui/spice-core.c
> @@ -397,6 +397,7 @@ static SpiceChannelList *qmp_query_spice_channels(void)
>  
>  static QemuOptsList qemu_spice_opts = {
>  .name = "spice",
> +.allow_flag_options = true, /* disable-agent-file-xfer, sasl, unix, etc. 
> */
>  .head = QTAILQ_HEAD_INITIALIZER(qemu_spice_opts.head),
>  .merge_lists = true,
>  .desc = {
> diff --git a/util/qemu-option.c b/util/qemu-option.c
> index 61fc96f9dd..858860377b 100644
> --- a/util/qemu-option.c
> +++ b/util/qemu-option.c
> @@ -763,10 +763,12 @@ void qemu_opts_print(QemuOpts *opts, const char 
> *separator)
>  
>  static const char *get_opt_name_value(const char *params,
>const char *firstname,
> +  bool warn_on_flag,

Two callers pass false, 

Re: [PATCH] CODING_STYLE.rst: Be less strict about 80 character limit

2020-11-06 Thread Michael S. Tsirkin
On Fri, Nov 06, 2020 at 11:29:40AM +, Peter Maydell wrote:
> Relax the wording about line lengths a little bit; this goes with the
> checkpatch changes to warn at 100 characters rather than 80.
> 
> (Compare the Linux kernel commit bdc48fa11e46f8; our coding style is
> not theirs, but the rationale is good and applies to us too.)
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Michael S. Tsirkin 

> ---
>  CODING_STYLE.rst | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/CODING_STYLE.rst b/CODING_STYLE.rst
> index 8b13ef0669e..7bf4e39d487 100644
> --- a/CODING_STYLE.rst
> +++ b/CODING_STYLE.rst
> @@ -85,8 +85,13 @@ Line width
>  Lines should be 80 characters; try not to make them longer.
>  
>  Sometimes it is hard to do, especially when dealing with QEMU subsystems
> -that use long function or symbol names.  Even in that case, do not make
> -lines much longer than 80 characters.
> +that use long function or symbol names. If wrapping the line at 80 columns
> +is obviously less readable and more awkward, prefer not to wrap it; better
> +to have an 85 character line than one which is awkwardly wrapped.
> +
> +Even in that case, try not to make lines much longer than 80 characters.
> +(The checkpatch script will warn at 100 characters, but this is intended
> +as a guard against obviously-overlength lines, not a target.)
>  
>  Rationale:
>  
> -- 
> 2.20.1




Re: [PATCH 1/1] Change the order of g_free(info) and tracepoint

2020-11-06 Thread Alex Bennée


Kirti Wankhede  writes:

> Fixes Coverity issue:
> CID 1436126:  Memory - illegal accesses  (USE_AFTER_FREE)
>
> Fixes: a9e271ec9b36 ("vfio: Add migration region initialization and finalize
> function")
>
> Signed-off-by: Kirti Wankhede 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [PATCH] scripts/checkpatch.pl: Modify the line length limit of the code

2020-11-06 Thread Peter Maydell
On Fri, 6 Nov 2020 at 16:08, Markus Armbruster  wrote:
> Peter Maydell  writes:
> > Personally I just don't think checkpatch should be nudging people
> > into folding 85-character lines, especially when there are
> > multiple very similar lines in a row and only one would get
> > folded, eg the prototypes in target/arm/helper.h -- some of
> > these just edge beyond 80 characters and I think wrapping them
> > is clearly worse for readability.
>
> The warning's intent is "are you sure this line is better not broken?"
> The problem is people treating it as an order that absolves them from
> using good judgement instead.
>
> I propose to fix it by phrasing the warning more clearly.  Instead of
>
> WARNING: line over 80 characters
>
> we could say
>
> WARNING: line over 80 characters
> Please examine the line, and use your judgement to decide whether
> it should be broken.

I would suggest that for a line over 80 characters and less than
85 characters, the answer is going to be "better not broken"
a pretty high percentage of the time; that is, the warning
has too many false positives, and we should tune it to have fewer.

And the lure of "produce no warnings" is a strong one, so
we should be at least cautious about what our tooling is
nudging us into doing.

thanks
-- PMM



Re: [PATCH] scripts/checkpatch.pl: Modify the line length limit of the code

2020-11-06 Thread Markus Armbruster
Peter Maydell  writes:

> On Fri, 6 Nov 2020 at 14:08, Philippe Mathieu-Daudé  wrote:
>> Can we keep the error please? Maybe 132 is the next display logical
>> limit once we increased the warning from 80 to 100.
>>
>> I understand hardware evolved, we have larger displays with better
>> resolution and can fit more characters in a line.
[...]
>
> Personally I just don't think checkpatch should be nudging people
> into folding 85-character lines, especially when there are
> multiple very similar lines in a row and only one would get
> folded, eg the prototypes in target/arm/helper.h -- some of
> these just edge beyond 80 characters and I think wrapping them
> is clearly worse for readability.

The warning's intent is "are you sure this line is better not broken?"
The problem is people treating it as an order that absolves them from
using good judgement instead.

I propose to fix it by phrasing the warning more clearly.  Instead of

WARNING: line over 80 characters

we could say

WARNING: line over 80 characters
Please examine the line, and use your judgement to decide whether
it should be broken.

>   If we don't want people
> sending us "style fix" patches which wrap >80 char lines
> (which I think we do not) then we shouldn't have checkpatch
> complain about them, because if it does then that's what we get.

I think that's throwing out the baby with the bathwater.

checkpatch's purpose is not guiding inexperienced developers to style
issues they can fix.  It is relieving maintainers of the tedium of
catching and explaining certain kinds of issues patches frequently have.

Neutering checks that have led inexperienced developers to post less
than useful patches may well relieve maintainers of having to reject
such patches.  But it comes at a price: maintainers and contributors lose
automated help with checking useful patches.  I consider that a bad
trade.

We may want to discourage inexperienced contributors from sending us
style fix patches.  Fixing style takes good taste, which develops only
with experience.  Moreover, fixing up style builds only a little
experience.  At best, it exercises "configure; make check" and the patch
submission process.  There are better ways to
get your feet wet.




Re: [PATCH 1/1] Change the order of g_free(info) and tracepoint

2020-11-06 Thread David Edmondson
On Friday, 2020-11-06 at 19:09:24 +0530, Kirti Wankhede wrote:

> Fixes Coverity issue:
> CID 1436126:  Memory - illegal accesses  (USE_AFTER_FREE)
>
> Fixes: a9e271ec9b36 ("vfio: Add migration region initialization and finalize
> function")
>
> Signed-off-by: Kirti Wankhede 

Maybe "fix use after free in vfio_migration_probe" as a summary?

Reviewed-by: David Edmondson 

> ---
>  hw/vfio/migration.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 3ce285ea395d..55261562d4f3 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -897,8 +897,8 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error 
> **errp)
>  goto add_blocker;
>  }
>  
> -g_free(info);
>  trace_vfio_migration_probe(vbasedev->name, info->index);
> +g_free(info);
>  return 0;
>  
>  add_blocker:
> -- 
> 2.7.0

dme.
-- 
I think I waited too long, I'm moving into the dollhouse.



Re: [PATCH v2 00/44] Make qdev static property API usable by any QOM type

2020-11-06 Thread Eduardo Habkost
On Fri, Nov 06, 2020 at 10:45:11AM +0100, Kevin Wolf wrote:
> Am 04.11.2020 um 16:59 hat Eduardo Habkost geschrieben:
> > This series refactor the qdev property code so the static
> > property system can be used by any QOM type.  As an example, at
> > the end of the series some properties in TYPE_MACHINE are
> > converted to static properties to demonstrate the new API.
> > 
> > Changes v1 -> v2
> > 
> > 
> > * Rename functions and source files to call the new feature
> >   "field property" instead of "static property"
> > 
> > * Change the API signature from an array-based interface similar
> >   to device_class_set_props() to a object_property_add()-like
> >   interface.
> > 
> >   This means instead of doing this:
> > 
> > object_class_property_add_static_props(oc, my_array_of_props);
> > 
> >   properties are registered like this:
> > 
> > object_class_property_add_field(oc, "property-name"
> > PROP_XXX(MyState, my_field),
> > prop_allow_set_always);
> > 
> >   where PROP_XXX is a PROP_* macro like PROP_STRING, PROP_BOOL,
> >   etc.
> 
> In comparison, I really like the resulting code from the array based
> interface in v1 better.
> 
> I think it's mostly for two reasons: First, the array is much more
> compact and easier to read. And maybe even more importantly, you know
> it's static data and only static data. The function based interface can
> be mixed with other code or the parameter list can contain calls to
> other functions with side effects, so there are a lot more opportunities
> for surprises.

This is a really good point, and I strongly agree with it.
Letting code do funny tricks at runtime is one of the reasons QOM
properties became hard to introspect.

> 
> What I didn't like about the v1 interface is that there is still a
> separate object_class_property_set_description() for each property, but
> I think that could have been fixed by moving the description to the
> definitions in the array, too.

This would be very easy to implement.

> 
> On Fri, Oct 30, 2020 at 06:10:34PM +0100, Paolo Bonzini wrote:
> > On 29/10/20 23:02, Eduardo Habkost wrote:
> > > +static Property machine_props[] = {
> > > +DEFINE_PROP_STRING("kernel", MachineState, kernel_filename),
> > > +DEFINE_PROP_STRING("initrd", MachineState, initrd_filename),
> > > +DEFINE_PROP_STRING("append", MachineState, kernel_cmdline),
> > > +DEFINE_PROP_STRING("dtb", MachineState, dtb),
> > > +DEFINE_PROP_STRING("dumpdtb", MachineState, dumpdtb),
> > > +DEFINE_PROP_STRING("dt-compatible", MachineState, dt_compatible),
> > > +DEFINE_PROP_STRING("firmware", MachineState, firmware),
> > > +DEFINE_PROP_STRING("memory-backend", MachineState, ram_memdev_id),
> > > +DEFINE_PROP_END_OF_LIST(),
> > > +};
> > > +
> >
> > While I think generalizing the _code_ for static properties is obviously
> > a good idea, I am not sure about generalizing the interface for adding them.
> >
> > The reason is that we already have a place for adding properties in
> > class_init, and having a second makes things "less local", so to speak.
> 
> As long as you have the function call to apply the properties array in
> .class_init, it should be obvious enough what you're doing.
> 
> Of course, I think we should refrain from mixing both styles in a single
> object, but generally speaking the array feels so much better that I
> don't think we should reject it just because QOM only had a different
> interface so far (and the property array is preexisting in qdev, too, so
> we already have differences between objects - in fact, the majority of
> objects is probably qdev today).

This is also a strong argument.  The QEMU code base has ~500
matches for "object*_property_add*" calls, and ~2100 for
"DEFINE_PROP*".

Converting qdev arrays to object_class_property_add_*() calls
would be a huge effort with no gains.  The end result would be
two different APIs for registering class field properties
coexisting for a long time, and people still confused on what's
the preferred API.

-- 
Eduardo




Re: [PATCH] qtest: Fix bad printf format specifiers

2020-11-06 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> On 11/6/20 7:33 AM, Markus Armbruster wrote:
[...]
>> In other words "%" PRIu32 is just a less legible alias for "%u" in all
>> cases that matter.
>
> Can we add a checkpatch rule to avoid using 'PRI[dux]32' format,
> so it is clear for everyone?

I guess we could, but *I* can't: -ENOTIME, sorry.




Re: Migrating to the gitlab issue tracker

2020-11-06 Thread Laszlo Ersek
On 11/04/20 18:19, Daniel P. Berrangé wrote:

> This just sounds like fairly niche requirements for which directly
> subscribing to the project issue tracker will satisfy 99% of the time.

OK.

Laszlo




Re: [PATCH for-5.2 3/4] qtest: escape device name in device-introspect-test

2020-11-06 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 06/11/20 14:15, Markus Armbruster wrote:
>> Paolo Bonzini  writes:
>> 
>>> device-introspect-test uses HMP, so it should escape the device name
>>> properly.  Because of this, a few devices that had commas in their
>>> names were escaping testing.
>>> Signed-off-by: Paolo Bonzini 
>>
>> $ git-grep '\.name *= *"[^"]*,' | cat
>> hw/block/fdc.c:.name  = "SUNW,fdtwo"
>> Any others?
>
> Not that I know, but this is a bug anyway. :)

Yes, but "a few devices" made me curious.




Re: [PATCH 0/2] Increase amount of data for monitor to read

2020-11-06 Thread Andrey Shinkevich

Please exclude this address when replying:

jc...@redhat.com

Andrey



Re: [PATCH v2 2/7] block: add bdrv_replace_node_common()

2020-11-06 Thread Alberto Garcia
On Fri 06 Nov 2020 01:42:36 PM CET, Vladimir Sementsov-Ogievskiy wrote:
> Add new parameter to bdrv_replace_node(): auto_skip. With
> auto_skip=false we'll have stricter behavior: update _all_ from
> parents or fail. New behaviour will be used in the following commit in
> block.c, so keep original function name as public interface.
>
> Note: new error message is a bit funny in contrast with further
> "Cannot" in case of frozen child, but we'd better keep some difference
> to make it possible to distinguish one from another on failure. Still,
> actually we'd better refactor should_update_child() call to distinguish
> also different kinds of "should not". Let's do it later.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH v2 0/4] linux/sparc: more get/set_context fixes

2020-11-06 Thread Peter Maydell
On Fri, 6 Nov 2020 at 15:27, Peter Maydell  wrote:
>
> Based-on: 20201105212314.9628-1-peter.mayd...@linaro.org
> ("[PATCH for-5.2 0/3] linux-user: fix various sparc64 guest bugs")
>
> This series fixes a few more issues with our sparc linux-user
> sparc64_get_context() and sparc64_set_context() implementation:
>  * we weren't handling FPU regs correctly, and also the way
>we coded the handling triggered Coverity warnings
>  * some stray pointless error checks
>  * we shouldn't restore %g7 in set_context
>  * we weren't saving and restoring tstate correctly

The 'v2' in the subject tag is wrong, incidentally; this is
the first version of this series :-)

-- PMM



[PATCH v2 2/4] linux-user/sparc: Remove unneeded checks of 'err' from sparc64_get_context()

2020-11-06 Thread Peter Maydell
Unlike the kernel macros, our __get_user() and __put_user() do not
return a failure code.  Kernel code typically has a style of
  err |= __get_user(...); err |= __get_user(...);
and then checking err at the end.  In sparc64_get_context() our
version of the code dropped the accumulating into err but left the
"if (err) goto do_sigsegv" checks, which will never be taken. Delete
unnecessary if()s.

Signed-off-by: Peter Maydell 
---
 linux-user/sparc/signal.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/linux-user/sparc/signal.c b/linux-user/sparc/signal.c
index e661a769cb1..43dcd137f51 100644
--- a/linux-user/sparc/signal.c
+++ b/linux-user/sparc/signal.c
@@ -555,8 +555,6 @@ void sparc64_get_context(CPUSPARCState *env)
 for (i = 0; i < TARGET_NSIG_WORDS; i++, dst++, src++) {
 __put_user(*src, dst);
 }
-if (err)
-goto do_sigsegv;
 }
 
 /* XXX: tstate must be saved properly */
@@ -598,8 +596,6 @@ void sparc64_get_context(CPUSPARCState *env)
  * hidden behind an "if (fenab)" where fenab is always 0).
  */
 
-if (err)
-goto do_sigsegv;
 unlock_user_struct(ucp, ucp_addr, 1);
 return;
 do_sigsegv:
-- 
2.20.1




[PATCH v2 3/4] linux-user/sparc: Don't restore %g7 in sparc64_set_context()

2020-11-06 Thread Peter Maydell
The kernel does not restore the g7 register in sparc64_set_context();
neither should we. (We still save it in sparc64_get_context().)

Signed-off-by: Peter Maydell 
---
 linux-user/sparc/signal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/sparc/signal.c b/linux-user/sparc/signal.c
index 43dcd137f51..ed32c7abd17 100644
--- a/linux-user/sparc/signal.c
+++ b/linux-user/sparc/signal.c
@@ -447,7 +447,7 @@ void sparc64_set_context(CPUSPARCState *env)
 __get_user(env->gregs[4], (&(*grp)[SPARC_MC_G4]));
 __get_user(env->gregs[5], (&(*grp)[SPARC_MC_G5]));
 __get_user(env->gregs[6], (&(*grp)[SPARC_MC_G6]));
-__get_user(env->gregs[7], (&(*grp)[SPARC_MC_G7]));
+/* Skip g7 as that's the thread register in userspace */
 
 /*
  * Note that unlike the kernel, we didn't need to mess with the
-- 
2.20.1




[PATCH v2 4/4] linux-user/sparc: Handle tstate in sparc64_get/set_context()

2020-11-06 Thread Peter Maydell
Correctly implement save/restore of the tstate field in
sparc64_get_context() and sparc64_set_context():
 * Don't use the CWP value from the guest in set_context
 * Construct and save a tstate value rather than leaving
   it as zero in get_context

To do this we factor out the "calculate TSTATE value from CPU state"
code from sparc_cpu_do_interrupt() into its own sparc64_tstate()
function; that in turn requires us to move some of the function
prototypes out from inside a CPU_NO_IO_DEFS ifdef guard.

Signed-off-by: Peter Maydell 
---
 target/sparc/cpu.h  | 24 
 linux-user/sparc/signal.c   |  7 +++
 target/sparc/int64_helper.c |  5 +
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/target/sparc/cpu.h b/target/sparc/cpu.h
index 277254732b9..4b2290650be 100644
--- a/target/sparc/cpu.h
+++ b/target/sparc/cpu.h
@@ -608,10 +608,6 @@ target_ulong cpu_get_psr(CPUSPARCState *env1);
 void cpu_put_psr(CPUSPARCState *env1, target_ulong val);
 void cpu_put_psr_raw(CPUSPARCState *env1, target_ulong val);
 #ifdef TARGET_SPARC64
-target_ulong cpu_get_ccr(CPUSPARCState *env1);
-void cpu_put_ccr(CPUSPARCState *env1, target_ulong val);
-target_ulong cpu_get_cwp64(CPUSPARCState *env1);
-void cpu_put_cwp64(CPUSPARCState *env1, int cwp);
 void cpu_change_pstate(CPUSPARCState *env1, uint32_t new_pstate);
 void cpu_gl_switch_gregs(CPUSPARCState *env, uint32_t new_gl);
 #endif
@@ -829,4 +825,24 @@ static inline bool tb_am_enabled(int tb_flags)
 #endif
 }
 
+#ifdef TARGET_SPARC64
+/* win_helper.c */
+target_ulong cpu_get_ccr(CPUSPARCState *env1);
+void cpu_put_ccr(CPUSPARCState *env1, target_ulong val);
+target_ulong cpu_get_cwp64(CPUSPARCState *env1);
+void cpu_put_cwp64(CPUSPARCState *env1, int cwp);
+
+static inline uint64_t sparc64_tstate(CPUSPARCState *env)
+{
+uint64_t tstate = (cpu_get_ccr(env) << 32) |
+((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
+cpu_get_cwp64(env);
+
+if (env->def.features & CPU_FEATURE_GL) {
+tstate |= (env->gl & 7ULL) << 40;
+}
+return tstate;
+}
+#endif
+
 #endif
diff --git a/linux-user/sparc/signal.c b/linux-user/sparc/signal.c
index ed32c7abd17..a6c7c7664a2 100644
--- a/linux-user/sparc/signal.c
+++ b/linux-user/sparc/signal.c
@@ -438,9 +438,9 @@ void sparc64_set_context(CPUSPARCState *env)
 env->npc = npc;
 __get_user(env->y, &((*grp)[SPARC_MC_Y]));
 __get_user(tstate, &((*grp)[SPARC_MC_TSTATE]));
+/* Honour TSTATE_ASI, TSTATE_ICC and TSTATE_XCC only */
 env->asi = (tstate >> 24) & 0xff;
-cpu_put_ccr(env, tstate >> 32);
-cpu_put_cwp64(env, tstate & 0x1f);
+cpu_put_ccr(env, (tstate >> 32) & 0xff);
 __get_user(env->gregs[1], (&(*grp)[SPARC_MC_G1]));
 __get_user(env->gregs[2], (&(*grp)[SPARC_MC_G2]));
 __get_user(env->gregs[3], (&(*grp)[SPARC_MC_G3]));
@@ -557,8 +557,7 @@ void sparc64_get_context(CPUSPARCState *env)
 }
 }
 
-/* XXX: tstate must be saved properly */
-//__put_user(env->tstate, &((*grp)[SPARC_MC_TSTATE]));
+__put_user(sparc64_tstate(env), &((*grp)[SPARC_MC_TSTATE]));
 __put_user(env->pc, &((*grp)[SPARC_MC_PC]));
 __put_user(env->npc, &((*grp)[SPARC_MC_NPC]));
 __put_user(env->y, &((*grp)[SPARC_MC_Y]));
diff --git a/target/sparc/int64_helper.c b/target/sparc/int64_helper.c
index f3e7f32de61..735668f5006 100644
--- a/target/sparc/int64_helper.c
+++ b/target/sparc/int64_helper.c
@@ -131,9 +131,7 @@ void sparc_cpu_do_interrupt(CPUState *cs)
 }
 tsptr = cpu_tsptr(env);
 
-tsptr->tstate = (cpu_get_ccr(env) << 32) |
-((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
-cpu_get_cwp64(env);
+tsptr->tstate = sparc64_tstate(env);
 tsptr->tpc = env->pc;
 tsptr->tnpc = env->npc;
 tsptr->tt = intno;
@@ -148,7 +146,6 @@ void sparc_cpu_do_interrupt(CPUState *cs)
 }
 
 if (env->def.features & CPU_FEATURE_GL) {
-tsptr->tstate |= (env->gl & 7ULL) << 40;
 cpu_gl_switch_gregs(env, env->gl + 1);
 env->gl++;
 }
-- 
2.20.1




[PATCH v2 0/4] linux/sparc: more get/set_context fixes

2020-11-06 Thread Peter Maydell
Based-on: 20201105212314.9628-1-peter.mayd...@linaro.org
("[PATCH for-5.2 0/3] linux-user: fix various sparc64 guest bugs")

This series fixes a few more issues with our sparc linux-user
sparc64_get_context() and sparc64_set_context() implementation:
 * we weren't handling FPU regs correctly, and also the way
   we coded the handling triggered Coverity warnings
 * some stray pointless error checks
 * we shouldn't restore %g7 in set_context
 * we weren't saving and restoring tstate correctly

My main aim here was to deal with the Coverity errors, but
the rest are things I noticed while I was working on the
code or which had fixme comments, and I figured I'd fix
them while the code was fresh in my mind.

thanks
-- PMM

Peter Maydell (4):
  linux-user/sparc: Correct sparc64_get/set_context() FPU handling
  linux-user/sparc: Remove unneeded checks of 'err' from
sparc64_get_context()
  linux-user/sparc: Don't restore %g7 in sparc64_set_context()
  linux-user/sparc: Handle tstate in sparc64_get/set_context()

 target/sparc/cpu.h  | 28 +---
 linux-user/sparc/signal.c   | 87 -
 target/sparc/int64_helper.c |  5 +--
 3 files changed, 71 insertions(+), 49 deletions(-)

-- 
2.20.1



