Re: [PATCH] tests/vm: update openbsd to release 6.8
ping.

On 10/27/2020 6:22 AM, Brad Smith wrote:
> On Tue, Oct 27, 2020 at 11:05:20AM +0100, Philippe Mathieu-Daudé wrote:
> > On 10/27/20 6:30 AM, Brad Smith wrote:
> > > tests/vm: update openbsd to release 6.8
> > >
> > > A double dash at the end of a package name removes ambiguity when the
> > > intent is to install a non-FLAVORed package.
> > >
> > > Signed-off-by: Brad Smith
> > > Reviewed-by: Gerd Hoffmann
> > > Tested-by: Gerd Hoffmann
> > > Reviewed-by: Philippe Mathieu-Daudé
> >
> > I confirm Brad sent us this patch off-list, and
> > - our review comments are addressed,
> > - the tags are correct.
> >
> > The patch format itself seems broken... Like a copy/paste into an
> > email client...
>
> Well, git diff vs a format-patch.
>
> Subject: [PATCH] tests/vm: update openbsd to release 6.8
>
> A double dash at the end of a package name removes ambiguity when the
> intent is to install a non-FLAVORed package.
>
> Signed-off-by: Brad Smith
> Reviewed-by: Gerd Hoffmann
> Tested-by: Gerd Hoffmann
> Reviewed-by: Philippe Mathieu-Daudé
> ---
>  tests/vm/openbsd | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/tests/vm/openbsd b/tests/vm/openbsd
> index 8356646f21..5ffa4f1b37 100755
> --- a/tests/vm/openbsd
> +++ b/tests/vm/openbsd
> @@ -22,8 +22,8 @@ class OpenBSDVM(basevm.BaseVM):
>      name = "openbsd"
>      arch = "x86_64"
>
> -    link = "https://cdn.openbsd.org/pub/OpenBSD/6.6/amd64/install66.iso"
> -    csum = "b22e63df56e6266de6bbeed8e9be0fbe9ee2291551c5bc03f3cc2e4ab9436ee3"
> +    link = "https://cdn.openbsd.org/pub/OpenBSD/6.8/amd64/install68.iso"
> +    csum = "47e291fcc2d0c1a8ae0b66329f040b33af755b6adbd21739e20bb5ad56f62b6c"
>      size = "20G"
>
>      pkgs = [
>          # tools
> @@ -36,10 +36,10 @@ class OpenBSDVM(basevm.BaseVM):
>          "bash",
>          "gmake",
>          "gsed",
> -        "gettext",
> +        "gettext-tools",
>
>          # libs: usb
> -        "libusb1",
> +        "libusb1--",
>
>          # libs: crypto
>          "gnutls",
Re: [PATCH v2 11/11] qapi/introspect.py: Add docstring to _tree_to_qlit
On Mon, Oct 26, 2020 at 03:42:51PM -0400, John Snow wrote:
> Signed-off-by: John Snow
> ---

Not a big deal, but maybe move this to an earlier position in the
series? IMO it'd make other reviewers' lives easier.

Either way:

Reviewed-by: Cleber Rosa
Re: [PATCH v2 10/11] qapi/introspect.py: improve readability of _tree_to_qlit
On Mon, Oct 26, 2020 at 03:42:50PM -0400, John Snow wrote:
> Subjective, but I find getting rid of the comprehensions helps. Also,
> divide the sections into scalar and non-scalar sections, and remove
> old-style string formatting.
>

It's certainly a matter of picking your favorite poison... but for the
most part I think this is an improvement...

> Signed-off-by: John Snow
> ---
>  scripts/qapi/introspect.py | 37 +++++++++++++++++----------------
>  1 file changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/scripts/qapi/introspect.py b/scripts/qapi/introspect.py
> index a261e402d69..d4f28485ba5 100644
> --- a/scripts/qapi/introspect.py
> +++ b/scripts/qapi/introspect.py
> @@ -100,7 +100,7 @@ def indent(level: int) -> str:
>
>          ret = ''
>          if obj.comment:
> -            ret += indent(level) + '/* %s */\n' % obj.comment
> +            ret += indent(level) + f"/* {obj.comment} */\n"
>          if obj.ifcond:
>              ret += gen_if(obj.ifcond)
>          ret += _tree_to_qlit(obj.value, level, suppress_first_indent)
> @@ -111,31 +111,36 @@ def indent(level: int) -> str:
>      ret = ''
>      if not suppress_first_indent:
>          ret += indent(level)
> +
> +    # Scalars:
>      if obj is None:
>          ret += 'QLIT_QNULL'
>      elif isinstance(obj, str):
> -        ret += 'QLIT_QSTR(' + to_c_string(obj) + ')'
> +        ret += f"QLIT_QSTR({to_c_string(obj)})"
> +    elif isinstance(obj, bool):
> +        ret += "QLIT_QBOOL({:s})".format(str(obj).lower())
> +
> +    # Non-scalars:
>      elif isinstance(obj, list):
> -        elts = [_tree_to_qlit(elt, level + 1).strip('\n')
> -                for elt in obj]
> -        elts.append(indent(level + 1) + "{}")
>          ret += 'QLIT_QLIST(((QLitObject[]) {\n'
> -        ret += '\n'.join(elts) + '\n'
> +        for value in obj:
> +            ret += _tree_to_qlit(value, level + 1).strip('\n') + '\n'
> +        ret += indent(level + 1) + '{}\n'
>          ret += indent(level) + '}))'
>      elif isinstance(obj, dict):
> -        elts = []
> -        for key, value in sorted(obj.items()):
> -            elts.append(indent(level + 1) + '{ %s, %s }' %
> -                        (to_c_string(key),
> -                         _tree_to_qlit(value, level + 1, True)))
> -        elts.append(indent(level + 1) + '{}')
>          ret += 'QLIT_QDICT(((QLitDictEntry[]) {\n'
> -        ret += ',\n'.join(elts) + '\n'
> +        for key, value in sorted(obj.items()):
> +            ret += indent(level + 1) + "{{ {:s}, {:s} }},\n".format(
> +                to_c_string(key),
> +                _tree_to_qlit(value, level + 1, suppress_first_indent=True)
> +            )
> +        ret += indent(level + 1) + '{}\n'
>          ret += indent(level) + '}))'
> -    elif isinstance(obj, bool):
> -        ret += 'QLIT_QBOOL(%s)' % ('true' if obj else 'false')
>      else:
> -        assert False  # not implemented
> +        raise NotImplementedError(

This is an improvement, but doesn't fall into the *readability* bucket.
IMO it's worth splitting it out.

- Cleber.
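The refactoring under discussion replaces comprehensions with plain loops in a recursive renderer that turns a Python tree into C `QLit` initializer text. A minimal, self-contained sketch of that rendering style (`indent` and `to_c_string` here are simplified stand-ins for the real helpers in `scripts/qapi/introspect.py`, and dict handling is omitted for brevity):

```python
def indent(level: int) -> str:
    # Four spaces per nesting level, as in the generated C output.
    return level * 4 * ' '

def to_c_string(s: str) -> str:
    # Simplified C string quoting (the real helper handles more escapes).
    return '"' + s.replace('\\', r'\\').replace('"', r'\"') + '"'

def tree_to_qlit(obj, level=0):
    """Render a (scalar or list) tree node as a QLIT_* initializer."""
    ret = indent(level)
    # Scalars:
    if obj is None:
        ret += 'QLIT_QNULL'
    elif isinstance(obj, bool):
        ret += f"QLIT_QBOOL({str(obj).lower()})"
    elif isinstance(obj, str):
        ret += f"QLIT_QSTR({to_c_string(obj)})"
    # Non-scalars (a plain loop instead of a comprehension):
    elif isinstance(obj, list):
        ret += 'QLIT_QLIST(((QLitObject[]) {\n'
        for value in obj:
            ret += tree_to_qlit(value, level + 1) + ',\n'
        ret += indent(level) + '}))'
    else:
        raise NotImplementedError(type(obj).__name__)
    return ret

print(tree_to_qlit(['enum', True, None]))
```

Note the `bool` check must come before any `int`-accepting branch in the real code, since `bool` is a subclass of `int` in Python.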
Re: [PATCH v2 09/11] qapi/introspect.py: create a typed 'Annotated' data structure
On Mon, Oct 26, 2020 at 03:42:49PM -0400, John Snow wrote:
> This replaces _make_tree with Annotated(). By creating it as a generic
> container, we can more accurately describe the exact nature of this
> particular value. i.e., each Annotated object is actually an
> Annotated<T>, describing its contained value.
>
> This adds stricter typing to Annotated nodes and extra annotated
> information. It also replaces a check of "isinstance tuple" with the
> much more explicit "isinstance Annotated" which is guaranteed not to
> break if a tuple is accidentally introduced into the type tree. (Perhaps
> as a result of a bad conversion from a list.)
>
> Signed-off-by: John Snow
> ---
>  scripts/qapi/introspect.py | 97 +++++++++++++++++++-------------------
>  1 file changed, 48 insertions(+), 49 deletions(-)
>
> diff --git a/scripts/qapi/introspect.py b/scripts/qapi/introspect.py
> index a0978cb3adb..a261e402d69 100644
> --- a/scripts/qapi/introspect.py
> +++ b/scripts/qapi/introspect.py
> @@ -13,12 +13,13 @@
>  from typing import (
>      Any,
>      Dict,
> +    Generic,
> +    Iterable,
>      List,
>      Optional,
>      Sequence,
> -    Tuple,
> +    TypeVar,
>      Union,
> -    cast,
>  )
>
>  from .common import (
> @@ -63,50 +64,48 @@
>  _scalar = Union[str, bool, None]
>  _nonscalar = Union[Dict[str, _stub], List[_stub]]
>  _value = Union[_scalar, _nonscalar]
> -TreeValue = Union[_value, 'Annotated']
> +TreeValue = Union[_value, 'Annotated[_value]']
>
>  # This is just an alias for an object in the structure described above:
>  _DObject = Dict[str, object]
>
> -# Represents the annotations themselves:
> -Annotations = Dict[str, object]
>
> -# Represents an annotated node (of some kind).
> -Annotated = Tuple[_value, Annotations]
> +_AnnoType = TypeVar('_AnnoType', bound=TreeValue)

Here it becomes much harder to keep the suggestions I made on patch 5
because of forward and backward references.

Reviewed-by: Cleber Rosa
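The idea of replacing an anonymous 2-tuple with a typed generic container can be sketched as follows. This is a hypothetical simplification, not the exact class from the series: the field names (`value`, `ifcond`, `comment`) follow the surrounding discussion, but the real definition lives in `scripts/qapi/introspect.py`:

```python
from typing import Generic, Optional, Sequence, TypeVar

T = TypeVar('T')

class Annotated(Generic[T]):
    """Wrap a value of type T together with its annotations (a sketch)."""

    def __init__(self, value: T, ifcond: Sequence[str] = (),
                 comment: Optional[str] = None) -> None:
        self.value = value
        self.comment = comment
        self.ifcond = tuple(ifcond)

# An "isinstance Annotated" check is explicit, unlike "isinstance tuple",
# so a stray tuple in the tree can no longer be mistaken for a node:
node = Annotated({'name': 'MyType'}, ifcond=('defined(CONFIG_FOO)',))
assert isinstance(node, Annotated)
assert not isinstance(('value', {}), Annotated)
assert node.value['name'] == 'MyType'
```

The payoff is that `Annotated[_value]` documents exactly what the wrapper contains, which a bare `Tuple[_value, Annotations]` alias cannot.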
Re: [PATCH v2 08/11] qapi/introspect.py: replace 'extra' dict with 'comment' argument
On Mon, Oct 26, 2020 at 03:42:48PM -0400, John Snow wrote:
> This is only used to pass in a dictionary with a comment already set, so
> skip the runaround and just accept the comment.
>
> Signed-off-by: John Snow
> ---

Reviewed-by: Cleber Rosa
Re: [PATCH v2 07/11] qapi/introspect.py: Unify return type of _make_tree()
On Mon, Oct 26, 2020 at 03:42:47PM -0400, John Snow wrote:
> Returning two different types conditionally can be complicated to
> type. Let's always return a tuple for consistency.

This seems like a standalone change.

> Prohibit the use of annotations with dict-values in this circumstance.
> It can be implemented later if and when the need for it arises.

And this seems like another change.

- Cleber.
Re: [PATCH v2 06/11] qapi/introspect.py: add _gen_features helper
On Mon, Oct 26, 2020 at 03:42:46PM -0400, John Snow wrote:
> _make_tree might receive a dict or some other type. Adding features
> information should arguably be performed by the caller at such a time
> when we know the type of the object and don't have to re-interrogate it.
>
> Signed-off-by: John Snow
> ---

Reviewed-by: Cleber Rosa
Re: [PATCH v2 05/11] qapi/introspect.py: add preliminary type hint annotations
On Mon, Oct 26, 2020 at 03:42:45PM -0400, John Snow wrote:
> The typing of _make_tree and friends is a bit involved, but it can be
> done with some stubbed out types and a bit of elbow grease. The
> forthcoming patches attempt to make some simplifications, but having the
> type hints in advance may aid in review of subsequent patches.
>
> Some notes on the abstract types used at this point, and what they
> represent:
>
> - TreeValue represents any object in the type tree. _make_tree is an
>   optional call -- not every node in the final type tree will have been
>   passed to _make_tree, so this type encompasses not only what is passed
>   to _make_tree (dicts, strings) or returned from it (dicts, strings, a
>   2-tuple), but any recursive value for any of the dicts passed to
>   _make_tree -- which includes lists, strings, integers, null constants,
>   and so on.
>
> - _DObject is a type alias I use to mean "A JSON-style object,
>   represented as a Python dict." There is no "JSON" type in Python, they
>   are converted natively to recursively nested dicts and lists, with
>   leaf values of str, int, float, None, True/False and so on. This type
>   structure is not possible to accurately portray in mypy yet, so a
>   placeholder is used.
>
>   In this case, _DObject is being used to refer to SchemaInfo-like
>   structures as defined in qapi/introspect.json, OR any sub-object
>   values they may reference. We don't have strong typing available for
>   those, so a generic alternative is used.
>
> - Extra refers explicitly to the dict containing "extra" information
>   about a node in the tree. mypy does not offer per-key typing for dicts
>   in Python 3.6, so this is the best we can do here.
>
> - Annotated refers to (one of) the return types of _make_tree:
>   It represents a 2-tuple of (TreeValue, Extra).
>
> Signed-off-by: Eduardo Habkost
> Signed-off-by: John Snow
> ---
>  scripts/qapi/introspect.py | 157 +++++++++++++++++++++++++----------
>  scripts/qapi/mypy.ini      |   5 --
>  scripts/qapi/schema.py     |   2 +-
>  3 files changed, 121 insertions(+), 43 deletions(-)
>
> diff --git a/scripts/qapi/introspect.py b/scripts/qapi/introspect.py
> index 63f721ebfb6..803288a64e7 100644
> --- a/scripts/qapi/introspect.py
> +++ b/scripts/qapi/introspect.py
> @@ -10,7 +10,16 @@
>  See the COPYING file in the top-level directory.
>  """
>
> -from typing import Optional, Sequence, cast
> +from typing import (
> +    Any,
> +    Dict,
> +    List,
> +    Optional,
> +    Sequence,
> +    Tuple,
> +    Union,
> +    cast,
> +)
>
>  from .common import (
>      c_name,
> @@ -20,13 +29,56 @@
>  )
>  from .gen import QAPISchemaMonolithicCVisitor
>  from .schema import (
> +    QAPISchema,
>      QAPISchemaArrayType,
>      QAPISchemaBuiltinType,
> +    QAPISchemaEntity,
> +    QAPISchemaEnumMember,
> +    QAPISchemaFeature,
> +    QAPISchemaObjectType,
> +    QAPISchemaObjectTypeMember,
>      QAPISchemaType,
> +    QAPISchemaVariant,
> +    QAPISchemaVariants,
>  )
> +from .source import QAPISourceInfo
>
>
> -def _make_tree(obj, ifcond, features, extra=None):
> +# This module constructs a tree-like data structure that is used to
> +# generate the introspection information for QEMU. It behaves similarly
> +# to a JSON value.
> +#
> +# A complexity over JSON is that our values may or may not be annotated.
> +#
> +# Un-annotated values may be:
> +#     Scalar: str, bool, None.
> +#     Non-scalar: List, Dict
> +#     _Value = Union[str, bool, None, Dict[str, Value], List[Value]]

Here (and in a few other places) you mention `_Value`, but then define
it as `_value` (lowercase).

> +#
> +# With optional annotations, the type of all values is:
> +# TreeValue = Union[_Value, Annotated[_Value]]
> +#
> +# Sadly, mypy does not support recursive types, so we must approximate this.
> +_stub = Any
> +_scalar = Union[str, bool, None]
> +_nonscalar = Union[Dict[str, _stub], List[_stub]]

Why not use `Any` here instead of declaring/using `_stub`?

> +_value = Union[_scalar, _nonscalar]
> +TreeValue = Union[_value, 'Annotated']
> +

Maybe declare `Annotations` first, then `Annotated`, then *use*
`Annotated` proper here?

- Cleber.
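The "mypy cannot express recursive types" workaround discussed above can be demonstrated on its own. This sketch uses the same alias names as the patch, plus a hypothetical `is_scalar` helper (not in the original) to show the runtime counterpart of the `_scalar` alias:

```python
from typing import Any, Dict, List, Union

# A truly recursive JSON-like alias would have to reference itself:
#     _value = Union[str, bool, None, Dict[str, _value], List[_value]]
# mypy (at the time of this series) rejects that, so the recursion is
# cut with Any via an explicit stub:
_stub = Any                       # stands in for the recursive reference
_scalar = Union[str, bool, None]
_nonscalar = Union[Dict[str, _stub], List[_stub]]
_value = Union[_scalar, _nonscalar]

def is_scalar(v: _value) -> bool:
    """Runtime counterpart of the _scalar alias above (hypothetical helper)."""
    return v is None or isinstance(v, (str, bool))

assert is_scalar('x') and is_scalar(True) and is_scalar(None)
assert not is_scalar(['x']) and not is_scalar({'k': 'v'})
```

The named `_stub` alias (rather than a bare `Any`) is the point Cleber questions: it costs one line but documents *where* the type checker loses precision.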
Re: [Virtio-fs] [qemu-web PATCH v2] Add virtio-fs in OSv overview blog post
From: Dr. David Alan Gilbert

* Fotis Xenakis (fo...@windowslive.com) wrote:
> This post briefly goes over the main points of virtio-fs and OSv, a
> unikernel running under QEMU/KVM and taking advantage of its virtio-fs
> implementation.
>
> Changes since v1:
> - Fixed wording and links, as suggested by Thomas Huth.
> - Added a short example of virtio-fs usage in OSv.
>
> Signed-off-by: Fotis Xenakis
>
> +One central point is OSv's support for booting from virtio-fs: this enables
> +deploying a modified version or a whole new application **without rebuilding**
> +the image, just by adjusting its root file system contents on the host. Last,
> +owing to the DAX window practically providing low-overhead access to the host's
> +page cache, scalability is also expected to excel, with it being a common
> +concern due to the potentially high density of unikernels per host.

> Hi Fotis,

Hello Dave,

> Since I'm not used to unikernels, I'm a little confused by this; I'd
> appreciate some explanation. In your unikernel, does the root
> filesystem just contain data? I mean being a 'unikernel' aren't all
> the binaries and support all linked into the kernel itself?

Short answer: the root file system doesn't contain only data, the
executable can also be loaded from it. Although a unikernel, it supports
both the embedded-in-kernel and separate-filesystem approaches.

OSv is more heavy-weight than most unikernels, in the sense that it
supports a lot of features found in general-purpose OSs. One could just
describe it as a specialized, light-weight OS. On the filesystem side,
it's actually pretty close e.g. to Linux: it has a VFS and multiple
filesystems:

* pseudo-fs (devfs and others)
* ramfs, which is indeed embedded into the kernel
* ZFS and ROFS, on regular block devices
* NFS, virtio-fs

Also like Linux, it initially boots into its initramfs (embedded in the
kernel) and then typically mounts a root file system from a device (ZFS,
ROFS or virtio-fs), before executing the application code.

In case this is not clear, please feel free to ask further!

Fotis

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[PATCH v9 1/1] audio/jack: fix use after free segfault
This change registers a bottom half handler to close the JACK client
connection when a server shutdown signal is received. Without this,
libjack2 attempts to "clean up" old clients and causes a use after free
segfault.

Signed-off-by: Geoffrey McRae
---
 audio/jackaudio.c | 50 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/audio/jackaudio.c b/audio/jackaudio.c
index 1e714b30bc..e00e19061a 100644
--- a/audio/jackaudio.c
+++ b/audio/jackaudio.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 #include "qemu/module.h"
 #include "qemu/atomic.h"
+#include "qemu/main-loop.h"
 #include "qemu-common.h"
 #include "audio.h"

@@ -63,6 +64,7 @@ typedef struct QJackClient {
     QJackState state;
     jack_client_t *client;
     jack_nframes_t freq;
+    QEMUBH *shutdown_bh;

     struct QJack *j;
     int nchannels;
@@ -87,6 +89,7 @@ QJackIn;
 static int qjack_client_init(QJackClient *c);
 static void qjack_client_connect_ports(QJackClient *c);
 static void qjack_client_fini(QJackClient *c);
+QemuMutex qjack_shutdown_lock;

 static void qjack_buffer_create(QJackBuffer *buffer, int channels, int frames)
 {
@@ -306,21 +309,27 @@ static int qjack_xrun(void *arg)
     return 0;
 }

+static void qjack_shutdown_bh(void *opaque)
+{
+    QJackClient *c = (QJackClient *)opaque;
+    qjack_client_fini(c);
+}
+
 static void qjack_shutdown(void *arg)
 {
     QJackClient *c = (QJackClient *)arg;
     c->state = QJACK_STATE_SHUTDOWN;
+    qemu_bh_schedule(c->shutdown_bh);
 }

 static void qjack_client_recover(QJackClient *c)
 {
-    if (c->state == QJACK_STATE_SHUTDOWN) {
-        qjack_client_fini(c);
+    if (c->state != QJACK_STATE_DISCONNECTED) {
+        return;
     }

     /* packets is used simply to throttle this */
-    if (c->state == QJACK_STATE_DISCONNECTED &&
-        c->packets % 100 == 0) {
+    if (c->packets % 100 == 0) {

         /* if enabled then attempt to recover */
         if (c->enabled) {
@@ -489,15 +498,16 @@ static int qjack_init_out(HWVoiceOut *hw, struct audsettings *as,
     QJackOut *jo = (QJackOut *)hw;
     Audiodev *dev = (Audiodev *)drv_opaque;

-    qjack_client_fini(&jo->c);
-
     jo->c.out       = true;
     jo->c.enabled   = false;
     jo->c.nchannels = as->nchannels;
     jo->c.opt       = dev->u.jack.out;

+    jo->c.shutdown_bh = qemu_bh_new(qjack_shutdown_bh, &jo->c);
+
     int ret = qjack_client_init(&jo->c);
     if (ret != 0) {
+        qemu_bh_delete(jo->c.shutdown_bh);
         return ret;
     }
@@ -525,15 +535,16 @@ static int qjack_init_in(HWVoiceIn *hw, struct audsettings *as,
     QJackIn *ji = (QJackIn *)hw;
     Audiodev *dev = (Audiodev *)drv_opaque;

-    qjack_client_fini(&ji->c);
-
     ji->c.out       = false;
     ji->c.enabled   = false;
     ji->c.nchannels = as->nchannels;
     ji->c.opt       = dev->u.jack.in;

+    ji->c.shutdown_bh = qemu_bh_new(qjack_shutdown_bh, &ji->c);
+
     int ret = qjack_client_init(&ji->c);
     if (ret != 0) {
+        qemu_bh_delete(ji->c.shutdown_bh);
         return ret;
     }
@@ -555,7 +566,7 @@ static int qjack_init_in(HWVoiceIn *hw, struct audsettings *as,
     return 0;
 }

-static void qjack_client_fini(QJackClient *c)
+static void qjack_client_fini_locked(QJackClient *c)
 {
     switch (c->state) {
     case QJACK_STATE_RUNNING:
@@ -564,28 +575,40 @@ static void qjack_client_fini(QJackClient *c)
     case QJACK_STATE_SHUTDOWN:
         jack_client_close(c->client);
+        c->client = NULL;
+
+        qjack_buffer_free(&c->fifo);
+        g_free(c->port);
+
+        c->state = QJACK_STATE_DISCONNECTED;
         /* fallthrough */

     case QJACK_STATE_DISCONNECTED:
         break;
     }
+}

-    qjack_buffer_free(&c->fifo);
-    g_free(c->port);
-
-    c->state = QJACK_STATE_DISCONNECTED;
+static void qjack_client_fini(QJackClient *c)
+{
+    qemu_mutex_lock(&qjack_shutdown_lock);
+    qjack_client_fini_locked(c);
+    qemu_mutex_unlock(&qjack_shutdown_lock);
 }

 static void qjack_fini_out(HWVoiceOut *hw)
 {
     QJackOut *jo = (QJackOut *)hw;
     qjack_client_fini(&jo->c);
+
+    qemu_bh_delete(jo->c.shutdown_bh);
 }

 static void qjack_fini_in(HWVoiceIn *hw)
 {
     QJackIn *ji = (QJackIn *)hw;
     qjack_client_fini(&ji->c);
+
+    qemu_bh_delete(ji->c.shutdown_bh);
 }

 static void qjack_enable_out(HWVoiceOut *hw, bool enable)
@@ -662,6 +685,7 @@ static void qjack_info(const char *msg)

 static void register_audio_jack(void)
 {
+    qemu_mutex_init(&qjack_shutdown_lock);
     audio_driver_register(&qjack_driver);
     jack_set_thread_creator(qjack_thread_creator);
     jack_set_error_function(qjack_error);
--
2.20.1
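The fix above defers `jack_client_close()` out of the JACK shutdown callback into QEMU's main loop via a bottom half: the library callback may be invoked from a context where freeing the client is unsafe, so it only records state and schedules the teardown. The shape of that pattern, sketched in Python with a hypothetical queue standing in for the QEMU bottom-half machinery:

```python
import queue

deferred = queue.Queue()          # stands in for the QEMU BH machinery

class Client:
    def __init__(self):
        self.state = 'running'

def on_server_shutdown(client):
    # Called from the library's thread: do NOT free resources here.
    client.state = 'shutdown'
    deferred.put(lambda: finalize(client))   # qemu_bh_schedule() analogue

def finalize(client):
    # Runs later, on the main loop: now it is safe to close and free.
    client.state = 'disconnected'

c = Client()
on_server_shutdown(c)             # callback fires; teardown is only queued
while not deferred.empty():       # main-loop iteration draining bottom halves
    deferred.get()()
assert c.state == 'disconnected'
```

The global mutex added in v9 serves the remaining race this sketch glosses over: the scheduled finalizer and an explicit `qjack_fini_*` call can both reach the cleanup path, so the real code funnels both through one lock.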
[PATCH v8 0/1] audio/jack: fix use after free segfault
v9:
 * switch to using a global shutdown mutex

Geoffrey McRae (1):
  audio/jack: fix use after free segfault

 audio/jackaudio.c | 50 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 13 deletions(-)

--
2.20.1
[PATCH] hw/core/qdev-properties-system: allow bus addresses > 0x1f
The commit bccb20c49df1bd683248a366021973901c11982f introduced an error
in the checking logic that validates the bus addresses for PCI device
addresses, preventing usage of devices via vfio-pci that sit at a bus
address of 0x20 or higher. This patch resolves this by reverting the
checking logic to the original maximum value of 0xff.

Signed-off-by: Geoffrey McRae
---
 hw/core/qdev-properties-system.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index b81a4e8d14..e62644bc69 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -882,7 +882,7 @@ static void set_pci_host_devaddr(Object *obj, Visitor *v, const char *name,
     bus = val;

     p = (char *)e + 1;
-    if (qemu_strtoul(p, &e, 16, &val) < 0 || val > 0x1f || e == p) {
+    if (qemu_strtoul(p, &e, 16, &val) < 0 || val > 0xff || e == p) {
         goto inval;
     }
     if (*e == ':') {
--
2.20.1
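The bug amounts to range-checking one field of a PCI address against the wrong bound. As a hedged illustration (a hypothetical `parse_pci_addr` helper, not the QEMU parser itself), here is what parsing a `bus:slot.func` string with the conventional PCI bounds looks like: bus 0-0xff (8 bits), slot 0-0x1f (5 bits), function 0-7 (3 bits). Using the 5-bit slot bound for the 8-bit bus field rejects any bus above 0x1f, which is exactly the symptom reported:

```python
def parse_pci_addr(s: str):
    """Parse 'bus:slot.func' (hex fields) with explicit PCI range checks."""
    bus_s, _, rest = s.partition(':')
    slot_s, _, func_s = rest.partition('.')
    bus, slot, func = (int(x, 16) for x in (bus_s, slot_s, func_s))
    # Conventional PCI bounds: 8-bit bus, 5-bit slot, 3-bit function.
    if not (0 <= bus <= 0xff and 0 <= slot <= 0x1f and 0 <= func <= 7):
        raise ValueError(f"PCI address out of range: {s}")
    return bus, slot, func

# Bus 0x20 is perfectly legal -- it only exceeds the *slot* bound:
assert parse_pci_addr('20:1f.7') == (0x20, 0x1f, 7)
```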
[PATCH v10 10/12] hw/block/nvme: Support Zone Descriptor Extensions
Zone Descriptor Extension is a label that can be assigned to a zone. It
can be set to an Empty zone and it stays assigned until the zone is
reset.

This commit adds a new optional module property,
"zoned.descr_ext_size". Its value must be a multiple of 64 bytes. If
this value is non-zero, it becomes possible to assign extensions of that
size to any Empty zones. The default value for this property is 0,
therefore setting extensions is disabled by default.

Signed-off-by: Hans Holmberg
Signed-off-by: Dmitry Fomichev
Reviewed-by: Klaus Jensen
Reviewed-by: Niklas Cassel
---
 hw/block/nvme-ns.h    |  8 +++
 hw/block/nvme-ns.c    | 25 ++++
 hw/block/nvme.c       | 54 +++
 hw/block/trace-events |  2 ++
 4 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 421bab0a57..50a6a0e1ac 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -35,6 +35,7 @@ typedef struct NvmeNamespaceParams {
     uint64_t zone_cap_bs;
     uint32_t max_active_zones;
     uint32_t max_open_zones;
+    uint32_t zd_extension_size;
 } NvmeNamespaceParams;

 typedef struct NvmeNamespace {
@@ -58,6 +59,7 @@ typedef struct NvmeNamespace {
     uint64_t        zone_capacity;
     uint64_t        zone_array_size;
     uint32_t        zone_size_log2;
+    uint8_t         *zd_extensions;

     int32_t         nr_open_zones;
     int32_t         nr_active_zones;
@@ -127,6 +129,12 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone)
            st != NVME_ZONE_STATE_OFFLINE;
 }

+static inline uint8_t *nvme_get_zd_extension(NvmeNamespace *ns,
+                                             uint32_t zone_idx)
+{
+    return &ns->zd_extensions[zone_idx * ns->params.zd_extension_size];
+}
+
 static inline void nvme_aor_inc_open(NvmeNamespace *ns)
 {
     assert(ns->nr_open_zones >= 0);
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 2e45838c15..85dc73cf06 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -133,6 +133,18 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp)
         return -1;
     }

+    if (ns->params.zd_extension_size) {
+        if (ns->params.zd_extension_size & 0x3f) {
+            error_setg(errp,
+                "zone descriptor extension size must be a multiple of 64B");
+            return -1;
+        }
+        if ((ns->params.zd_extension_size >> 6) > 0xff) {
+            error_setg(errp, "zone descriptor extension size is too large");
+            return -1;
+        }
+    }
+
     return 0;
 }

@@ -144,6 +156,10 @@ static void nvme_init_zone_state(NvmeNamespace *ns)
     int i;

     ns->zone_array = g_malloc0(ns->zone_array_size);
+    if (ns->params.zd_extension_size) {
+        ns->zd_extensions = g_malloc0(ns->params.zd_extension_size *
+                                      ns->num_zones);
+    }

     QTAILQ_INIT(&ns->exp_open_zones);
     QTAILQ_INIT(&ns->imp_open_zones);
@@ -186,7 +202,8 @@ static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index,
     id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00;

     id_ns_z->lbafe[lba_index].zsze = cpu_to_le64(ns->zone_size);
-    id_ns_z->lbafe[lba_index].zdes = 0;
+    id_ns_z->lbafe[lba_index].zdes =
+        ns->params.zd_extension_size >> 6; /* Units of 64B */

     ns->csi = NVME_CSI_ZONED;
     ns->id_ns.nsze = cpu_to_le64(ns->zone_size * ns->num_zones);
@@ -204,7 +221,8 @@ static void nvme_clear_zone(NvmeNamespace *ns, NvmeZone *zone)

     zone->w_ptr = zone->d.wp;
     state = nvme_get_zone_state(zone);
-    if (zone->d.wp != zone->d.zslba) {
+    if (zone->d.wp != zone->d.zslba ||
+        (zone->d.za & NVME_ZA_ZD_EXT_VALID)) {
         if (state != NVME_ZONE_STATE_CLOSED) {
             trace_pci_nvme_clear_ns_close(state, zone->d.zslba);
             nvme_set_zone_state(zone, NVME_ZONE_STATE_CLOSED);
@@ -301,6 +319,7 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
     if (ns->params.zoned) {
         g_free(ns->id_ns_zoned);
         g_free(ns->zone_array);
+        g_free(ns->zd_extensions);
     }
 }

@@ -332,6 +351,8 @@ static Property nvme_ns_props[] = {
                        params.max_active_zones, 0),
     DEFINE_PROP_UINT32("zoned.max_open", NvmeNamespace,
                        params.max_open_zones, 0),
+    DEFINE_PROP_UINT32("zoned.descr_ext_size", NvmeNamespace,
+                       params.zd_extension_size, 0),
     DEFINE_PROP_END_OF_LIST(),
 };

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index c6b3a5dcf7..e82e3be821 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1703,6 +1703,26 @@ static uint16_t nvme_offline_zone(NvmeNamespace *ns, NvmeZone *zone,
     return NVME_ZONE_INVAL_TRANSITION;
 }

+static uint16_t nvme_set_zd_ext(NvmeNamespace *ns, NvmeZone *zone)
+{
+    uint16_t status;
+    uint8_t state = nvme_get_zone_state(zone);
+
+    if (state == NVME_ZONE_STATE_EMPTY) {
+        nvme_auto_transition_zone(ns, false, true);
+
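The validation above can be summarized: a Zone Descriptor Extension size is valid iff it is zero (disabled) or a multiple of 64 bytes whose size in 64-byte units fits the one-byte ZDES field. A sketch of that rule, using a hypothetical `zdes_from_ext_size` helper (the real checks live in `nvme_calc_zone_geometry` and the `zdes` assignment):

```python
def zdes_from_ext_size(zd_extension_size: int) -> int:
    """Validate a ZD extension size and return the ZDES encoding."""
    if zd_extension_size == 0:
        return 0                      # extensions disabled (the default)
    if zd_extension_size & 0x3f:      # low 6 bits set => not a 64B multiple
        raise ValueError("zone descriptor extension size must be a multiple of 64B")
    zdes = zd_extension_size >> 6     # units of 64 bytes
    if zdes > 0xff:
        raise ValueError("zone descriptor extension size is too large")
    return zdes

assert zdes_from_ext_size(0) == 0
assert zdes_from_ext_size(128) == 2   # 128 bytes = two 64B units
```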
[PATCH v10 09/12] hw/block/nvme: Introduce max active and open zone limits
Add two module properties, "zoned.max_active" and "zoned.max_open" to
control the maximum number of zones that can be active or open. Once
these variables are set to non-default values, these limits are checked
during I/O and Too Many Active or Too Many Open command status is
returned if they are exceeded.

Signed-off-by: Hans Holmberg
Signed-off-by: Dmitry Fomichev
Reviewed-by: Niklas Cassel
---
 hw/block/nvme-ns.h    | 41 +++
 hw/block/nvme-ns.c    | 30 +-
 hw/block/nvme.c       | 94 +++
 hw/block/trace-events |  2 +
 4 files changed, 165 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index d2631ff5a3..421bab0a57 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -33,6 +33,8 @@ typedef struct NvmeNamespaceParams {
     bool     cross_zone_read;
     uint64_t zone_size_bs;
     uint64_t zone_cap_bs;
+    uint32_t max_active_zones;
+    uint32_t max_open_zones;
 } NvmeNamespaceParams;

 typedef struct NvmeNamespace {
@@ -56,6 +58,8 @@ typedef struct NvmeNamespace {
     uint64_t        zone_capacity;
     uint64_t        zone_array_size;
     uint32_t        zone_size_log2;
+    int32_t         nr_open_zones;
+    int32_t         nr_active_zones;

     NvmeNamespaceParams params;
 } NvmeNamespace;
@@ -123,6 +127,43 @@ static inline bool nvme_wp_is_valid(NvmeZone *zone)
            st != NVME_ZONE_STATE_OFFLINE;
 }

+static inline void nvme_aor_inc_open(NvmeNamespace *ns)
+{
+    assert(ns->nr_open_zones >= 0);
+    if (ns->params.max_open_zones) {
+        ns->nr_open_zones++;
+        assert(ns->nr_open_zones <= ns->params.max_open_zones);
+    }
+}
+
+static inline void nvme_aor_dec_open(NvmeNamespace *ns)
+{
+    if (ns->params.max_open_zones) {
+        assert(ns->nr_open_zones > 0);
+        ns->nr_open_zones--;
+    }
+    assert(ns->nr_open_zones >= 0);
+}
+
+static inline void nvme_aor_inc_active(NvmeNamespace *ns)
+{
+    assert(ns->nr_active_zones >= 0);
+    if (ns->params.max_active_zones) {
+        ns->nr_active_zones++;
+        assert(ns->nr_active_zones <= ns->params.max_active_zones);
+    }
+}
+
+static inline void nvme_aor_dec_active(NvmeNamespace *ns)
+{
+    if (ns->params.max_active_zones) {
+        assert(ns->nr_active_zones > 0);
+        ns->nr_active_zones--;
+        assert(ns->nr_active_zones >= ns->nr_open_zones);
+    }
+    assert(ns->nr_active_zones >= 0);
+}
+
 int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_flush(NvmeNamespace *ns);
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index e6db7f7d3b..2e45838c15 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -119,6 +119,20 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp)
         ns->zone_size_log2 = 63 - clz64(ns->zone_size);
     }

+    /* Make sure that the values of all ZNS properties are sane */
+    if (ns->params.max_open_zones > nz) {
+        error_setg(errp,
+                   "max_open_zones value %u exceeds the number of zones %u",
+                   ns->params.max_open_zones, nz);
+        return -1;
+    }
+    if (ns->params.max_active_zones > nz) {
+        error_setg(errp,
+                   "max_active_zones value %u exceeds the number of zones %u",
+                   ns->params.max_active_zones, nz);
+        return -1;
+    }
+
     return 0;
 }

@@ -166,8 +180,8 @@ static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index,
     id_ns_z = g_malloc0(sizeof(NvmeIdNsZoned));

     /* MAR/MOR are zeroes-based, 0xffffffff means no limit */
-    id_ns_z->mar = 0xffffffff;
-    id_ns_z->mor = 0xffffffff;
+    id_ns_z->mar = cpu_to_le32(ns->params.max_active_zones - 1);
+    id_ns_z->mor = cpu_to_le32(ns->params.max_open_zones - 1);
     id_ns_z->zoc = 0;
     id_ns_z->ozcs = ns->params.cross_zone_read ? 0x01 : 0x00;
@@ -195,6 +209,7 @@ static void nvme_clear_zone(NvmeNamespace *ns, NvmeZone *zone)
             trace_pci_nvme_clear_ns_close(state, zone->d.zslba);
             nvme_set_zone_state(zone, NVME_ZONE_STATE_CLOSED);
         }
+        nvme_aor_inc_active(ns);
         QTAILQ_INSERT_HEAD(&ns->closed_zones, zone, entry);
     } else {
         trace_pci_nvme_clear_ns_reset(state, zone->d.zslba);
@@ -211,16 +226,23 @@ static void nvme_zoned_ns_shutdown(NvmeNamespace *ns)

     QTAILQ_FOREACH_SAFE(zone, &ns->closed_zones, entry, next) {
         QTAILQ_REMOVE(&ns->closed_zones, zone, entry);
+        nvme_aor_dec_active(ns);
         nvme_clear_zone(ns, zone);
     }
     QTAILQ_FOREACH_SAFE(zone, &ns->imp_open_zones, entry, next) {
         QTAILQ_REMOVE(&ns->imp_open_zones, zone, entry);
+        nvme_aor_dec_open(ns);
+        nvme_aor_dec_active(ns);
         nvme_clear_zone(ns, zone);
     }
     QTAILQ_FOREACH_SAFE(zone, &ns->exp_open_zones, entry, next) {
         QTAILQ_REMOVE(&ns->exp_open_zones, zone, entry);
+        nvme_aor_dec_open(ns);
+
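The `nvme_aor_*` helpers above amount to a pair of bounded counters where a limit of 0 means "no limit" (mirroring MAR/MOR being all-ones in the identify data). A sketch of the same bookkeeping in a hypothetical `ZoneAccounting` class; note that in the real patch the "Too Many Open/Active" statuses are returned by the command handlers before the counters move, while the inline helpers only assert the invariants:

```python
class ZoneAccounting:
    """Open/active zone counters; a limit of 0 means unlimited."""

    def __init__(self, max_open=0, max_active=0):
        self.max_open, self.max_active = max_open, max_active
        self.nr_open = self.nr_active = 0

    def inc_active(self):
        if self.max_active:           # only tracked when a limit is set
            if self.nr_active + 1 > self.max_active:
                raise RuntimeError("Too Many Active Zones")
            self.nr_active += 1

    def inc_open(self):
        if self.max_open:
            if self.nr_open + 1 > self.max_open:
                raise RuntimeError("Too Many Open Zones")
            self.nr_open += 1

acct = ZoneAccounting(max_open=1, max_active=2)
acct.inc_active()
acct.inc_open()                       # first open zone: fine
try:
    acct.inc_open()                   # second open zone exceeds max_open=1
except RuntimeError as e:
    assert "Open" in str(e)
```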
[PATCH v10 08/12] hw/block/nvme: Support Zoned Namespace Command Set
The emulation code has been changed to advertise NVM Command Set when
"zoned" device property is not set (default) and Zoned Namespace
Command Set otherwise.

Define values and structures that are needed to support Zoned
Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
Define trace events where needed in newly introduced code.

In order to improve scalability, all open, closed and full zones
are organized in separate linked lists. Consequently, almost all zone
operations don't require scanning of the entire zone array (which
potentially can be quite large) - it is only necessary to enumerate
one or more zone lists.

Handlers for three new NVMe commands introduced in Zoned Namespace
Command Set specification are added, namely for Zone Management
Receive, Zone Management Send and Zone Append.

Device initialization code has been extended to create a proper
configuration for zoned operation using device properties.

Read/Write command handler is modified to only allow writes at the
write pointer if the namespace is zoned. For Zone Append command,
writes implicitly happen at the write pointer and the starting write
pointer value is returned as the result of the command. Write Zeroes
handler is modified to add zoned checks that are identical to those
done as a part of Write flow.

Subsequent commits in this series add ZDE support and checks for
active and open zone limits.

Signed-off-by: Niklas Cassel
Signed-off-by: Hans Holmberg
Signed-off-by: Ajay Joshi
Signed-off-by: Chaitanya Kulkarni
Signed-off-by: Matias Bjorling
Signed-off-by: Aravind Ramesh
Signed-off-by: Shin'ichiro Kawasaki
Signed-off-by: Adam Manzanares
Signed-off-by: Dmitry Fomichev
Reviewed-by: Niklas Cassel
---
 hw/block/nvme-ns.h    |  54 +++
 hw/block/nvme.h       |   8 +
 hw/block/nvme-ns.c    | 173 ++++
 hw/block/nvme.c       | 972 +-
 hw/block/trace-events |  18 +-
 5 files changed, 1210 insertions(+), 15 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 2d9cd29d07..d2631ff5a3 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -19,9 +19,20 @@
 #define NVME_NS(obj) \
     OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)

+typedef struct NvmeZone {
+    NvmeZoneDescr   d;
+    uint64_t        w_ptr;
+    QTAILQ_ENTRY(NvmeZone) entry;
+} NvmeZone;
+
 typedef struct NvmeNamespaceParams {
     uint32_t nsid;
     QemuUUID uuid;
+
+    bool     zoned;
+    bool     cross_zone_read;
+    uint64_t zone_size_bs;
+    uint64_t zone_cap_bs;
 } NvmeNamespaceParams;

 typedef struct NvmeNamespace {
@@ -34,6 +45,18 @@ typedef struct NvmeNamespace {
     bool attached;
     uint8_t csi;

+    NvmeIdNsZoned   *id_ns_zoned;
+    NvmeZone        *zone_array;
+    QTAILQ_HEAD(, NvmeZone) exp_open_zones;
+    QTAILQ_HEAD(, NvmeZone) imp_open_zones;
+    QTAILQ_HEAD(, NvmeZone) closed_zones;
+    QTAILQ_HEAD(, NvmeZone) full_zones;
+    uint32_t        num_zones;
+    uint64_t        zone_size;
+    uint64_t        zone_capacity;
+    uint64_t        zone_array_size;
+    uint32_t        zone_size_log2;
+
     NvmeNamespaceParams params;
 } NvmeNamespace;

@@ -71,8 +94,39 @@ static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba)

 typedef struct NvmeCtrl NvmeCtrl;

+static inline uint8_t nvme_get_zone_state(NvmeZone *zone)
+{
+    return zone->d.zs >> 4;
+}
+
+static inline void nvme_set_zone_state(NvmeZone *zone, enum NvmeZoneState state)
+{
+    zone->d.zs = state << 4;
+}
+
+static inline uint64_t nvme_zone_rd_boundary(NvmeNamespace *ns, NvmeZone *zone)
+{
+    return zone->d.zslba + ns->zone_size;
+}
+
+static inline uint64_t nvme_zone_wr_boundary(NvmeZone *zone)
+{
+    return zone->d.zslba + zone->d.zcap;
+}
+
+static inline bool nvme_wp_is_valid(NvmeZone *zone)
+{
+    uint8_t st = nvme_get_zone_state(zone);
+
+    return st != NVME_ZONE_STATE_FULL &&
+           st != NVME_ZONE_STATE_READ_ONLY &&
+           st != NVME_ZONE_STATE_OFFLINE;
+}
+
 int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_flush(NvmeNamespace *ns);
+void nvme_ns_shutdown(NvmeNamespace *ns);
+void nvme_ns_cleanup(NvmeNamespace *ns);

 #endif /* NVME_NS_H */
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index e080a2318a..4cb0615128 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -6,6 +6,9 @@

 #define NVME_MAX_NAMESPACES 256

+#define NVME_DEFAULT_ZONE_SIZE   (128 * MiB)
+#define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
+
 typedef struct NvmeParams {
     char     *serial;
     uint32_t num_queues; /* deprecated since 5.1 */
@@ -16,6 +19,7 @@ typedef struct NvmeParams {
     uint32_t aer_max_queued;
     uint8_t  mdts;
     bool     use_intel_id;
+    uint32_t zasl_bs;
 } NvmeParams;

 typedef struct NvmeAsyncEvent {
@@ -28,6 +32,8 @@ typedef struct NvmeRequest {
     struct NvmeNamespace    *ns;
     BlockAIOCB              *aiocb;
     uint16_t                status;
+
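As the `nvme_get_zone_state`/`nvme_set_zone_state` inline helpers above show, the zone state lives in the upper nibble of the ZNS zone descriptor's ZS byte (`zone->d.zs`). A sketch of the same encoding; the state values follow the ZNS specification's Zone State field:

```python
# ZNS Zone State values (upper nibble of the ZS byte):
ZONE_STATES = {
    0x1: 'EMPTY', 0x2: 'IMPLICITLY_OPEN', 0x3: 'EXPLICITLY_OPEN',
    0x4: 'CLOSED', 0xd: 'READ_ONLY', 0xe: 'FULL', 0xf: 'OFFLINE',
}

def set_zone_state(zs_byte: int, state: int) -> int:
    """Place the state value in the upper nibble of the ZS byte."""
    return (state << 4) & 0xff

def get_zone_state(zs_byte: int) -> int:
    """Extract the state value from the upper nibble of the ZS byte."""
    return zs_byte >> 4

zs = set_zone_state(0, 0xe)           # mark a zone Full
assert zs == 0xe0
assert ZONE_STATES[get_zone_state(zs)] == 'FULL'
```

This also explains the `nvme_wp_is_valid` check: the write pointer is meaningless in the Full, Read Only, and Offline states, which is exactly the set the helper excludes.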
[PATCH v10 07/12] block/nvme: Make ZNS-related definitions
Define values and structures that are needed to support Zoned Namespace Command Set (NVMe TP 4053). Signed-off-by: Dmitry Fomichev --- include/block/nvme.h | 114 ++- 1 file changed, 113 insertions(+), 1 deletion(-) diff --git a/include/block/nvme.h b/include/block/nvme.h index 394db19022..752623b4f9 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -489,6 +489,9 @@ enum NvmeIoCommands { NVME_CMD_COMPARE= 0x05, NVME_CMD_WRITE_ZEROES = 0x08, NVME_CMD_DSM= 0x09, +NVME_CMD_ZONE_MGMT_SEND = 0x79, +NVME_CMD_ZONE_MGMT_RECV = 0x7a, +NVME_CMD_ZONE_APPEND= 0x7d, }; typedef struct QEMU_PACKED NvmeDeleteQ { @@ -648,9 +651,13 @@ typedef struct QEMU_PACKED NvmeAerResult { uint8_t resv; } NvmeAerResult; +typedef struct QEMU_PACKED NvmeZonedResult { +uint64_t slba; +} NvmeZonedResult; + typedef struct QEMU_PACKED NvmeCqe { uint32_tresult; -uint32_trsvd; +uint32_tdw1; uint16_tsq_head; uint16_tsq_id; uint16_tcid; @@ -679,6 +686,7 @@ enum NvmeStatusCodes { NVME_INVALID_USE_OF_CMB = 0x0012, NVME_INVALID_PRP_OFFSET = 0x0013, NVME_CMD_SET_CMB_REJECTED = 0x002b, +NVME_INVALID_CMD_SET= 0x002c, NVME_LBA_RANGE = 0x0080, NVME_CAP_EXCEEDED = 0x0081, NVME_NS_NOT_READY = 0x0082, @@ -703,6 +711,14 @@ enum NvmeStatusCodes { NVME_CONFLICTING_ATTRS = 0x0180, NVME_INVALID_PROT_INFO = 0x0181, NVME_WRITE_TO_RO= 0x0182, +NVME_ZONE_BOUNDARY_ERROR= 0x01b8, +NVME_ZONE_FULL = 0x01b9, +NVME_ZONE_READ_ONLY = 0x01ba, +NVME_ZONE_OFFLINE = 0x01bb, +NVME_ZONE_INVALID_WRITE = 0x01bc, +NVME_ZONE_TOO_MANY_ACTIVE = 0x01bd, +NVME_ZONE_TOO_MANY_OPEN = 0x01be, +NVME_ZONE_INVAL_TRANSITION = 0x01bf, NVME_WRITE_FAULT= 0x0280, NVME_UNRECOVERED_READ = 0x0281, NVME_E2E_GUARD_ERROR= 0x0282, @@ -887,6 +903,11 @@ typedef struct QEMU_PACKED NvmeIdCtrl { uint8_t vs[1024]; } NvmeIdCtrl; +typedef struct NvmeIdCtrlZoned { +uint8_t zasl; +uint8_t rsvd1[4095]; +} NvmeIdCtrlZoned; + enum NvmeIdCtrlOacs { NVME_OACS_SECURITY = 1 << 0, NVME_OACS_FORMAT= 1 << 1, @@ -1012,6 +1033,12 @@ typedef struct QEMU_PACKED NvmeLBAF { 
uint8_t rp; } NvmeLBAF; +typedef struct QEMU_PACKED NvmeLBAFE { +uint64_tzsze; +uint8_t zdes; +uint8_t rsvd9[7]; +} NvmeLBAFE; + #define NVME_NSID_BROADCAST 0x typedef struct QEMU_PACKED NvmeIdNs { @@ -1066,10 +1093,24 @@ enum NvmeNsIdentifierType { enum NvmeCsi { NVME_CSI_NVM= 0x00, +NVME_CSI_ZONED = 0x02, }; #define NVME_SET_CSI(vec, csi) (vec |= (uint8_t)(1 << (csi))) +typedef struct QEMU_PACKED NvmeIdNsZoned { +uint16_tzoc; +uint16_tozcs; +uint32_tmar; +uint32_tmor; +uint32_trrl; +uint32_tfrl; +uint8_t rsvd20[2796]; +NvmeLBAFE lbafe[16]; +uint8_t rsvd3072[768]; +uint8_t vs[256]; +} NvmeIdNsZoned; + /*Deallocate Logical Block Features*/ #define NVME_ID_NS_DLFEAT_GUARD_CRC(dlfeat) ((dlfeat) & 0x10) #define NVME_ID_NS_DLFEAT_WRITE_ZEROES(dlfeat)((dlfeat) & 0x08) @@ -1101,10 +1142,76 @@ enum NvmeIdNsDps { DPS_FIRST_EIGHT = 8, }; +enum NvmeZoneAttr { +NVME_ZA_FINISHED_BY_CTLR = 1 << 0, +NVME_ZA_FINISH_RECOMMENDED = 1 << 1, +NVME_ZA_RESET_RECOMMENDED= 1 << 2, +NVME_ZA_ZD_EXT_VALID = 1 << 7, +}; + +typedef struct QEMU_PACKED NvmeZoneReportHeader { +uint64_tnr_zones; +uint8_t rsvd[56]; +} NvmeZoneReportHeader; + +enum NvmeZoneReceiveAction { +NVME_ZONE_REPORT = 0, +NVME_ZONE_REPORT_EXTENDED= 1, +}; + +enum NvmeZoneReportType { +NVME_ZONE_REPORT_ALL = 0, +NVME_ZONE_REPORT_EMPTY = 1, +NVME_ZONE_REPORT_IMPLICITLY_OPEN = 2, +NVME_ZONE_REPORT_EXPLICITLY_OPEN = 3, +NVME_ZONE_REPORT_CLOSED = 4, +NVME_ZONE_REPORT_FULL= 5, +NVME_ZONE_REPORT_READ_ONLY = 6, +NVME_ZONE_REPORT_OFFLINE = 7, +}; + +enum NvmeZoneType { +NVME_ZONE_TYPE_RESERVED = 0x00, +NVME_ZONE_TYPE_SEQ_WRITE = 0x02, +}; + +enum NvmeZoneSendAction { +NVME_ZONE_ACTION_RSD = 0x00, +NVME_ZONE_ACTION_CLOSE = 0x01, +NVME_ZONE_ACTION_FINISH = 0x02, +NVME_ZONE_ACTION_OPEN= 0x03, +NVME_ZONE_ACTION_RESET = 0x04, +NVME_ZONE_ACTION_OFFLINE = 0x05, +NVME_ZONE_ACTION_SET_ZD_EXT = 0x10, +}; + +typedef struct QEMU_PACKED NvmeZoneDescr { +uint8_t zt; +uint8_t zs; +uint8_t za; +uint8_t rsvd3[5]; +uint64_tzcap; +
[PATCH 1/2] hw/m68k/q800: Don't connect two qemu_irqs directly to the same input
The q800 board code connects both of the IRQ outputs of the ESCC to the same pic[3] qemu_irq. Connecting two qemu_irqs outputs directly to the same input is not valid as it produces subtly wrong behaviour (for instance if both the IRQ lines are high, and then one goes low, the PIC input will see this as a high-to-low transition even though the second IRQ line should still be holding it high). This kind of wiring needs an explicitly created OR gate; add one. Signed-off-by: Peter Maydell --- hw/m68k/q800.c | 12 ++-- hw/m68k/Kconfig | 1 + 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c index ce4b47c3e34..dc13007aaf2 100644 --- a/hw/m68k/q800.c +++ b/hw/m68k/q800.c @@ -28,6 +28,7 @@ #include "hw/hw.h" #include "hw/boards.h" #include "hw/irq.h" +#include "hw/or-irq.h" #include "elf.h" #include "hw/loader.h" #include "ui/console.h" @@ -171,6 +172,7 @@ static void q800_init(MachineState *machine) CPUState *cs; DeviceState *dev; DeviceState *via_dev; +DeviceState *escc_orgate; SysBusESPState *sysbus_esp; ESPState *esp; SysBusDevice *sysbus; @@ -283,8 +285,14 @@ static void q800_init(MachineState *machine) qdev_prop_set_uint32(dev, "chnAtype", 0); sysbus = SYS_BUS_DEVICE(dev); sysbus_realize_and_unref(sysbus, _fatal); -sysbus_connect_irq(sysbus, 0, pic[3]); -sysbus_connect_irq(sysbus, 1, pic[3]); + +/* Logically OR both its IRQs together */ +escc_orgate = DEVICE(object_new(TYPE_OR_IRQ)); +object_property_set_int(OBJECT(escc_orgate), "num-lines", 2, _fatal); +qdev_realize_and_unref(escc_orgate, NULL, _fatal); +sysbus_connect_irq(sysbus, 0, qdev_get_gpio_in(escc_orgate, 0)); +sysbus_connect_irq(sysbus, 1, qdev_get_gpio_in(escc_orgate, 1)); +qdev_connect_gpio_out(DEVICE(escc_orgate), 0, pic[3]); sysbus_mmio_map(sysbus, 0, SCC_BASE); /* SCSI */ diff --git a/hw/m68k/Kconfig b/hw/m68k/Kconfig index c757e7dfa48..60d7bcfb8f2 100644 --- a/hw/m68k/Kconfig +++ b/hw/m68k/Kconfig @@ -22,3 +22,4 @@ config Q800 select ESCC select ESP select 
DP8393X +select OR_IRQ -- 2.20.1
[PATCH 2/2] hw/m68k/q800.c: Make the GLUE chip an actual QOM device
The handling of the GLUE (General Logic Unit) device is currently open-coded. Make this into a proper QOM device. This minor piece of modernisation gets rid of the free floating qemu_irq array 'pic', which Coverity points out is technically leaked when we exit the machine init function. (The replacement glue device is not leaked because it gets added to the sysbus, so it's accessible via that.) Fixes: Coverity CID 1421883 Signed-off-by: Peter Maydell --- hw/m68k/q800.c | 82 ++ 1 file changed, 70 insertions(+), 12 deletions(-) diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c index dc13007aaf2..05bb372f958 100644 --- a/hw/m68k/q800.c +++ b/hw/m68k/q800.c @@ -47,6 +47,7 @@ #include "sysemu/qtest.h" #include "sysemu/runstate.h" #include "sysemu/reset.h" +#include "migration/vmstate.h" #define MACROM_ADDR 0x4080 #define MACROM_SIZE 0x0010 @@ -94,10 +95,14 @@ * CPU. */ -typedef struct { +#define TYPE_GLUE "q800-glue" +OBJECT_DECLARE_SIMPLE_TYPE(GLUEState, GLUE) + +struct GLUEState { +SysBusDevice parent_obj; M68kCPU *cpu; uint8_t ipr; -} GLUEState; +}; static void GLUE_set_irq(void *opaque, int irq, int level) { @@ -119,6 +124,58 @@ static void GLUE_set_irq(void *opaque, int irq, int level) m68k_set_irq_level(s->cpu, 0, 0); } +static void glue_reset(DeviceState *dev) +{ +GLUEState *s = GLUE(dev); + +s->ipr = 0; +} + +static const VMStateDescription vmstate_glue = { +.name = "q800-glue", +.version_id = 0, +.minimum_version_id = 0, +.fields = (VMStateField[]) { +VMSTATE_UINT8(ipr, GLUEState), +VMSTATE_END_OF_LIST(), +}, +}; + +/* + * If the m68k CPU implemented its inbound irq lines as GPIO lines + * rather than via the m68k_set_irq_level() function we would not need + * this cpu link property and could instead provide outbound IRQ lines + * that the board could wire up to the CPU. 
+ */ +static Property glue_properties[] = { +DEFINE_PROP_LINK("cpu", GLUEState, cpu, TYPE_M68K_CPU, M68kCPU *), +DEFINE_PROP_END_OF_LIST(), +}; + +static void glue_init(Object *obj) +{ +DeviceState *dev = DEVICE(obj); + +qdev_init_gpio_in(dev, GLUE_set_irq, 8); +} + +static void glue_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); + +dc->vmsd = _glue; +dc->reset = glue_reset; +device_class_set_props(dc, glue_properties); +} + +static const TypeInfo glue_info = { +.name = TYPE_GLUE, +.parent = TYPE_SYS_BUS_DEVICE, +.instance_size = sizeof(GLUEState), +.instance_init = glue_init, +.class_init = glue_class_init, +}; + static void main_cpu_reset(void *opaque) { M68kCPU *cpu = opaque; @@ -178,8 +235,7 @@ static void q800_init(MachineState *machine) SysBusDevice *sysbus; BusState *adb_bus; NubusBus *nubus; -GLUEState *irq; -qemu_irq *pic; +DeviceState *glue; DriveInfo *dinfo; linux_boot = (kernel_filename != NULL); @@ -213,10 +269,9 @@ static void q800_init(MachineState *machine) } /* IRQ Glue */ - -irq = g_new0(GLUEState, 1); -irq->cpu = cpu; -pic = qemu_allocate_irqs(GLUE_set_irq, irq, 8); +glue = qdev_new(TYPE_GLUE); +object_property_set_link(OBJECT(glue), "cpu", OBJECT(cpu), _abort); +sysbus_realize_and_unref(SYS_BUS_DEVICE(glue), _fatal); /* VIA */ @@ -228,8 +283,10 @@ static void q800_init(MachineState *machine) sysbus = SYS_BUS_DEVICE(via_dev); sysbus_realize_and_unref(sysbus, _fatal); sysbus_mmio_map(sysbus, 0, VIA_BASE); -qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 0, pic[0]); -qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 1, pic[1]); +qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 0, +qdev_get_gpio_in(glue, 0)); +qdev_connect_gpio_out_named(DEVICE(sysbus), "irq", 1, +qdev_get_gpio_in(glue, 1)); adb_bus = qdev_get_child_bus(via_dev, "adb.0"); @@ -270,7 +327,7 @@ static void q800_init(MachineState *machine) sysbus_realize_and_unref(sysbus, _fatal); sysbus_mmio_map(sysbus, 0, SONIC_BASE); 
sysbus_mmio_map(sysbus, 1, SONIC_PROM_BASE); -sysbus_connect_irq(sysbus, 0, pic[2]); +sysbus_connect_irq(sysbus, 0, qdev_get_gpio_in(glue, 2)); /* SCC */ @@ -292,7 +349,7 @@ static void q800_init(MachineState *machine) qdev_realize_and_unref(escc_orgate, NULL, _fatal); sysbus_connect_irq(sysbus, 0, qdev_get_gpio_in(escc_orgate, 0)); sysbus_connect_irq(sysbus, 1, qdev_get_gpio_in(escc_orgate, 1)); -qdev_connect_gpio_out(DEVICE(escc_orgate), 0, pic[3]); +qdev_connect_gpio_out(DEVICE(escc_orgate), 0, qdev_get_gpio_in(glue, 3)); sysbus_mmio_map(sysbus, 0, SCC_BASE); /* SCSI */ @@ -457,6 +514,7 @@ static const TypeInfo q800_machine_typeinfo = { static void q800_machine_register_types(void) { type_register_static(_machine_typeinfo); +
[PATCH 0/2] m68k/q800: make the GLUE chip a QOM device
This series is 6.0 material really I think. It's a bit of cleanup prompted by a Coverity issue, CID 1421883. There are another half dozen or so similar issues, where Coverity is complaining that we allocate an array of qemu_irqs with qemu_allocate_irqs() in a board init function -- in this case the 'pic' array in q800_init() -- and then we return from the board init function and the memory is leaked, in the sense that nobody has a pointer to it any more. The leak isn't real, in that board init functions are called only once, and the array of qemu_irqs really does need to stay around for the life of the simulation, so these are pretty much insignificant as Coverity issues go. But this coding style which uses a free-floating set of qemu_irqs is not very "modern QEMU", so the issues act as a nudge that we should clean the code up by encapsulating the interrupt-line behaviour in a QOM device. In the q800 case there actually is already a GLUEState struct, it just needs to be turned into a QOM device with GPIO input lines. Patch 2 does that. Patch 1 fixes a bug I noticed while doing this work -- it's not valid to connect two qemu_irq lines directly to the same input (here both ESCC irq lines go to pic[3]) because it produces weird behaviour like "both lines are asserted but the device consuming the interrupt sees the line deassert when one of the two inputs goes low, rather than only when they both go low". You need to put an explicit OR gate in, assuming that logical-OR is the desired behaviour, which it usually is. Tested only with 'make check' and 'make check-acceptance', but the latter does have a q800 bootup test. thanks -- PMM Peter Maydell (2): hw/m68k/q800: Don't connect two qemu_irqs directly to the same input hw/m68k/q800.c: Make the GLUE chip an actual QOM device hw/m68k/q800.c | 92 ++--- hw/m68k/Kconfig | 1 + 2 files changed, 80 insertions(+), 13 deletions(-) -- 2.20.1
[PATCH v10 05/12] hw/block/nvme: Add support for Namespace Types
From: Niklas Cassel Define the structures and constants required to implement Namespace Types support. Namespace Types introduce a new command set, "I/O Command Sets", that allows the host to retrieve the command sets associated with a namespace. Introduce support for the command set and enable detection for the NVM Command Set. The new workflows for identify commands rely heavily on zero-filled identify structs. E.g., certain CNS commands are defined to return a zero-filled identify struct when an inactive namespace NSID is supplied. Add a helper function in order to avoid code duplication when reporting zero-filled identify structures. Signed-off-by: Niklas Cassel Signed-off-by: Dmitry Fomichev Reviewed-by: Keith Busch --- hw/block/nvme-ns.h| 1 + include/block/nvme.h | 66 +++ hw/block/nvme-ns.c| 2 + hw/block/nvme.c | 188 +++--- hw/block/trace-events | 7 ++ 5 files changed, 219 insertions(+), 45 deletions(-) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index a38071884a..d795e44bab 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -31,6 +31,7 @@ typedef struct NvmeNamespace { int64_t size; NvmeIdNs id_ns; const uint32_t *iocs; +uint8_t csi; NvmeNamespaceParams params; } NvmeNamespace; diff --git a/include/block/nvme.h b/include/block/nvme.h index f62cc90d49..af23514713 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -84,6 +84,7 @@ enum NvmeCapMask { enum NvmeCapCss { NVME_CAP_CSS_NVM= 1 << 0, +NVME_CAP_CSS_CSI_SUPP = 1 << 6, NVME_CAP_CSS_ADMIN_ONLY = 1 << 7, }; @@ -117,9 +118,25 @@ enum NvmeCcMask { enum NvmeCcCss { NVME_CC_CSS_NVM= 0x0, +NVME_CC_CSS_CSI= 0x6, NVME_CC_CSS_ADMIN_ONLY = 0x7, }; +#define NVME_SET_CC_EN(cc, val) \ +(cc |= (uint32_t)((val) & CC_EN_MASK) << CC_EN_SHIFT) +#define NVME_SET_CC_CSS(cc, val)\ +(cc |= (uint32_t)((val) & CC_CSS_MASK) << CC_CSS_SHIFT) +#define NVME_SET_CC_MPS(cc, val)\ +(cc |= (uint32_t)((val) & CC_MPS_MASK) << CC_MPS_SHIFT) +#define NVME_SET_CC_AMS(cc, val)\ +(cc |= (uint32_t)((val) & 
CC_AMS_MASK) << CC_AMS_SHIFT) +#define NVME_SET_CC_SHN(cc, val)\ +(cc |= (uint32_t)((val) & CC_SHN_MASK) << CC_SHN_SHIFT) +#define NVME_SET_CC_IOSQES(cc, val) \ +(cc |= (uint32_t)((val) & CC_IOSQES_MASK) << CC_IOSQES_SHIFT) +#define NVME_SET_CC_IOCQES(cc, val) \ +(cc |= (uint32_t)((val) & CC_IOCQES_MASK) << CC_IOCQES_SHIFT) + enum NvmeCstsShift { CSTS_RDY_SHIFT = 0, CSTS_CFS_SHIFT = 1, @@ -534,8 +551,13 @@ typedef struct QEMU_PACKED NvmeIdentify { uint64_trsvd2[2]; uint64_tprp1; uint64_tprp2; -uint32_tcns; -uint32_trsvd11[5]; +uint8_t cns; +uint8_t rsvd10; +uint16_tctrlid; +uint16_tnvmsetid; +uint8_t rsvd11; +uint8_t csi; +uint32_trsvd12[4]; } NvmeIdentify; typedef struct QEMU_PACKED NvmeRwCmd { @@ -656,6 +678,7 @@ enum NvmeStatusCodes { NVME_SGL_DESCR_TYPE_INVALID = 0x0011, NVME_INVALID_USE_OF_CMB = 0x0012, NVME_INVALID_PRP_OFFSET = 0x0013, +NVME_CMD_SET_CMB_REJECTED = 0x002b, NVME_LBA_RANGE = 0x0080, NVME_CAP_EXCEEDED = 0x0081, NVME_NS_NOT_READY = 0x0082, @@ -782,11 +805,15 @@ typedef struct QEMU_PACKED NvmePSD { #define NVME_IDENTIFY_DATA_SIZE 4096 -enum { -NVME_ID_CNS_NS = 0x0, -NVME_ID_CNS_CTRL = 0x1, -NVME_ID_CNS_NS_ACTIVE_LIST = 0x2, -NVME_ID_CNS_NS_DESCR_LIST = 0x3, +enum NvmeIdCns { +NVME_ID_CNS_NS= 0x00, +NVME_ID_CNS_CTRL = 0x01, +NVME_ID_CNS_NS_ACTIVE_LIST= 0x02, +NVME_ID_CNS_NS_DESCR_LIST = 0x03, +NVME_ID_CNS_CS_NS = 0x05, +NVME_ID_CNS_CS_CTRL = 0x06, +NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, +NVME_ID_CNS_IO_COMMAND_SET= 0x1c, }; typedef struct QEMU_PACKED NvmeIdCtrl { @@ -934,6 +961,7 @@ enum NvmeFeatureIds { NVME_WRITE_ATOMICITY= 0xa, NVME_ASYNCHRONOUS_EVENT_CONF= 0xb, NVME_TIMESTAMP = 0xe, +NVME_COMMAND_SET_PROFILE= 0x19, NVME_SOFTWARE_PROGRESS_MARKER = 0x80, NVME_FID_MAX= 0x100, }; @@ -1018,18 +1046,26 @@ typedef struct QEMU_PACKED NvmeIdNsDescr { uint8_t rsvd2[2]; } NvmeIdNsDescr; -enum { -NVME_NIDT_EUI64_LEN = 8, -NVME_NIDT_NGUID_LEN = 16, -NVME_NIDT_UUID_LEN = 16, +enum NvmeNsIdentifierLength { +NVME_NIDL_EUI64 = 8, +NVME_NIDL_NGUID = 16, 
+NVME_NIDL_UUID = 16, +NVME_NIDL_CSI = 1, }; enum NvmeNsIdentifierType { -NVME_NIDT_EUI64 = 0x1, -NVME_NIDT_NGUID = 0x2, -NVME_NIDT_UUID = 0x3, +NVME_NIDT_EUI64 = 0x01, +NVME_NIDT_NGUID = 0x02, +
[PATCH v10 04/12] hw/block/nvme: Merge nvme_write_zeroes() with nvme_write()
nvme_write() now handles WRITE, WRITE ZEROES and ZONE_APPEND. Signed-off-by: Dmitry Fomichev Reviewed-by: Niklas Cassel Acked-by: Klaus Jensen --- hw/block/nvme.c | 72 +-- hw/block/trace-events | 1 - 2 files changed, 28 insertions(+), 45 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 770e42a066..48adbe84f5 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1001,32 +1001,7 @@ invalid: return status | NVME_DNR; } -static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) -{ -NvmeRwCmd *rw = (NvmeRwCmd *)>cmd; -NvmeNamespace *ns = req->ns; -uint64_t slba = le64_to_cpu(rw->slba); -uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; -uint64_t offset = nvme_l2b(ns, slba); -uint32_t count = nvme_l2b(ns, nlb); -uint16_t status; - -trace_pci_nvme_write_zeroes(nvme_cid(req), nvme_nsid(ns), slba, nlb); - -status = nvme_check_bounds(n, ns, slba, nlb); -if (status) { -trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze); -return status; -} - -block_acct_start(blk_get_stats(req->ns->blkconf.blk), >acct, 0, - BLOCK_ACCT_WRITE); -req->aiocb = blk_aio_pwrite_zeroes(req->ns->blkconf.blk, offset, count, - BDRV_REQ_MAY_UNMAP, nvme_rw_cb, req); -return NVME_NO_COMPLETE; -} - -static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req, bool wrz) { NvmeRwCmd *rw = (NvmeRwCmd *)>cmd; NvmeNamespace *ns = req->ns; @@ -1040,10 +1015,12 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode), nvme_nsid(ns), nlb, data_size, slba); -status = nvme_check_mdts(n, data_size); -if (status) { -trace_pci_nvme_err_mdts(nvme_cid(req), data_size); -goto invalid; +if (!wrz) { +status = nvme_check_mdts(n, data_size); +if (status) { +trace_pci_nvme_err_mdts(nvme_cid(req), data_size); +goto invalid; +} } status = nvme_check_bounds(n, ns, slba, nlb); @@ -1052,21 +1029,28 @@ static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req) goto invalid; } 
-status = nvme_map_dptr(n, data_size, req); -if (status) { -goto invalid; -} - data_offset = nvme_l2b(ns, slba); -block_acct_start(blk_get_stats(blk), >acct, data_size, - BLOCK_ACCT_WRITE); -if (req->qsg.sg) { -req->aiocb = dma_blk_write(blk, >qsg, data_offset, - BDRV_SECTOR_SIZE, nvme_rw_cb, req); +if (!wrz) { +status = nvme_map_dptr(n, data_size, req); +if (status) { +goto invalid; +} + +block_acct_start(blk_get_stats(blk), >acct, data_size, + BLOCK_ACCT_WRITE); +if (req->qsg.sg) { +req->aiocb = dma_blk_write(blk, >qsg, data_offset, + BDRV_SECTOR_SIZE, nvme_rw_cb, req); +} else { +req->aiocb = blk_aio_pwritev(blk, data_offset, >iov, 0, + nvme_rw_cb, req); +} } else { -req->aiocb = blk_aio_pwritev(blk, data_offset, >iov, 0, - nvme_rw_cb, req); +block_acct_start(blk_get_stats(blk), >acct, 0, BLOCK_ACCT_WRITE); +req->aiocb = blk_aio_pwrite_zeroes(blk, data_offset, data_size, + BDRV_REQ_MAY_UNMAP, nvme_rw_cb, + req); } return NVME_NO_COMPLETE; @@ -1100,9 +1084,9 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case NVME_CMD_FLUSH: return nvme_flush(n, req); case NVME_CMD_WRITE_ZEROES: -return nvme_write_zeroes(n, req); +return nvme_write(n, req, true); case NVME_CMD_WRITE: -return nvme_write(n, req); +return nvme_write(n, req, false); case NVME_CMD_READ: return nvme_read(n, req); default: diff --git a/hw/block/trace-events b/hw/block/trace-events index 540c600931..e67e96c2b5 100644 --- a/hw/block/trace-events +++ b/hw/block/trace-events @@ -43,7 +43,6 @@ pci_nvme_admin_cmd(uint16_t cid, uint16_t sqid, uint8_t opcode, const char *opna pci_nvme_read(uint16_t cid, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_write(uint16_t cid, const char *verb, uint32_t nsid, uint32_t nlb, uint64_t count, uint64_t lba) "cid %"PRIu16" opname '%s' nsid %"PRIu32" nlb %"PRIu32" count %"PRIu64" lba 0x%"PRIx64"" pci_nvme_rw_cb(uint16_t cid, const char *blkname) "cid %"PRIu16" blk 
'%s'" -pci_nvme_write_zeroes(uint16_t cid, uint32_t nsid, uint64_t slba, uint32_t nlb) "cid %"PRIu16" nsid %"PRIu32"
[PATCH v10 02/12] hw/block/nvme: Generate namespace UUIDs
In NVMe 1.4, a namespace must report an ID descriptor of UUID type if it doesn't support EUI64 or NGUID. Add a new namespace property, "uuid", that provides the user the option to either specify the UUID explicitly or have a UUID generated automatically every time a namespace is initialized. Suggested-by: Klaus Jensen Signed-off-by: Dmitry Fomichev Reviewed-by: Klaus Jensen Reviewed-by: Keith Busch Reviewed-by: Niklas Cassel --- hw/block/nvme-ns.h | 1 + hw/block/nvme-ns.c | 1 + hw/block/nvme.c| 9 + 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index ea8c2f785d..a38071884a 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -21,6 +21,7 @@ typedef struct NvmeNamespaceParams { uint32_t nsid; +QemuUUID uuid; } NvmeNamespaceParams; typedef struct NvmeNamespace { diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index b69cdaf27e..de735eb9f3 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -129,6 +129,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) static Property nvme_ns_props[] = { DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf), DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0), +DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 702f7cc2e3..ed3f38f01d 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1564,6 +1564,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, NvmeRequest *req) static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) { +NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)>cmd; uint32_t nsid = le32_to_cpu(c->nsid); uint8_t list[NVME_IDENTIFY_DATA_SIZE]; @@ -1583,7 +1584,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_NSID | NVME_DNR; } -if (unlikely(!nvme_ns(n, nsid))) { +ns = nvme_ns(n, nsid); +if (unlikely(!ns)) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1592,12 +1594,11 @@ static 
uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req) /* * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data * structure, a Namespace UUID (nidt = 0x3) must be reported in the - * Namespace Identification Descriptor. Add a very basic Namespace UUID - * here. + * Namespace Identification Descriptor. Add the namespace UUID here. */ ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID; ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN; -stl_be_p(_descrs->uuid.v, nsid); +memcpy(_descrs->uuid.v, ns->params.uuid.data, NVME_NIDT_UUID_LEN); return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE, DMA_DIRECTION_FROM_DEVICE, req); -- 2.21.0
[PATCH v10 11/12] hw/block/nvme: Add injection of Offline/Read-Only zones
ZNS specification defines two zone conditions for the zones that no longer can function properly, possibly because of flash wear or other internal fault. It is useful to be able to "inject" a small number of such zones for testing purposes. This commit defines two optional device properties, "offline_zones" and "rdonly_zones". Users can assign non-zero values to these variables to specify the number of zones to be initialized as Offline or Read-Only. The actual number of injected zones may be smaller than the requested amount - Read-Only and Offline counts are expected to be much smaller than the total number of zones on a drive. Signed-off-by: Dmitry Fomichev Reviewed-by: Niklas Cassel --- hw/block/nvme-ns.h | 2 ++ hw/block/nvme-ns.c | 52 ++ 2 files changed, 54 insertions(+) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 50a6a0e1ac..b30478e5d7 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -36,6 +36,8 @@ typedef struct NvmeNamespaceParams { uint32_t max_active_zones; uint32_t max_open_zones; uint32_t zd_extension_size; +uint32_t nr_offline_zones; +uint32_t nr_rdonly_zones; } NvmeNamespaceParams; typedef struct NvmeNamespace { diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index 85dc73cf06..5e4a6705cd 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -21,6 +21,7 @@ #include "sysemu/sysemu.h" #include "sysemu/block-backend.h" #include "qapi/error.h" +#include "crypto/random.h" #include "hw/qdev-properties.h" #include "hw/qdev-core.h" @@ -145,6 +146,20 @@ static int nvme_calc_zone_geometry(NvmeNamespace *ns, Error **errp) } } +if (ns->params.max_open_zones < nz) { +if (ns->params.nr_offline_zones > nz - ns->params.max_open_zones) { +error_setg(errp, "offline_zones value %u is too large", +ns->params.nr_offline_zones); +return -1; +} +if (ns->params.nr_rdonly_zones > +nz - ns->params.max_open_zones - ns->params.nr_offline_zones) { +error_setg(errp, "rdonly_zones value %u is too large", +ns->params.nr_rdonly_zones); +return -1; 
+} +} + return 0; } @@ -153,7 +168,9 @@ static void nvme_init_zone_state(NvmeNamespace *ns) uint64_t start = 0, zone_size = ns->zone_size; uint64_t capacity = ns->num_zones * zone_size; NvmeZone *zone; +uint32_t rnd; int i; +uint16_t zs; ns->zone_array = g_malloc0(ns->zone_array_size); if (ns->params.zd_extension_size) { @@ -180,6 +197,37 @@ static void nvme_init_zone_state(NvmeNamespace *ns) zone->w_ptr = start; start += zone_size; } + +/* If required, make some zones Offline or Read Only */ + +for (i = 0; i < ns->params.nr_offline_zones; i++) { +do { +qcrypto_random_bytes(, sizeof(rnd), NULL); +rnd %= ns->num_zones; +} while (rnd < ns->params.max_open_zones); +zone = >zone_array[rnd]; +zs = nvme_get_zone_state(zone); +if (zs != NVME_ZONE_STATE_OFFLINE) { +nvme_set_zone_state(zone, NVME_ZONE_STATE_OFFLINE); +} else { +i--; +} +} + +for (i = 0; i < ns->params.nr_rdonly_zones; i++) { +do { +qcrypto_random_bytes(, sizeof(rnd), NULL); +rnd %= ns->num_zones; +} while (rnd < ns->params.max_open_zones); +zone = >zone_array[rnd]; +zs = nvme_get_zone_state(zone); +if (zs != NVME_ZONE_STATE_OFFLINE && +zs != NVME_ZONE_STATE_READ_ONLY) { +nvme_set_zone_state(zone, NVME_ZONE_STATE_READ_ONLY); +} else { +i--; +} +} } static int nvme_zoned_init_ns(NvmeCtrl *n, NvmeNamespace *ns, int lba_index, @@ -353,6 +401,10 @@ static Property nvme_ns_props[] = { params.max_open_zones, 0), DEFINE_PROP_UINT32("zoned.descr_ext_size", NvmeNamespace, params.zd_extension_size, 0), +DEFINE_PROP_UINT32("zoned.offline_zones", NvmeNamespace, + params.nr_offline_zones, 0), +DEFINE_PROP_UINT32("zoned.rdonly_zones", NvmeNamespace, + params.nr_rdonly_zones, 0), DEFINE_PROP_END_OF_LIST(), }; -- 2.21.0
[PATCH v10 01/12] hw/block/nvme: Add Commands Supported and Effects log
This log page becomes necessary to implement to allow checking for Zone Append command support in Zoned Namespace Command Set. This commit adds the code to report this log page for NVM Command Set only. The parts that are specific to zoned operation will be added later in the series. All incoming admin and i/o commands are now only processed if their corresponding support bits are set in this log. This provides an easy way to control what commands to support and what not to depending on set CC.CSS. Signed-off-by: Dmitry Fomichev Reviewed-by: Niklas Cassel --- hw/block/nvme-ns.h| 1 + include/block/nvme.h | 19 + hw/block/nvme.c | 96 +++ hw/block/trace-events | 1 + 4 files changed, 108 insertions(+), 9 deletions(-) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index 83734f4606..ea8c2f785d 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -29,6 +29,7 @@ typedef struct NvmeNamespace { int32_t bootindex; int64_t size; NvmeIdNs id_ns; +const uint32_t *iocs; NvmeNamespaceParams params; } NvmeNamespace; diff --git a/include/block/nvme.h b/include/block/nvme.h index 8a46d9cf01..f62cc90d49 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -745,10 +745,27 @@ enum NvmeSmartWarn { NVME_SMART_FAILED_VOLATILE_MEDIA = 1 << 4, }; +typedef struct NvmeEffectsLog { +uint32_tacs[256]; +uint32_tiocs[256]; +uint8_t resv[2048]; +} NvmeEffectsLog; + +enum { +NVME_CMD_EFF_CSUPP = 1 << 0, +NVME_CMD_EFF_LBCC = 1 << 1, +NVME_CMD_EFF_NCC= 1 << 2, +NVME_CMD_EFF_NIC= 1 << 3, +NVME_CMD_EFF_CCC= 1 << 4, +NVME_CMD_EFF_CSE_MASK = 3 << 16, +NVME_CMD_EFF_UUID_SEL = 1 << 19, +}; + enum NvmeLogIdentifier { NVME_LOG_ERROR_INFO = 0x01, NVME_LOG_SMART_INFO = 0x02, NVME_LOG_FW_SLOT_INFO = 0x03, +NVME_LOG_CMD_EFFECTS= 0x05, }; typedef struct QEMU_PACKED NvmePSD { @@ -861,6 +878,7 @@ enum NvmeIdCtrlFrmw { enum NvmeIdCtrlLpa { NVME_LPA_NS_SMART = 1 << 0, +NVME_LPA_CSE = 1 << 1, NVME_LPA_EXTENDED = 1 << 2, }; @@ -1060,6 +1078,7 @@ static inline void _nvme_check_size(void) 
QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64); QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512); QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512); +QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16); diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 065e763e4f..702f7cc2e3 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -111,6 +111,28 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = { [NVME_TIMESTAMP]= NVME_FEAT_CAP_CHANGE, }; +static const uint32_t nvme_cse_acs[256] = { +[NVME_ADM_CMD_DELETE_SQ]= NVME_CMD_EFF_CSUPP, +[NVME_ADM_CMD_CREATE_SQ]= NVME_CMD_EFF_CSUPP, +[NVME_ADM_CMD_GET_LOG_PAGE] = NVME_CMD_EFF_CSUPP, +[NVME_ADM_CMD_DELETE_CQ]= NVME_CMD_EFF_CSUPP, +[NVME_ADM_CMD_CREATE_CQ]= NVME_CMD_EFF_CSUPP, +[NVME_ADM_CMD_IDENTIFY] = NVME_CMD_EFF_CSUPP, +[NVME_ADM_CMD_ABORT]= NVME_CMD_EFF_CSUPP, +[NVME_ADM_CMD_SET_FEATURES] = NVME_CMD_EFF_CSUPP, +[NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP, +[NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP, +}; + +static const uint32_t nvme_cse_iocs_none[256]; + +static const uint32_t nvme_cse_iocs_nvm[256] = { +[NVME_CMD_FLUSH]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, +[NVME_CMD_WRITE_ZEROES] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, +[NVME_CMD_WRITE]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, +[NVME_CMD_READ] = NVME_CMD_EFF_CSUPP, +}; + static void nvme_process_sq(void *opaque); static uint16_t nvme_cid(NvmeRequest *req) @@ -1022,10 +1044,6 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_io_cmd(nvme_cid(req), nsid, nvme_sqid(req), req->cmd.opcode, nvme_io_opc_str(req->cmd.opcode)); -if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_ADMIN_ONLY) { -return NVME_INVALID_OPCODE | NVME_DNR; -} - if (!nvme_nsid_valid(n, nsid)) { return NVME_INVALID_NSID | NVME_DNR; } @@ -1035,6 +1053,11 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) return 
NVME_INVALID_FIELD | NVME_DNR; } +if (!(req->ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) { +trace_pci_nvme_err_invalid_opc(req->cmd.opcode); +return NVME_INVALID_OPCODE | NVME_DNR; +} + switch (req->cmd.opcode) { case NVME_CMD_FLUSH: return nvme_flush(n, req); @@ -1044,8 +1067,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) case
[PATCH v10 12/12] hw/block/nvme: Document zoned parameters in usage text
Added brief descriptions of the new device properties that are now available to users to configure features of Zoned Namespace Command Set in the emulator. This patch is for documentation only, no functionality change. Signed-off-by: Dmitry Fomichev Reviewed-by: Niklas Cassel --- hw/block/nvme.c | 47 ++- 1 file changed, 42 insertions(+), 5 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index e82e3be821..6043f95116 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -9,7 +9,7 @@ */ /** - * Reference Specs: http://www.nvmexpress.org, 1.2, 1.1, 1.0e + * Reference Specs: http://www.nvmexpress.org, 1.4, 1.3, 1.2, 1.1, 1.0e * * https://nvmexpress.org/developers/nvme-specification/ */ @@ -22,8 +22,9 @@ * [pmrdev=,] \ * max_ioqpairs=, \ * aerl=, aer_max_queued=, \ - * mdts= - * -device nvme-ns,drive=,bus=bus_name,nsid= + * mdts=,zoned.append_size_limit= \ + * -device nvme-ns,drive=,bus=,nsid=,\ + * zoned= * * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at * offset 0 in BAR2 and supports only WDS, RDS and SQS for now. @@ -41,14 +42,50 @@ * ~~ * - `aerl` * The Asynchronous Event Request Limit (AERL). Indicates the maximum number - * of concurrently outstanding Asynchronous Event Request commands suppoert + * of concurrently outstanding Asynchronous Event Request commands support * by the controller. This is a 0's based value. * * - `aer_max_queued` * This is the maximum number of events that the device will enqueue for - * completion when there are no oustanding AERs. When the maximum number of + * completion when there are no outstanding AERs. When the maximum number of * enqueued events are reached, subsequent events will be dropped. * + * - `zoned.append_size_limit` + * The maximum I/O size in bytes that is allowed in Zone Append command. + * The default is 128KiB. 
Since internally this value is maintained as + * ZASL = log2(<append size limit> / <page size>), some values assigned + * to this property may be rounded down and result in a lower maximum ZA + * data size being in effect. By setting this property to 0, users can make + * ZASL to be equal to MDTS. This property only affects zoned namespaces. + * + * Setting `zoned` to true selects the Zoned Command Set for the namespace. + * In this case, the following namespace properties are available to configure + * zoned operation: + * zoned.zsze= + * The number may be followed by K, M, G as in kilo-, mega- or giga-. + * + * zoned.zcap= + * The value 0 (default) forces zone capacity to be the same as zone + * size. The value of this property may not exceed zone size. + * + * zoned.descr_ext_size= + * This value needs to be specified in 64B units. If it is zero, + * namespace(s) will not support zone descriptor extensions. + * + * zoned.max_active= + * The default value means there is no limit to the number of + * concurrently active zones. + * + * zoned.max_open= + * The default value means there is no limit to the number of + * concurrently open zones. + * + * zoned.offline_zones= + * + * zoned.rdonly_zones= + * + * zoned.cross_zone_read= + * Setting this property to true enables Read Across Zone Boundaries. */ #include "qemu/osdep.h" -- 2.21.0
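The round-down behavior that the comment warns about is easy to see with a tiny stand-alone helper. This is only a sketch: `effective_za_limit` and its arguments are illustrative names, not the emulator's actual identifiers, and it assumes ZASL counts power-of-two multiples of the page size.

```c
#include <stdint.h>

/* Effective Zone Append data limit for a requested byte limit, assuming
 * the limit is stored as ZASL = log2(<limit> / <page size>) and any
 * remainder is therefore rounded down to a power-of-two page multiple. */
uint32_t effective_za_limit(uint32_t requested, uint32_t page_size)
{
    uint32_t zasl = 0;

    /* Find the largest zasl such that page_size << zasl <= requested. */
    while (((uint64_t)page_size << (zasl + 1)) <= requested) {
        zasl++;
    }
    return page_size << zasl;
}
```

For example, requesting a 200 KiB limit with a 4 KiB page size yields ZASL = 5, i.e. an effective Zone Append limit of 128 KiB.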
[PATCH v10 06/12] hw/block/nvme: Support allocated CNS command variants
From: Niklas Cassel Many CNS commands have "allocated" command variants. These include a namespace as long as it is allocated; that is, a namespace is included regardless of whether it is active (attached) or not. While these commands are optional (they are mandatory for controllers supporting the namespace attachment command), our QEMU implementation is more complete by actually providing support for these CNS values. However, since our QEMU model currently does not support the namespace attachment command, these new allocated CNS commands will return the same result as the active CNS command variants. In NVMe, a namespace is active if it exists and is attached to the controller. Add a new Boolean namespace flag, "attached", to provide the most basic namespace attachment support. The default value for this new flag is true. Also, implement the logic in the new CNS values to include/exclude namespaces based on this new property. The only thing missing is hooking up the actual Namespace Attachment command opcode, which will allow a user to toggle the "attached" flag per namespace. The reason for not hooking up this command completely is that the NVMe specification requires the namespace management command to be supported if the namespace attachment command is supported.
Signed-off-by: Niklas Cassel Signed-off-by: Dmitry Fomichev Reviewed-by: Keith Busch --- hw/block/nvme-ns.h | 1 + include/block/nvme.h | 20 +++ hw/block/nvme-ns.c | 1 + hw/block/nvme.c | 46 +--- 4 files changed, 49 insertions(+), 19 deletions(-) diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h index d795e44bab..2d9cd29d07 100644 --- a/hw/block/nvme-ns.h +++ b/hw/block/nvme-ns.h @@ -31,6 +31,7 @@ typedef struct NvmeNamespace { int64_t size; NvmeIdNs id_ns; const uint32_t *iocs; +bool attached; uint8_t csi; NvmeNamespaceParams params; diff --git a/include/block/nvme.h b/include/block/nvme.h index af23514713..394db19022 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -806,14 +806,18 @@ typedef struct QEMU_PACKED NvmePSD { #define NVME_IDENTIFY_DATA_SIZE 4096 enum NvmeIdCns { -NVME_ID_CNS_NS= 0x00, -NVME_ID_CNS_CTRL = 0x01, -NVME_ID_CNS_NS_ACTIVE_LIST= 0x02, -NVME_ID_CNS_NS_DESCR_LIST = 0x03, -NVME_ID_CNS_CS_NS = 0x05, -NVME_ID_CNS_CS_CTRL = 0x06, -NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, -NVME_ID_CNS_IO_COMMAND_SET= 0x1c, +NVME_ID_CNS_NS= 0x00, +NVME_ID_CNS_CTRL = 0x01, +NVME_ID_CNS_NS_ACTIVE_LIST= 0x02, +NVME_ID_CNS_NS_DESCR_LIST = 0x03, +NVME_ID_CNS_CS_NS = 0x05, +NVME_ID_CNS_CS_CTRL = 0x06, +NVME_ID_CNS_CS_NS_ACTIVE_LIST = 0x07, +NVME_ID_CNS_NS_PRESENT_LIST = 0x10, +NVME_ID_CNS_NS_PRESENT= 0x11, +NVME_ID_CNS_CS_NS_PRESENT_LIST= 0x1a, +NVME_ID_CNS_CS_NS_PRESENT = 0x1b, +NVME_ID_CNS_IO_COMMAND_SET= 0x1c, }; typedef struct QEMU_PACKED NvmeIdCtrl { diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c index c0362426cc..e191ef9be0 100644 --- a/hw/block/nvme-ns.c +++ b/hw/block/nvme-ns.c @@ -42,6 +42,7 @@ static void nvme_ns_init(NvmeNamespace *ns) id_ns->nsze = cpu_to_le64(nvme_ns_nlbas(ns)); ns->csi = NVME_CSI_NVM; +ns->attached = true; /* no thin provisioning */ id_ns->ncap = id_ns->nsze; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index bb82dd9975..7495cdb5ef 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -1236,6 +1236,7 @@ static uint16_t 
nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, uint32_t trans_len; NvmeNamespace *ns; time_t current_ms; +int i; if (off >= sizeof(smart)) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -1246,10 +1247,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len, if (!ns) { return NVME_INVALID_NSID | NVME_DNR; } -nvme_set_blk_stats(ns, &smart); } else { -int i; - for (i = 1; i <= n->num_namespaces; i++) { ns = nvme_ns(n, i); if (!ns) { @@ -1552,7 +1550,8 @@ static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } -static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, + bool only_active) { NvmeNamespace *ns; NvmeIdentify *c = (NvmeIdentify *)&req->cmd; @@ -1569,11 +1568,16 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req) return nvme_rpt_empty_id_struct(n, req); } +if (only_active && !ns->attached) { +return nvme_rpt_empty_id_struct(n, req); +} + return nvme_dma(n, (uint8_t *)&ns->id_ns,
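The include/exclude rule described in the commit message boils down to a small predicate. As a stand-alone sketch (the struct and names below are illustrative, not the actual QEMU types):

```c
#include <stdbool.h>

/* Minimal model of the namespace state the patch introduces: a
 * namespace is "allocated" if it exists, and "active" only if it is
 * also attached to the controller. */
typedef struct {
    bool exists;    /* allocated */
    bool attached;  /* active */
} NsModel;

/* Whether a namespace is reported for a given CNS variant: active
 * variants require attachment, allocated variants do not. */
bool ns_visible(const NsModel *ns, bool only_active)
{
    if (!ns->exists) {
        return false;
    }
    return only_active ? ns->attached : true;
}
```

Since the patch defaults `attached` to true, both CNS variants currently report the same set of namespaces, exactly as the commit message notes.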
[PATCH v10 00/12] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
v9 -> v10: - Correctly check for MDTS in Zone Management Receive handler. - Change Klaus' "Reviewed-by" email in UUID patch. v8 -> v9: - Move the modifications to "include/block/nvme.h" made to introduce ZNS-related definitions to a separate patch. - Add a new struct, NvmeZonedResult, along the same lines as the existing NvmeAerResult, to carry Zone Append LBA returned to the host. Now, there is no need to modify NvmeCqe struct except renaming DW1 field from "rsvd" to "dw1". - Add check for MDTS in Zone Management Receive handler. - Remove checks for ns->attached since the value of this flag is always true for now. - Rebase to the current qemu-nvme/nvme-next branch. v7 -> v8: - Move refactoring commits to the front of the series. - Remove "attached" and "fill_pattern" device properties. - Only close open zones upon subsystem shutdown, not when CC.EN flag is set to 0. Avoid looping through all zones by iterating through lists of open and closed zones. - Improve bulk processing of zones aka zoned operations with "all" flag set. Avoid looping through the entire zone array for all zone operations except Offline Zone. - Prefix ZNS-related property names with "zoned.". The "zoned" Boolean property is retained to turn on zoned command set as it is much more intuitive and user-friendly compared to setting a magic number value to csi property. - Address review comments. - Remove unused trace events. v6 -> v7: - Introduce ns->iocs initialization function earlier in the series, in CSE Log patch. - Set NVM iocs for zoned namespaces when CC.CSS is set to NVME_CC_CSS_NVM. - Clean up code in CSE log handler. v5 -> v6: - Remove zoned state persistence code. Replace position-independent zone lists with QTAILQs. - Close all open zones upon clearing of the controller. This is a similar procedure to the one previously performed upon powering up with zone persistence.
- Squash NS Types and ZNS triplets of commits to keep definitions and trace event definitions together with the implementation code. - Move namespace UUID generation to a separate patch. Add the new "uuid" property as suggested by Klaus. - Rework Commands and Effects patch to make sure that the log is always in sync with the actual set of commands supported. - Add two refactoring commits at the end of the series to optimize read and write i/o path. - Incorporate feedback from Keith, Klaus and Niklas: * fix rebase errors in nvme_identify_ns_descr_list() * remove unnecessary code from nvme_write_bar() * move csi to NvmeNamespace and use it from the beginning in NSTypes patch * change zone read processing to cover all corner cases with RAZB=1 * sync w_ptr and d.wp in case of a i/o error at the preceding zone * reword the commit message in active/inactive patch with the new text from Niklas * correct dlfeat reporting depending on the fill pattern set * add more checks for "attached" n/s parameter to prevent i/o and get/set features on inactive namespaces * Use DEFINE_PROP_SIZE and DEFINE_PROP_SIZE32 for zone size/capacity and ZASL respectively * Improve zone size and capacity validation * Correctly report NSZE v4 -> v5: - Rebase to the current qemu-nvme. - Use HostMemoryBackendFile as the backing storage for persistent zone metadata. - Fix the issue with filling the valid data in the next zone if RAZB is enabled. v3 -> v4: - Fix bugs introduced in v2/v3 for QD > 1 operation. Now, all writes to a zone happen at the new write pointer variable, zone->w_ptr, that is advanced right after submitting the backend i/o. The existing zone->d.wp variable is updated upon the successful write completion and it is used for zone reporting. Some code has been split from nvme_finalize_zoned_write() function to a new function, nvme_advance_zone_wp(). - Make the code compile under mingw. Switch to using QEMU API for mmap/msync, i.e. memory_region...(). 
Since mmap is not available in mingw (even though there is mman-win32 library available on Github), conditional compilation is added around these calls to avoid undefined symbols under mingw. A better fix would be to add stub functions to softmmu/memory.c for the case when CONFIG_POSIX is not defined, but such change is beyond the scope of this patchset and it can be made in a separate patch. - Correct permission mask used to open zone metadata file. - Fold "Define 64 bit cqe.result" patch into ZNS commit. - Use clz64/clz32 instead of defining nvme_ilog2() function. - Simplify rpt_empty_id_struct() code, move nvme_fill_data() back to ZNS patch. - Fix a power-on processing bug. - Rename NVME_CMD_ZONE_APND to NVME_CMD_ZONE_APPEND. - Make the list of review comments addressed in v2 of the series (see below). v2 -> v3: - Moved nvme_fill_data() function to the NSTypes patch as it is now used there to output
[PATCH v10 03/12] hw/block/nvme: Separate read and write handlers
With ZNS support in place, the majority of code in nvme_rw() has become read- or write-specific. Move these parts to two separate handlers, nvme_read() and nvme_write() to make the code more readable and to remove multiple is_write checks that so far existed in the i/o path. This is a refactoring patch, no change in functionality. Signed-off-by: Dmitry Fomichev Reviewed-by: Niklas Cassel Acked-by: Klaus Jensen --- hw/block/nvme.c | 91 ++- hw/block/trace-events | 3 +- 2 files changed, 67 insertions(+), 27 deletions(-) diff --git a/hw/block/nvme.c b/hw/block/nvme.c index ed3f38f01d..770e42a066 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -953,6 +953,54 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req) return NVME_NO_COMPLETE; } +static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req) +{ +NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; +NvmeNamespace *ns = req->ns; +uint64_t slba = le64_to_cpu(rw->slba); +uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; +uint64_t data_size = nvme_l2b(ns, nlb); +uint64_t data_offset; +BlockBackend *blk = ns->blkconf.blk; +uint16_t status; + +trace_pci_nvme_read(nvme_cid(req), nvme_nsid(ns), nlb, data_size, slba); + +status = nvme_check_mdts(n, data_size); +if (status) { +trace_pci_nvme_err_mdts(nvme_cid(req), data_size); +goto invalid; +} + +status = nvme_check_bounds(n, ns, slba, nlb); +if (status) { +trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze); +goto invalid; +} + +status = nvme_map_dptr(n, data_size, req); +if (status) { +goto invalid; +} + +data_offset = nvme_l2b(ns, slba); + +block_acct_start(blk_get_stats(blk), &req->acct, data_size, + BLOCK_ACCT_READ); +if (req->qsg.sg) { +req->aiocb = dma_blk_read(blk, &req->qsg, data_offset, + BDRV_SECTOR_SIZE, nvme_rw_cb, req); +} else { +req->aiocb = blk_aio_preadv(blk, data_offset, &req->iov, 0, +nvme_rw_cb, req); +} +return NVME_NO_COMPLETE; + +invalid: +block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_READ); +return status | NVME_DNR; +} + static uint16_t
nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; @@ -978,22 +1026,19 @@ static uint16_t nvme_write_zeroes(NvmeCtrl *n, NvmeRequest *req) return NVME_NO_COMPLETE; } -static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) +static uint16_t nvme_write(NvmeCtrl *n, NvmeRequest *req) { NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd; NvmeNamespace *ns = req->ns; -uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; uint64_t slba = le64_to_cpu(rw->slba); - +uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1; uint64_t data_size = nvme_l2b(ns, nlb); -uint64_t data_offset = nvme_l2b(ns, slba); -enum BlockAcctType acct = req->cmd.opcode == NVME_CMD_WRITE ? -BLOCK_ACCT_WRITE : BLOCK_ACCT_READ; +uint64_t data_offset; BlockBackend *blk = ns->blkconf.blk; uint16_t status; -trace_pci_nvme_rw(nvme_cid(req), nvme_io_opc_str(rw->opcode), - nvme_nsid(ns), nlb, data_size, slba); +trace_pci_nvme_write(nvme_cid(req), nvme_io_opc_str(rw->opcode), + nvme_nsid(ns), nlb, data_size, slba); status = nvme_check_mdts(n, data_size); if (status) { @@ -1012,29 +1057,22 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req) goto invalid; } -block_acct_start(blk_get_stats(blk), &req->acct, data_size, acct); +data_offset = nvme_l2b(ns, slba); + +block_acct_start(blk_get_stats(blk), &req->acct, data_size, + BLOCK_ACCT_WRITE); if (req->qsg.sg) { -if (acct == BLOCK_ACCT_WRITE) { -req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, - BDRV_SECTOR_SIZE, nvme_rw_cb, req); -} else { -req->aiocb = dma_blk_read(blk, &req->qsg, data_offset, - BDRV_SECTOR_SIZE, nvme_rw_cb, req); -} +req->aiocb = dma_blk_write(blk, &req->qsg, data_offset, + BDRV_SECTOR_SIZE, nvme_rw_cb, req); } else { -if (acct == BLOCK_ACCT_WRITE) { -req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, - nvme_rw_cb, req); -} else { -req->aiocb = blk_aio_preadv(blk, data_offset, &req->iov, 0, -nvme_rw_cb, req); -} +req->aiocb = blk_aio_pwritev(blk, data_offset, &req->iov, 0, + nvme_rw_cb, req); } return NVME_NO_COMPLETE; invalid:
-block_acct_invalid(blk_get_stats(ns->blkconf.blk), acct); -return status; +block_acct_invalid(blk_get_stats(blk), BLOCK_ACCT_WRITE); +return status | NVME_DNR;
RE: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set
> -Original Message- > From: Niklas Cassel > Sent: Friday, November 6, 2020 6:59 AM > To: Dmitry Fomichev > Cc: Keith Busch ; Klaus Jensen > ; Kevin Wolf ; Philippe > Mathieu-Daudé ; Max Reitz ; > Maxim Levitsky ; Fam Zheng ; > Alistair Francis ; Matias Bjorling > ; Damien Le Moal ; > qemu-bl...@nongnu.org; qemu-devel@nongnu.org > Subject: Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace > Command Set > > On Thu, Nov 05, 2020 at 11:53:38AM +0900, Dmitry Fomichev wrote: > > The emulation code has been changed to advertise NVM Command Set > when > > "zoned" device property is not set (default) and Zoned Namespace > > Command Set otherwise. > > > > Define values and structures that are needed to support Zoned > > Namespace Command Set (NVMe TP 4053) in PCI NVMe controller > emulator. > > Define trace events where needed in newly introduced code. > > > > In order to improve scalability, all open, closed and full zones > > are organized in separate linked lists. Consequently, almost all > > zone operations don't require scanning of the entire zone array > > (which potentially can be quite large) - it is only necessary to > > enumerate one or more zone lists. > > > > Handlers for three new NVMe commands introduced in Zoned Namespace > > Command Set specification are added, namely for Zone Management > > Receive, Zone Management Send and Zone Append. > > > > Device initialization code has been extended to create a proper > > configuration for zoned operation using device properties. > > > > Read/Write command handler is modified to only allow writes at the > > write pointer if the namespace is zoned. For Zone Append command, > > writes implicitly happen at the write pointer and the starting write > > pointer value is returned as the result of the command. Write Zeroes > > handler is modified to add zoned checks that are identical to those > > done as a part of Write flow. 
> > > > Subsequent commits in this series add ZDE support and checks for > > active and open zone limits. > > > > Signed-off-by: Niklas Cassel > > Signed-off-by: Hans Holmberg > > Signed-off-by: Ajay Joshi > > Signed-off-by: Chaitanya Kulkarni > > Signed-off-by: Matias Bjorling > > Signed-off-by: Aravind Ramesh > > Signed-off-by: Shin'ichiro Kawasaki > > Signed-off-by: Adam Manzanares > > Signed-off-by: Dmitry Fomichev > > Reviewed-by: Niklas Cassel > > --- > > hw/block/nvme-ns.h| 54 +++ > > hw/block/nvme.h | 8 + > > hw/block/nvme-ns.c| 173 > > hw/block/nvme.c | 971 > +- > > hw/block/trace-events | 18 +- > > 5 files changed, 1209 insertions(+), 15 deletions(-) > > > > (snip) > > > +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest > *req) > > +{ > > +NvmeCmd *cmd = (NvmeCmd *)&req->cmd; > > +NvmeNamespace *ns = req->ns; > > +/* cdw12 is zero-based number of dwords to return. Convert to bytes > */ > > +uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2; > > +uint32_t dw13 = le32_to_cpu(cmd->cdw13); > > +uint32_t zone_idx, zra, zrasf, partial; > > +uint64_t max_zones, nr_zones = 0; > > +uint16_t ret; > > +uint64_t slba; > > +NvmeZoneDescr *z; > > +NvmeZone *zs; > > +NvmeZoneReportHeader *header; > > +void *buf, *buf_p; > > +size_t zone_entry_sz; > > + > > +req->status = NVME_SUCCESS; > > + > > +ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx); > > +if (ret) { > > +return ret; > > +} > > + > > +zra = dw13 & 0xff; > > +if (zra != NVME_ZONE_REPORT) { > > +return NVME_INVALID_FIELD | NVME_DNR; > > +} > > + > > +zrasf = (dw13 >> 8) & 0xff; > > +if (zrasf > NVME_ZONE_REPORT_OFFLINE) { > > +return NVME_INVALID_FIELD | NVME_DNR; > > +} > > + > > +if (data_size < sizeof(NvmeZoneReportHeader)) { > > +return NVME_INVALID_FIELD | NVME_DNR; > > +} > > + > > +ret = nvme_map_dptr(n, data_size, req); > > nvme_map_dptr() call was not here in v8 patch set. > > On v7 I commented that you were missing a call to nvme_check_mdts().
> I think you still need to call nvme_check_mdts in order to verify > that data_size < mdts, no? Ugh, I've added nvme_map_dptr() instead of nvme_check_mdts() :o Will send the corrected version shortly... Cheers, DF > > This function already has a call to nvme_dma(). nvme_dma() already > calls nvme_map_dptr(). > If you use nvme_dma(), you cannot use nvme_map_dptr(). > > It will call nvme_map_addr() (which calls qemu_sglist_add()) on the > same buffer twice, causing the qsg->size to be twice what the user > sent in. Which will cause the: > > if (unlikely(residual)) { > > check in nvme_dma() to fail. > > > Looking at nvme_read()/nvme_write(), they use nvme_map_dptr() > (without any call to nvme_dma()), and then use dma_blk_read() or > dma_blk_write(). (and they both call nvme_check_mdts()) > > > Kind regards, > Niklas > > +if (ret) { > > +return
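The failure mode Niklas describes — mapping the same buffer twice so the accounted size doubles and the residual check trips — can be modeled in a few lines. The names below are illustrative; the real code uses QEMUSGList, nvme_map_dptr() and nvme_dma().

```c
#include <stddef.h>
#include <stdbool.h>

/* Toy scatter-gather list: only tracks the total mapped size. */
typedef struct {
    size_t size;
} SgModel;

/* Each map call adds to the accounted size, like qemu_sglist_add(). */
void sg_map_buffer(SgModel *sg, size_t len)
{
    sg->size += len;
}

/* The residual check: the transfer only succeeds if the mapped size
 * matches the requested transfer length exactly. */
bool dma_residual_ok(const SgModel *sg, size_t transfer_len)
{
    return sg->size == transfer_len;
}
```

Mapping a 4096-byte buffer once passes the check; running the mapping step a second time — as happens when nvme_map_dptr() is followed by nvme_dma() on the same request — leaves the accounted size at 8192 and the residual check fails.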
Re: nbd: bitmap_to_extents() calls nbd_extent_array_add() without checking return value: coverity false positive?
On Fri, 6 Nov 2020 at 20:36, Eric Blake wrote: > > On 11/6/20 11:22 AM, Peter Maydell wrote: > > Hi; Coverity's "you usually check the return value of this function > > but you didn't do that here" heuristic has fired on the code in > > nbd/server.c:bitmap_to_extents() -- the function nbd_extent_array_add() > > is called five times in server.c, and the return value is checked > > in four of those, but not in the final call at the end of > > bitmap_to_extents(). (CID 1436125.) > > > > Is this a false positive, or should the caller be handling an > > error here ? > > False positive, but I don't mind tweaking the code to silence Coverity. > This should do it; let me know if I should turn it into a formal patch. > > diff --git i/nbd/server.c w/nbd/server.c > index d145e1a69083..377698a2ce85 100644 > --- i/nbd/server.c > +++ w/nbd/server.c > @@ -2128,9 +2128,8 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap, > } > } > > -if (!full) { > -/* last non dirty extent */ > -nbd_extent_array_add(es, end - start, 0); > +if (!full && nbd_extent_array_add(es, end - start, 0) < 0) { > +/* last non dirty extent, not a problem if array is now full */ > } > > bdrv_dirty_bitmap_unlock(bitmap); Hmm; that looks a little odd but I guess it's a bit more documentative of the intent. Up to you whether you want to submit it as a patch or not I guess :-) thanks -- PMM
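Folding the call into a condition, as Eric's tweak above does, is one of two common ways to tell a static analyzer that a return value is deliberately ignored. A stand-alone sketch of both styles (the helper is illustrative, not the actual nbd function):

```c
/* Stand-in for a usually-checked function: returns -1 once the array
 * is full, 0 otherwise. */
int extent_array_add(int *count, int max)
{
    if (*count >= max) {
        return -1;
    }
    (*count)++;
    return 0;
}

void add_last_extent(int *count, int max)
{
    /* Style 1: fold the call into a condition with an empty body,
     * documenting that a "full" result is acceptable here. */
    if (extent_array_add(count, max) < 0) {
        /* array full: fine, the trailing extent is simply dropped */
    }

    /* Style 2: discard the result with an explicit void cast. */
    (void)extent_array_add(count, max);
}
```

Both keep the code's intent visible to a reader while satisfying "unchecked return value" heuristics; which one a checker accepts varies by tool.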
Re: [PATCH v3 01/41] tcg: Enhance flush_icache_range with separate data pointer
On 11/6/20 12:31 PM, Alex Bennée wrote: >> +/* Flush the dcache at RW, and the icache at RX, as necessary. */ >> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t >> len) >> { >> -__builtin___clear_cache((char *)start, (char *)stop); >> +/* TODO: Copy this from gcc to avoid 4 loops instead of 2. */ > > Why not do it now? Do you really want to be including that into a patch that touches all host backends? I do it later, in a change by itself. > I take it i386 is just too primitive to care about flushing things? Or the reverse. ;-) r~
Re: [Virtio-fs] [PATCH] virtiofsd: Use --thread-pool-size=0 to mean no thread pool
On Fri, Nov 06, 2020 at 08:33:50PM +, Venegas Munoz, Jose Carlos wrote: > Hi Vivek, > > I have tested with Kata 1.12-alpha0, and the results seem to be better for the > fio config I am tracking. > > The fio config does randrw: > > fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio > --bs=4k --iodepth=64 --size=200M --readwrite=randrw --rwmixread=75 > Hi Carlos, Thanks for the testing. So basically two conclusions from your tests. - for virtiofs, --thread-pool-size=0 is performing better as compared to --thread-pool-size=1 as well as --thread-pool-size=64. Approximately 35-40% better. - virtio-9p is still approximately 30% better than virtiofs --thread-pool-size=0. Per the analysis I had done, this particular workload (mixed read and write) is bad with virtiofs because after every write we are invalidating attrs and cache so next read ends up fetching attrs again. I had posted patches to gain some of the performance. https://lore.kernel.org/linux-fsdevel/20200929185015.gg220...@redhat.com/ But I got the feedback to look into implementing file leases instead. Anyway, good to know that --thread-pool-size=0 narrows the performance gap substantially (more than half) for this workload. I will look into identifying further bottlenecks. Given that in my tests most of the workloads benefited from --thread-pool-size=0, maybe this can be the default and user can opt-in for a thread pool (--thread-pool-size=64) once it is known that a particular workload benefits from it. David, Stefan, WDYT?
Thanks Vivek > - I can see better results with this patch > - 9pfs is still better in the case of Kata because of the use of: > https://github.com/kata-containers/packaging/blob/stable-1.12/qemu/patches/5.0.x/0001-9p-removing-coroutines-of-9p-to-increase-the-I-O-per.patch > > Results: > ./fio-results-run_virtiofs_tread_pool_0 >READ: bw=42.8MiB/s (44.8MB/s), 42.8MiB/s-42.8MiB/s (44.8MB/s-44.8MB/s), > io=150MiB (157MB), run=3507-3507msec > WRITE: bw=14.3MiB/s (14.9MB/s), 14.3MiB/s-14.3MiB/s (14.9MB/s-14.9MB/s), > io=49.0MiB (52.4MB), run=3507-3507msec > ./fio-results-run_9pfs >READ: bw=55.1MiB/s (57.8MB/s), 55.1MiB/s-55.1MiB/s (57.8MB/s-57.8MB/s), > io=150MiB (157MB), run=2722-2722msec > WRITE: bw=18.4MiB/s (19.3MB/s), 18.4MiB/s-18.4MiB/s (19.3MB/s-19.3MB/s), > io=49.0MiB (52.4MB), run=2722-2722msec > ./fio-results-run_virtiofs_tread_pool_1 >READ: bw=34.5MiB/s (36.1MB/s), 34.5MiB/s-34.5MiB/s (36.1MB/s-36.1MB/s), > io=150MiB (157MB), run=4354-4354msec > WRITE: bw=11.5MiB/s (12.0MB/s), 11.5MiB/s-11.5MiB/s (12.0MB/s-12.0MB/s), > io=49.0MiB (52.4MB), run=4354-4354msec > ./fio-results-run_virtiofs_tread_pool_64 >READ: bw=32.3MiB/s (33.8MB/s), 32.3MiB/s-32.3MiB/s (33.8MB/s-33.8MB/s), > io=150MiB (157MB), run=4648-4648msec > WRITE: bw=10.8MiB/s (11.3MB/s), 10.8MiB/s-10.8MiB/s (11.3MB/s-11.3MB/s), > io=49.0MiB (52.4MB), run=4648-4648msec > > Next: > - run https://github.com/rhvgoyal/virtiofs-tests for tread_pool_0, > tread_pool_64, and 9pfs > - Test > https://lore.kernel.org/linux-fsdevel/20201009181512.65496-1-vgo...@redhat.com/ > > All the testing for kata is based in: > https://github.com/jcvenegas/mrunner/blob/master/scripts/bash_workloads/build-qemu-and-run-fio-test.sh > > I ran this using an azure VM. > > Cheers, > Carlos > > > On 05/11/20 13:53, "Vivek Goyal" wrote: > > On Thu, Nov 05, 2020 at 02:44:16PM -0500, Vivek Goyal wrote: > > Right now we create a thread pool and main thread hands over the request > > to thread in thread pool to process. 
Number of threads in thread pool > > can be managed by option --thread-pool-size. > > > > There is a chance that in case of some workloads, we might get better > > performance if we don't hand over the request to a different thread > > and process in the context of thread receiving the request. > > > > To implement that, redefine the meaning of --thread-pool-size=0 to > > mean that no thread pool is used. Instead process the request in > > the context of thread receiving request from the queue. > > > > I can't think how --thread-pool-size=0 is useful and hence using > > that. If it is already useful somehow, I could look at defining > > a new option say "--no-thread-pool". > > > > I think this patch will be used more as a debug help to do comparison > > when it is more efficient to not hand over the requests to a > > thread pool. > > I ran virtiofs-tests to compare --thread-pool-size=0 and > --thread-pool-size=64. And results seem to be all over the place. In > some cases thread pool seems to perform better and in other cases > no-thread-pool seems to perform better. > > But in general it looks like, except for the case of the libaio workload, > no-thread-pool is helping. > > Thanks > Vivek > > NAMEWORKLOADBandwidth IOPS
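The dispatch policy under discussion — `--thread-pool-size=0` meaning "process in the receiving thread" — reduces to a branch like the following. This is a sketch with illustrative names, not virtiofsd's actual API; the toy pool only counts handoffs instead of queuing work to real workers.

```c
#include <stddef.h>

typedef void (*req_handler)(void *req);

typedef struct {
    size_t size;        /* the --thread-pool-size value */
    size_t handoffs;    /* requests handed to pool workers */
    size_t inline_runs; /* requests run in the receiving thread */
} PoolModel;

void dispatch_request(PoolModel *pool, req_handler fn, void *req)
{
    if (pool->size == 0) {
        pool->inline_runs++;
        fn(req);            /* no handoff: run where we received it */
    } else {
        pool->handoffs++;
        fn(req);            /* a real pool would queue this for a worker */
    }
}

/* Example handler: counts invocations via the request pointer. */
void count_handler(void *req)
{
    (*(int *)req)++;
}
```

The trade-off the thread measures is exactly this branch: inline processing avoids the handoff and wakeup cost, while a pool allows concurrent handling when individual requests block.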
Re: [PATCH v2 00/44] Make qdev static property API usable by any QOM type
On Fri, Nov 06, 2020 at 10:50:19AM -0500, Eduardo Habkost wrote: > On Fri, Nov 06, 2020 at 10:45:11AM +0100, Kevin Wolf wrote: > > Am 04.11.2020 um 16:59 hat Eduardo Habkost geschrieben: > > > This series refactor the qdev property code so the static > > > property system can be used by any QOM type. As an example, at > > > the end of the series some properties in TYPE_MACHINE are > > > converted to static properties to demonstrate the new API. > > > > > > Changes v1 -> v2 > > > > > > > > > * Rename functions and source files to call the new feature > > > "field property" instead of "static property" > > > > > > * Change the API signature from an array-based interface similar > > > to device_class_set_props() to a object_property_add()-like > > > interface. > > > > > > This means instead of doing this: > > > > > > object_class_property_add_static_props(oc, my_array_of_props); > > > > > > properties are registered like this: > > > > > > object_class_property_add_field(oc, "property-name" > > > PROP_XXX(MyState, my_field), > > > prop_allow_set_always); > > > > > > where PROP_XXX is a PROP_* macro like PROP_STRING, PROP_BOOL, > > > etc. > > > > In comparison, I really like the resulting code from the array based > > interface in v1 better. > > > > I think it's mostly for two reasons: First, the array is much more > > compact and easier to read. And maybe even more importantly, you know > > it's static data and only static data. The function based interface can > > be mixed with other code or the parameter list can contain calls to > > other functions with side effects, so there are a lot more opportunities > > for surprises. > > This is a really good point, and I strongly agree with it. > Letting code do funny tricks at runtime is one of the reasons QOM > properties became hard to introspect. 
> > What I didn't like about the v1 interface is that there is still a
> > separate object_class_property_set_description() for each property, but
> > I think that could have been fixed by moving the description to the
> > definitions in the array, too.
>
> This would be very easy to implement.

This was implemented at:
https://gitlab.com/ehabkost/qemu/-/commits/work/qdev-make-generic

This is the interface I'd like to submit as v3:

static Property machine_props[] = {
    DEFINE_PROP_STRING("kernel", MachineState, kernel_filename,
                       .description = "Linux kernel image file"),
    DEFINE_PROP_STRING("initrd", MachineState, initrd_filename,
                       .description = "Linux initial ramdisk file"),
    DEFINE_PROP_STRING("append", MachineState, kernel_cmdline,
                       .description = "Linux kernel command line"),
    DEFINE_PROP_STRING("dtb", MachineState, dtb,
                       .description = "Linux kernel device tree file"),
    DEFINE_PROP_STRING("dumpdtb", MachineState, dumpdtb,
                       .description = "Dump current dtb to a file and quit"),
    DEFINE_PROP_STRING("dt-compatible", MachineState, dt_compatible,
                       .description = "Overrides the \"compatible\" "
                                      "property of the dt root node"),
    DEFINE_PROP_STRING("firmware", MachineState, firmware,
                       .description = "Firmware image"),
    DEFINE_PROP_STRING("memory-backend", MachineState, ram_memdev_id,
                       .description = "ID of memory backend object"),
    DEFINE_PROP_BOOL("dump-guest-core", MachineState, dump_guest_core, true,
                     .description = "Include guest memory in a core dump"),
    DEFINE_PROP_BOOL("mem-merge", MachineState, mem_merge, true,
                     .description = "Enable/disable memory merge support"),
    DEFINE_PROP_BOOL("graphics", MachineState, enable_graphics, true,
                     .description = "Set on/off to enable/disable graphics emulation"),
    DEFINE_PROP_BOOL("suppress-vmdesc", MachineState, suppress_vmdesc, false,
                     .description = "Set on to disable self-describing migration"),
    DEFINE_PROP_UINT32("phandle-start", MachineState, phandle_start, 0,
                       .description = "The first phandle ID we may generate dynamically"),
    DEFINE_PROP_END_OF_LIST(),
};

static void machine_class_init(ObjectClass *oc, void *data)
{
    ...
    object_class_add_field_properties(oc, machine_props,
                                      prop_allow_set_always);
    ...
}

> > On Fri, Oct 30, 2020 at 06:10:34PM +0100, Paolo Bonzini wrote:
> > > On 29/10/20 23:02, Eduardo Habkost wrote:
> > > > +static Property machine_props[] = {
> > > > +    DEFINE_PROP_STRING("kernel", MachineState, kernel_filename),
> > > > +    DEFINE_PROP_STRING("initrd", MachineState, initrd_filename),
> > > > +    DEFINE_PROP_STRING("append", MachineState, kernel_cmdline),
> > > > +    DEFINE_PROP_STRING("dtb", MachineState, dtb),
> > > > +
[PATCH] meson: always include contrib/libvhost-user
libvhost-user is needed when CONFIG_LINUX is set. The CONFIG_VHOST_USER
check in meson.build is incorrect. In fact, no explicit check is needed
since this dependency is not built by default. If something declares a
dependency on libvhost-user then it will be built, otherwise it won't be
built (i.e. on non-Linux hosts).

This fixes ./configure --disable-vhost-user && make.

Fixes: bc15e44cb2191bbb2318878acdf5038134e56394 ("configure: introduce --enable-vhost-user-blk-server")
Reported-by: Philippe Mathieu-Daudé
Reported-by: Michael S. Tsirkin
Signed-off-by: Stefan Hajnoczi
---
 meson.build | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index f5175010df..b473620321 100644
--- a/meson.build
+++ b/meson.build
@@ -1450,11 +1450,7 @@ trace_events_subdirs += [
   'util',
 ]
 
-vhost_user = not_found
-if 'CONFIG_VHOST_USER' in config_host
-  subdir('contrib/libvhost-user')
-endif
-
+subdir('contrib/libvhost-user')
 subdir('qapi')
 subdir('qobject')
 subdir('stubs')
-- 
2.28.0
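The "not built by default" behaviour Stefan relies on can be sketched in Meson terms; the target and variable names below are illustrative placeholders, not the actual contents of contrib/libvhost-user/meson.build:

```meson
# Sketch: a library that Meson only builds when some other target
# actually pulls in the dependency, so unconditionally entering the
# subdir is harmless on hosts where nothing links against it.
libvhost_user = static_library('vhost-user',
                               files('libvhost-user.c'),
                               build_by_default: false)
vhost_user = declare_dependency(link_with: libvhost_user)
```

A consumer such as the vhost-user-blk export then lists `vhost_user` among its dependencies, and only that listing triggers the build.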
Re: nbd: bitmap_to_extents() calls nbd_extent_array_add() without checking return value: coverity false positive?
On 11/6/20 11:22 AM, Peter Maydell wrote:
> Hi; Coverity's "you usually check the return value of this function
> but you didn't do that here" heuristic has fired on the code in
> nbd/server.c:bitmap_to_extents() -- the function nbd_extent_array_add()
> is called five times in server.c, and the return value is checked
> in four of those, but not in the final call at the end of
> bitmap_to_extents(). (CID 1436125.)
>
> Is this a false positive, or should the caller be handling an
> error here ?

False positive, but I don't mind tweaking the code to silence Coverity.
This should do it; let me know if I should turn it into a formal patch.

diff --git i/nbd/server.c w/nbd/server.c
index d145e1a69083..377698a2ce85 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -2128,9 +2128,8 @@ static void bitmap_to_extents(BdrvDirtyBitmap *bitmap,
         }
     }
 
-    if (!full) {
-        /* last non dirty extent */
-        nbd_extent_array_add(es, end - start, 0);
+    if (!full && nbd_extent_array_add(es, end - start, 0) < 0) {
+        /* last non dirty extent, not a problem if array is now full */
     }
 
     bdrv_dirty_bitmap_unlock(bitmap);

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.  +1-919-301-3226
Virtualization: qemu.org | libvirt.org
Re: [Virtio-fs] [PATCH] virtiofsd: Use --thread-pool-size=0 to mean no thread pool
Hi Vivek,

I have tested with Kata 1.12-alpha0; the results seem better for the fio
config I am tracking. The fio config does randrw:

fio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=200M --readwrite=randrw --rwmixread=75

- I can see better results with this patch
- 9pfs is still better in the case of Kata because of the use of:
  https://github.com/kata-containers/packaging/blob/stable-1.12/qemu/patches/5.0.x/0001-9p-removing-coroutines-of-9p-to-increase-the-I-O-per.patch

Results:

./fio-results-run_virtiofs_tread_pool_0
   READ: bw=42.8MiB/s (44.8MB/s), 42.8MiB/s-42.8MiB/s (44.8MB/s-44.8MB/s), io=150MiB (157MB), run=3507-3507msec
  WRITE: bw=14.3MiB/s (14.9MB/s), 14.3MiB/s-14.3MiB/s (14.9MB/s-14.9MB/s), io=49.0MiB (52.4MB), run=3507-3507msec

./fio-results-run_9pfs
   READ: bw=55.1MiB/s (57.8MB/s), 55.1MiB/s-55.1MiB/s (57.8MB/s-57.8MB/s), io=150MiB (157MB), run=2722-2722msec
  WRITE: bw=18.4MiB/s (19.3MB/s), 18.4MiB/s-18.4MiB/s (19.3MB/s-19.3MB/s), io=49.0MiB (52.4MB), run=2722-2722msec

./fio-results-run_virtiofs_tread_pool_1
   READ: bw=34.5MiB/s (36.1MB/s), 34.5MiB/s-34.5MiB/s (36.1MB/s-36.1MB/s), io=150MiB (157MB), run=4354-4354msec
  WRITE: bw=11.5MiB/s (12.0MB/s), 11.5MiB/s-11.5MiB/s (12.0MB/s-12.0MB/s), io=49.0MiB (52.4MB), run=4354-4354msec

./fio-results-run_virtiofs_tread_pool_64
   READ: bw=32.3MiB/s (33.8MB/s), 32.3MiB/s-32.3MiB/s (33.8MB/s-33.8MB/s), io=150MiB (157MB), run=4648-4648msec
  WRITE: bw=10.8MiB/s (11.3MB/s), 10.8MiB/s-10.8MiB/s (11.3MB/s-11.3MB/s), io=49.0MiB (52.4MB), run=4648-4648msec

Next:
- run https://github.com/rhvgoyal/virtiofs-tests for tread_pool_0, tread_pool_64, and 9pfs
- Test https://lore.kernel.org/linux-fsdevel/20201009181512.65496-1-vgo...@redhat.com/

All the testing for Kata is based on:
https://github.com/jcvenegas/mrunner/blob/master/scripts/bash_workloads/build-qemu-and-run-fio-test.sh

I ran this using an Azure VM.
Cheers,
Carlos

On 05/11/20 13:53, "Vivek Goyal" wrote:

On Thu, Nov 05, 2020 at 02:44:16PM -0500, Vivek Goyal wrote:
> Right now we create a thread pool and main thread hands over the request
> to thread in thread pool to process. Number of threads in thread pool
> can be managed by option --thread-pool-size.
>
> There is a chance that in case of some workloads, we might get better
> performance if we don't handover the request to a different thread
> and process in the context of thread receiving the request.
>
> To implement that, redefine the meaning of --thread-pool-size=0 to
> mean that don't use a thread pool. Instead process the request in
> the context of thread receiving request from the queue.
>
> I can't think how --thread-pool-size=0 is useful and hence using
> that. If it is already useful somehow, I could look at defining
> a new option say "--no-thread-pool".
>
> I think this patch will be used more as a debug help to do comparison
> when it is more efficient to not hand over the requests to a
> thread pool.

I ran virtiofs-tests to compare --thread-pool-size=0 and
--thread-pool-size=64. And results seem to be all over the place. In some
cases thread pool seems to perform better and in other cases
no-thread-pool seems to perform better. But in general it looks like that
except for the case of the libaio workload, no-thread-pool is helping.
Thanks
Vivek

NAME            WORKLOAD              Bandwidth  IOPS
thread-pool     seqread-psync         682.4mb    170.6k
no-thread-pool  seqread-psync         679.3mb    169.8k
thread-pool     seqread-psync-multi   2415.9mb   603.9k
no-thread-pool  seqread-psync-multi   2528.5mb   632.1k
thread-pool     seqread-mmap          591.7mb    147.9k
no-thread-pool  seqread-mmap          595.6mb    148.9k
thread-pool     seqread-mmap-multi    2195.3mb   548.8k
no-thread-pool  seqread-mmap-multi    2286.1mb   571.5k
thread-pool     seqread-libaio        329.1mb    82.2k
no-thread-pool  seqread-libaio        271.5mb    67.8k
thread-pool     seqread-libaio-multi  1387.1mb   346.7k
no-thread-pool  seqread-libaio-multi  1508.2mb   377.0k
thread-pool     randread-psync        59.0mb     14.7k
no-thread-pool  randread-psync        78.5mb     19.6k
thread-pool     randread-psync-multi  226.4mb    56.6k
no-thread-pool  randread-psync-multi  289.2mb    72.3k
thread-pool
Re: [PATCH v3 01/41] tcg: Enhance flush_icache_range with separate data pointer
Richard Henderson writes:

> We are shortly going to have a split rw/rx jit buffer. Depending
> on the host, we need to flush the dcache at the rw data pointer and
> flush the icache at the rx code pointer.
>
> For now, the two passed pointers are identical, so there is no
> effective change in behaviour.
>
> Signed-off-by: Richard Henderson
> ---
>  tcg/aarch64/tcg-target.h     |  9 +++--
>  tcg/arm/tcg-target.h         |  8 ++--
>  tcg/i386/tcg-target.h        |  3 ++-
>  tcg/mips/tcg-target.h        |  8 ++--
>  tcg/ppc/tcg-target.h         |  2 +-
>  tcg/riscv/tcg-target.h       |  8 ++--
>  tcg/s390/tcg-target.h        |  3 ++-
>  tcg/sparc/tcg-target.h       |  8 +---
>  tcg/tci/tcg-target.h         |  3 ++-
>  softmmu/physmem.c            |  9 -
>  tcg/tcg.c                    |  6 --
>  tcg/aarch64/tcg-target.c.inc |  2 +-
>  tcg/mips/tcg-target.c.inc    |  2 +-
>  tcg/ppc/tcg-target.c.inc     | 21 +++--
>  tcg/sparc/tcg-target.c.inc   |  4 ++--
>  15 files changed, 64 insertions(+), 32 deletions(-)
>
> diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
> index 663dd0b95e..d0a6a059b7 100644
> --- a/tcg/aarch64/tcg-target.h
> +++ b/tcg/aarch64/tcg-target.h
> @@ -148,9 +148,14 @@ typedef enum {
>  #define TCG_TARGET_DEFAULT_MO (0)
>  #define TCG_TARGET_HAS_MEMORY_BSWAP 1
>
> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
>  {
> -    __builtin___clear_cache((char *)start, (char *)stop);
> +    /* TODO: Copy this from gcc to avoid 4 loops instead of 2. */

Why not do it now?

> +    if (rw != rx) {
> +        __builtin___clear_cache((char *)rw, (char *)(rw + len));
> +    }
> +    __builtin___clear_cache((char *)rx, (char *)(rx + len));
>  }

> -static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
> +/* Flush the dcache at RW, and the icache at RX, as necessary. */
> +static inline void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
>  {
>  }

I take it i386 is just too primitive to care about flushing things?

Anyway:

Reviewed-by: Alex Bennée

-- 
Alex Bennée
Re: [PULL 33/43] target/ppc: convert xxspltw to vector operations
On 11/6/20 10:47 AM, Peter Maydell wrote:
>> +#ifndef HOST_WORDS_BIG_ENDIAN
>> +    bofs ^= 8 | 4;
>> +#endif
>
> The ifdef is HOST_WORDS_BIGENDIAN without the
> third underscore, so this XOR operation will be
> done on both little and big-endian hosts. Ho hum.
> Should the ifndef line be fixed...

This. I once had a patch set that changed all of our endian tests from
defined/undef to true/false, so that we could detect errors like this.
Perhaps I'll try to recreate it next dev cycle...

r~
Re: [PATCH v1] docs/devel: Add VFIO device migration documentation
On Sat, 7 Nov 2020 00:27:46 +0530 Kirti Wankhede wrote: > On 11/6/2020 2:56 AM, Alex Williamson wrote: > > On Fri, 6 Nov 2020 02:22:11 +0530 > > Kirti Wankhede wrote: > > > >> On 11/6/2020 12:41 AM, Alex Williamson wrote: > >>> On Fri, 6 Nov 2020 00:29:36 +0530 > >>> Kirti Wankhede wrote: > >>> > On 11/4/2020 6:15 PM, Alex Williamson wrote: > > On Wed, 4 Nov 2020 13:25:40 +0530 > > Kirti Wankhede wrote: > > > >> On 11/4/2020 1:57 AM, Alex Williamson wrote: > >>> On Wed, 4 Nov 2020 01:18:12 +0530 > >>> Kirti Wankhede wrote: > >>> > On 10/30/2020 12:35 AM, Alex Williamson wrote: > > On Thu, 29 Oct 2020 23:11:16 +0530 > > Kirti Wankhede wrote: > > > > > > +System memory dirty pages tracking > +-- > + > +A ``log_sync`` memory listener callback is added to mark system > memory pages > >>> > >>> s/is added to mark/marks those/ > >>> > +as dirty which are used for DMA by VFIO device. Dirty pages > bitmap is queried > >>> > >>> s/by/by the/ > >>> s/Dirty/The dirty/ > >>> > +per container. All pages pinned by vendor driver through > vfio_pin_pages() > >>> > >>> s/by/by the/ > >>> > +external API have to be marked as dirty during migration. When > there are CPU > +writes, CPU dirty page tracking can identify dirtied pages, but > any page pinned > +by vendor driver can also be written by device. There is > currently no device > >>> > >>> s/by/by the/ (x2) > >>> > +which has hardware support for dirty page tracking. So all > pages which are > +pinned by vendor driver are considered as dirty. > +Dirty pages are tracked when device is in stop-and-copy phase > because if pages > +are marked dirty during pre-copy phase and content is > transfered from source to > +destination, there is no way to know newly dirtied pages from > the point they > +were copied earlier until device stops. To avoid repeated copy > of same content, > +pinned pages are marked dirty only during stop-and-copy phase. 
> >> > >> > >>> Let me take a quick stab at rewriting this paragraph (not sure if > >>> I > >>> understood it correctly): > >>> > >>> "Dirty pages are tracked when the device is in the stop-and-copy > >>> phase. > >>> During the pre-copy phase, it is not possible to distinguish a > >>> dirty > >>> page that has been transferred from the source to the destination > >>> from > >>> newly dirtied pages, which would lead to repeated copying of the > >>> same > >>> content. Therefore, pinned pages are only marked dirty during the > >>> stop-and-copy phase." ? > >>> > >> > >> I think above rephrase only talks about repeated copying in > >> pre-copy > >> phase. Used "copied earlier until device stops" to indicate both > >> pre-copy and stop-and-copy till device stops. > > > > > > Now I'm confused, I thought we had abandoned the idea that we can > > only > > report pinned pages during stop-and-copy. Doesn't the device needs > > to > > expose its dirty memory footprint during the iterative phase > > regardless > > of whether that causes repeat copies? If QEMU iterates and sees > > that > > all memory is still dirty, it may have transferred more data, but it > > can actually predict if it can achieve its downtime tolerances. > > Which > > is more important, less data transfer or predictability? Thanks, > > > > Even if QEMU copies and transfers content of all sys mem pages during > pre-copy (worst case with IOMMU backed mdev device when its vendor > driver is not smart to pin pages explicitly and all sys mem pages are > marked dirty), then also its prediction about downtime tolerance will > not be correct, because during stop-and-copy again all pages need to > be > copied as device can write to any of those pinned pages. > >>> > >>> I think you're only reiterating my point. If QEMU copies all of guest > >>> memory during the iterative phase and each time it sees that all >
[PATCH-for-5.2] configure: Check vhost-user is available to build vhost-user-blk-server
Check vhost-user is available to build vhost-user-blk-server to fix:

$ ../configure \
    --disable-vhost-user --enable-vhost-user-blk-server && \
  make qemu-nbd
...
[505/505] Linking target qemu-nbd
FAILED: qemu-nbd
cc -o qemu-nbd qemu-nbd.p/qemu-nbd.c.o -Wl,--as-needed -Wl,--no-undefined -pie -Wl,--whole-archive libblockdev.fa libblock.fa libcrypto.fa libauthz.fa libqom.fa libio.fa -Wl,--no-whole-archive -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -m64 -fstack-protector-strong -Wl,--start-group libqemuutil.a libblockdev.fa libblock.fa libcrypto.fa libauthz.fa libqom.fa libio.fa @block.syms -lgio-2.0 -lgobject-2.0 -lglib-2.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0 -pthread -lgnutls -lutil -lm -lgthread-2.0 -lglib-2.0 -lbz2 -lgthread-2.0 -lglib-2.0 -lssh -lrbd -lrados -lcurl -lxml2 -lzstd -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid -laio /usr/lib64/libz.so -L/usr/lib64/iscsi -liscsi -lnettle -lgnutls -lpam -Wl,--end-group
/usr/bin/ld: libblockdev.fa(block_export_vhost-user-blk-server.c.o): in function `vu_blk_process_vq':
block/export/vhost-user-blk-server.c:203: undefined reference to `vu_get_queue'
/usr/bin/ld: block/export/vhost-user-blk-server.c:208: undefined reference to `vu_queue_pop'
/usr/bin/ld: libblockdev.fa(block_export_vhost-user-blk-server.c.o): in function `vu_blk_queue_set_started':
block/export/vhost-user-blk-server.c:228: undefined reference to `vu_get_queue'
/usr/bin/ld: libblockdev.fa(block_export_vhost-user-blk-server.c.o): in function `vu_blk_req_complete':
block/export/vhost-user-blk-server.c:55: undefined reference to `vu_queue_push'
/usr/bin/ld: block/export/vhost-user-blk-server.c:56: undefined reference to `vu_queue_notify'
/usr/bin/ld: libblockdev.fa(block_export_vhost-user-blk-server.c.o): in function `vu_blk_queue_set_started':
block/export/vhost-user-blk-server.c:229: undefined reference to `vu_set_queue_handler'
/usr/bin/ld: libqemuutil.a(util_vhost-user-server.c.o): in function `vu_client_trip':
util/vhost-user-server.c:176: undefined reference to `vu_dispatch'
/usr/bin/ld: util/vhost-user-server.c:180: undefined reference to `vu_deinit'
/usr/bin/ld: libqemuutil.a(util_vhost-user-server.c.o): in function `vu_accept':
util/vhost-user-server.c:291: undefined reference to `vu_init'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
make: *** [Makefile:171: run-ninja] Error 1

Fixes: bc15e44cb21 ("configure: introduce --enable-vhost-user-blk-server")
Signed-off-by: Philippe Mathieu-Daudé
---
 configure | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/configure b/configure
index 805f7791503..5079ebb416a 100755
--- a/configure
+++ b/configure
@@ -2390,6 +2390,9 @@ fi
 
 # libvhost-user is Linux-only
 test "$vhost_user_blk_server" = "" && vhost_user_blk_server=$linux
+if test "$vhost_user_blk_server" = "yes" && test "$vhost_user" = "no"; then
+  error_exit "--enable-vhost-user-blk-server requires --enable-vhost-user"
+fi
 if test "$vhost_user_blk_server" = "yes" && test "$linux" = "no"; then
   error_exit "--enable-vhost-user-blk-server is only available on Linux"
 fi
-- 
2.26.2
Re: [PATCH v2 04/11] qapi/introspect.py: add assertions and casts
On Mon, Oct 26, 2020 at 03:42:44PM -0400, John Snow wrote:
> This is necessary to keep mypy passing in the next patch when we add
> preliminary type hints. It will be removed shortly.
>
> Signed-off-by: John Snow
> ---

Reviewed-by: Cleber Rosa
Re: [PATCH v1] docs/devel: Add VFIO device migration documentation
On 11/6/2020 2:56 AM, Alex Williamson wrote:

On Fri, 6 Nov 2020 02:22:11 +0530 Kirti Wankhede wrote:

On 11/6/2020 12:41 AM, Alex Williamson wrote:

On Fri, 6 Nov 2020 00:29:36 +0530 Kirti Wankhede wrote:

On 11/4/2020 6:15 PM, Alex Williamson wrote:

On Wed, 4 Nov 2020 13:25:40 +0530 Kirti Wankhede wrote:

On 11/4/2020 1:57 AM, Alex Williamson wrote:

On Wed, 4 Nov 2020 01:18:12 +0530 Kirti Wankhede wrote:

On 10/30/2020 12:35 AM, Alex Williamson wrote:

On Thu, 29 Oct 2020 23:11:16 +0530 Kirti Wankhede wrote:

+System memory dirty pages tracking
+----------------------------------
+
+A ``log_sync`` memory listener callback is added to mark system memory pages

s/is added to mark/marks those/

+as dirty which are used for DMA by VFIO device. Dirty pages bitmap is queried

s/by/by the/
s/Dirty/The dirty/

+per container. All pages pinned by vendor driver through vfio_pin_pages()

s/by/by the/

+external API have to be marked as dirty during migration. When there are CPU
+writes, CPU dirty page tracking can identify dirtied pages, but any page pinned
+by vendor driver can also be written by device. There is currently no device

s/by/by the/ (x2)

+which has hardware support for dirty page tracking. So all pages which are
+pinned by vendor driver are considered as dirty.
+Dirty pages are tracked when device is in stop-and-copy phase because if pages
+are marked dirty during pre-copy phase and content is transfered from source to
+destination, there is no way to know newly dirtied pages from the point they
+were copied earlier until device stops. To avoid repeated copy of same content,
+pinned pages are marked dirty only during stop-and-copy phase.

Let me take a quick stab at rewriting this paragraph (not sure if I
understood it correctly):

"Dirty pages are tracked when the device is in the stop-and-copy phase.
During the pre-copy phase, it is not possible to distinguish a dirty page
that has been transferred from the source to the destination from newly
dirtied pages, which would lead to repeated copying of the same content.
Therefore, pinned pages are only marked dirty during the stop-and-copy
phase." ?

I think above rephrase only talks about repeated copying in pre-copy
phase. Used "copied earlier until device stops" to indicate both pre-copy
and stop-and-copy till device stops.

Now I'm confused, I thought we had abandoned the idea that we can only
report pinned pages during stop-and-copy. Doesn't the device need to
expose its dirty memory footprint during the iterative phase regardless
of whether that causes repeat copies? If QEMU iterates and sees that all
memory is still dirty, it may have transferred more data, but it can
actually predict if it can achieve its downtime tolerances. Which is more
important, less data transfer or predictability? Thanks,

Even if QEMU copies and transfers content of all sys mem pages during
pre-copy (worst case with IOMMU backed mdev device when its vendor driver
is not smart to pin pages explicitly and all sys mem pages are marked
dirty), then also its prediction about downtime tolerance will not be
correct, because during stop-and-copy again all pages need to be copied
as device can write to any of those pinned pages.

I think you're only reiterating my point. If QEMU copies all of guest
memory during the iterative phase and each time it sees that all memory
is dirty, such as if CPUs or devices (including assigned devices) are
dirtying pages as fast as it copies them (or continuously marks them
dirty), then QEMU can predict that downtime will require copying all
pages.

But as of now there is no way to know if device has dirtied pages during
iterative phase.

This claim doesn't make any sense, pinned pages are considered
persistently dirtied, during the iterative phase and while stopped.

If instead devices don't mark dirty pages until the VM is stopped, then
QEMU might iterate through memory copy and predict a short downtime
because not much memory is dirty, only to be surprised that all of memory
is suddenly dirty. At that point it's too late, the VM is already
stopped, the predicted short downtime takes far longer than expected.
This is exactly why we made the kernel interface mark pinned pages
persistently dirty when it was proposed that we only report pinned pages
once. Thanks,

Since there is no way to know if device dirtied pages during iterative
phase, QEMU should query pinned pages in stop-and-copy phase.

As above, I don't believe this is true.

Whenever there will be hardware support or some software mechanism to
report pages dirtied by device then we will add a capability bit in
migration capability and based on that capability bit qemu/user space app
should decide to query dirty pages in iterative phase.

Yes, we could advertise support
Re: [PULL 33/43] target/ppc: convert xxspltw to vector operations
On Mon, 18 Feb 2019 at 14:31, David Gibson wrote:
>
> From: Richard Henderson
>
> Signed-off-by: Richard Henderson
> Acked-by: David Gibson
> Message-Id: <20190215100058.20015-8-mark.cave-ayl...@ilande.co.uk>
> Signed-off-by: David Gibson

So this is a commit from 18 months back, but I happened to notice an
oddity in it while grepping the code base for something else:

> diff --git a/target/ppc/translate/vsx-impl.inc.c
> b/target/ppc/translate/vsx-impl.inc.c
> index 944fc0608a..0e8cecb00a 100644
> --- a/target/ppc/translate/vsx-impl.inc.c
> +++ b/target/ppc/translate/vsx-impl.inc.c
> @@ -1359,38 +1359,24 @@ static void gen_xxsel(DisasContext * ctx)
>
>  static void gen_xxspltw(DisasContext *ctx)
>  {
> -    TCGv_i64 b, b2;
> -    TCGv_i64 vsr;
> +    int rt = xT(ctx->opcode);
> +    int rb = xB(ctx->opcode);
> +    int uim = UIM(ctx->opcode);
> +    int tofs, bofs;

[...]

> -    tcg_gen_shli_i64(b2, b, 32);
> -    tcg_gen_or_i64(vsr, b, b2);
> -    set_cpu_vsrh(xT(ctx->opcode), vsr);
> -    set_cpu_vsrl(xT(ctx->opcode), vsr);
> +    tofs = vsr_full_offset(rt);
> +    bofs = vsr_full_offset(rb);
> +    bofs += uim << MO_32;
> +#ifndef HOST_WORDS_BIG_ENDIAN
> +    bofs ^= 8 | 4;
> +#endif

The ifdef is HOST_WORDS_BIGENDIAN without the third underscore, so this
XOR operation will be done on both little and big-endian hosts. Should
the ifndef line be fixed, or is this actually a case where the xor really
should be done on all hosts so we should drop the ifndef/endif lines?

> -    tcg_temp_free_i64(vsr);
> -    tcg_temp_free_i64(b);
> -    tcg_temp_free_i64(b2);
> +    tcg_gen_gvec_dup_mem(MO_32, tofs, bofs, 16, 16);
>  }

thanks
-- PMM
Re: [PATCH for-5.2 3/4] hw/net/can/ctucan_core: Handle big-endian hosts
On Friday 06 of November 2020 19:29:27 Philippe Mathieu-Daudé wrote:
> On 11/6/20 6:11 PM, Peter Maydell wrote:
> > The ctucan driver defines types for its registers which are a union
> > of a uint32_t with a struct with bitfields for the individual
> > fields within that register. This is a bad idea, because bitfields
> > aren't portable. The ctu_can_fd_regs.h header works around the
> > most glaring of the portability issues by defining the
> > fields in two different orders depending on the setting of the
> > __LITTLE_ENDIAN_BITFIELD define. However, in ctucan_core.h this
> > is unconditionally set to 1, which is wrong for big-endian hosts.
> >
> > Set it only if HOST_WORDS_BIGENDIAN is not set. There is no need
> > for a "have we defined it already" guard, because the only place
> > that should set it is ctucan_core.h, which has the usual
> > double-inclusion guard.
> >
> > Signed-off-by: Peter Maydell
> > ---
> > Ideally all that bitfield-using code would be rewritten to use
> > extract32 and deposit32 instead, IMHO.
> > ---
> >  hw/net/can/ctucan_core.h | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/hw/net/can/ctucan_core.h b/hw/net/can/ctucan_core.h
> > index f21cb1c5ec3..bbc09ae0678 100644
> > --- a/hw/net/can/ctucan_core.h
> > +++ b/hw/net/can/ctucan_core.h
> > @@ -31,8 +31,7 @@
> >  #include "exec/hwaddr.h"
> >  #include "net/can_emu.h"
> >
> > -
> > -#ifndef __LITTLE_ENDIAN_BITFIELD
> > +#ifndef HOST_WORDS_BIGENDIAN
> >  #define __LITTLE_ENDIAN_BITFIELD 1
> >  #endif
>
> Alternatively s/#ifdef __LITTLE_ENDIAN_BITFIELD/#ifndef
> HOST_WORDS_BIGENDIAN/ the codebase and remove this here...

But then we cannot have the same generated, untouched header file common
to the Linux kernel and QEMU; each uses different defines. Or at least
that was the goal, but after the mainline Linux review we will switch in
the longer run to defines using BIT, GENMASK, FIELD_GET and FIELD_PREP.
But even GENMASK does not seem to be defined in QEMU headers, even though
it is referenced in include/hw/usb/dwc2-regs.h and
include/standard-headers/asm-x86/kvm_para.h.

So the idea of having nice common generated headers which can be checked
for consistency and the right version by diff across the Linux kernel,
QEMU and even userspace tests and other OSes (there with Linux defines
substitution) seems to be only a dream.

Anyway, we will switch to a solution matching the requirements of each
project if one is suggested. But it takes time.

Best wishes,
Pavel
Re: [PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers
On Fri, 6 Nov 2020 at 18:31, Philippe Mathieu-Daudé wrote:
>
> On 11/6/20 6:11 PM, Peter Maydell wrote:
> > Instead of casting an address within a uint8_t array to a
> > uint32_t*, use stl_le_p(). This handles possibly misaligned
> > addresses which would otherwise crash on some hosts.
> >
> > Signed-off-by: Peter Maydell
> > ---
> >  hw/net/can/ctucan_core.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
> > index f2ce978e5ec..e66526efa83 100644
> > --- a/hw/net/can/ctucan_core.c
> > +++ b/hw/net/can/ctucan_core.c
> > @@ -305,8 +305,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, uint64_t val,
> >          addr %= CTUCAN_CORE_TXBUFF_SPAN;
> >          assert(buff_num < CTUCAN_CORE_TXBUF_NUM);
> >          if (addr < sizeof(s->tx_buffer[buff_num].data)) {
> > -            uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + addr);
> > -            *bufp = cpu_to_le32(val);
> > +            stl_le_p(s->tx_buffer[buff_num].data + addr, val);
>
> Out of curiosity, how did you notice? Passing by while reviewing?

I saw it while I was fixing the Coverity issue from patch 1, yes.

thanks
-- PMM
Re: [PATCH for-5.2] hw/mips/boston.c: Fix memory leak in boston_fdt_filter() error-handling paths
On 11/6/20 6:58 PM, Peter Maydell wrote:
> Coverity points out that the error-handling paths in the
> boston_fdt_filter() function don't free the fdt that was allocated.
> Fix the leak by using g_autofree.
>
> Fixes: Coverity CID 1432275
>
> Signed-off-by: Peter Maydell
> ---
>  hw/mips/boston.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers
On 11/6/20 7:31 PM, Philippe Mathieu-Daudé wrote:
> On 11/6/20 6:11 PM, Peter Maydell wrote:
>> Instead of casting an address within a uint8_t array to a
>> uint32_t*, use stl_le_p(). This handles possibly misaligned
>> addresses which would otherwise crash on some hosts.
>>
>> Signed-off-by: Peter Maydell
>> ---
>>  hw/net/can/ctucan_core.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
>> index f2ce978e5ec..e66526efa83 100644
>> --- a/hw/net/can/ctucan_core.c
>> +++ b/hw/net/can/ctucan_core.c
>> @@ -305,8 +305,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, uint64_t val,
>>          addr %= CTUCAN_CORE_TXBUFF_SPAN;
>>          assert(buff_num < CTUCAN_CORE_TXBUF_NUM);
>>          if (addr < sizeof(s->tx_buffer[buff_num].data)) {
>> -            uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + addr);
>> -            *bufp = cpu_to_le32(val);
>> +            stl_le_p(s->tx_buffer[buff_num].data + addr, val);
>
> Out of curiosity, how did you notice? Passing by while reviewing?

$ git grep -P '^\s+\*.*=.*(cpu_to.*|to_cpu)\('|wc -l
82
Re: [PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers
On Fri, 6 Nov 2020 at 18:19, Pavel Pisa wrote:
> On Friday 06 of November 2020 18:11:53 Peter Maydell wrote:
> > Instead of casting an address within a uint8_t array to a
> > uint32_t*, use stl_le_p(). This handles possibly misaligned
> > addresses which would otherwise crash on some hosts.
> >
> > Signed-off-by: Peter Maydell
> > ---
>
> even that I do not like stl_le_p name, because it differs from the Linux
> kernel one. cpu_to_le32 matches. The pointer variant is cpu_to_le32p
> on Linux kernel side, I think. stl is strange name and l for long
> is problematic as well, if it is st32_le_p or st_le32_p I would recognize
> that much easier.

QEMU is not the kernel. We have our own naming conventions and our own
APIs. I agree that the b/w/l/q suffixing is less intuitive than
8/16/32/64, but we have a lot of functions using that convention, and the
API is what it is.

thanks
-- PMM
[PATCH v2 1/1] Fix use after free in vfio_migration_probe
Fixes Coverity issue:
CID 1436126: Memory - illegal accesses (USE_AFTER_FREE)

Fixes: a9e271ec9b36 ("vfio: Add migration region initialization and finalize function")
Signed-off-by: Kirti Wankhede
Reviewed-by: David Edmondson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
---
 hw/vfio/migration.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 3ce285ea395d..55261562d4f3 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -897,8 +897,8 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
         goto add_blocker;
     }
 
-    g_free(info);
     trace_vfio_migration_probe(vbasedev->name, info->index);
+    g_free(info);
     return 0;
 
 add_blocker:
-- 
2.7.0
Re: [PATCH for-5.2 1/4] hw/net/can/ctucan: Don't allow guest to write off end of tx_buffer
Hello Peter,

On Friday 06 of November 2020 19:04:38 Peter Maydell wrote:
> On Fri, 6 Nov 2020 at 17:48, Pavel Pisa wrote:
> > Hello Peter,
> >
> > thanks much for catching the problem and investing time into
> > fixing it. I hope to find time for more review of remarks and Xilinx
> > patches next week. I do not find a reasonable time slot till Sunday.
> > Excuse me. To not block updates, I confirm your changes.
> >
> > On Friday 06 of November 2020 18:11:50 Peter Maydell wrote:
> > > The ctucan device has 4 CAN bus cores, each of which has a set of 20
> > > 32-bit registers for writing the transmitted data. The registers are
> > > however not contiguous; each core's buffers is 0x100 bytes after
> > > the last.
> > >
> > > We got the checks on the address wrong in the ctucan_mem_write()
> > > function:
> > >  * the first "is addr in range at all" check allowed
> > >    addr == CTUCAN_CORE_MEM_SIZE, which is actually the first
> > >    byte off the end of the range
> > >  * the decode of addresses into core-number plus offset in the
> > >    tx buffer for that core failed to check that the offset was
> > >    in range, so the guest could write off the end of the
> > >    tx_buffer[] array
> > >  * the decode had an explicit check for whether the core-number
> > >    was out of range, which is actually impossible given the
> > >    CTUCAN_CORE_MEM_SIZE check and the number of cores.
> > >
> > > Fix the top level check, check the offset, and turn the check
> > > on the core-number into an assertion.
> > >
> > > Fixes: Coverity CID 1432874
> > > Signed-off-by: Peter Maydell
> > > ---
> > >  hw/net/can/ctucan_core.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c
> > > index d20835cd7e9..ea09bf71a0c 100644
> > > --- a/hw/net/can/ctucan_core.c
> > > +++ b/hw/net/can/ctucan_core.c
> > > @@ -303,7 +303,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, uint64_t val,
> > >      DPRINTF("write 0x%02llx addr 0x%02x\n", (unsigned long long)val,
> > >              (unsigned int)addr);
> > >
> > > -    if (addr > CTUCAN_CORE_MEM_SIZE) {
> > > +    if (addr >= CTUCAN_CORE_MEM_SIZE) {
> > >          return;
> > >      }
> >
> > Acked-by: Pavel Pisa
> >
> > > @@ -312,7 +312,8 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, uint64_t val,
> > >      addr -= CTU_CAN_FD_TXTB1_DATA_1;
> > >      buff_num = addr / CTUCAN_CORE_TXBUFF_SPAN;
> > >      addr %= CTUCAN_CORE_TXBUFF_SPAN;
> > > -    if (buff_num < CTUCAN_CORE_TXBUF_NUM) {
> > > +    assert(buff_num < CTUCAN_CORE_TXBUF_NUM);
> >
> > The assert is not necessary. If there is no buffer at that location,
> > then the write has no effect. An assert would check for driver errors,
> > but that is not a problem of QEMU and for sure should not lead to its
> > crash.
>
> We assert() here as a guide to readers of the code that we know
> that buff_num can't possibly be out of range for the array
> access we're about to do: the values of CTUCAN_CORE_MEM_SIZE,
> CTUCAN_CORE_TXBUFF_SPAN, etc, make it mathematically impossible.
> We prefer to assert() that kind of condition rather than having
> an if() test for it.

I understand, but I see it as fully valid to have CTUCAN_CORE_MEM_SIZE
with some reserve and only two buffers populated, which would lead to
"spare" space after the end of the area where buffers are mapped into the
address space. Same for CTUCAN_CORE_TXBUFF_SPAN and CTUCAN_CORE_MSG_MAX_LEN.
There could be a check assert(CTUCAN_CORE_MSG_MAX_LEN <= CTUCAN_CORE_TXBUFF_SPAN) and assert(CTUCAN_CORE_TXBUFF_SPAN * CTUCAN_CORE_TXBUF_NUM + CTU_CAN_FD_TXTB1_DATA_1 <= CTUCAN_CORE_MEM_SIZE), or even some cross-checks against the sizeof of the fields. But all the other combinations are in fact real; they depend on core synthesis parameters. Yes, for core release 2.x they are mostly fixed now. But the option to provide a variant compatible with the actual driver, which allows CTUCAN_CORE_TXBUF_NUM > 4 up to a hard limit of 8, is still left open. Best wishes, Pavel
Re: [PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers
On 11/6/20 6:11 PM, Peter Maydell wrote: > Instead of casting an address within a uint8_t array to a > uint32_t*, use stl_le_p(). This handles possibly misaligned > addresses which would otherwise crash on some hosts. > > Signed-off-by: Peter Maydell > --- > hw/net/can/ctucan_core.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c > index f2ce978e5ec..e66526efa83 100644 > --- a/hw/net/can/ctucan_core.c > +++ b/hw/net/can/ctucan_core.c > @@ -305,8 +305,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, > uint64_t val, > addr %= CTUCAN_CORE_TXBUFF_SPAN; > assert(buff_num < CTUCAN_CORE_TXBUF_NUM); > if (addr < sizeof(s->tx_buffer[buff_num].data)) { > -uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + > addr); > -*bufp = cpu_to_le32(val); > +stl_le_p(s->tx_buffer[buff_num].data + addr, val); Out of curiosity, how did you notice? Passing by while reviewing? Reviewed-by: Philippe Mathieu-Daudé
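The point about misaligned casts deserves a concrete illustration. QEMU's real stl_le_p() lives in include/qemu/bswap.h; the sketch below is a simplified stand-in (not the real implementation) showing why a memcpy-based store is safe where a `uint32_t *` cast is not:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for QEMU's stl_le_p(): store a 32-bit value in
 * little-endian byte order at an arbitrary, possibly misaligned address.
 * memcpy imposes no alignment requirement, whereas dereferencing a
 * uint32_t * obtained by casting a misaligned uint8_t * is undefined
 * behaviour and faults on strict-alignment hosts. */
static inline void my_stl_le_p(void *ptr, uint32_t v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap32(v);   /* produce little-endian bytes on BE hosts */
#endif
    memcpy(ptr, &v, sizeof(v)); /* byte-wise copy, any alignment is fine */
}
```

The `__builtin_bswap32` call is a GCC/Clang builtin; the real QEMU helper uses its own byte-swap machinery.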
Re: [PATCH for-5.2 3/4] hw/net/can/ctucan_core: Handle big-endian hosts
On 11/6/20 6:11 PM, Peter Maydell wrote: > The ctucan driver defines types for its registers which are a union > of a uint32_t with a struct with bitfields for the individual > fields within that register. This is a bad idea, because bitfields > aren't portable. The ctu_can_fd_regs.h header works around the > most glaring of the portability issues by defining the > fields in two different orders depending on the setting of the > __LITTLE_ENDIAN_BITFIELD define. However, in ctucan_core.h this > is unconditionally set to 1, which is wrong for big-endian hosts. > > Set it only if HOST_WORDS_BIGENDIAN is not set. There is no need > for a "have we defined it already" guard, because the only place > that should set it is ctucan_core.h, which has the usual > double-inclusion guard. > > Signed-off-by: Peter Maydell > --- > Ideally all that bitfield-using code would be rewritten to use > extract32 and deposit32 instead, IMHO. > --- > hw/net/can/ctucan_core.h | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/hw/net/can/ctucan_core.h b/hw/net/can/ctucan_core.h > index f21cb1c5ec3..bbc09ae0678 100644 > --- a/hw/net/can/ctucan_core.h > +++ b/hw/net/can/ctucan_core.h > @@ -31,8 +31,7 @@ > #include "exec/hwaddr.h" > #include "net/can_emu.h" > > - > -#ifndef __LITTLE_ENDIAN_BITFIELD > +#ifndef HOST_WORDS_BIGENDIAN > #define __LITTLE_ENDIAN_BITFIELD 1 > #endif Alternatively s/#ifdef __LITTLE_ENDIAN_BITFIELD/#ifndef HOST_WORDS_BIGENDIAN/ the codebase and remove this here... Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH 2/2] qemu-option: warn for short-form boolean options
On 06/11/20 17:49, Markus Armbruster wrote: Deprecate all this, except for -chardev and -spice where it is in wide use. I consider this a misuse of deprecation, to be frank. If something is known to be unused, we just remove it. Deprecation is precisely for things that are used. I'm with Daniel here: let's deprecate this sugar everywhere. Wide use may justify extending the deprecation grace period. Fair enough. However now that I think of it I'd have to remove the coverage of the "feature" in tests, because they'd warn. Paolo
Re: [PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers
On Friday 06 of November 2020 18:11:53 Peter Maydell wrote: > Instead of casting an address within a uint8_t array to a > uint32_t*, use stl_le_p(). This handles possibly misaligned > addresses which would otherwise crash on some hosts. > > Signed-off-by: Peter Maydell > --- > hw/net/can/ctucan_core.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c > index f2ce978e5ec..e66526efa83 100644 > --- a/hw/net/can/ctucan_core.c > +++ b/hw/net/can/ctucan_core.c > @@ -305,8 +305,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, > uint64_t val, addr %= CTUCAN_CORE_TXBUFF_SPAN; > assert(buff_num < CTUCAN_CORE_TXBUF_NUM); > if (addr < sizeof(s->tx_buffer[buff_num].data)) { > -uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + > addr); -*bufp = cpu_to_le32(val); > +stl_le_p(s->tx_buffer[buff_num].data + addr, val); > } > } else { > switch (addr & ~3) { Acked-by: Pavel Pisa even though I do not like the stl_le_p name, because it differs from the Linux kernel one. cpu_to_le32 matches. The pointer variant is cpu_to_le32p on the Linux kernel side, I think. stl is a strange name, and l for long is problematic as well; if it were st32_le_p or st_le32_p I would recognize it much more easily. Best wishes, Pavel Pisa
Re: [PATCH for-5.2 2/4] hw/net/can/ctucan: Avoid unused value in ctucan_send_ready_buffers()
Hello Peter, this one is a little problematic. I understand that you want to have clean code and no warning reports from Coverity. On Friday 06 of November 2020 18:11:51 Peter Maydell wrote: > Coverity points out that in ctucan_send_ready_buffers() we > set buff_st_mask = 0xf << (i * 4) inside the loop, but then > we never use it before overwriting it later. > > The only thing we use the mask for is as part of the code that is > inserting the new buff_st field into tx_status. That is more > comprehensibly written using deposit32(), so do that and drop the > mask variable entirely. > > We also update the buff_st local variable at multiple points > during this function, but nothing can ever see these > intermediate values, so just drop those, write the final > TXT_TOK as a fixed constant value, and collapse the only > remaining set/use of buff_st down into an extract32(). > > Fixes: Coverity CID 1432869 > Signed-off-by: Peter Maydell > --- > hw/net/can/ctucan_core.c | 15 +++ > 1 file changed, 3 insertions(+), 12 deletions(-) > > diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c > index ea09bf71a0c..f2ce978e5ec 100644 > --- a/hw/net/can/ctucan_core.c > +++ b/hw/net/can/ctucan_core.c > @@ -240,8 +240,6 @@ static void ctucan_send_ready_buffers(CtuCanCoreState > *s) uint8_t *pf; > int buff2tx_idx; > uint32_t tx_prio_max; > -unsigned int buff_st; > -uint32_t buff_st_mask; > > if (!s->mode_settings.s.ena) { > return; > } > @@ -256,10 +254,7 @@ static void ctucan_send_ready_buffers(CtuCanCoreState > *s) for (i = 0; i < CTUCAN_CORE_TXBUF_NUM; i++) { > uint32_t prio; > > -buff_st_mask = 0xf << (i * 4); > -buff_st = (s->tx_status.u32 >> (i * 4)) & 0xf; > - > -if (buff_st != TXT_RDY) { > +if (extract32(s->tx_status.u32, i * 4, 4) != TXT_RDY) { > continue; > } It is the right replacement. Initially buff_st kept the location of the bits of the buffer found to be processed later. But after the prioritization of Tx it cannot be used without recomputation. 
> prio = (s->tx_priority.u32 >> (i * 4)) & 0x7; > @@ -271,10 +266,7 @@ static void ctucan_send_ready_buffers(CtuCanCoreState > *s) if (buff2tx_idx == -1) { > break; > } > -buff_st_mask = 0xf << (buff2tx_idx * 4); > -buff_st = (s->tx_status.u32 >> (buff2tx_idx * 4)) & 0xf; Here I would have kept the extracted state in the variable. The actual model is really simplified compared to the real hardware: Tx succeeds in zero time. > int_stat.u32 = 0; > -buff_st = TXT_RDY; This is why the TXT_RDY state immediately changes to TXT_TOK. It works well for the actual simple CAN subsystem implementation. But if we want to implement prioritization of messages on the emulated bus and even simulate real bus latency by delaying the state change and interrupt delivery, then we need to proceed through the TXT_RDY state. If it is a problem for the release, in that you want to have a Coverity-clean source tree, then please leave the line there as a comment or use some trick with (void)buff_st; Or if you prefer, use +s->tx_status.u32 = deposit32(s->tx_status.u32, + buff2tx_idx * 4, 4, TXT_RDY); if it silences Coverity. > pf = s->tx_buffer[buff2tx_idx].data; > ctucan_buff2frame(pf, &frame); > s->status.s.idle = 0; > @@ -283,12 +275,11 @@ static void ctucan_send_ready_buffers(CtuCanCoreState > *s) s->status.s.idle = 1; > s->status.s.txs = 0; > s->tx_fr_ctr.s.tx_fr_ctr_val++; > -buff_st = TXT_TOK; > int_stat.s.txi = 1; > int_stat.s.txbhci = 1; > s->int_stat.u32 |= int_stat.u32 & ~s->int_mask.u32; > -s->tx_status.u32 = (s->tx_status.u32 & ~buff_st_mask) | > -(buff_st << (buff2tx_idx * 4)); > +s->tx_status.u32 = deposit32(s->tx_status.u32, > + buff2tx_idx * 4, 4, TXT_TOK); > } while (1); > } Thanks for your time. I have planned to look at these and attempt to provide a solution which is acceptable but documents our intentions; it is still on my task queue. Pavel
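For reference, the extract32() and deposit32() helpers discussed in this thread come from QEMU's include/qemu/bitops.h. Minimal equivalents — simplified sketches without the range assertions the real ones carry — behave like this on a 4-bits-per-buffer status word:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified versions of QEMU's extract32()/deposit32(), assuming
 * 0 < length <= 32 and start + length <= 32 (the real helpers assert this). */
static inline uint32_t my_extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0U >> (32 - length));
}

static inline uint32_t my_deposit32(uint32_t value, int start, int length,
                                    uint32_t field)
{
    uint32_t mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}
```

With these, reading buffer i's state is `my_extract32(tx_status, i * 4, 4)` and updating it is `tx_status = my_deposit32(tx_status, i * 4, 4, new_state)`, mirroring the patch.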
Re: [Virtio-fs] [qemu-web PATCH v2] Add virtio-fs in OSv overview blog post
* Fotis Xenakis (fo...@windowslive.com) wrote: > This post briefly goes over the main points of virtio-fs and OSv, a > unikernel running under QEMU/KVM and taking advantage of its virtio-fs > implementation. > > Changes since v1: > - Fixed wording and links, as suggested by Thomas Huth. > - Added a short example of virtio-fs usage in OSv. > > Signed-off-by: Fotis Xenakis > +One central point is OSv's support for booting from virtio-fs: this enables > +deploying a modified version or a whole new application **without > rebuilding** > +the image, just by adjusting its root file system contents on the host. Last, > +owing to the DAX window practically providing low-overhead access to the > host's > +page cache, scalability is also expected to excel, with it being a common > +concern due to the potentially high density of unikernels per host. Hi Fotis, Since I'm not used to unikernels, I'm a little confused by this; I'd appreciate some explanation. In your unikernel, does the root filesystem just contain data? I mean being a 'unikernel' aren't all the binaries and support all linked into the kernel itself? Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[PATCH 1/2] docs/fuzz: rST-ify the fuzzing documentation
Signed-off-by: Alexander Bulekov --- MAINTAINERS| 2 +- docs/devel/fuzzing.rst | 236 + docs/devel/fuzzing.txt | 214 - docs/devel/index.rst | 1 + 4 files changed, 238 insertions(+), 215 deletions(-) create mode 100644 docs/devel/fuzzing.rst delete mode 100644 docs/devel/fuzzing.txt diff --git a/MAINTAINERS b/MAINTAINERS index 63223e1183..da1ef68ff1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2517,7 +2517,7 @@ R: Thomas Huth S: Maintained F: tests/qtest/fuzz/ F: scripts/oss-fuzz/ -F: docs/devel/fuzzing.txt +F: docs/devel/fuzzing.rst Register API M: Alistair Francis diff --git a/docs/devel/fuzzing.rst b/docs/devel/fuzzing.rst new file mode 100644 index 00..f19d75ceff --- /dev/null +++ b/docs/devel/fuzzing.rst @@ -0,0 +1,236 @@ + +Fuzzing + + +This document describes the virtual-device fuzzing infrastructure in QEMU and +how to use it to implement additional fuzzers. + +Basics +-- + +Fuzzing operates by passing inputs to an entry point/target function. The +fuzzer tracks the code coverage triggered by the input. Based on these +findings, the fuzzer mutates the input and repeats the fuzzing. + +To fuzz QEMU, we rely on libfuzzer. Unlike other fuzzers such as AFL, libfuzzer +is an *in-process* fuzzer. For the developer, this means that it is their +responsibility to ensure that state is reset between fuzzing-runs. + +Building the fuzzers + + +*NOTE*: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is +much faster, since the page-map has a smaller size. This is due to the fact that +AddressSanitizer maps ~20TB of memory, as part of its detection. This results +in a large page-map, and a much slower ``fork()``. + +To build the fuzzers, install a recent version of clang: +Configure with (substitute the clang binaries with the version you installed). 
+Here, ``--enable-sanitizers`` is optional but it allows us to reliably detect bugs +such as out-of-bounds accesses, use-after-frees, double-frees etc.:: + +CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \ +--enable-sanitizers + +Fuzz targets are built similarly to system targets:: + +make i386-softmmu/fuzz + +This builds ``./i386-softmmu/qemu-fuzz-i386`` + +The first option to this command is: ``--fuzz-target=FUZZ_NAME`` +To list all of the available fuzzers run ``qemu-fuzz-i386`` with no arguments. + +For example:: + +./i386-softmmu/qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz + +Internally, libfuzzer parses all arguments that do not begin with ``"--"``. +Information about these is available by passing ``-help=1`` + +Now the only thing left to do is wait for the fuzzer to trigger potential +crashes. + +Useful libFuzzer flags +-- + +As mentioned above, libFuzzer accepts some arguments. Passing ``-help=1`` will +list the available arguments. In particular, these arguments might be helpful: + +* ``CORPUS_DIR/`` : Specify a directory as the last argument to libFuzzer. + libFuzzer stores each "interesting" input in this corpus directory. The next + time you run libFuzzer, it will read all of the inputs from the corpus, and + continue fuzzing from there. You can also specify multiple directories. + libFuzzer loads existing inputs from all specified directories, but will only + write new ones to the first one specified. + +* ``-max_len=4096`` : specify the maximum byte-length of the inputs libFuzzer + will generate. + +* ``-close_fd_mask={1,2,3}`` : close stdout, stderr, or both. Useful for targets that + trigger many debug/error messages, or create output on the serial console. + +* ``-jobs=4 -workers=4`` : These arguments configure libFuzzer to run 4 fuzzers in + parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only + ``-jobs=N``, libFuzzer automatically spawns a number of workers less than or equal + to half the available CPU cores. 
Replace 4 with a number appropriate for your + machine. Make sure to specify a ``CORPUS_DIR``, which will allow the parallel + fuzzers to share information about the interesting inputs they find. + +* ``-use_value_profile=1`` : For each comparison operation, libFuzzer computes + ``(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12)`` and places this in the + coverage table. Useful for targets with "magic" constants. If Arg1 came from + the fuzzer's input and Arg2 is a magic constant, then each time the Hamming + distance between Arg1 and Arg2 decreases, libFuzzer adds the input to the + corpus. + +* ``-shrink=1`` : Tries to make elements of the corpus "smaller". Might lead to + better coverage performance, depending on the target. + +Note that libFuzzer's exact behavior will depend on the version of +clang and libFuzzer used to build the device fuzzers. + +Generating Coverage Reports +--- + +Code coverage is a crucial
[PATCH 0/2] Fuzzing Documentation Updates
I converted the documentation to reStructuredText and updated build instructions that changed after meson. Alexander Bulekov (2): docs/fuzz: rST-ify the fuzzing documentation docs/fuzz: update fuzzing documentation post-meson MAINTAINERS| 2 +- docs/devel/fuzzing.rst | 236 + docs/devel/fuzzing.txt | 214 - docs/devel/index.rst | 1 + 4 files changed, 238 insertions(+), 215 deletions(-) create mode 100644 docs/devel/fuzzing.rst delete mode 100644 docs/devel/fuzzing.txt -- 2.28.0
[PATCH 2/2] docs/fuzz: update fuzzing documentation post-meson
Signed-off-by: Alexander Bulekov --- docs/devel/fuzzing.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/devel/fuzzing.rst b/docs/devel/fuzzing.rst index f19d75ceff..6096242d99 100644 --- a/docs/devel/fuzzing.rst +++ b/docs/devel/fuzzing.rst @@ -34,16 +34,16 @@ such as out-of-bounds accesses, use-after-frees, double-frees etc.:: Fuzz targets are built similarly to system targets:: -make i386-softmmu/fuzz +make qemu-fuzz-i386 -This builds ``./i386-softmmu/qemu-fuzz-i386`` +This builds ``./qemu-fuzz-i386`` The first option to this command is: ``--fuzz-target=FUZZ_NAME`` To list all of the available fuzzers run ``qemu-fuzz-i386`` with no arguments. For example:: -./i386-softmmu/qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz +./qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz Internally, libfuzzer parses all arguments that do not begin with ``"--"``. Information about these is available by passing ``-help=1`` -- 2.28.0
Re: [PATCH for-5.2 1/4] hw/net/can/ctucan: Don't allow guest to write off end of tx_buffer
On Fri, 6 Nov 2020 at 17:48, Pavel Pisa wrote: > > Hello Peter, > > thanks much for the catching the problem and investing time into > fixing. I hope to find time for more review of remarks and Xilinx > patches next week. I do not find reasonable time slot till Sunday. > Excuse me. To not block updates, I confirm your changes. > > On Friday 06 of November 2020 18:11:50 Peter Maydell wrote: > > The ctucan device has 4 CAN bus cores, each of which has a set of 20 > > 32-bit registers for writing the transmitted data. The registers are > > however not contiguous; each core's buffers is 0x100 bytes after > > the last. > > > > We got the checks on the address wrong in the ctucan_mem_write() > > function: > > * the first "is addr in range at all" check allowed > >addr == CTUCAN_CORE_MEM_SIZE, which is actually the first > >byte off the end of the range > > * the decode of addresses into core-number plus offset in the > >tx buffer for that core failed to check that the offset was > >in range, so the guest could write off the end of the > >tx_buffer[] array > > * the decode had an explicit check for whether the core-number > >was out of range, which is actually impossible given the > >CTUCAN_CORE_MEM_SIZE check and the number of cores. > > > > Fix the top level check, check the offset, and turn the check > > on the core-number into an assertion. 
> > > > Fixes: Coverity CID 1432874 > > Signed-off-by: Peter Maydell > > --- > > hw/net/can/ctucan_core.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c > > index d20835cd7e9..ea09bf71a0c 100644 > > --- a/hw/net/can/ctucan_core.c > > +++ b/hw/net/can/ctucan_core.c > > @@ -303,7 +303,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, > > uint64_t val, DPRINTF("write 0x%02llx addr 0x%02x\n", > > (unsigned long long)val, (unsigned int)addr); > > > > -if (addr > CTUCAN_CORE_MEM_SIZE) { > > +if (addr >= CTUCAN_CORE_MEM_SIZE) { > > return; > > } > > Acked-by: Pavel Pisa > > @@ -312,7 +312,8 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, > > uint64_t val, addr -= CTU_CAN_FD_TXTB1_DATA_1; > > buff_num = addr / CTUCAN_CORE_TXBUFF_SPAN; > > addr %= CTUCAN_CORE_TXBUFF_SPAN; > > -if (buff_num < CTUCAN_CORE_TXBUF_NUM) { > > +assert(buff_num < CTUCAN_CORE_TXBUF_NUM); > > Assert is not necessary. If there is not buffer at that location, > then write has no effect. Assert would check for driver errors, > but it is not a problem of QEMU and for sure should not lead to its > crash. We assert() here as a guide to readers of the code that we know that buff_num can't possibly be out of range for the array access we're about to do: the values of CTUCAN_CORE_MEM_SIZE, CTUCAN_CORE_TXBUFF_SPAN, etc, make it mathematically impossible. We prefer to assert() that kind of condition rather than having an if() test for it. thanks -- PMM
[PATCH for-5.2] hw/mips/boston.c: Fix memory leak in boston_fdt_filter() error-handling paths
Coverity points out that the error-handling paths in the boston_fdt_filter() function don't free the fdt that was allocated. Fix the leak by using g_autofree. Fixes: Coverity CID 1432275 Signed-off-by: Peter Maydell --- hw/mips/boston.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/hw/mips/boston.c b/hw/mips/boston.c index 3356d7a6814..3d40867dc4c 100644 --- a/hw/mips/boston.c +++ b/hw/mips/boston.c @@ -349,11 +349,9 @@ static const void *boston_fdt_filter(void *opaque, const void *fdt_orig, MachineState *machine = s->mach; const char *cmdline; int err; -void *fdt; -size_t fdt_sz, ram_low_sz, ram_high_sz; - -fdt_sz = fdt_totalsize(fdt_orig) * 2; -fdt = g_malloc0(fdt_sz); +size_t ram_low_sz, ram_high_sz; +size_t fdt_sz = fdt_totalsize(fdt_orig) * 2; +g_autofree void *fdt = g_malloc0(fdt_sz); err = fdt_open_into(fdt_orig, fdt, fdt_sz); if (err) { @@ -380,7 +378,7 @@ static const void *boston_fdt_filter(void *opaque, const void *fdt_orig, s->fdt_base = *load_addr; -return fdt; +return g_steal_pointer(&fdt); } static const void *boston_kernel_filter(void *opaque, const void *kernel, -- 2.20.1
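The g_autofree/g_steal_pointer pair used in the patch is GLib sugar over the GCC/Clang cleanup attribute: the pointer is freed automatically on every exit path, and stealing it nulls the variable so the success path can hand ownership to the caller. A dependency-free sketch of the same ownership pattern follows; it assumes a GCC-compatible compiler, and the names are illustrative, not GLib's:

```c
#include <stdlib.h>
#include <string.h>

/* Cleanup handler: free the pointed-to pointer when the variable
 * goes out of scope. */
static void freep(void *p) { free(*(void **)p); }
#define autofree __attribute__((cleanup(freep)))

/* Take ownership: null out the variable so the cleanup frees NULL (a no-op). */
static void *steal_pointer(void **pp)
{
    void *p = *pp;
    *pp = NULL;
    return p;
}

/* Mimics the boston_fdt_filter() shape: early returns leak nothing. */
static char *make_fdt(int fail)
{
    autofree char *fdt = calloc(1, 64);
    if (fail) {
        return NULL;                        /* fdt freed here automatically */
    }
    strcpy(fdt, "devicetree");
    return steal_pointer((void **)&fdt);    /* caller now owns the buffer */
}
```

GLib's real g_steal_pointer() is a macro/inline that avoids the pointer-to-pointer cast shown here.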
Re: [PATCH for-5.2 1/4] hw/net/can/ctucan: Don't allow guest to write off end of tx_buffer
Hello Peter, thanks much for the catching the problem and investing time into fixing. I hope to find time for more review of remarks and Xilinx patches next week. I do not find reasonable time slot till Sunday. Excuse me. To not block updates, I confirm your changes. On Friday 06 of November 2020 18:11:50 Peter Maydell wrote: > The ctucan device has 4 CAN bus cores, each of which has a set of 20 > 32-bit registers for writing the transmitted data. The registers are > however not contiguous; each core's buffers is 0x100 bytes after > the last. > > We got the checks on the address wrong in the ctucan_mem_write() > function: > * the first "is addr in range at all" check allowed >addr == CTUCAN_CORE_MEM_SIZE, which is actually the first >byte off the end of the range > * the decode of addresses into core-number plus offset in the >tx buffer for that core failed to check that the offset was >in range, so the guest could write off the end of the >tx_buffer[] array > * the decode had an explicit check for whether the core-number >was out of range, which is actually impossible given the >CTUCAN_CORE_MEM_SIZE check and the number of cores. > > Fix the top level check, check the offset, and turn the check > on the core-number into an assertion. 
> > Fixes: Coverity CID 1432874 > Signed-off-by: Peter Maydell > --- > hw/net/can/ctucan_core.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c > index d20835cd7e9..ea09bf71a0c 100644 > --- a/hw/net/can/ctucan_core.c > +++ b/hw/net/can/ctucan_core.c > @@ -303,7 +303,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, > uint64_t val, DPRINTF("write 0x%02llx addr 0x%02x\n", > (unsigned long long)val, (unsigned int)addr); > > -if (addr > CTUCAN_CORE_MEM_SIZE) { > +if (addr >= CTUCAN_CORE_MEM_SIZE) { > return; > } Acked-by: Pavel Pisa There is another mistake diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c index d20835cd7e..b90a8e3b76 100644 --- a/hw/net/can/ctucan_core.c +++ b/hw/net/can/ctucan_core.c @@ -418,7 +418,7 @@ uint64_t ctucan_mem_read(CtuCanCoreState *s, hwaddr addr, unsigned size) DPRINTF("read addr 0x%02x ...\n", (unsigned int)addr); -if (addr > CTUCAN_CORE_MEM_SIZE) { +if (addr >= CTUCAN_CORE_MEM_SIZE) { return 0; } But is should be really harmless. > @@ -312,7 +312,8 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, > uint64_t val, addr -= CTU_CAN_FD_TXTB1_DATA_1; > buff_num = addr / CTUCAN_CORE_TXBUFF_SPAN; > addr %= CTUCAN_CORE_TXBUFF_SPAN; > -if (buff_num < CTUCAN_CORE_TXBUF_NUM) { > +assert(buff_num < CTUCAN_CORE_TXBUF_NUM); Assert is not necessary. If there is not buffer at that location, then write has no effect. Assert would check for driver errors, but it is not a problem of QEMU and for sure should not lead to its crash. 
> +if (addr < sizeof(s->tx_buffer[buff_num].data)) { > uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + > addr); *bufp = cpu_to_le32(val); > } So my proposed solution is diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c index d20835cd7e..af30d57cfd 100644 --- a/hw/net/can/ctucan_core.c +++ b/hw/net/can/ctucan_core.c @@ -312,7 +312,9 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, uint64_t val, addr -= CTU_CAN_FD_TXTB1_DATA_1; buff_num = addr / CTUCAN_CORE_TXBUFF_SPAN; addr %= CTUCAN_CORE_TXBUFF_SPAN; -if (buff_num < CTUCAN_CORE_TXBUF_NUM) { +addr &= ~3; +if (buff_num < CTUCAN_CORE_TXBUF_NUM && +addr < CTUCAN_CORE_MSG_MAX_LEN) { uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + addr); *bufp = cpu_to_le32(val); } It ignores writes to unimplemented locations. There is a fix, a workaround, to make writes to unaligned addresses safe. They are not supported by the actual QEMU CTU CAN FD model. Real HW supports them, but the driver never uses unaligned writes nor any size other than 32 bits. Reads support at least accesses of smaller sizes correctly, but not unaligned reads which cross 32-bit boundaries. Again, the actual driver code never uses any size other than 32 bits for reads. Byte access for debug dumps by rdwrmem has been tested and works OK as well. The following proposed correction is a right solution too diff --git a/hw/net/can/ctucan_core.h b/hw/net/can/ctucan_core.h index f21cb1c5ec..ad701c4764 100644 --- a/hw/net/can/ctucan_core.h +++ b/hw/net/can/ctucan_core.h @@ -31,10 +31,11 @@ #include "exec/hwaddr.h" #include "net/can_emu.h" - +#ifndef HOST_WORDS_BIGENDIAN #ifndef __LITTLE_ENDIAN_BITFIELD #define __LITTLE_ENDIAN_BITFIELD 1 #endif +#endif #include "ctu_can_fd_frame.h" #include "ctu_can_fd_regs.h" As for the bitfields use, there is a plan to add an additional mode to generate the registers header file from
[PATCH v2 1/2] quorum: Implement bdrv_co_block_status()
The quorum driver does not implement bdrv_co_block_status() and because of that it always reports to contain data even if all its children are known to be empty. One consequence of this is that if we for example create a quorum with a size of 10GB and we mirror it to a new image the operation will write 10GB of actual zeroes to the destination image wasting a lot of time and disk space. Since a quorum has an arbitrary number of children of potentially different formats there is no way to report all possible allocation status flags in a way that makes sense, so this implementation only reports when a given region is known to contain zeroes (BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA). If all children agree that a region contains zeroes then we can return BDRV_BLOCK_ZERO using the smallest size reported by the children (because all agree that a region of at least that size contains zeroes). If at least one child disagrees we have to return BDRV_BLOCK_DATA. In this case we use the largest of the sizes reported by the children that didn't return BDRV_BLOCK_ZERO (because we know that there won't be an agreement for at least that size). 
Signed-off-by: Alberto Garcia Tested-by: Tao Xu --- block/quorum.c | 49 tests/qemu-iotests/312 | 148 + tests/qemu-iotests/312.out | 67 + tests/qemu-iotests/group | 1 + 4 files changed, 265 insertions(+) create mode 100755 tests/qemu-iotests/312 create mode 100644 tests/qemu-iotests/312.out diff --git a/block/quorum.c b/block/quorum.c index e846a7e892..29cee42705 100644 --- a/block/quorum.c +++ b/block/quorum.c @@ -18,6 +18,7 @@ #include "qemu/module.h" #include "qemu/option.h" #include "block/block_int.h" +#include "block/coroutines.h" #include "block/qdict.h" #include "qapi/error.h" #include "qapi/qapi-events-block.h" @@ -1174,6 +1175,53 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c, | DEFAULT_PERM_UNCHANGED; } +/* + * Each one of the children can report different status flags even + * when they contain the same data, so what this function does is + * return BDRV_BLOCK_ZERO if *all* children agree that a certain + * region contains zeroes, and BDRV_BLOCK_DATA otherwise. + */ +static int coroutine_fn quorum_co_block_status(BlockDriverState *bs, + bool want_zero, + int64_t offset, int64_t count, + int64_t *pnum, int64_t *map, + BlockDriverState **file) +{ +BDRVQuorumState *s = bs->opaque; +int i, ret; +int64_t pnum_zero = count; +int64_t pnum_data = 0; + +for (i = 0; i < s->num_children; i++) { +int64_t bytes; +ret = bdrv_co_common_block_status_above(s->children[i]->bs, NULL, false, +want_zero, offset, count, +&bytes, NULL, NULL, NULL); +if (ret < 0) { +return ret; +} +/* + * Even if all children agree about whether there are zeroes + * or not at @offset they might disagree on the size, so use + * the smallest when reporting BDRV_BLOCK_ZERO and the largest + * when reporting BDRV_BLOCK_DATA. 
+ */ +if (ret & BDRV_BLOCK_ZERO) { +pnum_zero = MIN(pnum_zero, bytes); +} else { +pnum_data = MAX(pnum_data, bytes); +} +} + +if (pnum_data) { +*pnum = pnum_data; +return BDRV_BLOCK_DATA; +} else { +*pnum = pnum_zero; +return BDRV_BLOCK_ZERO; +} +} + static const char *const quorum_strong_runtime_opts[] = { QUORUM_OPT_VOTE_THRESHOLD, QUORUM_OPT_BLKVERIFY, @@ -1192,6 +1240,7 @@ static BlockDriver bdrv_quorum = { .bdrv_close = quorum_close, .bdrv_gather_child_options = quorum_gather_child_options, .bdrv_dirname = quorum_dirname, +.bdrv_co_block_status = quorum_co_block_status, .bdrv_co_flush_to_disk = quorum_co_flush, diff --git a/tests/qemu-iotests/312 b/tests/qemu-iotests/312 new file mode 100755 index 00..1b08f1552f --- /dev/null +++ b/tests/qemu-iotests/312 @@ -0,0 +1,148 @@ +#!/usr/bin/env bash +# +# Test drive-mirror with quorum +# +# The goal of this test is to check how the quorum driver reports +# regions that are known to read as zeroes (BDRV_BLOCK_ZERO). The idea +# is that drive-mirror will try the efficient representation of zeroes +# in the destination image instead of writing actual zeroes. +# +# Copyright (C) 2020 Igalia, S.L. +# Author: Alberto Garcia +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation;
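The min/max aggregation rule described in the commit message can be captured in a few lines. The sketch below models each child as a precomputed (status, length) pair rather than a real block driver, and the two constants are stand-ins for BDRV_BLOCK_DATA/BDRV_BLOCK_ZERO, so this is an illustration of the rule, not the driver code itself:

```c
#include <stdint.h>

#define BLK_DATA 1   /* stand-in for BDRV_BLOCK_DATA */
#define BLK_ZERO 2   /* stand-in for BDRV_BLOCK_ZERO */

struct child_status { int ret; int64_t bytes; };

/* Return ZERO with the smallest agreed size if every child reports
 * zeroes; otherwise DATA with the largest size any non-zero child
 * reported (no agreement is possible for at least that size). */
static int quorum_status(const struct child_status *c, int n,
                         int64_t count, int64_t *pnum)
{
    int64_t pnum_zero = count, pnum_data = 0;

    for (int i = 0; i < n; i++) {
        if (c[i].ret & BLK_ZERO) {
            pnum_zero = pnum_zero < c[i].bytes ? pnum_zero : c[i].bytes;
        } else {
            pnum_data = pnum_data > c[i].bytes ? pnum_data : c[i].bytes;
        }
    }
    if (pnum_data) {
        *pnum = pnum_data;
        return BLK_DATA;
    }
    *pnum = pnum_zero;
    return BLK_ZERO;
}
```

A mirror job consuming this status can then skip, or efficiently represent, the regions reported as ZERO instead of copying literal zeroes.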
[PATCH v2 2/2] quorum: Implement bdrv_co_pwrite_zeroes()
This simply calls bdrv_co_pwrite_zeroes() in all children Signed-off-by: Alberto Garcia --- block/quorum.c | 18 -- tests/qemu-iotests/312 | 7 +++ tests/qemu-iotests/312.out | 4 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/block/quorum.c b/block/quorum.c index 29cee42705..70b39a7226 100644 --- a/block/quorum.c +++ b/block/quorum.c @@ -692,8 +692,13 @@ static void write_quorum_entry(void *opaque) QuorumChildRequest *sacb = >qcrs[i]; sacb->bs = s->children[i]->bs; -sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes, -acb->qiov, acb->flags); +if (acb->flags & BDRV_REQ_ZERO_WRITE) { +sacb->ret = bdrv_co_pwrite_zeroes(s->children[i], acb->offset, + acb->bytes, acb->flags); +} else { +sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes, +acb->qiov, acb->flags); +} if (sacb->ret == 0) { acb->success_count++; } else { @@ -739,6 +744,14 @@ static int quorum_co_pwritev(BlockDriverState *bs, uint64_t offset, return ret; } +static int quorum_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset, + int bytes, BdrvRequestFlags flags) + +{ +return quorum_co_pwritev(bs, offset, bytes, NULL, + flags | BDRV_REQ_ZERO_WRITE); +} + static int64_t quorum_getlength(BlockDriverState *bs) { BDRVQuorumState *s = bs->opaque; @@ -1248,6 +1261,7 @@ static BlockDriver bdrv_quorum = { .bdrv_co_preadv = quorum_co_preadv, .bdrv_co_pwritev= quorum_co_pwritev, +.bdrv_co_pwrite_zeroes = quorum_co_pwrite_zeroes, .bdrv_add_child = quorum_add_child, .bdrv_del_child = quorum_del_child, diff --git a/tests/qemu-iotests/312 b/tests/qemu-iotests/312 index 1b08f1552f..93046393e7 100755 --- a/tests/qemu-iotests/312 +++ b/tests/qemu-iotests/312 @@ -114,6 +114,13 @@ $QEMU_IO -c "write -P 0 $((0x20)) $((0x1))" "$TEST_IMG.0" | _filter_qemu $QEMU_IO -c "write -z $((0x20)) $((0x3))" "$TEST_IMG.1" | _filter_qemu_io $QEMU_IO -c "write -P 0 $((0x20)) $((0x2))" "$TEST_IMG.2" | _filter_qemu_io +# Test 5: write data to a region and then zeroize it, doing it +# directly on 
the quorum device instead of the individual images. +# This has no effect on the end result but proves that the quorum driver +# supports 'write -z'. +$QEMU_IO -c "open -o $quorum" -c "write $((0x25)) $((0x1))" | _filter_qemu_io +$QEMU_IO -c "open -o $quorum" -c "write -z $((0x25)) $((0x1))" | _filter_qemu_io + echo echo '### Launch the drive-mirror job' echo diff --git a/tests/qemu-iotests/312.out b/tests/qemu-iotests/312.out index 4ae749175b..778dda95c7 100644 --- a/tests/qemu-iotests/312.out +++ b/tests/qemu-iotests/312.out @@ -39,6 +39,10 @@ wrote 196608/196608 bytes at offset 2097152 192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) wrote 131072/131072 bytes at offset 2097152 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 65536/65536 bytes at offset 2424832 +64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 65536/65536 bytes at offset 2424832 +64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) ### Launch the drive-mirror job -- 2.20.1
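The dispatch this patch adds to write_quorum_entry() reduces to a small pattern: requests carrying the zero-write flag go to the zero-writing primitive, everything else to the generic vectored write. A self-contained sketch with stubbed I/O functions (the flag value and stub names are illustrative stand-ins, not QEMU's actual BDRV_REQ_ZERO_WRITE machinery):

```c
#include <assert.h>

#define REQ_ZERO_WRITE 0x1   /* illustrative stand-in for BDRV_REQ_ZERO_WRITE */

static int zeroes_calls;
static int writev_calls;

static int stub_pwrite_zeroes(void) { zeroes_calls++; return 0; }
static int stub_pwritev(void)       { writev_calls++; return 0; }

/* One child's write: pick the primitive based on the request flags. */
static int write_one_child(int flags)
{
    if (flags & REQ_ZERO_WRITE) {
        return stub_pwrite_zeroes();
    }
    return stub_pwritev();
}
```

The corresponding quorum_co_pwrite_zeroes() entry point then just forwards to the normal write path with the flag set and a NULL qiov, which is why the patch is so short.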
[PATCH v2 0/2] quorum: Implement bdrv_co_block_status()
Hi, The first patch is the same as in v1, but now that we're at it I decided to also implement bdrv_co_pwrite_zeroes() Berto v2: - Implement bdrv_co_pwrite_zeroes() for quorum v1: https://lists.gnu.org/archive/html/qemu-block/2020-11/msg00163.html Alberto Garcia (2): quorum: Implement bdrv_co_block_status() quorum: Implement bdrv_co_pwrite_zeroes() block/quorum.c | 67 +++- tests/qemu-iotests/312 | 155 + tests/qemu-iotests/312.out | 71 + tests/qemu-iotests/group | 1 + 4 files changed, 292 insertions(+), 2 deletions(-) create mode 100755 tests/qemu-iotests/312 create mode 100644 tests/qemu-iotests/312.out -- 2.20.1
Re: [PATCH v2 4/4] linux-user/sparc: Handle tstate in sparc64_get/set_context()
On 11/6/20 7:27 AM, Peter Maydell wrote: > +#ifdef TARGET_SPARC64 > +/* win_helper.c */ > +target_ulong cpu_get_ccr(CPUSPARCState *env1); > +void cpu_put_ccr(CPUSPARCState *env1, target_ulong val); > +target_ulong cpu_get_cwp64(CPUSPARCState *env1); > +void cpu_put_cwp64(CPUSPARCState *env1, int cwp); > + > +static inline uint64_t sparc64_tstate(CPUSPARCState *env) > +{ > +uint64_t tstate = (cpu_get_ccr(env) << 32) | > +((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) | > +cpu_get_cwp64(env); > + > +if (env->def.features & CPU_FEATURE_GL) { > +tstate |= (env->gl & 7ULL) << 40; > +} > +return tstate; > +} > +#endif Given that this inline function calls 2 other out-of-line functions, I think it might as well be out-of-line itself. I'd place it in win_helper.c alongside the functions that it calls. But either way, Reviewed-by: Richard Henderson r~
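The field layout assembled by the quoted sparc64_tstate() can be checked in isolation. This sketch re-derives the packing arithmetic from the masks and shifts shown above, with the cpu_get_*() helpers replaced by plain parameters (pure arithmetic only, not the real helper):

```c
#include <assert.h>
#include <stdint.h>

/* Pack tstate from its components, mirroring the quoted code:
 * ccr at bit 32, asi[7:0] at 24, pstate (masked with 0xf3f) at 8,
 * cwp in the low bits, and gl[2:0] at bit 40 when the CPU has GL. */
static uint64_t pack_tstate(uint64_t ccr, uint64_t asi, uint64_t pstate,
                            uint64_t cwp, uint64_t gl, int have_gl)
{
    uint64_t tstate = (ccr << 32) | ((asi & 0xff) << 24) |
                      ((pstate & 0xf3f) << 8) | cwp;
    if (have_gl) {
        tstate |= (gl & 7ULL) << 40;
    }
    return tstate;
}
```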
nbd: bitmap_to_extents() calls nbd_extent_array_add() without checking return value: coverity false positive?
Hi; Coverity's "you usually check the return value of this function but you didn't do that here" heuristic has fired on the code in nbd/server.c:bitmap_to_extents() -- the function nbd_extent_array_add() is called five times in server.c, and the return value is checked in four of those, but not in the final call at the end of bitmap_to_extents(). (CID 1436125.) Is this a false positive, or should the caller be handling an error here ? thanks -- PMM
[PATCH for-5.2 3/4] hw/net/can/ctucan_core: Handle big-endian hosts
The ctucan driver defines types for its registers which are a union of a uint32_t with a struct with bitfields for the individual fields within that register. This is a bad idea, because bitfields aren't portable. The ctu_can_fd_regs.h header works around the most glaring of the portability issues by defining the fields in two different orders depending on the setting of the __LITTLE_ENDIAN_BITFIELD define. However, in ctucan_core.h this is unconditionally set to 1, which is wrong for big-endian hosts. Set it only if HOST_WORDS_BIGENDIAN is not set. There is no need for a "have we defined it already" guard, because the only place that should set it is ctucan_core.h, which has the usual double-inclusion guard. Signed-off-by: Peter Maydell --- Ideally all that bitfield-using code would be rewritten to use extract32 and deposit32 instead, IMHO. --- hw/net/can/ctucan_core.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/hw/net/can/ctucan_core.h b/hw/net/can/ctucan_core.h index f21cb1c5ec3..bbc09ae0678 100644 --- a/hw/net/can/ctucan_core.h +++ b/hw/net/can/ctucan_core.h @@ -31,8 +31,7 @@ #include "exec/hwaddr.h" #include "net/can_emu.h" - -#ifndef __LITTLE_ENDIAN_BITFIELD +#ifndef HOST_WORDS_BIGENDIAN #define __LITTLE_ENDIAN_BITFIELD 1 #endif -- 2.20.1
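The rewrite suggested in the patch notes, replacing the bitfields with extract32() and deposit32(), works because shift-and-mask arithmetic gives the same result on every host, while C bit-field layout is implementation-defined. A minimal re-implementation of the two helpers' semantics (a sketch; QEMU's real versions live in include/qemu/bitops.h):

```c
#include <assert.h>
#include <stdint.h>

/* Endian-independent field access with the same semantics as QEMU's
 * extract32()/deposit32(). Valid for 0 < length <= 32 - start. */
static uint32_t extract32_sketch(uint32_t value, int start, int length)
{
    return (value >> start) & (~0U >> (32 - length));
}

static uint32_t deposit32_sketch(uint32_t value, int start, int length,
                                 uint32_t fieldval)
{
    uint32_t mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}
```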
Re: [PATCH v2 1/4] linux-user/sparc: Correct sparc64_get/set_context() FPU handling
On Fri, 6 Nov 2020 at 17:09, Richard Henderson wrote: > > On 11/6/20 7:27 AM, Peter Maydell wrote: > > +if (fprs & FPRS_DU) { > > +for (i = 16; i < 31; i++) { > > 32. Derp. Lucky this code basically never gets run, eh ? :-) -- PMM
[PATCH for-5.2 0/4] hw/net/can/ctucan: fix Coverity and other issues
This patchset fixes a couple of issues spotted by Coverity:
 * incorrect address checks meant the guest could write off the end of the tx_buffer arrays
 * we had an unused value in ctucan_send_ready_buffers()
and also some issues I noticed while reading the code:
 * we don't adjust the device's non-portable use of bitfields on bigendian hosts
 * we should use stl_le_p() rather than casting uint8_t* to uint32_t*

Tested with "make check" only.

thanks
-- PMM

Peter Maydell (4):
  hw/net/can/ctucan: Don't allow guest to write off end of tx_buffer
  hw/net/can/ctucan: Avoid unused value in ctucan_send_ready_buffers()
  hw/net/can/ctucan_core: Handle big-endian hosts
  hw/net/ctucan_core: Use stl_le_p to write to tx_buffers

 hw/net/can/ctucan_core.h | 3 +--
 hw/net/can/ctucan_core.c | 23 +++
 2 files changed, 8 insertions(+), 18 deletions(-)

-- 2.20.1
[PATCH for-5.2 2/4] hw/net/can/ctucan: Avoid unused value in ctucan_send_ready_buffers()
Coverity points out that in ctucan_send_ready_buffers() we set buff_st_mask = 0xf << (i * 4) inside the loop, but then we never use it before overwriting it later. The only thing we use the mask for is as part of the code that is inserting the new buff_st field into tx_status. That is more comprehensibly written using deposit32(), so do that and drop the mask variable entirely. We also update the buff_st local variable at multiple points during this function, but nothing can ever see these intermediate values, so just drop those, write the final TXT_TOK as a fixed constant value, and collapse the only remaining set/use of buff_st down into an extract32(). Fixes: Coverity CID 1432869 Signed-off-by: Peter Maydell --- hw/net/can/ctucan_core.c | 15 +++ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c index ea09bf71a0c..f2ce978e5ec 100644 --- a/hw/net/can/ctucan_core.c +++ b/hw/net/can/ctucan_core.c @@ -240,8 +240,6 @@ static void ctucan_send_ready_buffers(CtuCanCoreState *s) uint8_t *pf; int buff2tx_idx; uint32_t tx_prio_max; -unsigned int buff_st; -uint32_t buff_st_mask; if (!s->mode_settings.s.ena) { return; @@ -256,10 +254,7 @@ static void ctucan_send_ready_buffers(CtuCanCoreState *s) for (i = 0; i < CTUCAN_CORE_TXBUF_NUM; i++) { uint32_t prio; -buff_st_mask = 0xf << (i * 4); -buff_st = (s->tx_status.u32 >> (i * 4)) & 0xf; - -if (buff_st != TXT_RDY) { +if (extract32(s->tx_status.u32, i * 4, 4) != TXT_RDY) { continue; } prio = (s->tx_priority.u32 >> (i * 4)) & 0x7; @@ -271,10 +266,7 @@ static void ctucan_send_ready_buffers(CtuCanCoreState *s) if (buff2tx_idx == -1) { break; } -buff_st_mask = 0xf << (buff2tx_idx * 4); -buff_st = (s->tx_status.u32 >> (buff2tx_idx * 4)) & 0xf; int_stat.u32 = 0; -buff_st = TXT_RDY; pf = s->tx_buffer[buff2tx_idx].data; ctucan_buff2frame(pf, ); s->status.s.idle = 0; @@ -283,12 +275,11 @@ static void ctucan_send_ready_buffers(CtuCanCoreState *s) s->status.s.idle = 1; 
s->status.s.txs = 0; s->tx_fr_ctr.s.tx_fr_ctr_val++; -buff_st = TXT_TOK; int_stat.s.txi = 1; int_stat.s.txbhci = 1; s->int_stat.u32 |= int_stat.u32 & ~s->int_mask.u32; -s->tx_status.u32 = (s->tx_status.u32 & ~buff_st_mask) | -(buff_st << (buff2tx_idx * 4)); +s->tx_status.u32 = deposit32(s->tx_status.u32, + buff2tx_idx * 4, 4, TXT_TOK); } while (1); } -- 2.20.1
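The tx_status word this patch manipulates is a packed array of 4-bit per-buffer states, which is what makes extract32(..., i * 4, 4) and deposit32(..., buff2tx_idx * 4, 4, TXT_TOK) the natural way to read and update it. A toy model of that layout (only the nibble arithmetic is taken from the patch; the accessor names are invented):

```c
#include <assert.h>
#include <stdint.h>

/* tx_status as a packed array of 4-bit per-buffer states:
 * buffer i occupies bits [i*4 + 3 : i*4]. */
static unsigned get_buf_state(uint32_t tx_status, int i)
{
    return (tx_status >> (i * 4)) & 0xf;
}

static uint32_t set_buf_state(uint32_t tx_status, int i, unsigned st)
{
    uint32_t mask = 0xfu << (i * 4);
    return (tx_status & ~mask) | ((st & 0xfu) << (i * 4));
}
```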
[PATCH for-5.2 4/4] hw/net/ctucan_core: Use stl_le_p to write to tx_buffers
Instead of casting an address within a uint8_t array to a uint32_t*, use stl_le_p(). This handles possibly misaligned addresses which would otherwise crash on some hosts. Signed-off-by: Peter Maydell --- hw/net/can/ctucan_core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c index f2ce978e5ec..e66526efa83 100644 --- a/hw/net/can/ctucan_core.c +++ b/hw/net/can/ctucan_core.c @@ -305,8 +305,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, uint64_t val, addr %= CTUCAN_CORE_TXBUFF_SPAN; assert(buff_num < CTUCAN_CORE_TXBUF_NUM); if (addr < sizeof(s->tx_buffer[buff_num].data)) { -uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + addr); -*bufp = cpu_to_le32(val); +stl_le_p(s->tx_buffer[buff_num].data + addr, val); } } else { switch (addr & ~3) { -- 2.20.1
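stl_le_p() (from include/qemu/bswap.h) stores a 32-bit value little-endian at an address with no alignment requirement. A byte-wise sketch of what such a helper does, and why it is safe where the uint32_t* cast was not:

```c
#include <assert.h>
#include <stdint.h>

/* Store v little-endian at p, one byte at a time: valid for any
 * alignment, unlike dereferencing a (uint32_t *) cast of p, which is
 * undefined behaviour on a misaligned address. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}
```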
Re: [PATCH v2 2/4] linux-user/sparc: Remove unneeded checks of 'err' from sparc64_get_context()
On 11/6/20 7:27 AM, Peter Maydell wrote: > Unlike the kernel macros, our __get_user() and __put_user() do not > return a failure code. Kernel code typically has a style of > err |= __get_user(...); err |= __get_user(...); > and then checking err at the end. In sparc64_get_context() our > version of the code dropped the accumulating into err but left the > "if (err) goto do_sigsegv" checks, which will never be taken. Delete > unnecessary if()s. > > Signed-off-by: Peter Maydell > --- > linux-user/sparc/signal.c | 4 > 1 file changed, 4 deletions(-) Reviewed-by: Richard Henderson r~
[PATCH for-5.2 1/4] hw/net/can/ctucan: Don't allow guest to write off end of tx_buffer
The ctucan device has 4 CAN bus cores, each of which has a set of 20 32-bit registers for writing the transmitted data. The registers are however not contiguous; each core's buffers is 0x100 bytes after the last. We got the checks on the address wrong in the ctucan_mem_write() function: * the first "is addr in range at all" check allowed addr == CTUCAN_CORE_MEM_SIZE, which is actually the first byte off the end of the range * the decode of addresses into core-number plus offset in the tx buffer for that core failed to check that the offset was in range, so the guest could write off the end of the tx_buffer[] array * the decode had an explicit check for whether the core-number was out of range, which is actually impossible given the CTUCAN_CORE_MEM_SIZE check and the number of cores. Fix the top level check, check the offset, and turn the check on the core-number into an assertion. Fixes: Coverity CID 1432874 Signed-off-by: Peter Maydell --- hw/net/can/ctucan_core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/net/can/ctucan_core.c b/hw/net/can/ctucan_core.c index d20835cd7e9..ea09bf71a0c 100644 --- a/hw/net/can/ctucan_core.c +++ b/hw/net/can/ctucan_core.c @@ -303,7 +303,7 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, uint64_t val, DPRINTF("write 0x%02llx addr 0x%02x\n", (unsigned long long)val, (unsigned int)addr); -if (addr > CTUCAN_CORE_MEM_SIZE) { +if (addr >= CTUCAN_CORE_MEM_SIZE) { return; } @@ -312,7 +312,8 @@ void ctucan_mem_write(CtuCanCoreState *s, hwaddr addr, uint64_t val, addr -= CTU_CAN_FD_TXTB1_DATA_1; buff_num = addr / CTUCAN_CORE_TXBUFF_SPAN; addr %= CTUCAN_CORE_TXBUFF_SPAN; -if (buff_num < CTUCAN_CORE_TXBUF_NUM) { +assert(buff_num < CTUCAN_CORE_TXBUF_NUM); +if (addr < sizeof(s->tx_buffer[buff_num].data)) { uint32_t *bufp = (uint32_t *)(s->tx_buffer[buff_num].data + addr); *bufp = cpu_to_le32(val); } -- 2.20.1
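The three fixes can be seen in a compact model of the decode path. The constants below are made-up stand-ins (the real CTUCAN_CORE_* values live in ctucan_core.h); what matters is the order of the checks:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the CTUCAN_CORE_* constants */
#define MEM_SIZE      0x400u   /* total register window */
#define TXBUFF_SPAN   0x100u   /* stride between cores' tx buffers */
#define TXBUF_NUM     4        /* MEM_SIZE / TXBUFF_SPAN */
#define BUF_DATA_SIZE 0x50u    /* size of one tx data buffer (assumed) */

/* Returns 1 if the write lands in a tx buffer, 0 if it is discarded. */
static int decode_tx_write(uint32_t addr, unsigned *buf, uint32_t *off)
{
    if (addr >= MEM_SIZE) {      /* was '>', allowing one byte past the end */
        return 0;
    }
    *buf = addr / TXBUFF_SPAN;
    *off = addr % TXBUFF_SPAN;
    assert(*buf < TXBUF_NUM);    /* cannot fail given the range check above */
    return *off < BUF_DATA_SIZE; /* the offset check that was missing */
}
```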
Re: [PATCH v2 3/4] linux-user/sparc: Don't restore %g7 in sparc64_set_context()
On 11/6/20 7:27 AM, Peter Maydell wrote: > The kernel does not restore the g7 register in sparc64_set_context(); > neither should we. (We still save it in sparc64_get_context().) > > Signed-off-by: Peter Maydell > --- > linux-user/sparc/signal.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Richard Henderson r~
Re: Question on UEFI ACPI tables setup and probing on arm64
On 11/05/20 05:30, Ying Fang wrote: > I see it in Qemu the *loader_start* is fixed at 1 GiB on the > physical address space which points to the DRAM base. In ArmVirtQemu.dsc > PcdDeviceTreeInitialBaseAddress is set 0x4000 with correspondence. > > Here I also see the discussion about DRAM base for ArmVirtQemu. > https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03127.html > > I am still not sure how UEFI knows that it is running on a ArmVirtQemu > machine type. It doesn't know. It remains a convention. This part is not auto-detected; the constants in QEMU and edk2 are independently open-coded, their values were synchronized by human effort initially. The user or the management layer have to make sure they boot a UEFI firmware binary on the machine type that is compatible with the machine type. There is some meta-data to help with that: > Does UEFI derive it from the fdt *compatible* property ? Please see the schema "docs/interop/firmware.json" in the QEMU tree; in particular the @FirmwareTarget element. For an actual example: QEMU bundles some edk2 firmware binaries (purely as a convenience, not for production), and those are accompanied by matching descriptor files. See "pc-bios/descriptors/60-edk2-aarch64.json". (It is a template that's fixed up during QEMU installation, but that's tangential here.) "targets": [ { "architecture": "aarch64", "machines": [ "virt-*" ] } ], Thanks Laszlo
Re: [PATCH v2 1/4] linux-user/sparc: Correct sparc64_get/set_context() FPU handling
On 11/6/20 7:27 AM, Peter Maydell wrote: > +if (fprs & FPRS_DU) { > +for (i = 16; i < 31; i++) { 32. Otherwise, Reviewed-by: Richard Henderson r~
[PATCH-for-6.0 1/2] hw/scsi/scsi-disk: Rename type as TYPE_SCSI_DISK
Rename TYPE_SCSI_DISK without the '_BASE' suffix to match the other abstract types in the codebase. Signed-off-by: Philippe Mathieu-Daudé --- hw/scsi/scsi-disk.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index e859534eaf3..d2b9cb28da1 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -53,9 +53,9 @@ #define DEFAULT_MAX_UNMAP_SIZE (1 * GiB) #define DEFAULT_MAX_IO_SIZE INT_MAX /* 2 GB - 1 block */ -#define TYPE_SCSI_DISK_BASE "scsi-disk-base" +#define TYPE_SCSI_DISK "scsi-disk-base" -OBJECT_DECLARE_TYPE(SCSIDiskState, SCSIDiskClass, SCSI_DISK_BASE) +OBJECT_DECLARE_TYPE(SCSIDiskState, SCSIDiskClass, SCSI_DISK) struct SCSIDiskClass { SCSIDeviceClass parent_class; @@ -2956,7 +2956,7 @@ BlockAIOCB *scsi_dma_writev(int64_t offset, QEMUIOVector *iov, static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); -SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass); +SCSIDiskClass *sdc = SCSI_DISK_CLASS(klass); dc->fw_name = "disk"; dc->reset = scsi_disk_reset; @@ -2966,7 +2966,7 @@ static void scsi_disk_base_class_initfn(ObjectClass *klass, void *data) } static const TypeInfo scsi_disk_base_info = { -.name = TYPE_SCSI_DISK_BASE, +.name = TYPE_SCSI_DISK, .parent= TYPE_SCSI_DEVICE, .class_init= scsi_disk_base_class_initfn, .instance_size = sizeof(SCSIDiskState), @@ -3036,7 +3036,7 @@ static void scsi_hd_class_initfn(ObjectClass *klass, void *data) static const TypeInfo scsi_hd_info = { .name = "scsi-hd", -.parent= TYPE_SCSI_DISK_BASE, +.parent= TYPE_SCSI_DISK, .class_init= scsi_hd_class_initfn, }; @@ -3067,7 +3067,7 @@ static void scsi_cd_class_initfn(ObjectClass *klass, void *data) static const TypeInfo scsi_cd_info = { .name = "scsi-cd", -.parent= TYPE_SCSI_DISK_BASE, +.parent= TYPE_SCSI_DISK, .class_init= scsi_cd_class_initfn, }; @@ -3090,7 +3090,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data) { DeviceClass *dc = 
DEVICE_CLASS(klass); SCSIDeviceClass *sc = SCSI_DEVICE_CLASS(klass); -SCSIDiskClass *sdc = SCSI_DISK_BASE_CLASS(klass); +SCSIDiskClass *sdc = SCSI_DISK_CLASS(klass); sc->realize = scsi_block_realize; sc->alloc_req= scsi_block_new_request; @@ -3106,7 +3106,7 @@ static void scsi_block_class_initfn(ObjectClass *klass, void *data) static const TypeInfo scsi_block_info = { .name = "scsi-block", -.parent= TYPE_SCSI_DISK_BASE, +.parent= TYPE_SCSI_DISK, .class_init= scsi_block_class_initfn, }; #endif @@ -3146,7 +3146,7 @@ static void scsi_disk_class_initfn(ObjectClass *klass, void *data) static const TypeInfo scsi_disk_info = { .name = "scsi-disk", -.parent= TYPE_SCSI_DISK_BASE, +.parent= TYPE_SCSI_DISK, .class_init= scsi_disk_class_initfn, }; -- 2.26.2
[PATCH-for-6.0 2/2] hw/scsi/scsi-disk: Use SCSI_DISK_GET_CLASS() macro
Use the SCSI_DISK_GET_CLASS() macro to match the rest of the codebase. Signed-off-by: Philippe Mathieu-Daudé --- hw/scsi/scsi-disk.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index d2b9cb28da1..deb51ec8e7d 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -338,7 +338,7 @@ static void scsi_read_complete(void *opaque, int ret) static void scsi_do_read(SCSIDiskReq *r, int ret) { SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev); -SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s)); +SCSIDiskClass *sdc = SCSI_DISK_GET_CLASS(s); assert (r->req.aiocb == NULL); if (scsi_disk_req_check_error(r, ret, false)) { @@ -438,7 +438,7 @@ static bool scsi_handle_rw_error(SCSIDiskReq *r, int error, bool acct_failed) { bool is_read = (r->req.cmd.mode == SCSI_XFER_FROM_DEV); SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev); -SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s)); +SCSIDiskClass *sdc = SCSI_DISK_GET_CLASS(s); BlockErrorAction action = blk_get_error_action(s->qdev.conf.blk, is_read, error); @@ -538,7 +538,7 @@ static void scsi_write_data(SCSIRequest *req) { SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req); SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev); -SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s)); +SCSIDiskClass *sdc = SCSI_DISK_GET_CLASS(s); /* No data transfer may already be in progress */ assert(r->req.aiocb == NULL); @@ -2149,7 +2149,7 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, uint8_t *buf) { SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req); SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev); -SCSIDiskClass *sdc = (SCSIDiskClass *) object_get_class(OBJECT(s)); +SCSIDiskClass *sdc = SCSI_DISK_GET_CLASS(s); uint32_t len; uint8_t command; -- 2.26.2
[PATCH-for-6.0 0/2] hw/scsi/scsi-disk: QOM style change
Some QOM style changes in TYPE_SCSI_DISK to follow the rest of the codebase style. No logical change. Philippe Mathieu-Daudé (2): hw/scsi/scsi-disk: Rename type as TYPE_SCSI_DISK hw/scsi/scsi-disk: Use SCSI_DISK_GET_CLASS() macro hw/scsi/scsi-disk.c | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) -- 2.26.2
Re: [PATCH 1/1] Change the order of g_free(info) and tracepoint
On 11/6/20 4:59 PM, David Edmondson wrote: > On Friday, 2020-11-06 at 19:09:24 +0530, Kirti Wankhede wrote: > >> Fixes Coverity issue: >> CID 1436126: Memory - illegal accesses (USE_AFTER_FREE) >> >> Fixes: a9e271ec9b36 ("vfio: Add migration region initialization and finalize >> function") >> >> Signed-off-by: Kirti Wankhede > > Maybe "fix use after free in vfio_migration_probe" as a summary? Yes please :) Reviewed-by: Philippe Mathieu-Daudé > > Reviewed-by: David Edmondson > >> --- >> hw/vfio/migration.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >> index 3ce285ea395d..55261562d4f3 100644 >> --- a/hw/vfio/migration.c >> +++ b/hw/vfio/migration.c >> @@ -897,8 +897,8 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error >> **errp) >> goto add_blocker; >> } >> >> -g_free(info); >> trace_vfio_migration_probe(vbasedev->name, info->index); >> +g_free(info); >> return 0; >> >> add_blocker: >> -- >> 2.7.0 > > dme. >
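The bug class fixed here is easy to model: the tracepoint argument dereferenced info after g_free(info), and the fix simply reorders use and free. A stubbed sketch (trace_stub stands in for the tracepoint, plain malloc/free for g_new/g_free):

```c
#include <assert.h>
#include <stdlib.h>

static int last_traced_index = -1;
static void trace_stub(int index) { last_traced_index = index; }

static int probe_sketch(void)
{
    int *info = malloc(sizeof(*info));
    if (!info) {
        return -1;
    }
    *info = 7;
    trace_stub(*info);  /* read the field while info is still live... */
    free(info);         /* ...and only then release it */
    return 0;
}
```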
Re: [PATCH 1/2] qemu-option: move help handling to get_opt_name_value
One more thought... Markus Armbruster writes: > Paolo Bonzini writes: [...] >> diff --git a/util/qemu-option.c b/util/qemu-option.c [...] >> @@ -767,16 +763,18 @@ void qemu_opts_print(QemuOpts *opts, const char >> *separator) >> >> static const char *get_opt_name_value(const char *params, >>const char *firstname, >> + bool *help_wanted, >>char **name, char **value) >> { >> -const char *p, *pe, *pc; >> - >> -pe = strchr(params, '='); >> -pc = strchr(params, ','); >> +const char *p; >> +size_t len; >> >> -if (!pe || (pc && pc < pe)) { >> +len = strcspn(params, "=,"); >> +if (params[len] != '=') { >> /* found "foo,more" */ >> -if (firstname) { >> +if (help_wanted && starts_with_help_option(params) == len) { >> +*help_wanted = true; >> +} else if (firstname) { >> /* implicitly named first option */ >> *name = g_strdup(firstname); >> p = get_opt_value(params, value); > > This function parses one parameter from @params into @name, @value, and > returns a pointer to the next parameter, or else to the terminating > '\0'. Like opt_validate() before, this sets *help_wanted only to true. Callers must pass a pointer to false. Perhaps having it set *help_wanted always could simplify things overall. Up to you. [...]
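The strcspn()-based rewrite quoted above splits each parameter by scanning for the first '=' or ','. The core decision can be modelled in a few lines (a sketch of that one parsing step only, not the real get_opt_name_value()):

```c
#include <assert.h>
#include <string.h>

/* Return the length of the option name at the start of params, and set
 * *has_value if it is followed by '=' (rather than ',' or the end of
 * the string). */
static size_t opt_name_len(const char *params, int *has_value)
{
    size_t len = strcspn(params, "=,");
    *has_value = (params[len] == '=');
    return len;
}
```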
Re: [PATCH] migration/dirtyrate: simplify inlcudes in dirtyrate.c
* Zheng Chuan (zhengch...@huawei.com) wrote: > Kindly ping for not forgetting this trivial fix:) Yes but it's too late for the merge window, so it'll happen on the next one, no rush! Dave > On 2020/10/30 22:09, Mark Kanda wrote: > > On 10/29/2020 10:58 PM, Chuan Zheng wrote: > >> Remove redundant blank line which is left by Commit 662770af7c6e8c, > >> also take this opportunity to remove redundant includes in dirtyrate.c. > >> > >> Signed-off-by: Chuan Zheng > > > > Reviewed-by: Mark Kanda > > > >> --- > >> migration/dirtyrate.c | 5 - > >> 1 file changed, 5 deletions(-) > >> > >> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c > >> index 8f728d2..ccb9814 100644 > >> --- a/migration/dirtyrate.c > >> +++ b/migration/dirtyrate.c > >> @@ -11,17 +11,12 @@ > >> */ > >> #include "qemu/osdep.h" > >> - > >> #include > >> #include "qapi/error.h" > >> #include "cpu.h" > >> -#include "qemu/config-file.h" > >> -#include "exec/memory.h" > >> #include "exec/ramblock.h" > >> -#include "exec/target_page.h" > >> #include "qemu/rcu_queue.h" > >> #include "qapi/qapi-commands-migration.h" > >> -#include "migration.h" > >> #include "ram.h" > >> #include "trace.h" > >> #include "dirtyrate.h" > >> > > -- > Regards. > Chuan > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [PATCH 2/2] qemu-option: warn for short-form boolean options
Paolo Bonzini writes: > Options such as "server" or "nowait", that are commonly found in -chardev, > are sugar for "server=on" and "wait=off". This is quite surprising and > also does not have any notion of typing attached. It is even possible to > do "-device e1000,noid" and get a device with "id=off". > > Deprecate all this, except for -chardev and -spice where it is in > wide use. I consider this a misuse of deprecation, to be frank. If something is known to be unused, we just remove it. Deprecation is precisely for things that are used. I'm with Daniel here: let's deprecate this sugar everywhere. Wide use may justify extending the deprecation grace period. > Signed-off-by: Paolo Bonzini > --- > chardev/char.c | 1 + > docs/system/deprecated.rst | 7 +++ > include/qemu/option.h | 1 + > tests/test-qemu-opts.c | 1 + > ui/spice-core.c| 1 + > util/qemu-option.c | 21 ++--- > 6 files changed, 25 insertions(+), 7 deletions(-) > > diff --git a/chardev/char.c b/chardev/char.c > index 78553125d3..108da615df 100644 > --- a/chardev/char.c > +++ b/chardev/char.c > @@ -829,6 +829,7 @@ Chardev *qemu_chr_find(const char *name) > > QemuOptsList qemu_chardev_opts = { > .name = "chardev", > +.allow_flag_options = true, /* server, nowait, etc. */ > .implied_opt_name = "backend", > .head = QTAILQ_HEAD_INITIALIZER(qemu_chardev_opts.head), > .desc = { > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst > index 32a0e620db..0e7edf7e56 100644 > --- a/docs/system/deprecated.rst > +++ b/docs/system/deprecated.rst > @@ -146,6 +146,13 @@ Drives with interface types other than ``if=none`` are > for onboard > devices. It is possible to use drives the board doesn't pick up with > -device. This usage is now deprecated. Use ``if=none`` instead. > > +Short-form boolean options (since 5.2) > +'' > + > +Boolean options such as ``share=on``/``share=off`` can be written > +in short form as ``share`` and ``noshare``. 
This is deprecated > +for all command-line options except ``-chardev` and ``-spice``, for > +which the short form was in wide use. Unlike the commit message, the text here misleads readers into assuming the sugar applies only to boolean options. > > QEMU Machine Protocol (QMP) commands > > diff --git a/include/qemu/option.h b/include/qemu/option.h > index ac69352e0e..046ac06a1f 100644 > --- a/include/qemu/option.h > +++ b/include/qemu/option.h > @@ -65,6 +65,7 @@ struct QemuOptsList { > const char *name; > const char *implied_opt_name; > bool merge_lists; /* Merge multiple uses of option into a single list? > */ > +bool allow_flag_options; /* Whether to warn for short-form boolean > options */ I disagree with having this option. I'm reviewing it anyway. Staying under the 80 characters limit is really, really easy here: bool allow_flag_options;/* Warn on short-form boolean options? */ I had to read ahead to figure out that false means warn, true means accept silently. The comment is confusing because "whether to warn" suggests "warn if true", which is wrong. Clearer (I think), and even shorter: bool allow_bool_sugar; /* Is boolean option sugar okay? */ > QTAILQ_HEAD(, QemuOpts) head; > QemuOptDesc desc[]; > }; > diff --git a/tests/test-qemu-opts.c b/tests/test-qemu-opts.c > index 297ffe79dd..a74c475773 100644 > --- a/tests/test-qemu-opts.c > +++ b/tests/test-qemu-opts.c > @@ -77,6 +77,7 @@ static QemuOptsList opts_list_02 = { > static QemuOptsList opts_list_03 = { > .name = "opts_list_03", > .implied_opt_name = "implied", > +.allow_flag_options = true, > .head = QTAILQ_HEAD_INITIALIZER(opts_list_03.head), > .desc = { > /* no elements => accept any params */ Can you point me to the tests this hunk affects? Do we have test coverage for boolean sugar with both values of allow_flag_options? If no, why is that okay? 
> diff --git a/ui/spice-core.c b/ui/spice-core.c > index eea52f5389..08f834fa41 100644 > --- a/ui/spice-core.c > +++ b/ui/spice-core.c > @@ -397,6 +397,7 @@ static SpiceChannelList *qmp_query_spice_channels(void) > > static QemuOptsList qemu_spice_opts = { > .name = "spice", > +.allow_flag_options = true, /* disable-agent-file-xfer, sasl, unix, etc. > */ > .head = QTAILQ_HEAD_INITIALIZER(qemu_spice_opts.head), > .merge_lists = true, > .desc = { > diff --git a/util/qemu-option.c b/util/qemu-option.c > index 61fc96f9dd..858860377b 100644 > --- a/util/qemu-option.c > +++ b/util/qemu-option.c > @@ -763,10 +763,12 @@ void qemu_opts_print(QemuOpts *opts, const char > *separator) > > static const char *get_opt_name_value(const char *params, >const char *firstname, > + bool warn_on_flag, Two callers pass false,
Re: [PATCH] CODING_STYLE.rst: Be less strict about 80 character limit
On Fri, Nov 06, 2020 at 11:29:40AM +, Peter Maydell wrote: > Relax the wording about line lengths a little bit; this goes with the > checkpatch changes to warn at 100 characters rather than 80. > > (Compare the Linux kernel commit bdc48fa11e46f8; our coding style is > not theirs, but the rationale is good and applies to us too.) > > Signed-off-by: Peter Maydell Reviewed-by: Michael S. Tsirkin > --- > CODING_STYLE.rst | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/CODING_STYLE.rst b/CODING_STYLE.rst > index 8b13ef0669e..7bf4e39d487 100644 > --- a/CODING_STYLE.rst > +++ b/CODING_STYLE.rst > @@ -85,8 +85,13 @@ Line width > Lines should be 80 characters; try not to make them longer. > > Sometimes it is hard to do, especially when dealing with QEMU subsystems > -that use long function or symbol names. Even in that case, do not make > -lines much longer than 80 characters. > +that use long function or symbol names. If wrapping the line at 80 columns > +is obviously less readable and more awkward, prefer not to wrap it; better > +to have an 85 character line than one which is awkwardly wrapped. > + > +Even in that case, try not to make lines much longer than 80 characters. > +(The checkpatch script will warn at 100 characters, but this is intended > +as a guard against obviously-overlength lines, not a target.) > > Rationale: > > -- > 2.20.1
Re: [PATCH 1/1] Change the order of g_free(info) and tracepoint
Kirti Wankhede writes: > Fixes Coverity issue: > CID 1436126: Memory - illegal accesses (USE_AFTER_FREE) > > Fixes: a9e271ec9b36 ("vfio: Add migration region initialization and finalize > function") > > Signed-off-by: Kirti Wankhede Reviewed-by: Alex Bennée -- Alex Bennée
Re: [PATCH] scripts/checkpatch.pl: Modify the line length limit of the code
On Fri, 6 Nov 2020 at 16:08, Markus Armbruster wrote: > Peter Maydell writes: > > Personally I just don't think checkpatch should be nudging people > > into folding 85-character lines, especially when there are > > multiple very similar lines in a row and only one would get > > folded, eg the prototypes in target/arm/helper.h -- some of > > these just edge beyond 80 characters and I think wrapping them > > is clearly worse for readability. > > The warning's intent is "are you sure this line is better not broken?" > The problem is people treating it as an order that absolves them from > using good judgement instead. > > I propose to fix it by phrasing the warning more clearly. Instead of > > WARNING: line over 80 characters > > we could say > > WARNING: line over 80 characters > Please examine the line, and use your judgement to decide whether > it should be broken. I would suggest that for a line over 80 characters and less than 85 characters, the answer is going to be "better not broken" a pretty high percentage of the time; that is, the warning has too many false positives, and we should tune it to have fewer. And the lure of "produce no warnings" is a strong one, so we should be at least cautious about what our tooling is nudging us into doing. thanks -- PMM
Re: [PATCH] scripts/checkpatch.pl: Modify the line length limit of the code
Peter Maydell writes: > On Fri, 6 Nov 2020 at 14:08, Philippe Mathieu-Daudé wrote: >> Can we keep the error please? Maybe 132 is the next display logical >> limit once we increased the warning from 80 to 100. >> >> I understand hardware evolved, we have larger displays with better >> resolution and can fit more characters in a line. [...] > > Personally I just don't think checkpatch should be nudging people > into folding 85-character lines, especially when there are > multiple very similar lines in a row and only one would get > folded, eg the prototypes in target/arm/helper.h -- some of > these just edge beyond 80 characters and I think wrapping them > is clearly worse for readability. The warning's intent is "are you sure this line is better not broken?" The problem is people treating it as an order that absolves them from using good judgement instead. I propose to fix it by phrasing the warning more clearly. Instead of

    WARNING: line over 80 characters

we could say

    WARNING: line over 80 characters
    Please examine the line, and use your judgement to decide whether
    it should be broken.

> If we don't want people > sending us "style fix" patches which wrap >80 char lines > (which I think we do not) then we shouldn't have checkpatch > complain about them, because if it does then that's what we get. I think that's throwing out the baby with the bathwater. checkpatch's purpose is not guiding inexperienced developers to style issues they can fix. It is relieving maintainers of the tedium of catching and explaining certain kinds of issues patches frequently have. Neutering checks that have led inexperienced developers to post less than useful patches may well relieve maintainers of having to reject such patches. But it comes at a price: maintainers and contributors lose automated help with checking useful patches. I consider that a bad trade. We may want to discourage inexperienced contributors from sending us style fix patches.
Fixing style takes good taste, which develops only with experience. Moreover, fixing up style builds only little experience. At best, it exercises "configure; make check" and the patch submission process. There are better ways to get your feet wet.
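The softer two-line warning proposed above can be sketched as a small check. This is an illustrative C sketch only -- checkpatch.pl itself is Perl, and the function and macro names below are made up:

```c
/* Sketch of a line-length check emitting the gentler two-line warning
 * proposed in the thread.  Not real checkpatch code (which is Perl);
 * check_line_length() and WARN_LIMIT are invented for illustration. */
#include <string.h>

#define WARN_LIMIT 80

/* Return the diagnostic for an over-long line, or NULL if it is fine. */
static const char *check_line_length(const char *line)
{
    if (strlen(line) > WARN_LIMIT) {
        return "WARNING: line over 80 characters\n"
               "Please examine the line, and use your judgement to decide "
               "whether it should be broken.";
    }
    return NULL;
}
```

The key design point is that the diagnostic explains the intent ("use your judgement") instead of reading like an order.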
Re: [PATCH 1/1] Change the order of g_free(info) and tracepoint
On Friday, 2020-11-06 at 19:09:24 +0530, Kirti Wankhede wrote:
> Fixes Coverity issue:
> CID 1436126: Memory - illegal accesses (USE_AFTER_FREE)
>
> Fixes: a9e271ec9b36 ("vfio: Add migration region initialization and
> finalize function")
>
> Signed-off-by: Kirti Wankhede

Maybe "fix use after free in vfio_migration_probe" as a summary?

Reviewed-by: David Edmondson

> ---
>  hw/vfio/migration.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 3ce285ea395d..55261562d4f3 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -897,8 +897,8 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>          goto add_blocker;
>      }
>
> -    g_free(info);
>      trace_vfio_migration_probe(vbasedev->name, info->index);
> +    g_free(info);
>      return 0;
>
>  add_blocker:
> --
> 2.7.0

dme.
--
I think I waited too long, I'm moving into the dollhouse.
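The bug the patch above fixes is a classic ordering error: the tracepoint dereferenced info->index after g_free(info). A minimal standalone sketch of the fixed ordering (struct and function names mirror the patch but are simplified stand-ins; plain free() replaces glib's g_free()):

```c
/* Sketch of the use-after-free fixed above: every field of *info must
 * be consumed before the struct is freed.  vfio_region_info and
 * probe_and_trace are simplified stand-ins, not real QEMU code. */
#include <stdlib.h>

struct vfio_region_info {
    unsigned int index;
};

static unsigned int probe_and_trace(void)
{
    struct vfio_region_info *info = malloc(sizeof(*info));
    unsigned int index;

    if (!info) {
        return 0;
    }
    info->index = 5;

    index = info->index;   /* read the field first ...          */
    free(info);            /* ... then release the allocation.  */
    /* The buggy version freed info first and then read
     * info->index for the tracepoint. */
    return index;
}
```

Tools like Coverity (as in CID 1436126 above) and ASan catch this pattern precisely because the read and the free are adjacent and easy to reorder by accident.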
Re: [PATCH v2 00/44] Make qdev static property API usable by any QOM type
On Fri, Nov 06, 2020 at 10:45:11AM +0100, Kevin Wolf wrote:
> Am 04.11.2020 um 16:59 hat Eduardo Habkost geschrieben:
>> This series refactors the qdev property code so the static
>> property system can be used by any QOM type. As an example, at
>> the end of the series some properties in TYPE_MACHINE are
>> converted to static properties to demonstrate the new API.
>>
>> Changes v1 -> v2
>>
>> * Rename functions and source files to call the new feature
>>   "field property" instead of "static property"
>>
>> * Change the API signature from an array-based interface similar
>>   to device_class_set_props() to an object_property_add()-like
>>   interface.
>>
>>   This means instead of doing this:
>>
>>     object_class_property_add_static_props(oc, my_array_of_props);
>>
>>   properties are registered like this:
>>
>>     object_class_property_add_field(oc, "property-name",
>>                                     PROP_XXX(MyState, my_field),
>>                                     prop_allow_set_always);
>>
>>   where PROP_XXX is a PROP_* macro like PROP_STRING, PROP_BOOL,
>>   etc.
>
> In comparison, I really like the resulting code from the array-based
> interface in v1 better.
>
> I think it's mostly for two reasons: First, the array is much more
> compact and easier to read. And maybe even more importantly, you know
> it's static data and only static data. The function-based interface can
> be mixed with other code or the parameter list can contain calls to
> other functions with side effects, so there are a lot more opportunities
> for surprises.

This is a really good point, and I strongly agree with it. Letting code do funny tricks at runtime is one of the reasons QOM properties became hard to introspect.

> What I didn't like about the v1 interface is that there is still a
> separate object_class_property_set_description() for each property, but
> I think that could have been fixed by moving the description to the
> definitions in the array, too.

This would be very easy to implement.
> > On Fri, Oct 30, 2020 at 06:10:34PM +0100, Paolo Bonzini wrote:
> > On 29/10/20 23:02, Eduardo Habkost wrote:
> > > +static Property machine_props[] = {
> > > +    DEFINE_PROP_STRING("kernel", MachineState, kernel_filename),
> > > +    DEFINE_PROP_STRING("initrd", MachineState, initrd_filename),
> > > +    DEFINE_PROP_STRING("append", MachineState, kernel_cmdline),
> > > +    DEFINE_PROP_STRING("dtb", MachineState, dtb),
> > > +    DEFINE_PROP_STRING("dumpdtb", MachineState, dumpdtb),
> > > +    DEFINE_PROP_STRING("dt-compatible", MachineState, dt_compatible),
> > > +    DEFINE_PROP_STRING("firmware", MachineState, firmware),
> > > +    DEFINE_PROP_STRING("memory-backend", MachineState, ram_memdev_id),
> > > +    DEFINE_PROP_END_OF_LIST(),
> > > +};
> > > +
> >
> > While I think generalizing the _code_ for static properties is
> > obviously a good idea, I am not sure about generalizing the interface
> > for adding them.
> >
> > The reason is that we already have a place for adding properties in
> > class_init, and having a second makes things "less local", so to
> > speak.
>
> As long as you have the function call to apply the properties array in
> .class_init, it should be obvious enough what you're doing.
>
> Of course, I think we should refrain from mixing both styles in a single
> object, but generally speaking the array feels so much better that I
> don't think we should reject it just because QOM only had a different
> interface so far (and the property array is preexisting in qdev, too, so
> we already have differences between objects - in fact, the majority of
> objects is probably qdev today).

This is also a strong argument. The QEMU code base has ~500 matches for "object*_property_add*" calls, and ~2100 for "DEFINE_PROP*". Converting qdev arrays to object_class_property_add_*() calls would be a huge effort with no gains. The end result would be two different APIs for registering class field properties coexisting for a long time, and people still confused about which is the preferred API.

--
Eduardo
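The "static data and only static data" argument for the array style can be illustrated with a self-contained sketch. Property, MachineState, and the DEFINE_PROP_* macros below are simplified stand-ins for the real qdev API (which is richer); only the shape of the interface is shown, including a per-property description folded into the array as suggested above:

```c
/* Hedged sketch of the array-based property style discussed above.
 * All types and macros are simplified stand-ins, not real QEMU code;
 * the point is that the table is pure static data, so a tool (or a
 * reader) can enumerate it without running any registration code. */
#include <stddef.h>

typedef struct Property {
    const char *name;
    size_t offset;        /* offsetof the backing field            */
    const char *descr;    /* description moved into the array too  */
} Property;

typedef struct MachineState {
    char *kernel_filename;
    char *initrd_filename;
} MachineState;

#define DEFINE_PROP_STRING(n, st, f, d) \
    { .name = (n), .offset = offsetof(st, f), .descr = (d) }
#define DEFINE_PROP_END_OF_LIST() { .name = NULL }

static const Property machine_props[] = {
    DEFINE_PROP_STRING("kernel", MachineState, kernel_filename,
                       "Linux kernel image file"),
    DEFINE_PROP_STRING("initrd", MachineState, initrd_filename,
                       "Linux initial ramdisk file"),
    DEFINE_PROP_END_OF_LIST(),
};

/* Walk the NULL-terminated table, as a class_init helper would. */
static size_t count_props(const Property *p)
{
    size_t n = 0;

    while (p[n].name) {
        n++;
    }
    return n;
}
```

Because the table is a const array, nothing in it can have side effects at registration time -- which is exactly the introspection-friendliness argued for in the thread.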
Re: [PATCH] qtest: Fix bad printf format specifiers
Philippe Mathieu-Daudé writes:

> On 11/6/20 7:33 AM, Markus Armbruster wrote:
[...]
>> In other words "%" PRIu32 is just a less legible alias for "%u" in all
>> cases that matter.
>
> Can we add a checkpatch rule to avoid using 'PRI[dux]32' format,
> so it is clear for everyone?

I guess we could, but *I* can't: -ENOTIME, sorry.
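The claim that "%" PRIu32 is just a less legible alias for "%u" is easy to demonstrate: on every platform where int is 32 bits (all platforms QEMU targets), PRIu32 expands to "u", so both format strings produce identical output. A small check (the helper name is made up):

```c
/* Demonstrates the point above: "%" PRIu32 and "%u" format a uint32_t
 * identically wherever int is 32 bits.  format_both() is an invented
 * helper that renders the value both ways and compares the results. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 0 when both formattings agree (strcmp semantics). */
static int format_both(uint32_t v, char *a, char *b, size_t len)
{
    snprintf(a, len, "%" PRIu32, v);   /* the less legible spelling */
    snprintf(b, len, "%u", v);         /* the plain spelling        */
    return strcmp(a, b);
}
```

The PRI* macros earn their keep only for types whose underlying width varies across platforms, such as uint64_t -- which is why the thread treats the 32-bit variants as noise.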
Re: Migrating to the gitlab issue tracker
On 11/04/20 18:19, Daniel P. Berrangé wrote: > This just sounds like fairly niche requirements for which directly > subscribing to the project issue tracker will satisfy 99% of the time. OK. Laszlo
Re: [PATCH for-5.2 3/4] qtest: escape device name in device-introspect-test
Paolo Bonzini writes:

> On 06/11/20 14:15, Markus Armbruster wrote:
>> Paolo Bonzini writes:
>>
>>> device-introspect-test uses HMP, so it should escape the device name
>>> properly. Because of this, a few devices that had commas in their
>>> names were escaping testing.
>>>
>>> Signed-off-by: Paolo Bonzini
>>
>> $ git-grep '\.name *= *"[^"]*,' | cat
>> hw/block/fdc.c:    .name = "SUNW,fdtwo"
>>
>> Any others?
>
> Not that I know, but this is a bug anyway. :)

Yes, but "a few devices" made me curious.
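For context: QEMU's option syntax treats ',' as a key=value separator, so a literal comma inside a value such as the device name "SUNW,fdtwo" must be written doubled (",,"). A sketch of such an escaping helper (the function name is invented here, not the actual qtest code, but the doubling rule is QEMU's documented escape):

```c
/* Sketch of comma-escaping for HMP/QemuOpts-style arguments: a literal
 * ',' in a value is written as ",,".  escape_hmp_value() is a made-up
 * name; the real test helper may differ. */
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of s with every ',' doubled;
 * caller frees.  Worst case the result is twice as long. */
static char *escape_hmp_value(const char *s)
{
    size_t len = strlen(s);
    char *out = malloc(2 * len + 1);
    char *p = out;

    if (!out) {
        return NULL;
    }
    for (; *s; s++) {
        *p++ = *s;
        if (*s == ',') {
            *p++ = ',';   /* double the separator to make it literal */
        }
    }
    *p = '\0';
    return out;
}
```

Without this, a device_add of "SUNW,fdtwo" would be parsed as a truncated name plus a stray option -- which is exactly how those devices "were escaping testing".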
Re: [PATCH 0/2] Increase amount of data for monitor to read
Please exclude this address when replying: jc...@redhat.com Andrey
Re: [PATCH v2 2/7] block: add bdrv_replace_node_common()
On Fri 06 Nov 2020 01:42:36 PM CET, Vladimir Sementsov-Ogievskiy wrote:
> Add a new parameter to bdrv_replace_node(): auto_skip. With
> auto_skip=false we'll have stricter behavior: update _all_ parents or
> fail. The new behaviour will be used in the following commit in
> block.c, so keep the original function name as the public interface.
>
> Note: the new error message is a bit funny in contrast with the further
> "Cannot" in the case of a frozen child, but we'd better keep some
> difference to make it possible to distinguish one from the other on
> failure. Still, we'd actually better refactor the should_update_child()
> call to also distinguish different kinds of "should not". Let's do that
> later.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy

Reviewed-by: Alberto Garcia

Berto
Re: [PATCH v2 0/4] linux/sparc: more get/set_context fixes
On Fri, 6 Nov 2020 at 15:27, Peter Maydell wrote:
>
> Based-on: 20201105212314.9628-1-peter.mayd...@linaro.org
> ("[PATCH for-5.2 0/3] linux-user: fix various sparc64 guest bugs")
>
> This series fixes a few more issues with our sparc linux-user
> sparc64_get_context() and sparc64_set_context() implementation:
>  * we weren't handling FPU regs correctly, and also the way
>    we coded the handling triggered Coverity warnings
>  * some stray pointless error checks
>  * we shouldn't restore %g7 in set_context
>  * we weren't saving and restoring tstate correctly

The 'v2' in the subject tag is wrong, incidentally; this is the first version of this series :-)

-- PMM
[PATCH v2 2/4] linux-user/sparc: Remove unneeded checks of 'err' from sparc64_get_context()
Unlike the kernel macros, our __get_user() and __put_user() do not return a failure code. Kernel code typically has a style of

  err |= __get_user(...);
  err |= __get_user(...);

and then checks err at the end. In sparc64_get_context() our version of the code dropped the accumulating into err but left the "if (err) goto do_sigsegv" checks, which will never be taken. Delete the unnecessary if()s.

Signed-off-by: Peter Maydell
---
 linux-user/sparc/signal.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/linux-user/sparc/signal.c b/linux-user/sparc/signal.c
index e661a769cb1..43dcd137f51 100644
--- a/linux-user/sparc/signal.c
+++ b/linux-user/sparc/signal.c
@@ -555,8 +555,6 @@ void sparc64_get_context(CPUSPARCState *env)
         for (i = 0; i < TARGET_NSIG_WORDS; i++, dst++, src++) {
             __put_user(*src, dst);
         }
-        if (err)
-            goto do_sigsegv;
     }

     /* XXX: tstate must be saved properly */
@@ -598,8 +596,6 @@ void sparc64_get_context(CPUSPARCState *env)
      * hidden behind an "if (fenab)" where fenab is always 0).
      */

-    if (err)
-        goto do_sigsegv;
     unlock_user_struct(ucp, ucp_addr, 1);
     return;
 do_sigsegv:
--
2.20.1
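The two styles the commit message contrasts can be sketched side by side. In the kernel, each accessor can fail and callers accumulate the errors; in QEMU's linux-user code the buffer was validated up front (lock_user_struct), so the copy itself cannot fail and any leftover "if (err)" is dead code. Both accessors below are simplified stand-ins, not the real macros:

```c
/* Sketch contrasting the kernel's error-accumulating accessor style
 * with QEMU linux-user's void-returning style.  kernel_get_user() and
 * qemu_get_user() are invented stand-ins for the real macros. */

/* Kernel style: the accessor reports failure, callers OR the codes. */
static int kernel_get_user(int *dst, const int *src)
{
    *dst = *src;
    return 0;   /* would be -EFAULT on a faulting user pointer */
}

/* QEMU linux-user style: the address was validated up front, so the
 * copy cannot fail and there is nothing to return. */
static void qemu_get_user(int *dst, const int *src)
{
    *dst = *src;
}

/* Kernel-style caller: accumulate, then check once at the end
 * (the "if (err) goto do_sigsegv;" pattern the patch deletes). */
static int copy_regs_kernel_style(int *dst, const int *src, int n)
{
    int err = 0;
    int i;

    for (i = 0; i < n; i++) {
        err |= kernel_get_user(&dst[i], &src[i]);
    }
    return err;
}
```

With the QEMU-style accessor, err is never set anywhere, so the trailing check can never fire -- which is exactly why the patch removes it.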
[PATCH v2 3/4] linux-user/sparc: Don't restore %g7 in sparc64_set_context()
The kernel does not restore the g7 register in sparc64_set_context(); neither should we. (We still save it in sparc64_get_context().)

Signed-off-by: Peter Maydell
---
 linux-user/sparc/signal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/sparc/signal.c b/linux-user/sparc/signal.c
index 43dcd137f51..ed32c7abd17 100644
--- a/linux-user/sparc/signal.c
+++ b/linux-user/sparc/signal.c
@@ -447,7 +447,7 @@ void sparc64_set_context(CPUSPARCState *env)
     __get_user(env->gregs[4], (&(*grp)[SPARC_MC_G4]));
     __get_user(env->gregs[5], (&(*grp)[SPARC_MC_G5]));
     __get_user(env->gregs[6], (&(*grp)[SPARC_MC_G6]));
-    __get_user(env->gregs[7], (&(*grp)[SPARC_MC_G7]));
+    /* Skip g7 as that's the thread register in userspace */

     /*
      * Note that unlike the kernel, we didn't need to mess with the
--
2.20.1
[PATCH v2 4/4] linux-user/sparc: Handle tstate in sparc64_get/set_context()
Correctly implement save/restore of the tstate field in sparc64_get_context() and sparc64_set_context():
 * Don't use the CWP value from the guest in set_context
 * Construct and save a tstate value rather than leaving it as zero
   in get_context

To do this we factor out the "calculate TSTATE value from CPU state" code from sparc_cpu_do_interrupt() into its own sparc64_tstate() function; that in turn requires us to move some of the function prototypes out from inside a CPU_NO_IO_DEFS ifdef guard.

Signed-off-by: Peter Maydell
---
 target/sparc/cpu.h          | 24
 linux-user/sparc/signal.c   |  7 +++
 target/sparc/int64_helper.c |  5 +
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/target/sparc/cpu.h b/target/sparc/cpu.h
index 277254732b9..4b2290650be 100644
--- a/target/sparc/cpu.h
+++ b/target/sparc/cpu.h
@@ -608,10 +608,6 @@ target_ulong cpu_get_psr(CPUSPARCState *env1);
 void cpu_put_psr(CPUSPARCState *env1, target_ulong val);
 void cpu_put_psr_raw(CPUSPARCState *env1, target_ulong val);
 #ifdef TARGET_SPARC64
-target_ulong cpu_get_ccr(CPUSPARCState *env1);
-void cpu_put_ccr(CPUSPARCState *env1, target_ulong val);
-target_ulong cpu_get_cwp64(CPUSPARCState *env1);
-void cpu_put_cwp64(CPUSPARCState *env1, int cwp);
 void cpu_change_pstate(CPUSPARCState *env1, uint32_t new_pstate);
 void cpu_gl_switch_gregs(CPUSPARCState *env, uint32_t new_gl);
 #endif
@@ -829,4 +825,24 @@ static inline bool tb_am_enabled(int tb_flags)
 #endif
 }

+#ifdef TARGET_SPARC64
+/* win_helper.c */
+target_ulong cpu_get_ccr(CPUSPARCState *env1);
+void cpu_put_ccr(CPUSPARCState *env1, target_ulong val);
+target_ulong cpu_get_cwp64(CPUSPARCState *env1);
+void cpu_put_cwp64(CPUSPARCState *env1, int cwp);
+
+static inline uint64_t sparc64_tstate(CPUSPARCState *env)
+{
+    uint64_t tstate = (cpu_get_ccr(env) << 32) |
+        ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
+        cpu_get_cwp64(env);
+
+    if (env->def.features & CPU_FEATURE_GL) {
+        tstate |= (env->gl & 7ULL) << 40;
+    }
+    return tstate;
+}
+#endif
+
 #endif

diff --git a/linux-user/sparc/signal.c b/linux-user/sparc/signal.c
index ed32c7abd17..a6c7c7664a2 100644
--- a/linux-user/sparc/signal.c
+++ b/linux-user/sparc/signal.c
@@ -438,9 +438,9 @@ void sparc64_set_context(CPUSPARCState *env)
     env->npc = npc;
     __get_user(env->y, &((*grp)[SPARC_MC_Y]));
     __get_user(tstate, &((*grp)[SPARC_MC_TSTATE]));
+    /* Honour TSTATE_ASI, TSTATE_ICC and TSTATE_XCC only */
     env->asi = (tstate >> 24) & 0xff;
-    cpu_put_ccr(env, tstate >> 32);
-    cpu_put_cwp64(env, tstate & 0x1f);
+    cpu_put_ccr(env, (tstate >> 32) & 0xff);
     __get_user(env->gregs[1], (&(*grp)[SPARC_MC_G1]));
     __get_user(env->gregs[2], (&(*grp)[SPARC_MC_G2]));
     __get_user(env->gregs[3], (&(*grp)[SPARC_MC_G3]));
@@ -557,8 +557,7 @@ void sparc64_get_context(CPUSPARCState *env)
         }
     }

-    /* XXX: tstate must be saved properly */
-    //__put_user(env->tstate, &((*grp)[SPARC_MC_TSTATE]));
+    __put_user(sparc64_tstate(env), &((*grp)[SPARC_MC_TSTATE]));
     __put_user(env->pc, &((*grp)[SPARC_MC_PC]));
     __put_user(env->npc, &((*grp)[SPARC_MC_NPC]));
     __put_user(env->y, &((*grp)[SPARC_MC_Y]));

diff --git a/target/sparc/int64_helper.c b/target/sparc/int64_helper.c
index f3e7f32de61..735668f5006 100644
--- a/target/sparc/int64_helper.c
+++ b/target/sparc/int64_helper.c
@@ -131,9 +131,7 @@ void sparc_cpu_do_interrupt(CPUState *cs)
     }

     tsptr = cpu_tsptr(env);
-    tsptr->tstate = (cpu_get_ccr(env) << 32) |
-        ((env->asi & 0xff) << 24) | ((env->pstate & 0xf3f) << 8) |
-        cpu_get_cwp64(env);
+    tsptr->tstate = sparc64_tstate(env);
     tsptr->tpc = env->pc;
     tsptr->tnpc = env->npc;
     tsptr->tt = intno;
@@ -148,7 +146,6 @@ void sparc_cpu_do_interrupt(CPUState *cs)
     }

     if (env->def.features & CPU_FEATURE_GL) {
-        tsptr->tstate |= (env->gl & 7ULL) << 40;
         cpu_gl_switch_gregs(env, env->gl + 1);
         env->gl++;
     }
--
2.20.1
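The bit packing sparc64_tstate() performs in the patch above can be restated as a standalone sketch: CWP sits in the low bits, PSTATE in bits 19:8, ASI in 31:24, CCR in 39:32, and GL in 42:40 when the CPU has the GL feature. make_tstate() below is an invented name that takes the fields directly instead of reading them from CPUSPARCState:

```c
/* Standalone sketch of the TSTATE layout assembled by sparc64_tstate()
 * in the patch above.  make_tstate() is a made-up helper taking the
 * fields as parameters; the masks and shifts mirror the patch. */
#include <stdint.h>

static uint64_t make_tstate(uint64_t ccr, uint64_t asi, uint64_t pstate,
                            uint64_t cwp, uint64_t gl, int have_gl)
{
    uint64_t tstate = (ccr << 32) |                /* CCR: bits 39:32    */
                      ((asi & 0xff) << 24) |       /* ASI: bits 31:24    */
                      ((pstate & 0xf3f) << 8) |    /* PSTATE: bits 19:8  */
                      cwp;                         /* CWP: low bits      */

    if (have_gl) {
        tstate |= (gl & 7ULL) << 40;               /* GL: bits 42:40     */
    }
    return tstate;
}
```

Having a single function build this value is the point of the refactor: previously the interrupt path and (after the patch) the signal path each needed the same packing, and set_context's masking of CCR with 0xff keeps guest-supplied tstate bits from leaking into CWP.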
[PATCH v2 0/4] linux/sparc: more get/set_context fixes
Based-on: 20201105212314.9628-1-peter.mayd...@linaro.org
("[PATCH for-5.2 0/3] linux-user: fix various sparc64 guest bugs")

This series fixes a few more issues with our sparc linux-user sparc64_get_context() and sparc64_set_context() implementation:
 * we weren't handling FPU regs correctly, and also the way we coded
   the handling triggered Coverity warnings
 * some stray pointless error checks
 * we shouldn't restore %g7 in set_context
 * we weren't saving and restoring tstate correctly

My main aim here was to deal with the Coverity errors, but the rest are things I noticed while I was working on the code or which had fixme comments, and I figured I'd fix them while the code was fresh in my mind.

thanks
-- PMM

Peter Maydell (4):
  linux-user/sparc: Correct sparc64_get/set_context() FPU handling
  linux-user/sparc: Remove unneeded checks of 'err' from
    sparc64_get_context()
  linux-user/sparc: Don't restore %g7 in sparc64_set_context()
  linux-user/sparc: Handle tstate in sparc64_get/set_context()

 target/sparc/cpu.h          | 28 +---
 linux-user/sparc/signal.c   | 87 -
 target/sparc/int64_helper.c |  5 +--
 3 files changed, 71 insertions(+), 49 deletions(-)

--
2.20.1