Re: [PATCH v2] failover: specify an alternate MAC address

2021-10-27 Thread Jason Wang
On Wed, Oct 27, 2021 at 6:00 PM Laurent Vivier  wrote:
>
> If the guest driver doesn't support the STANDBY feature, by default
> we keep the virtio-net device and don't hotplug the VFIO device,
> but in some cases, the user may prefer to use the VFIO device rather
> than the virtio-net one. We can't unplug the virtio-net device
> (because on migration it is expected on the destination side) but
> we can keep both interfaces if the MAC addresses are different
> (having the same MAC address can cause a kernel crash with old
> kernels). The VFIO device will be unplugged before the migration
> like in the normal failover migration but without a failover device.
>
> This patch adds a new property to the virtio-net device:
> "failover-legacy-mac"
>
> If an alternate MAC address is provided with "failover-legacy-mac" and
> the STANDBY feature is not supported, both interfaces are plugged
> but the standby interface (virtio-net) MAC address is set to the
> value provided by the "failover-legacy-mac" parameter.
>
> If the STANDBY feature is supported by the guest and QEMU, the virtio-net
> failover acts as usual.
>
> Signed-off-by: Laurent Vivier 

Acked-by: Jason Wang 
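
Condensed, the negotiation logic the patch adds boils down to this sketch
(not the patch verbatim; legacy_mac_set() and restore_original_mac() are
illustrative helpers standing in for the memcmp()-based checks in the diff
below):

    if (n->failover) {
        if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
            if (legacy_mac_set(n) &&
                memcmp(n->mac, &n->legacy_mac, ETH_ALEN) == 0) {
                /* feature renegotiated: undo the legacy MAC override */
                restore_original_mac(n);
            }
            failover_plug_primary(n);   /* hotplug the VFIO primary */
        } else if (legacy_mac_set(n)) {
            /* legacy guest: keep both NICs, virtio gets the alternate MAC */
            memcpy(n->mac, &n->legacy_mac, ETH_ALEN);
        }
    }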

> ---
>
> Notes:
> v2: rename alt-mac to failover-legacy-mac
> update doc with text provided by MST
>
>  docs/system/virtio-net-failover.rst | 10 ++
>  hw/net/virtio-net.c | 48 +++--
>  include/hw/virtio/virtio-net.h  |  6 
>  3 files changed, 55 insertions(+), 9 deletions(-)
>
> diff --git a/docs/system/virtio-net-failover.rst 
> b/docs/system/virtio-net-failover.rst
> index 6002dc5d96e4..99f21cd55ef7 100644
> --- a/docs/system/virtio-net-failover.rst
> +++ b/docs/system/virtio-net-failover.rst
> @@ -51,6 +51,16 @@ Usage
>is only for pairing the devices within QEMU. The guest kernel module
>net_failover will match devices with identical MAC addresses.
>
> +  For legacy guests (including BIOS/UEFI) not supporting VIRTIO_NET_F_STANDBY,
> +  two options exist:
> +
> +  1. if failover-legacy-mac has not been configured (default)
> + only the standby virtio-net device is visible to the guest
> +
> +  2. if failover-legacy-mac has been configured, virtio and vfio devices will
> + be presented to guest as two NIC devices, with virtio using the
> + failover-legacy-mac address.
> +
>  Hotplug
>  -------
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index f2014d5ea0b3..0d47d287de14 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -45,6 +45,9 @@
>  #include "net_rx_pkt.h"
>  #include "hw/virtio/vhost.h"
>
> +/* zero MAC address to check with */
> +static const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
> +
>  #define VIRTIO_NET_VM_VERSION11
>
>  #define MAC_TABLE_ENTRIES64
> @@ -126,7 +129,6 @@ static void virtio_net_get_config(VirtIODevice *vdev, 
> uint8_t *config)
>  VirtIONet *n = VIRTIO_NET(vdev);
>  struct virtio_net_config netcfg;
>  NetClientState *nc = qemu_get_queue(n->nic);
> -static const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
>
>  int ret = 0;
>  memset(&netcfg, 0, sizeof(struct virtio_net_config));
> @@ -871,10 +873,21 @@ static void failover_add_primary(VirtIONet *n, Error 
> **errp)
>  error_propagate(errp, err);
>  }
>
> +static void failover_plug_primary(VirtIONet *n)
> +{
> +Error *err = NULL;
> +
> +qapi_event_send_failover_negotiated(n->netclient_name);
> +qatomic_set(&n->failover_primary_hidden, false);
> +failover_add_primary(n, &err);
> +if (err) {
> +warn_report_err(err);
> +}
> +}
> +
>  static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
>  {
>  VirtIONet *n = VIRTIO_NET(vdev);
> -Error *err = NULL;
>  int i;
>
>  if (n->mtu_bypass_backend &&
> @@ -921,12 +934,22 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
> uint64_t features)
>  memset(n->vlans, 0xff, MAX_VLAN >> 3);
>  }
>
> -if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> -qapi_event_send_failover_negotiated(n->netclient_name);
> -qatomic_set(&n->failover_primary_hidden, false);
> -failover_add_primary(n, &err);
> -if (err) {
> -warn_report_err(err);
> +if (n->failover) {
> +if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> +if (memcmp(&n->legacy_mac, &zero, sizeof(zero)) != 0 &&
> +memcmp(n->mac, &n->legacy_mac, ETH_ALEN) == 0) {
> +/*
> + * set_features can be called twice, without & with F_STANDBY,
> + * so restore original MAC address
> + */
> +memcpy(n->mac, &n->nic->conf->macaddr, sizeof(n->mac));
> +qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
> +}
> +failover_plug_primary(n);
> +} else if (memcmp(&n->legacy_mac, &zero, sizeof(zero)) != 0) {
> +memcpy(n->mac, &n->legacy_mac, ETH_ALEN);
> +

Re: [PATCH] monitor: Fix find_device_state() for IDs containing slashes

2021-10-27 Thread Markus Armbruster
Paolo Bonzini  writes:

> Acked-by: Paolo Bonzini 
>
> Thanks for the quick fix!

Who's going to do the pull request?




[PULL 3/4] qapi/monitor: allow VNC display id in set/expire_password

2021-10-27 Thread Markus Armbruster
From: Stefan Reiter 

It is possible to specify more than one VNC server on the command line,
either with an explicit ID or the auto-generated ones à la "default",
"vnc2", "vnc3", ...

It is not possible to change the password on one of these extra VNC
displays though. Fix this by adding a "display" parameter to the
"set_password" and "expire_password" QMP and HMP commands.

For HMP, the display is specified using the "-d" value flag.

For QMP, the schema is updated to explicitly express the supported
variants of the commands with protocol-discriminated unions.
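
As a usage sketch (assumed, not part of the patch; field names follow the
6.2-era generated QAPI types, where optional members still carry a has_
flag), a caller inside QEMU could change the password of the VNC display
with id "vnc2" like so:

    SetPasswordOptions opts = {
        .protocol = DISPLAY_PROTOCOL_VNC,
        .password = (char *)"hunter2",
    };
    opts.u.vnc.has_display = true;
    opts.u.vnc.display = (char *)"vnc2";
    qmp_set_password(&opts, &error_fatal);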

Suggested-by: Markus Armbruster 
Signed-off-by: Stefan Reiter 
Message-Id: <20211021100135.4146766-4-s.rei...@proxmox.com>
Reviewed-by: Markus Armbruster 
Acked-by: Gerd Hoffmann 
Signed-off-by: Markus Armbruster 
---
 qapi/ui.json   | 112 +++--
 monitor/hmp-cmds.c |  45 --
 monitor/qmp-cmds.c |  36 ++-
 hmp-commands.hx|  24 +-
 4 files changed, 148 insertions(+), 69 deletions(-)

diff --git a/qapi/ui.json b/qapi/ui.json
index 15cc19dcc5..99ac29ad9c 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -39,20 +39,61 @@
   'data': [ 'fail', 'disconnect', 'keep' ] }
 
 ##
-# @set_password:
+# @SetPasswordOptions:
 #
-# Sets the password of a remote display session.
+# General options for set_password.
 #
 # @protocol: - 'vnc' to modify the VNC server password
 #            - 'spice' to modify the Spice server password
 #
 # @password: the new password
 #
-# @connected: how to handle existing clients when changing the
-# password.  If nothing is specified, defaults to 'keep'
-# 'fail' to fail the command if clients are connected
-# 'disconnect' to disconnect existing clients
-# 'keep' to maintain existing clients
+# Since: 6.2
+#
+##
+{ 'union': 'SetPasswordOptions',
+  'base': { 'protocol': 'DisplayProtocol',
+'password': 'str' },
+  'discriminator': 'protocol',
+  'data': { 'vnc': 'SetPasswordOptionsVnc',
+'spice': 'SetPasswordOptionsSpice' } }
+
+##
+# @SetPasswordOptionsSpice:
+#
+# Options for set_password specific to the SPICE protocol.
+#
+# @connected: How to handle existing clients when changing the
+# password. If nothing is specified, defaults to 'keep'.
+#
+# Since: 6.2
+#
+##
+{ 'struct': 'SetPasswordOptionsSpice',
+  'data': { '*connected': 'SetPasswordAction' } }
+
+##
+# @SetPasswordOptionsVnc:
+#
+# Options for set_password specific to the VNC protocol.
+#
+# @display: The id of the display where the password should be changed.
+#   Defaults to the first.
+#
+# @connected: How to handle existing clients when changing the
+# password.
+#
+# Since: 6.2
+#
+##
+{ 'struct': 'SetPasswordOptionsVnc',
+  'data': { '*display': 'str',
+'*connected': 'SetPasswordAction' }}
+
+##
+# @set_password:
+#
+# Set the password of a remote display server.
 #
 # Returns: - Nothing on success
 #  - If Spice is not enabled, DeviceNotFound
@@ -66,18 +107,16 @@
 # <- { "return": {} }
 #
 ##
-{ 'command': 'set_password',
-  'data': { 'protocol': 'DisplayProtocol',
-'password': 'str',
-'*connected': 'SetPasswordAction' } }
+{ 'command': 'set_password', 'boxed': true, 'data': 'SetPasswordOptions' }
 
 ##
-# @expire_password:
+# @ExpirePasswordOptions:
 #
-# Expire the password of a remote display server.
-#
-# @protocol: the name of the remote display protocol 'vnc' or 'spice'
+# General options for expire_password.
 #
+# @protocol: - 'vnc' to modify the VNC server expiration
+#            - 'spice' to modify the Spice server expiration
+#
 # @time: when to expire the password.
 #
 #- 'now' to expire the password immediately
@@ -85,16 +124,45 @@
 #- '+INT' where INT is the number of seconds from now (integer)
 #- 'INT' where INT is the absolute time in seconds
 #
-# Returns: - Nothing on success
-#  - If @protocol is 'spice' and Spice is not active, DeviceNotFound
-#
-# Since: 0.14
-#
 # Notes: Time is relative to the server and currently there is no way to
 #coordinate server time with client time.  It is not recommended to
 #use the absolute time version of the @time parameter unless you're
 #sure you are on the same machine as the QEMU instance.
 #
+# Since: 6.2
+#
+##
+{ 'union': 'ExpirePasswordOptions',
+  'base': { 'protocol': 'DisplayProtocol',
+'time': 'str' },
+  'discriminator': 'protocol',
+  'data': { 'vnc': 'ExpirePasswordOptionsVnc' } }
+
+##
+# @ExpirePasswordOptionsVnc:
+#
+# Options for expire_password specific to the VNC protocol.
+#
+# @display: The id of the display where the expiration should be changed.
+#   Defaults to the first.
+#
+# Since: 6.2
+#
+##
+
+{ 'struct': 'ExpirePasswordOptionsVnc',
+  'data': { '*display': 'str' } }
+
+##
+# @expire_password:
+#
+# Expire the password of a remote display server.
+#
+# Returns: - Nothing on 

[PULL 0/4] Monitor patches for 2021-10-28

2021-10-27 Thread Markus Armbruster
The following changes since commit c52d69e7dbaaed0ffdef8125e79218672c30161d:

  Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20211027' 
into staging (2021-10-27 11:45:18 -0700)

are available in the Git repository at:

  git://repo.or.cz/qemu/armbru.git tags/pull-monitor-2021-10-28

for you to fetch changes up to 47c849357b57c1fbd3d3355c586c4784c6f4188e:

  qapi/monitor: only allow 'keep' SetPasswordAction for VNC and deprecate 
(2021-10-28 06:25:08 +0200)


Monitor patches for 2021-10-28


Stefan Reiter (4):
  monitor/hmp: add support for flag argument with value
  qapi/monitor: refactor set/expire_password with enums
  qapi/monitor: allow VNC display id in set/expire_password
  qapi/monitor: only allow 'keep' SetPasswordAction for VNC and deprecate

 docs/about/deprecated.rst  |   6 ++
 qapi/ui.json   | 156 +++--
 monitor/monitor-internal.h |   3 +-
 monitor/hmp-cmds.c |  48 +-
 monitor/hmp.c  |  19 +-
 monitor/qmp-cmds.c |  54 
 hmp-commands.hx|  24 +++
 7 files changed, 236 insertions(+), 74 deletions(-)

-- 
2.31.1




[PULL 2/4] qapi/monitor: refactor set/expire_password with enums

2021-10-27 Thread Markus Armbruster
From: Stefan Reiter 

'protocol' and 'connected' are better suited as enums than as strings,
so make use of that. No functional change intended.
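
The refactor leans on qapi_enum_parse(); a quick sketch of its contract as
used below (illustrative values, paraphrasing the helper's behaviour):

    /* qapi_enum_parse(lookup, buf, def, errp):
     *   buf == "spice" -> DISPLAY_PROTOCOL_SPICE
     *   buf == NULL    -> def (here DISPLAY_PROTOCOL_VNC, the HMP default)
     *   buf == "rdp"   -> sets *errp and returns def
     */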

Suggested-by: Markus Armbruster 
Reviewed-by: Markus Armbruster 
Signed-off-by: Stefan Reiter 
Message-Id: <20211021100135.4146766-3-s.rei...@proxmox.com>
Acked-by: Gerd Hoffmann 
Signed-off-by: Markus Armbruster 
---
 qapi/ui.json   | 37 +++--
 monitor/hmp-cmds.c | 29 +++--
 monitor/qmp-cmds.c | 37 -
 3 files changed, 74 insertions(+), 29 deletions(-)

diff --git a/qapi/ui.json b/qapi/ui.json
index d7567ac866..15cc19dcc5 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -9,6 +9,35 @@
 { 'include': 'common.json' }
 { 'include': 'sockets.json' }
 
+##
+# @DisplayProtocol:
+#
+# Display protocols which support changing password options.
+#
+# Since: 6.2
+#
+##
+{ 'enum': 'DisplayProtocol',
+  'data': [ { 'name': 'vnc', 'if': 'CONFIG_VNC' },
+{ 'name': 'spice', 'if': 'CONFIG_SPICE' } ] }
+
+##
+# @SetPasswordAction:
+#
+# An action to take on changing a password on a connection with active clients.
+#
+# @fail: fail the command if clients are connected
+#
+# @disconnect: disconnect existing clients
+#
+# @keep: maintain existing clients
+#
+# Since: 6.2
+#
+##
+{ 'enum': 'SetPasswordAction',
+  'data': [ 'fail', 'disconnect', 'keep' ] }
+
 ##
 # @set_password:
 #
@@ -38,7 +67,9 @@
 #
 ##
 { 'command': 'set_password',
-  'data': {'protocol': 'str', 'password': 'str', '*connected': 'str'} }
+  'data': { 'protocol': 'DisplayProtocol',
+'password': 'str',
+'*connected': 'SetPasswordAction' } }
 
 ##
 # @expire_password:
@@ -71,7 +102,9 @@
 # <- { "return": {} }
 #
 ##
-{ 'command': 'expire_password', 'data': {'protocol': 'str', 'time': 'str'} }
+{ 'command': 'expire_password',
+  'data': { 'protocol': 'DisplayProtocol',
+'time': 'str' } }
 
 ##
 # @screendump:
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index bcaa41350e..b8abe69609 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1453,8 +1453,24 @@ void hmp_set_password(Monitor *mon, const QDict *qdict)
 const char *password  = qdict_get_str(qdict, "password");
 const char *connected = qdict_get_try_str(qdict, "connected");
 Error *err = NULL;
+DisplayProtocol proto;
+SetPasswordAction conn;
 
-qmp_set_password(protocol, password, !!connected, connected, &err);
+proto = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
+DISPLAY_PROTOCOL_VNC, &err);
+if (err) {
+goto out;
+}
+
+conn = qapi_enum_parse(&SetPasswordAction_lookup, connected,
+   SET_PASSWORD_ACTION_KEEP, &err);
+if (err) {
+goto out;
+}
+
+qmp_set_password(proto, password, !!connected, conn, &err);
+
+out:
 hmp_handle_error(mon, err);
 }
 
@@ -1463,8 +1479,17 @@ void hmp_expire_password(Monitor *mon, const QDict 
*qdict)
 const char *protocol  = qdict_get_str(qdict, "protocol");
 const char *whenstr = qdict_get_str(qdict, "time");
 Error *err = NULL;
+DisplayProtocol proto;
 
-qmp_expire_password(protocol, whenstr, &err);
+proto = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
+DISPLAY_PROTOCOL_VNC, &err);
+if (err) {
+goto out;
+}
+
+qmp_expire_password(proto, whenstr, &err);
+
+out:
 hmp_handle_error(mon, err);
 }
 
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 5c0d5e116b..0654d7289a 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -163,33 +163,27 @@ void qmp_system_wakeup(Error **errp)
 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, errp);
 }
 
-void qmp_set_password(const char *protocol, const char *password,
-  bool has_connected, const char *connected, Error **errp)
+void qmp_set_password(DisplayProtocol protocol, const char *password,
+  bool has_connected, SetPasswordAction connected,
+  Error **errp)
 {
 int disconnect_if_connected = 0;
 int fail_if_connected = 0;
 int rc;
 
 if (has_connected) {
-if (strcmp(connected, "fail") == 0) {
-fail_if_connected = 1;
-} else if (strcmp(connected, "disconnect") == 0) {
-disconnect_if_connected = 1;
-} else if (strcmp(connected, "keep") == 0) {
-/* nothing */
-} else {
-error_setg(errp, QERR_INVALID_PARAMETER, "connected");
-return;
-}
+fail_if_connected = connected == SET_PASSWORD_ACTION_FAIL;
+disconnect_if_connected = connected == SET_PASSWORD_ACTION_DISCONNECT;
 }
 
-if (strcmp(protocol, "spice") == 0) {
+if (protocol == DISPLAY_PROTOCOL_SPICE) {
 if (!qemu_using_spice(errp)) {
 return;
 }
 rc = qemu_spice.set_passwd(password, fail_if_connected,
disconnect_if_connected);
-} else if (strcmp(protocol, "vnc") == 0) {
+ 

[PULL 4/4] qapi/monitor: only allow 'keep' SetPasswordAction for VNC and deprecate

2021-10-27 Thread Markus Armbruster
From: Stefan Reiter 

VNC only supports 'keep' here; enforce this via a separate
SetPasswordActionVnc enum and mark the option 'deprecated' (as it is
useless with only one value possible).

Also add a deprecation note to docs.

Suggested-by: Eric Blake 
Reviewed-by: Markus Armbruster 
Signed-off-by: Stefan Reiter 
Message-Id: <20211021100135.4146766-5-s.rei...@proxmox.com>
Acked-by: Gerd Hoffmann 
Signed-off-by: Markus Armbruster 
---
 docs/about/deprecated.rst |  6 ++
 qapi/ui.json  | 21 -
 monitor/qmp-cmds.c|  5 -
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index be19317470..15b016e344 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -234,6 +234,12 @@ single ``bitmap``, the new ``block-export-add`` uses a 
list of ``bitmaps``.
 Member ``values`` in return value elements with meta-type ``enum`` is
 deprecated.  Use ``members`` instead.
 
+``set_password`` argument ``connected`` for VNC protocol (since 6.2)
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+Only the value ``keep`` has ever been supported for VNC. The (useless)
+argument will be dropped in a future version of QEMU.
+
 System accelerators
 -------------------
 
diff --git a/qapi/ui.json b/qapi/ui.json
index 99ac29ad9c..5292617b44 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -38,6 +38,20 @@
 { 'enum': 'SetPasswordAction',
   'data': [ 'fail', 'disconnect', 'keep' ] }
 
+##
+# @SetPasswordActionVnc:
+#
+# See @SetPasswordAction. VNC only supports the keep action. 'connected'
+# should simply be omitted for VNC; it is kept for backwards compatibility.
+#
+# @keep: maintain existing clients
+#
+# Since: 6.2
+#
+##
+{ 'enum': 'SetPasswordActionVnc',
+  'data': [ 'keep' ] }
+
 ##
 # @SetPasswordOptions:
 #
@@ -83,12 +97,17 @@
 # @connected: How to handle existing clients when changing the
 # password.
 #
+# Features:
 # @deprecated: For VNC, @connected will always be 'keep'; the parameter
 #              should be omitted.
+#
 # Since: 6.2
 #
 ##
 { 'struct': 'SetPasswordOptionsVnc',
   'data': { '*display': 'str',
-'*connected': 'SetPasswordAction' }}
+'*connected': { 'type': 'SetPasswordActionVnc',
+'features': ['deprecated'] } } }
 
 ##
 # @set_password:
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 5637bd70b6..4825d0cbea 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -176,11 +176,6 @@ void qmp_set_password(SetPasswordOptions *opts, Error 
**errp)
 opts->u.spice.connected == SET_PASSWORD_ACTION_DISCONNECT);
 } else {
 assert(opts->protocol == DISPLAY_PROTOCOL_VNC);
-if (opts->u.vnc.connected != SET_PASSWORD_ACTION_KEEP) {
-/* vnc supports "connected=keep" only */
-error_setg(errp, QERR_INVALID_PARAMETER, "connected");
-return;
-}
 /* Note that setting an empty password will not disable login through
  * this interface. */
 rc = vnc_display_password(opts->u.vnc.display, opts->password);
-- 
2.31.1




[PULL 1/4] monitor/hmp: add support for flag argument with value

2021-10-27 Thread Markus Armbruster
From: Stefan Reiter 

Adds support for the "-xV" parameter type, where "-x" denotes a flag
name and the "V" suffix indicates that this flag is supposed to take an
arbitrary string parameter.

These parameters are always optional; the entry in the qdict will be
omitted if the flag is not given.
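
For instance, a command definition can then declare a flag-with-value
argument like this (a hypothetical hmp-commands.hx entry, not one added by
this patch):

    {
        .name       = "set_password",
        .args_type  = "protocol:s,password:s,display:-dV",
        .params     = "protocol password [-d display]",
        .help       = "set the password for VNC/SPICE",
        .cmd        = hmp_set_password,
    },

With "display:-dV", both "set_password vnc hunter2" and
"set_password vnc hunter2 -d vnc2" parse; "display" appears in the qdict
only in the second case.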

Signed-off-by: Stefan Reiter 
Message-Id: <20211021100135.4146766-2-s.rei...@proxmox.com>
Reviewed-by: Dr. David Alan Gilbert 
Acked-by: Gerd Hoffmann 
Signed-off-by: Markus Armbruster 
---
 monitor/monitor-internal.h |  3 ++-
 monitor/hmp.c  | 19 ++-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
index 9c3a09cb01..9e708b329d 100644
--- a/monitor/monitor-internal.h
+++ b/monitor/monitor-internal.h
@@ -63,7 +63,8 @@
  * '.'  other form of optional type (for 'i' and 'l')
  * 'b'  boolean
  *  user mode accepts "on" or "off"
- * '-'  optional parameter (eg. '-f')
+ * '-'  optional parameter (eg. '-f'); if followed by a 'V', it
+ *  specifies an optional string param (e.g. '-fV' allows '-f foo')
  *
  */
 
diff --git a/monitor/hmp.c b/monitor/hmp.c
index d50c3124e1..899e0c990f 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -980,6 +980,7 @@ static QDict *monitor_parse_arguments(Monitor *mon,
 {
 const char *tmp = p;
 int skip_key = 0;
+int ret;
 /* option */
 
 c = *typestr++;
@@ -1002,11 +1003,27 @@ static QDict *monitor_parse_arguments(Monitor *mon,
 }
 if (skip_key) {
 p = tmp;
+} else if (*typestr == 'V') {
+/* has option with string value */
+typestr++;
+tmp = p++;
+while (qemu_isspace(*p)) {
+p++;
+}
+ret = get_str(buf, sizeof(buf), &p);
+if (ret < 0) {
+monitor_printf(mon, "%s: value expected for -%c\n",
+   cmd->name, *tmp);
+goto fail;
+}
+qdict_put_str(qdict, key, buf);
 } else {
-/* has option */
+/* has boolean option */
 p++;
 qdict_put_bool(qdict, key, true);
 }
+} else if (*typestr == 'V') {
+typestr++;
 }
 }
 break;
-- 
2.31.1




Re: [PATCH v7 0/4] VNC-related HMP/QMP fixes

2021-10-27 Thread Markus Armbruster
Stefan Reiter  writes:

> Since the removal of the generic 'qmp_change' command, one can no longer 
> replace
> the 'default' VNC display listen address at runtime (AFAIK). For our users who
> need to set up a secondary VNC access port, this means configuring a second 
> VNC
> display (in addition to our standard one for web-access), but it turns out one
> cannot set a password on this second display at the moment, as the
> 'set_password' call only operates on the 'default' display.
>
> Additionally, using secret objects, the password is only read once at startup.
> This could be considered a bug too, but is not touched in this series and left
> for a later date.

Queued, thanks!




[PULL 18/18] target/riscv: remove force HS exception

2021-10-27 Thread Alistair Francis
From: Jose Martins 

There is no need to "force an hs exception" as the current privilege
level, the state of the global ie and of the delegation registers should
be enough to route the interrupt to the appropriate privilege level in
riscv_cpu_do_interrupt. This is true for both asynchronous and
synchronous exceptions, specifically guest page faults, whose delegation
bits must be hardwired to zero in hedeleg. As such, the hs_force_except
mechanism can be removed.
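
Condensed, the routing rule that remains after this patch looks like this
(a sketch mirroring the hunk below, not new code):

    if (riscv_cpu_virt_enabled(env) && ((hdeleg >> cause) & 1)) {
        /* delegated while V=1: trap to VS mode */
    } else {
        /* trap to HS mode; guest-page-fault causes always land here
         * because their hedeleg bits are hardwired to zero */
    }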

Signed-off-by: Jose Martins 
Reviewed-by: Alistair Francis 
Message-id: 20211026145126.11025-3-josemartin...@gmail.com
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h|  2 --
 target/riscv/cpu_bits.h   |  6 --
 target/riscv/cpu_helper.c | 26 +-
 3 files changed, 1 insertion(+), 33 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 325908287d..0760c0af93 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -352,8 +352,6 @@ int riscv_cpu_gdb_write_register(CPUState *cpu, uint8_t 
*buf, int reg);
 bool riscv_cpu_fp_enabled(CPURISCVState *env);
 bool riscv_cpu_virt_enabled(CPURISCVState *env);
 void riscv_cpu_set_virt_enabled(CPURISCVState *env, bool enable);
-bool riscv_cpu_force_hs_excep_enabled(CPURISCVState *env);
-void riscv_cpu_set_force_hs_excep(CPURISCVState *env, bool enable);
 bool riscv_cpu_two_stage_lookup(int mmu_idx);
 int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch);
 hwaddr riscv_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index aa0bce4e06..9913fa9f77 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -444,12 +444,6 @@ typedef enum {
 
 /* Virtulisation Register Fields */
 #define VIRT_ONOFF  1
-/* This is used to save state for when we take an exception. If this is set
- * that means that we want to force a HS level exception (no matter what the
- * delegation is set to). This will occur for things such as a second level
- * page table fault.
- */
-#define FORCE_HS_EXCEP  2
 
 /* RV32 satp CSR field masks */
 #define SATP32_MODE 0x8000
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 5076580374..f30ff672f8 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -270,24 +270,6 @@ void riscv_cpu_set_virt_enabled(CPURISCVState *env, bool 
enable)
 env->virt = set_field(env->virt, VIRT_ONOFF, enable);
 }
 
-bool riscv_cpu_force_hs_excep_enabled(CPURISCVState *env)
-{
-if (!riscv_has_ext(env, RVH)) {
-return false;
-}
-
-return get_field(env->virt, FORCE_HS_EXCEP);
-}
-
-void riscv_cpu_set_force_hs_excep(CPURISCVState *env, bool enable)
-{
-if (!riscv_has_ext(env, RVH)) {
-return;
-}
-
-env->virt = set_field(env->virt, FORCE_HS_EXCEP, enable);
-}
-
 bool riscv_cpu_two_stage_lookup(int mmu_idx)
 {
 return mmu_idx & TB_FLAGS_PRIV_HYP_ACCESS_MASK;
@@ -1004,7 +986,6 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 
 RISCVCPU *cpu = RISCV_CPU(cs);
 CPURISCVState *env = &cpu->env;
-bool force_hs_execp = riscv_cpu_force_hs_excep_enabled(env);
 uint64_t s;
 
 /* cs->exception is 32-bits wide unlike mcause which is XLEN-bits wide
@@ -1033,8 +1014,6 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 case RISCV_EXCP_INST_GUEST_PAGE_FAULT:
 case RISCV_EXCP_LOAD_GUEST_ACCESS_FAULT:
 case RISCV_EXCP_STORE_GUEST_AMO_ACCESS_FAULT:
-force_hs_execp = true;
-/* fallthrough */
 case RISCV_EXCP_INST_ADDR_MIS:
 case RISCV_EXCP_INST_ACCESS_FAULT:
 case RISCV_EXCP_LOAD_ADDR_MIS:
@@ -1093,8 +1072,7 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 env->hstatus = set_field(env->hstatus, HSTATUS_GVA, 0);
 }
 
-if (riscv_cpu_virt_enabled(env) && ((hdeleg >> cause) & 1) &&
-!force_hs_execp) {
+if (riscv_cpu_virt_enabled(env) && ((hdeleg >> cause) & 1)) {
 /* Trap to VS mode */
 /*
  * See if we need to adjust cause. Yes if its VS mode interrupt
@@ -1116,7 +1094,6 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 htval = env->guest_phys_fault_addr;
 
 riscv_cpu_set_virt_enabled(env, 0);
-riscv_cpu_set_force_hs_excep(env, 0);
 } else {
 /* Trap into HS mode */
 env->hstatus = set_field(env->hstatus, HSTATUS_SPV, false);
@@ -1152,7 +1129,6 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 
 /* Trapping to M mode, virt is disabled */
 riscv_cpu_set_virt_enabled(env, 0);
-riscv_cpu_set_force_hs_excep(env, 0);
 }
 
 s = env->mstatus;
-- 
2.31.1




[PULL 17/18] target/riscv: fix VS interrupts forwarding to HS

2021-10-27 Thread Alistair Francis
From: Jose Martins 

VS interrupts (2, 6, 10) were not correctly forwarded to hs-mode when
not delegated in hideleg (which was not being taken into account). This
was mainly because hs level sie was not always considered enabled when
it should. The spec states that "Interrupts for higher-privilege modes,
y>x, are always globally enabled regardless of the setting of the global
yIE bit for the higher-privilege mode." and also "For purposes of
interrupt global enables, HS-mode is considered more privileged than
VS-mode, and VS-mode is considered more privileged than VU-mode". Also,
vs-level interrupts were not being taken into account unless V=1, but
should be unless delegated.

Finally, there is no need for a special case to handle VS interrupts,
as the current privilege level, the state of the global ie bits, and the
delegation registers are enough to route all interrupts to the
appropriate privilege level in riscv_cpu_do_interrupt.
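
A worked instance of the new enable computation (illustrative case; the
names follow the hunk below):

    /* Running in VS-mode (V=1, env->priv == PRV_S) with the guest's SIE
     * clear and an HS-destined interrupt pending:
     *   sie  = 0                        (priv == PRV_S && !mstatus_sie)
     *   hsie = virt_enabled || sie = 1  (HS irqs always enabled from VS)
     *   vsie = virt_enabled && sie = 0  (VS irqs honour the guest's SIE)
     * An interrupt set in mideleg but clear in hideleg is thus delivered
     * to HS-mode even though the guest disabled SIE -- the case the old
     * code got wrong. */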

Signed-off-by: Jose Martins 
Reviewed-by: Alistair Francis 
Message-id: 20211026145126.11025-2-josemartin...@gmail.com
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_helper.c | 28 
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 662228c238..5076580374 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -135,36 +135,24 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, 
target_ulong *pc,
 #ifndef CONFIG_USER_ONLY
 static int riscv_cpu_local_irq_pending(CPURISCVState *env)
 {
-target_ulong irqs;
+target_ulong virt_enabled = riscv_cpu_virt_enabled(env);
 
 target_ulong mstatus_mie = get_field(env->mstatus, MSTATUS_MIE);
 target_ulong mstatus_sie = get_field(env->mstatus, MSTATUS_SIE);
-target_ulong hs_mstatus_sie = get_field(env->mstatus_hs, MSTATUS_SIE);
 
-target_ulong pending = env->mip & env->mie &
-   ~(MIP_VSSIP | MIP_VSTIP | MIP_VSEIP);
-target_ulong vspending = (env->mip & env->mie &
-  (MIP_VSSIP | MIP_VSTIP | MIP_VSEIP));
+target_ulong pending = env->mip & env->mie;
 
 target_ulong mie= env->priv < PRV_M ||
   (env->priv == PRV_M && mstatus_mie);
 target_ulong sie= env->priv < PRV_S ||
   (env->priv == PRV_S && mstatus_sie);
-target_ulong hs_sie = env->priv < PRV_S ||
-  (env->priv == PRV_S && hs_mstatus_sie);
+target_ulong hsie   = virt_enabled || sie;
+target_ulong vsie   = virt_enabled && sie;
 
-if (riscv_cpu_virt_enabled(env)) {
-target_ulong pending_hs_irq = pending & -hs_sie;
-
-if (pending_hs_irq) {
-riscv_cpu_set_force_hs_excep(env, FORCE_HS_EXCEP);
-return ctz64(pending_hs_irq);
-}
-
-pending = vspending;
-}
-
-irqs = (pending & ~env->mideleg & -mie) | (pending &  env->mideleg & -sie);
+target_ulong irqs =
+(pending & ~env->mideleg & -mie) |
+(pending &  env->mideleg & ~env->hideleg & -hsie) |
+(pending &  env->mideleg &  env->hideleg & -vsie);
 
 if (irqs) {
 return ctz64(irqs); /* since non-zero */
-- 
2.31.1




[PULL 15/18] softfloat: add APIs to handle alternative sNaN propagation for fmax/fmin

2021-10-27 Thread Alistair Francis
From: Chih-Min Chao 

For "fmax/fmin ft0, ft1, ft2" and if one of the inputs is sNaN,

  The original logic:
Return NaN and set invalid flag if ft1 == sNaN || ft2 == sNaN.

  The alternative path:
Set invalid flag if ft1 == sNaN || ft2 == sNaN.
Return NaN only if ft1 == NaN && ft2 == NaN.

The IEEE 754 spec allows both implementations, and some architectures,
such as RISC-V, choose different definitions across spec versions
(riscv-spec-v2.2 uses the original version; riscv-spec-20191213 changed
to the alternative).
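
Concretely, for single precision the two behaviours are expected to differ
like this (an illustration of the description above, not taken from the
patch):

    /* float32_minnum(sNaN, 1.0f)          -> qNaN, invalid flag raised
     * float32_minimum_number(sNaN, 1.0f)  -> 1.0f, invalid flag raised
     * float32_minimum_number(qNaN, 1.0f)  -> 1.0f (no flag)
     * float32_minimum_number(qNaN, qNaN)  -> qNaN
     */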

Signed-off-by: Chih-Min Chao 
Signed-off-by: Frank Chang 
Reviewed-by: Richard Henderson 
Message-id: 20211016085428.3001501-2-frank.ch...@sifive.com
Signed-off-by: Alistair Francis 
---
 include/fpu/softfloat.h   | 10 ++
 fpu/softfloat.c   | 19 +--
 fpu/softfloat-parts.c.inc | 25 +++--
 3 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index ec7dca0960..a249991e61 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -243,6 +243,8 @@ float16 float16_minnum(float16, float16, float_status 
*status);
 float16 float16_maxnum(float16, float16, float_status *status);
 float16 float16_minnummag(float16, float16, float_status *status);
 float16 float16_maxnummag(float16, float16, float_status *status);
+float16 float16_minimum_number(float16, float16, float_status *status);
+float16 float16_maximum_number(float16, float16, float_status *status);
 float16 float16_sqrt(float16, float_status *status);
 FloatRelation float16_compare(float16, float16, float_status *status);
 FloatRelation float16_compare_quiet(float16, float16, float_status *status);
@@ -422,6 +424,8 @@ bfloat16 bfloat16_minnum(bfloat16, bfloat16, float_status 
*status);
 bfloat16 bfloat16_maxnum(bfloat16, bfloat16, float_status *status);
 bfloat16 bfloat16_minnummag(bfloat16, bfloat16, float_status *status);
 bfloat16 bfloat16_maxnummag(bfloat16, bfloat16, float_status *status);
+bfloat16 bfloat16_minimum_number(bfloat16, bfloat16, float_status *status);
+bfloat16 bfloat16_maximum_number(bfloat16, bfloat16, float_status *status);
 bfloat16 bfloat16_sqrt(bfloat16, float_status *status);
 FloatRelation bfloat16_compare(bfloat16, bfloat16, float_status *status);
 FloatRelation bfloat16_compare_quiet(bfloat16, bfloat16, float_status *status);
@@ -589,6 +593,8 @@ float32 float32_minnum(float32, float32, float_status 
*status);
 float32 float32_maxnum(float32, float32, float_status *status);
 float32 float32_minnummag(float32, float32, float_status *status);
 float32 float32_maxnummag(float32, float32, float_status *status);
+float32 float32_minimum_number(float32, float32, float_status *status);
+float32 float32_maximum_number(float32, float32, float_status *status);
 bool float32_is_quiet_nan(float32, float_status *status);
 bool float32_is_signaling_nan(float32, float_status *status);
 float32 float32_silence_nan(float32, float_status *status);
@@ -778,6 +784,8 @@ float64 float64_minnum(float64, float64, float_status 
*status);
 float64 float64_maxnum(float64, float64, float_status *status);
 float64 float64_minnummag(float64, float64, float_status *status);
 float64 float64_maxnummag(float64, float64, float_status *status);
+float64 float64_minimum_number(float64, float64, float_status *status);
+float64 float64_maximum_number(float64, float64, float_status *status);
 bool float64_is_quiet_nan(float64 a, float_status *status);
 bool float64_is_signaling_nan(float64, float_status *status);
 float64 float64_silence_nan(float64, float_status *status);
@@ -1210,6 +1218,8 @@ float128 float128_minnum(float128, float128, float_status 
*status);
 float128 float128_maxnum(float128, float128, float_status *status);
 float128 float128_minnummag(float128, float128, float_status *status);
 float128 float128_maxnummag(float128, float128, float_status *status);
+float128 float128_minimum_number(float128, float128, float_status *status);
+float128 float128_maximum_number(float128, float128, float_status *status);
 bool float128_is_quiet_nan(float128, float_status *status);
 bool float128_is_signaling_nan(float128, float_status *status);
 float128 float128_silence_nan(float128, float_status *status);
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6e769f990c..9a28720d82 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -436,6 +436,11 @@ enum {
 minmax_isnum = 2,
 /* Set for the IEEE 754-2008 minNumMag() and maxNumMag() operations. */
 minmax_ismag = 4,
+/*
+ * Set for the IEEE 754-2019 minimumNumber() and maximumNumber()
+ * operations.
+ */
+minmax_isnumber = 8,
 };
 
 /* Simple helpers for checking if, or what kind of, NaN we have */
@@ -3927,12 +3932,14 @@ static float128 float128_minmax(float128 a, float128 b,
 { return type##_minmax(a, b, s, flags); }
 
 #define MINMAX_2(type) \
-MINMAX_1(type, max, 0)  \
-MINMAX_1(type, maxnum, 

[PULL 13/18] target/riscv: Implement address masking functions required for RISC-V Pointer Masking extension

2021-10-27 Thread Alistair Francis
From: Anatoly Parshintsev 

Signed-off-by: Anatoly Parshintsev 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
Message-id: 20211025173609.2724490-8-space.monkey.deliv...@gmail.com
Signed-off-by: Alistair Francis 
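
The generated adjustment computes addr' = (addr & ~pm_mask) | pm_base. A
standalone demonstration with made-up values (not QEMU code):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t pm_mask = 0xff00000000000000ULL; /* top byte holds a tag */
        uint64_t pm_base = 0;
        uint64_t addr    = 0xab00000080001234ULL;

        /* same transform as gen_pm_adjust_address() below */
        uint64_t adjusted = (addr & ~pm_mask) | pm_base;
        printf("0x%016" PRIx64 " -> 0x%016" PRIx64 "\n", addr, adjusted);
        /* prints: 0xab00000080001234 -> 0x0000000080001234 */
        return 0;
    }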
---
 target/riscv/cpu.h|  2 ++
 target/riscv/cpu_helper.c | 18 ++
 target/riscv/translate.c  | 39 +--
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index b2422e3f99..325908287d 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -410,6 +410,8 @@ FIELD(TB_FLAGS, HLSX, 10, 1)
 FIELD(TB_FLAGS, MSTATUS_HS_FS, 11, 2)
 /* The combination of MXL/SXL/UXL that applies to the current cpu mode. */
 FIELD(TB_FLAGS, XL, 13, 2)
+/* If PointerMasking should be applied */
+FIELD(TB_FLAGS, PM_ENABLED, 15, 1)
 
 #ifdef TARGET_RISCV32
 #define riscv_cpu_mxl(env)  ((void)(env), MXL_RV32)
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 0d1132f39d..662228c238 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -107,6 +107,24 @@ void cpu_get_tb_cpu_state(CPURISCVState *env, target_ulong 
*pc,
 flags = FIELD_DP32(flags, TB_FLAGS, MSTATUS_HS_FS,
get_field(env->mstatus_hs, MSTATUS_FS));
 }
+if (riscv_has_ext(env, RVJ)) {
+int priv = flags & TB_FLAGS_PRIV_MMU_MASK;
+bool pm_enabled = false;
+switch (priv) {
+case PRV_U:
+pm_enabled = env->mmte & U_PM_ENABLE;
+break;
+case PRV_S:
+pm_enabled = env->mmte & S_PM_ENABLE;
+break;
+case PRV_M:
+pm_enabled = env->mmte & M_PM_ENABLE;
+break;
+default:
+g_assert_not_reached();
+}
+flags = FIELD_DP32(flags, TB_FLAGS, PM_ENABLED, pm_enabled);
+}
 #endif
 
 flags = FIELD_DP32(flags, TB_FLAGS, XL, cpu_get_xl(env));
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index a5e6fa145d..1d57bc97b5 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -36,6 +36,9 @@ static TCGv cpu_gpr[32], cpu_pc, cpu_vl;
 static TCGv_i64 cpu_fpr[32]; /* assume F and D extensions */
 static TCGv load_res;
 static TCGv load_val;
+/* globals for PM CSRs */
+static TCGv pm_mask[4];
+static TCGv pm_base[4];
 
 #include "exec/gen-icount.h"
 
@@ -83,6 +86,10 @@ typedef struct DisasContext {
 TCGv zero;
 /* Space for 3 operands plus 1 extra for address computation. */
 TCGv temp[4];
+/* PointerMasking extension */
+bool pm_enabled;
+TCGv pm_mask;
+TCGv pm_base;
 } DisasContext;
 
 static inline bool has_ext(DisasContext *ctx, uint32_t ext)
@@ -272,11 +279,20 @@ static void gen_jal(DisasContext *ctx, int rd, 
target_ulong imm)
 }
 
 /*
- * Temp stub: generates address adjustment for PointerMasking
+ * Generates address adjustment for PointerMasking
  */
 static TCGv gen_pm_adjust_address(DisasContext *s, TCGv src)
 {
-return src;
+TCGv temp;
+if (!s->pm_enabled) {
+/* Load unmodified address */
+return src;
+} else {
+temp = temp_new(s);
+tcg_gen_andc_tl(temp, src, s->pm_mask);
+tcg_gen_or_tl(temp, temp, s->pm_base);
+return temp;
+}
 }
 
 #ifndef CONFIG_USER_ONLY
@@ -622,6 +638,10 @@ static void riscv_tr_init_disas_context(DisasContextBase 
*dcbase, CPUState *cs)
 ctx->cs = cs;
 ctx->ntemp = 0;
 memset(ctx->temp, 0, sizeof(ctx->temp));
+ctx->pm_enabled = FIELD_EX32(tb_flags, TB_FLAGS, PM_ENABLED);
+int priv = tb_flags & TB_FLAGS_PRIV_MMU_MASK;
+ctx->pm_mask = pm_mask[priv];
+ctx->pm_base = pm_base[priv];
 
 ctx->zero = tcg_constant_tl(0);
 }
@@ -735,4 +755,19 @@ void riscv_translate_init(void)
  "load_res");
 load_val = tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, load_val),
  "load_val");
+#ifndef CONFIG_USER_ONLY
+/* Assign PM CSRs to tcg globals */
+pm_mask[PRV_U] =
+  tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, upmmask), "upmmask");
+pm_base[PRV_U] =
+  tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, upmbase), "upmbase");
+pm_mask[PRV_S] =
+  tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, spmmask), "spmmask");
+pm_base[PRV_S] =
+  tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, spmbase), "spmbase");
+pm_mask[PRV_M] =
+  tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, mpmmask), "mpmmask");
+pm_base[PRV_M] =
+  tcg_global_mem_new(cpu_env, offsetof(CPURISCVState, mpmbase), "mpmbase");
+#endif
 }
-- 
2.31.1




[PULL 10/18] target/riscv: Add J extension state description

2021-10-27 Thread Alistair Francis
From: Alexey Baturo 

Signed-off-by: Alexey Baturo 
Reviewed-by: Alistair Francis 
Message-id: 20211025173609.2724490-5-space.monkey.deliv...@gmail.com
Signed-off-by: Alistair Francis 
---
 target/riscv/machine.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/target/riscv/machine.c b/target/riscv/machine.c
index f64b2a96c1..7b4c739564 100644
--- a/target/riscv/machine.c
+++ b/target/riscv/machine.c
@@ -84,6 +84,14 @@ static bool vector_needed(void *opaque)
 return riscv_has_ext(env, RVV);
 }
 
+static bool pointermasking_needed(void *opaque)
+{
+RISCVCPU *cpu = opaque;
+CPURISCVState *env = &cpu->env;
+
+return riscv_has_ext(env, RVJ);
+}
+
 static const VMStateDescription vmstate_vector = {
 .name = "cpu/vector",
 .version_id = 1,
@@ -100,6 +108,24 @@ static const VMStateDescription vmstate_vector = {
 }
 };
 
+static const VMStateDescription vmstate_pointermasking = {
+.name = "cpu/pointer_masking",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = pointermasking_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINTTL(env.mmte, RISCVCPU),
+VMSTATE_UINTTL(env.mpmmask, RISCVCPU),
+VMSTATE_UINTTL(env.mpmbase, RISCVCPU),
+VMSTATE_UINTTL(env.spmmask, RISCVCPU),
+VMSTATE_UINTTL(env.spmbase, RISCVCPU),
+VMSTATE_UINTTL(env.upmmask, RISCVCPU),
+VMSTATE_UINTTL(env.upmbase, RISCVCPU),
+
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_hyper = {
 .name = "cpu/hyper",
 .version_id = 1,
@@ -191,6 +217,7 @@ const VMStateDescription vmstate_riscv_cpu = {
 &vmstate_pmp,
 &vmstate_hyper,
 &vmstate_vector,
+&vmstate_pointermasking,
 NULL
 }
 };
-- 
2.31.1




[PULL 09/18] target/riscv: Support CSRs required for RISC-V PM extension except for the h-mode

2021-10-27 Thread Alistair Francis
From: Alexey Baturo 

Signed-off-by: Alexey Baturo 
Reviewed-by: Alistair Francis 
Message-id: 20211025173609.2724490-4-space.monkey.deliv...@gmail.com
Signed-off-by: Alistair Francis 
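
One detail worth spelling out: check_pm_current_disabled() derives a CSR's
privilege level from the CSR number itself. Illustrative decoding, using
the defines added earlier in this series (bits [9:8] of the CSR number):

    /* get_field(csrno, 0x300) extracts bits [9:8]:
     *   CSR_UMTE 0x4c0: (0x4c0 & 0x300) >> 8 = 0 -> PRV_U
     *   CSR_SMTE 0x1c0: (0x1c0 & 0x300) >> 8 = 1 -> PRV_S
     *   CSR_MMTE 0x3c0: (0x3c0 & 0x300) >> 8 = 3 -> PRV_M
     */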
---
 target/riscv/cpu.h |  11 ++
 target/riscv/cpu.c |   2 +
 target/riscv/csr.c | 285 +
 3 files changed, 298 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 1cfc6a53a0..b2422e3f99 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -238,6 +238,17 @@ struct CPURISCVState {
 
 /* True if in debugger mode.  */
 bool debugger;
+
+/*
+ * CSRs for PointerMasking extension
+ */
+target_ulong mmte;
+target_ulong mpmmask;
+target_ulong mpmbase;
+target_ulong spmmask;
+target_ulong spmbase;
+target_ulong upmmask;
+target_ulong upmbase;
 #endif
 
 float_status fp_status;
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 788fa0b11c..6b767a4a0b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -367,6 +367,8 @@ static void riscv_cpu_reset(DeviceState *dev)
 env->mcause = 0;
 env->pc = env->resetvec;
 env->two_stage_lookup = false;
+/* mmte is supposed to have pm.current hardwired to 1 */
+env->mmte |= (PM_EXT_INITIAL | MMTE_M_PM_CURRENT);
 #endif
 cs->exception_index = RISCV_EXCP_NONE;
 env->load_res = -1;
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 69e4d65fcd..9f41954894 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -192,6 +192,16 @@ static RISCVException hmode32(CPURISCVState *env, int 
csrno)
 
 }
 
+/* Checks if PointerMasking registers could be accessed */
+static RISCVException pointer_masking(CPURISCVState *env, int csrno)
+{
+/* Check if j-ext is present */
+if (riscv_has_ext(env, RVJ)) {
+return RISCV_EXCP_NONE;
+}
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
 static RISCVException pmp(CPURISCVState *env, int csrno)
 {
 if (riscv_feature(env, RISCV_FEATURE_PMP)) {
@@ -1425,6 +1435,268 @@ static RISCVException write_pmpaddr(CPURISCVState *env, 
int csrno,
 return RISCV_EXCP_NONE;
 }
 
+/*
+ * Functions to access Pointer Masking feature registers
+ * We have to check if current priv lvl could modify
+ * csr in given mode
+ */
+static bool check_pm_current_disabled(CPURISCVState *env, int csrno)
+{
+int csr_priv = get_field(csrno, 0x300);
+int pm_current;
+
+/*
+ * If priv lvls differ that means we're accessing csr from higher priv lvl,
+ * so allow the access
+ */
+if (env->priv != csr_priv) {
+return false;
+}
+switch (env->priv) {
+case PRV_M:
+pm_current = get_field(env->mmte, M_PM_CURRENT);
+break;
+case PRV_S:
+pm_current = get_field(env->mmte, S_PM_CURRENT);
+break;
+case PRV_U:
+pm_current = get_field(env->mmte, U_PM_CURRENT);
+break;
+default:
+g_assert_not_reached();
+}
+/* It's same priv lvl, so we allow to modify csr only if pm.current==1 */
+return !pm_current;
+}
+
+static RISCVException read_mmte(CPURISCVState *env, int csrno,
+target_ulong *val)
+{
+*val = env->mmte & MMTE_MASK;
+return RISCV_EXCP_NONE;
+}
+
+static RISCVException write_mmte(CPURISCVState *env, int csrno,
+ target_ulong val)
+{
+uint64_t mstatus;
+target_ulong wpri_val = val & MMTE_MASK;
+
+if (val != wpri_val) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s" TARGET_FMT_lx " %s" TARGET_FMT_lx "\n",
+  "MMTE: WPRI violation written 0x", val,
+  "vs expected 0x", wpri_val);
+}
+/* for machine mode pm.current is hardwired to 1 */
+wpri_val |= MMTE_M_PM_CURRENT;
+
+/* hardwiring pm.instruction bit to 0, since it's not supported yet */
+wpri_val &= ~(MMTE_M_PM_INSN | MMTE_S_PM_INSN | MMTE_U_PM_INSN);
+env->mmte = wpri_val | PM_EXT_DIRTY;
+
+/* Set XS and SD bits, since PM CSRs are dirty */
+mstatus = env->mstatus | MSTATUS_XS;
+write_mstatus(env, csrno, mstatus);
+return RISCV_EXCP_NONE;
+}
+
+static RISCVException read_smte(CPURISCVState *env, int csrno,
+target_ulong *val)
+{
+*val = env->mmte & SMTE_MASK;
+return RISCV_EXCP_NONE;
+}
+
+static RISCVException write_smte(CPURISCVState *env, int csrno,
+ target_ulong val)
+{
+target_ulong wpri_val = val & SMTE_MASK;
+
+if (val != wpri_val) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s" TARGET_FMT_lx " %s" TARGET_FMT_lx "\n",
+  "SMTE: WPRI violation written 0x", val,
+  "vs expected 0x", wpri_val);
+}
+
+/* if pm.current==0 we can't modify current PM CSRs */
+if (check_pm_current_disabled(env, csrno)) {
+return RISCV_EXCP_NONE;
+}
+
+wpri_val |= (env->mmte & ~SMTE_MASK);
+write_mmte(env, csrno, wpri_val);
+return RISCV_EXCP_NONE;
+}
+

[PULL 08/18] target/riscv: Add CSR defines for RISC-V PM extension

2021-10-27 Thread Alistair Francis
From: Alexey Baturo 

Signed-off-by: Alexey Baturo 
Reviewed-by: Alistair Francis 
Message-id: 20211025173609.2724490-3-space.monkey.deliv...@gmail.com
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_bits.h | 96 +
 1 file changed, 96 insertions(+)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index cffcd3a5df..aa0bce4e06 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -334,6 +334,38 @@
 #define CSR_MHPMCOUNTER30H  0xb9e
 #define CSR_MHPMCOUNTER31H  0xb9f
 
+/*
+ * User PointerMasking registers
+ * NB: actual CSR numbers might be changed in future
+ */
+#define CSR_UMTE0x4c0
+#define CSR_UPMMASK 0x4c1
+#define CSR_UPMBASE 0x4c2
+
+/*
+ * Machine PointerMasking registers
+ * NB: actual CSR numbers might be changed in future
+ */
+#define CSR_MMTE0x3c0
+#define CSR_MPMMASK 0x3c1
+#define CSR_MPMBASE 0x3c2
+
+/*
+ * Supervisor PointerMasking registers
+ * NB: actual CSR numbers might be changed in future
+ */
+#define CSR_SMTE0x1c0
+#define CSR_SPMMASK 0x1c1
+#define CSR_SPMBASE 0x1c2
+
+/*
+ * Hypervisor PointerMasking registers
+ * NB: actual CSR numbers might be changed in future
+ */
+#define CSR_VSMTE   0x2c0
+#define CSR_VSPMMASK0x2c1
+#define CSR_VSPMBASE0x2c2
+
 /* mstatus CSR bits */
 #define MSTATUS_UIE 0x0001
 #define MSTATUS_SIE 0x0002
@@ -525,4 +557,68 @@ typedef enum RISCVException {
 #define MIE_UTIE   (1 << IRQ_U_TIMER)
 #define MIE_SSIE   (1 << IRQ_S_SOFT)
 #define MIE_USIE   (1 << IRQ_U_SOFT)
+
+/* General PointerMasking CSR bits */
+#define PM_ENABLE   0x0001ULL
+#define PM_CURRENT  0x0002ULL
+#define PM_INSN 0x0004ULL
+#define PM_XS_MASK  0x0003ULL
+
+/* PointerMasking XS bits values */
+#define PM_EXT_DISABLE  0xULL
+#define PM_EXT_INITIAL  0x0001ULL
+#define PM_EXT_CLEAN0x0002ULL
+#define PM_EXT_DIRTY0x0003ULL
+
+/* Offsets for every pair of control bits per each priv level */
+#define XS_OFFSET0ULL
+#define U_OFFSET 2ULL
+#define S_OFFSET 5ULL
+#define M_OFFSET 8ULL
+
+#define PM_XS_BITS   (PM_XS_MASK << XS_OFFSET)
+#define U_PM_ENABLE  (PM_ENABLE  << U_OFFSET)
+#define U_PM_CURRENT (PM_CURRENT << U_OFFSET)
+#define U_PM_INSN(PM_INSN<< U_OFFSET)
+#define S_PM_ENABLE  (PM_ENABLE  << S_OFFSET)
+#define S_PM_CURRENT (PM_CURRENT << S_OFFSET)
+#define S_PM_INSN(PM_INSN<< S_OFFSET)
+#define M_PM_ENABLE  (PM_ENABLE  << M_OFFSET)
+#define M_PM_CURRENT (PM_CURRENT << M_OFFSET)
+#define M_PM_INSN(PM_INSN<< M_OFFSET)
+
+/* mmte CSR bits */
+#define MMTE_PM_XS_BITS PM_XS_BITS
+#define MMTE_U_PM_ENABLEU_PM_ENABLE
+#define MMTE_U_PM_CURRENT   U_PM_CURRENT
+#define MMTE_U_PM_INSN  U_PM_INSN
+#define MMTE_S_PM_ENABLES_PM_ENABLE
+#define MMTE_S_PM_CURRENT   S_PM_CURRENT
+#define MMTE_S_PM_INSN  S_PM_INSN
+#define MMTE_M_PM_ENABLEM_PM_ENABLE
+#define MMTE_M_PM_CURRENT   M_PM_CURRENT
+#define MMTE_M_PM_INSN  M_PM_INSN
+#define MMTE_MASK(MMTE_U_PM_ENABLE | MMTE_U_PM_CURRENT | MMTE_U_PM_INSN | \
+  MMTE_S_PM_ENABLE | MMTE_S_PM_CURRENT | MMTE_S_PM_INSN | \
+  MMTE_M_PM_ENABLE | MMTE_M_PM_CURRENT | MMTE_M_PM_INSN | \
+  MMTE_PM_XS_BITS)
+
+/* (v)smte CSR bits */
+#define SMTE_PM_XS_BITS PM_XS_BITS
+#define SMTE_U_PM_ENABLEU_PM_ENABLE
+#define SMTE_U_PM_CURRENT   U_PM_CURRENT
+#define SMTE_U_PM_INSN  U_PM_INSN
+#define SMTE_S_PM_ENABLES_PM_ENABLE
+#define SMTE_S_PM_CURRENT   S_PM_CURRENT
+#define SMTE_S_PM_INSN  S_PM_INSN
+#define SMTE_MASK(SMTE_U_PM_ENABLE | SMTE_U_PM_CURRENT | SMTE_U_PM_INSN | \
+  SMTE_S_PM_ENABLE | SMTE_S_PM_CURRENT | SMTE_S_PM_INSN | \
+  SMTE_PM_XS_BITS)
+
+/* umte CSR bits */
+#define UMTE_U_PM_ENABLEU_PM_ENABLE
+#define UMTE_U_PM_CURRENT   U_PM_CURRENT
+#define UMTE_U_PM_INSN  U_PM_INSN
+#define UMTE_MASK (UMTE_U_PM_ENABLE | UMTE_U_PM_CURRENT | UMTE_U_PM_INSN)
+
 #endif
-- 
2.31.1




[PULL 14/18] target/riscv: Allow experimental J-ext to be turned on

2021-10-27 Thread Alistair Francis
From: Alexey Baturo 

Signed-off-by: Alexey Baturo 
Reviewed-by: Alistair Francis 
Reviewed-by: Bin Meng 
Reviewed-by: Richard Henderson 
Message-id: 20211025173609.2724490-9-space.monkey.deliv...@gmail.com
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 16fac64806..7d53125dbc 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -562,6 +562,9 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 }
 set_vext_version(env, vext_version);
 }
+if (cpu->cfg.ext_j) {
+ext |= RVJ;
+}
 
 set_misa(env, env->misa_mxl, ext);
 }
@@ -637,6 +640,7 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("x-zbc", RISCVCPU, cfg.ext_zbc, false),
 DEFINE_PROP_BOOL("x-zbs", RISCVCPU, cfg.ext_zbs, false),
 DEFINE_PROP_BOOL("x-h", RISCVCPU, cfg.ext_h, false),
+DEFINE_PROP_BOOL("x-j", RISCVCPU, cfg.ext_j, false),
 DEFINE_PROP_BOOL("x-v", RISCVCPU, cfg.ext_v, false),
 DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
 DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
-- 
2.31.1




[PULL 07/18] target/riscv: Add J-extension into RISC-V

2021-10-27 Thread Alistair Francis
From: Alexey Baturo 

Signed-off-by: Alexey Baturo 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
Reviewed-by: Bin Meng 
Message-id: 20211025173609.2724490-2-space.monkey.deliv...@gmail.com
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index a33dc30be8..1cfc6a53a0 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -65,6 +65,7 @@
 #define RVS RV('S')
 #define RVU RV('U')
 #define RVH RV('H')
+#define RVJ RV('J')
 
 /* S extension denotes that Supervisor mode exists, however it is possible
to have a core that support S mode but does not have an MMU and there
@@ -291,6 +292,7 @@ struct RISCVCPU {
 bool ext_s;
 bool ext_u;
 bool ext_h;
+bool ext_j;
 bool ext_v;
 bool ext_zba;
 bool ext_zbb;
-- 
2.31.1




[PULL 16/18] target/riscv: change the api for RVF/RVD fmin/fmax

2021-10-27 Thread Alistair Francis
From: Chih-Min Chao 

The sNaN propagation behavior has been changed since
cd20cee7 in https://github.com/riscv/riscv-isa-manual.

Signed-off-by: Chih-Min Chao 
Signed-off-by: Frank Chang 
Acked-by: Alistair Francis 
Message-id: 20211016085428.3001501-3-frank.ch...@sifive.com
Signed-off-by: Alistair Francis 
---
 target/riscv/fpu_helper.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/target/riscv/fpu_helper.c b/target/riscv/fpu_helper.c
index 8700516a14..d62f470900 100644
--- a/target/riscv/fpu_helper.c
+++ b/target/riscv/fpu_helper.c
@@ -174,14 +174,18 @@ uint64_t helper_fmin_s(CPURISCVState *env, uint64_t rs1, 
uint64_t rs2)
 {
 float32 frs1 = check_nanbox_s(rs1);
 float32 frs2 = check_nanbox_s(rs2);
-return nanbox_s(float32_minnum(frs1, frs2, &env->fp_status));
+return nanbox_s(env->priv_ver < PRIV_VERSION_1_11_0 ?
+float32_minnum(frs1, frs2, &env->fp_status) :
+float32_minimum_number(frs1, frs2, &env->fp_status));
 }
 
 uint64_t helper_fmax_s(CPURISCVState *env, uint64_t rs1, uint64_t rs2)
 {
 float32 frs1 = check_nanbox_s(rs1);
 float32 frs2 = check_nanbox_s(rs2);
-return nanbox_s(float32_maxnum(frs1, frs2, &env->fp_status));
+return nanbox_s(env->priv_ver < PRIV_VERSION_1_11_0 ?
+float32_maxnum(frs1, frs2, &env->fp_status) :
+float32_maximum_number(frs1, frs2, &env->fp_status));
 }
 
 uint64_t helper_fsqrt_s(CPURISCVState *env, uint64_t rs1)
@@ -283,12 +287,16 @@ uint64_t helper_fdiv_d(CPURISCVState *env, uint64_t frs1, 
uint64_t frs2)
 
 uint64_t helper_fmin_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
 {
-return float64_minnum(frs1, frs2, &env->fp_status);
+return env->priv_ver < PRIV_VERSION_1_11_0 ?
+float64_minnum(frs1, frs2, &env->fp_status) :
+float64_minimum_number(frs1, frs2, &env->fp_status);
 }
 
 uint64_t helper_fmax_d(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
 {
-return float64_maxnum(frs1, frs2, &env->fp_status);
+return env->priv_ver < PRIV_VERSION_1_11_0 ?
+float64_maxnum(frs1, frs2, &env->fp_status) :
+float64_maximum_number(frs1, frs2, &env->fp_status);
 }
 
 uint64_t helper_fcvt_s_d(CPURISCVState *env, uint64_t rs1)
-- 
2.31.1




[PULL 06/18] hw/riscv: opentitan: Fixup the PLIC context addresses

2021-10-27 Thread Alistair Francis
From: Alistair Francis 

Fix up the PLIC context addresses to correctly support the threshold and
claim registers.

Fixes: ef63100648 ("hw/riscv: opentitan: Update to the latest build")
Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
Message-id: 20211025040657.262696-1-alistair.fran...@opensource.wdc.com
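
For context (the per-context offsets below are how sifive_plic models the
PLIC; the rest is inference from this fix): each context exposes a
priority-threshold register and a claim/complete register,

    /* threshold(i) = context_base + i * context_stride        (+0x0)
     * claim(i)     = context_base + i * context_stride + 0x4  (+0x4) */

so the old stride of 4 left no room for the claim register, while base
0x20 with stride 8 puts the threshold at 0x20 and the claim at 0x24.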
---
 hw/riscv/opentitan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c
index 83e1511f28..c531450b9f 100644
--- a/hw/riscv/opentitan.c
+++ b/hw/riscv/opentitan.c
@@ -161,8 +161,8 @@ static void lowrisc_ibex_soc_realize(DeviceState *dev_soc, 
Error **errp)
 qdev_prop_set_uint32(DEVICE(&s->plic), "pending-base", 0x1000);
 qdev_prop_set_uint32(DEVICE(&s->plic), "enable-base", 0x2000);
 qdev_prop_set_uint32(DEVICE(&s->plic), "enable-stride", 0x18);
-qdev_prop_set_uint32(DEVICE(&s->plic), "context-base", 0x24);
-qdev_prop_set_uint32(DEVICE(&s->plic), "context-stride", 4);
+qdev_prop_set_uint32(DEVICE(&s->plic), "context-base", 0x20);
+qdev_prop_set_uint32(DEVICE(&s->plic), "context-stride", 8);
 qdev_prop_set_uint32(DEVICE(&s->plic), "aperture-size", memmap[IBEX_DEV_PLIC].size);

 if (!sysbus_realize(SYS_BUS_DEVICE(&s->plic), errp)) {
-- 
2.31.1




[PULL 05/18] hw/riscv: virt: Use the PLIC config helper function

2021-10-27 Thread Alistair Francis
From: Alistair Francis 

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Bin Meng 
Tested-by: Bin Meng 
Message-id: 20211022060133.3045020-5-alistair.fran...@opensource.wdc.com
---
 hw/riscv/virt.c | 20 +---
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 28a5909a3b..3af074148e 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -748,24 +748,6 @@ static FWCfgState *create_fw_cfg(const MachineState *mc)
 return fw_cfg;
 }
 
-/*
- * Return the per-socket PLIC hart topology configuration string
- * (caller must free with g_free())
- */
-static char *plic_hart_config_string(int hart_count)
-{
-g_autofree const char **vals = g_new(const char *, hart_count + 1);
-int i;
-
-for (i = 0; i < hart_count; i++) {
-vals[i] = "MS";
-}
-vals[i] = NULL;
-
-/* g_strjoinv() obliges us to cast away const here */
-return g_strjoinv(",", (char **)vals);
-}
-
 static void virt_machine_init(MachineState *machine)
 {
 const MemMapEntry *memmap = virt_memmap;
@@ -839,7 +821,7 @@ static void virt_machine_init(MachineState *machine)
 }
 
 /* Per-socket PLIC hart topology configuration string */
-plic_hart_config = plic_hart_config_string(hart_count);
+plic_hart_config = riscv_plic_hart_config_string(hart_count);
 
 /* Per-socket PLIC */
 s->plic[i] = sifive_plic_create(
-- 
2.31.1




[PULL 12/18] target/riscv: Support pointer masking for RISC-V for i/c/f/d/a types of instructions

2021-10-27 Thread Alistair Francis
From: Alexey Baturo 

Signed-off-by: Alexey Baturo 
Reviewed-by: Richard Henderson 
Reviewed-by: Alistair Francis 
Message-id: 20211025173609.2724490-7-space.monkey.deliv...@gmail.com
Signed-off-by: Alistair Francis 
---
 target/riscv/translate.c| 8 
 target/riscv/insn_trans/trans_rva.c.inc | 3 +++
 target/riscv/insn_trans/trans_rvd.c.inc | 2 ++
 target/riscv/insn_trans/trans_rvf.c.inc | 2 ++
 target/riscv/insn_trans/trans_rvi.c.inc | 2 ++
 5 files changed, 17 insertions(+)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index d38f87d718..a5e6fa145d 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -271,6 +271,14 @@ static void gen_jal(DisasContext *ctx, int rd, 
target_ulong imm)
 ctx->base.is_jmp = DISAS_NORETURN;
 }
 
+/*
+ * Temp stub: generates address adjustment for PointerMasking
+ */
+static TCGv gen_pm_adjust_address(DisasContext *s, TCGv src)
+{
+return src;
+}
+
 #ifndef CONFIG_USER_ONLY
 /* The states of mstatus_fs are:
  * 0 = disabled, 1 = initial, 2 = clean, 3 = dirty
diff --git a/target/riscv/insn_trans/trans_rva.c.inc 
b/target/riscv/insn_trans/trans_rva.c.inc
index 6ea07d89b0..40fe132b04 100644
--- a/target/riscv/insn_trans/trans_rva.c.inc
+++ b/target/riscv/insn_trans/trans_rva.c.inc
@@ -25,6 +25,7 @@ static bool gen_lr(DisasContext *ctx, arg_atomic *a, MemOp 
mop)
 if (a->rl) {
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
 }
+src1 = gen_pm_adjust_address(ctx, src1);
 tcg_gen_qemu_ld_tl(load_val, src1, ctx->mem_idx, mop);
 if (a->aq) {
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
@@ -44,6 +45,7 @@ static bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp 
mop)
 TCGLabel *l2 = gen_new_label();
 
 src1 = get_gpr(ctx, a->rs1, EXT_ZERO);
+src1 = gen_pm_adjust_address(ctx, src1);
 tcg_gen_brcond_tl(TCG_COND_NE, load_res, src1, l1);
 
 /*
@@ -84,6 +86,7 @@ static bool gen_amo(DisasContext *ctx, arg_atomic *a,
 TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
 TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
 
+src1 = gen_pm_adjust_address(ctx, src1);
 func(dest, src1, src2, ctx->mem_idx, mop);
 
 gen_set_gpr(ctx, a->rd, dest);
diff --git a/target/riscv/insn_trans/trans_rvd.c.inc 
b/target/riscv/insn_trans/trans_rvd.c.inc
index db9ae15755..64fb0046f7 100644
--- a/target/riscv/insn_trans/trans_rvd.c.inc
+++ b/target/riscv/insn_trans/trans_rvd.c.inc
@@ -31,6 +31,7 @@ static bool trans_fld(DisasContext *ctx, arg_fld *a)
 tcg_gen_addi_tl(temp, addr, a->imm);
 addr = temp;
 }
+addr = gen_pm_adjust_address(ctx, addr);
 
 tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], addr, ctx->mem_idx, MO_TEQ);
 
@@ -51,6 +52,7 @@ static bool trans_fsd(DisasContext *ctx, arg_fsd *a)
 tcg_gen_addi_tl(temp, addr, a->imm);
 addr = temp;
 }
+addr = gen_pm_adjust_address(ctx, addr);
 
 tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], addr, ctx->mem_idx, MO_TEQ);
 
diff --git a/target/riscv/insn_trans/trans_rvf.c.inc 
b/target/riscv/insn_trans/trans_rvf.c.inc
index bddbd418d9..b5459249c4 100644
--- a/target/riscv/insn_trans/trans_rvf.c.inc
+++ b/target/riscv/insn_trans/trans_rvf.c.inc
@@ -37,6 +37,7 @@ static bool trans_flw(DisasContext *ctx, arg_flw *a)
 tcg_gen_addi_tl(temp, addr, a->imm);
 addr = temp;
 }
+addr = gen_pm_adjust_address(ctx, addr);
 
 dest = cpu_fpr[a->rd];
 tcg_gen_qemu_ld_i64(dest, addr, ctx->mem_idx, MO_TEUL);
@@ -59,6 +60,7 @@ static bool trans_fsw(DisasContext *ctx, arg_fsw *a)
 tcg_gen_addi_tl(temp, addr, a->imm);
 addr = temp;
 }
+addr = gen_pm_adjust_address(ctx, addr);
 
 tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], addr, ctx->mem_idx, MO_TEUL);
 
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc
index 91dc438a3a..e51dbc41c5 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -144,6 +144,7 @@ static bool gen_load(DisasContext *ctx, arg_lb *a, MemOp 
memop)
 tcg_gen_addi_tl(temp, addr, a->imm);
 addr = temp;
 }
+addr = gen_pm_adjust_address(ctx, addr);
 
 tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, memop);
 gen_set_gpr(ctx, a->rd, dest);
@@ -185,6 +186,7 @@ static bool gen_store(DisasContext *ctx, arg_sb *a, MemOp 
memop)
 tcg_gen_addi_tl(temp, addr, a->imm);
 addr = temp;
 }
+addr = gen_pm_adjust_address(ctx, addr);
 
 tcg_gen_qemu_st_tl(data, addr, ctx->mem_idx, memop);
 return true;
-- 
2.31.1




[PULL 03/18] hw/riscv: sifive_u: Use the PLIC config helper function

2021-10-27 Thread Alistair Francis
From: Alistair Francis 

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Bin Meng 
Tested-by: Bin Meng 
Message-id: 20211022060133.3045020-3-alistair.fran...@opensource.wdc.com
---
 include/hw/riscv/sifive_u.h |  1 -
 hw/riscv/sifive_u.c | 14 +-
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/include/hw/riscv/sifive_u.h b/include/hw/riscv/sifive_u.h
index f71c90c94c..8f63a183c4 100644
--- a/include/hw/riscv/sifive_u.h
+++ b/include/hw/riscv/sifive_u.h
@@ -156,7 +156,6 @@ enum {
 #define SIFIVE_U_MANAGEMENT_CPU_COUNT   1
 #define SIFIVE_U_COMPUTE_CPU_COUNT  4
 
-#define SIFIVE_U_PLIC_HART_CONFIG "MS"
 #define SIFIVE_U_PLIC_NUM_SOURCES 54
 #define SIFIVE_U_PLIC_NUM_PRIORITIES 7
 #define SIFIVE_U_PLIC_PRIORITY_BASE 0x04
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 0217006c27..589ae72a59 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -811,7 +811,6 @@ static void sifive_u_soc_realize(DeviceState *dev, Error 
**errp)
 MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
 MemoryRegion *l2lim_mem = g_new(MemoryRegion, 1);
 char *plic_hart_config;
-size_t plic_hart_config_len;
 int i, j;
 NICInfo *nd = &nd_table[0];
 
@@ -852,18 +851,7 @@ static void sifive_u_soc_realize(DeviceState *dev, Error 
**errp)
 l2lim_mem);
 
 /* create PLIC hart topology configuration string */
-plic_hart_config_len = (strlen(SIFIVE_U_PLIC_HART_CONFIG) + 1) *
-   ms->smp.cpus;
-plic_hart_config = g_malloc0(plic_hart_config_len);
-for (i = 0; i < ms->smp.cpus; i++) {
-if (i != 0) {
-strncat(plic_hart_config, "," SIFIVE_U_PLIC_HART_CONFIG,
-plic_hart_config_len);
-} else {
-strncat(plic_hart_config, "M", plic_hart_config_len);
-}
-plic_hart_config_len -= (strlen(SIFIVE_U_PLIC_HART_CONFIG) + 1);
-}
+plic_hart_config = riscv_plic_hart_config_string(ms->smp.cpus);
 
 /* MMIO */
 s->plic = sifive_plic_create(memmap[SIFIVE_U_DEV_PLIC].base,
-- 
2.31.1




[PULL 01/18] hw/riscv: virt: Don't use a macro for the PLIC configuration

2021-10-27 Thread Alistair Francis
From: Alistair Francis 

Using a macro for the PLIC configuration doesn't make the code any
easier to read.  Instead, it makes it harder to figure out what is going
on, so let's remove it.

Signed-off-by: Alistair Francis 
Reviewed-by: Bin Meng 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20211022060133.3045020-1-alistair.fran...@opensource.wdc.com
---
 include/hw/riscv/virt.h | 1 -
 hw/riscv/virt.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
index d9105c1886..b8ef99f348 100644
--- a/include/hw/riscv/virt.h
+++ b/include/hw/riscv/virt.h
@@ -73,7 +73,6 @@ enum {
 VIRTIO_NDEV = 0x35 /* Arbitrary maximum number of interrupts */
 };
 
-#define VIRT_PLIC_HART_CONFIG "MS"
 #define VIRT_PLIC_NUM_SOURCES 127
 #define VIRT_PLIC_NUM_PRIORITIES 7
 #define VIRT_PLIC_PRIORITY_BASE 0x04
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index b3b431c847..28a5909a3b 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -758,7 +758,7 @@ static char *plic_hart_config_string(int hart_count)
 int i;
 
 for (i = 0; i < hart_count; i++) {
-vals[i] = VIRT_PLIC_HART_CONFIG;
+vals[i] = "MS";
 }
 vals[i] = NULL;
 
-- 
2.31.1




[PULL 11/18] target/riscv: Print new PM CSRs in QEMU logs

2021-10-27 Thread Alistair Francis
From: Alexey Baturo 

Signed-off-by: Alexey Baturo 
Reviewed-by: Alistair Francis 
Message-id: 20211025173609.2724490-6-space.monkey.deliv...@gmail.com
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 6b767a4a0b..16fac64806 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -271,6 +271,13 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, 
int flags)
 CSR_MSCRATCH,
 CSR_SSCRATCH,
 CSR_SATP,
+CSR_MMTE,
+CSR_UPMBASE,
+CSR_UPMMASK,
+CSR_SPMBASE,
+CSR_SPMMASK,
+CSR_MPMBASE,
+CSR_MPMMASK,
 };
 
 for (int i = 0; i < ARRAY_SIZE(dump_csrs); ++i) {
-- 
2.31.1




[PULL 04/18] hw/riscv: microchip_pfsoc: Use the PLIC config helper function

2021-10-27 Thread Alistair Francis
From: Alistair Francis 

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Bin Meng 
Tested-by: Bin Meng 
Message-id: 20211022060133.3045020-4-alistair.fran...@opensource.wdc.com
---
 include/hw/riscv/microchip_pfsoc.h |  1 -
 hw/riscv/microchip_pfsoc.c | 14 +-
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/include/hw/riscv/microchip_pfsoc.h 
b/include/hw/riscv/microchip_pfsoc.h
index d30916f45d..a0673f5f59 100644
--- a/include/hw/riscv/microchip_pfsoc.h
+++ b/include/hw/riscv/microchip_pfsoc.h
@@ -138,7 +138,6 @@ enum {
 #define MICROCHIP_PFSOC_MANAGEMENT_CPU_COUNT1
 #define MICROCHIP_PFSOC_COMPUTE_CPU_COUNT   4
 
-#define MICROCHIP_PFSOC_PLIC_HART_CONFIG"MS"
 #define MICROCHIP_PFSOC_PLIC_NUM_SOURCES185
 #define MICROCHIP_PFSOC_PLIC_NUM_PRIORITIES 7
 #define MICROCHIP_PFSOC_PLIC_PRIORITY_BASE  0x04
diff --git a/hw/riscv/microchip_pfsoc.c b/hw/riscv/microchip_pfsoc.c
index 3fc8545562..57d779fb55 100644
--- a/hw/riscv/microchip_pfsoc.c
+++ b/hw/riscv/microchip_pfsoc.c
@@ -187,7 +187,6 @@ static void microchip_pfsoc_soc_realize(DeviceState *dev, 
Error **errp)
 MemoryRegion *envm_data = g_new(MemoryRegion, 1);
 MemoryRegion *qspi_xip_mem = g_new(MemoryRegion, 1);
 char *plic_hart_config;
-size_t plic_hart_config_len;
 NICInfo *nd;
 int i;
 
@@ -262,18 +261,7 @@ static void microchip_pfsoc_soc_realize(DeviceState *dev, 
Error **errp)
 l2lim_mem);
 
 /* create PLIC hart topology configuration string */
-plic_hart_config_len = (strlen(MICROCHIP_PFSOC_PLIC_HART_CONFIG) + 1) *
-   ms->smp.cpus;
-plic_hart_config = g_malloc0(plic_hart_config_len);
-for (i = 0; i < ms->smp.cpus; i++) {
-if (i != 0) {
-strncat(plic_hart_config, "," MICROCHIP_PFSOC_PLIC_HART_CONFIG,
-plic_hart_config_len);
-} else {
-strncat(plic_hart_config, "M", plic_hart_config_len);
-}
-plic_hart_config_len -= (strlen(MICROCHIP_PFSOC_PLIC_HART_CONFIG) + 1);
-}
+plic_hart_config = riscv_plic_hart_config_string(ms->smp.cpus);
 
 /* PLIC */
 s->plic = sifive_plic_create(memmap[MICROCHIP_PFSOC_PLIC].base,
-- 
2.31.1




[PULL 02/18] hw/riscv: boot: Add a PLIC config string function

2021-10-27 Thread Alistair Francis
From: Alistair Francis 

Add a generic function that can create the PLIC strings.
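
As an illustration (a minimal sketch, not part of the patch; "ms" stands
in for the usual MachineState pointer):

    char *cfg = riscv_plic_hart_config_string(ms->smp.cpus);

    /* cfg is e.g. "M,MS,MS,MS" when hart 0 lacks the S extension;
     * the caller owns the string and must g_free() it. */
    g_free(cfg);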

Signed-off-by: Alistair Francis 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Bin Meng 
Message-id: 20211022060133.3045020-2-alistair.fran...@opensource.wdc.com
---
 include/hw/riscv/boot.h |  2 ++
 hw/riscv/boot.c | 25 +
 2 files changed, 27 insertions(+)

diff --git a/include/hw/riscv/boot.h b/include/hw/riscv/boot.h
index 0e89400b09..baff11dd8a 100644
--- a/include/hw/riscv/boot.h
+++ b/include/hw/riscv/boot.h
@@ -31,6 +31,8 @@
 
 bool riscv_is_32bit(RISCVHartArrayState *harts);
 
+char *riscv_plic_hart_config_string(int hart_count);
+
 target_ulong riscv_calc_kernel_start_addr(RISCVHartArrayState *harts,
   target_ulong firmware_end_addr);
 target_ulong riscv_find_and_load_firmware(MachineState *machine,
diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index d1ffc7b56c..519fa455a1 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -38,6 +38,31 @@ bool riscv_is_32bit(RISCVHartArrayState *harts)
 return harts->harts[0].env.misa_mxl_max == MXL_RV32;
 }
 
+/*
+ * Return the per-socket PLIC hart topology configuration string
+ * (caller must free with g_free())
+ */
+char *riscv_plic_hart_config_string(int hart_count)
+{
+g_autofree const char **vals = g_new(const char *, hart_count + 1);
+int i;
+
+for (i = 0; i < hart_count; i++) {
+CPUState *cs = qemu_get_cpu(i);
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+if (riscv_has_ext(env, RVS)) {
+vals[i] = "MS";
+} else {
+vals[i] = "M";
+}
+}
+vals[i] = NULL;
+
+/* g_strjoinv() obliges us to cast away const here */
+return g_strjoinv(",", (char **)vals);
+}
+
 target_ulong riscv_calc_kernel_start_addr(RISCVHartArrayState *harts,
   target_ulong firmware_end_addr) {
 if (riscv_is_32bit(harts)) {
-- 
2.31.1




[PULL 00/18] riscv-to-apply queue

2021-10-27 Thread Alistair Francis
From: Alistair Francis 

The following changes since commit c52d69e7dbaaed0ffdef8125e79218672c30161d:

  Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20211027' 
into staging (2021-10-27 11:45:18 -0700)

are available in the Git repository at:

  g...@github.com:alistair23/qemu.git tags/pull-riscv-to-apply-20211028

for you to fetch changes up to 344b61e1478c8eb37e81b96f63d8f5071f5a38e1:

  target/riscv: remove force HS exception (2021-10-28 14:39:23 +1000)


Fifth RISC-V PR for QEMU 6.2

 - Use a shared PLIC config helper function
 - Fixup the OpenTitan PLIC configuration
 - Add support for the experimental J extension
 - Update the fmin/fmax handling
 - Fixup VS interrupt forwarding


Alexey Baturo (7):
  target/riscv: Add J-extension into RISC-V
  target/riscv: Add CSR defines for RISC-V PM extension
  target/riscv: Support CSRs required for RISC-V PM extension except for 
the h-mode
  target/riscv: Add J extension state description
  target/riscv: Print new PM CSRs in QEMU logs
  target/riscv: Support pointer masking for RISC-V for i/c/f/d/a types of 
instructions
  target/riscv: Allow experimental J-ext to be turned on

Alistair Francis (6):
  hw/riscv: virt: Don't use a macro for the PLIC configuration
  hw/riscv: boot: Add a PLIC config string function
  hw/riscv: sifive_u: Use the PLIC config helper function
  hw/riscv: microchip_pfsoc: Use the PLIC config helper function
  hw/riscv: virt: Use the PLIC config helper function
  hw/riscv: opentitan: Fixup the PLIC context addresses

Anatoly Parshintsev (1):
  target/riscv: Implement address masking functions required for RISC-V 
Pointer Masking extension

Chih-Min Chao (2):
  softfloat: add APIs to handle alternative sNaN propagation for fmax/fmin
  target/riscv: change the api for RVF/RVD fmin/fmax

Jose Martins (2):
  target/riscv: fix VS interrupts forwarding to HS
  target/riscv: remove force HS exception

 include/fpu/softfloat.h |  10 ++
 include/hw/riscv/boot.h |   2 +
 include/hw/riscv/microchip_pfsoc.h  |   1 -
 include/hw/riscv/sifive_u.h |   1 -
 include/hw/riscv/virt.h |   1 -
 target/riscv/cpu.h  |  17 +-
 target/riscv/cpu_bits.h | 102 +++-
 fpu/softfloat.c |  19 ++-
 hw/riscv/boot.c |  25 +++
 hw/riscv/microchip_pfsoc.c  |  14 +-
 hw/riscv/opentitan.c|   4 +-
 hw/riscv/sifive_u.c |  14 +-
 hw/riscv/virt.c |  20 +--
 target/riscv/cpu.c  |  13 ++
 target/riscv/cpu_helper.c   |  72 +++-
 target/riscv/csr.c  | 285 
 target/riscv/fpu_helper.c   |  16 +-
 target/riscv/machine.c  |  27 +++
 target/riscv/translate.c|  43 +
 fpu/softfloat-parts.c.inc   |  25 ++-
 target/riscv/insn_trans/trans_rva.c.inc |   3 +
 target/riscv/insn_trans/trans_rvd.c.inc |   2 +
 target/riscv/insn_trans/trans_rvf.c.inc |   2 +
 target/riscv/insn_trans/trans_rvi.c.inc |   2 +
 24 files changed, 605 insertions(+), 115 deletions(-)



[PATCH v2 4/5] pci: Add pci_for_each_root_bus()

2021-10-27 Thread Peter Xu
Add a helper to loop over each root bus of the system, covering both the
default root bus and extended buses like pxb-pcie.

There are three places that can be rewritten with the pci_for_each_root_bus()
helper we just introduced.  De-duplicate the code.
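
As a sketch of the intended use (the callback and counter here are
hypothetical, for illustration only):

    static void count_root_bus(PCIBus *bus, void *opaque)
    {
        unsigned *count = opaque;   /* invoked once per root bus */

        (*count)++;
    }

    unsigned nbuses = 0;
    pci_for_each_root_bus(count_root_bus, &nbuses);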

Signed-off-by: Peter Xu 
---
 hw/arm/virt-acpi-build.c | 31 +++
 hw/i386/acpi-build.c | 38 ++
 hw/pci/pci.c | 26 ++
 include/hw/pci/pci.h |  2 ++
 4 files changed, 49 insertions(+), 48 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 674f902652..adba51f35a 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -264,28 +264,20 @@ struct AcpiIortIdMapping {
 typedef struct AcpiIortIdMapping AcpiIortIdMapping;
 
 /* Build the iort ID mapping to SMMUv3 for a given PCI host bridge */
-static int
-iort_host_bridges(Object *obj, void *opaque)
+static void
+iort_host_bridges(PCIBus *bus, void *opaque)
 {
-GArray *idmap_blob = opaque;
-
-if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
-PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
-
-if (bus && !pci_bus_bypass_iommu(bus)) {
-int min_bus, max_bus;
+if (!pci_bus_bypass_iommu(bus)) {
+int min_bus, max_bus;
 
-pci_bus_range(bus, &min_bus, &max_bus);
+pci_bus_range(bus, &min_bus, &max_bus);
 
-AcpiIortIdMapping idmap = {
-.input_base = min_bus << 8,
-.id_count = (max_bus - min_bus + 1) << 8,
-};
-g_array_append_val(idmap_blob, idmap);
-}
+AcpiIortIdMapping idmap = {
+.input_base = min_bus << 8,
+.id_count = (max_bus - min_bus + 1) << 8,
+};
+g_array_append_val((GArray *)opaque, idmap);
 }
-
-return 0;
 }
 
 static int iort_idmap_compare(gconstpointer a, gconstpointer b)
@@ -320,8 +312,7 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 if (vms->iommu == VIRT_IOMMU_SMMUV3) {
 AcpiIortIdMapping next_range = {0};
 
-object_child_foreach_recursive(object_get_root(),
-   iort_host_bridges, smmu_idmaps);
+pci_for_each_root_bus(iort_host_bridges, smmu_idmaps);
 
 /* Sort the smmu idmap by input_base */
 g_array_sort(smmu_idmaps, iort_idmap_compare);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a76b17ed92..3e50acfe35 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2123,20 +2123,12 @@ insert_scope(PCIBus *bus, PCIDevice *dev, void *opaque)
 }
 
 /* For a given PCI host bridge, walk and insert DMAR scope */
-static int
-dmar_host_bridges(Object *obj, void *opaque)
+static void
+dmar_host_bridges(PCIBus *bus, void *opaque)
 {
-GArray *scope_blob = opaque;
-
-if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
-PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
-
-if (bus && !pci_bus_bypass_iommu(bus)) {
-pci_for_each_device_under_bus(bus, insert_scope, scope_blob);
-}
+if (!pci_bus_bypass_iommu(bus)) {
+pci_for_each_device_under_bus(bus, insert_scope, opaque);
 }
-
-return 0;
 }
 
 /*
@@ -2165,8 +2157,7 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker, 
const char *oem_id,
  * Insert scope for each PCI bridge and endpoint device which
  * is attached to a bus with iommu enabled.
  */
-object_child_foreach_recursive(object_get_root(),
-   dmar_host_bridges, scope_blob);
+pci_for_each_root_bus(dmar_host_bridges, scope_blob);
 
 assert(iommu);
 if (x86_iommu_ir_supported(iommu)) {
@@ -2329,20 +2320,12 @@ insert_ivhd(PCIBus *bus, PCIDevice *dev, void *opaque)
 }
 
 /* For all PCI host bridges, walk and insert IVHD entries */
-static int
-ivrs_host_bridges(Object *obj, void *opaque)
+static void
+ivrs_host_bridges(PCIBus *bus, void *opaque)
 {
-GArray *ivhd_blob = opaque;
-
-if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
-PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
-
-if (bus && !pci_bus_bypass_iommu(bus)) {
-pci_for_each_device_under_bus(bus, insert_ivhd, ivhd_blob);
-}
+if (!pci_bus_bypass_iommu(bus)) {
+pci_for_each_device_under_bus(bus, insert_ivhd, opaque);
 }
-
-return 0;
 }
 
 static void
@@ -2380,8 +2363,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, 
const char *oem_id,
  * blob further below.  Fall back to an entry covering all devices, which
  * is sufficient when no aliases are present.
  */
-object_child_foreach_recursive(object_get_root(),
-   ivrs_host_bridges, ivhd_blob);
+pci_for_each_root_bus(ivrs_host_bridges, ivhd_blob);
 
 if (!ivhd_blob->len) {
 /*
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 4a84e478ce..258290f4eb 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2097,6 

Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)

2021-10-27 Thread Leonardo Bras Soares Passos
On Thu, Oct 28, 2021 at 1:30 AM Markus Armbruster  wrote:
>
> Leonardo Bras Soares Passos  writes:
>
> [...]
>
> >> The general argument for having QAPI schema 'if' mirror the C
> >> implementation's #if is introspection.  Let me explain why that matters.
> >>
> >> Consider a management application that supports a range of QEMU
> >> versions, say 5.0 to 6.2.  Say it wants to use a QMP command that is
> >> new in QEMU 6.2.  The sane way to do that is to probe for the command
> >> with query-qmp-schema.  Same for command arguments, and anything else
> >> QMP.
> >>
> >> If you doubt "sane", check out Part II of "QEMU interface introspection:
> >> From hacks to solutions"[*].
> >>
> >> The same technique works when a QMP command / argument / whatever is
> >> compile-time conditional ('if' in the schema).  The code the management
> >> application needs anyway to deal with older QEMU now also deals with
> >> "compiled out".  Nice.
> >>
> >> Of course, a command or argument present in QEMU can still fail, and the
> >> management application still needs to handle failure.  Distinguishing
> >> different failure modes can be bothersome and/or fragile.
> >>
> >> By making the QAPI schema conditional mirror the C conditional, you
> >> squash the failure mode "this version of QEMU supports it, but this
> >> build of QEMU does not" into "this version of QEMU does not support
> >> it".  Makes sense, doesn't it?
> >>
> >> A minor additional advantage is less generated code.
> >>
> >>
> >>
> >> [*] 
> >> http://events17.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
> >>
> >
> > This was very informative, thanks!
> > I now understand the rationale about this choice.
> >
> > TBH I am not very used to this syntax.
> > I did a take a peek at some other json files, and ended adding this
> > lines in code, which compiled just fine:
> >
> > for : enum MigrationParameter
> > {'name': 'multifd-zerocopy', 'if' : 'CONFIG_LINUX'},
> >
> > for : struct MigrateSetParameters and struct MigrationParameters:
> > '*multifd-zerocopy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
> >
> > Is that enough? Is there any other necessary change?
>
> Looks good to me.
>
> The QAPI schema language is documented in docs/devel/qapi-code-gen.rst.

Thanks for reviewing and for pointing this docs!

>
> If you're curious, you can diff code generated into qapi/ before and
> after adding the 'if'.

Good idea!

>
> > Thanks for reviewing and for helping out with this!
>
> My pleasure!
>

:)

Best regards,
Leo




[PATCH v2 3/5] qom: object_child_foreach_recursive_type()

2021-10-27 Thread Peter Xu
Add this sister helper alongside object_child_foreach_recursive() to loop over
child objects, calling the callback only when the object can be cast to a
specific type.
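
A hypothetical caller sketch (the callback name is invented for
illustration):

    static int count_host_bridges(Object *child, void *opaque)
    {
        int *n = opaque;

        (*n)++;
        return 0;   /* returning non-zero stops the iteration */
    }

    int n = 0;
    object_child_foreach_recursive_type(object_get_root(),
                                        TYPE_PCI_HOST_BRIDGE,
                                        count_host_bridges, &n);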

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Peter Xu 
---
 include/qom/object.h | 20 
 qom/object.c | 27 +++
 2 files changed, 47 insertions(+)

diff --git a/include/qom/object.h b/include/qom/object.h
index faae0d841f..355277db40 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -1926,6 +1926,26 @@ int object_child_foreach(Object *obj, int (*fn)(Object 
*child, void *opaque),
 int object_child_foreach_recursive(Object *obj,
int (*fn)(Object *child, void *opaque),
void *opaque);
+
+/**
+ * object_child_foreach_recursive_type:
+ * @obj: the object whose children will be navigated
+ * @typename: the typename string to scan
+ * @fn: the iterator function to be called
+ * @opaque: an opaque value that will be passed to the iterator
+ *
+ * This is a special version of object_child_foreach_recursive() so that we
+ * only call the fn() if the child can be cast to the @typename specified.
+ * Please refer to the comments above object_child_foreach_recursive() for
+ * more details.
+ *
+ * Returns: The last value returned by @fn, or 0 if there is no child.
+ */
+int object_child_foreach_recursive_type(Object *obj,
+const char *typename,
+int (*fn)(Object *child, void *opaque),
+void *opaque);
+
 /**
  * container_get:
  * @root: root of the #path, e.g., object_get_root()
diff --git a/qom/object.c b/qom/object.c
index 6be710bc40..d25ca09b1d 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -1134,6 +1134,33 @@ int object_child_foreach_recursive(Object *obj,
 return do_object_child_foreach(obj, fn, opaque, true);
 }
 
+typedef struct {
+const char *typename;
+int (*fn)(Object *child, void *opaque);
+void *opaque;
+} ObjectTypeArgs;
+
+static int object_child_hook(Object *child, void *opaque)
+{
+ObjectTypeArgs *args = opaque;
+
+if (object_dynamic_cast(child, args->typename)) {
+return args->fn(child, args->opaque);
+}
+
+return 0;
+}
+
+int object_child_foreach_recursive_type(Object *obj,
+const char *typename,
+int (*fn)(Object *child, void *opaque),
+void *opaque)
+{
+ObjectTypeArgs args = { .typename = typename, .fn = fn, .opaque = opaque };
+
+return object_child_foreach_recursive(obj, object_child_hook, &args);
+}
+
 static void object_class_get_list_tramp(ObjectClass *klass, void *opaque)
 {
 GSList **list = opaque;
-- 
2.32.0




[PATCH v2 5/5] pc/q35: Add pre-plug hook for x86-iommu

2021-10-27 Thread Peter Xu
Add a pre-plug hook for x86-iommu, so that we can detect vfio-pci devices
before realizing the vIOMMU device.

When the guest contains both the x86 vIOMMU and vfio-pci devices, the user
needs to specify the x86 vIOMMU before the vfio-pci devices.  The reason is
that vfio_realize() calls pci_device_iommu_address_space() to fetch the
correct DMA address space for the device, and that API only works correctly
after the vIOMMU device has been initialized.

For example, the iommu_fn() that is used in pci_device_iommu_address_space()
is only set up in the realize() of the vIOMMU devices.

For a long time we have relied on libvirt to make sure that the ordering is
correct; however, on the QEMU side we never fail the guest at boot even if
the ordering is specified wrongly.  When the order is wrong, the guest will
encounter mysterious errors when operating on the vfio-pci device, because
QEMU will still assume the vfio-pci devices are put into the default DMA
domain (which is normally the direct GPA mapping), so e.g. the DMAs will
never work right.

This patch fails the guest at boot when such an erroneous cmdline is
detected, so the guest at least won't encounter weird device behavior after
booting.  The error message also tells the user how to fix the issue.
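
For illustration, the ordering this patch enforces looks like the
following (the host BDF is a placeholder):

    # boots: vIOMMU specified before vfio-pci
    qemu-system-x86_64 -machine q35,kernel-irqchip=split \
        -device intel-iommu,intremap=on \
        -device vfio-pci,host=0000:01:00.0

    # now fails early at startup: vfio-pci specified first
    qemu-system-x86_64 -machine q35,kernel-irqchip=split \
        -device vfio-pci,host=0000:01:00.0 \
        -device intel-iommu,intremap=on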

Cc: Alex Williamson 
Suggested-by: Igor Mammedov 
Signed-off-by: Peter Xu 
---
 hw/i386/pc.c|  4 
 hw/i386/x86-iommu.c | 14 ++
 include/hw/i386/x86-iommu.h |  8 
 3 files changed, 26 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 86223acfd3..b70a04011e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -81,6 +81,7 @@
 #include "hw/core/cpu.h"
 #include "hw/usb.h"
 #include "hw/i386/intel_iommu.h"
+#include "hw/i386/x86-iommu.h"
 #include "hw/net/ne2000-isa.h"
 #include "standard-headers/asm-x86/bootparam.h"
 #include "hw/virtio/virtio-pmem-pci.h"
@@ -1327,6 +1328,8 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler 
*hotplug_dev,
 pc_memory_pre_plug(hotplug_dev, dev, errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 x86_cpu_pre_plug(hotplug_dev, dev, errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE)) {
+x86_iommu_pre_plug(X86_IOMMU_DEVICE(dev), errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
 pc_virtio_md_pci_pre_plug(hotplug_dev, dev, errp);
@@ -1383,6 +1386,7 @@ static HotplugHandler 
*pc_get_hotplug_handler(MachineState *machine,
 {
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
 object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
+object_dynamic_cast(OBJECT(dev), TYPE_X86_IOMMU_DEVICE) ||
 object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
 object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_MEM_PCI)) {
 return HOTPLUG_HANDLER(machine);
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 86ad03972e..c9ee9041a3 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -22,6 +22,7 @@
 #include "hw/i386/x86-iommu.h"
 #include "hw/qdev-properties.h"
 #include "hw/i386/pc.h"
+#include "hw/vfio/pci.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "trace.h"
@@ -103,6 +104,19 @@ IommuType x86_iommu_get_type(void)
 return x86_iommu_default->type;
 }
 
+void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp)
+{
+bool ambiguous = false;
+Object *object;
+
+object = object_resolve_path_type("", TYPE_VFIO_PCI, &ambiguous);
+if (object || ambiguous) {
+/* There are one or more vfio-pci devices detected */
+error_setg(errp, "Please specify all the vfio-pci devices to be after "
+   "the vIOMMU device");
+}
+}
+
 static void x86_iommu_realize(DeviceState *dev, Error **errp)
 {
 X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
index 9de92d33a1..e8b6c293e0 100644
--- a/include/hw/i386/x86-iommu.h
+++ b/include/hw/i386/x86-iommu.h
@@ -172,4 +172,12 @@ void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool 
global,
  * @out: Output MSI message
  */
 void x86_iommu_irq_to_msi_message(X86IOMMUIrq *irq, MSIMessage *out);
+
+/**
+ * x86_iommu_pre_plug: called before plugging the iommu device
+ * @iommu: the pointer to the x86 iommu state
+ * @errp: the double pointer to Error, set if we want to fail the plug
+ */
+void x86_iommu_pre_plug(X86IOMMUState *iommu, Error **errp);
+
 #endif
-- 
2.32.0




[PATCH v2 0/5] pci/iommu: Fail early if vfio-pci detected before vIOMMU

2021-10-27 Thread Peter Xu
Note that patches 1-4 are cleanups for the PCI subsystem, and patch 5 is a fix
to fail early for a mis-ordered QEMU cmdline on vfio and vIOMMU.  Logically
they should be posted separately since they're not directly related; however,
to keep the series correlated with v1, I kept them in the same patchset.

In this version I used the pre_plug() hook for q35 to detect the ordering
issue as Igor suggested; meanwhile, it's done via object_resolve_path_type()
rather than scanning the PCI bus, as Michael suggested.

Please review, thanks.

v2 changelog:
- Picked up r-b where I can
- Merged patch 1 & 4, 2 & 3, 5 & 6
- s/pci_root_bus_args/PCIRootBusArgs/ [David, Michael]
- Replace "void* " with "void *" in pci.h [Phil]
- Dropped "pci: Add pci_for_each_device_all()"
- Dropped "x86-iommu: Fail early if vIOMMU specified after vfio-pci"
- Added "qom: object_child_foreach_recursive_type()"
- Added "pc/q35: Add pre-plug hook for x86-iommu"

v1: https://lore.kernel.org/qemu-devel/20211021104259.57754-1-pet...@redhat.com/

Peter Xu (5):
  pci: Define pci_bus_dev_fn/pci_bus_fn/pci_bus_ret_fn
  pci: Export pci_for_each_device_under_bus*()
  qom: object_child_foreach_recursive_type()
  pci: Add pci_for_each_root_bus()
  pc/q35: Add pre-plug hook for x86-iommu

 hw/arm/virt-acpi-build.c| 31 --
 hw/i386/acpi-build.c| 39 +++-
 hw/i386/pc.c|  4 +++
 hw/i386/x86-iommu.c | 14 ++
 hw/pci/pci.c| 52 +
 hw/pci/pcie.c   |  4 +--
 hw/ppc/spapr_pci.c  | 12 -
 hw/ppc/spapr_pci_nvlink2.c  |  7 +++--
 hw/ppc/spapr_pci_vfio.c |  4 +--
 hw/s390x/s390-pci-bus.c |  5 ++--
 hw/xen/xen_pt.c |  4 +--
 include/hw/i386/x86-iommu.h |  8 ++
 include/hw/pci/pci.h| 26 ---
 include/qom/object.h| 20 ++
 qom/object.c| 27 +++
 15 files changed, 160 insertions(+), 97 deletions(-)

-- 
2.32.0




[PATCH v2 2/5] pci: Export pci_for_each_device_under_bus*()

2021-10-27 Thread Peter Xu
They're actually more commonly used than the helpers without _under_bus,
because most callers do have the PCI bus on hand.  After exporting them, we
can switch a lot of the call sites to use these two helpers.
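
A minimal usage sketch (the callback is invented for illustration):

    static void print_devfn(PCIBus *bus, PCIDevice *dev, void *opaque)
    {
        /* runs once for every device present on this one bus */
        printf("slot %d fn %d\n", PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));
    }

    pci_for_each_device_under_bus(bus, print_devfn, NULL);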

Reviewed-by: David Hildenbrand 
Reviewed-by: Eric Auger 
Signed-off-by: Peter Xu 
---
 hw/i386/acpi-build.c   |  5 ++---
 hw/pci/pci.c   | 10 +-
 hw/pci/pcie.c  |  4 +---
 hw/ppc/spapr_pci.c | 12 +---
 hw/ppc/spapr_pci_nvlink2.c |  7 +++
 hw/ppc/spapr_pci_vfio.c|  4 ++--
 hw/s390x/s390-pci-bus.c|  5 ++---
 hw/xen/xen_pt.c|  4 ++--
 include/hw/pci/pci.h   |  5 +
 9 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 81418b7911..a76b17ed92 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2132,8 +2132,7 @@ dmar_host_bridges(Object *obj, void *opaque)
 PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
 
 if (bus && !pci_bus_bypass_iommu(bus)) {
-pci_for_each_device(bus, pci_bus_num(bus), insert_scope,
-scope_blob);
+pci_for_each_device_under_bus(bus, insert_scope, scope_blob);
 }
 }
 
@@ -2339,7 +2338,7 @@ ivrs_host_bridges(Object *obj, void *opaque)
 PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
 
 if (bus && !pci_bus_bypass_iommu(bus)) {
-pci_for_each_device(bus, pci_bus_num(bus), insert_ivhd, ivhd_blob);
+pci_for_each_device_under_bus(bus, insert_ivhd, ivhd_blob);
 }
 }
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 17e59cb3a3..4a84e478ce 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1654,9 +1654,9 @@ static const pci_class_desc pci_class_descriptions[] =
 { 0, NULL}
 };
 
-static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
-  pci_bus_dev_fn fn,
-  void *opaque)
+void pci_for_each_device_under_bus_reverse(PCIBus *bus,
+   pci_bus_dev_fn fn,
+   void *opaque)
 {
 PCIDevice *d;
 int devfn;
@@ -1679,8 +1679,8 @@ void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
 }
 }
 
-static void pci_for_each_device_under_bus(PCIBus *bus,
-  pci_bus_dev_fn fn, void *opaque)
+void pci_for_each_device_under_bus(PCIBus *bus,
+   pci_bus_dev_fn fn, void *opaque)
 {
 PCIDevice *d;
 int devfn;
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 6e95d82903..914a9bf3d1 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -694,9 +694,7 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
 (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
 (old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF)) {
 PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
-pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
-pcie_unplug_device, NULL);
-
+pci_for_each_device_under_bus(sec_bus, pcie_unplug_device, NULL);
 pci_word_test_and_clear_mask(exp_cap + PCI_EXP_SLTSTA,
  PCI_EXP_SLTSTA_PDS);
 if (dev->cap_present & QEMU_PCIE_LNKSTA_DLLLA ||
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 7430bd6314..5bfd4aa9e5 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1317,8 +1317,7 @@ static int spapr_dt_pci_bus(SpaprPhbState *sphb, PCIBus 
*bus,
   RESOURCE_CELLS_SIZE));
 
 assert(bus);
-pci_for_each_device_reverse(bus, pci_bus_num(bus),
-spapr_dt_pci_device_cb, &cbinfo);
+pci_for_each_device_under_bus_reverse(bus, spapr_dt_pci_device_cb, &cbinfo);
 if (cbinfo.err) {
 return cbinfo.err;
 }
@@ -2306,8 +2305,8 @@ static void spapr_phb_pci_enumerate_bridge(PCIBus *bus, 
PCIDevice *pdev,
 return;
 }
 
-pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
-spapr_phb_pci_enumerate_bridge, bus_no);
+pci_for_each_device_under_bus(sec_bus, spapr_phb_pci_enumerate_bridge,
+  bus_no);
 pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, *bus_no, 1);
 }
 
@@ -2316,9 +2315,8 @@ static void spapr_phb_pci_enumerate(SpaprPhbState *phb)
 PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
 unsigned int bus_no = 0;
 
-pci_for_each_device(bus, pci_bus_num(bus),
-spapr_phb_pci_enumerate_bridge,
-&bus_no);
+pci_for_each_device_under_bus(bus, spapr_phb_pci_enumerate_bridge,
+  &bus_no);
 
 }
 
diff --git a/hw/ppc/spapr_pci_nvlink2.c b/hw/ppc/spapr_pci_nvlink2.c
index 8ef9b40a18..7fb0cf4d04 100644
--- a/hw/ppc/spapr_pci_nvlink2.c
+++ b/hw/ppc/spapr_pci_nvlink2.c
@@ -164,8 +164,7 @@ static void spapr_phb_pci_collect_nvgpu(PCIBus *bus, 
PCIDevice *pdev,
 

[PATCH v2 1/5] pci: Define pci_bus_dev_fn/pci_bus_fn/pci_bus_ret_fn

2021-10-27 Thread Peter Xu
They're used in quite a few places in pci.[ch] and also in the rest of the
code base.  Define them so that the signatures don't need to be spelled out
all over the place.

pci_bus_fn is similar to pci_bus_dev_fn but takes only a PCIBus* and an
opaque.  pci_bus_ret_fn is similar to pci_bus_fn but returns a void *
pointer.
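
For instance (a sketch, with an invented callback), a caller now only
needs to name the typedef'd shape:

    static void dump_bus(PCIBus *b, void *opaque)
    {
        /* per-bus work goes here; matches pci_bus_fn */
    }

    pci_for_each_bus(bus, dump_bus, NULL);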

Reviewed-by: David Hildenbrand 
Reviewed-by: Eric Auger 
Signed-off-by: Peter Xu 
---
 hw/pci/pci.c | 20 ++--
 include/hw/pci/pci.h | 19 +--
 2 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 186758ee11..17e59cb3a3 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1655,9 +1655,7 @@ static const pci_class_desc pci_class_descriptions[] =
 };
 
 static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
-  void (*fn)(PCIBus *b,
- PCIDevice *d,
- void *opaque),
+  pci_bus_dev_fn fn,
   void *opaque)
 {
 PCIDevice *d;
@@ -1672,8 +1670,7 @@ static void pci_for_each_device_under_bus_reverse(PCIBus 
*bus,
 }
 
 void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
- void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
- void *opaque)
+ pci_bus_dev_fn fn, void *opaque)
 {
 bus = pci_find_bus_nr(bus, bus_num);
 
@@ -1683,9 +1680,7 @@ void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
 }
 
 static void pci_for_each_device_under_bus(PCIBus *bus,
-  void (*fn)(PCIBus *b, PCIDevice *d,
- void *opaque),
-  void *opaque)
+  pci_bus_dev_fn fn, void *opaque)
 {
 PCIDevice *d;
 int devfn;
@@ -1699,8 +1694,7 @@ static void pci_for_each_device_under_bus(PCIBus *bus,
 }
 
 void pci_for_each_device(PCIBus *bus, int bus_num,
- void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
- void *opaque)
+ pci_bus_dev_fn fn, void *opaque)
 {
 bus = pci_find_bus_nr(bus, bus_num);
 
@@ -2078,10 +2072,8 @@ static PCIBus *pci_find_bus_nr(PCIBus *bus, int bus_num)
 return NULL;
 }
 
-void pci_for_each_bus_depth_first(PCIBus *bus,
-  void *(*begin)(PCIBus *bus, void 
*parent_state),
-  void (*end)(PCIBus *bus, void *state),
-  void *parent_state)
+void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
+  pci_bus_fn end, void *parent_state)
 {
 PCIBus *sec;
 void *state;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 7fc90132cf..4a8740b76b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -401,6 +401,10 @@ typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int 
pin);
 OBJECT_DECLARE_TYPE(PCIBus, PCIBusClass, PCI_BUS)
 #define TYPE_PCIE_BUS "PCIE"
 
+typedef void (*pci_bus_dev_fn)(PCIBus *b, PCIDevice *d, void *opaque);
+typedef void (*pci_bus_fn)(PCIBus *b, void *opaque);
+typedef void *(*pci_bus_ret_fn)(PCIBus *b, void *opaque);
+
 bool pci_bus_is_express(PCIBus *bus);
 
 void pci_root_bus_init(PCIBus *bus, size_t bus_size, DeviceState *parent,
@@ -458,23 +462,18 @@ static inline int pci_dev_bus_num(const PCIDevice *dev)
 
 int pci_bus_numa_node(PCIBus *bus);
 void pci_for_each_device(PCIBus *bus, int bus_num,
- void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
+ pci_bus_dev_fn fn,
  void *opaque);
 void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
- void (*fn)(PCIBus *bus, PCIDevice *d,
-void *opaque),
+ pci_bus_dev_fn fn,
  void *opaque);
-void pci_for_each_bus_depth_first(PCIBus *bus,
-  void *(*begin)(PCIBus *bus, void 
*parent_state),
-  void (*end)(PCIBus *bus, void *state),
-  void *parent_state);
+void pci_for_each_bus_depth_first(PCIBus *bus, pci_bus_ret_fn begin,
+  pci_bus_fn end, void *parent_state);
 PCIDevice *pci_get_function_0(PCIDevice *pci_dev);
 
 /* Use this wrapper when specific scan order is not required. */
 static inline
-void pci_for_each_bus(PCIBus *bus,
-  void (*fn)(PCIBus *bus, void *opaque),
-  void *opaque)
+void pci_for_each_bus(PCIBus *bus, pci_bus_fn fn, void *opaque)
 {
 pci_for_each_bus_depth_first(bus, NULL, fn, opaque);
 }

Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)

2021-10-27 Thread Markus Armbruster
Leonardo Bras Soares Passos  writes:

[...]

>> The general argument for having QAPI schema 'if' mirror the C
>> implementation's #if is introspection.  Let me explain why that matters.
>>
>> Consider a management application that supports a range of QEMU
>> versions, say 5.0 to 6.2.  Say it wants to use a QMP command that is
>> new in QEMU 6.2.  The sane way to do that is to probe for the command
>> with query-qmp-schema.  Same for command arguments, and anything else
>> QMP.
>>
>> If you doubt "sane", check out Part II of "QEMU interface introspection:
>> From hacks to solutions"[*].
>>
>> The same technique works when a QMP command / argument / whatever is
>> compile-time conditional ('if' in the schema).  The code the management
>> application needs anyway to deal with older QEMU now also deals with
>> "compiled out".  Nice.
>>
>> Of course, a command or argument present in QEMU can still fail, and the
>> management application still needs to handle failure.  Distinguishing
>> different failure modes can be bothersome and/or fragile.
>>
>> By making the QAPI schema conditional mirror the C conditional, you
>> squash the failure mode "this version of QEMU supports it, but this
>> build of QEMU does not" into "this version of QEMU does not support
>> it".  Makes sense, doesn't it?
>>
>> A minor additional advantage is less generated code.
>>
>>
>>
>> [*] 
>> http://events17.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
>>
>
> This was very informative, thanks!
> I now understand the rationale about this choice.
>
> TBH I am not very used to this syntax.
> I took a peek at some other json files, and ended up adding these
> lines to the code, which compiled just fine:
>
> for : enum MigrationParameter
> {'name': 'multifd-zerocopy', 'if' : 'CONFIG_LINUX'},
>
> for : struct MigrateSetParameters and struct MigrationParameters:
> '*multifd-zerocopy': { 'type': 'bool', 'if': 'CONFIG_LINUX' },
>
> Is that enough? Is there any other necessary change?

Looks good to me.

The QAPI schema language is documented in docs/devel/qapi-code-gen.rst.

If you're curious, you can diff code generated into qapi/ before and
after adding the 'if'.

> Thanks for reviewing and for helping out with this!

My pleasure!




Re: [PATCH v12 0/6] support dirtyrate at the granualrity of vcpu

2021-10-27 Thread Hyman Huang




On 2021/10/27 14:31, Zheng Chuan wrote:

Hi.
I have no objection to the implementation code itself.
But we should know, or let the user know, about the performance penalty and
the conflict with migration compared to the hash method, especially for the
performance of VMs with hugepages.


I dirty 1 GB of guest memory and do the measurement with the two methods.

The copy rate is about 1,665 MB/s in the VM.

The following output shows guest memory performance when measuring with the
dirty ring method:

/init (1): INFO: 1635392998977ms copied 1 GB in 00616ms
/init (1): INFO: 1635392999593ms copied 1 GB in 00615ms
/init (1): INFO: 1635393000211ms copied 1 GB in 00616ms
- start measurement ---
/init (1): INFO: 1635393000884ms copied 1 GB in 00672ms 
/init (1): INFO: 1635393001849ms copied 1 GB in 00963ms 
/init (1): INFO: 1635393002578ms copied 1 GB in 00727ms
- end measurement ---
/init (1): INFO: 1635393003195ms copied 1 GB in 00615ms
/init (1): INFO: 1635393003811ms copied 1 GB in 00614ms
/init (1): INFO: 1635393004427ms copied 1 GB in 00615ms

Guest memory performance shows almost no change with the hash method.


The following are the test results (measurement interval = 1s):

method      measurement result   copy rate during measurement
hash        44 MB/s              1,665 MB/s
dirty ring  1167 MB/s            1,523 MB/s, 1,063 MB/s, 1,408 MB/s

The max penalty during the test interval (1s) is 36%; the average penalty
is 20%.


If we trade off accuracy, the dirty ring method may be a viable option;
users can select the appropriate method as they need.
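
For reference, a hedged QMP sketch of driving a measurement (assuming
the series' interface matches the upstream calc-dirty-rate command;
argument names may differ in this version):

    { "execute": "calc-dirty-rate",
      "arguments": { "calc-time": 1, "mode": "dirty-ring" } }
    { "execute": "query-dirty-rate" }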




On 2021/10/15 10:07, Hyman Huang wrote:



On 2021/10/15 9:32, Peter Xu wrote:

On Wed, Jun 30, 2021 at 12:01:17AM +0800, huang...@chinatelecom.cn wrote:

From: Hyman Huang(黄勇) 

v12
- adjust the order of calculating the dirty rate:
    run memory_global_dirty_log_sync before calculating, as
    described in v11.


Ping for Yong.

Dave/Juan, any plan to review/merge this series (along with the other series of
dirty logging)?

I found it useful when I wanted to modify the program I used to generate a
constant dirty workload - this series can help me verify the change.

I still think this series is something good to have.  Thanks,

The dirty rate calculation has already been used to estimate live migration
time in the "e cloud" product of China Telecom; it also predicts the
migration success ratio, which provides valuable information for the cloud
management plane when selecting which VM should be migrated.








--
Best regard

Hyman Huang(黄勇)



Re: [PATCH 5/8] pci: Add pci_for_each_root_bus()

2021-10-27 Thread Peter Xu
On Mon, Oct 25, 2021 at 09:16:53AM -0400, Michael S. Tsirkin wrote:
> > +void pci_for_each_root_bus(pci_bus_fn fn, void *opaque)
> > +{
> > +pci_root_bus_args args = { .fn = fn, .opaque = opaque };
> > +
> > +object_child_foreach_recursive(object_get_root(), pci_find_root_bus, &args);
> > +}
> >  
> >  PCIDevice *pci_find_device(PCIBus *bus, int bus_num, uint8_t devfn)
> >  {
> 
> 
> How about adding an API with a type filter to the qom core?
> E.g.
> object_child_foreach_type_recursive getting a type.

Sounds good, will do.  Thanks,

-- 
Peter Xu




[PULL 56/56] tcg/optimize: Propagate sign info for shifting

2021-10-27 Thread Richard Henderson
For constant shifts, we can simply shift the s_mask.

For variable shifts, we know that sar does not reduce
the s_mask, which helps for sequences like

ext32s_i64  t, in
sar_i64 t, t, v
ext32s_i64  out, t

allowing the final extend to be eliminated.
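
To spell that out on the same sequence (annotations are illustrative,
not from the patch): with a constant shift,

    ext32s_i64  t, in      # bits 63..31 of t are copies of bit 31
    sar_i64     t, t, 8    # bits 63..23 are copies: sign info only grew
    ext32s_i64  out, t     # now redundant, foldable to mov out, t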

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 50 +++---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c0eccc61d6..dbb2d46e88 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -85,6 +85,18 @@ static uint64_t smask_from_zmask(uint64_t zmask)
 return ~(~0ull >> rep);
 }
 
+/*
+ * Recreate a properly left-aligned smask after manipulation.
+ * Some bit-shuffling, particularly shifts and rotates, may
+ * retain sign bits on the left, but may scatter disconnected
+ * sign bits on the right.  Retain only what remains to the left.
+ */
+static uint64_t smask_from_smask(int64_t smask)
+{
+/* Only the 1 bits are significant for smask */
+return smask_from_zmask(~smask);
+}
+
 static inline TempOptInfo *ts_info(TCGTemp *ts)
 {
 return ts->state_ptr;
@@ -1843,18 +1855,50 @@ static bool fold_sextract(OptContext *ctx, TCGOp *op)
 
 static bool fold_shift(OptContext *ctx, TCGOp *op)
 {
+uint64_t s_mask, z_mask, sign;
+
 if (fold_const2(ctx, op) ||
 fold_ix_to_i(ctx, op, 0) ||
 fold_xi_to_x(ctx, op, 0)) {
 return true;
 }
 
+s_mask = arg_info(op->args[1])->s_mask;
+z_mask = arg_info(op->args[1])->z_mask;
+
 if (arg_is_const(op->args[2])) {
-ctx->z_mask = do_constant_folding(op->opc, ctx->type,
-  arg_info(op->args[1])->z_mask,
-  arg_info(op->args[2])->val);
+int sh = arg_info(op->args[2])->val;
+
+ctx->z_mask = do_constant_folding(op->opc, ctx->type, z_mask, sh);
+
+s_mask = do_constant_folding(op->opc, ctx->type, s_mask, sh);
+ctx->s_mask = smask_from_smask(s_mask);
+
 return fold_masks(ctx, op);
 }
+
+switch (op->opc) {
+CASE_OP_32_64(sar):
+/*
+ * Arithmetic right shift will not reduce the number of
+ * input sign repetitions.
+ */
+ctx->s_mask = s_mask;
+break;
+CASE_OP_32_64(shr):
+/*
+ * If the sign bit is known zero, then logical right shift
+ * will not reduce the number of input sign repetitions.
+ */
+sign = (s_mask & -s_mask) >> 1;
+if (!(z_mask & sign)) {
+ctx->s_mask = s_mask;
+}
+break;
+default:
+break;
+}
+
 return false;
 }
 
-- 
2.25.1




[PULL 53/56] tcg/optimize: Propagate sign info for logical operations

2021-10-27 Thread Richard Henderson
Sign repetitions are perforce all identical, whether they are 1 or 0.
Bitwise operations preserve the relative quantity of the repetitions.
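
A quick host-side spot check of that claim (a sketch, not part of the
patch; uses GCC/Clang's __builtin_clrsbll() to count redundant sign
bits, with two arbitrary constants):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        int64_t a = (int64_t)0xffffffffff123456ull;  /* clrsb == 39 */
        int64_t b = (int64_t)0xffffff0011223344ull;  /* clrsb == 23 */
        int m = 23;                                  /* min of the two */

        assert(__builtin_clrsbll(a & b) >= m);
        assert(__builtin_clrsbll(a | b) >= m);
        assert(__builtin_clrsbll(a ^ b) >= m);
        return 0;
    }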

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index ef202abbcb..de1abd9cc3 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -967,6 +967,13 @@ static bool fold_and(OptContext *ctx, TCGOp *op)
 z2 = arg_info(op->args[2])->z_mask;
 ctx->z_mask = z1 & z2;
 
+/*
+ * Sign repetitions are perforce all identical, whether they are 1 or 0.
+ * Bitwise operations preserve the relative quantity of the repetitions.
+ */
+ctx->s_mask = arg_info(op->args[1])->s_mask
+& arg_info(op->args[2])->s_mask;
+
 /*
  * Known-zeros does not imply known-ones.  Therefore unless
  * arg2 is constant, we can't infer affected bits from it.
@@ -1002,6 +1009,8 @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
 }
 ctx->z_mask = z1;
 
+ctx->s_mask = arg_info(op->args[1])->s_mask
+& arg_info(op->args[2])->s_mask;
 return fold_masks(ctx, op);
 }
 
@@ -1300,6 +1309,9 @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
 fold_xi_to_not(ctx, op, 0)) {
 return true;
 }
+
+ctx->s_mask = arg_info(op->args[1])->s_mask
+& arg_info(op->args[2])->s_mask;
 return false;
 }
 
@@ -1487,6 +1499,8 @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
 
 ctx->z_mask = arg_info(op->args[3])->z_mask
 | arg_info(op->args[4])->z_mask;
+ctx->s_mask = arg_info(op->args[3])->s_mask
+& arg_info(op->args[4])->s_mask;
 
 if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
 uint64_t tv = arg_info(op->args[3])->val;
@@ -1585,6 +1599,9 @@ static bool fold_nand(OptContext *ctx, TCGOp *op)
 fold_xi_to_not(ctx, op, -1)) {
 return true;
 }
+
+ctx->s_mask = arg_info(op->args[1])->s_mask
+& arg_info(op->args[2])->s_mask;
 return false;
 }
 
@@ -1614,6 +1631,9 @@ static bool fold_nor(OptContext *ctx, TCGOp *op)
 fold_xi_to_not(ctx, op, 0)) {
 return true;
 }
+
+ctx->s_mask = arg_info(op->args[1])->s_mask
+& arg_info(op->args[2])->s_mask;
 return false;
 }
 
@@ -1623,6 +1643,8 @@ static bool fold_not(OptContext *ctx, TCGOp *op)
 return true;
 }
 
+ctx->s_mask = arg_info(op->args[1])->s_mask;
+
 /* Because of fold_to_not, we want to always return true, via finish. */
 finish_folding(ctx, op);
 return true;
@@ -1638,6 +1660,8 @@ static bool fold_or(OptContext *ctx, TCGOp *op)
 
 ctx->z_mask = arg_info(op->args[1])->z_mask
 | arg_info(op->args[2])->z_mask;
+ctx->s_mask = arg_info(op->args[1])->s_mask
+& arg_info(op->args[2])->s_mask;
 return fold_masks(ctx, op);
 }
 
@@ -1649,6 +1673,9 @@ static bool fold_orc(OptContext *ctx, TCGOp *op)
 fold_ix_to_not(ctx, op, 0)) {
 return true;
 }
+
+ctx->s_mask = arg_info(op->args[1])->s_mask
+& arg_info(op->args[2])->s_mask;
 return false;
 }
 
@@ -1922,6 +1949,8 @@ static bool fold_xor(OptContext *ctx, TCGOp *op)
 
 ctx->z_mask = arg_info(op->args[1])->z_mask
 | arg_info(op->args[2])->z_mask;
+ctx->s_mask = arg_info(op->args[1])->s_mask
+& arg_info(op->args[2])->s_mask;
 return fold_masks(ctx, op);
 }
 
-- 
2.25.1




[PULL 54/56] tcg/optimize: Propagate sign info for setcond

2021-10-27 Thread Richard Henderson
The result is either 0 or 1, which means that we have
a 2-bit signed result, and thus 62 bits of sign.
For clarity, use the smask_from_zmask function.
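
Checking the arithmetic (with the smask_from_zmask() shown in this
series): z_mask = 1 means the value fits in 2 signed bits, so the top
62 bits are guaranteed sign copies, i.e.

    s_mask = ~(~0ull >> 62);   /* == 0xfffffffffffffffc */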

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index de1abd9cc3..5fa4d7285d 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1730,6 +1730,7 @@ static bool fold_setcond(OptContext *ctx, TCGOp *op)
 }
 
 ctx->z_mask = 1;
+ctx->s_mask = smask_from_zmask(1);
 return false;
 }
 
@@ -1802,6 +1803,7 @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
 }
 
 ctx->z_mask = 1;
+ctx->s_mask = smask_from_zmask(1);
 return false;
 
  do_setcond_const:
-- 
2.25.1




[PULL 43/56] tcg/optimize: Split out fold_masks

2021-10-27 Thread Richard Henderson
Move all of the known-zero optimizations into the per-opcode
functions.  Use fold_masks when there is a possibility of the
result being determined, and simply set ctx->z_mask otherwise.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 545 ++---
 1 file changed, 294 insertions(+), 251 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index bf74b77355..e84d10be53 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -50,7 +50,8 @@ typedef struct OptContext {
 TCGTempSet temps_used;
 
 /* In flight values from optimization. */
-uint64_t z_mask;
+uint64_t a_mask;  /* mask bit is 0 iff value identical to first input */
+uint64_t z_mask;  /* mask bit is 0 iff value bit is 0 */
 TCGType type;
 } OptContext;
 
@@ -694,6 +695,31 @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
 return false;
 }
 
+static bool fold_masks(OptContext *ctx, TCGOp *op)
+{
+uint64_t a_mask = ctx->a_mask;
+uint64_t z_mask = ctx->z_mask;
+
+/*
+ * 32-bit ops generate 32-bit results.  For the result is zero test
+ * below, we can ignore high bits, but for further optimizations we
+ * need to record that the high bits contain garbage.
+ */
+if (ctx->type == TCG_TYPE_I32) {
+ctx->z_mask |= MAKE_64BIT_MASK(32, 32);
+a_mask &= MAKE_64BIT_MASK(0, 32);
+z_mask &= MAKE_64BIT_MASK(0, 32);
+}
+
+if (z_mask == 0) {
+return tcg_opt_gen_movi(ctx, op, op->args[0], 0);
+}
+if (a_mask == 0) {
+return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
+}
+return false;
+}
+
 /*
  * Convert @op to NOT, if NOT is supported by the host.
 * Return true if the conversion is successful, which will still
@@ -847,24 +873,55 @@ static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
 
 static bool fold_and(OptContext *ctx, TCGOp *op)
 {
+uint64_t z1, z2;
+
 if (fold_const2(ctx, op) ||
 fold_xi_to_i(ctx, op, 0) ||
 fold_xi_to_x(ctx, op, -1) ||
 fold_xx_to_x(ctx, op)) {
 return true;
 }
-return false;
+
+z1 = arg_info(op->args[1])->z_mask;
+z2 = arg_info(op->args[2])->z_mask;
+ctx->z_mask = z1 & z2;
+
+/*
+ * Known-zeros does not imply known-ones.  Therefore unless
+ * arg2 is constant, we can't infer affected bits from it.
+ */
+if (arg_is_const(op->args[2])) {
+ctx->a_mask = z1 & ~z2;
+}
+
+return fold_masks(ctx, op);
 }
 
 static bool fold_andc(OptContext *ctx, TCGOp *op)
 {
+uint64_t z1;
+
 if (fold_const2(ctx, op) ||
 fold_xx_to_i(ctx, op, 0) ||
 fold_xi_to_x(ctx, op, 0) ||
 fold_ix_to_not(ctx, op, -1)) {
 return true;
 }
-return false;
+
+z1 = arg_info(op->args[1])->z_mask;
+
+/*
+ * Known-zeros does not imply known-ones.  Therefore unless
+ * arg2 is constant, we can't infer anything from it.
+ */
+if (arg_is_const(op->args[2])) {
+uint64_t z2 = ~arg_info(op->args[2])->z_mask;
+ctx->a_mask = z1 & ~z2;
+z1 &= z2;
+}
+ctx->z_mask = z1;
+
+return fold_masks(ctx, op);
 }
 
 static bool fold_brcond(OptContext *ctx, TCGOp *op)
@@ -963,13 +1020,52 @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
 
 static bool fold_bswap(OptContext *ctx, TCGOp *op)
 {
+uint64_t z_mask, sign;
+
 if (arg_is_const(op->args[1])) {
 uint64_t t = arg_info(op->args[1])->val;
 
 t = do_constant_folding(op->opc, ctx->type, t, op->args[2]);
 return tcg_opt_gen_movi(ctx, op, op->args[0], t);
 }
-return false;
+
+z_mask = arg_info(op->args[1])->z_mask;
+switch (op->opc) {
+case INDEX_op_bswap16_i32:
+case INDEX_op_bswap16_i64:
+z_mask = bswap16(z_mask);
+sign = INT16_MIN;
+break;
+case INDEX_op_bswap32_i32:
+case INDEX_op_bswap32_i64:
+z_mask = bswap32(z_mask);
+sign = INT32_MIN;
+break;
+case INDEX_op_bswap64_i64:
+z_mask = bswap64(z_mask);
+sign = INT64_MIN;
+break;
+default:
+g_assert_not_reached();
+}
+
+switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
+case TCG_BSWAP_OZ:
+break;
+case TCG_BSWAP_OS:
+/* If the sign bit may be 1, force all the bits above to 1. */
+if (z_mask & sign) {
+z_mask |= sign;
+}
+break;
+default:
+/* The high bits are undefined: force all bits above the sign to 1. */
+z_mask |= sign << 1;
+break;
+}
+ctx->z_mask = z_mask;
+
+return fold_masks(ctx, op);
 }
 
 static bool fold_call(OptContext *ctx, TCGOp *op)
@@ -1006,6 +1102,8 @@ static bool fold_call(OptContext *ctx, TCGOp *op)
 
 static bool fold_count_zeros(OptContext *ctx, TCGOp *op)
 {
+uint64_t z_mask;
+
 if (arg_is_const(op->args[1])) {
 uint64_t t = arg_info(op->args[1])->val;
 
@@ -1015,12 

[PULL 41/56] tcg/optimize: Split out fold_xi_to_x

2021-10-27 Thread Richard Henderson
Pull the "op r, a, i => mov r, a" optimization into a function,
and use them in the outer-most logical operations.

Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 61 +-
 1 file changed, 26 insertions(+), 35 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e0d850ffe4..f5ab0500b7 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -749,6 +749,15 @@ static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, 
uint64_t i)
 return false;
 }
 
+/* If the binary operation has second argument @i, fold to identity. */
+static bool fold_xi_to_x(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
+return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
+}
+return false;
+}
+
 /* If the binary operation has second argument @i, fold to NOT. */
 static bool fold_xi_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
 {
@@ -787,7 +796,11 @@ static bool fold_xx_to_x(OptContext *ctx, TCGOp *op)
 
 static bool fold_add(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xi_to_x(ctx, op, 0)) {
+return true;
+}
+return false;
 }
 
 static bool fold_addsub2_i32(OptContext *ctx, TCGOp *op, bool add)
@@ -827,6 +840,7 @@ static bool fold_and(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
 fold_xi_to_i(ctx, op, 0) ||
+fold_xi_to_x(ctx, op, -1) ||
 fold_xx_to_x(ctx, op)) {
 return true;
 }
@@ -837,6 +851,7 @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
 fold_xx_to_i(ctx, op, 0) ||
+fold_xi_to_x(ctx, op, 0) ||
 fold_ix_to_not(ctx, op, -1)) {
 return true;
 }
@@ -1044,6 +1059,7 @@ static bool fold_dup2(OptContext *ctx, TCGOp *op)
 static bool fold_eqv(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
+fold_xi_to_x(ctx, op, -1) ||
 fold_xi_to_not(ctx, op, 0)) {
 return true;
 }
@@ -1237,6 +1253,7 @@ static bool fold_not(OptContext *ctx, TCGOp *op)
 static bool fold_or(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
+fold_xi_to_x(ctx, op, 0) ||
 fold_xx_to_x(ctx, op)) {
 return true;
 }
@@ -1246,6 +1263,7 @@ static bool fold_or(OptContext *ctx, TCGOp *op)
 static bool fold_orc(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
+fold_xi_to_x(ctx, op, -1) ||
 fold_ix_to_not(ctx, op, 0)) {
 return true;
 }
@@ -1365,7 +1383,11 @@ static bool fold_sextract(OptContext *ctx, TCGOp *op)
 
 static bool fold_shift(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xi_to_x(ctx, op, 0)) {
+return true;
+}
+return false;
 }
 
 static bool fold_sub_to_neg(OptContext *ctx, TCGOp *op)
@@ -1408,6 +1430,7 @@ static bool fold_sub(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
 fold_xx_to_i(ctx, op, 0) ||
+fold_xi_to_x(ctx, op, 0) ||
 fold_sub_to_neg(ctx, op)) {
 return true;
 }
@@ -1423,6 +1446,7 @@ static bool fold_xor(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
 fold_xx_to_i(ctx, op, 0) ||
+fold_xi_to_x(ctx, op, 0) ||
 fold_xi_to_not(ctx, op, -1)) {
 return true;
 }
@@ -1546,39 +1570,6 @@ void tcg_optimize(TCGContext *s)
 break;
 }
 
-/* Simplify expression for "op r, a, const => mov r, a" cases */
-switch (opc) {
-CASE_OP_32_64_VEC(add):
-CASE_OP_32_64_VEC(sub):
-CASE_OP_32_64_VEC(or):
-CASE_OP_32_64_VEC(xor):
-CASE_OP_32_64_VEC(andc):
-CASE_OP_32_64(shl):
-CASE_OP_32_64(shr):
-CASE_OP_32_64(sar):
-CASE_OP_32_64(rotl):
-CASE_OP_32_64(rotr):
-if (!arg_is_const(op->args[1])
-&& arg_is_const(op->args[2])
-&& arg_info(op->args[2])->val == 0) {
-tcg_opt_gen_mov(, op, op->args[0], op->args[1]);
-continue;
-}
-break;
-CASE_OP_32_64_VEC(and):
-CASE_OP_32_64_VEC(orc):
-CASE_OP_32_64(eqv):
-if (!arg_is_const(op->args[1])
-&& arg_is_const(op->args[2])
-&& arg_info(op->args[2])->val == -1) {
-tcg_opt_gen_mov(, op, op->args[0], op->args[1]);
-continue;
-}
-break;
-default:
-break;
-}
-
 /* Simplify using known-zero bits. Currently only ops with a single
output argument is supported. */
 z_mask = -1;
-- 
2.25.1




[PULL 38/56] tcg/optimize: Add type to OptContext

2021-10-27 Thread Richard Henderson
Compute the type of the operation early.

There were at least 4 places that used a def->flags ladder
to determine the type of the operation being optimized.

There were two places that assumed !TCG_OPF_64BIT means
TCG_TYPE_I32, and so could potentially compute incorrect
results for vector operations.
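
A small self-contained illustration of the pitfall (the flag values here
are made up, not TCG's): a ladder that only tests a 64-bit flag silently
classifies vector ops as 32-bit unless the vector flag is checked first.

    #include <stdio.h>

    enum { OPF_VECTOR = 1, OPF_64BIT = 2 };   /* illustrative values only */
    enum { TYPE_I32, TYPE_I64, TYPE_VEC };

    static int type_from_flags(int flags)
    {
        if (flags & OPF_VECTOR) {
            return TYPE_VEC;     /* the test the buggy ladders were missing */
        }
        return (flags & OPF_64BIT) ? TYPE_I64 : TYPE_I32;
    }

    int main(void)
    {
        /* Without the OPF_VECTOR test this would report TYPE_I32. */
        printf("%d\n", type_from_flags(OPF_VECTOR) == TYPE_VEC);   /* 1 */
        return 0;
    }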

Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 149 +
 1 file changed, 89 insertions(+), 60 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index cfdc53c964..e869fa7e78 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -51,6 +51,7 @@ typedef struct OptContext {
 
 /* In flight values from optimization. */
 uint64_t z_mask;
+TCGType type;
 } OptContext;
 
 static inline TempOptInfo *ts_info(TCGTemp *ts)
@@ -187,7 +188,6 @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 {
 TCGTemp *dst_ts = arg_temp(dst);
 TCGTemp *src_ts = arg_temp(src);
-const TCGOpDef *def;
 TempOptInfo *di;
 TempOptInfo *si;
 uint64_t z_mask;
@@ -201,16 +201,24 @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 reset_ts(dst_ts);
 di = ts_info(dst_ts);
 si = ts_info(src_ts);
-def = _op_defs[op->opc];
-if (def->flags & TCG_OPF_VECTOR) {
-new_op = INDEX_op_mov_vec;
-} else if (def->flags & TCG_OPF_64BIT) {
-new_op = INDEX_op_mov_i64;
-} else {
+
+switch (ctx->type) {
+case TCG_TYPE_I32:
 new_op = INDEX_op_mov_i32;
+break;
+case TCG_TYPE_I64:
+new_op = INDEX_op_mov_i64;
+break;
+case TCG_TYPE_V64:
+case TCG_TYPE_V128:
+case TCG_TYPE_V256:
+/* TCGOP_VECL and TCGOP_VECE remain unchanged.  */
+new_op = INDEX_op_mov_vec;
+break;
+default:
+g_assert_not_reached();
 }
 op->opc = new_op;
-/* TCGOP_VECL and TCGOP_VECE remain unchanged.  */
 op->args[0] = dst;
 op->args[1] = src;
 
@@ -237,20 +245,9 @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 static bool tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
  TCGArg dst, uint64_t val)
 {
-const TCGOpDef *def = _op_defs[op->opc];
-TCGType type;
-TCGTemp *tv;
-
-if (def->flags & TCG_OPF_VECTOR) {
-type = TCGOP_VECL(op) + TCG_TYPE_V64;
-} else if (def->flags & TCG_OPF_64BIT) {
-type = TCG_TYPE_I64;
-} else {
-type = TCG_TYPE_I32;
-}
-
 /* Convert movi to mov with constant temp. */
-tv = tcg_constant_internal(type, val);
+TCGTemp *tv = tcg_constant_internal(ctx->type, val);
+
 init_ts_info(ctx, tv);
 return tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
 }
@@ -420,11 +417,11 @@ static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
 }
 }
 
-static uint64_t do_constant_folding(TCGOpcode op, uint64_t x, uint64_t y)
+static uint64_t do_constant_folding(TCGOpcode op, TCGType type,
+uint64_t x, uint64_t y)
 {
-const TCGOpDef *def = _op_defs[op];
 uint64_t res = do_constant_folding_2(op, x, y);
-if (!(def->flags & TCG_OPF_64BIT)) {
+if (type == TCG_TYPE_I32) {
 res = (int32_t)res;
 }
 return res;
@@ -510,19 +507,21 @@ static bool do_constant_folding_cond_eq(TCGCond c)
  * Return -1 if the condition can't be simplified,
  * and the result of the condition (0 or 1) if it can.
  */
-static int do_constant_folding_cond(TCGOpcode op, TCGArg x,
+static int do_constant_folding_cond(TCGType type, TCGArg x,
 TCGArg y, TCGCond c)
 {
 uint64_t xv = arg_info(x)->val;
 uint64_t yv = arg_info(y)->val;
 
 if (arg_is_const(x) && arg_is_const(y)) {
-const TCGOpDef *def = _op_defs[op];
-tcg_debug_assert(!(def->flags & TCG_OPF_VECTOR));
-if (def->flags & TCG_OPF_64BIT) {
-return do_constant_folding_cond_64(xv, yv, c);
-} else {
+switch (type) {
+case TCG_TYPE_I32:
 return do_constant_folding_cond_32(xv, yv, c);
+case TCG_TYPE_I64:
+return do_constant_folding_cond_64(xv, yv, c);
+default:
+/* Only scalar comparisons are optimizable */
+return -1;
 }
 } else if (args_are_copies(x, y)) {
 return do_constant_folding_cond_eq(c);
@@ -677,7 +676,7 @@ static bool fold_const1(OptContext *ctx, TCGOp *op)
 uint64_t t;
 
 t = arg_info(op->args[1])->val;
-t = do_constant_folding(op->opc, t, 0);
+t = do_constant_folding(op->opc, ctx->type, t, 0);
 return tcg_opt_gen_movi(ctx, op, op->args[0], t);
 }
 return false;
@@ -689,7 +688,7 @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
 uint64_t t1 = arg_info(op->args[1])->val;
 uint64_t t2 = arg_info(op->args[2])->val;
 
-t1 = do_constant_folding(op->opc, t1, t2);
+

[PULL 33/56] tcg/optimize: Split out fold_dup, fold_dup2

2021-10-27 Thread Richard Henderson
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 53 +-
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 5374c230da..8524fe1f8a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -920,6 +920,31 @@ static bool fold_divide(OptContext *ctx, TCGOp *op)
 return fold_const2(ctx, op);
 }
 
+static bool fold_dup(OptContext *ctx, TCGOp *op)
+{
+if (arg_is_const(op->args[1])) {
+uint64_t t = arg_info(op->args[1])->val;
+t = dup_const(TCGOP_VECE(op), t);
+return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+}
+return false;
+}
+
+static bool fold_dup2(OptContext *ctx, TCGOp *op)
+{
+if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
+uint64_t t = deposit64(arg_info(op->args[1])->val, 32, 32,
+   arg_info(op->args[2])->val);
+return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+}
+
+if (args_are_copies(op->args[1], op->args[2])) {
+op->opc = INDEX_op_dup_vec;
+TCGOP_VECE(op) = MO_32;
+}
+return false;
+}
+
 static bool fold_eqv(OptContext *ctx, TCGOp *op)
 {
 return fold_const2(ctx, op);
@@ -1731,28 +1756,6 @@ void tcg_optimize(TCGContext *s)
 done = tcg_opt_gen_mov(, op, op->args[0], op->args[1]);
 break;
 
-case INDEX_op_dup_vec:
-if (arg_is_const(op->args[1])) {
-tmp = arg_info(op->args[1])->val;
-tmp = dup_const(TCGOP_VECE(op), tmp);
-tcg_opt_gen_movi(, op, op->args[0], tmp);
-continue;
-}
-break;
-
-case INDEX_op_dup2_vec:
-assert(TCG_TARGET_REG_BITS == 32);
-if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-tcg_opt_gen_movi(, op, op->args[0],
- deposit64(arg_info(op->args[1])->val, 32, 32,
-   arg_info(op->args[2])->val));
-continue;
-} else if (args_are_copies(op->args[1], op->args[2])) {
-op->opc = INDEX_op_dup_vec;
-TCGOP_VECE(op) = MO_32;
-}
-break;
-
 default:
 break;
 
@@ -1796,6 +1799,12 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(divu):
 done = fold_divide(, op);
 break;
+case INDEX_op_dup_vec:
+done = fold_dup(, op);
+break;
+case INDEX_op_dup2_vec:
+done = fold_dup2(, op);
+break;
 CASE_OP_32_64(eqv):
 done = fold_eqv(, op);
 break;
-- 
2.25.1




Re: [PATCH 00/16] fdt: Make OF_BOARD a boolean option

2021-10-27 Thread Simon Glass
Hi Ilias,

On Tue, 26 Oct 2021 at 00:46, Ilias Apalodimas
 wrote:
>
> Hi Simon,
>
> A bit late to the party, sorry!

(Did you remember the beer? I am replying to this but I don't think it
is all that helpful for me to reply to a lot of things on this thread,
since I would not be adding much to my cover letter and patches)

>
> [...]
>
> > >
> > > I really want to see what the binary case looks like since we could then
> > > kill off rpi_{3,3_b,4}_defconfig and I would need to see if we could
> > > then also do a rpi_arm32_defconfig too.
> > >
> > > I want to see fewer device trees in U-Boot sources, if they can come
> > > functionally correct from the hardware/our caller.
> > >
> > > And I'm not seeing how we make use of "U-Boot /config" if we also don't
> > > use the device tree from build time at run time, ignoring the device
> > > tree provided to us at run time by the caller.
> >
> > Firstly I should say that I find building firmware very messy and
> > confusing these days. Lots of things to build and it's hard to find
> > the instructions. It doesn't have to be that way, but if we carry on
> > as we are, it will continue to be messy and in five years you will
> > need a Ph.D and a lucky charm to boot on any modern board. My
> > objective here is to simplify things, bringing some consistency to the
> > different components. Binman was one effort there. I feel that putting
> > at least the U-Boot house in order, in my role as devicetree
> > maintainer (and as author of devicetree support in U-Boot back in
> > 2011), is the next step.
> >
> > If we set things up correctly and agree on the bindings, devicetree
> > can be the unifying configuration mechanism through the whole of
> > firmware (except for very early bits) and into the OS, this will set
> > us up very well to deal with the complexity that is coming.
> >
> > Anyway, here are the mental steps that I've gone through over the past
> > two months:
> >
> > Step 1: At present, some people think U-Boot is not even allowed to
> > have its own nodes/properties in the DT. It is an abuse of the
> > devicetree standard, like the /chosen node but with less history. We
> > should sacrifice efficiency, expedience and expandability on the altar
> > of 'devicetree is a hardware description'. How do we get over that
> > one? Well, I just think we need to accept that U-Boot uses devicetree
> > for its own purposes, as well as for booting the OS. I am not saying
> > it always has to have those properties, but with existing features
> > like verified boot, SPL as well as complex firmware images where
> > U-Boot needs to be able to find things in the image, it is essential.
> > So let's just assume that we need this everywhere, since we certainly
> > need it in at least some places.
> >
> > (stop reading here if you disagree, because nothing below will make
> > any sense...you can still use U-Boot v2011.06 which doesn't have
> > OF_CONTROL :-)
>
> Having U-Boot keep its *internal* config state in DTs is fine.  Adding
> that to the DTs that are copied over from linux isn't, imho.  There are
> various reasons for that.  First of all, syncing device trees is a huge pain
> and that's probably one of the main reasons our DTs are out of sync for a
> large number of boards.
> The point is this was fine in 2011 when we had SPL only, but the reality
> today is completely different.  There are previous-stage boot loaders (and
> enough cases where vendors prefer those over SPL).  If that bootloader needs
> to use its own device tree for whatever reason, imposing restrictions on
> it wrt the device tree it has to include, and requiring it to have
> knowledge of U-Boot and its internal config mechanism, makes no sense, not
> to mention it doesn't scale at all.

I think the solution here may be the binman image packer. It works
from a description of the image (i.e. it is data-driven) and can collect
all the pieces together. The U-Boot properties (and the ones required
by TF-A, etc.) can be added at package time.

If you think about it, it doesn't matter what properties are in the DT
that is put into the firmware image. TF-A, for example, is presumably
reading a devicetree from flash, so what does it care if it has some
U-Boot properties in it?

As to syncing, we have solved this using u-boot.dtsi files in U-Boot,
so I think this can be dealt with.

>
> >
> > Step 2: Assume U-Boot has its own nodes/properties. How do they get
> > there? Well, we have u-boot.dtsi files for that (the 2016 patch
> > "6d427c6b1fa binman: Automatically include a U-Boot .dtsi file"), we
> > have binman definitions, etc. So we need a way to overlay those things
> > into the DT. We already support this for in-tree DTs, so IMO this is
> > easy. Just require every board to have an in-tree DT. It helps with
> > discoverability and documentation, anyway. That is this series.
> >
>
> Again, the board might decide for its own reasons to provide its own DT.
> IMHO U-Boot must be able to cope with that and asking 

[PULL 55/56] tcg/optimize: Propagate sign info for bit counting

2021-10-27 Thread Richard Henderson
The results are generally 6-bit unsigned values, though
the count-leading and count-trailing ops may produce any value
for a zero input.
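
A worked example, assuming an op whose z_mask works out to 0x3f (a 6-bit
unsigned field), with the GCC builtin standing in for QEMU's clz64() in
smask_from_zmask from earlier in this series:

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t smask_from_zmask(uint64_t zmask)
    {
        int rep = __builtin_clzll(zmask);   /* zmask must be non-zero here */
        if (rep == 0) {
            return 0;
        }
        rep -= 1;
        return ~(~0ull >> rep);
    }

    int main(void)
    {
        /* Bits 6..63 are known-zero, so bits 7..63 replicate the sign. */
        printf("0x%016llx\n", (unsigned long long)smask_from_zmask(0x3f));
        /* prints 0xffffffffffffff80 */
        return 0;
    }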

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 5fa4d7285d..c0eccc61d6 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1229,7 +1229,7 @@ static bool fold_count_zeros(OptContext *ctx, TCGOp *op)
 g_assert_not_reached();
 }
 ctx->z_mask = arg_info(op->args[2])->z_mask | z_mask;
-
+ctx->s_mask = smask_from_zmask(ctx->z_mask);
 return false;
 }
 
@@ -1249,6 +1249,7 @@ static bool fold_ctpop(OptContext *ctx, TCGOp *op)
 default:
 g_assert_not_reached();
 }
+ctx->s_mask = smask_from_zmask(ctx->z_mask);
 return false;
 }
 
-- 
2.25.1




Re: [PATCH] e1000e: Added ICR clearing by corresponding IMS bit.

2021-10-27 Thread Jason Wang
On Wed, Oct 27, 2021 at 6:57 PM Andrew Melnichenko  wrote:
>
> Hi,
> Let's make things clear.
> At first, I decided to fix the issue in the Linux e1000e driver.
> (https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20200413/019497.html)
> The original driver developers suggested fixing the issue in qemu and
> assured that the driver works correctly on real hw.
> I added the fix according to note 13.3.28 of the 8257X manual
> (https://www.intel.com/content/dam/www/public/us/en/documents/manuals/pcie-gbe-controllers-open-source-manual.pdf)
> I referenced the 8257X spec, which is apparently a bit different from
> 82574l-gbe-controller-datasheet.pdf

Yes, and 82574L is the model that was referenced when developing the
e1000e emulation code, I think.

Thanks

>
>
> On Thu, Oct 21, 2021 at 5:16 AM Jason Wang  wrote:
>>
>> Hi Andrew:
>>
>> On Thu, Oct 21, 2021 at 6:27 AM Andrew Melnichenko  wrote:
>> >
>> > Hi,
>> > I've used this 
>> > manual(https://www.intel.com/content/dam/www/public/us/en/documents/manuals/pcie-gbe-controllers-open-source-manual.pdf)
>> > It was provided by Intel when I've tried to research that bug.
>> > Although it's a bit newer manual - the article is 13.3.28.
>>
>> Note that it's not the model that e1000e tries to implement (82574L).
>> The device ID in qemu is 0x10D3 which is not listed in the above link
>> "4.7.7 Mandatory PCI Configuration Registers".
>>
>> Thanks
>>
>> >
>> >
>> > On Tue, Oct 19, 2021 at 10:56 AM Jason Wang  wrote:
>> >>
>> >> On Thu, Oct 14, 2021 at 4:34 PM Andrew Melnichenko  
>> >> wrote:
>> >> >
>> >> > Ping
>> >> >
>> >> > On Wed, Aug 18, 2021 at 9:10 PM Andrew Melnychenko  
>> >> > wrote:
>> >> >>
>> >> >> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1707441
>> >> >>
>> >> >> The issue is in LSC clearing. So, after "link up" (during
>> >> >> initialization), the next LSC event is masked and can't be processed.
>> >> >> Technically, the event should be 'cleared' during ICR read.
>> >> >> On Windows guest, everything works well, mostly because of
>> >> >> different interrupt routines(ICR clears during register write).
>> >> >> So, added ICR clearing during reading, according to the note by
>> >> >> section 13.3.27 of the 8257X developers manual.
>> >> >>
>> >> >> Signed-off-by: Andrew Melnychenko 
>> >> >> ---
>> >> >>  hw/net/e1000e_core.c | 10 ++
>> >> >>  hw/net/trace-events  |  1 +
>> >> >>  2 files changed, 11 insertions(+)
>> >> >>
>> >> >> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
>> >> >> index b75f2ab8fc..288897a975 100644
>> >> >> --- a/hw/net/e1000e_core.c
>> >> >> +++ b/hw/net/e1000e_core.c
>> >> >> @@ -2617,6 +2617,16 @@ e1000e_mac_icr_read(E1000ECore *core, int index)
>> >> >>  e1000e_clear_ims_bits(core, core->mac[IAM]);
>> >> >>  }
>> >> >>
>> >> >> +/*
>> >> >> + * PCIe* GbE Controllers Open Source Software Developer's Manual
>> >> >> + * 13.3.27 Interrupt Cause Read Register
>> >>
>> >> Per link in the beginning of this file it should be 82574l I guess?
>> >>
>> >> If yes, I'm using revision 3.4 and it's 13.3.27 is not about ICR.
>> >>
>> >> What it said are:
>> >>
>> >> "
>> >> In MSI-X mode the bits in this register can be configured to
>> >> auto-clear when the MSI-X interrupt message is sent, in order to
>> >> minimize driver overhead, and when using MSI-X interrupt signaling. In
>> >> systems that do not support MSI-X, reading the ICR register clears
>> >> it's bits or writing 1b's clears the corresponding bits in this
>> >> register.
>> >> "
>> >>
>> >>
>> >> >> + */
>> >> >> +if ((core->mac[ICR] & E1000_ICR_ASSERTED) &&
>> >> >> +(core->mac[ICR] & core->mac[IMS])) {
>> >> >> +trace_e1000e_irq_icr_clear_icr_bit_ims(core->mac[ICR], core->mac[IMS]);
>> >> >> +core->mac[ICR] = 0;
>> >> >> +}
>> >>
>> >> Thanks
>> >>
>> >> >> +
>> >> >>  trace_e1000e_irq_icr_read_exit(core->mac[ICR]);
>> >> >>  e1000e_update_interrupt_state(core);
>> >> >>  return ret;
>> >> >> diff --git a/hw/net/trace-events b/hw/net/trace-events
>> >> >> index c28b91ee1a..15fd09aa1c 100644
>> >> >> --- a/hw/net/trace-events
>> >> >> +++ b/hw/net/trace-events
>> >> >> @@ -225,6 +225,7 @@ e1000e_irq_icr_read_entry(uint32_t icr) "Starting ICR read. Current ICR: 0x%x"
>> >> >>  e1000e_irq_icr_read_exit(uint32_t icr) "Ending ICR read. Current ICR: 0x%x"
>> >> >>  e1000e_irq_icr_clear_zero_ims(void) "Clearing ICR on read due to zero IMS"
>> >> >>  e1000e_irq_icr_clear_iame(void) "Clearing ICR on read due to IAME"
>> >> >> +e1000e_irq_icr_clear_icr_bit_ims(uint32_t icr, uint32_t ims) "Clearing ICR on read due corresponding IMS bit: 0x%x & 0x%x"
>> >> >>  e1000e_irq_iam_clear_eiame(uint32_t iam, uint32_t cause) "Clearing IMS due to EIAME, IAM: 0x%X, cause: 0x%X"
>> >> >>  e1000e_irq_icr_clear_eiac(uint32_t icr, uint32_t eiac) "Clearing ICR bits due to EIAC, ICR: 0x%X, EIAC: 0x%X"
>> >> >>  

[PULL 28/56] tcg/optimize: Split out fold_extract2

2021-10-27 Thread Richard Henderson
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 39 ++-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 110b3d1cc2..faedbdbfb8 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -888,6 +888,25 @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
 return fold_const2(ctx, op);
 }
 
+static bool fold_extract2(OptContext *ctx, TCGOp *op)
+{
+if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
+uint64_t v1 = arg_info(op->args[1])->val;
+uint64_t v2 = arg_info(op->args[2])->val;
+int shr = op->args[3];
+
+if (op->opc == INDEX_op_extract2_i64) {
+v1 >>= shr;
+v2 <<= 64 - shr;
+} else {
+v1 = (uint32_t)v1 >> shr;
+v2 = (int32_t)v2 << (32 - shr);
+}
+return tcg_opt_gen_movi(ctx, op, op->args[0], v1 | v2);
+}
+return false;
+}
+
 static bool fold_exts(OptContext *ctx, TCGOp *op)
 {
 return fold_const1(ctx, op);
@@ -1726,23 +1745,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-CASE_OP_32_64(extract2):
-if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-uint64_t v1 = arg_info(op->args[1])->val;
-uint64_t v2 = arg_info(op->args[2])->val;
-int shr = op->args[3];
-
-if (opc == INDEX_op_extract2_i64) {
-tmp = (v1 >> shr) | (v2 << (64 - shr));
-} else {
-tmp = (int32_t)(((uint32_t)v1 >> shr) |
-((uint32_t)v2 << (32 - shr)));
-}
-tcg_opt_gen_movi(, op, op->args[0], tmp);
-continue;
-}
-break;
-
 default:
 break;
 
@@ -1777,6 +1779,9 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(eqv):
 done = fold_eqv(, op);
 break;
+CASE_OP_32_64(extract2):
+done = fold_extract2(, op);
+break;
 CASE_OP_32_64(ext8s):
 CASE_OP_32_64(ext16s):
 case INDEX_op_ext32s_i64:
-- 
2.25.1




[PULL 52/56] tcg/optimize: Optimize sign extensions

2021-10-27 Thread Richard Henderson
Certain targets, like riscv, produce signed 32-bit results.
This can lead to lots of redundant extensions as values are
manipulated.

Begin by tracking only the obvious sign-extensions, and
converting them to simple copies when possible.
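
For intuition, the s_mask of a constant is just its redundant sign bits,
left-aligned.  A standalone version of the smask_from_value helper added
below, with the GCC clrsb builtin in place of QEMU's clrsb64():

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t smask_from_value(uint64_t value)
    {
        int rep = __builtin_clrsbll((long long)value);
        return ~(~0ull >> rep);
    }

    int main(void)
    {
        /* -1 has 63 redundant sign bits; 0x7f, a positive 8-bit signed
         * value, has 56, so bits 8..63 are known copies of the sign. */
        printf("0x%016llx\n", (unsigned long long)smask_from_value(-1));
        printf("0x%016llx\n", (unsigned long long)smask_from_value(0x7f));
        return 0;
    }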

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 123 -
 1 file changed, 102 insertions(+), 21 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 7ac63c9231..ef202abbcb 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -43,6 +43,7 @@ typedef struct TempOptInfo {
 TCGTemp *next_copy;
 uint64_t val;
 uint64_t z_mask;  /* mask bit is 0 if and only if value bit is 0 */
+uint64_t s_mask;  /* a left-aligned mask of clrsb(value) bits. */
 } TempOptInfo;
 
 typedef struct OptContext {
@@ -53,9 +54,37 @@ typedef struct OptContext {
 /* In flight values from optimization. */
 uint64_t a_mask;  /* mask bit is 0 iff value identical to first input */
 uint64_t z_mask;  /* mask bit is 0 iff value bit is 0 */
+uint64_t s_mask;  /* mask of clrsb(value) bits */
 TCGType type;
 } OptContext;
 
+/* Calculate the smask for a specific value. */
+static uint64_t smask_from_value(uint64_t value)
+{
+int rep = clrsb64(value);
+return ~(~0ull >> rep);
+}
+
+/*
+ * Calculate the smask for a given set of known-zeros.
+ * If there are lots of zeros on the left, we can consider the remainder
+ * an unsigned field, and thus the corresponding signed field is one bit
+ * larger.
+ */
+static uint64_t smask_from_zmask(uint64_t zmask)
+{
+/*
+ * Only the 0 bits are significant for zmask, thus the msb itself
+ * must be zero, else we have no sign information.
+ */
+int rep = clz64(zmask);
+if (rep == 0) {
+return 0;
+}
+rep -= 1;
+return ~(~0ull >> rep);
+}
+
 static inline TempOptInfo *ts_info(TCGTemp *ts)
 {
 return ts->state_ptr;
@@ -94,6 +123,7 @@ static void reset_ts(TCGTemp *ts)
 ti->prev_copy = ts;
 ti->is_const = false;
 ti->z_mask = -1;
+ti->s_mask = 0;
 }
 
 static void reset_temp(TCGArg arg)
@@ -124,9 +154,11 @@ static void init_ts_info(OptContext *ctx, TCGTemp *ts)
 ti->is_const = true;
 ti->val = ts->val;
 ti->z_mask = ts->val;
+ti->s_mask = smask_from_value(ts->val);
 } else {
 ti->is_const = false;
 ti->z_mask = -1;
+ti->s_mask = 0;
 }
 }
 
@@ -220,6 +252,7 @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 op->args[1] = src;
 
 di->z_mask = si->z_mask;
+di->s_mask = si->s_mask;
 
 if (src_ts->type == dst_ts->type) {
 TempOptInfo *ni = ts_info(si->next_copy);
@@ -658,13 +691,15 @@ static void finish_folding(OptContext *ctx, TCGOp *op)
 
 nb_oargs = def->nb_oargs;
 for (i = 0; i < nb_oargs; i++) {
-reset_temp(op->args[i]);
+TCGTemp *ts = arg_temp(op->args[i]);
+reset_ts(ts);
 /*
- * Save the corresponding known-zero bits mask for the
+ * Save the corresponding known-zero/sign bits mask for the
  * first output argument (only one supported so far).
  */
 if (i == 0) {
-arg_info(op->args[i])->z_mask = ctx->z_mask;
+ts_info(ts)->z_mask = ctx->z_mask;
+ts_info(ts)->s_mask = ctx->s_mask;
 }
 }
 }
@@ -714,6 +749,7 @@ static bool fold_masks(OptContext *ctx, TCGOp *op)
 {
 uint64_t a_mask = ctx->a_mask;
 uint64_t z_mask = ctx->z_mask;
+uint64_t s_mask = ctx->s_mask;
 
 /*
  * 32-bit ops generate 32-bit results, which for the purpose of
@@ -725,7 +761,9 @@ static bool fold_masks(OptContext *ctx, TCGOp *op)
 if (ctx->type == TCG_TYPE_I32) {
 a_mask = (int32_t)a_mask;
 z_mask = (int32_t)z_mask;
+s_mask |= MAKE_64BIT_MASK(32, 32);
 ctx->z_mask = z_mask;
+ctx->s_mask = s_mask;
 }
 
 if (z_mask == 0) {
@@ -1072,7 +1110,7 @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
 
 static bool fold_bswap(OptContext *ctx, TCGOp *op)
 {
-uint64_t z_mask, sign;
+uint64_t z_mask, s_mask, sign;
 
 if (arg_is_const(op->args[1])) {
 uint64_t t = arg_info(op->args[1])->val;
@@ -1082,6 +1120,7 @@ static bool fold_bswap(OptContext *ctx, TCGOp *op)
 }
 
 z_mask = arg_info(op->args[1])->z_mask;
+
 switch (op->opc) {
 case INDEX_op_bswap16_i32:
 case INDEX_op_bswap16_i64:
@@ -1100,6 +1139,7 @@ static bool fold_bswap(OptContext *ctx, TCGOp *op)
 default:
 g_assert_not_reached();
 }
+s_mask = smask_from_zmask(z_mask);
 
 switch (op->args[2] & (TCG_BSWAP_OZ | TCG_BSWAP_OS)) {
 case TCG_BSWAP_OZ:
@@ -1108,14 +1148,17 @@ static bool fold_bswap(OptContext *ctx, TCGOp *op)
 /* If the sign bit may be 1, force all the bits above to 1. */
 if (z_mask & sign) {
 z_mask |= sign;
+s_mask 

[PULL 32/56] tcg/optimize: Split out fold_bswap

2021-10-27 Thread Richard Henderson
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index dd65f1afcd..5374c230da 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -841,6 +841,17 @@ static bool fold_brcond2(OptContext *ctx, TCGOp *op)
 return false;
 }
 
+static bool fold_bswap(OptContext *ctx, TCGOp *op)
+{
+if (arg_is_const(op->args[1])) {
+uint64_t t = arg_info(op->args[1])->val;
+
+t = do_constant_folding(op->opc, t, op->args[2]);
+return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+}
+return false;
+}
+
 static bool fold_call(OptContext *ctx, TCGOp *op)
 {
 TCGContext *s = ctx->tcg;
@@ -1742,17 +1753,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-CASE_OP_32_64(bswap16):
-CASE_OP_32_64(bswap32):
-case INDEX_op_bswap64_i64:
-if (arg_is_const(op->args[1])) {
-tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
-  op->args[2]);
-tcg_opt_gen_movi(, op, op->args[0], tmp);
-continue;
-}
-break;
-
 default:
 break;
 
@@ -1777,6 +1777,11 @@ void tcg_optimize(TCGContext *s)
 case INDEX_op_brcond2_i32:
 done = fold_brcond2(, op);
 break;
+CASE_OP_32_64(bswap16):
+CASE_OP_32_64(bswap32):
+case INDEX_op_bswap64_i64:
+done = fold_bswap(, op);
+break;
 CASE_OP_32_64(clz):
 CASE_OP_32_64(ctz):
 done = fold_count_zeros(, op);
-- 
2.25.1




[PULL 47/56] tcg/optimize: Stop forcing z_mask to "garbage" for 32-bit values

2021-10-27 Thread Richard Henderson
This "garbage" setting pre-dates the addition of the type
changing opcodes INDEX_op_ext_i32_i64, INDEX_op_extu_i32_i64,
and INDEX_op_extr{l,h}_i64_i32.

So now we have definitive points at which to adjust z_mask
to eliminate such bits from the 32-bit operands.
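
The representation choice in a few lines of standard C (two's complement
assumed): a 32-bit quantity held in 64-bit storage is kept sign-extended,
so its high bits are deterministic rather than garbage.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t v = 0x80000000u;
        uint64_t repr = (uint64_t)(int32_t)v;   /* sign-extend to 64 bits */
        printf("0x%016llx\n", (unsigned long long)repr);
        /* prints 0xffffffff80000000: bit 31 replicated, not undefined */
        return 0;
    }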

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 35 ---
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e42f5a145f..e0abf769d0 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -124,10 +124,6 @@ static void init_ts_info(OptContext *ctx, TCGTemp *ts)
 ti->is_const = true;
 ti->val = ts->val;
 ti->z_mask = ts->val;
-if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
-/* High bits of a 32-bit quantity are garbage.  */
-ti->z_mask |= ~0xull;
-}
 } else {
 ti->is_const = false;
 ti->z_mask = -1;
@@ -192,7 +188,6 @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 TCGTemp *src_ts = arg_temp(src);
 TempOptInfo *di;
 TempOptInfo *si;
-uint64_t z_mask;
 TCGOpcode new_op;
 
 if (ts_are_copies(dst_ts, src_ts)) {
@@ -224,12 +219,7 @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 op->args[0] = dst;
 op->args[1] = src;
 
-z_mask = si->z_mask;
-if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
-/* High bits of the destination are now garbage.  */
-z_mask |= ~0xull;
-}
-di->z_mask = z_mask;
+di->z_mask = si->z_mask;
 
 if (src_ts->type == dst_ts->type) {
 TempOptInfo *ni = ts_info(si->next_copy);
@@ -247,9 +237,14 @@ static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 static bool tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
  TCGArg dst, uint64_t val)
 {
-/* Convert movi to mov with constant temp. */
-TCGTemp *tv = tcg_constant_internal(ctx->type, val);
+TCGTemp *tv;
 
+if (ctx->type == TCG_TYPE_I32) {
+val = (int32_t)val;
+}
+
+/* Convert movi to mov with constant temp. */
+tv = tcg_constant_internal(ctx->type, val);
 init_ts_info(ctx, tv);
 return tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
 }
@@ -721,14 +716,16 @@ static bool fold_masks(OptContext *ctx, TCGOp *op)
 uint64_t z_mask = ctx->z_mask;
 
 /*
- * 32-bit ops generate 32-bit results.  For the result is zero test
- * below, we can ignore high bits, but for further optimizations we
- * need to record that the high bits contain garbage.
+ * 32-bit ops generate 32-bit results, which for the purpose of
+ * simplifying tcg are sign-extended.  Certainly that's how we
+ * represent our constants elsewhere.  Note that the bits will
+ * be reset properly for a 64-bit value when encountering the
+ * type changing opcodes.
  */
 if (ctx->type == TCG_TYPE_I32) {
-ctx->z_mask |= MAKE_64BIT_MASK(32, 32);
-a_mask &= MAKE_64BIT_MASK(0, 32);
-z_mask &= MAKE_64BIT_MASK(0, 32);
+a_mask = (int32_t)a_mask;
+z_mask = (int32_t)z_mask;
+ctx->z_mask = z_mask;
 }
 
 if (z_mask == 0) {
-- 
2.25.1




[PULL 45/56] tcg/optimize: Expand fold_addsub2_i32 to 64-bit ops

2021-10-27 Thread Richard Henderson
Rename to fold_addsub2.
Use Int128 to implement the wider operation.
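
A portable stand-in for the double-word arithmetic (roughly what
int128_add computes on a (lo, hi) pair):

    #include <stdint.h>
    #include <stdio.h>

    /* 128-bit add over two 64-bit halves, with explicit carry. */
    static void add128(uint64_t al, uint64_t ah, uint64_t bl, uint64_t bh,
                       uint64_t *rl, uint64_t *rh)
    {
        uint64_t lo = al + bl;
        *rl = lo;
        *rh = ah + bh + (lo < al);      /* carry out of the low word */
    }

    int main(void)
    {
        uint64_t rl, rh;
        add128(~0ull, 0, 1, 0, &rl, &rh);   /* (2^64 - 1) + 1 */
        printf("hi=%llu lo=%llu\n", (unsigned long long)rh,
               (unsigned long long)rl);      /* hi=1 lo=0 */
        return 0;
    }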

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 65 ++
 1 file changed, 44 insertions(+), 21 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e2ecad2884..f723deaafe 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -24,6 +24,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/int128.h"
 #include "tcg/tcg-op.h"
 #include "tcg-internal.h"
 
@@ -838,37 +839,59 @@ static bool fold_add(OptContext *ctx, TCGOp *op)
 return false;
 }
 
-static bool fold_addsub2_i32(OptContext *ctx, TCGOp *op, bool add)
+static bool fold_addsub2(OptContext *ctx, TCGOp *op, bool add)
 {
 if (arg_is_const(op->args[2]) && arg_is_const(op->args[3]) &&
 arg_is_const(op->args[4]) && arg_is_const(op->args[5])) {
-uint32_t al = arg_info(op->args[2])->val;
-uint32_t ah = arg_info(op->args[3])->val;
-uint32_t bl = arg_info(op->args[4])->val;
-uint32_t bh = arg_info(op->args[5])->val;
-uint64_t a = ((uint64_t)ah << 32) | al;
-uint64_t b = ((uint64_t)bh << 32) | bl;
+uint64_t al = arg_info(op->args[2])->val;
+uint64_t ah = arg_info(op->args[3])->val;
+uint64_t bl = arg_info(op->args[4])->val;
+uint64_t bh = arg_info(op->args[5])->val;
 TCGArg rl, rh;
-TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
+TCGOp *op2;
 
-if (add) {
-a += b;
+if (ctx->type == TCG_TYPE_I32) {
+uint64_t a = deposit64(al, 32, 32, ah);
+uint64_t b = deposit64(bl, 32, 32, bh);
+
+if (add) {
+a += b;
+} else {
+a -= b;
+}
+
+al = sextract64(a, 0, 32);
+ah = sextract64(a, 32, 32);
 } else {
-a -= b;
+Int128 a = int128_make128(al, ah);
+Int128 b = int128_make128(bl, bh);
+
+if (add) {
+a = int128_add(a, b);
+} else {
+a = int128_sub(a, b);
+}
+
+al = int128_getlo(a);
+ah = int128_gethi(a);
 }
 
 rl = op->args[0];
 rh = op->args[1];
-tcg_opt_gen_movi(ctx, op, rl, (int32_t)a);
-tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(a >> 32));
+
+/* The proper opcode is supplied by tcg_opt_gen_mov. */
+op2 = tcg_op_insert_before(ctx->tcg, op, 0);
+
+tcg_opt_gen_movi(ctx, op, rl, al);
+tcg_opt_gen_movi(ctx, op2, rh, ah);
 return true;
 }
 return false;
 }
 
-static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
+static bool fold_add2(OptContext *ctx, TCGOp *op)
 {
-return fold_addsub2_i32(ctx, op, true);
+return fold_addsub2(ctx, op, true);
 }
 
 static bool fold_and(OptContext *ctx, TCGOp *op)
@@ -1725,9 +1748,9 @@ static bool fold_sub(OptContext *ctx, TCGOp *op)
 return false;
 }
 
-static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
+static bool fold_sub2(OptContext *ctx, TCGOp *op)
 {
-return fold_addsub2_i32(ctx, op, false);
+return fold_addsub2(ctx, op, false);
 }
 
 static bool fold_tcg_ld(OptContext *ctx, TCGOp *op)
@@ -1873,8 +1896,8 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64_VEC(add):
 done = fold_add(, op);
 break;
-case INDEX_op_add2_i32:
-done = fold_add2_i32(, op);
+CASE_OP_32_64(add2):
+done = fold_add2(, op);
 break;
 CASE_OP_32_64_VEC(and):
 done = fold_and(, op);
@@ -2011,8 +2034,8 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64_VEC(sub):
 done = fold_sub(, op);
 break;
-case INDEX_op_sub2_i32:
-done = fold_sub2_i32(, op);
+CASE_OP_32_64(sub2):
+done = fold_sub2(, op);
 break;
 CASE_OP_32_64_VEC(xor):
 done = fold_xor(, op);
-- 
2.25.1




[PULL 29/56] tcg/optimize: Split out fold_extract, fold_sextract

2021-10-27 Thread Richard Henderson
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 48 ++--
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index faedbdbfb8..3bd5f043c8 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -888,6 +888,18 @@ static bool fold_eqv(OptContext *ctx, TCGOp *op)
 return fold_const2(ctx, op);
 }
 
+static bool fold_extract(OptContext *ctx, TCGOp *op)
+{
+if (arg_is_const(op->args[1])) {
+uint64_t t;
+
+t = arg_info(op->args[1])->val;
+t = extract64(t, op->args[2], op->args[3]);
+return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+}
+return false;
+}
+
 static bool fold_extract2(OptContext *ctx, TCGOp *op)
 {
 if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
@@ -1126,6 +1138,18 @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
 return tcg_opt_gen_movi(ctx, op, op->args[0], i);
 }
 
+static bool fold_sextract(OptContext *ctx, TCGOp *op)
+{
+if (arg_is_const(op->args[1])) {
+uint64_t t;
+
+t = arg_info(op->args[1])->val;
+t = sextract64(t, op->args[2], op->args[3]);
+return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+}
+return false;
+}
+
 static bool fold_shift(OptContext *ctx, TCGOp *op)
 {
 return fold_const2(ctx, op);
@@ -1727,24 +1751,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-CASE_OP_32_64(extract):
-if (arg_is_const(op->args[1])) {
-tmp = extract64(arg_info(op->args[1])->val,
-op->args[2], op->args[3]);
-tcg_opt_gen_movi(, op, op->args[0], tmp);
-continue;
-}
-break;
-
-CASE_OP_32_64(sextract):
-if (arg_is_const(op->args[1])) {
-tmp = sextract64(arg_info(op->args[1])->val,
- op->args[2], op->args[3]);
-tcg_opt_gen_movi(, op, op->args[0], tmp);
-continue;
-}
-break;
-
 default:
 break;
 
@@ -1779,6 +1785,9 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(eqv):
 done = fold_eqv(, op);
 break;
+CASE_OP_32_64(extract):
+done = fold_extract(, op);
+break;
 CASE_OP_32_64(extract2):
 done = fold_extract2(, op);
 break;
@@ -1856,6 +1865,9 @@ void tcg_optimize(TCGContext *s)
 case INDEX_op_setcond2_i32:
 done = fold_setcond2(, op);
 break;
+CASE_OP_32_64(sextract):
+done = fold_sextract(, op);
+break;
 CASE_OP_32_64_VEC(sub):
 done = fold_sub(, op);
 break;
-- 
2.25.1




[PULL 44/56] tcg/optimize: Expand fold_mulu2_i32 to all 4-arg multiplies

2021-10-27 Thread Richard Henderson
Rename to fold_multiply2, and handle muls2_i32, mulu2_i64,
and muls2_i64.
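
The semantics being folded, shown in plain C for the mulu2_i32 case: a
full 32x32->64 unsigned multiply whose result is split into a (low, high)
destination pair.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t a = 0xffffffffu, b = 2;
        uint64_t r = (uint64_t)a * b;
        uint32_t lo = (uint32_t)r;
        uint32_t hi = (uint32_t)(r >> 32);
        printf("hi=0x%x lo=0x%x\n", hi, lo);   /* hi=0x1 lo=0xfffffffe */
        return 0;
    }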

Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 44 +++-
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e84d10be53..e2ecad2884 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1412,19 +1412,44 @@ static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
 return false;
 }
 
-static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
+static bool fold_multiply2(OptContext *ctx, TCGOp *op)
 {
 if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
-uint32_t a = arg_info(op->args[2])->val;
-uint32_t b = arg_info(op->args[3])->val;
-uint64_t r = (uint64_t)a * b;
+uint64_t a = arg_info(op->args[2])->val;
+uint64_t b = arg_info(op->args[3])->val;
+uint64_t h, l;
 TCGArg rl, rh;
-TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
+TCGOp *op2;
+
+switch (op->opc) {
+case INDEX_op_mulu2_i32:
+l = (uint64_t)(uint32_t)a * (uint32_t)b;
+h = (int32_t)(l >> 32);
+l = (int32_t)l;
+break;
+case INDEX_op_muls2_i32:
+l = (int64_t)(int32_t)a * (int32_t)b;
+h = l >> 32;
+l = (int32_t)l;
+break;
+case INDEX_op_mulu2_i64:
+mulu64(, , a, b);
+break;
+case INDEX_op_muls2_i64:
+muls64(, , a, b);
+break;
+default:
+g_assert_not_reached();
+}
 
 rl = op->args[0];
 rh = op->args[1];
-tcg_opt_gen_movi(ctx, op, rl, (int32_t)r);
-tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(r >> 32));
+
+/* The proper opcode is supplied by tcg_opt_gen_mov. */
+op2 = tcg_op_insert_before(ctx->tcg, op, 0);
+
+tcg_opt_gen_movi(ctx, op, rl, l);
+tcg_opt_gen_movi(ctx, op2, rh, h);
 return true;
 }
 return false;
@@ -1932,8 +1957,9 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(muluh):
 done = fold_mul_highpart(, op);
 break;
-case INDEX_op_mulu2_i32:
-done = fold_mulu2_i32(, op);
+CASE_OP_32_64(muls2):
+CASE_OP_32_64(mulu2):
+done = fold_multiply2(, op);
 break;
 CASE_OP_32_64(nand):
 done = fold_nand(, op);
-- 
2.25.1




[PULL 49/56] tcg/optimize: Use fold_xi_to_x for mul

2021-10-27 Thread Richard Henderson
Recognize the identity function for low-part multiply.

Suggested-by: Luis Pires 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 6d795954f2..907049fb06 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1461,7 +1461,8 @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
 static bool fold_mul(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
-fold_xi_to_i(ctx, op, 0)) {
+fold_xi_to_i(ctx, op, 0) ||
+fold_xi_to_x(ctx, op, 1)) {
 return true;
 }
 return false;
-- 
2.25.1




[PULL 51/56] tcg/optimize: Use fold_xx_to_i for rem

2021-10-27 Thread Richard Henderson
Recognize the constant function for remainder.

Suggested-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index f8b0709157..7ac63c9231 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1624,7 +1624,11 @@ static bool fold_qemu_st(OptContext *ctx, TCGOp *op)
 
 static bool fold_remainder(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xx_to_i(ctx, op, 0)) {
+return true;
+}
+return false;
 }
 
 static bool fold_setcond(OptContext *ctx, TCGOp *op)
-- 
2.25.1




[PULL 26/56] tcg/optimize: Split out fold_addsub2_i32

2021-10-27 Thread Richard Henderson
Add two additional helpers, fold_add2_i32 and fold_sub2_i32,
which will not be simple wrappers forever.

Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 70 +++---
 1 file changed, 44 insertions(+), 26 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 805522f99d..9d1d045363 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -704,6 +704,39 @@ static bool fold_add(OptContext *ctx, TCGOp *op)
 return fold_const2(ctx, op);
 }
 
+static bool fold_addsub2_i32(OptContext *ctx, TCGOp *op, bool add)
+{
+if (arg_is_const(op->args[2]) && arg_is_const(op->args[3]) &&
+arg_is_const(op->args[4]) && arg_is_const(op->args[5])) {
+uint32_t al = arg_info(op->args[2])->val;
+uint32_t ah = arg_info(op->args[3])->val;
+uint32_t bl = arg_info(op->args[4])->val;
+uint32_t bh = arg_info(op->args[5])->val;
+uint64_t a = ((uint64_t)ah << 32) | al;
+uint64_t b = ((uint64_t)bh << 32) | bl;
+TCGArg rl, rh;
+TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
+
+if (add) {
+a += b;
+} else {
+a -= b;
+}
+
+rl = op->args[0];
+rh = op->args[1];
+tcg_opt_gen_movi(ctx, op, rl, (int32_t)a);
+tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(a >> 32));
+return true;
+}
+return false;
+}
+
+static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
+{
+return fold_addsub2_i32(ctx, op, true);
+}
+
 static bool fold_and(OptContext *ctx, TCGOp *op)
 {
 return fold_const2(ctx, op);
@@ -1056,6 +1089,11 @@ static bool fold_sub(OptContext *ctx, TCGOp *op)
 return fold_const2(ctx, op);
 }
 
+static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
+{
+return fold_addsub2_i32(ctx, op, false);
+}
+
 static bool fold_xor(OptContext *ctx, TCGOp *op)
 {
 return fold_const2(ctx, op);
@@ -1701,32 +1739,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-case INDEX_op_add2_i32:
-case INDEX_op_sub2_i32:
-if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])
-&& arg_is_const(op->args[4]) && arg_is_const(op->args[5])) {
-uint32_t al = arg_info(op->args[2])->val;
-uint32_t ah = arg_info(op->args[3])->val;
-uint32_t bl = arg_info(op->args[4])->val;
-uint32_t bh = arg_info(op->args[5])->val;
-uint64_t a = ((uint64_t)ah << 32) | al;
-uint64_t b = ((uint64_t)bh << 32) | bl;
-TCGArg rl, rh;
-TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
-
-if (opc == INDEX_op_add2_i32) {
-a += b;
-} else {
-a -= b;
-}
-
-rl = op->args[0];
-rh = op->args[1];
-tcg_opt_gen_movi(, op, rl, (int32_t)a);
-tcg_opt_gen_movi(, op2, rh, (int32_t)(a >> 32));
-continue;
-}
-break;
 
 default:
 break;
@@ -1737,6 +1749,9 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64_VEC(add):
 done = fold_add(, op);
 break;
+case INDEX_op_add2_i32:
+done = fold_add2_i32(, op);
+break;
 CASE_OP_32_64_VEC(and):
 done = fold_and(, op);
 break;
@@ -1833,6 +1848,9 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64_VEC(sub):
 done = fold_sub(, op);
 break;
+case INDEX_op_sub2_i32:
+done = fold_sub2_i32(, op);
+break;
 CASE_OP_32_64_VEC(xor):
 done = fold_xor(, op);
 break;
-- 
2.25.1




[PULL 50/56] tcg/optimize: Use fold_xi_to_x for div

2021-10-27 Thread Richard Henderson
Recognize the identity function for division.

Suggested-by: Luis Pires 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 907049fb06..f8b0709157 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1218,7 +1218,11 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
 
 static bool fold_divide(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xi_to_x(ctx, op, 1)) {
+return true;
+}
+return false;
 }
 
 static bool fold_dup(OptContext *ctx, TCGOp *op)
-- 
2.25.1




[PULL 40/56] tcg/optimize: Split out fold_sub_to_neg

2021-10-27 Thread Richard Henderson
Even though there is only one user, place this more complex
conversion into its own helper.
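
The identity behind the conversion, in plain C: in two's complement,
subtracting from zero is negation, so "sub r, 0, b" can become
"neg r, b" whenever the host has a suitable neg op.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t b = 42;
        printf("%d\n", (uint64_t)0 - b == (uint64_t)-b);   /* 1 */
        return 0;
    }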

Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 89 ++
 1 file changed, 47 insertions(+), 42 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 21f4251b4f..e0d850ffe4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1203,7 +1203,15 @@ static bool fold_nand(OptContext *ctx, TCGOp *op)
 
 static bool fold_neg(OptContext *ctx, TCGOp *op)
 {
-return fold_const1(ctx, op);
+if (fold_const1(ctx, op)) {
+return true;
+}
+/*
+ * Because of fold_sub_to_neg, we want to always return true,
+ * via finish_folding.
+ */
+finish_folding(ctx, op);
+return true;
 }
 
 static bool fold_nor(OptContext *ctx, TCGOp *op)
@@ -1360,10 +1368,47 @@ static bool fold_shift(OptContext *ctx, TCGOp *op)
 return fold_const2(ctx, op);
 }
 
+static bool fold_sub_to_neg(OptContext *ctx, TCGOp *op)
+{
+TCGOpcode neg_op;
+bool have_neg;
+
+if (!arg_is_const(op->args[1]) || arg_info(op->args[1])->val != 0) {
+return false;
+}
+
+switch (ctx->type) {
+case TCG_TYPE_I32:
+neg_op = INDEX_op_neg_i32;
+have_neg = TCG_TARGET_HAS_neg_i32;
+break;
+case TCG_TYPE_I64:
+neg_op = INDEX_op_neg_i64;
+have_neg = TCG_TARGET_HAS_neg_i64;
+break;
+case TCG_TYPE_V64:
+case TCG_TYPE_V128:
+case TCG_TYPE_V256:
+neg_op = INDEX_op_neg_vec;
+have_neg = (TCG_TARGET_HAS_neg_vec &&
+tcg_can_emit_vec_op(neg_op, ctx->type, TCGOP_VECE(op)) > 0);
+break;
+default:
+g_assert_not_reached();
+}
+if (have_neg) {
+op->opc = neg_op;
+op->args[1] = op->args[2];
+return fold_neg(ctx, op);
+}
+return false;
+}
+
 static bool fold_sub(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
-fold_xx_to_i(ctx, op, 0)) {
+fold_xx_to_i(ctx, op, 0) ||
+fold_sub_to_neg(ctx, op)) {
 return true;
 }
 return false;
@@ -1497,46 +1542,6 @@ void tcg_optimize(TCGContext *s)
 continue;
 }
 break;
-CASE_OP_32_64_VEC(sub):
-{
-TCGOpcode neg_op;
-bool have_neg;
-
-if (arg_is_const(op->args[2])) {
-/* Proceed with possible constant folding. */
-break;
-}
-switch (ctx.type) {
-case TCG_TYPE_I32:
-neg_op = INDEX_op_neg_i32;
-have_neg = TCG_TARGET_HAS_neg_i32;
-break;
-case TCG_TYPE_I64:
-neg_op = INDEX_op_neg_i64;
-have_neg = TCG_TARGET_HAS_neg_i64;
-break;
-case TCG_TYPE_V64:
-case TCG_TYPE_V128:
-case TCG_TYPE_V256:
-neg_op = INDEX_op_neg_vec;
-have_neg = tcg_can_emit_vec_op(neg_op, ctx.type,
-   TCGOP_VECE(op)) > 0;
-break;
-default:
-g_assert_not_reached();
-}
-if (!have_neg) {
-break;
-}
-if (arg_is_const(op->args[1])
-&& arg_info(op->args[1])->val == 0) {
-op->opc = neg_op;
-reset_temp(op->args[0]);
-op->args[1] = op->args[2];
-continue;
-}
-}
-break;
 default:
 break;
 }
-- 
2.25.1




[PULL 48/56] tcg/optimize: Use fold_xx_to_i for orc

2021-10-27 Thread Richard Henderson
Recognize the constant function for or-complement.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e0abf769d0..6d795954f2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1586,6 +1586,7 @@ static bool fold_or(OptContext *ctx, TCGOp *op)
 static bool fold_orc(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
+fold_xx_to_i(ctx, op, -1) ||
 fold_xi_to_x(ctx, op, -1) ||
 fold_ix_to_not(ctx, op, 0)) {
 return true;
-- 
2.25.1




[PULL 19/56] tcg/optimize: Split out fold_mb, fold_qemu_{ld,st}

2021-10-27 Thread Richard Henderson
This puts the separate mb optimization into the same framework
as the others.  While fold_qemu_{ld,st} are currently identical,
that won't last as more code gets moved.
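
A sketch of the merge rule with made-up barrier-kind bits (TCG's real mb
argument is likewise a bitmask of ordering kinds):

    #include <stdio.h>

    enum { BAR_LD_LD = 1, BAR_LD_ST = 2, BAR_ST_LD = 4, BAR_ST_ST = 8 };

    int main(void)
    {
        /* "mb X; mb Y => mb X|Y": the union is at least as strong as both. */
        int x = BAR_LD_LD | BAR_LD_ST;
        int y = BAR_ST_ST;
        printf("merged=0x%x\n", x | y);   /* 0xb */
        return 0;
    }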

Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 89 +-
 1 file changed, 51 insertions(+), 38 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 699476e2f1..159a5a9ee5 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -692,6 +692,44 @@ static bool fold_call(OptContext *ctx, TCGOp *op)
 return true;
 }
 
+static bool fold_mb(OptContext *ctx, TCGOp *op)
+{
+/* Eliminate duplicate and redundant fence instructions.  */
+if (ctx->prev_mb) {
+/*
+ * Merge two barriers of the same type into one,
+ * or a weaker barrier into a stronger one,
+ * or two weaker barriers into a stronger one.
+ *   mb X; mb Y => mb X|Y
+ *   mb; strl => mb; st
+ *   ldaq; mb => ld; mb
+ *   ldaq; strl => ld; mb; st
+ * Other combinations are also merged into a strong
+ * barrier.  This is stricter than specified but for
+ * the purposes of TCG is better than not optimizing.
+ */
+ctx->prev_mb->args[0] |= op->args[0];
+tcg_op_remove(ctx->tcg, op);
+} else {
+ctx->prev_mb = op;
+}
+return true;
+}
+
+static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
+{
+/* Opcodes that touch guest memory stop the mb optimization.  */
+ctx->prev_mb = NULL;
+return false;
+}
+
+static bool fold_qemu_st(OptContext *ctx, TCGOp *op)
+{
+/* Opcodes that touch guest memory stop the mb optimization.  */
+ctx->prev_mb = NULL;
+return false;
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
@@ -1599,6 +1637,19 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
+case INDEX_op_mb:
+done = fold_mb(, op);
+break;
+case INDEX_op_qemu_ld_i32:
+case INDEX_op_qemu_ld_i64:
+done = fold_qemu_ld(, op);
+break;
+case INDEX_op_qemu_st_i32:
+case INDEX_op_qemu_st8_i32:
+case INDEX_op_qemu_st_i64:
+done = fold_qemu_st(, op);
+break;
+
 default:
 break;
 }
@@ -1606,43 +1657,5 @@ void tcg_optimize(TCGContext *s)
 if (!done) {
 finish_folding(, op);
 }
-
-/* Eliminate duplicate and redundant fence instructions.  */
-if (ctx.prev_mb) {
-switch (opc) {
-case INDEX_op_mb:
-/* Merge two barriers of the same type into one,
- * or a weaker barrier into a stronger one,
- * or two weaker barriers into a stronger one.
- *   mb X; mb Y => mb X|Y
- *   mb; strl => mb; st
- *   ldaq; mb => ld; mb
- *   ldaq; strl => ld; mb; st
- * Other combinations are also merged into a strong
- * barrier.  This is stricter than specified but for
- * the purposes of TCG is better than not optimizing.
- */
-ctx.prev_mb->args[0] |= op->args[0];
-tcg_op_remove(s, op);
-break;
-
-default:
-/* Opcodes that end the block stop the optimization.  */
-if ((def->flags & TCG_OPF_BB_END) == 0) {
-break;
-}
-/* fallthru */
-case INDEX_op_qemu_ld_i32:
-case INDEX_op_qemu_ld_i64:
-case INDEX_op_qemu_st_i32:
-case INDEX_op_qemu_st8_i32:
-case INDEX_op_qemu_st_i64:
-/* Opcodes that touch guest memory stop the optimization.  */
-ctx.prev_mb = NULL;
-break;
-}
-} else if (opc == INDEX_op_mb) {
-ctx.prev_mb = op;
-}
 }
 }
-- 
2.25.1




[PULL 46/56] tcg/optimize: Sink commutative operand swapping into fold functions

2021-10-27 Thread Richard Henderson
Most of these are handled by creating a fold_const2_commutative
to handle all of the binary operators.  The rest were already
handled on a case-by-case basis in the switch, and have their
own fold function in which to place the call.

We now have only one major switch on TCGOpcode.

Introduce NO_DEST and a block comment for swap_commutative in
order to make the handling of brcond and movcond opcodes cleaner.
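
A simplified model of the swap rules (Arg and the ids below are
hypothetical stand-ins; the real swap_commutative also breaks ties
between two constants):

    #include <stdint.h>
    #include <stdio.h>

    typedef struct { int is_const; uint64_t val; int id; } Arg;

    /* Move a constant to the second slot; prefer the destination first. */
    static int swap_commutative_sketch(int dest_id, Arg *p1, Arg *p2)
    {
        int swap = (p1->is_const && !p2->is_const) || p2->id == dest_id;
        if (swap) {
            Arg t = *p1;
            *p1 = *p2;
            *p2 = t;
        }
        return swap;
    }

    int main(void)
    {
        Arg a = { 1, 5, -1 };   /* constant 5 */
        Arg b = { 0, 0, 7 };    /* temp t7 */
        printf("%d\n", swap_commutative_sketch(3, &a, &b));   /* 1: swapped */
        printf("%d\n", a.is_const);                           /* 0 */
        return 0;
    }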

Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 142 -
 1 file changed, 70 insertions(+), 72 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index f723deaafe..e42f5a145f 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -577,6 +577,19 @@ static int do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
 return -1;
 }
 
+/**
+ * swap_commutative:
+ * @dest: TCGArg of the destination argument, or NO_DEST.
+ * @p1: first paired argument
+ * @p2: second paired argument
+ *
+ * If *@p1 is a constant and *@p2 is not, swap.
+ * If *@p2 matches @dest, swap.
+ * Return true if a swap was performed.
+ */
+
+#define NO_DEST  temp_arg(NULL)
+
 static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
 {
 TCGArg a1 = *p1, a2 = *p2;
@@ -696,6 +709,12 @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
 return false;
 }
 
+static bool fold_const2_commutative(OptContext *ctx, TCGOp *op)
+{
+swap_commutative(op->args[0], >args[1], >args[2]);
+return fold_const2(ctx, op);
+}
+
 static bool fold_masks(OptContext *ctx, TCGOp *op)
 {
 uint64_t a_mask = ctx->a_mask;
@@ -832,7 +851,7 @@ static bool fold_xx_to_x(OptContext *ctx, TCGOp *op)
 
 static bool fold_add(OptContext *ctx, TCGOp *op)
 {
-if (fold_const2(ctx, op) ||
+if (fold_const2_commutative(ctx, op) ||
 fold_xi_to_x(ctx, op, 0)) {
 return true;
 }
@@ -891,6 +910,10 @@ static bool fold_addsub2(OptContext *ctx, TCGOp *op, bool add)
 
 static bool fold_add2(OptContext *ctx, TCGOp *op)
 {
+/* Note that the high and low parts may be independently swapped. */
+swap_commutative(op->args[0], >args[2], >args[4]);
+swap_commutative(op->args[1], >args[3], >args[5]);
+
 return fold_addsub2(ctx, op, true);
 }
 
@@ -898,7 +921,7 @@ static bool fold_and(OptContext *ctx, TCGOp *op)
 {
 uint64_t z1, z2;
 
-if (fold_const2(ctx, op) ||
+if (fold_const2_commutative(ctx, op) ||
 fold_xi_to_i(ctx, op, 0) ||
 fold_xi_to_x(ctx, op, -1) ||
 fold_xx_to_x(ctx, op)) {
@@ -950,8 +973,13 @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
 static bool fold_brcond(OptContext *ctx, TCGOp *op)
 {
 TCGCond cond = op->args[2];
-int i = do_constant_folding_cond(ctx->type, op->args[0], op->args[1], cond);
+int i;
 
+if (swap_commutative(NO_DEST, >args[0], >args[1])) {
+op->args[2] = cond = tcg_swap_cond(cond);
+}
+
+i = do_constant_folding_cond(ctx->type, op->args[0], op->args[1], cond);
 if (i == 0) {
 tcg_op_remove(ctx->tcg, op);
 return true;
@@ -966,10 +994,14 @@ static bool fold_brcond(OptContext *ctx, TCGOp *op)
 static bool fold_brcond2(OptContext *ctx, TCGOp *op)
 {
 TCGCond cond = op->args[4];
-int i = do_constant_folding_cond2(>args[0], >args[2], cond);
 TCGArg label = op->args[5];
-int inv = 0;
+int i, inv = 0;
 
+if (swap_commutative2(>args[0], >args[2])) {
+op->args[4] = cond = tcg_swap_cond(cond);
+}
+
+i = do_constant_folding_cond2(>args[0], >args[2], cond);
 if (i >= 0) {
 goto do_brcond_const;
 }
@@ -1219,7 +1251,7 @@ static bool fold_dup2(OptContext *ctx, TCGOp *op)
 
 static bool fold_eqv(OptContext *ctx, TCGOp *op)
 {
-if (fold_const2(ctx, op) ||
+if (fold_const2_commutative(ctx, op) ||
 fold_xi_to_x(ctx, op, -1) ||
 fold_xi_to_not(ctx, op, 0)) {
 return true;
@@ -1381,8 +1413,20 @@ static bool fold_mov(OptContext *ctx, TCGOp *op)
 static bool fold_movcond(OptContext *ctx, TCGOp *op)
 {
 TCGCond cond = op->args[5];
-int i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
+int i;
 
+if (swap_commutative(NO_DEST, >args[1], >args[2])) {
+op->args[5] = cond = tcg_swap_cond(cond);
+}
+/*
+ * Canonicalize the "false" input reg to match the destination reg so
+ * that the tcg backend can implement a "move if true" operation.
+ */
+if (swap_commutative(op->args[0], >args[4], >args[3])) {
+op->args[5] = cond = tcg_invert_cond(cond);
+}
+
+i = do_constant_folding_cond(ctx->type, op->args[1], op->args[2], cond);
 if (i >= 0) {
 return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
 }
@@ -1428,7 +1472,7 @@ static bool fold_mul(OptContext *ctx, TCGOp *op)
 
 static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
 {
-if (fold_const2(ctx, op) ||
+if (fold_const2_commutative(ctx, op) ||
 fold_xi_to_i(ctx, op, 0)) {
   

[PULL 39/56] tcg/optimize: Split out fold_to_not

2021-10-27 Thread Richard Henderson
Split out the conditional conversion from a more complex logical
operation to a simple NOT.  Create a couple more helpers to make
this easy for the outer-most logical operations.
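
The algebraic identities involved, checked in plain C with eqv, andc and
orc written out via ~, & and |:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t a = 0x1234, b = 0xabcd;
        printf("%d\n", (a ^ ~(uint64_t)0) == ~a);   /* xor(a, -1) == not(a)  */
        printf("%d\n", ~(a ^ 0) == ~a);             /* eqv(a, 0)  == not(a)  */
        printf("%d\n", (~(uint64_t)0 & ~b) == ~b);  /* andc(-1, b) == not(b) */
        printf("%d\n", ((uint64_t)0 | ~b) == ~b);   /* orc(0, b)  == not(b)  */
        return 0;
    }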

Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 158 +++--
 1 file changed, 86 insertions(+), 72 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e869fa7e78..21f4251b4f 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -694,6 +694,52 @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
 return false;
 }
 
+/*
+ * Convert @op to NOT, if NOT is supported by the host.
+ * Return true if the conversion is successful, which will still
+ * indicate that the processing is complete.
+ */
+static bool fold_not(OptContext *ctx, TCGOp *op);
+static bool fold_to_not(OptContext *ctx, TCGOp *op, int idx)
+{
+TCGOpcode not_op;
+bool have_not;
+
+switch (ctx->type) {
+case TCG_TYPE_I32:
+not_op = INDEX_op_not_i32;
+have_not = TCG_TARGET_HAS_not_i32;
+break;
+case TCG_TYPE_I64:
+not_op = INDEX_op_not_i64;
+have_not = TCG_TARGET_HAS_not_i64;
+break;
+case TCG_TYPE_V64:
+case TCG_TYPE_V128:
+case TCG_TYPE_V256:
+not_op = INDEX_op_not_vec;
+have_not = TCG_TARGET_HAS_not_vec;
+break;
+default:
+g_assert_not_reached();
+}
+if (have_not) {
+op->opc = not_op;
+op->args[1] = op->args[idx];
+return fold_not(ctx, op);
+}
+return false;
+}
+
+/* If the binary operation has first argument @i, fold to NOT. */
+static bool fold_ix_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+if (arg_is_const(op->args[1]) && arg_info(op->args[1])->val == i) {
+return fold_to_not(ctx, op, 2);
+}
+return false;
+}
+
 /* If the binary operation has second argument @i, fold to @i. */
 static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 {
@@ -703,6 +749,15 @@ static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 return false;
 }
 
+/* If the binary operation has second argument @i, fold to NOT. */
+static bool fold_xi_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
+return fold_to_not(ctx, op, 1);
+}
+return false;
+}
+
 /* If the binary operation has both arguments equal, fold to @i. */
 static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 {
@@ -781,7 +836,8 @@ static bool fold_and(OptContext *ctx, TCGOp *op)
 static bool fold_andc(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
-fold_xx_to_i(ctx, op, 0)) {
+fold_xx_to_i(ctx, op, 0) ||
+fold_ix_to_not(ctx, op, -1)) {
 return true;
 }
 return false;
@@ -987,7 +1043,11 @@ static bool fold_dup2(OptContext *ctx, TCGOp *op)
 
 static bool fold_eqv(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xi_to_not(ctx, op, 0)) {
+return true;
+}
+return false;
 }
 
 static bool fold_extract(OptContext *ctx, TCGOp *op)
@@ -1134,7 +1194,11 @@ static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
 
 static bool fold_nand(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xi_to_not(ctx, op, -1)) {
+return true;
+}
+return false;
 }
 
 static bool fold_neg(OptContext *ctx, TCGOp *op)
@@ -1144,12 +1208,22 @@ static bool fold_neg(OptContext *ctx, TCGOp *op)
 
 static bool fold_nor(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xi_to_not(ctx, op, 0)) {
+return true;
+}
+return false;
 }
 
 static bool fold_not(OptContext *ctx, TCGOp *op)
 {
-return fold_const1(ctx, op);
+if (fold_const1(ctx, op)) {
+return true;
+}
+
+/* Because of fold_to_not, we want to always return true, via finish. */
+finish_folding(ctx, op);
+return true;
 }
 
 static bool fold_or(OptContext *ctx, TCGOp *op)
@@ -1163,7 +1237,11 @@ static bool fold_or(OptContext *ctx, TCGOp *op)
 
 static bool fold_orc(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_ix_to_not(ctx, op, 0)) {
+return true;
+}
+return false;
 }
 
 static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
@@ -1299,7 +1377,8 @@ static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
 static bool fold_xor(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
-fold_xx_to_i(ctx, op, 0)) {
+fold_xx_to_i(ctx, op, 0) ||
+fold_xi_to_not(ctx, op, -1)) {
 return true;
 }
 return false;
@@ -1458,71 +1537,6 @@ void tcg_optimize(TCGContext *s)
 }
 }
 break;
-CASE_OP_32_64_VEC(xor):
-CASE_OP_32_64(nand):
-if 

[PULL 42/56] tcg/optimize: Split out fold_ix_to_i

2021-10-27 Thread Richard Henderson
Pull the "op r, 0, b => movi r, 0" optimization into a function,
and use it in fold_shift.
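
For instance (illustration only), a shift of a constant zero folds
away regardless of the shift count:

    shl r, 0, b  ->  movi r, 0    /* 0 << b == 0 for any b */
    sar r, 0, b  ->  movi r, 0    /* 0 >> b == 0 for any b */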

Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 28 ++--
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index f5ab0500b7..bf74b77355 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -731,6 +731,15 @@ static bool fold_to_not(OptContext *ctx, TCGOp *op, int idx)
 return false;
 }
 
+/* If the binary operation has first argument @i, fold to @i. */
+static bool fold_ix_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+if (arg_is_const(op->args[1]) && arg_info(op->args[1])->val == i) {
+return tcg_opt_gen_movi(ctx, op, op->args[0], i);
+}
+return false;
+}
+
 /* If the binary operation has first argument @i, fold to NOT. */
 static bool fold_ix_to_not(OptContext *ctx, TCGOp *op, uint64_t i)
 {
@@ -1384,6 +1393,7 @@ static bool fold_sextract(OptContext *ctx, TCGOp *op)
 static bool fold_shift(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
+fold_ix_to_i(ctx, op, 0) ||
 fold_xi_to_x(ctx, op, 0)) {
 return true;
 }
@@ -1552,24 +1562,6 @@ void tcg_optimize(TCGContext *s)
 break;
 }
 
-/* Simplify expressions for "shift/rot r, 0, a => movi r, 0",
-   and "sub r, 0, a => neg r, a" case.  */
-switch (opc) {
-CASE_OP_32_64(shl):
-CASE_OP_32_64(shr):
-CASE_OP_32_64(sar):
-CASE_OP_32_64(rotl):
-CASE_OP_32_64(rotr):
-if (arg_is_const(op->args[1])
-&& arg_info(op->args[1])->val == 0) {
-tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
-continue;
-}
-break;
-default:
-break;
-}
-
 /* Simplify using known-zero bits. Currently only ops with a single
output argument is supported. */
 z_mask = -1;
-- 
2.25.1




[PULL 17/56] tcg/optimize: Split out finish_folding

2021-10-27 Thread Richard Henderson
Copy z_mask into OptContext, for writeback to the
first output within the new function.
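
The intended flow is roughly (a sketch using names from this patch):

    ctx.z_mask = z_mask;        /* computed while examining the op */
    ...
    finish_folding(&ctx, op);   /* resets the output temps and records
                                   z_mask on the first output */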

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 49 +
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 066e635f73..368457f4a2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -48,6 +48,9 @@ typedef struct OptContext {
 TCGContext *tcg;
 TCGOp *prev_mb;
 TCGTempSet temps_used;
+
+/* In flight values from optimization. */
+uint64_t z_mask;
 } OptContext;
 
 static inline TempOptInfo *ts_info(TCGTemp *ts)
@@ -629,6 +632,34 @@ static void copy_propagate(OptContext *ctx, TCGOp *op,
 }
 }
 
+static void finish_folding(OptContext *ctx, TCGOp *op)
+{
+const TCGOpDef *def = &tcg_op_defs[op->opc];
+int i, nb_oargs;
+
+/*
+ * For an opcode that ends a BB, reset all temp data.
+ * We do no cross-BB optimization.
+ */
+if (def->flags & TCG_OPF_BB_END) {
+memset(&ctx->temps_used, 0, sizeof(ctx->temps_used));
+ctx->prev_mb = NULL;
+return;
+}
+
+nb_oargs = def->nb_oargs;
+for (i = 0; i < nb_oargs; i++) {
+reset_temp(op->args[i]);
+/*
+ * Save the corresponding known-zero bits mask for the
+ * first output argument (only one supported so far).
+ */
+if (i == 0) {
+arg_info(op->args[i])->z_mask = ctx->z_mask;
+}
+}
+}
+
 static bool fold_call(OptContext *ctx, TCGOp *op)
 {
 TCGContext *s = ctx->tcg;
@@ -1122,6 +1153,7 @@ void tcg_optimize(TCGContext *s)
partmask &= 0xffffffffu;
affected &= 0xffffffffu;
 }
+ctx.z_mask = z_mask;
 
 if (partmask == 0) {
tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
@@ -1570,22 +1602,7 @@ void tcg_optimize(TCGContext *s)
 break;
 }
 
-/* Some of the folding above can change opc. */
-opc = op->opc;
-def = &tcg_op_defs[opc];
-if (def->flags & TCG_OPF_BB_END) {
-memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-} else {
-int nb_oargs = def->nb_oargs;
-for (i = 0; i < nb_oargs; i++) {
-reset_temp(op->args[i]);
-/* Save the corresponding known-zero bits mask for the
-   first output argument (only one supported so far). */
-if (i == 0) {
-arg_info(op->args[i])->z_mask = z_mask;
-}
-}
-}
+finish_folding(&ctx, op);
 
 /* Eliminate duplicate and redundant fence instructions.  */
 if (ctx.prev_mb) {
-- 
2.25.1




[PULL 31/56] tcg/optimize: Split out fold_count_zeros

2021-10-27 Thread Richard Henderson
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2c57d08760..dd65f1afcd 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -873,6 +873,20 @@ static bool fold_call(OptContext *ctx, TCGOp *op)
 return true;
 }
 
+static bool fold_count_zeros(OptContext *ctx, TCGOp *op)
+{
+if (arg_is_const(op->args[1])) {
+uint64_t t = arg_info(op->args[1])->val;
+
+if (t != 0) {
+t = do_constant_folding(op->opc, t, 0);
+return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+}
+return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[2]);
+}
+return false;
+}
+
 static bool fold_ctpop(OptContext *ctx, TCGOp *op)
 {
 return fold_const1(ctx, op);
@@ -1739,20 +1753,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-CASE_OP_32_64(clz):
-CASE_OP_32_64(ctz):
-if (arg_is_const(op->args[1])) {
-TCGArg v = arg_info(op->args[1])->val;
-if (v != 0) {
-tmp = do_constant_folding(opc, v, 0);
-tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-} else {
-tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[2]);
-}
-continue;
-}
-break;
-
 default:
 break;
 
@@ -1777,6 +1777,10 @@ void tcg_optimize(TCGContext *s)
 case INDEX_op_brcond2_i32:
done = fold_brcond2(&ctx, op);
 break;
+CASE_OP_32_64(clz):
+CASE_OP_32_64(ctz):
+done = fold_count_zeros(&ctx, op);
+break;
 CASE_OP_32_64(ctpop):
done = fold_ctpop(&ctx, op);
 break;
-- 
2.25.1




[PULL 37/56] tcg/optimize: Split out fold_xi_to_i

2021-10-27 Thread Richard Henderson
Pull the "op r, a, 0 => movi r, 0" optimization into a function,
and use it in the outer opcode fold functions.
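
Concretely (illustration only):

    and r, a, 0  ->  movi r, 0    /* a & 0 == 0 */
    mul r, a, 0  ->  movi r, 0    /* a * 0 == 0 */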

Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index ab96849edf..cfdc53c964 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -695,6 +695,15 @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
 return false;
 }
 
+/* If the binary operation has second argument @i, fold to @i. */
+static bool fold_xi_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == i) {
+return tcg_opt_gen_movi(ctx, op, op->args[0], i);
+}
+return false;
+}
+
 /* If the binary operation has both arguments equal, fold to @i. */
 static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 {
@@ -763,6 +772,7 @@ static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
 static bool fold_and(OptContext *ctx, TCGOp *op)
 {
 if (fold_const2(ctx, op) ||
+fold_xi_to_i(ctx, op, 0) ||
 fold_xx_to_x(ctx, op)) {
 return true;
 }
@@ -1081,12 +1091,20 @@ static bool fold_movcond(OptContext *ctx, TCGOp *op)
 
 static bool fold_mul(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xi_to_i(ctx, op, 0)) {
+return true;
+}
+return false;
 }
 
 static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xi_to_i(ctx, op, 0)) {
+return true;
+}
+return false;
 }
 
 static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
@@ -1753,22 +1771,6 @@ void tcg_optimize(TCGContext *s)
 continue;
 }
 
-/* Simplify expression for "op r, a, 0 => movi r, 0" cases */
-switch (opc) {
-CASE_OP_32_64_VEC(and):
-CASE_OP_32_64_VEC(mul):
-CASE_OP_32_64(muluh):
-CASE_OP_32_64(mulsh):
-if (arg_is_const(op->args[2])
-&& arg_info(op->args[2])->val == 0) {
-tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
-continue;
-}
-break;
-default:
-break;
-}
-
 /*
  * Process each opcode.
  * Sorted alphabetically by opcode as much as possible.
-- 
2.25.1




[PULL 15/56] tcg/optimize: Change fail return for do_constant_folding_cond*

2021-10-27 Thread Richard Henderson
Return -1 instead of 2 for failure, so that we can
use comparisons against 0 for all cases.
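
This gives every call site the same shape (a sketch):

    int i = do_constant_folding_cond(opc, x, y, cond);
    if (i >= 0) {
        /* result known: i is 0 or 1 */
    } else {
        /* i == -1: the condition could not be simplified */
    }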

Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 145 +
 1 file changed, 74 insertions(+), 71 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 77cdffaaef..19c01687b4 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -502,10 +502,12 @@ static bool do_constant_folding_cond_eq(TCGCond c)
 }
 }
 
-/* Return 2 if the condition can't be simplified, and the result
-   of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
-   TCGArg y, TCGCond c)
+/*
+ * Return -1 if the condition can't be simplified,
+ * and the result of the condition (0 or 1) if it can.
+ */
+static int do_constant_folding_cond(TCGOpcode op, TCGArg x,
+TCGArg y, TCGCond c)
 {
 uint64_t xv = arg_info(x)->val;
 uint64_t yv = arg_info(y)->val;
@@ -527,15 +529,17 @@ static TCGArg do_constant_folding_cond(TCGOpcode op, TCGArg x,
 case TCG_COND_GEU:
 return 1;
 default:
-return 2;
+return -1;
 }
 }
-return 2;
+return -1;
 }
 
-/* Return 2 if the condition can't be simplified, and the result
-   of the condition (0 or 1) if it can */
-static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
+/*
+ * Return -1 if the condition can't be simplified,
+ * and the result of the condition (0 or 1) if it can.
+ */
+static int do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
 {
 TCGArg al = p1[0], ah = p1[1];
 TCGArg bl = p2[0], bh = p2[1];
@@ -565,7 +569,7 @@ static TCGArg do_constant_folding_cond2(TCGArg *p1, TCGArg *p2, TCGCond c)
 if (args_are_copies(al, bl) && args_are_copies(ah, bh)) {
 return do_constant_folding_cond_eq(c);
 }
-return 2;
+return -1;
 }
 
 static bool swap_commutative(TCGArg dest, TCGArg *p1, TCGArg *p2)
@@ -1321,22 +1325,21 @@ void tcg_optimize(TCGContext *s)
 break;
 
 CASE_OP_32_64(setcond):
-tmp = do_constant_folding_cond(opc, op->args[1],
-   op->args[2], op->args[3]);
-if (tmp != 2) {
-tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
+i = do_constant_folding_cond(opc, op->args[1],
+ op->args[2], op->args[3]);
+if (i >= 0) {
+tcg_opt_gen_movi(&ctx, op, op->args[0], i);
 continue;
 }
 break;
 
 CASE_OP_32_64(brcond):
-tmp = do_constant_folding_cond(opc, op->args[0],
-   op->args[1], op->args[2]);
-switch (tmp) {
-case 0:
+i = do_constant_folding_cond(opc, op->args[0],
+ op->args[1], op->args[2]);
+if (i == 0) {
 tcg_op_remove(s, op);
 continue;
-case 1:
+} else if (i > 0) {
memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
 op->opc = opc = INDEX_op_br;
 op->args[0] = op->args[3];
@@ -1345,10 +1348,10 @@ void tcg_optimize(TCGContext *s)
 break;
 
 CASE_OP_32_64(movcond):
-tmp = do_constant_folding_cond(opc, op->args[1],
-   op->args[2], op->args[5]);
-if (tmp != 2) {
-tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[4-tmp]);
+i = do_constant_folding_cond(opc, op->args[1],
+ op->args[2], op->args[5]);
+if (i >= 0) {
+tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[4 - i]);
 continue;
 }
 if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
@@ -1412,14 +1415,14 @@ void tcg_optimize(TCGContext *s)
 break;
 
 case INDEX_op_brcond2_i32:
-tmp = do_constant_folding_cond2(&op->args[0], &op->args[2],
-op->args[4]);
-if (tmp == 0) {
+i = do_constant_folding_cond2(&op->args[0], &op->args[2],
+  op->args[4]);
+if (i == 0) {
 do_brcond_false:
 tcg_op_remove(s, op);
 continue;
 }
-if (tmp == 1) {
+if (i > 0) {
 do_brcond_true:
 op->opc = opc = INDEX_op_br;
 op->args[0] = op->args[5];
@@ -1443,20 +1446,20 @@ void tcg_optimize(TCGContext *s)
 if (op->args[4] == TCG_COND_EQ) {
 /* Simplify EQ comparisons where one of the pairs
can be simplified.  */
-tmp = 

[PULL 34/56] tcg/optimize: Split out fold_mov

2021-10-27 Thread Richard Henderson
This is the final entry in the main switch that was in a
different form.  After this, we have the option to convert
the switch into a function dispatch table.
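
Such a table might look like this (a hypothetical sketch, not
something this series adds; FoldFn is an invented name):

    typedef bool (*FoldFn)(OptContext *ctx, TCGOp *op);
    static const FoldFn fold_table[NB_OPS] = {
        [INDEX_op_add_i32] = fold_add,
        [INDEX_op_add_i64] = fold_add,
        /* ... one entry per folded opcode ... */
    };

    done = fold_table[opc] ? fold_table[opc](&ctx, op) : false;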

Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 8524fe1f8a..5f1bd7cd78 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1015,6 +1015,11 @@ static bool fold_mb(OptContext *ctx, TCGOp *op)
 return true;
 }
 
+static bool fold_mov(OptContext *ctx, TCGOp *op)
+{
+return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
+}
+
 static bool fold_movcond(OptContext *ctx, TCGOp *op)
 {
 TCGOpcode opc = op->opc;
@@ -1748,20 +1753,11 @@ void tcg_optimize(TCGContext *s)
 break;
 }
 
-/* Propagate constants through copy operations and do constant
-   folding.  Constants will be substituted to arguments by register
-   allocator where needed and possible.  Also detect copies. */
+/*
+ * Process each opcode.
+ * Sorted alphabetically by opcode as much as possible.
+ */
 switch (opc) {
-CASE_OP_32_64_VEC(mov):
-done = tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
-break;
-
-default:
-break;
-
-/* -- */
-/* Sorted alphabetically by opcode as much as possible. */
-
 CASE_OP_32_64_VEC(add):
done = fold_add(&ctx, op);
 break;
@@ -1831,6 +1827,9 @@ void tcg_optimize(TCGContext *s)
 case INDEX_op_mb:
done = fold_mb(&ctx, op);
 break;
+CASE_OP_32_64_VEC(mov):
+done = fold_mov(&ctx, op);
+break;
 CASE_OP_32_64(movcond):
done = fold_movcond(&ctx, op);
 break;
@@ -1900,6 +1899,8 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64_VEC(xor):
done = fold_xor(&ctx, op);
 break;
+default:
+break;
 }
 
 if (!done) {
-- 
2.25.1




[PULL 35/56] tcg/optimize: Split out fold_xx_to_i

2021-10-27 Thread Richard Henderson
Pull the "op r, a, a => movi r, 0" optimization into a function,
and use it in the outer opcode fold functions.
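
Concretely (illustration only):

    sub r, a, a  ->  movi r, 0    /* a - a == 0 */
    xor r, a, a  ->  movi r, 0    /* a ^ a == 0 */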

Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 41 -
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 5f1bd7cd78..2f55dc56c0 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -695,6 +695,15 @@ static bool fold_const2(OptContext *ctx, TCGOp *op)
 return false;
 }
 
+/* If the binary operation has both arguments equal, fold to @i. */
+static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
+{
+if (args_are_copies(op->args[1], op->args[2])) {
+return tcg_opt_gen_movi(ctx, op, op->args[0], i);
+}
+return false;
+}
+
 /*
  * These outermost fold_ functions are sorted alphabetically.
  */
@@ -744,7 +753,11 @@ static bool fold_and(OptContext *ctx, TCGOp *op)
 
 static bool fold_andc(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xx_to_i(ctx, op, 0)) {
+return true;
+}
+return false;
 }
 
 static bool fold_brcond(OptContext *ctx, TCGOp *op)
@@ -1224,7 +1237,11 @@ static bool fold_shift(OptContext *ctx, TCGOp *op)
 
 static bool fold_sub(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xx_to_i(ctx, op, 0)) {
+return true;
+}
+return false;
 }
 
 static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
@@ -1234,7 +1251,11 @@ static bool fold_sub2_i32(OptContext *ctx, TCGOp *op)
 
 static bool fold_xor(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xx_to_i(ctx, op, 0)) {
+return true;
+}
+return false;
 }
 
 /* Propagate constants and copies, fold constant expressions. */
@@ -1739,20 +1760,6 @@ void tcg_optimize(TCGContext *s)
 break;
 }
 
-/* Simplify expression for "op r, a, a => movi r, 0" cases */
-switch (opc) {
-CASE_OP_32_64_VEC(andc):
-CASE_OP_32_64_VEC(sub):
-CASE_OP_32_64_VEC(xor):
-if (args_are_copies(op->args[1], op->args[2])) {
-tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
-continue;
-}
-break;
-default:
-break;
-}
-
 /*
  * Process each opcode.
  * Sorted alphabetically by opcode as much as possible.
-- 
2.25.1




[PULL 27/56] tcg/optimize: Split out fold_movcond

2021-10-27 Thread Richard Henderson
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 56 --
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 9d1d045363..110b3d1cc2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -922,6 +922,34 @@ static bool fold_mb(OptContext *ctx, TCGOp *op)
 return true;
 }
 
+static bool fold_movcond(OptContext *ctx, TCGOp *op)
+{
+TCGOpcode opc = op->opc;
+TCGCond cond = op->args[5];
+int i = do_constant_folding_cond(opc, op->args[1], op->args[2], cond);
+
+if (i >= 0) {
+return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[4 - i]);
+}
+
+if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
+uint64_t tv = arg_info(op->args[3])->val;
+uint64_t fv = arg_info(op->args[4])->val;
+
+opc = (opc == INDEX_op_movcond_i32
+   ? INDEX_op_setcond_i32 : INDEX_op_setcond_i64);
+
+if (tv == 1 && fv == 0) {
+op->opc = opc;
+op->args[3] = cond;
+} else if (fv == 1 && tv == 0) {
+op->opc = opc;
+op->args[3] = tcg_invert_cond(cond);
+}
+}
+return false;
+}
+
 static bool fold_mul(OptContext *ctx, TCGOp *op)
 {
 return fold_const2(ctx, op);
@@ -1715,31 +1743,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-CASE_OP_32_64(movcond):
-i = do_constant_folding_cond(opc, op->args[1],
- op->args[2], op->args[5]);
-if (i >= 0) {
-tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[4 - i]);
-continue;
-}
-if (arg_is_const(op->args[3]) && arg_is_const(op->args[4])) {
-uint64_t tv = arg_info(op->args[3])->val;
-uint64_t fv = arg_info(op->args[4])->val;
-TCGCond cond = op->args[5];
-
-if (fv == 1 && tv == 0) {
-cond = tcg_invert_cond(cond);
-} else if (!(tv == 1 && fv == 0)) {
-break;
-}
-op->args[3] = cond;
-op->opc = opc = (opc == INDEX_op_movcond_i32
- ? INDEX_op_setcond_i32
- : INDEX_op_setcond_i64);
-}
-break;
-
-
 default:
 break;
 
@@ -1791,6 +1794,9 @@ void tcg_optimize(TCGContext *s)
 case INDEX_op_mb:
done = fold_mb(&ctx, op);
 break;
+CASE_OP_32_64(movcond):
+done = fold_movcond(&ctx, op);
+break;
 CASE_OP_32_64(mul):
done = fold_mul(&ctx, op);
 break;
-- 
2.25.1




[PULL 11/56] tcg/optimize: Split out init_arguments

2021-10-27 Thread Richard Henderson
There was no real reason for calls to have separate code here.
Unify init for calls vs non-calls using the call path, which
handles TCG_CALL_DUMMY_ARG.
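
The call path is the NULL check below; as I read the tcg internals,
arg_temp() yields NULL for TCG_CALL_DUMMY_ARG, so dummy call arguments
are skipped naturally:

    TCGTemp *ts = arg_temp(op->args[i]);
    if (ts) {                       /* NULL for TCG_CALL_DUMMY_ARG */
        init_ts_info(ctx, ts);
    }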

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index b875d76354..019c5aaf81 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -128,11 +128,6 @@ static void init_ts_info(OptContext *ctx, TCGTemp *ts)
 }
 }
 
-static void init_arg_info(OptContext *ctx, TCGArg arg)
-{
-init_ts_info(ctx, arg_temp(arg));
-}
-
 static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
 {
 TCGTemp *i, *g, *l;
@@ -606,6 +601,16 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 return false;
 }
 
+static void init_arguments(OptContext *ctx, TCGOp *op, int nb_args)
+{
+for (int i = 0; i < nb_args; i++) {
+TCGTemp *ts = arg_temp(op->args[i]);
+if (ts) {
+init_ts_info(ctx, ts);
+}
+}
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
@@ -636,19 +641,11 @@ void tcg_optimize(TCGContext *s)
 if (opc == INDEX_op_call) {
 nb_oargs = TCGOP_CALLO(op);
 nb_iargs = TCGOP_CALLI(op);
-for (i = 0; i < nb_oargs + nb_iargs; i++) {
-TCGTemp *ts = arg_temp(op->args[i]);
-if (ts) {
-init_ts_info(&ctx, ts);
-}
-}
 } else {
 nb_oargs = def->nb_oargs;
 nb_iargs = def->nb_iargs;
-for (i = 0; i < nb_oargs + nb_iargs; i++) {
-init_arg_info(&ctx, op->args[i]);
-}
 }
+init_arguments(&ctx, op, nb_oargs + nb_iargs);
 
 /* Do copy propagation */
 for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-- 
2.25.1




[PULL 24/56] tcg/optimize: Split out fold_setcond

2021-10-27 Thread Richard Henderson
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 24ba6d2830..f79cb44944 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -948,6 +948,17 @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
 return fold_const2(ctx, op);
 }
 
+static bool fold_setcond(OptContext *ctx, TCGOp *op)
+{
+TCGCond cond = op->args[3];
+int i = do_constant_folding_cond(op->opc, op->args[1], op->args[2], cond);
+
+if (i >= 0) {
+return tcg_opt_gen_movi(ctx, op, op->args[0], i);
+}
+return false;
+}
+
 static bool fold_setcond2(OptContext *ctx, TCGOp *op)
 {
 TCGCond cond = op->args[5];
@@ -1648,15 +1659,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-CASE_OP_32_64(setcond):
-i = do_constant_folding_cond(opc, op->args[1],
- op->args[2], op->args[3]);
-if (i >= 0) {
-tcg_opt_gen_movi(&ctx, op, op->args[0], i);
-continue;
-}
-break;
-
 CASE_OP_32_64(movcond):
 i = do_constant_folding_cond(opc, op->args[1],
  op->args[2], op->args[5]);
@@ -1817,6 +1819,9 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(shr):
done = fold_shift(&ctx, op);
 break;
+CASE_OP_32_64(setcond):
+done = fold_setcond(&ctx, op);
+break;
 case INDEX_op_setcond2_i32:
done = fold_setcond2(&ctx, op);
 break;
-- 
2.25.1




[PULL 36/56] tcg/optimize: Split out fold_xx_to_x

2021-10-27 Thread Richard Henderson
Pull the "op r, a, a => mov r, a" optimization into a function,
and use it in the outer opcode fold functions.
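
Concretely (illustration only):

    and r, a, a  ->  mov r, a    /* a & a == a */
    or  r, a, a  ->  mov r, a    /* a | a == a */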

Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 39 ---
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2f55dc56c0..ab96849edf 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -704,8 +704,22 @@ static bool fold_xx_to_i(OptContext *ctx, TCGOp *op, uint64_t i)
 return false;
 }
 
+/* If the binary operation has both arguments equal, fold to identity. */
+static bool fold_xx_to_x(OptContext *ctx, TCGOp *op)
+{
+if (args_are_copies(op->args[1], op->args[2])) {
+return tcg_opt_gen_mov(ctx, op, op->args[0], op->args[1]);
+}
+return false;
+}
+
 /*
  * These outermost fold_ functions are sorted alphabetically.
+ *
+ * The ordering of the transformations should be:
+ *   1) those that produce a constant
+ *   2) those that produce a copy
+ *   3) those that produce information about the result value.
  */
 
 static bool fold_add(OptContext *ctx, TCGOp *op)
@@ -748,7 +762,11 @@ static bool fold_add2_i32(OptContext *ctx, TCGOp *op)
 
 static bool fold_and(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xx_to_x(ctx, op)) {
+return true;
+}
+return false;
 }
 
 static bool fold_andc(OptContext *ctx, TCGOp *op)
@@ -1111,7 +1129,11 @@ static bool fold_not(OptContext *ctx, TCGOp *op)
 
 static bool fold_or(OptContext *ctx, TCGOp *op)
 {
-return fold_const2(ctx, op);
+if (fold_const2(ctx, op) ||
+fold_xx_to_x(ctx, op)) {
+return true;
+}
+return false;
 }
 
 static bool fold_orc(OptContext *ctx, TCGOp *op)
@@ -1747,19 +1769,6 @@ void tcg_optimize(TCGContext *s)
 break;
 }
 
-/* Simplify expression for "op r, a, a => mov r, a" cases */
-switch (opc) {
-CASE_OP_32_64_VEC(or):
-CASE_OP_32_64_VEC(and):
-if (args_are_copies(op->args[1], op->args[2])) {
-tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
-continue;
-}
-break;
-default:
-break;
-}
-
 /*
  * Process each opcode.
  * Sorted alphabetically by opcode as much as possible.
-- 
2.25.1




[PULL 30/56] tcg/optimize: Split out fold_deposit

2021-10-27 Thread Richard Henderson
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 3bd5f043c8..2c57d08760 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -878,6 +878,18 @@ static bool fold_ctpop(OptContext *ctx, TCGOp *op)
 return fold_const1(ctx, op);
 }
 
+static bool fold_deposit(OptContext *ctx, TCGOp *op)
+{
+if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
+uint64_t t1 = arg_info(op->args[1])->val;
+uint64_t t2 = arg_info(op->args[2])->val;
+
+t1 = deposit64(t1, op->args[3], op->args[4], t2);
+return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
+}
+return false;
+}
+
 static bool fold_divide(OptContext *ctx, TCGOp *op)
 {
 return fold_const2(ctx, op);
@@ -1741,16 +1753,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-CASE_OP_32_64(deposit):
-if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
-tmp = deposit64(arg_info(op->args[1])->val,
-op->args[3], op->args[4],
-arg_info(op->args[2])->val);
-tcg_opt_gen_movi(&ctx, op, op->args[0], tmp);
-continue;
-}
-break;
-
 default:
 break;
 
@@ -1778,6 +1780,9 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(ctpop):
done = fold_ctpop(&ctx, op);
 break;
+CASE_OP_32_64(deposit):
+done = fold_deposit(&ctx, op);
+break;
 CASE_OP_32_64(div):
 CASE_OP_32_64(divu):
done = fold_divide(&ctx, op);
-- 
2.25.1




[PULL 23/56] tcg/optimize: Split out fold_brcond

2021-10-27 Thread Richard Henderson
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c9db14f1d0..24ba6d2830 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -714,6 +714,22 @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
 return fold_const2(ctx, op);
 }
 
+static bool fold_brcond(OptContext *ctx, TCGOp *op)
+{
+TCGCond cond = op->args[2];
+int i = do_constant_folding_cond(op->opc, op->args[0], op->args[1], cond);
+
+if (i == 0) {
+tcg_op_remove(ctx->tcg, op);
+return true;
+}
+if (i > 0) {
+op->opc = INDEX_op_br;
+op->args[0] = op->args[3];
+}
+return false;
+}
+
 static bool fold_brcond2(OptContext *ctx, TCGOp *op)
 {
 TCGCond cond = op->args[4];
@@ -1641,20 +1657,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-CASE_OP_32_64(brcond):
-i = do_constant_folding_cond(opc, op->args[0],
- op->args[1], op->args[2]);
-if (i == 0) {
-tcg_op_remove(s, op);
-continue;
-} else if (i > 0) {
-memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-op->opc = opc = INDEX_op_br;
-op->args[0] = op->args[3];
-break;
-}
-break;
-
 CASE_OP_32_64(movcond):
 i = do_constant_folding_cond(opc, op->args[1],
  op->args[2], op->args[5]);
@@ -1737,6 +1739,9 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64_VEC(andc):
done = fold_andc(&ctx, op);
 break;
+CASE_OP_32_64(brcond):
+done = fold_brcond(&ctx, op);
+break;
 case INDEX_op_brcond2_i32:
done = fold_brcond2(&ctx, op);
 break;
-- 
2.25.1




[PULL 06/56] tcg/optimize: Rename "mask" to "z_mask"

2021-10-27 Thread Richard Henderson
Prepare for tracking different masks by renaming this one.
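
To make the invariant concrete (my example, not from the patch): a
zero bit in z_mask guarantees the corresponding value bit is zero.
After

    and_i32 t0, t1, 0xff

t0 has z_mask == 0xff: bits 8..31 are provably zero, while bits 0..7
remain unknown.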

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 142 +
 1 file changed, 72 insertions(+), 70 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index c239c3bd07..148e360fc6 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -41,7 +41,7 @@ typedef struct TempOptInfo {
 TCGTemp *prev_copy;
 TCGTemp *next_copy;
 uint64_t val;
-uint64_t mask;
+uint64_t z_mask;  /* mask bit is 0 if and only if value bit is 0 */
 } TempOptInfo;
 
 static inline TempOptInfo *ts_info(TCGTemp *ts)
@@ -81,7 +81,7 @@ static void reset_ts(TCGTemp *ts)
 ti->next_copy = ts;
 ti->prev_copy = ts;
 ti->is_const = false;
-ti->mask = -1;
+ti->z_mask = -1;
 }
 
 static void reset_temp(TCGArg arg)
@@ -111,14 +111,14 @@ static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
 if (ts->kind == TEMP_CONST) {
 ti->is_const = true;
 ti->val = ts->val;
-ti->mask = ts->val;
+ti->z_mask = ts->val;
 if (TCG_TARGET_REG_BITS > 32 && ts->type == TCG_TYPE_I32) {
 /* High bits of a 32-bit quantity are garbage.  */
-ti->mask |= ~0xffffffffull;
+ti->z_mask |= ~0xffffffffull;
 }
 } else {
 ti->is_const = false;
-ti->mask = -1;
+ti->z_mask = -1;
 }
 }
 
@@ -186,7 +186,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
 const TCGOpDef *def;
 TempOptInfo *di;
 TempOptInfo *si;
-uint64_t mask;
+uint64_t z_mask;
 TCGOpcode new_op;
 
 if (ts_are_copies(dst_ts, src_ts)) {
@@ -210,12 +210,12 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
 op->args[0] = dst;
 op->args[1] = src;
 
-mask = si->mask;
+z_mask = si->z_mask;
 if (TCG_TARGET_REG_BITS > 32 && new_op == INDEX_op_mov_i32) {
 /* High bits of the destination are now garbage.  */
-mask |= ~0xffffffffull;
+z_mask |= ~0xffffffffull;
 }
-di->mask = mask;
+di->z_mask = z_mask;
 
 if (src_ts->type == dst_ts->type) {
 TempOptInfo *ni = ts_info(si->next_copy);
@@ -621,7 +621,7 @@ void tcg_optimize(TCGContext *s)
 }
 
QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
-uint64_t mask, partmask, affected, tmp;
+uint64_t z_mask, partmask, affected, tmp;
 int nb_oargs, nb_iargs;
 TCGOpcode opc = op->opc;
const TCGOpDef *def = &tcg_op_defs[opc];
@@ -855,170 +855,172 @@ void tcg_optimize(TCGContext *s)
 
 /* Simplify using known-zero bits. Currently only ops with a single
output argument is supported. */
-mask = -1;
+z_mask = -1;
 affected = -1;
 switch (opc) {
 CASE_OP_32_64(ext8s):
-if ((arg_info(op->args[1])->mask & 0x80) != 0) {
+if ((arg_info(op->args[1])->z_mask & 0x80) != 0) {
 break;
 }
 QEMU_FALLTHROUGH;
 CASE_OP_32_64(ext8u):
-mask = 0xff;
+z_mask = 0xff;
 goto and_const;
 CASE_OP_32_64(ext16s):
-if ((arg_info(op->args[1])->mask & 0x8000) != 0) {
+if ((arg_info(op->args[1])->z_mask & 0x8000) != 0) {
 break;
 }
 QEMU_FALLTHROUGH;
 CASE_OP_32_64(ext16u):
-mask = 0xffff;
+z_mask = 0xffff;
 goto and_const;
 case INDEX_op_ext32s_i64:
-if ((arg_info(op->args[1])->mask & 0x80000000) != 0) {
+if ((arg_info(op->args[1])->z_mask & 0x80000000) != 0) {
 break;
 }
 QEMU_FALLTHROUGH;
 case INDEX_op_ext32u_i64:
-mask = 0xffffffffU;
+z_mask = 0xffffffffU;
 goto and_const;
 
 CASE_OP_32_64(and):
-mask = arg_info(op->args[2])->mask;
+z_mask = arg_info(op->args[2])->z_mask;
 if (arg_is_const(op->args[2])) {
 and_const:
-affected = arg_info(op->args[1])->mask & ~mask;
+affected = arg_info(op->args[1])->z_mask & ~z_mask;
 }
-mask = arg_info(op->args[1])->mask & mask;
+z_mask = arg_info(op->args[1])->z_mask & z_mask;
 break;
 
 case INDEX_op_ext_i32_i64:
-if ((arg_info(op->args[1])->mask & 0x80000000) != 0) {
+if ((arg_info(op->args[1])->z_mask & 0x80000000) != 0) {
 break;
 }
 QEMU_FALLTHROUGH;
 case INDEX_op_extu_i32_i64:
 /* We do not compute affected as it is a size changing op.  */
-mask = (uint32_t)arg_info(op->args[1])->mask;
+z_mask = (uint32_t)arg_info(op->args[1])->z_mask;
 break;
 
 

[PULL 12/56] tcg/optimize: Split out copy_propagate

2021-10-27 Thread Richard Henderson
Continue splitting tcg_optimize.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 019c5aaf81..fad6f5de1f 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -611,6 +611,19 @@ static void init_arguments(OptContext *ctx, TCGOp *op, int nb_args)
 }
 }
 
+static void copy_propagate(OptContext *ctx, TCGOp *op,
+   int nb_oargs, int nb_iargs)
+{
+TCGContext *s = ctx->tcg;
+
+for (int i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
+TCGTemp *ts = arg_temp(op->args[i]);
+if (ts && ts_is_copy(ts)) {
+op->args[i] = temp_arg(find_better_copy(s, ts));
+}
+}
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
@@ -646,14 +659,7 @@ void tcg_optimize(TCGContext *s)
 nb_iargs = def->nb_iargs;
 }
-init_arguments(&ctx, op, nb_oargs + nb_iargs);
-
-/* Do copy propagation */
-for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
-TCGTemp *ts = arg_temp(op->args[i]);
-if (ts && ts_is_copy(ts)) {
-op->args[i] = temp_arg(find_better_copy(s, ts));
-}
-}
+copy_propagate(&ctx, op, nb_oargs, nb_iargs);
 
 /* For commutative operations make constant second argument */
 switch (opc) {
-- 
2.25.1




[PULL 25/56] tcg/optimize: Split out fold_mulu2_i32

2021-10-27 Thread Richard Henderson
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index f79cb44944..805522f99d 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -899,6 +899,24 @@ static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
 return fold_const2(ctx, op);
 }
 
+static bool fold_mulu2_i32(OptContext *ctx, TCGOp *op)
+{
+if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
+uint32_t a = arg_info(op->args[2])->val;
+uint32_t b = arg_info(op->args[3])->val;
+uint64_t r = (uint64_t)a * b;
+TCGArg rl, rh;
+TCGOp *op2 = tcg_op_insert_before(ctx->tcg, op, INDEX_op_mov_i32);
+
+rl = op->args[0];
+rh = op->args[1];
+tcg_opt_gen_movi(ctx, op, rl, (int32_t)r);
+tcg_opt_gen_movi(ctx, op2, rh, (int32_t)(r >> 32));
+return true;
+}
+return false;
+}
+
 static bool fold_nand(OptContext *ctx, TCGOp *op)
 {
 return fold_const2(ctx, op);
@@ -1710,22 +1728,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-case INDEX_op_mulu2_i32:
-if (arg_is_const(op->args[2]) && arg_is_const(op->args[3])) {
-uint32_t a = arg_info(op->args[2])->val;
-uint32_t b = arg_info(op->args[3])->val;
-uint64_t r = (uint64_t)a * b;
-TCGArg rl, rh;
-TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_mov_i32);
-
-rl = op->args[0];
-rh = op->args[1];
-tcg_opt_gen_movi(&ctx, op, rl, (int32_t)r);
-tcg_opt_gen_movi(&ctx, op2, rh, (int32_t)(r >> 32));
-continue;
-}
-break;
-
 default:
 break;
 
@@ -1781,6 +1783,9 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(muluh):
done = fold_mul_highpart(&ctx, op);
 break;
+case INDEX_op_mulu2_i32:
+done = fold_mulu2_i32(&ctx, op);
+break;
 CASE_OP_32_64(nand):
done = fold_nand(&ctx, op);
 break;
-- 
2.25.1




[PULL 20/56] tcg/optimize: Split out fold_const{1,2}

2021-10-27 Thread Richard Henderson
Split out a whole bunch of placeholder functions, which are
currently identical.  That won't last as more code gets moved.

Use CASE_32_64_VEC for some logical operators that previously
missed the addition of vectors.
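
For reference, the macro in question is spelled CASE_OP_32_64_VEC in
the source; as I understand tcg/optimize.c it expands one name into
the _i32, _i64 and _vec case labels:

    #define CASE_OP_32_64_VEC(x)                    \
            glue(glue(case INDEX_op_, x), _i32):    \
            glue(glue(case INDEX_op_, x), _i64):    \
            glue(glue(case INDEX_op_, x), _vec)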

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 271 +++--
 1 file changed, 219 insertions(+), 52 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 159a5a9ee5..5c3f8e8fcd 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -660,6 +660,60 @@ static void finish_folding(OptContext *ctx, TCGOp *op)
 }
 }
 
+/*
+ * The fold_* functions return true when processing is complete,
+ * usually by folding the operation to a constant or to a copy,
+ * and calling tcg_opt_gen_{mov,movi}.  They may do other things,
+ * like collect information about the value produced, for use in
+ * optimizing a subsequent operation.
+ *
+ * These first fold_* functions are all helpers, used by other
+ * folders for more specific operations.
+ */
+
+static bool fold_const1(OptContext *ctx, TCGOp *op)
+{
+if (arg_is_const(op->args[1])) {
+uint64_t t;
+
+t = arg_info(op->args[1])->val;
+t = do_constant_folding(op->opc, t, 0);
+return tcg_opt_gen_movi(ctx, op, op->args[0], t);
+}
+return false;
+}
+
+static bool fold_const2(OptContext *ctx, TCGOp *op)
+{
+if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
+uint64_t t1 = arg_info(op->args[1])->val;
+uint64_t t2 = arg_info(op->args[2])->val;
+
+t1 = do_constant_folding(op->opc, t1, t2);
+return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
+}
+return false;
+}
+
+/*
+ * These outermost fold_ functions are sorted alphabetically.
+ */
+
+static bool fold_add(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_and(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_andc(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
 static bool fold_call(OptContext *ctx, TCGOp *op)
 {
 TCGContext *s = ctx->tcg;
@@ -692,6 +746,31 @@ static bool fold_call(OptContext *ctx, TCGOp *op)
 return true;
 }
 
+static bool fold_ctpop(OptContext *ctx, TCGOp *op)
+{
+return fold_const1(ctx, op);
+}
+
+static bool fold_divide(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_eqv(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_exts(OptContext *ctx, TCGOp *op)
+{
+return fold_const1(ctx, op);
+}
+
+static bool fold_extu(OptContext *ctx, TCGOp *op)
+{
+return fold_const1(ctx, op);
+}
+
 static bool fold_mb(OptContext *ctx, TCGOp *op)
 {
 /* Eliminate duplicate and redundant fence instructions.  */
@@ -716,6 +795,46 @@ static bool fold_mb(OptContext *ctx, TCGOp *op)
 return true;
 }
 
+static bool fold_mul(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_mul_highpart(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_nand(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_neg(OptContext *ctx, TCGOp *op)
+{
+return fold_const1(ctx, op);
+}
+
+static bool fold_nor(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_not(OptContext *ctx, TCGOp *op)
+{
+return fold_const1(ctx, op);
+}
+
+static bool fold_or(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_orc(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
 static bool fold_qemu_ld(OptContext *ctx, TCGOp *op)
 {
 /* Opcodes that touch guest memory stop the mb optimization.  */
@@ -730,6 +849,26 @@ static bool fold_qemu_st(OptContext *ctx, TCGOp *op)
 return false;
 }
 
+static bool fold_remainder(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_shift(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_sub(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
+static bool fold_xor(OptContext *ctx, TCGOp *op)
+{
+return fold_const2(ctx, op);
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
@@ -1276,26 +1415,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-CASE_OP_32_64(not):
-CASE_OP_32_64(neg):
-CASE_OP_32_64(ext8s):
-CASE_OP_32_64(ext8u):
-CASE_OP_32_64(ext16s):
-CASE_OP_32_64(ext16u):
-CASE_OP_32_64(ctpop):
-case INDEX_op_ext32s_i64:
-case INDEX_op_ext32u_i64:
-case INDEX_op_ext_i32_i64:
-case INDEX_op_extu_i32_i64:
-case INDEX_op_extrl_i64_i32:
-case INDEX_op_extrh_i64_i32:
-if (arg_is_const(op->args[1])) {
-tmp = 

[PULL 04/56] host-utils: add 128-bit quotient support to divu128/divs128

2021-10-27 Thread Richard Henderson
From: Luis Pires 

These will be used to implement new decimal floating point
instructions from Power ISA 3.1.

The remainder is now returned directly by divu128/divs128,
freeing up phigh to receive the high 64 bits of the quotient.
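
A minimal usage sketch of the new interface (mine, not from the
patch):

    uint64_t lo = 10, hi = 1;   /* dividend = (1 << 64) + 10 */
    uint64_t rem = divu128(&lo, &hi, 7);
    /* hi:lo now holds the full 128-bit quotient, rem the remainder */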

Signed-off-by: Luis Pires 
Reviewed-by: Richard Henderson 
Message-Id: <20211025191154.350831-4-luis.pi...@eldorado.org.br>
Signed-off-by: Richard Henderson 
---
 include/hw/clock.h|   6 +-
 include/qemu/host-utils.h |  20 --
 target/ppc/int_helper.c   |   9 +--
 util/host-utils.c | 133 +-
 4 files changed, 108 insertions(+), 60 deletions(-)

diff --git a/include/hw/clock.h b/include/hw/clock.h
index 7443e6c4ab..5c927cee7f 100644
--- a/include/hw/clock.h
+++ b/include/hw/clock.h
@@ -323,11 +323,7 @@ static inline uint64_t clock_ns_to_ticks(const Clock *clk, uint64_t ns)
 if (clk->period == 0) {
 return 0;
 }
-/*
- * BUG: when CONFIG_INT128 is not defined, the current implementation of
- * divu128 does not return a valid truncated quotient, so the result will
- * be wrong.
- */
+
divu128(&lo, &hi, clk->period);
 return lo;
 }
diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index 08a17e16e5..a3a7ced78d 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -56,26 +56,32 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 return (__int128_t)a * b / c;
 }
 
-static inline void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
+static inline uint64_t divu128(uint64_t *plow, uint64_t *phigh,
+   uint64_t divisor)
 {
 __uint128_t dividend = ((__uint128_t)*phigh << 64) | *plow;
 __uint128_t result = dividend / divisor;
+
 *plow = result;
-*phigh = dividend % divisor;
+*phigh = result >> 64;
+return dividend % divisor;
 }
 
-static inline void divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
+static inline int64_t divs128(uint64_t *plow, int64_t *phigh,
+  int64_t divisor)
 {
-__int128_t dividend = ((__int128_t)*phigh << 64) | (uint64_t)*plow;
+__int128_t dividend = ((__int128_t)*phigh << 64) | *plow;
 __int128_t result = dividend / divisor;
+
 *plow = result;
-*phigh = dividend % divisor;
+*phigh = result >> 64;
+return dividend % divisor;
 }
 #else
 void muls64(uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b);
 void mulu64(uint64_t *plow, uint64_t *phigh, uint64_t a, uint64_t b);
-void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
-void divs128(int64_t *plow, int64_t *phigh, int64_t divisor);
+uint64_t divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
+int64_t divs128(uint64_t *plow, int64_t *phigh, int64_t divisor);
 
 static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 {
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 510faf24cf..eeb7781a9e 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -120,7 +120,7 @@ uint64_t helper_divdeu(CPUPPCState *env, uint64_t ra, uint64_t rb, uint32_t oe)
 
uint64_t helper_divde(CPUPPCState *env, uint64_t rau, uint64_t rbu, uint32_t oe)
 {
-int64_t rt = 0;
+uint64_t rt = 0;
 int64_t ra = (int64_t)rau;
 int64_t rb = (int64_t)rbu;
 int overflow = 0;
@@ -2506,6 +2506,7 @@ uint32_t helper_bcdcfsq(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
 int cr;
 uint64_t lo_value;
 uint64_t hi_value;
+uint64_t rem;
 ppc_avr_t ret = { .u64 = { 0, 0 } };
 
 if (b->VsrSD(0) < 0) {
@@ -2541,10 +2542,10 @@ uint32_t helper_bcdcfsq(ppc_avr_t *r, ppc_avr_t *b, uint32_t ps)
  * In that case, we leave r unchanged.
  */
 } else {
-divu128(&lo_value, &hi_value, 1000ULL);
+rem = divu128(&lo_value, &hi_value, 1000ULL);
 
-for (i = 1; i < 16; hi_value /= 10, i++) {
-bcd_put_digit(, hi_value % 10, i);
+for (i = 1; i < 16; rem /= 10, i++) {
+bcd_put_digit(, rem % 10, i);
 }
 
 for (; i < 32; lo_value /= 10, i++) {
diff --git a/util/host-utils.c b/util/host-utils.c
index 701a371843..bcc772b8ec 100644
--- a/util/host-utils.c
+++ b/util/host-utils.c
@@ -87,72 +87,117 @@ void muls64 (uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b)
 }
 
 /*
- * Unsigned 128-by-64 division. Returns quotient via plow and
- * remainder via phigh.
- * The result must fit in 64 bits (plow) - otherwise, the result
- * is undefined.
- * This function will cause a division by zero if passed a zero divisor.
+ * Unsigned 128-by-64 division.
+ * Returns the remainder.
+ * Returns quotient via plow and phigh.
+ * Also returns the remainder via the function return value.
  */
-void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
+uint64_t divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
 {
 uint64_t dhi = *phigh;
 uint64_t dlo = *plow;
-unsigned i;
-uint64_t carry 

[PULL 21/56] tcg/optimize: Split out fold_setcond2

2021-10-27 Thread Richard Henderson
Reduce some code duplication by folding the NE and EQ cases.
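
The key is the small "i ^ inv" trick (my gloss on the code below):
do_constant_folding_cond() returns 0, 1, or -1 for unknown, and inv is
1 for NE, 0 for EQ, so

    i ^ inv == 0   /* the overall result is the constant i */
    i ^ inv == 1   /* this word pair is known equal, so the result
                      depends only on the other pair */

while -1 (unknown) matches neither case and falls through;
fold_brcond2 uses the same shape.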

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 145 -
 1 file changed, 72 insertions(+), 73 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 5c3f8e8fcd..80e43deb8e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -854,6 +854,75 @@ static bool fold_remainder(OptContext *ctx, TCGOp *op)
 return fold_const2(ctx, op);
 }
 
+static bool fold_setcond2(OptContext *ctx, TCGOp *op)
+{
+TCGCond cond = op->args[5];
+int i = do_constant_folding_cond2(&op->args[1], &op->args[3], cond);
+int inv = 0;
+
+if (i >= 0) {
+goto do_setcond_const;
+}
+
+switch (cond) {
+case TCG_COND_LT:
+case TCG_COND_GE:
+/*
+ * Simplify LT/GE comparisons vs zero to a single compare
+ * vs the high word of the input.
+ */
+if (arg_is_const(op->args[3]) && arg_info(op->args[3])->val == 0 &&
+arg_is_const(op->args[4]) && arg_info(op->args[4])->val == 0) {
+goto do_setcond_high;
+}
+break;
+
+case TCG_COND_NE:
+inv = 1;
+QEMU_FALLTHROUGH;
+case TCG_COND_EQ:
+/*
+ * Simplify EQ/NE comparisons where one of the pairs
+ * can be simplified.
+ */
+i = do_constant_folding_cond(INDEX_op_setcond_i32, op->args[1],
+ op->args[3], cond);
+switch (i ^ inv) {
+case 0:
+goto do_setcond_const;
+case 1:
+goto do_setcond_high;
+}
+
+i = do_constant_folding_cond(INDEX_op_setcond_i32, op->args[2],
+ op->args[4], cond);
+switch (i ^ inv) {
+case 0:
+goto do_setcond_const;
+case 1:
+op->args[2] = op->args[3];
+op->args[3] = cond;
+op->opc = INDEX_op_setcond_i32;
+break;
+}
+break;
+
+default:
+break;
+
+do_setcond_high:
+op->args[1] = op->args[2];
+op->args[2] = op->args[4];
+op->args[3] = cond;
+op->opc = INDEX_op_setcond_i32;
+break;
+}
+return false;
+
+ do_setcond_const:
+return tcg_opt_gen_movi(ctx, op, op->args[0], i);
+}
+
 static bool fold_shift(OptContext *ctx, TCGOp *op)
 {
 return fold_const2(ctx, op);
@@ -1653,79 +1722,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-case INDEX_op_setcond2_i32:
-i = do_constant_folding_cond2(&op->args[1], &op->args[3],
-  op->args[5]);
-if (i >= 0) {
-do_setcond_const:
-tcg_opt_gen_movi(&ctx, op, op->args[0], i);
-continue;
-}
-if ((op->args[5] == TCG_COND_LT || op->args[5] == TCG_COND_GE)
- && arg_is_const(op->args[3])
- && arg_info(op->args[3])->val == 0
- && arg_is_const(op->args[4])
- && arg_info(op->args[4])->val == 0) {
-/* Simplify LT/GE comparisons vs zero to a single compare
-   vs the high word of the input.  */
-do_setcond_high:
-reset_temp(op->args[0]);
-arg_info(op->args[0])->z_mask = 1;
-op->opc = INDEX_op_setcond_i32;
-op->args[1] = op->args[2];
-op->args[2] = op->args[4];
-op->args[3] = op->args[5];
-break;
-}
-if (op->args[5] == TCG_COND_EQ) {
-/* Simplify EQ comparisons where one of the pairs
-   can be simplified.  */
-i = do_constant_folding_cond(INDEX_op_setcond_i32,
- op->args[1], op->args[3],
- TCG_COND_EQ);
-if (i == 0) {
-goto do_setcond_const;
-} else if (i > 0) {
-goto do_setcond_high;
-}
-i = do_constant_folding_cond(INDEX_op_setcond_i32,
- op->args[2], op->args[4],
- TCG_COND_EQ);
-if (i == 0) {
-goto do_setcond_high;
-} else if (i < 0) {
-break;
-}
-do_setcond_low:
-reset_temp(op->args[0]);
-arg_info(op->args[0])->z_mask = 1;
-op->opc = INDEX_op_setcond_i32;
-op->args[2] = op->args[3];
-op->args[3] = op->args[5];
-break;
-}
-if (op->args[5] == TCG_COND_NE) {
-/* Simplify NE comparisons where one of the pairs
-   can be simplified.  */
-i = 

[PULL 10/56] tcg/optimize: Move prev_mb into OptContext

2021-10-27 Thread Richard Henderson
This will expose the variable to subroutines that
will be broken out of tcg_optimize.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 627a5b39f6..b875d76354 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -46,6 +46,7 @@ typedef struct TempOptInfo {
 
 typedef struct OptContext {
 TCGContext *tcg;
+TCGOp *prev_mb;
 TCGTempSet temps_used;
 } OptContext;
 
@@ -609,7 +610,7 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
 void tcg_optimize(TCGContext *s)
 {
 int nb_temps, nb_globals, i;
-TCGOp *op, *op_next, *prev_mb = NULL;
+TCGOp *op, *op_next;
 OptContext ctx = { .tcg = s };
 
 /* Array VALS has an element for each temp.
@@ -1566,7 +1567,7 @@ void tcg_optimize(TCGContext *s)
 }
 
 /* Eliminate duplicate and redundant fence instructions.  */
-if (prev_mb) {
+if (ctx.prev_mb) {
 switch (opc) {
 case INDEX_op_mb:
 /* Merge two barriers of the same type into one,
@@ -1580,7 +1581,7 @@ void tcg_optimize(TCGContext *s)
  * barrier.  This is stricter than specified but for
  * the purposes of TCG is better than not optimizing.
  */
-prev_mb->args[0] |= op->args[0];
+ctx.prev_mb->args[0] |= op->args[0];
 tcg_op_remove(s, op);
 break;
 
@@ -1597,11 +1598,11 @@ void tcg_optimize(TCGContext *s)
 case INDEX_op_qemu_st_i64:
 case INDEX_op_call:
 /* Opcodes that touch guest memory stop the optimization.  */
-prev_mb = NULL;
+ctx.prev_mb = NULL;
 break;
 }
 } else if (opc == INDEX_op_mb) {
-prev_mb = op;
+ctx.prev_mb = op;
 }
 }
 }
-- 
2.25.1




[PULL 22/56] tcg/optimize: Split out fold_brcond2

2021-10-27 Thread Richard Henderson
Reduce some code duplication by folding the NE and EQ cases.

Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 159 +
 1 file changed, 81 insertions(+), 78 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 80e43deb8e..c9db14f1d0 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -714,6 +714,84 @@ static bool fold_andc(OptContext *ctx, TCGOp *op)
 return fold_const2(ctx, op);
 }
 
+static bool fold_brcond2(OptContext *ctx, TCGOp *op)
+{
+TCGCond cond = op->args[4];
+int i = do_constant_folding_cond2(&op->args[0], &op->args[2], cond);
+TCGArg label = op->args[5];
+int inv = 0;
+
+if (i >= 0) {
+goto do_brcond_const;
+}
+
+switch (cond) {
+case TCG_COND_LT:
+case TCG_COND_GE:
+/*
+ * Simplify LT/GE comparisons vs zero to a single compare
+ * vs the high word of the input.
+ */
+if (arg_is_const(op->args[2]) && arg_info(op->args[2])->val == 0 &&
+arg_is_const(op->args[3]) && arg_info(op->args[3])->val == 0) {
+goto do_brcond_high;
+}
+break;
+
+case TCG_COND_NE:
+inv = 1;
+QEMU_FALLTHROUGH;
+case TCG_COND_EQ:
+/*
+ * Simplify EQ/NE comparisons where one of the pairs
+ * can be simplified.
+ */
+i = do_constant_folding_cond(INDEX_op_brcond_i32, op->args[0],
+ op->args[2], cond);
+switch (i ^ inv) {
+case 0:
+goto do_brcond_const;
+case 1:
+goto do_brcond_high;
+}
+
+i = do_constant_folding_cond(INDEX_op_brcond_i32, op->args[1],
+ op->args[3], cond);
+switch (i ^ inv) {
+case 0:
+goto do_brcond_const;
+case 1:
+op->opc = INDEX_op_brcond_i32;
+op->args[1] = op->args[2];
+op->args[2] = cond;
+op->args[3] = label;
+break;
+}
+break;
+
+default:
+break;
+
+do_brcond_high:
+op->opc = INDEX_op_brcond_i32;
+op->args[0] = op->args[1];
+op->args[1] = op->args[3];
+op->args[2] = cond;
+op->args[3] = label;
+break;
+
+do_brcond_const:
+if (i == 0) {
+tcg_op_remove(ctx->tcg, op);
+return true;
+}
+op->opc = INDEX_op_br;
+op->args[0] = label;
+break;
+}
+return false;
+}
+
 static bool fold_call(OptContext *ctx, TCGOp *op)
 {
 TCGContext *s = ctx->tcg;
@@ -1644,84 +1722,6 @@ void tcg_optimize(TCGContext *s)
 }
 break;
 
-case INDEX_op_brcond2_i32:
-i = do_constant_folding_cond2(&op->args[0], &op->args[2],
-  op->args[4]);
-if (i == 0) {
-do_brcond_false:
-tcg_op_remove(s, op);
-continue;
-}
-if (i > 0) {
-do_brcond_true:
-op->opc = opc = INDEX_op_br;
-op->args[0] = op->args[5];
-break;
-}
-if ((op->args[4] == TCG_COND_LT || op->args[4] == TCG_COND_GE)
- && arg_is_const(op->args[2])
- && arg_info(op->args[2])->val == 0
- && arg_is_const(op->args[3])
- && arg_info(op->args[3])->val == 0) {
-/* Simplify LT/GE comparisons vs zero to a single compare
-   vs the high word of the input.  */
-do_brcond_high:
-op->opc = opc = INDEX_op_brcond_i32;
-op->args[0] = op->args[1];
-op->args[1] = op->args[3];
-op->args[2] = op->args[4];
-op->args[3] = op->args[5];
-break;
-}
-if (op->args[4] == TCG_COND_EQ) {
-/* Simplify EQ comparisons where one of the pairs
-   can be simplified.  */
-i = do_constant_folding_cond(INDEX_op_brcond_i32,
- op->args[0], op->args[2],
- TCG_COND_EQ);
-if (i == 0) {
-goto do_brcond_false;
-} else if (i > 0) {
-goto do_brcond_high;
-}
-i = do_constant_folding_cond(INDEX_op_brcond_i32,
- op->args[1], op->args[3],
- TCG_COND_EQ);
-if (i == 0) {
-goto do_brcond_false;
-} else if (i < 0) {
-break;
-}
-do_brcond_low:
-memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
-op->opc = INDEX_op_brcond_i32;
-op->args[1] = op->args[2];
-

[PULL 07/56] tcg/optimize: Split out OptContext

2021-10-27 Thread Richard Henderson
Provide what will become a larger context for splitting
the very large tcg_optimize function.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 77 ++
 1 file changed, 40 insertions(+), 37 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 148e360fc6..b76991215e 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -44,6 +44,10 @@ typedef struct TempOptInfo {
 uint64_t z_mask;  /* mask bit is 0 if and only if value bit is 0 */
 } TempOptInfo;
 
+typedef struct OptContext {
+TCGTempSet temps_used;
+} OptContext;
+
 static inline TempOptInfo *ts_info(TCGTemp *ts)
 {
 return ts->state_ptr;
@@ -90,15 +94,15 @@ static void reset_temp(TCGArg arg)
 }
 
 /* Initialize and activate a temporary.  */
-static void init_ts_info(TCGTempSet *temps_used, TCGTemp *ts)
+static void init_ts_info(OptContext *ctx, TCGTemp *ts)
 {
 size_t idx = temp_idx(ts);
 TempOptInfo *ti;
 
-if (test_bit(idx, temps_used->l)) {
+if (test_bit(idx, ctx->temps_used.l)) {
 return;
 }
-set_bit(idx, temps_used->l);
+set_bit(idx, ctx->temps_used.l);
 
 ti = ts->state_ptr;
 if (ti == NULL) {
@@ -122,9 +126,9 @@ static void init_ts_info(TCGTempSet *temps_used, TCGTemp 
*ts)
 }
 }
 
-static void init_arg_info(TCGTempSet *temps_used, TCGArg arg)
+static void init_arg_info(OptContext *ctx, TCGArg arg)
 {
-init_ts_info(temps_used, arg_temp(arg));
+init_ts_info(ctx, arg_temp(arg));
 }
 
 static TCGTemp *find_better_copy(TCGContext *s, TCGTemp *ts)
@@ -229,7 +233,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, 
TCGArg dst, TCGArg src)
 }
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, TCGTempSet *temps_used,
+static void tcg_opt_gen_movi(TCGContext *s, OptContext *ctx,
  TCGOp *op, TCGArg dst, uint64_t val)
 {
 const TCGOpDef *def = _op_defs[op->opc];
@@ -246,7 +250,7 @@ static void tcg_opt_gen_movi(TCGContext *s, TCGTempSet 
*temps_used,
 
 /* Convert movi to mov with constant temp. */
 tv = tcg_constant_internal(type, val);
-init_ts_info(temps_used, tv);
+init_ts_info(ctx, tv);
 tcg_opt_gen_mov(s, op, dst, temp_arg(tv));
 }
 
@@ -605,7 +609,7 @@ void tcg_optimize(TCGContext *s)
 {
 int nb_temps, nb_globals, i;
 TCGOp *op, *op_next, *prev_mb = NULL;
-TCGTempSet temps_used;
+OptContext ctx = {};
 
 /* Array VALS has an element for each temp.
If this temp holds a constant then its value is kept in VALS' element.
@@ -615,7 +619,6 @@ void tcg_optimize(TCGContext *s)
 nb_temps = s->nb_temps;
 nb_globals = s->nb_globals;
 
-memset(&temps_used, 0, sizeof(temps_used));
 for (i = 0; i < nb_temps; ++i) {
 s->temps[i].state_ptr = NULL;
 }
@@ -634,14 +637,14 @@ void tcg_optimize(TCGContext *s)
 for (i = 0; i < nb_oargs + nb_iargs; i++) {
 TCGTemp *ts = arg_temp(op->args[i]);
 if (ts) {
-init_ts_info(&temps_used, ts);
+init_ts_info(&ctx, ts);
 }
 }
 } else {
 nb_oargs = def->nb_oargs;
 nb_iargs = def->nb_iargs;
 for (i = 0; i < nb_oargs + nb_iargs; i++) {
-init_arg_info(&temps_used, op->args[i]);
+init_arg_info(&ctx, op->args[i]);
 }
 }
 
@@ -720,7 +723,7 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(rotr):
 if (arg_is_const(op->args[1])
 && arg_info(op->args[1])->val == 0) {
-tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
+tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
 continue;
 }
 break;
@@ -1085,7 +1088,7 @@ void tcg_optimize(TCGContext *s)
 
 if (partmask == 0) {
 tcg_debug_assert(nb_oargs == 1);
-tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
+tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
 continue;
 }
 if (affected == 0) {
@@ -1102,7 +1105,7 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(mulsh):
 if (arg_is_const(op->args[2])
 && arg_info(op->args[2])->val == 0) {
-tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
+tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
 continue;
 }
 break;
@@ -1129,7 +1132,7 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64_VEC(sub):
 CASE_OP_32_64_VEC(xor):
 if (args_are_copies(op->args[1], op->args[2])) {
-tcg_opt_gen_movi(s, &temps_used, op, op->args[0], 0);
+tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
 continue;
 }
 break;
@@ -1149,7 +1152,7 @@ void tcg_optimize(TCGContext *s)
 if (arg_is_const(op->args[1])) {

[PULL 08/56] tcg/optimize: Remove do_default label

2021-10-27 Thread Richard Henderson
Break the final cleanup clause out of the main switch
statement.  When fully folding an opcode to mov/movi,
use "continue" to process the next opcode, else break
to fall into the final cleanup.
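
A minimal, self-contained sketch of the resulting control flow (the op
values and the printouts here are made up for illustration):

    #include <stdio.h>

    int main(void)
    {
        int ops[] = { 0, 1, 0, 1 };
        for (int i = 0; i < 4; i++) {
            switch (ops[i]) {
            case 1:                    /* fully folded to mov/movi */
                printf("op %d: folded\n", i);
                continue;              /* next opcode, skip cleanup */
            default:
                break;                 /* fall into final cleanup */
            }
            /* shared cleanup, formerly behind the do_default label */
            printf("op %d: cleanup\n", i);
        }
        return 0;
    }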

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 190 -
 1 file changed, 94 insertions(+), 96 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index b76991215e..a37efff4d0 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1146,16 +1146,16 @@ void tcg_optimize(TCGContext *s)
 switch (opc) {
 CASE_OP_32_64_VEC(mov):
 tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
-break;
+continue;
 
 case INDEX_op_dup_vec:
 if (arg_is_const(op->args[1])) {
 tmp = arg_info(op->args[1])->val;
 tmp = dup_const(TCGOP_VECE(op), tmp);
tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-break;
+continue;
 }
-goto do_default;
+break;
 
 case INDEX_op_dup2_vec:
 assert(TCG_TARGET_REG_BITS == 32);
@@ -1163,13 +1163,13 @@ void tcg_optimize(TCGContext *s)
+tcg_opt_gen_movi(s, &ctx, op, op->args[0],
  deposit64(arg_info(op->args[1])->val, 32, 32,
arg_info(op->args[2])->val));
-break;
+continue;
 } else if (args_are_copies(op->args[1], op->args[2])) {
 op->opc = INDEX_op_dup_vec;
 TCGOP_VECE(op) = MO_32;
 nb_iargs = 1;
 }
-goto do_default;
+break;
 
 CASE_OP_32_64(not):
 CASE_OP_32_64(neg):
@@ -1187,9 +1187,9 @@ void tcg_optimize(TCGContext *s)
 if (arg_is_const(op->args[1])) {
 tmp = do_constant_folding(opc, arg_info(op->args[1])->val, 0);
tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-break;
+continue;
 }
-goto do_default;
+break;
 
 CASE_OP_32_64(bswap16):
 CASE_OP_32_64(bswap32):
@@ -1198,9 +1198,9 @@ void tcg_optimize(TCGContext *s)
 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
   op->args[2]);
tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-break;
+continue;
 }
-goto do_default;
+break;
 
 CASE_OP_32_64(add):
 CASE_OP_32_64(sub):
@@ -1228,9 +1228,9 @@ void tcg_optimize(TCGContext *s)
 tmp = do_constant_folding(opc, arg_info(op->args[1])->val,
   arg_info(op->args[2])->val);
tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-break;
+continue;
 }
-goto do_default;
+break;
 
 CASE_OP_32_64(clz):
 CASE_OP_32_64(ctz):
@@ -1242,9 +1242,9 @@ void tcg_optimize(TCGContext *s)
 } else {
 tcg_opt_gen_mov(s, op, op->args[0], op->args[2]);
 }
-break;
+continue;
 }
-goto do_default;
+break;
 
 CASE_OP_32_64(deposit):
 if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
@@ -1252,27 +1252,27 @@ void tcg_optimize(TCGContext *s)
 op->args[3], op->args[4],
 arg_info(op->args[2])->val);
tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-break;
+continue;
 }
-goto do_default;
+break;
 
 CASE_OP_32_64(extract):
 if (arg_is_const(op->args[1])) {
 tmp = extract64(arg_info(op->args[1])->val,
 op->args[2], op->args[3]);
tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-break;
+continue;
 }
-goto do_default;
+break;
 
 CASE_OP_32_64(sextract):
 if (arg_is_const(op->args[1])) {
 tmp = sextract64(arg_info(op->args[1])->val,
  op->args[2], op->args[3]);
tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-break;
+continue;
 }
-goto do_default;
+break;
 
 CASE_OP_32_64(extract2):
 if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
@@ -1287,40 +1287,40 @@ void tcg_optimize(TCGContext *s)
 ((uint32_t)v2 << (32 - shr)));
 }
tcg_opt_gen_movi(s, &ctx, op, op->args[0], tmp);
-   

[PULL 02/56] host-utils: move checks out of divu128/divs128

2021-10-27 Thread Richard Henderson
From: Luis Pires 

In preparation for changing the divu128/divs128 implementations
to allow for quotients larger than 64 bits, move the div-by-zero
and overflow checks to the callers.
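
A sketch of the resulting calling convention (assumes a compiler with
__uint128_t; the divu128() body mirrors the CONFIG_INT128 variant in the
patch, and the caller-side test mirrors helper_divdeu() below):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    static void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
    {
        __uint128_t dividend = ((__uint128_t)*phigh << 64) | *plow;
        *plow = dividend / divisor;    /* truncated quotient */
        *phigh = dividend % divisor;   /* remainder */
    }

    int main(void)
    {
        uint64_t lo = 1000, hi = 0, d = 7;
        /* The checks now live in the caller. */
        if (d == 0 || hi >= d) {
            puts("undefined (div by zero or quotient > 64 bits)");
        } else {
            divu128(&lo, &hi, d);
            printf("q=%" PRIu64 " r=%" PRIu64 "\n", lo, hi);
        }
        return 0;
    }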

Signed-off-by: Luis Pires 
Reviewed-by: Richard Henderson 
Message-Id: <20211025191154.350831-2-luis.pi...@eldorado.org.br>
Signed-off-by: Richard Henderson 
---
 include/hw/clock.h|  5 +++--
 include/qemu/host-utils.h | 34 -
 target/ppc/int_helper.c   | 14 +-
 util/host-utils.c | 40 ++-
 4 files changed, 42 insertions(+), 51 deletions(-)

diff --git a/include/hw/clock.h b/include/hw/clock.h
index 11f67fb970..7443e6c4ab 100644
--- a/include/hw/clock.h
+++ b/include/hw/clock.h
@@ -324,8 +324,9 @@ static inline uint64_t clock_ns_to_ticks(const Clock *clk, uint64_t ns)
 return 0;
 }
 /*
- * Ignore divu128() return value as we've caught div-by-zero and don't
- * need different behaviour for overflow.
+ * BUG: when CONFIG_INT128 is not defined, the current implementation of
+ * divu128 does not return a valid truncated quotient, so the result will
+ * be wrong.
  */
 divu128(&lo, &hi, clk->period);
 return lo;
diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index ca9f3f021b..e82e6239af 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -52,36 +52,26 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 return (__int128_t)a * b / c;
 }
 
-static inline int divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
+static inline void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
 {
-if (divisor == 0) {
-return 1;
-} else {
-__uint128_t dividend = ((__uint128_t)*phigh << 64) | *plow;
-__uint128_t result = dividend / divisor;
-*plow = result;
-*phigh = dividend % divisor;
-return result > UINT64_MAX;
-}
+__uint128_t dividend = ((__uint128_t)*phigh << 64) | *plow;
+__uint128_t result = dividend / divisor;
+*plow = result;
+*phigh = dividend % divisor;
 }
 
-static inline int divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
+static inline void divs128(int64_t *plow, int64_t *phigh, int64_t divisor)
 {
-if (divisor == 0) {
-return 1;
-} else {
-__int128_t dividend = ((__int128_t)*phigh << 64) | (uint64_t)*plow;
-__int128_t result = dividend / divisor;
-*plow = result;
-*phigh = dividend % divisor;
-return result != *plow;
-}
+__int128_t dividend = ((__int128_t)*phigh << 64) | (uint64_t)*plow;
+__int128_t result = dividend / divisor;
+*plow = result;
+*phigh = dividend % divisor;
 }
 #else
 void muls64(uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b);
 void mulu64(uint64_t *plow, uint64_t *phigh, uint64_t a, uint64_t b);
-int divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
-int divs128(int64_t *plow, int64_t *phigh, int64_t divisor);
+void divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
+void divs128(int64_t *plow, int64_t *phigh, int64_t divisor);
 
 static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 {
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index f5dac3aa87..510faf24cf 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -104,10 +104,11 @@ uint64_t helper_divdeu(CPUPPCState *env, uint64_t ra, uint64_t rb, uint32_t oe)
 uint64_t rt = 0;
 int overflow = 0;
 
-overflow = divu128(&rt, &ra, rb);
-
-if (unlikely(overflow)) {
+if (unlikely(rb == 0 || ra >= rb)) {
+overflow = 1;
 rt = 0; /* Undefined */
+} else {
+divu128(&rt, &ra, rb);
 }
 
 if (oe) {
@@ -122,10 +123,13 @@ uint64_t helper_divde(CPUPPCState *env, uint64_t rau, uint64_t rbu, uint32_t oe)
 int64_t rt = 0;
 int64_t ra = (int64_t)rau;
 int64_t rb = (int64_t)rbu;
-int overflow = divs128(&rt, &ra, rb);
+int overflow = 0;
 
-if (unlikely(overflow)) {
+if (unlikely(rb == 0 || uabs64(ra) >= uabs64(rb))) {
+overflow = 1;
 rt = 0; /* Undefined */
+} else {
+divs128(&rt, &ra, rb);
 }
 
 if (oe) {
diff --git a/util/host-utils.c b/util/host-utils.c
index a789a11b46..701a371843 100644
--- a/util/host-utils.c
+++ b/util/host-utils.c
@@ -86,24 +86,23 @@ void muls64 (uint64_t *plow, uint64_t *phigh, int64_t a, int64_t b)
 *phigh = rh;
 }
 
-/* Unsigned 128x64 division.  Returns 1 if overflow (divide by zero or */
-/* quotient exceeds 64 bits).  Otherwise returns quotient via plow and */
-/* remainder via phigh. */
-int divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor)
+/*
+ * Unsigned 128-by-64 division. Returns quotient via plow and
+ * remainder via phigh.
+ * The result must fit in 64 bits (plow) - otherwise, the result
+ * is undefined.
+ * This function will cause a division by zero if passed a zero divisor.
+ */
+void 

[PULL 14/56] tcg/optimize: Drop nb_oargs, nb_iargs locals

2021-10-27 Thread Richard Henderson
Rather than try to keep these up-to-date across folding,
re-read nb_oargs at the end, after re-reading the opcode.

A couple of asserts need dropping, but that will take care
of itself as we split the function further.
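
The shape of the change, roughly (stand-in types and made-up table values;
not the real TCG definitions):

    typedef struct { int nb_oargs, nb_iargs; } TCGOpDef;
    typedef struct { int opc; } TCGOp;

    static const TCGOpDef defs[] = { { 1, 2 }, { 1, 4 } };  /* made up */

    static int output_count(const TCGOp *op)
    {
        /* Folding may have rewritten op->opc, so re-read the def here
         * instead of trusting counts cached before folding. */
        return defs[op->opc].nb_oargs;
    }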

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 74b9aa025a..77cdffaaef 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -675,7 +675,6 @@ void tcg_optimize(TCGContext *s)
 
 QTAILQ_FOREACH_SAFE(op, &s->ops, link, op_next) {
 uint64_t z_mask, partmask, affected, tmp;
-int nb_oargs, nb_iargs;
 TCGOpcode opc = op->opc;
 const TCGOpDef *def;
 
@@ -686,10 +685,8 @@ void tcg_optimize(TCGContext *s)
 }
 
 def = &tcg_op_defs[opc];
-nb_oargs = def->nb_oargs;
-nb_iargs = def->nb_iargs;
-init_arguments(&ctx, op, nb_oargs + nb_iargs);
-copy_propagate(&ctx, op, nb_oargs, nb_iargs);
+init_arguments(&ctx, op, def->nb_oargs + def->nb_iargs);
+copy_propagate(&ctx, op, def->nb_oargs, def->nb_iargs);
 
 /* For commutative operations make constant second argument */
 switch (opc) {
@@ -1063,7 +1060,7 @@ void tcg_optimize(TCGContext *s)
 
 CASE_OP_32_64(qemu_ld):
 {
-MemOpIdx oi = op->args[nb_oargs + nb_iargs];
+MemOpIdx oi = op->args[def->nb_oargs + def->nb_iargs];
 MemOp mop = get_memop(oi);
 if (!(mop & MO_SIGN)) {
 z_mask = (2ULL << ((8 << (mop & MO_SIZE)) - 1)) - 1;
@@ -1122,12 +1119,10 @@ void tcg_optimize(TCGContext *s)
 }
 
 if (partmask == 0) {
-tcg_debug_assert(nb_oargs == 1);
 tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
 continue;
 }
 if (affected == 0) {
-tcg_debug_assert(nb_oargs == 1);
 tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
 continue;
 }
@@ -1202,7 +1197,6 @@ void tcg_optimize(TCGContext *s)
 } else if (args_are_copies(op->args[1], op->args[2])) {
 op->opc = INDEX_op_dup_vec;
 TCGOP_VECE(op) = MO_32;
-nb_iargs = 1;
 }
 break;
 
@@ -1371,7 +1365,6 @@ void tcg_optimize(TCGContext *s)
 op->opc = opc = (opc == INDEX_op_movcond_i32
  ? INDEX_op_setcond_i32
  : INDEX_op_setcond_i64);
-nb_iargs = 2;
 }
 break;
 
@@ -1579,6 +1572,7 @@ void tcg_optimize(TCGContext *s)
 if (def->flags & TCG_OPF_BB_END) {
 memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
 } else {
+int nb_oargs = def->nb_oargs;
 for (i = 0; i < nb_oargs; i++) {
 reset_temp(op->args[i]);
 /* Save the corresponding known-zero bits mask for the
-- 
2.25.1




[PULL 18/56] tcg/optimize: Use a boolean to avoid a mass of continues

2021-10-27 Thread Richard Henderson
Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 368457f4a2..699476e2f1 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -713,6 +713,7 @@ void tcg_optimize(TCGContext *s)
 uint64_t z_mask, partmask, affected, tmp;
 TCGOpcode opc = op->opc;
 const TCGOpDef *def;
+bool done = false;
 
 /* Calls are special. */
 if (opc == INDEX_op_call) {
@@ -1212,8 +1213,8 @@ void tcg_optimize(TCGContext *s)
allocator where needed and possible.  Also detect copies. */
 switch (opc) {
 CASE_OP_32_64_VEC(mov):
-tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
-continue;
+done = tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
+break;
 
 case INDEX_op_dup_vec:
 if (arg_is_const(op->args[1])) {
@@ -1602,7 +1603,9 @@ void tcg_optimize(TCGContext *s)
 break;
 }
 
-finish_folding(&ctx, op);
+if (!done) {
+finish_folding(&ctx, op);
+}
 
 /* Eliminate duplicate and redundant fence instructions.  */
 if (ctx.prev_mb) {
-- 
2.25.1




[PULL 05/56] host-utils: add unit tests for divu128/divs128

2021-10-27 Thread Richard Henderson
From: Luis Pires 

Signed-off-by: Luis Pires 
Reviewed-by: Richard Henderson 
Message-Id: <20211025191154.350831-5-luis.pi...@eldorado.org.br>
Signed-off-by: Richard Henderson 
---
 tests/unit/test-div128.c | 197 +++
 tests/unit/meson.build   |   1 +
 2 files changed, 198 insertions(+)
 create mode 100644 tests/unit/test-div128.c

diff --git a/tests/unit/test-div128.c b/tests/unit/test-div128.c
new file mode 100644
index 00..0bc25fe4a8
--- /dev/null
+++ b/tests/unit/test-div128.c
@@ -0,0 +1,197 @@
+/*
+ * Test 128-bit division functions
+ *
+ * Copyright (c) 2021 Instituto de Pesquisas Eldorado (eldorado.org.br)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+
+typedef struct {
+uint64_t high;
+uint64_t low;
+uint64_t rhigh;
+uint64_t rlow;
+uint64_t divisor;
+uint64_t remainder;
+} test_data_unsigned;
+
+typedef struct {
+int64_t high;
+uint64_t low;
+int64_t rhigh;
+uint64_t rlow;
+int64_t divisor;
+int64_t remainder;
+} test_data_signed;
+
+static const test_data_unsigned test_table_unsigned[] = {
+/* Dividend fits in 64 bits */
+{ 0x0000000000000000ULL, 0x0000000000000000ULL,
+  0x0000000000000000ULL, 0x0000000000000000ULL,
+  0x0000000000000001ULL, 0x0000000000000000ULL},
+{ 0x0000000000000000ULL, 0x0000000000000001ULL,
+  0x0000000000000000ULL, 0x0000000000000001ULL,
+  0x0000000000000001ULL, 0x0000000000000000ULL},
+{ 0x0000000000000000ULL, 0x0000000000000003ULL,
+  0x0000000000000000ULL, 0x0000000000000001ULL,
+  0x0000000000000002ULL, 0x0000000000000001ULL},
+{ 0x0000000000000000ULL, 0x8000000000000000ULL,
+  0x0000000000000000ULL, 0x8000000000000000ULL,
+  0x0000000000000001ULL, 0x0000000000000000ULL},
+{ 0x0000000000000000ULL, 0xa000000000000000ULL,
+  0x0000000000000000ULL, 0x0000000000000002ULL,
+  0x4000000000000000ULL, 0x2000000000000000ULL},
+{ 0x0000000000000000ULL, 0x8000000000000000ULL,
+  0x0000000000000000ULL, 0x0000000000000001ULL,
+  0x8000000000000000ULL, 0x0000000000000000ULL},
+
+/* Dividend > 64 bits, with MSB 0 */
+{ 0x123456789abcdefeULL, 0xefedcba987654321ULL,
+  0x123456789abcdefeULL, 0xefedcba987654321ULL,
+  0x0000000000000001ULL, 0x0000000000000000ULL},
+{ 0x123456789abcdefeULL, 0xefedcba987654321ULL,
+  0x0000000000000001ULL, 0x000000000000000dULL,
+  0x123456789abcdefeULL, 0x03456789abcdf03bULL},
+{ 0x123456789abcdefeULL, 0xefedcba987654321ULL,
+  0x0123456789abcdefULL, 0xeefedcba98765432ULL,
+  0x0000000000000010ULL, 0x0000000000000001ULL},
+
+/* Dividend > 64 bits, with MSB 1 */
+{ 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+  0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+  0x0000000000000001ULL, 0x0000000000000000ULL},
+{ 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+  0x0000000000000001ULL, 0x0000000000000000ULL,
+  0xfeeddccbbaa99887ULL, 0x766554433221100fULL},
+{ 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+  0x0feeddccbbaa9988ULL, 0x7766554433221100ULL,
+  0x0000000000000010ULL, 0x000000000000000fULL},
+{ 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+  0x000000000000000eULL, 0x00f0f0f0f0f0f35aULL,
+  0x123456789abcdefeULL, 0x0f8922bc55ef90c3ULL},
+
+/**
+ * Divisor == 64 bits, with MSB 1
+ * and high 64 bits of dividend >= divisor
+ * (for testing normalization)
+ */
+{ 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+  0x0000000000000001ULL, 0x0000000000000000ULL,
+  0xfeeddccbbaa99887ULL, 0x766554433221100fULL},
+{ 0xfeeddccbbaa99887ULL, 0x766554433221100fULL,
+  0x0000000000000001ULL, 0xfddbb9977553310aULL,
+  0x8000000000000001ULL, 0x78899aabbccddf05ULL},
+
+/* Dividend > 64 bits, divisor almost as big */
+{ 0x0000000000000001ULL, 0x23456789abcdef01ULL,
+  0x0000000000000000ULL, 0x000000000000000fULL,
+  0x123456789abcdefeULL, 0x123456789abcde1fULL},
+};
+
+static const test_data_signed test_table_signed[] = {
+/* Positive dividend, positive/negative divisors */
+{ 0x0000000000000000LL, 0x0000000000bc614eULL,
+  0x0000000000000000LL, 0x0000000000bc614eULL,
+  0x0000000000000001LL, 0x0000000000000000LL},
+{ 0x0000000000000000LL, 

[PULL 13/56] tcg/optimize: Split out fold_call

2021-10-27 Thread Richard Henderson
Calls are special in that they have a variable number
of arguments, and need to be able to clobber globals.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 63 --
 1 file changed, 41 insertions(+), 22 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index fad6f5de1f..74b9aa025a 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -624,10 +624,42 @@ static void copy_propagate(OptContext *ctx, TCGOp *op,
 }
 }
 
+static bool fold_call(OptContext *ctx, TCGOp *op)
+{
+TCGContext *s = ctx->tcg;
+int nb_oargs = TCGOP_CALLO(op);
+int nb_iargs = TCGOP_CALLI(op);
+int flags, i;
+
+init_arguments(ctx, op, nb_oargs + nb_iargs);
+copy_propagate(ctx, op, nb_oargs, nb_iargs);
+
+/* If the function reads or writes globals, reset temp data. */
+flags = tcg_call_flags(op);
+if (!(flags & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
+int nb_globals = s->nb_globals;
+
+for (i = 0; i < nb_globals; i++) {
+if (test_bit(i, ctx->temps_used.l)) {
+reset_ts(&ctx->tcg->temps[i]);
+}
+}
+}
+
+/* Reset temp data for outputs. */
+for (i = 0; i < nb_oargs; i++) {
+reset_temp(op->args[i]);
+}
+
+/* Stop optimizing MB across calls. */
+ctx->prev_mb = NULL;
+return true;
+}
+
 /* Propagate constants and copies, fold constant expressions. */
 void tcg_optimize(TCGContext *s)
 {
-int nb_temps, nb_globals, i;
+int nb_temps, i;
 TCGOp *op, *op_next;
 OptContext ctx = { .tcg = s };
 
@@ -637,8 +669,6 @@ void tcg_optimize(TCGContext *s)
available through the doubly linked circular list. */
 
 nb_temps = s->nb_temps;
-nb_globals = s->nb_globals;
-
 for (i = 0; i < nb_temps; ++i) {
 s->temps[i].state_ptr = NULL;
 }
@@ -647,17 +677,17 @@ void tcg_optimize(TCGContext *s)
 uint64_t z_mask, partmask, affected, tmp;
 int nb_oargs, nb_iargs;
 TCGOpcode opc = op->opc;
-const TCGOpDef *def = &tcg_op_defs[opc];
+const TCGOpDef *def;
 
-/* Count the arguments, and initialize the temps that are
-   going to be used */
+/* Calls are special. */
 if (opc == INDEX_op_call) {
-nb_oargs = TCGOP_CALLO(op);
-nb_iargs = TCGOP_CALLI(op);
-} else {
-nb_oargs = def->nb_oargs;
-nb_iargs = def->nb_iargs;
+fold_call(&ctx, op);
+continue;
 }
+
+def = &tcg_op_defs[opc];
+nb_oargs = def->nb_oargs;
+nb_iargs = def->nb_iargs;
 init_arguments(&ctx, op, nb_oargs + nb_iargs);
 copy_propagate(&ctx, op, nb_oargs, nb_iargs);
 
@@ -1549,16 +1579,6 @@ void tcg_optimize(TCGContext *s)
 if (def->flags & TCG_OPF_BB_END) {
 memset(&ctx.temps_used, 0, sizeof(ctx.temps_used));
 } else {
-if (opc == INDEX_op_call &&
-!(tcg_call_flags(op)
-  & (TCG_CALL_NO_READ_GLOBALS | TCG_CALL_NO_WRITE_GLOBALS))) {
-for (i = 0; i < nb_globals; i++) {
-if (test_bit(i, ctx.temps_used.l)) {
-reset_ts(&s->temps[i]);
-}
-}
-}
-
 for (i = 0; i < nb_oargs; i++) {
 reset_temp(op->args[i]);
 /* Save the corresponding known-zero bits mask for the
@@ -1599,7 +1619,6 @@ void tcg_optimize(TCGContext *s)
 case INDEX_op_qemu_st_i32:
 case INDEX_op_qemu_st8_i32:
 case INDEX_op_qemu_st_i64:
-case INDEX_op_call:
 /* Opcodes that touch guest memory stop the optimization.  */
 ctx.prev_mb = NULL;
 break;
-- 
2.25.1




[PULL 16/56] tcg/optimize: Return true from tcg_opt_gen_{mov,movi}

2021-10-27 Thread Richard Henderson
This will allow callers to tail call these functions
and return true, indicating processing is complete.
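
A small sketch of the tail-call shape this enables (hypothetical names,
not QEMU code):

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in for tcg_opt_gen_movi(): returns true for "op handled". */
    static bool gen_movi(int dst, long val)
    {
        printf("movi r%d, %ld\n", dst, val);
        return true;
    }

    /* A fold function can now report completion in one statement. */
    static bool fold_example(int dst, long a, long b)
    {
        if (a == 0 || b == 0) {
            return gen_movi(dst, 0);   /* tail call, done */
        }
        return false;                  /* caller keeps folding */
    }

    int main(void)
    {
        printf("%d\n", fold_example(0, 3, 0));
        return 0;
    }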

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 19c01687b4..066e635f73 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -180,7 +180,7 @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2)
 return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
 }
 
-static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
+static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 {
 TCGTemp *dst_ts = arg_temp(dst);
 TCGTemp *src_ts = arg_temp(src);
@@ -192,7 +192,7 @@ static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 
 if (ts_are_copies(dst_ts, src_ts)) {
 tcg_op_remove(ctx->tcg, op);
-return;
+return true;
 }
 
 reset_ts(dst_ts);
@@ -228,9 +228,10 @@ static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 di->is_const = si->is_const;
 di->val = si->val;
 }
+return true;
 }
 
-static void tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
+static bool tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
  TCGArg dst, uint64_t val)
 {
 const TCGOpDef *def = &tcg_op_defs[op->opc];
@@ -248,7 +249,7 @@ static void tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
 /* Convert movi to mov with constant temp. */
 tv = tcg_constant_internal(type, val);
 init_ts_info(ctx, tv);
-tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
+return tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
 }
 
 static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
-- 
2.25.1




[PULL 01/56] qemu/int128: Add int128_{not,xor}

2021-10-27 Thread Richard Henderson
From: Frédéric Pétrot 

Addition of not and xor on 128-bit integers.
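
A possible use, sketched here (int128_eqv is hypothetical and not part of
the patch; assumes include/qemu/int128.h is on the include path): derived
bitwise operations now compose from the new primitives, e.g. bitwise
equivalence as not(x xor y):

    #include "qemu/int128.h"

    static inline Int128 int128_eqv(Int128 a, Int128 b)
    {
        return int128_not(int128_xor(a, b));
    }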

Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
Message-Id: <20211025122818.168890-3-frederic.pet...@univ-grenoble-alpes.fr>
[rth: Split out logical operations.]
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
 include/qemu/int128.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index 2ac0746426..b6d517aea4 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -58,6 +58,11 @@ static inline Int128 int128_exts64(int64_t a)
 return a;
 }
 
+static inline Int128 int128_not(Int128 a)
+{
+return ~a;
+}
+
 static inline Int128 int128_and(Int128 a, Int128 b)
 {
 return a & b;
@@ -68,6 +73,11 @@ static inline Int128 int128_or(Int128 a, Int128 b)
 return a | b;
 }
 
+static inline Int128 int128_xor(Int128 a, Int128 b)
+{
+return a ^ b;
+}
+
 static inline Int128 int128_rshift(Int128 a, int n)
 {
 return a >> n;
@@ -235,6 +245,11 @@ static inline Int128 int128_exts64(int64_t a)
 return int128_make128(a, (a < 0) ? -1 : 0);
 }
 
+static inline Int128 int128_not(Int128 a)
+{
+return int128_make128(~a.lo, ~a.hi);
+}
+
 static inline Int128 int128_and(Int128 a, Int128 b)
 {
 return int128_make128(a.lo & b.lo, a.hi & b.hi);
@@ -245,6 +260,11 @@ static inline Int128 int128_or(Int128 a, Int128 b)
 return int128_make128(a.lo | b.lo, a.hi | b.hi);
 }
 
+static inline Int128 int128_xor(Int128 a, Int128 b)
+{
+return int128_make128(a.lo ^ b.lo, a.hi ^ b.hi);
+}
+
 static inline Int128 int128_rshift(Int128 a, int n)
 {
 int64_t h;
-- 
2.25.1




[PULL 09/56] tcg/optimize: Change tcg_opt_gen_{mov,movi} interface

2021-10-27 Thread Richard Henderson
Adjust the interface to take the OptContext parameter instead
of TCGContext or both.

Reviewed-by: Alex Bennée 
Reviewed-by: Luis Pires 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 67 +-
 1 file changed, 34 insertions(+), 33 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index a37efff4d0..627a5b39f6 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -45,6 +45,7 @@ typedef struct TempOptInfo {
 } TempOptInfo;
 
 typedef struct OptContext {
+TCGContext *tcg;
 TCGTempSet temps_used;
 } OptContext;
 
@@ -183,7 +184,7 @@ static bool args_are_copies(TCGArg arg1, TCGArg arg2)
 return ts_are_copies(arg_temp(arg1), arg_temp(arg2));
 }
 
-static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
+static void tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src)
 {
 TCGTemp *dst_ts = arg_temp(dst);
 TCGTemp *src_ts = arg_temp(src);
@@ -194,7 +195,7 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
 TCGOpcode new_op;
 
 if (ts_are_copies(dst_ts, src_ts)) {
-tcg_op_remove(s, op);
+tcg_op_remove(ctx->tcg, op);
 return;
 }
 
@@ -233,8 +234,8 @@ static void tcg_opt_gen_mov(TCGContext *s, TCGOp *op, TCGArg dst, TCGArg src)
 }
 }
 
-static void tcg_opt_gen_movi(TCGContext *s, OptContext *ctx,
- TCGOp *op, TCGArg dst, uint64_t val)
+static void tcg_opt_gen_movi(OptContext *ctx, TCGOp *op,
+ TCGArg dst, uint64_t val)
 {
 const TCGOpDef *def = &tcg_op_defs[op->opc];
 TCGType type;
@@ -251,7 +252,7 @@ static void tcg_opt_gen_movi(TCGContext *s, OptContext *ctx,
 /* Convert movi to mov with constant temp. */
 tv = tcg_constant_internal(type, val);
 init_ts_info(ctx, tv);
-tcg_opt_gen_mov(s, op, dst, temp_arg(tv));
+tcg_opt_gen_mov(ctx, op, dst, temp_arg(tv));
 }
 
 static uint64_t do_constant_folding_2(TCGOpcode op, uint64_t x, uint64_t y)
@@ -609,7 +610,7 @@ void tcg_optimize(TCGContext *s)
 {
 int nb_temps, nb_globals, i;
 TCGOp *op, *op_next, *prev_mb = NULL;
-OptContext ctx = {};
+OptContext ctx = { .tcg = s };
 
 /* Array VALS has an element for each temp.
If this temp holds a constant then its value is kept in VALS' element.
@@ -723,7 +724,7 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(rotr):
 if (arg_is_const(op->args[1])
 && arg_info(op->args[1])->val == 0) {
-tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
 continue;
 }
 break;
@@ -838,7 +839,7 @@ void tcg_optimize(TCGContext *s)
 if (!arg_is_const(op->args[1])
 && arg_is_const(op->args[2])
 && arg_info(op->args[2])->val == 0) {
-tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
+tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
 continue;
 }
 break;
@@ -848,7 +849,7 @@ void tcg_optimize(TCGContext *s)
 if (!arg_is_const(op->args[1])
 && arg_is_const(op->args[2])
 && arg_info(op->args[2])->val == -1) {
-tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
+tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
 continue;
 }
 break;
@@ -1088,12 +1089,12 @@ void tcg_optimize(TCGContext *s)
 
 if (partmask == 0) {
 tcg_debug_assert(nb_oargs == 1);
-tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
 continue;
 }
 if (affected == 0) {
 tcg_debug_assert(nb_oargs == 1);
-tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
+tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
 continue;
 }
 
@@ -1105,7 +1106,7 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(mulsh):
 if (arg_is_const(op->args[2])
 && arg_info(op->args[2])->val == 0) {
-tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+tcg_opt_gen_movi(&ctx, op, op->args[0], 0);
 continue;
 }
 break;
@@ -1118,7 +1119,7 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64_VEC(or):
 CASE_OP_32_64_VEC(and):
 if (args_are_copies(op->args[1], op->args[2])) {
-tcg_opt_gen_mov(s, op, op->args[0], op->args[1]);
+tcg_opt_gen_mov(&ctx, op, op->args[0], op->args[1]);
 continue;
 }
 break;
@@ -1132,7 +1133,7 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64_VEC(sub):
 CASE_OP_32_64_VEC(xor):
 if (args_are_copies(op->args[1], op->args[2])) {
-tcg_opt_gen_movi(s, &ctx, op, op->args[0], 0);
+ 

[PULL 03/56] host-utils: move udiv_qrnnd() to host-utils

2021-10-27 Thread Richard Henderson
From: Luis Pires 

Move udiv_qrnnd() from include/fpu/softfloat-macros.h to host-utils,
so it can be reused by divu128().

Signed-off-by: Luis Pires 
Reviewed-by: Richard Henderson 
Message-Id: <20211025191154.350831-3-luis.pi...@eldorado.org.br>
Signed-off-by: Richard Henderson 
---
 include/fpu/softfloat-macros.h | 82 --
 include/qemu/host-utils.h  | 81 +
 2 files changed, 81 insertions(+), 82 deletions(-)

diff --git a/include/fpu/softfloat-macros.h b/include/fpu/softfloat-macros.h
index 81c3fe8256..f35cdbfa63 100644
--- a/include/fpu/softfloat-macros.h
+++ b/include/fpu/softfloat-macros.h
@@ -8,7 +8,6 @@
  * so some portions are provided under:
  *  the SoftFloat-2a license
  *  the BSD license
- *  GPL-v2-or-later
  *
  * Any future contributions to this file after December 1st 2014 will be
  * taken to be licensed under the Softfloat-2a license unless specifically
@@ -75,10 +74,6 @@ this code that are retained.
  * THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-/* Portions of this work are licensed under the terms of the GNU GPL,
- * version 2 or later. See the COPYING file in the top-level directory.
- */
-
 #ifndef FPU_SOFTFLOAT_MACROS_H
 #define FPU_SOFTFLOAT_MACROS_H
 
@@ -585,83 +580,6 @@ static inline uint64_t estimateDiv128To64(uint64_t a0, uint64_t a1, uint64_t b)
 
 }
 
-/* From the GNU Multi Precision Library - longlong.h __udiv_qrnnd
- * (https://gmplib.org/repo/gmp/file/tip/longlong.h)
- *
- * Licensed under the GPLv2/LGPLv3
- */
-static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
-  uint64_t n0, uint64_t d)
-{
-#if defined(__x86_64__)
-uint64_t q;
-asm("divq %4" : "=a"(q), "=d"(*r) : "0"(n0), "1"(n1), "rm"(d));
-return q;
-#elif defined(__s390x__) && !defined(__clang__)
-/* Need to use a TImode type to get an even register pair for DLGR.  */
-unsigned __int128 n = (unsigned __int128)n1 << 64 | n0;
-asm("dlgr %0, %1" : "+r"(n) : "r"(d));
-*r = n >> 64;
-return n;
-#elif defined(_ARCH_PPC64) && defined(_ARCH_PWR7)
-/* From Power ISA 2.06, programming note for divdeu.  */
-uint64_t q1, q2, Q, r1, r2, R;
-asm("divdeu %0,%2,%4; divdu %1,%3,%4"
-: "="(q1), "=r"(q2)
-: "r"(n1), "r"(n0), "r"(d));
-r1 = -(q1 * d); /* low part of (n1<<64) - (q1 * d) */
-r2 = n0 - (q2 * d);
-Q = q1 + q2;
-R = r1 + r2;
-if (R >= d || R < r2) { /* overflow implies R > d */
-Q += 1;
-R -= d;
-}
-*r = R;
-return Q;
-#else
-uint64_t d0, d1, q0, q1, r1, r0, m;
-
-d0 = (uint32_t)d;
-d1 = d >> 32;
-
-r1 = n1 % d1;
-q1 = n1 / d1;
-m = q1 * d0;
-r1 = (r1 << 32) | (n0 >> 32);
-if (r1 < m) {
-q1 -= 1;
-r1 += d;
-if (r1 >= d) {
-if (r1 < m) {
-q1 -= 1;
-r1 += d;
-}
-}
-}
-r1 -= m;
-
-r0 = r1 % d1;
-q0 = r1 / d1;
-m = q0 * d0;
-r0 = (r0 << 32) | (uint32_t)n0;
-if (r0 < m) {
-q0 -= 1;
-r0 += d;
-if (r0 >= d) {
-if (r0 < m) {
-q0 -= 1;
-r0 += d;
-}
-}
-}
-r0 -= m;
-
-*r = r0;
-return (q1 << 32) | q0;
-#endif
-}
-
 /*
 | Returns an approximation to the square root of the 32-bit significand given
 | by `a'.  Considered as an integer, `a' must be at least 2^31.  If bit 0 of
diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index e82e6239af..08a17e16e5 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -23,6 +23,10 @@
  * THE SOFTWARE.
  */
 
+/* Portions of this work are licensed under the terms of the GNU GPL,
+ * version 2 or later. See the COPYING file in the top-level directory.
+ */
+
 #ifndef HOST_UTILS_H
 #define HOST_UTILS_H
 
@@ -726,4 +730,81 @@ void urshift(uint64_t *plow, uint64_t *phigh, int32_t shift);
  */
 void ulshift(uint64_t *plow, uint64_t *phigh, int32_t shift, bool *overflow);
 
+/* From the GNU Multi Precision Library - longlong.h __udiv_qrnnd
+ * (https://gmplib.org/repo/gmp/file/tip/longlong.h)
+ *
+ * Licensed under the GPLv2/LGPLv3
+ */
+static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1,
+  uint64_t n0, uint64_t d)
+{
+#if defined(__x86_64__)
+uint64_t q;
+asm("divq %4" : "=a"(q), "=d"(*r) : "0"(n0), "1"(n1), "rm"(d));
+return q;
+#elif defined(__s390x__) && !defined(__clang__)
+/* Need to use a TImode type to get an even register pair for DLGR.  */
+unsigned __int128 n = (unsigned __int128)n1 << 64 | n0;
+asm("dlgr %0, %1" : "+r"(n) : "r"(d));
+*r = n >> 64;
+return n;
+#elif defined(_ARCH_PPC64) && defined(_ARCH_PWR7)
+/* From Power ISA 2.06, programming note for divdeu.  */
+uint64_t q1, q2, Q, r1, r2, R;
+asm("divdeu %0,%2,%4; divdu %1,%3,%4"

[PULL 00/56] tcg patch queue

2021-10-27 Thread Richard Henderson
The following changes since commit c52d69e7dbaaed0ffdef8125e79218672c30161d:

  Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20211027' 
into staging (2021-10-27 11:45:18 -0700)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20211027

for you to fetch changes up to 820c025f0dcacf2f3c12735b1f162893fbfa7bc6:

  tcg/optimize: Propagate sign info for shifting (2021-10-27 17:11:23 -0700)


Improvements to qemu/int128
Fixes for 128/64 division.
Cleanup tcg/optimize.c
Optimize redundant sign extensions


Frédéric Pétrot (1):
  qemu/int128: Add int128_{not,xor}

Luis Pires (4):
  host-utils: move checks out of divu128/divs128
  host-utils: move udiv_qrnnd() to host-utils
  host-utils: add 128-bit quotient support to divu128/divs128
  host-utils: add unit tests for divu128/divs128

Richard Henderson (51):
  tcg/optimize: Rename "mask" to "z_mask"
  tcg/optimize: Split out OptContext
  tcg/optimize: Remove do_default label
  tcg/optimize: Change tcg_opt_gen_{mov,movi} interface
  tcg/optimize: Move prev_mb into OptContext
  tcg/optimize: Split out init_arguments
  tcg/optimize: Split out copy_propagate
  tcg/optimize: Split out fold_call
  tcg/optimize: Drop nb_oargs, nb_iargs locals
  tcg/optimize: Change fail return for do_constant_folding_cond*
  tcg/optimize: Return true from tcg_opt_gen_{mov,movi}
  tcg/optimize: Split out finish_folding
  tcg/optimize: Use a boolean to avoid a mass of continues
  tcg/optimize: Split out fold_mb, fold_qemu_{ld,st}
  tcg/optimize: Split out fold_const{1,2}
  tcg/optimize: Split out fold_setcond2
  tcg/optimize: Split out fold_brcond2
  tcg/optimize: Split out fold_brcond
  tcg/optimize: Split out fold_setcond
  tcg/optimize: Split out fold_mulu2_i32
  tcg/optimize: Split out fold_addsub2_i32
  tcg/optimize: Split out fold_movcond
  tcg/optimize: Split out fold_extract2
  tcg/optimize: Split out fold_extract, fold_sextract
  tcg/optimize: Split out fold_deposit
  tcg/optimize: Split out fold_count_zeros
  tcg/optimize: Split out fold_bswap
  tcg/optimize: Split out fold_dup, fold_dup2
  tcg/optimize: Split out fold_mov
  tcg/optimize: Split out fold_xx_to_i
  tcg/optimize: Split out fold_xx_to_x
  tcg/optimize: Split out fold_xi_to_i
  tcg/optimize: Add type to OptContext
  tcg/optimize: Split out fold_to_not
  tcg/optimize: Split out fold_sub_to_neg
  tcg/optimize: Split out fold_xi_to_x
  tcg/optimize: Split out fold_ix_to_i
  tcg/optimize: Split out fold_masks
  tcg/optimize: Expand fold_mulu2_i32 to all 4-arg multiplies
  tcg/optimize: Expand fold_addsub2_i32 to 64-bit ops
  tcg/optimize: Sink commutative operand swapping into fold functions
  tcg/optimize: Stop forcing z_mask to "garbage" for 32-bit values
  tcg/optimize: Use fold_xx_to_i for orc
  tcg/optimize: Use fold_xi_to_x for mul
  tcg/optimize: Use fold_xi_to_x for div
  tcg/optimize: Use fold_xx_to_i for rem
  tcg/optimize: Optimize sign extensions
  tcg/optimize: Propagate sign info for logical operations
  tcg/optimize: Propagate sign info for setcond
  tcg/optimize: Propagate sign info for bit counting
  tcg/optimize: Propagate sign info for shifting

 include/fpu/softfloat-macros.h |   82 --
 include/hw/clock.h |5 +-
 include/qemu/host-utils.h  |  121 +-
 include/qemu/int128.h  |   20 +
 target/ppc/int_helper.c|   23 +-
 tcg/optimize.c | 2644 
 tests/unit/test-div128.c   |  197 +++
 util/host-utils.c  |  147 ++-
 tests/unit/meson.build |1 +
 9 files changed, 2053 insertions(+), 1187 deletions(-)
 create mode 100644 tests/unit/test-div128.c



Re: [PATCH v2 0/2] mconfigptr support

2021-10-27 Thread Rahul Pathak
On Wed, Oct 27, 2021 at 8:14 AM Alistair Francis wrote:

> On Mon, Oct 25, 2021 at 10:51 PM Rahul Pathak wrote:
> >
> > These patches add mconfigptr CSR support.
> > mconfigptr is newly incorporated in version 1.12 of the RISC-V
> > privileged architecture specification.
> > A priv spec version 1.12.0 check is also added.
> >
> >
> > qemu-system-riscv64 -nographic -machine virt -cpu rv64,priv_spec=v1.12.0
>
> Thanks for the patches!
>
> I gave some comments in line with the code changes. Overall this looks
> good, we just need to add the other v1.12.0 features.
>
> Alistair


Thanks Alistair, I will work on the comments and send the
next version.

>
>
>
> > Changelog:
> >
> > v1->v2
> > --
> > 1. Added privileged architecture spec version 1.12 ("v1.12.0") check
> > 2. Added predicate function for mconfigptr which verifies
> > for priv spec version v1.12.0 or higher.
> >
> > Thanks
> > Rahul
> >
> > Rahul Pathak (2):
> >   target/riscv: Add priv spec 1.12.0 version check
> >   target/riscv: csr: Implement mconfigptr CSR
> >
> >  target/riscv/cpu.c  |  4 +++-
> >  target/riscv/cpu.h  |  1 +
> >  target/riscv/cpu_bits.h |  1 +
> >  target/riscv/csr.c  | 19 +++
> >  4 files changed, 20 insertions(+), 5 deletions(-)
> >
> > --
> > 2.25.1
> >
> >
>


Re: [PATCH v2 1/2] target/riscv: Add priv spec 1.12.0 version check

2021-10-27 Thread Rahul Pathak
On Wed, Oct 27, 2021 at 8:08 AM Alistair Francis wrote:

> On Mon, Oct 25, 2021 at 10:55 PM Rahul Pathak wrote:
> >
> > Signed-off-by: Rahul Pathak 
> > ---
> >  target/riscv/cpu.c | 4 +++-
> >  target/riscv/cpu.h | 1 +
> >  2 files changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index 788fa0b11c..83c3814a5a 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -405,7 +405,9 @@ static void riscv_cpu_realize(DeviceState *dev,
> Error **errp)
> >  }
> >
> >  if (cpu->cfg.priv_spec) {
> > -if (!g_strcmp0(cpu->cfg.priv_spec, "v1.11.0")) {
> > +if (!g_strcmp0(cpu->cfg.priv_spec, "v1.12.0")) {
> > +priv_version = PRIV_VERSION_1_12_0;
> > +} else if (!g_strcmp0(cpu->cfg.priv_spec, "v1.11.0")) {
>
> This change, actually allowing the user to enable the spec, should be
> in a separate patch at the end of the series.
>
> The idea is to add the feature, then expose it.
>

Sure, will change in the next version

>
> Alistair
>
>
> >  priv_version = PRIV_VERSION_1_11_0;
> >  } else if (!g_strcmp0(cpu->cfg.priv_spec, "v1.10.0")) {
> >  priv_version = PRIV_VERSION_1_10_0;
> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > index a33dc30be8..67c52e6f9e 100644
> > --- a/target/riscv/cpu.h
> > +++ b/target/riscv/cpu.h
> > @@ -79,6 +79,7 @@ enum {
> >
> >  #define PRIV_VERSION_1_10_0 0x00011000
> >  #define PRIV_VERSION_1_11_0 0x00011100
> > +#define PRIV_VERSION_1_12_0 0x00011200
> >
> >  #define VEXT_VERSION_0_07_1 0x00000701
> >
> > --
> > 2.25.1
> >
> >
>


Re: [PATCH v2 2/2] target/riscv: csr: Implement mconfigptr CSR

2021-10-27 Thread Rahul Pathak
On Wed, Oct 27, 2021 at 8:13 AM Alistair Francis wrote:

> On Mon, Oct 25, 2021 at 10:55 PM Rahul Pathak wrote:
> >
> > Signed-off-by: Rahul Pathak 
> > ---
> >  target/riscv/cpu_bits.h |  1 +
> >  target/riscv/csr.c  | 19 +++
> >  2 files changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> > index cffcd3a5df..e2f154b7c5 100644
> > --- a/target/riscv/cpu_bits.h
> > +++ b/target/riscv/cpu_bits.h
> > @@ -140,6 +140,7 @@
> >  #define CSR_MARCHID 0xf12
> >  #define CSR_MIMPID  0xf13
> >  #define CSR_MHARTID 0xf14
> > +#define CSR_MCONFIGPTR  0xf15
> >
> >  /* Machine Trap Setup */
> >  #define CSR_MSTATUS 0x300
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index 69e4d65fcd..2d7f608d49 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -209,6 +209,16 @@ static RISCVException epmp(CPURISCVState *env, int
> csrno)
> >
> >  return RISCV_EXCP_ILLEGAL_INST;
> >  }
> > +
> > +static RISCVException priv1p12(CPURISCVState *env, int csrno)
> > +{
> > +   if (env->priv_ver >= PRIV_VERSION_1_12_0) {
> > +   return RISCV_EXCP_NONE;
> > +   }
> > +
> > +   return RISCV_EXCP_ILLEGAL_INST;
> > +}
> > +
> >  #endif
> >
> >  /* User Floating-Point CSRs */
> > @@ -1569,10 +1579,11 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
> >  [CSR_MINSTRETH] = { "minstreth", any32, read_instreth },
> >
> >  /* Machine Information Registers */
> > -[CSR_MVENDORID] = { "mvendorid", any,   read_zero},
> > -[CSR_MARCHID]   = { "marchid",   any,   read_zero},
> > -[CSR_MIMPID]= { "mimpid",any,   read_zero},
> > -[CSR_MHARTID]   = { "mhartid",   any,   read_mhartid },
> > +[CSR_MVENDORID] = { "mvendorid",   any,read_zero},
> > +[CSR_MARCHID]   = { "marchid", any,read_zero},
> > +[CSR_MIMPID]= { "mimpid",  any,read_zero},
> > +[CSR_MHARTID]   = { "mhartid", any,read_mhartid },
>
> Why change these?
>
The alignment of all structure entries is consistent within their
respective blocks, which is why I aligned these with the mconfigptr line.
It's not strictly necessary; if alignment isn't actually a requirement,
I will undo this.



>
>
>
> > +[CSR_MCONFIGPTR] = {"mconfigptr",  priv1p12,   read_zero},
>
> This looks fine, but there are more changes than this in v1.12.
> Looking at the preface we need mret/sret changes at least. It also
> looks like some other changes will need to be implemented or at least
> checked.
>
>
Agree, I will look into that


> Alistair
>
> >
> >  /* Machine Trap Setup */
> >  [CSR_MSTATUS] = { "mstatus",any,   read_mstatus,
>  write_mstatus },
> > --
> > 2.25.1
> >
> >
>


Re: [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci

2021-10-27 Thread Peter Xu
On Wed, Oct 27, 2021 at 04:30:18PM +0800, Peter Xu wrote:
> On Tue, Oct 26, 2021 at 05:11:39PM +0200, Igor Mammedov wrote:
> > On Fri, 22 Oct 2021 10:14:29 +0800
> > Peter Xu  wrote:
> > 
> > > Hi, Alex,
> > > 
> > > On Thu, Oct 21, 2021 at 04:30:39PM -0600, Alex Williamson wrote:
> > > > On Thu, 21 Oct 2021 18:42:59 +0800
> > > > Peter Xu  wrote:
> > > >   
> > > > > Scan the pci bus to make sure there's no vfio-pci device attached 
> > > > > before vIOMMU
> > > > > is realized.  
> > > > 
> > > > Sorry, I'm not onboard with this solution at all.
> > > > 
> > > > It would be really useful though if this commit log or a code comment
> > > > described exactly the incompatibility for which vfio-pci devices are
> > > > being called out here.  Otherwise I see this as a bit of magic voodoo
> > > > that gets lost in lore and copied elsewhere and we're constantly trying
> > > > to figure out specific incompatibilities when vfio-pci devices are
> > > > trying really hard to be "just another device".  
> > > 
> > > Sure, I can enrich the commit message.
> > > 
> > > > 
> > > > I infer from the link of the previous alternate solution that this is
> > > > to do with the fact that vfio devices attach a memory listener to the
> > > > device address space.  
> > > 
> > > IMHO it's not about the memory listeners; I think that's after vfio has
> > > already detected some vIOMMU memory regions, which must be based on a
> > > vIOMMU address space being available.  I think the problem is that when
> > > realizing vfio-pci we fetch the DMA address space specifically for
> > > getting the vfio group, while that could happen too early, even before
> > > the vIOMMU is created.
> > > 
> > > > Interestingly that previous cover letter also discusses how vdpa
> > > > devices might have a similar issue, which makes it confusing again
> > > > that we're calling out vfio-pci devices by name rather than for a
> > > > behavior.
> > > 
> > > Yes, I'll need to see whether this approach will be accepted first.  I
> > > think a similar thing could help VDPA, but it's not required there
> > > because VDPA has already worked around this using
> > > pci_device_iommu_address_space().  So potentially the only one to "fix"
> > > is the vfio-pci device used along with a vIOMMU, when the device
> > > ordering is specified in the wrong order.  I'll leave the VDPA problem
> > > to Jason to see whether he prefers keeping the current code or
> > > switching to a simpler one.  That should come after this one.
> > > 
> > > > 
> > > > If the behavior here is that vfio-pci devices attach a listener to the
> > > > device address space, then that provides a couple possible options.  We
> > > > could look for devices that have recorded an interest in their address
> > > > space, such as by setting a flag on PCIDevice when someone calls
> > > > pci_device_iommu_address_space(), where we could walk all devices using
> > > > the code in this series to find a device with such a flag.  
> > > 
> > > Right, we can set a flag for all the pci devices that need to consolidate
> > > the pci_device_iommu_address_space() result; however, then it'll be
> > > vfio-pci only so far.  Btw, I actually proposed something similar two
> > > months ago, and I think Igor showed concern about that flag being vague
> > > in meaning:
> > 
> > (1)
> > > https://lore.kernel.org/qemu-devel/20210906104915.7dd5c...@redhat.com/
> > 
> > > 
> > >   > > Does it need to be a pre_plug hook?  I thought we might just need a 
> > > flag in the
> > >   > > pci device classes showing that it should be after vIOMMUs, then in 
> > > vIOMMU
> > >   > > realize functions we walk pci bus to make sure no such device exist?
> > >   > > 
> > >   > > We could have a base vIOMMU class, then that could be in the 
> > > realize() of the
> > >   > > common class.  
> > >   > 
> > >   > We basically don't know if device needs IOMMU or not and can work
> > >   > with/without it just fine. In this case I'd think about IOMMU as board
> > >   > feature that morphs PCI buses (some of them) (address space, bus
> > >   > numbers, ...).
> > >   > So I don't perceive any iommu flag as a device property at all.
> > >   > 
> > >   > As for realize vs pre_plug, the later is the part of abstract realize
> > >   > (see: device_set_realized) and is already used by some PCI 
> > > infrastructure:
> > >   >   ex: pcie_cap_slot_pre_plug_cb/spapr_pci_pre_plug  
> > > 
> > > I still think that flag will work; the flag should only show "whether
> > > this device needs to be specified earlier than the vIOMMU", but I can
> > > see the point from Igor that it's at least confusing what the flag
> > > means.
> > 
> > > Meanwhile, I don't think that flag will be required, as this is not the
> > > first time we name a special device in the code, e.g.
> > > pc_machine_device_pre_plug_cb().  intel_iommu.c does it for vfio-pci
> > > already too, on making sure caching-mode=on
> > > in 
