Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding

2021-02-03 Thread David Hildenbrand

On 04.02.21 07:41, David Hildenbrand wrote:



Am 04.02.2021 um 03:22 schrieb Richard Henderson :

On 2/1/21 10:45 AM, Richard W.M. Jones wrote:

This commit breaks running certain s390x binaries, at least
the "mount" command (or a library it uses) breaks.

More details in this BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1922248

Could we revert this change since it seems to have caused other
problems as well?


Well, the other problems have been fixed (which were in fact latent, and could
have been produced by other means).  I would not like to sideline this patch
set indefinitely.

Could you give me some help extracting the relevant binaries?  "Begin with an
s390x host" is a non-starter.



Hi,

I'm planning on reproducing it today or tomorrow, especially finding a
reproducer and trying to reproduce it on an x86-64 host.


FWIW, on an x86-64 host, I can boot F32, Fedora rawhide, and RHEL8.X 
just fine from qcow2 (so "mount" seems to work in that environment as 
expected). Maybe it's really s390x-host specific? I'll give it a try.


--
Thanks,

David / dhildenb




Re: [PATCH] MAINTAINERS: Fix the location of virtiofsd.rst

2021-02-03 Thread Philippe Mathieu-Daudé
On 2/3/21 10:19 PM, Wainer dos Santos Moschetta wrote:
> The virtiofsd.rst file was moved to docs/tools, so this updates
> MAINTAINERS accordingly.
> 
> Fixes: a08b4a9fe6c ("docs: Move tools documentation to tools manual")

Thanks, but why not directly fix all the files changed by that commit?

> Signed-off-by: Wainer dos Santos Moschetta 
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)




Re: [PATCH v2 20/36] block: add bdrv_attach_child_common() transaction action

2021-02-03 Thread Kevin Wolf
Am 04.02.2021 um 08:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 04.02.2021 00:01, Kevin Wolf wrote:
> > Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > Split out no-perm part of bdrv_root_attach_child() into separate
> > > transaction action. bdrv_root_attach_child() now moves to new
> > > permission update paradigm: first update graph relations then update
> > > permissions.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy 

> > > +static void bdrv_attach_child_common_abort(void *opaque)
> > > +{
> > > +BdrvAttachChildCommonState *s = opaque;
> > > +BdrvChild *child = *s->child;
> > > +BlockDriverState *bs = child->bs;
> > > +
> > > +bdrv_replace_child_noperm(child, NULL);
> > > +
> > > +if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
> > > +bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
> > 
> > Would failure actually be fatal? I think we can ignore it, the node is
> > in an AioContext that works for it.
> 
> As far as I have explored the code, the AioContext check is transparent
> enough, nothing relies on I/O, etc., and if we succeeded in changing it we
> must succeed in reverting it.
> 
> And as I understand it, this is critical: if we failed to roll back the
> AioContext change somewhere (but succeeded in reverting the graph
> relation change), it means that we would end up with different AioContexts
> inside one block subtree.

Ah, right, we're going to change the graph once again, so what is
working now doesn't have to be working for the changed graph.

Ok, let's leave this as &error_abort.

Kevin
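The pattern under discussion — collect rollback actions while mutating the graph, then either commit them all or replay the abort callbacks — can be sketched as a standalone program. This is a simplified illustration of the idea, not QEMU's actual tran API; the `TranAction` type and the demo counters are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* One reversible step in a transaction. */
typedef struct TranAction {
    void (*commit)(void *opaque);
    void (*abort)(void *opaque);
    void *opaque;
    struct TranAction *next;
} TranAction;

/* Prepend, so that abort replays the actions in reverse order of
 * registration -- the last graph change is the first one undone. */
void tran_add(TranAction **tran, void (*commit)(void *),
              void (*abort_fn)(void *), void *opaque)
{
    TranAction *a = malloc(sizeof(*a));
    a->commit = commit;
    a->abort = abort_fn;
    a->opaque = opaque;
    a->next = *tran;
    *tran = a;
}

/* On success (ret >= 0) commit every action; on failure undo them all. */
void tran_finalize(TranAction *tran, int ret)
{
    while (tran) {
        TranAction *next = tran->next;
        if (ret >= 0 && tran->commit) {
            tran->commit(tran->opaque);
        } else if (ret < 0 && tran->abort) {
            tran->abort(tran->opaque);
        }
        free(tran);
        tran = next;
    }
}

/* Tiny demo state so the behaviour can be asserted. */
int committed_steps;
int rolled_back_steps;
void count_commit(void *opaque) { (void)opaque; committed_steps++; }
void count_abort(void *opaque)  { (void)opaque; rolled_back_steps++; }
```

Under this model, an abort callback that itself cannot fail (hence the `&error_abort` above) is what keeps rollback deterministic.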




Re: [PATCH v2 20/36] block: add bdrv_attach_child_common() transaction action

2021-02-03 Thread Vladimir Sementsov-Ogievskiy

04.02.2021 00:01, Kevin Wolf wrote:

Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:

Split out the no-perm part of bdrv_root_attach_child() into a separate
transaction action. bdrv_root_attach_child() now moves to the new
permission update paradigm: first update graph relations, then update
permissions.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block.c | 162 
  1 file changed, 117 insertions(+), 45 deletions(-)

diff --git a/block.c b/block.c
index f0fcd7..a7ccbb4fb1 100644
--- a/block.c
+++ b/block.c
@@ -86,6 +86,13 @@ static void bdrv_parent_set_aio_context_ignore(BdrvChild *c, 
AioContext *ctx,
 GSList **ignore);
  static void bdrv_replace_child_noperm(BdrvChild *child,
BlockDriverState *new_bs);
+static int bdrv_attach_child_common(BlockDriverState *child_bs,
+const char *child_name,
+const BdrvChildClass *child_class,
+BdrvChildRole child_role,
+uint64_t perm, uint64_t shared_perm,
+void *opaque, BdrvChild **child,
+GSList **tran, Error **errp);


If you added the new code above bdrv_root_attach_child(), we wouldn't
need the forward declaration and the patch would probably be simpler to
read (because it's the first part of bdrv_root_attach_child() that is
factored out).


  static int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue
 *queue, Error **errp);
@@ -2898,55 +2905,22 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState 
*child_bs,
uint64_t perm, uint64_t shared_perm,
void *opaque, Error **errp)
  {
-BdrvChild *child;
-Error *local_err = NULL;
  int ret;
-AioContext *ctx;
+BdrvChild *child = NULL;
+GSList *tran = NULL;
  
-ret = bdrv_check_update_perm(child_bs, NULL, perm, shared_perm, NULL, errp);

+ret = bdrv_attach_child_common(child_bs, child_name, child_class,
+   child_role, perm, shared_perm, opaque,
+   &child, &tran, errp);
  if (ret < 0) {
-bdrv_abort_perm_update(child_bs);
  bdrv_unref(child_bs);
  return NULL;
  }
  
-child = g_new(BdrvChild, 1);

-*child = (BdrvChild) {
-.bs = NULL,
-.name   = g_strdup(child_name),
-.klass  = child_class,
-.role   = child_role,
-.perm   = perm,
-.shared_perm= shared_perm,
-.opaque = opaque,
-};
-
-ctx = bdrv_child_get_parent_aio_context(child);
-
-/* If the AioContexts don't match, first try to move the subtree of
- * child_bs into the AioContext of the new parent. If this doesn't work,
- * try moving the parent into the AioContext of child_bs instead. */
-if (bdrv_get_aio_context(child_bs) != ctx) {
-ret = bdrv_try_set_aio_context(child_bs, ctx, &local_err);
-if (ret < 0) {
-if (bdrv_parent_try_set_aio_context(child, ctx, NULL) == 0) {
-ret = 0;
-error_free(local_err);
-local_err = NULL;
-}
-}
-if (ret < 0) {
-error_propagate(errp, local_err);
-g_free(child);
-bdrv_abort_perm_update(child_bs);
-bdrv_unref(child_bs);
-return NULL;
-}
-}
-
-/* This performs the matching bdrv_set_perm() for the above check. */
-bdrv_replace_child(child, child_bs);
+ret = bdrv_refresh_perms(child_bs, errp);
+tran_finalize(tran, ret);
  
+bdrv_unref(child_bs);

  return child;
  }
  
@@ -2988,16 +2962,114 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,

  return child;
  }
  
-static void bdrv_detach_child(BdrvChild *child)

+static void bdrv_remove_empty_child(BdrvChild *child)
  {
+assert(!child->bs);
  QLIST_SAFE_REMOVE(child, next);
-
-bdrv_replace_child(child, NULL);
-
  g_free(child->name);
  g_free(child);
  }
  
+typedef struct BdrvAttachChildCommonState {

+BdrvChild **child;
+AioContext *old_parent_ctx;
+AioContext *old_child_ctx;
+} BdrvAttachChildCommonState;
+
+static void bdrv_attach_child_common_abort(void *opaque)
+{
+BdrvAttachChildCommonState *s = opaque;
+BdrvChild *child = *s->child;
+BlockDriverState *bs = child->bs;
+
+bdrv_replace_child_noperm(child, NULL);
+
+if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
+bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);


Would failure actually be fatal? I think we can ignore it, the node is
in an AioContext that works for it.


As far as I explored the

[Bug 1912777] Re: KVM_EXIT_MMIO has increased in Qemu4.0.0 when compared to Qemu 2.11.0

2021-02-03 Thread ANIMESH KUMAR SINHA
** Description changed:

  I was able to generate a trace dump in QEMU for the kvm_run_exit event in
both QEMU 2.11.0 and QEMU 4.0.0.
  From the trace I noticed that the number of KVM_EXIT_MMIO calls has
increased a lot and is causing delays in test case execution.
  
  I executed the same test case on QEMU 2.11 and QEMU 4.0.0.
  Inside the virtual machine, the test case completed in 11 seconds on
QEMU 2.11, but the same test case took 26 seconds on QEMU 4.0.0.
  
+ I did a bit of digging and extracted the kvm_run_exit to figure out
+ what's going on.
  
- I did a bit of digging and extracted the kvm_run_exit to figure out whats 
going on.
- 
- Please find 
+ Please find
  Stats from Qemu2.11:
  
  KVM_EXIT_UNKNOWN  : 0
  KVM_EXIT_EXCEPTION: 0
  KVM_EXIT_IO   : 182513
  KVM_EXIT_HYPERCALL: 0
  KVM_EXIT_DEBUG: 0
  KVM_EXIT_HLT  : 0
  KVM_EXIT_MMIO : 216701
  KVM_EXIT_IRQ_WINDOW_OPEN  : 0
  KVM_EXIT_SHUTDOWN : 0
  KVM_EXIT_FAIL_ENTRY   : 0
  KVM_EXIT_INTR : 0
  KVM_EXIT_SET_TPR  : 0
  KVM_EXIT_TPR_ACCESS   : 0
  KVM_EXIT_S390_SIEIC   : 0
  KVM_EXIT_S390_RESET   : 0
  KVM_EXIT_DCR  : 0
  KVM_EXIT_NMI  : 0
  KVM_EXIT_INTERNAL_ERROR   : 0
  KVM_EXIT_OSI  : 0
  KVM_EXIT_PAPR_HCALL   : 0
  KVM_EXIT_S390_UCONTROL: 0
  KVM_EXIT_WATCHDOG : 0
  KVM_EXIT_S390_TSCH: 0
  KVM_EXIT_EPR  : 0
  KVM_EXIT_SYSTEM_EVENT : 0
  KVM_EXIT_S390_STSI: 0
  KVM_EXIT_IOAPIC_EOI   : 0
  KVM_EXIT_HYPERV   : 0
  
  KVM_RUN_EXIT  : 399214  (Total in Qemu 2.11 for a testcase)
  
- 
  Stats For Qemu 4.0.0:
  
- VM_EXIT_UNKNOWN   : 0
- KVM_EXIT_EXCEPTION: 0
- KVM_EXIT_IO   : 163729
- KVM_EXIT_HYPERCALL: 0
- KVM_EXIT_DEBUG: 0
- KVM_EXIT_HLT  : 0
- KVM_EXIT_MMIO : 1094231
- KVM_EXIT_IRQ_WINDOW_OPEN  : 46
- KVM_EXIT_SHUTDOWN : 0
- KVM_EXIT_FAIL_ENTRY   : 0
- KVM_EXIT_INTR : 0
- KVM_EXIT_SET_TPR  : 0
- KVM_EXIT_TPR_ACCESS   : 0
- KVM_EXIT_S390_SIEIC   : 0
- KVM_EXIT_S390_RESET   : 0
- KVM_EXIT_DCR  : 0
- KVM_EXIT_NMI  : 0
- KVM_EXIT_INTERNAL_ERROR   : 0
- KVM_EXIT_OSI  : 0
- KVM_EXIT_PAPR_HCALL   : 0
- KVM_EXIT_S390_UCONTROL: 0

Re: [PATCH v2 19/36] block: fix bdrv_replace_node_common

2021-02-03 Thread Vladimir Sementsov-Ogievskiy

03.02.2021 21:23, Kevin Wolf wrote:

Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:

The ignore_children thing doesn't help to track all propagated permissions
of the children we want to ignore. The simplest way to correctly update
permissions is to update the graph first and then do the permission update.
In this case we just refresh permissions for the whole subgraph (in
topological-sort defined order) and everything is correctly calculated
automatically without any ignore_children.

So, refactor bdrv_replace_node_common() to first do the graph update and
then refresh the permissions.

Test test_parallel_exclusive_write() now passes, so move it out of the
debugging "if".

Signed-off-by: Vladimir Sementsov-Ogievskiy 
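The "topological-sort defined order" in the commit message means refreshing every parent node before any of its children. A minimal sketch of such an ordering over a small DAG (fixed-size arrays and integer node ids are invented for the example; this is not QEMU's bdrv_topological_dfs()):

```c
#include <assert.h>

#define MAX_NODES 8

/* adj[p][c] != 0 means parent p has child c in the graph. */
int adj[MAX_NODES][MAX_NODES];
int visited[MAX_NODES];
int order[MAX_NODES];
int order_len;

/* Post-order DFS from a node. */
static void dfs(int n, int n_nodes)
{
    if (visited[n]) {
        return;
    }
    visited[n] = 1;
    for (int c = 0; c < n_nodes; c++) {
        if (adj[n][c]) {
            dfs(c, n_nodes);
        }
    }
    order[order_len++] = n;
}

/* Reversed post-order is a topological order: every parent appears
 * before all of its children, which is the order a permission
 * refresh would walk the subgraph in. */
void topological_sort(int root, int n_nodes)
{
    order_len = 0;
    for (int i = 0; i < n_nodes; i++) {
        visited[i] = 0;
    }
    dfs(root, n_nodes);
    for (int i = 0, j = order_len - 1; i < j; i++, j--) {
        int t = order[i];
        order[i] = order[j];
        order[j] = t;
    }
}
```

Because each node is visited after all paths from its parents, a node's cumulative permissions are final by the time it is processed, which is what makes ignore_children unnecessary.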



diff --git a/tests/test-bdrv-graph-mod.c b/tests/test-bdrv-graph-mod.c
index 0d62e05ddb..93a5941a9b 100644
--- a/tests/test-bdrv-graph-mod.c
+++ b/tests/test-bdrv-graph-mod.c
@@ -294,20 +294,11 @@ static void test_parallel_perm_update(void)
  bdrv_child_refresh_perms(top, top->children.lh_first, &error_abort);
  
  assert(c_fl1->perm & BLK_PERM_WRITE);

+bdrv_unref(top);
  }


Why do have this addition in this patch? Shouldn't the changed function
behave the same as before with respect to referenced nodes?



Hmm, looks like an accidental fixup that should be squashed into the original
commit, or just a mistake. I will check when preparing the next version.


--
Best regards,
Vladimir



[PATCH] qemu-storage-daemon: Enable object-add

2021-02-03 Thread Kevin Wolf
As we don't have a fully QAPIfied version of object-add yet and it still
has 'gen': false in the schema, it needs to be registered explicitly in
init_qmp_commands() to be available for users.

Fixes: 2af282ec51a27116d0402cab237b8970800f870c
Signed-off-by: Kevin Wolf 
---
 storage-daemon/qemu-storage-daemon.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index d8d172cc60..9021a46b3a 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -144,6 +144,8 @@ static void init_qmp_commands(void)
 qmp_init_marshal(&qmp_commands);
 qmp_register_command(&qmp_commands, "query-qmp-schema",
  qmp_query_qmp_schema, QCO_ALLOW_PRECONFIG);
+qmp_register_command(&qmp_commands, "object-add", qmp_object_add,
+ QCO_NO_OPTIONS);
 
 QTAILQ_INIT(&qmp_cap_negotiation_commands);
 qmp_register_command(&qmp_cap_negotiation_commands, "qmp_capabilities",
-- 
2.29.2
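Until a command is generated from the QAPI schema, it has to be hooked into the command table by hand, exactly as the patch above does for object-add. The shape of such explicit registration can be sketched as a toy registry (the `Cmd` type and helper names are invented; this is not QEMU's qmp_register_command() implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Cmd {
    const char *name;
    int (*handler)(void);
    struct Cmd *next;
} Cmd;

Cmd *cmd_list;

/* Explicit registration, analogous to qmp_register_command(). */
void register_cmd(Cmd *c)
{
    c->next = cmd_list;
    cmd_list = c;
}

/* Dispatch-time lookup: a command that was never registered is simply
 * not found, which is why object-add was unavailable before the fix. */
Cmd *find_cmd(const char *name)
{
    for (Cmd *c = cmd_list; c; c = c->next) {
        if (strcmp(c->name, name) == 0) {
            return c;
        }
    }
    return NULL;
}
```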




Re: [PATCH v2 15/36] block: use topological sort for permission update

2021-02-03 Thread Vladimir Sementsov-Ogievskiy

03.02.2021 21:38, Kevin Wolf wrote:

Am 28.01.2021 um 19:04 hat Vladimir Sementsov-Ogievskiy geschrieben:

28.01.2021 20:13, Kevin Wolf wrote:

Am 28.01.2021 um 10:34 hat Vladimir Sementsov-Ogievskiy geschrieben:

27.01.2021 21:38, Kevin Wolf wrote:

Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:

-static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
-   uint64_t cumulative_perms,
-   uint64_t cumulative_shared_perms,
-   GSList *ignore_children, Error **errp)
+static int bdrv_node_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
+uint64_t cumulative_perms,
+uint64_t cumulative_shared_perms,
+GSList *ignore_children, Error **errp)
{
BlockDriver *drv = bs->drv;
BdrvChild *c;
@@ -2166,21 +2193,43 @@ static int bdrv_check_perm(BlockDriverState *bs, 
BlockReopenQueue *q,
/* Check all children */
QLIST_FOREACH(c, &bs->children, next) {
uint64_t cur_perm, cur_shared;
-GSList *cur_ignore_children;
bdrv_child_perm(bs, c->bs, c, c->role, q,
cumulative_perms, cumulative_shared_perms,
&cur_perm, &cur_shared);
+bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);


This "added" line is actually old code. What is removed here is the
recursive call of bdrv_check_update_perm(). This is what the code below
will have to replace.


yes, we'll use explicit loop instead of recursion




+}
+
+return 0;
+}
+
+static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
+   uint64_t cumulative_perms,
+   uint64_t cumulative_shared_perms,
+   GSList *ignore_children, Error **errp)
+{
+int ret;
+BlockDriverState *root = bs;
+g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, root);
+
+for ( ; list; list = list->next) {
+bs = list->data;
+
+if (bs != root) {
+if (!bdrv_check_parents_compliance(bs, ignore_children, errp)) {
+return -EINVAL;
+}


At this point bs still had the old permissions, but we don't access
them. As we're going in topological order, the parents have already been
updated if they were a child covered in bdrv_node_check_perm(), so we're
checking the relevant values. Good.

What about the root node? If I understand correctly, the parents of the
root nodes wouldn't have been checked in the old code. In the new state,
the parent BdrvChild already has to contain the new permission.

In bdrv_refresh_perms(), we already check parent conflicts, so no change
for all callers going through it. Good.

bdrv_reopen_multiple() is less obvious. It passes permissions from the
BDRVReopenState, without applying the permissions first.


It will be changed in the series


Do we check the
old parent permissions instead of the new state here?


We use given (new) cumulative permissions for bs, and recalculate
permissions for bs subtree.


Where do we actually set them? I would expect a
bdrv_child_set_perm_safe() call somewhere, but I can't see it in the
call path from bdrv_reopen_multiple().


You mean parent BdrvChild objects? Then this question applies as well
to pre-patch code.


I don't think so. The pre-patch code doesn't rely on the permissions
already being set in the BdrvChild object, but it gets them passed in
parameters. Changing the graph first and relying on the information in
BdrvChild is the new approach that you're introducing.


So, we just call bdrv_check_perm() for bs in bdrv_reopen_multiple(). I
think the answer is like this:

if state->perm and state->shared_perm are different from the actual
cumulative permissions (before the reopen), then we must have the
parent(s) of the node in the same bs_queue. Then, the corresponding
children are updated as part of another bdrv_check_perm() call from the
same loop in bdrv_reopen_multiple().

Let's check how state->perm and state->shared_perm are set:

bdrv_reopen_queue_child()

 /* This needs to be overwritten in bdrv_reopen_prepare() */
 bs_entry->state.perm = UINT64_MAX;
 bs_entry->state.shared_perm = 0;


...
bdrv_reopen_prepare()

bdrv_reopen_perm(queue, reopen_state->bs,
  &reopen_state->perm, &reopen_state->shared_perm);

and bdrv_reopen_perm() calculates cumulative permissions, taking
permissions from the queue, for parents which exist in the queue.


Right, but it stores the new permissions in reopen_state, not in the
BdrvChild objects that this patch is looking at. Or am I missing
something?


I'm not sure how correct it is, keeping in mind that we may look at a
node in the queue for which bdrv_reopen_perm() was not yet called, but
the idea is clean.


I don't think the above code can work correctly without something
actually updating the BdrvChild first

Re: [PATCH 1/6] travis.yml: Move gprof/gcov test across to gitlab

2021-02-03 Thread Thomas Huth

On 03/02/2021 20.32, Wainer dos Santos Moschetta wrote:

Hi,

On 2/3/21 8:32 AM, Thomas Huth wrote:

From: Philippe Mathieu-Daudé 

Similarly to commit 8cdb2cef3f1, move the gprof/gcov test to GitLab.

The coverage-summary.sh script is not Travis-CI specific, make it
generic.

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20201108204535.2319870-10-phi...@redhat.com>
[thuth: Add gcovr and bsdmainutils which are required for the
 overage-summary.sh script to the ubuntu docker file]

s/overage/coverage/

Signed-off-by: Thomas Huth 
---
  .gitlab-ci.yml | 12 
  .travis.yml    | 14 --
  MAINTAINERS    |  2 +-
  scripts/{travis => ci}/coverage-summary.sh |  2 +-
  tests/docker/dockerfiles/ubuntu2004.docker |  2 ++
  5 files changed, 16 insertions(+), 16 deletions(-)
  rename scripts/{travis => ci}/coverage-summary.sh (92%)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 7c0db64710..8b97b512bb 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -468,6 +468,18 @@ check-deprecated:
  MAKE_CHECK_ARGS: check-tcg
    allow_failure: true
+# gprof/gcov are GCC features
+build-gprof-gcov:
+  <<: *native_build_job_definition
+  variables:
+    IMAGE: ubuntu2004
+    CONFIGURE_ARGS: --enable-gprof --enable-gcov
+    MAKE_CHECK_ARGS: build-tcg


With build-tcg it generates an empty report, e.g., 
https://gitlab.com/wainersm/qemu/-/jobs/1005923421


Shouldn't it run `make check`?


D'oh, you're right. I think we need to run at least a "make check-unit" 
here. I'll rework my patch accordingly...


By the way, it has been broken on Travis for a long time; e.g. with version 5.0 
there is already only a stack trace:


https://travis-ci.org/github/qemu/qemu/jobs/680661167#L8411

Seems like nobody noticed this for almost a year now...

 Thomas




[PATCH] arm: xlnx-versal: fix virtio-mmio base address assignment

2021-02-03 Thread schspa


At the moment the following QEMU command line triggers an assertion
failure on the xlnx-versal SoC:
  qemu-system-aarch64 \
  -machine xlnx-versal-virt -nographic -smp 2 -m 128 \
  -fsdev local,id=shareid,path=${HOME}/work,security_model=none \
  -device virtio-9p-device,fsdev=shareid,mount_tag=share \
  -fsdev local,id=shareid1,path=${HOME}/Music,security_model=none \
  -device virtio-9p-device,fsdev=shareid1,mount_tag=share1

  qemu-system-aarch64: ../migration/savevm.c:860:
  vmstate_register_with_alias_id:
  Assertion `!se->compat || se->instance_id == 0' failed.

This problem was fixed on the arm virt platform by this patch:

https://lists.nongnu.org/archive/html/qemu-devel/2016-07/msg01119.html

It works perfectly on the arm virt platform, but the problem is still
there on the xlnx-versal SoC.

The main difference between arm virt and xlnx-versal is that they use
different ways to create the virtio-mmio qdev. arm virt calls
sysbus_create_simple("virtio-mmio", base, pic[irq]), which calls
sysbus_mmio_map() internally and assigns the base address to the sysbus
device MMIO correctly, but xlnx-versal's implementation doesn't do this.

However, xlnx-versal can't switch to sysbus_create_simple() to create the
virtio-mmio device, because xlnx-versal's CPU uses VersalVirt.soc.fpd.apu.mr
as its memory, which is a subregion of system_memory. sysbus_create_simple()
would add the virtio device to system_memory, which can't be accessed by
the CPU.

We can solve this by simply assigning mmio[0].addr directly, which makes
virtio_mmio_bus_get_dev_path() produce a correct unique device path.

Signed-off-by: schspa 
---
 hw/arm/xlnx-versal-virt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index 8482cd6196..87b92ec6c3 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -490,6 +490,7 @@ static void create_virtio_regions(VersalVirt *s)
 object_property_add_child(OBJECT(&s->soc), name, OBJECT(dev));
 sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
 sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic_irq);
+    SYS_BUS_DEVICE(dev)->mmio[0].addr = base;
 mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
 memory_region_add_subregion(&s->soc.mr_ps, base, mr);
 g_free(name);
-- 
2.30.0
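The fix works because the migration instance id is derived from the device path, and virtio_mmio_bus_get_dev_path() folds the sysbus MMIO base address into that path. The effect can be sketched as follows (simplified: `format_dev_path()` and the exact path layout are invented for illustration):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* With the base address part of the path, two otherwise identical
 * virtio-mmio devices get distinct paths, so the instance_id
 * assertion in vmstate_register_with_alias_id() is never hit.
 * If mmio[0].addr were left at 0, both paths would collide. */
void format_dev_path(char *buf, size_t len,
                     const char *parent, uint64_t mmio_base)
{
    snprintf(buf, len, "%s/virtio-mmio@%08" PRIx64, parent, mmio_base);
}
```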





Re: [PATCH 02/22] tests/acceptance/boot_linux.py: rename misleading cloudinit method

2021-02-03 Thread Thomas Huth

On 03/02/2021 18.23, Cleber Rosa wrote:

There's no downloading happening on that method, so let's call it
"prepare" instead.  While at it, and because of it, the current
"prepare_boot" and "prepare_cloudinit" are also renamed.

The reasoning here is that "prepare_" methods will just work on the
images, while "set_up_" will make them effective to the VM that will
be launched.  Inspiration comes from the "virtiofs_submounts.py"
tests, which this expects to converge more into.

Signed-off-by: Cleber Rosa 
---
  tests/acceptance/boot_linux.py | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)


Reviewed-by: Thomas Huth 




Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding

2021-02-03 Thread David Hildenbrand


> Am 04.02.2021 um 03:22 schrieb Richard Henderson 
> :
> 
> On 2/1/21 10:45 AM, Richard W.M. Jones wrote:
>> This commit breaks running certain s390x binaries, at least
>> the "mount" command (or a library it uses) breaks.
>> 
>> More details in this BZ:
>> 
>> https://bugzilla.redhat.com/show_bug.cgi?id=1922248
>> 
>> Could we revert this change since it seems to have caused other
>> problems as well?
> 
> Well, the other problems have been fixed (which were in fact latent, and could
> have been produced by other means).  I would not like to sideline this patch
> set indefinitely.
> 
> Could you give me some help extracting the relevant binaries?  "Begin with an
> s390x host" is a non-starter.
> 

Hi,

I'm planning on reproducing it today or tomorrow, especially finding a
reproducer and trying to reproduce it on an x86-64 host.

> FWIW, with qemu-system-s390x, booting debian, building qemu-s390x, and running
> "/bin/mount -t proc proc /mnt" under double-emulation does not show the bug.
> 
> I suspect that's because debian targets a relatively old s390x cpu, and that
> yours is using the relatively new vector instructions.  But I don't know.
> 
> What I do know is that current qemu doesn't seem to boot current fedora:
> 
> $ ../bld/qemu-system-s390x -nographic -m 4G -cpu max -drive
> file=Fedora-Server-netinst-s390x-33-1.2.iso,format=raw,if=virtio
> qemu-system-s390x: warning: 'msa5-base' requires 'kimd-sha-512'.
> qemu-system-s390x: warning: 'msa5-base' requires 'klmd-sha-512'.
> LOADPARM=[]
> Using virtio-blk.
> ISO boot image size verified
> 
> KASLR disabled: CPU has no PRNG
> Linux version 5.8.15-301.fc33.s390x
> (mockbu...@buildvm-s390x-07.s390.fedoraproject.org) 1 SMP Thu Oct 15 15:55:57
> UTC 2020Kernel fault: interruption code 0005 ilc:2
> PSW : 20018000 000124c4
>   R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:2 PM:0 RI:0 EA:3
> GPRS:  000b806a2da6 7aa19c5cbb980703 95f62d65812b83ab
>   d5e42882af203615 000b806a2da0  
>   000230e8 01438500 01720320 
>   01442718 00010070 00012482 bf20
> 
> Which makes me think that fedora 33 is now targeting a cpu that is too new and
> not actually supported by tcg.
> 

Try rawhide instead, that worked when testing the clang build fixes. 
Alternatively, boot F33 via kernel and initrd.

The Fedora 33 ISO is broken and cannot boot under KVM either (the combined 
kernel+initrd file is messed up).

Cheers!

> 
> r~
> 




Re: gitlab containers are broken

2021-02-03 Thread Richard Henderson
On 2/3/21 8:03 PM, Thomas Huth wrote:
> On 04/02/2021 00.04, Richard Henderson wrote:
>> Something has gone wrong with the building of the containers
>> in gitlab, because *all* of them are installing Alpine Linux.
>>
>> https://gitlab.com/rth7680/qemu/-/jobs/1006336396#L155
> 
> I think that's ok ... the output about alpine that you see there is just the
> output from the container that builds the final container. Later you can see
> some "yum install" lines in that output, too, that's where the CentOS 
> container
> gets build. And the final compilation job runs on CentOS, too:
> 
>  https://gitlab.com/rth7680/qemu/-/jobs/1006336699#L35
> 
> (look for the string "Red Hat" there)

Hmm.  Is there any way to get the full output of the container build?  At
present it's being truncated:

#7 [4/5] RUN yum install -y bzip2 bzip2-devel ccache csnappy-de...


In particular, I'm trying to add a new test, and I have added libffi-devel.i686
to the fedora-i386-cross.docker file, but then the actual build fails because
the libffi header file is missing.

I know you may need the actual patch to comment, but pointers to how to debug
this sort of failure are welcome.


r~



Re: gitlab containers are broken

2021-02-03 Thread Thomas Huth

On 04/02/2021 00.04, Richard Henderson wrote:

Something has gone wrong with the building of the containers
in gitlab, because *all* of them are installing Alpine Linux.

https://gitlab.com/rth7680/qemu/-/jobs/1006336396#L155


I think that's ok ... the output about alpine that you see there is just the 
output from the container that builds the final container. Later you can see 
some "yum install" lines in that output, too, that's where the CentOS 
container gets build. And the final compilation job runs on CentOS, too:


 https://gitlab.com/rth7680/qemu/-/jobs/1006336699#L35

(look for the string "Red Hat" there)


I presume that IMAGE is not actually being passed through, and alpine.docker is
lexicographically first.

I have a strong suspicion that it's related to local "make docker" breakage, in
that e.g.

$ make docker-test-build@fedora-i386-cross
/usr/bin/python3 -B /home/rth/qemu/qemu/meson/meson.py introspect --targets
--tests --benchmarks | /usr/bin/python3 -B scripts/mtest2make.py > 
Makefile.mtest
   GIT ui/keycodemapdb tests/fp/berkeley-testfloat-3
tests/fp/berkeley-softfloat-3 meson dtc capstone slirp
   GIT ui/keycodemapdb tests/fp/berkeley-testfloat-3
tests/fp/berkeley-softfloat-3 meson dtc capstone slirp
make: *** No rule to make target 'docker-test-build@fedora-i386-cross'.  Stop.

which certainly looks like the docker-TEST@IMAGE format documented.


No clue about that, local containers never really worked for me... Alex? 
Philippe? Any ideas?


 Thomas




Re: [PATCH v4 0/9] hw/sd: Support block read/write in SPI mode

2021-02-03 Thread Bin Meng
On Thu, Jan 28, 2021 at 2:30 PM Bin Meng  wrote:
>
> From: Bin Meng 
>
> This includes the previously v3 series [1], and one single patch [2].
>
> Compared to v3, this fixed the following issue in patch [v3,6/6]:
> - Keep the card state to SSI_SD_CMD instead of SSI_SD_RESPONSE after
>   receiving the STOP_TRAN token per the spec
>
> All software tested so far (U-Boot/Linux/VxWorks) does work without
> the fix, but it is better to conform to the spec.
>
> In addition to [2], one more issue was exposed when testing with
> VxWorks driver related to STOP_TRANSMISSION (CMD12) response.
>
> [1] http://patchwork.ozlabs.org/project/qemu-devel/list/?series=226136
> [2] 
> http://patchwork.ozlabs.org/project/qemu-devel/patch/1611636214-52427-1-git-send-email-bmeng...@gmail.com/
>
> Changes in v4:
> - Keep the card state to SSI_SD_CMD instead of SSI_SD_RESPONSE after
>   receiving the STOP_TRAN token per the spec
> - new patch: fix STOP_TRANSMISSION (CMD12) response
> - new patch: handle the rest commands with R1b response type
>

Ping?



[Bug 1914535] [NEW] PL110 8-bit mode is not emulated correctly

2021-02-03 Thread Vadim Averin
Public bug reported:

When the emulated pl110/pl111 is switched programmatically to 8-bit
color depth mode, the display is drawn green and blue, but the real
PL110 displays grayscale in 8-bit mode.

The bug appears in qemu-system-arm version 3.1.0 (Debian
1:3.1+dfsg-8+deb10u8) and qemu-system-arm version 5.2.50
(v5.2.0-1579-g99ae0cd90d).

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1914535

Title:
  PL110 8-bit mode is not emulated correctly

Status in QEMU:
  New

Bug description:
  When the emulated pl110/pl111 is switched programmatically to 8-bit
  color depth mode, the display is drawn green and blue, but the real
  PL110 displays grayscale in 8-bit mode.

  The bug appears in qemu-system-arm version 3.1.0 (Debian
  1:3.1+dfsg-8+deb10u8) and qemu-system-arm version 5.2.50
  (v5.2.0-1579-g99ae0cd90d).

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1914535/+subscriptions
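The behaviour the reporter expects can be stated concretely: in 8-bit palette mode, real PL110 hardware maps each index to a grey level, meaning the red, green, and blue components of each palette entry are equal. A small sketch of that expectation (the `rgb`/`grey_entry` helpers are invented for illustration, not QEMU's pl110 code):

```c
#include <assert.h>
#include <stdint.h>

/* Pack an RGB888 triple into a single 32-bit pixel value. */
uint32_t rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

/* Expected 8-bit-mode behaviour per the report: each index maps to a
 * grey level, i.e. r == g == b. The bug is that the emulation produces
 * green/blue components instead. */
uint32_t grey_entry(uint8_t index)
{
    return rgb(index, index, index);
}
```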



Re: [PATCH v2 2/2] hw/ppc: e500: Fill in correct <clock-frequency> for the serial nodes

2021-02-03 Thread Bin Meng
On Thu, Feb 4, 2021 at 12:58 PM David Gibson
 wrote:
>
> On Wed, Feb 03, 2021 at 10:24:48PM +0800, Bin Meng wrote:
> > From: Bin Meng 
> >
> > At present the <clock-frequency> property of the serial node is
> > populated with value zero. U-Boot's ns16550 driver is not happy
> > about this, so let's fill in a meaningful value.
>
> Are you sure this is correct - that is that the serial clock is really
> the same as the overall system clock?  Quite often there's some kind
> of divider in between.
>

Yes, see the U-Boot codes include/configs/qemu-ppce500.h

#define CONFIG_SYS_NS16550_CLK  (get_bus_freq(0))

get_bus_freq(0) eventually returns the platform clock frequency, which is 400 MHz.

But the value doesn't matter anyway for QEMU. We don't emulate any
baud rate specific thing for a serial port. We only need a sane value
that is non-zero.

Regards,
Bin
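For reference, a device-tree property cell is stored as a single big-endian 32-bit value, so the 400 MHz written by qemu_fdt_setprop_cell() for "clock-frequency" ends up as the bytes 17 d7 84 00. A minimal sketch of that encoding (`cpu_to_fdt32` here is a local helper written for the example, not libfdt's):

```c
#include <assert.h>
#include <stdint.h>

#define PLATFORM_CLK_FREQ_HZ (400 * 1000 * 1000)

/* FDT property cells are big-endian 32-bit values; this mirrors what a
 * setprop-cell helper writes into the blob for "clock-frequency". */
void cpu_to_fdt32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}
```

A guest driver reading this cell back gets the non-zero input clock it needs to program a divisor, even though QEMU itself never models the baud rate.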



Re: [PATCH v2 2/2] hw/ppc: e500: Fill in correct <clock-frequency> for the serial nodes

2021-02-03 Thread David Gibson
On Wed, Feb 03, 2021 at 10:24:48PM +0800, Bin Meng wrote:
> From: Bin Meng 
> 
> At present the <clock-frequency> property of the serial node is
> populated with value zero. U-Boot's ns16550 driver is not happy
> about this, so let's fill in a meaningful value.

Are you sure this is correct - that is that the serial clock is really
the same as the overall system clock?  Quite often there's some kind
of divider in between.

> 
> Signed-off-by: Bin Meng 
> Reviewed-by: Philippe Mathieu-Daudé 
> 
> ---
> 
> (no changes since v1)
> 
>  hw/ppc/e500.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
> index c795276..01517a6 100644
> --- a/hw/ppc/e500.c
> +++ b/hw/ppc/e500.c
> @@ -126,7 +126,7 @@ static void dt_serial_create(void *fdt, unsigned long 
> long offset,
>  qemu_fdt_setprop_string(fdt, ser, "compatible", "ns16550");
>  qemu_fdt_setprop_cells(fdt, ser, "reg", offset, 0x100);
>  qemu_fdt_setprop_cell(fdt, ser, "cell-index", idx);
> -qemu_fdt_setprop_cell(fdt, ser, "clock-frequency", 0);
> +qemu_fdt_setprop_cell(fdt, ser, "clock-frequency", PLATFORM_CLK_FREQ_HZ);
>  qemu_fdt_setprop_cells(fdt, ser, "interrupts", 42, 2);
>  qemu_fdt_setprop_phandle(fdt, ser, "interrupt-parent", mpic);
>  qemu_fdt_setprop_string(fdt, "/aliases", alias, ser);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH v2 1/2] hw/ppc: e500: Use a macro for the platform clock frequency

2021-02-03 Thread David Gibson
On Wed, Feb 03, 2021 at 10:24:47PM +0800, Bin Meng wrote:
> From: Bin Meng 
> 
> At present the platform clock frequency is using a magic number.
> Convert it to a macro and use it everywhere.
> 
> Signed-off-by: Bin Meng 
> Reviewed-by: Philippe Mathieu-Daudé 

Applied to ppc-for-6.0, thanks.

> 
> ---
> 
> Changes in v2:
> - Rename the macro per Philippe's comments
> 
>  hw/ppc/e500.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
> index c64b5d0..c795276 100644
> --- a/hw/ppc/e500.c
> +++ b/hw/ppc/e500.c
> @@ -74,6 +74,8 @@
>  #define MPC8544_I2C_IRQ43
>  #define RTC_REGS_OFFSET0x68
>  
> +#define PLATFORM_CLK_FREQ_HZ   (400 * 1000 * 1000)
> +
>  struct boot_info
>  {
>  uint32_t dt_base;
> @@ -320,8 +322,8 @@ static int ppce500_load_device_tree(PPCE500MachineState 
> *pms,
>  int fdt_size;
>  void *fdt;
>  uint8_t hypercall[16];
> -uint32_t clock_freq = 400000000;
> -uint32_t tb_freq = 400000000;
> +uint32_t clock_freq = PLATFORM_CLK_FREQ_HZ;
> +uint32_t tb_freq = PLATFORM_CLK_FREQ_HZ;
>  int i;
>  char compatible_sb[] = "fsl,mpc8544-immr\0simple-bus";
>  char *soc;
> @@ -890,7 +892,7 @@ void ppce500_init(MachineState *machine)
>  env->spr_cb[SPR_BOOKE_PIR].default_value = cs->cpu_index = i;
>  env->mpic_iack = pmc->ccsrbar_base + MPC8544_MPIC_REGS_OFFSET + 0xa0;
>  
> -ppc_booke_timers_init(cpu, 400000000, PPC_TIMER_E500);
> +ppc_booke_timers_init(cpu, PLATFORM_CLK_FREQ_HZ, PPC_TIMER_E500);
>  
>  /* Register reset handler */
>  if (!i) {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [RFC PATCH v1 01/01] PCIe DOE for PCIe and CXL 2.0

2021-02-03 Thread Chris Browy
Hi Jonathan,
  
Thanks for the review comments and we'll put out a v2 patch series
based on a genuine git send-email flow in a day or so and plan to include
- functionally separate patches
- new MSI-X support
- few bugs found in CDAT table header + checksum generation
- more fully respond to review comments (thanks again!)

After the SSWG responds to your email on spec clarifications we'll work on
adding user-defined CDAT entries.  Thanks for raising the issues with SSWG!

It would be good to collaborate on how best to specify external CDAT files.
One idea is to provide a -device command-line property for filenames.  Files
could be ascii format specifying the CDAT struct instances with named fields and
value pairs.  Some checks could be added when reading in the files.  Users 
could
specify the CDAT structure types in any order and have multiple instances.

Just like you we feel what's most important is to have DOE supported so that
UEFI and Linux kernel and drivers can progress.  We're also contributing to
writing compliance tests for the CXL Compliance Software Development WG.

Note your email did not post to lore.kernel.org/qemu-devel despite being CC’d.
Maybe an --in-reply-to issue.  I’ve restored that here in this email reply.

Best Regards,
Chris


On 2/3/21, 12:19 PM, "Jonathan Cameron"  wrote:

On Tue, 2 Feb 2021 15:43:28 -0500
Chris Browy  wrote:

Hi Chris,

Whilst I appreciate that this is very much an RFC and so not in the
form you would eventually aim to present it in, please look, for
a v2, at breaking this into a series of functionally separate patches.
Probably:

1. Introduce DOE support with no users - probably including the
   discovery protocol
2. CMA support
3. CDAT support for CXL
4. Compliance part.

It's also well worth jumping through the hoops needed to get a
git send-email workflow up and running as you seem to have had some
trouble with getting the thread to send in one go etc.

Clearly we now have two possible implementations for this functionality.
Personally I don't care which one we take forwards - if nothing else
the exercise has highlighted some disagreements in spec interpretation
that need clearing up.  I've mailed one big one to the SSWG list today.

I found a few things I definitely got wrong as well whilst reading this :)
Always advantages in having multiple implementations given we don't have
hardware yet.

Jonathan

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 981dc92e25..4fb865e0b3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1655,6 +1655,13 @@ F: docs/pci*
>   F: docs/specs/*pci*
>   F: default-configs/pci.mak
> 
> +PCIE DOE
> +M: Huai-Cheng Kuo 
> +M: Chris Browy 
> +S: Supported
> +F: include/hw/pci/pcie_doe.h
> +F: hw/pci/pcie_doe.c
> +
>   ACPI/SMBIOS
>   M: Michael S. Tsirkin 
>   M: Igor Mammedov 
> diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
> index e1bcee5bdb..c49d2aa896 100644
> --- a/hw/cxl/cxl-component-utils.c
> +++ b/hw/cxl/cxl-component-utils.c
> @@ -195,3 +195,154 @@ void cxl_component_create_dvsec(CXLComponentState 
*cxl, uint16_t length,
>   range_init_nofail(&cxl->dvsecs[type], cxl->dvsec_offset, length);
>   cxl->dvsec_offset += length;
>   }
> +
> +uint32_t cxl_doe_compliance_init(CXLComponentState *cxl_cstate)
> +{
> +PCIDevice *pci_dev = cxl_cstate->pdev;
> +uint32_t req;
> +uint32_t byte_cnt = 0;
> +
> +DOE_DBG(">> %s\n",  __func__);
> +
> +req = ((struct cxl_compliance_mode_cap *)pcie_doe_get_req(pci_dev))
> +->req_code;
> +switch (req) {
> +case CXL_COMP_MODE_CAP:
> +byte_cnt = sizeof(struct cxl_compliance_mode_cap_rsp);
> +cxl_cstate->doe_resp.cap_rsp.header.vendor_id = CXL_VENDOR_ID;
> +cxl_cstate->doe_resp.cap_rsp.header.doe_type = 
CXL_DOE_COMPLIANCE;
> +cxl_cstate->doe_resp.cap_rsp.header.reserved = 0x0;
> +cxl_cstate->doe_resp.cap_rsp.header.length =
> +dwsizeof(struct cxl_compliance_mode_cap_rsp);
> +cxl_cstate->doe_resp.cap_rsp.rsp_code = 0x0;
> +cxl_cstate->doe_resp.cap_rsp.version = 0x1;
> +cxl_cstate->doe_resp.cap_rsp.length = 0x1c;
> +cxl_cstate->doe_resp.cap_rsp.status = 0x0;
> +cxl_cstate->doe_resp.cap_rsp.available_cap_bitmask = 0x3;
> +cxl_cstate->doe_resp.cap_rsp.enabled_cap_bitmask = 0x3;
> +break;
> +case CXL_COMP_MODE_STATUS:
> +byte_cnt = sizeof(struct cxl_compliance_mode_status_rsp);
> +cxl_cstate->doe_resp.status_rsp.header.vendor_id = CXL_VENDOR_ID;
> +cxl_cstate->doe_resp.status_rsp.header.doe_type = 
CXL_DOE_COMPLIANCE;
> +cxl_cstate->doe_resp.status_rsp.header.reserved = 0x0;
>

Re: [PATCH v2 00/93] TCI fixes and cleanups

2021-02-03 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20210204014509.882821-1-richard.hender...@linaro.org/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210204014509.882821-1-richard.hender...@linaro.org
Subject: [PATCH v2 00/93] TCI fixes and cleanups

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20210127232151.3523581-1-f4...@amsat.org -> 
patchew/20210127232151.3523581-1-f4...@amsat.org
 - [tag update]  patchew/20210128144125.3696119-1-f4...@amsat.org -> 
patchew/20210128144125.3696119-1-f4...@amsat.org
 * [new tag] 
patchew/20210204014509.882821-1-richard.hender...@linaro.org -> 
patchew/20210204014509.882821-1-richard.hender...@linaro.org
Switched to a new branch 'test'
8b0bc01 tcg/tci: Implement add2, sub2
da10429 tcg/tci: Implement mulu2, muls2
ced4f5a tcg/tci: Implement clz, ctz, ctpop
1e4852c tcg/tci: Implement extract, sextract
f3f91cc tcg/tci: Implement andc, orc, eqv, nand, nor
87a5d9e tcg/tci: Implement movcond
9976361 tcg/tci: Implement goto_ptr
11e005d tcg/tci: Change encoding to uint32_t units
d42a563 tcg/tci: Remove tci_write_reg
231b705 tcg/tci: Emit setcond before brcond
055b38e tcg/tci: Reserve r13 for a temporary
081aea4 tcg/tci: Split out tcg_out_op_r[iI]
fff070a tcg/tci: Split out tcg_out_op_np
015559a tcg/tci: Split out tcg_out_op_v
89bee6c tcg/tci: Split out tcg_out_op_{rrm,rrrm,m}
2311b1a tcg/tci: Split out tcg_out_op_cl
b226664 tcg/tci: Split out tcg_out_op_
08b2642 tcg/tci: Split out tcg_out_op_rr
36b029b tcg/tci: Split out tcg_out_op_rrcl
a78060f tcg/tci: Split out tcg_out_op_rrrbb
80891b3 tcg/tci: Split out tcg_out_op_rc
030b2c5 tcg/tci: Split out tcg_out_op_rrrc
152d803 tcg/tci: Split out tcg_out_op_rrr
219243e tcg/tci: Split out tcg_out_op_rr
bc62fe5 tcg/tci: Split out tcg_out_op_p
ae4c05c tcg/tci: Split out tcg_out_op_l
420d1be tcg/tci: Split out tcg_out_op_rrs
d67ed68 tcg/tci: Push opcode emit into each case
5e85088 tcg/tci: Move call-return regs to end of tcg_target_reg_alloc_order
8ef0a82 tcg/tci: Improve tcg_target_call_clobber_regs
6b7a3c7 tcg/tci: Use ffi for calls
3603af3 tcg: Build ffi data structures for helpers
a3ee3dd tcg/tci: Implement the disassembler properly
938c48e tcg/tci: Remove tci_disas
102a4b7 tcg/tci: Hoist op_size checking into tci_args_*
5f4a91e tcg/tci: Split out tci_args_{rrm,rrrm,m}
4ab25b1 tcg/tci: Reduce qemu_ld/st TCGMemOpIdx operand to 32-bits
5112952 tcg/tci: Clean up deposit operations
25e378a tcg/tci: Split out tci_args_
98e9b3f tcg/tci: Split out tci_args_rr
0156bf9 tcg/tci: Reuse tci_args_l for goto_tb
0f4b492 tcg/tci: Reuse tci_args_l for exit_tb
0697d2d tcg/tci: Reuse tci_args_l for calls.
d351c13 tcg/tci: Split out tci_args_ri and tci_args_rI
699ba0f tcg/tci: Split out tci_args_rrcl and tci_args_cl
8886438 tcg/tci: Split out tci_args_rc
40f64f6 tcg/tci: Split out tci_args_l
b14402b tcg/tci: Split out tci_args_rrrc
02a8039 tcg/tci: Split out tci_args_rrr
a91c032 tcg/tci: Split out tci_args_rr
cd078fe tcg/tci: Split out tci_args_rrs
943fd28 tcg/tci: Rename tci_read_r to tci_read_rval
0d2df6a tcg/tci: Merge mov, not and neg operations
c8f29d9 tcg/tci: Merge bswap operations
93e1dd4 tcg/tci: Remove ifdefs for TCG_TARGET_HAS_ext32[us]_i64
4ba996d tcg/tci: Merge extension operations
8e5b80c tcg/tci: Merge basic arithmetic operations
21a34d1 tcg/tci: Reduce use of tci_read_r64
e98b67d tcg/tci: Remove tci_read_r32s
0b2209a tcg/tci: Remove tci_read_r16s
23f5ca8 tcg/tci: Remove tci_read_r16
b1e734e tcg/tci: Remove tci_read_r8s
4d485d6 tcg/tci: Remove tci_read_r8
8ddeee2 tcg/tci: Merge identical cases in generation
01b68d4 tcg/tci: Remove TCG_CONST
3f384f5 tcg/tci: Use bool in tcg_out_ri*
c2365d0 tcg/tci: Fix TCG_REG_R4 misusage
5b68471 tcg/tci: Restrict TCG_TARGET_NB_REGS to 16
ad391cb tcg/tci: Remove TODO as unused
1a32255 tcg/tci: Implement 64-bit division
9b566c3 tcg/tci: Remove dead code for TCG_TARGET_HAS_div2_*
1e76758 tcg/tci: Use g_assert_not_reached
927c77f tcg/tci: Merge INDEX_op_{st_i32,st32_i64}
c794a09 tcg/tci: Move stack bounds check to compile-time
393a624 tcg/tci: Merge INDEX_op_st16_{i32,i64}
7ef8c09 tcg/tci: Merge INDEX_op_st8_{i32,i64}
10e9b92 tcg/tci: Merge INDEX_op_{ld_i32,ld32u_i64}
116f33d tcg/tci: Merge INDEX_op_ld16s_{i32,i64}
5f7ede9 tcg/tci: Merge INDEX_op_ld16u_{i32,i64}
8dcd478 tcg/tci: Merge INDEX_op_ld8s_{i32,i64}
01ef162 tcg/tci: Merge INDEX_op_ld8u_{i32,i64}
fccf4b6 tcg/tci: Inline tci_write_reg64 into 64-bit callers
c245070 tcg/tci: Inline tci_write_reg32 into all callers
84dcf2a tcg/tci: Inline tci_write_reg16 into the only caller
df64108 tcg/tci: Inline 

Re: [RFC 09/10] vhost: Route guest->host notification through shadow virtqueue

2021-02-03 Thread Jason Wang



On 2021/2/2 6:08 PM, Eugenio Perez Martin wrote:

On Mon, Feb 1, 2021 at 7:29 AM Jason Wang  wrote:


On 2021/1/30 4:54 AM, Eugenio Pérez wrote:

Shadow virtqueue notifications forwarding is disabled when vhost_dev
stops.

Signed-off-by: Eugenio Pérez 
---
   hw/virtio/vhost-shadow-virtqueue.h |   5 ++
   include/hw/virtio/vhost.h  |   4 +
   hw/virtio/vhost-shadow-virtqueue.c | 123 +-
   hw/virtio/vhost.c  | 135 -
   4 files changed, 264 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 6cc18d6acb..466f8ae595 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -17,6 +17,11 @@

   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;

+bool vhost_shadow_vq_start_rcu(struct vhost_dev *dev,
+   VhostShadowVirtqueue *svq);
+void vhost_shadow_vq_stop_rcu(struct vhost_dev *dev,
+  VhostShadowVirtqueue *svq);
+
   VhostShadowVirtqueue *vhost_shadow_vq_new(struct vhost_dev *dev, int idx);

   void vhost_shadow_vq_free(VhostShadowVirtqueue *vq);
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 2be782cefd..732a4b2a2b 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -55,6 +55,8 @@ struct vhost_iommu {
   QLIST_ENTRY(vhost_iommu) iommu_next;
   };

+typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
+
   typedef struct VhostDevConfigOps {
   /* Vhost device config space changed callback
*/
@@ -83,7 +85,9 @@ struct vhost_dev {
   uint64_t backend_cap;
   bool started;
   bool log_enabled;
+bool sw_lm_enabled;
   uint64_t log_size;
+VhostShadowVirtqueue **shadow_vqs;
   Error *migration_blocker;
   const VhostOps *vhost_ops;
   void *opaque;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index c0c967a7c5..908c36c66d 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -8,15 +8,129 @@
*/

   #include "hw/virtio/vhost-shadow-virtqueue.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/virtio-access.h"
+
+#include "standard-headers/linux/vhost_types.h"
+#include "standard-headers/linux/virtio_ring.h"

   #include "qemu/error-report.h"
-#include "qemu/event_notifier.h"
+#include "qemu/main-loop.h"

   typedef struct VhostShadowVirtqueue {
   EventNotifier kick_notifier;
   EventNotifier call_notifier;
+const struct vhost_virtqueue *hvq;
+VirtIODevice *vdev;
+VirtQueue *vq;
   } VhostShadowVirtqueue;


So instead of doing things at the virtio level, how about doing the shadow
stuff at the vhost level?

It works like:

virtio -> [shadow vhost backend] -> vhost backend

Then the QMP is used to plug the shadow vhost backend in the middle or not.

It looks kind of easier since we don't need to deal with virtqueue
handlers etc.. Instead, we just need to deal with eventfd stuffs:

When shadow vhost mode is enabled, we just intercept the host_notifiers
and guest_notifiers. When it was disabled, we just pass the host/guest
notifiers to the real vhost backends?


Hi Jason.

Sure we can try that model, but it seems to me that it comes with a
different set of problems.

For example, there is code in vhost.c that checks if implementations
are available in vhost_ops, like:

if (dev->vhost_ops->vhost_vq_get_addr) {
 r = dev->vhost_ops->vhost_vq_get_addr(dev, &addr, vq);
 ...
}

I can count 14 of these, checking:

dev->vhost_ops->vhost_backend_can_merge
dev->vhost_ops->vhost_backend_mem_section_filter
dev->vhost_ops->vhost_force_iommu
dev->vhost_ops->vhost_requires_shm_log
dev->vhost_ops->vhost_set_backend_cap
dev->vhost_ops->vhost_set_vring_busyloop_timeout
dev->vhost_ops->vhost_vq_get_addr
hdev->vhost_ops->vhost_dev_start
hdev->vhost_ops->vhost_get_config
hdev->vhost_ops->vhost_get_inflight_fd
hdev->vhost_ops->vhost_net_set_backend
hdev->vhost_ops->vhost_set_config
hdev->vhost_ops->vhost_set_inflight_fd
hdev->vhost_ops->vhost_set_iotlb_callback

So we should Implement all of the vhost_ops callbacks, forwarding them
to actual vhost_backed, and delete conditionally these ones? In other
words, dynamically generate the new shadow vq vhost_ops? If a new
callback is added to any vhost backend in the future, do we have to
force the adding / checking for NULL in shadow backend vhost_ops?
Would this be a good moment to check if all backends implement these
and delete the checks?



I think it won't be easy if we want to support all kinds of vhost
backends from the start. So we can go with the vhost-vdpa one first.


Actually how it work might be something like (no need to switch 
vhost_ops, we can do everything silently in the ops)


1) when device to switch to shadow vq (e.g via QMP)
2) vhost-vdpa will stop and sync state (last_avail_idx) internally
3) reset vhost-vdpa, clean call and k

Re: [PATCH 1/2] target/cris: Use MMUAccessType enum type when possible

2021-02-03 Thread Richard Henderson
On 1/27/21 2:32 PM, Philippe Mathieu-Daudé wrote:
> Replace the 0/1/2 magic values by the corresponding MMUAccessType.
> We can remove a comment as enum names are self explicit.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/cris/helper.c |  4 ++--
>  target/cris/mmu.c| 13 ++---
>  2 files changed, 8 insertions(+), 9 deletions(-)
Reviewed-by: Richard Henderson 


r~




Re: [RFC 05/10] vhost: Add vhost_dev_from_virtio

2021-02-03 Thread Jason Wang



On 2021/2/2 6:17 PM, Eugenio Perez Martin wrote:

On Tue, Feb 2, 2021 at 4:31 AM Jason Wang  wrote:


On 2021/2/1 4:28 PM, Eugenio Perez Martin wrote:

On Mon, Feb 1, 2021 at 7:13 AM Jason Wang  wrote:

On 2021/1/30 4:54 AM, Eugenio Pérez wrote:

Signed-off-by: Eugenio Pérez 
---
include/hw/virtio/vhost.h |  1 +
hw/virtio/vhost.c | 17 +
2 files changed, 18 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 4a8bc75415..fca076e3f0 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -123,6 +123,7 @@ uint64_t vhost_get_features(struct vhost_dev *hdev, const 
int *feature_bits,
void vhost_ack_features(struct vhost_dev *hdev, const int *feature_bits,
uint64_t features);
bool vhost_has_free_slot(void);
+struct vhost_dev *vhost_dev_from_virtio(const VirtIODevice *vdev);

int vhost_net_set_backend(struct vhost_dev *hdev,
  struct vhost_vring_file *file);
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 28c7d78172..8683d507f5 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -61,6 +61,23 @@ bool vhost_has_free_slot(void)
return slots_limit > used_memslots;
}

+/*
+ * Get the vhost device associated to a VirtIO device.
+ */
+struct vhost_dev *vhost_dev_from_virtio(const VirtIODevice *vdev)
+{
+struct vhost_dev *hdev;
+
+QLIST_FOREACH(hdev, &vhost_devices, entry) {
+if (hdev->vdev == vdev) {
+return hdev;
+}
+}
+
+assert(hdev);
+return NULL;
+}

I'm not sure this can work in the case of multiqueue. E.g. vhost-net
multiqueue is an N:1 mapping between vhost devices and virtio devices.

Thanks


Right. We could add an "vdev vq index" parameter to the function in
this case, but I guess the most reliable way to do this is to add a
vhost_opaque value to VirtQueue, as Stefan proposed in previous RFC.


So the question still stands: it looks like it's easier to hide the shadow
virtqueue stuff at the vhost layer instead of exposing it to the virtio layer:

1) vhost protocol is stable ABI
2) no need to deal with virtio stuff, which is more complex than vhost

Or are there any advantages if we do it at virtio layer?


As far as I can tell, we will need the virtio layer the moment we
start copying/translating buffers.

In this series, the virtio dependency can be reduced if qemu does not
check the used ring _F_NO_NOTIFY flag before writing to irqfd. It
would enable packed queues and IOMMU immediately, and I think the cost
should not be so high. In the previous RFC this check was deleted
later anyway, so I think it was a bad idea to include it from the start.



I am not sure I understand here. For vhost, we can still do anything we 
want, e.g accessing guest memory etc. Any blocker that prevent us from 
copying/translating buffers? (Note that qemu will propagate memory 
mappings to vhost).


Thanks









Thanks



I need to take this into account in qmp_x_vhost_enable_shadow_vq too.


+
static void vhost_dev_sync_region(struct vhost_dev *dev,
  MemoryRegionSection *section,
  uint64_t mfirst, uint64_t mlast,







Re: [PATCH 2/2] target/nios2: Use MMUAccessType enum type when possible

2021-02-03 Thread Richard Henderson
On 1/27/21 1:41 PM, Philippe Mathieu-Daudé wrote:
> All callers of mmu_translate() provide it a MMUAccessType
> type. Let the prototype use it as argument, as it is stricter
> than an integer. We can remove the documentation as enum
> names are self explicit.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/nios2/mmu.h | 3 ++-
>  target/nios2/mmu.c | 4 ++--
>  2 files changed, 4 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson 


r~




Re: [PATCH 3/5] target/sh4: Pass mmu_idx to get_physical_address()

2021-02-03 Thread Richard Henderson
On 1/27/21 1:21 PM, Philippe Mathieu-Daudé wrote:
> get_mmu_address() and get_physical_address() don't use their
> 'int access_type' argument: remove it along with ACCESS_INT
> in superh_cpu_tlb_fill().

Sure.

> Pass the MMU index along, as other targets do.

But if it's unused, why?


r~



Re: [PATCH v8 07/13] confidential guest support: Introduce cgs "ready" flag

2021-02-03 Thread David Gibson
On Wed, Feb 03, 2021 at 05:15:48PM +0100, Greg Kurz wrote:
> On Tue,  2 Feb 2021 15:13:09 +1100
> David Gibson  wrote:
> 
> > The platform specific details of mechanisms for implementing
> > confidential guest support may require setup at various points during
> > initialization.  Thus, it's not really feasible to have a single cgs
> > initialization hook, but instead each mechanism needs its own
> > initialization calls in arch or machine specific code.
> > 
> > However, to make it harder to have a bug where a mechanism isn't
> > properly initialized under some circumstances, we want to have a
> > common place, late in boot, where we verify that cgs has been
> > initialized if it was requested.
> > 
> > This patch introduces a ready flag to the ConfidentialGuestSupport
> > base type to accomplish this, which we verify in
> > qemu_machine_creation_done().
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  include/exec/confidential-guest-support.h | 24 +++
> >  softmmu/vl.c  | 10 ++
> >  target/i386/sev.c |  2 ++
> >  3 files changed, 36 insertions(+)
> > 
> > diff --git a/include/exec/confidential-guest-support.h 
> > b/include/exec/confidential-guest-support.h
> > index 3db6380e63..5dcf602047 100644
> > --- a/include/exec/confidential-guest-support.h
> > +++ b/include/exec/confidential-guest-support.h
> > @@ -27,6 +27,30 @@ OBJECT_DECLARE_SIMPLE_TYPE(ConfidentialGuestSupport, 
> > CONFIDENTIAL_GUEST_SUPPORT)
> >  
> >  struct ConfidentialGuestSupport {
> >  Object parent;
> > +
> > +/*
> > + * ready: flag set by CGS initialization code once it's ready to
> > + *start executing instructions in a potentially-secure
> > + *guest
> > + *
> > + * The definition here is a bit fuzzy, because this is essentially
> > + * part of a self-sanity-check, rather than a strict mechanism.
> > + *
> > + * It's not fasible to have a single point in the common machine
> 
> s/fasible/feasible

Fixed, thanks.

> 
> Anyway,
> 
> Reviewed-by: Greg Kurz 
> 
> > + * init path to configure confidential guest support, because
> > + * different mechanisms have different interdependencies requiring
> > + * initialization in different places, often in arch or machine
> > + * type specific code.  It's also usually not possible to check
> > + * for invalid configurations until that initialization code.
> > + * That means it would be very easy to have a bug allowing CGS
> > + * init to be bypassed entirely in certain configurations.
> > + *
> > + * Silently ignoring a requested security feature would be bad, so
> > + * to avoid that we check late in init that this 'ready' flag is
> > + * set if CGS was requested.  If the CGS init hasn't happened, and
> > + * so 'ready' is not set, we'll abort.
> > + */
> > +bool ready;
> >  };
> >  
> >  typedef struct ConfidentialGuestSupportClass {
> > diff --git a/softmmu/vl.c b/softmmu/vl.c
> > index 1b464e3474..1869ed54a9 100644
> > --- a/softmmu/vl.c
> > +++ b/softmmu/vl.c
> > @@ -101,6 +101,7 @@
> >  #include "qemu/plugin.h"
> >  #include "qemu/queue.h"
> >  #include "sysemu/arch_init.h"
> > +#include "exec/confidential-guest-support.h"
> >  
> >  #include "ui/qemu-spice.h"
> >  #include "qapi/string-input-visitor.h"
> > @@ -2497,6 +2498,8 @@ static void qemu_create_cli_devices(void)
> >  
> >  static void qemu_machine_creation_done(void)
> >  {
> > +MachineState *machine = MACHINE(qdev_get_machine());
> > +
> >  /* Did we create any drives that we failed to create a device for? */
> >  drive_check_orphaned();
> >  
> > @@ -2516,6 +2519,13 @@ static void qemu_machine_creation_done(void)
> >  
> >  qdev_machine_creation_done();
> >  
> > +if (machine->cgs) {
> > +/*
> > + * Verify that Confidential Guest Support has actually been 
> > initialized
> > + */
> > +assert(machine->cgs->ready);
> > +}
> > +
> >  if (foreach_device_config(DEV_GDB, gdbserver_start) < 0) {
> >  exit(1);
> >  }
> > diff --git a/target/i386/sev.c b/target/i386/sev.c
> > index 590cb31fa8..f9e9b5d8ae 100644
> > --- a/target/i386/sev.c
> > +++ b/target/i386/sev.c
> > @@ -737,6 +737,8 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
> > **errp)
> >  qemu_add_machine_init_done_notifier(&sev_machine_done_notify);
> >  qemu_add_vm_change_state_handler(sev_vm_state_change, sev);
> >  
> > +cgs->ready = true;
> > +
> >  return 0;
> >  err:
> >  sev_guest = NULL;
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH v8 10/13] spapr: Add PEF based confidential guest support

2021-02-03 Thread David Gibson
On Wed, Feb 03, 2021 at 06:50:16PM +0100, Greg Kurz wrote:
> On Tue,  2 Feb 2021 15:13:12 +1100
> David Gibson  wrote:
> 
> > Some upcoming POWER machines have a system called PEF (Protected
> > Execution Facility) which uses a small ultravisor to allow guests to
> > run in a way that they can't be eavesdropped by the hypervisor.  The
> > effect is roughly similar to AMD SEV, although the mechanisms are
> > quite different.
> > 
> > Most of the work of this is done between the guest, KVM and the
> > ultravisor, with little need for involvement by qemu.  However qemu
> > does need to tell KVM to allow secure VMs.
> > 
> > Because the availability of secure mode is a guest visible difference
> > which depends on having the right hardware and firmware, we don't
> > enable this by default.  In order to run a secure guest you need to
> > create a "pef-guest" object and set the confidential-guest-support
> > property to point to it.
> > 
> > Note that this just *allows* secure guests, the architecture of PEF is
> > such that the guest still needs to talk to the ultravisor to enter
> > secure mode.  Qemu has no direct way of knowing if the guest is in
> > secure mode, and certainly can't know until well after machine
> > creation time.
> > 
> > To start a PEF-capable guest, use the command line options:
> > -object pef-guest,id=pef0 -machine confidential-guest-support=pef0
> > 
> > Signed-off-by: David Gibson 
> > ---
> 
> Reviewed-by: Greg Kurz 
> 
> Just some cosmetic comments in case you need to respin. See below.
> 
> >  docs/confidential-guest-support.txt |   3 +
> >  docs/papr-pef.txt   |  30 +++
> >  hw/ppc/meson.build  |   1 +
> >  hw/ppc/pef.c| 133 
> >  hw/ppc/spapr.c  |   8 +-
> >  include/hw/ppc/pef.h|  17 
> >  target/ppc/kvm.c|  18 
> >  target/ppc/kvm_ppc.h|   6 --
> >  8 files changed, 191 insertions(+), 25 deletions(-)
> >  create mode 100644 docs/papr-pef.txt
> >  create mode 100644 hw/ppc/pef.c
> >  create mode 100644 include/hw/ppc/pef.h
> > 
> > diff --git a/docs/confidential-guest-support.txt 
> > b/docs/confidential-guest-support.txt
> > index bd439ac800..4da4c91bd3 100644
> > --- a/docs/confidential-guest-support.txt
> > +++ b/docs/confidential-guest-support.txt
> > @@ -40,4 +40,7 @@ Currently supported confidential guest mechanisms are:
> >  AMD Secure Encrypted Virtualization (SEV)
> >  docs/amd-memory-encryption.txt
> >  
> > +POWER Protected Execution Facility (PEF)
> > +docs/papr-pef.txt
> > +
> >  Other mechanisms may be supported in future.
> > diff --git a/docs/papr-pef.txt b/docs/papr-pef.txt
> > new file mode 100644
> > index 00..72550e9bf8
> > --- /dev/null
> > +++ b/docs/papr-pef.txt
> > @@ -0,0 +1,30 @@
> > +POWER (PAPR) Protected Execution Facility (PEF)
> > +===
> > +
> > +Protected Execution Facility (PEF), also known as Secure Guest support
> > +is a feature found on IBM POWER9 and POWER10 processors.
> > +
> > +If a suitable firmware including an Ultravisor is installed, it adds
> > +an extra memory protection mode to the CPU.  The ultravisor manages a
> > +pool of secure memory which cannot be accessed by the hypervisor.
> > +
> > +When this feature is enabled in QEMU, a guest can use ultracalls to
> > +enter "secure mode".  This transfers most of its memory to secure
> > +memory, where it cannot be eavesdropped by a compromised hypervisor.
> > +
> > +Launching
> > +-
> > +
> > +To launch a guest which will be permitted to enter PEF secure mode:
> > +
> > +# ${QEMU} \
> > +-object pef-guest,id=pef0 \
> > +-machine confidential-guest-support=pef0 \
> > +...
> > +
> > +Live Migration
> > +
> > +
> > +Live migration is not yet implemented for PEF guests.  For
> > +consistency, we currently prevent migration if the PEF feature is
> > +enabled, whether or not the guest has actually entered secure mode.
> > diff --git a/hw/ppc/meson.build b/hw/ppc/meson.build
> > index ffa2ec37fa..218631c883 100644
> > --- a/hw/ppc/meson.build
> > +++ b/hw/ppc/meson.build
> > @@ -27,6 +27,7 @@ ppc_ss.add(when: 'CONFIG_PSERIES', if_true: files(
> >'spapr_nvdimm.c',
> >'spapr_rtas_ddw.c',
> >'spapr_numa.c',
> > +  'pef.c',
> >  ))
> >  ppc_ss.add(when: 'CONFIG_SPAPR_RNG', if_true: files('spapr_rng.c'))
> >  ppc_ss.add(when: ['CONFIG_PSERIES', 'CONFIG_LINUX'], if_true: files(
> > diff --git a/hw/ppc/pef.c b/hw/ppc/pef.c
> > new file mode 100644
> > index 00..f9fd1f2a71
> > --- /dev/null
> > +++ b/hw/ppc/pef.c
> > @@ -0,0 +1,133 @@
> > +/*
> > + * PEF (Protected Execution Facility) for POWER support
> > + *
> > + * Copyright Red Hat.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or 
> > later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#inc

Re: [PATCH 1/2] target/nios2: Replace magic value by MMU definitions

2021-02-03 Thread Richard Henderson
On 1/27/21 1:41 PM, Philippe Mathieu-Daudé wrote:
> cpu_get_phys_page_debug() uses 'DATA LOAD' MMU access type.
> The first MMU is the supervisor one.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/nios2/helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

I'll note that mmu_idx isn't used by mmu_translate.

Reviewed-by: Richard Henderson 


r~




Re: [PATCH 1/5] target/sh4: Fix code style for checkpatch.pl

2021-02-03 Thread Richard Henderson
On 1/27/21 1:21 PM, Philippe Mathieu-Daudé wrote:
> We are going to move this code, fix its style first.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Easier to review using 'git-diff -w -b'
> ---
>  target/sh4/helper.c | 82 ++---
>  1 file changed, 41 insertions(+), 41 deletions(-)

Reviewed-by: Richard Henderson 


r~




Re: [PATCH 2/2] target/cris: Let cris_mmu_translate() use MMUAccessType access_type

2021-02-03 Thread Richard Henderson
On 1/27/21 2:32 PM, Philippe Mathieu-Daudé wrote:
> All callers of cris_mmu_translate() provide a MMUAccessType
> type. Let the prototype use it as argument, as it is stricter
> than an integer. We can remove the documentation as enum
> names are self explicit.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/cris/mmu.h |  2 +-
>  target/cris/mmu.c | 24 
>  2 files changed, 13 insertions(+), 13 deletions(-)

Reviewed-by: Richard Henderson 


r~




Re: [PATCH 2/5] target/sh4: Replace magic value by MMUAccessType definitions

2021-02-03 Thread Richard Henderson
On 1/27/21 1:21 PM, Philippe Mathieu-Daudé wrote:
> Replace the 0/1/2 magic values by the corresponding MMUAccessType.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/sh4/helper.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)

Reviewed-by: Richard Henderson 


r~




Re: [PATCH 4/5] target/sh4: Let get_physical_address() use MMUAccessType access_type

2021-02-03 Thread Richard Henderson
On 1/27/21 1:21 PM, Philippe Mathieu-Daudé wrote:
> superh_cpu_tlb_fill() already provides an access_type variable of
> type MMUAccessType, and it is passed along, but cast as an integer
> and renamed 'rw'.
> Simply replace 'int rw' by 'MMUAccessType access_type'.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/sh4/helper.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)

Reviewed-by: Richard Henderson 


r~




Re: [PULL 11/24] tcg/optimize: Use tcg_constant_internal with constant folding

2021-02-03 Thread Richard Henderson
On 2/1/21 10:45 AM, Richard W.M. Jones wrote:
> This commit breaks running certain s390x binaries, at least
> the "mount" command (or a library it uses) breaks.
> 
> More details in this BZ:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1922248
> 
> Could we revert this change since it seems to have caused other
> problems as well?

Well, the other problems have been fixed (which were in fact latent, and could
have been produced by other means).  I would not like to sideline this patch
set indefinitely.

Could you give me some help extracting the relevant binaries?  "Begin with an
s390x host" is a non-starter.

FWIW, with qemu-system-s390x, booting debian, building qemu-s390x, and running
"/bin/mount -t proc proc /mnt" under double-emulation does not show the bug.

I suspect that's because debian targets a relatively old s390x cpu, and that
yours is using the relatively new vector instructions.  But I don't know.

What I do know is that current qemu doesn't seem to boot current fedora:

 $ ../bld/qemu-system-s390x -nographic -m 4G -cpu max -drive
file=Fedora-Server-netinst-s390x-33-1.2.iso,format=raw,if=virtio
 qemu-system-s390x: warning: 'msa5-base' requires 'kimd-sha-512'.
 qemu-system-s390x: warning: 'msa5-base' requires 'klmd-sha-512'.
 LOADPARM=[]
 Using virtio-blk.
 ISO boot image size verified

 KASLR disabled: CPU has no PRNG
 Linux version 5.8.15-301.fc33.s390x
(mockbu...@buildvm-s390x-07.s390.fedoraproject.org) 1 SMP Thu Oct 15 15:55:57
UTC 2020
 Kernel fault: interruption code 0005 ilc:2
 PSW : 20018000 000124c4
   R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:2 PM:0 RI:0 EA:3
 GPRS:  000b806a2da6 7aa19c5cbb980703 95f62d65812b83ab
   d5e42882af203615 000b806a2da0  
   000230e8 01438500 01720320 
   01442718 00010070 00012482 bf20

Which makes me think that fedora 33 is now targeting a cpu that is too new and
not actually supported by tcg.


r~



Re: [PATCH 5/5] target/sh4: Remove unused definitions

2021-02-03 Thread Richard Henderson
On 1/27/21 1:21 PM, Philippe Mathieu-Daudé wrote:
> Remove these confusing and unused definitions.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/sh4/cpu.h | 11 ---
>  1 file changed, 11 deletions(-)

Reviewed-by: Richard Henderson 


r~




[PATCH v2 92/93] tcg/tci: Implement mulu2, muls2

2021-02-03 Thread Richard Henderson
We already had mulu2_i32 for a 32-bit host; expand this to 64-bit
hosts as well.  The muls2_i32 and the 64-bit opcodes are new.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h |  8 
 tcg/tci.c| 35 +--
 tcg/tci/tcg-target.c.inc | 16 ++--
 3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 59859bd8a6..71a44bbfb0 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -83,7 +83,7 @@
 #define TCG_TARGET_HAS_orc_i32  1
 #define TCG_TARGET_HAS_rot_i32  1
 #define TCG_TARGET_HAS_movcond_i32  1
-#define TCG_TARGET_HAS_muls2_i320
+#define TCG_TARGET_HAS_muls2_i321
 #define TCG_TARGET_HAS_muluh_i320
 #define TCG_TARGET_HAS_mulsh_i320
 #define TCG_TARGET_HAS_goto_ptr 1
@@ -120,13 +120,13 @@
 #define TCG_TARGET_HAS_orc_i64  1
 #define TCG_TARGET_HAS_rot_i64  1
 #define TCG_TARGET_HAS_movcond_i64  1
-#define TCG_TARGET_HAS_muls2_i640
+#define TCG_TARGET_HAS_muls2_i641
 #define TCG_TARGET_HAS_add2_i32 0
 #define TCG_TARGET_HAS_sub2_i32 0
-#define TCG_TARGET_HAS_mulu2_i320
+#define TCG_TARGET_HAS_mulu2_i321
 #define TCG_TARGET_HAS_add2_i64 0
 #define TCG_TARGET_HAS_sub2_i64 0
-#define TCG_TARGET_HAS_mulu2_i640
+#define TCG_TARGET_HAS_mulu2_i641
 #define TCG_TARGET_HAS_muluh_i640
 #define TCG_TARGET_HAS_mulsh_i640
 #else
diff --git a/tcg/tci.c b/tcg/tci.c
index 35f2c4bfbb..5d83b2d957 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -39,7 +39,7 @@ __thread uintptr_t tci_tb_ptr;
 static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
 uint32_t low_index, uint64_t value)
 {
-regs[low_index] = value;
+regs[low_index] = (uint32_t)value;
 regs[high_index] = value >> 32;
 }
 
@@ -169,7 +169,6 @@ static void tci_args_rrrrr(uint32_t insn, TCGReg *r0, 
TCGReg *r1,
 *r4 = extract32(insn, 24, 4);
 }
 
-#if TCG_TARGET_REG_BITS == 32
static void tci_args_rrrr(uint32_t insn,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
 {
@@ -178,7 +177,6 @@ static void tci_args_rrrr(uint32_t insn,
 *r2 = extract32(insn, 16, 4);
 *r3 = extract32(insn, 20, 4);
 }
-#endif
 
static void tci_args_rrrrrc(uint32_t insn, TCGReg *r0, TCGReg *r1,
 TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
@@ -680,11 +678,21 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 T2 = tci_uint64(regs[r5], regs[r4]);
 tci_write_reg64(regs, r1, r0, T1 - T2);
 break;
+#endif /* TCG_TARGET_REG_BITS == 32 */
+#if TCG_TARGET_HAS_mulu2_i32
 case INDEX_op_mulu2_i32:
tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
-tci_write_reg64(regs, r1, r0, (uint64_t)regs[r2] * regs[r3]);
+tmp64 = (uint64_t)(uint32_t)regs[r2] * (uint32_t)regs[r3];
+tci_write_reg64(regs, r1, r0, tmp64);
 break;
-#endif /* TCG_TARGET_REG_BITS == 32 */
+#endif
+#if TCG_TARGET_HAS_muls2_i32
+case INDEX_op_muls2_i32:
+tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+tmp64 = (int64_t)(int32_t)regs[r2] * (int32_t)regs[r3];
+tci_write_reg64(regs, r1, r0, tmp64);
+break;
+#endif
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
 CASE_32_64(ext8s)
 tci_args_rr(insn, &r0, &r1);
@@ -788,6 +796,18 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 regs[r0] = ctpop64(regs[r1]);
 break;
 #endif
+#if TCG_TARGET_HAS_mulu2_i64
+case INDEX_op_mulu2_i64:
+tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+mulu64(&regs[r0], &regs[r1], regs[r2], regs[r3]);
+break;
+#endif
+#if TCG_TARGET_HAS_muls2_i64
+case INDEX_op_muls2_i64:
+tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
+muls64(&regs[r0], &regs[r1], regs[r2], regs[r3]);
+break;
+#endif
 
 /* Shift/rotate operations (64 bit). */
 
@@ -1295,14 +1315,17 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
str_r(r3), str_r(r4), str_c(c));
 break;
 
-#if TCG_TARGET_REG_BITS == 32
 case INDEX_op_mulu2_i32:
+case INDEX_op_mulu2_i64:
+case INDEX_op_muls2_i32:
+case INDEX_op_muls2_i64:
tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
 info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s",
op_name, str_r(r0), str_r(r1),
str_r(r2), str_r(r3));
 break;
 
+#if TCG_TARGET_REG_BITS == 32
 case INDEX_op_add2_i32:
 case INDEX_op_sub2_i32:
 tci_args_rr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 664d715440..eb48633fba 100644
--- a/tcg/tci/tcg

Re: [PATCH 07/13] target/mips: Let page_table_walk_refill() take MMUAccessType argument

2021-02-03 Thread Richard Henderson
On 1/28/21 4:41 AM, Philippe Mathieu-Daudé wrote:
> -static bool page_table_walk_refill(CPUMIPSState *env, vaddr address, int rw,
> -int mmu_idx)
> +static bool page_table_walk_refill(CPUMIPSState *env, vaddr address,
> +   MMUAccessType access_type, int mmu_idx)

The parameter name has changed without any other change to the function.  If
this compiles, it surely means that the parameter is unused.


r~



Re: [PATCH 00/13] target/mips: Replace integer by MMUAccessType enum when possible

2021-02-03 Thread Richard Henderson
On 1/28/21 4:41 AM, Philippe Mathieu-Daudé wrote:
> Philippe Mathieu-Daudé (13):
>   target/mips: Remove access_type argument from map_address() handler
>   target/mips: Remove access_type argument from get_seg_physical_address
>   target/mips: Remove access_type arg from get_segctl_physical_address()
>   target/mips: Remove access_type argument from get_physical_address()
>   target/mips: Remove unused MMU definitions
>   target/mips: Replace magic value by MMU_DATA_LOAD definition
>   target/mips: Let page_table_walk_refill() take MMUAccessType argument
>   target/mips: Let do_translate_address() take MMUAccessType argument
>   target/mips: Let cpu_mips_translate_address() take MMUAccessType arg
>   target/mips: Let raise_mmu_exception() take MMUAccessType argument
>   target/mips: Let get_physical_address() take MMUAccessType argument
>   target/mips: Let get_seg*_physical_address() take MMUAccessType arg
>   target/mips: Let CPUMIPSTLBContext::map_address() take MMUAccessType

Modulo the comment vs patch 7,
Reviewed-by: Richard Henderson 


r~



[PATCH v2 91/93] tcg/tci: Implement clz, ctz, ctpop

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h | 12 +--
 tcg/tci.c| 44 
 tcg/tci/tcg-target.c.inc |  9 
 3 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 60b67b196b..59859bd8a6 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -75,9 +75,9 @@
 #define TCG_TARGET_HAS_eqv_i32  1
 #define TCG_TARGET_HAS_nand_i32 1
 #define TCG_TARGET_HAS_nor_i32  1
-#define TCG_TARGET_HAS_clz_i32  0
-#define TCG_TARGET_HAS_ctz_i32  0
-#define TCG_TARGET_HAS_ctpop_i320
+#define TCG_TARGET_HAS_clz_i32  1
+#define TCG_TARGET_HAS_ctz_i32  1
+#define TCG_TARGET_HAS_ctpop_i321
 #define TCG_TARGET_HAS_neg_i32  1
 #define TCG_TARGET_HAS_not_i32  1
 #define TCG_TARGET_HAS_orc_i32  1
@@ -112,9 +112,9 @@
 #define TCG_TARGET_HAS_eqv_i64  1
 #define TCG_TARGET_HAS_nand_i64 1
 #define TCG_TARGET_HAS_nor_i64  1
-#define TCG_TARGET_HAS_clz_i64  0
-#define TCG_TARGET_HAS_ctz_i64  0
-#define TCG_TARGET_HAS_ctpop_i640
+#define TCG_TARGET_HAS_clz_i64  1
+#define TCG_TARGET_HAS_ctz_i64  1
+#define TCG_TARGET_HAS_ctpop_i641
 #define TCG_TARGET_HAS_neg_i64  1
 #define TCG_TARGET_HAS_not_i64  1
 #define TCG_TARGET_HAS_orc_i64  1
diff --git a/tcg/tci.c b/tcg/tci.c
index 831a3bb97e..35f2c4bfbb 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -598,6 +598,26 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_args_rrr(insn, &r0, &r1, &r2);
 regs[r0] = (uint32_t)regs[r1] % (uint32_t)regs[r2];
 break;
+#if TCG_TARGET_HAS_clz_i32
+case INDEX_op_clz_i32:
+tci_args_rrr(insn, &r0, &r1, &r2);
+tmp32 = regs[r1];
+regs[r0] = tmp32 ? clz32(tmp32) : regs[r2];
+break;
+#endif
+#if TCG_TARGET_HAS_ctz_i32
+case INDEX_op_ctz_i32:
+tci_args_rrr(insn, &r0, &r1, &r2);
+tmp32 = regs[r1];
+regs[r0] = tmp32 ? ctz32(tmp32) : regs[r2];
+break;
+#endif
+#if TCG_TARGET_HAS_ctpop_i32
+case INDEX_op_ctpop_i32:
+tci_args_rr(insn, &r0, &r1);
+regs[r0] = ctpop32(regs[r1]);
+break;
+#endif
 
 /* Shift/rotate operations (32 bit). */
 
@@ -750,6 +770,24 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_args_rrr(insn, &r0, &r1, &r2);
 regs[r0] = (uint64_t)regs[r1] % (uint64_t)regs[r2];
 break;
+#if TCG_TARGET_HAS_clz_i64
+case INDEX_op_clz_i64:
+tci_args_rrr(insn, &r0, &r1, &r2);
+regs[r0] = regs[r1] ? clz64(regs[r1]) : regs[r2];
+break;
+#endif
+#if TCG_TARGET_HAS_ctz_i64
+case INDEX_op_ctz_i64:
+tci_args_rrr(insn, &r0, &r1, &r2);
+regs[r0] = regs[r1] ? ctz64(regs[r1]) : regs[r2];
+break;
+#endif
+#if TCG_TARGET_HAS_ctpop_i64
+case INDEX_op_ctpop_i64:
+tci_args_rr(insn, &r0, &r1);
+regs[r0] = ctpop64(regs[r1]);
+break;
+#endif
 
 /* Shift/rotate operations (64 bit). */
 
@@ -1176,6 +1214,8 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
 case INDEX_op_not_i64:
 case INDEX_op_neg_i32:
 case INDEX_op_neg_i64:
+case INDEX_op_ctpop_i32:
+case INDEX_op_ctpop_i64:
 tci_args_rr(insn, &r0, &r1);
 info->fprintf_func(info->stream, "%-12s  %s,%s",
op_name, str_r(r0), str_r(r1));
@@ -1221,6 +1261,10 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
 case INDEX_op_rotl_i64:
 case INDEX_op_rotr_i32:
 case INDEX_op_rotr_i64:
+case INDEX_op_clz_i32:
+case INDEX_op_clz_i64:
+case INDEX_op_ctz_i32:
+case INDEX_op_ctz_i64:
 tci_args_rrr(insn, &r0, &r1, &r2);
 info->fprintf_func(info->stream, "%-12s  %s,%s,%s",
op_name, str_r(r0), str_r(r1), str_r(r2));
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index cedd0328df..664d715440 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -67,6 +67,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_extract_i64:
 case INDEX_op_sextract_i32:
 case INDEX_op_sextract_i64:
+case INDEX_op_ctpop_i32:
+case INDEX_op_ctpop_i64:
 return C_O1_I1(r, r);
 
 case INDEX_op_st8_i32:
@@ -122,6 +124,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_setcond_i64:
 case INDEX_op_deposit_i32:
 case INDEX_op_deposit_i64:
+case INDEX_op_clz_i32:
+case INDEX_op_clz_i64:
+case INDEX_op_ctz_i32:
+case INDEX_op_ctz_i64:
 return C_O1_I2(r, r, r);
 
 case INDEX_op_brcond_i32:
@@ -657,6 +663,8 @@ static void

[PATCH v2 88/93] tcg/tci: Implement movcond

2021-02-03 Thread Richard Henderson
When this opcode is not available in the backend, tcg middle-end
will expand this as a series of 5 opcodes.  So implementing this
saves bytecode space.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h |  4 ++--
 tcg/tci.c| 16 +++-
 tcg/tci/tcg-target.c.inc | 10 +++---
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 17911d3297..f53773a555 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -82,7 +82,7 @@
 #define TCG_TARGET_HAS_not_i32  1
 #define TCG_TARGET_HAS_orc_i32  0
 #define TCG_TARGET_HAS_rot_i32  1
-#define TCG_TARGET_HAS_movcond_i32  0
+#define TCG_TARGET_HAS_movcond_i32  1
 #define TCG_TARGET_HAS_muls2_i320
 #define TCG_TARGET_HAS_muluh_i320
 #define TCG_TARGET_HAS_mulsh_i320
@@ -119,7 +119,7 @@
 #define TCG_TARGET_HAS_not_i64  1
 #define TCG_TARGET_HAS_orc_i64  0
 #define TCG_TARGET_HAS_rot_i64  1
-#define TCG_TARGET_HAS_movcond_i64  0
+#define TCG_TARGET_HAS_movcond_i64  1
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_add2_i32 0
 #define TCG_TARGET_HAS_sub2_i32 0
diff --git a/tcg/tci.c b/tcg/tci.c
index a6e30d31a9..2a39f8f5a0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -169,6 +169,7 @@ static void tci_args_rrrr(uint32_t insn,
 *r2 = extract32(insn, 16, 4);
 *r3 = extract32(insn, 20, 4);
 }
+#endif
 
static void tci_args_rrrrrc(uint32_t insn, TCGReg *r0, TCGReg *r1,
 TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
@@ -181,6 +182,7 @@ static void tci_args_rrrrrc(uint32_t insn, TCGReg *r0, 
TCGReg *r1,
 *c5 = extract32(insn, 28, 4);
 }
 
+#if TCG_TARGET_REG_BITS == 32
static void tci_args_rrrrrr(uint32_t insn, TCGReg *r0, TCGReg *r1,
 TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGReg *r5)
 {
@@ -431,6 +433,11 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_args_rrrc(insn, &r0, &r1, &r2, &condition);
 regs[r0] = tci_compare32(regs[r1], regs[r2], condition);
 break;
+case INDEX_op_movcond_i32:
+tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &condition);
+tmp32 = tci_compare32(regs[r1], regs[r2], condition);
+regs[r0] = regs[tmp32 ? r3 : r4];
+break;
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_setcond2_i32:
tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &condition);
@@ -443,6 +450,11 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_args_rrrc(insn, &r0, &r1, &r2, &condition);
 regs[r0] = tci_compare64(regs[r1], regs[r2], condition);
 break;
+case INDEX_op_movcond_i64:
+tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &condition);
+tmp32 = tci_compare64(regs[r1], regs[r2], condition);
+regs[r0] = regs[tmp32 ? r3 : r4];
+break;
 #endif
 CASE_32_64(mov)
 tci_args_rr(insn, &r0, &r1);
@@ -1148,7 +1160,8 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
op_name, str_r(r0), str_r(r1), str_r(r2), pos, len);
 break;
 
-#if TCG_TARGET_REG_BITS == 32
+case INDEX_op_movcond_i32:
+case INDEX_op_movcond_i64:
 case INDEX_op_setcond2_i32:
tci_args_rrrrrc(insn, &r0, &r1, &r2, &r3, &r4, &c);
 info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s,%s",
@@ -1156,6 +1169,7 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
str_r(r3), str_r(r4), str_c(c));
 break;
 
+#if TCG_TARGET_REG_BITS == 32
 case INDEX_op_mulu2_i32:
tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
 info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s",
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index db29bc6e54..a0c458a60a 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -133,9 +133,12 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 return C_O0_I4(r, r, r, r);
 case INDEX_op_mulu2_i32:
 return C_O2_I2(r, r, r, r);
+#endif
+
+case INDEX_op_movcond_i32:
+case INDEX_op_movcond_i64:
 case INDEX_op_setcond2_i32:
 return C_O1_I4(r, r, r, r, r);
-#endif
 
 case INDEX_op_qemu_ld_i32:
 return (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
 insn = deposit32(insn, 20, 4, r3);
 tcg_out32(s, insn);
 }
+#endif
 
static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
   TCGReg r0, TCGReg r1, TCGReg r2,
@@ -436,6 +440,7 @@ static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
 tcg_out32(s, insn);
 }
 
+#if TCG_TARGET_REG_BITS == 32
static void tcg_out_op_rrrrrr(TCGContext *s, TCGOpcode op,
   TCGReg 

[PATCH v2 90/93] tcg/tci: Implement extract, sextract

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h |  8 
 tcg/tci.c| 42 
 tcg/tci/tcg-target.c.inc | 32 ++
 3 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 5945272a43..60b67b196b 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -69,8 +69,8 @@
 #define TCG_TARGET_HAS_ext16u_i32   1
 #define TCG_TARGET_HAS_andc_i32 1
 #define TCG_TARGET_HAS_deposit_i32  1
-#define TCG_TARGET_HAS_extract_i32  0
-#define TCG_TARGET_HAS_sextract_i32 0
+#define TCG_TARGET_HAS_extract_i32  1
+#define TCG_TARGET_HAS_sextract_i32 1
 #define TCG_TARGET_HAS_extract2_i32 0
 #define TCG_TARGET_HAS_eqv_i32  1
 #define TCG_TARGET_HAS_nand_i32 1
@@ -97,8 +97,8 @@
 #define TCG_TARGET_HAS_bswap32_i64  1
 #define TCG_TARGET_HAS_bswap64_i64  1
 #define TCG_TARGET_HAS_deposit_i64  1
-#define TCG_TARGET_HAS_extract_i64  0
-#define TCG_TARGET_HAS_sextract_i64 0
+#define TCG_TARGET_HAS_extract_i64  1
+#define TCG_TARGET_HAS_sextract_i64 1
 #define TCG_TARGET_HAS_extract2_i64 0
 #define TCG_TARGET_HAS_div_i64  1
 #define TCG_TARGET_HAS_rem_i64  1
diff --git a/tcg/tci.c b/tcg/tci.c
index 9c17947e6b..831a3bb97e 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -122,6 +122,15 @@ static void tci_args_rrs(uint32_t insn, TCGReg *r0, TCGReg 
*r1, int32_t *i2)
 *i2 = sextract32(insn, 16, 16);
 }
 
+static void tci_args_rrbb(uint32_t insn, TCGReg *r0, TCGReg *r1,
+  uint8_t *i2, uint8_t *i3)
+{
+*r0 = extract32(insn, 8, 4);
+*r1 = extract32(insn, 12, 4);
+*i2 = extract32(insn, 16, 6);
+*i3 = extract32(insn, 22, 6);
+}
+
 static void tci_args_rrrc(uint32_t insn,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
 {
@@ -619,6 +628,18 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_args_rrrbb(insn, &r0, &r1, &r2, &pos, &len);
 regs[r0] = deposit32(regs[r1], pos, len, regs[r2]);
 break;
+#endif
+#if TCG_TARGET_HAS_extract_i32
+case INDEX_op_extract_i32:
+tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+regs[r0] = extract32(regs[r1], pos, len);
+break;
+#endif
+#if TCG_TARGET_HAS_sextract_i32
+case INDEX_op_sextract_i32:
+tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+regs[r0] = sextract32(regs[r1], pos, len);
+break;
 #endif
 case INDEX_op_brcond_i32:
 tci_args_rl(insn, tb_ptr, &r0, &ptr);
@@ -759,6 +780,18 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_args_rrrbb(insn, &r0, &r1, &r2, &pos, &len);
 regs[r0] = deposit64(regs[r1], pos, len, regs[r2]);
 break;
+#endif
+#if TCG_TARGET_HAS_extract_i64
+case INDEX_op_extract_i64:
+tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+regs[r0] = extract64(regs[r1], pos, len);
+break;
+#endif
+#if TCG_TARGET_HAS_sextract_i64
+case INDEX_op_sextract_i64:
+tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+regs[r0] = sextract64(regs[r1], pos, len);
+break;
 #endif
 case INDEX_op_brcond_i64:
 tci_args_rl(insn, tb_ptr, &r0, &ptr);
@@ -1200,6 +1233,15 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
op_name, str_r(r0), str_r(r1), str_r(r2), pos, len);
 break;
 
+case INDEX_op_extract_i32:
+case INDEX_op_extract_i64:
+case INDEX_op_sextract_i32:
+case INDEX_op_sextract_i64:
+tci_args_rrbb(insn, &r0, &r1, &pos, &len);
+info->fprintf_func(info->stream, "%-12s  %s,%s,%d,%d",
+   op_name, str_r(r0), str_r(r1), pos, len);
+break;
+
 case INDEX_op_movcond_i32:
 case INDEX_op_movcond_i64:
 case INDEX_op_setcond2_i32:
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index a0c458a60a..cedd0328df 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -63,6 +63,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_bswap32_i32:
 case INDEX_op_bswap32_i64:
 case INDEX_op_bswap64_i64:
+case INDEX_op_extract_i32:
+case INDEX_op_extract_i64:
+case INDEX_op_sextract_i32:
+case INDEX_op_sextract_i64:
 return C_O1_I1(r, r);
 
 case INDEX_op_st8_i32:
@@ -352,6 +356,21 @@ static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
 tcg_out32(s, insn);
 }
 
+static void tcg_out_op_rrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
+TCGReg r1, uint8_t b2, uint8_t b3)
+{
+tcg_insn_unit insn = 0;
+
+tcg_debug_assert(b2 == extract32(b2, 0, 6));
+tcg_debug_assert(b3 == extract32(b3, 0, 6));
+insn = deposit32(insn, 0, 8, op);
+insn = deposit32(

[PATCH v2 87/93] tcg/tci: Implement goto_ptr

2021-02-03 Thread Richard Henderson
This operation is critical to staying within the interpretation
loop longer, which avoids the overhead of setup and teardown for
many TBs.

The check in tcg_prologue_init is disabled because TCI does
want to use NULL to indicate exit, as opposed to branching to
a real epilogue.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target-con-set.h |  1 +
 tcg/tci/tcg-target.h |  2 +-
 tcg/tcg.c|  2 ++
 tcg/tci.c| 19 +++
 tcg/tci/tcg-target.c.inc | 16 
 5 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/tcg/tci/tcg-target-con-set.h b/tcg/tci/tcg-target-con-set.h
index 316730f32c..ae2dc3b844 100644
--- a/tcg/tci/tcg-target-con-set.h
+++ b/tcg/tci/tcg-target-con-set.h
@@ -9,6 +9,7 @@
  * Each operand should be a sequence of constraint letters as defined by
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
+C_O0_I1(r)
 C_O0_I2(r, r)
 C_O0_I3(r, r, r)
 C_O0_I4(r, r, r, r)
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index d953f2ead3..17911d3297 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -86,7 +86,7 @@
 #define TCG_TARGET_HAS_muls2_i320
 #define TCG_TARGET_HAS_muluh_i320
 #define TCG_TARGET_HAS_mulsh_i320
-#define TCG_TARGET_HAS_goto_ptr 0
+#define TCG_TARGET_HAS_goto_ptr 1
 #define TCG_TARGET_HAS_direct_jump  0
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 92aec0d238..ce80adcfbe 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1314,10 +1314,12 @@ void tcg_prologue_init(TCGContext *s)
 }
 #endif
 
+#ifndef CONFIG_TCG_INTERPRETER
 /* Assert that goto_ptr is implemented completely.  */
 if (TCG_TARGET_HAS_goto_ptr) {
 tcg_debug_assert(tcg_code_gen_epilogue != NULL);
 }
+#endif
 }
 
 void tcg_func_start(TCGContext *s)
diff --git a/tcg/tci.c b/tcg/tci.c
index c4f0a7e82d..a6e30d31a9 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -69,6 +69,11 @@ static void tci_args_l(uint32_t insn, const void *tb_ptr, 
void **l0)
 *l0 = diff ? (void *)tb_ptr + diff : NULL;
 }
 
+static void tci_args_r(uint32_t insn, TCGReg *r0)
+{
+*r0 = extract32(insn, 8, 4);
+}
+
 static void tci_args_nl(uint32_t insn, const void *tb_ptr,
 uint8_t *n0, void **l1)
 {
@@ -748,6 +753,15 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tb_ptr = *(void **)ptr;
 break;
 
+case INDEX_op_goto_ptr:
+tci_args_r(insn, &r0);
+ptr = (void *)regs[r0];
+if (!ptr) {
+return 0;
+}
+tb_ptr = ptr;
+break;
+
 case INDEX_op_qemu_ld_i32:
 if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
 tci_args_rrm(insn, &r0, &r1, &oi);
@@ -1005,6 +1019,11 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
 info->fprintf_func(info->stream, "%-12s  %p", op_name, ptr);
 break;
 
+case INDEX_op_goto_ptr:
+tci_args_r(insn, &r0);
+info->fprintf_func(info->stream, "%-12s  %s", op_name, str_r(r0));
+break;
+
 case INDEX_op_call:
 tci_args_nl(insn, tb_ptr, &len, &ptr);
 info->fprintf_func(info->stream, "%-12s  %d,%p", op_name, len, ptr);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 0df8384be7..db29bc6e54 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -27,6 +27,9 @@
 static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 {
 switch (op) {
+case INDEX_op_goto_ptr:
+return C_O0_I1(r);
+
 case INDEX_op_ld8u_i32:
 case INDEX_op_ld8s_i32:
 case INDEX_op_ld16u_i32:
@@ -263,6 +266,15 @@ static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void 
*p0)
 tcg_out32(s, insn);
 }
 
+static void tcg_out_op_r(TCGContext *s, TCGOpcode op, TCGReg r0)
+{
+tcg_insn_unit insn = 0;
+
+insn = deposit32(insn, 0, 8, op);
+insn = deposit32(insn, 8, 4, r0);
+tcg_out32(s, insn);
+}
+
 static void tcg_out_op_v(TCGContext *s, TCGOpcode op)
 {
 tcg_out32(s, (uint8_t)op);
@@ -567,6 +579,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 set_jmp_reset_offset(s, args[0]);
 break;
 
+case INDEX_op_goto_ptr:
+tcg_out_op_r(s, opc, args[0]);
+break;
+
 case INDEX_op_br:
 tcg_out_op_l(s, opc, arg_label(args[0]));
 break;
-- 
2.25.1




[PATCH v2 86/93] tcg/tci: Change encoding to uint32_t units

2021-02-03 Thread Richard Henderson
This removes all of the problems with unaligned accesses
to the bytecode stream.

With an 8-bit opcode at the bottom, we have 24 bits remaining,
which are generally split into 6 4-bit slots.  This fits well
with the maximum length opcodes, e.g. INDEX_op_add2_i32, which
have 6 register operands.

We have, in previous patches, rearranged things such that there
are no operations with a label, which have more than one other
operand.  Which leaves us with a 20-bit field in which to encode
a label, giving us a maximum TB size of 512k -- easily large enough.

Change the INDEX_op_tci_movi_{i32,i64} opcodes to tci_mov[il].
The former puts the immediate in the upper 20 bits of the insn,
like we do for the label displacement.  The latter uses a label
to reference an entry in the constant pool.  Thus, in the worst
case we still have a single memory reference for any constant,
but now the constants are out-of-line of the bytecode and can
be shared between different moves saving space.

Change INDEX_op_call to use a label to reference a pair of
pointers in the constant pool.  This removes the only slightly
dodgy link with the layout of struct TCGHelperInfo.

The re-encode cannot be done in pieces.

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-opc.h|   4 +-
 tcg/tci/tcg-target.h |   3 +-
 tcg/tci.c| 534 +++
 tcg/tci/tcg-target.c.inc | 386 +---
 tcg/tci/README   |  20 +-
 5 files changed, 380 insertions(+), 567 deletions(-)

diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index bbb0884af8..5bbec858aa 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -277,8 +277,8 @@ DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 
 #ifdef TCG_TARGET_INTERPRETER
 /* These opcodes are only for use between the tci generator and interpreter. */
-DEF(tci_movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
-DEF(tci_movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
+DEF(tci_movi, 1, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(tci_movl, 1, 0, 1, TCG_OPF_NOT_PRESENT)
 #endif
 
 #undef TLADDR_ARGS
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 1558a6e44e..d953f2ead3 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -41,7 +41,7 @@
 #define TCG_TARGET_H
 
 #define TCG_TARGET_INTERPRETER 1
-#define TCG_TARGET_INSN_UNIT_SIZE 1
+#define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 32
 
 #if UINTPTR_MAX == UINT32_MAX
@@ -165,6 +165,7 @@ typedef enum {
 #define TCG_TARGET_STACK_ALIGN  8
 
 #define HAVE_TCG_QEMU_TB_EXEC
+#define TCG_TARGET_NEED_POOL_LABELS
 
 /* We could notice __i386__ or __s390x__ and reduce the barriers depending
on the host.  But if you want performance, you use the normal backend.
diff --git a/tcg/tci.c b/tcg/tci.c
index 4f81cbb904..c4f0a7e82d 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -49,49 +49,6 @@ static uint64_t tci_uint64(uint32_t high, uint32_t low)
 return ((uint64_t)high << 32) + low;
 }
 
-/* Read constant byte from bytecode. */
-static uint8_t tci_read_b(const uint8_t **tb_ptr)
-{
-return *(tb_ptr[0]++);
-}
-
-/* Read register number from bytecode. */
-static TCGReg tci_read_r(const uint8_t **tb_ptr)
-{
-uint8_t regno = tci_read_b(tb_ptr);
-tci_assert(regno < TCG_TARGET_NB_REGS);
-return regno;
-}
-
-/* Read constant (native size) from bytecode. */
-static tcg_target_ulong tci_read_i(const uint8_t **tb_ptr)
-{
-tcg_target_ulong value = *(const tcg_target_ulong *)(*tb_ptr);
-*tb_ptr += sizeof(value);
-return value;
-}
-
-/* Read unsigned constant (32 bit) from bytecode. */
-static uint32_t tci_read_i32(const uint8_t **tb_ptr)
-{
-uint32_t value = *(const uint32_t *)(*tb_ptr);
-*tb_ptr += sizeof(value);
-return value;
-}
-
-/* Read signed constant (32 bit) from bytecode. */
-static int32_t tci_read_s32(const uint8_t **tb_ptr)
-{
-int32_t value = *(const int32_t *)(*tb_ptr);
-*tb_ptr += sizeof(value);
-return value;
-}
-
-static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
-{
-return tci_read_i(tb_ptr);
-}
-
 /*
  * Load sets of arguments all at once.  The naming convention is:
 *   tci_args_<arguments>
@@ -106,209 +63,128 @@ static tcg_target_ulong tci_read_label(const uint8_t 
**tb_ptr)
  *   s = signed ldst offset
  */
 
-static void check_size(const uint8_t *start, const uint8_t **tb_ptr)
+static void tci_args_l(uint32_t insn, const void *tb_ptr, void **l0)
 {
-const uint8_t *old_code_ptr = start - 2;
-uint8_t op_size = old_code_ptr[1];
-tci_assert(*tb_ptr == old_code_ptr + op_size);
+int diff = sextract32(insn, 12, 20);
+*l0 = diff ? (void *)tb_ptr + diff : NULL;
 }
 
-static void tci_args_l(const uint8_t **tb_ptr, void **l0)
+static void tci_args_nl(uint32_t insn, const void *tb_ptr,
+uint8_t *n0, void **l1)
 {
-const uint8_t *start = *tb_ptr;
-
-*l0 = (void *)tci_read_label(tb_ptr);
-
-check_size(start, tb_ptr);
+*n0 = extract32(insn

[PATCH v2 93/93] tcg/tci: Implement add2, sub2

2021-02-03 Thread Richard Henderson
We already had the 32-bit versions for a 32-bit host; expand this
to 64-bit hosts as well.  The 64-bit opcodes are new.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h |  8 
 tcg/tci.c| 40 ++--
 tcg/tci/tcg-target.c.inc | 15 ---
 3 files changed, 38 insertions(+), 25 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 71a44bbfb0..515b3c7a56 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -121,11 +121,11 @@
 #define TCG_TARGET_HAS_rot_i64  1
 #define TCG_TARGET_HAS_movcond_i64  1
 #define TCG_TARGET_HAS_muls2_i641
-#define TCG_TARGET_HAS_add2_i32 0
-#define TCG_TARGET_HAS_sub2_i32 0
+#define TCG_TARGET_HAS_add2_i32 1
+#define TCG_TARGET_HAS_sub2_i32 1
 #define TCG_TARGET_HAS_mulu2_i321
-#define TCG_TARGET_HAS_add2_i64 0
-#define TCG_TARGET_HAS_sub2_i64 0
+#define TCG_TARGET_HAS_add2_i64 1
+#define TCG_TARGET_HAS_sub2_i64 1
 #define TCG_TARGET_HAS_mulu2_i641
 #define TCG_TARGET_HAS_muluh_i640
 #define TCG_TARGET_HAS_mulsh_i640
diff --git a/tcg/tci.c b/tcg/tci.c
index 5d83b2d957..ee16142f48 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -189,7 +189,6 @@ static void tci_args_rrrrrc(uint32_t insn, TCGReg *r0, TCGReg *r1,
 *c5 = extract32(insn, 28, 4);
 }
 
-#if TCG_TARGET_REG_BITS == 32
 static void tci_args_rrrrrr(uint32_t insn, TCGReg *r0, TCGReg *r1,
 TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGReg *r5)
 {
@@ -200,7 +199,6 @@ static void tci_args_rrrrrr(uint32_t insn, TCGReg *r0, TCGReg *r1,
 *r4 = extract32(insn, 24, 4);
 *r5 = extract32(insn, 28, 4);
 }
-#endif
 
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 {
@@ -368,17 +366,14 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 for (;;) {
 uint32_t insn;
 TCGOpcode opc;
-TCGReg r0, r1, r2, r3, r4;
+TCGReg r0, r1, r2, r3, r4, r5;
 tcg_target_ulong t1;
 TCGCond condition;
 target_ulong taddr;
 uint8_t pos, len;
 uint32_t tmp32;
 uint64_t tmp64;
-#if TCG_TARGET_REG_BITS == 32
-TCGReg r5;
 uint64_t T1, T2;
-#endif
 TCGMemOpIdx oi;
 int32_t ofs;
 void *ptr;
@@ -665,20 +660,22 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tb_ptr = ptr;
 }
 break;
-#if TCG_TARGET_REG_BITS == 32
+#if TCG_TARGET_REG_BITS == 32 || TCG_TARGET_HAS_add2_i32
 case INDEX_op_add2_i32:
 tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
 T1 = tci_uint64(regs[r3], regs[r2]);
 T2 = tci_uint64(regs[r5], regs[r4]);
 tci_write_reg64(regs, r1, r0, T1 + T2);
 break;
+#endif
+#if TCG_TARGET_REG_BITS == 32 || TCG_TARGET_HAS_sub2_i32
 case INDEX_op_sub2_i32:
 tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
 T1 = tci_uint64(regs[r3], regs[r2]);
 T2 = tci_uint64(regs[r5], regs[r4]);
 tci_write_reg64(regs, r1, r0, T1 - T2);
 break;
-#endif /* TCG_TARGET_REG_BITS == 32 */
+#endif
 #if TCG_TARGET_HAS_mulu2_i32
 case INDEX_op_mulu2_i32:
 tci_args_rrrr(insn, &r0, &r1, &r2, &r3);
@@ -808,6 +805,24 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 muls64(®s[r0], ®s[r1], regs[r2], regs[r3]);
 break;
 #endif
+#if TCG_TARGET_HAS_add2_i64
+case INDEX_op_add2_i64:
+tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
+T1 = regs[r2] + regs[r4];
+T2 = regs[r3] + regs[r5] + (T1 < regs[r2]);
+regs[r0] = T1;
+regs[r1] = T2;
+break;
+#endif
+#if TCG_TARGET_HAS_sub2_i64
+case INDEX_op_sub2_i64:
+tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
+T1 = regs[r2] - regs[r4];
+T2 = regs[r3] - regs[r5] - (regs[r2] < regs[r4]);
+regs[r0] = T1;
+regs[r1] = T2;
+break;
+#endif
 
 /* Shift/rotate operations (64 bit). */
 
@@ -1124,10 +1139,7 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
 const char *op_name;
 uint32_t insn;
 TCGOpcode op;
-TCGReg r0, r1, r2, r3, r4;
-#if TCG_TARGET_REG_BITS == 32
-TCGReg r5;
-#endif
+TCGReg r0, r1, r2, r3, r4, r5;
 tcg_target_ulong i1;
 int32_t s2;
 TCGCond c;
@@ -1325,15 +1337,15 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
str_r(r2), str_r(r3));
 break;
 
-#if TCG_TARGET_REG_BITS == 32
 case INDEX_op_add2_i32:
+case INDEX_op_add2_i64:
 case INDEX_op_sub2_i32:
+case INDEX_op_sub2_i64:
 tci_args_rrrrrr(insn, &r0, &r1, &r2, &r3, &r4, &r5);
 info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s

[PATCH v2 89/93] tcg/tci: Implement andc, orc, eqv, nand, nor

2021-02-03 Thread Richard Henderson
These were already present in tcg-target.c.inc,
but not in the interpreter.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h | 20 ++--
 tcg/tci.c| 40 
 2 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index f53773a555..5945272a43 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -67,20 +67,20 @@
 #define TCG_TARGET_HAS_ext16s_i32   1
 #define TCG_TARGET_HAS_ext8u_i321
 #define TCG_TARGET_HAS_ext16u_i32   1
-#define TCG_TARGET_HAS_andc_i32 0
+#define TCG_TARGET_HAS_andc_i32 1
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_extract_i32  0
 #define TCG_TARGET_HAS_sextract_i32 0
 #define TCG_TARGET_HAS_extract2_i32 0
-#define TCG_TARGET_HAS_eqv_i32  0
-#define TCG_TARGET_HAS_nand_i32 0
-#define TCG_TARGET_HAS_nor_i32  0
+#define TCG_TARGET_HAS_eqv_i32  1
+#define TCG_TARGET_HAS_nand_i32 1
+#define TCG_TARGET_HAS_nor_i32  1
 #define TCG_TARGET_HAS_clz_i32  0
 #define TCG_TARGET_HAS_ctz_i32  0
 #define TCG_TARGET_HAS_ctpop_i320
 #define TCG_TARGET_HAS_neg_i32  1
 #define TCG_TARGET_HAS_not_i32  1
-#define TCG_TARGET_HAS_orc_i32  0
+#define TCG_TARGET_HAS_orc_i32  1
 #define TCG_TARGET_HAS_rot_i32  1
 #define TCG_TARGET_HAS_movcond_i32  1
 #define TCG_TARGET_HAS_muls2_i320
@@ -108,16 +108,16 @@
 #define TCG_TARGET_HAS_ext8u_i641
 #define TCG_TARGET_HAS_ext16u_i64   1
 #define TCG_TARGET_HAS_ext32u_i64   1
-#define TCG_TARGET_HAS_andc_i64 0
-#define TCG_TARGET_HAS_eqv_i64  0
-#define TCG_TARGET_HAS_nand_i64 0
-#define TCG_TARGET_HAS_nor_i64  0
+#define TCG_TARGET_HAS_andc_i64 1
+#define TCG_TARGET_HAS_eqv_i64  1
+#define TCG_TARGET_HAS_nand_i64 1
+#define TCG_TARGET_HAS_nor_i64  1
 #define TCG_TARGET_HAS_clz_i64  0
 #define TCG_TARGET_HAS_ctz_i64  0
 #define TCG_TARGET_HAS_ctpop_i640
 #define TCG_TARGET_HAS_neg_i64  1
 #define TCG_TARGET_HAS_not_i64  1
-#define TCG_TARGET_HAS_orc_i64  0
+#define TCG_TARGET_HAS_orc_i64  1
 #define TCG_TARGET_HAS_rot_i64  1
 #define TCG_TARGET_HAS_movcond_i64  1
 #define TCG_TARGET_HAS_muls2_i640
diff --git a/tcg/tci.c b/tcg/tci.c
index 2a39f8f5a0..9c17947e6b 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -540,6 +540,36 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_args_rrr(insn, &r0, &r1, &r2);
 regs[r0] = regs[r1] ^ regs[r2];
 break;
+#if TCG_TARGET_HAS_andc_i32 || TCG_TARGET_HAS_andc_i64
+CASE_32_64(andc)
+tci_args_rrr(insn, &r0, &r1, &r2);
+regs[r0] = regs[r1] & ~regs[r2];
+break;
+#endif
+#if TCG_TARGET_HAS_orc_i32 || TCG_TARGET_HAS_orc_i64
+CASE_32_64(orc)
+tci_args_rrr(insn, &r0, &r1, &r2);
+regs[r0] = regs[r1] | ~regs[r2];
+break;
+#endif
+#if TCG_TARGET_HAS_eqv_i32 || TCG_TARGET_HAS_eqv_i64
+CASE_32_64(eqv)
+tci_args_rrr(insn, &r0, &r1, &r2);
+regs[r0] = ~(regs[r1] ^ regs[r2]);
+break;
+#endif
+#if TCG_TARGET_HAS_nand_i32 || TCG_TARGET_HAS_nand_i64
+CASE_32_64(nand)
+tci_args_rrr(insn, &r0, &r1, &r2);
+regs[r0] = ~(regs[r1] & regs[r2]);
+break;
+#endif
+#if TCG_TARGET_HAS_nor_i32 || TCG_TARGET_HAS_nor_i64
+CASE_32_64(nor)
+tci_args_rrr(insn, &r0, &r1, &r2);
+regs[r0] = ~(regs[r1] | regs[r2]);
+break;
+#endif
 
 /* Arithmetic operations (32 bit). */
 
@@ -1130,6 +1160,16 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
 case INDEX_op_or_i64:
 case INDEX_op_xor_i32:
 case INDEX_op_xor_i64:
+case INDEX_op_andc_i32:
+case INDEX_op_andc_i64:
+case INDEX_op_orc_i32:
+case INDEX_op_orc_i64:
+case INDEX_op_eqv_i32:
+case INDEX_op_eqv_i64:
+case INDEX_op_nand_i32:
+case INDEX_op_nand_i64:
+case INDEX_op_nor_i32:
+case INDEX_op_nor_i64:
 case INDEX_op_div_i32:
 case INDEX_op_div_i64:
 case INDEX_op_rem_i32:
-- 
2.25.1




[PATCH v2 85/93] tcg/tci: Remove tci_write_reg

2021-02-03 Thread Richard Henderson
Inline it into its one caller, tci_write_reg64.
Drop the asserts that are redundant with tcg_read_r.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index e7268b13e1..4f81cbb904 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -36,20 +36,11 @@
 
 __thread uintptr_t tci_tb_ptr;
 
-static void
-tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
-{
-tci_assert(index < TCG_TARGET_NB_REGS);
-tci_assert(index != TCG_AREG0);
-tci_assert(index != TCG_REG_CALL_STACK);
-regs[index] = value;
-}
-
 static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
 uint32_t low_index, uint64_t value)
 {
-tci_write_reg(regs, low_index, value);
-tci_write_reg(regs, high_index, value >> 32);
+regs[low_index] = value;
+regs[high_index] = value >> 32;
 }
 
 /* Create a 64 bit value from two 32 bit values. */
-- 
2.25.1




[PATCH v2 83/93] tcg/tci: Reserve r13 for a temporary

2021-02-03 Thread Richard Henderson
We're about to adjust the offset range on host memory ops,
and the format of branches.  Both will require a temporary.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h | 1 +
 tcg/tci/tcg-target.c.inc | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 4df10e2e83..1558a6e44e 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -155,6 +155,7 @@ typedef enum {
 TCG_REG_R14,
 TCG_REG_R15,
 
+TCG_REG_TMP = TCG_REG_R13,
 TCG_AREG0 = TCG_REG_R14,
 TCG_REG_CALL_STACK = TCG_REG_R15,
 } TCGReg;
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index c2d2bd24d7..b29e75425d 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -829,6 +829,7 @@ static void tcg_target_init(TCGContext *s)
 MAKE_64BIT_MASK(TCG_REG_R0, 64 / TCG_TARGET_REG_BITS);
 
 s->reserved_regs = 0;
+tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
 
 /* The call arguments come first, followed by the temp storage. */
-- 
2.25.1




[PATCH v2 82/93] tcg/tci: Split out tcg_out_op_r[iI]

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 50 
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index e4a5872b2a..c2d2bd24d7 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -324,6 +324,31 @@ static void tcg_out_op_np(TCGContext *s, TCGOpcode op,
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_ri(TCGContext *s, TCGOpcode op, TCGReg r0, int32_t i1)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out32(s, i1);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
+#if TCG_TARGET_REG_BITS == 64
+static void tcg_out_op_rI(TCGContext *s, TCGOpcode op,
+  TCGReg r0, uint64_t i1)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out64(s, i1);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+#endif
+
 static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
 {
 uint8_t *old_code_ptr = s->code_ptr;
@@ -550,25 +575,20 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, 
TCGReg ret, TCGReg arg)
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type,
- TCGReg t0, tcg_target_long arg)
+ TCGReg ret, tcg_target_long arg)
 {
-uint8_t *old_code_ptr = s->code_ptr;
-uint32_t arg32 = arg;
-if (type == TCG_TYPE_I32 || arg == arg32) {
-tcg_out_op_t(s, INDEX_op_tci_movi_i32);
-tcg_out_r(s, t0);
-tcg_out32(s, arg32);
-} else {
-tcg_debug_assert(type == TCG_TYPE_I64);
+switch (type) {
+case TCG_TYPE_I32:
+tcg_out_op_ri(s, INDEX_op_tci_movi_i32, ret, arg);
+break;
 #if TCG_TARGET_REG_BITS == 64
-tcg_out_op_t(s, INDEX_op_tci_movi_i64);
-tcg_out_r(s, t0);
-tcg_out64(s, arg);
-#else
-TODO();
+case TCG_TYPE_I64:
+tcg_out_op_rI(s, INDEX_op_tci_movi_i64, ret, arg);
+break;
 #endif
+default:
+g_assert_not_reached();
 }
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
 static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
-- 
2.25.1




[PATCH v2 81/93] tcg/tci: Split out tcg_out_op_np

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index eeafec6d44..e4a5872b2a 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -312,6 +312,18 @@ static void tcg_out_op_v(TCGContext *s, TCGOpcode op)
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_np(TCGContext *s, TCGOpcode op,
+  uint8_t n0, const void *p1)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out8(s, n0);
+tcg_out_i(s, (uintptr_t)p1);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
 {
 uint8_t *old_code_ptr = s->code_ptr;
@@ -561,7 +573,6 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 
 static void tcg_out_call(TCGContext *s, const tcg_insn_unit *arg)
 {
-uint8_t *old_code_ptr = s->code_ptr;
 const TCGHelperInfo *info;
 uint8_t which;
 
@@ -574,11 +585,8 @@ static void tcg_out_call(TCGContext *s, const 
tcg_insn_unit *arg)
 tcg_debug_assert(info->cif->rtype->size == 8);
 which = 2;
 }
-tcg_out_op_t(s, INDEX_op_call);
-tcg_out8(s, which);
-tcg_out_i(s, (uintptr_t)info);
 
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_np(s, INDEX_op_call, which, info);
 }
 
 #if TCG_TARGET_REG_BITS == 64
-- 
2.25.1




[PATCH v2 75/93] tcg/tci: Split out tcg_out_op_rrcl

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 6c743a8fbd..8cc63124d4 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -341,6 +341,20 @@ static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrcl(TCGContext *s, TCGOpcode op,
+TCGReg r0, TCGReg r1, TCGCond c2, TCGLabel *l3)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_out8(s, c2);
+tci_out_label(s, l3);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
 TCGReg r0, TCGReg r1, TCGReg r2, TCGCond c3)
 {
@@ -565,12 +579,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 break;
 
 CASE_32_64(brcond)
-tcg_out_op_t(s, opc);
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_out8(s, args[2]);   /* condition */
-tci_out_label(s, arg_label(args[3]));
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_rrcl(s, opc, args[0], args[1], args[2], arg_label(args[3]));
 break;
 
 CASE_32_64(neg)  /* Optional (TCG_TARGET_HAS_neg_*). */
-- 
2.25.1




[PATCH v2 79/93] tcg/tci: Split out tcg_out_op_{rrm,rrrm,rrrrm}

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 70 ++--
 1 file changed, 53 insertions(+), 17 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index fb4aacaca3..f93772f01f 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -314,6 +314,19 @@ static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, 
TCGReg r0, TCGReg r1)
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrm(TCGContext *s, TCGOpcode op,
+   TCGReg r0, TCGReg r1, TCGArg m2)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_out32(s, m2);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrr(TCGContext *s, TCGOpcode op,
TCGReg r0, TCGReg r1, TCGReg r2)
 {
@@ -369,6 +382,20 @@ static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrm(TCGContext *s, TCGOpcode op,
+TCGReg r0, TCGReg r1, TCGReg r2, TCGArg m3)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_out_r(s, r2);
+tcg_out32(s, m3);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
  TCGReg r1, TCGReg r2, uint8_t b3, uint8_t b4)
 {
@@ -384,6 +411,21 @@ static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, 
TCGReg r0,
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrrm(TCGContext *s, TCGOpcode op, TCGReg r0,
+ TCGReg r1, TCGReg r2, TCGReg r3, TCGArg m4)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_out_r(s, r2);
+tcg_out_r(s, r3);
+tcg_out32(s, m4);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 #if TCG_TARGET_REG_BITS == 32
 static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
 TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3)
@@ -663,29 +705,23 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, 
const TCGArg *args,
 
 case INDEX_op_qemu_ld_i32:
 case INDEX_op_qemu_st_i32:
-tcg_out_op_t(s, opc);
-tcg_out_r(s, *args++);
-tcg_out_r(s, *args++);
-if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-tcg_out_r(s, *args++);
+if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+tcg_out_op_rrm(s, opc, args[0], args[1], args[2]);
+} else {
+tcg_out_op_rrrm(s, opc, args[0], args[1], args[2], args[3]);
 }
-tcg_out32(s, *args++);
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 
 case INDEX_op_qemu_ld_i64:
 case INDEX_op_qemu_st_i64:
-tcg_out_op_t(s, opc);
-tcg_out_r(s, *args++);
-if (TCG_TARGET_REG_BITS == 32) {
-tcg_out_r(s, *args++);
+if (TCG_TARGET_REG_BITS == 64) {
+tcg_out_op_rrm(s, opc, args[0], args[1], args[2]);
+} else if (TARGET_LONG_BITS <= TCG_TARGET_REG_BITS) {
+tcg_out_op_rrrm(s, opc, args[0], args[1], args[2], args[3]);
+} else {
+tcg_out_op_rrrrm(s, opc, args[0], args[1],
+ args[2], args[3], args[4]);
 }
-tcg_out_r(s, *args++);
-if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-tcg_out_r(s, *args++);
-}
-tcg_out32(s, *args++);
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 
 case INDEX_op_mb:
-- 
2.25.1




[PATCH v2 84/93] tcg/tci: Emit setcond before brcond

2021-02-03 Thread Richard Henderson
The encoding planned for tci does not have enough room for
brcond2, with 4 registers and a condition as input as well
as the label.  Resolve the condition into TCG_REG_TMP, and
relax brcond to one register plus a label, considering the
condition to always be reg != 0.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c| 68 ++--
 tcg/tci/tcg-target.c.inc | 52 +++---
 2 files changed, 35 insertions(+), 85 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index d27db9f720..e7268b13e1 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -141,6 +141,16 @@ static void tci_args_nl(const uint8_t **tb_ptr, uint8_t 
*n0, void **l1)
 check_size(start, tb_ptr);
 }
 
+static void tci_args_rl(const uint8_t **tb_ptr, TCGReg *r0, void **l1)
+{
+const uint8_t *start = *tb_ptr;
+
+*r0 = tci_read_r(tb_ptr);
+*l1 = (void *)tci_read_label(tb_ptr);
+
+check_size(start, tb_ptr);
+}
+
 static void tci_args_rr(const uint8_t **tb_ptr,
 TCGReg *r0, TCGReg *r1)
 {
@@ -212,19 +222,6 @@ static void tci_args_rrs(const uint8_t **tb_ptr,
 check_size(start, tb_ptr);
 }
 
-static void tci_args_rrcl(const uint8_t **tb_ptr,
-  TCGReg *r0, TCGReg *r1, TCGCond *c2, void **l3)
-{
-const uint8_t *start = *tb_ptr;
-
-*r0 = tci_read_r(tb_ptr);
-*r1 = tci_read_r(tb_ptr);
-*c2 = tci_read_b(tb_ptr);
-*l3 = (void *)tci_read_label(tb_ptr);
-
-check_size(start, tb_ptr);
-}
-
 static void tci_args_rrrc(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
 {
@@ -293,21 +290,6 @@ static void tci_args_rrrr(const uint8_t **tb_ptr,
 check_size(start, tb_ptr);
 }
 
-static void tci_args_rrrrcl(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
-TCGReg *r2, TCGReg *r3, TCGCond *c4, void **l5)
-{
-const uint8_t *start = *tb_ptr;
-
-*r0 = tci_read_r(tb_ptr);
-*r1 = tci_read_r(tb_ptr);
-*r2 = tci_read_r(tb_ptr);
-*r3 = tci_read_r(tb_ptr);
-*c4 = tci_read_b(tb_ptr);
-*l5 = (void *)tci_read_label(tb_ptr);
-
-check_size(start, tb_ptr);
-}
-
 static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
 TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
 {
@@ -723,8 +705,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 break;
 #endif
 case INDEX_op_brcond_i32:
-tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
-if (tci_compare32(regs[r0], regs[r1], condition)) {
+tci_args_rl(&tb_ptr, &r0, &ptr);
+if ((uint32_t)regs[r0]) {
 tb_ptr = ptr;
 }
 break;
@@ -741,15 +723,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 T2 = tci_uint64(regs[r5], regs[r4]);
 tci_write_reg64(regs, r1, r0, T1 - T2);
 break;
-case INDEX_op_brcond2_i32:
-tci_args_rrrrcl(&tb_ptr, &r0, &r1, &r2, &r3, &condition, &ptr);
-T1 = tci_uint64(regs[r1], regs[r0]);
-T2 = tci_uint64(regs[r3], regs[r2]);
-if (tci_compare64(T1, T2, condition)) {
-tb_ptr = ptr;
-continue;
-}
-break;
 case INDEX_op_mulu2_i32:
 tci_args_rrrr(&tb_ptr, &r0, &r1, &r2, &r3);
 tci_write_reg64(regs, r1, r0, (uint64_t)regs[r2] * regs[r3]);
@@ -877,8 +850,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 break;
 #endif
 case INDEX_op_brcond_i64:
-tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
-if (tci_compare64(regs[r0], regs[r1], condition)) {
+tci_args_rl(&tb_ptr, &r0, &ptr);
+if (regs[r0]) {
 tb_ptr = ptr;
 }
 break;
@@ -1188,9 +1161,9 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
 
 case INDEX_op_brcond_i32:
 case INDEX_op_brcond_i64:
-tci_args_rrcl(&tb_ptr, &r0, &r1, &c, &ptr);
-info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%p",
-   op_name, str_r(r0), str_r(r1), str_c(c), ptr);
+tci_args_rl(&tb_ptr, &r0, &ptr);
+info->fprintf_func(info->stream, "%-12s  %s,0,ne,%p",
+   op_name, str_r(r0), ptr);
 break;
 
 case INDEX_op_setcond_i32:
@@ -1315,13 +1288,6 @@ int print_insn_tci(bfd_vma addr, disassemble_info *info)
str_r(r3), str_r(r4), str_c(c));
 break;
 
-case INDEX_op_brcond2_i32:
-tci_args_rrrrcl(&tb_ptr, &r0, &r1, &r2, &r3, &c, &ptr);
-info->fprintf_func(info->stream, "%-12s  %s,%s,%s,%s,%s,%p",
-   op_name, str_r(r0), str_r(r1),
-   str_r(r2), str_r(r3), str_c(c), ptr);
-break;
-
 case INDEX_op_mulu2_i32:
 tci_args_rrrr(&tb_ptr, &r0, &r1,

[PATCH v2 80/93] tcg/tci: Split out tcg_out_op_v

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index f93772f01f..eeafec6d44 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -303,6 +303,15 @@ static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void 
*p0)
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_v(TCGContext *s, TCGOpcode op)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
 {
 uint8_t *old_code_ptr = s->code_ptr;
@@ -587,8 +596,6 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit 
*arg)
 static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
const int *const_args)
 {
-uint8_t *old_code_ptr = s->code_ptr;
-
 switch (opc) {
 case INDEX_op_exit_tb:
 tcg_out_op_p(s, opc, (void *)args[0]);
@@ -725,8 +732,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 break;
 
 case INDEX_op_mb:
-tcg_out_op_t(s, opc);
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_v(s, opc);
 break;
 
 case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
-- 
2.25.1




[PATCH v2 73/93] tcg/tci: Split out tcg_out_op_rrrrrc

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 5848779208..8eda159dde 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -355,6 +355,25 @@ static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+#if TCG_TARGET_REG_BITS == 32
+static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
+  TCGReg r0, TCGReg r1, TCGReg r2,
+  TCGReg r3, TCGReg r4, TCGCond c5)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_out_r(s, r2);
+tcg_out_r(s, r3);
+tcg_out_r(s, r4);
+tcg_out8(s, c5);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+#endif
+
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
intptr_t offset)
 {
@@ -473,15 +492,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_setcond2_i32:
-/* setcond2_i32 cond, t0, t1_low, t1_high, t2_low, t2_high */
-tcg_out_op_t(s, opc);
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_out_r(s, args[2]);
-tcg_out_r(s, args[3]);
-tcg_out_r(s, args[4]);
-tcg_out8(s, args[5]);   /* condition */
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_rrrrrc(s, opc, args[0], args[1], args[2],
+  args[3], args[4], args[5]);
 break;
 #endif
 
-- 
2.25.1




[PATCH v2 78/93] tcg/tci: Split out tcg_out_op_rrrrcl

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index c2bbd85130..fb4aacaca3 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -399,6 +399,23 @@ static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrrcl(TCGContext *s, TCGOpcode op,
+  TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3,
+  TCGCond c4, TCGLabel *l5)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_out_r(s, r2);
+tcg_out_r(s, r3);
+tcg_out8(s, c4);
+tci_out_label(s, l5);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
   TCGReg r0, TCGReg r1, TCGReg r2,
   TCGReg r3, TCGReg r4, TCGCond c5)
@@ -636,14 +653,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
   args[3], args[4], args[5]);
 break;
 case INDEX_op_brcond2_i32:
-tcg_out_op_t(s, opc);
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_out_r(s, args[2]);
-tcg_out_r(s, args[3]);
-tcg_out8(s, args[4]);   /* condition */
-tci_out_label(s, arg_label(args[5]));
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_rrrrcl(s, opc, args[0], args[1], args[2],
+  args[3], args[4], arg_label(args[5]));
 break;
 case INDEX_op_mulu2_i32:
 tcg_out_op_rrrr(s, opc, args[0], args[1], args[2], args[3]);
-- 
2.25.1




[PATCH v2 72/93] tcg/tci: Split out tcg_out_op_rrrc

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 546424c2bd..5848779208 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -341,6 +341,20 @@ static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
+TCGReg r0, TCGReg r1, TCGReg r2, TCGCond c3)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_out_r(s, r2);
+tcg_out8(s, c3);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
intptr_t offset)
 {
@@ -454,12 +468,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 break;
 
 CASE_32_64(setcond)
-tcg_out_op_t(s, opc);
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_out_r(s, args[2]);
-tcg_out8(s, args[3]);   /* condition */
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_rrrc(s, opc, args[0], args[1], args[2], args[3]);
 break;
 
 #if TCG_TARGET_REG_BITS == 32
-- 
2.25.1




[PATCH v2 74/93] tcg/tci: Split out tcg_out_op_rrrbb

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 8eda159dde..6c743a8fbd 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -355,6 +355,21 @@ static void tcg_out_op_rrrc(TCGContext *s, TCGOpcode op,
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, TCGReg r0,
+ TCGReg r1, TCGReg r2, uint8_t b3, uint8_t b4)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_out_r(s, r2);
+tcg_out8(s, b3);
+tcg_out8(s, b4);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 #if TCG_TARGET_REG_BITS == 32
 static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
   TCGReg r0, TCGReg r1, TCGReg r2,
@@ -538,7 +553,6 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 break;
 
 CASE_32_64(deposit)  /* Optional (TCG_TARGET_HAS_deposit_*). */
-tcg_out_op_t(s, opc);
 {
 TCGArg pos = args[3], len = args[4];
 TCGArg max = opc == INDEX_op_deposit_i32 ? 32 : 64;
@@ -546,13 +560,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 tcg_debug_assert(pos < max);
 tcg_debug_assert(pos + len <= max);
 
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_out_r(s, args[2]);
-tcg_out8(s, pos);
-tcg_out8(s, len);
+tcg_out_op_rrrbb(s, opc, args[0], args[1], args[2], pos, len);
 }
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 
 CASE_32_64(brcond)
-- 
2.25.1




[PATCH v2 77/93] tcg/tci: Split out tcg_out_op_rrrr

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index f7595fbd65..c2bbd85130 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -385,6 +385,20 @@ static void tcg_out_op_rrrbb(TCGContext *s, TCGOpcode op, 
TCGReg r0,
 }
 
 #if TCG_TARGET_REG_BITS == 32
+static void tcg_out_op_rrrr(TCGContext *s, TCGOpcode op,
+TCGReg r0, TCGReg r1, TCGReg r2, TCGReg r3)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_out_r(s, r2);
+tcg_out_r(s, r3);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
   TCGReg r0, TCGReg r1, TCGReg r2,
   TCGReg r3, TCGReg r4, TCGCond c5)
@@ -632,12 +646,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 case INDEX_op_mulu2_i32:
-tcg_out_op_t(s, opc);
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_out_r(s, args[2]);
-tcg_out_r(s, args[3]);
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_rrrr(s, opc, args[0], args[1], args[2], args[3]);
 break;
 #endif
 
-- 
2.25.1




[PATCH v2 76/93] tcg/tci: Split out tcg_out_op_rrrrrr

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 8cc63124d4..f7595fbd65 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -401,6 +401,23 @@ static void tcg_out_op_rrrrrc(TCGContext *s, TCGOpcode op,
 
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
+
+static void tcg_out_op_rrrrrr(TCGContext *s, TCGOpcode op,
+  TCGReg r0, TCGReg r1, TCGReg r2,
+  TCGReg r3, TCGReg r4, TCGReg r5)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_out_r(s, r2);
+tcg_out_r(s, r3);
+tcg_out_r(s, r4);
+tcg_out_r(s, r5);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
 #endif
 
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
@@ -601,14 +618,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_add2_i32:
 case INDEX_op_sub2_i32:
-tcg_out_op_t(s, opc);
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_out_r(s, args[2]);
-tcg_out_r(s, args[3]);
-tcg_out_r(s, args[4]);
-tcg_out_r(s, args[5]);
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_rrrrrr(s, opc, args[0], args[1], args[2],
+  args[3], args[4], args[5]);
 break;
 case INDEX_op_brcond2_i32:
 tcg_out_op_t(s, opc);
-- 
2.25.1




[PATCH v2 69/93] tcg/tci: Split out tcg_out_op_p

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 1e3f2c4049..cb0cbbb8da 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -293,6 +293,16 @@ static void tcg_out_op_l(TCGContext *s, TCGOpcode op, 
TCGLabel *l0)
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void *p0)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_i(s, (uintptr_t)p0);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
TCGReg r0, TCGReg r1, intptr_t i2)
 {
@@ -403,17 +413,13 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, 
const TCGArg *args,
 
 switch (opc) {
 case INDEX_op_exit_tb:
-tcg_out_op_t(s, opc);
-tcg_out_i(s, args[0]);
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_p(s, opc, (void *)args[0]);
 break;
 
 case INDEX_op_goto_tb:
 tcg_debug_assert(s->tb_jmp_insn_offset == 0);
 /* indirect jump method. */
-tcg_out_op_t(s, opc);
-tcg_out_i(s, (uintptr_t)(s->tb_jmp_target_addr + args[0]));
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_p(s, opc, s->tb_jmp_target_addr + args[0]);
 set_jmp_reset_offset(s, args[0]);
 break;
 
-- 
2.25.1




[PATCH v2 63/93] tcg/tci: Use ffi for calls

2021-02-03 Thread Richard Henderson
This requires adjusting where arguments are stored.
Place them on the stack at left-aligned positions.
Adjust the stack frame to be at entirely positive offsets.

Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h|   1 +
 tcg/tci/tcg-target.h |   2 +-
 tcg/tcg.c|  72 -
 tcg/tci.c| 131 ++-
 tcg/tci/tcg-target.c.inc |  50 +++
 5 files changed, 143 insertions(+), 113 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 0f0695e90d..e5573a9877 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -53,6 +53,7 @@
 #define MAX_OPC_PARAM (4 + (MAX_OPC_PARAM_PER_ARG * MAX_OPC_PARAM_ARGS))
 
 #define CPU_TEMP_BUF_NLONGS 128
+#define TCG_STATIC_FRAME_SIZE  (CPU_TEMP_BUF_NLONGS * sizeof(long))
 
 /* Default target word size to pointer size.  */
 #ifndef TCG_TARGET_REG_BITS
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 52af6d8bc5..4df10e2e83 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -161,7 +161,7 @@ typedef enum {
 
 /* Used for function call generation. */
 #define TCG_TARGET_CALL_STACK_OFFSET    0
-#define TCG_TARGET_STACK_ALIGN  16
+#define TCG_TARGET_STACK_ALIGN  8
 
 #define HAVE_TCG_QEMU_TB_EXEC
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 6382112215..92aec0d238 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -208,6 +208,18 @@ static size_t tree_size;
 static TCGRegSet tcg_target_available_regs[TCG_TYPE_COUNT];
 static TCGRegSet tcg_target_call_clobber_regs;
 
+typedef struct TCGHelperInfo {
+void *func;
+#ifdef CONFIG_TCG_INTERPRETER
+ffi_cif *cif;
+#endif
+const char *name;
+unsigned flags;
+unsigned sizemask;
+} TCGHelperInfo;
+
+static GHashTable *helper_table;
+
 #if TCG_TARGET_INSN_UNIT_SIZE == 1
 static __attribute__((unused)) inline void tcg_out8(TCGContext *s, uint8_t v)
 {
@@ -1084,16 +1096,6 @@ void tcg_pool_reset(TCGContext *s)
 s->pool_current = NULL;
 }
 
-typedef struct TCGHelperInfo {
-void *func;
-#ifdef CONFIG_TCG_INTERPRETER
-ffi_cif *cif;
-#endif
-const char *name;
-unsigned flags;
-unsigned sizemask;
-} TCGHelperInfo;
-
 #include "exec/helper-proto.h"
 
 #ifdef CONFIG_TCG_INTERPRETER
@@ -1103,7 +1105,6 @@ typedef struct TCGHelperInfo {
 static const TCGHelperInfo all_helpers[] = {
 #include "exec/helper-tcg.h"
 };
-static GHashTable *helper_table;
 
 static int indirect_reg_alloc_order[ARRAY_SIZE(tcg_target_reg_alloc_order)];
 static void process_op_defs(TCGContext *s);
@@ -2081,25 +2082,38 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, 
TCGTemp **args)
 
 real_args = 0;
 for (i = 0; i < nargs; i++) {
-int is_64bit = sizemask & (1 << (i+1)*2);
-if (TCG_TARGET_REG_BITS < 64 && is_64bit) {
-#ifdef TCG_TARGET_CALL_ALIGN_ARGS
-/* some targets want aligned 64 bit args */
-if (real_args & 1) {
-op->args[pi++] = TCG_CALL_DUMMY_ARG;
-real_args++;
-}
+bool is_64bit = sizemask & (1 << (i+1)*2);
+bool want_align = false;
+
+#if defined(CONFIG_TCG_INTERPRETER)
+/*
+ * Align all arguments, so that they land in predictable places
+ * for passing off to ffi_call.
+ */
+want_align = true;
+#elif defined(TCG_TARGET_CALL_ALIGN_ARGS)
+/* Some targets want aligned 64 bit args */
+want_align = is_64bit;
 #endif
-   /* If stack grows up, then we will be placing successive
-  arguments at lower addresses, which means we need to
-  reverse the order compared to how we would normally
-  treat either big or little-endian.  For those arguments
-  that will wind up in registers, this still works for
-  HPPA (the only current STACK_GROWSUP target) since the
-  argument registers are *also* allocated in decreasing
-  order.  If another such target is added, this logic may
-  have to get more complicated to differentiate between
-  stack arguments and register arguments.  */
+
+if (TCG_TARGET_REG_BITS < 64 && want_align && (real_args & 1)) {
+op->args[pi++] = TCG_CALL_DUMMY_ARG;
+real_args++;
+}
+
+if (TCG_TARGET_REG_BITS < 64 && is_64bit) {
+   /*
+* If stack grows up, then we will be placing successive
+* arguments at lower addresses, which means we need to
+* reverse the order compared to how we would normally
+* treat either big or little-endian.  For those arguments
+* that will wind up in registers, this still works for
+* HPPA (the only current STACK_GROWSUP target) since the
+* argument registers are *also* allocated in decreasing
+* order.  If another such target is added, this logic may
+* have to get more complicated to differentiate betwe

[PATCH v2 71/93] tcg/tci: Split out tcg_out_op_rrr

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 272e3ca70b..546424c2bd 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -314,6 +314,19 @@ static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, 
TCGReg r0, TCGReg r1)
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rrr(TCGContext *s, TCGOpcode op,
+   TCGReg r0, TCGReg r1, TCGReg r2)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_out_r(s, r2);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
TCGReg r0, TCGReg r1, intptr_t i2)
 {
@@ -500,11 +513,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 CASE_32_64(divu) /* Optional (TCG_TARGET_HAS_div_*). */
 CASE_32_64(rem)  /* Optional (TCG_TARGET_HAS_div_*). */
 CASE_32_64(remu) /* Optional (TCG_TARGET_HAS_div_*). */
-tcg_out_op_t(s, opc);
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_out_r(s, args[2]);
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_rrr(s, opc, args[0], args[1], args[2]);
 break;
 
 CASE_32_64(deposit)  /* Optional (TCG_TARGET_HAS_deposit_*). */
-- 
2.25.1




[PATCH v2 68/93] tcg/tci: Split out tcg_out_op_l

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 707f801099..1e3f2c4049 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -283,6 +283,16 @@ static void stack_bounds_check(TCGReg base, target_long 
offset)
 }
 }
 
+static void tcg_out_op_l(TCGContext *s, TCGOpcode op, TCGLabel *l0)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tci_out_label(s, l0);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
TCGReg r0, TCGReg r1, intptr_t i2)
 {
@@ -408,9 +418,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 break;
 
 case INDEX_op_br:
-tcg_out_op_t(s, opc);
-tci_out_label(s, arg_label(args[0]));
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_l(s, opc, arg_label(args[0]));
 break;
 
 CASE_32_64(setcond)
-- 
2.25.1




[PATCH v2 70/93] tcg/tci: Split out tcg_out_op_rr

2021-02-03 Thread Richard Henderson
At the same time, validate the type argument in tcg_out_mov.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index cb0cbbb8da..272e3ca70b 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -303,6 +303,17 @@ static void tcg_out_op_p(TCGContext *s, TCGOpcode op, void 
*p0)
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_op_rr(TCGContext *s, TCGOpcode op, TCGReg r0, TCGReg r1)
+{
+uint8_t *old_code_ptr = s->code_ptr;
+
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
+}
+
 static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
TCGReg r0, TCGReg r1, intptr_t i2)
 {
@@ -337,16 +348,18 @@ static void tcg_out_ld(TCGContext *s, TCGType type, 
TCGReg val, TCGReg base,
 
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
-uint8_t *old_code_ptr = s->code_ptr;
-tcg_debug_assert(ret != arg);
-#if TCG_TARGET_REG_BITS == 32
-tcg_out_op_t(s, INDEX_op_mov_i32);
-#else
-tcg_out_op_t(s, INDEX_op_mov_i64);
+switch (type) {
+case TCG_TYPE_I32:
+tcg_out_op_rr(s, INDEX_op_mov_i32, ret, arg);
+break;
+#if TCG_TARGET_REG_BITS == 64
+case TCG_TYPE_I64:
+tcg_out_op_rr(s, INDEX_op_mov_i64, ret, arg);
+break;
 #endif
-tcg_out_r(s, ret);
-tcg_out_r(s, arg);
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+default:
+g_assert_not_reached();
+}
 return true;
 }
 
@@ -534,10 +547,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 CASE_32_64(bswap16)  /* Optional (TCG_TARGET_HAS_bswap16_*). */
 CASE_32_64(bswap32)  /* Optional (TCG_TARGET_HAS_bswap32_*). */
 CASE_64(bswap64) /* Optional (TCG_TARGET_HAS_bswap64_i64). */
-tcg_out_op_t(s, opc);
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_rr(s, opc, args[0], args[1]);
 break;
 
 #if TCG_TARGET_REG_BITS == 32
-- 
2.25.1




[PATCH v2 59/93] tcg/tci: Hoist op_size checking into tci_args_*

2021-02-03 Thread Richard Henderson
This performs the size check while reading the arguments,
which means that we don't have to arrange for it to be
done after the operation; this tidies all of the branches.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 87 ++-
 1 file changed, 73 insertions(+), 14 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index a1846825ea..3dc89ed829 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -24,7 +24,7 @@
 #if defined(CONFIG_DEBUG_TCG)
 # define tci_assert(cond) assert(cond)
 #else
-# define tci_assert(cond) ((void)0)
+# define tci_assert(cond) ((void)(cond))
 #endif
 
 #include "qemu-common.h"
@@ -135,146 +135,217 @@ static tcg_target_ulong tci_read_label(const uint8_t 
**tb_ptr)
  *   s = signed ldst offset
  */
 
+static void check_size(const uint8_t *start, const uint8_t **tb_ptr)
+{
+const uint8_t *old_code_ptr = start - 2;
+uint8_t op_size = old_code_ptr[1];
+tci_assert(*tb_ptr == old_code_ptr + op_size);
+}
+
 static void tci_args_l(const uint8_t **tb_ptr, void **l0)
 {
+const uint8_t *start = *tb_ptr;
+
 *l0 = (void *)tci_read_label(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rr(const uint8_t **tb_ptr,
 TCGReg *r0, TCGReg *r1)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_ri(const uint8_t **tb_ptr,
 TCGReg *r0, tcg_target_ulong *i1)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *i1 = tci_read_i32(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 #if TCG_TARGET_REG_BITS == 64
 static void tci_args_rI(const uint8_t **tb_ptr,
 TCGReg *r0, tcg_target_ulong *i1)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *i1 = tci_read_i(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 #endif
 
 static void tci_args_rrm(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, TCGMemOpIdx *m2)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *m2 = tci_read_i32(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrr(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, TCGReg *r2)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrs(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, int32_t *i2)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *i2 = tci_read_s32(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrcl(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGCond *c2, void **l3)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *c2 = tci_read_b(tb_ptr);
 *l3 = (void *)tci_read_label(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrc(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
 *c3 = tci_read_b(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrm(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGMemOpIdx *m3)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
 *m3 = tci_read_i32(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
TCGReg *r2, uint8_t *i3, uint8_t *i4)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
 *i3 = tci_read_b(tb_ptr);
 *i4 = tci_read_b(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
static void tci_args_rrrrm(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
TCGReg *r2, TCGReg *r3, TCGMemOpIdx *m4)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
 *r3 = tci_read_r(tb_ptr);
 *m4 = tci_read_i32(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 #if TCG_TARGET_REG_BITS == 32
static void tci_args_rrrr(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
 {
+const uint8_t *start = *tb_ptr;
+
 *r0 = tci_read_r(tb_ptr);
 *r1 = tci_read_r(tb_ptr);
 *r2 = tci_read_r(tb_ptr);
 *r3 = tci_read_r(tb_ptr);
+
+check_size(start, tb_ptr);
 }
 
 static void tci_args_cl(const ui

[PATCH v2 61/93] tcg/tci: Implement the disassembler properly

2021-02-03 Thread Richard Henderson
Actually print arguments as opposed to simply the opcodes
and, uselessly, the argument counts.  Reuse all of the helpers
developed as part of the interpreter.

Signed-off-by: Richard Henderson 
---
 meson.build   |   2 +-
 include/tcg/tcg-opc.h |   2 -
 disas/tci.c   |  61 -
 tcg/tci.c | 283 ++
 4 files changed, 284 insertions(+), 64 deletions(-)
 delete mode 100644 disas/tci.c

diff --git a/meson.build b/meson.build
index 2d8b433ff0..475d8a94ea 100644
--- a/meson.build
+++ b/meson.build
@@ -1901,7 +1901,7 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
   'tcg/tcg-op.c',
   'tcg/tcg.c',
 ))
-specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('disas/tci.c', 
'tcg/tci.c'))
+specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
 
 subdir('backends')
 subdir('disas')
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index 900984c005..bbb0884af8 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -278,10 +278,8 @@ DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 #ifdef TCG_TARGET_INTERPRETER
 /* These opcodes are only for use between the tci generator and interpreter. */
 DEF(tci_movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
-#if TCG_TARGET_REG_BITS == 64
 DEF(tci_movi_i64, 1, 0, 1, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 #endif
-#endif
 
 #undef TLADDR_ARGS
 #undef DATA64_ARGS
diff --git a/disas/tci.c b/disas/tci.c
deleted file mode 100644
index f1d6c6b469..00
--- a/disas/tci.c
+++ /dev/null
@@ -1,61 +0,0 @@
-/*
- * Tiny Code Interpreter for QEMU - disassembler
- *
- * Copyright (c) 2011 Stefan Weil
- *
- * This program is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#include "qemu/osdep.h"
-#include "qemu-common.h"
-#include "disas/dis-asm.h"
-#include "tcg/tcg.h"
-
-/* Disassemble TCI bytecode. */
-int print_insn_tci(bfd_vma addr, disassemble_info *info)
-{
-int length;
-uint8_t byte;
-int status;
-TCGOpcode op;
-
-status = info->read_memory_func(addr, &byte, 1, info);
-if (status != 0) {
-info->memory_error_func(status, addr, info);
-return -1;
-}
-op = byte;
-
-addr++;
-status = info->read_memory_func(addr, &byte, 1, info);
-if (status != 0) {
-info->memory_error_func(status, addr, info);
-return -1;
-}
-length = byte;
-
-if (op >= tcg_op_defs_max) {
-info->fprintf_func(info->stream, "illegal opcode %d", op);
-} else {
-const TCGOpDef *def = &tcg_op_defs[op];
-int nb_oargs = def->nb_oargs;
-int nb_iargs = def->nb_iargs;
-int nb_cargs = def->nb_cargs;
-/* TODO: Improve disassembler output. */
-info->fprintf_func(info->stream, "%s\to=%d i=%d c=%d",
-   def->name, nb_oargs, nb_iargs, nb_cargs);
-}
-
-return length;
-}
diff --git a/tcg/tci.c b/tcg/tci.c
index 3dc89ed829..6843e837ae 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -1076,3 +1076,286 @@ uintptr_t QEMU_DISABLE_CFI 
tcg_qemu_tb_exec(CPUArchState *env,
 }
 }
 }
+
+/*
+ * Disassembler that matches the interpreter
+ */
+
+static const char *str_r(TCGReg r)
+{
+static const char regs[TCG_TARGET_NB_REGS][4] = {
+"r0", "r1", "r2",  "r3",  "r4",  "r5",  "r6",  "r7",
+"r8", "r9", "r10", "r11", "r12", "r13", "env", "sp"
+};
+
+QEMU_BUILD_BUG_ON(TCG_AREG0 != TCG_REG_R14);
+QEMU_BUILD_BUG_ON(TCG_REG_CALL_STACK != TCG_REG_R15);
+
+assert((unsigned)r < TCG_TARGET_NB_REGS);
+return regs[r];
+}
+
+static const char *str_c(TCGCond c)
+{
+static const char cond[16][8] = {
+[TCG_COND_NEVER] = "never",
+[TCG_COND_ALWAYS] = "always",
+[TCG_COND_EQ] = "eq",
+[TCG_COND_NE] = "ne",
+[TCG_COND_LT] = "lt",
+[TCG_COND_GE] = "ge",
+[TCG_COND_LE] = "le",
+[TCG_COND_GT] = "gt",
+[TCG_COND_LTU] = "ltu",
+[TCG_COND_GEU] = "geu",
+[TCG_COND_LEU] = "leu",
+[TCG_COND_GTU] = "gtu",
+};
+
+assert((unsigned)c < ARRAY_SIZE(cond));
+assert(cond[c][0] != 0);
+return cond[c];
+}
+
+/* Disassemble TCI bytecode. */
+int print_insn_tci(bfd_vma addr, disassemble_info *info)
+{
+uint8_t buf[256];
+int length, status;
+const TCGOpDef *def;
+const char *op_name;
+TCGOpcode op;
+TCGReg r0, r1, r2, r3;
+#if TCG

[PATCH v2 65/93] tcg/tci: Move call-return regs to end of tcg_target_reg_alloc_order

2021-02-03 Thread Richard Henderson
As the only call-clobbered regs for TCI, these should
receive the lowest priority.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 4dae09deda..53edc50a3b 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -170,8 +170,6 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 }
 
 static const int tcg_target_reg_alloc_order[] = {
-TCG_REG_R0,
-TCG_REG_R1,
 TCG_REG_R2,
 TCG_REG_R3,
 TCG_REG_R4,
@@ -186,6 +184,8 @@ static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_R13,
 TCG_REG_R14,
 TCG_REG_R15,
+TCG_REG_R1,
+TCG_REG_R0,
 };
 
 #if MAX_OPC_PARAM_IARGS != 6
-- 
2.25.1




[PATCH v2 67/93] tcg/tci: Split out tcg_out_op_rrs

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 84 +++-
 1 file changed, 39 insertions(+), 45 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 050d514853..707f801099 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -283,32 +283,38 @@ static void stack_bounds_check(TCGReg base, target_long 
offset)
 }
 }
 
-static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
-   intptr_t arg2)
+static void tcg_out_op_rrs(TCGContext *s, TCGOpcode op,
+   TCGReg r0, TCGReg r1, intptr_t i2)
 {
 uint8_t *old_code_ptr = s->code_ptr;
 
-stack_bounds_check(arg1, arg2);
-if (type == TCG_TYPE_I32) {
-tcg_out_op_t(s, INDEX_op_ld_i32);
-tcg_out_r(s, ret);
-tcg_out_r(s, arg1);
-tcg_out32(s, arg2);
-} else {
-tcg_debug_assert(type == TCG_TYPE_I64);
-#if TCG_TARGET_REG_BITS == 64
-tcg_out_op_t(s, INDEX_op_ld_i64);
-tcg_out_r(s, ret);
-tcg_out_r(s, arg1);
-tcg_debug_assert(arg2 == (int32_t)arg2);
-tcg_out32(s, arg2);
-#else
-TODO();
-#endif
-}
+tcg_out_op_t(s, op);
+tcg_out_r(s, r0);
+tcg_out_r(s, r1);
+tcg_debug_assert(i2 == (int32_t)i2);
+tcg_out32(s, i2);
+
 old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
+static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
+   intptr_t offset)
+{
+stack_bounds_check(base, offset);
+switch (type) {
+case TCG_TYPE_I32:
+tcg_out_op_rrs(s, INDEX_op_ld_i32, val, base, offset);
+break;
+#if TCG_TARGET_REG_BITS == 64
+case TCG_TYPE_I64:
+tcg_out_op_rrs(s, INDEX_op_ld_i64, val, base, offset);
+break;
+#endif
+default:
+g_assert_not_reached();
+}
+}
+
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
 uint8_t *old_code_ptr = s->code_ptr;
@@ -444,12 +450,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 CASE_64(st32)
 CASE_64(st)
 stack_bounds_check(args[1], args[2]);
-tcg_out_op_t(s, opc);
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_debug_assert(args[2] == (int32_t)args[2]);
-tcg_out32(s, args[2]);
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
+tcg_out_op_rrs(s, opc, args[0], args[1], args[2]);
 break;
 
 CASE_32_64(add)
@@ -597,29 +598,22 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, 
const TCGArg *args,
 }
 }
 
-static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
-   intptr_t arg2)
+static void tcg_out_st(TCGContext *s, TCGType type, TCGReg val, TCGReg base,
+   intptr_t offset)
 {
-uint8_t *old_code_ptr = s->code_ptr;
-
-stack_bounds_check(arg1, arg2);
-if (type == TCG_TYPE_I32) {
-tcg_out_op_t(s, INDEX_op_st_i32);
-tcg_out_r(s, arg);
-tcg_out_r(s, arg1);
-tcg_out32(s, arg2);
-} else {
-tcg_debug_assert(type == TCG_TYPE_I64);
+stack_bounds_check(base, offset);
+switch (type) {
+case TCG_TYPE_I32:
+tcg_out_op_rrs(s, INDEX_op_st_i32, val, base, offset);
+break;
 #if TCG_TARGET_REG_BITS == 64
-tcg_out_op_t(s, INDEX_op_st_i64);
-tcg_out_r(s, arg);
-tcg_out_r(s, arg1);
-tcg_out32(s, arg2);
-#else
-TODO();
+case TCG_TYPE_I64:
+tcg_out_op_rrs(s, INDEX_op_st_i64, val, base, offset);
+break;
 #endif
+default:
+g_assert_not_reached();
 }
-old_code_ptr[1] = s->code_ptr - old_code_ptr;
 }
 
 static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
-- 
2.25.1




[PATCH v2 58/93] tcg/tci: Split out tci_args_{rrm,rrrm,rrrrm}

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 147 ++
 1 file changed, 81 insertions(+), 66 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index ddc138359b..a1846825ea 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -66,22 +66,18 @@ tci_write_reg(tcg_target_ulong *regs, TCGReg index, 
tcg_target_ulong value)
 regs[index] = value;
 }
 
-#if TCG_TARGET_REG_BITS == 32
 static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
 uint32_t low_index, uint64_t value)
 {
 tci_write_reg(regs, low_index, value);
 tci_write_reg(regs, high_index, value >> 32);
 }
-#endif
 
-#if TCG_TARGET_REG_BITS == 32
 /* Create a 64 bit value from two 32 bit values. */
 static uint64_t tci_uint64(uint32_t high, uint32_t low)
 {
 return ((uint64_t)high << 32) + low;
 }
-#endif
 
 /* Read constant byte from bytecode. */
 static uint8_t tci_read_b(const uint8_t **tb_ptr)
@@ -121,43 +117,6 @@ static int32_t tci_read_s32(const uint8_t **tb_ptr)
 return value;
 }
 
-/* Read indexed register (native size) from bytecode. */
-static tcg_target_ulong
-tci_read_rval(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-tcg_target_ulong value = tci_read_reg(regs, **tb_ptr);
-*tb_ptr += 1;
-return value;
-}
-
-#if TCG_TARGET_REG_BITS == 32
-/* Read two indexed registers (2 * 32 bit) from bytecode. */
-static uint64_t tci_read_r64(const tcg_target_ulong *regs,
- const uint8_t **tb_ptr)
-{
-uint32_t low = tci_read_rval(regs, tb_ptr);
-return tci_uint64(tci_read_rval(regs, tb_ptr), low);
-}
-#elif TCG_TARGET_REG_BITS == 64
-/* Read indexed register (64 bit) from bytecode. */
-static uint64_t tci_read_r64(const tcg_target_ulong *regs,
- const uint8_t **tb_ptr)
-{
-return tci_read_rval(regs, tb_ptr);
-}
-#endif
-
-/* Read indexed register(s) with target address from bytecode. */
-static target_ulong
-tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-target_ulong taddr = tci_read_rval(regs, tb_ptr);
-#if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
-taddr += (uint64_t)tci_read_rval(regs, tb_ptr) << 32;
-#endif
-return taddr;
-}
-
 static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
 {
 return tci_read_i(tb_ptr);
@@ -171,6 +130,7 @@ static tcg_target_ulong tci_read_label(const uint8_t 
**tb_ptr)
  *   b = immediate (bit position)
  *   i = immediate (uint32_t)
  *   I = immediate (tcg_target_ulong)
+ *   m = immediate (TCGMemOpIdx)
  *   r = register
  *   s = signed ldst offset
  */
@@ -203,6 +163,14 @@ static void tci_args_rI(const uint8_t **tb_ptr,
 }
 #endif
 
+static void tci_args_rrm(const uint8_t **tb_ptr,
+ TCGReg *r0, TCGReg *r1, TCGMemOpIdx *m2)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*m2 = tci_read_i32(tb_ptr);
+}
+
 static void tci_args_rrr(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, TCGReg *r2)
 {
@@ -237,6 +205,15 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 *c3 = tci_read_b(tb_ptr);
 }
 
+static void tci_args_rrrm(const uint8_t **tb_ptr,
+  TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGMemOpIdx *m3)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*m3 = tci_read_i32(tb_ptr);
+}
+
 static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
TCGReg *r2, uint8_t *i3, uint8_t *i4)
 {
@@ -247,6 +224,16 @@ static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg 
*r0, TCGReg *r1,
 *i4 = tci_read_b(tb_ptr);
 }
 
+static void tci_args_rrrrm(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+   TCGReg *r2, TCGReg *r3, TCGMemOpIdx *m4)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*r3 = tci_read_r(tb_ptr);
+*m4 = tci_read_i32(tb_ptr);
+}
+
 #if TCG_TARGET_REG_BITS == 32
static void tci_args_rrrr(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
@@ -457,8 +444,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 uint8_t op_size = tb_ptr[1];
 const uint8_t *old_code_ptr = tb_ptr;
 #endif
-TCGReg r0, r1, r2;
-tcg_target_ulong t0;
+TCGReg r0, r1, r2, r3;
 tcg_target_ulong t1;
 TCGCond condition;
 target_ulong taddr;
@@ -466,7 +452,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 uint32_t tmp32;
 uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
-TCGReg r3, r4, r5;
+TCGReg r4, r5;
 uint64_t T1, T2;
 #endif
 TCGMemOpIdx oi;
@@ -853,9 +839,13 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 continue;
 
 case INDEX_op_qemu_ld_i32:
-t0 = *tb_ptr++;
-taddr = tci_read_ulong(r

[PATCH v2 56/93] tcg/tci: Clean up deposit operations

2021-02-03 Thread Richard Henderson
Use the correct set of asserts during code generation.
We do not require the first input to overlap the output;
the existing interpreter already supported that.

Split out tci_args_rrrbb in the translator.
Use the deposit32/64 functions rather than inline expansion.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target-con-set.h |  1 -
 tcg/tci.c| 33 -
 tcg/tci/tcg-target.c.inc | 24 ++--
 3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/tcg/tci/tcg-target-con-set.h b/tcg/tci/tcg-target-con-set.h
index f51b7bcb13..316730f32c 100644
--- a/tcg/tci/tcg-target-con-set.h
+++ b/tcg/tci/tcg-target-con-set.h
@@ -13,7 +13,6 @@ C_O0_I2(r, r)
 C_O0_I3(r, r, r)
 C_O0_I4(r, r, r, r)
 C_O1_I1(r, r)
-C_O1_I2(r, 0, r)
 C_O1_I2(r, r, r)
 C_O1_I4(r, r, r, r, r)
 C_O2_I1(r, r, r)
diff --git a/tcg/tci.c b/tcg/tci.c
index cb24295cd9..e10ccfc344 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -168,6 +168,7 @@ static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
  *   tci_args_
  * where arguments is a sequence of
  *
+ *   b = immediate (bit position)
  *   i = immediate (uint32_t)
  *   I = immediate (tcg_target_ulong)
  *   r = register
@@ -236,6 +237,16 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 *c3 = tci_read_b(tb_ptr);
 }
 
+static void tci_args_rrrbb(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+   TCGReg *r2, uint8_t *i3, uint8_t *i4)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*i3 = tci_read_b(tb_ptr);
+*i4 = tci_read_b(tb_ptr);
+}
+
 #if TCG_TARGET_REG_BITS == 32
 static void tci_args_rrrr(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
@@ -449,11 +460,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 TCGReg r0, r1, r2;
 tcg_target_ulong t0;
 tcg_target_ulong t1;
-tcg_target_ulong t2;
 TCGCond condition;
 target_ulong taddr;
-uint8_t tmp8;
-uint16_t tmp16;
+uint8_t pos, len;
 uint32_t tmp32;
 uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
@@ -644,13 +653,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
 #if TCG_TARGET_HAS_deposit_i32
 case INDEX_op_deposit_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tmp16 = *tb_ptr++;
-tmp8 = *tb_ptr++;
-tmp32 = (((1 << tmp8) - 1) << tmp16);
-tci_write_reg(regs, t0, (t1 & ~tmp32) | ((t2 << tmp16) & tmp32));
+tci_args_rrrbb(&tb_ptr, &r0, &r1, &r2, &pos, &len);
+regs[r0] = deposit32(regs[r1], pos, len, regs[r2]);
 break;
 #endif
 case INDEX_op_brcond_i32:
@@ -806,13 +810,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #endif
 #if TCG_TARGET_HAS_deposit_i64
 case INDEX_op_deposit_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tmp16 = *tb_ptr++;
-tmp8 = *tb_ptr++;
-tmp64 = (((1ULL << tmp8) - 1) << tmp16);
-tci_write_reg(regs, t0, (t1 & ~tmp64) | ((t2 << tmp16) & tmp64));
+tci_args_rrrbb(&tb_ptr, &r0, &r1, &r2, &pos, &len);
+regs[r0] = deposit64(regs[r1], pos, len, regs[r2]);
 break;
 #endif
 case INDEX_op_brcond_i64:
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 2c64b4f617..640407b4a8 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -126,11 +126,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 case INDEX_op_rotr_i64:
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
-return C_O1_I2(r, r, r);
-
 case INDEX_op_deposit_i32:
 case INDEX_op_deposit_i64:
-return C_O1_I2(r, 0, r);
+return C_O1_I2(r, r, r);
 
 case INDEX_op_brcond_i32:
 case INDEX_op_brcond_i64:
@@ -480,13 +478,19 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 break;
 
 CASE_32_64(deposit)  /* Optional (TCG_TARGET_HAS_deposit_*). */
-tcg_out_r(s, args[0]);
-tcg_out_r(s, args[1]);
-tcg_out_r(s, args[2]);
-tcg_debug_assert(args[3] <= UINT8_MAX);
-tcg_out8(s, args[3]);
-tcg_debug_assert(args[4] <= UINT8_MAX);
-tcg_out8(s, args[4]);
+{
+TCGArg pos = args[3], len = args[4];
+TCGArg max = opc == INDEX_op_deposit_i32 ? 32 : 64;
+
+tcg_debug_assert(pos < max);
+tcg_debug_assert(pos + len <= max);
+
+tcg_out_r(s, args[0]);
+tcg_out_r(s, args[1]);
+tcg_out_r(s, args[2]);
+tcg_out8(s, pos);
+tcg_out8(s, len);
+}
 break;
 
 

[PATCH v2 55/93] tcg/tci: Split out tci_args_rrrr

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 84d77855ee..cb24295cd9 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -237,6 +237,15 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 }
 
 #if TCG_TARGET_REG_BITS == 32
+static void tci_args_rrrr(const uint8_t **tb_ptr,
+  TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGReg *r3)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*r3 = tci_read_r(tb_ptr);
+}
+
 static void tci_args_rrrrcl(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
 TCGReg *r2, TCGReg *r3, TCGCond *c4, void **l5)
 {
@@ -676,11 +685,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 }
 break;
 case INDEX_op_mulu2_i32:
-t0 = *tb_ptr++;
-t1 = *tb_ptr++;
-t2 = tci_read_rval(regs, &tb_ptr);
-tmp64 = (uint32_t)tci_read_rval(regs, &tb_ptr);
-tci_write_reg64(regs, t1, t0, (uint32_t)t2 * tmp64);
+tci_args_rrrr(&tb_ptr, &r0, &r1, &r2, &r3);
+tci_write_reg64(regs, r1, r0, (uint64_t)regs[r2] * regs[r3]);
 break;
 #endif /* TCG_TARGET_REG_BITS == 32 */
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
-- 
2.25.1




[PATCH v2 66/93] tcg/tci: Push opcode emit into each case

2021-02-03 Thread Richard Henderson
We're about to split out bytecode output into helpers, but
we can't do that one opcode at a time while the tcg_out_op_t
emission stays outside of the switch.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 35 ---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 53edc50a3b..050d514853 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -385,40 +385,48 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 {
 uint8_t *old_code_ptr = s->code_ptr;
 
-tcg_out_op_t(s, opc);
-
 switch (opc) {
 case INDEX_op_exit_tb:
+tcg_out_op_t(s, opc);
 tcg_out_i(s, args[0]);
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 
 case INDEX_op_goto_tb:
 tcg_debug_assert(s->tb_jmp_insn_offset == 0);
 /* indirect jump method. */
+tcg_out_op_t(s, opc);
 tcg_out_i(s, (uintptr_t)(s->tb_jmp_target_addr + args[0]));
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
 set_jmp_reset_offset(s, args[0]);
 break;
 
 case INDEX_op_br:
+tcg_out_op_t(s, opc);
 tci_out_label(s, arg_label(args[0]));
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 
 CASE_32_64(setcond)
+tcg_out_op_t(s, opc);
 tcg_out_r(s, args[0]);
 tcg_out_r(s, args[1]);
 tcg_out_r(s, args[2]);
 tcg_out8(s, args[3]);   /* condition */
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_setcond2_i32:
 /* setcond2_i32 cond, t0, t1_low, t1_high, t2_low, t2_high */
+tcg_out_op_t(s, opc);
 tcg_out_r(s, args[0]);
 tcg_out_r(s, args[1]);
 tcg_out_r(s, args[2]);
 tcg_out_r(s, args[3]);
 tcg_out_r(s, args[4]);
 tcg_out8(s, args[5]);   /* condition */
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 #endif
 
@@ -436,10 +444,12 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 CASE_64(st32)
 CASE_64(st)
 stack_bounds_check(args[1], args[2]);
+tcg_out_op_t(s, opc);
 tcg_out_r(s, args[0]);
 tcg_out_r(s, args[1]);
 tcg_debug_assert(args[2] == (int32_t)args[2]);
 tcg_out32(s, args[2]);
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 
 CASE_32_64(add)
@@ -462,12 +472,15 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 CASE_32_64(divu) /* Optional (TCG_TARGET_HAS_div_*). */
 CASE_32_64(rem)  /* Optional (TCG_TARGET_HAS_div_*). */
 CASE_32_64(remu) /* Optional (TCG_TARGET_HAS_div_*). */
+tcg_out_op_t(s, opc);
 tcg_out_r(s, args[0]);
 tcg_out_r(s, args[1]);
 tcg_out_r(s, args[2]);
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 
 CASE_32_64(deposit)  /* Optional (TCG_TARGET_HAS_deposit_*). */
+tcg_out_op_t(s, opc);
 {
 TCGArg pos = args[3], len = args[4];
 TCGArg max = opc == INDEX_op_deposit_i32 ? 32 : 64;
@@ -481,13 +494,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 tcg_out8(s, pos);
 tcg_out8(s, len);
 }
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 
 CASE_32_64(brcond)
+tcg_out_op_t(s, opc);
 tcg_out_r(s, args[0]);
 tcg_out_r(s, args[1]);
 tcg_out8(s, args[2]);   /* condition */
 tci_out_label(s, arg_label(args[3]));
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 
 CASE_32_64(neg)  /* Optional (TCG_TARGET_HAS_neg_*). */
@@ -503,48 +519,59 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 CASE_32_64(bswap16)  /* Optional (TCG_TARGET_HAS_bswap16_*). */
 CASE_32_64(bswap32)  /* Optional (TCG_TARGET_HAS_bswap32_*). */
 CASE_64(bswap64) /* Optional (TCG_TARGET_HAS_bswap64_i64). */
+tcg_out_op_t(s, opc);
 tcg_out_r(s, args[0]);
 tcg_out_r(s, args[1]);
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_add2_i32:
 case INDEX_op_sub2_i32:
+tcg_out_op_t(s, opc);
 tcg_out_r(s, args[0]);
 tcg_out_r(s, args[1]);
 tcg_out_r(s, args[2]);
 tcg_out_r(s, args[3]);
 tcg_out_r(s, args[4]);
 tcg_out_r(s, args[5]);
+old_code_ptr[1] = s->code_ptr - old_code_ptr;
 break;
 case INDEX_op_brcond2_i32:
+tcg_out_op_t(s, opc);
 tcg_out_r(s, args[0]);
 tcg_out_r(s, args[1]);
 tcg_out_r(s, args[2]);
 tcg_out_r(s, args[3]);
 tcg_out8(s, args[4]);   /* condition */
 tci_out_label(s, arg_label(args[5]));
+old_

[PATCH v2 53/93] tcg/tci: Reuse tci_args_l for goto_tb

2021-02-03 Thread Richard Henderson
Convert to indirect jumps, as they are less complicated.
Then we just have a pointer to the slot at which the chain
target is stored, from which we read the next tb address.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h | 11 +++
 tcg/tci.c|  8 +++-
 tcg/tci/tcg-target.c.inc | 13 +++--
 3 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 9c0021a26f..9285c930a2 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -87,7 +87,7 @@
 #define TCG_TARGET_HAS_muluh_i320
 #define TCG_TARGET_HAS_mulsh_i320
 #define TCG_TARGET_HAS_goto_ptr 0
-#define TCG_TARGET_HAS_direct_jump  1
+#define TCG_TARGET_HAS_direct_jump  0
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #if TCG_TARGET_REG_BITS == 64
@@ -174,12 +174,7 @@ void tci_disas(uint8_t opc);
 
 #define TCG_TARGET_HAS_MEMORY_BSWAP 1
 
-static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
-uintptr_t jmp_rw, uintptr_t addr)
-{
-/* patch the branch destination */
-qatomic_set((int32_t *)jmp_rw, addr - (jmp_rx + 4));
-/* no need to flush icache explicitly */
-}
+/* not defined -- call should be eliminated at compile time */
+void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t, uintptr_t);
 
 #endif /* TCG_TARGET_H */
diff --git a/tcg/tci.c b/tcg/tci.c
index 57b6defe09..0301ee63a7 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -833,13 +833,11 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 return (uintptr_t)ptr;
 
 case INDEX_op_goto_tb:
-/* Jump address is aligned */
-tb_ptr = QEMU_ALIGN_PTR_UP(tb_ptr, 4);
-t0 = qatomic_read((int32_t *)tb_ptr);
-tb_ptr += sizeof(int32_t);
+tci_args_l(&tb_ptr, &ptr);
 tci_assert(tb_ptr == old_code_ptr + op_size);
-tb_ptr += (int32_t)t0;
+tb_ptr = *(void **)ptr;
 continue;
+
 case INDEX_op_qemu_ld_i32:
 t0 = *tb_ptr++;
 taddr = tci_read_ulong(regs, &tb_ptr);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index ff8040510f..2c64b4f617 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -405,16 +405,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 break;
 
 case INDEX_op_goto_tb:
-if (s->tb_jmp_insn_offset) {
-/* Direct jump method. */
-/* Align for atomic patching and thread safety */
-s->code_ptr = QEMU_ALIGN_PTR_UP(s->code_ptr, 4);
-s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
-tcg_out32(s, 0);
-} else {
-/* Indirect jump method. */
-TODO();
-}
+tcg_debug_assert(s->tb_jmp_insn_offset == 0);
+/* indirect jump method. */
+tcg_out_i(s, (uintptr_t)(s->tb_jmp_target_addr + args[0]));
 set_jmp_reset_offset(s, args[0]);
 break;
 
-- 
2.25.1




[PATCH v2 46/93] tcg/tci: Split out tci_args_rrrc

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 1736234bfd..86625061f1 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -207,6 +207,15 @@ static void tci_args_rrs(const uint8_t **tb_ptr,
 *i2 = tci_read_s32(tb_ptr);
 }
 
+static void tci_args_rrrc(const uint8_t **tb_ptr,
+  TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*c3 = tci_read_b(tb_ptr);
+}
+
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 {
 bool result = false;
@@ -430,11 +439,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tb_ptr = (uint8_t *)label;
 continue;
 case INDEX_op_setcond_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-condition = *tb_ptr++;
-tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
+tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &condition);
+regs[r0] = tci_compare32(regs[r1], regs[r2], condition);
 break;
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_setcond2_i32:
@@ -446,11 +452,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 #elif TCG_TARGET_REG_BITS == 64
 case INDEX_op_setcond_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-condition = *tb_ptr++;
-tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
+tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &condition);
+regs[r0] = tci_compare64(regs[r1], regs[r2], condition);
 break;
 #endif
 CASE_32_64(mov)
-- 
2.25.1




[PATCH v2 54/93] tcg/tci: Split out tci_args_rrrrrr

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 0301ee63a7..84d77855ee 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -258,6 +258,17 @@ static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
 *r4 = tci_read_r(tb_ptr);
 *c5 = tci_read_b(tb_ptr);
 }
+
+static void tci_args_rrrrrr(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGReg *r5)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*r3 = tci_read_r(tb_ptr);
+*r4 = tci_read_r(tb_ptr);
+*r5 = tci_read_r(tb_ptr);
+}
 #endif
 
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
@@ -437,7 +448,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 uint32_t tmp32;
 uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
-TCGReg r3, r4;
+TCGReg r3, r4, r5;
 uint64_t T1, T2;
 #endif
 TCGMemOpIdx oi;
@@ -643,18 +654,16 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_add2_i32:
-t0 = *tb_ptr++;
-t1 = *tb_ptr++;
-tmp64 = tci_read_r64(regs, &tb_ptr);
-tmp64 += tci_read_r64(regs, &tb_ptr);
-tci_write_reg64(regs, t1, t0, tmp64);
+tci_args_rrrrrr(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &r5);
+T1 = tci_uint64(regs[r3], regs[r2]);
+T2 = tci_uint64(regs[r5], regs[r4]);
+tci_write_reg64(regs, r1, r0, T1 + T2);
 break;
 case INDEX_op_sub2_i32:
-t0 = *tb_ptr++;
-t1 = *tb_ptr++;
-tmp64 = tci_read_r64(regs, &tb_ptr);
-tmp64 -= tci_read_r64(regs, &tb_ptr);
-tci_write_reg64(regs, t1, t0, tmp64);
+tci_args_rrrrrr(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &r5);
+T1 = tci_uint64(regs[r3], regs[r2]);
+T2 = tci_uint64(regs[r5], regs[r4]);
+tci_write_reg64(regs, r1, r0, T1 - T2);
 break;
 case INDEX_op_brcond2_i32:
 tci_args_rrrrcl(&tb_ptr, &r0, &r1, &r2, &r3, &condition, &ptr);
-- 
2.25.1




[PATCH v2 64/93] tcg/tci: Improve tcg_target_call_clobber_regs

2021-02-03 Thread Richard Henderson
The current setting is much too pessimistic.  Indicating only
the one or two registers that are actually assigned after a
call should avoid unnecessary movement between the register
array and the stack array.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.c.inc | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 8d75482546..4dae09deda 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -623,8 +623,14 @@ static void tcg_target_init(TCGContext *s)
 tcg_target_available_regs[TCG_TYPE_I32] = BIT(TCG_TARGET_NB_REGS) - 1;
 /* Registers available for 64 bit operations. */
 tcg_target_available_regs[TCG_TYPE_I64] = BIT(TCG_TARGET_NB_REGS) - 1;
-/* TODO: Which registers should be set here? */
-tcg_target_call_clobber_regs = BIT(TCG_TARGET_NB_REGS) - 1;
+/*
+ * The interpreter "registers" are in the local stack frame and
+ * cannot be clobbered by the called helper functions.  However,
+ * the interpreter assumes a 64-bit return value and assigns to
+ * the return value registers.
+ */
+tcg_target_call_clobber_regs =
+MAKE_64BIT_MASK(TCG_REG_R0, 64 / TCG_TARGET_REG_BITS);
 
 s->reserved_regs = 0;
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
-- 
2.25.1




[PATCH v2 52/93] tcg/tci: Reuse tci_args_l for exit_tb

2021-02-03 Thread Richard Henderson
Do not emit a uint64_t, but a tcg_target_ulong, aka uintptr_t.
This reduces the size of the constant on 32-bit hosts.
The assert for label != NULL has to be removed, because NULL
is a valid value for exit_tb.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c| 13 -
 tcg/tci/tcg-target.c.inc |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 92b13829c3..57b6defe09 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -160,9 +160,7 @@ tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 
 static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
 {
-tcg_target_ulong label = tci_read_i(tb_ptr);
-tci_assert(label != 0);
-return label;
+return tci_read_i(tb_ptr);
 }
 
 /*
@@ -417,7 +415,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tcg_target_ulong regs[TCG_TARGET_NB_REGS];
 long tcg_temps[CPU_TEMP_BUF_NLONGS];
 uintptr_t sp_value = (uintptr_t)(tcg_temps + CPU_TEMP_BUF_NLONGS);
-uintptr_t ret = 0;
 
 regs[TCG_AREG0] = (tcg_target_ulong)env;
 regs[TCG_REG_CALL_STACK] = sp_value;
@@ -832,9 +829,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 /* QEMU specific operations. */
 
 case INDEX_op_exit_tb:
-ret = *(uint64_t *)tb_ptr;
-goto exit;
-break;
+tci_args_l(&tb_ptr, &ptr);
+return (uintptr_t)ptr;
+
 case INDEX_op_goto_tb:
 /* Jump address is aligned */
 tb_ptr = QEMU_ALIGN_PTR_UP(tb_ptr, 4);
@@ -992,6 +989,4 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 }
 tci_assert(tb_ptr == old_code_ptr + op_size);
 }
-exit:
-return ret;
 }
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index c79f9c32d8..ff8040510f 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -401,7 +401,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
 switch (opc) {
 case INDEX_op_exit_tb:
-tcg_out64(s, args[0]);
+tcg_out_i(s, args[0]);
 break;
 
 case INDEX_op_goto_tb:
-- 
2.25.1




[PATCH v2 49/93] tcg/tci: Split out tci_args_rrcl and tci_args_rrrrcl

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 52 
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 692b95b5c2..1e2f78a9f9 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -212,6 +212,15 @@ static void tci_args_rrs(const uint8_t **tb_ptr,
 *i2 = tci_read_s32(tb_ptr);
 }
 
+static void tci_args_rrcl(const uint8_t **tb_ptr,
+  TCGReg *r0, TCGReg *r1, TCGCond *c2, void **l3)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*c2 = tci_read_b(tb_ptr);
+*l3 = (void *)tci_read_label(tb_ptr);
+}
+
 static void tci_args_rrrc(const uint8_t **tb_ptr,
   TCGReg *r0, TCGReg *r1, TCGReg *r2, TCGCond *c3)
 {
@@ -222,6 +231,17 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 }
 
 #if TCG_TARGET_REG_BITS == 32
+static void tci_args_rrrrcl(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+TCGReg *r2, TCGReg *r3, TCGCond *c4, void **l5)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*r3 = tci_read_r(tb_ptr);
+*c4 = tci_read_b(tb_ptr);
+*l5 = (void *)tci_read_label(tb_ptr);
+}
+
+static void tci_args_rrrrrc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
 TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
 {
@@ -405,7 +425,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tcg_target_ulong t0;
 tcg_target_ulong t1;
 tcg_target_ulong t2;
-tcg_target_ulong label;
 TCGCond condition;
 target_ulong taddr;
 uint8_t tmp8;
@@ -414,7 +433,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
 TCGReg r3, r4;
-uint64_t v64, T1, T2;
+uint64_t T1, T2;
 #endif
 TCGMemOpIdx oi;
 int32_t ofs;
@@ -611,13 +630,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 #endif
 case INDEX_op_brcond_i32:
-t0 = tci_read_rval(regs, &tb_ptr);
-t1 = tci_read_rval(regs, &tb_ptr);
-condition = *tb_ptr++;
-label = tci_read_label(&tb_ptr);
-if (tci_compare32(t0, t1, condition)) {
+tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
+if (tci_compare32(regs[r0], regs[r1], condition)) {
 tci_assert(tb_ptr == old_code_ptr + op_size);
-tb_ptr = (uint8_t *)label;
+tb_ptr = ptr;
 continue;
 }
 break;
@@ -637,13 +653,12 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tci_write_reg64(regs, t1, t0, tmp64);
 break;
 case INDEX_op_brcond2_i32:
-tmp64 = tci_read_r64(regs, &tb_ptr);
-v64 = tci_read_r64(regs, &tb_ptr);
-condition = *tb_ptr++;
-label = tci_read_label(&tb_ptr);
-if (tci_compare64(tmp64, v64, condition)) {
+tci_args_rrrrcl(&tb_ptr, &r0, &r1, &r2, &r3, &condition, &ptr);
+T1 = tci_uint64(regs[r1], regs[r0]);
+T2 = tci_uint64(regs[r3], regs[r2]);
+if (tci_compare64(T1, T2, condition)) {
 tci_assert(tb_ptr == old_code_ptr + op_size);
-tb_ptr = (uint8_t *)label;
+tb_ptr = ptr;
 continue;
 }
 break;
@@ -783,13 +798,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 break;
 #endif
 case INDEX_op_brcond_i64:
-t0 = tci_read_rval(regs, &tb_ptr);
-t1 = tci_read_rval(regs, &tb_ptr);
-condition = *tb_ptr++;
-label = tci_read_label(&tb_ptr);
-if (tci_compare64(t0, t1, condition)) {
+tci_args_rrcl(&tb_ptr, &r0, &r1, &condition, &ptr);
+if (tci_compare64(regs[r0], regs[r1], condition)) {
 tci_assert(tb_ptr == old_code_ptr + op_size);
-tb_ptr = (uint8_t *)label;
+tb_ptr = ptr;
 continue;
 }
 break;
-- 
2.25.1




[PATCH v2 51/93] tcg/tci: Reuse tci_args_l for calls.

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 5cc05fa554..92b13829c3 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -452,30 +452,30 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
 switch (opc) {
 case INDEX_op_call:
-t0 = tci_read_i(&tb_ptr);
+tci_args_l(&tb_ptr, &ptr);
 tci_tb_ptr = (uintptr_t)tb_ptr;
 #if TCG_TARGET_REG_BITS == 32
-tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
-  tci_read_reg(regs, TCG_REG_R1),
-  tci_read_reg(regs, TCG_REG_R2),
-  tci_read_reg(regs, TCG_REG_R3),
-  tci_read_reg(regs, TCG_REG_R4),
-  tci_read_reg(regs, TCG_REG_R5),
-  tci_read_reg(regs, TCG_REG_R6),
-  tci_read_reg(regs, TCG_REG_R7),
-  tci_read_reg(regs, TCG_REG_R8),
-  tci_read_reg(regs, TCG_REG_R9),
-  tci_read_reg(regs, TCG_REG_R10),
-  tci_read_reg(regs, TCG_REG_R11));
+tmp64 = ((helper_function)ptr)(tci_read_reg(regs, TCG_REG_R0),
+   tci_read_reg(regs, TCG_REG_R1),
+   tci_read_reg(regs, TCG_REG_R2),
+   tci_read_reg(regs, TCG_REG_R3),
+   tci_read_reg(regs, TCG_REG_R4),
+   tci_read_reg(regs, TCG_REG_R5),
+   tci_read_reg(regs, TCG_REG_R6),
+   tci_read_reg(regs, TCG_REG_R7),
+   tci_read_reg(regs, TCG_REG_R8),
+   tci_read_reg(regs, TCG_REG_R9),
+   tci_read_reg(regs, TCG_REG_R10),
+   tci_read_reg(regs, TCG_REG_R11));
 tci_write_reg(regs, TCG_REG_R0, tmp64);
 tci_write_reg(regs, TCG_REG_R1, tmp64 >> 32);
 #else
-tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
-  tci_read_reg(regs, TCG_REG_R1),
-  tci_read_reg(regs, TCG_REG_R2),
-  tci_read_reg(regs, TCG_REG_R3),
-  tci_read_reg(regs, TCG_REG_R4),
-  tci_read_reg(regs, TCG_REG_R5));
+tmp64 = ((helper_function)ptr)(tci_read_reg(regs, TCG_REG_R0),
+   tci_read_reg(regs, TCG_REG_R1),
+   tci_read_reg(regs, TCG_REG_R2),
+   tci_read_reg(regs, TCG_REG_R3),
+   tci_read_reg(regs, TCG_REG_R4),
+   tci_read_reg(regs, TCG_REG_R5));
 tci_write_reg(regs, TCG_REG_R0, tmp64);
 #endif
 break;
-- 
2.25.1




[PATCH v2 62/93] tcg: Build ffi data structures for helpers

2021-02-03 Thread Richard Henderson
We will shortly use libffi for tci, as that is the only
portable way of calling arbitrary functions.

Signed-off-by: Richard Henderson 
---
 meson.build|   9 +-
 include/exec/helper-ffi.h  | 115 +
 include/exec/helper-tcg.h  |  24 --
 target/hppa/helper.h   |   2 +
 target/i386/ops_sse_header.h   |   6 ++
 target/m68k/helper.h   |   1 +
 target/ppc/helper.h|   3 +
 tcg/tcg.c  |  20 +
 tests/docker/dockerfiles/fedora.docker |   1 +
 9 files changed, 172 insertions(+), 9 deletions(-)
 create mode 100644 include/exec/helper-ffi.h

diff --git a/meson.build b/meson.build
index 475d8a94ea..fc08f15a00 100644
--- a/meson.build
+++ b/meson.build
@@ -1901,7 +1901,14 @@ specific_ss.add(when: 'CONFIG_TCG', if_true: files(
   'tcg/tcg-op.c',
   'tcg/tcg.c',
 ))
-specific_ss.add(when: 'CONFIG_TCG_INTERPRETER', if_true: files('tcg/tci.c'))
+
+if get_option('tcg_interpreter')
+  libffi = dependency('libffi', version: '>=3.0',
+  static: enable_static, method: 'pkg-config',
+  required: true)
+  specific_ss.add(libffi)
+  specific_ss.add(files('tcg/tci.c'))
+endif
 
 subdir('backends')
 subdir('disas')
diff --git a/include/exec/helper-ffi.h b/include/exec/helper-ffi.h
new file mode 100644
index 0000000000..3af1065af3
--- /dev/null
+++ b/include/exec/helper-ffi.h
@@ -0,0 +1,115 @@
+/*
+ * Helper file for declaring TCG helper functions.
+ * This one defines data structures private to tcg.c.
+ */
+
+#ifndef HELPER_FFI_H
+#define HELPER_FFI_H 1
+
+#include "exec/helper-head.h"
+
+#define dh_ffitype_i32  &ffi_type_uint32
+#define dh_ffitype_s32  &ffi_type_sint32
+#define dh_ffitype_int  &ffi_type_sint
+#define dh_ffitype_i64  &ffi_type_uint64
+#define dh_ffitype_s64  &ffi_type_sint64
+#define dh_ffitype_f16  &ffi_type_uint32
+#define dh_ffitype_f32  &ffi_type_uint32
+#define dh_ffitype_f64  &ffi_type_uint64
+#ifdef TARGET_LONG_BITS
+# if TARGET_LONG_BITS == 32
+#  define dh_ffitype_tl &ffi_type_uint32
+# else
+#  define dh_ffitype_tl &ffi_type_uint64
+# endif
+#endif
+#define dh_ffitype_ptr  &ffi_type_pointer
+#define dh_ffitype_cptr &ffi_type_pointer
+#define dh_ffitype_void &ffi_type_void
+#define dh_ffitype_noreturn &ffi_type_void
+#define dh_ffitype_env  &ffi_type_pointer
+#define dh_ffitype(t) glue(dh_ffitype_, t)
+
+#define DEF_HELPER_FLAGS_0(NAME, FLAGS, ret)\
+static ffi_cif glue(cif_,NAME) = {  \
+.rtype = dh_ffitype(ret), .nargs = 0,   \
+};
+
+#define DEF_HELPER_FLAGS_1(NAME, FLAGS, ret, t1)\
+static ffi_type *glue(cif_args_,NAME)[1] = { dh_ffitype(t1) };  \
+static ffi_cif glue(cif_,NAME) = {  \
+.rtype = dh_ffitype(ret), .nargs = 1,   \
+.arg_types = glue(cif_args_,NAME),  \
+};
+
+#define DEF_HELPER_FLAGS_2(NAME, FLAGS, ret, t1, t2)\
+static ffi_type *glue(cif_args_,NAME)[2] = {\
+dh_ffitype(t1), dh_ffitype(t2)  \
+};  \
+static ffi_cif glue(cif_,NAME) = {  \
+.rtype = dh_ffitype(ret), .nargs = 2,   \
+.arg_types = glue(cif_args_,NAME),  \
+};
+
+#define DEF_HELPER_FLAGS_3(NAME, FLAGS, ret, t1, t2, t3)\
+static ffi_type *glue(cif_args_,NAME)[3] = {\
+dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3)  \
+};  \
+static ffi_cif glue(cif_,NAME) = {  \
+.rtype = dh_ffitype(ret), .nargs = 3,   \
+.arg_types = glue(cif_args_,NAME),  \
+};
+
+#define DEF_HELPER_FLAGS_4(NAME, FLAGS, ret, t1, t2, t3, t4)\
+static ffi_type *glue(cif_args_,NAME)[4] = {\
+dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3), dh_ffitype(t4)  \
+};  \
+static ffi_cif glue(cif_,NAME) = {  \
+.rtype = dh_ffitype(ret), .nargs = 4,   \
+.arg_types = glue(cif_args_,NAME),  \
+};
+
+#define DEF_HELPER_FLAGS_5(NAME, FLAGS, ret, t1, t2, t3, t4, t5)\
+static ffi_type *glue(cif_args_,NAME)[5] = {\
+dh_ffitype(t1), dh_ffitype(t2), dh_ffitype(t3), \
+dh_ffitype(t4), dh_ffitype(t5)  \
+};  \
+static ffi_cif glue(cif_,NAME) = {  \
+.rtype = dh_ffitype(ret), .nargs = 5,   \
+.arg_types = glue(cif_args_,N

[PATCH v2 45/93] tcg/tci: Split out tci_args_rrr

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 154 --
 1 file changed, 57 insertions(+), 97 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 0bc5294e8b..1736234bfd 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -191,6 +191,14 @@ static void tci_args_rr(const uint8_t **tb_ptr,
 *r1 = tci_read_r(tb_ptr);
 }
 
+static void tci_args_rrr(const uint8_t **tb_ptr,
+ TCGReg *r0, TCGReg *r1, TCGReg *r2)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+}
+
 static void tci_args_rrs(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, int32_t *i2)
 {
@@ -366,7 +374,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 uint8_t op_size = tb_ptr[1];
 const uint8_t *old_code_ptr = tb_ptr;
 #endif
-TCGReg r0, r1;
+TCGReg r0, r1, r2;
 tcg_target_ulong t0;
 tcg_target_ulong t1;
 tcg_target_ulong t2;
@@ -503,101 +511,71 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 /* Arithmetic operations (mixed 32/64 bit). */
 
 CASE_32_64(add)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 + t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] + regs[r2];
 break;
 CASE_32_64(sub)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 - t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] - regs[r2];
 break;
 CASE_32_64(mul)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 * t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] * regs[r2];
 break;
 CASE_32_64(and)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 & t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] & regs[r2];
 break;
 CASE_32_64(or)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 | t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] | regs[r2];
 break;
 CASE_32_64(xor)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1 ^ t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = regs[r1] ^ regs[r2];
 break;
 
 /* Arithmetic operations (32 bit). */
 
 case INDEX_op_div_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int32_t)t1 / (int32_t)t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = (int32_t)regs[r1] / (int32_t)regs[r2];
 break;
 case INDEX_op_divu_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint32_t)t1 / (uint32_t)t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = (uint32_t)regs[r1] / (uint32_t)regs[r2];
 break;
 case INDEX_op_rem_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int32_t)t1 % (int32_t)t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = (int32_t)regs[r1] % (int32_t)regs[r2];
 break;
 case INDEX_op_remu_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint32_t)t1 % (uint32_t)t2);
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = (uint32_t)regs[r1] % (uint32_t)regs[r2];
 break;
 
 /* Shift/rotate operations (32 bit). */
 
 case INDEX_op_shl_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint32_t)t1 << (t2 & 31));
+tci_args_rrr(&tb_ptr, &r0, &r1, &r2);
+regs[r0] = (uint32_t)regs[r1] << (regs[r2] & 31);
 break;
 case INDEX_op_shr_i32:
-t0 = *tb

[PATCH v2 47/93] tcg/tci: Split out tci_args_l

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 86625061f1..8bc9dd27b0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -184,6 +184,11 @@ static tcg_target_ulong tci_read_label(const uint8_t 
**tb_ptr)
  *   s = signed ldst offset
  */
 
+static void tci_args_l(const uint8_t **tb_ptr, void **l0)
+{
+*l0 = (void *)tci_read_label(tb_ptr);
+}
+
 static void tci_args_rr(const uint8_t **tb_ptr,
 TCGReg *r0, TCGReg *r1)
 {
@@ -434,9 +439,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 #endif
 break;
 case INDEX_op_br:
-label = tci_read_label(&tb_ptr);
+tci_args_l(&tb_ptr, &ptr);
 tci_assert(tb_ptr == old_code_ptr + op_size);
-tb_ptr = (uint8_t *)label;
+tb_ptr = ptr;
 continue;
 case INDEX_op_setcond_i32:
 tci_args_rrrc(&tb_ptr, &r0, &r1, &r2, &condition);
-- 
2.25.1




[PATCH v2 50/93] tcg/tci: Split out tci_args_ri and tci_args_rI

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 1e2f78a9f9..5cc05fa554 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -121,16 +121,6 @@ static int32_t tci_read_s32(const uint8_t **tb_ptr)
 return value;
 }
 
-#if TCG_TARGET_REG_BITS == 64
-/* Read constant (64 bit) from bytecode. */
-static uint64_t tci_read_i64(const uint8_t **tb_ptr)
-{
-uint64_t value = *(const uint64_t *)(*tb_ptr);
-*tb_ptr += sizeof(value);
-return value;
-}
-#endif
-
 /* Read indexed register (native size) from bytecode. */
 static tcg_target_ulong
 tci_read_rval(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
@@ -180,6 +170,8 @@ static tcg_target_ulong tci_read_label(const uint8_t 
**tb_ptr)
  *   tci_args_
  * where arguments is a sequence of
  *
+ *   i = immediate (uint32_t)
+ *   I = immediate (tcg_target_ulong)
  *   r = register
  *   s = signed ldst offset
  */
@@ -196,6 +188,22 @@ static void tci_args_rr(const uint8_t **tb_ptr,
 *r1 = tci_read_r(tb_ptr);
 }
 
+static void tci_args_ri(const uint8_t **tb_ptr,
+TCGReg *r0, tcg_target_ulong *i1)
+{
+*r0 = tci_read_r(tb_ptr);
+*i1 = tci_read_i32(tb_ptr);
+}
+
+#if TCG_TARGET_REG_BITS == 64
+static void tci_args_rI(const uint8_t **tb_ptr,
+TCGReg *r0, tcg_target_ulong *i1)
+{
+*r0 = tci_read_r(tb_ptr);
+*i1 = tci_read_i(tb_ptr);
+}
+#endif
+
 static void tci_args_rrr(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, TCGReg *r2)
 {
@@ -498,9 +506,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 regs[r0] = regs[r1];
 break;
 case INDEX_op_tci_movi_i32:
-t0 = *tb_ptr++;
-t1 = tci_read_i32(&tb_ptr);
-tci_write_reg(regs, t0, t1);
+tci_args_ri(&tb_ptr, &r0, &t1);
+regs[r0] = t1;
 break;
 
 /* Load/store operations (32 bit). */
@@ -720,9 +727,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 #endif
 #if TCG_TARGET_REG_BITS == 64
 case INDEX_op_tci_movi_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_i64(&tb_ptr);
-tci_write_reg(regs, t0, t1);
+tci_args_rI(&tb_ptr, &r0, &t1);
+regs[r0] = t1;
 break;
 
 /* Load/store operations (64 bit). */
-- 
2.25.1




[PATCH v2 60/93] tcg/tci: Remove tci_disas

2021-02-03 Thread Richard Henderson
This function is unused.  It's not even the disassembler,
which is print_insn_tci, located in disas/tci.c.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h |  2 --
 tcg/tci/tcg-target.c.inc | 10 --
 2 files changed, 12 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 9285c930a2..52af6d8bc5 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -163,8 +163,6 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET0
 #define TCG_TARGET_STACK_ALIGN  16
 
-void tci_disas(uint8_t opc);
-
 #define HAVE_TCG_QEMU_TB_EXEC
 
 /* We could notice __i386__ or __s390x__ and reduce the barriers depending
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 6c187a25cc..7fb3b04eaf 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -253,16 +253,6 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 return true;
 }
 
-#if defined(CONFIG_DEBUG_TCG_INTERPRETER)
-/* Show current bytecode. Used by tcg interpreter. */
-void tci_disas(uint8_t opc)
-{
-const TCGOpDef *def = &tcg_op_defs[opc];
-fprintf(stderr, "TCG %s %u, %u, %u\n",
-def->name, def->nb_oargs, def->nb_iargs, def->nb_cargs);
-}
-#endif
-
 /* Write value (native size). */
 static void tcg_out_i(TCGContext *s, tcg_target_ulong v)
 {
-- 
2.25.1




[PATCH v2 44/93] tcg/tci: Split out tci_args_rr

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 67 +--
 1 file changed, 31 insertions(+), 36 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index be298ae39d..0bc5294e8b 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -184,6 +184,13 @@ static tcg_target_ulong tci_read_label(const uint8_t 
**tb_ptr)
  *   s = signed ldst offset
  */
 
+static void tci_args_rr(const uint8_t **tb_ptr,
+TCGReg *r0, TCGReg *r1)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+}
+
 static void tci_args_rrs(const uint8_t **tb_ptr,
  TCGReg *r0, TCGReg *r1, int32_t *i2)
 {
@@ -439,9 +446,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 break;
 #endif
 CASE_32_64(mov)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = regs[r1];
 break;
 case INDEX_op_tci_movi_i32:
 t0 = *tb_ptr++;
@@ -652,58 +658,50 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 #endif /* TCG_TARGET_REG_BITS == 32 */
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
 CASE_32_64(ext8s)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int8_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (int8_t)regs[r1];
 break;
 #endif
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
 CASE_32_64(ext16s)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int16_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (int16_t)regs[r1];
 break;
 #endif
 #if TCG_TARGET_HAS_ext8u_i32 || TCG_TARGET_HAS_ext8u_i64
 CASE_32_64(ext8u)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint8_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (uint8_t)regs[r1];
 break;
 #endif
 #if TCG_TARGET_HAS_ext16u_i32 || TCG_TARGET_HAS_ext16u_i64
 CASE_32_64(ext16u)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint16_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (uint16_t)regs[r1];
 break;
 #endif
 #if TCG_TARGET_HAS_bswap16_i32 || TCG_TARGET_HAS_bswap16_i64
 CASE_32_64(bswap16)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, bswap16(t1));
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = bswap16(regs[r1]);
 break;
 #endif
 #if TCG_TARGET_HAS_bswap32_i32 || TCG_TARGET_HAS_bswap32_i64
 CASE_32_64(bswap32)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, bswap32(t1));
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = bswap32(regs[r1]);
 break;
 #endif
 #if TCG_TARGET_HAS_not_i32 || TCG_TARGET_HAS_not_i64
 CASE_32_64(not)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, ~t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = ~regs[r1];
 break;
 #endif
 #if TCG_TARGET_HAS_neg_i32 || TCG_TARGET_HAS_neg_i64
 CASE_32_64(neg)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, -t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = -regs[r1];
 break;
 #endif
 #if TCG_TARGET_REG_BITS == 64
@@ -816,21 +814,18 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 break;
 case INDEX_op_ext32s_i64:
 case INDEX_op_ext_i32_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int32_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (int32_t)regs[r1];
 break;
 case INDEX_op_ext32u_i64:
 case INDEX_op_extu_i32_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint32_t)t1);
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = (uint32_t)regs[r1];
 break;
 #if TCG_TARGET_HAS_bswap64_i64
 case INDEX_op_bswap64_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-tci_write_reg(regs, t0, bswap64(t1));
+tci_args_rr(&tb_ptr, &r0, &r1);
+regs[r0] = bswap64(regs[r1]);
 break;
 #endif
 #endif /* TCG_TARGET_REG_BITS == 64 */
-- 
2.25.1




[PATCH v2 36/93] tcg/tci: Reduce use of tci_read_r64

2021-02-03 Thread Richard Henderson
In all cases restricted to 64-bit hosts, tci_read_r is
identical.  We retain the 64-bit symbol only for the single
case of INDEX_op_qemu_st_i64.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 93 +--
 1 file changed, 42 insertions(+), 51 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 9c8395397a..0246e663a3 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -57,13 +57,6 @@ static tcg_target_ulong tci_read_reg(const tcg_target_ulong 
*regs, TCGReg index)
 return regs[index];
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static uint64_t tci_read_reg64(const tcg_target_ulong *regs, TCGReg index)
-{
-return tci_read_reg(regs, index);
-}
-#endif
-
 static void
 tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
 {
@@ -146,9 +139,7 @@ static uint64_t tci_read_r64(const tcg_target_ulong *regs,
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
  const uint8_t **tb_ptr)
 {
-uint64_t value = tci_read_reg64(regs, **tb_ptr);
-*tb_ptr += 1;
-return value;
+return tci_read_r(regs, tb_ptr);
 }
 #endif
 
@@ -407,8 +398,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 #elif TCG_TARGET_REG_BITS == 64
 case INDEX_op_setcond_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 condition = *tb_ptr++;
 tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
 break;
@@ -689,7 +680,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 #if TCG_TARGET_REG_BITS == 64
 case INDEX_op_mov_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1);
 break;
 case INDEX_op_tci_movi_i64:
@@ -713,7 +704,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_write_reg(regs, t0, *(uint64_t *)(t1 + t2));
 break;
 case INDEX_op_st_i64:
-t0 = tci_read_r64(regs, &tb_ptr);
+t0 = tci_read_r(regs, &tb_ptr);
 t1 = tci_read_r(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 *(uint64_t *)(t1 + t2) = t0;
@@ -723,62 +714,62 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 
 case INDEX_op_add_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 + t2);
 break;
 case INDEX_op_sub_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 - t2);
 break;
 case INDEX_op_mul_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1 * t2);
 break;
 case INDEX_op_div_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int64_t)t1 / (int64_t)t2);
 break;
 case INDEX_op_divu_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint64_t)t1 / (uint64_t)t2);
 break;
 case INDEX_op_rem_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int64_t)t1 % (int64_t)t2);
 break;
 case INDEX_op_remu_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint64_t)t1 % (uint64_t)t2);
 break;
 case INDEX_op_and_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_r64(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
+t2 = tci_read_r(regs, &tb_p

[PATCH v2 48/93] tcg/tci: Split out tci_args_rrrrrc

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 8bc9dd27b0..692b95b5c2 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -221,6 +221,19 @@ static void tci_args_rrrc(const uint8_t **tb_ptr,
 *c3 = tci_read_b(tb_ptr);
 }
 
+#if TCG_TARGET_REG_BITS == 32
+static void tci_args_rc(const uint8_t **tb_ptr, TCGReg *r0, TCGReg *r1,
+TCGReg *r2, TCGReg *r3, TCGReg *r4, TCGCond *c5)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*r2 = tci_read_r(tb_ptr);
+*r3 = tci_read_r(tb_ptr);
+*r4 = tci_read_r(tb_ptr);
+*c5 = tci_read_b(tb_ptr);
+}
+#endif
+
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 {
 bool result = false;
@@ -400,7 +413,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 uint32_t tmp32;
 uint64_t tmp64;
 #if TCG_TARGET_REG_BITS == 32
-uint64_t v64;
+TCGReg r3, r4;
+uint64_t v64, T1, T2;
 #endif
 TCGMemOpIdx oi;
 int32_t ofs;
@@ -449,11 +463,10 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 break;
 #if TCG_TARGET_REG_BITS == 32
 case INDEX_op_setcond2_i32:
-t0 = *tb_ptr++;
-tmp64 = tci_read_r64(regs, &tb_ptr);
-v64 = tci_read_r64(regs, &tb_ptr);
-condition = *tb_ptr++;
-tci_write_reg(regs, t0, tci_compare64(tmp64, v64, condition));
+tci_args_rc(&tb_ptr, &r0, &r1, &r2, &r3, &r4, &condition);
+T1 = tci_uint64(regs[r2], regs[r1]);
+T2 = tci_uint64(regs[r4], regs[r3]);
+regs[r0] = tci_compare64(T1, T2, condition);
 break;
 #elif TCG_TARGET_REG_BITS == 64
 case INDEX_op_setcond_i64:
-- 
2.25.1




[PATCH v2 57/93] tcg/tci: Reduce qemu_ld/st TCGMemOpIdx operand to 32-bits

2021-02-03 Thread Richard Henderson
We are currently using the "natural" size routine, which
reads 64 bits on a 64-bit host.  The TCGMemOpIdx operand
has only 11 bits, so we can safely reduce the read to 32 bits.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c| 8 
 tcg/tci/tcg-target.c.inc | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index e10ccfc344..ddc138359b 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -855,7 +855,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 case INDEX_op_qemu_ld_i32:
 t0 = *tb_ptr++;
 taddr = tci_read_ulong(regs, &tb_ptr);
-oi = tci_read_i(&tb_ptr);
+oi = tci_read_i32(&tb_ptr);
 switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
 case MO_UB:
 tmp32 = qemu_ld_ub;
@@ -892,7 +892,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 t1 = *tb_ptr++;
 }
 taddr = tci_read_ulong(regs, &tb_ptr);
-oi = tci_read_i(&tb_ptr);
+oi = tci_read_i32(&tb_ptr);
 switch (get_memop(oi) & (MO_BSWAP | MO_SSIZE)) {
 case MO_UB:
 tmp64 = qemu_ld_ub;
@@ -941,7 +941,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 case INDEX_op_qemu_st_i32:
 t0 = tci_read_rval(regs, &tb_ptr);
 taddr = tci_read_ulong(regs, &tb_ptr);
-oi = tci_read_i(&tb_ptr);
+oi = tci_read_i32(&tb_ptr);
 switch (get_memop(oi) & (MO_BSWAP | MO_SIZE)) {
 case MO_UB:
 qemu_st_b(t0);
@@ -965,7 +965,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 case INDEX_op_qemu_st_i64:
 tmp64 = tci_read_r64(regs, &tb_ptr);
 taddr = tci_read_ulong(regs, &tb_ptr);
-oi = tci_read_i(&tb_ptr);
+oi = tci_read_i32(&tb_ptr);
 switch (get_memop(oi) & (MO_BSWAP | MO_SIZE)) {
 case MO_UB:
 qemu_st_b(tmp64);
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 640407b4a8..6c187a25cc 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -550,7 +550,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
 tcg_out_r(s, *args++);
 }
-tcg_out_i(s, *args++);
+tcg_out32(s, *args++);
 break;
 
 case INDEX_op_qemu_ld_i64:
@@ -563,7 +563,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const 
TCGArg *args,
 if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
 tcg_out_r(s, *args++);
 }
-tcg_out_i(s, *args++);
+tcg_out32(s, *args++);
 break;
 
 case INDEX_op_mb:
-- 
2.25.1




[PATCH v2 42/93] tcg/tci: Rename tci_read_r to tci_read_rval

2021-02-03 Thread Richard Henderson
In the next patches, we want to use tci_read_r to return
the raw register number.  So rename the existing function,
which returns the register value, to tci_read_rval.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 192 +++---
 1 file changed, 96 insertions(+), 96 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 225cb698e8..20aaaca959 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -119,7 +119,7 @@ static uint64_t tci_read_i64(const uint8_t **tb_ptr)
 
 /* Read indexed register (native size) from bytecode. */
 static tcg_target_ulong
-tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
+tci_read_rval(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 {
 tcg_target_ulong value = tci_read_reg(regs, **tb_ptr);
 *tb_ptr += 1;
@@ -131,15 +131,15 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t 
**tb_ptr)
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
  const uint8_t **tb_ptr)
 {
-uint32_t low = tci_read_r(regs, tb_ptr);
-return tci_uint64(tci_read_r(regs, tb_ptr), low);
+uint32_t low = tci_read_rval(regs, tb_ptr);
+return tci_uint64(tci_read_rval(regs, tb_ptr), low);
 }
 #elif TCG_TARGET_REG_BITS == 64
 /* Read indexed register (64 bit) from bytecode. */
 static uint64_t tci_read_r64(const tcg_target_ulong *regs,
  const uint8_t **tb_ptr)
 {
-return tci_read_r(regs, tb_ptr);
+return tci_read_rval(regs, tb_ptr);
 }
 #endif
 
@@ -147,9 +147,9 @@ static uint64_t tci_read_r64(const tcg_target_ulong *regs,
 static target_ulong
 tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 {
-target_ulong taddr = tci_read_r(regs, tb_ptr);
+target_ulong taddr = tci_read_rval(regs, tb_ptr);
 #if TARGET_LONG_BITS > TCG_TARGET_REG_BITS
-taddr += (uint64_t)tci_read_r(regs, tb_ptr) << 32;
+taddr += (uint64_t)tci_read_rval(regs, tb_ptr) << 32;
 #endif
 return taddr;
 }
@@ -382,8 +382,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 continue;
 case INDEX_op_setcond_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
+t2 = tci_read_rval(regs, &tb_ptr);
 condition = *tb_ptr++;
 tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
 break;
@@ -398,15 +398,15 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 #elif TCG_TARGET_REG_BITS == 64
 case INDEX_op_setcond_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-t2 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
+t2 = tci_read_rval(regs, &tb_ptr);
 condition = *tb_ptr++;
 tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
 break;
 #endif
 CASE_32_64(mov)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1);
 break;
 case INDEX_op_tci_movi_i32:
@@ -419,51 +419,51 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 
 CASE_32_64(ld8u)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 tci_write_reg(regs, t0, *(uint8_t *)(t1 + t2));
 break;
 CASE_32_64(ld8s)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 tci_write_reg(regs, t0, *(int8_t *)(t1 + t2));
 break;
 CASE_32_64(ld16u)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 tci_write_reg(regs, t0, *(uint16_t *)(t1 + t2));
 break;
 CASE_32_64(ld16s)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 tci_write_reg(regs, t0, *(int16_t *)(t1 + t2));
 break;
 case INDEX_op_ld_i32:
 CASE_64(ld32u)
 t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
 break;
 CASE_32_64(st8)
-t0 = tci_read_r(regs, &tb_ptr);
-t1 = tci_read_r(regs, &tb_ptr);
+t0 = tci_read_rval(regs, &tb_ptr);
+t1 = tci_read_rval(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 *(uint8_t *)(t1 + t2) = t0;
 

[PATCH v2 27/93] tcg/tci: Fix TCG_REG_R4 misusage

2021-02-03 Thread Richard Henderson
TCG_REG_R4 was removed from tcg_target_reg_alloc_order and
tcg_target_call_iarg_regs on the assumption that it was the
stack pointer.  That assumption was incorrectly copied from
i386.  For tci, the stack pointer is R15.

By adding R4 back to tcg_target_call_iarg_regs, adjust the other
entries so that 6 (or 12) entries are still present in the array,
and adjust the numbers in the interpreter.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c| 8 
 tcg/tci/tcg-target.c.inc | 7 +--
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index e0d815e4b2..935eb87330 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -511,14 +511,14 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
   tci_read_reg(regs, TCG_REG_R1),
   tci_read_reg(regs, TCG_REG_R2),
   tci_read_reg(regs, TCG_REG_R3),
+  tci_read_reg(regs, TCG_REG_R4),
   tci_read_reg(regs, TCG_REG_R5),
   tci_read_reg(regs, TCG_REG_R6),
   tci_read_reg(regs, TCG_REG_R7),
   tci_read_reg(regs, TCG_REG_R8),
   tci_read_reg(regs, TCG_REG_R9),
   tci_read_reg(regs, TCG_REG_R10),
-  tci_read_reg(regs, TCG_REG_R11),
-  tci_read_reg(regs, TCG_REG_R12));
+  tci_read_reg(regs, TCG_REG_R11));
 tci_write_reg(regs, TCG_REG_R0, tmp64);
 tci_write_reg(regs, TCG_REG_R1, tmp64 >> 32);
 #else
@@ -526,8 +526,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
   tci_read_reg(regs, TCG_REG_R1),
   tci_read_reg(regs, TCG_REG_R2),
   tci_read_reg(regs, TCG_REG_R3),
-  tci_read_reg(regs, TCG_REG_R5),
-  tci_read_reg(regs, TCG_REG_R6));
+  tci_read_reg(regs, TCG_REG_R4),
+  tci_read_reg(regs, TCG_REG_R5));
 tci_write_reg(regs, TCG_REG_R0, tmp64);
 #endif
 break;
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index 7e3bed811e..aba7f75ad1 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -181,9 +181,7 @@ static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_R1,
 TCG_REG_R2,
 TCG_REG_R3,
-#if 0 /* used for TCG_REG_CALL_STACK */
 TCG_REG_R4,
-#endif
 TCG_REG_R5,
 TCG_REG_R6,
 TCG_REG_R7,
@@ -206,19 +204,16 @@ static const int tcg_target_call_iarg_regs[] = {
 TCG_REG_R1,
 TCG_REG_R2,
 TCG_REG_R3,
-#if 0 /* used for TCG_REG_CALL_STACK */
 TCG_REG_R4,
-#endif
 TCG_REG_R5,
-TCG_REG_R6,
 #if TCG_TARGET_REG_BITS == 32
 /* 32 bit hosts need 2 * MAX_OPC_PARAM_IARGS registers. */
+TCG_REG_R6,
 TCG_REG_R7,
 TCG_REG_R8,
 TCG_REG_R9,
 TCG_REG_R10,
 TCG_REG_R11,
-TCG_REG_R12,
 #endif
 };
 
-- 
2.25.1




[PATCH v2 43/93] tcg/tci: Split out tci_args_rrs

2021-02-03 Thread Richard Henderson
Begin splitting out functions that do pure argument decode,
without actually loading values from the register set.

This means that decoding need not concern itself between
input and output registers.  We can assert that the register
number is in range during decode, so that it is safe to
simply dereference from regs[] later.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 111 --
 1 file changed, 67 insertions(+), 44 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 20aaaca959..be298ae39d 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -83,6 +83,20 @@ static uint64_t tci_uint64(uint32_t high, uint32_t low)
 }
 #endif
 
+/* Read constant byte from bytecode. */
+static uint8_t tci_read_b(const uint8_t **tb_ptr)
+{
+return *(tb_ptr[0]++);
+}
+
+/* Read register number from bytecode. */
+static TCGReg tci_read_r(const uint8_t **tb_ptr)
+{
+uint8_t regno = tci_read_b(tb_ptr);
+tci_assert(regno < TCG_TARGET_NB_REGS);
+return regno;
+}
+
 /* Read constant (native size) from bytecode. */
 static tcg_target_ulong tci_read_i(const uint8_t **tb_ptr)
 {
@@ -161,6 +175,23 @@ static tcg_target_ulong tci_read_label(const uint8_t 
**tb_ptr)
 return label;
 }
 
+/*
+ * Load sets of arguments all at once.  The naming convention is:
+ *   tci_args_
+ * where arguments is a sequence of
+ *
+ *   r = register
+ *   s = signed ldst offset
+ */
+
+static void tci_args_rrs(const uint8_t **tb_ptr,
+ TCGReg *r0, TCGReg *r1, int32_t *i2)
+{
+*r0 = tci_read_r(tb_ptr);
+*r1 = tci_read_r(tb_ptr);
+*i2 = tci_read_s32(tb_ptr);
+}
+
 static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 {
 bool result = false;
@@ -328,6 +359,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 uint8_t op_size = tb_ptr[1];
 const uint8_t *old_code_ptr = tb_ptr;
 #endif
+TCGReg r0, r1;
 tcg_target_ulong t0;
 tcg_target_ulong t1;
 tcg_target_ulong t2;
@@ -342,6 +374,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 uint64_t v64;
 #endif
 TCGMemOpIdx oi;
+int32_t ofs;
+void *ptr;
 
 /* Skip opcode and size entry. */
 tb_ptr += 2;
@@ -418,54 +452,46 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 /* Load/store operations (32 bit). */
 
 CASE_32_64(ld8u)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-tci_write_reg(regs, t0, *(uint8_t *)(t1 + t2));
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+regs[r0] = *(uint8_t *)ptr;
 break;
 CASE_32_64(ld8s)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-tci_write_reg(regs, t0, *(int8_t *)(t1 + t2));
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+regs[r0] = *(int8_t *)ptr;
 break;
 CASE_32_64(ld16u)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-tci_write_reg(regs, t0, *(uint16_t *)(t1 + t2));
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+regs[r0] = *(uint16_t *)ptr;
 break;
 CASE_32_64(ld16s)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-tci_write_reg(regs, t0, *(int16_t *)(t1 + t2));
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+regs[r0] = *(int16_t *)ptr;
 break;
 case INDEX_op_ld_i32:
 CASE_64(ld32u)
-t0 = *tb_ptr++;
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-tci_write_reg(regs, t0, *(uint32_t *)(t1 + t2));
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+regs[r0] = *(uint32_t *)ptr;
 break;
 CASE_32_64(st8)
-t0 = tci_read_rval(regs, &tb_ptr);
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-*(uint8_t *)(t1 + t2) = t0;
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+*(uint8_t *)ptr = regs[r0];
 break;
 CASE_32_64(st16)
-t0 = tci_read_rval(regs, &tb_ptr);
-t1 = tci_read_rval(regs, &tb_ptr);
-t2 = tci_read_s32(&tb_ptr);
-*(uint16_t *)(t1 + t2) = t0;
+tci_args_rrs(&tb_ptr, &r0, &r1, &ofs);
+ptr = (void *)(regs[r1] + ofs);
+*(uint16_t *)ptr = regs[r0];
 break;
 

[PATCH v2 40/93] tcg/tci: Merge bswap operations

2021-02-03 Thread Richard Henderson
This includes bswap16 and bswap32.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 1819652c5a..c979215332 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -652,15 +652,15 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_write_reg(regs, t0, (uint16_t)t1);
 break;
 #endif
-#if TCG_TARGET_HAS_bswap16_i32
-case INDEX_op_bswap16_i32:
+#if TCG_TARGET_HAS_bswap16_i32 || TCG_TARGET_HAS_bswap16_i64
+CASE_32_64(bswap16)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, bswap16(t1));
 break;
 #endif
-#if TCG_TARGET_HAS_bswap32_i32
-case INDEX_op_bswap32_i32:
+#if TCG_TARGET_HAS_bswap32_i32 || TCG_TARGET_HAS_bswap32_i64
+CASE_32_64(bswap32)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, bswap32(t1));
@@ -808,20 +808,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint32_t)t1);
 break;
-#if TCG_TARGET_HAS_bswap16_i64
-case INDEX_op_bswap16_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, bswap16(t1));
-break;
-#endif
-#if TCG_TARGET_HAS_bswap32_i64
-case INDEX_op_bswap32_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, bswap32(t1));
-break;
-#endif
 #if TCG_TARGET_HAS_bswap64_i64
 case INDEX_op_bswap64_i64:
 t0 = *tb_ptr++;
-- 
2.25.1




[PATCH v2 41/93] tcg/tci: Merge mov, not and neg operations

2021-02-03 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 29 +
 1 file changed, 5 insertions(+), 24 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index c979215332..225cb698e8 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -404,7 +404,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
 break;
 #endif
-case INDEX_op_mov_i32:
+CASE_32_64(mov)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, t1);
@@ -666,26 +666,21 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_write_reg(regs, t0, bswap32(t1));
 break;
 #endif
-#if TCG_TARGET_HAS_not_i32
-case INDEX_op_not_i32:
+#if TCG_TARGET_HAS_not_i32 || TCG_TARGET_HAS_not_i64
+CASE_32_64(not)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, ~t1);
 break;
 #endif
-#if TCG_TARGET_HAS_neg_i32
-case INDEX_op_neg_i32:
+#if TCG_TARGET_HAS_neg_i32 || TCG_TARGET_HAS_neg_i64
+CASE_32_64(neg)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, -t1);
 break;
 #endif
 #if TCG_TARGET_REG_BITS == 64
-case INDEX_op_mov_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
-break;
 case INDEX_op_tci_movi_i64:
 t0 = *tb_ptr++;
 t1 = tci_read_i64(&tb_ptr);
@@ -815,20 +810,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState 
*env,
 tci_write_reg(regs, t0, bswap64(t1));
 break;
 #endif
-#if TCG_TARGET_HAS_not_i64
-case INDEX_op_not_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, ~t1);
-break;
-#endif
-#if TCG_TARGET_HAS_neg_i64
-case INDEX_op_neg_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, -t1);
-break;
-#endif
 #endif /* TCG_TARGET_REG_BITS == 64 */
 
 /* QEMU specific operations. */
-- 
2.25.1




[PATCH v2 22/93] tcg/tci: Use g_assert_not_reached

2021-02-03 Thread Richard Henderson
Three TODO instances are can-never-happen cases.
The other uses of tcg_abort likewise indicate unreachable cases.

Tested-by: Alex Bennée 
Reviewed-by: Stefan Weil 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index eb70672efb..36d594672f 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -362,7 +362,7 @@ static bool tci_compare32(uint32_t u0, uint32_t u1, TCGCond condition)
 result = (u0 > u1);
 break;
 default:
-TODO();
+g_assert_not_reached();
 }
 return result;
 }
@@ -404,7 +404,7 @@ static bool tci_compare64(uint64_t u0, uint64_t u1, TCGCond condition)
 result = (u0 > u1);
 break;
 default:
-TODO();
+g_assert_not_reached();
 }
 return result;
 }
@@ -1114,7 +1114,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tmp32 = qemu_ld_beul;
 break;
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 tci_write_reg(regs, t0, tmp32);
 break;
@@ -1163,7 +1163,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tmp64 = qemu_ld_beq;
 break;
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 tci_write_reg(regs, t0, tmp64);
 if (TCG_TARGET_REG_BITS == 32) {
@@ -1191,7 +1191,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 qemu_st_bel(t0);
 break;
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 break;
 case INDEX_op_qemu_st_i64:
@@ -1221,7 +1221,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 qemu_st_beq(tmp64);
 break;
 default:
-tcg_abort();
+g_assert_not_reached();
 }
 break;
 case INDEX_op_mb:
@@ -1229,8 +1229,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 smp_mb();
 break;
 default:
-TODO();
-break;
+g_assert_not_reached();
 }
 tci_assert(tb_ptr == old_code_ptr + op_size);
 }
-- 
2.25.1




[PATCH v2 39/93] tcg/tci: Remove ifdefs for TCG_TARGET_HAS_ext32[us]_i64

2021-02-03 Thread Richard Henderson
These operations are always available under the alternate names
INDEX_op_ext_i32_i64 and INDEX_op_extu_i32_i64, so dropping the
ifdefs removes no code.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index cdfd9b7af8..1819652c5a 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -796,17 +796,13 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 continue;
 }
 break;
-#if TCG_TARGET_HAS_ext32s_i64
 case INDEX_op_ext32s_i64:
-#endif
 case INDEX_op_ext_i32_i64:
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int32_t)t1);
 break;
-#if TCG_TARGET_HAS_ext32u_i64
 case INDEX_op_ext32u_i64:
-#endif
 case INDEX_op_extu_i32_i64:
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
-- 
2.25.1




[PATCH v2 38/93] tcg/tci: Merge extension operations

2021-02-03 Thread Richard Henderson
This includes ext8s, ext8u, ext16s, ext16u.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 44 
 1 file changed, 8 insertions(+), 36 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 894e87e1b0..cdfd9b7af8 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -624,29 +624,29 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tci_write_reg64(regs, t1, t0, (uint32_t)t2 * tmp64);
 break;
 #endif /* TCG_TARGET_REG_BITS == 32 */
-#if TCG_TARGET_HAS_ext8s_i32
-case INDEX_op_ext8s_i32:
+#if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
+CASE_32_64(ext8s)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int8_t)t1);
 break;
 #endif
-#if TCG_TARGET_HAS_ext16s_i32
-case INDEX_op_ext16s_i32:
+#if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
+CASE_32_64(ext16s)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (int16_t)t1);
 break;
 #endif
-#if TCG_TARGET_HAS_ext8u_i32
-case INDEX_op_ext8u_i32:
+#if TCG_TARGET_HAS_ext8u_i32 || TCG_TARGET_HAS_ext8u_i64
+CASE_32_64(ext8u)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint8_t)t1);
 break;
 #endif
-#if TCG_TARGET_HAS_ext16u_i32
-case INDEX_op_ext16u_i32:
+#if TCG_TARGET_HAS_ext16u_i32 || TCG_TARGET_HAS_ext16u_i64
+CASE_32_64(ext16u)
 t0 = *tb_ptr++;
 t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, (uint16_t)t1);
@@ -796,34 +796,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 continue;
 }
 break;
-#if TCG_TARGET_HAS_ext8u_i64
-case INDEX_op_ext8u_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint8_t)t1);
-break;
-#endif
-#if TCG_TARGET_HAS_ext8s_i64
-case INDEX_op_ext8s_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int8_t)t1);
-break;
-#endif
-#if TCG_TARGET_HAS_ext16s_i64
-case INDEX_op_ext16s_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, (int16_t)t1);
-break;
-#endif
-#if TCG_TARGET_HAS_ext16u_i64
-case INDEX_op_ext16u_i64:
-t0 = *tb_ptr++;
-t1 = tci_read_r(regs, &tb_ptr);
-tci_write_reg(regs, t0, (uint16_t)t1);
-break;
-#endif
 #if TCG_TARGET_HAS_ext32s_i64
 case INDEX_op_ext32s_i64:
 #endif
-- 
2.25.1




[PATCH v2 29/93] tcg/tci: Remove TCG_CONST

2021-02-03 Thread Richard Henderson
Only allow registers or constants, but not both, in any
given position.  Removing this difference in input will
allow more code to be shared between 32-bit and 64-bit.

Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target-con-set.h |   6 +-
 tcg/tci/tcg-target.h |   3 -
 tcg/tci.c| 189 +--
 tcg/tci/tcg-target.c.inc |  82 ---
 4 files changed, 89 insertions(+), 191 deletions(-)

diff --git a/tcg/tci/tcg-target-con-set.h b/tcg/tci/tcg-target-con-set.h
index 38e82f7535..f51b7bcb13 100644
--- a/tcg/tci/tcg-target-con-set.h
+++ b/tcg/tci/tcg-target-con-set.h
@@ -10,16 +10,12 @@
  * tcg-target-con-str.h; the constraint combination is inclusive or.
  */
 C_O0_I2(r, r)
-C_O0_I2(r, ri)
 C_O0_I3(r, r, r)
-C_O0_I4(r, r, ri, ri)
 C_O0_I4(r, r, r, r)
 C_O1_I1(r, r)
 C_O1_I2(r, 0, r)
-C_O1_I2(r, ri, ri)
 C_O1_I2(r, r, r)
-C_O1_I2(r, r, ri)
-C_O1_I4(r, r, r, ri, ri)
+C_O1_I4(r, r, r, r, r)
 C_O2_I1(r, r, r)
 C_O2_I2(r, r, r, r)
 C_O2_I4(r, r, r, r, r, r)
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 8f7ed676fc..9c0021a26f 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -157,9 +157,6 @@ typedef enum {
 
 TCG_AREG0 = TCG_REG_R14,
 TCG_REG_CALL_STACK = TCG_REG_R15,
-
-/* Special value UINT8_MAX is used by TCI to encode constant values. */
-TCG_CONST = UINT8_MAX
 } TCGReg;
 
 /* Used for function call generation. */
diff --git a/tcg/tci.c b/tcg/tci.c
index 935eb87330..fb3c97aaf1 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -255,61 +255,6 @@ tci_read_ulong(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 return taddr;
 }
 
-/* Read indexed register or constant (native size) from bytecode. */
-static tcg_target_ulong
-tci_read_ri(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
-{
-tcg_target_ulong value;
-TCGReg r = **tb_ptr;
-*tb_ptr += 1;
-if (r == TCG_CONST) {
-value = tci_read_i(tb_ptr);
-} else {
-value = tci_read_reg(regs, r);
-}
-return value;
-}
-
-/* Read indexed register or constant (32 bit) from bytecode. */
-static uint32_t tci_read_ri32(const tcg_target_ulong *regs,
-  const uint8_t **tb_ptr)
-{
-uint32_t value;
-TCGReg r = **tb_ptr;
-*tb_ptr += 1;
-if (r == TCG_CONST) {
-value = tci_read_i32(tb_ptr);
-} else {
-value = tci_read_reg32(regs, r);
-}
-return value;
-}
-
-#if TCG_TARGET_REG_BITS == 32
-/* Read two indexed registers or constants (2 * 32 bit) from bytecode. */
-static uint64_t tci_read_ri64(const tcg_target_ulong *regs,
-  const uint8_t **tb_ptr)
-{
-uint32_t low = tci_read_ri32(regs, tb_ptr);
-return tci_uint64(tci_read_ri32(regs, tb_ptr), low);
-}
-#elif TCG_TARGET_REG_BITS == 64
-/* Read indexed register or constant (64 bit) from bytecode. */
-static uint64_t tci_read_ri64(const tcg_target_ulong *regs,
-  const uint8_t **tb_ptr)
-{
-uint64_t value;
-TCGReg r = **tb_ptr;
-*tb_ptr += 1;
-if (r == TCG_CONST) {
-value = tci_read_i64(tb_ptr);
-} else {
-value = tci_read_reg64(regs, r);
-}
-return value;
-}
-#endif
-
 static tcg_target_ulong tci_read_label(const uint8_t **tb_ptr)
 {
 tcg_target_ulong label = tci_read_i(tb_ptr);
@@ -504,7 +449,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
 switch (opc) {
 case INDEX_op_call:
-t0 = tci_read_ri(regs, &tb_ptr);
+t0 = tci_read_i(&tb_ptr);
 tci_tb_ptr = (uintptr_t)tb_ptr;
 #if TCG_TARGET_REG_BITS == 32
 tmp64 = ((helper_function)t0)(tci_read_reg(regs, TCG_REG_R0),
@@ -539,7 +484,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 case INDEX_op_setcond_i32:
 t0 = *tb_ptr++;
 t1 = tci_read_r32(regs, &tb_ptr);
-t2 = tci_read_ri32(regs, &tb_ptr);
+t2 = tci_read_r32(regs, &tb_ptr);
 condition = *tb_ptr++;
 tci_write_reg(regs, t0, tci_compare32(t1, t2, condition));
 break;
@@ -547,7 +492,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 case INDEX_op_setcond2_i32:
 t0 = *tb_ptr++;
 tmp64 = tci_read_r64(regs, &tb_ptr);
-v64 = tci_read_ri64(regs, &tb_ptr);
+v64 = tci_read_r64(regs, &tb_ptr);
 condition = *tb_ptr++;
 tci_write_reg(regs, t0, tci_compare64(tmp64, v64, condition));
 break;
@@ -555,7 +500,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 case INDEX_op_setcond_i64:
 t0 = *tb_ptr++;
 t1 = tci_read_r64(regs, &tb_ptr);
-t2 = tci_read_ri64(regs, &tb_ptr);
+t2 = tci_read_r64(regs, &tb_ptr);
 condition = *tb_ptr++;
 tci_write_reg(regs, t0, tci_compare64(t1, t2, condition));
 break;
@@ 

[PATCH v2 20/93] tcg/tci: Move stack bounds check to compile-time

2021-02-03 Thread Richard Henderson
The existing check was incomplete:
(1) Only applied to two of the 7 stores, and not to the loads at all.
(2) Only checked the upper, but not the lower bound of the stack.

Doing this at compile time means that we don't need to do it
at runtime as well.

Tested-by: Alex Bennée 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tci.c|  2 --
 tcg/tci/tcg-target.c.inc | 13 +
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index fe935e71a3..ee2cd7dfa2 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -628,7 +628,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 t0 = tci_read_r32(regs, &tb_ptr);
 t1 = tci_read_r(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
-tci_assert(t1 != sp_value || (int32_t)t2 < 0);
 *(uint32_t *)(t1 + t2) = t0;
 break;
 
@@ -884,7 +883,6 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 t0 = tci_read_r64(regs, &tb_ptr);
 t1 = tci_read_r(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
-tci_assert(t1 != sp_value || (int32_t)t2 < 0);
 *(uint64_t *)(t1 + t2) = t0;
 break;
 
diff --git a/tcg/tci/tcg-target.c.inc b/tcg/tci/tcg-target.c.inc
index f0f6b13112..82efb9af60 100644
--- a/tcg/tci/tcg-target.c.inc
+++ b/tcg/tci/tcg-target.c.inc
@@ -375,10 +375,20 @@ static void tci_out_label(TCGContext *s, TCGLabel *label)
 }
 }
 
+static void stack_bounds_check(TCGReg base, target_long offset)
+{
+if (base == TCG_REG_CALL_STACK) {
+tcg_debug_assert(offset < 0);
+tcg_debug_assert(offset >= -(CPU_TEMP_BUF_NLONGS * sizeof(long)));
+}
+}
+
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
intptr_t arg2)
 {
 uint8_t *old_code_ptr = s->code_ptr;
+
+stack_bounds_check(arg1, arg2);
 if (type == TCG_TYPE_I32) {
 tcg_out_op_t(s, INDEX_op_ld_i32);
 tcg_out_r(s, ret);
@@ -514,6 +524,7 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 case INDEX_op_st16_i64:
 case INDEX_op_st32_i64:
 case INDEX_op_st_i64:
+stack_bounds_check(args[1], args[2]);
 tcg_out_r(s, args[0]);
 tcg_out_r(s, args[1]);
 tcg_debug_assert(args[2] == (int32_t)args[2]);
@@ -716,6 +727,8 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
intptr_t arg2)
 {
 uint8_t *old_code_ptr = s->code_ptr;
+
+stack_bounds_check(arg1, arg2);
 if (type == TCG_TYPE_I32) {
 tcg_out_op_t(s, INDEX_op_st_i32);
 tcg_out_r(s, arg);
-- 
2.25.1




[PATCH v2 33/93] tcg/tci: Remove tci_read_r16

2021-02-03 Thread Richard Henderson
Use explicit casts for ext16u opcodes, and allow truncation
to happen with the store for st16 opcodes, and with the call
for bswap16 opcodes.

Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 28 +++-
 1 file changed, 7 insertions(+), 21 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 25db479e62..547be0c2f0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -71,11 +71,6 @@ static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 }
 #endif
 
-static uint16_t tci_read_reg16(const tcg_target_ulong *regs, TCGReg index)
-{
-return (uint16_t)tci_read_reg(regs, index);
-}
-
 static uint32_t tci_read_reg32(const tcg_target_ulong *regs, TCGReg index)
 {
 return (uint32_t)tci_read_reg(regs, index);
@@ -157,15 +152,6 @@ tci_read_r(const tcg_target_ulong *regs, const uint8_t **tb_ptr)
 return value;
 }
 
-/* Read indexed register (16 bit) from bytecode. */
-static uint16_t tci_read_r16(const tcg_target_ulong *regs,
- const uint8_t **tb_ptr)
-{
-uint16_t value = tci_read_reg16(regs, **tb_ptr);
-*tb_ptr += 1;
-return value;
-}
-
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
 /* Read indexed register (16 bit signed) from bytecode. */
 static int16_t tci_read_r16s(const tcg_target_ulong *regs,
@@ -526,7 +512,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 *(uint8_t *)(t1 + t2) = t0;
 break;
 CASE_32_64(st16)
-t0 = tci_read_r16(regs, &tb_ptr);
+t0 = tci_read_r(regs, &tb_ptr);
 t1 = tci_read_r(regs, &tb_ptr);
 t2 = tci_read_s32(&tb_ptr);
 *(uint16_t *)(t1 + t2) = t0;
@@ -716,14 +702,14 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext16u_i32
 case INDEX_op_ext16u_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r16(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+t1 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (uint16_t)t1);
 break;
 #endif
 #if TCG_TARGET_HAS_bswap16_i32
 case INDEX_op_bswap16_i32:
 t0 = *tb_ptr++;
-t1 = tci_read_r16(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, bswap16(t1));
 break;
 #endif
@@ -924,8 +910,8 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_ext16u_i64
 case INDEX_op_ext16u_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r16(regs, &tb_ptr);
-tci_write_reg(regs, t0, t1);
+t1 = tci_read_r(regs, &tb_ptr);
+tci_write_reg(regs, t0, (uint16_t)t1);
 break;
 #endif
 #if TCG_TARGET_HAS_ext32s_i64
@@ -947,7 +933,7 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 #if TCG_TARGET_HAS_bswap16_i64
 case INDEX_op_bswap16_i64:
 t0 = *tb_ptr++;
-t1 = tci_read_r16(regs, &tb_ptr);
+t1 = tci_read_r(regs, &tb_ptr);
 tci_write_reg(regs, t0, bswap16(t1));
 break;
 #endif
-- 
2.25.1



