Re: [Qemu-devel] [Bug 1653384] [NEW] Assertion failed with USB pass through with XHCI controller

2017-01-10 Thread Gerd Hoffmann
Hi,

> qemu-system-x86_64: hw/usb/core.c:623: usb_packet_cleanup: Assertion
> `!usb_packet_is_inflight(p)' failed.

We are trying to free an in-flight transfer.  Hmm.

> Bisected the issue to following commit:
> first bad commit: [94b037f2a451b3dc855f9f2c346e5049a361bd55] xhci: use linked 
> list for transfers

Ok.

> #5  0x55615afda555 in xhci_ep_free_xfer ()
> No symbol table info available.
> #6  0x55615afdc156 in xhci_kick_epctx ()
> No symbol table info available.

Can you rebuild with debug info and try again?

There are multiple xhci_ep_free_xfer() call sites in xhci_kick_epctx()
and it would be useful to know which one it is.

thanks,
  Gerd

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1653384

Title:
  Assertion failed with USB pass through with XHCI controller

Status in QEMU:
  New

Bug description:
  Starting qemu 2.8.0 with XHCI controller and host device passed
  through results in an assertion failure:

  qemu-system-x86_64: hw/usb/core.c:623: usb_packet_cleanup: Assertion
  `!usb_packet_is_inflight(p)' failed.

  Can be reproduced with the following command (passing through a Lenovo
  keyboard):

  qemu-system-x86_64 -usb -device nec-usb-xhci,id=usb \
    -device usb-host,vendorid=0x04b3,productid=0x3025,id=hostdev0,bus=usb.0,port=1

  If nec-usb-xhci is changed to usb-ehci, qemu tries to boot without
  assertion failures.

  
  Can be reproduced with the latest master (commit dbe2b65) and v2.8.0.

  Bisected the issue to following commit:
  first bad commit: [94b037f2a451b3dc855f9f2c346e5049a361bd55] xhci: use linked 
list for transfers

  
  Backtrace from commit dbe2b65:

  #0  0x7f2eb4657227 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:55
  resultvar = 0
  pid = 3453
  selftid = 3453
  #1  0x7f2eb465867a in __GI_abort () at abort.c:89
  save_stage = 2
  act = {__sigaction_handler = {sa_handler = 0x4, sa_sigaction = 0x4}, 
sa_mask = {__val = {140734740550528, 93876690035339, 
140734740550624, 48833659808, 0, 0, 0, 21474836480, 
140734740550792, 139838573009553, 140734740550560, 139838573043008, 
139838573024160, 9387665872, 139838702616576, 
139838573024160}}, sa_flags = 1528954938, 
sa_restorer = 0x55615b2202c0 <__PRETTY_FUNCTION__.38612>}
  sigs = {__val = {32, 0 }}
  #2  0x7f2eb46502cd in __assert_fail_base (fmt=0x7f2eb47893a0 "%s%s%s:%u: 
%s%sAssertion `%s' failed.\n%n", 
  assertion=assertion@entry=0x55615b22003a "!usb_packet_is_inflight(p)", 
file=file@entry=0x55615b21fdf0 "hw/usb/core.c", line=line@entry=619, 
  function=function@entry=0x55615b2202c0 <__PRETTY_FUNCTION__.38612> 
"usb_packet_cleanup") at assert.c:92
  str = 0x55615cfdf510 ""
  total = 4096
  #3  0x7f2eb4650382 in __GI___assert_fail (assertion=0x55615b22003a 
"!usb_packet_is_inflight(p)", file=0x55615b21fdf0 "hw/usb/core.c", 
  line=619, function=0x55615b2202c0 <__PRETTY_FUNCTION__.38612> 
"usb_packet_cleanup") at assert.c:101
  No locals.
  #4  0x55615afc385e in usb_packet_cleanup ()
  No symbol table info available.
  #5  0x55615afda555 in xhci_ep_free_xfer ()
  No symbol table info available.
  #6  0x55615afdc156 in xhci_kick_epctx ()
  No symbol table info available.
  #7  0x55615afda099 in xhci_ep_kick_timer ()
  No symbol table info available.
  #8  0x55615b08ceee in timerlist_run_timers ()
  No symbol table info available.
  #9  0x55615b08cf36 in qemu_clock_run_timers ()
  No symbol table info available.
  #10 0x55615b08d2df in qemu_clock_run_all_timers ()
  No symbol table info available.
  #11 0x55615b08be40 in main_loop_wait ()
  No symbol table info available.
  #12 0x55615ae3870f in main_loop ()
  No symbol table info available.
  #13 0x55615ae4027b in main ()

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1653384/+subscriptions



[Qemu-devel] [Bug 622367] Re: No BIOS MPFP structure with smp=92 and more

2017-01-10 Thread Thomas Huth
QEMU 0.12 is quite outdated nowadays ... can you still reproduce this
issue with the latest version of QEMU (currently version 2.8)?

** Changed in: qemu
   Status: New => Incomplete

https://bugs.launchpad.net/bugs/622367

Title:
  No BIOS MPFP structure with smp=92 and more

Status in QEMU:
  Incomplete

Bug description:
  qemu 0.12.2, SeaBios 0.5.1, running qemu-system-x86_64.exe with option -smp.
  If smp>=92, no MP floating pointer structure is present in the first 1 MB.
  This can be verified by running pmemsave 0 0x10 in the debugger and searching
  the file for the _MP_ signature.

  qemu 0.10.5 (bios build 05/08/09) can smp=128 (and even 255 if not
  hangs :).

  Host win 7 x64 RTM 7600.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/622367/+subscriptions



[Qemu-devel] [Bug 1619438] Re: GTK+ UI, delete key deletes to the left in the monitor

2017-01-10 Thread Thomas Huth
Released with version 2.8

** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1619438

Title:
  GTK+ UI, delete key deletes to the left in the monitor

Status in QEMU:
  Fix Released

Bug description:
  it must delete characters to the right, otherwise it is like having
  two backspaces

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1619438/+subscriptions



[Qemu-devel] [Bug 1624726] Re: Integrator/CP regression after QOM'ification of integratorcp.c

2017-01-10 Thread Thomas Huth
Patch has been included here:
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=e9d9ee234f852026d58
... and been released with QEMU version 2.8

** Changed in: qemu
   Status: New => Fix Released

https://bugs.launchpad.net/bugs/1624726

Title:
  Integrator/CP regression after QOM'ification of integratorcp.c

Status in HelenOS branches:
  New
Status in QEMU:
  Fix Released

Bug description:
  The following command line no longer works (i.e. the guest does not
  boot) with QEMU 2.7.0:

  qemu-system-arm -M integratorcp -m 128M -kernel
  HelenOS-0.6.0-arm32-integratorcp.boot

  The HelenOS image can be downloaded here:

  http://www.helenos.org/releases/HelenOS-0.6.0-arm32-integratorcp.boot

  I did git bisect and came to this revision:

  a1f42e0c9abc1028a8bb8686dbb3749fcd2d18e8 is the first bad commit
  commit a1f42e0c9abc1028a8bb8686dbb3749fcd2d18e8
  Author: xiaoqiang.zhao 
  Date:   Mon Mar 7 15:05:44 2016 +0800

  hw/arm: QOM'ify integratorcp.c
  
  * Drop the use of old SysBus init function and use instance_init
  * Remove the empty 'icp_pic_class_init' from Typeinfo
  
  Signed-off-by: xiaoqiang zhao 
  Reviewed-by: Peter Maydell 
  Signed-off-by: Peter Maydell 

  :04 04 b73418ea3fb69ed72438776e78786456fe4c414c
  b483e8579037fdae7d136b2f4ada3147bdde92f1 M  hw

  Upon closer inspection, I discovered that for some reason s->memsz in
  integratorcm_init() is zero. In the last good revision, this value was
  128. As a temporary workaround, hardcoding it to this expected value
  fixes the problem.

To manage notifications about this bug go to:
https://bugs.launchpad.net/helenos/+bug/1624726/+subscriptions



Re: [Qemu-devel] qemu-2.8-rc4 is broken

2017-01-10 Thread Pavel Dovgalyuk
> From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> On Wed, Dec 21, 2016 at 5:57 AM, Pavel Dovgalyuk  wrote:
> >> -Original Message-
> >> From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> >> On Tue, Dec 20, 2016 at 11:10 AM, Pavel Dovgalyuk  
> >> wrote:
> >> >> From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> >> >> On Tue, Dec 20, 2016 at 10:45:44AM +0300, Pavel Dovgalyuk wrote:
> >> >> > It also fails much earlier when I enable logs with "-d int -D log".
> >> >> >
> >> >> Looks like a heap corruption bug since free() is failing.
> >> >
> >> > Seems to be a race condition.
> >> > When I add logs into invalidate_page_bitmap, the bug disappears.
> >> > It seems that someone tries to free the same page bitmap twice and 
> >> > simultaneously.
> >>
> >> Does the following workaround prevent the crashes?
> >>
> >> -global apic-common.vapic=off
> >
> > Yes, this option helps.
> > Thank you.
> 
> Good news.  This can be fixed in 2.8.1 once someone finds a solution.

It seems that something still goes wrong.
I'm using this workaround, but there is a kind of deadlock in translation.
call_rcu_thread hangs at some moment in qemu_event_wait.

As far as I understand, it is used by QHT in translate-all.c.
I can't get more information yet, because logging makes everything too slow.

Pavel Dovgalyuk




[Qemu-devel] [Bug 696834] Re: FP exception reporting not working on NetBSD host

2017-01-10 Thread Thomas Huth
Thanks for verifying!

** Changed in: qemu
   Status: Incomplete => Fix Released

https://bugs.launchpad.net/bugs/696834

Title:
  FP exception reporting not working on NetBSD host

Status in QEMU:
  Fix Released

Bug description:
  I recognize that NetBSD is not one of the officially supported host
  OS.  However, qemu 0.13.0 is available in the NetBSD pkgsrc
  collection, and works quite well.  Well, with one exception (pun
  intended): It seems that Floating Point exceptions don't get reported
  properly.

  The following code-snippet demonstrates the problem:

  
  volatile int flt_signal = 0;

  static sigjmp_buf sigfpe_flt_env;
  static void
  sigfpe_flt_action(int signo, siginfo_t *info, void *ptr)
  {
      flt_signal++;
  }

  void trigger(void)
  {
      struct sigaction sa;
      double d = strtod("0", NULL);

      if (sigsetjmp(sigfpe_flt_env, 0) == 0) {
          sa.sa_flags = SA_SIGINFO;
          sa.sa_sigaction = sigfpe_flt_action;
          sigemptyset(&sa.sa_mask);
          sigaction(SIGFPE, &sa, NULL);
          fpsetmask(FP_X_INV|FP_X_DZ|FP_X_OFL|FP_X_UFL|FP_X_IMP);
          printf("%g\n", 1 / d);
      }
      printf("FPE signal handler invoked %d times.\n", flt_signal);
  }

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/696834/+subscriptions



[Qemu-devel] [Bug 1414293] Re: target-lm32/translate.c:336: bad ? : operator

2017-01-10 Thread Thomas Huth
Released with version 2.8

** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1414293

Title:
  target-lm32/translate.c:336: bad ? : operator

Status in QEMU:
  Fix Released

Bug description:
  [qemu/target-lm32/translate.c:336]: (style) Same expression in both
  branches of ternary operator.

 int rY = (dc->format == OP_FMT_RR) ? dc->r0 : dc->r0;

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1414293/+subscriptions



[Qemu-devel] [Bug 1625295] Re: qemu-arm dies with libarmmem inside ld.so.preload

2017-01-10 Thread Thomas Huth
Released with version 2.8

** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1625295

Title:
  qemu-arm dies with libarmmem inside ld.so.preload

Status in QEMU:
  Fix Released

Bug description:
  When running raspbian inside qemu,the user has to first comment out
  the following line from /etc/ld.so.conf:

  /usr/lib/arm-linux-gnueabihf/libarmmem.so

  
  Will future qemus be able to work without changing /etc/ld.so.conf?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1625295/+subscriptions



[Qemu-devel] What the current status on using loadvm with read-only qcow2 image

2017-01-10 Thread Bo Chen
Hello,

Is there a way to use "-loadvm" to load an internal snapshot from a
read-only "qcow2" image with the latest version of qemu?

This seems to be a popular question, here are two "recent" ones:
[1] https://bugs.launchpad.net/qemu/+bug/1184089
[2] https://lists.nongnu.org/archive/html/qemu-discuss/2011-10/msg9.html

I am curious what the status of this feature is.

Thanks,
Bo


[Qemu-devel] [Bug 1464611] Re: 4 * redundant conditions

2017-01-10 Thread Thomas Huth
Released with version 2.8

** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1464611

Title:
  4 * redundant conditions

Status in QEMU:
  Fix Released

Bug description:
  
  1.

  [qemu/hw/block/nvme.c:355]: (style) Redundant condition: sqid. 'A &&
  (!A || B)' is equivalent to 'A || B'

if (!sqid || (sqid && !nvme_check_sqid(n, sqid))) {

  2.

  [qemu/hw/block/nvme.c:429]: (style) Redundant condition: cqid. 'A &&
  (!A || B)' is equivalent to 'A || B'

if (!cqid || (cqid && !nvme_check_cqid(n, cqid))) {

  3.

  [qemu/hw/tpm/tpm_passthrough.c:157]: (style) Redundant condition:
  tpm_pt.tpm_op_canceled. 'A && (!A || B)' is equivalent to 'A || B'

   if (!tpm_pt->tpm_op_canceled ||
  (tpm_pt->tpm_op_canceled && errno != ECANCELED)) {

  4.

  [qemu/target-arm/translate-a64.c:5729]: (style) Redundant condition:
  size<3. 'A && (!A || B)' is equivalent to 'A || B'

if (size > 3
  || (size < 3 && is_q)
  || (size == 3 && !is_q)) {
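
The first three reports share one shape, `!A || (A && B)`, where the inner test of A adds nothing because the right-hand side is only evaluated when A is already true. A minimal sketch that checks this exhaustively is shown below; the function names are made up for illustration and do not come from QEMU.

```c
#include <stdbool.h>

/* Shape of the flagged conditions: !A || (A && B). */
static bool flagged_form(bool a, bool b)
{
    return !a || (a && b);
}

/* Simplified shape: the inner test of A is dropped. */
static bool simplified_form(bool a, bool b)
{
    return !a || b;
}

/* Returns true if both forms agree on all four input combinations. */
bool forms_agree(void)
{
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            if (flagged_form(a, b) != simplified_form(a, b)) {
                return false;
            }
        }
    }
    return true;
}
```

For report 1, A would stand for `sqid` and B for `!nvme_check_sqid(n, sqid)`. Report 4 has a different shape and is not covered by this check.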

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1464611/+subscriptions



[Qemu-devel] [Bug 1586756] Re: "-serial unix:" option of qemu-system-arm is broken in qemu 2.6.0

2017-01-10 Thread Thomas Huth
Fix has been committed here:
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=6ab3fc32ea640026726b
... and been released with QEMU version 2.8

** Changed in: qemu
   Status: Incomplete => Fix Released

https://bugs.launchpad.net/bugs/1586756

Title:
  "-serial unix:" option of qemu-system-arm is broken in qemu 2.6.0

Status in QEMU:
  Fix Released

Bug description:
  I found a bug of "-serial unix:PATH_TO_SOCKET" in qemu 2.6.0 (qemu 2.5.1 
works fine).
  Occasionally, a part of the output of qemu disappears in the bug.

  It looks like following commit is the cause:

  char: ensure all clients are in non-blocking mode (Author: Daniel P. Berrange 
)
  
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=64c800f808748522727847b9cdc73412f22dffb9

  In this commit, UNIX socket is set to non-blocking mode, but 
qemu_chr_fe_write function doesn't handle EAGAIN.
  You should fix code like that:

  ---
  diff --git a/qemu-char.c b/qemu-char.c
  index b597ee1..0361d78 100644
  --- a/qemu-char.c
  +++ b/qemu-char.c
  @@ -270,6 +270,7 @@ static int qemu_chr_fe_write_buffer(CharDriverState *s, 
const uint8_t *buf, int
   int qemu_chr_fe_write(CharDriverState *s, const uint8_t *buf, int len)
   {
   int ret;
  +int offset = 0;
   
   if (s->replay && replay_mode == REPLAY_MODE_PLAY) {
   int offset;
  @@ -280,7 +281,21 @@ int qemu_chr_fe_write(CharDriverState *s, const uint8_t 
*buf, int len)
   }
   
   qemu_mutex_lock(&s->chr_write_lock);
  -ret = s->chr_write(s, buf, len);
  +
  +while (offset < len) {
  +retry:
  +ret = s->chr_write(s, buf, len);
  +if (ret < 0 && errno == EAGAIN) {
  +g_usleep(100);
  +goto retry;
  +}
  +
  +if (ret <= 0) {
  +break;
  +}
  +
  +offset += ret;
  +}
   
   if (ret > 0) {
   qemu_chr_fe_write_log(s, buf, ret);
  ---

  Or please do "git revert 64c800f808748522727847b9cdc73412f22dffb9".
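
For comparison, here is a minimal sketch of a retry loop over a non-blocking descriptor, written against plain POSIX `write()` and `poll()` rather than QEMU's character device layer. The helper name `write_all` is hypothetical. Two details differ from the diff above and are assumptions about what a robust loop needs: each retry continues from `buf + offset` instead of resending the whole buffer, and the wait uses `poll()` instead of a fixed sleep.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code: write all of buf to a possibly
 * non-blocking fd. Returns len on success, or -1 with errno set.
 */
ssize_t write_all(int fd, const uint8_t *buf, size_t len)
{
    size_t offset = 0;

    while (offset < len) {
        /* Continue after the bytes that have already been written. */
        ssize_t ret = write(fd, buf + offset, len - offset);

        if (ret < 0) {
            if (errno == EINTR) {
                continue;
            }
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* Sleep until the descriptor is writable again. */
                struct pollfd pfd = { .fd = fd, .events = POLLOUT };
                (void)poll(&pfd, 1, -1);
                continue;
            }
            return -1;
        }
        offset += (size_t)ret;
    }
    return (ssize_t)len;
}
```

Blocking inside the writer is only one possible policy; an event-driven program may prefer to queue the remaining bytes and return.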

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1586756/+subscriptions



[Qemu-devel] [Bug 1611979] Re: GTK+ interface, backspace is broken in the monitor console

2017-01-10 Thread Thomas Huth
Released with version 2.8

** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1611979

Title:
  GTK+ interface, backspace is broken in the monitor console

Status in QEMU:
  Fix Released

Bug description:
  this has been broken for over 2 years

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1611979/+subscriptions



[Qemu-devel] [Bug 1639322] Re: pasting into ppc64 serial console kills qemu

2017-01-10 Thread Thomas Huth
FWIW, the crash should be fixed by this commit here:
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=7bacfd7f7289192c83330
(but we still need to fix the gtk side, too, so that it only sends as many
characters at once as the receiving side can take)

https://bugs.launchpad.net/bugs/1639322

Title:
  pasting into ppc64 serial console kills qemu

Status in QEMU:
  Confirmed

Bug description:
  - run qemu-system-ppc64
  - when X window appears press Ctrl+Alt+3
  - paste any text longer than 16 characters

  
  qemu-system-ppc64: 
/home/abuild/rpmbuild/BUILD/qemu-2.6.1/hw/char/spapr_vty.c:40: vty_receive: 
Assertion `(dev->in - dev->out) < 16' failed.
  Aborted (core dumped)

  Broken in SUSE Leap 42.2 and git
  4eb28abd52d48657cff6ff45e8dbbbefe4dbb414
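
The failed assertion implies a 16-byte receive buffer that must never be overfilled. The sketch below is a toy model of that contract, not the spapr_vty code: the type and function names are hypothetical, and only the `in`/`out` counters and the limit of 16 are taken from the assertion text. It shows a receiver that reports how much room it has and accepts no more than that.

```c
#include <stddef.h>
#include <stdint.h>

#define VTY_BUF_SIZE 16 /* matches the limit in the failed assertion */

/* Toy receiver; the field names in/out follow the assertion text. */
typedef struct {
    uint8_t buf[VTY_BUF_SIZE];
    uint32_t in;   /* total bytes stored so far */
    uint32_t out;  /* total bytes consumed so far */
} ToyVty;

/* Number of bytes the receiver can still accept. */
size_t toy_vty_can_receive(const ToyVty *dev)
{
    return VTY_BUF_SIZE - (dev->in - dev->out);
}

/* Store at most as many bytes as fit; returns the number accepted. */
size_t toy_vty_receive(ToyVty *dev, const uint8_t *data, size_t len)
{
    size_t room = toy_vty_can_receive(dev);
    size_t n = len < room ? len : room;

    for (size_t i = 0; i < n; i++) {
        dev->buf[dev->in++ % VTY_BUF_SIZE] = data[i];
    }
    return n;
}

/* Consume one byte; returns -1 if the buffer is empty. */
int toy_vty_getchar(ToyVty *dev)
{
    if (dev->in == dev->out) {
        return -1;
    }
    return dev->buf[dev->out++ % VTY_BUF_SIZE];
}
```

A sender that pastes 20 characters into this model would have 16 accepted and would need to hold the remaining 4 until the guest has drained the buffer.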

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1639322/+subscriptions



[Qemu-devel] [Bug 1631625] Re: target-mips/dsp_helper.c: two possible bad shifts

2017-01-10 Thread Thomas Huth
Released with version 2.8

** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1631625

Title:
  target-mips/dsp_helper.c: two possible bad shifts

Status in QEMU:
  Fix Released

Bug description:
  target-mips/dsp_helper.c:3480:1: error: V629 Consider inspecting the
  '0x01 << (size + 1)' expression. Bit shifting of the 32-bit value with
  a subsequent expansion to the 64-bit type.

  Source code is

  temp = temp & ((0x01 << (size + 1)) - 1);

  If size >= 32, then better code might be

  temp = temp & ((0x01UL << (size + 1)) - 1);

  target-mips/dsp_helper.c:3509:1: error: V629 Consider inspecting the
  '0x01 << (size + 1)' expression. Bit shifting of the 32-bit value with
  a subsequent expansion to the 64-bit type.

  Duplicate
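
A minimal sketch of the masking operation with the shift done on a 64-bit operand follows. The helper name `low_bits_mask` is hypothetical. It uses `UINT64_C(1)` rather than `0x01UL` because `unsigned long` is only 32 bits wide on some platforms (64-bit Windows, for example), and it special-cases the largest sizes because shifting a 64-bit value by 64 is undefined in C.

```c
#include <stdint.h>

/*
 * Hypothetical helper: mask covering the low (size + 1) bits of a
 * 64-bit value. The shift is done on a 64-bit operand.
 */
uint64_t low_bits_mask(unsigned size)
{
    if (size >= 63) {
        /* Shifting a 64-bit value by 64 or more is undefined. */
        return UINT64_MAX;
    }
    return (UINT64_C(1) << (size + 1)) - 1;
}
```

With this helper the flagged statement would read `temp = temp & low_bits_mask(size);`. Whether `size` can actually reach 31 or more at those call sites is not stated in the report.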

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1631625/+subscriptions



[Qemu-devel] [Bug 1631773] Re: hw/dma/pl080.c:354: possible typo ?

2017-01-10 Thread Thomas Huth
Released with version 2.8.

** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1631773

Title:
  hw/dma/pl080.c:354: possible typo ?

Status in QEMU:
  Fix Released

Bug description:
  hw/dma/pl080.c:354:1: warning: V578 An odd bitwise operation detected:
  s->conf & (0x2 | 0x2). Consider verifying it.

  Source code is

 if (s->conf & (PL080_CONF_M1 | PL080_CONF_M1)) {

  Maybe better code

 if (s->conf & (PL080_CONF_M1 | PL080_CONF_M2)) {

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1631773/+subscriptions



Re: [Qemu-devel] [PATCH 1/4] ramblock-notifier: new

2017-01-10 Thread Fam Zheng
On Wed, 01/11 06:48, Stefan Weil wrote:
> On 01/11/17 06:38, Stefan Weil wrote:
> > Hi,
> > 
> > this fails for me when building with XEN support.
> > I noticed the failure when testing the latest HAXM patches.
> > See compiler output below.
> > 
> > Regards
> > Stefan
> 
> The patch compiles with this modification:
> 
> 
> diff --git a/xen-mapcache.c b/xen-mapcache.c
> index dc9b321491..31debdfb2c 100644
> --- a/xen-mapcache.c
> +++ b/xen-mapcache.c
> @@ -163,7 +163,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>  err = g_malloc0(nb_pfn * sizeof (int));
> 
>  if (entry->vaddr_base != NULL) {
> -ram_block_removed(entry->vaddr_base, entry->size);
> +ram_block_notify_remove(entry->vaddr_base, entry->size);
>  if (munmap(entry->vaddr_base, entry->size) != 0) {
>  perror("unmap fails");
>  exit(-1);
> @@ -189,7 +189,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>  entry->valid_mapping = (unsigned long *) g_malloc0(sizeof(unsigned
> long) *
>  BITS_TO_LONGS(size >> XC_PAGE_SHIFT));
> 
> -ram_block_added(entry->vaddr_base, entry->size);
> +ram_block_notify_add(entry->vaddr_base, entry->size);
>  bitmap_zero(entry->valid_mapping, nb_pfn);
>  for (i = 0; i < nb_pfn; i++) {
>  if (!err[i]) {
> @@ -399,7 +399,7 @@ static void
> xen_invalidate_map_cache_entry_unlocked(uint8_t *buffer)
>  }
> 
>  pentry->next = entry->next;
> -ram_block_removed(entry->vaddr_base, entry->size);
> +ram_block_notify_remove(entry->vaddr_base, entry->size);
>  if (munmap(entry->vaddr_base, entry->size) != 0) {
>  perror("unmap fails");
>  exit(-1);
> 

Yes, this matches what Paolo pointed out in his reply. I'll fix that in the next
revision.

Fam



[Qemu-devel] [Bug 1637447] Re: VNC/RFB: QEMU reports incorrect name (length)

2017-01-10 Thread Thomas Huth
Fix has been committed:
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=97efe4f961dcf5a0126

** Changed in: qemu
   Status: New => Fix Committed

https://bugs.launchpad.net/bugs/1637447

Title:
  VNC/RFB: QEMU reports incorrect name (length)

Status in QEMU:
  Fix Committed

Bug description:
  If the name of a machine (as set with the -name argument) has a length
  longer than 1024, (RFB) VNC clients will not receive a correct RFB
  ServerInit message.

  I suspect this is the problem:

  https://github.com/qemu/qemu/blob/v2.7.0-rc5/ui/vnc.c#L2459

  The return value of snprintf is used as the value for the name-length field 
in the ServerInit message.
  This is problematic for names that were truncated to 1024, as the length will 
now be bigger than the actual name.

  I think a quick fix would be to simply report min(size,1024) to the
  client...
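
The underlying C behaviour can be shown in isolation: `snprintf` returns the length the full output would have had, not the number of characters it stored, so the return value exceeds the stored length whenever truncation happens. The sketch below uses a hypothetical helper, `format_name`, and is not the vnc.c code. It clamps to `bufsize - 1`, the number of characters that fit next to the terminating NUL, which is an assumption about what the client should be told rather than the fix that was committed.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format name into buf and return the number of
 * characters actually stored (excluding the terminating NUL).
 */
size_t format_name(char *buf, size_t bufsize, const char *name)
{
    int ret = snprintf(buf, bufsize, "%s", name);

    if (ret < 0 || bufsize == 0) {
        return 0;
    }
    if ((size_t)ret >= bufsize) {
        /* Output was truncated: only bufsize - 1 characters fit. */
        return bufsize - 1;
    }
    return (size_t)ret;
}
```

With an 8-byte buffer, a 12-character name is stored as 7 characters, and 7 is the length that should accompany it on the wire.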

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1637447/+subscriptions



Re: [Qemu-devel] [PATCH v2 0/2] memory: extend "info mtree" with flat view dump

2017-01-10 Thread Peter Xu
On Wed, Dec 21, 2016 at 03:58:55PM +0800, Peter Xu wrote:
> v2:
> - fix a size error in patch 2
> - add r-b for Marc-André in patch 1

Ping? :)

-- peterx



Re: [Qemu-devel] [PATCH 1/4] ramblock-notifier: new

2017-01-10 Thread Stefan Weil

On 01/11/17 06:38, Stefan Weil wrote:

Hi,

this fails for me when building with XEN support.
I noticed the failure when testing the latest HAXM patches.
See compiler output below.

Regards
Stefan


The patch compiles with this modification:


diff --git a/xen-mapcache.c b/xen-mapcache.c
index dc9b321491..31debdfb2c 100644
--- a/xen-mapcache.c
+++ b/xen-mapcache.c
@@ -163,7 +163,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 err = g_malloc0(nb_pfn * sizeof (int));

 if (entry->vaddr_base != NULL) {
-ram_block_removed(entry->vaddr_base, entry->size);
+ram_block_notify_remove(entry->vaddr_base, entry->size);
 if (munmap(entry->vaddr_base, entry->size) != 0) {
 perror("unmap fails");
 exit(-1);
@@ -189,7 +189,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 entry->valid_mapping = (unsigned long *) g_malloc0(sizeof(unsigned 
long) *

 BITS_TO_LONGS(size >> XC_PAGE_SHIFT));

-ram_block_added(entry->vaddr_base, entry->size);
+ram_block_notify_add(entry->vaddr_base, entry->size);
 bitmap_zero(entry->valid_mapping, nb_pfn);
 for (i = 0; i < nb_pfn; i++) {
 if (!err[i]) {
@@ -399,7 +399,7 @@ static void 
xen_invalidate_map_cache_entry_unlocked(uint8_t *buffer)

 }

 pentry->next = entry->next;
-ram_block_removed(entry->vaddr_base, entry->size);
+ram_block_notify_remove(entry->vaddr_base, entry->size);
 if (munmap(entry->vaddr_base, entry->size) != 0) {
 perror("unmap fails");
 exit(-1);




Re: [Qemu-devel] [PATCH] Add DOS support for RTL8139

2017-01-10 Thread Alexey Kardashevskiy
On 08/01/17 22:54, Gerhard Wiesinger wrote:
> Signed-off-by: Gerhard Wiesinger 
> ---
>  hw/net/rtl8139.c | 288
> ++-
>  1 file changed, 264 insertions(+), 24 deletions(-)
> 
> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
> index f05e59c..5241fea 100644
> --- a/hw/net/rtl8139.c
> +++ b/hw/net/rtl8139.c
> @@ -48,6 +48,17 @@
>   *  2011-Mar-22  Benjamin Poirier:  Implemented VLAN offloading
>   */
>  +/*
> + * Testcases and successful regression tests:
> + * 1.) DOS RSET8139.EXE: EEPROM Test successful
> + * 2.) DOS RSET8139.EXE: Local loopback Test (Run Diagnostics On Board)
> + * 3.) DOS RSET8139.EXE: Remote loopback Test as Initiator (Run
> Diagnostics On Network)
> + * 4.) DOS RSET8139.EXE: Remote loopback Test as Responder (Run
> Diagnostics On Network)
> + * 5.) DOS driver: Loads and works
> + * 6.) Linux tests
> + * 7.) Windows tests
> + */
> +
>  /* For crc32 */
>  #include "qemu/osdep.h"
>  #include 
> @@ -130,6 +141,7 @@ enum RTL8139_registers {
>  NWayExpansion = 0x6A,
>  /* Undocumented registers, but required for proper operation. */
>  FIFOTMS = 0x70,/* FIFO Control and test. */
> +RX_ER = 0x72,   /* RX_ER Counter */
>  CSCR = 0x74,/* Chip Status and Configuration Register. */
>  PARA78 = 0x78,
>  PARA7c = 0x7c,/* Magic transceiver parameter register. */
> @@ -472,6 +484,8 @@ typedef struct RTL8139State {
>  uint16_t NWayLPAR;
>  uint16_t NWayExpansion;
>  +uint16_t Fifo_TMS;
> +
>  uint16_t CpCmd;
>  uint8_t  TxThresh;
>  @@ -757,15 +771,27 @@ static void rtl8139_write_buffer(RTL8139State *s,
> const void *buf, int size)
>   if (size > wrapped)
>  {
> +DPRINTF(">>> rx packet pci dma write "
> +"RxBuf=0x%x, RxBufAddr=0x%x, RxBuf+RxBufAddr=0x%x, "
> +"buf=%p, size=%i, wrapped=%i, size-wrapped=%i\n",
> +s->RxBuf, s->RxBufAddr, s->RxBuf + s->RxBufAddr,
> +buf, size, wrapped, size - wrapped
> +   );
>  pci_dma_write(d, s->RxBuf + s->RxBufAddr,
> -  buf, size-wrapped);
> +  buf, size - wrapped);


The patch has lots of cosmetic and unrelated changes like the one above; please
post them as a separate patch (if you really have to) and keep functional
changes apart from that.

Also, please use "git send-email" to make sure the patch is not damaged - I
could not apply this one as Thunderbird wrapped long lines. Thanks.


-- 
Alexey



Re: [Qemu-devel] [PATCH v4 4/4] migration: Fail migration blocker for --only-migratble

2017-01-10 Thread Ashijeet Acharya
On Tue, Jan 10, 2017 at 10:45 PM, Peter Maydell
 wrote:
> On 9 January 2017 at 17:02, Ashijeet Acharya  
> wrote:
>> migrate_add_blocker should rightly fail if the '--only-migratable'
>> option was specified and the device in use should not be able to
>> perform the action which results in an unmigratable VM.
>>
>> Make migrate_add_blocker return -EACCES in this case.
>
>> diff --git a/block/qcow.c b/block/qcow.c
>> index 11526a1..bdc6446 100644
>> --- a/block/qcow.c
>> +++ b/block/qcow.c
>> @@ -254,7 +254,10 @@ static int qcow_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>> bdrv_get_device_or_node_name(bs));
>>  ret = migrate_add_blocker(s->migration_blocker, errp);
>>  if (ret < 0) {
>> -error_free(s->migration_blocker);
>> +if (ret == -EACCES) {
>> +error_append_hint(errp, "Cannot use a node with qcow format as "
>> +  "it does not support live migration");
>> +}
>>  goto fail;
>>  }
>>
>
> The error handling for these call sites should look just like
> that for any other function call that takes an Error**:
>
> Error *local_err = NULL;
> [...]
> migrate_add_blocker(s->migration_blocker, &local_err);
> if (local_err) {
> error_propagate(errp, local_err);
> return; // or otherwise handle failure appropriately
> }
>

I think it will be better to make migrate_add_blocker() return the
error value as well; otherwise we will end up setting ret in all the
callers manually, which means repeating the same code at every
call site, right? Refer to qcow for an example...

> migrate_add_blocker() should just internally construct
> the error text and extra hint lines by looking at the
> text it can fish out of the s->migration_blocker argument
> and calling error_append_hint() itself.
>
Yes, I have done that now.

> The patch is also a bit odd because the error_free() calls
> were only added in patch 3/4, right? Generally adding
> lines of code in one patch and deleting them in the next
> is a bad idea.

Yes, I have removed that as well.

Ashijeet
>
> thanks
> -- PMM



Re: [Qemu-devel] [PATCH 1/4] ramblock-notifier: new

2017-01-10 Thread Stefan Weil

Hi,

this fails for me when building with XEN support.
I noticed the failure when testing the latest HAXM patches.
See compiler output below.

Regards
Stefan


On 12/20/16 17:31, Fam Zheng wrote:

From: Paolo Bonzini 

This adds a notify interface of ram block additions and removals.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Fam Zheng 
---

[...]

diff --git a/xen-mapcache.c b/xen-mapcache.c
index 8f3a592..dc9b321 100644
--- a/xen-mapcache.c
+++ b/xen-mapcache.c
@@ -163,6 +163,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 err = g_malloc0(nb_pfn * sizeof (int));

 if (entry->vaddr_base != NULL) {
+ram_block_removed(entry->vaddr_base, entry->size);
 if (munmap(entry->vaddr_base, entry->size) != 0) {
 perror("unmap fails");
 exit(-1);
@@ -188,6 +189,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 entry->valid_mapping = (unsigned long *) g_malloc0(sizeof(unsigned long) *
 BITS_TO_LONGS(size >> XC_PAGE_SHIFT));

+ram_block_added(entry->vaddr_base, entry->size);
 bitmap_zero(entry->valid_mapping, nb_pfn);
 for (i = 0; i < nb_pfn; i++) {
 if (!err[i]) {
@@ -397,6 +399,7 @@ static void xen_invalidate_map_cache_entry_unlocked(uint8_t 
*buffer)
 }

 pentry->next = entry->next;
+ram_block_removed(entry->vaddr_base, entry->size);
 if (munmap(entry->vaddr_base, entry->size) != 0) {
 perror("unmap fails");
 exit(-1);



  CC  x86_64-softmmu/xen-mapcache.o
/qemu/xen-mapcache.c: In function 'xen_remap_bucket':
/qemu/xen-mapcache.c:166:9: error: implicit declaration of function 
'ram_block_removed' [-Werror=implicit-function-declaration]

 ram_block_removed(entry->vaddr_base, entry->size);
 ^
/qemu/xen-mapcache.c:166:9: error: nested extern declaration of 
'ram_block_removed' [-Werror=nested-externs]
/qemu/xen-mapcache.c:192:5: error: implicit declaration of function 
'ram_block_added' [-Werror=implicit-function-declaration]

 ram_block_added(entry->vaddr_base, entry->size);
 ^~~
/qemu/xen-mapcache.c:192:5: error: nested extern declaration of 
'ram_block_added' [-Werror=nested-externs]

cc1: all warnings being treated as errors
/qemu/rules.mak:64: recipe for target 'xen-mapcache.o' failed
make[1]: *** [xen-mapcache.o] Error 1
Makefile:203: recipe for target 'subdir-x86_64-softmmu' failed
make: *** [subdir-x86_64-softmmu] Error 2




[Qemu-devel] [kvm-unit-tests PATCH v5 1/2] run_tests: put logs into per-test file

2017-01-10 Thread Peter Xu
We were using test.log before to keep all the test logs. This patch
creates one log file per test case under the logs/ directory with the name
"TESTNAME.log". Meanwhile, the logs from the previous run are kept in
logs.old/.

Rename scripts/functions.bash to scripts/common.bash to store some
more global variables.

Signed-off-by: Peter Xu 
---
 .gitignore  |  3 ++-
 Makefile|  5 ++---
 run_tests.sh| 18 +++---
 scripts/{functions.bash => common.bash} | 13 +++--
 scripts/mkstandalone.sh |  2 +-
 5 files changed, 27 insertions(+), 14 deletions(-)
 rename scripts/{functions.bash => common.bash} (75%)

diff --git a/.gitignore b/.gitignore
index 3155418..2213b9b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -12,7 +12,8 @@ cscope.*
 /lib/asm
 /config.mak
 /*-run
-/test.log
 /msr.out
 /tests
 /build-head
+/logs/
+/logs.old/
diff --git a/Makefile b/Makefile
index a32333b..844bacc 100644
--- a/Makefile
+++ b/Makefile
@@ -94,9 +94,8 @@ libfdt_clean:
$(LIBFDT_objdir)/.*.d
 
 distclean: clean libfdt_clean
-   $(RM) lib/asm config.mak $(TEST_DIR)-run test.log msr.out cscope.* \
- build-head
-   $(RM) -r tests
+   $(RM) lib/asm config.mak $(TEST_DIR)-run msr.out cscope.* build-head
+   $(RM) -r tests logs logs.old
 
 cscope: cscope_dirs = lib lib/libfdt lib/linux $(TEST_DIR) $(ARCH_LIBDIRS) lib/asm-generic
 cscope:
diff --git a/run_tests.sh b/run_tests.sh
index 254129d..b6a1059 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -7,7 +7,7 @@ if [ ! -f config.mak ]; then
 exit 1
 fi
 source config.mak
-source scripts/functions.bash
+source scripts/common.bash
 
 function usage()
 {
@@ -46,17 +46,21 @@ while getopts "g:hv" opt; do
 esac
 done
 
-RUNTIME_log_stderr () { cat >> test.log; }
+# RUNTIME_log_file will be configured later
+RUNTIME_log_stderr () { cat >> $RUNTIME_log_file; }
 RUNTIME_log_stdout () {
 if [ "$PRETTY_PRINT_STACKS" = "yes" ]; then
-./scripts/pretty_print_stacks.py $1 >> test.log
+./scripts/pretty_print_stacks.py $1 >> $RUNTIME_log_file
 else
-cat >> test.log
+cat >> $RUNTIME_log_file
 fi
 }
 
-
 config=$TEST_DIR/unittests.cfg
-rm -f test.log
-printf "BUILD_HEAD=$(cat build-head)\n\n" > test.log
+
+rm -rf $unittest_log_dir.old
+mv $unittest_log_dir $unittest_log_dir.old
+mkdir $unittest_log_dir
+echo "BUILD_HEAD=$(cat build-head)" > $unittest_log_dir/SUMMARY
+
 for_each_unittest $config run
diff --git a/scripts/functions.bash b/scripts/common.bash
similarity index 75%
rename from scripts/functions.bash
rename to scripts/common.bash
index ee9143c..2dd7360 100644
--- a/scripts/functions.bash
+++ b/scripts/common.bash
@@ -1,3 +1,12 @@
+: ${unittest_log_dir:=logs}
+
+function run_task()
+{
+   local testname="$2"
+
+   RUNTIME_log_file="${unittest_log_dir}/${testname}.log"
+   "$@"
+}
 
 function for_each_unittest()
 {
@@ -17,7 +26,7 @@ function for_each_unittest()
 
while read -u $fd line; do
if [[ "$line" =~ ^\[(.*)\]$ ]]; then
-   "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
+   run_task "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
testname=${BASH_REMATCH[1]}
smp=1
kernel=""
@@ -45,6 +54,6 @@ function for_each_unittest()
timeout=${BASH_REMATCH[1]}
fi
done
-   "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
+   run_task "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
exec {fd}<&-
 }
diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index d2bae19..3c1938e 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -5,7 +5,7 @@ if [ ! -f config.mak ]; then
exit 1
 fi
 source config.mak
-source scripts/functions.bash
+source scripts/common.bash
 
 escape ()
 {
-- 
2.7.4




[Qemu-devel] [kvm-unit-tests PATCH v5 0/2] run_tests: support concurrent test execution

2017-01-10 Thread Peter Xu
v5:
- add "/" at start/end of line where proper [Drew]
- remove useless newline in Makefile [Drew]
- don't check "mv" since it won't fail [Drew]
- avoid using '"s in (( )) [Drew]
- comment fix [Drew]

v4:
- add .gitignore for logs/ [Drew]
- instead of creating globals.bash, rename functions.bash to
  common.bash and put the globals inside [Drew]
- instead of removing logs/ outright when run_tests starts, move it
  to logs.old so we at least keep the previous run's results [Drew]
- s/ut_/unittest_/ throughout the whole series [Drew]
- remove unittest_log_summary var [Drew]
- remove Radim's s-b in patch 2 since it no longer fits [Drew]
- tiny fix on the usage lines [Drew]
- use bash arithmetic where proper [Drew]
- remove ut_in_parallel since not used [Drew]

v3:
- better handling for ctrl-c during run_tests.sh [Radim]

v2:
- patch 1: do per-test logging in all cases
- patch 2: throw away task.bash, instead, take Radim's suggestion to
  use jobs

run_tests.sh is getting slower, so it is time to speed it up. The
obvious problem is that the tests have always been run sequentially.

This series adds a new "-j" parameter. "-j 8" runs the tests on 8 task
queues, which speeds the script up considerably. A quick test of mine
shows a roughly 3x speedup with 8 task queues.
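
For readers who want to see the idea in isolation, here is a minimal
sketch of bounding the number of concurrent background jobs in bash. It
is not the code from the series: max_jobs, run_limited and fake_test are
hypothetical names, and "wait -n" assumes bash 4.3 or newer.

```bash
#!/usr/bin/env bash
# Sketch only: max_jobs, run_limited and fake_test are hypothetical names,
# not identifiers from the series.
max_jobs=3

run_limited()
{
	# Block while the number of running background jobs has reached the
	# limit; "wait -n" (bash 4.3 or newer) returns as soon as one job exits.
	while (( $(jobs -r | wc -l) >= max_jobs )); do
		wait -n
	done
	"$@" &
}

fake_test()
{
	sleep 0.1
	echo "done $1"
}

# Run six stand-in tests, at most three at a time, appending their output
# to a scratch file.
out=$(mktemp)
for i in 1 2 3 4 5 6; do
	run_limited fake_test "$i" >> "$out"
done
wait
cat "$out"
```

The order of the output lines is not deterministic, which is one reason
per-test log files are useful once tests run concurrently.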

Please review, thanks.

Peter Xu (2):
  run_tests: put logs into per-test file
  run_tests: allow run tests in parallel

 .gitignore  |  3 ++-
 Makefile|  5 ++---
 run_tests.sh| 30 +-
 scripts/{functions.bash => common.bash} | 27 +--
 scripts/mkstandalone.sh |  2 +-
 5 files changed, 51 insertions(+), 16 deletions(-)
 rename scripts/{functions.bash => common.bash} (63%)

-- 
2.7.4




[Qemu-devel] [kvm-unit-tests PATCH v5 2/2] run_tests: allow run tests in parallel

2017-01-10 Thread Peter Xu
run_tests.sh is getting slow. This patch tries to make it faster by
running the tests concurrently.

We provide a new "-j" parameter for run_tests.sh, which specifies how
many run queues to use for the tests. The default is 1 queue, which
preserves the old sequential behavior.

A quick test on my laptop (4 cores, 2 threads each) shows a roughly 3x
speedup:

   |------------------+-----------|
   | command          | time used |
   |------------------+-----------|
   | run_tests.sh     | 75s       |
   | run_tests.sh -j8 | 27s       |
   |------------------+-----------|

Signed-off-by: Peter Xu 
---
 run_tests.sh| 12 ++--
 scripts/common.bash | 16 +++-
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/run_tests.sh b/run_tests.sh
index b6a1059..477d4fb 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -13,10 +13,11 @@ function usage()
 {
 cat <

Re: [Qemu-devel] [kvm-unit-tests PATCH v4 2/2] run_tests: allow run tests in parallel

2017-01-10 Thread Peter Xu
On Tue, Jan 10, 2017 at 06:39:59PM +0100, Andrew Jones wrote:
> On Mon, Jan 09, 2017 at 12:04:54PM +0800, Peter Xu wrote:
> > run_task.sh is getting slow. This patch is trying to make it faster by
> > running the tests concurrently.
> > 
> > We provide a new parameter "-j" for the run_tests.sh, which can be used
> > to specify how many run queues we want for the tests. Default queue
> > length is 1, which is the old behavior.
> > 
> > Quick test on my laptop (4 cores, 2 threads each) shows 3x speed boost:
> > 
> >|-+---|
> >| command | time used |
> >|-+---|
> >| run_test.sh | 75s   |
> >| run_test.sh -j8 | 27s   |
> >|-+---|
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  run_tests.sh| 12 ++--
> >  scripts/common.bash | 16 +++-
> >  2 files changed, 25 insertions(+), 3 deletions(-)
> > 
> > diff --git a/run_tests.sh b/run_tests.sh
> > index 1e36d66..795cf73 100755
> > --- a/run_tests.sh
> > +++ b/run_tests.sh
> > @@ -13,10 +13,11 @@ function usage()
> >  {
> >  cat < >  
> > -Usage: $0 [-g group] [-h] [-v]
> > +Usage: $0 [-g group] [-h] [-v] [-j num_run_queues]
> >  
> >  -g: Only execute tests in the given group
> >  -h: Output this help text
> > +-j: Execute tests in parallel
> >  -v: Enables verbose mode
> >  
> >  Set the environment variable QEMU=/path/to/qemu-system-ARCH to
> > @@ -28,7 +29,7 @@ EOF
> >  RUNTIME_arch_run="./$TEST_DIR/run"
> >  source scripts/runtime.bash
> >  
> > -while getopts "g:hv" opt; do
> > +while getopts "g:hj:v" opt; do
> >  case $opt in
> >  g)
> >  only_group=$OPTARG
> > @@ -37,6 +38,13 @@ while getopts "g:hv" opt; do
> >  usage
> >  exit
> >  ;;
> > +j)
> > +unittest_run_queues=$OPTARG
> > +if ! (( "$unittest_run_queues" > 0 )); then
> 
> The equivalent expression without '!' is (( $var <= 0 )), and
> no need for the "'s around the var.
> 
> > +echo "Invalid -j option: $unittest_run_queues"
> > +exit 1
> > +fi
> > +;;
> >  v)
> >  verbose="yes"
> >  ;;
> > diff --git a/scripts/common.bash b/scripts/common.bash
> > index 2dd7360..83aebf8 100644
> > --- a/scripts/common.bash
> > +++ b/scripts/common.bash
> > @@ -1,11 +1,19 @@
> >  : ${unittest_log_dir:=logs}
> > +: ${unittest_run_queues:=1}
> >  
> >  function run_task()
> >  {
> > local testname="$2"
> >  
> > +   while (( "$(jobs | wc -l)" == $unittest_run_queues )); do
> 
> Why the "'s? The result must resolve to an int, otherwise we don't
> want ((...))
> 
> > +   # wait for any background test to finish
> > +   wait -n
> > +   done
> > +
> > RUNTIME_log_file="${unittest_log_dir}/${testname}.log"
> > -   "$@"
> > +
> > +   # start the testcase in the background
> > +   "$@" &
> >  }
> >  
> >  function for_each_unittest()
> > @@ -22,6 +30,8 @@ function for_each_unittest()
> > local accel
> > local timeout
> >  
> > +   trap "wait; exit 130" SIGINT
> > +
> > exec {fd}<"$unittests"
> >  
> > while read -u $fd line; do
> > @@ -55,5 +65,9 @@ function for_each_unittest()
> > fi
> > done
> > run_task "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" 
> > "$check" "$accel" "$timeout"
> > +
> > +   # wait all task finish
> 
> wait until all tasks finish

Fixing all three places, thanks!

-- peterx
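
The review remarks about (( )) can be illustrated with a small
standalone sketch. It is not the code from the patch:
valid_queue_count is a hypothetical helper, and the digits-only guard
and the 10# base prefix are additions assumed here to keep empty,
non-numeric, or zero-padded arguments away from surprising arithmetic
results.

```bash
#!/usr/bin/env bash
# Sketch only: valid_queue_count is a hypothetical helper, not part of the patch.
valid_queue_count()
{
	local n="$1"

	# Assumption added here: accept only plain decimal digits, so that an
	# empty or non-numeric argument never reaches the arithmetic context.
	[[ "$n" =~ ^[0-9]+$ ]] || return 1

	# Quotes are not needed inside (( )); 10# forces base 10 so a value
	# such as "08" is not parsed as an invalid octal number.
	(( 10#$n > 0 ))
}

for arg in 8 0 abc; do
	if valid_queue_count "$arg"; then
		echo "-j $arg: ok"
	else
		echo "Invalid -j option: $arg"
	fi
done
```

The function's exit status is that of the final (( )) command, so it can
be used directly in an if statement without a leading '!'.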



Re: [Qemu-devel] [PATCH] migration: re-active images when migration fails to complete

2017-01-10 Thread Hailiang Zhang

Ping?

Any comments? Or should I send a formal patch?

On 2016/12/22 10:56, Hailiang Zhang wrote:

On 2016/12/9 4:02, Dr. David Alan Gilbert wrote:

* Hailiang Zhang (zhang.zhanghaili...@huawei.com) wrote:

Hi,

On 2016/12/6 23:24, Dr. David Alan Gilbert wrote:

* Kevin Wolf (kw...@redhat.com) wrote:

On 19.11.2016 at 12:43, zhanghailiang wrote:

commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case
in which migration aborted QEMU because it did not regain control
of the images after an error occurred.

Actually, there is another case in that error path that aborts QEMU
for the same reason:
    migration_thread()
        migration_completion()
            bdrv_inactivate_all() ----------------> inactivate images
            qemu_savevm_state_complete_precopy()
                socket_writev_buffer() -----------> error because destination fails
                qemu_fflush() --------------------> set error on migration stream
            qemu_mutex_unlock_iothread() ---------> unlock
    qmp_migrate_cancel() -------------------------> user cancelled migration
        migrate_set_state() ----------------------> set migrate CANCELLING

Important to note here: qmp_migrate_cancel() is executed by a concurrent
thread, it doesn't depend on any code paths in migration_completion().


        migration_completion() -------------------> go on to fail_invalidate
            if (s->state == MIGRATION_STATUS_ACTIVE) -> skip this branch
    migration_thread() ---------------------------> break migration loop
        vm_start() -------------------------------> restart guest with inactive images

We fail to regain control of the images because we only regain it
while the migration state is "active", but here the user cancelled the
migration after noticing an error (for example, the libvirtd daemon on
the destination shut down unexpectedly).

Signed-off-by: zhanghailiang 
---
migration/migration.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index f498ab8..0c1ee6d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1752,7 +1752,8 @@ fail_invalidate:
/* If not doing postcopy, vm_start() will be called: let's regain
 * control on images.
 */
-if (s->state == MIGRATION_STATUS_ACTIVE) {


This if condition tries to check whether we ran the code path that
called bdrv_inactivate_all(), so that we only try to reactivate images
if we really inactivated them first.

The problem with it is that it ignores a possible concurrent
modification of s->state.


+if (s->state == MIGRATION_STATUS_ACTIVE ||
+s->state == MIGRATION_STATUS_CANCELLING) {


This adds another state that we could end up in after a concurrent
modification, so that even in this case we undo the inactivation.

However, it is no longer limited to the cases where we inactivated the
image. It also applies to other code paths (like the postcopy one) where
we didn't inactivate images.

What saves the patch is that bdrv_invalidate_cache() is a no-op for
block devices that aren't inactivated, so calling it more often than
necessary is okay.

But then, if we're going to rely on this, it would be much better to
just remove the if altogether. I can't say whether there are any other
possible values of s->state that we should consider, and by removing the
if we would be guaranteed to catch all of them.

If we don't want to rely on it, just keep a local bool that remembers
whether we inactivated images and check that here.


Error *local_err = NULL;

bdrv_invalidate_cache_all(&local_err);


So in summary, this is a horrible patch because it checks the wrong
thing, and I can't really say whether it covers everything it needs to
cover, but arguably it happens to correctly fix the outcome of a
previously failing case.

Normally I would reject such a patch and require a clean solution, but
then we're on the day of -rc3, so if you can't send v2 right away, we
might not have the time for it.

Tough call...


Hmm, this case is messy; I created this function having split it out
of the main loop a couple of years back but it did get more messy
with more s->state checks; as far as I can tell it's always
done the transition to COMPLETED at the end well after the locked
section, so there's always been that chance that cancellation sneaks
in just before or just after the locked section.

Some of the bad cases that can happen:
  a) A cancel sneaks in after the ACTIVE check but before or after
the locked section;  should we reactivate the disks? Well that
depends on whether the destination actually got the full migration
stream - we don't know!
   If the destination actually starts running we must not reactivate
   the disk on the source even if the CPU is stopped.



Yes, we didn't have 

Re: [Qemu-devel] [kvm-unit-tests PATCH v4 1/2] run_tests: put logs into per-test file

2017-01-10 Thread Peter Xu
On Tue, Jan 10, 2017 at 06:28:41PM +0100, Andrew Jones wrote:
> On Mon, Jan 09, 2017 at 12:04:53PM +0800, Peter Xu wrote:
> > We were using test.log before to keep all the test logs. This patch
> > creates one log file per test case under logs/ directory with name
> > "TESTNAME.log". Meanwhile, we will keep the last time log into
> > logs.old/.
> > 
> > Renaming scripts/functions.bash into scripts/common.bash to store some
> > more global variables.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  .gitignore  |  3 ++-
> >  Makefile|  4 ++--
> >  run_tests.sh| 22 +++---
> >  scripts/{functions.bash => common.bash} | 13 +++--
> >  scripts/mkstandalone.sh |  2 +-
> >  5 files changed, 31 insertions(+), 13 deletions(-)
> >  rename scripts/{functions.bash => common.bash} (75%)
> > 
> > diff --git a/.gitignore b/.gitignore
> > index 3155418..0dc9a39 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -12,7 +12,8 @@ cscope.*
> >  /lib/asm
> >  /config.mak
> >  /*-run
> > -/test.log
> >  /msr.out
> >  /tests
> >  /build-head
> > +logs
> > +logs.old
> 
> Without the leading '/' logs[.old] files or dirs in subdirs will
> also be ignored. You can add trailing '/' to all the dirs in
> .gitignore too in order to ensure we only ignore dirs.

Will do.

> 
> > diff --git a/Makefile b/Makefile
> > index a32333b..bd8843a 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -94,9 +94,9 @@ libfdt_clean:
> > $(LIBFDT_objdir)/.*.d
> >  
> >  distclean: clean libfdt_clean
> > -   $(RM) lib/asm config.mak $(TEST_DIR)-run test.log msr.out cscope.* \
> > +   $(RM) lib/asm config.mak $(TEST_DIR)-run msr.out cscope.* \
> >   build-head
> 
> Can move build-head up now

Yep.

[...]

> > -
> >  config=$TEST_DIR/unittests.cfg
> > -rm -f test.log
> > -printf "BUILD_HEAD=$(cat build-head)\n\n" > test.log
> > +
> > +rm -rf $unittest_log_dir.old
> > +if ! mv $unittest_log_dir $unittest_log_dir.log; then
> 
> You remove the destination above, so this should never fail.

Hmm looks so... :) Let me fix.

(And I used rm -rf again, but with ".old" it looks much safer so I
 dare to use it)

Thanks,

-- peterx
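
The log rotation discussed above (drop the old backup, keep the previous
run, start with an empty directory) can be sketched as a standalone
function. This is an illustration, not the code from the patch:
rotate_logs is a hypothetical name, and the check for a missing log
directory on a first run is an assumption added here that the thread
does not discuss.

```bash
#!/usr/bin/env bash
# Sketch only: rotate_logs is a hypothetical helper; "logs" and "logs.old"
# follow the directory names used in the patch.
rotate_logs()
{
	local dir="$1"

	# Drop the backup from two runs ago, then keep the previous run.
	rm -rf -- "$dir.old"
	# Assumption added here: on a first run there is nothing to move yet.
	if [ -d "$dir" ]; then
		mv -- "$dir" "$dir.old"
	fi
	mkdir -- "$dir"
}

# Demo in a scratch directory so no real logs are touched.
scratch=$(mktemp -d)
rotate_logs "$scratch/logs"
echo "first run" > "$scratch/logs/example.log"
rotate_logs "$scratch/logs"
ls "$scratch/logs.old"
```

After the second call, the file written during the first run lives under
logs.old/ and logs/ is empty again.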



Re: [Qemu-devel] [PATCH V4 net-next] vhost_net: device IOTLB support

2017-01-10 Thread Jason Wang



On 2017-01-11 12:32, Jason Wang wrote:

This patch implements Device IOTLB support for vhost kernel. This is
done through:

1) switch to dma helpers when mapping/unmapping vrings in the vhost code
2) introduce a set of VhostOps to:
   - set up the device IOTLB request callback
   - process device IOTLB requests
   - process device IOTLB invalidations
3) kernel support for the Device IOTLB API:
   - allow vhost-net to query the IOMMU IOTLB entry through eventfd
   - enable qemu to update a specified mapping of vhost through ioctl
   - enable invalidating a specified range of iova for the device
     IOTLB of vhost through ioctl. In the x86/intel_iommu case this is
     triggered through the iommu memory region notifier from the device
     IOTLB invalidation descriptor processing routine.

With all the above, kernel vhost_net can co-operate with userspace
IOMMU. For vhost-user, the support could be easily done on top by
implementing the VhostOps.

Cc: Michael S. Tsirkin
Signed-off-by: Jason Wang
---
Changes from V4:
- set iotlb callback only when IOMMU_PLATFORM is negotiated (fix
   vhost-user qtest failure)
- whitelist VIRTIO_F_IOMMU_PLATFORM instead of manually add it
- keep cpu_physical_memory_map() in vhost_memory_map()
---


Note: the patch is for qemu not net-next :)




[Qemu-devel] [PATCH V4 net-next] vhost_net: device IOTLB support

2017-01-10 Thread Jason Wang
This patch implements Device IOTLB support for vhost kernel. This is
done through:

1) switch to dma helpers when mapping/unmapping vrings in the vhost code
2) introduce a set of VhostOps to:
   - set up the device IOTLB request callback
   - process device IOTLB requests
   - process device IOTLB invalidations
3) kernel support for the Device IOTLB API:
   - allow vhost-net to query the IOMMU IOTLB entry through eventfd
   - enable qemu to update a specified mapping of vhost through ioctl
   - enable invalidating a specified range of iova for the device
     IOTLB of vhost through ioctl. In the x86/intel_iommu case this is
     triggered through the iommu memory region notifier from the device
     IOTLB invalidation descriptor processing routine.

With all the above, kernel vhost_net can co-operate with userspace
IOMMU. For vhost-user, the support could be easily done on top by
implementing the VhostOps.

Cc: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
Changes from V4:
- set iotlb callback only when IOMMU_PLATFORM is negotiated (fix
  vhost-user qtest failure)
- whitelist VIRTIO_F_IOMMU_PLATFORM instead of manually add it
- keep cpu_physical_memory_map() in vhost_memory_map()
---
 hw/net/vhost_net.c|   1 +
 hw/virtio/vhost-backend.c |  99 +++
 hw/virtio/vhost.c | 166 +-
 include/hw/virtio/vhost-backend.h |  13 +++
 include/hw/virtio/vhost.h |   4 +
 net/tap.c |   1 +
 6 files changed, 262 insertions(+), 22 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6280422..22874a9 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -52,6 +52,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_NET_F_MRG_RXBUF,
 VIRTIO_F_VERSION_1,
 VIRTIO_NET_F_MTU,
+VIRTIO_F_IOMMU_PLATFORM,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 272a5ec..be927b8 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -185,6 +185,102 @@ static int vhost_kernel_vsock_set_running(struct vhost_dev *dev, int start)
 }
 #endif /* CONFIG_VHOST_VSOCK */
 
+static void vhost_kernel_iotlb_read(void *opaque)
+{
+struct vhost_dev *dev = opaque;
+struct vhost_msg msg;
+ssize_t len;
+
+while ((len = read((uintptr_t)dev->opaque, &msg, sizeof msg)) > 0) {
+struct vhost_iotlb_msg *imsg = &msg.iotlb;
+if (len < sizeof msg) {
+error_report("Wrong vhost message len: %d", (int)len);
+break;
+}
+if (msg.type != VHOST_IOTLB_MSG) {
+error_report("Unknown vhost iotlb message type");
+break;
+}
+switch (imsg->type) {
+case VHOST_IOTLB_MISS:
+vhost_device_iotlb_miss(dev, imsg->iova,
+imsg->perm != VHOST_ACCESS_RO);
+break;
+case VHOST_IOTLB_UPDATE:
+case VHOST_IOTLB_INVALIDATE:
+error_report("Unexpected IOTLB message type");
+break;
+case VHOST_IOTLB_ACCESS_FAIL:
+/* FIXME: report device iotlb error */
+break;
+default:
+break;
+}
+}
+}
+
+static int vhost_kernel_update_device_iotlb(struct vhost_dev *dev,
+uint64_t iova, uint64_t uaddr,
+uint64_t len,
+IOMMUAccessFlags perm)
+{
+struct vhost_msg msg;
+msg.type = VHOST_IOTLB_MSG;
+msg.iotlb.iova =  iova;
+msg.iotlb.uaddr = uaddr;
+msg.iotlb.size = len;
+msg.iotlb.type = VHOST_IOTLB_UPDATE;
+
+switch (perm) {
+case IOMMU_RO:
+msg.iotlb.perm = VHOST_ACCESS_RO;
+break;
+case IOMMU_WO:
+msg.iotlb.perm = VHOST_ACCESS_WO;
+break;
+case IOMMU_RW:
+msg.iotlb.perm = VHOST_ACCESS_RW;
+break;
+default:
+g_assert_not_reached();
+}
+
+if (write((uintptr_t)dev->opaque, &msg, sizeof msg) != sizeof msg) {
+error_report("Fail to update device iotlb");
+return -EFAULT;
+}
+
+return 0;
+}
+
+static int vhost_kernel_invalidate_device_iotlb(struct vhost_dev *dev,
+uint64_t iova, uint64_t len)
+{
+struct vhost_msg msg;
+
+msg.type = VHOST_IOTLB_MSG;
+msg.iotlb.iova = iova;
+msg.iotlb.size = len;
+msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
+
+if (write((uintptr_t)dev->opaque, &msg, sizeof msg) != sizeof msg) {
+error_report("Fail to invalidate device iotlb");
+return -EFAULT;
+}
+
+return 0;
+}
+
+static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
+   int enabled)
+{
+if (enabled)
+qemu_set_fd_handler((uintptr_t)dev->opaque,
+

Re: [Qemu-devel] [PATCH V5 2/2] Add a new qmp command to do checkpoint, query xen replication status

2017-01-10 Thread Zhang Chen

Hi Eric,

I have sent V6 to fix the issues below. Do you have any comments?


Thanks

Zhang Chen


On 01/05/2017 04:40 AM, Eric Blake wrote:

On 12/27/2016 03:38 AM, Zhang Chen wrote:

We can call this qmp command to do checkpoint outside of qemu.
Like Xen colo need this function.

That sentence is awkward; maybe:

Xen colo will need this function.


Signed-off-by: Zhang Chen 
Signed-off-by: Wen Congyang 
---
  docs/qmp-commands.txt | 24 
  migration/colo.c  | 17 +
  qapi-schema.json  | 50 ++
  3 files changed, 91 insertions(+)

diff --git a/docs/qmp-commands.txt b/docs/qmp-commands.txt
index d182147..a146745 100644
--- a/docs/qmp-commands.txt
+++ b/docs/qmp-commands.txt
@@ -450,6 +450,30 @@ Example:
   "arguments": {"enable": true, "primary": false} }
  <- { "return": {} }
  
+query-xen-replication-status

+
+
+Query replication status when vm is running.
+
+Arguments: None.
+
+Example:
+
+-> { "execute": "query-xen-replication-status" }
+<- { "return": { "status": "normal" } }
+
+xen-do-checkpoint
+-
+
+Xen use this command to notify replication to do checkpoint.

s/use/uses/
s/do/trigger a/


+
+Arguments: None.
+
+Example:
+
+-> { "execute": "xen-do-checkpoint" }
+<- { "return": {} }
+
  migrate
  ---
  
diff --git a/migration/colo.c b/migration/colo.c

index 6fc2ade..7fc9f8a 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -127,6 +127,23 @@ void qmp_xen_set_replication(bool enable, bool primary,
  }
  }
  
+ReplicationErrorResult *qmp_query_xen_replication_status(Error **errp)

+{
+Error *err = NULL;
+ReplicationErrorResult *result = g_new0(ReplicationErrorResult, 1);
+replication_get_error_all(&err);
+result->status = err ?
+ REPLICATION_ERROR_STATUS_ERROR :
+ REPLICATION_ERROR_STATUS_NORMAL;
+error_free(err);
+return result;
+}
+
+void qmp_xen_do_checkpoint(Error **errp)
+{
+replication_do_checkpoint_all(errp);
+}
+
  static void colo_send_message(QEMUFile *f, COLOMessage msg,
Error **errp)
  {
diff --git a/qapi-schema.json b/qapi-schema.json
index 78802f4..6c162a5 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4695,6 +4695,56 @@
'data': { 'enable': 'bool', 'primary': 'bool', '*failover' : 'bool' } }
  
  ##

+# @ReplicationErrorStatus
+#
+# Describe the status of replication error.
+#
+# @error: Replication have a error.

s/have a/has an/


+#
+# @normal:  Replication running normal.

s/running normal/is running normally/


+#
+# Since 2.9
+##
+{ 'enum': 'ReplicationErrorStatus',
+  'data': [ 'error', 'normal' ] }
+
+##
+# @ReplicationErrorResult

The name of this struct is misleading (but harmless, since struct names
are not part of the API).  Better might be: ReplicationStatus


+#
+# The result format for 'xen-get-replication-error'.

Wrong name; you renamed the command 'query-xen-replication-status'.


+#
+# @status: enum of @ReplicationErrorStatus, which shows current
+#  replication error status
+#
+# Since 2.9
+##
+{ 'struct': 'ReplicationErrorResult',
+  'data': { 'status': 'ReplicationErrorStatus'} }
+
+##
+# @query-xen-replication-status
+#
+# Query replication error that occurs when the vm is running.

Since you might extend this in the future, I'd just leave it at:

Query replication status while the vm is running.


+#
+# Returns: A @ReplicationErrorResult objects showing the status.
+#
+# Since: 2.9
+##
+{ 'command': 'query-xen-replication-status',
+  'returns': 'ReplicationErrorResult' }

Again, the struct name is misleading.


+
+##
+# @xen-do-checkpoint
+#
+# Xen use this command to notify replication to do checkpoint.

s/use/uses/


+#
+# Returns: nothing.
+#
+# Since: 2.9
+##
+{ 'command': 'xen-do-checkpoint' }
+
+##
  # @GICCapability:
  #
  # The struct describes capability for a specific GIC (Generic


Getting closer!



--
Thanks
Zhang Chen






Re: [Qemu-devel] [PATCH] tap: fix memory leak on failure in net_init_tap()

2017-01-10 Thread Jason Wang



On 2017-01-11 03:21, Peter Maydell wrote:

Commit 091a6b2ac fixed most of the memory leaks in failure
paths in net_init_tap() reported by Coverity (CID 1356216),
but missed one. Fix it by deferring the allocation of
fds and vhost_fds until after the error check.

Signed-off-by: Peter Maydell 
---
  net/tap.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index b6896a7..6248e85 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -788,8 +788,8 @@ int net_init_tap(const Netdev *netdev, const char *name,
  return -1;
  }
  } else if (tap->has_fds) {
-char **fds = g_new0(char *, MAX_TAP_QUEUES);
-char **vhost_fds = g_new0(char *, MAX_TAP_QUEUES);
+char **fds;
+char **vhost_fds;
  int nfds, nvhosts;
  
  if (tap->has_ifname || tap->has_script || tap->has_downscript ||

@@ -801,6 +801,9 @@ int net_init_tap(const Netdev *netdev, const char *name,
  return -1;
  }
  
+fds = g_new0(char *, MAX_TAP_QUEUES);

+vhost_fds = g_new0(char *, MAX_TAP_QUEUES);
+
  nfds = get_fds(tap->fds, fds, MAX_TAP_QUEUES);
  if (tap->has_vhostfds) {
  nvhosts = get_fds(tap->vhostfds, vhost_fds, MAX_TAP_QUEUES);


Applied to -net.

Thanks



Re: [Qemu-devel] [PULL 00/65] tcg 2.9 patch queue

2017-01-10 Thread no-reply
Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20170111021820.24416-1-...@twiddle.net
Subject: [Qemu-devel] [PULL 00/65] tcg 2.9 patch queue

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20170111021820.24416-1-...@twiddle.net -> 
patchew/20170111021820.24416-1-...@twiddle.net
Switched to a new branch 'test'
0f88f5c tcg/i386: Handle ctpop opcode
1f3246b tcg/ppc: Handle ctpop opcode
4370d94 tcg: Use ctpop to generate ctz if needed
8f62669 tests: New test-bitcnt
69127b1 qemu/host-utils.h: Reduce the operation count in the fallback ctpop
1350149 target-i386: Use ctpop helper
805f523 target-tilegx: Use ctpop helper
2cc6764 target-sparc: Use ctpop helper
4e44386 target-s390x: Avoid a loop for popcnt
6c03e5a target-ppc: Use ctpop helper
fd08074 target-alpha: Use ctpop helper
e0ee946 tcg: Add opcode for ctpop
0600bbc target-xtensa: Use clrsb helper
03fe872 target-tricore: Use clrsb helper
769636c target-arm: Use clrsb helper
606349a tcg: Add helpers for clrsb
b67b0f6 tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR
5c30589 tcg/i386: Handle ctz and clz opcodes
c6f6448 tcg/i386: Allow bmi2 shiftx to have non-matching operands
c6bfd03 tcg/i386: Hoist common arguments in tcg_out_op
28b7dc6 tcg/i386: Fuly convert tcg_target_op_def
4db1625 tcg/s390: Handle clz opcode
97bfc69 tcg/mips: Handle clz opcode
98030d0 tcg/arm: Handle ctz and clz opcodes
d8e9db5 tcg/aarch64: Handle ctz and clz opcodes
7933434 tcg/ppc: Handle ctz and clz opcodes
b5e0a81 target-i386: Use clz and ctz opcodes
2fdd2e2 target-arm: Use clz opcode
b010d9e target-xtensa: Use clz opcode
a314d9c target-unicore32: Use clz opcode
1434c43 target-tricore: Use clz opcode
0a8fb83 target-tilegx: Use clz and ctz opcodes
2c0ce2e target-s390x: Use clz opcode
1bb6181 target-ppc: Use clz and ctz opcodes
ebc74b7 target-openrisc: Use clz and ctz opcodes
9064b6b target-mips: Use clz opcode
7f35208 target-microblaze: Use clz opcode
813c4d8 target-cris: Use clz opcode
a836c13 target-alpha: Use the ctz and clz opcodes
f6f3d2d disas/ppc: Handle popcnt and cnttz
926a4aa disas/i386.c: Handle tzcnt
6e2652a tcg: Add clz and ctz opcodes
b29aa2d tcg: Allow an operand to be matching or a constant
69d5374 tcg: Pass the opcode width to target_parse_constraint
22172f4 tcg: Transition flat op_defs array to a target callback
60a4dd8 tcg: Add markup for output requires new register
49d060f tcg/optimize: Fold movcond 0/1 into setcond
c6e6941 target-s390x: Use the new deposit and extract ops
167b3dd target-ppc: Use the new deposit and extract ops
a2b1d9f target-mips: Use the new extract op
45a06b9 target-i386: Use new deposit and extract ops
4f4c27f target-arm: Use new deposit and extract ops
3570370 target-alpha: Use deposit and extract ops
668aac4 tcg/s390: Support deposit into zero
46dd947 tcg/s390: Implement field extraction opcodes
84ba4fb tcg/s390: Expose host facilities to tcg-target.h
097af24 tcg/ppc: Implement field extraction opcodes
f89420b tcg/mips: Implement field extraction opcodes
af85983 tcg/i386: Implement field extraction opcodes
7241ba6 tcg/arm: Implement field extraction opcodes
ef59c05 tcg/arm: Move isa detection to tcg-target.h
0e1c309 tcg/aarch64: Implement field extraction opcodes
e8078b3 tcg: Add deposit_z expander
31acf47 tcg: Minor adjustments to deposit expanders
1090144 tcg: Add field extraction primitives

=== OUTPUT BEGIN ===
Checking PATCH 1/65: tcg: Add field extraction primitives...
ERROR: spaces required around that ':' (ctx:VxE)
#143: FILE: tcg/optimize.c:881:
+CASE_OP_32_64(extract):
   ^

ERROR: spaces required around that ':' (ctx:VxE)
#149: FILE: tcg/optimize.c:887:
+CASE_OP_32_64(sextract):
^

ERROR: spaces required around that ':' (ctx:VxE)
#163: FILE: tcg/optimize.c:1064:
+CASE_OP_32_64(extract):
   ^

ERROR: spaces required around that ':' (ctx:VxE)
#171: FILE: tcg/optimize.c:1072:
+CASE_OP_32_64(sextract):
^

ERROR: space prohibited after that '&&' (ctx:ExW)
#275: FILE: tcg/tcg-op.c:582:
+&& TCG_TARGET_extract_i32_valid(ofs, len)) {
 ^

ERROR: space prohibited after that '&&' (ctx:ExW)
#338: FILE: tcg/tcg-op.c:645:
+&& TCG_TARGET_extract_i32_valid(ofs, len)) {
 ^

ERROR: space 

[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers

2017-01-10 Thread Rafael David Tinoco
Xenial verification (with the 3.13 kernel from Trusty, since a <= 3.17
kernel is needed). This verifies that the Ubuntu Cloud Archive
repositories will be all right with these new packages (from Xenial /
Yakkety).

## CURRENT

inaddy@(xkvm01):~$ apt-cache policy qemu-kvm
qemu-kvm:
  Installed: 1:2.5+dfsg-5ubuntu10.6
  Candidate: 1:2.5+dfsg-5ubuntu10.6

xkvm01 (sender):

Jan 11 01:07:54 xkvm01 kernel: type=1400 audit(1484104074.014:13): apparmor="DENIED" operation="mknod" profile="libvirt-7cdcb6c0-f85e-4639-912b-c785bd5992d9" name="/tmp/memfd-Jh5UhR" pid=2535 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=112 ouid=112

$ sudo virsh migrate --live guest qemu+ssh://xkvm02/system
error: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory

xkvm02 (receiver):

Jan 11 01:08:23 xkvm02 kernel: type=1400 audit(1484104103.888:53):
apparmor="DENIED" operation="mknod" profile="libvirt-7cdcb6c0-f85e-4639
-912b-c785bd5992d9" name="/tmp/memfd-fc9rij" pid=2000 comm="qemu-
system-x86" requested_mask="c" denied_mask="c" fsuid=112 ouid=112

Note: the check was being done in the wrong place and in the wrong
situation, as I showed in this bug.

## PROPOSED


inaddy@(xkvm01):~$ apt-cache policy qemu-kvm
qemu-kvm:
  Installed: 1:2.5+dfsg-5ubuntu10.7
  Candidate: 1:2.5+dfsg-5ubuntu10.7

xkvm01 (sender):



xkvm02 (receiver):

inaddy@(xkvm02):~$ virsh list
 Id    Name                           State
----------------------------------------------------
 1     guest                          running



It's all good.

verification-xenial-done

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1626972

Title:
  QEMU memfd_create fallback mechanism change for security drivers

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive mitaka series:
  Fix Committed
Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu source package in Xenial:
  Fix Committed
Status in qemu source package in Yakkety:
  Fix Committed
Status in qemu source package in Zesty:
  Fix Released

Bug description:
  [Impact]

   * Updated QEMU (from UCA) live migration doesn't work with 3.13 kernels.
   * QEMU code checks if it can create /tmp/memfd-XXX files wrongly.
   * Apparmor will block access to /tmp/ and QEMU will fail migrating.

  [Test Case]

   * Install 2 Ubuntu Trusty (3.13) + UCA Mitaka + apparmor rules.
   * Try to live-migrate from one to the other.
   * Apparmor will block creation of /tmp/memfd-XXX files.

  [Regression Potential]

   Pros:
   * Exhaustively tested this.
   * Worked with upstream on this fix. 
   * I'm implementing new vhost log mechanism for upstream.
   * One line change to a blocker that is already broken.

   Cons:
   * Could break live migration in other circumstances.

  [Other Info]

   * Christian Ehrhardt has been following this.

  ORIGINAL DESCRIPTION:

  When libvirt starts using apparmor, creating apparmor profiles for
  every virtual machine created on the compute nodes, mitaka qemu (2.5,
  and upstream as well) uses a fallback mechanism for creating shared
  memory for live migrations. On 3.13 kernels, which don't have the
  memfd_create() system call, this fallback mechanism tries to create
  files in the /tmp/ directory and fails, causing live migration not to
  work.

  Trusty with kernel 3.13 + Mitaka with qemu 2.5 + apparmor capability =
  can't live migrate.

  In qemu 2.5, the logic is in:

  void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals,
                         int *fd)
  {
  if (memfd_create)... ### only works with HWE kernels

  else ### 3.13 kernels, gets blocked by apparmor
     tmpdir = g_get_tmp_dir
     ...
     mfd = mkstemp(fname)
  }
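
The two-path logic outlined above can be sketched as a standalone program fragment. This is only an illustration under stated assumptions, not QEMU's qemu_memfd_alloc(): the helper name shm_fd_create() is hypothetical, the raw syscall is used only where the build headers define SYS_memfd_create, and the temporary-file path mirrors the /tmp/memfd-XXX pattern seen in the AppArmor denials.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of QEMU): return a file descriptor
 * backed by 'size' bytes of shareable memory, or -1 on failure.
 */
static int shm_fd_create(const char *name, size_t size)
{
    int fd = -1;

#ifdef SYS_memfd_create
    /* Preferred path: fails on kernels without memfd_create(). */
    fd = (int)syscall(SYS_memfd_create, name, 0u);
#endif
    if (fd < 0) {
        /*
         * Fallback path: an unlinked temporary file.  This is the step
         * a confining profile without /tmp access would deny.
         */
        const char *tmpdir = getenv("TMPDIR");
        char path[4096];

        snprintf(path, sizeof(path), "%s/memfd-XXXXXX",
                 tmpdir ? tmpdir : "/tmp");
        fd = mkstemp(path);
        if (fd < 0) {
            return -1;
        }
        unlink(path);
    }
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would typically mmap() the returned descriptor with MAP_SHARED. The point of the sketch is only that the fallback is reached silently whenever the first path fails, which is why the denial shows up as a mknod on /tmp rather than as a missing system call.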

  And you can see the errors:

  From the host trying to send the virtual machine:

  2016-08-15 16:36:26.160 1974 ERROR nova.virt.libvirt.driver 
[req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 
133ebc3585c041aebaead8c062cd6511 - - -] [instance: 
2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Migration operation has aborted
  2016-08-15 16:36:26.248 1974 ERROR nova.virt.libvirt.driver 
[req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 
133ebc3585c041aebaead8c062cd6511 - - -] [instance: 
2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Live Migration failure: internal error: 
unable to execute QEMU command 'migrate': Migration disabled: failed to 
allocate shared memory

  From the host trying to receive the virtual machine:

  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.356794] type=1400 
audit(1471289779.791:72): apparmor="STATUS" operation="profile_load" 
profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" 
pid=12565 comm="apparmor_parser"
  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.357048] type=1400 
audit(1471289779.791:73): apparmor="STATUS" operation="profile_load" 
profile="unconfined" 

Re: [Qemu-devel] [PATCH v5 1/6] Pass generic CPUState to gen_intermediate_code()

2017-01-10 Thread Richard Henderson

On 12/28/2016 08:27 AM, Lluís Vilanova wrote:

Needed to implement a target-agnostic gen_intermediate_code() in the
future.

Signed-off-by: Lluís Vilanova 
Reviewed-by: David Gibson 
---


Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH] hw/net/dp8393x: Avoid unintentional sign extensions on addresses

2017-01-10 Thread Jason Wang



On 2017年01月10日 02:43, Peter Maydell wrote:

The dp8393x has several 32-bit values which are formed by concatenating
two 16 bit device register values. Attempting to do these inline
with ((s->reg[HI] << 16) | s->reg[LO]) can result in an unintended
sign extension because "x << 16" is of type 'int' even though s->reg
is unsigned, and so if the expression is used in a context where
it is cast to uint64_t the value is incorrectly sign-extended.
Fix this by using accessor functions with a uint32_t return type;
this also makes the code a bit easier to read.

This should fix Coverity issues 1307765, 1307766, 1307767, 1307768.

(To avoid having a ctda read function only used in a DPRINTF,
we move the DPRINTF down slightly so it can use the ttda function.)

Signed-off-by: Peter Maydell
---
Disclaimer: only compile tested as this device only exists on
the MIPS magnum/pica61 boards and I don't have an image for them.


Applied to -net.

Thanks
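
The sign-extension problem described in the quoted commit message can be illustrated in isolation. The following is a minimal sketch, not the dp8393x code: the function names are hypothetical, and the int32_t conversion assumes the usual two's-complement wraparound (it is implementation-defined in C for values above INT32_MAX).

```c
#include <stdint.h>

/*
 * Accessor style: widen to uint32_t before shifting, so the
 * concatenated value never has type 'int'.
 */
static uint32_t concat16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Widening an unsigned 32-bit value zero-extends it. */
static uint64_t widen_as_u32(uint16_t hi, uint16_t lo)
{
    return concat16(hi, lo);
}

/*
 * Models the problematic case: the same bit pattern held in an 'int'
 * and then widened to 64 bits is sign-extended.
 */
static uint64_t widen_as_int(uint16_t hi, uint16_t lo)
{
    int32_t as_int = (int32_t)concat16(hi, lo);

    return (uint64_t)as_int;
}
```

With hi = 0x8000 and lo = 0x1234, widen_as_u32() yields 0x80001234 while widen_as_int() yields 0xffffffff80001234, which is the kind of bogus high-address value a uint32_t accessor avoids.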



[Qemu-devel] [PULL 62/65] tests: New test-bitcnt

2017-01-10 Thread Richard Henderson
From: Alex Bennée 

Add some unit tests for bit count functions (currently only ctpop). As
the routines are based on the Hacker's Delight optimisations I based
the test patterns on their tests.

Signed-off-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tests/.gitignore   |   1 +
 tests/Makefile.include |   2 +
 tests/test-bitcnt.c| 140 +
 3 files changed, 143 insertions(+)
 create mode 100644 tests/test-bitcnt.c

diff --git a/tests/.gitignore b/tests/.gitignore
index e9b182e..7357d0a 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -13,6 +13,7 @@ rcutorture
 test-aio
 test-base64
 test-bitops
+test-bitcnt
 test-blockjob
 test-blockjob-txn
 test-bufferiszero
diff --git a/tests/Makefile.include b/tests/Makefile.include
index f776404..2029013 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -81,6 +81,7 @@ gcov-files-test-qht-y = util/qht.c
 check-unit-y += tests/test-qht-par$(EXESUF)
 gcov-files-test-qht-par-y = util/qht.c
 check-unit-y += tests/test-bitops$(EXESUF)
+check-unit-y += tests/test-bitcnt$(EXESUF)
 check-unit-$(CONFIG_HAS_GLIB_SUBPROCESS_TESTS) += 
tests/test-qdev-global-props$(EXESUF)
 check-unit-y += tests/check-qom-interface$(EXESUF)
 gcov-files-check-qom-interface-y = qom/object.c
@@ -571,6 +572,7 @@ tests/test-opts-visitor$(EXESUF): tests/test-opts-visitor.o 
$(test-qapi-obj-y)
 
 tests/test-mul64$(EXESUF): tests/test-mul64.o $(test-util-obj-y)
 tests/test-bitops$(EXESUF): tests/test-bitops.o $(test-util-obj-y)
+tests/test-bitcnt$(EXESUF): tests/test-bitcnt.o $(test-util-obj-y)
 tests/test-crypto-hash$(EXESUF): tests/test-crypto-hash.o $(test-crypto-obj-y)
 tests/test-crypto-hmac$(EXESUF): tests/test-crypto-hmac.o $(test-crypto-obj-y)
 tests/test-crypto-cipher$(EXESUF): tests/test-crypto-cipher.o 
$(test-crypto-obj-y)
diff --git a/tests/test-bitcnt.c b/tests/test-bitcnt.c
new file mode 100644
index 0000000..e153dcb
--- /dev/null
+++ b/tests/test-bitcnt.c
@@ -0,0 +1,140 @@
+/*
+ * Test bit count routines
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+
+struct bitcnt_test_data {
+/* value to count */
+union {
+uint8_t  w8;
+uint16_t w16;
+uint32_t w32;
+uint64_t w64;
+} value;
+/* expected result */
+int popct;
+};
+
+struct bitcnt_test_data eight_bit_data[] = {
+{ { .w8 = 0x00 }, .popct=0 },
+{ { .w8 = 0x01 }, .popct=1 },
+{ { .w8 = 0x03 }, .popct=2 },
+{ { .w8 = 0x04 }, .popct=1 },
+{ { .w8 = 0x0f }, .popct=4 },
+{ { .w8 = 0x3f }, .popct=6 },
+{ { .w8 = 0x40 }, .popct=1 },
+{ { .w8 = 0xf0 }, .popct=4 },
+{ { .w8 = 0x7f }, .popct=7 },
+{ { .w8 = 0x80 }, .popct=1 },
+{ { .w8 = 0xf1 }, .popct=5 },
+{ { .w8 = 0xfe }, .popct=7 },
+{ { .w8 = 0xff }, .popct=8 },
+};
+
+static void test_ctpop8(void)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(eight_bit_data); i++) {
+struct bitcnt_test_data *d = &eight_bit_data[i];
+g_assert(ctpop8(d->value.w8)==d->popct);
+}
+}
+
+struct bitcnt_test_data sixteen_bit_data[] = {
+{ { .w16 = 0x0000 }, .popct=0 },
+{ { .w16 = 0x0001 }, .popct=1 },
+{ { .w16 = 0x0003 }, .popct=2 },
+{ { .w16 = 0x000f }, .popct=4 },
+{ { .w16 = 0x003f }, .popct=6 },
+{ { .w16 = 0x00f0 }, .popct=4 },
+{ { .w16 = 0x0f0f }, .popct=8 },
+{ { .w16 = 0x1f1f }, .popct=10 },
+{ { .w16 = 0x4000 }, .popct=1 },
+{ { .w16 = 0x4001 }, .popct=2 },
+{ { .w16 = 0x7000 }, .popct=3 },
+{ { .w16 = 0x7fff }, .popct=15 },
+};
+
+static void test_ctpop16(void)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(sixteen_bit_data); i++) {
+struct bitcnt_test_data *d = &sixteen_bit_data[i];
+g_assert(ctpop16(d->value.w16)==d->popct);
+}
+}
+
+struct bitcnt_test_data thirtytwo_bit_data[] = {
+{ { .w32 = 0x00000000 }, .popct=0 },
+{ { .w32 = 0x0001 }, .popct=1 },
+{ { .w32 = 0x000f }, .popct=4 },
+{ { .w32 = 0x0f0f }, .popct=8 },
+{ { .w32 = 0x1f1f }, .popct=10 },
+{ { .w32 = 0x4001 }, .popct=2 },
+{ { .w32 = 0x7000 }, .popct=3 },
+{ { .w32 = 0x7fff }, .popct=15 },
+{ { .w32 = 0x }, .popct=16 },
+{ { .w32 = 0x }, .popct=16 },
+{ { .w32 = 0xff00 }, .popct=8 },
+{ { .w32 = 0xc0c0c0c0 }, .popct=8 },
+{ { .w32 = 0x0ff0 }, .popct=24 },
+{ { .w32 = 0x8000 }, .popct=1 },
+{ { .w32 = 0xffffffff }, .popct=32 },
+};
+
+static void test_ctpop32(void)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(thirtytwo_bit_data); i++) {
+struct bitcnt_test_data *d = &thirtytwo_bit_data[i];
+g_assert(ctpop32(d->value.w32)==d->popct);
+}
+}
+
+struct bitcnt_test_data sixtyfour_bit_data[] = {
+{ { .w64 = 

Re: [Qemu-devel] [PATCH v5 3/6] target: [tcg] Add generic translation framework

2017-01-10 Thread Richard Henderson

On 12/28/2016 08:28 AM, Lluís Vilanova wrote:

+typedef enum DisasJumpType {
+DJ_NEXT,
+DJ_TOO_MANY,
+DJ_TARGET,
+} DisasJumpType;


I wonder if enums like DJ_TARGET_{0..N} wouldn't be better, rather than doing 
addition in the target-specific names.



+typedef struct DisasContextBase {
+TranslationBlock *tb;
+bool singlestep_enabled;
+target_ulong pc_first;
+target_ulong pc_next;
+DisasJumpType jmp_type;
+unsigned int num_insns;
+} DisasContextBase;


Sort the bool to the end to minimize padding.


+/* Get first breakpoint matching a PC */
+static inline CPUBreakpoint *cpu_breakpoint_get(CPUState *cpu, vaddr pc,
+CPUBreakpoint *bp)
+{
+if (likely(bp == NULL)) {
+if (unlikely(!QTAILQ_EMPTY(&cpu->breakpoints))) {
+QTAILQ_FOREACH(bp, &cpu->breakpoints, entry) {
+if (bp->pc == pc) {
+return bp;
+}
+}
+}
+} else {
+QTAILQ_FOREACH_CONTINUE(bp, entry) {
+if (bp->pc == pc) {
+return bp;
+}
+}
+}
+return NULL;
+}


Any reason not to put the QTAILQ_FOREACH directly into gen_intermediate_code, 
rather than indirect it like this?  I don't see this abstraction as an improvement.



r~



[Qemu-devel] [PULL 64/65] tcg/ppc: Handle ctpop opcode

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |  5 +++--
 tcg/ppc/tcg-target.inc.c | 12 +++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 57e66cf..abd8b3d 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -49,6 +49,7 @@ typedef enum {
 TCG_AREG0 = TCG_REG_R27
 } TCGReg;
 
+extern bool have_isa_2_06;
 extern bool have_isa_3_00;
 
 /* optional instructions automatically implemented */
@@ -72,7 +73,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_nor_i32  1
 #define TCG_TARGET_HAS_clz_i32  1
 #define TCG_TARGET_HAS_ctz_i32  have_isa_3_00
-#define TCG_TARGET_HAS_ctpop_i32  0
+#define TCG_TARGET_HAS_ctpop_i32  have_isa_2_06
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 0
@@ -108,7 +109,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_nor_i64  1
 #define TCG_TARGET_HAS_clz_i64  1
 #define TCG_TARGET_HAS_ctz_i64  have_isa_3_00
-#define TCG_TARGET_HAS_ctpop_i64  0
+#define TCG_TARGET_HAS_ctpop_i64  have_isa_2_06
 #define TCG_TARGET_HAS_deposit_i64  1
 #define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 766bc1a..64f67d2 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -83,7 +83,7 @@ static tcg_insn_unit *tb_ret_addr;
 
 #include "elf.h"
 
-static bool have_isa_2_06;
+bool have_isa_2_06;
 bool have_isa_3_00;
 
 #define HAVE_ISA_2_06  have_isa_2_06
@@ -457,6 +457,8 @@ static int tcg_target_const_match(tcg_target_long val, 
TCGType type,
 #define CNTLZD XO31( 58)
 #define CNTTZW XO31(538)
 #define CNTTZD XO31(570)
+#define CNTPOPW XO31(378)
+#define CNTPOPD XO31(506)
 #define ANDC   XO31( 60)
 #define ORC    XO31(412)
 #define EQV    XO31(284)
@@ -2149,6 +2151,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, 
const TCGArg *args,
 tcg_out_cntxz(s, TCG_TYPE_I32, CNTTZW, args[0], args[1],
   args[2], const_args[2]);
 break;
+case INDEX_op_ctpop_i32:
+tcg_out32(s, CNTPOPW | SAB(args[1], args[0], 0));
+break;
 
 case INDEX_op_clz_i64:
 tcg_out_cntxz(s, TCG_TYPE_I64, CNTLZD, args[0], args[1],
@@ -2158,6 +2163,9 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, 
const TCGArg *args,
 tcg_out_cntxz(s, TCG_TYPE_I64, CNTTZD, args[0], args[1],
   args[2], const_args[2]);
 break;
+case INDEX_op_ctpop_i64:
+tcg_out32(s, CNTPOPD | SAB(args[1], args[0], 0));
+break;
 
 case INDEX_op_mul_i32:
 a0 = args[0], a1 = args[1], a2 = args[2];
@@ -2573,6 +2581,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
 { INDEX_op_nor_i32, { "r", "r", "r" } },
 { INDEX_op_clz_i32, { "r", "r", "rZW" } },
 { INDEX_op_ctz_i32, { "r", "r", "rZW" } },
+{ INDEX_op_ctpop_i32, { "r", "r" } },
 
 { INDEX_op_shl_i32, { "r", "r", "ri" } },
 { INDEX_op_shr_i32, { "r", "r", "ri" } },
@@ -2623,6 +2632,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
 { INDEX_op_nor_i64, { "r", "r", "r" } },
 { INDEX_op_clz_i64, { "r", "r", "rZW" } },
 { INDEX_op_ctz_i64, { "r", "r", "rZW" } },
+{ INDEX_op_ctpop_i64, { "r", "r" } },
 
 { INDEX_op_shl_i64, { "r", "r", "ri" } },
 { INDEX_op_shr_i64, { "r", "r", "ri" } },
-- 
2.9.3




[Qemu-devel] [PULL 61/65] qemu/host-utils.h: Reduce the operation count in the fallback ctpop

2017-01-10 Thread Richard Henderson
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/qemu/host-utils.h | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index 46187bb..96288d0 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -327,7 +327,7 @@ static inline int ctpop8(uint8_t val)
 #else
 val = (val & 0x55) + ((val >> 1) & 0x55);
 val = (val & 0x33) + ((val >> 2) & 0x33);
-val = (val & 0x0f) + ((val >> 4) & 0x0f);
+val = (val + (val >> 4)) & 0x0f;
 
 return val;
 #endif
@@ -344,8 +344,8 @@ static inline int ctpop16(uint16_t val)
 #else
 val = (val & 0x5555) + ((val >> 1) & 0x5555);
 val = (val & 0x3333) + ((val >> 2) & 0x3333);
-val = (val & 0x0f0f) + ((val >> 4) & 0x0f0f);
-val = (val & 0x00ff) + ((val >> 8) & 0x00ff);
+val = (val + (val >> 4)) & 0x0f0f;
+val = (val + (val >> 8)) & 0x00ff;
 
 return val;
 #endif
@@ -360,11 +360,10 @@ static inline int ctpop32(uint32_t val)
 #if QEMU_GNUC_PREREQ(3, 4)
 return __builtin_popcount(val);
 #else
-val = (val & 0x55555555) + ((val >>  1) & 0x55555555);
-val = (val & 0x33333333) + ((val >>  2) & 0x33333333);
-val = (val & 0x0f0f0f0f) + ((val >>  4) & 0x0f0f0f0f);
-val = (val & 0x00ff00ff) + ((val >>  8) & 0x00ff00ff);
-val = (val & 0x0000ffff) + ((val >> 16) & 0x0000ffff);
+val = (val & 0x55555555) + ((val >> 1) & 0x55555555);
+val = (val & 0x33333333) + ((val >> 2) & 0x33333333);
+val = (val + (val >> 4)) & 0x0f0f0f0f;
+val = (val * 0x01010101) >> 24;
 
 return val;
 #endif
@@ -379,12 +378,10 @@ static inline int ctpop64(uint64_t val)
 #if QEMU_GNUC_PREREQ(3, 4)
 return __builtin_popcountll(val);
 #else
-val = (val & 0x5555555555555555ULL) + ((val >>  1) & 0x5555555555555555ULL);
-val = (val & 0x3333333333333333ULL) + ((val >>  2) & 0x3333333333333333ULL);
-val = (val & 0x0f0f0f0f0f0f0f0fULL) + ((val >>  4) & 0x0f0f0f0f0f0f0f0fULL);
-val = (val & 0x00ff00ff00ff00ffULL) + ((val >>  8) & 0x00ff00ff00ff00ffULL);
-val = (val & 0x0000ffff0000ffffULL) + ((val >> 16) & 0x0000ffff0000ffffULL);
-val = (val & 0x00000000ffffffffULL) + ((val >> 32) & 0x00000000ffffffffULL);
+val = (val & 0x5555555555555555ULL) + ((val >> 1) & 0x5555555555555555ULL);
+val = (val & 0x3333333333333333ULL) + ((val >> 2) & 0x3333333333333333ULL);
+val = (val + (val >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
+val = (val * 0x0101010101010101ULL) >> 56;
 
 return val;
 #endif
-- 
2.9.3
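
The reduced-operation fallback in the patch above can be tried outside of QEMU. The sketch below is a standalone rendering of the same bit-parallel technique, with hypothetical function names (popcount32_swar, popcount64_swar, popcount64_naive) rather than the ctpop32()/ctpop64() names from host-utils.h; the naive loop is included only as a reference to compare against.

```c
#include <stdint.h>

/* Bit-parallel population count with the final folds done by a multiply. */
static int popcount32_swar(uint32_t val)
{
    /* 2-bit fields each hold the count of their two bits. */
    val = (val & 0x55555555u) + ((val >> 1) & 0x55555555u);
    /* 4-bit fields each hold a count in 0..4. */
    val = (val & 0x33333333u) + ((val >> 2) & 0x33333333u);
    /* A byte total is at most 8, so one mask after the add suffices. */
    val = (val + (val >> 4)) & 0x0f0f0f0fu;
    /* The multiply sums all four bytes into the top byte (at most 32). */
    val = (val * 0x01010101u) >> 24;
    return (int)val;
}

static int popcount64_swar(uint64_t val)
{
    val = (val & 0x5555555555555555ULL) + ((val >> 1) & 0x5555555555555555ULL);
    val = (val & 0x3333333333333333ULL) + ((val >> 2) & 0x3333333333333333ULL);
    val = (val + (val >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    /* Eight byte counts summed into the top byte (at most 64). */
    val = (val * 0x0101010101010101ULL) >> 56;
    return (int)val;
}

/* Reference implementation: clear the lowest set bit until none remain. */
static int popcount64_naive(uint64_t val)
{
    int n = 0;

    while (val) {
        val &= val - 1;
        n++;
    }
    return n;
}
```

The mask can be deferred in the 4-bit step because two nibble counts of at most 4 cannot carry out of a nibble, and the multiply is safe because the grand total always fits in one byte.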




[Qemu-devel] [PULL 59/65] target-tilegx: Use ctpop helper

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/tilegx/helper.c| 5 -
 target/tilegx/helper.h| 1 -
 target/tilegx/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/tilegx/helper.c b/target/tilegx/helper.c
index b6f5e29..4964bb9 100644
--- a/target/tilegx/helper.c
+++ b/target/tilegx/helper.c
@@ -55,11 +55,6 @@ void helper_ext01_ics(CPUTLGState *env)
 }
 }
 
-uint64_t helper_pcnt(uint64_t arg)
-{
-return ctpop64(arg);
-}
-
 uint64_t helper_revbits(uint64_t arg)
 {
 return revbit64(arg);
diff --git a/target/tilegx/helper.h b/target/tilegx/helper.h
index bab303a..16745c2 100644
--- a/target/tilegx/helper.h
+++ b/target/tilegx/helper.h
@@ -1,6 +1,5 @@
 DEF_HELPER_2(exception, noreturn, env, i32)
 DEF_HELPER_1(ext01_ics, void, env)
-DEF_HELPER_FLAGS_1(pcnt, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(revbits, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_3(shufflebytes, TCG_CALL_NO_RWG_SE, i64, i64, i64, i64)
 DEF_HELPER_FLAGS_2(crc32_8, TCG_CALL_NO_RWG_SE, i64, i64, i64)
diff --git a/target/tilegx/translate.c b/target/tilegx/translate.c
index 8a2df1b..ff2ef7b 100644
--- a/target/tilegx/translate.c
+++ b/target/tilegx/translate.c
@@ -697,7 +697,7 @@ static TileExcp gen_rr_opcode(DisasContext *dc, unsigned 
opext,
 break;
 case OE_RR_X0(PCNT):
 case OE_RR_Y0(PCNT):
-gen_helper_pcnt(tdest, tsrca);
+tcg_gen_ctpop_tl(tdest, tsrca);
 mnemonic = "pcnt";
 break;
 case OE_RR_X0(REVBITS):
-- 
2.9.3




[Qemu-devel] [PULL 48/65] tcg/i386: Handle ctz and clz opcodes

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h |   8 +--
 tcg/i386/tcg-target.inc.c | 125 ++
 2 files changed, 120 insertions(+), 13 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index f2d9955..8fff287 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -93,8 +93,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i32  0
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
-#define TCG_TARGET_HAS_clz_i32  0
-#define TCG_TARGET_HAS_ctz_i32  0
+#define TCG_TARGET_HAS_clz_i32  1
+#define TCG_TARGET_HAS_ctz_i32  1
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 1
@@ -127,8 +127,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i64  0
 #define TCG_TARGET_HAS_nand_i64 0
 #define TCG_TARGET_HAS_nor_i64  0
-#define TCG_TARGET_HAS_clz_i64  0
-#define TCG_TARGET_HAS_ctz_i64  0
+#define TCG_TARGET_HAS_clz_i64  1
+#define TCG_TARGET_HAS_ctz_i64  1
 #define TCG_TARGET_HAS_deposit_i64  1
 #define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 0
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 651d96c..3ed8cd1 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -92,6 +92,7 @@ static const int tcg_target_call_oarg_regs[] = {
 #define TCG_CT_CONST_S32 0x100
 #define TCG_CT_CONST_U32 0x200
 #define TCG_CT_CONST_I32 0x400
+#define TCG_CT_CONST_WSZ 0x800
 
 /* Registers used with L constraint, which are the first argument 
registers on x86_64, and two random call clobbered registers on
@@ -138,6 +139,11 @@ static bool have_bmi2;
 #else
 # define have_bmi2 0
 #endif
+#if defined(CONFIG_CPUID_H) && defined(bit_LZCNT)
+static bool have_lzcnt;
+#else
+# define have_lzcnt 0
+#endif
 
 static tcg_insn_unit *tb_ret_addr;
 
@@ -214,6 +220,10 @@ static const char 
*target_parse_constraint(TCGArgConstraint *ct,
 tcg_regset_set32(ct->u.regs, 0, 0xff);
 }
 break;
+case 'W':
+/* With TZCNT/LZCNT, we can have operand-size as an input.  */
+ct->ct |= TCG_CT_CONST_WSZ;
+break;
 
 /* qemu_ld/st address constraint */
 case 'L':
@@ -260,6 +270,9 @@ static inline int tcg_target_const_match(tcg_target_long 
val, TCGType type,
 if ((ct & TCG_CT_CONST_I32) && ~val == (int32_t)~val) {
 return 1;
 }
+if ((ct & TCG_CT_CONST_WSZ) && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
+return 1;
+}
 return 0;
 }
 
@@ -293,6 +306,8 @@ static inline int tcg_target_const_match(tcg_target_long 
val, TCGType type,
 #define OPC_ARITH_GvEv (0x03)  /* ... plus (ARITH_FOO << 3) */
 #define OPC_ANDN        (0xf2 | P_EXT38)
 #define OPC_ADD_GvEv   (OPC_ARITH_GvEv | (ARITH_ADD << 3))
+#define OPC_BSF (0xbc | P_EXT)
+#define OPC_BSR (0xbd | P_EXT)
 #define OPC_BSWAP  (0xc8 | P_EXT)
 #define OPC_CALL_Jz  (0xe8)
 #define OPC_CMOVCC  (0x40 | P_EXT)  /* ... plus condition code */
@@ -307,6 +322,7 @@ static inline int tcg_target_const_match(tcg_target_long 
val, TCGType type,
 #define OPC_JMP_long   (0xe9)
 #define OPC_JMP_short  (0xeb)
 #define OPC_LEA (0x8d)
+#define OPC_LZCNT   (0xbd | P_EXT | P_SIMDF3)
 #define OPC_MOVB_EvGv  (0x88)  /* stores, more or less */
 #define OPC_MOVL_EvGv  (0x89)  /* stores, more or less */
 #define OPC_MOVL_GvEv  (0x8b)  /* loads, more or less */
@@ -333,6 +349,7 @@ static inline int tcg_target_const_match(tcg_target_long 
val, TCGType type,
 #define OPC_SHLX        (0xf7 | P_EXT38 | P_DATA16)
 #define OPC_SHRX        (0xf7 | P_EXT38 | P_SIMDF2)
 #define OPC_TESTL  (0x85)
+#define OPC_TZCNT   (0xbc | P_EXT | P_SIMDF3)
 #define OPC_XCHG_ax_r32  (0x90)
 
 #define OPC_GRP3_Ev  (0xf7)
@@ -418,6 +435,11 @@ static void tcg_out_opc(TCGContext *s, int opc, int r, int 
rm, int x)
 if (opc & P_ADDR32) {
 tcg_out8(s, 0x67);
 }
+if (opc & P_SIMDF3) {
+tcg_out8(s, 0xf3);
+} else if (opc & P_SIMDF2) {
+tcg_out8(s, 0xf2);
+}
 
 rex = 0;
 rex |= (opc & P_REXW) ? 0x8 : 0x0;  /* REX.W */
@@ -452,6 +474,11 @@ static void tcg_out_opc(TCGContext *s, int opc)
 if (opc & P_DATA16) {
 tcg_out8(s, 0x66);
 }
+if (opc & P_SIMDF3) {
+tcg_out8(s, 0xf3);
+} else if (opc & P_SIMDF2) {
+tcg_out8(s, 0xf2);
+}
 if (opc & (P_EXT | P_EXT38)) {
 tcg_out8(s, 0x0f);
 if (opc & P_EXT38) {
@@ -1080,13 +1107,11 @@ static void tcg_out_setcond2(TCGContext *s, const 
TCGArg *args,
 }
 #endif
 
-static void tcg_out_movcond32(TCGContext *s, TCGCond cond, TCGArg dest,
-  TCGArg c1, TCGArg c2, int const_c2,
-  TCGArg v1)

Re: [Qemu-devel] [PATCH V4 10/10] vhost_net: device IOTLB support

2017-01-10 Thread Jason Wang



On 2017年01月10日 12:55, Michael S. Tsirkin wrote:

On Fri, Dec 30, 2016 at 06:09:19PM +0800, Jason Wang wrote:

This patch implements Device IOTLB support for vhost kernel. This is
done through:

1) switching to the dma helpers when mapping/unmapping vrings from vhost code
2) introducing a set of VhostOps for:
- setting up the device IOTLB request callback
- processing device IOTLB requests
- processing device IOTLB invalidations
3) kernel support for the Device IOTLB API:

- allow vhost-net to query the IOMMU IOTLB entry through eventfd
- enable the ability for qemu to update a specified mapping of vhost
  through ioctl.
- enable the ability to invalidate a specified range of iova for the
   device IOTLB of vhost through ioctl. In x86/intel_iommu case this is
   triggered through iommu memory region notifier from device IOTLB
   invalidation descriptor processing routine.

With all the above, kernel vhost_net can co-operate with userspace
IOMMU. For vhost-user, the support could be easily done on top by
implementing the VhostOps.

Cc: Michael S. Tsirkin
Signed-off-by: Jason Wang

Specifically this patch is the one causing issues.




Yes, this is because it tries to enable device IOTLB for vhost-user. 
Will post a new version for this patch.


Thanks



[Qemu-devel] [PULL 57/65] target-s390x: Avoid a loop for popcnt

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/s390x/int_helper.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/target/s390x/int_helper.c b/target/s390x/int_helper.c
index 5bc470b..f26f36a 100644
--- a/target/s390x/int_helper.c
+++ b/target/s390x/int_helper.c
@@ -137,14 +137,11 @@ uint64_t HELPER(cvd)(int32_t reg)
 return dec;
 }
 
-uint64_t HELPER(popcnt)(uint64_t r2)
+uint64_t HELPER(popcnt)(uint64_t val)
 {
-uint64_t ret = 0;
-int i;
-
-for (i = 0; i < 64; i += 8) {
-uint64_t t = ctpop32((r2 >> i) & 0xff);
-ret |= t << i;
-}
-return ret;
+/* Note that we don't fold past bytes. */
+val = (val & 0x5555555555555555ULL) + ((val >> 1) & 0x5555555555555555ULL);
+val = (val & 0x3333333333333333ULL) + ((val >> 2) & 0x3333333333333333ULL);
+val = (val + (val >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
+return val;
 }
-- 
2.9.3
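
The per-byte population count in the patch above can be checked against the loop it replaces. This is a small standalone sketch with hypothetical names (popcnt_bytes_swar, popcnt_bytes_loop), not the s390x helper itself; the loop version counts bits with a plain inner loop instead of calling ctpop32().

```c
#include <stdint.h>

/*
 * Per-byte population count: byte i of the result holds the number of
 * set bits in byte i of the input.  The folding stops at byte width.
 */
static uint64_t popcnt_bytes_swar(uint64_t val)
{
    val = (val & 0x5555555555555555ULL) + ((val >> 1) & 0x5555555555555555ULL);
    val = (val & 0x3333333333333333ULL) + ((val >> 2) & 0x3333333333333333ULL);
    val = (val + (val >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return val;
}

/* Loop form for comparison: handle one byte per iteration. */
static uint64_t popcnt_bytes_loop(uint64_t r2)
{
    uint64_t ret = 0;
    int i;

    for (i = 0; i < 64; i += 8) {
        uint64_t byte = (r2 >> i) & 0xff;
        uint64_t t = 0;

        while (byte) {
            t += byte & 1;
            byte >>= 1;
        }
        ret |= t << i;
    }
    return ret;
}
```

For example, an input of 0xff00000000000001 gives 0x0800000000000001 from both functions: eight bits set in the top byte and one in the bottom byte.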




[Qemu-devel] [PULL 63/65] tcg: Use ctpop to generate ctz if needed

2017-01-10 Thread Richard Henderson
Particularly when andc is also available, this is two insns
shorter than using clz to compute ctz.

Signed-off-by: Richard Henderson 
---
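
For readers who want to see the identity behind this change outside of TCG, here is a minimal C sketch. It is an illustration only: the function names are hypothetical, the popcount and clz helpers are naive stand-ins for the host instructions, and the final select models the "result is arg2 when arg1 is zero" behaviour that the patch implements with movcond.

```c
#include <stdint.h>

/* Naive stand-in for a host population-count instruction. */
static uint32_t ctpop32_ref(uint32_t v)
{
    uint32_t n = 0;

    while (v) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* Naive stand-in for a host clz instruction with clz(0) == 32. */
static uint32_t clz32_ref(uint32_t v)
{
    uint32_t n = 0;

    if (v == 0) {
        return 32;
    }
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* ctz via ctpop: (x - 1) & ~x has ones exactly below the lowest set bit. */
static uint32_t ctz32_via_ctpop(uint32_t x, uint32_t dflt)
{
    uint32_t t = (x - 1) & ~x;      /* subi + andc */
    uint32_t n = ctpop32_ref(t);    /* ctpop */

    return x == 0 ? dflt : n;       /* movcond */
}

/* ctz via clz: isolate the lowest set bit, then mirror the clz result. */
static uint32_t ctz32_via_clz(uint32_t x, uint32_t dflt)
{
    uint32_t t = (0u - x) & x;      /* neg + and */
    uint32_t n = clz32_ref(t) ^ 31; /* clz + xori */

    return x == 0 ? dflt : n;       /* movcond */
}
```

For x == 0 the ctpop form already produces 32 before the select, which is why a default of 32 needs no fixup in that case, whereas the clz form produces 32 ^ 31 and always relies on the select.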
 tcg/tcg-op.c | 100 +++
 1 file changed, 60 insertions(+), 40 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 6f4b1b6..95a39b7 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -497,33 +497,27 @@ void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, 
TCGv_i32 arg2)
 tcg_gen_extrl_i64_i32(ret, t1);
 tcg_temp_free_i64(t1);
 tcg_temp_free_i64(t2);
-} else if (TCG_TARGET_HAS_clz_i32) {
-TCGv_i32 t1 = tcg_temp_new_i32();
-TCGv_i32 t2 = tcg_temp_new_i32();
-tcg_gen_neg_i32(t1, arg1);
-tcg_gen_xori_i32(t2, arg2, 31);
-tcg_gen_and_i32(t1, t1, arg1);
-tcg_gen_clz_i32(ret, t1, t2);
-tcg_temp_free_i32(t1);
-tcg_temp_free_i32(t2);
-tcg_gen_xori_i32(ret, ret, 31);
-} else if (TCG_TARGET_HAS_clz_i64) {
-TCGv_i32 t1 = tcg_temp_new_i32();
-TCGv_i32 t2 = tcg_temp_new_i32();
-TCGv_i64 x1 = tcg_temp_new_i64();
-TCGv_i64 x2 = tcg_temp_new_i64();
-tcg_gen_neg_i32(t1, arg1);
-tcg_gen_xori_i32(t2, arg2, 63);
-tcg_gen_and_i32(t1, t1, arg1);
-tcg_gen_extu_i32_i64(x1, t1);
-tcg_gen_extu_i32_i64(x2, t2);
-tcg_temp_free_i32(t1);
-tcg_temp_free_i32(t2);
-tcg_gen_clz_i64(x1, x1, x2);
-tcg_gen_extrl_i64_i32(ret, x1);
-tcg_temp_free_i64(x1);
-tcg_temp_free_i64(x2);
-tcg_gen_xori_i32(ret, ret, 63);
+} else if (TCG_TARGET_HAS_ctpop_i32
+   || TCG_TARGET_HAS_ctpop_i64
+   || TCG_TARGET_HAS_clz_i32
+   || TCG_TARGET_HAS_clz_i64) {
+TCGv_i32 z, t = tcg_temp_new_i32();
+
+if (TCG_TARGET_HAS_ctpop_i32 || TCG_TARGET_HAS_ctpop_i64) {
+tcg_gen_subi_i32(t, arg1, 1);
+tcg_gen_andc_i32(t, t, arg1);
+tcg_gen_ctpop_i32(t, t);
+} else {
+/* Since all non-x86 hosts have clz(0) == 32, don't fight it.  */
+tcg_gen_neg_i32(t, arg1);
+tcg_gen_and_i32(t, t, arg1);
+tcg_gen_clzi_i32(t, t, 32);
+tcg_gen_xori_i32(t, t, 31);
+}
+z = tcg_const_i32(0);
+tcg_gen_movcond_i32(TCG_COND_EQ, ret, arg1, z, arg2, t);
+tcg_temp_free_i32(t);
+tcg_temp_free_i32(z);
 } else {
 gen_helper_ctz_i32(ret, arg1, arg2);
 }
@@ -531,9 +525,18 @@ void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 
arg2)
 
 void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2)
 {
-TCGv_i32 t = tcg_const_i32(arg2);
-tcg_gen_ctz_i32(ret, arg1, t);
-tcg_temp_free_i32(t);
+if (!TCG_TARGET_HAS_ctz_i32 && TCG_TARGET_HAS_ctpop_i32 && arg2 == 32) {
+/* This equivalence has the advantage of not requiring a fixup.  */
+TCGv_i32 t = tcg_temp_new_i32();
+tcg_gen_subi_i32(t, arg1, 1);
+tcg_gen_andc_i32(t, t, arg1);
+tcg_gen_ctpop_i32(ret, t);
+tcg_temp_free_i32(t);
+} else {
+TCGv_i32 t = tcg_const_i32(arg2);
+tcg_gen_ctz_i32(ret, arg1, t);
+tcg_temp_free_i32(t);
+}
 }
 
 void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg)
@@ -1842,16 +1845,24 @@ void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, 
TCGv_i64 arg2)
 {
 if (TCG_TARGET_HAS_ctz_i64) {
 tcg_gen_op3_i64(INDEX_op_ctz_i64, ret, arg1, arg2);
-} else if (TCG_TARGET_HAS_clz_i64) {
-TCGv_i64 t1 = tcg_temp_new_i64();
-TCGv_i64 t2 = tcg_temp_new_i64();
-tcg_gen_neg_i64(t1, arg1);
-tcg_gen_xori_i64(t2, arg2, 63);
-tcg_gen_and_i64(t1, t1, arg1);
-tcg_gen_clz_i64(ret, t1, t2);
-tcg_temp_free_i64(t1);
-tcg_temp_free_i64(t2);
-tcg_gen_xori_i64(ret, ret, 63);
+} else if (TCG_TARGET_HAS_ctpop_i64 || TCG_TARGET_HAS_clz_i64) {
+TCGv_i64 z, t = tcg_temp_new_i64();
+
+if (TCG_TARGET_HAS_ctpop_i64) {
+tcg_gen_subi_i64(t, arg1, 1);
+tcg_gen_andc_i64(t, t, arg1);
+tcg_gen_ctpop_i64(t, t);
+} else {
+/* Since all non-x86 hosts have clz(0) == 64, don't fight it.  */
+tcg_gen_neg_i64(t, arg1);
+tcg_gen_and_i64(t, t, arg1);
+tcg_gen_clzi_i64(t, t, 64);
+tcg_gen_xori_i64(t, t, 63);
+}
+z = tcg_const_i64(0);
+tcg_gen_movcond_i64(TCG_COND_EQ, ret, arg1, z, arg2, t);
+tcg_temp_free_i64(t);
+tcg_temp_free_i64(z);
 } else {
 gen_helper_ctz_i64(ret, arg1, arg2);
 }
@@ -1868,6 +1879,15 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, 
uint64_t arg2)
 tcg_gen_ctz_i32(TCGV_LOW(ret), TCGV_LOW(arg1), t32);
 tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
 tcg_temp_free_i32(t32);
+} else if (!TCG_TARGET_HAS_ctz_i64
+   && 

[Qemu-devel] [PULL 46/65] tcg/i386: Hoist common arguments in tcg_out_op

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 197 ++
 1 file changed, 95 insertions(+), 102 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index e497bef..83572ac 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1791,7 +1791,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg 
*args, bool is64)
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
   const TCGArg *args, const int *const_args)
 {
-int c, vexop, rexw = 0;
+TCGArg a0, a1, a2;
+int c, const_a2, vexop, rexw = 0;
 
 #if TCG_TARGET_REG_BITS == 64
 # define OP_32_64(x) \
@@ -1803,9 +1804,15 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 case glue(glue(INDEX_op_, x), _i32)
 #endif
 
-switch(opc) {
+/* Hoist the loads of the most common arguments.  */
+a0 = args[0];
+a1 = args[1];
+a2 = args[2];
+const_a2 = const_args[2];
+
+switch (opc) {
 case INDEX_op_exit_tb:
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_EAX, args[0]);
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_EAX, a0);
 tcg_out_jmp(s, tb_ret_addr);
 break;
 case INDEX_op_goto_tb:
@@ -1820,57 +1827,53 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 tcg_out_nopn(s, gap - 1);
 }
 tcg_out8(s, OPC_JMP_long); /* jmp im */
-s->tb_jmp_insn_offset[args[0]] = tcg_current_code_size(s);
+s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
 tcg_out32(s, 0);
 } else {
 /* indirect jump method */
 tcg_out_modrm_offset(s, OPC_GRP5, EXT5_JMPN_Ev, -1,
- (intptr_t)(s->tb_jmp_target_addr + args[0]));
+ (intptr_t)(s->tb_jmp_target_addr + a0));
 }
-s->tb_jmp_reset_offset[args[0]] = tcg_current_code_size(s);
+s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
 break;
 case INDEX_op_br:
-tcg_out_jxx(s, JCC_JMP, arg_label(args[0]), 0);
+tcg_out_jxx(s, JCC_JMP, arg_label(a0), 0);
 break;
 OP_32_64(ld8u):
 /* Note that we can ignore REXW for the zero-extend to 64-bit.  */
-tcg_out_modrm_offset(s, OPC_MOVZBL, args[0], args[1], args[2]);
+tcg_out_modrm_offset(s, OPC_MOVZBL, a0, a1, a2);
 break;
 OP_32_64(ld8s):
-tcg_out_modrm_offset(s, OPC_MOVSBL + rexw, args[0], args[1], args[2]);
+tcg_out_modrm_offset(s, OPC_MOVSBL + rexw, a0, a1, a2);
 break;
 OP_32_64(ld16u):
 /* Note that we can ignore REXW for the zero-extend to 64-bit.  */
-tcg_out_modrm_offset(s, OPC_MOVZWL, args[0], args[1], args[2]);
+tcg_out_modrm_offset(s, OPC_MOVZWL, a0, a1, a2);
 break;
 OP_32_64(ld16s):
-tcg_out_modrm_offset(s, OPC_MOVSWL + rexw, args[0], args[1], args[2]);
+tcg_out_modrm_offset(s, OPC_MOVSWL + rexw, a0, a1, a2);
 break;
 #if TCG_TARGET_REG_BITS == 64
 case INDEX_op_ld32u_i64:
 #endif
 case INDEX_op_ld_i32:
-tcg_out_ld(s, TCG_TYPE_I32, args[0], args[1], args[2]);
+tcg_out_ld(s, TCG_TYPE_I32, a0, a1, a2);
 break;
 
 OP_32_64(st8):
 if (const_args[0]) {
-tcg_out_modrm_offset(s, OPC_MOVB_EvIz,
- 0, args[1], args[2]);
-tcg_out8(s, args[0]);
+tcg_out_modrm_offset(s, OPC_MOVB_EvIz, 0, a1, a2);
+tcg_out8(s, a0);
 } else {
-tcg_out_modrm_offset(s, OPC_MOVB_EvGv | P_REXB_R,
- args[0], args[1], args[2]);
+tcg_out_modrm_offset(s, OPC_MOVB_EvGv | P_REXB_R, a0, a1, a2);
 }
 break;
 OP_32_64(st16):
 if (const_args[0]) {
-tcg_out_modrm_offset(s, OPC_MOVL_EvIz | P_DATA16,
- 0, args[1], args[2]);
-tcg_out16(s, args[0]);
+tcg_out_modrm_offset(s, OPC_MOVL_EvIz | P_DATA16, 0, a1, a2);
+tcg_out16(s, a0);
 } else {
-tcg_out_modrm_offset(s, OPC_MOVL_EvGv | P_DATA16,
- args[0], args[1], args[2]);
+tcg_out_modrm_offset(s, OPC_MOVL_EvGv | P_DATA16, a0, a1, a2);
 }
 break;
 #if TCG_TARGET_REG_BITS == 64
@@ -1878,19 +1881,18 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 #endif
 case INDEX_op_st_i32:
 if (const_args[0]) {
-tcg_out_modrm_offset(s, OPC_MOVL_EvIz, 0, args[1], args[2]);
-tcg_out32(s, args[0]);
+tcg_out_modrm_offset(s, OPC_MOVL_EvIz, 0, a1, a2);
+tcg_out32(s, a0);
 } else {
-tcg_out_st(s, TCG_TYPE_I32, args[0], args[1], args[2]);
+tcg_out_st(s, TCG_TYPE_I32, a0, a1, a2);
 }
 break;
 
 OP_32_64(add):
 

[Qemu-devel] [PULL 56/65] target-ppc: Use ctpop helper

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/ppc/helper.h |  3 +--
 target/ppc/int_helper.c | 18 +++---
 target/ppc/translate.c  |  6 +-
 3 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 1ed1d2c..0a8fbba 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -39,12 +39,11 @@ DEF_HELPER_4(divweu, tl, env, tl, tl, i32)
 DEF_HELPER_4(divwe, tl, env, tl, tl, i32)
 
 DEF_HELPER_FLAGS_1(popcntb, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(cmpb, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_3(sraw, tl, env, tl, tl)
 #if defined(TARGET_PPC64)
 DEF_HELPER_FLAGS_2(cmpeqb, TCG_CALL_NO_RWG_SE, i32, tl, tl)
-DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_NO_RWG_SE, tl, tl)
+DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_3(srad, tl, env, tl, tl)
 DEF_HELPER_0(darn32, tl)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index e1bb695..1871792 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -272,6 +272,7 @@ target_ulong helper_srad(CPUPPCState *env, target_ulong 
value,
 #if defined(TARGET_PPC64)
 target_ulong helper_popcntb(target_ulong val)
 {
+/* Note that we don't fold past bytes */
 val = (val & 0x5555555555555555ULL) + ((val >>  1) &
0x5555555555555555ULL);
 val = (val & 0x3333333333333333ULL) + ((val >>  2) &
@@ -283,6 +284,7 @@ target_ulong helper_popcntb(target_ulong val)
 
 target_ulong helper_popcntw(target_ulong val)
 {
+/* Note that we don't fold past words.  */
 val = (val & 0x5555555555555555ULL) + ((val >>  1) &
0x5555555555555555ULL);
 val = (val & 0x3333333333333333ULL) + ((val >>  2) &
@@ -295,29 +297,15 @@ target_ulong helper_popcntw(target_ulong val)
0x0000ffff0000ffffULL);
 return val;
 }
-
-target_ulong helper_popcntd(target_ulong val)
-{
-return ctpop64(val);
-}
 #else
 target_ulong helper_popcntb(target_ulong val)
 {
+/* Note that we don't fold past bytes */
 val = (val & 0x55555555) + ((val >>  1) & 0x55555555);
 val = (val & 0x33333333) + ((val >>  2) & 0x33333333);
 val = (val & 0x0f0f0f0f) + ((val >>  4) & 0x0f0f0f0f);
 return val;
 }
-
-target_ulong helper_popcntw(target_ulong val)
-{
-val = (val & 0x55555555) + ((val >>  1) & 0x55555555);
-val = (val & 0x33333333) + ((val >>  2) & 0x33333333);
-val = (val & 0x0f0f0f0f) + ((val >>  4) & 0x0f0f0f0f);
-val = (val & 0x00ff00ff) + ((val >>  8) & 0x00ff00ff);
-val = (val & 0x0000ffff) + ((val >> 16) & 0x0000ffff);
-return val;
-}
 #endif
 
 /*/
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 1224f56..1212180 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -1844,14 +1844,18 @@ static void gen_popcntb(DisasContext *ctx)
 
 static void gen_popcntw(DisasContext *ctx)
 {
+#if defined(TARGET_PPC64)
 gen_helper_popcntw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+#else
+tcg_gen_ctpop_i32(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+#endif
 }
 
 #if defined(TARGET_PPC64)
 /* popcntd: PowerPC 2.06 specification */
 static void gen_popcntd(DisasContext *ctx)
 {
-gen_helper_popcntd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+tcg_gen_ctpop_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
 }
 #endif
 
-- 
2.9.3
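
For readers following the "don't fold past bytes" comments in the patch above, here is a minimal standalone sketch of the folding technique for a 32-bit value. The function names are made up for illustration and are not QEMU identifiers. The per-byte variant stops after three folding steps, so each result byte holds the bit count of the matching input byte; the full-word variant adds two more steps.

```c
#include <stdint.h>

/* Per-byte population count (hypothetical name): each byte of the
   result holds the number of set bits in the same byte of the input.
   Folding stops at byte granularity.  */
static uint32_t popcnt_per_byte32(uint32_t val)
{
    val = (val & 0x55555555u) + ((val >> 1) & 0x55555555u);
    val = (val & 0x33333333u) + ((val >> 2) & 0x33333333u);
    val = (val & 0x0f0f0f0fu) + ((val >> 4) & 0x0f0f0f0fu);
    return val;
}

/* Full 32-bit population count (hypothetical name): continue folding
   the per-byte counts into 16-bit and then 32-bit sums.  */
static uint32_t popcnt_word32(uint32_t val)
{
    val = popcnt_per_byte32(val);
    val = (val & 0x00ff00ffu) + ((val >> 8) & 0x00ff00ffu);
    val = (val & 0x0000ffffu) + ((val >> 16) & 0x0000ffffu);
    return val;
}
```

The full-word count is the operation that tcg_gen_ctpop_i32 stands in for on 32-bit targets in this patch. The per-byte form has no generic opcode in the patch and stays a helper.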




[Qemu-devel] [PULL 60/65] target-i386: Use ctpop helper

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/i386/cc_helper.c      |  3 +++
 target/i386/cpu.h            |  1 +
 target/i386/ops_sse.h        | 26 --
 target/i386/ops_sse_header.h |  1 -
 target/i386/translate.c      | 13 +++--
 5 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/target/i386/cc_helper.c b/target/i386/cc_helper.c
index 83af223..c9c90e1 100644
--- a/target/i386/cc_helper.c
+++ b/target/i386/cc_helper.c
@@ -105,6 +105,8 @@ target_ulong helper_cc_compute_all(target_ulong dst, 
target_ulong src1,
 return src1;
 case CC_OP_CLR:
 return CC_Z | CC_P;
+case CC_OP_POPCNT:
+return src1 ? 0 : CC_Z;
 
 case CC_OP_MULB:
 return compute_all_mulb(dst, src1);
@@ -232,6 +234,7 @@ target_ulong helper_cc_compute_c(target_ulong dst, 
target_ulong src1,
 case CC_OP_LOGICL:
 case CC_OP_LOGICQ:
 case CC_OP_CLR:
+case CC_OP_POPCNT:
 return 0;
 
 case CC_OP_EFLAGS:
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index a7f2f60..a04e46b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -777,6 +777,7 @@ typedef enum {
 CC_OP_ADCOX, /* CC_DST = C, CC_SRC2 = O, CC_SRC = rest.  */
 
 CC_OP_CLR, /* Z set, all other flags clear.  */
+CC_OP_POPCNT, /* Z via CC_SRC, all other flags clear.  */
 
 CC_OP_NB,
 } CCOp;
diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 7a98f53..16509d0 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2157,32 +2157,6 @@ target_ulong helper_crc32(uint32_t crc1, target_ulong 
msg, uint32_t len)
 return crc;
 }
 
-#define POPMASK(i) ((target_ulong) -1 / ((1LL << (1 << i)) + 1))
-#define POPCOUNT(n, i) ((n & POPMASK(i)) + ((n >> (1 << i)) & POPMASK(i)))
-target_ulong helper_popcnt(CPUX86State *env, target_ulong n, uint32_t type)
-{
-CC_SRC = n ? 0 : CC_Z;
-
-n = POPCOUNT(n, 0);
-n = POPCOUNT(n, 1);
-n = POPCOUNT(n, 2);
-n = POPCOUNT(n, 3);
-if (type == 1) {
-return n & 0xff;
-}
-
-n = POPCOUNT(n, 4);
-#ifndef TARGET_X86_64
-return n;
-#else
-if (type == 2) {
-return n & 0xff;
-}
-
-return POPCOUNT(n, 5);
-#endif
-}
-
 void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 uint32_t ctrl)
 {
diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h
index 64c5857..094aafc 100644
--- a/target/i386/ops_sse_header.h
+++ b/target/i386/ops_sse_header.h
@@ -333,7 +333,6 @@ DEF_HELPER_4(glue(pcmpestrm, SUFFIX), void, env, Reg, Reg, 
i32)
 DEF_HELPER_4(glue(pcmpistri, SUFFIX), void, env, Reg, Reg, i32)
 DEF_HELPER_4(glue(pcmpistrm, SUFFIX), void, env, Reg, Reg, i32)
 DEF_HELPER_3(crc32, tl, i32, tl, i32)
-DEF_HELPER_3(popcnt, tl, env, tl, i32)
 #endif
 
 /* AES-NI op helpers */
diff --git a/target/i386/translate.c b/target/i386/translate.c
index ce9ccb8..5f5e60d 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -222,6 +222,7 @@ static const uint8_t cc_op_live[CC_OP_NB] = {
 [CC_OP_ADOX] = USES_CC_SRC | USES_CC_SRC2,
 [CC_OP_ADCOX] = USES_CC_DST | USES_CC_SRC | USES_CC_SRC2,
 [CC_OP_CLR] = 0,
+[CC_OP_POPCNT] = USES_CC_SRC,
 };
 
 static void set_cc_op(DisasContext *s, CCOp op)
@@ -757,6 +758,7 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, TCGv 
reg)
 
 case CC_OP_LOGICB ... CC_OP_LOGICQ:
 case CC_OP_CLR:
+case CC_OP_POPCNT:
 return (CCPrepare) { .cond = TCG_COND_NEVER, .mask = -1 };
 
 case CC_OP_INCB ... CC_OP_INCQ:
@@ -824,6 +826,7 @@ static CCPrepare gen_prepare_eflags_s(DisasContext *s, TCGv 
reg)
 return (CCPrepare) { .cond = TCG_COND_NE, .reg = cpu_cc_src,
  .mask = CC_S };
 case CC_OP_CLR:
+case CC_OP_POPCNT:
 return (CCPrepare) { .cond = TCG_COND_NEVER, .mask = -1 };
 default:
 {
@@ -843,6 +846,7 @@ static CCPrepare gen_prepare_eflags_o(DisasContext *s, TCGv 
reg)
 return (CCPrepare) { .cond = TCG_COND_NE, .reg = cpu_cc_src2,
  .mask = -1, .no_setcond = true };
 case CC_OP_CLR:
+case CC_OP_POPCNT:
 return (CCPrepare) { .cond = TCG_COND_NEVER, .mask = -1 };
 default:
 gen_compute_eflags(s);
@@ -866,6 +870,9 @@ static CCPrepare gen_prepare_eflags_z(DisasContext *s, TCGv 
reg)
  .mask = CC_Z };
 case CC_OP_CLR:
 return (CCPrepare) { .cond = TCG_COND_ALWAYS, .mask = -1 };
+case CC_OP_POPCNT:
+return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_src,
+ .mask = -1 };
 default:
 {
 TCGMemOp size = (s->cc_op - CC_OP_ADDB) & 3;
@@ -8205,10 +8212,12 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 }
 
 gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
-gen_helper_popcnt(cpu_T0, cpu_env, cpu_T0, tcg_const_i32(ot));
+gen_extu(ot, 

[Qemu-devel] [PULL 43/65] tcg/mips: Handle clz opcode

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.h |  6 --
 tcg/mips/tcg-target.inc.c | 47 +++
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 06988cf..a680f16 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -121,8 +121,6 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_rem_i32  1
 #define TCG_TARGET_HAS_not_i32  1
 #define TCG_TARGET_HAS_nor_i32  1
-#define TCG_TARGET_HAS_clz_i32  0
-#define TCG_TARGET_HAS_ctz_i32  0
 #define TCG_TARGET_HAS_andc_i32 0
 #define TCG_TARGET_HAS_orc_i32  0
 #define TCG_TARGET_HAS_eqv_i32  0
@@ -165,6 +163,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_ext8s_i32        use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
 #define TCG_TARGET_HAS_rot_i32  use_mips32r2_instructions
+#define TCG_TARGET_HAS_clz_i32  use_mips32r2_instructions
+#define TCG_TARGET_HAS_ctz_i32  0
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_movcond_i64  use_movnz_instructions
@@ -177,6 +177,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_ext8s_i64        use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i64       use_mips32r2_instructions
 #define TCG_TARGET_HAS_rot_i64  use_mips32r2_instructions
+#define TCG_TARGET_HAS_clz_i64  use_mips32r2_instructions
+#define TCG_TARGET_HAS_ctz_i64  0
 #endif
 
 /* optional instructions automatically implemented */
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index ea20891..01ac7b2 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -179,6 +179,7 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_S16  0x400    /* Signed 16-bit: -32768 - 32767 */
 #define TCG_CT_CONST_P2M1 0x800    /* Power of 2 minus 1.  */
 #define TCG_CT_CONST_N16  0x1000   /* "Negatable" 16-bit: -32767 - 32767 */
+#define TCG_CT_CONST_WSZ  0x2000   /* word size */
 
 static inline bool is_p2m1(tcg_target_long val)
 {
@@ -229,6 +230,9 @@ static const char *target_parse_constraint(TCGArgConstraint 
*ct,
 case 'N':
 ct->ct |= TCG_CT_CONST_N16;
 break;
+case 'W':
+ct->ct |= TCG_CT_CONST_WSZ;
+break;
 case 'Z':
 /* We are cheating a bit here, using the fact that the register
ZERO is also the register number 0. Hence there is no need
@@ -260,6 +264,9 @@ static inline int tcg_target_const_match(tcg_target_long 
val, TCGType type,
 } else if ((ct & TCG_CT_CONST_P2M1)
&& use_mips32r2_instructions && is_p2m1(val)) {
 return 1;
+} else if ((ct & TCG_CT_CONST_WSZ)
+   && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
+return 1;
 }
 return 0;
 }
@@ -356,6 +363,8 @@ typedef enum {
 OPC_DSRL32   = OPC_SPECIAL | 076,
 OPC_DROTR32  = OPC_SPECIAL | 076 | (1 << 21),
 OPC_DSRA32   = OPC_SPECIAL | 077,
+OPC_CLZ_R6   = OPC_SPECIAL | 0120,
+OPC_DCLZ_R6  = OPC_SPECIAL | 0122,
 
 OPC_REGIMM   = 001 << 26,
 OPC_BLTZ = OPC_REGIMM | (000 << 16),
@@ -363,6 +372,8 @@ typedef enum {
 
 OPC_SPECIAL2 = 034 << 26,
 OPC_MUL_R5   = OPC_SPECIAL2 | 002,
+OPC_CLZ  = OPC_SPECIAL2 | 040,
+OPC_DCLZ = OPC_SPECIAL2 | 044,
 
 OPC_SPECIAL3 = 037 << 26,
 OPC_EXT  = OPC_SPECIAL3 | 000,
@@ -1664,6 +1675,33 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
 tcg_out32(s, sync[a0 & TCG_MO_ALL]);
 }
 
+static void tcg_out_clz(TCGContext *s, MIPSInsn opcv2, MIPSInsn opcv6,
+int width, TCGReg a0, TCGReg a1, TCGArg a2)
+{
+if (use_mips32r6_instructions) {
+if (a2 == width) {
+tcg_out_opc_reg(s, opcv6, a0, a1, 0);
+} else {
+tcg_out_opc_reg(s, opcv6, TCG_TMP0, a1, 0);
+tcg_out_movcond(s, TCG_COND_EQ, a0, a1, 0, a2, TCG_TMP0);
+}
+} else {
+if (a2 == width) {
+tcg_out_opc_reg(s, opcv2, a0, a1, a1);
+} else if (a0 == a2) {
+tcg_out_opc_reg(s, opcv2, TCG_TMP0, a1, a1);
+tcg_out_opc_reg(s, OPC_MOVN, a0, TCG_TMP0, a1);
+} else if (a0 != a1) {
+tcg_out_opc_reg(s, opcv2, a0, a1, a1);
+tcg_out_opc_reg(s, OPC_MOVZ, a0, a2, a1);
+} else {
+tcg_out_opc_reg(s, opcv2, TCG_TMP0, a1, a1);
+tcg_out_opc_reg(s, OPC_MOVZ, TCG_TMP0, a2, a1);
+tcg_out_mov(s, TCG_TYPE_REG, a0, TCG_TMP0);
+}
+}
+}
+
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
   const TCGArg *args, const int *const_args)
 {
@@ -2040,6 +2078,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 }
 break;
 
+case INDEX_op_clz_i32:
+

[Qemu-devel] [PULL 53/65] target-xtensa: Use clrsb helper

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/xtensa/translate.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 5c719a4..5a93705 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -1372,16 +1372,7 @@ static void disas_xtensa_insn(CPUXtensaState *env, 
DisasContext *dc)
 case 14: /*NSAu*/
 HAS_OPTION(XTENSA_OPTION_MISC_OP_NSA);
 if (gen_window_check2(dc, RRR_S, RRR_T)) {
-TCGv_i32 t0 = tcg_temp_new_i32();
-
-/* if (v & 0x80000000) v = ~v; */
-tcg_gen_sari_i32(t0, cpu_R[RRR_S], 31);
-tcg_gen_xor_i32(t0, t0, cpu_R[RRR_S]);
-
-/* r = (v ? clz(v) : 32) - 1; */
-tcg_gen_clzi_i32(t0, t0, 32);
-tcg_gen_subi_i32(cpu_R[RRR_T], t0, 1);
-tcg_temp_free_i32(t0);
+tcg_gen_clrsb_i32(cpu_R[RRR_T], cpu_R[RRR_S]);
 }
 break;
 
-- 
2.9.3
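
The open-coded sequence removed above (sari, xor, clzi, subi) is the usual way to express "count leading redundant sign bits" in terms of count-leading-zeros. A minimal plain-C sketch of that identity follows. The function names are illustrative only, and the loop-based clz stands in for whatever the host provides.

```c
#include <stdint.h>

/* Count leading zeros, returning zero_val for a zero input
   (illustrative stand-in for a clz with an explicit zero value).  */
static int clz32_or(uint32_t v, int zero_val)
{
    int n = 0;

    if (v == 0) {
        return zero_val;
    }
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Count leading redundant sign bits, following the removed sequence:
   if (v & 0x80000000) v = ~v;  r = (v ? clz(v) : 32) - 1;  */
static int clrsb32_sketch(uint32_t v)
{
    /* Same value as an arithmetic right shift of v by 31.  */
    uint32_t sign_mask = (v & 0x80000000u) ? 0xffffffffu : 0u;
    uint32_t t = v ^ sign_mask;

    return clz32_or(t, 32) - 1;
}
```

With these definitions, inputs 0 and 0xffffffff both give 31, and 0x80000000 gives 0, since its sign bit has no redundant copies.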




[Qemu-devel] [PULL 55/65] target-alpha: Use ctpop helper

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/alpha/helper.h | 2 --
 target/alpha/int_helper.c | 5 -
 target/alpha/translate.c  | 2 +-
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/target/alpha/helper.h b/target/alpha/helper.h
index eed3906..d60f208 100644
--- a/target/alpha/helper.h
+++ b/target/alpha/helper.h
@@ -3,8 +3,6 @@ DEF_HELPER_FLAGS_1(load_pcc, TCG_CALL_NO_RWG_SE, i64, env)
 
 DEF_HELPER_FLAGS_3(check_overflow, TCG_CALL_NO_WG, void, env, i64, i64)
 
-DEF_HELPER_FLAGS_1(ctpop, TCG_CALL_NO_RWG_SE, i64, i64)
-
 DEF_HELPER_FLAGS_2(zap, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(zapnot, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 
diff --git a/target/alpha/int_helper.c b/target/alpha/int_helper.c
index 3c303bd..e43b50a 100644
--- a/target/alpha/int_helper.c
+++ b/target/alpha/int_helper.c
@@ -24,11 +24,6 @@
 #include "qemu/host-utils.h"
 
 
-uint64_t helper_ctpop(uint64_t arg)
-{
-return ctpop64(arg);
-}
-
 uint64_t helper_zapnot(uint64_t val, uint64_t mskb)
 {
 uint64_t mask;
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 6e2e563..055286a 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -2541,7 +2541,7 @@ static ExitStatus translate_one(DisasContext *ctx, 
uint32_t insn)
 REQUIRE_TB_FLAG(TB_FLAGS_AMASK_CIX);
 REQUIRE_REG_31(ra);
 REQUIRE_NO_LIT;
-gen_helper_ctpop(vc, vb);
+tcg_gen_ctpop_i64(vc, vb);
 break;
 case 0x31:
 /* PERR */
-- 
2.9.3




[Qemu-devel] [PULL 42/65] tcg/arm: Handle ctz and clz opcodes

2017-01-10 Thread Richard Henderson
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.h |  4 ++--
 tcg/arm/tcg-target.inc.c | 27 +++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 02cc242..4cb94dc 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -110,8 +110,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_eqv_i32  0
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
-#define TCG_TARGET_HAS_clz_i32  0
-#define TCG_TARGET_HAS_ctz_i32  0
+#define TCG_TARGET_HAS_clz_i32  use_armv5t_instructions
+#define TCG_TARGET_HAS_ctz_i32  use_armv7_instructions
 #define TCG_TARGET_HAS_deposit_i32  use_armv7_instructions
 #define TCG_TARGET_HAS_extract_i32  use_armv7_instructions
 #define TCG_TARGET_HAS_sextract_i32 use_armv7_instructions
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index ec0b861..e75a6d4 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -256,6 +256,9 @@ typedef enum {
 ARITH_BIC = 0xe << 21,
 ARITH_MVN = 0xf << 21,
 
+INSN_CLZ   = 0x016f0f10,
+INSN_RBIT  = 0x06ff0f30,
+
 INSN_LDR_IMM   = 0x04100000,
 INSN_LDR_REG   = 0x06100000,
 INSN_STR_IMM   = 0x04000000,
@@ -1829,6 +1832,28 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 }
 break;
 
+case INDEX_op_ctz_i32:
+tcg_out_dat_reg(s, COND_AL, INSN_RBIT, TCG_REG_TMP, 0, args[1], 0);
+a1 = TCG_REG_TMP;
+goto do_clz;
+
+case INDEX_op_clz_i32:
+a1 = args[1];
+do_clz:
+a0 = args[0];
+a2 = args[2];
+c = const_args[2];
+if (c && a2 == 32) {
+tcg_out_dat_reg(s, COND_AL, INSN_CLZ, a0, 0, a1, 0);
+break;
+}
+tcg_out_dat_imm(s, COND_AL, ARITH_CMP, 0, a1, 0);
+tcg_out_dat_reg(s, COND_NE, INSN_CLZ, a0, 0, a1, 0);
+if (c || a0 != a2) {
+tcg_out_dat_rIK(s, COND_EQ, ARITH_MOV, ARITH_MVN, a0, 0, a2, c);
+}
+break;
+
 case INDEX_op_brcond_i32:
 tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
args[0], args[1], const_args[1]);
@@ -1963,6 +1988,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
 { INDEX_op_sar_i32, { "r", "r", "ri" } },
 { INDEX_op_rotl_i32, { "r", "r", "ri" } },
 { INDEX_op_rotr_i32, { "r", "r", "ri" } },
+{ INDEX_op_clz_i32, { "r", "r", "rIK" } },
+{ INDEX_op_ctz_i32, { "r", "r", "rIK" } },
 
 { INDEX_op_brcond_i32, { "r", "rIN" } },
 { INDEX_op_setcond_i32, { "r", "r", "rIN" } },
-- 
2.9.3
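
The ctz case in the patch above bit-reverses the input with RBIT and then jumps to the clz path. A small plain-C sketch of why that works, with made-up function names: reversing the bits turns trailing zeros into leading zeros, and the reversed value is zero exactly when the input is zero, so the same "value for zero input" argument can be used for both operations.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit value (what RBIT computes).  */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    int i;

    for (i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Count leading zeros; return zero_val when the input is zero.  */
static uint32_t clz32_or(uint32_t v, uint32_t zero_val)
{
    uint32_t n = 0;

    if (v == 0) {
        return zero_val;
    }
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros expressed as clz of the reversed input.  */
static uint32_t ctz32_via_rbit(uint32_t x, uint32_t zero_val)
{
    return clz32_or(rbit32(x), zero_val);
}
```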




[Qemu-devel] [PULL 51/65] target-arm: Use clrsb helper

2017-01-10 Thread Richard Henderson
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target/arm/helper-a64.c    | 10 --
 target/arm/helper-a64.h    |  2 --
 target/arm/translate-a64.c |  8 
 3 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 77999ff..d9df82c 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -54,16 +54,6 @@ int64_t HELPER(sdiv64)(int64_t num, int64_t den)
 return num / den;
 }
 
-uint64_t HELPER(cls64)(uint64_t x)
-{
-return clrsb64(x);
-}
-
-uint32_t HELPER(cls32)(uint32_t x)
-{
-return clrsb32(x);
-}
-
 uint64_t HELPER(rbit64)(uint64_t x)
 {
 return revbit64(x);
diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index d320f96..6f9eaba 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -18,8 +18,6 @@
  */
 DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
-DEF_HELPER_FLAGS_1(cls64, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(cls32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 1bf94bc..4f09dfb 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -3972,11 +3972,11 @@ static void handle_cls(DisasContext *s, unsigned int sf,
 tcg_rn = cpu_reg(s, rn);
 
 if (sf) {
-gen_helper_cls64(tcg_rd, tcg_rn);
+tcg_gen_clrsb_i64(tcg_rd, tcg_rn);
 } else {
 TCGv_i32 tcg_tmp32 = tcg_temp_new_i32();
 tcg_gen_extrl_i64_i32(tcg_tmp32, tcg_rn);
-gen_helper_cls32(tcg_tmp32, tcg_tmp32);
+tcg_gen_clrsb_i32(tcg_tmp32, tcg_tmp32);
 tcg_gen_extu_i32_i64(tcg_rd, tcg_tmp32);
 tcg_temp_free_i32(tcg_tmp32);
 }
@@ -7593,7 +7593,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, 
bool u,
 if (u) {
 tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
 } else {
-gen_helper_cls64(tcg_rd, tcg_rn);
+tcg_gen_clrsb_i64(tcg_rd, tcg_rn);
 }
 break;
 case 0x5: /* NOT */
@@ -10263,7 +10263,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, 
uint32_t insn)
 if (u) {
 tcg_gen_clzi_i32(tcg_res, tcg_op, 32);
 } else {
-gen_helper_cls32(tcg_res, tcg_op);
+tcg_gen_clrsb_i32(tcg_res, tcg_op);
 }
 break;
 case 0x7: /* SQABS, SQNEG */
-- 
2.9.3




[Qemu-devel] [PULL 47/65] tcg/i386: Allow bmi2 shiftx to have non-matching operands

2017-01-10 Thread Richard Henderson
Previously we could not have different constraints for different ISA levels,
which prevented us from eliding the matching constraint for shifts.

We do now have to make sure that the operands match for constant shifts.
We can also handle some small left shifts via lea.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 83572ac..651d96c 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -179,7 +179,6 @@ static const char *target_parse_constraint(TCGArgConstraint 
*ct,
 tcg_regset_set_reg(ct->u.regs, TCG_REG_EBX);
 break;
 case 'c':
-case_c:
 ct->ct |= TCG_CT_REG;
 tcg_regset_set_reg(ct->u.regs, TCG_REG_ECX);
 break;
@@ -208,7 +207,6 @@ static const char *target_parse_constraint(TCGArgConstraint 
*ct,
 tcg_regset_set32(ct->u.regs, 0, 0xf);
 break;
 case 'r':
-case_r:
 ct->ct |= TCG_CT_REG;
 if (TCG_TARGET_REG_BITS == 64) {
 tcg_regset_set32(ct->u.regs, 0, 0xffff);
@@ -216,13 +214,6 @@ static const char 
*target_parse_constraint(TCGArgConstraint *ct,
 tcg_regset_set32(ct->u.regs, 0, 0xff);
 }
 break;
-case 'C':
-/* With SHRX et al, we need not use ECX as shift count register.  */
-if (have_bmi2) {
-goto case_r;
-} else {
-goto case_c;
-}
 
 /* qemu_ld/st address constraint */
 case 'L':
@@ -1959,6 +1950,17 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 break;
 
 OP_32_64(shl):
+/* For small constant 3-operand shift, use LEA.  */
+if (const_a2 && a0 != a1 && (a2 - 1) < 3) {
+if (a2 - 1 == 0) {
+/* shl $1,a1,a0 -> lea (a1,a1),a0 */
+tcg_out_modrm_sib_offset(s, OPC_LEA + rexw, a0, a1, a1, 0, 0);
+} else {
+/* shl $n,a1,a0 -> lea 0(,a1,n),a0 */
+tcg_out_modrm_sib_offset(s, OPC_LEA + rexw, a0, -1, a1, a2, 0);
+}
+break;
+}
 c = SHIFT_SHL;
 vexop = OPC_SHLX;
 goto gen_shift_maybe_vex;
@@ -1977,9 +1979,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 c = SHIFT_ROR;
 goto gen_shift;
 gen_shift_maybe_vex:
-if (have_bmi2 && !const_a2) {
-tcg_out_vex_modrm(s, vexop + rexw, a0, a2, a1);
-break;
+if (have_bmi2) {
+if (!const_a2) {
+tcg_out_vex_modrm(s, vexop + rexw, a0, a2, a1);
+break;
+}
+tcg_out_mov(s, rexw ? TCG_TYPE_I64 : TCG_TYPE_I32, a0, a1);
 }
 /* FALLTHRU */
 gen_shift:
@@ -2190,9 +2195,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 static const TCGTargetOpDef r_q = { .args_ct_str = { "r", "q" } };
 static const TCGTargetOpDef r_re = { .args_ct_str = { "r", "re" } };
 static const TCGTargetOpDef r_0 = { .args_ct_str = { "r", "0" } };
+static const TCGTargetOpDef r_r_ri = { .args_ct_str = { "r", "r", "ri" } };
 static const TCGTargetOpDef r_r_re = { .args_ct_str = { "r", "r", "re" } };
 static const TCGTargetOpDef r_0_re = { .args_ct_str = { "r", "0", "re" } };
-static const TCGTargetOpDef r_0_Ci = { .args_ct_str = { "r", "0", "Ci" } };
 static const TCGTargetOpDef r_0_ci = { .args_ct_str = { "r", "0", "ci" } };
 static const TCGTargetOpDef r_L = { .args_ct_str = { "r", "L" } };
 static const TCGTargetOpDef L_L = { .args_ct_str = { "L", "L" } };
@@ -2266,7 +2271,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_shr_i64:
 case INDEX_op_sar_i32:
 case INDEX_op_sar_i64:
-return &r_0_Ci;
+return have_bmi2 ? &r_r_ri : &r_0_ci;
 case INDEX_op_rotl_i32:
 case INDEX_op_rotl_i64:
 case INDEX_op_rotr_i32:
-- 
2.9.3
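
As a rough illustration of the "small left shifts via lea" remark in the commit message above, the sketch below models the address arithmetic that LEA performs, base + (index << scale) + displacement, and shows how a shift by 1, 2 or 3 falls out of it without requiring destination and source to match. The function names are hypothetical, and the model ignores instruction encoding entirely.

```c
#include <stdint.h>

/* Model of the value LEA computes: base + (index << shift) + disp.
   shift is the log2 of the SIB scale factor (0..3).  */
static uint64_t lea_model(uint64_t base, uint64_t index,
                          unsigned shift, int32_t disp)
{
    return base + (index << shift) + (uint64_t)(int64_t)disp;
}

/* Left shift by n, for n in 1..3 only, expressed through LEA:
   n == 1 uses lea (a1,a1); otherwise lea 0(,a1,1<<n).  */
static uint64_t shl_via_lea(uint64_t a1, unsigned n)
{
    if (n == 1) {
        return lea_model(a1, a1, 0, 0);
    }
    return lea_model(0, a1, n, 0);
}
```

In the patch the guard is written as (a2 - 1) < 3. Assuming a2 is an unsigned TCGArg, a shift count of 0 wraps around to a large value and is excluded along with counts above 3.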




[Qemu-devel] [PULL 39/65] target-i386: Use clz and ctz opcodes

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/i386/helper.h |  2 --
 target/i386/int_helper.c | 11 ---
 target/i386/translate.c  | 31 ++-
 3 files changed, 14 insertions(+), 30 deletions(-)

diff --git a/target/i386/helper.h b/target/i386/helper.h
index bd9b2cf..4c1aaff 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -202,8 +202,6 @@ DEF_HELPER_FLAGS_3(xsetbv, TCG_CALL_NO_WG, void, env, i32, 
i64)
 DEF_HELPER_FLAGS_2(rdpkru, TCG_CALL_NO_WG, i64, env, i32)
 DEF_HELPER_FLAGS_3(wrpkru, TCG_CALL_NO_WG, void, env, i32, i64)
 
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(ctz, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(pdep, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(pext, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 
diff --git a/target/i386/int_helper.c b/target/i386/int_helper.c
index 9e873ac..4dc5c65 100644
--- a/target/i386/int_helper.c
+++ b/target/i386/int_helper.c
@@ -417,17 +417,6 @@ void helper_idivq_EAX(CPUX86State *env, target_ulong t0)
 # define clztl  clz64
 #endif
 
-/* bit operations */
-target_ulong helper_ctz(target_ulong t0)
-{
-return ctztl(t0);
-}
-
-target_ulong helper_clz(target_ulong t0)
-{
-return clztl(t0);
-}
-
 target_ulong helper_pdep(target_ulong src, target_ulong mask)
 {
 target_ulong dest = 0;
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 816d0b1..ce9ccb8 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6807,21 +6807,18 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 ? s->cpuid_ext3_features & CPUID_EXT3_ABM
 : s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_BMI1)) {
 int size = 8 << ot;
+/* For lzcnt/tzcnt, C bit is defined related to the input. */
 tcg_gen_mov_tl(cpu_cc_src, cpu_T0);
 if (b & 1) {
 /* For lzcnt, reduce the target_ulong result by the
number of zeros that we expect to find at the top.  */
-gen_helper_clz(cpu_T0, cpu_T0);
+tcg_gen_clzi_tl(cpu_T0, cpu_T0, TARGET_LONG_BITS);
 tcg_gen_subi_tl(cpu_T0, cpu_T0, TARGET_LONG_BITS - size);
 } else {
-/* For tzcnt, a zero input must return the operand size:
-   force all bits outside the operand size to 1.  */
-target_ulong mask = (target_ulong)-2 << (size - 1);
-tcg_gen_ori_tl(cpu_T0, cpu_T0, mask);
-gen_helper_ctz(cpu_T0, cpu_T0);
-}
-/* For lzcnt/tzcnt, C and Z bits are defined and are
-   related to the result.  */
+/* For tzcnt, a zero input must return the operand size.  */
+tcg_gen_ctzi_tl(cpu_T0, cpu_T0, size);
+}
+/* For lzcnt/tzcnt, Z bit is defined related to the result.  */
 gen_op_update1_cc();
 set_cc_op(s, CC_OP_BMILGB + ot);
 } else {
@@ -6829,20 +6826,20 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
to the input and not the result.  */
 tcg_gen_mov_tl(cpu_cc_dst, cpu_T0);
 set_cc_op(s, CC_OP_LOGICB + ot);
+
+/* ??? The manual says that the output is undefined when the
+   input is zero, but real hardware leaves it unchanged, and
+   real programs appear to depend on that.  Accomplish this
+   by passing the output as the value to return upon zero.  */
 if (b & 1) {
 /* For bsr, return the bit index of the first 1 bit,
not the count of leading zeros.  */
-gen_helper_clz(cpu_T0, cpu_T0);
+tcg_gen_xori_tl(cpu_T1, cpu_regs[reg], TARGET_LONG_BITS - 1);
+tcg_gen_clz_tl(cpu_T0, cpu_T0, cpu_T1);
 tcg_gen_xori_tl(cpu_T0, cpu_T0, TARGET_LONG_BITS - 1);
 } else {
-gen_helper_ctz(cpu_T0, cpu_T0);
+tcg_gen_ctz_tl(cpu_T0, cpu_T0, cpu_regs[reg]);
 }
-/* ??? The manual says that the output is undefined when the
-   input is zero, but real hardware leaves it unchanged, and
-   real programs appear to depend on that.  */
-tcg_gen_movi_tl(cpu_tmp0, 0);
-tcg_gen_movcond_tl(TCG_COND_EQ, cpu_T0, cpu_cc_dst, cpu_tmp0,
-   cpu_regs[reg], cpu_T0);
 }
 gen_op_mov_reg_v(ot, reg, cpu_T0);
 break;
-- 
2.9.3




[Qemu-devel] [PULL 65/65] tcg/i386: Handle ctpop opcode

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h |  5 +++--
 tcg/i386/tcg-target.inc.c | 12 +++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index b8f73f5..21d96ec 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -76,6 +76,7 @@ typedef enum {
 #endif
 
 extern bool have_bmi1;
+extern bool have_popcnt;
 
 /* optional instructions */
 #define TCG_TARGET_HAS_div2_i32 1
@@ -95,7 +96,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nor_i32  0
 #define TCG_TARGET_HAS_clz_i32  1
 #define TCG_TARGET_HAS_ctz_i32  1
-#define TCG_TARGET_HAS_ctpop_i32        0
+#define TCG_TARGET_HAS_ctpop_i32        have_popcnt
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 1
@@ -130,7 +131,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nor_i64  0
 #define TCG_TARGET_HAS_clz_i64  1
 #define TCG_TARGET_HAS_ctz_i64  1
-#define TCG_TARGET_HAS_ctpop_i64        0
+#define TCG_TARGET_HAS_ctpop_i64        have_popcnt
 #define TCG_TARGET_HAS_deposit_i64  1
 #define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 0
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 3650340..01177a9 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -130,9 +130,10 @@ static bool have_movbe;
 # define have_movbe 0
 #endif
 
-/* We need this symbol in tcg-target.h, and we can't properly conditionalize
+/* We need these symbols in tcg-target.h, and we can't properly conditionalize
it there.  Therefore we always define the variable.  */
 bool have_bmi1;
+bool have_popcnt;
 
 #if defined(CONFIG_CPUID_H) && defined(bit_BMI2)
 static bool have_bmi2;
@@ -337,6 +338,7 @@ static inline int tcg_target_const_match(tcg_target_long 
val, TCGType type,
 #define OPC_MOVZBL (0xb6 | P_EXT)
 #define OPC_MOVZWL (0xb7 | P_EXT)
 #define OPC_POP_r32    (0x58)
+#define OPC_POPCNT  (0xb8 | P_EXT | P_SIMDF3)
 #define OPC_PUSH_r32   (0x50)
 #define OPC_PUSH_Iv    (0x68)
 #define OPC_PUSH_Ib    (0x6a)
@@ -2083,6 +2085,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 OP_32_64(clz):
 tcg_out_clz(s, rexw, args[0], args[1], args[2], const_args[2]);
 break;
+OP_32_64(ctpop):
+tcg_out_modrm(s, OPC_POPCNT + rexw, a0, a1);
+break;
 
 case INDEX_op_brcond_i32:
 tcg_out_brcond32(s, a2, a0, a1, const_args[1], arg_label(args[3]), 0);
@@ -2398,6 +2403,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_extract_i32:
 case INDEX_op_extract_i64:
 case INDEX_op_sextract_i32:
+case INDEX_op_ctpop_i32:
+case INDEX_op_ctpop_i64:
 return &r_r;
 
 case INDEX_op_deposit_i32:
@@ -2602,6 +2609,9 @@ static void tcg_target_init(TCGContext *s)
need to probe for it.  */
 have_movbe = (c & bit_MOVBE) != 0;
 #endif
+#ifdef bit_POPCNT
+have_popcnt = (c & bit_POPCNT) != 0;
+#endif
 }
 
 if (max >= 7) {
-- 
2.9.3




[Qemu-devel] [PULL 50/65] tcg: Add helpers for clrsb

2017-01-10 Thread Richard Henderson
The number of actual invocations does not warrant an opcode,
nor the backends generating it.  But at least we can eliminate
redundant helpers.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg-runtime.c | 10 ++
 tcg/tcg-op.c  | 28 
 tcg/tcg-op.h  |  4 
 tcg/tcg-runtime.h |  2 ++
 4 files changed, 44 insertions(+)

diff --git a/tcg-runtime.c b/tcg-runtime.c
index eb3bade..c8b98df 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -121,6 +121,16 @@ uint64_t HELPER(ctz_i64)(uint64_t arg, uint64_t zero_val)
 return arg ? ctz64(arg) : zero_val;
 }
 
+uint32_t HELPER(clrsb_i32)(uint32_t arg)
+{
+return clrsb32(arg);
+}
+
+uint64_t HELPER(clrsb_i64)(uint64_t arg)
+{
+return clrsb64(arg);
+}
+
 void HELPER(exit_atomic)(CPUArchState *env)
 {
 cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 2b520c1..620e268 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -536,6 +536,20 @@ void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, 
uint32_t arg2)
 tcg_temp_free_i32(t);
 }
 
+void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg)
+{
+if (TCG_TARGET_HAS_clz_i32) {
+TCGv_i32 t = tcg_temp_new_i32();
+tcg_gen_sari_i32(t, arg, 31);
+tcg_gen_xor_i32(t, t, arg);
+tcg_gen_clzi_i32(t, t, 32);
+tcg_gen_subi_i32(ret, t, 1);
+tcg_temp_free_i32(t);
+} else {
+gen_helper_clrsb_i32(ret, arg);
+}
+}
+
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
 {
 if (TCG_TARGET_HAS_rot_i32) {
@@ -1846,6 +1860,20 @@ void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, 
uint64_t arg2)
 }
 }
 
+void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg)
+{
+if (TCG_TARGET_HAS_clz_i64 || TCG_TARGET_HAS_clz_i32) {
+TCGv_i64 t = tcg_temp_new_i64();
+tcg_gen_sari_i64(t, arg, 63);
+tcg_gen_xor_i64(t, t, arg);
+tcg_gen_clzi_i64(t, t, 64);
+tcg_gen_subi_i64(ret, t, 1);
+tcg_temp_free_i64(t);
+} else {
+gen_helper_clrsb_i64(ret, arg);
+}
+}
+
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 {
 if (TCG_TARGET_HAS_rot_i64) {
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 7a24e84..c2f3db9 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -290,6 +290,7 @@ void tcg_gen_clz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 
arg2);
 void tcg_gen_ctz_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_clzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
 void tcg_gen_ctzi_i32(TCGv_i32 ret, TCGv_i32 arg1, uint32_t arg2);
+void tcg_gen_clrsb_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, unsigned arg2);
 void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2);
@@ -477,6 +478,7 @@ void tcg_gen_clz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 
arg2);
 void tcg_gen_ctz_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_clzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
 void tcg_gen_ctzi_i64(TCGv_i64 ret, TCGv_i64 arg1, uint64_t arg2);
+void tcg_gen_clrsb_i64(TCGv_i64 ret, TCGv_i64 arg);
 void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, unsigned arg2);
 void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2);
@@ -970,6 +972,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, TCGv_i64, 
TCGArg, TCGMemOp);
 #define tcg_gen_ctz_tl tcg_gen_ctz_i64
 #define tcg_gen_clzi_tl tcg_gen_clzi_i64
 #define tcg_gen_ctzi_tl tcg_gen_ctzi_i64
+#define tcg_gen_clrsb_tl tcg_gen_clrsb_i64
 #define tcg_gen_rotl_tl tcg_gen_rotl_i64
 #define tcg_gen_rotli_tl tcg_gen_rotli_i64
 #define tcg_gen_rotr_tl tcg_gen_rotr_i64
@@ -1065,6 +1068,7 @@ void tcg_gen_atomic_xor_fetch_i64(TCGv_i64, TCGv, 
TCGv_i64, TCGArg, TCGMemOp);
 #define tcg_gen_ctz_tl tcg_gen_ctz_i32
 #define tcg_gen_clzi_tl tcg_gen_clzi_i32
 #define tcg_gen_ctzi_tl tcg_gen_ctzi_i32
+#define tcg_gen_clrsb_tl tcg_gen_clrsb_i32
 #define tcg_gen_rotl_tl tcg_gen_rotl_i32
 #define tcg_gen_rotli_tl tcg_gen_rotli_i32
 #define tcg_gen_rotr_tl tcg_gen_rotr_i32
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index eb1cd76..0d30f1a 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -19,6 +19,8 @@ DEF_HELPER_FLAGS_2(clz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
 DEF_HELPER_FLAGS_2(ctz_i32, TCG_CALL_NO_RWG_SE, i32, i32, i32)
 DEF_HELPER_FLAGS_2(clz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(ctz_i64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+DEF_HELPER_FLAGS_1(clrsb_i32, TCG_CALL_NO_RWG_SE, i32, i32)
+DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
-- 
2.9.3
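
For readers following the inline expansion in tcg_gen_clrsb_i32 above, here is a minimal host-side C sketch of the same four steps (sari 31, xor, clzi with 32, subi 1). The names clz32_or and clrsb32_sketch are hypothetical and not part of QEMU; clz32_or only models the "value for zero input" argument of tcg_gen_clzi_i32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for clzi: count leading zeros, returning
   zero_val when the input is zero. */
static uint32_t clz32_or(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading redundant sign bits, following the op sequence
   in the patch: sari 31, xor, clzi 32, subi 1. */
static uint32_t clrsb32_sketch(uint32_t arg)
{
    /* Arithmetic shift by 31: all ones if negative, else zero. */
    uint32_t sign = (arg & 0x80000000u) ? 0xffffffffu : 0u;
    /* After the XOR, the leading sign bits have become leading zeros. */
    uint32_t t = sign ^ arg;

    /* The sign bit itself is not redundant, hence the minus one. */
    return clz32_or(t, 32) - 1;
}
```

Both 0 and 0xffffffff consist only of sign bits, so the XOR yields zero, clzi yields 32, and the result is 31.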




[Qemu-devel] [PULL 32/65] target-ppc: Use clz and ctz opcodes

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/ppc/helper.h |  4 
 target/ppc/int_helper.c | 20 
 target/ppc/translate.c  | 20 
 3 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index da00f0a..1ed1d2c 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -38,16 +38,12 @@ DEF_HELPER_4(divde, i64, env, i64, i64, i32)
 DEF_HELPER_4(divweu, tl, env, tl, tl, i32)
 DEF_HELPER_4(divwe, tl, env, tl, tl, i32)
 
-DEF_HELPER_FLAGS_1(cntlzw, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(cnttzw, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntb, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(cmpb, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_3(sraw, tl, env, tl, tl)
 #if defined(TARGET_PPC64)
 DEF_HELPER_FLAGS_2(cmpeqb, TCG_CALL_NO_RWG_SE, i32, tl, tl)
-DEF_HELPER_FLAGS_1(cntlzd, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(cnttzd, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(popcntd, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_3(srad, tl, env, tl, tl)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 2d57c9a..e1bb695 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -141,16 +141,6 @@ uint64_t helper_divde(CPUPPCState *env, uint64_t rau, 
uint64_t rbu, uint32_t oe)
 #endif
 
 
-target_ulong helper_cntlzw(target_ulong t)
-{
-return clz32(t);
-}
-
-target_ulong helper_cnttzw(target_ulong t)
-{
-return ctz32(t);
-}
-
 #if defined(TARGET_PPC64)
 /* if x = 0xab, returns 0xababababababababa */
 #define pattern(x) (((x) & 0xff) * (~(target_ulong)0 / 0xff))
@@ -174,16 +164,6 @@ uint32_t helper_cmpeqb(target_ulong ra, target_ulong rb)
 #undef haszero
 #undef hasvalue
 
-target_ulong helper_cntlzd(target_ulong t)
-{
-return clz64(t);
-}
-
-target_ulong helper_cnttzd(target_ulong t)
-{
-return ctz64(t);
-}
-
 /* Return invalid random number.
  *
  * FIXME: Add rng backend or other mechanism to get cryptographically suitable
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 435c6f0..1224f56 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -1641,7 +1641,13 @@ static void gen_andis_(DisasContext *ctx)
 /* cntlzw */
 static void gen_cntlzw(DisasContext *ctx)
 {
-gen_helper_cntlzw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+TCGv_i32 t = tcg_temp_new_i32();
+
+tcg_gen_trunc_tl_i32(t, cpu_gpr[rS(ctx->opcode)]);
+tcg_gen_clzi_i32(t, t, 32);
+tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t);
+tcg_temp_free_i32(t);
+
 if (unlikely(Rc(ctx->opcode) != 0))
 gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
 }
@@ -1649,7 +1655,13 @@ static void gen_cntlzw(DisasContext *ctx)
 /* cnttzw */
 static void gen_cnttzw(DisasContext *ctx)
 {
-gen_helper_cnttzw(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+TCGv_i32 t = tcg_temp_new_i32();
+
+tcg_gen_trunc_tl_i32(t, cpu_gpr[rS(ctx->opcode)]);
+tcg_gen_ctzi_i32(t, t, 32);
+tcg_gen_extu_i32_tl(cpu_gpr[rA(ctx->opcode)], t);
+tcg_temp_free_i32(t);
+
 if (unlikely(Rc(ctx->opcode) != 0)) {
 gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
 }
@@ -1891,7 +1903,7 @@ GEN_LOGICAL1(extsw, tcg_gen_ext32s_tl, 0x1E, PPC_64B);
 /* cntlzd */
 static void gen_cntlzd(DisasContext *ctx)
 {
-gen_helper_cntlzd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+tcg_gen_clzi_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)], 64);
 if (unlikely(Rc(ctx->opcode) != 0))
 gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
 }
@@ -1899,7 +1911,7 @@ static void gen_cntlzd(DisasContext *ctx)
 /* cnttzd */
 static void gen_cnttzd(DisasContext *ctx)
 {
-gen_helper_cnttzd(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
+tcg_gen_ctzi_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)], 64);
 if (unlikely(Rc(ctx->opcode) != 0)) {
 gen_set_Rc0(ctx, cpu_gpr[rA(ctx->opcode)]);
 }
-- 
2.9.3
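
A rough C model of what the new gen_cntlzw and gen_cnttzw sequences compute on a 64-bit target may help when reviewing: the source register is truncated to 32 bits, counted with a zero-input result of 32, then zero-extended. The function names below are hypothetical, and clzi32/ctzi32 only model the immediate-default forms of the TCG ops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of tcg_gen_clzi_i32: zero_val for zero input. */
static uint32_t clzi32(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical model of tcg_gen_ctzi_i32: zero_val for zero input. */
static uint32_t ctzi32(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 1u)) {
        x >>= 1;
        n++;
    }
    return n;
}

static uint64_t cntlzw_sketch(uint64_t rs)
{
    uint32_t t = (uint32_t)rs;          /* trunc_tl_i32 */
    return (uint64_t)clzi32(t, 32);     /* clzi_i32, then extu_i32_tl */
}

static uint64_t cnttzw_sketch(uint64_t rs)
{
    uint32_t t = (uint32_t)rs;          /* trunc_tl_i32 */
    return (uint64_t)ctzi32(t, 32);     /* ctzi_i32, then extu_i32_tl */
}
```

The upper 32 bits of the source never influence the result, which matches the implicit truncation in the removed clz32()/ctz32() helpers.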




[Qemu-devel] [PULL 37/65] target-xtensa: Use clz opcode

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/xtensa/helper.h|  2 --
 target/xtensa/op_helper.c | 13 -
 target/xtensa/translate.c | 13 +++--
 3 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/target/xtensa/helper.h b/target/xtensa/helper.h
index 5ea9c5b..0c8adae 100644
--- a/target/xtensa/helper.h
+++ b/target/xtensa/helper.h
@@ -3,8 +3,6 @@ DEF_HELPER_3(exception_cause, noreturn, env, i32, i32)
 DEF_HELPER_4(exception_cause_vaddr, noreturn, env, i32, i32, i32)
 DEF_HELPER_3(debug_exception, noreturn, env, i32, i32)
 
-DEF_HELPER_FLAGS_1(nsa, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(nsau, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_2(wsr_windowbase, void, env, i32)
 DEF_HELPER_4(entry, void, env, i32, i32, i32)
 DEF_HELPER_2(retw, i32, env, i32)
diff --git a/target/xtensa/op_helper.c b/target/xtensa/op_helper.c
index 0a4b214..dc25625 100644
--- a/target/xtensa/op_helper.c
+++ b/target/xtensa/op_helper.c
@@ -161,19 +161,6 @@ void HELPER(debug_exception)(CPUXtensaState *env, uint32_t 
pc, uint32_t cause)
 HELPER(exception)(env, EXC_DEBUG);
 }
 
-uint32_t HELPER(nsa)(uint32_t v)
-{
-if (v & 0x80000000) {
-v = ~v;
-}
-return v ? clz32(v) - 1 : 31;
-}
-
-uint32_t HELPER(nsau)(uint32_t v)
-{
-return v ? clz32(v) : 32;
-}
-
 static void copy_window_from_phys(CPUXtensaState *env,
 uint32_t window, uint32_t phys, uint32_t n)
 {
diff --git a/target/xtensa/translate.c b/target/xtensa/translate.c
index 0858c29..5c719a4 100644
--- a/target/xtensa/translate.c
+++ b/target/xtensa/translate.c
@@ -1372,14 +1372,23 @@ static void disas_xtensa_insn(CPUXtensaState *env, 
DisasContext *dc)
 case 14: /*NSAu*/
 HAS_OPTION(XTENSA_OPTION_MISC_OP_NSA);
 if (gen_window_check2(dc, RRR_S, RRR_T)) {
-gen_helper_nsa(cpu_R[RRR_T], cpu_R[RRR_S]);
+TCGv_i32 t0 = tcg_temp_new_i32();
+
+/* if (v & 0x80000000) v = ~v; */
+tcg_gen_sari_i32(t0, cpu_R[RRR_S], 31);
+tcg_gen_xor_i32(t0, t0, cpu_R[RRR_S]);
+
+/* r = (v ? clz(v) : 32) - 1; */
+tcg_gen_clzi_i32(t0, t0, 32);
+tcg_gen_subi_i32(cpu_R[RRR_T], t0, 1);
+tcg_temp_free_i32(t0);
 }
 break;
 
 case 15: /*NSAUu*/
 HAS_OPTION(XTENSA_OPTION_MISC_OP_NSA);
 if (gen_window_check2(dc, RRR_S, RRR_T)) {
-gen_helper_nsau(cpu_R[RRR_T], cpu_R[RRR_S]);
+tcg_gen_clzi_i32(cpu_R[RRR_T], cpu_R[RRR_S], 32);
 }
 break;
 
-- 
2.9.3
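
To check that the inline NSA sequence agrees with the helper it replaces, the two can be compared in plain C. This is only a sketch: it assumes the mask in the removed helper tests the sign bit (0x80000000), and nsa_old, nsa_new and clz32_nonzero are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Leading-zero count for a non-zero input. */
static uint32_t clz32_nonzero(uint32_t x)
{
    uint32_t n = 0;

    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Shape of the removed helper. */
static uint32_t nsa_old(uint32_t v)
{
    if (v & 0x80000000u) {
        v = ~v;
    }
    return v ? clz32_nonzero(v) - 1 : 31;
}

/* Shape of the new inline sequence: sari 31, xor, clzi 32, subi 1. */
static uint32_t nsa_new(uint32_t v)
{
    uint32_t sign = (v & 0x80000000u) ? 0xffffffffu : 0u;
    uint32_t t = sign ^ v;
    uint32_t n = t ? clz32_nonzero(t) : 32;

    return n - 1;
}
```

XOR with the replicated sign bit is a branch-free way to write the conditional inversion, and folding the zero case into clzi's default of 32 gives 31 after the subtraction.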




[Qemu-devel] [PULL 58/65] target-sparc: Use ctpop helper

2017-01-10 Thread Richard Henderson
Acked-by: Mark Cave-Ayland 
Signed-off-by: Richard Henderson 
---
 target/sparc/helper.c| 5 -
 target/sparc/helper.h| 1 -
 target/sparc/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/sparc/helper.c b/target/sparc/helper.c
index 359b0b1..1d85489 100644
--- a/target/sparc/helper.c
+++ b/target/sparc/helper.c
@@ -49,11 +49,6 @@ void helper_debug(CPUSPARCState *env)
 }
 
 #ifdef TARGET_SPARC64
-target_ulong helper_popc(target_ulong val)
-{
-return ctpop64(val);
-}
-
 void helper_tick_set_count(void *opaque, uint64_t count)
 {
 #if !defined(CONFIG_USER_ONLY)
diff --git a/target/sparc/helper.h b/target/sparc/helper.h
index 0cf1bfb..3ef38b9 100644
--- a/target/sparc/helper.h
+++ b/target/sparc/helper.h
@@ -16,7 +16,6 @@ DEF_HELPER_2(wrccr, void, env, tl)
 DEF_HELPER_1(rdcwp, tl, env)
 DEF_HELPER_2(wrcwp, void, env, tl)
 DEF_HELPER_FLAGS_2(array8, TCG_CALL_NO_RWG_SE, tl, tl, tl)
-DEF_HELPER_FLAGS_1(popc, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(set_softint, TCG_CALL_NO_RWG, void, env, i64)
 DEF_HELPER_FLAGS_2(clear_softint, TCG_CALL_NO_RWG, void, env, i64)
 DEF_HELPER_FLAGS_2(write_softint, TCG_CALL_NO_RWG, void, env, i64)
diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 2205f89..ead585e 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -4647,7 +4647,7 @@ static void disas_sparc_insn(DisasContext * dc, unsigned 
int insn)
 gen_store_gpr(dc, rd, cpu_dst);
 break;
 case 0x2e: /* V9 popc */
-gen_helper_popc(cpu_dst, cpu_src2);
+tcg_gen_ctpop_tl(cpu_dst, cpu_src2);
 gen_store_gpr(dc, rd, cpu_dst);
 break;
 case 0x2f: /* V9 movr */
-- 
2.9.3




[Qemu-devel] [PULL 49/65] tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR

2017-01-10 Thread Richard Henderson
The ISA manual documents that the output is undefined if the input is zero.

However, we document in target-i386 that the behavior of real silicon
is to preserve the contents of the output register.  We also mention
that there are real applications that depend on this.  That this is
baked into silicon is mentioned as a potential cause for some false
sharing behaviour wrt lzcnt/tzcnt.

Taking advantage of this allows us to save 2 insns in the normal case,
and 4 insns for i686 emulating a 64-bit clz.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 3ed8cd1..3650340 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1146,9 +1146,12 @@ static void tcg_out_ctz(TCGContext *s, int rexw, TCGReg 
dest, TCGReg arg1,
 tcg_debug_assert(arg2 == (rexw ? 64 : 32));
 tcg_out_modrm(s, OPC_TZCNT + rexw, dest, arg1);
 } else {
-tcg_debug_assert(dest != arg2);
+/* ??? The manual says that the output is undefined when the
+   input is zero, but real hardware leaves it unchanged.  As
+   noted in target-i386/translate.c, real programs depend on
+   this -- now we are one more of those.  */
+tcg_debug_assert(dest == arg2);
 tcg_out_modrm(s, OPC_BSF + rexw, dest, arg1);
-tcg_out_cmov(s, TCG_COND_EQ, rexw, dest, arg2);
 }
 }
 
@@ -1161,20 +1164,26 @@ static void tcg_out_clz(TCGContext *s, int rexw, TCGReg 
dest, TCGReg arg1,
 tcg_debug_assert(arg2 == (rexw ? 64 : 32));
 } else {
 tcg_debug_assert(dest != arg2);
+/* LZCNT sets C if the input was zero.  */
 tcg_out_cmov(s, TCG_COND_LTU, rexw, dest, arg2);
 }
 } else {
-tcg_debug_assert(!const_a2);
-tcg_debug_assert(dest != arg1);
-tcg_debug_assert(dest != arg2);
+TCGType type = rexw ? TCG_TYPE_I64: TCG_TYPE_I32;
+TCGArg rev = rexw ? 63 : 31;
 
-/* Recall that the output of BSR is the index not the count.  */
+/* Recall that the output of BSR is the index not the count.
+   Therefore we must adjust the result by ^ (SIZE-1).  In some
+   cases below, we prefer an extra XOR to a JMP.  */
+/* ??? See the comment in tcg_out_ctz re BSF.  */
+if (const_a2) {
+tcg_debug_assert(dest != arg1);
+tcg_out_movi(s, type, dest, arg2 ^ rev);
+} else {
+tcg_debug_assert(dest == arg2);
+tgen_arithi(s, ARITH_XOR + rexw, dest, rev, 0);
+}
 tcg_out_modrm(s, OPC_BSR + rexw, dest, arg1);
-tgen_arithi(s, ARITH_XOR + rexw, dest, rexw ? 63 : 31, 0);
-
-/* Since we have destroyed the flags from BSR, we have to re-test.  */
-tcg_out_cmp(s, arg1, 0, 1, rexw);
-tcg_out_cmov(s, TCG_COND_EQ, rexw, dest, arg2);
+tgen_arithi(s, ARITH_XOR + rexw, dest, rev, 0);
 }
 }
 
@@ -2443,7 +2452,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_ctz_i64:
 {
 static const TCGTargetOpDef ctz[2] = {
-{ .args_ct_str = { "", "r", "r" } },
+{ .args_ct_str = { "r", "r", "0" } },
 { .args_ct_str = { "", "r", "rW" } },
 };
 return &ctz[have_bmi1];
@@ -2452,7 +2461,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_clz_i64:
 {
 static const TCGTargetOpDef clz[2] = {
-{ .args_ct_str = { "", "r", "r" } },
+{ .args_ct_str = { "", "r", "0i" } },
 { .args_ct_str = { "", "r", "rW" } },
 };
 return &clz[have_lzcnt];
-- 
2.9.3
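
The non-LZCNT path of tcg_out_clz above can be modelled in portable C to see why the preload trick works. This is a sketch only: bsr32_model encodes the behaviour the commit message attributes to real silicon (destination preserved on zero input), which the ISA manual does not guarantee, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of BSR as described in the commit message: a zero source
   leaves *dest untouched; otherwise *dest receives the index of the
   highest set bit. */
static void bsr32_model(uint32_t *dest, uint32_t src)
{
    if (src != 0) {
        uint32_t idx = 31;

        while (!(src & (1u << idx))) {
            idx--;
        }
        *dest = idx;
    }
}

/* clz with a caller-supplied result for zero input: preload dest with
   zero_val ^ 31, run BSR, then XOR with 31 to turn the bit index into
   a count.  For a zero input the two XORs cancel. */
static uint32_t clz32_via_bsr(uint32_t arg, uint32_t zero_val)
{
    uint32_t dest = zero_val ^ 31;

    bsr32_model(&dest, arg);
    return dest ^ 31;
}
```

No flags test or CMOV is needed after BSR, which is where the saved instructions in the commit message come from.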




[Qemu-devel] [PULL 29/65] target-microblaze: Use clz opcode

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/microblaze/helper.h| 1 -
 target/microblaze/op_helper.c | 5 -
 target/microblaze/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/microblaze/helper.h b/target/microblaze/helper.h
index bd13826..71a6c08 100644
--- a/target/microblaze/helper.h
+++ b/target/microblaze/helper.h
@@ -3,7 +3,6 @@ DEF_HELPER_1(debug, void, env)
 DEF_HELPER_FLAGS_3(carry, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
 DEF_HELPER_2(cmp, i32, i32, i32)
 DEF_HELPER_2(cmpu, i32, i32, i32)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i32, i32)
 
 DEF_HELPER_3(divs, i32, env, i32, i32)
 DEF_HELPER_3(divu, i32, env, i32, i32)
diff --git a/target/microblaze/op_helper.c b/target/microblaze/op_helper.c
index 4a856e6..1e07e21 100644
--- a/target/microblaze/op_helper.c
+++ b/target/microblaze/op_helper.c
@@ -145,11 +145,6 @@ uint32_t helper_cmpu(uint32_t a, uint32_t b)
 return t;
 }
 
-uint32_t helper_clz(uint32_t t0)
-{
-return clz32(t0);
-}
-
 uint32_t helper_carry(uint32_t a, uint32_t b, uint32_t cf)
 {
 return compute_carry(a, b, cf);
diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c
index de2090a..0bb6095 100644
--- a/target/microblaze/translate.c
+++ b/target/microblaze/translate.c
@@ -768,7 +768,7 @@ static void dec_bit(DisasContext *dc)
 t_gen_raise_exception(dc, EXCP_HW_EXCP);
 }
 if (dc->cpu->env.pvr.regs[2] & PVR2_USE_PCMP_INSTR) {
-gen_helper_clz(cpu_R[dc->rd], cpu_R[dc->ra]);
+tcg_gen_clzi_i32(cpu_R[dc->rd], cpu_R[dc->ra], 32);
 }
 break;
 case 0x1e0:
-- 
2.9.3




[Qemu-devel] [PULL 31/65] target-openrisc: Use clz and ctz opcodes

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/openrisc/helper.h |  2 --
 target/openrisc/int_helper.c | 19 ---
 target/openrisc/translate.c  |  6 --
 3 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/target/openrisc/helper.h b/target/openrisc/helper.h
index f53fa21..bcc7245 100644
--- a/target/openrisc/helper.h
+++ b/target/openrisc/helper.h
@@ -54,8 +54,6 @@ FOP_CMP(ge)
 #undef FOP_CMP
 
 /* int */
-DEF_HELPER_FLAGS_1(ff1, 0, tl, tl)
-DEF_HELPER_FLAGS_1(fl1, 0, tl, tl)
 DEF_HELPER_FLAGS_3(mul32, 0, i32, env, i32, i32)
 
 /* interrupt */
diff --git a/target/openrisc/int_helper.c b/target/openrisc/int_helper.c
index 4d1f958..ba0fd27 100644
--- a/target/openrisc/int_helper.c
+++ b/target/openrisc/int_helper.c
@@ -24,25 +24,6 @@
 #include "exception.h"
 #include "qemu/host-utils.h"
 
-target_ulong HELPER(ff1)(target_ulong x)
-{
-/*#ifdef TARGET_OPENRISC64
-return x ? ctz64(x) + 1 : 0;
-#else*/
-return x ? ctz32(x) + 1 : 0;
-/*#endif*/
-}
-
-target_ulong HELPER(fl1)(target_ulong x)
-{
-/* not used yet, open it when we need or64.  */
-/*#ifdef TARGET_OPENRISC64
-return 64 - clz64(x);
-#else*/
-return 32 - clz32(x);
-/*#endif*/
-}
-
 uint32_t HELPER(mul32)(CPUOpenRISCState *env,
uint32_t ra, uint32_t rb)
 {
diff --git a/target/openrisc/translate.c b/target/openrisc/translate.c
index 229361a..03fa7db 100644
--- a/target/openrisc/translate.c
+++ b/target/openrisc/translate.c
@@ -602,11 +602,13 @@ static void dec_calc(DisasContext *dc, uint32_t insn)
 switch (op1) {
 case 0x00:/* l.ff1 */
 LOG_DIS("l.ff1 r%d, r%d, r%d\n", rd, ra, rb);
-gen_helper_ff1(cpu_R[rd], cpu_R[ra]);
+tcg_gen_ctzi_tl(cpu_R[rd], cpu_R[ra], -1);
+tcg_gen_addi_tl(cpu_R[rd], cpu_R[rd], 1);
 break;
 case 0x01:/* l.fl1 */
 LOG_DIS("l.fl1 r%d, r%d, r%d\n", rd, ra, rb);
-gen_helper_fl1(cpu_R[rd], cpu_R[ra]);
+tcg_gen_clzi_tl(cpu_R[rd], cpu_R[ra], TARGET_LONG_BITS);
+tcg_gen_subfi_tl(cpu_R[rd], TARGET_LONG_BITS, cpu_R[rd]);
 break;
 
 default:
-- 
2.9.3
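
The l.ff1 and l.fl1 expansions above lean on the zero-input default of ctzi/clzi to get the "no bit set" result of 0 without a branch. A small C sketch of the 32-bit case follows; ctz32_or, clz32_or, ff1_sketch and fl1_sketch are hypothetical names, and the default-value helpers only model the TCG ops.

```c
#include <assert.h>
#include <stdint.h>

/* Trailing-zero count, returning zero_val for a zero input. */
static uint32_t ctz32_or(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 1u)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Leading-zero count, returning zero_val for a zero input. */
static uint32_t clz32_or(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* l.ff1: 1-based position of the lowest set bit, 0 if none.
   A default of -1 becomes 0 after the add. */
static uint32_t ff1_sketch(uint32_t x)
{
    return ctz32_or(x, (uint32_t)-1) + 1;
}

/* l.fl1: 1-based position of the highest set bit, 0 if none.
   A default of 32 becomes 0 after the reverse subtract. */
static uint32_t fl1_sketch(uint32_t x)
{
    return 32 - clz32_or(x, 32);
}
```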




[Qemu-devel] [PULL 54/65] tcg: Add opcode for ctpop

2017-01-10 Thread Richard Henderson
The number of actual invocations of ctpop itself does not warrant
an opcode, but it is very helpful for POWER7 to use in generating
an expansion for ctz.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg-runtime.c| 10 ++
 tcg/aarch64/tcg-target.h |  2 ++
 tcg/arm/tcg-target.h |  1 +
 tcg/i386/tcg-target.h|  2 ++
 tcg/ia64/tcg-target.h|  2 ++
 tcg/mips/tcg-target.h|  2 ++
 tcg/optimize.c   | 14 ++
 tcg/ppc/tcg-target.h |  2 ++
 tcg/s390/tcg-target.h|  2 ++
 tcg/sparc/tcg-target.h   |  2 ++
 tcg/tcg-op.c | 29 +
 tcg/tcg-op.h |  4 
 tcg/tcg-opc.h|  2 ++
 tcg/tcg-runtime.h|  2 ++
 tcg/tcg.h|  1 +
 tcg/tci/tcg-target.h |  2 ++
 16 files changed, 79 insertions(+)

diff --git a/tcg-runtime.c b/tcg-runtime.c
index c8b98df..4c60c96 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -131,6 +131,16 @@ uint64_t HELPER(clrsb_i64)(uint64_t arg)
 return clrsb64(arg);
 }
 
+uint32_t HELPER(ctpop_i32)(uint32_t arg)
+{
+return ctpop32(arg);
+}
+
+uint64_t HELPER(ctpop_i64)(uint64_t arg)
+{
+return ctpop64(arg);
+}
+
 void HELPER(exit_atomic)(CPUArchState *env)
 {
 cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 9d6b00f..1a5ea23 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -64,6 +64,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nor_i32  0
 #define TCG_TARGET_HAS_clz_i32  1
 #define TCG_TARGET_HAS_ctz_i32  1
+#define TCG_TARGET_HAS_ctpop_i32  0
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 1
@@ -98,6 +99,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nor_i64  0
 #define TCG_TARGET_HAS_clz_i64  1
 #define TCG_TARGET_HAS_ctz_i64  1
+#define TCG_TARGET_HAS_ctpop_i64  0
 #define TCG_TARGET_HAS_deposit_i64  1
 #define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 4cb94dc..09a19c6 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -112,6 +112,7 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_nor_i32  0
 #define TCG_TARGET_HAS_clz_i32  use_armv5t_instructions
 #define TCG_TARGET_HAS_ctz_i32  use_armv7_instructions
+#define TCG_TARGET_HAS_ctpop_i32  0
 #define TCG_TARGET_HAS_deposit_i32  use_armv7_instructions
 #define TCG_TARGET_HAS_extract_i32  use_armv7_instructions
 #define TCG_TARGET_HAS_sextract_i32 use_armv7_instructions
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 8fff287..b8f73f5 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -95,6 +95,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nor_i32  0
 #define TCG_TARGET_HAS_clz_i32  1
 #define TCG_TARGET_HAS_ctz_i32  1
+#define TCG_TARGET_HAS_ctpop_i32  0
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 1
@@ -129,6 +130,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nor_i64  0
 #define TCG_TARGET_HAS_clz_i64  1
 #define TCG_TARGET_HAS_ctz_i64  1
+#define TCG_TARGET_HAS_ctpop_i64  0
 #define TCG_TARGET_HAS_deposit_i64  1
 #define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 0
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 9a829ae..42aea03 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -144,6 +144,8 @@ typedef enum {
 #define TCG_TARGET_HAS_clz_i64  0
 #define TCG_TARGET_HAS_ctz_i32  0
 #define TCG_TARGET_HAS_ctz_i64  0
+#define TCG_TARGET_HAS_ctpop_i32  0
+#define TCG_TARGET_HAS_ctpop_i64  0
 #define TCG_TARGET_HAS_nor_i64  1
 #define TCG_TARGET_HAS_orc_i32  1
 #define TCG_TARGET_HAS_orc_i64  1
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index a680f16..f46d64a 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -165,6 +165,7 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_rot_i32  use_mips32r2_instructions
 #define TCG_TARGET_HAS_clz_i32  use_mips32r2_instructions
 #define TCG_TARGET_HAS_ctz_i32  0
+#define TCG_TARGET_HAS_ctpop_i32  0
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_movcond_i64  use_movnz_instructions
@@ -179,6 +180,7 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_rot_i64  use_mips32r2_instructions
 #define TCG_TARGET_HAS_clz_i64  use_mips32r2_instructions
 #define TCG_TARGET_HAS_ctz_i64  0
+#define TCG_TARGET_HAS_ctpop_i64  0
 #endif
 
 /* optional instructions automatically implemented */
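
The commit message for this patch mentions using ctpop to build an expansion for ctz. One well-known identity that does this is ctz(x) = ctpop((x - 1) & ~x); the sketch below illustrates that identity in plain C and is not claimed to be the exact sequence any backend emits. The names ctpop32_sketch and ctz32_via_ctpop are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Population count by clearing the lowest set bit until none remain. */
static uint32_t ctpop32_sketch(uint32_t x)
{
    uint32_t n = 0;

    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* (x - 1) & ~x sets exactly the bits below the lowest set bit of x,
   so counting them gives the number of trailing zeros.  For x == 0
   the mask is all ones and the result is 32. */
static uint32_t ctz32_via_ctpop(uint32_t x)
{
    return ctpop32_sketch((x - 1) & ~x);
}
```

The zero-input result of 32 comes out of the identity itself, with no special case.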

[Qemu-devel] [PULL 45/65] tcg/i386: Fully convert tcg_target_op_def

2017-01-10 Thread Richard Henderson
Use a switch instead of searching a table.  Share constraints between
32-bit and 64-bit, when at all possible.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 340 +++---
 1 file changed, 198 insertions(+), 142 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index aa5a248..e497bef 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -237,13 +237,13 @@ static const char 
*target_parse_constraint(TCGArgConstraint *ct,
 break;
 
 case 'e':
-ct->ct |= TCG_CT_CONST_S32;
+ct->ct |= (type == TCG_TYPE_I32 ? TCG_CT_CONST : TCG_CT_CONST_S32);
 break;
 case 'Z':
-ct->ct |= TCG_CT_CONST_U32;
+ct->ct |= (type == TCG_TYPE_I32 ? TCG_CT_CONST : TCG_CT_CONST_U32);
 break;
 case 'I':
-ct->ct |= TCG_CT_CONST_I32;
+ct->ct |= (type == TCG_TYPE_I32 ? TCG_CT_CONST : TCG_CT_CONST_I32);
 break;
 
 default:
@@ -2188,152 +2188,208 @@ static inline void tcg_out_op(TCGContext *s, 
TCGOpcode opc,
 #undef OP_32_64
 }
 
-static const TCGTargetOpDef x86_op_defs[] = {
-{ INDEX_op_exit_tb, { } },
-{ INDEX_op_goto_tb, { } },
-{ INDEX_op_br, { } },
-{ INDEX_op_ld8u_i32, { "r", "r" } },
-{ INDEX_op_ld8s_i32, { "r", "r" } },
-{ INDEX_op_ld16u_i32, { "r", "r" } },
-{ INDEX_op_ld16s_i32, { "r", "r" } },
-{ INDEX_op_ld_i32, { "r", "r" } },
-{ INDEX_op_st8_i32, { "qi", "r" } },
-{ INDEX_op_st16_i32, { "ri", "r" } },
-{ INDEX_op_st_i32, { "ri", "r" } },
-
-{ INDEX_op_add_i32, { "r", "r", "ri" } },
-{ INDEX_op_sub_i32, { "r", "0", "ri" } },
-{ INDEX_op_mul_i32, { "r", "0", "ri" } },
-{ INDEX_op_div2_i32, { "a", "d", "0", "1", "r" } },
-{ INDEX_op_divu2_i32, { "a", "d", "0", "1", "r" } },
-{ INDEX_op_and_i32, { "r", "0", "ri" } },
-{ INDEX_op_or_i32, { "r", "0", "ri" } },
-{ INDEX_op_xor_i32, { "r", "0", "ri" } },
-{ INDEX_op_andc_i32, { "r", "r", "ri" } },
-
-{ INDEX_op_shl_i32, { "r", "0", "Ci" } },
-{ INDEX_op_shr_i32, { "r", "0", "Ci" } },
-{ INDEX_op_sar_i32, { "r", "0", "Ci" } },
-{ INDEX_op_rotl_i32, { "r", "0", "ci" } },
-{ INDEX_op_rotr_i32, { "r", "0", "ci" } },
-
-{ INDEX_op_brcond_i32, { "r", "ri" } },
-
-{ INDEX_op_bswap16_i32, { "r", "0" } },
-{ INDEX_op_bswap32_i32, { "r", "0" } },
-
-{ INDEX_op_neg_i32, { "r", "0" } },
-
-{ INDEX_op_not_i32, { "r", "0" } },
-
-{ INDEX_op_ext8s_i32, { "r", "q" } },
-{ INDEX_op_ext16s_i32, { "r", "r" } },
-{ INDEX_op_ext8u_i32, { "r", "q" } },
-{ INDEX_op_ext16u_i32, { "r", "r" } },
-
-{ INDEX_op_setcond_i32, { "q", "r", "ri" } },
-
-{ INDEX_op_deposit_i32, { "Q", "0", "Q" } },
-{ INDEX_op_extract_i32, { "r", "r" } },
-{ INDEX_op_sextract_i32, { "r", "r" } },
-
-{ INDEX_op_movcond_i32, { "r", "r", "ri", "r", "0" } },
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+static const TCGTargetOpDef ri_r = { .args_ct_str = { "ri", "r" } };
+static const TCGTargetOpDef re_r = { .args_ct_str = { "re", "r" } };
+static const TCGTargetOpDef qi_r = { .args_ct_str = { "qi", "r" } };
+static const TCGTargetOpDef r_r = { .args_ct_str = { "r", "r" } };
+static const TCGTargetOpDef r_q = { .args_ct_str = { "r", "q" } };
+static const TCGTargetOpDef r_re = { .args_ct_str = { "r", "re" } };
+static const TCGTargetOpDef r_0 = { .args_ct_str = { "r", "0" } };
+static const TCGTargetOpDef r_r_re = { .args_ct_str = { "r", "r", "re" } };
+static const TCGTargetOpDef r_0_re = { .args_ct_str = { "r", "0", "re" } };
+static const TCGTargetOpDef r_0_Ci = { .args_ct_str = { "r", "0", "Ci" } };
+static const TCGTargetOpDef r_0_ci = { .args_ct_str = { "r", "0", "ci" } };
+static const TCGTargetOpDef r_L = { .args_ct_str = { "r", "L" } };
+static const TCGTargetOpDef L_L = { .args_ct_str = { "L", "L" } };
+static const TCGTargetOpDef r_L_L = { .args_ct_str = { "r", "L", "L" } };
+static const TCGTargetOpDef r_r_L = { .args_ct_str = { "r", "r", "L" } };
+static const TCGTargetOpDef L_L_L = { .args_ct_str = { "L", "L", "L" } };
+static const TCGTargetOpDef r_r_L_L
+= { .args_ct_str = { "r", "r", "L", "L" } };
+static const TCGTargetOpDef L_L_L_L
+= { .args_ct_str = { "L", "L", "L", "L" } };
+
+switch (op) {
+case INDEX_op_ld8u_i32:
+case INDEX_op_ld8u_i64:
+case INDEX_op_ld8s_i32:
+case INDEX_op_ld8s_i64:
+case INDEX_op_ld16u_i32:
+case INDEX_op_ld16u_i64:
+case INDEX_op_ld16s_i32:
+case INDEX_op_ld16s_i64:
+case INDEX_op_ld_i32:
+case INDEX_op_ld32u_i64:
+case INDEX_op_ld32s_i64:
+case INDEX_op_ld_i64:
+return &r_r;
 
-{ INDEX_op_mulu2_i32, { "a", "d", "a", "r" } },
-{ INDEX_op_muls2_i32, { "a", "d", "a", "r" } },
-{ INDEX_op_add2_i32, { "r", "r", "0", "1", "ri", "ri" } },
-{ 
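
The shape of the conversion described in the commit message (a switch in place of a table search, with constraint sets shared between 32-bit and 64-bit opcodes) can be sketched in isolation as follows. Everything here is hypothetical and simplified: SketchOpcode, SketchOpDef and sketch_op_def are illustrative stand-ins, not the QEMU types, and the opcode-to-constraint pairing is only an example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced opcode set. */
typedef enum {
    OP_LD_I32,
    OP_LD_I64,
    OP_NEG_I32,
    OP_NEG_I64,
    OP_UNKNOWN
} SketchOpcode;

/* Hypothetical constraint record: one string per operand. */
typedef struct {
    const char *args_ct_str[4];
} SketchOpDef;

static const SketchOpDef *sketch_op_def(SketchOpcode op)
{
    /* Each constraint set is defined once and shared by every
       opcode that needs it. */
    static const SketchOpDef r_r = { .args_ct_str = { "r", "r" } };
    static const SketchOpDef r_0 = { .args_ct_str = { "r", "0" } };

    switch (op) {
    case OP_LD_I32:
    case OP_LD_I64:
        return &r_r;
    case OP_NEG_I32:
    case OP_NEG_I64:
        return &r_0;
    default:
        return NULL;    /* opcode not supported in this sketch */
    }
}
```

Because the 32-bit and 64-bit cases fall through to the same return, both widths resolve to the same object, and no linear scan over a table is needed.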

[Qemu-devel] [PULL 52/65] target-tricore: Use clrsb helper

2017-01-10 Thread Richard Henderson
Tested-by: Bastian Koppelmann 
Reviewed-by: Bastian Koppelmann 
Signed-off-by: Richard Henderson 
---
 target/tricore/helper.h| 1 -
 target/tricore/op_helper.c | 5 -
 target/tricore/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/tricore/helper.h b/target/tricore/helper.h
index 2cf04e1..d215349 100644
--- a/target/tricore/helper.h
+++ b/target/tricore/helper.h
@@ -89,7 +89,6 @@ DEF_HELPER_FLAGS_2(ixmin_u, TCG_CALL_NO_RWG_SE, i64, i64, i32)
 /* count leading ... */
 DEF_HELPER_FLAGS_1(clo_h, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(clz_h, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(cls, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(cls_h, TCG_CALL_NO_RWG_SE, i32, i32)
 /* sh */
 DEF_HELPER_FLAGS_2(sh, TCG_CALL_NO_RWG_SE, i32, i32, i32)
diff --git a/target/tricore/op_helper.c b/target/tricore/op_helper.c
index 3731d5e..7af202c 100644
--- a/target/tricore/op_helper.c
+++ b/target/tricore/op_helper.c
@@ -1769,11 +1769,6 @@ uint32_t helper_clz_h(target_ulong r1)
 return ret_hw0 | (ret_hw1 << 16);
 }
 
-uint32_t helper_cls(target_ulong r1)
-{
-return clrsb32(r1);
-}
-
 uint32_t helper_cls_h(target_ulong r1)
 {
 uint32_t ret_hw0 = extract32(r1, 0, 16);
diff --git a/target/tricore/translate.c b/target/tricore/translate.c
index 69cdfb9..41b1d27 100644
--- a/target/tricore/translate.c
+++ b/target/tricore/translate.c
@@ -6374,7 +6374,7 @@ static void decode_rr_logical_shift(CPUTriCoreState *env, 
DisasContext *ctx)
 gen_helper_clo_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
 break;
 case OPC2_32_RR_CLS:
-gen_helper_cls(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+tcg_gen_clrsb_tl(cpu_gpr_d[r3], cpu_gpr_d[r1]);
 break;
 case OPC2_32_RR_CLS_H:
 gen_helper_cls_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
-- 
2.9.3




[Qemu-devel] [PULL 25/65] disas/i386.c: Handle tzcnt

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 disas/i386.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/disas/i386.c b/disas/i386.c
index 57145d0..07f871f 100644
--- a/disas/i386.c
+++ b/disas/i386.c
@@ -682,6 +682,7 @@ fetch_data(struct disassemble_info *info, bfd_byte *addr)
 #define PREGRP104 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 104 } }
 #define PREGRP105 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 105 } }
 #define PREGRP106 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 106 } }
+#define PREGRP107 NULL, { { NULL, USE_PREFIX_USER_TABLE }, { NULL, 107 } }
 
 #define X86_64_0  NULL, { { NULL, X86_64_SPECIAL }, { NULL, 0 } }
 #define X86_64_1  NULL, { { NULL, X86_64_SPECIAL }, { NULL, 1 } }
@@ -1247,7 +1248,7 @@ static const struct dis386 dis386_twobyte[] = {
   { "ud2b",{ XX } },
   { GRP8 },
   { "btcS",{ Ev, Gv } },
-  { "bsfS",{ Gv, Ev } },
+  { PREGRP107 },
   { PREGRP36 },
   { "movs{bR|x|bR|x}", { Gv, Eb } },
   { "movs{wR|x|wR|x}", { Gv, Ew } }, /* yes, there really is movsww ! */
@@ -1431,7 +1432,7 @@ static const unsigned char twobyte_uses_REPZ_prefix[256] 
= {
   /* 80 */ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 8f */
   /* 90 */ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* 9f */
   /* a0 */ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, /* af */
-  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0, /* bf */
+  /* b0 */ 0,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0, /* bf */
   /* c0 */ 0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0, /* cf */
   /* d0 */ 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0, /* df */
   /* e0 */ 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0, /* ef */
@@ -2800,6 +2801,13 @@ static const struct dis386 prefix_user_table[][4] = {
 { "shrxS", { Gv, Ev, Bv } },
   },
 
+  /* PREGRP107 */
+  {
+{ "bsfS",  { Gv, Ev } },
+{ "tzcntS",{ Gv, Ev } },
+{ "bsfS",  { Gv, Ev } },
+{ "(bad)", { XX } },
+  },
 };
 
 static const struct dis386 x86_64_table[][2] = {
-- 
2.9.3
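
As a reading aid for the PREGRP107 table above, here is a minimal sketch of how a four-entry prefix table could select between bsf and tzcnt for the 0F BC opcode. It is not the code in disas/i386.c: the PFX_* flags, prefix_index() and decode_0f_bc() are hypothetical names, the operand-size "S" suffix is dropped, and the index order (no prefix, F3, 66, F2) is an assumption inferred from the entries in the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical prefix flags; these are not the real disas/i386.c constants. */
enum { PFX_NONE = 0, PFX_REPZ = 1, PFX_DATA = 2, PFX_REPNZ = 4 };

/* Same mnemonics, in the same order, as the PREGRP107 entries in the patch. */
static const char *const pregrp107_names[4] = {
    "bsf", "tzcnt", "bsf", "(bad)"
};

/* Assumed index order: 0 = no prefix, 1 = F3, 2 = 66, 3 = F2. */
static int prefix_index(int prefixes)
{
    if (prefixes & PFX_REPZ) {
        return 1;
    }
    if (prefixes & PFX_REPNZ) {
        return 3;
    }
    if (prefixes & PFX_DATA) {
        return 2;
    }
    return 0;
}

/* Pick the mnemonic for opcode 0F BC given the prefixes that were seen. */
static const char *decode_0f_bc(int prefixes)
{
    return pregrp107_names[prefix_index(prefixes)];
}

/* Small self-check of the selection logic. */
static void check_pregrp107(void)
{
    assert(strcmp(decode_0f_bc(PFX_NONE), "bsf") == 0);
    assert(strcmp(decode_0f_bc(PFX_REPZ), "tzcnt") == 0);
}
```

Under these assumptions an F3 prefix on 0F BC reads as tzcnt, which is why the patch also sets the 0xbc slot in twobyte_uses_REPZ_prefix.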




[Qemu-devel] [PULL 30/65] target-mips: Use clz opcode

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/mips/helper.h|  7 ---
 target/mips/op_helper.c | 22 --
 target/mips/translate.c | 23 ---
 3 files changed, 16 insertions(+), 36 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index 666936c..60efa01 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -20,13 +20,6 @@ DEF_HELPER_4(scd, tl, env, tl, tl, int)
 #endif
 #endif
 
-DEF_HELPER_FLAGS_1(clo, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, tl, tl)
-#ifdef TARGET_MIPS64
-DEF_HELPER_FLAGS_1(dclo, TCG_CALL_NO_RWG_SE, tl, tl)
-DEF_HELPER_FLAGS_1(dclz, TCG_CALL_NO_RWG_SE, tl, tl)
-#endif
-
 DEF_HELPER_3(muls, tl, env, tl, tl)
 DEF_HELPER_3(mulsu, tl, env, tl, tl)
 DEF_HELPER_3(macc, tl, env, tl, tl)
diff --git a/target/mips/op_helper.c b/target/mips/op_helper.c
index 7af4c2f..11d781f 100644
--- a/target/mips/op_helper.c
+++ b/target/mips/op_helper.c
@@ -103,28 +103,6 @@ HELPER_ST(sd, stq, uint64_t)
 #endif
 #undef HELPER_ST
 
-target_ulong helper_clo (target_ulong arg1)
-{
-return clo32(arg1);
-}
-
-target_ulong helper_clz (target_ulong arg1)
-{
-return clz32(arg1);
-}
-
-#if defined(TARGET_MIPS64)
-target_ulong helper_dclo (target_ulong arg1)
-{
-return clo64(arg1);
-}
-
-target_ulong helper_dclz (target_ulong arg1)
-{
-return clz64(arg1);
-}
-#endif /* TARGET_MIPS64 */
-
 /* 64 bits arithmetic for 32 bits hosts */
 static inline uint64_t get_HILO(CPUMIPSState *env)
 {
diff --git a/target/mips/translate.c b/target/mips/translate.c
index 8deffa1..7f8ecf4 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -3626,29 +3626,38 @@ static void gen_cl (DisasContext *ctx, uint32_t opc,
 /* Treat as NOP. */
 return;
 }
-t0 = tcg_temp_new();
+t0 = cpu_gpr[rd];
 gen_load_gpr(t0, rs);
+
 switch (opc) {
 case OPC_CLO:
 case R6_OPC_CLO:
-gen_helper_clo(cpu_gpr[rd], t0);
+#if defined(TARGET_MIPS64)
+case OPC_DCLO:
+case R6_OPC_DCLO:
+#endif
+tcg_gen_not_tl(t0, t0);
 break;
+}
+
+switch (opc) {
+case OPC_CLO:
+case R6_OPC_CLO:
 case OPC_CLZ:
 case R6_OPC_CLZ:
-gen_helper_clz(cpu_gpr[rd], t0);
+tcg_gen_ext32u_tl(t0, t0);
+tcg_gen_clzi_tl(t0, t0, TARGET_LONG_BITS);
+tcg_gen_subi_tl(t0, t0, TARGET_LONG_BITS - 32);
 break;
 #if defined(TARGET_MIPS64)
 case OPC_DCLO:
 case R6_OPC_DCLO:
-gen_helper_dclo(cpu_gpr[rd], t0);
-break;
 case OPC_DCLZ:
 case R6_OPC_DCLZ:
-gen_helper_dclz(cpu_gpr[rd], t0);
+tcg_gen_clzi_i64(t0, t0, 64);
 break;
 #endif
 }
-tcg_temp_free(t0);
 }
 
 /* Godson integer instructions */
-- 
2.9.3
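
The arithmetic behind the new gen_cl() sequence can be checked on the host. What follows is a minimal sketch rather than QEMU code: clz64_ref(), clz32_via_clz64() and clo32_via_clz64() are hypothetical names, and a 64-bit target (TARGET_LONG_BITS == 64) is assumed. It mirrors the not / ext32u / clzi / subi steps in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference 64-bit count-leading-zeros that returns zero_val for a zero
   input, i.e. the "t0 = t1 ? clz(t1) : t2" semantics of the clz opcode. */
static uint64_t clz64_ref(uint64_t x, uint64_t zero_val)
{
    uint64_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & (UINT64_C(1) << 63))) {
        x <<= 1;
        n++;
    }
    return n;
}

/* 32-bit CLZ on a 64-bit register, following the CLZ path of the patch. */
static uint32_t clz32_via_clz64(uint32_t x)
{
    uint64_t t = (uint64_t)x;           /* tcg_gen_ext32u_tl */
    t = clz64_ref(t, 64);               /* tcg_gen_clzi_tl(.., 64) */
    return (uint32_t)(t - (64 - 32));   /* tcg_gen_subi_tl */
}

/* CLO is CLZ of the complement: tcg_gen_not_tl, then the CLZ path. */
static uint32_t clo32_via_clz64(uint32_t x)
{
    return clz32_via_clz64(~x);
}

/* Small self-check of the boundary cases. */
static void check_mips_cl(void)
{
    assert(clz32_via_clz64(0) == 32);
    assert(clo32_via_clz64(0xffffffffu) == 32);
}
```

The zero-extension matters: without it the upper 32 bits left behind by the NOT would be counted, and the final subtraction would no longer give a 32-bit result.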




[Qemu-devel] [PULL 41/65] tcg/aarch64: Handle ctz and clz opcodes

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h |  8 
 tcg/aarch64/tcg-target.inc.c | 48 
 2 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 976f493..9d6b00f 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -62,8 +62,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32  1
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
-#define TCG_TARGET_HAS_clz_i32  0
-#define TCG_TARGET_HAS_ctz_i32  0
+#define TCG_TARGET_HAS_clz_i32  1
+#define TCG_TARGET_HAS_ctz_i32  1
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 1
@@ -96,8 +96,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64  1
 #define TCG_TARGET_HAS_nand_i64 0
 #define TCG_TARGET_HAS_nor_i64  0
-#define TCG_TARGET_HAS_clz_i64  0
-#define TCG_TARGET_HAS_ctz_i64  0
+#define TCG_TARGET_HAS_clz_i64  1
+#define TCG_TARGET_HAS_ctz_i64  1
 #define TCG_TARGET_HAS_deposit_i64  1
 #define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 1
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 17c0b20..585b0d6 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -339,8 +339,12 @@ typedef enum {
 /* Conditional select instructions.  */
 I3506_CSEL  = 0x1a800000,
 I3506_CSINC = 0x1a800400,
+I3506_CSINV = 0x5a800000,
+I3506_CSNEG = 0x5a800400,
 
 /* Data-processing (1 source) instructions.  */
+I3507_CLZ   = 0x5ac01000,
+I3507_RBIT  = 0x5ac00000,
 I3507_REV16 = 0x5ac00400,
 I3507_REV32 = 0x5ac00800,
 I3507_REV64 = 0x5ac00c00,
@@ -993,6 +997,37 @@ static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
 tcg_out32(s, sync[a0 & TCG_MO_ALL]);
 }
 
+static void tcg_out_cltz(TCGContext *s, TCGType ext, TCGReg d,
+ TCGReg a0, TCGArg b, bool const_b, bool is_ctz)
+{
+TCGReg a1 = a0;
+if (is_ctz) {
+a1 = TCG_REG_TMP;
+tcg_out_insn(s, 3507, RBIT, ext, a1, a0);
+}
+if (const_b && b == (ext ? 64 : 32)) {
+tcg_out_insn(s, 3507, CLZ, ext, d, a1);
+} else {
+AArch64Insn sel = I3506_CSEL;
+
+tcg_out_cmp(s, ext, a0, 0, 1);
+tcg_out_insn(s, 3507, CLZ, ext, TCG_REG_TMP, a1);
+
+if (const_b) {
+if (b == -1) {
+b = TCG_REG_XZR;
+sel = I3506_CSINV;
+} else if (b == 0) {
+b = TCG_REG_XZR;
+} else {
+tcg_out_movi(s, ext, d, b);
+b = d;
+}
+}
+tcg_out_insn_3506(s, sel, ext, d, TCG_REG_TMP, b, TCG_COND_NE);
+}
+}
+
 #ifdef CONFIG_SOFTMMU
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  * TCGMemOpIdx oi, uintptr_t ra)
@@ -1559,6 +1594,15 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
+case INDEX_op_clz_i64:
+case INDEX_op_clz_i32:
+tcg_out_cltz(s, ext, a0, a1, a2, c2, false);
+break;
+case INDEX_op_ctz_i64:
+case INDEX_op_ctz_i32:
+tcg_out_cltz(s, ext, a0, a1, a2, c2, true);
+break;
+
 case INDEX_op_brcond_i32:
 a1 = (int32_t)a1;
 /* FALLTHRU */
@@ -1750,11 +1794,15 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
 { INDEX_op_sar_i32, { "r", "r", "ri" } },
 { INDEX_op_rotl_i32, { "r", "r", "ri" } },
 { INDEX_op_rotr_i32, { "r", "r", "ri" } },
+{ INDEX_op_clz_i32, { "r", "r", "rAL" } },
+{ INDEX_op_ctz_i32, { "r", "r", "rAL" } },
 { INDEX_op_shl_i64, { "r", "r", "ri" } },
 { INDEX_op_shr_i64, { "r", "r", "ri" } },
 { INDEX_op_sar_i64, { "r", "r", "ri" } },
 { INDEX_op_rotl_i64, { "r", "r", "ri" } },
 { INDEX_op_rotr_i64, { "r", "r", "ri" } },
+{ INDEX_op_clz_i64, { "r", "r", "rAL" } },
+{ INDEX_op_ctz_i64, { "r", "r", "rAL" } },
 
 { INDEX_op_brcond_i32, { "r", "rA" } },
 { INDEX_op_brcond_i64, { "r", "rA" } },
-- 
2.9.3
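
For readers unfamiliar with the RBIT + CLZ idiom used by tcg_out_cltz(), here is a minimal host-side sketch of the 32-bit case. It is an illustration only: rbit32(), clz32_hw() and ctz32_sketch() are hypothetical names, and the bit loops stand in for the AArch64 instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of a 32-bit value (stand-in for RBIT). */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;

    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* Count leading zeros, returning the operand width (32) for a zero input
   (stand-in for CLZ). */
static uint32_t clz32_hw(uint32_t x)
{
    uint32_t n = 0;

    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* ctz_i32 t0, t1, t2:  t0 = t1 ? ctz(t1) : t2 */
static uint32_t ctz32_sketch(uint32_t a, uint32_t zero_val)
{
    uint32_t counted = clz32_hw(rbit32(a));   /* RBIT, then CLZ */

    return a != 0 ? counted : zero_val;       /* conditional select on a != 0 */
}

/* Small self-check of the idiom. */
static void check_ctz_sketch(void)
{
    assert(ctz32_sketch(8, 99) == 3);
    assert(ctz32_sketch(0, 99) == 99);
}
```

When the fallback value equals the operand width, the select is redundant because the count already yields that value for zero. This matches the case in which the patch emits a bare CLZ with no compare.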




[Qemu-devel] [PULL 21/65] tcg: Transition flat op_defs array to a target callback

2017-01-10 Thread Richard Henderson
This will allow the target to tailor the constraints to the
auto-detected ISA extensions.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 14 ++--
 tcg/arm/tcg-target.inc.c | 14 ++--
 tcg/i386/tcg-target.inc.c| 14 ++--
 tcg/ia64/tcg-target.inc.c| 14 ++--
 tcg/mips/tcg-target.inc.c| 14 ++--
 tcg/ppc/tcg-target.inc.c | 14 ++--
 tcg/s390/tcg-target.inc.c| 14 ++--
 tcg/sparc/tcg-target.inc.c   | 14 ++--
 tcg/tcg.c| 86 +++-
 tcg/tcg.h|  2 --
 tcg/tci/tcg-target.inc.c | 13 ++-
 11 files changed, 136 insertions(+), 77 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index c0e9890..416db45 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1812,6 +1812,18 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
 { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+int i, n = ARRAY_SIZE(aarch64_op_defs);
+
+for (i = 0; i < n; ++i) {
+if (aarch64_op_defs[i].op == op) {
+return &aarch64_op_defs[i];
+}
+}
+return NULL;
+}
+
 static void tcg_target_init(TCGContext *s)
 {
 tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffffffff);
@@ -1834,8 +1846,6 @@ static void tcg_target_init(TCGContext *s)
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_FP);
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_X18); /* platform register */
-
-tcg_add_target_add_op_defs(aarch64_op_defs);
 }
 
 /* Saving pairs: (X19, X20) .. (X27, X28), (X29(fp), X30(lr)).  */
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 2d5af0f..eeabcf8 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -2008,6 +2008,18 @@ static const TCGTargetOpDef arm_op_defs[] = {
 { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+int i, n = ARRAY_SIZE(arm_op_defs);
+
+for (i = 0; i < n; ++i) {
+if (arm_op_defs[i].op == op) {
+return &arm_op_defs[i];
+}
+}
+return NULL;
+}
+
 static void tcg_target_init(TCGContext *s)
 {
 /* Only probe for the platform and capabilities if we havn't already
@@ -2038,8 +2050,6 @@ static void tcg_target_init(TCGContext *s)
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_PC);
-
-tcg_add_target_add_op_defs(arm_op_defs);
 }
 
 static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 39f62bd..595c399 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2330,6 +2330,18 @@ static const TCGTargetOpDef x86_op_defs[] = {
 { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+int i, n = ARRAY_SIZE(x86_op_defs);
+
+for (i = 0; i < n; ++i) {
+if (x86_op_defs[i].op == op) {
+return &x86_op_defs[i];
+}
+}
+return NULL;
+}
+
 static int tcg_target_callee_save_regs[] = {
 #if TCG_TARGET_REG_BITS == 64
 TCG_REG_RBP,
@@ -2471,8 +2483,6 @@ static void tcg_target_init(TCGContext *s)
 
 tcg_regset_clear(s->reserved_regs);
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_CALL_STACK);
-
-tcg_add_target_add_op_defs(x86_op_defs);
 }
 
 typedef struct {
diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index b04d716..e4d419d 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -2352,6 +2352,18 @@ static const TCGTargetOpDef ia64_op_defs[] = {
 { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+int i, n = ARRAY_SIZE(ia64_op_defs);
+
+for (i = 0; i < n; ++i) {
+if (ia64_op_defs[i].op == op) {
+return &ia64_op_defs[i];
+}
+}
+return NULL;
+}
+
 /* Generate global QEMU prologue and epilogue code */
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
@@ -2471,6 +2483,4 @@ static void tcg_target_init(TCGContext *s)
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R5);
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R6);
 tcg_regset_set_reg(s->reserved_regs, TCG_REG_R7);
-
-tcg_add_target_add_op_defs(ia64_op_defs);
 }
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 24c4949..a8f031a 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -2262,6 +2262,18 @@ static const TCGTargetOpDef mips_op_defs[] = {
 { -1 },
 };
 
+static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
+{
+int i, n = ARRAY_SIZE(mips_op_defs);
+
+for (i = 0; i < n; ++i) {
+if (mips_op_defs[i].op == op) {
+return &mips_op_defs[i];
+}
+

[Qemu-devel] [PULL 24/65] tcg: Add clz and ctz opcodes

2017-01-10 Thread Richard Henderson
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg-runtime.c|  20 +++
 tcg/README   |   8 +++
 tcg/aarch64/tcg-target.h |   4 ++
 tcg/arm/tcg-target.h |   2 +
 tcg/i386/tcg-target.h|   4 ++
 tcg/ia64/tcg-target.h|   4 ++
 tcg/mips/tcg-target.h|   2 +
 tcg/optimize.c   |  36 
 tcg/ppc/tcg-target.h |   4 ++
 tcg/s390/tcg-target.h|   4 ++
 tcg/sparc/tcg-target.h   |   4 ++
 tcg/tcg-op.c | 143 +++
 tcg/tcg-op.h |  16 ++
 tcg/tcg-opc.h|   4 ++
 tcg/tcg-runtime.h|   5 ++
 tcg/tcg.h|   2 +
 tcg/tci/tcg-target.h |   4 ++
 17 files changed, 266 insertions(+)

diff --git a/tcg-runtime.c b/tcg-runtime.c
index 9327b6f..eb3bade 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -101,6 +101,26 @@ int64_t HELPER(mulsh_i64)(int64_t arg1, int64_t arg2)
 return h;
 }
 
+uint32_t HELPER(clz_i32)(uint32_t arg, uint32_t zero_val)
+{
+return arg ? clz32(arg) : zero_val;
+}
+
+uint32_t HELPER(ctz_i32)(uint32_t arg, uint32_t zero_val)
+{
+return arg ? ctz32(arg) : zero_val;
+}
+
+uint64_t HELPER(clz_i64)(uint64_t arg, uint64_t zero_val)
+{
+return arg ? clz64(arg) : zero_val;
+}
+
+uint64_t HELPER(ctz_i64)(uint64_t arg, uint64_t zero_val)
+{
+return arg ? ctz64(arg) : zero_val;
+}
+
 void HELPER(exit_atomic)(CPUArchState *env)
 {
 cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
diff --git a/tcg/README b/tcg/README
index 6946b5b..a9858c2 100644
--- a/tcg/README
+++ b/tcg/README
@@ -246,6 +246,14 @@ t0=~(t1|t2)
 
 t0=t1|~t2
 
+* clz_i32/i64 t0, t1, t2
+
+t0 = t1 ? clz(t1) : t2
+
+* ctz_i32/i64 t0, t1, t2
+
+t0 = t1 ? ctz(t1) : t2
+
 * Shifts/Rotates
 
 * shl_i32/i64 t0, t1, t2
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 4a74bd8..976f493 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -62,6 +62,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32  1
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
+#define TCG_TARGET_HAS_clz_i32  0
+#define TCG_TARGET_HAS_ctz_i32  0
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 1
@@ -94,6 +96,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64  1
 #define TCG_TARGET_HAS_nand_i64 0
 #define TCG_TARGET_HAS_nor_i64  0
+#define TCG_TARGET_HAS_clz_i64  0
+#define TCG_TARGET_HAS_ctz_i64  0
 #define TCG_TARGET_HAS_deposit_i64  1
 #define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 4e30728..02cc242 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -110,6 +110,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_eqv_i32  0
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
+#define TCG_TARGET_HAS_clz_i32  0
+#define TCG_TARGET_HAS_ctz_i32  0
 #define TCG_TARGET_HAS_deposit_i32  use_armv7_instructions
 #define TCG_TARGET_HAS_extract_i32  use_armv7_instructions
 #define TCG_TARGET_HAS_sextract_i32 use_armv7_instructions
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index dc19c47..f2d9955 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -93,6 +93,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i32  0
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
+#define TCG_TARGET_HAS_clz_i32  0
+#define TCG_TARGET_HAS_ctz_i32  0
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 1
@@ -125,6 +127,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_eqv_i64  0
 #define TCG_TARGET_HAS_nand_i64 0
 #define TCG_TARGET_HAS_nor_i64  0
+#define TCG_TARGET_HAS_clz_i64  0
+#define TCG_TARGET_HAS_ctz_i64  0
 #define TCG_TARGET_HAS_deposit_i64  1
 #define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 0
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 8856dc8..9a829ae 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -140,6 +140,10 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32 1
 #define TCG_TARGET_HAS_nand_i64 1
 #define TCG_TARGET_HAS_nor_i32  1
+#define TCG_TARGET_HAS_clz_i32  0
+#define TCG_TARGET_HAS_clz_i64  0
+#define TCG_TARGET_HAS_ctz_i32  0
+#define TCG_TARGET_HAS_ctz_i64  0
 #define TCG_TARGET_HAS_nor_i64  1
 #define TCG_TARGET_HAS_orc_i32  1
 #define TCG_TARGET_HAS_orc_i64  1
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 92d203a..06988cf 100644
--- 

[Qemu-devel] [PULL 44/65] tcg/s390: Handle clz opcode

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.h |  2 +-
 tcg/s390/tcg-target.inc.c | 36 +++-
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 3ac2dc9..22500ba 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -110,7 +110,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_eqv_i64    0
 #define TCG_TARGET_HAS_nand_i64   0
 #define TCG_TARGET_HAS_nor_i64    0
-#define TCG_TARGET_HAS_clz_i64    0
+#define TCG_TARGET_HAS_clz_i64    (s390_facilities & FACILITY_EXT_IMM)
 #define TCG_TARGET_HAS_ctz_i64    0
 #define TCG_TARGET_HAS_deposit_i64 (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_extract_i64 (s390_facilities & FACILITY_GEN_INST_EXT)
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index c36a9ff..0682d01 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -50,7 +50,7 @@
 #define TCG_REG_NONE 0
 
 /* A scratch register that may be be used throughout the backend.  */
-#define TCG_TMP0 TCG_REG_R14
+#define TCG_TMP0 TCG_REG_R1
 
 #ifndef CONFIG_SOFTMMU
 #define TCG_GUEST_BASE_REG TCG_REG_R13
@@ -133,6 +133,7 @@ typedef enum S390Opcode {
 RRE_DLR = 0xb997,
 RRE_DSGFR   = 0xb91d,
 RRE_DSGR= 0xb90d,
+RRE_FLOGR   = 0xb983,
 RRE_LGBR= 0xb906,
 RRE_LCGR= 0xb903,
 RRE_LGFR= 0xb914,
@@ -1246,6 +1247,33 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
 }
 }
 
+static void tgen_clz(TCGContext *s, TCGReg dest, TCGReg a1,
+ TCGArg a2, int a2const)
+{
+/* Since this sets both R and R+1, we have no choice but to store the
+   result into R0, allowing R1 == TCG_TMP0 to be clobbered as well.  */
+QEMU_BUILD_BUG_ON(TCG_TMP0 != TCG_REG_R1);
+tcg_out_insn(s, RRE, FLOGR, TCG_REG_R0, a1);
+
+if (a2const && a2 == 64) {
+tcg_out_mov(s, TCG_TYPE_I64, dest, TCG_REG_R0);
+} else {
+if (a2const) {
+tcg_out_movi(s, TCG_TYPE_I64, dest, a2);
+} else {
+tcg_out_mov(s, TCG_TYPE_I64, dest, a2);
+}
+if (s390_facilities & FACILITY_LOAD_ON_COND) {
+/* Emit: if (one bit found) dest = r0.  */
+tcg_out_insn(s, RRF, LOCGR, dest, TCG_REG_R0, 2);
+} else {
+/* Emit: if (no one bit found) goto over; dest = r0; over:  */
+tcg_out_insn(s, RI, BRC, 8, (4 + 4) >> 1);
+tcg_out_insn(s, RRE, LGR, dest, TCG_REG_R0);
+}
+}
+}
+
 static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
  int ofs, int len, int z)
 {
@@ -2186,6 +2214,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tgen_extract(s, args[0], args[1], args[2], args[3]);
 break;
 
+case INDEX_op_clz_i64:
+tgen_clz(s, args[0], args[1], args[2], const_args[2]);
+break;
+
 case INDEX_op_mb:
 /* The host memory model is quite strong, we simply need to
serialize the instruction stream.  */
@@ -2309,6 +2341,8 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_bswap32_i64, { "r", "r" } },
 { INDEX_op_bswap64_i64, { "r", "r" } },
 
+{ INDEX_op_clz_i64, { "r", "r", "ri" } },
+
 { INDEX_op_add2_i64, { "r", "r", "0", "1", "rA", "r" } },
 { INDEX_op_sub2_i64, { "r", "r", "0", "1", "rA", "r" } },
 
-- 
2.9.3




[Qemu-devel] [PULL 40/65] tcg/ppc: Handle ctz and clz opcodes

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h | 10 +---
 tcg/ppc/tcg-target.inc.c | 67 
 2 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 698a599..c798c9c 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -49,6 +49,8 @@ typedef enum {
 TCG_AREG0 = TCG_REG_R27
 } TCGReg;
 
+extern bool have_isa_3_00;
+
 /* optional instructions automatically implemented */
 #define TCG_TARGET_HAS_ext8u_i32    0 /* andi */
 #define TCG_TARGET_HAS_ext16u_i32   0
@@ -68,8 +70,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i32  1
 #define TCG_TARGET_HAS_nand_i32 1
 #define TCG_TARGET_HAS_nor_i32  1
-#define TCG_TARGET_HAS_clz_i32  0
-#define TCG_TARGET_HAS_ctz_i32  0
+#define TCG_TARGET_HAS_clz_i32  1
+#define TCG_TARGET_HAS_ctz_i32  have_isa_3_00
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 0
@@ -103,8 +105,8 @@ typedef enum {
 #define TCG_TARGET_HAS_eqv_i64  1
 #define TCG_TARGET_HAS_nand_i64 1
 #define TCG_TARGET_HAS_nor_i64  1
-#define TCG_TARGET_HAS_clz_i64  0
-#define TCG_TARGET_HAS_ctz_i64  0
+#define TCG_TARGET_HAS_clz_i64  1
+#define TCG_TARGET_HAS_ctz_i64  have_isa_3_00
 #define TCG_TARGET_HAS_deposit_i64  1
 #define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index bf17161..766bc1a 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -77,11 +77,15 @@
 #define TCG_CT_CONST_U32  0x800
 #define TCG_CT_CONST_ZERO 0x1000
 #define TCG_CT_CONST_MONE 0x2000
+#define TCG_CT_CONST_WSZ  0x4000
 
 static tcg_insn_unit *tb_ret_addr;
 
 #include "elf.h"
+
 static bool have_isa_2_06;
+bool have_isa_3_00;
+
 #define HAVE_ISA_2_06  have_isa_2_06
 #define HAVE_ISEL  have_isa_2_06
 
@@ -305,6 +309,9 @@ static const char *target_parse_constraint(TCGArgConstraint *ct,
 case 'U':
 ct->ct |= TCG_CT_CONST_U32;
 break;
+case 'W':
+ct->ct |= TCG_CT_CONST_WSZ;
+break;
 case 'Z':
 ct->ct |= TCG_CT_CONST_ZERO;
 break;
@@ -341,6 +348,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 return 1;
 } else if ((ct & TCG_CT_CONST_MONE) && val == -1) {
 return 1;
+} else if ((ct & TCG_CT_CONST_WSZ)
+   && val == (type == TCG_TYPE_I32 ? 32 : 64)) {
+return 1;
 }
 return 0;
 }
@@ -445,6 +455,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define NOR    XO31(124)
 #define CNTLZW XO31( 26)
 #define CNTLZD XO31( 58)
+#define CNTTZW XO31(538)
+#define CNTTZD XO31(570)
 #define ANDC   XO31( 60)
 #define ORC    XO31(412)
 #define EQV    XO31(284)
@@ -1166,6 +1178,32 @@ static void tcg_out_movcond(TCGContext *s, TCGType type, TCGCond cond,
 }
 }
 
+static void tcg_out_cntxz(TCGContext *s, TCGType type, uint32_t opc,
+  TCGArg a0, TCGArg a1, TCGArg a2, bool const_a2)
+{
+if (const_a2 && a2 == (type == TCG_TYPE_I32 ? 32 : 64)) {
+tcg_out32(s, opc | RA(a0) | RS(a1));
+} else {
+tcg_out_cmp(s, TCG_COND_EQ, a1, 0, 1, 7, type);
+/* Note that the only other valid constant for a2 is 0.  */
+if (HAVE_ISEL) {
+tcg_out32(s, opc | RA(TCG_REG_R0) | RS(a1));
+tcg_out32(s, tcg_to_isel[TCG_COND_EQ] | TAB(a0, a2, TCG_REG_R0));
+} else if (!const_a2 && a0 == a2) {
+tcg_out32(s, tcg_to_bc[TCG_COND_EQ] | 8);
+tcg_out32(s, opc | RA(a0) | RS(a1));
+} else {
+tcg_out32(s, opc | RA(a0) | RS(a1));
+tcg_out32(s, tcg_to_bc[TCG_COND_NE] | 8);
+if (const_a2) {
+tcg_out_movi(s, type, a0, 0);
+} else {
+tcg_out_mov(s, type, a0, a2);
+}
+}
+}
+}
+
 static void tcg_out_cmp2(TCGContext *s, const TCGArg *args,
  const int *const_args)
 {
@@ -2103,6 +2141,24 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 tcg_out32(s, NOR | SAB(args[1], args[0], args[2]));
 break;
 
+case INDEX_op_clz_i32:
+tcg_out_cntxz(s, TCG_TYPE_I32, CNTLZW, args[0], args[1],
+  args[2], const_args[2]);
+break;
+case INDEX_op_ctz_i32:
+tcg_out_cntxz(s, TCG_TYPE_I32, CNTTZW, args[0], args[1],
+  args[2], const_args[2]);
+break;
+
+case INDEX_op_clz_i64:
+tcg_out_cntxz(s, TCG_TYPE_I64, CNTLZD, args[0], args[1],
+  args[2], const_args[2]);
+break;
+case INDEX_op_ctz_i64:
+tcg_out_cntxz(s, TCG_TYPE_I64, CNTTZD, args[0], args[1],
+ 

[Qemu-devel] [PULL 28/65] target-cris: Use clz opcode

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/cris/helper.h| 1 -
 target/cris/op_helper.c | 5 -
 target/cris/translate.c | 2 +-
 3 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/cris/helper.h b/target/cris/helper.h
index ff35956..20d21c4 100644
--- a/target/cris/helper.h
+++ b/target/cris/helper.h
@@ -7,7 +7,6 @@ DEF_HELPER_1(rfn, void, env)
 DEF_HELPER_3(movl_sreg_reg, void, env, i32, i32)
 DEF_HELPER_3(movl_reg_sreg, void, env, i32, i32)
 
-DEF_HELPER_FLAGS_1(lz, TCG_CALL_NO_SE, i32, i32)
 DEF_HELPER_FLAGS_4(btst, TCG_CALL_NO_SE, i32, env, i32, i32, i32)
 
 DEF_HELPER_FLAGS_4(evaluate_flags_muls, TCG_CALL_NO_SE, i32, env, i32, i32, i32)
diff --git a/target/cris/op_helper.c b/target/cris/op_helper.c
index 5043039..e92505c 100644
--- a/target/cris/op_helper.c
+++ b/target/cris/op_helper.c
@@ -230,11 +230,6 @@ void helper_rfn(CPUCRISState *env)
env->pregs[PR_CCS] |= M_FLAG_V32;
 }
 
-uint32_t helper_lz(uint32_t t0)
-{
-   return clz32(t0);
-}
-
 uint32_t helper_btst(CPUCRISState *env, uint32_t t0, uint32_t t1, uint32_t ccs)
 {
/* FIXME: clean this up.  */
diff --git a/target/cris/translate.c b/target/cris/translate.c
index b910427..0ee05ca 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -767,7 +767,7 @@ static void cris_alu_op_exec(DisasContext *dc, int op,
 t_gen_subx_carry(dc, dst);
 break;
 case CC_OP_LZ:
-gen_helper_lz(dst, b);
+tcg_gen_clzi_tl(dst, b, TARGET_LONG_BITS);
 break;
 case CC_OP_MULS:
 tcg_gen_muls2_tl(dst, cpu_PR[PR_MOF], a, b);
-- 
2.9.3




[Qemu-devel] [PULL 35/65] target-tricore: Use clz opcode

2017-01-10 Thread Richard Henderson
Tested-by: Bastian Koppelmann 
Reviewed-by: Bastian Koppelmann 
Signed-off-by: Richard Henderson 
---
 target/tricore/helper.h|  2 --
 target/tricore/op_helper.c | 10 --
 target/tricore/translate.c |  5 +++--
 3 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/target/tricore/helper.h b/target/tricore/helper.h
index 9333e16..2cf04e1 100644
--- a/target/tricore/helper.h
+++ b/target/tricore/helper.h
@@ -87,9 +87,7 @@ DEF_HELPER_FLAGS_2(min_hu, TCG_CALL_NO_RWG_SE, i32, i32, i32)
 DEF_HELPER_FLAGS_2(ixmin, TCG_CALL_NO_RWG_SE, i64, i64, i32)
 DEF_HELPER_FLAGS_2(ixmin_u, TCG_CALL_NO_RWG_SE, i64, i64, i32)
 /* count leading ... */
-DEF_HELPER_FLAGS_1(clo, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(clo_h, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(clz_h, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(cls, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(cls_h, TCG_CALL_NO_RWG_SE, i32, i32)
diff --git a/target/tricore/op_helper.c b/target/tricore/op_helper.c
index ac02e0a..3731d5e 100644
--- a/target/tricore/op_helper.c
+++ b/target/tricore/op_helper.c
@@ -1733,11 +1733,6 @@ EXTREMA_H_B(min, <)
 
 #undef EXTREMA_H_B
 
-uint32_t helper_clo(target_ulong r1)
-{
-return clo32(r1);
-}
-
 uint32_t helper_clo_h(target_ulong r1)
 {
 uint32_t ret_hw0 = extract32(r1, 0, 16);
@@ -1756,11 +1751,6 @@ uint32_t helper_clo_h(target_ulong r1)
 return ret_hw0 | (ret_hw1 << 16);
 }
 
-uint32_t helper_clz(target_ulong r1)
-{
-return clz32(r1);
-}
-
 uint32_t helper_clz_h(target_ulong r1)
 {
 uint32_t ret_hw0 = extract32(r1, 0, 16);
diff --git a/target/tricore/translate.c b/target/tricore/translate.c
index 36f734a..69cdfb9 100644
--- a/target/tricore/translate.c
+++ b/target/tricore/translate.c
@@ -6367,7 +6367,8 @@ static void decode_rr_logical_shift(CPUTriCoreState *env, DisasContext *ctx)
 tcg_gen_andc_tl(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_CLO:
-gen_helper_clo(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+tcg_gen_not_tl(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+tcg_gen_clzi_tl(cpu_gpr_d[r3], cpu_gpr_d[r3], TARGET_LONG_BITS);
 break;
 case OPC2_32_RR_CLO_H:
 gen_helper_clo_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
@@ -6379,7 +6380,7 @@ static void decode_rr_logical_shift(CPUTriCoreState *env, DisasContext *ctx)
 gen_helper_cls_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
 break;
 case OPC2_32_RR_CLZ:
-gen_helper_clz(cpu_gpr_d[r3], cpu_gpr_d[r1]);
+tcg_gen_clzi_tl(cpu_gpr_d[r3], cpu_gpr_d[r1], TARGET_LONG_BITS);
 break;
 case OPC2_32_RR_CLZ_H:
 gen_helper_clz_h(cpu_gpr_d[r3], cpu_gpr_d[r1]);
-- 
2.9.3




[Qemu-devel] [PULL 20/65] tcg: Add markup for output requires new register

2017-01-10 Thread Richard Henderson
This is the same concept as, and same markup as, the
early clobber markup in gcc.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 34 ++
 tcg/tcg.h |  1 +
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index aabf94f..27913f0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1263,6 +1263,10 @@ void tcg_add_target_add_op_defs(const TCGTargetOpDef *tdefs)
 if (*ct_str == '\0')
 break;
 switch(*ct_str) {
+case '&':
+def->args_ct[i].ct |= TCG_CT_NEWREG;
+ct_str++;
+break;
 case 'i':
 def->args_ct[i].ct |= TCG_CT_CONST;
 ct_str++;
@@ -2208,7 +2212,8 @@ static void tcg_reg_alloc_op(TCGContext *s,
  const TCGOpDef *def, TCGOpcode opc,
  const TCGArg *args, TCGLifeData arg_life)
 {
-TCGRegSet allocated_regs;
+TCGRegSet i_allocated_regs;
+TCGRegSet o_allocated_regs;
 int i, k, nb_iargs, nb_oargs;
 TCGReg reg;
 TCGArg arg;
@@ -2225,8 +2230,10 @@ static void tcg_reg_alloc_op(TCGContext *s,
args + nb_oargs + nb_iargs, 
sizeof(TCGArg) * def->nb_cargs);
 
+tcg_regset_set(i_allocated_regs, s->reserved_regs);
+tcg_regset_set(o_allocated_regs, s->reserved_regs);
+
 /* satisfy input constraints */ 
-tcg_regset_set(allocated_regs, s->reserved_regs);
 for(k = 0; k < nb_iargs; k++) {
 i = def->sorted_args[nb_oargs + k];
 arg = args[i];
@@ -2241,7 +2248,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
 goto iarg_end;
 }
 
-temp_load(s, ts, arg_ct->u.regs, allocated_regs);
+temp_load(s, ts, arg_ct->u.regs, i_allocated_regs);
 
 if (arg_ct->ct & TCG_CT_IALIAS) {
 if (ts->fixed_reg) {
@@ -2275,13 +2282,13 @@ static void tcg_reg_alloc_op(TCGContext *s,
 allocate_in_reg:
 /* allocate a new register matching the constraint 
and move the temporary register into it */
-reg = tcg_reg_alloc(s, arg_ct->u.regs, allocated_regs,
+reg = tcg_reg_alloc(s, arg_ct->u.regs, i_allocated_regs,
 ts->indirect_base);
 tcg_out_mov(s, ts->type, reg, ts->reg);
 }
 new_args[i] = reg;
 const_args[i] = 0;
-tcg_regset_set_reg(allocated_regs, reg);
+tcg_regset_set_reg(i_allocated_regs, reg);
 iarg_end: ;
 }
 
@@ -2293,24 +2300,23 @@ static void tcg_reg_alloc_op(TCGContext *s,
 }
 
 if (def->flags & TCG_OPF_BB_END) {
-tcg_reg_alloc_bb_end(s, allocated_regs);
+tcg_reg_alloc_bb_end(s, i_allocated_regs);
 } else {
 if (def->flags & TCG_OPF_CALL_CLOBBER) {
 /* XXX: permit generic clobber register list ? */ 
 for (i = 0; i < TCG_TARGET_NB_REGS; i++) {
 if (tcg_regset_test_reg(tcg_target_call_clobber_regs, i)) {
-tcg_reg_free(s, i, allocated_regs);
+tcg_reg_free(s, i, i_allocated_regs);
 }
 }
 }
 if (def->flags & TCG_OPF_SIDE_EFFECTS) {
 /* sync globals if the op has side effects and might trigger
an exception. */
-sync_globals(s, allocated_regs);
+sync_globals(s, i_allocated_regs);
 }
 
 /* satisfy the output constraints */
-tcg_regset_set(allocated_regs, s->reserved_regs);
 for(k = 0; k < nb_oargs; k++) {
 i = def->sorted_args[k];
 arg = args[i];
@@ -2318,6 +2324,10 @@ static void tcg_reg_alloc_op(TCGContext *s,
 ts = &s->temps[arg];
 if (arg_ct->ct & TCG_CT_ALIAS) {
 reg = new_args[arg_ct->alias_index];
+} else if (arg_ct->ct & TCG_CT_NEWREG) {
+reg = tcg_reg_alloc(s, arg_ct->u.regs,
+i_allocated_regs | o_allocated_regs,
+ts->indirect_base);
 } else {
 /* if fixed register, we try to use it */
 reg = ts->reg;
@@ -2325,10 +2335,10 @@ static void tcg_reg_alloc_op(TCGContext *s,
 tcg_regset_test_reg(arg_ct->u.regs, reg)) {
 goto oarg_end;
 }
-reg = tcg_reg_alloc(s, arg_ct->u.regs, allocated_regs,
+reg = tcg_reg_alloc(s, arg_ct->u.regs, o_allocated_regs,
 ts->indirect_base);
 }
-tcg_regset_set_reg(allocated_regs, reg);
+tcg_regset_set_reg(o_allocated_regs, reg);
 /* if a fixed register is used, then a move will be done 
afterwards */
 if 

[Qemu-devel] [PULL 38/65] target-arm: Use clz opcode

2017-01-10 Thread Richard Henderson
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target/arm/helper-a64.c| 10 --
 target/arm/helper-a64.h|  2 --
 target/arm/helper.c|  5 -
 target/arm/helper.h|  1 -
 target/arm/translate-a64.c |  8 
 target/arm/translate.c |  6 +++---
 6 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 98b97df..77999ff 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -54,11 +54,6 @@ int64_t HELPER(sdiv64)(int64_t num, int64_t den)
 return num / den;
 }
 
-uint64_t HELPER(clz64)(uint64_t x)
-{
-return clz64(x);
-}
-
 uint64_t HELPER(cls64)(uint64_t x)
 {
 return clrsb64(x);
@@ -69,11 +64,6 @@ uint32_t HELPER(cls32)(uint32_t x)
 return clrsb32(x);
 }
 
-uint32_t HELPER(clz32)(uint32_t x)
-{
-return clz32(x);
-}
-
 uint64_t HELPER(rbit64)(uint64_t x)
 {
 return revbit64(x);
diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index dd32000..d320f96 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -18,10 +18,8 @@
  */
 DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
-DEF_HELPER_FLAGS_1(clz64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(cls64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(cls32, TCG_CALL_NO_RWG_SE, i32, i32)
-DEF_HELPER_FLAGS_1(clz32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 8dcabbf..77ea5e0 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5725,11 +5725,6 @@ uint32_t HELPER(uxtb16)(uint32_t x)
 return res;
 }
 
-uint32_t HELPER(clz)(uint32_t x)
-{
-return clz32(x);
-}
-
 int32_t HELPER(sdiv)(int32_t num, int32_t den)
 {
 if (den == 0)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 84aa637..df86bf7 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -1,4 +1,3 @@
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(sxtb16, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(uxtb16, TCG_CALL_NO_RWG_SE, i32, i32)
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index a59c90c..1bf94bc 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -3954,11 +3954,11 @@ static void handle_clz(DisasContext *s, unsigned int sf,
 tcg_rn = cpu_reg(s, rn);
 
 if (sf) {
-gen_helper_clz64(tcg_rd, tcg_rn);
+tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
 } else {
 TCGv_i32 tcg_tmp32 = tcg_temp_new_i32();
 tcg_gen_extrl_i64_i32(tcg_tmp32, tcg_rn);
-gen_helper_clz(tcg_tmp32, tcg_tmp32);
+tcg_gen_clzi_i32(tcg_tmp32, tcg_tmp32, 32);
 tcg_gen_extu_i32_i64(tcg_rd, tcg_tmp32);
 tcg_temp_free_i32(tcg_tmp32);
 }
@@ -7591,7 +7591,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
 switch (opcode) {
 case 0x4: /* CLS, CLZ */
 if (u) {
-gen_helper_clz64(tcg_rd, tcg_rn);
+tcg_gen_clzi_i64(tcg_rd, tcg_rn, 64);
 } else {
 gen_helper_cls64(tcg_rd, tcg_rn);
 }
@@ -10261,7 +10261,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
 goto do_cmop;
 case 0x4: /* CLS */
 if (u) {
-gen_helper_clz32(tcg_res, tcg_op);
+tcg_gen_clzi_i32(tcg_res, tcg_op, 32);
 } else {
 gen_helper_cls32(tcg_res, tcg_op);
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 08da9ac..c9186b6 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7037,7 +7037,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
 switch (size) {
 case 0: gen_helper_neon_clz_u8(tmp, tmp); break;
 case 1: gen_helper_neon_clz_u16(tmp, tmp); break;
-case 2: gen_helper_clz(tmp, tmp); break;
+case 2: tcg_gen_clzi_i32(tmp, tmp, 32); break;
 default: abort();
 }
 break;
@@ -8219,7 +8219,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
 ARCH(5);
 rd = (insn >> 12) & 0xf;
 tmp = load_reg(s, rm);
-gen_helper_clz(tmp, tmp);
+tcg_gen_clzi_i32(tmp, tmp, 32);
 store_reg(s, rd, tmp);
 } else {
 goto illegal_op;
@@ -9992,7 +9992,7 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
  

[Qemu-devel] [PULL 15/65] target-i386: Use new deposit and extract ops

2017-01-10 Thread Richard Henderson
Convert a couple of places where it was easy to identify a right-shift
followed by a zero-extend or and-with-immediate, and the obvious
sign-extract from a high-byte register.

Acked-by: Eduardo Habkost 
Signed-off-by: Richard Henderson 
---
 target/i386/translate.c | 45 +++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 59e11fc..816d0b1 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -383,8 +383,7 @@ static void gen_op_mov_reg_v(TCGMemOp ot, int reg, TCGv t0)
 static inline void gen_op_mov_v_reg(TCGMemOp ot, TCGv t0, int reg)
 {
 if (ot == MO_8 && byte_reg_is_xH(reg)) {
-tcg_gen_shri_tl(t0, cpu_regs[reg - 4], 8);
-tcg_gen_ext8u_tl(t0, t0);
+tcg_gen_extract_tl(t0, cpu_regs[reg - 4], 8, 8);
 } else {
 tcg_gen_mov_tl(t0, cpu_regs[reg]);
 }
@@ -3768,8 +3767,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 
 /* Extract the LEN into a mask.  Lengths larger than
operand size get all ones.  */
-tcg_gen_shri_tl(cpu_A0, cpu_regs[s->vex_v], 8);
-tcg_gen_ext8u_tl(cpu_A0, cpu_A0);
+tcg_gen_extract_tl(cpu_A0, cpu_regs[s->vex_v], 8, 8);
 tcg_gen_movcond_tl(TCG_COND_LEU, cpu_A0, cpu_A0, bound,
cpu_A0, bound);
 tcg_temp_free(bound);
@@ -3920,9 +3918,8 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
 gen_compute_eflags(s);
 }
 carry_in = cpu_tmp0;
-tcg_gen_shri_tl(carry_in, cpu_cc_src,
-ctz32(b == 0x1f6 ? CC_C : CC_O));
-tcg_gen_andi_tl(carry_in, carry_in, 1);
+tcg_gen_extract_tl(carry_in, cpu_cc_src,
+   ctz32(b == 0x1f6 ? CC_C : CC_O), 1);
 }
 
 switch (ot) {
@@ -5447,21 +5444,25 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
 rm = (modrm & 7) | REX_B(s);
 
 if (mod == 3) {
-gen_op_mov_v_reg(ot, cpu_T0, rm);
-switch (s_ot) {
-case MO_UB:
-tcg_gen_ext8u_tl(cpu_T0, cpu_T0);
-break;
-case MO_SB:
-tcg_gen_ext8s_tl(cpu_T0, cpu_T0);
-break;
-case MO_UW:
-tcg_gen_ext16u_tl(cpu_T0, cpu_T0);
-break;
-default:
-case MO_SW:
-tcg_gen_ext16s_tl(cpu_T0, cpu_T0);
-break;
+if (s_ot == MO_SB && byte_reg_is_xH(rm)) {
+tcg_gen_sextract_tl(cpu_T0, cpu_regs[rm - 4], 8, 8);
+} else {
+gen_op_mov_v_reg(ot, cpu_T0, rm);
+switch (s_ot) {
+case MO_UB:
+tcg_gen_ext8u_tl(cpu_T0, cpu_T0);
+break;
+case MO_SB:
+tcg_gen_ext8s_tl(cpu_T0, cpu_T0);
+break;
+case MO_UW:
+tcg_gen_ext16u_tl(cpu_T0, cpu_T0);
+break;
+default:
+case MO_SW:
+tcg_gen_ext16s_tl(cpu_T0, cpu_T0);
+break;
+}
 }
 gen_op_mov_reg_v(d_ot, reg, cpu_T0);
 } else {
-- 
2.9.3
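
For readers following along, here is a minimal sketch of why the single
extract/sextract ops in this patch are equivalent to the two-op sequences
they replace. It is a plain C model, not QEMU code: the function names
(low_bits_mask, extract_model, sextract_model, extract_model_mismatches)
are made up for illustration, and the field semantics (bits ofs..ofs+len-1,
zero- or sign-extended) are assumed from how the ops are used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask with the low len bits set; valid for 1 <= len <= 64. */
uint64_t low_bits_mask(unsigned len)
{
    assert(len >= 1 && len <= 64);
    return len == 64 ? ~(uint64_t)0 : (((uint64_t)1 << len) - 1);
}

/* Model of an unsigned field extract: bits [ofs, ofs+len), zero-extended. */
uint64_t extract_model(uint64_t x, unsigned ofs, unsigned len)
{
    assert(ofs + len <= 64);
    return (x >> ofs) & low_bits_mask(len);
}

/* Model of a signed field extract: the same field, sign-extended. */
int64_t sextract_model(uint64_t x, unsigned ofs, unsigned len)
{
    uint64_t field = extract_model(x, ofs, len);
    uint64_t sign = (uint64_t)1 << (len - 1);
    /* The cast relies on the two's-complement conversion that mainstream
       compilers implement. */
    return (int64_t)((field ^ sign) - sign);
}

/* Compare the model against the shift+extend and shift+and sequences.
   Returns the number of mismatches; 0 means they agree on all samples. */
int extract_model_mismatches(void)
{
    static const uint64_t samples[] = {
        0, 0x1234, 0xff00, 0x8000, 0x7f00, 0xffffffffffffffffull,
    };
    int bad = 0;

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        uint64_t x = samples[i];

        /* shri 8 followed by ext8u */
        bad += extract_model(x, 8, 8) != (uint64_t)(uint8_t)(x >> 8);
        /* high-byte read followed by ext8s */
        bad += sextract_model(x, 8, 8) != (int64_t)(int8_t)(uint8_t)(x >> 8);
        /* shri n followed by andi 1 */
        for (unsigned n = 0; n < 64; n++) {
            bad += extract_model(x, n, 1) != ((x >> n) & 1);
        }
    }
    return bad;
}
```

Calling extract_model_mismatches() from a small test program should report
zero mismatches if the model above is right.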




[Qemu-devel] [PULL 27/65] target-alpha: Use the ctz and clz opcodes

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/alpha/helper.h |  2 --
 target/alpha/int_helper.c | 10 --
 target/alpha/translate.c  |  4 ++--
 3 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/target/alpha/helper.h b/target/alpha/helper.h
index 004221d..eed3906 100644
--- a/target/alpha/helper.h
+++ b/target/alpha/helper.h
@@ -4,8 +4,6 @@ DEF_HELPER_FLAGS_1(load_pcc, TCG_CALL_NO_RWG_SE, i64, env)
 DEF_HELPER_FLAGS_3(check_overflow, TCG_CALL_NO_WG, void, env, i64, i64)
 
 DEF_HELPER_FLAGS_1(ctpop, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(ctlz, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(cttz, TCG_CALL_NO_RWG_SE, i64, i64)
 
 DEF_HELPER_FLAGS_2(zap, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(zapnot, TCG_CALL_NO_RWG_SE, i64, i64, i64)
diff --git a/target/alpha/int_helper.c b/target/alpha/int_helper.c
index 19bebfe..3c303bd 100644
--- a/target/alpha/int_helper.c
+++ b/target/alpha/int_helper.c
@@ -29,16 +29,6 @@ uint64_t helper_ctpop(uint64_t arg)
 return ctpop64(arg);
 }
 
-uint64_t helper_ctlz(uint64_t arg)
-{
-return clz64(arg);
-}
-
-uint64_t helper_cttz(uint64_t arg)
-{
-return ctz64(arg);
-}
-
 uint64_t helper_zapnot(uint64_t val, uint64_t mskb)
 {
 uint64_t mask;
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 5ac2277..6e2e563 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -2555,14 +2555,14 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
 REQUIRE_TB_FLAG(TB_FLAGS_AMASK_CIX);
 REQUIRE_REG_31(ra);
 REQUIRE_NO_LIT;
-gen_helper_ctlz(vc, vb);
+tcg_gen_clzi_i64(vc, vb, 64);
 break;
 case 0x33:
 /* CTTZ */
 REQUIRE_TB_FLAG(TB_FLAGS_AMASK_CIX);
 REQUIRE_REG_31(ra);
 REQUIRE_NO_LIT;
-gen_helper_cttz(vc, vb);
+tcg_gen_ctzi_i64(vc, vb, 64);
 break;
 case 0x34:
 /* UNPKBW */
-- 
2.9.3
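
A note on the literal 64 passed above: the last argument of the clzi/ctzi
generators is the result to produce when the input is zero, which is what
the removed helpers returned in that case. The sketch below models that
behaviour in plain C. The names clzi64_model, ctzi64_model and
check_zero_default are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of x, or return zero_val when x is zero. */
uint64_t clzi64_model(uint64_t x, uint64_t zero_val)
{
    uint64_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x >> 63)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros of x, or return zero_val when x is zero. */
uint64_t ctzi64_model(uint64_t x, uint64_t zero_val)
{
    uint64_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* With zero_val == 64 the model gives the "all 64 bits are zero" answer
   that CTLZ/CTTZ are expected to produce for a zero input. */
void check_zero_default(void)
{
    assert(clzi64_model(0, 64) == 64);
    assert(ctzi64_model(0, 64) == 64);
}
```

The zero case is the only one where the extra argument matters; for any
non-zero input the result is independent of it.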




[Qemu-devel] [PULL 33/65] target-s390x: Use clz opcode

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/s390x/helper.h | 1 -
 target/s390x/int_helper.c | 6 --
 target/s390x/translate.c  | 2 +-
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 207a6e7..9102071 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -70,7 +70,6 @@ DEF_HELPER_FLAGS_4(msdb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_3(tceb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(tcdb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_4(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64, i64)
-DEF_HELPER_FLAGS_1(clz, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i64, env, i64, i64)
diff --git a/target/s390x/int_helper.c b/target/s390x/int_helper.c
index 370c94d..5bc470b 100644
--- a/target/s390x/int_helper.c
+++ b/target/s390x/int_helper.c
@@ -117,12 +117,6 @@ uint64_t HELPER(divu64)(CPUS390XState *env, uint64_t ah, uint64_t al,
 return ret;
 }
 
-/* count leading zeros, for find leftmost one */
-uint64_t HELPER(clz)(uint64_t v)
-{
-return clz64(v);
-}
-
 uint64_t HELPER(cvd)(int32_t reg)
 {
 /* positive 0 */
diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 6cebb7e..01c6217 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -2249,7 +2249,7 @@ static ExitStatus op_flogr(DisasContext *s, DisasOps *o)
 gen_op_update1_cc_i64(s, CC_OP_FLOGR, o->in2);
 
 /* R1 = IN ? CLZ(IN) : 64.  */
-gen_helper_clz(o->out, o->in2);
+tcg_gen_clzi_i64(o->out, o->in2, 64);
 
 /* R1+1 = IN & ~(found bit).  Note that we may attempt to shift this
value by 64, which is undefined.  But since the shift is 64 iff the
-- 
2.9.3




[Qemu-devel] [PULL 36/65] target-unicore32: Use clz opcode

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/unicore32/helper.c| 10 --
 target/unicore32/helper.h|  3 ---
 target/unicore32/translate.c |  6 +++---
 3 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/target/unicore32/helper.c b/target/unicore32/helper.c
index d603bde..7a5613e 100644
--- a/target/unicore32/helper.c
+++ b/target/unicore32/helper.c
@@ -32,16 +32,6 @@ UniCore32CPU *uc32_cpu_init(const char *cpu_model)
 return UNICORE32_CPU(cpu_generic_init(TYPE_UNICORE32_CPU, cpu_model));
 }
 
-uint32_t HELPER(clo)(uint32_t x)
-{
-return clo32(x);
-}
-
-uint32_t HELPER(clz)(uint32_t x)
-{
-return clz32(x);
-}
-
 #ifndef CONFIG_USER_ONLY
 void helper_cp0_set(CPUUniCore32State *env, uint32_t val, uint32_t creg,
 uint32_t cop)
diff --git a/target/unicore32/helper.h b/target/unicore32/helper.h
index 9418137..a4a5d45 100644
--- a/target/unicore32/helper.h
+++ b/target/unicore32/helper.h
@@ -13,9 +13,6 @@ DEF_HELPER_3(cp0_get, i32, env, i32, i32)
 DEF_HELPER_1(cp1_putc, void, i32)
 #endif
 
-DEF_HELPER_1(clz, i32, i32)
-DEF_HELPER_1(clo, i32, i32)
-
 DEF_HELPER_2(exception, void, env, i32)
 
 DEF_HELPER_3(asr_write, void, env, i32, i32)
diff --git a/target/unicore32/translate.c b/target/unicore32/translate.c
index 514d460..666a201 100644
--- a/target/unicore32/translate.c
+++ b/target/unicore32/translate.c
@@ -1479,10 +1479,10 @@ static void do_misc(CPUUniCore32State *env, DisasContext *s, uint32_t insn)
 /* clz */
 tmp = load_reg(s, UCOP_REG_M);
 if (UCOP_SET(26)) {
-gen_helper_clo(tmp, tmp);
-} else {
-gen_helper_clz(tmp, tmp);
+/* clo */
+tcg_gen_not_i32(tmp, tmp);
 }
+tcg_gen_clzi_i32(tmp, tmp, 32);
 store_reg(s, UCOP_REG_D, tmp);
 return;
 }
-- 
2.9.3
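
The translate.c hunk above implements "count leading ones" as a bitwise
NOT followed by "count leading zeros" with 32 as the zero-input result.
The following is a small C sketch of that identity. All names here
(clzi32_model, clo32_direct, clo32_via_not, clo_mismatches) are invented
for the example and are not QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count leading zeros of x, or return zero_val when x is zero. */
uint32_t clzi32_model(uint32_t x, uint32_t zero_val)
{
    uint32_t n = 0;

    if (x == 0) {
        return zero_val;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading ones directly, one bit at a time. */
uint32_t clo32_direct(uint32_t x)
{
    uint32_t n = 0;

    while (n < 32 && (x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading ones the way the patch does: invert, then clz. */
uint32_t clo32_via_not(uint32_t x)
{
    return clzi32_model(~x, 32);
}

/* Returns the number of samples on which the two versions disagree. */
int clo_mismatches(void)
{
    static const uint32_t samples[] = {
        0, 1, 0x7fffffffu, 0x80000000u, 0xc0000000u,
        0xfffffffeu, 0xffffffffu, 0xf0f0f0f0u,
    };
    int bad = 0;

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        bad += clo32_direct(samples[i]) != clo32_via_not(samples[i]);
    }
    assert(clo32_via_not(0xffffffffu) == 32);   /* ~x == 0 takes the default */
    return bad;
}
```

The all-ones input is the interesting case: after inversion it becomes
zero, so the answer comes from the default value 32 rather than from the
bit scan.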




[Qemu-devel] [PULL 11/65] tcg/s390: Implement field extraction opcodes

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.h |  4 ++--
 tcg/s390/tcg-target.inc.c | 11 +++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index d650a72..e9ac12e 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -78,7 +78,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_nand_i32   0
 #define TCG_TARGET_HAS_nor_i32    0
 #define TCG_TARGET_HAS_deposit_i32  (s390_facilities & FACILITY_GEN_INST_EXT)
-#define TCG_TARGET_HAS_extract_i32  0
+#define TCG_TARGET_HAS_extract_i32  (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i32   0
 #define TCG_TARGET_HAS_movcond_i32  1
 #define TCG_TARGET_HAS_add2_i32   1
@@ -109,7 +109,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_nand_i64   0
 #define TCG_TARGET_HAS_nor_i64    0
 #define TCG_TARGET_HAS_deposit_i64  (s390_facilities & FACILITY_GEN_INST_EXT)
-#define TCG_TARGET_HAS_extract_i64  0
+#define TCG_TARGET_HAS_extract_i64  (s390_facilities & FACILITY_GEN_INST_EXT)
 #define TCG_TARGET_HAS_sextract_i64   0
 #define TCG_TARGET_HAS_movcond_i64  1
 #define TCG_TARGET_HAS_add2_i64   1
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 3821378..2faa761 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -1252,6 +1252,12 @@ static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
 tcg_out_risbg(s, dest, src, msb, lsb, ofs, 0);
 }
 
+static void tgen_extract(TCGContext *s, TCGReg dest, TCGReg src,
+ int ofs, int len)
+{
+tcg_out_risbg(s, dest, src, 64 - len, 63, 64 - ofs, 1);
+}
+
 static void tgen_gotoi(TCGContext *s, int cc, tcg_insn_unit *dest)
 {
 ptrdiff_t off = dest - s->code_ptr;
@@ -2158,6 +2164,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 OP_32_64(deposit):
 tgen_deposit(s, args[0], args[2], args[3], args[4]);
 break;
+OP_32_64(extract):
+tgen_extract(s, args[0], args[1], args[2], args[3]);
+break;
 
 case INDEX_op_mb:
 /* The host memory model is quite strong, we simply need to
@@ -2227,6 +2236,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_setcond_i32, { "r", "r", "rC" } },
 { INDEX_op_movcond_i32, { "r", "r", "rC", "r", "0" } },
 { INDEX_op_deposit_i32, { "r", "0", "r" } },
+{ INDEX_op_extract_i32, { "r", "r" } },
 
 { INDEX_op_qemu_ld_i32, { "r", "L" } },
 { INDEX_op_qemu_ld_i64, { "r", "L" } },
@@ -2288,6 +2298,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_setcond_i64, { "r", "r", "rC" } },
 { INDEX_op_movcond_i64, { "r", "r", "rC", "r", "0" } },
 { INDEX_op_deposit_i64, { "r", "0", "r" } },
+{ INDEX_op_extract_i64, { "r", "r" } },
 
 { INDEX_op_mb, { } },
 { -1 },
-- 
2.9.3




[Qemu-devel] [PULL 26/65] disas/ppc: Handle popcnt and cnttz

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 disas/ppc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/disas/ppc.c b/disas/ppc.c
index bd05623..ed7e0d0 100644
--- a/disas/ppc.c
+++ b/disas/ppc.c
@@ -1955,6 +1955,9 @@ extract_tbr (unsigned long insn,
 #define POWER4 PPC_OPCODE_POWER4
 #define POWER5 PPC_OPCODE_POWER5
 #define POWER6 PPC_OPCODE_POWER6
+/* Documentation purposes only; we don't actually check the isa for disas.  */
+#define POWER7  PPC_OPCODE_POWER6
+#define POWER9  PPC_OPCODE_POWER6
 #define CELL   PPC_OPCODE_CELL
 #define PPC32   PPC_OPCODE_32 | PPC_OPCODE_PPC
 #define PPC64   PPC_OPCODE_64 | PPC_OPCODE_PPC
@@ -3589,6 +3592,13 @@ const struct powerpc_opcode powerpc_opcodes[] = {
 { "lbzux",   X(31,119),X_MASK, COM,{ RT, RAL, RB } },
 
 { "popcntb", X(31,122), XRB_MASK,  POWER5, { RA, RS } },
+{ "popcntw", X(31,378), XRB_MASK,   POWER7, { RA, RS } },
+{ "popcntd", X(31,506), XRB_MASK,   POWER7, { RA, RS } },
+
+{ "cnttzw",  XRC(31,538,0), XRB_MASK,   POWER9, { RA, RS } },
+{ "cnttzw.", XRC(31,538,1), XRB_MASK,   POWER9, { RA, RS } },
+{ "cnttzd",  XRC(31,570,0), XRB_MASK,   POWER9, { RA, RS } },
+{ "cnttzd.", XRC(31,570,1), XRB_MASK,   POWER9, { RA, RS } },
 
 { "not", XRC(31,124,0), X_MASK,COM,{ RA, RS, RBS } },
 { "nor", XRC(31,124,0), X_MASK,COM,{ RA, RS, RB } },
-- 
2.9.3




[Qemu-devel] [PULL 23/65] tcg: Allow an operand to be matching or a constant

2017-01-10 Thread Richard Henderson
This allows an output operand to match an input operand
only when the input operand needs a register.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/README | 13 +
 tcg/tcg.c  | 63 +++---
 2 files changed, 41 insertions(+), 35 deletions(-)

diff --git a/tcg/README b/tcg/README
index 065d9c2..6946b5b 100644
--- a/tcg/README
+++ b/tcg/README
@@ -539,24 +539,29 @@ version. Aliases are specified in the input operands as for GCC.
 The same register may be used for both an input and an output, even when
 they are not explicitly aliased.  If an op expands to multiple target
 instructions then care must be taken to avoid clobbering input values.
-GCC style "early clobber" outputs are not currently supported.
+GCC style "early clobber" outputs are supported, with '&'.
 
 A target can define specific register or constant constraints. If an
 operation uses a constant input constraint which does not allow all
 constants, it must also accept registers in order to have a fallback.
+The constraint 'i' is defined generically to accept any constant.
+The constraint 'r' is not defined generically, but is consistently
+used by each backend to indicate all registers.
 
 The movi_i32 and movi_i64 operations must accept any constants.
 
 The mov_i32 and mov_i64 operations must accept any registers of the
 same type.
 
-The ld/st instructions must accept signed 32 bit constant offsets. It
-can be implemented by reserving a specific register to compute the
-address if the offset is too big.
+The ld/st/sti instructions must accept signed 32 bit constant offsets.
+This can be implemented by reserving a specific register in which to
+compute the address if the offset is too big.
 
 The ld/st instructions must accept any destination (ld) or source (st)
 register.
 
+The sti instruction may fail if it cannot store the given constant.
+
 4.3) Function call assumptions
 
 - The only supported types for parameters and return value are: 32 and
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8b4dce7..cb898f1 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1256,37 +1256,37 @@ static void process_op_defs(TCGContext *s)
 
 tcg_regset_clear(def->args_ct[i].u.regs);
 def->args_ct[i].ct = 0;
-if (ct_str[0] >= '0' && ct_str[0] <= '9') {
-int oarg;
-oarg = ct_str[0] - '0';
-tcg_debug_assert(oarg < def->nb_oargs);
-tcg_debug_assert(def->args_ct[oarg].ct & TCG_CT_REG);
-/* TCG_CT_ALIAS is for the output arguments. The input
-   argument is tagged with TCG_CT_IALIAS. */
-def->args_ct[i] = def->args_ct[oarg];
-def->args_ct[oarg].ct = TCG_CT_ALIAS;
-def->args_ct[oarg].alias_index = i;
-def->args_ct[i].ct |= TCG_CT_IALIAS;
-def->args_ct[i].alias_index = oarg;
-} else {
-for(;;) {
-if (*ct_str == '\0')
-break;
-switch(*ct_str) {
-case '&':
-def->args_ct[i].ct |= TCG_CT_NEWREG;
-ct_str++;
-break;
-case 'i':
-def->args_ct[i].ct |= TCG_CT_CONST;
-ct_str++;
-break;
-default:
-ct_str = target_parse_constraint(&def->args_ct[i],
- ct_str, type);
-/* Typo in TCGTargetOpDef constraint. */
-tcg_debug_assert(ct_str != NULL);
+while (*ct_str != '\0') {
+switch(*ct_str) {
+case '0' ... '9':
+{
+int oarg = *ct_str - '0';
+tcg_debug_assert(ct_str == tdefs->args_ct_str[i]);
+tcg_debug_assert(oarg < def->nb_oargs);
+tcg_debug_assert(def->args_ct[oarg].ct & TCG_CT_REG);
+/* TCG_CT_ALIAS is for the output arguments.
+   The input is tagged with TCG_CT_IALIAS. */
+def->args_ct[i] = def->args_ct[oarg];
+def->args_ct[oarg].ct |= TCG_CT_ALIAS;
+def->args_ct[oarg].alias_index = i;
+def->args_ct[i].ct |= TCG_CT_IALIAS;
+def->args_ct[i].alias_index = oarg;
 }
+ct_str++;
+break;
+case '&':
+def->args_ct[i].ct |= TCG_CT_NEWREG;
+ct_str++;
+break;
+case 'i':
+def->args_ct[i].ct |= TCG_CT_CONST;
+ct_str++;
+break;
+ 

[Qemu-devel] [PULL 34/65] target-tilegx: Use clz and ctz opcodes

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/tilegx/helper.c| 10 --
 target/tilegx/helper.h|  2 --
 target/tilegx/translate.c |  4 ++--
 3 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/target/tilegx/helper.c b/target/tilegx/helper.c
index b4fba9c..b6f5e29 100644
--- a/target/tilegx/helper.c
+++ b/target/tilegx/helper.c
@@ -55,16 +55,6 @@ void helper_ext01_ics(CPUTLGState *env)
 }
 }
 
-uint64_t helper_cntlz(uint64_t arg)
-{
-return clz64(arg);
-}
-
-uint64_t helper_cnttz(uint64_t arg)
-{
-return ctz64(arg);
-}
-
 uint64_t helper_pcnt(uint64_t arg)
 {
 return ctpop64(arg);
diff --git a/target/tilegx/helper.h b/target/tilegx/helper.h
index 9281d0f..bab303a 100644
--- a/target/tilegx/helper.h
+++ b/target/tilegx/helper.h
@@ -1,7 +1,5 @@
 DEF_HELPER_2(exception, noreturn, env, i32)
 DEF_HELPER_1(ext01_ics, void, env)
-DEF_HELPER_FLAGS_1(cntlz, TCG_CALL_NO_RWG_SE, i64, i64)
-DEF_HELPER_FLAGS_1(cnttz, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(pcnt, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(revbits, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_3(shufflebytes, TCG_CALL_NO_RWG_SE, i64, i64, i64, i64)
diff --git a/target/tilegx/translate.c b/target/tilegx/translate.c
index 9c734ee..8a2df1b 100644
--- a/target/tilegx/translate.c
+++ b/target/tilegx/translate.c
@@ -608,12 +608,12 @@ static TileExcp gen_rr_opcode(DisasContext *dc, unsigned opext,
 switch (opext) {
 case OE_RR_X0(CNTLZ):
 case OE_RR_Y0(CNTLZ):
-gen_helper_cntlz(tdest, tsrca);
+tcg_gen_clzi_tl(tdest, tsrca, TARGET_LONG_BITS);
 mnemonic = "cntlz";
 break;
 case OE_RR_X0(CNTTZ):
 case OE_RR_Y0(CNTTZ):
-gen_helper_cnttz(tdest, tsrca);
+tcg_gen_ctzi_tl(tdest, tsrca, TARGET_LONG_BITS);
 mnemonic = "cnttz";
 break;
 case OE_RR_X0(FSINGLE_PACK1):
-- 
2.9.3




[Qemu-devel] [PULL 09/65] tcg/ppc: Implement field extraction opcodes

2017-01-10 Thread Richard Henderson
Reviewed-by: David Gibson 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |  4 ++--
 tcg/ppc/tcg-target.inc.c | 10 ++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index c765d3e..b42c57a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -69,7 +69,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32 1
 #define TCG_TARGET_HAS_nor_i32  1
 #define TCG_TARGET_HAS_deposit_i32  1
-#define TCG_TARGET_HAS_extract_i32  0
+#define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 0
 #define TCG_TARGET_HAS_movcond_i32  1
 #define TCG_TARGET_HAS_mulu2_i32    0
@@ -102,7 +102,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64 1
 #define TCG_TARGET_HAS_nor_i64  1
 #define TCG_TARGET_HAS_deposit_i64  1
-#define TCG_TARGET_HAS_extract_i64  0
+#define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_movcond_i64  1
 #define TCG_TARGET_HAS_add2_i64 1
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index a3262cf..7ec54a2 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -2396,6 +2396,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 }
 break;
 
+case INDEX_op_extract_i32:
+tcg_out_rlw(s, RLWINM, args[0], args[1],
+32 - args[2], 32 - args[3], 31);
+break;
+case INDEX_op_extract_i64:
+tcg_out_rld(s, RLDICL, args[0], args[1], 64 - args[2], 64 - args[3]);
+break;
+
 case INDEX_op_movcond_i32:
 tcg_out_movcond(s, TCG_TYPE_I32, args[5], args[0], args[1], args[2],
 args[3], args[4], const_args[2]);
@@ -2530,6 +2538,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
 { INDEX_op_movcond_i32, { "r", "r", "ri", "rZ", "rZ" } },
 
 { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
+{ INDEX_op_extract_i32, { "r", "r" } },
 
 { INDEX_op_muluh_i32, { "r", "r", "r" } },
 { INDEX_op_mulsh_i32, { "r", "r", "r" } },
@@ -2585,6 +2594,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
 { INDEX_op_movcond_i64, { "r", "r", "ri", "rZ", "rZ" } },
 
 { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
+{ INDEX_op_extract_i64, { "r", "r" } },
 
 { INDEX_op_mulsh_i64, { "r", "r", "r" } },
 { INDEX_op_muluh_i64, { "r", "r", "r" } },
-- 
2.9.3
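
The two new cases in tcg_out_op() express an unsigned field extract as a
rotate followed by a mask: rotate left by (width - ofs), which is the same
as rotating right by ofs, then keep only the low len bits. Below is a
hedged C model of the 64-bit variant. It does not model the instruction
encoding itself, and the names (rotl64, extract_by_rotate64,
extract_ref64, rotate_extract_mismatches) are placeholders, not anything
from the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits; the count is reduced modulo 64. */
uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Extract as "rotate left by 64 - ofs, then clear the high 64 - len bits". */
uint64_t extract_by_rotate64(uint64_t x, unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 64);
    uint64_t rotated = rotl64(x, 64 - ofs);
    uint64_t mask = len == 64 ? ~(uint64_t)0 : (((uint64_t)1 << len) - 1);
    return rotated & mask;
}

/* Straightforward shift-and-mask reference. */
uint64_t extract_ref64(uint64_t x, unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 64);
    uint64_t mask = len == 64 ? ~(uint64_t)0 : (((uint64_t)1 << len) - 1);
    return (x >> ofs) & mask;
}

/* Exhaustively compare both forms over every legal (ofs, len) pair.
   Returns the number of mismatches. */
int rotate_extract_mismatches(void)
{
    const uint64_t x = 0x0123456789abcdefull;
    int bad = 0;

    for (unsigned ofs = 0; ofs < 64; ofs++) {
        for (unsigned len = 1; ofs + len <= 64; len++) {
            bad += extract_by_rotate64(x, ofs, len) != extract_ref64(x, ofs, len);
        }
    }
    return bad;
}
```

In this model ofs == 0 rotates by 64, which is reduced to a rotate by 0.
How the real emitter treats that count is not covered by the sketch.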




[Qemu-devel] [PULL 12/65] tcg/s390: Support deposit into zero

2017-01-10 Thread Richard Henderson
Since we can no longer use a matching constraint for this operand,
we must handle the data movement by hand.

Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.inc.c | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 2faa761..22e121a 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -43,6 +43,7 @@
 #define TCG_CT_CONST_XORI  0x400
 #define TCG_CT_CONST_CMPI  0x800
 #define TCG_CT_CONST_ADLI  0x1000
+#define TCG_CT_CONST_ZERO  0x2000
 
 /* Several places within the instruction set 0 means "no register"
rather than TCG_REG_R0.  */
@@ -399,6 +400,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 case 'C':
 ct->ct |= TCG_CT_CONST_CMPI;
 break;
+case 'Z':
+ct->ct |= TCG_CT_CONST_ZERO;
+break;
 default:
 return -1;
 }
@@ -538,6 +542,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 return tcg_match_xori(type, val);
 } else if (ct & TCG_CT_CONST_CMPI) {
 return tcg_match_cmpi(type, val);
+} else if (ct & TCG_CT_CONST_ZERO) {
+return val == 0;
 }
 
 return 0;
@@ -1245,11 +1251,11 @@ static void tgen_movcond(TCGContext *s, TCGType type, TCGCond c, TCGReg dest,
 }
 
 static void tgen_deposit(TCGContext *s, TCGReg dest, TCGReg src,
- int ofs, int len)
+ int ofs, int len, int z)
 {
 int lsb = (63 - ofs);
 int msb = lsb - (len - 1);
-tcg_out_risbg(s, dest, src, msb, lsb, ofs, 0);
+tcg_out_risbg(s, dest, src, msb, lsb, ofs, z);
 }
 
 static void tgen_extract(TCGContext *s, TCGReg dest, TCGReg src,
@@ -2162,8 +2168,24 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 OP_32_64(deposit):
-tgen_deposit(s, args[0], args[2], args[3], args[4]);
+a0 = args[0], a1 = args[1], a2 = args[2];
+if (const_args[1]) {
+tgen_deposit(s, a0, a2, args[3], args[4], 1);
+} else {
+/* Since we can't support "0Z" as a constraint, we allow a1 in
+   any register.  Fix things up as if a matching constraint.  */
+if (a0 != a1) {
+TCGType type = (opc == INDEX_op_deposit_i64);
+if (a0 == a2) {
+tcg_out_mov(s, type, TCG_TMP0, a2);
+a2 = TCG_TMP0;
+}
+tcg_out_mov(s, type, a0, a1);
+}
+tgen_deposit(s, a0, a2, args[3], args[4], 0);
+}
 break;
+
 OP_32_64(extract):
 tgen_extract(s, args[0], args[1], args[2], args[3]);
 break;
@@ -2235,7 +2257,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
 { INDEX_op_brcond_i32, { "r", "rC" } },
 { INDEX_op_setcond_i32, { "r", "r", "rC" } },
 { INDEX_op_movcond_i32, { "r", "r", "rC", "r", "0" } },
-{ INDEX_op_deposit_i32, { "r", "0", "r" } },
+{ INDEX_op_deposit_i32, { "r", "rZ", "r" } },
 { INDEX_op_extract_i32, { "r", "r" } },
 
 { INDEX_op_qemu_ld_i32, { "r", "L" } },
-- 
2.9.3




[Qemu-devel] [PULL 22/65] tcg: Pass the opcode width to target_parse_constraint

2017-01-10 Thread Richard Henderson
This will let us choose how to interpret a given constraint
depending on whether the opcode is 32- or 64-bit, which in turn
will let us share more constraint combinations between opcodes.

At the same time, change the interface to return the advanced
pointer instead of passing it in/out by reference.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 15 +--
 tcg/arm/tcg-target.inc.c | 15 +--
 tcg/i386/tcg-target.inc.c| 14 +-
 tcg/ia64/tcg-target.inc.c| 14 +-
 tcg/mips/tcg-target.inc.c| 14 +-
 tcg/ppc/tcg-target.inc.c | 14 +-
 tcg/s390/tcg-target.inc.c| 14 +-
 tcg/sparc/tcg-target.inc.c   | 14 +-
 tcg/tcg.c| 12 
 tcg/tci/tcg-target.inc.c | 12 +---
 10 files changed, 53 insertions(+), 85 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 416db45..17c0b20 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -115,12 +115,10 @@ static inline void patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_MONE 0x800
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct,
-   const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+   const char *ct_str, TCGType type)
 {
-const char *ct_str = *pct_str;
-
-switch (ct_str[0]) {
+switch (*ct_str++) {
 case 'r':
 ct->ct |= TCG_CT_REG;
 tcg_regset_set32(ct->u.regs, 0, (1ULL << TCG_TARGET_NB_REGS) - 1);
@@ -150,12 +148,9 @@ static int target_parse_constraint(TCGArgConstraint *ct,
 ct->ct |= TCG_CT_CONST_ZERO;
 break;
 default:
-return -1;
+return NULL;
 }
-
-ct_str++;
-*pct_str = ct_str;
-return 0;
+return ct_str;
 }
 
 static inline bool is_aimm(uint64_t val)
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index eeabcf8..ec0b861 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -114,12 +114,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 #define TCG_CT_CONST_ZERO 0x800
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+   const char *ct_str, TCGType type)
 {
-const char *ct_str;
-
-ct_str = *pct_str;
-switch (ct_str[0]) {
+switch (*ct_str++) {
 case 'I':
 ct->ct |= TCG_CT_CONST_ARM;
 break;
@@ -172,12 +170,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 break;
 
 default:
-return -1;
+return NULL;
 }
-ct_str++;
-*pct_str = ct_str;
-
-return 0;
+return ct_str;
 }
 
 static inline uint32_t rotl(uint32_t val, int n)
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 595c399..aa5a248 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -166,12 +166,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 }
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+   const char *ct_str, TCGType type)
 {
-const char *ct_str;
-
-ct_str = *pct_str;
-switch(ct_str[0]) {
+switch(*ct_str++) {
 case 'a':
 ct->ct |= TCG_CT_REG;
 tcg_regset_set_reg(ct->u.regs, TCG_REG_EAX);
@@ -249,11 +247,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 break;
 
 default:
-return -1;
+return NULL;
 }
-ct_str++;
-*pct_str = ct_str;
-return 0;
+return ct_str;
 }
 
 /* test if a constant matches the constraint */
diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index e4d419d..bf9a97d 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -721,12 +721,10 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
  */
 
 /* parse target specific constraints */
-static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+static const char *target_parse_constraint(TCGArgConstraint *ct,
+   const char *ct_str, TCGType type)
 {
-const char *ct_str;
-
-ct_str = *pct_str;
-switch(ct_str[0]) {
+switch(*ct_str++) {
 case 'r':
 ct->ct |= TCG_CT_REG;
 tcg_regset_set(ct->u.regs, 0xffffffffffffffffull);
@@ -750,11 +748,9 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
 ct->ct |= TCG_CT_CONST_ZERO;
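
As a rough illustration of the interface change in this patch, the sketch
below shows a per-letter parser that consumes one character, returns the
advanced pointer (or NULL for an unknown letter), and receives the operand
width. The types, flag values and the width-dependent letter 'e' are all
made up for the example. They stand in for TCGArgConstraint, the TCG_CT_*
flags and TCGType, and do not reflect any real backend.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the TCG types and flags. */
typedef enum { OP_TYPE_I32, OP_TYPE_I64 } OpType;
enum { CT_REG = 1, CT_CONST_ZERO = 2, CT_CONST_S32 = 4 };
typedef struct { unsigned ct; } ArgConstraint;

/* Parse one constraint letter.  Returns the pointer just past it, or NULL
   if the letter is not recognised. */
const char *parse_one_constraint(ArgConstraint *ct, const char *ct_str,
                                 OpType type)
{
    switch (*ct_str++) {
    case 'r':
        ct->ct |= CT_REG;
        break;
    case 'Z':
        ct->ct |= CT_CONST_ZERO;
        break;
    case 'e':
        /* Made-up letter: only 64-bit ops need the constant restricted. */
        if (type == OP_TYPE_I64) {
            ct->ct |= CT_CONST_S32;
        }
        break;
    default:
        return NULL;
    }
    return ct_str;
}

/* Caller side: loop until the string is exhausted.
   Returns 0 on success, -1 if any letter was rejected. */
int parse_constraint_string(ArgConstraint *ct, const char *ct_str, OpType type)
{
    assert(ct != NULL && ct_str != NULL);
    ct->ct = 0;
    while (*ct_str != '\0') {
        ct_str = parse_one_constraint(ct, ct_str, type);
        if (ct_str == NULL) {
            return -1;
        }
    }
    return 0;
}
```

Returning the pointer lets the caller loop without an in/out parameter, and
a NULL result doubles as the error indication that the old interface
signalled with -1.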

[Qemu-devel] [PULL 17/65] target-ppc: Use the new deposit and extract ops

2017-01-10 Thread Richard Henderson
Use the new primitives for RLWINM and RLDICL.

Reviewed-by: David Gibson 
Signed-off-by: Richard Henderson 
---
 target/ppc/translate.c | 35 +++
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 59e9552..435c6f0 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -1975,16 +1975,16 @@ static void gen_rlwinm(DisasContext *ctx)
 {
 TCGv t_ra = cpu_gpr[rA(ctx->opcode)];
 TCGv t_rs = cpu_gpr[rS(ctx->opcode)];
-uint32_t sh = SH(ctx->opcode);
-uint32_t mb = MB(ctx->opcode);
-uint32_t me = ME(ctx->opcode);
-
-if (mb == 0 && me == (31 - sh)) {
-tcg_gen_shli_tl(t_ra, t_rs, sh);
-tcg_gen_ext32u_tl(t_ra, t_ra);
-} else if (sh != 0 && me == 31 && sh == (32 - mb)) {
-tcg_gen_ext32u_tl(t_ra, t_rs);
-tcg_gen_shri_tl(t_ra, t_ra, mb);
+int sh = SH(ctx->opcode);
+int mb = MB(ctx->opcode);
+int me = ME(ctx->opcode);
+int len = me - mb + 1;
+int rsh = (32 - sh) & 31;
+
+if (sh != 0 && len > 0 && me == (31 - sh)) {
+tcg_gen_deposit_z_tl(t_ra, t_rs, sh, len);
+} else if (me == 31 && rsh + len <= 32) {
+tcg_gen_extract_tl(t_ra, t_rs, rsh, len);
 } else {
 target_ulong mask;
 #if defined(TARGET_PPC64)
@@ -1992,8 +1992,9 @@ static void gen_rlwinm(DisasContext *ctx)
 me += 32;
 #endif
 mask = MASK(mb, me);
-
-if (mask <= 0xffffffffu) {
+if (sh == 0) {
+tcg_gen_andi_tl(t_ra, t_rs, mask);
+} else if (mask <= 0xffffffffu) {
 TCGv_i32 t0 = tcg_temp_new_i32();
 tcg_gen_trunc_tl_i32(t0, t_rs);
 tcg_gen_rotli_i32(t0, t0, sh);
@@ -2096,11 +2097,13 @@ static void gen_rldinm(DisasContext *ctx, int mb, int me, int sh)
 {
 TCGv t_ra = cpu_gpr[rA(ctx->opcode)];
 TCGv t_rs = cpu_gpr[rS(ctx->opcode)];
+int len = me - mb + 1;
+int rsh = (64 - sh) & 63;
 
-if (sh != 0 && mb == 0 && me == (63 - sh)) {
-tcg_gen_shli_tl(t_ra, t_rs, sh);
-} else if (sh != 0 && me == 63 && sh == (64 - mb)) {
-tcg_gen_shri_tl(t_ra, t_rs, mb);
+if (sh != 0 && len > 0 && me == (63 - sh)) {
+tcg_gen_deposit_z_tl(t_ra, t_rs, sh, len);
+} else if (me == 63 && rsh + len <= 64) {
+tcg_gen_extract_tl(t_ra, t_rs, rsh, len);
 } else {
 tcg_gen_rotli_tl(t_ra, t_rs, sh);
 tcg_gen_andi_tl(t_ra, t_ra, MASK(mb, me));
-- 
2.9.3
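
For anyone checking the two fast paths in gen_rlwinm above, here is a small host-side C model. It is not QEMU code: every function name below is made up for illustration, the mask is assumed not to wrap (mb <= me), and deposit_z and extract are modelled as plain shift-and-mask operations.

```c
#include <stdint.h>

/* Rotate left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* PowerPC-style mask with bits mb..me set, where bit 0 is the MSB.
   Assumption: mb <= me, i.e. the mask does not wrap around. */
static uint32_t ppc_mask32(unsigned mb, unsigned me)
{
    return (0xffffffffu >> mb) & (0xffffffffu << (31 - me));
}

/* Reference semantics of rlwinm: rotate, then AND with the mask. */
static uint32_t rlwinm_ref(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask32(mb, me);
}

/* Model of extract: 1 <= len <= 32 and ofs + len <= 32. */
static uint32_t model_extract32(uint32_t x, unsigned ofs, unsigned len)
{
    return (x >> ofs) & (0xffffffffu >> (32 - len));
}

/* Model of deposit_z: deposit the low len bits of x at ofs into zero. */
static uint32_t model_deposit_z32(uint32_t x, unsigned ofs, unsigned len)
{
    return (x & (0xffffffffu >> (32 - len))) << ofs;
}

/* Walk every non-wrapping (sh, mb, me) encoding and compare whichever
   fast path the patch would select against the reference.
   Returns 1 if they always agree, 0 otherwise. */
static int rlwinm_fast_paths_agree(uint32_t rs)
{
    for (int sh = 0; sh < 32; sh++) {
        for (int mb = 0; mb < 32; mb++) {
            for (int me = mb; me < 32; me++) {
                int len = me - mb + 1;
                int rsh = (32 - sh) & 31;
                uint32_t want = rlwinm_ref(rs, sh, mb, me);

                if (sh != 0 && len > 0 && me == 31 - sh) {
                    if (model_deposit_z32(rs, sh, len) != want) {
                        return 0;
                    }
                } else if (me == 31 && rsh + len <= 32) {
                    if (model_extract32(rs, rsh, len) != want) {
                        return 0;
                    }
                }
            }
        }
    }
    return 1;
}
```

The first branch works because me == 31 - sh puts the low end of the mask exactly at bit sh, so the rotate behaves like a left shift of the low len bits. The second works because a left rotate by sh is a right rotate by rsh, and with me == 31 only the low len bits survive.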




[Qemu-devel] [PULL 07/65] tcg/i386: Implement field extraction opcodes

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h | 12 +---
 tcg/i386/tcg-target.inc.c | 38 ++
 2 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 7625188..dc19c47 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -94,8 +94,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
 #define TCG_TARGET_HAS_deposit_i32  1
-#define TCG_TARGET_HAS_extract_i32  0
-#define TCG_TARGET_HAS_sextract_i32 0
+#define TCG_TARGET_HAS_extract_i32  1
+#define TCG_TARGET_HAS_sextract_i32 1
 #define TCG_TARGET_HAS_movcond_i32  1
 #define TCG_TARGET_HAS_add2_i32 1
 #define TCG_TARGET_HAS_sub2_i32 1
@@ -126,7 +126,7 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i64 0
 #define TCG_TARGET_HAS_nor_i64  0
 #define TCG_TARGET_HAS_deposit_i64  1
-#define TCG_TARGET_HAS_extract_i64  0
+#define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_movcond_i64  1
 #define TCG_TARGET_HAS_add2_i64 1
@@ -142,6 +142,12 @@ extern bool have_bmi1;
  ((ofs) == 0 && (len) == 16))
 #define TCG_TARGET_deposit_i64_valid    TCG_TARGET_deposit_i32_valid
 
+/* Check for the possibility of high-byte extraction and, for 64-bit,
+   zero-extending 32-bit right-shift.  */
+#define TCG_TARGET_extract_i32_valid(ofs, len) ((ofs) == 8 && (len) == 8)
+#define TCG_TARGET_extract_i64_valid(ofs, len) \
+(((ofs) == 8 && (len) == 8) || ((ofs) + (len)) == 32)
+
 #if TCG_TARGET_REG_BITS == 64
 # define TCG_AREG0 TCG_REG_R14
 #else
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index eeb1777..39f62bd 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2143,6 +2143,40 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
+case INDEX_op_extract_i64:
+if (args[2] + args[3] == 32) {
+/* This is a 32-bit zero-extending right shift.  */
+tcg_out_mov(s, TCG_TYPE_I32, args[0], args[1]);
+tcg_out_shifti(s, SHIFT_SHR, args[0], args[2]);
+break;
+}
+/* FALLTHRU */
+case INDEX_op_extract_i32:
+/* On the off-chance that we can use the high-byte registers.
+   Otherwise we emit the same ext16 + shift pattern that we
+   would have gotten from the normal tcg-op.c expansion.  */
+tcg_debug_assert(args[2] == 8 && args[3] == 8);
+if (args[1] < 4 && args[0] < 8) {
+tcg_out_modrm(s, OPC_MOVZBL, args[0], args[1] + 4);
+} else {
+tcg_out_ext16u(s, args[0], args[1]);
+tcg_out_shifti(s, SHIFT_SHR, args[0], 8);
+}
+break;
+
+case INDEX_op_sextract_i32:
+/* We don't implement sextract_i64, as we cannot sign-extend to
+   64-bits without using the REX prefix that explicitly excludes
+   access to the high-byte registers.  */
+tcg_debug_assert(args[2] == 8 && args[3] == 8);
+if (args[1] < 4 && args[0] < 8) {
+tcg_out_modrm(s, OPC_MOVSBL, args[0], args[1] + 4);
+} else {
+tcg_out_ext16s(s, args[0], args[1], 0);
+tcg_out_shifti(s, SHIFT_SAR, args[0], 8);
+}
+break;
+
 case INDEX_op_mb:
 tcg_out_mb(s, args[0]);
 break;
@@ -2204,6 +2238,9 @@ static const TCGTargetOpDef x86_op_defs[] = {
 { INDEX_op_setcond_i32, { "q", "r", "ri" } },
 
 { INDEX_op_deposit_i32, { "Q", "0", "Q" } },
+{ INDEX_op_extract_i32, { "r", "r" } },
+{ INDEX_op_sextract_i32, { "r", "r" } },
+
 { INDEX_op_movcond_i32, { "r", "r", "ri", "r", "0" } },
 
 { INDEX_op_mulu2_i32, { "a", "d", "a", "r" } },
@@ -2265,6 +2302,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
 { INDEX_op_extu_i32_i64, { "r", "r" } },
 
 { INDEX_op_deposit_i64, { "Q", "0", "Q" } },
+{ INDEX_op_extract_i64, { "r", "r" } },
 { INDEX_op_movcond_i64, { "r", "r", "re", "r", "0" } },
 
 { INDEX_op_mulu2_i64, { "a", "d", "a", "r" } },
-- 
2.9.3
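
A quick way to convince oneself that the two cases accepted by TCG_TARGET_extract_i64_valid are sound is to model them on the host. The sketch below is illustrative only; the model_* names are invented here and do not exist in QEMU.

```c
#include <stdint.h>

/* Zero-extending bitfield extract on 64 bits.
   Requires 1 <= len and ofs + len <= 64. */
static uint64_t model_extract64(uint64_t x, unsigned ofs, unsigned len)
{
    return (x >> ofs) & (~UINT64_C(0) >> (64 - len));
}

/* What a 32-bit "mov" followed by a 32-bit "shr $ofs" leaves in a
   64-bit register on x86-64: the 32-bit result is zero-extended. */
static uint64_t model_shr32_zext(uint64_t x, unsigned ofs)
{
    return (uint64_t)((uint32_t)x >> ofs);
}

/* What "movzbl %ah, dst" computes: bits 8..15, zero-extended. */
static uint64_t model_movzbl_high(uint64_t x)
{
    return (uint8_t)(x >> 8);
}

/* Mirror of the predicate added to tcg/i386/tcg-target.h. */
static int model_extract_i64_valid(unsigned ofs, unsigned len)
{
    return (ofs == 8 && len == 8) || ofs + len == 32;
}

/* Check that every (ofs, len) with ofs + len == 32 is just a 32-bit
   zero-extending right shift.  Returns 1 on agreement. */
static int shr32_matches_extract(uint64_t x)
{
    for (unsigned ofs = 0; ofs < 32; ofs++) {
        unsigned len = 32 - ofs;
        if (!model_extract_i64_valid(ofs, len)) {
            return 0;
        }
        if (model_extract64(x, ofs, len) != model_shr32_zext(x, ofs)) {
            return 0;
        }
    }
    return 1;
}
```

The high-byte case is the one where the register numbering matters: in the patch it is only taken when the source is one of the first four registers, which are the ones with an addressable second byte.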




[Qemu-devel] [PULL 19/65] tcg/optimize: Fold movcond 0/1 into setcond

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index f41ed2c..9e26bb7 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1105,6 +1105,21 @@ void tcg_optimize(TCGContext *s)
 tcg_opt_gen_mov(s, op, args, args[0], args[4-tmp]);
 break;
 }
+if (temp_is_const(args[3]) && temp_is_const(args[4])) {
+tcg_target_ulong tv = temps[args[3]].val;
+tcg_target_ulong fv = temps[args[4]].val;
+TCGCond cond = args[5];
+if (fv == 1 && tv == 0) {
+cond = tcg_invert_cond(cond);
+} else if (!(tv == 1 && fv == 0)) {
+goto do_default;
+}
+args[3] = cond;
+op->opc = opc = (opc == INDEX_op_movcond_i32
+ ? INDEX_op_setcond_i32
+ : INDEX_op_setcond_i64);
+nb_iargs = 2;
+}
 goto do_default;
 
 case INDEX_op_add2_i32:
-- 
2.9.3
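
The fold above can be modelled in a few lines of plain C. This is only a sketch under simplifying assumptions: the Cond enum is a hypothetical four-entry subset rather than TCGCond, and the operands are ordinary integers instead of TCG temps.

```c
#include <stdint.h>

typedef enum { C_EQ, C_NE, C_LT, C_GE } Cond;   /* hypothetical subset */

static Cond invert_cond(Cond c)
{
    switch (c) {
    case C_EQ: return C_NE;
    case C_NE: return C_EQ;
    case C_LT: return C_GE;
    default:   return C_LT;   /* C_GE */
    }
}

static int eval_cond(int32_t a, int32_t b, Cond c)
{
    switch (c) {
    case C_EQ: return a == b;
    case C_NE: return a != b;
    case C_LT: return a < b;
    default:   return a >= b; /* C_GE */
    }
}

/* movcond: dest = (a cond b) ? tv : fv */
static int32_t model_movcond(int32_t a, int32_t b,
                             int32_t tv, int32_t fv, Cond c)
{
    return eval_cond(a, b, c) ? tv : fv;
}

/* setcond: dest = (a cond b) ? 1 : 0 */
static int32_t model_setcond(int32_t a, int32_t b, Cond c)
{
    return eval_cond(a, b, c);
}

/* If the constant arms are 1/0 or 0/1, store the condition that an
   equivalent setcond needs and return 1.  Otherwise return 0 and
   leave *out untouched. */
static int fold_movcond(Cond c, int32_t tv, int32_t fv, Cond *out)
{
    if (tv == 1 && fv == 0) {
        *out = c;
        return 1;
    }
    if (tv == 0 && fv == 1) {
        *out = invert_cond(c);
        return 1;
    }
    return 0;
}
```

With tv == 0 and fv == 1 the arms are swapped, so the condition has to be inverted before the movcond can be rewritten as a setcond.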




[Qemu-devel] [PULL 01/65] tcg: Add field extraction primitives

2017-01-10 Thread Richard Henderson
Adds tcg_gen_extract_* and tcg_gen_sextract_* for extraction of
fixed position bitfields, much like we already have for deposit.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/README   |  20 ++-
 tcg/aarch64/tcg-target.h |   4 +
 tcg/arm/tcg-target.h |   2 +
 tcg/i386/tcg-target.h|   4 +
 tcg/ia64/tcg-target.h|   4 +
 tcg/mips/tcg-target.h|   2 +
 tcg/optimize.c   |  29 +
 tcg/ppc/tcg-target.h |   4 +
 tcg/s390/tcg-target.h|   4 +
 tcg/sparc/tcg-target.h   |   4 +
 tcg/tcg-op.c | 323 +++
 tcg/tcg-op.h |  12 ++
 tcg/tcg-opc.h|   4 +
 tcg/tcg.h|   8 ++
 tcg/tci/tcg-target.h |   4 +
 15 files changed, 426 insertions(+), 2 deletions(-)

diff --git a/tcg/README b/tcg/README
index ae31388..065d9c2 100644
--- a/tcg/README
+++ b/tcg/README
@@ -314,11 +314,27 @@ The bitfield is described by POS/LEN, which are immediate values:
   LEN - the length of the bitfield
   POS - the position of the first bit, counting from the LSB
 
-For example, pos=8, len=4 indicates a 4-bit field at bit 8.
-This operation would be equivalent to
+For example, "deposit_i32 dest, t1, t2, 8, 4" indicates a 4-bit field
+at bit 8.  This operation would be equivalent to
 
   dest = (t1 & ~0x0f00) | ((t2 << 8) & 0x0f00)
 
+* extract_i32/i64 dest, t1, pos, len
+* sextract_i32/i64 dest, t1, pos, len
+
+Extract a bitfield from T1, placing the result in DEST.
+The bitfield is described by POS/LEN, which are immediate values,
+as above for deposit.  For extract_*, the result will be extended
+to the left with zeros; for sextract_*, the result will be extended
+to the left with copies of the bitfield sign bit at pos + len - 1.
+
+For example, "sextract_i32 dest, t1, 8, 4" indicates a 4-bit field
+at bit 8.  This operation would be equivalent to
+
+  dest = (t1 << 20) >> 28
+
+(using an arithmetic right shift).
+
 * extrl_i64_i32 t0, t1
 
 For 64-bit hosts only, extract the low 32-bits of input T1 and place it
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index a1d101f..410c31b 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -63,6 +63,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
 #define TCG_TARGET_HAS_deposit_i32  1
+#define TCG_TARGET_HAS_extract_i32  0
+#define TCG_TARGET_HAS_sextract_i32 0
 #define TCG_TARGET_HAS_movcond_i32  1
 #define TCG_TARGET_HAS_add2_i32 1
 #define TCG_TARGET_HAS_sub2_i32 1
@@ -93,6 +95,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_i64 0
 #define TCG_TARGET_HAS_nor_i64  0
 #define TCG_TARGET_HAS_deposit_i64  1
+#define TCG_TARGET_HAS_extract_i64  0
+#define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_movcond_i64  1
 #define TCG_TARGET_HAS_add2_i64 1
 #define TCG_TARGET_HAS_sub2_i64 1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index a0e1acf..8e724be 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -80,6 +80,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
 #define TCG_TARGET_HAS_deposit_i32  1
+#define TCG_TARGET_HAS_extract_i32  0
+#define TCG_TARGET_HAS_sextract_i32 0
 #define TCG_TARGET_HAS_movcond_i32  1
 #define TCG_TARGET_HAS_mulu2_i32    1
 #define TCG_TARGET_HAS_muls2_i32    1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 524cfc6..7625188 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -94,6 +94,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
 #define TCG_TARGET_HAS_deposit_i32  1
+#define TCG_TARGET_HAS_extract_i32  0
+#define TCG_TARGET_HAS_sextract_i32 0
 #define TCG_TARGET_HAS_movcond_i32  1
 #define TCG_TARGET_HAS_add2_i32 1
 #define TCG_TARGET_HAS_sub2_i32 1
@@ -124,6 +126,8 @@ extern bool have_bmi1;
 #define TCG_TARGET_HAS_nand_i64 0
 #define TCG_TARGET_HAS_nor_i64  0
 #define TCG_TARGET_HAS_deposit_i64  1
+#define TCG_TARGET_HAS_extract_i64  0
+#define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_movcond_i64  1
 #define TCG_TARGET_HAS_add2_i64 1
 #define TCG_TARGET_HAS_sub2_i64 1
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 6dddb7f..8856dc8 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -149,6 +149,10 @@ typedef enum {
 #define TCG_TARGET_HAS_movcond_i64  1
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_deposit_i64  1
+#define TCG_TARGET_HAS_extract_i32  0
+#define TCG_TARGET_HAS_extract_i64  0
+#define TCG_TARGET_HAS_sextract_i32 0
+#define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_add2_i32   
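
The README text in this patch defines deposit, extract and sextract by example. A plain C rendering of those definitions may help when reading the later patches in the series. This is a sketch, not the tcg-op.c expansion; the model_* names are invented, and sextract is written with the xor/subtract idiom so that it does not depend on how the host compiler treats right shifts of negative values.

```c
#include <stdint.h>

/* Mask with the low len bits set, 1 <= len <= 32. */
static uint32_t field_mask32(unsigned len)
{
    return 0xffffffffu >> (32 - len);
}

/* deposit_i32 dest, t1, t2, pos, len */
static uint32_t model_deposit32(uint32_t t1, uint32_t t2,
                                unsigned pos, unsigned len)
{
    uint32_t m = field_mask32(len) << pos;
    return (t1 & ~m) | ((t2 << pos) & m);
}

/* extract_i32 dest, t1, pos, len: zero-extended field. */
static uint32_t model_extract32(uint32_t t1, unsigned pos, unsigned len)
{
    return (t1 >> pos) & field_mask32(len);
}

/* sextract_i32 dest, t1, pos, len: field sign-extended from the bit
   at pos + len - 1. */
static int32_t model_sextract32(uint32_t t1, unsigned pos, unsigned len)
{
    uint32_t v = model_extract32(t1, pos, len);
    uint32_t sign = 1u << (len - 1);
    /* Flip the sign bit, then subtract its weight; done in 64 bits so
       the intermediate value is always representable. */
    return (int32_t)((int64_t)(v ^ sign) - (int64_t)sign);
}
```

For pos=8, len=4 the deposit mask is 0x0f00, matching the README formula, and model_sextract32 gives the same result as (t1 << 20) >> 28 with an arithmetic right shift.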

[Qemu-devel] [PULL 14/65] target-arm: Use new deposit and extract ops

2017-01-10 Thread Richard Henderson
Use the new primitives for UBFX and SBFX.

Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 81 +-
 target/arm/translate.c | 37 +
 2 files changed, 37 insertions(+), 81 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index f673d93..a59c90c 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -3216,67 +3216,44 @@ static void disas_bitfield(DisasContext *s, uint32_t insn)
low 32-bits anyway.  */
 tcg_tmp = read_cpu_reg(s, rn, 1);
 
-/* Recognize the common aliases.  */
-if (opc == 0) { /* SBFM */
-if (ri == 0) {
-if (si == 7) { /* SXTB */
-tcg_gen_ext8s_i64(tcg_rd, tcg_tmp);
-goto done;
-} else if (si == 15) { /* SXTH */
-tcg_gen_ext16s_i64(tcg_rd, tcg_tmp);
-goto done;
-} else if (si == 31) { /* SXTW */
-tcg_gen_ext32s_i64(tcg_rd, tcg_tmp);
-goto done;
-}
-}
-if (si == 63 || (si == 31 && ri <= si)) { /* ASR */
-if (si == 31) {
-tcg_gen_ext32s_i64(tcg_tmp, tcg_tmp);
-}
-tcg_gen_sari_i64(tcg_rd, tcg_tmp, ri);
+/* Recognize simple(r) extractions.  */
+if (si >= ri) {
+/* Wd<s-r:0> = Wn<s:r> */
+len = (si - ri) + 1;
+if (opc == 0) { /* SBFM: ASR, SBFX, SXTB, SXTH, SXTW */
+tcg_gen_sextract_i64(tcg_rd, tcg_tmp, ri, len);
 goto done;
-}
-} else if (opc == 2) { /* UBFM */
-if (ri == 0) { /* UXTB, UXTH, plus non-canonical AND */
-tcg_gen_andi_i64(tcg_rd, tcg_tmp, bitmask64(si + 1));
+} else if (opc == 2) { /* UBFM: UBFX, LSR, UXTB, UXTH */
+tcg_gen_extract_i64(tcg_rd, tcg_tmp, ri, len);
 return;
 }
-if (si == 63 || (si == 31 && ri <= si)) { /* LSR */
-if (si == 31) {
-tcg_gen_ext32u_i64(tcg_tmp, tcg_tmp);
-}
-tcg_gen_shri_i64(tcg_rd, tcg_tmp, ri);
-return;
-}
-if (si + 1 == ri && si != bitsize - 1) { /* LSL */
-int shift = bitsize - 1 - si;
-tcg_gen_shli_i64(tcg_rd, tcg_tmp, shift);
-goto done;
-}
-}
-
-if (opc != 1) { /* SBFM or UBFM */
-tcg_gen_movi_i64(tcg_rd, 0);
-}
-
-/* do the bit move operation */
-if (si >= ri) {
-/* Wd = Wn */
-tcg_gen_shri_i64(tcg_tmp, tcg_tmp, ri);
+/* opc == 1, BFXIL fall through to deposit */
+tcg_gen_extract_i64(tcg_tmp, tcg_tmp, ri, len);
 pos = 0;
-len = (si - ri) + 1;
 } else {
-/* Wd<32+s-r,32-r> = Wn */
-pos = bitsize - ri;
+/* Handle the ri > si case with a deposit
+ * Wd<32+s-r,32-r> = Wn<s:0>
+ */
 len = si + 1;
+pos = (bitsize - ri) & (bitsize - 1);
 }
 
-tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, pos, len);
+if (opc == 0 && len < ri) {
+/* SBFM: sign extend the destination field from len to fill
+   the balance of the word.  Let the deposit below insert all
+   of those sign bits.  */
+tcg_gen_sextract_i64(tcg_tmp, tcg_tmp, 0, len);
+len = ri;
+}
 
-if (opc == 0) { /* SBFM - sign extend the destination field */
-tcg_gen_shli_i64(tcg_rd, tcg_rd, 64 - (pos + len));
-tcg_gen_sari_i64(tcg_rd, tcg_rd, 64 - (pos + len));
+if (opc == 1) { /* BFM, BFXIL */
+tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, pos, len);
+} else {
+/* SBFM or UBFM: We start with zero, and we haven't modified
+   any bits outside bitsize, therefore the zero-extension
+   below is unneeded.  */
+tcg_gen_deposit_z_i64(tcg_rd, tcg_tmp, pos, len);
+return;
 }
 
  done:
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 0ad9070..08da9ac 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -288,29 +288,6 @@ static void gen_revsh(TCGv_i32 var)
 tcg_gen_ext16s_i32(var, var);
 }
 
-/* Unsigned bitfield extract.  */
-static void gen_ubfx(TCGv_i32 var, int shift, uint32_t mask)
-{
-if (shift)
-tcg_gen_shri_i32(var, var, shift);
-tcg_gen_andi_i32(var, var, mask);
-}
-
-/* Signed bitfield extract.  */
-static void gen_sbfx(TCGv_i32 var, int shift, int width)
-{
-uint32_t signbit;
-
-if (shift)
-tcg_gen_sari_i32(var, var, shift);
-if (shift + width < 32) {
-signbit = 1u << (width - 1);
-tcg_gen_andi_i32(var, var, (1u << width) - 1);
-tcg_gen_xori_i32(var, var, signbit);
-tcg_gen_subi_i32(var, var, signbit);
-}
-}
-
 /* Return (b << 32) + a. Mark inputs as dead */
 static TCGv_i64 gen_addq_msw(TCGv_i64 a, TCGv_i32 b)
 {
@@ -9178,9 +9155,9 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)

[Qemu-devel] [PULL 18/65] target-s390x: Use the new deposit and extract ops

2017-01-10 Thread Richard Henderson
Use the new primitives for RISBG.

Signed-off-by: Richard Henderson 
---
 target/s390x/translate.c | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/target/s390x/translate.c b/target/s390x/translate.c
index 02bc705..6cebb7e 100644
--- a/target/s390x/translate.c
+++ b/target/s390x/translate.c
@@ -3134,20 +3134,26 @@ static ExitStatus op_risbg(DisasContext *s, DisasOps *o)
 }
 }
 
-/* In some cases we can implement this with deposit, which can be more
-   efficient on some hosts.  */
-if (~mask == imask && i3 <= i4) {
-if (s->fields->op2 == 0x5d) {
-i3 += 32, i4 += 32;
-}
+len = i4 - i3 + 1;
+pos = 63 - i4;
+rot = i5 & 63;
+if (s->fields->op2 == 0x5d) {
+pos += 32;
+}
+
+/* In some cases we can implement this with extract.  */
+if (imask == 0 && pos == 0 && len > 0 && rot + len <= 64) {
+tcg_gen_extract_i64(o->out, o->in2, rot, len);
+return NO_EXIT;
+}
+
+/* In some cases we can implement this with deposit.  */
+if (len > 0 && (imask == 0 || ~mask == imask)) {
 /* Note that we rotate the bits to be inserted to the lsb, not to
the position as described in the PoO.  */
-len = i4 - i3 + 1;
-pos = 63 - i4;
-rot = (i5 - pos) & 63;
+rot = (rot - pos) & 63;
 } else {
-pos = len = -1;
-rot = i5 & 63;
+pos = -1;
 }
 
 /* Rotate the input as necessary.  */
@@ -3155,7 +3161,11 @@ static ExitStatus op_risbg(DisasContext *s, DisasOps *o)
 
 /* Insert the selected bits into the output.  */
 if (pos >= 0) {
-tcg_gen_deposit_i64(o->out, o->out, o->in2, pos, len);
+if (imask == 0) {
+tcg_gen_deposit_z_i64(o->out, o->in2, pos, len);
+} else {
+tcg_gen_deposit_i64(o->out, o->out, o->in2, pos, len);
+}
 } else if (imask == 0) {
 tcg_gen_andi_i64(o->out, o->in2, mask);
 } else {
-- 
2.9.3




[Qemu-devel] [PULL 08/65] tcg/mips: Implement field extraction opcodes

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.h | 4 +++-
 tcg/mips/tcg-target.inc.c | 9 +
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index fcc2986..92d203a 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -158,7 +158,7 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_movcond_i32  use_movnz_instructions
 #define TCG_TARGET_HAS_bswap16_i32  use_mips32r2_instructions
 #define TCG_TARGET_HAS_deposit_i32  use_mips32r2_instructions
-#define TCG_TARGET_HAS_extract_i32  0
+#define TCG_TARGET_HAS_extract_i32  use_mips32r2_instructions
 #define TCG_TARGET_HAS_sextract_i32 0
 #define TCG_TARGET_HAS_ext8s_i32    use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i32   use_mips32r2_instructions
@@ -170,6 +170,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_bswap32_i64  use_mips32r2_instructions
 #define TCG_TARGET_HAS_bswap64_i64  use_mips32r2_instructions
 #define TCG_TARGET_HAS_deposit_i64  use_mips32r2_instructions
+#define TCG_TARGET_HAS_extract_i64  use_mips32r2_instructions
+#define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_ext8s_i64    use_mips32r2_instructions
 #define TCG_TARGET_HAS_ext16s_i64   use_mips32r2_instructions
 #define TCG_TARGET_HAS_rot_i64  use_mips32r2_instructions
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 5b2fe98..24c4949 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -2051,6 +2051,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_opc_bf64(s, OPC_DINS, OPC_DINSM, OPC_DINSU, a0, a2,
  args[3] + args[4] - 1, args[3]);
 break;
+case INDEX_op_extract_i32:
+tcg_out_opc_bf(s, OPC_EXT, a0, a1, args[3] - 1, a2);
+break;
+case INDEX_op_extract_i64:
+tcg_out_opc_bf64(s, OPC_DEXT, OPC_DEXTM, OPC_DEXTU, a0, a1,
+ args[3] - 1, a2);
+break;
 
 case INDEX_op_brcond_i32:
 case INDEX_op_brcond_i64:
@@ -2155,6 +2162,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
 { INDEX_op_ext16s_i32, { "r", "rZ" } },
 
 { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
+{ INDEX_op_extract_i32, { "r", "r" } },
 
 { INDEX_op_brcond_i32, { "rZ", "rZ" } },
 #if use_mips32r6_instructions
@@ -2224,6 +2232,7 @@ static const TCGTargetOpDef mips_op_defs[] = {
 { INDEX_op_extrh_i64_i32, { "r", "rZ" } },
 
 { INDEX_op_deposit_i64, { "r", "0", "rZ" } },
+{ INDEX_op_extract_i64, { "r", "r" } },
 
 { INDEX_op_brcond_i64, { "rZ", "rZ" } },
 #if use_mips32r6_instructions
-- 
2.9.3




[Qemu-devel] [PULL 06/65] tcg/arm: Implement field extraction opcodes

2017-01-10 Thread Richard Henderson
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.h |  4 ++--
 tcg/arm/tcg-target.inc.c | 24 
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index d1fe12b..4e30728 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -111,8 +111,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
 #define TCG_TARGET_HAS_deposit_i32  use_armv7_instructions
-#define TCG_TARGET_HAS_extract_i32  0
-#define TCG_TARGET_HAS_sextract_i32 0
+#define TCG_TARGET_HAS_extract_i32  use_armv7_instructions
+#define TCG_TARGET_HAS_sextract_i32 use_armv7_instructions
 #define TCG_TARGET_HAS_movcond_i32  1
 #define TCG_TARGET_HAS_mulu2_i32    1
 #define TCG_TARGET_HAS_muls2_i32    1
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 1415c27..2d5af0f 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -713,6 +713,22 @@ static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
   | (ofs << 7) | ((ofs + len - 1) << 16));
 }
 
+static inline void tcg_out_extract(TCGContext *s, int cond, TCGReg rd,
+   TCGArg a1, int ofs, int len)
+{
+/* ubfx */
+tcg_out32(s, 0x07e00050 | (cond << 28) | (rd << 12) | a1
+  | (ofs << 7) | ((len - 1) << 16));
+}
+
+static inline void tcg_out_sextract(TCGContext *s, int cond, TCGReg rd,
+TCGArg a1, int ofs, int len)
+{
+/* sbfx */
+tcg_out32(s, 0x07a00050 | (cond << 28) | (rd << 12) | a1
+  | (ofs << 7) | ((len - 1) << 16));
+}
+
 /* Note that this routine is used for both LDR and LDRH formats, so we do
not wish to include an immediate shift at this point.  */
 static void tcg_out_memop_r(TCGContext *s, int cond, ARMInsn opc, TCGReg rt,
@@ -1894,6 +1910,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_deposit(s, COND_AL, args[0], args[2],
 args[3], args[4], const_args[2]);
 break;
+case INDEX_op_extract_i32:
+tcg_out_extract(s, COND_AL, args[0], args[1], args[2], args[3]);
+break;
+case INDEX_op_sextract_i32:
+tcg_out_sextract(s, COND_AL, args[0], args[1], args[2], args[3]);
+break;
 
 case INDEX_op_div_i32:
 tcg_out_sdiv(s, COND_AL, args[0], args[1], args[2]);
@@ -1976,6 +1998,8 @@ static const TCGTargetOpDef arm_op_defs[] = {
 { INDEX_op_ext16u_i32, { "r", "r" } },
 
 { INDEX_op_deposit_i32, { "r", "0", "rZ" } },
+{ INDEX_op_extract_i32, { "r", "r" } },
+{ INDEX_op_sextract_i32, { "r", "r" } },
 
 { INDEX_op_div_i32, { "r", "r", "r" } },
 { INDEX_op_divu_i32, { "r", "r", "r" } },
-- 
2.9.3
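
To sanity-check the constants used in tcg_out_extract and tcg_out_sextract, the following stand-alone sketch builds the same instruction word and decodes it again. The helper and struct names are hypothetical; only the bit layout (cond in 31:28, width-1 in 20:16, Rd in 15:12, lsb in 11:7, Rn in 3:0, bit 22 selecting the unsigned form) is taken from the A32 encoding of SBFX/UBFX.

```c
#include <stdint.h>

/* Same expression as tcg_out_extract in the patch, returned as a
   value instead of being emitted. */
static uint32_t encode_ubfx(unsigned cond, unsigned rd, unsigned rn,
                            unsigned ofs, unsigned len)
{
    return 0x07e00050u | (cond << 28) | (rd << 12) | rn
           | (ofs << 7) | ((len - 1) << 16);
}

/* Same expression as tcg_out_sextract in the patch. */
static uint32_t encode_sbfx(unsigned cond, unsigned rd, unsigned rn,
                            unsigned ofs, unsigned len)
{
    return 0x07a00050u | (cond << 28) | (rd << 12) | rn
           | (ofs << 7) | ((len - 1) << 16);
}

typedef struct {
    unsigned cond, rd, rn, lsb, width;
    int is_unsigned;
} BfxFields;

/* Pull the fields back out of an SBFX/UBFX instruction word. */
static BfxFields decode_bfx(uint32_t insn)
{
    BfxFields f;
    f.cond = insn >> 28;
    f.is_unsigned = (insn >> 22) & 1;
    f.width = ((insn >> 16) & 0x1f) + 1;   /* field stores width - 1 */
    f.rd = (insn >> 12) & 0xf;
    f.lsb = (insn >> 7) & 0x1f;
    f.rn = insn & 0xf;
    return f;
}
```

Note that the width field holds len - 1, not the index of the top bit, which is why the encoders use (len - 1) << 16 while the deposit (BFI) encoder above them uses ofs + len - 1.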




[Qemu-devel] [PULL 13/65] target-alpha: Use deposit and extract ops

2017-01-10 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/alpha/translate.c | 67 ++--
 1 file changed, 42 insertions(+), 25 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 114927b..5ac2277 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -949,7 +949,13 @@ static void gen_ext_h(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
   uint8_t lit, uint8_t byte_mask)
 {
 if (islit) {
-tcg_gen_shli_i64(vc, va, (64 - lit * 8) & 0x3f);
+int pos = (64 - lit * 8) & 0x3f;
+int len = cto32(byte_mask) * 8;
+if (pos < len) {
+tcg_gen_deposit_z_i64(vc, va, pos, len - pos);
+} else {
+tcg_gen_movi_i64(vc, 0);
+}
 } else {
 TCGv tmp = tcg_temp_new();
 tcg_gen_shli_i64(tmp, load_gpr(ctx, rb), 3);
@@ -966,38 +972,44 @@ static void gen_ext_l(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
   uint8_t lit, uint8_t byte_mask)
 {
 if (islit) {
-tcg_gen_shri_i64(vc, va, (lit & 7) * 8);
+int pos = (lit & 7) * 8;
+int len = cto32(byte_mask) * 8;
+if (pos + len >= 64) {
+len = 64 - pos;
+}
+tcg_gen_extract_i64(vc, va, pos, len);
 } else {
 TCGv tmp = tcg_temp_new();
 tcg_gen_andi_i64(tmp, load_gpr(ctx, rb), 7);
 tcg_gen_shli_i64(tmp, tmp, 3);
 tcg_gen_shr_i64(vc, va, tmp);
 tcg_temp_free(tmp);
+gen_zapnoti(vc, vc, byte_mask);
 }
-gen_zapnoti(vc, vc, byte_mask);
 }
 
 /* INSWH, INSLH, INSQH */
 static void gen_ins_h(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
   uint8_t lit, uint8_t byte_mask)
 {
-TCGv tmp = tcg_temp_new();
-
-/* The instruction description has us left-shift the byte mask and extract
-   bits <15:8> and apply that zap at the end.  This is equivalent to simply
-   performing the zap first and shifting afterward.  */
-gen_zapnoti(tmp, va, byte_mask);
-
 if (islit) {
-lit &= 7;
-if (unlikely(lit == 0)) {
-tcg_gen_movi_i64(vc, 0);
+int pos = 64 - (lit & 7) * 8;
+int len = cto32(byte_mask) * 8;
+if (pos < len) {
+tcg_gen_extract_i64(vc, va, pos, len - pos);
 } else {
-tcg_gen_shri_i64(vc, tmp, 64 - lit * 8);
+tcg_gen_movi_i64(vc, 0);
 }
 } else {
+TCGv tmp = tcg_temp_new();
 TCGv shift = tcg_temp_new();
 
+/* The instruction description has us left-shift the byte mask
+   and extract bits <15:8> and apply that zap at the end.  This
+   is equivalent to simply performing the zap first and shifting
+   afterward.  */
+gen_zapnoti(tmp, va, byte_mask);
+
 /* If (B & 7) == 0, we need to shift by 64 and leave a zero.  Do this
portably by splitting the shift into two parts: shift_count-1 and 1.
Arrange for the -1 by using ones-complement instead of
@@ -1010,32 +1022,37 @@ static void gen_ins_h(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
 tcg_gen_shr_i64(vc, tmp, shift);
 tcg_gen_shri_i64(vc, vc, 1);
 tcg_temp_free(shift);
+tcg_temp_free(tmp);
 }
-tcg_temp_free(tmp);
 }
 
 /* INSBL, INSWL, INSLL, INSQL */
 static void gen_ins_l(DisasContext *ctx, TCGv vc, TCGv va, int rb, bool islit,
   uint8_t lit, uint8_t byte_mask)
 {
-TCGv tmp = tcg_temp_new();
-
-/* The instruction description has us left-shift the byte mask
-   the same number of byte slots as the data and apply the zap
-   at the end.  This is equivalent to simply performing the zap
-   first and shifting afterward.  */
-gen_zapnoti(tmp, va, byte_mask);
-
 if (islit) {
-tcg_gen_shli_i64(vc, tmp, (lit & 7) * 8);
+int pos = (lit & 7) * 8;
+int len = cto32(byte_mask) * 8;
+if (pos + len > 64) {
+len = 64 - pos;
+}
+tcg_gen_deposit_z_i64(vc, va, pos, len);
 } else {
+TCGv tmp = tcg_temp_new();
 TCGv shift = tcg_temp_new();
+
+/* The instruction description has us left-shift the byte mask
+   and extract bits <15:8> and apply that zap at the end.  This
+   is equivalent to simply performing the zap first and shifting
+   afterward.  */
+gen_zapnoti(tmp, va, byte_mask);
+
 tcg_gen_andi_i64(shift, load_gpr(ctx, rb), 7);
 tcg_gen_shli_i64(shift, shift, 3);
 tcg_gen_shl_i64(vc, tmp, shift);
 tcg_temp_free(shift);
+tcg_temp_free(tmp);
 }
-tcg_temp_free(tmp);
 }
 
 /* MSKWH, MSKLH, MSKQH */
-- 
2.9.3
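
The literal case of gen_ext_l replaces "shift right, then zapnot" with a single extract. The sketch below models both versions on the host and compares them. It is not QEMU code: the model_* and ext_l_* names are invented, and only the byte masks 0x01, 0x03, 0x0f and 0xff (the ones the EXTxL instructions use) are considered.

```c
#include <stdint.h>

/* Count trailing one bits (stand-in for QEMU's cto32). */
static int model_cto32(uint32_t v)
{
    int n = 0;
    while (v & 1) {
        n++;
        v >>= 1;
    }
    return n;
}

/* ZAPNOT: keep byte i of x iff bit i of byte_mask is set. */
static uint64_t model_zapnot(uint64_t x, uint8_t byte_mask)
{
    uint64_t keep = 0;
    for (int i = 0; i < 8; i++) {
        if (byte_mask & (1u << i)) {
            keep |= UINT64_C(0xff) << (8 * i);
        }
    }
    return x & keep;
}

/* Zero-extending extract, 1 <= len and pos + len <= 64. */
static uint64_t model_extract64(uint64_t x, unsigned pos, unsigned len)
{
    return (x >> pos) & (~UINT64_C(0) >> (64 - len));
}

/* Old literal path: shift, then zap. */
static uint64_t ext_l_old(uint64_t va, uint8_t lit, uint8_t byte_mask)
{
    return model_zapnot(va >> ((lit & 7) * 8), byte_mask);
}

/* New literal path: one extract with the length clamped to the word. */
static uint64_t ext_l_new(uint64_t va, uint8_t lit, uint8_t byte_mask)
{
    int pos = (lit & 7) * 8;
    int len = model_cto32(byte_mask) * 8;
    if (pos + len >= 64) {
        len = 64 - pos;
    }
    return model_extract64(va, pos, len);
}

/* Compare both paths for every literal and every EXTxL byte mask. */
static int ext_l_paths_agree(uint64_t va)
{
    static const uint8_t masks[] = { 0x01, 0x03, 0x0f, 0xff };
    for (unsigned lit = 0; lit < 8; lit++) {
        for (unsigned m = 0; m < 4; m++) {
            if (ext_l_old(va, lit, masks[m]) != ext_l_new(va, lit, masks[m])) {
                return 0;
            }
        }
    }
    return 1;
}
```

Clamping len is harmless because the shifted value already has zeros in its top pos bits, so the bits the longer mask would have kept are zero anyway.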




[Qemu-devel] [PULL 16/65] target-mips: Use the new extract op

2017-01-10 Thread Richard Henderson
Use extract for EXT and DEXT.

Reviewed-by: Yongbok Kim 
Signed-off-by: Richard Henderson 
---
 target/mips/translate.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/target/mips/translate.c b/target/mips/translate.c
index 57b824f..8deffa1 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -4488,11 +4488,12 @@ static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
 if (lsb + msb > 31) {
 goto fail;
 }
-tcg_gen_shri_tl(t0, t1, lsb);
 if (msb != 31) {
-tcg_gen_andi_tl(t0, t0, (1U << (msb + 1)) - 1);
+tcg_gen_extract_tl(t0, t1, lsb, msb + 1);
 } else {
-tcg_gen_ext32s_tl(t0, t0);
+/* The two checks together imply that lsb == 0,
+   so this is a simple sign-extension.  */
+tcg_gen_ext32s_tl(t0, t1);
 }
 break;
 #if defined(TARGET_MIPS64)
@@ -4507,10 +4508,7 @@ static void gen_bitops (DisasContext *ctx, uint32_t opc, int rt,
 if (lsb + msb > 63) {
 goto fail;
 }
-tcg_gen_shri_tl(t0, t1, lsb);
-if (msb != 63) {
-tcg_gen_andi_tl(t0, t0, (1ULL << (msb + 1)) - 1);
-}
+tcg_gen_extract_tl(t0, t1, lsb, msb + 1);
 break;
 #endif
 case OPC_INS:
-- 
2.9.3




[Qemu-devel] [PULL 05/65] tcg/arm: Move isa detection to tcg-target.h

2017-01-10 Thread Richard Henderson
This allows us to use this detection within the TCG_TARGET_HAS_*
macros, instead of requiring a function call into tcg-target.inc.c.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.h | 36 
 tcg/arm/tcg-target.inc.c | 41 +
 2 files changed, 33 insertions(+), 44 deletions(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 8e724be..d1fe12b 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -26,6 +26,37 @@
 #ifndef ARM_TCG_TARGET_H
 #define ARM_TCG_TARGET_H
 
+/* The __ARM_ARCH define is provided by gcc 4.8.  Construct it otherwise.  */
+#ifndef __ARM_ARCH
+# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
+ || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
+ || defined(__ARM_ARCH_7EM__)
+#  define __ARM_ARCH 7
+# elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
+   || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
+   || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
+#  define __ARM_ARCH 6
+# elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__) \
+   || defined(__ARM_ARCH_5T__) || defined(__ARM_ARCH_5TE__) \
+   || defined(__ARM_ARCH_5TEJ__)
+#  define __ARM_ARCH 5
+# else
+#  define __ARM_ARCH 4
+# endif
+#endif
+
+extern int arm_arch;
+
+#if defined(__ARM_ARCH_5T__) \
+|| defined(__ARM_ARCH_5TE__) || defined(__ARM_ARCH_5TEJ__)
+# define use_armv5t_instructions 1
+#else
+# define use_armv5t_instructions use_armv6_instructions
+#endif
+
+#define use_armv6_instructions  (__ARM_ARCH >= 6 || arm_arch >= 6)
+#define use_armv7_instructions  (__ARM_ARCH >= 7 || arm_arch >= 7)
+
 #undef TCG_TARGET_STACK_GROWSUP
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
@@ -79,7 +110,7 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_eqv_i32  0
 #define TCG_TARGET_HAS_nand_i32 0
 #define TCG_TARGET_HAS_nor_i32  0
-#define TCG_TARGET_HAS_deposit_i32  1
+#define TCG_TARGET_HAS_deposit_i32  use_armv7_instructions
 #define TCG_TARGET_HAS_extract_i32  0
 #define TCG_TARGET_HAS_sextract_i32 0
 #define TCG_TARGET_HAS_movcond_i32  1
@@ -90,9 +121,6 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_div_i32  use_idiv_instructions
 #define TCG_TARGET_HAS_rem_i32  0
 
-extern bool tcg_target_deposit_valid(int ofs, int len);
-#define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
-
 enum {
 TCG_AREG0 = TCG_REG_R6,
 };
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index ffa0d40..1415c27 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -25,36 +25,7 @@
 #include "elf.h"
 #include "tcg-be-ldst.h"
 
-/* The __ARM_ARCH define is provided by gcc 4.8.  Construct it otherwise.  */
-#ifndef __ARM_ARCH
-# if defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
- || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
- || defined(__ARM_ARCH_7EM__)
-#  define __ARM_ARCH 7
-# elif defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
-   || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
-   || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
-#  define __ARM_ARCH 6
-# elif defined(__ARM_ARCH_5__) || defined(__ARM_ARCH_5E__) \
-   || defined(__ARM_ARCH_5T__) || defined(__ARM_ARCH_5TE__) \
-   || defined(__ARM_ARCH_5TEJ__)
-#  define __ARM_ARCH 5
-# else
-#  define __ARM_ARCH 4
-# endif
-#endif
-
-static int arm_arch = __ARM_ARCH;
-
-#if defined(__ARM_ARCH_5T__) \
-|| defined(__ARM_ARCH_5TE__) || defined(__ARM_ARCH_5TEJ__)
-# define use_armv5t_instructions 1
-#else
-# define use_armv5t_instructions use_armv6_instructions
-#endif
-
-#define use_armv6_instructions  (__ARM_ARCH >= 6 || arm_arch >= 6)
-#define use_armv7_instructions  (__ARM_ARCH >= 7 || arm_arch >= 7)
+int arm_arch = __ARM_ARCH;
 
 #ifndef use_idiv_instructions
 bool use_idiv_instructions;
@@ -730,16 +701,6 @@ static inline void tcg_out_bswap32(TCGContext *s, int cond, int rd, int rn)
 }
 }
 
-bool tcg_target_deposit_valid(int ofs, int len)
-{
-/* ??? Without bfi, we could improve over generic code by combining
-   the right-shift from a non-zero ofs with the orr.  We do run into
-   problems when rd == rs, and the mask generated from ofs+len doesn't
-   fit into an immediate.  We would have to be careful not to pessimize
-   wrt the optimizations performed on the expanded code.  */
-return use_armv7_instructions;
-}
-
 static inline void tcg_out_deposit(TCGContext *s, int cond, TCGReg rd,
TCGArg a1, int ofs, int len, bool const_a1)
 {
-- 
2.9.3




[Qemu-devel] [PULL 03/65] tcg: Add deposit_z expander

2017-01-10 Thread Richard Henderson
While we don't require a new opcode, it is handy to have an expander
that knows the first source is zero.

Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.c | 143 +++
 tcg/tcg-op.h |   6 +++
 2 files changed, 149 insertions(+)
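
For readers skimming the series, the intended result of a deposit into zero can be modelled in plain C. This is only an illustrative sketch, not part of the patch: the helper names deposit_z32, deposit32 and shift_then_ext16u are hypothetical, and the model assumes the expander computes (arg & mask) << ofs, which is what the final fallback in the patch emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model, not QEMU code: place the low LEN bits
   of ARG at bit offset OFS and zero every other bit of the result.  */
static uint32_t deposit_z32(uint32_t arg, unsigned ofs, unsigned len)
{
    assert(ofs < 32 && len > 0 && len <= 32 && ofs + len <= 32);
    uint32_t mask = len == 32 ? UINT32_MAX : (UINT32_C(1) << len) - 1;
    return (arg & mask) << ofs;
}

/* Generic deposit of ARG into BASE, for comparison with BASE == 0.  */
static uint32_t deposit32(uint32_t base, uint32_t arg,
                          unsigned ofs, unsigned len)
{
    uint32_t mask = (len == 32 ? UINT32_MAX : (UINT32_C(1) << len) - 1) << ofs;
    return (base & ~mask) | ((arg << ofs) & mask);
}

/* Shift first, then zero-extend from 16 bits.  This matches deposit_z32
   only when ofs + len == 16, mirroring the "switch (ofs + len)" case.  */
static uint32_t shift_then_ext16u(uint32_t arg, unsigned ofs)
{
    return (uint16_t)(arg << ofs);
}
```

With a zero first source the generic deposit and the model agree, and the shift-then-extend ordering gives the same value whenever the field ends exactly at bit 16.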

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index b17f03f..1927e53 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -561,6 +561,64 @@ void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
 tcg_temp_free_i32(t1);
 }
 
+void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
+                           unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 32);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 32);
+    tcg_debug_assert(ofs + len <= 32);
+
+    if (ofs + len == 32) {
+        tcg_gen_shli_i32(ret, arg, ofs);
+    } else if (ofs == 0) {
+        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
+    } else if (TCG_TARGET_HAS_deposit_i32
+               && TCG_TARGET_deposit_i32_valid(ofs, len)) {
+        TCGv_i32 zero = tcg_const_i32(0);
+        tcg_gen_op5ii_i32(INDEX_op_deposit_i32, ret, zero, arg, ofs, len);
+        tcg_temp_free_i32(zero);
+    } else {
+        /* To help two-operand hosts we prefer to zero-extend first,
+           which allows ARG to stay live.  */
+        switch (len) {
+        case 16:
+            if (TCG_TARGET_HAS_ext16u_i32) {
+                tcg_gen_ext16u_i32(ret, arg);
+                tcg_gen_shli_i32(ret, ret, ofs);
+                return;
+            }
+            break;
+        case 8:
+            if (TCG_TARGET_HAS_ext8u_i32) {
+                tcg_gen_ext8u_i32(ret, arg);
+                tcg_gen_shli_i32(ret, ret, ofs);
+                return;
+            }
+            break;
+        }
+        /* Otherwise prefer zero-extension over AND for code size.  */
+        switch (ofs + len) {
+        case 16:
+            if (TCG_TARGET_HAS_ext16u_i32) {
+                tcg_gen_shli_i32(ret, arg, ofs);
+                tcg_gen_ext16u_i32(ret, ret);
+                return;
+            }
+            break;
+        case 8:
+            if (TCG_TARGET_HAS_ext8u_i32) {
+                tcg_gen_shli_i32(ret, arg, ofs);
+                tcg_gen_ext8u_i32(ret, ret);
+                return;
+            }
+            break;
+        }
+        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
+        tcg_gen_shli_i32(ret, ret, ofs);
+    }
+}
+
 void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
  unsigned int ofs, unsigned int len)
 {
@@ -1762,6 +1820,91 @@ void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
 tcg_temp_free_i64(t1);
 }
 
+void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
+                           unsigned int ofs, unsigned int len)
+{
+    tcg_debug_assert(ofs < 64);
+    tcg_debug_assert(len > 0);
+    tcg_debug_assert(len <= 64);
+    tcg_debug_assert(ofs + len <= 64);
+
+    if (ofs + len == 64) {
+        tcg_gen_shli_i64(ret, arg, ofs);
+    } else if (ofs == 0) {
+        tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
+    } else if (TCG_TARGET_HAS_deposit_i64
+               && TCG_TARGET_deposit_i64_valid(ofs, len)) {
+        TCGv_i64 zero = tcg_const_i64(0);
+        tcg_gen_op5ii_i64(INDEX_op_deposit_i64, ret, zero, arg, ofs, len);
+        tcg_temp_free_i64(zero);
+    } else {
+        if (TCG_TARGET_REG_BITS == 32) {
+            if (ofs >= 32) {
+                tcg_gen_deposit_z_i32(TCGV_HIGH(ret), TCGV_LOW(arg),
+                                      ofs - 32, len);
+                tcg_gen_movi_i32(TCGV_LOW(ret), 0);
+                return;
+            }
+            if (ofs + len <= 32) {
+                tcg_gen_deposit_z_i32(TCGV_LOW(ret), TCGV_LOW(arg), ofs, len);
+                tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
+                return;
+            }
+        }
+        /* To help two-operand hosts we prefer to zero-extend first,
+           which allows ARG to stay live.  */
+        switch (len) {
+        case 32:
+            if (TCG_TARGET_HAS_ext32u_i64) {
+                tcg_gen_ext32u_i64(ret, arg);
+                tcg_gen_shli_i64(ret, ret, ofs);
+                return;
+            }
+            break;
+        case 16:
+            if (TCG_TARGET_HAS_ext16u_i64) {
+                tcg_gen_ext16u_i64(ret, arg);
+                tcg_gen_shli_i64(ret, ret, ofs);
+                return;
+            }
+            break;
+        case 8:
+            if (TCG_TARGET_HAS_ext8u_i64) {
+                tcg_gen_ext8u_i64(ret, arg);
+                tcg_gen_shli_i64(ret, ret, ofs);
+                return;
+            }
+            break;
+        }
+        /* Otherwise prefer zero-extension over AND for code size.  */
+        switch (ofs + len) {
+        case 32:
+            if (TCG_TARGET_HAS_ext32u_i64) {
+                tcg_gen_shli_i64(ret, arg, ofs);
+                tcg_gen_ext32u_i64(ret, ret);
+   
