[PATCH v2 0/1] Add Loongson 2F disassembler

2020-07-03 Thread Stefan Brankovic
This patch adds a disassembler for the Loongson 2F architecture. v2: Fixed coding style problems. Added comments related to licence and author. Stefan Brankovic (1): disas: mips: Add Loongson 2F disassembler MAINTAINERS |1 + configure |1 + disas/Makefile.objs

Re: [PATCH 1/1] disas: mips: Add Loongson 2F disassembler

2020-07-03 Thread Stefan Brankovic
On 3.7.20. 12:09, Thomas Huth wrote: On 03/07/2020 11.49, Stefan Brankovic wrote: On 3.7.20. 09:59, Thomas Huth wrote: On 02/07/2020 21.42, Stefan Brankovic wrote: Add disassembler for Loongson 2F instruction set. Testing is done by comparing qemu disassembly output, obtained by using -d

Re: [PATCH 1/1] disas: mips: Add Loongson 2F disassembler

2020-07-03 Thread Stefan Brankovic
On 3.7.20. 09:59, Thomas Huth wrote: On 02/07/2020 21.42, Stefan Brankovic wrote: Add disassembler for Loongson 2F instruction set. Testing is done by comparing qemu disassembly output, obtained by using -d in_asm command line option, with appropriate objdump output. Signed-off-by: Stefan

[PATCH 0/1] Add Loongson 2F disassembler

2020-07-02 Thread Stefan Brankovic
This patch adds a disassembler for the Loongson 2F instruction set. Stefan Brankovic (1): disas: mips: Add Loongson 2F disassembler MAINTAINERS |1 + configure |1 + disas/Makefile.objs |1 + disas/loongson2f.cpp| 8134

Re: [PATCH 2/2] mailmap: Change email address of Stefan Brankovic

2020-06-02 Thread Stefan Brankovic
On 2.6.20. 10:52, Aleksandar Markovic wrote: Stefan Brankovic wants to use his new email address for his future work in QEMU. CC: Stefan Brankovic Signed-off-by: Aleksandar Markovic Reviewed-by: Stefan Brankovic --- .mailmap | 1 + 1 file changed, 1 insertion(+) diff --git a/.mailmap

[PATCH v9 1/3] target/ppc: Optimize emulation of vclzh and vclzb instructions

2019-10-23 Thread Stefan Brankovic
for the lower doubleword element of vB. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 9 --- target/ppc/translate/vmx-impl.inc.c | 132 +++- 3 files changed, 130 insertions(+), 13 deletions

[PATCH v9 2/3] target/ppc: Optimize emulation of vpkpx instruction

2019-10-23 Thread Stefan Brankovic
iterations. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 1 - target/ppc/int_helper.c | 21 - target/ppc/translate/vmx-impl.inc.c | 93 - 3 files changed, 92 insertions(+), 23 deletions(-) diff --git a/target/ppc

[PATCH v9 0/3] Optimize emulation of some Altivec instructions

2019-10-23 Thread Stefan Brankovic
. V3: Fixed problem during build. V2: Addressed Richard Henderson's suggestions. Fixed problem during build on patch 2/8. Rebased series to the latest qemu code. Stefan Brankovic (3): target/ppc: Optimize emulation of vclzh and vclzb instructions target/ppc: Optimize emulation of vpkpx

[PATCH v9 3/3] target/ppc: Optimize emulation of vupkhpx and vupklpx instructions

2019-10-23 Thread Stefan Brankovic
variable 'result', that is later transferred to the destination register. Inner 'for' loop does unpacking of pixels in two iterations. Each iteration takes 16 bits from source register and unpacks them into 32 bits of the destination register. Signed-off-by: Stefan Brankovic --- target/ppc
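The per-pixel step described in this snippet can be illustrated in plain C (this is not the TCG code from the patch). The sketch expands one 16-bit pixel into 32 bits, assuming the usual Altivec 1-5-5-5 layout: the top bit is replicated into the first byte and each 5-bit channel is zero-extended to a byte. The name unpack_pixel is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: expand one 1-5-5-5 pixel into an 8-8-8-8 pixel. */
uint32_t unpack_pixel(uint16_t px)
{
    uint32_t a = (px >> 15) ? 0xFF : 0x00;  /* top bit replicated into 8 bits */
    uint32_t r = (px >> 10) & 0x1F;         /* 5-bit channel, zero-extended */
    uint32_t g = (px >> 5) & 0x1F;          /* 5-bit channel, zero-extended */
    uint32_t b = px & 0x1F;                 /* 5-bit channel, zero-extended */

    return (a << 24) | (r << 16) | (g << 8) | b;
}
```

Calling this helper on two 16-bit source elements per doubleword would correspond to the two inner iterations mentioned in the snippet.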

[PATCH v8 3/3] target/ppc: Optimize emulation of vupkhpx and vupklpx instructions

2019-10-23 Thread Stefan Brankovic
into 32 bits of the destination register. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 20 - target/ppc/translate/vmx-impl.inc.c | 82 - 3 files changed, 80 insertions(+), 24 deletions

[PATCH v8 1/3] target/ppc: Optimize emulation of vclzh and vclzb instructions

2019-10-23 Thread Stefan Brankovic
for the lower doubleword element of vB. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 9 --- target/ppc/translate/vmx-impl.inc.c | 132 +++- 3 files changed, 130 insertions(+), 13 deletions

[PATCH v8 0/3] Optimize emulation of some Altivec instructions

2019-10-23 Thread Stefan Brankovic
: Addressed Richard Henderson's suggestions. Fixed problem during build on patch 2/8. Rebased series to the latest qemu code. Stefan Brankovic (3): target/ppc: Optimize emulation of vclzh and vclzb instructions target/ppc: Optimize emulation of vpkpx instruction target/ppc: Optimize emulation

[PATCH v8 2/3] target/ppc: Optimize emulation of vpkpx instruction

2019-10-23 Thread Stefan Brankovic
iterations. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 1 - target/ppc/int_helper.c | 21 - target/ppc/translate/vmx-impl.inc.c | 93 - 3 files changed, 92 insertions(+), 23 deletions(-) diff --git a/target/ppc

Re: [PATCH v7 3/3] target/ppc: Optimize emulation of vupkhpx and vupklpx instructions

2019-10-21 Thread Stefan Brankovic
Hello Aleksandar, Thank you for taking a look at this patch. I will start working on a version 8 of the patch where I will address all your suggestions. Kind Regards, Stefan On 19.10.19. 22:40, Aleksandar Markovic wrote: On Thursday, October 17, 2019, Stefan Brankovic

[PATCH v7 3/3] target/ppc: Optimize emulation of vupkhpx and vupklpx instructions

2019-10-17 Thread Stefan Brankovic
the same way. It also stores result of every iteration in temporary register, that is later transferred to destination register. Inner 'for' loop does unpacking of pixels and forms resulting doubleword 32 by 32 bits. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 2

[PATCH v7 1/3] target/ppc: Optimize emulation of vclzh and vclzb instructions

2019-10-17 Thread Stefan Brankovic
-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 9 --- target/ppc/translate/vmx-impl.inc.c | 136 +++- 3 files changed, 134 insertions(+), 13 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc

[PATCH v7 2/3] target/ppc: Optimize emulation of vpkpx instruction

2019-10-17 Thread Stefan Brankovic
iterations, 1 for each pixel) and save result in tmp variable. In the end of outer for loop, the result is merged in variable called result and saved in appropriate doubleword element of vD if the whole doubleword is finished(every second iteration). The outer loop has 4 iterations. Signed-off-by: Stefan
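As a rough illustration of what packing one pixel involves (plain C, not the TCG sequence from the patch), the sketch below reduces a 32-bit pixel to 16 bits. It assumes the usual Altivec pack-pixel rule: the lowest bit of the first byte and the top 5 bits of each of the other three bytes are kept. The name pack_pixel is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: pack one 8-8-8-8 pixel into a 1-5-5-5 pixel. */
uint16_t pack_pixel(uint32_t px)
{
    /* Bit 24 and bits 23..19 land in result bits 15..10. */
    uint32_t a_r = (px >> 9) & 0xFC00;
    /* Bits 15..11 land in result bits 9..5. */
    uint32_t g = (px >> 6) & 0x03E0;
    /* Bits 7..3 land in result bits 4..0. */
    uint32_t b = (px >> 3) & 0x001F;

    return (uint16_t)(a_r | g | b);
}
```

Four source words from each of the two source registers would give the eight packed halfwords of the destination register.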

[PATCH v7 0/3] target/ppc: Optimize emulation of some Altivec instructions

2019-10-17 Thread Stefan Brankovic
problem during build on patch 2/8. Rebased series to the latest qemu code. Stefan Brankovic (3): target/ppc: Optimize emulation of vclzh and vclzb instructions target/ppc: Optimize emulation of vpkpx instruction target/ppc: Optimize emulation of vupkhpx and vupklpx instructions target/ppc

Re: [PATCH v6 1/3] target/ppc: Optimize emulation of vpkpx instruction

2019-10-16 Thread Stefan Brankovic
On 29.8.19. 17:31, Richard Henderson wrote: On 8/29/19 6:34 AM, Stefan Brankovic wrote: Then I ran my performance tests and got the following results (the test calls vpkpx 10 times): 1) Current helper implementation: ~157 ms 2) helper implementation you suggested: ~94 ms 3) tcg

[PATCH v2] target/ppc: Fix for optimized vsl/vsr instructions

2019-10-04 Thread Stefan Brankovic
improvement compared to the old helper implementation. V1 of this patch was not sent to qemu-devel and I am now sending V2 to the appropriate email addresses. Stefan Brankovic (1): target/ppc: Fix for optimized vsl/vsr instructions target/ppc/translate/vmx-impl.inc.c | 84

[PATCH v2] target/ppc: Fix for optimized vsl/vsr instructions

2019-10-04 Thread Stefan Brankovic
Suggested-by: Aleksandar Markovic Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 84 ++--- 1 file changed, 40 insertions(+), 44 deletions(-) diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c index

Re: target/ppc: bug in optimised vsl/vsr implementation?

2019-10-03 Thread Stefan Brankovic
Please take a look at the following patch https://lists.nongnu.org/archive/html/qemu-ppc/2019-10/msg00133.html and let me know if problem is solved. On 2.10.19. 16:08, Stefan Brankovic wrote: Hi Mark, Thank you for reporting this bug. I was away from office for couple of days, so that's why

Re: target/ppc: bug in optimised vsl/vsr implementation?

2019-10-02 Thread Stefan Brankovic
Hi Mark, Thank you for reporting this bug. I was away from office for couple of days, so that's why I am answering you a bit late, sorry about that. I will start working on a solution and try to fix this problem in next couple of days. On 1.10.19. 20:24, Mark Cave-Ayland wrote: On

Re: [Qemu-devel] [PATCH v6 1/3] target/ppc: Optimize emulation of vpkpx instruction

2019-08-29 Thread Stefan Brankovic
On 27.8.19. 20:52, Richard Henderson wrote: On 8/27/19 2:37 AM, Stefan Brankovic wrote: +for (i = 0; i < 4; i++) { +switch (i) { +case 0: +/* + * Get high doubleword of vA to perform 6-5-5 pack of pixels + * 1 an

[Qemu-devel] [PATCH v6 2/3] target/ppc: Optimize emulation of vclzh and vclzb instructions

2019-08-27 Thread Stefan Brankovic
-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 9 --- target/ppc/translate/vmx-impl.inc.c | 136 +++- 3 files changed, 134 insertions(+), 13 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc

[Qemu-devel] [PATCH v6 3/3] target/ppc: Refactor emulation of vmrgew and vmrgow instructions

2019-08-27 Thread Stefan Brankovic
, and second one with a helper. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 66 + 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c index

[Qemu-devel] [PATCH v6 1/3] target/ppc: Optimize emulation of vpkpx instruction

2019-08-27 Thread Stefan Brankovic
iterations, 1 for each pixel) and save result in tmp variable. In the end of outer for loop, the result is merged in variable called result and saved in appropriate doubleword element of vD if the whole doubleword is finished(every second iteration). The outer loop has 4 iterations. Signed-off-by: Stefan

[Qemu-devel] [PATCH v6 0/3] target/ppc: Optimize emulation of some Altivec instructions

2019-08-27 Thread Stefan Brankovic
) in tcg. Implemented vector vmrgh and vmrgl instructions for i386. Converted vmrgh and vmrgl instructions to vector operations. V3: Fixed problem during build. V2: Addressed Richard Henderson's suggestions. Fixed problem during build on patch 2/8. Rebased series to the latest qemu code. Stefan

[Qemu-devel] [PATCH v5 7/8] target/ppc: Optimize emulation of vclzh and vclzb instructions

2019-07-15 Thread Stefan Brankovic
-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 9 --- target/ppc/translate/vmx-impl.inc.c | 136 +++- 3 files changed, 134 insertions(+), 13 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc

[Qemu-devel] [PATCH v5 6/8] target/ppc: Optimize emulation of vclzw instruction

2019-07-15 Thread Stefan Brankovic
(one for each word element of source register vB). Every iteration consists of loading the appropriate word element from the source register, counting leading zeros with tcg_gen_clzi_i32, and saving the result in the appropriate word element of the destination register. Signed-off-by: Stefan Brankovic Reviewed
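The per-word loop described here can be sketched in plain C (not the TCG code from the patch). The vector is modelled as an array of four 32-bit words, and a zero word is assumed to yield 32, which mirrors the explicit zero-input value that tcg_gen_clzi_i32 takes as its last argument. The names clz32 and count_leading_zeros_words are hypothetical.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value; returns 32 for a zero input. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0) {
        return 32;
    }
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical model of the instruction: one count per word element. */
void count_leading_zeros_words(const uint32_t src[4], uint32_t dst[4])
{
    for (int i = 0; i < 4; i++) {
        dst[i] = clz32(src[i]);
    }
}
```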

[Qemu-devel] [PATCH v5 5/8] target/ppc: Optimize emulation of vclzd instruction

2019-07-15 Thread Stefan Brankovic
instruction two times(once for each doubleword element of source register vB) and placing result in appropriate doubleword element of destination register vD. Signed-off-by: Stefan Brankovic Reviewed-by: Richard Henderson --- target/ppc/helper.h | 1 - target/ppc/int_helper.c

[Qemu-devel] [PATCH v5 4/8] target/ppc: Optimize emulation of vgbbd instruction

2019-07-15 Thread Stefan Brankovic
doubleword element of destination register vD. Signed-off-by: Stefan Brankovic Reviewed-by: Richard Henderson --- target/ppc/helper.h | 1 - target/ppc/int_helper.c | 276 target/ppc/translate/vmx-impl.inc.c | 77 +- 3 files

[Qemu-devel] [PATCH v5 8/8] target/ppc: Refactor emulation of vmrgew and vmrgow instructions

2019-07-15 Thread Stefan Brankovic
, and second one with a helper. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 66 + 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c index

[Qemu-devel] [PATCH v5 3/8] target/ppc: Optimize emulation of vpkpx instruction

2019-07-15 Thread Stefan Brankovic
iterations, 1 for each pixel) and save result in tmp variable. In the end of outer for loop, the result is merged in variable called result and saved in appropriate doubleword element of vD if the whole doubleword is finished(every second iteration). The outer loop has 4 iterations. Signed-off-by: Stefan

[Qemu-devel] [PATCH v5 1/8] target/ppc: Optimize emulation of lvsl and lvsr instructions

2019-07-15 Thread Stefan Brankovic
obtained is placed in lower doubleword element of vD. Signed-off-by: Stefan Brankovic Reviewed-by: Richard Henderson --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 18 -- target/ppc/translate/vmx-impl.inc.c | 121 ++-- 3

[Qemu-devel] [PATCH v5 2/8] target/ppc: Optimize emulation of vsl and vsr instructions

2019-07-15 Thread Stefan Brankovic
higher doubleword element, shift operation is performed on lower doubleword element of vA, with replacement of highest sh bits(that are now 0) with bits saved in shifted. Signed-off-by: Stefan Brankovic Reviewed-by: Richard Henderson --- target/ppc/helper.h | 2 - target/ppc
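The right-shift case described in this snippet can be sketched in plain C (not the TCG code from the patch). A 128-bit value is modelled as two 64-bit halves, and the shift count is assumed to be limited to 0..7 bits. The type vec128 and the function shift_right_128 are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical 128-bit vector model: high and low doubleword. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} vec128;

vec128 shift_right_128(vec128 v, unsigned sh)
{
    vec128 r;
    uint64_t shifted;

    sh &= 7;  /* assumed: only the low 3 bits of the count are used */

    /* Bits that leave the high doubleword; guard sh == 0 to avoid << 64. */
    shifted = sh ? (v.hi << (64 - sh)) : 0;

    r.hi = v.hi >> sh;
    /* The highest sh bits of the shifted low half are now 0; fill them. */
    r.lo = (v.lo >> sh) | shifted;
    return r;
}
```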

[Qemu-devel] [PATCH v5 0/8] target/ppc: Optimize emulation of some Altivec instructions

2019-07-15 Thread Stefan Brankovic
instructions to vector operations. V3: Fixed problem during build. V2: Addressed Richard Henderson's suggestions. Fixed problem during build on patch 2/8. Rebased series to the latest qemu code. Stefan Brankovic (8): target/ppc: Optimize emulation of lvsl and lvsr instructions target/ppc

[Qemu-devel] [PATCH v4 12/13] tcg/i386: Implement vector vmrgl instructions

2019-06-27 Thread Stefan Brankovic
Signed-off-by: Stefan Brankovic --- tcg/i386/tcg-target.h | 2 +- tcg/i386/tcg-target.inc.c | 10 ++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index e825324..d20d08f 100644 --- a/tcg/i386/tcg-target.h +++ b/tcg/i386

[Qemu-devel] [PATCH v4 08/13] tcg: Add opcodes for vector vmrgh instructions

2019-06-27 Thread Stefan Brankovic
Signed-off-by: Stefan Brankovic --- accel/tcg/tcg-runtime-gvec.c | 42 ++ accel/tcg/tcg-runtime.h | 4 tcg/i386/tcg-target.h| 1 + tcg/tcg-op-gvec.c| 23 +++ tcg/tcg-op-gvec.h| 3 +++ tcg/tcg

[Qemu-devel] [PATCH v4 07/13] target/ppc: Refactor emulation of vmrgew and vmrgow instructions

2019-06-27 Thread Stefan Brankovic
, and second one with a helper. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 66 + 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c index

[Qemu-devel] [PATCH v4 11/13] tcg: Add opcodes for verctor vmrgl instructions

2019-06-27 Thread Stefan Brankovic
Signed-off-by: Stefan Brankovic --- accel/tcg/tcg-runtime-gvec.c | 42 ++ accel/tcg/tcg-runtime.h | 4 tcg/i386/tcg-target.h| 1 + tcg/tcg-op-gvec.c| 24 tcg/tcg-op-gvec.h| 2 ++ tcg/tcg

[Qemu-devel] [PATCH v4 09/13] tcg/i386: Implement vector vmrgh instructions

2019-06-27 Thread Stefan Brankovic
Signed-off-by: Stefan Brankovic --- tcg/i386/tcg-target.h | 2 +- tcg/i386/tcg-target.inc.c | 19 +++ 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h index e11b22d..daae35f 100644 --- a/tcg/i386/tcg-target.h +++ b

[Qemu-devel] [PATCH v4 06/13] target/ppc: Optimize emulation of vclzh and vclzb instructions

2019-06-27 Thread Stefan Brankovic
-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 9 --- target/ppc/translate/vmx-impl.inc.c | 122 +++- 3 files changed, 120 insertions(+), 13 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc

[Qemu-devel] [PATCH v4 10/13] target/ppc: convert vmrgh instructions to vector operations

2019-06-27 Thread Stefan Brankovic
Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 3 --- target/ppc/int_helper.c | 2 +- target/ppc/translate/vmx-impl.inc.c | 6 +++--- 3 files changed, 4 insertions(+), 7 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index ac1a5bd

[Qemu-devel] [PATCH v4 03/13] target/ppc: Optimize emulation of vgbbd instruction

2019-06-27 Thread Stefan Brankovic
doubleword element of destination register vD. Signed-off-by: Stefan Brankovic Reviewed-by: Richard Henderson --- target/ppc/helper.h | 1 - target/ppc/int_helper.c | 276 target/ppc/translate/vmx-impl.inc.c | 77 +- 3 files

[Qemu-devel] [PATCH v4 01/13] target/ppc: Optimize emulation of lvsl and lvsr instructions

2019-06-27 Thread Stefan Brankovic
obtained is placed in lower doubleword element of vD. Signed-off-by: Stefan Brankovic Reviewed-by: Richard Henderson --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 18 -- target/ppc/translate/vmx-impl.inc.c | 121 ++-- 3

[Qemu-devel] [PATCH v4 05/13] target/ppc: Optimize emulation of vclzw instruction

2019-06-27 Thread Stefan Brankovic
(one for each word element of source register vB). Every iteration consists of loading the appropriate word element from the source register, counting leading zeros with tcg_gen_clzi_i32, and saving the result in the appropriate word element of the destination register. Signed-off-by: Stefan Brankovic Reviewed

[Qemu-devel] [PATCH v4 13/13] target/ppc: convert vmrgl instructions to vector operations

2019-06-27 Thread Stefan Brankovic
Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 3 --- target/ppc/int_helper.c | 9 - target/ppc/translate/vmx-impl.inc.c | 6 +++--- 3 files changed, 3 insertions(+), 15 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index

[Qemu-devel] [PATCH v4 04/13] target/ppc: Optimize emulation of vclzd instruction

2019-06-27 Thread Stefan Brankovic
instruction two times(once for each doubleword element of source register vB) and placing result in appropriate doubleword element of destination register vD. Signed-off-by: Stefan Brankovic Reviewed-by: Richard Henderson --- target/ppc/helper.h | 1 - target/ppc/int_helper.c

[Qemu-devel] [PATCH v4 00/13] target/ppc, tcg, tcg/i386: Optimize emulation of some Altivec instructions

2019-06-27 Thread Stefan Brankovic
instructions for i386. Converted vmrgh and vmrgl instructions to vector operations. V3: Fixed problem during build. V2: Addressed Richard Henderson's suggestions. Fixed problem during build on patch 2/8. Rebased series to the latest qemu code. Stefan Brankovic (13): target/ppc: Optimize emulation

[Qemu-devel] [PATCH v4 02/13] target/ppc: Optimize emulation of vsl and vsr instructions

2019-06-27 Thread Stefan Brankovic
higher doubleword element, shift operation is performed on lower doubleword element of vA, with replacement of highest sh bits(that are now 0) with bits saved in shifted. Signed-off-by: Stefan Brankovic Reviewed-by: Richard Henderson --- target/ppc/helper.h | 2 - target/ppc

Re: [Qemu-devel] [PATCH v3 0/8] target/ppc: Optimize emulation of some Altivec

2019-06-24 Thread Stefan Brankovic
of some Altivec Date: Monday, June 24, 2019 13:20 CEST From: Howard Spoelstra To: Stefan Brankovic CC: qemu-devel qemu-devel References: <1561371065-3637-1-git-send-email-stefan.branko...@rt-rk.com> <43c6-5d10a600-15-34dab4c0@176981179> On Mon, Jun 24, 2019 at 12:28 PM Stef

Re: [Qemu-devel] [PATCH v3 0/8] target/ppc: Optimize emulation of some Altivec

2019-06-24 Thread Stefan Brankovic
Original Message Subject: [PATCH v3 0/8] target/ppc: Optimize emulation of some Altivec Date: Monday, June 24, 2019 12:10 CEST From: Stefan Brankovic To: stefan.branko...@rt-rk.com Optimize emulation of ten Altivec instructions: lvsl, lvsr, vsl, vsr, vpkpx, vgbbd, vclzb, vclzh, vclzw and vclzd

[Qemu-devel] [PATCH v3 5/8] target/ppc: Optimize emulation of vclzd instruction

2019-06-21 Thread Stefan Brankovic
instruction two times(once for each doubleword element of source register vB) and placing result in appropriate doubleword element of destination register vD. Signed-off-by: Stefan Brankovic Reviewed-by: Richard Henderson --- target/ppc/helper.h | 1 - target/ppc/int_helper.c

[Qemu-devel] [PATCH v3 7/8] target/ppc: Optimize emulation of vclzh and vclzb instructions

2019-06-21 Thread Stefan Brankovic
-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 9 --- target/ppc/translate/vmx-impl.inc.c | 122 +++- 3 files changed, 120 insertions(+), 13 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc

[Qemu-devel] [PATCH v3 8/8] target/ppc: Refactor emulation of vmrgew and vmrgow instructions

2019-06-21 Thread Stefan Brankovic
, and second one with a helper. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 66 + 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c index

[Qemu-devel] [PATCH v3 4/8] target/ppc: Optimize emulation of vgbbd instruction

2019-06-21 Thread Stefan Brankovic
doubleword element of destination register vD. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 1 - target/ppc/int_helper.c | 276 target/ppc/translate/vmx-impl.inc.c | 77 +- 3 files changed, 76 insertions(+), 278

[Qemu-devel] [PATCH v3 0/8] target/ppc: Optimize emulation of some Altivec

2019-06-21 Thread Stefan Brankovic
is presented in this series. The performance improvements are significant in all cases. V3: Fixed problem during build. V2: Addressed Richard Henderson's suggestions. Fixed problem during build on patch 2/8. Rebased series to the latest qemu code. Stefan Brankovic (8): target/ppc: Optimize

[Qemu-devel] [PATCH v3 1/8] target/ppc: Optimize emulation of lvsl and lvsr instructions

2019-06-21 Thread Stefan Brankovic
obtained is placed in lower doubleword element of vD. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 18 - target/ppc/translate/vmx-impl.inc.c | 129 +++- 3 files changed, 97 insertions(+), 52

[Qemu-devel] [PATCH v3 3/8] target/ppc: Optimize emulation of vpkpx instruction

2019-06-21 Thread Stefan Brankovic
iterations, 1 for each pixel) and save result in tmp variable. In the end of outer for loop, the result is merged in variable called result and saved in appropriate doubleword element of vD if the whole doubleword is finished(every second iteration). The outer loop has 4 iterations. Signed-off-by: Stefan

[Qemu-devel] [PATCH v3 2/8] target/ppc: Optimize emulation of vsl and vsr instructions

2019-06-21 Thread Stefan Brankovic
higher doubleword element, shift operation is performed on lower doubleword element of vA, with replacement of highest sh bits(that are now 0) with bits saved in shifted. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 35

[Qemu-devel] [PATCH v3 6/8] target/ppc: Optimize emulation of vclzw instruction

2019-06-21 Thread Stefan Brankovic
(one for each word element of source register vB). Every iteration consists of loading the appropriate word element from the source register, counting leading zeros with tcg_gen_clzi_i32, and saving the result in the appropriate word element of the destination register. Signed-off-by: Stefan Brankovic --- target

[Qemu-devel] [PATCH 6/8] target/ppc: Optimize emulation of vclzw instruction

2019-06-19 Thread Stefan Brankovic
(one for each word element of source register vB). Every iteration consists of loading the appropriate word element from the source register, counting leading zeros with tcg_gen_clzi_i32, and saving the result in the appropriate word element of the destination register. Signed-off-by: Stefan Brankovic --- target

[Qemu-devel] [PATCH 0/8] target/ppc: Optimize emulation of some Altivec instructions

2019-06-19 Thread Stefan Brankovic
is presented in this series. The performance improvements are significant in all cases. V2: Addressed Richard Henderson's suggestions. Fixed problem during build on patch 2/8. Rebased series to the latest qemu code. Stefan Brankovic (8): target/ppc: Optimize emulation of lvsl and lvsr

[Qemu-devel] [PATCH 7/8] target/ppc: Optimize emulation of vclzh and vclzb instructions

2019-06-19 Thread Stefan Brankovic
-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 9 --- target/ppc/translate/vmx-impl.inc.c | 122 +++- 3 files changed, 120 insertions(+), 13 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc

[Qemu-devel] [PATCH 2/8] target/ppc: Optimize emulation of vsl and vsr instructions

2019-06-19 Thread Stefan Brankovic
higher doubleword element, shift operation is performed on lower doubleword element of vA, with replacement of highest sh bits(that are now 0) with bits saved in shifted. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 35

[Qemu-devel] [PATCH 4/8] target/ppc: Optimize emulation of vgbbd instruction

2019-06-19 Thread Stefan Brankovic
doubleword element of destination register vD. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 1 - target/ppc/int_helper.c | 276 target/ppc/translate/vmx-impl.inc.c | 77 +- 3 files changed, 76 insertions(+), 278

[Qemu-devel] [PATCH 3/8] target/ppc: Optimize emulation of vpkpx instruction

2019-06-19 Thread Stefan Brankovic
iterations, 1 for each pixel) and save result in tmp variable. In the end of outer for loop, the result is merged in variable called result and saved in appropriate doubleword element of vD if the whole doubleword is finished(every second iteration). The outer loop has 4 iterations. Signed-off-by: Stefan

[Qemu-devel] [PATCH 8/8] target/ppc: Refactor emulation of vmrgew and vmrgow instructions

2019-06-19 Thread Stefan Brankovic
, and second one with a helper. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 66 + 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c index

[Qemu-devel] [PATCH 1/8] target/ppc: Optimize emulation of lvsl and lvsr instructions

2019-06-19 Thread Stefan Brankovic
obtained is placed in lower doubleword element of vD. Signed-off-by: Stefan Brankovic --- target/ppc/helper.h | 2 - target/ppc/int_helper.c | 18 -- target/ppc/translate/vmx-impl.inc.c | 120 ++-- 3 files changed, 88 insertions

[Qemu-devel] [PATCH 5/8] target/ppc: Optimize emulation of vclzd instruction

2019-06-19 Thread Stefan Brankovic
instruction two times(once for each doubleword element of source register vB) and placing result in appropriate doubleword element of destination register vD. Signed-off-by: Stefan Brankovic Reviewed-by: Richard Henderson --- target/ppc/helper.h | 1 - target/ppc/int_helper.c

Re: [Qemu-devel] [PATCH 4/8] target/ppc: Optimize emulation of vgbbd instruction

2019-06-17 Thread Stefan Brankovic
On 6.6.19. 20:19, Richard Henderson wrote: On 6/6/19 5:15 AM, Stefan Brankovic wrote: Optimize altivec instruction vgbbd (Vector Gather Bits by Bytes by Doubleword) All ith bits (i in range 1 to 8) of each byte of doubleword element in source register are concatenated and placed into ith byte
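The gather described in this snippet amounts to transposing an 8x8 bit matrix within each doubleword. A minimal plain-C sketch follows (not the TCG code from the patch). It numbers bytes and bits from the most significant end, which is an assumption; the result is the same with the opposite numbering as long as it is applied consistently. The name gather_bits_by_bytes is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical model of the per-doubleword step: bit i of source byte j
 * becomes bit j of result byte i (an 8x8 bit-matrix transpose).
 */
uint64_t gather_bits_by_bytes(uint64_t src)
{
    uint64_t result = 0;

    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++) {
            /* Bit i of byte j, counting both from the most significant end. */
            uint64_t bit = (src >> (63 - (8 * j + i))) & 1;
            /* Store it as bit j of byte i. */
            result |= bit << (63 - (8 * i + j));
        }
    }
    return result;
}
```

Applying the function twice returns the original value, which is a convenient sanity check for any optimized variant.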

Re: [Qemu-devel] [PATCH 6/8] target/ppc: Optimize emulation of vclzw instruction

2019-06-17 Thread Stefan Brankovic
On 6.6.19. 20:34, Richard Henderson wrote: On 6/6/19 5:15 AM, Stefan Brankovic wrote: +for (i = 0; i < 2; i++) { +if (i == 0) { +/* Get high doubleword element of vB in avr. */ +get_avr64(avr, VB, true); +} else { +/* Get low doublew

Re: [Qemu-devel] [PATCH 8/8] target/ppc: Refactor emulation of vmrgew and vmrgow instructions

2019-06-17 Thread Stefan Brankovic
On 6.6.19. 22:43, Richard Henderson wrote: On 6/6/19 5:15 AM, Stefan Brankovic wrote: +/* + * We use this macro if one instruction is realized with direct + * translation, and second one with helper. + */ +#define GEN_VXFORM_TRANS_DUAL(name0, flg0, flg2_0, name1, flg1, flg2_1)\ +static void

Re: [Qemu-devel] [PATCH 7/8] target/ppc: Optimize emulation of vclzh and vclzb instructions

2019-06-17 Thread Stefan Brankovic
On 6.6.19. 22:38, Richard Henderson wrote: On 6/6/19 5:15 AM, Stefan Brankovic wrote: Optimize Altivec instruction vclzh (Vector Count Leading Zeros Halfword). This instruction counts the number of leading zeros of each halfword element in source register and places result in the appropriate

Re: [Qemu-devel] [PATCH 2/8] target/ppc: Optimize emulation of vsl and vsr instructions

2019-06-17 Thread Stefan Brankovic
On 6.6.19. 19:03, Richard Henderson wrote: On 6/6/19 5:15 AM, Stefan Brankovic wrote: +tcg_gen_subi_i64(tmp, sh, 64); +tcg_gen_neg_i64(tmp, tmp); Better as tcg_gen_subfi_i64(tmp, 64, sh); I was aware there must be way of doing it in a single tcg invocation, but couldn't find
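The review suggestion above rests on the identity -(sh - 64) = 64 - sh, so a subtract followed by a negate can be folded into a single "subtract from immediate". The plain-C sketch below checks that identity on 64-bit wrap-around arithmetic; it does not call TCG, and the function names are hypothetical.

```c
#include <stdint.h>

/* Two-step form: tmp = sh - 64, then tmp = -tmp. */
uint64_t sub_then_neg(uint64_t sh)
{
    uint64_t tmp = sh - 64;

    tmp = (uint64_t)0 - tmp;
    return tmp;
}

/* One-step form: tmp = 64 - sh. */
uint64_t sub_from_imm(uint64_t sh)
{
    return 64 - sh;
}
```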

Re: [Qemu-devel] [PATCH 0/8] Optimize emulation of ten Altivec instructions: lvsl,

2019-06-17 Thread Stefan Brankovic
On 6.6.19. 19:13, Richard Henderson wrote: On 6/6/19 5:15 AM, Stefan Brankovic wrote: Stefan Brankovic (8): target/ppc: Optimize emulation of lvsl and lvsr instructions target/ppc: Optimize emulation of vsl and vsr instructions target/ppc: Optimize emulation of vpkpx instruction

Re: [Qemu-devel] [PATCH 1/8] target/ppc: Optimize emulation of lvsl and lvsr instructions

2019-06-17 Thread Stefan Brankovic
On 6.6.19. 18:46, Richard Henderson wrote: On 6/6/19 5:15 AM, Stefan Brankovic wrote: +tcg_gen_addi_i64(result, sh, 7); +for (i = 7; i >= 1; i--) { +tcg_gen_shli_i64(tmp, sh, i * 8); +tcg_gen_or_i64(result, result, tmp); +tcg_gen_addi_i64(sh, sh

Re: [Qemu-devel] [PATCH 0/8] Optimize emulation of ten Altivec instructions: lvsl,

2019-06-12 Thread Stefan Brankovic
> > > Original Message > Subject: Re: [Qemu-devel] [PATCH 0/8] Optimize emulation of ten Altivec > instructions: lvsl, > Date: Thursday, June 6, 2019 19:13 CEST > From: Richard Henderson > To: Stefan Brankovic , qemu-devel@nongnu.org > CC:

[Qemu-devel] [PATCH 3/8] target/ppc: Optimize emulation of vpkpx instruction

2019-06-06 Thread Stefan Brankovic
. In the end of outer for loop, we merge result in variable called result and save it in appropriate doubleword element of vD if whole doubleword is finished(every second iteration). Outer loop has 4 iterations. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 93

[Qemu-devel] [PATCH 8/8] target/ppc: Refactor emulation of vmrgew and vmrgow instructions

2019-06-06 Thread Stefan Brankovic
, and second one with helper. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 62 - 1 file changed, 33 insertions(+), 29 deletions(-) diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c index 8535a31

[Qemu-devel] [PATCH 4/8] target/ppc: Optimize emulation of vgbbd instruction

2019-06-06 Thread Stefan Brankovic
and result2 is placed in appropriate doubleword element of vD. We repeat this 2 times. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 99 - 1 file changed, 98 insertions(+), 1 deletion(-) diff --git a/target/ppc/translate/vmx-impl.inc.c b

[Qemu-devel] [PATCH 6/8] target/ppc: Optimize emulation of vclzw instruction

2019-06-06 Thread Stefan Brankovic
. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 57 - 1 file changed, 56 insertions(+), 1 deletion(-) diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c index 1c34908..7689739 100644 --- a/target/ppc

[Qemu-devel] [PATCH 5/8] target/ppc: Optimize emulation of vclzd instruction

2019-06-06 Thread Stefan Brankovic
instruction two times(once for each doubleword element of source register vB) and placing result in appropriate doubleword element of destination register vD. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 28 +++- 1 file changed, 27 insertions(+), 1

[Qemu-devel] [PATCH 0/8] Optimize emulation of ten Altivec instructions: lvsl,

2019-06-06 Thread Stefan Brankovic
to ppc platform, so relatively complex TCG translation (without direct mapping to host instruction that is not possible in these cases) seems to be the best option, and that approach is presented in this series. The performance improvements are significant in all cases. Stefan Brankovic (8): target

[Qemu-devel] [PATCH 1/8] target/ppc: Optimize emulation of lvsl and lvsr instructions

2019-06-06 Thread Stefan Brankovic
it in appropriate byte of variable result) and save them in higher doubleword element of vD. We repeat this once again for lower doubleword element of vD by creating bytes (24-sh):(32-sh) in a for loop and saving result. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 143

[Qemu-devel] [PATCH 7/8] target/ppc: Optimize emulation of vclzh and vclzb instructions

2019-06-06 Thread Stefan Brankovic
result in appropriate doubleword element of destination register vD. We repeat this once again for lower doubleword element of vB. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 122 +++- 1 file changed, 120 insertions(+), 2 deletions(-) diff

[Qemu-devel] [PATCH 2/8] target/ppc: Optimize emulation of vsl and vsr instructions

2019-06-06 Thread Stefan Brankovic
element of vA and replace highest sh bits(that are now 0) with bits saved in shifted. Signed-off-by: Stefan Brankovic --- target/ppc/translate/vmx-impl.inc.c | 101 +++- 1 file changed, 99 insertions(+), 2 deletions(-) diff --git a/target/ppc/translate/vmx-impl.inc.c

[Qemu-devel] [PATCH 1/2] target/tilegx: Implement emulation of TILEGX instructions V1CMPLEU and V1CMPLTU

2019-03-11 Thread Stefan Brankovic
Implement emulation of TILEGX instruction V1CMPLEU and V1CMPLTU using TCG front end operations. Signed-off-by: Stefan Brankovic --- target/tilegx/translate.c | 62 --- 1 file changed, 58 insertions(+), 4 deletions(-) diff --git a/target/tilegx
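As a reference model for what these instructions compute (plain C, not the TCG front-end code from the patch), the sketch below compares two 64-bit registers byte by byte as unsigned values. It assumes that each result byte is set to 1 when the comparison holds and to 0 otherwise; the function names are hypothetical.

```c
#include <stdint.h>

/* Byte-wise unsigned "less than or equal": each result byte is 1 or 0. */
uint64_t bytes_cmp_leu(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t ab = (uint8_t)(a >> (8 * i));
        uint8_t bb = (uint8_t)(b >> (8 * i));

        if (ab <= bb) {
            result |= (uint64_t)1 << (8 * i);
        }
    }
    return result;
}

/* Byte-wise unsigned "less than": each result byte is 1 or 0. */
uint64_t bytes_cmp_ltu(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t ab = (uint8_t)(a >> (8 * i));
        uint8_t bb = (uint8_t)(b >> (8 * i));

        if (ab < bb) {
            result |= (uint64_t)1 << (8 * i);
        }
    }
    return result;
}
```

A model like this can serve as an oracle when testing a translated implementation against random inputs.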

[Qemu-devel] [PATCH 0/2] Add support for some comparison instructions

2019-03-11 Thread Stefan Brankovic
Implement emulation of TILE-Gx instructions V1CMPLEU, V1CMPLTU, V2CMPLEU, and V2CMPLTU. Stefan Brankovic (2): target/tilegx: Implement emulation of TILEGX instructions V1CMPLEU and V1CMPLTU target/tilegx: Implement emulation of TILEGX instructions V2CMPLEU and V2CMPLTU target/tilegx

[Qemu-devel] [PATCH 2/2] target/tilegx: Implement emulation of TILEGX instructions V2CMPLEU and V2CMPLTU

2019-03-11 Thread Stefan Brankovic
Implement emulation of TILEGX instruction V2CMPLEU and V2CMPLTU using TCG front end operations. Signed-off-by: Stefan Brankovic --- target/tilegx/translate.c | 62 --- 1 file changed, 58 insertions(+), 4 deletions(-) diff --git a/target/tilegx