Re: [Qemu-devel] [PATCH v6 1/3] target/ppc: Optimize emulation of vpkpx instruction

2019-08-29 Thread Richard Henderson
On 8/29/19 6:34 AM, Stefan Brankovic wrote: > Then I ran my performance tests and got the following results (the test calls > vpkpx 10 times): > > 1) Current helper implementation: ~157 ms > > 2) Helper implementation you suggested: ~94 ms > > 3) TCG implementation: ~75 ms I assume you

Re: [Qemu-devel] [PATCH v6 1/3] target/ppc: Optimize emulation of vpkpx instruction

2019-08-29 Thread Stefan Brankovic
On 27.8.19. 20:52, Richard Henderson wrote: On 8/27/19 2:37 AM, Stefan Brankovic wrote: +for (i = 0; i < 4; i++) { +switch (i) { +case 0: +/* + * Get high doubleword of vA to perform 6-5-5 pack of pixels + * 1 and 2. + */ +

Re: [Qemu-devel] [PATCH v6 1/3] target/ppc: Optimize emulation of vpkpx instruction

2019-08-27 Thread BALATON Zoltan
On Tue, 27 Aug 2019, Richard Henderson wrote: On 8/27/19 2:37 AM, Stefan Brankovic wrote: +for (i = 0; i < 4; i++) { +switch (i) { +case 0: +/* + * Get high doubleword of vA to perform 6-5-5 pack of pixels + * 1 and 2. + */ +

Re: [Qemu-devel] [PATCH v6 1/3] target/ppc: Optimize emulation of vpkpx instruction

2019-08-27 Thread Richard Henderson
On 8/27/19 2:37 AM, Stefan Brankovic wrote: > +for (i = 0; i < 4; i++) { > +switch (i) { > +case 0: > +/* > + * Get high doubleword of vA to perform 6-5-5 pack of pixels > + * 1 and 2. > + */ > +get_avr64(avr, VA,

[Qemu-devel] [PATCH v6 1/3] target/ppc: Optimize emulation of vpkpx instruction

2019-08-27 Thread Stefan Brankovic
Optimize the Altivec instruction vpkpx (Vector Pack Pixel). It rearranges 8 pixels coded in a 6-5-5 pattern (4 from each source register) into a contiguous array of bits in the destination register. In each iteration of the outer loop, the 6-5-5 pack is done for 2 pixels of each
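
For readers who want to see what the 6-5-5 pack amounts to per pixel, here is a minimal scalar sketch in C. It is not the TCG code from this patch, only a plain reference for the bit selection: from each 32-bit pixel it keeps 1 + 5 bits from the upper half, then 5 bits from each of the two low bytes. The names pack_pixel_655 and pack_pixels_655 are made up for illustration, and the sketch assumes index 0 of each array is the leftmost element of the vector register.

```c
#include <stdint.h>

/*
 * Pack one 32-bit pixel into 16 bits (hypothetical helper name).
 * Result layout, most significant bit first:
 *   6 bits: source bit 24 followed by source bits 23..19
 *   5 bits: source bits 15..11
 *   5 bits: source bits 7..3
 */
static uint16_t pack_pixel_655(uint32_t e)
{
    return (uint16_t)(((e >> 9) & 0xfc00) |
                      ((e >> 6) & 0x03e0) |
                      ((e >> 3) & 0x001f));
}

/*
 * Pack 8 pixels: the 4 from va land in out[0..3], the 4 from vb in
 * out[4..7]. Element order is an assumption of this sketch (index 0 is
 * taken to be the leftmost element of the register).
 */
static void pack_pixels_655(const uint32_t va[4], const uint32_t vb[4],
                            uint16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i] = pack_pixel_655(va[i]);
        out[i + 4] = pack_pixel_655(vb[i]);
    }
}
```

A scalar version like this can serve as a cross-check when comparing the output of a helper-based and a TCG-based implementation on the same inputs.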