There are important differences in the ABI between 32-bit and 64-bit MIPS,
since saved registers or passed values can take twice as much stack space
on 64-bit targets. This patch adds a mechanism that allows the optimizations
to be run only on 32-bit platforms, since all of them are written in assembly.
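As a rough illustration of the stack-space difference, the sketch below computes the size of a register save area for 4-byte (32-bit) and 8-byte (64-bit) registers, and gates hand-written assembly on a 4-byte register width. All names are hypothetical and this is not necessarily how the patch implements its mechanism; the compile-time variant assumes GCC's predefined `__mips__` and `__mips64` macros.

```c
#include <stddef.h>

/* Bytes needed to save nregs registers of reg_bytes each: assembly that
 * saves 8 registers needs 32 bytes with 4-byte registers but 64 bytes
 * with 8-byte registers, so hard-coded stack offsets no longer fit. */
static size_t
sketch_save_area_bytes (size_t nregs, size_t reg_bytes)
{
    return nregs * reg_bytes;
}

/* Hypothetical run-time gate: only accept the assembly fast paths when
 * registers (and hence save slots) are 4 bytes wide. */
static int
sketch_asm_fast_paths_allowed (size_t reg_bytes)
{
    return reg_bytes == 4;
}

/* Hypothetical compile-time variant: GCC defines __mips__ for all MIPS
 * targets and __mips64 when generating code for a 64-bit ISA. */
#if defined(__mips__) && !defined(__mips64)
#  define SKETCH_MIPS32_ASM 1
#else
#  define SKETCH_MIPS32_ASM 0
#endif
```

A 64-bit build would then simply fall back to the generic C paths.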
---
pixman/pixman-mips.c |4
1 files
---
test/lowlevel-blt-bench.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c
index 3da094a..9d7fc3f 100644
--- a/test/lowlevel-blt-bench.c
+++ b/test/lowlevel-blt-bench.c
@@ -647,6 +647,7 @@ tests_tbl[] =
{ "src_8
Some of the optimizations introduced in previous DSPr2 commits were not DSPr2
specific. Some of the fast-paths didn't use DSPr2 instructions at all, and
instead utilized the more generic MIPS32r2 instruction set or the previous
version of the DSP instruction set (DSPr1).
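A minimal sketch of why such fast-paths need not be gated on DSPr2: if each fast-path records the lowest instruction-set level it actually uses, it can be enabled on any core whose detected level is at least that high. The enum, struct and function names are hypothetical, not pixman's API, and the ordering assumes that DSPr2 includes DSPr1, which in turn builds on MIPS32r2.

```c
/* Hypothetical feature levels, ordered so that each level includes the
 * instructions of the levels below it. */
typedef enum
{
    SKETCH_ISA_NONE = 0,
    SKETCH_ISA_MIPS32R2,
    SKETCH_ISA_DSPR1,
    SKETCH_ISA_DSPR2
} sketch_isa_level_t;

typedef struct
{
    const char        *name;
    sketch_isa_level_t required;  /* lowest level whose instructions it uses */
} sketch_fast_path_t;

/* A fast-path is usable when the detected level covers what it needs. */
static int
sketch_fast_path_usable (const sketch_fast_path_t *fp,
                         sketch_isa_level_t        detected)
{
    return detected >= fp->required;
}
```

With this tagging, a fast-path marked `SKETCH_ISA_MIPS32R2` is accepted on a DSPr1-only core instead of being limited to DSPr2 cores.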
Since Pixman's run
The 'isa' field (mips32r2) is available from kernel version 3.9.
The 'ASEs implemented' field (dsp, dsp2) is available from 3.7.
In older kernel versions, "dsp" represents both DSPr1 and DSPr2.
If the kernel version is 3.7 or above, runtime detection tries
to find 'dsp2' in /proc/cpuinfo. If it fails or if kernel
ver
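A minimal sketch of the 'ASEs implemented' part of that detection, working on the text of /proc/cpuinfo already read into a string so it can be exercised without MIPS hardware. The function name is hypothetical and this is not pixman's code. It matches whole space- or tab-separated tokens so that 'dsp2' is never mistaken for 'dsp', and it leaves out the handling of kernels older than 3.7, where a bare "dsp" does not distinguish DSPr1 from DSPr2.

```c
#include <string.h>

/* Hypothetical helper: inspect the "ASEs implemented" line of a
 * /proc/cpuinfo dump. Returns 2 if it lists dsp2, 1 if it lists dsp
 * but not dsp2, and 0 otherwise (including when the line is absent). */
static int
sketch_ases_dsp_level (const char *cpuinfo)
{
    static const char key[] = "ASEs implemented";
    const char *line = cpuinfo;

    while (line != NULL && *line != '\0')
    {
        const char *nl  = strchr (line, '\n');
        const char *eol = nl ? nl : line + strlen (line);

        if (strncmp (line, key, sizeof key - 1) == 0)
        {
            const char *p = memchr (line, ':', (size_t) (eol - line));
            int level = 0;

            if (p == NULL)
                return 0;

            /* Walk whole tokens after the colon. */
            for (p++; p < eol; )
            {
                size_t n = 0;

                while (p < eol && (*p == ' ' || *p == '\t'))
                    p++;
                while (p + n < eol && p[n] != ' ' && p[n] != '\t')
                    n++;

                if (n == 4 && strncmp (p, "dsp2", 4) == 0)
                    return 2;
                if (n == 3 && strncmp (p, "dsp", 3) == 0)
                    level = 1;
                p += n;
            }
            return level;
        }
        line = nl ? nl + 1 : NULL;
    }
    return 0;
}
```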
Some of the optimizations introduced in previous dspr2 commits, similar to the
previous patch, were not dspr2 specific and utilized dspr1 instructions only.
Since Pixman's run-time CPU detection only added dspr2 fast-paths on 74K MIPS
cores, these optimizations couldn't be used on cores that don't supp
A pointer to a function (memcpy) is added to pixman_implementation_t,
and it points to the C version of memcpy (linked in
pixman-general.c). The function to call is pixman_memcpy, and
every call of memcpy is replaced with pixman_memcpy.
If there is an optimized version of memcpy, it should
be linked with imp->memcpy.
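A minimal sketch of that indirection. The struct below is a cut-down, hypothetical stand-in for pixman_implementation_t that holds only the hook, and the signatures of the hook and of the wrapper (here `sketch_pixman_memcpy`, taking the implementation explicitly) are assumptions, not pixman's actual declarations.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for the memcpy hook. */
typedef void *(*sketch_memcpy_t) (void *dst, const void *src, size_t n);

/* Cut-down stand-in for pixman_implementation_t: only the memcpy hook. */
typedef struct
{
    sketch_memcpy_t memcpy;
} sketch_implementation_t;

/* Generic C version, as the general implementation would install. */
static void *
sketch_memcpy_c (void *dst, const void *src, size_t n)
{
    return memcpy (dst, src, n);
}

static void
sketch_general_init (sketch_implementation_t *imp)
{
    imp->memcpy = sketch_memcpy_c;
}

/* Every former memcpy () call goes through this wrapper instead. The
 * parentheses around imp->memcpy keep a function-like memcpy macro, if
 * the C library defines one, from being expanded here. */
static void *
sketch_pixman_memcpy (sketch_implementation_t *imp,
                      void *dst, const void *src, size_t n)
{
    return (imp->memcpy) (dst, src, n);
}
```

An optimized backend would then overwrite `imp->memcpy` with its own routine after the general initialization, and no call site needs to change.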
-
Hi Siarhei, Soren,
Please take a look at the latest patch set for MIPS. Sorry for the slow response.
This patch set also includes the fix for this issue (it is the first commit),
and it is now on top of the current source tree.
Thanks,
Nemanja Lukic
-----Original Message-----
From: Siarhei Siamas
configure.ac - added compiler check for dspr1
Makefile.am - added files for dspr1 support
pixman-mips.c - runtime detection extended to
support dspr1 (searches dspr1 cores
or 'dsp' in ASEs implemented (for
kernels >= 3.7))
pixman-mips-dspr1.c - adde
configure.ac - added compiler check for mips32r2
Makefile.am - added files for mips32r2 support
pixman-mips.c - runtime detection extended to
support mips32r2 (searches mips32r2 cores
or 'mips32r2' in ASEs implemented (for
kernels >= 3.7))
pixman-mi
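The PrepareForStore prefetch is destructive and affects the whole
cache line: the line is allocated without its contents being read from
memory, so any bytes in it that the code does not then overwrite may be lost.
Running code which assumes a 32-byte cache line size on a system
with 64-byte cache lines may therefore cause data corruption.
Added a mechanism to allow prefetch only if the cache line size is 32.
Added no_prefetch versions of the functions.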
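A minimal sketch of the selection step, assuming the cache line size has already been determined somehow (how it is obtained is not shown here). Both routines are hypothetical plain-C stand-ins: in the real code the prefetching variant would be assembly that issues PrepareForStore for each 32-byte block before writing it, which is exactly what becomes unsafe with 64-byte lines.

```c
#include <string.h>

typedef void (*sketch_fill_fn) (unsigned char *dst, unsigned char value,
                                size_t len);

/* Stand-in for an assembly routine that would use PrepareForStore
 * prefetches sized for 32-byte lines; here it is just memset. */
static void
sketch_fill_prefetch (unsigned char *dst, unsigned char value, size_t len)
{
    memset (dst, value, len);
}

/* The same operation with no destructive prefetch at all. */
static void
sketch_fill_no_prefetch (unsigned char *dst, unsigned char value, size_t len)
{
    memset (dst, value, len);
}

/* Use the prefetching version only when the line size matches the
 * 32 bytes its prefetch pattern was written for. */
static sketch_fill_fn
sketch_select_fill (int cache_line_size)
{
    return cache_line_size == 32 ? sketch_fill_prefetch
                                 : sketch_fill_no_prefetch;
}
```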
Performance numbers before/after on MIPS-24kc @ 500 MHz
Reference (before):
src_n_0565= L1: 117.24 L2: 110.68 M:115.83 ( 96.31%) HT: 78.96 VT: 75.03 R: 65.98 RT: 24.94 ( 164Kops/s)
Optimized (with these optimizations):
src_n_0565= L1: 429.43 L2: 299.39 M:346.21 (287.61
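Taking the ratio of the optimized to the reference figures for the columns that are present in both rows gives the speedup on this MIPS-24kc:

```latex
\frac{429.43}{117.24} \approx 3.66 \;(\text{L1}), \qquad
\frac{299.39}{110.68} \approx 2.71 \;(\text{L2}), \qquad
\frac{346.21}{115.83} \approx 2.99 \;(\text{M})
```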
The build restriction wasn't good, since it required '-mips32r2'
in CFLAGS during configuration to enable the DSPr2 optimizations.
Additional CFLAGS are no longer needed, and pixman can be built
targeting the lowest common denominator.
Architecture and ISA are set in inline assembler
to allow the compiler to build t
---
pixman/pixman-mips-dspr2-asm.S |2 +-
pixman/pixman-mips-dspr2-asm.h |2 +-
pixman/pixman-mips-dspr2.c |2 +-
pixman/pixman-mips-dspr2.h |2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/pixman/pixman-mips-dspr2-asm.S b/pixman/pixman-mips-dspr2-asm.S
---
pixman/pixman-mips-dspr2-asm.h |3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/pixman/pixman-mips-dspr2-asm.h b/pixman/pixman-mips-dspr2-asm.h
index cab122d..11849bd 100644
--- a/pixman/pixman-mips-dspr2-asm.h
+++ b/pixman/pixman-mips-dspr2-asm.h
@@ -72,7 +72,10 @@
[...]
> > as, at least on the platforms that I tried, comparisons on vectors yield
> > (unsigned)-1 as representation of "true."
>
> Yes, this behaves exactly as documented at:
>
> https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
>
> "Vectors are compared element-wise producing 0 wh
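A small illustration of that documented behaviour using GCC's vector extensions (also accepted by Clang): each lane of a vector comparison is all ones (-1) when true and 0 when false, so the result can be used directly as a bit mask. The type and function names are hypothetical.

```c
/* Four 32-bit ints in a 16-byte vector (GCC vector extension). */
typedef int sketch_v4si __attribute__ ((vector_size (16)));

/* Element-wise a < b: each lane is -1 (all bits set) or 0. */
static sketch_v4si
sketch_lt_mask (sketch_v4si a, sketch_v4si b)
{
    return a < b;
}

/* Element-wise minimum built from the mask: lanes where the mask is all
 * ones take a, lanes where it is zero take b. */
static sketch_v4si
sketch_min (sketch_v4si a, sketch_v4si b)
{
    sketch_v4si m = sketch_lt_mask (a, b);

    return (a & m) | (b & ~m);
}
```

Because "true" is all ones rather than 1, the mask can be ANDed with data without any extra negation or shift.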