Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Steven Newbury
On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
 On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
 siavashser...@gmail.com wrote:
  Then I do recommend removing the if (cpu_has_sse4_1) from this 
  patch and similar places, because there is no runtime CPU 
  dispatching happening for SSE optimized code paths in action and 
  just adds extra overhead (unnecessary branches) to the generated 
  code.
 
 No. Sorry, I realize I misread your previous question:
 
   I guess checking for cpu_has_sse4_1 is unnecessary if it isn't 
   controllable by user at runtime; because USE_SSE41 is a 
   compile time check and requires the target machine to be SSE 4.1 
   capable already.
 
 USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you 
 to build the code and then use it only on systems that actually 
 support it.
 
 All of this could have been pretty easily answered by a few greps 
 though...

I wonder what difference it would make to have an option to compile 
out the run-time check code, avoiding the additional overhead in cases 
where the builder *knows* at compile time what the run-time system is 
(e.g. Gentoo)?


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Timothy Arceri
On Fri, 2014-11-07 at 11:44 +, Steven Newbury wrote:
 On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
  On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
  siavashser...@gmail.com wrote:
   Then I do recommend removing the if (cpu_has_sse4_1) from this 
   patch and similar places, because there is no runtime CPU 
   dispatching happening for SSE optimized code paths in action and 
   just adds extra overhead (unnecessary branches) to the generated 
   code.
  
  No. Sorry, I realize I misread your previous question:
  
    I guess checking for cpu_has_sse4_1 is unnecessary if it isn't 
    controllable by user at runtime; because USE_SSE41 is a 
    compile time check and requires the target machine to be SSE 4.1 
    capable already.
  
  USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you 
  to build the code and then use it only on systems that actually 
  support it.
  
  All of this could have been pretty easily answered by a few greps 
  though...
 
 I wonder what difference it would make to have an option to compile 
 out the run-time check code to avoid the additional overhead in cases 
 where the builder *knows* at compile time what the run-time system is? 
 (ie Gentoo)

As long as the check is placed in the right location, i.e. just outside
the hotspot rather than inside it, it shouldn't really make a noticeable
difference.
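To make the placement concrete, here is a minimal sketch that is not taken from Mesa: have_sse41, max_plain, max_fast and array_max are all hypothetical names, and max_fast is only a scalar placeholder for a vectorised kernel. The feature test runs once per call, so the per-element loops contain no feature branch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a run-time CPU feature flag. */
int have_sse41 = 0;

/* Plain reference loop. */
static unsigned max_plain(const unsigned *v, size_t n)
{
   unsigned m = 0;
   for (size_t i = 0; i < n; i++)
      if (v[i] > m)
         m = v[i];
   return m;
}

/* Placeholder for an SSE kernel: unrolled by two, same result as max_plain. */
static unsigned max_fast(const unsigned *v, size_t n)
{
   unsigned m = 0;
   size_t i = 0;
   for (; i + 1 < n; i += 2) {
      unsigned a = v[i] > v[i + 1] ? v[i] : v[i + 1];
      if (a > m)
         m = a;
   }
   if (i < n && v[i] > m)
      m = v[i];
   return m;
}

/* The feature check sits just outside the hot loops: one branch per call. */
unsigned array_max(const unsigned *v, size_t n)
{
   return have_sse41 ? max_fast(v, n) : max_plain(v, n);
}
```

With this shape the cost of the check is independent of the array length, which is why it should not show up in a profile.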

What will have more impact is not being able to inline certain code,
such as in the latest patchset I sent out. It seems this is another
side effect of the way gcc handles intrinsics.




Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Siavash Eliasi


On 11/07/2014 03:14 PM, Steven Newbury wrote:
 On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
  On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
  siavashser...@gmail.com wrote:
   Then I do recommend removing the if (cpu_has_sse4_1) from this
   patch and similar places, because there is no runtime CPU
   dispatching happening for SSE optimized code paths in action and
   just adds extra overhead (unnecessary branches) to the generated
   code.
  
  No. Sorry, I realize I misread your previous question:
  
    I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
    controllable by user at runtime; because USE_SSE41 is a
    compile time check and requires the target machine to be SSE 4.1
    capable already.
  
  USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
  to build the code and then use it only on systems that actually
  support it.
  
  All of this could have been pretty easily answered by a few greps
  though...
 
 I wonder what difference it would make to have an option to compile
 out the run-time check code to avoid the additional overhead in cases
 where the builder *knows* at compile time what the run-time system is?
 (ie Gentoo)

I think that's possible. Since cpu_has_sse4_1 and friends are simply 
macros, they can be defined as true or 1 at compile time when building 
for an SSE 4.1 capable target, so the compiler can get rid of the 
unnecessary runtime check entirely.


I guess common_x86_features.h should be modified to something like this:

#ifdef __SSE4_1__
#define cpu_has_sse4_1 1
#else
#define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
#endif
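
As a rough, self-contained illustration of the idea (the feature bit value and the two path_* helpers are assumptions made up for this sketch, and _mesa_x86_cpu_features is redeclared here only as a stand-in for the real variable): when the build defines __SSE4_1__, the condition is the constant 1 and the compiler is free to drop the generic branch.

```c
/* Stand-ins for illustration; the real definitions live in Mesa's x86 code. */
#define X86_FEATURE_SSE4_1 (1 << 0)   /* bit value is an assumption */
int _mesa_x86_cpu_features = 0;

#ifdef __SSE4_1__
/* Whole build targets SSE 4.1 (e.g. -msse4.1 or -march=native): constant. */
#define cpu_has_sse4_1 1
#else
/* Otherwise fall back to the flag filled in at startup. */
#define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
#endif

/* Hypothetical code paths, distinguished only by their return value. */
static int path_sse41(void)   { return 41; }
static int path_generic(void) { return 0; }

int pick_path(void)
{
   if (cpu_has_sse4_1)
      return path_sse41();
   return path_generic();
}
```

Call sites keep the same if (cpu_has_sse4_1) spelling in both configurations, so nothing outside the header would need to change.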


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread steve
On Fri Nov 7 14:09:09 2014 GMT, Siavash Eliasi wrote:
 
 On 11/07/2014 03:14 PM, Steven Newbury wrote:
  On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
  On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
  siavashser...@gmail.com wrote:
  Then I do recommend removing the if (cpu_has_sse4_1) from this
  patch and similar places, because there is no runtime CPU
  dispatching happening for SSE optimized code paths in action and
  just adds extra overhead (unnecessary branches) to the generated
  code.
  No. Sorry, I realize I misread your previous question:
 
  I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
  controllable by user at runtime; because USE_SSE41 is a
  compile time check and requires the target machine to be SSE 4.1
  capable already.
  USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
  to build the code and then use it only on systems that actually
  support it.
 
  All of this could have been pretty easily answered by a few greps
  though...
  I wonder what difference it would make to have an option to compile
  out the run-time check code to avoid the additional overhead in cases
  where the builder *knows* at compile time what the run-time system is?
  (ie Gentoo)
 I think that's possible. Since cpu_has_sse4_1 and friends are simply 
 macros, one can set them to true or 1 during compile time if it's 
 going to be built for an SSE 4.1 capable target so your smart compiler 
 will totally get rid of the unnecessary runtime check.
 
 I guess common_x86_features.h should be modified to something like this:
 
 #ifdef __SSE4_1__
 #define cpu_has_sse4_1 1
 #else
 #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
 #endif

Yes, this is what I was thinking. Then perhaps add an option for disabling 
run-time detection, with the available CPU features determined during 
configuration and the appropriate defines set accordingly.

Whether it's worth it, I don't know. I can imagine the compiler having an 
easier job optimizing the code.


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Ian Romanick
On 11/07/2014 06:09 AM, Siavash Eliasi wrote:
 
 On 11/07/2014 03:14 PM, Steven Newbury wrote:
 On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
 On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
 siavashser...@gmail.com wrote:
 Then I do recommend removing the if (cpu_has_sse4_1) from this
 patch and similar places, because there is no runtime CPU
 dispatching happening for SSE optimized code paths in action and
 just adds extra overhead (unnecessary branches) to the generated
 code.
 No. Sorry, I realize I misread your previous question:

 I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
 controllable by user at runtime; because USE_SSE41 is a
 compile time check and requires the target machine to be SSE 4.1
 capable already.
 USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
 to build the code and then use it only on systems that actually
 support it.

 All of this could have been pretty easily answered by a few greps
 though...
 I wonder what difference it would make to have an option to compile
 out the run-time check code to avoid the additional overhead in cases
 where the builder *knows* at compile time what the run-time system is?
 (ie Gentoo)
 I think that's possible. Since cpu_has_sse4_1 and friends are simply
 macros, one can set them to true or 1 during compile time if it's
 going to be built for an SSE 4.1 capable target so your smart compiler
 will totally get rid of the unnecessary runtime check.
 
 I guess common_x86_features.h should be modified to something like this:
 
 #ifdef __SSE4_1__
 #define cpu_has_sse4_1 1
 #else
 #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
 #endif

I was thinking about doing something similar for cpu_has_xmm and
cpu_has_xmm2 for x64.  SSE and SSE2 are required parts of that
instruction set, so they're always there.



Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Siavash Eliasi
Then I do recommend removing the if (cpu_has_sse4_1) check from this patch 
and similar places: there is no runtime CPU dispatching actually happening 
for the SSE optimized code paths, so the check just adds extra overhead 
(unnecessary branches) to the generated code.

The same should be applied to this patch:
[Mesa-dev] [PATCH 2/2] i965: add runtime check for SSSE3 rgba8_copy
http://lists.freedesktop.org/archives/mesa-dev/2014-November/070256.html

Best regards,
Siavash Eliasi.


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-07 Thread Siavash Eliasi


On 11/07/2014 07:31 PM, Ian Romanick wrote:
 On 11/07/2014 06:09 AM, Siavash Eliasi wrote:
  On 11/07/2014 03:14 PM, Steven Newbury wrote:
   On Thu, 2014-11-06 at 21:00 -0800, Matt Turner wrote:
    On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi 
    siavashser...@gmail.com wrote:
     Then I do recommend removing the if (cpu_has_sse4_1) from this
     patch and similar places, because there is no runtime CPU
     dispatching happening for SSE optimized code paths in action and
     just adds extra overhead (unnecessary branches) to the generated
     code.
    
    No. Sorry, I realize I misread your previous question:
    
      I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
      controllable by user at runtime; because USE_SSE41 is a
      compile time check and requires the target machine to be SSE 4.1
      capable already.
    
    USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
    to build the code and then use it only on systems that actually
    support it.
    
    All of this could have been pretty easily answered by a few greps
    though...
   
   I wonder what difference it would make to have an option to compile
   out the run-time check code to avoid the additional overhead in cases
   where the builder *knows* at compile time what the run-time system is?
   (ie Gentoo)
  
  I think that's possible. Since cpu_has_sse4_1 and friends are simply
  macros, one can set them to true or 1 during compile time if it's
  going to be built for an SSE 4.1 capable target so your smart compiler
  will totally get rid of the unnecessary runtime check.
  
  I guess common_x86_features.h should be modified to something like this:
  
  #ifdef __SSE4_1__
  #define cpu_has_sse4_1 1
  #else
  #define cpu_has_sse4_1 (_mesa_x86_cpu_features & X86_FEATURE_SSE4_1)
  #endif
 
 I was thinking about doing something similar for cpu_has_xmm and
 cpu_has_xmm2 for x64.  SSE and SSE2 are required parts of that
 instruction set, so they're always there.

I can come up with a patch implementing the same for SSE, SSE2, SSE3 and 
SSSE3 if the current approach is fine by you.



Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Juha-Pekka Heikkila
On 29.10.2014 14:05, Timothy Arceri wrote:
 Makes use of SSE to speed up compute of min and max elements
 
 Callgrind cpu usage results from pts benchmarks:
 
 Openarena 0.8.8: 3.67% -> 1.03%
 UrbanTerror: 2.36% -> 0.81%
 
 V5:
 - actually make use of the optimisation in android (Emil Velikov)
 - set a better array size limit for using SSE and added TODO
 
 V4:
 - fixed bugs with incrementing pointer and updating counters
 
 V3:
 - Removed sse_minmax.c from Makefile.sources
 - handle the first few values without SSE until the pointer is aligned
  and use _mm_load_si128 rather than _mm_loadu_si128
 - guard the call to the SSE code better at build time
 
 V2:
 - removed GL* types
 - use _mm_store_si128() rather than _mm_store_ps()
 - add runtime check for SSE
 - use aligned attribute for local mix/max
 - bunch of tidyups
 
 Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
 ---
  src/mesa/Android.libmesa_dricore.mk |  8 ++-
  src/mesa/Android.libmesa_st_mesa.mk |  5 ++
  src/mesa/Makefile.am                |  3 +-
  src/mesa/main/sse_minmax.c          | 97 +
  src/mesa/main/sse_minmax.h          | 30 
  src/mesa/vbo/vbo_exec_array.c       | 14 --
  6 files changed, 152 insertions(+), 5 deletions(-)
  create mode 100644 src/mesa/main/sse_minmax.c
  create mode 100644 src/mesa/main/sse_minmax.h
 
 diff --git a/src/mesa/Android.libmesa_dricore.mk b/src/mesa/Android.libmesa_dricore.mk
 index 1e6d948..2ab593d 100644
 --- a/src/mesa/Android.libmesa_dricore.mk
 +++ b/src/mesa/Android.libmesa_dricore.mk
 @@ -51,10 +51,16 @@ endif # MESA_ENABLE_ASM
  
  ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
  LOCAL_SRC_FILES += \
 - $(SRCDIR)main/streaming-load-memcpy.c
 + $(SRCDIR)main/streaming-load-memcpy.c \
 + $(SRCDIR)main/sse_minmax.c
  LOCAL_CFLAGS := -msse4.1
  endif
  
 +ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
 +LOCAL_CFLAGS += \
 +   -DUSE_SSE41
 +endif
 +
  LOCAL_C_INCLUDES := \
   $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
   $(MESA_TOP)/src \
 diff --git a/src/mesa/Android.libmesa_st_mesa.mk b/src/mesa/Android.libmesa_st_mesa.mk
 index 8b8d652..618d6bf 100644
 --- a/src/mesa/Android.libmesa_st_mesa.mk
 +++ b/src/mesa/Android.libmesa_st_mesa.mk
 @@ -48,6 +48,11 @@ ifeq ($(TARGET_ARCH),x86)
  endif # x86
  endif # MESA_ENABLE_ASM
  
 +ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
 +LOCAL_CFLAGS := \
 +   -DUSE_SSE41
 +endif
 +
  LOCAL_C_INCLUDES := \
   $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
   $(MESA_TOP)/src/gallium/auxiliary \
 diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
 index e71bccb..932db4f 100644
 --- a/src/mesa/Makefile.am
 +++ b/src/mesa/Makefile.am
 @@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \
   $(ARCH_LIBS)
  
  libmesa_sse41_la_SOURCES = \
 - main/streaming-load-memcpy.c
 + main/streaming-load-memcpy.c \
 + main/sse_minmax.c
  libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
  
  pkgconfigdir = $(libdir)/pkgconfig
 diff --git a/src/mesa/main/sse_minmax.c b/src/mesa/main/sse_minmax.c
 new file mode 100644
 index 000..91a55e5
 --- /dev/null
 +++ b/src/mesa/main/sse_minmax.c
 @@ -0,0 +1,97 @@
 +/*
 + * Copyright © 2014 Timothy Arceri
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the "Software"),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 + * IN THE SOFTWARE.
 + *
 + * Author:
 + *Timothy Arceri t_arc...@yahoo.com.au
 + *
 + */
 +
 +#ifdef __SSE4_1__
 +#include "main/sse_minmax.h"
 +#include <smmintrin.h>
 +#include <stdint.h>
 +
 +void
 +_mesa_uint_array_min_max(const unsigned *ui_indices, unsigned *min_index,
 +                         unsigned *max_index, const unsigned count)
 +{
 +   unsigned max_ui = 0;
 +   unsigned min_ui = ~0U;
 +   unsigned i = 0;
 +   unsigned aligned_count = count;
 +
 +   /* handle the first few values without SSE until the pointer is aligned */
 +   while (((uintptr_t)ui_indices & 15) && aligned_count) {
 +  if 

Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Matt Turner
On Wed, Nov 5, 2014 at 12:54 PM, Matt Turner matts...@gmail.com wrote:
 On Wed, Nov 5, 2014 at 12:50 PM, Timothy Arceri t_arc...@yahoo.com.au wrote:
  There have been quite a few eyes over this now but nobody has given it a
  reviewed by yet.
 
  Would be nice to get it in before the code freeze. Any takers?
 
 Yes, I'll make sure that happens.

I made a couple of trivial changes to the commit message and added
some spaces between __m128i and * in casts and pushed it with review.

Thanks!


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Siavash Eliasi
How and when is cpu_has_sse4_1 true? Is it controllable at runtime 
through setting some environment variable, or is it set once during 
startup by detecting CPU features?


I guess checking for cpu_has_sse4_1 is unnecessary if it isn't 
controllable by user at runtime; because USE_SSE41 is a compile time 
check and requires the target machine to be SSE 4.1 capable already.


Best regards,
Siavash Eliasi.


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Matt Turner
On Thu, Nov 6, 2014 at 1:30 AM, Siavash Eliasi siavashser...@gmail.com wrote:
 How and when is cpu_has_sse4_1 true? Is it controllable at runtime through
 setting some environmental variable? or is it set once during startup by
 detecting CPU features?

It's actually a macro, but yes, see the end of
src/mesa/x86/common_x86.c. It's set by using the CPUID instruction to
detect SSE 4.1 capabilities.

   if (ecx & bit_SSE4_1)
      _mesa_x86_cpu_features |= X86_FEATURE_SSE4_1;
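
For reference, a minimal sketch of this kind of detection using the __get_cpuid helper from the GCC/Clang <cpuid.h> header. This is not the code in common_x86.c; detect_sse41 and SKETCH_BIT_SSE4_1 are names invented here, and the sketch assumes a GCC-compatible compiler. SSE 4.1 is reported in bit 19 of ECX for CPUID leaf 1.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid */
#endif

/* CPUID leaf 1 reports SSE 4.1 in ECX bit 19. */
#define SKETCH_BIT_SSE4_1 (1u << 19)

/* Returns 1 when the CPU reports SSE 4.1, 0 otherwise (including non-x86). */
int
detect_sse41(void)
{
#if defined(__i386__) || defined(__x86_64__)
   unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;

   /* __get_cpuid returns 0 if the requested leaf is not supported. */
   if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
      return 0;
   return (ecx & SKETCH_BIT_SSE4_1) ? 1 : 0;
#else
   return 0;
#endif
}
```

A driver would typically call something like this once at startup and cache the result in a flags word, which is what the quoted snippet does with _mesa_x86_cpu_features.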

 I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
 controllable by user at runtime; because USE_SSE41 is a compile time check
 and requires the target machine to be SSE 4.1 capable already.

Right.


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-06 Thread Matt Turner
On Thu, Nov 6, 2014 at 8:56 PM, Siavash Eliasi siavashser...@gmail.com wrote:
 Then I do recommend removing the if (cpu_has_sse4_1) from this patch and
 similar places, because there is no runtime CPU dispatching happening for
 SSE optimized code paths in action and just adds extra overhead (unnecessary
 branches) to the generated code.

No. Sorry, I realize I misread your previous question:

 I guess checking for cpu_has_sse4_1 is unnecessary if it isn't
 controllable by user at runtime; because USE_SSE41 is a compile time check
 and requires the target machine to be SSE 4.1 capable already.

USE_SSE41 is set if the *compiler* supports SSE 4.1. This allows you
to build the code and then use it only on systems that actually
support it.

All of this could have been pretty easily answered by a few greps though...
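
A small sketch of how the two checks combine, under the assumption that the build passes -DUSE_SSE41 when the compiler can emit SSE 4.1. runtime_has_sse41, enum path and choose_path are hypothetical names standing in for cpu_has_sse4_1 and the real call sites:

```c
/* Normally passed by the build system as -DUSE_SSE41; defined here only so
 * that the sketch is self-contained. */
#ifndef USE_SSE41
#define USE_SSE41 1
#endif

/* Hypothetical run-time flag, filled in by CPU detection at startup. */
int runtime_has_sse41 = 0;

enum path { PATH_GENERIC, PATH_SSE41 };

enum path
choose_path(void)
{
#ifdef USE_SSE41
   /* Compile-time guard: the SSE 4.1 code exists in this binary.
    * Run-time guard: only use it if this particular CPU supports it. */
   if (runtime_has_sse41)
      return PATH_SSE41;
#endif
   return PATH_GENERIC;
}
```

The compile-time guard answers "was the SSE 4.1 code built at all?", while the run-time guard answers "can this machine execute it?", so one binary can serve both kinds of system.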


Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-05 Thread Timothy Arceri
There have been quite a few eyes over this now but nobody has given it a
Reviewed-by yet.

Would be nice to get it in before the code freeze. Any takers?


On Wed, 2014-10-29 at 23:05 +1100, Timothy Arceri wrote:
 Makes use of SSE to speed up compute of min and max elements
 
 Callgrind cpu usage results from pts benchmarks:
 
 Openarena 0.8.8: 3.67% -> 1.03%
 UrbanTerror: 2.36% -> 0.81%
 
 V5:
 - actually make use of the optimisation in android (Emil Velikov)
 - set a better array size limit for using SSE and added TODO
 
 V4:
 - fixed bugs with incrementing pointer and updating counters
 
 V3:
 - Removed sse_minmax.c from Makefile.sources
 - handle the first few values without SSE until the pointer is aligned
  and use _mm_load_si128 rather than _mm_loadu_si128
 - guard the call to the SSE code better at build time
 
 V2:
 - removed GL* types
 - use _mm_store_si128() rather than _mm_store_ps()
 - add runtime check for SSE
 - use aligned attribute for local min/max
 - bunch of tidyups
 
 Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
 ---
  src/mesa/Android.libmesa_dricore.mk |  8 ++-
  src/mesa/Android.libmesa_st_mesa.mk |  5 ++
  src/mesa/Makefile.am                |  3 +-
  src/mesa/main/sse_minmax.c          | 97 +
  src/mesa/main/sse_minmax.h          | 30 
  src/mesa/vbo/vbo_exec_array.c       | 14 --
  6 files changed, 152 insertions(+), 5 deletions(-)
  create mode 100644 src/mesa/main/sse_minmax.c
  create mode 100644 src/mesa/main/sse_minmax.h




Re: [Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-11-05 Thread Matt Turner
On Wed, Nov 5, 2014 at 12:50 PM, Timothy Arceri t_arc...@yahoo.com.au wrote:
 There have been quite a few eyes over this now but nobody has given it a
 reviewed by yet.

 Would be nice to get it in before the code freeze. Any takers?

Yes, I'll make sure that happens.


[Mesa-dev] [PATCH V5] mesa: add SSE optimisation for glDrawElements

2014-10-29 Thread Timothy Arceri
Makes use of SSE to speed up computing the min and max elements

Callgrind cpu usage results from pts benchmarks:

Openarena 0.8.8: 3.67% -> 1.03%
UrbanTerror: 2.36% -> 0.81%

V5:
- actually make use of the optimisation in android (Emil Velikov)
- set a better array size limit for using SSE and added TODO

V4:
- fixed bugs with incrementing pointer and updating counters

V3:
- Removed sse_minmax.c from Makefile.sources
- handle the first few values without SSE until the pointer is aligned
 and use _mm_load_si128 rather than _mm_loadu_si128
- guard the call to the SSE code better at build time

V2:
- removed GL* types
- use _mm_store_si128() rather than _mm_store_ps()
- add runtime check for SSE
- use aligned attribute for local min/max
- bunch of tidyups
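
For readers who want to see the V3 alignment note in code, here is a minimal sketch. It is not the body of sse_minmax.c from this patch: uint_array_min_max_sketch and its signature are assumptions for illustration. Leading values are handled one at a time until the pointer is 16-byte aligned, the bulk is then read with aligned loads (_mm_load_si128) when the file is built with SSE 4.1 enabled, and a scalar loop finishes the remainder.

```c
#include <stdint.h>
#ifdef __SSE4_1__
#include <smmintrin.h>
#endif

/* Hypothetical helper, not the function from the patch. */
void
uint_array_min_max_sketch(const uint32_t *v, unsigned count,
                          uint32_t *min_out, uint32_t *max_out)
{
   uint32_t min_ui = UINT32_MAX;
   uint32_t max_ui = 0;

   /* Scalar head: step forward until the pointer is 16-byte aligned. */
   while (((uintptr_t)v & 15) && count) {
      if (*v > max_ui) max_ui = *v;
      if (*v < min_ui) min_ui = *v;
      v++;
      count--;
   }

#ifdef __SSE4_1__
   if (count >= 4) {
      uint32_t lanes[4];
      __m128i vmin = _mm_set1_epi32((int)min_ui);
      __m128i vmax = _mm_set1_epi32((int)max_ui);

      /* Aligned 128-bit loads, four unsigned values per iteration. */
      for (; count >= 4; count -= 4, v += 4) {
         __m128i x = _mm_load_si128((const __m128i *)v);
         vmin = _mm_min_epu32(vmin, x);
         vmax = _mm_max_epu32(vmax, x);
      }

      /* Fold the four lanes back into the scalar results. */
      _mm_storeu_si128((__m128i *)lanes, vmin);
      for (int i = 0; i < 4; i++)
         if (lanes[i] < min_ui) min_ui = lanes[i];
      _mm_storeu_si128((__m128i *)lanes, vmax);
      for (int i = 0; i < 4; i++)
         if (lanes[i] > max_ui) max_ui = lanes[i];
   }
#endif

   /* Scalar tail; this is the whole loop when built without SSE 4.1. */
   while (count) {
      if (*v > max_ui) max_ui = *v;
      if (*v < min_ui) min_ui = *v;
      v++;
      count--;
   }

   *min_out = min_ui;
   *max_out = max_ui;
}
```

The aligned load is only safe because the scalar head has already consumed any misaligned leading elements; without that step, _mm_loadu_si128 would be needed instead.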

Signed-off-by: Timothy Arceri t_arc...@yahoo.com.au
---
 src/mesa/Android.libmesa_dricore.mk |  8 ++-
 src/mesa/Android.libmesa_st_mesa.mk |  5 ++
 src/mesa/Makefile.am                |  3 +-
 src/mesa/main/sse_minmax.c          | 97 +
 src/mesa/main/sse_minmax.h          | 30 
 src/mesa/vbo/vbo_exec_array.c       | 14 --
 6 files changed, 152 insertions(+), 5 deletions(-)
 create mode 100644 src/mesa/main/sse_minmax.c
 create mode 100644 src/mesa/main/sse_minmax.h

diff --git a/src/mesa/Android.libmesa_dricore.mk b/src/mesa/Android.libmesa_dricore.mk
index 1e6d948..2ab593d 100644
--- a/src/mesa/Android.libmesa_dricore.mk
+++ b/src/mesa/Android.libmesa_dricore.mk
@@ -51,10 +51,16 @@ endif # MESA_ENABLE_ASM
 
 ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
 LOCAL_SRC_FILES += \
-   $(SRCDIR)main/streaming-load-memcpy.c
+   $(SRCDIR)main/streaming-load-memcpy.c \
+   $(SRCDIR)main/sse_minmax.c
 LOCAL_CFLAGS := -msse4.1
 endif
 
+ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
+LOCAL_CFLAGS += \
+   -DUSE_SSE41
+endif
+
 LOCAL_C_INCLUDES := \
    $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
    $(MESA_TOP)/src \
diff --git a/src/mesa/Android.libmesa_st_mesa.mk b/src/mesa/Android.libmesa_st_mesa.mk
index 8b8d652..618d6bf 100644
--- a/src/mesa/Android.libmesa_st_mesa.mk
+++ b/src/mesa/Android.libmesa_st_mesa.mk
@@ -48,6 +48,11 @@ ifeq ($(TARGET_ARCH),x86)
 endif # x86
 endif # MESA_ENABLE_ASM
 
+ifeq ($(ARCH_X86_HAVE_SSE4_1),true)
+LOCAL_CFLAGS := \
+   -DUSE_SSE41
+endif
+
 LOCAL_C_INCLUDES := \
    $(call intermediates-dir-for STATIC_LIBRARIES,libmesa_program,,) \
    $(MESA_TOP)/src/gallium/auxiliary \
diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am
index e71bccb..932db4f 100644
--- a/src/mesa/Makefile.am
+++ b/src/mesa/Makefile.am
@@ -151,7 +151,8 @@ libmesagallium_la_LIBADD = \
    $(ARCH_LIBS)
 
 libmesa_sse41_la_SOURCES = \
-   main/streaming-load-memcpy.c
+   main/streaming-load-memcpy.c \
+   main/sse_minmax.c
 libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1
 
 pkgconfigdir = $(libdir)/pkgconfig
diff --git a/src/mesa/main/sse_minmax.c b/src/mesa/main/sse_minmax.c
new file mode 100644
index 000..91a55e5
--- /dev/null
+++ b/src/mesa/main/sse_minmax.c
@@ -0,0 +1,97 @@
+/*
+ * Copyright © 2014 Timothy Arceri
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Author:
+ *Timothy Arceri t_arc...@yahoo.com.au
+ *
+ */
+
+#ifdef __SSE4_1__
+#include "main/sse_minmax.h"
+#include <smmintrin.h>
+#include <stdint.h>
+
+void
+_mesa_uint_array_min_max(const unsigned *ui_indices, unsigned *min_index,
+                         unsigned *max_index, const unsigned count)
+{
+   unsigned max_ui = 0;
+   unsigned min_ui = ~0U;
+   unsigned i = 0;
+   unsigned aligned_count = count;
+
+   /* handle the first few values without SSE until the pointer is aligned */
+   while (((uintptr_t)ui_indices & 15) && aligned_count) {
+      if (*ui_indices > max_ui)
+         max_ui = *ui_indices;
+      if (*ui_indices < min_ui)
+         min_ui = *ui_indices;
+
+      aligned_count--;
+      ui_indices++;
+   }
+
+