Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds

2018-03-23 Thread Ian Romanick
On 03/20/2018 01:50 PM, Chris Wilson wrote:
> Quoting Scott D Phillips (2018-03-20 20:39:25)
>> When building intel_tiled_memcpy for i686, the stack will only be
>> 4-byte aligned. This isn't sufficient for SSE temporaries which
>> require 16-byte alignment.  Use the force_align_arg_pointer
>> function attribute in that case to ensure sufficient alignment.
>> ---
>>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
>> b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> index 69306828d72..bd8bafbd2d7 100644
>> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> @@ -42,6 +42,12 @@
>>  #include 
>>  #endif
>>  
>> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || 
>> defined(__SSE2__))
>> +#define REALIGN __attribute__((force_align_arg_pointer))
>> +#else
>> +#define REALIGN
>> +#endif
> 
> It would be a harmless no-op on x86-64 (or essential ;)
> 
>> +
>>  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>>  
>>  #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b)
>> @@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src)
>>  /**
>>   * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned.
>>   */
>> -static inline void *
>> +static REALIGN inline void *
>>  rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes)
>>  {
>> assert(bytes == 0 || !(((uintptr_t)dst) & 0xf));
> 
> Hmm, if aligned_dst is spilling to stack, so would be aligned_src. As
> these are supposed to be inlined (and constant folded), do you not want
> to realign the callers instead? Perhaps with an explicit FLATTEN.

The particular bug seems to mostly only trigger in unoptimized debug
builds.  I couldn't get it to trigger with -O2, but with -O0 it was 100%
reproducible.

> -Chris
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds

2018-03-21 Thread Eric Engestrom
On Wednesday, 2018-03-21 10:11:45 -0700, Matt Turner wrote:
> On Wed, Mar 21, 2018 at 2:39 AM, Eric Engestrom
>  wrote:
> > On Tuesday, 2018-03-20 13:39:25 -0700, Scott D Phillips wrote:
> >> When building intel_tiled_memcpy for i686, the stack will only be
> >> 4-byte aligned. This isn't sufficient for SSE temporaries which
> >> require 16-byte alignment.  Use the force_align_arg_pointer
> >> function attribute in that case to ensure sufficient alignment.
> >> ---
> >>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++-
> >>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
> >> b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> >> index 69306828d72..bd8bafbd2d7 100644
> >> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> >> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> >> @@ -42,6 +42,12 @@
> >>  #include 
> >>  #endif
> >>
> >> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || 
> >> defined(__SSE2__))
> >
> > Is that a typo?  s/SSSE3/SSE3/ ?
> 
> Nope, that's correct. SSSE3 is "Supplemental" SSE3, and we have
> compile-time code in this file that uses it over SSE2 to save a few
> instructions.

Oh, didn't know about that; learned something new :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds

2018-03-21 Thread Matt Turner
On Wed, Mar 21, 2018 at 2:39 AM, Eric Engestrom
 wrote:
> On Tuesday, 2018-03-20 13:39:25 -0700, Scott D Phillips wrote:
>> When building intel_tiled_memcpy for i686, the stack will only be
>> 4-byte aligned. This isn't sufficient for SSE temporaries which
>> require 16-byte alignment.  Use the force_align_arg_pointer
>> function attribute in that case to ensure sufficient alignment.
>> ---
>>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
>> b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> index 69306828d72..bd8bafbd2d7 100644
>> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> @@ -42,6 +42,12 @@
>>  #include 
>>  #endif
>>
>> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || 
>> defined(__SSE2__))
>
> Is that a typo?  s/SSSE3/SSE3/ ?

Nope, that's correct. SSSE3 is "Supplemental" SSE3, and we have
compile-time code in this file that uses it over SSE2 to save a few
instructions.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds

2018-03-21 Thread Scott D Phillips
Chris Wilson  writes:

> Quoting Scott D Phillips (2018-03-20 20:39:25)
>> When building intel_tiled_memcpy for i686, the stack will only be
>> 4-byte aligned. This isn't sufficient for SSE temporaries which
>> require 16-byte alignment.  Use the force_align_arg_pointer
>> function attribute in that case to ensure sufficient alignment.
>> ---
>>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>> 
>> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
>> b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> index 69306828d72..bd8bafbd2d7 100644
>> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
>> @@ -42,6 +42,12 @@
>>  #include 
>>  #endif
>>  
>> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || 
>> defined(__SSE2__))
>> +#define REALIGN __attribute__((force_align_arg_pointer))
>> +#else
>> +#define REALIGN
>> +#endif
>
> It would be a harmless no-op on x86-64 (or essential ;)
>
>> +
>>  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>>  
>>  #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b)
>> @@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src)
>>  /**
>>   * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned.
>>   */
>> -static inline void *
>> +static REALIGN inline void *
>>  rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes)
>>  {
>> assert(bytes == 0 || !(((uintptr_t)dst) & 0xf));
>
> Hmm, if aligned_dst is spilling to stack, so would be aligned_src.  As
> these are supposed to be inlined (and constant folded), do you not
> want to realign the callers instead?

Playing around with this function attribute more, it seems not to work
as I had hoped. Specifically I was expecting that having the attribute
on an inline function would cause the attribute to be applied to any
inliners as well, but that seems to not happen. Instead it seems to work
more like "if you emit a prologue here, then use this alternative one".

I guess the right solution is to just apply -mstackrealign to the whole
file and let the stack fix get applied to all entry points (whatever
that winds up being after optimization) like we have for
main/streaming-load-memcpy.c. And then relying on inlining to keep any
stack munging out of our loops.

> Perhaps with an explicit FLATTEN.

Ya, it looks like flatten would be good there.

> -Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds

2018-03-21 Thread Eric Engestrom
On Tuesday, 2018-03-20 13:39:25 -0700, Scott D Phillips wrote:
> When building intel_tiled_memcpy for i686, the stack will only be
> 4-byte aligned. This isn't sufficient for SSE temporaries which
> require 16-byte alignment.  Use the force_align_arg_pointer
> function attribute in that case to ensure sufficient alignment.
> ---
>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
> b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> index 69306828d72..bd8bafbd2d7 100644
> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> @@ -42,6 +42,12 @@
>  #include 
>  #endif
>  
> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || 
> defined(__SSE2__))

Is that a typo?  s/SSSE3/SSE3/ ?

> +#define REALIGN __attribute__((force_align_arg_pointer))
> +#else
> +#define REALIGN
> +#endif
> +
>  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>  
>  #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b)
> @@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src)
>  /**
>   * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned.
>   */
> -static inline void *
> +static REALIGN inline void *
>  rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes)
>  {
> assert(bytes == 0 || !(((uintptr_t)dst) & 0xf));
> -- 
> 2.14.3
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds

2018-03-20 Thread Chris Wilson
Quoting Scott D Phillips (2018-03-20 20:39:25)
> When building intel_tiled_memcpy for i686, the stack will only be
> 4-byte aligned. This isn't sufficient for SSE temporaries which
> require 16-byte alignment.  Use the force_align_arg_pointer
> function attribute in that case to ensure sufficient alignment.
> ---
>  src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
> b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> index 69306828d72..bd8bafbd2d7 100644
> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
> @@ -42,6 +42,12 @@
>  #include 
>  #endif
>  
> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || 
> defined(__SSE2__))
> +#define REALIGN __attribute__((force_align_arg_pointer))
> +#else
> +#define REALIGN
> +#endif

It would be a harmless no-op on x86-64 (or essential ;)

> +
>  #define FILE_DEBUG_FLAG DEBUG_TEXTURE
>  
>  #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b)
> @@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src)
>  /**
>   * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned.
>   */
> -static inline void *
> +static REALIGN inline void *
>  rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes)
>  {
> assert(bytes == 0 || !(((uintptr_t)dst) & 0xf));

Hmm, if aligned_dst is spilling to stack, so would be aligned_src. As
these are supposed to be inlined (and constant folded), do you not want
to realign the callers instead? Perhaps with an explicit FLATTEN.
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds

2018-03-20 Thread Scott D Phillips
When building intel_tiled_memcpy for i686, the stack will only be
4-byte aligned. This isn't sufficient for SSE temporaries which
require 16-byte alignment.  Use the force_align_arg_pointer
function attribute in that case to ensure sufficient alignment.
---
 src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
index 69306828d72..bd8bafbd2d7 100644
--- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
+++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
@@ -42,6 +42,12 @@
 #include 
 #endif
 
+#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || 
defined(__SSE2__))
+#define REALIGN __attribute__((force_align_arg_pointer))
+#else
+#define REALIGN
+#endif
+
 #define FILE_DEBUG_FLAG DEBUG_TEXTURE
 
 #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b)
@@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src)
 /**
  * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned.
  */
-static inline void *
+static REALIGN inline void *
 rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes)
 {
assert(bytes == 0 || !(((uintptr_t)dst) & 0xf));
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev