Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds
On 03/20/2018 01:50 PM, Chris Wilson wrote: > Quoting Scott D Phillips (2018-03-20 20:39:25) >> When building intel_tiled_memcpy for i686, the stack will only be >> 4-byte aligned. This isn't sufficient for SSE temporaries which >> require 16-byte alignment. Use the force_align_arg_pointer >> function attribute in that case to ensure sufficient alignment. >> --- >> src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> index 69306828d72..bd8bafbd2d7 100644 >> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> @@ -42,6 +42,12 @@ >> #include >> #endif >> >> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || >> defined(__SSE2__)) >> +#define REALIGN __attribute__((force_align_arg_pointer)) >> +#else >> +#define REALIGN >> +#endif > > It would be a harmless no-op on x86-64 (or essential ;) > >> + >> #define FILE_DEBUG_FLAG DEBUG_TEXTURE >> >> #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b) >> @@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src) >> /** >> * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned. >> */ >> -static inline void * >> +static REALIGN inline void * >> rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes) >> { >> assert(bytes == 0 || !(((uintptr_t)dst) & 0xf)); > > Hmm, if aligned_dst is spilling to stack, so would be aligned_src. As > these are supposed to be inlined (and constant folded), do you not want > to realign the callers instead? Perhaps with an explicit FLATTEN. The particular bug seems to mostly only trigger in unoptimized debug builds. I couldn't get it to trigger with -O2, but with -O0 it was 100% reproducible. > -Chris > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds
On Wednesday, 2018-03-21 10:11:45 -0700, Matt Turner wrote: > On Wed, Mar 21, 2018 at 2:39 AM, Eric Engestrom > wrote: > > On Tuesday, 2018-03-20 13:39:25 -0700, Scott D Phillips wrote: > >> When building intel_tiled_memcpy for i686, the stack will only be > >> 4-byte aligned. This isn't sufficient for SSE temporaries which > >> require 16-byte alignment. Use the force_align_arg_pointer > >> function attribute in that case to ensure sufficient alignment. > >> --- > >> src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++- > >> 1 file changed, 7 insertions(+), 1 deletion(-) > >> > >> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > >> b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > >> index 69306828d72..bd8bafbd2d7 100644 > >> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > >> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > >> @@ -42,6 +42,12 @@ > >> #include > >> #endif > >> > >> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || > >> defined(__SSE2__)) > > > > Is that a typo? s/SSSE3/SSE3/ ? > > Nope, that's correct. SSSE3 is "Supplemental" SSE3, and we have > compile-time code in this file that uses it over SSE2 to save a few > instructions. Oh, didn't know about that; learned something new :) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds
On Wed, Mar 21, 2018 at 2:39 AM, Eric Engestrom wrote: > On Tuesday, 2018-03-20 13:39:25 -0700, Scott D Phillips wrote: >> When building intel_tiled_memcpy for i686, the stack will only be >> 4-byte aligned. This isn't sufficient for SSE temporaries which >> require 16-byte alignment. Use the force_align_arg_pointer >> function attribute in that case to ensure sufficient alignment. >> --- >> src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> index 69306828d72..bd8bafbd2d7 100644 >> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> @@ -42,6 +42,12 @@ >> #include >> #endif >> >> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || >> defined(__SSE2__)) > > Is that a typo? s/SSSE3/SSE3/ ? Nope, that's correct. SSSE3 is "Supplemental" SSE3, and we have compile-time code in this file that uses it over SSE2 to save a few instructions. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds
Chris Wilson writes: > Quoting Scott D Phillips (2018-03-20 20:39:25) >> When building intel_tiled_memcpy for i686, the stack will only be >> 4-byte aligned. This isn't sufficient for SSE temporaries which >> require 16-byte alignment. Use the force_align_arg_pointer >> function attribute in that case to ensure sufficient alignment. >> --- >> src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> index 69306828d72..bd8bafbd2d7 100644 >> --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c >> @@ -42,6 +42,12 @@ >> #include >> #endif >> >> +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || >> defined(__SSE2__)) >> +#define REALIGN __attribute__((force_align_arg_pointer)) >> +#else >> +#define REALIGN >> +#endif > > It would be a harmless no-op on x86-64 (or essential ;) > >> + >> #define FILE_DEBUG_FLAG DEBUG_TEXTURE >> >> #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b) >> @@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src) >> /** >> * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned. >> */ >> -static inline void * >> +static REALIGN inline void * >> rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes) >> { >> assert(bytes == 0 || !(((uintptr_t)dst) & 0xf)); > > Hmm, if aligned_dst is spilling to stack, so would be aligned_src. As > these are supposed to be inlined (and constant folded), do you not > want to realign the callers instead? Playing around with this function attribute more, it seems not to work as I had hoped. Specifically I was expecting that having the attribute on an inline function would cause the attribute to be applied to any inliners as well, but that seems to not happen. Instead it seems to work more like "if you emit a prologue here, then use this alternative one". I guess the right solution is to just apply -mstackrealign to the whole file and let the stack fix get applied to all entry points (whatever that winds up being after optimization) like we have for main/streaming-load-memcpy.c. And then relying on inlining to keep any stack munging out of our loops. > Perhaps with an explicit FLATTEN. Ya, it looks like flatten would be good there. > -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds
On Tuesday, 2018-03-20 13:39:25 -0700, Scott D Phillips wrote: > When building intel_tiled_memcpy for i686, the stack will only be > 4-byte aligned. This isn't sufficient for SSE temporaries which > require 16-byte alignment. Use the force_align_arg_pointer > function attribute in that case to ensure sufficient alignment. > --- > src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > index 69306828d72..bd8bafbd2d7 100644 > --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > @@ -42,6 +42,12 @@ > #include > #endif > > +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || > defined(__SSE2__)) Is that a typo? s/SSSE3/SSE3/ ? > +#define REALIGN __attribute__((force_align_arg_pointer)) > +#else > +#define REALIGN > +#endif > + > #define FILE_DEBUG_FLAG DEBUG_TEXTURE > > #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b) > @@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src) > /** > * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned. > */ > -static inline void * > +static REALIGN inline void * > rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes) > { > assert(bytes == 0 || !(((uintptr_t)dst) & 0xf)); > -- > 2.14.3 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds
Quoting Scott D Phillips (2018-03-20 20:39:25) > When building intel_tiled_memcpy for i686, the stack will only be > 4-byte aligned. This isn't sufficient for SSE temporaries which > require 16-byte alignment. Use the force_align_arg_pointer > function attribute in that case to ensure sufficient alignment. > --- > src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > index 69306828d72..bd8bafbd2d7 100644 > --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c > @@ -42,6 +42,12 @@ > #include > #endif > > +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || > defined(__SSE2__)) > +#define REALIGN __attribute__((force_align_arg_pointer)) > +#else > +#define REALIGN > +#endif It would be a harmless no-op on x86-64 (or essential ;) > + > #define FILE_DEBUG_FLAG DEBUG_TEXTURE > > #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b) > @@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src) > /** > * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned. > */ > -static inline void * > +static REALIGN inline void * > rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes) > { > assert(bytes == 0 || !(((uintptr_t)dst) & 0xf)); Hmm, if aligned_dst is spilling to stack, so would be aligned_src. As these are supposed to be inlined (and constant folded), do you not want to realign the callers instead? Perhaps with an explicit FLATTEN. -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/tiled_memcpy: realign rgba8_copy_aligned_dst stack in 32-bit builds
When building intel_tiled_memcpy for i686, the stack will only be 4-byte aligned. This isn't sufficient for SSE temporaries which require 16-byte alignment. Use the force_align_arg_pointer function attribute in that case to ensure sufficient alignment. --- src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c index 69306828d72..bd8bafbd2d7 100644 --- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c +++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c @@ -42,6 +42,12 @@ #include #endif +#if defined(__GNUC__) && defined(__i386__) && (defined(__SSSE3__) || defined(__SSE2__)) +#define REALIGN __attribute__((force_align_arg_pointer)) +#else +#define REALIGN +#endif + #define FILE_DEBUG_FLAG DEBUG_TEXTURE #define ALIGN_DOWN(a, b) ROUND_DOWN_TO(a, b) @@ -156,7 +162,7 @@ rgba8_copy_16_aligned_src(void *dst, const void *src) /** * Copy RGBA to BGRA - swap R and B, with the destination 16-byte aligned. */ -static inline void * +static REALIGN inline void * rgba8_copy_aligned_dst(void *dst, const void *src, size_t bytes) { assert(bytes == 0 || !(((uintptr_t)dst) & 0xf)); -- 2.14.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev