On 10/08/2017 02:24, Joseph Myers wrote:
> The SSE4.1 packusdw instruction combines source and destination
> vectors of signed 32-bit integers into a single vector of unsigned
> 16-bit integers, with unsigned saturation.  When the source and
> destination are the same register, this means each 32-bit element of
> that register is used twice as an input, to produce two of the 16-bit
> output elements, and so if the operation is carried out
> element-by-element in-place, no matter what the order in which it is
> applied to the elements, the first element's operation will overwrite
> some future input.  The helper for packssdw avoids this issue by
> computing the result in a local temporary and copying it to the
> destination at the end; this patch fixes the packusdw helper to do
> likewise.  This fixes three gcc test failures in my GCC 6-based
> testing.
> 
> Signed-off-by: Joseph Myers <jos...@codesourcery.com>
> 
> ---
> 
> diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
> index 16509d0..05b1701 100644
> --- a/target/i386/ops_sse.h
> +++ b/target/i386/ops_sse.h
> @@ -1655,14 +1655,17 @@ SSE_HELPER_Q(helper_pcmpeqq, FCMPEQQ)
>  
>  void glue(helper_packusdw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
>  {
> -    d->W(0) = satuw((int32_t) d->L(0));
> -    d->W(1) = satuw((int32_t) d->L(1));
> -    d->W(2) = satuw((int32_t) d->L(2));
> -    d->W(3) = satuw((int32_t) d->L(3));
> -    d->W(4) = satuw((int32_t) s->L(0));
> -    d->W(5) = satuw((int32_t) s->L(1));
> -    d->W(6) = satuw((int32_t) s->L(2));
> -    d->W(7) = satuw((int32_t) s->L(3));
> +    Reg r;
> +
> +    r.W(0) = satuw((int32_t) d->L(0));
> +    r.W(1) = satuw((int32_t) d->L(1));
> +    r.W(2) = satuw((int32_t) d->L(2));
> +    r.W(3) = satuw((int32_t) d->L(3));
> +    r.W(4) = satuw((int32_t) s->L(0));
> +    r.W(5) = satuw((int32_t) s->L(1));
> +    r.W(6) = satuw((int32_t) s->L(2));
> +    r.W(7) = satuw((int32_t) s->L(3));
> +    *d = r;
>  }
>  
>  #define FMINSB(d, s) MIN((int8_t)d, (int8_t)s)
> 

Queued, thanks.

Paolo

Reply via email to