On 08/08/2017 22:21, Joseph Myers wrote: > The SSE4.1 pmovsx* and pmovzx* instructions take packed 1-byte, 2-byte > or 4-byte inputs and sign-extend or zero-extend them to a wider vector > output. The associated helpers for these instructions do the > extension on each element in turn, starting with the lowest. If the > input and output are the same register, this means that all the input > elements after the first have been overwritten before they are read. > This patch makes the helpers extend starting with the highest element, > not the lowest, to avoid such overwriting. This fixes many GCC test > failures (161 in the gcc testsuite in my GCC 6-based testing) when > testing with a default CPU setting enabling those instructions. > > Signed-off-by: Joseph Myers <jos...@codesourcery.com> > > --- > > diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h > index 16509d0..d578216 100644 > --- a/target/i386/ops_sse.h > +++ b/target/i386/ops_sse.h > @@ -1617,18 +1617,18 @@ void glue(helper_ptest, SUFFIX)(CPUX86State *env, Reg > *d, Reg *s) > #define SSE_HELPER_F(name, elem, num, F) \ > void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ > { \ > - d->elem(0) = F(0); \ > - d->elem(1) = F(1); \ > if (num > 2) { \ > - d->elem(2) = F(2); \ > - d->elem(3) = F(3); \ > if (num > 4) { \ > - d->elem(4) = F(4); \ > - d->elem(5) = F(5); \ > - d->elem(6) = F(6); \ > d->elem(7) = F(7); \ > + d->elem(6) = F(6); \ > + d->elem(5) = F(5); \ > + d->elem(4) = F(4); \ > } \ > + d->elem(3) = F(3); \ > + d->elem(2) = F(2); \ > } \ > + d->elem(1) = F(1); \ > + d->elem(0) = F(0); \ > } > > SSE_HELPER_F(helper_pmovsxbw, W, 8, (int8_t) s->B) >
Queued, thanks. Paolo