On 28.02.19 10:28, David Hildenbrand wrote:
> On 28.02.19 01:03, Richard Henderson wrote:
>> On 2/26/19 3:39 AM, David Hildenbrand wrote:
>>> Combine all variant in a single handler. As source and destination
>>> have different element sizes, we can't use gvec expansion. Expand
>>> manually. Also watch out for overlapping source and destination and
>>> use a temporary register in that case.
>>>
>>> Signed-off-by: David Hildenbrand <da...@redhat.com>
>>> ---
>>>  target/s390x/insn-data.def      |  8 +++++++
>>>  target/s390x/translate_vx.inc.c | 41 +++++++++++++++++++++++++++++++++
>>>  2 files changed, 49 insertions(+)
>>
>> This works as is, so
>> Reviewed-by: Richard Henderson <richard.hender...@linaro.org>
>>
>> But the same comment applies wrt iteration order and not needing a temporary.
>> High unpack can iterate backward, while low unpack can iterate forward, with
>> no lost data.
>
> I'll fix that right away. I guess vector pack cannot be handled like this.
>
> The only way to get rid of the temporary would be to load both elements
> from v2 and v3 and then writing the two (half sized) elements in v1.
>
> I'll have a look.
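Richard's point about iteration order can be illustrated with a small plain-C model. This is a sketch, not QEMU's TCG code: `Vec128`, `get_elem`, `set_elem`, `unpack_logical_high` and `unpack_logical_low` are hypothetical names, the register is modelled as 16 bytes with element 0 leftmost (s390x element numbering), and only the logical (zero-extending) unpack is shown. Each destination element covers two source slots, so walking backward for high unpack and forward for low unpack never overwrites a source element that is still needed, even when destination and source are the same register.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register: 16 bytes with element 0
 * leftmost (big-endian element numbering, as on s390x). */
typedef struct {
    uint8_t b[16];
} Vec128;

/* Read element idx of size es bytes (1, 2, 4 or 8), big-endian. */
static uint64_t get_elem(const Vec128 *v, int idx, int es)
{
    uint64_t val = 0;
    for (int i = 0; i < es; i++) {
        val = (val << 8) | v->b[idx * es + i];
    }
    return val;
}

/* Write the low es bytes of val into element idx, big-endian. */
static void set_elem(Vec128 *v, int idx, int es, uint64_t val)
{
    for (int i = es - 1; i >= 0; i--) {
        v->b[idx * es + i] = (uint8_t)(val & 0xff);
        val >>= 8;
    }
}

/* Logical unpack high: destination element i (2 * es bytes) is source
 * element i (es bytes), for the leftmost half of the source. Writing
 * destination element i overwrites source slots 2i and 2i + 1; walking
 * backward, later steps only read slots j < i <= 2i, so dst == src is safe. */
static void unpack_logical_high(Vec128 *dst, const Vec128 *src, int es)
{
    const int n = 16 / es;  /* number of source elements */
    for (int i = n / 2 - 1; i >= 0; i--) {
        uint64_t val = get_elem(src, i, es);  /* read before write */
        set_elem(dst, i, 2 * es, val);
    }
}

/* Logical unpack low: destination element i is source element i + n / 2.
 * Walking forward, any source slot j overwritten at step i is either in the
 * left half (never read) or was already read at step j - n / 2 <= i. */
static void unpack_logical_low(Vec128 *dst, const Vec128 *src, int es)
{
    const int n = 16 / es;
    for (int i = 0; i < n / 2; i++) {
        uint64_t val = get_elem(src, i + n / 2, es);  /* read before write */
        set_elem(dst, i, 2 * es, val);
    }
}
```

For example, `unpack_logical_high(&v, &v, 1)` widens the leftmost eight bytes of `v` to halfwords in place, with no temporary register.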
Hmm, as v2 and v3 are handled concatenated, it is not that easy. I am not
sure if we can handle this without a temporary vector.

I thought about first packing them interleaved:

v2 = [v2e0, v2e1]
v3 = [v3e0, v3e1]
v1 = [v2e0_packed, v3e0_packed, v2e1_packed, v3e1_packed]

And then restoring the right order:

v1 = [v2e0_packed, v2e1_packed, v3e0_packed, v3e1_packed]

But then the second operation seems to be the problem. That shuffling
would have to be hard coded as far as I can see. (Shuffling with MO_8 is
nasty -> 14 elements have to be exchanged, in my opinion eventually
needing 14 temporary variables.)

Of course, we can also simply detect when v1 is the same register as v2
or v3 and, if so, call into a helper.

-- 

Thanks,

David / dhildenb
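The fallback suggested above (expand element by element only when v1 differs from both v2 and v3, otherwise go out of line) can be sketched in the same plain-C model. Assumptions: `Vec128`, `get_elem`, `set_elem`, `pack_direct` and `vector_pack` are hypothetical names; the pack is modelled as truncating, with result element i being the rightmost half of element i of the concatenation v2:v3; and the local temporary merely stands in for whatever helper the real code would call. The direct loop is unsafe when v1 == v3, because filling the first half of v1 from v2 clobbers v3's leading elements before they are read.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit vector register, element 0 leftmost. */
typedef struct {
    uint8_t b[16];
} Vec128;

/* Read element idx of size es bytes, big-endian. */
static uint64_t get_elem(const Vec128 *v, int idx, int es)
{
    uint64_t val = 0;
    for (int i = 0; i < es; i++) {
        val = (val << 8) | v->b[idx * es + i];
    }
    return val;
}

/* Write the low es bytes of val into element idx, big-endian. */
static void set_elem(Vec128 *v, int idx, int es, uint64_t val)
{
    for (int i = es - 1; i >= 0; i--) {
        v->b[idx * es + i] = (uint8_t)(val & 0xff);
        val >>= 8;
    }
}

/* Truncating pack of v2:v3 into dst; es is the source element size in
 * bytes (2, 4 or 8). Writes dst directly, so dst must not alias v2 or v3. */
static void pack_direct(Vec128 *dst, const Vec128 *v2, const Vec128 *v3, int es)
{
    const int n = 16 / es;                                /* elements per source */
    const uint64_t mask = (UINT64_C(1) << (es * 4)) - 1;  /* low es / 2 bytes */
    for (int i = 0; i < 2 * n; i++) {
        uint64_t val = i < n ? get_elem(v2, i, es) : get_elem(v3, i - n, es);
        set_elem(dst, i, es / 2, val & mask);
    }
}

/* Fast path when v1 differs from both sources; otherwise build the result
 * in a temporary (standing in for an out-of-line helper) and copy it. */
static void vector_pack(Vec128 *v1, const Vec128 *v2, const Vec128 *v3, int es)
{
    if (v1 != v2 && v1 != v3) {
        pack_direct(v1, v2, v3, es);
    } else {
        Vec128 tmp;
        pack_direct(&tmp, v2, v3, es);
        memcpy(v1, &tmp, sizeof(tmp));
    }
}
```

The overlap check costs one comparison per source register at translation time in the real code, so the common non-overlapping case keeps the simple manual expansion.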