On 2/26/19 3:39 AM, David Hildenbrand wrote:
> +    for (dst_idx = 0; dst_idx < NUM_VEC_ELEMENTS(es); dst_idx++) {
> +        src_idx = dst_idx / 2;
> +        if (!high) {
> +            src_idx += NUM_VEC_ELEMENTS(es) / 2;
> +        }
> +        if (dst_idx % 2 == 0) {
> +            read_vec_element_i64(tmp, v2, src_idx, es);
> +        } else {
> +            read_vec_element_i64(tmp, v3, src_idx, es);
> +        }
> +        write_vec_element_i64(tmp, dst_v, dst_idx, es);
> +    }

Note that you do not need a vector temporary here, so long as you load
both source elements before writing, and you iterate in the proper
direction.

For VMRL, iterate forward as you do now.  The element access order for
MO_32:

	read  v2:  2   3
	read  v3:  2   3
	write v1: 0 1 2 3

For VMRH, iterate backward:

	read  v2:  1   0
	read  v3:  1   0
	write v1: 3 2 1 0

r~
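
For illustration only, here is a minimal model of that ordering in plain C rather than TCG.  The Vec type and the merge_low()/merge_high() helpers are hypothetical stand-ins, not anything in the patch or in QEMU; they assume four 32-bit elements per vector (the MO_32 case) and show only why loading both sources first, combined with the right iteration direction, stays correct when v1 aliases v2 or v3.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ELEMS 4  /* MO_32: four 32-bit elements in a 128-bit vector */

/* Hypothetical stand-in for a vector register; not a QEMU type. */
typedef struct {
    uint32_t e[NUM_ELEMS];
} Vec;

/*
 * Model of VMRL: interleave elements 2..3 of v2 and v3 into v1.
 * Iterates forward.  v1 may alias v2 or v3.
 */
static void merge_low(Vec *v1, const Vec *v2, const Vec *v3)
{
    assert(v1 && v2 && v3);
    for (int i = 0; i < NUM_ELEMS / 2; i++) {
        int src = i + NUM_ELEMS / 2;
        /* Load both source elements before writing anything. */
        uint32_t a = v2->e[src];
        uint32_t b = v3->e[src];
        v1->e[2 * i] = a;
        v1->e[2 * i + 1] = b;
    }
}

/*
 * Model of VMRH: interleave elements 0..1 of v2 and v3 into v1.
 * Iterates backward.  v1 may alias v2 or v3.
 */
static void merge_high(Vec *v1, const Vec *v2, const Vec *v3)
{
    assert(v1 && v2 && v3);
    for (int i = NUM_ELEMS / 2 - 1; i >= 0; i--) {
        /* Load both source elements before writing anything. */
        uint32_t a = v2->e[i];
        uint32_t b = v3->e[i];
        v1->e[2 * i + 1] = b;
        v1->e[2 * i] = a;
    }
}
```

With v2 = {10, 11, 12, 13} and v3 = {20, 21, 22, 23}, calling merge_low(&v2, &v2, &v3) leaves v2 = {12, 22, 13, 23}, and merge_high(&v2, &v2, &v3) on the original inputs leaves v2 = {10, 20, 11, 21}.  In both cases every destination slot is overwritten only after the source element stored in it has been read.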