On 01.03.19 17:26, Richard Henderson wrote:
> On 3/1/19 3:53 AM, David Hildenbrand wrote:
>> +    /*
>> +     * Check for possible access exceptions by trying to load the last
>> +     * element. The first element will be checked first next.
>> +     */
>> +    t = tcg_temp_new_i64();
>> +    gen_addi_and_wrap_i64(s, t, o->addr1, (v3 - v1) * 16 + 8);
>> +    tcg_gen_qemu_ld_i64(t, t, get_mem_index(s), MO_TEQ);
>
> qemu_ld expands to enough code that it is a shame to discard this value and
> reload it during this loop. Perhaps load this to t2...
>
>> +
>> +    for (;; v1++) {
>> +        tcg_gen_qemu_ld_i64(t, o->addr1, get_mem_index(s), MO_TEQ);
>> +        write_vec_element_i64(t, v1, 0, ES_64);
>
> Move v1 == v3 break here...
>
>> +        gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
>> +        tcg_gen_qemu_ld_i64(t, o->addr1, get_mem_index(s), MO_TEQ);
>> +        write_vec_element_i64(t, v1, 1, ES_64);
>> +        if (v1 == v3) {
>> +            break;
>> +        }
>> +        gen_addi_and_wrap_i64(s, o->addr1, o->addr1, 8);
>> +    }
>
> ... and store t2 into v3 element 1 after the loop.
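
To make the suggested restructuring concrete, here is a rough toy model in plain C rather than actual TCG code. Everything in it is a hypothetical stand-in: `mem`, `vreg`, `load64`, `load_range_original` and `load_range_suggested` are made up for illustration, `load64` merely counts what would be one emitted qemu_ld, and the address wrapping done by gen_addi_and_wrap_i64 is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins (hypothetical, not QEMU APIs). */
static uint64_t mem[64];        /* flat memory of 64-bit doublewords */
static uint64_t vreg[32][2];    /* 32 vector registers, two 64-bit elements */
static int load_count;          /* number of loads that would be emitted */

/* Models one emitted 64-bit guest load. */
static uint64_t load64(uint64_t addr)
{
    load_count++;
    return mem[addr / 8];
}

/* Loop as in the quoted patch: probe the last element, then reload it. */
static void load_range_original(unsigned v1, unsigned v3, uint64_t addr)
{
    assert(v1 <= v3);
    (void)load64(addr + (v3 - v1) * 16 + 8);    /* probe, value discarded */
    for (;; v1++) {
        vreg[v1][0] = load64(addr);
        addr += 8;
        vreg[v1][1] = load64(addr);
        if (v1 == v3) {
            break;
        }
        addr += 8;
    }
}

/* Loop with both review suggestions applied: keep the probed value in t2. */
static void load_range_suggested(unsigned v1, unsigned v3, uint64_t addr)
{
    assert(v1 <= v3);
    uint64_t t2 = load64(addr + (v3 - v1) * 16 + 8);    /* probe, value kept */
    for (;; v1++) {
        vreg[v1][0] = load64(addr);
        if (v1 == v3) {
            break;              /* element 1 of v3 is already in t2 */
        }
        addr += 8;
        vreg[v1][1] = load64(addr);
        addr += 8;
    }
    vreg[v3][1] = t2;           /* store t2 into v3 element 1 after the loop */
}
```

In this model a range of n registers costs 2n + 1 loads with the original loop and 2n with the suggested one, and the register contents end up the same.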

Yes, makes perfect sense, thanks!

-- 

Thanks,

David / dhildenb