On 10/12/2016 07:21 PM, David Gibson wrote:
+static void gen_bswap32x4(TCGv_i64 outh, TCGv_i64 outl,
+                          TCGv_i64 inh, TCGv_i64 inl)
+    TCGv_i64 hi = tcg_temp_new_i64();
+    TCGv_i64 lo = tcg_temp_new_i64();
+    tcg_gen_bswap64_i64(hi, inh);
+    tcg_gen_bswap64_i64(lo, inl);
+    tcg_gen_shri_i64(outh, hi, 32);
+    tcg_gen_deposit_i64(outh, outh, hi, 32, 32);
+    tcg_gen_shri_i64(outl, lo, 32);
+    tcg_gen_deposit_i64(outl, outl, lo, 32, 32);
+    tcg_temp_free_i64(hi);
+    tcg_temp_free_i64(lo);

Is there actually any advantage to having this 128-bit operation
working on two 64-bit "register"s, as opposed to having a bswap32x2
that operates on a single 64-bit register amd calling it twice?

For this one, no particular advantage.

+    gen_bswap16x8(xth, xtl, xbh, xbl);

Likewise for the 16x8 version, I guess, although that would mean
changing the existing users.

For this one, we have to build a 64-bit constant, 0x00ff00ff00ff00ff. On some hosts that's up to 6 insns. Being about to reuse that for both swaps is useful.

+    tcg_gen_bswap64_i64(t0, xbl);
+    tcg_gen_bswap64_i64(xtl, xbh);
+    tcg_gen_bswap64_i64(xth, t0);

This looks wrong.  You swap xbl as you move it to t0, then swap it
again as you put it back into xth.  So it looks like you'll translate
           0011223344556677 8899AABBCCDDEEFF
           8899AABBCCDDEEFF 7766554433221100
whereas it should become
           FFEEDDCCBBAA9977 7766554433221100

Indeed, the third line should be a move, not a swap.


Reply via email to