Re: [Mesa-dev] [PATCH 2/4] gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_auto

2017-01-04 Thread Jose Fonseca

On 21/12/16 04:01, srol...@vmware.com wrote:

From: Roland Scheidegger 

If we only feed one source vector at a time, we cannot use pack intrinsics
(as we only have a 64bit destination vector). lp_build_conv_auto is
specifically designed to alter the length and number of destination vectors,
so this works just fine (when we feed single source vectors one at a time, we
immediately reassemble the vectors afterwards anyway).
For AVX though this isn't really possible, since we already expect 128bit
output for a single 256bit input. (One day we should handle AVX2, which again
would need multiple inputs. However, there's the problem that we get
differently ordered output there and we don't want to reorder, so we would
need to be able to tell build_conv to handle the upper and lower halves
independently.)
A similar strategy would probably work for 32->8bit too (if it doesn't hit
the special case) but I'm going to try something different for that...
---
 src/gallium/auxiliary/gallivm/lp_bld_conv.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_conv.c b/src/gallium/auxiliary/gallivm/lp_bld_conv.c
index 69d24a5..c8f9c28 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_conv.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_conv.c
@@ -497,8 +497,25 @@ int lp_build_conv_auto(struct gallivm_state *gallivm,
if (src_type.width == dst_type->width) {
   lp_build_conv(gallivm, src_type, *dst_type, src, num_srcs, dst, num_dsts);
} else {
-  for (i = 0; i < num_srcs; ++i) {
-     lp_build_conv(gallivm, src_type, *dst_type, &src[i], 1, &dst[i], 1);
+  /*
+   * If dst_width is 16 bits and src_width 32 and the dst vector size
+   * 64bit, try feeding 2 vectors at once so pack intrinsics can be used.
+   * (For AVX, this isn't needed, since we usually get 256bit src and
+   * 128bit dst vectors which works ok. If we do AVX2 pack this should
+   * be extended but need to be able to tell conversion code about pack
+   * ordering first.)
+   */
+  unsigned ratio = 1;
+  if (src_type.width == 2 * dst_type->width &&
+  src_type.length == dst_type->length &&
+  dst_type->floating == 0 && (num_srcs % 2 == 0) &&
+  dst_type->width * dst_type->length == 64) {
+ ratio = 2;
+ num_dsts /= 2;
+ dst_type->length *= 2;


Should this be inside lp_build_conv?


+  }
+  for (i = 0; i < num_dsts; i++) {
+     lp_build_conv(gallivm, src_type, *dst_type, &src[i*ratio], ratio, &dst[i], 1);
   }
}
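
To make the grouping in this hunk easier to follow, here is a minimal
standalone sketch of the ratio selection. It is not Mesa code: struct vec_type
and pick_src_ratio are hypothetical names that model only the lp_type fields
the condition reads, and the sketch assumes num_dsts equals num_srcs on entry
(one destination per source, as in the loop being replaced).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few lp_type fields the check reads. */
struct vec_type {
   unsigned width;    /* bits per element */
   unsigned length;   /* elements per vector */
   bool floating;     /* true for float element types */
};

/*
 * Mirrors the condition in the patch: returns how many source vectors are
 * fed to each conversion call (1 or 2). In the paired case it halves
 * *num_dsts and doubles dst->length, as the patch does.
 */
static unsigned
pick_src_ratio(struct vec_type src, struct vec_type *dst,
               unsigned num_srcs, unsigned *num_dsts)
{
   /* Assumption of this sketch: one destination per source on entry. */
   assert(*num_dsts == num_srcs);

   if (src.width == 2 * dst->width &&
       src.length == dst->length &&
       !dst->floating && num_srcs % 2 == 0 &&
       dst->width * dst->length == 64) {
      *num_dsts /= 2;
      dst->length *= 2;
      return 2;
   }
   return 1;
}
```

With four 4x32bit sources and a 4x16bit destination type, this picks a ratio
of 2 and leaves two 8x16bit destinations. With a 256bit source and a 128bit
destination (the AVX case from the comment), the 64bit test fails and the
ratio stays 1.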





Reviewed-by: Jose Fonseca 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_auto

2016-12-20 Thread sroland
From: Roland Scheidegger 

If we only feed one source vector at a time, we cannot use pack intrinsics
(as we only have a 64bit destination vector). lp_build_conv_auto is
specifically designed to alter the length and number of destination vectors,
so this works just fine (when we feed single source vectors one at a time, we
immediately reassemble the vectors afterwards anyway).
For AVX though this isn't really possible, since we already expect 128bit
output for a single 256bit input. (One day we should handle AVX2, which again
would need multiple inputs. However, there's the problem that we get
differently ordered output there and we don't want to reorder, so we would
need to be able to tell build_conv to handle the upper and lower halves
independently.)
A similar strategy would probably work for 32->8bit too (if it doesn't hit
the special case) but I'm going to try something different for that...
---
 src/gallium/auxiliary/gallivm/lp_bld_conv.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_conv.c b/src/gallium/auxiliary/gallivm/lp_bld_conv.c
index 69d24a5..c8f9c28 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_conv.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_conv.c
@@ -497,8 +497,25 @@ int lp_build_conv_auto(struct gallivm_state *gallivm,
if (src_type.width == dst_type->width) {
   lp_build_conv(gallivm, src_type, *dst_type, src, num_srcs, dst, num_dsts);
} else {
-  for (i = 0; i < num_srcs; ++i) {
-     lp_build_conv(gallivm, src_type, *dst_type, &src[i], 1, &dst[i], 1);
+  /*
+   * If dst_width is 16 bits and src_width 32 and the dst vector size
+   * 64bit, try feeding 2 vectors at once so pack intrinsics can be used.
+   * (For AVX, this isn't needed, since we usually get 256bit src and
+   * 128bit dst vectors which works ok. If we do AVX2 pack this should
+   * be extended but need to be able to tell conversion code about pack
+   * ordering first.)
+   */
+  unsigned ratio = 1;
+  if (src_type.width == 2 * dst_type->width &&
+  src_type.length == dst_type->length &&
+  dst_type->floating == 0 && (num_srcs % 2 == 0) &&
+  dst_type->width * dst_type->length == 64) {
+ ratio = 2;
+ num_dsts /= 2;
+ dst_type->length *= 2;
+  }
+  for (i = 0; i < num_dsts; i++) {
+     lp_build_conv(gallivm, src_type, *dst_type, &src[i*ratio], ratio, &dst[i], 1);
   }
}
 
-- 
2.7.4
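
For readers less familiar with the pack intrinsics mentioned in the commit
message, the following is a scalar model of a two-source 32->16bit pack with
signed saturation, in the style of SSE2 packssdw. It is only an illustration:
saturate_s16 and pack_2x4x32_to_8x16 are hypothetical names, and which pack
variant (signed or unsigned) the conversion code ends up emitting is not
stated in the patch. The point is that one 128bit result needs two 128bit
sources, which is why a single source vector is not enough.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit value into the int16_t range (signed saturation). */
static int16_t
saturate_s16(int32_t v)
{
   if (v > INT16_MAX)
      return INT16_MAX;
   if (v < INT16_MIN)
      return INT16_MIN;
   return (int16_t)v;
}

/*
 * Scalar model of a two-source pack: the four elements of lo fill result
 * elements 0..3 and the four elements of hi fill result elements 4..7.
 */
static void
pack_2x4x32_to_8x16(const int32_t lo[4], const int32_t hi[4], int16_t out[8])
{
   assert(lo && hi && out);

   for (unsigned i = 0; i < 4; i++) {
      out[i] = saturate_s16(lo[i]);
      out[i + 4] = saturate_s16(hi[i]);
   }
}
```

The 256bit AVX2 form of this instruction packs within each 128bit lane
separately, so the output elements come out interleaved by lane rather than in
source order. That is the ordering problem the commit message refers to.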
