Re: [PATCH] D20782: [AVX512] Emit generic masked store intrinsics directly from clang instead of using x86 specific intrinsics.

2016-05-31 Thread Craig Topper via cfe-commits
craig.topper closed this revision.
craig.topper added a comment.

Committed in r271246.


http://reviews.llvm.org/D20782





Re: [PATCH] D20782: [AVX512] Emit generic masked store intrinsics directly from clang instead of using x86 specific intrinsics.

2016-05-30 Thread Elena Demikhovsky via cfe-commits
delena accepted this revision.
delena added a comment.
This revision is now accepted and ready to land.

LGTM after the test cleanup.


http://reviews.llvm.org/D20782





Re: [PATCH] D20782: [AVX512] Emit generic masked store intrinsics directly from clang instead of using x86 specific intrinsics.

2016-05-30 Thread Craig Topper via cfe-commits
craig.topper added inline comments.


Comment at: lib/CodeGen/CGBuiltin.cpp:6304
@@ +6303,3 @@
+  Indices[i] = i;
+Ops[2] = CGF.Builder.CreateShuffleVector(Ops[2], Ops[2],
+ makeArrayRef(Indices, NumElts),

delena wrote:
> craig.topper wrote:
> > delena wrote:
> > > What code do you get in the end? There is no shuffle instruction in the architecture for mask vectors.
> > That's not really a shuffle. It's an extract subvector, but the IR doesn't have a real instruction for that.
> > 
> > It's needed so we can go from i8 -> v8i1 -> v2i1/v4i1.
> I understand. I just wanted to be sure that you get only one "kmov %edi, %k1" in the end.
Yes, only one "kmov %edi, %k1" was generated.


http://reviews.llvm.org/D20782





Re: [PATCH] D20782: [AVX512] Emit generic masked store intrinsics directly from clang instead of using x86 specific intrinsics.

2016-05-30 Thread Elena Demikhovsky via cfe-commits
delena added inline comments.


Comment at: lib/CodeGen/CGBuiltin.cpp:6304
@@ +6303,3 @@
+  Indices[i] = i;
+Ops[2] = CGF.Builder.CreateShuffleVector(Ops[2], Ops[2],
+ makeArrayRef(Indices, NumElts),

craig.topper wrote:
> delena wrote:
> > What code do you get in the end? There is no shuffle instruction in the architecture for mask vectors.
> That's not really a shuffle. It's an extract subvector, but the IR doesn't have a real instruction for that.
> 
> It's needed so we can go from i8 -> v8i1 -> v2i1/v4i1.
I understand. I just wanted to be sure that you get only one "kmov %edi, %k1" in the end.


http://reviews.llvm.org/D20782





Re: [PATCH] D20782: [AVX512] Emit generic masked store intrinsics directly from clang instead of using x86 specific intrinsics.

2016-05-29 Thread Craig Topper via cfe-commits
craig.topper added inline comments.


Comment at: lib/CodeGen/CGBuiltin.cpp:6304
@@ +6303,3 @@
+  Indices[i] = i;
+Ops[2] = CGF.Builder.CreateShuffleVector(Ops[2], Ops[2],
+ makeArrayRef(Indices, NumElts),

delena wrote:
> What code do you get in the end? There is no shuffle instruction in the architecture for mask vectors.
That's not really a shuffle. It's an extract subvector, but the IR doesn't have a real instruction for that.

It's needed so we can go from i8 -> v8i1 -> v2i1/v4i1.
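
[Editor's note] For readers following the exchange, here is a minimal standalone sketch of the pattern being discussed, assuming the LLVM 3.9-era IRBuilder API; the helper names getMaskVec and emitGenericMaskedStore are illustrative only and are not the exact CGBuiltin.cpp helpers from the patch:

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/IR/IRBuilder.h"

  using namespace llvm;

  // i8 mask -> <8 x i1> via bitcast; for NumElts < 8, a shufflevector keeps
  // only the low lanes. That shuffle is the "extract subvector" in disguise,
  // since the IR has no dedicated instruction for it.
  static Value *getMaskVec(IRBuilder<> &Builder, Value *Mask, unsigned NumElts) {
    Value *MaskVec = Builder.CreateBitCast(
        Mask, VectorType::get(Builder.getInt1Ty(), 8));
    if (NumElts < 8) {
      uint32_t Indices[4];
      for (unsigned i = 0; i != NumElts; ++i)
        Indices[i] = i;
      MaskVec = Builder.CreateShuffleVector(MaskVec, MaskVec,
                                            makeArrayRef(Indices, NumElts));
    }
    return MaskVec;
  }

  // Emit the target-independent llvm.masked.store intrinsic instead of an
  // x86-specific one; the backend then materializes the mask with a single
  // kmov and selects a masked vector store.
  static Value *emitGenericMaskedStore(IRBuilder<> &Builder, Value *Data,
                                       Value *Ptr, Value *Mask, unsigned Align) {
    unsigned NumElts = Data->getType()->getVectorNumElements();
    Ptr = Builder.CreateBitCast(Ptr, PointerType::getUnqual(Data->getType()));
    Value *MaskVec = getMaskVec(Builder, Mask, NumElts);
    return Builder.CreateMaskedStore(Data, Ptr, Align, MaskVec);
  }

The shuffle only narrows the v8i1 mask to v2i1/v4i1 at the IR level; it does not survive instruction selection, which is why a single "kmov %edi, %k1" is expected in the final code.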


Comment at: test/CodeGen/avx512f-builtins.c:123
@@ -122,2 +122,3 @@
   // CHECK-LABEL: @test_mm512_storeu_si512 
-  // CHECK: @llvm.x86.avx512.mask.storeu.d.512
+  // CHECK: store <16 x i32> %5, <16 x i32>* %6, align 1
+  // CHECK-NEXT: ret void

delena wrote:
> I suggest removing %5 and %6 from the test; you can use something like this:
> CHECK: store <16 x i32> {{.*}}, align 1
I'll clean that up. I fixed most of them, but it looks like I missed a few.


http://reviews.llvm.org/D20782





Re: [PATCH] D20782: [AVX512] Emit generic masked store intrinsics directly from clang instead of using x86 specific intrinsics.

2016-05-29 Thread Elena Demikhovsky via cfe-commits
delena added inline comments.


Comment at: lib/CodeGen/CGBuiltin.cpp:6304
@@ +6303,3 @@
+  Indices[i] = i;
+Ops[2] = CGF.Builder.CreateShuffleVector(Ops[2], Ops[2],
+ makeArrayRef(Indices, NumElts),

What code do you get in the end? There is no shuffle instruction in the architecture for mask vectors.


Comment at: test/CodeGen/avx512f-builtins.c:123
@@ -122,2 +122,3 @@
   // CHECK-LABEL: @test_mm512_storeu_si512 
-  // CHECK: @llvm.x86.avx512.mask.storeu.d.512
+  // CHECK: store <16 x i32> %5, <16 x i32>* %6, align 1
+  // CHECK-NEXT: ret void

I suggest removing %5 and %6 from the test; you can use something like this:
CHECK: store <16 x i32> {{.*}}, align 1


http://reviews.llvm.org/D20782





[PATCH] D20782: [AVX512] Emit generic masked store intrinsics directly from clang instead of using x86 specific intrinsics.

2016-05-29 Thread Craig Topper via cfe-commits
craig.topper created this revision.
craig.topper added reviewers: delena, RKSimon, AsafBadouh, igorb, m_zuckerman.
craig.topper added a subscriber: cfe-commits.

This will allow us to remove the x86 specific intrinsics from the backend.

http://reviews.llvm.org/D20782

Files:
  lib/CodeGen/CGBuiltin.cpp
  test/CodeGen/avx512bw-builtins.c
  test/CodeGen/avx512f-builtins.c
  test/CodeGen/avx512vl-builtins.c
  test/CodeGen/avx512vlbw-builtins.c

Index: test/CodeGen/avx512vlbw-builtins.c
===
--- test/CodeGen/avx512vlbw-builtins.c
+++ test/CodeGen/avx512vlbw-builtins.c
@@ -2055,25 +2055,25 @@
 
 void test_mm_mask_storeu_epi16(void *__P, __mmask8 __U, __m128i __A) {
   // CHECK-LABEL: @test_mm_mask_storeu_epi16
-  // CHECK: @llvm.x86.avx512.mask.storeu.w.128
+  // CHECK: @llvm.masked.store.v8i16(<8 x i16> %{{.*}}, <8 x i16>* %{{.*}}, i32 1, <8 x i1> %{{.*}})
   return _mm_mask_storeu_epi16(__P, __U, __A); 
 }
 
 void test_mm256_mask_storeu_epi16(void *__P, __mmask16 __U, __m256i __A) {
   // CHECK-LABEL: @test_mm256_mask_storeu_epi16
-  // CHECK: @llvm.x86.avx512.mask.storeu.w.256
+  // CHECK: @llvm.masked.store.v16i16(<16 x i16> %{{.*}}, <16 x i16>* %{{.*}}, i32 1, <16 x i1> %{{.*}})
   return _mm256_mask_storeu_epi16(__P, __U, __A); 
 }
 
 void test_mm_mask_storeu_epi8(void *__P, __mmask16 __U, __m128i __A) {
   // CHECK-LABEL: @test_mm_mask_storeu_epi8
-  // CHECK: @llvm.x86.avx512.mask.storeu.b.128
+  // CHECK: @llvm.masked.store.v16i8(<16 x i8> %{{.*}}, <16 x i8>* %{{.*}}, i32 1, <16 x i1> %{{.*}})
   return _mm_mask_storeu_epi8(__P, __U, __A); 
 }
 
 void test_mm256_mask_storeu_epi8(void *__P, __mmask32 __U, __m256i __A) {
   // CHECK-LABEL: @test_mm256_mask_storeu_epi8
-  // CHECK: @llvm.x86.avx512.mask.storeu.b.256
+  // CHECK: @llvm.masked.store.v32i8(<32 x i8> %{{.*}}, <32 x i8>* %{{.*}}, i32 1, <32 x i1> %{{.*}})
   return _mm256_mask_storeu_epi8(__P, __U, __A); 
 }
 __mmask16 test_mm_test_epi8_mask(__m128i __A, __m128i __B) {
Index: test/CodeGen/avx512vl-builtins.c
===
--- test/CodeGen/avx512vl-builtins.c
+++ test/CodeGen/avx512vl-builtins.c
@@ -3935,13 +3935,13 @@
 
 void test_mm_mask_store_epi32(void *__P, __mmask8 __U, __m128i __A) {
   // CHECK-LABEL: @test_mm_mask_store_epi32
-  // CHECK: @llvm.x86.avx512.mask.store.d.128
+  // CHECK: @llvm.masked.store.v4i32(<4 x i32> %{{.*}}, <4 x i32>* %{{.}}, i32 16, <4 x i1> %{{.*}})
   return _mm_mask_store_epi32(__P, __U, __A); 
 }
 
 void test_mm256_mask_store_epi32(void *__P, __mmask8 __U, __m256i __A) {
   // CHECK-LABEL: @test_mm256_mask_store_epi32
-  // CHECK: @llvm.x86.avx512.mask.store.d.256
+  // CHECK: @llvm.masked.store.v8i32(<8 x i32> %{{.*}}, <8 x i32>* %{{.}}, i32 32, <8 x i1> %{{.*}})
   return _mm256_mask_store_epi32(__P, __U, __A); 
 }
 
@@ -4043,13 +4043,13 @@
 
 void test_mm_mask_store_epi64(void *__P, __mmask8 __U, __m128i __A) {
   // CHECK-LABEL: @test_mm_mask_store_epi64
-  // CHECK: @llvm.x86.avx512.mask.store.q.128
+  // CHECK: @llvm.masked.store.v2i64(<2 x i64> %{{.*}}, <2 x i64>* %{{.*}}, i32 16, <2 x i1> %{{.*}})
   return _mm_mask_store_epi64(__P, __U, __A); 
 }
 
 void test_mm256_mask_store_epi64(void *__P, __mmask8 __U, __m256i __A) {
   // CHECK-LABEL: @test_mm256_mask_store_epi64
-  // CHECK: @llvm.x86.avx512.mask.store.q.256
+  // CHECK: @llvm.masked.store.v4i64(<4 x i64> %{{.*}}, <4 x i64>* %{{.*}}, i32 32, <4 x i1> %{{.*}})
   return _mm256_mask_store_epi64(__P, __U, __A); 
 }
 
@@ -4343,73 +4343,73 @@
 
 void test_mm_mask_store_pd(void *__P, __mmask8 __U, __m128d __A) {
   // CHECK-LABEL: @test_mm_mask_store_pd
-  // CHECK: @llvm.x86.avx512.mask.store.pd.128
+  // CHECK: @llvm.masked.store.v2f64(<2 x double> %{{.*}}, <2 x double>* %{{.*}}, i32 16, <2 x i1> %{{.*}})
   return _mm_mask_store_pd(__P, __U, __A); 
 }
 
 void test_mm256_mask_store_pd(void *__P, __mmask8 __U, __m256d __A) {
   // CHECK-LABEL: @test_mm256_mask_store_pd
-  // CHECK: @llvm.x86.avx512.mask.store.pd.256
+  // CHECK: @llvm.masked.store.v4f64(<4 x double> %{{.*}}, <4 x double>* %{{.*}}, i32 32, <4 x i1> %{{.*}})
   return _mm256_mask_store_pd(__P, __U, __A); 
 }
 
 void test_mm_mask_store_ps(void *__P, __mmask8 __U, __m128 __A) {
   // CHECK-LABEL: @test_mm_mask_store_ps
-  // CHECK: @llvm.x86.avx512.mask.store.ps.128
+  // CHECK: @llvm.masked.store.v4f32(<4 x float> %{{.*}}, <4 x float>* %{{.*}}, i32 16, <4 x i1> %{{.*}})
   return _mm_mask_store_ps(__P, __U, __A); 
 }
 
 void test_mm256_mask_store_ps(void *__P, __mmask8 __U, __m256 __A) {
   // CHECK-LABEL: @test_mm256_mask_store_ps
-  // CHECK: @llvm.x86.avx512.mask.store.ps.256
+  // CHECK: @llvm.masked.store.v8f32(<8 x float> %{{.*}}, <8 x float>* %{{.*}}, i32 32, <8 x i1> %{{.*}})
   return _mm256_mask_store_ps(__P, __U, __A); 
 }
 
 void test_mm_mask_storeu_epi64(void *__P, __mmask8 __U, __m128i __A) {
   // CHECK-LABEL: @test_mm_mask_storeu_epi64
-  // CHECK: