Re: [PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)

2019-11-22 Thread HAPPY Mahto via cfe-commits
Hello Michael, a very good morning to you.

On Wed, Nov 20, 2019 at 10:58 PM Michael Kruse  wrote:

> Am Mi., 20. Nov. 2019 um 10:21 Uhr schrieb HAPPY Mahto
> :
> >> #pragma clang loop vectorize_assume_alignment(32)
> >> for(int i = 0;i < n; i++){
> >> a[i] = b[i] + i*i;
> >> }
> >
> > For this, all accesses inside the loop will be aligned to 32 bytes;
> > for example, the IR:
> >>
> >> for.cond: ; preds = %for.inc,
> %entry
> >>   %5 = load i32, i32* %i, align 32, !llvm.access.group !2
> >>   %6 = load i32, i32* %n, align 32, !llvm.access.group !2
> >>   %cmp = icmp slt i32 %5, %6
> >>   br i1 %cmp, label %for.body, label %for.end
> >>
> >> for.body: ; preds = %for.cond
> >>   %7 = load i32, i32* %i, align 32, !llvm.access.group !2
> >>   %8 = load i32, i32* %i, align 32, !llvm.access.group !2
> >>   %idxprom = sext i32 %8 to i64
> >>   %arrayidx = getelementptr inbounds i32, i32* %vla1, i64 %idxprom
> >>   store i32 %7, i32* %arrayidx, align 32, !llvm.access.group !2
> >>   br label %for.inc
> >>
> >> for.inc:  ; preds = %for.body
> >>   %9 = load i32, i32* %i, align 32, !llvm.access.group !2
> >>   %inc = add nsw i32 %9, 1
> >>   store i32 %inc, i32* %i, align 32, !llvm.access.group !2
> >>   br label %for.cond, !llvm.loop !3
> >
> > You will not need to create pointers for every array (or operand you want
> > to perform the operation on).
>
> IMHO it is better if the programmer has to. It is not always obvious
> which arrays are used in the loop. Also, the information can be used
> by other optimizations than the vectorizer.
>
We wrote this pragma keeping in mind that the arrays inside important
loops are used for vectorization, and it is good for them to be accessed
in an aligned manner. For example:
for( int i = 0; i < n; i++){
  a[i] = b[i] + c[i];
  d[i] = e[i] + i*i;
}
If more than 4-5 arrays are used inside the loop, it is extra effort to
wrap each of them in __builtin_assume_aligned; we are thinking of letting
the pragma handle that itself. It would be very helpful if other people
from the community could give their views on this.
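To make the trade-off concrete, here is a minimal sketch of what the programmer has to write today without the proposed pragma: one `__builtin_assume_aligned` wrapper per pointer used in the loop (the function and variable names here are illustrative, not from the patch):

```c
/* Without the pragma, every pointer touched in the loop needs its own
 * __builtin_assume_aligned call for the optimizer to see the alignment. */
void sum_loops(float *a, float *b, float *c, float *d, float *e, int n) {
    float *aa = (float *)__builtin_assume_aligned(a, 32);
    float *ab = (float *)__builtin_assume_aligned(b, 32);
    float *ac = (float *)__builtin_assume_aligned(c, 32);
    float *ad = (float *)__builtin_assume_aligned(d, 32);
    float *ae = (float *)__builtin_assume_aligned(e, 32);
    for (int i = 0; i < n; i++) {
        aa[i] = ab[i] + ac[i];
        ad[i] = ae[i] + (float)(i * i);
    }
}
```

With the proposed pragma, the five wrapper declarations would collapse into a single annotation above the loop.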


> >>
> >> void mult(float* x, int size, float factor){
> >>   float* ax = (float*)__builtin_assume_aligned(x, 64);
> >>   for (int i = 0; i < size; ++i)
> >>  ax[i] *= factor;
> >> }
>
> https://godbolt.org/z/Fd6HMe
>
> > the alignment is assumed whereas in #pragma it is set to the number
> specified.
>
> Semantically, it is the same.
>
> I wonder how you expect the assembly output to change? The
> __builtin_assume_aligned will be picked up by the backend and result
> in movaps being used instead of movups.
>
>

> > It'll be easier, and having a pragma for doing this will help, as the
> > same is provided in OpenMP and Intel compilers.
>
> This is a compiler-specific extension. It does not have an influence
> on what other compilers do. Even with clang, if you try to do
>
> #pragma clang loop vectorize_assume_alignment(32)
> #pragma omp simd
> for (int i = 0; i < size; ++i)
>
> clang will silently swallow the vectorize_assume_alignment.
>
>
> Michael
>
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)

2019-11-20 Thread Michael Kruse via cfe-commits
Am Mi., 20. Nov. 2019 um 10:21 Uhr schrieb HAPPY Mahto
:
>> #pragma clang loop vectorize_assume_alignment(32)
>> for(int i = 0;i < n; i++){
>> a[i] = b[i] + i*i;
>> }
>
> For this, all accesses inside the loop will be aligned to 32 bytes;
> for example, the IR:
>>
>> for.cond: ; preds = %for.inc, %entry
>>   %5 = load i32, i32* %i, align 32, !llvm.access.group !2
>>   %6 = load i32, i32* %n, align 32, !llvm.access.group !2
>>   %cmp = icmp slt i32 %5, %6
>>   br i1 %cmp, label %for.body, label %for.end
>>
>> for.body: ; preds = %for.cond
>>   %7 = load i32, i32* %i, align 32, !llvm.access.group !2
>>   %8 = load i32, i32* %i, align 32, !llvm.access.group !2
>>   %idxprom = sext i32 %8 to i64
>>   %arrayidx = getelementptr inbounds i32, i32* %vla1, i64 %idxprom
>>   store i32 %7, i32* %arrayidx, align 32, !llvm.access.group !2
>>   br label %for.inc
>>
>> for.inc:  ; preds = %for.body
>>   %9 = load i32, i32* %i, align 32, !llvm.access.group !2
>>   %inc = add nsw i32 %9, 1
>>   store i32 %inc, i32* %i, align 32, !llvm.access.group !2
>>   br label %for.cond, !llvm.loop !3
>
> You will not need to create pointers for every array (or operand you want to
> perform the operation on).

IMHO it is better if the programmer has to. It is not always obvious
which arrays are used in the loop. Also, the information can be used
by other optimizations than the vectorizer.


>>
>> void mult(float* x, int size, float factor){
>>   float* ax = (float*)__builtin_assume_aligned(x, 64);
>>   for (int i = 0; i < size; ++i)
>>  ax[i] *= factor;
>> }

https://godbolt.org/z/Fd6HMe

> the alignment is assumed whereas in #pragma it is set to the number specified.

Semantically, it is the same.

I wonder how you expect the assembly output to change? The
__builtin_assume_aligned will be picked up by the backend and result
in movaps being used instead of movups.
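A minimal sketch of the backend effect described here: two otherwise identical functions, one with an alignment promise, which a compiler may vectorize with aligned moves (movaps) instead of unaligned ones (movups) on x86. The function names are illustrative; compile with, e.g., -O2 and compare the generated assembly:

```c
/* Without an alignment promise the vectorizer must use unaligned
 * accesses (or emit a runtime alignment check). */
void scale_unaligned(float *x, int n, float f) {
    for (int i = 0; i < n; i++)
        x[i] *= f;
}

/* With the promise, the backend is free to emit aligned vector moves. */
void scale_aligned(float *x, int n, float f) {
    float *ax = (float *)__builtin_assume_aligned(x, 16);
    for (int i = 0; i < n; i++)
        ax[i] *= f;
}
```

Both functions compute the same result; only the alignment information available to the optimizer differs.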


> It'll be easier, and having a pragma for doing this will help, as the
> same is provided in OpenMP and Intel compilers.

This is a compiler-specific extension. It does not have an influence
on what other compilers do. Even with clang, if you try to do

#pragma clang loop vectorize_assume_alignment(32)
#pragma omp simd
for (int i = 0; i < size; ++i)

clang will silently swallow the vectorize_assume_alignment.


Michael


Re: [PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)

2019-11-20 Thread HAPPY Mahto via cfe-commits
Hello Michael,
Very sorry for the late reply; we had exams and assignments this week, and I
had to read about __builtin_assume_aligned as I hadn't come across it before.

#pragma clang loop vectorize_assume_alignment(32)
> for(int i = 0;i < n; i++){
> a[i] = b[i] + i*i;
> }
>
For this, all accesses inside the loop will be aligned to 32 bytes;
for example, the IR:

> for.cond: ; preds = %for.inc,
> %entry
>   %5 = load i32, i32* %i, align 32, !llvm.access.group !2
>   %6 = load i32, i32* %n, align 32, !llvm.access.group !2
>   %cmp = icmp slt i32 %5, %6
>   br i1 %cmp, label %for.body, label %for.end
>
> for.body: ; preds = %for.cond
>   %7 = load i32, i32* %i, align 32, !llvm.access.group !2
>   %8 = load i32, i32* %i, align 32, !llvm.access.group !2
>   %idxprom = sext i32 %8 to i64
>   %arrayidx = getelementptr inbounds i32, i32* %vla1, i64 %idxprom
>   store i32 %7, i32* %arrayidx, align 32, !llvm.access.group !2
>   br label %for.inc
>
> for.inc:  ; preds = %for.body
>   %9 = load i32, i32* %i, align 32, !llvm.access.group !2
>   %inc = add nsw i32 %9, 1
>   store i32 %inc, i32* %i, align 32, !llvm.access.group !2
>   br label %for.cond, !llvm.loop !3
>
You will not need to create pointers for every array (or operand you want to
perform the operation on).

> void mult(float* x, int size, float factor){
>   float* ax = (float*)__builtin_assume_aligned(x, 64);
>   for (int i = 0; i < size; ++i)
>  ax[i] *= factor;
> }
>
The IR generated for this:

> define void @mult(i32*, i32, float) #0 {
>   %4 = alloca i32*, align 8
>   %5 = alloca i32, align 4
>   %6 = alloca float, align 4
>   %7 = alloca i32*, align 8
>   %8 = alloca i32, align 4
>   store i32* %0, i32** %4, align 8
>   store i32 %1, i32* %5, align 4
>   store float %2, float* %6, align 4
>   %9 = load i32*, i32** %4, align 8
>   %10 = bitcast i32* %9 to i8*
>   %11 = ptrtoint i8* %10 to i64
>   %12 = and i64 %11, 63
>   %13 = icmp eq i64 %12, 0
>   call void @llvm.assume(i1 %13)
>   %14 = bitcast i8* %10 to i32*
>   store i32* %14, i32** %7, align 8
>   store i32 0, i32* %8, align 4
>   br label %15
>
> ; :15: ; preds = %29, %3
>   %16 = load i32, i32* %8, align 4
>   %17 = load i32, i32* %5, align 4
>   %18 = icmp slt i32 %16, %17
>   br i1 %18, label %19, label %32
>
> ; :19: ; preds = %15
>   %20 = load float, float* %6, align 4
>   %21 = load i32*, i32** %7, align 8
>   %22 = load i32, i32* %8, align 4
>   %23 = sext i32 %22 to i64
>   %24 = getelementptr inbounds i32, i32* %21, i64 %23
>   %25 = load i32, i32* %24, align 4
>   %26 = sitofp i32 %25 to float
>   %27 = fmul float %26, %20
>   %28 = fptosi float %27 to i32
>   store i32 %28, i32* %24, align 4
>   br label %29
>
> ; :29: ; preds = %19
>   %30 = load i32, i32* %8, align 4
>   %31 = add nsw i32 %30, 1
>   store i32 %31, i32* %8, align 4
>   br label %15
>
> ; :32: ; preds = %15
>   ret void
> }
>
Here the alignment is assumed, whereas with the #pragma it is set to the
number specified.
It'll be easier, and having a pragma for doing this will help, as the same
is provided in OpenMP and Intel compilers.
Thank you; if I made any mistake, please tell me.

Happy Mahto
CSE Undergrad, IIT Hyderabad


On Thu, Nov 14, 2019 at 10:32 PM Michael Kruse via Phabricator <
revi...@reviews.llvm.org> wrote:

> Meinersbur added a comment.
>
> Could you elaborate why this is better than `__builtin_assume_aligned`?
>
>
> Repository:
>   rG LLVM Github Monorepo
>
> CHANGES SINCE LAST ACTION
>   https://reviews.llvm.org/D69897/new/
>
> https://reviews.llvm.org/D69897
>


[PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)

2019-11-14 Thread Michael Kruse via Phabricator via cfe-commits
Meinersbur added a comment.

Could you elaborate why this is better than `__builtin_assume_aligned`?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69897/new/

https://reviews.llvm.org/D69897





[PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)

2019-11-13 Thread Roman Lebedev via Phabricator via cfe-commits
lebedev.ri added a comment.

To reword: if `vectorize_assume_alignment(32)` is *NOT* lowered
via `CodeGenFunction::EmitAlignmentAssumption()` in the clang frontend,
but instead lowered into `LoopAttributes::VectorizeAssumeAlignment`,
then, if the specified alignment does not match reality,
there will be a miscompile instead of an UBSan error.
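A short sketch of the distinction being drawn here, under the assumption that the function below is only illustrative: because `__builtin_assume_aligned` is lowered through the frontend's alignment-assumption path, a promise that does not match reality is undefined behavior that UBSan (e.g. `-fsanitize=alignment` in recent clang) can flag at run time, rather than a silent miscompile:

```c
#include <stdlib.h>

/* Promise 64-byte alignment. If p is not actually 64-byte aligned,
 * this is UB that UBSan's alignment checks can catch; a pragma lowered
 * only to loop metadata would have no such checkable runtime hook. */
float sum64(float *p, int n) {
    float *ap = (float *)__builtin_assume_aligned(p, 64);
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += ap[i];
    return s;
}
```

Called with a buffer from `aligned_alloc(64, ...)` the promise holds; called with a misaligned pointer under UBSan, the broken promise is reported instead of silently miscompiling.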


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69897/new/

https://reviews.llvm.org/D69897





[PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)

2019-11-13 Thread Happy via Phabricator via cfe-commits
m-happy updated this revision to Diff 229190.
m-happy added a comment.

Updated the syntax as suggested by Michael Kruse, and added functionality to
specify the number used for alignment instead of always using 32.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69897/new/

https://reviews.llvm.org/D69897

Files:
  clang/docs/LanguageExtensions.rst
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/AttrDocs.td
  clang/include/clang/Basic/DiagnosticParseKinds.td
  clang/lib/CodeGen/CGLoopInfo.cpp
  clang/lib/CodeGen/CGLoopInfo.h
  clang/lib/Parse/ParsePragma.cpp
  clang/lib/Sema/SemaStmtAttr.cpp
  clang/test/AST/ast-print-pragmas.cpp
  clang/test/CodeGenCXX/pragma-loop-aligned.cpp
  clang/test/Parser/pragma-loop.cpp
  clang/test/Parser/pragma-unroll-and-jam.cpp

Index: clang/test/Parser/pragma-unroll-and-jam.cpp
===================================================================
--- clang/test/Parser/pragma-unroll-and-jam.cpp
+++ clang/test/Parser/pragma-unroll-and-jam.cpp
@@ -67,7 +67,7 @@
   }
 
 // pragma clang unroll_and_jam is disabled for the moment
-/* expected-error {{invalid option 'unroll_and_jam'; expected vectorize, vectorize_width, interleave, interleave_count, unroll, unroll_count, pipeline, pipeline_initiation_interval, vectorize_predicate, or distribute}} */ #pragma clang loop unroll_and_jam(4)
+/* expected-error {{invalid option 'unroll_and_jam'; expected vectorize, vectorize_width, interleave, interleave_count, unroll, unroll_count, pipeline, pipeline_initiation_interval, vectorize_predicate, vectorize_assume_alignment, or distribute}} */ #pragma clang loop unroll_and_jam(4)
   for (int i = 0; i < Length; i++) {
 for (int j = 0; j < Length; j++) {
   List[i * Length + j] = Value;
Index: clang/test/Parser/pragma-loop.cpp
===================================================================
--- clang/test/Parser/pragma-loop.cpp
+++ clang/test/Parser/pragma-loop.cpp
@@ -18,42 +18,36 @@
 
 template 
 void test_nontype_template_vectorize(int *List, int Length) {
-  /* expected-error {{invalid value '-1'; must be positive}} */ #pragma clang loop vectorize_width(V)
-  for (int i = 0; i < Length; i++) {
+  /* expected-error {{invalid value '-1'; must be positive}} */ #pragma clang loop vectorize_width(V) for (int i = 0; i < Length; i++) {
 List[i] = i;
   }
 
-  /* expected-error {{invalid value '0'; must be positive}} */ #pragma clang loop vectorize_width(V / 2)
-  for (int i = 0; i < Length; i++) {
+  /* expected-error {{invalid value '0'; must be positive}} */ #pragma clang loop vectorize_width(V / 2) for (int i = 0; i < Length; i++) {
 List[i] += i;
   }
 }
 
 template 
 void test_nontype_template_interleave(int *List, int Length) {
-  /* expected-error {{invalid value '-1'; must be positive}} */ #pragma clang loop interleave_count(I)
-  for (int i = 0; i < Length; i++) {
+  /* expected-error {{invalid value '-1'; must be positive}} */ #pragma clang loop interleave_count(I) for (int i = 0; i < Length; i++) {
 List[i] = i;
   }
 
-  /* expected-error {{invalid value '0'; must be positive}} */ #pragma clang loop interleave_count(2 % I)
-  for (int i = 0; i < Length; i++) {
+  /* expected-error {{invalid value '0'; must be positive}} */ #pragma clang loop interleave_count(2 % I) for (int i = 0; i < Length; i++) {
 List[i] = i;
   }
 }
 
 template 
 void test_nontype_template_char(int *List, int Length) {
-  /* expected-error {{invalid argument of type 'char'; expected an integer type}} */ #pragma clang loop vectorize_width(V)
-  for (int i = 0; i < Length; i++) {
+  /* expected-error {{invalid argument of type 'char'; expected an integer type}} */ #pragma clang loop vectorize_width(V) for (int i = 0; i < Length; i++) {
 List[i] = i;
   }
 }
 
 template 
 void test_nontype_template_bool(int *List, int Length) {
-  /* expected-error {{invalid argument of type 'bool'; expected an integer type}} */ #pragma clang loop vectorize_width(V)
-  for (int i = 0; i < Length; i++) {
+  /* expected-error {{invalid argument of type 'bool'; expected an integer type}} */ #pragma clang loop vectorize_width(V) for (int i = 0; i < Length; i++) {
 List[i] = i;
   }
 }
@@ -61,8 +55,8 @@
 template 
 void test_nontype_template_badarg(int *List, int Length) {
   /* expected-error {{use of undeclared identifier 'Vec'}} */ #pragma clang loop vectorize_width(Vec) interleave_count(I)
-  /* expected-error {{use of undeclared identifier 'Int'}} */ #pragma clang loop vectorize_width(V) interleave_count(Int)
-  for (int i = 0; i < Length; i++) {
+  /* expected-error {{use of undeclared identifier 'Int'}} */ #pragma clang loop
+  vectorize_width(V) interleave_count(Int) for (int i = 0; i < Length; i++) {
 List[i] = i;
   }
 }
@@ -70,8 +64,7 @@
 template 
 void test_type_template_vectorize(int *List, int Length) {
   const T Value = -1;
-  /* expected-error {{invalid value '-1'; must be positive}} */ #pragma clang loop