[PATCH] D151834: Include math-errno with fast-math

2023-09-06 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.

lgtm


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151834/new/

https://reviews.llvm.org/D151834



[PATCH] D151834: Include math-errno with fast-math

2023-06-13 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:2249
 
+  // MSVC takes into account math-errno when compiling at -O2 level in
+  // the presence of a '#pragma float_control precise(on/off)'.

Since we handle "#pragma float_control" for all targets, we should be 
consistent. I don't think the reference to MSVC here is useful.

I'm not sure why MSVC thinks it needs to call the sqrtf() function at O1, but I 
don't necessarily think we should follow that lead. If we generate the 
intrinsic here, the optimizer will make its own decisions about O1 versus O2, 
etc. There's no reason that errno should be preserved at O1 if it's been 
disabled with -fno-math-errno or we've enabled fast-math.
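For example, in something like the following (an illustration of my own, not code
from the patch), the only question is whether the call is lowered to the libm
function or to the llvm.sqrt intrinsic, and that should depend on whether errno
matters, not on the optimization level:

  #include <math.h>

  float hypotenuse(float a, float b) {
    // With -fno-math-errno or -ffast-math this can become @llvm.sqrt.f32;
    // with math-errno in effect it should remain a call to sqrtf().
    return sqrtf(a * a + b * b);
  }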



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:2263
+  }
+  bool ErrnoWithO2OrFastMath =
+  MathErrnoOverrride && (CGM.getCodeGenOpts().OptimizationLevel == 2 ||

I'm confused by this condition. If MathErrnoOverride is true, don't we need to 
set errno regardless of the optimization level or fast-math setting?



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:2266
+ CGM.getLangOpts().FastMath);
+  bool OptNoneWithFastMath =
+  CurFuncDecl->hasAttr() && CGM.getLangOpts().FastMath;

Why do we need to check FastMath if the function is marked as optnone?



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:2284
+
+  if ((FD->hasAttr() && EmitIntrinsic) || EmitIntrinsic ||
+   ((ConstWithoutErrnoAndExceptions || ConstWithoutExceptions) &&

Based on the comment above, if the function call is marked as const, then it 
will never set errno, so I don't think we need to check EmitIntrinsic in that 
case.



Comment at: clang/test/CodeGen/math-errno.c:28
+// FAST-LABEL: define dso_local nofpclass(nan inf) float @f1
+// FAST: call fast nofpclass(nan inf) float @sqrtf(float noundef nofpclass(nan 
inf) {{.*}}) #[[ATTR3_FAST:[0-9]+]]
+//

Shouldn't the precise pragma remove the nofpclass(nan inf) attributes? The 
definition of setFPPreciseEnabled looks like it's trying to do that. Maybe we 
need another patch to handle that?



Comment at: clang/test/CodeGen/math-errno.c:40
+// CHECK-LABEL: define dso_local float @f2
+// CHECK: tail call float @llvm.sqrt.f32(float {{.*}})
+

Again, this may be beyond the scope of your change, but it seems like '#pragma 
float_control(precise, off)' should lead to fast-math flags being set on this 
call.



Comment at: clang/test/CodeGen/math-errno.c:43
+// FAST-LABEL: define dso_local nofpclass(nan inf) float @f2
+// FAST: call fast nofpclass(nan inf) float @sqrtf(float noundef nofpclass(nan 
inf) {{.*}}) #[[ATTR3_FAST]]
+

I expected the intrinsic to be called in this case. Do you know why it isn't? I 
believe it is without your change.



Comment at: clang/test/CodeGen/math-errno.c:46
+// NOOPT-LABEL: define dso_local float @f2
+// NOOPT: tail call float @llvm.sqrt.f32(float {{.*}})
+

I didn't expect to see the intrinsic in this case. If we use the intrinsic, the 
x86-backend will generate sqrtss instead of the function call, and I think that 
is wrong.



Comment at: clang/test/CodeGen/math-errno.c:56
+// CHECK-LABEL: define dso_local float @f3
+// CHECK: call float @llvm.sqrt.f32(float {{.*}})
+

Again, I didn't expect the intrinsic with optnone in any of these cases.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151834/new/

https://reviews.llvm.org/D151834



[PATCH] D151834: Include math-errno with fast-math

2023-06-02 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/include/clang/Basic/FPOptions.def:30
 OPTION(BFloat16ExcessPrecision, LangOptions::ExcessPrecisionKind, 2, 
FPEvalMethod)
+OPTION(MathErrno, bool, 1, BFloat16ExcessPrecision)
 #undef OPTION

Does this mean MathErrno is tracked in both LangOpts and FPOptions?



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:2248
   FD->hasAttr() ? 0 : BuiltinID;
+  bool MathErrnoOverrride = false;
+  if (E->hasStoredFPFeatures()) {

You should add a comment here explaining what this is doing. I don't think it's 
obvious apart from the description of this patch.



Comment at: clang/lib/CodeGen/CGCall.cpp:2384
 
+  if (HasOptnone && !getLangOpts().MathErrno)
+OptNone = false;

I don't understand what this is doing.



Comment at: clang/lib/CodeGen/CodeGenModule.h:611
   void clear();
+  bool OptNone = true;
 

Why is this a module level flag, and why does it default to true?



Comment at: clang/test/CodeGen/math-errno.c:2
+// Tests that at -O2 math-errno is taken into account. Same than MSVC.
+// RUN: %clang_cc1 -Wno-implicit-function-declaration -fmath-errno \
+// RUN: -O2 -emit-llvm -o - %s \

Isn't math-errno true by default when fast-math isn't used?



Comment at: clang/test/CodeGen/math-errno.c:33
+
+__attribute__((optnone))
+float f4(float x) { 

Can you add a run line with -O0? That should prevent all instances of the 
intrinsics, right?



Comment at: clang/test/CodeGen/math-errno.c:49
+// FAST-LABEL: define dso_local nofpclass(nan inf) float @f4(float noundef 
nofpclass(nan inf) {{.*}})
+// FAST: call reassoc nnan ninf nsz arcp afn nofpclass(nan inf) float 
@sqrtf(float noundef nofpclass(nan inf) %0) #[[ATTR0:[0-9]+]]
+

I think the 'afn' flag here is a problem. The backend has no concept of errno, 
so 'afn' will be treated as allowing the function to be replaced.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151834/new/

https://reviews.llvm.org/D151834



[PATCH] D147733: Set rounding_mode to tonearest in presence of a #pragma STDC FENV_ACCESS OFF.

2023-04-11 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.

Looks good to me.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147733/new/

https://reviews.llvm.org/D147733



[PATCH] D147733: Set rounding_mode to tonearest in presence of a #pragma STDC FENV_ACCESS OFF.

2023-04-10 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/test/CodeGen/pragma-fenv_access.c:4
 // RUN: %clang_cc1 -fexperimental-strict-floating-point 
-ffp-exception-behavior=strict -triple %itanium_abi_triple -emit-llvm %s -o - 
-fms-extensions -DMS | FileCheck --check-prefixes=CHECK,STRICT %s
+// RUN %clang_cc1 -fexperimental-strict-floating-point -frounding-math 
-ffp-exception-behavior=strict -triple %itanium_abi_triple -emit-llvm %s -o - 
-fms-extensions -DMS | FileCheck --check-prefixes=CHECK,STRICT %s
 // RUN: %clang_cc1 -fexperimental-strict-floating-point -triple 
%itanium_abi_triple -emit-llvm %s -o - | FileCheck 
--check-prefixes=CHECK,DEFAULT %s

Why doesn't this case use STRICT-RND? The checks still expect round.tonearest, 
but I don't understand why.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147733/new/

https://reviews.llvm.org/D147733



[PATCH] D147733: Set rounding_mode to tonearest in presence of a #pragma STDC FENV_ACCESS OFF.

2023-04-07 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/test/CodeGen/pragma-fenv_access.c:239
+// CHECK-LABEL: @func_20
+// STRICT: call float @llvm.experimental.constrained.fadd.f32(float {{.*}}, 
float {{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
+// DEFAULT: fadd float

pengfei wrote:
> Should this be `ignore`?
This is a tricky case. By a strict reading of the C standard, this could be 
ignore, because the standard says the compiler can assume the default floating 
point environment when FENV_ACCESS is OFF and that if code compiled with 
FENV_ACCESS OFF is executed with anything other than the default environment 
the behavior is undefined. However, in this case strict exception semantics 
have been enabled elsewhere in the compilation unit, so floating point 
exceptions may be unmasked. The standard allows us to ignore exceptions, but 
raising a spurious exception may be bad for users.

I'm unsure about this case. I lean towards leaving it as Zahira has it here 
because I don't think the use of strict exception semantics will be common 
enough to justify the less conservative behavior.
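To make the scenario concrete, here is a rough sketch of the kind of mix being
discussed (my own illustration, not the actual func_20 test):

  double add_strict(double a, double b) {
  #pragma STDC FENV_ACCESS ON
    // Strict semantics: constrained fadd with fpexcept.strict.
    return a + b;
  }

  double add_default(double a, double b) {
  #pragma STDC FENV_ACCESS OFF
    // The case in question: exceptions may have been unmasked by code like
    // add_strict(), so treating this as "ignore" could make a spurious
    // exception visible to the user.
    return a + b;
  }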


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147733/new/

https://reviews.llvm.org/D147733



[PATCH] D147733: Set rounding_mode to tonearest in presence of a #pragma STDC FENV_ACCESS OFF.

2023-04-07 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/Sema/SemaAttr.cpp:1340
+  if (!IsEnabled)
+NewFPFeatures.setDisallowFenvAccess(IsEnabled);
   FpPragmaStack.Act(Loc, PSK_Set, StringRef(), NewFPFeatures);

Why is this only needed for "!IsEnabled"? Where is the rounding mode set in the 
IsEnabled case? It looks like setAllowFEnvAccessOverride() is defined by a 
macro and just sets a bit in the OverrideMask, but somehow it seems to be 
setting the rounding mode to Round.Dynamic. I'm just concerned that there is a 
disconnect in the implementation here.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147733/new/

https://reviews.llvm.org/D147733



[PATCH] D144454: Add builtin for llvm set rounding

2023-02-21 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.
This revision is now accepted and ready to land.

This looks good to me. I'm adding @rjmccall as a reviewer to make sure someone 
with front end expertise sees this.

There seems to be a potential type mismatch between the intrinsic and the 
builtin in the case of __builtin_flt_rounds(). Based on D24461, I wonder if 
that's a target-specific 
difference. I'm not sure if a similar issue exists in this case since we aren't 
copying a gcc builtin.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144454/new/

https://reviews.llvm.org/D144454



[PATCH] D142907: LangRef: Add "dynamic" option to "denormal-fp-math"

2023-02-10 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In general, it seems like the denormal mode should be considered part of the 
floating point environment (though as far as I know the C standard, at least, 
doesn't document it as such). If it were considered part of the floating point 
environment, the LLVM rules would tell us we could assume the default setting, 
which I'd assume to be IEEE, and it would only be legal to change this mode in 
strict mode. However, for your use case preserving the behavior of fpclass 
seems like what users would want, even in fast-math modes. In this sense, this 
is a lot like the problem we have with preserving isnan() behavior when 
fast-math is enabled. Our rules allow it, but it's not what most people would 
want.

I think the new denormal mode is a good addition.

Do you need to do something with the inliner to handle the case where functions 
with different denormal modes are inlined into one another? We don't seem to 
handle that case correctly now (https://godbolt.org/z/PEsWaMEq6), but with the 
dynamic mode we could handle it without blocking inlining completely.
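For illustration (my own reduction of the godbolt example, with made-up file and
function names), the pattern in question is roughly:

  // callee.c, built with -fdenormal-fp-math=preserve-sign
  float scale_down(float x) {
    return x * 0x1p-140f;   // produces a denormal result for x near 1.0
  }

  // caller.c, built with the default -fdenormal-fp-math=ieee
  float scale_down(float x);
  float caller(float x) {
    return scale_down(x);   // after cross-TU (LTO) inlining, which mode wins?
  }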


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142907/new/

https://reviews.llvm.org/D142907



[PATCH] D136176: Implement support for option 'fexcess-precision'.

2022-12-15 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.
This revision is now accepted and ready to land.

With the latest test cases and explanations, I'm happy with this patch.

@rjmccall is there anything else you wanted to see addressed?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136176/new/

https://reviews.llvm.org/D136176



[PATCH] D136176: Implement support for option 'fexcess-precision'.

2022-12-15 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/test/CodeGen/X86/fexcess-precision.c:89
+// CHECK-EXT-NEXT:[[EXT1:%.*]] = fpext half [[TMP1]] to float
+// CHECK-EXT-NEXT:[[MUL:%.*]] = fmul float [[EXT]], [[EXT1]]
+// CHECK-EXT-NEXT:[[TMP2:%.*]] = load half, ptr [[C_ADDR]]

pengfei wrote:
> andrew.w.kaylor wrote:
> > I apologize for taking so long to get back to this. I'm not sure how to get 
> > this comment to show up as a response to the questions @rjmccall asked 
> > about my earlier remarks here. Apologies if this ends up being redundant.
> > 
> > I said:
> > 
> > > This is not what I'd expect to see. I'd expect the operations to use the 
> > > half type with explicit truncation inserted where needed.
> > 
> > 
> > @rjmccall said:
> > 
> > > Are you suggesting that the frontend emit half operations normally, with 
> > > some intrinsic to force half precision on casts and assignments, and that 
> > > a backend pass would aggressively promote operations between those 
> > > intrinsics? I think that would be a pretty error-prone representation, 
> > > both in terms of guaranteeing the use of excess precision in some 
> > > situations (and thus getting stable behavior across compiler releases) 
> > > and guaranteeing truncation in others (and thus preserving correctness). 
> > > The frontend would have to carefully emit intrinsics in a bunch of places 
> > > or else default to introducing a bug.
> > 
> > 
> > Maybe I'm asking that -fexcess-precision=fast not be an alias of 
> > -fexcess-precision=standard. The idea behind 'fast' as I understand it is 
> > to allow the compiler to generate whatever instructions are most efficient 
> > for the target. If the target supports native fp16 operations, the fastest 
> > code will use those instructions and not use excess precision. If the 
> > target does not support fp16, the fastest code would be to perform the 
> > calculations at 32-bit precision and only truncate on casts and assignments 
> > (or, even faster, not even then). You can see an analogy for what I mean by 
> > looking at what gcc does with 32-bit floats when SSE is disabled: 
> > https://godbolt.org/z/xhEGbjG4G
> > 
> > With -fexcess-precision=fast, gcc takes liberties to make the code run as 
> > fast as possible. With -fexcess-precision=standard, it truncates to the 
> > source value on assignments, casts, or return.
> > 
> > If you generate code using the half type here, the backend will legalize it 
> > and **should** make it as fast as possible. In fact, it looks like 
> > currently the X86 backend will insert calls to extend and truncate the 
> > values to maintain the semantics of the IR 
> > (https://godbolt.org/z/YGnj4cqvv). That's sensible, but it's not what the 
> > X86 backend does in the 32-bit float + no SSE case.
> > 
> > The main thing I'd want to say here is that the front end has no business 
> > trying to decide what will be fastest for the target and encoding that. At 
> > most, the front end should be encoding the semantics in the source and 
> > setting some flag in the IR to indicate the excess precision handling 
> > that's allowed. This feels to me like it requires a new fast-math flag, 
> > maybe 'axp' = "allow excess precision".
> > 
> > The case where the front end is encoding excess precision into the IR feels 
> > much more like -ffp-eval-method. When the front end encodes an fpext to 
> > float and does a calculation, the optimizer is forced to respect that 
> > unless it can prove that doing the calculation at lower precision won't 
> > change the result (and it rarely does that). For targets that have native 
> > FP16 support, this will mean that -fexcess-precision=fast results in slower 
> > code.
> > For targets that have native FP16 support, this will mean that 
> > -fexcess-precision=fast results in slower code.
> 
> This won't be true. We checked `TI.hasLegalHalfType()` before doing excess 
> precision. Maybe better to add a RUN for `avx512fp16` case to address the 
> concern.
> 
> > At most, the front end should be encoding the semantics in the source and 
> > setting some flag in the IR to indicate the excess precision handling 
> > that's allowed.
> 
> I think encoding excess precision into the IR is the easy way to encode the 
> information of the casts and assignments. Otherwise, we have to use 
> intrinsics to tell such information to the backend. As @rjmccall explained, 
> it's more error-prone.
> 
> > If you generate code using the half type here, the backend will legalize it 
> > and should make it as fast as possible.
> 
> I think the current backend implementations in both GCC and LLVM have already 
> tried to make it as fast as possible in the condition of not to make the code 
> too complicated.
> The behavior in 32-bit float + no SSE case is benefited from x87 feature that 
> always does excess precision. It is somehow taking bugs as feature from the 
> semantics of the IR.
> That says, it 

[PATCH] D136176: Implement support for option 'fexcess-precision'.

2022-12-14 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/test/CodeGen/X86/fexcess-precision.c:89
+// CHECK-EXT-NEXT:[[EXT1:%.*]] = fpext half [[TMP1]] to float
+// CHECK-EXT-NEXT:[[MUL:%.*]] = fmul float [[EXT]], [[EXT1]]
+// CHECK-EXT-NEXT:[[TMP2:%.*]] = load half, ptr [[C_ADDR]]

I apologize for taking so long to get back to this. I'm not sure how to get 
this comment to show up as a response to the questions @rjmccall asked about my 
earlier remarks here. Apologies if this ends up being redundant.

I said:

> This is not what I'd expect to see. I'd expect the operations to use the half 
> type with explicit truncation inserted where needed.


@rjmccall said:

> Are you suggesting that the frontend emit half operations normally, with some 
> intrinsic to force half precision on casts and assignments, and that a 
> backend pass would aggressively promote operations between those intrinsics? 
> I think that would be a pretty error-prone representation, both in terms of 
> guaranteeing the use of excess precision in some situations (and thus getting 
> stable behavior across compiler releases) and guaranteeing truncation in 
> others (and thus preserving correctness). The frontend would have to 
> carefully emit intrinsics in a bunch of places or else default to introducing 
> a bug.


Maybe I'm asking that -fexcess-precision=fast not be an alias of 
-fexcess-precision=standard. The idea behind 'fast' as I understand it is to 
allow the compiler to generate whatever instructions are most efficient for the 
target. If the target supports native fp16 operations, the fastest code will 
use those instructions and not use excess precision. If the target does not 
support fp16, the fastest code would be to perform the calculations at 32-bit 
precision and only truncate on casts and assignments (or, even faster, not even 
then). You can see an analogy for what I mean by looking at what gcc does with 
32-bit floats when SSE is disabled: https://godbolt.org/z/xhEGbjG4G

With -fexcess-precision=fast, gcc takes liberties to make the code run as fast 
as possible. With -fexcess-precision=standard, it truncates to the source value 
on assignments, casts, or return.

If you generate code using the half type here, the backend will legalize it and 
**should** make it as fast as possible. In fact, it looks like currently the 
X86 backend will insert calls to extend and truncate the values to maintain the 
semantics of the IR (https://godbolt.org/z/YGnj4cqvv). That's sensible, but 
it's not what the X86 backend does in the 32-bit float + no SSE case.

The main thing I'd want to say here is that the front end has no business 
trying to decide what will be fastest for the target and encoding that. At 
most, the front end should be encoding the semantics in the source and setting 
some flag in the IR to indicate the excess precision handling that's allowed. 
This feels to me like it requires a new fast-math flag, maybe 'axp' = "allow 
excess precision".

The case where the front end is encoding excess precision into the IR feels 
much more like -ffp-eval-method. When the front end encodes an fpext to float 
and does a calculation, the optimizer is forced to respect that unless it can 
prove that doing the calculation at lower precision won't change the result 
(and it rarely does that). For targets that have native FP16 support, this will 
mean that -fexcess-precision=fast results in slower code.
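To make the two strategies concrete, consider a trivial case like this (my own
example, not one of the tests in the patch):

  _Float16 mul16(_Float16 a, _Float16 b) {
    // With excess precision encoded by the front end: fpext both operands to
    // float, fmul float, fptrunc the result back to half.
    // Without it: the front end emits fmul half and the backend legalizes it
    // however is best for the target.
    return a * b;
  }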


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136176/new/

https://reviews.llvm.org/D136176



[PATCH] D139629: clang: Stop emitting "strictfp"

2022-12-14 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a subscriber: erichkeane.
andrew.w.kaylor added inline comments.



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:2135
-llvm::AttrBuilder FuncAttrs(F->getContext());
-FuncAttrs.addAttribute("strictfp");
-F->addFnAttrs(FuncAttrs);

arsenm wrote:
> zahiraam wrote:
> > I think it would better to fix this function instead of removing it 
> > entirely? The issue here is that there is the "strictfp" attribute and the 
> > llvm::Attribute::StrictFP. We could replace  
> > FuncAttrs.addAttribute("strictfp"); with
> >  FuncAttrs.addAttribute(llvm::Attribute::StrictFP); 
> > This function ensures that the function attribute is set when the 
> > FunctionDecl attribute is set. I am concerned that when it's removed, we 
> > will wind up with cases where the function attribute is missing! The only 
> > place where this function attribute is in CodeGenFunction::StartFunction. 
> > Is that enough? @andrew.w.kaylor Can you please weigh in on this?
> I currently don't have evidence that making this use the correct attribute 
> would fix anything. If something was depending on this emission in this 
> context, it's already broken
It may be that anything depending on this is already broken, but the code was 
written for a reason, even if it was always broken. I'm not sure I understand 
what that reason was, and unfortunately the person who wrote the code 
(@mibintc) is no longer actively contributing to LLVM. It was added here: 
https://reviews.llvm.org/D87528

It does seem like the llvm::Attribute::StrictFP is being added any time the 
string attribute is added, but they're coming from different places. The proper 
attribute seems to be coming from CodeGenFunction::StartFunction() which is 
effectively copying it from the function declaration. It's not clear to me 
whether there are circumstances where we get to setLLVMFunctionFEnvAttributes() 
through EmitGlobalFunctionDefinition() without ever having gone through 
CodeGenFunction::StartFunction(). It looks like maybe there are multiversioning 
cases that do that, but I couldn't come up with an example that does. 
@erichkeane wrote a lot of the multi-versioning code, so he might know more, 
but he's on vacation through the end of the month.

Eliminating this extra string attribute seems obviously good. In this 
particular location, though, I'd be inclined to set the enumerated attribute 
here, even though that might be redundant in most cases. On the other hand, if 
one of our front end experts can look at this code and say definitively that 
it's //always// redundant, I'd be fine with this code being deleted.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139629/new/

https://reviews.llvm.org/D139629



[PATCH] D97854: [RFC][nsan] A Floating-point numerical sanitizer.

2022-11-30 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

This looks very interesting. I had a discussion with someone at the recent LLVM 
Dev Meeting about the possibility of something like this. However, rather than 
tracking error based on data precision, I am interested in tracking errors 
introduced by fast-math based optimizations. For instance, someone has a 
program with has been verified within accuracy requirements, but they want it 
to run faster so they enable fast-math. Sometimes this works, but other times 
the fast-math optimizations introduce an unacceptable amount of error. What I'd 
like is to be able to trace the problem back to which part of the original 
source was sensitive to this error so that I can disable fast-math locally for 
just that operation.

Another potential use for this sort of technology relates to an RFC that I just 
posted (https://reviews.llvm.org/D138867). There I'm trying to introduce a 
mechanism that allows the compiler to replace builtin math operations (calls to 
math library functions or equivalent builtins in platforms like SYCL or CUDA) 
based on a specified accuracy requirement. One of the challenges with this is 
verifying that the substitute call is really as accurate as it claims. For 
example, let's say I want to call cosf() and I need 4 ulp accuracy. The 
standard GNU libm implementation claims 1 ulp accuracy, so it's not necessarily 
useful as a point of comparison. But if we had a shadow computation that 
converted the input value to double and called cos() then converted that result 
back to float, that should give me the correctly rounded result, right? Or I 
could use a shadow call to one of the various correctly rounded implementations 
that are becoming available. It would be great to use nsan to verify the 
results from these alternate library calls.
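As a rough sketch of that verification idea (my own code, not anything that
exists in nsan today; candidate_cosf is a stand-in for whatever substitute
implementation is being checked):

  #include <math.h>
  #include <stdio.h>

  // Stand-in for the lower-accuracy replacement being verified.
  static float candidate_cosf(float x) { return cosf(x); }

  void check_one(float x) {
    float fast = candidate_cosf(x);
    // Shadow computation at higher precision, as described above.
    float shadow = (float)cos((double)x);
    if (fast != shadow)
      printf("mismatch at %a: %a vs %a\n", x, fast, shadow);
  }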


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97854/new/

https://reviews.llvm.org/D97854



[PATCH] D136176: Implement support for option 'fexcess-precision'.

2022-11-15 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

Can you update the clang user's manual to describe the behavior of this option? 
The excess precision issue is also mentioned in the clang language extensions 
document 
(https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point).




Comment at: clang/include/clang/Basic/LangOptions.def:320
 BENIGN_ENUM_LANGOPT(FPEvalMethod, FPEvalMethodKind, 2, FEM_UnsetOnCommandLine, 
"FP type used for floating point arithmetic")
+BENIGN_ENUM_LANGOPT(FPPrecisionMode, FPExcessPrecisionModeKind, 2, 
FPP_Standard, "FP precision used for floating point arithmetic")
 LANGOPT(NoBitFieldTypeAlign , 1, 0, "bit-field type alignment")

A lot of people mix up precision and accuracy, so I think this description is 
likely to be ambiguous. I'd suggest "Intermediate truncation behavior for 
floating point arithmetic"



Comment at: clang/include/clang/Basic/LangOptions.h:300
+FPP_Fast,
+FPP_None
+  };

Is FPP_None somehow the same as fexcess-precision=16? If not, what does it mean?



Comment at: clang/include/clang/Driver/Options.td:1564
+def fexcess_precision_EQ : Joined<["-"], "fexcess-precision=">, 
Group, Flags<[CC1Option]>,
+  HelpText<"Specifies the precision in which this floating-point operations 
will be calculated.">,
+  Values<"standard,fast,16">, NormalizedValuesScope<"LangOptions">,

Again, I think this is ambiguous. My understanding is that the option is more 
about the rules for when intermediate values are truncated to source precision 
than about what precision is prescribed for the calculation. For targets that 
have native support for half precision types, the calculations should always be 
done at source precision and this option will have no effect. It's only when 
the native ISA doesn't support the source precision that this is needed and 
even then we will always perform calculations at the nearest precision to 
source precision that is available, right?



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2995
+  StringRef Val = A->getValue();
+   if (TC.getTriple().getArch() == llvm::Triple::x86 && Val.equals("16"))
+D.Diag(diag::err_drv_unsupported_opt_for_target)

Why is 16 only supported for x86? Is it only here for gcc compatibility?



Comment at: clang/test/CodeGen/X86/fexcess-precise.c:89
+// CHECK-EXT-NEXT:[[EXT1:%.*]] = fpext half [[TMP1]] to float
+// CHECK-EXT-NEXT:[[MUL:%.*]] = fmul float [[EXT]], [[EXT1]]
+// CHECK-EXT-NEXT:[[TMP2:%.*]] = load half, ptr [[C_ADDR]], align 2

This is not what I'd expect to see. I'd expect the operations to use the half 
type with explicit truncation inserted where needed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136176/new/

https://reviews.llvm.org/D136176



[PATCH] D136786: Fix `unsafe-fp-math` attribute emission.

2022-10-31 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D136786#3893559, @michele.scandale wrote:

> The fact that `"unsafe-fp-math"="true"` implies `-ffp-contract=fast` is quite 
> unfortunate, and it is something that in principle should be addressed.
> My understanding is that the floating point math function attributes are a 
> quite old concept that predates the per-instruction fast-math flags. Ideally 
> we should get rid of the flooding point math function attribute since we do 
> have a finer grain representation.
> A while back the main issue was that all the backends code (e.g. DAGCombiner) 
> were only using the `TargetOptions` properties (hence the function 
> attributes) to control the floating point related transformations, hence 
> there would have been regressions.

Yes, the function attributes are a vestige of the time before the fast-math 
flags were introduced, but the use of them hasn't been entirely eliminated and 
I think there are a couple of cases where some people think it's necessary. I 
posted an RFC and a patch about a year ago to start cleaning this up, but I got 
pulled away before it landed and honestly I had forgotten about it. 
https://discourse.llvm.org/t/rfc-eliminating-non-ir-floating-point-controls-in-the-selection-dag/59173

> At high level, I think that we need to understand how important is to match 
> GCC behavior in this particular case. We can change the way Clang defines 
> `-funsafe-math-optimizations` to imply `-ffp-contract=fast`, but that seems 
> the wrong solution as it feels like promoting a bug to a feature.

I wouldn't consider the fact that unsafe-fp-math (as a concept) implies 
fp-contract=fast to be a bug. Yes, it may appear to deviate from the gcc 
behavior, but the reasons for that are complicated. Mostly they stem from the 
fact that gcc doesn't support fp-contract in the way that the C standard 
describes it.  In gcc, fp-contract is fast or off, and it defaults to fast. If 
you pass -ffp-contract=on to gcc, it behaves exactly like -ffp-contract=off. If 
you pass -std=c99, the default for fp-contract changes to on, but because gcc 
doesn't support fp-contract=on, FMA isn't formed. 
https://godbolt.org/z/K86Kv8W7h

My take on this is that fp-contract=on is a mode that conforms to the language 
standard and fp-contract=fast allows value changing optimizations that do not 
conform to the standard. Value-changing optimizations that do not conform to 
the standard are exactly what the definition of -funsafe-math-optimizations 
allow, so I can't see any reason that -funsafe-math-optimizations shouldn't 
imply fp-contract=fast.
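As a tiny example of what is at stake (mine, not from the review): under
-ffp-contract=off the expression below must stay a separate multiply and add;
=on permits fusing it into a single fma because the contraction happens within
one source expression; =fast additionally permits contraction across statements.

  float muladd(float a, float b, float c) {
    // May become llvm.fmuladd / an fma instruction when contraction is
    // allowed; stays mul + add under -ffp-contract=off.
    return a * b + c;
  }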

So I was going to say that gcc is wrong, even by its own rules, to not allow 
fp-contract=fast behavior under -funsafe-math-optimizations, but then I 
double-checked my understanding and made a very happy discovery ... 
-funsafe-math-optimizations DOES imply fp-contract=fast in gcc! You just need 
to follow a convoluted path to see that (i.e. change the default to something 
other than fast using -std=c99 and then add the unsafe math option). 
https://godbolt.org/z/K1GonvGdT


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136786/new/

https://reviews.llvm.org/D136786



[PATCH] D136786: Fix `unsafe-fp-math` attribute emission.

2022-10-28 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D136786#3892177, @zahiraam wrote:

>>> I'm not following entirely, but -funsafe-math-optimizations is just a 
>>> subset of -ffast-math, so checking only the properties that contribute to 
>>> -funsafe-math-optimizations should be enough. 
>>>  I think it is way simpler to reason in these terms than enumerating all 
>>> the possible scenarios.
>
> Exactly my point Since -funsafe-math-optimizations is a subset of 
> -ffast-math, shouldn't it be the "minimal" option to trigger the 
> unsafe-fp-math attribute for functions? 
> I agree with the second part of the condition; it should be
> updated with (LangOpts.getDefaultFPContractMode() ==  
> LangOptions::FPModeKind::FPM_Fast ||   LangOpts.getDefaultFPContractMode() == 
> LangOptions::FPModeKind::FPM_FastHonorPragmas) 
> but I am still hesitant about the change you are proposing for the first part 
> of the condition.
> @andrew.w.kaylor Can you please weigh in on this?

I talked to Zahira about this offline, and the current state is even messier 
than I realized. I think we need to start by making sure we agree on the 
behavior we want and then figure out how to get there. There are some messy 
complications in the different ways things are handled in the driver, the front 
end, and the backend. There are more overlapping settings than I would like, 
but I guess it's best to look for the best way we can incrementally improve 
that situation.

As a first principle, I'd like to clarify a "big picture" item that I was 
trying to get at in my earlier comment. I'd like to be able to reason about 
this from the starting point of individual semantic modes. There is a set of 
modes defined in the clang user's manual 
(https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior).
 I think this is reasonably complete:

  exception_behavior
  fenv_access
  rounding_mode
  fp-contract
  denormal_fp_math
  denormal_fp_math_32
  support_math_errno
  no_honor_nans
  no_honor_infinities
  no_signed_zeros
  allow_reciprocal
  allow_approximate_fns
  allow_reassociation

These are the modes from clang's perspective. They get communicated to the back 
end in different ways. The backend handling of this isn't nearly as consistent 
as I'd like, but that's a different problem. I think these basic modes can be 
thought of as language-independent and should be used as the basic building 
blocks for reasoning about floating point behavior across all LLVM-based 
compilers.

On top of these modes, we have two concepts "fast-math" and 
"unsafe-math-optimizations" which are essentially composites of one or more of 
the above semantic modes. I say "essentially" because up to this point there 
has been some fuzzy handling of that. I'd like to say they are absolutely 
composites of the above modes, and I think we can make it so, starting here.

If we agree that fast-math and unsafe-math-optimizations/unsafe-fp-math are 
composites of some of the other modes, then we can apply a symmetry property 
that I think will be helpful in the problem we're trying to solve here and will 
lead to better reasoning about these settings in the future.

I am proposing that passing "-ffast-math" should have *exactly* the same 
effects as passing all of the individual command line options that are implied 
by -ffast-math. Likewise, passing "-funsafe-math-optimizations" should have 
*exactly* the same effects as passing all of the individual options that are 
implied by -funsafe-math-optimizations. And, of course, any combination of 
options that turn various states on and off can be strung together and we can 
just evaluate the final state those options leave us in to see if we are in the 
"fast-math" state or the "unsafe-fp-math" state.

I'm going to ignore fast-math right now, because I think the current handling 
is mostly OK, and this review is about unsafe-fp-math. The unsafe-fp-math case 
is a little simpler in concept, but I think we've got more problems with it.

Another fundamental principle I'd like to declare here is that 
LangOpts.UnsafeFPMath, TargetOptions.UnsafeFPMath, and the "unsafe-fp-math" 
function attribute should all mean exactly the same thing. The function 
attribute has a different scope, so it might not always have the same value as 
the other two, but it should at least mean the same thing.

My understanding is this:

  unsafe-fp-math = exception_behavior(ignore) +
   fenv_access(off) +
   no_signed_zeros(on) +
   allow_reciprocal(on) +
   allow_approximate_fns(on) +
   allow_reassociation(on) +
   fp_contract(fast)

The first two of those settings are the default behavior. The others can be 
turned on by individual command line options. If all of those semantic modes 
are in the states above, then we are in "unsafe-fp-math" mode. That's my 
proposal.
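To give one concrete consequence (an example of mine, not from the review): when
all of those modes are in the unsafe states, the compiler may rewrite the
reduction below, e.g. by reassociating the sum or forming fmas, either of which
can change the rounded result relative to strict left-to-right evaluation:

  float dot4(const float *a, const float *b) {
    // Strict order: ((a0*b0 + a1*b1) + a2*b2) + a3*b3.
    // Under unsafe-fp-math the order may change and fmas may be formed.
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
  }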

There are a couple of things we 

[PATCH] D136786: Fix `unsafe-fp-math` attribute emission.

2022-10-28 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/CodeGen/CGCall.cpp:1865
   FuncAttrs.addAttribute("approx-func-fp-math", "true");
-if ((LangOpts.FastMath ||
- (!LangOpts.FastMath && LangOpts.AllowFPReassoc &&
-  LangOpts.AllowRecip && !LangOpts.FiniteMathOnly &&
-  LangOpts.NoSignedZero && LangOpts.ApproxFunc)) &&
-LangOpts.getDefaultFPContractMode() != 
LangOptions::FPModeKind::FPM_Off)
+if (LangOpts.UnsafeFPMath &&
+(LangOpts.getDefaultFPContractMode() ==

michele.scandale wrote:
> michele.scandale wrote:
> > zahiraam wrote:
> > > zahiraam wrote:
> > > > michele.scandale wrote:
> > > > > michele.scandale wrote:
> > > > > > I've found quite confusing the `(FastMath || (!FastMath && ... ))`.
> > > > > > 
> > > > > > Using directly `UnsafeFPMath` seems more compact, however it also 
> > > > > > causes to taking into account the value for `MathErrno` -- which it 
> > > > > > might be not relevant.
> > > > > > 
> > > > > > If `MathErrno` shouldn't affect this, then I would rewrite this as:
> > > > > > ```
> > > > > > if (LangOpts.AllowFPReassoc && LangOpts.AllowRecip &&
> > > > > >  LangOpts.NoSignedZero && LangOpts.ApproxFunc &&
> > > > > >  (LangOpts.getDefaultFPContractMode() ==
> > > > > >  LangOptions::FPModeKind::FPM_Fast ||
> > > > > >  LangOpts.getDefaultFPContractMode() ==
> > > > > >  LangOptions::FPModeKind::FPM_FastHonorPragmas))
> > > > > >   FuncAttrs.addAttribute("unsafe-fp-math", "true");
> > > > > > ```
> > > > > > so that only the relevant options are considered and there is no 
> > > > > > need to think about what is implied by `FastMath`.
> > > > > I do wonder if in 
> > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Clang.cpp#L3089-L3091
> > > > >  is actually correct to have `!MathErrno`. In the GCC documentation 
> > > > > of `-funsafe-math-optimizations` I don't see any connection to the 
> > > > > `-fmath-errno` or `-fno-math-errno` options.
> > > > Interesting! Using on the command line 'funsafe-math-optimization' 
> > > > implies 'math-errno'. But 'funsafe-math-optimizations' is triggered 
> > > > when this is true:
> > > >  !MathErrno && AssociativeMath && ReciprocalMath && !SignedZeros && 
> > > >  ApproxFunc && !TrappingMath
> > > > 
> > > > See here: 
> > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Clang.cpp#L3089
> > > >  
> > > When the ‘funsafe-math-optimizations’ is used on the command line 
> > > LangOpts.UnsafeFP is ‘0’. 
> > > The function attribute ‘funnsafe-fp-math’ is not set. Is that what’s 
> > > wanted? I would have thought that we want the attribute enabled for 
> > > unsafe mode.
> > > 
> > > If we call 
> > > SpecialOperations = AllowFPReassoc && NoSignedZero && AllowRecip && 
> > > ApproxFunc && !RoundingMath.
> > > 
> > > ‘ffast-math’ = !MathErrno && FiniteMathOnly && SpecialOperations
> > > ‘funsafe-math-optimizations’ = MathErrno && !FiniteMathOnly && 
> > > SpecialOperations 
> > > (it's true that there is no mention of MathErrno in the GCC doc, but the 
> > > default is fmath-errno so presumably using this option on the command 
> > > line implies MathErrno).
> > > 
> > > 
> > > The function attribute UnsafeFPMath is enabled for 
> > > (‘ffast-math’ || ‘funsafe-math-optimization’). 
> > > That will lead to this condition: 
> > > (SpecialOperations  && MathErrNo && "-ffast-math" && -ffp-contract=fast") 
> > > || (SpecialOperations && "-fmath-errno" && "-ffp-contract=on") .
> > > 
> > > 
> > The compiler driver has a default value for `MathErrno` which depends on 
> > the toolchain properties. When `-funsafe-math-optimizations` is processed 
> > `MathErrno` is not affected, but then the compiler driver specify 
> > `-funsafe-math-optimizations` to the CC1 command line only if `MathErrno` 
> > is false.
> > The way the compiler driver mutate the floating point options state when 
> > processing `-funsafe-math-optimizations` seems to match the GCC 
> > documentation, but then the condition for the forwarding expect something 
> > more.
> > It is not clear to me if the intention is to match GCC documented behavior, 
> > or if instead the Clang definition of `-funsafe-math-optimizations` implies 
> > `-fno-math-errno`.
> >When the ‘funsafe-math-optimizations’ is used on the command line 
> >LangOpts.UnsafeFP is ‘0’. 
> The function attribute ‘funnsafe-fp-math’ is not set. Is that what’s wanted? 
> I would have thought that we want the attribute enabled for unsafe mode.
> 
> I believe this is caused by `MathErrno` state in the compiler driver being 
> not affected when processing `-funsafe-math-optimizations`, but considered 
> for the forwarding to the CC1.
> 
> >If we call 
> >SpecialOperations = AllowFPReassoc && NoSignedZero && AllowRecip && 
> >ApproxFunc && !RoundingMath.
> >
> >‘ffast-math’ = !MathErrno && FiniteMathOnly && SpecialOperations
> 

[PATCH] D122155: Add warning when eval-method is set in the presence of value unsafe floating-point calculations.

2022-03-22 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/include/clang/Basic/DiagnosticFrontendKinds.td:50-53
+def warn_eval_method_setting_via_option_in_value_unsafe_context : Warning<
+"setting the eval method via '-ffp-eval-method' has not effect when 
numeric "
+"results of floating-point calculations aren't value-safe.">,
+InGroup;

zahiraam wrote:
> aaron.ballman wrote:
> > Unless you have a strong reason for this to be a warning, this seems like a 
> > situation we should diagnose as an error with a much clearer message.
> May  be @andrew.w.kaylor would weigh in on this?
I was going to say that for the command line option we could just issue a 
warning saying that the later option overrides the earlier, but it's a bit 
complicated to sort out what that would mean if the eval method follows a 
fast-math option and it might not always be what the user intended. So, I guess 
I'd agree that it should be an error.

For the case with pragmas, the model I'd follow is the mixing of #pragma 
float_control(except, on) with a fast-math mode or #pragma 
float_control(precise, off) with a non-ignore exception mode. In both those 
cases we issue an error.
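For reference, the precedent I have in mind looks roughly like this (illustrative
source, not an existing test); compiling it with a fast-math mode already
produces an error today because 'except, on' is not allowed while precise is
disabled:

  float add_checked(float x, float y) {
  #pragma float_control(except, on)   // error under fast-math / precise off
    return x + y;
  }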


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122155/new/

https://reviews.llvm.org/D122155



[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-11 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.
This revision is now accepted and ready to land.

This looks good to me. Thanks for the patch!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121410/new/

https://reviews.llvm.org/D121410



[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-10 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/CodeGen/CodeGenModule.cpp:2067
+  // favor this processor.
+  TuneCPU = SD->getCPUName(GD.getMultiVersionIndex())->getName();
+}

Unfortunately, I don't think it's this easy. The list of names used for 
cpu_specific doesn't come from the same place as the list of names used by 
"tune-cpu". For one thing, the cpu_specific names can't contain the '-' 
character, so we have names like "skylake_avx512" in cpu_specific that would 
need to be translated to "skylake-avx512" for "tune-cpu". I believe the list of 
valid names for "tune-cpu" comes from here: 
https://github.com/llvm/llvm-project/blob/26cd258420c774254cc48330b1f4d23d353baf05/llvm/lib/Support/X86TargetParser.cpp#L294

Also, some of the aliases supported by cpu_specific don't have any 
corresponding "tune-cpu" name. You happen to have picked one of these for the 
test. I believe "core_4th_gen_avx" should map to "haswell".



Comment at: clang/test/CodeGen/attr-cpuspecific-avx-abi.c:28
 // CHECK: attributes #[[V]] = 
{{.*}}"target-features"="+avx,+avx2,+bmi,+cmov,+crc32,+cx8,+f16c,+fma,+lzcnt,+mmx,+movbe,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
+// CHECK-SAME: "tune-cpu"="core_4th_gen_avx"

As noted above, this isn't a valid setting for "tune-cpu". I think it would 
just be ignored.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121410/new/

https://reviews.llvm.org/D121410



[PATCH] D121410: Have cpu-specific variants set 'tune-cpu' as an optimization hint

2022-03-10 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

This example illustrates the problem this patch intends to fix: 
https://godbolt.org/z/j445sxPMc

For Intel microarchitectures before Skylake, the LLVM cost model says that 
vector fsqrt is slow, so if fast-math is enabled, we'll use an approximation 
rather than the vsqrtps instruction when vectorizing a call to sqrtf(). If the 
code is compiled with -march=skylake or -mtune=skylake, we'll choose the 
vsqrtps instruction, but with any earlier base target, we'll choose the 
approximation even if there is a cpu_specific(skylake) implementation in the 
source code.

For example

  __attribute__((cpu_specific(skylake))) void foo(void) {
    for (int i = 0; i < 8; ++i)
      x[i] = sqrtf(y[i]);
  }

compiles to

  foo.b:
  vmovaps      ymm0, ymmword ptr [rip + y]
  vrsqrtps     ymm1, ymm0
  vmulps       ymm2, ymm0, ymm1
  vbroadcastss ymm3, dword ptr [rip + .LCPI2_0]  # ymm3 = [-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0,-3.0E+0]
  vfmadd231ps  ymm3, ymm2, ymm1                  # ymm3 = (ymm2 * ymm1) + ymm3
  vbroadcastss ymm1, dword ptr [rip + .LCPI2_1]  # ymm1 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
  vmulps       ymm1, ymm2, ymm1
  vmulps       ymm1, ymm1, ymm3
  vbroadcastss ymm2, dword ptr [rip + .LCPI2_2]  # ymm2 = [NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN]
  vandps       ymm0, ymm0, ymm2
  vbroadcastss ymm2, dword ptr [rip + .LCPI2_3]  # ymm2 = [1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38,1.17549435E-38]
  vcmpleps     ymm0, ymm2, ymm0
  vandps       ymm0, ymm0, ymm1
  vmovaps      ymmword ptr [rip + x], ymm0
  vzeroupper
  ret

but it should compile to

  foo.b:
  vsqrtps ymm0, ymmword ptr [rip + y]
  vmovaps ymmword ptr [rip + x], ymm0
  vzeroupper
  ret


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121410/new/

https://reviews.llvm.org/D121410



[PATCH] D121122: Set FLT_EVAL_METHOD to -1 when fast-math is enabled.

2022-03-08 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

The fix for the eval_method crash should be moved to a separate patch. 
Otherwise, this looks good. I have only minor comments.




Comment at: clang/lib/Lex/PPMacroExpansion.cpp:1600
+} else {
+  OS << getTUFPEvalMethod();
+  // __FLT_EVAL_METHOD__ expands to a simple numeric value.

It looks like you've got tabs in the code here.



Comment at: clang/lib/Sema/SemaAttr.cpp:521
+// Fast-math is enabled.
+PP.setCurrentFPEvalMethod(
+Loc, LangOptions::FPEvalMethodKind::FEM_Indeterminable);

Why don't you need the reverse of this in the Precise and Pop cases? Also, why 
is it necessary to call getCurrentFPEvalMethod() immediately after setting it?



Comment at: clang/test/CodeGen/eval-method-fast-math.c:55
+  return __FLT_EVAL_METHOD__;
+}

Can you add a test that uses the float_control push and pop?
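Something along these lines is what I have in mind (a sketch only; the exact
expected values are whatever this patch defines):

  #pragma float_control(push)
  #pragma float_control(precise, off)
  int eval_method_imprecise(void) {
    return __FLT_EVAL_METHOD__;   // expect -1 while precise is off
  }
  #pragma float_control(pop)

  int eval_method_restored(void) {
    return __FLT_EVAL_METHOD__;   // expect the TU's original value again
  }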


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121122/new/

https://reviews.llvm.org/D121122



[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-07 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D120395#3359255, @pengfei wrote:

> I don't agree. Unlike `__fp16`, `__bf16` is simple an ARM specific type.

Why is __bf16 an ARM-specific type? It's a type that describes a floating point 
value with a specific binary representation that is supported on some ARM and 
some Intel processors. Why should the type be ARM-specific? What the clang 
documentation says is just a description of the current implementation. We can 
change the documentation based on what the compiler supports.

In D120395#3359255, @pengfei wrote:

> So, it is not a problem of easy or difficult. It's a wrong direction we can't 
> go with it (introducing new IR type without clear ABI declarations). We made 
> such mistake with the `half` type. https://godbolt.org/z/K55s5zqPG We 
> shouldn't make it again.

If people are writing code that uses variables of this type, we aren't really 
solving anything by pretending it's a different type because the ABI for the 
type they're using hasn't been defined. The solution here is to work with HJ 
and others to get the ABI defined for this type.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395



[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-07 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D120395#3358533, @craig.topper wrote:

> __m256bh should not have been a new type. It should have been an alias of 
> __m256i. We don't have load/store intrinsics for __m256bh so if you can even 
> get the __m256bh type in and out of memory using load/store intrinsics, it is 
> only because we allow lax vector conversion by default. 
> -fno-lax-vector-conversions will probably break any code trying to load/store 
> it using a load/store intrinsic. If __m256bh was made a struct as at one 
> point proposed, this would have been broken.
>
> If we want __m256bh to be a unique type using __bf16, we must define load, 
> store, and cast intrinsics for it. We would probably want insert/extract 
> element intrinsics as well.

From this, it sounds like our intrinsics support is incomplete for this type. 
Even if it is defined as an alias of some existing type (such as __m256i), I 
would have to do the mental gymnastics of mixing and matching intrinsics to get 
the behavior I want. I think this gets back to the question @scanon asked about 
the semantics of this type. What should I be able to do with it? Can I load a 
vector of these values from memory? Can I store them to memory? Can I assign 
one vector to another? Can I pass it as an argument? Can I use it as a return 
type? It looks the only things we have intrinsics for (for the vector types) 
are converting to and from single precision vectors and performing the dot 
product accumulate operation.

I was wondering about similar issues when I was looking at what we had for the 
__bfloat16 type. We have intrinsics to convert this type to and from single 
precision floating point values, but I can't do anything else with it. Nothing 
else at all, including inserting it into a vector of bf16 values.

So @pengfei is trying to press ahead with the backend implementation, but our 
front end support is incomplete. That might explain why Phoebe and I haven't 
been able to agree on what should be done here.

This patch is strictly a front end patch, but it's trying to just wedge 
definitions into header files to get the desired outcome in the code 
generation. From the user's perspective, it feels totally broken.

Consider this function.

  __m128 f(__bfloat16 *p1, __bfloat16 *p2) {
    // Load the vectors using the integer load intrinsics??
    __m128i temp1 = _mm_loadu_epi32(p1);
    __m128i temp2 = _mm_loadu_epi32(p2);

    // Zero-initialize a base value vector
    __m128 base = _mm_set_ps1(0.0f);

    // Perform the dot product
    return _mm_dpbf16_ps(base, temp1, temp2);
  }

Is what you'd expect with the current definitions? It looks like it produces 
the instructions I expected, but with -fno-lax-vector-conversions I get an 
error unless I explicitly bitcast the arguments from `__m128i` to `__m128bh`.

I think that just brings me up to speed with what Craig was saying, right?

So at this point we have these options:

1. Make the `__m[128|256|512]bh` types aliases of `__m[128|256|512]i`
2. Deprecate the `__m[128|256|512]bh` types and replace them with 
`__m[128|256|512]i`
3. Add load/store/insert/extract intrinsics for the `__bfloat16` type

Of these, I'd prefer the third option because both of the first two require 
an overloaded use of the vector-integer type. I already don't like that we use 
the same type for any size integer vector. Using it for BF16 vectors just seems 
wrong.

For the example above, I'd like to write code like this:

  __m128 f(__bfloat16 *p1, __bfloat16 *p2) {
    // Load the BF16 vectors
    __m128bh v1 = _mm_load_pbh(p1);
    __m128bh v2 = _mm_load_pbh(p2);

    // Zero-initialize a base value vector
    __m128 base = _mm_set_ps1(0.0f);

    // Perform the dot product
    return _mm_dpbf16_ps(base, v1, v2);
  }

That's more work, but it has the merit of allowing me to use types that match 
what the program is doing.

The fact that you can't pass a __m128i value to a function that is expecting 
__m128bh is a good thing. We shouldn't be making changes that prevents this 
diagnostic.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395



[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-03 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D120395#3356355 , @pengfei wrote:

> Good question! This is actually in the scope of the ABI. Unfortunately, we 
> don't have a BF16 ABI at present. We can't assume which physical registers 
> the arguments are passed and returned in before we have such hardware. For 
> example, ARM has a soft FP ABI that supports FP arithmetic operations and 
> passes and returns arguments in integer registers. When we enable some ISA 
> set whose type doesn't have an ABI representation, e.g., F16C, we borrowed 
> that conception. And as a trade-off, we used integer rather than 
> introducing a new IR type, since we don't need to support the arithmetic 
> operations.

I don't see the point of the ARM soft-float comparison, given that X86 doesn't 
have the strict distinction between integer and floating point registers that 
ARM has, at least not for the XMM/YMM/ZMM registers. Consider the following 
code:

  __m128bh foo(__m128 x) {
return _mm_cvtneps_pbh(x);
  }
  __m128 bar(__m128bh x) {
return _mm_cvtpbh_ps(x);
  }

Currently, both clang and gcc will use XMM0 for the argument and return value 
in both functions. Is XMM0 an integer register or a floating point register? 
There is no such distinction. It's true that the x86_64 psABI does talk about 
the general purpose registers as integer registers, and both clang and gcc will 
use one of these registers for `__bfloat16` values, but that's an 
implementation detail (and a dubious one, considering that nearly anything 
useful that you can do with a `__bfloat16` will require moving it into an SSE 
register).

Also, you say we can't assume what registers will be used (in the eventual 
ABI?) but we are assuming exactly that. If the ABI is ever defined differently 
than what clang and gcc are currently doing, they will both be wrong.

But all of this only applies to the backend code generation. It has very little 
to do with the intrinsic definition in the header file or the IR generated by 
the front end. If we continue to define `__bfloat16` as an `unsigned short` in 
the header file, the front end will treat it as an `unsigned short` and it will 
use its rules for `unsigned short` to generate IR. If the ABI is ever defined 
to treat BF16 differently than `unsigned short`, the front end won't be able to 
do anything about that because we've told the front end that the value is an 
unsigned short.

On the other hand, if we define the `__bfloat16` type as the built-in `__bf16` 
type, then the front end can apply whatever rules it has for that type, 
including adding whatever ABI handling is needed for BF16 values. If that ends 
up being the same as the rules for `unsigned short`, that's no problem. The 
front end can implement it that way. If it ends up being something different, 
the front end can apply rules for whatever the alternative is. The point is, by 
telling the front end that this is a BF16 value, we allow the front end to 
control the semantics for it. This would then result in the front end 
generating IR using `bfloat` as the type for BF16 values. Again, this is a 
correct and accurate description of the value. It allows the optimizer to 
reason about it correctly in any way it needs to.

I don't see why we would treat BF16 values as `unsigned short` and `i16` 
throughout the compiler just to make the backend implementation easier when we 
already have types available for BF16.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-03-02 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.
Herald added a project: All.

In D120395#3346591 , @scanon wrote:

> There's a lot of churn around proposed "solutions" on this and related PR, 
> but not a very clear analysis of what the problem we're trying to solve is.

I thought the problem that the patch was originally trying to solve was that 
`__bfloat16` variables could be used in arithmetic operations (using the 
`__bfloat16` type defined in the avx512bf16intrin.h header). For example, clang 
currently compiles this code without any diagnostics if the target processor 
has the required features (avx512bf16 and avx512vl).

  #include <immintrin.h>
  float f(float x, float y) {
__bfloat16 x16 = _mm_cvtness_sbh(x);
__bfloat16 y16 = _mm_cvtness_sbh(y);
__bfloat16 z16 = x16 + y16;
return _mm_cvtsbh_ss(z16);
  }

https://godbolt.org/z/vcbcGsPPx

The problem is that the instructions generated for that code are completely 
wrong because `__bfloat16` is defined as `unsigned short`. It relies on the user 
knowing that they shouldn't use this type in arithmetic operations.
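
For contrast, the only way I see to get a numerically sensible result today is 
to round-trip through float explicitly, something like this sketch (untested):

  #include <immintrin.h>
  
  float f_correct(float x, float y) {
    __bfloat16 x16 = _mm_cvtness_sbh(x);
    __bfloat16 y16 = _mm_cvtness_sbh(y);
    // Widen both values to float, add there, then narrow the sum back to BF16.
    float sum = _mm_cvtsbh_ss(x16) + _mm_cvtsbh_ss(y16);
    return _mm_cvtsbh_ss(_mm_cvtness_sbh(sum));
  }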

Like I said, I //thought// that was the original intention of this patch. 
However, the latest version of the patch doesn't prevent this at all. In fact, 
it makes the problem worse by asking the user to define the BF16 variables as 
unsigned short in their code. Getting correct behavior from this point on is 
left entirely to the user.

@pengfei Please correct me if I misunderstood the purpose of this patch.

In D120395#3346591 , @scanon wrote:

> Concretely, what are the semantics that we want for the BF16 types and 
> intrinsics? Unlike the other floating-point types, there's no standard to 
> guide this, so it's even more important to clearly specify how these types 
> are to be used, instead of having an ad-hoc semantics of whatever someone 
> happens to implement.

The binary representation of a BF16 value (such as the value returned by 
_mm_cvtness_sbh) is, as Phoebe mentioned, the "brain floating point type" as 
described here: https://en.wikichip.org/wiki/brain_floating-point_format

Unfortunately, what you can do with it seems to depend on the target 
architecture. For very recent x86 processors, you can convert vectors of this 
type to and from single precision floating point and you can do a SIMD dot 
product and accumulate operation (VDPBF16PS), but the only way to do this is 
with intrinsics. Some ARM processors support other operations, but I think with 
similar restrictions (i.e. only accessible through intrinsics). Apart from 
intrinsics, it is treated as a storage-only type.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D120411: [X86] Replace __m[128|256|512]bh with __m[128|256|512]i and mark the former deprecated

2022-02-25 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor requested changes to this revision.
andrew.w.kaylor added a comment.
This revision now requires changes to proceed.

Replacing `__m128bh` with `__m128i` does not prevent arithmetic operations on 
the type.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120411/new/

https://reviews.llvm.org/D120411

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-02-25 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/Headers/avx512bf16intrin.h:37
 ///and fraction field is extended to 23 bits.
-static __inline__ float __DEFAULT_FN_ATTRS _mm_cvtsbh_ss(__bfloat16 __A) {
+static __inline__ float __DEFAULT_FN_ATTRS _mm_cvtsbh_ss(unsigned short __A) {
   return __builtin_ia32_cvtsbf162ss_32(__A);

Are we trying to make our intrinsics weakly typed? I don't like this change at 
all.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

2022-02-25 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D120395#3344890 , @pengfei wrote:

> Discussed with GCC folks. We think it's better to use the same approach as 
> D120411 , that is, replacing it with short int.

Which GCC folks did you discuss it with? I ask because GCC currently considers 
__m128bh and __m128i to be incompatible types, which is what I would expect: 
https://godbolt.org/z/z9nefbrEc
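
For anyone who doesn't want to follow the link, the incompatibility I'm 
referring to has roughly this shape (a minimal reconstruction, not the exact 
godbolt source; GCC rejects the implicit conversion between the two vector 
types):

  #include <immintrin.h>
  
  __m128bh as_bf16_vector(__m128i v) {
    return v;   // rejected: __m128i and __m128bh are distinct vector types
  }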


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D120395: [X86] Prohibit arithmatic operations on type `__bfloat16`

2022-02-24 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D120395#3340953 , @craig.topper 
wrote:

> These intrinsics pre-date the existence of the bfloat type in LLVM. To use 
> bfloat we have to make __bf16 a legal type in C. This means we need to 
> support loads, stores, and arguments of that type. I think that would create 
> a bunch of backend complexity because we don't have good 16-bit load/store 
> support to XMM registers. I think we only have a load that inserts into a 
> specific element. It's doable, but I'm not sure what we gain from it.

My motivation for wanting to use 'bfloat' started with wanting to use '__bf16' 
as the front end type. It just doesn't make sense to me to define a new type 
when we have an existing built-in type that has the same semantics and binary 
representation. The argument for introducing a new IR type was made here: 
https://reviews.llvm.org/D76077 It doesn't seem like a particularly strong 
argument, but it's what was decided then. Using bfloat rather than i16 in the 
IR has the benefit that it expresses what the type actually is instead of just 
using something that has the same size. Using i16, the semantics of the type 
are known only to the front end and we have to rely on what the front end did 
for enforcement of the semantics. That's generally going to be OK, but it seems 
to me like it works for the wrong reason. That is, i16 is not a storage-only 
type and the only reason we don't notice is that the front end doesn't generate 
IR that violates the implicit semantics of the type.

I think there's a minor argument to be made concerning TBAA (short and 
__bfloat16 look like compatible types). Perhaps a more significant argument is 
that using the __bf16 built-in type would allow us to define a type like 
__m256bh like this:

  typedef __bf16 __m256bh __attribute__((__vector_size__(32), __aligned__(32)));

So my question would be, how much work are we talking about to make this work 
with the x86 backend?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D120395: [X86] Prohibit arithmatic operations on type `__bfloat16`

2022-02-23 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D120395#3340496 , @pengfei wrote:

> Update LangRef. We use `i16` type to represent bfloat16.

Why are we using i16 to represent bfloat16? The bfloat type is better.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D114564: Fix the use of -fno-approx-func along with -Ofast or -ffast-math

2022-01-18 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.
This revision is now accepted and ready to land.

lgtm




Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2760
 case options::OPT_fno_honor_nans:   HonorNaNs = false;break;
 case options::OPT_fapprox_func: ApproxFunc = true;break;
 case options::OPT_fno_approx_func:  ApproxFunc = false;   break;

joerg wrote:
> masoud.ataei wrote:
> > masoud.ataei wrote:
> > > andrew.w.kaylor wrote:
> > > > Should this also imply "MathErrno = false"?
> > > I don't think setting ApproxFunc to true should imply "MathErrno = 
> > > false". 
> > > 
> > > Let's say someone has a math library that computes approximate results 
> > > for the non-special inputs/outputs but returns NaN, INF, and errno 
> > > correctly otherwise. That can actually be fairly common, because 
> > > performance in the non-special cases is much more important than in the 
> > > special ones. So returning errno for the special outputs theoretically 
> > > should not affect the performance on the main path. Therefore, I think 
> > > the compiler should not assume anything about the MathErrno value based 
> > > on the ApproxFunc value.
> > I am not sure whether what I was suggesting in my last comment is correct. 
> > Can one of the more experienced reviewers confirm?
> > The question is: should "ApproxFunc=true" imply "MathErrno=false"?
> They seem pretty orthogonal to me.
I see the point of your explanation. I'm satisfied that they should remain 
separate.
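
To make the scenario above concrete, the kind of library being described might 
look like this hypothetical sketch (approx_sqrtf is an assumed helper, not a 
real API):

  #include <errno.h>
  #include <math.h>
  
  float approx_sqrtf(float x);  // assumed fast, approximate implementation
  
  float my_sqrtf(float x) {
    if (x < 0.0f) {
      errno = EDOM;          // special case still reports errno correctly
      return NAN;
    }
    return approx_sqrtf(x);  // approximate result on the hot path
  }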


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114564/new/

https://reviews.llvm.org/D114564

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D114564: Fix the use of -fno-approx-func along with -Ofast or -ffast-math

2021-11-24 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

Thanks for the patch! This looks mostly good. I have just a few suggestions.

Could you add test cases in clang/test/Driver/clang_f_opts.c to verify that the 
various driver inputs get overridden in the expected way? Without such a test, 
this behavior is likely to be broken in the future.




Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2760
 case options::OPT_fno_honor_nans:   HonorNaNs = false;break;
 case options::OPT_fapprox_func: ApproxFunc = true;break;
 case options::OPT_fno_approx_func:  ApproxFunc = false;   break;

Should this also imply "MathErrno = false"?



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2762
 case options::OPT_fno_approx_func:  ApproxFunc = false;   break;
 case options::OPT_fmath_errno:  MathErrno = true; break;
 case options::OPT_fno_math_errno:   MathErrno = false;break;

Should this conflict with -fapprox-func?



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2880
   break;
 case options::OPT_fno_unsafe_math_optimizations:
   AssociativeMath = false;

I think you need "ApproxFunc = false" here.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2937
   // subsequent options conflict then emit warning diagnostic.
   if (HonorINFs && HonorNaNs && !AssociativeMath && !ReciprocalMath &&
   SignedZeros && TrappingMath && RoundingFPMath &&

You need a check for "!ApproxFunc" here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114564/new/

https://reviews.llvm.org/D114564

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D107994: Making the code compliant to the documentation about Floating Point support default values for C/C++. FPP-MODEL=PRECISE enables FFP-CONTRACT (FMA is enabled).

2021-11-11 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor closed this revision.
andrew.w.kaylor added a comment.

Closed by commit rGf04e387055e495e3e14570087d68e93593cf2918.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107994/new/

https://reviews.llvm.org/D107994

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D107994: Making the code compliant to the documentation about Floating Point support default values for C/C++. FPP-MODEL=PRECISE enables FFP-CONTRACT (FMA is enabled).

2021-11-10 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.

lgtm


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107994/new/

https://reviews.llvm.org/D107994

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D107994: Making the code compliant to the documentation about Floating Point support default values for C/C++. FPP-MODEL=PRECISE enables FFP-CONTRACT (FMA is enabled).

2021-11-09 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/test/CodeGen/ffp-model.c:4
+// RUN: | FileCheck %s \
+// RUN: --check-prefixes=CHECK,CHECK-FAST --strict-whitespace
+

Why did you add the strict-whitespace option?



Comment at: clang/test/CodeGen/ffp-model.c:10
+
+// RUN: %clang -S -emit-llvm -ffp-model=strict %s -o - \
+// RUN: | FileCheck %s \

This is still going to have problems with targets that don't support strictfp 
unless you add an explicit target to the run line.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107994/new/

https://reviews.llvm.org/D107994

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D112094: Add support for floating-point option `ffp-eval-method` and for `pragma clang fp eval_method`.

2021-11-01 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.
Herald added a subscriber: luke957.

I agree that the pragma should also control the evaluation of constant 
expressions.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112094/new/

https://reviews.llvm.org/D112094

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74436: Change clang option -ffp-model=precise to select ffp-contract=on

2021-08-05 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

FWIW, fp-contract=on has been the documented default for clang since version 5.

https://releases.llvm.org/5.0.1/tools/clang/docs/ClangCommandLineReference.html#cmdoption-clang-ffp-contract

This change just brought the behavior into conformance with the documentation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74436/new/

https://reviews.llvm.org/D74436

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99675: RFC [llvm][clang] Create new intrinsic llvm.arith.fence to control FP optimization at expression level

2021-04-06 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D99675#2671924 , @efriedma wrote:

>> The expression “llvm.arith.fence(a * b) + c” means that “a * b” must happen 
>> before “+ c” and FMA guarantees that, but to prevent later optimizations 
>> from unpacking the FMA the correct transformation needs to be:
>>
>> llvm.arith.fence(a * b) + c  →  llvm.arith.fence(FMA(a, b, c))
>
> Does this actually block later transforms from unpacking the FMA?  Maybe if 
> the FMA isn't marked "fast"...

Later transforms could unpack the FMA, but the result would be fenced. The 
intent isn't so much to prevent the FMA from being unpacked as to prevent 
losing the original fence semantics. That said, it doesn't quite work. For 
example, you might have this:

  %mul = fmul fast float %a, %b
  %fenced_mul = call float @llvm.arith.fence.f32(%mul)
  %result = fadd fast float %fenced_mul, %c

If there are no other uses of %fenced_mul, that could become

  %tmp = call fast float @llvm.fmuladd.f32(float %a, float %b, float %c)
  %result = call float @llvm.arith.fence.f32(%tmp)

If a later optimization decided to unpack this, it would become this:

  %mul = fmul fast float %a, %b
  %tmp = fadd fast float %mul, %c
  %result = call float @llvm.arith.fence.f32(%tmp)

I suggested this as a way of enabling the FMA optimization. It brings the fadd 
into the fence, but still protects the fmul from being reassociated or 
otherwise transformed with other operations outside the fence. In a purely 
practical sense, this would probably work. In a more strict sense, though, I 
now see that it has the problem that you could legally distribute the addition 
within the fence. I can't see a practical reason anyone would do that, but the 
semantics would allow it. The same ("legal but not practical") is true of 
forming the fmuladd intrinsic before codegen, I think.

So, no, I don't think this works the way it was intended.

That might push us back to Kevin's suggestion of just not allowing the FMA 
optimization across a fence.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99675/new/

https://reviews.llvm.org/D99675

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72675: [Clang][Driver] Fix -ffast-math/-ffp-contract interaction

2020-03-12 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D72675#1920309 , @lebedev.ri wrote:

> I may be wrong, but i suspect those failures aren't actually due to the fact
>  that we pessimize optimizations with this change, but that the whole 
> execution
>  just fails. Can you try running test-suite locally? Do tests themselves 
> actually pass,
>  ignoring the question of their performance?


I find the LNT output very hard to decipher, but I thought that all of the 
failures on the Broadwell (x86) LNT bot were just performance regressions. 
There were many perf improvements and also many regressions. I investigated the 
top regression and found that the loop unroller made a different decision when 
the llvm.fmuladd intrinsic was used than it did for separate mul and add 
operations. The central loop of the test was unrolled eight times rather than 
four. Broadwell gets less benefit from FMA than either Haswell or Skylake, so 
any other factors that might drop performance are less likely to be mitigated 
by having fused these operations. In a more general sense, I don't see any 
reason apart from secondary changes in compiler behavior like this that 
allowing FMA would cause performance to drop.

At least one other target had execution failures caused by Melanie's change, 
but I understood it to be simply exposing a latent problem in the 
target-specific code.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72675/new/

https://reviews.llvm.org/D72675



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74500: clang: Treat ieee mode as the default for denormal-fp-math

2020-03-04 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.
This revision is now accepted and ready to land.

lgtm


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74500/new/

https://reviews.llvm.org/D74500



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74436: Change clang option -ffp-model=precise to select ffp-contract=on

2020-02-14 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D74436#1875315 , @nemanjai wrote:

> > You're right, -O0 shouldn't generate FMA. I'm preparing to revert this now 
> > -- just verifying the build.
>
> Perhaps this should be
>  `off` with no optimization
>  `on` with `-O1/-O2/-O3/-Os/-Oz`
>  `fast` with fast math
>
> Just a suggestion, I'm not sure whether that would be the best breakdown. 
> Perhaps we can also see what the defaults are for GCC and unify with those?


GCC doesn't support "on" so I'm not sure it's possible to distinguish their 
intentions. I think it would be better to define what we think is best for 
clang and encourage GCC to unify with that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74436/new/

https://reviews.llvm.org/D74436



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74436: Change clang option -ffp-model=precise to select ffp-contract=on

2020-02-14 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a subscriber: scanon.
andrew.w.kaylor added a comment.

In D74436#1875320 , @mibintc wrote:

> However you are right we don't want the frontend to create FMA when 
> optimizations are disabled -O0


I've been discussing this with @scanon on the cfe-dev mailing list, and he has 
convinced me that we should create FMA operations at -O0 if the fp-contract 
setting is "on" -- including when it is on by default. The reasoning that 
persuaded me was, "preserving FMA formation at O0 _helps_ debuggability, 
because it means that numerical behavior is more likely to match what a user 
observed at Os, allowing them to debug the problem."
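
As a concrete illustration of what that means (a sketch; the exact IR shape 
depends on the clang version), with contraction on the front end is expected 
to emit the fmuladd intrinsic even without optimization:

  // clang -O0 -ffp-contract=on -S -emit-llvm mac.c
  float mac(float a, float b, float c) {
    return a * b + c;   // emitted as a call to llvm.fmuladd.f32, not fmul+fadd
  }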


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74436/new/

https://reviews.llvm.org/D74436



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74436: Change clang option -ffp-model=precise to select ffp-contract=on

2020-02-13 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a subscriber: MaskRay.
andrew.w.kaylor added a comment.

In D74436#1875386 , @thakis wrote:

> The revert of this breaks tests everywhere, as far as I can tell.


It looks like something strange happened with the revert:

> clang-11: warning: overriding '-ffp-model=strict' option with 
> '-ffp-model=strict' [-Woverriding-t-option]

I believe the problem is that the original change that was being reverted 
contained this:

  clang/lib/Driver/ToolChains/Clang.cpp 
  @@ -2768,7 +2766,7 @@ static void RenderFloatingPointOptions(const ToolChain 
  &TC, const Driver &D,
  !AssociativeMath && !ReciprocalMath &&
  SignedZeros && TrappingMath && RoundingFPMath &&
  DenormalFPMath != llvm::DenormalMode::getIEEE() &&
  +FPContract.empty())
  -(FPContract.equals("off") || FPContract.empty()))

But sometime in the land-revert-land-revert cycle the line above that changed, 
causing the merge to miss this change in the most recent revert. I see that 
@MaskRay has since re-landed this change set, but it's going to cause problems 
for PowerPC. If someone needs to revert this yet again, I think it can be 
safely done by recovering the change above.

Apologies for the mess!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74436/new/

https://reviews.llvm.org/D74436



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74436: Change clang option -ffp-model=precise to select ffp-contract=on

2020-02-12 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

lgtm


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74436/new/

https://reviews.llvm.org/D74436



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D74436: Change clang option -ffp-model=precise to select ffp-contract=on

2020-02-11 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/docs/UsersManual.rst:1388
 
-   * ``precise`` Disables optimizations that are not value-safe on 
floating-point data, although FP contraction (FMA) is enabled 
(``-ffp-contract=fast``).  This is the default behavior.
* ``strict`` Enables ``-frounding-math`` and 
``-ffp-exception-behavior=strict``, and disables contractions (FMA).  All of 
the ``-ffast-math`` enablements are disabled.

lebedev.ri wrote:
> I'm confused. Where in this patch the `This patch establishes the default 
> option for -ffp-model to select "precise".` happens? LHS of diff says it is 
> already default
The comments said that it was the default, but the actual default was something 
that didn't quite match any of the fp-models -- precise but with fp contraction 
off.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2548
   DenormalFPMath = DefaultDenormalFPMath;
   FPContract = "";
   StringRef Val = A->getValue();

I think this always gets changed to fast, on, or off below, but making it empty 
here looks wrong.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2661
 // the FPContract value has already been set to a string literal
 // and the Val string isn't a pertinent value.
 ;

Does this mean that "-ffp-model=precise -ffp-contract=off" will leave FP 
contraction on? That doesn't seem right.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2769
 DenormalFPMath != llvm::DenormalMode::getIEEE() &&
-FPContract.empty())
+(FPContract.equals("off") || FPContract.empty()))
 // OK: Current Arg doesn't conflict with -ffp-model=strict

We should never get here with FPContract empty.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74436/new/

https://reviews.llvm.org/D74436



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69978: Separately track input and output denormal mode

2020-01-31 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.
This revision is now accepted and ready to land.

lgtm


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69978/new/

https://reviews.llvm.org/D69978



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69978: Separately track input and output denormal mode

2020-01-29 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: llvm/docs/LangRef.rst:1829
+   operations. The second indicates the handling of denormal inputs to
+   floating point instructions.
+

Based on the changes below, if the second value is omitted the input mode will 
be assumed to be the same as the output mode. That should probably be 
documented. I guess you intend for that not to happen, but the documentation 
here leaves the result ambiguous if it does happen.



Comment at: llvm/docs/LangRef.rst:1848
+   should be converted to 0 as if by ``@llvm.canonicalize`` during
+   lowering.
+

Is this saying that if a backend generates an instruction that doesn't handle 
the hardware daz mode then it must insert instructions to check for denormals 
and convert them to zero? If so, do you intend this to apply to all such 
instructions or only instructions that aren't able to accept denormal inputs?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69978/new/

https://reviews.llvm.org/D69978



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72906: [X86] Improve X86 cmpps/cmppd/cmpss/cmpsd intrinsics with strictfp

2020-01-29 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.
This revision is now accepted and ready to land.

lgtm

I have a couple of comments, but nothing that couldn't be addressed in a later 
patch.




Comment at: clang/lib/CodeGen/CGBuiltin.cpp:12363
+Cmp = Builder.CreateFCmp(Pred, Ops[0], Ops[1]);
   return EmitX86MaskedCompareResult(*this, Cmp, NumElts, Ops[3]);
 }

How hard would it be to generate a select with known safe values ahead of the 
compare in the constrained case? 



Comment at: clang/test/CodeGen/avx-builtins-constrained.c:170
+  // CHECK-LABEL: test_mm256_cmp_pd_false_os
+  // CHECK: call <4 x double> @llvm.x86.avx.cmp.pd.256(<4 x double> %{{.*}}, 
<4 x double> %{{.*}}, i8 27)
+  return _mm256_cmp_pd(a, b, _CMP_FALSE_OS);

Does this have the strictfp attribute here? I don't think we do anything with 
that, but it will likely be useful when we start handling strictfp for 
target-specific intrinsics.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72906/new/

https://reviews.llvm.org/D72906



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72841: [RFC] Add support for pragma float_control, to control precision and exception behavior at the source level

2020-01-27 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

It's not clear to me from reading this how the "precise" control is going to 
work in relation to the fast math flags. I don't think MSVC allows the 
granularity of control over these flags that we have in clang, so maybe they 
don't have this problem.

Consider this code: https://godbolt.org/z/mHiLCm

With the options "-ffp-model=precise -fno-honor-infinities -fno-honor-nans" the 
math operations here are generated with the "nnan ninf contract" flags. That's 
correct. What will happen when I use the pragma to turn precise off? Does it 
enable all fast math flags? Will the subsequent "pop" leave the "ninf nnan" 
fast math flags enabled?

As I said, I don't think you can get into this situation with MSVC. I believe 
that icc will go into full fast math mode with the "precise, off, push" pragma 
but will go back to "nnan ninf contract" mode with the pop. At least, that's 
what the design docs say. I haven't looked at the code to verify this. It seems 
like the correct behavior in any case. I think the clang FPOptions needs 
individual entries for all of the fast math flags to handle this case.
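
To make the question concrete, the scenario I have in mind is roughly this 
(a reconstruction of the godbolt example above, not the exact source):

  // Compiled with: -ffp-model=precise -fno-honor-infinities -fno-honor-nans
  #pragma float_control(precise, off, push)
  float fast_region(float a, float b, float c) {
    return a * b + c;   // which fast-math flags does this get?
  }
  #pragma float_control(pop)
  
  float precise_again(float a, float b, float c) {
    return a * b + c;   // does the pop restore "nnan ninf contract" here?
  }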




Comment at: clang/docs/LanguageExtensions.rst:3042
+by the pragma behaves as though the command-line option ``ffp-model=precise``
+is enabled.  That is, fast-math is disabled and fp-contract=on (fused
+multiple add) is enabled.

Re  "fp-contraction=on": I agree that this is what it should do, but I don't 
think that's what fp-model=precise currently does. I think it's setting 
fp-contraction=fast.



Comment at: clang/docs/LanguageExtensions.rst:3043
+is enabled.  That is, fast-math is disabled and fp-contract=on (fused
+multiple add) is enabled.
+

s/multiple/multiply



Comment at: clang/docs/LanguageExtensions.rst:3050
+
+The full syntax this pragma supports is ``float_control(except|precise, 
on|off, [push])`` and ``float_control(push|pop)``.  The ``push`` and ``pop`` 
forms can only occur at file scope.
+

Looks like you need a line break here.



Comment at: clang/docs/LanguageExtensions.rst:3050
+
+The full syntax this pragma supports is ``float_control(except|precise, 
on|off, [push])`` and ``float_control(push|pop)``.  The ``push`` and ``pop`` 
forms can only occur at file scope.
+

andrew.w.kaylor wrote:
> Looks like you need a line break here.
Are the precise and except stacks independent?



Comment at: clang/include/clang/Basic/DiagnosticParseKinds.td:1112
   InGroup;
 // - #pragma fp_contract
+def err_pragma_file_or_compound_scope : Error<

This comment is wrong after your change.



Comment at: clang/include/clang/Basic/DiagnosticParseKinds.td:1135
+def err_pragma_float_control_malformed : Error<
+  "pragma float_control is malformed; it requires one or two comma-separated "
+  "arguments">;

This isn't quite accurate. The pop case has no comma-separated arguments. It 
might be better to print the full syntax here if that's feasible.



Comment at: clang/include/clang/Basic/LangOptions.h:363
+exceptions(LangOptions::FPE_Ignore),
+fp_precise(false)
 {}

It seems like fp_precise describes too many things to be a single option. Even 
within this set of options it overlaps with fp_contract.



Comment at: clang/test/CodeGen/fp-floatcontrol-stack.cpp:124
+#if DEFAULT
+//CHECK-DDEFAULT: fmul float
+//CHECK-DDEFAULT: fadd float

Are there also fast-math flags set here? If not, why not?



Comment at: clang/test/CodeGen/fp-floatcontrol-stack.cpp:2
+// RUN: %clang_cc1 -DDEFAULT=1 -emit-llvm -o - %s | FileCheck 
--check-prefix=CHECK-DDEFAULT %s
+// RUN: %clang_cc1 -DEBSTRICT=1 -ffp-exception-behavior=strict -emit-llvm -o - 
%s | FileCheck --check-prefix=CHECK-DEBSTRICT %s
+

Can you add run lines for -ffast-math and (separately) "-fno-honor-nans 
-fno-honor-infinities"?



Comment at: clang/test/CodeGen/fp-floatcontrol-stack.cpp:17
+// for exception behavior and rounding mode.
+//CHECK-DEBSTRICT: 
llvm.experimental.constrained.fmul{{.*}}tonearest{{.*}}strict
+#endif

There should be a constrained fadd here also, right?



Comment at: clang/test/CodeGen/fp-floatcontrol-stack.cpp:52
+#if EBSTRICT
+//CHECK-DEBSTRICT: 
llvm.experimental.constrained.fmul{{.*}}tonearest{{.*}}ignore
+#endif

Why is the constrained intrinsic generated in this case? If we've got both 
constraints set to the defaults at the file scope I would have expected that to 
turn off the constrained mode.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72841/new/

https://reviews.llvm.org/D72841



___
cfe-commits mailing list

[PATCH] D72675: [Clang][Driver] Fix -ffast-math/-ffp-contract interaction

2020-01-24 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2792
   if (!MathErrno && AssociativeMath && ReciprocalMath && !SignedZeros &&
-  !TrappingMath)
+  !TrappingMath && !(FPContract.equals("off") || FPContract.equals("on")))
 CmdArgs.push_back("-menable-unsafe-fp-math");

I think this would read better as "... && !FPContract.equals("off") && 
!FPContract.equals("on")" The '||' in the middle of all these '&&'s combined 
with the extra parens from the function call trips me up.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2846
   ReciprocalMath && !SignedZeros && !TrappingMath && !RoundingFPMath) {
-CmdArgs.push_back("-ffast-math");
+if (!(FPContract.equals("off") || FPContract.equals("on")))
+  CmdArgs.push_back("-ffast-math");

As above, I'd prefer "!off && !on".



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2854
 // Enable -ffp-contract=fast
 CmdArgs.push_back(Args.MakeArgString("-ffp-contract=fast"));
   else

This is a bit of an oddity in our handling. The FPContract local variable 
starts out initialized to an empty string. So, what we're saying here is that 
if you've used all the individual controls to set the rest of the fast math 
flags, we'll turn on fp-contract=fast also. That seems wrong. If you use 
"-ffast-math", FPContract will have been set to "fast" so that's not applicable 
here.

I believe this means

```
-fno-honor-infinities -fno-honor-nans -fno-math-errno -fassociative-math 
-freciprocal-math -fno-signed-zeros -fno-trapping-math -fno-rounding-math

```

will imply "-ffp-contract=fast". Even worse:

```
-ffp-contract=off -fno-fast-math -fno-honor-infinities -fno-honor-nans 
-fno-math-errno -fassociative-math -freciprocal-math -fno-signed-zeros 
-fno-trapping-math -fno-rounding-math

```
will imply "-ffp-contract=fast" because "-fno-fast-math" resets FPContract to 
empty.

I think we should initialize FPContract to whatever we want to be the default 
(on?) and remove the special case for empty here. Also, -fno-fast-math should 
either not change FPContract at all or set it to "off". Probably the latter 
since we're saying -ffast-math includes -ffp-contract=on.



Comment at: clang/test/Driver/fast-math.c:196
+// RUN: %clang -### -ffast-math -ffp-contract=fast -c %s 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-UNSAFE-MATH %s
+//

What about "-ffp-contract=off -ffast-math"? The way the code is written that 
will override the -ffp-contract option. That's probably what we want, though it 
might be nice to have a warning.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72675/new/

https://reviews.llvm.org/D72675



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72675: [Clang][Driver] Fix -ffast-math/-ffp-contract interaction

2020-01-24 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D72675#1839492 , @wristow wrote:

> 1. Should we enable FMA "by default" at (for example) '-O2'?


We recently introduced a new option "-ffp-model=[precise|fast|strict]", which is 
supposed to serve as an umbrella option for the most common combination of 
options. I think our default behavior should be equivalent to 
-ffp-model=precise, which enables contraction. It currently seems to enable 
-ffp-contract=fast in the precise model (https://godbolt.org/z/c6w8mJ). I'm not 
sure that's what it should be doing, but whatever the precise model does should 
be our default.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72675/new/

https://reviews.llvm.org/D72675



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72820: [FPEnv] Add pragma FP_CONTRACT support under strict FP.

2020-01-17 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/CodeGen/CGExprScalar.cpp:3436
+}
+  }
+

cameron.mcinally wrote:
> andrew.w.kaylor wrote:
> > cameron.mcinally wrote:
> > > cameron.mcinally wrote:
> > > > cameron.mcinally wrote:
> > > > > I don't think it's safe to fuse a FMUL and FADD if the intermediate 
> > > > > rounding isn't exactly the same as those individual operations. 
> > > > > FMULADD doesn't guarantee that, does it?
> > > > To be clear, we could miss very-edge-case overflow/underflow exceptions.
> > > Ah, but I see C/C++ FP_CONTRACT allows the exceptions to be optimized 
> > > away. Sorry for the noise.
> > We've talked about this before but I don't think we ever documented a 
> > decision as to whether we want to allow constrained intrinsics and fast 
> > math flags to be combined. This patch moves that decision into clang's 
> > decision to generate this intrinsic or not.
> > 
> > I think it definitely makes sense in the case of fp contraction, because 
> > even if a user cares about value safety they might want FMA, which is 
> > theorectically more accurate than the separate values even though it 
> > produces a different value. This is consistent with gcc (which produces FMA 
> > under "-ffp-contract=fast -fno-fast-math") and icc (which produced FMA 
> > under "-fp-model strict -fma").
> > 
> > For the record, I also think it makes sense to use nnan, ninf, and nsz with 
> > constrained intrinsics.
> You had me until:
> 
> >For the record, I also think it makes sense to use nnan, ninf, and nsz with 
> >constrained intrinsics.
> 
> To be clear, we'd need them for the `fast` case, but I don't see a lot of 
> value for the `strict` case.
> 
> We definitely want reassoc/recip/etc for the `optimized but trap-safe` case, 
> so that's enough to require FMF flags on constrained intrinsics alone.
> 
> We should probably break this conversation out into an llvm-dev thread...
I agree about starting an llvm-dev thread. I'll send something out unless 
you've already done so by the time I finish typing it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72820/new/

https://reviews.llvm.org/D72820



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-17 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.
This revision is now accepted and ready to land.

I don't know if there were other reviewers who haven't commented on how you 
addressed their concerns, but this looks good to me.

Thanks for taking the time to improve our handling of this!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72820: [FPEnv] Add pragma FP_CONTRACT support under strict FP.

2020-01-16 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/CodeGen/CGExprScalar.cpp:3436
+}
+  }
+

cameron.mcinally wrote:
> cameron.mcinally wrote:
> > cameron.mcinally wrote:
> > > I don't think it's safe to fuse a FMUL and FADD if the intermediate 
> > > rounding isn't exactly the same as those individual operations. FMULADD 
> > > doesn't guarantee that, does it?
> > To be clear, we could miss very-edge-case overflow/underflow exceptions.
> Ah, but I see C/C++ FP_CONTRACT allows the exceptions to be optimized away. 
> Sorry for the noise.
We've talked about this before but I don't think we ever documented a decision 
as to whether we want to allow constrained intrinsics and fast math flags to be 
combined. This patch moves that decision into clang's decision to generate this 
intrinsic or not.

I think it definitely makes sense in the case of fp contraction, because even 
if a user cares about value safety they might want FMA, which is theoretically 
more accurate than the separate values even though it produces a different 
value. This is consistent with gcc (which produces FMA under 
"-ffp-contract=fast -fno-fast-math") and icc (which produced FMA under 
"-fp-model strict -fma").

For the record, I also think it makes sense to use nnan, ninf, and nsz with 
constrained intrinsics.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72820/new/

https://reviews.llvm.org/D72820



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-16 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2311
+  bool TrappingMath = true;
 // overriden by ffp-exception-behavior?
   bool RoundingFPMath = false;

arsenm wrote:
> cameron.mcinally wrote:
> > Last line of comment was not removed.
> > 
> > Also, is it safe to remove `TrappingMathPresent`? Is that part of the 
> > work-in-progress to support `ffp-exception-behavior`?
> I think this is a rebase gone bad. The patch changing the strict math was 
> reverted and recommitted and I probably broke this
Looks like this is still wrong. You didn't intend to change either TrappingMath 
flag, did you?



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2657
   // -fno_unsafe_math_optimizations restores default denormal handling
-  DenormalFPMath = "";
+  DenormalFPMath = DefaultDenormalFPMath;
   break;

Shouldn't this also restore DenormalFP32Math to its default value?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72820: [FPEnv] Add pragma FP_CONTRACT support under strict FP.

2020-01-16 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/CodeGen/CGExprScalar.cpp:3381
+ Addend->getType()),
+{MulOp0, MulOp1, Addend, MulOp->getOperand(2), MulOp->getOperand(3)});
+  else

You shouldn't just assume that MulOp is a constrained intrinsic. Cast to 
ConstrainedFPIntrinsic and use ConstrainedFPIntrinsic::getRoundingMode() and 
ConstrainedFPIntrinsic::getExceptionBehavior(). The cast will effectively 
assert that MulOp is a constrained intrisic. I think that should always be true.



Comment at: clang/lib/CodeGen/CGExprScalar.cpp:3423
 
+  if (Builder.getIsFPConstrained()) {
+if (auto *LHSBinOp = dyn_cast<llvm::BinaryOperator>(op.LHS)) {

I don't think we should ever create non-constrained FMul instructions if 
Builder is in FP constrained mode, but you should assert that somewhere above. 
Maybe move this block above line 3409 and add:

assert(LHSBinOp->getOpcode() != llvm::Instruction::FMul && 
RHSBinOp->getOpcode() != llvm::Instruction::FMul);



Comment at: clang/test/CodeGen/constrained-math-builtins.c:160
+// CHECK: declare x86_fp80 
@llvm.experimental.constrained.fmuladd.f80(x86_fp80, x86_fp80, x86_fp80, 
metadata, metadata)
+};

I'd like to see a test that verifies the calls generated in the function and 
specifically a test that verifies that the constrained fneg is generated if 
needed.



Comment at: llvm/docs/LangRef.rst:16094
+
+The fourth and fifth arguments specifie the exception behavior as described
+above.

s/specifie/specify

s/the exception behavior/the rounding mode and exception behavior



Comment at: llvm/docs/LangRef.rst:16104
+
+  %0 = call float @llvm.experimental.constrained.fmuladd.f32(%a, %b, %c)
+

missing metadata arguments



Comment at: llvm/include/llvm/CodeGen/BasicTTIImpl.h:1515
  ConcreteTTI->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy);
+// FIXME: Is constrained intrinsic' cost equal to it's no strict one?
+if (IID == Intrinsic::experimental_constrained_fmuladd)

I don't think that matters. The cost calculation here is a conservative 
estimate based on the cost if we are unable to generate an FMA instruction. So 
a constrained fmuladd that can't be lowered to FMA will be lower the same way a 
contrained mul followed by a constrained add would be.



Comment at: llvm/include/llvm/CodeGen/ISDOpcodes.h:355
 
+/// FMULADD/STRICT_FMULADD - A intermediate node, made functions handle
+/// constrained fmuladd the same as other constrained intrinsics.

Something is wrong with this comment. I'm not sure what it's trying to say but 
the grammar is wrong.

After looking through the rest of the code, I think I understand what's going 
on. I think we need a verbose comment to explain it. Here's my suggestion

```
FMULADD/STRICT_FMULADD -- These are intermediate opcodes used to handle the 
constrained.fmuladd intrinsic. The FMULADD opcode only exists because it is 
required for correct macro expansion and default handling (which is never 
reached). There should never be a node with ISD::FMULADD. The STRICT_FMULADD 
opcode is used to allow selectionDAGBuilder::visitConstrainedFPIntrinsic to 
determine (based on TargetOptions and target cost information) whether the 
constrained.fmuladd intrinsic should be lowered to FMA or separate FMUL and 
FADD operations.
```
Having thought through that, however, it strikes me as a lot of overhead. Can 
we just add special handling for the constrained.fmuladd intrinsic and make the 
decision then to create either a STRICT_FMA node or separate STRICT_FMUL and 
STRICT_FADD?

The idea that ISD::FMULADD is going to exist as a defined opcode but we never 
intend to add any support for handling it is particularly bad.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72820/new/

https://reviews.llvm.org/D72820



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-10 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

cameron.mcinally wrote:
> andrew.w.kaylor wrote:
> > arsenm wrote:
> > > andrew.w.kaylor wrote:
> > > > arsenm wrote:
> > > > > andrew.w.kaylor wrote:
> > > > > > Can you document which targets do support the option? What happens 
> > > > > > if I try to use the option on a target where it is not supported?
> > > > > I'm not sure where to document this, or if/how/where to diagnose it. 
> > > > > I don't think the high level LangRef description is the right place 
> > > > > to discuss specific target handling.
> > > > > 
> > > > > Currently it won't error or anything. Code checking the denorm mode 
> > > > > will see the f32 specific mode, even if the target in the end isn't 
> > > > > really going to respect this.
> > > > > 
> > > > > One problem is this potentially does require coordination with other 
> > > > > toolchain components. For AMDGPU, the compiler can directly tell the 
> > > > > driver what FP mode to set on each entry point, but for x86 it 
> > > > > requires linking in crtfastmath to set the default mode bits. If 
> > > > > another target had a similar runtime environment requirement, I don't 
> > > > > think we can be sure the attribute is correct or not.
> > > > There is precedent for describing target-specific behavior in LangRef. 
> > > > It just doesn't seem useful to say that not all targets support the 
> > > > attribute without saying which ones do. We should also say what is 
> > > > expected if a target doesn't support the attribute. It seems reasonable 
> > > > for the function attribute to be silently ignored.
> > > > 
> > > > > One problem is this potentially does require coordination with other 
> > > > > toolchain components. For AMDGPU, the compiler can directly tell the 
> > > > > driver what FP mode to set on each entry point, but for x86 it 
> > > > > requires linking in crtfastmath to set the default mode bits.
> > > > 
> > > > This is a point I'm interested in. I don't like the current 
> > > > crtfastmath.o handling. It feels almost accidental when FTZ works as 
> > > > expected. My understanding is we link crtfastmath.o if we find it but 
> > > > if not everything just goes about its business. The Intel compiler 
> > > > injects code into main() to explicitly set the FTZ/DAZ control modes. 
> > > > That obviously has problems too, but it's at least consistent and 
> > > > predictable. As I understand it, crtfastmath.o sets these modes from a 
> > > > static initializer, but I'm not sure anything is done to determine the 
> > > > order of that initializer relative to others.
> > > > 
> > > > How does the compiler identify entry points for AMDGPU? And does it 
> > > > emit code to set FTZ based on the function attribute here?
> > > The entry points are a specific calling convention. There's no real 
> > > concept of main. Each kernel has an associated blob of metadata the 
> > > driver uses to set up various config registers on dispatch.
> > > 
> > > I don't think specially recognizing main in the compiler is fundamentally 
> > > different than having it done in a static constructor. It's still a 
> > > construct not associated with any particular function or anything.
> > The problem with having it done in a static constructor is that you have no 
> > certainty of when it will be done relative to other static constructors. If 
> > it's in main you can at least say that it's after all the static 
> > constructors (assuming main is your entry point).
> Yes and no. The linker should honor static constructor priorities. But, yeah, 
> there's no guarantee that this constructor will run before other priority 101 
> constructors.
> 
> The performance penalty for setting denormal flushing in main could be 
> significant (think C++). Also, there's precedent for using static 
> constructors, like GCC's crtfastmath.o.
Fair enough. I don't necessarily like how icc handles this. I don't have a 
problem with how gcc handles it. I just really don't like how LLVM does it. If 
we want to take the static constructor approach we should define our own, not 
depend on whether or not the GNU object file happens to be around.

Static initialization doesn't help for AMDGPU, and I suppose that's likely to 
be the case for any offload execution model. Since this patch is moving us 
toward a more consistent implementation I'm wondering if we can define some 
general rules for how this is supposed to work. Like when the function 
attribute will result in injected instructions setting the control flags and 
when it won't.
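
For reference, the crtfastmath-style approach being discussed amounts to 
something like the following sketch (the real crtfastmath.o ships with the 
toolchain; this is only an illustration of the mechanism and of the ordering 
problem mentioned above):

  #include <xmmintrin.h>   // _mm_getcsr / _mm_setcsr
  
  // Set FTZ (bit 15) and DAZ (bit 6) in MXCSR from a static constructor.
  // Nothing guarantees this runs before other same-priority constructors,
  // which is the ordering concern raised above.
  __attribute__((constructor))
  static void set_ftz_daz(void) {
    _mm_setcsr(_mm_getcsr() | 0x8040);
  }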


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-09 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

arsenm wrote:
> andrew.w.kaylor wrote:
> > arsenm wrote:
> > > andrew.w.kaylor wrote:
> > > > Can you document which targets do support the option? What happens if I 
> > > > try to use the option on a target where it is not supported?
> > > I'm not sure where to document this, or if/how/where to diagnose it. I 
> > > don't think the high level LangRef description is the right place to 
> > > discuss specific target handling.
> > > 
> > > Currently it won't error or anything. Code checking the denorm mode will 
> > > see the f32 specific mode, even if the target in the end isn't really 
> > > going to respect this.
> > > 
> > > One problem is this potentially does require coordination with other 
> > > toolchain components. For AMDGPU, the compiler can directly tell the 
> > > driver what FP mode to set on each entry point, but for x86 it requires 
> > > linking in crtfastmath to set the default mode bits. If another target 
> > > had a similar runtime environment requirement, I don't think we can be 
> > > sure the attribute is correct or not.
> > There is precedent for describing target-specific behavior in LangRef. It 
> > just doesn't seem useful to say that not all targets support the attribute 
> > without saying which ones do. We should also say what is expected if a 
> > target doesn't support the attribute. It seems reasonable for the function 
> > attribute to be silently ignored.
> > 
> > > One problem is this potentially does require coordination with other 
> > > toolchain components. For AMDGPU, the compiler can directly tell the 
> > > driver what FP mode to set on each entry point, but for x86 it requires 
> > > linking in crtfastmath to set the default mode bits.
> > 
> > This is a point I'm interested in. I don't like the current crtfastmath.o 
> > handling. It feels almost accidental when FTZ works as expected. My 
> > understanding is we link crtfastmath.o if we find it but if not everything 
> > just goes about its business. The Intel compiler injects code into main() 
> > to explicitly set the FTZ/DAZ control modes. That obviously has problems 
> > too, but it's at least consistent and predictable. As I understand it, 
> > crtfastmath.o sets these modes from a static initializer, but I'm not sure 
> > anything is done to determine the order of that initializer relative to 
> > others.
> > 
> > How does the compiler identify entry points for AMDGPU? And does it emit 
> > code to set FTZ based on the function attribute here?
> The entry points are a specific calling convention. There's no real concept 
> of main. Each kernel has an associated blob of metadata the driver uses to 
> set up various config registers on dispatch.
> 
> I don't think specially recognizing main in the compiler is fundamentally 
> different than having it done in a static constructor. It's still a construct 
> not associated with any particular function or anything.
The problem with having it done in a static constructor is that you have no 
certainty of when it will be done relative to other static constructors. If 
it's in main you can at least say that it's after all the static constructors 
(assuming main is your entry point).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-06 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

arsenm wrote:
> andrew.w.kaylor wrote:
> > Can you document which targets do support the option? What happens if I try 
> > to use the option on a target where it is not supported?
> I'm not sure where to document this, or if/how/where to diagnose it. I don't 
> think the high level LangRef description is the right place to discuss 
> specific target handling.
> 
> Currently it won't error or anything. Code checking the denorm mode will see 
> the f32 specific mode, even if the target in the end isn't really going to 
> respect this.
> 
> One problem is this potentially does require coordination with other 
> toolchain components. For AMDGPU, the compiler can directly tell the driver 
> what FP mode to set on each entry point, but for x86 it requires linking in 
> crtfastmath to set the default mode bits. If another target had a similar 
> runtime environment requirement, I don't think we can be sure the attribute 
> is correct or not.
There is precedent for describing target-specific behavior in LangRef. It just 
doesn't seem useful to say that not all targets support the attribute without 
saying which ones do. We should also say what is expected if a target doesn't 
support the attribute. It seems reasonable for the function attribute to be 
silently ignored.

> One problem is this potentially does require coordination with other 
> toolchain components. For AMDGPU, the compiler can directly tell the driver 
> what FP mode to set on each entry point, but for x86 it requires linking in 
> crtfastmath to set the default mode bits.

This is a point I'm interested in. I don't like the current crtfastmath.o 
handling. It feels almost accidental when FTZ works as expected. My 
understanding is we link crtfastmath.o if we find it but if not everything just 
goes about its business. The Intel compiler injects code into main() to 
explicitly set the FTZ/DAZ control modes. That obviously has problems too, but 
it's at least consistent and predictable. As I understand it, crtfastmath.o 
sets these modes from a static initializer, but I'm not sure anything is done 
to determine the order of that initializer relative to others.

How does the compiler identify entry points for AMDGPU? And does it emit code 
to set FTZ based on the function attribute here?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-06 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: llvm/docs/LangRef.rst:1837
+   are present, this overrides ``"denormal-fp-math"``. Not all targets
+   support separately setting the denormal mode per type.
+

Can you document which targets do support the option? What happens if I try to 
use the option on a target where it is not supported?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69878/new/

https://reviews.llvm.org/D69878



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D71635: [clang] Rename -frounding-math to -fexperimental-rounding-math and add -frounding-math back as a gcc-compat arg.

2019-12-18 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D71635#1790611 , @MaskRay wrote:

> Before:
>
>   % clang -frounding-math -fsyntax-only -x c /dev/null
>   clang-10: warning: Support for floating point control option frounding-math 
> is incomplete and experimental [-Wexperimental-float-control]
>
>
> CC1 will do rounding math things.
>
> After
>
>   % clang -frounding-math -fsyntax-only -x c /dev/null
>   clang-10: warning: optimization flag '-frounding-math' is not supported 
> [-Wignored-optimization-argument]
>
>
> CC1 will not do rounding math things. -fexperimental-rounding-math if the 
> user really wants to use the feature.
>
> Is my understanding correct? If yes, this patch seems pretty reasonable to 
> me, because -frounding-math is currently incomplete/unsafe.
>
> You may consider not changing CC1 options as they are not user facing.


My understanding is this:

Before D62731:

  % clang -frounding-math -fsyntax-only -x c /dev/null
  clang-10: warning: optimization flag '-frounding-math' is not supported 
[-Wignored-optimization-argument]

After D62731:

  % clang -frounding-math -fsyntax-only -x c /dev/null
  clang-10: warning: Support for floating point control option frounding-math 
is incomplete and experimental [-Wexperimental-float-control]

After D71671:

  % clang -frounding-math -fsyntax-only -x c /dev/null
  

I suppose this patch would be updated to put us back where we were before 
D62731 in terms of the warning and require an 
additional option for anyone who wants to use the rounding math implementation 
as it is currently implemented.

The support for -frounding-math is incomplete, but I'm not sure it's unsafe. 
It's no more unsafe than not using the flag at all. The same reasoning applies 
to -ftrapping-math which was accepted without warning well before D62731 and 
has never been completely implemented.
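
For readers unfamiliar with what the flag guards, here is a minimal sketch
(not from the patch) of the kind of code that only behaves as intended when
-frounding-math is honored; with the flag ignored, the addition may be folded
or moved under the default round-to-nearest assumption.

```
#include <fenv.h>

float upward_sum(float a, float b) {
  fesetround(FE_UPWARD);      // dynamic rounding-mode change
  float r = a + b;            // must not be folded or hoisted across fesetround()
  fesetround(FE_TONEAREST);
  return r;
}
```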


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71635/new/

https://reviews.llvm.org/D71635



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62731: Add support for options -frounding-math, -ftrapping-math, -ffp-model=, and -ffp-exception-behavior=, : Specify floating point behavior

2019-12-17 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D62731#1788838 , @rupprecht wrote:

> It seems the discussion of whether or not this is incomplete died out -- I'd 
> prefer to assume it is incomplete if there is no consensus. Mailed D71635 
>  to rename `-frounding-math` to 
> `-fexperimental-rounding-math`.
>
> Alternatively we could remove the warning. I still don't see a good argument 
> for the middle ground of having it called `-frounding-math` but also generate 
> a warning.


It's definitely incomplete but the results will not be any worse than you get 
when -frounding-math is ignored.

My preference would be to change the text of the warning that is issued but 
allow -frounding-math to be enabled by this commit without requiring an 
additional option.

I would also very much like to see this patch re-committed. It's currently in 
the "approved" state. If anyone objects to this being committed, please use the 
"request changes" action to indicate this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62731/new/

https://reviews.llvm.org/D62731



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62731: Add support for options -frounding-math, -ftrapping-math, -ffp-model=, and -ffp-exception-behavior=, : Specify floating point behavior

2019-12-13 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/include/clang/Basic/DiagnosticDriverKinds.td:444
 
+def warn_drv_experimental_fp_control_incomplete_opt : Warning<
+  "Support for floating point control option %0 is incomplete and 
experimental">,

rupprecht wrote:
> andrew.w.kaylor wrote:
> > rupprecht wrote:
> > > mibintc wrote:
> > > > @kpn thought it would be a good idea to add a Warning that the 
> > > > implementation of float control is experimental and partially 
> > > > implemented.  That's what this is for.
> > > Instead of adding a warning, I'd like to propose `-frounding-math` not be 
> > > enabled unless an additional flag (e.g. `-fexperimental-float-control`) 
> > > is passed. Or maybe this feature should be called 
> > > `-f[no-]experimental-rounding-math` instead.
> > > 
> > > There are plenty of builds that are already specifying `-frounding-math` 
> > > (e.g. they also support building w/ a compiler such as gcc that 
> > > implements this), and adding this experimental/incomplete implementation 
> > > is a surprise to those builds.
> > > 
> > > If I'm wrong and it's completely safe to ignore the warning (maybe it's 
> > > incomplete but not in any unsafe way), we should just not have it at all.
> > You raise an interesting point about people who might be using 
> > -frounding-math already. However, if they are using this flag because they 
> > also sometimes build with a compiler such as gcc that supports the flag, 
> > they are currently getting incorrect behavior from clang. Without this 
> > patch, clang silently ignores the option and the optimizer silently ignores 
> > the fact that the program may be changing the rounding mode dynamically. 
> > The user may or may not be aware of this.
> > 
> > With this patch such a user is likely to observe two effects: (1) their 
> > code will suddenly get slower, and (2) it will probably start behaving 
> > correctly with regard to rounding mode changes. The rounding behavior will 
> > definitely not get worse. I think the warning is useful as an indication 
> > that something has changed. I don't think requiring an additional option 
> > should be necessary.
> > However, if they are using this flag because they also sometimes build with 
> > a compiler such as gcc that supports the flag, they are currently getting 
> > incorrect behavior from clang
> 
> Incorrect, yes, but also likely valid behavior. "experimental" seems to imply 
> a miscompile when using this option should not be unexpected.
> 
> As I suggested before: if I'm wrong, and this behavior is only going to make 
> the code more correct (as you suggest), can we remove the warning that this 
> must be ack'd explicitly by adding `-Wno-experimental-float-control` to 
> builds? I don't understand the motivation for the warning.
The "experimental" code won't be incorrect in any way that the code generated 
when we ignore the option is. The things that have been implemented will work 
correctly. The things that are not implemented will have the potential to 
disregard runtime changes to the rounding mode. Currently, dynamic changes to 
the rounding mode always have the potential of being ignored.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62731/new/

https://reviews.llvm.org/D62731



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62731: Add support for options -frounding-math, -ftrapping-math, -ffp-model=, and -ffp-exception-behavior=, : Specify floating point behavior

2019-12-13 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D62731#1784225 , @rjmccall wrote:

> ... The big problem is that intrinsic calls are not arbitrary code: the vast 
> majority of intrinsics (e.g. all the ARM vector intrinsics, many of which can 
> be floating-point) are marked `IntrNoMem` (which I believe corresponds to 
> `readnone`), which means calls to those intrinsics can be reordered with 
> other code whether or not that code has arbitrary side-effects.


Oh, you're right. With the constrained intrinsics we are currently handling 
that by using IntrInaccessibleMemOnly as a proxy for access to the FP 
environment, but that's stronger than we'd want for architecture-specific 
intrinsics in the default case. We have talked about an fpenv-specific 
attribute, but nothing has been done. So, I guess that does leave us in the 
situation where rounding controls might not be correctly respected if 
target-specific intrinsics are used.
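
As a sketch of the hazard (a hypothetical helper stands in for an IntrNoMem
target intrinsic; this is not code from the patch): a call the compiler
believes is readnone is free to move across fesetround().

```
#include <fenv.h>

// Stand-in for a target vector intrinsic marked IntrNoMem / readnone.
__attribute__((const)) static float simd_op(float a, float b) { return a * b; }

float use_it(float a, float b) {
  fesetround(FE_TOWARDZERO);
  float r = simd_op(a, b);    // the optimizer may legally hoist this call above
                              // fesetround(), since 'const' says it reads no state
  fesetround(FE_TONEAREST);
  return r;
}
```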

> It's good that people are looking at achieving better modeling for the x86 
> backend, but we need to have a plan that doesn't require heroic effort just 
> to get basic correctness.

Do you mean in the backend? If so, I don't think that's possible. The backends 
just don't have any sort of feature that could be used to get conservatively 
correct behavior for cheap the way intrinsics give it to us in the middle end. 
Once you go into instruction selection things get very low level in a hurry.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62731/new/

https://reviews.llvm.org/D62731



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62731: Add support for options -frounding-math, -ftrapping-math, -ffp-model=, and -ffp-exception-behavior=, : Specify floating point behavior

2019-12-13 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D62731#1782912 , @rjmccall wrote:

> Hmm.  The target-specific intrinsics thing does concern me, since I assume 
> many targets have a bunch of vector intrinsics that may be sensitive to 
> rounding.  Do we have an idea of how we'd fix that?  If it's a short-term 
> incorrectness that we have a plan to fix, I don't mind the risk, but if we 
> don't know how we'd even start to address it...


I see two potential problem cases with the target-specific intrinsics:

1. Some optimization pass recognizes the intrinsic and uses its semantics to 
perform some optimization such as constant folding
2. Some optimization performs code motion that moves the intrinsic (or, in the 
backend, the instruction it represents) across an operation that changes the 
rounding mode

I don't know if there are any instances of the first case in the public 
repository. Downstream users could be doing it. Those will need special 
handling if they exist (checking for the strictfp attribute).
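
A minimal illustration of the first case, using a plain division as a
stand-in for a recognized intrinsic (hypothetical, not from any in-tree pass):
if the fold assumes the default rounding mode, the compile-time value differs
from what the hardware would produce under FE_UPWARD.

```
#include <fenv.h>

float fold_hazard(void) {
  fesetround(FE_UPWARD);
  float r = 1.0f / 3.0f;      // folding this at compile time bakes in round-to-nearest
  fesetround(FE_TONEAREST);
  return r;
}
```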

The second case should be handled in IR by fesetround() or other such 
intrinsics being marked as having side effects. It's possible that there are 
target-specific intrinsics to change the rounding mode that aren't marked as 
having side effects, but if so that's simply a bug. The other part of this is 
that the intrinsic might be lowered to MC instructions in a way that neglects 
the rounding mode. Many targets have instructions with forms that take an 
explicit rounding mode argument and the backends may be using that with the 
default rounding mode. I am not aware of any such case, but it's definitely 
possible.

Finally, our design for handling strict fp mode in the backends is that 
rounding mode control will be handled by explicitly modeling the dependency 
between the relevant control registers and instructions that implicitly use the 
rounding mode controled by those registers. X86 only recently started doing 
this. There may be other backends that have not implemented it. Some may never 
do so.

I don't have a strong preference about what to do with the warning. I have a 
slight preference for replacing the existing warning with a more specific 
warning saying that floating math support is a work in progress. Eventually we 
need a way for backends to indicate that they believe their support is complete.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62731/new/

https://reviews.llvm.org/D62731



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62731: Add support for options -frounding-math, -ftrapping-math, -ffp-model=, and -ffp-exception-behavior=, : Specify floating point behavior

2019-12-12 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D62731#1782762 , @rjmccall wrote:

> Currently we emit a warning if you use `-frounding-math`, saying that the 
> option is ignored.  That should have alerted users that they're not getting 
> the correct behavior now.  This patch removes the warning and (IIUC) 
> implements the correct behavior but is over-conservative.  If that's correct, 
> I think that's acceptable and we don't need an "experimental" flag or a 
> warning.


Oh, I didn't realize we were already warning about that. In theory, we should 
handle rounding math correctly with this change. It's possible we've missed 
some things, but I suppose that's always true. I think there are a few general 
intrinsics left that need constrained versions but don't have them, and we 
don't have any solution yet for target-specific intrinsics. If any of those 
have special handling that assumes the default rounding mode we will get it 
wrong. I don't think most users would be likely to encounter a problem.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62731/new/

https://reviews.llvm.org/D62731



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62731: Add support for options -frounding-math, -ftrapping-math, -ffp-model=, and -ffp-exception-behavior=, : Specify floating point behavior

2019-12-12 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/include/clang/Basic/DiagnosticDriverKinds.td:444
 
+def warn_drv_experimental_fp_control_incomplete_opt : Warning<
+  "Support for floating point control option %0 is incomplete and 
experimental">,

rupprecht wrote:
> mibintc wrote:
> > @kpn thought it would be a good idea to add a Warning that the 
> > implementation of float control is experimental and partially implemented.  
> > That's what this is for.
> Instead of adding a warning, I'd like to propose `-frounding-math` not be 
> enabled unless an additional flag (e.g. `-fexperimental-float-control`) is 
> passed. Or maybe this feature should be called 
> `-f[no-]experimental-rounding-math` instead.
> 
> There are plenty of builds that are already specifying `-frounding-math` 
> (e.g. they also support building w/ a compiler such as gcc that implements 
> this), and adding this experimental/incomplete implementation is a surprise 
> to those builds.
> 
> If I'm wrong and it's completely safe to ignore the warning (maybe it's 
> incomplete but not in any unsafe way), we should just not have it at all.
You raise an interesting point about people who might be using -frounding-math 
already. However, if they are using this flag because they also sometimes build 
with a compiler such as gcc that supports the flag, they are currently getting 
incorrect behavior from clang. Without this patch, clang silently ignores the 
option and the optimizer silently ignores the fact that the program may be 
changing the rounding mode dynamically. The user may or may not be aware of 
this.

With this patch such a user is likely to observe two effects: (1) their code 
will suddenly get slower, and (2) it will probably start behaving correctly 
with regard to rounding mode changes. The rounding behavior will definitely not 
get worse. I think the warning is useful as an indication that something has 
changed. I don't think requiring an additional option should be necessary.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62731/new/

https://reviews.llvm.org/D62731



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62731: Add support for options -frounding-math, -ftrapping-math, -ffp-model=, and -ffp-exception-behavior=, : Specify floating point behavior

2019-11-19 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: llvm/include/llvm/IR/IRBuilder.h:262
 Function *F = BB->getParent();
-if (!F->hasFnAttribute(Attribute::StrictFP)) {
+if (F && !F->hasFnAttribute(Attribute::StrictFP)) {
   F->addFnAttr(Attribute::StrictFP);

kpn wrote:
> rjmccall wrote:
> > kpn wrote:
> > > This looks reasonable to me. 
> > > 
> > > It smells like there's a larger strictfp IRBuilder issue, but that's not 
> > > an issue for this patch here. The larger issue won't be hit since the new 
> > > options affect the entire compilation. It therefore shouldn't block this 
> > > patch.
> > Does IRBuilder actually support inserting into an unparented basic block?  
> > I feel like this is exposing a much more serious mis-use of IRBuilder.
> I suspect you are correct. If we let this "F && " change go in we'll have a 
> situation where whether or not a block is currently in a function when a 
> function call is emitted will affect whether or not the eventual function 
> definition gets the strictfp attribute. That seems like an unfortunate 
> inconsistency.
> 
> I'm still looking into it. I hope to have an IRBuilder review up today or 
> tomorrow.
As I just commented on the related patch @kpn posted, it appears that IRBuilder 
doesn't entirely support inserting into an unparented block. I was surprised by 
this, but there are places that need to be able to get to the Module from the 
BasicBlock. So, I think something problematic may be happening in the failing 
case.
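
For reference, a simplified sketch of the pattern under discussion (these are
real IRBuilder/Function APIs, but this is not the patch itself): whether
strictfp ends up on the function depends on the insertion block being
parented when the call is emitted.

```
#include "llvm/IR/IRBuilder.h"

// The attribute is silently skipped whenever the insertion block has no
// parent function yet, which is the inconsistency being discussed.
static void markStrictFPIfParented(llvm::IRBuilderBase &Builder) {
  llvm::BasicBlock *BB = Builder.GetInsertBlock();
  llvm::Function *F = BB ? BB->getParent() : nullptr;
  if (F && !F->hasFnAttribute(llvm::Attribute::StrictFP))
    F->addFnAttr(llvm::Attribute::StrictFP);
}
```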


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62731/new/

https://reviews.llvm.org/D62731



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69978: Separately track input and output denormal mode

2019-11-12 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: llvm/docs/LangRef.rst:1822
 
 ``"denorm-fp-mode"``
+   This indicates the subnormal handling that may be assumed for the

I don't like the definition of this attribute. It's not reader-friendly. The 
comma-separated pair format has no indication which value refers to inputs and 
which refers to outputs. Also, while this predates your changes, I think the 
meanings of the current choices are unclear. 

What would you think of a comma-separated list with the following possibilities?


```
allow-denormals (default)
inputs-are-zero (outputs not flushed)
inputs-are-zero, outputs-are-zero
inputs-are-zero, outputs-are-positive-zero
inputs-are-positive-zero (outputs not flushed)
inputs-are-positive-zero, outputs-are-zero
inputs-are-positive-zero, outputs-are-positive-zero
denormal-outputs-are-zero (inputs are unchanged)
denormal-outputs-are-positive-zero (inputs are unchanged)

```
I'd also be open to abbreviations. I don't know if "daz" and "ftz" are readable 
to everyone, but I'm more comfortable with them. That would make the options 
something like this.


```
allow-denormals
daz
daz, ftz
daz, ftz+
daz+
daz+, ftz
daz+, ftz+
ftz
ftz+
```


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69978/new/

https://reviews.llvm.org/D69978



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69598: Work on cleaning up denormal mode handling

2019-11-12 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor accepted this revision.
andrew.w.kaylor added a comment.

Thanks. I understand your direction for denormal handling now, and I'm OK with 
this patch apart from the remaining references to subnormal that Sanjay 
mentioned.

In D69598#1739723 , @arsenm wrote:

> I do think the floating point environment bits should be a considered a 
> property of the calling convention, with attributes that override them. A 
> function which calls a function with a different mode would be responsible 
> for switching the mode before the call. This would require people actually 
> caring about getting this right to really implement


Do you mean the compiler should insert code to restore the FP environment on 
function transitions? Or do you mean that the function itself (i.e. the user's 
code) is responsible for switching the mode? I have some reservations about 
this, but I think the C standard specification for FENV_ACCESS is clear that 
it is the programmer's responsibility to manage the floating point 
environment correctly. Yes, that's a sure recipe for broken code, but that's 
what it says. Obviously LLVM IR is not bound by the C standard and we could 
take a different approach, but I have concerns about the performance 
implications because in general the compiler won't know when the environment 
needs to be restored so it would have to take a conservative approach.
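
As a sketch of what that responsibility looks like in practice (illustrative
only; plain C99 fenv calls, not something this review decides): the code that
changes the mode saves and restores the environment itself, rather than the
compiler inserting transitions at call boundaries.

```
#include <fenv.h>

void call_with_upward_rounding(void (*fn)(void)) {
  fenv_t saved;
  fegetenv(&saved);           // the caller saves the environment itself
  fesetround(FE_UPWARD);
  fn();                       // the callee simply inherits the caller's mode
  fesetenv(&saved);           // and the caller restores it before returning
}
```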

I've been meaning to write some documentation on the expected behavior at 
function boundaries of the FP environment. Perhaps we can continue this 
discussion there.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69598/new/

https://reviews.llvm.org/D69598



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69598: Work on cleaning up denormal mode handling

2019-11-08 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

I'm unclear as to the expectations surrounding this option. I suppose this is 
somewhat beyond the scope of the current changes, but I'm confused by clang's 
current behavior with regard to denormals.

The -fdenormal-fp-math option causes a function attribute to be set indicating 
the desired flushing behavior, and I guess in some cases that has an effect on 
instruction selection, but it seems to be orthogonal to whether or not we're 
actually setting the processor controls to do flushing (at least for most 
targets). I really only know what happens in the x86 case, and I don't know if 
this behavior is consistent across architectures, but in the x86 case setting 
or not setting the processor control flags depends on the fast math flags and 
whether or not we find crtfastmath.o when we link.
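
One way to see the distinction (sketch, x86/SSE only, not part of the patch):
whether denormals are actually flushed is a run-time property of MXCSR,
independent of what the "denormal-fp-math" attribute records.

```
#include <xmmintrin.h>
#include <pmmintrin.h>
#include <stdio.h>

int main(void) {
  int ftz = _MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON;
  int daz = _MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_ON;
  printf("FTZ=%d DAZ=%d\n", ftz, daz);  // both 0 unless something (e.g. crtfastmath.o) set MXCSR
  return 0;
}
```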

This leads me to my other point of confusion. Setting the "denormal-fp-math" 
option on a per-function basis seems wrong for targets that have a global FP 
control.




Comment at: clang/include/clang/Basic/CodeGenOptions.h:167
   /// The floating-point denormal mode to use.
-  std::string FPDenormalMode;
+  llvm::DenormalMode FPDenormalMode = llvm::DenormalMode::Invalid;
 

Why is "Invalid" the default here? If you don't use the "fdenormal-fp-math" 
option, shouldn't you get IEEE?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69598/new/

https://reviews.llvm.org/D69598



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69272: Restricted variant of '#pragma STDC FENV_ACCESS'

2019-10-21 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

Thanks for the patch! I don't have time to review this in detail this week, but 
I'm very happy to see this functionality.




Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:890
+  "cannot apply to inline functions, ignoring pragma">,
+   InGroup;
 

rjmccall wrote:
> What's the purpose of this restriction?  Whether `inline` really has much to 
> do with inlining depends a lot on the exact language settings.  (Also, even 
> if this restriction were okay, the diagnostic is quite bad given that there 
> are three separate conditions that can lead to it firing.)
> 
> Also, I thought we were adding instruction-level annotations for this kind of 
> thing to LLVM IR.  Was that not in service of implementing this pragma?
> 
> I'm not categorically opposed to taking patches that only partially implement 
> a feature, but I do want to feel confident that there's a reasonable 
> technical path forward to the full implementation.  In this case, it feels 
> like the function-level attribute is a dead end technically.
I'm guessing this is intended to avoid the optimization problems that would 
occur (currently) if a function with strictfp were inlined into a function 
without it. I'm just guessing though, so correct me if I'm wrong.

As I've said elsewhere, I hope this is a temporary problem. It is a real 
problem though (as is the fact that the inliner isn't currently handling this 
case correctly).

What would you think of a new command line option that caused us to mark 
functions with strictfp as noinline? We'd still need an error somewhat like 
this, but I feel like that would be more likely to accomplish what we want on a 
broad scale.
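
Roughly the effect such an option would have on user code (hypothetical; this
is not what the patch currently does): every function compiled under the
pragma would behave as if it carried noinline, so its constrained body is
never merged into an unconstrained caller.

```
#pragma STDC FENV_ACCESS ON

__attribute__((noinline))     // the effect the suggested option would imply
double div_checked(double a, double b) {
  return a / b;               // constrained body stays out of unconstrained callers
}
```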


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69272/new/

https://reviews.llvm.org/D69272



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62731: [RFC] Add support for options -fp-model= and -fp-speculation= : specify floating point behavior

2019-08-27 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/docs/UsersManual.rst:1309
+
+   * ``precise`` Disables optimizations that are not value-safe on 
floating-point data, although FP contraction (FMA) is enabled.
+   * ``strict`` Enables precise and except, and disables contractions (FMA).

There's a bit of ambiguity here because FP contraction isn't an on/off switch 
in LLVM. It has three settings: on, off, and fast. What you've done in this 
patch sets it to 'on' for precise, 'off' for strict, and 'fast' for fast. That 
sounds reasonable, but it's not what ICC and MSVC do. ICC and MSVC both have a 
behavior equivalent to -ffp-contract=fast in the precise model.

The idea behind this is that FMA operations are actually more precise than the 
non-contracted operations. They don't always give the same result, but they 
give a more precise result. The problem with this is that if we adopt this 
approach it leaves us with no fp model that corresponds to the default compiler 
behavior if you don't specify an -fp-model at all.
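
The contraction question in concrete terms (sketch; which rewrite is allowed
depends on the -ffp-contract setting discussed above):

```
double axpy(double a, double x, double y) {
  // With contraction "on" or "fast" this may become a single fused multiply-add,
  // which rounds once; with contraction "off" the product and the sum must each
  // be rounded separately.
  return a * x + y;
}
```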


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62731/new/

https://reviews.llvm.org/D62731



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62731: [RFC] Add support for options -fp-model= and -fp-speculation= : specify floating point behavior

2019-08-27 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/docs/UsersManual.rst:1305
+   and ``noexcept``. Note that -fp-model=[no]except can be combined with the
+   other three settings for this option. Details:
+

rjmccall wrote:
> mibintc wrote:
> > rjmccall wrote:
> > > Combined how?  With a comma?
> > > 
> > > This option seems to have two independent dimensions.  Is that necessary 
> > > for command-line compatibility with ICC, or can we separate it into two 
> > > options?
> > > 
> > > The documentation should mention the default behavior along both 
> > > dimensions.  Is it possible to override a prior instance of this option 
> > > to get this default behavior back?
> > > 
> > > You mention that this `-fp-model=fast` is equivalent to `-ffast-math`.  
> > > How does this option interact with that one if both are given on a 
> > > command line?
> > > 
> > > Please put option text in backticks wherever it appears.
> > > 
> > > Most of these comments apply to `-fp-speculation` as well.
> > > Combined how? With a comma?
> > 
> > > This option seems to have two independent dimensions. Is that necessary 
> > > for command-line compatibility with ICC, or can we separate it into two 
> > > options?
> > Yes that's right, there are 2 dimensions.  I wrote it like this for 
> > identical compatibility with icc, and cl.exe also defines the option this 
> > way, to specify multiple values simultaneously. However I think it would be 
> > reasonable and good to split them into separate options.  I will discuss 
> > this with the folks back home.
> > 
> > > The documentation should mention the default behavior along both 
> > > dimensions. 
> > I added this info into the doc
> > 
> > > Is it possible to override a prior instance of this option to get this 
> > > default behavior back?
> > The 3 values along one dimension, precise, strict, fast if they appear 
> > multiple times in the command line, the last value will be the setting 
> > along that dimension.  Ditto with the other dimension, the rightmost 
> > occurrence of except or noexcept will be the setting. 
> > 
> > > You mention that this -fp-model=fast is equivalent to -ffast-math. How 
> > > does this option interact with that one if both are given on a command 
> > > line?
> > The idea is that they are synonyms so if either or both appeared on the 
> > command line, the effect would be identical. 
> > 
> > I'll upload another patch with a few documentation updates and get back to 
> > you about splitting the fp-model option into multiple options.  (Longer 
> > term, there are 2 other dimensions to fp-model)
> > 
> > And thanks for the review
> > Yes that's right, there are 2 dimensions. I wrote it like this for 
> > identical compatibility with icc, and cl.exe also defines the option this 
> > way, to specify multiple values simultaneously. However I think it would be 
> > reasonable and good to split them into separate options. I will discuss 
> > this with the folks back home.
> 
> Okay.  There's certainly some value in imitating existing compilers, but it 
> sounds like a lot has been forced into one option, so maybe we should take 
> the opportunity to split it up.  If we do split it, though, I think the 
> different dimensions should have different base spellings, rather than being 
> repeated uses of `-fp-model`.
> 
> > The 3 values along one dimension, precise, strict, fast if they appear 
> > multiple times in the command line, the last value will be the setting 
> > along that dimension.
> 
> Okay.  This wasn't clear to me from the code, since the code also has an 
> "off" option.
> 
> > The idea is that they are synonyms so if either or both appeared on the 
> > command line, the effect would be identical.
> 
> Right, but compiler options are allowed to conflict with each other, with the 
> general rule being that the last option "wins".  So what I'm asking is if 
> that works correctly with this option and `-ffast-math`, so that e.g. 
> `-ffast-math -fp-model=strict` leaves you with strict FP but 
> `-fp-model=strict -ffast-math` leaves you with fast FP.  (That is another 
> reason why it's best to have one aspect settled in each option: because you 
> don't have to merge information from different uses of the option.)
> 
> At any rate, the documentation should be clear about how this interacts with 
> `-ffast-math`.  You might even consider merging this into the documentation 
> for `-ffast-math`, or at least revising that option's documentation.  Does 
> `-fp-model=fast` cause `__FAST_MATH__` to be defined?
> 
> Also, strictly speaking, this should be `-ffp-model`, right?
I think the ICC interface includes the exception option for 
compatibility/consistency with Microsoft's /fp option. We can handle that in 
clang-cl. So, I agree that it makes sense to split that out in clang.

ICC's implementation of this actually has four dimensions, only two of which 
are being taken on here. Frankly, I think it's a bit 

[PATCH] D62731: [RFC] Add support for options -fp-model= and -fp-speculation= : specify floating point behavior

2019-08-16 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/lib/CodeGen/CodeGenFunction.cpp:126
+  case LangOptions::FPM_Precise:
+  case LangOptions::FPM_Fast:
+break;

mibintc wrote:
> mibintc wrote:
> > kpn wrote:
> > > Wait, so "fast" and "precise" are the same thing? That doesn't sound like 
> > > where the documentation you put in the ticket says "the compiler 
> > > preserves the source expression ordering and rounding properties of 
> > > floating-point".
> > > 
> > > (Yes, I saw below where "fast" turns on the fast math flags but "precise" 
> > > doesn't. That doesn't affect my point here.)
> > "precise" doesn't necessitate the use of Constrained Intrinsics, And 
> > likewise for "fast".   The words "compiler preserves the source expression 
> > ordering" were copied from the msdn documentation for /fp:precise as you 
> > explained it would be useful to have the msdn documentation for the option 
> > in case it goes offline in, say, 30 years.  The ICL Intel compiler also 
> > provides equivalent floating point options. The Intel documentation for 
> > precise is phrased differently "Disables optimizations that are not 
> > value-safe on floating-point data."  
> > 
> > fp-model=precise should enable contractions, if that's not true at default 
> > (I mean, clang -c) then this patch is missing that.
> > 
> > fp-model=fast is the same as requesting ffast-math 
> Well, we haven't heard from Andy yet, but he told me some time ago that 
> /fp:precise corresponds more or less (there was wiggle room) to clang's 
> default behavior.  It sounds like you think the description in the msdn of 
> /fp:precise isn't describing clang's default behavior, @kpn can you say more 
> about that, and do you think that ConstrainedIntrinsics should be created to 
> provide the semantics of /fp:precise? 
"Precise" means that no value unsafe optimizations will be performed. That's 
what LLVM does by default. As long as no fast math flags are set, we will not 
perform optimizations that are not value safe.
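
A small example of the kind of value-unsafe rewrite "precise" rules out
(sketch, not from the patch): reassociation changes which intermediate results
get rounded.

```
float sum3(float a, float b, float c) {
  // A value-safe compilation must evaluate (a + b) first; under fast-math the
  // expression may be reassociated as a + (b + c), which can round differently.
  return (a + b) + c;
}
```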


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62731/new/

https://reviews.llvm.org/D62731



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66092: [CodeGen] Generate constrained fp intrinsics depending on FPOptions

2019-08-16 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

It took some digging, but I finally found the e-mail thread where we initially 
agreed that we can't mix constrained FP intrinsics and non-constrained FP 
operations within a function. Here it is: 
http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66092/new/

https://reviews.llvm.org/D66092



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66092: [CodeGen] Generate constrained fp intrinsics depending on FPOptions

2019-08-16 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D66092#1632642 , @sepavloff wrote:

> - What is the issue with moving `a = b/c`? If it moves ahead of `if` 
> statement it seems OK, because the rounding mode is the same in that point. 
> It cannot be moved inside the block (where rounding mode is different) 
> because it breaks semantics.


It may be that the optimizer can prove that 'someCondition' is always true and 
it will eliminate the if statement and there is nothing to prevent the 
operation from migrating between the calls that change the rounding mode.

This is my main point -- "call i32 @fesetround" does not act as a barrier to an 
fdiv instruction (for example), but it does act as a barrier to a constrained 
FP intrinsic. It is not acceptable, for performance reasons in the general 
case, to have calls act as barriers to unconstrained FP operations. Therefore, 
to keep everything semantically correct, it is necessary to use constrained 
intrinsics in any function where the floating point environment may be changed.

I agree that impact on performance must be minimized, but this is necessary for 
correctness.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66092/new/

https://reviews.llvm.org/D66092



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D65997: Add options rounding and exceptions to pragma fp

2019-08-16 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/docs/LanguageExtensions.rst:3152
+rounding mode. This option is experimental; the compiler may ignore an explicit
+rounding mode in some situations.
+

sepavloff wrote:
> andrew.w.kaylor wrote:
> > You should say something about whether the rounding mode is applied or 
> > assumed. The rounding mode argument in the constrained FP intrinsics is 
> > assumed. The compiler is not required to do anything to force the hardware 
> > into that rounding mode. I think we want that behavior here as well. I 
> > believe this follows the semantics of the STDC FENV_ROUND pragmas that were 
> > introduced by ISO TS-18661.
> Hm. I understood the proposal as if pragma FENV_ROUND is applied. If I am 
> wrong, and that pragma is only a hint to compiler, it is not suitable for our 
> purposes. We need a mean to generate constrained intrinsics from C/C++ code. 
> it would facilitate adaptation of LLVM passes for specific floating point 
> options, including rounding, exception behavior, FENV_ACCESS an so on. It 
> also would allow users to tune code generation. In this case `pragma 
> FENV_ROUND` is a different functionality, which should be developed 
> separately.
Sorry, I didn't mean to introduce confusion by bringing up FENV_ROUND, and 
after reading the description you linked on the other review I'm not sure it 
does what I was previously told it would anyway. Let's ignore that and just 
talk about what you're intending to do here.

If you only generate constrained intrinsics and do not generate an explicit 
rounding mode control change instruction (such as a call to fesetround) then 
the declared rounding mode may not actually be used. According to the 
definition in the LLVM Language Reference, when you set a rounding mode in a 
constrained FP intrinsic, the optimizer will assume that rounding mode is in 
effect and it may use it to perform transformations like constant folding. 
However, there is no requirement for the compiler to generate code (in response 
to the constrained intrinsic) to set the rounding mode. Architectures that 
encode the rounding mode in the instructions may use the rounding mode 
information (though the LLVM back end isn't currently well structured to enable 
that), but in cases where the rounding mode is controlled by processor state, 
the constrained intrinsic will not change it.

Hopefully that clarifies what I was asking about here. Do you intend for use of 
these pragmas to actually change the rounding mode or merely describe it? If 
the pragmas are intended to change the rounding mode, you will need to generate 
instructions to do that.
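
To restate the distinction with the pragma spelling proposed in this patch
(sketch only): nothing below calls fesetround(), so the declared mode is an
assumption the optimizer may rely on, not a mode change.

```
float g(float x, float y) {
  #pragma clang fp rounding(downward)   // describes the mode; does not set it
  return x + y;   // rounds downward only if the caller already switched modes
}
```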


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65997/new/

https://reviews.llvm.org/D65997



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66092: [CodeGen] Generate constrained fp intrinsics depending on FPOptions

2019-08-15 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D66092#1630997 , @sepavloff wrote:

> Replacement of floating point operations with constrained intrinsics seems 
> more an optimization helper then a semantic requirement. IR where constrained 
> operations are mixed with unconstrained is still valid in sense of IR 
> specification.


The thing that makes the IR semantically incomplete is that there is nothing 
there to prevent incorrect code motion of the non-constrained operations. 
Consider this case:

  if (someCondition) {
    #pragma clang fp rounding(downward)
    fesetround(FE_DOWNWARD);
    x = y/z;
    fesetround(FE_TONEAREST);
  }
  a = b/c;

If you generate a regular fdiv instruction for the 'a = b/c;' statement, there 
is nothing that would prevent it from being hoisted above the call to 
fesetround() and so it might be rounded incorrectly.

In D66092#1630997 , @sepavloff wrote:

> Another issue is non-standard rounding. It can be represented by constrained 
> intrinsics only. The rounding does not require restrictions on code motion, 
> so mixture of constrained and unconstrained operation is OK. Replacement of 
> all operations with constrained intrinsics would give poorly optimized code, 
> because compiler does not optimize them. It would be a bad thing if a user 
> adds the pragma to execute a statement with specific rounding mode and loses 
> optimization.


I agree that loss of optimization would be a bad thing, but I think it's 
unavoidable. By using non-default rounding modes the user is implicitly 
accepting some loss of optimization. This may be more than they would have 
expected, but I can't see any way around it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66092/new/

https://reviews.llvm.org/D66092



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D65997: Add options rounding and exceptions to pragma fp

2019-08-15 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: clang/docs/LanguageExtensions.rst:3127
 
 The pragma can take three values: ``on``, ``fast`` and ``off``.  The ``on``
 option is identical to using ``#pragma STDC FP_CONTRACT(ON)`` and it allows

This part of the documentation is ambiguous in light of the new options. I 
suppose "[t]he pragma" here refers to "#pragma clang fp contract" but the first 
paragraph refers to "#pragma clang fp" as "the pragma."



Comment at: clang/docs/LanguageExtensions.rst:3152
+rounding mode. This option is experimental; the compiler may ignore an explicit
+rounding mode in some situations.
+

You should say something about whether the rounding mode is applied or assumed. 
The rounding mode argument in the constrained FP intrinsics is assumed. The 
compiler is not required to do anything to force the hardware into that 
rounding mode. I think we want that behavior here as well. I believe this 
follows the semantics of the STDC FENV_ROUND pragmas that were introduced by 
ISO TS-18661.



Comment at: clang/docs/LanguageExtensions.rst:3159
+This option is experimental; the compiler may ignore an explicit exception
+behavior in some situations.
+

I'm not sure what this is supposed to mean. What are the circumstances under 
which the compiler may ignore an explicit exception behavior? Do you mean that 
it isn't supposed to happen but it might while the feature is under development?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65997/new/

https://reviews.llvm.org/D65997



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D66092: [CodeGen] Generate constrained fp intrinsics depending on FPOptions

2019-08-14 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D66092#1627339 , @sepavloff wrote:

> In D66092#1625380 , @kpn wrote:
>
> > Also, if any constrained intrinsics are used in a function then the entire 
> > function needs to be constrained. Is this handled anywhere?
>
>
> If we decided to make the entire function constrained, it should be done 
> somewhere in IR transformations, because inlining may mix function bodies 
> with different fp options.


Kevin is right. We have decided that if constrained intrinsics are used 
anywhere in a function they must be used throughout the function. Otherwise, 
there would be nothing to prevent the non-constrained FP operations from 
migrating across constrained operations and the handling could get botched. The 
"relaxed" arguments ("round.tonearest" and "fpexcept.ignore") should be used 
where the default settings would apply. The front end should also be setting 
the "strictfp" attribute on calls within a constrained scope and, I think, 
functions that contain constrained intrinsics.

We will need to teach the inliner to enforce this rule if it isn't already 
doing so, but if things aren't correct coming out of the front end an incorrect 
optimization could already happen before we get to the inliner. We always rely 
on the front end producing IR with fully correct semantics.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66092/new/

https://reviews.llvm.org/D66092



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62731: [RFC] Add support for options -fp-model= and -fp-speculation= : specify floating point behavior

2019-08-09 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D62731#1623114 , @mibintc wrote:

> @andrew.w.kaylor Thanks Andy.  Reminder -- in a private document you 
> indicated to me that -fp-speculation=safe corresponds to the maytrap setting 
> for the exception argument.  This patch is implemented with those semantics.


Right. This is a clear indication of why I shouldn't try to squeeze in a review 
just before heading out the door at the end of the day. The "separate option" 
in icc that I was referring to is -fp-speculation. Somehow I overlooked the 
fact that you were adding that option here. I saw your note and read the recent 
comments without looking over even the patch description. Sorry about that.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62731/new/

https://reviews.llvm.org/D62731



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62731: [RFC] Add support for options -fp-model= and -fp-speculation= : specify floating point behavior

2019-08-07 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

I'm not entirely caught up on this review. I've only read the most recent 
comments, but I think I've got enough context to comment on the metadata 
arguments.

Based only on the fp-model command line options, the front end should only ever 
use "round.dynamic" (for strict mode) or "round.tonearest" (where full fenv 
access isn't permitted but we want to enable strict exception semantics). There 
are some pragmas, which I believe are in some draft of a standards document but 
not yet approved, which can declare any of the other rounding modes, but I 
don't know that we have plans to implement those yet. I'm also hoping that at 
some point we'll have a pass that finds calls to fesetround() and changes the 
metadata argument when it can prove what the rounding mode will be.

For the fp exception argument, the only values that can be implied by the 
-fp-model option are "fpexcept.strict" and "fpexcept.ignore". In icc, we have a 
separate option that can prevent speculation (equivalent to 
"fpexcept.maytrap"). I think gcc's, -ftrapping-math has a similar function 
(though the default may be reversed). I don't think we've talked about how (or 
if) clang should ever get into the "fpexcept.maytrap" state.

So for now, I think both arguments only need to support one of two states, 
depending on the -fp-model arguments.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62731/new/

https://reviews.llvm.org/D62731



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64067: [X86][PPC] Support -mlong-double-64

2019-07-03 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In D64067#1568895 , @hfinkel wrote:

> One thing to realize about these flags is that they're ABI-altering flags. If 
> the user provides the flag to alter the platform defaults, this only works if 
> the user also ensures that matches the ABI of the relevant system libraries 
> that the compiler is using (e.g., because the user is explicitly linking with 
> a suitable libc).


Right. That's what I was trying to figure out. Are we relying on the users of 
this option not to shoot themselves in the foot? It sounds like yes. I believe 
that's the way we handled it in icc also, but gcc and icc do a lot of sketchy 
things that we wouldn't want to do in clang. In this case I don't object to the 
sketchiness, just so everyone realizes that's what we're doing.
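
A concrete instance of the foot-gun (sketch; assumes a libc built with the
platform's default long double): a TU compiled with -mlong-double-64 passes a
64-bit value where the library expects the wider type.

```
#include <math.h>

// Built with -mlong-double-64, 'x' is 64 bits wide here, while the sinl() in
// a default-ABI libc still expects the platform's 80- or 128-bit long double.
long double half_sine(long double x) {
  return sinl(x) / 2.0L;
}
```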


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64067/new/

https://reviews.llvm.org/D64067



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D64067: [X86][PPC] Support -mlong-double-64

2019-07-03 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In this review (https://reviews.llvm.org/D6260) @rsmith mentions that this 
should also have an effect on name mangling.

What will this do if the user calls a library function that expects a long 
double? What does gcc do in that case?


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64067/new/

https://reviews.llvm.org/D64067



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D53157: Teach the IRBuilder about constrained fadd and friends

2019-05-20 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added inline comments.



Comment at: include/llvm/IR/IRBuilder.h:234
+  /// Set the exception handling to be used with constrained floating point
+  void setDefaultConstrainedExcept(MDNode *NewExcept) {
+    DefaultConstrainedExcept = NewExcept;

I think it would be better to add an enumerated type to describe the
exception semantics and rounding modes. The MDNode is an implementation detail,
and an awkward one for the front end to have to deal with.
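
To illustrate, something along these lines is the kind of interface I have in
mind (all names hypothetical; the MDNode would become a detail hidden inside
the builder):

  // Hypothetical sketch of an enum-based interface; not the actual IRBuilder.
  namespace fp {
  enum class RoundingMode { Dynamic, ToNearest, Downward, Upward, TowardZero };
  enum class ExceptionBehavior { Ignore, MayTrap, Strict };
  } // namespace fp

  class ConstrainedFPBuilderSketch {
    fp::ExceptionBehavior DefaultConstrainedExcept = fp::ExceptionBehavior::Ignore;

  public:
    // The front end states its intent with an enum; translating that to the
    // "fpexcept.*" metadata string stays inside the builder.
    void setDefaultConstrainedExcept(fp::ExceptionBehavior NewExcept) {
      DefaultConstrainedExcept = NewExcept;
    }
  };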


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D53157/new/

https://reviews.llvm.org/D53157





[PATCH] D53157: Teach the IRBuilder about constrained fadd and friends

2018-11-20 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In https://reviews.llvm.org/D53157#1304895, @cameron.mcinally wrote:

> The problem I was missing is when a FENV_ACCESS=OFF operation, like a FDIV, 
> is hoisted into a FENV_ACCESS=ON region. I see it now, but still think that 
> forcing FENV_ACCESS=OFF operations into constrained intrinsics is a big 
> hammer. If there is a way to add barriers around function calls in a 
> FENV_ACCESS=ON region, that would be better.


It may be a big hammer, but I don't think we'll need to use it terribly often 
(that is, I don't expect the mixing of FENV_ACCESS regions within a function to 
be common -- I could be wrong). In any event, we've been clear that choosing to 
enable FP environment access will mean sacrificing performance in the near 
term. Once we've had time to teach all the relevant optimizations how to handle 
the constrained intrinsics the hammer will start to feel much smaller.

In https://reviews.llvm.org/D53157#1304895, @cameron.mcinally wrote:

> Stepping way back, the *best* solution is to have the FPEnv implementation 
> shut down unsafe optimizations on an individual basis. Perhaps we should be 
> tagging functions that contain FENV_ACCESS=ON regions. That way unsafe 
> optimization passes, e.g. hoisting, can query the tag to see if they should 
> skip these functions.


One of the reasons we went with the approach we did is that it provides 
conservatively correct results by default. Optimizations don't need to be 
taught to leave the intrinsics alone. They do that as a default behavior when 
they don't recognize the intrinsic. If we had required optimizations to opt out 
as needed, we'd have problems chasing down all the places where that needed to 
happen. So we chose the opposite challenge of using the big hammer for initial 
implementation and then having to go looking for the places that needed to be 
taught how to interpret the intrinsics for cases when the optimization can 
still be performed.
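
As a rough sketch of what I mean by "conservatively correct by default"
(hypothetical helper, not an actual pass):

  // Hypothetical helper, not an actual pass: plain fadd/fdiv pass this test and
  // may be moved, while a call to an llvm.experimental.constrained.* intrinsic
  // is assumed to touch memory (the FP environment), fails the test, and stays
  // put even in passes that know nothing about the constrained intrinsics.
  #include "llvm/IR/Instruction.h"

  static bool safeToHoist(const llvm::Instruction &I) {
    return !I.mayHaveSideEffects() && !I.mayReadFromMemory();
  }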

I think at this point we're all on the same page in this regard. I just wanted 
to make sure we also understand why we're on that page. I still believe it was 
the correct choice.


https://reviews.llvm.org/D53157





[PATCH] D53157: Teach the IRBuilder about constrained fadd and friends

2018-11-19 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In https://reviews.llvm.org/D53157#1302724, @uweigand wrote:

> A couple of comments on the previous discussion:
>
> 1. Instead of defining a new command line option, I'd prefer to use the 
> existing options -frounding-math and -ftrapping-math to set the default 
> behavior of math operations w.r.t. rounding modes and exception status.  (For 
> compatibility with GCC if nothing else.)


I agree that it's preferable to re-use these existing options if possible. I 
have some concerns that -ftrapping-math has a partial implementation in place 
that doesn't seem to be well aligned with the way fast-math flags are handled, 
so it might require some work to have that working as expected without breaking 
existing users. In general though these seem like they should do what we need.

Regarding GCC compatibility, I notice that GCC defaults to trapping math being 
enabled and I don't think that's what we want with clang. It also seems to 
imply something more than I think we need for constrained handling. For 
example, the GCC documentation says that -fno-trapping-math "can result in 
incorrect output for programs that depend on an exact implementation of IEEE or 
ISO rules/specifications for math functions" so it sounds like maybe it also 
implies (for GCC)  something like LLVM's "afn" fast math flag.

So if we are going to use these options, I think we need to have a discussion 
about whether or not it's OK to diverge from GCC's interpretation of them.

In https://reviews.llvm.org/D53157#1302724, @uweigand wrote:

> 2. I also read the C standard to imply that it is a requirement of **user 
> code** to reset the status flag to default before switching back to 
> FENV_ACCESS OFF.  The fundamental characterization of the pragma says "The 
> FENV_ACCESS pragma provides a means **to inform the implementation** when a 
> program might access the floating-point environment to test floating-point 
> status flags or run under non-default floating-point control modes."  There 
> is no mention anywhere that using the pragma, on its own, will ever 
> **change** those control modes.   The last sentence about "... the 
> floating-point control modes have their default setting", while indeed a bit 
> ambiguous, is still consistent with an interpretation that it is the 
> responsibility of user code to ensure that state, there is no explicit 
> statement that the implementation will do so.


I definitely agree with this interpretation of the standard. My understanding 
is that behavior is undefined if the user has not left the FP environment in 
the default state when transitioning to an FENV_ACCESS OFF region.
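
To illustrate that obligation with hypothetical user code:

  // Hypothetical user code: the user, not the compiler, restores the default
  // floating-point environment before control leaves the FENV_ACCESS ON region.
  #include <cfenv>

  double scaled_div(double x, double y) {
  #pragma STDC FENV_ACCESS ON
    std::fesetround(FE_DOWNWARD); // run under a non-default rounding mode
    double r = x / y;
    std::fesetenv(FE_DFL_ENV);    // restore defaults before returning
    return r;
  }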

In https://reviews.llvm.org/D53157#1302724, @uweigand wrote:

> 3. I agree that we need to be careful about intermixing "normal" 
> floating-point operations with strict ones.  However, I'm still not convinced 
> that the pragma itself must be the scheduling barrier.  It seems to me that 
> the compiler already knows where FP control flags are ever modified directly 
> (this can only happen with intrinsics or the like), so the main issue is 
> whether function calls need to be considered.  This is where the pragma comes 
> in: in my mind, the primary difference between FENV_ACCESS ON and FENV_ACCESS 
> OFF regions is that where the pragma is ON, function calls need to be 
> considered (unless otherwise known for sure) to access FP control flags, 
> while where the pragma is OFF, function calls can be considered to never 
> touch FP control flags.  So the real scheduling barrier would be any 
> **function call within a FENV_ACCESS ON region**.  Those would have to be 
> marked by the front-end in the IR, presumably using a function attribute.  
> The common LLVM optimizers would then need to respect that scheduling barrier 
> (here is where we likely still have an open issue, there doesn't appear to be 
> any way to express that at the IR level for regular floating-point operations 
> ...), and likewise the back-ends (but that looks straightforward: a back-end 
> typically will model FP status as residing in a register or in a 
> pseudo-memory slot, and those can simply be considered used/clobbered by 
> function calls marked as within FENV_ACCESS ON regions).


I'm a bit confused by this. The constrained intrinsics will cause all calls to 
act as barriers to motion of the FP operations represented by the intrinsics 
(at least before instruction selection). So I'm not clear what you are saying 
is needed here.


https://reviews.llvm.org/D53157





[PATCH] D53157: Teach the IRBuilder about constrained fadd and friends

2018-11-16 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In https://reviews.llvm.org/D53157#1291978, @cameron.mcinally wrote:

> In https://reviews.llvm.org/D53157#1291964, @kpn wrote:
>
> > I don't expect anyone would need to switch between constrained and regular 
> > floating point at an instruction level of granularity.
>
>
> The Standard allows for this (to some degree). E.g. FENV_ACCESS can be 
> toggled between nested compound statements.
>
> Also, some AVX512 hardware instructions have explicit SAE and RM operands.


Just because FENV_ACCESS can be toggled at that granularity doesn't mean we
have to represent it that way. We've previously agreed (I think) that if 
FENV_ACCESS is enabled anywhere in a function we will want to use the 
constrained intrinsics for all FP operations in the function, not just the ones 
in the scope where it was specified. I think this works because FENV_ACCESS ON 
needs to be respected (i.e. we need to assume that the user may change the FP 
environment) but FENV_ACCESS OFF doesn't need to be respected (i.e. we are not 
required to assume that the user will not change the environment). For 
instance, the spec does give us liberty to assume the default rounding mode in 
FENV_ACCESS OFF regions, but if we make no assumptions about the rounding mode 
that will be conservatively correct.
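
For example (hypothetical source, just to illustrate the granularity point):

  // Only the inner block enables FENV_ACCESS, but the whole function would be
  // emitted with constrained intrinsics under the scheme described above.
  double f(double a, double b, double c) {
    double t = a * b; // FENV_ACCESS off here ...
    {
  #pragma STDC FENV_ACCESS ON
      t += c;         // ... and on here
    }
    return t;
  }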

As for instructions that take an explicit rounding mode argument, that's a 
separate issue (and one that is relevant for multiple architectures). The 
constrained intrinsics do not enforce a rounding mode (i.e. they are 
descriptive rather than prescriptive), but if we have concrete rounding mode 
arguments for any given instruction we could use that to encode the rounding 
mode in the instruction. I'm not sure how user code would express this apart 
from use of intrinsics.
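
For what it's worth, user code can already express a per-operation rounding
mode through intrinsics with an embedded rounding operand; an illustrative
example (requires AVX-512F):

  // Illustrative only: the rounding mode is encoded in the instruction itself,
  // independent of the dynamic FP environment.
  #include <immintrin.h>

  __m512 add_toward_neg_inf(__m512 a, __m512 b) {
    return _mm512_add_round_ps(a, b, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
  }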


https://reviews.llvm.org/D53157





[PATCH] D45616: [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR

2018-11-08 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

Agreed. Reverting this patch wouldn't move us forward on constrained FP 
handling. What I'm saying (and what I think @nastafev is saying) is that this
patch is taking a built-in that allows the user to express specific signalling
behavior and throwing away that aspect of its semantics. Granted, we do say that
FP environment access is unsupported, so technically unmasking FP exceptions or 
even reading the status flag is undefined behavior. But it seems like there's a 
pretty big difference between using that claim to justify enabling some 
optimization that might do constant folding in a way that assumes the default 
rounding mode versus using that claim to disregard part of the semantics of a 
built-in that is modeling a target-specific instruction.
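
To illustrate what's at stake (hypothetical user code, not from the bug
report): the predicate argument lets the programmer choose signalling or quiet
comparison behavior, which a plain IR fcmp cannot express.

  // Illustrative only (requires AVX-512F): _CMP_LT_OS is the signalling
  // predicate (raises invalid on any NaN input), _CMP_LT_OQ is the quiet one
  // (raises invalid only on signalling NaNs).
  #include <immintrin.h>

  __mmask16 lt_signalling(__m512 a, __m512 b) {
    return _mm512_cmp_ps_mask(a, b, _CMP_LT_OS);
  }

  __mmask16 lt_quiet(__m512 a, __m512 b) {
    return _mm512_cmp_ps_mask(a, b, _CMP_LT_OQ);
  }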

I'm not insisting that we have to revert this patch or anything like that. I'm 
just saying that we should think about it. My personal preference would 
actually be to have this code re-implemented using the new constrained fcmp 
intrinsic when it lands. That still leaves the masking part of this unsettled, 
but as you say that's probably not a priority right now.

BTW, here's a link to the constrained fcmp review: 
https://reviews.llvm.org/D54121


Repository:
  rC Clang

https://reviews.llvm.org/D45616





[PATCH] D45616: [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR

2018-11-08 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

In https://reviews.llvm.org/D45616#1290897, @nastafev wrote:

> >   can trigger arbitrary floating-point exceptions anywhere in your code
>
> I believe this statement reflects the current state of many compilers on the 
> market, I guess I just don't see the reason why making things worse. It seems 
> the original intent of the commit was to add support for masked compares, and 
> that could have been achieved without breaking what already worked.
>
> I hope the patch is ultimately helping some performance optimization, but it 
> is breaking the original intent of some legitimate programs that worked 
> before, and introduces correctness regression. So to me it must be at least 
> guarded by a flip-switch.
>
> The reference to constrained floating-point intrinsics work is relevant, but 
> it will obviously take time and extra effort to enable and then to unbreak 
> all the cases that are made broken here. Instead one could postpone lowering 
> of the particular instructions until it was possible without violation of the 
> semantics...


That seems like a legitimate concern to me.

I believe this patch was part of a larger effort to get rid of the dependence 
on intrinsics. We have a very broad preference for expressing things using 
native IR whenever possible because (among other reasons) intrinsics block most 
optimizations and we don't want to teach optimizations to reason about 
target-specific intrinsics. In this particular case we may have overreached 
because it isn't strictly possible to express all the semantics of this 
built-in accurately using native IR.

There is a patch currently active to add constrained fcmp intrinsics, but it 
doesn't have a masked variant.


Repository:
  rC Clang

https://reviews.llvm.org/D45616





[PATCH] D53157: Teach the IRBuilder about constrained fadd and friends

2018-11-08 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

Craig Topper also raised some concerns with me (in person, his desk is right by 
mine) about the potential effect this might have on code size. He tells me that
IRBuilder calls are intended to always be inlined, and if we grow the
implementation of these functions too much it could lead to noticeable bloat.
It still seems to me like it might be worthwhile for the simplification it 
would allow in the front end, but I'm not really a front end guy so I 
definitely agree that we should get some input from front end people about what 
they want.


https://reviews.llvm.org/D53157





[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-02 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

Fair enough.

FYI, I've spent most of the day poking at the IR linker, and I've found all
sorts of ways to get it to make a complete mess of structure types and
pointers to them, including some simple cases where it messes things up in
different ways depending on the order in which you link the files. Even so, I
think I've convinced myself that there is no way to get the linker to cause
incorrect code to be generated -- just lots of extra types, pointer casts, and
function casts.
This seems to be entirely consistent with what you are saying and sounds like a 
pretty solid argument for getting rid of typed pointers.

I guess maybe I'll take a step back and think about whether or not I could 
solve my current problems by pretending that all pointers are i8* and reasoning 
from there based on uses.

Thanks for taking the time to share your thoughts on this with me.


https://reviews.llvm.org/D49403





[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-02 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

I hope I'm not coming across as too argumentative here. I don't really have 
strong feelings about this function pointer type patch and ultimately I see 
that you are right, but the objections you are raising here would equally apply 
to what I'm working on with the IR linker and if I find a way to fix that, I'll 
care a bit more about that case. (If you'd like a preview, here's the bug I'm 
trying to fix: https://bugs.llvm.org/show_bug.cgi?id=38408)

You say "there is no semantic meaning for pointer types in LLVM IR" and that's 
fine, but there is a function PointerType::getElementType() and if I modify 
that function to always return the i8 type it will break a lot of things. So 
while there may be some sense in which the the pointer type cannot be the 
"correct" type, there is most definitely a sense in which it can be "incorrect" 
even if that sense isn't the strict semantic sense. I haven't looked at all the 
uses of  PointerType::getElementType() [or the related 
Type::getPointerElementType()]. I know a lot of them are just tests and 
assertions. Others are trivially walking through something that they know to be 
true by other means. A few seem (at least on first glance) to actually be doing 
something with it. For instance, llvm::getPointerStride() contains this line of 
code: "int64_t Size = DL.getTypeAllocSize(PtrTy->getElementType());"

I'm not arguing against opaque pointer types. I just feel like we're probably 
at least a couple of years away from having that. I'm also not arguing for 
robust and exhaustive correction of all cases where pointer types are not 
currently "working" in the sense that I am describing in my prior comments. I'm 
just saying that if someone runs into a specific case that is causing them 
problems and they submit a fix for that case, maybe we should let it through
unless we have a better reason to reject it than the strict semantics rendering
it unimportant.


https://reviews.llvm.org/D49403





[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-01 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

The LLVM linker also seems to have a bunch of problems with resolving pointer 
types in structures.  I'm looking into a couple of those now.

I am aware of the opaque pointer effort, though as you say it seems to be 
stalled. I agree that there aren't a lot of things you can do based on pointer 
type information, but there are some things that you might like to do whose
legality is judged based on pointer type information, and those might be
incorrectly flagged as unsafe if the pointer type information is incorrect. An
example would be the sorts of data layout optimizations described here: 
https://llvm.org/devmtg/2014-10/Slides/Prashanth-DLO.pdf  Note, in particular, 
on slide 11 ("Legality") the reference to "cast to/from a given struct type". I 
would imagine that includes pointer casts. It should be possible to do the same 
sort of analysis with opaque pointers by following their uses very carefully, 
and maybe not having these infernal pointer type casts all over the place would 
make that less prone to false negatives. In the meantime, the pointer casts are 
there and have to be dealt with.

Getting back to the current review, Erich explained to me that this patch 
involves a certain amount of risk in that it changes the way clang processes 
things, but it has the merit of getting the correct answer.


https://reviews.llvm.org/D49403





[PATCH] D49403: More aggressively complete RecordTypes with Function Pointers

2018-08-01 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor added a comment.

I've talked to Olga about this, and I think we have a way to handle the problem 
she was working on without this change.

However,  I think this change is worth considering anyway. The test case shows 
an example where clang is clearly not producing the output it intends to 
produce. In most cases that probably doesn't matter, and I can't come up with 
any case where it will result in incorrect code generation. One place that I 
think it has the potential to introduce trouble is with LTO.

Modifying the example from the test case slightly, suppose you have that code 
broken up into two source files like this.

f1.c:

  struct has_fp;
  typedef void (*fp)(const struct has_fp *f);
  struct has_fp {
    fp i;
  };
  void func(const struct has_fp *f) {}

f2.c:

  struct has_fp;
  typedef void (*fp)(const struct has_fp *f);
  struct has_fp {
    fp i;
  };
  void func(const struct has_fp *f);
  void func2(const struct has_fp *f, int n) {
    if (n == 0)
      func(f);
  }

Now if I compile both of these files with "-c -flto -O2" and then use 
"llvm-link -S -o - f1.o f2.o" I'll see the following:

  %struct.has_fp = type { {}* }
  %struct.has_fp.0 = type { void (%struct.has_fp.0*)* }

  define void @func(%struct.has_fp* %f) {
  entry:
    ret void
  }

  define void @func2(%struct.has_fp.0* %f, i32 %i) {
  entry:
    %cmp = icmp eq i32 %i, 0
    br i1 %cmp, label %if.end, label %if.then

  if.then:
    tail call void bitcast (void (%struct.has_fp*)* @func to void (%struct.has_fp.0*)*)(%struct.has_fp.0* %f)
    br label %if.end

  if.end:
    ret void
  }

Granted, this will ultimately produce correct code, and I don't have an example 
handy that shows how the extra type and the function bitcast might inhibit 
optimizations, but I think we're at least a step closer to being able to 
imagine it.


https://reviews.llvm.org/D49403





[PATCH] D38060: Remove offset size check in nullptr arithmetic handling

2017-09-20 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor closed this revision.
andrew.w.kaylor added a comment.

This was committed as r313784.  I put the wrong differential revision number in 
the comment for that check-in.


https://reviews.llvm.org/D38060





[PATCH] D38060: Remove offset size check in nullptr arithmetic handling

2017-09-19 Thread Andy Kaylor via Phabricator via cfe-commits
andrew.w.kaylor created this revision.

This patch amends the changes that were committed from 
https://reviews.llvm.org/D37042 to remove the offset size check in the idiom 
identification routine.

That check was behaving inconsistently (from target to target) because the 
offset was implicitly promoted to an integer before the idiom identification 
routine was called.  The result was that when a smaller-than-int offset was
added to a nullptr, the idiom was reported as matched on platforms where the
int size is the same as the pointer size, but as not matched on platforms where
the two sizes differ.

Because this check adds little value, and because the behavior of nullptr 
arithmetic is undefined in all cases (and therefore at our discretion to 
implement as we choose), it seems best to remove this check in order to 
guarantee consistent behavior across all platforms.


https://reviews.llvm.org/D38060

Files:
  lib/AST/Expr.cpp
  test/CodeGen/nullptr-arithmetic.c
  test/Sema/pointer-addition.c
  test/SemaCXX/nullptr-arithmetic.cpp


Index: lib/AST/Expr.cpp
===
--- lib/AST/Expr.cpp
+++ lib/AST/Expr.cpp
@@ -1862,10 +1862,6 @@
   if (!PTy || !PTy->getPointeeType()->isCharType())
     return false;
 
-  // Check that the integer type is pointer-sized.
-  if (Ctx.getTypeSize(IExp->getType()) != Ctx.getTypeSize(PExp->getType()))
-    return false;
-
   return true;
 }
 InitListExpr::InitListExpr(const ASTContext &C, SourceLocation lbraceloc,
Index: test/SemaCXX/nullptr-arithmetic.cpp
===
--- test/SemaCXX/nullptr-arithmetic.cpp
+++ test/SemaCXX/nullptr-arithmetic.cpp
@@ -1,4 +1,6 @@
 // RUN: %clang_cc1 %s -fsyntax-only -verify -pedantic -Wextra -std=c++11
+// RUN: %clang_cc1 %s -fsyntax-only -triple i686-unknown-unknown -verify -pedantic -Wextra -std=c++11
+// RUN: %clang_cc1 %s -fsyntax-only -triple x86_64-unknown-unknown -verify -pedantic -Wextra -std=c++11
 
 #include 
 
Index: test/CodeGen/nullptr-arithmetic.c
===
--- test/CodeGen/nullptr-arithmetic.c
+++ test/CodeGen/nullptr-arithmetic.c
@@ -1,4 +1,6 @@
 // RUN: %clang_cc1 -S %s -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 -S %s -emit-llvm -triple i686-unknown-unknown -o - | FileCheck %s
+// RUN: %clang_cc1 -S %s -emit-llvm -triple x86_64-unknown-unknown -o - | FileCheck %s
 
 #include 
 
@@ -32,3 +34,14 @@
 // CHECK-LABEL: test3
 // CHECK: getelementptr
 // CHECK-NOT: inttoptr
+
+// This checks the case where the offset isn't pointer-sized.
+// The front end will implicitly cast the offset to an integer, so we need to
+// make sure that doesn't cause problems on targets where integers and pointers
+// are not the same size.
+int8_t *test4(int8_t b) {
+  return NULLPTRI8 + b;
+}
+// CHECK-LABEL: test4
+// CHECK: inttoptr
+// CHECK-NOT: getelementptr
Index: test/Sema/pointer-addition.c
===
--- test/Sema/pointer-addition.c
+++ test/Sema/pointer-addition.c
@@ -1,4 +1,6 @@
 // RUN: %clang_cc1 %s -fsyntax-only -verify -pedantic -Wextra -std=c11
+// RUN: %clang_cc1 %s -fsyntax-only -triple i686-unknown-unknown -verify -pedantic -Wextra -std=c11
+// RUN: %clang_cc1 %s -fsyntax-only -triple x86_64-unknown-unknown -verify -pedantic -Wextra -std=c11
 
 #include 
 

