[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2020-01-18 Thread Reid Kleckner via Phabricator via cfe-commits
rnk added a comment.

I posted a patch to speed things up:
https://reviews.llvm.org/D72984
The numbers are different, since 9 seconds is measured in a local linux build 
with no debug info, vs. 28s I measured when building at work, on Windows, with 
debug info.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2020-01-17 Thread Reid Kleckner via Phabricator via cfe-commits
rnk added a comment.

`ArmMveAliasValid` is the second slowest function to compile in all of clang, 
according to ClangBuildAnalyzer: https://reviews.llvm.org/P8185

We shouldn't spend 28 CPU seconds per rebuild on a switch. Please do something 
to speed it up. Generally, it is best to emit tables instead of code whenever 
possible.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-25 Thread Simon Tatham via Phabricator via cfe-commits
simon_tatham marked an inline comment as done.
simon_tatham added inline comments.



Comment at: clang/utils/TableGen/CMakeLists.txt:17
   NeonEmitter.cpp
+  MveEmitter.cpp
   TableGen.cpp

thakis wrote:
> nit: These files are listed alphabetically.
Sorry about that. Fixed in rG24ef631f4333120abd6b66c1e8466a582b60779f


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-25 Thread Nico Weber via Phabricator via cfe-commits
thakis added inline comments.



Comment at: clang/utils/TableGen/CMakeLists.txt:17
   NeonEmitter.cpp
+  MveEmitter.cpp
   TableGen.cpp

nit: These files are listed alphabetically.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-24 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

I've hopefully fixed the build in rG7b3de1e81197 
, but it's 
hard to tell for sure with the other error.

I also fixed the tests in rG78700ef8866d 
 (and 
worked out how to remove change-id's from commit messages).

Let us know if anything else looks broken.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-24 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Hmm.. Let me take a look. There's a different error on the same build now, but 
I think it's just hiding this one.

I'll also try and fix the tests that are failing in places too, if I can.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-24 Thread Yi-Hong Lyu via Phabricator via cfe-commits
Yi-Hong.Lyu added a comment.

Seems this commit broke the build 
http://lab.llvm.org:8011/builders/ppc64le-lld-multistage-test/builds/6794/


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-23 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

I've read this through again and it looks good. If no one else has any issues, 
then LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-07 Thread Dave Green via Phabricator via cfe-commits
dmgreen added subscribers: samparker, SjoerdMeijer.
dmgreen added a comment.

This is looking good to me. My understanding is that is has some dependencies? 
The llvm side will likely needed to go in first, plus a couple of clang patches?




Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8529
   "argument should be a multiple of %0">;
+def err_argument_not_power_of_2 : Error<
+  "argument should be a power of 2">;

simon_tatham wrote:
> dmgreen wrote:
> > Do we have any/should we have some tests for these errors?
> By the time we finish implementing all the intrinsics, there will be tests 
> for these errors. The intrinsics that need them aren't in this preliminary 
> commit, though.
OK. That's fair.



Comment at: clang/test/CodeGen/arm-mve-intrinsics/scalar-shifts.c:2
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -O3 -S -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -DPOLYMORPHIC -O3 -S -emit-llvm -o - %s | FileCheck %s

simon_tatham wrote:
> dmgreen wrote:
> > These tests all run -O3, the entire pass pipeline. I see quite a few tests 
> > in the same folder do the same thing, but in this case we will be adding 
> > quite a few tests. Random mid-end optimizations may well keep on altering 
> > them.
> > 
> > Is it possible to use -disable-O0-optnone and pipe them into opt -mem2reg? 
> > What would that do to the codegen, would it be a lot more verbose than it 
> > is now?
> > 
> > Also could/should they be using clang_cc1?
> The immediate problem is that if you use any form of `clang | opt | 
> FileCheck` command line, then `update_cc_test_checks.py` says 'skipping 
> non-FileChecked line', because it doesn't support anything more complicated 
> than a two-stage pipeline consisting of clang and FileCheck.
> 
> I've enhanced `update_cc_test_checks` to handle that case, in D68406, and 
> respun these tests accordingly. But if that patch doesn't get approval then 
> I'll have to rethink this again.
> 
> (For `vld24.c` I ended up running `opt -sroa -early-cse` to avoid the IR 
> becoming noticeably more verbose. The rest worked fine with just `mem2reg`, 
> though.)
> 
> 
> Patching `update_cc_test_checks.py` to support more complex pipelines, it 
> seems to work OK: most codegen changes are trivial ones, such as modifiers 
> not appearing on IR operations (`call` instead of `tail call`, plain `shl` in 
> place of `shl nuw`). Only `vld24` becomes significantly more complicated: for 
> that one file I had to run `opt -mem2reg -sroa -early-cse` instead.
Yeah, they look OK. Thanks for making the change. It looks like a useful 
feature being added.



Comment at: clang/test/CodeGen/arm-mve-intrinsics/vcvt.c:13
+{
+return vcvttq_f16_f32(a, b);
+}

Tests for bottom too would be good to see too. In general, more tests are 
better.



Comment at: clang/test/CodeGen/arm-mve-intrinsics/vcvt.c:20
+// CHECK-NEXT:[[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 
[[TMP0]])
+// CHECK-NEXT:[[TMP2:%.*]] = call <8 x half> 
@llvm.arm.mve.fltnarrow.predicated(<8 x half> [[A:%.*]], <4 x float> [[B:%.*]], 
i32 1, <4 x i1> [[TMP1]])
+// CHECK-NEXT:ret <8 x half> [[TMP2]]

Are these tests still using old names? I'm not missing something about how 
these are generated, am I?



Comment at: clang/utils/TableGen/MveEmitter.cpp:132
+// the pointee type.
+Pointer,
+  };

simon_tatham wrote:
> dmgreen wrote:
> > The gathers are really a Vector of Pointers, in a way. But in the 
> > intrinsics (and IR), they are just treated as a vector of ints, so I 
> > presume a new type is not needed? We may (but not yet) want to make use of 
> > the llvm masked gathers. We would have to add codegen support for them 
> > first though (which I don't think we have plans for in the near term).
> Yes, I'm working so far on the assumption that we don't need to represent 
> scatter/gather address vectors as a special vector-of-pointers type.
> 
> (If nothing else, it would be a bit strange for the 64-bit versions, where 
> the vector element isn't even the same //size// as a pointer.)
> 
> If auto-generation of gather loads during vectorization turns out to need a 
> special representation, then I suppose we'll have to rethink.
Sounds good.



Comment at: clang/utils/TableGen/MveEmitter.cpp:207
+// pointer, especially if the pointee is also const.
+if (isa(Pointee)) {
+  if (Const)

simon_tatham wrote:
> dmgreen wrote:
> > Would making this always east const be simpler? Do we generate anything 
> > with inner pointer types? Should there be a whitespace before the "const " 
> > in Name += "const "?
> There 

[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-04 Thread Simon Tatham via Phabricator via cfe-commits
simon_tatham marked 20 inline comments as done.
simon_tatham added inline comments.



Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8529
   "argument should be a multiple of %0">;
+def err_argument_not_power_of_2 : Error<
+  "argument should be a power of 2">;

dmgreen wrote:
> Do we have any/should we have some tests for these errors?
By the time we finish implementing all the intrinsics, there will be tests for 
these errors. The intrinsics that need them aren't in this preliminary commit, 
though.



Comment at: clang/include/clang/Basic/arm_mve_defs.td:266
+// vidupq_wb_u16 -> vidupq_u16.
+def PNT_WB: PolymorphicNameType<0, "wb">;
+

dmgreen wrote:
> These are not used yet, right? They are not meant for the vldr gathers, those 
> don't have polymorphic names (if I'm reading this list of intrinsics right). 
> These are for vidup's as the comment says?
I've defined here the complete list of polymorphic name types that will be 
needed for //all// the MVE intrinsics. I haven't included an example of every 
single one in this preliminary set, though.

Yes, I think `PNT_WB` in particular is only used for the `vidup` family.



Comment at: clang/test/CodeGen/arm-mve-intrinsics/scalar-shifts.c:2
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -O3 -S -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -DPOLYMORPHIC -O3 -S -emit-llvm -o - %s | FileCheck %s

dmgreen wrote:
> These tests all run -O3, the entire pass pipeline. I see quite a few tests in 
> the same folder do the same thing, but in this case we will be adding quite a 
> few tests. Random mid-end optimizations may well keep on altering them.
> 
> Is it possible to use -disable-O0-optnone and pipe them into opt -mem2reg? 
> What would that do to the codegen, would it be a lot more verbose than it is 
> now?
> 
> Also could/should they be using clang_cc1?
The immediate problem is that if you use any form of `clang | opt | FileCheck` 
command line, then `update_cc_test_checks.py` says 'skipping non-FileChecked 
line', because it doesn't support anything more complicated than a two-stage 
pipeline consisting of clang and FileCheck.

I've enhanced `update_cc_test_checks` to handle that case, in D68406, and 
respun these tests accordingly. But if that patch doesn't get approval then 
I'll have to rethink this again.

(For `vld24.c` I ended up running `opt -sroa -early-cse` to avoid the IR 
becoming noticeably more verbose. The rest worked fine with just `mem2reg`, 
though.)


Patching `update_cc_test_checks.py` to support more complex pipelines, it seems 
to work OK: most codegen changes are trivial ones, such as modifiers not 
appearing on IR operations (`call` instead of `tail call`, plain `shl` in place 
of `shl nuw`). Only `vld24` becomes significantly more complicated: for that 
one file I had to run `opt -mem2reg -sroa -early-cse` instead.



Comment at: clang/utils/TableGen/MveEmitter.cpp:82
+#if 0
+} // stop emacs from wanting to auto-indent everything to 2 spaces inside here
+#endif

dmgreen wrote:
> Is this needed still? It seems like something that should be handled in 
> clang-format/emacs directly.
Oh yes, now I look on Stack Overflow there is a way to make emacs stop 
indenting things inside namespaces. Thanks for the prod :-)



Comment at: clang/utils/TableGen/MveEmitter.cpp:1403
+" *\n"
+" * Permission is hereby granted, free of charge, to any person "
+"obtaining a copy\n"

dmgreen wrote:
> Clang format really made a mess of this, didn't it.
> 
> Is this is old license? Should it be updated to the new one. I imagine that 
> these generated headers might well have been missed in the switchover.
Seems likely! I've updated it to the same license that's at the top of this 
source file itself.



Comment at: clang/utils/TableGen/MveEmitter.cpp:132
+// the pointee type.
+Pointer,
+  };

dmgreen wrote:
> The gathers are really a Vector of Pointers, in a way. But in the intrinsics 
> (and IR), they are just treated as a vector of ints, so I presume a new type 
> is not needed? We may (but not yet) want to make use of the llvm masked 
> gathers. We would have to add codegen support for them first though (which I 
> don't think we have plans for in the near term).
Yes, I'm working so far on the assumption that we don't need to represent 
scatter/gather address vectors as a special vector-of-pointers type.

(If nothing else, it would be a bit strange for the 64-bit versions, where the 
vector element isn't even the same //size// as a pointer.)

If auto-generation of gather loads during vectorization turns out to need a 

[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-02 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: clang/include/clang/Basic/arm_mve_defs.td:119
+// The type Void can be used for the return type of an intrinsic, and as the
+// parameter type for intrinsics that aren't actually parametrised by any kind
+// of _s32 / _f16 / _u8 suffix.

parametrised->parameterised



Comment at: clang/utils/TableGen/MveEmitter.cpp:207
+// pointer, especially if the pointee is also const.
+if (isa(Pointee)) {
+  if (Const)

Would making this always east const be simpler? Do we generate anything with 
inner pointer types? Should there be a whitespace before the "const " in Name 
+= "const "?



Comment at: clang/utils/TableGen/MveEmitter.cpp:251
+  }
+  virtual unsigned sizeInBits() const override { return Bits; }
+  ScalarTypeKind kind() const { return Kind; }

These don't need to be virtual and override, they can just be override. Same 
elsewhere.



Comment at: clang/utils/TableGen/MveEmitter.cpp:325
+return Element->c_name_base() + "x" + utostr(Registers);
+  }
+

How come this doesn't have/need a llvm_name? Are they just all handled manually?



Comment at: clang/utils/TableGen/MveEmitter.cpp:382
+// ones are the same across the whole set (so that no variable is needed at
+// all).
+//

Cool.

Do you know how much this helps? Do the benefits outweigh the costs in the 
complexity of this code?

And do you know, once we have added a lot of instruction how long this will 
take to run (I know, difficult question to answer without doing it first, but 
is it roughly O(n log n)? And memory will never be excessive? A very 
unscientific test I just ran made it seem pretty quick, but that was obviously 
without the number of intrinsics we will have in the end). We should try and 
ensure that we don't make this a compile time burden for llvm.



Comment at: clang/utils/TableGen/MveEmitter.cpp:455
+
+class Value {
+public:

Is it worth giving this a different name to more obviously distinguish it from 
llvm::Value?



Comment at: clang/utils/TableGen/MveEmitter.cpp:723
+CodeGenParamAllocator DummyParamAlloc;
+V->gen_code(nulls(), DummyParamAlloc);
+

What is this doing?



Comment at: clang/utils/TableGen/MveEmitter.cpp:755
+}
+std::list used;
+gen_code_dfs(Code, used, Pass);

Used

And why a list, out of interest?



Comment at: clang/utils/TableGen/MveEmitter.cpp:1023
+Value::Ptr Arg = getCodeForDagArg(D, 0, Scope, Param);
+if (const auto *ST = dyn_cast(CastType)) {
+  if (!ST->requires_float()) {

Do you envision this may require float const or vector casts in the future?



Comment at: clang/utils/TableGen/MveEmitter.cpp:1125
+  if (!PolymorphicNameType->isValueUnset("ExtraSuffixToDiscard")) {
+std::string ExtraSuffix =
+PolymorphicNameType->getValueAsString("ExtraSuffixToDiscard");

Some of these could use StringRef's more liberally, here and elsewhere.



Comment at: clang/utils/TableGen/MveEmitter.cpp:1595
+  // Is this parameter the same for all intrinsics in the group?
+  bool Constant = true;
+  auto it = kv.second.begin();

Can this use (something like):
bool Constant = llvm::all_of(kv.second, [&](const auto ) { return 
OI.ParamValues[i] == kv.second[0].ParamValues[i]; } ?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-09-29 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Sorry. I wasn't ignoring this (sort-of), I just knew that you were on holiday 
and this is a bit of a big one.

But I like this. I still have to take a deeper look into the main tablegen 
parts, but it looks very powerful.

I presume from here adding new intrinsics is mostly a case of adding to 
arm_mve.td (with perhaps some minor parts in defs and maybe some custom bits 
here and there).




Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8529
   "argument should be a multiple of %0">;
+def err_argument_not_power_of_2 : Error<
+  "argument should be a power of 2">;

Do we have any/should we have some tests for these errors?



Comment at: clang/include/clang/Basic/arm_mve_defs.td:266
+// vidupq_wb_u16 -> vidupq_u16.
+def PNT_WB: PolymorphicNameType<0, "wb">;
+

These are not used yet, right? They are not meant for the vldr gathers, those 
don't have polymorphic names (if I'm reading this list of intrinsics right). 
These are for vidup's as the comment says?



Comment at: clang/test/CodeGen/arm-mve-intrinsics/scalar-shifts.c:2
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -O3 -S -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -DPOLYMORPHIC -O3 -S -emit-llvm -o - %s | FileCheck %s

These tests all run -O3, the entire pass pipeline. I see quite a few tests in 
the same folder do the same thing, but in this case we will be adding quite a 
few tests. Random mid-end optimizations may well keep on altering them.

Is it possible to use -disable-O0-optnone and pipe them into opt -mem2reg? What 
would that do to the codegen, would it be a lot more verbose than it is now?

Also could/should they be using clang_cc1?



Comment at: clang/test/CodeGen/arm-mve-intrinsics/scalar-shifts.c:3
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -O3 -S -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -DPOLYMORPHIC -O3 -S -emit-llvm -o - %s | FileCheck %s
+

POLYMORPHIC isn't used here. Same for vcvt below (Is there a polymorphic vcvt?)



Comment at: clang/test/CodeGen/arm-mve-intrinsics/vminvq.c:12
+//
+uint32_t test_vminvq_u32(uint32_t a, uint32x4_t b)
+{

Its probably worth trying to fill in tests for most types.



Comment at: clang/utils/TableGen/MveEmitter.cpp:82
+#if 0
+} // stop emacs from wanting to auto-indent everything to 2 spaces inside here
+#endif

Is this needed still? It seems like something that should be handled in 
clang-format/emacs directly.



Comment at: clang/utils/TableGen/MveEmitter.cpp:1403
+" *\n"
+" * Permission is hereby granted, free of charge, to any person "
+"obtaining a copy\n"

Clang format really made a mess of this, didn't it.

Is this is old license? Should it be updated to the new one. I imagine that 
these generated headers might well have been missed in the switchover.



Comment at: clang/utils/TableGen/MveEmitter.cpp:132
+// the pointee type.
+Pointer,
+  };

The gathers are really a Vector of Pointers, in a way. But in the intrinsics 
(and IR), they are just treated as a vector of ints, so I presume a new type is 
not needed? We may (but not yet) want to make use of the llvm masked gathers. 
We would have to add codegen support for them first though (which I don't think 
we have plans for in the near term).



Comment at: clang/utils/TableGen/MveEmitter.cpp:520
+  // time.
+  bool needs_visiting(unsigned Pass) {
+bool ToRet = Visited < Pass;

needsVisiting.

I would guess the same for other places that use snake_case.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits