[PATCH] D40508: Replace long type names in IR with hashes

2018-02-26 Thread Serge Pavlov via Phabricator via cfe-commits
sepavloff abandoned this revision.
sepavloff added a comment.

Using llvm type names is considered a wrong direction.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-29 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

Are you trying to use LLVM struct type identity to infer information about the 
source program?  That is not and has never been a guarantee.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-29 Thread Serge Pavlov via Phabricator via cfe-commits
sepavloff added a comment.

In https://reviews.llvm.org/D40508#939513, @rjmccall wrote:

> In https://reviews.llvm.org/D40508#938854, @sepavloff wrote:
>
> > In https://reviews.llvm.org/D40508#938686, @rjmccall wrote:
> >
> > > The LLVM linking model does not actually depend on struct type names 
> > > matching.  My understanding is that, at best, it uses that as a heuristic 
> > > for deciding whether to make an effort to unify two types, but it's not 
> > > something that's ultimately supposed to impact IR semantics.
> >
> >
> > It is mainly true with an exception, when `llvm-link` resolves opaque types 
> > it relies on type name only. And this behavior creates the issue that 
> > https://reviews.llvm.org/D40567 tries to resolve.
>
>
> It is not clear from that report what the actual problem is.  Two incomplete 
> types get merged by the linker?  So what?


`llvm-link` is expected to produce IR that is semantically consistent with the 
program initially represented by a set of TUs. In this case it is not true. A 
function defined in source as `foo(ABC&)` is converted by linking to 
`foo &)` and this breaks initial semantics.

>>> If we needed IR type names to match reliably, we would use a mangled name, 
>>> not a pretty-print.
>> 
>> There is no requirement for IR type name to be an identifier, so 
>> pretty-print fits the need of type identification.
> 
> Not really; pretty-printing drops a lot of information that is pertinent in a 
> stable identifier, like scopes and so on, and makes arbitrary decisions about 
> other things, like where to insert spaces, namespace qualifiers, etc.

Type name mangling indeed is attractive solution. It has at least the 
advantages:

- It is reversible (in theory).
- It can be more compact. For instance, there is no need to spell type name 
that is encountered already, a some kind of reference is sufficient.

And there are arguments against it:

- It make working with IR harder for developers as readability is broken,
- Type name in IR is mostly a decoration, with the exception of rare case of 
opaque type resolution, so type name mangling may be considered as an overkill.

On the other hand pretty-printing can be finely tuned so that all necessary 
information appears in its result. As there is no requirements on compatibility 
of type names in bitcode files, things like number of spaces look not so 
important, it is enough that the same version of clang was used to compile bc 
files that are linked by `llvm-link`. After all it is readable.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-29 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

In https://reviews.llvm.org/D40508#938854, @sepavloff wrote:

> In https://reviews.llvm.org/D40508#938686, @rjmccall wrote:
>
> > In https://reviews.llvm.org/D40508#938675, @sepavloff wrote:
> >
> > > In https://reviews.llvm.org/D40508#938040, @rjmccall wrote:
> > >
> > > > My skepticism about the raw_ostream is not about the design of having a 
> > > > custom raw_ostream subclass, it's that that subclass could conceivably 
> > > > be re-used by some other client.  It feels like it belongs as an 
> > > > internal hack in Clang absent some real evidence that someone else 
> > > > would use it.
> > >
> > >
> > > This class can be helpful in various cases where string identifier must 
> > > persistently identify an object and memory consumption must be low. It 
> > > may be:
> > >
> > > - If we introduce an option that allows a user to specify how many 
> > > symbols of full type name are kept in abbreviated name, then `llvm-link` 
> > > may see types named as `foo` and 
> > > `2544896211ff669ed44dccd265f4c9163b340190`, if they come from modules 
> > > compiled with different options. To find out that these are the same 
> > > type, it must have access to the same machinery.
> >
> >
> > The LLVM linking model does not actually depend on struct type names 
> > matching.  My understanding is that, at best, it uses that as a heuristic 
> > for deciding whether to make an effort to unify two types, but it's not 
> > something that's ultimately supposed to impact IR semantics.
>
>
> It is mainly true with an exception, when `llvm-link` resolves opaque types 
> it relies on type name only. And this behavior creates the issue that 
> https://reviews.llvm.org/D40567 tries to resolve.


It is not clear from that report what the actual problem is.  Two incomplete 
types get merged by the linker?  So what?

>> If we needed IR type names to match reliably, we would use a mangled name, 
>> not a pretty-print.
> 
> There is no requirement for IR type name to be an identifier, so pretty-print 
> fits the need of type identification.

Not really; pretty-printing drops a lot of information that is pertinent in a 
stable identifier, like scopes and so on, and makes arbitrary decisions about 
other things, like where to insert spaces, namespace qualifiers, etc.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-29 Thread Serge Pavlov via Phabricator via cfe-commits
sepavloff added a comment.

In https://reviews.llvm.org/D40508#938686, @rjmccall wrote:

> In https://reviews.llvm.org/D40508#938675, @sepavloff wrote:
>
> > In https://reviews.llvm.org/D40508#938040, @rjmccall wrote:
> >
> > > My skepticism about the raw_ostream is not about the design of having a 
> > > custom raw_ostream subclass, it's that that subclass could conceivably be 
> > > re-used by some other client.  It feels like it belongs as an internal 
> > > hack in Clang absent some real evidence that someone else would use it.
> >
> >
> > This class can be helpful in various cases where string identifier must 
> > persistently identify an object and memory consumption must be low. It may 
> > be:
> >
> > - If we introduce an option that allows a user to specify how many symbols 
> > of full type name are kept in abbreviated name, then `llvm-link` may see 
> > types named as `foo` and `2544896211ff669ed44dccd265f4c9163b340190`, 
> > if they come from modules compiled with different options. To find out that 
> > these are the same type, it must have access to the same machinery.
>
>
> The LLVM linking model does not actually depend on struct type names 
> matching.  My understanding is that, at best, it uses that as a heuristic for 
> deciding whether to make an effort to unify two types, but it's not something 
> that's ultimately supposed to impact IR semantics.


It is mainly true with an exception, when `llvm-link` resolves opaque types it 
relies on type name only. And this behavior creates the issue that 
https://reviews.llvm.org/D40567 tries to resolve.

> If we needed IR type names to match reliably, we would use a mangled name, 
> not a pretty-print.

There is no requirement for IR type name to be an identifier, so pretty-print 
fits the need of type identification.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-28 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

In https://reviews.llvm.org/D40508#938675, @sepavloff wrote:

> In https://reviews.llvm.org/D40508#938040, @rjmccall wrote:
>
> > My skepticism about the raw_ostream is not about the design of having a 
> > custom raw_ostream subclass, it's that that subclass could conceivably be 
> > re-used by some other client.  It feels like it belongs as an internal hack 
> > in Clang absent some real evidence that someone else would use it.
>
>
> This class can be helpful in various cases where string identifier must 
> persistently identify an object and memory consumption must be low. It may be:
>
> - Name mangling,
> - Symbol obfuscation,
> - More convenient replacement for `raw_sha1_ostream` (the latter produces 
> binary result, while `raw_abbrev_ostream` produces text),


There's no requirement to persistently identify an object here.

> - If we introduce an option that allows a user to specify how many symbols of 
> full type name are kept in abbreviated name, then `llvm-link` may see types 
> named as `foo` and `2544896211ff669ed44dccd265f4c9163b340190`, if they 
> come from modules compiled with different options. To find out that these are 
> the same type, it must have access to the same machinery.

The LLVM linking model does not actually depend on struct type names matching.  
My understanding is that, at best, it uses that as a heuristic for deciding 
whether to make an effort to unify two types, but it's not something that's 
ultimately supposed to impact IR semantics.

If we needed IR type names to match reliably, we would use a mangled name, not 
a pretty-print.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-28 Thread Serge Pavlov via Phabricator via cfe-commits
sepavloff added a comment.

In https://reviews.llvm.org/D40508#938040, @rjmccall wrote:

> My skepticism about the raw_ostream is not about the design of having a 
> custom raw_ostream subclass, it's that that subclass could conceivably be 
> re-used by some other client.  It feels like it belongs as an internal hack 
> in Clang absent some real evidence that someone else would use it.


This class can be helpful in various cases where string identifier must 
persistently identify an object and memory consumption must be low. It may be:

- Name mangling,
- Symbol obfuscation,
- More convenient replacement for `raw_sha1_ostream` (the latter produces 
binary result, while `raw_abbrev_ostream` produces text),
- If we introduce an option that allows a user to specify how many symbols of 
full type name are kept in abbreviated name, then `llvm-link` may see types 
named as `foo` and `2544896211ff669ed44dccd265f4c9163b340190`, if they 
come from modules compiled with different options. To find out that these are 
the same type, it must have access to the same machinery.




Comment at: lib/AST/TypePrinter.cpp:1532-1534
+namespace {
+template
+void printTo(raw_ostream , ArrayRef Args, const PrintingPolicy ,

rjmccall wrote:
> sepavloff wrote:
> > rnk wrote:
> > > `static` is nicer than a short anonymous namespace.
> > Yes, but this is function template. It won't create symbol in object file. 
> > Actually anonymous namespace has no effect here, it is only a documentation 
> > hint.
> Nonetheless, we generally prefer to use 'static' on internal functions, even 
> function templates, instead of putting them in anonymous namespaces.
OK, fixed in rL319290.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-28 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

My skepticism about the raw_ostream is not about the design of having a custom 
raw_ostream subclass, it's that that subclass could conceivably be re-used by 
some other client.  It feels like it belongs as an internal hack in Clang 
absent some real evidence that someone else would use it.




Comment at: lib/AST/TypePrinter.cpp:1532-1534
+namespace {
+template
+void printTo(raw_ostream , ArrayRef Args, const PrintingPolicy ,

sepavloff wrote:
> rnk wrote:
> > `static` is nicer than a short anonymous namespace.
> Yes, but this is function template. It won't create symbol in object file. 
> Actually anonymous namespace has no effect here, it is only a documentation 
> hint.
Nonetheless, we generally prefer to use 'static' on internal functions, even 
function templates, instead of putting them in anonymous namespaces.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-28 Thread Serge Pavlov via Phabricator via cfe-commits
sepavloff added a comment.

https://reviews.llvm.org/D40567 presents a patch that adds template argument 
names to class template specializations. It also describes the use case in 
which type names are used to identify type across different TUs.
Adding template parameters obviously increase memory consumption. This patch 
provides a way to reduce it.
Note that this issue arises only if source is compiled to bitcode (.ll or .bc). 
If compilation is made to machine representation (.s or .o), IR type name are 
not needed.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-28 Thread Serge Pavlov via Phabricator via cfe-commits
sepavloff added a comment.

In https://reviews.llvm.org/D40508#937212, @rjmccall wrote:

> In Swift's IRGen, we have an option controlling whether to emit meaningful 
> value names.  That option can be set directly from the command line, but it 
> defaults to whether we're emitting IR assembly (i.e. .ll, not .bc), which 
> means that the clients most interested in getting meaningful value names — 
> humans, presumably — always get good names, but nobody else pays for them.  I 
> have many, many times wished that we'd taken that same approach in Clang 
> instead of basing it on build configuration.
>
> If type names are a significant burden on compile times, should we just start 
> taking that same approach?  Just don't use real type names in .bc output 
> unless we're asked to.  Maybe we can eventually retroactively use the same 
> option for value names.


This is indeed reasonable approach, I will implement it the subsequent patch. 
Actually we could vary name length to achieve better readability or same 
memory, as the hash is calculated for entire type name and remains the same.

> I agree with Reid that it's really weird for there to be a raw_ostream that 
> automatically rewrites the string contents to be a hash when some limit is 
> reached; that does not feel like generalizable technology.

It reduces type names and at the same time keeps type uniqueness across TUs. It 
also does not require massive changes in printing methods. Probably the intent 
will be more clear when I publish the next patch of this set.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-28 Thread Serge Pavlov via Phabricator via cfe-commits
sepavloff marked 3 inline comments as done.
sepavloff added a comment.

In https://reviews.llvm.org/D40508#936617, @rnk wrote:

> It's not clear to me that this abbreviation functionality should live in 
> Support. Clang probably wants enough control over this (assuming we're doing 
> it) that the logic should live in clang.
>
> I also think we might want to try solving this a completely different way: 
> just don't bother emitting more than two template arguments for IR type 
> names. We don't really need to worry about type name uniqueness or matching 
> them across TUs.


Actually the intention is to have unique type names across different TUs.
I will publish the relevant patch a bit latter, but the problem we are solving 
is in incorrect behavior of `llvm-link`. If one TU contains opaque type and the 
other has the type definition, these two types must be merged into one. The 
merge procedure relies on type names, so it is important to have the same type 
names for types in different TUs that are equivalent in the sense of C++.




Comment at: include/clang/AST/PrettyPrinter.h:231
+  /// make things like breaking digraphs and '>>' in template parameters.
+  bool NotForCompilation : 1;
 };

rjmccall wrote:
> This saves, what, a few spaces from a few thousand IR type names?  I'm 
> skeptical that this even offsets the code-size impact of adding this option.
Not these few spaces themselves make the issue. The real evil is that to insert 
these spaces, `printTemplateArgumentList` had to print each template parameter 
into intermediate stream. We could try to use `raw_abbrev_ostream` to reduce 
memory consumption, it would not work because these intermediate streams are of 
type `raw_svector_ostream` and all these huge parameter texts would be 
materialized first and only then would go to compacting stream.
If no need to maintain compilable output, intermediate streams are not needed 
and all input can go directly to `raw_abbrev_ostream`.



Comment at: lib/AST/TypePrinter.cpp:1532-1534
+namespace {
+template
+void printTo(raw_ostream , ArrayRef Args, const PrintingPolicy ,

rnk wrote:
> `static` is nicer than a short anonymous namespace.
Yes, but this is function template. It won't create symbol in object file. 
Actually anonymous namespace has no effect here, it is only a documentation 
hint.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-28 Thread Serge Pavlov via Phabricator via cfe-commits
sepavloff updated this revision to Diff 124587.
sepavloff added a comment.

Updated patch as part of it was committed in https://reviews.llvm.org/rL319178


https://reviews.llvm.org/D40508

Files:
  include/clang/AST/PrettyPrinter.h
  lib/AST/TypePrinter.cpp
  lib/CodeGen/CodeGenTypes.cpp
  test/CodeGenCXX/template-types.cpp

Index: test/CodeGenCXX/template-types.cpp
===
--- /dev/null
+++ test/CodeGenCXX/template-types.cpp
@@ -0,0 +1,30 @@
+// RUN: %clang_cc1 -std=c++11 -triple i686-linux-gnu -S -emit-llvm %s -o - | FileCheck %s
+
+// Taken from the test pr29160.cpp
+template 
+struct Foo {
+  template 
+  static void ignore() {}
+  Foo() { ignore(); }
+  struct Inner {};
+};
+
+struct Base {
+  Base();
+  ~Base();
+};
+
+#define STAMP(thiz, prev) using thiz = Foo< \
+  prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, \
+  prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, \
+  prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev \
+  >;
+STAMP(A, Base);
+STAMP(B, A);
+STAMP(C, B);
+
+int main() {
+  C::Inner val_1;
+}
+
+// CHECK: %"struct.Foo TypeName;
-  llvm::raw_svector_ostream OS(TypeName);
+  PrintingPolicy Policy = RD->getASTContext().getPrintingPolicy();
+  Policy.NotForCompilation = true;
+  llvm::raw_abbrev_ostream OS;
+  OS.setHash().setTrunc().setBeginMarker("...");
+
+  // Set truncation limit. Long value helps in debugging but can result in
+  // higher memory consumption.
+  OS.setLimit(400);
+
   OS << RD->getKindName() << '.';
-  
+
   // Name the codegen type after the typedef name
   // if there is no tag type name available
   if (RD->getIdentifier()) {
+OS.startAbbrev();
 // FIXME: We should not have to check for a null decl context here.
 // Right now we do it because the implicit Obj-C decls don't have one.
 if (RD->getDeclContext())
-  RD->printQualifiedName(OS);
+  RD->printQualifiedName(OS, Policy);
 else
   RD->printName(OS);
   } else if (const TypedefNameDecl *TDD = RD->getTypedefNameForAnonDecl()) {
+OS.startAbbrev();
 // FIXME: We should not have to check for a null decl context here.
 // Right now we do it because the implicit Obj-C decls don't have one.
 if (TDD->getDeclContext())
-  TDD->printQualifiedName(OS);
+  TDD->printQualifiedName(OS, Policy);
 else
   TDD->printName(OS);
   } else
 OS << "anon";
+  OS.stopAbbrev();
 
   if (!suffix.empty())
 OS << suffix;
Index: lib/AST/TypePrinter.cpp
===
--- lib/AST/TypePrinter.cpp
+++ lib/AST/TypePrinter.cpp
@@ -1540,9 +1540,13 @@
   bool NeedSpace = false;
   bool FirstArg = true;
   for (const auto  : Args) {
-// Print the argument into a string.
-SmallString<128> Buf;
-llvm::raw_svector_ostream ArgOS(Buf);
+SmallString<0> Buffer;
+std::unique_ptr BufOS;
+if (!Policy.NotForCompilation) {
+  // Print the argument into a string.
+  BufOS.reset(new llvm::raw_svector_ostream(Buffer));
+}
+llvm::raw_ostream  = Policy.NotForCompilation ? OS : *BufOS;
 const TemplateArgument  = getArgument(Arg);
 if (Argument.getKind() == TemplateArgument::Pack) {
   if (Argument.pack_size() && !FirstArg)
@@ -1553,17 +1557,19 @@
 OS << Comma;
   Argument.print(Policy, ArgOS);
 }
-StringRef ArgString = ArgOS.str();
+if (!Policy.NotForCompilation) {
+  StringRef ArgString = BufOS->str();
 
-// If this is the first argument and its string representation
-// begins with the global scope specifier ('::foo'), add a space
-// to avoid printing the diagraph '<:'.
-if (FirstArg && !ArgString.empty() && ArgString[0] == ':')
-  OS << ' ';
+  // If this is the first argument and its string representation
+  // begins with the global scope specifier ('::foo'), add a space
+  // to avoid printing the diagraph '<:'.
+  if (FirstArg && !ArgString.empty() && ArgString[0] == ':')
+OS << ' ';
 
-OS << ArgString;
+  OS << ArgString;
 
-NeedSpace = (!ArgString.empty() && ArgString.back() == '>');
+  NeedSpace = (!ArgString.empty() && ArgString.back() == '>');
+}
 FirstArg = false;
 

[PATCH] D40508: Replace long type names in IR with hashes

2017-11-27 Thread John McCall via Phabricator via cfe-commits
rjmccall added a comment.

In Swift's IRGen, we have an option controlling whether to emit meaningful 
value names.  That option can be set directly from the command line, but it 
defaults to whether we're emitting IR assembly (i.e. .ll, not .bc), which means 
that the clients most interested in getting meaningful value names — humans, 
presumably — always get good names, but nobody else pays for them.  I have 
many, many times wished that we'd taken that same approach in Clang instead of 
basing it on build configuration.

If type names are a significant burden on compile times, should we just start 
taking that same approach?  Just don't use real type names in .bc output unless 
we're asked to.  Maybe we can eventually retroactively use the same option for 
value names.

I agree with Reid that it's really weird for there to be a raw_ostream that 
automatically rewrites the string contents to be a hash when some limit is 
reached; that does not feel like generalizable technology.




Comment at: include/clang/AST/PrettyPrinter.h:231
+  /// make things like breaking digraphs and '>>' in template parameters.
+  bool NotForCompilation : 1;
 };

This saves, what, a few spaces from a few thousand IR type names?  I'm 
skeptical that this even offsets the code-size impact of adding this option.



Comment at: include/clang/AST/Type.h:4623
+   const TemplateArgumentListInfo ,
+   const PrintingPolicy );
+

I like this refactor, but since it's the majority of your patch, please split 
it out (it can use post-commit review) and make this patch just about your 
actual change.



Comment at: lib/AST/TypePrinter.cpp:1543
+  for (const auto  : Args) {
+std::unique_ptr> Buffer;
+std::unique_ptr BufOS;

rnk wrote:
> Just use `SmallString<0>` for Buffer. No wasted stack space, no extra 
> unique_ptr. It will allocate memory if it needs it.
Well, it might as well have some stack storage, but otherwise I agree.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-27 Thread Reid Kleckner via Phabricator via cfe-commits
rnk added a comment.

It's not clear to me that this abbreviation functionality should live in 
Support. Clang probably wants enough control over this (assuming we're doing 
it) that the logic should live in clang.

I also think we might want to try solving this a completely different way: just 
don't bother emitting more than two template arguments for IR type names. We 
don't really need to worry about type name uniqueness or matching them across 
TUs.




Comment at: lib/AST/TypePrinter.cpp:1532-1534
+namespace {
+template
+void printTo(raw_ostream , ArrayRef Args, const PrintingPolicy ,

`static` is nicer than a short anonymous namespace.



Comment at: lib/AST/TypePrinter.cpp:1543
+  for (const auto  : Args) {
+std::unique_ptr> Buffer;
+std::unique_ptr BufOS;

Just use `SmallString<0>` for Buffer. No wasted stack space, no extra 
unique_ptr. It will allocate memory if it needs it.



Comment at: lib/AST/TypePrinter.cpp:1588-1589
 
-// Sadly, repeat all that with TemplateArgLoc.
-void TemplateSpecializationType::
-PrintTemplateArgumentList(raw_ostream ,
-  ArrayRef Args,
-  const PrintingPolicy ) {
-  OS << '<';
-  const char *Comma = Policy.MSVCFormatting ? "," : ", ";
-
-  bool needSpace = false;
-  bool FirstArg = true;
-  for (const TemplateArgumentLoc  : Args) {
-if (!FirstArg)
-  OS << Comma;
-
-// Print the argument into a string.
-SmallString<128> Buf;
-llvm::raw_svector_ostream ArgOS(Buf);
-if (Arg.getArgument().getKind() == TemplateArgument::Pack) {
-  PrintTemplateArgumentList(ArgOS,
-Arg.getArgument().getPackAsArray(),
-Policy, true);
-} else {
-  Arg.getArgument().print(Policy, ArgOS);
-}
-StringRef ArgString = ArgOS.str();
-
-// If this is the first argument and its string representation
-// begins with the global scope specifier ('::foo'), add a space
-// to avoid printing the diagraph '<:'.
-if (FirstArg && !ArgString.empty() && ArgString[0] == ':')
-  OS << ' ';
-
-OS << ArgString;
-
-needSpace = (!ArgString.empty() && ArgString.back() == '>');
-FirstArg = false;
-  }
+namespace clang {
+void printTemplateArgumentList(raw_ostream ,
+   const TemplateArgumentListInfo ,

It's usually nicer to define free functions in namespaces as `void 
clang::printTemplateArgumentList(...`. This ensures that nobody can mess up the 
namespace scope or forget to include the header that declares the function. It 
also sometimes turns link errors into compile errors.


https://reviews.llvm.org/D40508



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D40508: Replace long type names in IR with hashes

2017-11-27 Thread Serge Pavlov via Phabricator via cfe-commits
sepavloff created this revision.
Herald added a subscriber: JDevlieghere.

If a source file extensively uses templates, resulting LLVM IR may have
types with huge names. It may occur if a record type is defined in a class.
In this case its type name contains all declaration contexts and, if a
declaration context is a class specialization, it is specified with
template parameters.

This change implements transformation of long IR type names. If name length
exceeds some limit, it is truncated and SHA1 hash of its full name is
appended to the obtained abbreviated name. Such solution could reduce memory
footprint and still keep names usable for identification.

To implement this algorithm functions PrintTemplateArgumentList were changed.
They try to make their output a valid C++ code. For this purpose they ensure
that digraph '<:' and tokens '>>' are not formed by inserting space between
characters. The implementation prints template arguments into a separate
stream and then put its content into 'uplevel' stream adding space before
and/or after the text if necessary. Such implementation prevents from using
special stream implementations because the intermediate stream is always of
the same type. To cope with this problem, a new flag in PrintingPolicy is
introduced, which turns off checks for undesirable character sequences. In
this case the intermediate stream becomes unneeded and printing occurs into
the same stream.


https://reviews.llvm.org/D40508

Files:
  include/clang/AST/PrettyPrinter.h
  include/clang/AST/Type.h
  lib/AST/ASTContext.cpp
  lib/AST/Decl.cpp
  lib/AST/DeclTemplate.cpp
  lib/AST/NestedNameSpecifier.cpp
  lib/AST/StmtPrinter.cpp
  lib/AST/TypePrinter.cpp
  lib/CodeGen/CGDebugInfo.cpp
  lib/CodeGen/CodeGenTypes.cpp
  lib/Sema/SemaTemplateInstantiate.cpp
  test/CodeGenCXX/template-types.cpp
  tools/libclang/CIndex.cpp

Index: tools/libclang/CIndex.cpp
===
--- tools/libclang/CIndex.cpp
+++ tools/libclang/CIndex.cpp
@@ -4718,12 +4718,12 @@
 // If the type was explicitly written, use that.
 if (TypeSourceInfo *TSInfo = ClassSpec->getTypeAsWritten())
   return cxstring::createDup(TSInfo->getType().getAsString(Policy));
-
+
 SmallString<128> Str;
 llvm::raw_svector_ostream OS(Str);
 OS << *ClassSpec;
-TemplateSpecializationType::PrintTemplateArgumentList(
-OS, ClassSpec->getTemplateArgs().asArray(), Policy);
+printTemplateArgumentList(OS, ClassSpec->getTemplateArgs().asArray(),
+  Policy);
 return cxstring::createDup(OS.str());
   }
   
Index: test/CodeGenCXX/template-types.cpp
===
--- /dev/null
+++ test/CodeGenCXX/template-types.cpp
@@ -0,0 +1,30 @@
+// RUN: %clang_cc1 -std=c++11 -triple i686-linux-gnu -S -emit-llvm %s -o - | FileCheck %s
+
+// Taken from the test pr29160.cpp
+template 
+struct Foo {
+  template 
+  static void ignore() {}
+  Foo() { ignore(); }
+  struct Inner {};
+};
+
+struct Base {
+  Base();
+  ~Base();
+};
+
+#define STAMP(thiz, prev) using thiz = Foo< \
+  prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, \
+  prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, \
+  prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev, prev \
+  >;
+STAMP(A, Base);
+STAMP(B, A);
+STAMP(C, B);
+
+int main() {
+  C::Inner val_1;
+}
+
+// CHECK: %"struct.Foo TemplateArgsStr;
   llvm::raw_svector_ostream OS(TemplateArgsStr);
   Template->printName(OS);
-  TemplateSpecializationType::PrintTemplateArgumentList(
-  OS, Active->template_arguments(), getPrintingPolicy());
+  printTemplateArgumentList(OS, Active->template_arguments(),
+getPrintingPolicy());
   Diags.Report(Active->PointOfInstantiation,
diag::note_default_arg_instantiation_here)
 << OS.str()
@@ -562,8 +562,8 @@
   SmallVector TemplateArgsStr;
   llvm::raw_svector_ostream OS(TemplateArgsStr);
   FD->printName(OS);
-  TemplateSpecializationType::PrintTemplateArgumentList(
-  OS, Active->template_arguments(), getPrintingPolicy());
+  printTemplateArgumentList(OS, Active->template_arguments(),
+getPrintingPolicy());
   Diags.Report(Active->PointOfInstantiation,
diag::note_default_function_arg_instantiation_here)
 << OS.str()
Index: lib/CodeGen/CodeGenTypes.cpp
===
--- lib/CodeGen/CodeGenTypes.cpp