[PATCH] D43667: [clang-doc] Implement a YAML generator

2018-04-10 Thread Athos via Phabricator via cfe-commits
Athosvk added a comment.

In https://reviews.llvm.org/D43667#1063049, @juliehockett wrote:

> In https://reviews.llvm.org/D43667#1062746, @Athosvk wrote:
>
> > I'm a bit late on this, but I'd say that YAML is usually not a 'final' 
> > format. What would be the use-cases for this? And if is meant as an 
> > alternative intermediate format, why not instead of having one big file, 
> > have one file per input file? Just wondering what the particular motivation 
> > for that could be
>
>
> The idea is that it's a generally-consumable output format, and so could be 
> interpreted by external software fairly trivially. The bitsream format is 
> compact and good for things that will live entirely in the clang-doc tool, 
> but is harder to deal with outside that scope. YAML bridges that gap.


That's what I expected :).

The primary advantage of one file per output to us was granularity. That allows 
you to do incremental builds and distribute them at this stage (locally over 
multiple cores, but also remotely if need be).


https://reviews.llvm.org/D43667



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43667: [clang-doc] Implement a YAML generator

2018-04-10 Thread Athos via Phabricator via cfe-commits
Athosvk added a comment.

I'm a bit late on this, but I'd say that YAML is usually not a 'final' format. 
What would be the use-cases for this? And if is meant as an alternative 
intermediate format, why not instead of having one big file, have one file per 
input file? Just wondering what the particular motivation for that could be

(Don't mind the previous and inline comment, it was old and I never got to 
submit it. Differential doesn't let me remove it unfortunately)


https://reviews.llvm.org/D43667



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43667: [clang-doc] Implement a YAML generator

2018-04-10 Thread Athos via Phabricator via cfe-commits
Athosvk added a comment.

I'm a bit late on this, but I'd say that YAML is usually not a 'final' format. 
What would be the use-cases for this? And if is meant as an alternative 
intermediate format, why not instead of having one big file, have one file per 
input file? Just wondering what the particular motivation for that could be




Comment at: clang-doc/generators/YAMLGenerator.cpp:159
+if (!C->Position.empty()) IO.mapRequired("Position", C->Position);
+if (!C->Children.empty()) IO.mapRequired("Children", C->Children);
+  }

You should easily be able to unify these 'empty' checks


https://reviews.llvm.org/D43667



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41102: Setup clang-doc frontend framework

2018-03-07 Thread Athos via Phabricator via cfe-commits
Athosvk added a comment.

In https://reviews.llvm.org/D41102#1028760, @juliehockett wrote:

> If you take a look at the follow-on patch to this (D43341 
> ), you'll see that that is where the pointer 
> is added in (since it is irrelevant to the mapper portion, as it cannot be 
> filled out until the information has been reduced). The back references to 
> children and whatnot are also added there.


Oops! I'll have a look!

In https://reviews.llvm.org/D41102#1028760, @juliehockett wrote:

> The USRs are kept for serialization purposes -- given the modular nature of 
> the design, the goal is to be able to write out the bitstream and have it be 
> consumable with all necessary information. Since we can't write out pointers 
> (and it would be useless if we did, since they would change as soon as the 
> file was read in), we maintain the USRs to have a means of re-finding the 
> referenced declaration.


What I was referring to was the storing of a USR per reference. Of course, 
serializing pointers wouldn't work, but what I mean is that what we used as a 
USR was stored in what was pointed to, not in the reference that tells what we 
are pointing to. To be a little more concise, a RecordInfo has pointers to the 
FuntionInfo for its member functions. Upon serialization, the RecordInfo 
queries the USR of those functions. A function being referenced multiple times 
remains to only have the USR stored. If I understand correctly, you currently 
save the USR for time an InfoType references another InfoType.

Anyhow, don't pay too much attention to that comment, it's all meant as a minor 
thing. It sure is looking good so far!


https://reviews.llvm.org/D41102



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41102: Setup clang-doc frontend framework

2018-03-06 Thread Athos via Phabricator via cfe-commits
Athosvk added a comment.

My apologies for getting back on this so late!

In https://reviews.llvm.org/D41102#1017683, @juliehockett wrote:

> So, as an idea (as this diff implements), I updated the string references to 
> be a struct, which holds the USR of the referenced type (for serialization, 
> both here in the mapper and for the dump option in the reducer, as well as a 
> pointer to an `Info` struct. This pointer is not used at this point, but 
> would be populated by the reducer. Thoughts?


This seems like quite a decent approach! That being said, I don't see the 
pointer yet? I assume you mean that you will be adding this? Additionally, a 
slight disadvantage of doing this generic approach is that you need to do 
bookkeeping on what it is referencing, but I guess there's no helping that due 
to the architecture which makes you rely upon the USR? Personally I'd prefer 
having the explicit types if and where possible. So for now a RecordInfo has a 
vecotr of Reference's to its parents, but we know the parents can only be of 
certain kinds (more than just a RecordType, but you get the point); it won't be 
an enum, namespace or function.

As I mentioned, we did this the other way around, which also has the slight 
advantage that I only had to create and save the USR once per info instance (as 
in, 10 references to a class only add the overhead of 10 pointers, rather than 
each having the USR as well), but our disadvantage was of course that we had 
delayed serialization (although we could arguably do both simultaneously). It 
seems each method has its merits :).


https://reviews.llvm.org/D41102



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41102: Setup clang-doc frontend framework

2018-02-23 Thread Athos via Phabricator via cfe-commits
Athosvk added a comment.

The change to USR seems like quite an improvement already! That being said, I 
do think that it might be preferable to opt out of the use of strings for 
linking things together. What we did with our clang-doc is that we directly 
used pointers to refer to other types. So for example, our class for storing 
Record/CXX related information has something like:

  std::vectormMethods;
  std::vectormVariables;
  std::vectormEnums;
  std::vector mTypedefs;

Only upon serialization we fetch some kind of USR that would uniquely identify 
the type. This is especially useful to us for the conversion to HTML and I 
think the same would go for this backend, as it seems this way you'll have to 
do string lookups to get to the actual types, which would be inefficient in 
multiple aspects. It can make the backend a little more of a one-on-one 
conversion, e.g. with one of our HTML template definitions (note: this is a 
Jinja2 template in Python):

  {%- for enum in inEntry.GetMemberEnums() -%}




{{- 
Modifiers.RenderAccessModifier(enum.GetAccessModifier()) -}}
enum {{- enum.GetName().GetName()|e -}}
{{- 
Descriptions.RenderDescription(enum.GetBriefDescription()) -}}

  {%- endfor -%}

Disadvantage is of course that you add complexity to certain parts of the 
deserialization (/serialization) for nested types and inheritance, by either 
having to do so in the correct order or having to defer the process of 
initializing these pointers. But see this as just as some thought sharing. I do 
think this would improve the interaction in the backend (assuming you use the 
same representation as currently in the frontend). Also, we didn't apply this 
to our Type representation (which we use to store the type of a member, 
parameter etc.), which stores the name of the type rather than a pointer to it 
(since it can also be a built-in), though it embeds pretty much every possible 
modifier on said type, like this:

  EntryName mName;  

  bool  mIsConst = false;   

  EReferenceTypemReferenceType = EReferenceType::None;  
  std::vector mPointerConstnessMask;  

  std::vector  mArraySizes;

  bool  mIsAtomic = false;  

  std::vectormAttributes;

  bool  mIsExpansion = false;   

  std::vector mTemplateArguments; 

  std::unique_ptr mFunctionTypeProperties = 
nullptr;
  EntryName mParentCXXEntry;

The last member refers to the case where a pointer is a pointer to member, 
though some other fields may require some explaining too. Anyway, this is just 
to give some insight into how we structured our representation, where we 
largely omitted string representations where possible.

Have you actually started work already on some backend? Developing backend and 
frontend in tandem can provide some additional insights as to how things should 
be structured, especially representation-wise!




Comment at: clang-doc/Representation.h:113
+  TagTypeKind TagType;
+  llvm::SmallVector Members;
+  llvm::SmallVector ParentUSRs;

How come these are actually unique ptrs? They can be stored directly in the 
vector, right? (same for CommentInfo children, FnctionInfo params etc.)


https://reviews.llvm.org/D41102



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41102: Setup clang-doc frontend framework

2018-02-20 Thread Athos via Phabricator via cfe-commits
Athosvk added a comment.

The changes seem good (both mapper and additional changes)! I've added some 
comments, but those are primarily details.

What I'm however primarily interested in is the following:

1. You seemingly output only little information for declarations that are not 
definition. Is that temporary? Will you //need// to have a definition in order 
for a function be documented?
2. I've mentioned it before as a comment, but to what extent will you be 
parsing information in this frontend? Currently the links between types are 
primarily stored as strings. Are you planning to have the backend that 
generates the MarkDown parse those strings and link them to types? E.g. the 
parenttype is a std::vector and assuming you want the markdown to 
have a link to this parent type, will the MarkDown have to parse this type and 
lookup the link to the original type? Or will you embed references to other 
types within the intermediate format?

I'm curious to hear your thoughts on this!




Comment at: clang-doc/ClangDocRepresentation.h:39
+  llvm::SmallVector Position;
+  std::vector Children;
+};

I might be missing something, but can't this be a unique ptr? Shouldn't 
children of comments only have one parent?



Comment at: clang-doc/ClangDocRepresentation.h:46
+struct NamedType {
+  enum FieldName { PARAM = 1, MEMBER, RETTYPE };
+  FieldName Field;

Perhaps use an enum class instead? Same goes for the other enums



Comment at: clang-doc/ClangDocRepresentation.h:63
+  std::string SimpleName;
+  std::string Namespace;
+  llvm::SmallVector Description;

It's not too important for now , but you probably want to at least store the 
namespace identifier for each nested namespace at some point. So instead you 
store a vector of namespaces, which in the final markdown generation stage 
allows you to link to each namespace individually (assuming you'll have some 
kind of namespace overview pages)



Comment at: tools/clang-doc/ClangDocReporter.h:42
   std::string Name;
   AccessSpecifier Access;
 };

You might want to separate this out to a FieldType/MemberType or something 
alike, as only class members will have this set, while you also use this for 
parameters/return types etc. I know there's AS_NONE but it seems a little 
wasteful considering the amount of instances that will not have this set


https://reviews.llvm.org/D41102



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41102: Setup clang-doc frontend framework

2018-02-06 Thread Athos via Phabricator via cfe-commits
Athosvk added inline comments.



Comment at: tools/clang-doc/ClangDocReporter.h:39
 // Info for named types (parameters, members).
 struct NamedType {
   std::string Type;

Storing the type information seems more suitable than storing just the name and 
type as a string. In my view, the frontend creates a format suitable for 
(almost) any backend to use without further parsing. This would for example 
require me to parse part of the name to get the namespace.



Comment at: tools/clang-doc/ClangDocReporter.h:75
 struct Info {
   bool isDefined = false;
   std::string FullyQualifiedName;

NItpick but should probably be 'IsDefined'



Comment at: tools/clang-doc/ClangDocReporter.h:87
   std::string MangledName;
   std::string DefinitionFile;
   std::string ReturnType;

Seems common to almost all Info structs, so you can probably move it to 
the/some base.

Namespaces do seem unrelated, so maybe you can make another struct inbetween? 
E.g. something like struct SymbolInfo : Info which contains a field for 
DefinitionFile and Locations (since that may not be used for namespaces either).

Additionally, what will you do when you merge this output information from 
multiple compilation untis back together? Only one should have the 
DefinitionFile set as the other compilation units won't see the definition. 
What happens if a function stays undefined? Can you generate documentation for 
it?



Comment at: tools/clang-doc/ClangDocReporter.h:88
   std::string DefinitionFile;
   std::string ReturnType;
   llvm::SmallVector Params;

This should probably be a NamedType like the parameters



Comment at: tools/clang-doc/ClangDocReporter.h:89
   std::string ReturnType;
   llvm::SmallVector Params;
   AccessSpecifier Access;

Perhaps you could already attach the parameter comments to this? So something 
like a 

```
struct ParamInfo
{
NamedType Type;
std::string/CommentInfo Description/CommentInfo;
}
```
Or are you planning to keep this to a later stage? At this point it seems like 
the backend will have to parse a CommentInfo struct to attach comments to 
parameters etc. manually



Comment at: tools/clang-doc/ClangDocReporter.h:93
 
 struct NamespaceInfo : public Info {
+  llvm::StringMap Functions;

Currently info stores the locations of occurrences, but this seems like a hard 
thing to do when it comes namespaces. Is it useful to have this particular 
information?


https://reviews.llvm.org/D41102



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41102: Setup clang-doc frontend framework

2018-02-06 Thread Athos via Phabricator via cfe-commits
Athosvk added a comment.

As we had discussed before, we're interested in the development as well! As an 
overall comment, I speak from experience that maintaining a large degree of 
documentation throughout the source code of the tool can provide an excellent 
test-case.

We sure hope this will move forward well, but so far it seems to be heading in 
the right overall direction. I'll try to place some of my thoughts on the 
overall structure as well!


https://reviews.llvm.org/D41102



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits