Ping - any thoughts here?
On Sun, Jul 24, 2022 at 9:08 PM David Blaikie <dblai...@gmail.com> wrote:
>
> Ping on this thread - would love to hear what ideas folks have for
> addressing the naming of anonymous types (enums, structs/classes, and
> lambdas) - especially if it'd make it easier to go back/forth between
> the DW_AT_name of a template with an unnamed type as a parameter and
> the actual DIEs describing the same parameter type.
>
> On Tue, Jun 14, 2022 at 1:02 PM David Blaikie <dblai...@gmail.com> wrote:
> >
> > Looks like https://reviews.llvm.org/D122766 (-ffile-reproducible) might
> > solve my immediate issues in clang, but I think we should still consider
> > moving to a more canonical naming of lambdas that, necessarily, doesn't
> > include the file name (unfortunately). Probably has to include the lambda
> > numbering/something roughly equivalent to the mangled lambda name - it
> > could include type information (it'd be superfluous to a unique identifier,
> > but I don't think it would break consistently naming the same type across
> > CUs either).
> >
> > Anyone got ideas/preferences/thoughts on this?
> >
> > On Mon, Jan 24, 2022 at 5:51 PM David Blaikie <dblai...@gmail.com> wrote:
> >>
> >> On Mon, Jan 24, 2022 at 5:37 PM Adrian Prantl <apra...@apple.com> wrote:
> >>>
> >>> On Jan 23, 2022, at 2:53 PM, David Blaikie <dblai...@gmail.com> wrote:
> >>>
> >>> A rather common "quality of implementation" issue seems to be lambda
> >>> naming.
> >>>
> >>> I came across this due to non-canonicalization of lambda names in
> >>> template parameters depending on how a source file is named in Clang, and
> >>> GCC's names seem to be very ambiguous:
> >>>
> >>> $ cat tmp/lambda.h
> >>> template<typename T>
> >>> void f1(T) { }
> >>> static int i = (f1([]{}), 1);
> >>> static int j = (f1([]{}), 2);
> >>> void f1() {
> >>>   f1([]{});
> >>>   f1([]{});
> >>> }
> >>> $ cat tmp/lambda.cpp
> >>> #ifdef I_PATH
> >>> #include <tmp/lambda.h>
> >>> #else
> >>> #include "lambda.h"
> >>> #endif
> >>> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot lambda.o | grep "f1<"
> >>> DW_AT_name ("f1<(lambda at ./tmp/lambda.h:3:20)>")
> >>> DW_AT_name ("f1<(lambda at ./tmp/lambda.h:4:20)>")
> >>> DW_AT_name ("f1<(lambda at ./tmp/lambda.h:6:6)>")
> >>> DW_AT_name ("f1<(lambda at ./tmp/lambda.h:7:6)>")
> >>> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep "f1<"
> >>> DW_AT_name ("f1<(lambda at tmp/lambda.h:3:20)>")
> >>> DW_AT_name ("f1<(lambda at tmp/lambda.h:4:20)>")
> >>> DW_AT_name ("f1<(lambda at tmp/lambda.h:6:6)>")
> >>> DW_AT_name ("f1<(lambda at tmp/lambda.h:7:6)>")
> >>> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep "f1<"
> >>> DW_AT_name ("f1<f1()::<lambda()> >")
> >>> DW_AT_name ("f1<f1()::<lambda()> >")
> >>> DW_AT_name ("f1<<lambda()> >")
> >>> DW_AT_name ("f1<<lambda()> >")
> >>>
> >>> (I came across this in the context of my simplified template names work -
> >>> rebuilding names from the DW_TAG description of the template parameters -
> >>> and I'm not rebuilding names that have lambda parameters (I keep
> >>> encoding the full string instead).
> >>> The issue is if some other type depends on a type with a lambda
> >>> parameter - but then multiple uses of that inner type exist, from
> >>> different translation units (using type units) with different ways of
> >>> naming the same file - so then the expected name has one spelling, but
> >>> the actual spelling is different due to the "./")
> >>>
> >>> But all this said - it'd be good to figure out a reliable naming - the
> >>> naming we have here, while usable for humans (pointing to source files,
> >>> etc), doesn't reliably give unique names for each lambda/template
> >>> instantiation, which would make it difficult for a consumer to know if two
> >>> entities are the same (important for types - is some function parameter
> >>> the same type as another type?)
> >>>
> >>> While it's expected that cross-producer (eg: trying to be compatible with
> >>> GCC and Clang debug info) you have to do some fuzzy matching (eg:
> >>> "f1<int*>" or "f1<int *>" at the most basic - there are more complicated
> >>> cases) - this one's not possible with the data available.
> >>>
> >>> The source file/line/column is insufficient to uniquely identify a lambda
> >>> (multiple lambdas stamped out by a macro would all get the same
> >>> file/line/col) and valid code (albeit unlikely) that writes the same
> >>> definition in multiple places could make the same lambda have different
> >>> names.
> >>>
> >>> We should probably use something more like the way various ABI manglings
> >>> identify these entities.
> >>>
> >>> But we should probably also do this for other unnamed types that have
> >>> linkage (need to/would benefit from being matched up between two CUs),
> >>> even if they aren't lambdas.
> >>>
> >>> FWIW, the llvm-cxxfilt demanglings of clang's manglings for these
> >>> symbols are:
> >>>
> >>> void f1<$_0>($_0)
> >>> void f1<$_1>($_1)
> >>> void f1<f1()::$_2>(f1()::$_2)
> >>> void f1<f1()::$_3>(f1()::$_3)
> >>>
> >>> Should we use that instead?
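
To make the macro case above concrete, here is a minimal C++ sketch. The macro name TWO_LAMBDAS and the variable names are made up for illustration, and f1 only mirrors the template from the quoted example. It assumes, as the message above describes, that a location-based name is taken from the macro's use site. The two closure types are distinct even though such a name cannot tell them apart. The block compiles as a translation unit and the check happens at compile time.

```cpp
#include <type_traits>

// Hypothetical macro, for illustration only: a single use stamps out two
// lambdas. Both expansions come from the same use site, so a name built from
// that site's file and line is the same for both of them.
#define TWO_LAMBDAS(a, b)            \
    const auto a = [] { return 1; }; \
    const auto b = [] { return 2; }

TWO_LAMBDAS(first_lambda, second_lambda);

// Every lambda expression has its own closure type, so these are two
// different types despite sharing one source location.
static_assert(!std::is_same<decltype(first_lambda),
                            decltype(second_lambda)>::value,
              "two lambdas from one macro use are still distinct types");

// A template like f1 in the thread gets one specialization per closure type,
// so a consumer needs some way to tell the two specializations apart.
template <typename T>
int f1(T callable) { return callable(); }

// Instantiates f1 twice, once for each closure type.
inline int call_both() { return f1(first_lambda) + f1(second_lambda); }
```

A name derived from something like the mangling's lambda numbering would keep these two specializations apart, where a file/line/col string would not.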
> >>>
> >>> The only other information that the current human-readable DWARF name
> >>> carries is the file+line and that is fully redundant with
> >>> DW_AT_decl_file/line, so the above scheme seems reasonable to me. Poorly
> >>> symbolicated backtraces would be worse in this scheme, so I'm expecting
> >>> most pushback from users who rely on a tool that just prints the human
> >>> readable name with no source info.
> >>
> >> Yeah - you can always pull the file/line/col from the DW_AT_decl_* anyway,
> >> so encoding it in the type name does seem redundant and inefficient indeed
> >> (beyond/independent of the correctness issues).
> >>>
> >>> GCC's mangling's different (in these examples that's OK, since they're
> >>> all internal linkage):
> >>>
> >>> void f1<f1()::'lambda0'()>(f1()::'lambda0'())
> >>> void f1<f1()::'lambda'()>(f1()::'lambda'())
> >>>
> >>> If I add an example like this:
> >>>
> >>> inline auto f2() { return []{}; }
> >>>
> >>> and instantiate the template with the result of f2:
> >>>
> >>> void f1<f2()::'lambda'()>(f2()::'lambda'())
> >>>
> >>> GCC:
> >>>
> >>> void f1<f2()::'lambda'()>(f2()::'lambda'())
> >>>
> >>> So they consistently use the same mangling - we could use the same naming
> >>> for template parameters?
> >>>
> >>> How should we communicate this sort of identity for unnamed types in the
> >>> DIEs describing the types themselves (not just the string of a template
> >>> name of a type instantiated with the unnamed type) so the unnamed type
> >>> can be matched up between translation units?
> >>>
> >>> eg, if I have these two translation units:
> >>> // header
> >>> inline auto f1() { struct { } local; return local; }
> >>> // unit 1:
> >>> #include "header"
> >>> auto f2(decltype(f1())) { }
> >>> // unit 2:
> >>> #include "header"
> >>> decltype(f1()) v1;
> >>>
> >>> Currently the DWARF produced for this unnamed type is:
> >>> 0x0000003f: DW_TAG_structure_type
> >>>               DW_AT_calling_convention (DW_CC_pass_by_value)
> >>>               DW_AT_byte_size (0x01)
> >>>               DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
> >>>               DW_AT_decl_line (1)
> >>>
> >>> is this the type of struct {}?
> >>
> >> Yep. You'll get separate distinct descriptions that are essentially the
> >> same - imagine if `f1` had two such types written as "struct {}" (say they
> >> were used to instantiate two different templates - "struct {} a; struct {}
> >> b; f_templ(a); f_templ(b);") - the DWARF will have two of those unnamed
> >> DW_TAG_structure_types and two template specializations, etc - but no way
> >> to know which of those unnamed types line up with uses in another
> >> translation unit, in terms of overload resolution, etc.
> >>>
> >>> So there's no way to know if you see that structure type definition in
> >>> two different translation units whether they refer to the same type
> >>> because there may be multiple types that have the same DWARF description.
> >>> (so no way to know if the DWARF consumer should allow the user to
> >>> evaluate an expression `f2(v1)` or not, I think?)
> >>>
> >>> Does a C++ compiler usually treat structurally equivalent but differently
> >>> named types as interchangeable?
> >>
> >> No - given "struct A { int i; }; struct B { int i; }; void f1(A); ... " -
> >> "f1(A())" is valid, but "f1(B())" is invalid and an error at compile-time.
> >> https://godbolt.org/z/de7Yce1qW
> >>
> >>>
> >>> Does a C++ compiler usually treat structurally equivalent anonymous types
> >>> as interchangeable?
> >>
> >> No, same rules apply as named types: https://godbolt.org/z/hxWMYbWc8
> >>
> >>>
> >>> -- adrian
> >>>
> >>> I guess the only way to have an unnamed type with linkage is to use it
> >>> inside an inline function - so within that scope you'd have to produce
> >>> DWARF for any types consistently in all definitions of the function and
> >>> then a consumer could match them up by counting (assuming the unnamed
> >>> types were always emitted in the same order in the child DIE list)...
> >>>
> >>> But this all seems a bit subtle & maybe would benefit from a more
> >>> robust/explicit description?
> >>>
> >>> Perhaps adding an integer attribute to number anonymous types? They'd
> >>> need to differentiate between lambdas and other anonymous types, since
> >>> they have separate numberings.
> >>>
_______________________________________________
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
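
The "match them up by counting" idea from the quoted message can be sketched roughly as follows. This is only an illustration, not an existing DWARF attribute or tool API: UnnamedKind, ordinal_within_kind and example_scope are hypothetical names, and the sketch assumes the unnamed types of one inline function are listed in the same order in every translation unit. Lambdas and other unnamed types are counted separately, as the message suggests.

```cpp
#include <cstddef>

// Hypothetical classification of the unnamed types found in one scope.
// Lambdas and other unnamed types are numbered separately.
enum class UnnamedKind { Lambda, Other };

// Returns the zero-based ordinal of entry `index` among the entries of the
// same kind that precede it. Assumes `kinds` lists the scope's unnamed types
// in emission order and that `index` is a valid position in it.
constexpr unsigned ordinal_within_kind(const UnnamedKind* kinds,
                                       std::size_t index) {
    unsigned ordinal = 0;
    for (std::size_t i = 0; i < index; ++i) {
        if (kinds[i] == kinds[index]) {
            ++ordinal;
        }
    }
    return ordinal;
}

// Example scope: struct {}, a lambda, then another struct {}.
constexpr UnnamedKind example_scope[] = {
    UnnamedKind::Other, UnnamedKind::Lambda, UnnamedKind::Other};

static_assert(ordinal_within_kind(example_scope, 0) == 0, "first struct");
static_assert(ordinal_within_kind(example_scope, 1) == 0, "first lambda");
static_assert(ordinal_within_kind(example_scope, 2) == 1, "second struct");
```

Under these assumptions a consumer could treat (enclosing function, kind, ordinal) as the identity of an unnamed type across CUs. Whether that number belongs in a new attribute or in the name is the open question in this thread.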