Re: Type representation in CTF and DWARF
On Fri, Oct 25, 2019 at 1:52 AM Indu Bhagat wrote:
>
> On 10/11/2019 04:41 AM, Jakub Jelinek wrote:
> > On Fri, Oct 11, 2019 at 01:23:12PM +0200, Richard Biener wrote:
> >>> (coreutils-0.22)
> >>> binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
> >>> ls     |            30616 |               1136 |           21098 |               26240 | 0.62
> >>> pwd    |            10734 |                788 |           10433 |               13929 | 0.83
> >>> groups |            10706 |                811 |           10249 |               13378 | 0.80
> >>>
> >>> (emacs-26.3)
> >>> binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
> >>> emacs-26.3.1 |           674657 |               6402 |          273963 |              273910 | 0.33
> >>>
> >>> I chose to account for 50% of .debug_str because at this point, it would be
> >>> unfair not to account for them. Actually, one could even argue that up to 70%
> >>> of the .debug_str are names of entities. CTF section sizes do include the CTF
> >>> string tables.
> >>>
> >>> Across coreutils, I see a geomean of 0.73 (ratio of
> >>> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
> >>> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
> >>> footprint than CTF (with 50% of .debug_str accounted for).
> >> I'm not convinced this "improvement" in size is worth maintaining another
> >> debug-info format, much less since it lacks desirable features right now
> >> and thus evaluation is tricky.
> >>
> >> At least you can improve dwarf size considerably with a low amount of work.
> >>
> >> I suspect another factor where dwarf is bigger compared to CTF is that dwarf
> >> is recording typedef names as well as qualified type variants. But maybe
> >> CTF just has a more compact representation for the bits it actually
> >> implements.
> > Does CTF record automatic variables in functions, or just global variables?
> > If only the latter, it would be fair to also disable addition of local
> > variable DIEs, lexical blocks. Does CTF record inline functions? Again, if
> > not, it would be fair to not emit that either in .debug_info.
> > -gno-record-gcc-switches so that the compiler command line is not encoded in
> > the debug info (unless it is in CTF).
>
> CTF includes file-scope and global-scope entities. So, CTF for a function
> defined/declared at these scopes is available in .ctf section, even if it is
> inlined.
>
> To not generate DWARF for function-local entities, I made a tweak in the
> gen_decl_die API to have an early exit when TREE_CODE (DECL_CONTEXT (decl))
> is FUNCTION_DECL.
>
> @@ -26374,6 +26374,12 @@ gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
>    if (DECL_P (decl_or_origin) && DECL_IGNORED_P (decl_or_origin))
>      return NULL;
>
> +  /* Do not generate info for function local decl when -gdwarf-like-ctf is
> +     enabled.  */
> +  if (debug_dwarf_like_ctf && DECL_CONTEXT (decl)
> +      && (TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL))
> +    return NULL;
> +
>    switch (TREE_CODE (decl_or_origin))
>      {
>      case ERROR_MARK:

A better place is probably in gen_subprogram_die, returning early before

  /* Output Dwarf info for all of the stuff within the body of the function
     (if it has one - it may be just a declaration).

Note we also emit DIEs for [optionally also unused, if requested] function
declarations without actual definitions; I would guess CTF doesn't, since
there's no symbol table entry for those. Plus we by default prune types
that are not used. So

  struct S { int i; };
  extern void foo (struct S *);
  void bar() { struct S s; foo (&s); }

would have DIEs for S and foo in addition to that for bar. To me it seems
those are not relevant for function entry point inspection (eventually
both S and foo have CTF info in the defining unit). Correct?

Richard.

>
> For the numbers in the email today:
> 1. CFLAGS="-g -gdwarf-like-ctf -gno-record-gcc-switches -O2". dwz is used on
> generated binaries.
> 2. At this time, I wanted to account for .debug_str entities appropriately (not
> 50% as done previously). Using a small script to count chars for
> accounting the "path-like" strings, specifically those strings that start
> with a ".", I gathered the data in column named D5.
>
> (coreutils-0.22)
> binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | path strings (D5) | .ctf (uncompressed) | ratio .ctf/(D1+D2+D4-D5)
> ls     |            14100 |                994 |           16945 |              1328 |               26240 | 0.85
> pwd    |             6341 |                632 |            9311 |               596 |               13929 | 0.88
> groups |             6410 |                714 |            9218 |               667 |               13378 | 0.85
> Average geomean across coreutils = 0.84
>
> (ema
Re: Type representation in CTF and DWARF
On 10/11/2019 04:41 AM, Jakub Jelinek wrote:
> On Fri, Oct 11, 2019 at 01:23:12PM +0200, Richard Biener wrote:
>>> (coreutils-0.22)
>>> binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
>>> ls     |            30616 |               1136 |           21098 |               26240 | 0.62
>>> pwd    |            10734 |                788 |           10433 |               13929 | 0.83
>>> groups |            10706 |                811 |           10249 |               13378 | 0.80
>>>
>>> (emacs-26.3)
>>> binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
>>> emacs-26.3.1 |           674657 |               6402 |          273963 |              273910 | 0.33
>>>
>>> I chose to account for 50% of .debug_str because at this point, it would be
>>> unfair not to account for them. Actually, one could even argue that up to 70%
>>> of the .debug_str are names of entities. CTF section sizes do include the CTF
>>> string tables.
>>>
>>> Across coreutils, I see a geomean of 0.73 (ratio of
>>> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
>>> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
>>> footprint than CTF (with 50% of .debug_str accounted for).
>> I'm not convinced this "improvement" in size is worth maintaining another
>> debug-info format, much less since it lacks desirable features right now
>> and thus evaluation is tricky.
>>
>> At least you can improve dwarf size considerably with a low amount of work.
>>
>> I suspect another factor where dwarf is bigger compared to CTF is that dwarf
>> is recording typedef names as well as qualified type variants. But maybe
>> CTF just has a more compact representation for the bits it actually
>> implements.
> Does CTF record automatic variables in functions, or just global variables?
> If only the latter, it would be fair to also disable addition of local
> variable DIEs, lexical blocks. Does CTF record inline functions? Again, if
> not, it would be fair to not emit that either in .debug_info.
> -gno-record-gcc-switches so that the compiler command line is not encoded in
> the debug info (unless it is in CTF).

CTF includes file-scope and global-scope entities. So, CTF for a function
defined/declared at these scopes is available in .ctf section, even if it is
inlined.

To not generate DWARF for function-local entities, I made a tweak in the
gen_decl_die API to have an early exit when TREE_CODE (DECL_CONTEXT (decl))
is FUNCTION_DECL.

@@ -26374,6 +26374,12 @@ gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
   if (DECL_P (decl_or_origin) && DECL_IGNORED_P (decl_or_origin))
     return NULL;

+  /* Do not generate info for function local decl when -gdwarf-like-ctf is
+     enabled.  */
+  if (debug_dwarf_like_ctf && DECL_CONTEXT (decl)
+      && (TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL))
+    return NULL;
+
   switch (TREE_CODE (decl_or_origin))
     {
     case ERROR_MARK:

For the numbers in the email today:
1. CFLAGS="-g -gdwarf-like-ctf -gno-record-gcc-switches -O2". dwz is used on
generated binaries.
2. At this time, I wanted to account for .debug_str entities appropriately (not
50% as done previously). Using a small script to count chars for
accounting the "path-like" strings, specifically those strings that start
with a ".", I gathered the data in column named D5.

(coreutils-0.22)
binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | path strings (D5) | .ctf (uncompressed) | ratio .ctf/(D1+D2+D4-D5)
ls     |            14100 |                994 |           16945 |              1328 |               26240 | 0.85
pwd    |             6341 |                632 |            9311 |               596 |               13929 | 0.88
groups |             6410 |                714 |            9218 |               667 |               13378 | 0.85
Average geomean across coreutils = 0.84

(emacs-26.3)
binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | path strings (D5) | .ctf (uncompressed) | ratio .ctf/(D1+D2+D4-D5)
emacs-26.3.1 |           373678 |               3794 |          219048 |              3842 |              273910 | 0.46

> DWARF is a highly extensible format; what exactly is and is not emitted is
> something that consumers can choose. Yes, DWARF can be large, but mainly
> because it provides a lot of information; the actual representation has been
> designed with size concerns in mind and newer versions of the standard keep
> improving that too.
>
> Jakub

Yes. I started out to provide some numbers around the size impact of CTF vs
DWARF as it was a legitimate curiosity many of us have had. Comparing
compactness or feature matrices is only one dimension of evaluating the
utility of supporting CTF in the toolchain (including GCC; Binutils and GDB
have already accepted initial CTF support). The other dimension is a
user-friendly workflow which supports current users and eases further
adoption and growth.

Indu
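The D5 column counts the bytes of "path-like" strings (those beginning with ".") so they can be subtracted from .debug_str. The script itself is not shown in the thread, so the C below is only a sketch of that counting. It assumes the raw .debug_str contents are already in memory (reading them out of the ELF file is left out) and assumes each string's terminating NUL is counted, since it occupies section space. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sum the bytes, including each terminating NUL, of the strings in a
   .debug_str-style blob (a run of NUL-terminated strings) that start
   with '.', i.e. the "path-like" strings.  Hypothetical helper. */
static size_t path_like_bytes(const char *blob, size_t size)
{
    size_t total = 0;
    size_t off = 0;

    while (off < size) {
        const char *s = blob + off;
        /* Bounded search, in case the last string lacks its NUL. */
        const char *nul = memchr(s, '\0', size - off);
        size_t len = nul ? (size_t)(nul - s) + 1 : size - off;

        if (s[0] == '.')
            total += len;
        off += len;
    }
    return total;
}

/* The adjusted ratio from the table: .ctf / (D1 + D2 + D4 - D5). */
static double adjusted_ratio(double d1, double d2, double d4, double d5,
                             double ctf)
{
    double dwarf = d1 + d2 + d4 - d5;

    assert(dwarf > 0.0);
    return ctf / dwarf;
}
```

With the ls row, adjusted_ratio(14100, 994, 16945, 1328, 26240) is about 0.854, in line with the 0.85 in the table.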
Re: Type representation in CTF and DWARF
On 18 Oct 2019, Pedro Alves stated:
> On 10/18/19 2:21 PM, Richard Biener wrote:
>>>> In most cases local types etc are a fairly small contributor to the
>>>> total volume -- but macros can contribute a lot in some codebases.
>>>> (The Linux kernel's READ_ONCE macro is one I've personally been bitten by
>>>> in the past, with a new local struct in every use. GCC doesn't
>>>> deduplicate any of those so the resulting bloat from tens of thousands of
>>>> instances of this identical structure is quite incredible...)
>>>
>>> Sounds like something that would be beneficial to do with DWARF too.
>>
>> Otoh those are distinct types according to the C standard and since dwarf is
>> a source level representation we should preserve this (source locations also
>> differ).
>
> Right. Maybe some partial deduplication would be possible, preserving
> type distinction. But since CTF doesn't include these, this is moot
> for now.

Yeah, the libctf API and existing CTF users only care if they're
assignment-compatible, which they are. We could preserve more
type-identity information if there was a need to do so, but none has yet
emerged.

-- 
NULL && (void)
Re: Type representation in CTF and DWARF
On 10/18/19 2:21 PM, Richard Biener wrote:
>>> In most cases local types etc are a fairly small contributor to the
>>> total volume -- but macros can contribute a lot in some codebases.
>>> (The Linux kernel's READ_ONCE macro is one I've personally been bitten by
>>> in the past, with a new local struct in every use. GCC doesn't
>>> deduplicate any of those so the resulting bloat from tens of thousands of
>>> instances of this identical structure is quite incredible...)
>>>
>>
>> Sounds like something that would be beneficial to do with DWARF too.
>
> Otoh those are distinct types according to the C standard and since dwarf is
> a source level representation we should preserve this (source locations also
> differ).

Right. Maybe some partial deduplication would be possible, preserving
type distinction. But since CTF doesn't include these, this is moot
for now.

Thanks,
Pedro Alves
Re: Type representation in CTF and DWARF
On October 18, 2019 1:59:36 PM GMT+02:00, Pedro Alves wrote:
>On 10/17/19 7:59 PM, Nick Alcock wrote:
>> On 17 Oct 2019, Richard Biener verbalised:
>>
>>> On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock wrote:
>>>>
>>>> On 11 Oct 2019, Indu Bhagat stated:
>>>>> Compile with -g -gdwarf-like-ctf and use dwz -o (using
>>>>> dwz compiled from the master branch) on the generated binaries:
>>>>>
>>>>> (coreutils-0.22)
>>>>> binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
>>>>> ls     |            30616 |               1136 |           21098 |               26240 | 0.62
>>>>> pwd    |            10734 |                788 |           10433 |               13929 | 0.83
>>>>> groups |            10706 |                811 |           10249 |               13378 | 0.80
>>>>>
>>>>> (emacs-26.3)
>>>>> binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
>>>>> emacs-26.3.1 |           674657 |               6402 |          273963 |              273910 | 0.33
>>>
>>> Btw, for a fair comparison you have to remove all DW_TAG_subroutine
>>> children as well since CTF doesn't represent scopes or local variables
>>> at all (nor types only used by locals). It seems CTF only represents
>>> function entry points.
>>
>> Good point: I'll have to hack up a DWARF trimmer to do this comparison
>> properly, I think. (Though CTF does represent global variables,
>> including file-scope statics.)
>
>Wouldn't it be possible to extend the -gdwarf-like-ctf hack to skip
>emitting those things?

Sure.

>>
>> In most cases local types etc are a fairly small contributor to the
>> total volume -- but macros can contribute a lot in some codebases. (The
>> Linux kernel's READ_ONCE macro is one I've personally been bitten by in
>> the past, with a new local struct in every use. GCC doesn't deduplicate
>> any of those so the resulting bloat from tens of thousands of instances
>> of this identical structure is quite incredible...)
>>
>
>Sounds like something that would be beneficial to do with DWARF too.

Otoh those are distinct types according to the C standard and since dwarf is
a source level representation we should preserve this (source locations also
differ).

Richard.

>Thanks,
>Pedro Alves
Re: Type representation in CTF and DWARF
On 10/17/19 7:59 PM, Nick Alcock wrote:
> On 17 Oct 2019, Richard Biener verbalised:
>
>> On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock wrote:
>>>
>>> On 11 Oct 2019, Indu Bhagat stated:
>>>> Compile with -g -gdwarf-like-ctf and use dwz -o (using
>>>> dwz compiled from the master branch) on the generated binaries:
>>>>
>>>> (coreutils-0.22)
>>>> binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
>>>> ls     |            30616 |               1136 |           21098 |               26240 | 0.62
>>>> pwd    |            10734 |                788 |           10433 |               13929 | 0.83
>>>> groups |            10706 |                811 |           10249 |               13378 | 0.80
>>>>
>>>> (emacs-26.3)
>>>> binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
>>>> emacs-26.3.1 |           674657 |               6402 |          273963 |              273910 | 0.33
>>
>> Btw, for a fair comparison you have to remove all DW_TAG_subroutine
>> children as well since CTF doesn't represent scopes or local variables
>> at all (nor types only used by locals). It seems CTF only represents
>> function entry points.
>
> Good point: I'll have to hack up a DWARF trimmer to do this comparison
> properly, I think. (Though CTF does represent global variables,
> including file-scope statics.)

Wouldn't it be possible to extend the -gdwarf-like-ctf hack to skip
emitting those things?

>
> In most cases local types etc are a fairly small contributor to the
> total volume -- but macros can contribute a lot in some codebases. (The
> Linux kernel's READ_ONCE macro is one I've personally been bitten by in
> the past, with a new local struct in every use. GCC doesn't deduplicate
> any of those so the resulting bloat from tens of thousands of instances
> of this identical structure is quite incredible...)
>

Sounds like something that would be beneficial to do with DWARF too.

Thanks,
Pedro Alves
Re: Type representation in CTF and DWARF
On 10/17/19 6:36 PM, Nick Alcock wrote:
> A side note here: the sizes given above are uncompressed sizes, but in
> the real world CTF is almost always compressed: the threshold for
> compression is in theory customizable but at the moment is hardwired at
> 4KiB-uncompressed in the linker. I usually see compression ratios of
> roughly 3 or 4 to 1: e.g. I just tried it with a randomly chosen binary,
> /usr/lib/libgtk-3.so.0.2404.3, and got these sizes:

DWARF can be compressed too, with --compress-debug-sections.

Thanks,
Pedro Alves
Re: Type representation in CTF and DWARF
On 17 Oct 2019, Richard Biener verbalised:

> On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock wrote:
>>
>> On 11 Oct 2019, Indu Bhagat stated:
>> > Compile with -g -gdwarf-like-ctf and use dwz -o (using
>> > dwz compiled from the master branch) on the generated binaries:
>> >
>> > (coreutils-0.22)
>> > binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
>> > ls     |            30616 |               1136 |           21098 |               26240 | 0.62
>> > pwd    |            10734 |                788 |           10433 |               13929 | 0.83
>> > groups |            10706 |                811 |           10249 |               13378 | 0.80
>> >
>> > (emacs-26.3)
>> > binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
>> > emacs-26.3.1 |           674657 |               6402 |          273963 |              273910 | 0.33
>
> Btw, for a fair comparison you have to remove all DW_TAG_subroutine
> children as well since CTF doesn't represent scopes or local variables
> at all (nor types only used by locals). It seems CTF only represents
> function entry points.

Good point: I'll have to hack up a DWARF trimmer to do this comparison
properly, I think. (Though CTF does represent global variables,
including file-scope statics.)

In most cases local types etc are a fairly small contributor to the
total volume -- but macros can contribute a lot in some codebases. (The
Linux kernel's READ_ONCE macro is one I've personally been bitten by in
the past, with a new local struct in every use. GCC doesn't deduplicate
any of those so the resulting bloat from tens of thousands of instances
of this identical structure is quite incredible...)
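The READ_ONCE point is that a macro whose expansion declares a local struct or union introduces a new, distinct type at every use site, and each one gets its own description in the debug info. Below is a simplified sketch of that pattern in GNU C (statement expressions and `__typeof__` are GCC extensions). It is only loosely modeled on the kernel macro of the time, not a copy of it, and `READ_ONCE_SKETCH` is a hypothetical name. Compiling it with -g and dumping .debug_info (for example with `readelf --debug-dump=info`) you would expect to see one anonymous union type per expansion, even though the two are identical.

```c
#include <assert.h>

/* Hypothetical READ_ONCE-style macro (GNU C).  Every expansion declares a
   fresh anonymous union, so each use site creates a distinct type, even
   when the unions are structurally identical. */
#define READ_ONCE_SKETCH(x)                                           \
    __extension__ ({                                                  \
        union { __typeof__(x) val; char bytes[sizeof(x)]; } u_;       \
        u_.val = *(volatile __typeof__(x) *)&(x);                     \
        u_.val;                                                       \
    })

static int counter;

static int read_twice(void)
{
    int first = READ_ONCE_SKETCH(counter);   /* first anonymous union type */
    int second = READ_ONCE_SKETCH(counter);  /* a second, distinct type */

    assert(first == second);  /* nothing modifies counter in between here */
    return first + second;
}
```

Per Richard's remark, the C standard treats the two unions as different types, which is why DWARF keeps them apart; per Nick, CTF consumers only care that the values are assignment-compatible.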
Re: Type representation in CTF and DWARF
On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock wrote:
>
> On 11 Oct 2019, Indu Bhagat stated:
> > Compile with -g -gdwarf-like-ctf and use dwz -o (using
> > dwz compiled from the master branch) on the generated binaries:
> >
> > (coreutils-0.22)
> > binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
> > ls     |            30616 |               1136 |           21098 |               26240 | 0.62
> > pwd    |            10734 |                788 |           10433 |               13929 | 0.83
> > groups |            10706 |                811 |           10249 |               13378 | 0.80
> >
> > (emacs-26.3)
> > binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
> > emacs-26.3.1 |           674657 |               6402 |          273963 |              273910 | 0.33

Btw, for a fair comparison you have to remove all DW_TAG_subroutine
children as well since CTF doesn't represent scopes or local variables
at all (nor types only used by locals). It seems CTF only represents
function entry points.

> A side note here: the sizes given above are uncompressed sizes, but in
> the real world CTF is almost always compressed: the threshold for
> compression is in theory customizable but at the moment is hardwired at
> 4KiB-uncompressed in the linker. I usually see compression ratios of
> roughly 3 or 4 to 1: e.g. I just tried it with a randomly chosen binary,
> /usr/lib/libgtk-3.so.0.2404.3, and got these sizes:
>
> .text: 3317489
> DWARF: 8589254
> Uncompressed CTF (*no* ELF strtab sharing, so a bit bigger than usual): 713264
> .ctf section size: 213839
>
> Note that this is not only in the absence of CTF strtab sharing with the
> ELF dynstrtab, but also using a less effective compressor: currently we
> use gzip, but I expect to transition to lzma iff available at binutils
> build time (which it usually is), perhaps as an option (on by default)
> to allow interoperability with binutils that don't have lzma available.
> Obviously better compressors will save even more space.
>
> It may help that CTF is designed for good compressibility: we try to
> minimize the number of unique symbols if we can do so without impairing
> other properties, e.g. by avoiding encoding IDs of objects when we can
> instead rely on the consumer to compute them at read time by walking
> through the relevant data structures and counting.
>
> A few benchmarks indicate that compression by default also saves time
> both at compression and decompression time.
>
> (Within a week I should be able to repeat this with an ld capable of CTF
> deduplication rather than kludging it with a deduplicator meant for a
> quite different job. I expect the sizes above to improve. In fact if
> they *don't* improve I will take this as strong evidence that my
> deduplicator is buggy.)
>
>
> FWIW, here are my Emacs (26.1.50) sizes, again with no strtab sharing, but
> with deduplication: it's bigger than I'd like at around 10% of .text
> size, but still much less than 1% of binary size (my goal is 1--2% of
> .text, but Emacs is a nice tricky case, like Gtk, with lots of big types
> and structures with long member names):
>
> section                  size      addr
> .interp                    28   4194872
> .note.ABI-tag              32   4194900
> .note.gnu.build-id         36   4194932
> .gnu.hash                 628   4194968
> .dynsym                 24432   4195600
> .dynstr                 16934   4220032
> .gnu.version             2036   4236966
> .gnu.version_r            704   4239008
> .rela.data.rel.ro          72   4239712
> .rela.data                168   4239784
> .rela.got                  48   4239952
> .rela,bss                 336       424
> .rela.plt               23448   4240336
> .init                      23   4263784
> .plt                    15648   4263808
> .text                 1912622   4279456
> .fini                       9   6192080
> .rodata                165416   6192096
> .eh_frame_hdr           36196   6357512
> .eh_frame              210976   6393712
> .init_array                 8   6609328
> .fini_array                 8   6609336
> .data.rel.ro             4569   6609344
> .dynamic                 1104   6613920
> .got                       16   6615024
> .got.plt                 7840   6615040
> .data                 3276077   6622880
> ,bss                 34153472   9899008
> .comment                   26         0
> .gnu_debuglink             24         0
> .comment                   26         0
> .debug_aranges           1536         0
> .debug_info           3912261         0
> .debug_abbrev           38821         0
> .debug_line            408063         0
> .debug_str             117631         0
> .debug_loc             954538         0
> .debug_ranges          149590         0
> .ctf                   213839         0
> .ctf (uncompressed)    713264         0
>
> (obviously, manually edited a bit, size -A
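The "DWARF trimmer" that would make the comparison fair boils down to pruning function bodies out of the DIE tree. (The DWARF tag for a function DIE is DW_TAG_subprogram; the similarly named DW_TAG_subroutine_type describes function types.) The C below is a minimal sketch of just the pruning step over a hypothetical in-memory DIE tree; parsing and re-emitting .debug_info are out of scope. Keeping DW_TAG_formal_parameter children is a choice made here because CTF records function argument types; dropping them as well would be equally defensible. The struct and function names are hypothetical; the tag values are the ones in the DWARF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A few tag values from the DWARF specification. */
enum {
    DW_TAG_formal_parameter = 0x05,
    DW_TAG_lexical_block    = 0x0b,
    DW_TAG_compile_unit     = 0x11,
    DW_TAG_base_type        = 0x24,
    DW_TAG_subprogram       = 0x2e,
    DW_TAG_variable         = 0x34
};

/* Hypothetical in-memory DIE tree (first child / next sibling). */
struct die {
    int tag;
    struct die *child;
    struct die *sibling;
};

static struct die *new_die(int tag, struct die *child, struct die *sibling)
{
    struct die *d = calloc(1, sizeof *d);

    assert(d != NULL);
    d->tag = tag;
    d->child = child;
    d->sibling = sibling;
    return d;
}

/* Free a DIE, its subtree and all of its following siblings;
   returns how many DIEs were freed. */
static size_t free_dies(struct die *d)
{
    size_t n = 0;

    while (d) {
        struct die *next = d->sibling;
        n += 1 + free_dies(d->child);
        free(d);
        d = next;
    }
    return n;
}

/* Inside every DW_TAG_subprogram, keep only DW_TAG_formal_parameter
   children and drop the rest (locals, lexical blocks, local types).
   Returns the number of DIEs removed. */
static size_t prune_function_bodies(struct die *d)
{
    size_t removed = 0;

    for (; d; d = d->sibling) {
        if (d->tag != DW_TAG_subprogram) {
            removed += prune_function_bodies(d->child);
            continue;
        }
        struct die **link = &d->child;
        while (*link) {
            struct die *c = *link;
            if (c->tag == DW_TAG_formal_parameter) {
                link = &c->sibling;          /* keep it */
            } else {
                *link = c->sibling;          /* unlink c ... */
                c->sibling = NULL;
                removed += free_dies(c);     /* ... and free its subtree */
            }
        }
    }
    return removed;
}
```

Nested scopes disappear with their enclosing lexical block, so a single pass over the top-level children of each subprogram is enough.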
Re: Type representation in CTF and DWARF
On 11 Oct 2019, Indu Bhagat stated:
> Compile with -g -gdwarf-like-ctf and use dwz -o (using
> dwz compiled from the master branch) on the generated binaries:
>
> (coreutils-0.22)
> binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
> ls     |            30616 |               1136 |           21098 |               26240 | 0.62
> pwd    |            10734 |                788 |           10433 |               13929 | 0.83
> groups |            10706 |                811 |           10249 |               13378 | 0.80
>
> (emacs-26.3)
> binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
> emacs-26.3.1 |           674657 |               6402 |          273963 |              273910 | 0.33

A side note here: the sizes given above are uncompressed sizes, but in
the real world CTF is almost always compressed: the threshold for
compression is in theory customizable but at the moment is hardwired at
4KiB-uncompressed in the linker. I usually see compression ratios of
roughly 3 or 4 to 1: e.g. I just tried it with a randomly chosen binary,
/usr/lib/libgtk-3.so.0.2404.3, and got these sizes:

.text: 3317489
DWARF: 8589254
Uncompressed CTF (*no* ELF strtab sharing, so a bit bigger than usual): 713264
.ctf section size: 213839

Note that this is not only in the absence of CTF strtab sharing with the
ELF dynstrtab, but also using a less effective compressor: currently we
use gzip, but I expect to transition to lzma iff available at binutils
build time (which it usually is), perhaps as an option (on by default)
to allow interoperability with binutils that don't have lzma available.
Obviously better compressors will save even more space.

It may help that CTF is designed for good compressibility: we try to
minimize the number of unique symbols if we can do so without impairing
other properties, e.g. by avoiding encoding IDs of objects when we can
instead rely on the consumer to compute them at read time by walking
through the relevant data structures and counting.

A few benchmarks indicate that compression by default also saves time
both at compression and decompression time.
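The "compute IDs at read time by walking and counting" idea can be shown in a few lines. The C below is a minimal sketch over a hypothetical record layout (a kind byte, a payload-length byte, then the payload); it is not the real CTF type-section format, only the technique of letting a record's position in the walk serve as its ID. IDs counting from 1 is an arbitrary choice here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout: [kind byte][payload length n][n payload bytes].
   No record stores its own ID; the reader derives IDs from record order. */

/* Walk the buffer once and record where each record starts, so that the
   record with implicit ID i (counting from 1 here) is at offsets[i - 1].
   Returns the number of records, or 0 if the buffer is malformed or has
   more than max_records records. */
static size_t index_records(const unsigned char *buf, size_t size,
                            size_t *offsets, size_t max_records)
{
    size_t off = 0;
    size_t n = 0;

    assert(buf != NULL || size == 0);
    while (off < size) {
        if (size - off < 2 || n == max_records)
            return 0;                       /* truncated header or too many */
        size_t rec_len = 2 + (size_t)buf[off + 1];
        if (rec_len > size - off)
            return 0;                       /* truncated payload */
        offsets[n++] = off;
        off += rec_len;
    }
    return n;
}
```

The trade-off is one linear pass when the data is opened, in exchange for not storing an ever-increasing ID in every record, which is exactly the kind of unique, poorly compressible data the paragraph above says CTF tries to avoid.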
(Within a week I should be able to repeat this with an ld capable of CTF
deduplication rather than kludging it with a deduplicator meant for a
quite different job. I expect the sizes above to improve. In fact if
they *don't* improve I will take this as strong evidence that my
deduplicator is buggy.)

FWIW, here are my Emacs (26.1.50) sizes, again with no strtab sharing, but
with deduplication: it's bigger than I'd like at around 10% of .text
size, but still much less than 1% of binary size (my goal is 1--2% of
.text, but Emacs is a nice tricky case, like Gtk, with lots of big types
and structures with long member names):

section                  size      addr
.interp                    28   4194872
.note.ABI-tag              32   4194900
.note.gnu.build-id         36   4194932
.gnu.hash                 628   4194968
.dynsym                 24432   4195600
.dynstr                 16934   4220032
.gnu.version             2036   4236966
.gnu.version_r            704   4239008
.rela.data.rel.ro          72   4239712
.rela.data                168   4239784
.rela.got                  48   4239952
.rela,bss                 336       424
.rela.plt               23448   4240336
.init                      23   4263784
.plt                    15648   4263808
.text                 1912622   4279456
.fini                       9   6192080
.rodata                165416   6192096
.eh_frame_hdr           36196   6357512
.eh_frame              210976   6393712
.init_array                 8   6609328
.fini_array                 8   6609336
.data.rel.ro             4569   6609344
.dynamic                 1104   6613920
.got                       16   6615024
.got.plt                 7840   6615040
.data                 3276077   6622880
,bss                 34153472   9899008
.comment                   26         0
.gnu_debuglink             24         0
.comment                   26         0
.debug_aranges           1536         0
.debug_info           3912261         0
.debug_abbrev           38821         0
.debug_line            408063         0
.debug_str             117631         0
.debug_loc             954538         0
.debug_ranges          149590         0
.ctf                   213839         0
.ctf (uncompressed)    713264         0

(obviously, manually edited a bit, size -A doesn't produce the last line
on its own!)

(I'm not sure what the hell is going on with the weirdly-named ,bss
section. Probably something to do with unexec().)
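The percentages in this message can be checked directly from the listed numbers: 713264 uncompressed bytes against a 213839-byte .ctf section is roughly 3.3 to 1, .ctf is about 11% of the Emacs .text, and the .debug_* sections listed add up to 5582440 bytes. The C below is a small sketch of that arithmetic; the struct and helper names are hypothetical, and only a subset of the listing is copied in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct section {
    const char *name;
    unsigned long size;   /* bytes, as printed by size -A */
};

/* Total size of all sections whose name starts with `prefix`. */
static unsigned long total_with_prefix(const struct section *s, size_t n,
                                       const char *prefix)
{
    size_t plen = strlen(prefix);
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        if (strncmp(s[i].name, prefix, plen) == 0)
            total += s[i].size;
    return total;
}

static double ratio(unsigned long num, unsigned long den)
{
    assert(den != 0);
    return (double)num / (double)den;
}

/* A subset of the Emacs figures from the size -A listing above. */
static const struct section emacs_sections[] = {
    { ".text",          1912622 },
    { ".debug_aranges",    1536 },
    { ".debug_info",    3912261 },
    { ".debug_abbrev",    38821 },
    { ".debug_line",     408063 },
    { ".debug_str",      117631 },
    { ".debug_loc",      954538 },
    { ".debug_ranges",   149590 },
    { ".ctf",            213839 },
};
```

For example, ratio(713264, 213839) gives the compression ratio and ratio(213839, 1912622) the .ctf share of .text.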
Re: Type representation in CTF and DWARF
On 9 Oct 2019, Indu Bhagat told this:
> Yes, CTF does not support C++ at this time. To cover all of C (including
> GNU C extensions), we need to add representation for things like Vector type,
> non IEEE float etc. (somewhat infrequently occurring constructs)

One note: adding C++ support will not make the representation of CTF for
C any larger, because I plan to do as DWARF does and have a language tag
in the header, and only support one language per CTF dictionary[1]. The
type section format will otherwise be completely distinct between the two
languages, specifically in order that the C side of things not pay the
price for the (necessarily richer) C++ type representation. This is very
much a C++ thing: don't pay for what you don't use :)

So there's no need to worry that adding C++ support will make any C
compactness figures worse. You only need to consider that the C++ CTF
representation may not be able to be as compact as the C representation
-- and even there I hope to come close.

[1] (though there is a possibility of having a C++ dictionary cite types
from a C one, allowing some sharing: this is all format v5 stuff, i.e. two
format revs away, and this bit in particular is not yet designed, but
feels possible.)
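A language tag in the dictionary header lets a consumer pick a language-specific decoder before it touches the type section, so a C dictionary never carries C++ machinery. The C below is only a sketch of that dispatch under stated assumptions: the header layout, field names and tag values are all hypothetical, not taken from any CTF format version (the v5 design was, per the note above, not yet written).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical language tags; not values from any CTF specification. */
enum dict_lang { DICT_LANG_C = 1, DICT_LANG_CXX = 2 };

/* Hypothetical dictionary header carrying a single language tag. */
struct dict_header {
    uint16_t magic;
    uint8_t  version;
    uint8_t  lang;       /* one enum dict_lang value per dictionary */
    uint32_t types_off;  /* start of the language-specific type section */
};

/* Stand-ins for per-language type-section decoders: the C decoder never
   needs to understand C++ constructs, so C dictionaries pay nothing for them. */
static int decode_c_types(const struct dict_header *h)
{
    (void)h;
    return DICT_LANG_C;
}

static int decode_cxx_types(const struct dict_header *h)
{
    (void)h;
    return DICT_LANG_CXX;
}

/* Dispatch on the header's language tag; -1 for an unknown language. */
static int decode_types(const struct dict_header *h)
{
    assert(h != NULL);
    switch (h->lang) {
    case DICT_LANG_C:
        return decode_c_types(h);
    case DICT_LANG_CXX:
        return decode_cxx_types(h);
    default:
        return -1;
    }
}
```

Because the tag selects the whole type-section format, the two encodings can evolve independently; cross-dictionary sharing as in note [1] would need something beyond this simple switch.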
Re: Type representation in CTF and DWARF
On 10/11/2019 04:23 AM, Richard Biener wrote:
>> Thanks for your pointers.
>>
>> CTF does not encode location information. So, I used an early exit in the
>> add_src_coords_attributes to avoid generation of location info (file, line,
>> column). To answer Richard's question, CTF does have type debug info
>> for function declarations and the argument types. So I think with these
>> changes, both CTF and DWARF generation will emit debug info for the same set of
>> types and decl.
>>
>> Compile with -g -gdwarf-like-ctf and use dwz -o (using
>> dwz compiled from the master branch) on the generated binaries:
>>
>> (coreutils-0.22)
>> binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
>> ls     |            30616 |               1136 |           21098 |               26240 | 0.62
>> pwd    |            10734 |                788 |           10433 |               13929 | 0.83
>> groups |            10706 |                811 |           10249 |               13378 | 0.80
>>
>> (emacs-26.3)
>> binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
>> emacs-26.3.1 |           674657 |               6402 |          273963 |              273910 | 0.33
>>
>> I chose to account for 50% of .debug_str because at this point, it would be
>> unfair not to account for them. Actually, one could even argue that up to 70%
>> of the .debug_str are names of entities. CTF section sizes do include the CTF
>> string tables.
>>
>> Across coreutils, I see a geomean of 0.73 (ratio of
>> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
>> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
>> footprint than CTF (with 50% of .debug_str accounted for).
> I'm not convinced this "improvement" in size is worth maintaining another
> debug-info format, much less since it lacks desirable features right now
> and thus evaluation is tricky.
>
> At least you can improve dwarf size considerably with a low amount of work.
>
> I suspect another factor where dwarf is bigger compared to CTF is that dwarf
> is recording typedef names as well as qualified type variants. But maybe
> CTF just has a more compact representation for the bits it actually
> implements.
>
> Richard.

CTF represents typedefs and qualified type variants. They are included in
the .ctf section sizes above.

Indu
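Each ratio in the table is .ctf / (D1 + D2 + 0.5 * D4), and the 0.73 summary is a geometric mean over binaries. The C below is a minimal sketch of that arithmetic; the struct and function names are hypothetical, and the n-th root is done with Newton's method rather than libm purely to keep the example free of a -lm dependency.

```c
#include <assert.h>
#include <stddef.h>

/* Section sizes in bytes for one binary. */
struct debug_sizes {
    double debug_info;    /* D1 */
    double debug_abbrev;  /* D2 */
    double debug_str;     /* D4 */
    double ctf;           /* uncompressed .ctf */
};

/* .ctf / (D1 + D2 + str_fraction * D4); the thread uses str_fraction = 0.5. */
static double ctf_dwarf_ratio(const struct debug_sizes *s, double str_fraction)
{
    double dwarf = s->debug_info + s->debug_abbrev + str_fraction * s->debug_str;

    assert(dwarf > 0.0);
    return s->ctf / dwarf;
}

/* n-th root of x > 0 by Newton's method on r^n - x, to avoid needing libm.
   Starting at max(x, 1) keeps the iterates at or above the root. */
static double nth_root(double x, int n)
{
    double r = x > 1.0 ? x : 1.0;

    for (int iter = 0; iter < 200; iter++) {
        double p = 1.0;
        for (int k = 0; k < n - 1; k++)
            p *= r;                              /* p = r^(n-1) */
        double next = ((n - 1) * r + x / p) / n;
        if (next == r)
            break;
        r = next;
    }
    return r;
}

/* Geometric mean of n positive values. */
static double geomean(const double *v, size_t n)
{
    double prod = 1.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        assert(v[i] > 0.0);
        prod *= v[i];
    }
    return nth_root(prod, (int)n);
}
```

For just the ls, pwd and groups rows the geometric mean comes out near 0.746; the 0.73 in the thread is taken across all of coreutils, so the two need not match.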
Re: Type representation in CTF and DWARF
On Fri, Oct 11, 2019 at 01:23:12PM +0200, Richard Biener wrote:
> > (coreutils-0.22)
> > binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
> > ls     |            30616 |               1136 |           21098 |               26240 | 0.62
> > pwd    |            10734 |                788 |           10433 |               13929 | 0.83
> > groups |            10706 |                811 |           10249 |               13378 | 0.80
> >
> > (emacs-26.3)
> > binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
> > emacs-26.3.1 |           674657 |               6402 |          273963 |              273910 | 0.33
> >
> > I chose to account for 50% of .debug_str because at this point, it would be
> > unfair not to account for them. Actually, one could even argue that up to 70%
> > of the .debug_str are names of entities. CTF section sizes do include the CTF
> > string tables.
> >
> > Across coreutils, I see a geomean of 0.73 (ratio of
> > .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
> > "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
> > footprint than CTF (with 50% of .debug_str accounted for).
>
> I'm not convinced this "improvement" in size is worth maintaining another
> debug-info format, much less since it lacks desirable features right now
> and thus evaluation is tricky.
>
> At least you can improve dwarf size considerably with a low amount of work.
>
> I suspect another factor where dwarf is bigger compared to CTF is that dwarf
> is recording typedef names as well as qualified type variants. But maybe
> CTF just has a more compact representation for the bits it actually
> implements.

Does CTF record automatic variables in functions, or just global variables?
If only the latter, it would be fair to also disable addition of local
variable DIEs, lexical blocks. Does CTF record inline functions? Again, if
not, it would be fair to not emit that either in .debug_info.
-gno-record-gcc-switches so that the compiler command line is not encoded in
the debug info (unless it is in CTF).

DWARF is a highly extensible format; what exactly is and is not emitted is
something that consumers can choose. Yes, DWARF can be large, but mainly
because it provides a lot of information; the actual representation has been
designed with size concerns in mind and newer versions of the standard keep
improving that too.

Jakub
Re: Type representation in CTF and DWARF
On Fri, Oct 11, 2019 at 1:06 AM Indu Bhagat wrote:
>
> On 10/09/2019 12:49 AM, Jakub Jelinek wrote:
> > On Wed, Oct 09, 2019 at 09:41:09AM +0200, Richard Biener wrote:
> >> There's a mechanism to get type (and decl - I suppose CTF also contains
> >> debug info for function declarations not only its type?) info as part
> >> of early debug generation.
> >> The attached "hack" simply mangles dwarf2out to output this early info
> >> as the only debug info (only verified on a small .c file). We still
> >> have things like file, line and column numbers for entities (not sure
> >> if CTF has those).
> >>
> >> It should be possible to "hide" the hack behind a -gdwarf-like-ctf or
> >> similar. I guess -g0.5 isn't desirable and we've taken both -g0 and
> >> -g1 already... (and -g1 doesn't include types but just decls).
> > Yeah. And if location info isn't in CTF, you can as well add an early
> > return in add_src_coords_attributes, like it has one for UNKNOWN_LOCATION
> > already. Or if it is there, but just file/line and not column, you can use
> > -gno-column-info. As has been mentioned earlier, you can use dwz utility
> > post-linking instead of -fdebug-types-section.
> >
> > Jakub
>
> Thanks for your pointers.
>
> CTF does not encode location information. So, I used an early exit in the
> add_src_coords_attributes to avoid generation of location info (file, line,
> column). To answer Richard's question, CTF does have type debug info
> for function declarations and the argument types. So I think with these
> changes, both CTF and DWARF generation will emit debug info for the same set of
> types and decl.
>
> Compile with -g -gdwarf-like-ctf and use dwz -o (using
> dwz compiled from the master branch) on the generated binaries:
>
> (coreutils-0.22)
> binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
> ls     |            30616 |               1136 |           21098 |               26240 | 0.62
> pwd    |            10734 |                788 |           10433 |               13929 | 0.83
> groups |            10706 |                811 |           10249 |               13378 | 0.80
>
> (emacs-26.3)
> binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio .ctf/(D1+D2+0.5*D4)
> emacs-26.3.1 |           674657 |               6402 |          273963 |              273910 | 0.33
>
> I chose to account for 50% of .debug_str because at this point, it would be
> unfair not to account for them. Actually, one could even argue that up to 70%
> of the .debug_str are names of entities. CTF section sizes do include the CTF
> string tables.
>
> Across coreutils, I see a geomean of 0.73 (ratio of
> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
> footprint than CTF (with 50% of .debug_str accounted for).

I'm not convinced this "improvement" in size is worth maintaining another
debug-info format, much less since it lacks desirable features right now
and thus evaluation is tricky.

At least you can improve dwarf size considerably with a low amount of work.

I suspect another factor where dwarf is bigger compared to CTF is that dwarf
is recording typedef names as well as qualified type variants. But maybe
CTF just has a more compact representation for the bits it actually
implements.

Richard.

> Indu
>
Re: Type representation in CTF and DWARF
On 10/09/2019 12:49 AM, Jakub Jelinek wrote:
> On Wed, Oct 09, 2019 at 09:41:09AM +0200, Richard Biener wrote:
> > There's a mechanism to get type (and decl - I suppose CTF also contains
> > debug info for function declarations not only its type?) info as part of
> > early debug generation.
> > The attached "hack" simply mangles dwarf2out to output this early info as
> > the only debug info (only verified on a small .c file). We still have
> > things like file, line and column numbers for entities (not sure if CTF
> > has those).
> >
> > It should be possible to "hide" the hack behind a -gdwarf-like-ctf or
> > similar. I guess -g0.5 isn't desirable and we've taken both -g0 and -g1
> > already... (and -g1 doesn't include types but just decls).
> Yeah. And if location info isn't in CTF, you can as well add an early
> return in add_src_coords_attributes, like it has one for UNKNOWN_LOCATION
> already. Or if it is there, but just file/line and not column, you can use
> -gno-column-info. As has been mentioned earlier, you can use dwz utility
> post-linking instead of -fdebug-types-section.
>
> Jakub

Thanks for your pointers.

CTF does not encode location information, so I used an early exit in add_src_coords_attributes to avoid generating location info (file, line, column). To answer Richard's question, CTF does have type debug info for function declarations and their argument types. So I think that with these changes, CTF and DWARF generation will both emit debug info for the same set of types and decls.

Compile with -g -gdwarf-like-ctf and use dwz -o (using dwz compiled from the master branch) on the generated binaries:

(coreutils-0.22)
              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
ls            30616           | 1136              | 21098          | 26240               | 0.62
pwd           10734           | 788               | 10433          | 13929               | 0.83
groups        10706           | 811               | 10249          | 13378               | 0.80

(emacs-26.3)
              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
emacs-26.3.1  674657          | 6402              | 273963         | 273910              | 0.33

I chose to account for 50% of .debug_str because at this point it would be unfair not to account for it at all. Actually, one could even argue that up to 70% of .debug_str consists of names of entities. The CTF section sizes do include the CTF string tables.

Across coreutils, I see a geomean of 0.73 (ratio of .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger footprint than CTF (with 50% of .debug_str accounted for).

Indu
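The ratio column above is .ctf / (D1 + D2 + 0.5*D4), and the summary figure is a geometric mean of such ratios. The C sketch below recomputes them from the posted section sizes. The struct and function names are illustrative only, the .debug_str weight is a parameter so the "up to 70%" argument can be tried as well, and the n-th root is done by bisection so that the sketch does not need libm.

```c
#include <assert.h>
#include <stddef.h>

/* Section sizes in bytes for one binary (columns of the table above). */
struct ctf_vs_dwarf {
    double debug_info;   /* D1 */
    double debug_abbrev; /* D2 */
    double debug_str;    /* D4 */
    double ctf;          /* uncompressed .ctf, CTF string table included */
};

/* .ctf / (D1 + D2 + str_weight * D4); the table uses str_weight = 0.5. */
static double ctf_ratio(const struct ctf_vs_dwarf *s, double str_weight)
{
    double dwarf = s->debug_info + s->debug_abbrev + str_weight * s->debug_str;
    assert(dwarf > 0.0);
    return s->ctf / dwarf;
}

/* Two-decimal value in hundredths, truncated toward zero (0.6203 -> 62). */
static int hundredths(double r)
{
    return (int) (r * 100.0);
}

/* n-th root of x >= 0 by bisection, so no libm is needed. */
static double nth_root(double x, unsigned n)
{
    double lo = 0.0, hi = x > 1.0 ? x : 1.0;
    for (int iter = 0; iter < 100; iter++) {
        double mid = (lo + hi) / 2.0, p = 1.0;
        for (unsigned k = 0; k < n; k++)
            p *= mid;
        if (p < x)
            lo = mid;
        else
            hi = mid;
    }
    return (lo + hi) / 2.0;
}

/* Geometric mean of n positive ratios: the n-th root of their product. */
static double geomean(const double *r, size_t n)
{
    double product = 1.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        product *= r[i];
    return nth_root(product, (unsigned) n);
}
```

With only the three coreutils rows listed, the geometric mean comes out at about 0.746. The posted 0.73 is described as "across coreutils", so it presumably covers more binaries than the three shown here.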
Re: Type representation in CTF and DWARF
On Tue, Oct 08, 2019 at 10:26:13PM -0700, Indu Bhagat wrote:
> The justification for CTF is and will remain - a compact, faster debug format
> for type information and support some online debugging use-cases (like
> backtraces) in future.

Approximate backtraces, sure. (It cannot know if another frame has been stacked by the current function already, or not.)

Segher
Re: Type representation in CTF and DWARF
On Wed, Oct 09, 2019 at 09:41:09AM +0200, Richard Biener wrote:
> There's a mechanism to get type (and decl - I suppose CTF also contains
> debug info for function declarations not only its type?) info as part of
> early debug generation.
> The attached "hack" simply mangles dwarf2out to output this early info as
> the only debug info (only verified on a small .c file). We still have
> things like file, line and column numbers for entities (not sure if CTF
> has those).
>
> It should be possible to "hide" the hack behind a -gdwarf-like-ctf or
> similar. I guess -g0.5 isn't desirable and we've taken both -g0 and -g1
> already... (and -g1 doesn't include types but just decls).

Yeah. And if location info isn't in CTF, you can as well add an early return in add_src_coords_attributes, like it has one for UNKNOWN_LOCATION already. Or if it is there, but just file/line and not column, you can use -gno-column-info. As has been mentioned earlier, you can use dwz utility post-linking instead of -fdebug-types-section.

Jakub
Re: Type representation in CTF and DWARF
On Wed, Oct 9, 2019 at 7:26 AM Indu Bhagat wrote:
> [...]
> It is not easy to get an estimate of 'DWARF with everything-you-don't-need
> stripped out'. At this time, I don't know of an easy way to make this
> comparison more meaningful. Any suggestions ?

There's a mechanism to get type (and decl - I suppose CTF also contains debug info for function declarations not only its type?) info as part of early debug generation. The attached "hack" simply mangles dwarf2out to output this early info as the only debug info (only verified on a small .c file). We still have things like file, line and column numbers for entities (not sure if CTF has those).

It should be possible to "hide" the hack behind a -gdwarf-like-ctf or similar. I guess -g0.5 isn't desirable and we've taken both -g0 and -g1 already... (and -g1 doesn't include types but just decls).

Richard.
Re: Type representation in CTF and DWARF
On 10/08/2019 08:37 AM, Pedro Alves wrote:
> On 10/4/19 8:23 PM, Indu Bhagat wrote:
> > Hello,
> >
> > At GNU Tools Cauldron this year, some folks were curious to know more on how
> > the "type representation" in CTF compares vis-a-vis DWARF.
> I was one of those, and I brought this up to Jose, after your
> presentation. Glad to see the follow up! Thanks much for this.
>
> In your Cauldron presentation we saw CTF compared to full blown DWARF
> as justification for CTF,

Hmm. And I thought I made the effort required to clarify my position that comparing full-blown DWARF sizes to type-only CTF section sizes is not appropriate, let alone using them as a justification for CTF. My intention in showing those numbers was only to give some perspective to users curious about the sizes of CTF debug info (as generated by dwarf2ctf), because these sections ideally will not be stripped from shipped binaries.

The justification for CTF is and will remain - a compact, faster debug format for type information and support some online debugging use-cases (like backtraces) in future.

> but I was more interested in a comparison between
> CTF and a DWARF subset containing exactly only what you have available in
> CTF. Because if DWARF with everything-you-don't-need stripped out
> is in the same ballpark, then I am puzzled on why add/maintain a new
> Debug format, with all the duplication of effort that entails going
> forward.

I shared some numbers on this in the previous emails in this thread. I thought comparing against DWARF's de-duplication-amenable offering (using -fdebug-types-section) would be useful in this context.

For binaries compiled with -fdebug-types-section -gdwarf-4, here is some data. The CTF sections are generated with dwarf2ctf because CTF link-time de-dup is currently being worked on. The end result of link-time CTF de-dup is expected to be on par with these .ctf section sizes.

The .ctf section sizes below include the CTF string table (.debug_str, however, is excluded from the calculations):

(coreutils-0.22)
        .debug_info(D1) | .debug_abbrev(D2) | .debug_str | .debug_types(D3) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D3))
ls      109806          | 18876             | 22042      | 12413            | 26240               | 0.18
pwd     27902           | 7914              | 10851      | 5753             | 13929               | 0.33
groups  26920           | 8173              | 10674      | 5070             | 13378               | 0.33

(emacs-26.3)
        .debug_info(D1) | .debug_abbrev(D2) | .debug_str | .debug_types(D3) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D3))
emacs   3755083         | 202926            | 431926     | 143462           | 273910              | 0.06

It is not easy to get an estimate of 'DWARF with everything-you-don't-need stripped out'. At this time, I don't know of an easy way to make this comparison more meaningful. Any suggestions?

> Also, it's my understanding that the current CTF format doesn't yet
> support C++, Vector registers, etc., maybe other things, so if DWARF
> was sufficient for your needs, then in the long run it sounds like
> a better option to me, as then you wouldn't have to extend CTF _and_
> DWARF whenever some feature is needed.

Yes, CTF does not support C++ at this time. To cover all of C (including GNU C extensions), we need to add representation for things like vector types, non-IEEE floats, etc. (somewhat infrequently occurring constructs).

The issue is not that DWARF cannot represent the required type information. First, DWARF is voluminous; second, the current workflow to get from source programs to CTF without direct toolchain support is tiresome and lengthy.

For current and future users of CTF, having support for the format in the toolchain is the best way to promote adoption and enhance community experience.

> Maybe it would make sense to work on integrating CTF into the DWARF
> standard itself, not sure?
>
> I was also curious on your plans for adding unwinding support to CTF,
> while the kernel (the main CTF user, IIUC), already has plans to
> use its own unwinding format (ORC)?

The kernel's unwinding format (ORC) helps generate backtraces with function identifiers. For some (ORCL) internal customers, the requirement is to go beyond that and support input argument values. The requirement there is to generate backtraces in a fast way, without relying on DWARF.

> So with all those questions, I came out of the presentation thinking
> that I could not really justify CTF if I were asked to.

Thanks for discussing this openly. I believe there are other GCC maintainers who are undecided as well :) I hope I have answered some of your concerns.

> (Side note: the Cauldron page is missing slides for your presentation,
> so I couldn't go and recheck some things mentioned above.)
>
> Thanks,
> Pedro Alves

I mailed the organizers my slides. They should be online soon.

Thanks
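The -fdebug-types-section table uses a different denominator, D1 + D2 + D3, with .debug_str left out. Recomputing it is a quick sanity check. In the C sketch below (all names are illustrative), ls comes out at about 0.186, pwd at about 0.335 and emacs at about 0.067. The posted 0.18 / 0.33 / 0.06 therefore look truncated rather than rounded to two decimals, which does not change the comparison.

```c
#include <assert.h>

/* Sizes in bytes from the -fdebug-types-section -gdwarf-4 table.
   .debug_str is deliberately left out, as in that table. */
struct types_section_sizes {
    double debug_info;   /* D1 */
    double debug_abbrev; /* D2 */
    double debug_types;  /* D3 */
    double ctf;          /* uncompressed .ctf, CTF string table included */
};

/* .ctf / (D1 + D2 + D3) */
static double types_ratio(const struct types_section_sizes *s)
{
    double dwarf = s->debug_info + s->debug_abbrev + s->debug_types;
    assert(dwarf > 0.0);
    return s->ctf / dwarf;
}

/* Ratio in hundredths, truncated toward zero. */
static int hundredths_truncated(double r)
{
    return (int) (r * 100.0);
}

/* Ratio in hundredths, rounded half up (r is non-negative here). */
static int hundredths_rounded(double r)
{
    return (int) (r * 100.0 + 0.5);
}
```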
Re: Type representation in CTF and DWARF
On 10/4/19 8:23 PM, Indu Bhagat wrote:
> Hello,
>
> At GNU Tools Cauldron this year, some folks were curious to know more on how
> the "type representation" in CTF compares vis-a-vis DWARF.

I was one of those, and I brought this up to Jose, after your presentation. Glad to see the follow up! Thanks much for this.

In your Cauldron presentation we saw CTF compared to full blown DWARF as justification for CTF, but I was more interested in a comparison between CTF and a DWARF subset containing exactly only what you have available in CTF. Because if DWARF with everything-you-don't-need stripped out is in the same ballpark, then I am puzzled on why add/maintain a new Debug format, with all the duplication of effort that entails going forward.

Also, it's my understanding that the current CTF format doesn't yet support C++, Vector registers, etc., maybe other things, so if DWARF was sufficient for your needs, then in the long run it sounds like a better option to me, as then you wouldn't have to extend CTF _and_ DWARF whenever some feature is needed.

Maybe it would make sense to work on integrating CTF into the DWARF standard itself, not sure?

I was also curious on your plans for adding unwinding support to CTF, while the kernel (the main CTF user, IIUC), already has plans to use its own unwinding format (ORC)?

So with all those questions, I came out of the presentation thinking that I could not really justify CTF if I were asked to.

(Side note: the Cauldron page is missing slides for your presentation, so I couldn't go and recheck some things mentioned above.)

Thanks,
Pedro Alves
Re: Type representation in CTF and DWARF
On Mon, Oct 7, 2019 at 4:47 PM Indu Bhagat wrote:
> On 10/07/2019 12:35 AM, Richard Biener wrote:
> > On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat wrote:
> > [...]
> > It's not clear to me why you are using -fdebug-types-section for this
> > comparison?
> > [...]
> It's not in favor of DWARF to go with just -gdwarf-4. Because the types
> in the .debug_info section will not be de-duplicated. [...]
Re: Type representation in CTF and DWARF
On 10/07/2019 12:35 AM, Richard Biener wrote:
> On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat wrote:
> > [...]
> > In DWARF (when using -fdebug-types-section), the base types are duplicated
> > across type units. So for the above example, the DWARF DIE representing
> > 'unsigned int' will appear in both the DWARF trees for types - node and
> > node_payload. In CTF, there is a single lone type 'unsigned int'.
> It's not clear to me why you are using -fdebug-types-section for this
> comparison?
> With just -gdwarf-4 I get
>
> .debug_info 292
> .debug_abbrev 189
> .debug_str 299
>
> this contains all the info CTF provides (and more). This sums to 780 bytes,
> smaller than the CTF variant. I skimmed over the info and there's not much
> to strip to get to CTF levels, mainly locations. The strings section also
> has a quite large portion for GCC version and arguments, which is 93 bytes.
> So overall the DWARF representation should clock in at less than 700 bytes,
> more close to 650.
>
> Richard.

Going with just -gdwarf-4 does not favor DWARF, because the types in the .debug_info section will not be de-duplicated. For more complicated code bases with many compilation units, this would skew the results in favor of CTF (once the CTF de-duplicator is ready :) ).

Now, one might argue that in this example there is no role for a de-duplicator. Yes to that. But for all users of DWARF type debug information in _real codebases_, the -fdebug-types-section option is the best option, isn't it?

Keeping "the size of type debug information in the shipped artifact small" as our target is meaningful for both CTF and DWARF.

De-duplication is a key contributor to reducing the size of the type debug information, and both CTF and DWARF types can be de-duplicated. At this time, I stuck to a simple example with one CU because it eases interpreting the CTF and DWARF debug info in the binaries, and because the CTF link-time de-duplication is not fully ready.

(NickA suggested a few days ago comparing how DWARF and CTF section sizes grow when a new member, enum, union, etc. is added. I can share more data if there is interest in such a comparison. A few examples below:

1. Add a new member 'struct node_payload * a' to struct node_payload
   DWARF = 589 - 578 (.debug_types); 331 - 314 (.debug_abbrev); total = 11 + 17 = 28
   CTF = 980 - 966 (.c
Re: Type representation in CTF and DWARF
On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat wrote:
> [...]
> In DWARF (when using -fdebug-types-section), the base types are duplicated
> across type units. So for the above example, the DWARF DIE representing
> 'unsigned int' will appear in both the DWARF trees for types - node and
> node_payload. In CTF, there is a single lone type 'unsigned int'.

It's not clear to me why you are using -fdebug-types-section for this comparison? With just -gdwarf-4 I get

.debug_info 292
.debug_abbrev 189
.debug_str 299

this contains all the info CTF provides (and more). This sums to 780 bytes, smaller than the CTF variant. I skimmed over the info and there's not much to strip to get to CTF levels, mainly locations. The strings section also has a quite large portion for GCC version and arguments, which is 93 bytes. So overall the DWARF representation should clock in at less than 700 bytes, closer to 650.

Richard.
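The size figures traded in this thread are plain sums over rows of `size -A` output. Examples are 732 + 309 = 1041 bytes of DWARF5 .debug_info plus .debug_abbrev, 966 - 374 = 592 bytes of CTF without its string table, and the 292 + 189 + 299 = 780 bytes above, which drops to 687 once the 93 bytes of producer strings are removed. The C sketch below totals chosen sections from text shaped like `size -A` rows. The helper name is hypothetical, and it only assumes a "name size [addr]" layout per line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum the sizes of the named sections in text shaped like `size -A` output:
   one "name size [addr]" row per line. Rows that do not start with a name
   followed by a number (e.g. a "section size addr" header) are skipped. */
static unsigned long sum_sections(const char *text,
                                  const char *const *names, size_t n_names)
{
    unsigned long total = 0;
    const char *line = text;

    assert(text != NULL);
    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t) (nl - line) : strlen(line);
        char buf[128];
        char name[64];
        unsigned long size;

        /* Copy one line so sscanf cannot run on into the next row. */
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        if (sscanf(buf, "%63s %lu", name, &size) == 2)
            for (size_t i = 0; i < n_names; i++)
                if (strcmp(name, names[i]) == 0)
                    total += size;

        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return total;
}
```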
Type representation in CTF and DWARF
Hello,

At GNU Tools Cauldron this year, some folks were curious to know more about how the "type representation" in CTF compares vis-a-vis DWARF. I used the small testcase below to gather some numbers to help drive this discussion.

[ibhagat@ibhagatpc ctf-size]$ cat ctf_sizeme.c
#define MAX_NUM_MSGS 5

enum node_type
{
  INIT_TYPE = 0,
  COMM_TYPE = 1,
  COMP_TYPE = 2,
  MSG_TYPE = 3,
  RELEASE_TYPE = 4,
  MAX_NODE_TYPE
};

typedef struct node_payload
{
  unsigned short npay_offset;
  const char * npay_msg;
  unsigned int npay_nelems;
  struct node_payload * npay_next;
} node_payload;

typedef struct node_property
{
  int timestamp;
  char category;
  long initvalue;
} node_property_t;

typedef struct node
{
  enum node_type ntype;
  int nmask:5;
  union
    {
      struct node_payload * npayload;
      void * nbase;
    } nu;
  unsigned int msgs[MAX_NUM_MSGS];
  node_property_t node_prop;
} Node;

Node s;

int main (void)
{
  return 0;
}

Note that in this case, there is nothing for the de-duplicator to do (neither for the TYPE comdat sections nor for the CTF types). I chose such an example because de-duplication of types is orthogonal to the concept of representation of types.

So, for this small C testcase with a union, enum, array, struct, typedef etc., I see the following sizes:

Compile with -fdebug-types-section -gdwarf-4 (size -A excerpt):
section            size   addr
.debug_aranges       48      0
.debug_info         150      0
.debug_abbrev       314      0
.debug_line          73      0
.debug_str          455      0
.debug_ranges        32      0
.debug_types        578      0

Compile with -fdebug-types-section -gdwarf-5 (size -A excerpt):
section            size   addr
.debug_aranges       48      0
.debug_info         732      0
.debug_abbrev       309      0
.debug_line          73      0
.debug_str          455      0
.debug_rnglists      23      0

Compile with -gt (size -A excerpt):
section            size   addr
.ctf                966      0

CTF strings sub-section size (ctf_strlen in the disassembly) = 374
==> CTF section just for representing types = 966 - 374 = 592 bytes
(The 592 bytes include the CTF header and other indexes etc.)

So, the following are the points I would highlight. Hopefully they help show that CTF has promise for the task of representing type debug info.

1. Type information layout in sections:
A .ctf section is self-sufficient to represent the types in a program. All references within the CTF section are via either indexes or offsets into the CTF section. No relocations are necessary in CTF at this time. In contrast, DWARF type information is organized across multiple sections: .debug_info, .debug_abbrev and .debug_str in DWARF5, plus .debug_types in DWARF4.

2. Type information encoding / compactness matters:
Because the type information is spread across sections in DWARF (and those sections also contain other debug information, like locations etc.), it is not feasible to put a precise number on the bytes used for representing type information in DWARF. But the section sizes shown above should help show that CTF does show promise in representing types compactly.

Let's see some size data. The CTF string table (374 bytes) is left out of the discussion at hand because it would not be fair to compare it with the .debug_str section, which contains more than just the names of types. The remaining 592 bytes of the .ctf section are needed to represent types in CTF format. Now, when using DWARF5, the type information needs 732 bytes in .debug_info and 309 bytes in .debug_abbrev (1041 bytes together).

In DWARF (when using -fdebug-types-section), the base types are duplicated across type units. So for the above example, the DWARF DIE representing 'unsigned int' will appear in the DWARF trees of both types, node and node_payload. In CTF, there is a single 'unsigned int' type.

3. Type information retrieval and handling:
CTF type information is organized as a linear array of CTF types, and CTF types refer to other CTF types. libctf facilitates name lookups, i.e. given the name of a type, it gets the type information. DWARF type information is organized as a tree of DIEs. The information at the leaf DIEs (base types) is often duplicated across DWARF type units. DWARF type units do have references to other type units for larger types, though.
In the example, the DWARF type unit for node has a reference to the DWARF type unit for node_payload.

I only state the above for the sake of observation; I don't know for certain whether one format is necessarily better or worse for consumers of type debug information at this time, with respect to runtime access patterns.

On a related note, though, it's not clear to me how .debug_types integration with split-dwarf works out. If the linker does not see the part of the DWARF that needs no relocations, I am not sure how .debug_types type units are de-duplicated when using split-dwar