Re: Type representation in CTF and DWARF

2019-10-25 Thread Richard Biener
On Fri, Oct 25, 2019 at 1:52 AM Indu Bhagat  wrote:
>
>
>
> On 10/11/2019 04:41 AM, Jakub Jelinek wrote:
> > On Fri, Oct 11, 2019 at 01:23:12PM +0200, Richard Biener wrote:
> >>> (coreutils-0.22)
> >>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> >>> ls                     30616 |              1136 |          21098 |               26240 | 0.62
> >>> pwd                    10734 |               788 |          10433 |               13929 | 0.83
> >>> groups                 10706 |               811 |          10249 |               13378 | 0.80
> >>>
> >>> (emacs-26.3)
> >>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> >>> emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33
> >>>
> >>> I chose to account for 50% of .debug_str because at this point, it will be
> >>> unfair to not account for them. Actually, one could even argue that up to
> >>> 70% of the .debug_str are names of entities. CTF section sizes do include
> >>> the CTF string tables.
> >>>
> >>> Across coreutils, I see a geomean of 0.73 (ratio of
> >>> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
> >>> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
> >>> footprint than CTF (with 50% of .debug_str accounted for).
> >> I'm not convinced this "improvement" in size is worth maintaining another
> >> debug-info format much less since it lacks desirable features right now
> >> and thus evaluation is tricky.
> >>
> >> At least you can improve dwarf size considerably with a low amount of work.
> >>
> >> I suspect another factor where dwarf is bigger compared to CTF is that
> >> dwarf is recording typedef names as well as qualified type variants.  But
> >> maybe CTF just has a more compact representation for the bits it actually
> >> implements.
> > Does CTF record automatic variables in functions, or just global variables?
> > If only the latter, it would be fair to also disable addition of local
> > variable DIEs, lexical blocks.  Does CTF record inline functions?  Again, if
> > not, it would be fair to not emit that either in .debug_info.
> > -gno-record-gcc-switches so that the compiler command line is not encoded in
> > the debug info (unless it is in CTF).
>
> CTF includes file-scope and global-scope entities. So, CTF for a function
> defined/declared at these scopes is available in .ctf section, even if it is
> inlined.
>
> To not generate DWARF for function-local entities, I made a tweak in the
> gen_decl_die API to have an early exit when TREE_CODE (DECL_CONTEXT (decl))
> is FUNCTION_DECL.
>
> @@ -26374,6 +26374,12 @@ gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
>    if (DECL_P (decl_or_origin) && DECL_IGNORED_P (decl_or_origin))
>      return NULL;
>
> +  /* Do not generate info for function local decl when -gdwarf-like-ctf is
> +     enabled.  */
> +  if (debug_dwarf_like_ctf && DECL_CONTEXT (decl)
> +      && (TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL))
> +    return NULL;
> +
>    switch (TREE_CODE (decl_or_origin))
>      {
>      case ERROR_MARK:

A better place is probably in gen_subprogram_die, returning early before

  /* Output Dwarf info for all of the stuff within the body of the function
 (if it has one - it may be just a declaration).

Note we also emit DIEs for function declarations without actual definitions
(optionally also for unused ones, if requested); I would guess CTF doesn't,
since there's no symbol table entry for those.  Plus we by default prune types
that are not used.  So

struct S { int i; };
extern void foo (struct S *);
void bar()
{
  struct S s;
  foo (&s);
}

would have DIEs for S and foo in addition to that for bar.  To me it seems
those are not relevant for function entry point inspection (eventually both
S and foo have CTF info in the defining unit).  Correct?

Richard.

>
> For the numbers in the email today:
> 1. CFLAGS="-g -gdwarf-like-ctf -gno-record-gcc-switches -O2". dwz is used on
> generated binaries.
> 2. At this time, I wanted to account for .debug_str entities appropriately
> (not 50% as done previously). Using a small script to count chars for
> accounting the "path-like" strings, specifically those strings that start
> with a ".", I gathered the data in column named D5.
>
> (coreutils-0.22)
>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
> ls                     14100 |               994 |          16945 |              1328 |               26240 | 0.85
> pwd                     6341 |               632 |           9311 |               596 |               13929 | 0.88
> groups                  6410 |               714 |           9218 |               667 |               13378 | 0.85
> Average geomean across coreutils = 0.84
>
> 

Re: Type representation in CTF and DWARF

2019-10-24 Thread Indu Bhagat




On 10/11/2019 04:41 AM, Jakub Jelinek wrote:
> On Fri, Oct 11, 2019 at 01:23:12PM +0200, Richard Biener wrote:
>>> (coreutils-0.22)
>>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>>> ls                     30616 |              1136 |          21098 |               26240 | 0.62
>>> pwd                    10734 |               788 |          10433 |               13929 | 0.83
>>> groups                 10706 |               811 |          10249 |               13378 | 0.80
>>>
>>> (emacs-26.3)
>>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>>> emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33
>>>
>>> I chose to account for 50% of .debug_str because at this point, it will be
>>> unfair to not account for them. Actually, one could even argue that up to 70%
>>> of the .debug_str are names of entities. CTF section sizes do include the CTF
>>> string tables.
>>>
>>> Across coreutils, I see a geomean of 0.73 (ratio of
>>> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
>>> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
>>> footprint than CTF (with 50% of .debug_str accounted for).
>> I'm not convinced this "improvement" in size is worth maintaining another
>> debug-info format much less since it lacks desirable features right now
>> and thus evaluation is tricky.
>>
>> At least you can improve dwarf size considerably with a low amount of work.
>>
>> I suspect another factor where dwarf is bigger compared to CTF is that dwarf
>> is recording typedef names as well as qualified type variants.  But maybe
>> CTF just has a more compact representation for the bits it actually implements.
> Does CTF record automatic variables in functions, or just global variables?
> If only the latter, it would be fair to also disable addition of local
> variable DIEs, lexical blocks.  Does CTF record inline functions?  Again, if
> not, it would be fair to not emit that either in .debug_info.
> -gno-record-gcc-switches so that the compiler command line is not encoded in
> the debug info (unless it is in CTF).


CTF includes file-scope and global-scope entities. So, CTF for a function
defined/declared at these scopes is available in .ctf section, even if it is
inlined.

To not generate DWARF for function-local entities, I made a tweak in the
gen_decl_die API to have an early exit when TREE_CODE (DECL_CONTEXT (decl))
is FUNCTION_DECL.

@@ -26374,6 +26374,12 @@ gen_decl_die (tree decl, tree origin, struct vlr_context *ctx,
   if (DECL_P (decl_or_origin) && DECL_IGNORED_P (decl_or_origin))
     return NULL;
 
+  /* Do not generate info for function local decl when -gdwarf-like-ctf is
+     enabled.  */
+  if (debug_dwarf_like_ctf && DECL_CONTEXT (decl)
+      && (TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL))
+    return NULL;
+
   switch (TREE_CODE (decl_or_origin))
     {
     case ERROR_MARK:


For the numbers in the email today:
1. CFLAGS="-g -gdwarf-like-ctf -gno-record-gcc-switches -O2". dwz is used on
   generated binaries.
2. At this time, I wanted to account for .debug_str entities appropriately (not
   50% as done previously). Using a small script to count chars for
   accounting the "path-like" strings, specifically those strings that start
   with a ".", I gathered the data in column named D5.

(coreutils-0.22)
             .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
ls                     14100 |               994 |          16945 |              1328 |               26240 | 0.85
pwd                     6341 |               632 |           9311 |               596 |               13929 | 0.88
groups                  6410 |               714 |           9218 |               667 |               13378 | 0.85
Average geomean across coreutils = 0.84

(emacs-26.3)
             .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | path strings (D5) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D4-D5))
emacs-26.3.1          373678 |              3794 |         219048 |              3842 |              273910 | 0.46
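
For anyone wanting to recompute the ratio columns, here is a minimal C sketch of the two accounting schemes used in this thread: .ctf/(D1+D2+w*D4) with w = 0.5 from the earlier mail, and .ctf/(D1+D2+D4-D5) from this one. The struct and function names are made up for illustration; they are not part of GCC, binutils, or dwz.

```c
#include <assert.h>

/* Section sizes in bytes for one binary (hypothetical helper type).  */
struct dbg_sizes
{
  double debug_info;    /* D1 */
  double debug_abbrev;  /* D2 */
  double debug_str;     /* D4 */
  double path_strings;  /* D5: bytes of path-like strings in .debug_str */
  double ctf;           /* uncompressed .ctf */
};

/* Earlier scheme: count a fixed fraction of .debug_str.  */
double
ctf_ratio_weighted (const struct dbg_sizes *s, double str_weight)
{
  double dwarf = s->debug_info + s->debug_abbrev + str_weight * s->debug_str;
  assert (dwarf > 0.0);
  return s->ctf / dwarf;
}

/* Later scheme: count all of .debug_str except the path-like strings.  */
double
ctf_ratio_no_paths (const struct dbg_sizes *s)
{
  double dwarf = s->debug_info + s->debug_abbrev
                 + s->debug_str - s->path_strings;
  assert (dwarf > 0.0);
  return s->ctf / dwarf;
}
```

Feeding ctf_ratio_no_paths the ls row (14100, 994, 16945, 1328, 26240) gives roughly 0.85, and the emacs row roughly 0.46, in line with the tables.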


> DWARF is a highly extensible format; what exactly is and is not emitted is
> something that consumers can choose.
> Yes, DWARF can be large, but mainly because it provides a lot of
> information; the actual representation has been designed with size concerns
> in mind and newer versions of the standard keep improving that too.
>
> Jakub


Yes.

I started out to provide some numbers around the size impact of CTF vs DWARF
as it was a legitimate curiosity many of us have had. Comparing compactness or
feature matrices is only one dimension of evaluating the utility of supporting
CTF in the toolchain (including GCC; Binutils and GDB have already accepted
initial CTF support). The other dimension is a user-friendly workflow which
supports current users and eases further adoption and growth.

Indu



Re: Type representation in CTF and DWARF

2019-10-18 Thread Nick Alcock
On 18 Oct 2019, Pedro Alves stated:

> On 10/18/19 2:21 PM, Richard Biener wrote:
>
>>>> In most cases local types etc are a fairly small contributor to the
>>>> total volume -- but macros can contribute a lot in some codebases. (The
>>>> Linux kernel's READ_ONCE macro is one I've personally been bitten by in
>>>> the past, with a new local struct in every use. GCC doesn't deduplicate
>>>> any of those so the resulting bloat from tens of thousands of instances
>>>> of this identical structure is quite incredible...)
>>>>
>>>
>>> Sounds like something that would be beneficial to do with DWARF too.
>> 
>> Otoh those are distinct types according to the C standard and since dwarf is 
>> a source level representation we should preserve this (source locations also 
>> differ). 
>
> Right.  Maybe some partial deduplication would be possible, preserving
> type distinction.  But since CTF doesn't include these, this is moot
> for now.

Yeah, the libctf API and existing CTF users only care if they're
assignment-compatible, which they are. We could preserve more
type-identity information if there was a need to do so, but none has yet
emerged.

-- 
NULL && (void)


Re: Type representation in CTF and DWARF

2019-10-18 Thread Pedro Alves
On 10/18/19 2:21 PM, Richard Biener wrote:

>>> In most cases local types etc are a fairly small contributor to the
>>> total volume -- but macros can contribute a lot in some codebases. (The
>>> Linux kernel's READ_ONCE macro is one I've personally been bitten by in
>>> the past, with a new local struct in every use. GCC doesn't deduplicate
>>> any of those so the resulting bloat from tens of thousands of instances
>>> of this identical structure is quite incredible...)
>>>
>>
>> Sounds like something that would be beneficial to do with DWARF too.
> 
> Otoh those are distinct types according to the C standard and since dwarf is 
> a source level representation we should preserve this (source locations also 
> differ). 

Right.  Maybe some partial deduplication would be possible, preserving
type distinction.  But since CTF doesn't include these, this is moot
for now.

Thanks,
Pedro Alves


Re: Type representation in CTF and DWARF

2019-10-18 Thread Richard Biener
On October 18, 2019 1:59:36 PM GMT+02:00, Pedro Alves  wrote:
>On 10/17/19 7:59 PM, Nick Alcock wrote:
>> On 17 Oct 2019, Richard Biener verbalised:
>> 
>>> On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock wrote:
>>>>
>>>> On 11 Oct 2019, Indu Bhagat stated:
>>>>> Compile with -g -gdwarf-like-ctf and use dwz -o   (using
>>>>> dwz compiled from the master branch) on the generated binaries:
>>>>>
>>>>> (coreutils-0.22)
>>>>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>>>>> ls                     30616 |              1136 |          21098 |               26240 | 0.62
>>>>> pwd                    10734 |               788 |          10433 |               13929 | 0.83
>>>>> groups                 10706 |               811 |          10249 |               13378 | 0.80
>>>>>
>>>>> (emacs-26.3)
>>>>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>>>>> emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33
>>>
>>> Btw, for a fair comparison you have to remove all DW_TAG_subprogram
>>> children as well since CTF doesn't represent scopes or local variables
>>> at all (nor types only used by locals). It seems CTF only represents
>>> function entry points.
>>
>> Good point: I'll have to hack up a DWARF trimmer to do this comparison
>> properly, I think. (Though CTF does represent global variables,
>> including file-scope statics.)
>
>Wouldn't it be possible to extend the -gdwarf-like-ctf hack to skip
>emitting those things?

Sure. 

>> 
>> In most cases local types etc are a fairly small contributor to the
>> total volume -- but macros can contribute a lot in some codebases. (The
>> Linux kernel's READ_ONCE macro is one I've personally been bitten by in
>> the past, with a new local struct in every use. GCC doesn't deduplicate
>> any of those so the resulting bloat from tens of thousands of instances
>> of this identical structure is quite incredible...)
>> 
>
>Sounds like something that would be beneficial to do with DWARF too.

Otoh those are distinct types according to the C standard and since dwarf is a 
source level representation we should preserve this (source locations also 
differ). 

Richard. 

>Thanks,
>Pedro Alves



Re: Type representation in CTF and DWARF

2019-10-18 Thread Pedro Alves
On 10/17/19 7:59 PM, Nick Alcock wrote:
> On 17 Oct 2019, Richard Biener verbalised:
> 
>> On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock  wrote:
>>>
>>> On 11 Oct 2019, Indu Bhagat stated:
>>>> Compile with -g -gdwarf-like-ctf and use dwz -o   (using
>>>> dwz compiled from the master branch) on the generated binaries:
>>>>
>>>> (coreutils-0.22)
>>>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>>>> ls                     30616 |              1136 |          21098 |               26240 | 0.62
>>>> pwd                    10734 |               788 |          10433 |               13929 | 0.83
>>>> groups                 10706 |               811 |          10249 |               13378 | 0.80
>>>>
>>>> (emacs-26.3)
>>>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>>>> emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33
>>
>> Btw, for a fair comparison you have to remove all DW_TAG_subprogram
>> children as well since CTF doesn't represent scopes or local variables
>> at all (nor types only used by locals). It seems CTF only represents
>> function entry points.
> 
> Good point: I'll have to hack up a DWARF trimmer to do this comparison
> properly, I think. (Though CTF does represent global variables,
> including file-scope statics.)

Wouldn't it be possible to extend the -gdwarf-like-ctf hack to skip
emitting those things?

> 
> In most cases local types etc are a fairly small contributor to the
> total volume -- but macros can contribute a lot in some codebases. (The
> Linux kernel's READ_ONCE macro is one I've personally been bitten by in
> the past, with a new local struct in every use. GCC doesn't deduplicate
> any of those so the resulting bloat from tens of thousands of instances
> of this identical structure is quite incredible...)
> 

Sounds like something that would be beneficial to do with DWARF too.

Thanks,
Pedro Alves


Re: Type representation in CTF and DWARF

2019-10-18 Thread Pedro Alves
On 10/17/19 6:36 PM, Nick Alcock wrote:
> A side note here: the sizes given above are uncompressed sizes, but in
> the real world CTF is almost always compressed: the threshold for
> compression is in theory customizable but at the moment is hardwired at
> 4KiB-uncompressed in the linker. I usually see compression ratios of
> roughly 3 or 4 to 1: e.g. I just tried it with a randomly chosen binary,
> /usr/lib/libgtk-3.so.0.2404.3, and got these sizes:

DWARF can be compressed too, with --compress-debug-sections.

Thanks,
Pedro Alves


Re: Type representation in CTF and DWARF

2019-10-17 Thread Nick Alcock
On 17 Oct 2019, Richard Biener verbalised:

> On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock  wrote:
>>
>> On 11 Oct 2019, Indu Bhagat stated:
>> > Compile with -g -gdwarf-like-ctf and use dwz -o   (using
>> > dwz compiled from the master branch) on the generated binaries:
>> >
>> > (coreutils-0.22)
>> >              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>> > ls                     30616 |              1136 |          21098 |               26240 | 0.62
>> > pwd                    10734 |               788 |          10433 |               13929 | 0.83
>> > groups                 10706 |               811 |          10249 |               13378 | 0.80
>> >
>> > (emacs-26.3)
>> >              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>> > emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33
>
> Btw, for a fair comparison you have to remove all DW_TAG_subprogram
> children as well since CTF doesn't represent scopes or local variables
> at all (nor types only used by locals). It seems CTF only represents
> function entry points.

Good point: I'll have to hack up a DWARF trimmer to do this comparison
properly, I think. (Though CTF does represent global variables,
including file-scope statics.)

In most cases local types etc are a fairly small contributor to the
total volume -- but macros can contribute a lot in some codebases. (The
Linux kernel's READ_ONCE macro is one I've personally been bitten by in
the past, with a new local struct in every use. GCC doesn't deduplicate
any of those so the resulting bloat from tens of thousands of instances
of this identical structure is quite incredible...)
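
To make the READ_ONCE point concrete, here is a minimal sketch assuming a much-simplified macro. READ_ONCE_SKETCH and sum_two_reads are hypothetical names, and the macro is not the kernel's actual definition; it only reproduces the "new local struct in every use" pattern, using the GNU C extensions __typeof__ and statement expressions.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for a READ_ONCE-style macro;
   this is NOT the Linux kernel's definition.  Every expansion declares a
   brand-new anonymous struct type that is local to that expansion.  */
#define READ_ONCE_SKETCH(x)                                   \
  __extension__ ({                                            \
    struct { __typeof__ (x) val; } once_tmp =                 \
      { *(const volatile __typeof__ (x) *) &(x) };            \
    once_tmp.val;                                             \
  })

/* Two uses give two distinct local struct types with identical layout.
   Per the message above, each one ends up described separately in the
   debug info rather than being deduplicated.  */
int
sum_two_reads (const int *a, const int *b)
{
  assert (a != 0 && b != 0);
  int first = READ_ONCE_SKETCH (*a);
  int second = READ_ONCE_SKETCH (*b);
  return first + second;
}
```

The two struct types are distinct as far as the C standard is concerned, yet they carry the same single member, which is why tens of thousands of expansions add up.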


Re: Type representation in CTF and DWARF

2019-10-17 Thread Richard Biener
On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock  wrote:
>
> On 11 Oct 2019, Indu Bhagat stated:
> > Compile with -g -gdwarf-like-ctf and use dwz -o   (using
> > dwz compiled from the master branch) on the generated binaries:
> >
> > (coreutils-0.22)
> >              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > ls                     30616 |              1136 |          21098 |               26240 | 0.62
> > pwd                    10734 |               788 |          10433 |               13929 | 0.83
> > groups                 10706 |               811 |          10249 |               13378 | 0.80
> >
> > (emacs-26.3)
> >              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33

Btw, for a fair comparison you have to remove all DW_TAG_subprogram
children as well, since CTF doesn't represent scopes or local variables
at all (nor types only used by locals).  It seems CTF only represents
function entry points.

> A side note here: the sizes given above are uncompressed sizes, but in
> the real world CTF is almost always compressed: the threshold for
> compression is in theory customizable but at the moment is hardwired at
> 4KiB-uncompressed in the linker. I usually see compression ratios of
> roughly 3 or 4 to 1: e.g. I just tried it with a randomly chosen binary,
> /usr/lib/libgtk-3.so.0.2404.3, and got these sizes:
>
> .text: 3317489
> DWARF: 8589254
> Uncompressed CTF (*no* ELF strtab sharing, so a bit bigger than usual): 713264
> .ctf section size: 213839
>
> Note that this is not only in the absence of CTF strtab sharing with the
> ELF dynstrtab, but also using a less effective compressor: currently we
> use gzip, but I expect to transition to lzma iff available at binutils
> build time (which it usually is), perhaps as an option (on by default)
> to allow interoperability with binutils that don't have lzma available.
> Obviously better compressors will save even more space.
>
> It may help that CTF is designed for good compressibility: we try to
> minimize the number of unique symbols if we can do so without impairing
> other properties, e.g. by avoiding encoding IDs of objects when we can
> instead rely on the consumer to compute them at read time by walking
> through the relevant data structures and counting.
>
> A few benchmarks indicate that compression by default also saves time
> both at compression and decompression time.
>
> (Within a week I should be able to repeat this with an ld capable of CTF
> deduplication rather than kludging it with a deduplicator meant for a
> quite different job. I expect the sizes above to improve. In fact if
> they *don't* improve I will take this as strong evidence that my
> deduplicator is buggy.)
>
>
> FWIW, here's my Emacs (26.1.50) sizes, again with no strtab sharing, but
> with deduplication: it's bigger than I'd like at around 10% of .text
> size, but still much less than 1% of binary size (my goal is 1--2% of
> .text, but Emacs is a nice tricky case, like Gtk, with lots of big types
> and structures with long member names):
>
> section                   size      addr
> .interp                     28   4194872
> .note.ABI-tag               32   4194900
> .note.gnu.build-id          36   4194932
> .gnu.hash                  628   4194968
> .dynsym                  24432   4195600
> .dynstr                  16934   4220032
> .gnu.version              2036   4236966
> .gnu.version_r             704   4239008
> .rela.data.rel.ro           72   4239712
> .rela.data                 168   4239784
> .rela.got                   48   4239952
> .rela,bss                  336       424
> .rela.plt                23448   4240336
> .init                       23   4263784
> .plt                     15648   4263808
> .text                  1912622   4279456
> .fini                        9   6192080
> .rodata                 165416   6192096
> .eh_frame_hdr            36196   6357512
> .eh_frame               210976   6393712
> .init_array                  8   6609328
> .fini_array                  8   6609336
> .data.rel.ro              4569   6609344
> .dynamic                  1104   6613920
> .got                        16   6615024
> .got.plt                  7840   6615040
> .data                  3276077   6622880
> ,bss                  34153472   9899008
> .comment                    26         0
> .gnu_debuglink              24         0
> .comment                    26         0
> .debug_aranges            1536         0
> .debug_info            3912261         0
> .debug_abbrev            38821         0
> .debug_line             408063         0
> .debug_str              117631         0
> .debug_loc              954538         0
> .debug_ranges           149590         0
> .ctf                    213839         0
> .ctf (uncompressed)     713264         0
>
> (obviously, manually edited a bit, size -A 

Re: Type representation in CTF and DWARF

2019-10-17 Thread Nick Alcock
On 11 Oct 2019, Indu Bhagat stated:
> Compile with -g -gdwarf-like-ctf and use dwz -o   (using
> dwz compiled from the master branch) on the generated binaries:
>
> (coreutils-0.22)
>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> ls                     30616 |              1136 |          21098 |               26240 | 0.62
> pwd                    10734 |               788 |          10433 |               13929 | 0.83
> groups                 10706 |               811 |          10249 |               13378 | 0.80
>
> (emacs-26.3)
>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33

A side note here: the sizes given above are uncompressed sizes, but in
the real world CTF is almost always compressed: the threshold for
compression is in theory customizable but at the moment is hardwired at
4KiB-uncompressed in the linker. I usually see compression ratios of
roughly 3 or 4 to 1: e.g. I just tried it with a randomly chosen binary,
/usr/lib/libgtk-3.so.0.2404.3, and got these sizes:

.text: 3317489
DWARF: 8589254
Uncompressed CTF (*no* ELF strtab sharing, so a bit bigger than usual): 713264
.ctf section size: 213839

Note that this is not only in the absence of CTF strtab sharing with the
ELF dynstrtab, but also using a less effective compressor: currently we
use gzip, but I expect to transition to lzma iff available at binutils
build time (which it usually is), perhaps as an option (on by default)
to allow interoperability with binutils that don't have lzma available.
Obviously better compressors will save even more space.

It may help that CTF is designed for good compressibility: we try to
minimize the number of unique symbols if we can do so without impairing
other properties, e.g. by avoiding encoding IDs of objects when we can
instead rely on the consumer to compute them at read time by walking
through the relevant data structures and counting.
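
As an illustration of the "compute IDs by counting" idea, here is a minimal sketch under stated assumptions. The record layout (a kind byte, a length byte, then the payload) and the name sketch_find_by_id are invented for this example and are not the real CTF encoding; the only point is that no ID is stored, so a reader derives it from the record's position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout, NOT the real CTF encoding:
     byte 0: kind, byte 1: payload length N, bytes 2..2+N-1: payload.
   No ID is stored; the ID of a record is its 1-based position.  */
const uint8_t *
sketch_find_by_id (const uint8_t *buf, size_t len, uint32_t wanted_id)
{
  uint32_t id = 1;   /* assumption: IDs start at 1, 0 means "no type" */
  size_t off = 0;

  assert (buf != NULL || len == 0);
  while (len - off >= 2)
    {
      size_t rec_len = 2 + (size_t) buf[off + 1];
      if (rec_len > len - off)
        return NULL;            /* truncated record */
      if (id == wanted_id)
        return buf + off;       /* found by counting, not by a stored ID */
      off += rec_len;
      id++;
    }
  return NULL;
}
```

The trade-off in this sketch is a linear walk at read time in exchange for fewer distinct byte values on disk; a real consumer would presumably build an index once rather than rescan for every lookup.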

A few benchmarks indicate that compression by default also saves time
both at compression and decompression time. 

(Within a week I should be able to repeat this with an ld capable of CTF
deduplication rather than kludging it with a deduplicator meant for a
quite different job. I expect the sizes above to improve. In fact if
they *don't* improve I will take this as strong evidence that my
deduplicator is buggy.)


FWIW, here's my Emacs (26.1.50) sizes, again with no strtab sharing, but
with deduplication: it's bigger than I'd like at around 10% of .text
size, but still much less than 1% of binary size (my goal is 1--2% of
.text, but Emacs is a nice tricky case, like Gtk, with lots of big types
and structures with long member names):

section                   size      addr
.interp                     28   4194872
.note.ABI-tag               32   4194900
.note.gnu.build-id          36   4194932
.gnu.hash                  628   4194968
.dynsym                  24432   4195600
.dynstr                  16934   4220032
.gnu.version              2036   4236966
.gnu.version_r             704   4239008
.rela.data.rel.ro           72   4239712
.rela.data                 168   4239784
.rela.got                   48   4239952
.rela,bss                  336       424
.rela.plt                23448   4240336
.init                       23   4263784
.plt                     15648   4263808
.text                  1912622   4279456
.fini                        9   6192080
.rodata                 165416   6192096
.eh_frame_hdr            36196   6357512
.eh_frame               210976   6393712
.init_array                  8   6609328
.fini_array                  8   6609336
.data.rel.ro              4569   6609344
.dynamic                  1104   6613920
.got                        16   6615024
.got.plt                  7840   6615040
.data                  3276077   6622880
,bss                  34153472   9899008
.comment                    26         0
.gnu_debuglink              24         0
.comment                    26         0
.debug_aranges            1536         0
.debug_info            3912261         0
.debug_abbrev            38821         0
.debug_line             408063         0
.debug_str              117631         0
.debug_loc              954538         0
.debug_ranges           149590         0
.ctf                    213839         0
.ctf (uncompressed)     713264         0

(obviously, manually edited a bit, size -A doesn't produce the last line
on its own!)

(I'm not sure what the hell is going on with the weirdly-named ,bss
section. Probably something to do with unexec().)


Re: Type representation in CTF and DWARF

2019-10-15 Thread Nick Alcock
On 9 Oct 2019, Indu Bhagat told this:

> Yes, CTF does not support C++ at this time. To cover all of C (including
> GNU C extensions), we need to add representation for things like Vector type,
> non IEEE float etc. (somewhat infrequently occurring constructs)

One note: adding C++ support will not make the representation of CTF for
C any larger, because I plan to do as DWARF does and have a language tag
in the header, and only support one language per CTF dictionary[1].  The
type section format will otherwise be completely distinct between the
two languages, specifically in order that the C side of things not pay
the price for the (necessarily richer) C++ type representation. This is
very much a C++ thing: don't pay for what you don't use :)

So there's no need to worry that adding C++ support will make any C
compactness figures worse. You only need to consider that the C++ CTF
representation may not be able to be as compact as the C representation
-- and even there I hope to come close.

[1] (though there is a possibility of having a C++ dictionary cite types
from a C one, allowing some sharing: this is all format v5 stuff,
i.e. two format revs away, and this bit in particular is not yet
designed, but feels possible.)


Re: Type representation in CTF and DWARF

2019-10-11 Thread Indu Bhagat




On 10/11/2019 04:23 AM, Richard Biener wrote:
>> Thanks for your pointers.
>>
>> CTF does not encode location information. So, I used early exit in the
>> add_src_coords_attributes to avoid generation of location info (file, line,
>> column). To answer Richard's question, CTF does have type debug info
>> for function declarations and the argument types. So I think with these
>> changes, both CTF and DWARF generation will emit debug info for the same set of
>> types and decl.
>>
>> Compile with -g -gdwarf-like-ctf and use dwz -o   (using
>> dwz compiled from the master branch) on the generated binaries:
>>
>> (coreutils-0.22)
>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>> ls                     30616 |              1136 |          21098 |               26240 | 0.62
>> pwd                    10734 |               788 |          10433 |               13929 | 0.83
>> groups                 10706 |               811 |          10249 |               13378 | 0.80
>>
>> (emacs-26.3)
>>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
>> emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33
>>
>> I chose to account for 50% of .debug_str because at this point, it will be
>> unfair to not account for them. Actually, one could even argue that up to 70%
>> of the .debug_str are names of entities. CTF section sizes do include the CTF
>> string tables.
>>
>> Across coreutils, I see a geomean of 0.73 (ratio of
>> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
>> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
>> footprint than CTF (with 50% of .debug_str accounted for).
>
> I'm not convinced this "improvement" in size is worth maintaining another
> debug-info format much less since it lacks desirable features right now
> and thus evaluation is tricky.
>
> At least you can improve dwarf size considerably with a low amount of work.
>
> I suspect another factor where dwarf is bigger compared to CTF is that dwarf
> is recording typedef names as well as qualified type variants.  But maybe
> CTF just has a more compact representation for the bits it actually implements.
>
> Richard.


CTF represents typedefs and qualified type variants. They are included in
the .ctf section sizes above.

Indu



Re: Type representation in CTF and DWARF

2019-10-11 Thread Jakub Jelinek
On Fri, Oct 11, 2019 at 01:23:12PM +0200, Richard Biener wrote:
> > (coreutils-0.22)
> >              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > ls                     30616 |              1136 |          21098 |               26240 | 0.62
> > pwd                    10734 |               788 |          10433 |               13929 | 0.83
> > groups                 10706 |               811 |          10249 |               13378 | 0.80
> >
> > (emacs-26.3)
> >              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33
> >
> > I chose to account for 50% of .debug_str because at this point, it will be
> > unfair to not account for them. Actually, one could even argue that up to 70%
> > of the .debug_str are names of entities. CTF section sizes do include the
> > CTF string tables.
> >
> > Across coreutils, I see a geomean of 0.73 (ratio of
> > .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
> > "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
> > footprint than CTF (with 50% of .debug_str accounted for).
> 
> I'm not convinced this "improvement" in size is worth maintaining another
> debug-info format much less since it lacks desirable features right now
> and thus evaluation is tricky.
> 
> At least you can improve dwarf size considerably with a low amount of work.
> 
> I suspect another factor where dwarf is bigger compared to CTF is that dwarf
> is recording typedef names as well as qualified type variants.  But maybe
> CTF just has a more compact representation for the bits it actually
> implements.

Does CTF record automatic variables in functions, or just global variables?
If only the latter, it would be fair to also disable addition of local
variable DIEs, lexical blocks.  Does CTF record inline functions?  Again, if
not, it would be fair to not emit that either in .debug_info.
-gno-record-gcc-switches so that the compiler command line is not encoded in
the debug info (unless it is in CTF).
DWARF is a highly extensible format; what exactly is and is not emitted is
something that consumers can choose.
Yes, DWARF can be large, but mainly because it provides a lot of
information; the actual representation has been designed with size concerns
in mind and newer versions of the standard keep improving that too.

Jakub


Re: Type representation in CTF and DWARF

2019-10-11 Thread Richard Biener
On Fri, Oct 11, 2019 at 1:06 AM Indu Bhagat  wrote:
>
>
>
> On 10/09/2019 12:49 AM, Jakub Jelinek wrote:
> > On Wed, Oct 09, 2019 at 09:41:09AM +0200, Richard Biener wrote:
> >> There's a mechanism to get type (and decl - I suppose CTF also
> >> contains debug info for function declarations, not only its type?)
> >> info as part of early debug generation.
> >> The attached "hack" simply mangles dwarf2out to output this early info
> >> as the only debug info (only verified on a small .c file).  We still
> >> have things like file, line and column numbers for entities (not sure
> >> if CTF has those).
> >>
> >> It should be possible to "hide" the hack behind a -gdwarf-like-ctf or
> >> similar.
> >> I guess -g0.5 isn't desirable and we've taken both -g0 and -g1 already...
> >> (and -g1 doesn't include types but just decls).
> > Yeah.  And if location info isn't in CTF, you can as well add an early
> > return in add_src_coords_attributes, like it has one for UNKNOWN_LOCATION
> > already.  Or if it is there, but just file/line and not column, you can use
> > -gno-column-info.  As has been mentioned earlier, you can use dwz utility
> > post-linking instead of -fdebug-types-section.
> >
> >   Jakub
>
> Thanks for your pointers.
>
> CTF does not encode location information. So, I used an early exit in
> add_src_coords_attributes to avoid generation of location info (file, line,
> column). To answer Richard's question, CTF does have type debug info
> for function declarations and the argument types. So I think with these
> changes, both CTF and DWARF generation will emit debug info for the same
> set of types and decls.
>
> Compile with -g -gdwarf-like-ctf and use dwz -o   (using
> dwz compiled from the master branch) on the generated binaries:
>
> (coreutils-0.22)
>                .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> ls                       30616 |              1136 |          21098 |               26240 | 0.62
> pwd                      10734 |               788 |          10433 |               13929 | 0.83
> groups                   10706 |               811 |          10249 |               13378 | 0.80
>
> (emacs-26.3)
>                .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> emacs-26.3.1            674657 |              6402 |         273963 |              273910 | 0.33
>
> I chose to account for 50% of .debug_str because at this point, it will be
> unfair to not account for them. Actually, one could even argue that up to 70%
> of the .debug_str are names of entities. CTF section sizes do include the CTF
> string tables.
>
> Across coreutils, I see a geomean of 0.73 (ratio of
> .ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
> "-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
> footprint than CTF (with 50% of .debug_str accounted for).

I'm not convinced this "improvement" in size is worth maintaining another
debug-info format much less since it lacks desirable features right now
and thus evaluation is tricky.

At least you can improve dwarf size considerably with a low amount of work.

I suspect another factor where dwarf is bigger compared to CTF is that dwarf
is recording typedef names as well as qualified type variants.  But maybe
CTF just has a more compact representation for the bits it actually implements.

Richard.

> Indu
>


Re: Type representation in CTF and DWARF

2019-10-10 Thread Indu Bhagat




On 10/09/2019 12:49 AM, Jakub Jelinek wrote:

On Wed, Oct 09, 2019 at 09:41:09AM +0200, Richard Biener wrote:

There's a mechanism to get type (and decl - I suppose CTF also contains
debug info for function declarations, not only its type?) info as part of
early debug generation.
The attached "hack" simply mangles dwarf2out to output this early info as the
only debug info (only verified on a small .c file).  We still have things like
file, line and column numbers for entities (not sure if CTF has those).

It should be possible to "hide" the hack behind a -gdwarf-like-ctf or similar.
I guess -g0.5 isn't desirable and we've taken both -g0 and -g1 already...
(and -g1 doesn't include types but just decls).

Yeah.  And if location info isn't in CTF, you can as well add an early
return in add_src_coords_attributes, like it has one for UNKNOWN_LOCATION
already.  Or if it is there, but just file/line and not column, you can use
-gno-column-info.  As has been mentioned earlier, you can use dwz utility
post-linking instead of -fdebug-types-section.

Jakub


Thanks for your pointers.

CTF does not encode location information. So, I used an early exit in
add_src_coords_attributes to avoid generation of location info (file, line,
column). To answer Richard's question, CTF does have type debug info
for function declarations and the argument types. So I think with these
changes, both CTF and DWARF generation will emit debug info for the same set of
types and decls.

Compile with -g -gdwarf-like-ctf and use dwz -o   (using
dwz compiled from the master branch) on the generated binaries:

(coreutils-0.22)
               .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
ls                       30616 |              1136 |          21098 |               26240 | 0.62
pwd                      10734 |               788 |          10433 |               13929 | 0.83
groups                   10706 |               811 |          10249 |               13378 | 0.80

(emacs-26.3)
               .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
emacs-26.3.1            674657 |              6402 |         273963 |              273910 | 0.33

I chose to account for 50% of .debug_str because at this point, it will be
unfair to not account for them. Actually, one could even argue that up to 70%
of the .debug_str are names of entities. CTF section sizes do include the CTF
string tables.

Across coreutils, I see a geomean of 0.73 (ratio of
.ctf/(.debug_info + .debug_abbrev + 50% of .debug_str)). So, with the
"-gdwarf-like-ctf code stubs" and dwz, DWARF continues to have a larger
footprint than CTF (with 50% of .debug_str accounted for).
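
For anyone who wants to reproduce the ratio column, here is a minimal sketch in
C.  The struct and function names are made up for illustration; the sizes are
the ones from the tables above, and the .debug_str weight is a parameter so
that the 50% assumption can be varied (for example to 0.7).

```c
#include <assert.h>
#include <stdio.h>

/* Section sizes in bytes, copied from the tables above.  */
struct section_sizes
{
  const char *name;
  unsigned long debug_info;   /* D1 */
  unsigned long debug_abbrev; /* D2 */
  unsigned long debug_str;    /* D4 */
  unsigned long ctf;          /* .ctf, uncompressed */
};

const struct section_sizes samples[] = {
  { "ls",           30616,  1136,  21098,  26240 },
  { "pwd",          10734,   788,  10433,  13929 },
  { "groups",       10706,   811,  10249,  13378 },
  { "emacs-26.3.1", 674657, 6402, 273963, 273910 },
};

/* ratio = .ctf / (D1 + D2 + str_weight * D4); str_weight is 0.5 above.  */
double
ctf_ratio (const struct section_sizes *s, double str_weight)
{
  double dwarf = s->debug_info + s->debug_abbrev + str_weight * s->debug_str;
  assert (dwarf > 0.0);
  return s->ctf / dwarf;
}

/* Print one "name ratio" line per binary.  */
void
print_ratios (double str_weight)
{
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    printf ("%-12s %.2f\n", samples[i].name,
            ctf_ratio (&samples[i], str_weight));
}
```

Calling print_ratios (0.5) should give the ratio column above; a larger weight
counts more of .debug_str against DWARF and so lowers every ratio.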

Indu



Re: Type representation in CTF and DWARF

2019-10-09 Thread Segher Boessenkool
On Tue, Oct 08, 2019 at 10:26:13PM -0700, Indu Bhagat wrote:
> The justification for CTF is and will remain - a compact, faster debug
> format for type information and support some online debugging use-cases
> (like backtraces) in future.

Approximate backtraces, sure.  (It cannot know if another frame has been
stacked by the current function already, or not).


Segher


Re: Type representation in CTF and DWARF

2019-10-09 Thread Jakub Jelinek
On Wed, Oct 09, 2019 at 09:41:09AM +0200, Richard Biener wrote:
> There's a mechanism to get type (and decl - I suppose CTF also contains
> debug info for function declarations, not only its type?) info as part of
> early debug generation.
> The attached "hack" simply mangles dwarf2out to output this early info as the
> only debug info (only verified on a small .c file).  We still have things like
> file, line and column numbers for entities (not sure if CTF has those).
> 
> It should be possible to "hide" the hack behind a -gdwarf-like-ctf or similar.
> I guess -g0.5 isn't desirable and we've taken both -g0 and -g1 already...
> (and -g1 doesn't include types but just decls).

Yeah.  And if location info isn't in CTF, you can as well add an early
return in add_src_coords_attributes, like it has one for UNKNOWN_LOCATION
already.  Or if it is there, but just file/line and not column, you can use
-gno-column-info.  As has been mentioned earlier, you can use dwz utility
post-linking instead of -fdebug-types-section.

Jakub


Re: Type representation in CTF and DWARF

2019-10-09 Thread Richard Biener
On Wed, Oct 9, 2019 at 7:26 AM Indu Bhagat  wrote:
>
>
>
> On 10/08/2019 08:37 AM, Pedro Alves wrote:
> > On 10/4/19 8:23 PM, Indu Bhagat wrote:
> >> Hello,
> >>
> >> At GNU Tools Cauldron this year, some folks were curious to know more on
> >> how the "type representation" in CTF compares vis-a-vis DWARF.
> > I was one of those, and I brought this up to Jose, after your
> > presentation.  Glad to see the follow up!  Thanks much for this.
> >
> > In your Cauldron presentation we saw CTF compared to full blown DWARF
> > as justification for CTF,
>
> Hmm. And I thought I had made the effort required to clarify my position that
> comparing full-blown DWARF sizes to type-only CTF section sizes is not
> appropriate, let alone usable as a justification for CTF. My intention in
> showing those numbers was only to give some perspective to users curious to
> know the sizes of CTF debug info (as generated by dwarf2ctf), because these
> sections will ideally not be stripped out of shipped binaries.
>
> The justification for CTF is and will remain - a compact, faster debug format
> for type information and support some online debugging use-cases (like
> backtraces) in future.
>
> > but I was more interested in a comparison between
> > CTF and a DWARF subset containing exactly only what you have available in
> > CTF.  Because if DWARF with everything-you-don't-need stripped out
> > is in the same ballpark, then I am puzzled on why add/maintain a new
> > Debug format, with all the duplication of effort that entails going
> > forward.
>
> I shared some numbers on this in the previous emails in this thread. I thought
> comparing DWARF's de-duplication-amenable offering (using
> -fdebug-types-section) will be useful in this context.
>
> For binaries compiled with -fdebug-types-section -gdwarf-4, here is some data.
> The CTF sections are generated with dwarf2ctf because CTF link-time de-dup is
> being worked on currently. The end result of link-time CTF de-dup is expected
> to be at par with these .ctf section sizes.
>
> The .ctf section sizes below include the CTF string table (.debug_str is
> excluded from the calculations however):
>
> (coreutils-0.22)
>                .debug_info(D1) | .debug_abbrev(D2) | .debug_str | .debug_types(D3) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D3))
> ls                      109806 |             18876 |      22042 |            12413 |               26240 | 0.18
> pwd                      27902 |              7914 |      10851 |             5753 |               13929 | 0.33
> groups                   26920 |              8173 |      10674 |             5070 |               13378 | 0.33
>
> (emacs-26.3)
>                .debug_info(D1) | .debug_abbrev(D2) | .debug_str | .debug_types(D3) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D3))
> emacs                  3755083 |            202926 |     431926 |           143462 |              273910 | 0.06
>
>
> It is not easy to get an estimate of 'DWARF with everything-you-don't-need
> stripped out'. At this time, I don't know of an easy way to make this
> comparison more meaningful. Any suggestions?

There's a mechanism to get type (and decl - I suppose CTF also contains
debug info for function declarations, not only its type?) info as part of
early debug generation.
The attached "hack" simply mangles dwarf2out to output this early info as the
only debug info (only verified on a small .c file).  We still have things like
file, line and column numbers for entities (not sure if CTF has those).

It should be possible to "hide" the hack behind a -gdwarf-like-ctf or similar.
I guess -g0.5 isn't desirable and we've taken both -g0 and -g1 already...
(and -g1 doesn't include types but just decls).

Richard.

> > Also, it's my understanding that the current CTF format doesn't yet
> > support C++, Vector registers, etc., maybe other things, so if DWARF
> > was sufficient for your needs, then in the long run it sounds like
> > a better option to me, as then you wouldn't have to extend CTF _and_
> > DWARF whenever some feature is needed.
>
> Yes, CTF does not support C++ at this time. To cover all of C (including
> GNU C extensions), we need to add representation for things like Vector type,
> non IEEE float etc. (somewhat infrequently occurring constructs)
>
> The issue is not that DWARF cannot represent the required type information.
> DWARF is voluminous and secondly, the current workflow to get to CTF from
> source programs without direct toolchain support is tiresome and lengthy.
>
> For current and future users of CTF, having the support for the format in the
> toolchain is the best way to promote adoption and enhance community
> experience.
>
> > Maybe it would make sense to work on integrating CTF into the DWARF
> > standard itself, not sure?
> >
> > I was also curious on your plans for adding unwinding support to CTF,
> > while the kernel (the main CTF user, IIUC), already has plans to
> > use its own unwinding format (ORC)?
>
> Kernel's unwinding format (ORC) helps generate backtrace with 

Re: Type representation in CTF and DWARF

2019-10-08 Thread Indu Bhagat




On 10/08/2019 08:37 AM, Pedro Alves wrote:

On 10/4/19 8:23 PM, Indu Bhagat wrote:

Hello,

At GNU Tools Cauldron this year, some folks were curious to know more on how
the "type representation" in CTF compares vis-a-vis DWARF.

I was one of those, and I brought this up to Jose, after your
presentation.  Glad to see the follow up!  Thanks much for this.

In your Cauldron presentation we saw CTF compared to full blown DWARF
as justification for CTF,


Hmm. And I thought I had made the effort required to clarify my position that
comparing full-blown DWARF sizes to type-only CTF section sizes is not
appropriate, let alone usable as a justification for CTF. My intention in
showing those numbers was only to give some perspective to users curious to
know the sizes of CTF debug info (as generated by dwarf2ctf), because these
sections will ideally not be stripped out of shipped binaries.

The justification for CTF is and will remain - a compact, faster debug format
for type information and support some online debugging use-cases (like
backtraces) in future.


but I was more interested in a comparison between
CTF and a DWARF subset containing exactly only what you have available in
CTF.  Because if DWARF with everything-you-don't-need stripped out
is in the same ballpark, then I am puzzled on why add/maintain a new
Debug format, with all the duplication of effort that entails going
forward.


I shared some numbers on this in the previous emails in this thread. I thought
comparing DWARF's de-duplication-amenable offering (using
-fdebug-types-section) will be useful in this context.

For binaries compiled with -fdebug-types-section -gdwarf-4, here is some data.
The CTF sections are generated with dwarf2ctf because CTF link-time de-dup is
being worked on currently. The end result of link-time CTF de-dup is expected
to be at par with these .ctf section sizes.

The .ctf section sizes below include the CTF string table (.debug_str is
excluded from the calculations however):

(coreutils-0.22)
               .debug_info(D1) | .debug_abbrev(D2) | .debug_str | .debug_types(D3) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D3))
ls                      109806 |             18876 |      22042 |            12413 |               26240 | 0.18
pwd                      27902 |              7914 |      10851 |             5753 |               13929 | 0.33
groups                   26920 |              8173 |      10674 |             5070 |               13378 | 0.33

(emacs-26.3)
               .debug_info(D1) | .debug_abbrev(D2) | .debug_str | .debug_types(D3) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+D3))
emacs                  3755083 |            202926 |     431926 |           143462 |              273910 | 0.06


It is not easy to get an estimate of 'DWARF with everything-you-don't-need
stripped out'. At this time, I don't know of an easy way to make this comparison
more meaningful. Any suggestions?
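
As a cross-check of the ratio column, here is a small C sketch using integer
arithmetic.  The function name is made up for illustration.  Note that the
column appears to be truncated to two decimals rather than rounded: for ls,
26240 / (109806 + 18876 + 12413) = 26240 / 141095, which is about 0.186, and
the table shows 0.18.

```c
#include <assert.h>

/* Returns .ctf / (D1 + D2 + D3) in hundredths, truncated (not rounded).
   All sizes are in bytes; .debug_str is deliberately left out, as in the
   tables above.  */
unsigned long
ctf_ratio_hundredths (unsigned long d1, unsigned long d2,
                      unsigned long d3, unsigned long ctf)
{
  unsigned long dwarf = d1 + d2 + d3;
  assert (dwarf > 0);
  return (ctf * 100UL) / dwarf;
}
```

With the emacs numbers, ctf_ratio_hundredths (3755083, 202926, 143462, 273910)
gives 6, i.e. the 0.06 in the table.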


Also, it's my understanding that the current CTF format doesn't yet
support C++, Vector registers, etc., maybe other things, so if DWARF
was sufficient for your needs, then in the long run it sounds like
a better option to me, as then you wouldn't have to extend CTF _and_
DWARF whenever some feature is needed.


Yes, CTF does not support C++ at this time. To cover all of C (including
GNU C extensions), we need to add representation for things like Vector type,
non IEEE float etc. (somewhat infrequently occurring constructs)

The issue is not that DWARF cannot represent the required type information.
DWARF is voluminous and secondly, the current workflow to get to CTF from
source programs without direct toolchain support is tiresome and lengthy.

For current and future users of CTF, having the support for the format in the
toolchain is the best way to promote adoption and enhance community experience.


Maybe it would make sense to work on integrating CTF into the DWARF
standard itself, not sure?

I was also curious on your plans for adding unwinding support to CTF,
while the kernel (the main CTF user, IIUC), already has plans to
use its own unwinding format (ORC)?


Kernel's unwinding format (ORC) helps generate backtrace with function
identifiers. For some (ORCL) internal customers, the requirement is to go beyond
that and support input arg values. The requirement there is to generate
backtraces in a fast way, without relying on DWARF.


So with all those questions, I came out of the presentation
thinking that I could not really justify CTF if I were asked to.


Thanks for discussing this openly. I believe there are other GCC
maintainers who are undecided as well :)

I hope I have answered some of your concerns.


(Side note: the Cauldron page is missing slides for your
presentation, so I couldn't go and recheck some things
mentioned above.)

Thanks,
Pedro Alves


I mailed the organizers my slides. They should be online soon.

Thanks



Re: Type representation in CTF and DWARF

2019-10-08 Thread Pedro Alves
On 10/4/19 8:23 PM, Indu Bhagat wrote:
> Hello,
> 
> At GNU Tools Cauldron this year, some folks were curious to know more on how
> the "type representation" in CTF compares vis-a-vis DWARF.

I was one of those, and I brought this up to Jose, after your
presentation.  Glad to see the follow up!  Thanks much for this.

In your Cauldron presentation we saw CTF compared to full blown DWARF
as justification for CTF, but I was more interested in a comparison between
CTF and a DWARF subset containing exactly only what you have available in
CTF.  Because if DWARF with everything-you-don't-need stripped out
is in the same ballpark, then I am puzzled on why add/maintain a new
Debug format, with all the duplication of effort that entails going
forward.

Also, it's my understanding that the current CTF format doesn't yet
support C++, Vector registers, etc., maybe other things, so if DWARF
was sufficient for your needs, then in the long run it sounds like
a better option to me, as then you wouldn't have to extend CTF _and_
DWARF whenever some feature is needed.

Maybe it would make sense to work on integrating CTF into the DWARF
standard itself, not sure?

I was also curious on your plans for adding unwinding support to CTF,
while the kernel (the main CTF user, IIUC), already has plans to 
use its own unwinding format (ORC)?

So with all those questions, I came out of the presentation
thinking that I could not really justify CTF if I were asked to.

(Side note: the Cauldron page is missing slides for your
presentation, so I couldn't go and recheck some things
mentioned above.)

Thanks,
Pedro Alves



Re: Type representation in CTF and DWARF

2019-10-07 Thread Jason Merrill
On Mon, Oct 7, 2019 at 4:47 PM Indu Bhagat  wrote:

> On 10/07/2019 12:35 AM, Richard Biener wrote:
> > On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat wrote:
> >> Hello,
> >>
> >> At GNU Tools Cauldron this year, some folks were curious to know more on how
> >> the "type representation" in CTF compares vis-a-vis DWARF.
> >>
> >> [...]
> >>
> >> So, for the small C testcase with a union, enum, array, struct, typedef etc, I
> >> see following sizes :
> >>
> >> Compile with -fdebug-types-section -gdwarf-4 (size -A  excerpt):
> >>   .debug_aranges 48 0
> >>   .debug_info   150 0
> >>   .debug_abbrev 314 0
> >>   .debug_line73 0
> >>   .debug_str455 0
> >>   .debug_ranges  32 0
> >>   .debug_types  578 0
> >>
> >> Compile with -fdebug-types-section -gdwarf-5 (size -A  excerpt):
> >>   .debug_aranges  48 0
> >>   .debug_info732 0
> >>   .debug_abbrev  309 0
> >>   .debug_line 73 0
> >>   .debug_str 455 0
> >>   .debug_rnglists 23 0
> >>
> >> Compile with -gt (size -A  excerpt):
> >>   .ctf  966 0
> >>   CTF strings sub-section size (ctf_strlen in disassembly) = 374
> >>   == > CTF section just for representing types = 966 - 374 = 592 bytes
> >>   (The 592 bytes include the CTF header and other indexes etc.)
> >>
> >> So, following points are what I would highlight. Hopefully this helps you see
> >> that CTF has promise for the task of representing type debug info.
> >>
> >> 1. Type Information layout in sections:
> >>  A .ctf section is self-sufficient to represent types in a program. All
> >>  references within the CTF section are via either indexes or offsets into the
> >>  CTF section. No relocations are necessary in CTF at this time. In contrast,
> >>  DWARF type information is organized in multiple sections - .debug_info,
> >>  .debug_abbrev and .debug_str sections in DWARF5; plus .debug_types in DWARF4.
> >>
> >> 2. Type Information encoding / compactness matters:
> >>  Because the type information is organized across sections in DWARF (and
> >>  contains some debug information like location etc.), it is not feasible
> >>  to put a distinct number to the size in bytes for representing type
> >>  information in DWARF. But the size info of sections shown above should
> >>  be helpful to show that CTF does show promise in compactly representing
> >>  types.
> >>
> >>  Let's see some size data. CTF string table (= 374 bytes) is left out of the
> >>  discussion at hand because it will not be fair to compare with .debug_str
> >>  section which contains other information than just names of types.
> >>
> >>  The 592 bytes of the .ctf section are needed to represent types in CTF
> >>  format. Now, when using DWARF5, the type information needs 732 bytes in
> >>  .debug_info and 309 bytes in .debug_abbrev.
> >>
> >>  In DWARF (when using -fdebug-types-section), the base types are duplicated
> >>  across type units. So for the above example, the DWARF DIE representing
> >>  'unsigned int' will appear in both the DWARF trees for types - node and
> >>  node_payload. In CTF, there is a single lone type 'unsigned int'.
> > It's not clear to me why you are using -fdebug-types-section for this
> > comparison?
> > With just -gdwarf-4 I get
> >
> > .debug_info  292
> > .debug_abbrev 189
> > .debug_str   299
> >
> > this contains all the info CTF provides (and more).  This sums to 780 bytes,
> > smaller than the CTF variant.  I skimmed over the info and there's not much
> > to strip to get to CTF levels, mainly locations.  The strings section also
> > has a quite large portion for GCC version and arguments, which is 93 bytes.
> > So overall the DWARF representation should clock in at less than 700 bytes,
> > more close to 650.
> >
> > Richard.
>
> It's not in favor of DWARF to go with just -gdwarf-4, because the types
> in the .debug_info section will not be de-duplicated. For more complicated code
> bases with many compilation units, this will skew the results in favor of CTF
> (once the CTF de-duplicator is ready :) ).
>
> Now, one might argue that in this example, there is no role for de-duplicator.
> Yes to that. But to all users of DWARF type debug information for _real
> codebases_, -fdebug-types-section option is the best option. Isn't it?
>
> Keeping "the size of type debug information in the shipped artifact small" as
> our target is meaningful for both CTF and DWARF.
>
> De-duplication is a key contributor to reducing the size of the type debug
> information; and both CTF and DWARF types can be de-duplicated. At this time, I
> stuck to a simple example with one CU because it eases interpreting the CTF and
> DWARF 

Re: Type representation in CTF and DWARF

2019-10-07 Thread Indu Bhagat




On 10/07/2019 12:35 AM, Richard Biener wrote:

On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat  wrote:

Hello,

At GNU Tools Cauldron this year, some folks were curious to know more on how
the "type representation" in CTF compares vis-a-vis DWARF.

[...]

So, for the small C testcase with a union, enum, array, struct, typedef etc, I
see following sizes :

Compile with -fdebug-types-section -gdwarf-4 (size -A  excerpt):
  .debug_aranges 48 0
  .debug_info   150 0
  .debug_abbrev 314 0
  .debug_line73 0
  .debug_str455 0
  .debug_ranges  32 0
  .debug_types  578 0

Compile with -fdebug-types-section -gdwarf-5 (size -A  excerpt):
  .debug_aranges  48 0
  .debug_info732 0
  .debug_abbrev  309 0
  .debug_line 73 0
  .debug_str 455 0
  .debug_rnglists 23 0

Compile with -gt (size -A  excerpt):
  .ctf  966 0
  CTF strings sub-section size (ctf_strlen in disassembly) = 374
  == > CTF section just for representing types = 966 - 374 = 592 bytes
  (The 592 bytes include the CTF header and other indexes etc.)

So, following points are what I would highlight. Hopefully this helps you see
that CTF has promise for the task of representing type debug info.

1. Type Information layout in sections:
 A .ctf section is self-sufficient to represent types in a program. All
 references within the CTF section are via either indexes or offsets into
 the CTF section. No relocations are necessary in CTF at this time. In
 contrast, DWARF type information is organized in multiple sections -
 .debug_info, .debug_abbrev and .debug_str sections in DWARF5; plus
 .debug_types in DWARF4.

2. Type Information encoding / compactness matters:
 Because the type information is organized across sections in DWARF (and
 contains some debug information like location etc.) , it is not feasible
 to put a distinct number to the size in bytes for representing type
 information in DWARF. But the size info of sections shown above should
 be helpful to show that CTF does show promise in compactly representing
 types.

 Let's see some size data. CTF string table (= 374 bytes) is left out of the
 discussion at hand because it will not be fair to compare with .debug_str
 section which contains other information than just names of types.

 The 592 bytes of the .ctf section are needed to represent types in CTF
 format. Now, when using DWARF5, the type information needs 732 bytes in
 .debug_info and 309 bytes in .debug_abbrev.

 In DWARF (when using -fdebug-types-section), the base types are duplicated
 across type units. So for the above example, the DWARF DIE representing
 'unsigned int' will appear in both the  DWARF trees for types - node and
 node_payload. In CTF, there is a single lone type 'unsigned int'.
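
To illustrate the "single lone type" point, here is a deliberately simplified
C sketch of a flat type table in which every reference is an index.  This is
NOT the actual CTF encoding; the struct layout and all names are hypothetical
and only show how node and node_payload can both refer to one shared
'unsigned int' entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified layout: not the real CTF on-disk format.  */
enum type_kind { KIND_INTEGER, KIND_STRUCT };

struct type_entry
{
  enum type_kind kind;
  const char *name;
};

/* One flat table; a type reference is just an index into it.  */
const struct type_entry type_table[] = {
  { KIND_INTEGER, "unsigned int" },  /* index 0: the only copy of this type */
  { KIND_STRUCT,  "node_payload" },  /* index 1 */
  { KIND_STRUCT,  "node" },          /* index 2 */
};

struct member
{
  const char *name;
  size_t owner;  /* index of the enclosing struct type */
  size_t type;   /* index of the member's (element) type */
};

/* node_payload.npay_nelems and the elements of node.msgs are both
   'unsigned int', so both carry the same index.  */
const struct member npay_nelems = { "npay_nelems", 1, 0 };
const struct member msgs_elem   = { "msgs[]",      2, 0 };

/* Resolve a member's type reference to its table entry.  */
const struct type_entry *
member_type (const struct member *m)
{
  assert (m->type < sizeof type_table / sizeof type_table[0]);
  return &type_table[m->type];
}
```

Both members resolve to the same table entry, whereas with
-fdebug-types-section each type unit carries its own copy of the base type DIE.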

It's not clear to me why you are using -fdebug-types-section for this
comparison?
With just -gdwarf-4 I get

.debug_info  292
.debug_abbrev 189
.debug_str   299

this contains all the info CTF provides (and more).  This sums to 780 bytes,
smaller than the CTF variant.  I skimmed over the info and there's not much
to strip to get to CTF levels, mainly locations.  The strings section also
has a quite large portion for GCC version and arguments, which is 93 bytes.
So overall the DWARF representation should clock in at less than 700 bytes,
more close to 650.

Richard.


It's not in favor of DWARF to go with just -gdwarf-4, because the types
in the .debug_info section will not be de-duplicated. For more complicated code
bases with many compilation units, this will skew the results in favor of CTF
(once the CTF de-duplicator is ready :) ).

Now, one might argue that in this example, there is no role for de-duplicator.
Yes to that. But to all users of DWARF type debug information for _real
codebases_, -fdebug-types-section option is the best option. Isn't it?

Keeping "the size of type debug information in the shipped artifact small" as
our target is meaningful for both CTF and DWARF.

De-duplication is a key contributor to reducing the size of the type debug
information; and both CTF and DWARF types can be de-duplicated. At this time, I
stuck to a simple example with one CU because it eases interpreting the CTF and
DWARF debug info in the binaries and because the CTF link-time de-duplication
is not fully ready.

(NickA suggested a few days ago to compare how DWARF and CTF section sizes
 increase when a new member, or a new enum, or a new union etc are added. I can
 share some more data if there is interest in such a comparison. A few examples
 below:

1. Add a new member 'struct node_payload * a' to struct node_payload
   DWARF = 589 - 578 (.debug_types); 331 - 314 (.debug_abbrev); total = 11 + 17 = 28
   CTF = 980 - 966 

Re: Type representation in CTF and DWARF

2019-10-07 Thread Richard Biener
On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat  wrote:
>
> Hello,
>
> At GNU Tools Cauldron this year, some folks were curious to know more on how
> the "type representation" in CTF compares vis-a-vis DWARF.
>
> I use the small testcase below to gather some numbers to help drive this
> discussion.
>
> [ibhagat@ibhagatpc ctf-size]$ cat ctf_sizeme.c
> #define MAX_NUM_MSGS 5
>
> enum node_type
> {
>   INIT_TYPE = 0,
>   COMM_TYPE = 1,
>   COMP_TYPE = 2,
>   MSG_TYPE = 3,
>   RELEASE_TYPE = 4,
>   MAX_NODE_TYPE
> };
>
> typedef struct node_payload
> {
>   unsigned short npay_offset;
>   const char * npay_msg;
>   unsigned int npay_nelems;
>   struct node_payload * npay_next;
> } node_payload;
>
> typedef struct node_property
> {
>   int timestamp;
>   char category;
>   long initvalue;
> } node_property_t;
>
> typedef struct node
> {
>   enum node_type ntype;
>   int nmask:5;
>   union
>     {
>       struct node_payload * npayload;
>       void * nbase;
>     } nu;
>   unsigned int msgs[MAX_NUM_MSGS];
>   node_property_t node_prop;
> } Node;
>
> Node s;
>
> int main (void)
> {
>   return 0;
> }
>
> Note that in this case, there is nothing that the de-duplicator has to do
> (neither for the TYPE comdat sections nor CTF types). I chose such an example
> because de-duplication of types is orthogonal to the concept of representation
> of types.
>
> So, for the small C testcase with a union, enum, array, struct, typedef etc, I
> see following sizes :
>
> Compile with -fdebug-types-section -gdwarf-4 (size -A  excerpt):
>  .debug_aranges 48 0
>  .debug_info   150 0
>  .debug_abbrev 314 0
>  .debug_line73 0
>  .debug_str455 0
>  .debug_ranges  32 0
>  .debug_types  578 0
>
> Compile with -fdebug-types-section -gdwarf-5 (size -A  excerpt):
>  .debug_aranges  48 0
>  .debug_info732 0
>  .debug_abbrev  309 0
>  .debug_line 73 0
>  .debug_str 455 0
>  .debug_rnglists 23 0
>
> Compile with -gt (size -A  excerpt):
>  .ctf  966 0
>  CTF strings sub-section size (ctf_strlen in disassembly) = 374
>  == > CTF section just for representing types = 966 - 374 = 592 bytes
>  (The 592 bytes include the CTF header and other indexes etc.)
>
> So, following points are what I would highlight. Hopefully this helps you see
> that CTF has promise for the task of representing type debug info.
>
> 1. Type Information layout in sections:
> A .ctf section is self-sufficient to represent types in a program. All
> references within the CTF section are via either indexes or offsets into
> the CTF section. No relocations are necessary in CTF at this time. In
> contrast, DWARF type information is organized in multiple sections -
> .debug_info, .debug_abbrev and .debug_str sections in DWARF5; plus
> .debug_types in DWARF4.
>
> 2. Type Information encoding / compactness matters:
> Because the type information is organized across sections in DWARF (and
> contains some debug information like location etc.) , it is not feasible
> to put a distinct number to the size in bytes for representing type
> information in DWARF. But the size info of sections shown above should
> be helpful to show that CTF does show promise in compactly representing
> types.
>
> Let's see some size data. CTF string table (= 374 bytes) is left out of the
> discussion at hand because it will not be fair to compare with .debug_str
> section which contains other information than just names of types.
>
> The 592 bytes of the .ctf section are needed to represent types in CTF
> format. Now, when using DWARF5, the type information needs 732 bytes in
> .debug_info and 309 bytes in .debug_abbrev.
>
> In DWARF (when using -fdebug-types-section), the base types are duplicated
> across type units. So for the above example, the DWARF DIE representing
> 'unsigned int' will appear in both the  DWARF trees for types - node and
> node_payload. In CTF, there is a single lone type 'unsigned int'.

It's not clear to me why you are using -fdebug-types-section for this
comparison?
With just -gdwarf-4 I get

.debug_info  292
.debug_abbrev 189
.debug_str   299

this contains all the info CTF provides (and more).  This sums to 780 bytes,
smaller than the CTF variant.  I skimmed over the info and there's not much
to strip to get to CTF levels, mainly locations.  The strings section also
has quite a large portion for the GCC version and arguments, which is 93 bytes.
So overall the DWARF representation should clock in at less than 700 bytes,
closer to 650.
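
Spelling out the arithmetic as a small C sketch (the function names are made
up here; the byte counts are the ones quoted above).  Only the 93-byte
producer string is quantified, so the sketch stops at that step; the further
drop towards 650 assumes locations are stripped too, which I have not
measured.

```c
#include <assert.h>

/* Sum of the three DWARF section sizes, in bytes.  */
unsigned int
dwarf_total (unsigned int debug_info, unsigned int debug_abbrev,
             unsigned int debug_str)
{
  return debug_info + debug_abbrev + debug_str;
}

/* Size left after dropping the producer string (GCC version and
   arguments) from .debug_str.  */
unsigned int
without_producer (unsigned int total, unsigned int producer_bytes)
{
  assert (producer_bytes <= total);
  return total - producer_bytes;
}
```

dwarf_total (292, 189, 299) is 780, against 966 bytes for .ctf, and
without_producer (780, 93) is 687.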

Richard.

> 3. Type Information retrieval and handling:
> CTF type information is organized as a linear array of CTF types. CTF
> types have references to other CTF types.