[Bug middle-end/192] String literals don't obey -fdata-sections

2020-09-15 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #19 from Fangrui Song  ---
(In reply to Jakub Jelinek from comment #14)
> This doesn't really look like a good idea to me.  Instead, perhaps ld's
> --gc-sections or new special option should just remove unused string
> literals from mergeable sections.
> With your patch, I bet you lose e.g. all tail merging.  Consider:
> const char *used1 () { return "foo bar baz blah blah"; }
> in one TU and
> const char *used2 () { return "bar baz blah blah"; }
> in another.  The linker necessarily knows which strings (or other data) in
> mergeable sections are used and which are unused.

I second Jakub's idea that the linker should perform the constant merge (which
is implemented in LLD): the cost of a section header (sizeof(Elf64_Shdr)=64) +
a section name (".rodata.xxx.str1.1") is quite large.
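
To put a rough number on that cost, here is a minimal sketch. The 64-byte header size and the example section name come from the comment above; the helper name `section_overhead` is hypothetical, and the estimate ignores symbol-table and relocation entries.

```c
#include <stddef.h>
#include <string.h>

/* Size of one ELF64 section header, as quoted in the comment above. */
enum { ELF64_SHDR_SIZE = 64 };

/* Hypothetical helper: bytes of metadata that one extra section adds to
   an object file, counting the section header plus the NUL-terminated
   section name stored in the section-name string table. */
size_t section_overhead(const char *section_name)
{
    return ELF64_SHDR_SIZE + strlen(section_name) + 1;
}
```

For ".rodata.xxx.str1.1" this comes to 83 bytes per section, which is larger than many of the short string literals such a section would hold.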

Created a GNU ld (and gold) feature request:
https://sourceware.org/bugzilla/show_bug.cgi?id=26622

[Bug middle-end/192] String literals don't obey -fdata-sections

2015-11-25 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||segher at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #18 from Segher Boessenkool  ---
I'm closing this bug.  If there is some other (still supported) case we
do not support well, please open a new bug report.

[Bug middle-end/192] String literals don't obey -fdata-sections

2015-05-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #14 from Jakub Jelinek jakub at gcc dot gnu.org ---
This doesn't really look like a good idea to me.  Instead, perhaps ld's
--gc-sections or new special option should just remove unused string literals
from mergeable sections.
With your patch, I bet you lose e.g. all tail merging.  Consider:
const char *used1 () { return "foo bar baz blah blah"; }
in one TU and
const char *used2 () { return "bar baz blah blah"; }
in another.  The linker necessarily knows which strings (or other data) in
mergeable sections are used and which are unused.


[Bug middle-end/192] String literals don't obey -fdata-sections

2015-05-07 Thread gcc at mattwhitlock dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #17 from Matt Whitlock gcc at mattwhitlock dot name ---
(In reply to Segher Boessenkool from comment #16)

Thanks for the fix, Segher. Your patch seems more right than mine, although I
will point out that it doesn't precisely address this bug report, as it places
string literal data into unique sections only if -ffunction-sections is set,
whereas -fdata-sections has no impact. I can see arguments both ways, and
personally this distinction is irrelevant to me, as I always use both
-ffunction-sections and -fdata-sections, but the new behavior does seem
somewhat counter-intuitive to me.

Anyway, I tested your new patch (backported to GCC 4.9.2) with the use cases in
Comment 11 and Comment 15, and both produced the desired results (after I added
-ffunction-sections to the command lines in Comment 15). So I'm appeased.
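
To make the difference from the hash-based approach concrete, here is a small sketch of a per-function section name. It is not GCC's code: the helper `function_rodata_section_name` is hypothetical, and the name layout is only assumed from the ".rodata.xxx.str1.1" form that appears elsewhere in this thread.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not taken from varasm.c: build a mergeable
   string section name keyed on the enclosing function, in the assumed
   form ".rodata.<function>.str<char width>.<alignment>".
   Returns the length snprintf would have written. */
int function_rodata_section_name(char *buf, size_t size,
                                 const char *function,
                                 unsigned char_width, unsigned align)
{
    return snprintf(buf, size, ".rodata.%s.str%u.%u",
                    function, char_width, align);
}
```

With names of this shape, all literals of one function share a section, so the linker can discard them together with the function when nothing references it.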


[Bug middle-end/192] String literals don't obey -fdata-sections

2015-05-07 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #16 from Segher Boessenkool segher at gcc dot gnu.org ---
Author: segher
Date: Thu May  7 15:51:01 2015
New Revision: 222880

URL: https://gcc.gnu.org/viewcvs?rev=222880&root=gcc&view=rev
Log:
PR middle-end/192
PR middle-end/54303
* varasm.c (function_mergeable_rodata_prefix): New function.
(mergeable_string_section): Use it.
(mergeable_constant_section): Use it.

gcc/testsuite/
* gcc.dg/fdata-sections-2.c: New file.

Added:
trunk/gcc/testsuite/gcc.dg/fdata-sections-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/varasm.c


[Bug middle-end/192] String literals don't obey -fdata-sections

2015-05-07 Thread gcc at mattwhitlock dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #15 from Matt Whitlock gcc at mattwhitlock dot name ---
(In reply to Jakub Jelinek from comment #14)
> This doesn't really look like a good idea to me.  Instead, perhaps ld's
> --gc-sections or new special option should just remove unused string
> literals from mergeable sections.

I believe (I've read, but I haven't verified) that Gold already does this.

> With your patch, I bet you lose e.g. all tail merging.

Tail merging still works fine.

> Consider:
> const char *used1 () { return "foo bar baz blah blah"; }
> in one TU and
> const char *used2 () { return "bar baz blah blah"; }
> in another.

Okay, I'll use your example.

$ echo 'const char *used1 () { return "foo bar baz blah blah"; }' > tu1.c
$ echo 'const char *used2 () { return "bar baz blah blah"; }' > tu2.c
$ cat > main.c <<EOF
#include <stdio.h>
extern const char * used1(), * used2();
int main() { puts(used1()); puts(used2()); return 0; }
EOF

$ gcc -c -fdata-sections -fmerge-constants -o tu1.o tu1.c
$ gcc -c -fdata-sections -fmerge-constants -o tu2.o tu2.c
$ gcc -c -fdata-sections -fmerge-constants -o main.o main.c

$ objdump -s tu1.o tu2.o | fgrep -A2 .rodata
Contents of section .rodata.str1.1.b4d3fd7d:
 0000 666f6f20 62617220 62617a20 626c6168  foo bar baz blah
 0010 20626c61 6800                         blah.
--
Contents of section .rodata.str1.1.a07ea0c2:
 0000 62617220 62617a20 626c6168 20626c61  bar baz blah bla
 0010 6800                                 h.

$ gcc -Wl,--gc-sections -o proof main.o tu1.o tu2.o

$ ./proof
foo bar baz blah blah
bar baz blah blah

$ objdump -s proof | fgrep -A2 .rodata
Contents of section .rodata:
 40061d 666f6f20 62617220 62617a20 626c6168  foo bar baz blah
 40062d 20626c61 6800                         blah.

As you can see, tail merging across translation units works fine.
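
The check behind tail merging is simple: one string can share storage with another when it is a suffix of it. A minimal sketch follows (the function name `tail_offset` is made up for illustration; this is not how any particular linker implements it):

```c
#include <stddef.h>
#include <string.h>

/* If `shorter` is a tail (suffix) of `longer`, return the offset inside
   `longer` at which `shorter` begins, so both strings can share the
   same bytes and the same terminating NUL. Return -1 otherwise. */
long tail_offset(const char *longer, const char *shorter)
{
    size_t longer_len = strlen(longer);
    size_t shorter_len = strlen(shorter);

    if (shorter_len > longer_len)
        return -1;

    size_t offset = longer_len - shorter_len;
    return strcmp(longer + offset, shorter) == 0 ? (long)offset : -1;
}
```

For the two strings in the example above the offset is 4, which is why the linked .rodata only needs to contain "foo bar baz blah blah".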


[Bug middle-end/192] String literals don't obey -fdata-sections

2015-05-06 Thread rahul.gundecha at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

Rahul rahul.gundecha at gmail dot com changed:

   What|Removed |Added

 CC||rahul.gundecha at gmail dot com

--- Comment #9 from Rahul rahul.gundecha at gmail dot com ---
I am also experiencing the same issue. Is there any solution for it?


[Bug middle-end/192] String literals don't obey -fdata-sections

2015-05-06 Thread gcc at mattwhitlock dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #10 from Matt Whitlock gcc at mattwhitlock dot name ---
(In reply to Rahul from comment #9)
> I am also experiencing the same issue. Is there any solution for it?

You can wrap a preprocessor macro around string literals that you want to
subject to the linker's garbage collection:

  #include <stdio.h>

  #define GCSTR(str) ({ static const char __str[] = str; __str; })

  void hello() {
      puts(GCSTR("111")); // NOT in .rodata
      puts("222");        // in .rodata
  }

  int main() {
      puts(GCSTR("333")); // in .rodata
      puts("444");        // in .rodata
      return 0;
  }

$ gcc -ffunction-sections -fdata-sections -Wl,--gc-sections -o gcstr gcstr.c

$ objdump -s -j .rodata gcstr

  gcstr: file format elf64-x86-64

  Contents of section .rodata:
   4005fd 32323200 34343400 33333300           222.444.333.

The downside of this strategy, however, is that these strings then become
ineligible for merging, so if you have multiple *reachable* occurrences of the
same GCSTR in your code, then you'll have multiple copies of the string data in
the .rodata section of your linked binary.
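
A small sketch of that downside, reusing the macro from above (which relies on the GNU statement-expression extension). The function names `first_copy`, `second_copy` and `is_duplicate` are hypothetical.

```c
#include <string.h>

/* Same macro as above: each expansion defines its own static array. */
#define GCSTR(str) ({ static const char __str[] = str; __str; })

/* Two reachable uses of the same text, each backed by its own array. */
const char *first_copy(void)  { return GCSTR("111"); }
const char *second_copy(void) { return GCSTR("111"); }

/* Returns 1 when both pointers hold equal text at different addresses,
   that is, when the string data is present twice. */
int is_duplicate(const char *a, const char *b)
{
    return strcmp(a, b) == 0 && a != b;
}
```

Because each expansion names a distinct object, `is_duplicate(first_copy(), second_copy())` is expected to return 1 unless the compiler has been told it may merge such arrays.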

These redundant copies would not be present if the compiler were correctly
outputting literal-initialized constant character arrays to sections with the
merge and strings flags set (which it should do only if
-fmerge-all-constants is set). You can simulate how this could/should work by
editing the compiler's assembly output so that it sets the section flags
appropriately.

Given this program, gcstr.c:

  #define GCSTR(str) ({ static const char __str[] = str; __str; })

  int main() {
      puts(GCSTR("111"));
      puts(GCSTR("111"));
      puts("111");
  return 0;
  }

Compile (but do not assemble) the program:

$ gcc -S -ffunction-sections -fdata-sections -fmerge-all-constants \
  -o gcstr.s gcstr.c

Edit the assembly code so that all .rodata.__str.* sections are declared with
the merge and strings flags and an entity size of 1:

$ sed -e 's/\(\.section\t\.rodata\.__str\..*\),"a",\(@progbits\)$/\1,"aMS",\2,1/' \
  -i gcstr.s

Now assemble and link the program:

$ gcc -Wl,--gc-sections -o gcstr gcstr.s

Dumping the .rodata section from the resulting executable reveals that the
linker did correctly perform string merging.

$ objdump -s -j .rodata gcstr

  gcstr: file format elf64-x86-64

  Contents of section .rodata:
   40060d 31313100 111.

Compare the above objdump output to that which results when skipping the sed
step:

   40060d 31313100 31313100 31313100   111.111.111.

The needed correction is that the compiler should, when -fmerge-all-constants
is set, emit literal-initialized constant character array data to a section
with flags aMS and entsize==sizeof(T), where T is the type of characters in
the array.
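
As an illustration of what such a directive looks like in GNU assembler syntax, here is a sketch that formats one. The helper `format_mergeable_section` and the section name used in the usage note are hypothetical; only the "aMS" flags and the trailing entity size come from the description above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write a .section directive for a mergeable
   string section. "aMS" means allocatable, mergeable, strings; the
   final field is the entity size, i.e. sizeof(T) for the character
   type T of the array. Returns the length snprintf would have written. */
int format_mergeable_section(char *buf, size_t size,
                             const char *name, size_t char_size)
{
    return snprintf(buf, size, "\t.section\t%s,\"aMS\",@progbits,%zu",
                    name, char_size);
}
```

Called with a name such as ".rodata.__str.0" and a size of 1, this yields the same kind of line the sed command above produces; a wide-character array would pass sizeof(wchar_t) instead.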

A further correction (and really the main request in this bug report) would be
for the compiler to emit string literals to discrete sections when
-fdata-sections is set.


[Bug middle-end/192] String literals don't obey -fdata-sections

2015-05-06 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #12 from H.J. Lu hjl.tools at gmail dot com ---
(In reply to Matt Whitlock from comment #11)
> Created attachment 35479 [details]
> put string literals into unique sections when -fmerge-constants
> -fdata-sections
>
> This patch puts each string literal into a (probably) unique section when
> compiling with -fmerge-constants -fdata-sections. The section name is
> constructed from the character width and string alignment (as before) plus a
> 32-bit hash of the string contents.

Would it be better to use an MD5 checksum of the string contents?


[Bug middle-end/192] String literals don't obey -fdata-sections

2015-05-06 Thread gcc at mattwhitlock dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #13 from Matt Whitlock gcc at mattwhitlock dot name ---
(In reply to H.J. Lu from comment #12)
> Would it be better to use MD5 checksum on string contents?

MD5 would be slower for not much gain in uniqueness (assuming its output is
truncated to 32 bits). This application doesn't require a cryptographically
strong hash function, as the consequence of a collision is merely that a string
gets included in the binary when maybe it didn't need to be.

Actually, I would favor replacing the very old (1996) Lookup2 hash function
(implemented in libiberty/hashtab.c) with a more modern hash function, such as
MurmurHash3, CityHash, or even Lookup3, all of which are faster than Lookup2.

I would hesitate to use more than 32 bits, as the section names would start
getting rather long.
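
For a feel of the resulting names, here is a sketch that builds one from the character width, the alignment and a 32-bit hash. It uses FNV-1a purely as a stand-in, not the libiberty hash that the patch uses, so the hex digits will not match the patched compiler's output; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a over the bytes of a NUL-terminated string. This is a
   stand-in hash chosen for brevity, not the one used by the patch. */
uint32_t fnv1a32(const char *s)
{
    uint32_t hash = 2166136261u;          /* FNV offset basis */
    for (; *s != '\0'; ++s) {
        hash ^= (unsigned char)*s;
        hash *= 16777619u;                /* FNV prime */
    }
    return hash;
}

/* Hypothetical helper: ".rodata.str<width>.<align>.<8 hex digits>".
   Returns the length snprintf would have written. */
int hashed_section_name(char *buf, size_t size, const char *literal,
                        unsigned char_width, unsigned align)
{
    return snprintf(buf, size, ".rodata.str%u.%u.%08x",
                    char_width, align, (unsigned)fnv1a32(literal));
}
```

With a 32-bit hash the suffix is always eight hex digits, so a name for a one-byte-wide, one-byte-aligned string is 23 characters long; a 64-bit hash would add eight more to every section name.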


[Bug middle-end/192] String literals don't obey -fdata-sections

2015-05-06 Thread gcc at mattwhitlock dot name
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #11 from Matt Whitlock gcc at mattwhitlock dot name ---
Created attachment 35479
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35479&action=edit
put string literals into unique sections when -fmerge-constants -fdata-sections

This patch puts each string literal into a (probably) unique section when
compiling with -fmerge-constants -fdata-sections. The section name is
constructed from the character width and string alignment (as before) plus a
32-bit hash of the string contents.

Consider the following program:

  void used() {
      puts("keep me");
      puts("common");
      puts("string");
      puts("tail");
  }

  void not_used() {
      puts("toss me");
      puts("common");
      puts("ring");
      puts("entail");
  }

  int main() {
      used();
      return 0;
  }

$ gcc -ffunction-sections -fdata-sections -fmerge-constants \
  -Wl,--gc-sections -o test test.c

Compiling with an unpatched GCC produces a binary whose .rodata contains:

   40061d 6b656570 206d6500 636f6d6d 6f6e0073  keep me.common.s
   40062d 7472696e 6700746f 7373206d 6500656e  tring.toss me.en
   40063d 7461696c 00                          tail.

Compiling with a patched GCC produces a binary whose .rodata contains:

   40061d 6b656570 206d6500 636f6d6d 6f6e0073  keep me.common.s
   40062d 7472696e 67007461 696c00             tring.tail.


[Bug middle-end/192] String literals don't obey -fdata-sections

2007-04-02 Thread maskva at searxhmash dot com


--- Comment #8 from maskva at searxhmash dot com  2007-04-02 21:27 ---
Created an attachment (id=13319)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13319&action=view)
ned


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=192



[Bug middle-end/192] String literals don't obey -fdata-sections

2004-11-29 Thread amodra at bigpond dot net dot au

--- Additional Comments From amodra at bigpond dot net dot au  2004-11-30 06:36 ---
This is true of other constants too.  For example, on powerpc-linux, compiling
the testcase in pr9571:
gcc -O2 -m32 -fdata-sections -fno-merge-constants -S /src/tmp/pr9571.c
gives:
	.file	"pr9571.c"
	.globl d
	.section	.sdata.d,"a",@progbits
	.align 3
	.type	d, @object
	.size	d, 8
d:
	.long	1074339512
	.long	1374389535
	.section	.rodata
	.align 3
.LC0:
	.long	1074339512
	.long	1374389535
	.section	.text
	.align 2
	.p2align 4,,15
	.globl f
	.type	f, @function
f:
	lis 9,[EMAIL PROTECTED]
	lfd 1,[EMAIL PROTECTED](9)
	blr
	.size	f,.-f
	.ident	"GCC: (GNU) 4.0.0 20041129 (experimental)"
	.section	.note.GNU-stack,"",@progbits

The duplication of the constant isn't ideal either.
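
Both `d` and `.LC0` carry the same pair of `.long` words. A minimal sketch for decoding such a pair, assuming the first word is the high half (as on big-endian powerpc) and that the host's `double` is IEEE 754 binary64 with the same byte order as its 64-bit integers; the function name `words_to_double` is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Combine the two 32-bit words of a big-endian double constant, high
   word first, and reinterpret the 64 bits as a double. */
double words_to_double(uint32_t high, uint32_t low)
{
    uint64_t bits = ((uint64_t)high << 32) | low;
    double value;

    memcpy(&value, &bits, sizeof value);  /* avoids aliasing problems */
    return value;
}
```

Feeding it 1074339512 and 1374389535 gives 3.14, so the object file stores that one value twice: once in .sdata.d and once in .rodata.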

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=192


[Bug middle-end/192] String literals don't obey -fdata-sections

2004-10-13 Thread pinskia at gcc dot gnu dot org


-- 
   What|Removed |Added

 Status|REOPENED|NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=192