[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192 Fangrui Song changed: What|Removed |Added CC||i at maskray dot me --- Comment #19 from Fangrui Song --- (In reply to Jakub Jelinek from comment #14) > This doesn't really look like a good idea to me. Instead, perhaps ld's > --gc-sections or new special option should just remove unused string > literals from mergeable sections. > With your patch, I bet you lose e.g. all tail merging. Consider: > const char *used1 () { return "foo bar baz blah blah"; } > in one TU and > const char *used2 () { return "bar baz blah blah"; } > in another. The linker necessarily knows which strings (or other data) in > mergeable sections are used and which are unused. I second Jakub's idea that the linker should perform the constant merge (which is implemented in LLD): the cost of a section header (sizeof(Elf64_Shdr)=64) + a section name (".rodata.xxx.str1.1") is quite large. Created a GNU ld (and gold) feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=26622
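To put a number on the per-section cost mentioned above, here is a minimal sketch. It assumes the section name is stored in full (with its NUL) in .shstrtab, which a real assembler or linker may shrink by sharing name suffixes. `struct shdr64` is a local mirror of the Elf64_Shdr layout and `per_section_overhead` is a hypothetical helper; neither comes from the bug report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the ELF64 section header layout (Elf64_Shdr in <elf.h>),
   redefined here so the sketch does not depend on a platform header. */
struct shdr64 {
    uint32_t sh_name;      /* offset of the name in .shstrtab */
    uint32_t sh_type;
    uint64_t sh_flags;
    uint64_t sh_addr;
    uint64_t sh_offset;
    uint64_t sh_size;
    uint32_t sh_link;
    uint32_t sh_info;
    uint64_t sh_addralign;
    uint64_t sh_entsize;
};

/* Hypothetical helper: metadata bytes one extra section costs in an object
   file, assuming the name is stored unshared in .shstrtab (an assumption). */
size_t per_section_overhead(const char *section_name)
{
    assert(section_name != NULL);
    return sizeof(struct shdr64) + strlen(section_name) + 1;
}
```

Under these assumptions, a section named ".rodata.xxx.str1.1" costs 64 + 19 = 83 bytes of metadata, which is more than most short string literals occupy themselves.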
[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192 Segher Boessenkool changed: What|Removed |Added Status|NEW |RESOLVED CC||segher at gcc dot gnu.org Resolution|--- |FIXED --- Comment #18 from Segher Boessenkool --- I'm closing this bug. If there is some other (still supported) case we do not support well, please open a new bug report.
[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #14 from Jakub Jelinek jakub at gcc dot gnu.org --- This doesn't really look like a good idea to me. Instead, perhaps ld's --gc-sections or new special option should just remove unused string literals from mergeable sections. With your patch, I bet you lose e.g. all tail merging. Consider: const char *used1 () { return "foo bar baz blah blah"; } in one TU and const char *used2 () { return "bar baz blah blah"; } in another. The linker necessarily knows which strings (or other data) in mergeable sections are used and which are unused.
[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192 --- Comment #17 from Matt Whitlock gcc at mattwhitlock dot name --- (In reply to Segher Boessenkool from comment #16) Thanks for the fix, Segher. Your patch seems more right than mine, although I will point out that it doesn't precisely address this bug report, as it places string literal data into unique sections only if -ffunction-sections is set, whereas -fdata-sections has no impact. I can see arguments both ways, and personally this distinction is irrelevant to me, as I always use both -ffunction-sections and -fdata-sections, but the new behavior does seem somewhat counter-intuitive to me. Anyway, I tested your new patch (backported to GCC 4.9.2) with the use cases in Comment 11 and Comment 15, and both produced the desired results (after I added -ffunction-sections to the command lines in Comment 15). So I'm appeased.
[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192 --- Comment #16 from Segher Boessenkool segher at gcc dot gnu.org ---
Author: segher
Date: Thu May 7 15:51:01 2015
New Revision: 222880

URL: https://gcc.gnu.org/viewcvs?rev=222880&root=gcc&view=rev
Log:
PR middle-end/192
PR middle-end/54303
* varasm.c (function_mergeable_rodata_prefix): New function.
(mergeable_string_section): Use it.
(mergeable_constant_section): Use it.

gcc/testsuite/
* gcc.dg/fdata-sections-2.c: New file.

Added:
    trunk/gcc/testsuite/gcc.dg/fdata-sections-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/varasm.c
[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192 --- Comment #15 from Matt Whitlock gcc at mattwhitlock dot name ---
(In reply to Jakub Jelinek from comment #14)
> This doesn't really look like a good idea to me. Instead, perhaps ld's
> --gc-sections or new special option should just remove unused string
> literals from mergeable sections.

I believe (I've read, but I haven't verified) that Gold already does this.

> With your patch, I bet you lose e.g. all tail merging.

Tail merging still works fine.

> Consider:
> const char *used1 () { return "foo bar baz blah blah"; }
> in one TU and
> const char *used2 () { return "bar baz blah blah"; }
> in another.

Okay, I'll use your example.

$ echo 'const char *used1 () { return "foo bar baz blah blah"; }' > tu1.c
$ echo 'const char *used2 () { return "bar baz blah blah"; }' > tu2.c
$ cat > main.c <<EOF
extern const char * used1(), * used2();
int main() {
  puts(used1());
  puts(used2());
  return 0;
}
EOF
$ gcc -c -fdata-sections -fmerge-constants -o tu1.o tu1.c
$ gcc -c -fdata-sections -fmerge-constants -o tu2.o tu2.c
$ gcc -c -fdata-sections -fmerge-constants -o main.o main.c
$ objdump -s tu1.o tu2.o | fgrep -A2 .rodata
Contents of section .rodata.str1.1.b4d3fd7d:
 0000 666f6f20 62617220 62617a20 626c6168  foo bar baz blah
 0010 20626c61 6800                         blah.
--
Contents of section .rodata.str1.1.a07ea0c2:
 0000 62617220 62617a20 626c6168 20626c61  bar baz blah bla
 0010 6800                                 h.
$ gcc -Wl,--gc-sections -o proof main.o tu1.o tu2.o
$ ./proof
foo bar baz blah blah
bar baz blah blah
$ objdump -s proof | fgrep -A2 .rodata
Contents of section .rodata:
 40061d 666f6f20 62617220 62617a20 626c6168  foo bar baz blah
 40062d 20626c61 6800                         blah.

As you can see, tail merging across translation units works fine.
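The tail-merging result above can be illustrated with a small sketch. This is not how any particular linker implements it; `is_tail_of` and `merged_size` are hypothetical names, and the sketch only handles a pair of NUL-terminated byte strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns nonzero if `tail` (including its NUL terminator) is a suffix of
   `whole`, i.e. a pointer into `whole` could stand in for `tail`. */
int is_tail_of(const char *tail, const char *whole)
{
    assert(tail != NULL && whole != NULL);
    size_t tail_len = strlen(tail);
    size_t whole_len = strlen(whole);
    if (tail_len > whole_len)
        return 0;
    /* Compare tail_len + 1 bytes so the terminators are checked too. */
    return memcmp(whole + (whole_len - tail_len), tail, tail_len + 1) == 0;
}

/* Bytes needed to store both strings if one may be folded into the
   other's tail; otherwise both are stored separately. */
size_t merged_size(const char *a, const char *b)
{
    size_t size_a = strlen(a) + 1;
    size_t size_b = strlen(b) + 1;
    if (is_tail_of(a, b))
        return size_b;
    if (is_tail_of(b, a))
        return size_a;
    return size_a + size_b;
}
```

For the two strings in the example, `merged_size` gives 22 bytes, which matches the 0x16 bytes of .rodata shown in the final objdump output.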
[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192 Rahul rahul.gundecha at gmail dot com changed: What|Removed |Added CC||rahul.gundecha at gmail dot com --- Comment #9 from Rahul rahul.gundecha at gmail dot com --- I am also experiencing the same issue. Is there any solution for it?
[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192 --- Comment #10 from Matt Whitlock gcc at mattwhitlock dot name ---
(In reply to Rahul from comment #9)
> I am also experiencing the same issue. Is there any solution for it?

You can wrap a preprocessor macro around string literals that you want to subject to the linker's garbage collection:

#define GCSTR(str) ({ static const char __str[] = str; __str; })

void hello() {
  puts(GCSTR("111")); // NOT in .rodata
  puts("222");        // in .rodata
}

int main() {
  puts(GCSTR("333")); // in .rodata
  puts("444");        // in .rodata
  return 0;
}

$ gcc -ffunction-sections -fdata-sections -Wl,--gc-sections -o gcstr gcstr.c
$ objdump -s -j .rodata gcstr

gcstr: file format elf64-x86-64

Contents of section .rodata:
 4005fd 32323200 34343400 33333300  222.444.333.

The downside of this strategy, however, is that these strings then become ineligible for merging, so if you have multiple *reachable* occurrences of the same GCSTR in your code, then you'll have multiple copies of the string data in the .rodata section of your linked binary. These redundant copies would not be present if the compiler were correctly outputting literal-initialized constant character arrays to sections with the merge and strings flags set (which it should do only if -fmerge-all-constants is set). You can simulate how this could/should work by editing the compiler's assembly output so that it sets the section flags appropriately.

Given this program, gcstr.c:

#define GCSTR(str) ({ static const char __str[] = str; __str; })

int main() {
  puts(GCSTR("111"));
  puts(GCSTR("111"));
  puts("111");
  return 0;
}

Compile (but do not assemble) the program:

$ gcc -S -ffunction-sections -fdata-sections -fmerge-all-constants -o gcstr.s gcstr.c

Edit the assembly code so that all .rodata.__str.* sections are declared with the merge and strings flags and an entity size of 1:

$ sed -e 's/\(\.section\t\.rodata\.__str\..*\),"a",\(@progbits\)$/\1,"aMS",\2,1/' -i gcstr.s

Now assemble and link the program:

$ gcc -Wl,--gc-sections -o gcstr gcstr.s

Dumping the .rodata section from the resulting executable reveals that the linker did correctly perform string merging.

$ objdump -s -j .rodata gcstr

gcstr: file format elf64-x86-64

Contents of section .rodata:
 40060d 31313100  111.

Compare the above objdump output to that which results when skipping the sed step:

 40060d 31313100 31313100 31313100  111.111.111.

The needed correction is that the compiler should, when -fmerge-all-constants is set, emit literal-initialized constant character array data to a section with flags "aMS" and entsize==sizeof(T), where T is the type of characters in the array. A further correction (and really the main request in this bug report) would be for the compiler to emit string literals to discrete sections when -fdata-sections is set.
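The size difference between the two objdump outputs above can be reproduced with a minimal sketch. It only models merging of identical strings (not tail merging), and `rodata_size` is a hypothetical helper, not anything the compiler or linker exposes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes of .rodata needed for `count` NUL-terminated strings.
   With merge == 0 every occurrence is stored (sections without the
   merge and strings flags); with merge != 0 a string identical to an
   earlier one is stored only once. */
size_t rodata_size(const char *const *strings, size_t count, int merge)
{
    assert(strings != NULL || count == 0);
    size_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        int seen_before = 0;
        if (merge) {
            for (size_t j = 0; j < i && !seen_before; ++j)
                seen_before = (strcmp(strings[i], strings[j]) == 0);
        }
        if (!seen_before)
            total += strlen(strings[i]) + 1; /* +1 for the terminator */
    }
    return total;
}
```

For the three "111" strings in gcstr.c this gives 12 bytes unmerged and 4 bytes merged, the same sizes as the two .rodata dumps.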
[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192 --- Comment #12 from H.J. Lu hjl.tools at gmail dot com ---
(In reply to Matt Whitlock from comment #11)
> Created attachment 35479 [details]
> put string literals into unique sections when -fmerge-constants
> -fdata-sections
>
> This patch puts each string literal into a (probably) unique section when
> compiling with -fmerge-constants -fdata-sections. The section name is
> constructed from the character width and string alignment (as before) plus a
> 32-bit hash of the string contents.

Would it be better to use MD5 checksum on string contents?
[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192 --- Comment #13 from Matt Whitlock gcc at mattwhitlock dot name ---
(In reply to H.J. Lu from comment #12)
> Would it be better to use MD5 checksum on string contents?

MD5 would be slower for not much gain in uniqueness (assuming its output is truncated to 32 bits). This application doesn't require a cryptographically strong hash function, as the consequence of a collision is merely that a string gets included in the binary when maybe it didn't need to be.

Actually, I would favor replacing the very old (1996) Lookup2 hash function (implemented in libiberty/hashtab.c) with a more modern hash function, such as MurmurHash3, CityHash, or even Lookup3, all of which are faster than Lookup2.

I would hesitate to use more than 32 bits, as the section names would start getting rather long.
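As a rough sketch of the naming scheme under discussion (character width, alignment, and a 32-bit content hash), the following builds names of the form .rodata.str<width>.<align>.<hash>. Note the loud assumption: it uses FNV-1a purely as a simple stand-in, not the Lookup2 function from libiberty that the patch relies on, so the hash values will not match those in the thread. `fnv1a32` and `make_section_name` are hypothetical names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a over the bytes of a NUL-terminated string.
   Stand-in hash for illustration only (assumption, see lead-in). */
uint32_t fnv1a32(const char *s)
{
    assert(s != NULL);
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Writes ".rodata.str<width>.<align>.<8 hex digits>" into buf.
   Returns what snprintf returns: the length the full name needs. */
int make_section_name(char *buf, size_t size, unsigned char_width,
                      unsigned align, const char *contents)
{
    assert(buf != NULL && contents != NULL);
    return snprintf(buf, size, ".rodata.str%u.%u.%08" PRIx32,
                    char_width, align, fnv1a32(contents));
}
```

Identical literals hash to the same name and so land in the same section, while a collision between different literals only means both are kept or dropped together, which is the low-stakes failure mode described above.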
[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192 --- Comment #11 from Matt Whitlock gcc at mattwhitlock dot name ---
Created attachment 35479
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35479&action=edit
put string literals into unique sections when -fmerge-constants -fdata-sections

This patch puts each string literal into a (probably) unique section when compiling with -fmerge-constants -fdata-sections. The section name is constructed from the character width and string alignment (as before) plus a 32-bit hash of the string contents.

Consider the following program:

void used() {
  puts("keep me");
  puts("common");
  puts("string");
  puts("tail");
}

void not_used() {
  puts("toss me");
  puts("common");
  puts("ring");
  puts("entail");
}

int main() {
  used();
  return 0;
}

$ gcc -ffunction-sections -fdata-sections -fmerge-constants \
  -Wl,--gc-sections -o test test.c

Compiling with an unpatched GCC produces a binary whose .rodata contains:

 40061d 6b656570 206d6500 636f6d6d 6f6e0073  keep me.common.s
 40062d 7472696e 6700746f 7373206d 6500656e  tring.toss me.en
 40063d 7461696c 00                          tail.

Compiling with a patched GCC produces a binary whose .rodata contains:

 40061d 6b656570 206d6500 636f6d6d 6f6e0073  keep me.common.s
 40062d 7472696e 67007461 696c00             tring.tail.
[Bug middle-end/192] String literals don't obey -fdata-sections
--- Comment #8 from maskva at searxhmash dot com 2007-04-02 21:27 --- Created an attachment (id=13319) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13319&action=view) ned -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=192
[Bug middle-end/192] String literals don't obey -fdata-sections
--- Additional Comments From amodra at bigpond dot net dot au 2004-11-30 06:36 ---
This is true of other constants too. For example, on powerpc-linux, compiling the testcase in pr9571:

gcc -O2 -m32 -fdata-sections -fno-merge-constants -S /src/tmp/pr9571.c

gives:

        .file   "pr9571.c"
        .globl d
        .section        .sdata.d,"a",@progbits
        .align 3
        .type   d, @object
        .size   d, 8
d:
        .long   1074339512
        .long   1374389535
        .section        .rodata
        .align 3
.LC0:
        .long   1074339512
        .long   1374389535
        .section        .text
        .align 2
        .p2align 4,,15
        .globl f
        .type   f, @function
f:
        lis 9,[EMAIL PROTECTED]
        lfd 1,[EMAIL PROTECTED](9)
        blr
        .size   f,.-f
        .ident  "GCC: (GNU) 4.0.0 20041129 (experimental)"
        .section        .note.GNU-stack,"",@progbits

The duplication of the constant isn't ideal either.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=192
[Bug middle-end/192] String literals don't obey -fdata-sections
-- What|Removed |Added Status|REOPENED|NEW http://gcc.gnu.org/bugzilla/show_bug.cgi?id=192