[Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2021-12-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502

--- Comment #8 from Andrew Pinski  ---
(In reply to Marc Glisse from comment #3)
> (In reply to M Welinder from comment #2)
> > The destruction still stinks: the full destructor is inlined instead of
> > the small-string-only version (i.e., a no-op).  Evidently gcc cannot
> > see that the string remains a small-string.
> 
> void foo (const std::string ){
>   const_cast(s)="some longer string so it needs proper
> deletion";
> }

I submitted PR 103827 for the similar case but only passed via a hidden
reference instead.

[Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2018-10-05 Thread terra at gnome dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502

--- Comment #7 from M Welinder  ---
Actually, it's more like 50+ instructions for destructing the string that
never (or almost never) needs destructing.  16 of those appear to need
linker fixup.

Sample for the C++14 mode:




callZ3fooRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
.LEHE5:
movq(%rsp), %rbx
leaq16(%rsp), %rax
cmpq%rax, %rbx
je  .L72
movq16(%rsp), %rsi
addq$1, %rsi
je  .L72
testq   %rbx, %rbx
je  .L72
cmpq$128, %rsi
ja  .L73
movl_ZN9__gnu_cxx12__pool_allocIcE12_S_force_newE(%rip), %r9d
testl   %r9d, %r9d
jle .L74
.L73:
movq%rbx, %rdi
call_ZdlPv
.L72:
[done]
...
[out-of-band code]
.p2align 4,,10
.p2align 3
.L74:
movq%rsp, %rdi
movl$_ZL28__gthrw___pthread_key_createPjPFvPvE, %ebp
call_ZN9__gnu_cxx17__pool_alloc_base16_M_get_free_listEm
movq%rsp, %rdi
movq%rax, %r12
call_ZN9__gnu_cxx17__pool_alloc_base12_M_get_mutexEv
movq%rax, %r13
testq   %rbp, %rbp
je  .L75
movq%rax, %rdi
call_ZL26__gthrw_pthread_mutex_lockP15pthread_mutex_t
testl   %eax, %eax
je  .L75
movl$8, %edi
call__cxa_allocate_exception
movl$_ZN9__gnu_cxx24__concurrence_lock_errorD1Ev, %edx
movl$_ZTIN9__gnu_cxx24__concurrence_lock_errorE, %esi
movq$_ZTVN9__gnu_cxx24__concurrence_lock_errorE+16, (%rax)
movq%rax, %rdi
.LEHB20:
call__cxa_throw
.LEHE20:
.p2align 4,,10
.p2align 3
.L75:
movq(%r12), %rax
movq%rax, (%rbx)
movq%rbx, (%r12)
testq   %rbp, %rbp
je  .L72
movq%r13, %rdi
call_ZL28__gthrw_pthread_mutex_unlockP15pthread_mutex_t
testl   %eax, %eax
je  .L72
movl$8, %edi
call__cxa_allocate_exception
movl$_ZN9__gnu_cxx26__concurrence_unlock_errorD1Ev, %edx
movl$_ZTIN9__gnu_cxx26__concurrence_unlock_errorE, %esi
movq$_ZTVN9__gnu_cxx26__concurrence_unlock_errorE+16, (%rax)
movq%rax, %rdi
.LEHB21:
call__cxa_throw

[Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2018-10-05 Thread terra at gnome dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502

--- Comment #6 from M Welinder  ---
> const_cast(s)="some longer string so it needs proper deletion";

Is that really valid?

This comes down to whether the temporary object creating with the function
call is constant [in which case the above is UB] or not [in which case the
above is perfectly fine].

If I am reading http://cpp14.centaur.ath.cx/dcl.init.ref.html right then
the object is created with the const/volatile specifiers of the argument
type, in this case const.  If so, such an object must not be changed.

Just in case I am reading it wrong, I still say the generated code is
excessive.  gcc creates 16 instructions that, among other things, check
the _S_force_new flag.  Most of those instructions will never be executed
in practice.  I think the generated code, at least for the
temporary-from-small-string case, should simply be

movq (%rsp), %rdi
leaq 16(%rsp), %rax
cmpq %rax, %rdi;; check if buffer is allocated
je .L42
call do_the_complicated_thing
.L42:

[Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2018-10-05 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502

--- Comment #5 from Andrew Pinski  ---
The C++17 calling a function part of this issue is record as PR 86590 already.

The rest is a different issue and should be looked into.

[Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2018-10-05 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502

Martin Liška  changed:

   What|Removed |Added

 CC||redi at gcc dot gnu.org

--- Comment #4 from Martin Liška  ---
Ccing Jonathan, libstdc++ maintainer.

[Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2018-10-04 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502

--- Comment #3 from Marc Glisse  ---
(In reply to M Welinder from comment #2)
> The destruction still stinks: the full destructor is inlined instead of
> the small-string-only version (i.e., a no-op).  Evidently gcc cannot
> see that the string remains a small-string.

void foo (const std::string ){
  const_cast(s)="some longer string so it needs proper deletion";
}

[Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2018-10-04 Thread terra at gnome dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502

--- Comment #2 from M Welinder  ---
Created attachment 44789
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44789=edit
Proof-on-concept patch

This proof-of-concept improves the situation dramatically.  I don't
know if it is actually correct.  This is done, in part, by not discarding
the fact that the source is zero-terminated.

With this patch I get the code I expect, mostly.  For sizes 2 mod 4 I
still get the zero termination as a separate movb.

The destruction still stinks: the full destructor is inlined instead of
the small-string-only version (i.e., a no-op).  Evidently gcc cannot
see that the string remains a small-string.

[Bug libstdc++/87502] Poor code generation for std::string("c-style string")

2018-10-04 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87502

Martin Liška  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org

--- Comment #1 from Martin Liška  ---
(In reply to M Welinder from comment #0)
> Created attachment 44776 [details]
> Preprocessed source code
> 
> It appears that gcc is creating quite poor code when "c-style strings"
> are used to construct std::string objects.  Ideally, the result ought
> to be just a few move instructions for small strings.
> 
> 
> Host: Linux x86_64 4.4.140-62-default (OpenSuSE)
> 
> Test code:
> ---
> #include 
> 
> extern void foo (const std::string &);
> 
> void
> bar ()
> {
>   foo ("abc");
>   foo (std::string("abc"));
> }
> ---
> 
> 
> 
> # /usr/local/products/gcc/8.2.0/bin/g++ -std=gnu++1z  -S -m32 -O3 ttt.C
> # grep 'call.*construct' ttt.s 
>   call
> _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPKcEEvT_S
> 8_St20forward_iterator_tag.constprop.18
>   call
> _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE12_M_constructIPKcEEvT_S
> 8_St20forward_iterator_tag.constprop.18
> 
> Here gcc generates complete calls to the generic string construction
> even though the strings are constructed from small, known strings.

With -O2 -fdump-ipa-inline says:
function not declared inline and code size would grow

> 
> "-std=gnu++1z" is important; "-m32" and "-O3" (as opposed to "-m64" and
> "-O2") are not.

With -O3 more inlining happens.

> 
> # /usr/local/products/gcc/8.2.0/bin/g++ -S -m32 -O3 ttt.C
> # grep 'call.*construct' ttt.s
> # (nada)
> 
> No calls -- good.  In this case gcc generates this fragment:
> 
> _Z3barv:
> .LFB1084:
>   .cfi_startproc
>   .cfi_personality 0,__gxx_personality_v0
>   .cfi_lsda 0,.LLSDA1084
>   pushl   %ebp
>   .cfi_def_cfa_offset 8
>   .cfi_offset 5, -8
>   movl$25185, %edx
>   movl%esp, %ebp
>   .cfi_def_cfa_register 5
>   pushl   %edi
>   pushl   %esi
>   .cfi_offset 7, -12
>   .cfi_offset 6, -16
>   leal-48(%ebp), %esi
>   pushl   %ebx
>   .cfi_offset 3, -20
>   leal-40(%ebp), %ebx
>   subl$56, %esp
>   movl%ebx, -48(%ebp)
>   pushl   %esi
>   movw%dx, -40(%ebp)
>   movb$99, -38(%ebp)
>   movl$3, -44(%ebp)
>   movb$0, -37(%ebp)
> .LEHB6:
>   .cfi_escape 0x2e,0x10
>   call_Z3fooRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
> [...]
> 
> This is better than a call, but not great:
> 1. The string is moved into position in three chunks (25185, 99, 0).
>This probably comes from inlined memcpy of 3 bytes, but the source
>is zero-terminated so rounding the memcpy size up to 4 would have
>been better.

Yes we end up with:
  __builtin_memcpy (_M_local_buf, "abc", 3);


> 2. It's unclear why 25185 is passed through a register.

It's somehow connected to fact that constant are somehow expensive
on x86_64. Jakub can help here..