https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110945
Bug ID: 110945
Summary: std::basic_string::assign dramatically slower than other means of copying memory
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: janschultke at googlemail dot com
Target Milestone: ---

See https://quick-bench.com/q/bqGjfyd180oOlJhiY_XnURMNKG8

Using the copy constructor performs best, and ends up using std::memcpy internally. Even using .resize() and std::copy is much faster than .assign(), because it is subject to more partial loop unrolling.

basic_string::assign:
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/basic_string.h#L1713C28-L1713C28

this calls the four-iterator form of .replace():
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/basic_string.h#L2378

this calls this form of _M_replace_dispatch(): (I think)
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/basic_string.tcc#L430

this calls _M_replace():
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/basic_string.tcc#L507

in this case, it should call _S_move():
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/basic_string.h#L431

this calls char_traits::move():
https://github.com/gcc-mirror/gcc/blob/25c4b1620ebc10fceabd86a34fdbbaf8037e7e82/libstdc%2B%2B-v3/include/bits/char_traits.h#L223

and that calls __builtin_memmove()

However, I must have followed this chain of calls incorrectly, because I do not see a call to memmove in the output assembly, and most of the time is spent here:

>         nopl   (%rax)
>         movdqa 0x42d8a0(%rdx),%xmm0
> 63.27%  movups %xmm0,(%rax,%rdx,1)
> 36.69%  add    $0x10,%rdx
>  0.03%  cmp    $0x100000,%rdx
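
For readers who cannot open the quick-bench link, the three ways of copying compared above can be sketched roughly as follows. This is not the code from the linked benchmark: the function names are hypothetical, the source is assumed to be a std::string, and which .replace() overload the iterator-pair .assign() ends up in may depend on the iterator type actually used there. The sketch only checks that the three approaches produce the same result; it makes no claim about their relative speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper names, introduced only for illustration.

// 1. Copy constructor (reported above as the fastest variant).
std::string copy_construct(const std::string& src) {
    return std::string(src);
}

// 2. .resize() followed by std::copy.
std::string resize_then_copy(const std::string& src) {
    std::string dst;
    dst.resize(src.size());
    std::copy(src.begin(), src.end(), dst.begin());
    return dst;
}

// 3. Iterator-pair .assign() (reported above as the slow variant).
std::string assign_range(const std::string& src) {
    std::string dst;
    dst.assign(src.begin(), src.end());
    return dst;
}

// Sanity check: all three approaches must yield an identical copy.
void check_all(const std::string& src) {
    assert(copy_construct(src) == src);
    assert(resize_then_copy(src) == src);
    assert(assign_range(src) == src);
}
```

Calling check_all(std::string(std::size_t{1} << 20, 'x')) exercises all three on a 1 MiB string; that size is an assumption taken from the 0x100000 loop bound in the assembly above.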