https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356

            Bug ID: 91356
           Summary: Poor optimization of calls involving std::unique_ptr
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nisse at lysator dot liu.se
  Target Milestone: ---

The naïve understanding of unique_ptr is that it is handled the same
way as a raw pointer, with just

* additional compile time safety checks, and

* automatic runtime calls to delete whenever a non-null unique_ptr
  goes out of scope.

However, the calling convention for unique_ptr implies a *lot* more
overhead than passing a raw pointer. For a start, a unique_ptr is not
passed in a register but by "invisible reference", that is, as the
address of an object in memory. To make things worse, the invisible
reference refers to a temporary object that the caller is responsible
for destroying.
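Size is not what selects the invisible-reference convention. In the Itanium C++ ABI that g++ uses on x86_64 GNU/Linux, a class that is "non-trivial for the purposes of calls" (a non-trivial copy or move constructor, or a non-trivial destructor) is passed as the address of a temporary owned by the caller. A minimal compile-time sketch: the helper `approx_trivial_for_calls` is a hypothetical, rough approximation of that rule (not an ABI or library API), and the size check reflects libstdc++ rather than a guarantee of the standard.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical, rough approximation of the Itanium ABI rule: a class may be
// passed like a scalar only if its destructor and copy/move constructors
// are all trivial. Type traits cannot express the exact rule, so this errs
// on the side of "non-trivial" for move-only types with trivial moves.
template <class T>
constexpr bool approx_trivial_for_calls() {
    return std::is_trivially_destructible<T>::value &&
           std::is_trivially_copy_constructible<T>::value &&
           std::is_trivially_move_constructible<T>::value;
}

// In libstdc++ the empty default_delete<int> takes no storage, so the
// object is a single pointer (the standard does not require this).
static_assert(sizeof(std::unique_ptr<int>) == sizeof(int*),
              "unique_ptr<int> is pointer-sized");

// A raw pointer passes the check and travels in a register...
static_assert(approx_trivial_for_calls<int*>(), "int* is trivial for calls");

// ...but unique_ptr has a deleted copy constructor, a move constructor that
// nulls its source and a destructor that may delete, so the caller has to
// build a temporary in memory and pass its address instead.
static_assert(!approx_trivial_for_calls<std::unique_ptr<int>>(),
              "unique_ptr<int> is non-trivial for calls");
```

Same size, different convention: the difference comes entirely from the special member functions.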

Consider a function just passing on a unique_ptr:

  void bar(std::unique_ptr<int> p);
  void baz(std::unique_ptr<int> p) { bar(std::move(p)); }

This compiles (with g++-8 -O3 -fno-exceptions, on GNU/Linux x86_64)
to

  _Z3bazSt10unique_ptrIiSt14default_deleteIiEE:
          subq    $24, %rsp
          movq    (%rdi), %rax
          movq    $0, (%rdi)
          leaq    8(%rsp), %rdi
          movq    %rax, 8(%rsp)
          call    _Z3barSt10unique_ptrIiSt14default_deleteIiEE@PLT
          movq    8(%rsp), %rdi
          testq   %rdi, %rdi
          je      .L6
          movl    $4, %esi
          call    _ZdlPvm@PLT
  .L6:
          addq    $24, %rsp
          ret

As I read this, the steps are

1. Allocate a new temporary unique_ptr on the stack.

2. Move-construct it from the input argument (pointed to by %rdi).

3. Put the address of the object in %rdi, and invoke the bar function.

4. Destroy the temporary object: a null test and a branch, and, if the
   pointer is non-null, a call to operator delete (_ZdlPvm) to free the int.
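The four steps can be modelled at the source level. This is a hand-lowering, not what g++ literally generates: `Tracked`, `sink`, `bar_lowered`, `baz_lowered` and the `keep` flag are all hypothetical, and a pointer parameter stands in for the invisible reference so that the caller-side destruction in step 4 becomes visible.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical payload that counts live objects, so the effect of the
// caller-side destructor (step 4) can be observed. The report uses int.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Hypothetical owner that bar may hand the object to.
std::unique_ptr<Tracked> sink;

// Model of bar as seen through the ABI: it only receives the address of a
// temporary that its caller owns. If `keep` is true it takes ownership by
// moving out, which leaves the temporary null.
void bar_lowered(std::unique_ptr<Tracked>* p, bool keep) {
    if (keep) sink = std::move(*p);
}

// Hand-lowered model of baz, following the four steps read off the assembly.
void baz_lowered(std::unique_ptr<Tracked>* p, bool keep) {
    // Steps 1-2: a temporary in baz's own frame, move-constructed from the
    // argument; this is what nulls the source ("movq $0, (%rdi)").
    std::unique_ptr<Tracked> tmp(std::move(*p));
    // Step 3: only the temporary's address is passed on to bar.
    bar_lowered(&tmp, keep);
    // Step 4: tmp's destructor runs here, in the caller: a null test, and
    // operator delete only if bar left the object in the temporary.
}
```

With `keep` false the object is freed in step 4, after bar has returned; with `keep` true the temporary is already null by then, yet baz still has to emit the test because it cannot see what bar did.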

The unique_ptr version above can be compared to the raw pointer version,

  void bar(int* p);
  void baz(int* p) { bar(p); }

which compiles to a single jump instruction:

  _Z3bazPi:
          jmp     _Z3barPi@PLT

As far as I understand, this cannot really be fixed in the compiler or
library alone; it is also an ABI issue. I see two somewhat independent
changes needed to make the calling convention for unique_ptr more
efficient:

1. Move responsibility for destroying the temporary object from the
   caller to the callee. This is particularly nice for unique_ptr, since
   the callee often knows statically that the unique_ptr is null when it
   goes out of scope, and then both the null test and the deallocation
   call can be optimized away completely. I don't fully understand the
   C++ rules on destruction order, but I've been told that callee
   destruction is allowed by the language specification (and used in
   the i386-pc-win32 ABI). It's less clear whether a forwarding function
   like baz(std::unique_ptr<int> p) can delegate that responsibility
   further.

2. Make it possible to pass small objects in registers, even if they
   have a non-trivial destructor or copy/move constructor. In
   particular, invoke the unique_ptr destructor with the object to be
   destroyed in a register.

   The callee may then need to move the object to memory if, for some
   reason, it needs a pointer to it. To allow that move, one may need
   something like a "relocatable" property,
   https://quuxplusone.github.io/draft/d1144-object-relocation.html, or
   https://en.cppreference.com/w/cpp/language/attributes/no_unique_address
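Item 1 can be illustrated by modelling a callee-destroy convention by hand, since g++ offers none for std::unique_ptr on this target. The sketch assumes that ownership travels as a raw pointer and that every callee must destroy what it receives. It also assumes the answer to the open question in item 1 is yes, i.e. that a forwarding function may pass the responsibility on. `Tracked`, `bar_cd` and `baz_cd` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload that counts live objects.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Model of a callee-destroy convention: ownership arrives as a raw pointer
// (i.e. in a register) and the callee is responsible for destroying it.
// bar_cd re-wraps the pointer; its unique_ptr frees the object on return.
void bar_cd(Tracked* raw) {
    std::unique_ptr<Tracked> p(raw);
    // ... use *p ...
}

// A forwarding function hands the responsibility on. It owns nothing
// after the call, so there is nothing left for it to destroy.
void baz_cd(Tracked* raw) { bar_cd(raw); }
```

Under these assumptions the call in baz_cd is in tail position and can compile to a single jump, like the raw-pointer baz.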
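For item 2, the move to memory that a callee might need is a relocation: move-construct the object in new storage, then end the lifetime of the old one. A minimal sketch using placement new; `relocate_to` is a hypothetical helper, not the API proposed in P1144.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <utility>

// Hypothetical helper: relocate the object at `src` into raw storage `dst`
// by move-constructing a new object there and then destroying the old one.
// Afterwards `src` no longer holds an object; the returned pointer does.
template <class T>
T* relocate_to(T* src, void* dst) {
    T* out = ::new (dst) T(std::move(*src));
    src->~T();  // for unique_ptr the source is null here, so no delete runs
    return out;
}
```

For std::unique_ptr the moved-from source is null, so its destructor does nothing and the pair reduces, after inlining, to copying a single pointer. Letting such types be relocated by a plain memory copy is roughly what the "trivially relocatable" property in P1144 is about.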
