[Bug target/109896] Missed optimisation: overflow detection in multiplication instructions for operator new

2023-05-18 Thread thiago at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109896

--- Comment #7 from Thiago Macieira  ---
(In reply to Jonathan Wakely from comment #6)
> With placement-new there's no allocation:
> https://gcc.godbolt.org/z/68e4PaeYz

Is the exception expected there, though?

[Bug target/109896] Missed optimisation: overflow detection in multiplication instructions for operator new

2023-05-18 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109896

--- Comment #6 from Jonathan Wakely  ---
With placement-new there's no allocation:
https://gcc.godbolt.org/z/68e4PaeYz

[Bug target/109896] Missed optimisation: overflow detection in multiplication instructions for operator new

2023-05-17 Thread thiago at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109896

--- Comment #5 from Thiago Macieira  ---
(In reply to Andrew Pinski from comment #4)
> If you are that picky for cycles, these cycles are not going to be a problem
> compared to the dynamic allocation that is just about to happen ..

Yeah, I realised that after I posted the reply. If the calculation is
successful, we're going to allocate memory and that's neither fast nor
deterministic. If it overflows, we're going to unwind the stack, which is even
worse. I had only looked at the multiplication and failed to consider what
comes after it.

So, yeah, do this if it's low-hanging fruit.

[Bug target/109896] Missed optimisation: overflow detection in multiplication instructions for operator new

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109896

--- Comment #4 from Andrew Pinski  ---
(In reply to Thiago Macieira from comment #3)
> 5 instructions, 4 cycles (not including front-end decode), so roughly the
> same as the imulq example above (4 cycles), but with far more ports to
> dispatch to.

If you are that picky for cycles, these cycles are not going to be a problem
compared to the dynamic allocation that is just about to happen ..

[Bug target/109896] Missed optimisation: overflow detection in multiplication instructions for operator new

2023-05-17 Thread thiago at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109896

--- Comment #3 from Thiago Macieira  ---
(In reply to H.J. Lu from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > I suspect the overflow code was added before __builtin_*_overflow were added
> > which is why the generated code is this way.
> 
> Should the C++ front-end use __builtin_mul_overflow?

That's what that code is doing, yes.

But mind you that not all examples are doing actual multiplications. That's why
I had the weird size of 47.

A size that is a power of 2 is just doing bit checks. For example, 16:
movq    %rdi, %rax
shrq    $59, %rax
jne     .L2

Other sizes do the compare, but there's no multiplication involved. For 24:
movabsq $384307168202282325, %rax
cmpq    %rdi, %rax
jb      .L2
leaq    (%rdi,%rdi,2), %rdi
salq    $3, %rdi

5 instructions, 4 cycles (not including front-end decode), so roughly the same
as the imulq example above (4 cycles), but with far more ports to dispatch to.
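
For readers who do not read x86-64 assembly, the two listings above can be restated roughly in C++ as follows. This is only a sketch: the helper names (overflows16, overflows24, bytes24) are hypothetical and chosen for illustration, and the code assumes a 64-bit target where the byte count must stay below 2^63, which is what the constants in the listings imply.

```cpp
#include <cstdint>

// Hypothetical helpers, named only for illustration; not GCC internals.

// Element size 16: n * 16 stays below 2^63 only if the top five bits of n
// are clear, which is the `shrq $59` / `jne` pair in the first listing.
constexpr bool overflows16(std::uint64_t n) {
    return (n >> 59) != 0;
}

// Element size 24: compare n against the largest permitted count,
// (2^63 - 1) / 24 = 384307168202282325, so no multiplication is needed
// for the check itself (the `movabsq` / `cmpq` / `jb` sequence).
constexpr bool overflows24(std::uint64_t n) {
    return n > 384307168202282325ULL;
}

// The byte count for size 24 is then formed as (n + 2*n) << 3,
// matching the `leaq (%rdi,%rdi,2)` followed by `salq $3`.
constexpr std::uint64_t bytes24(std::uint64_t n) {
    return (n + n * 2) << 3;
}
```

The point of the sketch is that, for a constant element size, the check reduces to a shift or a single compare against a precomputed bound, with the multiplication strength-reduced separately.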

[Bug target/109896] Missed optimisation: overflow detection in multiplication instructions for operator new

2023-05-17 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109896

--- Comment #2 from H.J. Lu  ---
(In reply to Andrew Pinski from comment #1)
> I suspect the overflow code was added before __builtin_*_overflow were added
> which is why the generated code is this way.

Should the C++ front-end use __builtin_mul_overflow?
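
As a rough illustration of what using the builtin looks like at the source level, here is a minimal sketch. The helper name checked_array_bytes is hypothetical, and the sketch only detects wraparound of size_t; it does not claim to reproduce the exact limit or code the front-end emits for operator new[].

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper for illustration only: computes count * elem_size and
// throws the exception array new uses for an invalid length if the product
// does not fit in std::size_t.
inline std::size_t checked_array_bytes(std::size_t count, std::size_t elem_size) {
    std::size_t bytes;
    // GCC/Clang builtin: stores the (possibly wrapped) product in `bytes`
    // and returns true if the mathematical result did not fit.
    if (__builtin_mul_overflow(count, elem_size, &bytes))
        throw std::bad_array_new_length();
    return bytes;
}
```

With this shape the compiler is free to use the overflow flag of the multiply instruction instead of a separate compare against a precomputed bound.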

[Bug target/109896] Missed optimisation: overflow detection in multiplication instructions for operator new

2023-05-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109896

--- Comment #1 from Andrew Pinski  ---
I suspect the overflow code was added before __builtin_*_overflow were added
which is why the generated code is this way.