[Bug target/112868] GCC passes -many to the assembler for --enable-checking=release builds

2024-05-06 Thread nisse at lysator dot liu.se via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112868

--- Comment #13 from Niels Möller  ---
(In reply to Peter Bergner from comment #11)

> This is clearly a stage1 patch, so we'll wait until
> then before submitting it.

I'm not that familiar with gcc development procedures. Do I understand you
correctly, that a fix for this bug will not be included in gcc-14 (according to
https://gcc.gnu.org/develop.html#timeline, gcc-14 stage1 ended several months
ago), it will have to wait for gcc-15?

[Bug driver/114813] New: powerpc64: Assembly option -many passed unconditionally, please drop, or make easily configurable

2024-04-22 Thread nisse at lysator dot liu.se via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114813

Bug ID: 114813
   Summary: powerpc64: Assembly option -many passed
unconditionally, please drop, or make easily
configurable
   Product: gcc
   Version: 12.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nisse at lysator dot liu.se
  Target Milestone: ---

Hi,

my project includes C and assembly for a couple of different
architectures and cpu flavors (with some runtime selection of which
functions to use). I use the gcc driver for both .c and .s
(preprocessed assembly) files.

I would like each assembly file to declare the kind of processor it is
intended for, to get compile-time errors if I accidentally use
instructions not available for that processor.

For powerpc64, there are assembly pseudoops for that; docs are a bit
sketchy, but I think I should be able to use, e.g.,

  .machine "power8"

as annotation in the assembly input, and get an error from the
assembler if I use instructions introduced in later processor
versions. However, it turns out that gcc adds the command-line option
-many when processing assembly files, which allows all instructions, and I find
no easy way to
disable that. There may be a good reason for that option when
processing files generated by the gcc compiler (e.g., use of inline
asm for arbitrary processor variants), but it's unhelpful for my
usecase, where I pass an assembly file as input to gcc.

Example:

$ powerpc64le-linux-gnu-gcc -v no-such-file.s
Using built-in specs.
COLLECT_GCC=powerpc64le-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc-cross/powerpc64le-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: powerpc64le-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14'
--with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12 --enable-shared
--enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/
--enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--without-target-system-zlib --with-libphobos-druntime-only=yes
--enable-secureplt --enable-targets=powerpcle-linux --disable-multilib
--enable-multiarch --disable-werror --with-long-double-128
--enable-offload-targets=nvptx-none=/build/gcc-12-cross-7DlaPe/gcc-12-cross-15/gcc/debian/tmp-nvptx/usr
--enable-offload-defaulted --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=powerpc64le-linux-gnu
--program-prefix=powerpc64le-linux-gnu-
--includedir=/usr/powerpc64le-linux-gnu/include
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14) 
COLLECT_GCC_OPTIONS='-v' '-dumpdir' 'a-'

/usr/lib/gcc-cross/powerpc64le-linux-gnu/12/../../../../powerpc64le-linux-gnu/bin/as
-v -a64 -mpower8 -many -mlittle -o /tmp/ccncUIii.o no-such-file.s
GNU assembler version 2.40 (powerpc64le-linux-gnu) using BFD version (GNU
Binutils for Debian) 2.40
Assembler messages:
Error: can't open no-such-file.s for reading: No such file or directory

Note the "-many" on the assembler invocation close to the end. If I
look at the output of -dumpspecs, I see a "-many" at the end of the
"*asm_cpu:" section. I don't understand the syntax, but it appears at
the end with no qualifiers, and I guess that means it's unconditional.
I've checked that I get a different command line, without -many, if I
take that out from the specs section, and pass the modified specs file
to the gcc -specs option, but that's a rather cumbersome workaround.

Could the gcc driver drop the -many option, at least for the case that
its input is a .s (assembly) file?

[Bug libstdc++/91356] Poor optimization of calls involving std::unique_ptr

2021-10-05 Thread nisse at lysator dot liu.se via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356

--- Comment #11 from Niels Möller  ---
Here's a bit more docs on using std::unique_ptr [[clang::trivial_abi]] with
clang and its C++ library.

https://libcxx.llvm.org//DesignDocs/UniquePtrTrivialAbi.html

As for "No, it needs co-operation between G++ and all other compilers using the
same ABI", I would think that if gcc and clang could agree, others would start
moving.

[Bug middle-end/4210] should not warn in dead code

2020-06-04 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210

--- Comment #42 from Niels Möller  ---
Created attachment 48678
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48678=edit
Add a new pass for emitting the warning (not working)

Since adding a new pass seems to be the right way, I've given that a try. See
patch. By my printf debugging I can see that the pass is instantiated and the
execute method is invoked twice (same example program):

unsigned 
shift_dead (unsigned x)
{
  if (0)
return x >> 32;
  else
return x >> 1;
}

unsigned 
shift_invalid (unsigned x)
{
  return x >> 32;
}

But no further printouts, it seems my loop 

+void
+do_warn_shift_count_range (gimple_seq seq)
+{
+  gimple_stmt_iterator i;
+  for (i = gsi_start (seq); !gsi_end_p (i); gsi_next ())

exits before a single iteration. My intention was that it should iterate over
the body of the current function (i.e., fun->gimple_body), which I expect to be
a single return statement after the cfg pass has removed dead code. I'm
probably misunderstanding the data structures.

And what's the easiest way to run the the right compiler process (I guess
that's cc1) under gdb?

[Bug middle-end/4210] should not warn in dead code

2020-05-06 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210

--- Comment #41 from Niels Möller  ---
(In reply to Manuel López-Ibáñez from comment #39)

> You can easily find which pass does something by dumping (-ftree-dump-*)
> all of them and comparing them.

It's -ftree-dump-all, and also -fdump-passes was useful. Thanks! I'm now
compiling without optimization to (i) reduce number of passes, and (ii) because
it would be nice to get it right also in absence of optimization options.

It looks like the dead code is eliminated by the "cfg" (control flow graph)
pass, in gcc/tree-cfg.c. In the .cfg dumpfile it says "Removing basic block 3",
and the invalid shift disappears in that dump. That's nice. Immediately after
this pass comes *warn_function_return, implemented in the same file.

It would make sense to me to add a pass to warn about shift operations with
invalid constant operands at about the same place. Is it easy to traverse a
gimple function, and check all expressions? The "*warn_unused_result" pass
seems to do a similar traversal (but examining statements rather than
expressions, and done *before* the cfg pass).

[Bug middle-end/4210] should not warn in dead code

2020-05-06 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210

--- Comment #38 from Niels Möller  ---
Just a brief update.

1. Tried adding fprintf warnings to c_gimplify_expr (btw, what's the right way
to display a pretty warning with line numbers etc in later passes?). But it
seems that's too early, I still get warnings for dead code.

2. The pass_remove_useless_stmts, mentioned in the docs, was deleted in 2009
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41573), and I take it it was
obsoleted earlier, since there's no mention of a replacement. So what pass
should I look at that is related to basic control flow analysis, and discarding
unreachable statements?

[Bug middle-end/4210] should not warn in dead code

2020-05-05 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210

--- Comment #37 from Niels Möller  ---
(In reply to Manuel López-Ibáñez from comment #35)
> There is no such place. Dead code is identified in the middle-end and by
> then, there is no parse tree, only GIMPLE and SSA:
> https://gcc.gnu.org/onlinedocs/gccint/Passes.html#Passes

So if emission of the warning is postponed to after the Tree SSA passes
(https://gcc.gnu.org/onlinedocs/gccint/Tree-SSA-passes.html#Tree-SSA-passes).
Could perhaps be inserted just after (or as part of) pass_remove_useless_stmts?

> > 3. Alternatively, [...] construct a special tree object representing an
> > expression with invalid/undefined behavior, and any meta data needed to emit
> > a warning or error to describe it? Then emission of the warning could be
> > postponed to later, say, close to the conversion to SSA form?
> 
> That is still too early, since the dead code elimination happens after SSA.

Then that special value needs to be passed through the conversions to gimple
and SSA. I imagine it could be treated much like a compile time constant, but
with an annotation saying it's invalid, and why, to be displayed in case the
compiler thinks it needs to generate any code to evaluate it. (And if we go
that route, we should probably do the same for most or all warnings on invalid
operands detected by the frontend).

[Bug middle-end/4210] should not warn in dead code

2020-05-05 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210

--- Comment #36 from Niels Möller  ---
(In reply to Martin Sebor from comment #34)
> 
> The front ends can eliminate simple subexpressions (as in '0 ? x >> 32 : x
> >> 1') but they don't do the same for statements.  Moving the warning from
> the front end to some later pass would avoid diagnosing code in those cases
> (it would also avoid duplicating the same code between different front
> ends).  The earliest is probably gimplify.c.  That would avoid warning on
> statements rendered dead as a result of constant expressions (as defined by
> the language) but not [...]

Thanks. So moving the warning to the gimplify pass would be an improvement over
the current situation. Maybe I can try it out, it sounds reasonably simple.

[Bug middle-end/4210] should not warning with dead code

2020-05-04 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210

--- Comment #32 from Niels Möller  ---
I've checked out the gcc sources, to see if I can understand how to move the
warning around. The example input I'm looking at now is  

unsigned 
shift_dead (unsigned x)
{
  if (0)
return x >> 32;
  else
return x >> 1;
}

unsigned 
shift_invalid (unsigned x)
{
  return x >> 32;
}

where I'd like gcc -Wall to warn for the second function, but not from the
first (currently, it warns for both). The warning is emitted from
build_binary_op, in gcc/c/c-typeck.c, close to line 11880. Deleting it there
silences the warning for *both* functions above.

I have a few questions. Keep in mind that I only have a very vague
understanding of the different phases in gcc, so any guidance is appreciated. 

1. There's similar code in c_fully_fold_internal, in gcc/c/c-fold.c, close to
line 400. But that code does *not* emit any warning for the example above,
which surprised me a bit. Maybe that's only for the case that both operands to
the shift operator are constants?

2. More importantly, if the warning is deleted from build_binary_op, we need to
add a check for this case, and an appropriate warning, somewhere later in the
compilation process. It has to be done after dead code is identified, i.e., in
a phase processing only non-dead code. And preferably as early as possibly,
when we're still working close to the parse-tree representation. Is there such
a place? Some other functions traversing the tree are c_gimplify_expr
(gcc/c-family/c-gimplify.c) and verify_tree (gcc/c-family/c-common.c), are any
of those called after elimination of dead code?

3. Alternatively, if there's no place after dead code elimination where the
parse tree is still easily available, a different alternative could be to leave
the check for invalid shift counts in c-typeck.c, but instead of emitting a
warning, construct a special tree object representing an expression with
invalid/undefined behavior, and any meta data needed to emit a warning or error
to describe it? Then emission of the warning could be postponed to later, say,
close to the conversion to SSA form?

4. I also wonder what happens if, for some reason, a constant invalid shift
count is passed through all the way to code generation? Most architectures
would represent a constant shift count for a 32-bit value as 5-bit field in the
opcode, and then the invalid shift counts aren't representable at all. Will gcc
silently ignore higher bits, or is it an internal compiler error, or would it
make sense to produce a friendly warning at this late stage of the compilation?

[Bug c/94773] Unhelpful warning "right shift count >= width of type [-Wshift-count-overflow]" on unreachable code.

2020-04-26 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94773

Niels Möller  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #2 from Niels Möller  ---
(In reply to Vincent Lefèvre from comment #1)
> Dup of PR4210?

Yes, it seems to be the same problem.

*** This bug has been marked as a duplicate of bug 4210 ***

[Bug middle-end/4210] should not warning with dead code

2020-04-26 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210

Niels Möller  changed:

   What|Removed |Added

 CC||nisse at lysator dot liu.se

--- Comment #31 from Niels Möller  ---
*** Bug 94773 has been marked as a duplicate of this bug. ***

[Bug c/94773] New: Unhelpful warning "right shift count >= width of type [-Wshift-count-overflow]" on unreachable code.

2020-04-26 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94773

Bug ID: 94773
   Summary: Unhelpful warning "right shift count >= width of type
[-Wshift-count-overflow]" on unreachable code.
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
      Reporter: nisse at lysator dot liu.se
  Target Milestone: ---

Consider the following program:

---8<--
/* To avoid including limits.h */
#define CHAR_BIT 8 

unsigned 
shift(unsigned x)
{
  if (sizeof(unsigned) * CHAR_BIT > 32)
return x >> 32;
  else
return x >> 1;
}
---8<---

I compile this on a debian gnu/linux machine, x86_64, where char is 8 bits and
int is 32 bits, and hence the if condition is false. I think the function shift
above is well defined C, even if it has a peculiar dependency on the
architecture's word size.

When compiled with -Wall, I get a warning message on the shift which is in an
unreachable branch of the code: 

$ gcc-10 --version
gcc-10 (Debian 10-20200418-1) 10.0.1 20200418 (experimental) [master revision
27c171775ab:4c277008be0:c5bac7d127f288fd2f8a1f15c3f30da5903141c6]
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc-10 -O -Wall -c rshift-warning.c 
rshift-warning.c: In function ‘shift’:
rshift-warning.c:8:14: warning: right shift count >= width of type
[-Wshift-count-overflow]
8 | return x >> 32;
  |  ^~

I get the same warning (just different formatting) with gcc-8.3.0.

I consider the warning unhelpful, since the code is unreachable precisely
because the outer if statement determines that the shift isn't valid. I mean,
if I did 

  if (CHAR_BIT != 8) return x / (CHAR_BIT - 8);

I wouldn't want to get a warning that the division by the constant zero is
invalid. I agree the minimal example is a bit silly, so I can't argue strongly
that the warning is unhelpful based on just this example. 

The actual code where I encountered the problem was the umul_ppmm macro in
mini-gmp. It's defined at
https://gmplib.org/repo/gmp/file/f4c89b9840ba/mini-gmp/mini-gmp.c#l132 and the
first few lines look like this:

#define gmp_umul_ppmm(w1, w0, u, v) \
  do {  \
int LOCAL_GMP_LIMB_BITS = GMP_LIMB_BITS;\
if (sizeof(unsigned int) * CHAR_BIT >= 2 * GMP_LIMB_BITS)   \
  { \
unsigned int __ww = (unsigned int) (u) * (v);   \
w0 = (mp_limb_t) __ww;  \
w1 = (mp_limb_t) (__ww >> LOCAL_GMP_LIMB_BITS); \
  } \

Gcc doesn't warn for this, but that's because there's a workaround, the
LOCAL_GMP_LIMB_BITS variable. For some reason that suppresses the warning. But
if I try to simplify this by deleting LOCAL_GMP_LIMB_BITS and instead use
GMP_LIMB_BITS in the shift expression, I get the same warning as in the minimal
example above, for every expansion of the macro. In my environment,
sizeof(unsigned int) == 4, CHAR_BIT == 8, GMP_LIMB_BITS == 64, and mp_limb_t is
an alias for unsigned long.

For context, it's possible to make this code reachable by changing the way
mini-gmp is compiled, e.g., making mp_limb_t an alias for unsigned char, and
defining GMP_LIMB_BITS == 8. The purpose of the if statement is to use portable
C to determine, at compile time, if unsigned int is large enough to be used to
temporarily hold a number of bit size 2*GMP_LIMB_BITS, in which case the
following code block is correct.

BTW, if the example is written using a conditional expression instead of a
conditional statement,

  unsigned 
  shift(unsigned x)
  {
return (sizeof(unsigned) * CHAR_BIT > 32? x >> 32 : x >> 1);
  }

it does *not* trigger a warning (gcc-8.3 and gcc-10 behave the same). But that
kind of rewrite is not an option for the code I'm really interested in.

[Bug libstdc++/91356] Poor optimization of calls involving std::unique_ptr

2019-11-05 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356

--- Comment #10 from Niels Möller  ---
I was just made aware of this paper:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1863r0.pdf

arguing that C++ standards community and implementers ought to decide on how to
prioritize C++ performance vs ABI stability. Passing unique_ptr is one if the
items mentioned.

[Bug libstdc++/91356] Poor optimization of calls involving std::unique_ptr

2019-08-06 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356

--- Comment #5 from Niels Möller  ---
(In reply to Jonathan Wakely from comment #4)
> Why wouldn't you take unique_ptr&& instead of passing by value?

Because passing unique_ptr (and other move-only types) by value seems to be the
mainstream idiomatic way to pass around ownership in C++. I'm confident I could
save 0.05% binary size on webrtc's AppRTCMobile.apk by passing unique_ptr&&
everywhere. But my closest C++ experts consider that not a good enough reason
to depart from the more mainstream idiom, and I think they have a good point.

Maybe it's a revival of the old Lisp tradition to write code for clarity, and
in case the compiler generates poor code, just put your faith in future
compiler improvements.

> I'm closing this bug, as there's nothing libstdc++ can do here.

Any improvement needs cooperation between g++ and libstdc++; I was hoping this
bug report would reach relevant people involved on both sides.

[Bug libstdc++/91356] Poor optimization of calls involving std::unique_ptr

2019-08-06 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356

--- Comment #2 from Niels Möller  ---
(In reply to Jonathan Wakely from comment #1)
> The ABI dictates the calling conventions and there's certainly nothing that
> libstdc++ can do about it.

My impression was that C++ ABI is under the control of compiler and C++
standard library, and that there is no such thing as a standard C++ ABI. As
evidenced by the (rare) ABI breaks in libstdc++, and the difficulty of linking
C++ objects compiled with different C++ compilers (e.g, g++ and clang++).

If ABI standard + language spec nails down, e.g., rules on when to pass a C++
object by invisible reference, but doesn't nail down the ABI of std::string,
then the C++ specifics of the ABI standard is a bit pointless, imo. But I
understand my opinion doesn't carry much weight on that... So if it's really
not feasible to improve the situation at all for gnu/linux x86_64 elf targets,
*please* try to keep std::unique_ptr in mind when involved in ABI design for
other targets.

For specific types defined by the GNU standard C++ library, it may also be
possible to add any needed G++ specific attributes in library headers to tell
the compiler to depart from the "standard" ABI.

> In any case, how common is it to have a pointless non-inline baz function
> which does nothing but forward to another non-inline function?

In my experience (mainly from working on the webrtc.org code, where
implementation inheritance is discouraged), it's common with implementations of
interface classes consisting of almost trivial implementations of the
interface's virtual functions, which only setup the correct arguments for
calling a non-inlined (and possibly virtual) method on some member to do the
real work.

[Bug libstdc++/91356] New: Poor optimization of calls involving std::unique_ptr

2019-08-05 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356

Bug ID: 91356
   Summary: Poor optimization of calls involving std::unique_ptr
   Product: gcc
   Version: 8.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nisse at lysator dot liu.se
  Target Milestone: ---

The naïve understanding of unique_ptr, is that it is handled the same
way as a raw pointer, with just

* additional compile time safety checks, and

* automatic runtime calls to delete whenever a non-null unique_ptr
  goes out of scope.

However, the calling convention for unique_ptr implies a *lot* more
overhead than passing a raw pointer. For a start, a unique_ptr is not
passed in a register, but by "invisible reference". To make things
worse, the invisible reference refers to a temporary object that the
caller is responsible for destroying.

Consider a function just passing on a unique_ptr:

  void bar(std::unique_ptr p);
  void baz(std::unique_ptr p) { bar(std::move(p)); }

This compiles (with g++-8 -O3 --fno-exceptions, on gnu/linux x86_64)
to

  _Z3bazSt10unique_ptrIiSt14default_deleteIiEE:
  subq$24, %rsp
  movq(%rdi), %rax
  movq$0, (%rdi)
  leaq8(%rsp), %rdi
  movq%rax, 8(%rsp)
  call_Z3barSt10unique_ptrIiSt14default_deleteIiEE@PLT
  movq8(%rsp), %rdi
  testq   %rdi, %rdi
  je  .L6
  movl$4, %esi
  call_ZdlPvm@PLT
  .L6:
  addq$24, %rsp
  ret

As I read this, the steps are

1. Allocate a new temporary unique_ptr on the stack.

2. Move-construct it from the input argument (pointed to by %rdi).

3. Put the address of the object in %rdi, and invoke the bar function.

4. Destroy the temporary object, including a null test and a branch,
   and a call to the destructor of the underlying type if appropriate.

This can be compared to the raw pointer version,

  void bar(int* p);
  void baz(int* p) { bar(p); }

which compiles to a single jump instruction:

_Z3bazPi:
jmp _Z3barPi@PLT

As far as I understand, it's not possible to really fix this in just
the compiler or library, it's also an ABI issue. I see two somewhat
independent things needed to make the calling convention for
unique_ptr more efficient:

1. Move responsibility for destructing the temporary object from
   caller to callee. This is particularly nice for unique_ptr, since
   the callee often knows statically that the unique_ptr is null when
   going out of scope, and then both the null test and the destructor
   call should be optimized away completely. I don't fully understand
   C++ rules on destruction order, but I've been told that
   callee-destruction is allowed by the language specification (and
   used in the i386-pc-win32 abi). It's less clear if a forwarding
   function like baz(std::unique_ptr p) can delegate
   responsibility further.

2. Make it possible to pass small objects in registers, even if they
   have a non-trivial destructor or copy-constructor. In particular,
   invoke the unique_ptr destructor with the object to be destructed
   in a register. 

   The callee may then need to move the object to memory if it for any
   reason needs a pointer to it. To allow that move, one may need
   something like a "relocatable" property,
   https://quuxplusone.github.io/draft/d1144-object-relocation.html, or
   https://en.cppreference.com/w/cpp/language/attributes/no_unique_address

[Bug rtl-optimization/82677] Many projects (linux, coreutils, GMP, gcrypt, openSSL, etc) are misusing asm(divq/divl) etc, potentially resulting in faulty/unintended optimisations

2017-11-08 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82677

--- Comment #12 from Niels Möller  ---
It would be nice with some way to annotate the asm to have it treated in the
same as a possibly trapping division or pointer dereference.

E.g., adding "trap" to the clobber list, somewhat similar to "memory". This
would be a weaker constraint than volatile, since it would allow the compiler
to reorder the asm relative to other instructions, including loads and stores,
but it can't move it out of a conditional. 

Do you think that's a reasonable addition?

[Bug rtl-optimization/82677] Many projects (linux, coreutils, GMP, gcrypt, openSSL, etc) are misusing asm(divq/divl) etc, potentially resulting in faulty/unintended optimisations

2017-10-26 Thread nisse at lysator dot liu.se
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82677

Niels Möller  changed:

   What|Removed |Added

 CC||nisse at lysator dot liu.se

--- Comment #10 from Niels Möller  ---
Out of curiosity, how is this handled for division instructions generated by
gcc, with no __asm__ involved? E.g., consider

int foo(int d) {
  int r = 1234567;
  if (d != 0)
r = r / d;
  return r;
}

On an architecture where the div instruction doesn't raise any exception on
divide by zero, this function could be compiled to a division instruction +
conditional move, without any branch instruction. Right?

But on most architectures, that optimization would be invalid, and the compiler
must somehow know that. Is that a property on the representation of division
expression? Or is it tied to some property of the instruction pattern for the
divide instruction?

My question really is: What would it take to mark an __asm__ expression so that
it's treated in the same way as a plain C division?

[Bug c/59905] New: Unfriendly abort when calling a fucntion via a function pointer cast

2014-01-22 Thread nisse at lysator dot liu.se
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59905

Bug ID: 59905
   Summary: Unfriendly abort when calling a fucntion via a
function pointer cast
   Product: gcc
   Version: 4.7.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nisse at lysator dot liu.se

Created attachment 31918
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=31918action=edit
Example program

Some background:

In GNU nettle, I use function casts between function types differing in
the type of some pointer arguments. E.g.,

  struct s;

  int foo(struct s *);

  typedef int func_t(void *);

  func_t *f = (func_t *) foo;

Later on, f will be called with an argument of type void *, which
actually is a pointer to a valid struct s, but cast to void *.

I'm aware this is not strictly valid C, but I expect it to work on
virtually all C implementations, because struct s may well be an
incomplete type, and hence the calling conventions cannot reasonably
depend on it's actual definition. If it matters, I could replace void
* above by struct anything *, where struct anything is a different
incomplete type, not defined in any compilation unit ever.

(If you know any examples of architectures, supported by gcc or
otherwise, where calling conventions make this break, I'd be curious
to know about it).

And I think this style is fairly common in object oriented C code. The
closest alternatives, if one wants to stick to the C specification, is
to either skip type checking altogether, always using void * for the
function arguments, and casting when used. Or have do-nothing wrapper
functions like

  int foo(struct s *);
  int foo_wrapper(void *p) { return foo(p); }

This would typically compile to a single jump instruction, but I don't
think the wrapper can be eliminated completely by an optimizing C
compiler, because it is required that pointer comparison foo ==
foo_wrapper gives a false result.

And the reason I care is that I have a library with fairly a large
number of functions, which I want to let applications call either
directly, *with* strong type checking of all arguments, or call via an
abstraction using function pointers and void * instead of the real
struct type pointer, for state/context arguments.

This style seems to work fine with gcc. The surprise is when you call
a function via a cast like this, *without* first storing it a
variable. Like

  struct s;
  ((func_t *)foo)(s);

Here, the compiler issues a warning, and replaces the call by a call
to abort(). I'm attaching a complete example program. With gcc-4.7.2,
on Debian GNU/linux x86_64, this is what happens (and it's the same
with gcc-4.4 and gcc-4.6)

  $ gcc func-cast.c 
  func-cast.c: In function ‘main’:
  func-cast.c:33:41: warning: function called through a non-compatible type
[enabled by default]
  func-cast.c:33:41: note: if this code is reached, the program will abort
  $ ./a.out
  foo x: 1
  bar x (var): 2
  Illegal instruction (core dumped)

I find this behaviour a bit obnoxious... I understand you might just
say that this is bad code and gcc could emit an exec(nethack) if it
wanted to. I think the current gcc behaviour is bad, in particular as
default behavior, because

1. Generating a call to abort is generally unfriendly.

2. I expect that

 f = expr; f(...);

   and

 (expr)(...);

   should behave in roughly the same way for all possible expr, at
   least as long as f and expr have the same type, so the assignment
   itself doesn't imply any type conversion.

3. I think the compiler should in general treat explicit casts in the
   source code as meaning I think I know what I'm doing, please don't
   complain about it.

[Bug c/57157] New: Poor optimization of portable rotate idiom

2013-05-03 Thread nisse at lysator dot liu.se


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157



 Bug #: 57157

   Summary: Poor optimization of portable rotate idiom

Classification: Unclassified

   Product: gcc

   Version: 4.6.3

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: c

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: ni...@lysator.liu.se





The standard rotate idiom,



  (x  n) | (x  (32 - n))



is recognized by gcc (for concreteness, I discuss only the case that x

is an uint32_t here).



However, this is portable C only for n in the range 0  n  32. For n

== 0, we get x  32 which gives undefined behaviour according to the

C standard (6.5.7, Bitwise shift operators). To portably support n ==

0, one has to write the rotate as something like



  (x  n) | (x  ((-n)  31))



And this is apparently not recognized by gcc. I compiled this test

program with gcc -O3 -c -save-temps rot.c. Using gcc-4.6.3 on

GNU/Linux x86_64 (ubuntu):



typedef unsigned int uint32_t;



/* Allows 0  n  32 (n == 0 gives undefined behaviour) */

uint32_t

rot1(unsigned n, uint32_t x)

{

  return (x  n) | (x  (32 - n));

}



/* Allows 0 = n  32 */

uint32_t

rot2(unsigned n, uint32_t x)

{

  return (x  n) | (x  ((- n)  31));

}



Generated assembler



.filerot.c

.text

.p2align 4,,15

.globlrot1

.typerot1, @function

rot1:

.LFB0:

.cfi_startproc

movl%esi, %eax

movl%edi, %ecx

roll%cl, %eax

ret

.cfi_endproc

.LFE0:

.sizerot1, .-rot1

.p2align 4,,15

.globlrot2

.typerot2, @function

rot2:

.LFB1:

.cfi_startproc

movl%edi, %ecx

movl%esi, %eax

negl%ecx

shrl%cl, %eax

movl%edi, %ecx

sall%cl, %esi

orl%esi, %eax

ret

.cfi_endproc

.LFE1:

.sizerot2, .-rot2

.identGCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

.section.note.GNU-stack,,@progbits



The desired result is of course to get a rotl instruction also for

rot2, instead of the combination of negl, shrl, sall, and orl.



Applying the above portability fix to my ROTL32 macro in GNU Nettle

results in a slowdown of almost 20% for cast128. This function depends

a lot on key-dependant rotates, where rotation counts of zero will

happen for some keys.


[Bug c/48568] New: Missing documentation for __attribute__((visibility (protected))) on variables.

2011-04-12 Thread nisse at lysator dot liu.se
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48568

   Summary: Missing documentation for __attribute__((visibility
(protected))) on variables.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ni...@lysator.liu.se


I discovered that __attribute__((visibility (protected))) is supported on
variables, not just on functions. This should be documented in the node
Variable Attributes in the GCC manual

This feature is useful in particular for accessing read-only tables (allocated
in the .rodata section) from PIC code. By default, access to such data in other
compilation units is done via the GOT table. Using the above attribute (and
assuming that the definition is linked into in the same shared library file),
the GOT lookup is omitted, and instead one gets a direct access via pc-relative
addressing. I was working on GNU/linux x86_64, where this corresponds to 

  .protected table
  lea table(%rip), %rax  # To get the address of the table

rather than the default

  mov table@GOTPCREL(%rip), %rax

in assembler input. I admit I don't fully understand the .protected pseudo op,
but it's appearantly essential for getting the linker to do the right thing
when linking the shared library (gcc -shared table-def.o table-use.o -o
lib.so).

Regards,
/Niels Möller