[Bug target/112868] GCC passes -many to the assembler for --enable-checking=release builds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112868 --- Comment #13 from Niels Möller --- (In reply to Peter Bergner from comment #11) > This is clearly a stage1 patch, so we'll wait until > then before submitting it. I'm not that familiar with gcc development procedures. Do I understand you correctly, that a fix for this bug will not be included in gcc-14 (according to https://gcc.gnu.org/develop.html#timeline, gcc-14 stage1 ended several months ago), it will have to wait for gcc-15?
[Bug driver/114813] New: powerpc64: Assembly option -many passed unconditionally, please drop, or make easily configurable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114813 Bug ID: 114813 Summary: powerpc64: Assembly option -many passed unconditionally, please drop, or make easily configurable Product: gcc Version: 12.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: nisse at lysator dot liu.se Target Milestone: --- Hi, my project includes C and assembly for a couple of different architectures and cpu flavors (with some runtime selection of which functions to use). I use the gcc driver for both .c and .s (preprocessed assembly) files. I would like each assembly file to declare the kind of processor it is intended for, to get compile-time errors if I accidentally use instructions not available for that processor. For powerpc64, there are assembly pseudoops for that; docs are a bit sketchy, but I think I should be able to use, e.g., .machine "power8" as annotation in the assembly input, and get an error from the assembler if I use instructions introduced in later processor versions. However, it turns out that gcc adds the command-line option -many when processing assembly files, which allows all instructions, and I find no easy way to disable that. There may be a good reason for that option when processing files generated by the gcc compiler (e.g., use of inline asm for arbitrary processor variants), but it's unhelpful for my usecase, where I pass an assembly file as input to gcc. Example: $ powerpc64le-linux-gnu-gcc -v no-such-file.s Using built-in specs. COLLECT_GCC=powerpc64le-linux-gnu-gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc-cross/powerpc64le-linux-gnu/12/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: powerpc64le-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --without-target-system-zlib --with-libphobos-druntime-only=yes --enable-secureplt --enable-targets=powerpcle-linux --disable-multilib --enable-multiarch --disable-werror --with-long-double-128 --enable-offload-targets=nvptx-none=/build/gcc-12-cross-7DlaPe/gcc-12-cross-15/gcc/debian/tmp-nvptx/usr --enable-offload-defaulted --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=powerpc64le-linux-gnu --program-prefix=powerpc64le-linux-gnu- --includedir=/usr/powerpc64le-linux-gnu/include Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 12.2.0 (Debian 12.2.0-14) COLLECT_GCC_OPTIONS='-v' '-dumpdir' 'a-' /usr/lib/gcc-cross/powerpc64le-linux-gnu/12/../../../../powerpc64le-linux-gnu/bin/as -v -a64 -mpower8 -many -mlittle -o /tmp/ccncUIii.o no-such-file.s GNU assembler version 2.40 (powerpc64le-linux-gnu) using BFD version (GNU Binutils for Debian) 2.40 Assembler messages: Error: can't open no-such-file.s for reading: No such file or directory Note the "-many" on the assembler invocation close to the end. If I look at the output of -dumpspecs, I see a "-many" at the end of the "*asm_cpu:" section. I don't understand the syntax, but it appears at the end with no qualifiers, and I guess that means it's unconditional. I've checked that I get a different command line, without -many, if I take that out from the specs section, and pass the modified specs file to the gcc -specs option, but that's a rather cumbersome workaround. Could the gcc driver drop the -many option, at least for the case that its input is a .s (assembly) file?
[Bug libstdc++/91356] Poor optimization of calls involving std::unique_ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356 --- Comment #11 from Niels Möller --- Here's a bit more docs on using std::unique_ptr [[clang::trivial_abi]] with clang and its C++ library. https://libcxx.llvm.org//DesignDocs/UniquePtrTrivialAbi.html As for "No, it needs co-operation between G++ and all other compilers using the same ABI", I would think that if gcc and clang could agree, others would start moving.
[Bug middle-end/4210] should not warn in dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210 --- Comment #42 from Niels Möller --- Created attachment 48678 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48678=edit Add a new pass for emitting the warning (not working) Since adding a new pass seems to be the right way, I've given that a try. See patch. By my printf debugging I can see that the pass is instantiated and the execute method is invoked twice (same example program): unsigned shift_dead (unsigned x) { if (0) return x >> 32; else return x >> 1; } unsigned shift_invalid (unsigned x) { return x >> 32; } But no further printouts, it seems my loop +void +do_warn_shift_count_range (gimple_seq seq) +{ + gimple_stmt_iterator i; + for (i = gsi_start (seq); !gsi_end_p (i); gsi_next ()) exits before a single iteration. My intention was that it should iterate over the body of the current function (i.e., fun->gimple_body), which I expect to be a single return statement after the cfg pass has removed dead code. I'm probably misunderstanding the data structures. And what's the easiest way to run the the right compiler process (I guess that's cc1) under gdb?
[Bug middle-end/4210] should not warn in dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210 --- Comment #41 from Niels Möller --- (In reply to Manuel López-Ibáñez from comment #39) > You can easily find which pass does something by dumping (-ftree-dump-*) > all of them and comparing them. It's -ftree-dump-all, and also -fdump-passes was useful. Thanks! I'm now compiling without optimization to (i) reduce number of passes, and (ii) because it would be nice to get it right also in absence of optimization options. It looks like the dead code is eliminated by the "cfg" (control flow graph) pass, in gcc/tree-cfg.c. In the .cfg dumpfile it says "Removing basic block 3", and the invalid shift disappears in that dump. That's nice. Immediately after this pass comes *warn_function_return, implemented in the same file. It would make sense to me to add a pass to warn about shift operations with invalid constant operands at about the same place. Is it easy to traverse a gimple function, and check all expressions? The "*warn_unused_result" pass seems to do a similar traversal (but examining statements rather than expressions, and done *before* the cfg pass).
[Bug middle-end/4210] should not warn in dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210 --- Comment #38 from Niels Möller --- Just a brief update. 1. Tried adding fprintf warnings to c_gimplify_expr (btw, what's the right way to display a pretty warning with line numbers etc in later passes?). But it seems that's too early, I still get warnings for dead code. 2. The pass_remove_useless_stmts, mentioned in the docs, was deleted in 2009 (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41573), and I take it it was obsoleted earlier, since there's no mention of a replacement. So what pass should I look at that is related to basic control flow analysis, and discarding unreachable statements?
[Bug middle-end/4210] should not warn in dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210 --- Comment #37 from Niels Möller --- (In reply to Manuel López-Ibáñez from comment #35) > There is no such place. Dead code is identified in the middle-end and by > then, there is no parse tree, only GIMPLE and SSA: > https://gcc.gnu.org/onlinedocs/gccint/Passes.html#Passes So if emission of the warning is postponed to after the Tree SSA passes (https://gcc.gnu.org/onlinedocs/gccint/Tree-SSA-passes.html#Tree-SSA-passes). Could perhaps be inserted just after (or as part of) pass_remove_useless_stmts? > > 3. Alternatively, [...] construct a special tree object representing an > > expression with invalid/undefined behavior, and any meta data needed to emit > > a warning or error to describe it? Then emission of the warning could be > > postponed to later, say, close to the conversion to SSA form? > > That is still too early, since the dead code elimination happens after SSA. Then that special value needs to be passed through the conversions to gimple and SSA. I imagine it could be treated much like a compile time constant, but with an annotation saying it's invalid, and why, to be displayed in case the compiler thinks it needs to generate any code to evaluate it. (And if we go that route, we should probably do the same for most or all warnings on invalid operands detected by the frontend).
[Bug middle-end/4210] should not warn in dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210 --- Comment #36 from Niels Möller --- (In reply to Martin Sebor from comment #34) > > The front ends can eliminate simple subexpressions (as in '0 ? x >> 32 : x > >> 1') but they don't do the same for statements. Moving the warning from > the front end to some later pass would avoid diagnosing code in those cases > (it would also avoid duplicating the same code between different front > ends). The earliest is probably gimplify.c. That would avoid warning on > statements rendered dead as a result of constant expressions (as defined by > the language) but not [...] Thanks. So moving the warning to the gimplify pass would be an improvement over the current situation. Maybe I can try it out, it sounds reasonably simple.
[Bug middle-end/4210] should not warning with dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210 --- Comment #32 from Niels Möller --- I've checked out the gcc sources, to see if I can understand how to move the warning around. The example input I'm looking at now is unsigned shift_dead (unsigned x) { if (0) return x >> 32; else return x >> 1; } unsigned shift_invalid (unsigned x) { return x >> 32; } where I'd like gcc -Wall to warn for the second function, but not from the first (currently, it warns for both). The warning is emitted from build_binary_op, in gcc/c/c-typeck.c, close to line 11880. Deleting it there silences the warning for *both* functions above. I have a few questions. Keep in mind that I only have a very vague understanding of the different phases in gcc, so any guidance is appreciated. 1. There's similar code in c_fully_fold_internal, in gcc/c/c-fold.c, close to line 400. But that code does *not* emit any warning for the example above, which surprised me a bit. Maybe that's only for the case that both operands to the shift operator are constants? 2. More importantly, if the warning is deleted from build_binary_op, we need to add a check for this case, and an appropriate warning, somewhere later in the compilation process. It has to be done after dead code is identified, i.e., in a phase processing only non-dead code. And preferably as early as possibly, when we're still working close to the parse-tree representation. Is there such a place? Some other functions traversing the tree are c_gimplify_expr (gcc/c-family/c-gimplify.c) and verify_tree (gcc/c-family/c-common.c), are any of those called after elimination of dead code? 3. Alternatively, if there's no place after dead code elimination where the parse tree is still easily available, a different alternative could be to leave the check for invalid shift counts in c-typeck.c, but instead of emitting a warning, construct a special tree object representing an expression with invalid/undefined behavior, and any meta data needed to emit a warning or error to describe it? Then emission of the warning could be postponed to later, say, close to the conversion to SSA form? 4. I also wonder what happens if, for some reason, a constant invalid shift count is passed through all the way to code generation? Most architectures would represent a constant shift count for a 32-bit value as 5-bit field in the opcode, and then the invalid shift counts aren't representable at all. Will gcc silently ignore higher bits, or is it an internal compiler error, or would it make sense to produce a friendly warning at this late stage of the compilation?
[Bug c/94773] Unhelpful warning "right shift count >= width of type [-Wshift-count-overflow]" on unreachable code.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94773 Niels Möller changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #2 from Niels Möller --- (In reply to Vincent Lefèvre from comment #1) > Dup of PR4210? Yes, it seems to be the same problem. *** This bug has been marked as a duplicate of bug 4210 ***
[Bug middle-end/4210] should not warning with dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210 Niels Möller changed: What|Removed |Added CC||nisse at lysator dot liu.se --- Comment #31 from Niels Möller --- *** Bug 94773 has been marked as a duplicate of this bug. ***
[Bug c/94773] New: Unhelpful warning "right shift count >= width of type [-Wshift-count-overflow]" on unreachable code.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94773 Bug ID: 94773 Summary: Unhelpful warning "right shift count >= width of type [-Wshift-count-overflow]" on unreachable code. Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: nisse at lysator dot liu.se Target Milestone: --- Consider the following program: ---8<-- /* To avoid including limits.h */ #define CHAR_BIT 8 unsigned shift(unsigned x) { if (sizeof(unsigned) * CHAR_BIT > 32) return x >> 32; else return x >> 1; } ---8<--- I compile this on a debian gnu/linux machine, x86_64, where char is 8 bits and int is 32 bits, and hence the if condition is false. I think the function shift above is well defined C, even if it has a peculiar dependency on the architecture's word size. When compiled with -Wall, I get a warning message on the shift which is in an unreachable branch of the code: $ gcc-10 --version gcc-10 (Debian 10-20200418-1) 10.0.1 20200418 (experimental) [master revision 27c171775ab:4c277008be0:c5bac7d127f288fd2f8a1f15c3f30da5903141c6] Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ gcc-10 -O -Wall -c rshift-warning.c rshift-warning.c: In function ‘shift’: rshift-warning.c:8:14: warning: right shift count >= width of type [-Wshift-count-overflow] 8 | return x >> 32; | ^~ I get the same warning (just different formatting) with gcc-8.3.0. I consider the warning unhelpful, since the code is unreachable precisely because the outer if statement determines that the shift isn't valid. I mean, if I did if (CHAR_BIT != 8) return x / (CHAR_BIT - 8); I wouldn't want to get a warning that the division by the constant zero is invalid. I agree the minimal example is a bit silly, so I can't argue strongly that the warning is unhelpful based on just this example. The actual code where I encountered the problem was the umul_ppmm macro in mini-gmp. It's defined at https://gmplib.org/repo/gmp/file/f4c89b9840ba/mini-gmp/mini-gmp.c#l132 and the first few lines look like this: #define gmp_umul_ppmm(w1, w0, u, v) \ do { \ int LOCAL_GMP_LIMB_BITS = GMP_LIMB_BITS;\ if (sizeof(unsigned int) * CHAR_BIT >= 2 * GMP_LIMB_BITS) \ { \ unsigned int __ww = (unsigned int) (u) * (v); \ w0 = (mp_limb_t) __ww; \ w1 = (mp_limb_t) (__ww >> LOCAL_GMP_LIMB_BITS); \ } \ Gcc doesn't warn for this, but that's because there's a workaround, the LOCAL_GMP_LIMB_BITS variable. For some reason that suppresses the warning. But if I try to simplify this by deleting LOCAL_GMP_LIMB_BITS and instead use GMP_LIMB_BITS in the shift expression, I get the same warning as in the minimal example above, for every expansion of the macro. In my environment, sizeof(unsigned int) == 4, CHAR_BIT == 8, GMP_LIMB_BITS == 64, and mp_limb_t is an alias for unsigned long. For context, it's possible to make this code reachable by changing the way mini-gmp is compiled, e.g., making mp_limb_t an alias for unsigned char, and defining GMP_LIMB_BITS == 8. The purpose of the if statement is to use portable C to determine, at compile time, if unsigned int is large enough to be used to temporarily hold a number of bit size 2*GMP_LIMB_BITS, in which case the following code block is correct. BTW, if the example is written using a conditional expression instead of a conditional statement, unsigned shift(unsigned x) { return (sizeof(unsigned) * CHAR_BIT > 32? x >> 32 : x >> 1); } it does *not* trigger a warning (gcc-8.3 and gcc-10 behave the same). But that kind of rewrite is not an option for the code I'm really interested in.
[Bug libstdc++/91356] Poor optimization of calls involving std::unique_ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356 --- Comment #10 from Niels Möller --- I was just made aware of this paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1863r0.pdf arguing that C++ standards community and implementers ought to decide on how to prioritize C++ performance vs ABI stability. Passing unique_ptr is one if the items mentioned.
[Bug libstdc++/91356] Poor optimization of calls involving std::unique_ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356 --- Comment #5 from Niels Möller --- (In reply to Jonathan Wakely from comment #4) > Why wouldn't you take unique_ptr&& instead of passing by value? Because passing unique_ptr (and other move-only types) by value seems to be the mainstream idiomatic way to pass around ownership in C++. I'm confident I could save 0.05% binary size on webrtc's AppRTCMobile.apk by passing unique_ptr&& everywhere. But my closest C++ experts consider that not a good enough reason to depart from the more mainstream idiom, and I think they have a good point. Maybe it's a revival of the old Lisp tradition to write code for clarity, and in case the compiler generates poor code, just put your faith in future compiler improvements. > I'm closing this bug, as there's nothing libstdc++ can do here. Any improvement needs cooperation between g++ and libstdc++; I was hoping this bug report would reach relevant people involved on both sides.
[Bug libstdc++/91356] Poor optimization of calls involving std::unique_ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356 --- Comment #2 from Niels Möller --- (In reply to Jonathan Wakely from comment #1) > The ABI dictates the calling conventions and there's certainly nothing that > libstdc++ can do about it. My impression was that C++ ABI is under the control of compiler and C++ standard library, and that there is no such thing as a standard C++ ABI. As evidenced by the (rare) ABI breaks in libstdc++, and the difficulty of linking C++ objects compiled with different C++ compilers (e.g, g++ and clang++). If ABI standard + language spec nails down, e.g., rules on when to pass a C++ object by invisible reference, but doesn't nail down the ABI of std::string, then the C++ specifics of the ABI standard is a bit pointless, imo. But I understand my opinion doesn't carry much weight on that... So if it's really not feasible to improve the situation at all for gnu/linux x86_64 elf targets, *please* try to keep std::unique_ptr in mind when involved in ABI design for other targets. For specific types defined by the GNU standard C++ library, it may also be possible to add any needed G++ specific attributes in library headers to tell the compiler to depart from the "standard" ABI. > In any case, how common is it to have a pointless non-inline baz function > which does nothing but forward to another non-inline function? In my experience (mainly from working on the webrtc.org code, where implementation inheritance is discouraged), it's common with implementations of interface classes consisting of almost trivial implementations of the interface's virtual functions, which only setup the correct arguments for calling a non-inlined (and possibly virtual) method on some member to do the real work.
[Bug libstdc++/91356] New: Poor optimization of calls involving std::unique_ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91356 Bug ID: 91356 Summary: Poor optimization of calls involving std::unique_ptr Product: gcc Version: 8.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: nisse at lysator dot liu.se Target Milestone: --- The naïve understanding of unique_ptr, is that it is handled the same way as a raw pointer, with just * additional compile time safety checks, and * automatic runtime calls to delete whenever a non-null unique_ptr goes out of scope. However, the calling convention for unique_ptr implies a *lot* more overhead than passing a raw pointer. For a start, a unique_ptr is not passed in a register, but by "invisible reference". To make things worse, the invisible reference refers to a temporary object that the caller is responsible for destroying. Consider a function just passing on a unique_ptr: void bar(std::unique_ptr p); void baz(std::unique_ptr p) { bar(std::move(p)); } This compiles (with g++-8 -O3 --fno-exceptions, on gnu/linux x86_64) to _Z3bazSt10unique_ptrIiSt14default_deleteIiEE: subq$24, %rsp movq(%rdi), %rax movq$0, (%rdi) leaq8(%rsp), %rdi movq%rax, 8(%rsp) call_Z3barSt10unique_ptrIiSt14default_deleteIiEE@PLT movq8(%rsp), %rdi testq %rdi, %rdi je .L6 movl$4, %esi call_ZdlPvm@PLT .L6: addq$24, %rsp ret As I read this, the steps are 1. Allocate a new temporary unique_ptr on the stack. 2. Move-construct it from the input argument (pointed to by %rdi). 3. Put the address of the object in %rdi, and invoke the bar function. 4. Destroy the temporary object, including a null test and a branch, and a call to the destructor of the underlying type if appropriate. This can be compared to the raw pointer version, void bar(int* p); void baz(int* p) { bar(p); } which compiles to a single jump instruction: _Z3bazPi: jmp _Z3barPi@PLT As far as I understand, it's not possible to really fix this in just the compiler or library, it's also an ABI issue. I see two somewhat independent things needed to make the calling convention for unique_ptr more efficient: 1. Move responsibility for destructing the temporary object from caller to callee. This is particularly nice for unique_ptr, since the callee often knows statically that the unique_ptr is null when going out of scope, and then both the null test and the destructor call should be optimized away completely. I don't fully understand C++ rules on destruction order, but I've been told that callee-destruction is allowed by the language specification (and used in the i386-pc-win32 abi). It's less clear if a forwarding function like baz(std::unique_ptr p) can delegate responsibility further. 2. Make it possible to pass small objects in registers, even if they have a non-trivial destructor or copy-constructor. In particular, invoke the unique_ptr destructor with the object to be destructed in a register. The callee may then need to move the object to memory if it for any reason needs a pointer to it. To allow that move, one may need something like a "relocatable" property, https://quuxplusone.github.io/draft/d1144-object-relocation.html, or https://en.cppreference.com/w/cpp/language/attributes/no_unique_address
[Bug rtl-optimization/82677] Many projects (linux, coreutils, GMP, gcrypt, openSSL, etc) are misusing asm(divq/divl) etc, potentially resulting in faulty/unintended optimisations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82677 --- Comment #12 from Niels Möller --- It would be nice with some way to annotate the asm to have it treated in the same as a possibly trapping division or pointer dereference. E.g., adding "trap" to the clobber list, somewhat similar to "memory". This would be a weaker constraint than volatile, since it would allow the compiler to reorder the asm relative to other instructions, including loads and stores, but it can't move it out of a conditional. Do you think that's a reasonable addition?
[Bug rtl-optimization/82677] Many projects (linux, coreutils, GMP, gcrypt, openSSL, etc) are misusing asm(divq/divl) etc, potentially resulting in faulty/unintended optimisations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82677 Niels Möller changed: What|Removed |Added CC||nisse at lysator dot liu.se --- Comment #10 from Niels Möller --- Out of curiosity, how is this handled for division instructions generated by gcc, with no __asm__ involved? E.g., consider int foo(int d) { int r = 1234567; if (d != 0) r = r / d; return r; } On an architecture where the div instruction doesn't raise any exception on divide by zero, this function could be compiled to a division instruction + conditional move, without any branch instruction. Right? But on most architectures, that optimization would be invalid, and the compiler must somehow know that. Is that a property on the representation of division expression? Or is it tied to some property of the instruction pattern for the divide instruction? My question really is: What would it take to mark an __asm__ expression so that it's treated in the same way as a plain C division?
[Bug c/59905] New: Unfriendly abort when calling a fucntion via a function pointer cast
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59905 Bug ID: 59905 Summary: Unfriendly abort when calling a fucntion via a function pointer cast Product: gcc Version: 4.7.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: nisse at lysator dot liu.se Created attachment 31918 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=31918action=edit Example program Some background: In GNU nettle, I use function casts between function types differing in the type of some pointer arguments. E.g., struct s; int foo(struct s *); typedef int func_t(void *); func_t *f = (func_t *) foo; Later on, f will be called with an argument of type void *, which actually is a pointer to a valid struct s, but cast to void *. I'm aware this is not strictly valid C, but I expect it to work on virtually all C implementations, because struct s may well be an incomplete type, and hence the calling conventions cannot reasonably depend on it's actual definition. If it matters, I could replace void * above by struct anything *, where struct anything is a different incomplete type, not defined in any compilation unit ever. (If you know any examples of architectures, supported by gcc or otherwise, where calling conventions make this break, I'd be curious to know about it). And I think this style is fairly common in object oriented C code. The closest alternatives, if one wants to stick to the C specification, is to either skip type checking altogether, always using void * for the function arguments, and casting when used. Or have do-nothing wrapper functions like int foo(struct s *); int foo_wrapper(void *p) { return foo(p); } This would typically compile to a single jump instruction, but I don't think the wrapper can be eliminated completely by an optimizing C compiler, because it is required that pointer comparison foo == foo_wrapper gives a false result. And the reason I care is that I have a library with fairly a large number of functions, which I want to let applications call either directly, *with* strong type checking of all arguments, or call via an abstraction using function pointers and void * instead of the real struct type pointer, for state/context arguments. This style seems to work fine with gcc. The surprise is when you call a function via a cast like this, *without* first storing it a variable. Like struct s; ((func_t *)foo)(s); Here, the compiler issues a warning, and replaces the call by a call to abort(). I'm attaching a complete example program. With gcc-4.7.2, on Debian GNU/linux x86_64, this is what happens (and it's the same with gcc-4.4 and gcc-4.6) $ gcc func-cast.c func-cast.c: In function ‘main’: func-cast.c:33:41: warning: function called through a non-compatible type [enabled by default] func-cast.c:33:41: note: if this code is reached, the program will abort $ ./a.out foo x: 1 bar x (var): 2 Illegal instruction (core dumped) I find this behaviour a bit obnoxious... I understand you might just say that this is bad code and gcc could emit an exec(nethack) if it wanted to. I think the current gcc behaviour is bad, in particular as default behavior, because 1. Generating a call to abort is generally unfriendly. 2. I expect that f = expr; f(...); and (expr)(...); should behave in roughly the same way for all possible expr, at least as long as f and expr have the same type, so the assignment itself doesn't imply any type conversion. 3. I think the compiler should in general treat explicit casts in the source code as meaning I think I know what I'm doing, please don't complain about it.
[Bug c/57157] New: Poor optimization of portable rotate idiom
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157 Bug #: 57157 Summary: Poor optimization of portable rotate idiom Classification: Unclassified Product: gcc Version: 4.6.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassig...@gcc.gnu.org ReportedBy: ni...@lysator.liu.se The standard rotate idiom, (x n) | (x (32 - n)) is recognized by gcc (for concreteness, I discuss only the case that x is an uint32_t here). However, this is portable C only for n in the range 0 n 32. For n == 0, we get x 32 which gives undefined behaviour according to the C standard (6.5.7, Bitwise shift operators). To portably support n == 0, one has to write the rotate as something like (x n) | (x ((-n) 31)) And this is apparently not recognized by gcc. I compiled this test program with gcc -O3 -c -save-temps rot.c. Using gcc-4.6.3 on GNU/Linux x86_64 (ubuntu): typedef unsigned int uint32_t; /* Allows 0 n 32 (n == 0 gives undefined behaviour) */ uint32_t rot1(unsigned n, uint32_t x) { return (x n) | (x (32 - n)); } /* Allows 0 = n 32 */ uint32_t rot2(unsigned n, uint32_t x) { return (x n) | (x ((- n) 31)); } Generated assembler .filerot.c .text .p2align 4,,15 .globlrot1 .typerot1, @function rot1: .LFB0: .cfi_startproc movl%esi, %eax movl%edi, %ecx roll%cl, %eax ret .cfi_endproc .LFE0: .sizerot1, .-rot1 .p2align 4,,15 .globlrot2 .typerot2, @function rot2: .LFB1: .cfi_startproc movl%edi, %ecx movl%esi, %eax negl%ecx shrl%cl, %eax movl%edi, %ecx sall%cl, %esi orl%esi, %eax ret .cfi_endproc .LFE1: .sizerot2, .-rot2 .identGCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 .section.note.GNU-stack,,@progbits The desired result is of course to get a rotl instruction also for rot2, instead of the combination of negl, shrl, sall, and orl. Applying the above portability fix to my ROTL32 macro in GNU Nettle results in a slowdown of almost 20% for cast128. This function depends a lot on key-dependant rotates, where rotation counts of zero will happen for some keys.
[Bug c/48568] New: Missing documentation for __attribute__((visibility (protected))) on variables.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48568 Summary: Missing documentation for __attribute__((visibility (protected))) on variables. Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassig...@gcc.gnu.org ReportedBy: ni...@lysator.liu.se I discovered that __attribute__((visibility (protected))) is supported on variables, not just on functions. This should be documented in the node Variable Attributes in the GCC manual This feature is useful in particular for accessing read-only tables (allocated in the .rodata section) from PIC code. By default, access to such data in other compilation units is done via the GOT table. Using the above attribute (and assuming that the definition is linked into in the same shared library file), the GOT lookup is omitted, and instead one gets a direct access via pc-relative addressing. I was working on GNU/linux x86_64, where this corresponds to .protected table lea table(%rip), %rax # To get the address of the table rather than the default mov table@GOTPCREL(%rip), %rax in assembler input. I admit I don't fully understand the .protected pseudo op, but it's appearantly essential for getting the linker to do the right thing when linking the shared library (gcc -shared table-def.o table-use.o -o lib.so). Regards, /Niels Möller