[Bug c/66122] Bad uninlining decisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #6 from Denis Vlasenko vda.linux at googlemail dot com ---
Got a hold on a machine with gcc version 5.1.1 20150422 (Red Hat 5.1.1-1)

Pulled current Linus kernel tree and built it with this config:
http://busybox.net/~vda/kernel_config2
Note that CONFIG_CC_OPTIMIZE_FOR_SIZE is not set, i.e. it's a -O2 build.

Selecting duplicate functions still shows a number of tiny uninlined functions:

$ nm --size-sort vmlinux | grep -iF ' t ' | uniq -c | grep -v '^ *1 ' | sort -rn
     83 008a t rcu_read_lock_sched_held
     48 001b t sd_driver_init
     48 0012 t sd_driver_exit
     48 0008 t __initcall_sd_driver_init6
     47 0020 t usb_serial_module_init
     47 0012 t usb_serial_module_exit
     47 0008 t __initcall_usb_serial_module_init6
     45 0057 t uni2char
     45 0025 t char2uni
     43 001f t sd_probe
     40 006a t rcu_read_unlock
     29 005a t cpumask_next
     27 007a t rcu_read_lock
     27 0011 t kzalloc
     24 0022 t arch_local_save_flags
     23 0041 t cpumask_check
     19 0017 t phy_module_init
     19 0017 t phy_module_exit
     19 0008 t __initcall_phy_module_init6
     18 006c t spi_write
     18 003f t show_alarm
     18 000b t bitmap_weight
     15 0037 t show_alarms
     15 0014 t init_once
     14 0603 t init_engine
     14 0354 t pcm_trigger
     14 033b t pcm_open
     14 00f8 t stop_transport
     14 00db t pcm_close
     14 00c8 t set_meters_on
     14 00b5 t write_dsp
     14 00b5 t pcm_hw_free
     14 0091 t pcm_pointer
     14 0090 t hw_rule_playback_channels_by_format
     14 008d t send_vector
     14 004f t snd_echo_vumeters_info
     14 0042 t hw_rule_sample_rate
     14 003e t snd_echo_vumeters_switch_put
     14 0034 t audiopipe_free
     14 002b t snd_echo_channels_info_info
     14 0024 t snd_echo_remove
     14 001b t echo_driver_init
     14 0019 t pcm_analog_out_hw_params
     14 0019 t arch_local_irq_restore
     14 0014 t snd_echo_dev_free
     14 0012 t echo_driver_exit
     14 0008 t __initcall_echo_driver_init6
     13 0127 t pcm_analog_out_open
     13 0127 t pcm_analog_in_open
     13 0039 t qdisc_peek_dequeued
     13 0037 t cpumask_check
     13 0022 t arch_local_irq_restore
     13 001c t pcm_analog_in_hw_params
     13 0006 t bcma_host_soc_unregister_driver
     12 0053 t nlmsg_trim
...

Such as:

811a42e0 kzalloc:
811a42e0:  55                    push   %rbp
811a42e1:  81 ce 00 80 00 00     or     $0x8000,%esi
811a42e7:  48 89 e5              mov    %rsp,%rbp
811a42ea:  e8 f1 92 1a 00        callq  __kmalloc
811a42ef:  5d                    pop    %rbp
811a42f0:  c3                    retq

810792d0 bitmap_weight:
810792d0:  55                    push   %rbp
810792d1:  48 89 e5              mov    %rsp,%rbp
810792d4:  e8 37 a8 b7 00        callq  __bitmap_weight
810792d9:  5d                    pop    %rbp
810792da:  c3                    retq

and even

88566c9b bcma_host_soc_unregister_driver:
88566c9b:  55                    push   %rbp
88566c9c:  48 89 e5              mov    %rsp,%rbp
88566c9f:  5d                    pop    %rbp
88566ca0:  c3                    retq

This is an *empty function* from drivers/bcma/bcma_private.h:103 uninlined:

static inline void __exit bcma_host_soc_unregister_driver(void)
{
}

BTW it doesn't even have any callers in vmlinux. It should have been optimized out.
[Bug c/66122] Bad uninlining decisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #7 from Richard Biener rguenth at gcc dot gnu.org ---
You can look at -Winline output
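For readers who have not used -Winline, here is a minimal sketch of the kind of code it reports on. It is not taken from the kernel tree: the names sum_ints and sum3 and the file name winline_demo.c are made up for illustration. GCC does not inline functions that use variable argument lists, so building with something like `gcc -O2 -Winline -c winline_demo.c` should produce a warning about sum_ints; the exact wording depends on the GCC version.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical example: declared inline, but it uses va_start,
 * which prevents GCC from inlining it. */
static inline int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);          /* precondition of this sketch */
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Out-of-line caller, so the compiler has a call site to consider. */
int sum3(int a, int b, int c)
{
    return sum_ints(3, a, b, c);
}
```

On a kernel build the same flag would have to be passed through the build system, and the volume of output is likely to be large, so restricting it to one object file at a time may be more practical.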
[Bug c/66122] Bad uninlining decisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #2 from Denis Vlasenko vda.linux at googlemail dot com ---
Tested with gcc-4.9.2. The attached testcase doesn't exhibit the bug, but compiling the same kernel tree, with the same .config, and then running

nm --size-sort vmlinux | grep -iF ' t ' | uniq -c | grep -v '^ *1 ' | sort -rn

reveals that now other functions get wrongly deinlined:

      8 0028 t acpi_os_allocate_zeroed
      7 0011 t dst_output_sk
      7 000b t hweight_long
      5 0023 t umask_show
      5 000f t init_once
      4 0047 t uni2char
      4 0028 t cmask_show
      4 0025 t inv_show
      4 0025 t edge_show
      4 0020 t char2uni
      4 001f t event_show
      4 001d t acpi_node
      4 0012 t t_stop
      4 0012 t dst_discard
      4 0011 t kzalloc
      4 000b t udp_lib_close
      4 0006 t udp_lib_hash
      3 0059 t get_expiry
      3 0025 t __uncore_inv_show
      3 0025 t __uncore_edge_show
      3 0023 t __uncore_umask_show
      3 0023 t name_show
      3 0022 t acpi_os_allocate
      3 001f t __uncore_event_show
      3 000d t cpumask_set_cpu
      3 000a t nofill
...
...

For example, hweight_long:

static inline unsigned long hweight_long(unsigned long w)
{
        return sizeof(w) == 4 ? hweight32(w) : hweight64(w);
}

wasn't expected by programmer to be deinlined. But it was:

81009c40 hweight_long:
81009c40:  55                    push   %rbp
81009c41:  e8 da eb 31 00        callq  81328820 __sw_hweight64
81009c46:  48 89 e5              mov    %rsp,%rbp
81009c49:  5d                    pop    %rbp
81009c4a:  c3                    retq
81009c4b:  0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)

I'm going to find and attach a file which deinlines hweight_long.
[Bug c/66122] Bad uninlining decisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #3 from Denis Vlasenko vda.linux at googlemail dot com ---
Created attachment 35530
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35530&action=edit
Preprocessed example exhibiting a bug on gcc-4.9.2

This is a preprocessed kernel/pid.c file from kernel source.
When built with -O2, it wrongly deinlines hweight_long.

$ gcc -O2 -c pid.preprocessed.c -o kernel.pid.o
$ objdump -dr kernel.pid.o | grep -A3 hweight_long
hweight_long:
   0:  e8 00 00 00 00        callq  5 hweight_long+0x5
                       1: R_X86_64_PC32 __sw_hweight64-0x4
   5:  c3                    retq

$ gcc -v 2>&1 | tail -1
gcc version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC)
[Bug c/66122] Bad uninlining decisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #1 from Denis Vlasenko vda.linux at googlemail dot com ---
Created attachment 35528
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35528&action=edit
Preprocessed example exhibiting a bug

This is a preprocessed kernel/locking/mutex.c file from kernel source.
When built with either -O2 or -Os, it wrongly deinlines spin_lock() and spin_unlock():

$ gcc -O2 -c mutex.preprocessed.c -o mutex.preprocessed.o
$ objdump -dr mutex.preprocessed.o

mutex.preprocessed.o: file format elf64-x86-64

Disassembly of section .text:

spin_unlock:
   0:  80 07 01                 addb   $0x1,(%rdi)
   3:  c3                       retq
   4:  66 66 66 2e 0f 1f 84     data32 data32 nopw %cs:0x0(%rax,%rax,1)
   b:  00 00 00 00 00

0010 __mutex_init:
...
0040 spin_lock:
  40:  e9 00 00 00 00           jmpq   45 spin_lock+0x5
                       41: R_X86_64_PC32 _raw_spin_lock-0x4
  45:  66 66 2e 0f 1f 84 00     data32 nopw %cs:0x0(%rax,%rax,1)
  4c:  00 00 00 00

These functions are defined as:

static inline __attribute__((no_instrument_function)) void spin_unlock(spinlock_t *lock)
{
 __raw_spin_unlock(&lock->rlock);
}

static inline __attribute__((no_instrument_function)) void spin_lock(spinlock_t *lock)
{
 _raw_spin_lock(&lock->rlock);
}

and programmer's intent was that they will always be inlined.

This is with gcc-4.7.2
[Bug c/66122] Bad uninlining decisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org ---
gcc 4.7 is not supported anymore, so there is no point reporting issues against it.
As for #c3, that got fixed (well, improved, inline unlike always_inline attribute is just an optimization hint) by r219108, which is certainly not backportable to 4.9 branch.
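To illustrate the distinction between the inline keyword and the always_inline attribute, here is a minimal userspace sketch modelled on hweight_long. It is not the kernel's code: the *_sketch, *_hint and *_forced names are made up, and the GCC popcount builtins stand in for the kernel's hweight32()/hweight64() as an assumption of the example.

```c
#include <assert.h>

static_assert(sizeof(unsigned long) == 4 || sizeof(unsigned long) == 8,
              "sketch assumes a 32-bit or 64-bit unsigned long");

/* Stand-ins for hweight32()/hweight64(); using the popcount builtins
 * here is an assumption of this sketch, not what the kernel does. */
static inline unsigned int hweight32_sketch(unsigned int w)
{
    return (unsigned int)__builtin_popcount(w);
}

static inline unsigned int hweight64_sketch(unsigned long long w)
{
    return (unsigned int)__builtin_popcountll(w);
}

/* Plain `inline`: only a hint, the compiler may still emit a call. */
static inline unsigned long hweight_long_hint(unsigned long w)
{
    return sizeof(w) == 4 ? hweight32_sketch((unsigned int)w)
                          : hweight64_sketch(w);
}

/* always_inline: asks GCC to inline regardless of its heuristics. */
static inline __attribute__((always_inline))
unsigned long hweight_long_forced(unsigned long w)
{
    return sizeof(w) == 4 ? hweight32_sketch((unsigned int)w)
                          : hweight64_sketch(w);
}

/* Out-of-line callers whose disassembly can be compared. */
unsigned long count_bits_hint(unsigned long mask)
{
    return hweight_long_hint(mask);
}

unsigned long count_bits_forced(unsigned long mask)
{
    return hweight_long_forced(mask);
}
```

Both callers compute the same value; the difference, if any, shows up only in the generated code. Comparing `objdump -d` output for the two is one way to check whether a separate copy of the helper was emitted. In a file this small the hinted version will most likely be inlined as well, which is consistent with the reduced testcase in this report not always reproducing the problem.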
[Bug c/66122] Bad uninlining decisions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #5 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
The last time I looked at the kernel build with -Os, all cases were simply caused by:

ipa-inline.c (lines 820-828):

      /* If call is cold, do not inline when function body would grow. */
      else if (!e->maybe_hot_p ()
               && (growth >= MAX_INLINE_INSNS_SINGLE
                   || growth_likely_positive (callee, growth)))
        {
          e->inline_failed = CIF_UNLIKELY_CALL;
          want_inline = false;
        }
    }
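The condition quoted from ipa-inline.c can be read as a small predicate. The sketch below is a simplified standalone model, not GCC's code: the function name, the flat boolean parameters, and the placeholder limit of 400 are all assumptions made for illustration. In GCC the limit is a tunable parameter, and growth_likely_positive is a function of the callee and the growth estimate rather than a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder value for illustration only; the real limit is a GCC
 * tunable and may differ between versions. */
#define MAX_INLINE_INSNS_SINGLE_SKETCH 400

static_assert(MAX_INLINE_INSNS_SINGLE_SKETCH > 0,
              "the limit in this sketch must be positive");

/* Returns true when the modelled rule would refuse to inline:
 * the call is cold AND inlining is expected to grow the code. */
bool cold_call_blocks_inlining(bool call_maybe_hot, int growth,
                               bool growth_likely_positive)
{
    return !call_maybe_hot
           && (growth >= MAX_INLINE_INSNS_SINGLE_SKETCH
               || growth_likely_positive);
}
```

Read this way, a hot call is never rejected by this particular rule, while a cold call is rejected as soon as the growth estimate is judged likely to be positive, even for very small callees.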