[Bug c/66122] Bad uninlining decisions

2015-05-13 Thread vda.linux at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #6 from Denis Vlasenko vda.linux at googlemail dot com ---
Got hold of a machine with gcc version 5.1.1 20150422 (Red Hat 5.1.1-1).

Pulled current Linus kernel tree and built it with this config:
http://busybox.net/~vda/kernel_config2
Note that CONFIG_CC_OPTIMIZE_FOR_SIZE is not set, i.e. it's a -O2 build.

Selecting duplicate functions still shows a number of tiny uninlined functions:

$ nm --size-sort vmlinux | grep -iF ' t ' | uniq -c | grep -v '^ *1 ' | sort -rn
 83 008a t rcu_read_lock_sched_held
 48 001b t sd_driver_init
 48 0012 t sd_driver_exit
 48 0008 t __initcall_sd_driver_init6
 47 0020 t usb_serial_module_init
 47 0012 t usb_serial_module_exit
 47 0008 t __initcall_usb_serial_module_init6
 45 0057 t uni2char
 45 0025 t char2uni
 43 001f t sd_probe
 40 006a t rcu_read_unlock
 29 005a t cpumask_next
 27 007a t rcu_read_lock
 27 0011 t kzalloc
 24 0022 t arch_local_save_flags
 23 0041 t cpumask_check
 19 0017 t phy_module_init
 19 0017 t phy_module_exit
 19 0008 t __initcall_phy_module_init6
 18 006c t spi_write
 18 003f t show_alarm
 18 000b t bitmap_weight
 15 0037 t show_alarms
 15 0014 t init_once
 14 0603 t init_engine
 14 0354 t pcm_trigger
 14 033b t pcm_open
 14 00f8 t stop_transport
 14 00db t pcm_close
 14 00c8 t set_meters_on
 14 00b5 t write_dsp
 14 00b5 t pcm_hw_free
 14 0091 t pcm_pointer
 14 0090 t hw_rule_playback_channels_by_format
 14 008d t send_vector
 14 004f t snd_echo_vumeters_info
 14 0042 t hw_rule_sample_rate
 14 003e t snd_echo_vumeters_switch_put
 14 0034 t audiopipe_free
 14 002b t snd_echo_channels_info_info
 14 0024 t snd_echo_remove
 14 001b t echo_driver_init
 14 0019 t pcm_analog_out_hw_params
 14 0019 t arch_local_irq_restore
 14 0014 t snd_echo_dev_free
 14 0012 t echo_driver_exit
 14 0008 t __initcall_echo_driver_init6
 13 0127 t pcm_analog_out_open
 13 0127 t pcm_analog_in_open
 13 0039 t qdisc_peek_dequeued
 13 0037 t cpumask_check
 13 0022 t arch_local_irq_restore
 13 001c t pcm_analog_in_hw_params
 13 0006 t bcma_host_soc_unregister_driver
 12 0053 t nlmsg_trim
...

Such as:
811a42e0 kzalloc:
811a42e0:   55                      push   %rbp
811a42e1:   81 ce 00 80 00 00       or     $0x8000,%esi
811a42e7:   48 89 e5                mov    %rsp,%rbp
811a42ea:   e8 f1 92 1a 00          callq  __kmalloc
811a42ef:   5d                      pop    %rbp
811a42f0:   c3                      retq

810792d0 bitmap_weight:
810792d0:   55                      push   %rbp
810792d1:   48 89 e5                mov    %rsp,%rbp
810792d4:   e8 37 a8 b7 00          callq  __bitmap_weight
810792d9:   5d                      pop    %rbp
810792da:   c3                      retq

and even
88566c9b bcma_host_soc_unregister_driver:
88566c9b:   55                      push   %rbp
88566c9c:   48 89 e5                mov    %rsp,%rbp
88566c9f:   5d                      pop    %rbp
88566ca0:   c3                      retq

This is an *empty function* from drivers/bcma/bcma_private.h:103 uninlined:
static inline void __exit bcma_host_soc_unregister_driver(void)
{
}

BTW it doesn't even have any callers in vmlinux. It should have been optimized
out.


[Bug c/66122] Bad uninlining decisions

2015-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #7 from Richard Biener rguenth at gcc dot gnu.org ---
You can look at the -Winline output.


[Bug c/66122] Bad uninlining decisions

2015-05-12 Thread vda.linux at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #2 from Denis Vlasenko vda.linux at googlemail dot com ---
Tested with gcc-4.9.2. The attached testcase doesn't exhibit the bug, but
compiling the same kernel tree, with the same .config, and then running

nm --size-sort vmlinux | grep -iF ' t ' | uniq -c | grep -v '^ *1 ' | sort -rn

reveals that now other functions get wrongly deinlined:

  8 0028 t acpi_os_allocate_zeroed
  7 0011 t dst_output_sk
  7 000b t hweight_long
  5 0023 t umask_show
  5 000f t init_once
  4 0047 t uni2char
  4 0028 t cmask_show
  4 0025 t inv_show
  4 0025 t edge_show
  4 0020 t char2uni
  4 001f t event_show
  4 001d t acpi_node
  4 0012 t t_stop
  4 0012 t dst_discard
  4 0011 t kzalloc
  4 000b t udp_lib_close
  4 0006 t udp_lib_hash
  3 0059 t get_expiry
  3 0025 t __uncore_inv_show
  3 0025 t __uncore_edge_show
  3 0023 t __uncore_umask_show
  3 0023 t name_show
  3 0022 t acpi_os_allocate
  3 001f t __uncore_event_show
  3 000d t cpumask_set_cpu
  3 000a t nofill
...

For example, hweight_long:

static inline unsigned long hweight_long(unsigned long w)
{
        return sizeof(w) == 4 ? hweight32(w) : hweight64(w);
}

was not expected by the programmer to be deinlined, but it was:

81009c40 hweight_long:
81009c40:   55                      push   %rbp
81009c41:   e8 da eb 31 00          callq  81328820 __sw_hweight64
81009c46:   48 89 e5                mov    %rsp,%rbp
81009c49:   5d                      pop    %rbp
81009c4a:   c3                      retq
81009c4b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

I'm going to find and attach a file which deinlines hweight_long.


[Bug c/66122] Bad uninlining decisions

2015-05-12 Thread vda.linux at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #3 from Denis Vlasenko vda.linux at googlemail dot com ---
Created attachment 35530
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35530&action=edit
Preprocessed example exhibiting a bug on gcc-4.9.2

This is a preprocessed kernel/pid.c file from kernel source.
When built with -O2, it wrongly deinlines hweight_long.

$ gcc -O2 -c pid.preprocessed.c -o kernel.pid.o
$ objdump -dr kernel.pid.o | grep -A3 hweight_long
 hweight_long:
   0:   e8 00 00 00 00          callq  5 hweight_long+0x5
                        1: R_X86_64_PC32        __sw_hweight64-0x4
   5:   c3                      retq
$ gcc -v 2>&1 | tail -1
gcc version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC)


[Bug c/66122] Bad uninlining decisions

2015-05-12 Thread vda.linux at googlemail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #1 from Denis Vlasenko vda.linux at googlemail dot com ---
Created attachment 35528
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=35528&action=edit
Preprocessed example exhibiting a bug

This is a preprocessed kernel/locking/mutex.c file from kernel source.
When built with either -O2 or -Os, it wrongly deinlines spin_lock() and
spin_unlock():

$ gcc -O2 -c mutex.preprocessed.c -o mutex.preprocessed.o
$ objdump -dr mutex.preprocessed.o
mutex.preprocessed.o: file format elf64-x86-64
Disassembly of section .text:
 spin_unlock:
   0:   80 07 01                addb   $0x1,(%rdi)
   3:   c3                      retq
   4:   66 66 66 2e 0f 1f 84    data32 data32 nopw %cs:0x0(%rax,%rax,1)
   b:   00 00 00 00 00
0010 __mutex_init:
...
0040 spin_lock:
  40:   e9 00 00 00 00          jmpq   45 spin_lock+0x5
                        41: R_X86_64_PC32       _raw_spin_lock-0x4
  45:   66 66 2e 0f 1f 84 00    data32 nopw %cs:0x0(%rax,%rax,1)
  4c:   00 00 00 00

These functions are defined as:

static inline __attribute__((no_instrument_function)) void
spin_unlock(spinlock_t *lock)
{
 __raw_spin_unlock(&lock->rlock);
}

static inline __attribute__((no_instrument_function)) void
spin_lock(spinlock_t *lock)
{
 _raw_spin_lock(&lock->rlock);
}

and the programmer's intent was that they would always be inlined.

This is with gcc-4.7.2.


[Bug c/66122] Bad uninlining decisions

2015-05-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org ---
gcc 4.7 is not supported anymore, so there is no point reporting issues against
it.
As for #c3, that got fixed (well, improved: inline, unlike the always_inline
attribute, is just an optimization hint) by r219108, which is certainly not
backportable to the 4.9 branch.
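
To illustrate the distinction between the inline hint and the always_inline
attribute, here is a minimal sketch. The function names (popcount_hint,
popcount_forced, count_both) are hypothetical and are not taken from the
kernel; the bodies use GCC's __builtin_popcountl as a stand-in for the
kernel's hweight helpers.

```c
/* One possible build line:  gcc -O2 -Winline -c example.c
   -Winline asks GCC to warn when a function declared inline
   could not be inlined. */

/* Plain "inline" is only a hint: GCC's cost heuristics may still
   decide to emit and call an out-of-line copy. */
static inline unsigned long popcount_hint(unsigned long w)
{
    return (unsigned long)__builtin_popcountl(w);
}

/* always_inline asks GCC to inline regardless of those heuristics. */
static inline __attribute__((always_inline)) unsigned long
popcount_forced(unsigned long w)
{
    return (unsigned long)__builtin_popcountl(w);
}

/* Hypothetical caller using both variants. */
unsigned long count_both(unsigned long a, unsigned long b)
{
    return popcount_hint(a) + popcount_forced(b);
}
```

Both variants compute the same result; they differ only in how much freedom
the compiler has when deciding whether to emit a separate function body.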


[Bug c/66122] Bad uninlining decisions

2015-05-12 Thread trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

--- Comment #5 from Markus Trippelsdorf trippels at gcc dot gnu.org ---
The last time I looked at the kernel build with -Os, all cases
were simply caused by:

ipa-inline.c:
  /* If call is cold, do not inline when function body would grow. */
  else if (!e->maybe_hot_p ()
           && (growth >= MAX_INLINE_INSNS_SINGLE
               || growth_likely_positive (callee, growth)))
    {
      e->inline_failed = CIF_UNLIKELY_CALL;
      want_inline = false;
    }
}
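
The condition quoted above can be read as a small predicate. The following is
a simplified, standalone model, not GCC's actual code: the function name and
all four parameters are hypothetical stand-ins for the GCC internals named in
the comments.

```c
#include <stdbool.h>

/* Simplified model of the quoted check. Assumed mapping to GCC internals:
     call_maybe_hot          ~ e->maybe_hot_p ()
     growth                  ~ estimated size growth caused by inlining
     max_inline_insns_single ~ MAX_INLINE_INSNS_SINGLE
     growth_likely_positive  ~ growth_likely_positive (callee, growth)
   Returns true when the cold-call rule would reject inlining. */
bool cold_call_blocks_inlining(bool call_maybe_hot, int growth,
                               int max_inline_insns_single,
                               bool growth_likely_positive)
{
    return !call_maybe_hot
           && (growth >= max_inline_insns_single || growth_likely_positive);
}
```

Read this way, a call that GCC considers cold is rejected by this rule
whenever inlining is expected to make the caller larger, while calls that may
be hot are not affected by it.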