Public bug reported:

UBSAN reports undefined behaviour in ./include/linux/net_dim.h:243:6.
We saw the following trace while running traffic in our regression tests:

[12885.292500] UBSAN: Undefined behaviour in ./include/linux/net_dim.h:243:6
[12885.296358] signed integer overflow:
[12885.300100] 358869104 * 100 cannot be represented in type 'int'
[12885.304001] CPU: 2 PID: 19630 Comm: sock_stream_tes Tainted: G           OE    4.15.0-rc8-for-upstream-dbg-2018-01-25_19-31-23-61 #1
[12885.311856] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[12885.316091] Call Trace:
[12885.320234]  <IRQ>
[12885.324366]  dump_stack+0xd1/0x159
[12885.328586]  ? dma_virt_map_sg+0x147/0x147
[12885.332804]  ? val_to_string.constprop.4+0x88/0xd1
[12885.337055]  ubsan_epilogue+0x9/0x49
[12885.341345]  handle_overflow+0x15e/0x189
[12885.345636]  ? __ubsan_handle_negate_overflow+0x108/0x108
[12885.349891]  ? kvm_clock_read+0x1f/0x30
[12885.354230]  ? ktime_get+0x18d/0x280
[12885.358654]  ? getrawmonotonic64+0x320/0x320
[12885.363116]  ? mark_lock+0x1cf/0xc50
[12885.367624]  ? inet_recvmsg+0x121/0x4a0
[12885.372114]  mlx5e_napi_poll+0x1199/0x15c0 [mlx5_core]
[12885.376774]  ? mlx5e_rx_dim_work+0x160/0x160 [mlx5_core]
[12885.381406]  ? print_irqtrace_events+0x120/0x120
[12885.385907]  ? mark_held_locks+0x93/0x100
[12885.392099]  ? print_irqtrace_events+0x120/0x120
[12885.396589]  ? trace_hardirqs_on_caller+0x206/0x390
[12885.401278]  ? kasan_slab_free+0x87/0xc0
[12885.406000]  ? pvclock_clocksource_read+0x146/0x280
[12885.410608]  ? mark_held_locks+0x71/0x100
[12885.415251]  net_rx_action+0x58c/0x10a0
[12885.419873]  ? napi_complete_done+0x3d0/0x3d0
[12885.424385]  ? check_chain_key+0x150/0x260
[12885.428784]  ? debug_check_no_locks_freed+0x200/0x200
[12885.433041]  ? match_held_lock+0x8a/0x4f0
[12885.437215]  ? match_held_lock+0x8a/0x4f0
[12885.441249]  ? lock_downgrade+0x3e0/0x3e0
[12885.445151]  ? do_raw_spin_unlock+0x14d/0x230
[12885.448970]  ? save_trace+0x1f0/0x1f0
[12885.452664]  ? save_trace+0x1f0/0x1f0
[12885.456224]  ? match_held_lock+0xa2/0x4f0
[12885.459668]  ? pvclock_clocksource_read+0x146/0x280
[12885.463085]  ? save_trace+0x1f0/0x1f0
[12885.466361]  ? preempt_count_sub+0x14/0xd0
[12885.469566]  ? __lock_is_held+0x5d/0x110
[12885.472665]  ? preempt_count_sub+0x14/0xd0
[12885.475653]  ? __lock_is_held+0x5d/0x110
[12885.478529]  ? mark_lock+0x1cf/0xc50
[12885.481276]  ? match_held_lock+0xa2/0x4f0
[12885.483984]  ? print_irqtrace_events+0x120/0x120
[12885.486679]  ? save_trace+0x1f0/0x1f0
[12885.490891]  ? irq_exit+0x150/0x150
[12885.493454]  ? __napi_schedule+0x1ae/0x220
[12885.495936]  ? netdev_master_upper_dev_link+0x20/0x20
[12885.498402]  ? check_chain_key+0x150/0x260
[12885.500774]  ? __tasklet_schedule+0x22/0xf0
[12885.503086]  ? match_held_lock+0xa2/0x4f0
[12885.505431]  ? mlx5_eq_int+0x821/0xb50 [mlx5_core]
[12885.507775]  ? save_trace+0x1f0/0x1f0
[12885.510082]  ? pvclock_clocksource_read+0x146/0x280
[12885.512416]  ? pvclock_read_flags+0x80/0x80
[12885.514705]  ? save_trace+0x1f0/0x1f0
[12885.516995]  ? __handle_irq_event_percpu+0x1b0/0x800
[12885.519305]  ? __lock_is_held+0x5d/0x110
[12885.521630]  __do_softirq+0x248/0xba9
[12885.523913]  ? __irqentry_text_end+0x1f8a70/0x1f8a70
[12885.526234]  ? pvclock_clocksource_read+0x146/0x280
[12885.528563]  ? pvclock_read_flags+0x80/0x80
[12885.530843]  ? do_raw_spin_trylock+0x120/0x120
[12885.533178]  ? kvm_clock_read+0x1f/0x30
[12885.535432]  ? kvm_sched_clock_read+0x5/0x10
[12885.537702]  ? sched_clock_cpu+0x14/0x1f0
[12885.539968]  irq_exit+0xf4/0x150
[12885.542186]  do_IRQ+0xe8/0x1e0
[12885.544390]  common_interrupt+0xa2/0xa2
[12885.546607]  </IRQ>

The int overflow is in this macro in include/linux/net_dim.h:
#define IS_SIGNIFICANT_DIFF(val, ref) \
(((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
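
To make the arithmetic in the trace concrete: with a 32-bit int, INT_MAX is 2147483647, while 358869104 * 100 is 35886910400, so the product cannot be represented. The following is a minimal userspace sketch, not kernel code. The helper name diff_times_100_overflows_int is hypothetical, and the sample inputs are assumptions chosen only so that the difference matches the operand UBSAN printed. It does the multiplication in a wider type so the check itself does not overflow.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of net_dim.h): returns 1 if
 * 100 * |val - ref| would not fit in an int, 0 otherwise.
 * The product is computed in long long so the check is well defined.
 */
static int diff_times_100_overflows_int(int val, int ref)
{
	long long diff = llabs((long long)val - (long long)ref);
	long long scaled = 100LL * diff;

	return scaled > INT_MAX;
}
```

For example, diff_times_100_overflows_int(358869105, 1) reports an overflow, because the difference is 358869104, while a small pair such as (1000, 900) does not.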


The include/linux/net_dim.h library is new in kernel 4.16; in the 4.15 kernel
this code was in drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c.

The upstream fix for this issue is:
commit f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33
Author: Tal Gilboa <ta...@mellanox.com>
Date:   Thu Mar 29 13:53:52 2018 +0300

    net/dim: Fix int overflow

    When calculating difference between samples, the values
    are multiplied by 100. Large values may cause int overflow
    when multiplied (usually on first iteration).
    Fixed by forcing 100 to be of type unsigned long.

    Fixes: 4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
    Signed-off-by: Tal Gilboa <ta...@mellanox.com>
    Reviewed-by: Andy Gospodarek <go...@broadcom.com>
    Signed-off-by: David S. Miller <da...@davemloft.net>

diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index bebeaad..29ed8fd 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -231,7 +231,7 @@ static inline void net_dim_exit_parking(struct net_dim *dim)
 }

 #define IS_SIGNIFICANT_DIFF(val, ref) \
-   (((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
+ (((100UL * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */

 static inline int net_dim_stats_compare(struct net_dim_stats *curr,
                                        struct net_dim_stats *prev)
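
The effect of the 100UL change can be tried outside the kernel. The following is a minimal sketch under stated assumptions: it uses the userspace abs() from stdlib.h as a stand-in for the kernel's abs(), it assumes a positive ref and a 64-bit unsigned long, and the wrapper name is_significant_diff is hypothetical. With 100UL the multiplication is carried out in unsigned long, so the product from the trace (35886910400) is representable.

```c
#include <stdlib.h>

/* Patched macro as in the upstream commit; userspace abs() is an assumption. */
#define IS_SIGNIFICANT_DIFF(val, ref) \
	(((100UL * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */

/* Hypothetical wrapper so the macro can be called like a function. */
static int is_significant_diff(int val, int ref)
{
	return IS_SIGNIFICANT_DIFF(val, ref);
}
```

With these definitions, is_significant_diff(358869105, 1) evaluates without signed overflow and reports a significant difference, (120, 100) is significant (20%), and (105, 100) is not (5%).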


I will send a patch to the Ubuntu kernel mailing list with the fix
backported to the old location.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete


** Tags: bionic

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1763269

Title:
  Mellanox [mlx5] [bionic] UBSAN: Undefined behaviour in
  ./include/linux/net_dim.h
