[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
inaddy@mylinux ~/Work/Kernel/Ubuntu/ubuntu-trusty master git tag --contains 64863995563d71836fa48b743148dce993154a4e
Ubuntu-3.13.0-60.99
Ubuntu-3.13.0-62.101
Ubuntu-3.13.0-62.102
Ubuntu-3.13.0-63.103
Ubuntu-3.13.0-64.104
Ubuntu-3.13.0-65.105

linux-image-generic | 3.13.0.24.28 | trusty          | amd64, arm64, armhf, i386, ppc64el
linux-image-generic | 3.13.0.65.71 | trusty-security | amd64, arm64, armhf, i386, ppc64el
linux-image-generic | 3.13.0.65.71 | trusty-updates  | amd64, arm64, armhf, i386, ppc64el

This is already fixed. Updating case status.

** Changed in: linux (Ubuntu Trusty)
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1461620

Title:
  NUMA task migration race condition due to stop task not being checked when balancing happens

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1461620/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
This bug was fixed in the package linux - 3.19.0-26.28

---
linux (3.19.0-26.28) vivid; urgency=low

  [ Luis Henriques ]
  * Release Tracking Bug - LP: #1483630

  [ Upstream Kernel Changes ]
  * Revert Bluetooth: ath3k: Add support of 04ca:300d AR3012 device

linux (3.19.0-26.27) vivid; urgency=low

  [ Luis Henriques ]
  * Release Tracking Bug - LP: #1479055
  * [Config] updateconfigs for 3.19.8-ckt4 stable update

  [ Chris J Arges ]
  * [Config] Add MTD_POWERNV_FLASH and OPAL_PRD - LP: #1464560

  [ Mika Kuoppala ]
  * SAUCE: i915_bpo: drm/i915: Fix divide by zero on watermark update - LP: #1473175

  [ Tim Gardner ]
  * [Config] ACORN_PARTITION=n - LP: #1453117
  * [Config] Add i40e[vf] to d-i - LP: #1476393

  [ Timo Aaltonen ]
  * SAUCE: i915_bpo: Rebase to v4.2-rc3 - LP: #1473175
  * SAUCE: i915_bpo: Revert mm/fault, drm/i915: Use pagefault_disabled() to check for disabled pagefaults - LP: #1473175
  * SAUCE: i915_bpo: Revert drm: i915: Port to new backlight interface selection API - LP: #1473175

  [ Upstream Kernel Changes ]
  * Revert tools/vm: fix page-flags build - LP: #1473547
  * Revert ALSA: hda - Add mute-LED mode control to Thinkpad - LP: #1473547
  * Revert drm/radeon: adjust pll when audio is not enabled - LP: #1473547
  * Revert crypto: talitos - convert to use be16_add_cpu() - LP: #1479048
  * module: Call module notifier on failure after complete_formation() - LP: #1473547
  * gpio: gpio-kempld: Fix get_direction return value - LP: #1473547
  * ARM: dts: imx27: only map 4 Kbyte for fec registers - LP: #1473547
  * ARM: 8356/1: mm: handle non-pmd-aligned end of RAM - LP: #1473547
  * x86/mce: Fix MCE severity messages - LP: #1473547
  * mac80211: don't use napi_gro_receive() outside NAPI context - LP: #1473547
  * iwlwifi: mvm: Free fw_status after use to avoid memory leak - LP: #1473547
  * iwlwifi: mvm: clean net-detect info if device was reset during suspend - LP: #1473547
  * drm/plane-helper: Adapt cursor hack to transitional helpers - LP: #1473547
  * ARM: dts: set display clock correctly for exynos4412-trats2 - LP: #1473547
  * hwmon: (ntc_thermistor) Ensure iio channel is of type IIO_VOLTAGE - LP: #1473547
  * mfd: da9052: Fix broken regulator probe - LP: #1473547
  * ALSA: hda - Fix noise on AMD radeon 290x controller - LP: #1473547
  * lguest: fix out-by-one error in address checking. - LP: #1473547
  * xfs: xfs_attr_inactive leaves inconsistent attr fork state behind - LP: #1473547
  * xfs: xfs_iozero can return positive errno - LP: #1473547
  * fs, omfs: add NULL terminator in the end up the token list - LP: #1473547
  * omfs: fix sign confusion for bitmap loop counter - LP: #1473547
  * d_walk() might skip too much - LP: #1473547
  * dm: fix casting bug in dm_merge_bvec() - LP: #1473547
  * hwmon: (nct6775) Add missing sysfs attribute initialization - LP: #1473547
  * hwmon: (nct6683) Add missing sysfs attribute initialization - LP: #1473547
  * target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST - LP: #1473547
  * net: phy: bcm7xxx: Fix 7425 PHY ID and flags - LP: #1473547
  * fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings - LP: #1473547
  * i2c: hix5hd2: Fix modalias to make module auto-loading work - LP: #1473547
  * i2c: s3c2410: fix oops in suspend callback for non-dt platforms - LP: #1473547
  * iio: adis16400: Report pressure channel scale - LP: #1473547
  * iio: adis16400: Use != channel indices for the two voltage channels - LP: #1473547
  * iio: adis16400: Compute the scan mask from channel indices - LP: #1473547
  * iio: adis16400: Fix burst mode - LP: #1473547
  * iio: adis16400: Fix burst transfer for adis16448 - LP: #1473547
  * USB: serial: ftdi_sio: Add support for a Motion Tracker Development Board - LP: #1473547
  * iio: adc: twl6030-gpadc: Fix modalias - LP: #1473547
  * usb: make module xhci_hcd removable - LP: #1473547
  * usb: host: xhci: add mutex for non-thread-safe data - LP: #1473547
  * serial: imx: Fix DMA handling for IDLE condition aborts - LP: #1473547
  * usb: dwc3: gadget: Fix incorrect DEPCMD and DGCMD status macros - LP: #1473547
  * brcmfmac: avoid null pointer access when brcmf_msgbuf_get_pktid() fails - LP: #1473547
  * ALSA: usb-audio: Add mic volume fix quirk for Logitech Quickcam Fusion - LP: #1473547
  * n_tty: Fix auditing support for cannonical mode - LP: #1473547
  * drivers/base: cacheinfo: handle absence of caches - LP: #1473547
  * drm/i915/hsw: Fix workaround for server AUX channel clock divisor - LP: #1473547
  * MIPS: ralink: Fix clearing the illegal access interrupt - LP: #1473547
  * x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers - LP: #1473547
  * lib: Fix strnlen_user() to not touch memory after
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
Started verifying the fix. Will provide results soon.
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
Trusty verification:

inaddy@sf00079894trusty:~$ uname -a
Linux sf00079894trusty 3.13.0-62-generic #101-Ubuntu SMP Thu Jul 30 09:01:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

inaddy@sf00079894trusty:~$ sudo trace-cmd report | grep stop_two_cpus | wc -l
74

In 5 seconds the logic was executed 74 times. I kept it running for quite some time and it does not look like there is a regression.

Marking this as verification-done-trusty. Moving on to Vivid's verification...

** Tags removed: verification-needed-trusty
** Tags added: verification-done-trusty
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
Vivid verification:

inaddy@sf00079894vivid:~$ uname -a
Linux sf00079894vivid 3.19.0-26-generic #27-Ubuntu SMP Tue Jul 28 18:27:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

inaddy@sf00079894vivid:~$ sudo trace-cmd report | grep stop_two_cpus | wc -l
46

In 5 seconds the logic was executed 46 times. I kept it running for quite some time and it does not look like there is a regression.

Marking this as verification-done-vivid.

Thank you

** Tags removed: verification-done-trusty verification-needed-vivid
** Tags added: verification-done
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
** Tags added: sts

** Tags added: cts
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-trusty
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-vivid
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
** Branch linked: lp:ubuntu/trusty-proposed/linux-lts-vivid
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
This bug was fixed in the package linux - 4.1.0-3.3

---
linux (4.1.0-3.3) wily; urgency=low

  [ Andy Whitcroft ]
  * Release Tracking Bug - LP: #1478897

  [ Colin Ian King ]
  * SAUCE: KEYS: ensure we free the assoc array edit if edit is valid - CVE-2015-1333

  [ Seth Forshee ]
  * SAUCE: overlayfs: Enable user namespace mounts for the overlay fstype - LP: #1478578

  [ Upstream Kernel Changes ]
  * sched/stop_machine: Fix deadlock between multiple stop_two_cpus() - LP: #1461620
  * x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
  * x86/nmi/64: Remove asm code that saves cr2
  * x86/nmi/64: Switch stacks on userspace NMI entry
  * x86/nmi/64: Reorder nested NMI checks
  * x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection

 -- Andy Whitcroft a...@canonical.com Tue, 28 Jul 2015 11:59:03 +0100

** Changed in: linux (Ubuntu)
       Status: Fix Committed => Fix Released

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-1333
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
** Changed in: linux (Ubuntu)
       Status: Invalid => Fix Committed
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
** Changed in: linux (Ubuntu Trusty)
       Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Vivid)
       Status: In Progress => Fix Committed
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
** Description changed:

  SRU Justification:

- Impact:
- - Deadlock when migrating processes in between NUMA domains.
- - Came with 1 kernel dump given to me.
- - Hard to trigger.
+ Impact:
+ - Deadlock when migrating processes in between NUMA domains.
+ - Came with 1 kernel dump given to me.
+ - Hard to trigger.

- Fix:
- - Upstream development after upstream discussion.
- - Discussion: https://lkml.org/lkml/2015/6/15/531
+ Fix:
+ - Upstream development after upstream discussion.
+ - Discussion: https://lkml.org/lkml/2015/6/15/531

- Testcase:
- - Stress test in a virtual NUMA environment
- - Wait indefinitely... Hard to trigger
- - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1461620/comments/8
- - Can, at least, make sure the logic did not introduce regression
+ Testcase:
+ - Stress test in a virtual NUMA environment
+ - Wait indefinitely... Hard to trigger
+ - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1461620/comments/8
+ - Can, at least, make sure the logic did not introduce regression

  It was brought to my attention the follow kernel panic:

  [3367068.076488] Code: 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 ed 75 66 f0 41 ff 4c 24 24 74 26 89 da 83 fa 04 74 3d f3 90 41 8b 5c 24 20 39 d3 74 f0 83 fb 02 75 d7 fa 66 0f 1f 44 00 00 eb d8 66 0f 1f
  [3367068.092735] BUG: soft lockup - CPU#16 stuck for 22s! [migration/16:153]
  [3367068.100368] Modules linked in: iptable_raw xt_nat xt_REDIRECT veth openvswitch(OF) gre vxlan ip_tunnel libcrc32c dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables ipmi_devintf 8021q garp stp mrp llc bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel gpio_ich joydev aes_x86_64 lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sb_edac wmi lpc_ich edac_core mac_hid acpi_power_meter nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache lp parport hid_generic ixgbe fnic libfcoe dca ptp
  [3367068.100409] libfc megaraid_sas pps_core mdio usbhid scsi_transport_fc hid enic scsi_tgt
  [3367068.100415] CPU: 16 PID: 153 Comm: migration/16 Tainted: GF O 3.13.0-34-generic #60-Ubuntu
  [3367068.100417] Hardware name: Cisco Systems Inc UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.1.5.4f.0.111320130449 11/13/2013
  [3367068.100419] task: 881fd2f517f0 ti: 881fd2f1c000 task.ti: 881fd2f1c000
  [3367068.100420] RIP: 0010:[810f5944] [810f5944] multi_cpu_stop+0x64/0xf0
  [3367068.100426] RSP: :881fd2f1dd98 EFLAGS: 0246
  [3367068.100427] RAX: 8180af40 RBX: 0086 RCX: a402
  [3367068.100428] RDX: 0001 RSI: RDI: 883e607edb48
  [3367068.100430] RBP: 881fd2f1ddb8 R08: 0282 R09: 0001
  [3367068.100431] R10: b6d8 R11: 881fc374dc80 R12: 00014440
  [3367068.100432] R13: 881fd291ae00 R14: 881fd291ae08 R15: 00020010
  [3367068.100433] FS: () GS:881fffd0() knlGS:
  [3367068.100434] CS: 0010 DS: ES: CR0: 80050033
  [3367068.100435] CR2: 7f6202134b98 CR3: 01c0e000 CR4: 001407e0
  [3367068.100437] Stack:
  [3367068.100438] 883e607edb70 881fffd0ede0 881fffd0ede8 883e607edb48
  [3367068.100441] 881fd2f1de78 810f5b5e 8109dfc4 881fffd14440
  [3367068.100443] 881fd2f1de08 81097508 881fffd14440
  [3367068.100446] Call Trace:
  [3367068.100450] [810f5b5e] cpu_stopper_thread+0x7e/0x150
  [3367068.100454] [8109dfc4] ? vtime_common_task_switch+0x24/0x40
  [3367068.100458] [81097508] ? finish_task_switch+0x128/0x170
  [3367068.100462] [8171fd41] ? __schedule+0x381/0x7d0
  [3367068.100465] [810926af] smpboot_thread_fn+0xff/0x1b0
  [3367068.100467] [810925b0] ? SyS_setgroups+0x1a0/0x1a0
  [3367068.100470] [8108b3d2] kthread+0xd2/0xf0
  [3367068.100473] [8108b300] ? kthread_create_on_node+0x1d0/0x1d0
  [3367068.100477] [8172c6bc] ret_from_fork+0x7c/0xb0
  [3367068.100479] [8108b300] ? kthread_create_on_node+0x1d0/0x1d0
  [3367068.100480] Code: db 85 db 41 0f 95 c5 31 f6 31 d2 eb 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 ed 75 66 f0 41 ff 4c 24 24 74 26 89 da 83 fa 04 74 3d f3 90 41 8b 5c 24 20 39 d3 74 f0 83 fb 02 75 d7

  I'm explaining WHY this is happening in the first comments and HOW to fix it.

** Description changed:

  SRU Justification:

  Impact:
  - Deadlock when migrating processes in between NUMA domains.
  - Came with 1 kernel dump given to me.
  - Hard to trigger.
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
** Description changed:

+ SRU Justification:
+
+ Impact:
+ - Deadlock when migrating processes in between NUMA domains.
+ - Came with 1 kernel dump given to me.
+ - Hard to trigger.
+
+ Fix:
+ - Upstream development after upstream discussion.
+ - Discussion: https://lkml.org/lkml/2015/6/15/531
+
+ Testcase:
+ - Stress test in a virtual NUMA environment
+ - Wait indefinitely... Hard to trigger
+ - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1461620/comments/8
+ - Can, at least, make sure the logic did not introduce regression
+
+
+
  It was brought to my attention the follow kernel panic:

- [3367068.076488] Code: 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 ed 75 66 f0 41 ff 4c 24 24 74 26 89 da 83 fa 04 74 3d f3 90 41 8b 5c 24 20 39 d3 74 f0 83 fb 02 75 d7 fa 66 0f 1f 44 00 00 eb d8 66 0f 1f
- [3367068.092735] BUG: soft lockup - CPU#16 stuck for 22s! [migration/16:153]
- [3367068.100368] Modules linked in: iptable_raw xt_nat xt_REDIRECT veth openvswitch(OF) gre vxlan ip_tunnel libcrc32c dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables ipmi_devintf 8021q garp stp mrp llc bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel gpio_ich joydev aes_x86_64 lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sb_edac wmi lpc_ich edac_core mac_hid acpi_power_meter nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache lp parport hid_generic ixgbe fnic libfcoe dca ptp
- [3367068.100409] libfc megaraid_sas pps_core mdio usbhid scsi_transport_fc hid enic scsi_tgt
- [3367068.100415] CPU: 16 PID: 153 Comm: migration/16 Tainted: GF O 3.13.0-34-generic #60-Ubuntu
- [3367068.100417] Hardware name: Cisco Systems Inc UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.1.5.4f.0.111320130449 11/13/2013
- [3367068.100419] task: 881fd2f517f0 ti: 881fd2f1c000 task.ti: 881fd2f1c000
- [3367068.100420] RIP: 0010:[810f5944] [810f5944] multi_cpu_stop+0x64/0xf0
- [3367068.100426] RSP: :881fd2f1dd98 EFLAGS: 0246
- [3367068.100427] RAX: 8180af40 RBX: 0086 RCX: a402
- [3367068.100428] RDX: 0001 RSI: RDI: 883e607edb48
- [3367068.100430] RBP: 881fd2f1ddb8 R08: 0282 R09: 0001
- [3367068.100431] R10: b6d8 R11: 881fc374dc80 R12: 00014440
- [3367068.100432] R13: 881fd291ae00 R14: 881fd291ae08 R15: 00020010
- [3367068.100433] FS: () GS:881fffd0() knlGS:
- [3367068.100434] CS: 0010 DS: ES: CR0: 80050033
- [3367068.100435] CR2: 7f6202134b98 CR3: 01c0e000 CR4: 001407e0
- [3367068.100437] Stack:
- [3367068.100438] 883e607edb70 881fffd0ede0 881fffd0ede8 883e607edb48
- [3367068.100441] 881fd2f1de78 810f5b5e 8109dfc4 881fffd14440
- [3367068.100443] 881fd2f1de08 81097508 881fffd14440
- [3367068.100446] Call Trace:
- [3367068.100450] [810f5b5e] cpu_stopper_thread+0x7e/0x150
- [3367068.100454] [8109dfc4] ? vtime_common_task_switch+0x24/0x40
- [3367068.100458] [81097508] ? finish_task_switch+0x128/0x170
- [3367068.100462] [8171fd41] ? __schedule+0x381/0x7d0
- [3367068.100465] [810926af] smpboot_thread_fn+0xff/0x1b0
- [3367068.100467] [810925b0] ? SyS_setgroups+0x1a0/0x1a0
- [3367068.100470] [8108b3d2] kthread+0xd2/0xf0
- [3367068.100473] [8108b300] ? kthread_create_on_node+0x1d0/0x1d0
- [3367068.100477] [8172c6bc] ret_from_fork+0x7c/0xb0
- [3367068.100479] [8108b300] ? kthread_create_on_node+0x1d0/0x1d0
+ [3367068.076488] Code: 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 ed 75 66 f0 41 ff 4c 24 24 74 26 89 da 83 fa 04 74 3d f3 90 41 8b 5c 24 20 39 d3 74 f0 83 fb 02 75 d7 fa 66 0f 1f 44 00 00 eb d8 66 0f 1f
+ [3367068.092735] BUG: soft lockup - CPU#16 stuck for 22s! [migration/16:153]
+ [3367068.100368] Modules linked in: iptable_raw xt_nat xt_REDIRECT veth openvswitch(OF) gre vxlan ip_tunnel libcrc32c dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables ipmi_devintf 8021q garp stp mrp llc bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel gpio_ich joydev aes_x86_64 lrw gf128mul glue_helper ablk_helper ipmi_si
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
** Also affects: linux (Ubuntu Vivid)
   Importance: Undecided
       Status: New
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
** Changed in: linux (Ubuntu Vivid)
       Status: New => In Progress

** Changed in: linux (Ubuntu Vivid)
     Assignee: (unassigned) => Rafael David Tinoco (inaddy)
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
I have been running the NUMA tests on 3.13 for some time now and it looks like the change did not introduce any regression...

$ uname -a
Linux sf00079894trusty 3.13.11-ckt22-201507231149 #2 SMP Thu Jul 23 13:45:04 BRT 2015 x86_64 x86_64 x86_64 GNU/Linux

I'm using a virtualized 16 Domains / 16 CPUs NUMA environment with the stress test tool:

$ sudo numactl -H
available: 16 nodes (0-15)
node 0 cpus: 0
node 0 size: 363 MB
node 0 free: 23 MB
node 1 cpus: 1
node 1 size: 121 MB
node 1 free: 7 MB
node 2 cpus: 2
node 2 size: 377 MB
node 2 free: 23 MB
node 3 cpus: 3
node 3 size: 377 MB
node 3 free: 23 MB
node 4 cpus: 4
node 4 size: 377 MB
node 4 free: 23 MB
node 5 cpus: 5
node 5 size: 377 MB
node 5 free: 23 MB
node 6 cpus: 6
node 6 size: 377 MB
node 6 free: 35 MB
node 7 cpus: 7
node 7 size: 313 MB
node 7 free: 19 MB
node 8 cpus: 8
node 8 size: 377 MB
node 8 free: 61 MB
node 9 cpus: 9
node 9 size: 377 MB
node 9 free: 57 MB
node 10 cpus: 10
node 10 size: 377 MB
node 10 free: 63 MB
node 11 cpus: 11
node 11 size: 377 MB
node 11 free: 30 MB
node 12 cpus: 12
node 12 size: 377 MB
node 12 free: 67 MB
node 13 cpus: 13
node 13 size: 377 MB
node 13 free: 68 MB
node 14 cpus: 14
node 14 size: 377 MB
node 14 free: 68 MB
node 15 cpus: 15
node 15 size: 377 MB
node 15 free: 64 MB

$ sudo stress --vm 16 --vm-bytes 314572800 --vm-stride 1 --vm-keep

This causes memory allocations of around 300MB on each node and touches every byte of the allocation (causing all the pages to be hot on the CPU running them).

And generating concurrency:

$ sudo stress --cpu 16

So the kernel scheduler has to migrate tasks, triggering the fixed logic.

I can confirm the logic is being triggered by using ftrace:

$ sudo trace-cmd record -p function -l numa_migrate_preferred -l task_numa_migrate -l migrate_swap -l stop_two_cpus
$ sudo trace-cmd report | grep stop_two_cpus | wc -l
162

And I can't find any regression.

I'll let the tests run a bit more and will suggest the fix to our kernel team to merge it as a Stable Release Update for Trusty, Utopic and Vivid.
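To double-check the sizes used in this setup, here is a minimal Python sketch that parses `numactl -H` style text and compares each node's size with the `--vm-bytes` value passed to stress. The helper name `parse_numactl_hardware` and the shortened sample are my own illustration, not part of any tool; the line format is assumed to match the output pasted in this comment.

```python
import re

def parse_numactl_hardware(text):
    """Parse `numactl -H` style lines into {node: {"cpus", "size_mb", "free_mb"}}.

    Hypothetical helper: assumes lines look like "node 0 size: 363 MB".
    """
    nodes = {}
    for line in text.splitlines():
        m = re.match(r"node (\d+) (cpus|size|free): (.*)", line.strip())
        if not m:
            continue  # skips "available: ..." and any other line
        node, key, value = int(m.group(1)), m.group(2), m.group(3)
        entry = nodes.setdefault(node, {})
        if key == "cpus":
            entry["cpus"] = [int(c) for c in value.split()]
        else:
            entry[key + "_mb"] = int(value.split()[0])
    return nodes

# Value passed to `stress --vm-bytes` in this comment.
VM_BYTES = 314572800
vm_mb = VM_BYTES // (1024 * 1024)  # 300

# Shortened sample taken from the numactl output above (nodes 0 and 1 only).
sample = """available: 16 nodes (0-15)
node 0 cpus: 0
node 0 size: 363 MB
node 0 free: 23 MB
node 1 cpus: 1
node 1 size: 121 MB
node 1 free: 7 MB
"""

nodes = parse_numactl_hardware(sample)
# Nodes whose total size is smaller than one stress worker's allocation.
too_small = [n for n, info in nodes.items() if info["size_mb"] < vm_mb]
print(vm_mb, too_small)
```

On the two sample nodes this reports that each worker allocates 300 MiB and that node 1 (121 MB) is smaller than a single worker's allocation.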
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
Just got an update from Peter: https://lkml.org/lkml/2015/6/15/531 asking for feedback on a patch:

Subject: stop_machine: Fix deadlock between multiple stop_two_cpus()
From: Peter Zijlstra pet...@infradead.org
Date: Fri, 5 Jun 2015 17:30:23 +0200

Will try to test the latest builds + this patch with the NUMA migration test. Unfortunately it is REALLY hard to reproduce the issue so I cannot know if the patch fixed anything, just test if it looks good or not.
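Only the patch title is quoted here, so the following is a toy Python model of how a deadlock between several two-CPU stop requests could arise in general. It is an assumption-laden illustration, not the kernel code and not necessarily the exact scenario the patch addresses. The model assumes each CPU has a FIFO stopper queue and that a work item can only finish once it is at the head of the queue on both of its CPUs. The function name `run_stopper_queues` and the work labels A, B, C are hypothetical.

```python
from collections import deque

def run_stopper_queues(queues, work_cpus):
    """Simulate per-CPU FIFO queues of two-CPU work items (toy model).

    queues:    {cpu: [work ids in queue order]}
    work_cpus: {work id: (cpu_a, cpu_b)}
    A work item completes only when it is at the head of both queues.
    Returns (completed order, {cpu: work it is stuck on}).
    """
    pending = {cpu: deque(items) for cpu, items in queues.items()}
    completed = []
    progressed = True
    while progressed:
        progressed = False
        for work, (a, b) in work_cpus.items():
            if work in completed:
                continue
            if (pending[a] and pending[b]
                    and pending[a][0] == work and pending[b][0] == work):
                pending[a].popleft()
                pending[b].popleft()
                completed.append(work)
                progressed = True
    stuck = {cpu: q[0] for cpu, q in pending.items() if q}
    return completed, stuck

# Three hypothetical requests forming a cycle over CPUs 0, 1 and 2.
work_cpus = {"A": (0, 1), "B": (1, 2), "C": (2, 0)}

# Interleaved queueing: every CPU starts a different work item first.
interleaved = {0: ["A", "C"], 1: ["B", "A"], 2: ["C", "B"]}
# Consistent queueing: each request lands on both CPUs before the next one.
ordered = {0: ["A", "C"], 1: ["A", "B"], 2: ["B", "C"]}

print(run_stopper_queues(interleaved, work_cpus))
print(run_stopper_queues(ordered, work_cpus))
```

In the interleaved case no work item ever reaches the head of both of its queues, so every CPU waits on a partner that is busy waiting on someone else; with consistent ordering all three requests drain.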
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
Sasha pointed me to a fix for this particular behaviour in between 3.16 and 3.17:

https://lkml.org/lkml/2014/4/10/297
[PATCH] sched: Checking for stop task appearance when balancing happens

This indicates that my previous observation:

--- NMI exception stack ---
#4 [883fd2907d98] multi_cpu_stop+0x64 at 810f5944

208 } while (curstate != MULTI_STOP_EXIT);
--- RIP
RIP 0x810f5944 +100: cmp $0x4,%edx --- CHECKING FOR MULTI_STOP_EXIT

RDX: 883fd2907d98 - does not make any sense

was indeed right, and was caused by a stop task being picked by the scheduler when it should not have been.

And this commit is present in:

$ git tag --contains a1d9a3231eac4117cadaf4b6bba5b2902c15a33e
v3.15-rc2
v3.15-rc3
...
v4.1-rc5

So only Trusty is affected.
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
Using ftrace I can confirm that we are frequently triggering the logic responsible for the deadlock, but so far without succeeding in making the deadlock itself happen.

  root@numa:~# trace-cmd record -p function -l numa_migrate_preferred -l task_numa_migrate -l migrate_swap -l stop_two_cpus
  ...
  stress-1547 [012] 136.309393: function: numa_migrate_preferred
  stress-1547 [012] 136.309394: function: task_numa_migrate
  stress-1547 [012] 136.309414: function: migrate_swap
  stress-1547 [012] 136.309414: function: stop_two_cpus
  stress-1539 [017] 136.309519: function: numa_migrate_preferred
  stress-1539 [017] 136.309519: function: task_numa_migrate
  stress-1539 [017] 136.309528: function: migrate_swap
  stress-1539 [017] 136.309528: function: stop_two_cpus
  stress-1563 [006] 136.313389: function: numa_migrate_preferred
  stress-1563 [006] 136.313391: function: task_numa_migrate
  stress-1428 [004] 136.313415: function: numa_migrate_preferred
  stress-1428 [004] 136.313416: function: task_numa_migrate
  stress-1428 [004] 136.313434: function: migrate_swap
  stress-1428 [004] 136.313434: function: stop_two_cpus
  stress-1421 [016] 136.325398: function: numa_migrate_preferred
  stress-1464 [025] 136.386219: function: numa_migrate_preferred
  stress-1464 [025] 136.386221: function: task_numa_migrate
  stress-1464 [025] 136.386240: function: migrate_swap
  stress-1464 [025] 136.386241: function: stop_two_cpus
  stress-1435 [014] 136.400792: function: numa_migrate_preferred
  stress-1435 [014] 136.400793: function: task_numa_migrate
  ...-1513 [023] 136.401345: function: numa_migrate_preferred
  stress-1447 [019] 136.410245: function: numa_migrate_preferred
  stress-1447 [019] 136.410246: function: task_numa_migrate
  stress-1517 [012] 136.413338: function: numa_migrate_preferred
  stress-1554 [024] 136.417383: function: numa_migrate_preferred
  stress-1554 [024] 136.417384: function: task_numa_migrate
  stress-1554 [024] 136.417407: function: migrate_swap
  stress-1554 [024] 136.417408: function: stop_two_cpus
  ...-1507 [023] 136.421348: function: numa_migrate_preferred
  stress-1500 [018] 136.445321: function: numa_migrate_preferred
  stress-1525 [025] 136.473330: function: numa_migrate_preferred
  stress-1472 [029] 136.502245: function: numa_migrate_preferred
  stress-1472 [029] 136.502247: function: task_numa_migrate
  stress-1472 [029] 136.502270: function: migrate_swap
  stress-1472 [029] 136.502270: function: stop_two_cpus
  stress-1496 [004] 136.569273: function: numa_migrate_preferred
  stress-1496 [004] 136.569275: function: task_numa_migrate
  ...

  root@ttwcnuma:~# trace-cmd report | grep stop_two_cpus | wc -l
  475

Meaning that I triggered a task migration between NUMA domains 475 times in less than 3 seconds.
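The same report can be broken down per traced function, which shows how many numa_migrate_preferred calls actually went all the way to stop_two_cpus. This is only a minimal sketch, assuming the report lines keep the "comm-pid [cpu] timestamp: function: name" shape shown in the excerpt above. The helper name count_functions and the SAMPLE text are hypothetical and not part of trace-cmd.

```python
import re
from collections import Counter

# Matches report lines such as:
#   stress-1547 [012] 136.309414: function: stop_two_cpus
# The comm may be truncated by trace-cmd (e.g. "...-1513").
LINE_RE = re.compile(
    r"^\s*(\S+)-(\d+)\s+\[(\d+)\]\s+([\d.]+):\s+function:\s+(\S+)\s*$"
)


def count_functions(report_text):
    """Count how often each traced function appears in the report text."""
    counts = Counter()
    for line in report_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(5)] += 1
    return counts


# A few lines copied from the excerpt above, used as illustrative input.
SAMPLE = """\
stress-1547 [012] 136.309393: function: numa_migrate_preferred
stress-1547 [012] 136.309394: function: task_numa_migrate
stress-1547 [012] 136.309414: function: migrate_swap
stress-1547 [012] 136.309414: function: stop_two_cpus
stress-1563 [006] 136.313389: function: numa_migrate_preferred
stress-1563 [006] 136.313391: function: task_numa_migrate
...-1513 [023] 136.401345: function: numa_migrate_preferred
"""

if __name__ == "__main__":
    for name, hits in count_functions(SAMPLE).most_common():
        print(name, hits)
```

To use it on a real capture, the output of "trace-cmd report" could be saved to a file and its contents passed to count_functions in place of SAMPLE.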
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
** Changed in: linux (Ubuntu Trusty)
   Importance: Undecided => Medium
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
You can follow my comments on LKML:

https://lkml.org/lkml/2015/3/6/484

Basically, in kernel 3.13 we are getting the following situation:

I have a core dump locked in the same place (the state machine for powering a CPU down for the task swap) from a 3.13 kernel (+ upstream patches), and this commit wasn't backported yet.

  - multi_cpu_stop
  - do { } while (curstate != MULTI_STOP_EXIT);

In my case, curstate is WAY different from the enum containing MULTI_STOP_EXIT (4). The register is totally messed up (probably after cpu_relax(), right where you were trapped, after the pause instruction).

My case:

  PID: 118 TASK: 883fd28ec7d0 CPU: 9 COMMAND: migration/9
  ...
  [exception RIP: multi_cpu_stop+0x64]
  RIP: 810f5944 RSP: 883fd2907d98 RFLAGS: 0246
  RAX: 0010 RBX: 0010 RCX: 0246
  RDX: 883fd2907d98 RSI: RDI: 0001
  RBP: 810f5944 R8: 810f5944 R9:
  R10: 883fd2907d98 R11: 0246 R12:
  R13: 883f55d01b48 R14: R15: 0001
  ORIG_RAX: 0001 CS: 0010 SS:
  --- NMI exception stack ---
  #4 [883fd2907d98] multi_cpu_stop+0x64 at 810f5944

  208 } while (curstate != MULTI_STOP_EXIT);

  --- RIP
  RIP 0x810f5944 +100: cmp $0x4,%edx   --- CHECKING FOR MULTI_STOP_EXIT
  RDX: 883fd2907d98 - does not make any sense

If I'm reading this right:

  CPU 05 - PID 14990

  do_numa_page
  task_numa_fault
  numa_migrate_preferred
  task_numa_migrate
  migrate_swap (curr: 14990, task: 14996)
  stop_two_cpus (cpu1=05(14996), cpu2=00(14990))
  wait_for_completion

  14990 - CPU05
  14996 - CPU00

  stop_two_cpus:
    multi_stop_data (msdata->state = MULTI_STOP_PREPARE)
    smp_call_function_single (min=cpu2=00, irq_cpu_stop_queue_work, wait=1)
      smp_call_function_single (ran on lowest CPU, 00 for this case)
      irq_cpu_stop_queue_work
        cpu_stop_queue_work(cpu1=05(14996)) # add work (multi_cpu_stop) to cpu 05 cpu_stopper queue
        cpu_stop_queue_work(cpu2=00(14990)) # add work (multi_cpu_stop) to cpu 00 cpu_stopper queue
    wait_for_completion() -- HERE

In my case, checking the task structs for the tasks scheduled while in wait_for_completion():

  PID 14990 CPU 05 - PID 14996 CPU 00
  PID 14991 CPU 30 - PID 14998 CPU 01
  PID 14992 CPU 30 - PID 14998 CPU 01
  PID 14996 CPU 00 - PID 14992 CPU 30
  PID 14998 CPU 01 - PID 14990 CPU 05

AND

  102 2  6 881fd2ea97f0 RU 0.0 0 0 [migration/6]
  118 2  9 883fd28ec7d0 RU 0.0 0 0 [migration/9]
  143 2 14 883fd29d47d0 RU 0.0 0 0 [migration/14]
  148 2 15 883fd29fc7d0 RU 0.0 0 0 [migration/15]
  153 2 16 881fd2f517f0 RU 0.0 0 0 [migration/16]

THEN

I am still waiting for 5 cpu_stopper_thread - multi_cpu_stop that were just scheduled (probably sitting in the per-CPU queues of CPUs 0, 1, 5 and 30) and are not running yet.

AND

I don't have any wait_for_completion for those OLDER migration threads (6, 9, 14, 15 and 16). Probably wait_for_completion signaled done.completion before racing.

It looks like something messed up curstate in the multi_cpu_stop state machine.
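To make the spinning in the do/while loop easier to picture, here is a deliberately simplified, single-threaded model of the multi_cpu_stop state machine. It is a sketch under assumptions, not the kernel code: the class and function names are hypothetical, only MULTI_STOP_EXIT = 4 comes from the analysis above (the other enum values are my assumption about the ordering), and real stoppers run concurrently with interrupts and cpu_relax() involved. The point it illustrates is that the shared state only advances once every participating stopper has acknowledged the current one, so a stopper that never gets to run leaves the others looping with curstate != MULTI_STOP_EXIT.

```python
from enum import IntEnum


class MultiStopState(IntEnum):
    # Only EXIT = 4 is taken from the analysis above; the rest is assumed.
    NONE = 0
    PREPARE = 1
    DISABLE_IRQ = 2
    RUN = 3
    EXIT = 4


class MultiStopData:
    """Shared data, loosely modelled on msdata in stop_two_cpus()."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.state = int(MultiStopState.PREPARE)
        self.thread_ack = num_threads

    def ack_state(self):
        # The last stopper to acknowledge moves everyone to the next state.
        self.thread_ack -= 1
        if self.thread_ack == 0:
            self.state += 1
            self.thread_ack = self.num_threads


class Stopper:
    """One per-CPU stopper running the multi_cpu_stop loop body."""

    def __init__(self, cpu, msdata):
        self.cpu = cpu
        self.msdata = msdata
        self.curstate = int(MultiStopState.NONE)

    def step(self):
        """One loop iteration; returns True once the loop would exit."""
        if self.curstate == MultiStopState.EXIT:
            return True
        if self.msdata.state != self.curstate:
            self.curstate = self.msdata.state
            self.msdata.ack_state()
        return self.curstate == MultiStopState.EXIT


def simulate(running_cpus, num_threads=2, max_rounds=100):
    """Run only the stoppers in running_cpus; report where each one ends."""
    msdata = MultiStopData(num_threads)
    stoppers = [Stopper(cpu, msdata) for cpu in running_cpus]
    for _ in range(max_rounds):
        done = [s.step() for s in stoppers]
        if all(done):
            break
    return {s.cpu: MultiStopState(s.curstate) for s in stoppers}


if __name__ == "__main__":
    print(simulate([0, 5]))  # both stoppers run
    print(simulate([5]))     # the stopper on CPU 0 never gets to run
```

In the second call the lone stopper acknowledges MULTI_STOP_PREPARE and then never sees a new state, which is the model's equivalent of a migration thread stuck in the loop.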
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
To better understand how easily this bug can be triggered, I created the following test case.

I've been using a KVM guest emulating a NUMA environment with 32 different domains (one for each vCPU):

  root@numa:~# numactl -H
  available: 32 nodes (0-31)
  node 0 cpus: 0
  node 0 size: 237 MB
  node 0 free: 82 MB
  node 1 cpus: 1
  node 1 size: 251 MB
  node 1 free: 15 MB
  node 2 cpus: 2
  node 2 size: 251 MB
  node 2 free: 52 MB
  node 3 cpus: 3
  node 3 size: 251 MB
  node 3 free: 240 MB
  node 4 cpus: 4
  node 4 size: 251 MB
  node 4 free: 15 MB
  node 5 cpus: 5
  node 5 size: 251 MB
  node 5 free: 15 MB
  node 6 cpus: 6
  node 6 size: 251 MB
  node 6 free: 17 MB
  node 7 cpus: 7
  node 7 size: 251 MB
  node 7 free: 15 MB
  node 8 cpus: 8
  node 8 size: 251 MB
  node 8 free: 16 MB
  node 9 cpus: 9
  node 9 size: 251 MB
  node 9 free: 16 MB
  node 10 cpus: 10
  node 10 size: 251 MB
  node 10 free: 15 MB
  node 11 cpus: 11
  node 11 size: 187 MB
  node 11 free: 13 MB
  node 12 cpus: 12
  node 12 size: 251 MB
  node 12 free: 15 MB
  node 13 cpus: 13
  node 13 size: 251 MB
  node 13 free: 17 MB
  node 14 cpus: 14
  node 14 size: 251 MB
  node 14 free: 15 MB
  node 15 cpus: 15
  node 15 size: 251 MB
  node 15 free: 16 MB
  node 16 cpus: 16
  node 16 size: 251 MB
  node 16 free: 17 MB
  node 17 cpus: 17
  node 17 size: 251 MB
  node 17 free: 17 MB
  node 18 cpus: 18
  node 18 size: 251 MB
  node 18 free: 16 MB
  node 19 cpus: 19
  node 19 size: 251 MB
  node 19 free: 15 MB
  node 20 cpus: 20
  node 20 size: 251 MB
  node 20 free: 16 MB
  node 21 cpus: 21
  node 21 size: 251 MB
  node 21 free: 17 MB
  node 22 cpus: 22
  node 22 size: 251 MB
  node 22 free: 51 MB
  node 23 cpus: 23
  node 23 size: 251 MB
  node 23 free: 37 MB
  node 24 cpus: 24
  node 24 size: 251 MB
  node 24 free: 120 MB
  node 25 cpus: 25
  node 25 size: 251 MB
  node 25 free: 115 MB
  node 26 cpus: 26
  node 26 size: 251 MB
  node 26 free: 41 MB
  node 27 cpus: 27
  node 27 size: 251 MB
  node 27 free: 15 MB
  node 28 cpus: 28
  node 28 size: 251 MB
  node 28 free: 15 MB
  node 29 cpus: 29
  node 29 size: 251 MB
  node 29 free: 17 MB
  node 30 cpus: 30
  node 30 size: 251 MB
  node 30 free: 164 MB
  node 31 cpus: 31
  node 31 size: 251 MB
  node 31 free: 228 MB

I am stressing the environment (as you can see from the free memory on every NUMA node) with a tool that allocates a certain amount of memory and touches every 32 bytes of it, dirtying it at the end and then restarting the same behavior.

Together with that, I'm creating enough kernel tasks running concurrently with these memory allocators for them to compete for CPU, forcing the memory threads to migrate between CPUs (and between NUMA domains, since every CPU is inside a different NUMA domain).
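The access pattern of the stress tool described above (allocate, touch every 32 bytes, dirty, start over) can be sketched as follows. This is not the tool that was actually used, which is not named in this thread. The function names, buffer size and round count are hypothetical, and an interpreted loop like this generates far less memory pressure than a native allocator would.

```python
def touch_and_dirty(buf, stride=32):
    """Read one byte every `stride` bytes, then dirty the same bytes.

    Returns the number of locations touched in the read pass.
    """
    touched = 0
    for offset in range(0, len(buf), stride):
        _ = buf[offset]  # read pass: touch the memory
        touched += 1
    for offset in range(0, len(buf), stride):
        buf[offset] = (buf[offset] + 1) & 0xFF  # write pass: dirty it
    return touched


def stress(size_bytes, rounds, stride=32):
    """Allocate a fresh buffer each round and run the touch/dirty passes."""
    total = 0
    for _ in range(rounds):
        buf = bytearray(size_bytes)  # new allocation on every round
        total += touch_and_dirty(buf, stride)
    return total


if __name__ == "__main__":
    # 4 MiB per round is an arbitrary, illustrative size.
    print(stress(4 * 1024 * 1024, rounds=2))
```

Running several copies of such a worker at once, together with CPU-bound competitors, is what would push the scheduler to move the memory-touching tasks between CPUs.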
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
But unfortunately I could not reproduce the issue (although I know it is in there). I'll create a small piece of logic similar to:

  commit a1d9a3231eac4117cadaf4b6bba5b2902c15a33e
  Author: Kirill Tkhai tk...@yandex.ru
  Date: Thu Apr 10 17:38:36 2014 +0400

      sched: Check for stop task appearance when balancing happens

      We need to do it like we do for the other higher priority classes..

      Signed-off-by: Kirill Tkhai tk...@yandex.ru
      Cc: Michael wang wang...@linux.vnet.ibm.com
      Cc: Sasha Levin sasha.le...@oracle.com
      Signed-off-by: Peter Zijlstra pet...@infradead.org
      Link: http://lkml.kernel.org/r/336561397137...@web27h.yandex.ru
      Signed-off-by: Ingo Molnar mi...@kernel.org

where I'll just bypass task selection instead of returning RETRY_TASK. Since the 3.13 scheduler does not have the RETRY_TASK logic, it will just be a matter of not choosing the stop worker (kthread) to run under the same conditions (since the rest is pretty much the same).

Asking for kernel team review while I work on this.
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
** Also affects: linux (Ubuntu Trusty)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu)
       Status: In Progress => Invalid

** Changed in: linux (Ubuntu Trusty)
       Status: New => In Progress

** Changed in: linux (Ubuntu Trusty)
     Assignee: (unassigned) => Rafael David Tinoco (inaddy)

** Changed in: linux (Ubuntu)
     Assignee: Rafael David Tinoco (inaddy) => (unassigned)
[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens
It turns out that the fix relies on checking whether a stop task has appeared, in which case task selection needs to be restarted:

  +       if (need_pull_dl_task(rq, prev)) {
                  pull_dl_task(rq);
  +               /*
  +                * pull_rt_task() can drop (and re-acquire) rq->lock; this
  +                * means a stop task can slip in, in which case we need to
  +                * re-start task selection.
  +                */
  +               if (rq->stop && rq->stop->on_rq)
  +                       return RETRY_TASK;

And this is done by returning RETRY_TASK. This logic was not available in 3.13 AND I don't want to jeopardise our 3.13 scheduler.
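The effect of returning RETRY_TASK can be illustrated with a small model of task selection across scheduling classes. This is a sketch under assumptions, not the kernel implementation: RunQueue, pick_stop, make_pick_dl and pick_idle are hypothetical names, the class list is reduced to three entries, and the "lock was dropped and a stop task slipped in" event is simulated by setting a flag the first time the deadline class is asked for a task.

```python
RETRY_TASK = object()  # sentinel standing in for the kernel's RETRY_TASK


class RunQueue:
    def __init__(self, dl_tasks):
        self.stop_on_rq = False      # models rq->stop && rq->stop->on_rq
        self.dl_tasks = list(dl_tasks)
        self.lock_dropped = False    # has the simulated pull happened yet?


def pick_stop(rq):
    """Highest-priority class: run the stop task if one is queued."""
    if rq.stop_on_rq:
        rq.stop_on_rq = False
        return "migration"
    return None


def make_pick_dl(check_stop):
    """Build a deadline-class picker, with or without the stop-task check."""
    def pick_dl(rq):
        # Simulated pull: the rq lock is dropped once and, in the meantime,
        # a stop task gets queued on this runqueue.
        if not rq.lock_dropped:
            rq.lock_dropped = True
            rq.stop_on_rq = True
        if check_stop and rq.stop_on_rq:
            return RETRY_TASK        # ask the core to restart selection
        return rq.dl_tasks.pop(0) if rq.dl_tasks else None
    return pick_dl


def pick_idle(rq):
    """Lowest-priority class: always has something to run."""
    return "idle"


def pick_next_task(rq, classes):
    """Walk the classes in priority order, restarting on RETRY_TASK."""
    while True:
        for pick in classes:
            task = pick(rq)
            if task is RETRY_TASK:
                break                # restart from the highest class
            if task is not None:
                return task
        else:
            raise RuntimeError("the idle class must always return a task")


if __name__ == "__main__":
    with_check = RunQueue(["dl-task"])
    print(pick_next_task(with_check, [pick_stop, make_pick_dl(True), pick_idle]))
    without_check = RunQueue(["dl-task"])
    print(pick_next_task(without_check, [pick_stop, make_pick_dl(False), pick_idle]))
```

With the check, the stop task that slipped in is picked on the retry. Without it, the deadline task is picked while the stop task stays queued, which is the window the upstream commit closes.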