[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-09-30 Thread Rafael David Tinoco
inaddy@mylinux ~/Work/Kernel/Ubuntu/ubuntu-trusty (master) $ git tag --contains 64863995563d71836fa48b743148dce993154a4e
Ubuntu-3.13.0-60.99
Ubuntu-3.13.0-62.101
Ubuntu-3.13.0-62.102
Ubuntu-3.13.0-63.103
Ubuntu-3.13.0-64.104
Ubuntu-3.13.0-65.105

 linux-image-generic | 3.13.0.24.28 | trusty          | amd64, arm64, armhf, i386, ppc64el
 linux-image-generic | 3.13.0.65.71 | trusty-security | amd64, arm64, armhf, i386, ppc64el
 linux-image-generic | 3.13.0.65.71 | trusty-updates  | amd64, arm64, armhf, i386, ppc64el

This is already fixed. Updating case status.

** Changed in: linux (Ubuntu Trusty)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1461620

Title:
  NUMA task migration race condition due to stop task not being  checked
  when balancing happens

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1461620/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-08-17 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 3.19.0-26.28

---
linux (3.19.0-26.28) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1483630

  [ Upstream Kernel Changes ]

  * Revert Bluetooth: ath3k: Add support of 04ca:300d AR3012 device

linux (3.19.0-26.27) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1479055
  * [Config] updateconfigs for 3.19.8-ckt4 stable update

  [ Chris J Arges ]

  * [Config] Add MTD_POWERNV_FLASH and OPAL_PRD
    - LP: #1464560

  [ Mika Kuoppala ]

  * SAUCE: i915_bpo: drm/i915: Fix divide by zero on watermark update
    - LP: #1473175

  [ Tim Gardner ]

  * [Config] ACORN_PARTITION=n
    - LP: #1453117
  * [Config] Add i40e[vf] to d-i
    - LP: #1476393

  [ Timo Aaltonen ]

  * SAUCE: i915_bpo: Rebase to v4.2-rc3
    - LP: #1473175
  * SAUCE: i915_bpo: Revert mm/fault, drm/i915: Use pagefault_disabled()
    to check for disabled pagefaults
    - LP: #1473175
  * SAUCE: i915_bpo: Revert drm: i915: Port to new backlight interface
    selection API
    - LP: #1473175

  [ Upstream Kernel Changes ]

  * Revert tools/vm: fix page-flags build
    - LP: #1473547
  * Revert ALSA: hda - Add mute-LED mode control to Thinkpad
    - LP: #1473547
  * Revert drm/radeon: adjust pll when audio is not enabled
    - LP: #1473547
  * Revert crypto: talitos - convert to use be16_add_cpu()
    - LP: #1479048
  * module: Call module notifier on failure after complete_formation()
    - LP: #1473547
  * gpio: gpio-kempld: Fix get_direction return value
    - LP: #1473547
  * ARM: dts: imx27: only map 4 Kbyte for fec registers
    - LP: #1473547
  * ARM: 8356/1: mm: handle non-pmd-aligned end of RAM
    - LP: #1473547
  * x86/mce: Fix MCE severity messages
    - LP: #1473547
  * mac80211: don't use napi_gro_receive() outside NAPI context
    - LP: #1473547
  * iwlwifi: mvm: Free fw_status after use to avoid memory leak
    - LP: #1473547
  * iwlwifi: mvm: clean net-detect info if device was reset during suspend
    - LP: #1473547
  * drm/plane-helper: Adapt cursor hack to transitional helpers
    - LP: #1473547
  * ARM: dts: set display clock correctly for exynos4412-trats2
    - LP: #1473547
  * hwmon: (ntc_thermistor) Ensure iio channel is of type IIO_VOLTAGE
    - LP: #1473547
  * mfd: da9052: Fix broken regulator probe
    - LP: #1473547
  * ALSA: hda - Fix noise on AMD radeon 290x controller
    - LP: #1473547
  * lguest: fix out-by-one error in address checking.
    - LP: #1473547
  * xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
    - LP: #1473547
  * xfs: xfs_iozero can return positive errno
    - LP: #1473547
  * fs, omfs: add NULL terminator in the end up the token list
    - LP: #1473547
  * omfs: fix sign confusion for bitmap loop counter
    - LP: #1473547
  * d_walk() might skip too much
    - LP: #1473547
  * dm: fix casting bug in dm_merge_bvec()
    - LP: #1473547
  * hwmon: (nct6775) Add missing sysfs attribute initialization
    - LP: #1473547
  * hwmon: (nct6683) Add missing sysfs attribute initialization
    - LP: #1473547
  * target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST
    - LP: #1473547
  * net: phy: bcm7xxx: Fix 7425 PHY ID and flags
    - LP: #1473547
  * fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length
    mappings
    - LP: #1473547
  * i2c: hix5hd2: Fix modalias to make module auto-loading work
    - LP: #1473547
  * i2c: s3c2410: fix oops in suspend callback for non-dt platforms
    - LP: #1473547
  * iio: adis16400: Report pressure channel scale
    - LP: #1473547
  * iio: adis16400: Use != channel indices for the two voltage channels
    - LP: #1473547
  * iio: adis16400: Compute the scan mask from channel indices
    - LP: #1473547
  * iio: adis16400: Fix burst mode
    - LP: #1473547
  * iio: adis16400: Fix burst transfer for adis16448
    - LP: #1473547
  * USB: serial: ftdi_sio: Add support for a Motion Tracker Development
    Board
    - LP: #1473547
  * iio: adc: twl6030-gpadc: Fix modalias
    - LP: #1473547
  * usb: make module xhci_hcd removable
    - LP: #1473547
  * usb: host: xhci: add mutex for non-thread-safe data
    - LP: #1473547
  * serial: imx: Fix DMA handling for IDLE condition aborts
    - LP: #1473547
  * usb: dwc3: gadget: Fix incorrect DEPCMD and DGCMD status macros
    - LP: #1473547
  * brcmfmac: avoid null pointer access when brcmf_msgbuf_get_pktid() fails
    - LP: #1473547
  * ALSA: usb-audio: Add mic volume fix quirk for Logitech Quickcam Fusion
    - LP: #1473547
  * n_tty: Fix auditing support for cannonical mode
    - LP: #1473547
  * drivers/base: cacheinfo: handle absence of caches
    - LP: #1473547
  * drm/i915/hsw: Fix workaround for server AUX channel clock divisor
    - LP: #1473547
  * MIPS: ralink: Fix clearing the illegal access interrupt
    - LP: #1473547
  * x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
    - LP: #1473547
  * lib: Fix strnlen_user() to not touch memory after 

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-08-11 Thread Rafael David Tinoco
Started verifying the fix; will provide results soon.

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-08-11 Thread Rafael David Tinoco
Trusty verification:

inaddy@sf00079894trusty:~$ uname -a
Linux sf00079894trusty 3.13.0-62-generic #101-Ubuntu SMP Thu Jul 30 09:01:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

inaddy@sf00079894trusty:~$ sudo trace-cmd report | grep stop_two_cpus | wc -l
74

In 5 seconds the logic was executed 74 times. I kept it running for
quite some time and there is no sign of a regression. Marking this as
verification-done-trusty. Moving on to Vivid's verification...

** Tags removed: verification-needed-trusty
** Tags added: verification-done-trusty

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-08-11 Thread Rafael David Tinoco
Vivid verification:

inaddy@sf00079894vivid:~$ uname -a
Linux sf00079894vivid 3.19.0-26-generic #27-Ubuntu SMP Tue Jul 28 18:27:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

inaddy@sf00079894vivid:~$ sudo trace-cmd report | grep stop_two_cpus | wc -l
46

In 5 seconds the logic was executed 46 times. I kept it running for
quite some time and there is no sign of a regression. Marking this as
verification-done-vivid.

Thank you

** Tags removed: verification-done-trusty verification-needed-vivid
** Tags added: verification-done

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-08-11 Thread Rafael David Tinoco
** Tags added: sts

** Tags added: cts

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-08-05 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-trusty

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-08-05 Thread Brad Figg
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-vivid

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-08-04 Thread Launchpad Bug Tracker
** Branch linked: lp:ubuntu/trusty-proposed/linux-lts-vivid

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-07-31 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.1.0-3.3

---
linux (4.1.0-3.3) wily; urgency=low

  [ Andy Whitcroft ]

  * Release Tracking Bug
    - LP: #1478897

  [ Colin Ian King ]

  * SAUCE: KEYS: ensure we free the assoc array edit if edit is valid
    - CVE-2015-1333

  [ Seth Forshee ]

  * SAUCE: overlayfs: Enable user namespace mounts for the overlay fstype
    - LP: #1478578

  [ Upstream Kernel Changes ]

  * sched/stop_machine: Fix deadlock between multiple stop_two_cpus()
    - LP: #1461620
  * x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
  * x86/nmi/64: Remove asm code that saves cr2
  * x86/nmi/64: Switch stacks on userspace NMI entry
  * x86/nmi/64: Reorder nested NMI checks
  * x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI
    detection

 -- Andy Whitcroft a...@canonical.com  Tue, 28 Jul 2015 11:59:03 +0100

** Changed in: linux (Ubuntu)
   Status: Fix Committed => Fix Released

** CVE added: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2015-1333

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-07-27 Thread Andy Whitcroft
** Changed in: linux (Ubuntu)
   Status: Invalid => Fix Committed

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-07-27 Thread Luis Henriques
** Changed in: linux (Ubuntu Trusty)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Vivid)
   Status: In Progress => Fix Committed

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-07-23 Thread Chris J Arges
** Description changed:

  SRU Justification:
  
- Impact: 
-  - Deadlock when migrating processes in between NUMA domains.
-  - Came with 1 kernel dump given to me.
-  - Hard to trigger. 
+ Impact:
+  - Deadlock when migrating processes in between NUMA domains.
+  - Came with 1 kernel dump given to me.
+  - Hard to trigger.
  
- Fix: 
-  - Upstream development after upstream discussion.
-  - Discussion: https://lkml.org/lkml/2015/6/15/531
+ Fix:
+  - Upstream development after upstream discussion.
+  - Discussion: https://lkml.org/lkml/2015/6/15/531
  
- Testcase: 
-  - Stress test in a virtual NUMA environment
-  - Wait indefinitely... Hard to trigger
-  - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1461620/comments/8
-  - Can, at least, make sure the logic did not introduce regression 
+ Testcase:
+  - Stress test in a virtual NUMA environment
+  - Wait indefinitely... Hard to trigger
+  - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1461620/comments/8
+  - Can, at least, make sure the logic did not introduce regression
  
  
  
  The following kernel panic was brought to my attention:
  
  
  [3367068.076488] Code: 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 ed 75 66 f0 41 ff 4c 24 24 74 26 89 da 83 fa 04 74 3d f3 90 41 8b 5c 24 20 39 d3 74 f0 83 fb 02 75 d7 fa 66 0f 1f 44 00 00 eb d8 66 0f 1f
  [3367068.092735] BUG: soft lockup - CPU#16 stuck for 22s! [migration/16:153]
  [3367068.100368] Modules linked in: iptable_raw xt_nat xt_REDIRECT veth openvswitch(OF) gre vxlan ip_tunnel libcrc32c dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables ipmi_devintf 8021q garp stp mrp llc bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel gpio_ich joydev aes_x86_64 lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sb_edac wmi lpc_ich edac_core mac_hid acpi_power_meter nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache lp parport hid_generic ixgbe fnic libfcoe dca ptp
  [3367068.100409] libfc megaraid_sas pps_core mdio usbhid scsi_transport_fc hid enic scsi_tgt
  [3367068.100415] CPU: 16 PID: 153 Comm: migration/16 Tainted: GF O 3.13.0-34-generic #60-Ubuntu
  [3367068.100417] Hardware name: Cisco Systems Inc UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.1.5.4f.0.111320130449 11/13/2013
  [3367068.100419] task: 881fd2f517f0 ti: 881fd2f1c000 task.ti: 881fd2f1c000
  [3367068.100420] RIP: 0010:[810f5944] [810f5944] multi_cpu_stop+0x64/0xf0
  [3367068.100426] RSP: :881fd2f1dd98 EFLAGS: 0246
  [3367068.100427] RAX: 8180af40 RBX: 0086 RCX: a402
  [3367068.100428] RDX: 0001 RSI:  RDI: 883e607edb48
  [3367068.100430] RBP: 881fd2f1ddb8 R08: 0282 R09: 0001
  [3367068.100431] R10: b6d8 R11: 881fc374dc80 R12: 00014440
  [3367068.100432] R13: 881fd291ae00 R14: 881fd291ae08 R15: 00020010
  [3367068.100433] FS: () GS:881fffd0() knlGS:
  [3367068.100434] CS: 0010 DS:  ES:  CR0: 80050033
  [3367068.100435] CR2: 7f6202134b98 CR3: 01c0e000 CR4: 001407e0
  [3367068.100437] Stack:
  [3367068.100438] 883e607edb70 881fffd0ede0 881fffd0ede8 883e607edb48
  [3367068.100441] 881fd2f1de78 810f5b5e 8109dfc4 881fffd14440
  [3367068.100443] 881fd2f1de08 81097508  881fffd14440
  [3367068.100446] Call Trace:
  [3367068.100450] [810f5b5e] cpu_stopper_thread+0x7e/0x150
  [3367068.100454] [8109dfc4] ? vtime_common_task_switch+0x24/0x40
  [3367068.100458] [81097508] ? finish_task_switch+0x128/0x170
  [3367068.100462] [8171fd41] ? __schedule+0x381/0x7d0
  [3367068.100465] [810926af] smpboot_thread_fn+0xff/0x1b0
  [3367068.100467] [810925b0] ? SyS_setgroups+0x1a0/0x1a0
  [3367068.100470] [8108b3d2] kthread+0xd2/0xf0
  [3367068.100473] [8108b300] ? kthread_create_on_node+0x1d0/0x1d0
  [3367068.100477] [8172c6bc] ret_from_fork+0x7c/0xb0
  [3367068.100479] [8108b300] ? kthread_create_on_node+0x1d0/0x1d0
  [3367068.100480] Code: db 85 db 41 0f 95 c5 31 f6 31 d2 eb 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 ed 75 66 f0 41 ff 4c 24 24 74 26 89 da 83 fa 04 74 3d f3 90 41 8b 5c 24 20 39 d3 74 f0 83 fb 02 75 d7
  
  
  I'm explaining WHY this is happening in the first comments and HOW to
  fix it.

** Description changed:

  SRU Justification:
  
  Impact:
   - Deadlock when migrating processes in between NUMA domains.
   - Came with 1 kernel dump given to me.
   - Hard to trigger.

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-07-23 Thread Rafael David Tinoco
** Description changed:

+ SRU Justification:
+ 
+ Impact: 
+  - Deadlock when migrating processes in between NUMA domains.
+  - Came with 1 kernel dump given to me.
+  - Hard to trigger. 
+ 
+ Fix: 
+  - Upstream development after upstream discussion.
+  - Discussion: https://lkml.org/lkml/2015/6/15/531
+ 
+ Testcase: 
+  - Stress test in a virtual NUMA environment
+  - Wait indefinitely... Hard to trigger
+  - https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1461620/comments/8
+  - Can, at least, make sure the logic did not introduce regression 
+ 
+ 
+ 
  The following kernel panic was brought to my attention:
  
  
- [3367068.076488] Code: 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 
ed 75 66 f0 41 ff 4c 24 24 74 26 89 da 83 fa 04 74 3d f3 90 41 8b 5c 24 20 39 
d3 74 f0 83 fb 02 75 d7 fa 66 0f 1f 44 00 00 eb d8 66 0f 1f 
- [3367068.092735] BUG: soft lockup - CPU#16 stuck for 22s! [migration/16:153] 
- [3367068.100368] Modules linked in: iptable_raw xt_nat xt_REDIRECT veth 
openvswitch(OF) gre vxlan ip_tunnel libcrc32c dccp_diag dccp tcp_diag udp_diag 
inet_diag unix_diag ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT 
xt_CHECKSUM iptable_mangle xt_tcpudp bridge ip6table_filter ip6_tables 
iptable_filter ip_tables ebtable_nat ebtables x_tables ipmi_devintf 8021q garp 
stp mrp llc bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel 
kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel gpio_ich 
joydev aes_x86_64 lrw gf128mul glue_helper ablk_helper ipmi_si cryptd sb_edac 
wmi lpc_ich edac_core mac_hid acpi_power_meter nfsd auth_rpcgss nfs_acl nfs 
lockd sunrpc fscache lp parport hid_generic ixgbe fnic libfcoe dca ptp 
- [3367068.100409] libfc megaraid_sas pps_core mdio usbhid scsi_transport_fc 
hid enic scsi_tgt 
- [3367068.100415] CPU: 16 PID: 153 Comm: migration/16 Tainted: GF O 
3.13.0-34-generic #60-Ubuntu 
- [3367068.100417] Hardware name: Cisco Systems Inc 
UCSC-C220-M3S/UCSC-C220-M3S, BIOS C220M3.1.5.4f.0.111320130449 11/13/2013 
- [3367068.100419] task: 881fd2f517f0 ti: 881fd2f1c000 task.ti: 
881fd2f1c000 
- [3367068.100420] RIP: 0010:[810f5944] [810f5944] 
multi_cpu_stop+0x64/0xf0 
- [3367068.100426] RSP: :881fd2f1dd98 EFLAGS: 0246 
- [3367068.100427] RAX: 8180af40 RBX: 0086 RCX: 
a402 
- [3367068.100428] RDX: 0001 RSI:  RDI: 
883e607edb48 
- [3367068.100430] RBP: 881fd2f1ddb8 R08: 0282 R09: 
0001 
- [3367068.100431] R10: b6d8 R11: 881fc374dc80 R12: 
00014440 
- [3367068.100432] R13: 881fd291ae00 R14: 881fd291ae08 R15: 
00020010 
- [3367068.100433] FS: () GS:881fffd0() 
knlGS: 
- [3367068.100434] CS: 0010 DS:  ES:  CR0: 80050033 
- [3367068.100435] CR2: 7f6202134b98 CR3: 01c0e000 CR4: 
001407e0 
- [3367068.100437] Stack: 
- [3367068.100438] 883e607edb70 881fffd0ede0 881fffd0ede8 
883e607edb48 
- [3367068.100441] 881fd2f1de78 810f5b5e 8109dfc4 
881fffd14440 
- [3367068.100443] 881fd2f1de08 81097508  
881fffd14440 
- [3367068.100446] Call Trace: 
- [3367068.100450] [810f5b5e] cpu_stopper_thread+0x7e/0x150 
- [3367068.100454] [8109dfc4] ? vtime_common_task_switch+0x24/0x40 
- [3367068.100458] [81097508] ? finish_task_switch+0x128/0x170 
- [3367068.100462] [8171fd41] ? __schedule+0x381/0x7d0 
- [3367068.100465] [810926af] smpboot_thread_fn+0xff/0x1b0 
- [3367068.100467] [810925b0] ? SyS_setgroups+0x1a0/0x1a0 
- [3367068.100470] [8108b3d2] kthread+0xd2/0xf0 
- [3367068.100473] [8108b300] ? kthread_create_on_node+0x1d0/0x1d0 
- [3367068.100477] [8172c6bc] ret_from_fork+0x7c/0xb0 
- [3367068.100479] [8108b300] ? kthread_create_on_node+0x1d0/0x1d0 
+ [3367068.076488] Code: 23 66 2e 0f 1f 84 00 00 00 00 00 83 fb 03 75 05 45 84 
ed 75 66 f0 41 ff 4c 24 24 74 26 89 da 83 fa 04 74 3d f3 90 41 8b 5c 24 20 39 
d3 74 f0 83 fb 02 75 d7 fa 66 0f 1f 44 00 00 eb d8 66 0f 1f
+ [3367068.092735] BUG: soft lockup - CPU#16 stuck for 22s! [migration/16:153]
+ [3367068.100368] Modules linked in: iptable_raw xt_nat xt_REDIRECT veth 
openvswitch(OF) gre vxlan ip_tunnel libcrc32c dccp_diag dccp tcp_diag udp_diag 
inet_diag unix_diag ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT 
xt_CHECKSUM iptable_mangle xt_tcpudp bridge ip6table_filter ip6_tables 
iptable_filter ip_tables ebtable_nat ebtables x_tables ipmi_devintf 8021q garp 
stp mrp llc bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel 
kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel gpio_ich 
joydev aes_x86_64 lrw gf128mul glue_helper ablk_helper ipmi_si 

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-07-23 Thread Chris J Arges
** Also affects: linux (Ubuntu Vivid)
   Importance: Undecided
   Status: New

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-07-23 Thread Rafael David Tinoco
** Changed in: linux (Ubuntu Vivid)
   Status: New => In Progress

** Changed in: linux (Ubuntu Vivid)
 Assignee: (unassigned) => Rafael David Tinoco (inaddy)

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-07-23 Thread Rafael David Tinoco
I have been running the NUMA tests on 3.13 for some time now, and it
looks like the change did not introduce any regression:

$ uname -a
Linux sf00079894trusty 3.13.11-ckt22-201507231149 #2 SMP Thu Jul 23 13:45:04 BRT 2015 x86_64 x86_64 x86_64 GNU/Linux

I'm using a virtualized NUMA environment with 16 domains and 16 CPUs,
together with the stress test tool:

$ sudo numactl -H 
available: 16 nodes (0-15) 
node 0 cpus: 0 
node 0 size: 363 MB 
node 0 free: 23 MB 
node 1 cpus: 1 
node 1 size: 121 MB 
node 1 free: 7 MB 
node 2 cpus: 2 
node 2 size: 377 MB 
node 2 free: 23 MB 
node 3 cpus: 3 
node 3 size: 377 MB 
node 3 free: 23 MB 
node 4 cpus: 4 
node 4 size: 377 MB 
node 4 free: 23 MB 
node 5 cpus: 5 
node 5 size: 377 MB 
node 5 free: 23 MB 
node 6 cpus: 6 
node 6 size: 377 MB 
node 6 free: 35 MB 
node 7 cpus: 7 
node 7 size: 313 MB 
node 7 free: 19 MB 
node 8 cpus: 8 
node 8 size: 377 MB 
node 8 free: 61 MB 
node 9 cpus: 9 
node 9 size: 377 MB 
node 9 free: 57 MB 
node 10 cpus: 10 
node 10 size: 377 MB 
node 10 free: 63 MB 
node 11 cpus: 11 
node 11 size: 377 MB 
node 11 free: 30 MB 
node 12 cpus: 12 
node 12 size: 377 MB 
node 12 free: 67 MB 
node 13 cpus: 13 
node 13 size: 377 MB 
node 13 free: 68 MB 
node 14 cpus: 14 
node 14 size: 377 MB 
node 14 free: 68 MB 
node 15 cpus: 15 
node 15 size: 377 MB 
node 15 free: 64 MB 

$ sudo stress --vm 16 --vm-bytes 314572800 --vm-stride 1 --vm-keep 

This causes memory allocations of around 300MB on each node and touches
every byte of each allocation (so all of its pages are hot on the CPU
running the worker).
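
As a quick sanity check of the numbers on the stress command line, the
sketch below converts the --vm-bytes value to MiB and multiplies it by the
worker count. The variable names are only illustrative; the two input
values are taken from the command above.

```python
MIB = 1024 * 1024

vm_bytes = 314572800  # value passed to --vm-bytes
workers = 16          # value passed to --vm

per_worker_mib = vm_bytes / MIB
total_mib = workers * per_worker_mib

print(f"per worker: {per_worker_mib:.0f} MiB")
print(f"all workers: {total_mib:.0f} MiB")
```

Each worker therefore maps 300 MiB, which is close to the size that
numactl reports for most of the 16 nodes.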

And generating concurrency:

$ sudo stress --cpu 16 

So the kernel scheduler has to migrate tasks, exercising the fixed
logic. I can confirm the logic is being triggered by using ftrace:

$ sudo trace-cmd record -p function -l numa_migrate_preferred -l task_numa_migrate -l migrate_swap -l stop_two_cpus
$ sudo trace-cmd report | grep stop_two_cpus | wc -l
162

And I can't find any regression.
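
The grep/wc pipeline only counts one function. A minimal Python sketch of
the same idea, counting every traced function at once, could look like the
following. The line format is an assumption based on the trace-cmd report
lines quoted elsewhere in this bug, and count_traced_functions is a
hypothetical helper name, not part of trace-cmd.

```python
import re
from collections import Counter

# Assumed line shape, taken from the trace output quoted in this bug:
#   <comm>-<pid> [<cpu>] <timestamp>: function: <name>
LINE_RE = re.compile(r"^\s*\S+-\d+\s+\[\d+\]\s+[\d.]+:\s+function:\s+(\S+)")


def count_traced_functions(report_text):
    """Count how often each function name appears in the report text."""
    counts = Counter()
    for line in report_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the ftrace output quoted in this bug.
SAMPLE = """\
stress-1547 [012] 136.309393: function: numa_migrate_preferred
stress-1547 [012] 136.309394: function: task_numa_migrate
stress-1547 [012] 136.309414: function: migrate_swap
stress-1547 [012] 136.309414: function: stop_two_cpus
stress-1563 [006] 136.313389: function: numa_migrate_preferred
stress-1563 [006] 136.313391: function: task_numa_migrate
"""

counts = count_traced_functions(SAMPLE)
for name, hits in counts.most_common():
    print(f"{name}: {hits}")
```

In the sample, the task on CPU 6 enters task_numa_migrate but never
reaches stop_two_cpus, so not every migration attempt ends in a swap.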

I'll let the tests run a bit longer and will then suggest the fix to
our kernel team for merging as a Stable Release Update for Trusty,
Utopic and Vivid.

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-06-15 Thread Rafael David Tinoco
Just got an update from Peter:

https://lkml.org/lkml/2015/6/15/531

asking for feedback on a patch:

Subject: stop_machine: Fix deadlock between multiple stop_two_cpus()
From: Peter Zijlstra pet...@infradead.org
Date: Fri, 5 Jun 2015 17:30:23 +0200

I will try to test the latest builds plus this patch with the NUMA
migration test. Unfortunately the issue is REALLY hard to reproduce, so
I cannot tell whether the patch fixes it; I can only check that it does
not introduce a regression.

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-06-03 Thread Rafael David Tinoco
Sasha pointed me to a fix for this particular behaviour, merged
upstream during the 3.15 cycle:

https://lkml.org/lkml/2014/4/10/297

[PATCH] sched: Checking for stop task appearance when balancing happens

This confirms that my previous observation:


--- NMI exception stack ---
 #4 [883fd2907d98] multi_cpu_stop+0x64 at 810f5944
208 } while (curstate != MULTI_STOP_EXIT);
   --- RIP
RIP 0x810f5944 +100: cmp $0x4,%edx
   --- CHECKING FOR MULTI_STOP_EXIT
RDX: 883fd2907d98 - does not make any sense


was right: a stop task was being picked by the scheduler when it should
not have been.
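
To relate the `cmp $0x4,%edx` above to the MULTI_STOP_EXIT check, here is
a small Python sketch that decodes the three compare instructions visible
in the Code: bytes of the oops and maps their immediates to state names.
The state ordering (NONE, PREPARE, DISABLE_IRQ, RUN, EXIT) is an
assumption about enum multi_stop_state in kernel/stop_machine.c for this
kernel series and should be checked against the actual tree;
decode_cmp_imm8 is a hypothetical helper that only handles the
register form of `83 /7 ib`.

```python
from enum import IntEnum


class MultiStopState(IntEnum):
    # Assumed ordering of enum multi_stop_state; verify against the source.
    NONE = 0
    PREPARE = 1
    DISABLE_IRQ = 2
    RUN = 3
    EXIT = 4


# 32-bit register names indexed by the ModRM r/m field (no REX prefix).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_cmp_imm8(code):
    """Decode a 3-byte `83 /7 ib` (cmp $imm8, %reg32); return None otherwise."""
    opcode, modrm, imm = code
    is_register_form = (modrm >> 6) == 0b11
    is_cmp = ((modrm >> 3) & 0b111) == 7
    if opcode != 0x83 or not (is_register_form and is_cmp):
        return None
    return REG32[modrm & 0b111], imm


# Byte triples copied from the Code: lines of the oops.
for raw in ("83 fb 02", "83 fb 03", "83 fa 04"):
    reg, imm = decode_cmp_imm8(bytes.fromhex(raw))
    print(f"{raw}: cmp ${imm:#x},%{reg} -> {MultiStopState(imm).name}")
```

Under that assumption the three compares test for DISABLE_IRQ, RUN and
EXIT, which matches the `while (curstate != MULTI_STOP_EXIT)` line shown
in the crash output.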

This commit is present in:

$ git tag --contains a1d9a3231eac4117cadaf4b6bba5b2902c15a33e
v3.15-rc2
v3.15-rc3
...
v4.1-rc5

So only Trusty is affected.
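
The reasoning behind that conclusion can be sketched as a base-version
comparison against the first tag that contains the commit. This is only an
illustration: base_version is a hypothetical helper, the version strings
are the ones that appear in this bug, and comparing major.minor ignores
any backports a distribution kernel may carry.

```python
import re


def base_version(version):
    """Return (major, minor) from strings like 'v3.15-rc2' or '3.13.0-34-generic'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    return int(match.group(1)), int(match.group(2))


# First upstream tag listed by `git tag --contains` for the commit.
first_fixed = base_version("v3.15-rc2")

# Kernel versions per Ubuntu series, as they appear in this bug.
series_kernels = {
    "trusty": "3.13.0-34-generic",
    "vivid": "3.19.0-26.28",
    "wily": "4.1.0-3.3",
}

lacking_commit = [
    name
    for name, version in series_kernels.items()
    if base_version(version) < first_fixed
]
print(lacking_commit)
```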

[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-06-03 Thread Rafael David Tinoco
Using ftrace I can confirm that we are frequently triggering the logic
responsible for the deadlock, but so far I have not succeeded in making
the deadlock itself happen.

root@numa:~# trace-cmd record -p function -l numa_migrate_preferred \
    -l task_numa_migrate -l migrate_swap -l stop_two_cpus

... 
stress-1547 [012] 136.309393: function: numa_migrate_preferred 
stress-1547 [012] 136.309394: function: task_numa_migrate 
stress-1547 [012] 136.309414: function: migrate_swap 
stress-1547 [012] 136.309414: function: stop_two_cpus 
stress-1539 [017] 136.309519: function: numa_migrate_preferred 
stress-1539 [017] 136.309519: function: task_numa_migrate 
stress-1539 [017] 136.309528: function: migrate_swap 
stress-1539 [017] 136.309528: function: stop_two_cpus 
stress-1563 [006] 136.313389: function: numa_migrate_preferred 
stress-1563 [006] 136.313391: function: task_numa_migrate 
stress-1428 [004] 136.313415: function: numa_migrate_preferred 
stress-1428 [004] 136.313416: function: task_numa_migrate 
stress-1428 [004] 136.313434: function: migrate_swap 
stress-1428 [004] 136.313434: function: stop_two_cpus 
stress-1421 [016] 136.325398: function: numa_migrate_preferred 
stress-1464 [025] 136.386219: function: numa_migrate_preferred 
stress-1464 [025] 136.386221: function: task_numa_migrate 
stress-1464 [025] 136.386240: function: migrate_swap 
stress-1464 [025] 136.386241: function: stop_two_cpus 
stress-1435 [014] 136.400792: function: numa_migrate_preferred 
stress-1435 [014] 136.400793: function: task_numa_migrate 
...-1513 [023] 136.401345: function: numa_migrate_preferred 
stress-1447 [019] 136.410245: function: numa_migrate_preferred 
stress-1447 [019] 136.410246: function: task_numa_migrate 
stress-1517 [012] 136.413338: function: numa_migrate_preferred 
stress-1554 [024] 136.417383: function: numa_migrate_preferred 
stress-1554 [024] 136.417384: function: task_numa_migrate 
stress-1554 [024] 136.417407: function: migrate_swap 
stress-1554 [024] 136.417408: function: stop_two_cpus 
...-1507 [023] 136.421348: function: numa_migrate_preferred 
stress-1500 [018] 136.445321: function: numa_migrate_preferred 
stress-1525 [025] 136.473330: function: numa_migrate_preferred 
stress-1472 [029] 136.502245: function: numa_migrate_preferred 
stress-1472 [029] 136.502247: function: task_numa_migrate 
stress-1472 [029] 136.502270: function: migrate_swap 
stress-1472 [029] 136.502270: function: stop_two_cpus 
stress-1496 [004] 136.569273: function: numa_migrate_preferred 
stress-1496 [004] 136.569275: function: task_numa_migrate 
... 

root@ttwcnuma:~# trace-cmd report | grep stop_two_cpus | wc -l 
475 

This means stop_two_cpus was called 475 times, i.e. I triggered a task
migration between NUMA domains 475 times, in less than 3 seconds.
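
The same count can be taken for all four traced functions in one pass
instead of one grep per function. A minimal sketch, assuming the report
lines keep the "task-pid [cpu] timestamp: function: name" layout shown
above; count_functions is a hypothetical helper, not part of trace-cmd:

```python
import re
from collections import Counter

# Matches lines such as:
#   stress-1547 [012] 136.309414: function: stop_two_cpus
# The layout is assumed from the trace excerpt above.
LINE_RE = re.compile(
    r"^\s*(\S+)-(\d+)\s+\[(\d+)\]\s+([\d.]+):\s+function:\s+(\S+)"
)


def count_functions(report_lines):
    """Count how often each traced function appears in the report text."""
    counts = Counter()
    for line in report_lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(5)] += 1
    return counts


# A few lines copied from the excerpt above, used as sample input.
sample = [
    "stress-1547 [012] 136.309393: function: numa_migrate_preferred ",
    "stress-1547 [012] 136.309394: function: task_numa_migrate ",
    "stress-1547 [012] 136.309414: function: migrate_swap ",
    "stress-1547 [012] 136.309414: function: stop_two_cpus ",
    "stress-1563 [006] 136.313389: function: numa_migrate_preferred ",
    "stress-1563 [006] 136.313391: function: task_numa_migrate ",
    "...-1513 [023] 136.401345: function: numa_migrate_preferred ",
]
counts = count_functions(sample)
```

Feeding it the full "trace-cmd report" output (for example line by line
from sys.stdin) gives the four counts together, so the number of
numa_migrate_preferred calls that never reached stop_two_cpus can be
read off directly.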


[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-06-03 Thread Joseph Salisbury
** Changed in: linux (Ubuntu Trusty)
   Importance: Undecided => Medium


[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-06-03 Thread Rafael David Tinoco
You can follow my comments in LKML:

https://lkml.org/lkml/2015/3/6/484


Basically, in kernel 3.13 we are getting the following situation:

I have a core dump from a 3.13 kernel (+ upstream patches), where this
commit has not been backported yet, locked in the same place (the state
machine that stops the CPUs for the task swap).

-> multi_cpu_stop -> do { } while (curstate != MULTI_STOP_EXIT);
In my case, curstate is WAY outside the enum that contains MULTI_STOP_EXIT (4).

The register is totally messed up (probably after cpu_relax(), right where
you were trapped, after the pause instruction).

my case:

PID: 118    TASK: 883fd28ec7d0  CPU: 9   COMMAND: migration/9
...
[exception RIP: multi_cpu_stop+0x64]
RIP: 810f5944  RSP: 883fd2907d98  RFLAGS: 0246
RAX: 0010  RBX: 0010  RCX: 0246
RDX: 883fd2907d98  RSI:   RDI: 0001
RBP: 810f5944   R8: 810f5944   R9: 
R10: 883fd2907d98  R11: 0246  R12: 
R13: 883f55d01b48  R14:   R15: 0001
ORIG_RAX: 0001  CS: 0010  SS: 
--- NMI exception stack ---
 #4 [883fd2907d98] multi_cpu_stop+0x64 at 810f5944
208  } while (curstate != MULTI_STOP_EXIT);
   --- RIP
RIP 0x810f5944 +100:   cmp $0x4,%edx
   --- CHECKING FOR MULTI_STOP_EXIT
RDX: 883fd2907d98 - does not make any sense
###
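
To spell out why that RDX value cannot be a valid state: cmp $0x4,%edx
only looks at the low 32 bits of RDX. A quick check, taking the register
value exactly as printed in the dump above (any digits missing from the
high end of the printed value do not change the low 32 bits) and assuming
MULTI_STOP_EXIT is 4 as stated above:

```python
# RDX exactly as printed in the dump above.
rdx = 0x883FD2907D98

# cmp $0x4,%edx compares only the low 32 bits of RDX (that is, EDX).
edx = rdx & 0xFFFFFFFF

MULTI_STOP_EXIT = 4  # value stated earlier in this message

print(hex(edx))                 # 0xd2907d98
print(edx == MULTI_STOP_EXIT)   # False
print(0 <= edx <= MULTI_STOP_EXIT)  # False: not any state in the enum
```

So whatever is in the register at this point, it is not one of the
states 0..4 that the loop condition expects to see.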

If I'm reading this right,


CPU 05 - PID 14990

do_numa_page
task_numa_fault
numa_migrate_preferred
task_numa_migrate
migrate_swap (curr: 14990, task: 14996)
stop_two_cpus (cpu1=05(14996), cpu2=00(14990))
wait_for_completion

14990 - CPU05
14996 - CPU00

stop_two_cpus:
multi_stop_data (msdata->state = MULTI_STOP_PREPARE)
smp_call_function_single (min=cpu2=00, irq_cpu_stop_queue_work, wait=1)
smp_call_function_single (ran on lowest CPU, 00 for this case)
irq_cpu_stop_queue_work
cpu_stop_queue_work(cpu1=05(14996)) # add work (multi_cpu_stop) to cpu 05 cpu_stopper queue
cpu_stop_queue_work(cpu2=00(14990)) # add work (multi_cpu_stop) to cpu 00 cpu_stopper queue
wait_for_completion() <-- HERE


In my case, checking the task structs of the tasks that were scheduled
while in wait_for_completion():

PID 14990 CPU 05 - PID 14996 CPU 00
PID 14991 CPU 30 - PID 14998 CPU 01
PID 14992 CPU 30 - PID 14998 CPU 01
PID 14996 CPU 00 - PID 14992 CPU 30
PID 14998 CPU 01 - PID 14990 CPU 05

AND

   102  2   6  881fd2ea97f0  RU   0.0   0  0  [migration/6]
   118  2   9  883fd28ec7d0  RU   0.0   0  0  [migration/9]
   143  2  14  883fd29d47d0  RU   0.0   0  0  [migration/14]
   148  2  15  883fd29fc7d0  RU   0.0   0  0  [migration/15]
   153  2  16  881fd2f517f0  RU   0.0   0  0  [migration/16]

THEN

I am still waiting for 5 cpu_stopper_thread -> multi_cpu_stop works that
were just queued (probably in the per-CPU queues of CPUs 0, 1, 5 and 30)
and are not running yet.

AND

I don't have any wait_for_completion for those OLDER migration
threads (6, 9, 14, 15 and 16).
Probably wait_for_completion signaled done.completion before racing.

Looks like something messed up with curstate in the multi_cpu_stop
state machine.
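
For readers not familiar with that state machine, here is a simplified
model of it. This is only a sketch, not the kernel code: the class and
function names are hypothetical, the CPUs are stepped round-robin instead
of running in parallel, and it only models a stopper that never gets to
run (it does not model a corrupted curstate). The point it shows is that
every participant has to ack each state before anyone can move on, so a
single missing stopper leaves the others spinning forever:

```python
from enum import IntEnum


class State(IntEnum):
    # Mirrors the multi_stop states; EXIT is 4 as noted above.
    NONE = 0
    PREPARE = 1
    DISABLE_IRQ = 2
    RUN = 3
    EXIT = 4


class MultiStopData:
    """Simplified stand-in for the shared multi_stop_data."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.set_state(State.PREPARE)

    def set_state(self, new_state):
        # Every participant must ack before the next transition.
        self.thread_ack = self.num_threads
        self.state = new_state

    def ack_state(self):
        # The last participant to ack advances the shared state.
        self.thread_ack -= 1
        if self.thread_ack == 0:
            self.set_state(self.state + 1)


def simulate(num_threads, running, max_rounds=100):
    """Step the stoppers listed in `running` round-robin.

    Returns the number of rounds until all of them saw EXIT, or None if
    that never happens within max_rounds (they would spin forever).
    """
    msdata = MultiStopData(num_threads)
    curstate = {cpu: State.NONE for cpu in running}
    for rounds in range(1, max_rounds + 1):
        for cpu in running:
            if curstate[cpu] == State.EXIT:
                continue  # this stopper already left the loop
            if msdata.state != curstate[cpu]:
                curstate[cpu] = msdata.state
                msdata.ack_state()
        if all(state == State.EXIT for state in curstate.values()):
            return rounds
    return None


both_running = simulate(2, [0, 5])  # both stoppers get to run
one_missing = simulate(2, [0])      # the stopper on the other CPU never runs
```

With both stoppers running the model reaches EXIT after 4 rounds; with
one of the two missing, simulate() returns None because the state never
leaves MULTI_STOP_PREPARE.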



[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-06-03 Thread Rafael David Tinoco
To better understand whether this bug is easy to trigger, I created the
following test case:

I've been using a KVM guest emulating a NUMA environment with 32
different domains (1 for each vCPU):

root@numa:~# numactl -H 
available: 32 nodes (0-31) 
node 0 cpus: 0 
node 0 size: 237 MB 
node 0 free: 82 MB 
node 1 cpus: 1 
node 1 size: 251 MB 
node 1 free: 15 MB 
node 2 cpus: 2 
node 2 size: 251 MB 
node 2 free: 52 MB 
node 3 cpus: 3 
node 3 size: 251 MB 
node 3 free: 240 MB 
node 4 cpus: 4 
node 4 size: 251 MB 
node 4 free: 15 MB 
node 5 cpus: 5 
node 5 size: 251 MB 
node 5 free: 15 MB 
node 6 cpus: 6 
node 6 size: 251 MB 
node 6 free: 17 MB 
node 7 cpus: 7 
node 7 size: 251 MB 
node 7 free: 15 MB 
node 8 cpus: 8 
node 8 size: 251 MB 
node 8 free: 16 MB 
node 9 cpus: 9 
node 9 size: 251 MB 
node 9 free: 16 MB 
node 10 cpus: 10 
node 10 size: 251 MB 
node 10 free: 15 MB 
node 11 cpus: 11 
node 11 size: 187 MB 
node 11 free: 13 MB 
node 12 cpus: 12 
node 12 size: 251 MB 
node 12 free: 15 MB 
node 13 cpus: 13 
node 13 size: 251 MB 
node 13 free: 17 MB 
node 14 cpus: 14 
node 14 size: 251 MB 
node 14 free: 15 MB 
node 15 cpus: 15 
node 15 size: 251 MB 
node 15 free: 16 MB 
node 16 cpus: 16 
node 16 size: 251 MB 
node 16 free: 17 MB 
node 17 cpus: 17 
node 17 size: 251 MB 
node 17 free: 17 MB 
node 18 cpus: 18 
node 18 size: 251 MB 
node 18 free: 16 MB 
node 19 cpus: 19 
node 19 size: 251 MB 
node 19 free: 15 MB 
node 20 cpus: 20 
node 20 size: 251 MB 
node 20 free: 16 MB 
node 21 cpus: 21 
node 21 size: 251 MB 
node 21 free: 17 MB 
node 22 cpus: 22 
node 22 size: 251 MB 
node 22 free: 51 MB 
node 23 cpus: 23 
node 23 size: 251 MB 
node 23 free: 37 MB 
node 24 cpus: 24 
node 24 size: 251 MB 
node 24 free: 120 MB 
node 25 cpus: 25 
node 25 size: 251 MB 
node 25 free: 115 MB 
node 26 cpus: 26 
node 26 size: 251 MB 
node 26 free: 41 MB 
node 27 cpus: 27 
node 27 size: 251 MB 
node 27 free: 15 MB 
node 28 cpus: 28 
node 28 size: 251 MB 
node 28 free: 15 MB 
node 29 cpus: 29 
node 29 size: 251 MB 
node 29 free: 17 MB 
node 30 cpus: 30 
node 30 size: 251 MB 
node 30 free: 164 MB 
node 31 cpus: 31 
node 31 size: 251 MB 
node 31 free: 228 MB 

And stressing the environment (as you can see in the free memory for
every NUMA node) with a specific tool that allocates a certain amount of
memory and touches every 32 bytes of it (dirtying it at the end, then
restarting the same behavior). Together with that, I'm creating enough
kernel tasks concurrent with these memory allocators for them to compete
for CPU, forcing the memory threads to migrate between CPUs (and NUMA
domains, since every CPU is inside a different NUMA domain).
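
The access pattern of the memory part can be sketched as below. This is
only an illustration of "touch every 32 bytes, then dirty, then restart";
it is not the tool I used, the function names are hypothetical, and a
real stressor would use a much larger buffer and run one instance per
CPU:

```python
def touch_every(buf, stride=32):
    """Read one byte every `stride` bytes, then dirty the same bytes."""
    checksum = 0
    # Read pass: touches the memory so it has to be resident.
    for offset in range(0, len(buf), stride):
        checksum += buf[offset]
    # Write pass: dirties the same locations.
    for offset in range(0, len(buf), stride):
        buf[offset] = (buf[offset] + 1) & 0xFF
    return checksum


def stress(size_bytes, rounds, stride=32):
    """Allocate a zero-filled buffer and repeat the touch/dirty cycle."""
    buf = bytearray(size_bytes)
    total = 0
    for _ in range(rounds):
        total += touch_every(buf, stride)
    return buf, total


# Tiny example: 128 bytes, two rounds, so offsets 0, 32, 64 and 96 are hit.
buf, total = stress(128, 2)
```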


[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-06-03 Thread Rafael David Tinoco
But unfortunately I could not reproduce the issue (although I know it is
in there). I'll create a small piece of logic similar to:

Commit a1d9a3231eac4117cadaf4b6bba5b2902c15a33e
Author: Kirill Tkhai tk...@yandex.ru
Date:   Thu Apr 10 17:38:36 2014 +0400

sched: Check for stop task appearance when balancing happens

We need to do it like we do for the other higher priority classes..

Signed-off-by: Kirill Tkhai tk...@yandex.ru
Cc: Michael wang wang...@linux.vnet.ibm.com
Cc: Sasha Levin sasha.le...@oracle.com
Signed-off-by: Peter Zijlstra pet...@infradead.org
Link: http://lkml.kernel.org/r/336561397137...@web27h.yandex.ru
Signed-off-by: Ingo Molnar mi...@kernel.org

The difference is that I'll just bypass task selection instead of
returning RETRY_TASK. Since the 3.13 scheduler does not have the
RETRY_TASK logic, it is just a question of not choosing the stop worker
(kthread) to run under the same conditions (the rest is pretty much the
same).

Asking for kernel team review while I work on this.


[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-06-03 Thread Brad Figg
** Also affects: linux (Ubuntu Trusty)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu)
   Status: In Progress => Invalid

** Changed in: linux (Ubuntu Trusty)
   Status: New => In Progress

** Changed in: linux (Ubuntu Trusty)
 Assignee: (unassigned) => Rafael David Tinoco (inaddy)

** Changed in: linux (Ubuntu)
 Assignee: Rafael David Tinoco (inaddy) => (unassigned)


[Bug 1461620] Re: NUMA task migration race condition due to stop task not being checked when balancing happens

2015-06-03 Thread Rafael David Tinoco
It turns out that the fix relies on checking whether a stop task has
appeared, in which case task selection needs to be restarted:

+	if (need_pull_dl_task(rq, prev)) {
 		pull_dl_task(rq);
+		/*
+		 * pull_rt_task() can drop (and re-acquire) rq->lock; this
+		 * means a stop task can slip in, in which case we need to
+		 * re-start task selection.
+		 */
+		if (rq->stop && rq->stop->on_rq)
+			return RETRY_TASK;
+	}

And this is done by returning RETRY_TASK. This logic was not available
in 3.13 AND I don't want to jeopardise our 3.13 scheduler.
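
To make the restart idea concrete, here is a toy model of it. It is a
sketch under simplifying assumptions, not the kernel's pick_next_task():
all names are hypothetical, only three classes are modelled, the pull is
unconditional, and on_unlock stands for whatever another CPU does while
the runqueue lock is dropped:

```python
RETRY_TASK = object()  # sentinel standing in for the kernel's RETRY_TASK


class RunQueue:
    def __init__(self):
        self.stop_on_rq = False  # models rq->stop && rq->stop->on_rq
        self.dl_tasks = []
        self.fair_tasks = []


def pick_stop(rq, on_unlock):
    return "stop" if rq.stop_on_rq else None


def pick_dl(rq, on_unlock):
    # Pulling tasks drops and re-acquires the runqueue lock; on_unlock
    # models what can happen on another CPU inside that window.
    on_unlock(rq)
    if rq.stop_on_rq:
        return RETRY_TASK  # a stop task slipped in: restart selection
    return rq.dl_tasks[0] if rq.dl_tasks else None


def pick_fair(rq, on_unlock):
    return rq.fair_tasks[0] if rq.fair_tasks else None


CLASSES = [pick_stop, pick_dl, pick_fair]  # highest priority first


def pick_next_task(rq, on_unlock):
    while True:
        for pick in CLASSES:
            task = pick(rq, on_unlock)
            if task is RETRY_TASK:
                break  # go back to the highest priority class
            if task is not None:
                return task
        else:
            return "idle"  # no class had anything to run


def stop_task_appears(rq):
    rq.stop_on_rq = True


def nothing_happens(rq):
    pass


racy_rq = RunQueue()
racy_rq.dl_tasks.append("dl-1")
picked_with_race = pick_next_task(racy_rq, stop_task_appears)

quiet_rq = RunQueue()
quiet_rq.dl_tasks.append("dl-1")
picked_without_race = pick_next_task(quiet_rq, nothing_happens)
```

When the stop task shows up during the unlocked window, the model picks
"stop" instead of "dl-1"; without the RETRY_TASK check in pick_dl it
would have returned "dl-1" and left the stop task waiting.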
