Analysis
--------

The 1st hard lockup is harder to extract useful data from, as the registers
holding the variables related to the CPU number have apparently been clobbered
by more recent calls in the spinlock path.

Looking at the 2nd hard lockup:

addr2line plus the code shows us that try_to_wake_up() is indeed looping at
line 1997 with IRQs disabled since line 1939 (thus a hard lockup):

    $ addr2line -pifae ddeb-116.140/usr/lib/debug/boot/vmlinux-4.4.0-116-generic 0xffffffff810aacb6
    0xffffffff810aacb6: try_to_wake_up at /build/linux-lts-xenial-ozsla7/linux-lts-xenial-4.4.0/kernel/sched/core.c:1997

    1926 static int
    1927 try_to_wake_up(struct task_struct *p, unsigned int state, int 
wake_flags)
    1928 {
    ...
    1939         raw_spin_lock_irqsave(&p->pi_lock, flags);
    ...
    1993         /*
    1994          * If the owning (remote) cpu is still in the middle of 
schedule() with
    1995          * this task as prev, wait until its done referencing the task.
    1996          */
    1997         while (p->on_cpu)
    1998                 cpu_relax();
    ...
    2027         raw_spin_unlock_irqrestore(&p->pi_lock, flags);
    2028 
    2029         return success;
    2030 }
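
For completeness, the only thing that terminates that while loop is the remote
CPU clearing ->on_cpu once it has fully switched away from the task. Roughly,
paraphrased and trimmed from kernel/sched/sched.h in this kernel series (only
the relevant store is shown):

    /*
     * Trimmed sketch, not the full function: ->on_cpu is cleared only after
     * the remote CPU has completed the context switch away from 'prev'.
     */
    static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
    {
    #ifdef CONFIG_SMP
            /* pairs with the spin on p->on_cpu in try_to_wake_up() above */
            smp_store_release(&prev->on_cpu, 0);
    #endif
            raw_spin_unlock_irq(&rq->lock);
    }

So if the CPU that still owns the task never finishes schedule(), the waiter
above keeps spinning with IRQs disabled.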

The objdump disassembly of try_to_wake_up() in vmlinux, around the RIP
instruction address (ffffffff810aacb6), shows a while loop that just checks
for a non-zero 'p->on_cpu' and calls cpu_relax() (which translates to the
'pause' instruction):

    ffffffff810aacb1:       f3 90                   pause
    ffffffff810aacb3:       8b 43 28                mov    0x28(%rbx),%eax
    ffffffff810aacb6:       85 c0                   test   %eax,%eax
    ffffffff810aacb8:       75 f7                   jne    ffffffff810aacb1 <try_to_wake_up+0x81>
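
For reference, cpu_relax() on x86 boils down to a 'rep; nop', which assembles
to the same f3 90 bytes that objdump prints as 'pause' above. Roughly,
paraphrased from arch/x86/include/asm/processor.h:

    /* paraphrased sketch: 'rep; nop' is the 0xf3 0x90 ('pause') encoding */
    static __always_inline void cpu_relax(void)
    {
            asm volatile("rep; nop" ::: "memory");
    }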

So the mov checks the value at the pointer in RBX plus offset 0x28, which,
according to the 'pahole' tool, is indeed the 'on_cpu' field:

    $ pahole --hex -C task_struct ddeb-116.140/usr/lib/debug/boot/vmlinux-4.4.0-116-generic | grep on_cpu
        int                        on_cpu;               /*  0x28   0x4 */

So, the task_struct pointer is in RBX, which is:

    RBX: ffff883ff2a76200

And that matches the other hard locked up task on CPU 10 (see its
'task:' field).

Per the stack trace of CPU 10, the identical timestamp of the two hard lockup
messages, and the fact that both stack traces are cpu_stopper related, it does
look like CPU 10 is waiting on the spinlock of one of the 2 cpu stoppers held
by CPU 6, which is exactly the scenario described in the suggested patch.

The problem/fix has been verified with a synthetic test-case (attached).


commit 0b26351b910fb8fe6a056f8a1bbccabe50c0e19f
Author: Peter Zijlstra <[email protected]>
Date:   Fri Apr 20 11:50:05 2018 +0200

    stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
    
    Matt reported the following deadlock:
    
    CPU0                                    CPU1
    
    schedule(.prev=migrate/0)               <fault>
      pick_next_task()                        ...
        idle_balance()                          migrate_swap()
          active_balance()                        stop_two_cpus()
                                                    spin_lock(stopper0->lock)
                                                    spin_lock(stopper1->lock)
                                                    ttwu(migrate/0)
                                                      smp_cond_load_acquire() -- waits for schedule()
            stop_one_cpu(1)
              spin_lock(stopper1->lock) -- waits for stopper lock
    
    Fix this deadlock by taking the wakeups out from under stopper->lock.
    This allows the active_balance() to queue the stop work and finish the
    context switch, which in turn allows the wakeup from migrate_swap() to
    observe the context and complete the wakeup.
<...>
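
Per the commit message, the fix technique is to defer the wakeups until after
stopper->lock has been dropped. A rough sketch of that pattern, using the
kernel's wake_q interface (simplified, not the exact upstream diff):

    /*
     * Simplified sketch of the fix pattern: queue the wakeup while holding
     * the stopper lock, but only perform it after the lock is released, so
     * a waker can never end up spinning on p->on_cpu with stopper->lock held.
     */
    static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
    {
            struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
            DEFINE_WAKE_Q(wakeq);
            unsigned long flags;
            bool enabled;

            spin_lock_irqsave(&stopper->lock, flags);
            enabled = stopper->enabled;
            if (enabled) {
                    list_add_tail(&work->list, &stopper->works);
                    wake_q_add(&wakeq, stopper->thread);    /* defer the wakeup */
            } else if (work->done) {
                    cpu_stop_signal_done(work->done);
            }
            spin_unlock_irqrestore(&stopper->lock, flags);

            wake_up_q(&wakeq);                              /* wake outside the lock */

            return enabled;
    }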


The stop_two_cpus() call can only happen on a NUMA system, per its caller chain:
  stop_two_cpus() <- migrate_swap() <- task_numa_migrate() <- numa_migrate_preferred() <- [task_numa_placement()] <- task_numa_fault()
