Okay, I removed almost everything out of equation:

- removed networking interfaces from systemd
- removed open-iscsi login/logout logic from systemd
- used a single network interface for the logins, on a single iscsi portal

was still able to reproduce the issue by:

- doing a simple login after configuring the network device: 
  ./net-start.sh ; sleep 1 ; sudo iscsiadm -m node --login

- shutting down the network device before any logout:
  ./net-stop.sh ; sudo shutdown -h now

There was no systemd service order in play (in between open-iscsi,
iscsid and networking / ifupdown scripts) and I was able to cause the
same issue 100%, same messages, same symptoms. This tells us that,
definitely turning down the network while still logged iscsi sessions is
causing the hangs (check #3 for why).

Summary of Causes

1)

network shuts down -> iscsi luns are logged out (attempt) -> iscsid daemon is 
shutdown
or
network shuts down -> iscsi daemon is shutdown -> iscsi luns are logged out

- logout would fail due to lack of network
- iscsi daemon would die and/or logout wouldn't work
- iscsi sessions would still be there
- system would hang (transport layer can't logout)

2)

iscsi daemon is shutdown -> iscsi luns are logged out -> network shuts
down

- this would cause the bug i'm mentioning: daemon dies, logout doesn't work.
- some iscsi sessions would still be there
- system would hang (transport layer can't logout)
- (to check: open-iscsi.service ExecStop= running in parallel ?!?)

3)

I used KVM watchdog virtual device + NMI from host to crash the guest
during the hang

http://pastebin.ubuntu.com/25394744/

And, finally, the hang is because kernel is hanged during its shutdown
logic.

>     0      0   0  ffffffff81e11500  RU   0.0       0      0  [swapper/0]
>     0      0   1  ffff8801a6a20e00  RU   0.0       0      0  [swapper/1]
>     0      0   2  ffff8801a6a21c00  RU   0.0       0      0  [swapper/2]
>     0      0   3  ffff8801a6a22a00  RU   0.0       0      0  [swapper/3]

ALL CPUs were idling during the hang.

crash> runq
CPU 0 RUNQUEUE: ffff8801ae016e00
  CURRENT: PID: 0      TASK: ffffffff81e11500  COMMAND: "swapper/0"
  RT PRIO_ARRAY: ffff8801ae016fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae016e98
     [no tasks queued]

CPU 1 RUNQUEUE: ffff8801ae096e00
  CURRENT: PID: 0      TASK: ffff8801a6a20e00  COMMAND: "swapper/1"
  RT PRIO_ARRAY: ffff8801ae096fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae096e98
     [no tasks queued]

CPU 2 RUNQUEUE: ffff8801ae116e00
  CURRENT: PID: 0      TASK: ffff8801a6a21c00  COMMAND: "swapper/2"
  RT PRIO_ARRAY: ffff8801ae116fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae116e98
     [no tasks queued]

CPU 3 RUNQUEUE: ffff8801ae196e00
  CURRENT: PID: 0      TASK: ffff8801a6a22a00  COMMAND: "swapper/3"
  RT PRIO_ARRAY: ffff8801ae196fb0
     [no tasks queued]
  CFS RB_ROOT: ffff8801ae196e98
     [no tasks queued]

NO task was scheduled to run.

crash> ps -u
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
      1      0   1  ffff8801a69b8000  UN   0.0   15484   3204  systemd-shutdow

There was just ONE SINGLE user task running (systemd-shutdown)

crash> set ffff8801a69b8000
    PID: 1
COMMAND: "systemd-shutdow"
   TASK: ffff8801a69b8000  [THREAD_INFO: ffff8801a69c0000]
    CPU: 1
  STATE: TASK_UNINTERRUPTIBLE 
crash> bt
PID: 1      TASK: ffff8801a69b8000  CPU: 1   COMMAND: "systemd-shutdow"
 #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
 #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
 #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
 #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
#10 [ffff8801a69c3dc0] device_shutdown at ffffffff8156112c
#11 [ffff8801a69c3df0] kernel_power_off at ffffffff810a32f5
#12 [ffff8801a69c3e00] SYSC_reboot at ffffffff810a34d3
#13 [ffff8801a69c3f40] sys_reboot at ffffffff810a359e
#14 [ffff8801a69c3f50] entry_SYSCALL_64_fastpath at ffffffff818431f2
    RIP: 00007face7d8f856  RSP: 00007ffd4c271d08  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 00007ffd4c271240  RCX: 00007face7d8f856
    RDX: 000000004321fedc  RSI: 0000000028121969  RDI: fffffffffee1dead
    RBP: 00007ffd4c271230   R8: 0000000000001400   R9: 0000000000000005
    R10: 00007face88d36c0  R11: 0000000000000202  R12: 00007ffd4c2713d0
    R13: 000000b7e6795c83  R14: 00007ffd4c271c30  R15: 0000000000000001
    ORIG_RAX: 00000000000000a9  CS: 0033  SS: 002b

And it called the "kernel_power_off" logic from a system call.

This logic iterates over all devices calling:

if (dev->bus && dev->bus->shutdown) {
    dev->bus->shutdown(dev);
} else if (dev->driver && dev->driver->shutdown) {
    dev->driver->shutdown(dev);
}

This called sd_shutdown, which called sd_sync_cache, which sent a
SYNCHRONIZE_CACHE scsi op code to the device in question (one of the
iscsi paths). scsi_execute called blk_execute_rq to put the request on
device's queue, for execution.

/* Prevent hang_check timer from firing at us during very long I/O */
hang_check = sysctl_hung_task_timeout_secs;
if (hang_check)
    while (!wait_for_completion_io_timeout(&wait, hang_check * (HZ/2)));

The default timeout for that operation is hang_check 120 seconds * 250/2
AND, "wait_for_completion_io_timeout" returns 0, !0 == 1 so.. this will
run forever.

So basically kernel is trying o complete SYNCHRONIZE_CACHE operation,
the SCSI cmd is already in device's queue and there is no return (for
obvious reasons).

Obs 1)
I have tested this with kernel 4.11 with the same symptom. I have to check now: 
is there any error propagation logic to cancel this so system can shutdown ? We 
are not getting a SCSI sense message because the "bus" (iscsi/tcp) is gone. 
Should we hard flush the device's queue ? 

Obs 2)
This isn't necessarily related to systemd. IMO this should also be addressed in 
kernel, allowing a broken transport layer - specially tcp/ip - to allow the 
shutdown to happen.

Obs 3)
The cause of this can be related to systemd (if there are leftovers) or iscsid 
daemon (if it died and/or got restarted before logging out. is there a way of 
keeping history of sessions so it knows what to logout after daemon starts 
again ?).

I'll get back to this next week, after reading more this dump.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1569925

Title:
  Shutdown hang on 16.04 with iscsi targets

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1569925/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to