[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2021-06-30 Thread Dan Streetman
please reopen if this is still an issue

** Changed in: systemd (Ubuntu)
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread  Christian Ehrhardt 
\o/
All worked on the first run now, this gives some confidence that the flakyness 
is gone for now.

2.0.874-5ubuntu5debconf/1.5.68  2018-07-20
06:12:24pass
2.0.874-5ubuntu5targetcli-fb/2.1.43-2   2018-07-20
06:05:57pass
2.0.874-5ubuntu5qemu/1:2.12+dfsg-3ubuntu3   2018-07-20
06:05:48pass
2.0.874-5ubuntu5python3-defaults/3.6.6-12018-07-20
06:05:44pass
2.0.874-5ubuntu5netifaces/0.10.4-1build12018-07-20
06:04:45pass

If said runtime increase of the test is found to be affecting e.g.
Bionic SRUs as well blocking on opne-iscsi more than in the past then we
will need to consider backporting the fix.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread  Christian Ehrhardt 
2.0.874-5ubuntu5 migrated and had the first good new run.
Retriggering all those currently blocked on it ...

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread Launchpad Bug Tracker
This bug was fixed in the package open-iscsi - 2.0.874-5ubuntu5

---
open-iscsi (2.0.874-5ubuntu5) cosmic; urgency=medium

  * Harden dep8 tests against effects due to slow execution on Launchpad
infrastructure (LP: #1732028).
- debian/tests/patch-image: remove problematic fstab entries
- debian/tests/tgt-boot-test: ran xkvm in verbose mode

 -- Christian Ehrhardt   Thu, 19 Jul
2018 18:22:39 +0200

** Changed in: open-iscsi (Ubuntu)
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread Launchpad Bug Tracker
** Merge proposal linked:
   
https://code.launchpad.net/~paelzer/ubuntu/+source/open-iscsi/+git/open-iscsi/+merge/350076

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread  Christian Ehrhardt 
First good run:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac
/autopkgtest-cosmic-ci-train-ppa-service-3325/cosmic/amd64/o/open-
iscsi/20180719_180433_98a21@/log.gz

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread  Christian Ehrhardt 
Note to myself: WIP at
https://code.launchpad.net/~paelzer/ubuntu/+source/open-iscsi/+git/open-
iscsi/+ref/mask-uefi-part-to-unbreak-tests

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread  Christian Ehrhardt 
Due to snapd still lopoing in this case (bug 1782602) 
The sum of changes is:
$ systemctl disable snapd.seeded.service
$ systemctl disable snapd.service

cat /etc/fstab 
LABEL=cloudimg-rootfs   /ext4   defaults0 0
#LABEL=UEFI /boot/efi   vfatdefaults0 0

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread  Christian Ehrhardt 
In a discussion with smoser we decided to try just clearing /etc/fstab - it 
might be sufficient for our test intention and at the same time avoid the issue.
With that done we still have the following failing with timeouts:
  [ TIME ] Timed out waiting for device dev-ttyS0.device.
  [FAILED] Failed to start Journal Service.

Not sure why the first happens, but the second is due to the lack of a
root FS.

In my slowed down environments it boots REALLY slow then, like 30 minutes.
Several services are unforgiving of the root not being there.
Especially snap that was getting into a loop of Starting/Stopping/... and by 
that did not let it complete.


The former try was too hard (intentionally), so I stepped back to just replace 
the labels with the (in this case) known partition IDs /dev/sda1 and /dev/sda15.
This I first booted non slowed to check if the config works at all.
It works

Note: non slowed the TCG-system is at 4*100% for the vcpus and 400% for
TCG. ~800% vs ~98% is a slowdown by 1/8 which matches what I see as
effective slowdown.

A few more tries together with Scott found that disabling the fstab entry for 
EFI, but keeping the main root as-is.
Again we verified this config is valid (boot without CPU limit) and then we set 
the limit.
With that finally it seems to work.
Lets code that up in open-iscsi and put it to a PPA to test on LP infra against 
it.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread  Christian Ehrhardt 
I worked on the two messages and related boot messages:
1. the ordering cycle is about media-root, maybe the config is not yet perfect
2. timeouts on 
 dev-ttyS0.device
 dev-disk-by\x2dlabel-UEFI.device
Hmm, that is:
  Dependency failed for /boot/efi.
  Dependency failed for Local File Systems.
  local-fs.target: Job local-fs.target/start failed

I checked for network coming up and was first confused between 
  systemd[1]: Failed to start Network Service.
And network in general coming up very late.
In a log:
  eth0: DHCPv4 address 10.0.12.15/24
  systemd-networkd[529]: eth0: Configured
In one of my cases ~10 minutes later than I got into the emergency console.

Afterwards boot-efi.mount autocompleted just fine btw.

Early boot is a swarm of red herrings:
I was checking more like
  Started Remount Root and Kernel File Systems.
before
  Started Network Service
It seems the latter just depends on time synchronization and therefore waits.
  Reached target Network is Online.
is before it as it should be.

And as trivial as that may sound, the network seems to be related at least.
Of all the logs only the good cases reach:
  Reached target Network is Online
But that is only reachable after local-fs which in the bad case fails - so it 
is ok to not be reached if the root is missing.

I also checked if the ordering changed e.g. compared to Bionic?
But it seemed - at least for network init similar in old bad&good cases.

Arr - early boot dependencies are lovely :-)
Following a discussion about fstab modifications next ...

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread Scott Moser
** Description changed:

  This issue keeps cropping up.  It shows itself in open-iscsi autopkg tests.
  I think it might just be "really slow system".  It seems the timeout is only
-  1 minute 30 seconds for the disk to appear, and in a happy run you
+  1 minute 30 seconds for the disk to appear, and in a happy run you
  might see something very close:
  
  [K[   [0;31m*[0;1;31m*[0m[0;31m*[0m] A start job is running for 
dev-disk…-UEFI.device (1min 29s / 1min 32s)
  [K[  [0;31m*[0;1;31m*[0m[0;31m* [0m] A start job is running for 
dev-disk…-UEFI.device (1min 30s / 1min 32s)
  [K[[0;32m  OK  [0m] Found device VIRTUAL-DISK UEFI.
-  Mounting /boot/efi...
+  Mounting /boot/efi...
  
  ---
  There is information on the open-iscsi tests at [1].
-   [1] 
https://git.launchpad.net/~usd-import-team/ubuntu/+source/open-iscsi/tree/debian/tests/README-boot-test.md
+   [1] 
https://git.launchpad.net/~usd-import-team/ubuntu/+source/open-iscsi/tree/debian/tests/README-boot-test.md
  
  The tests set up an iscsi target and boot a kvm guest off that read-only
  root with overlayroot.
  
  # get open-iscsi source
  $ apt-get source open-iscsi
  $ cd open-iscsi-2.0.874/
  
  $ sudo apt-get install -qy simplestreams tgt qemu-system-x86 \
-  cloud-image-utils distro-info
+  cloud-image-utils distro-info
  $ cd open-iscsi-2.0.874/
  
  ## we're now mostly following debian/tests/README-boot-test.md
  # download the image and get kernel/initrd
  $ PATH=$PWD/debian/tests:$PATH
- $ get-image bionic.d bionic 
+ $ get-image bionic.d bionic
  $ sudo `which patch-image` \
- --kernel=bionic.d/kernel --initrd=bionic.d/initrd bionic.d/disk.img
+ --kernel=bionic.d/kernel --initrd=bionic.d/initrd bionic.d/disk.img
  
  $ tgt-boot-test -v bionic.d/disk.img bionic.d/kernel bionic.d/initrd
  
  
  Success is being able to log in with 'ubuntu' and 'passw0rd'.
  Failure as seen in the log is dropping into an emergency shell.
  
  Once inside (this was successful) you'll see a mostly sane system.
  Some things to note:
  a.) tgt-boot-test boots without kvm enabled.  This is because using
-  kvm with qemu in nested virt would cause system lockups. Its slower
+  kvm with qemu in nested virt would cause system lockups. Its slower
  but more reliable to go wtihout.
  b.) under bug 1723183 I  made overlayroot comment out the root filesystem
  from the rendered /etc/fstab.  That was because systemd got confused and
  assumed that /media/root-ro had to be on top of /.
  c.) you can enable or disable kvm by setting _USE_KVM=0 or _USE_KVM=1
-in your environment.
+    in your environment.
  
  $ grep -v "^# " /etc/fstab
  #
  #
  #LABEL=cloudimg-rootfs /media/root-ro/ ext4 ro,defaults,noauto 0 0
  /media/root-ro/ / overlay 
lowerdir=/media/root-ro/,upperdir=/media/root-rw/overl
  ay/,workdir=/media/root-rw/overlay-workdir/_ 0 0
  LABEL=UEFI /boot/efi vfat defaults 0 0 # overlayroot:fs-unsupported
  
  $ sudo blkid
  /dev/sda1: LABEL="cloudimg-rootfs" 
UUID="7b1980bd-9102-4356-8df0-ec7a0c062411" TYPE="ext4" 
PARTUUID="c0b5ace0-4703-4667-babb-3d38137cab88"
  /dev/sda15: LABEL="UEFI" UUID="B177-3CC9" TYPE="vfat" 
PARTUUID="0ab0b9fd-2c28-4724-857a-1559f0cf76ea"
  /dev/sda14: PARTUUID="221662d6-cab0-4290-ba1c-e72acf2bf193"
  
  $ cat /run/systemd/generator/local-fs.target.requires/boot-efi.mount
  # Automatically generated by systemd-fstab-generator
  
  [Unit]
  SourcePath=/etc/fstab
  Documentation=man:fstab(5) man:systemd-fstab-generator(8)
  Before=local-fs.target
  
  [Mount]
  Where=/boot/efi
  What=/dev/disk/by-label/UEFI
  Type=vfat
  
  Related bugs:
-  * bug 1680197: Zesty deployments failing sporadically
-  * bug 1723183: transient systemd ordering issue when using overlayroot
+  * bug 1680197: Zesty deployments failing sporadically
+  * bug 1723183: transient systemd ordering issue when using overlayroot
+  * bug 1666573: transient systemd ordering cycle in boot with overlayroot ver 
read-only open-iscsi root 
  
  ProblemType: Bug
  DistroRelease: Ubuntu 18.04
  Package: systemd 234-2ubuntu12
  ProcVersionSignature: User Name 4.13.0-16.19-generic 4.13.4
  Uname: Linux 4.13.0-16-generic x86_64
  ApportVersion: 2.20.7-0ubuntu4
  Architecture: amd64
  Date: Mon Nov 13 21:06:36 2017
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
  ProcEnviron:
-  TERM=vt220
-  PATH=(custom, no user)
-  XDG_RUNTIME_DIR=
-  LANG=C.UTF-8
-  SHELL=/bin/bash
+  TERM=vt220
+  PATH=(custom, no user)
+  XDG_RUNTIME_DIR=
+  LANG=C.UTF-8
+  SHELL=/bin/bash
  ProcKernelCmdLine: nomodeset iscsi_initiator=maas-enlist 
iscsi_target_name=tgt-boot-test-7xuhwl iscsi_target_ip=10.0.12.2 
iscsi_target_port=3260 iscsi_initiator=maas-enlist ip=maas-enlist:BOOTIF ro 
net.ifnames=0 BOOTIF_DEFAULT=eth0 
root=/dev/disk/by-path/ip-10.0.12.2:3260-iscsi-tgt-boot-test-7xuhwl-lun-1-part1 
overlayroot=tmpfs console=ttyS0 ds=nocloud-net;seedfrom=http://10.0.12.2:32600/
  SourcePackage: system

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread Scott Moser
i attached as reference
 https://hackmd.io/E0ydu7Y7QEe-kroPb6-OOA
at some point we can improve the doc in the debian/tests directory with that.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread  Christian Ehrhardt 
Running just with tgt-boot-test in interactive mode, no user-data and no extra 
disk.
Issue still reproducible with that.

Saving the image to try different modifications while being able to get
back to the current state.

Then I started where Scott already experimented. The timeouts.
I was trying the big hammer first setting all of them.

# Inside:
sudo mount-image-callback --system-mounts disk.img /bin/bash

/etc/fstab:
LABEL=cloudimg-rootfs   /ext4   
defaults,x-systemd.device-timeout=600,x-systemd.mount-timeout=600,_netdev   
0 0

$ mkdir etc/systemd/user.conf.d
$ echo DefaultTimeoutStartSec=600s > etc/systemd/user.conf.d/longtimeout.conf


I (accidentally) verified that without CPU limit this modified root disk would 
boot fine.
And will now run it with 40% over lunchtime (a bit less to more surely trigger 
it - and since I don't "wait" for it to complete lower % won't hurt).

But that still ran into the issues.
At least I have an emergency shell there interactively now that I can take a 
look before the next tries.
Also not only disk fails but also (probably dependency) Timed out waiting for 
device dev-ttyS0.device.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1732028] Re: timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests on LP Infra]

2018-07-19 Thread  Christian Ehrhardt 
It does not consume "a lot" of CPU but lets try still to slow it down into the 
error.
$ sudo apt install cgroup-tools
$ sudo cgcreate -g cpu:/cpulimited
# you might want to ensure that cpu.cfs_period_us: 10
# e.g. Set 10% as hard limit
$ echo 1 | sudo tee /sys/fs/cgroup/cpu,cpuacct/cpulimited/cpu.cfs_quota_us
# This was done on the Host, all tests run in the qemu guest (as on LP), move 
qemu in those cgroups
$ for task in $(ls -1 /proc/$(pidof qemu-system-x86_64)/task/); do echo $task | 
sudo tee -a /sys/fs/cgroup/cpu,cpuacct/cpulimited/tasks; done;

I tested different shares using TCG in the guest.
50% seemed to be a tradeoff that isn't too slow to totally fail or having you 
wait too much.
It gets slower, TCG&Emulation blocked at 50%, vcpus at ~12%

With that I was able to hit this locally:
[  432.218071] systemd[1]: Timed out waiting for device 
dev-disk-by\x2dlabel-UEFI.device.
[ TIME ] Timed out waiting for device dev-disk-by\x2dlabel-UEFI.device.
[DEPEND] Dependency failed for /boot/efi.
[DEPEND] Dependency failed for Local File Systems.
[...]
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or "exit"
to boot into default mode.
Press Enter for maintenance
(or press Control-D to continue):

>From here going step by step down to the smallest test (this was full 
>autopkgtest).
And there we can again start trying tweaks as in comment #5 and comment #6 but 
hopefully iterate faster on them.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1732028

Title:
  timeout in iscsi boot fail with overlayroot [open-iscsi autopkg tests
  on LP Infra]

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1732028/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs