Bug#821254: systemd[1]: xendomains.service start operation timed out.

2020-01-06 Thread Hans van Kranenburg
Hi,

On 1/3/20 5:42 PM, Martin Maney wrote:
> 
> [...]
> 
> Yes, the shutdown hang is a different issue, but I'm going to hope that
> the real systemd units mentioned in this bug will fix my problem, too.

What you could already do now is test those scripts by just shutting
down and starting up the domUs, without actually rebooting the machine.
That way we can learn whether we could use them as a drop-in
replacement or not.

The xendomains init script that we have in Debian is:

https://salsa.debian.org/xen-team/debian-xen/blob/master/debian/xen-utils-common.xendomains.init

The upstream one (which is quite a bit different) is:

https://salsa.debian.org/xen-team/debian-xen/blob/master/tools/hotplug/Linux/xendomains.in

Or, it seems that last one gets installed in a location for helper
scripts and it's just called from both the init.d script and the systemd
service:

https://salsa.debian.org/xen-team/debian-xen/blob/master/tools/hotplug/Linux/init.d/xendomains.in

https://salsa.debian.org/xen-team/debian-xen/blob/master/tools/hotplug/Linux/systemd/xendomains.service.in

It would be really helpful if you could spend some time on this.

Speaking for myself, I either deal with clusters and use live migration
to empty a server before shutting it down, or otherwise I would rather
have my own way to carefully shut down things before typing a reboot
command, combined with a molly-guard script to prevent accidental
reboots while something is still running. That way there's still an
option to debug/salvage a misbehaving domU before shutdown.

Hans



Bug#821254: systemd[1]: xendomains.service start operation timed out.

2020-01-03 Thread Martin Maney


First, an answer that I happen to have handy to Hans's question from
Feb 2019:

"TBH, I'm not an expert at all in this area, I never figured out yet
how all these systemd<->init-script compatibility layers work yet."

Neither am I an expert, and I'd really prefer not needing to become
one, but from what I just read in the systemd-sysv-generator man page,
the answer would have to be "poorly, in general".

I was specifically looking into a timeout on shutdown, and it's
problematic at least in part because the generator does not process the
$syslog item from the LSB header, so the hang happens in a black hole. 
The last visible message other than the truncated one about the "LSB
stop job ...  xendomains" on the console was one that seemed to be
about the end of block device availability, which would account for the
shutdown hang in xendomains very handily.  The screen was apparently
cleared just before that, which I guess is systemd being "helpful".  As
opposed to many things with which it actually is helpful.

Yes, the shutdown hang is a different issue, but I'm going to hope that
the real systemd units mentioned in this bug will fix my problem, too.

-- 
As economics is known as The Miserable Science, software engineering should
be known as The Doomed Discipline, doomed because it cannot even approach
its goal since its goal is self-contradictory. -- Edsger Dijkstra



Bug#821254: systemd[1]: xendomains.service start operation timed out.

2019-02-03 Thread Hans van Kranenburg
Hi,

On 2/2/19 11:49 PM, Andy Smith wrote:
> 
> On Sat, Feb 02, 2019 at 11:24:36PM +0100, Hans van Kranenburg wrote:
>> When working on actually shipping systemd units we'd really need
>> to have a group of users that want to actively help testing
>> everything. Downgrade, upgrade, try to break it etc...
> 
> I actually ended up going from Debian-packaged 4.4.x to Mark Pryor's
> Debian packages because I needed to upgrade version and patch some
> XSAs during embargo. At the time there wasn't much going on with the
> Debian packaging and I didn't feel confident to do it myself, so I
> based things off of Mark's work.
> 
> I used that as a basis for 4.8.x and now 4.10.x packages. Now that
> you are helping with Debian packaging I would like to come back to
> Debian's packages, probably along with an upgrade to buster.
> 
> The systemd stuff from those packages of Mark's did solve this
> problem though. I assume this is upstream content.

Yes, it is, and this just made me realize that you've already been
running, for quite some time, the end result of what would happen if we
actually added the systemd stuff to the packaging.

That's great, because I guess that already answers most of the "will
these things do the right thing out of the box?" uncertainty.

> I think in 4.4 it
> was generating a systemd service from an init script, whereas now
> it's a native systemd service. Here's /lib/systemd/system/xendomains.service:
> 
> [Unit]
> Description=Xendomains - start and stop guests on boot and shutdown
> Requires=proc-xen.mount xenstored.service
> After=proc-xen.mount xenstored.service xenconsoled.service xen-init-dom0.service
> After=network-online.target
> After=remote-fs.target
> ConditionPathExists=/proc/xen/capabilities
> Conflicts=libvirtd.service
> 
> [Service]
> Type=oneshot
> RemainAfterExit=true
> ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities
> ExecStart=-/usr/lib/xen-4.10/bin/xendomains start
> ExecStop=/usr/lib/xen-4.10/bin/xendomains stop
> ExecReload=/usr/lib/xen-4.10/bin/xendomains restart
> 
> [Install]
> WantedBy=multi-user.target

Yup,
https://salsa.debian.org/xen-team/debian-xen/blob/master/tools/hotplug/Linux/systemd/xendomains.service.in

> Those packages came from:
> 
> http://107.185.106.30/xen/debian/stretch-nmu/4ax/
> 
> (plus the XSAs published since then)
> 
> Would it be helpful if I installed buster and
> xen-hypervisor-4.11-amd64 and checked how the systemd unit files
> cope with trying to start 75 or so domains? If so I will try to find
> some time to try that,

Absolutely.

The first option would be to find out who/what decides there should be
a 5 minute timeout.

But the other option is to upgrade a box to the buster 4.11 packages
and then just put the systemd things from tools/hotplug/Linux/systemd
in place and test what happens. This might be doable for Buster after
all (...there's also 22 other items still on the TODO).

TBH, I'm not an expert at all in this area, I never figured out yet how
all these systemd<->init-script compatibility layers work yet.

Hans



Bug#821254: systemd[1]: xendomains.service start operation timed out.

2019-02-02 Thread Andy Smith
Hi Hans,

On Sat, Feb 02, 2019 at 11:24:36PM +0100, Hans van Kranenburg wrote:
> When working on actually shipping systemd units we'd really need
> to have a group of users that want to actively help testing
> everything. Downgrade, upgrade, try to break it etc...

I actually ended up going from Debian-packaged 4.4.x to Mark Pryor's
Debian packages because I needed to upgrade version and patch some
XSAs during embargo. At the time there wasn't much going on with the
Debian packaging and I didn't feel confident to do it myself, so I
based things off of Mark's work.

I used that as a basis for 4.8.x and now 4.10.x packages. Now that
you are helping with Debian packaging I would like to come back to
Debian's packages, probably along with an upgrade to buster.

The systemd stuff from those packages of Mark's did solve this
problem though. I assume this is upstream content. I think in 4.4 it
was generating a systemd service from an init script, whereas now
it's a native systemd service. Here's /lib/systemd/system/xendomains.service:

[Unit]
Description=Xendomains - start and stop guests on boot and shutdown
Requires=proc-xen.mount xenstored.service
After=proc-xen.mount xenstored.service xenconsoled.service xen-init-dom0.service
After=network-online.target
After=remote-fs.target
ConditionPathExists=/proc/xen/capabilities
Conflicts=libvirtd.service

[Service]
Type=oneshot
RemainAfterExit=true
ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities
ExecStart=-/usr/lib/xen-4.10/bin/xendomains start
ExecStop=/usr/lib/xen-4.10/bin/xendomains stop
ExecReload=/usr/lib/xen-4.10/bin/xendomains restart

[Install]
WantedBy=multi-user.target

Those packages came from:

http://107.185.106.30/xen/debian/stretch-nmu/4ax/

(plus the XSAs published since then)

Would it be helpful if I installed buster and
xen-hypervisor-4.11-amd64 and checked how the systemd unit files
cope with trying to start 75 or so domains? If so I will try to find
some time to try that.

Cheers,
Andy



Bug#821254: systemd[1]: xendomains.service start operation timed out.

2019-02-02 Thread Hans van Kranenburg
Hi Andy,

Just to set expectations... Ian is not using systemd at all, and for me,
the current init script stuff, whatever there is, does its thing for my
use case at work. I don't use xendomains, I use live migration to drain
physical servers so I can reboot / upgrade / whatever them without any
need to hurry. TBH, all the extra time I had for working on Debian/Xen
in the last months was eaten by getting things fixed for myself, like
live migration bugs.

Yesterday, we spent the day working on the Buster TODO list, and in the
beginning of the day we identified "init scripts and systemd" as the
main topic of the day. However, when starting to look into that, it
quickly became clear that "just" merging the debian and upstream init
scripts is not a trivial operation (it needs discussion with the
redhat-based users). When working on actually shipping systemd units
we'd really need to have a group of users that want to actively help
testing everything. Downgrade, upgrade, try to break it etc...

For buster, there will be a notice in the "known-issues" section of
README.Debian about this issue.

If you have an idea about how to change this timeout then please share.
Don't wait for it to magically happen. :)

Hans



Bug#821254: systemd[1]: xendomains.service start operation timed out.

2016-04-16 Thread Andy Smith
Package: xen-utils-common
Version: 4.4.1-9+deb8u4
Severity: normal

Dear Maintainer,

I have a server with a large number of domUs set to auto-start. For the
first time I have booted it with all of them needing to start from cold,
but the xendomains service only got part way through.

syslog showed nothing notable about the domains starting…

Apr 16 14:57:45 snaps xendomains[4631]: Starting Xen domain lima (from /etc/xen/auto/010-lima.conf)...done.

…until…

Apr 16 15:02:36 sierra xendomains[4631]: Starting Xen domain juliet (from /etc/xen/auto/627-juliet.conf)...done.
Apr 16 15:02:37 sierra kernel: [  341.269174] xen-blkback:ring-ref 8, event-channel 15, protocol 2 (x86_32-abi)
Apr 16 15:02:37 sierra kernel: [  341.367307] xen-blkback:ring-ref 9, event-channel 16, protocol 2 (x86_32-abi)
Apr 16 15:02:38 sierra kernel: [  341.437187] vif vif-51-0 v-juliet: Guest Rx ready
Apr 16 15:02:38 sierra kernel: [  341.437429] IPv6: ADDRCONF(NETDEV_CHANGE): v-juliet: link becomes ready
Apr 16 15:02:40 sierra systemd[1]: xendomains.service start operation timed out. Terminating.
Apr 16 15:02:40 sierra systemd[1]: Failed to start LSB: Start/stop secondary xen domains.
Apr 16 15:02:40 sierra systemd[1]: Unit xendomains.service entered failed state.

That was the 51st domain, around 60% of the way through the list of
domains it should have started.

Once I'd realised only some of the domains were started I reran "service
xendomains start" and it finished the job.

So, is there a built-in timeout of ~5 minutes here that I need to
increase?

I see that the generated /run/systemd/generator.late/xendomains.service
file contains:

.
.
[Service]
Type=forking
Restart=no
TimeoutSec=5min
.
.

So that's probably what is being hit, but I cannot work out how to make
the generator apply a longer timeout.

Any hints would be appreciated.

Cheers,
Andy
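
One approach that may help with the generated unit, offered as an
untested sketch rather than a confirmed fix: systemd normally merges
drop-in files from /etc/systemd/system/<unit>.d/ into a unit regardless
of where the unit file itself comes from, so a drop-in might be able to
raise the limit without touching the generator. The file name
timeout.conf and the 30min value below are assumptions chosen for
illustration; neither appears in this report.

```ini
# /etc/systemd/system/xendomains.service.d/timeout.conf
# (hypothetical file name; any *.conf file in this directory is read)
[Service]
# Raises only the start timeout; 30min is an arbitrary example value.
TimeoutStartSec=30min
```

After creating the file, "systemctl daemon-reload" is needed, and
"systemctl show -p TimeoutStartUSec xendomains.service" should report
the value systemd actually uses. For comparison, the native unit quoted
in the later replies is Type=oneshot, for which systemd disables the
start timeout by default, which would be consistent with it not hitting
this limit.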

-- System Information:
Debian Release: 8.4
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages xen-utils-common depends on:
ii  lsb-base        4.1+Debian13+nmu1
ii  python          2.7.9-1
ii  ucf             3.0030
ii  udev            215-17+deb8u4
ii  xenstore-utils  4.4.1-9+deb8u4

xen-utils-common recommends no packages.

xen-utils-common suggests no packages.

-- Configuration Files:
/etc/default/xendomains changed [not included]
/etc/xen/scripts/vif-route changed [not included]
/etc/xen/xl.conf changed [not included]

-- no debconf information