Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-13 Thread Barry Scott
On Wednesday 12 October 2011 17:54:27 Barry Scott wrote:
 On Wednesday 12 October 2011 16:42:38 Michal Schmidt wrote:
  On 10/12/2011 04:54 PM, Barry Scott wrote:
   What dependency is supposed to cause the swapoff to be after the 
   production
   processes are stopped?
  
  The units' ordering should be something like this:
  
  *.swap Before swap.target Before sysinit.target Before basic.target
  Before production.service
  
  This ordering is used for startup. When shutting down, they should get 
  stopped in the reverse order.
 
 I think I see what the issue is.
 
 My default target contains:
 
   Requires=basic.target
   After=basic.target

 But it also contains lines like:
 
   Requires=start-signage.service
   After=start-signage.service
 
 This means that start-signage can run in parallel with basic.target.
 
 What I need to do is find all the production services and tell them
 to run after basic.target. I'll work on that and see where I get.

Using systemctl dump I can see that my production services all
have the suggested properties:

...
- Unit daemon-network-manager-controller.service:
Requires: basic.target
After: basic.target
References: basic.target

- Unit daemon-layout.service:
Requires: basic.target
After: basic.target
References: basic.target
...

Does this point to a bug in systemd-26-8.fc15.i686 reboot code?

Barry
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-12 Thread Barry Scott
Thanks to Michal's observation that swapoff failed we have now found the root 
cause.

swapoff is called while all our production processes are still running.

We would have expected systemd to turn off swap after stopping most if not
all processes and thus freeing up as much memory as possible. swapoff fails
becuase there is not enough ram available to load the pages from swap into.

Is this a systemd problem or an issue with the way we wrote our services and 
targets?

Barry
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-12 Thread Michal Schmidt

On 10/12/2011 03:46 PM, Barry Scott wrote:

Thanks to Michal's observation that swapoff failed we have now found the
root cause.

swapoff is called while all our production processes are still running.

We would have expected systemd to turn off swap after stopping most if not
all processes and thus freeing up as much memory as possible. swapoff fails
becuase there is not enough ram available to load the pages from swap into.

Is this a systemd problem or an issue with the way we wrote our services
and targets?


Do you use DefaultDependencies=false in any of your service unit 
files? That could explain the missing ordering.


Michal
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-12 Thread Barry Scott
On Wednesday 12 October 2011 15:27:10 Michal Schmidt wrote:
 On 10/12/2011 03:46 PM, Barry Scott wrote:
  Thanks to Michal's observation that swapoff failed we have now found the
  root cause.
 
  swapoff is called while all our production processes are still running.
 
  We would have expected systemd to turn off swap after stopping most if not
  all processes and thus freeing up as much memory as possible. swapoff fails
  becuase there is not enough ram available to load the pages from swap into.
 
  Is this a systemd problem or an issue with the way we wrote our services
  and targets?
 
 Do you use DefaultDependencies=false in any of your service unit 
 files? That could explain the missing ordering.

No, not in any of our files.

What dependency is supposed to cause the swapoff to be after the production
processes are stopped?

Barry
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-12 Thread Michal Schmidt

On 10/12/2011 04:54 PM, Barry Scott wrote:

What dependency is supposed to cause the swapoff to be after the production
processes are stopped?


The units' ordering should be something like this:

*.swap Before swap.target Before sysinit.target Before basic.target
Before production.service

This ordering is used for startup. When shutting down, they should get 
stopped in the reverse order.


Michal
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-12 Thread Barry Scott
On Wednesday 12 October 2011 16:42:38 Michal Schmidt wrote:
 On 10/12/2011 04:54 PM, Barry Scott wrote:
  What dependency is supposed to cause the swapoff to be after the production
  processes are stopped?
 
 The units' ordering should be something like this:
 
 *.swap Before swap.target Before sysinit.target Before basic.target
 Before production.service
 
 This ordering is used for startup. When shutting down, they should get 
 stopped in the reverse order.

I think I see what the issue is.

My default target contains:

Requires=basic.target
After=basic.target

But it also contains lines like:

Requires=start-signage.service
After=start-signage.service

This means that start-signage can run in parallel with basic.target.

What I need to do is find all the production services and tell them
to run after basic.target. I'll work on that and see where I get.

Barry
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-11 Thread Michal Schmidt

On 10/11/2011 12:44 PM, Barry Scott wrote:

The full log is here:

http://onelanftp.co.uk/bscott/ntb10117.netconsole.txt

The first boot sequence from lines 1-182 works.

The second from 188-1182 has the problem.

If the kernel is at fault then I would expect to see the
About to execute: /bin/systemctl --force reboot line
and a failure to reboot.

But given that line is not output I suspect its more
likely to be an issue with systemd not getting to that
point. And question is what stops this happening?


In the second run umount.target is not reached. It is not obvious why.
The log shows only /data, /boot and /media being unmounted in both runs.
There may be a mountpoint that refuses to die. Did you try waiting a few 
minutes to see if any timeout kicks in?
It is unfortunate that we only see the messages after syslog is stopped. 
Try with systemd.log_target=console, or just stop rsyslog.service 
manually before invoking reboot. This way netconsole should show more of 
the events.


Michal
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-11 Thread Barry Scott
On Tuesday 11 October 2011 12:37:23 Michal Schmidt wrote:
 On 10/11/2011 12:44 PM, Barry Scott wrote:
  The full log is here:
 
  http://onelanftp.co.uk/bscott/ntb10117.netconsole.txt
 
  The first boot sequence from lines 1-182 works.
 
  The second from 188-1182 has the problem.
 
  If the kernel is at fault then I would expect to see the
  About to execute: /bin/systemctl --force reboot line
  and a failure to reboot.

  But given that line is not output I suspect its more
  likely to be an issue with systemd not getting to that
  point. And question is what stops this happening?
 
 In the second run umount.target is not reached. It is not obvious why.
 The log shows only /data, /boot and /media being unmounted in both runs.
 There may be a mountpoint that refuses to die. Did you try waiting a few 
 minutes to see if any timeout kicks in?

We do not see any timeout.

 It is unfortunate that we only see the messages after syslog is stopped. 
 Try with systemd.log_target=console, or just stop rsyslog.service 
 manually before invoking reboot. This way netconsole should show more of 
 the events.

I have uploaded full netconsole logs with the options you suggested:

http://onelanftp.co.uk/bscott/netconsole.noreboot.txt
http://onelanftp.co.uk/bscott/netconsole.reboot.txt

Here is the output of mount from the same system after power-fail
reboot:

# mount
/proc on /proc type proc (rw,relatime)
/sys on /sys type sysfs (rw,relatime)
udev on /dev type devtmpfs 
(rw,nosuid,relatime,size=509736k,nr_inodes=127434,mode=755)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /dev/shm type tmpfs (rw,relatime)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
/dev/mapper/NTBgroup-System4 on / type ext4 
(rw,noatime,user_xattr,barrier=1,data=ordered)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup 
(rw,nosuid,nodev,noexec,relatime,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup 
(rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup 
(rw,nosuid,nodev,noexec,relatime,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup 
(rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup 
(rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup 
(rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup 
(rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup 
(rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup 
(rw,nosuid,nodev,noexec,relatime,perf_event)
systemd-1 on /dev/mqueue type autofs 
(rw,relatime,fd=23,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
systemd-1 on /dev/hugepages type autofs 
(rw,relatime,fd=24,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs 
(rw,relatime,fd=25,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
systemd-1 on /sys/kernel/debug type autofs 
(rw,relatime,fd=26,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
systemd-1 on /sys/kernel/security type autofs 
(rw,relatime,fd=28,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
tmpfs on /media type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
/dev/mapper/NTBgroup-Data on /data type ext4 
(rw,noatime,user_xattr,barrier=1,data=ordered)
/dev/sda1 on /boot type ext3 
(rw,noatime,errors=continue,user_xattr,acl,barrier=0,data=ordered)

Barry
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-11 Thread Michal Schmidt

On 10/11/2011 05:09 PM, Barry Scott wrote:

I have uploaded full netconsole logs with the options you suggested:

http://onelanftp.co.uk/bscott/netconsole.noreboot.txt
http://onelanftp.co.uk/bscott/netconsole.reboot.txt


A suspicious moment occurring only in the failing case is:

[  751.743255] swapoff[7788]: swapoff: /dev/dm-1: swapoff failed: Cannot 
allocate memory

...
[  751.830279] systemd[1]: dev-dm\x2d1.swap changed deactivating - failed

I wonder if this could prevent the reaching of umount.target.

Michal
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-11 Thread Barry Scott
On Tuesday 11 October 2011 16:35:00 Michal Schmidt wrote:
 On 10/11/2011 05:09 PM, Barry Scott wrote:
  I have uploaded full netconsole logs with the options you suggested:
 
  http://onelanftp.co.uk/bscott/netconsole.noreboot.txt
  http://onelanftp.co.uk/bscott/netconsole.reboot.txt
 
 A suspicious moment occurring only in the failing case is:
 
 [  751.743255] swapoff[7788]: swapoff: /dev/dm-1: swapoff failed: Cannot 
 allocate memory
 ...
 [  751.830279] systemd[1]: dev-dm\x2d1.swap changed deactivating - failed
 
 I wonder if this could prevent the reaching of umount.target.

Aha. I'll work on findout out what is using up all the memory.
Maybe I have a process that has eaten up lots of ram preventing
swapoff from succeeding.

Barry
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-10 Thread Lennart Poettering
On Thu, 06.10.11 15:56, Barry Scott (barry.sc...@onelan.co.uk) wrote:

 We can reproducably get an F15 system in a state that it fails to
 complete a reboot.
 
 Becuase all the logging, network and getty's has been stopped all
 we can see is the kernel messages on the console.
 
 What we need to know is why systemd is not completing the reboot.
 
 What advice do you have on how to debug this situation please?

My usual recommendation is to boot with systemd.log_level=debug
systemd.log_target=kmsg on the kernel cmdline.

Then, place a script like the following in /lib/systemd/system-shutdown
and mark it +x:

#!/bin/bash
mount / -o rw,remount
dmesg  /dmesg.shutdown
mount / -o ro,remount
exit 0

Then, when you shutdown you'll have the full kernel log buffer in
/dmesg.shutdown, right before the machine powers off/reboots. If you get
that far however, it's usually a good indication that the kernel is
borked, rather than systemd.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-07 Thread Michal Schmidt

On 10/06/2011 07:26 PM, Barry Scott wrote:

On Thursday 06 October 2011 15:56:43 Barry Scott wrote:
  We can reproducably get an F15 system in a state that it fails to
  complete a reboot.


Then can you suggest how to reproduce it?


The obvious difference between a reboot that works and one that
fails is that I see these lines in the good case:

[ 135.391206] systemd[1]: final.target changed dead - active
[ 135.391217] systemd[1]: Job final.target/start finished, result=done
[ 135.391249] systemd[1]: About to execute: /bin/systemctl --force reboot


What is printed before these lines? Could you attach the complete log?

Michal
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Seeking Advice on debugging failure to reboot

2011-10-06 Thread Barry Scott
On Thursday 06 October 2011 15:56:43 Barry Scott wrote:
 We can reproducably get an F15 system in a state that it fails to
 complete a reboot.
 
 Becuase all the logging, network and getty's has been stopped all
 we can see is the kernel messages on the console.
 
 What we need to know is why systemd is not completing the reboot.
 
 What advice do you have on how to debug this situation please?
 
 Barry
 














We have got a bit further along using netconsole to see the kmsg
after systemd turns off syslog.







The obvious difference between a reboot that works and one that
fails is that I see these lines in the good case:







[  135.391206] systemd[1]: final.target changed dead - active
[  135.391217] systemd[1]: Job final.target/start finished, result=done
[  135.391249] systemd[1]: About to execute: /bin/systemctl --force reboot







But this never happens in the bad case.







How do I track down why final.target is not activated?







Barry






___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel