Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On Wednesday 12 October 2011 17:54:27 Barry Scott wrote: On Wednesday 12 October 2011 16:42:38 Michal Schmidt wrote: On 10/12/2011 04:54 PM, Barry Scott wrote: What dependency is supposed to cause the swapoff to be after the production processes are stopped? The units' ordering should be something like this: *.swap Before swap.target Before sysinit.target Before basic.target Before production.service This ordering is used for startup. When shutting down, they should get stopped in the reverse order. I think I see what the issue is. My default target contains: Requires=basic.target After=basic.target But it also contains lines like: Requires=start-signage.service After=start-signage.service This means that start-signage can run in parallel with basic.target. What I need to do is find all the production services and tell them to run after basic.target. I'll work on that and see where I get. Using systemctl dump I can see that my production services all have the suggested properties: ... - Unit daemon-network-manager-controller.service: Requires: basic.target After: basic.target References: basic.target - Unit daemon-layout.service: Requires: basic.target After: basic.target References: basic.target ... Does this point to a bug in systemd-26-8.fc15.i686 reboot code? Barry ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
Thanks to Michal's observation that swapoff failed we have now found the root cause. swapoff is called while all our production processes are still running. We would have expected systemd to turn off swap after stopping most if not all processes and thus freeing up as much memory as possible. swapoff fails becuase there is not enough ram available to load the pages from swap into. Is this a systemd problem or an issue with the way we wrote our services and targets? Barry ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On 10/12/2011 03:46 PM, Barry Scott wrote: Thanks to Michal's observation that swapoff failed we have now found the root cause. swapoff is called while all our production processes are still running. We would have expected systemd to turn off swap after stopping most if not all processes and thus freeing up as much memory as possible. swapoff fails becuase there is not enough ram available to load the pages from swap into. Is this a systemd problem or an issue with the way we wrote our services and targets? Do you use DefaultDependencies=false in any of your service unit files? That could explain the missing ordering. Michal ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On Wednesday 12 October 2011 15:27:10 Michal Schmidt wrote: On 10/12/2011 03:46 PM, Barry Scott wrote: Thanks to Michal's observation that swapoff failed we have now found the root cause. swapoff is called while all our production processes are still running. We would have expected systemd to turn off swap after stopping most if not all processes and thus freeing up as much memory as possible. swapoff fails becuase there is not enough ram available to load the pages from swap into. Is this a systemd problem or an issue with the way we wrote our services and targets? Do you use DefaultDependencies=false in any of your service unit files? That could explain the missing ordering. No, not in any of our files. What dependency is supposed to cause the swapoff to be after the production processes are stopped? Barry ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On 10/12/2011 04:54 PM, Barry Scott wrote: What dependency is supposed to cause the swapoff to be after the production processes are stopped? The units' ordering should be something like this: *.swap Before swap.target Before sysinit.target Before basic.target Before production.service This ordering is used for startup. When shutting down, they should get stopped in the reverse order. Michal ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On Wednesday 12 October 2011 16:42:38 Michal Schmidt wrote: On 10/12/2011 04:54 PM, Barry Scott wrote: What dependency is supposed to cause the swapoff to be after the production processes are stopped? The units' ordering should be something like this: *.swap Before swap.target Before sysinit.target Before basic.target Before production.service This ordering is used for startup. When shutting down, they should get stopped in the reverse order. I think I see what the issue is. My default target contains: Requires=basic.target After=basic.target But it also contains lines like: Requires=start-signage.service After=start-signage.service This means that start-signage can run in parallel with basic.target. What I need to do is find all the production services and tell them to run after basic.target. I'll work on that and see where I get. Barry ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On 10/11/2011 12:44 PM, Barry Scott wrote: The full log is here: http://onelanftp.co.uk/bscott/ntb10117.netconsole.txt The first boot sequence from lines 1-182 works. The second from 188-1182 has the problem. If the kernel is at fault then I would expect to see the About to execute: /bin/systemctl --force reboot line and a failure to reboot. But given that line is not output I suspect its more likely to be an issue with systemd not getting to that point. And question is what stops this happening? In the second run umount.target is not reached. It is not obvious why. The log shows only /data, /boot and /media being unmounted in both runs. There may be a mountpoint that refuses to die. Did you try waiting a few minutes to see if any timeout kicks in? It is unfortunate that we only see the messages after syslog is stopped. Try with systemd.log_target=console, or just stop rsyslog.service manually before invoking reboot. This way netconsole should show more of the events. Michal ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On Tuesday 11 October 2011 12:37:23 Michal Schmidt wrote: On 10/11/2011 12:44 PM, Barry Scott wrote: The full log is here: http://onelanftp.co.uk/bscott/ntb10117.netconsole.txt The first boot sequence from lines 1-182 works. The second from 188-1182 has the problem. If the kernel is at fault then I would expect to see the About to execute: /bin/systemctl --force reboot line and a failure to reboot. But given that line is not output I suspect its more likely to be an issue with systemd not getting to that point. And question is what stops this happening? In the second run umount.target is not reached. It is not obvious why. The log shows only /data, /boot and /media being unmounted in both runs. There may be a mountpoint that refuses to die. Did you try waiting a few minutes to see if any timeout kicks in? We do not see any timeout. It is unfortunate that we only see the messages after syslog is stopped. Try with systemd.log_target=console, or just stop rsyslog.service manually before invoking reboot. This way netconsole should show more of the events. I have uploaded full netconsole logs with the options you suggested: http://onelanftp.co.uk/bscott/netconsole.noreboot.txt http://onelanftp.co.uk/bscott/netconsole.reboot.txt Here is the output of mount from the same system after power-fail reboot: # mount /proc on /proc type proc (rw,relatime) /sys on /sys type sysfs (rw,relatime) udev on /dev type devtmpfs (rw,nosuid,relatime,size=509736k,nr_inodes=127434,mode=755) devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=000) tmpfs on /dev/shm type tmpfs (rw,relatime) tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755) /dev/mapper/NTBgroup-System4 on / type ext4 (rw,noatime,user_xattr,barrier=1,data=ordered) tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755) cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd) cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset) cgroup on /sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu) cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct) cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory) cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices) cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer) cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls) cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio) cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event) systemd-1 on /dev/mqueue type autofs (rw,relatime,fd=23,pgrp=1,timeout=300,minproto=5,maxproto=5,direct) systemd-1 on /dev/hugepages type autofs (rw,relatime,fd=24,pgrp=1,timeout=300,minproto=5,maxproto=5,direct) systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=25,pgrp=1,timeout=300,minproto=5,maxproto=5,direct) systemd-1 on /sys/kernel/debug type autofs (rw,relatime,fd=26,pgrp=1,timeout=300,minproto=5,maxproto=5,direct) systemd-1 on /sys/kernel/security type autofs (rw,relatime,fd=28,pgrp=1,timeout=300,minproto=5,maxproto=5,direct) tmpfs on /media type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755) /dev/mapper/NTBgroup-Data on /data type ext4 (rw,noatime,user_xattr,barrier=1,data=ordered) /dev/sda1 on /boot type ext3 (rw,noatime,errors=continue,user_xattr,acl,barrier=0,data=ordered) Barry ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On 10/11/2011 05:09 PM, Barry Scott wrote: I have uploaded full netconsole logs with the options you suggested: http://onelanftp.co.uk/bscott/netconsole.noreboot.txt http://onelanftp.co.uk/bscott/netconsole.reboot.txt A suspicious moment occurring only in the failing case is: [ 751.743255] swapoff[7788]: swapoff: /dev/dm-1: swapoff failed: Cannot allocate memory ... [ 751.830279] systemd[1]: dev-dm\x2d1.swap changed deactivating - failed I wonder if this could prevent the reaching of umount.target. Michal ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On Tuesday 11 October 2011 16:35:00 Michal Schmidt wrote: On 10/11/2011 05:09 PM, Barry Scott wrote: I have uploaded full netconsole logs with the options you suggested: http://onelanftp.co.uk/bscott/netconsole.noreboot.txt http://onelanftp.co.uk/bscott/netconsole.reboot.txt A suspicious moment occurring only in the failing case is: [ 751.743255] swapoff[7788]: swapoff: /dev/dm-1: swapoff failed: Cannot allocate memory ... [ 751.830279] systemd[1]: dev-dm\x2d1.swap changed deactivating - failed I wonder if this could prevent the reaching of umount.target. Aha. I'll work on findout out what is using up all the memory. Maybe I have a process that has eaten up lots of ram preventing swapoff from succeeding. Barry ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On Thu, 06.10.11 15:56, Barry Scott (barry.sc...@onelan.co.uk) wrote: We can reproducably get an F15 system in a state that it fails to complete a reboot. Becuase all the logging, network and getty's has been stopped all we can see is the kernel messages on the console. What we need to know is why systemd is not completing the reboot. What advice do you have on how to debug this situation please? My usual recommendation is to boot with systemd.log_level=debug systemd.log_target=kmsg on the kernel cmdline. Then, place a script like the following in /lib/systemd/system-shutdown and mark it +x: #!/bin/bash mount / -o rw,remount dmesg /dmesg.shutdown mount / -o ro,remount exit 0 Then, when you shutdown you'll have the full kernel log buffer in /dmesg.shutdown, right before the machine powers off/reboots. If you get that far however, it's usually a good indication that the kernel is borked, rather than systemd. Lennart -- Lennart Poettering - Red Hat, Inc. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On 10/06/2011 07:26 PM, Barry Scott wrote: On Thursday 06 October 2011 15:56:43 Barry Scott wrote: We can reproducably get an F15 system in a state that it fails to complete a reboot. Then can you suggest how to reproduce it? The obvious difference between a reboot that works and one that fails is that I see these lines in the good case: [ 135.391206] systemd[1]: final.target changed dead - active [ 135.391217] systemd[1]: Job final.target/start finished, result=done [ 135.391249] systemd[1]: About to execute: /bin/systemctl --force reboot What is printed before these lines? Could you attach the complete log? Michal ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Seeking Advice on debugging failure to reboot
On Thursday 06 October 2011 15:56:43 Barry Scott wrote: We can reproducably get an F15 system in a state that it fails to complete a reboot. Becuase all the logging, network and getty's has been stopped all we can see is the kernel messages on the console. What we need to know is why systemd is not completing the reboot. What advice do you have on how to debug this situation please? Barry We have got a bit further along using netconsole to see the kmsg after systemd turns off syslog. The obvious difference between a reboot that works and one that fails is that I see these lines in the good case: [ 135.391206] systemd[1]: final.target changed dead - active [ 135.391217] systemd[1]: Job final.target/start finished, result=done [ 135.391249] systemd[1]: About to execute: /bin/systemctl --force reboot But this never happens in the bad case. How do I track down why final.target is not activated? Barry ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel