Re: [systemd-devel] Erroneous detection of degraded array
On Mon, Jan 30, 2017 at 9:36 AM, NeilBrown wrote:
> ...
>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>> systemd[1]: Starting Activate md array even though degraded...
>> systemd[1]: Stopped target Local File Systems.
>> systemd[1]: Stopping Local File Systems.
>> systemd[1]: Unmounting /share...
>> systemd[1]: Stopped (with error) /dev/md0.
> ...
>
> The race is, I think, the one I mentioned. If the md device is started
> before udev tells systemd to start the timer, the Conflicts dependencies
> go the "wrong" way and stop the wrong thing.

From the logs provided it is unclear whether it is the *timer* or the *service*. If it is the timer, I do not understand why it starts exactly 30 seconds after the device apparently appears - that would match the service starting. Yet another case where system logging is hopelessly unfriendly for troubleshooting :(

> It would be nice to be able to reliably stop the timer when the device
> starts, without risking having the device get stopped when the timer
> starts, but I don't think we can reliably do that.

Well, let's wait until we can get some more information about what happens.

> Changing the
>   Conflicts=sys-devices-virtual-block-%i.device
> lines to
>   ConditionPathExists=/sys/devices/virtual/block/%i
> might make the problem go away, without any negative consequences.

Ugly, but yes, maybe this is the only way with current systemd.

> The primary purpose of having the 'Conflicts' directives was so that
> systemd wouldn't log
>   Starting Activate md array even though degraded
> after the array was successfully started.

This looks like a cosmetic problem. What will happen if the last-resort service is started when the array is fully assembled? Will it do any harm?

> Hopefully it won't do that when the Condition fails.
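For context, the units under discussion, as shipped with mdadm around this time, look roughly like this (exact contents vary between mdadm versions, and the mdadm path shown is an assumption):

  # /usr/lib/systemd/system/mdadm-last-resort@.timer (approximate)
  [Unit]
  Description=Timer to wait for more drives before activating degraded array.
  DefaultDependencies=no
  Conflicts=sys-devices-virtual-block-%i.device

  [Timer]
  OnActiveSec=30

  # /usr/lib/systemd/system/mdadm-last-resort@.service (approximate)
  [Unit]
  Description=Activate md array even though degraded
  DefaultDependencies=no
  Conflicts=sys-devices-virtual-block-%i.device

  [Service]
  Type=oneshot
  ExecStart=/usr/sbin/mdadm --run /dev/%i

As for harm: mdadm --run against an array that is already active should simply fail without side effects, so a spurious last-resort run is likely only cosmetic - it is the Conflicts= stop-job propagation that does the damage seen in the logs above.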
Re: [systemd-devel] Erroneous detection of degraded array
On 30.01.2017 07:36, NeilBrown wrote:
>> By virtue of the "Following" attribute. dev-md0.device is Following
>> sys-devices-virtual-block-md0.device, so stopping the latter will also
>> stop the former.
>
> Ahh.. I see why I never saw this now. Two reasons.
>
> 1/ My /etc/fstab has UUID=d1711227-c9fa-4883-a904-7cd7a3eb865c rather than
>    /dev/md0. systemd doesn't manage to intuit a 'Following' dependency
>    between the UUID and the mount point.
>
> 2/ I use partitions of md arrays: that UUID is actually /dev/md0p3.
>    systemd doesn't intuit that md0p3.device is Following md0.device.
>
> So you only hit a problem if you have "/dev/md0" or similar in /etc/fstab.

The fstab in the virtual machine which had that issue around 3 times in the last year is using UUID - so it's not hit often at all, but I wouldn't have expected it to hit a VM in particular (the reason for that RAID is just testing RAID10 with the write-mostly option):

UUID=f0b27a5c-7e3d-45ad-8b7f-617820860379 /mnt/raid10 ext4 defaults,commit=30,inode_readahead_blks=16,relatime,lazytime,barrier=0 0 1
Re: [systemd-devel] Erroneous detection of degraded array
On Mon, Jan 30 2017, Andrei Borzenkov wrote:

> On 30.01.2017 04:53, NeilBrown wrote:
>> On Fri, Jan 27 2017, Andrei Borzenkov wrote:
>>
>>> On 26.01.2017 21:02, Luke Pyzowski wrote:
>>>> Hello,
>>>> I have a large RAID6 device with 24 local drives on CentOS7.3. Randomly
>>>> (around 50% of the time) systemd will unmount my RAID device, thinking it
>>>> is degraded, after the mdadm-last-resort@.timer expires; however the
>>>> device is working normally by all accounts, and I can immediately mount
>>>> it manually upon boot completion. In the logs below /share is the RAID
>>>> device. I can increase the timer in
>>>> /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60 seconds,
>>>> but this problem can randomly still occur.
>>>>
>>>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>>>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>>>> systemd[1]: Starting Activate md array even though degraded...
>>>> systemd[1]: Stopped target Local File Systems.
>>>> systemd[1]: Stopping Local File Systems.
>>>> systemd[1]: Unmounting /share...
>>>> systemd[1]: Stopped (with error) /dev/md0.
>>
>> This line perplexes me.
>>
>> The last-resort.service (and .timer) files have a Conflicts= directive
>> against sys-devices-virtual-block-md$DEV.device.
>> Normally a Conflicts= directive means that if this service starts, that
>> one is stopped, and if that one starts, this is stopped.
>> However .device units cannot be stopped:
>>
>> $ systemctl show sys-devices-virtual-block-md0.device | grep Can
>> CanStart=no
>> CanStop=no
>> CanReload=no
>> CanIsolate=no
>>
>> so presumably the attempt to stop the device fails, so the Conflicts=
>> dependency cannot be met, so the last-resort service (or timer) doesn't
>> get started.
>
> As I explained in the other mail, to me it looks like the last-resort timer
> does get started, and then the last-resort service is started, which
> attempts to stop the device - and because the mount point depends on the
> device, it also stops the mount point. So somehow we have bad timing in
> which both device and timer start without canceling each other.
>
> The fact that stopping the device itself fails is irrelevant here -
> dependencies are evaluated at the time the job is submitted, so if
> share.mount Requires dev-md0.device and you attempt to stop
> dev-md0.device, systemd still queues a job to stop share.mount.
>
>> At least, that is what I see happening in my tests.
>
> Yes, we have a race condition here; I cannot reproduce it either. That does
> not mean it does not exist :) Let's hope debug logging will show something
> more useful (it is entirely possible that with debug logging turned on this
> race does not happen).
>
>> But your log doesn't mention sys-devices-virtual-block-md0, it
>> mentions /dev/md0.
>> How does systemd know about /dev/md0, or the connection it has with
>> sys-devices-virtual-block-md0?
>
> By virtue of the "Following" attribute. dev-md0.device is Following
> sys-devices-virtual-block-md0.device, so stopping the latter will also
> stop the former.

Ahh.. I see why I never saw this now. Two reasons.

1/ My /etc/fstab has UUID=d1711227-c9fa-4883-a904-7cd7a3eb865c rather than
   /dev/md0. systemd doesn't manage to intuit a 'Following' dependency
   between the UUID and the mount point.

2/ I use partitions of md arrays: that UUID is actually /dev/md0p3. systemd
   doesn't intuit that md0p3.device is Following md0.device.

So you only hit a problem if you have "/dev/md0" or similar in /etc/fstab.

The race is, I think, the one I mentioned. If the md device is started before udev tells systemd to start the timer, the Conflicts dependencies go the "wrong" way and stop the wrong thing.

It would be nice to be able to reliably stop the timer when the device starts, without risking having the device get stopped when the timer starts, but I don't think we can reliably do that.

Changing the
  Conflicts=sys-devices-virtual-block-%i.device
lines to
  ConditionPathExists=/sys/devices/virtual/block/%i
might make the problem go away, without any negative consequences.

The primary purpose of having the 'Conflicts' directives was so that systemd wouldn't log
  Starting Activate md array even though degraded
after the array was successfully started. Hopefully it won't do that when the Condition fails.

Thanks,
NeilBrown

>> Does
>>   systemctl list-dependencies sys-devices-virtual-block-md0.device
>> report anything interesting? I get
>>
>> sys-devices-virtual-block-md0.device
>> ● └─mdmonitor.service
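For anyone wanting to try this before it lands in a release: Conflicts= dependencies cannot be removed by a drop-in, so the stock unit would have to be copied to /etc/systemd/system (which takes precedence over /usr/lib/systemd/system) and edited there. A sketch of the edited service, assuming the stock unit shown earlier - note the added '!', which is my reading of the intent, since a non-negated ConditionPathExists would pass exactly when the array device is present:

  # /etc/systemd/system/mdadm-last-resort@.service -- full local copy, edited
  [Unit]
  Description=Activate md array even though degraded
  DefaultDependencies=no
  # Was: Conflicts=sys-devices-virtual-block-%i.device
  ConditionPathExists=!/sys/devices/virtual/block/%i

  [Service]
  Type=oneshot
  ExecStart=/usr/sbin/mdadm --run /dev/%i

With the condition on the service, the check happens only when the timer fires, so an array that has fully assembled in the meantime produces a quiet (non-error) condition failure, and no stop jobs are ever queued against the device or the mount point. For what it's worth, the fix that eventually landed in mdadm took this shape, with the condition testing %i/md/sync_action.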
Re: [systemd-devel] Erroneous detection of degraded array
On 30.01.2017 04:53, NeilBrown wrote:
> On Fri, Jan 27 2017, Andrei Borzenkov wrote:
>
>> On 26.01.2017 21:02, Luke Pyzowski wrote:
>>> Hello,
>>> I have a large RAID6 device with 24 local drives on CentOS7.3. Randomly
>>> (around 50% of the time) systemd will unmount my RAID device, thinking it
>>> is degraded, after the mdadm-last-resort@.timer expires; however the
>>> device is working normally by all accounts, and I can immediately mount
>>> it manually upon boot completion. In the logs below /share is the RAID
>>> device. I can increase the timer in
>>> /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60 seconds,
>>> but this problem can randomly still occur.
>>>
>>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>>> systemd[1]: Starting Activate md array even though degraded...
>>> systemd[1]: Stopped target Local File Systems.
>>> systemd[1]: Stopping Local File Systems.
>>> systemd[1]: Unmounting /share...
>>> systemd[1]: Stopped (with error) /dev/md0.
>
> This line perplexes me.
>
> The last-resort.service (and .timer) files have a Conflicts= directive
> against sys-devices-virtual-block-md$DEV.device.
> Normally a Conflicts= directive means that if this service starts, that
> one is stopped, and if that one starts, this is stopped.
> However .device units cannot be stopped:
>
> $ systemctl show sys-devices-virtual-block-md0.device | grep Can
> CanStart=no
> CanStop=no
> CanReload=no
> CanIsolate=no
>
> so presumably the attempt to stop the device fails, so the Conflicts=
> dependency cannot be met, so the last-resort service (or timer) doesn't
> get started.

As I explained in the other mail, to me it looks like the last-resort timer does get started, and then the last-resort service is started, which attempts to stop the device - and because the mount point depends on the device, it also stops the mount point. So somehow we have bad timing in which both device and timer start without canceling each other.

The fact that stopping the device itself fails is irrelevant here - dependencies are evaluated at the time the job is submitted, so if share.mount Requires dev-md0.device and you attempt to stop dev-md0.device, systemd still queues a job to stop share.mount.

> At least, that is what I see happening in my tests.

Yes, we have a race condition here; I cannot reproduce it either. That does not mean it does not exist :) Let's hope debug logging will show something more useful (it is entirely possible that with debug logging turned on this race does not happen).

> But your log doesn't mention sys-devices-virtual-block-md0, it
> mentions /dev/md0.
> How does systemd know about /dev/md0, or the connection it has with
> sys-devices-virtual-block-md0?

By virtue of the "Following" attribute. dev-md0.device is Following sys-devices-virtual-block-md0.device, so stopping the latter will also stop the former.

> Does
>   systemctl list-dependencies sys-devices-virtual-block-md0.device
> report anything interesting? I get
>
> sys-devices-virtual-block-md0.device
> ● └─mdmonitor.service
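To make that propagation concrete, here is a hypothetical sketch of share.mount for a "/dev/md0 /share" fstab entry, with the device dependencies - which systemd normally derives implicitly rather than writing out - spelled out explicitly (the filesystem type is assumed):

  # share.mount -- hypothetical sketch; the real unit is generated from
  # /etc/fstab and carries its device dependencies implicitly
  [Unit]
  Requires=dev-md0.device
  After=dev-md0.device

  [Mount]
  What=/dev/md0
  Where=/share
  Type=xfs

When a Stop job for dev-md0.device is submitted, the Requires= link pulls a Stop job for share.mount into the same transaction - and the unmount goes ahead even though stopping the device itself later turns out to be impossible.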
Re: [systemd-devel] Erroneous detection of degraded array
On Fri, Jan 27 2017, Andrei Borzenkov wrote:

> On 26.01.2017 21:02, Luke Pyzowski wrote:
>> Hello,
>> I have a large RAID6 device with 24 local drives on CentOS7.3. Randomly
>> (around 50% of the time) systemd will unmount my RAID device, thinking it
>> is degraded, after the mdadm-last-resort@.timer expires; however the
>> device is working normally by all accounts, and I can immediately mount
>> it manually upon boot completion. In the logs below /share is the RAID
>> device. I can increase the timer in
>> /usr/lib/systemd/system/mdadm-last-resort@.timer from 30 to 60 seconds,
>> but this problem can randomly still occur.
>>
>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>> systemd[1]: Starting Activate md array even though degraded...
>> systemd[1]: Stopped target Local File Systems.
>> systemd[1]: Stopping Local File Systems.
>> systemd[1]: Unmounting /share...
>> systemd[1]: Stopped (with error) /dev/md0.

This line perplexes me.

The last-resort.service (and .timer) files have a Conflicts= directive against sys-devices-virtual-block-md$DEV.device. Normally a Conflicts= directive means that if this service starts, that one is stopped, and if that one starts, this is stopped. However .device units cannot be stopped:

$ systemctl show sys-devices-virtual-block-md0.device | grep Can
CanStart=no
CanStop=no
CanReload=no
CanIsolate=no

so presumably the attempt to stop the device fails, so the Conflicts= dependency cannot be met, so the last-resort service (or timer) doesn't get started. At least, that is what I see happening in my tests.

But your log doesn't mention sys-devices-virtual-block-md0, it mentions /dev/md0. How does systemd know about /dev/md0, or the connection it has with sys-devices-virtual-block-md0?

Does
  systemctl list-dependencies sys-devices-virtual-block-md0.device
report anything interesting? I get

sys-devices-virtual-block-md0.device
● └─mdmonitor.service

NeilBrown
Re: [systemd-devel] New systemd-ui
On Fri, Jan 27, 2017 at 06:34:29PM +0100, Sławomir Lach wrote:
> Hi.
>
> I discovered there have been no recent changes in systemadm/systemd-ui, so
> I prepared a new UI based on my application-logic library, called
> libgreattao. Libgreattao generates the UI on the fly, from data sent by
> the application, such as window class, allowed actions, etc. Libgreattao
> can additionally work in network or shell mode. It supports a GTK+ backend
> and still-experimental Qt5 and CUI backends.

Cool. I think there's space for another graphical manager, so it's nice to see experimentation in this area. There are some related projects: cockpit, systemd-kcm, and its offspring systemdgenie.

> The two reasons not to move to libgreattao yet are that my version of
> systemd-ui is slow, eats a lot of memory, and doesn't free memory in many
> cases, but I will work on this.

It'd be nice if you could put up the git repo somewhere.

Zbyszek
Re: [systemd-devel] "libsystemdexec/systemd-run --exec-from-unit"
On 29.01.2017 07:56, Colin Walters wrote:
> Hey, so this is a half-baked thought, but here goes: One problem I've hit
> when trying to use systemd unit file features is that they only work when
> executed by systemd. Let's take the example of User=, but it applies to
> tons of other ones too. Wait, you ask - your service runs under systemd,
> why do you care about running it not under systemd? The main case I hit is
> I often want to run my service under a debugger. If I just do
>   gdb /usr/lib/myservice
> ... it will run as root. Now of course, I could
>   runuser myservice gdb --args /usr/lib/myservice
> But as a unit file gains more features, from WorkingDirectory= to
> ProtectSystem=, I find myself wanting something like:
>   systemd-run --exec-from-unit myservice.service /path/to/myservice/src/myservice

What you really want is something like this (that's what drop-ins below /etc/systemd/ are for - so you don't have to touch the unit itself), supported by gdb; running valgrind within a systemd unit has worked for many years, and the same goes for strace with a logfile output. I never understood why gdb always ends up in interactive mode instead of "if there is some crash, write backtraces to file xyz and shut up":

ExecStart=/usr/bin/valgrind --tool=memcheck --leak-check=yes --log-file=/var/log/valgrind/imapd.log /usr/sbin/dbmail-imapd -D
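As a concrete sketch of that drop-in approach (the unit name dbmail-imapd.service is inferred from the binary in the example above; adjust to your service): the empty ExecStart= line clears the packaged command before replacing it, which systemd requires for non-oneshot services.

  # /etc/systemd/system/dbmail-imapd.service.d/valgrind.conf
  # Hypothetical drop-in: run the daemon under valgrind without touching
  # the packaged unit file. The empty ExecStart= resets the command list.
  [Service]
  ExecStart=
  ExecStart=/usr/bin/valgrind --tool=memcheck --leak-check=yes --log-file=/var/log/valgrind/imapd.log /usr/sbin/dbmail-imapd -D

After adding the file, run systemctl daemon-reload and restart the service; deleting the drop-in and reloading restores the normal command.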