Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
No problem. I think it's very hard to find the cause when you don't have direct access to the affected machines.

Both machines were affected. I have since reinstalled one of them, so only one is still running Debian Jessie, and it still shows the bug. Nobody uses that machine, so I won't reinstall it; it is an ideal test subject :)

2014-11-20 4:09 GMT+01:00 Benjamin Kaduk ka...@mit.edu:
> On Sun, 16 Nov 2014, István Kuklin wrote:
>> I reinstalled my machine and switched from Debian to Ubuntu. It seems that it isn't affected by this bug; shutdown is quick now.
>
> This is unsurprising, but good to hear confirmed. I'm sorry that you're having such troubles with Debian and I'm failing to debug them effectively.
>
>> I still have two machines running Debian testing, so I am still able to collect logs if you wish.
>
> Just to clarify: these two machines also suffer from the slow shutdown/reboot?
>
> I will try to convince some systemd experts to help out on this bug, as I seem to be at the limits of my understanding.
>
> -Ben
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
Asking around on IRC, I was linked to http://freedesktop.org/wiki/Software/systemd/Debugging/#shutdowncompleteseventually which has a little bit of advice for debugging issues such as this. -Ben -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
On Sun, 16 Nov 2014, István Kuklin wrote:
> I reinstalled my machine and switched from Debian to Ubuntu. It seems that it isn't affected by this bug; shutdown is quick now.

This is unsurprising, but good to hear confirmed. I'm sorry that you're having such troubles with Debian and I'm failing to debug them effectively.

> I still have two machines running Debian testing, so I am still able to collect logs if you wish.

Just to clarify: these two machines also suffer from the slow shutdown/reboot?

I will try to convince some systemd experts to help out on this bug, as I seem to be at the limits of my understanding.

-Ben
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
I reinstalled my machine and switched from Debian to Ubuntu. It doesn't seem to be affected by this bug; shutdown is quick now.

I still have two machines running Debian testing, so I am still able to collect logs if you wish.
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
I have an rc.local file, but it contains only an exit 0 line and some comments before it.

I tried specifying Before=umount.target, but apparently nothing has changed: http://pastebin.com/XckqcvR2

User number 5000 is a user with an AFS home directory.

I'm now considering switching back to Ubuntu; maybe it doesn't have the bug, since I had no problem with Wheezy either.
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
Thanks for that. It looks like the builtin systemd umount.target is trying to unmount /afs before or in parallel with the openafs-client.service commands to unmount /afs, which is not the desired ordering. We should be able to specify Before=umount.target in openafs-client.service to get a different behavior. (I'm not confident enough in my understanding of systemd yet to claim that this should fix the issue.)

If you want, you can copy /lib/systemd/system/openafs-client.service to /etc/systemd/system and make that change locally (the /etc version overrides the /lib version), but I will try to get this in a new upload as well.

I'm still confused by the lines:

    nov 04 18:15:36 kingdom-play systemd[1]: user@5000.service stop-sigterm timed out. Killing.
    nov 04 18:15:36 kingdom-play systemd[1]: Stopped User Manager for UID 5000.
    nov 04 18:15:36 kingdom-play systemd[1]: Unit user@5000.service entered failed state.

which account for more than a minute of the delay, but the larger portion of the delay seems attributable to the bits which are obviously AFS-related.

I did find http://forums.fedoraforum.org/archive/index.php/t-298680.html, which seems to implicate an rc.local file. Do you have one in place?

-Ben
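The local override described in this message could be scripted roughly as follows. This is a minimal sketch, not the packaged fix: the helper name add_before_umount is hypothetical, and it assumes the unit file has a [Unit] header on a line of its own. Note that, as reported elsewhere in this thread, making this change did not visibly help on the affected machine.

```shell
#!/bin/sh
# Copy a unit file and add "Before=umount.target" to its [Unit] section.
# add_before_umount is a hypothetical helper name, not part of any package.
add_before_umount() {
    src=$1
    dst=$2
    awk '
        { print }
        # Insert the ordering directive right after the [Unit] header.
        # Repeated Before= lines are merged by systemd, so an existing
        # Before= setting in the unit is left untouched.
        /^\[Unit\][ \t]*$/ { print "Before=umount.target" }
    ' "$src" > "$dst"
}

# Intended use (as root), with the paths named in the message:
#   add_before_umount /lib/systemd/system/openafs-client.service \
#                     /etc/systemd/system/openafs-client.service
#   systemctl daemon-reload
```

Writing to a separate destination keeps the packaged file under /lib unmodified, so removing the copy under /etc restores the original behavior.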
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
Here we go: http://pastebin.com/aCREDCR8
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
On Fri, 31 Oct 2014, Kuklin István wrote:
> I've upgraded the openafs-client package to unstable, but it hasn't solved the problem. I tested the OpenVPN problem by stopping the openvpn service before logging in but the machine hung on shutdown anyway, so it looks like the problem isn't with the VPN.
>
> Maybe just I misconfigured something, because it seems like my case is quite special :)

Hard to say. I should probably mark the bug as re-opened, if you're still seeing it with the version from unstable.

Can you grab the journalctl log of a hung shutdown with the new package, please?

Thanks,

Ben
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
I've upgraded the openafs-client package to the version in unstable, but that hasn't solved the problem.

I tested the OpenVPN theory by stopping the openvpn service before logging in, but the machine hung on shutdown anyway, so it looks like the problem isn't with the VPN.

Maybe I just misconfigured something, because it seems like my case is quite special :)

On Friday, 31 Oct 2014, at 15:17, Benjamin Kaduk wrote:
> On Fri, 31 Oct 2014, Kuklin István wrote:
>> Hello there,
>>
>> Here we go: http://pastebin.com/uQ3n21CY
>
> Thanks. Interestingly, this trace seems to show that the openafs-client shut down successfully, as those are the normal shutdown messages and systemd seems to think the client shut down successfully.
>
> There are two delays,
>
>     okt 31 16:40:21
>     okt 31 16:41:16
>     okt 31 16:41:47 kingdom-play systemd[1]: user@5000.service stop-sigterm timed out. Killing.
>
> I don't see an obvious cause for the first gap, but the second one is clearly a systemd process that is hanging and has to wait for a timeout. I gather this user@UID.service job relates to any running systemd --user invocation, with configuration in ~/.systemd/. It's unclear whether the user session would be trying to write to ~/.systemd/ at that point, though.
>
>> I think I've noticed something: all the machines run OpenVPN, and the AFS server machine's LAN IP address is 192.168.0.2, but it has another IP address on the VPN: 192.168.99.1. I've defined the server's IP address in /etc/openafs/CellServDB, but as far as I remember I saw AFS looking for the OpenVPN address somewhere in the logs... So I think OpenVPN terminates before AFS, AFS then cannot find the server on 192.168.99.1, and this causes the system to hang. What can I do, if that's the problem? Is that possible, even with the correct CellServDB file?
>
> It's not clear that that's the problem, but the only thing that comes to mind would be to put a NetInfo file on the server so that the wrong address isn't registered in the vldb. (http://docs.openafs.org/Reference/5/NetInfo.html)
>
> I believe that the systemd unit file for openafs-client in sid (I uploaded a new version yesterday that fixes the bug I mentioned) should have the ordering directives needed to ensure that the client is shut down after user sessions (what I mentioned above) and before the network is shut down (which Andrew mentioned previously on the ticket).
>
> -Ben
>
> P.S. I see that we dropped the bug address from the cc list, which is reasonable given the pastebin that was linked here. It's probably best to forward at least the rest of these messages onto the bug so the history is recorded, though.
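The delays discussed in this message were found by reading timestamps by eye; a small filter can do the same on a longer log. This is a sketch under stated assumptions: the helper name find_gaps is hypothetical, it assumes the short journal format in which the third whitespace-separated field is HH:MM:SS, and it ignores date changes, so a log that crosses midnight would need more care.

```shell
#!/bin/sh
# Print every log line that follows a pause longer than the given
# number of seconds. find_gaps is a hypothetical helper name.
# Assumes field 3 is HH:MM:SS, as in "okt 31 16:41:47 host ...".
find_gaps() {
    awk -v limit="$1" '
        # Skip lines without a timestamp in field 3 (for example the
        # "-- Logs begin at ..." banner printed by journalctl).
        $3 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { next }
        {
            split($3, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev > limit)
                printf "%ds gap before: %s\n", now - prev, $0
            prev = now
            seen = 1
        }
    '
}

# Intended use, on a saved copy of the journal output:
#   find_gaps 30 < shutdown.log
```

Each reported line is the first message logged after the pause, which is usually the timeout that ended the wait rather than the unit that caused it.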
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
On Sat, 6 Sep 2014, Kuklin István wrote:
> Okay, if the bug has gone, I'll report it.

The 1.6.10-1 in unstable has a unit file for the client. (It also has an RC bug against it; stopping the client before taking the upgrade should be a valid workaround.)

However, before you take the upgrade, I figured out (while reading up on systemd) how to collect the shutdown messages I was interested in. With systemd, the shutdown messages are logged in the journal, which can be stored across boots, depending on the configuration of journald. In the default configuration, you need to create the directory /var/log/journal for the logs to be stored. Starting from the next boot after that, the logs should be kept, so after the following reboot (and hang), you can then use 'journalctl --full --system -b -1' to get logs from the previous boot and shutdown.

I would be interested in seeing those, in the vicinity of the openafs-client shutdown messages, if you can.

Thanks,

Ben
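The steps described in this message amount to the following. A minimal sketch, assuming the default journald configuration mentioned above; the helper name enable_persistent_journal and its optional directory argument are hypothetical and exist only so the step can be tried against a scratch directory first.

```shell
#!/bin/sh
# Create the directory journald needs before it keeps logs across boots.
# enable_persistent_journal is a hypothetical helper name; the optional
# argument is only for trying it out on a scratch path.
enable_persistent_journal() {
    dir=${1:-/var/log/journal}
    mkdir -p "$dir"
}

# Intended use (as root):
#   enable_persistent_journal
#   reboot                             # logs are kept starting with this boot
#   ... reproduce the slow shutdown, then boot again ...
#   journalctl --full --system -b -1   # previous boot, including its shutdown
```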
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
Okay, if the bug has gone, I'll report it.

Thank you for your help!

István
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
I have already removed the quiet parameter from /etc/default/grub, and setting LogLevel=debug hasn't changed anything; it doesn't give more information.

Last time the console showed:

    [ *** ] A stop job is running for User Manager for 5000

Does that mean something?

On Thursday, 4 Sep 2014, at 22:07, Benjamin Kaduk wrote:
> On Thu, 4 Sep 2014, Kuklin István wrote:
>> Please try to reproduce the problem by cd-ing to the afs share.
>
> I had cd-ed into /afs in my previous attempts, though maybe I did not have an active shell still there during the reboot attempts.

Note that I don't have to stay in the directory; it is enough to cd into it once, so it's okay if you don't have an active shell in that directory.

> At this point, I feel like the best step forward is going to be to use a proper systemd unit file for the client, instead of relying on the compatibility shims for sysvinit scripts, since there doesn't seem to be an obvious way to further debug exactly what's happening at the moment. I don't think I have an ETA for when that might happen, though.

How should I do that exactly? :) I'm sorry, I always find new things in Linux.
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
On Fri, 5 Sep 2014, Kuklin István wrote:
> Last time the console wrote:
>
>     [ *** ] A stop job is running for User Manager for 5000
>
> Does that mean something?

I don't know what it means, offhand.

>> At this point, I feel like the best step forward is going to be to use a proper systemd unit file for the client, instead of relying on the compatibility shims for sysvinit scripts, since there doesn't seem to be an obvious way to further debug exactly what's happening at the moment. I don't think I have an ETA for when that might happen, though.
>
> How should I do that exactly? :)

I think it's something that the package maintainers need to do, not something that you need to do. Some future version of the package will include a systemd unit file for the openafs-client, and we will want to come back to this ticket and re-test the shutdown behavior with that new version of the package.

-Ben
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
On Fri, 5 Sep 2014 14:39:03 -0400 Benjamin Kaduk ka...@mit.edu wrote:
> On Fri, 5 Sep 2014, Kuklin István wrote:
>> Last time the console wrote:
>>
>>     [ *** ] A stop job is running for User Manager for 5000
>>
>> Does that mean something?
>
> I don't know what it means, offhand.

If it's not clear, this isn't a message from openafs; it's something for systemd (I think) but I'm not exactly sure what it means.

And sorry if I'm butting in without reading this in detail, but maybe one possibility is that we're hanging on trying to access the net, and the local interface is down. Specifically, the shutdown process tries to stop the openafs-client service, and it fails (because something is accessing /afs). Later on, the shutdown process stops all processes, which includes stopping networkmanager, which takes down the interface. Then we try to umount all filesystems, which means umounting /afs, which can mean hitting the net (giving up callbacks, or flushing certain things). And the afs client hangs on trying to access the net for a while.

I'm not sure at the moment of an easy way of verifying if that is what is going on, but it's just an idea.

-- 
Andrew Deason adea...@sinenomine.net
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
I've found another clue: the slow shutdown is triggered only if I cd to the AFS share (after kinit and aklog). Without that, shutdown is quick.

Here are links to some pictures I took.

I see this on shutdown if I mount the share from a tty with a local account (kinit with a central account, then aklog, then cd):
http://pbrd.co/1o246kL

When it hangs, it looks like this (sorry for the quality):
http://pasteboard.co/2Nv06Lqd.jpg

Once it looked like this:
http://pasteboard.co/2Nv6A7q0.jpg
http://pasteboard.co/2Nv7OMtt.jpg

Here is a video:
http://youtu.be/sAc44PtsJds

In the video I'm logged in with a central profile (I use PAM modules for AFS home directories) that can sudo on that machine, and you can see what happens when I shut it down. The quality is not great and a couple of lines are cut off at the end, so if you wish I can record it again, for example capturing the whole screen without moving the camcorder.

Note that the video shows the same machine as before; earlier I had just replaced the machine name with client1 to make things easier to follow.

Please try to reproduce the problem by cd-ing to the AFS share.

Thank you for your help!

István
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
On Thu, 4 Sep 2014, Kuklin István wrote:
> I've found another clue: The shutdown problem initializes itself only if I cd to the afs share (after kinit and aklog). Without that, shutdown is quick.
>
> Here are some links to some pictures I took:
>
> I see this on shutdown if I mount the share from a tty with a local account (using kinit a central account, aklog, then cd):
> http://pbrd.co/1o246kL
>
> When it hangs, it looks like this (sorry for the quality):
> http://pasteboard.co/2Nv06Lqd.jpg
>
> Once it looked like this:
> http://pasteboard.co/2Nv6A7q0.jpg
> http://pasteboard.co/2Nv7OMtt.jpg
>
> Here is a video:
> http://youtu.be/sAc44PtsJds

Thanks for putting in the time to capture all this data, I really appreciate the effort.

I don't see an obvious smoking gun, but there are at least a couple of hints. I have removed 'quiet' from my kernel command line (/etc/default/grub) and set LogLevel=debug in /etc/systemd/system.conf to try and get more/better diagnostics. Also, using shutdown -H should leave the last messages visible without powering off. (I'm not sure that the very last messages are going to be helpful, though.)

> In the video I'm logged in with a central profile (I use PAM modules for AFS home directories), which can sudo on that machine. When I'm shutting it down, you see what happens. It's not best quality and a couple of lines are missing from the picture at the ending so if you wish, I can record it again, for example the whole screen without moving the camcorder.
>
> Note that I'm using the same machine in the video as before, I've just replaced the machine name earlier to client1 for better understanding.
>
> Please try to reproduce the problem by cd-ing to the afs share.

I had cd-ed into /afs in my previous attempts, though maybe I did not have an active shell still there during the reboot attempts. Even now, when I halt the system with root's shell in /afs/..., I do not see a noticeably longer shutdown time than when AFS has not been used.

I can, however, reproduce some of the hints I mentioned above. Well, sometimes. It doesn't seem fully deterministic. In particular, there is a diagnostic about unmounting /afs failing, and later on a note that a cold shutdown is being performed (these two are related). You had a message "AFS isn't unmounted yet! Call aborted", which is another indicator of this, since it is what happens when a shutdown syscall is issued but shutdown is already in progress (but incomplete).

At this point, I feel like the best step forward is going to be to use a proper systemd unit file for the client, instead of relying on the compatibility shims for sysvinit scripts, since there doesn't seem to be an obvious way to further debug exactly what's happening at the moment. I don't think I have an ETA for when that might happen, though.

-Ben
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
Thank you for your answer. Unfortunately, I won't be able to reply as quickly, but I'll do my best.

I think I found something in /var/log/messages; this kind of line appears 4 times:

    Sep 2 08:06:17 client1 kernel: [ 113.230480] afs: byte-range locks only enforced for processes on this machine (pid 2430 (zeitgeist-daemo), user 5000, fid 536870921.300.500).
    Sep 2 08:06:36 client1 kernel: [ 132.165684] afs: byte-range locks only enforced for processes on this machine (pid 2409 (tracker-store), user 5000, fid 536870921.660.863186).
    Sep 2 08:06:52 client1 kernel: [ 148.373147] afs: byte-range locks only enforced for processes on this machine (pid 2692 (localStorage DB), user 5000, fid 536870921.22438.708412).
    Sep 2 08:09:09 client1 kernel: [ 284.963197] afs: byte-range locks only enforced for processes on this machine (pid 2692 (localStorage DB), user 5000, fid 536870921.22438.708412).

And I have to correct myself: the whole reboot takes 2 or 3 minutes, according to these lines:

    Sep 2 08:09:25 client1 rsyslogd: [origin software=rsyslogd swVersion=8.4.0 x-pid=759 x-info=http://www.rsyslog.com;] exiting on signal 15.
    Sep 2 08:11:47 client1 rsyslogd: [origin software=rsyslogd swVersion=8.4.0 x-pid=759 x-info=http://www.rsyslog.com;] start

Anyway, shutting down the machine still takes too long...

May I copy a bigger part of /var/log/messages?

István
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
On Tue, 2 Sep 2014, Kuklin István wrote:
> Thank you for your answer. Unfortunately, I'll not be able to answer so quick, but I'll do my best.
>
> I think I found something in /var/log/messages, this line appears 4 times:
>
>     Sep 2 08:06:17 client1 kernel: [ 113.230480] afs: byte-range locks only enforced for processes on this machine (pid 2430 (zeitgeist-daemo), user 5000, fid 536870921.300.500).
>     Sep 2 08:06:36 client1 kernel: [ 132.165684] afs: byte-range locks only enforced for processes on this machine (pid 2409 (tracker-store), user 5000, fid 536870921.660.863186).
>     Sep 2 08:06:52 client1 kernel: [ 148.373147] afs: byte-range locks only enforced for processes on this machine (pid 2692 (localStorage DB), user 5000, fid 536870921.22438.708412).
>     Sep 2 08:09:09 client1 kernel: [ 284.963197] afs: byte-range locks only enforced for processes on this machine (pid 2692 (localStorage DB), user 5000, fid 536870921.22438.708412).

I don't think these are helpful; they are normal messages, and the timestamps are outside of the reboot window you derived from the syslog messages below.

> And I have to correct myself: the whole rebooting process takes 2 or 3 minutes, according to this line:
>
>     Sep 2 08:09:25 client1 rsyslogd: [origin software=rsyslogd swVersion=8.4.0 x-pid=759 x-info=http://www.rsyslog.com;] exiting on signal 15.
>     Sep 2 08:11:47 client1 rsyslogd: [origin software=rsyslogd swVersion=8.4.0 x-pid=759 x-info=http://www.rsyslog.com;] start
>
> Anyway, shutting down the machine takes still too long...
>
> May I copy a bigger part of /var/log/messages?

Please do. I will see if I can pull up a VM for testing.

-Ben
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
Okay, here is a complete one, from booting to shutting down: http://pastebin.com/tApVAfM1
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
On Tue, 2 Sep 2014, Kuklin István wrote:
> Okay, here is a complete one from the booting to shutting down: http://pastebin.com/tApVAfM1

Thanks for this. On first glance, I don't see anything that looks suspicious or particularly relevant. It looks like the syslog has stopped when the shutdown started, so anything that may have happened after that didn't make it to the log. Of course, those are just the parts that we would be most interested in.

Can you arrange to be watching the console of an affected machine during the hang?

My local jessie VM seems to reboot quickly after having had AFS mounted, so I don't seem to be able to reproduce the issue at the moment.

-Ben
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
Package: openafs-client
Version: 1.6.9-1
Severity: important
Tags: upstream

There is a network with central LDAP+Kerberos+AFS users. If a central user accesses an AFS share, shutting down the client takes about 3 minutes. The share can be accessed either through PAM modules or by a local (non-central) user running kinit ldap+krb5-username and then aklog. Logging out correctly with unlog and kdestroy doesn't solve the problem; shutdown still takes about 3 minutes. Stopping the openafs-client service and unmounting /afs before shutdown doesn't help either. Rebooting is affected as well.

As far as I remember, the system is trying to stop some User Manager job at shutdown.

The problem affects Debian Jessie; shutdown was quite quick on Wheezy. It affects all the client machines. I'm writing this report from a client machine.

-- System Information:
Debian Release: jessie/sid
  APT prefers testing-updates
  APT policy: (500, 'testing-updates'), (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.14-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=hu_HU.UTF-8, LC_CTYPE=hu_HU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages openafs-client depends on:
ii  debconf [debconf-2.0]  1.5.53
ii  libc6                  2.19-9
ii  libcomerr2             1.42.11-2
ii  libk5crypto3           1.12.1+dfsg-7
ii  libkrb5-3              1.12.1+dfsg-7
ii  libncurses5            5.9+20140712-2
ii  libtinfo5              5.9+20140712-2

Versions of packages openafs-client recommends:
ii  lsof                  4.86+dfsg-1
ii  openafs-modules-dkms  1.6.9-1

Versions of packages openafs-client suggests:
pn  openafs-doc   none
ii  openafs-krb5  1.6.9-1

-- debconf information:
* openafs-client/cachesize: 5
  openafs-client/afsdb: true
* openafs-client/thiscell: lo
  openafs-client/crypt: true
  openafs-client/cell-info:
  openafs-client/dynroot: Yes
  openafs-client/run-client: true
  openafs-client/fakestat: true
Bug#760063: openafs-client: Acessing afs share causes slow shutdown/reboot (about 3 minutes) on Debian Jessie
On Sun, 31 Aug 2014, Kuklin István wrote:
> There is a network with central LDAP+Kerberos+AFS users. If a central user tries to access an afs share, shutting down the client is going to take about 3 minutes. It can be done using PAM modules, or with a local (non-central) user using kinit ldap+krb5-username, then aklog commands. If user logs out correctly using unlog and kdestroy, it doesn't solve the problem, shutting down is going to take about 3 minutes. If I stop openafs-client service and umount /afs before shutdown, it doesn't help. It affects rebooting as well.
>
> It seems that the system is trying to stop some User Manager job at shutdown as far as I remember.
>
> This problem affects Debian Jessie, shutdown was quite quick on Wheezy. It affects all the client machines. I'm writing this report from a client machine.

The kernel messages during the hang (ideally with timestamps) would be quite helpful for understanding what's going on here.

I'll have to double-check, but I may only have wheezy and sid machines sitting around. I would expect any issues to also be present on sid, but one never knows...

-Ben