Hi Stephen,
Replies in-line below.
Thanks,
- Larry
On 3/3/15 11:49 AM, Stephen John Smoogen wrote:
On Mar 3, 2015 8:49 AM, "P. Larry Nelson" <[email protected]> wrote:
>
> I am seeing a bizarre bug where an SL6.x system hangs on either
> shutdown or reboot at the point where it wants to shut down the
> loopback interface.
>
> Let me start off by saying I'm running a mixed shop of SL5.x servers
> (DNS, NIS, NTP, DHCP, NFS, etc.) along with a bunch of new cluster-esque
> nodes running SL6.x. All new SL6 nodes are Dell R410, R510, R710, for
> whatever that's worth, but I don't believe they have anything to do
> with the bug, per se.
>
> Since building these new SL6 nodes many weeks back, they have all
> exhibited this extremely annoying habit of hanging on shutdown or
> reboot at the shutdown of the loopback interface.
> Eventually (for the most part) they stop spinning whatever wheels
> they're spinning and do manage to complete either the shutdown or
> reboot, but it takes upwards of 15, 20, or 30 minutes! Usually
> I can't wait that long and just do a power off/on of the node.
>
> No amount of trying to find out what they are doing has worked,
> from trying to open another console window (Alt-F1, etc.) at
> shutdown/reboot to having top running in one terminal window while
> doing a 'service network restart' in another. Everything just freezes!
>
> I tried any number of things over the past several weeks, including
> ripping out NetworkManager knowing that it has had a history of mucking
> things up. No luck. They still hang.
>
> On another front, I was having some UID/GID problems with the mix of
> NFS v3 from my SL5.x file servers and NFS v4 on the SL6 nodes, so
> I forced all mounts to use NFS v3. I thought maybe that could be
> the problem, but again, no luck - still hanging.
>
> Revisiting it again in earnest this weekend via Google, I came up
> empty as all hits seemed to have something to do with scenarios that
> just did not apply, including many hits about a problem with running
> the iscsi daemon (and there was a patch for that). But I'm not running
> the iscsi daemon. It's not even installed.
>
> One comment by someone who also had the same problem was that he, not
> ever figuring out the cause, just commented out the line in
> /etc/init.d/network that shuts down the loopback interface, saying it's
> not a real device anyway, so what the hell.
>
> So yesterday I thought I'd try the tactic of commenting out the
> loopback shutdown on a test system. Sure enough, the reboot was
> normal with no hangs.
>
> Ok, at least now I have a workaround, though that seems pretty kludgy.
>
> I decided to try and nail the culprit down with a fresh rebuild of
> a test system and see just where in the build process the bug appears.
>
> After the basic install of SL6, the system reboots just fine.
> Then do a 'yum update' with all its hundreds of patches.
> It reboots just fine, as I expected.
>
> So the first "local" change was to configure NIS.
> Try the reboot. Reboots fine.
>
> [ok, here is where it becomes bizarre]
> Modify /etc/nsswitch.conf to switch the order of "files nis" to
> "nis files" for passwd, shadow, and group, as I've always done.
> Reboot. Boom! It hangs at loopback interface shutdown!
>
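For reference, the change described above swaps the source order on the passwd, shadow, and group lines of /etc/nsswitch.conf. Below is a minimal sketch of a check that reports which databases consult NIS before local files, which may help when comparing a node that hangs with one that does not. Python is used only for illustration, the function name nis_before_files is hypothetical, and the parsing assumes the usual "database: source source ..." layout with bracketed actions such as [NOTFOUND=return] written without spaces.

```python
def nis_before_files(text):
    """Return the nsswitch.conf databases that list 'nis' ahead of 'files'.

    `text` is the content of an nsswitch.conf file. Assumption: each
    entry has the form 'database: source [action] source ...'.
    """
    flagged = []
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        database, _, rest = line.partition(":")
        # Ignore bracketed action items such as [NOTFOUND=return].
        sources = [s for s in rest.split() if not s.startswith("[")]
        if "nis" in sources and "files" in sources:
            if sources.index("nis") < sources.index("files"):
                flagged.append(database.strip())
    return flagged
```

On a client it could be called as nis_before_files(open("/etc/nsswitch.conf").read()); on a node changed as described above, it would be expected to list passwd, shadow, and group.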
I want to thank you for giving all the details of your testing. I would
like to use it as a future example of how to be constructive and helpful
to other people needing help.

Thanks. Yep, feel free to use this as an example. I suppose it comes
from being in the biz for over 46 years and shaking my head at *SO* many
ill-conceived requests for help on listservs.

So have you looked at nscd at all? Does having nscd turned on or off alter
this problem?

Nay, I have not, and frankly, it didn't occur to me till you asked.
I will explore that when I get a chance and see if it alters the problem.
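
One way to make the nscd comparison concrete is to time account lookups with nscd running and again with it stopped, under both nsswitch.conf orderings. The sketch below is only an illustration of that idea, not a diagnosis: the helper name time_lookup is hypothetical, and it assumes that slow or blocked name-service lookups are related to the hang, which has not been established in this thread.

```python
from __future__ import print_function

import pwd
import time


def time_lookup(lookup, key):
    """Call lookup(key); return (found, seconds_elapsed).

    A KeyError from the lookup (what pwd.getpwnam raises for an
    unknown user) is reported as found=False.
    """
    start = time.time()
    try:
        lookup(key)
        found = True
    except KeyError:
        found = False
    return found, time.time() - start


if __name__ == "__main__":
    # "root" should resolve from local files; the second name is a
    # made-up user that forces every configured source to be consulted.
    for name in ("root", "no-such-user-xyz"):
        found, secs = time_lookup(pwd.getpwnam, name)
        print("%-20s found=%-5s %.3fs" % (name, found, secs))
```

Running it once with nscd started and once with it stopped, for "files nis" and for "nis files", gives four sets of timings to compare.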

Also, what is in hosts, and is the NIS server listed? Thanks

I assume you're talking about /etc/hosts on the clients.
The SL6.x clients just have the following in hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
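
To answer the "is the NIS server listed" part across many nodes, a small check along these lines could be run against each client's /etc/hosts. It is a sketch under assumptions: the function name hosts_entries is hypothetical, and the NIS server name would have to be filled in for the site, since it is not given in this thread.

```python
def hosts_entries(text):
    """Map each hostname or alias in hosts-file text to its addresses.

    `text` is the content of a hosts file: one address per line,
    followed by one or more names; '#' starts a comment.
    """
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, hostnames = fields[0], fields[1:]
        for hostname in hostnames:
            names.setdefault(hostname, []).append(address)
    return names
```

With the two lines shown above, hosts_entries(...) contains only the localhost names, so a lookup of any NIS server name in the result would come back empty.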
> I repeated this many times to be sure, and it happens the same on
> every SL6.x node.
>
> Bug or feature? I can't imagine it to be a feature nor can I
> fathom what the order of "files" and "nis" in /etc/nsswitch.conf
> has to do with the hanging of the loopback interface shutdown.
> It's possible that an SL6.x NIS server might correct the situation,
> but I have no time right now to spend a week on that not knowing
> it would even work.
>
> Comments and suggestions are welcome.
>
> - Larry
>
> --
> P. Larry Nelson (217-244-9855) | IT Administrator
> 461 Loomis Lab | High Energy Physics Group
> 1110 W. Green St., Urbana, IL | Physics Dept., Univ. of Ill.
> MailTo:[email protected] | http://www.brf-llc.com/lnelson/
> -------------------------------------------------------------------
> "Information without accountability is just noise." - P.L. Nelson