Hello there. My setup is that I have five IPA servers: two in one location (alder, auth-syd2), two in another location (auth-wlg, auth-wlg2), and one in yet another location (waffle), which is reached over a long, mostly-but-possibly-notably-not-entirely reliable VPN connection.
I'm having an issue with an IPA server falling over. By 'falling over' I mean that it no longer responds to LDAP queries (although nmap still shows TCP port 389 as open). When I run 'systemctl stop ipa' the command never seems to return, so up to now the only fix I have is to reboot that server.

All machines are CentOS 7, and all are running ipa-server-4.2.0-15.0.1.el7.centos.18.x86_64.

Replication occurs between:

    alder    <-> auth-wlg
    alder    <-> auth-syd2
    auth-wlg <-> auth-wlg2
    auth-wlg <-> waffle

Possibly notably, there is *no* direct replication between alder and waffle.

The problem of LDAP being unavailable occurs on alder only; the other IPA servers seem to be reliable. Unfortunately, alder is also our most used server. The error logs from alder look like this:

    http://pastebin.com/TxCVjWTe

with the reboot done at around 19:55.

While investigating / googling the errors in this log, starting with the attr_replace (nsslapd-referral) one, I noticed that on my servers this LDAP query:

    ldapsearch -ZZ -h alder.blah.com -D "cn=Directory Manager" -W -b "o=ipaca" "(&(objectclass=nstombstone)(nsUniqueId=ffffffff-ffffffff-ffffffff-ffffffff))" | grep "nsds50ruv\|nsDS5ReplicaId"

returns results similar to this:

    nsDS5ReplicaId: 96
    nsds50ruv: {replicageneration} 5733d428000000600000
    nsds50ruv: {replica 96 ldap://alder.blah.com:389} 5733d474000000600000 57
    nsds50ruv: {replica 91 ldap://auth-syd2.blah.com:389} 576337b90004005b000
    nsds50ruv: {replica 97 ldap://auth-wlg.blah.com:389} 5733d49a000000610000
    nsds50ruv: {replica 1095 ldap://auth-wlg2.blah.com:389} 574fa5b0000004470
    nsds50ruv: {replica 1090 ldap://waffle.bsh.blah.com:389} 576b1add00000442
    nsds50ruv: {replica 1085 ldap://waffle.bsh.blah.com:389} 576b22f10000043d

i.e. waffle is listed twice. If I run the same query on waffle, though, I get no results at all (but the command does at least return), so I don't know waffle's nsDS5ReplicaId at the moment. I understand that once I know it, I can clean-ruv the other ID off the other IPA servers?
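To make the duplicate easier to spot when comparing all five servers, something like the following could group the replica IDs by hostname. This is only a minimal sketch: the function names are made up for illustration, and it assumes the input is the grep-filtered ldapsearch output in the format shown above. It only reports hosts that appear under more than one replica ID; it does not tell you which ID is the stale one, and it does not clean anything.

```python
import re
from collections import defaultdict

# Matches RUV elements such as:
#   nsds50ruv: {replica 96 ldap://alder.blah.com:389} 5733d474000000600000
# The {replicageneration} line has no replica ID or URL and is skipped.
RUV_LINE = re.compile(r"\{replica\s+(\d+)\s+ldaps?://([^:/}\s]+)(?::\d+)?\}")


def replica_ids_by_host(ruv_text):
    """Map each hostname to the replica IDs listed for it, in order seen."""
    hosts = defaultdict(list)
    for line in ruv_text.splitlines():
        match = RUV_LINE.search(line)
        if match is None:
            continue
        replica_id, host = int(match.group(1)), match.group(2)
        if replica_id not in hosts[host]:
            hosts[host].append(replica_id)
    return dict(hosts)


def hosts_with_multiple_ids(ruv_text):
    """Return only the hosts that show up under more than one replica ID."""
    return {
        host: ids
        for host, ids in replica_ids_by_host(ruv_text).items()
        if len(ids) > 1
    }


# Output copied from the query against alder, as posted above.
SAMPLE = """\
nsDS5ReplicaId: 96
nsds50ruv: {replicageneration} 5733d428000000600000
nsds50ruv: {replica 96 ldap://alder.blah.com:389} 5733d474000000600000 57
nsds50ruv: {replica 91 ldap://auth-syd2.blah.com:389} 576337b90004005b000
nsds50ruv: {replica 97 ldap://auth-wlg.blah.com:389} 5733d49a000000610000
nsds50ruv: {replica 1095 ldap://auth-wlg2.blah.com:389} 574fa5b0000004470
nsds50ruv: {replica 1090 ldap://waffle.bsh.blah.com:389} 576b1add00000442
nsds50ruv: {replica 1085 ldap://waffle.bsh.blah.com:389} 576b22f10000043d
"""

if __name__ == "__main__":
    for host, ids in hosts_with_multiple_ids(SAMPLE).items():
        print(host, ids)
```

Fed the output from alder, this flags only waffle.bsh.blah.com, with IDs 1090 and 1085. Running the same thing against the output from each of the other servers would show whether they all carry the same extra entry.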
I don't *think* any of this is related to my original issue above, but it might be a smoking gun; I don't know, so I'm just mentioning it in case. At the moment I've not got a lot to go on. Has anyone else seen errors like those in the pastebin, or know where to look for more useful info?

Possibly also worth noting that alder and auth-syd2 are AWS EC2 instances. The rest are VMs on site(s).
--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project