Re: [Freeipa-users] ipa replica failure
On Mon, Jun 22, 2015 at 12:49:01PM -0400, Rob Crittenden wrote:

You aren't seeing a replication agreement. You're seeing the Replication Update Vector (RUV). See http://directory.fedoraproject.org/docs/389ds/howto/howto-cleanruv.html

You need to do something like:

  # ldapmodify -D "cn=directory manager" -W -a
  dn: cn=clean 97,cn=cleanallruv,cn=tasks,cn=config
  objectclass: extensibleObject
  replica-base-dn: o=ipaca
  replica-id: 97
  cn: clean 97

Great, thanks for the clarification. Curious, what's the difference between running the ldapmodify above and ipa-replica-manage clean-ruv?

Nothing, for the IPA data. This is a remnant from a CA replication agreement, and it was an oversight not to add similar RUV management options to the ipa-csreplica-manage tool.

I'm still seeing some inconsistencies. Forgive me if I'm misinterpreting any of this output (still learning the ropes with FreeIPA here); just trying to wrap my head around the RUVs. I'm following the docs here: http://directory.fedoraproject.org/docs/389ds/howto/howto-cleanruv.html

After running the ldapsearch command to check for obsolete masters, I'm not seeing the replica ID for the old replica we deleted (rep2):

  $ ldapsearch -xLLL -D "cn=directory manager" -W -s sub -b cn=config "objectclass=nsds5replica"
  Enter LDAP Password:
  dn: cn=replica,cn=dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu,cn=mapping tree,cn=config
  cn: replica
  nsDS5Flags: 1
  objectClass: nsds5replica
  objectClass: top
  objectClass: extensibleobject
  nsDS5ReplicaType: 3
  nsDS5ReplicaRoot: dc=ccr,dc=buffalo,dc=edu
  nsds5ReplicaLegacyConsumer: off
  nsDS5ReplicaId: 4
  nsDS5ReplicaBindDN: cn=replication manager,cn=config
  nsDS5ReplicaBindDN: krbprincipalname=ldap/rep2@CCR.BUFFALO.EDU,cn=services,cn=accounts,dc=ccr,dc=buffalo,dc=edu
  nsDS5ReplicaBindDN: krbprincipalname=ldap/rep3@CCR.BUFFALO.EDU,cn=services,cn=accounts,dc=ccr,dc=buffalo,dc=edu
  nsState:: BABIa4xVJAABAA==
  nsDS5ReplicaName: a0957886-df9c11e4-a351aa45-2e06257b
  nsds5ReplicaChangeCount: 1687559
  nsds5replicareapactive: 0

  dn: cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
  objectClass: top
  objectClass: nsDS5Replica
  objectClass: extensibleobject
  nsDS5ReplicaRoot: o=ipaca
  nsDS5ReplicaType: 3
  nsDS5ReplicaBindDN: cn=Replication Manager masterAgreement1-rep2 falo.edu-pki-tomcat,ou=csusers,cn=config
  nsDS5ReplicaBindDN: cn=Replication Manager masterAgreement1-rep3 falo.edu-pki-tomcat,ou=csusers,cn=config
  cn: replica
  nsDS5ReplicaId: 96
  nsDS5Flags: 1
  nsState:: YAAPa4xVAAkACgABAA==
  nsDS5ReplicaName: c458be8e-df9c11e4-a351aa45-2e06257b
  nsds5ReplicaChangeCount: 9480
  nsds5replicareapactive: 0

I see:

  dn: cn=replica,cn=dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu,cn=mapping tree,cn=config
  nsds5replicaid: 4

and

  dn: cn=replica,cn=o\3Dipaca,cn=mapping tree,cn=config
  nsDS5ReplicaId: 96

In the above output the old replica only shows up under:

  nsDS5ReplicaBindDN: krbprincipalname=ldap/rep2@CCR.BUFFA...

According to the docs, I need the nsds5replicaid for use in the CLEANALLRUV task?

I also checked the RUV tombstone entry as per the docs:

  # ldapsearch -xLLL -D "cn=directory manager" -W -b dc=ccr,dc=buffalo,dc=edu '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))'
  Enter LDAP Password:
  dn: cn=replica,cn=dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu,cn=mapping tree,cn=config
  cn: replica
  nsDS5Flags: 1
  objectClass: nsds5replica
  objectClass: top
  objectClass: extensibleobject
  nsDS5ReplicaType: 3
  nsDS5ReplicaRoot: dc=ccr,dc=buffalo,dc=edu
  nsds5ReplicaLegacyConsumer: off
  nsDS5ReplicaId: 4
  nsDS5ReplicaBindDN: cn=replication manager,cn=config
  nsDS5ReplicaBindDN: krbprincipalname=ldap/rep2@CCR.BUFFALO.EDU,cn=services,cn=accounts,dc=ccr,dc=buffalo,dc=edu
  nsDS5ReplicaBindDN: krbprincipalname=ldap/rep3@CCR.BUFFALO.EDU,cn=services,cn=accounts,dc=ccr,dc=buffalo,dc=edu
  nsState:: BADycYxVJAABAA==
  nsDS5ReplicaName: a0957886-df9c11e4-a351aa45-2e06257b
  nsds50ruv: {replicageneration} 5527f7110004
  nsds50ruv: {replica 4 ldap://rep1:389} 5527f77100040000 558c722800020004
  nsds50ruv: {replica 5 ldap://rep3:389} 5537c77300050000 5582c7f600060005
  nsds5agmtmaxcsn: dc=ccr,dc=buffalo,dc=edu;meTorep3;rep3;389;5;558c572b000a0004
  nsruvReplicaLastModified: {replica 4 ldap://rep1:389} 558c7204
  nsruvReplicaLastModified: {replica 5 ldap://rep3:389} 0000
  nsds5ReplicaChangeCount: 1689129
  nsds5replicareapactive: 0

I only see nsds50ruv attributes for rep1 and rep3. However, I'm still seeing rep2 in the nsDS5ReplicaBindDN. If I'm parsing this output correctly, it appears the RUVs for rep2 are already cleaned? If so, how come the nsDS5ReplicaBindDN values still exist? Also, why is there a nsds50ruv attribute for rep2 listed when I run this query (but not the others above):

  $ ldapsearch -xLLL -D "cn=directory manager" -W -b "cn=mapping tree,cn=config" "objectClass=nsDS5ReplicationAgreement"
  dn:
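A side note for anyone following along: CLEANALLRUV only scrubs RUV elements for the given replica ID; it does not touch replica configuration attributes such as nsDS5ReplicaBindDN, which is consistent with the output above. The leftover bind DN is just an attribute value on the replica entry and can be removed with a plain modify. A minimal sketch (not from this thread, and assuming the rep2 principal shown above really is the value you want to drop):

  # ldapmodify -D "cn=directory manager" -W <<EOF
  dn: cn=replica,cn=dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu,cn=mapping tree,cn=config
  changetype: modify
  delete: nsDS5ReplicaBindDN
  nsDS5ReplicaBindDN: krbprincipalname=ldap/rep2@CCR.BUFFALO.EDU,cn=services,cn=accounts,dc=ccr,dc=buffalo,dc=edu
  EOF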
Re: [Freeipa-users] ipa replica failure
Andrew E. Bruno wrote:

On Mon, Jun 22, 2015 at 10:02:59AM -0400, Rob Crittenden wrote:

Andrew E. Bruno wrote:

On Fri, Jun 19, 2015 at 03:18:50PM -0400, Rob Crittenden wrote:

Rich Megginson wrote:

On 06/19/2015 12:22 PM, Andrew E. Bruno wrote:

Questions:

0. Is it likely that after running out of file descriptors the dirsrv slapd database on rep2 was corrupted?

That would appear to be the case based on correlation of events, although I've never seen that happen, and it is not supposed to happen.

1. Do we have to run ipa-replica-manage del rep2 on *each* of the remaining replica servers (rep1 and rep3)? Or should it just be run on the first master?

I believe it should only be run on the first master, but it hung, so something is not right, and I'm not sure how to remedy the situation.

How long did it hang, and where?

This command was run on rep1 (first master):

  [rep1]$ ipa-replica-manage del rep2

This command hung (~10 minutes) until I hit Ctrl-C. After noticing ldap queries were hanging on rep2, we ran this on rep2:

  [rep2]$ systemctl stop ipa    (shut down all ipa services on rep2)

Then back on rep1 (first master):

  [rep1]$ ipa-replica-manage -v --force del rep2

which appeared to work OK. Do we need to run ipa-csreplica-manage del as well?

2. Why does the rep2 server still appear when querying the nsDS5ReplicationAgreement in ldap? Is this benign, or will this pose problems when we go to add rep2 back in?

You should remove it. And ipa-csreplica-manage is the tool to do it.

When I run this on rep1 (first master):

  [rep1]$ ipa-csreplica-manage list
  Directory Manager password:
  rep3: master
  rep1: master

  [rep1]$ ipa-csreplica-manage del rep2
  Directory Manager password:
  'rep1' has no replication agreement for 'rep2'

But it seems to still be there:

  [rep1]$ ldapsearch -Y GSSAPI -b "cn=mapping tree,cn=config" "objectClass=nsDS5ReplicationAgreement" -LL
  dn: cn=masterAgreement1-rep3-pki-tomcat,cn=replica,cn=ipaca,cn=mapping tree,cn=config
  objectClass: top
  objectClass: nsds5replicationagreement
  cn: masterAgreement1-rep3-pki-tomcat
  nsDS5ReplicaRoot: o=ipaca
  nsDS5ReplicaHost: rep3
  nsDS5ReplicaPort: 389
  nsDS5ReplicaBindDN: cn=Replication Manager cloneAgreement1-rep3-pki-tomcat,ou=csusers,cn=config
  nsDS5ReplicaBindMethod: Simple
  nsDS5ReplicaTransportInfo: TLS
  description: masterAgreement1-rep3-pki-tomcat
  nsds50ruv: {replicageneration} 5527f74b0060
  nsds50ruv: {replica 91 ldap://rep3:389} 5537c7ba005b 5582c7e40004005b
  nsds50ruv: {replica 96 ldap://rep1:389} 5527f7540060 5582cd190060
  nsds50ruv: {replica 97 ldap://rep2:389} 5527f7600061 556f462b00040061
  nsruvReplicaLastModified: {replica 91 ldap://rep3:389} 0000
  nsruvReplicaLastModified: {replica 96 ldap://rep1:389} 0000
  nsruvReplicaLastModified: {replica 97 ldap://rep2:389} 0000
  nsds5replicaLastUpdateStart: 20150619193149Z
  nsds5replicaLastUpdateEnd: 20150619193149Z
  nsds5replicaChangesSentSinceStartup:: OTY6MTMyLzAg
  nsds5replicaLastUpdateStatus: 0 Replica acquired successfully: Incremental update succeeded
  nsds5replicaUpdateInProgress: FALSE
  nsds5replicaLastInitStart: 0
  nsds5replicaLastInitEnd: 0

However, when I run the ldapsearch on rep3 it's not there (the cn=ipaca,cn=mapping tree,cn=config is not listed):

  [rep3]$ ldapsearch -Y GSSAPI -b "cn=mapping tree,cn=config" "objectClass=nsDS5ReplicationAgreement" -LL
  dn: cn=meTorep1,cn=replica,cn=dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu,cn=mapping tree,cn=config
  cn: meTorep1
  objectClass: nsds5replicationagreement
  objectClass: top
  nsDS5ReplicaTransportInfo: LDAP
  description: me to rep1
  nsDS5ReplicaRoot: dc=ccr,dc=buffalo,dc=edu
  nsDS5ReplicaHost: rep1

3. What steps/commands can we take to verify rep2 was successfully removed and replication is behaving normally?

The ldapsearch you performed already will confirm that the CA agreement has been removed.

Still showing up. Any thoughts? At this point we want to ensure both remaining masters are functional and operating normally. Any other commands you recommend running to check?

You aren't seeing a replication agreement. You're seeing the Replication Update Vector (RUV). See http://directory.fedoraproject.org/docs/389ds/howto/howto-cleanruv.html

You need to do something like:

  # ldapmodify -D "cn=directory manager" -W -a
  dn: cn=clean 97,cn=cleanallruv,cn=tasks,cn=config
  objectclass: extensibleObject
  replica-base-dn: o=ipaca
  replica-id: 97
  cn: clean 97

Great, thanks for the clarification. Curious, what's the difference between running the ldapmodify above and ipa-replica-manage clean-ruv?

Nothing, for the IPA data. This is a remnant from a CA replication agreement, and it was an oversight not to add similar RUV management options to the ipa-csreplica-manage tool.

rob
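A note on verifying the cleanup afterwards (my own sketch following the howto linked above, not part of the original exchange; it assumes simple binds as Directory Manager work and that the errors log lives under the usual /var/log/dirsrv/slapd-<instance>/ path):

  # see whether the clean task entry is still present and what it reports
  ldapsearch -xLLL -D "cn=directory manager" -W -s sub -b "cn=cleanallruv,cn=tasks,cn=config" "(objectclass=*)"

  # the CleanAllRUV progress messages also land in the 389-ds errors log
  grep -i cleanallruv /var/log/dirsrv/slapd-*/errors

  # once it completes, replica 97 should no longer appear in the o=ipaca RUV tombstone
  ldapsearch -xLLL -D "cn=directory manager" -W -b o=ipaca \
    '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))' nsds50ruv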
Re: [Freeipa-users] ipa replica failure
On Fri, Jun 19, 2015 at 03:18:50PM -0400, Rob Crittenden wrote:

Rich Megginson wrote:

On 06/19/2015 12:22 PM, Andrew E. Bruno wrote:

Questions:

0. Is it likely that after running out of file descriptors the dirsrv slapd database on rep2 was corrupted?

That would appear to be the case based on correlation of events, although I've never seen that happen, and it is not supposed to happen.

1. Do we have to run ipa-replica-manage del rep2 on *each* of the remaining replica servers (rep1 and rep3)? Or should it just be run on the first master?

I believe it should only be run on the first master, but it hung, so something is not right, and I'm not sure how to remedy the situation.

How long did it hang, and where?

This command was run on rep1 (first master):

  [rep1]$ ipa-replica-manage del rep2

This command hung (~10 minutes) until I hit Ctrl-C. After noticing ldap queries were hanging on rep2, we ran this on rep2:

  [rep2]$ systemctl stop ipa    (shut down all ipa services on rep2)

Then back on rep1 (first master):

  [rep1]$ ipa-replica-manage -v --force del rep2

which appeared to work OK. Do we need to run ipa-csreplica-manage del as well?

2. Why does the rep2 server still appear when querying the nsDS5ReplicationAgreement in ldap? Is this benign, or will this pose problems when we go to add rep2 back in?

You should remove it. And ipa-csreplica-manage is the tool to do it.

When I run this on rep1 (first master):

  [rep1]$ ipa-csreplica-manage list
  Directory Manager password:
  rep3: master
  rep1: master

  [rep1]$ ipa-csreplica-manage del rep2
  Directory Manager password:
  'rep1' has no replication agreement for 'rep2'

But it seems to still be there:

  [rep1]$ ldapsearch -Y GSSAPI -b "cn=mapping tree,cn=config" "objectClass=nsDS5ReplicationAgreement" -LL
  dn: cn=masterAgreement1-rep3-pki-tomcat,cn=replica,cn=ipaca,cn=mapping tree,cn=config
  objectClass: top
  objectClass: nsds5replicationagreement
  cn: masterAgreement1-rep3-pki-tomcat
  nsDS5ReplicaRoot: o=ipaca
  nsDS5ReplicaHost: rep3
  nsDS5ReplicaPort: 389
  nsDS5ReplicaBindDN: cn=Replication Manager cloneAgreement1-rep3-pki-tomcat,ou=csusers,cn=config
  nsDS5ReplicaBindMethod: Simple
  nsDS5ReplicaTransportInfo: TLS
  description: masterAgreement1-rep3-pki-tomcat
  nsds50ruv: {replicageneration} 5527f74b0060
  nsds50ruv: {replica 91 ldap://rep3:389} 5537c7ba005b 5582c7e40004005b
  nsds50ruv: {replica 96 ldap://rep1:389} 5527f7540060 5582cd190060
  nsds50ruv: {replica 97 ldap://rep2:389} 5527f7600061 556f462b00040061
  nsruvReplicaLastModified: {replica 91 ldap://rep3:389} 0000
  nsruvReplicaLastModified: {replica 96 ldap://rep1:389} 0000
  nsruvReplicaLastModified: {replica 97 ldap://rep2:389} 0000
  nsds5replicaLastUpdateStart: 20150619193149Z
  nsds5replicaLastUpdateEnd: 20150619193149Z
  nsds5replicaChangesSentSinceStartup:: OTY6MTMyLzAg
  nsds5replicaLastUpdateStatus: 0 Replica acquired successfully: Incremental update succeeded
  nsds5replicaUpdateInProgress: FALSE
  nsds5replicaLastInitStart: 0
  nsds5replicaLastInitEnd: 0

However, when I run the ldapsearch on rep3 it's not there (the cn=ipaca,cn=mapping tree,cn=config is not listed):

  [rep3]$ ldapsearch -Y GSSAPI -b "cn=mapping tree,cn=config" "objectClass=nsDS5ReplicationAgreement" -LL
  dn: cn=meTorep1,cn=replica,cn=dc\3Dccr\2Cdc\3Dbuffalo\2Cdc\3Dedu,cn=mapping tree,cn=config
  cn: meTorep1
  objectClass: nsds5replicationagreement
  objectClass: top
  nsDS5ReplicaTransportInfo: LDAP
  description: me to rep1
  nsDS5ReplicaRoot: dc=ccr,dc=buffalo,dc=edu
  nsDS5ReplicaHost: rep1

3. What steps/commands can we take to verify rep2 was successfully removed and replication is behaving normally?

The ldapsearch you performed already will confirm that the CA agreement has been removed.

Still showing up. Any thoughts? At this point we want to ensure both remaining masters are functional and operating normally. Any other commands you recommend running to check?

8192 is extremely high. The fact that you ran out of file descriptors at 8192 seems like a bug/fd leak somewhere. I suppose you could, as a very temporary workaround, set the fd limit higher, but that is no guarantee that you won't run out again. Please file at least one ticket, e.g. "database corrupted when server ran out of file descriptors", with as much information about that particular problem as you can provide.

Will do. Thanks very much for all the help!

--Andrew

--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project
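For completeness, a sketch of what that "very temporary workaround" could look like (my own example, not from the thread; it assumes the EL7 dirsrv@<instance> systemd unit name, and nsslapd-maxdescriptors only takes effect after the instance restarts):

  # raise the per-process fd limit via a systemd drop-in for the instance unit
  mkdir -p /etc/systemd/system/dirsrv@.service.d
  cat > /etc/systemd/system/dirsrv@.service.d/limits.conf <<EOF
  [Service]
  LimitNOFILE=16384
  EOF
  systemctl daemon-reload

  # raise slapd's own descriptor cap to match
  ldapmodify -D "cn=directory manager" -W <<EOF
  dn: cn=config
  changetype: modify
  replace: nsslapd-maxdescriptors
  nsslapd-maxdescriptors: 16384
  EOF

  systemctl restart dirsrv@<instance>

As Rich says, this only buys headroom; if something is leaking descriptors it will exhaust the higher limit too.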
Re: [Freeipa-users] ipa replica failure
On Fri, Jun 19, 2015 at 09:08:15PM -0700, Janelle wrote:

On 6/19/15 11:22 AM, Andrew E. Bruno wrote:

Hello,

First time troubleshooting an ipa server failure and looking for some guidance on how best to proceed. First, some background on our setup.

Servers are running freeipa v4.1.0 on CentOS 7.1.1503:

- ipa-server-4.1.0-18.el7.centos.3.x86_64
- 389-ds-base-1.3.3.1-16.el7_1.x86_64

3 ipa-servers: 1 first master (rep1) and 2 replicas (rep2, rep3). The replicas were set up to be CAs (i.e. ipa-replica-install --setup-ca ...).

We have ~3000 user accounts (~1000 active, the rest disabled). We have ~700 hosts enrolled (all installed using ipa-client-install and running sssd). Host clients are a mix of CentOS 7 and CentOS 6.5.

We recently discovered one of our replica servers (rep2) was not responding. A quick check of the dirsrv logs /var/log/dirsrv/slapd-/errors (sanitized):

  PR_Accept() failed, Netscape Portable Runtime error (Process open FD table is full.)
  ...

The server was rebooted and after coming back up had these errors in the logs:

  389-Directory/1.3.3.1 B2015.118.1941 replica2:636 (/etc/dirsrv/slapd-)
  [16/Jun/2015:10:12:33 -0400] - libdb: BDB0060 PANIC: fatal region error detected; run recovery
  [16/Jun/2015:10:12:33 -0400] - Serious Error---Failed to trickle, err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)
  [16/Jun/2015:10:12:33 -0400] - libdb: BDB0060 PANIC: fatal region error detected; run recovery
  [16/Jun/2015:10:12:33 -0400] - Serious Error---Failed in deadlock detect (aborted at 0x0), err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)
  [16/Jun/2015:10:12:33 -0400] - libdb: BDB0060 PANIC: fatal region error detected; run recovery
  [16/Jun/2015:10:12:33 -0400] - Serious Error---Failed in deadlock detect (aborted at 0x0), err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)
  [16/Jun/2015:10:12:33 -0400] - libdb: BDB0060 PANIC: fatal region error detected; run recovery
  [16/Jun/2015:10:12:33 -0400] - Serious Error---Failed to checkpoint database, err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)
  [16/Jun/2015:10:12:33 -0400] - libdb: BDB0060 PANIC: fatal region error detected; run recovery
  [16/Jun/2015:10:12:33 -0400] - Serious Error---Failed to checkpoint database, err=-30973 (BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery)
  [16/Jun/2015:10:12:33 -0400] - libdb: BDB0060 PANIC: fatal region error detected; run recovery
  [16/Jun/2015:10:12:33 -0400] - checkpoint_threadmain: log archive failed - BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery (-30973)
  [16/Jun/2015:16:24:04 -0400] - 389-Directory/1.3.3.1 B2015.118.1941 starting up
  [16/Jun/2015:16:24:04 -0400] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
  ...
  [16/Jun/2015:16:24:15 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: disordely shutdown for replica dc=XXX. Check if DB RUV needs to be updated
  [16/Jun/2015:16:24:15 -0400] NSMMReplicationPlugin - Force update of database RUV (from CL RUV) - 5577006800030003
  [16/Jun/2015:16:24:15 -0400] NSMMReplicationPlugin - Force update of database RUV (from CL RUV) - 556f463200140004
  [16/Jun/2015:16:24:15 -0400] NSMMReplicationPlugin - Force update of database RUV (from CL RUV) - 556f4631004d0005
  [16/Jun/2015:16:24:15 -0400] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 111 (Connection refused)
  [16/Jun/2015:16:24:15 -0400] NSMMReplicationPlugin - agmt=cn=cloneAgreement1-rep2 (rep1:389): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ()
  [16/Jun/2015:16:24:15 -0400] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: disordely shutdown for replica o=ipaca. Check if DB RUV needs to be updated
  [16/Jun/2015:16:24:15 -0400] NSMMReplicationPlugin - Force update of database RUV (from CL RUV) - 556f46290005005b
  [16/Jun/2015:16:24:15 -0400] set_krb5_creds - Could not get initial credentials for principal [ldap/rep2] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC for requested realm)
  [16/Jun/2015:16:24:15 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 111 (Connection refused)
  [16/Jun/2015:16:24:15 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] authentication mechanism [GSSAPI]: error -1 (Can't contact LDAP server)
  [16/Jun/2015:16:24:15 -0400] NSMMReplicationPlugin - agmt=cn=meTorep1 (rep1:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ()
  [16/Jun/2015:16:24:15 -0400] - Skipping CoS Definition cn=Password Policy,cn=accounts,dc=xxx--no CoS Templates found, which should be added before the CoS Definition.
  [16/Jun/2015:16:24:15 -0400] DSRetroclPlugin - delete_changerecord: could not delete
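Not part of the original messages, but for anyone who lands here with the same BDB panic: when one replica's database is wrecked and the other masters are healthy, the usual recovery is to refresh the broken replica from a good master rather than repair BDB in place. A sketch, assuming rep1 is healthy, rep2 is the broken host, and rep2's FQDN is rep2.ccr.buffalo.edu (hypothetical, matching the sanitized names in this thread):

  # if dirsrv on rep2 still starts, pull a fresh copy of the shared tree from rep1
  [rep2]$ ipa-replica-manage re-initialize --from rep1

  # if it will not start at all, remove the replica and re-enroll it
  # (this is roughly what was ultimately done in this thread)
  [rep1]$ ipa-replica-manage --force del rep2
  [rep1]$ ipa-replica-prepare rep2.ccr.buffalo.edu
  # copy /var/lib/ipa/replica-info-rep2.ccr.buffalo.edu.gpg to rep2, then:
  [rep2]$ ipa-replica-install --setup-ca replica-info-rep2.ccr.buffalo.edu.gpg

After either path, cleaning any leftover RUV entries for the old replica ID (as discussed earlier in the thread) keeps the remaining masters from trying to catch up a server that no longer exists.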