Re: [Freeipa-users] Replica cannot be reinitialized after upgrade

2017-05-19 Thread Ludwig Krispenz


On 05/18/2017 10:13 PM, Goran Marik wrote:

Thanks Ludwig for the suggestion and thanks to Maciej for the confirmation from 
his end. This issue has been happening for us for several weeks, so I don’t think 
this is a transient problem.

What is the best way to sanitize the logs without removing useful info before sending them 
your way? Will the files mentioned on 
"https://www.freeipa.org/page/Files_to_be_attached_to_bug_report -> Directory server 
failed" be sufficient?
yes, but we need some additional info on the replication config and 
state; you could add /etc/dirsrv/slapd-*/dse.ldif

and the result of this query:

ldapsearch -o ldif-wrap=no -D "cn=directory manager" ... -b "cn=config" "objectclass=nsds5replica" \* nsds50ruv


But looking again at the csn reported missing, it is from June 2016. So I 
wonder if this is for a stale/removed replica and whether cleaning the ruvs 
would help.
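
For reference, the date can be read directly off the CSN. A minimal sketch in Python, assuming the leading eight hex digits of a 389-ds CSN are a Unix timestamp in seconds; the helper name csn_timestamp is made up for illustration, and the remaining digits of the CSN are ignored here:

```python
from datetime import datetime, timezone


def csn_timestamp(csn: str) -> datetime:
    """Return the UTC time encoded in the first 8 hex digits of a CSN."""
    # Assumption: the CSN starts with a 32-bit Unix timestamp written in hex.
    seconds = int(csn[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# CSN taken from the log messages quoted in this thread.
print(csn_timestamp("576b34e8000a050f").isoformat())
```

For the CSN in the logs this prints 2016-06-23T01:01:28+00:00, which matches the June 2016 date mentioned above.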


I’ve also run the ipa_consistency_check script, and the output shows that 
something is indeed wrong with the sync:
“””
FreeIPA servers:    inf01        inf01        inf02        inf02        STATE
=============================================================================
Active Users        15           15           15           15           OK
Stage Users         0            0            0            0            OK
Preserved Users     3            3            3            3            OK
User Groups         9            9            9            9            OK
Hosts               45           45           45           46           FAIL
Host Groups         7            7            7            7            OK
HBAC Rules          6            6            6            6            OK
SUDO Rules          7            7            7            7            OK
DNS Zones           33           33           33           33           OK
LDAP Conflicts      NO           NO           NO           NO           OK
Ghost Replicas      2            2            2            2            FAIL
Anonymous BIND      YES          YES          YES          YES          OK
Replication Status  inf01.prod 0 inf01.dev 0  inf01.dev 0  inf01.dev 0
                    inf02.dev 0  inf02.dev 0  inf01.prod 0 inf01.prod 0
                    inf02.prod 0 inf02.prod 0 inf02.prod 0 inf02.dev 0
=============================================================================
“””

Thanks,
Goran


On May 15, 2017, at 6:35 AM, Ludwig Krispenz  wrote:

The messages you see could be transient messages, and if replication is working 
then this seems to be the case. If not, we would need more data to investigate: 
deployment info, replicaIDs of all servers, ruvs, logs.

Here is some background info: there are some scenarios where a csn could not be 
found in the changelog, e.g. if updates were applied on the supplier during a 
total init, they could be part of the data and database ruv, but not in the 
changelog of the initialized replica.
ds used to pick an alternative csn in cases where it could not be found, but 
this had the risk of missing updates, so we decided to change it and make this 
missing csn a non-fatal error: back off and retry. If another supplier 
has updated the replica in between, the starting csn could have changed and be 
found. So if the reported missing csns change and replication continues, 
everything is ok, although I think the messages should stop at some point.

There is a configuration parameter for a replication agreement to trigger the 
previous behaviour of picking an alternative csn:
nsds5ReplicaIgnoreMissingChange
with potential values "once", "always",

where "once" just tries to kickstart replication by using another csn and 
"always" changes the default behaviour.


On 05/11/2017 06:53 PM, Goran Marik wrote:

Hi,

After an upgrade to CentOS 7.3.1611 with "yum update", we started seeing the 
following messages in the logs:
“””
May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.519724479 +] 
NSMMReplicationPlugin - changelog program - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 
576b34e8000a050f not found, we aren't as up to date, or we purged
May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.550459233 +] 
NSMMReplicationPlugin - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data 
required to update replica has been purged from the changelog. The replica must be 
reinitialized.
May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.588245476 +] 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389) - Can't 
locate CSN 576b34e8000a050f in the changelog (DB rc=-30988). If replication stops, 
the consumer may need to be reinitialized.
May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.611400689 +] 
NSMMReplicationPlugin - changelog program - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 
576b34e8000a050f not found, we aren't as up to date, or we purged
May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.642226385 +] 
NSMMReplicationPlugin - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data 
required to update replica has been 

Re: [Freeipa-users] Replica cannot be reinitialized after upgrade

2017-05-18 Thread Goran Marik
Thanks Ludwig for the suggestion and thanks to Maciej for the confirmation from 
his end. This issue has been happening for us for several weeks, so I don’t think 
this is a transient problem.

What is the best way to sanitize the logs without removing useful info before 
sending them your way? Will the files mentioned on 
"https://www.freeipa.org/page/Files_to_be_attached_to_bug_report -> Directory 
server failed" be sufficient? 

I’ve also run the ipa_consistency_check script, and the output shows that 
something is indeed wrong with the sync:
“””
FreeIPA servers:    inf01        inf01        inf02        inf02        STATE
=============================================================================
Active Users        15           15           15           15           OK
Stage Users         0            0            0            0            OK
Preserved Users     3            3            3            3            OK
User Groups         9            9            9            9            OK
Hosts               45           45           45           46           FAIL
Host Groups         7            7            7            7            OK
HBAC Rules          6            6            6            6            OK
SUDO Rules          7            7            7            7            OK
DNS Zones           33           33           33           33           OK
LDAP Conflicts      NO           NO           NO           NO           OK
Ghost Replicas      2            2            2            2            FAIL
Anonymous BIND      YES          YES          YES          YES          OK
Replication Status  inf01.prod 0 inf01.dev 0  inf01.dev 0  inf01.dev 0
                    inf02.dev 0  inf02.dev 0  inf01.prod 0 inf01.prod 0
                    inf02.prod 0 inf02.prod 0 inf02.prod 0 inf02.dev 0
=============================================================================
“””
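
To pick out which server disagrees in a row such as Hosts, a comparison like the following could be used. This is only a sketch in Python and not how the consistency script itself derives the STATE column (Ghost Replicas is FAIL even though all four values agree); the function name odd_columns is hypothetical, and the rows are copied from the table above.

```python
from collections import Counter


def odd_columns(values):
    """Return indexes of columns whose value differs from the most common one."""
    most_common_value, _ = Counter(values).most_common(1)[0]
    return [i for i, v in enumerate(values) if v != most_common_value]


# Rows copied from the table above; columns are in the table's left-to-right order.
rows = {
    "Active Users": [15, 15, 15, 15],
    "Hosts": [45, 45, 45, 46],
    "DNS Zones": [33, 33, 33, 33],
}

for name, values in rows.items():
    cols = odd_columns(values)
    if cols:
        print(f"{name}: column(s) {cols} differ from the majority")
```

Column indexes start at 0, so the Hosts row points at the fourth server column.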

Thanks,
Goran

> On May 15, 2017, at 6:35 AM, Ludwig Krispenz  wrote:
> 
> The messages you see could be transient messages, and if replication is 
> working then this seems to be the case. If not, we would need more data to 
> investigate: deployment info, replicaIDs of all servers, ruvs, logs.
> 
> Here is some background info: there are some scenarios where a csn could not 
> be found in the changelog, e.g. if updates were applied on the supplier during a 
> total init, they could be part of the data and database ruv, but not in the 
> changelog of the initialized replica.
> ds used to pick an alternative csn in cases where it could not be found, 
> but this had the risk of missing updates, so we decided to change it and make 
> this missing csn a non-fatal error: back off and retry. If another supplier 
> has updated the replica in between, the starting csn could have 
> changed and be found. So if the reported missing csns change and replication 
> continues, everything is ok, although I think the messages should stop at some 
> point.
> 
> There is a configuration parameter for a replication agreement to trigger the 
> previous behaviour of picking an alternative csn: 
> nsds5ReplicaIgnoreMissingChange
> with potential values "once", "always",
> 
> where "once" just tries to kickstart replication by using another csn and 
> "always" changes the default behaviour.
> 
> 
> On 05/11/2017 06:53 PM, Goran Marik wrote:
>> Hi,
>> 
>> After an upgrade to CentOS 7.3.1611 with "yum update", we started seeing the 
>> following messages in the logs:
>> “””
>> May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.519724479 +] 
>> NSMMReplicationPlugin - changelog program - 
>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 
>> 576b34e8000a050f not found, we aren't as up to date, or we purged
>> May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.550459233 +] 
>> NSMMReplicationPlugin - 
>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data 
>> required to update replica has been purged from the changelog. The replica 
>> must be reinitialized.
>> May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.588245476 +] 
>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389) - 
>> Can't locate CSN 576b34e8000a050f in the changelog (DB rc=-30988). If 
>> replication stops, the consumer may need to be reinitialized.
>> May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.611400689 +] 
>> NSMMReplicationPlugin - changelog program - 
>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 
>> 576b34e8000a050f not found, we aren't as up to date, or we purged
>> May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.642226385 +] 
>> NSMMReplicationPlugin - 
>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data 
>> required to update replica has been purged from the changelog. The replica 
>> must be reinitialized.
>> “””
>> 
>> The log messages are pretty frequent, every few seconds, and report a few 
>> different CSN numbers that cannot be located.
>> 
>> This happens only on one replica out of 4. We’ve tried "ipa-replica-manage 
>> re-initialize --from" and "ipa-csreplica-manage re-initialize 

Re: [Freeipa-users] Replica cannot be reinitialized after upgrade

2017-05-15 Thread Ludwig Krispenz
The messages you see could be transient messages, and if replication is 
working then this seems to be the case. If not, we would need more data 
to investigate: deployment info, replicaIDs of all servers, ruvs, logs.


Here is some background info: there are some scenarios where a csn could 
not be found in the changelog, e.g. if updates were applied on the supplier 
during a total init, they could be part of the data and database ruv, 
but not in the changelog of the initialized replica.
ds used to pick an alternative csn in cases where it could not be 
found, but this had the risk of missing updates, so we decided to change 
it and make this missing csn a non-fatal error: back off and retry. If 
another supplier has updated the replica in between, the starting 
csn could have changed and be found. So if the reported missing csns 
change and replication continues, everything is ok, although I think the 
messages should stop at some point.
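
The check described above (do the reported missing csns change over time?) could be done roughly as follows. This is a Python sketch, not a 389-ds tool: the function name and the regular expressions are assumptions based only on the message wording quoted in this thread, and the sample lines are abbreviated, with the timestamps dropped.

```python
import re
from collections import Counter

# Assumed patterns, derived from the log wording quoted in this thread.
AGMT_RE = re.compile(r'agmt="([^"]+)"')
CSN_RE = re.compile(r"CSN ([0-9a-fA-F]+)")


def count_missing_csns(lines):
    """Count (agreement, CSN) pairs reported as missing from the changelog."""
    counts = Counter()
    for line in lines:
        # Only the two message types that name a specific CSN are of interest.
        if "not found" not in line and "Can't locate CSN" not in line:
            continue
        agmt = AGMT_RE.search(line)
        csn = CSN_RE.search(line)
        if agmt and csn:
            counts[(agmt.group(1), csn.group(1))] += 1
    return counts


AGMT = 'agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389)'
sample = [
    AGMT + ": CSN 576b34e8000a050f not found, we aren't as up to date, or we purged",
    AGMT + ": Data required to update replica has been purged from the changelog.",
    AGMT + " - Can't locate CSN 576b34e8000a050f in the changelog (DB rc=-30988).",
]

for (agreement, csn), n in count_missing_csns(sample).items():
    print(f"{agreement}: CSN {csn} reported {n} time(s)")
```

Run over the error log from different days, a csn that keeps recurring for the same agreement would point away from the transient case.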


There is a configuration parameter for a replication agreement to 
trigger the previous behaviour of picking an alternative csn:


nsds5ReplicaIgnoreMissingChange

with potential values "once", "always",

where "once" just tries to kickstart replication by using another csn 
and "always" changes the default behaviour.
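
As an illustration only, the attribute could be set with an LDIF modify record such as the one generated below. The attribute name and its two values come from the text above; the helper name and the example DN are hypothetical, and the DN must be replaced with that of the real replication agreement entry. Keep in mind the risk of missing updates mentioned earlier before using it.

```python
ALLOWED_VALUES = ("once", "always")


def ignore_missing_change_ldif(agreement_dn: str, value: str) -> str:
    """Build an LDIF modify record setting nsds5ReplicaIgnoreMissingChange."""
    if value not in ALLOWED_VALUES:
        raise ValueError(f"value must be one of {ALLOWED_VALUES}, got {value!r}")
    return "\n".join([
        f"dn: {agreement_dn}",
        "changetype: modify",
        "replace: nsds5ReplicaIgnoreMissingChange",
        f"nsds5ReplicaIgnoreMissingChange: {value}",
        "",
    ])


# Hypothetical DN: substitute the DN of the real replication agreement entry.
EXAMPLE_DN = "cn=example-agreement,cn=replica,cn=example-suffix,cn=mapping tree,cn=config"
print(ignore_missing_change_ldif(EXAMPLE_DN, "once"))
```

The printed record is in the form that ldapmodify reads; "once" is used here since it only tries to kickstart replication rather than changing the default behaviour.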



On 05/11/2017 06:53 PM, Goran Marik wrote:

Hi,

After an upgrade to CentOS 7.3.1611 with "yum update", we started seeing the 
following messages in the logs:
“””
May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.519724479 +] 
NSMMReplicationPlugin - changelog program - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 
576b34e8000a050f not found, we aren't as up to date, or we purged
May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.550459233 +] 
NSMMReplicationPlugin - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data 
required to update replica has been purged from the changelog. The replica must be 
reinitialized.
May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.588245476 +] 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389) - Can't 
locate CSN 576b34e8000a050f in the changelog (DB rc=-30988). If replication stops, 
the consumer may need to be reinitialized.
May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.611400689 +] 
NSMMReplicationPlugin - changelog program - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 
576b34e8000a050f not found, we aren't as up to date, or we purged
May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.642226385 +] 
NSMMReplicationPlugin - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data 
required to update replica has been purged from the changelog. The replica must be 
reinitialized.
“””

The log messages are pretty frequent, every few seconds, and report a few 
different CSN numbers that cannot be located.

This happens only on one replica out of 4. We’ve tried "ipa-replica-manage 
re-initialize --from" and "ipa-csreplica-manage re-initialize --from" several times, 
but while both commands report success, the log messages continue to appear. The 
server was rebooted and "systemctl restart ipa" was run a few times as well.

The replica seems to be working fine despite the errors, but I’m worried that 
the logs indicate an underlying problem we are not fully detecting. I would like 
to understand better what is triggering this behaviour and how to fix it, and 
whether anyone else has seen these messages after a recent upgrade.

The software versions are 389-ds-base-1.3.5.10-20.el7_3.x86_64 and 
ipa-server-4.4.0-14.el7.centos.7.x86_64

Thanks,
Goran

--
Goran Marik
Senior Systems Developer

ecobee
250 University Ave, Suite 400
Toronto, ON M5H 3E5





--
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric 
Shander

-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project

Re: [Freeipa-users] Replica cannot be reinitialized after upgrade

2017-05-15 Thread Maciej Drobniuch
Hi Goran

Exact same issue here with the same troubleshooting steps taken (I've tried
to reinitialize the replicas, with a success message) - no luck so far.

I've additionally run the ipa_check_consistency script:
FreeIPA servers:    ipa1      ipa2      ipa3      STATE
=======================================================
Active Users        37        37        37        OK
Stage Users         0         0         0         OK
Preserved Users     0         0         0         OK
User Groups         10        10        10        OK
Hosts               69        69        69        OK
Host Groups         7         7         7         OK
HBAC Rules          11        11        11        OK
SUDO Rules          1         1         1         OK
DNS Zones           8         8         8         OK
LDAP Conflicts      YES       YES       YES       FAIL
Ghost Replicas      NO        NO        NO        OK
Anonymous BIND      YES       YES       YES       OK
Replication Status  ipa2 18   ipa1 0    ipa1 0
                    ipa3 0
=======================================================

Besides this, the ipa master named-pkcs is sometimes crashing and ipa
fails to start.
I've rolled back to a backup from 1 week ago and it's starting, but I don't know how
long it will last.

IPA team please help.


# ipa --version
VERSION: 4.4.0, API_VERSION: 2.213

-- 
Best regards

Maciej Drobniuch
Network Security Engineer
Collective-Sense,LLC


On Thu, May 11, 2017 at 6:53 PM, Goran Marik  wrote:

> Hi,
>
> After an upgrade to CentOS 7.3.1611 with "yum update", we started seeing
> the following messages in the logs:
> “””
> May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.519724479
> +] NSMMReplicationPlugin - changelog program - agmt="cn=cloneAgreement1-
> inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 576b34e8000a050f
> not found, we aren't as up to date, or we purged
> May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.550459233
> +] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-
> inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data required to update
> replica has been purged from the changelog. The replica must be
> reinitialized.
> May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.588245476
> +] agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat"
> (inf02:389) - Can't locate CSN 576b34e8000a050f in the changelog (DB
> rc=-30988). If replication stops, the consumer may need to be reinitialized.
> May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.611400689
> +] NSMMReplicationPlugin - changelog program - agmt="cn=cloneAgreement1-
> inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 576b34e8000a050f
> not found, we aren't as up to date, or we purged
> May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.642226385
> +] NSMMReplicationPlugin - agmt="cn=cloneAgreement1-
> inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data required to update
> replica has been purged from the changelog. The replica must be
> reinitialized.
> “””
>
> The log messages are pretty frequent, every few seconds, and report a few
> different CSN numbers that cannot be located.
>
> This happens only on one replica out of 4. We’ve tried "ipa-replica-manage
> re-initialize --from" and "ipa-csreplica-manage re-initialize --from" several
> times, but while both commands report success, the log messages continue to
> appear. The server was rebooted and "systemctl restart ipa" was run a few
> times as well.
>
> The replica seems to be working fine despite the errors, but I’m worried
> that the logs indicate an underlying problem we are not fully detecting. I
> would like to understand better what is triggering this behaviour and how
> to fix it, and whether anyone else has seen these messages after a recent upgrade.
>
> The software versions are 389-ds-base-1.3.5.10-20.el7_3.x86_64 and
> ipa-server-4.4.0-14.el7.centos.7.x86_64
>
> Thanks,
> Goran
>
> --
> Goran Marik
> Senior Systems Developer
>
> ecobee
> 250 University Ave, Suite 400
> Toronto, ON M5H 3E5
>
>
>

[Freeipa-users] Replica cannot be reinitialized after upgrade

2017-05-11 Thread Goran Marik
Hi,

After an upgrade to CentOS 7.3.1611 with "yum update", we started seeing the 
following messages in the logs:
“””
May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.519724479 +] 
NSMMReplicationPlugin - changelog program - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 
576b34e8000a050f not found, we aren't as up to date, or we purged
May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.550459233 +] 
NSMMReplicationPlugin - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data 
required to update replica has been purged from the changelog. The replica must 
be reinitialized.
May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.588245476 +] 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389) - Can't 
locate CSN 576b34e8000a050f in the changelog (DB rc=-30988). If replication 
stops, the consumer may need to be reinitialized.
May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.611400689 +] 
NSMMReplicationPlugin - changelog program - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 
576b34e8000a050f not found, we aren't as up to date, or we purged
May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.642226385 +] 
NSMMReplicationPlugin - 
agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): Data 
required to update replica has been purged from the changelog. The replica must 
be reinitialized.
“””

The log messages are pretty frequent, every few seconds, and report a few 
different CSN numbers that cannot be located.

This happens only on one replica out of 4. We’ve tried "ipa-replica-manage 
re-initialize --from" and "ipa-csreplica-manage re-initialize --from" several 
times, but while both commands report success, the log messages continue to 
appear. The server was rebooted and "systemctl restart ipa" was run a few times 
as well.

The replica seems to be working fine despite the errors, but I’m worried that 
the logs indicate an underlying problem we are not fully detecting. I would like 
to understand better what is triggering this behaviour and how to fix it, and 
whether anyone else has seen these messages after a recent upgrade.

The software versions are 389-ds-base-1.3.5.10-20.el7_3.x86_64 and 
ipa-server-4.4.0-14.el7.centos.7.x86_64

Thanks,
Goran

--
Goran Marik
Senior Systems Developer

ecobee
250 University Ave, Suite 400
Toronto, ON M5H 3E5


