Thanks Ludwig. I’ve opened issue #6990 with the logs and files requested.

In the past few days I’ve managed to remove the stale replicas by running the 
cleanruv task via ldif, and I’ve tried to resync again a few times, but the 
errors still keep appearing in the logs. You mentioned the 
nsds5ReplicaIgnoreMissingChange option; could you specify the steps for 
setting/enabling that option?
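
My guess is that it is set per replication agreement with an ldapmodify along 
these lines (the agreement DN below is a made-up example and I haven’t tested 
this yet, so please confirm this is the right approach):
"""
ldapmodify -D "cn=directory manager" -W <<EOF
dn: cn=exampleAgreement,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
changetype: modify
replace: nsds5ReplicaIgnoreMissingChange
nsds5ReplicaIgnoreMissingChange: once
EOF
"""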

Thanks,
Goran


> On May 19, 2017, at 3:49 AM, Ludwig Krispenz <lkris...@redhat.com> wrote:
> 
> 
> On 05/18/2017 10:13 PM, Goran Marik wrote:
>> Thanks Ludwig for the suggestion, and thanks to Maciej for confirming from 
>> his end. This issue has been happening for us for several weeks, so I don’t 
>> think it is a transient problem.
>> 
>> What is the best way to sanitize the logs without removing useful info 
>> before sending them your way? Will the files mentioned on 
>> "https://www.freeipa.org/page/Files_to_be_attached_to_bug_report -> 
>> Directory server failed" be sufficient?
> yes, but we need some additional info on the replication config and state; 
> you could add /etc/dirsrv/slapd-*/dse.ldif
> and the result of this query:
> 
> ldapsearch -o ldif-wrap=no .................... -D "cn=directory manager" ... 
> -b "cn=config" "objectclass=nsds5replica" \* nsds50ruv
> 
> But looking again at the csn reported missing, it is from June 2016. So I 
> wonder if this is for a stale/removed replica and cleaning the ruvs would 
> help.
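> 
> Roughly, the cleanallruv task for a stale replica ID can be started with an 
> ldif like this (the suffix and replica ID here are placeholders; take the 
> real ID from the RUV output of the query above):
> 
> dn: cn=clean 25,cn=cleanallruv,cn=tasks,cn=config
> objectclass: extensibleObject
> replica-base-dn: dc=example,dc=com
> replica-id: 25
> replica-force-cleaning: no
> cn: clean 25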
>> 
>> I’ve also run the ipa_consistency_check script, and the output shows that 
>> something is indeed wrong with the sync:
>> “””
>> FreeIPA servers:    inf01    inf01    inf02    inf02    STATE
>> =============================================================
>> Active Users        15       15       15       15       OK
>> Stage Users         0        0        0        0        OK
>> Preserved Users     3        3        3        3        OK
>> User Groups         9        9        9        9        OK
>> Hosts               45       45       45       46       FAIL
>> Host Groups         7        7        7        7        OK
>> HBAC Rules          6        6        6        6        OK
>> SUDO Rules          7        7        7        7        OK
>> DNS Zones           33       33       33       33       OK
>> LDAP Conflicts      NO       NO       NO       NO       OK
>> Ghost Replicas      2        2        2        2        FAIL
>> Anonymous BIND      YES      YES      YES      YES      OK
>> Replication Status  inf01.prod 0inf01.dev 0inf01.dev 0inf01.dev 0
>>                     inf02.dev 0inf02.dev 0inf01.prod 0inf01.prod 0
>>                     inf02.prod 0inf02.prod 0inf02.prod 0inf02.dev 0
>> =============================================================
>> “””
>> 
>> Thanks,
>> Goran
>> 
>>> On May 15, 2017, at 6:35 AM, Ludwig Krispenz <lkris...@redhat.com> wrote:
>>> 
>>> The messages you see could be transient, and if replication is working, 
>>> then this seems to be the case. If not, we would need more data to 
>>> investigate: deployment info, replicaIDs of all servers, ruvs, logs, ...
>>> 
>>> Here is some background info: there are some scenarios in which a csn 
>>> cannot be found in the changelog, e.g. if updates were applied on the 
>>> supplier during a total init, they could be part of the data and database 
>>> ruv, but not in the changelog of the initialized replica.
>>> ds used to pick an alternative csn in cases where one could not be found, 
>>> but this carried the risk of missing updates, so we decided to change this 
>>> and make a missing csn a non-fatal error with backoff and retry: if another 
>>> supplier has updated the replica in the meantime, the starting csn can 
>>> change and then be found. So if the reported missing csns change and 
>>> replication continues, everything is ok, although I think the messages 
>>> should stop at some point.
>>> 
>>> There is a configuration parameter on a replication agreement that 
>>> triggers the previous behaviour of picking an alternative csn:
>>> nsds5ReplicaIgnoreMissingChange
>>> with the potential values "once" and "always",
>>> 
>>> where "once" just tries to kickstart replication by using another csn and 
>>> "always" changes the default behaviour.
>>> 
>>> 
>>> On 05/11/2017 06:53 PM, Goran Marik wrote:
>>>> Hi,
>>>> 
>>>> After an upgrade to CentOS 7.3.1611 with "yum update", we started seeing 
>>>> the following messages in the logs:
>>>> “””
>>>> May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.519724479 
>>>> +0000] NSMMReplicationPlugin - changelog program - 
>>>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 
>>>> 576b34e8000a050f0000 not found, we aren't as up to date, or we purged
>>>> May  9 21:58:28 inf01 ns-slapd[4323]: [09/May/2017:21:58:28.550459233 
>>>> +0000] NSMMReplicationPlugin - 
>>>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): 
>>>> Data required to update replica has been purged from the changelog. The 
>>>> replica must be reinitialized.
>>>> May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.588245476 
>>>> +0000] agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" 
>>>> (inf02:389) - Can't locate CSN 576b34e8000a050f0000 in the changelog (DB 
>>>> rc=-30988). If replication stops, the consumer may need to be 
>>>> reinitialized.
>>>> May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.611400689 
>>>> +0000] NSMMReplicationPlugin - changelog program - 
>>>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): CSN 
>>>> 576b34e8000a050f0000 not found, we aren't as up to date, or we purged
>>>> May  9 21:58:32 inf01 ns-slapd[4323]: [09/May/2017:21:58:32.642226385 
>>>> +0000] NSMMReplicationPlugin - 
>>>> agmt="cn=cloneAgreement1-inf02.dev.ecobee.com-pki-tomcat" (inf02:389): 
>>>> Data required to update replica has been purged from the changelog. The 
>>>> replica must be reinitialized.
>>>> “””
>>>> 
>>>> The log messages are pretty frequent, every few seconds, and they report a 
>>>> few different CSN numbers that cannot be located.
>>>> 
>>>> This happens only on one replica out of 4. We’ve tried "ipa-replica-manage 
>>>> re-initialize --from" and "ipa-csreplica-manage re-initialize --from" 
>>>> several times, but while both commands report success, the log messages 
>>>> keep coming. The server was rebooted and "systemctl restart ipa" was run a 
>>>> few times as well.
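>>>> 
>>>> For reference, I assume the per-agreement status can be checked with a 
>>>> query along these lines (attribute names taken from the 389-ds agreement 
>>>> schema; untested on our side):
>>>> """
>>>> ldapsearch -D "cn=directory manager" -W -b "cn=config" \
>>>>   "objectclass=nsds5replicationagreement" \
>>>>   nsds5replicaLastUpdateStatus nsds5replicaLastUpdateEnd
>>>> """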
>>>> 
>>>> The replica seems to be working fine despite the errors, but I’m worried 
>>>> that the logs indicate an underlying problem we are not fully detecting. I 
>>>> would like to better understand what is triggering this behaviour and how 
>>>> to fix it, and whether someone else has seen these messages after a recent 
>>>> upgrade.
>>>> 
>>>> The software versions are 389-ds-base-1.3.5.10-20.el7_3.x86_64 and 
>>>> ipa-server-4.4.0-14.el7.centos.7.x86_64
>>>> 
>>>> Thanks,
>>>> Goran
>>>> 
>>>> --
>>>> Goran Marik
>>>> Senior Systems Developer
>>>> 
>>>> ecobee
>>>> 250 University Ave, Suite 400
>>>> Toronto, ON M5H 3E5
>>>> 
>>>> 
>>>> 
>>>> 
>>> -- 
>>> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
>>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>>> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, 
>>> Eric Shander
>>> 
>> --
>> Goran Marik
>> Senior Systems Developer
>> 
>> ecobee
>> 250 University Ave, Suite 400
>> Toronto, ON M5H 3E5
>> 
>> 
> 
> -- 
> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
> Commercial register: Amtsgericht Muenchen, HRB 153243,
> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, 
> Eric Shander
> 

--
Goran Marik
Senior Systems Developer

ecobee
250 University Ave, Suite 400
Toronto, ON M5H 3E5

