[Freeipa-users] Re: Replication failing on some records

2017-06-12 Thread Nick Campion via FreeIPA-users
Thanks Mark,

This example is a user password change made with kinit: the password was
changed on freeipa02, but the change was not replicated to the other
servers. This happens for other records too, but I don't have examples of
those at the moment.

As far as I'm aware, there is no fractional replication set up.

Freeipa01:

# dynamic-kepler, users, accounts, ipa.example.com
dn: uid=dynamic-kepler,cn=users,cn=accounts,dc=ipa,dc=example,dc=com
uid: dynamic-kepler
krbLastPwdChange: 20170608170011Z
krbPasswordExpiration: 20170608170011Z

Freeipa02:

# dynamic-kepler, users, accounts, ipa.example.com
dn: uid=dynamic-kepler,cn=users,cn=accounts,dc=ipa,dc=example,dc=com
uid: dynamic-kepler
krbLastPwdChange: 20170608170021Z
krbPasswordExpiration: 20170906170021Z

Freeipa03:

# dynamic-kepler, users, accounts, ipa.example.com
dn: uid=dynamic-kepler,cn=users,cn=accounts,dc=ipa,dc=example,dc=com
uid: dynamic-kepler
krbLastPwdChange: 20170608170011Z
krbPasswordExpiration: 20170608170011Z
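
The three excerpts above can also be compared mechanically. What follows is a minimal Python sketch, assuming the attribute values are simply copied from the ldapsearch output quoted above; the helper names `parse_gt` and `odd_ones_out` are hypothetical and not part of any FreeIPA tool.

```python
from collections import Counter
from datetime import datetime, timezone

# Values copied from the ldapsearch output quoted above.
ENTRIES = {
    "freeipa01": {"krbLastPwdChange": "20170608170011Z",
                  "krbPasswordExpiration": "20170608170011Z"},
    "freeipa02": {"krbLastPwdChange": "20170608170021Z",
                  "krbPasswordExpiration": "20170906170021Z"},
    "freeipa03": {"krbLastPwdChange": "20170608170011Z",
                  "krbPasswordExpiration": "20170608170011Z"},
}


def parse_gt(value):
    """Parse an LDAP GeneralizedTime string of the form YYYYMMDDhhmmssZ."""
    return datetime.strptime(value, "%Y%m%d%H%M%SZ").replace(tzinfo=timezone.utc)


def odd_ones_out(entries, attr):
    """Return the servers whose value for attr differs from the most common one."""
    counts = Counter(entry[attr] for entry in entries.values())
    majority, _ = counts.most_common(1)[0]
    return sorted(s for s, entry in entries.items() if entry[attr] != majority)


if __name__ == "__main__":
    for attr in ("krbLastPwdChange", "krbPasswordExpiration"):
        print(attr, "differs on:", odd_ones_out(ENTRIES, attr))
    e = ENTRIES["freeipa02"]
    lifetime = parse_gt(e["krbPasswordExpiration"]) - parse_gt(e["krbLastPwdChange"])
    print("freeipa02 password lifetime:", lifetime.days, "days")
```

On freeipa01 and freeipa03 the expiration equals the last-change time, while on freeipa02 it is 90 days later, which is consistent with the change having been applied only on freeipa02.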

Errors on Freeipa02:

[08/Jun/2017:01:46:50.635529447 +] replica_generate_next_csn:
opcsn=5938ac8b00050003 <= basecsn=5938ac8b00050004, adjusted
opcsn=5938ac8b00060003
[08/Jun/2017:12:16:46.497249649 +] replica_generate_next_csn:
opcsn=5939402f00050003 <= basecsn=5939402f00080004, adjusted
opcsn=5939402f00090003
[08/Jun/2017:23:38:48.197750001 +] replica_generate_next_csn:
opcsn=5939e00900010003 <= basecsn=5939e009000f0004, adjusted
opcsn=5939e0090013

The other nodes have no errors from this data.
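
The CSNs in the messages above can be decoded to see when and on which replica they were generated. This is a minimal Python sketch, assuming the layout as it appears in these log lines: 8 hex digits of Unix timestamp, then 4 of sequence number, then 4 of replica ID (a complete 389 Directory Server CSN carries a further 4-digit sub-sequence field, which is not shown here). `decode_csn` is a hypothetical helper, not a directory server API.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Csn(NamedTuple):
    when: datetime      # time the change was stamped, in UTC
    seq: int            # sequence number within that second
    replica_id: int     # ID of the replica that generated the CSN


def decode_csn(csn: str) -> Csn:
    """Decode the first 16 hex digits of a CSN string."""
    if len(csn) < 16:
        # Truncated values (such as the last one quoted above) cannot be decoded.
        raise ValueError(f"CSN too short to decode: {csn!r}")
    when = datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc)
    return Csn(when, int(csn[8:12], 16), int(csn[12:16], 16))


if __name__ == "__main__":
    for label, value in (("opcsn", "5938ac8b00050003"),
                         ("basecsn", "5938ac8b00050004")):
        print(label, decode_csn(value))
```

For the first message, opcsn decodes to 2017-06-08 01:46:51 UTC, sequence 5, replica ID 3, and basecsn to the same second and sequence with replica ID 4. The timestamp is within a second of the time on the log line.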

Access logs:

Freeipa01:

[08/Jun/2017:01:46:50.635529447 +] replica_generate_next_csn:
opcsn=5938ac8b00050003 <= basecsn=5938ac8b00050004, adjusted
opcsn=5938ac8b00060003
[08/Jun/2017:12:16:46.497249649 +] replica_generate_next_csn:
opcsn=5939402f00050003 <= basecsn=5939402f00080004, adjusted
opcsn=5939402f00090003
[08/Jun/2017:23:38:48.197750001 +] replica_generate_next_csn:
opcsn=5939e00900010003 <= basecsn=5939e009000f0004, adjusted
opcsn=5939e0090013

Freeipa02:

Shows no logs "to" the other 2 nodes.

Freeipa03:

[08/Jun/2017:17:10:06.343697044 +] conn=9237 fd=70 slot=70
connection from 192.168.0.12 to 192.168.0.13
[08/Jun/2017:19:54:05.025713675 +] conn=9665 fd=70 slot=70
connection from 192.168.0.12 to 192.168.0.13
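
To check whether a given peer is opening connections at all, the access log can be filtered for these lines. This is a minimal Python sketch, assuming each entry sits on a single line in the real access log (the excerpts above are wrapped by the mail client); `connections_from` is a hypothetical helper, and the peer address passed in should be whichever address belongs to the suspect server.

```python
import re

# Matches access-log lines that record a newly accepted connection.
CONN_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] conn=(?P<conn>\d+) .*"
    r"connection from (?P<src>\S+) to (?P<dst>\S+)"
)


def connections_from(lines, peer):
    """Yield (timestamp, connection id) for each connection opened by peer."""
    for line in lines:
        m = CONN_RE.match(line)
        if m and m.group("src") == peer:
            yield m.group("ts"), int(m.group("conn"))


# The two entries quoted above, joined back onto single lines.
SAMPLE = [
    "[08/Jun/2017:17:10:06.343697044 +] conn=9237 fd=70 slot=70 "
    "connection from 192.168.0.12 to 192.168.0.13",
    "[08/Jun/2017:19:54:05.025713675 +] conn=9665 fd=70 slot=70 "
    "connection from 192.168.0.12 to 192.168.0.13",
]

if __name__ == "__main__":
    for ts, conn in connections_from(SAMPLE, "192.168.0.12"):
        print(ts, conn)
```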

Freeipa02 replication logging:

[09/Jun/2017:11:24:58.827281135 +] NSMMReplicationPlugin -
csnplCommitALL: processing data csn 593964af00090003

This repeats 800 to 900 times per second, each time with a different csn.
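
The repeat rate can be measured from the log rather than estimated by eye. This is a minimal Python sketch, assuming the errors log keeps one entry per line with the bracketed timestamp at the start; `per_second` is a hypothetical helper and the default search string is just the function name seen in the line above.

```python
import re
from collections import Counter

# Captures the timestamp down to whole seconds, e.g. 09/Jun/2017:11:24:58
STAMP_RE = re.compile(r"^\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})")


def per_second(lines, needle="csnplCommitALL"):
    """Count lines containing needle, grouped by the second they were logged."""
    counts = Counter()
    for line in lines:
        if needle in line:
            m = STAMP_RE.match(line)
            if m:
                counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    # The single entry quoted above, joined back onto one line.
    sample = [
        "[09/Jun/2017:11:24:58.827281135 +] NSMMReplicationPlugin - "
        "csnplCommitALL: processing data csn 593964af00090003",
    ]
    for second, count in sorted(per_second(sample).items()):
        print(second, count)
```

Passing an open file object for the errors log as `lines` gives the count for every second in the file.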

Full logs attached.


On 08/06/17 15:45, Mark Reynolds wrote:
>
>
> On 06/07/2017 10:58 AM, Nick Campion via FreeIPA-users wrote:
>>
>> Hi all,
>>
>> We have a 3 master setup that is failing to replicate changes from a
>> particular node to the other IPA instances. The replication status
>> says it's all fine, however the record hasn't been changed on the
>> other servers. We've seen this on user password changes, adding hosts
>> and services. The only thing we've found that seems to fix this
>> temporarily is to re-initialize from the master with the changed
>> record. A force-sync doesn't pick up the changed record.
>>
> What change are you making, and which attribute are you updating? Could
> it be excluded by fractional replication? Or does this affect all
> changes?
>
> Are there any errors in the logs on the nodes (good and bad)?
> /var/log/dirsrv/slapd-INSTANCE/errors
>
> Do you see replication sessions starting between the bad node and the
> good ones? Are they talking? Check the access log
> (/var/log/dirsrv/slapd-INSTANCE/access) on a good node and look for
> "connection from ".
>
> Next would be to enable replication logging on the bad node and
> reproduce the problem (then disable repl logging right away), then
> send us the logs to look at.  See 
> https://access.redhat.com/documentation/en-us/red_hat_directory_server/10/html/administration_guide/managing_replication-troubleshooting_replication_related_problems
>
> Regards,
> Mark
>
>> Not sure what logs would be helpful to diagnose what is happening in
>> this setup. 
>>
>> # ipa-replica-manage -v list `hostname`
>> freeipa03.mgmt.example.com: replica
>> last init status: None
>> last init ended: 1970-01-01 00:00:00+00:00
>> last update status: Error (0) Replica acquired successfully:
>> Incremental update succeeded
>> last update ended: 2017-06-07 14:43:53+00:00
>> freeipa02.mgmt.example.com: replica
>> last init status: None
>> last init ended: 1970-01-01 00:00:00+00:00
>> last update status: Error (0) Replica acquired successfully:
>> Incremental update succeeded
>> last update ended: 2017-06-07 14:43:53+00:00
>>
>> # ldapsearch -W -x -D "cn=directory manager" -b
>> "cn=users,cn=accounts,dc=ipa,dc=example,dc=com" "nsds5ReplConflict=*"
>> \* nsds5ReplConflict
>> Enter LDAP Password:
>> # extended LDIF
>> #
>> # LDAPv3
>> # base 

[Freeipa-users] Re: Replication failing on some records

2017-06-08 Thread Mark Reynolds via FreeIPA-users


On 06/07/2017 10:58 AM, Nick Campion via FreeIPA-users wrote:
>
> Hi all,
>
> We have a 3 master setup that is failing to replicate changes from a
> particular node to the other IPA instances. The replication status
> says it's all fine, however the record hasn't been changed on the
> other servers. We've seen this on user password changes, adding hosts
> and services. The only thing we've found that seems to fix this
> temporarily is to re-initialize from the master with the changed
> record. A force-sync doesn't pick up the changed record.
>
What change are you making, and which attribute are you updating? Could
it be excluded by fractional replication? Or does this affect all
changes?

Are there any errors in the logs on the nodes (good and bad)?
/var/log/dirsrv/slapd-INSTANCE/errors

Do you see replication sessions starting between the bad node and the
good ones? Are they talking? Check the access log
(/var/log/dirsrv/slapd-INSTANCE/access) on a good node and look for
"connection from ".

Next would be to enable replication logging on the bad node and
reproduce the problem (then disable repl logging right away), then send
us the logs to look at.  See 
https://access.redhat.com/documentation/en-us/red_hat_directory_server/10/html/administration_guide/managing_replication-troubleshooting_replication_related_problems

Regards,
Mark

> Not sure what logs would be helpful to diagnose what is happening in
> this setup. 
>
> # ipa-replica-manage -v list `hostname`
> freeipa03.mgmt.example.com: replica
> last init status: None
> last init ended: 1970-01-01 00:00:00+00:00
> last update status: Error (0) Replica acquired successfully:
> Incremental update succeeded
> last update ended: 2017-06-07 14:43:53+00:00
> freeipa02.mgmt.example.com: replica
> last init status: None
> last init ended: 1970-01-01 00:00:00+00:00
> last update status: Error (0) Replica acquired successfully:
> Incremental update succeeded
> last update ended: 2017-06-07 14:43:53+00:00
>
> # ldapsearch -W -x -D "cn=directory manager" -b
> "cn=users,cn=accounts,dc=ipa,dc=example,dc=com" "nsds5ReplConflict=*"
> \* nsds5ReplConflict
> Enter LDAP Password:
> # extended LDIF
> #
> # LDAPv3
> # base