Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs

2016-08-22 Thread John Desantis
Ludwig,

> I looked into the logs, I think the messages are harmless, just an effect of
> csn adjustment due to time difference on the two machines. I had said that
> the replication protocol will try to adjust the csn generator, but looks
> like you have long lasting replication connections and the adjustment is
> done only at the beginning. Maybe we can look into this and improve it.
> Just the tracking of one of these error messages:

Thank you for your insight and looking into these logs.  Given the
relative obscurity of results of this message within Google and this
mailing list, there may be nothing to improve!  In other words, it
looks like the issue has been resolved.

In a not so nutshell, I've been monitoring a ns-slapd thread that was
continually pegged with high file I/O via the 'pread' while reading
its db* files on the master server, which produced some latency.
After seemingly pointless searches, I stumbled upon Rich's "dbmon.sh"
via a post [1] and verified that some tuning was needed for our site.
After applying the changes I did notice that there was a drop in
memory pressure on the system and that there seemed to be less
latency, but the ns-slapd thread was still performing a lot of file
I/O.  It seems now that the issue with the timing was due to this
observed latency.  Anyways, I was still bugged with an issue I had
originally opened in my first post to the list [2] and finally
discovered that one of our replication nodes was culpable for not
responding to the 'ipa-replica-manage clean-ruv #' (stdout via this
command did not report which servers had and had not properly cleaned
the RUV).  I was able to manually remove it via 'ldapmodify' and
cleanruv.  At this point, some of the file I/O I had seen was more
than halved.  The last piece of the puzzle was using
"ipa-csreplica-manage" and 'ldapmodify' to remove [3] the CA
references to the replica mentioned in [1].  Once this was done, all
of the thread I/O stopped.

I then performed some testing of adding and removing DNS records via a
for loop, with and without sleep statements.  Not once did any more of
the replica_generate_next_csn messages appear.

For anyone else seeing similar issues, hopefully this information will help.

John DeSantis

[1] https://www.redhat.com/archives/freeipa-users/2014-November/msg00138.html
[2] https://www.redhat.com/archives/freeipa-users/2014-October/msg00283.html
[3] https://www.redhat.com/archives/freeipa-users/2015-March/msg00436.html

2016-08-22 5:41 GMT-04:00 Ludwig Krispenz :
> Thanks,
>
> I looked into the logs, I think the messages are harmless, just an effect of
> csn adjustment due to time difference on the two machines. I had said that
> the replication protocol will try to adjust the csn generator, but looks
> like you have long lasting replication connections and the adjustment is
> done only at the beginning. Maybe we can look into this and improve it.
> Just the tracking of one of these error messages:
>
> the entry is modified on adm3
> adm3 :[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 MOD
> dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
> adm3 :[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 RESULT err=0 tag=103
> nentries=0 etime=0 csn=57b763030016
> this mod is replicated to adm0
> adm0 :[19/Aug/2016:15:47:06 -0400] conn=1395 op=86121 MOD
> dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
> adm0 :[19/Aug/2016:15:47:06 -0400] conn=1395 op=86121 RESULT err=0 tag=103
> nentries=0 etime=0 csn=57b763030016
> the entry is modified again on adm0
> adm0 :[19/Aug/2016:15:47:07 -0400] conn=27 op=1108697 MOD
> dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
> but it looks like the csn generated is smaller than the one already in the
> entry, and it is adjusted
> adm0 :[19/Aug/2016:15:47:07 -0400] - replica_generate_next_csn:
> opcsn=57b76301000a0004 <= basecsn=57b763030016, adjusted
> opcsn=57b7630300010004
> then the result is logged with the adjusted csn
> adm0 :[19/Aug/2016:15:47:07 -0400] conn=27 op=1108697 RESULT err=0 tag=103
> nentries=0 etime=0 csn=57b7630300010004
>
> so the mechanism works, but the messages may be confusing and improvement of
> the protocol could be investigated.
>
> One question I have, but someone more familiar with dns should answer:
> we  have regular updates of the same entry on both replicas, about every 2
> seconds, what is the reason for this ?
>
>
> /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:03 -0400] conn=13
> op=126630 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
> /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:05 -0400] conn=13
> op=126637 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
> /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:07 -0400] conn=13
> op=126646 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
> /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:09 -0400] conn=13
> op=126653 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"

Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs

2016-08-22 Thread Ludwig Krispenz

Thanks,

I looked into the logs, I think the messages are harmless, just an 
effect of csn adjustment due to time difference on the two machines. I 
had said that the replication protocol will try to adjust the csn 
generator, but looks like you have long lasting replication connections 
and the adjustment is done only at the beginning. Maybe we can look into 
this and improve it.

Just the tracking of one of these error messages:

the entry is modified on adm3
adm3 :[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 MOD 
dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
adm3 :[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 RESULT err=0 
tag=103 nentries=0 etime=0 csn=57b763030016

this mod is replicated to adm0
adm0 :[19/Aug/2016:15:47:06 -0400] conn=1395 op=86121 MOD 
dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
adm0 :[19/Aug/2016:15:47:06 -0400] conn=1395 op=86121 RESULT err=0 
tag=103 nentries=0 etime=0 csn=57b763030016

the entry is modified again on adm0
adm0 :[19/Aug/2016:15:47:07 -0400] conn=27 op=1108697 MOD 
dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
but it looks like the csn generated is smaller than the one already in 
the entry, and it is adjusted
adm0 :[19/Aug/2016:15:47:07 -0400] - replica_generate_next_csn: 
opcsn=57b76301000a0004 <= basecsn=57b763030016, adjusted 
opcsn=57b7630300010004

then the result is logged with the adjusted csn
adm0 :[19/Aug/2016:15:47:07 -0400] conn=27 op=1108697 RESULT err=0 
tag=103 nentries=0 etime=0 csn=57b7630300010004


so the mechanism works, but the messages may be confusing and 
improvement of the protocol could be investigated.


One question I have, but someone more familiar with dns should answer:
we  have regular updates of the same entry on both replicas, about every 
2 seconds, what is the reason for this ?



/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:03 -0400] conn=13 
op=126630 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:05 -0400] conn=13 
op=126637 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:07 -0400] conn=13 
op=126646 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:09 -0400] conn=13 
op=126653 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:13 -0400] conn=13 
op=12 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:16 -0400] conn=13 
op=126673 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:18 -0400] conn=13 
op=126689 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:20 -0400] conn=13 
op=126696 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:21 -0400] conn=13 
op=126702 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:23 -0400] conn=13 
op=126737 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:26 -0400] conn=13 
op=126758 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
/tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:29 -0400] conn=13 
op=126801 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"




On 08/19/2016 10:00 PM, John Desantis wrote:

Ludwig,


you still only grep the replication connection, but before being replicated
the entry has to be added by some client connection, can you get all
references to the entry ?
the log snippet you provide shows also csns with tag=103, which indicate a
MOD, are these MODs for the added entries ? or other mods ?

.

I can't believe I did that!

Ok, so the logs have been rotated (I didn't think to adjust
logrotate..), so there aren't any logs to peruse for the case I've
presented so far.  However, I was able to reproduce the errors by
"bulk" deleting 39 DNS entries, and only the MASTER reported
"replica_generate_next_csn" entries.

Given the size of the logs, I think it would be pointless to do any
kind of sanitization.  I'll go ahead and gzip them for you and email
you off-list.

I've labeled them as MASTER and REPLICA.

John DeSantis


2016-08-19 9:18 GMT-04:00 Ludwig Krispenz :

On 08/18/2016 05:28 PM, John Desantis wrote:

Ludwig,


unfortunately this is not enough to determine what is going on. The
intersting generated/used csn is only logged in the
corresponding RESULT message and these are only the replication
connections,
it would be necessary to see the
original ADD operation, was it added once or twice by a client ?
you could pick one entry eg server-6-3-sp and grep for all references in
the
access logs of both servers  (maybe there are mods as well) and then
get also get the RESULT line for the ops found


Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs

2016-08-19 Thread John Desantis
Ludwig,

> you still only grep the replication connection, but before being replicated
> the entry has to be added by some client connection, can you get all
> references to the entry ?
> the log snippet you provide shows also csns with tag=103, which indicate a
> MOD, are these MODs for the added entries ? or other mods ?

.

I can't believe I did that!

Ok, so the logs have been rotated (I didn't think to adjust
logrotate..), so there aren't any logs to peruse for the case I've
presented so far.  However, I was able to reproduce the errors by
"bulk" deleting 39 DNS entries, and only the MASTER reported
"replica_generate_next_csn" entries.

Given the size of the logs, I think it would be pointless to do any
kind of sanitization.  I'll go ahead and gzip them for you and email
you off-list.

I've labeled them as MASTER and REPLICA.

John DeSantis


2016-08-19 9:18 GMT-04:00 Ludwig Krispenz :
>
> On 08/18/2016 05:28 PM, John Desantis wrote:
>>
>> Ludwig,
>>
>>> unfortunately this is not enough to determine what is going on. The
>>> intersting generated/used csn is only logged in the
>>> corresponding RESULT message and these are only the replication
>>> connections,
>>> it would be necessary to see the
>>> original ADD operation, was it added once or twice by a client ?
>>> you could pick one entry eg server-6-3-sp and grep for all references in
>>> the
>>> access logs of both servers  (maybe there are mods as well) and then
>>> get also get the RESULT line for the ops found
>>
>> Here are the updated log snippets looking for ADD and RESULT:
>
> you still only grep the replication connection, but before being replicated
> the entry has to be added by some client connection, can you get all
> references to the entry ?
> the log snippet you provide shows also csns with tag=103, which indicate a
> MOD, are these MODs for the added entries ? or other mods ?
>
>>
>> PROD:11:20:13-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM
>> # grep -E '17/Aug/2016:13:50:4.*conn=602.*(RESULT|ADD)' access.2016081*
>> access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4139
>> RESULT err=0 tag=120 nentries=0 etime=0
>> access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4140
>> RESULT err=0 tag=120 nentries=0 etime=0
>> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4141
>> RESULT err=0 tag=120 nentries=0 etime=0
>> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4142
>> RESULT err=0 tag=120 nentries=0 etime=0
>> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
>> ADD
>> dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
>> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
>> RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00030004
>> access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4144
>> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bb00040004
>> access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4145
>> RESULT err=0 tag=103 nentries=0 etime=0
>> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4146
>> RESULT err=0 tag=120 nentries=0 etime=0
>> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4147
>> RESULT err=0 tag=120 nentries=0 etime=0
>> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
>> ADD
>> dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
>> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
>> RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00080004
>> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4149
>> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bc00010004
>> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4150
>> RESULT err=0 tag=103 nentries=0 etime=0
>> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
>> ADD
>> dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
>> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
>> RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4c100050004
>> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4152
>> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c100060004
>>
>> PROD:11:19:54-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM
>> # grep -E '17/Aug/2016:13:50:4.*conn=1395.*(RESULT|ADD)' access.2016081*
>> access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4148
>> RESULT err=0 tag=120 nentries=0 etime=0
>> access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4149
>> RESULT err=0 tag=120 nentries=0 etime=0
>> access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4150
>> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4b900050016
>> access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
>> ADD
>> dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
>> 

Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs

2016-08-19 Thread Ludwig Krispenz


On 08/18/2016 05:28 PM, John Desantis wrote:

Ludwig,


unfortunately this is not enough to determine what is going on. The
intersting generated/used csn is only logged in the
corresponding RESULT message and these are only the replication connections,
it would be necessary to see the
original ADD operation, was it added once or twice by a client ?
you could pick one entry eg server-6-3-sp and grep for all references in the
access logs of both servers  (maybe there are mods as well) and then
get also get the RESULT line for the ops found

Here are the updated log snippets looking for ADD and RESULT:
you still only grep the replication connection, but before being 
replicated the entry has to be added by some client connection, can you 
get all references to the entry ?
the log snippet you provide shows also csns with tag=103, which indicate 
a MOD, are these MODs for the added entries ? or other mods ?


PROD:11:20:13-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=602.*(RESULT|ADD)' access.2016081*
access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4139
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4140
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4141
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4142
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00030004
access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4144
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bb00040004
access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4145
RESULT err=0 tag=103 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4146
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4147
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
ADD dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00080004
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4149
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bc00010004
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4150
RESULT err=0 tag=103 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4c100050004
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4152
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c100060004

PROD:11:19:54-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=1395.*(RESULT|ADD)' access.2016081*
access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4148
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4149
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4150
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4b900050016
access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
RESULT err=0 tag=105 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:44 -0400] conn=1395 op=4152
RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4bc0016
access.20160817-111940:[17/Aug/2016:13:50:46 -0400] conn=1395 op=4153
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4154
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4155
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4156
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:48 -0400] conn=1395 op=4157
RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4c100010016
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
RESULT err=0 tag=105 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4159
RESULT err=0 tag=103 

Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs

2016-08-18 Thread John Desantis
Ludwig,

> unfortunately this is not enough to determine what is going on. The
> intersting generated/used csn is only logged in the
> corresponding RESULT message and these are only the replication connections,
> it would be necessary to see the
> original ADD operation, was it added once or twice by a client ?
> you could pick one entry eg server-6-3-sp and grep for all references in the
> access logs of both servers  (maybe there are mods as well) and then
> get also get the RESULT line for the ops found

Here are the updated log snippets looking for ADD and RESULT:

PROD:11:20:13-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=602.*(RESULT|ADD)' access.2016081*
access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4139
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4140
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4141
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4142
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00030004
access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4144
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bb00040004
access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4145
RESULT err=0 tag=103 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4146
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4147
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
ADD dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00080004
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4149
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bc00010004
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4150
RESULT err=0 tag=103 nentries=0 etime=0
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4c100050004
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4152
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c100060004

PROD:11:19:54-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=1395.*(RESULT|ADD)' access.2016081*
access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4148
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4149
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4150
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4b900050016
access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
RESULT err=0 tag=105 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:44 -0400] conn=1395 op=4152
RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4bc0016
access.20160817-111940:[17/Aug/2016:13:50:46 -0400] conn=1395 op=4153
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4154
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4155
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4156
RESULT err=0 tag=120 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:48 -0400] conn=1395 op=4157
RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4c100010016
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
RESULT err=0 tag=105 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4159
RESULT err=0 tag=103 nentries=0 etime=0
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4160
RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c30016

I'm positive that I was the only one performing DNS updates during
this time, and I was only using 1 console.

Thanks,
John DeSantis


2016-08-18 10:09 GMT-04:00 Ludwig Krispenz 

Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs

2016-08-18 Thread Ludwig Krispenz


On 08/18/2016 03:15 PM, John Desantis wrote:

Ludwig,

Thank you for your response!


This is a normal scenario, but you could check if the simultaneous updates
on 4 and 16 are intentional.

In regards to the simultaneous updates, the only items I have noted so far are:

*  The time sync between the master (4) and replica (16) was off by
about 1-2 seconds, with the latter being ahead;
yes, this happens, but the replication protocol tries to handle this, in 
a replication session the supplier and consumer
exchange their ruvs and if the time differs the csn state generator is 
updated with a local or remote offset so that the
generated time is always based on the most advanced clock - on all 
servers. And even if you adjust the system time, the csn

time will never go back.

*  There are continual log entries referencing
"replication-multimaster-extop" and "Netscape Replication End Session"
in the dirsrv "access" logs, and during one of the manifestations of
"replica_generate_next_csn", I found this:

PROD:08:46:08-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=602.*ADD' access.2016081*
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
ADD dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"

PROD:08:47:44-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=1395.*ADD' access.2016081*
access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"

It looks like the entries for server-6-3-sp and 6-5-sp were referenced
twice.  Do you think that the time being off by 1-2 seconds between
the master and replica could be the issue?  The connection 602 is the
replication between the replica and master, and the connection 1395 is
the replication between the master and replica.
unfortunately this is not enough to determine what is going on. The 
intersting generated/used csn is only logged in the
corresponding RESULT message and these are only the replication 
connections, it would be necessary to see the

original ADD operation, was it added once or twice by a client ?
you could pick one entry eg server-6-3-sp and grep for all references in 
the access logs of both servers  (maybe there are mods as well) and then

get also get the RESULT line for the ops found


Since I know these operations were performed using the console via a
for loop 'ipa dnsrecord-add dom.dom.dom server-6-$i-sp
--a-rec=10.250.12.$i' on one of our login nodes, do you think that
specifying an _srv_ record in the DOMAIN configuration with the
address of the master server, e.g.: ipa_server = _srv_,
MASTER.dom.dom.dom could be the issue (coupled with the time syncing)?

I know that these questions are probably leaning more towards the
389ds team, so feel free to pass me over to them if need be.
I think I can address the ds related questions, but I don't know about 
console and dns to assess if the behaviour is normal


Again, thank you very much for responding!

John DeSantis

2016-08-18 4:14 GMT-04:00 Ludwig Krispenz :

On 08/17/2016 08:54 PM, John Desantis wrote:

Hello all,

We've been re-using old host names and IP addresses for a new
deployment of nodes, and recently I've been seeing the messages pasted
below in the slapd-DC.DC.DC "error" log on our nodes.

[17/Aug/2016:10:30:30 -0400] - replica_generate_next_csn:
opcsn=57b475cd00120004 <= basecsn=57b475cf0016, adjusted
opcsn=57b475cf00010004
[17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
opcsn=57b47f020004 <= basecsn=57b47f020016, adjusted
opcsn=57b47f030004
[17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
opcsn=57b47f040004 <= basecsn=57b47f040016, adjusted
opcsn=57b47f050004
[17/Aug/2016:11:10:33 -0400] - replica_generate_next_csn:
opcsn=57b47f2f0014 <= basecsn=57b47f320016, adjusted
opcsn=57b47f330004
[17/Aug/2016:13:50:45 -0400] - replica_generate_next_csn:
opcsn=57b4a4bb00090004 <= basecsn=57b4a4bc0016, adjusted
opcsn=57b4a4bc00010004
[17/Aug/2016:13:52:54 -0400] - replica_generate_next_csn:
opcsn=57b4a53e000a0004 <= basecsn=57b4a53f0016, adjusted
opcsn=57b4a53f00010004
[17/Aug/2016:13:53:15 -0400] - replica_generate_next_csn:
opcsn=57b4a55200070004 <= basecsn=57b4a5530016, adjusted
opcsn=57b4a55300010004
[17/Aug/2016:13:53:32 -0400] - replica_generate_next_csn:

Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs

2016-08-18 Thread John Desantis
Ludwig,

Thank you for your response!

> This is a normal scenario, but you could check if the simultaneous updates
> on 4 and 16 are intentional.

In regards to the simultaneous updates, the only items I have noted so far are:

*  The time sync between the master (4) and replica (16) was off by
about 1-2 seconds, with the latter being ahead;
*  There are continual log entries referencing
"replication-multimaster-extop" and "Netscape Replication End Session"
in the dirsrv "access" logs, and during one of the manifestations of
"replica_generate_next_csn", I found this:

PROD:08:46:08-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=602.*ADD' access.2016081*
access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
ADD dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"

PROD:08:47:44-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM
# grep -E '17/Aug/2016:13:50:4.*conn=1395.*ADD' access.2016081*
access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"

It looks like the entries for server-6-3-sp and 6-5-sp were referenced
twice.  Do you think that the time being off by 1-2 seconds between
the master and replica could be the issue?  The connection 602 is the
replication between the replica and master, and the connection 1395 is
the replication between the master and replica.

Since I know these operations were performed using the console via a
for loop 'ipa dnsrecord-add dom.dom.dom server-6-$i-sp
--a-rec=10.250.12.$i' on one of our login nodes, do you think that
specifying an _srv_ record in the DOMAIN configuration with the
address of the master server, e.g.: ipa_server = _srv_,
MASTER.dom.dom.dom could be the issue (coupled with the time syncing)?

I know that these questions are probably leaning more towards the
389ds team, so feel free to pass me over to them if need be.

Again, thank you very much for responding!

John DeSantis

2016-08-18 4:14 GMT-04:00 Ludwig Krispenz :
>
> On 08/17/2016 08:54 PM, John Desantis wrote:
>>
>> Hello all,
>>
>> We've been re-using old host names and IP addresses for a new
>> deployment of nodes, and recently I've been seeing the messages pasted
>> below in the slapd-DC.DC.DC "error" log on our nodes.
>>
>> [17/Aug/2016:10:30:30 -0400] - replica_generate_next_csn:
>> opcsn=57b475cd00120004 <= basecsn=57b475cf0016, adjusted
>> opcsn=57b475cf00010004
>> [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
>> opcsn=57b47f020004 <= basecsn=57b47f020016, adjusted
>> opcsn=57b47f030004
>> [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
>> opcsn=57b47f040004 <= basecsn=57b47f040016, adjusted
>> opcsn=57b47f050004
>> [17/Aug/2016:11:10:33 -0400] - replica_generate_next_csn:
>> opcsn=57b47f2f0014 <= basecsn=57b47f320016, adjusted
>> opcsn=57b47f330004
>> [17/Aug/2016:13:50:45 -0400] - replica_generate_next_csn:
>> opcsn=57b4a4bb00090004 <= basecsn=57b4a4bc0016, adjusted
>> opcsn=57b4a4bc00010004
>> [17/Aug/2016:13:52:54 -0400] - replica_generate_next_csn:
>> opcsn=57b4a53e000a0004 <= basecsn=57b4a53f0016, adjusted
>> opcsn=57b4a53f00010004
>> [17/Aug/2016:13:53:15 -0400] - replica_generate_next_csn:
>> opcsn=57b4a55200070004 <= basecsn=57b4a5530016, adjusted
>> opcsn=57b4a55300010004
>> [17/Aug/2016:13:53:32 -0400] - replica_generate_next_csn:
>> opcsn=57b4a56200090004 <= basecsn=57b4a5640016, adjusted
>> opcsn=57b4a56400010004
>
> Each modification (add/del/mod) gets a csn assignged used in replication
> update resolution. And each assigned csn has to newer than an existing one.
> The messages you see is from code that double checks that the entry doesn't
> have already a lareg csn - and adjusts it.
> The logs indicate that entries are more or less concurrently updated on
> replica 4 and 16, and the updates from16 are received while processing the
> updates on 4.
> This is a normal scenario, but you could check if the simultaneous updates
> on 4 and 16 are intentional.
>
>>
>> They seem to only occur when updating DNS entries, whether on the
>> console or via the GUI (tail -f'ing the log).
>>
>> A search in this mailing-list returns nothing, but a message is found
>> on the 389-ds list [1];  it seems to suggest that the messages aren't
>> fatal and are purely informational, yet if they are 

Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs

2016-08-18 Thread Ludwig Krispenz


On 08/17/2016 08:54 PM, John Desantis wrote:

Hello all,

We've been re-using old host names and IP addresses for a new
deployment of nodes, and recently I've been seeing the messages pasted
below in the slapd-DC.DC.DC "error" log on our nodes.

[17/Aug/2016:10:30:30 -0400] - replica_generate_next_csn:
opcsn=57b475cd00120004 <= basecsn=57b475cf0016, adjusted
opcsn=57b475cf00010004
[17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
opcsn=57b47f020004 <= basecsn=57b47f020016, adjusted
opcsn=57b47f030004
[17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
opcsn=57b47f040004 <= basecsn=57b47f040016, adjusted
opcsn=57b47f050004
[17/Aug/2016:11:10:33 -0400] - replica_generate_next_csn:
opcsn=57b47f2f0014 <= basecsn=57b47f320016, adjusted
opcsn=57b47f330004
[17/Aug/2016:13:50:45 -0400] - replica_generate_next_csn:
opcsn=57b4a4bb00090004 <= basecsn=57b4a4bc0016, adjusted
opcsn=57b4a4bc00010004
[17/Aug/2016:13:52:54 -0400] - replica_generate_next_csn:
opcsn=57b4a53e000a0004 <= basecsn=57b4a53f0016, adjusted
opcsn=57b4a53f00010004
[17/Aug/2016:13:53:15 -0400] - replica_generate_next_csn:
opcsn=57b4a55200070004 <= basecsn=57b4a5530016, adjusted
opcsn=57b4a55300010004
[17/Aug/2016:13:53:32 -0400] - replica_generate_next_csn:
opcsn=57b4a56200090004 <= basecsn=57b4a5640016, adjusted
opcsn=57b4a56400010004
Each modification (add/del/mod) gets a csn assignged used in replication 
update resolution. And each assigned csn has to newer than an existing 
one. The messages you see is from code that double checks that the entry 
doesn't have already a lareg csn - and adjusts it.
The logs indicate that entries are more or less concurrently updated on 
replica 4 and 16, and the updates from16 are received while processing 
the updates on 4.
This is a normal scenario, but you could check if the simultaneous 
updates on 4 and 16 are intentional.


They seem to only occur when updating DNS entries, whether on the
console or via the GUI (tail -f'ing the log).

A search in this mailing-list returns nothing, but a message is found
on the 389-ds list [1];  it seems to suggest that the messages aren't
fatal and are purely informational, yet if they are occurring
constantly that there could be a problem with the replication
algorithm and/or deployment.

We're using ipa-server 3.0.0-47 and 389-ds 1.2.11.15-60.  Nothing has
changed on the deployment side of things, and I don't recall seeing
this message before.

I'm wondering if it's safe to disregard these messages due to the
re-use of the entries, or if something else should be looked into.

Thank you,
John DeSantis

[1] https://fedorahosted.org/389/ticket/47959



--
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric 
Shander

--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project


[Freeipa-users] replica_generate_next_csn messages in dirsrv error logs

2016-08-17 Thread John Desantis
Hello all,

We've been re-using old host names and IP addresses for a new
deployment of nodes, and recently I've been seeing the messages pasted
below in the slapd-DC.DC.DC "error" log on our nodes.

[17/Aug/2016:10:30:30 -0400] - replica_generate_next_csn:
opcsn=57b475cd00120004 <= basecsn=57b475cf0016, adjusted
opcsn=57b475cf00010004
[17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
opcsn=57b47f020004 <= basecsn=57b47f020016, adjusted
opcsn=57b47f030004
[17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
opcsn=57b47f040004 <= basecsn=57b47f040016, adjusted
opcsn=57b47f050004
[17/Aug/2016:11:10:33 -0400] - replica_generate_next_csn:
opcsn=57b47f2f0014 <= basecsn=57b47f320016, adjusted
opcsn=57b47f330004
[17/Aug/2016:13:50:45 -0400] - replica_generate_next_csn:
opcsn=57b4a4bb00090004 <= basecsn=57b4a4bc0016, adjusted
opcsn=57b4a4bc00010004
[17/Aug/2016:13:52:54 -0400] - replica_generate_next_csn:
opcsn=57b4a53e000a0004 <= basecsn=57b4a53f0016, adjusted
opcsn=57b4a53f00010004
[17/Aug/2016:13:53:15 -0400] - replica_generate_next_csn:
opcsn=57b4a55200070004 <= basecsn=57b4a5530016, adjusted
opcsn=57b4a55300010004
[17/Aug/2016:13:53:32 -0400] - replica_generate_next_csn:
opcsn=57b4a56200090004 <= basecsn=57b4a5640016, adjusted
opcsn=57b4a56400010004

They seem to only occur when updating DNS entries, whether on the
console or via the GUI (tail -f'ing the log).

A search in this mailing-list returns nothing, but a message is found
on the 389-ds list [1];  it seems to suggest that the messages aren't
fatal and are purely informational, yet if they are occurring
constantly that there could be a problem with the replication
algorithm and/or deployment.

We're using ipa-server 3.0.0-47 and 389-ds 1.2.11.15-60.  Nothing has
changed on the deployment side of things, and I don't recall seeing
this message before.

I'm wondering if it's safe to disregard these messages due to the
re-use of the entries, or if something else should be looked into.

Thank you,
John DeSantis

[1] https://fedorahosted.org/389/ticket/47959

-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project