Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs
Ludwig, > I looked into the logs, I think the messages are harmless, just an effect of > csn adjustment due to time difference on the two machines. I had said that > the replication protocol will try to adjust the csn generator, but looks > like you have long lasting replication connections and the adjustment is > done only at the beginning. Maybe we can look into this and improve it. > Just the tracking of one of these error messages: Thank you for your insight and looking into these logs. Given the relative obscurity of results of this message within Google and this mailing list, there may be nothing to improve! In other words, it looks like the issue has been resolved. In a not so nutshell, I've been monitoring a ns-slapd thread that was continually pegged with high file I/O via the 'pread' while reading its db* files on the master server, which produced some latency. After seemingly pointless searches, I stumbled upon Rich's "dbmon.sh" via a post [1] and verified that some tuning was needed for our site. After applying the changes I did notice that there was a drop in memory pressure on the system and that there seemed to be less latency, but the ns-slapd thread was still performing a lot of file I/O. It seems now that the issue with the timing was due to this observed latency. Anyways, I was still bugged with an issue I had originally opened in my first post to the list [2] and finally discovered that one of our replication nodes was culpable for not responding to the 'ipa-replica-manage clean-ruv #' (stdout via this command did not report which servers had and had not properly cleaned the RUV). I was able to manually remove it via 'ldapmodify' and cleanruv. At this point, some of the file I/O I had seen was more than halved. The last piece of the puzzle was using "ipa-csreplica-manage" and 'ldapmodify' to remove [3] the CA references to the replica mentioned in [1]. Once this was done, all of the thread I/O stopped. I then performed some testing of adding and removing DNS records via a for loop, with and without sleep statements. Not once did any more of the replica_generate_next_csn messages appear. For anyone else seeing similar issues, hopefully this information will help. John DeSantis [1] https://www.redhat.com/archives/freeipa-users/2014-November/msg00138.html [2] https://www.redhat.com/archives/freeipa-users/2014-October/msg00283.html [3] https://www.redhat.com/archives/freeipa-users/2015-March/msg00436.html 2016-08-22 5:41 GMT-04:00 Ludwig Krispenz: > Thanks, > > I looked into the logs, I think the messages are harmless, just an effect of > csn adjustment due to time difference on the two machines. I had said that > the replication protocol will try to adjust the csn generator, but looks > like you have long lasting replication connections and the adjustment is > done only at the beginning. Maybe we can look into this and improve it. > Just the tracking of one of these error messages: > > the entry is modified on adm3 > adm3 :[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 MOD > dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" > adm3 :[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 RESULT err=0 tag=103 > nentries=0 etime=0 csn=57b763030016 > this mod is replicated to adm0 > adm0 :[19/Aug/2016:15:47:06 -0400] conn=1395 op=86121 MOD > dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" > adm0 :[19/Aug/2016:15:47:06 -0400] conn=1395 op=86121 RESULT err=0 tag=103 > nentries=0 etime=0 csn=57b763030016 > the entry is modified again on adm0 > adm0 :[19/Aug/2016:15:47:07 -0400] conn=27 op=1108697 MOD > dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" > but it looks like the csn generated is smaller than the one already in the > entry, and it is adjusted > adm0 :[19/Aug/2016:15:47:07 -0400] - replica_generate_next_csn: > opcsn=57b76301000a0004 <= basecsn=57b763030016, adjusted > opcsn=57b7630300010004 > then the result is logged with the adjusted csn > adm0 :[19/Aug/2016:15:47:07 -0400] conn=27 op=1108697 RESULT err=0 tag=103 > nentries=0 etime=0 csn=57b7630300010004 > > so the mechanism works, but the messages may be confusing and improvement of > the protocol could be investigated. > > One question I have, but someone more familiar with dns should answer: > we have regular updates of the same entry on both replicas, about every 2 > seconds, what is the reason for this ? > > > /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:03 -0400] conn=13 > op=126630 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" > /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:05 -0400] conn=13 > op=126637 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" > /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:07 -0400] conn=13 > op=126646 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" > /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:09 -0400] conn=13 > op=126653 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu"
Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs
Thanks, I looked into the logs, I think the messages are harmless, just an effect of csn adjustment due to time difference on the two machines. I had said that the replication protocol will try to adjust the csn generator, but looks like you have long lasting replication connections and the adjustment is done only at the beginning. Maybe we can look into this and improve it. Just the tracking of one of these error messages: the entry is modified on adm3 adm3 :[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" adm3 :[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b763030016 this mod is replicated to adm0 adm0 :[19/Aug/2016:15:47:06 -0400] conn=1395 op=86121 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" adm0 :[19/Aug/2016:15:47:06 -0400] conn=1395 op=86121 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b763030016 the entry is modified again on adm0 adm0 :[19/Aug/2016:15:47:07 -0400] conn=27 op=1108697 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" but it looks like the csn generated is smaller than the one already in the entry, and it is adjusted adm0 :[19/Aug/2016:15:47:07 -0400] - replica_generate_next_csn: opcsn=57b76301000a0004 <= basecsn=57b763030016, adjusted opcsn=57b7630300010004 then the result is logged with the adjusted csn adm0 :[19/Aug/2016:15:47:07 -0400] conn=27 op=1108697 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b7630300010004 so the mechanism works, but the messages may be confusing and improvement of the protocol could be investigated. One question I have, but someone more familiar with dns should answer: we have regular updates of the same entry on both replicas, about every 2 seconds, what is the reason for this ? /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:03 -0400] conn=13 op=126630 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:05 -0400] conn=13 op=126637 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:07 -0400] conn=13 op=126646 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:09 -0400] conn=13 op=126653 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:13 -0400] conn=13 op=12 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:16 -0400] conn=13 op=126673 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:18 -0400] conn=13 op=126689 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:20 -0400] conn=13 op=126696 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:21 -0400] conn=13 op=126702 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:23 -0400] conn=13 op=126737 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:26 -0400] conn=13 op=126758 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" /tmp/adm3-logs-del39-errors.txt:[19/Aug/2016:15:47:29 -0400] conn=13 op=126801 MOD dn="idnsname=rc.usf.edu,cn=dns,dc=rc,dc=usf,dc=edu" On 08/19/2016 10:00 PM, John Desantis wrote: Ludwig, you still only grep the replication connection, but before being replicated the entry has to be added by some client connection, can you get all references to the entry ? the log snippet you provide shows also csns with tag=103, which indicate a MOD, are these MODs for the added entries ? or other mods ? . I can't believe I did that! Ok, so the logs have been rotated (I didn't think to adjust logrotate..), so there aren't any logs to peruse for the case I've presented so far. However, I was able to reproduce the errors by "bulk" deleting 39 DNS entries, and only the MASTER reported "replica_generate_next_csn" entries. Given the size of the logs, I think it would be pointless to do any kind of sanitization. I'll go ahead and gzip them for you and email you off-list. I've labeled them as MASTER and REPLICA. John DeSantis 2016-08-19 9:18 GMT-04:00 Ludwig Krispenz: On 08/18/2016 05:28 PM, John Desantis wrote: Ludwig, unfortunately this is not enough to determine what is going on. The intersting generated/used csn is only logged in the corresponding RESULT message and these are only the replication connections, it would be necessary to see the original ADD operation, was it added once or twice by a client ? you could pick one entry eg server-6-3-sp and grep for all references in the access logs of both servers (maybe there are mods as well) and then get also get the RESULT line for the ops found
Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs
Ludwig, > you still only grep the replication connection, but before being replicated > the entry has to be added by some client connection, can you get all > references to the entry ? > the log snippet you provide shows also csns with tag=103, which indicate a > MOD, are these MODs for the added entries ? or other mods ? . I can't believe I did that! Ok, so the logs have been rotated (I didn't think to adjust logrotate..), so there aren't any logs to peruse for the case I've presented so far. However, I was able to reproduce the errors by "bulk" deleting 39 DNS entries, and only the MASTER reported "replica_generate_next_csn" entries. Given the size of the logs, I think it would be pointless to do any kind of sanitization. I'll go ahead and gzip them for you and email you off-list. I've labeled them as MASTER and REPLICA. John DeSantis 2016-08-19 9:18 GMT-04:00 Ludwig Krispenz: > > On 08/18/2016 05:28 PM, John Desantis wrote: >> >> Ludwig, >> >>> unfortunately this is not enough to determine what is going on. The >>> intersting generated/used csn is only logged in the >>> corresponding RESULT message and these are only the replication >>> connections, >>> it would be necessary to see the >>> original ADD operation, was it added once or twice by a client ? >>> you could pick one entry eg server-6-3-sp and grep for all references in >>> the >>> access logs of both servers (maybe there are mods as well) and then >>> get also get the RESULT line for the ops found >> >> Here are the updated log snippets looking for ADD and RESULT: > > you still only grep the replication connection, but before being replicated > the entry has to be added by some client connection, can you get all > references to the entry ? > the log snippet you provide shows also csns with tag=103, which indicate a > MOD, are these MODs for the added entries ? or other mods ? > >> >> PROD:11:20:13-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM >> # grep -E '17/Aug/2016:13:50:4.*conn=602.*(RESULT|ADD)' access.2016081* >> access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4139 >> RESULT err=0 tag=120 nentries=0 etime=0 >> access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4140 >> RESULT err=0 tag=120 nentries=0 etime=0 >> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4141 >> RESULT err=0 tag=120 nentries=0 etime=0 >> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4142 >> RESULT err=0 tag=120 nentries=0 etime=0 >> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143 >> ADD >> dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" >> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143 >> RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00030004 >> access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4144 >> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bb00040004 >> access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4145 >> RESULT err=0 tag=103 nentries=0 etime=0 >> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4146 >> RESULT err=0 tag=120 nentries=0 etime=0 >> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4147 >> RESULT err=0 tag=120 nentries=0 etime=0 >> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148 >> ADD >> dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" >> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148 >> RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00080004 >> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4149 >> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bc00010004 >> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4150 >> RESULT err=0 tag=103 nentries=0 etime=0 >> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151 >> ADD >> dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" >> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151 >> RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4c100050004 >> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4152 >> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c100060004 >> >> PROD:11:19:54-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM >> # grep -E '17/Aug/2016:13:50:4.*conn=1395.*(RESULT|ADD)' access.2016081* >> access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4148 >> RESULT err=0 tag=120 nentries=0 etime=0 >> access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4149 >> RESULT err=0 tag=120 nentries=0 etime=0 >> access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4150 >> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4b900050016 >> access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151 >> ADD >> dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" >>
Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs
On 08/18/2016 05:28 PM, John Desantis wrote: Ludwig, unfortunately this is not enough to determine what is going on. The intersting generated/used csn is only logged in the corresponding RESULT message and these are only the replication connections, it would be necessary to see the original ADD operation, was it added once or twice by a client ? you could pick one entry eg server-6-3-sp and grep for all references in the access logs of both servers (maybe there are mods as well) and then get also get the RESULT line for the ops found Here are the updated log snippets looking for ADD and RESULT: you still only grep the replication connection, but before being replicated the entry has to be added by some client connection, can you get all references to the entry ? the log snippet you provide shows also csns with tag=103, which indicate a MOD, are these MODs for the added entries ? or other mods ? PROD:11:20:13-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM # grep -E '17/Aug/2016:13:50:4.*conn=602.*(RESULT|ADD)' access.2016081* access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4139 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4140 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4141 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4142 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143 ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143 RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00030004 access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4144 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bb00040004 access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4145 RESULT err=0 tag=103 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4146 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4147 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148 ADD dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148 RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00080004 access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4149 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bc00010004 access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4150 RESULT err=0 tag=103 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151 ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151 RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4c100050004 access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4152 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c100060004 PROD:11:19:54-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM # grep -E '17/Aug/2016:13:50:4.*conn=1395.*(RESULT|ADD)' access.2016081* access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4148 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4149 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4150 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4b900050016 access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151 ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151 RESULT err=0 tag=105 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:44 -0400] conn=1395 op=4152 RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4bc0016 access.20160817-111940:[17/Aug/2016:13:50:46 -0400] conn=1395 op=4153 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4154 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4155 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4156 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:48 -0400] conn=1395 op=4157 RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4c100010016 access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158 ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158 RESULT err=0 tag=105 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4159 RESULT err=0 tag=103
Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs
Ludwig, > unfortunately this is not enough to determine what is going on. The > intersting generated/used csn is only logged in the > corresponding RESULT message and these are only the replication connections, > it would be necessary to see the > original ADD operation, was it added once or twice by a client ? > you could pick one entry eg server-6-3-sp and grep for all references in the > access logs of both servers (maybe there are mods as well) and then > get also get the RESULT line for the ops found Here are the updated log snippets looking for ADD and RESULT: PROD:11:20:13-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM # grep -E '17/Aug/2016:13:50:4.*conn=602.*(RESULT|ADD)' access.2016081* access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4139 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4140 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4141 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4142 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143 ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143 RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00030004 access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4144 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bb00040004 access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4145 RESULT err=0 tag=103 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4146 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4147 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148 ADD dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148 RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb00080004 access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4149 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bc00010004 access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4150 RESULT err=0 tag=103 nentries=0 etime=0 access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151 ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151 RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4c100050004 access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4152 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c100060004 PROD:11:19:54-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM # grep -E '17/Aug/2016:13:50:4.*conn=1395.*(RESULT|ADD)' access.2016081* access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4148 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4149 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4150 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4b900050016 access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151 ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151 RESULT err=0 tag=105 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:44 -0400] conn=1395 op=4152 RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4bc0016 access.20160817-111940:[17/Aug/2016:13:50:46 -0400] conn=1395 op=4153 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4154 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4155 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4156 RESULT err=0 tag=120 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:48 -0400] conn=1395 op=4157 RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4c100010016 access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158 ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158 RESULT err=0 tag=105 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4159 RESULT err=0 tag=103 nentries=0 etime=0 access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4160 RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c30016 I'm positive that I was the only one performing DNS updates during this time, and I was only using 1 console. Thanks, John DeSantis 2016-08-18 10:09 GMT-04:00 Ludwig Krispenz
Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs
On 08/18/2016 03:15 PM, John Desantis wrote: Ludwig, Thank you for your response! This is a normal scenario, but you could check if the simultaneous updates on 4 and 16 are intentional. In regards to the simultaneous updates, the only items I have noted so far are: * The time sync between the master (4) and replica (16) was off by about 1-2 seconds, with the latter being ahead; yes, this happens, but the replication protocol tries to handle this, in a replication session the supplier and consumer exchange their ruvs and if the time differs the csn state generator is updated with a local or remote offset so that the generated time is always based on the most advanced clock - on all servers. And even if you adjust the system time, the csn time will never go back. * There are continual log entries referencing "replication-multimaster-extop" and "Netscape Replication End Session" in the dirsrv "access" logs, and during one of the manifestations of "replica_generate_next_csn", I found this: PROD:08:46:08-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM # grep -E '17/Aug/2016:13:50:4.*conn=602.*ADD' access.2016081* access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143 ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148 ADD dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151 ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" PROD:08:47:44-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM # grep -E '17/Aug/2016:13:50:4.*conn=1395.*ADD' access.2016081* access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151 ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158 ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" It looks like the entries for server-6-3-sp and 6-5-sp were referenced twice. Do you think that the time being off by 1-2 seconds between the master and replica could be the issue? The connection 602 is the replication between the replica and master, and the connection 1395 is the replication between the master and replica. unfortunately this is not enough to determine what is going on. The intersting generated/used csn is only logged in the corresponding RESULT message and these are only the replication connections, it would be necessary to see the original ADD operation, was it added once or twice by a client ? you could pick one entry eg server-6-3-sp and grep for all references in the access logs of both servers (maybe there are mods as well) and then get also get the RESULT line for the ops found Since I know these operations were performed using the console via a for loop 'ipa dnsrecord-add dom.dom.dom server-6-$i-sp --a-rec=10.250.12.$i' on one of our login nodes, do you think that specifying an _srv_ record in the DOMAIN configuration with the address of the master server, e.g.: ipa_server = _srv_, MASTER.dom.dom.dom could be the issue (coupled with the time syncing)? I know that these questions are probably leaning more towards the 389ds team, so feel free to pass me over to them if need be. I think I can address the ds related questions, but I don't know about console and dns to assess if the behaviour is normal Again, thank you very much for responding! John DeSantis 2016-08-18 4:14 GMT-04:00 Ludwig Krispenz: On 08/17/2016 08:54 PM, John Desantis wrote: Hello all, We've been re-using old host names and IP addresses for a new deployment of nodes, and recently I've been seeing the messages pasted below in the slapd-DC.DC.DC "error" log on our nodes. [17/Aug/2016:10:30:30 -0400] - replica_generate_next_csn: opcsn=57b475cd00120004 <= basecsn=57b475cf0016, adjusted opcsn=57b475cf00010004 [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn: opcsn=57b47f020004 <= basecsn=57b47f020016, adjusted opcsn=57b47f030004 [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn: opcsn=57b47f040004 <= basecsn=57b47f040016, adjusted opcsn=57b47f050004 [17/Aug/2016:11:10:33 -0400] - replica_generate_next_csn: opcsn=57b47f2f0014 <= basecsn=57b47f320016, adjusted opcsn=57b47f330004 [17/Aug/2016:13:50:45 -0400] - replica_generate_next_csn: opcsn=57b4a4bb00090004 <= basecsn=57b4a4bc0016, adjusted opcsn=57b4a4bc00010004 [17/Aug/2016:13:52:54 -0400] - replica_generate_next_csn: opcsn=57b4a53e000a0004 <= basecsn=57b4a53f0016, adjusted opcsn=57b4a53f00010004 [17/Aug/2016:13:53:15 -0400] - replica_generate_next_csn: opcsn=57b4a55200070004 <= basecsn=57b4a5530016, adjusted opcsn=57b4a55300010004 [17/Aug/2016:13:53:32 -0400] - replica_generate_next_csn:
Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs
Ludwig, Thank you for your response! > This is a normal scenario, but you could check if the simultaneous updates > on 4 and 16 are intentional. In regards to the simultaneous updates, the only items I have noted so far are: * The time sync between the master (4) and replica (16) was off by about 1-2 seconds, with the latter being ahead; * There are continual log entries referencing "replication-multimaster-extop" and "Netscape Replication End Session" in the dirsrv "access" logs, and during one of the manifestations of "replica_generate_next_csn", I found this: PROD:08:46:08-root@REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM # grep -E '17/Aug/2016:13:50:4.*conn=602.*ADD' access.2016081* access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143 ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148 ADD dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151 ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" PROD:08:47:44-root@MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM # grep -E '17/Aug/2016:13:50:4.*conn=1395.*ADD' access.2016081* access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151 ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158 ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom" It looks like the entries for server-6-3-sp and 6-5-sp were referenced twice. Do you think that the time being off by 1-2 seconds between the master and replica could be the issue? The connection 602 is the replication between the replica and master, and the connection 1395 is the replication between the master and replica. Since I know these operations were performed using the console via a for loop 'ipa dnsrecord-add dom.dom.dom server-6-$i-sp --a-rec=10.250.12.$i' on one of our login nodes, do you think that specifying an _srv_ record in the DOMAIN configuration with the address of the master server, e.g.: ipa_server = _srv_, MASTER.dom.dom.dom could be the issue (coupled with the time syncing)? I know that these questions are probably leaning more towards the 389ds team, so feel free to pass me over to them if need be. Again, thank you very much for responding! John DeSantis 2016-08-18 4:14 GMT-04:00 Ludwig Krispenz: > > On 08/17/2016 08:54 PM, John Desantis wrote: >> >> Hello all, >> >> We've been re-using old host names and IP addresses for a new >> deployment of nodes, and recently I've been seeing the messages pasted >> below in the slapd-DC.DC.DC "error" log on our nodes. >> >> [17/Aug/2016:10:30:30 -0400] - replica_generate_next_csn: >> opcsn=57b475cd00120004 <= basecsn=57b475cf0016, adjusted >> opcsn=57b475cf00010004 >> [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn: >> opcsn=57b47f020004 <= basecsn=57b47f020016, adjusted >> opcsn=57b47f030004 >> [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn: >> opcsn=57b47f040004 <= basecsn=57b47f040016, adjusted >> opcsn=57b47f050004 >> [17/Aug/2016:11:10:33 -0400] - replica_generate_next_csn: >> opcsn=57b47f2f0014 <= basecsn=57b47f320016, adjusted >> opcsn=57b47f330004 >> [17/Aug/2016:13:50:45 -0400] - replica_generate_next_csn: >> opcsn=57b4a4bb00090004 <= basecsn=57b4a4bc0016, adjusted >> opcsn=57b4a4bc00010004 >> [17/Aug/2016:13:52:54 -0400] - replica_generate_next_csn: >> opcsn=57b4a53e000a0004 <= basecsn=57b4a53f0016, adjusted >> opcsn=57b4a53f00010004 >> [17/Aug/2016:13:53:15 -0400] - replica_generate_next_csn: >> opcsn=57b4a55200070004 <= basecsn=57b4a5530016, adjusted >> opcsn=57b4a55300010004 >> [17/Aug/2016:13:53:32 -0400] - replica_generate_next_csn: >> opcsn=57b4a56200090004 <= basecsn=57b4a5640016, adjusted >> opcsn=57b4a56400010004 > > Each modification (add/del/mod) gets a csn assignged used in replication > update resolution. And each assigned csn has to newer than an existing one. > The messages you see is from code that double checks that the entry doesn't > have already a lareg csn - and adjusts it. > The logs indicate that entries are more or less concurrently updated on > replica 4 and 16, and the updates from16 are received while processing the > updates on 4. > This is a normal scenario, but you could check if the simultaneous updates > on 4 and 16 are intentional. > >> >> They seem to only occur when updating DNS entries, whether on the >> console or via the GUI (tail -f'ing the log). >> >> A search in this mailing-list returns nothing, but a message is found >> on the 389-ds list [1]; it seems to suggest that the messages aren't >> fatal and are purely informational, yet if they are
Re: [Freeipa-users] replica_generate_next_csn messages in dirsrv error logs
On 08/17/2016 08:54 PM, John Desantis wrote: Hello all, We've been re-using old host names and IP addresses for a new deployment of nodes, and recently I've been seeing the messages pasted below in the slapd-DC.DC.DC "error" log on our nodes. [17/Aug/2016:10:30:30 -0400] - replica_generate_next_csn: opcsn=57b475cd00120004 <= basecsn=57b475cf0016, adjusted opcsn=57b475cf00010004 [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn: opcsn=57b47f020004 <= basecsn=57b47f020016, adjusted opcsn=57b47f030004 [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn: opcsn=57b47f040004 <= basecsn=57b47f040016, adjusted opcsn=57b47f050004 [17/Aug/2016:11:10:33 -0400] - replica_generate_next_csn: opcsn=57b47f2f0014 <= basecsn=57b47f320016, adjusted opcsn=57b47f330004 [17/Aug/2016:13:50:45 -0400] - replica_generate_next_csn: opcsn=57b4a4bb00090004 <= basecsn=57b4a4bc0016, adjusted opcsn=57b4a4bc00010004 [17/Aug/2016:13:52:54 -0400] - replica_generate_next_csn: opcsn=57b4a53e000a0004 <= basecsn=57b4a53f0016, adjusted opcsn=57b4a53f00010004 [17/Aug/2016:13:53:15 -0400] - replica_generate_next_csn: opcsn=57b4a55200070004 <= basecsn=57b4a5530016, adjusted opcsn=57b4a55300010004 [17/Aug/2016:13:53:32 -0400] - replica_generate_next_csn: opcsn=57b4a56200090004 <= basecsn=57b4a5640016, adjusted opcsn=57b4a56400010004 Each modification (add/del/mod) gets a csn assignged used in replication update resolution. And each assigned csn has to newer than an existing one. The messages you see is from code that double checks that the entry doesn't have already a lareg csn - and adjusts it. The logs indicate that entries are more or less concurrently updated on replica 4 and 16, and the updates from16 are received while processing the updates on 4. This is a normal scenario, but you could check if the simultaneous updates on 4 and 16 are intentional. They seem to only occur when updating DNS entries, whether on the console or via the GUI (tail -f'ing the log). A search in this mailing-list returns nothing, but a message is found on the 389-ds list [1]; it seems to suggest that the messages aren't fatal and are purely informational, yet if they are occurring constantly that there could be a problem with the replication algorithm and/or deployment. We're using ipa-server 3.0.0-47 and 389-ds 1.2.11.15-60. Nothing has changed on the deployment side of things, and I don't recall seeing this message before. I'm wondering if it's safe to disregard these messages due to the re-use of the entries, or if something else should be looked into. Thank you, John DeSantis [1] https://fedorahosted.org/389/ticket/47959 -- Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric Shander -- Manage your subscription for the Freeipa-users mailing list: https://www.redhat.com/mailman/listinfo/freeipa-users Go to http://freeipa.org for more info on the project
[Freeipa-users] replica_generate_next_csn messages in dirsrv error logs
Hello all, We've been re-using old host names and IP addresses for a new deployment of nodes, and recently I've been seeing the messages pasted below in the slapd-DC.DC.DC "error" log on our nodes. [17/Aug/2016:10:30:30 -0400] - replica_generate_next_csn: opcsn=57b475cd00120004 <= basecsn=57b475cf0016, adjusted opcsn=57b475cf00010004 [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn: opcsn=57b47f020004 <= basecsn=57b47f020016, adjusted opcsn=57b47f030004 [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn: opcsn=57b47f040004 <= basecsn=57b47f040016, adjusted opcsn=57b47f050004 [17/Aug/2016:11:10:33 -0400] - replica_generate_next_csn: opcsn=57b47f2f0014 <= basecsn=57b47f320016, adjusted opcsn=57b47f330004 [17/Aug/2016:13:50:45 -0400] - replica_generate_next_csn: opcsn=57b4a4bb00090004 <= basecsn=57b4a4bc0016, adjusted opcsn=57b4a4bc00010004 [17/Aug/2016:13:52:54 -0400] - replica_generate_next_csn: opcsn=57b4a53e000a0004 <= basecsn=57b4a53f0016, adjusted opcsn=57b4a53f00010004 [17/Aug/2016:13:53:15 -0400] - replica_generate_next_csn: opcsn=57b4a55200070004 <= basecsn=57b4a5530016, adjusted opcsn=57b4a55300010004 [17/Aug/2016:13:53:32 -0400] - replica_generate_next_csn: opcsn=57b4a56200090004 <= basecsn=57b4a5640016, adjusted opcsn=57b4a56400010004 They seem to only occur when updating DNS entries, whether on the console or via the GUI (tail -f'ing the log). A search in this mailing-list returns nothing, but a message is found on the 389-ds list [1]; it seems to suggest that the messages aren't fatal and are purely informational, yet if they are occurring constantly that there could be a problem with the replication algorithm and/or deployment. We're using ipa-server 3.0.0-47 and 389-ds 1.2.11.15-60. Nothing has changed on the deployment side of things, and I don't recall seeing this message before. I'm wondering if it's safe to disregard these messages due to the re-use of the entries, or if something else should be looked into. Thank you, John DeSantis [1] https://fedorahosted.org/389/ticket/47959 -- Manage your subscription for the Freeipa-users mailing list: https://www.redhat.com/mailman/listinfo/freeipa-users Go to http://freeipa.org for more info on the project