Re: [Freeipa-users] Haunted servers?
Hello Alexander,

Cleanallruv can hang during the cleanup (depending on the task options and on whether the replicas are reachable). Did you try CLEANRUV? It is a more basic tool, but it should not fail to do the cleanup.

Before using cleanruv, you need to abort all pending cleanallruv tasks. Then, for each RID that you want to clean, you have to log on to each replica and run:

    dn: cn=replica,cn=suffix,cn=mapping tree,cn=config
    changetype: modify
    replace: nsds5task
    nsds5task:CLEANRUVRID

This task should succeed, but there is a possibility that a given RID resurrects if a replication session occurs before all the cleanRUV tasks are completed. So we may have to run cleanRUV a second time.

thanks
thierry

On 05/27/2015 11:06 AM, Alexander Frolushkin wrote:

For common information - we also have a ghost replica id:

    unable to decode: {replica 16} 548a81260010 548a81260010

and we are trying to get rid of it with the help of Red Hat support, but at this point - no luck...

WBR, Alexander Frolushkin

-Original Message-
From: freeipa-users-boun...@redhat.com [mailto:freeipa-users-boun...@redhat.com] On Behalf Of Janelle
Sent: Tuesday, May 26, 2015 8:56 PM
To: thierry bordaz; Martin Kosek
Cc: freeipa-users@redhat.com
Subject: Re: [Freeipa-users] Haunted servers?

On 5/26/15 7:04 AM, thierry bordaz wrote:
On 05/26/2015 08:47 AM, Martin Kosek wrote:
On 05/26/2015 12:20 AM, Janelle wrote:
On 5/24/15 3:12 AM, Janelle wrote:

And just like that, my haunted servers have all returned. I am going to just put a gun to my head and be done with it. :-( Why do things run perfectly and then suddenly ??? Logs show little to nothing, mostly because the servers are so busy, they have already rotated out.
    unable to decode {replica 16} 5535647200030010 5535647200030010
    unable to decode {replica 22} 55371e9e0016 553eec6400040016
    unable to decode {replica 23} 5545d61f00020017 555432430017
    unable to decode {replica 24} 554d53d30018 554d54a400020018
    unable to decode {replica 25} 554d78bf0019 555af30200040019
    unable to decode {replica 9} 55402c3900030009 55402c3900030009

Don't know what to do anymore. At my wit's end..

~J

So things are getting more interesting. Still trying to find the leaking server(s). Here is what I mean by that. As you see, I continue to find these -- BUT, notice a new symptom -- replica 9 does NOT show any other data - it is blank?

Hello Janelle,

Thanks for the update. So you worry that there might still be a rogue IPA replica injecting the wrong replica data? In any case, I bet Ludwig and Thierry will follow up on your thread; there is just a delay caused by the various public holidays and PTOs this week, and we need to rest before digging into the fun with RUVs - as you already know yourself :-)

    unable to decode {replica 16} 5535647200030010 5535647200030010
    unable to decode {replica 22} 55371e9e0016 553eec6400040016
    unable to decode {replica 24} 554d53d300010018 554d54a400020018
    unable to decode {replica 25} 554d78bf00020019 555af30200040019
    unable to decode {replica 9}

Now, if I delete these from a server using the ldapmodify method - they go away briefly, but then if I restart the server, they come back. Let me try to explain -- given a number of servers, say 8, if I use ldapmodify to delete from 1 of those, they seem to go away from maybe 4 of them -- but if I wait a few minutes, it is almost as though replication is re-adding these bad replicas from the servers that I have NOT deleted them from.

On each replica (master/replica) there is one RUV in the database and one RUV in the changelog. When cleanallruv succeeds, it clears both of them. All replicas should be reachable when you issue cleanallruv, so that it can clean the RUVs on all the replicas in almost a single operation. If some replicas are not reachable, they keep information about the cleaned RID and can later propagate those old RIDs to the rest of the replicas.

Ludwig managed to reproduce the issue with a quite complex test case (3 replicas and multiple cleanallruv). We have not yet identified how a cleaned replicaId can get resurrected. In parallel, we just reproduced it without a clear test case but in a 2-replica topology.

So my question is simple - is there something in the logs I can look for that would indicate the SOURCE of these bogus entries? Is the replica 9 with NO extra data any indication of something I could look for?

I guess that if I had the answer to your question, we would have understood the bug ..

A little more information to go on: I changed my password on a master (actually, the original master) and was able to log in to each replica within a few seconds with the new password. This tells me replication is working across all the servers. I also created a new account and it showed up on all the servers, again within 15-20 seconds. This tells me replication
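The per-replica CLEANRUV modify that thierry gives at the top of this message has to be repeated on every replica for every RID, so it may help to generate the LDIF text instead of typing it each time. The following is only a sketch: the function name `cleanruv_ldif` and its `replica_dn` parameter are hypothetical, and writing the task value as `CLEANRUV` followed directly by the numeric RID is an assumption about what "CLEANRUVRID" stands for. It builds text only and does not contact any server; check the result against your own replica entry before passing it to ldapmodify.

```python
def cleanruv_ldif(replica_dn: str, rid: int) -> str:
    """Build an LDIF modify that sets nsds5task to CLEANRUV<rid>.

    replica_dn is the DN of the replica configuration entry, written in the
    thread as "cn=replica,cn=suffix,cn=mapping tree,cn=config" (the real
    suffix component depends on the deployment).
    Assumption: the task value is "CLEANRUV" followed by the decimal RID.
    """
    if rid <= 0:
        raise ValueError("replica ID must be a positive integer, got %r" % rid)
    lines = [
        "dn: %s" % replica_dn,
        "changetype: modify",
        "replace: nsds5task",
        "nsds5task: CLEANRUV%d" % rid,
    ]
    # LDIF records are newline-terminated text.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder DN copied from the thread; replace "suffix" with your own.
    print(cleanruv_ldif("cn=replica,cn=suffix,cn=mapping tree,cn=config", 16))
```

Keeping the DN as a parameter avoids guessing how the suffix is escaped on a given server.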
Re: [Freeipa-users] Haunted servers?
On 05/28/2015 09:33 AM, Alexander Frolushkin wrote:

Hello! Thank you for this info. Things seem to be complicated for now...

We have this:

    unable to decode: {replica 16} 548a81260010 548a81260010

on all of our 17 servers. After launching cleanallruv, it disappeared from 16 servers and one server hung (any request addressed to LDAP just freezes, including ipactl status). After a dirsrv restart (via systemctl restart ipa), I found

    unable to decode: {replica 16} 548a81260010 548a81260010

on this server (and only on it). I ran cleanallruv and got it removed from this server, but right after that

    unable to decode: {replica 16} 548a81260010 548a81260010

reappeared on three other servers.

Hello,

Yes, this is exactly why cleanallruv is the first tool to use: it does the job on all replicas. When you restarted the hanging server, some (3) of them established a replication session with it and learned this old/invalid RUV element.

Janelle, Alexander, do you remember if you ran the command 'ipa-replica-manage del SERVER --force --clean' (with the options --force and --clean)?

thanks
thierry

Now I'm waiting for a response from support; they requested dirsrv logs from the hung server and from the servers where the error appeared again.

WBR, Alexander Frolushkin
Cell +79232508764
Work +79232507764
Re: [Freeipa-users] Haunted servers?
Hello! Thank you for this info. Things seem to be complicated for now...

We have this:

    unable to decode: {replica 16} 548a81260010 548a81260010

on all of our 17 servers. After launching cleanallruv, it disappeared from 16 servers and one server hung (any request addressed to LDAP just freezes, including ipactl status). After a dirsrv restart (via systemctl restart ipa), I found

    unable to decode: {replica 16} 548a81260010 548a81260010

on this server (and only on it). I ran cleanallruv and got it removed from this server, but right after that

    unable to decode: {replica 16} 548a81260010 548a81260010

reappeared on three other servers.

Now I'm waiting for a response from support; they requested dirsrv logs from the hung server and from the servers where the error appeared again.

WBR, Alexander Frolushkin
Cell +79232508764
Work +79232507764
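With 17 servers, the practical question after each cleanup round is which servers still list the ghost RID, since any one of them can hand it back to the others during a replication session. A minimal sketch of that bookkeeping follows. It assumes you have already collected, per server, the set of replica IDs reported as "unable to decode"; the function name and the host names in the usage example are hypothetical, and nothing here talks to a directory server.

```python
from typing import Dict, List, Set


def servers_still_holding(rid: int, rids_by_server: Dict[str, Set[int]]) -> List[str]:
    """Return the servers (sorted by name) whose ghost-RID set contains rid."""
    return sorted(
        server for server, rids in rids_by_server.items() if rid in rids
    )


if __name__ == "__main__":
    # Hypothetical snapshot after one cleanup round.
    snapshot = {
        "ipa01.example.test": set(),
        "ipa02.example.test": {16},
        "ipa03.example.test": {16, 22},
    }
    for server in servers_still_holding(16, snapshot):
        print("still lists replica 16:", server)
```

An empty result on one pass is not proof that the RID is gone for good; the thread reports RIDs reappearing a few minutes later, so the snapshot needs to be retaken.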
Re: [Freeipa-users] Haunted servers?
On 05/28/2015 12:36 PM, Alexander Frolushkin wrote:

Thank you again, Red Hat support directed me to do exactly the same. This removed my

    unable to decode: {replica 16} 548a81260010 548a81260010

from the rest of the servers. I will check all our servers for this again tomorrow :)

Alexander, this is good news. Hoping it will not resurrect from a forgotten replica or changelog ;-)

The problem is that we are still fighting to reproduce it, as there are certainly some dynamics around that bug. cleanruv is just an imperfect workaround.

thanks
thierry

Well, I'm not the only person who has privileges on our IPA servers, so I cannot completely guarantee nobody ran this command ('ipa-replica-manage del SERVER --force --clean', with the options --force and --clean), but after interrogation no one made a confession, including myself.

Ok.

thanks
thierry

WBR, Alexander Frolushkin
Cell +79232508764
Work +79232507764
Re: [Freeipa-users] Haunted servers?
Thank you again, Red Hat support directed me to do exactly the same. This removed my

    unable to decode: {replica 16} 548a81260010 548a81260010

from the rest of the servers. I will check all our servers for this again tomorrow :)

Well, I'm not the only person who has privileges on our IPA servers, so I cannot completely guarantee nobody ran this command ('ipa-replica-manage del SERVER --force --clean', with the options --force and --clean), but after interrogation no one made a confession, including myself.

WBR, Alexander Frolushkin
Cell +79232508764
Work +79232507764
Re: [Freeipa-users] Haunted servers?
Unfortunately, after a couple of minutes, the error came back on two of the three servers in a slightly changed form:

    # ipa-replica-manage list-ruv
    unable to decode: {replica 16}

Before cleanruv it looked like:

    # ipa-replica-manage list-ruv
    unable to decode: {replica 16} 548a81260010 548a81260010

And one server seems to be fixed completely.

WBR, Alexander Frolushkin
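The two forms of the "unable to decode" line shown in this message (with and without the trailing values) can be told apart mechanically when checking many servers. The sketch below parses only the line shapes quoted in this thread; the names `GhostRuv` and `parse_ghost_line` are made up for illustration, and the regular expression is an assumption based on these samples, not on any documented output format of ipa-replica-manage.

```python
import re
from typing import List, NamedTuple, Optional


class GhostRuv(NamedTuple):
    rid: int          # replica ID shown inside the braces
    csns: List[str]   # trailing hex values, empty for the "blank" form


# Matches "unable to decode: {replica 16} ..." with or without the colon.
_LINE = re.compile(r"unable to decode:?\s*\{replica\s+(\d+)\}\s*(.*)")


def parse_ghost_line(line: str) -> Optional[GhostRuv]:
    """Parse one line of list-ruv output; return None if it is not a ghost entry."""
    match = _LINE.search(line)
    if match is None:
        return None
    return GhostRuv(rid=int(match.group(1)), csns=match.group(2).split())


if __name__ == "__main__":
    for sample in (
        "unable to decode: {replica 16} 548a81260010 548a81260010",
        "unable to decode: {replica 16}",
    ):
        entry = parse_ghost_line(sample)
        state = "blank" if not entry.csns else "with values"
        print(entry.rid, state)
```

Treating the blank form as a separate state makes it easier to report, per server, whether a cleanruv round changed anything.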
Re: [Freeipa-users] Haunted servers?
For common information - we also have a ghost replica id: unable to decode: {replica 16} 548a81260010 548a81260010 and trying to get it away with help of Red Hat support, but at this point - no luck... WBR, Alexander Frolushkin -Original Message- From: freeipa-users-boun...@redhat.com [mailto:freeipa-users-boun...@redhat.com] On Behalf Of Janelle Sent: Tuesday, May 26, 2015 8:56 PM To: thierry bordaz; Martin Kosek Cc: freeipa-users@redhat.com Subject: Re: [Freeipa-users] Haunted servers? On 5/26/15 7:04 AM, thierry bordaz wrote: On 05/26/2015 08:47 AM, Martin Kosek wrote: On 05/26/2015 12:20 AM, Janelle wrote: On 5/24/15 3:12 AM, Janelle wrote: And just like that, my haunted servers have all returned. I am going to just put a gun to my head and be done with it. :-( Why do things run perfectly and then suddenly ??? Logs show little to nothing, mostly because the servers are so busy, they have already rotated out. unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 23} 5545d61f00020017 555432430017 unable to decode {replica 24} 554d53d30018 554d54a400020018 unable to decode {replica 25} 554d78bf0019 555af30200040019 unable to decode {replica 9} 55402c3900030009 55402c3900030009 Don't know what to do anymore. At my wit's end.. ~J So things are getting more interesting. Still trying to find the leaking server(s). here is what I mean by that. As you see, I continue to find these -- BUT, notice a new symptom -- replica 9 does NOT show any other data - it is blank? Hello Janelle, Thanks for update. So you worry that there might still be the rogue IPA replica that would be injecting the wrong replica data? 
In any case, I bet Ludwig and Thierry will follow up with your thread, there is just delay caused by the various public holidays and PTOs this week and we need to rest before digging into the fun with RUVs - as you already know yourself :-) unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 24} 554d53d300010018 554d54a400020018 unable to decode {replica 25} 554d78bf00020019 555af30200040019 unable to decode {replica 9} Now, if I delete these from a server using the ldapmodify method - they go away briefly, but then if I restart the server, they come back. Let me try to explain -- given a number of servers, say 8, if I user ldapmodify to delete from 1 of those, they seem to go away from maybe 4 of them -- but if I wait a few minutes, it is almost as though replication is re-adding these bad replicas from the servers that I have NOT deleted them from. On each replica (master/replica) there are one RUV in the database and one RUV in the changelog. When cleanallruv succeeds it clears both of them. All replica should be reachable when you issue cleanallruv, so that it can clean the RUVs on all the replicas in almost single operation. If some replica are not reachable, they keep information of about the cleaned RID and then can later propagate those old RID to the rest of the replica. Ludwig managed to reproduce the issue with a quite complex test case (3 replicas and multiple cleanallruv). We have not yet identified the reason how a cleaned replicaId can get resurrected. In parallel we just reproduced it without a clear test case but in a 2 replica topology. So my question is simple - is there something in the logs I can look for that would indicate the SOURCE of these bogus entries? Is the replica 9 with NO extra data any indication of something I could look for? I guess that if I have the answer to your question we would have understood the bug .. 
A little more information to go on: I changed my password on a master (actually, the original master) and was able to log in to each replica within a few seconds with the new password. This tells me replication is working across all the servers. I also created a new account and it showed up on all the servers, again within 15-20 seconds. This tells me replication is working just fine. I don't understand why the cleanallruv does not process across all the servers the same way. Baffling indeed. Perhaps the most important question -- do these bogus entries actually cause a problem? I mean, they don't seem to. What if I just ignored them? ~J -- Manage your subscription for the Freeipa-users mailing list: https://www.redhat.com/mailman/listinfo/freeipa-users Go to http://freeipa.org for more info on the project
Re: [Freeipa-users] Haunted servers?
On 5/26/15 7:04 AM, thierry bordaz wrote: On 05/26/2015 08:47 AM, Martin Kosek wrote: On 05/26/2015 12:20 AM, Janelle wrote: On 5/24/15 3:12 AM, Janelle wrote: And just like that, my haunted servers have all returned. I am going to just put a gun to my head and be done with it. :-( Why do things run perfectly and then suddenly ??? Logs show little to nothing, mostly because the servers are so busy, they have already rotated out. unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 23} 5545d61f00020017 555432430017 unable to decode {replica 24} 554d53d30018 554d54a400020018 unable to decode {replica 25} 554d78bf0019 555af30200040019 unable to decode {replica 9} 55402c3900030009 55402c3900030009 Don't know what to do anymore. At my wit's end.. ~J So things are getting more interesting. Still trying to find the leaking server(s). here is what I mean by that. As you see, I continue to find these -- BUT, notice a new symptom -- replica 9 does NOT show any other data - it is blank? Hello Janelle, Thanks for update. So you worry that there might still be the rogue IPA replica that would be injecting the wrong replica data? In any case, I bet Ludwig and Thierry will follow up with your thread, there is just delay caused by the various public holidays and PTOs this week and we need to rest before digging into the fun with RUVs - as you already know yourself :-) unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 24} 554d53d300010018 554d54a400020018 unable to decode {replica 25} 554d78bf00020019 555af30200040019 unable to decode {replica 9} Now, if I delete these from a server using the ldapmodify method - they go away briefly, but then if I restart the server, they come back. 
Let me try to explain -- given a number of servers, say 8, if I use ldapmodify to delete from 1 of those, they seem to go away from maybe 4 of them -- but if I wait a few minutes, it is almost as though replication is re-adding these bad replicas from the servers that I have NOT deleted them from. On each replica (master/replica) there is one RUV in the database and one RUV in the changelog. When cleanallruv succeeds it clears both of them. All replicas should be reachable when you issue cleanallruv, so that it can clean the RUVs on all the replicas in almost a single operation. If some replicas are not reachable, they keep information about the cleaned RID and can later propagate those old RIDs to the rest of the replicas. Ludwig managed to reproduce the issue with quite a complex test case (3 replicas and multiple cleanallruv). We have not yet identified how a cleaned replicaId can get resurrected. In parallel we just reproduced it without a clear test case but in a 2-replica topology. After spending well over 2 days trying to clean things -- I am now here:

CLEANALLRUV tasks
RID 16 Not all replicas finished cleaning, retrying in 14400 seconds
RID 19 None
RID 22 None

What is going on here? All the same data still exists as shown above in the original thread, but I seem to be stuck. I know I am not the only person having replica issues. Is there anything else I can try? ~J
Re: [Freeipa-users] Haunted servers?
On 05/26/2015 12:20 AM, Janelle wrote: On 5/24/15 3:12 AM, Janelle wrote: And just like that, my haunted servers have all returned. I am going to just put a gun to my head and be done with it. :-( Why do things run perfectly and then suddenly ??? Logs show little to nothing, mostly because the servers are so busy, they have already rotated out. unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 23} 5545d61f00020017 555432430017 unable to decode {replica 24} 554d53d30018 554d54a400020018 unable to decode {replica 25} 554d78bf0019 555af30200040019 unable to decode {replica 9} 55402c3900030009 55402c3900030009 Don't know what to do anymore. At my wit's end.. ~J So things are getting more interesting. Still trying to find the leaking server(s). here is what I mean by that. As you see, I continue to find these -- BUT, notice a new symptom -- replica 9 does NOT show any other data - it is blank? Hello Janelle, Thanks for update. So you worry that there might still be the rogue IPA replica that would be injecting the wrong replica data? In any case, I bet Ludwig and Thierry will follow up with your thread, there is just delay caused by the various public holidays and PTOs this week and we need to rest before digging into the fun with RUVs - as you already know yourself :-) unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 24} 554d53d300010018 554d54a400020018 unable to decode {replica 25} 554d78bf00020019 555af30200040019 unable to decode {replica 9} Now, if I delete these from a server using the ldapmodify method - they go away briefly, but then if I restart the server, they come back. 
Let me try to explain -- given a number of servers, say 8, if I use ldapmodify to delete from 1 of those, they seem to go away from maybe 4 of them -- but if I wait a few minutes, it is almost as though replication is re-adding these bad replicas from the servers that I have NOT deleted them from. So my question is simple - is there something in the logs I can look for that would indicate the SOURCE of these bogus entries? Is the replica 9 with NO extra data any indication of something I could look for? I am not willing to give up easily (as you might have already guessed) and I am determined to find the cause of these. I know we need more logs, but with all the traffic, the logs roll over within a few hours, and if the problem is happening at 3am for example, I am not able to track it down because the logs have rolled. Back to my investigations. ~J
Re: [Freeipa-users] Haunted servers?
On 05/26/2015 08:47 AM, Martin Kosek wrote: On 05/26/2015 12:20 AM, Janelle wrote: On 5/24/15 3:12 AM, Janelle wrote: And just like that, my haunted servers have all returned. I am going to just put a gun to my head and be done with it. :-( Why do things run perfectly and then suddenly ??? Logs show little to nothing, mostly because the servers are so busy, they have already rotated out. unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 23} 5545d61f00020017 555432430017 unable to decode {replica 24} 554d53d30018 554d54a400020018 unable to decode {replica 25} 554d78bf0019 555af30200040019 unable to decode {replica 9} 55402c3900030009 55402c3900030009 Don't know what to do anymore. At my wit's end.. ~J So things are getting more interesting. Still trying to find the leaking server(s). here is what I mean by that. As you see, I continue to find these -- BUT, notice a new symptom -- replica 9 does NOT show any other data - it is blank? Hello Janelle, Thanks for update. So you worry that there might still be the rogue IPA replica that would be injecting the wrong replica data? In any case, I bet Ludwig and Thierry will follow up with your thread, there is just delay caused by the various public holidays and PTOs this week and we need to rest before digging into the fun with RUVs - as you already know yourself :-) unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 24} 554d53d300010018 554d54a400020018 unable to decode {replica 25} 554d78bf00020019 555af30200040019 unable to decode {replica 9} Now, if I delete these from a server using the ldapmodify method - they go away briefly, but then if I restart the server, they come back. 
Let me try to explain -- given a number of servers, say 8, if I use ldapmodify to delete from 1 of those, they seem to go away from maybe 4 of them -- but if I wait a few minutes, it is almost as though replication is re-adding these bad replicas from the servers that I have NOT deleted them from. On each replica (master/replica) there is one RUV in the database and one RUV in the changelog. When cleanallruv succeeds it clears both of them. All replicas should be reachable when you issue cleanallruv, so that it can clean the RUVs on all the replicas in almost a single operation. If some replicas are not reachable, they keep information about the cleaned RID and can later propagate those old RIDs to the rest of the replicas. Ludwig managed to reproduce the issue with quite a complex test case (3 replicas and multiple cleanallruv). We have not yet identified how a cleaned replicaId can get resurrected. In parallel we just reproduced it without a clear test case but in a 2-replica topology. So my question is simple - is there something in the logs I can look for that would indicate the SOURCE of these bogus entries? Is the replica 9 with NO extra data any indication of something I could look for? I guess that if I had the answer to your question, we would have understood the bug. I am not willing to give up easily (as you might have already guessed) and I am determined to find the cause of these. I know we need more logs, but with all the traffic, the logs roll over within a few hours, and if the problem is happening at 3am for example, I am not able to track it down because the logs have rolled. Back to my investigations. ~J
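To make the explanation above concrete, here is a toy model of how a cleaned RID can come back when one replica is missed. It is only a sketch of the reasoning in this message, not how 389 Directory Server implements RUVs or CLEANALLRUV: each replica's RUV is reduced to a plain set of replica IDs, a "replication session" simply merges the supplier's set into the consumer's, and the host names ipa1..ipa3 are made up.

```python
def clean_rid(ruvs, rid, reachable):
    """Drop rid from the RUV of every replica we managed to reach."""
    for name in reachable:
        ruvs[name].discard(rid)


def replicate(ruvs, supplier, consumer):
    """Toy replication session: the consumer learns every RID the supplier knows."""
    ruvs[consumer] |= ruvs[supplier]


def demo():
    # Hypothetical three-replica topology, all still carrying the stale RID 16.
    ruvs = {name: {4, 9, 16} for name in ("ipa1", "ipa2", "ipa3")}

    # The cleanup reaches ipa1 and ipa2 only; ipa3 is missed.
    clean_rid(ruvs, 16, reachable=["ipa1", "ipa2"])
    cleaned_on_ipa1 = 16 not in ruvs["ipa1"]

    # A later session from ipa3 brings RID 16 back to ipa1.
    replicate(ruvs, "ipa3", "ipa1")
    return cleaned_on_ipa1, ruvs


if __name__ == "__main__":
    cleaned, ruvs = demo()
    print("RID 16 gone from ipa1 right after cleanup:", cleaned)
    print("RUVs after one session from ipa3:", ruvs)
```

In this model the stale RID only stays gone once it has been removed from every replica before any further session runs, which matches the symptom of entries vanishing from some servers and reappearing a few minutes later.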
Re: [Freeipa-users] Haunted servers?
On 5/26/15 7:04 AM, thierry bordaz wrote: On 05/26/2015 08:47 AM, Martin Kosek wrote: On 05/26/2015 12:20 AM, Janelle wrote: On 5/24/15 3:12 AM, Janelle wrote: And just like that, my haunted servers have all returned. I am going to just put a gun to my head and be done with it. :-( Why do things run perfectly and then suddenly ??? Logs show little to nothing, mostly because the servers are so busy, they have already rotated out. unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 23} 5545d61f00020017 555432430017 unable to decode {replica 24} 554d53d30018 554d54a400020018 unable to decode {replica 25} 554d78bf0019 555af30200040019 unable to decode {replica 9} 55402c3900030009 55402c3900030009 Don't know what to do anymore. At my wit's end.. ~J So things are getting more interesting. Still trying to find the leaking server(s). here is what I mean by that. As you see, I continue to find these -- BUT, notice a new symptom -- replica 9 does NOT show any other data - it is blank? Hello Janelle, Thanks for update. So you worry that there might still be the rogue IPA replica that would be injecting the wrong replica data? In any case, I bet Ludwig and Thierry will follow up with your thread, there is just delay caused by the various public holidays and PTOs this week and we need to rest before digging into the fun with RUVs - as you already know yourself :-) unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 24} 554d53d300010018 554d54a400020018 unable to decode {replica 25} 554d78bf00020019 555af30200040019 unable to decode {replica 9} Now, if I delete these from a server using the ldapmodify method - they go away briefly, but then if I restart the server, they come back. 
Let me try to explain -- given a number of servers, say 8, if I use ldapmodify to delete from 1 of those, they seem to go away from maybe 4 of them -- but if I wait a few minutes, it is almost as though replication is re-adding these bad replicas from the servers that I have NOT deleted them from. On each replica (master/replica) there is one RUV in the database and one RUV in the changelog. When cleanallruv succeeds it clears both of them. All replicas should be reachable when you issue cleanallruv, so that it can clean the RUVs on all the replicas in almost a single operation. If some replicas are not reachable, they keep information about the cleaned RID and can later propagate those old RIDs to the rest of the replicas. Ludwig managed to reproduce the issue with quite a complex test case (3 replicas and multiple cleanallruv). We have not yet identified how a cleaned replicaId can get resurrected. In parallel we just reproduced it without a clear test case but in a 2-replica topology. So my question is simple - is there something in the logs I can look for that would indicate the SOURCE of these bogus entries? Is the replica 9 with NO extra data any indication of something I could look for? I guess that if I had the answer to your question, we would have understood the bug. A little more information to go on: I changed my password on a master (actually, the original master) and was able to log in to each replica within a few seconds with the new password. This tells me replication is working across all the servers. I also created a new account and it showed up on all the servers, again within 15-20 seconds. This tells me replication is working just fine. I don't understand why the cleanallruv does not process across all the servers the same way. Baffling indeed. Perhaps the most important question -- do these bogus entries actually cause a problem? I mean, they don't seem to. What if I just ignored them?
~J
Re: [Freeipa-users] Haunted servers?
On 5/24/15 3:12 AM, Janelle wrote: And just like that, my haunted servers have all returned. I am going to just put a gun to my head and be done with it. :-( Why do things run perfectly and then suddenly ??? Logs show little to nothing, mostly because the servers are so busy, they have already rotated out. unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 23} 5545d61f00020017 555432430017 unable to decode {replica 24} 554d53d30018 554d54a400020018 unable to decode {replica 25} 554d78bf0019 555af30200040019 unable to decode {replica 9} 55402c3900030009 55402c3900030009 Don't know what to do anymore. At my wit's end.. ~J So things are getting more interesting. Still trying to find the leaking server(s). here is what I mean by that. As you see, I continue to find these -- BUT, notice a new symptom -- replica 9 does NOT show any other data - it is blank? unable to decode {replica 16} 5535647200030010 5535647200030010 unable to decode {replica 22} 55371e9e0016 553eec6400040016 unable to decode {replica 24} 554d53d300010018 554d54a400020018 unable to decode {replica 25} 554d78bf00020019 555af30200040019 unable to decode {replica 9} Now, if I delete these from a server using the ldapmodify method - they go away briefly, but then if I restart the server, they come back. Let me try to explain -- given a number of servers, say 8, if I use ldapmodify to delete from 1 of those, they seem to go away from maybe 4 of them -- but if I wait a few minutes, it is almost as though replication is re-adding these bad replicas from the servers that I have NOT deleted them from. So my question is simple - is there something in the logs I can look for that would indicate the SOURCE of these bogus entries? Is the replica 9 with NO extra data any indication of something I could look for?
I am not willing to give up easily (as you might have already guessed) and I am determined to find the cause of these. I know we need more logs, but with all the traffic, the logs roll over within a few hours, and if the problem is happening at 3am for example, I am not able to track it down because the logs have rolled. Back to my investigations. ~J
[Freeipa-users] Haunted servers?
And just like that, my haunted servers have all returned. I am going to just put a gun to my head and be done with it. :-( Why do things run perfectly and then suddenly ??? Logs show little to nothing, mostly because the servers are so busy, they have already rotated out.

unable to decode {replica 16} 5535647200030010 5535647200030010
unable to decode {replica 22} 55371e9e0016 553eec6400040016
unable to decode {replica 23} 5545d61f00020017 555432430017
unable to decode {replica 24} 554d53d30018 554d54a400020018
unable to decode {replica 25} 554d78bf0019 555af30200040019
unable to decode {replica 9} 55402c3900030009 55402c3900030009

Don't know what to do anymore. At my wit's end.. ~J
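For anyone trying to track when these entries were last touched, the "unable to decode" lines in this message can be picked apart with a few lines of Python. This is only a sketch under stated assumptions: the two hex strings are taken to be the minimum and maximum CSN for that replica ID, the leading eight hex digits are read as a Unix timestamp, and the last four digits as shown are read as the replica ID. The last assumption matches every line in the listing (0x0010 = 16, 0x0016 = 22, and so on), but the values in the archive vary in length, so some digits may have been lost and nothing else is decoded. The function names are made up for this example.

```python
import re
from datetime import datetime, timezone

# Matches e.g. "unable to decode {replica 16} 5535647200030010 5535647200030010".
# The two CSN fields are optional because replica 9 was later reported with none.
RUV_LINE = re.compile(
    r"unable to decode:?\s*\{replica (\d+)\}(?:\s+([0-9a-f]+)\s+([0-9a-f]+))?"
)


def parse_csn(csn):
    """Read the timestamp and replica ID from a CSN as printed in the thread.

    Assumption: first 8 hex digits = Unix time, last 4 hex digits = replica ID.
    """
    return {
        "time": datetime.fromtimestamp(int(csn[:8], 16), tz=timezone.utc),
        "rid": int(csn[-4:], 16),
    }


def parse_ruv_line(line):
    """Return the replica ID and both CSNs from one line, or None if no match."""
    match = RUV_LINE.search(line)
    if match is None:
        return None
    rid, min_csn, max_csn = match.groups()
    return {
        "rid": int(rid),
        "min": parse_csn(min_csn) if min_csn else None,
        "max": parse_csn(max_csn) if max_csn else None,
    }


if __name__ == "__main__":
    sample = "unable to decode {replica 22} 55371e9e0016 553eec6400040016"
    entry = parse_ruv_line(sample)
    print(entry["rid"], entry["min"]["time"].isoformat(), entry["max"]["time"].isoformat())
```

Comparing the maximum CSN time per replica ID across servers may help narrow down which server last saw updates for a stale ID, although the thread itself does not confirm that this identifies the source.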