Re: [Freeipa-users] Timeout (?) issues
I'm pretty sure this is the root of my problem (not confirmed yet, but it's AIX -- that's always the problem): http://www-01.ibm.com/support/docview.wss?uid=swg21212940

The takeaway is this: the first query (184) is a normal IPv4 lookup for ldap.austin.texas.com, which returns 192.168.1.255. But then an IPv6 lookup is done for the same name. Because there is no IPv6 address for ldap.austin.texas.com, the resolver continues searching every search domain in resolv.conf (example.austin.texas.com, austin.texas.com, texas.com) trying to find one.

On Fri, Sep 20, 2013 at 3:07 AM, Petr Spacek <pspa...@redhat.com> wrote:

On 20.9.2013 01:24, KodaK wrote:

This is ridiculous, right?

IPA server 1:

# for i in $(ls access*); do echo -n $i:\ ; grep err=32 $i | wc -l; done
access: 248478
access.20130916-043207: 302774
access.20130916-123642: 272572
access.20130916-201516: 294308
access.20130917-081053: 295060
access.20130917-144559: 284498
access.20130917-231435: 281035
access.20130918-091611: 291165
access.20130918-154945: 275792
access.20130919-014322: 296113

IPA server 2:

access: 4313
access.20130909-200216: 4023
access.20130910-200229: 4161
access.20130911-200239: 4182
access.20130912-200249: 5069
access.20130913-200258: 3833
access.20130914-200313: 4208
access.20130915-200323: 4702
access.20130916-200332: 4532

IPA server 3:

access: 802
access.20130910-080737: 3876
access.20130911-080748: 3902
access.20130912-080802: 3678
access.20130913-080810: 3765
access.20130914-080826: 3524
access.20130915-080907: 4142
access.20130916-080916: 4930
access.20130917-080926: 4769
access.20130918-081005: 2879

IPA server 4:

access: 2812
access.20130910-003051: 4095
access.20130911-003105: 3623
access.20130912-003113: 3606
access.20130913-003125: 3581
access.20130914-003135: 3758
access.20130915-003150: 3935
access.20130916-003159: 4184
access.20130917-003210: 3859
access.20130918-003221: 5110

The vast majority of the err=32 messages are DNS entries.

It depends on your setup.
Bind-dyndb-ldap does an LDAP search for each non-existent name to verify that the name wasn't added to LDAP in the meantime. If you have clients doing 1M queries for non-existing names per day, then you will see 1M LDAP queries with err=32 per day. The next major version of bind-dyndb-ldap will have a reworked internal database and will support negative caching, so the number of err=32 results should drop significantly.

Here are some samples:

[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 SRCH base=idnsName=xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 RESULT err=32 tag=101 nentries=0 etime=0

This is interesting, because this LDAP query is equal to a DNS query for xxx.com.unix.xxx.com. Are your clients that crazy? :-)

[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 SRCH base=idnsName=slpoxacl01.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 RESULT err=32 tag=101 nentries=0 etime=0

This is equivalent to a DNS query for slpoxacl01.unix.xxx.com.unix.xxx.com.

[19/Sep/2013:18:19:51 -0500] conn=9 op=169770 SRCH base=idnsName=sla400q1.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169770 RESULT err=32 tag=101 nentries=0 etime=0

And this is sla400q1.unix.xxx.com.unix.xxx.com.

[19/Sep/2013:18:19:51 -0500] conn=9 op=169772 SRCH base=idnsName=magellanhealth.com,idnsname=unix.magellanhealth.com,cn=dns,dc=unix,dc=magellanhealth,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169772 RESULT err=32 tag=101 nentries=0 etime=0

So far today there are over half a million of these.
That can't be right. I would recommend using a network sniffer to check which clients send these crazy queries. My guess is that your resolver library (libc?) causes this. On my Linux system with glibc-2.17-14.fc19.x86_64 it behaves this way:

client query = nonexistent.example.com. (I used $ ping nonexistent.example.com.)
search domain in /etc/resolv.conf = brq.redhat.com.

DNS query #1: nonexistent.example.com. = NXDOMAIN
DNS query #2: nonexistent.example.com.brq.redhat.com. = NXDOMAIN
DNS query #3: nonexistent.example.com.redhat.com. = NXDOMAIN

On Thu, Sep 19, 2013 at 3:05 PM, KodaK <sako...@gmail.com> wrote:

I didn't realize that DNS created one connection. I thought it was one connection spanning several days.

In theory, there should be 2-4 LDAP connections
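The search-list multiplication described in this message can be sketched quickly. A minimal illustration (names taken from the AIX example above; the walk-the-whole-list-per-address-family behavior is the AIX behavior the IBM note describes, and `candidate_names` is a hypothetical helper, not a real resolver API):

```python
# Sketch of resolv.conf search-list expansion. A non-absolute query for a
# name with no matching record is retried under each search domain, and a
# resolver that repeats the walk for AAAA doubles the query count.
def candidate_names(name, search_domains):
    """Names a stub resolver tries, in order, for a given query."""
    if name.endswith("."):
        return [name]  # absolute name: no search-list expansion
    return [name + "."] + [f"{name}.{d}." for d in search_domains]

search = ["example.austin.texas.com", "austin.texas.com", "texas.com"]
# The A lookup succeeds on the first candidate, but the AAAA lookup finds
# no record anywhere, so it walks all four names -- each one answered
# negatively, and each one reaching LDAP via bind-dyndb-ldap as err=32.
aaaa_tries = candidate_names("ldap.austin.texas.com", search)
```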
Re: [Freeipa-users] Timeout (?) issues
On 20.9.2013 01:24, KodaK wrote:

This is ridiculous, right?

IPA server 1:

# for i in $(ls access*); do echo -n $i:\ ; grep err=32 $i | wc -l; done
access: 248478
access.20130916-043207: 302774
access.20130916-123642: 272572
access.20130916-201516: 294308
access.20130917-081053: 295060
access.20130917-144559: 284498
access.20130917-231435: 281035
access.20130918-091611: 291165
access.20130918-154945: 275792
access.20130919-014322: 296113

IPA server 2:

access: 4313
access.20130909-200216: 4023
access.20130910-200229: 4161
access.20130911-200239: 4182
access.20130912-200249: 5069
access.20130913-200258: 3833
access.20130914-200313: 4208
access.20130915-200323: 4702
access.20130916-200332: 4532

IPA server 3:

access: 802
access.20130910-080737: 3876
access.20130911-080748: 3902
access.20130912-080802: 3678
access.20130913-080810: 3765
access.20130914-080826: 3524
access.20130915-080907: 4142
access.20130916-080916: 4930
access.20130917-080926: 4769
access.20130918-081005: 2879

IPA server 4:

access: 2812
access.20130910-003051: 4095
access.20130911-003105: 3623
access.20130912-003113: 3606
access.20130913-003125: 3581
access.20130914-003135: 3758
access.20130915-003150: 3935
access.20130916-003159: 4184
access.20130917-003210: 3859
access.20130918-003221: 5110

The vast majority of the err=32 messages are DNS entries.

It depends on your setup. Bind-dyndb-ldap does an LDAP search for each non-existent name to verify that the name wasn't added to LDAP in the meantime. If you have clients doing 1M queries for non-existing names per day, then you will see 1M LDAP queries with err=32 per day. The next major version of bind-dyndb-ldap will have a reworked internal database and will support negative caching, so the number of err=32 results should drop significantly.

Here are some samples:

[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 SRCH base=idnsName=xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 RESULT err=32 tag=101 nentries=0 etime=0

This is interesting, because this LDAP query is equal to a DNS query for xxx.com.unix.xxx.com. Are your clients that crazy? :-)

[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 SRCH base=idnsName=slpoxacl01.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 RESULT err=32 tag=101 nentries=0 etime=0

This is equivalent to a DNS query for slpoxacl01.unix.xxx.com.unix.xxx.com.

[19/Sep/2013:18:19:51 -0500] conn=9 op=169770 SRCH base=idnsName=sla400q1.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169770 RESULT err=32 tag=101 nentries=0 etime=0

And this is sla400q1.unix.xxx.com.unix.xxx.com.

[19/Sep/2013:18:19:51 -0500] conn=9 op=169772 SRCH base=idnsName=magellanhealth.com,idnsname=unix.magellanhealth.com,cn=dns,dc=unix,dc=magellanhealth,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169772 RESULT err=32 tag=101 nentries=0 etime=0

So far today there are over half a million of these.

That can't be right. I would recommend using a network sniffer to check which clients send these crazy queries. My guess is that your resolver library (libc?) causes this. On my Linux system with glibc-2.17-14.fc19.x86_64 it behaves this way:

client query = nonexistent.example.com. (I used $ ping nonexistent.example.com.)
search domain in /etc/resolv.conf = brq.redhat.com.

DNS query #1: nonexistent.example.com. = NXDOMAIN
DNS query #2: nonexistent.example.com.brq.redhat.com. = NXDOMAIN
DNS query #3: nonexistent.example.com.redhat.com. = NXDOMAIN

On Thu, Sep 19, 2013 at 3:05 PM, KodaK <sako...@gmail.com> wrote:

I didn't realize that DNS created one connection. I thought it was one connection spanning several days.

In theory, there should be 2-4 LDAP connections from each DNS server, and those connections should live until the DNS or LDAP server restarts/crashes.

Petr^2 Spacek

On Thu, Sep 19, 2013 at 2:51 PM, Rich Megginson <rmegg...@redhat.com> wrote:

On 09/19/2013 12:57 PM, KodaK wrote:

Well, this is awkward:

[root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
5453936
[root@slpidml01 slapd-UNIX-xxx-COM]#

Why is it awkward?

On Thu, Sep 19, 2013 at 1:48 PM, KodaK <sako...@gmail.com> wrote:

Thanks. I've been running that against my logs, and this has to be abnormal:

err=32  129274  No Such Object
err=0   10952   Successful Operations
err=14  536     SASL Bind in Progress
err=53  39      Unwilling To Perform
err=49  3       Invalid Credentials (Bad Password)

I'm still trying to figure out why there are so many error 32s. Are there any usual suspects I should know about? (That's just the current
Re: [Freeipa-users] Timeout (?) issues
SRV records were missing for _ldaps._tcp. I added them in for the IPA servers and that knocked out some of the errors, but there are still a lot. I suspect these boxes are overloaded with bad DNS queries (probably due to something I've messed up.) Any help would be appreciated, but I'm opening a RH ticket.

Thanks,

--Jason

On Thu, Sep 19, 2013 at 1:57 PM, KodaK <sako...@gmail.com> wrote:

Well, this is awkward:

[root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
5453936
[root@slpidml01 slapd-UNIX-xxx-COM]#

On Thu, Sep 19, 2013 at 1:48 PM, KodaK <sako...@gmail.com> wrote:

Thanks. I've been running that against my logs, and this has to be abnormal:

err=32  129274  No Such Object
err=0   10952   Successful Operations
err=14  536     SASL Bind in Progress
err=53  39      Unwilling To Perform
err=49  3       Invalid Credentials (Bad Password)

I'm still trying to figure out why there are so many error 32s. Are there any usual suspects I should know about? (That's just the current access log, btw.)

On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson <rmegg...@redhat.com> wrote:

On 09/16/2013 07:57 PM, Dmitri Pal wrote:

On 09/16/2013 12:02 PM, KodaK wrote:

Yet another AIX-related problem: the AIX LDAP client is called secldapclntd (sure, they could make it more awkward, but the budget ran out.) I'm running into the issue detailed here: http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

"If an LDAP server fails to answer an LDAP query, secldapclntd caches the non-answered query negatively. This may happen if the LDAP server is down, for example. After the LDAP server is back again, secldapclntd will use the negative cache entry and the application initiating the original query will still fail until the cache entry expires."

IBM is working on porting the fix to our specific TL and SP levels. What I'm concerned with here, though, is *why* is it timing out? I don't know what the current timeout values are (AIX sucks, etc.)

I don't see timeout issues on my Linux boxes, which leads me to believe that either the sssd timeouts are longer or that sssd is just more robust when dealing with timeouts. I believe I'm seeing similar behavior with LDAP sudo on AIX as well, because I occasionally have to re-run sudo commands because they initially fail (and I know I'm using the right passwords.) However, sudo doesn't appear to have a cache (or it handles caching better.)

Does anyone have any troubleshooting suggestions? Any general speed-things-up suggestions on the IPA side?

Thanks,

--Jason

--
The government is going to read our mail anyway, might as well make it tough for them. GPG Public key ID: B6A1A7C6

___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users

Is the server FreeIPA? Can you see in the server logs what is actually happening -- is it the server that really takes time, or is there a network connectivity issue, or is a FW dropping packets? I would really start with the server-side logs. As far as 389 goes, run logconv.pl against the access logs in /var/log/dirsrv/slapd-DOMAIN-COM

--
Thank you,
Dmitri Pal

Sr. Engineering Manager for IdM portfolio
Red Hat Inc.

---
Looking to carve out IT costs? www.redhat.com/carveoutcosts/
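The missing `_ldaps._tcp` record mentioned at the top of this message is one of a family of SRV owner names that clients derive mechanically from the domain. A hedged sketch (the domain is a placeholder and the service list is illustrative, not an exhaustive statement of what FreeIPA publishes):

```python
# Sketch: building SRV owner names to sanity-check with dig.
# "example.com" is a placeholder domain; which services a given client
# actually looks up varies by client.
def srv_owner(service, proto, domain):
    """DNS owner name for an SRV record, e.g. _ldap._tcp.example.com."""
    return f"_{service}._{proto}.{domain}"

domain = "example.com"
to_check = [
    srv_owner("ldap", "tcp", domain),
    srv_owner("ldaps", "tcp", domain),     # the record that was missing here
    srv_owner("kerberos", "udp", domain),
]
# Verify each with, e.g.:  dig +short SRV _ldaps._tcp.example.com
```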
Re: [Freeipa-users] Timeout (?) issues
Well, this is awkward:

[root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
5453936
[root@slpidml01 slapd-UNIX-xxx-COM]#

On Thu, Sep 19, 2013 at 1:48 PM, KodaK <sako...@gmail.com> wrote:

Thanks. I've been running that against my logs, and this has to be abnormal:

err=32  129274  No Such Object
err=0   10952   Successful Operations
err=14  536     SASL Bind in Progress
err=53  39      Unwilling To Perform
err=49  3       Invalid Credentials (Bad Password)

I'm still trying to figure out why there are so many error 32s. Are there any usual suspects I should know about? (That's just the current access log, btw.)

On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson <rmegg...@redhat.com> wrote:

On 09/16/2013 07:57 PM, Dmitri Pal wrote:

On 09/16/2013 12:02 PM, KodaK wrote:

Yet another AIX-related problem: the AIX LDAP client is called secldapclntd (sure, they could make it more awkward, but the budget ran out.) I'm running into the issue detailed here: http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

"If an LDAP server fails to answer an LDAP query, secldapclntd caches the non-answered query negatively. This may happen if the LDAP server is down, for example. After the LDAP server is back again, secldapclntd will use the negative cache entry and the application initiating the original query will still fail until the cache entry expires."

IBM is working on porting the fix to our specific TL and SP levels. What I'm concerned with here, though, is *why* is it timing out? I don't know what the current timeout values are (AIX sucks, etc.)

I don't see timeout issues on my Linux boxes, which leads me to believe that either the sssd timeouts are longer or that sssd is just more robust when dealing with timeouts. I believe I'm seeing similar behavior with LDAP sudo on AIX as well, because I occasionally have to re-run sudo commands because they initially fail (and I know I'm using the right passwords.) However, sudo doesn't appear to have a cache (or it handles caching better.)

Does anyone have any troubleshooting suggestions? Any general speed-things-up suggestions on the IPA side?

Thanks,

--Jason

Is the server FreeIPA? Can you see in the server logs what is actually happening -- is it the server that really takes time, or is there a network connectivity issue, or is a FW dropping packets? I would really start with the server-side logs. As far as 389 goes, run logconv.pl against the access logs in /var/log/dirsrv/slapd-DOMAIN-COM

--
Thank you,
Dmitri Pal

Sr. Engineering Manager for IdM portfolio
Red Hat Inc.

---
Looking to carve out IT costs? www.redhat.com/carveoutcosts/

--
The government is going to read our mail anyway, might as well make it tough for them. GPG Public key ID: B6A1A7C6
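The secldapclntd defect quoted above is, in effect, negative caching keyed on the failed query, with entries that outlive the outage. A minimal sketch of that failure mode (the class, the 300-second TTL, and the backend callable are all hypothetical; the real daemon's internals are not public):

```python
import time

# Sketch of the negative-caching failure mode described in the IBM note:
# a query that fails while the server is down keeps failing from cache
# even after the server recovers, until the entry's TTL expires.
class NegativeCache:
    def __init__(self, ttl, clock=time.monotonic):
        self.ttl, self.clock, self._miss = ttl, clock, {}

    def lookup(self, key, backend):
        expiry = self._miss.get(key)
        if expiry is not None and self.clock() < expiry:
            return None                 # served from the negative cache
        result = backend(key)
        if result is None:              # backend failed or timed out
            self._miss[key] = self.clock() + self.ttl
        else:
            self._miss.pop(key, None)
        return result

# Simulated timeline with a fake clock:
now = [0.0]
cache = NegativeCache(ttl=300, clock=lambda: now[0])
cache.lookup("jdoe", lambda k: None)        # server down: miss gets cached
now[0] = 60.0
assert cache.lookup("jdoe", lambda k: "entry") is None   # server back, still fails
now[0] = 400.0
assert cache.lookup("jdoe", lambda k: "entry") == "entry"  # TTL expired
```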
Re: [Freeipa-users] Timeout (?) issues
I didn't realize that DNS created one connection. I thought it was one connection spanning several days.

On Thu, Sep 19, 2013 at 2:51 PM, Rich Megginson <rmegg...@redhat.com> wrote:

On 09/19/2013 12:57 PM, KodaK wrote:

Well, this is awkward:

[root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
5453936
[root@slpidml01 slapd-UNIX-xxx-COM]#

Why is it awkward?

On Thu, Sep 19, 2013 at 1:48 PM, KodaK <sako...@gmail.com> wrote:

Thanks. I've been running that against my logs, and this has to be abnormal:

err=32  129274  No Such Object
err=0   10952   Successful Operations
err=14  536     SASL Bind in Progress
err=53  39      Unwilling To Perform
err=49  3       Invalid Credentials (Bad Password)

I'm still trying to figure out why there are so many error 32s. Are there any usual suspects I should know about? (That's just the current access log, btw.)

On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson <rmegg...@redhat.com> wrote:

On 09/16/2013 07:57 PM, Dmitri Pal wrote:

On 09/16/2013 12:02 PM, KodaK wrote:

Yet another AIX-related problem: the AIX LDAP client is called secldapclntd (sure, they could make it more awkward, but the budget ran out.) I'm running into the issue detailed here: http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

"If an LDAP server fails to answer an LDAP query, secldapclntd caches the non-answered query negatively. This may happen if the LDAP server is down, for example. After the LDAP server is back again, secldapclntd will use the negative cache entry and the application initiating the original query will still fail until the cache entry expires."

IBM is working on porting the fix to our specific TL and SP levels. What I'm concerned with here, though, is *why* is it timing out? I don't know what the current timeout values are (AIX sucks, etc.)

I don't see timeout issues on my Linux boxes, which leads me to believe that either the sssd timeouts are longer or that sssd is just more robust when dealing with timeouts. I believe I'm seeing similar behavior with LDAP sudo on AIX as well, because I occasionally have to re-run sudo commands because they initially fail (and I know I'm using the right passwords.) However, sudo doesn't appear to have a cache (or it handles caching better.)

Does anyone have any troubleshooting suggestions? Any general speed-things-up suggestions on the IPA side?

Thanks,

--Jason

Is the server FreeIPA? Can you see in the server logs what is actually happening -- is it the server that really takes time, or is there a network connectivity issue, or is a FW dropping packets? I would really start with the server-side logs. As far as 389 goes, run logconv.pl against the access logs in /var/log/dirsrv/slapd-DOMAIN-COM

--
Thank you,
Dmitri Pal

Sr. Engineering Manager for IdM portfolio
Red Hat Inc.

---
Looking to carve out IT costs? www.redhat.com/carveoutcosts/

--
The government is going to read our mail anyway, might as well make it tough for them. GPG Public key ID: B6A1A7C6
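The per-connection grep and the logconv.pl tallies quoted above can be approximated with a single awk pass. A sketch against a stand-in log file (the field positions assume the 389-ds access-log lines shown elsewhere in this thread; `sample.log` is fabricated for illustration):

```shell
# Tally RESULT error codes per connection from a 389-ds-style access log.
# sample.log stands in for /var/log/dirsrv/slapd-*/access.
cat > sample.log <<'EOF'
[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 RESULT err=32 tag=101 nentries=0 etime=0
[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 RESULT err=32 tag=101 nentries=0 etime=0
[19/Sep/2013:18:19:52 -0500] conn=12 op=101 RESULT err=0 tag=101 nentries=1 etime=0
EOF
# On RESULT lines, $3 is conn=N and $6 is err=N.
awk '$5 == "RESULT" { n[$3 " " $6]++ } END { for (k in n) print k, n[k] }' sample.log | sort
```

A connection dominating the err=32 count (like conn=9 here, or conn=170902 in the real logs) points at a single chatty client to sniff.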
Re: [Freeipa-users] Timeout (?) issues
On 09/19/2013 12:57 PM, KodaK wrote:

Well, this is awkward:

[root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
5453936
[root@slpidml01 slapd-UNIX-xxx-COM]#

Why is it awkward?

On Thu, Sep 19, 2013 at 1:48 PM, KodaK <sako...@gmail.com> wrote:

Thanks. I've been running that against my logs, and this has to be abnormal:

err=32  129274  No Such Object
err=0   10952   Successful Operations
err=14  536     SASL Bind in Progress
err=53  39      Unwilling To Perform
err=49  3       Invalid Credentials (Bad Password)

I'm still trying to figure out why there are so many error 32s. Are there any usual suspects I should know about? (That's just the current access log, btw.)

On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson <rmegg...@redhat.com> wrote:

On 09/16/2013 07:57 PM, Dmitri Pal wrote:

On 09/16/2013 12:02 PM, KodaK wrote:

Yet another AIX-related problem: the AIX LDAP client is called secldapclntd (sure, they could make it more awkward, but the budget ran out.) I'm running into the issue detailed here: http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

"If an LDAP server fails to answer an LDAP query, secldapclntd caches the non-answered query negatively. This may happen if the LDAP server is down, for example. After the LDAP server is back again, secldapclntd will use the negative cache entry and the application initiating the original query will still fail until the cache entry expires."

IBM is working on porting the fix to our specific TL and SP levels. What I'm concerned with here, though, is *why* is it timing out? I don't know what the current timeout values are (AIX sucks, etc.)

I don't see timeout issues on my Linux boxes, which leads me to believe that either the sssd timeouts are longer or that sssd is just more robust when dealing with timeouts. I believe I'm seeing similar behavior with LDAP sudo on AIX as well, because I occasionally have to re-run sudo commands because they initially fail (and I know I'm using the right passwords.) However, sudo doesn't appear to have a cache (or it handles caching better.)

Does anyone have any troubleshooting suggestions? Any general speed-things-up suggestions on the IPA side?

Thanks,

--Jason

Is the server FreeIPA? Can you see in the server logs what is actually happening -- is it the server that really takes time, or is there a network connectivity issue, or is a FW dropping packets? I would really start with the server-side logs. As far as 389 goes, run logconv.pl against the access logs in /var/log/dirsrv/slapd-DOMAIN-COM

--
Thank you,
Dmitri Pal

Sr. Engineering Manager for IdM portfolio
Red Hat Inc.

---
Looking to carve out IT costs? www.redhat.com/carveoutcosts/

--
The government is going to read our mail anyway, might as well make it tough for them. GPG Public key ID: B6A1A7C6
Re: [Freeipa-users] Timeout (?) issues
This is ridiculous, right?

IPA server 1:

# for i in $(ls access*); do echo -n $i:\ ; grep err=32 $i | wc -l; done
access: 248478
access.20130916-043207: 302774
access.20130916-123642: 272572
access.20130916-201516: 294308
access.20130917-081053: 295060
access.20130917-144559: 284498
access.20130917-231435: 281035
access.20130918-091611: 291165
access.20130918-154945: 275792
access.20130919-014322: 296113

IPA server 2:

access: 4313
access.20130909-200216: 4023
access.20130910-200229: 4161
access.20130911-200239: 4182
access.20130912-200249: 5069
access.20130913-200258: 3833
access.20130914-200313: 4208
access.20130915-200323: 4702
access.20130916-200332: 4532

IPA server 3:

access: 802
access.20130910-080737: 3876
access.20130911-080748: 3902
access.20130912-080802: 3678
access.20130913-080810: 3765
access.20130914-080826: 3524
access.20130915-080907: 4142
access.20130916-080916: 4930
access.20130917-080926: 4769
access.20130918-081005: 2879

IPA server 4:

access: 2812
access.20130910-003051: 4095
access.20130911-003105: 3623
access.20130912-003113: 3606
access.20130913-003125: 3581
access.20130914-003135: 3758
access.20130915-003150: 3935
access.20130916-003159: 4184
access.20130917-003210: 3859
access.20130918-003221: 5110

The vast majority of the err=32 messages are DNS entries.

Here are some samples:

[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 SRCH base=idnsName=xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 RESULT err=32 tag=101 nentries=0 etime=0
[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 SRCH base=idnsName=slpoxacl01.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 RESULT err=32 tag=101 nentries=0 etime=0
[19/Sep/2013:18:19:51 -0500] conn=9 op=169770 SRCH base=idnsName=sla400q1.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169770 RESULT err=32 tag=101 nentries=0 etime=0
[19/Sep/2013:18:19:51 -0500] conn=9 op=169772 SRCH base=idnsName=magellanhealth.com,idnsname=unix.magellanhealth.com,cn=dns,dc=unix,dc=magellanhealth,dc=com scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169772 RESULT err=32 tag=101 nentries=0 etime=0

So far today there are over half a million of these. That can't be right.

On Thu, Sep 19, 2013 at 3:05 PM, KodaK <sako...@gmail.com> wrote:

I didn't realize that DNS created one connection. I thought it was one connection spanning several days.

On Thu, Sep 19, 2013 at 2:51 PM, Rich Megginson <rmegg...@redhat.com> wrote:

On 09/19/2013 12:57 PM, KodaK wrote:

Well, this is awkward:

[root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
5453936
[root@slpidml01 slapd-UNIX-xxx-COM]#

Why is it awkward?

On Thu, Sep 19, 2013 at 1:48 PM, KodaK <sako...@gmail.com> wrote:

Thanks. I've been running that against my logs, and this has to be abnormal:

err=32  129274  No Such Object
err=0   10952   Successful Operations
err=14  536     SASL Bind in Progress
err=53  39      Unwilling To Perform
err=49  3       Invalid Credentials (Bad Password)

I'm still trying to figure out why there are so many error 32s. Are there any usual suspects I should know about? (That's just the current access log, btw.)

On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson <rmegg...@redhat.com> wrote:

On 09/16/2013 07:57 PM, Dmitri Pal wrote:

On 09/16/2013 12:02 PM, KodaK wrote:

Yet another AIX-related problem: the AIX LDAP client is called secldapclntd (sure, they could make it more awkward, but the budget ran out.) I'm running into the issue detailed here: http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

"If an LDAP server fails to answer an LDAP query, secldapclntd caches the non-answered query negatively. This may happen if the LDAP server is down, for example. After the LDAP server is back again, secldapclntd will use the negative cache entry and the application initiating the original query will still fail until the cache entry expires."

IBM is working on porting the fix to our specific TL and SP levels. What I'm concerned with here, though, is *why* is it timing out? I don't know what the current timeout values are (AIX sucks, etc.)

I don't see timeout issues on my Linux boxes, which leads me to believe that either the sssd timeouts are longer or that sssd is just more robust when dealing with timeouts. I believe I'm seeing similar behavior with LDAP sudo on AIX as well, because I occasionally have to re-run sudo commands because they initially fail (and I know I'm using the right passwords.) However, sudo doesn't appear to have a cache (or it handles caching
Re: [Freeipa-users] Timeout (?) issues
Thanks. I've been running that against my logs, and this has to be abnormal:

err=32  129274  No Such Object
err=0   10952   Successful Operations
err=14  536     SASL Bind in Progress
err=53  39      Unwilling To Perform
err=49  3       Invalid Credentials (Bad Password)

I'm still trying to figure out why there are so many error 32s. Are there any usual suspects I should know about? (That's just the current access log, btw.)

On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson <rmegg...@redhat.com> wrote:

On 09/16/2013 07:57 PM, Dmitri Pal wrote:

On 09/16/2013 12:02 PM, KodaK wrote:

Yet another AIX-related problem: the AIX LDAP client is called secldapclntd (sure, they could make it more awkward, but the budget ran out.) I'm running into the issue detailed here: http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

"If an LDAP server fails to answer an LDAP query, secldapclntd caches the non-answered query negatively. This may happen if the LDAP server is down, for example. After the LDAP server is back again, secldapclntd will use the negative cache entry and the application initiating the original query will still fail until the cache entry expires."

IBM is working on porting the fix to our specific TL and SP levels. What I'm concerned with here, though, is *why* is it timing out? I don't know what the current timeout values are (AIX sucks, etc.)

I don't see timeout issues on my Linux boxes, which leads me to believe that either the sssd timeouts are longer or that sssd is just more robust when dealing with timeouts. I believe I'm seeing similar behavior with LDAP sudo on AIX as well, because I occasionally have to re-run sudo commands because they initially fail (and I know I'm using the right passwords.) However, sudo doesn't appear to have a cache (or it handles caching better.)

Does anyone have any troubleshooting suggestions? Any general speed-things-up suggestions on the IPA side?

Thanks,

--Jason

--
The government is going to read our mail anyway, might as well make it tough for them. GPG Public key ID: B6A1A7C6

Is the server FreeIPA? Can you see in the server logs what is actually happening -- is it the server that really takes time, or is there a network connectivity issue, or is a FW dropping packets? I would really start with the server-side logs. As far as 389 goes, run logconv.pl against the access logs in /var/log/dirsrv/slapd-DOMAIN-COM

--
Thank you,
Dmitri Pal

Sr. Engineering Manager for IdM portfolio
Red Hat Inc.

---
Looking to carve out IT costs? www.redhat.com/carveoutcosts/
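The secldapclntd failure mode described in this thread (IBM APAR IV11344) is easy to model: a query that merely *times out* gets cached as a negative answer, so lookups keep failing until the entry's TTL expires, even after the server recovers. The sketch below is an illustrative toy, not AIX's actual implementation; the class name, TTL, and clock injection are all invented for the example.

```python
import time

class NegativeCache:
    """Toy model of the buggy behavior: a timed-out query is cached
    negatively and keeps failing until the cache entry expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock           # injectable for testing
        self._negative = {}          # query -> expiry timestamp

    def lookup(self, query, backend):
        """Return the backend's answer, or None for a (possibly cached) miss."""
        expiry = self._negative.get(query)
        if expiry is not None and self.clock() < expiry:
            return None              # negative hit: fail without asking the server
        try:
            result = backend(query)
        except TimeoutError:
            # The bug's trigger: a *timeout* is recorded as if the entry
            # definitively did not exist.
            self._negative[query] = self.clock() + self.ttl
            return None
        self._negative.pop(query, None)
        return result

# Demonstration with a fake clock: the server times out once at t=0, the
# failure then persists while the server is healthy, until the TTL passes.
now = [0.0]
cache = NegativeCache(ttl_seconds=300, clock=lambda: now[0])

def flaky_backend(query):
    if now[0] < 1:                   # server is "down" only at t=0
        raise TimeoutError
    return "uid=jason"               # healthy afterwards

first = cache.lookup("jason", flaky_backend)    # t=0: timeout, cached negatively
now[0] = 10
second = cache.lookup("jason", flaky_backend)   # server is back, still fails
now[0] = 301
third = cache.lookup("jason", flaky_backend)    # TTL expired, succeeds
```

The fix IBM shipped presumably distinguishes "server did not answer" from "entry does not exist"; only the latter deserves a negative cache entry.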