Re: [Freeipa-users] Timeout (?) issues

2013-09-23 Thread KodaK
I'm pretty sure this is the root of my problem (not confirmed yet, but it's
AIX -- that's always the problem):

http://www-01.ibm.com/support/docview.wss?uid=swg21212940

The takeaway is this:

The first query (184) is a normal IPV4 lookup for ldap.austin.texas.com,
which returns 192.168.1.255. But then an IPV6 lookup is done for the same
name. Because there is no IPV6 address for ldap.austin.texas.com, it
continues searching every search domain in the resolv.conf file (
example.austin.texas.com austin.texas.com texas.com) trying to find one.



On Fri, Sep 20, 2013 at 3:07 AM, Petr Spacek pspa...@redhat.com wrote:

 On 20.9.2013 01:24, KodaK wrote:

 This is ridiculous, right?

 IPA server 1:

 # for i in $(ls access*); do echo -n  $i:\  ;grep err=32 $i | wc -l; done
 access: 248478
 access.20130916-043207: 302774
 access.20130916-123642: 272572
 access.20130916-201516: 294308
 access.20130917-081053: 295060
 access.20130917-144559: 284498
 access.20130917-231435: 281035
 access.20130918-091611: 291165
 access.20130918-154945: 275792
 access.20130919-014322: 296113

 IPA server 2:

 access: 4313
 access.20130909-200216: 4023
 access.20130910-200229: 4161
 access.20130911-200239: 4182
 access.20130912-200249: 5069
 access.20130913-200258: 3833
 access.20130914-200313: 4208
 access.20130915-200323: 4702
 access.20130916-200332: 4532


 IPA server 3:

 access: 802
 access.20130910-080737: 3876
 access.20130911-080748: 3902
 access.20130912-080802: 3678
 access.20130913-080810: 3765
 access.20130914-080826: 3524
 access.20130915-080907: 4142
 access.20130916-080916: 4930
 access.20130917-080926: 4769
 access.20130918-081005: 2879

 IPA server 4:

 access: 2812
 access.20130910-003051: 4095
 access.20130911-003105: 3623
 access.20130912-003113: 3606
 access.20130913-003125: 3581
 access.20130914-003135: 3758
 access.20130915-003150: 3935
 access.20130916-003159: 4184
 access.20130917-003210: 3859
 access.20130918-003221: 5110


 The vast majority of the err=32 messages are DNS entries.


 It depends on your setup. Bind-dyndb-ldap does LDAP search for each
 non-existent name to verify that the name wasn't added to LDAP in
 meanwhile. If you have clients doing 1M queries for non-existing names per
 day, then you will see 1M LDAP queries with err=32 per day.

 Next major version of bind-dyndb-ldap will have reworked internal database
 and it will support negative caching, so number of err=32 should drop
 significantly.


  Here are some samples:

 [19/Sep/2013:18:19:51 -0500] conn=9 op=169764 SRCH base=idnsName=xxx.com
 ,idnsname=unix.xxx.com,cn=dns,**dc=unix,dc=xxx,dc=com scope=0
 filter=(objectClass=**idnsRecord) attrs=ALL
 [19/Sep/2013:18:19:51 -0500] conn=9 op=169764 RESULT err=32 tag=101
 nentries=0 etime=0


 This is interesting, because this LDAP query is equal to DNS query for 
 xxx.com.unix.xxx.com. Are your clients that crazy? :-)


  [19/Sep/2013:18:19:51 -0500] conn=9 op=169774 SRCH base=idnsName=
 slpoxacl01.unix.xxx.com,**idnsname=unix.xxx.com,cn=dns,**
 dc=unix,dc=xxx,dc=com
 scope=0 filter=(objectClass=**idnsRecord) attrs=ALL
 [19/Sep/2013:18:19:51 -0500] conn=9 op=169774 RESULT err=32 tag=101
 nentries=0 etime=0


 This is equivalent to DNS query for slpoxacl01.unix.xxx.com.unix.**
 xxx.com http://slpoxacl01.unix.xxx.com.unix.xxx.com..


  [19/Sep/2013:18:19:51 -0500] conn=9 op=169770 SRCH base=idnsName=
 sla400q1.unix.xxx.com,**idnsname=unix.xxx.com,cn=dns,**
 dc=unix,dc=xxx,dc=com
 scope=0 filter=(objectClass=**idnsRecord) attrs=ALL
 [19/Sep/2013:18:19:51 -0500] conn=9 op=169770 RESULT err=32 tag=101
 nentries=0 etime=0


 And this is 
 sla400q1.unix.xxx.com.unix.**xxx.comhttp://sla400q1.unix.xxx.com.unix.xxx.com
 ..


  [19/Sep/2013:18:19:51 -0500] conn=9 op=169772 SRCH base=idnsName=
 magellanhealth.com,idnsname=un**ix.magellanhealth.comhttp://unix.magellanhealth.com
 ,cn=dns,**dc=unix,dc=magellanhealth,dc=**com
 scope=0 filter=(objectClass=**idnsRecord) attrs=ALL
 [19/Sep/2013:18:19:51 -0500] conn=9 op=169772 RESULT err=32 tag=101
 nentries=0 etime=0

 So far today there are over half a million of these.  That can't be right.


 I would recommend you to use network sniffer and check which clients sends
 these crazy queries.

 My guess is that your resolver library (libc?) causes this.

 On my Linux system with glibc-2.17-14.fc19.x86_64 it behaves in this way:

 client query = nonexistent.example.com.
 (I used $ ping nonexistent.example.com.)
 search domain in /etc/resolv.conf = brq.redhat.com.

 DNS query #1: nonexistent.example.com. = NXDOMAIN
 DNS query #2: 
 nonexistent.example.com.brq.**redhat.comhttp://nonexistent.example.com.brq.redhat.com.
 = NXDOMAIN
 DNS query #3: 
 nonexistent.example.com.**redhat.comhttp://nonexistent.example.com.redhat.com.
 = NXDOMAIN


  On Thu, Sep 19, 2013 at 3:05 PM, KodaK sako...@gmail.com wrote:

  I didn't realize that DNS created one connection.  I thought it was one
 connection spanning several days.


 In theory, there should be 2-4 LDAP connections 

Re: [Freeipa-users] Timeout (?) issues

2013-09-20 Thread Petr Spacek

On 20.9.2013 01:24, KodaK wrote:

This is ridiculous, right?

IPA server 1:

# for i in $(ls access*); do echo -n  $i:\  ;grep err=32 $i | wc -l; done
access: 248478
access.20130916-043207: 302774
access.20130916-123642: 272572
access.20130916-201516: 294308
access.20130917-081053: 295060
access.20130917-144559: 284498
access.20130917-231435: 281035
access.20130918-091611: 291165
access.20130918-154945: 275792
access.20130919-014322: 296113

IPA server 2:

access: 4313
access.20130909-200216: 4023
access.20130910-200229: 4161
access.20130911-200239: 4182
access.20130912-200249: 5069
access.20130913-200258: 3833
access.20130914-200313: 4208
access.20130915-200323: 4702
access.20130916-200332: 4532


IPA server 3:

access: 802
access.20130910-080737: 3876
access.20130911-080748: 3902
access.20130912-080802: 3678
access.20130913-080810: 3765
access.20130914-080826: 3524
access.20130915-080907: 4142
access.20130916-080916: 4930
access.20130917-080926: 4769
access.20130918-081005: 2879

IPA server 4:

access: 2812
access.20130910-003051: 4095
access.20130911-003105: 3623
access.20130912-003113: 3606
access.20130913-003125: 3581
access.20130914-003135: 3758
access.20130915-003150: 3935
access.20130916-003159: 4184
access.20130917-003210: 3859
access.20130918-003221: 5110


The vast majority of the err=32 messages are DNS entries.


It depends on your setup. Bind-dyndb-ldap does LDAP search for each 
non-existent name to verify that the name wasn't added to LDAP in meanwhile. 
If you have clients doing 1M queries for non-existing names per day, then you 
will see 1M LDAP queries with err=32 per day.


Next major version of bind-dyndb-ldap will have reworked internal database and 
it will support negative caching, so number of err=32 should drop significantly.



Here are some samples:

[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 SRCH base=idnsName=xxx.com
,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com scope=0
filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 RESULT err=32 tag=101
nentries=0 etime=0


This is interesting, because this LDAP query is equal to DNS query for 
xxx.com.unix.xxx.com. Are your clients that crazy? :-)



[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 SRCH base=idnsName=
slpoxacl01.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com
scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 RESULT err=32 tag=101
nentries=0 etime=0


This is equivalent to DNS query for slpoxacl01.unix.xxx.com.unix.xxx.com..


[19/Sep/2013:18:19:51 -0500] conn=9 op=169770 SRCH base=idnsName=
sla400q1.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com
scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169770 RESULT err=32 tag=101
nentries=0 etime=0


And this is sla400q1.unix.xxx.com.unix.xxx.com..


[19/Sep/2013:18:19:51 -0500] conn=9 op=169772 SRCH base=idnsName=
magellanhealth.com,idnsname=unix.magellanhealth.com,cn=dns,dc=unix,dc=magellanhealth,dc=com
scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169772 RESULT err=32 tag=101
nentries=0 etime=0

So far today there are over half a million of these.  That can't be right.


I would recommend you to use network sniffer and check which clients sends 
these crazy queries.


My guess is that your resolver library (libc?) causes this.

On my Linux system with glibc-2.17-14.fc19.x86_64 it behaves in this way:

client query = nonexistent.example.com.
(I used $ ping nonexistent.example.com.)
search domain in /etc/resolv.conf = brq.redhat.com.

DNS query #1: nonexistent.example.com. = NXDOMAIN
DNS query #2: nonexistent.example.com.brq.redhat.com. = NXDOMAIN
DNS query #3: nonexistent.example.com.redhat.com. = NXDOMAIN


On Thu, Sep 19, 2013 at 3:05 PM, KodaK sako...@gmail.com wrote:


I didn't realize that DNS created one connection.  I thought it was one
connection spanning several days.


In theory, there should be 2-4 LDAP connections from each DNS server and those 
connections should live until DNS or LDAP server restarts/crashes.


Petr^2 Spacek


On Thu, Sep 19, 2013 at 2:51 PM, Rich Megginson rmegg...@redhat.comwrote:


  On 09/19/2013 12:57 PM, KodaK wrote:

Well, this is awkward:

  [root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
5453936
[root@slpidml01 slapd-UNIX-xxx-COM]#


Why is it awkward?




On Thu, Sep 19, 2013 at 1:48 PM, KodaK sako...@gmail.com wrote:


Thanks.  I've been running that against my logs, and this has to be
abnormal:

  err=32   129274No Such Object
err=0 10952Successful Operations
err=14  536SASL Bind in Progress
err=53   39Unwilling To Perform
err=493Invalid Credentials (Bad Password)

  I'm still trying to figure out why there are so many error 32s.  Are
there any usual suspects I should know about?  (That's just the current

Re: [Freeipa-users] Timeout (?) issues

2013-09-19 Thread KodaK
SRV records were missing for _ldaps_tcp.  I added them in for the IPA
servers and that knocked out some of the errors, but there are still a lot.
 I suspect these boxes are overloaded with bad dns queries (probably due to
something I've messed up.)

Any help would be appreciated, but I'm opening a RH ticket.

Thanks,

--Jason


On Thu, Sep 19, 2013 at 1:57 PM, KodaK sako...@gmail.com wrote:

 Well, this is awkward:

 [root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
 5453936
 [root@slpidml01 slapd-UNIX-xxx-COM]#


 On Thu, Sep 19, 2013 at 1:48 PM, KodaK sako...@gmail.com wrote:

 Thanks.  I've been running that against my logs, and this has to be
 abnormal:

 err=32   129274No Such Object
 err=0 10952Successful Operations
 err=14  536SASL Bind in Progress
 err=53   39Unwilling To Perform
 err=493Invalid Credentials (Bad Password)

 I'm still trying to figure out why there are so many error 32s.  Are
 there any usual suspects I should know about?  (That's just the current
 access log, btw.)


 On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson rmegg...@redhat.comwrote:

  On 09/16/2013 07:57 PM, Dmitri Pal wrote:

 On 09/16/2013 12:02 PM, KodaK wrote:

 Yet another AIX related problem:

  The AIX LDAP client is called secldapclntd (sure, they could make it
 more awkward, but the budget ran out.)  I'm running into the issue detailed
 here:

  http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

  If an LDAP server fails to answer an LDAP query, secldapclntd caches
 the non-answered query negatively. This may happen if the LDAP server
 is down for example. After the LDAP server is back again secldapclntd will
 use the negative cache entry and the application initiating the original
 query will still fail until the cache entry expires.

  IBM is working on porting the fix to our specific TL and SP levels.

  What I'm concerned with here, though, is *why* is it timing out?  I
 don't know what the current timeout values are (AIX sucks, etc.)

  I don't see timeout issues on my Linux boxes, which leads me to
 believe that either the sssd timouts are longer or that sssd is just more
 robust when dealing with timeouts.

  I believe I'm seeing similar behavior with LDAP sudo on AIX as well,
 because I occasionally have to re-run sudo commands because they initially
 fail (and I know I'm using the right passwords.)  However, sudo doesn't
 appear to have a cache (or it handles caching better.)

  Does anyone have any troubleshooting suggestions?  Any general speed
 things up suggestions on the IPA side?

  Thanks,

  --Jason

  --
 The government is going to read our mail anyway, might as well make it
 tough for them.  GPG Public key ID:  B6A1A7C6


 ___
 Freeipa-users mailing 
 listFreeipa-users@redhat.comhttps://www.redhat.com/mailman/listinfo/freeipa-users


 Is the server FreeIPA?
 Can see in the server logs what is actually happening is it the server
 that really takes time or there is a network connectivity issue or FW is
 dropping packets?
 I would really start with the server side logs.


 As far as 389 goes, run logconv.pl against the access logs in
 /var/log/dirsrv/slapd-DOMAIN-COM



 --
 Thank you,
 Dmitri Pal

 Sr. Engineering Manager for IdM portfolio
 Red Hat Inc.


 ---
 Looking to carve out IT costs?www.redhat.com/carveoutcosts/



 ___
 Freeipa-users mailing 
 listFreeipa-users@redhat.comhttps://www.redhat.com/mailman/listinfo/freeipa-users



 ___
 Freeipa-users mailing list
 Freeipa-users@redhat.com
 https://www.redhat.com/mailman/listinfo/freeipa-users




 --
 The government is going to read our mail anyway, might as well make it
 tough for them.  GPG Public key ID:  B6A1A7C6




 --
 The government is going to read our mail anyway, might as well make it
 tough for them.  GPG Public key ID:  B6A1A7C6




-- 
The government is going to read our mail anyway, might as well make it
tough for them.  GPG Public key ID:  B6A1A7C6
___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users

Re: [Freeipa-users] Timeout (?) issues

2013-09-19 Thread KodaK
Well, this is awkward:

[root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
5453936
[root@slpidml01 slapd-UNIX-xxx-COM]#


On Thu, Sep 19, 2013 at 1:48 PM, KodaK sako...@gmail.com wrote:

 Thanks.  I've been running that against my logs, and this has to be
 abnormal:

 err=32   129274No Such Object
 err=0 10952Successful Operations
 err=14  536SASL Bind in Progress
 err=53   39Unwilling To Perform
 err=493Invalid Credentials (Bad Password)

 I'm still trying to figure out why there are so many error 32s.  Are there
 any usual suspects I should know about?  (That's just the current access
 log, btw.)


 On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson rmegg...@redhat.comwrote:

  On 09/16/2013 07:57 PM, Dmitri Pal wrote:

 On 09/16/2013 12:02 PM, KodaK wrote:

 Yet another AIX related problem:

  The AIX LDAP client is called secldapclntd (sure, they could make it
 more awkward, but the budget ran out.)  I'm running into the issue detailed
 here:

  http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

  If an LDAP server fails to answer an LDAP query, secldapclntd caches
 the non-answered query negatively. This may happen if the LDAP server is down
 for example. After the LDAP server is back again secldapclntd will use
 the negative cache entry and the application initiating the original
 query will still fail until the cache entry expires.

  IBM is working on porting the fix to our specific TL and SP levels.

  What I'm concerned with here, though, is *why* is it timing out?  I
 don't know what the current timeout values are (AIX sucks, etc.)

  I don't see timeout issues on my Linux boxes, which leads me to believe
 that either the sssd timouts are longer or that sssd is just more robust
 when dealing with timeouts.

  I believe I'm seeing similar behavior with LDAP sudo on AIX as well,
 because I occasionally have to re-run sudo commands because they initially
 fail (and I know I'm using the right passwords.)  However, sudo doesn't
 appear to have a cache (or it handles caching better.)

  Does anyone have any troubleshooting suggestions?  Any general speed
 things up suggestions on the IPA side?

  Thanks,

  --Jason

  --
 The government is going to read our mail anyway, might as well make it
 tough for them.  GPG Public key ID:  B6A1A7C6


 ___
 Freeipa-users mailing 
 listFreeipa-users@redhat.comhttps://www.redhat.com/mailman/listinfo/freeipa-users


 Is the server FreeIPA?
 Can see in the server logs what is actually happening is it the server
 that really takes time or there is a network connectivity issue or FW is
 dropping packets?
 I would really start with the server side logs.


 As far as 389 goes, run logconv.pl against the access logs in
 /var/log/dirsrv/slapd-DOMAIN-COM



 --
 Thank you,
 Dmitri Pal

 Sr. Engineering Manager for IdM portfolio
 Red Hat Inc.


 ---
 Looking to carve out IT costs?www.redhat.com/carveoutcosts/



 ___
 Freeipa-users mailing 
 listFreeipa-users@redhat.comhttps://www.redhat.com/mailman/listinfo/freeipa-users



 ___
 Freeipa-users mailing list
 Freeipa-users@redhat.com
 https://www.redhat.com/mailman/listinfo/freeipa-users




 --
 The government is going to read our mail anyway, might as well make it
 tough for them.  GPG Public key ID:  B6A1A7C6




-- 
The government is going to read our mail anyway, might as well make it
tough for them.  GPG Public key ID:  B6A1A7C6
___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users

Re: [Freeipa-users] Timeout (?) issues

2013-09-19 Thread KodaK
I didn't realize that DNS created one connection.  I thought it was one
connection spanning several days.


On Thu, Sep 19, 2013 at 2:51 PM, Rich Megginson rmegg...@redhat.com wrote:

  On 09/19/2013 12:57 PM, KodaK wrote:

 Well, this is awkward:

  [root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
 5453936
 [root@slpidml01 slapd-UNIX-xxx-COM]#


 Why is it awkward?




 On Thu, Sep 19, 2013 at 1:48 PM, KodaK sako...@gmail.com wrote:

 Thanks.  I've been running that against my logs, and this has to be
 abnormal:

  err=32   129274No Such Object
 err=0 10952Successful Operations
 err=14  536SASL Bind in Progress
 err=53   39Unwilling To Perform
 err=493Invalid Credentials (Bad Password)

  I'm still trying to figure out why there are so many error 32s.  Are
 there any usual suspects I should know about?  (That's just the current
 access log, btw.)


 On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson rmegg...@redhat.comwrote:

   On 09/16/2013 07:57 PM, Dmitri Pal wrote:

 On 09/16/2013 12:02 PM, KodaK wrote:

 Yet another AIX related problem:

  The AIX LDAP client is called secldapclntd (sure, they could make it
 more awkward, but the budget ran out.)  I'm running into the issue detailed
 here:

  http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

  If an LDAP server fails to answer an LDAP query, secldapclntd caches
 the non-answered query negatively. This may happen if the LDAP server
 is down for example. After the LDAP server is back again secldapclntd will
 use the negative cache entry and the application initiating the original
 query will still fail until the cache entry expires.

  IBM is working on porting the fix to our specific TL and SP levels.

  What I'm concerned with here, though, is *why* is it timing out?  I
 don't know what the current timeout values are (AIX sucks, etc.)

  I don't see timeout issues on my Linux boxes, which leads me to
 believe that either the sssd timouts are longer or that sssd is just more
 robust when dealing with timeouts.

  I believe I'm seeing similar behavior with LDAP sudo on AIX as well,
 because I occasionally have to re-run sudo commands because they initially
 fail (and I know I'm using the right passwords.)  However, sudo doesn't
 appear to have a cache (or it handles caching better.)

  Does anyone have any troubleshooting suggestions?  Any general speed
 things up suggestions on the IPA side?

  Thanks,

  --Jason

  --
 The government is going to read our mail anyway, might as well make it
 tough for them.  GPG Public key ID:  B6A1A7C6


 ___
 Freeipa-users mailing 
 listFreeipa-users@redhat.comhttps://www.redhat.com/mailman/listinfo/freeipa-users


 Is the server FreeIPA?
 Can see in the server logs what is actually happening is it the server
 that really takes time or there is a network connectivity issue or FW is
 dropping packets?
 I would really start with the server side logs.


  As far as 389 goes, run logconv.pl against the access logs in
 /var/log/dirsrv/slapd-DOMAIN-COM



 --
 Thank you,
 Dmitri Pal

 Sr. Engineering Manager for IdM portfolio
 Red Hat Inc.


 ---
 Looking to carve out IT costs?www.redhat.com/carveoutcosts/



  ___
 Freeipa-users mailing 
 listFreeipa-users@redhat.comhttps://www.redhat.com/mailman/listinfo/freeipa-users



 ___
 Freeipa-users mailing list
 Freeipa-users@redhat.com
 https://www.redhat.com/mailman/listinfo/freeipa-users




  --
 The government is going to read our mail anyway, might as well make it
 tough for them.  GPG Public key ID:  B6A1A7C6




  --
 The government is going to read our mail anyway, might as well make it
 tough for them.  GPG Public key ID:  B6A1A7C6





-- 
The government is going to read our mail anyway, might as well make it
tough for them.  GPG Public key ID:  B6A1A7C6
___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users

Re: [Freeipa-users] Timeout (?) issues

2013-09-19 Thread Rich Megginson

On 09/19/2013 12:57 PM, KodaK wrote:

Well, this is awkward:

[root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
5453936
[root@slpidml01 slapd-UNIX-xxx-COM]#


Why is it awkward?




On Thu, Sep 19, 2013 at 1:48 PM, KodaK sako...@gmail.com 
mailto:sako...@gmail.com wrote:


Thanks.  I've been running that against my logs, and this has to
be abnormal:

err=32   129274No Such Object
err=0 10952Successful Operations
err=14  536SASL Bind in Progress
err=53   39Unwilling To Perform
err=493Invalid Credentials (Bad Password)

I'm still trying to figure out why there are so many error 32s.
 Are there any usual suspects I should know about?  (That's just
the current access log, btw.)


On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson
rmegg...@redhat.com mailto:rmegg...@redhat.com wrote:

On 09/16/2013 07:57 PM, Dmitri Pal wrote:

On 09/16/2013 12:02 PM, KodaK wrote:

Yet another AIX related problem:

The AIX LDAP client is called secldapclntd (sure, they could
make it more awkward, but the budget ran out.)  I'm running
into the issue detailed here:

http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

If an LDAP server fails to answer an LDAP query,
secldapclntd caches the non-answered query negatively. This
may happen if the LDAP server is down for example. After the
LDAP server is back again secldapclntd will use the negative
cache entry and the application initiating the original
query will still fail until the cache entry expires.

IBM is working on porting the fix to our specific TL and SP
levels.

What I'm concerned with here, though, is *why* is it timing
out?  I don't know what the current timeout values are (AIX
sucks, etc.)

I don't see timeout issues on my Linux boxes, which leads me
to believe that either the sssd timouts are longer or that
sssd is just more robust when dealing with timeouts.

I believe I'm seeing similar behavior with LDAP sudo on AIX
as well, because I occasionally have to re-run sudo commands
because they initially fail (and I know I'm using the right
passwords.)  However, sudo doesn't appear to have a cache
(or it handles caching better.)

Does anyone have any troubleshooting suggestions?  Any
general speed things up suggestions on the IPA side?

Thanks,

--Jason

-- 
The government is going to read our mail anyway, might as

well make it tough for them.  GPG Public key ID:  B6A1A7C6


___
Freeipa-users mailing list
Freeipa-users@redhat.com  mailto:Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users


Is the server FreeIPA?
Can see in the server logs what is actually happening is it
the server that really takes time or there is a network
connectivity issue or FW is dropping packets?
I would really start with the server side logs.


As far as 389 goes, run logconv.pl http://logconv.pl against
the access logs in /var/log/dirsrv/slapd-DOMAIN-COM



-- 
Thank you,

Dmitri Pal

Sr. Engineering Manager for IdM portfolio
Red Hat Inc.


---
Looking to carve out IT costs?
www.redhat.com/carveoutcosts/  http://www.redhat.com/carveoutcosts/




___
Freeipa-users mailing list
Freeipa-users@redhat.com  mailto:Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users



___
Freeipa-users mailing list
Freeipa-users@redhat.com mailto:Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users




-- 
The government is going to read our mail anyway, might as well

make it tough for them.  GPG Public key ID:  B6A1A7C6




--
The government is going to read our mail anyway, might as well make it 
tough for them.  GPG Public key ID:  B6A1A7C6


___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users

Re: [Freeipa-users] Timeout (?) issues

2013-09-19 Thread KodaK
This is ridiculous, right?

IPA server 1:

# for i in $(ls access*); do echo -n  $i:\  ;grep err=32 $i | wc -l; done
access: 248478
access.20130916-043207: 302774
access.20130916-123642: 272572
access.20130916-201516: 294308
access.20130917-081053: 295060
access.20130917-144559: 284498
access.20130917-231435: 281035
access.20130918-091611: 291165
access.20130918-154945: 275792
access.20130919-014322: 296113

IPA server 2:

access: 4313
access.20130909-200216: 4023
access.20130910-200229: 4161
access.20130911-200239: 4182
access.20130912-200249: 5069
access.20130913-200258: 3833
access.20130914-200313: 4208
access.20130915-200323: 4702
access.20130916-200332: 4532


IPA server 3:

access: 802
access.20130910-080737: 3876
access.20130911-080748: 3902
access.20130912-080802: 3678
access.20130913-080810: 3765
access.20130914-080826: 3524
access.20130915-080907: 4142
access.20130916-080916: 4930
access.20130917-080926: 4769
access.20130918-081005: 2879

IPA server 4:

access: 2812
access.20130910-003051: 4095
access.20130911-003105: 3623
access.20130912-003113: 3606
access.20130913-003125: 3581
access.20130914-003135: 3758
access.20130915-003150: 3935
access.20130916-003159: 4184
access.20130917-003210: 3859
access.20130918-003221: 5110


The vast majority of the err=32 messages are DNS entries.

Here are some samples:

[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 SRCH base=idnsName=xxx.com
,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com scope=0
filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169764 RESULT err=32 tag=101
nentries=0 etime=0

[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 SRCH base=idnsName=
slpoxacl01.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com
scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169774 RESULT err=32 tag=101
nentries=0 etime=0

[19/Sep/2013:18:19:51 -0500] conn=9 op=169770 SRCH base=idnsName=
sla400q1.unix.xxx.com,idnsname=unix.xxx.com,cn=dns,dc=unix,dc=xxx,dc=com
scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169770 RESULT err=32 tag=101
nentries=0 etime=0

[19/Sep/2013:18:19:51 -0500] conn=9 op=169772 SRCH base=idnsName=
magellanhealth.com,idnsname=unix.magellanhealth.com,cn=dns,dc=unix,dc=magellanhealth,dc=com
scope=0 filter=(objectClass=idnsRecord) attrs=ALL
[19/Sep/2013:18:19:51 -0500] conn=9 op=169772 RESULT err=32 tag=101
nentries=0 etime=0

So far today there are over half a million of these.  That can't be right.



On Thu, Sep 19, 2013 at 3:05 PM, KodaK sako...@gmail.com wrote:

 I didn't realize that DNS created one connection.  I thought it was one
 connection spanning several days.


 On Thu, Sep 19, 2013 at 2:51 PM, Rich Megginson rmegg...@redhat.comwrote:

  On 09/19/2013 12:57 PM, KodaK wrote:

 Well, this is awkward:

  [root@slpidml01 slapd-UNIX-xxx-COM]# grep conn=170902 access* | wc -l
 5453936
 [root@slpidml01 slapd-UNIX-xxx-COM]#


 Why is it awkward?




 On Thu, Sep 19, 2013 at 1:48 PM, KodaK sako...@gmail.com wrote:

 Thanks.  I've been running that against my logs, and this has to be
 abnormal:

  err=32   129274No Such Object
 err=0 10952Successful Operations
 err=14  536SASL Bind in Progress
 err=53   39Unwilling To Perform
 err=493Invalid Credentials (Bad Password)

  I'm still trying to figure out why there are so many error 32s.  Are
 there any usual suspects I should know about?  (That's just the current
 access log, btw.)


 On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson rmegg...@redhat.comwrote:

   On 09/16/2013 07:57 PM, Dmitri Pal wrote:

 On 09/16/2013 12:02 PM, KodaK wrote:

 Yet another AIX related problem:

  The AIX LDAP client is called secldapclntd (sure, they could make it
 more awkward, but the budget ran out.)  I'm running into the issue detailed
 here:

  http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

  If an LDAP server fails to answer an LDAP query, secldapclntd caches
 the non-answered query negatively. This may happen if the LDAP server
 is down for example. After the LDAP server is back again secldapclntd will
 use the negative cache entry and the application initiating the original
 query will still fail until the cache entry expires.

  IBM is working on porting the fix to our specific TL and SP levels.

  What I'm concerned with here, though, is *why* is it timing out?  I
 don't know what the current timeout values are (AIX sucks, etc.)

  I don't see timeout issues on my Linux boxes, which leads me to
 believe that either the sssd timouts are longer or that sssd is just more
 robust when dealing with timeouts.

  I believe I'm seeing similar behavior with LDAP sudo on AIX as well,
 because I occasionally have to re-run sudo commands because they initially
 fail (and I know I'm using the right passwords.)  However, sudo doesn't
 appear to have a cache (or it handles caching 

Re: [Freeipa-users] Timeout (?) issues

2013-09-19 Thread KodaK
Thanks.  I've been running that against my logs, and this has to be
abnormal:

err=32   129274No Such Object
err=0 10952Successful Operations
err=14  536SASL Bind in Progress
err=53   39Unwilling To Perform
err=493Invalid Credentials (Bad Password)

I'm still trying to figure out why there are so many error 32s.  Are there
any usual suspects I should know about?  (That's just the current access
log, btw.)


On Tue, Sep 17, 2013 at 9:01 AM, Rich Megginson rmegg...@redhat.com wrote:

  On 09/16/2013 07:57 PM, Dmitri Pal wrote:

 On 09/16/2013 12:02 PM, KodaK wrote:

 Yet another AIX related problem:

  The AIX LDAP client is called secldapclntd (sure, they could make it
 more awkward, but the budget ran out.)  I'm running into the issue detailed
 here:

  http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

  If an LDAP server fails to answer an LDAP query, secldapclntd caches
 the non-answered query negatively. This may happen if the LDAP server is down
 for example. After the LDAP server is back again secldapclntd will use
 the negative cache entry and the application initiating the original
 query will still fail until the cache entry expires.

  IBM is working on porting the fix to our specific TL and SP levels.

  What I'm concerned with here, though, is *why* is it timing out?  I
 don't know what the current timeout values are (AIX sucks, etc.)

  I don't see timeout issues on my Linux boxes, which leads me to believe
 that either the sssd timouts are longer or that sssd is just more robust
 when dealing with timeouts.

  I believe I'm seeing similar behavior with LDAP sudo on AIX as well,
 because I occasionally have to re-run sudo commands because they initially
 fail (and I know I'm using the right passwords.)  However, sudo doesn't
 appear to have a cache (or it handles caching better.)

  Does anyone have any troubleshooting suggestions?  Any general speed
 things up suggestions on the IPA side?

  Thanks,

  --Jason

  --
 The government is going to read our mail anyway, might as well make it
 tough for them.  GPG Public key ID:  B6A1A7C6


 ___
 Freeipa-users mailing 
 listFreeipa-users@redhat.comhttps://www.redhat.com/mailman/listinfo/freeipa-users


 Is the server FreeIPA?
 Can see in the server logs what is actually happening is it the server
 that really takes time or there is a network connectivity issue or FW is
 dropping packets?
 I would really start with the server side logs.


 As far as 389 goes, run logconv.pl against the access logs in
 /var/log/dirsrv/slapd-DOMAIN-COM



 --
 Thank you,
 Dmitri Pal

 Sr. Engineering Manager for IdM portfolio
 Red Hat Inc.


 ---
 Looking to carve out IT costs?www.redhat.com/carveoutcosts/



 ___
 Freeipa-users mailing 
 listFreeipa-users@redhat.comhttps://www.redhat.com/mailman/listinfo/freeipa-users



 ___
 Freeipa-users mailing list
 Freeipa-users@redhat.com
 https://www.redhat.com/mailman/listinfo/freeipa-users




-- 
The government is going to read our mail anyway, might as well make it
tough for them.  GPG Public key ID:  B6A1A7C6
___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users

Re: [Freeipa-users] Timeout (?) issues

2013-09-17 Thread Rich Megginson

On 09/16/2013 07:57 PM, Dmitri Pal wrote:

On 09/16/2013 12:02 PM, KodaK wrote:

Yet another AIX related problem:

The AIX LDAP client is called secldapclntd (sure, they could make it 
more awkward, but the budget ran out.)  I'm running into the issue 
detailed here:


http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

If an LDAP server fails to answer an LDAP query, secldapclntd caches 
the non-answered query negatively. This may happen if the LDAP server 
is down for example. After the LDAP server is back again secldapclntd 
will use the negative cache entry and the application initiating the 
original query will still fail until the cache entry expires.


IBM is working on porting the fix to our specific TL and SP levels.

What I'm concerned with here, though, is *why* is it timing out?  I 
don't know what the current timeout values are (AIX sucks, etc.)


I don't see timeout issues on my Linux boxes, which leads me to 
believe that either the sssd timouts are longer or that sssd is just 
more robust when dealing with timeouts.


I believe I'm seeing similar behavior with LDAP sudo on AIX as well, 
because I occasionally have to re-run sudo commands because they 
initially fail (and I know I'm using the right passwords.)  However, 
sudo doesn't appear to have a cache (or it handles caching better.)


Does anyone have any troubleshooting suggestions?  Any general speed 
things up suggestions on the IPA side?


Thanks,

--Jason

--
The government is going to read our mail anyway, might as well make 
it tough for them.  GPG Public key ID:  B6A1A7C6



___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users


Is the server FreeIPA?
Can see in the server logs what is actually happening is it the server 
that really takes time or there is a network connectivity issue or FW 
is dropping packets?

I would really start with the server side logs.


As far as 389 goes, run logconv.pl against the access logs in 
/var/log/dirsrv/slapd-DOMAIN-COM



--
Thank you,
Dmitri Pal

Sr. Engineering Manager for IdM portfolio
Red Hat Inc.


---
Looking to carve out IT costs?
www.redhat.com/carveoutcosts/




___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users


___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users

[Freeipa-users] Timeout (?) issues

2013-09-16 Thread KodaK
Yet another AIX related problem:

The AIX LDAP client is called secldapclntd (sure, they could make it more
awkward, but the budget ran out.)  I'm running into the issue detailed here:

http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

If an LDAP server fails to answer an LDAP query, secldapclntd caches
the non-answered
query negatively. This may happen if the LDAP server is down for example.
After the LDAP server is back again secldapclntd will use the negative
cache entry and the application initiating the original query will still
fail until the cache entry expires.

IBM is working on porting the fix to our specific TL and SP levels.

What I'm concerned with here, though, is *why* is it timing out?  I don't
know what the current timeout values are (AIX sucks, etc.)

I don't see timeout issues on my Linux boxes, which leads me to believe
that either the sssd timouts are longer or that sssd is just more robust
when dealing with timeouts.

I believe I'm seeing similar behavior with LDAP sudo on AIX as well,
because I occasionally have to re-run sudo commands because they initially
fail (and I know I'm using the right passwords.)  However, sudo doesn't
appear to have a cache (or it handles caching better.)

Does anyone have any troubleshooting suggestions?  Any general speed
things up suggestions on the IPA side?

Thanks,

--Jason

-- 
The government is going to read our mail anyway, might as well make it
tough for them.  GPG Public key ID:  B6A1A7C6
___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users

Re: [Freeipa-users] Timeout (?) issues

2013-09-16 Thread Dmitri Pal
On 09/16/2013 12:02 PM, KodaK wrote:
 Yet another AIX related problem:

 The AIX LDAP client is called secldapclntd (sure, they could make it
 more awkward, but the budget ran out.)  I'm running into the issue
 detailed here:

 http://www-01.ibm.com/support/docview.wss?uid=isg1IV11344

 If an LDAP server fails to answer an LDAP query, secldapclntd caches
 the non-answered query negatively. This may happen if the LDAP server
 is down for example. After the LDAP server is back
 again secldapclntd will use the negative cache entry and the
 application initiating the original query will still fail until the
 cache entry expires.

 IBM is working on porting the fix to our specific TL and SP levels.

 What I'm concerned with here, though, is *why* is it timing out?  I
 don't know what the current timeout values are (AIX sucks, etc.)

 I don't see timeout issues on my Linux boxes, which leads me to
 believe that either the sssd timouts are longer or that sssd is just
 more robust when dealing with timeouts.

 I believe I'm seeing similar behavior with LDAP sudo on AIX as well,
 because I occasionally have to re-run sudo commands because they
 initially fail (and I know I'm using the right passwords.)  However,
 sudo doesn't appear to have a cache (or it handles caching better.)

 Does anyone have any troubleshooting suggestions?  Any general speed
 things up suggestions on the IPA side?

 Thanks,

 --Jason

 -- 
 The government is going to read our mail anyway, might as well make it
 tough for them.  GPG Public key ID:  B6A1A7C6


 ___
 Freeipa-users mailing list
 Freeipa-users@redhat.com
 https://www.redhat.com/mailman/listinfo/freeipa-users

Is the server FreeIPA?
Can see in the server logs what is actually happening is it the server
that really takes time or there is a network connectivity issue or FW is
dropping packets?
I would really start with the server side logs.


-- 
Thank you,
Dmitri Pal

Sr. Engineering Manager for IdM portfolio
Red Hat Inc.


---
Looking to carve out IT costs?
www.redhat.com/carveoutcosts/



___
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users