Re: After update slapd 2.5.13->2.5.14, dynlist memberOf not working anymore

2023-03-20 Thread Paul B. Henson
On Mon, Mar 20, 2023 at 10:28:46AM +0100, Andreas Ladanyi wrote:

> ldapsearch memberOf=uid=idm,ou=group,dc=cpp,dc=edu

Yes.

$ ldapsearch memberOf=uid=idm,ou=group,dc=cpp,dc=edu
# extended LDIF  
#  
# LDAPv3   
# base  (default) with scope subtree
# filter: memberOf=uid=idm,ou=group,dc=cpp,dc=edu
# requesting: ALL
#

# henson, user, cpp.edu
dn: uid=henson,ou=user,dc=cpp,dc=edu
[...]


Re: After update slapd 2.5.13->2.5.14, dynlist memberOf not working anymore

2023-03-18 Thread Paul B. Henson
On Fri, Mar 17, 2023 at 09:27:48AM +0100, Andreas Ladanyi KIT wrote:

> dynlist-attrset labeledURIObject labeledURI memberOf+member@groupOfNames

My config is set to:

dynlist-attrset groupOfURLs memberURL member+memberOf@groupOfNames

> ldapsearch  -H ldap://LDAP_Server -s sub  -b BASE_DN  '(uid=username)' 
> memberOf

I don't have any labeledURI associated with users, and it works fine:

# ldapsearch uid=henson memberOf
[...]
dn: uid=henson,ou=user,dc=cpp,dc=edu
memberOf: uid=idm,ou=group,dc=cpp,dc=edu
memberOf: uid=iit,ou=group,dc=cpp,dc=edu


Re: dynlist vs memberof performance issues

2023-01-11 Thread Paul B. Henson
On Tue, Jan 10, 2023 at 01:41:58PM +0100, Ondřej Kuzník wrote:

> The latest manpage update should make it clearer how dynamic *lists*
> differ from dynamic *groups*. And yes, no need to change config for
> groups.

Cool, thank you both for the clarification.

> If you can give the current 2.5/2.6 branch a go, or if you decide to
> wait until we get a call for testing out, feedback is always welcome.

Will do, thanks again...


Re: dynlist vs memberof performance issues

2023-01-09 Thread Paul B. Henson
On Mon, Jan 09, 2023 at 09:26:44AM -0600, Shawn McKinney wrote:

> Yes (both)

Sweet :). We've had some performance issues with the new dynlist
implementation since we upgraded to 2.5, so I look forward to trying
this out.

We're currently using dynlist to add the memberOf attribute to users.
One thing that was nice about the new dynlist implementation in 2.5 is
that it allows searching on the dynamic memberOf attribute, which we
couldn't do in 2.4. Looking at the commit diff, there's one part that
says:

"To enable filter evaluation on the dynamic list, the configuration must
be changed to explicitly map the dynamic attributes to be filtered"

This just applies to groups created dynamically, right, not static
objects that get dynamic attributes added? Our current config is:

dynlist-attrset groupOfURLs memberURL member+memberOf@groupOfNames

It doesn't require any changes to keep working with a searchable
memberOf attribute?

Thanks much...


Re: dynlist vs memberof performance issues

2023-01-07 Thread Paul B. Henson
On Tue, Jan 03, 2023 at 11:44:30AM -0600, Shawn McKinney wrote:

> Some work on the dynlist code of late to boost performance.
> 
> So, it might be worthwhile to give it a try. MR576[1] in the next release.

Is this going to hit the 2.5 LTS train or just 2.6?

Thanks...


Re: Antw: [EXT] Re: SSSD looking for password policy: "unrecognized control"

2022-11-02 Thread Paul B. Henson
On Wed, Nov 02, 2022 at 08:48:39AM +0100, Ulrich Windl wrote:

> For some strange reason sssd starts to query the sudo schema, even if it was
> not configured on the server, typically flooding the logs with invalid

That's the default. You need to add "sudo_provider = none" to the sssd
config under the domain section that's using ldap as an id_provider.
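
For example, something along these lines in sssd.conf (the domain name
here is just a placeholder):

[domain/example.com]
id_provider = ldap
sudo_provider = none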


Re: dynlist vs memberof performance issues

2022-07-26 Thread Paul B. Henson

On 7/26/2022 1:56 AM, Wilkinson, Hugo (IT Dept) wrote:
If you have the ability to do so and your kernel is v5+, experiment
out of hours with disabling system swap entirely (vm.swappiness=0 and
'swapoff -a'), then simulate a run of the requests you'd expect to
encounter the page faults with and see if it stops happening.


We used to run our LDAP servers under Gentoo, with a 5.4 kernel at the
time, but are currently using Rocky Linux 8, which comes with a
modified 4.18 kernel.


I did actually run a production system with no swap for a while, it 
still occasionally had slow responses for queries requesting the 
memberOf attribute.


Part of the problem is that I never know when to expect these slow
response times 8-/. As of yet I have not figured out what is going on
when they are slow, and I definitely don't have a reproducible query
test case to make it happen.


We were running 2.4 under rocky for a while before upgrading to 2.5 and 
never saw this problem, so I'm pretty sure it's the result of some 
change in the dynlist implementation.


Re: dynlist vs memberof performance issues

2022-07-25 Thread Paul B. Henson

On 7/25/2022 10:38 AM, Shawn McKinney wrote:


As you (and others) have pointed out, there's a significant
performance penalty for searching attributes generated by dynlist.


I'm still seeing performance issues with queries that simply return 
memberOf, with no reference to it in the actual search filter.


For example, this query which searches on the static uid attribute and 
returns memberOf:


time ldapsearch -H ldapi:/// uid=henson memberOf

Most of the time it completes in fractions of a second:

real    0m0.187s
user    0m0.005s
sys     0m0.003s

But sometimes it takes 5 seconds, 10 seconds, or even more. These
extremely slow response times correlate with a high read I/O percentage
on the server and a high number of page faults on the slapd process.


When I first deployed 2.5, sometimes the server would get into a state
where every query that requested memberOf would take in excess of 30
seconds to return until the server was restarted. I cranked up the
memory on the servers, and at this point I have had no more recurrences
of that behavior, but I am still seeing occasional slow performance on
the queries along with high read I/O percentages.


The servers have way more memory now than they should need to fully 
cache the entire database:


# du -sh /var/symas/openldap-data-cpp
2.6G    /var/symas/openldap-data-cpp

# free -m
              total        used        free      shared  buff/cache   available
Mem:           4818         703         124           0        3991        3831
Swap:          2047         136        1911

I haven't been able to correlate the slow response times with any other 
external criteria such as updates or query load. Sometimes it's just 
slow 8-/. We never saw this problem under 2.4, which used the previous
implementation of dynlist to generate memberOf. I definitely appreciate
the ability to query on dynamic memberOf that was added in 2.5, but it 
would be nice to sort out this performance issue.
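
One thing I plan to keep an eye on is whether the resident portion of
the data.mdb mappings shrinks before an episode; a rough way to watch
that (assuming the procps pmap and a single slapd process):

# show how much of each data.mdb mapping is actually resident (the RSS column)
watch -n 60 'pmap -x $(pidof slapd) | grep data.mdb'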


Re: Antw: [EXT] Re: slow memberOf queries in 2.5 with dynlist overlay

2022-06-03 Thread Paul B. Henson

On 5/30/2022 11:53 PM, Ulrich Windl wrote:


I wonder: Would forcing a core dump help in that situation, or would a
gprof-compile be helpful?


Unfortunately I can't reproduce it; it just happens randomly on my 
production boxes, less frequently since I increased the amount of memory 
on them. I'm not sure I could run a debug or profiling build in 
production for an extended amount of time waiting for it to happen. And 
with my luck it would be a heisenbug anyway 8-/.



Maybe one could track down the page fault to some code (function) using tools
from the valgrind family (Cachegrind, callgrind, etc.)
http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.valgrind-monitor-commands


Re: slow memberOf queries in 2.5 with dynlist overlay

2022-05-30 Thread Paul B. Henson
On Mon, May 30, 2022 at 02:38:11PM +0100, Howard Chu wrote:

> Let us know how things go.

Arg. Seems to have been a red herring. Blew up again with swappiness set
to 1, and then again with swap completely disabled :(. Usual symptoms of
crazy high disk reads:

Total DISK READ :     389.05 M/s | Total DISK WRITE :       3.93 K/s
Actual DISK READ:     391.50 M/s | Actual DISK WRITE:       0.00 B/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 577547 be/4  ldap      36.88 M/s   0.00 B/s  0.00 % 97.92 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 577546 be/4  ldap      32.27 M/s   0.00 B/s  0.00 % 97.88 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 575034 be/4  ldap      29.47 M/s   0.00 B/s  0.00 % 97.72 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 572838 be/4  ldap      27.38 M/s   0.00 B/s  0.00 % 97.66 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 575308 be/4  ldap      24.47 M/s   0.00 B/s  0.00 % 97.50 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 572866 be/4  ldap      91.55 M/s   0.00 B/s  0.00 % 97.33 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 572841 be/4  ldap      26.96 M/s   0.00 B/s  0.00 % 96.87 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 572836 be/4  ldap      43.90 M/s   0.00 B/s  0.00 % 96.84 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 577508 be/4  ldap      76.17 M/s   0.00 B/s  0.00 % 95.96 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap

Even though there's plenty of memory:

              total        used        free      shared  buff/cache   available
Mem:           3901         944         109           1        2847        2715
Swap:             0           0           0

Looking at the lmdb mapping:

7f662ea25000 5242880  325560   0 rw-s- data.mdb
7f676ec26000 2097152   0   0 rw-s- data.mdb

There seem to be fewer pages mapped in than on one that isn't blowing up:

7f6ab1606000 5242880  560712   0 rw-s- data.mdb
7f6bf1807000 2097152  120772   0 rw-s- data.mdb

Memory use is similar:

              total        used        free      shared  buff/cache   available
Mem:           3896         725         156           0        3014        2893
Swap:          2047         127        1920


The one that's unhappy is generating a lot of page faults:

ldap-02 ~ # ps -o min_flt,maj_flt 572833; sleep 10; ps -o min_flt,maj_flt 572833
 MINFL  MAJFL
11924597 3715970
 MINFL  MAJFL
11931358 3718833
ldap-02 ~ # ps -o min_flt,maj_flt 572833; sleep 10; ps -o min_flt,maj_flt 572833
 MINFL  MAJFL
11949883 3726966
 MINFL  MAJFL
11957081 3730080

Compared to the one that's working properly, which has none:

ldap-01 ~ # ps -o min_flt,maj_flt 1227; sleep 10; ps -o min_flt,maj_flt 1227
 MINFL  MAJFL
1282224 221928
 MINFL  MAJFL
1282224 221928
ldap-01 ~ # ps -o min_flt,maj_flt 1227; sleep 10; ps -o min_flt,maj_flt 1227
 MINFL  MAJFL
1282225 221928
 MINFL  MAJFL
1282225 221928

But why? Arg. All the slow queries are asking for memberOf:

May 30 21:54:25 ldap-02 slapd[572833]: conn=120576 op=1 SRCH 
base="ou=user,dc=cpp,dc=edu" scope=2 deref=3 
filter="(&(objectClass=person)(calstateEduPersonEmplID=014994057))"
May 30 21:54:25 ldap-02 slapd[572833]: conn=120576 op=1 SRCH attr=memberOf
May 30 21:56:59 ldap-02 slapd[572833]: conn=120576 op=1 SEARCH RESULT tag=101 
err=0 qtime=0.16 etime=154.273556 nentries=1 text=

There's something going on with the dynlist overlay and memberOf
queries, but I still can't figure out what. It's not a low-on-memory
issue; there's plenty of free memory. But for some reason the read I/O
goes through the roof. I'm pretty sure it has the same query load while
it's freaking out as it did when it was running fine.
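
To get a better handle on exactly when an episode starts, I may also
log slapd's cumulative read counters over time; a rough sketch
(assuming a single slapd process, and pidstat from the sysstat
package):

# cumulative bytes slapd has read from storage since it started
cat /proc/$(pidof slapd)/io
# per-process read/write rates, sampled every 5 seconds
pidstat -d -p $(pidof slapd) 5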


Re: slow memberOf queries in 2.5 with dynlist overlay

2022-05-29 Thread Paul B. Henson
On Sun, Feb 13, 2022 at 08:00:29PM -0800, Paul B. Henson wrote:
> I'm still trying to figure out why my servers sometimes get into a state
> where queries requesting the memberOf attribute take an exceedingly long

So one of my servers got into this state again:

Total DISK READ :      89.60 M/s | Total DISK WRITE :     241.97 K/s
Actual DISK READ:      91.61 M/s | Actual DISK WRITE:     140.50 K/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
 430373 be/4  ldap      34.88 M/s   0.00 B/s  0.00 % 96.44 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 430064 be/4  ldap      45.04 M/s   0.00 B/s  0.00 % 94.93 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap
 430069 be/4  ldap       6.34 M/s   0.00 B/s  0.00 %  8.22 % slapd -d 0 -h ldap:/// ~dapi:/// -u ldap -g ldap

There's plenty of free memory:

ldap-02 ~ # free -m
              total        used        free      shared  buff/cache   available
Mem:           3901         418         113           0        3368        3255
Swap:          2047         763        1284

Just for giggles, I removed all swap:

              total        used        free      shared  buff/cache   available
Mem:           3901         730         102           1        3068        2949
Swap:             0           0           0

The problem immediately went away. Didn't restart slapd, didn't do anything,
other than remove all swap and force it to use the plethora of memory
it had available. Disk read went back to virtually 0, response time went
back to subsecond.

I updated vm.swappiness to 1 (it defaulted to 30) and added swap back;
I'm going to see what happens.
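
For reference, the change amounts to something like this (the sysctl.d
file name is arbitrary):

sysctl -w vm.swappiness=1
echo 'vm.swappiness = 1' > /etc/sysctl.d/99-swappiness.conf   # persist across reboots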

I have no idea what's causing it, but it seems the system and slapd
somehow get into a state where memberOf queries cause it to read lmdb
pages from disk rather than keeping them in memory?


Re: dynlist vs memberof performance issues

2022-05-21 Thread Paul B. Henson

On 5/11/2022 3:48 AM, Soisik Froger wrote:

Are these performance issues an expected side-effect of switching to
dynlist - as the memberOf attributes are now dynamically calculated,
while the memberOf overlay used to write these attributes - or


I am also having ongoing sporadic issues with memberOf performance
using the new dynlist overlay. Initially, a server would randomly get
into a state where any query requesting the memberOf attribute would
take in excess of 30 seconds, whereas normally it would only take a
fraction of a second. The symptoms were the same: free memory, no
swapping, but insanely high read I/O load.


I cranked up the memory, which did not resolve the issue but did help;
it doesn't happen nearly as often. But still, every now and again, a
server demonstrates a high read I/O rate and severely degraded memberOf
query performance. At this point, I just have a monitoring check that
alerts on slow query performance and high read I/O, and I restart the
servers when it happens, as the additional memory made the issue go
from a couple of times a week to every month or three.


I did notice that when the issue occurs, the box with the slow queries
does have less memory available than when it is working normally, but
there is still a good gigabyte of free memory not being used.


Even when the systems don't completely blow up, there are occasional 
slower than normal queries. Typically the test query I am doing 
literally takes fractions of a second:


May 21 19:47:22 ldap-01 slapd[1223]: conn=849157 op=1 SEARCH RESULT 
tag=101 err=0 qtime=0.15 etime=0.198042 nentries=1 text=


Every now and again for no discernible reason it might take 5 to 10 seconds.


Re: slow memberOf queries in 2.5 with dynlist overlay

2022-02-16 Thread Paul B. Henson

On 2/16/2022 11:25 AM, Howard Chu wrote:

Use perf (formerly oprofile) and get a trace of slapd's execution
during one of these searches. Also get a trace when it is performing
normally, for comparison.


Cool, thanks much. Now I've just got to wait until it happens again;
given that I'm ready to investigate it properly this time, it will
probably work fine for a while :).
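
My rough plan for when it next happens is something like the following
(a sketch; the 99 Hz sample rate and 60 second window are arbitrary,
and exact perf options may vary by version):

# sample slapd's call stacks while a slow memberOf query is in flight
perf record -F 99 -g -p $(pidof slapd) -- sleep 60
perf report --stdio > slapd-slow.txt
# then repeat the same capture while it is behaving normally, for comparison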


Re: Antw: [EXT] Re: slow memberOf queries in 2.5 with dynlist overlay

2022-02-16 Thread Paul B. Henson

On 2/16/2022 3:11 AM, Ulrich Windl wrote:


"man pmap", maybe?


You must be using a different version of pmap than me? My line came from 
pmap, which on my system emits the following when given no options:


ldap-01 ~ # pmap 1207 

1207:   /opt/symas/lib/slapd -d 0 -h ldap:/// ldaps:/// ldapi:/// -u 
ldap -g ldap
556f4f39   1328K r-x-- slapd 

556f4f6dc000 16K r slapd 

556f4f6e 36K rw--- slapd 

556f4f6e9000348K rw---   [ anon ] 



Ah, or perhaps you have pmap aliased to 'pmap -x'?

ldap-01 ~ # pmap -x 1207
1207:   /opt/symas/lib/slapd -d 0 -h ldap:/// ldaps:/// ldapi:/// -u 
ldap -g ldap

Address          Kbytes    RSS  Dirty Mode  Mapping
556f4f39           1328    604      0 r-x-- slapd
556f4f6dc000         16     12      8 r---- slapd
556f4f6e             36     16     16 rw--- slapd

I did neglect to look at the pmap man page, d'oh 8-/. I did a Google
search looking for how to get that information but found no results,
and there it was right in front of my face as an option to the command
I was already using. I apologize; my group has shrunk from three
members down to one and I am ridiculously overloaded and somewhat
thrashing and flailing 8-/.


So it appears that right now there is about 720M mapped of the primary 
mdb, and about 171M mapped for the accesslog:


7fece8cfb000 5242880  738648   0 rw-s- data.mdb
7fee28efc000 2097152  175864   0 rw-s- data.mdb

The next time the problem occurs I will take a look at this and see if 
it has changed or is doing anything different.


Thanks for the pointer :).


# pmap $$
21561:   -bash
Address          Kbytes    RSS    PSS  Dirty   Swap Mode  Mapping
558d2476d000        964    904    904      0      0 r-xp- /usr/bin/bash
558d24a5e000          8      8      8      8      0 r--p- /usr/bin/bash
558d24a6             16     16     16     16      0 rw-p- /usr/bin/bash
558d24a64000         56     44     44     44      0 rw-p-   [ anon ]
558d25ff7000       4464   4384   4384   4384      0 rw-p-   [ anon ]
7f1ca39a8000       2528    192     35      0      0 r--p- /usr/lib/locale/en_US.utf8/LC_COLLATE
7f1ca3c2             28     28      0      0      0 r-xp- /lib64/libnss_compat-2.31.so
7f1ca3c27000       2048      0      0      0      0 ---p- /lib64/libnss_compat-2.31.so
7f1ca3e27000          4      4      4      4      0 r--p- /lib64/libnss_compat-2.31.so
7f1ca3e28000          4      4      4      4      0 rw-p- /lib64/libnss_compat-2.31.so
7f1ca3e29000        152    152     38      0      0 r-xp- /lib64/libtinfo.so.6.1
7f1ca3e4f000       2044      0      0      0      0 ---p- /lib64/libtinfo.so.6.1
[...]



Thanks…






Re: Antw: [EXT] Re: slow memberOf queries in 2.5 with dynlist overlay

2022-02-16 Thread Paul B. Henson

On 2/16/2022 3:08 AM, Ulrich Windl wrote:


Remember there are some classic tools like sar, vmstat, iostat, etc.
to display or store some interesting information about what the OS is


Like I mentioned, the OS doesn't appear to be doing anything interesting 
when this has occurred. There's plenty of free memory, there is no swap 
paging, and CPU utilization isn't particularly high.



iotop is another nice utility.


Yes, I was using that; the only abnormal behavior when the problem
occurred was extremely high read I/O by the slapd process. Normally,
read I/O is virtually nil, jumping up to a MB/s or two occasionally.
When the memberOf query is taking excessive time, there is continuous
200-400 MB/s read I/O being performed by slapd. Despite the high read
I/O, all queries that don't involve memberOf continue to operate at
their usual speed; just the queries that return memberOf are degraded.


That's why it feels to me like an issue somehow with slapd, not the OS.


You didn't tell what hypervisor you are using, but did you know that
Xen PVMs support "memory hotplugging"?

We are using VMware ESXi, which also supports memory hot plugging.
Although they do something weird where if you have less than 3G of RAM
allocated they only let you hot plug up to 3G. If you have more than 3G
allocated, you can hot plug up to the max. There is some knowledge base
article about it, ah, here it is:


https://kb.vmware.com/s/article/2008405

It claims that Linux "freezes" if you increase from less than 3G to more 
than 3G? I can believe that maybe that occurred once upon a time due to 
a bug, but find it hard to believe that it still does. It seems like 
VMware implemented a kludge workaround and has never removed it...


Yup. I just bumped the memory from 2GB to 8GB on an instance running
under KVM, no problem. Stupid VMware. Hmm, nifty, KVM even lets you
dynamically remove memory, another thing VMware doesn't let you do.


Thanks…


Re: slow memberOf queries in 2.5 with dynlist overlay

2022-02-15 Thread Paul B. Henson

On 2/15/2022 1:57 AM, Ondřej Kuzník wrote:


- if, to answer that query, you need to crawl a large part of the DB,
   the OS will have to page that part into memory


Do you know if there's any way to tell which pages of a memory mapped 
file are actually in memory at any given time? The process map just 
shows 5G (the max size I currently have configured for the database):


7fece8cfb000 5242880K rw-s- data.mdb

Thanks…


Re: slow memberOf queries in 2.5 with dynlist overlay

2022-02-15 Thread Paul B. Henson

On 2/15/2022 1:57 AM, Ondřej Kuzník wrote:


- your DB is just over the size of available RAM by itself


Yes, but that size includes not just the data itself, but all of the 
indexes as well, right?



- after a while using the system, other processes (and slapd) will carve
   out a fair amount of it that the system will be unwilling/unable to
   page out


Yes. But that is not currently the case. Here is a slapd process on one 
of our nodes that has been up about a week and a half:


ldap      1207     1  9 Feb04 ?      1-01:46:47 /opt/symas/lib/slapd -d 0 -h ldap:/// ldaps:/// ldapi:/// -u ldap -g ldap


Its resident set is a bit less than a gigabyte:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   1207 ldap      20   0 8530708 954688 829836 S  28.1  47.8   1546:40 slapd


While unused (i.e. wasted) memory is only 82M, the amount of memory in
use by buffer/cache that the system would be willing to give up at any
time is more than a gigabyte:


              total        used        free      shared  buff/cache   available
Mem:           1949         413          82           0        1453        1382
Swap:          1023         295         728

When the problem occurs, there isn't a memory shortage. There is still
free memory. Nothing is getting paged in or out; the only I/O is
application reads, not system swap.



- if, to answer that query, you need to crawl a large part of the DB,
   the OS will have to page that part into memory; at the beginning,
   there is enough RAM to do it all just once, but later, once you've
   reached a threshold, it needs to page bits in and then drop them
   again to fetch others, and you develop these symptoms - lots of read
   I/O and a delay in processing


Intuitively that does sound like a good description of the problem I'm 
having. But the only thing that takes a long time is returning the 
memberOf attribute. When queries requesting that are taking more than 30 
seconds or even minutes to respond, all other queries remain 
instantaneous. It seems unlikely that under memory pressure the only 
queries that would end up having to page out stuff and be degraded would 
be those? Every other query just happens to have what it needs still in 
memory?



Figure out what is involved in that search and see if you can tweak


It's not a very complicated query:

# ldapsearch -x -H ldapi:/// uid=henson memberOf
[...]
dn: uid=henson,ou=user,dc=cpp,dc=edu 

memberOf: uid=idm,ou=group,dc=cpp,dc=edu 

memberOf: uid=iit,ou=group,dc=cpp,dc=edu 



If I understand correctly, this just needs to access the index on uid to 
find my entry, and then the dynlist module presumably does something 
like this:


# ldapsearch -x -H ldapi:/// member=uid=henson,ou=user,dc=cpp,dc=edu dn
[...]
# cppnet, group, cpp.edu
dn: uid=cppnet,ou=group,dc=cpp,dc=edu
# cppnet-admin, group, cpp.edu
dn: uid=cppnet-admin,ou=group,dc=cpp,dc=edu

This just needs to access the index on member to find all of the group
objects, which in my case is 36.


So it only needs to have two indexes and 37 objects in memory to perform 
quickly, right?


When performance on memberOf queries is degraded, this takes more than 
30 seconds to run. Every single time. I could run it 20 times in a row 
and it always takes more than 30 seconds. If it was a memory issue, you 
would think that at least some of the queries would get lucky and the 
pages needed would be in memory, given they had just been accessed 
moments before?


I can certainly just throw memory at it and hope the problem goes away. 
But based on the observations when it occurs it does not feel like just 
a memory problem. The last time it happened I pulled the node out of the 
load balancer so nothing else was poking at it and the test query was 
still taking more than 30 seconds.


I'm going to bump the production nodes up to 4G, which should be more 
than enough to run the OS and always have the entire database plus all 
indexes in memory. I will keep my fingers crossed this problem just goes 
away, but if it doesn't, what else can I do when it occurs to help track 
it down?


Thanks much…


Re: Antw: [EXT] Re: slow memberOf queries in 2.5 with dynlist overlay

2022-02-15 Thread Paul B. Henson

On 2/14/2022 11:55 PM, Ulrich Windl wrote:


Independent of LDAP my guess is that 2GB is somewhat tight these
days, and my guess is that it's virtual machine.


It is a virtual machine indeed. I generally try to minimize the memory 
allocated to my linux instances to make up for the bloated Windows 
instances that are sucking up 32-64GB on the same virtualization 
infrastructure 8-/.


I can certainly just throw memory at it and hope the problem goes away, 
but I will discuss a bit more why I don't think it is memory in my reply 
to Ondřej.


Thanks…


Re: slow memberOf queries in 2.5 with dynlist overlay

2022-02-14 Thread Paul B. Henson
On Mon, Feb 14, 2022 at 09:57:06AM -0800, Quanah Gibson-Mount wrote:

> How large is the total size of all mdb databases?

According to mdb_stat the primary db is:

  Page size: 4096
  Number of pages used: 500651

4 KB * 500651 = 2002604 KB

which lines up with du:

# du -k data.mdb 
2002604 data.mdb

Then the accesslog db is:

  Page size: 4096
  Number of pages used: 74753

# du -k data.mdb 
299016  data.mdb

So a total of a bit over 2GB.

> How much RAM do you have on the system?

2GB. I don't think I'm running low on memory, there's usually a bit
free:

# free -m
              total        used        free      shared  buff/cache   available
Mem:           1949         423          79           0        1446        1372
Swap:          1023         291         732

And when it's being really slow for memberOf there's no swapping or
other signs of low memory. Other queries are still very responsive;
it's just memberOf that gets slow. The only metric that ramps up when
it occurs is read I/O.

> Is it possible that you've exceeded the index slot size?

I'm not sure. How would I check that? The issue only happens occasionally
though, with generally the same load and query set. If it was a problem
like that I would think it would happen all the time.

Is there anything I could poke with mdb_stat or the slapd monitor interface
the next time it happens to try and narrow down the root cause?
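
In case it helps, what I had in mind was roughly the following (the
database path is from my layout, and the second command assumes the
monitor backend is enabled and readable by whatever identity I bind as):

# environment info, freelist and per-database page counts for the primary mdb
mdb_stat -e -f -a /var/symas/openldap-data-cpp
# current connection/operation counters from the monitor backend
ldapsearch -x -H ldapi:/// -b cn=monitor '(objectClass=*)' '*' '+'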

Thanks much...


Re: slapo-memberof -> slapo-dynlist and value-based ACLs

2022-02-14 Thread Paul B. Henson

On 2/14/2022 12:43 PM, Michael Ströder wrote:

Thus I have ACLs like this, which don't work for these clients (lines
wrapped):


I'm not sure if you are asking whether the slapo-dynlist memberOf 
implementation supports ACLs in general, or specifically the type of ACL 
you are trying to use?


We are currently using the slapo-dynlist module for memberOf:

dynlist-attrset groupOfURLs memberURL member+memberOf@groupOfNames

and the following ACL appears to work properly:

access to dn.children="ou=user,dc=cpp,dc=edu" attrs=memberOf
        by self read
        by group.exact="cn=member-readers,ou=group,ou=service,dc=cpp,dc=edu" read
        by * none


slow memberOf queries in 2.5 with dynlist overlay

2022-02-13 Thread Paul B. Henson
I'm still trying to figure out why my servers sometimes get into a state
where queries requesting the memberOf attribute take an exceedingly long
time to process, for example:

Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 fd=104 ACCEPT from 
PATH=/var/symas/run/ldapi (PATH=/var/symas/run/ldapi)
Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 op=0 BIND 
dn="cn=ldaproot,dc=cpp,dc=edu" method=128
Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 op=0 BIND 
dn="cn=ldaproot,dc=cpp,dc=edu" mech=SIMPLE bind_ssf=0 ssf=71
Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 op=0 RESULT tag=97 err=0 
qtime=0.06 etime=0.50 text=
Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 op=1 SRCH 
base="dc=cpp,dc=edu" scope=2 deref=0 filter="(uid=henson)"
Feb 13 19:46:05 ldap-02 slapd[13564]: conn=297643 op=1 SRCH attr=memberOf
Feb 13 19:46:42 ldap-02 slapd[13564]: conn=297643 op=1 SEARCH RESULT tag=101 
err=0 qtime=0.12 etime=36.955710 nentries=1 text=

How is the qtime calculated? It is nice and short, despite the wall clock
reading over 30 seconds :(.

Usually I have to reboot the server completely to clear this up, but this
time I just had to restart it, and then the queries were lickety-split
again:

Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 fd=40 ACCEPT from 
PATH=/var/symas/run/ldapi (PATH=/var/symas/run/ldapi)
Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=0 BIND 
dn="cn=ldaproot,dc=cpp,dc=edu" method=128
Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=0 BIND 
dn="cn=ldaproot,dc=cpp,dc=edu" mech=SIMPLE bind_ssf=0 ssf=71
Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=0 RESULT tag=97 err=0 
qtime=0.09 etime=0.88 text=
Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=1 SRCH base="dc=cpp,dc=edu" 
scope=2 deref=0 filter="(uid=henson)"
Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=1 SRCH attr=memberOf
Feb 13 19:55:01 ldap-02 slapd[89655]: conn=1556 op=1 SEARCH RESULT tag=101 
err=0 qtime=0.13 etime=0.213149 nentries=1 text=

Over 30 seconds elapsed time to .2 seconds? I'd like to see the .2 all
the time :).

When the server gets like this, there's a very high read I/O load,
200-300 MB/s, compared to generally less than 1 MB/s when things are
working right.

It almost seems like it's doing a full disk scan searching for members
every time.

Any suggestions on what to dig into?

Thanks...


Re: log analysis tools

2022-02-07 Thread Paul B. Henson

On 2/6/2022 11:55 PM, SATOH Fumiyasu wrote:


converts raw OpenLDAP stats log to JSONL (JSON lines), but
currently does NOT support OpenLDAP 2.5+ stats log.

Just FYI.


Cool, thanks for the pointer.


Re: log analysis tools

2022-02-05 Thread Paul B. Henson
On Sat, Feb 05, 2022 at 09:57:15AM -0300, Andreas Hasenack wrote:
> openldap also has a monitor backend IIRC, have you looked into that?

Yes, historically we've used that with icinga and munin, although we're
looking to replace munin. That doesn't provide the per query timing
analysis I'm looking for to address a specific performance issue though.

Thanks...


Re: log analysis tools

2022-02-05 Thread Paul B. Henson
On Sat, Feb 05, 2022 at 10:54:08AM +0100, Michael Ströder wrote:

> You could also look into tools which extract metrics from logs and 
> provide them as a Prometheus-compatible exporter: mtail, promtail etc.

Cool, thanks for the pointer.


Re: log analysis tools

2022-02-05 Thread Paul B. Henson
On Fri, Feb 04, 2022 at 10:12:40PM -0500, Dave Macias wrote:
> https://www.ltb-project.org/documentation/ldap-stats.html

Thanks for the pointer. There doesn't seem to be any way to download
just the script? You have to get their whole tool package? I don't
really want to add a repo just for this; I tried the manual download
option on their page for CentOS 8:

https://www.ltb-project.org/download.html

https://www.ltb-project.org/archives/openldap-ltb-2.5.11-el8.tar.gz

But the second link just takes me to:

https://www.ltb-project.org/

Meh. I'm really looking for query times too, which it doesn't seem to
provide. I set up a test script which does a memberOf query every 5
minutes to alert me when they start taking 30+ seconds. Most of the time
they're subsecond, but every now and again the exact same query takes
5-20 seconds, just here and there. Odd. At least the occasional slow
query is better than continuous slow queries.


log analysis tools

2022-02-04 Thread Paul B. Henson
Does anybody know of any good tools that can rip through an openldap log 
file and analyze it, creating a report of what queries are being made 
and how long they are taking to process?


All of the information I'm interested in is included in the log:

Feb  4 18:23:54 ldap-01 slapd[1207]: conn=46272 fd=84 ACCEPT from 
IP=134.71.247.28:46592 (IP=0.0.0.0:636)
Feb  4 18:23:54 ldap-01 slapd[1207]: conn=46272 fd=84 TLS established 
tls_ssf=256 ssf=256 tls_proto=TLSv1.2 tls_cipher=ECDHE-RSA-AES256-GCM-SHA384
Feb  4 18:23:54 ldap-01 slapd[1207]: conn=46272 op=0 BIND 
dn="cn=it_boomi,ou=user,ou=service,dc=cpp,dc=edu" method=128
Feb  4 18:23:54 ldap-01 slapd[1207]: conn=46272 op=0 BIND 
dn="cn=it_boomi,ou=user,ou=service,dc=cpp,dc=edu" mech=SIMPLE bind_ssf=0 
ssf=256
Feb  4 18:23:54 ldap-01 slapd[1207]: conn=46272 op=0 RESULT tag=97 err=0 
qtime=0.31 etime=0.000189 text=
Feb  4 18:23:54 ldap-01 slapd[1207]: conn=46272 op=1 SRCH 
base="ou=user,dc=cpp,dc=edu" scope=2 deref=3 
filter="(&(objectClass=person)(calstateEduPersonEmplID=014532336))"

Feb  4 18:23:54 ldap-01 slapd[1207]: conn=46272 op=1 SRCH attr=memberOf
Feb  4 18:23:54 ldap-01 slapd[1207]: conn=46272 op=2 UNBIND
Feb  4 18:23:54 ldap-01 slapd[1207]: conn=46272 op=1 SEARCH RESULT 
tag=101 err=0 qtime=0.16 etime=0.192994 nentries=1 text=

Feb  4 18:23:54 ldap-01 slapd[1207]: conn=46272 fd=84 closed

but split up into a number of different lines which need to be 
correlated to summarize it. Before I try it myself I was hoping somebody 
else had already scratched that itch :). The only things I can find 
searching are either really old or commercial products.
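
If I do end up rolling my own, I'll probably start with something quick
and dirty along these lines (an untested sketch; the log path is a
placeholder), pairing each SRCH with its SEARCH RESULT by conn/op and
listing the slowest filters first:

awk '/ SRCH base=/     { line=$0; sub(/.*conn=/,"conn=",line); split(line,a," ");
                         id=a[1]" "a[2]; filt[id]=line; sub(/.*filter=/,"",filt[id]) }
     / SEARCH RESULT / { line=$0; sub(/.*conn=/,"conn=",line); split(line,a," ");
                         id=a[1]" "a[2]; et=line; sub(/.*etime=/,"",et); sub(/ .*/,"",et);
                         if (id in filt) print et, id, filt[id] }' \
    /var/log/slapd.log | sort -rn | head -20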


Thanks much…


intermittent memberOf performance issues

2022-02-04 Thread Paul B. Henson
I've run into another problem with the memberOf implementation on my 2.5 
servers. After I sorted out the proper configuration, queries requesting 
 memberOf were very performant:


Feb  4 13:26:44 ldap-01 slapd[1207]: conn=23393 op=1 SRCH 
base="ou=user,dc=cpp,dc=edu" scope=2 deref=3 
filter="(&(objectClass=person)(calstateEduPersonEmplID=013522522))"

Feb  4 13:26:44 ldap-01 slapd[1207]: conn=23393 op=1 SRCH attr=memberOf
Feb  4 13:26:44 ldap-01 slapd[1207]: conn=23393 op=1 SEARCH RESULT 
tag=101 err=0 qtime=0.15 etime=0.191860 nentries=1 text=


However, intermittently the server gets into a state where the exact 
same query takes over 30 seconds:


Feb  4 08:05:11 ldap-01 slapd[1425456]: conn=40797 op=1 SRCH 
base="ou=user,dc=cpp,dc=edu" scope=2 deref=3 
filter="(&(objectClass=person)(calstateEduPersonEmplID=015559557))"

Feb  4 08:05:11 ldap-01 slapd[1425456]: conn=40797 op=1 SRCH attr=memberOf
Feb  4 08:05:50 ldap-01 slapd[1425456]: conn=40797 op=1 SEARCH RESULT 
tag=101 err=0 qtime=0.19 etime=39.435523 nentries=1 text=


When this occurs, the only way to resolve the issue that I have found is 
to reboot the server. Simply restarting slapd results in the same 
degraded performance on these queries.


Normally there is very low read I/O load on the servers during 
operation, probably averaging less than 1M/s, peaking up to maybe 
20-30M/s for just an instant occasionally. When the memberOf query 
performance is degraded, there is a very high read I/O load on the 
server, continuously about 200-300M/s.


Any thoughts on this? It seems like for some reason the server gets into 
a state where it is not using the cache or memory map for doing the 
search required to construct the memberOf results? But instead is doing 
a full disk read of the entire database?


It's also weird that restarting the service does not resolve this, but
rebooting the server does. I'm not intimately familiar with the
internals of lmdb, is there some state that persists with the 
environment or memory map in between service runs that is only cleared 
by a reboot?


I initially thought I had a theory on it, relating to an unrelated bug
in RHEL 8.5 that broke the "needs-rebooting" command, resulting in
servers not properly rebooting after kernel/library updates. The most
recent occurrence of this issue started up after such an update without
the required reboot, but upon reviewing historic occurrences it has
occurred at times that don't meet that criterion, so I find myself
clueless again as to what's going on.


Any advice on how to fix or do further debugging on this issue much 
appreciated, thanks…


Re: compare operation behaves differently under 2.5

2021-11-10 Thread Paul B. Henson
On Wed, Nov 10, 2021 at 10:14:44PM -0600, Quanah Gibson-Mount wrote:

> Yea this is a bug. Please open an ITS

Ok, ITS 9747. Let me know if there's anything I can do to help resolve
this issue or you'd like me to test any fixes.

Thanks again...


Re: compare operation behaves differently under 2.5

2021-11-10 Thread Paul B. Henson
On Wed, Nov 10, 2021 at 04:36:00PM -0800, Quanah Gibson-Mount wrote:

> If you disable the dynlist overlay, do you get the same behavior?

Nope; if I remove the line

dynlist-attrset groupOfURLs memberURL member+memberOf@groupOfNames

from the config, the ldapcompare command succeeds as expected. Good call
:).

I only need it to add the memberOf attribute to users; I don't need it
to muck with the member attribute on groups (we don't have any dynamic
groups), but I didn't see a way to configure it that way?

For giggles I tried changing it to

dynlist-attrset groupOfURLs memberURL uniqueMember+memberOf@groupOfNames

which made the compare work but the memberOf attribute wasn't
populated...

Bug? Should I open an ITS?

Thanks much...


compare operation behaves differently under 2.5

2021-11-10 Thread Paul B. Henson
I've run into another quirk during my 2.5 upgrade process. We use the 
Apache ldap authnz module with a configuration such as:


AuthLDAPURL ldaps://ldap.cpp.edu:636/DC=cpp,DC=edu?uid
Require ldap-group uid=unxadmin,ou=group,dc=cpp,dc=edu

This stopped working when accessing a server updated to 2.5. On the 2.4 
server, the Apache logs look like:


[Wed Nov 10 15:07:52.056237 2021] [authnz_ldap:debug] [pid 29887] 
mod_authnz_ldap.c(926): [client 10.104.223.117:51050] AH01714: auth_ldap 
authorize: require group: testing for member: 
uid=henson,ou=user,dc=cpp,dc=edu (uid=unxadmin,ou=group,dc=cpp,dc=edu)
[Wed Nov 10 15:07:52.056264 2021] [authnz_ldap:debug] [pid 29887] 
mod_authnz_ldap.c(935): [client 10.104.223.117:51050] AH01715: auth_ldap 
authorize: require group: authorization successful (attribute member) 
[Comparison true (cached)][6 - Compare True]


and the slapd logs look like:

Nov 10 15:07:52 ldap-01 slapd[1233]: conn=224154 op=4 CMP 
dn="uid=unxadmin,ou=group,dc=cpp,dc=edu" attr="member"
Nov 10 15:07:52 ldap-01 slapd[1233]: conn=224154 op=4 RESULT tag=111 
err=6 text=
Nov 10 15:08:42 ldap-01 slapd[1233]: conn=224154 fd=138 closed 
(connection lost)


whereas on the 2.5 server, the Apache logs look like:

[Wed Nov 10 15:03:52.375004 2021] [authnz_ldap:debug] [pid 29088] 
mod_authnz_ldap.c(926): [client 10.104.223.117:51022] AH01714: auth_ldap 
authorize: require group: testing for member: 
uid=henson,ou=user,dc=cpp,dc=edu (uid=unxadmin,ou=group,dc=cpp,dc=edu)
[Wed Nov 10 15:03:52.375887 2021] [authnz_ldap:debug] [pid 29088] 
mod_authnz_ldap.c(945): [client 10.104.223.117:51022] AH01719: auth_ldap 
authorize: require group "uid=unxadmin,ou=group,dc=cpp,dc=edu": didn't 
match with attr member [Comparison false (adding to cache)][5 - Compare 
False]


and the slapd logs look like:

Nov 10 15:03:52 ldap-03 slapd[1197]: conn=208924 op=4 CMP 
dn="uid=unxadmin,ou=group,dc=cpp,dc=edu" attr="member"
Nov 10 15:03:52 ldap-03 slapd[1197]: conn=208924 op=4 RESULT tag=111 
err=5 qtime=0.11 etime=0.000139 text=


If I understand correctly, Apache is making a compare call to check to 
see if my DN (uid=henson,ou=user,dc=cpp,dc=edu) exists as a value of the 
member attribute in the group uid=unxadmin,ou=group,dc=cpp,dc=edu. It 
certainly does, on both the server running 2.4 and 2.5:


dn: uid=unxadmin,ou=group,dc=cpp,dc=edu
objectClass: groupOfNames
objectClass: cppGroup
objectClass: posixGroup
uid: unxadmin
cn: Unix Administrators
gidNumber: 17730
member:
member: uid=gkuri,ou=user,dc=cpp,dc=edu
member: uid=henson,ou=user,dc=cpp,dc=edu
memberUid: gkuri
memberUid: henson

Any thoughts on why this is behaving differently? The configuration 
should be pretty much identical, modulo required path changes and 
updating the memberOf handling.


I was able to work around it by setting a couple of additional parameters:

AuthLDAPGroupAttribute memberUid
AuthLDAPGroupAttributeIsDN off

This now compares just my uid (henson) to the group memberUid attribute:

Nov 10 15:24:54 ldap-03 slapd[1197]: conn=210708 op=4 CMP 
dn="uid=unxadmin,ou=group,dc=cpp,dc=edu" attr="memberUid"
Nov 10 15:24:54 ldap-03 slapd[1197]: conn=210708 op=4 RESULT tag=111 
err=6 qtime=0.16 etime=0.000108 text=


Apparently "err=6" is good and "err=5" is not :)? I was able to 
replicate this difference in behavior using the CLI tools:


# ldapcompare -x -H ldaps://ldap-01.ldap.cpp.edu/ 
uid=unxadmin,ou=group,dc=cpp,dc=edu member:uid=henson,ou=user,dc=cpp,dc=edu

TRUE

# ldapcompare -x -H ldaps://ldap-03.ldap.cpp.edu/ 
uid=unxadmin,ou=group,dc=cpp,dc=edu member:uid=henson,ou=user,dc=cpp,dc=edu

FALSE

ldapsearch confirms that attribute exists for that group in both locations:

ldapsearch -x -H ldaps://ldap-01.ldap.cpp.edu/ 
member=uid=henson,ou=user,dc=cpp,dc=edu dn | grep unxadmin

# unxadmin, group, cpp.edu
dn: uid=unxadmin,ou=group,dc=cpp,dc=edu

# ldapsearch -x -H ldaps://ldap-03.ldap.cpp.edu/ 
member=uid=henson,ou=user,dc=cpp,dc=edu dn | grep unxadmin

# unxadmin, group, cpp.edu
dn: uid=unxadmin,ou=group,dc=cpp,dc=edu

Help :)? Thanks much…


Re: Antw: [EXT] Symas openldap 2.5 RPMs / openssl cert trust

2021-10-28 Thread Paul B. Henson
On Thu, Oct 28, 2021 at 08:59:00AM +0200, Ulrich Windl wrote:

> OK, thanks for explaining. I wasn't aware that slapd uses its own
> version of openssl.

It doesn't necessarily in general, but specifically the Symas-built
RPMs do. If you use an OS-packaged version it uses the OS openssl, and
if you build your own it does whatever you tell it to :).


Re: Antw: [EXT] Symas openldap 2.5 RPMs / openssl cert trust

2021-10-25 Thread Paul B. Henson

On 10/25/2021 12:33 PM, Quanah Gibson-Mount wrote:

Symas OpenLDAP 2.5 (and soon 2.6) reflect how we would package the 
software.  Note that in 2.6, you can specify multiple paths to find CA 
certs in, so you could configure it to use the system CAs as well as 
your own local certificate authority if desired.


Okay, cool; I will update my local configuration to meet my needs, thanks…


Re: Antw: [EXT] Symas openldap 2.5 RPMs / openssl cert trust

2021-10-25 Thread Paul B. Henson

On 10/24/2021 11:19 PM, Ulrich Windl wrote:


Some time ago the way to install root certificates had changed


You mean on the server side? There's nothing wrong with the certificate 
chain on the server, everything trusts that properly including the 
ldapsearch included with the Symas openldap 2.4 rpms. The issue is that 
the 2.5 rpms include their own bundled version of openssl, which is not 
configured to trust the system certificate repository.


Symas openldap 2.5 RPMs / openssl cert trust

2021-10-22 Thread Paul B. Henson
The OpenLDAP binaries in the 2.5 RPMs use their own build of openssl,
which doesn't appear to be configured to trust the system root
certificate store:

$ ldapsearch -H ldaps://ldap.cpp.edu/
ldap_sasl_interactive_bind: Can't contact LDAP server (-1)
        additional info: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed (self signed certificate in certificate chain)

It works fine if you explicitly tell it to:

SSL_CERT_FILE=/etc/pki/tls/cert.pem ldapsearch -x -H ldaps://ldap.cpp.edu/
# extended LDIF
[...]
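
Another option, assuming the bundled tools read their own ldap.conf
under the /opt/symas prefix, would be to point the client configuration
at the system bundle, something like:

TLS_CACERT      /etc/pki/tls/cert.pem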

Is this intentional? It seems it would be useful for the openldap
utilities, which are added to the default search path, to support the
standard system root CA's.

Thanks...


Re: significant performance degradation in 2.5, broken indexes?

2021-10-21 Thread Paul B. Henson

On 10/21/2021 6:06 PM, Howard Chu wrote:


You should change this configuration to use 2.5 dynlist's memberOf support.


Ah, it seems I stupidly didn't look at the examples at the bottom of the 
man page 8-/.


I'm not using any dynamic groups, so I guess the example relevant to me is:

dynlist-attrset groupOfURLs memberURL member+memberOf@groupOfNames

which is described as "This example extends the dynamic memberOf feature 
to add the memberOf attribute to all the members of both static and 
dynamic groups".


I don't have any dynamic groups, so no objects with the class 
groupOfURLs will contain a memberURL attribute. So what exactly would 
this do? Hmm. I tried it and it seems to work. Wow, you can even search
by memberOf now; you couldn't do that before. Nice.


How does this get triggered? Previously, if I searched for a user, it 
would match on the object class for the user and expand the memberURL 
dynamically, filling in the memberOf attribute. What makes the overlay 
add in the attributes when I search for a user, as the group isn't 
referenced? Perhaps some source code spelunking is in order :).


This appears to solve my problem, but I am still curious why the config 
that worked fine under 2.4 blew up under 2.5.


Thanks much...


Re: significant performance degradation in 2.5, broken indexes?

2021-10-21 Thread Paul B. Henson

On 10/21/2021 6:06 PM, Howard Chu wrote:


Then this is probably dynlist searching for objectclass=cppEduPerson.

You should change this configuration to use 2.5 dynlist's memberOf support.


I must have missed that; I wasn't aware of any new memberOf-specific
support in dynlist? I don't see anything mentioning that in the
administration guide:


https://www.openldap.org/doc/admin25/guide.html#Dynamic%20Lists

I did find a reference to it in a release announcement:

https://www.openldap.org/software/release/announce.html

"dynlist can now generate (is)memberOf dynamically"

but it had no specifics as to how that was configured or what it did?

Ah, I see there is a mention of it in the man page, but sadly I don't 
quite understand it.


   dynlist-attrset <group-oc> [<URI>] <URL-ad>
           [<member-ad>[+<memberOf-ad>[@<static-oc>[*]]] ...]


  The  value group-oc is the name of the objectClass that 
triggers the dynamic expansion of the data.


  The optional URI restricts expansion only to entries 
matching the DN, the scope and the filter portions of the URI.


  The value URL-ad is the name of the attributeDescription 
that contains the URI that is expanded by the overlay; if none is 
present, no expansion occurs.  If the intersection  of  the  attributes 
 requested  by  the  search operation (or the asserted attribute for 
compares) and the attributes listed in the URI is empty, no expansion

occurs for that specific URI.  It must be a subtype of labeledURI.

  The  value  member-ad  is  optional;  if  present, the 
overlay behaves as a dynamic group: this attribute will list the DN of 
the entries resulting from  the  internal search.   In  this case, the 
attrs portion of the URIs in the URL-ad attribute must be absent, and 
the DNs of all the entries resulting from the expansion of the  URIs
are listed as values of this attribute.  Compares that assert the value 
of the member-ad attribute of entries with group-oc objectClass apply as 
if  the  DN  of  the entries  resulting from the expansion of the URI 
were present in the group-oc entry as values of the member-ad attribute. 
 If the  optional  memberOf-ad  attribute  is also  specified,  then it 
will be populated with the DNs of the dynamic groups that an entry is a 
member of.  If the optional static-oc objectClass is also  specified,
then  the  memberOf  attribute  will  also  be populated with the DNs of 
the static groups that an entry is a member of.


It appears it still needs an object class to trigger it? And in my
case, that object class would still be cppEduPerson? (I.e., only
trigger this dynamic expansion on objects that have that object class?)
Right now my configuration again is:


dynlist-attrset cppEduPerson memberURL memberOf

My understanding of that is: for any search which returns an object
with the object class "cppEduPerson", perform the search indicated in
the "memberURL" attribute, which for me is:


memberURL: ldap:///dc=cpp,dc=edu??sub?(memberUid=henson)

and shove all the DNs that result from that search into the memberOf
attribute. What exactly am I supposed to do differently to take
advantage of this new support? And how would it remove the need for the
reference to the cppEduPerson object class?



Indexing is not broken.


There was a question mark in my subject line; I was just guessing :).
It is still definitely a difference in behavior between 2.4 and 2.5,
though, and I'm not understanding why. I don't see any mention of
dynlist or memberOf in the upgrade guide:


https://www.openldap.org/doc/admin25/guide.html#Upgrading%20from%202.4.x

Is my current configuration under 2.4 "broken but happens to work"?

Thanks…


Re: significant performance degradation in 2.5, broken indexes?

2021-10-21 Thread Paul B. Henson
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: <<< dnPrettyNormal: 
, 

Oct 21 17:33:16 ldap-dev-02 slapd[2389]: SRCH "dc=cpp,dc=edu" 2 0
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: 0 0 0
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: filter: (uid=henson)
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: attrs:
Oct 21 17:33:16 ldap-dev-02 slapd[2389]:
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: ==> limits_get: conn=1000 op=1 
self="[anonymous]" this="dc=cpp,dc=edu"
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: <== limits_get: type=DN 
match=ANONYMOUS

Oct 21 17:33:16 ldap-dev-02 slapd[2389]: => mdb_search
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: mdb_dn2entry("dc=cpp,dc=edu")
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: => mdb_dn2id("dc=cpp,dc=edu")
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: <= mdb_dn2id: got id=0x1
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: => mdb_entry_decode:
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: <= mdb_entry_decode
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: search_candidates: 
base="dc=cpp,dc=edu" (0x0001) scope=2
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: => mdb_equality_candidates 
(objectClass)

Oct 21 17:33:16 ldap-dev-02 slapd[2389]: => key_read
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: mdb_idl_fetch_key: [b49d1940]
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: <= mdb_index_read: failed (-30798)
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: <= mdb_equality_candidates: 
id=0, first=0, last=0


The search on objectClass doesn't seem to return any records? It just 
immediately jumps to the uid search:


Oct 21 17:33:16 ldap-dev-02 slapd[2389]: => mdb_equality_candidates (uid)
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: => key_read
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: mdb_idl_fetch_key: [b49d1940]
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: <= mdb_index_read: failed (-30798)
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: <= mdb_equality_candidates: 
id=0, first=0, last=0

Oct 21 17:33:16 ldap-dev-02 slapd[2389]: => mdb_equality_candidates (uid)
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: => key_read
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: mdb_idl_fetch_key: [6e61da89]
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: <= mdb_index_read 1 candidates

It finds my record, does the memberUid lookup, and immediately returns it:

Oct 21 17:33:16 ldap-dev-02 slapd[2389]: send_ldap_result: conn=1000 
op=1 p=3
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: send_ldap_result: err=0 
matched="" text=""
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: => send_search_entry: conn 1000 
dn="uid=henson,ou=user,dc=cpp,dc=edu"
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: <= send_search_entry: conn 1000 
exit.
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: send_ldap_result: conn=1000 
op=1 p=3
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: send_ldap_result: err=0 
matched="" text=""
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: send_ldap_response: msgid=2 
tag=101 err=0


After the record is returned, the connection is immediately closed 
without looking at anything else:


Oct 21 17:33:16 ldap-dev-02 slapd[2389]: connection_get(15)
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: connection_get(15): got connid=1000
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: connection_read(15): checking 
for input on id=1000

Oct 21 17:33:16 ldap-dev-02 slapd[2389]: op tag 0x42, time 1634862796
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: ber_get_next on fd 15 failed 
errno=0 (Success)

Oct 21 17:33:16 ldap-dev-02 slapd[2389]: conn=1000 op=2 do_unbind
Oct 21 17:33:16 ldap-dev-02 slapd[2389]: connection_close: conn=1000 sd=15


So it looks like there is something weird going on with objectClass? I 
do have an index on that:


index   default eq
index   entryCSN,objectClass,reqEnd,reqResult,reqStart


My configuration between 2.4 and 2.5 is pretty much identical. Any idea 
why it might be fully traversing the directory looking for object class 
matches?



On 10/21/2021 4:55 PM, Howard Chu wrote:

Paul B. Henson wrote:


Any thoughts on what might be going on or how I can debug it to track down 
exactly what is causing it? There was obviously a lot more debug info in the 
logs
that I didn't include, but none of it jumped out to me as "smoking gun".


Try the search again with -d5. Also include the lines showing which attribute 
it's checking in the index.
e.g.:

6171fcb7.1be5a183 0x7f28b65fd640 search_candidates: base="dc=example,dc=com" 
(0x0001) scope=2
6171fcb7.1be5b227 0x7f28b65fd640 => mdb_equality_candidates (objectClass)
6171fcb7.1be5cfe4 0x7f28b65fd640 => key_read
6171fcb7.1be5e088 0x7f28b65fd640 mdb_idl_fetch_key: [b49d1940]
6171fcb7.1be5faff 0x7f28b65fd640 <= mdb_index_read: failed (-30798)
6171fcb7.1be60a00 0x7f28b65fd640 <= mdb_equality_candidates: id=0, first=0, 
last=0
6171fcb7.1be61901 0x7f28b65fd640 => mdb_equality_candidates (sn)
6171fcb7.1be62f1a 0x7f28b65fd640 => key_read
6171fcb7.1be63fbf 0x7f28b65fd640 mdb_idl_fetch_key: [03915b69]
6171fcb7.1be659a9 0x7f28b65fd640 <= mdb_index_read 2 candidates
6171fcb7.1be66a94 0x7f28b65fd640 <= mdb_equality_candidates: id=2, first=8, 
last=9
6171fcb7.1be68cae 0x7f28b65fd640 mdb_search_candidates: id=2 first=8 last=9





significant performance degradation in 2.5, broken indexes?

2021-10-21 Thread Paul B. Henson
So I upgraded one of my test systems to 2.5, from 2.4. Doing a quick 
basic functionality check after the upgrade, I noticed that the 
performance on the upgraded system was significantly slower.


When the system was running 2.4, searching for my entry after a cold 
start took a little more than a second:


# time ldapsearch -x -b dc=cpp,dc=edu -H ldap://localhost/ uid=henson
[...]
dn: uid=henson,ou=user,dc=cpp,dc=edu
[...]
real    0m1.274s

and then subsequent searches generally less than 1/10th of a second:

real    0m0.010s
real    0m0.009s
real    0m0.007s
real    0m0.006s
real    0m0.010s
real    0m0.008s

After upgrading to 2.5, searches, both cold start and subsequent, all 
took between 1.5-2 seconds:


real    0m1.513s
real    0m1.523s
real    0m1.308s
real    0m1.874s
real    0m2.150s
real    0m1.406s

These are test systems, so they really have no load. I noticed on the
upgraded box that the CPU kept spiking to 100% every few seconds, and
tracked it down to the load balancer health checks, which basically run:


# time /opt/symas/bin/ldapsearch -x -b dc=cpp,dc=edu 
'(objectClass=dcObject)'


On the upgraded system, this again takes about 1.5 seconds on every search:

real    0m1.483s

whereas on the 2.4 system, less than 1/10th of a second:

real    0m0.006s


It feels like something weird is going on with indexing. I turned on 
debugging, and this is what the index activity looked like on the 2.4 
system:


Oct 21 15:27:06 ldap-dev-02 slapd[73033]: <= mdb_index_read: failed (-30798)
Oct 21 15:27:06 ldap-dev-02 slapd[73033]: <= mdb_index_read: failed (-30798)
Oct 21 15:27:12 ldap-dev-02 slapd[73033]: <= mdb_index_read: failed (-30798)
Oct 21 15:27:12 ldap-dev-02 slapd[73033]: <= mdb_index_read 1 candidates


Which makes sense, there's only one of me :). On the other hand, on the 
upgraded system:


Oct 21 15:25:55 ldap-dev-03 slapd[42259]: <= mdb_index_read: failed (-30798)
Oct 21 15:25:55 ldap-dev-03 slapd[42259]: <= mdb_index_read: failed (-30798)
Oct 21 15:26:19 ldap-dev-03 slapd[42259]: <= mdb_index_read 132354 
candidates



It looks like it's doing a full directory search for my entry? My
configuration on the two systems is exactly the same, other than the
changed paths for the 2.5 directory layout, removing the password
policy schema include, and removing a pres index on
automountInformation per the upgrade advice.


I have tested this both using the existing binary mdb database files
from the 2.4 system and by dumping them out via slapcat, deleting the
old ones, and creating new ones with the 2.5 version of slapadd, with
exactly the same behavior.


Any thoughts on what might be going on or how I can debug it to track 
down exactly what is causing it? There was obviously a lot more debug 
info in the logs that I didn't include, but none of it jumped out to me 
as "smoking gun".


Thanks much…


Re: Symas OpenLDAP 2.5 RPMs run slapd as root?

2021-10-20 Thread Paul B. Henson

On 10/19/2021 8:10 AM, Quanah Gibson-Mount wrote:

If you want it to run as a non-root user, it's on you to configure it as 
such, including said user.  The majority of Symas customers run as root. 
So yes, this is intentional and due to the fact that it's not attempting 
to be the replacement of the system bundled OpenLDAP.  You're free to 
run things as best fits your environment.


Oh, ok; it was just an unexpected difference from the previous version 
that surprised me. Easy enough to resolve, although it seems inadvisable 
for the majority of your customers to run the service as root 8-/. How 
many do that intentionally, and how many do that because they just don't 
know any better and it's the default :)?


Thanks…


Symas OpenLDAP 2.5 RPMs run slapd as root?

2021-10-19 Thread Paul B. Henson
I'm testing openldap 2.5 in preparation for migrating my production
services, and I noticed that the 2.5 RPMs no longer create an ldap user
and instead run slapd as root by default? Is this because they're no
longer intended to replace the system bundled openldap packages? It
seems undesirable from a security perspective to run slapd as root
rather than a dedicated service account.

I see there's a note about updating the startup options to run as a
service account here:

https://repo.symas.com/soldap/systemd/

but the ldap user/group used as an example won't exist unless the system
RPMs or the 2.4 RPMs have been previously installed or the user is
created manually.
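
For anyone else who trips over this, something along these lines gets you
back to a non-root slapd (the account name, home directory, and data path
are assumptions; adjust to your layout):

groupadd -r ldap
useradd -r -g ldap -d /var/symas -s /sbin/nologin ldap
chown -R ldap:ldap /var/symas/openldap-data
# then add "-u ldap -g ldap" to the slapd arguments in the systemd unit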


Symas OpenLDAP for Linux 2.5

2021-07-14 Thread Paul B. Henson
We're currently using the RHEL8 sofl repo, which currently has version
2.4.59.1 available. Our management wants to start doing automatic
updates to the latest patches for security compliance reasons. This
isn't an issue with the native RHEL repos as they generally never
release major updates or incompatible packages.

I wanted to check though how the release of openldap 2.5 via sofl is
going to be handled. Is it just going to show up one day in the existing
repo such that it would automatically get upgraded to, or is it going to
be a different package name or a new repo that will require manual
intervention in order to upgrade?

Thanks...


Re: Antw: [EXT] Re: HAProxy protocol support?

2020-11-19 Thread Paul B. Henson

On 11/18/2020 11:05 PM, Ulrich Windl wrote:


I wonder: Would it be possible to use a specific named bind for on-campus
hosts, and use the name used for binding to control further access?


Hmm, I'm not completely sure what you mean here? Do you mean an 
authenticated bind? Our current IP address access control allows 
anonymous users on campus access to attributes that anonymous users 
off-campus cannot get to, and it also limits authenticated binds for 
non-service accounts to on campus only.
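
As a concrete illustration, the sort of network-based rule I'm describing
looks roughly like this in slapd.conf (the subnet and attribute are made up
for the example, not our real ACLs):

access to attrs=telephoneNumber
    by peername.ip=192.168.0.0%255.255.0.0 read
    by users read
    by * none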


Re: HAProxy protocol support?

2020-11-18 Thread Paul B. Henson

On 11/18/2020 12:47 PM, John C. Pfeifer wrote:

We did so as well (and would really like the haproxy support).


I posted the requested message to the dev list, if it sounds like
something they would accept I'll probably take a run at implementing it.

Actually, we have a hybrid model, but with more replicas in the 
cloud than on-prem.


Yes, technically, we are probably going to have a split service too,
with on-campus clients hitting the on-campus service and off-campus
clients hitting the off-campus service, with failover between them if
one or the other is dead.


It mirrors a general push to have the services which rely on LDAP
to also be in the cloud.


The cloud is magic, right ;)? At least, it magically assists in manager 
CYA when stuff breaks 8-/, blame it on Amazon…



I guess one advantage is that if/when we need more resources to
support demand, it is just a question of money rather than acquiring
physical resources, finding rack space, etc. I don’t know that it is
actually any cheaper, but it is more immediate.

Once upon a time the whole cloud migration was pushed as a cost savings 
measure. Over time, it's become clear it's quite the opposite. I think 
the current claim to fame is "reliability and redundancy", along with 
"supernatural scaling ability".


IMHO, the "cloud" is just somebody else's physical data center you have 
minimal control over. If I had a nickel for every time my manager 
told me "it's less than ideal, but we could [...]" in regards to our 
cloud migration I could retire and stop having to hear him say it ;).


HAProxy protocol support?

2020-11-18 Thread Paul B. Henson
So management is insisting that we migrate our openLDAP systems from 
on-premises into the cloud. Specifically, AWS behind one of their 
load balancers.


However, we currently rely upon some level of IP address based access 
control to distinguish between on-campus and off-campus clients. The 
Amazon load balancers do client NAT, so the back end servers have no 
idea who is connecting at the TCP/IP level.


They do support the haproxy in-band protocol for supplying this 
information from the load balancer to the server, but that requires 
specific support in the server. I don't see any such support in 
openldap or any evidence of past discussion regarding it.


Is this something that would be considered as a possible feature to be 
included at some point, or something not desired as part of the code base?


Thanks...


Re: Race condition with groupOfNames using syncrepl

2020-08-12 Thread Paul B. Henson
On Tue, Aug 04, 2020 at 12:20:35PM -0700, Quanah Gibson-Mount wrote:

> There's been significant work for OpenLDAP 2.5 to allow slapo-dynlist to be 
> an alternative to slapo-memberOf in a replicated environment as it does not 
> suffer from the replication related issues.

Hmm, I've been using dynlist for the memberOf attribute with 2.4 ever
since you got tired of me reporting weird syncrepl bugs with the
slapo-memberOf module and updated the documentation to say it wasn't 
supported anymore ;). What changes will there be in 2.5 to make
it better? Other than not being able to search on memberOf anymore (and
the couple of cutover issues that resulted when a couple of apps I was
unaware of were doing that 8-/) it's been working well...


RE: .so version numbers for dlopen'd objects

2018-05-23 Thread Paul B. Henson
> From: Howard Chu
> Sent: Tuesday, May 22, 2018 11:48 PM
> >>
> >> -rwxr-xr-x 1 henson henson 1773576 May 21 12:54 back_bdb-2.4.so
> >> -rwxr-xr-x 1 henson henson 864 May 21 12:54 back_bdb.la
> 
> I'd say it's a mistake to remove the version info, since modules tend to be
> intimately tied to the specific version of OpenLDAP for which they were built.

Do you mean the "-2.4" as part of the filename, or the version in the extension 
" so.2.10.9"?

Wouldn't the server typically be installed alongside the appropriate modules, 
either via the package manager or a 'make install'? I guess you could get into 
a scenario where you have an orphaned module floating around, but given that the 
config would just include "module.so" and not the versioned name, having a 
versioned name wouldn't keep it from failing?
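
In other words, a typical module configuration just names the bare .so
(the path below is only an example):

modulepath /usr/local/libexec/openldap
moduleload back_bdb.so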




RE: .so version numbers for dlopen'd objects

2018-05-23 Thread Paul B. Henson
> From: Quanah Gibson-Mount
> Sent: Tuesday, May 22, 2018 8:14 PM
> 
> The admin guide contains examples, is not authoritative, and may not apply
> to all cases (For example, statically built modules wouldn't work with the
> config you noted).  The manual pages are the authoritative documentation,
> and they explicitly allow for either the .la or .so files:

Unless I'm missing something, this documentation doesn't mention any
extensions :). So technically it allows for arbitrarily named modules ;).

In any case, that's a bit of a tangent; what do you think about the main
question, no longer including the so version number in the installed
modules? So:

-rwxr-xr-x 1 henson henson 811 May 22 18:41 accesslog.la
-rwxr-xr-x 1 henson henson  151064 May 22 18:41 accesslog.so

Just the la and so with no additional so symbolic links?

> moduleload <filename>
>       Specify the name of a dynamically loadable module to load. The
>       filename may be an absolute path name or a simple filename. Non-
>       absolute names are searched for in the directories specified by
>       the modulepath option. This option and the modulepath option are
>       only usable if slapd was compiled with --enable-modules.





RE: .so version numbers for dlopen'd objects

2018-05-22 Thread Paul B. Henson
> From: Ryan Tandy
> Sent: Tuesday, May 22, 2018 5:56 PM
> 
> -rwxr-xr-x 1 henson henson 1773576 May 21 12:54 back_bdb.so
> 
> (and no .la at all - on platforms where that's feasible)

The documentation currently says to use the .la file:

http://www.openldap.org/doc/admin24/slapdconf2.html#cn=module

Although under linux I've always just used the .so file myself. Getting rid
of the .la would require a documentation update and also possibly config
changes for everybody who listened to the documentation :).





RE: .so version numbers for dlopen'd objects

2018-05-22 Thread Paul B. Henson
> From: Robert Heller
> Sent: Tuesday, May 22, 2018 5:37 PM
> 
> You can suppress the version numbers with the "-avoid-version" LDFLAGS
> option

Cool, thanks for the pointer. Then the question is whether this is
acceptable for upstream to deploy in general, or if I should just tweak the
OpenBSD port to do so? Quanah or Howard :)?

Thanks.
 




RE: .so version numbers for dlopen'd objects

2018-05-22 Thread Paul B. Henson
> From: Robert Heller
> Sent: Tuesday, May 22, 2018 1:16 PM
>
> (specifically Tcl extensions).  If using libtool, it *should* create
> symlinks for the .so file without the version numbers like this:

Yes, currently it creates both a .so.x.y file and a symbolic link
from just .so; the assertion from OpenBSD is that for this use case there
should only be the .so.

> In my case, the shared library is *both* a dlopen'ed tcl extension and can
> also be linked to by a C++ program.  This is under Linux, but I do the same
> under MacOSX (which is an OpenBSD variant under-the-hood).

OS X is actually based off of FreeBSD :). But all of the BSDs cross-pollinate
to some extent.

In this case, it is only a dynamically loaded module; I do not believe it is
suitable, or that there would be a use case, to link to it directly from an
application. I don't really care myself, but it is one of the sticking points
keeping my changes from getting incorporated, so I need to sort it out.

Thanks.




.so version numbers for dlopen'd objects

2018-05-22 Thread Paul B. Henson
I'm trying to get the OpenBSD openldap port updated to use modules;
currently it just builds everything into a monolithic binary. They are
objecting to the existence of the version number in the shared objects for
the modules:

-rwxr-xr-x 1 henson henson 1773576 May 21 12:54 back_bdb-2.4.so.2.10.7
-rwxr-xr-x 1 henson henson 864 May 21 12:54 back_bdb.la
lrwxrwxrwx 1 henson henson  22 May 21 12:54 back_bdb.so ->
back_bdb-2.4.so.2.10.7

They say that shared objects which are intended to be dlopen'd, as opposed
to linked to, should not include version numbers, and it should look like:

-rwxr-xr-x 1 henson henson 1773576 May 21 12:54 back_bdb-2.4.so
-rwxr-xr-x 1 henson henson 864 May 21 12:54 back_bdb.la

What is the openldap developer perspective on this?

Thanks.




Re: slapd: null_callback : error code 0x14

2017-10-02 Thread Paul B. Henson
On Mon, Oct 02, 2017 at 01:37:09PM -0700, Quanah Gibson-Mount wrote:

> Are you positive the servers are in sync at this point?  I.e., did you
> freshly reload from whatever you consider your golden master them
> after applying the patch?

Hmm, I thought they were in sync (I've spot checked the groups that have
given errors in the past to see if there were any inconsistencies), but
I can't say I'm 100% confident, as I have not done that at this point.
I'll reload all three from a dump of the current primary master and see
if that clears things up.

Thanks for the suggestion...



Re: slapd: null_callback : error code 0x14

2017-10-02 Thread Paul B. Henson
On Mon, Sep 25, 2017 at 04:31:40PM +0200, Ondřej Kuzník wrote:

> I'd apply it everywhere you have syncprov configured, these could send a
> cookie with too little information for a replica to spot and skip a
> duplicate.

Hmm, I applied the patch to all four of my servers but I'm still seeing
the errors :(...

Oct  2 03:46:24 egeria slapd[86715]: null_callback : error code 0x14
Oct  2 03:46:24 egeria slapd[86715]: syncrepl_message_to_op: rid=002 be_modify 
uid=nnharpale,ou=user,dc=cpp,dc=edu (20)
Oct  2 03:54:59 egeria slapd[86715]: null_callback : error code 0x14
Oct  2 03:54:59 egeria slapd[86715]: syncrepl_message_to_op: rid=003 be_modify 
uid=lvl_1_users,ou=group,dc=cpp,dc=edu (20)
Oct  2 03:55:00 egeria slapd[86715]: null_callback : error code 0x14
Oct  2 03:55:00 egeria slapd[86715]: syncrepl_message_to_op: rid=003 be_modify 
uid=lvl_1_users,ou=group,dc=cpp,dc=edu (20)
Oct  2 03:55:00 egeria slapd[86715]: null_callback : error code 0x14
Oct  2 03:55:00 egeria slapd[86715]: syncrepl_message_to_op: rid=003 be_modify 
uid=lvl_1_users,ou=group,dc=cpp,dc=edu (20)
Oct  2 03:55:00 egeria slapd[86715]: null_callback : error code 0x14
Oct  2 03:55:00 egeria slapd[86715]: syncrepl_entry: rid=003 be_modify failed 
(20)
Oct  2 03:55:00 egeria slapd[86715]: do_syncrepl: rid=003 rc 20 retrying (9 
retries left)

Oct  2 03:46:14 minerva slapd[68720]: null_callback : error code 0x14
Oct  2 03:46:14 minerva slapd[68720]: syncrepl_message_to_op: rid=002 be_modify 
uid=nicknguyen,ou=user,dc=cpp,dc=edu (20)
Oct  2 03:55:00 minerva slapd[68720]: null_callback : error code 0x14
Oct  2 03:55:00 minerva slapd[68720]: syncrepl_message_to_op: rid=003 be_modify 
uid=lvl_1_users,ou=group,dc=cpp,dc=edu (20)
Oct  2 03:55:00 minerva slapd[68720]: null_callback : error code 0x14
Oct  2 03:55:00 minerva slapd[68720]: syncrepl_message_to_op: rid=003 be_modify 
uid=lvl_1_users,ou=group,dc=cpp,dc=edu (20)

Any other thoughts on what might be going on or what would be helpful to
debug it?

Thanks...




Re: slapd: null_callback : error code 0x14

2017-09-22 Thread Paul B. Henson
On Fri, Sep 22, 2017 at 08:50:38AM -0700, Quanah Gibson-Mount wrote:

> Oh, I thought you had said you only had two masters.  This could well be 

Ah, my bad, there are a total of 4 nodes, and while technically I guess
they could all be "masters", only two of them ever receive writes, one
is the primary behind a hardware load balancer and the other is the
secondary; so in my head I have two masters and two read only systems.
Which I suppose isn't really accurate from an openldap architecture
perspective, sorry.

> ITS#8444 (ignore the ITS title, it has nothing to do with memberOf), where 
> there are out of sync problems with 3+ MMR nodes and delta-syncrepl when 
> syncprov checkpoints.

Oh, I remember that ITS; I thought I'd fixed that issue by getting rid
of the memberOf overlay and switching to dynlist 8-/. It seems that since I
stopped paying attention to it, it's moved on in other directions.

I see there was a proposed patch posted on 8/25 that's been applied to
RE24, I'll add that to my system and see if the issue goes away. Am I
correct in my assumption that the patch only needs to be applied to the
system that is receiving the updates?

Thanks...




Re: slapd: null_callback : error code 0x14

2017-09-21 Thread Paul B. Henson
On Tue, Sep 19, 2017 at 07:22:19PM -0700, Quanah Gibson-Mount wrote:

> the log from your other rid as well?  I.e., if it is being applied twice, 
> you should see each modification logged twice.  Usually with sync logging, 
> you have lines noting what the CSN is that's being processed as well. 
> There's not enough log information here to act on. :)

Ok, let's see if this is any better.

Here's a failure from this morning:

Sep 21 03:55:39 coeus slapd[103811]: null_callback : error code 0x14
Sep 21 03:55:39 coeus slapd[103811]: syncrepl_entry: rid=003 be_modify 
uid=cs26401,ou=group,dc=cpp,dc=edu (20)
Sep 21 03:55:39 coeus slapd[103811]: syncrepl_entry: rid=003 be_modify failed 
(20)

Here's what it looks like from a bit earlier than that:

Sep 21 03:55:39 coeus slapd[103811]: syncrepl_message_to_op: rid=002 be_modify 
uid=cs264,ou=group,dc
=cpp,dc=edu (0)
Sep 21 03:55:39 coeus slapd[103811]: slap_queue_csn: queueing 0x7f95b41acaa0 
20170921105538.952721Z#
00#004#00
Sep 21 03:55:39 coeus slapd[103811]: slap_queue_csn: queueing 0x7f95b41170a0 
20170921105538.952721Z#
00#004#00
Sep 21 03:55:39 coeus slapd[103811]: slap_graduate_commit_csn: removing 
0x7f95b41170a0 2017092110553
8.952721Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: slap_graduate_commit_csn: removing 
0x7f95b41acaa0 2017092110553
8.952721Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: do_syncrep2: rid=003 CSN pending, ignoring 
20170921105538.95272
1Z#00#004#00 (reqStart=20170921105539.73Z,cn=accesslog)
Sep 21 03:55:39 coeus slapd[103811]: do_syncrep2: rid=002 
cookie=rid=002,sid=002,csn=20170921105538.
996047Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: slap_queue_csn: queueing 0x7f95b4102bf0 
20170921105538.996047Z#
00#004#00
Sep 21 03:55:39 coeus slapd[103811]: do_syncrep2: rid=003 
cookie=rid=003,sid=003,csn=20170921105538.
998639Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: do_syncrep2: rid=001 CSN pending, ignoring 
20170921105538.95272
1Z#00#004#00 (reqStart=20170921105538.91Z,cn=accesslog)
Sep 21 03:55:39 coeus slapd[103811]: do_syncrep2: rid=001 
cookie=rid=001,sid=004,csn=20170921105538.
996047Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: slap_queue_csn: queueing 0x7f95b41a2f70 
20170921105538.996047Z#
00#004#00
Sep 21 03:55:39 coeus slapd[103811]: syncprov_matchops: skipping original sid 
004
Sep 21 03:55:39 coeus slapd[103811]: slap_graduate_commit_csn: removing 
0x7f95b41a2f70 20170921105538.996047Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: slap_graduate_commit_csn: removing 
0x7f95b4102bf0 20170921105538.996047Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: syncrepl_message_to_op: rid=002 be_modify 
uid=cs26401,ou=group,dc=cpp,dc=edu (0)
Sep 21 03:55:39 coeus slapd[103811]: syncprov_sendresp: to=002, 
cookie=rid=002,sid=001,csn=20170921105538.996047Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: slap_queue_csn: queueing 0x7f95b4112730 
20170921105538.996047Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: slap_queue_csn: queueing 0x7f95b41acaa0 
20170921105538.996047Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: slap_graduate_commit_csn: removing 
0x7f95b41acaa0 20170921105538.996047Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: slap_graduate_commit_csn: removing 
0x7f95b4112730 20170921105538.996047Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: do_syncrep2: rid=002 
cookie=rid=002,sid=002,csn=20170921105538.998639Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: slap_queue_csn: queueing 0x7f95d8130560 
20170921105538.998639Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: null_callback : error code 0x14
Sep 21 03:55:39 coeus slapd[103811]: slap_graduate_commit_csn: removing 
0x7f95d8130560 20170921105538.998639Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: syncrepl_message_to_op: rid=003 be_modify 
uid=cs26401,ou=group,dc=cpp,dc=edu (20)
Sep 21 03:55:39 coeus slapd[103811]: do_syncrep2: rid=003 delta-sync lost sync 
on (reqStart=20170921105539.75Z,cn=accesslog), switching to REFRESH
Sep 21 03:55:39 coeus slapd[103811]: slap_queue_csn: queueing 0x7f95b41acaa0 
20170921105538.998639Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: syncrepl_message_to_entry: rid=003 DN: 
uid=cs26401,ou=group,dc=cpp,dc=edu, UUID: af8393a4-8d2b-1027-9af4-e374887d0b81
Sep 21 03:55:39 coeus slapd[103811]: syncrepl_entry: rid=003 
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
Sep 21 03:55:39 coeus slapd[103811]: syncrepl_entry: rid=003 be_search (0)
Sep 21 03:55:39 coeus slapd[103811]: syncrepl_entry: rid=003 
uid=cs26401,ou=group,dc=cpp,dc=edu
Sep 21 03:55:39 coeus slapd[103811]: slap_queue_csn: queueing 0x7f95b41c8a10 
20170921105538.998639Z#00#004#00
Sep 21 03:55:39 coeus slapd[103811]: syncprov_matchops: skipping original sid 
004
Sep 21 03:55:39 coeus slapd[103811]: slap_graduate_commit_csn: removing 
0x7f95b41c8a10 

Re: slapd: null_callback : error code 0x14

2017-09-20 Thread Paul B. Henson
On Tue, Sep 19, 2017 at 07:22:19PM -0700, Quanah Gibson-Mount wrote:

> Well, my first question is, do you see these changes to this group entry in 
> the log from your other rid as well?  I.e., if it is being applied twice, 
> you should see each modification logged twice.  Usually with sync logging, 
> you have lines noting what the CSN is that's being processed as well. 
> There's not enough log information here to act on. :)

Ah, I don't have sync logging on, just stats right now; I'll turn on
sync logging and see what shows up tomorrow. I'm sorry for the bad bug
reporting :(, but OTOH the fact that I'm so out of practice at
reporting bugs is a tribute to how infrequently I have to do so :).
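
For the record, the logging change amounts to a single slapd.conf line:

loglevel stats sync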

Thanks...




Re: slapd: null_callback : error code 0x14

2017-09-19 Thread Paul B. Henson
On Mon, Sep 18, 2017 at 08:31:34AM -0700, Quanah Gibson-Mount wrote:

> These aren't "wierd errors".  0x14=20.  I.e., you're simply getting the 
> same error logged twice.

Well, "unexpected" errors perhaps :).

I knew the lines were related, but I didn't notice the hex/decimal
equivalence.

> Well, error code 20 is "type or value already exists".  If the replica and 
> the masters are starting from the same point, this would tend to imply that 
> the replica is receiving the change multiple times, and it's failing on the 
> subsequent modify operations.

Hmm. I only have one active master, and it's the only server receiving
updates. Let me try including some more detail on an error from this
morning.

So here's the error on the replica:

Sep 19 03:56:19 coeus slapd[43551]: null_callback : error code 0x14
Sep 19 03:56:19 coeus slapd[43551]: syncrepl_message_to_op: rid=003 be_modify 
uid=egr44503,ou=group,dc=cpp,dc=edu (20)

rid 3 is the backup master:

syncrepl rid=3
    provider=ldaps://minerva.ldap.cpp.edu/

Let's see what's in the accesslog:

dn: reqStart=20170919105619.57Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20170919105619.57Z
reqEnd: 20170919105619.58Z
reqType: modify
reqSession: 1
reqAuthzID: cn=ldaproot,dc=cpp,dc=edu
reqDN: uid=egr44503,ou=group,dc=cpp,dc=edu
reqResult: 0
reqMod: member:- uid=michaelm3,ou=user,dc=cpp,dc=edu
reqMod: entryCSN:= 20170919105619.291799Z#00#004#00
reqMod: modifiersName:= cn=idmgmt,ou=user,ou=service,dc=cpp,dc=edu
reqMod: modifyTimestamp:= 20170919105619Z
reqEntryUUID: f0b9ac97-5eec-4ac7-86e6-ac691689a3c8
entryUUID: 390ff571-794c-4fc1-86ac-d72aeac1dbbf
creatorsName: cn=accesslog
createTimestamp: 20170919105619Z
entryCSN: 20170919105619.291799Z#00#004#00
modifiersName: cn=accesslog
modifyTimestamp: 20170919105619Z

dn: reqStart=20170919105619.60Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20170919105619.60Z
reqEnd: 20170919105619.61Z
reqType: modify
reqSession: 1
reqAuthzID: cn=ldaproot,dc=cpp,dc=edu
reqDN: uid=egr44503,ou=group,dc=cpp,dc=edu
reqResult: 0
reqMod: memberUid:- michaelm3
reqMod: entryCSN:= 20170919105619.295478Z#00#004#00
reqMod: modifiersName:= cn=idmgmt,ou=user,ou=service,dc=cpp,dc=edu
reqMod: modifyTimestamp:= 20170919105619Z
reqEntryUUID: f0b9ac97-5eec-4ac7-86e6-ac691689a3c8
entryUUID: 6d3a042b-d6ab-4b3e-bc14-6b37eaef91f7
creatorsName: cn=accesslog
createTimestamp: 20170919105619Z
entryCSN: 20170919105619.295478Z#00#004#00
modifiersName: cn=accesslog
modifyTimestamp: 20170919105619Z

dn: reqStart=20170919105619.63Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20170919105619.63Z
reqEnd: 20170919105619.69Z
reqType: modify
reqSession: 1
reqAuthzID: cn=ldaproot,dc=cpp,dc=edu
reqDN: uid=egr44503,ou=group,dc=cpp,dc=edu
reqResult: 0
reqMod: member:+ uid=qxho,ou=user,dc=cpp,dc=edu
reqMod: entryCSN:= 20170919105619.305311Z#00#004#00
reqMod: modifiersName:= cn=idmgmt,ou=user,ou=service,dc=cpp,dc=edu
reqMod: modifyTimestamp:= 20170919105619Z
reqEntryUUID: f0b9ac97-5eec-4ac7-86e6-ac691689a3c8
entryUUID: 2ce0cd50-70eb-4bbc-80a5-85fd8322290a
creatorsName: cn=accesslog
createTimestamp: 20170919105619Z
entryCSN: 20170919105619.305311Z#00#004#00
modifiersName: cn=accesslog
modifyTimestamp: 20170919105619Z

dn: reqStart=20170919105619.71Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20170919105619.71Z
reqEnd: 20170919105619.72Z
reqType: modify
reqSession: 1
reqAuthzID: cn=ldaproot,dc=cpp,dc=edu
reqDN: uid=egr44503,ou=group,dc=cpp,dc=edu
reqResult: 0
reqMod: memberUid:+ qxho
reqMod: entryCSN:= 20170919105619.308548Z#00#004#00
reqMod: modifiersName:= cn=idmgmt,ou=user,ou=service,dc=cpp,dc=edu
reqMod: modifyTimestamp:= 20170919105619Z
reqEntryUUID: f0b9ac97-5eec-4ac7-86e6-ac691689a3c8
entryUUID: 5f896be6-74b9-4c16-a96a-d7fc6f6a85eb
creatorsName: cn=accesslog
createTimestamp: 20170919105619Z
entryCSN: 20170919105619.308548Z#00#004#00
modifiersName: cn=accesslog
modifyTimestamp: 20170919105619Z

That's all that's in the accesslog. Ok, now in the slapd log on minerva;
there's no mention of egr44503 anywhere in the slapd log. In the
accesslog:

dn: reqStart=20170919105619.55Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20170919105619.55Z
reqEnd: 20170919105619.57Z
reqType: modify
reqSession: 1
reqAuthzID: cn=ldaproot,dc=cpp,dc=edu
reqDN: uid=egr44503,ou=group,dc=cpp,dc=edu
reqResult: 0
reqMod: member:- uid=michaelm3,ou=user,dc=cpp,dc=edu
reqMod: entryCSN:= 20170919105619.291799Z#00#004#00
reqMod: modifiersName:= cn=idmgmt,ou=user,ou=service,dc=cpp,dc=edu
reqMod: modifyTimestamp:= 20170919105619Z
reqEntryUUID: f0b9ac97-5eec-4ac7-86e6-ac691689a3c8
entryUUID: be73d55d-bccc-4409-b3e1-1620440a5b99
creatorsName: cn=accesslog
createTimestamp: 

slapd: null_callback : error code 0x14

2017-09-15 Thread Paul B. Henson
So I've been putting off posting about this, but I recently upgraded to
2.4.45, stopped using the memberof overlay, no longer have a node with
serverID 0, and overall think my ldap servers should be in good shape
:), and it's still happening, so I thought it was time...

I'm getting weird errors in my logs about group replication changes:

Sep 15 03:55:16 minerva slapd[71654]: null_callback : error code 0x14
Sep 15 03:55:16 minerva slapd[71654]: syncrepl_message_to_op: rid=003 be_modify 
uid=arc464,ou=group,dc=cpp,dc=edu (20)

Sep 15 03:56:32 minerva slapd[71654]: null_callback : error code 0x14
Sep 15 03:56:32 minerva slapd[71654]: syncrepl_message_to_op: rid=002 be_modify 
uid=ce427l01,ou=group,dc=cpp,dc=edu (20)

Sep 15 03:56:53 minerva slapd[71654]: null_callback : error code 0x14
Sep 15 03:56:53 minerva slapd[71654]: syncrepl_message_to_op: rid=003 be_modify 
uid=chm250l01,ou=group,dc=cpp,dc=edu (20)


They sporadically happen on my replicas, on random groups, not typically
the same groups on each one on any given day:

Sep 15 04:04:52 egeria slapd[29652]: null_callback : error code 0x14
Sep 15 04:04:52 egeria slapd[29652]: syncrepl_message_to_op: rid=001 be_modify 
uid=soc202,ou=group,dc=cpp,dc=edu (20)

Sep 15 04:02:38 coeus slapd[]: null_callback : error code 0x14
Sep 15 04:02:38 coeus slapd[]: syncrepl_message_to_op: rid=001 be_modify 
uid=mat321,ou=group,dc=cpp,dc=edu (20)

I'm updating the member and memberUid attributes:

Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205558 MOD 
dn="uid=soc202,ou=group,dc=cpp,dc=edu"
Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205558 MOD attr=member
Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205558 RESULT tag=103 err=0 
text=
Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205559 MOD 
dn="uid=soc202,ou=group,dc=cpp,dc=edu"
Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205559 MOD attr=memberUid
Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205559 RESULT tag=103 err=0 
text=
Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205560 MOD 
dn="uid=soc202,ou=group,dc=cpp,dc=edu"
Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205560 MOD attr=member
Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205560 RESULT tag=103 err=0 
text=
Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205561 MOD 
dn="uid=soc202,ou=group,dc=cpp,dc=edu"
Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205561 MOD attr=memberUid
Sep 15 04:04:52 themis slapd[3035]: conn=125451 op=205561 RESULT tag=103 err=0 
text=

Sep 15 03:55:15 themis slapd[3035]: conn=125451 op=174816 MOD 
dn="uid=arc464,ou=group,dc=cpp,dc=edu"
Sep 15 03:55:15 themis slapd[3035]: conn=125451 op=174816 MOD attr=member
Sep 15 03:55:15 themis slapd[3035]: conn=125451 op=174816 RESULT tag=103 err=0 
text=
Sep 15 03:55:15 themis slapd[3035]: conn=125451 op=174817 MOD 
dn="uid=arc464,ou=group,dc=cpp,dc=edu"
Sep 15 03:55:15 themis slapd[3035]: conn=125451 op=174817 MOD attr=memberUid
Sep 15 03:55:15 themis slapd[3035]: conn=125451 op=174817 RESULT tag=103 err=0 
text=

As far as I can tell, all the updates succeed and the objects are
exactly the same on all the replicas, so these failures just seem to be
log noise?

Any thoughts on what's going on?

Thanks...



RE: mdb broken under OpenBSD

2017-08-04 Thread Paul B. Henson
> From: Howard Chu
> Sent: Friday, August 4, 2017 4:23 AM
>
> Nice work, thanks for the update.

So 10 of the mdb tests ended up failing; a couple because slapd failed to 
start, a bunch because ldapadd failed, and one because the script that 
evidently was executable per the -x test moments before suddenly wasn't found? 
There doesn't seem to be any log info or other remnants of the tests left over 
once they are done running to determine why they failed? Unless I missed 
something; what does one do to check on a test to see why it failed? Thanks…
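
I assume the usual approach is to re-run the failing script on its own and
then look at what it leaves behind under tests/testrun/, roughly like this,
though I haven't confirmed that's the intended workflow:

cd tests
./run -b mdb test049-sync-config
less testrun/slapd.1.log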

test020-proxycache:
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...


test049-sync-config:
Adding schema and databases on provider...
ldapadd failed for database config (80)!


test050-syncrepl-multimaster:
Adding schema and databases on server 1...
ldapadd failed for database config (80)!


test052-memberof:
Running ldapadd to build slapd config database...
ldapadd failed (80)!


test057-memberof-refint:
Running ldapadd to build slapd config database...
ldapadd failed (80)!


test058-syncrepl-asymmetric:
Adding database config on central master...
ldapadd failed for central master database config (80)!


test059-slave-config:
Adding schema and databases on provider...
ldapadd failed for database config (80)!


test061-syncreplication-initiation:
Adding database configuration on ldap://localhost:9011/
ldapadd failed (80)!


test063-delta-multimaster:
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...


test064-constraint:
/usr/obj/ports/openldap-2.4.44/openldap-2.4.44/tests/scripts/all[91]: 
/usr/obj/ports/openldap-2.4.44/openldap-2.4.44/tests/scripts/test064-constraint:
 No such file or directory






Re: mdb broken under OpenBSD

2017-08-03 Thread Paul B. Henson
On Thu, Aug 03, 2017 at 02:20:29PM -0700, Paul B. Henson wrote:

> From my initial look, mdb_env_create() successfully sets mdb->mi_dbenv,
> it's still valid in mdb_db_open(), but by the time it reaches
> be->be_entry_put in slapadd() it's NULL :(.

It turned out it was only sometimes NULL. The culprit was actually the local
OpenBSD patch that was added to mdb_db_open() to ensure MDB_WRITEMAP is always
set:

if ( !(flags & MDB_WRITEMAP) ) {
        Debug( LDAP_DEBUG_ANY,
                LDAP_XSTRING(mdb_db_open) ": database \"%s\" does not have writemap. "
                "This is required on systems without unified buffer cache.\n",
                be->be_suffix[0].bv_val, rc, 0 );
        goto fail;
}

There were two problems with it. First, it accesses the local flags variable
before it is initialized to mdb->mi_dbenv_flags shortly thereafter, so
the value is random and the if block triggers nondeterministically. Second,
it doesn't assign a failure value to rc before it jumps to fail:, so the
function returns successfully but with a closed be.

mdb has been disabled for a while, so I'm guessing the first issue might
have occurred over time as backend.c changed and the patch was just
blindly updated without testing. The second issue I'm not sure about.

I temporarily tweaked it to always enable MDB_WRITEMAP so I could run the
mdb test suite (which doesn't have it enabled for everything) and so far
it seems to be working.
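
Outside of the test suite, the same thing can be set per database in
slapd.conf, if I'm reading slapd-mdb(5) correctly (the suffix is just an
example):

database mdb
suffix   "dc=example,dc=com"
envflags writemap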

I would hope that this simple issue isn't why they've had it disabled all
this time, but I guess I'll see what happens with the full test suite
as it progresses.



Re: mdb broken under OpenBSD

2017-08-03 Thread Paul B. Henson
On Thu, Aug 03, 2017 at 09:46:20PM +0100, Howard Chu wrote:

> Have to say, since getting nowhere discussing that feature with them, I 
> personally have written off supporting it.

It's not my first choice for an enterprise ldap deployment, but I do
like it for certain network functions, and in this case I wanted to run
a mirrored openldap deployment on two redundant openbsd routers as a
password backend for radius and smtp auth...

> For the trace you're showing, you'll have to debug the slapadd invocation and 
> find out why env is NULL. Also use current (2.4.45) source, at least.

2.4.44 is what's currently in their ports tree with some patches to make it
work (barring mdb), so I was starting lazy :). I'll move up to current code;
it will just require porting their local OS mods to the latest version.

From my initial look, mdb_env_create() successfully sets mdb->mi_dbenv,
it's still valid in mdb_db_open(), but by the time it reaches
be->be_entry_put in slapadd() it's NULL :(. I'll keep tracing it,
thanks.



mdb broken under OpenBSD

2017-08-03 Thread Paul B. Henson
I was interested in using openldap under OpenBSD; they currently have
mdb disabled as they say it is broken. That OS lacks a unified buffer
cache, so mdb can only be used with the MDB_WRITEMAP option enabled, but
in theory it should work with that. I tried running the
mdb tests, and it immediately segfaults:

Program terminated with signal 11, Segmentation fault.

2773    flags |= env->me_flags & MDB_WRITEMAP;


It looks like mdb_txn_begin is being passed a NULL env? I haven't started
poking around yet to see why that might be, but I thought I'd just toss this
out there in case an expert had a thought before I spent a lot of time on
it :). Thanks...


#0  0x16210db057c5 in mdb_txn_begin (env=0x0, parent=0x0, flags=0, 
ret=0x16210df6fec0)
at 
/usr/obj/ports/openldap-2.4.44/openldap-2.4.44/servers/slapd/back-mdb/../../../libraries/libl
mdb/mdb.c:2773
txn = (MDB_txn *) 0xf
ntxn = (MDB_ntxn *) 0x0
rc = 0
size = 0
tsize = 32639
#1  0x16210db224df in mdb_tool_entry_put (be=0x1623ccbdd200, 
e=0x16235a80c008, 
text=0x7f7eeb90)
at 
/usr/obj/ports/openldap-2.4.44/openldap-2.4.44/servers/slapd/back-mdb/tools.c:624
rc = 0
mdb = (struct mdb_info *) 0x1623c358
op = {o_hdr = 0x0, o_tag = 0, o_time = 0, o_tincr = 0, o_bd = 0x0, 
o_req_dn = {bv_len = 0,
bv_val = 0x0}, o_req_ndn = {bv_len = 0, bv_val = 0x0}, o_request = {oq_add 
= {
  rs_modlist = 0x0, rs_e = 0x0}, oq_bind = {rb_method = 0, rb_cred = 
{bv_len = 0,
bv_val = 0x0}, rb_edn = {bv_len = 0, bv_val = 0x0}, rb_ssf = 0, rb_mech 
= {bv_len = 0,
bv_val = 0x0}}, oq_compare = {rs_ava = 0x0}, oq_modify = {rs_mods = 
{rs_modlist = 0x0,
rs_no_opattrs = 0 '\0'}, rs_increment = 0}, oq_modrdn = {rs_mods = 
{rs_modlist = 0x0,
rs_no_opattrs = 0 '\0'}, rs_deleteoldrdn = 0, rs_newrdn = {bv_len = 0, 
bv_val = 0x0},
  rs_nnewrdn = {bv_len = 0, bv_val = 0x0}, rs_newSup = 0x0, rs_nnewSup = 
0x0}, oq_search = {
  rs_scope = 0, rs_deref = 0, rs_slimit = 0, rs_tlimit = 0, rs_limit = 0x0, 
rs_attrsonly = 0,
  rs_attrs = 0x0, rs_filter = 0x0, rs_filterstr = {bv_len = 0, bv_val = 
0x0}}, oq_abandon = {
  rs_msgid = 0}, oq_cancel = {rs_msgid = 0}, oq_extended = {rs_reqoid = 
{bv_len = 0,
bv_val = 0x0}, rs_flags = 0, rs_reqdata = 0x0}, oq_pwdexop = 
{rs_extended = {rs_reqoid = {
  bv_len = 0, bv_val = 0x0}, rs_flags = 0, rs_reqdata = 0x0}, rs_old = 
{bv_len = 0,
bv_val = 0x0}, rs_new = {bv_len = 0, bv_val = 0x0}, rs_mods = 0x0, 
rs_modtail = 0x0}},
  o_abandon = 0, o_cancel = 0, o_groups = 0x0, o_do_not_cache = 0 '\0', 
o_is_auth_check = 0 '\0',
  o_dont_replicate = 0 '\0', o_acl_priv = ACL_NONE, o_nocaching = 0 '\0',
  o_delete_glue_parent = 0 '\0', o_no_schema_check = 0 '\0', 
o_no_subordinate_glue = 0 '\0',
  o_ctrlflag = '\0' , o_controls = 0x0, o_authz = {sai_method 
= 0, sai_mech = {
  bv_len = 0, bv_val = 0x0}, sai_dn = {bv_len = 0, bv_val = 0x0}, sai_ndn = 
{bv_len = 0,
  bv_val = 0x0}, sai_ssf = 0, sai_transport_ssf = 0, sai_tls_ssf = 0, 
sai_sasl_ssf = 0},
  o_ber = 0x0, o_res_ber = 0x0, o_callback = 0x0, o_ctrls = 0x0, o_csn = 
{bv_len = 0,
bv_val = 0x0}, o_private = 0x0, o_extra = {slh_first = 0x0}, o_next = 
{stqe_next = 0x0}}
ohdr = {oh_opid = 0, oh_connid = 0, oh_conn = 0x0, oh_msgid = 0, 
oh_protocol = 0,
  oh_tid = 0x0, oh_threadctx = 0x0, oh_tmpmemctx = 0x0, oh_tmpmfuncs = 0x0, 
oh_counters = 0x0,
  oh_log_prefix = '\0' }
__func__ = "mdb_tool_entry_put"
#2  0x16210dac8329 in slapadd (argc=8, argv=0x7f7eee58)
at 
/usr/obj/ports/openldap-2.4.44/openldap-2.4.44/servers/slapd/slapadd.c:456
textbuf = '\0' 
textlen = 256
erec = {e = 0x16235a80c008, lineno = 1, nextline = 18}
bvtext = {bv_len = 256, bv_val = 0x7f7eebd0 ""}
thr = 0x16210dd8cd78
id = 140187732470848
prev = (Entry *) 0x0
ldifrc = 1
rc = 0
stat_buf = {st_mode = 4294896512, st_dev = 32639, st_ino = 
140187732470584,
  st_nlink = 990767968, st_uid = 5667, st_gid = 0, st_rdev = 0, st_atim = 
{tv_sec = 179931522,
tv_nsec = 48}, st_mtim = {tv_sec = 24342436749856, tv_nsec = 
24341375494144}, st_ctim = {
tv_sec = 24343378665241, tv_nsec = 140187732470664}, st_size = 0, st_blocks 
= 140187732470664,
  st_blksize = 990767968, st_flags = 5667, st_gen = 0, __st_birthtim = {tv_sec 
= 179915312,
tv_nsec = 48}}
#3  0x16210da02d7a in main (argc=8, argv=0x7f7eee58)
at /usr/obj/ports/openldap-2.4.44/openldap-2.4.44/servers/slapd/main.c:664
i = 0
no_detach = 0
rc = 1
urls = 0x0
username = 0x0
groupname = 0x0
sandbox = 0x0
syslogUser = 160
pid = 8
waitfds = {331804096, 5667}
g_argc = 8
g_argv = (char **) 0x7f7eee58

openldap under openbsd

2017-07-09 Thread Paul B. Henson
I was curious if anybody is running openldap under openbsd? The version
in their ports system has mdb disabled; it says mdb is unreliable and
results in random slapd crashes. It seems openbsd lacks a unified buffer
cache, so mdb can only be used with the MDB_WRITEMAP option; they added
a patch to cause initialization to fail if that option isn't enabled but
then ended up disabling it completely anyway. This was back in 2015, I'm
not sure if they've tested it recently or what details there were behind
the crashes before they ended up disabling it.

I'm thinking of trying it out with the latest release and seeing what
happens but was wondering if anybody else had any recent experience.

Thanks...




Re: Elliptic Curve support in 2.4 branch

2017-07-09 Thread Paul B. Henson
On Mon, Jun 19, 2017 at 02:34:04PM +0100, Howard Chu wrote:

> OpenLDAP 2.4 is feature-frozen. All new features are 2.5 only.

What are current thoughts on a possible release date for 2.5 :)? The
roadmap still shows TBD...




Re: 2.4.44 + ITS 8432 patch segfault in modify_add_values

2017-02-22 Thread Paul B. Henson
On Tue, Feb 21, 2017 at 01:04:33PM +, Frank Swasey wrote:

> case.  I've not had any issues since.  Perhaps, your Net::LDAP module
> version has changed and it is sending what is being logged across the
> wire instead of the delete you are expecting.

Hmm, I don't believe we've updated Net::LDAP recently; the only change
since the crashes started was the openldap update. It probably would be
cleaner to do an explicit delete rather than a replace with an empty
array, but it's always done the right thing in the past so it's never
come up. I'll add that to my todo list :), thanks...



Re: 2.4.44 + ITS 8432 patch segfault in modify_add_values

2017-02-22 Thread Paul B. Henson
On Mon, Feb 20, 2017 at 04:30:00PM -0800, Quanah Gibson-Mount wrote:

> Ok, I can see if I can reproduce something like this.  Do you have a link 
> to the schema file in use that adds this attribute?

https://spaces.internet2.edu/display/macedir/OpenLDAP+eduPerson

> Also, can you confirm your patch matches 
> 

Exactly :).

$ diff ITS8432.patch 
/usr/portage-local/net-nds/openldap/files/openldap-its-8432.patch



Re: RE24 testing call (2.4.45) LMDB RE0.9 testing call (0.9.20)

2017-02-22 Thread Paul B. Henson
On Tue, Feb 21, 2017 at 01:14:58PM -0800, Quanah Gibson-Mount wrote:

> So I'd like to know the order in which your overlays are loaded.

First there's

database mdb
suffix cn=accesslog
overlay syncprov

Then

databasemdb
suffix  "dc=cpp,dc=edu"
overlay memberof
overlay syncprov
overlay accesslog


I don't recall ever seeing a requirement for a specific ordering of the
overlays.




Re: 2.4.44 + ITS 8432 patch segfault in modify_add_values

2017-02-17 Thread Paul B. Henson
On Thu, Feb 16, 2017 at 03:53:40PM -0800, Quanah Gibson-Mount wrote:

> It appears to be crashing while writing the change to the accesslog 
> database.  It's odd that the value for the attribute is NULL.  Do we know 
> for sure what the client doing the write op to the server is sending?

The code is in perl, and looks like this:

$entry->replace(eduPersonAffiliation => \@eduPersonAffiliation);

In this case, the array @eduPersonAffiliation is empty, effectively
deleting the attribute. I'm not 100% sure what this generates on the
wire, I'd have to debug it. I can say it's the same code that's been
running for a decade or so with no issues.

> Yeah, so this is the operation that actually failed... It'd be interesting 
> to know if it succeeded in the primary DB, but failed when writing to the 
> accesslog DB

I already reran that operation to fix the expiration, but the next time
it crashes I'll take a look at the primary and secondaries first to see
if they're out of sync.

> Hm, so I guess my question would be is it doing the op like this:

Hmm, the expiration code when removing an expiration looks like:

$entry->delete('cppEduPersonExpiration');

So it should be a delete on the wire for removing the attribute. You
think it crashed on the expiration operating, even though the backtrace
shows the segfault having the eduPersonAffiliation accesslog reqStart
id?

> dn: ...
> changetype: modify
> replace: csupomonaEduPersonExpiration
> csupomonaEduPersonExpiration:
> 
> Or is it doing it like this:
> 
> dn: ...
> changetype: modify
> delete: csupomonaEduPersonExpiration
> 
> Because the NULL value seems to imply the former.



Re: 2.4.44 + ITS 8432 patch segfault in modify_add_values

2017-02-15 Thread Paul B. Henson
On Wed, Feb 15, 2017 at 12:22:29PM -0800, Quanah Gibson-Mount wrote:

> I would suggest filing an ITS with the full backtrace info, so I can track 
> it.

Ok, will do.

> It could be useful to have the entry data from the accesslog as 
> well for the failed replication op, as we can see the failed entry DN in 
> the output of your backtrace.

That would be in the accesslog on the server that crashed? Hmm, the
server that crashed is the master, and all updates were going to it. Am
I confused, or did the update that caused the crash come in via syncrepl
though, and hence originate from a different server? So the accesslog
entry you want would be from that server, not the server that crashed?
But given no other servers should have been receiving updates, how would an
update have been received via replication? Or is this another issue like
the memberOf problem where updates are being improperly replicated?

> Does the operation complete successfully after slapd is restarted?

As far as I can tell; at least the server doesn't crash. I don't get any
errors at the application level in my logs with that specific message
from the backtrace. 

Hmm, looking at the logs that correspond with one of the crashes:

Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 MOD 
dn="uid=vntruong,ou=user,dc=csupomona,dc=edu"
Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 MOD 
attr=eduPersonAffiliation eduPersonPrimaryAffiliation 
csupomonaEduPersonAffiliation
Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 RESULT tag=103 err=0 text=

This operation appears to succeed? Then there's this:

Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=806 MOD 
dn="uid=vntruong,ou=user,dc=csupomona,dc=edu"
Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=806 MOD 
attr=csupomonaEduPersonExpiration

Then nothing, the server crashed. In my application, it's the
csupomonaEduPersonExpiration modification that fails.

Here's the entry from the accesslog for the eduPersonAffiliation:

dn: reqStart=20170214120013.01Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20170214120013.01Z
reqEnd: 20170214120013.02Z
reqType: modify
reqSession: 37859
reqAuthzID: cn=idmgmt,ou=user,ou=service,dc=csupomona,dc=edu
reqDN: uid=vntruong,ou=user,dc=csupomona,dc=edu
reqResult: 0
reqMod: eduPersonAffiliation:=
reqMod: eduPersonPrimaryAffiliation:=
reqMod: csupomonaEduPersonAffiliation:=
reqMod: entryCSN:= 20170214120013.628665Z#00#000#00
reqMod: modifiersName:= cn=idmgmt,ou=user,ou=service,dc=csupomona,dc=edu
reqMod: modifyTimestamp:= 20170214120013Z
reqEntryUUID: bd5ba51c-7a1b-4bdb-8bf3-fe90552f5909
entryCSN: 20170214120013.628665Z#00#000#00
entryUUID: 4737c48c-e46e-45a4-ba4b-2eb61540b27b
creatorsName: cn=accesslog
createTimestamp: 20170214120013Z
modifiersName: cn=accesslog
modifyTimestamp: 20170214120013Z

Then the next entry isn't until:

dn: reqStart=20170214184643.08Z,cn=accesslog

when I restarted the server. I guess I am confused; the entryCSN has
serverID 0, the ID of this server, so this isn't a replicated op, it's
an op from this server. So why does the backtrace show the change coming
in via syncrepl? It seems like it's getting applied twice. The change is
deleting the attribute, so the second time it's getting applied you
would get a no such attribute error...



Re: 2.4.44 + ITS 8432 patch segfault in modify_add_values

2017-02-15 Thread Paul B. Henson
On Wed, Feb 15, 2017 at 09:04:48AM -0800, Quanah Gibson-Mount wrote:

> Hm, so there is , but 
> not sure that's the same issue.

I do have a 4-way MMR setup, but that seems to be the only similarity.
It's crashing in mods.c, not search.c, and from a segfault when actually
dereferencing a null pointer, rather than from an assertion failure
checking for one. Well, I suppose the other similarity is that it
happens during replication.

Fortunately it's not that frequent an occurrence, and our load
balancer pops the load over to the failover master fairly quickly when
it happens; we usually only lose one transaction that I need to go clean
up. It typically happens during the heavy update load of our daily
identity management synchronization batch job.



2.4.44 + ITS 8432 patch segfault in modify_add_values

2017-02-14 Thread Paul B. Henson
So I've gotten a total of 5 crashes so far since I updated my production
environment to 2.4.44 with a locally applied ITS 8432 patch. I finally
went ahead and built a debug enabled binary to take a better look at the
core files.

They all have the same signature, SIGSEGV in modify_add_values; and in all of
them, "&mod->sm_values[mod->sm_numvals]" is NULL. I included a full
backtrace from one of the cores. Any thoughts as to what might be going on
here?

Thanks...


#0  0x00485c61 in modify_add_values (e=e@entry=0x7fbdef662480, 
mod=mod@entry=0x7fbdd980, 
permissive=0, text=text@entry=0x7fbdef6629a0, 
textbuf=textbuf@entry=0x7fbdef6624d0 "modify/delete: eduPersonAffiliation: 
no such attribute", 
textlen=textlen@entry=256)
at 
/var/lib/portage/tmp/portage/net-nds/openldap-2.4.44-r1/work/openldap-2.4.44/servers/slapd/mods.c:61
61  if ( !BER_BVISNULL( &mod->sm_values[mod->sm_numvals] )) {
(gdb) print &mod->sm_values[mod->sm_numvals]
$1 = (BerValue *) 0x0



#0  0x00485c61 in modify_add_values (e=e@entry=0x7f3a720f3480, 
mod=mod@entry=0x7f3a6c0013a0, 
permissive=0, text=text@entry=0x7f3a720f39a0, 
textbuf=textbuf@entry=0x7f3a720f34d0 "modify/delete: eduPersonAffiliation: 
no such attribute", 
textlen=textlen@entry=256)
at 
/var/lib/portage/tmp/portage/net-nds/openldap-2.4.44-r1/work/openldap-2.4.44/servers/slapd/mods.c:61
61  if ( !BER_BVISNULL( &mod->sm_values[mod->sm_numvals] )) {
(gdb) print &mod->sm_values[mod->sm_numvals]
$1 = (BerValue *) 0x0



#0  0x00485c61 in modify_add_values (e=e@entry=0x7fa36fffd480, 
mod=mod@entry=0x7fa358000980, 
permissive=0, text=text@entry=0x7fa36fffd9a0, 
textbuf=textbuf@entry=0x7fa36fffd4d0 "modify/delete: eduPersonAffiliation: 
no such attribute", 
textlen=textlen@entry=256)
at 
/var/lib/portage/tmp/portage/net-nds/openldap-2.4.44-r1/work/openldap-2.4.44/servers/slapd/mods.c:61
61  if ( !BER_BVISNULL( &mod->sm_values[mod->sm_numvals] )) {
(gdb) print &mod->sm_values[mod->sm_numvals]
$1 = (BerValue *) 0x0



#0  0x00485c61 in modify_add_values (e=e@entry=0x7f7e249f8480, 
mod=mod@entry=0x7f7e10001f00, 
permissive=0, text=text@entry=0x7f7e249f89a0, 
textbuf=textbuf@entry=0x7f7e249f84d0 "modify/delete: eduPersonAffiliation: 
no such attribute", 
textlen=textlen@entry=256)
at 
/var/lib/portage/tmp/portage/net-nds/openldap-2.4.44-r1/work/openldap-2.4.44/servers/slapd/mods.c:61
61  if ( !BER_BVISNULL( &mod->sm_values[mod->sm_numvals] )) {
(gdb) print &mod->sm_values[mod->sm_numvals]
$1 = (BerValue *) 0x0



#0  0x00485c61 in modify_add_values (e=e@entry=0x7f50727fb480, 
mod=mod@entry=0x7f50600019a0, 
permissive=0, text=text@entry=0x7f50727fb9a0, 
textbuf=textbuf@entry=0x7f50727fb4d0 "modify/delete: eduPersonAffiliation: 
no such attribute", 
textlen=textlen@entry=256)
at 
/var/lib/portage/tmp/portage/net-nds/openldap-2.4.44-r1/work/openldap-2.4.44/servers/slapd/mods.c:61
61  if ( !BER_BVISNULL( &mod->sm_values[mod->sm_numvals] )) {
(gdb) print &mod->sm_values[mod->sm_numvals]
$1 = (BerValue *) 0x0




(gdb) bt full
#0  0x00485c61 in modify_add_values (e=e@entry=0x7f50727fb480, 
mod=mod@entry=0x7f50600019a0, 
permissive=0, text=text@entry=0x7f50727fb9a0, 
textbuf=textbuf@entry=0x7f50727fb4d0 "modify/delete: eduPersonAffiliation: 
no such attribute", 
textlen=textlen@entry=256)
at 
/var/lib/portage/tmp/portage/net-nds/openldap-2.4.44-r1/work/openldap-2.4.44/servers/slapd/mods.c:61
rc = 
op = 0x4fe045 "add"
a = 
pmod = {sm_desc = , sm_values = 0x0, sm_nvalues = 0x0, 
sm_numvals = , 
  sm_op = , sm_flags = , sm_type = 
{bv_len = , 
bv_val = }}
__PRETTY_FUNCTION__ = "modify_add_values"
#1  0x7f526dfccd0c in mdb_modify_internal (op=op@entry=0x7f50727fc600, 
tid=tid@entry=0x1088a90, 
modlist=0x7f5060001960, e=e@entry=0x7f50727fb480, 
text=text@entry=0x7f50727fb9a0, 
textbuf=textbuf@entry=0x7f50727fb4d0 "modify/delete: eduPersonAffiliation: 
no such attribute", 
textlen=256)
at 
/var/lib/portage/tmp/portage/net-nds/openldap-2.4.44-r1/work/openldap-2.4.44/servers/slapd/back-mdb/mod
ify.c:137
rc = 
err = 
mod = 0x7f50600019a0
ml = 0x7f50600019a0
save_attrs = 0x7f5060004270
ap = 0x7f5272f53010
glue_attr_delete = 
got_delete = 0
__PRETTY_FUNCTION__ = "mdb_modify_internal"
#2  0x7f526dfce134 in mdb_modify (op=0x7f50727fc600, rs=0x7f50727fb980)
at 
/var/lib/portage/tmp/portage/net-nds/openldap-2.4.44-r1/work/openldap-2.4.44/servers/slapd/back-mdb/mod
ify.c:624
mdb = 0x7f5272f53010
e = 0x7f5060004220
manageDSAit = 
textbuf = "modify/delete: eduPersonAffiliation: no such 
attribute\000\377\037\000\000\000\000\000\000\000@\003\315\000\000\000\000\000\200\271\177rP\177\000\000\021\000\000\000\000\000\000\000.,\020`P\177\000\000

Re: RE24 testing call (2.4.45) LMDB RE0.9 testing call (0.9.20)

2017-02-05 Thread Paul B. Henson
On Fri, Feb 03, 2017 at 01:25:30PM -0800, Quanah Gibson-Mount wrote:

> It turned out to not be related. :/

Oh, that's disappointing :(. I'm reproducing it multiple times on a daily
basis on my production systems 8-/; is there anything I can do to help
track it down? Log dumps from high log levels? I could drop in a custom
slapd binary on one of them if need be with extra logging code; anything
that wouldn't unduly impact normal end user functionality...

Thanks much.



RE: RE24 testing call (2.4.45) LMDB RE0.9 testing call (0.9.20)

2017-02-02 Thread Paul B. Henson
> From: Quanah Gibson-Mount
> Subject: RE24 testing call (2.4.45) LMDB RE0.9 testing call (0.9.20)
> 
> For this testing call, we particularly need folks to test OpenLDAP with
> startTLS/LDAPS when compiled against OpenSSL (both pre 1.1 series and with
> the 1.1 series).

Compiled successfully on Gentoo Linux with OpenSSL 1.0.2j/cyrus-sasl 2.1.26,
configured as:

--prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share
--sysconfdir=/etc --localstatedir=/var/lib --disable-dependency-tracking
--disable-silent-rules --libdir=/usr/lib64 --libexecdir=/usr/lib64/openldap
--disable-static --enable-ldap --enable-slapd --enable-bdb --enable-hdb
--enable-dnssrv=mod --enable-ldap=mod --enable-mdb=mod --enable-meta=mod
--enable-monitor=mod --enable-null=mod --enable-passwd=mod
--enable-relay=mod --enable-shell=mod --enable-sock=mod --disable-perl
--disable-sql --disable-crypt --disable-slp --disable-lmpasswd
--enable-syslog --enable-aci --enable-cleartext --enable-modules
--enable-rewrite --enable-rlookups --enable-slapi --enable-syncprov=yes
--enable-overlays=mod --enable-ipv6 --with-cyrus-sasl --enable-spasswd
--disable-wrappers --with-tls=openssl --enable-dynamic --enable-local
--enable-proctitle --enable-shared

make test completed successfully; is there any particular way to verify all
the tests were okay? Does the make itself fail if any of the tests do? I did
not see a summary at the end. make its was not as happy:

> Starting its4326 ...
running defines.sh
Running slapadd to build slapd database...
Starting slapd on TCP/IP port 9011...
Using ldapsearch to check that slapd is running...
Starting proxy slapd on TCP/IP port 9012...
Using ldapsearch to check that proxy slapd is running...
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...
Waiting 5 seconds for slapd to start...
ldapsearch failed (255)!
./data/regressions/its4326/its4326: line 93: kill: (28780) - No such process
> ./data/regressions/its4326/its4326 failed (exit 255)

I see the fix for ITS8432 is included in this release (yay); I was wondering
if you've had any luck tracking down the underlying issue behind ITS8444? So
far I still haven't seen any corruption or operational issues from it, but
the rampant noise in the logs and errors being generated are quite
disconcerting :). Plus they will potentially mask any errors that are
actually indicative of a real problem.

Thanks.





Re: Allow particular LDAP group users login

2017-01-07 Thread Paul B. Henson
On Sat, Jan 07, 2017 at 11:53:27AM +0800, Frank Yu wrote:

> # grep pam_listfile.so system-auth -A2
> authrequired  pam_listfile.so \
> onerr=fail item=group sense=allow file=/etc/login.group.allowed

Without your complete pam configuration there's really no way to tell
what's going on. For example, what if you have a module configured as
sufficient listed above this line? pam_listfile would never even be
consulted.

All I can really say is that I use pam_listfile like so:

auth   requisitepam_listfile.so item=group sense=allow 
file=/etc/security/authorized_groups.conf onerr=fail

and it works fine for me, with groups pulled out of LDAP, the way I have it
integrated into the rest of my pam configuration. That, and you'd
probably be better off taking this inquiry to the pam mailing list as
your issue is most likely with pam configuration, not ldap, assuming a
"getent group " returns the group from ldap you're working
with.




Re: Fwd: Help pls : KDC w/LDAP backend

2017-01-02 Thread Paul B. Henson
On Mon, Jan 02, 2017 at 08:27:29AM +0100, Pascal Jakobi wrote:

> My LDAP ACLs are as follows :

Just as a reference, the ACLs we use are:

access to attrs=userPassword
by anonymous auth

access to dn.subtree="cn=container,ou=kerberos"
by dn="cn=kdc,ou=service,ou=kerberos" write
by dn="cn=kadmin,ou=service,ou=kerberos" write
by * none break

access to dn.exact="ou=kerberos" attrs=entry,contextCSN,objectClass
by dn="cn=slapd-checksync,ou=service,ou=kerberos" read
by * none break

access to *
by dn.exact="cn=slapd-syncrepl,ou=service,ou=kerberos" read
by * none


We've never had an issue. The first stanza allows the various service
accounts to authenticate, the second provides access to the kdc and
kadmin services, the third to a replication check account, and the last
to the syncrepl service. We run separate dedicated ldap servers for our
kerberos backends on each kdc, we don't mix the kerberos ldap data into
our normal ldap systems.




Re: memberOf overlay issues with 2.4.44 + ITS 8432 patch

2017-01-02 Thread Paul B. Henson
On Sat, Dec 31, 2016 at 05:39:15PM -0600, Quanah Gibson-Mount wrote:
> Ok good to know!  The issue I created occurred when deleting the
> group, which I believe is the original complaint in the ITS as well.
> So I will add individual membership deletes as a part of that test
> too.

ITS 8444 indicates a full refresh occurs if an object maintained by
memberOf is either added or deleted. I don't think I'm seeing that. Mark
says his CPU use ramps up to 100% for a bit every time that happens. I
don't think I'm having any issues with full refreshes, just duplicate
operations. Although he says the full refresh appears to be occurring
because of a duplicate operation. So it might be the same underlying bug
just displaying different symptoms due to different
configurations/load/update patterns...



Re: memberOf overlay issues with 2.4.44 + ITS 8432 patch

2016-12-30 Thread Paul B. Henson
On Fri, Dec 30, 2016 at 02:41:06PM -0800, Quanah Gibson-Mount wrote:

> Well, it seems to be some sort of race condition.

Yes, I'd agree; probably also load dependent as I never triggered it on
my dev systems which are mostly idle other than my test load. It only
showed up on my prod systems which tend to have continuous load from
various other things.

> I did want to confirm that you see this on servers that are long running 
> (I.e., they've been running for a long time, and had other group deletes 
> that went through w/o issue during that time).  If so, then I can modify 
> the test to randomly add and delete groups as a part of the test, 
> increasing the likelyhood of triggering the issue within the test.

I don't have too many deletions of group objects themselves in
production, mostly just deletions of the members of groups. I didn't see
any issues with group deletions in dev, or during some basic initial
testing in prod. I'll go ahead and make a new test group, add some
members to it, and then delete it and see what happens now that I've
been running this code for about 3 weeks...

I didn't see any errors deleting a group, although there were these
syncrepl messages that I don't believe used to show up:

Dec 30 21:23:29 themis slapd[2607]: syncrepl_message_to_op: rid=001 be_delete 
uid=ldaptest5,ou=group,dc=cpp,dc=edu (32)
Dec 30 21:23:29 themis slapd[2607]: syncrepl_message_to_op: rid=003 be_delete 
uid=ldaptest5,ou=group,dc=cpp,dc=edu (32)
Dec 30 21:23:29 themis slapd[2607]: syncrepl_message_to_op: rid=002 be_delete 
uid=ldaptest5,ou=group,dc=cpp,dc=edu (32)

The group and memberOf attributes are gone on all four servers, so other
than noise in the logs I'm not sure what these messages meant.





Re: memberOf overlay issues with 2.4.44 + ITS 8432 patch

2016-12-29 Thread Paul B. Henson
On Thu, Dec 29, 2016 at 04:48:42PM -0800, Quanah Gibson-Mount wrote:

> I'm able to reproduce the problem with my test script, so I think things 
> are set from here.

Excellent; in that case please let me know when there is a fix available
:). Thanks much...



Re: memberOf overlay issues with 2.4.44 + ITS 8432 patch

2016-12-28 Thread Paul B. Henson
On Wed, Dec 28, 2016 at 09:53:23AM -0800, Quanah Gibson-Mount wrote:

> I'm going to see if I can create a test that reproduces the issue.
> I've been working on expanding the test suite so we can ensure we
> avoid regressions etc.

Cool, thanks. Let me know if there's anything I can do to help.



Re: 2.4.44 segfault in modify_add_values

2016-12-28 Thread Paul B. Henson
On Wed, Dec 28, 2016 at 09:51:38AM -0800, Quanah Gibson-Mount wrote:

> Doesn't ring a bell, and unfortunately it looks like your binaries are 
> stripped, so the backtrace doesn't contain useful information. :(

Ok, thanks. I'd been running 2.4.41 for quite a while with no issues, so
I'd guess this had something to do with the update. If it happens again
I'll drop in a debug non-stripped build so the core will be more
helpful.
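
For reference, the plan is just a from-source rebuild with debug symbols
left in and nothing stripped, something along these lines (only a
sketch, the exact flags depend on your packaging):

CFLAGS="-g -O2" ./configure --prefix=/usr
make depend && make
make install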



2.4.44 segfault in modify_add_values

2016-12-25 Thread Paul B. Henson
Woke up this morning to an unwanted Christmas present :(, one of my ldap
servers had crashed with a segfault:

Core was generated by `/usr/lib64/openldap/slapd -u ldap -g ldap -h
ldaps:// ldap://'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00485c61 in modify_add_values ()
[Current thread is 1 (Thread 0x7f50727fd700 (LWP 15556))]
(gdb) where
#0  0x00485c61 in modify_add_values ()
#1  0x7f526dfccd0c in mdb_modify_internal () from 
/usr/lib64/openldap/openldap/back_mdb.so
#2  0x7f526dfce134 in mdb_modify () from 
/usr/lib64/openldap/openldap/back_mdb.so
#3  0x0049c2ea in overlay_op_walk ()
#4  0x0049c441 in ?? ()
#5  0x0048d938 in ?? ()
#6  0x004915d7 in ?? ()
#7  0x00495793 in ?? ()
#8  0x00433037 in ?? ()
#9  0x7f5272c83a32 in ?? () from /usr/lib64/libldap_r-2.4.so.2
#10 0x7f52722ce494 in start_thread () from /lib64/libpthread.so.0
#11 0x7f52719c6edd in clone () from /lib64/libc.so.6

The last bit in the log was an update, then it ended:

Dec 25 03:31:03 fosse slapd[]: conn=561646 op=3 MOD 
dn="uid=aapriest,ou=user,dc=csupomona,dc=edu "
Dec 25 03:31:03 fosse slapd[]: conn=561646 op=3 MOD 
attr=csupomonaEduPersonExpiration

When I woke up and saw the alert on my phone, I assumed it would have
something to do with the memberOf overlay issues I was already looking
into, but this seems different.

I restarted it and it synced right back up from the failover master and
seems to be ok now. Does this pop out as anything known and possibly
resolved in head? Thanks...




Re: memberOf overlay issues with 2.4.44 + ITS 8432 patch

2016-12-22 Thread Paul B. Henson
On Thu, Dec 22, 2016 at 10:35:55AM -0600, Quanah Gibson-Mount wrote:

> Looks like a bug with the memberOf overlay when it is instantiated in
> a delta-syncrepl environment, based on this statement from the
> memberOf man page:

This is new behavior as of 2.4.44; I had the exact same
memberOf/delta-syncrepl configuration under 2.4.41 and never saw this
issue.

> Probably worth adding to ITS#8444.

Ok, will do. Although it seems there's been no response to that ticket
for six months :(? Anything else I can provide to help debug this?
Unfortunately it doesn't seem to be a straight "always happens" bug; I
didn't see it at all in my dev environment under a test load, it only
popped up in production under a full load.

Thanks...




memberOf overlay issues with 2.4.44 + ITS 8432 patch

2016-12-21 Thread Paul B. Henson
I recently updated to openldap 2.4.44 with the patch for ITS 8432. I'd been
holding off in hopes of a new release with that patch included and for
some update on ITS 8444, but decided to go ahead and push it through
during the holiday break.

I installed it on my dev environment and was unable to replicate the
issues reported in ITS 8444 so went ahead and rolled into production.
However, now that I'm in production, I'm seeing different issues with
the memberOf overlay. It seems for some reason group membership
deletions are getting replicated multiple times? In the logs, I will see
something like the following:


Dec 21 04:16:59 themis slapd[2607]: conn=364875 op=227806 MOD 
dn="uid=members,ou=group,dc=cpp,dc=edu"
Dec 21 04:16:59 themis slapd[2607]: conn=364875 op=227806 MOD attr=member

My identity management system connected and removed some members from
this group. Next, there will be a number of lines like this:


Dec 21 04:17:15 themis slapd[2607]: conn=-1 op=0: memberof_value_modify 
DN="uid=prsloan,ou=user,dc=cpp,dc=edu" delete 
memberOf="uid=members,ou=group,dc=cpp,dc=edu" failed err=16
Dec 21 04:17:16 themis slapd[2607]: conn=-1 op=0: memberof_value_modify 
DN="uid=prsloan,ou=user,dc=cpp,dc=edu" delete 
memberOf="uid=members,ou=group,dc=cpp,dc=edu" failed err=16
Dec 21 04:17:17 themis slapd[2607]: conn=-1 op=0: memberof_value_modify 
DN="uid=prsloan,ou=user,dc=cpp,dc=edu" delete 
memberOf="uid=members,ou=group,dc=cpp,dc=edu" failed err=16
Dec 21 04:17:18 themis slapd[2607]: conn=-1 op=0: memberof_value_modify 
DN="uid=prsloan,ou=user,dc=cpp,dc=edu" delete 
memberOf="uid=members,ou=group,dc=cpp,dc=edu" failed err=16
Dec 21 04:17:18 themis slapd[2607]: conn=-1 op=0: memberof_value_modify 
DN="uid=prsloan,ou=user,dc=cpp,dc=edu" delete 
memberOf="uid=members,ou=group,dc=cpp,dc=edu" failed err=16
Dec 21 04:17:18 themis slapd[2607]: conn=-1 op=0: memberof_value_modify 
DN="uid=prsloan,ou=user,dc=c

Where the memberOf overlay is complaining that it can't remove the
corresponding memberOf attribute from the user. Reviewing the
accesslog, we see:

dn: reqStart=20161221121659.02Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20161221121659.02Z
reqEnd: 20161221121706.00Z
reqType: modify
reqSession: 364875
reqAuthzID: cn=idmgmt,ou=user,ou=service,dc=cpp,dc=edu
reqDN: uid=members,ou=group,dc=cpp,dc=edu
reqResult: 0
reqMod: member:- uid=jjtringali,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=kaijulee,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=srknight,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=ppimentel,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=epdetering,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=ktran,ou=user,dc=cpp,dc=edu
[...]
reqMod: member:- uid=prsloan,ou=user,dc=cpp,dc=edu

The initial transaction is from my client, authenticated as cn=idmgmt. This
one is successful, and the overlay deletes the memberOf attribute.
However, there are then *six* more instances of the same transaction:

dn: reqStart=20161221121714.01Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20161221121714.01Z
reqEnd: 20161221121715.00Z
reqType: modify
reqSession: 3
reqAuthzID: cn=ldaproot,dc=cpp,dc=edu
reqDN: uid=members,ou=group,dc=cpp,dc=edu
reqResult: 0
reqMod: member:- uid=jjtringali,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=kaijulee,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=srknight,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=ppimentel,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=epdetering,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=ktran,ou=user,dc=cpp,dc=edu
[...]
reqMod: member:- uid=prsloan,ou=user,dc=cpp,dc=edu

dn: reqStart=20161221121716.01Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20161221121716.01Z
reqEnd: 20161221121716.02Z
reqType: modify
reqSession: 1
reqAuthzID: cn=ldaproot,dc=cpp,dc=edu
reqDN: uid=members,ou=group,dc=cpp,dc=edu
reqResult: 0
reqMod: member:- uid=jjtringali,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=kaijulee,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=srknight,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=ppimentel,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=epdetering,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=ktran,ou=user,dc=cpp,dc=edu
[...]
reqMod: member:- uid=prsloan,ou=user,dc=cpp,dc=edu

dn: reqStart=20161221121716.04Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20161221121716.04Z
reqEnd: 20161221121717.02Z
reqType: modify
reqSession: 1
reqAuthzID: cn=ldaproot,dc=cpp,dc=edu
reqDN: uid=members,ou=group,dc=cpp,dc=edu
reqResult: 0
reqMod: member:- uid=jjtringali,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=kaijulee,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=srknight,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=ppimentel,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=epdetering,ou=user,dc=cpp,dc=edu
reqMod: member:- uid=ktran,ou=user,dc=cpp,dc=edu
[...]
reqMod: member:- uid=prsloan,ou=user,dc=cpp,dc=edu

dn: 

Re: openldap 2.4.44 + ITS 8432 patch still infinitely replicates

2016-07-29 Thread Paul B. Henson
On Thu, Jul 28, 2016 at 04:09:27PM -0700, Quanah Gibson-Mount wrote:

> > The bug noted in ITS#8460 is not present in 2.4 at all. Quanah is also
> > running experimental backports of features from 2.5 and forgets to
> > mention that sometimes. This particular issue is from 2.5 (and is now
> > fixed in git).
> > Most likely it's the same issue as #8448. You can disregard both of these.

Ah, thanks for the clarification Howard :).

> Same as ITS8462 as well.  So all 3 of those are off the table.  Other than 
> those, I haven't had problems with 2.4.44 + ITS8432. ;)

Cool. Are you using the memberOf overlay? It doesn't look like there's
been any update on ITS 8444 that Mark Cairney reported a month or so ago
in which he said he was seeing replication issues with 2.4.44 + ITS8432
fix for adds/deletes of objects managed by the overlay.

Mark, any progress on resolving that issue?

Thanks everybody...



RE: openldap 2.4.44 + ITS 8432 patch still infinitely replicates

2016-07-21 Thread Paul B. Henson
> From: Howard Chu
> Sent: Thursday, July 21, 2016 3:36 AM
> 
> The fix for #8432 only prevents the redundant mod from being processed on
> a particular node. If other nodes are still accepting the redundant op
then yes,
> it will continue to propagate. So yes, you need the patched code on all
> nodes.

Okay, thanks for the clarification. I usually stage updates to avoid a
complete outage at any given time. It's interesting though that I had never
seen this problem running 2.4.41 until I introduced a 2.4.44 system into the
mix, and then it went away once I reverted that system back to 2.4.41. I
wonder why that combination caused it to pop up suddenly. I'll have to
schedule a downtime window and update them all at once and see what happens.

By any chance have you had the time to look at ITS 8444? The reporter says
he sees a similar circular replication issue when the memberOf overlay is
enabled, which we also use.

Thanks much.




openldap 2.4.44 + ITS 8432 patch still infinitely replicates

2016-07-21 Thread Paul B. Henson
I upgraded one of the nodes of a four node MMR delta syncrepl openldap
system today to 2.4.44 + the backported ITS 8432 patch (the other three
nodes were still running 2.4.41, which all four had been running for
quite some time with no issues) and within a few hours they started blowing
up with the infinite replication issue referred to in ITS 8432, all the
accesslogs were filled with the same modification repeated over and
over. I ended up having to slapcat the db, restore the upgraded node to
2.4.41, and then reload the db on all of them to recover.

From what I understand the bug was supposed to have existed in versions
before 2.4.44, but I had never seen it in 2.4.41, and backing out
to that version seems to have restored stability. I remember seeing
another ITS regarding an issue even with the 8432 patch applied if
you're using the memberOf overlay (although that number escapes me at
the moment), but I thought that only applied if you were updating group
memberships, and the change that blew up my system was a password change.

Is it possible updating one node to the fixed version somehow triggered
the bug in one of the other nodes? Do I need to upgrade all the nodes at
the same time? Or are there possibly still edge cases where 2.4.44 with
the patch is still broken? I did a test run of the upgrade in my dev
environment, including the staged rollout with temporarily mixed
versions, but it doesn't really have the same load and variety of access
patterns that production sees.

Thanks...



Re: Odd MMR behaviour with delta-syncrepl and refreshAndPersist

2016-06-20 Thread Paul B. Henson
On Thu, Jun 16, 2016 at 10:10:19AM +0100, Mark Cairney wrote:

> I'll fill in an ITS as suggested.

Hmm, this is on a 2.4.44 deployment with the patch from head applied
that Quanah indicated fixed the original problem he was having? I just
compiled 2.4.44 with that patch last week in preparation for an upcoming
planned upgrade; however, we use memberOf as well so now perhaps I'll
hold off again a bit. Would you be so kind as to post the ITS #
once you file it?

Thanks...



Re: MMR: How do you tell that the MMR servers are in sync?

2016-06-07 Thread Paul B. Henson
On Tue, Jun 07, 2016 at 08:02:14AM -0400, Frank Swasey wrote:

> I am intending that only one will officially be active at a time. 
> However, I am doing that by activating the service IP addresses on the 
> system that I want to be active instead of using a load balancer.

Cool. In that case, here is the script we've been running for years to
check that our 4 MMR servers are synced. It occasionally blips when we
do a major idm run that creates or deletes 20-30 thousand accounts, but
other than that I rarely if ever get alerts from it. It runs on each of
the four servers, and our config management server populates it with the
names of the other three to check.
#! /usr/bin/perl

use Net::LDAP ();

my $other_csns = lookup_other_csns();
my $my_csns= lookup_my_csns();

if (!compare_csns($my_csns, $other_csns)) {
	my $prev_other_csns = $other_csns;
	my $prev_my_csns= $my_csns;

	sleep 120;

	$other_csns = lookup_other_csns();
	$my_csns= lookup_my_csns();

	if (!compare_csns($my_csns, $other_csns)) {
		if ($prev_my_csns ne $my_csns) {
			if (!(date_check($my_csns, $other_csns))) {
print STDERR "Warning: LDAP replica out of synchronization\n";
print_csns($my_csns, $other_csns);
exit(1);
			}
		}
		else {
			print STDERR "Error: LDAP replica out of syncronization, no apparent update progress seen\n";
			print_csns($my_csns, $other_csns);
			exit(1);
		}
	}
}

sub lookup_csns {
	my ($ldap, $server_name) = @_;

	my $search = $ldap->search(
		scope  => 'base',
		base   => "dc=cpp,dc=edu",
		filter => "(objectclass=*)",
		attrs  => [ 'contextCSN' ]
		);

	$search->code() and do {
		print STDERR "Error: failed to execute search on $server_name: "
			. $search->error() . " (" . $search->code() . ")\n";
		exit(1);
	};

	my $entry = $search->shift_entry() or do {
		print STDERR "Error: search on $server_name failed to find entry\n";
		exit(1);
	};

	my %csn_hash;

	foreach my $csn ($entry->get_value('contextCSN')) {
		defined($csn) or die("Error: no contextCSN attribute found in $server_name entry\n");
		if ($csn =~ m/Z#\d{6}#(\d{3})#.*/) {
			$csn_hash{$1} = $csn;
		}
	}

	my @csns;

	foreach my $key (sort(keys %csn_hash)) {
		push @csns, $csn_hash{$key};
	}

	my $csns_s = join(" ", @csns);
	return $csns_s;
}

sub lookup_other_csns {
	my @ret;

	foreach my $server ('ldap2','ldap3','ldap4') {
		my $ldap_master = Net::LDAP->new("$server.ldap.cpp.edu", timeout => 10) or do {
			print STDERR "Error: failed to connect to LDAP master $server: $@\n";
			print STDERR "Synchronization checks will not be performed on $server\n";
		};
		if (defined $ldap_master) {
			my $csns = lookup_csns($ldap_master, $server);
			push @ret, [$server, $csns];
			$ldap_master->unbind;
		}
	}

	return \@ret;
}

sub lookup_my_csns {
	my $ldap_slave = Net::LDAP->new("localhost", timeout => 10) or do {
		print STDERR "Error: failed to connect to local LDAP: $@\n";
		exit(1);
	};

	my $csns = lookup_csns($ldap_slave, "localhost");
	$ldap_slave->unbind;
	return $csns;
}

sub compare_csns {
	my ($my_csns, $other_csns_list) = @_;
	my @other_csns = @$other_csns_list;

	foreach my $csnsref (@other_csns) {
		my ($servername, $csns) = @$csnsref;
		if (!($my_csns eq $csns)) {
			return 0;
		}
	}
	return 1;
}

sub date_check {
	my ($my_csns, $other_csns_list) = @_;

	my %my_csns_hash;

	foreach my $csn (split(" ", $my_csns)) {
		if ($csn =~ m/Z#\d{6}#(\d{3})/) {
			$my_csns_hash{$1} = $csn;
		}
	}

	my @other_csns = @$other_csns_list;

	foreach my $other_csnref (@other_csns) {
		my ($servername, $other_csn) = @$other_csnref;
		my %other_csns_hash;
		foreach my $csn (split(" ", $other_csn)) {
			if ($csn =~ m/Z#\d{6}#(\d{3})/) {
$other_csns_hash{$1} = $csn;
			}
  		}
		foreach my $key (keys %my_csns_hash) {
			if (!($my_csns_hash{$key} eq $other_csns_hash{$key})) {
# compare times.
my $csn1 = $my_csns_hash{$key};
my $csn2 = $other_csns_hash{$key};
my $csn1_ts = $csn1; $csn1_ts =~ s/Z.*//;
my $csn2_ts = $csn2; $csn2_ts =~ s/Z.*//;

if ($csn2_ts - $csn1_ts > 1000) {
	return 0;
}
			}
		}
	}
	return 1;
}

sub print_csns {
	my ($my_csns, $other_csns_list) = @_;
	my @other_csns = @$other_csns_list;

	print "My CSNs: \n";
	foreach my $csn (split(" ", $my_csns)) {
		print "$csn\n";
	}
	foreach my $ocsnref (@other_csns) {
		my ($servername, $ocsn) = @$ocsnref;
		print "$servername CSNs: \n";
		foreach my $csn (split(" ", $ocsn)) {
			print "$csn\n";
		}
	}
}
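
One way to run it is straight out of cron on each node and let whatever
does your alerting pick up the non-zero exit and output; the path and
interval below are made up for illustration:

# /etc/cron.d/ldap-sync-check
*/15 * * * * root /usr/local/sbin/check-ldap-sync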


Re: Odd MMR behaviour with delta-syncrepl and refreshAndPersist

2016-06-04 Thread Paul B. Henson
On Fri, Jun 03, 2016 at 04:06:45PM -0700, Quanah Gibson-Mount wrote:

> Likely 

This is a new issue with 2.4.44? We've been running a 4 node MMR system
under 2.4.43 that's been very stable and were planning to update to
2.4.44 this summer. Would it be better to hold off on such an update?

Thanks...



Re: MMR: How do you tell that the MMR servers are in sync?

2016-06-02 Thread Paul B. Henson
On Fri, May 27, 2016 at 08:45:28AM -0400, Frank Swasey wrote:

> How are you folks, who are already using MMR, checking/verifying that 
> the MMR participants (and their replicas) are actually in sync?

To clarify, are you directing all writes to one master, or are you
actually spreading writes across all of them simultaneously?

I have my systems set up with MMR, but with a load balancer in front
such that only one node is ever actually receiving writes. I was going
to include the perl script we use to verify they are in sync, but if
you're actually running MMR with distributed writes, I'm not sure if it
would work. I've never had an issue with it when writes flipped between
nodes (on purpose or otherwise 8-/ ) and I'm sure there've been
occasions when two nodes might have gotten a few writes at the same
time, but I've never had multiple nodes receiving writes simultaneously
for an extended period of time.

Although thanks to management that believes more in audit check boxes
than actual security, we're going to be turning on account lockouts
soon, which will generate a write load on each individual server,
so I guess I'll get to see what happens to my sync check script in that
case :).



RE: disable TLS compression with openssl?

2015-12-08 Thread Paul B. Henson
> From: Howard Chu
> Sent: Monday, December 07, 2015 6:26 AM
> 
> OpenLDAP does not enable compression so there is nothing to disable.

Hmm, that's not what I am seeing. Using the latest sslscan:

---
$ sslscan ldap.cpp.edu:636
Version: 1.10.6
OpenSSL 1.0.1p 9 Jul 2015

Testing SSL server ldap.cpp.edu on port 636

  TLS renegotiation:
Secure session renegotiation supported

  TLS Compression:
Compression enabled (CRIME)
[...]
-

shows that compression is enabled. As does Wireshark when sniffing the
packets over the wire. This is with openssl, perhaps gnutls behaves
differently?
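
Plain openssl s_client reports it too, if anyone wants to double check
without sslscan (output trimmed):

$ openssl s_client -connect ldap.cpp.edu:636 < /dev/null
[...]
Compression: zlib compression
[...]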

> The CRIME attack does not work against LDAP or other stateful protocols
> where credentials are only sent once.

Great, thanks much for clarifying that for me.





disable TLS compression with openssl?

2015-12-06 Thread Paul B. Henson
We're currently running through all of our SSL/TLS-using apps to disable
SSLv3 and update the accepted ciphers list, as well as other current
best practices. I don't see any way to disable SSL compression in
openldap? Does SSL compression with ldap traffic not lead to the same
issue as it does in web traffic?
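
For context, on the slapd side the sort of thing I mean is along these
lines (directive names are from slapd.conf(5), the values are just an
illustration):

TLSProtocolMin 3.1
TLSCipherSuite HIGH:!aNULL:!eNULL:!RC4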

Also, are there any plans to support ECDHE ciphers in openldap? I see
there's an ITS ticket about it, it's rather old and the last update
questioned whether those ciphers should be avoided due to potential NSA
meddling in their design.

Thanks...



RE: accesslog purge starves kerberos kdc authentications

2015-11-05 Thread Paul B. Henson
> From: Quanah Gibson-Mount
> Sent: Wednesday, November 04, 2015 6:34 PM
> 
> I set up my accesslog to do the purges every 4 hours by default, rather
> than once a day, to get around this.  You may want to do it more frequently
> than that.  I would say once a day clearly isn't often enough for the
> amount of write traffic you have.

I wasn't sure if doing it more frequently would amortize the load to the point 
where it did not impact production, or simply break production more often :). I 
guess I'll have to give it a try and see.
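
If I'm reading the logpurge syntax right, keeping 7 days of entries but
checking every 4 hours instead of once a day would just be (a sketch,
per slapo-accesslog(5)):

logpurge 07+00:00 00+04:00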

> You also don't note what slapd backend you're using (bdb, hdb, mdb).  bdb &
> hdb in particular are much slower, write wise, than mdb.  And you don't
> note your OpenLDAP version, either...

Sorry, we are running the latest and greatest 2.4.41 with the latest and 
greatest mdb backend :).

Thanks…




RE: Antw: accesslog purge starves kerberos kdc authentications

2015-11-05 Thread Paul B. Henson
> From: Dameon Wagner
> Sent: Thursday, November 05, 2015 3:01 AM
> 
> Just a simple question, is /var/lib/openldap-data/accesslog on the
> same physical disk as the rest of your directory storage?  I note from
> your initial thread on the kerberos list that there's small io spike
> at the same time, so it may be beneficial to have the accesslog on
> different spindles if possible.

Yes, it is. The system is a basic 1U server with a hardware RAID card and
mirrored disks. I don't have any other spindles 8-/. One of my colleagues is
giving me the big "I told you so", as he advocated upgrading to the hardware
RAID card with a battery backed write cache which might have prevented this
issue. The rest of us didn't think the extra expenditure was worth it for a
Kerberos server which typically doesn't have very high performance
requirements. I guess I'm going to try purging the accesslog more
frequently and seeing if that reduces the individual purge load to a low
enough level that it doesn't impact service response.

Thanks.




RE: Antw: Re: accesslog purge starves kerberos kdc authentications

2015-11-05 Thread Paul B. Henson
> From: Ulrich Windl
> Sent: Wednesday, November 04, 2015 11:26 PM
> 
> Maybe you have an I/O bottleneck? Could you try (for a test) to put the
> accesslog into a RAM disk? What filesystem are you using? Special mount
> options?

Yes, I'm pretty sure it is an I/O issue. The problem only occurs on the 
physical servers, the virtual machines (which are on a SAN with much better 
performance than local disks) don't exhibit this issue. While it is thrashing 
iotop shows the write load at about 2MB/s. It's on a linux system using ext4, 
the only special mount option is relatime.




Re: accesslog purge starves kerberos kdc authentications

2015-11-04 Thread Paul B. Henson
On Wed, Nov 04, 2015 at 07:46:47AM +0100, Michael Ströder wrote:

> Do you have an eq-index on the reqStart attribute as recommended
> in slapo-accesslog(5)?

Yes:

index default eq
index entryCSN,objectClass,reqEnd,reqResult,reqStart




Re: Antw: accesslog purge starves kerberos kdc authentications

2015-11-04 Thread Paul B. Henson
On Wed, Nov 04, 2015 at 09:08:47AM +0100, Ulrich Windl wrote:
> What type of indexes do you have for your accesslog? Any warning about
> missing index in syslog?

The overall accesslog config is:

database mdb
directory /var/lib/openldap-data/accesslog
maxsize 2147483648
suffix cn=accesslog
rootdn cn=accesslog

index default eq
index entryCSN,objectClass,reqEnd,reqResult,reqStart


overlay accesslog
logdb cn=accesslog
logops writes
logsuccess TRUE
logpurge 07+00:00 01+00:00

I haven't seen any errors or warnings in the openldap logs; the only
reason we noticed was the degraded kerberos performance.

Thanks...



accesslog purge starves kerberos kdc authentications

2015-11-03 Thread Paul B. Henson
We're running MIT kerberos with the ldap backend, specifically 3
openldap servers doing delta syncrepl. We started having a problem a
while back where once a day the kdc would time out authentication
requests, and finally tracked it down to openldap purging the accesslog.
We currently have the accesslog overlay configured to delete entries
over 7 days old once a day, and it seems that while openldap is
processing the purge the kdc is starved out and unable to process
authentications in a timely fashion. We do (thanks to our ISO) have
account lockout enabled, so every authentication involves not only a
read but a write.

Is it expected for the accesslog purge to be so disruptive? Is there any
way to tune it so it doesn't overwhelm the system to the point of being
unresponsive?

Would it be better to purge the accesslog more frequently so as to amortize
the work across multiple intervals rather than being concentrated once a
day?

Thanks for any suggestions...



ITS#8046 - remote unauth DoS on 2.4.40

2015-02-06 Thread Paul B. Henson
I haven't seen any announcement of this other than on security lists,
but there's an unauthenticated remote DoS bug in 2.4.40:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776991

The actual ITS is a bit confusing; the reporter at one point says he had
the issue with a beta version of 2.4.40 and that it didn't work against
the release, but Debian confirmed it kills their official 2.4.40 package
and it caused a segfault against my Gentoo 2.4.40 build, so if you're
running 2.4.40 (older versions are not vulnerable), it's probably worth
applying the patch from head:

http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=patch;h=2f1a2dd329b91afe561cd06b872d09630d4edb6a

I rebuilt my 2.4.40 with this and it no longer dies when the PoC query
is issued.
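
For anyone else patching by hand, it's just the usual drill, roughly
(adjust paths and configure options to your own setup):

cd openldap-2.4.40
wget -O its8046.patch \
 'http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=patch;h=2f1a2dd329b91afe561cd06b872d09630d4edb6a'
patch -p1 < its8046.patch
./configure && make depend && make && make install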



Re: ITS#8046 - remote unauth DoS on 2.4.40

2015-02-06 Thread Paul B. Henson
On Fri, Feb 06, 2015 at 02:09:47PM -0800, Xin Li wrote:

> Is there a CVE number for this one?

There's been a request:

http://www.openwall.com/lists/oss-security/2015/02/06/3

but I haven't seen one assigned.

I forgot to mention there's also a remote DoS in the deref overlay in
slapd 2.4.13 through 2.4.40; it slipped my mind as I don't use that overlay.



