Hello Howard
On 2016-02-21 17:48, Howard Chu wrote:
Bruncko Michal wrote:
Hello list
We use the ppolicy overlay to enforce the password lifecycle. We recently
ran into the following issue, and I am now looking for countermeasures to
minimize the risk of it recurring.
We use an OpenLDAP server for user authentication. It stores the objects
of real users as well as system users (for daemons and so on). We run a
redundant setup with two OpenLDAP servers in mirror mode (multi-master).
- A few days ago I found that I could not log into a service which uses
  this LDAP as its authentication backend.
- I found that BOTH OpenLDAP servers were down (the process simply wasn't
  listening).
- I checked the LDAP database partition (a dedicated partition holding
  both the DB and the BDB transaction log files) and it was full on both
  servers.
- The cause was a number of BDB log files created within the last few
  minutes before the daemon went down.
- According to the slapd logs, one system user (used by Nagios) had an
  expired password; I had forgotten to disable password expiration for it.
- It seems that the failed authentication attempts caused the transaction
  logs to fill the partition: for each failed bind, a new "pwdFailureTime"
  value was added to the object, which is an ordinary LDAP modify
  operation and is therefore written to the transaction log.
- Because that system user was used by Nagios for various purposes and
  its LDAP BIND rate was really high, it effectively acted as a DoS that
  killed my LDAP servers by exhausting the partition space.
Obviously I have fixed the policy for that system user so that its
password never expires. But this DoS can basically be reproduced by any
real user from outside, effectively killing these LDAP servers.
Redundancy with multiple servers provides no benefit, because the
pwdFailureTime modification is propagated to every server in the cluster,
with the same effect on disk space. Expanding the partition will not help
either; it only prolongs service availability in proportion to the
allocated space. The BDB log consumption was really huge: 15 log files
(10 MB each) were created within just two minutes!!
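
The figures above give a rough growth rate for the transaction logs. As a minimal sketch of that arithmetic: the 15 files, 10 MB size and two-minute window come from the report, while the 10 GiB of free space is a purely hypothetical value chosen for illustration.

```python
# Figures taken from the report above.
LOG_FILE_MB = 10       # size of one BDB log file
FILES_OBSERVED = 15    # log files created during the window
WINDOW_MINUTES = 2     # length of the observation window


def growth_mb_per_minute(files, file_mb, minutes):
    """Average transaction-log growth in MB per minute."""
    return files * file_mb / minutes


def minutes_until_full(free_mb, rate_mb_per_min):
    """Time until the partition fills at a constant growth rate."""
    return free_mb / rate_mb_per_min


rate = growth_mb_per_minute(FILES_OBSERVED, LOG_FILE_MB, WINDOW_MINUTES)

# Hypothetical partition with 10 GiB free (assumption, not from the report).
eta = minutes_until_full(10 * 1024, rate)

print(f"log growth: {rate:.0f} MB/min, partition full in about {eta:.0f} min")
```

At 75 MB per minute, even the assumed 10 GiB of free space would last only a bit over two hours, which is why a larger partition only postpones the outage.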
Now the question: has anybody considered this "effect" of using the
"pwdFailureTime" attribute? If so, what can I do to prevent it? Or how do
you deal with this kind of potential issue? On the one hand, it is nice to
see some history of failed attempts. Also, limiting pwdFailureTime to some
maximum number of values will not help, as the LDAP modify operation has
to be done anyway. For me the only useful option is to NOT use the
pwdFailureTime attribute at all, but how? I haven't found any way to
disable it.
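
The point about capping the number of values can be illustrated with a toy model. This is only a sketch under the assumption stated above (one modify per failed bind); `simulate_failed_binds` and `max_values` are hypothetical names, not part of slapd or ppolicy.

```python
from collections import deque


def simulate_failed_binds(n_binds, max_values=None):
    """Toy model: every failed bind modifies the entry once.

    max_values is a hypothetical cap on stored pwdFailureTime values.
    Returns (number_of_writes, number_of_values_kept).
    """
    values = deque(maxlen=max_values)  # maxlen=None means unbounded
    writes = 0
    for timestamp in range(n_binds):
        values.append(timestamp)  # oldest value is dropped once the cap is hit
        writes += 1               # the modify is still performed and logged
    return writes, len(values)


uncapped = simulate_failed_binds(1000)
capped = simulate_failed_binds(1000, max_values=5)
print(uncapped, capped)
```

In this model the cap reduces how many values are stored on the entry, but the number of modify operations, and so the number of transaction-log writes, is the same in both cases.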
This is ITS#8327. The fix is released in 2.4.44.
You should upgrade.
You should not be using any BerkeleyDB-based backends, use back-mdb
which does not need transaction log files.
Many thanks for this. It is a bit odd that even the latest CentOS 7
(which we wanted to use for the upgrade) ships an old version, so the
only option would be to build from source.
Is there any option to stop using the pwdFailureTime attribute? If I set
a global ACL rule like this:
access to attrs=pwdFailureTime
    by * none
...will it work? I assume not, as the ppolicy overlay is not represented
by any DN during the modification.
thanks
michal
openldap-2.4.40/Centos6
many thanks for help
michal