One of the major concerns I still have with password policy is the issue of
the overhead involved in maintaining so many policy state variables for
authentication failure / lockout tracking. It turns what would otherwise be
pure read operations into writes, which is already troublesome for some cases.
But in the context of replication, the problem can be multiplied by the number
of replicas in use. Avoiding this write magnification effect is one of the
reasons the initial versions of the ppolicy overlay explicitly prevented its
state updates from being replicated. Replicating these state updates for every
authentication request simply won't scale.
Unfortunately the braindead account lockout policy really doesn't work well
without this sort of state information.
The problem is not much different from the scaling issues we have to deal with
in making code run well on multiprocessor / multicore machines. Having
developed effective solutions to those problems, we ought to be able to apply
the same thinking to this as well.
The key to excellent scaling is the so-called "shared-nothing" approach, where
every processor just uses its own local resources and never has to synchronize
with ( == wait for) any other processor. For the most part that's a design
ideal, not something you can achieve perfectly in practice. However, we have
some recent examples in the slapd code where we've been able to use this
approach to good effect.
In the connection manager, we used to keep the monitoring/counter information
(number of ops, type of ops, etc.) in a single set of global counters, which
required a lot of locking overhead to update. We now use an array of counters
per thread, and each thread can update its own counters for free, completely
eliminating the locking overhead. The trick is in recognizing that this type
of info is written far more often than it is read, so optimizing the update
case is far more important than optimizing the query case. When someone reads
the counters exposed in back-monitor, we simply iterate across the per-thread
arrays and tally them up at that point. Since there's no particular
requirement that all the counters be read at the same instant, all of these
reads and updates can be performed without locking, so again we get it for
free, with no synchronization overhead at all.
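In skeleton form, the pattern looks roughly like this (names and sizes are
illustrative, not the actual slapd code):

  #include <stdio.h>

  #define MAX_THREADS 16
  #define OP_TYPES     8    /* e.g. bind, search, modify, ... */

  /* One counter block per thread. The padding is a crude way to keep each
   * thread's counters on its own cache line so updates don't interfere. */
  struct op_counters {
      unsigned long count[OP_TYPES];
      char pad[64];
  };

  static struct op_counters counters[MAX_THREADS];

  /* Writer path: a worker thread touches only its own slot, no lock needed. */
  static void count_op(int thread_id, int op_type)
  {
      counters[thread_id].count[op_type]++;
  }

  /* Reader path (e.g. back-monitor): iterate across the per-thread arrays
   * and tally them up. The result isn't an instantaneous snapshot, but it
   * doesn't need to be. */
  static unsigned long total_ops(int op_type)
  {
      unsigned long total = 0;
      for (int i = 0; i < MAX_THREADS; i++)
          total += counters[i].count[op_type];
      return total;
  }

  int main(void)
  {
      count_op(0, 1);   /* pretend thread 0 handled a search */
      count_op(3, 1);   /* pretend thread 3 handled a search */
      printf("searches: %lu\n", total_ops(1));
      return 0;
  }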
So, it should now be obvious where we should go with the replication issue...
Ideally, you want password policy enforcement rules that don't need global
state at all. IMO, the best approach is still to keep policy state private to
each DSA, and this makes particular sense for DSAs that are topologically
remote. E.g., assume you have a pair of servers in two separate cities. It's
unlikely that a login attempt on one server will be in any way connected to a
simultaneous login attempt on the other server. And in the face of a bot
attack, the rate of logins will probably be high enough to swamp the channel
between the two servers, resulting in queueing delays that coalesce several
updates on the attacked server into just a single update on the remote server
(i.e., N separate failure updates on one server arrive as a single update on
the other).
Therefore, most of the time it's pointless for each server to try to
immediately update the other with login failure info.
In the case of a local, load-balanced cluster of replicas, where the network
latency between DSAs is very low, that natural coalescing of updates may not
occur as often. Still, it would be better if the updates didn't happen at all.
Even in such an environment, distributing reads is still cheaper than
distributing writes. So the correct way to implement this global state is to
keep it partitioned per DSA during writes, and collect it only during reads.
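In rough C-ish form, the shape of that is something like the following (all
names here are hypothetical, not part of the actual ppolicy overlay; a real
implementation would key the counters by entry DN rather than keep a single
global one):

  #include <stddef.h>

  /* A peer DSA and a (stubbed) way to ask it for its local failure count. */
  struct dsa_peer {
      const char *url;                          /* e.g. a replica's LDAP URL */
      int (*get_fail_count)(const char *dn);    /* remote read, stubbed here */
  };

  /* This DSA's own failure count. */
  static int local_failures;

  /* Write path: an authentication failure only touches local state; nothing
   * is replicated. */
  static void record_failure(const char *dn)
  {
      (void)dn;
      local_failures++;
  }

  /* Read path: only when the global view is actually needed do we pay the
   * cost of asking the peers. */
  static int global_failures(const char *dn, const struct dsa_peer *peers,
                             size_t npeers)
  {
      int total = local_failures;
      for (size_t i = 0; i < npeers; i++)
          total += peers[i].get_fail_count(dn);
      return total;
  }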
I'm looking for a way to express this in the schema and in the ppolicy draft,
but I'm not sure how just yet. It strikes me that X.500 probably already has a
type of distributed/collective attribute but I haven't looked yet.
Also I think we can take this a step further, but haven't thought it through
all the way yet. If you typically have login failures coming from a single
client, it should be sufficient to always route that client's requests to the
same DSA, and have all of its failure tracking done locally/privately on that DSA.
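As a toy illustration of that kind of client affinity (purely hypothetical; in
practice this is the load balancer's or proxy's job, not slapd's):

  /* Hash the client address to pick a DSA, so the same client always lands
   * on the same server and its failure tracking stays purely local there. */
  static unsigned pick_dsa(const char *client_addr, unsigned ndsas)
  {
      unsigned h = 5381;
      for (const char *p = client_addr; *p; p++)
          h = h * 33 + (unsigned char)*p;   /* djb2-style string hash */
      return h % ndsas;
  }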
At the other end, if you have an attack mounted by a number of separate
machines, it's not clear that you must necessarily collect the state from
every DSA on every authentication request. E.g., if you're setting a lockout
based on the number of login failures, once the failure counter on a single
DSA reaches the lockout threshold, it doesn't matter any more what the failure
counter is on any other DSA, so the DSA that hit the threshold no longer needs
to consult any other node.
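Continuing with the hypothetical helpers from the earlier sketch, the check
could short-circuit like this:

  /* Local short-circuit: if this DSA's own counter already meets the
   * threshold, the lockout decision is final and no remote lookups are
   * needed. */
  static int is_locked_out(const char *dn, int lockout_threshold,
                           const struct dsa_peer *peers, size_t npeers)
  {
      if (local_failures >= lockout_threshold)
          return 1;      /* cheap, purely local decision */

      /* Only when the local count alone can't decide do we consult the
       * other DSAs. */
      return global_failures(dn, peers, npeers) >= lockout_threshold;
  }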
If a client comes along and does a search to retrieve the policy state, e.g.
looking for the last successful login or the last failure, then you want
whatever DSA receives the request to broadcast the search to all the other
DSAs and collate the results for the client by default. (Note that simple
aggregation only works for multivalued attributes; for single-valued
attributes like pwdLastSuccess you have to know to pick the most recent
value.) And you should probably be able to specify a control (akin to
ManageDsaIT) to disable this automatic broadcast and retrieve the value from
only a single DSA.
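For the collation step, here is a rough sketch of how a single-valued
attribute like pwdLastSuccess might be reduced across DSAs (assuming
generalizedTime values in a uniform format, so lexicographic comparison works;
the structure names are made up for illustration):

  #include <string.h>

  /* Per-DSA slice of the policy state returned by the broadcast search. */
  struct dsa_policy_state {
      const char  *last_success;    /* pwdLastSuccess value, or NULL */
      const char **failure_times;   /* pwdFailureTime values */
      size_t       nfailures;
  };

  /* Multivalued pwdFailureTime values can simply be concatenated; a
   * single-valued attribute like pwdLastSuccess has to be reduced by
   * picking the most recent value. */
  static const char *collate_last_success(const struct dsa_policy_state *st,
                                          size_t n)
  {
      const char *best = NULL;
      for (size_t i = 0; i < n; i++) {
          if (st[i].last_success &&
              (best == NULL || strcmp(st[i].last_success, best) > 0))
              best = st[i].last_success;
      }
      return best;
  }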
I realize that the points listed above about login attacks miss several attack
scenarios. I think more of the scenarios need to be outlined and analyzed
before moving forward with any recommendations on lockout behavior; the
internet today is pretty different from when these lockout mechanisms were
first designed.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/