[
https://issues.apache.org/jira/browse/OAK-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581217#comment-13581217
]
Jukka Zitting commented on OAK-634:
-----------------------------------
bq. regarding your patch: do you have any figures on how that behaves in case
of millions of users and what
would be the possible impact of the synchronized map?
Not yet, it's just a rough first take on this.
That said, the map as currently implemented costs a few hundred bytes per user
who has successfully logged in since the repository started. Thus for a system
with a million users that *all* log in, the map could grow to a few hundred
megabytes.
The synchronization cost should be negligible, since only the map itself is
synchronized (not the {{isSame}} calls) and the map accesses are much faster
than the hash calculations surrounding them.
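For illustration, here's a rough sketch of the kind of caching described above
(names and structure are illustrative, not the exact code in the attached patch;
{{PasswordUtility}} is the class discussed in this issue, import omitted):
{noformat}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class CachedPasswordCheck {

    // one entry per user who has logged in successfully: user id plus a
    // 32-byte SHA-256 value, i.e. a few hundred bytes per user
    private final Map<String, byte[]> lightHashes =
            Collections.synchronizedMap(new HashMap<String, byte[]>());

    boolean checkPassword(String userId, String plainPassword, String storedHash)
            throws NoSuchAlgorithmException {
        // cheap single-iteration digest, computed outside any lock
        byte[] light = MessageDigest.getInstance("SHA-256")
                .digest(plainPassword.getBytes(StandardCharsets.UTF_8));
        byte[] cached = lightHashes.get(userId);          // synchronized map access
        if (cached != null && MessageDigest.isEqual(cached, light)) {
            return true;                                   // fast path for repeat logins
        }
        if (PasswordUtility.isSame(storedHash, plainPassword)) { // full 1000-iteration check
            lightHashes.put(userId, light);                // synchronized map access
            return true;
        }
        return false;
    }
}
{noformat}
The fast path never touches the full iteration count, so only the first login
per user after a restart pays the 1000-iteration cost.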
> PasswordUtility.isSame() performance bottleneck
> -----------------------------------------------
>
> Key: OAK-634
> URL: https://issues.apache.org/jira/browse/OAK-634
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core
> Reporter: Jukka Zitting
> Labels: performance
> Attachments: jackrabbit.patch, oak.patch
>
>
> The default 1000 SHA-256 iterations used for password hashes are seriously
> impacting the performance of login() calls. Here's a performance report of
> the number of milliseconds that a successful login takes with Jackrabbit 2.x
> and Oak (with an in-memory MK):
> {noformat}
> # Login        min   10%   50%   90%   max
> Jackrabbit     560   570   577   704  1522
> Oak-Memory    2537  2586  2630  2811  2916
> {noformat}
> Over 50% of that time is spent doing hash iterations in
> {{PasswordUtility.isSame()}}. This is a problem for two main reasons:
> # It severely drags down the performance of acquiring a new session, something
> that should be essentially free.
> # It opens a denial-of-service attack vector: simply bombarding a system with
> login attempts causes CPU usage to spike.
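> To make the cost concrete, the check boils down to a digest loop along these
> lines (a sketch of iterated SHA-256 hashing as described here, not the exact
> {{PasswordUtility}} code):
> {noformat}
> import java.nio.charset.StandardCharsets;
> import java.security.MessageDigest;
> import java.security.NoSuchAlgorithmException;
>
> // Digest the salted password once, then re-digest the result for the
> // remaining iterations (1000 by default). Every login attempt pays this cost.
> static byte[] iteratedHash(String saltedPassword, int iterations)
>         throws NoSuchAlgorithmException {
>     MessageDigest md = MessageDigest.getInstance("SHA-256");
>     byte[] data = md.digest(saltedPassword.getBytes(StandardCharsets.UTF_8));
>     for (int i = 1; i < iterations; i++) {
>         data = md.digest(data);
>     }
>     return data;
> }
> {noformat}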
> Iterating a password hash is a good idea for preventing offline attacks
> against a stolen password database (though instead of SHA-256 we should be
> using something like bcrypt that's explicitly designed and analyzed for this
> purpose). However, the current implementation doesn't make much sense in a
> scenario like ours, where we can expect dozens or hundreds of logins per
> second even in normal non-peak use cases. Password iteration makes more sense
> in use cases where logins are infrequent (e.g. once a day per user) and the
> authenticated session is persisted through something like a session key.
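> For comparison, a bcrypt-based check keeps a tunable work factor; with a
> library like jBCrypt (named here only as an example, it is not part of the
> attached patches) it would look roughly like this:
> {noformat}
> import org.mindrot.jbcrypt.BCrypt;
>
> // work factor 10 = 2^10 key expansion rounds; tune as needed
> String stored = BCrypt.hashpw(plainPassword, BCrypt.gensalt(10)); // on password change
> boolean ok = BCrypt.checkpw(plainPassword, stored);               // on login
> {noformat}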
> So, assuming we want to keep the cost of an offline attack high, here's what
> I suggest we do for password-based logins:
> * Switch to bcrypt or a similar password hashing algorithm if possible.
> * For each active user in the system, keep an in-memory record to speed up
> login calls.
> ** On a successful login the record should be updated to contain a password
> hash with just one iteration (calculated from the plain text password
> provided in the successful login). Use this instead of the in-repository
> password hash for authenticating further login attempts.
> ** The record should also keep track of unsuccessful login attempts and limit
> them to at most N attempts per minute to prevent DOS attacks.
> The result of such in-memory record keeping should be to massively speed up
> normal logins (point 1 above) and to cap the CPU cost of the potential DoS
> attack (point 2) at O(N*K) expensive hash computations per minute, with K
> being the total number of users in the system.
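> To illustrate, here is a minimal sketch of such a per-user record; the names
> and the concrete value of N are hypothetical, not part of the attached patches:
> {noformat}
> class LoginRecord {
>     static final int MAX_FAILURES_PER_MINUTE = 5;  // "N" in the text above
>
>     private byte[] lightHash;     // single-iteration hash, set on successful login
>     private long windowStart;     // start of the current one-minute window
>     private int failures;         // failed attempts in the current window
>
>     synchronized boolean attemptsExhausted(long now) {
>         if (now - windowStart > 60000L) {           // new minute: reset the counter
>             windowStart = now;
>             failures = 0;
>         }
>         return failures >= MAX_FAILURES_PER_MINUTE;
>     }
>
>     synchronized void recordFailure() {
>         failures++;
>     }
>
>     synchronized void recordSuccess(byte[] singleIterationHash) {
>         lightHash = singleIterationHash;
>         failures = 0;
>     }
>
>     synchronized byte[] getLightHash() {
>         return lightHash;
>     }
> }
> {noformat}
> A login attempt would first check {{attemptsExhausted()}} and reject early,
> then compare against the cached single-iteration hash, and fall back to the
> full in-repository hash only when that cache entry is missing or does not
> match.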