Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-22 Thread bricklen
On Fri, Mar 21, 2014 at 5:59 PM, Erik van Zijst erik.van.zi...@gmail.comwrote:

 Hi there,

 I've got a relatively simple query that contains expensive BCRYPT
 functions that gets optimized in a way that causes postgres to compute
 more bcrypt hashes than necessary, thereby dramatically slowing things
 down.

 In a certain part of our application we need to lookup users by their
 username, email address and password. Now we don't store plaintext
 passwords and so the query needs to compute bcrypt hashes on the fly:

 SELECT DISTINCT u.*
 FROM auth_user u
 JOIN bb_userprofile p ON p.user_id = u.id
 JOIN bb_identity i ON i.profile_id = p.id
 WHERE
 (
   (
 u.username ILIKE 'detkin'
 OR
 i.email ILIKE 'f...@example.com'
   )
   AND
   (
 SUBSTRING(password FROM 8) = CRYPT(
   'detkin', SUBSTRING(password FROM 8))
   )
 )

 These queries are generated by a parser that translates from an
 external query language to SQL run on the database. This test db
 contains 12 user records.

 With a single bcrypt hash taking ~300ms to compute, this is a recipe
 for disaster and so the app only allows queries that require only a
 very small number of bcrypt computation.

 E.g. the user must always AND the password lookup with a clause like
  username = 'foo' AND password = 'bar' (username is unique).

 However, while the query above technically only needs to compute 1
 hash (there is a user 'detkin' and email 'f...@example.com' does not
 exist), it instead creates a query plan that computes hashes *before*
 filtering on username and email, leading to 12 hash computations and a
 very slow query.

 The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE

 The schemas for the 3 tables involved are here:
 http://pgsql.privatepaste.com/f72020ad0a

 As a quick experiment I tried moving the joins and email lookup into a
 nested IN query, but that still generates a plan that computes hashes
 for all 12 users, before picking out the 1 whose username matches.

 Is there any way I can get postgres to perform the hash calculations
 on the *result* of the other parts of the where clause, instead of the
 other way around? Or else rewrite the query?

 Cheers,
 Erik


(untested), but how about something like the following:

WITH au AS (
SELECT DISTINCT u.*
FROM auth_user u
JOIN bb_userprofile p ON p.user_id = u.id
JOIN bb_identity i ON i.profile_id = p.id
WHERE u.username ILIKE 'detkin'
OR i.email ILIKE 'f...@example.com')

SELECT au.*
FROM au
WHERE SUBSTRING(au.password FROM 8) = CRYPT('detkin', SUBSTRING(au.password
FROM 8));


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-22 Thread Erik van Zijst
Yes, that works (it does at least on my small test database).

However, these queries are generated by a parser that translates
complex parse trees from a higher level DSL that doesn't lend itself
well to logically isolating the crypt checks from the remaining
conditions, as password checks might be present at arbitrary depths.

For example:

(
  active eq true
  AND
  (
password eq foo
OR
password eq bar
  )
)
AND
(
  username eq erik
  OR
  email contains bar
)

Currently the SQL generator translates each AST node into individual
predicates that straightforwardly concatenate into a single SQL WHERE
clause. For this to work, the individual nodes should compose well. I
don't immediately see how the above query could be automatically
translated into SQL when taking the WITH-AS approach.

I could nonetheless take a stab at it, but life would certainly be
easier if I could translate each component independently and leave
optimization to the query planner.

Cheers,
Erik


On Sat, Mar 22, 2014 at 1:51 PM, bricklen brick...@gmail.com wrote:

 On Fri, Mar 21, 2014 at 5:59 PM, Erik van Zijst erik.van.zi...@gmail.com
 wrote:

 Hi there,

 I've got a relatively simple query that contains expensive BCRYPT
 functions that gets optimized in a way that causes postgres to compute
 more bcrypt hashes than necessary, thereby dramatically slowing things
 down.

 In a certain part of our application we need to lookup users by their
 username, email address and password. Now we don't store plaintext
 passwords and so the query needs to compute bcrypt hashes on the fly:

 SELECT DISTINCT u.*
 FROM auth_user u
 JOIN bb_userprofile p ON p.user_id = u.id
 JOIN bb_identity i ON i.profile_id = p.id
 WHERE
 (
   (
 u.username ILIKE 'detkin'
 OR
 i.email ILIKE 'f...@example.com'
   )
   AND
   (
 SUBSTRING(password FROM 8) = CRYPT(
   'detkin', SUBSTRING(password FROM 8))
   )
 )

 These queries are generated by a parser that translates from an
 external query language to SQL run on the database. This test db
 contains 12 user records.

 With a single bcrypt hash taking ~300ms to compute, this is a recipe
 for disaster and so the app only allows queries that require only a
 very small number of bcrypt computation.

 E.g. the user must always AND the password lookup with a clause like
  username = 'foo' AND password = 'bar' (username is unique).

 However, while the query above technically only needs to compute 1
 hash (there is a user 'detkin' and email 'f...@example.com' does not
 exist), it instead creates a query plan that computes hashes *before*
 filtering on username and email, leading to 12 hash computations and a
 very slow query.

 The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE

 The schemas for the 3 tables involved are here:
 http://pgsql.privatepaste.com/f72020ad0a

 As a quick experiment I tried moving the joins and email lookup into a
 nested IN query, but that still generates a plan that computes hashes
 for all 12 users, before picking out the 1 whose username matches.

 Is there any way I can get postgres to perform the hash calculations
 on the *result* of the other parts of the where clause, instead of the
 other way around? Or else rewrite the query?

 Cheers,
 Erik


 (untested), but how about something like the following:

 WITH au AS (

 SELECT DISTINCT u.*
 FROM auth_user u
 JOIN bb_userprofile p ON p.user_id = u.id
 JOIN bb_identity i ON i.profile_id = p.id
 WHERE u.username ILIKE 'detkin'

 OR i.email ILIKE 'f...@example.com')

 SELECT au.*
 FROM au
 WHERE SUBSTRING(au.password FROM 8) = CRYPT('detkin', SUBSTRING(au.password
 FROM 8));



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-22 Thread bricklen
On Sat, Mar 22, 2014 at 3:27 PM, Erik van Zijst erik.van.zi...@gmail.comwrote:

 Yes, that works (it does at least on my small test database).

 However, these queries are generated by a parser that translates
 complex parse trees from a higher level DSL that doesn't lend itself
 well to logically isolating the crypt checks from the remaining
 conditions, as password checks might be present at arbitrary depths.

 For example:

 (
   active eq true
   AND
   (
 password eq foo
 OR
 password eq bar
   )
 )
 AND
 (
   username eq erik
   OR
   email contains bar
 )

 Currently the SQL generator translates each AST node into individual
 predicates that straightforwardly concatenate into a single SQL WHERE
 clause. For this to work, the individual nodes should compose well. I
 don't immediately see how the above query could be automatically
 translated into SQL when taking the WITH-AS approach.

 I could nonetheless take a stab at it, but life would certainly be
 easier if I could translate each component independently and leave
 optimization to the query planner.



How about encapsulating the revised query inside a db function? That
simplifies the query for your query generator to something like select
x,y,z from your_func(p_user,p_email,p_crypt)


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-22 Thread Erik van Zijst
On Sat, Mar 22, 2014 at 3:56 PM, bricklen brick...@gmail.com wrote:
 On Sat, Mar 22, 2014 at 3:27 PM, Erik van Zijst erik.van.zi...@gmail.com
 I could nonetheless take a stab at it, but life would certainly be
 easier if I could translate each component independently and leave
 optimization to the query planner.

 How about encapsulating the revised query inside a db function? That
 simplifies the query for your query generator to something like select
 x,y,z from your_func(p_user,p_email,p_crypt)

I'm not really sure I understand how a db function would make things
easier. What would the implementation for your_func() be and what
would the SQL look like for the DSL example which contains multiple
password checks?

Cheers,
Erik


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-22 Thread bricklen
On Sat, Mar 22, 2014 at 8:37 PM, Erik van Zijst erik.van.zi...@gmail.comwrote:

 On Sat, Mar 22, 2014 at 3:56 PM, bricklen brick...@gmail.com wrote:
  On Sat, Mar 22, 2014 at 3:27 PM, Erik van Zijst 
 erik.van.zi...@gmail.com
  I could nonetheless take a stab at it, but life would certainly be
  easier if I could translate each component independently and leave
  optimization to the query planner.
 
  How about encapsulating the revised query inside a db function? That
  simplifies the query for your query generator to something like select
  x,y,z from your_func(p_user,p_email,p_crypt)

 I'm not really sure I understand how a db function would make things
 easier. What would the implementation for your_func() be and what
 would the SQL look like for the DSL example which contains multiple
 password checks?


I just reread your previous post about the checks being at potentially
arbitrary depths. In that case, the function may or may not help. Without a
representative database to test with I can't say one way or the other.
Perhaps someone else will have some other ideas of what could be useful
here.