Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-24 Thread Heikki Linnakangas

On 03/22/2014 02:59 AM, Erik van Zijst wrote:

Is there any way I can get postgres to perform the hash calculations
on the *result* of the other parts of the where clause, instead of the
other way around? Or else rewrite the query?


The planner doesn't know that the crypt function is expensive. That can 
be fixed with ALTER FUNCTION crypt(text, text) COST high value. Even 
with that, I'm not sure if the planner is smart enough to optimize the 
query the way you'd want, but it's worth a try.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-24 Thread Erik van Zijst
On Mon, Mar 24, 2014 at 12:08 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 03/22/2014 02:59 AM, Erik van Zijst wrote:

 Is there any way I can get postgres to perform the hash calculations
 on the *result* of the other parts of the where clause, instead of the
 other way around? Or else rewrite the query?


 The planner doesn't know that the crypt function is expensive. That can be
 fixed with ALTER FUNCTION crypt(text, text) COST high value. Even with
 that, I'm not sure if the planner is smart enough to optimize the query the
 way you'd want, but it's worth a try.

Thanks. That's the kind of hint I was looking for.

I just gave it a go, but unfortunately it's not enough to get the
optimizer to change the plan, regardless of the cost value.

Cheers,
Erik


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-23 Thread Tom Lane
bricklen brick...@gmail.com writes:
 Perhaps someone else will have some other ideas of what could be useful
 here.

Maybe I'm missing something ... but isn't the OP's query completely bogus?

SELECT DISTINCT u.*
FROM auth_user u
JOIN bb_userprofile p ON p.user_id = u.id
JOIN bb_identity i ON i.profile_id = p.id
WHERE
(
  (
u.username ILIKE 'detkin'
OR
i.email ILIKE 'foo(at)example(dot)com'
  )
  AND
  (
SUBSTRING(password FROM 8) = CRYPT(
  'detkin', SUBSTRING(password FROM 8))
  )
)

Granting that there are not chance collisions of password hashes (which
would surely be a bad thing if there were), success of the second AND arm
means that we are on user detkin's row of auth_user.  Therefore the OR
business is entirely nonfunctional: if the password test passes, then
the u.username ILIKE 'detkin' clause succeeds a fortiori, while if the
password test fails, it hardly matters what i.email is, because the WHERE
clause as a whole fails.  Ergo, the whole WHERE clause might as well just
be written u.username = 'detkin'.  If it were a RIGHT JOIN rather than
just a JOIN, this argument would fail ... but as written, the query
makes little sense.

I'll pass gently over the question of whether the password test as shown
could ever succeed at all.

I suppose we've been shown a lobotomized version of the real logic,
but it's hard to give advice in such situations.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-23 Thread Erik van Zijst
On Sat, Mar 22, 2014 at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Maybe I'm missing something ... but isn't the OP's query completely bogus?

 SELECT DISTINCT u.*
 FROM auth_user u
 JOIN bb_userprofile p ON p.user_id = u.id
 JOIN bb_identity i ON i.profile_id = p.id
 WHERE
 (
   (
 u.username ILIKE 'detkin'
 OR
 i.email ILIKE 'foo(at)example(dot)com'
   )
   AND
   (
 SUBSTRING(password FROM 8) = CRYPT(
   'detkin', SUBSTRING(password FROM 8))
   )
 )

 Granting that there are not chance collisions of password hashes (which
 would surely be a bad thing if there were),

Would it?

Any hashing system is inherently open to collision (although you're
more likely to find 2 identical snowflakes), but how does that affect
our situation? It means you simply would have found another password
for that user that is just as valid. The system will accept it.

 success of the second AND arm
 means that we are on user detkin's row of auth_user.

My password could be 'detkin' too, but my username is 'erik'.

 Therefore the OR
 business is entirely nonfunctional: if the password test passes, then
 the u.username ILIKE 'detkin' clause succeeds a fortiori, while if the
 password test fails, it hardly matters what i.email is, because the WHERE
 clause as a whole fails.

My email could be 'f...@example.com', my username 'erik' and my
password 'detkin'.

Users are identified through their unique username or email address.
Passwords are not unique.

 I suppose we've been shown a lobotomized version of the real logic,
 but it's hard to give advice in such situations.

This is an actual query taken from the system.

Cheers,
Erik


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-22 Thread bricklen
On Fri, Mar 21, 2014 at 5:59 PM, Erik van Zijst erik.van.zi...@gmail.comwrote:

 Hi there,

 I've got a relatively simple query that contains expensive BCRYPT
 functions that gets optimized in a way that causes postgres to compute
 more bcrypt hashes than necessary, thereby dramatically slowing things
 down.

 In a certain part of our application we need to lookup users by their
 username, email address and password. Now we don't store plaintext
 passwords and so the query needs to compute bcrypt hashes on the fly:

 SELECT DISTINCT u.*
 FROM auth_user u
 JOIN bb_userprofile p ON p.user_id = u.id
 JOIN bb_identity i ON i.profile_id = p.id
 WHERE
 (
   (
 u.username ILIKE 'detkin'
 OR
 i.email ILIKE 'f...@example.com'
   )
   AND
   (
 SUBSTRING(password FROM 8) = CRYPT(
   'detkin', SUBSTRING(password FROM 8))
   )
 )

 These queries are generated by a parser that translates from an
 external query language to SQL run on the database. This test db
 contains 12 user records.

 With a single bcrypt hash taking ~300ms to compute, this is a recipe
 for disaster and so the app only allows queries that require only a
 very small number of bcrypt computation.

 E.g. the user must always AND the password lookup with a clause like
  username = 'foo' AND password = 'bar' (username is unique).

 However, while the query above technically only needs to compute 1
 hash (there is a user 'detkin' and email 'f...@example.com' does not
 exist), it instead creates a query plan that computes hashes *before*
 filtering on username and email, leading to 12 hash computations and a
 very slow query.

 The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE

 The schemas for the 3 tables involved are here:
 http://pgsql.privatepaste.com/f72020ad0a

 As a quick experiment I tried moving the joins and email lookup into a
 nested IN query, but that still generates a plan that computes hashes
 for all 12 users, before picking out the 1 whose username matches.

 Is there any way I can get postgres to perform the hash calculations
 on the *result* of the other parts of the where clause, instead of the
 other way around? Or else rewrite the query?

 Cheers,
 Erik


(untested), but how about something like the following:

WITH au AS (
SELECT DISTINCT u.*
FROM auth_user u
JOIN bb_userprofile p ON p.user_id = u.id
JOIN bb_identity i ON i.profile_id = p.id
WHERE u.username ILIKE 'detkin'
OR i.email ILIKE 'f...@example.com')

SELECT au.*
FROM au
WHERE SUBSTRING(au.password FROM 8) = CRYPT('detkin', SUBSTRING(au.password
FROM 8));


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-22 Thread Erik van Zijst
Yes, that works (it does at least on my small test database).

However, these queries are generated by a parser that translates
complex parse trees from a higher level DSL that doesn't lend itself
well to logically isolating the crypt checks from the remaining
conditions, as password checks might be present at arbitrary depths.

For example:

(
  active eq true
  AND
  (
password eq foo
OR
password eq bar
  )
)
AND
(
  username eq erik
  OR
  email contains bar
)

Currently the SQL generator translates each AST node into individual
predicates that straightforwardly concatenate into a single SQL WHERE
clause. For this to work, the individual nodes should compose well. I
don't immediately see how the above query could be automatically
translated into SQL when taking the WITH-AS approach.

I could nonetheless take a stab at it, but life would certainly be
easier if I could translate each component independently and leave
optimization to the query planner.

Cheers,
Erik


On Sat, Mar 22, 2014 at 1:51 PM, bricklen brick...@gmail.com wrote:

 On Fri, Mar 21, 2014 at 5:59 PM, Erik van Zijst erik.van.zi...@gmail.com
 wrote:

 Hi there,

 I've got a relatively simple query that contains expensive BCRYPT
 functions that gets optimized in a way that causes postgres to compute
 more bcrypt hashes than necessary, thereby dramatically slowing things
 down.

 In a certain part of our application we need to lookup users by their
 username, email address and password. Now we don't store plaintext
 passwords and so the query needs to compute bcrypt hashes on the fly:

 SELECT DISTINCT u.*
 FROM auth_user u
 JOIN bb_userprofile p ON p.user_id = u.id
 JOIN bb_identity i ON i.profile_id = p.id
 WHERE
 (
   (
 u.username ILIKE 'detkin'
 OR
 i.email ILIKE 'f...@example.com'
   )
   AND
   (
 SUBSTRING(password FROM 8) = CRYPT(
   'detkin', SUBSTRING(password FROM 8))
   )
 )

 These queries are generated by a parser that translates from an
 external query language to SQL run on the database. This test db
 contains 12 user records.

 With a single bcrypt hash taking ~300ms to compute, this is a recipe
 for disaster and so the app only allows queries that require only a
 very small number of bcrypt computation.

 E.g. the user must always AND the password lookup with a clause like
  username = 'foo' AND password = 'bar' (username is unique).

 However, while the query above technically only needs to compute 1
 hash (there is a user 'detkin' and email 'f...@example.com' does not
 exist), it instead creates a query plan that computes hashes *before*
 filtering on username and email, leading to 12 hash computations and a
 very slow query.

 The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE

 The schemas for the 3 tables involved are here:
 http://pgsql.privatepaste.com/f72020ad0a

 As a quick experiment I tried moving the joins and email lookup into a
 nested IN query, but that still generates a plan that computes hashes
 for all 12 users, before picking out the 1 whose username matches.

 Is there any way I can get postgres to perform the hash calculations
 on the *result* of the other parts of the where clause, instead of the
 other way around? Or else rewrite the query?

 Cheers,
 Erik


 (untested), but how about something like the following:

 WITH au AS (

 SELECT DISTINCT u.*
 FROM auth_user u
 JOIN bb_userprofile p ON p.user_id = u.id
 JOIN bb_identity i ON i.profile_id = p.id
 WHERE u.username ILIKE 'detkin'

 OR i.email ILIKE 'f...@example.com')

 SELECT au.*
 FROM au
 WHERE SUBSTRING(au.password FROM 8) = CRYPT('detkin', SUBSTRING(au.password
 FROM 8));



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-22 Thread bricklen
On Sat, Mar 22, 2014 at 3:27 PM, Erik van Zijst erik.van.zi...@gmail.comwrote:

 Yes, that works (it does at least on my small test database).

 However, these queries are generated by a parser that translates
 complex parse trees from a higher level DSL that doesn't lend itself
 well to logically isolating the crypt checks from the remaining
 conditions, as password checks might be present at arbitrary depths.

 For example:

 (
   active eq true
   AND
   (
 password eq foo
 OR
 password eq bar
   )
 )
 AND
 (
   username eq erik
   OR
   email contains bar
 )

 Currently the SQL generator translates each AST node into individual
 predicates that straightforwardly concatenate into a single SQL WHERE
 clause. For this to work, the individual nodes should compose well. I
 don't immediately see how the above query could be automatically
 translated into SQL when taking the WITH-AS approach.

 I could nonetheless take a stab at it, but life would certainly be
 easier if I could translate each component independently and leave
 optimization to the query planner.



How about encapsulating the revised query inside a db function? That
simplifies the query for your query generator to something like select
x,y,z from your_func(p_user,p_email,p_crypt)


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-22 Thread Erik van Zijst
On Sat, Mar 22, 2014 at 3:56 PM, bricklen brick...@gmail.com wrote:
 On Sat, Mar 22, 2014 at 3:27 PM, Erik van Zijst erik.van.zi...@gmail.com
 I could nonetheless take a stab at it, but life would certainly be
 easier if I could translate each component independently and leave
 optimization to the query planner.

 How about encapsulating the revised query inside a db function? That
 simplifies the query for your query generator to something like select
 x,y,z from your_func(p_user,p_email,p_crypt)

I'm not really sure I understand how a db function would make things
easier. What would the implementation for your_func() be and what
would the SQL look like for the DSL example which contains multiple
password checks?

Cheers,
Erik


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-22 Thread bricklen
On Sat, Mar 22, 2014 at 8:37 PM, Erik van Zijst erik.van.zi...@gmail.comwrote:

 On Sat, Mar 22, 2014 at 3:56 PM, bricklen brick...@gmail.com wrote:
  On Sat, Mar 22, 2014 at 3:27 PM, Erik van Zijst 
 erik.van.zi...@gmail.com
  I could nonetheless take a stab at it, but life would certainly be
  easier if I could translate each component independently and leave
  optimization to the query planner.
 
  How about encapsulating the revised query inside a db function? That
  simplifies the query for your query generator to something like select
  x,y,z from your_func(p_user,p_email,p_crypt)

 I'm not really sure I understand how a db function would make things
 easier. What would the implementation for your_func() be and what
 would the SQL look like for the DSL example which contains multiple
 password checks?


I just reread your previous post about the checks being at potentially
arbitrary depths. In that case, the function may or may not help. Without a
representative database to test with I can't say one way or the other.
Perhaps someone else will have some other ideas of what could be useful
here.