Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions
On Fri, Mar 21, 2014 at 5:59 PM, Erik van Zijst erik.van.zi...@gmail.comwrote: Hi there, I've got a relatively simple query that contains expensive BCRYPT functions that gets optimized in a way that causes postgres to compute more bcrypt hashes than necessary, thereby dramatically slowing things down. In a certain part of our application we need to lookup users by their username, email address and password. Now we don't store plaintext passwords and so the query needs to compute bcrypt hashes on the fly: SELECT DISTINCT u.* FROM auth_user u JOIN bb_userprofile p ON p.user_id = u.id JOIN bb_identity i ON i.profile_id = p.id WHERE ( ( u.username ILIKE 'detkin' OR i.email ILIKE 'f...@example.com' ) AND ( SUBSTRING(password FROM 8) = CRYPT( 'detkin', SUBSTRING(password FROM 8)) ) ) These queries are generated by a parser that translates from an external query language to SQL run on the database. This test db contains 12 user records. With a single bcrypt hash taking ~300ms to compute, this is a recipe for disaster and so the app only allows queries that require only a very small number of bcrypt computation. E.g. the user must always AND the password lookup with a clause like username = 'foo' AND password = 'bar' (username is unique). However, while the query above technically only needs to compute 1 hash (there is a user 'detkin' and email 'f...@example.com' does not exist), it instead creates a query plan that computes hashes *before* filtering on username and email, leading to 12 hash computations and a very slow query. The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE The schemas for the 3 tables involved are here: http://pgsql.privatepaste.com/f72020ad0a As a quick experiment I tried moving the joins and email lookup into a nested IN query, but that still generates a plan that computes hashes for all 12 users, before picking out the 1 whose username matches. Is there any way I can get postgres to perform the hash calculations on the *result* of the other parts of the where clause, instead of the other way around? Or else rewrite the query? Cheers, Erik (untested), but how about something like the following: WITH au AS ( SELECT DISTINCT u.* FROM auth_user u JOIN bb_userprofile p ON p.user_id = u.id JOIN bb_identity i ON i.profile_id = p.id WHERE u.username ILIKE 'detkin' OR i.email ILIKE 'f...@example.com') SELECT au.* FROM au WHERE SUBSTRING(au.password FROM 8) = CRYPT('detkin', SUBSTRING(au.password FROM 8));
Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions
Yes, that works (it does at least on my small test database). However, these queries are generated by a parser that translates complex parse trees from a higher level DSL that doesn't lend itself well to logically isolating the crypt checks from the remaining conditions, as password checks might be present at arbitrary depths. For example: ( active eq true AND ( password eq foo OR password eq bar ) ) AND ( username eq erik OR email contains bar ) Currently the SQL generator translates each AST node into individual predicates that straightforwardly concatenate into a single SQL WHERE clause. For this to work, the individual nodes should compose well. I don't immediately see how the above query could be automatically translated into SQL when taking the WITH-AS approach. I could nonetheless take a stab at it, but life would certainly be easier if I could translate each component independently and leave optimization to the query planner. Cheers, Erik On Sat, Mar 22, 2014 at 1:51 PM, bricklen brick...@gmail.com wrote: On Fri, Mar 21, 2014 at 5:59 PM, Erik van Zijst erik.van.zi...@gmail.com wrote: Hi there, I've got a relatively simple query that contains expensive BCRYPT functions that gets optimized in a way that causes postgres to compute more bcrypt hashes than necessary, thereby dramatically slowing things down. In a certain part of our application we need to lookup users by their username, email address and password. Now we don't store plaintext passwords and so the query needs to compute bcrypt hashes on the fly: SELECT DISTINCT u.* FROM auth_user u JOIN bb_userprofile p ON p.user_id = u.id JOIN bb_identity i ON i.profile_id = p.id WHERE ( ( u.username ILIKE 'detkin' OR i.email ILIKE 'f...@example.com' ) AND ( SUBSTRING(password FROM 8) = CRYPT( 'detkin', SUBSTRING(password FROM 8)) ) ) These queries are generated by a parser that translates from an external query language to SQL run on the database. This test db contains 12 user records. With a single bcrypt hash taking ~300ms to compute, this is a recipe for disaster and so the app only allows queries that require only a very small number of bcrypt computation. E.g. the user must always AND the password lookup with a clause like username = 'foo' AND password = 'bar' (username is unique). However, while the query above technically only needs to compute 1 hash (there is a user 'detkin' and email 'f...@example.com' does not exist), it instead creates a query plan that computes hashes *before* filtering on username and email, leading to 12 hash computations and a very slow query. The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE The schemas for the 3 tables involved are here: http://pgsql.privatepaste.com/f72020ad0a As a quick experiment I tried moving the joins and email lookup into a nested IN query, but that still generates a plan that computes hashes for all 12 users, before picking out the 1 whose username matches. Is there any way I can get postgres to perform the hash calculations on the *result* of the other parts of the where clause, instead of the other way around? Or else rewrite the query? Cheers, Erik (untested), but how about something like the following: WITH au AS ( SELECT DISTINCT u.* FROM auth_user u JOIN bb_userprofile p ON p.user_id = u.id JOIN bb_identity i ON i.profile_id = p.id WHERE u.username ILIKE 'detkin' OR i.email ILIKE 'f...@example.com') SELECT au.* FROM au WHERE SUBSTRING(au.password FROM 8) = CRYPT('detkin', SUBSTRING(au.password FROM 8)); -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions
On Sat, Mar 22, 2014 at 3:27 PM, Erik van Zijst erik.van.zi...@gmail.comwrote: Yes, that works (it does at least on my small test database). However, these queries are generated by a parser that translates complex parse trees from a higher level DSL that doesn't lend itself well to logically isolating the crypt checks from the remaining conditions, as password checks might be present at arbitrary depths. For example: ( active eq true AND ( password eq foo OR password eq bar ) ) AND ( username eq erik OR email contains bar ) Currently the SQL generator translates each AST node into individual predicates that straightforwardly concatenate into a single SQL WHERE clause. For this to work, the individual nodes should compose well. I don't immediately see how the above query could be automatically translated into SQL when taking the WITH-AS approach. I could nonetheless take a stab at it, but life would certainly be easier if I could translate each component independently and leave optimization to the query planner. How about encapsulating the revised query inside a db function? That simplifies the query for your query generator to something like select x,y,z from your_func(p_user,p_email,p_crypt)
Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions
On Sat, Mar 22, 2014 at 3:56 PM, bricklen brick...@gmail.com wrote: On Sat, Mar 22, 2014 at 3:27 PM, Erik van Zijst erik.van.zi...@gmail.com I could nonetheless take a stab at it, but life would certainly be easier if I could translate each component independently and leave optimization to the query planner. How about encapsulating the revised query inside a db function? That simplifies the query for your query generator to something like select x,y,z from your_func(p_user,p_email,p_crypt) I'm not really sure I understand how a db function would make things easier. What would the implementation for your_func() be and what would the SQL look like for the DSL example which contains multiple password checks? Cheers, Erik -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions
On Sat, Mar 22, 2014 at 8:37 PM, Erik van Zijst erik.van.zi...@gmail.comwrote: On Sat, Mar 22, 2014 at 3:56 PM, bricklen brick...@gmail.com wrote: On Sat, Mar 22, 2014 at 3:27 PM, Erik van Zijst erik.van.zi...@gmail.com I could nonetheless take a stab at it, but life would certainly be easier if I could translate each component independently and leave optimization to the query planner. How about encapsulating the revised query inside a db function? That simplifies the query for your query generator to something like select x,y,z from your_func(p_user,p_email,p_crypt) I'm not really sure I understand how a db function would make things easier. What would the implementation for your_func() be and what would the SQL look like for the DSL example which contains multiple password checks? I just reread your previous post about the checks being at potentially arbitrary depths. In that case, the function may or may not help. Without a representative database to test with I can't say one way or the other. Perhaps someone else will have some other ideas of what could be useful here.