Re: Join optimization
- Mensagem original - De: "David Rowley" Para: "luis.roberto" Cc: "Fabrízio de Royes Mello" , "pgsql-general" Enviadas: Domingo, 12 de julho de 2020 5:29:08 Assunto: Re: Join optimization On Sun, 12 Jul 2020 at 06:59, wrote: > > I'm sorry for the bad example. > > Here is another, with some data on PG: > https://dbfiddle.uk/?rdbms=postgres_13=ccfd1c4fa291e74a6db9db1772e2b5ac > and Oracle: > https://dbfiddle.uk/?rdbms=oracle_18=21a98f499065ad4e2c35ff4bd1487e14. I believe what you're talking about is join removals. It appears as though Oracle is able to remove the inner join to the users table as the join only serves to check the user record exists. No columns are being selected. The record must exist due to the foreign key referencing users. PostgreSQL currently can only remove left joins. Likely what you could do here is just change your inner join into a left join. If you're happy enough that the user record will always exist then that should allow it to work. The reason PostgreSQL does not currently support inner join is that by default, foreign key constraints are only triggered at the end of the query, (or if deferred, at the end of the transaction). WIth PostgreSQL, it's possible for a SELECT query to see a violated foreign key constraint. This can happen if your SELECT query calls a function which updates a referenced record. The cascade of the foreign key won't occur until the end of the statement, so the select may stumble upon a violated foreign key. Here's a quick example of this case: drop table t1,t2; create table t1 (a int primary key); create table t2 (a int references t1 on update cascade); insert into t1 values(1),(2); insert into t2 values(2),(2); create or replace function update_t1 (p_a int) returns int as $$ begin update t1 set a = a + 1 where a = p_a; return p_a; end; $$ language plpgsql volatile; -- in theory, this should never return any rows as we're asking for all rows that -- don't exist in the referenced table. However, we do get a result since the function -- call updates t1 setting the row with a=2 to a=3. The cascade to t2 does not occur -- until the end of the statement. select update_t1(a+1),t1.a from t1 where not exists(select 1 from t2 where t1.a=t2.a); update_t1 | a ---+--- 2 | 1 (1 row) If you're happy that you don't have any functions like that which could momentarily cause the foreign key to appear violated, then there shouldn't be any harm in changing the INNER JOIN on users to a LEFT JOIN. PostgreSQL will be able to remove the join in that case. David - Thanks for the reply David! I understand it better now.
Re: Join optimization
On Sun, 12 Jul 2020 at 06:59, wrote: > > I'm sorry for the bad example. > > Here is another, with some data on PG: > https://dbfiddle.uk/?rdbms=postgres_13=ccfd1c4fa291e74a6db9db1772e2b5ac > and Oracle: > https://dbfiddle.uk/?rdbms=oracle_18=21a98f499065ad4e2c35ff4bd1487e14. I believe what you're talking about is join removals. It appears as though Oracle is able to remove the inner join to the users table as the join only serves to check the user record exists. No columns are being selected. The record must exist due to the foreign key referencing users. PostgreSQL currently can only remove left joins. Likely what you could do here is just change your inner join into a left join. If you're happy enough that the user record will always exist then that should allow it to work. The reason PostgreSQL does not currently support inner join is that by default, foreign key constraints are only triggered at the end of the query, (or if deferred, at the end of the transaction). WIth PostgreSQL, it's possible for a SELECT query to see a violated foreign key constraint. This can happen if your SELECT query calls a function which updates a referenced record. The cascade of the foreign key won't occur until the end of the statement, so the select may stumble upon a violated foreign key. Here's a quick example of this case: drop table t1,t2; create table t1 (a int primary key); create table t2 (a int references t1 on update cascade); insert into t1 values(1),(2); insert into t2 values(2),(2); create or replace function update_t1 (p_a int) returns int as $$ begin update t1 set a = a + 1 where a = p_a; return p_a; end; $$ language plpgsql volatile; -- in theory, this should never return any rows as we're asking for all rows that -- don't exist in the referenced table. However, we do get a result since the function -- call updates t1 setting the row with a=2 to a=3. The cascade to t2 does not occur -- until the end of the statement. select update_t1(a+1),t1.a from t1 where not exists(select 1 from t2 where t1.a=t2.a); update_t1 | a ---+--- 2 | 1 (1 row) If you're happy that you don't have any functions like that which could momentarily cause the foreign key to appear violated, then there shouldn't be any harm in changing the INNER JOIN on users to a LEFT JOIN. PostgreSQL will be able to remove the join in that case. David
Re: Join optimization
I'm sorry for the bad example. Here is another, with some data on PG: [ https://dbfiddle.uk/?rdbms=postgres_13=ccfd1c4fa291e74a6db9db1772e2b5ac | https://dbfiddle.uk/?rdbms=postgres_13=ccfd1c4fa291e74a6db9db1772e2b5ac ] and Oracle: [ https://dbfiddle.uk/?rdbms=oracle_18=21a98f499065ad4e2c35ff4bd1487e14 | https://dbfiddle.uk/?rdbms=oracle_18=21a98f499065ad4e2c35ff4bd1487e14 ] . I don't understand oracle's execution plan very well, but it doesn't seem to be hitting the Users table... De: "Fabrízio de Royes Mello" Para: "luis.roberto" Cc: "pgsql-general" Enviadas: Sábado, 11 de julho de 2020 15:24:04 Assunto: Re: Join optimization Em sáb, 11 de jul de 2020 às 14:20, < [ mailto:luis.robe...@siscobra.com.br | luis.robe...@siscobra.com.br ] > escreveu: Hi! Recently on a blogpost [1] I saw that Oracle was able to "optimize" a join strategy by completely eliminating access to a table. Heres the execution on Oracle 18c [2] and PostgreSQL 13 (beta) [3]. Is there a fundamental reason why PG can't do the same? It does... did you see the “never executed” notice on the Postgres explain output? Regards, BQ_BEGIN BQ_END -- Fabrízio de Royes Mello Timbira - [ http://www.timbira.com.br/ | http://www.timbira.com.br/ ] PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: Join optimization
Em sáb, 11 de jul de 2020 às 14:20, escreveu: > Hi! > > Recently on a blogpost [1] I saw that Oracle was able to "optimize" a join > strategy by completely eliminating access to a table. > > Heres the execution on Oracle 18c [2] and PostgreSQL 13 (beta) [3]. > > Is there a fundamental reason why PG can't do the same? > It does... did you see the “never executed” notice on the Postgres explain output? Regards, -- Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento