Re: [PERFORM] select with max functions
On 03/10/17 04:29, Tom Lane wrote: Mariel Cherkasskywrites: explain analyze SELECT Ma.User_Id, COUNT(*) COUNT FROM Manuim Ma WHERE Ma.Bb_Open_Date = (SELECT Bb_Open_Date FROM Manuim Man WHERE Man.User_Id = Ma.User_Id order by bb_open_date desc limit 1 ) GROUP BY Ma.User_Id HAVING COUNT(*) > 1; The core problem with this query is that the sub-select has to be done over again for each row of the outer table, since it's a correlated sub-select (ie, it refers to Ma.User_Id from the outer table). Replacing a max() call with handmade logic doesn't do anything to help that. I'd try refactoring it so that you calculate the max Bb_Open_Date just once for each user id, perhaps along the lines of SELECT Ma.User_Id, COUNT(*) COUNT FROM Manuim Ma, (SELECT User_Id, max(Bb_Open_Date) as max FROM Manuim Man GROUP BY User_Id) ss WHERE Ma.User_Id = ss.User_Id AND Ma.Bb_Open_Date = ss.max GROUP BY Ma.User_Id HAVING COUNT(*) > 1; This is still not going to be instantaneous, but it might be better. It's possible that an index on (User_Id, Bb_Open_Date) would help, but I'm not sure. regards, tom lane Further ideas based on Tom's rewrite: If that MAX is still expensive it might be worth breaking SELECT User_Id, max(Bb_Open_Date) as max FROM Manuim Man GROUP BY User_Id out into a VIEW, and considering making it MATERIALIZED, or creating an equivalent trigger based summary table (there are examples in the docs of how to do this). Cheers Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select with max functions
Mariel Cherkasskywrites: > explain analyze SELECT Ma.User_Id, > COUNT(*) COUNT >FROM Manuim Ma >WHERE Ma.Bb_Open_Date = > (SELECT Bb_Open_Date >FROM Manuim Man >WHERE Man.User_Id = Ma.User_Id order > by bb_open_date desc limit 1 > ) >GROUP BY Ma.User_Id >HAVING COUNT(*) > 1; The core problem with this query is that the sub-select has to be done over again for each row of the outer table, since it's a correlated sub-select (ie, it refers to Ma.User_Id from the outer table). Replacing a max() call with handmade logic doesn't do anything to help that. I'd try refactoring it so that you calculate the max Bb_Open_Date just once for each user id, perhaps along the lines of SELECT Ma.User_Id, COUNT(*) COUNT FROM Manuim Ma, (SELECT User_Id, max(Bb_Open_Date) as max FROM Manuim Man GROUP BY User_Id) ss WHERE Ma.User_Id = ss.User_Id AND Ma.Bb_Open_Date = ss.max GROUP BY Ma.User_Id HAVING COUNT(*) > 1; This is still not going to be instantaneous, but it might be better. It's possible that an index on (User_Id, Bb_Open_Date) would help, but I'm not sure. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select with max functions
explain analyze SELECT Ma.User_Id, COUNT(*) COUNT FROM Manuim Ma WHERE Ma.Bb_Open_Date = (SELECT Bb_Open_Date FROM Manuim Man WHERE Man.User_Id = Ma.User_Id order by bb_open_date desc limit 1 ) GROUP BY Ma.User_Id HAVING COUNT(*) > 1; QUERY PLAN - GroupAggregate (cost=0.56..2430770384.80 rows=128137 width=18) (actual time=55.823..2970443.757 rows=1213 loops=1) Group Key: ma.user_id Filter: (count(*) > 1) Rows Removed by Filter: 3693020 -> Index Scan using manuim_i_user_id on manuim ma (cost=0.56..2430767766.00 rows=178324 width=10) (actual time=0.249 ..2966355.734 rows=3695461 loops=1) Filter: (bb_open_date = (SubPlan 1)) Rows Removed by Filter: 31969367 SubPlan 1 -> Limit (cost=68.00..68.00 rows=1 width=8) (actual time=0.082..0.082 rows=0 loops=35664828) -> Sort (cost=68.00..68.04 rows=16 width=8) (actual time=0.081..0.081 rows=0 loops=35664828) Sort Key: man.bb_open_date DESC Sort Method: quicksort Memory: 25kB -> Index Scan using manuim_i_user_id on manuim man (cost=0.56..67.92 rows=16 width=8) (actual ti me=0.001..0.069 rows=85 loops=35664828) Index Cond: ((user_id)::text = (ma.user_id)::text) Planning time: 0.414 ms Execution time: 2970444.732 ms (16 rows) 2017-10-02 16:45 GMT+03:00 Gerardo Herzig <gher...@fmed.uba.ar>: > > > - Mensaje original - > > De: "Mariel Cherkassky" <mariel.cherkas...@gmail.com> > > Para: "Andreas Kretschmer" <andr...@a-kretschmer.de> > > CC: pgsql-performance@postgresql.org > > Enviados: Lunes, 2 de Octubre 2017 10:25:19 > > Asunto: Re: [PERFORM] select with max functions > > > > Andreas I tried to rewrite it with the function rank() but I failed. The > > query you wrote isnt the same as what I search. Moreover, I cant use > > explain analyze because it is taking to much time to run and I'm getting > > timeout.. > > > > 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <andr...@a-kretschmer.de>: > > Do a "set statement_timeout TO 0" prior to "explain analyze" >
Re: [PERFORM] select with max functions
- Mensaje original - > De: "Mariel Cherkassky" <mariel.cherkas...@gmail.com> > Para: "Andreas Kretschmer" <andr...@a-kretschmer.de> > CC: pgsql-performance@postgresql.org > Enviados: Lunes, 2 de Octubre 2017 10:25:19 > Asunto: Re: [PERFORM] select with max functions > > Andreas I tried to rewrite it with the function rank() but I failed. The > query you wrote isnt the same as what I search. Moreover, I cant use > explain analyze because it is taking to much time to run and I'm getting > timeout.. > > 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <andr...@a-kretschmer.de>: Do a "set statement_timeout TO 0" prior to "explain analyze" -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select with max functions
Andreas I tried to rewrite it with the function rank() but I failed. The query you wrote isnt the same as what I search. Moreover, I cant use explain analyze because it is taking to much time to run and I'm getting timeout.. 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer: > > > Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky: > >> Hi, >> I need to use the max function in my query. I had very bad performance >> when I used the max : >> >>SELECT Ma.User_Id, >> COUNT(*) COUNT >>FROM Manuim Ma >>WHERE Ma.Bb_Open_Date = >> (SELECT max(Bb_Open_Date) >>FROM Manuim Man >>WHERE Man.User_Id = Ma.User_Id >> ) >>GROUP BY Ma.User_Id >>HAVING COUNT(*) > 1; >> >> >> Any idea how can I continue from here ? Thanks , Mariel. >> > > > Maybe you can rewrite it, for instance to > > select distinct on (user_id, bb_open_date) user_id, bb_open_date, count(1) > from Manuim group by 1,2 having count(1) > 1; > > maybe much cheaper, but untested! If not, please share more details, at > least table-definition. > > Regards, Andreas > > -- > 2ndQuadrant - The PostgreSQL Support Company. > www.2ndQuadrant.com > > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
Re: [PERFORM] select with max functions
Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky: Hi, I need to use the max function in my query. I had very bad performance when I used the max : SELECT Ma.User_Id, COUNT(*) COUNT FROM Manuim Ma WHERE Ma.Bb_Open_Date = (SELECT max(Bb_Open_Date) FROM Manuim Man WHERE Man.User_Id = Ma.User_Id ) GROUP BY Ma.User_Id HAVING COUNT(*) > 1; Any idea how can I continue from here ? Thanks , Mariel. Maybe you can rewrite it, for instance to select distinct on (user_id, bb_open_date) user_id, bb_open_date, count(1) from Manuim group by 1,2 having count(1) > 1; maybe much cheaper, but untested! If not, please share more details, at least table-definition. Regards, Andreas -- 2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select with max functions
- Mensaje original - > De: "Mariel Cherkassky" <mariel.cherkas...@gmail.com> > Para: pgsql-performance@postgresql.org > Enviados: Domingo, 1 de Octubre 2017 9:41:37 > Asunto: [PERFORM] select with max functions > > Hi, > I need to use the max function in my query. I had very bad performance when > I used the max : > >SELECT Ma.User_Id, > COUNT(*) COUNT >FROM Manuim Ma >WHERE Ma.Bb_Open_Date = > (SELECT max(Bb_Open_Date) >FROM Manuim Man >WHERE Man.User_Id = Ma.User_Id > ) >GROUP BY Ma.User_Id >HAVING COUNT(*) > 1; > > >QUERY PLAN > > - > GroupAggregate (cost=0.56..3250554784.13 rows=115111 width=18) >Group Key: ma.user_id >Filter: (count(*) > 1) >-> Index Scan using manuim_i_user_id on manuim ma > (cost=0.56..3250552295.59 rows=178324 width=10) > Filter: (bb_open_date = (SubPlan 1)) > SubPlan 1 >-> Aggregate (cost=90.98..90.99 rows=1 width=8) > -> Index Scan using manuim_i_user_id on manuim man > (cost=0.56..90.92 rows=22 width=8) >Index Cond: ((user_id)::text = (ma.user_id)::text) > (9 rows) > > > > So I used the limit 1 option : > >SELECT Ma.User_Id, > COUNT(*) COUNT >FROM Manuim Ma >WHERE Ma.Bb_Open_Date = > (SELECT Bb_Open_Date >FROM Manuim Man >WHERE Man.User_Id = Ma.User_Id order > by bb_open_date desc limit 1 > ) >GROUP BY Ma.User_Id >HAVING COUNT(*) > 1; > > and the performance are still the same : > > QUERY PLAN > > --- > GroupAggregate (cost=0.56..3252248863.46 rows=115111 width=18) >Group Key: ma.user_id >Filter: (count(*) > 1) >-> Index Scan using manuim_i_user_id on manuim ma > (cost=0.56..3252246374.92 rows=178324 width=10) > Filter: (bb_open_date = (SubPlan 1)) > SubPlan 1 >-> Limit (cost=91.03..91.03 rows=1 width=8) > -> Sort (cost=91.03..91.09 rows=22 width=8) >Sort Key: man.bb_open_date DESC >-> Index Scan using manuim_i_user_id on manuim man > (cost=0.56..90.92 rows=22 width=8) > Index Cond: ((user_id)::text = > (ma.user_id)::text) > (11 rows) > > > > the reading on the table manuim takes a lot of effort, what else can I do ? > the table`s size is 8G. > > select count(*) from manuim; > count > -- > 35664828 > (1 row) > > the indexes on the table : > "manuim_bb_open_date" btree (bb_open_date) > "manuim_i_user_id" btree (user_id) > > > Any idea how can I continue from here ? Thanks , Mariel. Start by posting the results of "explain analyze" of that queries, so we can see some timming stuff. Gerardo -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] select with max functions
Hi, I need to use the max function in my query. I had very bad performance when I used the max : SELECT Ma.User_Id, COUNT(*) COUNT FROM Manuim Ma WHERE Ma.Bb_Open_Date = (SELECT max(Bb_Open_Date) FROM Manuim Man WHERE Man.User_Id = Ma.User_Id ) GROUP BY Ma.User_Id HAVING COUNT(*) > 1; QUERY PLAN - GroupAggregate (cost=0.56..3250554784.13 rows=115111 width=18) Group Key: ma.user_id Filter: (count(*) > 1) -> Index Scan using manuim_i_user_id on manuim ma (cost=0.56..3250552295.59 rows=178324 width=10) Filter: (bb_open_date = (SubPlan 1)) SubPlan 1 -> Aggregate (cost=90.98..90.99 rows=1 width=8) -> Index Scan using manuim_i_user_id on manuim man (cost=0.56..90.92 rows=22 width=8) Index Cond: ((user_id)::text = (ma.user_id)::text) (9 rows) So I used the limit 1 option : SELECT Ma.User_Id, COUNT(*) COUNT FROM Manuim Ma WHERE Ma.Bb_Open_Date = (SELECT Bb_Open_Date FROM Manuim Man WHERE Man.User_Id = Ma.User_Id order by bb_open_date desc limit 1 ) GROUP BY Ma.User_Id HAVING COUNT(*) > 1; and the performance are still the same : QUERY PLAN --- GroupAggregate (cost=0.56..3252248863.46 rows=115111 width=18) Group Key: ma.user_id Filter: (count(*) > 1) -> Index Scan using manuim_i_user_id on manuim ma (cost=0.56..3252246374.92 rows=178324 width=10) Filter: (bb_open_date = (SubPlan 1)) SubPlan 1 -> Limit (cost=91.03..91.03 rows=1 width=8) -> Sort (cost=91.03..91.09 rows=22 width=8) Sort Key: man.bb_open_date DESC -> Index Scan using manuim_i_user_id on manuim man (cost=0.56..90.92 rows=22 width=8) Index Cond: ((user_id)::text = (ma.user_id)::text) (11 rows) the reading on the table manuim takes a lot of effort, what else can I do ? the table`s size is 8G. select count(*) from manuim; count -- 35664828 (1 row) the indexes on the table : "manuim_bb_open_date" btree (bb_open_date) "manuim_i_user_id" btree (user_id) Any idea how can I continue from here ? Thanks , Mariel.