Re: [PERFORM] select with max functions

2017-10-02 Thread Tom Lane
Mariel Cherkassky  writes:
> explain analyze   SELECT Ma.User_Id,
>   COUNT(*) COUNT
>FROM   Manuim Ma
>WHERE  Ma.Bb_Open_Date  =
>   (SELECT Bb_Open_Date
>FROM   Manuim Man
>WHERE  Man.User_Id = Ma.User_Id order
> by   bb_open_date desc limit 1
>   )
>GROUP  BY Ma.User_Id
>HAVING COUNT(*) > 1;

The core problem with this query is that the sub-select has to be done
over again for each row of the outer table, since it's a correlated
sub-select (ie, it refers to Ma.User_Id from the outer table).  Replacing
a max() call with handmade logic doesn't do anything to help that.
I'd try refactoring it so that you calculate the max Bb_Open_Date just
once for each user id, perhaps along the lines of

SELECT Ma.User_Id,
   COUNT(*) COUNT
   FROM   Manuim Ma,
  (SELECT User_Id, max(Bb_Open_Date) as max
   FROM   Manuim Man
   GROUP BY User_Id) ss
   WHERE  Ma.User_Id = ss.User_Id AND
  Ma.Bb_Open_Date = ss.max
   GROUP  BY Ma.User_Id
   HAVING COUNT(*) > 1;

This is still not going to be instantaneous, but it might be better.

It's possible that an index on (User_Id, Bb_Open_Date) would help,
but I'm not sure.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select with max functions

2017-10-02 Thread Mariel Cherkassky
explain analyze   SELECT Ma.User_Id,
  COUNT(*) COUNT
   FROM   Manuim Ma
   WHERE  Ma.Bb_Open_Date  =
  (SELECT Bb_Open_Date
   FROM   Manuim Man
   WHERE  Man.User_Id = Ma.User_Id order
by   bb_open_date desc limit 1
  )
   GROUP  BY Ma.User_Id
   HAVING COUNT(*) > 1;

 QUERY PLAN

-

 GroupAggregate  (cost=0.56..2430770384.80 rows=128137 width=18) (actual
time=55.823..2970443.757 rows=1213 loops=1)
   Group Key: ma.user_id
   Filter: (count(*) > 1)
   Rows Removed by Filter: 3693020
   ->  Index Scan using manuim_i_user_id on manuim ma
(cost=0.56..2430767766.00 rows=178324 width=10) (actual time=0.249
..2966355.734 rows=3695461 loops=1)
 Filter: (bb_open_date = (SubPlan 1))
 Rows Removed by Filter: 31969367
 SubPlan 1
   ->  Limit  (cost=68.00..68.00 rows=1 width=8) (actual
time=0.082..0.082 rows=0 loops=35664828)
 ->  Sort  (cost=68.00..68.04 rows=16 width=8) (actual
time=0.081..0.081 rows=0 loops=35664828)
   Sort Key: man.bb_open_date DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using manuim_i_user_id on manuim man
(cost=0.56..67.92 rows=16 width=8) (actual ti
me=0.001..0.069 rows=85 loops=35664828)
 Index Cond: ((user_id)::text =
(ma.user_id)::text)
 Planning time: 0.414 ms
 Execution time: 2970444.732 ms
(16 rows)

2017-10-02 16:45 GMT+03:00 Gerardo Herzig :

>
>
> - Mensaje original -
> > De: "Mariel Cherkassky" 
> > Para: "Andreas Kretschmer" 
> > CC: pgsql-performance@postgresql.org
> > Enviados: Lunes, 2 de Octubre 2017 10:25:19
> > Asunto: Re: [PERFORM] select with max functions
> >
> > Andreas I tried to rewrite it with the function rank() but I failed. The
> > query you wrote isnt the same as what I search. Moreover, I cant use
> > explain analyze because it is taking to much time to run and I'm getting
> > timeout..
> >
> > 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer :
>
> Do a "set statement_timeout TO 0" prior to "explain analyze"
>


Re: [PERFORM] select with max functions

2017-10-02 Thread Gerardo Herzig


- Mensaje original -
> De: "Mariel Cherkassky" 
> Para: "Andreas Kretschmer" 
> CC: pgsql-performance@postgresql.org
> Enviados: Lunes, 2 de Octubre 2017 10:25:19
> Asunto: Re: [PERFORM] select with max functions
> 
> Andreas I tried to rewrite it with the function rank() but I failed. The
> query you wrote isnt the same as what I search. Moreover, I cant use
> explain analyze because it is taking to much time to run and I'm getting
> timeout..
> 
> 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer :

Do a "set statement_timeout TO 0" prior to "explain analyze"


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select with max functions

2017-10-02 Thread Mariel Cherkassky
Andreas I tried to rewrite it with the function rank() but I failed. The
query you wrote isnt the same as what I search. Moreover, I cant use
explain analyze because it is taking to much time to run and I'm getting
timeout..

2017-10-01 21:48 GMT+03:00 Andreas Kretschmer :

>
>
> Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky:
>
>> Hi,
>> I need to use the max function in my query. I had very bad performance
>> when I used the max  :
>>
>>SELECT Ma.User_Id,
>>   COUNT(*) COUNT
>>FROM   Manuim Ma
>>WHERE  Ma.Bb_Open_Date  =
>>   (SELECT max(Bb_Open_Date)
>>FROM  Manuim Man
>>WHERE Man.User_Id = Ma.User_Id
>>   )
>>GROUP  BY Ma.User_Id
>>HAVING COUNT(*) > 1;
>>
>>
>> Any idea how can I continue from here ? Thanks , Mariel.
>>
>
>
> Maybe you can rewrite it, for instance to
>
> select distinct on (user_id, bb_open_date) user_id, bb_open_date, count(1)
> from Manuim group by 1,2 having count(1) > 1;
>
> maybe much cheaper, but untested! If not, please share more details, at
> least table-definition.
>
> Regards, Andreas
>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>