Re: [PERFORM] select with max functions

2017-10-03 Thread Mark Kirkwood

On 03/10/17 04:29, Tom Lane wrote:

Mariel Cherkassky  writes:

explain analyze   SELECT Ma.User_Id,
   COUNT(*) COUNT
FROM   Manuim Ma
WHERE  Ma.Bb_Open_Date  =
   (SELECT Bb_Open_Date
FROM   Manuim Man
WHERE  Man.User_Id = Ma.User_Id order
by   bb_open_date desc limit 1
   )
GROUP  BY Ma.User_Id
HAVING COUNT(*) > 1;

The core problem with this query is that the sub-select has to be done
over again for each row of the outer table, since it's a correlated
sub-select (ie, it refers to Ma.User_Id from the outer table).  Replacing
a max() call with handmade logic doesn't do anything to help that.
I'd try refactoring it so that you calculate the max Bb_Open_Date just
once for each user id, perhaps along the lines of

SELECT Ma.User_Id,
COUNT(*) COUNT
FROM   Manuim Ma,
   (SELECT User_Id, max(Bb_Open_Date) as max
FROM   Manuim Man
GROUP BY User_Id) ss
WHERE  Ma.User_Id = ss.User_Id AND
   Ma.Bb_Open_Date = ss.max
GROUP  BY Ma.User_Id
HAVING COUNT(*) > 1;

This is still not going to be instantaneous, but it might be better.

It's possible that an index on (User_Id, Bb_Open_Date) would help,
but I'm not sure.

regards, tom lane




Further ideas based on Tom's rewrite: If that MAX is still expensive it 
might be worth breaking



SELECT User_Id, max(Bb_Open_Date) as max
   FROM   Manuim Man
   GROUP BY User_Id

out into a VIEW, and considering making it MATERIALIZED, or creating an 
equivalent  trigger based summary table (there are examples in the docs 
of how to do this).


Cheers

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select with max functions

2017-10-02 Thread Tom Lane
Mariel Cherkassky  writes:
> explain analyze   SELECT Ma.User_Id,
>   COUNT(*) COUNT
>FROM   Manuim Ma
>WHERE  Ma.Bb_Open_Date  =
>   (SELECT Bb_Open_Date
>FROM   Manuim Man
>WHERE  Man.User_Id = Ma.User_Id order
> by   bb_open_date desc limit 1
>   )
>GROUP  BY Ma.User_Id
>HAVING COUNT(*) > 1;

The core problem with this query is that the sub-select has to be done
over again for each row of the outer table, since it's a correlated
sub-select (ie, it refers to Ma.User_Id from the outer table).  Replacing
a max() call with handmade logic doesn't do anything to help that.
I'd try refactoring it so that you calculate the max Bb_Open_Date just
once for each user id, perhaps along the lines of

SELECT Ma.User_Id,
   COUNT(*) COUNT
   FROM   Manuim Ma,
  (SELECT User_Id, max(Bb_Open_Date) as max
   FROM   Manuim Man
   GROUP BY User_Id) ss
   WHERE  Ma.User_Id = ss.User_Id AND
  Ma.Bb_Open_Date = ss.max
   GROUP  BY Ma.User_Id
   HAVING COUNT(*) > 1;

This is still not going to be instantaneous, but it might be better.

It's possible that an index on (User_Id, Bb_Open_Date) would help,
but I'm not sure.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select with max functions

2017-10-02 Thread Mariel Cherkassky
explain analyze   SELECT Ma.User_Id,
  COUNT(*) COUNT
   FROM   Manuim Ma
   WHERE  Ma.Bb_Open_Date  =
  (SELECT Bb_Open_Date
   FROM   Manuim Man
   WHERE  Man.User_Id = Ma.User_Id order
by   bb_open_date desc limit 1
  )
   GROUP  BY Ma.User_Id
   HAVING COUNT(*) > 1;

 QUERY PLAN

-

 GroupAggregate  (cost=0.56..2430770384.80 rows=128137 width=18) (actual
time=55.823..2970443.757 rows=1213 loops=1)
   Group Key: ma.user_id
   Filter: (count(*) > 1)
   Rows Removed by Filter: 3693020
   ->  Index Scan using manuim_i_user_id on manuim ma
(cost=0.56..2430767766.00 rows=178324 width=10) (actual time=0.249
..2966355.734 rows=3695461 loops=1)
 Filter: (bb_open_date = (SubPlan 1))
 Rows Removed by Filter: 31969367
 SubPlan 1
   ->  Limit  (cost=68.00..68.00 rows=1 width=8) (actual
time=0.082..0.082 rows=0 loops=35664828)
 ->  Sort  (cost=68.00..68.04 rows=16 width=8) (actual
time=0.081..0.081 rows=0 loops=35664828)
   Sort Key: man.bb_open_date DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using manuim_i_user_id on manuim man
(cost=0.56..67.92 rows=16 width=8) (actual ti
me=0.001..0.069 rows=85 loops=35664828)
 Index Cond: ((user_id)::text =
(ma.user_id)::text)
 Planning time: 0.414 ms
 Execution time: 2970444.732 ms
(16 rows)

2017-10-02 16:45 GMT+03:00 Gerardo Herzig <gher...@fmed.uba.ar>:

>
>
> - Mensaje original -
> > De: "Mariel Cherkassky" <mariel.cherkas...@gmail.com>
> > Para: "Andreas Kretschmer" <andr...@a-kretschmer.de>
> > CC: pgsql-performance@postgresql.org
> > Enviados: Lunes, 2 de Octubre 2017 10:25:19
> > Asunto: Re: [PERFORM] select with max functions
> >
> > Andreas I tried to rewrite it with the function rank() but I failed. The
> > query you wrote isnt the same as what I search. Moreover, I cant use
> > explain analyze because it is taking to much time to run and I'm getting
> > timeout..
> >
> > 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <andr...@a-kretschmer.de>:
>
> Do a "set statement_timeout TO 0" prior to "explain analyze"
>


Re: [PERFORM] select with max functions

2017-10-02 Thread Gerardo Herzig


- Mensaje original -
> De: "Mariel Cherkassky" <mariel.cherkas...@gmail.com>
> Para: "Andreas Kretschmer" <andr...@a-kretschmer.de>
> CC: pgsql-performance@postgresql.org
> Enviados: Lunes, 2 de Octubre 2017 10:25:19
> Asunto: Re: [PERFORM] select with max functions
> 
> Andreas I tried to rewrite it with the function rank() but I failed. The
> query you wrote isnt the same as what I search. Moreover, I cant use
> explain analyze because it is taking to much time to run and I'm getting
> timeout..
> 
> 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <andr...@a-kretschmer.de>:

Do a "set statement_timeout TO 0" prior to "explain analyze"


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select with max functions

2017-10-02 Thread Mariel Cherkassky
Andreas I tried to rewrite it with the function rank() but I failed. The
query you wrote isnt the same as what I search. Moreover, I cant use
explain analyze because it is taking to much time to run and I'm getting
timeout..

2017-10-01 21:48 GMT+03:00 Andreas Kretschmer :

>
>
> Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky:
>
>> Hi,
>> I need to use the max function in my query. I had very bad performance
>> when I used the max  :
>>
>>SELECT Ma.User_Id,
>>   COUNT(*) COUNT
>>FROM   Manuim Ma
>>WHERE  Ma.Bb_Open_Date  =
>>   (SELECT max(Bb_Open_Date)
>>FROM  Manuim Man
>>WHERE Man.User_Id = Ma.User_Id
>>   )
>>GROUP  BY Ma.User_Id
>>HAVING COUNT(*) > 1;
>>
>>
>> Any idea how can I continue from here ? Thanks , Mariel.
>>
>
>
> Maybe you can rewrite it, for instance to
>
> select distinct on (user_id, bb_open_date) user_id, bb_open_date, count(1)
> from Manuim group by 1,2 having count(1) > 1;
>
> maybe much cheaper, but untested! If not, please share more details, at
> least table-definition.
>
> Regards, Andreas
>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] select with max functions

2017-10-01 Thread Andreas Kretschmer



Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky:

Hi,
I need to use the max function in my query. I had very bad performance 
when I used the max  :


   SELECT Ma.User_Id,
  COUNT(*) COUNT
   FROM   Manuim Ma
   WHERE  Ma.Bb_Open_Date  =
  (SELECT max(Bb_Open_Date)
   FROM  Manuim Man
   WHERE Man.User_Id = Ma.User_Id
  )
   GROUP  BY Ma.User_Id
   HAVING COUNT(*) > 1;


Any idea how can I continue from here ? Thanks , Mariel.



Maybe you can rewrite it, for instance to

select distinct on (user_id, bb_open_date) user_id, bb_open_date, 
count(1) from Manuim group by 1,2 having count(1) > 1;


maybe much cheaper, but untested! If not, please share more details, at 
least table-definition.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select with max functions

2017-10-01 Thread Gerardo Herzig


- Mensaje original -
> De: "Mariel Cherkassky" <mariel.cherkas...@gmail.com>
> Para: pgsql-performance@postgresql.org
> Enviados: Domingo, 1 de Octubre 2017 9:41:37
> Asunto: [PERFORM] select with max functions
> 
> Hi,
> I need to use the max function in my query. I had very bad performance when
> I used the max  :
> 
>SELECT Ma.User_Id,
>   COUNT(*) COUNT
>FROM   Manuim Ma
>WHERE  Ma.Bb_Open_Date  =
>   (SELECT max(Bb_Open_Date)
>FROM   Manuim Man
>WHERE  Man.User_Id = Ma.User_Id
>   )
>GROUP  BY Ma.User_Id
>HAVING COUNT(*) > 1;
> 
> 
>QUERY PLAN
> 
> -
>  GroupAggregate  (cost=0.56..3250554784.13 rows=115111 width=18)
>Group Key: ma.user_id
>Filter: (count(*) > 1)
>->  Index Scan using manuim_i_user_id on manuim ma
> (cost=0.56..3250552295.59 rows=178324 width=10)
>  Filter: (bb_open_date = (SubPlan 1))
>  SubPlan 1
>->  Aggregate  (cost=90.98..90.99 rows=1 width=8)
>  ->  Index Scan using manuim_i_user_id on manuim man
> (cost=0.56..90.92 rows=22 width=8)
>Index Cond: ((user_id)::text = (ma.user_id)::text)
> (9 rows)
> 
> 
> 
> So I used the limit 1 option :
> 
>SELECT Ma.User_Id,
>   COUNT(*) COUNT
>FROM   Manuim Ma
>WHERE  Ma.Bb_Open_Date  =
>   (SELECT Bb_Open_Date
>FROM   Manuim Man
>WHERE  Man.User_Id = Ma.User_Id order
> by   bb_open_date desc limit 1
>   )
>GROUP  BY Ma.User_Id
>HAVING COUNT(*) > 1;
> 
> and the performance are still the same :
> 
>   QUERY PLAN
> 
> ---
>  GroupAggregate  (cost=0.56..3252248863.46 rows=115111 width=18)
>Group Key: ma.user_id
>Filter: (count(*) > 1)
>->  Index Scan using manuim_i_user_id on manuim ma
> (cost=0.56..3252246374.92 rows=178324 width=10)
>  Filter: (bb_open_date = (SubPlan 1))
>  SubPlan 1
>->  Limit  (cost=91.03..91.03 rows=1 width=8)
>  ->  Sort  (cost=91.03..91.09 rows=22 width=8)
>Sort Key: man.bb_open_date DESC
>->  Index Scan using manuim_i_user_id on manuim man
> (cost=0.56..90.92 rows=22 width=8)
>  Index Cond: ((user_id)::text =
> (ma.user_id)::text)
> (11 rows)
> 
> 
> 
> the reading on the table manuim takes a lot of effort, what else can I do ?
> the table`s size is 8G.
> 
> select count(*) from manuim;
>   count
> --
>  35664828
> (1 row)
> 
> the indexes on the table :
>  "manuim_bb_open_date" btree (bb_open_date)
> "manuim_i_user_id" btree (user_id)
> 
> 
> Any idea how can I continue from here ? Thanks , Mariel.

Start by posting the results of "explain analyze" of that queries, so we can 
see some timming stuff.

Gerardo


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] select with max functions

2017-10-01 Thread Mariel Cherkassky
Hi,
I need to use the max function in my query. I had very bad performance when
I used the max  :

   SELECT Ma.User_Id,
  COUNT(*) COUNT
   FROM   Manuim Ma
   WHERE  Ma.Bb_Open_Date  =
  (SELECT max(Bb_Open_Date)
   FROM   Manuim Man
   WHERE  Man.User_Id = Ma.User_Id
  )
   GROUP  BY Ma.User_Id
   HAVING COUNT(*) > 1;


   QUERY PLAN

-
 GroupAggregate  (cost=0.56..3250554784.13 rows=115111 width=18)
   Group Key: ma.user_id
   Filter: (count(*) > 1)
   ->  Index Scan using manuim_i_user_id on manuim ma
(cost=0.56..3250552295.59 rows=178324 width=10)
 Filter: (bb_open_date = (SubPlan 1))
 SubPlan 1
   ->  Aggregate  (cost=90.98..90.99 rows=1 width=8)
 ->  Index Scan using manuim_i_user_id on manuim man
(cost=0.56..90.92 rows=22 width=8)
   Index Cond: ((user_id)::text = (ma.user_id)::text)
(9 rows)



So I used the limit 1 option :

   SELECT Ma.User_Id,
  COUNT(*) COUNT
   FROM   Manuim Ma
   WHERE  Ma.Bb_Open_Date  =
  (SELECT Bb_Open_Date
   FROM   Manuim Man
   WHERE  Man.User_Id = Ma.User_Id order
by   bb_open_date desc limit 1
  )
   GROUP  BY Ma.User_Id
   HAVING COUNT(*) > 1;

and the performance are still the same :

  QUERY PLAN

---
 GroupAggregate  (cost=0.56..3252248863.46 rows=115111 width=18)
   Group Key: ma.user_id
   Filter: (count(*) > 1)
   ->  Index Scan using manuim_i_user_id on manuim ma
(cost=0.56..3252246374.92 rows=178324 width=10)
 Filter: (bb_open_date = (SubPlan 1))
 SubPlan 1
   ->  Limit  (cost=91.03..91.03 rows=1 width=8)
 ->  Sort  (cost=91.03..91.09 rows=22 width=8)
   Sort Key: man.bb_open_date DESC
   ->  Index Scan using manuim_i_user_id on manuim man
(cost=0.56..90.92 rows=22 width=8)
 Index Cond: ((user_id)::text =
(ma.user_id)::text)
(11 rows)



the reading on the table manuim takes a lot of effort, what else can I do ?
the table`s size is 8G.

select count(*) from manuim;
  count
--
 35664828
(1 row)

the indexes on the table :
 "manuim_bb_open_date" btree (bb_open_date)
"manuim_i_user_id" btree (user_id)


Any idea how can I continue from here ? Thanks , Mariel.