Hi Scott,

On 23/7/2015 11:54 AM, Scott Ribe wrote:
On Jul 22, 2015, at 9:10 PM, Mohit Sindhwani <[email protected]> wrote:
We have tried this and the query is quite a bit slower.  Filtering to the last 
900k records before doing the recorded_on part helped speed it up.
I don't understand how that could possibly be the case if there's an index on 
recorded_on.

Because it was the count(distinct x) that was the problem :)
Doing only a count(*) is faster without the subquery... and is what we have switched to.


Your email got me going back to look at all the parts again since obviously the 
query should be using the index and it was still slow.  Further search last 
night made me realize that it's not the indexes that are a problem.  The 
problem is the count(distinct group_id) part which seems to be quite slow in 
PostgreSQL.  This is a lot faster:
select count(*) from
(select distinct group_id from
data_store_v2 where recorded_on >= '2015-06-06') td;
than:
select count(distinct group_id) from
data_store_v2 where recorded_on >= '2015-06-06';

as explained here: 
https://www.periscope.io/blog/use-subqueries-to-count-distinct-50x-faster.html

So, I guess the real problem was being masked by something else and an 
incorrect assumption on my part :)
I would expect the select count(distinct...) to be a major contributor to the 
time taken by the query, just given the amount of work it must do. The select 
count(*) from (select distinct...) alternative is a nice tip :)

...and that is what the cause was.

Thanks for the analysis :D
I do understand SQL and I thought I'm not throwing things together... for my 
understanding, which column was unnecessary?  I thought we needed all:
group_id for counting the distinct group_id
recorded_on for the subsequent query on it
id only for getting the most recent records
I apologize--I misread the query structure. I got it into my head as:

select ... from (select ... from ... where recorded_on ... order by ... limit 
...)

I think you can see how THAT query would have better fit my description of 
being poorly constructed.

I'm glad that my somewhat off-base pontification still managed to point you in 
a useful direction!


Yes, thanks again.

Best Regards,
Mohit.


--
You received this message because you are subscribed to the Google Groups "Ruby on 
Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/rubyonrails-talk/55B07B49.5050807%40onghu.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to