Final update on this thread: since I only need a rough ratio of the
distribution (and not the absolute count), I refactored the query to
sample from the moments table in a subquery: SELECT moment_id, block_id
FROM moments WHERE inserted BETWEEN 'yesterday' AND 'today' ORDER BY
RANDOM() LIMIT 10000; I also took advantage of another table called
blocks that happens to contain moment_type as well, so I no longer need
to reference pg_class.
The final query looks like:

SELECT moment_type, emotion, COUNT(feedback_id)
  FROM (SELECT moment_id, block_id
          FROM moments
         WHERE inserted BETWEEN 'yesterday' AND 'today'
         ORDER BY RANDOM() LIMIT 10000) AS sample_moments
  JOIN blocks USING (block_id)
  JOIN emotions USING (moment_id)
 GROUP BY moment_type, emotion
 ORDER BY moment_type, emotion

The explain is at http://explain.depesz.com/s/lYh
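
For what it's worth, ORDER BY RANDOM() still has to read every row that
matches the date filter and sort it before the LIMIT applies, so the cost
of drawing the sample grows with the day's row count rather than with the
sample size. If an approximate sample size is acceptable, a sort-free
variant (just a sketch; the 0.01 fraction is made up and would need tuning
to the table's daily volume) could look like:

SELECT moment_id, block_id
  FROM moments
 WHERE inserted BETWEEN 'yesterday' AND 'today'
   AND RANDOM() < 0.01;  -- keep each row with probability 0.01; no sort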

Interestingly, increasing the limit does not seem to increase the runtime
linearly. With a limit of 60000 the query runs in 14991 ms, but with a
limit of 70000 it takes 77744 ms. I assume that's because I'm hitting a
memory limit and paging out. Is that right?
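
One way to check that guess: EXPLAIN ANALYZE reports the sort method and
whether the sort stayed in memory or spilled to disk, so comparing the
plan at the two limits against work_mem should show whether the sort
flipped from an in-memory top-N heapsort to an on-disk sort. For example:

SHOW work_mem;

EXPLAIN ANALYZE
SELECT moment_id, block_id
  FROM moments
 WHERE inserted BETWEEN 'yesterday' AND 'today'
 ORDER BY RANDOM() LIMIT 70000;
-- Look for "Sort Method: top-N heapsort  Memory: ... kB" (in memory)
-- versus "Sort Method: external merge  Disk: ... kB" (spilled).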

On Tue, Jan 31, 2012 at 3:43 PM, Alessandro Gagliardi
<alessan...@path.com> wrote:

> I just got a pointer on presenting EXPLAIN ANALYZE in a more human
> friendly fashion (thanks, Agent M!): http://explain.depesz.com/s/A9S
>
> From this it looks like the bottleneck happens when Postgres does an Index
> Scan using emotions_moment_id_idx on emotions before filtering on
> moments.inserted, so I thought I'd try filtering on emotions.inserted
> instead, but that only made it worse. At the same time, I noticed that
> "FROM pg_class, moments WHERE moments.tableoid = pg_class.oid" tends to
> run a bit faster than "FROM pg_class JOIN moments ON moments.tableoid =
> pg_class.oid". So I tried:
>
> SELECT relname, emotion, COUNT(feedback_id)
>   FROM pg_class, moments, emotions
>  WHERE moments.tableoid = pg_class.oid
>    AND emotions.inserted > 'yesterday'
>    AND moments.inserted BETWEEN 'yesterday' AND 'today'
>    AND emotions.moment_id = moments.moment_id
>  GROUP BY relname, emotion
>  ORDER BY relname, emotion;
>
> That was a bit faster, but still very slow. Here's the EXPLAIN:
> http://explain.depesz.com/s/ZdF
>
> On Tue, Jan 31, 2012 at 2:53 PM, Alessandro Gagliardi
> <alessan...@path.com> wrote:
>
>> I changed the query a bit so the results would not change over the
>> course of the day to:
>>
>> SELECT relname, emotion, COUNT(feedback_id)
>>   FROM pg_class, moments
>>   JOIN emotions USING (moment_id)
>>  WHERE moments.inserted BETWEEN 'yesterday' AND 'today'
>>    AND moments.tableoid = pg_class.oid
>>  GROUP BY relname, emotion
>>  ORDER BY relname, emotion;
>>
>
