On Thu, Jan 2, 2014 at 12:36 PM, Dave Johansen <davejohan...@gmail.com>wrote:
> On Thu, Jan 2, 2014 at 11:39 AM, Dave Johansen <davejohan...@gmail.com>wrote: > >> On Fri, Dec 20, 2013 at 10:46 PM, David Rowley <dgrowle...@gmail.com>wrote: >> >>> On Fri, Dec 20, 2013 at 1:35 PM, Dave Johansen >>> <davejohan...@gmail.com>wrote: >>> >>>> I just ran into an interesting issue on Postgres 8.4. I have a database >>>> with about 3 months of data and when I do following query: >>>> SELECT DATE_TRUNC('day', time) AS time_t, COUNT(*) FROM mytable GROUP >>>> BY time_t; >>>> >>>> EXPLAIN shows that it's doing a sort and then a GroupAggregate. There >>>> will only be ~90 outputs, so is there a way I can hint/force the planner to >>>> just do a HashAggregate? >>>> >>>> Just to see if it would change the plan, I tried increasing the >>>> work_mem up to 1GB and it still did the same plan. >>>> >>>> >>> PostgreSQL does not really have any stats on the selectivity of >>> date_trunc('day', time) so my guess is that it can only assume that it has >>> the same selectivity as the time column by itself... Which is very untrue >>> in this case. >>> The group aggregate plan is chosen here as PostgreSQL thinks the the >>> hash table is going to end up pretty big and decides that the group >>> aggregate will be the cheaper option. >>> >>> I mocked up your data and on 9.4 I can get the hash aggregate plan to >>> run if I set the n_distinct value to 90 then analyze the table again.. Even >>> if you could do this on 8.4 I'd not recommend it as it will probably cause >>> havoc with other plans around the time column. I did also get the hash >>> aggregate plan to run if I created a functional index on date_trunc('day', >>> time) then ran analyze again. I don't have a copy of 8.4 around to see if >>> the planner will make use of the index in the same way. >>> >> > I just tried this on 8.4 and it won't create the index because > DATE_TRUNC() is not immutable. The exact error is: > ERROR: function in index expression must be marked IMMUTABLE > > Any suggestions or other ideas? > I apologize for the multiple emails, but I just looked at the definition of DATE_TRUNC() and for TIMESTAMP WITHOUT TIME ZONE it's IMMUTABLE, so I will look into switching to that and see if using the index speeds up the queries.