Re: BRIN index which is much faster never chosen by planner
Since the optimizer is choosing a seq scan over the index scan even though it seems to have good row estimates in both cases, that may mean the expected cost of scanning the index is high. Is this workload on SSD? Has random_page_cost been decreased from its default of 4 (compared with the cost of 1 unit for a sequential page read)? Your buffer hits aren't great. What is shared_buffers set to? How much RAM is on this cluster?

With this table being insert-only, one assumes correlation is very high on this column, as shown in pg_stats, but have you confirmed that? To me, DISTINCT ON is often a code smell and can usually be rewritten to be much more efficient with GROUP BY, a LATERAL join with ORDER BY, or some other tool. The same goes for the window function: it is a powerful tool, but sometimes not the right one. Is "source" a function that is called on field1? What is it doing / how is it defined?
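For illustration, this is the kind of rewrite I mean. The table and column names here are hypothetical (a readings table keyed by device_id, plus a small devices table), not taken from your schema:

```sql
-- DISTINCT ON version: latest reading per device. Sorts the whole
-- result set to pick one row per device_id.
SELECT DISTINCT ON (device_id) device_id, recorded_at, value
FROM readings
ORDER BY device_id, recorded_at DESC;

-- LATERAL rewrite: one indexed top-1 lookup per device, which can use
-- an index on readings (device_id, recorded_at DESC).
SELECT d.device_id, r.recorded_at, r.value
FROM devices d
CROSS JOIN LATERAL (
    SELECT recorded_at, value
    FROM readings
    WHERE device_id = d.device_id
    ORDER BY recorded_at DESC
    LIMIT 1
) r;
```

Whether the LATERAL form wins depends on having a driving table of keys and a suitable index, but when those exist it often beats sorting everything just to discard most of it.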
Re: BRIN index which is much faster never chosen by planner
On Thu, Oct 10, 2019 at 6:22 PM David Rowley wrote:
> The planner might be able to get a better estimate on the number of
> matching rows if the now() - interval '10 days' expression was
> replaced with 'now'::timestamptz - interval '10 days'. However, care
> would need to be taken to ensure the plan is never prepared, since
> 'now' is evaluated during parse. The same care must be taken when
> creating views, functions, stored procedures and the like.
>
> The planner will just estimate the selectivity of now() - interval '10
> days' by using DEFAULT_INEQ_SEL, which is 0.3333333333333333, so it
> thinks it'll get 1/3rd of the table. Using 'now' will allow the
> planner to look up actual statistics on that column, which will likely
> give a much better estimate, and which by the looks of it will likely
> result in one of those BRIN indexes being used.

This surprised me a bit and would have significant implications. I tested a few different tables in our system and get the same row count estimate with either WHERE condition. Perhaps I am missing a critical piece of what you said.

explain select * from charges where posted_on > now() - interval '10 days';
explain select * from charges where posted_on > 'now'::timestamptz - interval '10 days';
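If I understand David's caveat correctly, it can be sketched like this (reusing the charges table from the queries above):

```sql
-- 'now' is a special date/time input string that is resolved to a
-- concrete timestamp when the constant is coerced, i.e. at parse time.
-- Preparing a statement therefore freezes the cutoff:
PREPARE recent_charges AS
    SELECT * FROM charges
    WHERE posted_on > 'now'::timestamptz - interval '10 days';

-- Every EXECUTE reuses the timestamp captured at PREPARE time, so the
-- window silently stops moving forward:
EXECUTE recent_charges;

-- now(), by contrast, is evaluated at execution time, so the cutoff
-- advances with each run of the query.
```

The same freezing applies when the expression is baked into a view, function, or stored procedure body.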
Re: BRIN index which is much faster never chosen by planner
Thanks for closing the loop on the data correlation question. I've been playing with BRIN indexes on a log table of sorts, and this thread helped clear up some of the behavior I have been seeing.

I am curious: would a partial btree index fit your needs? Perhaps the maintenance overhead is too significant, or this is too off-the-wall, but a daily job to create a new index and concurrently drop the old one could give the performance you need while still saving the extra disk space of a full btree on the timestamp.

CREATE INDEX CONCURRENTLY log_table_rec_insert_time_partial_10_04
    ON log_table USING btree (rec_insert_time)
    WHERE rec_insert_time > '2019-10-04'::DATE;

DROP INDEX CONCURRENTLY IF EXISTS log_table_rec_insert_time_partial_10_03;

I would consider including the category column as well, but I suspect that would increase the size of the index significantly. Of course, this depends on the query planner evaluating "l.rec_insert_time >= now() - interval '10 days'" and determining that the index fulfills the need.
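As a quick sanity check of that last concern, one could compare plans for a constant cutoff versus the now()-based one. My understanding is that partial-index predicate proof needs plan-time constants, so the now() form may not qualify for the index; worth verifying with EXPLAIN on your version:

```sql
-- Constant cutoff: the planner can prove this condition implies the
-- index predicate (rec_insert_time > '2019-10-04'), so the partial
-- index is usable.
EXPLAIN SELECT * FROM log_table
WHERE rec_insert_time >= '2019-10-05'::date;

-- now() is stable rather than immutable; if the planner cannot prove
-- the implication at plan time, it will fall back to another plan
-- even though the rows actually satisfy the index predicate.
EXPLAIN SELECT * FROM log_table
WHERE rec_insert_time >= now() - interval '10 days';
```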
Proposal- GUC for max dead rows before autovacuum
To mitigate the need for per-table tuning of autovacuum configuration, I'd like to propose a new GUC: autovacuum_vacuum_threshold_max. Currently, it seems I can either set autovacuum_vacuum_scale_factor much smaller than the default on tables with millions of rows, or set a global value that means small tables are autovacuumed rarely. The default for the new setting could be -1 or 0 to disable the feature, or perhaps something like 100,000, so that tables with more than roughly 500,000 tuples become candidates for autovacuum sooner than they would with the current default values.
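To make the motivation concrete, here is the per-table workaround this proposal would reduce the need for, along with the arithmetic behind the numbers above (big_events is a hypothetical table name):

```sql
-- Today's workaround: override the global defaults table by table.
ALTER TABLE big_events SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- 1% instead of the 20% default
    autovacuum_vacuum_threshold    = 1000
);

-- With the defaults (threshold 50, scale_factor 0.2), autovacuum fires
-- after 50 + 0.2 * reltuples dead tuples. For a 10M-row table that is
-- ~2,000,050 dead tuples. A cap of autovacuum_vacuum_threshold_max =
-- 100,000 would kick in once 50 + 0.2 * reltuples exceeds 100,000,
-- i.e. for tables above roughly 500,000 rows.
```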