Hi,

My suggestion is to keep two sets of histograms: one generated by running ANALYZE, and another generated dynamically from the entries written to the log during insert/update/delete operations. I am not sure how difficult it would be to read such record details from the logs.
Basically, from the details mentioned here, what I understand is that the table data (timestamps) grows incrementally: the existing data is not modified to any great extent, and new data is merely appended to the old. In this case, the only work needed for ANALYZE/statistics generation is to merge the histograms of the newly added records into the old histograms.

If we can treat this case as similar to merging histograms for a join of two tables (where we generate the histogram for the cartesian/result node), then all we need to do is somehow generate a temporary histogram for the new set of records and merge it with the old histogram. The values of the interesting columns in the new records can be read from the logging section. We must already have the histogram-merging code, and I hope it would not be so costly to make these calls that it affects the planner. (My further opinion is that we should calculate the cost of this histogram generation and use it in costing in some way.)

Further, we can put a threshold on when this merge happens automatically: only when the temporary histogram reaches some set threshold would it be merged into the older one.

Please pass on your inputs.

Regards,
Chetan

On Wed, Dec 30, 2009 at 8:38 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Josh Berkus <j...@agliodbs.com> writes:
> > My thoughts on dealing with this intelligently without a major change to
> > statstics gathering went along these lines:
>
> > 1. add columns to pg_statistic to hold estimates of upper and lower
> > bounds growth between analyzes.
>
> This seems like a fundamentally broken approach, first because "time
> between analyzes" is not even approximately a constant, and second
> because it assumes that we have a distance metric for all datatypes.
> (Note that convert_to_scalar does not assume that it can measure
> arbitrary distances, but only fractions *within* a histogram bucket;
> and even that is pretty shaky.)
>
> I don't have a better idea at the moment :-(
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
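For what it's worth, the merge step I have in mind could be sketched roughly as below. This is only a minimal illustration in Python, not PostgreSQL code; make_histogram and merge_histograms are hypothetical names, and re-sampling each bucket at its midpoint (weighted by depth) is just one crude way to approximate the merged distribution:

```python
def make_histogram(values, nbuckets):
    # Equi-depth histogram: nbuckets+1 boundary values plus a row count,
    # roughly like the histogram_bounds that ANALYZE stores.
    s = sorted(values)
    n = len(s)
    bounds = [s[min(round(i * (n - 1) / nbuckets), n - 1)]
              for i in range(nbuckets + 1)]
    return bounds, n

def merge_histograms(old, new, nbuckets):
    # Approximate the merged distribution: represent each bucket of each
    # input histogram by its midpoint, repeated in proportion to the
    # histogram's row count, then rebuild the boundaries from that pool.
    samples = []
    for bounds, count in (old, new):
        per_bucket = count / (len(bounds) - 1)
        for lo, hi in zip(bounds, bounds[1:]):
            samples.extend([(lo + hi) / 2.0] * max(int(per_bucket), 1))
    merged_bounds, _ = make_histogram(samples, nbuckets)
    return merged_bounds, old[1] + new[1]
```

With a threshold policy as suggested above, the "new" histogram would be built from logged rows and merged into the old one only once its row count crosses some configured limit.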