Re: [HACKERS] two dimensional statistics in Postgres

Gavin Flower Thu, 06 Nov 2014 02:52:08 -0800

On 06/11/14 23:15, Katharina Büchse wrote:

Hi,
I'm a PhD student at the University of Jena, Thüringen, Germany, in the field of databases, more precisely query optimization. I want to implement a system in PostgreSQL that detects column correlations and creates statistical data about correlated columns for the optimizer. Therefore I need to store two-dimensional statistics (especially two-dimensional histograms) in PostgreSQL.

I had a look at the description of "WIP: multivariate statistics / proof of concept", which looks really promising. I guess these statistics are based on scans of the data? For my system I need both: statistical data based on table scans (actually, samples are enough) and those based on query feedback. Query feedback (tuple counts and, speaking a little inaccurately, the WHERE part of the query itself) needs to be extracted, and the optimizer needs to decide when to use multivariate statistics and when to use the one-dimensional ones. Oracle in this case just disables one-dimensional histograms if there is already a multidimensional histogram, but this is not always useful, especially in the case of a feedback-based histogram (which might not cover the whole data space).

I want to use both kinds of histograms because correlations might occur only in parts of the data. In this case a histogram based on a sample of the whole table might miss the correlation and wouldn't help for the part of the data the user seems to be interested in.

There are special data structures for storing multidimensional histograms based on feedback, and I have already tried to implement one of these in C. In the case of two dimensions they are of course not "for free" (one-dimensional would be much cheaper), but based on the principle of maximum entropy they deliver really good results.

I decided on only two dimensions because in this case we have the best ratio of cost to benefit when searching for correlation (here I'm relying on tests that were made in DB2 within a project called CORDS, which detects correlations even between different tables).
I'd be grateful for any advice and discussion.
Regards,

Katharina

Could you store a two-dimensional histogram in a one-dimensional array: A[z] = value, where z = col * rowSize + row (zero starting index)?
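
A minimal sketch in C of the flattening suggested above, assuming that "rowSize" means the number of buckets along the row dimension, so that cells are stored column by column. The Hist2D type and the hist2d_* functions are hypothetical names made up for illustration; they are not part of PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical illustration type; not a PostgreSQL structure. */
typedef struct Hist2D
{
    size_t  ncols;   /* number of buckets along the column dimension */
    size_t  nrows;   /* number of buckets along the row dimension ("rowSize") */
    double *cells;   /* ncols * nrows values, stored column by column */
} Hist2D;

/* Map a zero-based (col, row) pair to its position in the flat array. */
static size_t
hist2d_index(const Hist2D *h, size_t col, size_t row)
{
    assert(col < h->ncols && row < h->nrows);
    return col * h->nrows + row;
}

/* Allocate a histogram with every cell set to zero; NULL on failure. */
static Hist2D *
hist2d_create(size_t ncols, size_t nrows)
{
    Hist2D *h = malloc(sizeof(Hist2D));

    if (h == NULL)
        return NULL;
    h->ncols = ncols;
    h->nrows = nrows;
    /* calloc zero-fills, which reads as 0.0 for IEEE 754 doubles */
    h->cells = calloc(ncols * nrows, sizeof(double));
    if (h->cells == NULL)
    {
        free(h);
        return NULL;
    }
    return h;
}

/* Add a value (for example a tuple count) to one cell. */
static void
hist2d_add(Hist2D *h, size_t col, size_t row, double value)
{
    h->cells[hist2d_index(h, col, row)] += value;
}

/* Read one cell. */
static double
hist2d_get(const Hist2D *h, size_t col, size_t row)
{
    return h->cells[hist2d_index(h, col, row)];
}

/* Release the histogram and its cell array. */
static void
hist2d_free(Hist2D *h)
{
    if (h != NULL)
    {
        free(h->cells);
        free(h);
    }
}
```

With 3 column buckets and 4 row buckets, cell (col 1, row 2) lands at z = 1 * 4 + 2 = 6. This layout only fits a regular grid of equal-sized buckets; feedback-based histograms with irregular or overlapping buckets would probably need a different structure.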



Cheers,
Gavin




