Hi Tomas, > See section 3.2 in this paper (the "view PDF" does not work for me, but > the "source PDF" does download a postscript):
I believe that you are referring to a dynamic programming approach. It is a 1-dimensional case! To find an optimal solution in the M-dimensional case is an NP-hard problem. Regards, Alexander Cheshev On Mon, 8 Jan 2024 at 00:29, Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > > > On 1/7/24 23:53, Alexander Cheshev wrote: > > Hi Tomas, > > > >> The thing I was proposing is that it should be possible to build > >> histograms with bins adapted to density in the given region. With > >> smaller buckets in areas with high density. So that queries intersect > >> with fewer buckets in low-density parts of the histogram. > > > > This is how Equi-Depth Histogram works. Postgres has maller buckets in > > areas with high density: > > > > values[(i * (nvals - 1)) / (num_hist - 1)] > > > True, but the boundaries are somewhat random. Also, I was referring to > the multi-dimensional case, it wasn't clear to me if the proposal is to > do the same thing. > > >> I don't recall the details of the MHIST-2 scheme, but it's true > >> calculating "perfect" V-optimal histogram would be quadratic O(N^2*B). > > > > In M-dimensional space "perfect" V-Optimal Histogram is an NP-hard > > problem. In other words it is not possible to build it in polynomial > > time. How did you come up with the estimate?! > > > See section 3.2 in this paper (the "view PDF" does not work for me, but > the "source PDF" does download a postscript): > > https://citeseerx.ist.psu.edu/doc_view/pid/35e29cbc2bfe6662653bdae1fb89c091e2ece560 > > >> But that's exactly why greedy/approximate algorithms exist. Yes, it's > >> not going to produce the optimal V-optimal histogram, but so what? > > > > Greedy/approximate algorithm has time complexity O(M*N*B), where M > > equals the number of dimensions. MHIST-2 is a greedy/approximate > > algorithm. > > > >> And how does this compare to the approximate/greedy algorithms, both in > >> terms of construction time and accuracy? > > > > Time complexity of Equi-Depth Histogram has no dependence on B. > > > Really? I'd expect that to build B buckets, the algorithm repeat some > O(M*N) action B-times, roughly. I mean, it needs to pick a dimension by > which to split, then do some calculation on the N tuples, etc. > > Maybe I'm imagining that wrong, though. It's been ages since I looked ad > big-O complexity and/or the histograms. I'd have to play with those > algorithms for a bit again. > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company