> I'm still working my way around the math, but copulas sound better
> than anything else I've been playing with.

I think the easiest way to think of them is that, in 2-D finite spaces, they are just a plot of the order statistics against one another. Feel free to mail me off list if you have any math questions.

I've previously thought that, at least in the 2-D case, we could use image compression algorithms to compress the copula, but recently I've realized that this is a change point problem. In terms of compression, we want to decompose the copula into regions that are as homogeneous as possible. I'm not familiar with change point problems in multiple dimensions, but I'll try to ask someone who is, probably late next week. If you decide to go the copula route, I'd be happy to write the decomposition algorithm - or at least work on the theory.

Finally, a couple of points that I hadn't seen mentioned earlier that should probably be considered:

1) NULLs need to be treated specially - I suspect the assumption of NULL independence is worse than other independence assumptions. Maybe dealing with NULL dependence could be a half step towards full dependence calculations?

2) Do we want to fold the MCVs into the dependence histogram? That will cause problems in our copula approach, but I'd hate to have to keep an N^d histogram dependence relation in addition to the copula.

3) For equality selectivity estimates, I believe the assumption that the ndistinct value distribution is uniform in the histogram will become worse as the dimension increases. I proposed keeping track of ndistinct per histogram bucket earlier in the marginal case, partially motivated by this exact scenario. Does that proposal make more sense in this case? If so, we'd need to store two distributions - one of the counts and one of ndistinct.

4) How will this approach deal with histogram buckets that have scaling count sizes (i.e. -0.4)?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
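
To make the "plot of the order statistics against one another" description concrete, here is a minimal Python sketch of an empirical 2-D copula, binned into a coarse grid. It is only an illustration under stated assumptions, not proposed planner code: the names `ranks`, `copula_grid` and `bins` are hypothetical, ties simply share the highest rank, and NULLs and MCVs are assumed to have been removed beforehand.

```python
from bisect import bisect_right


def ranks(values):
    """Return the 1-based rank of each value (ties share the highest rank)."""
    ordered = sorted(values)
    return [bisect_right(ordered, v) for v in values]


def copula_grid(xs, ys, bins):
    """Count rows per cell of a bins x bins grid over the rank-rank plane.

    Each column is replaced by its ranks, so the marginal distributions
    drop out and only the dependence between the columns remains.
    """
    n = len(xs)
    grid = [[0] * bins for _ in range(bins)]
    for rx, ry in zip(ranks(xs), ranks(ys)):
        # Integer arithmetic maps ranks 1..n onto cell indices 0..bins-1.
        grid[(rx - 1) * bins // n][(ry - 1) * bins // n] += 1
    return grid


# Perfectly correlated columns put all mass on the diagonal.
correlated = copula_grid([10, 20, 30, 40], [1, 2, 3, 4], bins=2)

# Perfectly anti-correlated columns put all mass on the anti-diagonal.
anti_correlated = copula_grid([10, 20, 30, 40], [4, 3, 2, 1], bins=2)
```

Under independence every cell would hold roughly n / bins^2 rows, so the decomposition problem described above amounts to finding regions of this grid where the counts are close to constant.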