On 12/07/2014 03:54 AM, Tomas Vondra wrote:
> The one interesting case is the 'step skew' with statistics_target=10, i.e. estimates based on a mere 3000 rows. In that case, the adaptive estimator significantly overestimates:
>
>     values   current    adaptive
>     ------------------------------
>     106           99         107
>     106            8     6449190
>     1006          38     6449190
>     10006        327       42441
>
> I don't know why I didn't get these errors in the previous runs, because when I repeat the tests with the old patches I get similar results, with a 'good' result from time to time. Apparently I had a lucky day back then :-/
>
> I've been messing with the code for a few hours, and I haven't found any significant error in the implementation, so it seems that the estimator does not perform terribly well for very small samples (in this case it's 3000 rows out of 10.000.000, i.e. ~0.03%).

The paper [1] gives an equation for an upper bound of the error of this GEE estimator. How do the above numbers compare with that bound?
[1] http://ftp.cse.buffalo.edu/users/azhang/disc/disc01/cd1/out/papers/pods/towardsestimatimosur.pdf
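
For a rough sense of scale, here is a small sketch that computes the ratio error of each estimate in the quoted table and prints it next to sqrt(n/r). It rests on two assumptions that should be checked: that the 'values' column is the true number of distinct values, and that the bound in [1] grows with sqrt(n/r) up to a constant factor (the exact constant, and whether the bound holds in expectation only, needs to be confirmed against the paper). The function and variable names are made up for illustration and are not from the patch.

```python
import math


def ratio_error(estimate, actual):
    """Ratio error of an estimate: max(estimate/actual, actual/estimate)."""
    return max(estimate / actual, actual / estimate)


n = 10_000_000  # rows in the table (from the quoted message)
r = 3_000       # rows in the sample (from the quoted message)

# Assumption: the bound in [1] scales with sqrt(n/r); the constant is omitted.
scale = math.sqrt(n / r)

# (values, current, adaptive) rows copied from the quoted table.
# Assumption: 'values' is the actual number of distinct values.
rows = [
    (106, 99, 107),
    (106, 8, 6449190),
    (1006, 38, 6449190),
    (10006, 327, 42441),
]

# Ratio errors for the current and the adaptive estimator, per row.
results = [
    (actual, ratio_error(current, actual), ratio_error(adaptive, actual))
    for actual, current, adaptive in rows
]

print(f"sqrt(n/r) = {scale:.1f}")
for actual, err_current, err_adaptive in results:
    print(f"values={actual:6d}  current={err_current:10.2f}  adaptive={err_adaptive:10.2f}")
```

If those assumptions hold, the two 6449190 estimates are off by far more than a small multiple of sqrt(n/r), which would point at the implementation or the sample rather than at the estimator's theoretical limits.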
- Heikki
