On 11/17/21 16:39, Xiaozhe Yao wrote:
Hi Tom,

Thanks for your feedback. I completely agree with you that a higher-level hook is better suited for this case. I have attached the adjusted PoC patch to this email.

Now it is located in the clauselist_selectivity_ext function, where we first check if the hook is defined. If so, we let the hook estimate the selectivity and return the result. With this one, I can also develop extensions to better estimate the selectivity.


I think clauselist_selectivity is the right level, because this is pretty similar to what extended statistics are doing. I'm not sure if the hook should be called in clauselist_selectivity_ext or in the plain clauselist_selectivity. But it should be in clauselist_selectivity_or too, probably.

The way the hook is used seems pretty inconvenient, though. I mean, if you do this

    if (clauselist_selectivity_hook)
        return clauselist_selectivity_hook(...);

then what will happen when the ML model has no information applicable to a query? The hook is called for all relations, all conditions, etc., and since you've short-circuited all the regular code, the hook will have to copy all of that. Seems pretty silly and fragile.

IMO the right approach is what statext_clauselist_selectivity is doing, i.e. estimate the clauses it can, mark them as estimated in a bitmap, and let the rest of the existing code take care of the remaining clauses. So something more like

    if (clauselist_selectivity_hook)
        s1 *= clauselist_selectivity_hook(..., &estimatedclauses);
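
To make the difference concrete, here is a rough standalone sketch of the "estimate what you can, mark it, leave the rest" pattern. It is not PostgreSQL code: the Clause struct, toy_clauselist_selectivity, example_hook and the plain unsigned bitmask (standing in for a Bitmapset, so at most 32 clauses) are all made up for illustration, and the hook signature is only an assumption about what the elided arguments might look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a clause; real code would use RestrictInfo nodes. */
typedef struct Clause
{
    int     column_id;      /* column the clause filters on */
    double  default_sel;    /* what the regular code would estimate */
} Clause;

/*
 * Hypothetical hook type: returns the combined selectivity of the clauses
 * it could estimate and sets the matching bits in *estimatedclauses.
 */
typedef double (*clauselist_selectivity_hook_type) (const Clause *clauses,
                                                    int nclauses,
                                                    unsigned *estimatedclauses);

static clauselist_selectivity_hook_type clauselist_selectivity_hook = NULL;

/* Toy version of the caller: the hook goes first, regular code does the rest. */
static double
toy_clauselist_selectivity(const Clause *clauses, int nclauses)
{
    double      s1 = 1.0;
    unsigned    estimatedclauses = 0;

    /* one bit per clause in this sketch */
    assert(nclauses >= 0 && nclauses <= 32);

    if (clauselist_selectivity_hook)
        s1 *= clauselist_selectivity_hook(clauses, nclauses, &estimatedclauses);

    for (int i = 0; i < nclauses; i++)
    {
        /* skip clauses the hook already accounted for */
        if (estimatedclauses & (1u << i))
            continue;
        s1 *= clauses[i].default_sel;
    }

    return s1;
}

/* Example hook that only has information about column 7. */
static double
example_hook(const Clause *clauses, int nclauses, unsigned *estimatedclauses)
{
    double  sel = 1.0;

    for (int i = 0; i < nclauses; i++)
    {
        if (clauses[i].column_id == 7)
        {
            sel *= 0.5;     /* pretend this came from the model */
            *estimatedclauses |= 1u << i;
        }
    }

    return sel;
}
```

With clauses on columns 7 and 3 (default selectivities 0.1 and 0.2), the regular code alone gives 0.1 * 0.2 = 0.02, while with example_hook installed the result is 0.5 * 0.2 = 0.1. If the hook knows nothing about any clause, it returns 1.0 with no bits set and the regular estimates apply unchanged, so it never has to copy the existing code.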


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

