I wrote:
> ... I didn't work on tightening selfuncs.c's dependencies.
> While I don't have a big problem with considering selfuncs.c to be
> in bed with the planner, that's risky in that whatever dependencies
> selfuncs.c has may well apply to extensions' selectivity estimators too.
> What I'm thinking about doing there is trying to split selfuncs.c into
> two parts, one being infrastructure that can be tightly tied to the
> core planner (and, likely, get moved under backend/optimizer/) and the
> other being estimators that use a limited API and can serve as models
> for extension code.

I spent some time looking at this, wondering whether it'd be practical to write selectivity-estimating code that hasn't #included pathnodes.h (nee relation.h). It seems not: even pretty high-level functions such as eqjoinsel() need access to fields like RelOptInfo.rows and SpecialJoinInfo.jointype. Now, there are only a few such fields, so conceivably we could provide accessor functions in optimizer.h for the commonly-used fields and keep the struct types opaque. I'm not excited about that though; it's unlike how we do things elsewhere in Postgres and the mere savings of one #include dependency doesn't seem to justify it.

So I'm thinking that planner support functions that want to do selectivity estimation will still end up pulling in pathnodes.h via selfuncs.h, and we'll just live with that.

However ... there are three pretty clearly identifiable groups of functions in selfuncs.c: operator-specific estimators, support functions, and AM-specific indexscan cost estimation functions. There's a case to be made for splitting them into three separate files. One point is that selfuncs.c is just too large; at over 8K lines, it's currently the 7th-largest hand-maintained C file in our tree. Another point is that as things stand, there's a strong temptation to bury useful functionality in static functions that can't be gotten at by extension estimators. Separating the support functions might help keep us more honest on that point. (Or not.)

I'm not sure whether those arguments are strong enough to justify the added pain-in-the-neck factor from moving a bunch of code around. That complicates back-patching bug fixes and it makes it harder to trace the git history of the code. So I've got mixed emotions about whether it's worth doing anything.

Thoughts?

regards, tom lane