On Wed, Jul 2, 2025 at 6:44 PM Andrei Lepikhov <lepi...@gmail.com> wrote:
> On 2/7/2025 11:14, Richard Guo wrote:
> > On Wed, Jul 2, 2025 at 4:32 PM Andrei Lepikhov <lepi...@gmail.com> wrote:
> >> Therefore, it would be better to find a way to refactor the
> >> `preprocess_relation_rtes` function to gather table statistics lazily
> >> into the hash table when they are needed. For example, we could do this
> >> at the moment of creating the `RelOptInfo` or before a subquery pull-up,
> >> without modifying the RTE at all.
> > All the catalog information collected in preprocess_relation_rtes() is
> > needed very early in the planner. I don't see how we could move that
> > logic to a later stage, such as at the moment of creating RelOptInfos
> > as you mentioned.
> I apologise for the confusion in my previous message. I am not
> suggesting that we postpone this. Instead, I would like an explanation
> of why you believe that accessing the table statistics earlier could
> negatively impact planner performance. As I mentioned before, I have
> only envisioned rare instances where join eliminations may reduce the
> number of relations and clause evaluations resulting in a constant.

I wonder how you arrived at the conclusion that these cases are rare.
If they truly are, then why have we invested so much effort in
optimizing for them?

I also wonder why you think we should collect all catalog information
at the very early stage of the planner, given that most of it is only
used much later -- after RelOptInfos have been created. If the goal is
to avoid redundant catalog retrieval for the same relation in
get_relation_info(), perhaps adding a caching mechanism within that
function would be a more targeted solution. I don't see a strong
reason for moving get_relation_info() to the very beginning of the
planner.

Thanks
Richard
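
For readers following the thread, the caching idea mentioned above can be sketched roughly as follows. This is a standalone C illustration, not PostgreSQL code and not a proposed patch: RelId, RelCatalogInfo, RelInfoCache, load_from_catalog() and cached_relation_info() are all hypothetical names, and the loader is a stub standing in for whatever catalog access get_relation_info() actually performs. The only point is that the expensive lookup runs once per relation and later requests reuse the stored result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a relation OID. */
typedef uint32_t RelId;

/* Hypothetical subset of the per-relation data a planner might want. */
typedef struct RelCatalogInfo
{
    RelId  relid;
    double pages;
    double tuples;
} RelCatalogInfo;

#define CACHE_SLOTS 64          /* must be a power of two */

/* Fixed-size open-addressing table; entries are never deleted. */
typedef struct RelInfoCache
{
    RelCatalogInfo entries[CACHE_SLOTS];
    bool           used[CACHE_SLOTS];
    int            nloads;      /* how many times the loader actually ran */
} RelInfoCache;

/* Stub for the expensive catalog lookup (assumption: made-up numbers). */
static RelCatalogInfo
load_from_catalog(RelId relid)
{
    RelCatalogInfo info = { relid, (double) (relid % 1000), (double) relid * 10.0 };
    return info;
}

/*
 * Return the cached entry for relid, loading it on first use.
 * Returns NULL only if the table is full and relid is not present.
 */
static const RelCatalogInfo *
cached_relation_info(RelInfoCache *cache, RelId relid)
{
    assert(cache != NULL);

    size_t start = relid & (CACHE_SLOTS - 1);

    for (size_t i = 0; i < CACHE_SLOTS; i++)
    {
        size_t slot = (start + i) & (CACHE_SLOTS - 1);

        if (cache->used[slot])
        {
            if (cache->entries[slot].relid == relid)
                return &cache->entries[slot];   /* hit: no catalog access */
            continue;                           /* collision: keep probing */
        }

        /* First request for this relation: load once and remember it. */
        cache->entries[slot] = load_from_catalog(relid);
        cache->used[slot] = true;
        cache->nloads++;
        return &cache->entries[slot];
    }
    return NULL;
}
```

With a zero-initialized RelInfoCache, calling cached_relation_info() twice for the same relid returns the same pointer and leaves nloads at 1. Where such a cache would live and how long it would stay valid in the real planner are open questions that this sketch does not address.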