On Wed, Jul 2, 2025 at 6:44 PM Andrei Lepikhov <lepi...@gmail.com> wrote:
> On 2/7/2025 11:14, Richard Guo wrote:
> > On Wed, Jul 2, 2025 at 4:32 PM Andrei Lepikhov <lepi...@gmail.com> wrote:
> >> Therefore, it would be better to find a way to refactor the
> >> `preprocess_relation_rtes` function to gather table statistics lazily
> >> into the hash table when they are needed. For example, we could do this
> >> at the moment of creating the `RelOptInfo` or before a subquery pull-up,
> >> without modifying the RTE at all.

> > All the catalog information collected in preprocess_relation_rtes() is
> > needed very early in the planner.  I don't see how we could move that
> > logic to a later stage, such as at the moment of creating RelOptInfos
> > as you mentioned.

> I apologise for the confusion in my previous message. I am not
> suggesting that we postpone this. Instead, I would like an explanation
> of why you believe that accessing the table statistics earlier could
> negatively impact planner performance. As I mentioned before, I have
> only envisioned rare instances where join eliminations may reduce the
> number of relations and clause evaluations resulting in a constant.

I wonder how you arrived at the conclusion that these cases are rare.
If they truly are, then why have we invested so much effort in
optimizing for them?

I also wonder why you think we should collect all catalog information
at the very early stage of the planner, given that most of it is only
used much later -- after RelOptInfos have been created.  If the goal
is to avoid redundant catalog retrieval for the same relation in
get_relation_info(), perhaps adding a caching mechanism within that
function would be a more targeted solution.  I don't see a strong
reason for moving get_relation_info() to the very beginning of the
planner.
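
As an illustration of the caching idea, a minimal sketch in plain C might
look like the following. It is not PostgreSQL code: the Oid typedef, the
RelCatalogInfo and RelInfoCache structs, load_from_catalog() and
get_relation_info_cached() are all hypothetical stand-ins, and the linear
scan over a fixed array only stands in for whatever hash table a real patch
would use. The point is just that the expensive lookup runs once per
relation OID, no matter how many times the relation is referenced.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

/* Hypothetical subset of the per-relation data fetched from the catalogs. */
typedef struct RelCatalogInfo
{
    Oid     relid;
    double  tuples;
    int     pages;
} RelCatalogInfo;

#define REL_CACHE_SIZE 64

/* Hypothetical per-planner-run cache; zero-initialize before first use. */
typedef struct RelInfoCache
{
    RelCatalogInfo entries[REL_CACHE_SIZE];
    int     nentries;
    int     nlookups;           /* number of simulated catalog reads */
} RelInfoCache;

/* Stand-in for the expensive catalog access; the values are made up. */
void
load_from_catalog(RelInfoCache *cache, Oid relid, RelCatalogInfo *out)
{
    cache->nlookups++;
    out->relid = relid;
    out->tuples = 1000.0 * relid;
    out->pages = (int) (10 * relid);
}

/*
 * Return cached info for relid, loading it on first use only.
 * Returns NULL if the fixed-size cache is full.
 */
const RelCatalogInfo *
get_relation_info_cached(RelInfoCache *cache, Oid relid)
{
    RelCatalogInfo *slot;
    int     i;

    assert(cache != NULL);

    /* Cache hit: the same relation was already looked up. */
    for (i = 0; i < cache->nentries; i++)
    {
        if (cache->entries[i].relid == relid)
            return &cache->entries[i];
    }

    if (cache->nentries >= REL_CACHE_SIZE)
        return NULL;

    /* Cache miss: do the expensive lookup once and remember the result. */
    slot = &cache->entries[cache->nentries++];
    load_from_catalog(cache, relid, slot);
    return slot;
}
```

With something along these lines, a query that references the same table
several times would pay for the catalog access once, and nothing would need
to move to the very beginning of the planner.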

Thanks
Richard

