Hi Greg, On Tue, Oct 1, 2019 at 4:03 AM Greg Stark <st...@mit.edu> wrote: > > Actually -- I'm sorry to followup to myself (twice) -- but that's > wrong. That Todo item predates the modern partitioning code. It came > from when the partitioned statistics were added for inheritance trees. > The resulting comment almost doesn't make sense any more since it > talks about updates to the parent table and treats them as distinct > from updates to the children. > > In any case it's actually not true any more as updates to the parent > table aren't even tracked any more -- see below. My modest proposal is > that we should count any updates that arrive through the parent table > as mods for both the parent and child.
Yeah, we need to teach autovacuum to consider analyzing partitioned tables. That is still a TODO for declarative partitioning. We do need to weigh the trade-offs here. In the thread quoted in your previous email, Tom expresses a concern [1] about ending up doing excessive work, because partitions would be scanned twice -- first to collect their own statistics and then to collect the parent's when the parent table is analyzed. Maybe if we find a way to calculate parent's stats from the partitions' stats without scanning the partitions, that would be great. Another thing to consider is that users now (as of v11) have the option of using partitionwise plans. Consider joining two huge partitioned tables. If they are identically partitioned, Postgres planner considers joining pairs of matching partitions and appending the outputs of these smaller joins. In this case, even if the non-partitionwise join couldn't use hash join, individual smaller joins could, because partition stats would be up to date. The requirements that the tables being joined be identically partitioned (or be partitioned at all) might be a bit too restrictive though. > A more ambitious proposal would have updates to the children also > count against the parent somehow but I'm not sure exactly how. And I'm > not sure we shouldn't be updating the parent statistics whenever we > run analyze on a child anyways but again I'm not sure how. As I mentioned above, we could try to figure out a way to "merge" the individual partitions' statistics when they're refreshed into the parent's stats. Thanks, Amit [1] https://www.postgresql.org/message-id/489.1276114285%40sss.pgh.pa.us