On Mon, Nov 20, 2017 at 12:25 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Antonin Houska <a...@cybertec.at> writes:
>> Robert Haas <robertmh...@gmail.com> wrote:
>>> These two phases overlap, though. I believe progress reporting for
>>> sorts is really hard.
>
>> Whatever complexity is hidden in the sort, cost_sort() should have taken it
>> into consideration when called via plan_cluster_use_sort(). Thus I think that
>> once we have both startup and total cost, the current progress of the sort
>> stage can be estimated from the current number of input and output
>> rows. Please remind me if my proposal appears to be too simplistic.
>
> Well, even if you assume that the planner's cost model omits nothing
> (which I wouldn't bet on), its result is only going to be as good as the
> planner's estimate of the number of rows to be sorted. And, in cases
> where people actually care about progress monitoring, it's likely that
> the planner got that wrong, maybe horribly so. I think it's a bad idea
> for progress monitoring to depend on the planner's estimates in any way
> whatsoever.
I agree. I have been of the opinion all along that progress monitoring
needs to report facts, not theories. The number of tuples read thus far
is a fact, and is fine to report for whatever value it may have to
someone. The number of tuples that will be read in the future is a
theory, and as you say, progress monitoring is most likely to be used
in cases where theory and practice ended up being very different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
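To make the facts-versus-theories distinction concrete, here is a minimal C sketch. The struct SortProgress and the progress_* functions are hypothetical illustrations, not PostgreSQL's actual progress-reporting API. The counters record only what has actually happened, while progress_estimated_fraction() divides by the planner's row estimate and therefore inherits whatever error that estimate carries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical progress counters for a sort phase; the names are
 * illustrative only and do not correspond to PostgreSQL internals.
 */
typedef struct SortProgress
{
    int64_t tuples_read;        /* fact: tuples consumed so far */
    int64_t tuples_written;     /* fact: tuples emitted so far */
    int64_t planner_est_rows;   /* theory: planner's row estimate */
} SortProgress;

/* Record one input tuple; only observed events change the counters. */
static void
progress_tuple_read(SortProgress *p)
{
    assert(p != NULL);
    p->tuples_read++;
}

/*
 * Estimate-based completion fraction.  Its accuracy depends entirely on
 * planner_est_rows: if the planner underestimated the row count, the
 * result exceeds 1.0 long before the sort finishes; if it overestimated,
 * the result stays low and then jumps when the sort completes.
 */
static double
progress_estimated_fraction(const SortProgress *p)
{
    assert(p != NULL);
    if (p->planner_est_rows <= 0)
        return 0.0;
    return (double) p->tuples_read / (double) p->planner_est_rows;
}

/* Fact-only report: states what has happened, predicts nothing. */
static void
progress_report_facts(const SortProgress *p, FILE *out)
{
    assert(p != NULL && out != NULL);
    fprintf(out, "tuples read: %lld, tuples written: %lld\n",
            (long long) p->tuples_read, (long long) p->tuples_written);
}
```

For example, if planner_est_rows is 1000 but 5000 tuples are actually read, progress_estimated_fraction() returns 5.0, i.e. "500% done", whereas the counters printed by progress_report_facts() remain accurate regardless of how wrong the estimate was.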