On Thu, Jul 7, 2022 at 1:53 PM Greg Stark <st...@mit.edu> wrote: > > - -> Parallel Seq Scan on tenk1 (actual rows=1960 loops=50) > > + -> Parallel Seq Scan on tenk1 (actual rows=1960.00 > > At the not inconsiderable risk of bike-shedding.... > > I'm wondering if printing something like 0.00 will be somewhat > deceptive when the real value is non-zero but less than 1 row per 200 > loops. I wonder if the number of decimal places should be calculated > to produce a minimum of one non-zero digit for non-zero values.
I mean, what I really want here if I'm honest is to not have the system divide the number of rows by the loop count. And it sort of sounds like maybe that's what you want, too. You want to know whether the loop count is actually zero, not whether it's close to zero when you divide it by some number that might be gigantic. Parallel query's treatment of this topic has come in for some criticism, but I don't know what else it could really do: there could be any number of loops in each backend, and it need not be the same across all backends, so all it can do is add up the loop counts just like it adds up the row counts and times. And if we just printed out those totals, the result would be understandable by everyone. But we insist on dividing it by the loop count, and then things get really obscure. Consider this example, which does not involve parallel query: Nested Loop (actual time=TIME FOR THIS AND ALL CHILDREN rows=THE REAL ROW COUNT loops=1) -> Seq Scan on something (actual time=THE TIME IT REALLY TOOK rows=THE REAL ROW COUNT loops=1) -> Index Scan using someidx on somethingelse (actual time=NOT REALLY HOW LONG IT TOOK rows=NOT REALLY HOW MANY ROWS WE GOT loops=HUGE NUMBER) If I'm looking at this plan and trying to find out what's gone wrong, I want to know how much time got spent in the nested loop, how much time got spent in the Seq Scan, and how much time got spent in the Index Scan. It's easy to figure out how much time got spent in the Seq Scan, but to find out how much time got spent in the Index Scan, I have to multiply the time by the loop count. Then, I have to add that number to the time spent in the Seq Scan and subtract that from the time from the nested loop to find the time spent on the nested loop itself. This is quite a lot of computation, especially if the plan involves a dozen or two different nested loops, and if we didn't insist on dividing the time by the loop count, it would be MUCH EASIER to figure out whether the time spent in the Index Scan is a significant percentage of the total time or not. And likewise, if you're trying to understand the row count for the nested loop, it would be a heck of a lot simpler if you could see the *raw* row count for the index scan. It's unclear to me what value there ever is in knowing that the number of rows per iteration was about 0 or about 1 or about 2. The only thing I'm ever going to do with the row count that gets printed here is multiply it by the loop count and then try to figure out how much precision I've lost because of limits on the number of decimal places. Right now that's basically all of it because nearly every case ends up with the index scan having rows=1, so even just adding 2 decimal places will help a lot. But I'm still just going to be reverse engineering what I really want to know, which is the original number, from what the system gives me, which is a needlessly-obfuscated version of that value. Grumble, grumble. It's sad that it's been 13 years and we haven't done anything about this. -- Robert Haas EDB: http://www.enterprisedb.com