Tom Lane WROTE. > So after committing the latest round of parameterized-plan hacking, > I was dismayed to see the buildfarm breaking out in pink, thanks to > some of the members producing a different plan than expected for one > test query. I eventually managed to reproduce that (on the fourth > machine I tried locally), and after some quality time with gdb > I understand what is happening, to wit: the two alternative plans have > exactly the same cost so far as our cost model is concerned.
> The idea that I'm toying with is to try to make the choice a bit less > platform-specific, by removing the exact cost test that add_path uses > as its last-ditch comparison step, essentially this: > > /* > * Same pathkeys and outer rels, and fuzzily > * the same cost, so keep just one; to decide > * which, first check rows and then do an > * exact cost comparison. > */ > if (new_path->rows < old_path->rows) > remove_old = true; /* new dominates old */ > - else if (new_path->rows > old_path->rows) > - accept_new = false; /* old dominates new */ > - else if (compare_path_costs(new_path, old_path, > - TOTAL_COST) < 0) > - remove_old = true; /* new dominates old */ > else > accept_new = false; /* old equals or dominates new */ > > This change would mean that, when two paths have the same pathkeys, > parameterization, and rowcount, and fuzzily the same cost, that we > arbitrarily keep the first-submitted one rather than looking at low > order digits of the costs. Since the order in which different paths > are generated is a property of our code and not platform-specific, > this should eliminate platform dependencies in cases where two paths > are essentially identical to the cost model. > > A variant idea would be to replace the exact cost comparison with a > second round of fuzzy cost comparison, but with a much tighter fuzz > factor, maybe 1e-6 instead of 0.01. > > Now, neither of these fixes is perfect: what they would do is remove > platform-specific instability from where the costs are basically equal > and add some more in the range where the costs differ by almost exactly > the fuzz factor. But the behavior near that point is platform-specific > already, just not quite as much, and it's surely something we're > unlikely to trip over in the regression tests. What if you remove the exact cost comparison, but leave the part where old dominates new based on rows? That should not add any new instabilities compared to the code as it is. Of course, if the row count estimates are subject to the same problem, it would solve nothing. Perhaps it is worth trying on the buildfarm. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers