I did some more digging. To keep everyone on the same page there are four different ways to order TOCs:
1. topological order, 2. dataLength order, size of the table, is always zero when pg_dump can't seek, 3. dumpId order, which should be thought as random but roughly correlates to topological order to make things fun, 4. file order, the order that tables are physically stored in the custom dump file. Without being able to seek backwards a parallel restore of the custom dump archive format has to be ordered by #1 and #4. The reference counting that reduce_dependencies does inside of the parallel restore logic upholds ordering #1. Unfortunately, 548e50976ce changed TocEntrySizeCompare (which is used to break ties within #1) to order by #2, then by #3. This most often breaks on dumps written by pg_dump without seeks (everything has a dataLength of zero) as it then falls back to #3 ordering every time. But, because nothing in pg_restore does any ordering by #4 you could potentially run into this with any custom dump so I think it's a regression. For some troubleshooting I changed ready_list_sort to never call qsort. This fixes the problem by never ordering by #3, leaving things in #4 order, but breaks the new algorithm introduced in 548e50976ce. I did what Justin suggested earlier and it works great. Parallel restore requires seekable input (enforced elsewhere) so everyone's parallel restores should work again. On Wed, May 20, 2020 at 10:48 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > > David Gilman <davidgilm...@gmail.com> writes: > >> I think the PG11 > >> commit you mentioned (548e5097) happens to make some databases fail in > >> parallel restore that previously worked (I didn't check). > > > Correct, if you do the bisect around that yourself you'll see > > pg_restore start failing with the expected "possibly due to > > out-of-order restore request" on offset-less dumps. > > Yeah. Now, the whole point of that patch was to decouple the restore > order from the dump order ... but with an offset-less dump file, we > can't do that, or at least the restore order is greatly constrained. > I wonder if it'd be sensible for pg_restore to use a different parallel > scheduling algorithm if it notices that the input lacks offsets. > (There could still be some benefit from parallelism, just not as much.) > No idea if this is going to be worth the trouble, but it probably > is worth looking into. > > regards, tom lane -- David Gilman :DG<
0001-pg_restore-fix-v2.patch
Description: Binary data