Hi,

As I'm interested in this topic, I thought I'd take a look at the patch. I have no way to test it on high-end hardware, but I did some basic testing on my workstation and a basic review of the patch.
I somehow had the impression that, instead of creating a new connection for each restore item, we would create the worker processes at the start and then send them the dumpIds they should be restoring. That would allow the controller to batch dumpIds together and expect a worker to process each batch in a transaction. But this is probably just an idea I created in my head (a rough sketch of what I mean is at the end of this mail). Do we know why we experience "tuple concurrently updated" errors if we spawn threads too fast?

I completed some test restores using the pg_restore from head with the patch applied. The dump was a custom-format dump created with pg 8.2 and restored to an 8.2 database. To confirm this would work, I first completed a restore using the standard single-threaded mode. The schema restored successfully; the only errors reported involved non-existent roles.

When I attempt to restore using parallel restore, I get out of memory errors reported from _PrintData. The code returning the error is:

_PrintData(...

    while (blkLen != 0)
    {
        if (blkLen + 1 > ctx->inSize)
        {
            free(ctx->zlibIn);
            ctx->zlibIn = NULL;
            ctx->zlibIn = (char *) malloc(blkLen + 1);
            if (!ctx->zlibIn)
                die_horribly(AH, modulename, " out of memory\n");

            ctx->inSize = blkLen + 1;
            in = ctx->zlibIn;
        }

It appears from my debugging and from looking at the code that in _PrintData

    lclContext *ctx = (lclContext *) AH->formatData;

the memory context is shared across all workers, which means it's possible the memory contexts are stomping on each other. My GDB skills are not up to reproducing this in a gdb session, as there are forks going on all over the place, and if you process the items in a serial fashion there aren't any errors. I'm not sure of the fix for this, but in a parallel environment it doesn't seem possible to store the memory context in the AH (see the second sketch at the end of this mail for one idea).

I also receive messages saying "pg_restore: [custom archiver] could not read from input file: end of file". I have not investigated these further, as my current guess is that they are linked to the out of memory error.

Given that I ran into this error on my first testing attempt, I haven't evaluated much else at this point in time. All of this could be because I'm using an 8.2 archive, but it works fine in single-threaded restore mode. The dump file is about 400MB compressed, and an entire archive schema was removed from the restore path with a custom restore list. Command line used:

PGPORT=5432 ./pg_restore -h /var/run/postgresql -m4 --truncate-before-load -v -d tt2 -L tt.list /home/mr-russ/pg-index-test/timetable.pgdump 2> log.txt

I sent the log and this email originally to the list, but I think the attachment was too large, so I've resent without any attachments. Since my initial testing, Stefan has confirmed the problem I am having.

If you have any questions, or would like me to run other tests or anything else, feel free to contact me.

Regards

Russell
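PS: To make the "create the workers up front and feed them dumpIds" idea above a bit more concrete, here is a rough, untested sketch of the kind of controller/worker loop I was imagining. None of the names or the pipe protocol here come from the actual patch; it's just illustrative C for the batching idea:

/*
 * Hypothetical sketch only: fork a fixed pool of workers once, then feed
 * them dumpIds over pipes.  Nothing here is taken from the patch.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define NUM_WORKERS 4

static void
worker_loop(int readfd)
{
    int         dumpId;

    /* A real worker would open its database connection once, here. */
    while (read(readfd, &dumpId, sizeof(dumpId)) == (ssize_t) sizeof(dumpId))
    {
        /*
         * A real worker would wrap each batch of dumpIds in a single
         * BEGIN/COMMIT; here we just report what we were handed.
         */
        printf("worker %d: restore dumpId %d\n", (int) getpid(), dumpId);
    }
    _exit(0);
}

int
main(void)
{
    int         pipes[NUM_WORKERS][2];
    int         i;

    /* Create the workers up front, instead of one fork per TOC entry. */
    for (i = 0; i < NUM_WORKERS; i++)
    {
        if (pipe(pipes[i]) < 0)
        {
            perror("pipe");
            exit(1);
        }

        switch (fork())
        {
            case -1:
                perror("fork");
                exit(1);
            case 0:
                {
                    int         j;

                    /* worker: drop the write ends inherited from the controller */
                    for (j = 0; j <= i; j++)
                        close(pipes[j][1]);
                    worker_loop(pipes[i][0]);
                }
                break;          /* not reached */
            default:
                close(pipes[i][0]);     /* controller keeps the write end */
        }
    }

    /* Controller: hand out dumpIds (here just 1..20) round-robin. */
    for (i = 1; i <= 20; i++)
    {
        int         dumpId = i;

        if (write(pipes[i % NUM_WORKERS][1], &dumpId, sizeof(dumpId)) < 0)
            perror("write");
    }

    /* Closing the pipes is the "no more work" signal. */
    for (i = 0; i < NUM_WORKERS; i++)
        close(pipes[i][1]);
    while (wait(NULL) > 0)
        ;

    return 0;
}

The only point is that the forks (and the connections) would happen once, and the controller would decide how many dumpIds each worker gets per transaction.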
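PPS: If the shared lclContext really is what's behind the _PrintData failures, one possible direction might be to give each worker its own private copy of the format data right after it is spawned, so one worker can't free or resize a buffer another worker is still using. This is only a sketch against simplified, hypothetical struct definitions; the only fields taken from the real code are zlibIn and inSize quoted above:

/*
 * Hypothetical sketch of per-worker format data.  The structs are
 * simplified stand-ins, not the real pg_restore definitions.
 */
#include <stdlib.h>
#include <string.h>

typedef struct
{
    char       *zlibIn;         /* decompression input buffer */
    size_t      inSize;         /* allocated size of zlibIn */
    /* ... the real lclContext has much more than this ... */
} lclContext;

typedef struct
{
    void       *formatData;     /* stand-in for ArchiveHandle->formatData */
} ArchiveHandle;

/*
 * Run in each worker right after it is spawned: replace the inherited,
 * shared context with a private copy whose buffer pointers are reset,
 * forcing the worker to allocate its own zlib buffer.
 */
static void
clone_format_data(ArchiveHandle *AH)
{
    lclContext *shared = (lclContext *) AH->formatData;
    lclContext *mine = (lclContext *) malloc(sizeof(lclContext));

    if (!mine)
        abort();                /* die_horribly() in the real code */
    memcpy(mine, shared, sizeof(lclContext));

    mine->zlibIn = NULL;        /* don't inherit the parent's buffer */
    mine->inSize = 0;

    AH->formatData = mine;
}

int
main(void)
{
    lclContext  shared = {NULL, 0};
    ArchiveHandle AH;

    AH.formatData = &shared;
    clone_format_data(&AH);     /* what each worker would do after being spawned */

    return 0;
}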