I made a small modification to pg_dump to prevent parallel backup failures
caused by exclusive lock requests made by other tasks.

The modification takes shared locks for each parallel backup worker at the
very beginning of the job. That way, any other job that attempts to acquire
exclusive locks will wait for the backup to finish.

In my case, each server was taking a day to complete the backup; now, with
parallel backup, one takes 3 hours and the others less than an hour.

The code below is not very elegant, but it works for me. My wishlist for
the backup is:

1) replace the plpgsql with C code that reads the backup TOC and assembles
the locks;
2) add a timeout to the locks;
3) broadcast the end of copy to every worker in order to release the locks
as early as possible;
4) create a monitor thread that prioritizes a copy job when an exclusive
lock is requested on its table;
5) grant the lock to another connection of the same distributed transaction
if it is already held by any connection of that transaction. Is there some
side effect I am not seeing?

1 to 4 are within my capabilities and I may do them in the future. 5 is too
advanced for me and I do not dare to mess with something so fundamental
right now.

Is anyone else working on this?

In Parallel.c, in void RunWorker(...), add:

PQExpBuffer query;
PGresult   *res;

query = createPQExpBuffer();
appendPQExpBufferStr(query,
    "do language 'plpgsql' $$"
    " declare"
    "    x record;"
    " begin"
    "    for x in select * from pg_tables where schemaname not in"
    " ('pg_catalog','information_schema') loop"
    "        raise info 'lock table %.%', x.schemaname, x.tablename;"
    "        execute 'LOCK TABLE '"
    " || quote_ident(x.schemaname) || '.' || quote_ident(x.tablename)"
    " || ' IN ACCESS SHARE MODE';"
    "    end loop;"
    " end;"
    "$$");

res = PQexec(AH->connection, query->data);

if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
    exit_horribly(modulename,
                  "Could not lock the tables to begin the backup\n");

PQclear(res);
destroyPQExpBuffer(query);
