worthy hackers,
I propose the below patches to parallels.c and pg_backup_utils.c fixing 
deadlocks in pg_restore (windows only) if running more than 2 parallel jobs.
This problem was reported by me earlier this year.
http://www.postgresql.org/message-id/20160307161619.25731.78...@wrigleys.postgresql.org

- Winsock's "recv(...)" called in piperead() is a blocking read by default, 
therefor, signalizing termEvent as used in ShutdownWorkersHard() is not enough 
to make worker-threads go away.
We need a preceding shutdown(pipeWrite, SD_BOTH), first, to abort blocking IO 
in this case.
Otherwise, the main-thread will wait forever, if more than one additional 
worker is active (e.g. option -j3) and a premature EOF occurs in the input-file.

Findings in pg_backup_utils.c/ parallels.c, which could impact other tools, too:

- threads created with _beginthreadex need to be exited by either a "return 
exitcode"  or "_endthreadex(exitcode)". It might be obsolete in 
fire-and-forget-scenarios, but it matters in other cases.
As of current, pg_backup_utils uses EndThread to retire additional 
worker-threads., which are spawned by _beginthreadex in parallel.c. The 
corresponding call for ExitThread would be CreateThread,
nevertheless, _beginthreadex is the correct choice here, as we do call-out into 
CRT and need to retain the thread-handle for after-death synchronization with 
the main-thread.
The thread-handle needs to be closed explicitly.

If this is not the correct place to discuss patches, I'd be glad if somebody 
can notify the tool's maintainer, to take a look into it.

best regards,
Armin Schöffmann.


--
Aegaeon technologies GmbH
phone: +49.941.8107344
fax:   +49.941.8107356

Legal disclaimer:
http://aegaeon.de/disclaimer/email_all_int.txt


parallel.c

@@ -350,7 +350,8 @@ static void
 ShutdownWorkersHard(ParallelState *pstate)
 {
-#ifndef WIN32
+
        int                     i;
 
+#ifndef WIN32
        signal(SIGPIPE, SIG_IGN);
 
@@ -367,4 +368,7 @@ ShutdownWorkersHard(ParallelState *pstate)
        /* The workers monitor this event via checkAborting(). */
        SetEvent(termEvent);
+
+       for (i = 0; i < pstate->numWorkers; i++)
+               shutdown(pstate->parallelSlot[i].pipeWrite, SD_BOTH);
 #endif
 
@@ -408,5 +412,8 @@ WaitForTerminatingWorkers(ParallelState *pstate)
                for (j = 0; j < pstate->numWorkers; j++)
                        if (pstate->parallelSlot[j].hThread == hThread)
+                       {
                                slot = &(pstate->parallelSlot[j]);
+                               CloseHandle(hThread);
+                       }
 
                free(lpHandles);



pg_backup_utils.c

@@ -120,5 +120,5 @@ exit_nicely(int code)
 #ifdef WIN32
        if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
-               ExitThread(code);
+               _endthreadex(code);
 #endif





Attachment: parallel.c.diff
Description: Binary data

Attachment: pg_backup_utils.c.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to