On Wed, Mar 14, 2012 at 12:34 AM, Joachim Wieland <j...@mcknight.de> wrote:
>>> If a child terminates without leaving a message, the master will still
>>> detect it and just say "a worker process died unexpectedly" (this part
>>> was actually broken, but now it's fixed :-) )
>> All that may be true, but I still don't see why it's right for this to
>> apply in the cases where the worker thread says die_horribly(), but
>> not in the cases where the worker says exit_horribly().
> Hm, I'm not calling the error handler from exit_horribly because it
> doesn't have the AH. It looks like the code assumes that
> die_horribly() is called whenever AH is available and if not,
> exit_horribly() should be called which eventually calls these
> preregistered exit-hooks via exit_nicely() to clean up the connection.
> I think we should somehow unify both functions, the code is not very
> consistent in this respect, it also calls exit_horribly() when it has
> AH available. See for example pg_backup_tar.c

I think we should get rid of die_horribly(), and instead have arrange
to always clean up AH via an on_exit_nicely hook.

>> Sure, but since all the function does is write to it or access it,
>> what good does that do me?
> It encapsulates the variable so that it can only be used for one
> specific use case.

Seems pointless to me.

+       /*
+        * This is a data dumper routine, executed in a child for parallel backu
+        * so it must not access the global g_conn but AH->connection instead.
+        */

There's no g_conn any more.  This and several other references to it
should be updated or expunged.

+       {
+               write_msg(NULL, "parallel backup only supported by the directory
+               exit(1);
+       }

I think this should exit_horribly() with that message.  It definitely
can't use exit() rather than exit_nicely(); more generally, every copy
of exit() that you've added here should exit_nicely() instead, or use
some higher-level routine like exit_horribly().

+                       write_msg(NULL, "No synchronized snapshots available in
+                                                "You might have to run with --n
+                       exit(1);

In addition to the previous problem, what do you mean by "might"?  The
real problem is that on pre-9.2 versions multiple jobs are not OK
unless that option is used; I think we should say that more directly.

 * The sequence is the following (for dump, similar for restore):
 * Master                                   Worker
 *                                          enters WaitForCommands()
 * DispatchJobForTocEntry(...te...)
 * [ Worker is IDLE ]
 * arg = (MasterStartParallelItemPtr)()
 * send: DUMP arg
 *                                          receive: DUMP arg
 *                                          str = (WorkerJobDumpPtr)(arg)
 * [ Worker is WORKING ]                    ... gets te from arg ...
 *                                          ... dump te ...
 *                                          send: OK DUMP info
 * In ListenToWorkers():
 * [ Worker is FINISHED ]
 * receive: OK DUMP info
 * status = (MasterEndParallelItemPtr)(info)
 * In ReapWorkerStatus(&ptr):
 * *ptr = status;
 * [ Worker is IDLE ]

I don't find this comment very clear, and would suggest rewriting it
using prose rather than an ASCII diagram.  Note also that any sort of
thing that does look like an ASCII diagram must be surrounded by lines
of dashes within the comment block, or pgindent will make hash of it.
There are a couple of other places where this is an issue as well,
like the comment for ListenToWorkers().

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to