Re: [PATCH] More docs on what to do and not do in extension code

2021-11-04 Thread Daniel Gustafsson
> On 30 Aug 2021, at 04:20, Craig Ringer  wrote:
> 
> On Tue, 29 Jun 2021 at 13:30, Craig Ringer  > wrote:
> Laurenz,
> 
> Thanks for your comments. Sorry it's taken me so long to get back to you. 
> Commenting inline below on anything I think needs comment; other proposed 
> changes look good.
> 
> I'm not going to get back to this anytime soon.
> 
> If anybody wants to pick it up that's great. Otherwise at least it's there in 
> the mailing lists to search.

I'm marking this returned with feedback for now, please open a new entry when
there is an updated patch.

--
Daniel Gustafsson   https://vmware.com/





Re: [PATCH] More docs on what to do and not do in extension code

2021-08-29 Thread Craig Ringer
On Tue, 29 Jun 2021 at 13:30, Craig Ringer 
wrote:

> Laurenz,
>
> Thanks for your comments. Sorry it's taken me so long to get back to you.
> Commenting inline below on anything I think needs comment; other proposed
> changes look good.
>

I'm not going to get back to this anytime soon.

If anybody wants to pick it up that's great. Otherwise at least it's there
in the mailing lists to search.


Re: [PATCH] More docs on what to do and not do in extension code

2021-06-28 Thread Craig Ringer
Laurenz,

Thanks for your comments. Sorry it's taken me so long to get back to you.
Commenting inline below on anything I think needs comment; other proposed
changes look good.

On Sun, 30 May 2021 at 19:20, Laurenz Albe  wrote:

> +   always use  linkend="xfunc-sleeping-interrupts-blocking">PostgreSQL's
> +   interupt-aware APIs for the purpose. Do not
> usleep(),
> +   system(), make blocking system calls, etc.
> +  
>
> "system" is not a verb.
>

When it's a function call it is, but I'm fine with your revision:

  Do not use usleep(), system()
>   or other blocking system calls.
>
> +and should only use the main thread. PostgreSQL generally uses
> subprocesses
>
> Hm.  If the extension does not start threads, it automatically uses the
> main thread.
> I think that should be removed for clarity.
>

IIRC I intended that to apply to the section that tries to say how to
survive running your own threads in the backend if you really must do so.

+primitives like WaitEventSetWait where
> necessary. Any
> +potentially long-running loop should periodically call 
> +CHECK_FOR_INTERRUPTS() to give PostgreSQL a chance to
> interrupt
> +the function in case of a shutdown request, query cancel, etc.
> This means
>
> Are there other causes than shutdown or query cancellation?
> At any rate, I am not fond of enumerations ending with "etc".
>

I guess. I wanted to emphasise that if you mess this up postgres might fail
to shut down or your backend might fail to respond to SIGTERM /
pg_terminate_backend, as those are the most commonly reported symptoms when
such issues are encountered.


+by PostgreSQL if any function that it calls makes a
> +CHECK_FOR_INTERRUPTS() check. It may not
>
> "makes" sound kind of clumsy in my ears.
>

Yeah. I didn't come up with anything better right away but will look when I
get the chance to return to this patch.


> Attached is a new version that has my suggested changes, plus a few from
> Bharath Rupireddy (I do not agree with many of his suggestions).
>

Thanks very much. I will try to return to this soon and review the diff
then rebase and update the patch.

I have a large backlog to get through, and I've recently had the pleasure
of having to work on windows/powershell build system stuff, so it may still
take me a while.


Re: [PATCH] More docs on what to do and not do in extension code

2021-05-30 Thread Laurenz Albe
On Mon, 2021-01-18 at 15:56 +0800, Craig Ringer wrote:
> The attached patch expands the xfunc docs and bgworker docs a little, 
> providing a starting point for developers
>  to learn how to do some common tasks the Postgres Way.

I like these changes!

Here is a review:

+  
+   See  for information on how to
+   request extension shared memory allocations, LWLocks,
+   etc.
+  

This doesn't sound very English to me, and I don't see the point in
repeating parts of the enumeration.  How about

  See ... for detailed information how to allocate these resources.

+  
+   If a background worker needs to sleep or wait for activity it should

Missing comma after "activity".

+   always use PostgreSQL's
+   interupt-aware APIs for the purpose. Do not 
usleep(),
+   system(), make blocking system calls, etc.
+  

"system" is not a verb.  Suggestion:

  Do not use usleep(), system()
  or other blocking system calls.

+  
+   The src/test/modules/worker_spi and
+   src/test/modules/test_shm_mq contain working examples
+   that demonstrates some useful techniques.
   

That is missing a noun in my opinion, I would prefer:

  The modules ... contain working examples ...

+  
+   Signal Handling in Background Workers

It is not a good idea to start a section in the middle of a documentation page.
That will lead to a weird TOC entry at the top of the page.

The better way to do this is to write a short introductory header and convert
most of the first half of the page into another section, so that we end up
with a page the has the introductory material and two TOC entries for the 
details.

+The default signal handlers installed for background workers do
+not interrupt queries or blocking calls into other postgres code

PostgreSQL, not "postgres".
Also, there is a comma missing at the end of the line.

+so they are only suitable for simple background workers that frequently and
+predictably return control to their main loop. If your worker uses the
+default background worker signal handling it should call

Another missing comma after "handling".

+HandleMainLoopInterrupts() on each pass through its
+main loop.
+   
+
+   
+Background workers that may make blocking calls into core PostgreSQL code
+and/or run user-supplied queries should generally replace the default 
bgworker

Please stick with "background worker", "bgworker" is too sloppy IMO.

+signal handlers with the handlers used for normal user backends. This will
+ensure that the worker will respond in a timely manner to a termination
+request, query cancel request, recovery conflict interrupt, deadlock 
detector
+request, etc. To install these handlers, before unblocking interrupts
+run:

The following would be more grammatical:

  To install these handlers, run the following before unblocking interrupts:

+Then ensure that your main loop and any other code that could run for some
+time contains CHECK_FOR_INTERRUPTS(); calls. A
+background worker using these signal handlers must use PostgreSQL's resource management APIs
+and callbacks to handle cleanup rather than relying on control
+returning to the main loop because the signal handlers may call

There should be a comma before "because".

+proc_exit() directly. This is recommended practice
+for all types of extension code anyway.
+   
+
+   
+See the comments in src/include/miscadmin.h in the
+postgres headers for more details on signal handling.
+   

"in the postgres headers" is redundant - at any rate, it should be "PostgreSQL".

+Do not attempt to use C++ exceptions or Windows Structured Exception
+Handling, and never call exit() directly.

I am alright with this addition, but I think it would be good to link to
 from it.

+  
+   
+Individual PostgreSQL backends are single-threaded.
+Never call any PostgreSQL function or access any PostgreSQL-managed 
data
+structure from a thread other than the main

"PostgreSQL" should always have the  tag.
This applies to a lot of places in this patch.

+thread. If at all possible your extension should not start any threads

Comma after "possible".

+and should only use the main thread. PostgreSQL generally uses 
subprocesses

Hm.  If the extension does not start threads, it automatically uses the main 
thread.
I think that should be removed for clarity.

+that coordinate over shared memory instead of threads - see
+.

It also uses signals and light-weight locks - but I think that you don't need to
describe the coordination mechanisms here, which are explained in the link you 
added.

+primitives like WaitEventSetWait where necessary. 
Any
+potentially long-running loop should periodically call 
+CHECK_FOR_INTERRUPTS() to give PostgreSQL a chance to 
interrupt
+the function in case of a shutdown request, query cancel, etc. This 
means

Are there 

Re: [PATCH] More docs on what to do and not do in extension code

2021-03-26 Thread Craig Ringer
On Fri, 26 Mar 2021 at 06:15, Bruce Momjian  wrote:

> On Thu, Mar 25, 2021 at 08:49:44AM -0400, David Steele wrote:
> > On 1/22/21 1:36 AM, Craig Ringer wrote:
> > >
> > > Would you mind attaching a revised version of the patch with your
> edits?
> > > Otherwise I'll go and merge them in once you've had your say on my
> > > comments inline below.
> >
> > Bharath, do the revisions in [1] look OK to you?
> >
> > > Bruce, Robert, can I have an opinion from you on how best to locate and
> > > structure these docs, or whether you think they're suitable for the
> main
> > > docs at all? See patch upthread.
> >
> > Bruce, Robert, any thoughts here?
>
> I know I sent an email earlier this month saying we shouldn't
> over-document the backend hooks because the code could drift away from
> the README content:
>
>
> https://www.postgresql.org/message-id/20210309172049.GD26575%40momjian.us
>
> Agreed.  If you document the hooks too much, it allows them to
> drift
> away from matching the code, which makes the hook documentation
> actually
> worse than having no hook documentation at all.
>
> However, for this doc patch, the content seem to be more strategic, so
> less likely to change, and hard to figure out from the code directly.
> Therefore, I think this would be a useful addition to the docs.
>

Thanks for the kind words. It's good to hear that it may be useful. Let me
know if anything further is needed.


Re: [PATCH] More docs on what to do and not do in extension code

2021-03-25 Thread Bruce Momjian
On Thu, Mar 25, 2021 at 08:49:44AM -0400, David Steele wrote:
> On 1/22/21 1:36 AM, Craig Ringer wrote:
> > 
> > Would you mind attaching a revised version of the patch with your edits?
> > Otherwise I'll go and merge them in once you've had your say on my
> > comments inline below.
> 
> Bharath, do the revisions in [1] look OK to you?
> 
> > Bruce, Robert, can I have an opinion from you on how best to locate and
> > structure these docs, or whether you think they're suitable for the main
> > docs at all? See patch upthread.
> 
> Bruce, Robert, any thoughts here?

I know I sent an email earlier this month saying we shouldn't
over-document the backend hooks because the code could drift away from
the README content:


https://www.postgresql.org/message-id/20210309172049.GD26575%40momjian.us

Agreed.  If you document the hooks too much, it allows them to drift
away from matching the code, which makes the hook documentation actually
worse than having no hook documentation at all.

However, for this doc patch, the content seem to be more strategic, so
less likely to change, and hard to figure out from the code directly.
Therefore, I think this would be a useful addition to the docs.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: [PATCH] More docs on what to do and not do in extension code

2021-03-25 Thread David Steele

On 1/22/21 1:36 AM, Craig Ringer wrote:


Would you mind attaching a revised version of the patch with your edits? 
Otherwise I'll go and merge them in once you've had your say on my 
comments inline below.


Bharath, do the revisions in [1] look OK to you?

Bruce, Robert, can I have an opinion from you on how best to locate and 
structure these docs, or whether you think they're suitable for the main 
docs at all? See patch upthread.


Bruce, Robert, any thoughts here?

Regards,
--
-David
da...@pgmasters.net

[1] 
https://www.postgresql.org/message-id/CAGRY4nyjHh-Tm89A8eS1x%2BJtZ-qHU7wY%2BR0DEEtWfv5TQ3HcGA%40mail.gmail.com





Re: [PATCH] More docs on what to do and not do in extension code

2021-01-21 Thread Craig Ringer
Hi

Thanks so much for reading over this!

Would you mind attaching a revised version of the patch with your edits?
Otherwise I'll go and merge them in once you've had your say on my comments
inline below.

Bruce, Robert, can I have an opinion from you on how best to locate and
structure these docs, or whether you think they're suitable for the main
docs at all? See patch upthread.

On Tue, 19 Jan 2021 at 22:33, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

>
> Here are some comments:
>
> [1]
>background worker's main function, and must be unblocked by it; this is
> to
> allow the process to customize its signal handlers, if necessary.
> -   Signals can be unblocked in the new process by calling
> -   BackgroundWorkerUnblockSignals and blocked by
> calling
> -   BackgroundWorkerBlockSignals.
> +   It is important that all background workers set up and unblock signal
> +   handling before they enter their main loops. Signal handling in
> background
> +   workers is discussed separately in .
>
>

I wasn't sure either way on that, see your [3] below.

[2]
> +   interupt-aware APIs for the purpose. Do not
> usleep(),
> +   system(), make blocking system calls, etc.
> +  
>
> Is it "Do not use usleep(),
> system() or make blocking system calls etc." ?
>

Right. Good catch.

[3] IMO, we can remove following from "bgworker-signals" if we retain
> it where currently it is, as discussed in [1].
> +Signals can be unblocked in the new process by calling
> +BackgroundWorkerUnblockSignals and blocked by
> calling
> +BackgroundWorkerBlockSignals.
>

If so, need to mention that they start blocked and link to the main text
where that's mentioned.

That's part of why I moved this chunk into the signal section.

[4] Can we say
> +The default signal handlers set up for background workers do
> +default background worker signal handlers, it should call
>
> instead of
> +The default signal handlers installed for background workers
> do
> +default background worker signal handling it should call
>

Huh?

I don't understand this proposal.

s/install/set up/g?

[5] IMO, we can have something like below
> +request, etc. Set up these handlers before unblocking signals as
> +shown below:
>
> instead of
> +request, etc. To install these handlers, before unblocking interrupts
> +run:
>

Ditto

[6] I think logs and errors either elog() or ereport can be used, so how
> about
> +Use elog() or ereport()
> for
> +logging output or raising errors instead of any direct stdio
> calls.
>
> instead of
> +Use elog() and
> ereport() for
> +logging output and raising errors instead of any direct stdio
> calls.
>

OK.

[7] Can we use child processes instead of subprocess ? If okay in
> other places in the patch as well.
>

Fine with me. The point is really that they're non-postgres processes being
spawned by a backend, and that doing so must be done carefully.

[8] Why should file descriptor manager API be used to execute
> subprocesses/child processes?
> +To execute subprocesses and open files, use the routines provided
> by
> +the file descriptor manager like
> BasicOpenFile
> +and OpenPipeStream instead of a direct
>

Yeah, that wording is confusing, agreed. The point was that you shouldn't
use system() or popen(), you should OpenPipeStream(). And similarly, you
should avoid fopen() etc and instead use the Pg wrapper BasicOpenFile().

"
PostgreSQL backends are required to limit the number of file descriptors
they
open. To open files, use postgres file descriptor manager routines like
BasicOpenFile()
instead of directly using open() or fopen(). To open pipes to or from
external processes,
use OpenPipeStream() instead of popen().
"

?


> [9] "should always be"? even if it's a blocking extesion, does it
> work? If our intention is to recommend the developers, maybe we should
> avoid using the term "should" in the patch in other places as well.
>

The trouble is that it's a bit ... fuzzy.

You can get away with blocking for short periods without responding to
signals, but it's a "how long is a piece of string" issue.

"should be" is fine.

A hard "must" or "must not" would be misleading. But this isn't the place
to go into all the details of how time sensitive (or not) interrupt
handling of different kinds is in different places for different worker
types.


> [11] I think it is
> +block if this happens. So cleanup of resources is not
> entirely managed by PostgreSQL, it
> +   must be handled using appropriate callbacks provided by PostgreSQL
>
> instead of
> +block if this happens. So all cleanup of resources not already
> +managed by the PostgreSQL runtime must be handled using
> appropriate
>

I don't agree with the proposed new wording here.

Delete the "So all" from my original, or

... Cleanup of any resources that are not managed
> by the PostgreSQL runtime must be handled using 

Re: [PATCH] More docs on what to do and not do in extension code

2021-01-19 Thread Bharath Rupireddy
On Mon, Jan 18, 2021 at 1:27 PM Craig Ringer
 wrote:
> Hi folks
>
> The attached patch expands the xfunc docs and bgworker docs a little, 
> providing a starting point for developers to learn how to do some common 
> tasks the Postgres Way.
>
> It mentions in brief these topics:
>
> * longjmp() based exception handling with elog(ERROR), PG_CATCH() and 
> PG_RE_THROW() etc
> * Latches, spinlocks, LWLocks, heavyweight locks, condition variables
> * shm, DSM, DSA, shm_mq
> * syscache, relcache, relation_open(), invalidations
> * deferred signal handling, CHECK_FOR_INTERRUPTS()
> * Resource cleanup hooks and callbacks like on_exit, before_shmem_exit, the 
> resowner callbacks, etc
> * signal handling in bgworkers
>
> All very superficial, but all things I really wish I'd known a little about, 
> or even that I needed to learn about, when I started working on postgres.
>
> I'm not sure it's in quite the right place. I wonder if there should be a 
> separate part of xfunc.sgml that covers the slightly more advanced bits of 
> postgres backend and function coding like this, lists relevant README files 
> in the source tree, etc.
>
> I avoided going into details like how resource owners work. I don't want the 
> docs to have to cover all that in detail; what I hope to do is start 
> providing people with clear references to the right place in the code, 
> READMEs, etc to look when they need to understand specific topics.

Thanks for the patch.

Here are some comments:

[1]
   background worker's main function, and must be unblocked by it; this is to
allow the process to customize its signal handlers, if necessary.
-   Signals can be unblocked in the new process by calling
-   BackgroundWorkerUnblockSignals and blocked by calling
-   BackgroundWorkerBlockSignals.
+   It is important that all background workers set up and unblock signal
+   handling before they enter their main loops. Signal handling in background
+   workers is discussed separately in .
   

IMO, we can retain the statement about BackgroundWorkerUnblockSignals
and BackgroundWorkerBlockSignals, but mention the link to
"bgworker-signals" for more details and move the statement "it's
important to unblock signals before enter their main loop" to
"bgworker-signals" section and we can also reason there the
consequences if not done.

[2]
+   interupt-aware APIs for the purpose. Do not
usleep(),
+   system(), make blocking system calls, etc.
+  

Is it "Do not use usleep(),
system() or make blocking system calls etc." ?

[3] IMO, we can remove following from "bgworker-signals" if we retain
it where currently it is, as discussed in [1].
+Signals can be unblocked in the new process by calling
+BackgroundWorkerUnblockSignals and blocked by calling
+BackgroundWorkerBlockSignals.

[4] Can we say
+The default signal handlers set up for background workers do
+default background worker signal handlers, it should call

instead of
+The default signal handlers installed for background workers do
+default background worker signal handling it should call

[5] IMO, we can have something like below
+request, etc. Set up these handlers before unblocking signals as
+shown below:

instead of
+request, etc. To install these handlers, before unblocking interrupts
+run:

[6] I think logs and errors either elog() or ereport can be used, so how about
+Use elog() or ereport() for
+logging output or raising errors instead of any direct stdio calls.

instead of
+Use elog() and ereport() for
+logging output and raising errors instead of any direct stdio calls.

[7] Can we use child processes instead of subprocess ? If okay in
other places in the patch as well.
+and should only use the main thread. PostgreSQL generally
uses child processes
+that coordinate over shared memory instead of threads - for
instance, see
+.

instead of
+and should only use the main thread. PostgreSQL generally
uses subprocesses
+that coordinate over shared memory instead of threads - see
+.

[8] Why should file descriptor manager API be used to execute
subprocesses/child processes?
+To execute subprocesses and open files, use the routines provided by
+the file descriptor manager like BasicOpenFile
+and OpenPipeStream instead of a direct

[9] "should always be"? even if it's a blocking extesion, does it
work? If our intention is to recommend the developers, maybe we should
avoid using the term "should" in the patch in other places as well.
+Extension code should always be structured as a non-blocking

[10] I think it is
+you should avoid using sleep() or
usleep()

instead of
+you should sleep() or
usleep()


[11] I think it is
+block if this happens. So cleanup of resources is not
entirely managed by PostgreSQL, it
+   must be handled using appropriate callbacks provided by PostgreSQL

instead of
+block if this 

[PATCH] More docs on what to do and not do in extension code

2021-01-17 Thread Craig Ringer
Hi folks

The attached patch expands the xfunc docs and bgworker docs a little,
providing a starting point for developers to learn how to do some common
tasks the Postgres Way.

It mentions in brief these topics:

* longjmp() based exception handling with elog(ERROR), PG_CATCH() and
PG_RE_THROW() etc
* Latches, spinlocks, LWLocks, heavyweight locks, condition variables
* shm, DSM, DSA, shm_mq
* syscache, relcache, relation_open(), invalidations
* deferred signal handling, CHECK_FOR_INTERRUPTS()
* Resource cleanup hooks and callbacks like on_exit, before_shmem_exit, the
resowner callbacks, etc
* signal handling in bgworkers

All very superficial, but all things I really wish I'd known a little
about, or even that I needed to learn about, when I started working on
postgres.

I'm not sure it's in quite the right place. I wonder if there should be a
separate part of xfunc.sgml that covers the slightly more advanced bits of
postgres backend and function coding like this, lists relevant README files
in the source tree, etc.

I avoided going into details like how resource owners work. I don't want
the docs to have to cover all that in detail; what I hope to do is start
providing people with clear references to the right place in the code,
READMEs, etc to look when they need to understand specific topics.
From 96ce89cb7e1a97d9d247fbacba73ade85a01ea38 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 18 Jan 2021 14:05:15 +0800
Subject: [PATCH v1 2/2] Expand docs on PostgreSQL extension coding

Expand the docs on PostgreSQL extension coding and background worker
development a little so that key topics like allocation, interrupt
handling, exit callbacks, transaction callbacks, PG_TRY()/PG_CATCH(),
resource owners, transaction and snapshot state, etc are at least
briefly mentioned with a few pointers to where to learn more.

Add some detail on background worker signal handling.
---
 doc/src/sgml/bgworker.sgml |  86 ++--
 doc/src/sgml/xfunc.sgml| 162 -
 2 files changed, 239 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index 7fd673ab54..9216b8e0ea 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -1,6 +1,6 @@
 
 
-
+
  Background Worker Processes
 
  
@@ -29,6 +29,12 @@
   
  
 
+ 
+  All code that will be executed in PostgreSQL background workers is expected
+  to follow the basic rules set out for all PostgreSQL extension code in .
+ 
+
  
   Background workers can be initialized at the time that
   PostgreSQL is started by including the module name in
@@ -95,6 +101,11 @@ typedef struct BackgroundWorker
buffers, or any custom data structures which the worker itself may
wish to create and use.
   
+  
+   See  for information on how to
+   request extension shared memory allocations, LWLocks,
+   etc.
+  
  
 
 
@@ -212,9 +223,9 @@ typedef struct BackgroundWorker
Signals are initially blocked when control reaches the
background worker's main function, and must be unblocked by it; this is to
allow the process to customize its signal handlers, if necessary.
-   Signals can be unblocked in the new process by calling
-   BackgroundWorkerUnblockSignals and blocked by calling
-   BackgroundWorkerBlockSignals.
+   It is important that all background workers set up and unblock signal
+   handling before they enter their main loops. Signal handling in background
+   workers is discussed separately in .
   
 
   
@@ -296,13 +307,74 @@ typedef struct BackgroundWorker
   
 
   
-   The src/test/modules/worker_spi module
-   contains a working example,
-   which demonstrates some useful techniques.
+   Background workers that require inter-process communication and/or
+   synchronisation should use PostgreSQL's built-in IPC and concurrency
+   features as discussed in 
+   and .
+  
+
+  
+   If a background worker needs to sleep or wait for activity it should
+   always use PostgreSQL's
+   interupt-aware APIs for the purpose. Do not usleep(),
+   system(), make blocking system calls, etc.
+  
+
+  
+   The src/test/modules/worker_spi and
+   src/test/modules/test_shm_mq contain working examples
+   that demonstrates some useful techniques.
   
 
   
The maximum number of registered background workers is limited by
.
   
+
+  
+   Signal Handling in Background Workers
+
+   
+Signals can be unblocked in the new process by calling
+BackgroundWorkerUnblockSignals and blocked by calling
+BackgroundWorkerBlockSignals.
+The default signal handlers installed for background workers do
+not interrupt queries or blocking calls into other postgres code
+so they are only suitable for simple background workers that frequently and
+predictably return control to their main loop. If your worker uses the
+default background worker signal handling it should call
+HandleMainLoopInterrupts() on each pass