[HACKERS] - Proposal for repreparing prepared statements
The following is a proposal for work I'd like to do to force long-running backend processes to reprepare their prepared statements. It would be used in cases where the user knows they have made a database change that will invalidate an existing prepared statement. I look forward to comments from the community. I propose creating a new system administration function to force repreparation of prepared statements in all backends. The functionality could be extended to include re-initialization of other kinds of per-backend data. This proposal addresses, to some degree, the prepare-alter-exec issue discussed in various mailing list postings, and the following wish-list item: # Invalidate prepared queries, like INSERT, when the table definition is altered However, the solution would only be partial, as it would be the responsibility of database clients to call the system administration function when needed. Alternately, additional integration work could be done to invoke this logic automatically whenever the columns of any table are altered. -- Here is what I propose: We define a new system administration function called pg_reload_per_backend_data. This function would work much like pg_reload_conf, i.e. it would require superuser privileges and would work by sending a signal to the postmaster that would then be propagated to all the child backends (but not the special ones, like the bgwriter). The signal handling logic for the backends would be modified to respond to the signal by reinitializing any data cached in the backend's memory space, such as prepared statements. Each kind of data that would be reinitialized would require special logic, as they would all be reinitialized in their own particular way. Choosing an appropriate signal to send might be difficult, as the list of available signals is somewhat restricted. The user-defined signals would be a natural choice, but it appears SIGUSR1 is used for sinval or catchup events, while SIGUSR2 is used for asynchronous notification. Use of the real time signals (signal numbers = 32) might be possible, but could have portability problems. Another alternative would be to overload SIGHUP, so that it causes both configuration reloads and reloading of per-backend data. This makes some sense, since most configuration parameters are basically a special form of per-backend data. However, changing the behavior of an existing signal might have undesirable side effects. Overall, I'm very open to suggestions regarding the appropriate signal to use. To implement the repreparation logic, a new function called RepreparePreparedStatements() could be added to source files backend/commands/prepare.[ch]. This function would be called by a signal handler installed the backends within backend/tcop/postgres.c. RepreparePreparedStatements would do the equivalent of iterating over the prepared_queries hash table and executing DropPreparedStatement() and PrepareQuery on each. However, it is possible that some refactoring of the logic would be needed to improve performance and make the code more robust. The scope of pg_reload_per_backend_data could also be expanded to include reinitialization of other data that resides in the memory space of individual backend processes. An example of such cached entities are reusable modules associated with a particular procedural language, e.g. the TCL modules found in the table pltcl_modules. Once a such a module is used in a particular backend, it remains held in backend memory and changes to the disk version are not noticed. There is also no way to undefine any global variables associated with such modules. I have not given much consideration to the implementation for reloading modules, but doing the equivalent of the SQL command LOAD 'libname' for all dynamically loaded libraries should have the desired effect (at least it does for the library that implements the PL/TCL language, pltcl.so). Perhaps the the general response should be to reload any libraries that have been dynamically-loaded by the particular backend. -- Here are few permutations of this plan that could be considered: 1. Bundle pg_reload_per_backend_data functionality with pg_reload_conf. Pros: Avoids having to find an appropriate unused signal Logical consistancy with reloading config, which could be considered a special case of reloading per-backend data. Cons: Changes behavior of an existing functionality, which has the risk of unintended side-effects. Gives less fine-grained control over when per-backend data is reloaded. 2. Break pg_reload_per_backend_data functional into multiple functions. Pros: Can assign more descriptive names to the functionality, e.g. pg_reload_ddl, pg_reprepare_statements, etc. Finer grained control over which kind of reloading is performed. Cons: Require more use of the scarce list of available signals.
Re: [HACKERS] - Proposal for repreparing prepared statements
Stephen Marshall [EMAIL PROTECTED] writes: The following is a proposal for work I'd like to do to force long-running backend processes to reprepare their prepared statements. It would be used in cases where the user knows they have made a database change that will invalidate an existing prepared statement. There should be no need for users to concern themselves with this. The direction we've been intending to go in is to automatically invalidate stored plans when any related schema or statistics change occurs, forcing a re-plan on any subsequent use. See past discussions (IIRC, Neil Conway actually did some work on this idea earlier this year, but didn't get it done). The appropriate cross-backend communication mechanism already exists: it's the catcache/relcache invalidation code. No need to fool with finding a spare signal; and you can't do any meaningful work in a signal handler anyway. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] - Proposal for repreparing prepared statements
Tom, Thanks for the update on the roadmap. I have a couple of questions: 1. Is the invalidation of stored plans going to be part of 8.2? If not, any idea when it would be available? I'd be willing to work on this, if it would help. 2. Is there any plan for the other part of my proposal, i.e. the ability to force one or all backends to reload their dynamically linked libraries? This is needed by backends that use loadable modules with procedural languages (like pltcl_modules) in cases where a loadable module gets updated. Thanks, Steve -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 13, 2006 2:08 PM To: Marshall, Steve Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] - Proposal for repreparing prepared statements Stephen Marshall [EMAIL PROTECTED] writes: The following is a proposal for work I'd like to do to force long-running backend processes to reprepare their prepared statements. It would be used in cases where the user knows they have made a database change that will invalidate an existing prepared statement. There should be no need for users to concern themselves with this. The direction we've been intending to go in is to automatically invalidate stored plans when any related schema or statistics change occurs, forcing a re-plan on any subsequent use. See past discussions (IIRC, Neil Conway actually did some work on this idea earlier this year, but didn't get it done). The appropriate cross-backend communication mechanism already exists: it's the catcache/relcache invalidation code. No need to fool with finding a spare signal; and you can't do any meaningful work in a signal handler anyway. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: 1. Is the invalidation of stored plans going to be part of 8.2? If not, any idea when it would be available? I'd be willing to work on this, if it would help. No, it did not get done; feel free to work on it for 8.3. 2. Is there any plan for the other part of my proposal, i.e. the ability to force one or all backends to reload their dynamically linked libraries? I would vote urgently against that, as not all loadable libraries are necessarily designed to survive being reloaded. Nor have I heard any previous requests for it; how often do loadable libraries get updated in production? Stuff like pltcl.so would only change at a version update, for which you have to restart the postmaster anyway to bring in the new executable image for postgres itself. The LOAD command is meant for development, not as something that would be invoked in production ... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: 1. Is the invalidation of stored plans going to be part of 8.2? If not, any idea when it would be available? I'd be willing to work on this, if it would help. No, it did not get done; feel free to work on it for 8.3. [steve's reply]: Could you clue me into the buzz words for searching the lists to find out more details on the plan, what has been done, and what is left to do? 2. Is there any plan for the other part of my proposal, i.e. the ability to force one or all backends to reload their dynamically linked libraries? I would vote urgently against that, as not all loadable libraries are necessarily designed to survive being reloaded. Nor have I heard any previous requests for it; how often do loadable libraries get updated in production? Stuff like pltcl.so would only change at a version update, for which you have to restart the postmaster anyway to bring in the new executable image for postgres itself. The LOAD command is meant for development, not as something that would be invoked in production [steve's reply]: I can understand your reservations about reloading libraries, and I'm really not committed to doing that. Reloading libraries is just a mechanism to do what I really want to do. I guess that did not come across very clearly in my first email, so let me explain again. I want to be able to tell a backend to reinitialize some of the cached data it is holding in static variables. Generally I want to do this when the something cached in memory has gotten out-of-sync with what is in the database. The two examples that affect me are: 1. A database table has changed in its structure, and the prepared statements related to it either no longer work, or do the wrong thing. 2. A TCL module in pltcl_modules is updated with a new version, but running processes that have already executed a function from this module do not pick up the change. The former is handled by the prepared statement invalidation plan, but the latter is not. Effectively, I want something that does the equivalent to what CREATE OR REPLACE FUNCTION does for SQL function, except for these procedural language modules. The current way the pltcl_modules works is like have a CREATE OR REPLACE FUNCTION capability that will only effect newly started connections, but has no effect on existing connections. The reason I advocated the library reloading is not because pltcl.so has changed (which should be rare), but that by reloading it, the static variables associated with the library would get reinitialized. Essentially reloading the library pltcl.so has the side effect of causing the TCL modules in pltcl_modules to get reread from disk. I'd love to have a lighter-weight way to do this, but unfortunately, I have not come up with one. If someone has a better way to do this, I'm all ears. Currently the only way I know of to do a complete reinitialization of static variables is to force a disconnection on all the backend processes, and let them all reconnect to new postgres backends that have clean slates in their static variables. There does not seem to be a way to do this via SQL; as far as I know, this requires access to the database server as the postgres user or superuser and sending a TERM signal to each backend you want to kill. - Basically, I've come up with two basic approaches for forcing a reload of pltcl_modules: have a system admin function to do some kind of resetting in each backend that cleans out the static variables associated with a procedural language, or kill most or all of the backends and let the client applications handle the errors and reset their connections. Currently, the only resetting operation I know to work is reloaded the pltcl.so library. If anyone has any better ideas for how to crack this problem, I love to hear them. Here's some more details on the approaches I've thought of, for those who are interested: 1. Add special logic to pltcl_loadmod (the utility that facilitates inserting and updating TCL modules in the database). However, this would still require some kind of communication to all the other backend processes, which would probably require something like the next suggestion 2. A system admin command to force the reinitialization of the static data associated with a procedural language. This command would need to communicate either with all the backends, or to a backend with a particular pid (like pg_cancel_backend does); it is not sufficient for it only to effect the backend associated with its connection. This command could be specific to the procedural language (e.g. reload_pltcl_modules), or allow a particular dynamically linked library to be reload (reload_library_if_already_loaded(libname). There are probably other ways to effect the reinitialization, but all other them seem to require communication to a backend process other than the one we are connected to, which made it seem to me like a candidate for
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: I want to be able to tell a backend to reinitialize some of the cached data it is holding in static variables. Generally I want to do this when the something cached in memory has gotten out-of-sync with what is in the database. The two examples that affect me are: 1. A database table has changed in its structure, and the prepared statements related to it either no longer work, or do the wrong thing. This we need to fix. 2. A TCL module in pltcl_modules is updated with a new version, but running processes that have already executed a function from this module do not pick up the change. To be frank, this is not Postgres' problem, it's pltcl's. I think the pltcl_modules facility is poorly designed (precisely because it doesn't fit into any sane way of handling the schema-update problem) and needs to be thrown out and redone. If the units you were reloading were Postgres functions, or had some other way of being represented in the system catalogs, then we'd have a reasonable way to attack the problem. But forcing a reload of pltcl.so is nothing but a kluge --- it leaks memory like there's no tomorrow, and it's only an accident that it fails to crash. I don't want to design a further kluge on top of it. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] - Proposal for repreparing prepared statements
But forcing a reload of pltcl.so is nothing but a kluge --- it leaks memory like there's no tomorrow, and it's only an accident that it fails to crash. I don't want to design a further kluge on top of it. Are you saying that pltcl.so leaks memory in general, or that forcing a reload of the pltcl.so library leaks memory? If the former, I haven't seen evidence of it in my use of pltcl, but I'd be interested to know ofo problems other have had with it. Yours, Steve Marshall ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] - Proposal for repreparing prepared statements
Marshall, Steve [EMAIL PROTECTED] writes: But forcing a reload of pltcl.so is nothing but a kluge --- it leaks memory like there's no tomorrow, and it's only an accident that it fails to crash. I don't want to design a further kluge on top of it. Are you saying that pltcl.so leaks memory in general, or that forcing a reload of the pltcl.so library leaks memory? The latter. There's no mechanism to release the previous instance's data structures (Tcl interpreters, function bodies, etc). regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match