Re: [HACKERS] Free-space-map management thoughts

2003-03-02 Thread Stephen Marshall




Tom Lane wrote:

> Stephen Marshall [EMAIL PROTECTED] writes:
>
>> 2. The histogram concept is a neat idea, but I think some reorganization
>> of the page information might make it unnecessary.  Currently the FSM
>> pages are sorted by BlockNumber.  This was particularly useful for
>> adding information about a single page, but since that interface is no
>> longer to be supported, perhaps the decision to sort by BlockNumber
>> should also be revisited.
>
> I was thinking about that, but we do still need to handle
> RecordAndGetFreeSpace --- in fact that should be the most common
> operation.  The histogram approximation seems an okay price to pay for
> not slowing down RecordAndGetFreeSpace.  If you wanted to depend on
> the ordering-by-free-space property to any large extent,
> RecordAndGetFreeSpace would actually have to move the old page down in
> the list after adjusting its free space :-(

I hadn't considered the needs of RecordAndGetFreeSpace.  It is called so
much more often than MultiRecordFreeSpace that it makes much better sense
to optimize it, and hence to organize the page information by BlockNumber.

I think you just sold me on the histogram idea :)  but I still have some
thoughts about its behavior in the oversubscribed state.

If I understand the concept correctly, the histogram will only be calculated
when MultiRecordFreeSpace is called AND the FSM is oversubscribed. However,
when it is called, we will need to calculate a histogram for, and potentially
trim data from, all relations that have entries in the FSM. 

When vacuuming the entire database, we will end up with an N-squared loop:
we iterate over all the relations in vacuum, and iterate over them again
in each call to MultiRecordFreeSpace that occurs within each vacuum.
If each relation consistently requests storage for the same amount of
page info during each vacuum, the extra work of this N-squared loop will
probably disappear once the system settles into an equilibrium, but
inconsistent requests could cause more oscillations in the free space
adjustment.

Do I understand correctly how this will work, or did I miss something?

In any event, I don't really think this is a problem, just something to pay
attention to. It also highlights the need to make the histogram calculation
and free space adjustment as efficient as possible.
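
To make the N-squared concern concrete, here is a minimal sketch of the
loop shape I am worried about.  This is rough C for illustration only;
vacuum_one_relation() and compact_fsm_relation() are hypothetical
stand-ins for lazy vacuum and the per-relation histogram/trim work:

extern void vacuum_one_relation(int relid);     /* hypothetical */
extern void compact_fsm_relation(int relid);    /* hypothetical */

void
vacuum_database(int nrels)
{
    for (int i = 0; i < nrels; i++)     /* outer loop: VACUUM each relation */
    {
        vacuum_one_relation(i);         /* ends by calling MultiRecordFreeSpace */

        /*
         * If the FSM is oversubscribed, that MultiRecordFreeSpace call may
         * trigger a global rebalance: a histogram plus trimming for every
         * relation currently tracked in the map.
         */
        for (int j = 0; j < nrels; j++) /* inner loop hidden inside the FSM */
            compact_fsm_relation(j);
    }
}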

By the way, I think your other suggestions are great (e.g. changes to the
public API, maintaining more internal statistics, reporting more info in
VACUUM VERBOSE, ensuring that a minimum amount of free space info is
retained for all relations).  I think this will be a nice improvement to
how postgres reclaims disk space.





Re: [HACKERS] Free-space-map management thoughts

2003-03-02 Thread Stephen Marshall
Tom,

I'm happy to see your attentions turning back to the FSM.  I like the 
design, but I do have a few suggestions, particularly about how to 
handle oversubscription of the FSM.

1. When the FSM is oversubscribed and one is trying to decide which
pages to keep, remember that page info is stored in groups of
CHUNK_SIZE pages, where CHUNK_SIZE is currently 32.  Thus, if you need to
store info for 1 page, you have already committed space in the FSM for
CHUNK_SIZE pages, so you might as well fill that chunk with valid page
information.

Such logic is needed not just for optimization, but also to prevent the
oversubscription logic from trying to use more chunks than the FSM has.
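
As a rough illustration of the arithmetic (not actual freespace.c code),
the page budget for a relation could be rounded up to a whole number of
chunks and then capped at the number of pages the relation actually
reported:

#define CHUNK_SIZE 32   /* page-info entries per allocated chunk */

/*
 * Sketch only: round a relation's target page count up to a chunk
 * boundary, but never beyond the number of pages it reported.
 */
static int
pages_to_keep(int target_pages, int reported_pages)
{
    int     chunks = (target_pages + CHUNK_SIZE - 1) / CHUNK_SIZE;
    int     keep = chunks * CHUNK_SIZE;

    if (keep > reported_pages)
        keep = reported_pages;
    return keep;
}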

2. The histogram concept is a neat idea, but I think some reorganization 
of the page information might make it unnecessary.  Currently the FSM 
pages are sorted by BlockNumber.  This was particularly useful for 
adding information about a single page, but since that interface is no 
longer to be supported, perhaps the decision to sort by BlockNumber 
should also be revisited.  

If we sort the page info by available space, we could then use binary 
search to find space thresholds when we are handling oversubscription. 
I think this would be both faster and more exact than the histogram 
approach.
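
Here is a minimal sketch of what I have in mind; the PageEntry layout is
an assumption for illustration, not the real FSM structures, and the
entries are assumed to be kept sorted by available space, largest first:

typedef struct
{
    unsigned int blkno;     /* block number */
    unsigned int avail;     /* free bytes on that page */
} PageEntry;

/*
 * Binary search: how many entries still satisfy a request of 'need'
 * bytes?  (Conversely, the exact cutoff for a target page count of N is
 * simply entries[N - 1].avail -- no histogram approximation needed.)
 */
static int
count_pages_with_space(const PageEntry *entries, int npages, unsigned int need)
{
    int     lo = 0,
            hi = npages;

    while (lo < hi)
    {
        int     mid = (lo + hi) / 2;

        if (entries[mid].avail >= need)
            lo = mid + 1;       /* entries[0..mid] all qualify */
        else
            hi = mid;           /* entries[mid..] do not */
    }
    return lo;
}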

Sorting by available space would make the FSM code a bit less
efficient, as we would not be able to use binary search to skip to the
min block number provided in MultiRecordFreeSpace.  However, lazy
vacuuming would be more efficient, as this function starts with pages
ordered by available space, then sorts the page info by block number
just prior to the call of MultiRecordFreeSpace.

Am I missing something that requires the FSM to be ordered by block number?

Yours,
Steve Marshall
-
Tom Lane wrote:
I've been thinking about improving the algorithm that the free space map
(FSM) uses to decide what to store when it's not got enough shared
memory to keep track of everything.  The present design uses a dynamically
adjusted threshold for each relation, throwing away pages whose free
space amount is less than the threshold.  This unfortunately seems not
to work as well as I'd hoped when I wrote it :-(.  In particular it
cannot cope effectively with situations where many pages have exactly
the same amount of free space --- it degenerates to all-or-nothing.
This problem has become acute now that btree indexes use the FSM to
keep track of free pages: by definition, all those pages have exactly
the same amount of free space.
I had some off-list discussions last fall with Steve Marshall, who
was trying to improve the thresholding algorithm to work better, but
what he ended up with seemed to me to have a lot of ad-hoc logic and
magic numbers in it.  So I wasn't real satisfied with that.
This is a request for comments about the following redesign:

1. Continue to keep track of the average request size seen by
GetPageWithFreeSpace(), but make it a pure request-size average; don't
muck it up with thresholding adjustments.  Add an entry point to make
the average request size for a relation available to callers.  VACUUM
can use this as a starting point for its internal decisions about which
pages are even worth reporting to the FSM.
2. Eliminate retail addition of FSM entries.  RecordFreeSpace() isn't
being used anyway, and we can restrict RecordAndGetPageWithFreeSpace()
to only update existing entries not make new ones.  Only wholesale
replacement of a relation's FSM data, via MultiRecordFreeSpace(), need
be supported as a way of adding page entries to FSM.
3. With the above two changes, the numbers of pages passed to the FSM
by MultiRecordFreeSpace() calls become useful statistics in themselves.
We can keep track of those numbers in the FSM's per-relation statistics
independently of the number of pages actually stored in FSM.
4. In particular, the sum of the MultiRecordFreeSpace (henceforth MRFS)
page counts gives us the total number of pages we would *like* to keep
track of; the ratio of this number to the actual allocated max_fsm_pages
is our oversubscription ratio.  (One thing we can do in passing is
make VACUUM VERBOSE print these numbers, so that people finally have
some intelligent way of adjusting their FSM parameters.)
5. When FSM space is oversubscribed, we can divide each relation's MRFS
requested page count by the oversubscription ratio to arrive at an exact
target page count for each relation.  This page count is stable as long
as the space-allocation picture isn't changing much, which is a big
improvement over the existing inherently history-dependent thresholding
algorithm.
6. A reasonably effective way to reduce a relation's stored page list
to the target (or select out the pages to actually remember from an MRFS
request) is as follows:
 * Prescan the page data to compute a histogram of available-space
   values, with maybe 32 bins.
 * Find the histogram bin whose inclusion would make us exceed the target
   page count.  
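
As a rough sketch of the arithmetic in points 4 through 6 (hypothetical
names, not the eventual freespace.c implementation): with max_fsm_pages =
10000 and MRFS requests totalling 25000 pages, the oversubscription ratio
is 2.5, so a relation that asked to store 1000 pages would get a target of
400, and its cutoff would then be read off a 32-bin histogram of its
free-space values:

#define HISTOGRAM_BINS 32

/* Points 4-5 (sketch): scale a relation's request by the global ratio. */
static int
fsm_target_pages(double total_requested, double max_fsm_pages,
                 int rel_requested)
{
    double  ratio = total_requested / max_fsm_pages;

    if (ratio <= 1.0)
        return rel_requested;               /* map is not oversubscribed */
    return (int) (rel_requested / ratio);   /* scaled-down share */
}

/*
 * Point 6 (sketch): bucket the avail values into a histogram, walk the
 * bins from most free space downward, and stop at the bin whose inclusion
 * would exceed the target.  Pages at or above the returned cutoff are the
 * ones to keep.
 */
static unsigned int
fsm_histogram_cutoff(const unsigned int *avail, int npages, int target,
                     unsigned int max_avail)
{
    int         counts[HISTOGRAM_BINS] = {0};
    unsigned int binwidth = max_avail / HISTOGRAM_BINS + 1;
    int         kept = 0;

    for (int i = 0; i < npages; i++)
        counts[avail[i] / binwidth]++;

    for (int bin = HISTOGRAM_BINS - 1; bin >= 0; bin--)
    {
        if (kept + counts[bin] > target)
            return (unsigned int) (bin + 1) * binwidth;
        kept += counts[bin];
    }
    return 0;       /* the target covers every page */
}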

[HACKERS] - Proposal for repreparing prepared statements

2006-09-13 Thread Stephen Marshall
The following is a proposal for work I'd like to do to force 
long-running backend processes to reprepare their prepared statements.  
It would be used in cases where the user knows they have made a database 
change that will invalidate an existing prepared statement. 


I look forward to comments from the community.

I propose creating a new system administration function to force 
repreparation of prepared statements in all backends.  The functionality 
could be extended to include re-initialization of other kinds of 
per-backend data.


This proposal addresses, to some degree, the prepare-alter-exec issue 
discussed in various mailing list postings, and the following wish-list 
item:


# Invalidate prepared queries, like INSERT, when the table definition is 
altered


However, the solution would only be partial, as it would be the 
responsibility of database clients to call the system administration 
function when needed.  Alternately, additional integration work could be 
done to invoke this logic automatically whenever the columns of any 
table are altered.


--
Here is what I propose:

We define a new system administration function called
pg_reload_per_backend_data.  This function would work much like
pg_reload_conf, i.e. it would require superuser privileges and would
work by sending a signal to the postmaster that would then be propagated
to all the child backends (but not the special ones, like the
bgwriter).  The signal handling logic for the backends would be modified
to respond to the signal by reinitializing any data cached in the
backend's memory space, such as prepared statements.  Each kind of
cached data would require its own reinitialization logic.
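
Sketched below in C, following the flag-and-check pattern the backends
already use for other signals.  The names are hypothetical, and which
signal actually gets installed is the open question discussed next:

#include <signal.h>

extern void RepreparePreparedStatements(void);  /* proposed, sketched below */

/* Set by the handler; acted on later at a safe point in the command loop. */
static volatile sig_atomic_t reload_backend_data_pending = 0;

static void
reload_backend_data_handler(int signo)
{
    (void) signo;                       /* same action for whichever signal */
    reload_backend_data_pending = 1;    /* defer real work out of the handler */
}

static void
check_for_reload_backend_data(void)
{
    if (reload_backend_data_pending)
    {
        reload_backend_data_pending = 0;
        RepreparePreparedStatements();
        /* ...other per-backend caches could be refreshed here... */
    }
}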


Choosing an appropriate signal to send might be difficult, as the list
of available signals is somewhat restricted.  The user-defined signals
would be a natural choice, but it appears SIGUSR1 is used for sinval
catchup events, while SIGUSR2 is used for asynchronous notification.
Use of the real-time signals (signal numbers >= 32) might be possible,
but could have portability problems.  Another alternative would be to
overload SIGHUP, so that it causes both configuration reloads and
reloading of per-backend data.  This makes some sense, since most
configuration parameters are basically a special form of per-backend
data.  However, changing the behavior of an existing signal might have
undesirable side effects.  Overall, I'm very open to suggestions
regarding the appropriate signal to use.


To implement the repreparation logic, a new function called
RepreparePreparedStatements() could be added to source files
backend/commands/prepare.[ch].  This function would be called by a
signal handler installed in the backends within backend/tcop/postgres.c.
RepreparePreparedStatements would do the equivalent of iterating over
the prepared_queries hash table and executing DropPreparedStatement()
and PrepareQuery() on each entry.  However, it is possible that some
refactoring of the logic would be needed to improve performance and make
the code more robust.
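
A very rough sketch of what RepreparePreparedStatements() might look like,
living in backend/commands/prepare.c next to the prepared_queries hash
table and using the backend's standard hash_seq_init()/hash_seq_search()
iteration.  The reprepare_one_statement() helper is hypothetical, and the
details of saving the statement name and query text before dropping and
re-preparing the entry are glossed over:

static void reprepare_one_statement(PreparedStatement *pstmt);  /* hypothetical */

void
RepreparePreparedStatements(void)
{
    HASH_SEQ_STATUS     seq;
    PreparedStatement  *pstmt;

    /* prepared_queries is created lazily; nothing to do before any PREPARE */
    if (prepared_queries == NULL)
        return;

    hash_seq_init(&seq, prepared_queries);
    while ((pstmt = (PreparedStatement *) hash_seq_search(&seq)) != NULL)
    {
        /*
         * Conceptually the equivalent of DropPreparedStatement() followed
         * by PrepareQuery() on the saved query text; a real implementation
         * must also cope with statements that no longer parse after the
         * schema change.
         */
        reprepare_one_statement(pstmt);
    }
}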


The scope of pg_reload_per_backend_data could also be expanded to
include reinitialization of other data that resides in the memory space
of individual backend processes.  An example of such cached entities is
the reusable modules associated with a particular procedural language,
e.g. the TCL modules found in the table pltcl_modules.  Once such a
module is used in a particular backend, it remains held in backend
memory and changes to the on-disk version are not noticed.  There is
also no way to undefine any global variables associated with such
modules.


I have not given much consideration to the implementation for reloading
modules, but doing the equivalent of the SQL command LOAD 'libname'
for all dynamically loaded libraries should have the desired effect (at
least it does for the library that implements the PL/TCL language,
pltcl.so).  Perhaps the general response should be to reload any
libraries that have been dynamically loaded by the particular backend.


--
Here are a few permutations of this plan that could be considered:

1. Bundle pg_reload_per_backend_data functionality with pg_reload_conf.

Pros: Avoids having to find an appropriate unused signal.
 Logical consistency with reloading config, which could be considered a
 special case of reloading per-backend data.
Cons: Changes the behavior of existing functionality, which has the risk of
 unintended side effects.
 Gives less fine-grained control over when per-backend data is reloaded.


2. Break pg_reload_per_backend_data functionality into multiple functions.

Pros: Can assign more descriptive names to the functionality, e.g.
 pg_reload_ddl, pg_reprepare_statements, etc.
 Finer-grained control over which kind of reloading is performed.
Cons: Requires more use of the scarce list of available signals.