Hi Thomas,

On Friday, December 28, 2018 6:43 AM Thomas Munro 
<thomas.mu...@enterprisedb.com> wrote:
> [...]if you have ideas about the validity of the assumptions, the reason it 
> breaks initdb, or any other aspect of this approach (or alternatives), please 
> don't let me stop you, and of course please feel free to submit this, an 
> improved version or an alternative proposal [...]

Sure, thanks. I'd like to try working on the idea. I also took a look at the 
code, and I hope you don't mind if I ask for clarification 
(explanation/advice/opinions) on the following, since my PostgreSQL experience 
is not yet substantial.

(1) I noticed that you store a "relsize_change_counter" in MdSharedData. Is my 
understanding below correct?

The intention is to cache the relation size per backend (lock-free), using 
relsize_change_counter (or an array of them) to skip the lseek syscall when 
the counter has not changed.
In _mdnblocks(), if the counter has not changed, the cached relation size 
(nblocks) is used and the call to FileSize() (an lseek to get and cache the 
relation size) is skipped. That means that in current Postgres master, the 
lseek syscall (through FileSize()) is issued every time we want to get the 
relation size (nblocks).
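
To make sure I am describing the same thing, here is a rough standalone sketch 
of how I read the mechanism. It is not the patch code: C11 atomics stand in 
for pg_atomic_uint32, sketch_file_size() stands in for FileSize()/lseek, and 
every Sketch*/sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MdSharedData (would live in shared memory). */
typedef struct SketchShared
{
    atomic_uint relsize_change_counter;
} SketchShared;

/* Hypothetical per-backend cache of one relation's size. */
typedef struct SketchCache
{
    uint32_t nblocks;       /* cached relation size in blocks */
    uint32_t counter_seen;  /* counter value when nblocks was cached */
    int      valid;         /* 0 until the first refresh */
} SketchCache;

static uint32_t sketch_file_blocks = 0; /* simulated on-disk size */
static int      sketch_lseek_calls = 0; /* counts simulated lseek calls */

/* Assumed stand-in for FileSize(): the expensive call we want to skip. */
static uint32_t
sketch_file_size(void)
{
    sketch_lseek_calls++;
    return sketch_file_blocks;
}

/* Extending (or truncating) changes the file, then bumps the counter. */
static void
sketch_extend(SketchShared *sh, uint32_t add)
{
    sketch_file_blocks += add;
    atomic_fetch_add(&sh->relsize_change_counter, 1);
}

/* Return the cached size unless the shared counter has moved. */
static uint32_t
sketch_nblocks(SketchShared *sh, SketchCache *c)
{
    uint32_t now;

    assert(sh != NULL && c != NULL);

    /* Read the counter before the size, so a concurrent change can only
     * cause an extra refresh later, never a stale hit. */
    now = atomic_load(&sh->relsize_change_counter);
    if (c->valid && c->counter_seen == now)
        return c->nblocks;

    c->nblocks = sketch_file_size();
    c->counter_seen = now;
    c->valid = 1;
    return c->nblocks;
}
```

In this sketch, repeated calls to sketch_nblocks() reach the simulated lseek 
only on the first call and after each sketch_extend().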

On the other hand, the simplest method I could think of that might also work 
is to cache only the file size (nblocks) in shared memory, not in the backend 
process, since both nblocks and relsize_change_counter are uint32 anyway. If 
relsize_change_counter can be changed without a lock, then nblocks can be 
changed without a lock too, is that right? In that case, nblocks could be read 
directly from shared memory. If so, does the relation size still need to be 
cached in the backend?
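
A minimal sketch of that alternative, under the assumption that there is one 
shared slot per relation (the struct quoted below has only a single counter, 
so this layout is my assumption, not something in the patch). C11 atomics 
again stand in for pg_atomic_uint32, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared-memory slot holding one relation's size directly. */
typedef struct SketchSharedSize
{
    _Atomic uint32_t nblocks;
} SketchSharedSize;

/* Publish a new size, e.g. after a truncate. */
static void
sketch_set_nblocks(SketchSharedSize *s, uint32_t nblocks)
{
    assert(s != NULL);
    atomic_store(&s->nblocks, nblocks);
}

/* Grow the size by 'add' blocks and return the new size. */
static uint32_t
sketch_extend_by(SketchSharedSize *s, uint32_t add)
{
    assert(s != NULL);
    return atomic_fetch_add(&s->nblocks, add) + add;
}

/* Any backend reads the size with a single atomic load, no lseek. */
static uint32_t
sketch_get_nblocks(SketchSharedSize *s)
{
    assert(s != NULL);
    return atomic_load(&s->nblocks);
}
```

Here there is no per-backend copy at all; whether a plain atomic load is 
enough in practice is exactly the question I am asking above.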

(2) Is MdSharedData temporary or permanent in shared memory?
From the patch:
 typedef struct MdSharedData
 {
        /* XXX could have an array of these, and use rel OID % nelements? */ 
        pg_atomic_uint32        relsize_change_counter;
 } MdSharedData;
 
 static MdSharedData *MdShared;

What I intend to have is a permanent hash table that keeps the file size (and 
eventually, in future development, table addresses) in shared memory for 
faster access by backend processes. The idea is to keep track of the 
unallocated blocks, based on how much the relation has been extended or 
truncated. Memory for this hash table will be dynamically allocated.
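
To illustrate the shape of what I have in mind, here is a toy standalone 
sketch: a dynamically allocated, open-addressing table mapping a relation 
identifier to its size in blocks. calloc() stands in for whatever shared 
memory allocation would really be used, no locking or removal is shown, and 
all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: relation identifier -> size in blocks. */
typedef struct SketchEntry
{
    uint32_t rel_id;
    uint32_t nblocks;
    int      used;
} SketchEntry;

typedef struct SketchTable
{
    SketchEntry *slots;
    uint32_t     nslots;
} SketchTable;

/* Allocate an empty table; returns NULL on allocation failure. */
static SketchTable *
sketch_table_create(uint32_t nslots)
{
    SketchTable *t = malloc(sizeof *t);

    if (t == NULL)
        return NULL;
    t->slots = calloc(nslots, sizeof *t->slots);
    if (t->slots == NULL)
    {
        free(t);
        return NULL;
    }
    t->nslots = nslots;
    return t;
}

static void
sketch_table_destroy(SketchTable *t)
{
    free(t->slots);
    free(t);
}

/* Linear probing; returns NULL if absent (or if full when creating). */
static SketchEntry *
sketch_find(SketchTable *t, uint32_t rel_id, int create)
{
    uint32_t start;

    assert(t != NULL && t->nslots > 0);
    start = rel_id % t->nslots;
    for (uint32_t i = 0; i < t->nslots; i++)
    {
        SketchEntry *e = &t->slots[(start + i) % t->nslots];

        if (e->used && e->rel_id == rel_id)
            return e;
        if (!e->used)
        {
            if (!create)
                return NULL;
            e->used = 1;
            e->rel_id = rel_id;
            e->nblocks = 0;
            return e;
        }
    }
    return NULL;
}

/* Record the size after an extend or truncate; -1 if the table is full. */
static int
sketch_set_size(SketchTable *t, uint32_t rel_id, uint32_t nblocks)
{
    SketchEntry *e = sketch_find(t, rel_id, 1);

    if (e == NULL)
        return -1;
    e->nblocks = nblocks;
    return 0;
}

/* Look up a size; returns 1 and fills *nblocks if the relation is known. */
static int
sketch_get_size(SketchTable *t, uint32_t rel_id, uint32_t *nblocks)
{
    SketchEntry *e = sketch_find(t, rel_id, 0);

    if (e == NULL)
        return 0;
    *nblocks = e->nblocks;
    return 1;
}
```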

Thanks, 
Kirk Jamison
