Hi,
On 2020-10-28 14:08:52 +0900, Michael Paquier wrote:
> Thanks for confirming. I have gone through the whole set today,
> splitted the thing into two commits and applied them. We had
> buildfarm member florican complain about a mistake in one of the
> GetDatum() calls that I took care of already, and there is nothing
> else on my radar.
The code does IO while holding the buffer mapping lock. That seems
*entirely* unacceptable to me. That basically locks 1/128 of shared
buffers against concurrent mapping changes, while reading data that is
likely not to be on disk? Seriously?
/* see if the block is in the buffer pool or not */
LWLockAcquire(partLock, LW_SHARED);
buf_id = BufTableLookup(&buf_tag, buf_hash);
if (buf_id >= 0)
{
uint32 buf_state;
/*
* Found it. Now, retrieve its state to know what to do with
it, and
* release the pin immediately. We do so to limit overhead as
much as
* possible. We keep the shared LWLock on the target buffer
mapping
* partition for now, so this buffer cannot be evicted, and we
acquire
* an I/O Lock on the buffer as we may need to read its
contents from
* disk.
*/
bufdesc = GetBufferDescriptor(buf_id);
LWLockAcquire(BufferDescriptorGetIOLock(bufdesc), LW_SHARED);
buf_state = LockBufHdr(bufdesc);
UnlockBufHdr(bufdesc, buf_state);
/* If the page is dirty or invalid, skip it */
if ((buf_state & BM_DIRTY) != 0 || (buf_state & BM_TAG_VALID)
== 0)
{
LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
LWLockRelease(partLock);
return true;
}
/* Read the buffer from disk, with the I/O lock still held */
smgrread(smgr, forknum, blkno, buffer);
LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
}
else
{
/*
* Simply read the buffer. There's no risk of modification on
it as
* we are holding the buffer pool partition mapping lock.
*/
smgrread(smgr, forknum, blkno, buffer);
}
The justification in the in-shared-buffers case seems to completely
mis-judge costs too:
* Found it. Now, retrieve its state to know what to do with
it, and
* release the pin immediately. We do so to limit overhead as
much as
* possible. We keep the shared LWLock on the target buffer
mapping
* partition for now, so this buffer cannot be evicted, and we
acquire
* an I/O Lock on the buffer as we may need to read its
contents from
* disk.
a pin is cheap. Holding the partition lock is not.
Also, using char[BLCKSZ] as a buffer isn't ok. This should use
PGAlignedBlock:
/*
* Use this, not "char buf[BLCKSZ]", to declare a field or local variable
* holding a page buffer, if that page might be accessed as a page and not
* just a string of bytes. Otherwise the variable might be under-aligned,
* causing problems on alignment-picky hardware. (In some places, we use
* this to declare buffers even though we only pass them to read() and
* write(), because copying to/from aligned buffers is usually faster than
* using unaligned buffers.) We include both "double" and "int64" in the
* union to ensure that the compiler knows the value must be MAXALIGN'ed
* (cf. configure's computation of MAXIMUM_ALIGNOF).
*/
typedef union PGAlignedBlock
I think this needs to be quickly reworked or reverted.
Greetings,
Andres Freund