Re: Separate HEAP WAL replay logic into its own file

2024-06-19 Thread Li, Yong


> On Jun 18, 2024, at 20:42, Melanie Plageman  wrote:
> 
> External Email
> 
> On Mon, Jun 17, 2024 at 9:12 PM Li, Yong  wrote:
>> 
>> As a newcomer, when I was walking through the code looking for WAL replay 
>> related code, it was relatively easy for me to find them for the B-Tree 
>> access method because of the “xlog” hint in the file names. It took me a 
>> while to find the same for the heap access method. When I finally found them 
>> (via text search), it was a small surprise. Having different file 
>> organizations for different access methods gives me this urge to make 
>> everything consistent. I think it will make it easier for newcomers, and it 
>> will reduce the mental load for everyone to remember that heap replay is 
>> inside the heapam.c not some “???xlog.c”.
> 
> That makes sense. The branch for PG18 has not been cut yet, so I
> recommend registering this patch for the July commitfest [1] so it
> doesn't get lost.
> 
> - Melanie
> 

Thanks for the positive feedback. I’ve added the patch to the July CF.

Yong



Re: Separate HEAP WAL replay logic into its own file

2024-06-17 Thread Li, Yong


> On Jun 17, 2024, at 23:01, Melanie Plageman  wrote:
> 
> External Email
> 
> On Mon, Jun 17, 2024 at 2:20 AM Li, Yong  wrote:
>> 
>> Hi PostgreSQL hackers,
>> 
>> For most access methods in PostgreSQL, the implementation of the access 
>> method itself and the implementation of its WAL replay logic are organized 
>> in separate source files.  However, the HEAP access method is an exception.  
>> Both the access method and the WAL replay logic are collocated in the same 
>> heapam.c.  To follow the pattern established by other access methods and to 
>> improve maintainability, I made the enclosed patch to separate HEAP’s replay 
>> logic into its own file.  The changes are straightforward.  Move the replay 
>> related functions into the new heapam_xlog.c file, push the common 
>> heap_execute_freeze_tuple() helper function into the heapam.h header, and 
>> adjust the build files.
> 
> I'm not against this change, but I am curious at what inspired this.
> Were you looking at Postgres code and simply noticed that there isn't
> a heapam_xlog.c (like there is a nbtxlog.c etc) and thought that you
> wanted to change that? Or is there some specific reason this would
> help you as a Postgres developer, user, or ecosystem member?
> 
> - Melanie

As a newcomer, when I was walking through the code looking for WAL replay 
related code, it was relatively easy for me to find them for the B-Tree access 
method because of the “xlog” hint in the file names. It took me a while to find 
the same for the heap access method. When I finally found them (via text 
search), it was a small surprise. Having different file organizations for 
different access methods gives me this urge to make everything consistent. I 
think it will make it easier for newcomers, and it will reduce the mental load 
for everyone to remember that heap replay is inside the heapam.c not some 
“???xlog.c”.

Yong

Separate HEAP WAL replay logic into its own file

2024-06-17 Thread Li, Yong
Hi PostgreSQL hackers,

For most access methods in PostgreSQL, the implementation of the access method 
itself and the implementation of its WAL replay logic are organized in separate 
source files.  However, the HEAP access method is an exception.  Both the 
access method and the WAL replay logic are collocated in the same heapam.c.  To 
follow the pattern established by other access methods and to improve 
maintainability, I made the enclosed patch to separate HEAP’s replay logic into 
its own file.  The changes are straightforward.  Move the replay related 
functions into the new heapam_xlog.c file, push the common 
heap_execute_freeze_tuple() helper function into the heapam.h header, and 
adjust the build files.

I hope people find this straightforward refactoring helpful.


Yong





heapam_refactor.patch
Description: heapam_refactor.patch


Re: Proposal to add page headers to SLRU pages

2024-06-13 Thread Li, Yong


> On Jun 10, 2024, at 16:01, Michael Paquier  wrote:
> 
> External Email
> 
> From: Michael Paquier 
> Subject: Re: Proposal to add page headers to SLRU pages
> Date: June 10, 2024 at 16:01:50 GMT+8
> To: Bertrand Drouvot 
> Cc: "Li, Yong" , Jeff Davis , Aleksander 
> Alekseev , PostgreSQL Hackers 
> , "Bagga, Rishu" , 
> Robert Haas , Andrey Borodin , 
> "Shyrabokau, Anton" 
> 
> 
> On Mon, Jun 10, 2024 at 07:19:56AM +0000, Bertrand Drouvot wrote:
>> On Tue, Mar 19, 2024 at 06:48:33AM +, Li, Yong wrote:
>>> Unfortunately, the test requires a setup of two different versions of PG. I 
>>> am not
>>> aware of an existing test infrastructure which can run automated tests 
>>> using two
>>> PGs. I did manually verify the output of pg_upgrade.
>> 
>> I think there is something in t/002_pg_upgrade.pl (see 
>> src/bin/pg_upgrade/TESTING),
>> that could be used to run automated tests using an old and a current version.
> 
> Cluster.pm relies on install_path for stuff, where it is possible to
> create tests with multiple nodes pointing to different installation
> paths.  This allows mixing nodes with different build options, or just
> different major versions like pg_upgrade's perl tests.
> —
> Michael
> 
> 

Thanks for pointing this out. Here is what I have tried:
1. Manually build and install PostgreSQL from the latest source code.
2. Following the instructions from src/bin/pg_upgrade to manually dump the 
regression database.
3. Apply the patch to the latest code, and build from the source.
4. Run “make check” by following the instructions from src/bin/pg_upgrade and 
setting up the olddump and oldinstall to point to the “old” installation used 
in step 2.

All tests pass.


Yong



Re: Proposal to add page headers to SLRU pages

2024-03-19 Thread Li, Yong

>> - New comments have been added to pg_upgrade to mention the SLRU
>>  page header change as the reason for upgrading clog files.
> 
> That seems reasonable, but were any alternatives discussed? Do we have
> consensus that this is the right thing to do?

In general, there are two approaches. Either we convert the existing clog files,
or we don’t.  The patch chooses to convert.

If we don’t, then the clog file code must be able to handle both formats. For,
XIDs in the range where the clog is written in the old format, segment and 
offset
computation must be done in one way, and for XIDs in a different range, it must
be computed in a different way.  To avoid changing the format in the middle of a
page, which must not happen, the new format must start from a clean page, 
possibly in a clean new segment.  If the database is extremely small and has 
only
a few transactions on the first page of clog, then we must either convert the 
whole
page (effectively the whole clog file), or we must skip the rest of the XIDs on 
the
page and ask the database to start from XIDs on the second page on restart.
Also, we need to consider where to store the cut-off XID and when to remove it.
All these details feel very complex and error prone to me.  Performing a 
one-time
conversion is the most efficient and straightforward approach to me. 

> 
> And if we use this approach, is there extra validation or testing that
> can be done?
> 
> Regards,
>Jeff Davis

Unfortunately, the test requires a setup of two different versions of PG. I am 
not
aware of an existing test infrastructure which can run automated tests using two
PGs. I did manually verify the output of pg_upgrade. 


Regards,
Yong



Re: Proposal to add page headers to SLRU pages

2024-03-11 Thread Li, Yong

> On Mar 9, 2024, at 05:22, Jeff Davis  wrote:
> 
> External Email
> 
> On Wed, 2024-03-06 at 12:01 +, Li, Yong wrote:
>> Rebase the patch against the latest HEAD.
> 
> The upgrade logic could use more comments explaining what's going on
> and why. As I understand it, it's a one-time conversion that needs to
> happen between 16 and 17. Is that right?
> 
> Regards,
>Jeff Davis
> 

> In the new code we effectively store only one LSN per page, which I
> understand is strictly worse.  Maybe the idea of doing away with these
> LSN groups should be reconsidered ... unless I completely misunderstand
> the whole thing.
> 
> --
> Álvaro Herrera PostgreSQL Developer  —


Thanks for the comments on LSN groups and pg_upgrade.

I have updated the patch to address both comments:
- The clog LSN group has been brought back.
  Now the page LSN on each clog page is used for honoring the write-ahead rule
  and it is always the highest LSN of all the LSN groups on the page.
  The LSN groups are used by TransactionIdGetStatus() as before.
- New comments have been added to pg_upgrade to mention the SLRU
  page header change as the reason for upgrading clog files.

Regards,
Yong

slru_page_header_v6.patch
Description: slru_page_header_v6.patch


Re: Proposal to add page headers to SLRU pages

2024-03-07 Thread Li, Yong

> On Mar 7, 2024, at 03:09, Stephen Frost  wrote:
> 
> External Email
> 
> From: Stephen Frost 
> Subject: Re: Proposal to add page headers to SLRU pages
> Date: March 7, 2024 at 03:09:59 GMT+8
> To: Alvaro Herrera 
> Cc: "Li, Yong" , Aleksander Alekseev 
> , PostgreSQL Hackers 
> , "Bagga, Rishu" , 
> Robert Haas , "Debnath, Shawn" , Andrey 
> Borodin , "Shyrabokau, Anton" 
> 
> 
> Greetings,
> 
> * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
>> I suppose this is important to do if we ever want to move SLRUs into
>> shared buffers.  However, I wonder about the extra time this adds to
>> pg_upgrade.  Is this something we should be concerned about?  Is there
>> any measurement/estimates to tell us how long this would be?  Right now,
>> if you use a cloning strategy for the data files, the upgrade should be
>> pretty quick ... but the amount of data in pg_xact and pg_multixact
>> could be massive, and the rewrite is likely to take considerable time.
> 
> While I definitely agree that there should be some consideration of
> this concern, it feels on-par with the visibility-map rewrite which was
> done previously.  Larger systems will likely have more to deal with than
> smaller systems, but it's still a relatively small portion of the data
> overall.
> 
> The benefit of this change, beyond just the possibility of moving them
> into shared buffers some day in the future, is that this would mean that
> SLRUs will have checksums (if the cluster has them enabled).  That
> benefit strikes me as well worth the cost of the rewrite taking some
> time and the minor loss of space due to the page header.
> 
> Would it be useful to consider parallelizing this work?  There's already
> parts of pg_upgrade which can be parallelized and so this isn't,
> hopefully, a big lift to add, but I'm not sure if there's enough work
> being done here CPU-wise, compared to the amount of IO being done, to
> have it make sense to run it in parallel.  Might be worth looking into
> though, at least, as disks have gotten to be quite fast.
> 
> Thanks!
> 
> Stephen
> 


Thank Alvaro and Stephen for your thoughtful comments.

I did a quick benchmark regarding pg_upgrade time, and here are the results.

Hardware spec:
MacBook Pro M1 Max - 10 cores, 64GB memory, 1TB Apple SSD

Operating system:
macOS 14.3.1

Complier:
Apple clang 15.0.0

Compiler optimization level: -O2


PG setups:
Old cluster:  PG 16.2 release (source build)
New cluster:  PG Git HEAD plus the patch (source build)


Benchmark steps:

1. Initdb for PG 16.2.
2. Initdb for PG HEAD.
3. Run pg_upgrade on the above empty database, and time the overall wall clock 
time.
4. In the old cluster, write 512MB all-zero dummy segment files (2048 segments) 
under pg_xact.
5. In the old cluster, write 512MB all-zero dummy segment files under 
pg_multixact/members.
6. In the old cluster, write 512MB all-zero dummy segment files under 
pg_multixact/offsets.
7. Purge the OS page cache.
7. Run pg_upgrade again, and time the overall wall clock time.


Test result:

On the empty database, pg_upgrade took 4.8 seconds to complete.

With 1.5GB combined SLRU data to convert, pg_upgrade took 11.5 seconds to 
complete.

It took 6.7 seconds to convert 1.5GB SLRU files for pg_upgrade.



For clog, 2048 segments can host about 2 billion transactions, right at the 
limit for wraparound.
That’s the maximum we can have.  2048 segments are also big for pg_multixact 
SLRUs.

Therefore, on a modern hardware, in the worst case, pg_upgrade will run for 7 
seconds longer.


Regards,

Yong

Re: Proposal to add page headers to SLRU pages

2024-03-06 Thread Li, Yong
Rebase the patch against the latest HEAD.

Regards,
Yong





slru_page_header_v5.patch
Description: slru_page_header_v5.patch


Re: locked reads for atomics

2024-01-16 Thread Li, Yong


> On Nov 28, 2023, at 05:00, Nathan Bossart  wrote:
> 
> External Email
> 
> Here's a v2 of the patch set in which I've attempted to address all
> feedback.  I've also added a pg_write_membarrier_u* pair of functions that
> provide an easy way to write to an atomic variable with full barrier
> semantics.  In the generic implementation, these are just aliases for an
> atomic exchange.
> 
> 0002 demonstrates how these functions might be used to eliminate the
> arch_lck spinlock, which is only ever used for one boolean variable.  My
> hope is that the membarrier functions make eliminating spinlocks for
> non-performance-sensitive code easy to reason about.
> 
> (We might be able to use a pg_atomic_flag instead for 0002, but that code
> seems intended for a slightly different use-case and has more complicated
> barrier semantics.)
> 
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com

Hi Nathan,

The patch looks good to me.

The patch adds two pairs of atomic functions that provide full-barrier 
semantics to atomic read/write operations. The patch also includes an example 
of how this new functions can be used to replace spin locks.

The patch applies cleanly to HEAD. “make check-world” also runs cleanly with no 
error.  I am moving it to Ready for Committers.

Regards,
Yong

Re: Proposal to add page headers to SLRU pages

2024-01-16 Thread Li, Yong
Rebase the patch against the latest HEAD.

Regards,
Yong



slru_page_header_v4.patch
Description: slru_page_header_v4.patch


Re: archive modules loose ends

2024-01-15 Thread Li, Yong


> On Nov 29, 2023, at 01:18, Nathan Bossart  wrote:
> 
> External Email
> 
> Here is a new version of the patch with feedback addressed.
> 
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com

Hi Nathan,

The patch looks good to me.  With the context explained in the thread, the 
patch is easy to understand.
The patch serves as a refactoring which pulls up common memory management and 
error handling concerns into the pgarch.c.  With the patch, individual archive 
callbacks can focus on copying the files and leave the boilerplate code to 
pgarch.c. 

The patch applies cleanly to HEAD.  “make check-world” also runs cleanly with 
no error.


Regards,
Yong

Re: Proposal to add page headers to SLRU pages

2024-01-04 Thread Li, Yong


> On Jan 2, 2024, at 19:35, Aleksander Alekseev  
> wrote:
>
> Thanks for the updated patch.
>
> cfbot seems to have some complaints regarding compiler warnings and
> also building the patch on Windows:
>
> http://cfbot.cputube.org/

Thanks for the information.  Here is the updated patch.

Regards,
Yong



slru_page_header_v3.patch
Description: slru_page_header_v3.patch


Re: Proposal to add page headers to SLRU pages

2023-12-18 Thread Li, Yong
> This work is being done in file.c – it seems to me the proper way to
> proceed would be to continue writing on-disk upgrade logic here.

> Besides that this looks good to me, would like to hear what others have to 
> say.

Thank you, Rishu for taking time to review the code.  I've updated the patch
and moved the on-disk upgrade logic to pg_upgrade/file.c.

I have also added this thread to the current Commitfest and hope this patch
will be  part of the 17 release.

The commitfest link:
https://commitfest.postgresql.org/46/4709/


Regards,
Yong,




slur_page_header_v2.patch
Description: slur_page_header_v2.patch


Re: Proposal to add page headers to SLRU pages

2023-12-08 Thread Li, Yong
Given so many different approaches were discussed, I have started a wiki to 
record and collaborate all efforts towards SLRU improvements.  The wiki 
provides a concise overview of all the ideas discussed and can serve as a 
portal for all historical discussions.  Currently, the wiki summarizes four 
recent threads ranging from identifier format change to page header change, to 
moving SLRU into the main buffer pool, to reduce lock contention on SLRU 
latches.  We can keep the patch related discussions in this thread and use the 
wiki as a live document for larger scale collaborations.

The wiki page is here:  https://wiki.postgresql.org/wiki/SLRU_improvements

Regarding the benefits of this patch, here is a detailed explanation:

  1.  Checksum is added to each page, allowing us to verify if a page has been 
corrupted when read from the disk.
  2.  The ad-hoc LSN group structure is removed from the SLRU cache control 
data and is replaced by the page LSN in the page header. This allows us to use 
the same WAL protocol as used by pages in the main buffer pool: flush all redo 
logs up to the page LSN before flushing the page itself. If we move SLRU caches 
into the main buffer pool, this change fits naturally.
  3.  It leaves further optimizations open. We can continue to pursue the goal 
of moving SLRU into the main buffer pool, or we can follow the lock partition 
idea. This change by itself does not conflict with either proposal.

Also, the patch is now complete and is ready for review.  All check-world tests 
including tap tests passed with this patch.


Regards,
Yong

From: Robert Haas 
Date: Friday, December 8, 2023 at 03:51
To: Debnath, Shawn 
Cc: Andrey Borodin , PostgreSQL Hackers 
, Aleksander Alekseev 
, Li, Yong , Shyrabokau, Anton 
, Bagga, Rishu 
Subject: Re: Proposal to add page headers to SLRU pages
External Email

On Thu, Dec 7, 2023 at 1:28 PM Debnath, Shawn  wrote:
> What is being proposed now is the simple and core functionality of introducing
> page headers to SLRU pages while continuing to be in the SLRU cache. This
> allows the whole project to be iterative and reviewers to better reason about
> the smaller set of changes being introduced into the codebase.
>
> Once the set of on-disk changes are in, we can follow up on optimizations.
> It may be moving to buffer cache or reviewing Dilip's approach in [1], we
> will have the option to be flexible in our approach.

I basically agree with this. I don't think we should let the perfect
be the enemy of the good. Shooting down this patch because it doesn't
do everything that we want is a recipe for getting nothing done at
all.

That said, I don't think that the original post on this thread
provides a sufficiently clear and detailed motivation for making this
change. For this to eventually be committed, it's going to need (among
other things) a commit message that articulates a convincing rationale
for whatever changes it makes. Here's what the original email said:

> It adds a checksum to each SLRU page, tracks page LSN as if it is a standard 
> page and eases future page enhancements.

Of those three things, in my opinion, the first is good and the other
two are too vague. I assume that most people who would be likely to
read a commit message would understand the value of pages having
checksums. But I can't immediately think of what the value of tracking
the page LSN as if it were a standard page might be, so that probably
needs more explanation. Similarly, at least one or two of the future
page enhancements that might be eased should be spelled out, and/or
the ways in which they would be made easier should be articulated.

--
Robert Haas
EDB: 
https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.enterprisedb.com%2F=05%7C01%7Cyoli%40ebay.com%7C2cad2fe1de8a40f3167608dbf75de73c%7C46326bff992841a0baca17c16c94ea99%7C0%7C0%7C638375754901646398%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C=hkccuGfpt1%2BKxuhk%2BJt%2F3HyYuJqQHYfizib76%2F9HtUU%3D=0<http://www.enterprisedb.com/>


slru_page_header_v1.patch
Description: slru_page_header_v1.patch


Proposal to add page headers to SLRU pages

2023-12-06 Thread Li, Yong
Hi all,

PostgreSQL currently maintains several data structures in the SLRU cache.  The 
current SLRU pages do not have any header, so it is impossible to checksum a 
page and verify its integrity.  It is very difficult to debug issues caused by 
corrupted SLRU pages.  Also, without a page header, page LSN is tracked in an 
ad-hoc fashion using LSN groups, which requires additional data structure in 
the shared memory.  At eBay, we are building on the patch shared by Rishu Bagga 
in [1], which adds the standard PageHeaderData to each SLRU page.  We believe 
that adding the standard page header to each SLRU page is the correct approach 
for the long run.  It adds a checksum to each SLRU page, tracks page LSN as if 
it is a standard page and eases future page enhancements.

The enclosed patch changes the address calculation logic for all 7 SLRUs in the 
following 6 files:
src/backend/access/transam/clog.c
src/backend/access/transam/commit_ts.c
src/backend/access/transam/multixact.c
src/backend/access/transam/subtrans.c
src/backend/commands/async.c
src/backend/storage/lmgr/predicate.c

The patch enables page checksum with changes to the following 2 files:
src/backend/access/transam/slru.c
src/bin/pg_checksums/pg_checksums.c

The patch removes the group LSNs defined for each SLRU cache. See changes to:
src/include/access/slru.h

The patch adds a few helper macros in the following files:
src/backend/storage/page/bufpage.c
src/include/storage/bufpage.h

The patch updates some test cases:
src/bin/pg_resetwal/t/001_basic.pl
src/test/modules/test_slru/test_slru.c

I am still working on patching the pg_upgrade.  Just love to hear your thoughts 
on the idea and the current patch.


Discussed with: Anton Shyrabokau and Shawn Debnath

[1] 
https://www.postgresql.org/message-id/flat/EFAAC0BE-27E9-4186-B925-79B7C696D5AC%40amazon.com


Regards,
Yong




slru_add_page_header.patch
Description: slru_add_page_header.patch