Re: WIP: Avoid creation of the free space map for small tables
On Fri, Mar 15, 2019 at 3:25 PM Amit Kapila wrote:
> I have committed the latest version of this patch. I think we can
> wait for a day or two to see if there are any complaints from the
> buildfarm or in general, and then we can close this CF entry.

Closed this CF entry.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Mar 15, 2019 at 3:40 PM John Naylor wrote:
> On Fri, Mar 15, 2019 at 5:55 PM Amit Kapila wrote:
> > I have committed the latest version of this patch. I think we can
> > wait for a day or two to see if there are any complaints from the
> > buildfarm or in general, and then we can close this CF entry. IIRC,
> > this was the last patch in the series, right?
>
> Great, thanks! I'll keep an eye on the buildfarm as well.
>
> I just spotted two comments in freespace.c that were true during
> earlier patch revisions, but are no longer true, so I've attached a
> fix for those.

LGTM, so committed.

> There are no other patches in the series.

Okay.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Mar 15, 2019 at 5:55 PM Amit Kapila wrote:
> I have committed the latest version of this patch. I think we can
> wait for a day or two to see if there are any complaints from the
> buildfarm or in general, and then we can close this CF entry. IIRC,
> this was the last patch in the series, right?

Great, thanks! I'll keep an eye on the buildfarm as well.

I just spotted two comments in freespace.c that were true during
earlier patch revisions, but are no longer true, so I've attached a
fix for those.

There are no other patches in the series.

--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: correct-local-map-comments.patch
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Mar 14, 2019 at 7:46 PM Amit Kapila wrote:
> On Thu, Mar 14, 2019 at 12:37 PM John Naylor wrote:
> > On Thu, Mar 14, 2019 at 2:17 PM Amit Kapila wrote:
> > > 1. Added an Assert in new_cluster_needs_fsm() that the old cluster
> > > version should be >= 804, as that is where FSM support was added.
> >
> > There is already an explicit check for 804 in the caller,
>
> Yeah, I know that, but I added it to prevent this function from being
> used elsewhere. OTOH, maybe you are right that as per the current code
> it is superfluous, so we shouldn't add this assert.

I have committed the latest version of this patch. I think we can
wait for a day or two to see if there are any complaints from the
buildfarm or in general, and then we can close this CF entry. IIRC,
this was the last patch in the series, right?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Mar 14, 2019 at 12:37 PM John Naylor wrote:
> On Thu, Mar 14, 2019 at 2:17 PM Amit Kapila wrote:
> > 1. Added an Assert in new_cluster_needs_fsm() that the old cluster
> > version should be >= 804, as that is where FSM support was added.
>
> There is already an explicit check for 804 in the caller,

Yeah, I know that, but I added it to prevent this function from being
used elsewhere. OTOH, maybe you are right that as per the current code
it is superfluous, so we shouldn't add this assert.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Mar 14, 2019 at 2:17 PM Amit Kapila wrote:
> 1. Added an Assert in new_cluster_needs_fsm() that the old cluster
> version should be >= 804, as that is where FSM support was added.

There is already an explicit check for 804 in the caller, and the HEAD
code is already resilient to FSMs not existing, so I think this is
superfluous.

> 2. Reverted the old cluster version check to <= 1100. There was
> nothing wrong in the way you had written the check, but I find that
> most of the other places in the code use that style.
> 3. At one place, changed the function header to be consistent with the
> nearby code, ran pgindent on the code, and changed the commit message.

Looks good to me, thanks.

--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Mar 14, 2019 at 7:08 AM John Naylor wrote:
> > [segfault problems]
>
> This now seems spurious. I ran make distclean, git pull, reapplied the
> patch (leaving out the gettimeofday() calls), and now my upgrade perf
> test works with default compiler settings. Not sure what happened, but
> hopefully we can move forward.

Yeah. I took another pass through the patch and changed minor things,
as you can see in the attached patch.

1. Added an Assert in new_cluster_needs_fsm() that the old cluster
version should be >= 804, as that is where FSM support was added.
2. Reverted the old cluster version check to <= 1100. There was
nothing wrong in the way you had written the check, but I find that
most of the other places in the code use that style.
3. At one place, changed the function header to be consistent with the
nearby code, ran pgindent on the code, and changed the commit message.

Let me know what you think.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment: v24-0001-During-pg_upgrade-conditionally-skip-transfer-of-FSM.patch
Re: WIP: Avoid creation of the free space map for small tables
> [segfault problems]

This now seems spurious. I ran make distclean, git pull, reapplied the
patch (leaving out the gettimeofday() calls), and now my upgrade perf
test works with default compiler settings. Not sure what happened, but
hopefully we can move forward.

--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Mar 13, 2019 at 7:42 PM John Naylor wrote:
> On Wed, Mar 13, 2019 at 8:18 PM Amit Kapila wrote:
> > > First, I had a problem: on macOS with their "gcc" wrapper around
> > > clang, I got a segfault 11 when compiled with no debugging symbols.
> >
> > Did you get this problem with the patch, or both with and without the
> > patch? If it is only with the patch, then we definitely need to
> > investigate.
>
> Only with the patch.

If the problem is reproducible, then I think we can try out a few
things to narrow it down or get some clue about the problem:
(a) modify new_cluster_needs_fsm() so that it always returns true as
its first statement. This will tell us whether the code in that
function has a problem.
(b) run under Valgrind.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Mar 13, 2019 at 8:18 PM Amit Kapila wrote:
> > First, I had a problem: on macOS with their "gcc" wrapper around
> > clang, I got a segfault 11 when compiled with no debugging symbols.
>
> Did you get this problem with the patch, or both with and without the
> patch? If it is only with the patch, then we definitely need to
> investigate.

Only with the patch.

--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Mar 13, 2019 at 4:57 PM John Naylor wrote:
> On Fri, Mar 8, 2019 at 7:43 PM Amit Kapila wrote:
> > Have you done any performance testing of this patch? I mean to say
> > now that we added a new stat call for each table, we should see if
> > that has any impact. Ideally, that should be compensated by the fact
> > that we are now not transferring *fsm files for small relations.
>
> To be precise, it will only call stat if pg_class.relpages is below
> the threshold. I suppose I could hack a database where all the
> relpages values are wrong, but that seems like a waste of time.

Right.

> > How about constructing a test where all relations are greater than 4
> > pages and then trying to upgrade them. We can check for a cluster
> > with a different number of relations, say 10K, 20K, 50K, 100K.
>
> I did both greater and less than 4 pages for 10k relations. Since
> pg_upgrade is O(# relations), I don't see a point in going higher.
>
> First, I had a problem: on macOS with their "gcc" wrapper around
> clang, I got a segfault 11 when compiled with no debugging symbols.

Did you get this problem with the patch, or both with and without the
patch? If it is only with the patch, then we definitely need to
investigate.

> I added "CFLAGS=-O0" and it worked fine. Since this doesn't happen in
> a debugging build, I'm not sure how to investigate this. IIRC, this
> doesn't happen for me on Linux gcc.
>
> Since it was at least running now, I measured by putting
> gettimeofday() calls around transfer_all_new_tablespaces(). I did 10
> runs each and took the average, except for the patch/1-page case,
> since it was obviously faster after a couple of runs.
>
> 5 pages:
> master   patch
> 5.59s    5.64s
>
> The variation within the builds is up to +/- 0.2s, so there is no
> difference, as expected.
>
> 1 page:
> master   patch
> 5.62s    4.25s
>
> Clearly, linking is much slower than stat.

The results are fine. Thanks for doing the tests.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Mar 8, 2019 at 7:43 PM Amit Kapila wrote:
> Have you done any performance testing of this patch? I mean to say
> now that we added a new stat call for each table, we should see if
> that has any impact. Ideally, that should be compensated by the fact
> that we are now not transferring *fsm files for small relations.

To be precise, it will only call stat if pg_class.relpages is below
the threshold. I suppose I could hack a database where all the
relpages values are wrong, but that seems like a waste of time.

> How about constructing a test where all relations are greater than 4
> pages and then trying to upgrade them. We can check for a cluster
> with a different number of relations, say 10K, 20K, 50K, 100K.

I did both greater and less than 4 pages for 10k relations. Since
pg_upgrade is O(# relations), I don't see a point in going higher.

First, I had a problem: on macOS with their "gcc" wrapper around
clang, I got a segfault 11 when compiled with no debugging symbols. I
added "CFLAGS=-O0" and it worked fine. Since this doesn't happen in a
debugging build, I'm not sure how to investigate this. IIRC, this
doesn't happen for me on Linux gcc.

Since it was at least running now, I measured by putting
gettimeofday() calls around transfer_all_new_tablespaces(). I did 10
runs each and took the average, except for the patch/1-page case,
since it was obviously faster after a couple of runs.

5 pages:
master   patch
5.59s    5.64s

The variation within the builds is up to +/- 0.2s, so there is no
difference, as expected.

1 page:
master   patch
5.62s    4.25s

Clearly, linking is much slower than stat.

--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Sun, Mar 10, 2019 at 7:47 PM John Naylor wrote:
> On Fri, Mar 8, 2019 at 7:43 PM Amit Kapila wrote:
> > Have you done any performance testing of this patch? I mean to say
> > now that we added a new stat call for each table, we should see if
> > that has any impact. Ideally, that should be compensated by the fact
> > that we are now not transferring *fsm files for small relations. How
> > about constructing a test where all relations are greater than 4
> > pages and then trying to upgrade them. We can check for a cluster
> > with a different number of relations, say 10K, 20K, 50K, 100K.
>
> I have not, but I agree it should be done. I will try to do so soon.

Thanks, I will wait for your test results. I believe this is the last
patch in this project, and we should try to get it done soon.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Mar 8, 2019 at 7:43 PM Amit Kapila wrote:
> Few minor comments:
> 1.
> warning C4715: 'new_cluster_needs_fsm': not all control paths return a value
>
> Getting this new warning in the patch.

Hmm, I don't get that in a couple of versions of gcc. Your compiler
must not know that pg_fatal() cannot return. I blindly added a fix.

> 2.
>
> This comment line doesn't seem to be related to this patch. If so, I
> think we can avoid having any additional change which is not related
> to the functionality of this patch. Feel free to submit it
> separately, if you think it is an improvement.
> +
> + /* Transfer any VM files if we can trust their contents. */
>   if (vm_crashsafe_match)

Well, I guess the current comment is still OK, so reverted. If I were
to do a separate cleanup patch, I would rather remove the
vm_must_add_frozenbit parameter -- there's no reason I can see for
calls that transfer the heap and FSM to know about this. I also
changed references to the 'first segment of the main fork', since
there will almost always be only one segment. This was a vestige of
the earlier algorithm I had.

> 3. Can we add a note about this in the Notes section of the pg_upgrade
> documentation [1]?

Done.

> Have you done any performance testing of this patch? I mean to say
> now that we added a new stat call for each table, we should see if
> that has any impact. Ideally, that should be compensated by the fact
> that we are now not transferring *fsm files for small relations. How
> about constructing a test where all relations are greater than 4
> pages and then trying to upgrade them. We can check for a cluster
> with a different number of relations, say 10K, 20K, 50K, 100K.

I have not, but I agree it should be done. I will try to do so soon.
--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From b65084daf15d198c9fdd986992b0147290c8e690 Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Sun, 10 Mar 2019 22:01:21 +0800
Subject: [PATCH v23] During pg_upgrade, conditionally skip transfer of FSMs.

If a heap on the old cluster has 4 pages or fewer, and the old cluster
was PG v11 or earlier, don't copy or link the FSM. This will shrink
space usage for installations with large numbers of small tables.
---
 doc/src/sgml/ref/pgupgrade.sgml  |  7
 src/bin/pg_upgrade/info.c        | 16 +++--
 src/bin/pg_upgrade/pg_upgrade.h  |  6
 src/bin/pg_upgrade/relfilenode.c | 58 +++-
 4 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..c896882dd1 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -792,6 +792,13 @@ psql --username=postgres --file=script.sql postgres
    is down.

+
+   In PostgreSQL 12 and later small tables by
+   default don't have a free space map, as a space optimization. If you are
+   upgrading a pre-12 cluster, the free space maps of small tables will
+   likewise not be transferred to the new cluster.
+
+
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 2f925f086c..902bfc647e 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -200,6 +200,8 @@ create_rel_filename_map(const char *old_data, const char *new_data,
 	map->old_db_oid = old_db->db_oid;
 	map->new_db_oid = new_db->db_oid;
+	map->relpages = old_rel->relpages;
+	map->relkind = old_rel->relkind;

 	/*
 	 * old_relfilenode might differ from pg_class.oid (and hence
@@ -418,6 +420,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	char	   *nspname = NULL;
 	char	   *relname = NULL;
 	char	   *tablespace = NULL;
+	char	   *relkind = NULL;
 	int			i_spclocation,
 				i_nspname,
 				i_relname,
@@ -425,7 +428,9 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 				i_indtable,
 				i_toastheap,
 				i_relfilenode,
-				i_reltablespace;
+				i_reltablespace,
+				i_relpages,
+				i_relkind;
 	char		query[QUERY_ALLOC];
 	char	   *last_namespace = NULL,
 			   *last_tablespace = NULL;
@@ -494,7 +499,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	 */
 	snprintf(query + strlen(query), sizeof(query) - strlen(query),
 			 "SELECT all_rels.*, n.nspname, c.relname, "
-			 "  c.relfilenode, c.reltablespace, %s "
+			 "  c.relfilenode, c.reltablespace, c.relpages, c.relkind, %s "
 			 "FROM (SELECT * FROM regular_heap "
 			 "      UNION ALL "
 			 "      SELECT * FROM toast_heap "
@@ -525,6 +530,8 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	i_relname = PQfnumber(res, "relname");
 	i_relfilenode = PQfnumber(res, "relfilenode");
 	i_reltablespace = PQfnumber(res, "reltablespace");
+	i_relpages = PQfnumber(res, "relpages");
+	i_relkind = PQfnumber(res, "relkind");
 	i_spclocation = PQfnumber(res, "spclocation");

 	for (relnum = 0; relnum < ntups; relnum++)
@@ -556,6 +563,11 @@ get_rel_infos(ClusterInfo *cluster,
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Mar 8, 2019 at 5:13 PM Amit Kapila wrote:
> Few minor comments:
...
> 2.
> + /* Transfer any VM files if we can trust their contents. */
>   if (vm_crashsafe_match)
>
> 3. Can we add a note about this in the Notes section of the pg_upgrade
> documentation [1]?
>
> This comment line doesn't seem to be related to this patch. If so, I
> think we can avoid having any additional change which is not related
> to the functionality of this patch. Feel free to submit it
> separately, if you think it is an improvement.

Oops, I have messed up the above comments. The paragraph starting with
"This comment line doesn't ..." is for comment number 2.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Mar 6, 2019 at 5:19 PM John Naylor wrote:
> On Fri, Jan 25, 2019 at 9:50 AM Amit Kapila wrote:
> > Once we agree on the code, we need to test the scenarios below:
> > (a) upgrade from all supported versions to the latest version
> > (b) upgrade a standby with and without using rsync.
>
> Although the code hasn't been reviewed yet, I went ahead and tested
> (a) on v21 of the pg_upgrade patch [1]. To do this, I dumped out a 9.4
> instance with the regression database and restored it to all supported
> versions. To make it work with pg_upgrade, I first had to drop tables
> with OIDs, drop functions referring to C libraries, and drop the
> later-removed '=>' operator. Then I pg_upgrade'd in copy mode from all
> versions to HEAD with the patch applied. pg_upgrade worked without
> error, and the following query returned 0 bytes on all the new
> clusters:
>
> select sum(pg_relation_size(oid, 'fsm')) as total_fsm_size
> from pg_class where relkind in ('r', 't')
> and pg_relation_size(oid, 'main') <= 4 * 8192;
>
> The complementary query (> 4 * 8192) returned the same number of bytes
> as in the original 9.4 instance.

Thanks, the tests done by you are quite useful. I have given a few
comments in response to your previous email.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 28, 2019 at 2:33 AM John Naylor wrote:
> On Sat, Jan 26, 2019 at 2:14 PM Amit Kapila wrote:
> > I think there is some value in using the information from
> > this function to skip fsm files, but the code doesn't appear to fit
> > well. How about moving this check to a new function,
> > new_cluster_needs_fsm()?
>
> For v21, new_cluster_needs_fsm() has all responsibility for obtaining
> the info it needs. I think this is much cleaner,

Right, now the code looks much better.

> but there is a small
> bit of code duplication, since it now has to form the file name. One
> thing we could do is form the base old/new file names in
> transfer_single_new_db() and pass those to transfer_relfile(), which
> would only add suffixes and segment numbers. We could then pass the
> base old file name to new_cluster_needs_fsm() and use it as is. Not
> sure if that's worthwhile, though.

I don't think it is worth it.

Few minor comments:
1.
warning C4715: 'new_cluster_needs_fsm': not all control paths return a value

Getting this new warning in the patch.

2.
+
+ /* Transfer any VM files if we can trust their contents. */
  if (vm_crashsafe_match)

3. Can we add a note about this in the Notes section of the pg_upgrade
documentation [1]?

This comment line doesn't seem to be related to this patch. If so, I
think we can avoid having any additional change which is not related
to the functionality of this patch. Feel free to submit it
separately, if you think it is an improvement.

Have you done any performance testing of this patch? I mean to say
now that we added a new stat call for each table, we should see if
that has any impact. Ideally, that should be compensated by the fact
that we are now not transferring *fsm files for small relations. How
about constructing a test where all relations are greater than 4 pages
and then trying to upgrade them. We can check for a cluster with a
different number of relations, say 10K, 20K, 50K, 100K.

In general, the patch looks okay to me.

I would like to know if anybody else has an opinion on whether
pg_upgrade should skip transferring fsm files for small relations or
not. I think both John and I consider it a good feature to have, and
now that the patch turns out to be simpler, I feel we can go ahead
with this optimization in pg_upgrade.

[1] - https://www.postgresql.org/docs/devel/pgupgrade.html

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Sat, Feb 23, 2019 at 1:24 PM John Naylor wrote:
> On Thu, Feb 21, 2019 at 9:27 PM Alvaro Herrera wrote:
> > I think this test is going to break on nonstandard block sizes.
> > While we don't promise that all tests work on such installs
> > (particularly planner ones), it seems fairly easy to cope with this
> > one -- just use a record size expressed as a fraction of
> > current_setting('block_size'). So instead of "1024" you'd write
> > current_setting('block_size') / 8. And then display the relation
> > size in terms of pages, not bytes, so divide pg_relation_size by
> > block size.
>
> I've done this for v6, tested on 16k block size.

Thanks, the patch looks good to me. I have additionally tested it with
32K- and 1K-sized blocks, and the test passes. I will commit this
early next week.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Jan 25, 2019 at 9:50 AM Amit Kapila wrote:
> Once we agree on the code, we need to test the scenarios below:
> (a) upgrade from all supported versions to the latest version
> (b) upgrade a standby with and without using rsync.

Although the code hasn't been reviewed yet, I went ahead and tested
(a) on v21 of the pg_upgrade patch [1]. To do this, I dumped out a 9.4
instance with the regression database and restored it to all supported
versions. To make it work with pg_upgrade, I first had to drop tables
with OIDs, drop functions referring to C libraries, and drop the
later-removed '=>' operator. Then I pg_upgrade'd in copy mode from all
versions to HEAD with the patch applied. pg_upgrade worked without
error, and the following query returned 0 bytes on all the new
clusters:

select sum(pg_relation_size(oid, 'fsm')) as total_fsm_size
from pg_class where relkind in ('r', 't')
and pg_relation_size(oid, 'main') <= 4 * 8192;

The complementary query (> 4 * 8192) returned the same number of bytes
as in the original 9.4 instance.

To make it easy to find: the latest regression test patch, which is
intended to be independent of block size, is in [2].

[1] https://www.postgresql.org/message-id/CACPNZCu4cOdm3uGnNEGXivy7Gz8UWyQjynDpdkPGabQ18_zK6g%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CACPNZCsWa%3Ddd0K%2BFiODwM%3DLEsepAHVJCoSx2Avew%3DxBEX3Ywjw%40mail.gmail.com

--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On 26/02/2019 16:20, Alvaro Herrera wrote:
> On 2019-Feb-23, John Naylor wrote:
>> On Fri, Feb 22, 2019 at 3:59 AM Amit Kapila wrote:
>>> The reason for not pushing much on making the test pass for
>>> nonstandard block sizes is that when I tried the existing tests,
>>> there were already some failures.
>>
>> FWIW, I currently see 8 failures (attached).
>
> Hmm, not great -- even the strings test fails, which seems to try to
> handle the case explicitly. I did expect the plan-shape ones to fail,
> but I'm surprised about the tablesample one.

The SYSTEM table sampling is basically per-page sampling, so it
depends heavily on which rows are on which page.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On 2019-Feb-23, John Naylor wrote:
> On Fri, Feb 22, 2019 at 3:59 AM Amit Kapila wrote:
> > The reason for not pushing much on making the test pass for
> > nonstandard block sizes is that when I tried the existing tests,
> > there were already some failures.
>
> FWIW, I currently see 8 failures (attached).

Hmm, not great -- even the strings test fails, which seems to try to
handle the case explicitly. I did expect the plan-shape ones to fail,
but I'm surprised about the tablesample one.

--
Álvaro Herrera
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Feb 22, 2019 at 3:59 AM Amit Kapila wrote:
> The reason for not pushing much on making the test pass for
> nonstandard block sizes is that when I tried the existing tests,
> there were already some failures.

FWIW, I currently see 8 failures (attached).

--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: jcn-16blk-regress.diffs
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Feb 21, 2019 at 9:27 PM Alvaro Herrera wrote:
> I think this test is going to break on nonstandard block sizes. While
> we don't promise that all tests work on such installs (particularly
> planner ones), it seems fairly easy to cope with this one -- just use
> a record size expressed as a fraction of
> current_setting('block_size'). So instead of "1024" you'd write
> current_setting('block_size') / 8. And then display the relation size
> in terms of pages, not bytes, so divide pg_relation_size by block
> size.

I've done this for v6, tested on 16k block size.

--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 157f74155beb53c455b0a85b6eebdd22a60ce5f5 Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Sat, 23 Feb 2019 08:48:31 +0100
Subject: [PATCH v6] Add more tests for FSM.

In commit b0eaa4c51bb, we left out a test that used a vacuum to remove
dead rows, as the behavior of the test was not predictable. This test
has been rewritten to use fillfactor instead to control free space.
Since we no longer need to remove dead rows as part of the test, put
the fsm regression test in a parallel group.
---
 src/test/regress/expected/fsm.out  | 59 +-
 src/test/regress/parallel_schedule |  8 +---
 src/test/regress/serial_schedule   |  2 +-
 src/test/regress/sql/fsm.sql       | 41 +
 4 files changed, 77 insertions(+), 33 deletions(-)

diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index b02993188c..698d4b0be5 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -1,15 +1,40 @@
 --
 -- Free Space Map test
 --
+SELECT current_setting('block_size')::integer AS blocksize,
+current_setting('block_size')::integer / 8 AS strsize
+\gset
 CREATE TABLE fsm_check_size (num int, str text);
--- With one block, there should be no FSM
-INSERT INTO fsm_check_size VALUES(1, 'a');
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
+INSERT INTO fsm_check_size SELECT i, rpad('', :strsize, 'a')
+FROM generate_series(1,3) i;
+-- There should be no FSM
 VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
-pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- heap_size | fsm_size
------------+----------
-      8192 |        0
+SELECT pg_relation_size('fsm_check_size', 'main') / :blocksize AS heap_nblocks,
+pg_relation_size('fsm_check_size', 'fsm') / :blocksize AS fsm_nblocks;
+ heap_nblocks | fsm_nblocks
+--------------+-------------
+            3 |           0
 (1 row)

+-- The following operations are for testing the functionality of the local
+-- in-memory map. In particular, we want to be able to insert into some
+-- other block than the one at the end of the heap, without using a FSM.
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, rpad('', :strsize, 'a')
+FROM generate_series(101,105) i;
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
+-- Insert large record and make sure it does not cause the relation to extend
+INSERT INTO fsm_check_size VALUES (111, rpad('', :strsize, 'a'));
+VACUUM fsm_check_size;
+SELECT pg_relation_size('fsm_check_size', 'main') / :blocksize AS heap_nblocks,
+pg_relation_size('fsm_check_size', 'fsm') / :blocksize AS fsm_nblocks;
+ heap_nblocks | fsm_nblocks
+--------------+-------------
+            3 |           0
 (1 row)

 -- Extend table with enough blocks to exceed the FSM threshold
@@ -26,23 +51,23 @@
 num = 11;
 END;
 $$;
 VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size
----------
-    24576
+SELECT pg_relation_size('fsm_check_size', 'fsm') / :blocksize AS fsm_nblocks;
+ fsm_nblocks
+-------------
+           3
 (1 row)

 -- Add long random string to extend TOAST table to 1 block
 INSERT INTO fsm_check_size
 VALUES(0, (SELECT string_agg(md5(chr(i)), '')
-   FROM generate_series(1,100) i));
+   FROM generate_series(1, :blocksize / 100) i));
 VACUUM fsm_check_size;
-SELECT pg_relation_size(reltoastrelid, 'main') AS toast_size,
-pg_relation_size(reltoastrelid, 'fsm') AS toast_fsm_size
+SELECT pg_relation_size(reltoastrelid, 'main') / :blocksize AS toast_nblocks,
+pg_relation_size(reltoastrelid, 'fsm') / :blocksize AS toast_fsm_nblocks
 FROM pg_class WHERE relname = 'fsm_check_size';
- toast_size | toast_fsm_size
-------------+----------------
-       8192 |              0
+ toast_nblocks | toast_fsm_nblocks
+---------------+-------------------
+             1 |                 0
 (1 row)

 DROP TABLE fsm_check_size;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4051a4ad4e..a956775dd1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -23,7 +23,7 @@
Re: WIP: Avoid creation of the free space map for small tables
On 2019-Feb-22, Peter Geoghegan wrote:
> I find it suspicious that there is another crash in pageinspect's
> brin_page_items(), since like amcheck, pageinspect is a contrib module
> that relies on BLCKSZ when allocating a local temp buffer.

Ah. Maybe they just weren't rebuilt.

--
Álvaro Herrera
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Feb 22, 2019 at 8:04 AM Alvaro Herrera wrote:
> Wow, there are a lot fewer tests failing there than I thought there
> would be. That increases hope that we can someday have them pass. +1
> on not making things worse.
>
> I think the crash in the amcheck test should be studied, one way or
> another; CCing Peter.

I built Postgres with "--with-blocksize=16" and "--with-blocksize=32",
and tested amcheck with both builds. All tests passed.

I have a hard time imagining what the problem could be here. If there
were a problem with amcheck relying on there being an 8KiB block size
specifically, then it would almost certainly have been there since the
initial commit from March 2017. Not much has changed since then, and
the crash that Amit reported occurs at the earliest possible point.

I find it suspicious that there is another crash in pageinspect's
brin_page_items(), since like amcheck, pageinspect is a contrib module
that relies on BLCKSZ when allocating a local temp buffer.

--
Peter Geoghegan
Re: WIP: Avoid creation of the free space map for small tables
On 2019-Feb-22, Amit Kapila wrote: > On Fri, Feb 22, 2019 at 1:57 AM Alvaro Herrera > wrote: > > > > I think this test is going to break on nonstandard block sizes. While > > we don't promise that all tests work on such installs (particularly > > planner ones), > > The reason for not pushing much on making the test pass for > nonstandard block sizes is that when I tried existing tests, there > were already some failures. For example, see the failures in the > attached regression diff files (for block_size as 16K and 32K > respectively). I saw those failures during the previous > investigation, the situation on HEAD might or might not be exactly the > same. Whereas I see the value in trying to make sure that tests pass > for nonstandard block sizes, but that doesn't seem to be followed for > all the tests. Wow, there's a lot less tests failing there than I thought there would be. That increases hope that we can someday have them pass. +1 on not making things worse. I think the crash in the amcheck test should be studied, one way or another; CCing Peter. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Feb 21, 2019 at 9:59 PM Amit Kapila wrote: > The reason for not pushing much on making the test pass for > nonstandard block sizes is that when I tried existing tests, there > were already some failures. Sure, but let's not make things worse. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Feb 22, 2019 at 1:57 AM Alvaro Herrera wrote: > > I think this test is going to break on nonstandard block sizes. While > we don't promise that all tests work on such installs (particularly > planner ones), > The reason for not pushing much on making the test pass for nonstandard block sizes is that when I tried existing tests, there were already some failures. For example, see the failures in the attached regression diff files (for block_size set to 16K and 32K, respectively). I saw those failures during the previous investigation; the situation on HEAD might or might not be exactly the same. I do see the value in trying to make sure that tests pass for nonstandard block sizes, but that doesn't seem to be followed for all the tests. > it seems fairly easy to cope with this one -- just use a > record size expressed as a fraction of current_setting('block_size'). > So instead of "1024" you'd write current_setting('block_size') / 8. > And then display the relation size in terms of pages, not bytes, so > divide pg_relation_size by block size. > The idea sounds good. John, would you like to give it a try? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com regression.16.diffs Description: Binary data regression.32.diffs Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
I think this test is going to break on nonstandard block sizes. While we don't promise that all tests work on such installs (particularly planner ones), it seems fairly easy to cope with this one -- just use a record size expressed as a fraction of current_setting('block_size'). So instead of "1024" you'd write current_setting('block_size') / 8. And then display the relation size in terms of pages, not bytes, so divide pg_relation_size by block size. (I see that there was already a motion for this, but it was dismissed for lack of interest.) -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
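A rough sketch of the test adjusted along these lines (essentially the shape the committed test later took; the psql variables `:blocksize` and `:strsize`, set here via \gset, are illustrative):

```sql
-- Derive all sizes from the server's actual block size.
SELECT current_setting('block_size')::int AS blocksize,
       current_setting('block_size')::int / 8 AS strsize \gset

-- Each record is a fixed fraction of a block on any build,
-- so the rows-per-page math stays the same.
INSERT INTO fsm_check_size
SELECT i, rpad('', :strsize, 'a') FROM generate_series(1, 3) i;

-- Report relation sizes in pages rather than bytes.
SELECT pg_relation_size('fsm_check_size', 'main') / :blocksize AS heap_nblocks,
       pg_relation_size('fsm_check_size', 'fsm') / :blocksize AS fsm_nblocks;
```

With the hard-coded byte counts gone, the expected output is the same regardless of --with-blocksize.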
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Feb 21, 2019 at 6:39 PM Alvaro Herrera wrote: > > On 2019-Feb-21, Amit Kapila wrote: > > > On Wed, Feb 20, 2019 at 8:08 PM Alvaro Herrera > > wrote: > > > > > > Please remember to keep serial_schedule in sync. > > > > I don't understand what you mean by this? It is already present in > > serial_schedule. In parallel_schedule, we are just moving this test > > to one of the parallel groups. Do we need to take care of something > > else? > > Just to make sure it's in the same relative position. > Okay, thanks for the input. Attached, find the updated patch, will try to commit it tomorrow unless you or someone else has any other comments. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v5-0001-Add-more-tests-for-FSM.patch Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On 2019-Feb-21, Amit Kapila wrote: > On Wed, Feb 20, 2019 at 8:08 PM Alvaro Herrera > wrote: > > > > Please remember to keep serial_schedule in sync. > > I don't understand what you mean by this? It is already present in > serial_schedule. In parallel_schedule, we are just moving this test > to one of the parallel groups. Do we need to take care of something > else? Just to make sure it's in the same relative position. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Feb 21, 2019 at 7:58 AM Amit Kapila wrote: > So here you are inserting 4-byte integer and 1024-bytes variable > length record. So the tuple length will be tuple_header (24-bytes) + > 4-bytes for integer + 4-bytes header for variable length data + 1024 > bytes of actual data. So, the length will be 1056 which is already > MAXALIGN. I took the new comments added in your latest version of the > patch and added them to the previous version of the patch. Kindly > see if I have not missed anything while merging the patch-versions? OK, that makes sense. Looks fine to me. -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
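Spelled out, the arithmetic being agreed on here is (the 23-byte heap tuple header is MAXALIGNed to 24 bytes, and a 1024-byte text value carries a 4-byte varlena header):

```sql
-- 24 (aligned tuple header) + 4 (int) + 4 (varlena header) + 1024 (data)
SELECT 24 + 4 + 4 + 1024 AS tuple_len,       -- 1056
       (24 + 4 + 4 + 1024) % 8 AS remainder; -- 0, i.e. already MAXALIGNed
```

Since 1056 is a multiple of both 4 and 8, the tuple occupies the same space whichever MAXALIGN the platform uses.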
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Feb 20, 2019 at 8:08 PM Alvaro Herrera wrote: > > Please remember to keep serial_schedule in sync. > I don't understand what you mean by this? It is already present in serial_schedule. In parallel_schedule, we are just moving this test to one of the parallel groups. Do we need to take care of something else? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
Please remember to keep serial_schedule in sync. -- Álvaro Herrera  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Feb 11, 2019 at 10:48 PM John Naylor wrote: > > On 2/9/19, Amit Kapila wrote: > > On Tue, Feb 5, 2019 at 3:25 PM John Naylor > > wrote: > >> > >> On Tue, Feb 5, 2019 at 4:04 AM Amit Kapila > >> wrote: > > This is certainly a good test w.r.t code coverage of new code, but I > > have few comments: > > 1. The size of records in test still depends on alignment (MAXALIGN). > > Though it doesn't seem to be a problematic case, I still suggest we > > can avoid using records whose size depends on alignment. If you > > change the schema as CREATE TABLE fsm_check_size (num1 int, num2 int, > > str text);, then you can avoid alignment related issues for the > > records being used in test. > > Done. > > > 2. > > +-- Fill most of the last block > > .. > > +-- Make sure records can go into any block but the last one > > .. > > +-- Insert large record and make sure it does not cause the relation to > > extend > > > > The comments in some part of the test seems too focussed towards the > > algorithm used for in-memory map. I think we can keep these if we > > want, but it is required to write a more generic comment stating what > > is the actual motive of additional tests (basically we are testing the > > functionality of in-memory map (LSM) for the heap, so we should write > > about it.). > > Done. > Thanks, the modification looks good. I have slightly changed the commit message in the attached patch. I will spend some more time tomorrow morning on this and will commit unless I see any new problem. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v3-0001-Add-more-tests-for-FSM.patch Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Feb 20, 2019 at 6:09 PM Amit Kapila wrote: > I have modified the patch for the above observations and added a > commit message as well, see if it looks okay to you. Looks good to me, thanks. -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Feb 11, 2019 at 10:48 PM John Naylor wrote: > > On 2/9/19, Amit Kapila wrote: > > > Shall we add a note to the docs of pg_freespacemap and > > pgstattuple_approx indicating that for small relations, FSM won't be > > created, so these functions won't give appropriate value? > > I've given this a try in 0002. > This looks mostly correct, but I have a few observations: 1. - tuples. + tuples. Small tables don't have a free space map, so in that case + this function will report zero free space, likewise inflating the + estimated number of live tuples. The last part of the sentence "likewise inflating the estimated number of live tuples." seems incorrect to me because live tuples are computed based on the pages scanned, live tuples in them and total blocks in the relation. So, I think it should be "likewise inflating the approximate tuple length". 2. + In addition, small tables don't have a free space map, so this function + will return zero even if free space is available. Actually, the paragraph you have modified applies to both the functions mentioned on that page. So instead of saying "this function ..", we can say "these functions .." 3. * space from the FSM and move on. + * Note: If a relation has no FSM, GetRecordedFreeSpace() will report + * zero free space. This is fine for the purposes of approximation. */ It is better to have an empty line before Note: ... I have modified the patch for the above observations and added a commit message as well, see if it looks okay to you. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v3-0002-Document-that-functions-that-use-the-FSM-for-will.patch Description: Binary data
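The behavior being documented can be seen with pg_freespacemap itself; a hypothetical session (the table name is illustrative, and the 4-page FSM-creation threshold is the one introduced by this patch series):

```sql
CREATE EXTENSION pg_freespacemap;
CREATE TABLE small_tbl (id int);   -- stays below 4 heap pages
INSERT INTO small_tbl SELECT g FROM generate_series(1, 100) g;
DELETE FROM small_tbl WHERE id > 50;
VACUUM small_tbl;
-- No FSM exists for this relation, so zero free space is recorded
-- even though vacuum just freed roughly half of it.
SELECT * FROM pg_freespace('small_tbl');
```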
Re: WIP: Avoid creation of the free space map for small tables
On 2/9/19, Amit Kapila wrote: > On Tue, Feb 5, 2019 at 3:25 PM John Naylor > wrote: >> >> On Tue, Feb 5, 2019 at 4:04 AM Amit Kapila >> wrote: > This is certainly a good test w.r.t code coverage of new code, but I > have few comments: > 1. The size of records in test still depends on alignment (MAXALIGN). > Though it doesn't seem to be a problematic case, I still suggest we > can avoid using records whose size depends on alignment. If you > change the schema as CREATE TABLE fsm_check_size (num1 int, num2 int, > str text);, then you can avoid alignment related issues for the > records being used in test. Done. > 2. > +-- Fill most of the last block > .. > +-- Make sure records can go into any block but the last one > .. > +-- Insert large record and make sure it does not cause the relation to > extend > > The comments in some part of the test seems too focussed towards the > algorithm used for in-memory map. I think we can keep these if we > want, but it is required to write a more generic comment stating what > is the actual motive of additional tests (basically we are testing the > functionality of in-memory map (LSM) for the heap, so we should write > about it.). Done. > Shall we add a note to the docs of pg_freespacemap and > pgstattuple_approx indicating that for small relations, FSM won't be > created, so these functions won't give appropriate value? I've given this a try in 0002. > Or other > possibility could be that we return an error if the block number is > less than the threshold value, but not sure if that is a good > alternative as that can happen today also if the vacuum hasn't run on > the table. Yeah, an error doesn't seem helpful. 
--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 7a7eb0b17aa55dbec2032bf7cc5189e22bb2ca0c Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Mon, 11 Feb 2019 18:08:01 +0100
Subject: [PATCH v2 1/2] Improve FSM regression test

In the interest of caution, in commit b0eaa4c51bb we left out a test that used vacuum to remove dead rows. This test has been rewritten to use fillfactor instead to control free space. Since we no longer need to remove dead rows as part of the test, put the fsm regression test in a parallel group.
---
 src/test/regress/expected/fsm.out  | 36 --
 src/test/regress/parallel_schedule |  8 +--
 src/test/regress/sql/fsm.sql       | 34 +++-
 3 files changed, 58 insertions(+), 20 deletions(-)

diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index b02993188c..f5f62429ac 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -1,15 +1,37 @@
 --
 -- Free Space Map test
 --
-CREATE TABLE fsm_check_size (num int, str text);
--- With one block, there should be no FSM
-INSERT INTO fsm_check_size VALUES(1, 'a');
+CREATE TABLE fsm_check_size (num1 int, num2 int, str text);
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
+INSERT INTO fsm_check_size SELECT i, i, rpad('', 1024, 'a')
+FROM generate_series(1,3) i;
+-- There should be no FSM
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size
 -----------+----------
-      8192 |        0
+     24576 |        0
+(1 row)
+
+-- The following operations are for testing the functionality of the local
+-- in-memory map. In particular, we want to be able to insert into some
+-- other block than the one at the end of the heap, without using a FSM.
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, i, rpad('', 1024, 'a')
+FROM generate_series(101,105) i;
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
+-- Insert large record and make sure it does not cause the relation to extend
+INSERT INTO fsm_check_size VALUES (111, 111, rpad('', 1024, 'a'));
+VACUUM fsm_check_size;
+SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
+pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
+ heap_size | fsm_size
+-----------+----------
+     24576 |        0
 (1 row)
 
 -- Extend table with enough blocks to exceed the FSM threshold
@@ -19,7 +41,7 @@ num int;
 BEGIN
 num = 11;
 LOOP
-INSERT INTO fsm_check_size VALUES (num, 'b') RETURNING ctid INTO curtid;
+INSERT INTO fsm_check_size VALUES (num, num, 'b') RETURNING ctid INTO curtid;
 EXIT WHEN curtid >= tid '(4, 0)';
 num = num + 1;
 END LOOP;
@@ -34,8 +56,8 @@ SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 
 -- Add long random string to extend TOAST table to 1 block
 INSERT INTO fsm_check_size
-VALUES(0, (SELECT string_agg(md5(chr(i)), '')
-       FROM generate_series(1,100) i));
+VALUES(0, 0, (SELECT string_agg(md5(chr(i)), '')
+
Re: WIP: Avoid creation of the free space map for small tables
On Tue, Feb 5, 2019 at 3:25 PM John Naylor wrote: > > On Tue, Feb 5, 2019 at 4:04 AM Amit Kapila wrote: > > > > On Mon, Feb 4, 2019 at 2:27 PM John Naylor > > wrote: > > > > > > 1. Earlier, I had a test to ensure that free space towards the front > > > of the relation was visible with no FSM. In [1], I rewrote it without > > > using vacuum, so we can consider adding it back now if desired. I can > > > prepare a patch for this. > > > > > > > Yes, this is required. It is generally a good practise to add test (unless > > it takes a lot of time) which covers new code/functionality. > > > > > 2. As a follow-on, since we don't rely on vacuum to remove dead rows, > > > we could try putting the fsm.sql test in some existing group in the > > > parallel schedule, rather than its own group is it is now. > > > > > > > +1. > > This is done in 0001. > This is certainly a good test w.r.t code coverage of new code, but I have few comments: 1. The size of records in test still depends on alignment (MAXALIGN). Though it doesn't seem to be a problematic case, I still suggest we can avoid using records whose size depends on alignment. If you change the schema as CREATE TABLE fsm_check_size (num1 int, num2 int, str text);, then you can avoid alignment related issues for the records being used in test. 2. +-- Fill most of the last block .. +-- Make sure records can go into any block but the last one .. +-- Insert large record and make sure it does not cause the relation to extend The comments in some part of the test seems too focussed towards the algorithm used for in-memory map. I think we can keep these if we want, but it is required to write a more generic comment stating what is the actual motive of additional tests (basically we are testing the functionality of in-memory map (LSM) for the heap, so we should write about it.). > > > 3. 
While looking at 0d1fe9f74e, it occurred to me that I ignored this > > > patch's effects on GetRecordedFreeSpace(), which will return zero for > > > tables with no FSM. > > > > > > > Right, but what exactly we want to do for it? Do you want to add a comment > > atop of this function? > > Hmm, the comment already says "according to the FSM", so maybe it's > already obvious. I was thinking more about maybe commenting the > callsite where it's helpful, as in 0002. > > > > The other callers are in: > > > contrib/pg_freespacemap/pg_freespacemap.c > > > contrib/pgstattuple/pgstatapprox.c > > > > > > For pg_freespacemap, this doesn't matter, since it's just reporting > > > the facts. For pgstattuple_approx(), it might under-estimate the free > > > space and over-estimate the number of live tuples. > > > > > > > Sure, but without patch also, it can do so, if the vacuum hasn't updated > > freespace map. > > Okay, then maybe we don't need to do anything else here. > Shall we add a note to the docs of pg_freespacemap and pgstattuple_approx indicating that for small relations, FSM won't be created, so these functions won't give appropriate value? Or other possibility could be that we return an error if the block number is less than the threshold value, but not sure if that is a good alternative as that can happen today also if the vacuum hasn't run on the table. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Tue, Feb 5, 2019 at 4:04 AM Amit Kapila wrote: > > On Mon, Feb 4, 2019 at 2:27 PM John Naylor > wrote: > > > > 1. Earlier, I had a test to ensure that free space towards the front > > of the relation was visible with no FSM. In [1], I rewrote it without > > using vacuum, so we can consider adding it back now if desired. I can > > prepare a patch for this. > > > > Yes, this is required. It is generally a good practise to add test (unless > it takes a lot of time) which covers new code/functionality. > > > 2. As a follow-on, since we don't rely on vacuum to remove dead rows, > > we could try putting the fsm.sql test in some existing group in the > > parallel schedule, rather than its own group is it is now. > > > > +1. This is done in 0001. > > 3. While looking at 0d1fe9f74e, it occurred to me that I ignored this > > patch's effects on GetRecordedFreeSpace(), which will return zero for > > tables with no FSM. > > > > Right, but what exactly we want to do for it? Do you want to add a comment > atop of this function? Hmm, the comment already says "according to the FSM", so maybe it's already obvious. I was thinking more about maybe commenting the callsite where it's helpful, as in 0002. > > The other callers are in: > > contrib/pg_freespacemap/pg_freespacemap.c > > contrib/pgstattuple/pgstatapprox.c > > > > For pg_freespacemap, this doesn't matter, since it's just reporting > > the facts. For pgstattuple_approx(), it might under-estimate the free > > space and over-estimate the number of live tuples. > > > > Sure, but without patch also, it can do so, if the vacuum hasn't updated > freespace map. Okay, then maybe we don't need to do anything else here. 
--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From ff821d820630313e4dcea65aa1b35d47a0736c64 Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Tue, 5 Feb 2019 10:32:19 +0100
Subject: [PATCH v1 1/2] Improve FSM regression test

In the interest of caution, in commit b0eaa4c51bb we left out a test that used vacuum to remove dead rows. This test has been rewritten to use fillfactor instead to control free space. Since we no longer need to remove dead rows as part of the test, put the fsm regression test in a parallel group.
---
 src/test/regress/expected/fsm.out  | 25 ++---
 src/test/regress/parallel_schedule |  8 +---
 src/test/regress/sql/fsm.sql       | 22 --
 3 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index b02993188c..d747c1a7dd 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -2,14 +2,33 @@
 --
 -- Free Space Map test
 --
 CREATE TABLE fsm_check_size (num int, str text);
--- With one block, there should be no FSM
-INSERT INTO fsm_check_size VALUES(1, 'a');
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
+INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
+FROM generate_series(1,3) i;
+-- There should be no FSM
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size
 -----------+----------
-      8192 |        0
+     24576 |        0
+(1 row)
+
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
+FROM generate_series(101,105) i;
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
+-- Insert large record and make sure it does not cause the relation to extend
+INSERT INTO fsm_check_size VALUES (111, rpad('', 1024, 'a'));
+VACUUM fsm_check_size;
+SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
+pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
+ heap_size | fsm_size
+-----------+----------
+     24576 |        0
 (1 row)
 
 -- Extend table with enough blocks to exceed the FSM threshold
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4051a4ad4e..a956775dd1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -23,7 +23,7 @@ test: numerology
 # ----------
 # The second group of parallel tests
 # ----------
-test: point lseg line box path polygon circle date time timetz timestamp timestamptz interval inet macaddr macaddr8 tstypes
+test: point lseg line box path polygon circle date time timetz timestamp timestamptz interval inet macaddr macaddr8 tstypes fsm
 # ----------
 # Another group of parallel tests
 # ----------
@@ -68,12 +68,6 @@ test: create_aggregate create_function_3 create_cast constraints triggers inheri
 # ----------
 test: sanity_check
-# ----------
-# fsm does a delete followed by vacuum, and running it in parallel can prevent
-# removal of rows.
-# ----------
-test: fsm
-
 # ----------
 #
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Feb 4, 2019 at 2:27 PM John Naylor wrote: > > On Mon, Feb 4, 2019 at 8:41 AM Amit Kapila wrote: > > > > The change seems to have worked. All the buildfarm machines that were > > showing the failure are passing now. > > Excellent! > > Now that the buildfarm is green as far as this patch goes, > There is still one recent failure which I don't think is related to the commit of this patch: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird=2019-02-04%2016%3A38%3A48 == pgsql.build/src/bin/pg_ctl/tmp_check/log/004_logrotate_primary.log === TRAP: FailedAssertion("!(UsedShmemSegAddr != ((void *)0))", File: "g:\prog\bf\root\head\pgsql.build\src\backend\port\win32_shmem.c", Line: 513) I think we need to do something about this random failure, but not as part of this thread/patch. > I will > touch on a few details to round out development in this area: > > 1. Earlier, I had a test to ensure that free space towards the front > of the relation was visible with no FSM. In [1], I rewrote it without > using vacuum, so we can consider adding it back now if desired. I can > prepare a patch for this. > Yes, this is required. It is generally good practice to add a test (unless it takes a lot of time) that covers new code/functionality. > 2. As a follow-on, since we don't rely on vacuum to remove dead rows, > we could try putting the fsm.sql test in some existing group in the > parallel schedule, rather than its own group as it is now. > +1. > 3. While looking at 0d1fe9f74e, it occurred to me that I ignored this > patch's effects on GetRecordedFreeSpace(), which will return zero for > tables with no FSM. > Right, but what exactly do we want to do about it? Do you want to add a comment atop this function? > The other callers are in: > contrib/pg_freespacemap/pg_freespacemap.c > contrib/pgstattuple/pgstatapprox.c > > For pg_freespacemap, this doesn't matter, since it's just reporting > the facts.
For pgstattuple_approx(), it might under-estimate the free > space and over-estimate the number of live tuples. > Sure, but it can do so even without this patch, if vacuum hasn't updated the free space map. > This might be fine, > since it is approximate after all, but maybe a comment would be > helpful. If this is a problem, we could tweak it to be more precise > for tables without FSMs. > Sounds reasonable to me. > Thoughts? > > 4. The latest patch for the pg_upgrade piece was in [2] > It will be good if we get this one as well. I will look into it once we are done with the other points you have mentioned. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Feb 4, 2019 at 8:41 AM Amit Kapila wrote: > > Yeah, that can also work, but we still need to be careful about the > > alignment of that one tuple; otherwise, there could be different > > free space on the fifth page. Probably the easier way could be to use > > an even number of integers in the table, say (int, int). Anyway, for > > now, I have avoided the dependency on FSM contents without losing > > test coverage. I have pushed my latest suggestion in the previous > > email. > > The change seems to have worked. All the buildfarm machines that were > showing the failure are passing now. Excellent! Now that the buildfarm is green as far as this patch goes, I will touch on a few details to round out development in this area: 1. Earlier, I had a test to ensure that free space towards the front of the relation was visible with no FSM. In [1], I rewrote it without using vacuum, so we can consider adding it back now if desired. I can prepare a patch for this. 2. As a follow-on, since we don't rely on vacuum to remove dead rows, we could try putting the fsm.sql test in some existing group in the parallel schedule, rather than its own group as it is now. 3. While looking at 0d1fe9f74e, it occurred to me that I ignored this patch's effects on GetRecordedFreeSpace(), which will return zero for tables with no FSM. The other callers are in: contrib/pg_freespacemap/pg_freespacemap.c contrib/pgstattuple/pgstatapprox.c For pg_freespacemap, this doesn't matter, since it's just reporting the facts. For pgstattuple_approx(), it might under-estimate the free space and over-estimate the number of live tuples. This might be fine, since it is approximate after all, but maybe a comment would be helpful. If this is a problem, we could tweak it to be more precise for tables without FSMs. Thoughts? 4. The latest patch for the pg_upgrade piece was in [2] Anything else?
[1] https://www.postgresql.org/message-id/CACPNZCvEXLUx10pFvNcOs88RvqemMEjOv7D9MhL3ac86EzjAOA%40mail.gmail.com [2] https://www.postgresql.org/message-id/CACPNZCu4cOdm3uGnNEGXivy7Gz8UWyQjynDpdkPGabQ18_zK6g%40mail.gmail.com -- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Feb 4, 2019 at 10:29 AM Amit Kapila wrote: > > On Mon, Feb 4, 2019 at 10:18 AM John Naylor > > wrote: > > > > On Mon, Feb 4, 2019 at 4:17 AM Amit Kapila wrote: > > > This one seems to be FSM test portability issue (due to different page > > > contents, maybe). Looking into it, John, see if you are around and > > > have some thoughts on it. > > > > Maybe we can use the same plpgsql loop as fsm.sql, which exits after 1 > > tuple has been inserted into the 5th page. > > > > Yeah, that can also work, but we still need to be careful about the > alignment of that one tuple; otherwise, there could be different > free space on the fifth page. Probably the easier way could be to use > an even number of integers in the table, say (int, int). Anyway, for > now, I have avoided the dependency on FSM contents without losing > test coverage. I have pushed my latest suggestion in the previous > email. > The change seems to have worked. All the buildfarm machines that were showing the failure are passing now. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Feb 4, 2019 at 10:18 AM John Naylor wrote: > > On Mon, Feb 4, 2019 at 4:17 AM Amit Kapila wrote: > > This one seems to be FSM test portability issue (due to different page > > contents, maybe). Looking into it, John, see if you are around and > > have some thoughts on it. > > Maybe we can use the same plpgsql loop as fsm.sql, which exits after 1 > tuple has been inserted into the 5th page. > Yeah, that can also work, but we still need to be careful about the alignment of that one tuple; otherwise, there could be different free space on the fifth page. Probably the easier way could be to use an even number of integers in the table, say (int, int). Anyway, for now, I have avoided the dependency on FSM contents without losing test coverage. I have pushed my latest suggestion in the previous email. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Feb 4, 2019 at 4:17 AM Amit Kapila wrote: > This one seems to be FSM test portability issue (due to different page > contents, maybe). Looking into it, John, see if you are around and > have some thoughts on it. Maybe we can use the same plpgsql loop as fsm.sql, which exits after 1 tuple has been inserted into the 5th page. -- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
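For reference, the loop from fsm.sql extends the heap one row at a time and stops as soon as a ctid lands on the fifth page (block 4), so the final heap size does not depend on exactly how many tuples fit on a page:

```sql
DO $$
DECLARE
  curtid tid;
  num int;
BEGIN
  num = 11;
  LOOP
    INSERT INTO fsm_check_size VALUES (num, 'b') RETURNING ctid INTO curtid;
    EXIT WHEN curtid >= tid '(4, 0)';  -- first tuple placed on block 4: stop
    num = num + 1;
  END LOOP;
END;
$$;
```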
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Feb 4, 2019 at 9:24 AM Amit Kapila wrote: > On Mon, Feb 4, 2019 at 8:47 AM Amit Kapila wrote: > > One more similar failure: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing=2019-02-04%2003%3A20%3A01 > > So, basically, this is due to difference in the number of tuples that > can fit on a page. The freespace in FSM for the page is shown > different because of available space on a particular page. This can > vary due to alignment. It seems to me we can't rely on FSM contents > if there are many tuples in a relation. One idea is to get rid of > dependency on FSM contents in this test, can you think of any better > way to have consistent FSM contents across different platforms? > One more idea could be that we devise a test (say with a char/varchar) such that it always consumes the same space in a page irrespective of its alignment. Yet another way could be to use explain (costs off, analyze on, timing off, summary off) ..., this will ensure that we have test coverage for the function fsm_page_contents, but we don't rely on its contents. What do you think? I will go with the last option to stabilize the buildfarm tests unless anyone thinks otherwise or has a better idea. I will probably wait for 20 minutes or so to see if anyone has inputs. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
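A sketch of the EXPLAIN variant for the pageinspect test (this runs fsm_page_contents() for coverage while keeping its platform-dependent result out of the regression output):

```sql
-- With costs, timing, and summary suppressed, only a stable plan line
-- remains in the output, while the function is still executed.
EXPLAIN (costs off, analyze on, timing off, summary off)
SELECT fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0));
```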
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Feb 4, 2019 at 8:47 AM Amit Kapila wrote: > > On Mon, Feb 4, 2019 at 12:39 AM John Naylor > wrote: > > > > On Sun, Feb 3, 2019 at 2:06 PM Amit Kapila wrote: > > > > > > On Thu, Jan 31, 2019 at 6:03 PM Amit Kapila > > > wrote: > > > This doesn't get applied cleanly after recent commit 0d1fe9f74e. > > > Attached is a rebased version. I have checked once that the changes > > > done by 0d1fe9f74e don't impact this patch. John, see if you can also > > > once confirm whether the recent commit (0d1fe9f74e) has any impact. I > > > am planning to push this tomorrow morning (IST) unless you or anyone > > > see any problem with this. > > > > Since that commit changes RelationAddExtraBlocks(), which can be > > induces by your pgbench adjustment upthread, I ran make check with > > that adjustment in the pgbench dir 300 times without triggering > > asserts. > > > > I also tried to follow the logic in 0d1fe9f74e, and I believe it will > > be correct without a FSM. > > > > I have just pushed it and buildfarm has shown two failures: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary=2019-02-04%2002%3A27%3A26 > > --- > /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/expected/page.out > 2019-02-03 21:27:29.0 -0500 > +++ > /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/results/page.out > 2019-02-03 21:41:32.0 -0500 > @@ -38,19 +38,19 @@ > SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0)); > fsm_page_contents > --- > - 0: 39+ > - 1: 39+ > - 3: 39+ > - 7: 39+ > - 15: 39 + > - 31: 39 + > - 63: 39 + > .. > > This one seems to be FSM test portability issue (due to different page > contents, maybe). Looking into it, John, see if you are around and > have some thoughts on it. > One more similar failure: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing=2019-02-04%2003%3A20%3A01 So, basically, this is due to difference in the number of tuples that can fit on a page. 
The free space shown in the FSM for a page differs because of the space actually available on that particular page, which can vary due to alignment. It seems to me we can't rely on FSM contents if there are many tuples in a relation. One idea is to get rid of the dependency on FSM contents in this test; can you think of any better way to have consistent FSM contents across different platforms? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Feb 4, 2019 at 12:39 AM John Naylor wrote: > > On Sun, Feb 3, 2019 at 2:06 PM Amit Kapila wrote: > > > > On Thu, Jan 31, 2019 at 6:03 PM Amit Kapila wrote: > > This doesn't get applied cleanly after recent commit 0d1fe9f74e. > > Attached is a rebased version. I have checked once that the changes > > done by 0d1fe9f74e don't impact this patch. John, see if you can also > > once confirm whether the recent commit (0d1fe9f74e) has any impact. I > > am planning to push this tomorrow morning (IST) unless you or anyone > > see any problem with this. > > Since that commit changes RelationAddExtraBlocks(), which can be > induces by your pgbench adjustment upthread, I ran make check with > that adjustment in the pgbench dir 300 times without triggering > asserts. > > I also tried to follow the logic in 0d1fe9f74e, and I believe it will > be correct without a FSM. > I have just pushed it and buildfarm has shown two failures: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary=2019-02-04%2002%3A27%3A26 --- /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/expected/page.out 2019-02-03 21:27:29.0 -0500 +++ /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/results/page.out 2019-02-03 21:41:32.0 -0500 @@ -38,19 +38,19 @@ SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0)); fsm_page_contents --- - 0: 39+ - 1: 39+ - 3: 39+ - 7: 39+ - 15: 39 + - 31: 39 + - 63: 39 + .. This one seems to be FSM test portability issue (due to different page contents, maybe). Looking into it, John, see if you are around and have some thoughts on it. 2. 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-02-04%2002%3A30%3A25 select explain_parallel_append('execute ab_q5 (33, 44, 55)'); -explain_parallel_append - Finalize Aggregate (actual rows=1 loops=1) - -> Gather (actual rows=3 loops=1) - Workers Planned: 2 - Workers Launched: 2 - -> Partial Aggregate (actual rows=1 loops=3) - -> Parallel Append (actual rows=0 loops=N) - Subplans Removed: 8 - -> Parallel Seq Scan on ab_a1_b1 (never executed) - Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3]))) -(9 rows) - +ERROR: lost connection to parallel worker +CONTEXT: PL/pgSQL function explain_parallel_append(text) line 5 at FOR over EXECUTE statement -- Test Parallel Append with PARAM_EXEC Params select explain_parallel_append('select count(*) from ab where (a = (select 1) or a = (select 3)) and b = 2'); explain_parallel_append Failure is something like: 2019-02-03 21:44:42.456 EST [2812:327] pg_regress/partition_prune LOG: statement: select explain_parallel_append('execute ab_q5 (1, 1, 1)'); 2019-02-03 21:44:42.493 EST [2812:328] pg_regress/partition_prune LOG: statement: select explain_parallel_append('execute ab_q5 (2, 3, 3)'); 2019-02-03 21:44:42.531 EST [2812:329] pg_regress/partition_prune LOG: statement: select explain_parallel_append('execute ab_q5 (33, 44, 55)'); 2019-02-04 02:44:42.552 GMT [4172] FATAL: could not reattach to shared memory (key=01B4, addr=0198): error code 487 2019-02-03 21:44:42.555 EST [5116:6] LOG: background worker "parallel worker" (PID 4172) exited with exit code 1 2019-02-03 21:44:42.560 EST [2812:330] pg_regress/partition_prune ERROR: lost connection to parallel worker I don't think this is related to this commit. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Sun, Feb 3, 2019 at 2:06 PM Amit Kapila wrote: > > On Thu, Jan 31, 2019 at 6:03 PM Amit Kapila wrote: > This doesn't get applied cleanly after recent commit 0d1fe9f74e. > Attached is a rebased version. I have checked once that the changes > done by 0d1fe9f74e don't impact this patch. John, see if you can also > once confirm whether the recent commit (0d1fe9f74e) has any impact. I > am planning to push this tomorrow morning (IST) unless you or anyone > see any problem with this. Since that commit changes RelationAddExtraBlocks(), which can be induced by your pgbench adjustment upthread, I ran make check with that adjustment in the pgbench dir 300 times without triggering asserts. I also tried to follow the logic in 0d1fe9f74e, and I believe it will be correct without an FSM. [1] https://www.postgresql.org/message-id/CAA4eK1KRByXY03qR2JvUjUxKBzpBnCSO5H19oAC%3D_v4r5dzTwQ%40mail.gmail.com -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 31, 2019 at 6:03 PM Amit Kapila wrote: > On Thu, Jan 31, 2019 at 2:02 PM John Naylor > wrote: > > > > > > FYI, the second comment is still present in v20. > > > > oops, forgot to include in commit after making a change, done now. > This doesn't get applied cleanly after the recent commit 0d1fe9f74e. Attached is a rebased version. I have checked once that the changes done by 0d1fe9f74e don't impact this patch. John, see if you can also confirm whether the recent commit (0d1fe9f74e) has any impact. I am planning to push this tomorrow morning (IST) unless you or anyone sees any problem with this. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v22-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On Sat, Feb 2, 2019 at 7:30 AM Amit Kapila wrote: > > On Mon, Jan 28, 2019 at 4:40 PM Amit Kapila wrote: > > > > On Mon, Jan 28, 2019 at 10:03 AM John Naylor > > wrote: > > In the past few days, we have done a further analysis of each problem > and tried to reproduce it. We are successful in generating some form > of reproducer for 3 out of 4 problems in the same way as it was failed > in the buildfarm. For the fourth symptom, we have tried a lot (even > Andrew Dunstan has helped us to run the regression tests with the > faulty commit on Jacana for many hours, but it didn't got reproduced) > but not able to regenerate a failure in a similar way. However, I > have a theory as mentioned below why the particular test could fail > and the fix for the same is done in the patch. I am planning to push > the latest version of the patch [1] which has fixes for all the > symptoms. > Today, I have spent some more time to generate a test which can reproduce the failure though it is with the help of breakpoints. See below: > > > > 4. Failure on jacana: > > --- > > c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/box.out > > 2018-09-26 > > 17:53:33 -0400 > > +++ > > c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/results/box.out > > 2019-01-27 23:14:35 > > -0500 > > @@ -252,332 +252,7 @@ > > ('(0,100)(0,infinity)'), > > ('(-infinity,0)(0,infinity)'), > > ('(-infinity,-infinity)(infinity,infinity)'); > > -SET enable_seqscan = false; > > -SELECT * FROM box_temp WHERE f1 << '(10,20),(30,40)'; > > .. > > .. > > TRAP: FailedAssertion("!(!(fsm_local_map.nblocks > 0))", File: > > "c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/storage/freespace/freespace.c", > > Line: > > 1118) > > .. 
> > 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:4] LOG: server process > > (PID 14388) exited with exit code 3 > > 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:5] DETAIL: Failed process > > was running: INSERT INTO box_temp > > VALUES (NULL), > > > > I think the reason for this failure is same as previous (as mentioned > > in point-3), but this can happen in a different way. Say, we have > > searched the local map and then try to extend a relation 'X' and in > > the meantime, another backend has extended such that it creates FSM. > > Now, we will reuse that page and won't clear local map. Now, say we > > try to insert in relation 'Y' which doesn't have FSM. It will try to > > set the local map and will find that it already exists, so will fail. > > Now, the question is how it can happen in this box.sql test. I guess > > that is happening for some system table which is being populated by > > Create Index statement executed just before the failing Insert. > > Based on the above theory, the test is as below: Session-1 --- postgres=# create table test_1(c1 int, c2 char(1500)); CREATE TABLE postgres=# create table test_2(c1 int, c2 char(1500)); CREATE TABLE postgres=# insert into test_1 values(generate_series(1,20),''); INSERT 0 20 postgres=# insert into test_2 values(1,''); INSERT 0 1 Session-2 postgres=# analyze test_1; ANALYZE postgres=# analyze test_2; ANALYZE postgres=# select oid, relname, relpages from pg_class where relname like 'test%'; oid |relname | relpages ---++-- 41835 | test_1 |4 41838 | test_2 |1 Till here we can see that test_1 has 4 pages and test_2 has 1 page. At this stage, even one more record insertion in test_1 will create a new page. So, now we have to hit the scenario with the help of a debugger. For session-1, attach a debugger and put a breakpoint in RelationGetBufferForTuple(). Session-1 postgres=# insert into test_1 values(21,''); It will hit the breakpoint in RelationGetBufferForTuple(). 
Now, add one more breakpoint on line 542 in hio.c, aka below line: RelationGetBufferForTuple { .. else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock)) Press continue and stop the debugger at line 542. Attach the debugger for session-2 and add a breakpoint on line 580 in hio.c, aka below line: RelationGetBufferForTuple { .. buffer = ReadBufferBI(relation, P_NEW, bistate); Session-2 --- postgres=# insert into test_1 values(22,''); It will hit the breakpoint in RelationGetBufferForTuple(). Now proceed with debugging on session-1 by one step. This is to ensure that session-1 doesn't get a conditional lock. Now, continue the debugger in session-2 and after that run vacuum test_1 from session-2, this will ensure that FSM is created for relation test_1. So the state of session-2 will be as below: Session-2 postgres=# insert into test_1 values(22,''); INSERT 0 1 postgres=# vacuum test_1; VACUUM postgres=# select oid, relname, relpages from pg_class where relname like 'test%'; oid |relname | relpages
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 28, 2019 at 4:40 PM Amit Kapila wrote: > > On Mon, Jan 28, 2019 at 10:03 AM John Naylor > wrote: > > > > On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila wrote: > > > There are a few buildfarm failures due to this commit, see my email on > > > pgsql-committers. If you have time, you can also once look into > > > those. > > > > I didn't see anything in common with the configs of the failed > > members. None have a non-default BLCKSZ that I can see. > > > > I have done an analysis of the different failures on buildfarm. > In the past few days, we have done a further analysis of each problem and tried to reproduce it. We succeeded in generating some form of reproducer for 3 out of 4 problems, failing in the same way as in the buildfarm. For the fourth symptom, we tried a lot (Andrew Dunstan even helped us run the regression tests with the faulty commit on jacana for many hours, but it didn't get reproduced) but were not able to reproduce a failure in a similar way. However, I have a theory, as mentioned below, for why that particular test could fail, and the fix for the same is done in the patch. I am planning to push the latest version of the patch [1], which has fixes for all the symptoms. Does anybody have any opinion here?
> 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:4] LOG: server process > (PID 14388) exited with exit code 3 > 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:5] DETAIL: Failed process > was running: INSERT INTO box_temp > VALUES (NULL), > > I think the reason for this failure is same as previous (as mentioned > in point-3), but this can happen in a different way. Say, we have > searched the local map and then try to extend a relation 'X' and in > the meantime, another backend has extended such that it creates FSM. > Now, we will reuse that page and won't clear local map. Now, say we > try to insert in relation 'Y' which doesn't have FSM. It will try to > set the local map and will find that it already exists, so will fail. > Now, the question is how it can happen in this box.sql test. I guess > that is happening for some system table which is being populated by > Create Index statement executed just before the failing Insert. > > I think both 3 and 4 are timing issues, so we didn't got in our local > regression runs. > [1] - https://www.postgresql.org/message-id/CAA4eK1%2B3ajhRPC0jvUi6p_aMrTUpB568OBH10LrbHtvOLNTgqQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 31, 2019 at 9:18 PM John Naylor wrote: > > On Thu, Jan 31, 2019 at 4:06 PM Amit Kapila wrote: > > I don't think that moving fsm tests to brin would be a good approach. > > We want to have a separate test for each access method. I think if we > > want to do something to avoid portability issues, maybe we can do what > > Masahiko San has just suggested. > > We could also use the same plpgsql loop as in fsm.sql to check the ctid, > right? > Yes; however, I feel we should leave it as it is for now unless we see any risk of portability issues. The only reason to do it that way is to avoid any failure for a bigger block size (say a BLCKSZ of 16KB or 32KB). Does anyone else have any opinion on whether we should try to write tests that account for bigger block sizes? I see that existing regression tests fail if we configure with a bigger block size, so I am not sure if we should try to avoid that here. In an ideal scenario, I think it would be good if we could write tests that pass on all kinds of block sizes; that will make life easier if tomorrow one wants to set up a buildfarm member or do testing for bigger block sizes. > > OTOH, I think we are just good w.r.t > > this issue with the last patch I sent. I think unless we see some > > problem here, we should put energy into having a reproducible test for > > the fourth problem mentioned in my mail up thread [1]. Do you think > > it makes sense to run make check in loop for multiple times or do you > > have any idea how we can have a reproducible test? > > Okay. Earlier I tried running make installcheck with > force_parallel_mode='regress', but didn't get a failure. > AFAICS, this failure was not for force_parallel_mode='regress'. See the config at [1]. > I may not > have run enough times, though. > Yeah, probably running make check or make installcheck many times would help, but I am not sure. > I'll have to think about how to induce > it. > Thanks! 
[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana=2019-01-28%2004%3A00%3A23 -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 31, 2019 at 4:06 PM Amit Kapila wrote: > I don't think that moving fsm tests to brin would be a good approach. > We want to have a separate test for each access method. I think if we > want to do something to avoid portability issues, maybe we can do what > Masahiko San has just suggested. We could also use the same plpgsql loop as in fsm.sql to check the ctid, right? > OTOH, I think we are just good w.r.t > this issue with the last patch I sent. I think unless we see some > problem here, we should put energy into having a reproducible test for > the fourth problem mentioned in my mail up thread [1]. Do you think > it makes sense to run make check in loop for multiple times or do you > have any idea how we can have a reproducible test? Okay. Earlier I tried running make installcheck with force_parallel_mode='regress', but didn't get a failure. I may not have run enough times, though. I'll have to think about how to induce it. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 31, 2019 at 7:23 PM John Naylor wrote: > > On Thu, Jan 31, 2019 at 1:52 PM John Naylor > wrote: > > > > On Thu, Jan 31, 2019 at 1:43 PM Amit Kapila wrote: > > > > I have an idea -- instead of adding a bunch of records and hoping that > > > > the relation size and free space is consistent across platforms, how > > > > about we revert to the original test input, and add a BRIN index? That > > > > should have a FSM even with one record. > > > > > > > > > > Why would BRIN index allow having FSM for heap relation? > > > > Oops, I forgot this file is for testing heaps only. That said, we > > could possibly put most of the FSM tests such as > > > > SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0)); > > > > into brin.sql since we know a non-empty BRIN index will have a FSM. > > As in the attached. Applies on top of v20. First to revert to HEAD, > second to move FSM tests to brin.sql. This is a much less invasive and > more readable patch, in addition to being hopefully more portable. > I don't think that moving fsm tests to brin would be a good approach. We want to have a separate test for each access method. I think if we want to do something to avoid portability issues, maybe we can do what Masahiko-san has just suggested. OTOH, I think we are just good w.r.t. this issue with the last patch I sent. Unless we see some problem here, we should put energy into having a reproducible test for the fourth problem mentioned in my mail upthread [1]. Do you think it makes sense to run make check in a loop multiple times, or do you have any idea how we can have a reproducible test? [1] - https://www.postgresql.org/message-id/CAA4eK1L%3DqWp_bJ5aTc9%2Bfy4Ewx2LPaLWY-RbR4a60g_rupCKnQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 31, 2019 at 6:41 AM Amit Kapila wrote: > > On Wed, Jan 30, 2019 at 10:41 PM Masahiko Sawada > wrote: > > > > On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila wrote: > > > > > > On Tue, Jan 29, 2019 at 8:12 PM John Naylor > > > wrote: > > > > > > > > On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila > > > > wrote: > > > > > > > > > You can find this change in attached patch. Then, I ran the make > > > > > check in src/bin/pgbench multiple times using test_conc_insert.sh. > > > > > You can vary the number of times the test should run, if you are not > > > > > able to reproduce it with this. > > > > > > > > > > The attached patch (clear_local_map_if_exists_1.patch) atop the main > > > > > patch fixes the issue for me. Kindly verify the same. > > > > > > > > I got one failure in 50 runs. With the new patch, I didn't get any > > > > failures in 300 runs. > > > > > > > > > > Thanks for verification. I have included it in the attached patch and > > > I have also modified the page.sql test to have enough number of pages > > > in relation so that FSM will get created irrespective of alignment > > > boundaries. Masahiko San, can you verify if this now works for you? > > > > > > > Thank you for updating the patch! > > > > The modified page.sql test could fail if the block size is more than > > 8kB? > > That's right, but I don't think current regression tests will work for > block size greater than 8KB. I have tried with 16 and 32 as block > size, there were few failures on the head itself. Understood. That means that no build farm configures other block size than 8kB. > > > We can ensure the number of pages are more than 4 by checking it > > and adding more data if no enough but I'm really not sure we should > > care the bigger-block size cases. > > > > Yeah, I am not sure either. I think as this is an existing test, we > should not try to change it too much. However, if both you and John > feel it is better to change, we can go with that. 
> So I think the patch you proposed looks good to me, but how about adding a check that the table has more than 4 pages? For example: SELECT (pg_relation_size('test_rel_forks') / current_setting('block_size')::int) > 4; Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 31, 2019 at 1:52 PM John Naylor wrote: > > On Thu, Jan 31, 2019 at 1:43 PM Amit Kapila wrote: > > > I have an idea -- instead of adding a bunch of records and hoping that > > > the relation size and free space is consistent across platforms, how > > > about we revert to the original test input, and add a BRIN index? That > > > should have a FSM even with one record. > > > > > > > Why would BRIN index allow having FSM for heap relation? > > Oops, I forgot this file is for testing heaps only. That said, we > could possibly put most of the FSM tests such as > > SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0)); > > into brin.sql since we know a non-empty BRIN index will have a FSM. As in the attached. Applies on top of v20. First to revert to HEAD, second to move FSM tests to brin.sql. This is a much less invasive and more readable patch, in addition to being hopefully more portable. > And in page.sql we could just have a test that the table has no FSM. This is not possible, since we don't know the relfilenode for the error text, and it's not important. Better to have everything in brin.sql. -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services commit 18b444cdfc2c50502389b84f4504ec9c7f2ab831 Author: John Naylor Date: Thu Jan 31 14:20:14 2019 +0100 revert contrib so I can redo diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out index 89b73ca991..3fcd9fbe6d 100644 --- a/contrib/pageinspect/expected/page.out +++ b/contrib/pageinspect/expected/page.out @@ -1,69 +1,48 @@ CREATE EXTENSION pageinspect; -CREATE TABLE test_rel_forks (a int); --- Make sure there are enough blocks in the heap for the FSM to be created. 
-INSERT INTO test_rel_forks SELECT i from generate_series(1,2000) i; --- set up FSM and VM -VACUUM test_rel_forks; +CREATE TABLE test1 (a int, b int); +INSERT INTO test1 VALUES (16777217, 131584); +VACUUM test1; -- set up FSM -- The page contents can vary, so just test that it can be read -- successfully, but don't keep the output. -SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0; +SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0; main_0 8192 (1 row) -SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100; -ERROR: block number 100 is out of range for relation "test_rel_forks" -SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0; +SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1; +ERROR: block number 1 is out of range for relation "test1" +SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0; fsm_0 --- 8192 (1 row) -SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 20)) AS fsm_20; -ERROR: block number 20 is out of range for relation "test_rel_forks" -SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0; +SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1; + fsm_1 +--- + 8192 +(1 row) + +SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0; vm_0 -- 8192 (1 row) -SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1; -ERROR: block number 1 is out of range for relation "test_rel_forks" +SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1; +ERROR: block number 1 is out of range for relation "test1" SELECT octet_length(get_raw_page('xxx', 'main', 0)); ERROR: relation "xxx" does not exist -SELECT octet_length(get_raw_page('test_rel_forks', 'xxx', 0)); +SELECT octet_length(get_raw_page('test1', 'xxx', 0)); ERROR: invalid fork name HINT: Valid fork names are "main", "fsm", "vm", and "init". 
-SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0)); - fsm_page_contents - 0: 39+ - 1: 39+ - 3: 39+ - 7: 39+ - 15: 39 + - 31: 39 + - 63: 39 + - 127: 39 + - 255: 39 + - 511: 39 + - 1023: 39 + - 2047: 39 + - 4095: 39 + - fp_next_slot: 0 + - -(1 row) - -SELECT get_raw_page('test_rel_forks', 0) = get_raw_page('test_rel_forks', 'main', 0); +SELECT get_raw_page('test1', 0) = get_raw_page('test1', 'main', 0); ?column? -- t (1 row) -DROP TABLE test_rel_forks; -CREATE TABLE test1 (a int, b int); -INSERT INTO test1 VALUES (16777217, 131584); SELECT pagesize, version FROM page_header(get_raw_page('test1', 0)); pagesize | version --+- @@ -83,6 +62,26 @@ SELECT tuple_data_split('test1'::regclass, t_data, t_infomask, t_infomask2, t_bi {"\\x0101","\\x00020200"} (1 row) +SELECT * FROM fsm_page_contents(get_raw_page('test1', 'fsm', 0)); + fsm_page_contents +--- + 0: 254 + + 1: 254 + + 3: 254 + + 7: 254 + + 15: 254 + + 31: 254 + + 63: 254 + +
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 31, 2019 at 1:43 PM Amit Kapila wrote: > > I have an idea -- instead of adding a bunch of records and hoping that > > the relation size and free space is consistent across platforms, how > > about we revert to the original test input, and add a BRIN index? That > > should have a FSM even with one record. > > > > Why would BRIN index allow having FSM for heap relation? Oops, I forgot this file is for testing heaps only. That said, we could possibly put most of the FSM tests, such as SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0)); into brin.sql, since we know a non-empty BRIN index will have an FSM. And in page.sql we could just have a test that the table has no FSM. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 31, 2019 at 2:12 PM John Naylor wrote: > > On Thu, Jan 31, 2019 at 6:41 AM Amit Kapila wrote: > > > > On Wed, Jan 30, 2019 at 10:41 PM Masahiko Sawada > > wrote: > > > > > > The modified page.sql test could fail if the block size is more than > > > 8kB? > > > > That's right, but I don't think current regression tests will work for > > block size greater than 8KB. I have tried with 16 and 32 as block > > size, there were few failures on the head itself. > > > > > We can ensure the number of pages are more than 4 by checking it > > > and adding more data if no enough but I'm really not sure we should > > > care the bigger-block size cases. > > > > > > > Yeah, I am not sure either. I think as this is an existing test, we > > should not try to change it too much. However, if both you and John > > feel it is better to change, we can go with that. > > I have an idea -- instead of adding a bunch of records and hoping that > the relation size and free space is consistent across platforms, how > about we revert to the original test input, and add a BRIN index? That > should have a FSM even with one record. > Why would a BRIN index allow having an FSM for a heap relation? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 31, 2019 at 2:02 PM John Naylor wrote: > > On Thu, Jan 31, 2019 at 6:37 AM Amit Kapila wrote: > > > > On Wed, Jan 30, 2019 at 8:11 PM John Naylor > > wrote: > > > > > > That's probably a good idea to limit risk. I just very basic tests > > > now, and vacuum before every relation size check to make sure any FSM > > > extension (whether desired or not) is invoked. Also, in my last patch > > > I forgot to implement explicit checks of the block number instead of > > > assuming how many rows will fit on a page. I've used a plpgsql code > > > block to do this. > > > > > > > -- Extend table with enough blocks to exceed the FSM threshold > > -- FSM is created and extended to 3 blocks > > > > The second comment line seems redundant to me, so I have removed that > > and integrated it in the main patch. > > FYI, the second comment is still present in v20. > Oops, I forgot to include it in the commit after making that change; done now. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v21-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 31, 2019 at 6:41 AM Amit Kapila wrote: > > On Wed, Jan 30, 2019 at 10:41 PM Masahiko Sawada > wrote: > > > > The modified page.sql test could fail if the block size is more than > > 8kB? > > That's right, but I don't think current regression tests will work for > block size greater than 8KB. I have tried with 16 and 32 as block > size, there were few failures on the head itself. > > > We can ensure the number of pages are more than 4 by checking it > > and adding more data if no enough but I'm really not sure we should > > care the bigger-block size cases. > > > > Yeah, I am not sure either. I think as this is an existing test, we > should not try to change it too much. However, if both you and John > feel it is better to change, we can go with that. I have an idea -- instead of adding a bunch of records and hoping that the relation size and free space are consistent across platforms, how about we revert to the original test input and add a BRIN index? That should have an FSM even with one record. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 31, 2019 at 6:37 AM Amit Kapila wrote: > > On Wed, Jan 30, 2019 at 8:11 PM John Naylor > wrote: > > > > That's probably a good idea to limit risk. I just very basic tests > > now, and vacuum before every relation size check to make sure any FSM > > extension (whether desired or not) is invoked. Also, in my last patch > > I forgot to implement explicit checks of the block number instead of > > assuming how many rows will fit on a page. I've used a plpgsql code > > block to do this. > > > > -- Extend table with enough blocks to exceed the FSM threshold > -- FSM is created and extended to 3 blocks > > The second comment line seems redundant to me, so I have removed that > and integrated it in the main patch. FYI, the second comment is still present in v20. -- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 30, 2019 at 10:41 PM Masahiko Sawada wrote: > > On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila wrote: > > > > On Tue, Jan 29, 2019 at 8:12 PM John Naylor > > wrote: > > > > > > On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila > > > wrote: > > > > > > > You can find this change in attached patch. Then, I ran the make > > > > check in src/bin/pgbench multiple times using test_conc_insert.sh. > > > > You can vary the number of times the test should run, if you are not > > > > able to reproduce it with this. > > > > > > > > The attached patch (clear_local_map_if_exists_1.patch) atop the main > > > > patch fixes the issue for me. Kindly verify the same. > > > > > > I got one failure in 50 runs. With the new patch, I didn't get any > > > failures in 300 runs. > > > > > > > Thanks for verification. I have included it in the attached patch and > > I have also modified the page.sql test to have enough number of pages > > in relation so that FSM will get created irrespective of alignment > > boundaries. Masahiko San, can you verify if this now works for you? > > > > Thank you for updating the patch! > > The modified page.sql test could fail if the block size is more than > 8kB? That's right, but I don't think the current regression tests will work for a block size greater than 8KB. I have tried with 16 and 32 as block size, and there were a few failures on the head itself. > We can ensure the number of pages are more than 4 by checking it > and adding more data if no enough but I'm really not sure we should > care the bigger-block size cases. > Yeah, I am not sure either. I think as this is an existing test, we should not try to change it too much. However, if both you and John feel it is better to change, we can go with that. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 30, 2019 at 8:11 PM John Naylor wrote: > > On Wed, Jan 30, 2019 at 2:11 PM Amit Kapila wrote: > > This is much better than the earlier version of test and there is no > > dependency on the vacuum. However, I feel still there is some > > dependency on how the rows will fit in a page and we have seen some > > related failures due to alignment stuff. By looking at the test, I > > can't envision any such problem, but how about if we just write some > > simple tests where we can check that the FSM won't be created for very > > small number of records say one or two and then when we increase the > > records FSM gets created, here if we want, we can even use vacuum to > > ensure FSM gets created. Once we are sure that the main patch passes > > all the buildfarm tests, we can extend the test to something advanced > > as you are proposing now. I think that will reduce the chances of > > failure, what do you think? > > That's probably a good idea to limit risk. I just very basic tests > now, and vacuum before every relation size check to make sure any FSM > extension (whether desired or not) is invoked. Also, in my last patch > I forgot to implement explicit checks of the block number instead of > assuming how many rows will fit on a page. I've used a plpgsql code > block to do this. > -- Extend table with enough blocks to exceed the FSM threshold -- FSM is created and extended to 3 blocks The second comment line seems redundant to me, so I have removed that and integrated it in the main patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v20-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila wrote: > > On Tue, Jan 29, 2019 at 8:12 PM John Naylor > wrote: > > > > On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila > > wrote: > > > > > > You can find this change in attached patch. Then, I ran the make > > > check in src/bin/pgbench multiple times using test_conc_insert.sh. > > > You can vary the number of times the test should run, if you are not > > > able to reproduce it with this. > > > > > > The attached patch (clear_local_map_if_exists_1.patch) atop the main > > > patch fixes the issue for me. Kindly verify the same. > > > > I got one failure in 50 runs. With the new patch, I didn't get any > > failures in 300 runs. > > > > Thanks for verification. I have included it in the attached patch and > I have also modified the page.sql test to have enough number of pages > in relation so that FSM will get created irrespective of alignment > boundaries. Masahiko San, can you verify if this now works for you? > Thank you for updating the patch! The modified page.sql test could fail if the block size is more than 8kB? We can ensure that the number of pages is more than 4 by checking it and adding more data if there is not enough, but I'm really not sure we should care about the bigger-block-size cases. However, maybe it's good to check the number of pages after insertion, so that we can narrow down the issue in case the test fails again. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 30, 2019 at 2:11 PM Amit Kapila wrote: > This is much better than the earlier version of test and there is no > dependency on the vacuum. However, I feel still there is some > dependency on how the rows will fit in a page and we have seen some > related failures due to alignment stuff. By looking at the test, I > can't envision any such problem, but how about if we just write some > simple tests where we can check that the FSM won't be created for very > small number of records say one or two and then when we increase the > records FSM gets created, here if we want, we can even use vacuum to > ensure FSM gets created. Once we are sure that the main patch passes > all the buildfarm tests, we can extend the test to something advanced > as you are proposing now. I think that will reduce the chances of > failure, what do you think? That's probably a good idea to limit risk. I just have very basic tests now, and vacuum before every relation size check to make sure any FSM extension (whether desired or not) is invoked. Also, in my last patch I forgot to implement explicit checks of the block number instead of assuming how many rows will fit on a page. I've used a plpgsql code block to do this. 
--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index df6b15b9d2..eb2c3842e2 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -2,46 +2,30 @@
 -- Free Space Map test
 --
 CREATE TABLE fsm_check_size (num int, str text);
--- Fill 3 blocks with as many large records as will fit
--- No FSM
-INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
+-- With one block, there should be no FSM
+INSERT INTO fsm_check_size VALUES(1, 'a');
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size
 ---+--
- 24576 |0
-(1 row)
-
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
-VACUUM fsm_check_size;
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
--- Insert large record and make sure it goes in block 0 rather than
--- causing the relation to extend
-INSERT INTO fsm_check_size VALUES (101, rpad('', 1024, 'a'));
-SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
-pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- heap_size | fsm_size
+--
- 24576 |0
+ 8192 |0
 (1 row)
 -- Extend table with enough blocks to exceed the FSM threshold
 -- FSM is created and extended to 3 blocks
-INSERT INTO fsm_check_size SELECT i, 'c' FROM generate_series(200,1200) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size
---
-24576
-(1 row)
-
--- Truncate heap to 1 block
--- No change in FSM
-DELETE FROM fsm_check_size WHERE num > 7;
+DO $$
+DECLARE curtid tid;
+num int;
+BEGIN
+num = 11;
+ LOOP
+INSERT INTO fsm_check_size VALUES (num, 'b') RETURNING ctid INTO curtid;
+EXIT WHEN curtid >= tid '(4, 0)';
+num = num + 1;
+ END LOOP;
+END;
+$$;
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  fsm_size
@@ -49,16 +33,6 @@ SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 24576
 (1 row)
--- Truncate heap to 0 blocks
--- FSM now truncated to 2 blocks
-DELETE FROM fsm_check_size;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size
---
-16384
-(1 row)
-
 -- Add long random string to extend TOAST table to 1 block
 INSERT INTO fsm_check_size VALUES(0,
 (SELECT string_agg(md5(chr(i)), '')
diff --git a/src/test/regress/sql/fsm.sql b/src/test/regress/sql/fsm.sql
index 07f505591a..b1debedea1 100644
--- a/src/test/regress/sql/fsm.sql
+++ b/src/test/regress/sql/fsm.sql
@@ -4,42 +4,28 @@
 CREATE TABLE fsm_check_size (num int, str text);
--- Fill 3 blocks with as many large records as will fit
--- No FSM
-INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
-pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
+-- With one block, there should be no FSM
+INSERT INTO fsm_check_size VALUES(1, 'a');
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
 VACUUM fsm_check_size;
-
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
-
--- Insert large record and make sure it goes in block 0 rather than
--- causing the relation to extend
-INSERT INTO fsm_check_size VALUES (101, rpad('', 1024, 'a'));
 SELECT pg_relation_size('fsm_check_size',
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 30, 2019 at 3:26 PM John Naylor wrote: > > On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila wrote: > > There are two more failures which we need to something about. > > 1. Make fsm.sql independent of vacuum without much losing on coverage > > of newly added code. John, I guess you have an idea, see if you can > > take care of it, otherwise, I will see what I can do for it. > > I've attached a patch that applies on top of v19 that uses Andrew > Gierth's idea to use fillfactor to control free space. I've also > removed tests that relied on truncation and weren't very useful to > begin with. > This is much better than the earlier version of the test, and there is no dependency on vacuum. However, I still feel there is some dependency on how the rows will fit in a page, and we have seen some related failures due to alignment stuff. By looking at the test, I can't envision any such problem, but how about if we just write some simple tests where we check that the FSM won't be created for a very small number of records, say one or two, and that when we increase the records the FSM gets created? Here, if we want, we can even use vacuum to ensure the FSM gets created. Once we are sure that the main patch passes all the buildfarm tests, we can extend the test to something more advanced, as you are proposing now. I think that will reduce the chances of failure; what do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila wrote: > There are two more failures which we need to something about. > 1. Make fsm.sql independent of vacuum without much losing on coverage > of newly added code. John, I guess you have an idea, see if you can > take care of it, otherwise, I will see what I can do for it. I've attached a patch that applies on top of v19 that uses Andrew Gierth's idea to use fillfactor to control free space. I've also removed tests that relied on truncation and weren't very useful to begin with.

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index df6b15b9d2..f92d454042 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -2,11 +2,10 @@
 -- Free Space Map test
 --
 CREATE TABLE fsm_check_size (num int, str text);
--- Fill 3 blocks with as many large records as will fit
--- No FSM
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
 INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
-VACUUM fsm_check_size;
+FROM generate_series(1,3) i;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size
@@ -14,11 +13,12 @@ pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 24576 |0
 (1 row)
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
-VACUUM fsm_check_size;
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
+FROM generate_series(11,15) i;
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
 -- Insert large record and make sure it goes in block 0 rather than
 -- causing the relation to extend
 INSERT INTO fsm_check_size VALUES (101, rpad('', 1024, 'a'));
@@ -32,38 +32,16 @@ pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 -- Extend table with enough blocks to exceed the FSM threshold
 -- FSM is created and extended to 3 blocks
 INSERT INTO fsm_check_size SELECT i, 'c' FROM generate_series(200,1200) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size
---
-24576
-(1 row)
-
--- Truncate heap to 1 block
--- No change in FSM
-DELETE FROM fsm_check_size WHERE num > 7;
-VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  fsm_size
 --
 24576
 (1 row)
--- Truncate heap to 0 blocks
--- FSM now truncated to 2 blocks
-DELETE FROM fsm_check_size;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size
---
-16384
-(1 row)
-
 -- Add long random string to extend TOAST table to 1 block
 INSERT INTO fsm_check_size VALUES(0,
 (SELECT string_agg(md5(chr(i)), '')
 FROM generate_series(1,100) i));
-VACUUM fsm_check_size;
 SELECT pg_relation_size(reltoastrelid, 'main') AS toast_size,
 pg_relation_size(reltoastrelid, 'fsm') AS toast_fsm_size
 FROM pg_class WHERE relname = 'fsm_check_size';
diff --git a/src/test/regress/sql/fsm.sql b/src/test/regress/sql/fsm.sql
index 07f505591a..542bd6e203 100644
--- a/src/test/regress/sql/fsm.sql
+++ b/src/test/regress/sql/fsm.sql
@@ -4,20 +4,21 @@
 CREATE TABLE fsm_check_size (num int, str text);
--- Fill 3 blocks with as many large records as will fit
--- No FSM
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
 INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
-VACUUM fsm_check_size;
+FROM generate_series(1,3) i;
+
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
-VACUUM fsm_check_size;
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
+FROM generate_series(11,15) i;
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
 -- Insert large record and make sure it goes in block 0 rather than
 -- causing the relation to extend
@@ -28,26 +29,12 @@ pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 -- Extend table with enough blocks to exceed the FSM threshold
 -- FSM is created and extended to 3 blocks
 INSERT INTO fsm_check_size SELECT i, 'c' FROM generate_series(200,1200) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size',
Re: WIP: Avoid creation of the free space map for small tables
On Tue, Jan 29, 2019 at 8:12 PM John Naylor wrote: > > On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila wrote: > > > You can find this change in attached patch. Then, I ran the make > > check in src/bin/pgbench multiple times using test_conc_insert.sh. > > You can vary the number of times the test should run, if you are not > > able to reproduce it with this. > > > > The attached patch (clear_local_map_if_exists_1.patch) atop the main > > patch fixes the issue for me. Kindly verify the same. > > I got one failure in 50 runs. With the new patch, I didn't get any > failures in 300 runs. > Thanks for verification. I have included it in the attached patch, and I have also modified the page.sql test to have enough pages in the relation so that the FSM will get created irrespective of alignment boundaries. Masahiko-san, can you verify if this now works for you? There are two more failures which we need to do something about. 1. Make fsm.sql independent of vacuum without losing much coverage of the newly added code. John, I guess you have an idea; see if you can take care of it, otherwise, I will see what I can do for it. 2. I still could not figure out how to verify that the failure on Jacana will be fixed. I have posted some theory above and the attached patch has a solution for it, but I think it would be better if we find out some way to verify the same. Note - you might see some cosmetic changes in freespace.c due to pgindent. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v19-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila wrote: > You can find this change in attached patch. Then, I ran the make > check in src/bin/pgbench multiple times using test_conc_insert.sh. > You can vary the number of times the test should run, if you are not > able to reproduce it with this. > > The attached patch (clear_local_map_if_exists_1.patch) atop the main > patch fixes the issue for me. Kindly verify the same. I got one failure in 50 runs. With the new patch, I didn't get any failures in 300 runs. -- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Tue, Jan 29, 2019 at 5:20 PM Masahiko Sawada wrote: > > On Tue, Jan 29, 2019 at 9:29 AM Amit Kapila wrote: > > > > I'd suspect the alignment of integer. In my environment, the tuple's > actual size is 28 bytes but the aligned size is 32 bytes (= > MAXALIGN(28)), so we can store 226 tuples on a single page. But if > MAXALIGN(28) = 28 then we can store 255 tuples, and 1000 tuples fit > within 4 pages. The MAXALIGN of the four buildfarms seems to be 4 according to > the configure script, so MAXALIGN(28) might be 28 on these buildfarms. > Good finding. I was also wondering along these lines and wanted to verify. Thanks a lot. So, this clearly explains why we have the second failure in my email above [1]. I think this means that for the fsm test also we have to be careful when relying on the number of pages in the test. I think now we have found the reasons and solutions for the first three problems mentioned in my email [1]. For problem 4 (the failure on jacana), I have speculated on a theory, but I am not sure how we can confirm it. Can we try the patch on Jacana before considering the patch for commit? Is there any other way we can replicate that error? [1] - https://www.postgresql.org/message-id/CAA4eK1L%3DqWp_bJ5aTc9%2Bfy4Ewx2LPaLWY-RbR4a60g_rupCKnQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Tue, Jan 29, 2019 at 5:59 AM Amit Kapila wrote: > > On Tue, Jan 29, 2019 at 12:37 AM John Naylor > wrote: > > > I think here you need to clear the map if it exists or clear it > > > unconditionally, the earlier one would be better. > > > > Ok, maybe all callers should call it unconditonally, but within the > > function, check "if (FSM_LOCAL_MAP_EXISTS)"? > > > > Sounds sensible. I think we should try to reproduce these failures, > for ex. for pgbench failure, we can try the same test with more > clients. > I am able to reproduce this by changing the pgbench test as below:

--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -56,9 +56,9 @@ $node->safe_psql('postgres',
 'CREATE UNLOGGED TABLE insert_tbl (id serial primary key); ');
 pgbench(
- '--no-vacuum --client=5 --protocol=prepared --transactions=25',
+ '--no-vacuum --client=10 --protocol=prepared --transactions=25',
 0,
- [qr{processed: 125/125}],
+ [qr{processed: 250/250}],

You can find this change in the attached patch. Then, I ran the make check in src/bin/pgbench multiple times using test_conc_insert.sh. You can vary the number of times the test should run, if you are not able to reproduce it with this. The attached patch (clear_local_map_if_exists_1.patch) atop the main patch fixes the issue for me. Kindly verify the same. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com

change_pgbench_test_1.patch Description: Binary data

for ((i = 1 ; i < 25; i++))
do
make check
done

clear_local_map_if_exists_1.patch Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On Tue, Jan 29, 2019 at 12:37 AM John Naylor wrote: > > On Mon, Jan 28, 2019 at 12:10 PM Amit Kapila wrote: > > > 2. > @@ -15,13 +15,9 @@ > SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS > main_100; > ERROR: block number 100 is out of range for relation "test_rel_forks" > SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0; > - fsm_0 > > - 8192 > -(1 row) > - > +ERROR: could not open file "base/50769/50798_fsm": No such file or > directory > SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10; > -ERROR: block number 10 is out of range for relation "test_rel_forks" > +ERROR: could not open file "base/50769/50798_fsm": No such file or > directory > > This indicates that even though the Vacuum is executed, but the FSM > doesn't get created. This could be due to different BLCKSZ, but the > failed machines don't seem to have a non-default value of it. I am > not sure why this could happen, maybe we need to check once in the > failed regression database to see the size of relation? > I'm also having a hard time imagining why this failed. Just in case, > we could return ctid in a plpgsql loop and stop as soon as we see the > 5th block. I've done that for some tests during development and is a > safer method anyway. > I think we can devise some concrete way, but it is better if we first try to understand why it failed; otherwise there is always a chance that we will repeat the mistake in some other case. I think we have no other choice but to request the buildfarm owners to either give us access to see what happens or help us in investigating the problem. The four buildfarms where it failed were lapwing, locust, dromedary, prairiedog. Among these, the owner of the last two is Tom Lane, and the others I don't recognize. Tom, Andrew, can you help us in getting access to one of those four? 
Yet another alternative is that the owner can apply the attached patch (this is the same as what got committed) or reset to commit ac88d2962a, execute the statements below, and share the results:

CREATE EXTENSION pageinspect;
CREATE TABLE test_rel_forks (a int);
INSERT INTO test_rel_forks SELECT i from generate_series(1,1000) i;
VACUUM test_rel_forks;
SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1;

If the above statements give the error "ERROR: could not open file ...", then run:

Analyze test_rel_forks;
Select oid, relname, relpages, reltuples from pg_class where relname like 'test%';

The result of the above tests will tell us whether there are 5 pages in the table or not. If the table contains 5 pages and throws an error, then there is some bug in our code; otherwise, there is something specific to those systems where the above insert doesn't result in 5 pages. > > I think here you need to clear the map if it exists or clear it > > unconditionally, the earlier one would be better. > > Ok, maybe all callers should call it unconditonally, but within the > function, check "if (FSM_LOCAL_MAP_EXISTS)"? > Sounds sensible. I think we should try to reproduce these failures; for example, for the pgbench failure, we can try the same test with more clients. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v18-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 28, 2019 at 12:10 PM Amit Kapila wrote: > > On Mon, Jan 28, 2019 at 10:03 AM John Naylor > wrote: > > > 1. > @@ -26,7 +26,7 @@ > pg_relation_size('fsm_check_size', 'fsm') AS fsm_size; > heap_size | fsm_size > ---+-- > - 24576 |0 > + 32768 |0 > (1 row) > > -- Extend table with enough blocks to exceed the FSM threshold > @@ -56,7 +56,7 @@ > SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size; > fsm_size > -- > -16384 > +24576 > (1 row) > > > As discussed on another thread, this seems to be due to the reason > that a parallel auto-analyze doesn't allow vacuum to remove dead-row > versions. To fix this, I think we should avoid having a dependency on > vacuum to remove dead rows. Ok, to make the first test here more reliable I will try Andrew's idea to use fillfactor to save free space. As I said earlier, I think that second test isn't helpful and can be dropped. > 2. > @@ -15,13 +15,9 @@ > SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100; > ERROR: block number 100 is out of range for relation "test_rel_forks" > SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0; > - fsm_0 > > - 8192 > -(1 row) > - > +ERROR: could not open file "base/50769/50798_fsm": No such file or directory > SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10; > -ERROR: block number 10 is out of range for relation "test_rel_forks" > +ERROR: could not open file "base/50769/50798_fsm": No such file or directory > > This indicates that even though the Vacuum is executed, but the FSM > doesn't get created. This could be due to different BLCKSZ, but the > failed machines don't seem to have a non-default value of it. I am > not sure why this could happen, maybe we need to check once in the > failed regression database to see the size of relation? I'm also having a hard time imagining why this failed. Just in case, we could return ctid in a plpgsql loop and stop as soon as we see the 5th block. 
I've done that for some tests during development, and it is a safer method anyway. > > > @@ -377,20 +383,9 @@ RelationGetBufferForTuple(Relation relation, Size len, > + > + /* > + * In case we used an in-memory map of available blocks, reset it > + * for next use. > + */ > + if (targetBlock < HEAP_FSM_CREATION_THRESHOLD) > + FSMClearLocalMap(); > + > > I think here you need to clear the map if it exists or clear it > unconditionally, the earlier one would be better. Ok, maybe all callers should call it unconditionally, but within the function, check "if (FSM_LOCAL_MAP_EXISTS)"? Thanks for investigating the failures -- I'm a bit pressed for time this week. -- John Naylor  https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
> "Amit" == Amit Kapila writes: Amit> All of these seems to run with fsync=off. Is it possible that Amit> vacuum has updated FSM, but the same is not synced to disk and Amit> when we try to read it, we didn't get the required page? No. fsync never affects what programs see while the system is running, only what happens after an OS crash. -- Andrew (irc:RhodiumToad)
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 28, 2019 at 4:40 PM Amit Kapila wrote: > > On Mon, Jan 28, 2019 at 10:03 AM John Naylor > wrote: > > > > On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila wrote: > > > There are a few buildfarm failures due to this commit, see my email on > > > pgsql-committers. If you have time, you can also once look into > > > those. > > > > I didn't see anything in common with the configs of the failed > > members. None have a non-default BLCKSZ that I can see. > > > > I have done an analysis of the different failures on buildfarm. > > > 2. > @@ -15,13 +15,9 @@ > SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100; > ERROR: block number 100 is out of range for relation "test_rel_forks" > SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0; > - fsm_0 > > - 8192 > -(1 row) > - > +ERROR: could not open file "base/50769/50798_fsm": No such file or directory > SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10; > -ERROR: block number 10 is out of range for relation "test_rel_forks" > +ERROR: could not open file "base/50769/50798_fsm": No such file or directory > > This indicates that even though the Vacuum is executed, but the FSM > doesn't get created. This could be due to different BLCKSZ, but the > failed machines don't seem to have a non-default value of it. I am > not sure why this could happen, maybe we need to check once in the > failed regression database to see the size of relation? > This symptom is shown on the following buildfarm critters:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2019-01-28%2005%3A05%3A22
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2019-01-28%2003%3A20%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2019-01-28%2003%3A13%3A47
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-01-28%2003%3A07%3A39

All of these seem to run with fsync=off. 
Is it possible that vacuum has updated the FSM, but it has not been synced to disk, and when we try to read it, we don't get the required page? This is just a guess. I have checked all the buildfarm failures, and I see only 4 symptoms, for which I have sent some initial analysis. I think you can also cross-verify the same. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 28, 2019 at 10:03 AM John Naylor wrote:
>
> On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila wrote:
> > There are a few buildfarm failures due to this commit, see my email on
> > pgsql-committers. If you have time, you can also once look into
> > those.
>
> I didn't see anything in common with the configs of the failed
> members. None have a non-default BLCKSZ that I can see.
>

I have done an analysis of the different failures on the buildfarm.

1.
@@ -26,7 +26,7 @@
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 heap_size | fsm_size
---+--
- 24576 |0
+ 32768 |0
(1 row)

-- Extend table with enough blocks to exceed the FSM threshold
@@ -56,7 +56,7 @@
SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 fsm_size
--
-16384
+24576
(1 row)

As discussed on another thread, this seems to be because a parallel
auto-analyze doesn't allow vacuum to remove dead-row versions. To fix
this, I think we should avoid having a dependency on vacuum to remove
dead rows.

2.
@@ -15,13 +15,9 @@
SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
ERROR: block number 100 is out of range for relation "test_rel_forks"
SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
- fsm_0
- 8192
-(1 row)
-
+ERROR: could not open file "base/50769/50798_fsm": No such file or directory
SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
-ERROR: block number 10 is out of range for relation "test_rel_forks"
+ERROR: could not open file "base/50769/50798_fsm": No such file or directory

This indicates that even though vacuum was executed, the FSM doesn't
get created. This could be due to a different BLCKSZ, but the failed
machines don't seem to have a non-default value of it. I am not sure
why this could happen; maybe we need to check once in the failed
regression database to see the size of the relation?

3.
Failure on 'mantid'

2019-01-28 00:13:55.191 EST [123979] 001_pgbench_with_server.pl LOG: statement: CREATE UNLOGGED TABLE insert_tbl (id serial primary key);
2019-01-28 00:13:55.218 EST [123982] 001_pgbench_with_server.pl LOG: execute P0_0: INSERT INTO insert_tbl SELECT FROM generate_series(1,1000);
2019-01-28 00:13:55.219 EST [123983] 001_pgbench_with_server.pl LOG: execute P0_0: INSERT INTO insert_tbl SELECT FROM generate_series(1,1000);
2019-01-28 00:13:55.220 EST [123984] 001_pgbench_with_server.pl LOG: execute P0_0: INSERT INTO insert_tbl SELECT FROM generate_series(1,1000);
..
..
TRAP: FailedAssertion("!((rel->rd_rel->relkind == 'r' || rel->rd_rel->relkind == 't') && fsm_local_map.map[oldPage] == 0x01)", File: "freespace.c", Line: 223)

I think this can happen if we forget to clear the local map after we
get the block with space in function RelationGetBufferForTuple(). I
see a race condition in the code where that can happen. Say we tried
all the blocks in the local map and then tried to extend the relation,
but didn't get ConditionalLockRelationForExtension; in the meantime,
another backend has extended the relation and updated the FSM (via
RelationAddExtraBlocks). Now, the backend that didn't get the
extension lock will get the target block from the FSM, which will be
greater than HEAP_FSM_CREATION_THRESHOLD. Next, it will find that the
block can be used to insert a new row and return the buffer, but it
won't clear the local map, due to the below condition in the code:

@@ -377,20 +383,9 @@ RelationGetBufferForTuple(Relation relation, Size len,
+
+ /*
+  * In case we used an in-memory map of available blocks, reset it
+  * for next use.
+  */
+ if (targetBlock < HEAP_FSM_CREATION_THRESHOLD)
+  FSMClearLocalMap();
+

I think here you need to clear the map if it exists, or clear it
unconditionally; the former would be better. This test gets executed
concurrently by 5 clients, so it can hit the above race condition.

4.
Failure on jacana:

--- c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/box.out 2018-09-26 17:53:33 -0400
+++ c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/results/box.out 2019-01-27 23:14:35 -0500
@@ -252,332 +252,7 @@
 ('(0,100)(0,infinity)'),
 ('(-infinity,0)(0,infinity)'),
 ('(-infinity,-infinity)(infinity,infinity)');
-SET enable_seqscan = false;
-SELECT * FROM box_temp WHERE f1 << '(10,20),(30,40)';
..
..
TRAP: FailedAssertion("!(!(fsm_local_map.nblocks > 0))", File: "c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/storage/freespace/freespace.c", Line: 1118)
..
2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:4] LOG: server process (PID 14388) exited with exit code 3
2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:5] DETAIL: Failed process was running: INSERT INTO box_temp VALUES (NULL),

I think the reason for this failure is the same as the previous one
(as mentioned in point 3), but it can happen in a different way. Say, we
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 28, 2019 at 10:03 AM John Naylor wrote: > > On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila wrote: > > There are a few buildfarm failures due to this commit, see my email on > > pgsql-committers. If you have time, you can also once look into > > those. > > I didn't see anything in common with the configs of the failed > members. None have a non-default BLCKSZ that I can see. > > Looking at this typical example from woodlouse: > > == pgsql.build/src/test/regress/regression.diffs > == > --- C:/buildfarm/buildenv/HEAD/pgsql.build/src/test/regress/expected/fsm.out > 2019-01-28 04:43:09.031456700 +0100 > +++ C:/buildfarm/buildenv/HEAD/pgsql.build/src/test/regress/results/fsm.out > 2019-01-28 05:06:20.351100400 +0100 > @@ -26,7 +26,7 @@ > pg_relation_size('fsm_check_size', 'fsm') AS fsm_size; > heap_size | fsm_size > ---+-- > - 24576 |0 > + 32768 |0 > (1 row) > > ***It seems like the relation extended when the new records should > have gone into block 0. > > -- Extend table with enough blocks to exceed the FSM threshold > @@ -56,7 +56,7 @@ > SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size; > fsm_size > -- > -16384 > +24576 > (1 row) > > ***And here it seems vacuum didn't truncate the FSM. I wonder if the > heap didn't get truncated either. > Yeah, it seems to me that vacuum is not able to truncate the relation, see my latest reply on another thread [1]. [1] - https://www.postgresql.org/message-id/CAA4eK1JntHd7X6dLJVPGYV917HejjhbMKXn9m_RnnCE162LbLA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila wrote:
> There are a few buildfarm failures due to this commit, see my email on
> pgsql-committers. If you have time, you can also once look into
> those.

I didn't see anything in common with the configs of the failed
members. None have a non-default BLCKSZ that I can see.

Looking at this typical example from woodlouse:

== pgsql.build/src/test/regress/regression.diffs ==
--- C:/buildfarm/buildenv/HEAD/pgsql.build/src/test/regress/expected/fsm.out 2019-01-28 04:43:09.031456700 +0100
+++ C:/buildfarm/buildenv/HEAD/pgsql.build/src/test/regress/results/fsm.out 2019-01-28 05:06:20.351100400 +0100
@@ -26,7 +26,7 @@
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 heap_size | fsm_size
---+--
- 24576 |0
+ 32768 |0
(1 row)

***It seems like the relation extended when the new records should
have gone into block 0.

-- Extend table with enough blocks to exceed the FSM threshold
@@ -56,7 +56,7 @@
SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 fsm_size
--
-16384
+24576
(1 row)

***And here it seems vacuum didn't truncate the FSM. I wonder if the
heap didn't get truncated either.

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 28, 2019 at 9:16 AM John Naylor wrote:
>
> On Mon, Jan 28, 2019 at 3:53 AM Amit Kapila wrote:
> > On Thu, Jan 24, 2019 at 9:14 AM Amit Kapila wrote:
> > > Sure, apart from this I have run pgindent on the patches and make some
> > > changes accordingly. Latest patches attached (only second patch has
> > > some changes). I will take one more pass on Monday morning (28th Jan)
> > > and will commit unless you or others see any problem.
> >
> > Pushed these two patches.
>
> Thank you for your input and detailed review! Thank you Mithun for testing!
>

There are a few buildfarm failures due to this commit, see my email on
pgsql-committers. If you have time, you can also take a look at those.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 28, 2019 at 3:53 AM Amit Kapila wrote:
> On Thu, Jan 24, 2019 at 9:14 AM Amit Kapila wrote:
> > Sure, apart from this I have run pgindent on the patches and make some
> > changes accordingly. Latest patches attached (only second patch has
> > some changes). I will take one more pass on Monday morning (28th Jan)
> > and will commit unless you or others see any problem.
>
> Pushed these two patches.

Thank you for your input and detailed review! Thank you Mithun for testing!

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 24, 2019 at 9:14 AM Amit Kapila wrote:
>
> Sure, apart from this I have run pgindent on the patches and make some
> changes accordingly. Latest patches attached (only second patch has
> some changes). I will take one more pass on Monday morning (28th Jan)
> and will commit unless you or others see any problem.
>

Pushed these two patches.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Sat, Jan 26, 2019 at 2:14 PM Amit Kapila wrote:
>
> On Sat, Jan 26, 2019 at 5:05 AM John Naylor wrote:
> >
> > So, in v19 we check pg_class.relpages and if it's
> > a heap and less than or equal to the threshold we call stat on the 0th
> > segment to verify.
> >
> Okay, but the way the logic is implemented appears clumsy to me.
> The function transfer_relfile has no clue about the skipping of FSM stuff,
> but it contains comments about it.

Yeah, I wasn't entirely happy with how that turned out.

> I think there is some value in using the information from
> this function to skip fsm files, but the code doesn't appear to fit
> well, how about moving this check to a new function
> new_cluster_needs_fsm()?

For v21, new_cluster_needs_fsm() has all responsibility for obtaining
the info it needs. I think this is much cleaner, but there is a small
bit of code duplication since it now has to form the file name. One
thing we could do is form the base old/new file names in
transfer_single_new_db() and pass those to transfer_relfile(), which
will only add suffixes and segment numbers. We could then pass the
base old file name to new_cluster_needs_fsm() and use it as is. Not
sure if that's worthwhile, though.

> The order in which relkind and relpages is used in the above code is
> different from the order in which it is mentioned in the query, it
> won't matter, but keeping the order will make the code look consistent. I
> have made this and some more minor code adjustments in the attached
> patch. If you like those, you can include them in the next version of
> your patch.

Okay, done.

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 2b70b65f43c05234b81e939b0ed49e34b97d5667 Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Sun, 27 Jan 2019 21:42:07 +0100
Subject: [PATCH v21 3/3] During pg_upgrade, conditionally skip transfer of FSMs.
If a heap on the old cluster has 4 pages or fewer, and the old cluster was PG v11 or earlier, don't copy or link the FSM. This will shrink space usage for installations with large numbers of small tables. --- src/bin/pg_upgrade/info.c| 16 +++-- src/bin/pg_upgrade/pg_upgrade.h | 6 src/bin/pg_upgrade/relfilenode.c | 58 ++-- 3 files changed, 76 insertions(+), 4 deletions(-) diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c index 2f925f086c..902bfc647e 100644 --- a/src/bin/pg_upgrade/info.c +++ b/src/bin/pg_upgrade/info.c @@ -200,6 +200,8 @@ create_rel_filename_map(const char *old_data, const char *new_data, map->old_db_oid = old_db->db_oid; map->new_db_oid = new_db->db_oid; + map->relpages = old_rel->relpages; + map->relkind = old_rel->relkind; /* * old_relfilenode might differ from pg_class.oid (and hence @@ -418,6 +420,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo) char *nspname = NULL; char *relname = NULL; char *tablespace = NULL; + char *relkind = NULL; int i_spclocation, i_nspname, i_relname, @@ -425,7 +428,9 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo) i_indtable, i_toastheap, i_relfilenode, -i_reltablespace; +i_reltablespace, +i_relpages, +i_relkind; char query[QUERY_ALLOC]; char *last_namespace = NULL, *last_tablespace = NULL; @@ -494,7 +499,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo) */ snprintf(query + strlen(query), sizeof(query) - strlen(query), "SELECT all_rels.*, n.nspname, c.relname, " - " c.relfilenode, c.reltablespace, %s " + " c.relfilenode, c.reltablespace, c.relpages, c.relkind, %s " "FROM (SELECT * FROM regular_heap " " UNION ALL " " SELECT * FROM toast_heap " @@ -525,6 +530,8 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo) i_relname = PQfnumber(res, "relname"); i_relfilenode = PQfnumber(res, "relfilenode"); i_reltablespace = PQfnumber(res, "reltablespace"); + i_relpages = PQfnumber(res, "relpages"); + i_relkind = PQfnumber(res, "relkind"); i_spclocation = PQfnumber(res, "spclocation"); for 
(relnum = 0; relnum < ntups; relnum++) @@ -556,6 +563,11 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo) curr->relname = pg_strdup(relname); curr->relfilenode = atooid(PQgetvalue(res, relnum, i_relfilenode)); + curr->relpages = atoi(PQgetvalue(res, relnum, i_relpages)); + + relkind = PQgetvalue(res, relnum, i_relkind); + curr->relkind = relkind[0]; + curr->tblsp_alloc = false; /* Is the tablespace oid non-default? */ diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h index 2f67eee22b..baeb8ff0f8 100644 --- a/src/bin/pg_upgrade/pg_upgrade.h +++ b/src/bin/pg_upgrade/pg_upgrade.h @@ -147,6 +147,8 @@ typedef struct char *tablespace; /* tablespace path; "" for cluster default */ bool nsp_alloc; /* should nspname be freed? */ bool tblsp_alloc; /* should tablespace be freed? */ +
Re: WIP: Avoid creation of the free space map for small tables
On Sat, Jan 26, 2019 at 5:05 AM John Naylor wrote: > > On Thu, Jan 24, 2019 at 5:19 AM Amit Kapila wrote: > > > Performance testing is probably a good idea anyway, but I went ahead > and implemented your next idea: > > > The other alternative is we can fetch pg_class.relpages and rely on > > that to take this decision, but again if that is not updated, we might > > take the wrong decision. > > We can think of it this way: Which is worse, > 1. Transferring a FSM we don't need, or > 2. Skipping a FSM we need > > I'd say #2 is worse. > Agreed. > So, in v19 we check pg_class.relpages and if it's > a heap and less than or equal the threshold we call stat on the 0th > segment to verify. > Okay, but the way logic is implemented appears clumsy to me. @@ -234,16 +243,40 @@ transfer_relfile(FileNameMap *map, const char *type_suffix, bool vm_must_add_fro { /* File does not exist? That's OK, just return */ if (errno == ENOENT) - return; + return first_seg_size; else - pg_fatal("error while checking for file existence \"%s.%s\" (\"%s\" to \"%s\"): %s\n", - map->nspname, map->relname, old_file, new_file, - strerror(errno)); + goto fatal; } /* If file is empty, just return */ if (statbuf.st_size == 0) - return; + return first_seg_size; + } + + /* Save size of the first segment of the main fork. */ + + else if (map->relpages <= HEAP_FSM_CREATION_THRESHOLD && + (map->relkind == RELKIND_RELATION || + map->relkind == RELKIND_TOASTVALUE)) + { + /* + * In this case, if pg_class.relpages is wrong, it's possible + * that a FSM will be skipped when we actually need it. To guard + * against this, we verify the size of the first segment. + */ + if (stat(old_file, ) != 0) + goto fatal; + else + first_seg_size = statbuf.st_size; + } + else + { + /* + * For indexes etc., we don't care if pg_class.relpages is wrong, + * since we always transfer their FSMs. For heaps, we might + * transfer a FSM when we don't need to, but this is harmless. 
+ */ + first_seg_size = Min(map->relpages, RELSEG_SIZE) * BLCKSZ; } The function transfer_relfile has no clue about skipping of FSM stuff, but it contains comments about it. The check "if (map->relpages <= HEAP_FSM_CREATION_THRESHOLD ..." will needlessly be executed for each segment. I think there is some value in using the information from this function to skip fsm files, but the code doesn't appear to fit well, how about moving this check to new function new_cluster_needs_fsm()? > In the common case, the cost of the stat call is > offset by not linking the FSM. > Agreed. > Despite needing another pg_class field, > I think this code is actually easier to read than my earlier versions. > Yeah, the code appears cleaner from the last version, but I think we can do more in that regards. One more minor comment: snprintf(query + strlen(query), sizeof(query) - strlen(query), "SELECT all_rels.*, n.nspname, c.relname, " - " c.relfilenode, c.reltablespace, %s " + " c.relfilenode, c.reltablespace, c.relpages, c.relkind, %s " "FROM (SELECT * FROM regular_heap " " UNION ALL " " SELECT * FROM toast_heap " @@ -525,6 +530,8 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo) i_relname = PQfnumber(res, "relname"); i_relfilenode = PQfnumber(res, "relfilenode"); i_reltablespace = PQfnumber(res, "reltablespace"); + i_relpages = PQfnumber(res, "relpages"); + i_relkind = PQfnumber(res, "relkind"); i_spclocation = PQfnumber(res, "spclocation"); The order in which relkind and relpages is used in the above code is different from the order in which it is mentioned in the query, it won't matter, but keeping in order will make look code consistent. I have made this and some more minor code adjustments in the attached patch. If you like those, you can include them in the next version of your patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com v20-0003-During-pg_upgrade-conditionally-skip-transfer-of-FSM.patch Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 24, 2019 at 9:50 PM Amit Kapila wrote:
>
> On Fri, Jan 25, 2019 at 1:03 AM John Naylor wrote:
> >
> > On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila wrote:
> > > I think what doc means to say is
> > > that it copies any unlinked files present in primary's new cluster
> > > (which in your case will be data2).
> >
> > In that case, I'm still confused why that doc says, "Unfortunately,
> > rsync needlessly copies files associated with temporary and unlogged
> > tables because these files don't normally exist on standby servers."
> > I fail to see why the primary's new cluster would have these if they
> > weren't linked.
> >
> Why unlogged files won't be in primary's new cluster? After the
> upgrade, they should be present in a new cluster if they were present
> in the old cluster.

I assume they would be linked, however (I haven't checked this). I did
think rewritten VM files would fall under this, but I was confused
about unlogged files.

> > And in the case we're discussing here, the skipped
> > FSMs won't be on data2, so won't end up in standby/data2.
> >
> Right. I think we are safe with respect to rsync because I have seen
> that we do rewrite the vm files in link mode and rsync will copy them
> from primary's new cluster.

Okay.

> I think you can try to address my other comments on your pg_upgrade
> patch. Once we agree on the code, we need to test below scenarios:
> (a) upgrade from all supported versions to the latest version
> (b) upgrade standby with and without using rsync.

Sounds good.

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 24, 2019 at 5:19 AM Amit Kapila wrote: > > 1. > + if ((maps[mapnum].relkind != RELKIND_RELATION && > + maps[mapnum].relkind != RELKIND_TOASTVALUE) || > + first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ || > + GET_MAJOR_VERSION(new_cluster.major_version) <= 1100) > + (void) transfer_relfile([mapnum], "_fsm", vm_must_add_frozenbit); > > I think this check will needlessly be performed for future versions as > well, say when wants to upgrade from PG12 to PG13. That might not > create any problem, but let's try to be more precise. Can you try to > rewrite this check? You might want to encapsulate it inside a > function. I have thought of doing something similar to what we do for > vm, see checks relate to VISIBILITY_MAP_FROZEN_BIT_CAT_VER, but I > guess for this patch it is not important to check catalog version as > even if someone tries to upgrade to the same version. Agreed, done for v19 (I've only attached the pg_upgrade patch). > 2. > transfer_relfile() > { > .. > - /* Is it an extent, fsm, or vm file? */ > - if (type_suffix[0] != '\0' || segno != 0) > + /* Did file open fail? */ > + if (stat(old_file, ) != 0) > .. > } > > So from now onwards, we will call stat for even 0th segment which > means there is one additional system call for each relation, not sure > if that matters, but I think there is no harm in once testing with a > large number of relations say 10K to 50K relations which have FSM. Performance testing is probably a good idea anyway, but I went ahead and implemented your next idea: > The other alternative is we can fetch pg_class.relpages and rely on > that to take this decision, but again if that is not updated, we might > take the wrong decision. We can think of it this way: Which is worse, 1. Transferring a FSM we don't need, or 2. Skipping a FSM we need I'd say #2 is worse. So, in v19 we check pg_class.relpages and if it's a heap and less than or equal the threshold we call stat on the 0th segment to verify. 
In the common case, the cost of the stat call is offset by not linking the FSM. Despite needing another pg_class field, I think this code is actually easier to read than my earlier versions. > 3. > -static void > +static Size > transfer_relfile(FileNameMap *map, const char *type_suffix, bool > vm_must_add_frozenbit) > > If we decide to go with the approach proposed by you, we should add > some comments atop this function for return value change? Done, as well as other comment edits. -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 277d969c31349b22e2c7726935d681e72aacaece Mon Sep 17 00:00:00 2001 From: John Naylor Date: Fri, 25 Jan 2019 17:18:28 -0500 Subject: [PATCH v19 3/3] During pg_upgrade, conditionally skip transfer of FSMs. If a heap on the old cluster has 4 pages or fewer, don't copy or link the FSM. This will reduce space usage for installations with large numbers of small tables. --- src/bin/pg_upgrade/info.c| 18 ++- src/bin/pg_upgrade/pg_upgrade.h | 6 +++ src/bin/pg_upgrade/relfilenode.c | 84 +++- 3 files changed, 93 insertions(+), 15 deletions(-) diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c index 2f925f086c..55d4911d10 100644 --- a/src/bin/pg_upgrade/info.c +++ b/src/bin/pg_upgrade/info.c @@ -200,6 +200,8 @@ create_rel_filename_map(const char *old_data, const char *new_data, map->old_db_oid = old_db->db_oid; map->new_db_oid = new_db->db_oid; + map->relpages = old_rel->relpages; + map->relkind = old_rel->relkind; /* * old_relfilenode might differ from pg_class.oid (and hence @@ -415,9 +417,11 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo) int ntups; int relnum; int num_rels = 0; + int relpages; char *nspname = NULL; char *relname = NULL; char *tablespace = NULL; + char *relkind = NULL; int i_spclocation, i_nspname, i_relname, @@ -425,7 +429,9 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo) i_indtable, i_toastheap, i_relfilenode, -i_reltablespace; 
+i_reltablespace, +i_relpages, +i_relkind; char query[QUERY_ALLOC]; char *last_namespace = NULL, *last_tablespace = NULL; @@ -494,7 +500,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo) */ snprintf(query + strlen(query), sizeof(query) - strlen(query), "SELECT all_rels.*, n.nspname, c.relname, " - " c.relfilenode, c.reltablespace, %s " + " c.relfilenode, c.reltablespace, c.relpages, c.relkind, %s " "FROM (SELECT * FROM regular_heap " " UNION ALL " " SELECT * FROM toast_heap " @@ -526,6 +532,8 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo) i_relfilenode = PQfnumber(res, "relfilenode"); i_reltablespace = PQfnumber(res, "reltablespace"); i_spclocation = PQfnumber(res, "spclocation"); + i_relpages = PQfnumber(res, "relpages"); + i_relkind = PQfnumber(res, "relkind");
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Jan 25, 2019 at 1:03 AM John Naylor wrote:
>
> On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila wrote:
> > I think what doc means to say is
> > that it copies any unlinked files present in primary's new cluster
> > (which in your case will be data2).
>
> In that case, I'm still confused why that doc says, "Unfortunately,
> rsync needlessly copies files associated with temporary and unlogged
> tables because these files don't normally exist on standby servers."
> I fail to see why the primary's new cluster would have these if they
> weren't linked.
>

Why wouldn't unlogged files be in the primary's new cluster? After the
upgrade, they should be present in the new cluster if they were present
in the old cluster.

> And in the case we're discussing here, the skipped
> FSMs won't be on data2, so won't end up in standby/data2.
>

Right. I think we are safe with respect to rsync, because I have seen
that we do rewrite the vm files in link mode and rsync will copy them
from the primary's new cluster.

I think you can try to address my other comments on your pg_upgrade
patch. Once we agree on the code, we need to test the below scenarios:
(a) upgrade from all supported versions to the latest version
(b) upgrade standby with and without using rsync.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila wrote:
>
> On Thu, Jan 24, 2019 at 3:39 AM John Naylor wrote:
> > mkdir -p data1 data2 standby
> >
> > echo 'heap' > data1/foo
> > echo 'fsm' > data1/foo_fsm
> >
> > # simulate streaming replication
> > rsync --archive data1 standby
> >
> > # simulate pg_upgrade, skipping FSM
> > ln data1/foo -t data2/
> >
> > rsync --archive --delete --hard-links --size-only --no-inc-recursive
> > data1 data2 standby
> >
> > # result
> > ls standby/data1
> > ls standby/data2
> >
> >
> > The result is that foo_fsm is not copied to standby/data2, contrary to
> > what the docs above imply for other unlinked files. Can anyone shed
> > light on this?
> >
> Is foo_fsm present in standby/data1?

Yes it is.

> I think what doc means to say is
> that it copies any unlinked files present in primary's new cluster
> (which in your case will be data2).

In that case, I'm still confused why that doc says, "Unfortunately,
rsync needlessly copies files associated with temporary and unlogged
tables because these files don't normally exist on standby servers."
I fail to see why the primary's new cluster would have these if they
weren't linked. And in the case we're discussing here, the skipped
FSMs won't be on data2, so won't end up in standby/data2.

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 24, 2019 at 9:46 AM Amit Kapila wrote: > > On Thu, Jan 24, 2019 at 3:39 AM John Naylor > wrote: > > Few comments related to pg_upgrade patch: 1. + if ((maps[mapnum].relkind != RELKIND_RELATION && + maps[mapnum].relkind != RELKIND_TOASTVALUE) || + first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ || + GET_MAJOR_VERSION(new_cluster.major_version) <= 1100) + (void) transfer_relfile([mapnum], "_fsm", vm_must_add_frozenbit); I think this check will needlessly be performed for future versions as well, say when wants to upgrade from PG12 to PG13. That might not create any problem, but let's try to be more precise. Can you try to rewrite this check? You might want to encapsulate it inside a function. I have thought of doing something similar to what we do for vm, see checks relate to VISIBILITY_MAP_FROZEN_BIT_CAT_VER, but I guess for this patch it is not important to check catalog version as even if someone tries to upgrade to the same version. 2. transfer_relfile() { .. - /* Is it an extent, fsm, or vm file? */ - if (type_suffix[0] != '\0' || segno != 0) + /* Did file open fail? */ + if (stat(old_file, ) != 0) .. } So from now onwards, we will call stat for even 0th segment which means there is one additional system call for each relation, not sure if that matters, but I think there is no harm in once testing with a large number of relations say 10K to 50K relations which have FSM. The other alternative is we can fetch pg_class.relpages and rely on that to take this decision, but again if that is not updated, we might take the wrong decision. Anyone else has any thoughts on this point? 3. -static void +static Size transfer_relfile(FileNameMap *map, const char *type_suffix, bool vm_must_add_frozenbit) If we decide to go with the approach proposed by you, we should add some comments atop this function for return value change? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 24, 2019 at 3:39 AM John Naylor wrote: > > On Mon, Jan 21, 2019 at 6:32 AM Amit Kapila wrote: > > Also, another case to think in this regard is the upgrade for standby > > servers, if you read below paragraph from the user manual [1], you > > will see what I am worried about? > > > > "What this does is to record the links created by pg_upgrade's link > > mode that connect files in the old and new clusters on the primary > > server. It then finds matching files in the standby's old cluster and > > creates links for them in the standby's new cluster. Files that were > > not linked on the primary are copied from the primary to the standby. > > (They are usually small.)" > > > > [1] - https://www.postgresql.org/docs/devel/pgupgrade.html > > I am still not able to get the upgraded standby to go into recovery > without resorting to pg_basebackup, but in another attempt to > investigate your question I tried the following (data1 = old cluster, > data2 = new cluster): > > > mkdir -p data1 data2 standby > > echo 'heap' > data1/foo > echo 'fsm' > data1/foo_fsm > > # simulate streaming replication > rsync --archive data1 standby > > # simulate pg_upgrade, skipping FSM > ln data1/foo -t data2/ > > rsync --archive --delete --hard-links --size-only --no-inc-recursive > data1 data2 standby > > # result > ls standby/data1 > ls standby/data2 > > > The result is that foo_fsm is not copied to standby/data2, contrary to > what the docs above imply for other unlinked files. Can anyone shed > light on this? > Is foo_fsm present in standby/data1? I think what doc means to say is that it copies any unlinked files present in primary's new cluster (which in your case will be data2). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 23, 2019 at 9:18 PM John Naylor wrote:
>
> On Wed, Jan 23, 2019 at 7:09 AM Amit Kapila wrote:
> > I think the first two patches (a) removal of dead code in bootstrap
> > and (b) the core patch to avoid creation of FSM file for the small
> > table are good now. I have prepared the patches along with commit
> > message. There is no change except for some changes in README and
> > commit message of the second patch. Kindly let me know what you think
> > about them?
>
> Good to hear! The additional language is fine. In "Once the FSM is
> created for heap", I would just change that to "...for a heap".
>

Sure. Apart from this, I have run pgindent on the patches and made some
changes accordingly. Latest patches attached (only the second patch has
some changes). I will take one more pass on Monday morning (28th Jan)
and will commit unless you or others see any problem.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

v02-0001-In-bootstrap-mode-don-t-allow-the-creation-of-files-.patch
Description: Binary data

v18-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 21, 2019 at 6:32 AM Amit Kapila wrote:
> Also, another case to think about in this regard is the upgrade for standby
> servers. If you read the paragraph below from the user manual [1], you
> will see what I am worried about:
>
> "What this does is to record the links created by pg_upgrade's link
> mode that connect files in the old and new clusters on the primary
> server. It then finds matching files in the standby's old cluster and
> creates links for them in the standby's new cluster. Files that were
> not linked on the primary are copied from the primary to the standby.
> (They are usually small.)"
>
> [1] - https://www.postgresql.org/docs/devel/pgupgrade.html

I am still not able to get the upgraded standby to go into recovery without resorting to pg_basebackup, but in another attempt to investigate your question I tried the following (data1 = old cluster, data2 = new cluster):

mkdir -p data1 data2 standby

echo 'heap' > data1/foo
echo 'fsm' > data1/foo_fsm

# simulate streaming replication
rsync --archive data1 standby

# simulate pg_upgrade, skipping FSM
ln data1/foo -t data2/

rsync --archive --delete --hard-links --size-only --no-inc-recursive data1 data2 standby

# result
ls standby/data1
ls standby/data2

The result is that foo_fsm is not copied to standby/data2, contrary to what the docs above imply for other unlinked files. Can anyone shed light on this?

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 23, 2019 at 7:09 AM Amit Kapila wrote:
> I think the first two patches, (a) removal of dead code in bootstrap
> and (b) the core patch to avoid creation of the FSM file for small
> tables, are good now. I have prepared the patches along with commit
> messages. There is no change except for some updates to the README and
> the commit message of the second patch. Kindly let me know what you
> think about them.

Good to hear! The additional language is fine. In "Once the FSM is created for heap", I would just change that to "...for a heap".

> I think these two patches can go in even without the upgrade patch
> (during pg_upgrade, conditionally skip transfer of FSMs), which is
> still under discussion. However, I am not in a hurry if you or others
> think that the upgrade patch must be committed along with the second
> patch. I think the upgrade patch is generally on track but might need
> some more review.

The pg_upgrade piece is a nice-to-have feature, not essential, so it can go in later. Additional review is also welcome.

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Sun, Jan 20, 2019 at 5:19 AM John Naylor wrote:
>
> I have a test for in-range and out-of-range for each relation fork.

I think the first two patches, (a) removal of dead code in bootstrap and (b) the core patch to avoid creation of the FSM file for small tables, are good now. I have prepared the patches along with commit messages. There is no change except for some updates to the README and the commit message of the second patch. Kindly let me know what you think about them.

I think these two patches can go in even without the upgrade patch (during pg_upgrade, conditionally skip transfer of FSMs), which is still under discussion. However, I am not in a hurry if you or others think that the upgrade patch must be committed along with the second patch. I think the upgrade patch is generally on track but might need some more review.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

v02-0001-In-bootstrap-mode-don-t-allow-the-creation-of-files-.patch
Description: Binary data

v17-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On Mon, Jan 21, 2019 at 6:32 AM Amit Kapila wrote:
> So we won't allow transfer of FSM files if their size is below
> HEAP_FSM_CREATION_THRESHOLD. What will be its behavior in link mode?
> It seems that the old files will remain there. Will it create any
> problem when we try to create the files via the new server? Can you
> test this case once?

I tried upgrading in --link mode, and on the new cluster, enlarging the table past the threshold causes a new FSM to be created as expected.

> Also, another case to think about in this regard is the upgrade for standby
> servers. If you read the paragraph below from the user manual [1], you
> will see what I am worried about:
>
> "What this does is to record the links created by pg_upgrade's link
> mode that connect files in the old and new clusters on the primary
> server. It then finds matching files in the standby's old cluster and
> creates links for them in the standby's new cluster. Files that were
> not linked on the primary are copied from the primary to the standby.
> (They are usually small.)"
>
> [1] - https://www.postgresql.org/docs/devel/pgupgrade.html

Trying this, I ran into a couple of problems. I'm probably doing something wrong, but I can't help but think there's a pg_upgrade bug/feature I'm unaware of: I set up my test to have the primary directory data1 and the secondary at standby/data1. I instructed pg_upgrade to upgrade data1 into data1u, and I tried the rsync recipe in the docs quoted above, but the upgraded standby wouldn't go into recovery. While debugging that, I found, surprisingly, that pg_upgrade had also gone further and upgraded standby/data1 into standby/data1u. I tried deleting standby/data1u before running the rsync command, and still nothing. Because the upgraded secondary is non-functional, I can't really answer your question.

Not sure if this is normal, but the pg_upgraded new cluster no longer had the replication slot. Re-adding it didn't allow my upgraded secondary to go into recovery, either. (I made sure to copy the recovery settings, so that can't be the problem.)

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Sun, Jan 20, 2019 at 5:19 AM John Naylor wrote:
>

Review of v16-0002-During-pg_upgrade-conditionally-skip-transfer-of:

-	 * Copy/link any fsm and vm files, if they exist
+	 * Copy/link any fsm and vm files, if they exist and if they would
+	 * be created in the new cluster.
 	 */
-	transfer_relfile(&maps[mapnum], "_fsm", vm_must_add_frozenbit);
+	if ((maps[mapnum].relkind != RELKIND_RELATION &&
+		 maps[mapnum].relkind != RELKIND_TOASTVALUE) ||
+		first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ ||
+		GET_MAJOR_VERSION(new_cluster.major_version) <= 1100)
+		(void) transfer_relfile(&maps[mapnum], "_fsm", vm_must_add_frozenbit);

So we won't allow transfer of FSM files if their size is below HEAP_FSM_CREATION_THRESHOLD. What will be its behavior in link mode? It seems that the old files will remain there. Will it create any problem when we try to create the files via the new server? Can you test this case once?

Also, another case to think about in this regard is the upgrade for standby servers. If you read the paragraph below from the user manual [1], you will see what I am worried about:

"What this does is to record the links created by pg_upgrade's link mode that connect files in the old and new clusters on the primary server. It then finds matching files in the standby's old cluster and creates links for them in the standby's new cluster. Files that were not linked on the primary are copied from the primary to the standby. (They are usually small.)"

[1] - https://www.postgresql.org/docs/devel/pgupgrade.html

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
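As a rough model of the transfer condition quoted above, here is a hypothetical Python sketch (not the real pg_upgrade code; the constants and relkind letters mirror the C patch, with 'r' and 't' being PostgreSQL's codes for plain and toast tables): the FSM fork is transferred unless the relation is a heap or toast table whose first segment is within the creation threshold, upgrading to a server version that has this optimization.

```python
BLCKSZ = 8192                    # default PostgreSQL block size
HEAP_FSM_CREATION_THRESHOLD = 4  # pages; at or below this, no FSM is created

RELKIND_RELATION = 'r'
RELKIND_TOASTVALUE = 't'


def should_transfer_fsm(relkind, first_seg_size, new_major_version):
    """Sketch of the patch's skip-FSM condition in pg_upgrade.

    Transfer the FSM fork when:
    - the relation is not a heap or toast table (e.g. indexes keep their FSM), or
    - the heap's first segment is already above the creation threshold, or
    - the new cluster is v11 or older (encoded as 1100), which always has FSMs.
    """
    if relkind not in (RELKIND_RELATION, RELKIND_TOASTVALUE):
        return True
    if first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ:
        return True
    return new_major_version <= 1100


# A 3-page heap upgraded to v12: the FSM is skipped.
print(should_transfer_fsm('r', 3 * BLCKSZ, 1200))  # False
```

This also shows why link mode leaves stale FSM files behind in the old cluster: the decision is made per relation at transfer time, and a skipped file is simply never linked or copied.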
Re: WIP: Avoid creation of the free space map for small tables
On Sat, Jan 19, 2019 at 8:06 AM Amit Kapila wrote:
> On Thu, Jan 17, 2019 at 11:13 PM John Naylor
> Few more comments:
> 1.
> I think we should not allow creation of the FSM for toast tables either
> until their size reaches HEAP_FSM_CREATION_THRESHOLD. If you try the
> test below, you can see that the FSM will be created for the toast
> table even if the size of the toast relation is 1 page.
...
> I have fixed this in the attached patch, kindly verify it once and see
> if you can add a test for the same as well.

Works for me. For v16, I've added and tested similar logic in pg_upgrade and verified that toast tables work the same as normal tables in recovery. I used a slightly different method to generate the long random string, to avoid creating a function. Also, some cosmetic adjustments -- I changed the regression test to use 'i' instead of 'g' to match the use of generate_series in most other tests, and made capitalization more consistent.

> 2.
> -CREATE TABLE test1 (a int, b int);
> -INSERT INTO test1 VALUES (16777217, 131584);
> +CREATE TABLE test_rel_forks (a int);
> +-- Make sure there are enough blocks in the heap for the FSM to be created.
> +INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g;
>
> -VACUUM test1; -- set up FSM
> +-- set up FSM and VM
> +VACUUM test_rel_forks;
>
> This test will create 45 pages instead of 1. I know that to create the
> FSM we now need more than 4 pages, but 45 seems to be on the higher
> side. I think we should not unnecessarily populate more data if there
> is no particular need for it; let's restrict the number of pages to 5
> if possible.

Good idea, done here and in the fsm regression test.

> 3.
> -SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
> - fsm_1
> --------
> -  8192
> -(1 row)
> -
> -SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0;
> +SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
> +ERROR:  block number 10 is out of range for relation "test_rel_forks"
>
> Why have you changed the test definition here? Previously the test
> checked the existing FSM page, but now it tries to access an
> out-of-range page.

The patch is hard to read here, but I still have a test for the existing FSM page:

-SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
+ERROR:  block number 100 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
  fsm_0
 -------
   8192
 (1 row)

I have a test for in-range and out-of-range for each relation fork.

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From ecf0cecee8b0adf8529ddb1fbdabd041172de5e3 Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Sat, 19 Jan 2019 18:39:39 -0500
Subject: [PATCH v16 1/2] Avoid creation of the free space map for small heap
 relations.

Previously, all heaps had FSMs. For very small tables, this means that
the FSM took up more space than the heap did. This is wasteful, so now
we refrain from creating the FSM for heaps with 4 pages or fewer. If the
last known target block has insufficient space, we still try to insert
into some other page before giving up and extending the relation, since
doing otherwise leads to table bloat. Testing showed that trying every
page penalized performance slightly, so we compromise and try every
other page. This way, we visit at most two pages. Any pages with wasted
free space become visible at next relation extension, so we still
control table bloat.
As a bonus, directly attempting one or two pages can even be faster than
consulting the FSM would have been.
---
 contrib/pageinspect/expected/page.out     |  77 +++---
 contrib/pageinspect/sql/page.sql          |  33 ++-
 doc/src/sgml/storage.sgml                 |  13 +-
 src/backend/access/brin/brin.c            |   2 +-
 src/backend/access/brin/brin_pageops.c    |  10 +-
 src/backend/access/heap/hio.c             |  47 ++--
 src/backend/access/heap/vacuumlazy.c      |  17 +-
 src/backend/access/transam/xact.c         |  14 ++
 src/backend/storage/freespace/README      |  33 ++-
 src/backend/storage/freespace/freespace.c | 275 +-
 src/backend/storage/freespace/indexfsm.c  |   6 +-
 src/include/storage/freespace.h           |   9 +-
 src/test/regress/expected/fsm.out         |  75 ++
 src/test/regress/parallel_schedule        |   6 +
 src/test/regress/serial_schedule          |   1 +
 src/test/regress/sql/fsm.sql              |  55 +
 16 files changed, 571 insertions(+), 102 deletions(-)
 create mode 100644 src/test/regress/expected/fsm.out
 create mode 100644 src/test/regress/sql/fsm.sql

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 3fcd9fbe6d..11d56f49a4 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,48
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 17, 2019 at 11:13 PM John Naylor wrote:
> On Wed, Jan 16, 2019 at 10:35 PM Amit Kapila wrote:
> > Yes, I think it would be good if you can explain the concept of
> > local-map with the help of this example.
>
> > Then let's not add a reference to the version number in this case. I
>
> Okay, done in v14. I kept your spelling of the new macro. One minor
> detail added: use uint8 rather than char for the local map array. This
> seems to be preferred, especially in this file.

I am fine with your change. A few more comments:

1. I think we should not allow creation of the FSM for toast tables either until their size reaches HEAP_FSM_CREATION_THRESHOLD. If you try the test below, you can see that the FSM will be created for the toast table even if the size of the toast relation is 1 page.

CREATE OR REPLACE FUNCTION random_text(length INTEGER)
RETURNS TEXT
LANGUAGE SQL
AS $$
select string_agg(chr(32+(random()*96)::int), '') from generate_series(1,length);
$$;

create table tt(c1 int, c2 text);
insert into tt values(1, random_text(2500));
vacuum tt;

I have fixed this in the attached patch; kindly verify it once and see if you can add a test for the same as well.

2.
-CREATE TABLE test1 (a int, b int);
-INSERT INTO test1 VALUES (16777217, 131584);
+CREATE TABLE test_rel_forks (a int);
+-- Make sure there are enough blocks in the heap for the FSM to be created.
+INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g;

-VACUUM test1; -- set up FSM
+-- set up FSM and VM
+VACUUM test_rel_forks;

This test will create 45 pages instead of 1. I know that to create the FSM we now need more than 4 pages, but 45 seems to be on the higher side. I think we should not unnecessarily populate more data if there is no particular need for it; let's restrict the number of pages to 5 if possible.

3.
-SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
- fsm_1
--------
-  8192
-(1 row)
-
-SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
+ERROR:  block number 10 is out of range for relation "test_rel_forks"

Why have you changed the test definition here? Previously, the test checked the existing FSM page, but now it tries to access an out-of-range page.

Apart from the above, I have changed one sentence in the README.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

v15-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 10:35 PM Amit Kapila wrote:
> Yes, I think it would be good if you can explain the concept of
> local-map with the help of this example.

> Then let's not add a reference to the version number in this case. I

Okay, done in v14. I kept your spelling of the new macro. One minor detail added: use uint8 rather than char for the local map array. This seems to be preferred, especially in this file.

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 38bc2d9b4ae3f0d1ab6d21c54ed638c5aee6d637 Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Thu, 17 Jan 2019 12:25:30 -0500
Subject: [PATCH v14 1/2] Avoid creation of the free space map for small heap
 relations.

Previously, all heaps had FSMs. For very small tables, this means that
the FSM took up more space than the heap did. This is wasteful, so now
we refrain from creating the FSM for heaps with 4 pages or fewer. If the
last known target block has insufficient space, we still try to insert
into some other page before giving up and extending the relation, since
doing otherwise leads to table bloat. Testing showed that trying every
page penalized performance slightly, so we compromise and try every
other page. This way, we visit at most two pages. Any pages with wasted
free space become visible at next relation extension, so we still
control table bloat.

As a bonus, directly attempting one or two pages can even be faster than
consulting the FSM would have been.
---
 contrib/pageinspect/expected/page.out     |  77 +++---
 contrib/pageinspect/sql/page.sql          |  33 +--
 doc/src/sgml/storage.sgml                 |  13 +-
 src/backend/access/brin/brin.c            |   2 +-
 src/backend/access/brin/brin_pageops.c    |  10 +-
 src/backend/access/heap/hio.c             |  47 ++--
 src/backend/access/heap/vacuumlazy.c      |  17 +-
 src/backend/access/transam/xact.c         |  14 ++
 src/backend/storage/freespace/README      |  32 ++-
 src/backend/storage/freespace/freespace.c | 272 +-
 src/backend/storage/freespace/indexfsm.c  |   6 +-
 src/include/storage/freespace.h           |   9 +-
 src/test/regress/expected/fsm.out         |  62 +
 src/test/regress/parallel_schedule        |   6 +
 src/test/regress/serial_schedule          |   1 +
 src/test/regress/sql/fsm.sql              |  46
 16 files changed, 545 insertions(+), 102 deletions(-)
 create mode 100644 src/test/regress/expected/fsm.out
 create mode 100644 src/test/regress/sql/fsm.sql

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 3fcd9fbe6d..83e5910453 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,48 +1,69 @@
 CREATE EXTENSION pageinspect;
-CREATE TABLE test1 (a int, b int);
-INSERT INTO test1 VALUES (16777217, 131584);
-VACUUM test1; -- set up FSM
+CREATE TABLE test_rel_forks (a int);
+-- Make sure there are enough blocks in the heap for the FSM to be created.
+INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g;
+-- set up FSM and VM
+VACUUM test_rel_forks;
 -- The page contents can vary, so just test that it can be read
 -- successfully, but don't keep the output.
-SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
  main_0
 --------
    8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1;
-ERROR:  block number 1 is out of range for relation "test1"
-SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
+ERROR:  block number 100 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
  fsm_0
 -------
   8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
- fsm_1
--------
-  8192
-(1 row)
-
-SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
+ERROR:  block number 10 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
  vm_0
 ------
  8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1;
-ERROR:  block number 1 is out of range for relation "test1"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1;
+ERROR:  block number 1 is out of range for relation "test_rel_forks"
 SELECT octet_length(get_raw_page('xxx', 'main', 0));
 ERROR:  relation "xxx" does not exist
-SELECT octet_length(get_raw_page('test1', 'xxx', 0));
+SELECT octet_length(get_raw_page('test_rel_forks', 'xxx', 0));
 ERROR:  invalid fork name
 HINT:  Valid fork names are "main", "fsm", "vm", and "init".
-SELECT get_raw_page('test1', 0) = get_raw_page('test1', 'main', 0);
+SELECT * FROM
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 10:10 PM John Naylor wrote:
> On Wed, Jan 16, 2019 at 8:41 AM Amit Kapila wrote:
> > On Fri, Jan 11, 2019 at 3:54 AM John Naylor
> > wrote:
> > 1.
> > Commit message:
> > > Any pages with wasted free space become visible at next relation
> > > extension, so we still control table bloat.
> >
> > I think the free space will be available after the next pass of
> > vacuum, no? How can relation extension make it available?
>
> To explain, this diagram shows the map as it looks for different small
> table sizes:
>
> 0123
> A
> NA
> ANA
> NANA
>
> So for a 3-block table, the alternating strategy never checks block 1.
> Any free space block 1 has acquired via delete-and-vacuum will become
> visible if it extends to 4 blocks. We are accepting a small amount of
> bloat for improved performance, as discussed. Would it help to include
> this diagram in the README?

Yes, I think it would be good if you can explain the concept of the local map with the help of this example.

> > 2.
> > +2. For very small heap relations, the FSM would be relatively large and
> > +wasteful, so as of PostgreSQL 12 we refrain from creating the FSM for
> > +heaps with HEAP_FSM_CREATION_THRESHOLD pages or fewer, both to save space
> > +and to improve performance. To locate free space in this case, we simply
> > +iterate over the heap, trying alternating pages in turn. There may be some
> > +wasted free space in this case, but it becomes visible again upon next
> > +relation extension.
> >
> > a. Again, how does space become available at next relation extension?
> > b. I think there is no use of mentioning the version number in the
> > above comment; this code will be present from PG 12, so one can find
> > out from which version this optimization was added.
>
> It fits with the reference to PG 8.4 earlier in the document. I chose
> to be consistent, but to be honest, I'm not much in favor of a lot of
> version references in code/READMEs.
Then let's not add a reference to the version number in this case. I also don't see much advantage of adding a version number, at least in this case.

> I'll include these with some grammar corrections in the next version.

Okay, thanks!

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
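The alternating-page local map discussed in this thread (the A/N diagram for 1- to 4-block tables) can be modeled outside the server. This is a hypothetical Python sketch, not PostgreSQL code: it marks every other page as a candidate ('A'), counting backward from the last page, which reproduces the diagram's rows and shows why at most two pages are visited per insertion attempt.

```python
HEAP_FSM_CREATION_THRESHOLD = 4  # pages; at or below this, no FSM fork exists


def local_map(nblocks):
    """Model of the in-memory local map used in place of the FSM.

    A page is a candidate ('A') when it is an even number of steps back
    from the last page; the rest ('N') are skipped on this attempt.
    """
    assert 0 < nblocks <= HEAP_FSM_CREATION_THRESHOLD
    return ''.join('A' if (nblocks - 1 - page) % 2 == 0 else 'N'
                   for page in range(nblocks))


for n in range(1, 5):
    print(local_map(n))  # prints A, NA, ANA, NANA
```

For a 3-block table this never marks block 1, matching the observation above that any free space block 1 accumulates via delete-and-vacuum only becomes visible once the table grows to 4 blocks.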